Converting Excel Data to a SQL (PostgreSQL) Table in Python

Vikas Maurya
2 min read · Feb 23, 2021
Excel to SQL Tables

Introduction

In the real world, data mostly arrives as files, typically Excel sheets. These sheets are rarely well structured: a single workbook often contains multiple sheets, which makes larger datasets complex to handle and hard to visualize, especially when cross-sheet lookups with complex conditions are involved. SQL does this task very easily with a schema structure and joins.

SQL (Structured Query Language) is the common language of relational databases, e.g. PostgreSQL, MySQL, Microsoft SQL Server, etc.

SQL is also much faster than Excel for large datasets: a task that can take hours of manual work in Excel often takes minutes with a single query in SQL.

Every big company uses databases to handle its data, so there is often a need to convert Excel sheet data into SQL tables.

Steps to convert Excel data to a SQL table

Prerequisites

Check that you have all the required libraries.

# To write the data frame to a SQL table
from sqlalchemy import create_engine

# For manipulating the data frame
import pandas as pd
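If any of these are missing, they can be installed with pip. The exact package set depends on your setup; psycopg2-binary is one common PostgreSQL driver for SQLAlchemy, and openpyxl is the engine pandas uses to read .xlsx files:

pip install pandas sqlalchemy psycopg2-binary openpyxl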

Step 1:

First, load the Excel sheet.

Dataframe = pd.read_excel(filename, index_col=0, header=[1,4], sheet_name='sheet_name_which_is_to_be_loaded')

index_col=0 => sets the first column of the sheet as the index of the DataFrame.

header=[1,4] => uses rows 1 and 4 (zero-indexed) as a two-level column header (a MultiIndex). If your sheet has a single header row, pass one integer instead, e.g. header=0.
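As a minimal sketch of a simpler load, assuming a hypothetical workbook sales.xlsx with one header row on a sheet named Sheet1:

import pandas as pd

Dataframe = pd.read_excel('sales.xlsx', index_col=0, header=0, sheet_name='Sheet1')
print(Dataframe.dtypes)  # inspect the inferred column types before moving on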

Step2:

Format the Excel sheet so that each column contains a single datatype, and handle the missing values.

You have four main options for converting types in pandas (a short sketch follows the list):

1. to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

2. astype() - convert (almost) any type to (almost) any other type (even if it's not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).

3. infer_objects() - a utility method to convert object columns holding Python objects to a pandas type if possible.

4. convert_dtypes() - convert DataFrame columns to the "best possible" dtype that supports pd.NA (pandas' object to indicate a missing value).
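A minimal sketch of the first two options, using a hypothetical DataFrame with an 'amount' column of mixed strings and a 'category' column:

import pandas as pd

df = pd.DataFrame({'amount': ['10', '20.5', 'n/a'], 'category': ['a', 'b', 'a']})

# to_numeric with errors='coerce' turns unparseable strings into NaN instead of raising
df['amount'] = pd.to_numeric(df['amount'], errors='coerce')

# astype converts the column to a categorical dtype
df['category'] = df['category'].astype('category')

print(df.dtypes)  # amount is now float64, category is category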

Handling Missing Values:

Dataframe = Dataframe.fillna(Dataframe.mean(numeric_only=True))

fillna => fills each null value with the mean of its column. Note that fillna returns a new DataFrame by default, so assign the result back (or pass inplace=True). numeric_only=True restricts the mean to numeric columns, which recent pandas versions require when the frame also contains strings.

For string columns, we can instead replace missing or null values with a placeholder string, as in the sketch below.
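A minimal sketch of both fills, assuming hypothetical columns 'price' (numeric) and 'city' (string):

# Fill numeric nulls with the column mean
Dataframe['price'] = Dataframe['price'].fillna(Dataframe['price'].mean())

# Fill string nulls with a placeholder value
Dataframe['city'] = Dataframe['city'].fillna('unknown')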

Step 3:

Create an engine in SQLAlchemy.

engine = create_engine('postgresql://user_name:password@server_ip_address:port_number/Database_name', echo=True)

create_engine => creates the engine that connects to the PostgreSQL server. echo=True logs every SQL statement the engine issues, which is handy for debugging.

Replace the username, password, server IP address, port number, and database name with the values for your server.
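For example, with hypothetical credentials for a local server (every value here is a placeholder to adjust):

engine = create_engine('postgresql://myuser:mypassword@localhost:5432/mydb', echo=True)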

Dataframe.to_sql('table_name', engine, if_exists='replace')

to_sql => writes the DataFrame into the table with the specified name.

if_exists='replace' => replaces the table, so the program can be run multiple times.

Replace will drop the table before inserting the new values. Use 'append' to keep existing rows instead, or 'fail' (the default) to raise an error if the table already exists.
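To confirm that the load worked, the table can be read back with pd.read_sql; table_name here is the placeholder name from the call above:

check = pd.read_sql('SELECT * FROM table_name LIMIT 5', engine)
print(check)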

To execute a SQL query, use the following command:

engine.execute("Write the SQL Query")
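Note that engine.execute() belongs to SQLAlchemy 1.x (current when this article was written) and was removed in SQLAlchemy 2.0. A 2.0-compatible sketch, again using the placeholder table_name:

from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(text('SELECT COUNT(*) FROM table_name'))
    print(result.scalar())  # number of rows loaded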

You can read more about to_sql at the link below:

https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.to_sql.html

Read more about SQLAlchemy:

https://www.compose.com/articles/using-postgresql-through-sqlalchemy/
