Converting Excel data to SQL(PostgreSQL) Table in Python

Excel to SQL Tables

Introduction

In the real world, data is mostly in the format of files i.e. Excel sheets. These excel sheets are not well structured and it contains multiple sheets, which is very complex to handle and hard to visualize (larger data) with using cross sheets with some complex conditions. SQL does this task very easily with schema structure and joins.

SQL is a Structured Query Language, which is the base of all the SQL Database. for e.g. PostgreSQL, MongoDB, Microsoft SQL Server, etc.

SQL is much faster than Excel sheets. It can take hours to do a task in Excel and in SQL it takes minutes to perform the same task.

Every big company uses databases to handle data. So there is a requirement of converting this excel sheet data into SQL Tables.

Steps to convert the Excel data to SQL Table

Prerequisites

Check that you have all the required libraries.

# To create the data frame into Table

from sqlalchemy import create_engine

#For manipulating data frame

import pandas as pd

Step1:

First load the excel sheet.

Dataframe=pd.read_excel(filename,index_col=0,header=[1,4],sheet_name=’sheet_name_which_is_to_be_loaded’)

index_col = 0 => It will set first column as index column.

header =[1,4] => It will set first 4 row as header which is column names.

Step2:

Format the excel sheet so that columns contain the same datatypes and also handle the missing values.

You have four main options for converting types in pandas:

1. to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

2. astype() - convert (almost) any type to (almost) any other type (even if it's not necessarily sensible to do so). Also allows you to convert to categorial types (very useful).

3. infer_objects() - a utility method to convert object columns holding Python objects to a pandas type if possible.

4. convert_dtypes() - convert DataFrame columns to the "best possible" dtype that supports pd.NA (pandas' object to indicate a missing value).

Handling Missing Values:

Dataframe.fillna(Dataframe.mean())

fillna => It will fill the null value with the mean of that column.

For string type, we can replace it with string to handle missing or null values.

Step3:

Create Engine in sqlalchemy

engine = create_engine(r’postgresql://user_name:password@server_ip_address:port_number/Database_name’, echo = True)

create_engine => It will create the engine for postgresql that will connect the server to the postgreSQL .

Type your username, password, server IP address, Port number, and Database Name to connect that specified server.

Dataframe.to_sql(‘table_name’,engine,if_exists=’replace’)

to_sql => It will convert the Excel sheet into specified table name.

if_exists=>It will replace the Table to handle multiple execution of program.

Replace will drop the table before inserting new values.

To Execute SQL query using the following command:

engine.execute(“Write the SQL Query”)

You can read more about the to_sql from the below link:

https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.to_sql.html

Read more about sqlalchemy:

https://www.compose.com/articles/using-postgresql-through-sqlalchemy/

Lead Data Scientist at Tata Power Ltd.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

“Friday Feels”-My Experience in Coding Bootcamp

High availability — The stuff a business is made or broken on

Septic Tank Pumping Thompson ND

Raiden Pulse #6: News from May and June

Deploying Unity games for PC

Enemy firing when detecting player

3 Red Flags to Watch Out for as a Junior Developer

SANS Holiday Hack Challenge 2020: Speaker UnPrep Door

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Vikas Maurya

Vikas Maurya

Lead Data Scientist at Tata Power Ltd.

More from Medium

How to Create and Manipulate SQL Databases with Python(Introduction to Python SQL Libraries)

Creating and Executing Python Machine Learning Code on SQL Server

Working with CSV files using Pandas

Migration of PostgreSQL using Pandas