An Excel file is a spreadsheet file containing some cells in rows and columns (Tabular view) and can help in the arrangement, calculation, sorting, and managing of data. The data in the spreadsheet may be numeric, text, formulas, hyperlinks, functions, etc. An XLS file stores data as binary streams. It can only be created by the most popular MS Excel or other spreadsheet programs. The file format .xlsx always indicates an excel file on its own.
The following image depicts an excel file created by the MS-excel program:
How to read Excel files using Python
To read excel files using Python, we need to use some popular Python modules and methods. Let’s understand those as well.
Using Python xlrd module
xlrd is a python library or module to read and manage information from Excel files ( i.e. files in .
xlsx format ). This Module will not be applicable for anything other than .
Let’s have a quick look at how to install
C:\Users\pc> pip install xlrd
As you are using python, You must have downloaded the
pip package installer. You can also use another Python package manager of your choice.
In this method, We are going to use
xlwings module along with the method associated with it (i.e.
This method will automatically open our .xlsx in the background for us in its original program (i.e. MS-Excel) where we can operate and manage our data.
#importing the xlwings module as xw
import xlwings as xw
#you can get ur excel file already been opened after execution of this command
ws = xw.Book("C:\\Users\\pc\\Desktop\\students.xlsx").sheets['Sheet1']
From the above code snippet, We can get our Excel automatically opened on our desktop where we can access it.
Using Python pandas module
Pandas is an open-source Python library or module that provides in-built high-performance data structures and data analysis tools. It is most preferably used to analyze data along with two other core python libraries-
Matplotlib for data visualization and
NumPy for mathematical operations.
We are going to install this module in the same way as our previous module using the pip installer as follows.
C:\Users\pc> pip install pandas
The above code snippet will install the
pandas module for us as follows.
To read excel files, let’s run the following snippet of code.
# importing pandas module as pd
import pandas as pd
#using read_excel() method to read our excel file and storing the same in the variable named "df "
df = pd.read_excel("C:\\Users\\pc\\Desktop\\students.xlsx")
#printing our spreadsheet using print() method
In the above method, We are using
read_excel() method to read our .
xlsx file. We can use this method along with the pandas module as
panda.read_excel() to read the excel file data into a DataFrame object (Here it is ‘
The above code snippet will print our spreadsheet as follows.
Using Python openpyxl module
Openpyxl is a Python library or module used to read or write from an Excel file. This module needs to be installed to use certain methods like load_workbook(), otherwise, we can’t use those methods, It will throw
error. Let’s install this module using our command prompt.
C:\Users\pc> pip install openpyxl
The above code snippet will install our
openpyxl module as follows.
In our second method, We are going to use our openpyxl module along with load_workbook() method as our following code snippet.
# importing openpyxl module
#using load_workbook() method to read our excel file and storing to dataframe object table1
table1 = openpyxl.load_workbook("C:\\Users\\pc\\Desktop\\students.xlsx")
#To access the table1 we need to activate and store to an another object (Here it is table2)
table2 = table1.active
for row in range(1, table2.max_row):
for col in table2.iter_cols(1, table2.max_column):
print(col[row].value, end = " ")
In the above code snippet, We are using load_workbook() method to read our required excel file along with openpyxl module. We couldn’t use this method without importing this library or module. Not only this, This module is responsible for reading the location as a parameter( Here it is “C:\Users\pc\Desktop\students.xlsx” ) in our read_excel() method.
After reading our excel file and assigning it to table1, It needs to be activated. Otherwise, If we print the table1, the following output occurs.
We are going to access table2 by using for loop as the above code snippet. We will get our results as follows.
In this article, We covered Different methods to read our Excel file using Python. We discussed some popular Modules along with some required methods of Python for our appropriate output. Hope You must have practiced and enjoyed our code snippets. We must visit again with some more exciting topics.