Microsoft Excel is one of the most widely used applications for data analysis and visualization. Nearly every organization works with Excel files in some form. As a Python programmer, a useful skill to have is the ability to read, parse, and manipulate Excel files using Python.
In this tutorial, you’ll learn how to explore Excel files using the xlrd module in Python. xlrd allows you to extract data from Excel files (.xlsx and .xls) without needing to install MS Office.
- Loading Excel files
- Accessing worksheets
- Reading cell values
- Getting sheet dimensions
- Printing column names
- Extracting rows and columns
Whether you want to automate Excel reports, parse spreadsheet data, or simply open Excel files in Python, the xlrd module has you covered. Mastering xlrd will allow you to tap into a vast amount of data locked away in Excel spreadsheets.
So let’s get started!
What is Microsoft Excel?
Microsoft Excel is the world’s most popular spreadsheet software. Used by over a billion people worldwide, it helps users organize, analyze, and visualize data from an intuitive user interface.
At its core, Excel is a table. It has the typical structure of a table including the rows and columns. But the formulas and functionality that you can integrate within each and every cell is what makes Excel an extremely powerful database/table.
With Excel, you can:
- Perform complex calculations and analyses on massive datasets
- Create stunning data visualizations like charts, graphs, and pivot tables
- Develop your own formulas and macros to automate repetitive tasks
- Share and collaborate on spreadsheets with colleagues
- Import and export data from various file formats
Excel has a simple yet powerful interface. While the basics are easy to pick up, Excel has many sophisticated features for handling large, complex datasets, creating advanced formulas, modeling data, and designing professional reports and dashboards.
Getting Started with the xlrd Module
The xlrd module allows you to extract data from Excel spreadsheets (.xls and .xlsx) without needing MS Office installed. With xlrd, you can read Excel files in Python and access sheets, rows, columns, and cells.
To get started, you first need to install xlrd. You can do this using pip:
pip install xlrd
Once installed, you can import the module into your Python script:
python import xlrd
The primary objects in xlrd are:
- Book – Represents the Excel workbook (the file itself).
- Sheet – Represents one worksheet in a workbook.
- Cell – Represents one cell in a sheet. Contains row, column, and cell value.
- Row and Column – Represent a single row or column in a sheet.
Working with Excel Files Using the xlrd Module
We would be working on a few things with the help of the xlrd module one after another. Before doing anything let’s look at the dataset we would be using.
You can use any dataset you find suitable or create your own custom dataset. I downloaded a random finance dataset which can be downloaded here.
Step 1: Load an Excel File
Initially, we would be importing the xlrd module and then use the
open_workbook function under the xlrd module in order to load the workbook whose path is mentioned inside the function.
# Importing Module import xlrd # Loading Excel file wb = xlrd.open_workbook('Financial Sample.xlsx')
Step 2: Access a Specific Worksheet
Now there can be multiple workbooks inside a single excel file but we want to work on the first workbook and for the same, we will make use of the
sheet_by_index function and pass the index of the sheet we want to work on.
# Storing the first sheet into a variable sheet = wb.sheet_by_index(0)
Printing value of a cell
After loading a certain workbook, we would like to print the value of a certain cell using the
cell_value function which would require the row and column number of the cell we want to read.
# Printing various cell values print("Value of 0-0 cell: ",sheet.cell_value(0, 0)) print("Value of 20-4 cell: ",sheet.cell_value(20, 4))
The output after execution is:
Value of 0-0 cell: Segment Value of 20-4 cell: 1006.0
Printing the Number of rows and columns
In order to print the number of rows and columns in the excel file, we would make use of the ‘nrows’ and ‘ncols’ function.
# Get max no of rows and columns print("Number of Rows: ", sheet.nrows) print("Number of Columns: ",sheet.ncols)
The output after execution is:
Number of Rows: 701 Number of Columns: 16
Printing the Names of All the Columns
This can be achieved using the code mentioned below. We just need to print the value of each column cell in the first row.
# Get all column names print("ALL COLUMN NAMES ARE: ") for i in range(sheet.ncols): print(sheet.cell_value(0,i))
The output after execution is shown below:
ALL COLUMN NAMES ARE: Segment Country Product Discount Band Units Sold Manufacturing Price Sale Price Gross Sales Discounts Sales COGS Profit Date Month Number Month Name Year
Printing the first 10 rows of 5 columns
This can be achieved using the code mentioned below. You can print multiple rows and columns if you wish to.
# Get first 10 rows for 5 columns for i in range(11): for j in range(5): print(sheet.cell_value(i,j), end="\t\t\t") print()
The output is mentioned below:
Segment Country Product Discount Band Units Sold Government Canada Carretera None 1618.5 Government Germany Carretera None 1321.0 Midmarket France Carretera None 2178.0 Midmarket Germany Carretera None 888.0 Midmarket Mexico Carretera None 2470.0 Government Germany Carretera None 1513.0 Midmarket Germany Montana None 921.0 Channel Partners Canada Montana None 2518.0 Government France Montana None 1899.0 Channel Partners Germany Montana None 1545.0
In this tutorial, you learned the basics of using xlrd to explore and extract data from Excel files in Python. We covered loading workbooks, accessing sheets, reading cell values, getting sheet dimensions, printing column names, and extracting rows and columns.
With these skills, you can now automate Excel reporting, parse spreadsheet data, and unlock insights trapped in Excel files. xlrd opens up a whole new world of data that you can analyze and visualize using the power of Python.
You now have a powerful tool to extract data from Excel files in Python. I hope you found this introduction to xlrd useful!
Liked the tutorial? Have a look at these tutorials: