The xlrd Module – How To Handle Excel Files In Python?

FeaImg Xlrd Module

Microsoft Excel is one of the most widely used applications for data analysis and visualization. Nearly every organization works with Excel files in some form. As a Python programmer, a useful skill to have is the ability to read, parse, and manipulate Excel files using Python.

In this tutorial, you’ll learn how to explore Excel files using the xlrd module in Python. xlrd allows you to extract data from Excel files (.xlsx and .xls) without needing to install MS Office.

We’ll cover:

  • Loading Excel files
  • Accessing worksheets
  • Reading cell values
  • Getting sheet dimensions
  • Printing column names
  • Extracting rows and columns

Whether you want to automate Excel reports, parse spreadsheet data, or simply open Excel files in Python, the xlrd module has you covered. Mastering xlrd will allow you to tap into a vast amount of data locked away in Excel spreadsheets.

So let’s get started!


What is Microsoft Excel?

Microsoft Excel is the world’s most popular spreadsheet software. Used by over a billion people worldwide, it helps users organize, analyze, and visualize data from an intuitive user interface.

MSExcel Logo

At its core, Excel is a table. It has the typical structure of a table including the rows and columns. But the formulas and functionality that you can integrate within each and every cell is what makes Excel an extremely powerful database/table.

With Excel, you can:

  • Perform complex calculations and analyses on massive datasets
  • Create stunning data visualizations like charts, graphs, and pivot tables 
  • Develop your own formulas and macros to automate repetitive tasks
  • Share and collaborate on spreadsheets with colleagues
  • Import and export data from various file formats

Excel has a simple yet powerful interface. While the basics are easy to pick up, Excel has many sophisticated features for handling large, complex datasets, creating advanced formulas, modeling data, and designing professional reports and dashboards.


Getting Started with the xlrd Module

The xlrd module allows you to extract data from Excel spreadsheets (.xls and .xlsx) without needing MS Office installed. With xlrd, you can read Excel files in Python and access sheets, rows, columns, and cells.

To get started, you first need to install xlrd. You can do this using pip:

pip install xlrd

Once installed, you can import the module into your Python script:

python
import xlrd

The primary objects in xlrd are:

  • Book – Represents the Excel workbook (the file itself).
  • Sheet – Represents one worksheet in a workbook.
  • Cell – Represents one cell in a sheet. Contains row, column, and cell value.
  • Row and Column – Represent a single row or column in a sheet.      

Working with Excel Files Using the xlrd Module

We would be working on a few things with the help of the xlrd module one after another. Before doing anything let’s look at the dataset we would be using.

You can use any dataset you find suitable or create your own custom dataset. I downloaded a random finance dataset which can be downloaded here.

Financial Data Xlrd Module
Exploring a Sample Financial Dataset

Step 1: Load an Excel File

Initially, we would be importing the xlrd module and then use the open_workbook function under the xlrd module in order to load the workbook whose path is mentioned inside the function.

# Importing Module
import xlrd
# Loading Excel file
wb = xlrd.open_workbook('Financial Sample.xlsx')

Step 2: Access a Specific Worksheet

Now there can be multiple workbooks inside a single excel file but we want to work on the first workbook and for the same, we will make use of the sheet_by_index function and pass the index of the sheet we want to work on.

# Storing the first sheet into a variable
sheet = wb.sheet_by_index(0)

Printing value of a cell

After loading a certain workbook, we would like to print the value of a certain cell using the cell_value function which would require the row and column number of the cell we want to read.

# Printing various cell values
print("Value of 0-0 cell: ",sheet.cell_value(0, 0))
print("Value of 20-4 cell: ",sheet.cell_value(20, 4))

The output after execution is:

Value of 0-0 cell:  Segment
Value of 20-4 cell:  1006.0

Printing the Number of rows and columns

In order to print the number of rows and columns in the excel file, we would make use of the ‘nrows’ and ‘ncols’ function.

# Get max no of rows and columns
print("Number of Rows: ", sheet.nrows)
print("Number of Columns: ",sheet.ncols)

The output after execution is:

Number of Rows:  701
Number of Columns:  16

Printing the Names of All the Columns

This can be achieved using the code mentioned below. We just need to print the value of each column cell in the first row.

# Get all column names
print("ALL COLUMN NAMES ARE: ")
for i in range(sheet.ncols):
    print(sheet.cell_value(0,i))

The output after execution is shown below:

ALL COLUMN NAMES ARE: 
Segment
Country
Product
Discount Band
Units Sold
Manufacturing Price
Sale Price
Gross Sales
Discounts
 Sales
COGS
Profit
Date
Month Number
Month Name
Year

Printing the first 10 rows of 5 columns

This can be achieved using the code mentioned below. You can print multiple rows and columns if you wish to.

# Get first 10 rows for 5 columns
for i in range(11):
    for j in range(5):
        print(sheet.cell_value(i,j), end="\t\t\t")
    print()

The output is mentioned below:

Segment			Country			Product			Discount Band			Units Sold			
Government			Canada			Carretera			None			1618.5			
Government			Germany			Carretera			None			1321.0			
Midmarket			France			Carretera			None			2178.0			
Midmarket			Germany			Carretera			None			888.0			
Midmarket			Mexico			Carretera			None			2470.0			
Government			Germany			Carretera			None			1513.0			
Midmarket			Germany			Montana			None			921.0			
Channel Partners			Canada			Montana			None			2518.0			
Government			France			Montana			None			1899.0			
Channel Partners			Germany			Montana			None			1545.0	

Conclusion

In this tutorial, you learned the basics of using xlrd to explore and extract data from Excel files in Python. We covered loading workbooks, accessing sheets, reading cell values, getting sheet dimensions, printing column names, and extracting rows and columns.

With these skills, you can now automate Excel reporting, parse spreadsheet data, and unlock insights trapped in Excel files. xlrd opens up a whole new world of data that you can analyze and visualize using the power of Python.

You now have a powerful tool to extract data from Excel files in Python. I hope you found this introduction to xlrd useful!

Liked the tutorial? Have a look at these tutorials:

  1. Plot data from Excel Sheet using Python
  2. Read Text Files Using Pandas – A Brief Reference