How to read MATLAB files in Python (MAT Files in Python)

How To Work With Mat Files In Python

A large number of datasets, especially those used for research and machine learning projects, are distributed as .mat files, MATLAB’s native data format. In this tutorial, we’ll learn how to read MATLAB files using Python and explore their structure and data in detail.

Why Use MATLAB .mat Files in Python?

The purpose of a MATLAB .mat file in a Python project may not seem obvious at first glance. But many research datasets publish their labels and measurements in this format, so reading it is often the first step of a data science or machine learning project.

This is because a .mat file stores named MATLAB variables, such as arrays and structs, together with header metadata about the file itself, and it can be read from several programming languages.

While the files are not exactly designed for the sole purpose of creating annotations, a lot of researchers use MATLAB for their research and data collection, causing a lot of the annotations that we use in Machine Learning to be present in the form of .mat files.

So, for anyone diving into data science, it’s important to understand how to load and convert .mat files in Python. Doing so lets you use training and testing sets in the form they were published, without first exporting them to CSV.

Let’s get started!

Read .mat Files in Python Using Different Programming Libraries

Python’s standard library has no reader for MATLAB .mat files. To read one, we’ll have to import a third-party library such as scipy, which provides the functionality to handle this file type.

1. Installing SciPy: A Key Python Library for MAT-Files

Similar to how we use the csv module to work with .csv files, we’ll import the scipy library to work with .mat files in Python.

If you don’t already have scipy, you can install it with pip:

pip install scipy

With scipy installed, the next step is to open your Python script, call scipy’s MATLAB loading function, and extract the data you need from the .mat file.
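Before going further, it can help to confirm that the installation worked. This is only a quick sanity check: it imports the package and the loadmat function used in the rest of the tutorial, then prints whichever version happens to be installed.

```python
import scipy
from scipy.io import loadmat  # the loader used in the rest of this tutorial

# Prints the installed SciPy version; an ImportError here means the
# install did not reach the interpreter you are running.
print(scipy.__version__)
```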

2. Using scipy.io.loadmat to Import MATLAB Variables and Annotations in Python

For this example, I will be using the accordion annotations from Caltech’s 101 Object Categories dataset.

from scipy.io import loadmat
annots = loadmat('annotation_0001.mat')
print(annots)

Upon execution, printing out annots would provide us with this as the output.

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Tue Dec 14 15:57:03 2004', '__version__': '1.0', '__globals__': [], 'box_coord': array([[  2, 300,   1, 260]], dtype=uint16), 'obj_contour': array([[ 37.16574586,  61.94475138,  89.47697974, 126.92081031,
        169.32044199, 226.03683241, 259.07550645, 258.52486188,
        203.46040516, 177.5801105 , 147.84530387, 117.0092081 ,
          1.37384899,   1.37384899,   7.98158379,   0.82320442,
         16.2412523 ,  31.65930018,  38.81767956,  38.81767956],
       [ 58.59300184,  44.27624309,  23.90239411,   0.77532228,
          2.97790055,  61.34622468, 126.87292818, 214.97605893,
        267.83793738, 270.59116022, 298.67403315, 298.67403315,
        187.99447514,  94.93554328,  90.53038674,  77.31491713,
         62.44751381,  62.99815838,  56.94106814,  56.94106814]])}

Looking at the output, the __header__ entry shows that this is a MATLAB 5.0 MAT-file (not the newer 7.3 format) and records the platform it was created on and its creation date, while __version__ and __globals__ hold further metadata.
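The header text is also a quick way to tell which MAT-file format you are dealing with. That matters because, according to the SciPy documentation, loadmat reads v4, v6 and v7 to 7.2 files, while the HDF5-based 7.3 format needs an HDF5 library such as h5py. The sketch below reads the header text straight from disk. The helper names mat_header_text and is_v73 are made up for illustration, the two sample files are synthetic header stubs rather than real data, and the check assumes the file starts with the 116-byte text header that v5 and v7.3 files carry (v4 files do not).

```python
import os
import tempfile


def mat_header_text(path):
    """Return the descriptive text stored at the start of a MAT-file."""
    with open(path, "rb") as f:
        raw = f.read(116)  # v5 and v7.3 files begin with 116 bytes of text
    return raw.decode("ascii", errors="replace").rstrip("\x00 ")


def is_v73(path):
    """True when the header text announces the HDF5-based 7.3 format."""
    return mat_header_text(path).startswith("MATLAB 7.3 MAT-file")


# Synthetic header-only stubs, just to exercise the check (not real data).
tmp = tempfile.mkdtemp()
samples = {
    "old.mat": b"MATLAB 5.0 MAT-file, Platform: PCWIN",
    "new.mat": b"MATLAB 7.3 MAT-file, Platform: PCWIN64",
}
for name, text in samples.items():
    with open(os.path.join(tmp, name), "wb") as f:
        f.write(text.ljust(116, b" "))

print(is_v73(os.path.join(tmp, "old.mat")))  # False
print(is_v73(os.path.join(tmp, "new.mat")))  # True
```

If is_v73 returns True for one of your own files, loadmat is the wrong tool for it and an HDF5 reader is the usual alternative.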

The parts we should focus on, however, are box_coord and obj_contour.
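Since loadmat returns an ordinary dictionary, one way to separate the MATLAB variables from the metadata is to skip the keys wrapped in double underscores. The following is a minimal sketch: it writes a small stand-in file with savemat so it runs without the Caltech download, and the shortened contour values are placeholders, not the real annotation.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Stand-in file so the sketch runs on its own; the variable names mirror
# the annotation above, but the contour values are shortened placeholders.
path = os.path.join(tempfile.mkdtemp(), "annotation_demo.mat")
savemat(path, {
    "box_coord": np.array([[2, 300, 1, 260]], dtype=np.uint16),
    "obj_contour": np.array([[37.5, 61.25, 89.5], [58.5, 44.25, 23.75]]),
})

annots = loadmat(path)

# Keys wrapped in double underscores are file metadata, not MATLAB variables.
variables = {k: v for k, v in annots.items() if not k.startswith("__")}

for name, value in variables.items():
    print(name, value.dtype, value.shape)
```

Printing the dtype and shape of each variable first is usually easier to read than printing the whole dictionary.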

3. Parsing the MATLAB File: Extracting Object Contours and Data Structures

If you’ve gone through the information regarding the Annotations provided by Caltech, you’d know that these numbers are the outlines of the corresponding image in the dataset.

In a little more detail, this means that the object present in image 0001 is traced by these outline points. A little further down in the article, we’ll be sorting through the numbers, so don’t worry about it for now.

Parsing through this file structure, we could assign all the contour values to a new Python list.

con_list = [[element for element in upperElement] for upperElement in annots['obj_contour']]

If we print con_list, we get a plain nested list with two rows: the x values first, then the y values.

[[37.16574585635357, 61.94475138121544, 89.47697974217309, 126.92081031307546, 169.32044198895025, 226.03683241252295, 259.0755064456721, 258.52486187845295, 203.4604051565377, 177.58011049723754, 147.84530386740326, 117.0092081031307, 1.3738489871086301, 1.3738489871086301, 7.98158379373848, 0.8232044198894926, 16.24125230202577, 31.65930018416205, 38.81767955801104, 38.81767955801104], [58.59300184162066, 44.27624309392269, 23.90239410681403, 0.7753222836096256, 2.9779005524862328, 61.34622467771641, 126.87292817679563, 214.97605893186008, 267.83793738489874, 270.59116022099454, 298.6740331491713, 298.6740331491713, 187.9944751381216, 94.93554327808477, 90.53038674033152, 77.31491712707185, 62.44751381215474, 62.998158379373876, 56.94106813996319, 56.94106813996319]]
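The comprehension above keeps each element as a NumPy scalar (np.float64), which recent NumPy versions print as np.float64(...) rather than as a bare number. If plain Python floats are what you want, the array’s own tolist() method is a shorter alternative. The sketch below uses a small made-up array in place of annots['obj_contour'].

```python
import numpy as np

# Small stand-in for annots['obj_contour'] (two rows: x values, then y values).
obj_contour = np.array([[37.5, 61.25, 89.5],
                        [58.5, 44.25, 23.75]])

# tolist() converts the array, and every element in it, to built-in types.
con_list = obj_contour.tolist()

print(con_list)               # [[37.5, 61.25, 89.5], [58.5, 44.25, 23.75]]
print(type(con_list[0][0]))   # <class 'float'>
```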

4. Use Pandas Dataframes to Change MAT Data into Python DataFrames

Now that you have the information and have managed to get the data, how would you work with it? Continue to use lists? Definitely not.

We use a data structure like the DataFrame instead, because it behaves much like a table of data: neat to look at and simple to use.

Now, to work with Dataframes, we’ll need to import yet another module, Pandas.

import pandas as pd

Pandas is an open-source data analysis library used by machine learning practitioners and data scientists throughout the world. Many data science workflows rely on the operations it provides.

We’ll only be working with DataFrames in this article, but, keep in mind that the opportunities provided by Pandas are immense.

Working with the data we’ve received above can be simplified by using pandas to construct a data frame with rows and columns for the data.

# zip provides us with both the x and y in a tuple.
newData = list(zip(con_list[0], con_list[1]))
columns = ['obj_contour_x', 'obj_contour_y']
df = pd.DataFrame(newData, columns=columns)

Now, we have our data in a neat DataFrame!

    obj_contour_x  obj_contour_y
0       37.165746      58.593002
1       61.944751      44.276243
2       89.476980      23.902394
3      126.920810       0.775322
4      169.320442       2.977901
5      226.036832      61.346225
6      259.075506     126.872928
7      258.524862     214.976059
8      203.460405     267.837937
9      177.580110     270.591160
10     147.845304     298.674033
11     117.009208     298.674033
12       1.373849     187.994475
13       1.373849      94.935543
14       7.981584      90.530387
15       0.823204      77.314917
16      16.241252      62.447514
17      31.659300      62.998158
18      38.817680      56.941068
19      38.817680      56.941068

As you can see, we have the X and Y coordinates for the image’s outline in a simple DataFrame of two columns.
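The same table can also be built without the intermediate list, by transposing the 2 x N contour array so that each row becomes one (x, y) pair. This is a sketch of that alternative, again with a small made-up array standing in for annots['obj_contour'].

```python
import numpy as np
import pandas as pd

# Small stand-in for annots['obj_contour'] (row 0: x values, row 1: y values).
obj_contour = np.array([[37.5, 61.25, 89.5],
                        [58.5, 44.25, 23.75]])

# Transposing turns the 2 x N array into N rows of (x, y).
df = pd.DataFrame(obj_contour.T, columns=["obj_contour_x", "obj_contour_y"])

print(df)
```

Either route gives the same two-column DataFrame. The transpose simply skips the zip step.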

This should provide you with some clarity about the nature of the data in the file.

The process of creating DataFrames for each .mat file is different but, with experience and practice, creating them out of .mat files should come naturally to you.
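As one illustration of how files differ, some .mat files store a MATLAB struct instead of bare arrays. The sketch below is an assumption-laden example, not part of the Caltech data: it writes a hypothetical struct named sample with savemat, then loads it with loadmat’s squeeze_me and struct_as_record options so the fields can be read as attributes.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical stand-in file: one struct named "sample" with two fields.
path = os.path.join(tempfile.mkdtemp(), "struct_demo.mat")
savemat(path, {"sample": {"label": "accordion",
                          "box": np.array([2, 300, 1, 260])}})

# squeeze_me drops singleton dimensions; struct_as_record=False exposes
# struct fields as attributes instead of a NumPy record array.
data = loadmat(path, squeeze_me=True, struct_as_record=False)
sample = data["sample"]

print(sample.label)
print(sample.box)
```

From here, each field can be turned into a DataFrame column in the same way as the contour data above.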

That’s all for this article!

Wrapping Up: Using MATLAB Data for Machine Learning Projects

With this tutorial under your belt, you now possess the knowledge to work with MATLAB files in Python and harness the power of pandas to structure this data into dataframes.

The next steps to work with this data would be to create your own models, or employ existing ones for training or testing on your copy of the dataset.
