Using a Pandas Data Frame Index for X-Axis in Matplotlib Plot

Using Index As The Values Of X Axis In Plot

A data frame index is the set of labels that identifies each row of the data frame. It can be specified when the data frame is created, or set afterwards from one of the existing columns once we have analyzed the data.

While we are talking about the index of a data frame, it is essential to know what a data frame is. A data frame is a common storage unit of the Pandas library that is similar to a table. That is, it stores data in rows and columns.

Any column that is relevant to the data can serve as the index of a data frame. The index labels can also take many forms: integers, strings, datetime values, and so on.
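As a minimal sketch of these index forms, the snippet below builds the same small column three times with integer, string, and datetime labels. The variable names and sample values are made up for illustration.

```python
import pandas as pd

# Integer labels: the default RangeIndex that pandas assigns automatically.
df_int = pd.DataFrame({'sales': [120, 100, 150]})

# String labels supplied explicitly through the index argument.
df_str = pd.DataFrame({'sales': [120, 100, 150]}, index=['a', 'b', 'c'])

# Datetime labels generated with pd.date_range.
df_date = pd.DataFrame(
    {'sales': [120, 100, 150]},
    index=pd.date_range('2022-01-01', periods=3, freq='D'),
)

# Print the class of each index to see how pandas represents it.
for frame in (df_int, df_str, df_date):
    print(type(frame.index).__name__)
```

The printed class names show that pandas picks a different index type depending on the labels it is given.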

The index is one of the most important features of a data frame. We can choose the column that best identifies each row as the index. A well-chosen index makes the data frame easier to understand, and once we understand what the data frame holds, we can visualize and manipulate it more effectively.

Coming to visualization, the Matplotlib library of Python is widely used for plotting data. We can also visualize a data frame with the help of this library, and when we do, we can use its index as the values for the X-axis.

This article focuses on the key concepts of a data frame and its index, and how we can use this index as values for the X-axis in plotting a graph.

What Is a Data Frame?

As discussed above, a data frame stores data across multiple rows and columns. It can store heterogeneous data, which means that different columns can hold different data types. For example, one column can contain strings while another contains numbers.

The pd.DataFrame constructor builds a data frame from data structures like lists, dictionaries, and lists of dictionaries. A data frame can also be loaded from files in Excel format, CSV format, and so on.
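The three input shapes mentioned above can be sketched as follows. The food and calorie values are sample data, and the variable names are only illustrative.

```python
import pandas as pd

# From a dictionary: keys become column names, values become columns.
from_dict = pd.DataFrame({'Food': ['Tacos', 'Lasagna'], 'Calories': [226, 135]})

# From a list of lists: each inner list is a row, so column names are passed separately.
from_lists = pd.DataFrame([['Tacos', 226], ['Lasagna', 135]],
                          columns=['Food', 'Calories'])

# From a list of dictionaries: each dictionary is one row.
from_records = pd.DataFrame([{'Food': 'Tacos', 'Calories': 226},
                             {'Food': 'Lasagna', 'Calories': 135}])

print(from_dict)
print(from_lists)
print(from_records)
```

All three calls should describe the same two-row table; which form is most convenient depends on how the raw data arrives.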

Let us see how we can obtain a data frame from a CSV file. For this, we need a CSV dataset to be downloaded into our environment.

The data set we are going to use covers a popular anime series, ONE PIECE. It has columns such as episode name, episode rating, episode rank, year of commencement, and so on.

Let us see how we can create a data frame out of this data set.

import pandas as pd

# read_csv already returns a data frame, so no extra conversion is needed
df = pd.read_csv('/ONE PIECE.csv')
df

First of all, we import the Pandas library. Next, the CSV dataset we downloaded is read with the read_csv function of the Pandas library, which returns a data frame that we store in a variable called df.

The last line displays the data frame when the code runs in a notebook. In a script, use print(df) instead.

Data Frame
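The index can also be chosen while the file is being read. The sketch below is an assumption-laden illustration: it uses a small inline CSV with hypothetical episode and rating columns (the actual column names of the ONE PIECE file are not shown in this article) and the index_col parameter of read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; the column names are assumptions.
csv_text = """episode,rating
1,7.6
2,8.2
3,8.4
"""

# index_col tells read_csv which column to use as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col='episode')

print(df.index.name)
print(df.loc[2, 'rating'])
```

With a real file, the io.StringIO object would be replaced by the file path.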

As you can see, the data frame is huge. We don’t need to display all of it to get a sense of its contents. Instead, we can print only the first five rows or the last five rows.

Let us see how we can display selected rows of a data frame.

df.head()

The head method of the data frame returns its first five rows. Similarly, the tail method returns the last five rows.

First Five Rows Of A Data Frame
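Both head and tail accept an optional row count. A minimal sketch, using a made-up ten-row data frame in place of the anime data set:

```python
import pandas as pd

# Hypothetical ten-row data frame used only for illustration.
df = pd.DataFrame({'value': range(10)})

first_three = df.head(3)  # first three rows instead of the default five
last_two = df.tail(2)     # last two rows

print(first_three)
print(last_two)
```

Note that the selected rows keep their original index labels.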

How to Set the Index of a Data Frame?

There are a few ways to name a column of the data frame as its Index. Let us take a look at these methods.

Using a List of Values as Index

For a data frame, we can create a list of elements to be used as the index column.

import pandas as pd
groc = {'Food': ['Tacos','Mac and Cheese','Carbonara','Lasagna','Croissant'],
        'Calories': [226,164,574,135,406]}
index1 = ['Food_A','Food_B','Food_C','Food_D','Food_E']
df = pd.DataFrame(groc, index=index1)
print(df)

First, we are importing the Pandas library as pd. Then, we created a dictionary called groc that contains some food items and their calories respectively.

Next, we created a list called index1 that contains elements in the form of Food_A to Food_E. This list is used as the index for the data frame.

The dictionary is then converted into a data frame with index1 as its index. The last line prints the data frame.

Setting A List As Index

Using the Set Index Method

In the last example, we saw how to add an index to a data frame using values that are not stored in any of its columns. The set_index method of the Pandas library, in contrast, sets a column that already exists in the data frame as the index.

The set_index method can take a single column or multiple existing columns to set an index.

import pandas as pd
sal = {'year': [2016,2017,2018, 2019, 2020, 2021,2022,2023],
        'sales': [120,100, 150, 200, 250,220,270,300]}
df = pd.DataFrame(sal)
df.set_index('year', inplace=True)
print(df)

Like always, we need to first import the Pandas library. Next, a dictionary called sal is created which consists of the details of the year-wise sales of a company.

This dictionary is then used to create a data frame called df. Next, the set_index method is used to set the year column of the data frame as an index.

Lastly, we are printing the data frame.

Using The Set Index Method
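Since set_index also accepts several columns, here is a minimal sketch of that case. The quarter column and its values are invented for illustration, and the example omits inplace=True so that the original data frame stays unchanged.

```python
import pandas as pd

# Hypothetical quarterly sales data.
sales = pd.DataFrame({
    'year': [2022, 2022, 2023, 2023],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales': [60, 70, 80, 90],
})

# Without inplace=True, set_index returns a new data frame.
by_year = sales.set_index('year')

# A list of columns produces a multi-level index.
by_year_quarter = sales.set_index(['year', 'quarter'])

print(by_year_quarter)
print(by_year_quarter.loc[(2023, 'Q1'), 'sales'])
```

A row of the multi-level data frame is then addressed with a tuple of labels, one per index level.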

How to Use the Index as Values of X-Axis?

Matplotlib is a famous library used for the visualization of data. We can also visualize and analyze a data frame using this library.

When we plot a data frame, we can use its index as the values for the X-axis.

Using plt.plot to Set Index as X-Axis Values

After creating a data frame and setting the index, we can plot a graph and use the index column for naming the X-axis values.

import pandas as pd
import matplotlib.pyplot as plt
sal = {'year': [2016,2017,2018, 2019, 2020, 2021,2022,2023],
        'sales': [120,100, 150, 200, 250,220,270,300]}
df = pd.DataFrame(sal)
df.set_index('year', inplace=True)
plt.plot(df.index, df['sales'])
plt.xlabel('Year')
plt.ylabel('Sales')
plt.title('Year-wise sales')
plt.show()

We have taken the same sales data and set the year column as the index. Next, we use the plot function of the matplotlib library. The first argument, df.index, supplies the X-axis values, and the second argument, the sales column of the data frame, supplies the Y-axis values.

The label of the X-axis is set to Year and the label of the Y-axis to Sales. The title of the plot is Year-wise sales.

Then, the show() function of the library is used to display the graph.

Using Plot Method
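The same plot can be sketched with matplotlib's object-oriented interface, which makes it easy to confirm that the line really takes its X values from the index. The Agg backend line is an assumption so that the sketch runs without a display; remove it and call plt.show() to see the figure.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

sal = {'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
       'sales': [120, 100, 150, 200, 250, 220, 270, 300]}
df = pd.DataFrame(sal).set_index('year')

fig, ax = plt.subplots()
# The index supplies the X values, the sales column the Y values.
(line,) = ax.plot(df.index, df['sales'], marker='o')

# Place one tick on every year and reuse the index name as the axis label.
ax.set_xticks(df.index.to_list())
ax.set_xlabel(df.index.name)
ax.set_ylabel('sales')

print(list(line.get_xdata()))
```

Reusing df.index.name for the label keeps the axis label in sync with whichever column was chosen as the index.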

Using the DataFrame.plot Method

Similar to the plot function of the matplotlib library, the Pandas library has a method called DataFrame.plot (written as df.plot here) that is used to visualize a data frame or a series.

Let us see how we can use this function to plot a data frame.

import pandas as pd
import matplotlib.pyplot as plt
groc = {'Food': ['Tacos','Mac and Cheese','Carbonara','Lasagna','Croissant'],
        'Calories': [226,164,574,135,406]}
df = pd.DataFrame(groc)
df.set_index('Food', inplace=True)
df.plot(kind='line', marker='D')
plt.title('Different foods and their calories')
plt.xlabel('Food Item')
plt.ylabel('Calories')
plt.show()

We have used the grocery data frame for this example. The Food column is set as the index of this data frame.

The df.plot method uses the index (the Food column) for the X-axis by default and plots the Calories on the Y-axis as a line plot. The marker argument sets the marker style. In this case, D represents a diamond.

In the next three lines, we specify the title and the labels for the X and Y axes.

The show method is used to display the graph.

Using Df Plot
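The reason df.plot picks up the index is its use_index parameter, which defaults to True. The sketch below contrasts the two settings on the sales data used earlier; the Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

sal = {'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
       'sales': [120, 100, 150, 200, 250, 220, 270, 300]}
df = pd.DataFrame(sal).set_index('year')

# Default behaviour: the index (year) is used for the X-axis.
ax_index = df.plot(kind='line', marker='D')

# use_index=False ignores the index and plots against row positions.
ax_position = df.plot(kind='line', marker='D', use_index=False)

x_from_index = list(ax_index.lines[0].get_xdata())
x_from_position = list(ax_position.lines[0].get_xdata())
print(x_from_index)
print(x_from_position)
```

Comparing the two printed lists shows whether the X values come from the index labels or from the row positions.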

Using the Xticks to Set the Index

The xticks function of the matplotlib library gets or sets the tick locations and labels of the X-axis. Setting the ticks explicitly makes the X-axis easier to read and helps in understanding the scale of the plot.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10]},
                  index=pd.date_range('2022-03-04', '2022-03-08'))
df.plot(y=['A', 'B'], kind='line', marker='x')
plt.xticks(ticks=df.index, labels=df.index.strftime('%Y-%m-%d'))
plt.title('Using xticks')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

In this code, we create a data frame of sample values whose index is a range of dates from the fourth of March 2022 to the eighth of March 2022.

Then, df.plot is used to plot the two columns of the data frame on the Y-axis, and xticks places a tick at each index value and labels it with the corresponding date. The dates are converted into strings using the strftime method.

The title of the graph is given as ‘Using xticks’ and the labels of the X and Y axes are given as Date and Value respectively.

Lastly, the show method is used to display the graph.

Using Xticks To Set Index
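Date labels tend to overlap when there are many of them. One possible remedy, sketched below under the assumption that plt.plot is used instead of df.plot, is to pass a rotation to xticks. The Agg backend line is only there so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10]},
                  index=pd.date_range('2022-03-04', '2022-03-08'))

plt.plot(df.index, df['A'], marker='x', label='A')
plt.plot(df.index, df['B'], marker='x', label='B')

# Format each index value as a string and rotate the labels by 45 degrees.
tick_labels = df.index.strftime('%Y-%m-%d').tolist()
plt.xticks(ticks=df.index.to_pydatetime(), labels=tick_labels, rotation=45)

plt.legend()
plt.tight_layout()
ax = plt.gca()
print(tick_labels)
```

The rotation keyword is forwarded to the tick label text objects, so other text properties can be passed the same way.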

That’s a Wrap!

To conclude, we have discussed what a data frame is and how we can create one from a CSV file. The CSV file used here is an anime data set with columns such as the episode name, its rating, its rank, and so on.

Next, we learned how to set the index of a data frame in two different ways. The first way sets the index using a separate list of elements.

The second way uses the set_index method, which turns a column that already exists in the data frame into the index.

Coming to the main point of this post, we have seen three approaches to using the index as the values for the X-axis when we plot a data frame with matplotlib.

The first approach uses plt.plot, passing the index of the data frame as the X-axis values and another column as the Y-axis values.

The second approach uses df.plot, which is primarily used to plot a data frame or a series and uses the index for the X-axis by default.

Lastly, we used the xticks function of matplotlib to place a tick at each index value, which makes the scale of the plot easier to understand.
