In this article, we’ll talk about Pivot Tables in Python. We’ll implement the same using the pivot_table function in the Pandas module.
What is a Pivot Table?
Pivot Tables are a key feature of Microsoft Excel and one of the reasons that made excel so popular in the corporate world.
Pivot tables provide great flexibility to perform analysis of the data.
It allows us to transform columns into rows and rows into columns. As it allows grouping based on any parameter, it can be easily confused with pandas Groupby function.
Think of pivot tables as a multidimensional version of
Steps to Implement Pivot Tables in Python
Let’s get right into the implementation of Pivot tables in Python.
1. Load a dataset of your choice
We’ll be analyzing the Pokémon with stats dataset (Downloaded from Kaggle). You can get the dataset here.
Irrespective of whether you are fond of pokemon or not you can still get 100% from this article.
Gotta catch ‘em all… 🙂
import pandas as pd #importing the dataset data = pd.read_csv('Pokemon.csv') data.head()
2. Syntax of pivot_table() method
DataFrame.pivot_table(data, values=None, index=None,columns=None, aggfunc='mean')
pivot_table method on a dataframe, let’s breakdown the essential input arguments given to the method.
- data – it is the numerical column on which we apply the aggregation function. for example, sales, speed, price, etc.
- index – The columns which we want to convert as rows. we can pass multiple values as a list.
- columns – Column variables which we now wish to keep as Column.
- aggfunc – type of operation to perform on data. i.e. sum, mean, count, etc.
It will be a lot clearer with an Example.
3. Implementing pivot_tables in Python
Let’s say we need to find the average Speed of Pokémons belonging to Type-1.
It can be easily done using pandas
Groupby, but the same output can be achieved easily using
pivot_table with a much cleaner code.
With each example, we’ll slowly explore pivot_table in its full glory.
Similar result using
df.pivot_table('Speed' , index = ['Type 1'], aggfunc = 'mean')
4. Find the average speed using Pivot tables
But now if we were asked to find the average speed of each Type 1 Pokémon along with segregating them into Legendary and Non-Legendary.
df.pivot_table('Speed' , index = ['Type 1'] , columns= ['Legendary'] , aggfunc = 'mean')
Tada! With a single line of code we were able to achieve the desired results.
I couldn’t help but notice that on average The Legendary Pokémons were faster than the non-Legendary ones. Magic of pivot tables.
An important thing to understand here is that we needed the Type 1 column as our index so we passed it to the index argument in the pivot_table method.
But now, as each Type 1 Pokemon belonged to either Legendary category or Non-Legendary, we just transformed our data frame in a way so that it now shows an average speed of each type.
Now we have Legendary or Non-Legendary as features in our data.
As we needed ‘Legendary’ feature in our data as the column of new data, we simply needed to pass it in the
columns parameter. Let’s modify our dataset a bit and do some new analysis.
We now create bins of Speed variable as a new added feature.
df['speed_range'] = pd.cut(df['Speed'] , [0,40,60,80,120] , include_lowest=True)
Now let’s output a dataframe which shows average speed as based on Legendary feature and the Speed range feature.
df.pivot_table('Speed' , index = ['Type 1'] , columns= ['speed_range','Legendary'] , aggfunc = 'mean')
I encourage you to try pandas
pivot_table on the Titanic dataset as it would complement your practice of the topic.
Pivot tables in Python are not an easy topic to master but given the benefits it has to offer one must definitely include this knowledge in their analysis toolkit. Remember Practice is the Key here!
if this article we tried to get some idea on Pandas
pivot_table by implementing it on a dataset. We realized it to be a multidimensional version of groupby operation (Conceptually speaking). Happy Learning!