Sometimes we want to generate quick insights on only a portion of data from large datasets. Pivot table in Python comes in handy in such situations and helps us to extract the data we need and generate insights from the new data. Today we will be using the pivot table function present in Pandas. This article will give you a comprehensive overview of how you can pivot table one dataset to another in python.
Also read: How to insert multiple records in Sqlite3 database
What is a pivot table and how do you make a pivot table?
A pivot table is a table that helps in extracting data from a larger table or a dataset. In other words, we “pivot” data from a larger dataset. Let’s have a look at the syntax of a pivot table:
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None)
The pivot table function will return a dataframe. Now that we know the basics of a pivot table, lets’ start pivoting!
Load the Dataset
Let’s start by loading the dataset in our environment.
import pandas as pd
df= pd.read_csv("C://Users//Intel//Documents//covid-variants.csv")
df.head()

Now we will start building a pivot table.
Generating a Pivot Table in Python
num_sequences_by_variant = pd.pivot_table(df, index = 'variant', values = 'num_sequences',aggfunc='mean')
print(num_sequences_by_variant)
num_sequences
variant
Alpha 270.696702
B.1.1.277 0.282744
B.1.1.302 0.116157
B.1.1.519 5.455306
B.1.160 8.130736
B.1.177 40.740201
B.1.221 3.675191
B.1.258 7.358270
B.1.367 0.229685
B.1.620 0.242830
Beta 9.683078
Delta 916.371893
Epsilon 15.804732
Eta 1.654876
Gamma 27.522945
Iota 10.254541
Kappa 1.787046
Lambda 2.249283
Mu 3.405354
Omicron 27.614245
S:677H.Robin1 1.564771
S:677P.Pelican 1.156071
non_who 222.537763
others 153.585803
As you can see, we created a pivot table with the variables num_sequences and variants. As we have passed the aggfunc function as mean, the num_sequences is actually the average of all the num_sequences.
If you want the sum of all num_sequences, you can pass the sum to the aggfunc parameter.
total_by_variant = pd.pivot_table(df, index = 'variant', values = 'num_sequences',aggfunc='sum')
print(total_by_variant)
Output
num_sequences
variant
Alpha 1132595
B.1.1.277 1183
B.1.1.302 486
B.1.1.519 22825
B.1.160 34019
B.1.177 170457
B.1.221 15377
B.1.258 30787
B.1.367 961
B.1.620 1016
Beta 40514
Delta 3834100
Epsilon 66127
Eta 6924
Gamma 115156
Iota 42905
Kappa 7477
Lambda 9411
Mu 14248
Omicron 115538
S:677H.Robin1 6547
S:677P.Pelican 4837
non_who 931098
others 642603
Setting the Index for the Pivot Table in Python
We can very simply reset the index of the above dataframe with the help of the following command.
num_sequences_by_variant.reset_index()

Selecting/Filtering Data from a Pivot Table in Python
Now suppose, we want to look at the variants with the total number of sequences greater than 30000 we have to pass the following command.
total_by_variant[total_by_variant['num_sequences'] > 30000]

Now again, let’s suppose we want to see the variants having a number of sequences only between 30000 and 70000. Then we have to pass the following command.
total_by_variant[(total_by_variant['num_sequences'] > 30000) | (total_by_variant['num_sequences'] < 70000)]

Now let’s jump into learning about multi-index pivot tables.
Multi-Index Pivot Table in Python
Till now, we have only learned about pivot tabes with a single index. But we can also form pivot tables with multiple indices and that can help you generate more insights from your data.
In the following table, we have set variant type and date as our indices.
multi_index_table= pd.pivot_table(df, index = ['variant', 'date'], values = 'num_sequences', aggfunc = 'sum')
print(multi_index_table)
Output
num_sequences
variant date
Alpha 2020-05-11 0
2020-05-25 0
2020-06-08 0
2020-06-22 0
2020-07-06 0
... ...
others 2021-11-15 928
2021-11-29 928
2021-12-13 619
2021-12-27 603
2022-01-05 2
[1080 rows x 1 columns]
Multi-Index Pivot Table with Multiple Functions
Apart from setting multiple indices, we can also set multiple functions to the aggfunc parameter for our pivot table.
For example:
multi_functions_table = pd.pivot_table(df, index = ['variant', 'date'], values = 'num_sequences', aggfunc = ['sum',len])
print(multi_functions_table)
Output
sum len
num_sequences num_sequences
variant date
Alpha 2020-05-11 0 78
2020-05-25 0 72
2020-06-08 0 78
2020-06-22 0 79
2020-07-06 0 74
... ... ...
others 2021-11-15 928 89
2021-11-29 928 85
2021-12-13 619 86
2021-12-27 603 64
2022-01-05 2 12
[1080 rows x 2 columns]
Adding New Columns To The Pivot Table in Python
In this step, we will add the variant feature as a column and set the date as the index. This will make your data more organized and readable.
add_columns=pd.pivot_table(df, index = 'date', columns = 'variant', values = 'num_sequences', aggfunc = 'sum')
print(add_columns)
Output
variant Alpha B.1.1.277 B.1.1.302 B.1.1.519 B.1.160 B.1.177 \
date
2020-05-11 0 0 0 0 0 2
2020-05-25 0 0 0 0 0 0
2020-06-08 0 0 0 0 0 2
2020-06-22 0 0 0 0 0 8
2020-07-06 0 0 0 0 0 72
2020-07-20 0 7 3 0 8 88
2020-08-03 0 12 8 0 22 172
2020-08-17 0 43 12 0 89 499
2020-08-31 0 94 24 0 386 1400
2020-09-14 0 143 32 0 1017 2923
2020-09-28 3 178 27 0 1049 6225
2020-10-12 15 120 31 0 1188 8693
2020-10-26 47 155 21 1 2017 13008
2020-11-09 381 180 21 5 2773 15723
2020-11-23 1450 72 27 17 2274 15622
2020-12-07 1128 42 29 34 2029 10489
2020-12-21 9061 28 42 82 3110 19350
2021-01-04 14002 29 21 184 2934 14161
Visuzaling Your Pivot Table in Python
In this section, we will create bar plots for our table. This will help you visualize the data more clearly.
add_columns.plot(kind='bar')

Although the plot looks very clumsy, as we always say, you can use the matplotlib library or Seaborn library in Python to make the plots more presentable.
Working with missing values
In this section, we will learn how to handle missing values in your pivot table data. First, let’s check if there are any missing values present in the dataset.
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100416 entries, 0 to 100415
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location 100416 non-null object
1 date 100416 non-null object
2 variant 100416 non-null object
3 num_sequences 100416 non-null int64
4 perc_sequences 100416 non-null float64
5 num_sequences_total 100416 non-null int64
dtypes: float64(1), int64(2), object(3)
memory usage: 4.6+ MB
None
As we can see, we do not have any missing values in our dataset, so we do not need to do anything with our data.
In case you have any missing values in your data, you can pass the fill_value argument as N/A in the pivot table function.
Conclusion
In this article, we learned how to generate pivot tables from a given dataset and transform it into another dataframe. Pivot tables will really come in handy for you when you are analyzing any dataset with a large number of features or variables. Learning all the above pivot table functions by heart will help you a lot in your data analysis projects.