Pandas pivot – Return reshaped DataFrame


In this article, we look at the pivot() method, one of the general functions of the Pandas package. Pandas is an open-source Python library for data analysis and manipulation. Its name is derived from "panel data" and is also a play on "Python data analysis." It makes working with time series and tabular data simpler by providing built-in general functions and data types. It handles large and messy real-world datasets well, which makes it a valuable tool for data scientists and data analysts.


Why is the Pandas pivot() function used?

This function reshapes data based on column values (it creates a "pivot" table). The axes of the resulting DataFrame are formed from the unique values of the columns passed as index and columns. The function does not support data aggregation, and when more than one values column is used, the result has a MultiIndex in the columns. The function returns a new DataFrame reshaped according to the parameters passed.

When the data needs to be aggregated, for example because the same index/columns pair occurs in more than one row, use the pivot_table() function instead of pivot(). pivot_table() is a generalization of pivot() that can handle such duplicate pairs.
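As a rough illustration of the difference, the sketch below uses a small made-up DataFrame in which one Food/Name pair occurs twice. pivot() could not reshape this data, while pivot_table() combines the repeated pair with an aggregation function (the mean is chosen here purely as an example).

```python
import pandas as pd

# Illustrative data: the ('Salad', 'Lewis') pair appears in two rows.
df = pd.DataFrame({
    'Food': ['Salad', 'Salad', 'Burger', 'Burger'],
    'Name': ['Lewis', 'Lewis', 'Max', 'Carlos'],
    'Bill': [240, 360, 190, 190],
})

# pivot_table() aggregates rows that share the same index/columns pair.
# aggfunc='mean' is an arbitrary choice for this sketch.
table = pd.pivot_table(df, index='Food', columns='Name',
                       values='Bill', aggfunc='mean')
print(table)
```

Here the two Salad/Lewis bills (240 and 360) are averaged into a single cell, and pairs that never occur in the data are left as NaN.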

Syntax of Pandas pivot()

pandas.pivot(data, index=None, columns=None, values=None)
  • data: DataFrame
    • The DataFrame to be reshaped.
  • index: string or object or a list of strings, optional
    • Column(s) used to build the index of the new frame. If not given, the existing index is used.
  • columns: string or object or a list of strings
    • Column(s) used to build the columns of the new frame.
  • values: string, object, or a list of the previous, optional
    • Column(s) used to fill the values of the new frame. If not given, all remaining columns are used and the result has hierarchically indexed columns.

Since version 1.1.0, both the index and the columns parameters also accept a list of column names.
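A minimal sketch of passing a list to index, assuming pandas 1.1.0 or later. The 'Day' column and the sample values are made up for this illustration.

```python
import pandas as pd

# Hypothetical data with an extra 'Day' column for a two-level index.
df = pd.DataFrame({
    'Day': ['Mon', 'Mon', 'Tue', 'Tue'],
    'Food': ['Salad', 'Pasta', 'Salad', 'Pasta'],
    'Name': ['Lewis', 'Max', 'Lewis', 'Max'],
    'Bill': [240, 360, 230, 190],
})

# A list passed to index produces a MultiIndex on the rows.
table = pd.pivot(df, index=['Day', 'Food'], columns='Name', values='Bill')
print(table)
```

Each row of the result is identified by a (Day, Food) pair. A list passed to columns works the same way and produces a MultiIndex on the columns instead.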

Note: If any combination of index and columns values occurs more than once in the data, pivot() raises a ValueError (see Example 3). Repeated values in the index column alone are not a problem, as long as each index/columns pair is unique.

Implementing Pandas pivot()

Make sure the Pandas package is installed in your Python environment (for example with pip install pandas), and import it before using the method:

import pandas as pd

Example 1: Passing the index parameter.

# Create the DataFrame
df = pd.DataFrame({'Food': ['Salad','Pasta','Burger','Burger','Salad'],
                   'Name': ['Lewis','Daniel','Max','Daniel','Max'],
                   'Bill': [240, 360, 190, 190, 230],
                   'Rating': [9,8,7,8,9]
                   })

table = pd.pivot(df, index='Food', columns='Name')
print(table)

Note: In the above example the 'values' parameter is not passed, so all remaining columns ('Bill' and 'Rating') appear in the output, under hierarchically indexed columns. Cells for Food/Name pairs that do not occur in the data are filled with NaN.
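To work with the hierarchically indexed result of Example 1, one option is to select by the top column level. The sketch below rebuilds the same DataFrame so that it runs on its own; the variable name bills is just a label chosen here.

```python
import pandas as pd

df = pd.DataFrame({'Food': ['Salad', 'Pasta', 'Burger', 'Burger', 'Salad'],
                   'Name': ['Lewis', 'Daniel', 'Max', 'Daniel', 'Max'],
                   'Bill': [240, 360, 190, 190, 230],
                   'Rating': [9, 8, 7, 8, 9]})

table = pd.pivot(df, index='Food', columns='Name')

# The columns have two levels: the value column, then the 'Name' values.
print(table.columns.nlevels)

# Selecting the top level returns an ordinary single-level table.
bills = table['Bill']
print(bills)

# A (value column, name) tuple selects a single column as a Series.
max_ratings = table[('Rating', 'Max')]
print(max_ratings)
```

Selecting table['Bill'] gives the same layout that Example 2 produces directly with values='Bill'.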

Example 2: Passing the values parameter

# Create the DataFrame
df = pd.DataFrame({'Food': ['Salad','Pasta','Burger','Burger','Salad'],
                   'Name': ['Lewis','Daniel','Max','Daniel','Max'],
                   'Bill': [240, 360, 190, 190, 230],
                   'Rating': [9,8,7,8,9]
                   })

table = pd.pivot(df, index='Food', columns='Name', values='Bill')
print(table)

Note: In the above example only the column passed in the 'values' parameter ('Bill') is present in the output.
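The values parameter also accepts a list. As a small sketch, the DataFrame below adds a made-up 'Tip' column to the example data to show that only the listed columns end up in the result.

```python
import pandas as pd

# 'Tip' is a hypothetical extra column added for this illustration.
df = pd.DataFrame({'Food': ['Salad', 'Pasta', 'Burger', 'Burger', 'Salad'],
                   'Name': ['Lewis', 'Daniel', 'Max', 'Daniel', 'Max'],
                   'Bill': [240, 360, 190, 190, 230],
                   'Rating': [9, 8, 7, 8, 9],
                   'Tip': [20, 30, 15, 15, 25]})

# Only 'Bill' and 'Rating' are pivoted; 'Tip' is left out.
table = pd.pivot(df, index='Food', columns='Name', values=['Bill', 'Rating'])
print(table)
```

Because more than one values column is requested, the result again has a MultiIndex in the columns, with 'Bill' and 'Rating' as the top level.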

Example 3: Passing Duplicate input

# Create the DataFrame
df = pd.DataFrame({'Food': ['Salad','Salad','Burger','Burger'],
                   'Name': ['Lewis','Lewis','Max','Carlos'],
                   'Bill': [240, 360, 190, 190],
                   })
# Raises ValueError: the ('Salad', 'Lewis') pair appears in two rows
table = pd.pivot(df, index='Food', columns='Name', values='Bill')
print(table)

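Note: The first two rows have exactly the same 'Food' and 'Name' values ('Salad' and 'Lewis'), so pivot() would have to place two different bills (240 and 360) in the same cell. Since it cannot aggregate, it raises a ValueError.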
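One possible way to guard against this error is to look for repeated Food/Name pairs before pivoting, or to catch the exception. The sketch below shows both; the variable names dupes and raised are chosen for this illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Food': ['Salad', 'Salad', 'Burger', 'Burger'],
                   'Name': ['Lewis', 'Lewis', 'Max', 'Carlos'],
                   'Bill': [240, 360, 190, 190]})

# Rows whose (Food, Name) pair occurs more than once.
# keep=False marks every member of a duplicated group, not just the repeats.
dupes = df[df.duplicated(subset=['Food', 'Name'], keep=False)]
print(dupes)

# Alternatively, attempt the pivot and handle the failure.
try:
    table = pd.pivot(df, index='Food', columns='Name', values='Bill')
    raised = False
except ValueError as err:
    raised = True
    print('pivot() failed:', err)
```

If dupes is not empty, the data has to be deduplicated or aggregated (for instance with pivot_table()) before it can be reshaped.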

Summary

In conclusion, pivot() is a useful general function of the Pandas package that reshapes a DataFrame according to the index, columns, and values parameters. It is a quick way to rearrange part of a larger dataset into a table that is easier to read, without aggregating the data.

The main case in which pivot() fails is when the same combination of index and columns values appears in more than one row. In that case, use pivot_table() with an aggregation function instead.

Reference

https://pandas.pydata.org/docs/reference/api/pandas.pivot.html