Understanding Pandas groupby() function


Hey, folks! In this article, we will take a look at the Pandas groupby() function and the different functionality it offers.


What is the groupby() function?

The Python Pandas module is used extensively for data pre-processing and goes hand in hand with data visualization.

The Pandas module has various built-in functions for handling data efficiently. The dataframe.groupby() function splits the rows of a dataset into groups based on the values in one or more columns (the grouping keys). Each group can then be inspected on its own or summarized with an aggregation such as mean() or count().
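
Grouping is usually followed by an aggregation, a pattern the pandas documentation calls split-apply-combine. The following is a minimal sketch on a small hypothetical DataFrame: the 'age' column and every value are made up for illustration and are not taken from marketing_tr.csv.

```python
import pandas as pd

# Hypothetical stand-in for marketing_tr.csv: two columns, made-up values.
data = pd.DataFrame({
    "marital": ["single", "married", "married", "divorced", "single"],
    "age": [25, 40, 38, 50, 31],
})

# Split the rows by 'marital', apply mean() to 'age' within each group,
# and combine the per-group results into one Series indexed by the group keys.
mean_age = data.groupby("marital")["age"].mean()
print(mean_age)
```

By default the group keys are sorted, so the result lists divorced, married and single in that order.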

Syntax:

dataframe.groupby('column-name')

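Using the above syntax, we can split the dataset into groups: all rows that share the same value in the passed column end up in the same group. The call returns a GroupBy object rather than a new DataFrame; we see the groups only once we call a method such as first(), get_group() or an aggregation on it.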
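
Iterating over the GroupBy object makes the splitting step visible, since each iteration yields a (group key, sub-DataFrame) pair. A minimal sketch, again assuming a small hypothetical DataFrame in place of the real CSV file:

```python
import pandas as pd

# Hypothetical sample data; only the 'marital' column name comes from the article.
data = pd.DataFrame({
    "marital": ["single", "married", "married", "divorced", "single"],
    "age": [25, 40, 38, 50, 31],
})

# Each iteration yields the group key and the rows belonging to that group.
group_sizes = {}
for key, group in data.groupby("marital"):
    group_sizes[key] = len(group)
    print(key, len(group))
```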

Input Dataset:

[Image: input dataset]

Example:

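import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
# Group the rows by the unique values of the 'marital' column
data_grp = data.groupby('marital')
# Show the first non-null value of every column within each group
print(data_grp.first())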

In the above example, we have used the groupby() function to group the rows by the values of the column 'marital'. Calling first() on the grouped object then returns one row per marital status, containing the first non-null value of every other column within that group.

Output:

[Image: output of data.groupby('marital').first()]
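
Because first() skips missing values, the row it reports for a group can mix values from different original rows. A minimal sketch showing this, assuming hypothetical data with a deliberately missing 'age' (the 'age' column and all values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a NaN to show how first() treats missing values.
data = pd.DataFrame({
    "marital": ["married", "married", "single"],
    "age": [np.nan, 38, 25],
    "schooling": ["basic", "university", "high school"],
})

# first() takes, per group and per column, the first non-null value.
# The first 'married' row has no age, so 38.0 comes from the next row,
# while 'schooling' still comes from the first 'married' row.
firsts = data.groupby("marital").first()
print(firsts)
```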

Pandas groupby() function with multiple columns

We can also group the data by several columns at once by passing a list of column names to dataframe.groupby(). Each group then corresponds to one unique combination of values across those columns.

Syntax:

dataframe.groupby(['column1', 'column2', ..., 'columnN'])

Example:

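import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
# Group by every unique combination of 'marital' and 'schooling'
data_grp = data.groupby(['marital','schooling'])
# One row per (marital, schooling) pair, indexed by a MultiIndex
print(data_grp.first())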

Output:

[Image: grouping multiple columns using groupby()]
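
With several grouping columns, each group is identified by a tuple of values, and aggregated results carry a MultiIndex. A minimal sketch, assuming hypothetical rows (the column names follow the article, the values are made up):

```python
import pandas as pd

# Hypothetical sample rows standing in for marketing_tr.csv.
data = pd.DataFrame({
    "marital": ["married", "married", "single", "married"],
    "schooling": ["university", "basic", "university", "university"],
    "age": [40, 52, 27, 35],
})

grouped = data.groupby(["marital", "schooling"])

# size() counts rows per (marital, schooling) combination.
sizes = grouped.size()
print(sizes)

# Selecting one combination requires the full tuple of key values.
married_uni = grouped.get_group(("married", "university"))
print(married_uni)
```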

Pandas groupby() function to view groups

Apart from splitting the data according to a specific column value, we can also view the details of every group formed from the categories of a column using the dataframe.groupby().groups attribute.

Here’s a snapshot of the sample dataset used in this example:

[Image: sample rows of marketing_tr.csv]

Syntax:

dataframe.groupby('column').groups

Example:

import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
# .groups is an attribute (not a method): a dict of group key -> row index labels
data_grp = data.groupby('marital').groups
print(data_grp)

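As seen above, we have grouped the data by the values of the column 'marital'.

The .groups attribute then returns a dictionary whose keys are the categories present in that column.

For each category, the value is an index of the row labels belonging to that group, printed with its data type and, for longer groups, its length (the number of rows). Since the CSV file is read with the default integer index, these labels coincide with the row positions in the original dataset. The output below comes from an older pandas release; pandas 2.0 and later print Index([...], dtype='int64') instead of Int64Index.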

Output:

{'divorced': Int64Index([   3,    4,   33,   34,   63,   66,   73,   77,   98,  112,
             ...
             7284, 7298, 7300, 7326, 7349, 7362, 7365, 7375, 7391, 7412],
            dtype='int64', length=843),
 'married': Int64Index([   1,    2,    5,    7,    8,    9,   10,   11,   13,   14,
             ...
             7399, 7400, 7403, 7404, 7405, 7406, 7407, 7408, 7410, 7413],
            dtype='int64', length=4445),
 'single': Int64Index([   0,    6,   12,   16,   18,   19,   24,   29,   31,   32,
             ...
             7383, 7385, 7386, 7390, 7397, 7398, 7401, 7402, 7409, 7411],
            dtype='int64', length=2118),
 'unknown': Int64Index([2607, 4770, 4975, 5525, 5599, 5613, 6754, 7221], dtype='int64')}
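
Labels and positions only coincide under the default index. A minimal sketch that separates them, assuming a hypothetical DataFrame with a custom index: .groups returns index labels, .indices returns integer positions, and size() gives the number of rows per group.

```python
import pandas as pd

# Hypothetical data with a non-default index so labels differ from positions.
data = pd.DataFrame(
    {"marital": ["single", "married", "single"]},
    index=[10, 20, 30],
)

grouped = data.groupby("marital")

# .groups maps each key to the index labels of its rows.
print(grouped.groups)
# .indices maps each key to the integer positions of its rows.
print(grouped.indices)
# size() reports how many rows fall into each group.
print(grouped.size())
```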

Selecting a group using Pandas groupby() function

So far, we have seen how to get an overview of the groups formed from the unique values of a column, along with their details.

Using dataframe.groupby('column').get_group('column-value'), we can display the rows belonging to one particular category of the grouped column.

Syntax:

dataframe.groupby('column').get_group('column-value')

Example:

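import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
data_grp = data.groupby('marital')
# Select only the rows whose 'marital' value is 'divorced'
df = data_grp.get_group('divorced')
# Show the first five rows of that group
print(df.head())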

In the above example, we have displayed the first five rows whose value in the column 'marital' is 'divorced'.

Output:

[Image: selecting a group using get_group()]
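
For a single grouping column, get_group() selects the same rows as a plain boolean filter, keeping their original index labels. A minimal sketch, assuming a small hypothetical DataFrame (the 'age' column and all values are made up):

```python
import pandas as pd

# Hypothetical sample data standing in for marketing_tr.csv.
data = pd.DataFrame({
    "marital": ["single", "divorced", "married", "divorced"],
    "age": [25, 50, 38, 47],
})

# get_group() returns the rows of one group with their original index labels.
divorced = data.groupby("marital").get_group("divorced")

# The equivalent boolean filter selects the same rows.
same_rows = data[data["marital"] == "divorced"]
print(divorced)
print(divorced.equals(same_rows))
```

The boolean filter is enough for a one-off selection; get_group() is convenient when the GroupBy object already exists and several groups are inspected in turn.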

Conclusion

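Thus, in this article, we have looked at how the Pandas groupby() function splits data into groups, how to group by multiple columns, how to inspect the groups with .groups, and how to select a single group with get_group().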

