Hey, folks! In this article, we will look at the Pandas groupby() function and the different functionality it offers.
What is the groupby() function?
The Python Pandas module is widely used for data preprocessing and goes hand in hand with data visualization.
Pandas has various built-in functions for working with data efficiently. The dataframe.groupby() function splits a dataset into groups based on the values of one or more columns, so that each group can be inspected or processed separately.
Syntax:
dataframe.groupby('column-name')
Using the above syntax, we can split the dataset into groups, one for each unique value in the column passed as an argument to the function.
Input Dataset:

Example:
import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
data_grp = data.groupby('marital')
data_grp.first()
In the above example, we have used the groupby() function to split the data into groups, one per unique value of the column ‘marital’. Calling first() then returns a new data frame containing the first entry of each group.
Output:

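Since marketing_tr.csv is not reproduced here, the following minimal sketch uses a small made-up data frame to show what groupby('marital') followed by first() returns. The column names mirror the article's example, but the values are assumptions chosen purely for illustration.

```python
import pandas

# Hypothetical stand-in for the marketing dataset; all values are made up.
data = pandas.DataFrame({
    "marital": ["single", "married", "married", "divorced", "single"],
    "schooling": ["university", "high school", "university", "high school", "high school"],
    "age": [29, 41, 35, 52, 23],
})

# groupby() returns a GroupBy object; results are produced only when
# a method such as first() is called on it.
data_grp = data.groupby("marital")

# first() yields one row per group, indexed by the group label and holding
# the first non-null value of every other column.
first_rows = data_grp.first()
print(first_rows)
```

The result is indexed by the sorted group labels (divorced, married, single), which is why the ‘marital’ column appears as the index in the output rather than as a regular column.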
Pandas groupby() function with multiple columns
The Pandas dataframe.groupby() function can also split data by the values of multiple columns. We pass a list of column names as the argument, and a group is formed for each combination of values found in those columns.
Syntax:
dataframe.groupby(['column1', 'column2', ...., 'columnN'])
Example:
import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
data_grp = data.groupby(['marital','schooling'])
data_grp.first()
Output:

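As a rough illustration of grouping by two columns, the sketch below again uses a small made-up data frame in place of the marketing dataset (the values are assumptions). It shows that each distinct (marital, schooling) pair becomes its own group and that the result carries a two-level index.

```python
import pandas

# Hypothetical stand-in for the marketing dataset; all values are made up.
data = pandas.DataFrame({
    "marital": ["single", "married", "married", "married", "single"],
    "schooling": ["university", "high school", "university", "high school", "university"],
    "age": [29, 41, 35, 52, 23],
})

# Passing a list of column names groups by every combination of their values.
data_grp = data.groupby(["marital", "schooling"])

# The result has a two-level index: one level per grouping column.
first_rows = data_grp.first()
print(first_rows)
```

Only combinations that actually occur in the data form a group, so this sample produces three rows rather than every possible pairing of the two columns.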
Pandas groupby() function to view groups
Apart from splitting the data according to a specific column value, we can also view the details of every group formed from the categories of a column using the dataframe.groupby().groups attribute.
Here’s a snapshot of the sample dataset used in this example:

Syntax:
dataframe.groupby('column').groups
Example:
import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
data_grp = data.groupby('marital').groups
data_grp
As seen above, we have grouped the data by the values of the column ‘marital’.
The groupby().groups attribute returns a dictionary whose keys are the categories present in that column.
Each key maps to the index labels of the rows that belong to that category in the original dataset, displayed along with the data type and, for long indexes, the number of values.
Output:
{'divorced': Int64Index([ 3, 4, 33, 34, 63, 66, 73, 77, 98, 112,
...
7284, 7298, 7300, 7326, 7349, 7362, 7365, 7375, 7391, 7412],
dtype='int64', length=843),
'married': Int64Index([ 1, 2, 5, 7, 8, 9, 10, 11, 13, 14,
...
7399, 7400, 7403, 7404, 7405, 7406, 7407, 7408, 7410, 7413],
dtype='int64', length=4445),
'single': Int64Index([ 0, 6, 12, 16, 18, 19, 24, 29, 31, 32,
...
7383, 7385, 7386, 7390, 7397, 7398, 7401, 7402, 7409, 7411],
dtype='int64', length=2118),
'unknown': Int64Index([2607, 4770, 4975, 5525, 5599, 5613, 6754, 7221], dtype='int64')}
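To make the structure of .groups easier to see, here is a minimal sketch on a small made-up data frame (the values are assumptions, not taken from the marketing dataset). It treats .groups as a mapping from each category to the row labels in that category. Note that recent pandas versions print these row labels as a plain Index rather than the Int64Index shown above.

```python
import pandas

# Hypothetical stand-in for the marketing dataset; all values are made up.
data = pandas.DataFrame({
    "marital": ["single", "married", "married", "divorced", "single"],
    "age": [29, 41, 35, 52, 23],
})

# .groups is an attribute (not a method): a dict-like mapping of
# group label -> index labels of the rows in that group.
groups = data.groupby("marital").groups

for label, row_labels in groups.items():
    print(label, list(row_labels))
```

Because the values are index labels, they can be passed to data.loc[...] to pull out the matching rows from the original data frame.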
Selecting a group using Pandas groupby() function
So far, we have seen how to view the categories formed from the unique values of a column, along with their details.
Using dataframe.groupby('column').get_group('column-value'), we can display the rows belonging to a particular category of the grouped column.
Syntax:
dataframe.groupby('column').get_group('column-value')
Example:
import pandas
data = pandas.read_csv("C:/marketing_tr.csv")
data_grp = data.groupby('marital')
df = data_grp.get_group('divorced')
df.head()
In the above example, we have displayed the rows whose value in the column ‘marital’ is ‘divorced’.
Output:

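The same selection can be sketched on a small made-up data frame (the values are assumptions for illustration). It shows that get_group() is called on the grouped object and returns an ordinary data frame containing only the rows of the requested category.

```python
import pandas

# Hypothetical stand-in for the marketing dataset; all values are made up.
data = pandas.DataFrame({
    "marital": ["single", "divorced", "married", "divorced", "single"],
    "age": [29, 41, 35, 52, 23],
})

data_grp = data.groupby("marital")

# get_group() returns the rows of one group, keeping their original index labels.
divorced = data_grp.get_group("divorced")
print(divorced.head())
```

Asking for a category that does not exist in the column raises a KeyError, so it can be worth checking the keys of data_grp.groups first when the category name comes from user input.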
Conclusion
Thus, in this article, we have seen how the Pandas groupby() function works: grouping by one or more columns, viewing the groups formed, and selecting a single group.