Pandas Dataframe Mean – How to Calculate the Mean?

In this article, we’ll calculate the DataFrame mean in Python pandas. Python is widely used for data analysis and processing, often on large, unstructured datasets. To get meaningful information from such data, we use statistical measures such as the mean, median, and mode. These measures summarize the data and help us classify and model it effectively.

What is Mean?

The mean is the average value of a dataset. More precisely, the arithmetic mean (also called the arithmetic average) is a central value of a finite set of numbers: the sum of the values divided by the number of values. The mean is given by the formula:

A = \frac{1}{n} \sum_{i=1}^{n} a_i

where
A = arithmetic mean
n = number of values
a_i = data set values
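
As a quick illustration of the formula, the sketch below computes the mean by hand and compares it with pandas. The list `values` and the variable names are chosen here for illustration only.

```python
import pandas as pd

values = [4, 3, 4, 2]

# Arithmetic mean by hand: sum of the values divided by how many there are.
manual_mean = sum(values) / len(values)

# The same calculation using a pandas Series.
pandas_mean = pd.Series(values).mean()

print(manual_mean)   # 3.25
print(pandas_mean)   # 3.25
```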

Dataframe Mean in Pandas

Pandas has a built-in mean function that can be used on DataFrame objects. To use it, we need to import the pandas library in our code. Let us now look at the basic syntax and properties of the mean function.

pandas.DataFrame.mean

When applied to a Series, the mean function returns the mean of that Series as a single value. When applied to a DataFrame, it returns a Series holding the mean of each column (with the default axis).
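
The difference between the two return types can be seen in a small sketch. The column names `a` and `b` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

column_mean = df["a"].mean()   # Series in, single number out
frame_means = df.mean()        # DataFrame in, Series out (one entry per column)

print(column_mean)                 # 2.0
print(type(frame_means).__name__)  # Series
print(frame_means["b"])            # 20.0
```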

Syntax

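DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

This is the signature from older pandas releases. In pandas 2.0 and later, the level parameter is no longer accepted, and the defaults are axis=0, skipna=True, and numeric_only=False.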

Parameters

  • axis: Either 0 (or 'index') or 1 (or 'columns'). The default is 0.
    When axis = 0, the mean is computed down the rows, giving one value per column.
    When axis = 1, the mean is computed across the columns, giving one value per row.
  • skipna: When True (the default behavior), NA/null values are excluded while calculating the result.
  • level: If the axis is a MultiIndex (hierarchical), the mean is computed along the given level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in 2.0; groupby(level=...).mean() gives the equivalent result.
  • numeric_only: When True, only int, float, and boolean columns are included. If None (older versions), it will attempt to use everything, then use only numeric data. Not implemented for Series.
  • **kwargs: Additional keyword arguments to be passed to the function.
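
To make the skipna and numeric_only parameters more concrete, here is a minimal sketch. The column names `score` and `label` and their values are invented for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, np.nan, 3.0],
    "label": ["x", "y", "z"],
})

# Default behavior: the NaN is skipped, so the mean is (1 + 3) / 2.
print(df["score"].mean())              # 2.0

# skipna=False: a single missing value makes the result NaN.
print(df["score"].mean(skipna=False))  # nan

# numeric_only=True: the text column is left out of the result.
numeric_means = df.mean(numeric_only=True)
print(numeric_means)
```

Passing numeric_only=True explicitly is a reasonable habit when a dataframe mixes text and numbers, since the behavior of the default has changed between pandas versions.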

Returns a single value when called on a Series, or a Series of means when called on a DataFrame.
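
For the level parameter listed above, recent pandas versions rely on groupby instead. The following sketch shows that approach on a small MultiIndex; the index names `group` and `item` and the column `value` are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)],
    names=["group", "item"],
)
df = pd.DataFrame({"value": [1.0, 3.0, 10.0, 20.0]}, index=index)

# Mean within each value of the outer index level.
per_group = df.groupby(level="group").mean()
print(per_group)
```

Here per_group has one row per entry of the `group` level: 2.0 for "a" and 15.0 for "b".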

Now that we are familiar with the syntax and parameters of the function, let us see how it works with some examples.

Example – How to Calculate Dataframe Mean

import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]

df = pd.DataFrame(data)

# axis=0 averages down the rows, giving one mean per column
print(df.mean(axis=0))

Output

0    3.25
1    5.25
2    4.50
dtype: float64

We can see that one mean value is calculated for each column of the dataframe. For example, column 0 holds 4, 3, 4, and 2, and (4+3+4+2)/4 = 3.25.

Example – Calculate Dataframe Mean With Axis 1

import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]

df = pd.DataFrame(data)

# axis=1 averages across the columns, giving one mean per row
print(df.mean(axis=1))

Output

0    3.333333
1    5.333333
2    3.666667
3    5.000000
dtype: float64

Here we can see that the mean is calculated for each row. For example, row 0 holds 4, 1, and 5, and (4+1+5)/3 ≈ 3.333333.
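
The two directions are easier to tell apart when rows and columns carry names. The sketch below reuses the same data with made-up labels (`a`, `b`, `c` for columns, `r0` to `r3` for rows) and uses the string aliases 'index' and 'columns' that pandas accepts for the axis argument.

```python
import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data, columns=["a", "b", "c"], index=["r0", "r1", "r2", "r3"])

col_means = df.mean(axis="index")    # same as axis=0: one mean per column
row_means = df.mean(axis="columns")  # same as axis=1: one mean per row

print(col_means)  # indexed by a, b, c
print(row_means)  # indexed by r0, r1, r2, r3
```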

In our next example, we shall see how to apply the mean function to a single column (Series) of the dataframe.

Example – Calculate the Mean of a Single Column

import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]

df = pd.DataFrame(data)

print(df[0].mean())

The code above prints only the mean of the column labeled 0, which is the first column of the dataframe.

Output

3.25

Here we can verify that the output is a scalar value, the mean of df[0] = {4, 3, 4, 2}. That is, (4+3+4+2)/4 = 3.25
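
The same check can be done in code. This is a small sketch that rebuilds the mean of df[0] from its sum and its count; the variable names are chosen for illustration.

```python
import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data)

column = df[0]
total = column.sum()     # 4 + 3 + 4 + 2 = 13
count = column.count()   # 4 non-missing values

print(total / count)     # 3.25
print(column.mean())     # 3.25
```

count() is used rather than len() because it ignores missing values, which matches what mean() does by default.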

Conclusion

In this article, we looked at how the mean() function in the pandas library works on dataframes and on individual series, and how the axis argument changes the result.

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html