In this article, we’ll calculate the DataFrame mean in Python pandas. Python is widely used for data analysis and processing, often on large, unstructured data sets. To get meaningful information from existing data, we use statistical concepts such as mean, median, and mode. These measures help us summarize and model the data.
What is Mean?
The mean is the average value of a data set. The arithmetic mean, also known as the arithmetic average, is a central value of a finite set of numbers: specifically, the sum of the values divided by the number of values. The mean is given by the formula:
$$A = \frac{1}{n} \sum_{i=1}^{n} a_i$$
where:
- $A$ = arithmetic mean
- $n$ = number of values
- $a_i$ = data set values
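As a quick check of the definition, the following minimal sketch computes the mean by hand in plain Python (sum of the values divided by the number of values) and compares it with the standard library's statistics.mean. The sample values are chosen only for illustration.

```python
from statistics import mean

values = [4, 3, 4, 2]           # illustrative sample data
n = len(values)                 # number of values
manual_mean = sum(values) / n   # sum of the values divided by their count

print(manual_mean)    # 3.25
print(mean(values))   # 3.25
```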
Dataframe Mean in Pandas
pandas has a built-in mean function that can be used on DataFrame objects. To use it, we need to import the pandas library. Let us now look at the basic syntax and properties of the mean function.
pandas.DataFrame.mean
When applied to a Series, the mean function returns the mean of that Series as a scalar. When applied to a DataFrame, it returns a Series containing the means along the chosen axis (by default, one mean per column).
Syntax
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters
- axis: Either 0 or 1. The default is 0, the index (row) axis.
  - when axis = 0, the mean is taken down the rows, giving one value per column.
  - when axis = 1, the mean is taken across the columns, giving one value per row.
- skipna: When True (the default behavior), null (NA) values are excluded from the calculation.
- level: If the axis is a MultiIndex (hierarchical), the mean is computed along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...) instead.
- numeric_only: Include only int, float, and boolean columns. If None, it will attempt to use everything, then use only numeric data. In pandas 2.0 and later the default is False. Not implemented for Series.
- **kwargs: Additional keyword arguments to be passed to the function.
Returns: a scalar when called on a Series, or a Series of means when called on a DataFrame.
Now that we know the syntax and parameters, let us see how the function works with some examples.
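To make the skipna and numeric_only parameters concrete, here is a small sketch. The DataFrame, its column names ("a", "b", "label") and its values are made up for illustration; only the mean() arguments come from the parameter list above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column with a missing value, one text column
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 5.0],
    "b": [10, 20, 30, 40],
    "label": ["w", "x", "y", "z"],
})

# numeric_only=True drops the text column; NaN is skipped by default
skip_means = df.mean(numeric_only=True)

# skipna=False keeps NaN in the calculation, so column "a" becomes NaN
strict_means = df.mean(numeric_only=True, skipna=False)

print(skip_means)
print(strict_means)
```

With the default skipna behavior, column "a" is averaged over its three non-null values; with skipna=False, any missing value makes the whole column's mean NaN.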
Example – How to Calculate Dataframe Mean
import pandas as pd
data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data)
print(df.mean(axis=0))  # axis=0: one mean per column
Output
0 3.25
1 5.25
2 4.50
dtype: float64
We can see that the mean is calculated down the rows, giving one value for each column of the DataFrame.
Example – Calculate Dataframe Mean With Axis 1
import pandas as pd
data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data)
print(df.mean(axis=1))  # axis=1: one mean per row
Output
0 3.333333
1 5.333333
2 3.666667
3 5.000000
dtype: float64
Here we can see that the mean is calculated across the columns, giving one value for each row.
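One way to keep the axis direction straight is to look at the index of the result. The sketch below reuses the same data but adds column labels ("a", "b", "c") and row labels ("r0" to "r3"); these labels are hypothetical and exist only to make the output easier to read.

```python
import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
# Hypothetical labels, added only for readability
df = pd.DataFrame(data, columns=["a", "b", "c"], index=["r0", "r1", "r2", "r3"])

col_means = df.mean(axis=0)  # indexed by column labels
row_means = df.mean(axis=1)  # indexed by row labels

print(col_means.index.tolist())  # ['a', 'b', 'c']
print(row_means.index.tolist())  # ['r0', 'r1', 'r2', 'r3']
```

With axis=0 the result is labeled by columns, and with axis=1 it is labeled by rows.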
In the next example, we apply the mean function to a single Series (column) of the DataFrame.
Example – Calculate Mean Without Axis
import pandas as pd
data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data)
print(df[0].mean())
The code above prints the mean of only the first column (label 0) of the DataFrame.
Output
3.25
Here we can verify that the output is a scalar value, the mean of df[0] = [4, 3, 4, 2]. That is, (4+3+4+2)/4 = 3.25
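The same check can be done in code. This sketch compares the mean of a single column with the manual calculation, and also shows that selecting several columns (here 0 and 2, picked arbitrarily) returns a Series rather than a scalar.

```python
import pandas as pd

data = [[4, 1, 5], [3, 6, 7], [4, 5, 2], [2, 9, 4]]
df = pd.DataFrame(data)

single = df[0].mean()               # Series.mean() returns a scalar
manual = df[0].sum() / len(df[0])   # (4 + 3 + 4 + 2) / 4; valid here because there are no missing values
subset = df[[0, 2]].mean()          # DataFrame.mean() returns a Series, one entry per selected column

print(single, manual)  # 3.25 3.25
print(subset)
```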
Conclusion
In this article, we looked at how to use the mean() function from the pandas library on both DataFrame and Series objects.
References
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html