Pandas math functions for Data Analysis that you should know!

Hello, readers! In this article, we will focus in detail on the Python Pandas math functions for data analysis. So, let us get started!

Role of Pandas math functions in Data Analysis

In the domain of statistics and data analysis, the basic task is to analyze the data and draw observations from it, so that a better model can be built on it. For this, we need functions that help us analyze the data and draw meaningful information out of it.

Python offers us the Pandas module, which contains various functions that enable us to analyze data values.

Analysis of data simply means drawing meaningful information out of the raw data source. This information gives us an idea of the distribution and structure of the data.

In the course of this article, we will have a look at the following functions:

Pandas.DataFrame.mean() function
Pandas.DataFrame.sum() function
Pandas.DataFrame.median() function
Pandas min() and max() functions
Pandas.DataFrame.value_counts() function
Pandas.DataFrame.describe() function

Let us have a look at each of them in the upcoming sections!

In this article, we have made use of the Bike Rental Prediction dataset. You can find the dataset here!
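The examples that follow assume the dataset has already been loaded into a DataFrame named BIKE. A minimal sketch of that step is shown below. The rows are made up for illustration and only the column names follow the dataset, and the file name mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; only the column names
# follow the dataset used in this article.
csv_text = """instant,dteday,season,casual,registered,cnt
1,2011-01-01,1,10,40,50
2,2011-01-02,1,20,60,80
3,2011-01-03,2,30,90,120
"""

# With the real dataset you would pass a file path instead,
# e.g. pd.read_csv("day.csv")  (hypothetical file name).
BIKE = pd.read_csv(io.StringIO(csv_text))

print(BIKE.shape)   # (number of rows, number of columns)
print(BIKE.dtypes)  # dteday is read as text unless it is parsed as a date
```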

1. Pandas mean() function

The mean, as a statistical value, summarizes a distribution through a single number: the arithmetic average of its values. Using the dataframe.mean() function, we can get the mean for a single column or for multiple columns, i.e. the entire dataset.

Example:

In this example, we have applied the mean() function on the entire dataset.

# numeric_only=True skips the text column dteday (needed on pandas 2.0 and later)
BIKE.mean(numeric_only=True)

Output:

As a result, the mean values for all the numeric columns of the dataset are shown below:

instant        366.000000
season           2.496580
yr               0.500684
mnth             6.519836
holiday          0.028728
weekday          2.997264
workingday       0.683995
weathersit       1.395349
temp             0.495385
atemp            0.474354
hum              0.627894
windspeed        0.190486
casual         848.176471
registered    3656.172367
cnt           4504.348837
dtype: float64
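To make the difference between a single column and the whole dataset concrete, here is a minimal sketch on a small made-up table. The values are illustrative and are not taken from the Bike Rental dataset.

```python
import pandas as pd

# Made-up sample; the column names mirror the article's dataset.
df = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"],
    "casual": [10, 20, 30, 40],
    "registered": [100, 200, 300, 400],
})

# The mean of one column is a single number.
casual_mean = df["casual"].mean()

# The mean of the whole frame is a Series with one entry per column.
# numeric_only=True leaves out the text column dteday.
all_means = df.mean(numeric_only=True)

print(casual_mean)
print(all_means)
```

Selecting a column first returns a plain number, while calling mean() on the DataFrame returns one value per numeric column.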

2. Pandas sum() function

Apart from the mean() function, we can use the Pandas sum() function to get the total of the values in each column. This gives us a view of the data in absolute, quantitative terms.

Example:

Here, we have calculated the sum of every column of the dataset by applying the sum() function on the entire dataset. Note that dteday is a text column, so its values are concatenated rather than added, which is why the result has dtype object.

BIKE.sum()

Output:

instant                                                  267546
dteday        2011-01-012011-01-022011-01-032011-01-042011-0...
season                                                     1825
yr                                                          366
mnth                                                       4766
holiday                                                      21
weekday                                                    2191
workingday                                                  500
weathersit                                                 1020
temp                                                    362.126
atemp                                                   346.753
hum                                                     458.991
windspeed                                               139.245
casual                                                   620017
registered                                              2672662
cnt                                                     3292679
dtype: object
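The sum() function can also add values across columns instead of down them. The sketch below uses made-up numbers to show both directions. It assumes, as the totals above suggest, that casual and registered add up to cnt.

```python
import pandas as pd

# Made-up sample; the column names mirror the article's dataset.
df = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03"],
    "casual": [10, 20, 30],
    "registered": [40, 60, 90],
})

# Column totals, restricted to numeric columns so dteday is left out.
column_totals = df.sum(numeric_only=True)

# Row totals: casual + registered for each day (axis=1 sums across columns).
row_totals = df[["casual", "registered"]].sum(axis=1)

print(column_totals)
print(row_totals.tolist())
```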

3. Pandas median() function

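With the median() function, we get the 50th percentile, that is, the middle value of the sorted data.

Example:

Here, we have applied the median() function on every numeric column of the dataset.

# numeric_only=True skips the text column dteday (needed on pandas 2.0 and later)
BIKE.median(numeric_only=True)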

Output:

Here, we see the median values for every column of the dataset.

instant        366.000000
season           3.000000
yr               1.000000
mnth             7.000000
holiday          0.000000
weekday          3.000000
workingday       1.000000
weathersit       1.000000
temp             0.498333
atemp            0.486733
hum              0.626667
windspeed        0.180975
casual         713.000000
registered    3662.000000
cnt           4548.000000
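In the outputs above, casual has a mean of about 848 but a median of 713, which suggests that a few high values pull the mean upward. The sketch below reproduces that effect on made-up numbers.

```python
import pandas as pd

# Made-up counts with one unusually large value.
casual = pd.Series([10, 20, 30, 40, 900])

mean_value = casual.mean()      # pulled upward by the outlier
median_value = casual.median()  # the middle value after sorting

# With an even number of values, the median is the average of the two middle ones.
even_median = pd.Series([1, 2, 3, 4]).median()

print(mean_value, median_value, even_median)
```

Comparing the two values is a quick way to spot skew before plotting anything.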

4. Pandas min() and max() functions

With the min() and max() functions, we can obtain the minimum and maximum values of every column of the dataset, as well as of a single column of the dataframe.

Example:

Here, we have applied the max() function to obtain the maximum value of every column of the dataset.

BIKE.max()

Output:

instant              731
dteday        2012-12-31
season                 4
yr                     1
mnth                  12
holiday                1
weekday                6
workingday             1
weathersit             3
temp            0.861667
atemp           0.840896
hum               0.9725
windspeed       0.507463
casual              3410
registered          6946
cnt                 8714
dtype: object
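The example above shows max() only, and min() works the same way. Here is a minimal sketch on made-up data that applies both to a single column and then uses idxmax() to find the row the maximum belongs to.

```python
import pandas as pd

# Made-up sample; the column names mirror the article's dataset.
df = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03"],
    "cnt": [50, 120, 80],
})

lowest = df["cnt"].min()
highest = df["cnt"].max()

# idxmax() returns the index label of the row holding the maximum,
# which lets us look up the matching date.
busiest_day = df.loc[df["cnt"].idxmax(), "dteday"]

print(lowest, highest, busiest_day)

# Minimum of every column. For the text column this is the smallest string
# in lexicographic order, which for YYYY-MM-DD dates is also the earliest date.
print(df.min())
```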

5. Pandas value_counts() function

With the value_counts() function, we can fetch the count of every category or group present in a variable. It is most useful with categorical variables.

Example:

BIKE.season.value_counts()

Here, we have applied the value_counts() function on the season variable. The result lists every distinct value of the variable together with the number of rows in which it occurs.

Output:
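To illustrate the shape of that result, here is a minimal sketch on a small made-up Series. The values are illustrative and are not taken from the Bike Rental dataset.

```python
import pandas as pd

# Made-up season codes.
season = pd.Series([1, 1, 2, 3, 3, 3, 4], name="season")

# Counts per distinct value, sorted from most to least frequent.
counts = season.value_counts()

# normalize=True returns relative frequencies instead of raw counts.
shares = season.value_counts(normalize=True)

print(counts)
print(shares)
```

The distinct values become the index of the result and the counts become its values.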

6. Pandas describe() function

With the describe() function, we get the following statistical information for every numeric column at once:

count of the data values of every column
mean
standard deviation
minimum value
25% value [1st quartile]
50% i.e. median
75% value [3rd quartile]
maximum value

Example:

BIKE.describe()

Output:
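To illustrate the layout of that summary, here is a minimal sketch on a small made-up table. The values are illustrative and are not taken from the Bike Rental dataset.

```python
import pandas as pd

# Made-up sample; the column names mirror the article's dataset.
df = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03",
               "2011-01-04", "2011-01-05"],
    "cnt": [10, 20, 30, 40, 50],
})

# By default, describe() summarizes only the numeric columns of a mixed table.
summary = df.describe()

print(summary)

# Each statistic is a row label, so single values can be looked up directly.
print(summary.loc["50%", "cnt"])  # same value as df["cnt"].median()
```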

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to Python programming, stay tuned with us.

Till then, Happy Learning!! 🙂