Hello, readers! In this article, we will look in detail at Python Pandas math functions for data analysis. So, let us get started!
Role of Pandas math functions in Data Analysis
In statistics and data analysis, the basic task is to analyze the data and draw observations from it so that a better model can be built on it. To do that, we need functions that help us analyze the data and extract meaningful information from it.
Python offers us the Pandas module, which contains various functions that enable us to analyze data values.
Analysis of data simply means drawing meaningful information out of the raw data source. This information gives us an idea of the distribution and structure of the data.
In the course of this article, we will be having a look at the below functions:
- pandas.DataFrame.mean() function
- pandas.DataFrame.sum() function
- pandas.DataFrame.median() function
- Pandas min() and max() functions
- pandas.DataFrame.value_counts() function
- pandas.DataFrame.describe() function
Let us have a look at each of them in the upcoming sections!
In this article, we have made use of the Bike Rental Prediction dataset. You can find the dataset here!
1. Pandas mean() function
The mean is the arithmetic average of a set of values, and it summarizes a whole distribution with a single number. With the dataframe.mean() function, we can get the mean of a single column or of multiple columns, i.e. the entire dataset.
In this example, we have applied the mean() function to the entire dataset.
As a result, the mean values for all the numeric columns of the dataset are shown below:
instant        366.000000
season           2.496580
yr               0.500684
mnth             6.519836
holiday          0.028728
weekday          2.997264
workingday       0.683995
weathersit       1.395349
temp             0.495385
atemp            0.474354
hum              0.627894
windspeed        0.190486
casual         848.176471
registered    3656.172367
cnt           4504.348837
dtype: float64
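A minimal sketch of the call is given here. We do not load the actual dataset; instead, we assume a tiny stand-in dataframe named bike (a hypothetical name) with made-up values and a few column names borrowed from the output above. Recent pandas versions raise an error when mean() meets a text column such as dteday, so the sketch passes numeric_only=True.

```python
import pandas as pd

# Illustrative stand-in for the bike rental data: the values are made up,
# only the column names are borrowed from the article's output.
bike = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"],
    "season": [1, 1, 2, 2],
    "temp": [0.25, 0.35, 0.20, 0.40],
    "cnt": [985, 801, 1349, 1562],
})

# Mean of every numeric column; numeric_only=True skips the text column dteday
column_means = bike.mean(numeric_only=True)
print(column_means)

# Mean of a single column
cnt_mean = bike["cnt"].mean()
print(cnt_mean)
```

The first call returns one value per numeric column, while the second returns a single number for the chosen column.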
2. Pandas sum() function
Apart from the mean() function, we can make use of the Pandas sum() function to get the total of the values in each column. This gives us a sense of the overall magnitude of each variable.
Here, we have calculated the sum of every column of the dataset by applying the sum() function to the entire dataset. Because dteday holds text, its values are concatenated rather than added, which is why the result has dtype object.
instant       267546
dteday        2011-01-012011-01-022011-01-032011-01-042011-0...
season        1825
yr            366
mnth          4766
holiday       21
weekday       2191
workingday    500
weathersit    1020
temp          362.126
atemp         346.753
hum           458.991
windspeed     139.245
casual        620017
registered    2672662
cnt           3292679
dtype: object
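As a rough sketch of the same idea, we again assume a small made-up dataframe named bike (a hypothetical name, not the real dataset). Passing numeric_only=True is our own choice here: it keeps the text column out of the result, so the totals stay numeric.

```python
import pandas as pd

# Illustrative stand-in for the bike rental data (made-up values)
bike = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"],
    "season": [1, 1, 2, 2],
    "cnt": [985, 801, 1349, 1562],
})

# Column totals for the numeric columns only
totals = bike.sum(numeric_only=True)
print(totals)

# Total of a single column
cnt_total = bike["cnt"].sum()
print(cnt_total)
```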
3. Pandas median() function
With the median() function, we get the 50th percentile, i.e. the central value of the sorted data.
Here, we have applied the median() function to every column of the dataset.
The output shows the median value of every numeric column.
instant        366.000000
season           3.000000
yr               1.000000
mnth             7.000000
holiday          0.000000
weekday          3.000000
workingday       1.000000
weathersit       1.000000
temp             0.498333
atemp            0.486733
hum              0.626667
windspeed        0.180975
casual         713.000000
registered    3662.000000
cnt           4548.000000
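To see that the median really is the 50th percentile, here is a small sketch on a made-up dataframe (the name bike and all values are assumptions for illustration). With five rows, the median is simply the third value once the column is sorted.

```python
import pandas as pd

# Illustrative stand-in for the bike rental data (made-up values)
bike = pd.DataFrame({
    "season": [1, 1, 2, 2, 3],
    "cnt": [985, 801, 1349, 1562, 1600],
})

# Median of every column
column_medians = bike.median()
print(column_medians)

# The median of a column equals its 0.5 quantile (50th percentile)
cnt_median = bike["cnt"].median()
cnt_q50 = bike["cnt"].quantile(0.5)
print(cnt_median, cnt_q50)
```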
4. Pandas min() and max() functions
With the min() and max() functions, we can obtain the minimum and maximum values of every column of the dataset, as well as of a single column of the dataframe.
Here, we have applied the max() function to obtain the maximum value of every column of the dataset.
instant       731
dteday        2012-12-31
season        4
yr            1
mnth          12
holiday       1
weekday       6
workingday    1
weathersit    3
temp          0.861667
atemp         0.840896
hum           0.9725
windspeed     0.507463
casual        3410
registered    6946
cnt           8714
dtype: object
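A minimal sketch of both functions follows, again on a made-up dataframe named bike (a hypothetical stand-in, not the real dataset). For the text column dteday we take the maximum separately; since the dates are written as year-month-day, the alphabetically largest string is also the latest date.

```python
import pandas as pd

# Illustrative stand-in for the bike rental data (made-up values)
bike = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"],
    "temp": [0.25, 0.35, 0.20, 0.40],
    "cnt": [985, 801, 1349, 1562],
})

# Largest and smallest value of every numeric column
col_max = bike.max(numeric_only=True)
col_min = bike.min(numeric_only=True)
print(col_max)
print(col_min)

# Largest value of a single text column (string comparison)
latest_day = bike["dteday"].max()
print(latest_day)
```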
5. Pandas value_counts() function
With the value_counts() function, we can fetch the count of every category or group present in a variable. It is most useful with categorical variables.
Here, we have applied the value_counts() function to the season variable. As seen below, we get the number of rows in each season, ordered from the most frequent to the least frequent.
3    188
2    184
1    181
4    178
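The same call can be sketched on a small made-up column (the dataframe name bike and its values are assumptions for illustration). The optional normalize=True argument returns shares instead of raw counts.

```python
import pandas as pd

# Illustrative stand-in for the season column (made-up values)
bike = pd.DataFrame({"season": [1, 1, 2, 2, 2, 3]})

# Number of rows per season, most frequent first
season_counts = bike["season"].value_counts()
print(season_counts)

# Share of rows per season instead of raw counts
season_share = bike["season"].value_counts(normalize=True)
print(season_share)
```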
6. Pandas describe() function
With the describe() function, we get the below statistical information for every numeric column at once:
- count of the data values
- mean
- standard deviation
- minimum value
- 25% value [1st quartile]
- 50% value, i.e. the median
- 75% value [3rd quartile]
- maximum value
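The summary listed above can be sketched as follows. We assume a small made-up dataframe named bike (a hypothetical stand-in for the real dataset), so the numbers are illustrative only.

```python
import pandas as pd

# Illustrative stand-in for the bike rental data (made-up values)
bike = pd.DataFrame({
    "temp": [0.25, 0.35, 0.20, 0.40],
    "cnt": [985, 801, 1349, 1562],
})

# One row per statistic, one column per numeric variable
summary = bike.describe()
print(summary)

# Individual statistics can be picked out by row and column label
cnt_mean = summary.loc["mean", "cnt"]
print(cnt_mean)
```

Each row of the result corresponds to one statistic, so a single value can be read off with summary.loc[row, column].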
With this, we have come to the end of this topic. Feel free to comment below if you have any questions.
For more such posts related to Python programming, stay tuned with us.
Till then, Happy Learning!! 🙂