The Python statistics module provides functions for calculating statistics on a given set of numbers. It was introduced in Python 3.4. It is a small, simple module that works with numeric types: int, float, Decimal, and Fraction. In this article, we will focus on seven functions of the statistics module.
Python statistics module functions
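We will focus on the functions listed below. Note that the last two, _sum() and _counts(), start with an underscore: they are private helpers rather than part of the module's documented API, so they can change or disappear between Python versions.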
- mean() function
- median() function
- median_high() function
- median_low() function
- stdev() function
- _sum() function
- _counts() function
Let’s have a look at them one by one.
1. The mean() function
The mean is one of the most widely used statistical measures for understanding data at a glance. It represents the average of the whole dataset and is calculated by adding all the values in the dataset and then dividing by the number of values.
For example, if the dataset is [1,2,3,4,5], then the mean will be (1+2+3+4+5)/5 = 3.
The statistics.mean() function returns the arithmetic mean of a set of numeric values.
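The calculation described above can be checked with a short sketch. The variable names are illustrative, and the Fraction input and the empty-list case are added here only to show how the function behaves with other inputs.

```python
import statistics
from fractions import Fraction

data = [1, 2, 3, 4, 5]

# statistics.mean() adds the values and divides by how many there are.
mean_value = statistics.mean(data)
manual_mean = sum(data) / len(data)
print(mean_value, manual_mean)

# Fraction inputs stay Fractions in the result.
fraction_mean = statistics.mean([Fraction(1, 2), Fraction(3, 2)])
print(fraction_mean)

# An empty dataset has no mean, so the module raises StatisticsError.
try:
    statistics.mean([])
except statistics.StatisticsError as error:
    print("Cannot average empty data:", error)
```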
2. The median() function
Apart from the mean, we often come across situations where we need a value that represents the middle of the data. With the statistics.median() function, we can calculate that middle value. The median is found by sorting the dataset from the lowest to the highest value and taking the value in the middle. If the dataset has an even number of values, the median is the average of the two middle values.
For example, if the dataset is [1, 3, 10, 2], we first sort it in increasing order, which gives [1, 2, 3, 10]. Since there is an even number of values, the median is the average of the two middle values, 2 and 3, so the median is 2.5. For the dataset [1, 10, 3], the median is 3.
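The two datasets from this example can be run through statistics.median() directly. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

even_data = [1, 3, 10, 2]
odd_data = [1, 10, 3]

# Even number of values: the average of the two middle values (2 and 3).
even_median = statistics.median(even_data)

# Odd number of values: the single middle value after sorting.
odd_median = statistics.median(odd_data)

print(even_median, odd_median)

# The input does not need to be sorted, and it is left in its original order.
print(even_data)
```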
3. The median_high() function
The median_high() function returns the high median of the dataset. The high median is especially useful when the data values are discrete in nature. If the dataset has an even number of values, the higher of the two middle values is returned. For an odd number of values, median_high is the same as the median value.
For example, if the dataset is [1, 2, 3, 10], the median_high will be 3. If the dataset is [1, 3, 5], the median_high is the same as the median value 3.
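A short sketch of the same two cases, using illustrative variable names:

```python
import statistics

even_data = [1, 2, 3, 10]
odd_data = [1, 3, 5]

# Even count: the higher of the two middle values (2 and 3) is returned.
high_even = statistics.median_high(even_data)

# Odd count: identical to the ordinary median.
high_odd = statistics.median_high(odd_data)

print(high_even, high_odd)
```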
4. The median_low() function
The median_low() function returns the low median of the dataset. It is useful when the data is discrete in nature and we need an actual data point rather than an interpolated value. If the dataset has an even number of values, the lower of the two middle values is returned. For an odd number of values, median_low is the same as the median value.
For example, if the dataset is [1, 2, 3, 10], the median_low will be 2. If the dataset is [1, 3, 5], the median_low is the same as the median value 3.
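To illustrate the point about discrete data, the sketch below uses a made-up list of ratings (an assumption for this example, not data from the article). The ordinary median lands between two ratings, while median_low() returns a value that actually occurs in the data.

```python
import statistics

# Hypothetical discrete data: star ratings from 1 to 5.
ratings = [1, 2, 2, 4, 5, 5]

# The two middle values are 2 and 4, so median() interpolates between them.
plain_median = statistics.median(ratings)

# median_low() picks the lower middle value, which is a real rating.
low_median = statistics.median_low(ratings)

print(plain_median, low_median)
print(plain_median in ratings, low_median in ratings)
```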
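5. The stdev() function
The stdev() function returns the sample standard deviation of the data. First, the mean of the data is calculated. Then the variance is calculated from the squared deviations from that mean, dividing by one less than the number of values. The square root of the variance is the standard deviation of the dataset. At least two data points are required.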
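The steps above can be sketched by hand and compared with the library result. The dataset and variable names are illustrative. The sketch divides by n - 1 because stdev() computes the sample standard deviation; pstdev() is shown only for contrast, since it divides by n instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean of the data.
mean_value = statistics.mean(data)

# Step 2: the sample variance, from squared deviations divided by n - 1.
squared_deviations = [(x - mean_value) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Step 3: the square root of the variance.
manual_stdev = math.sqrt(sample_variance)

library_stdev = statistics.stdev(data)
print(manual_stdev, library_stdev)

# Population standard deviation, for contrast (divides by n).
population_stdev = statistics.pstdev(data)
print(population_stdev)
```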
6. The _sum() function
The _sum() function handles the accumulation of the data points passed to it. It is a private helper that the module uses internally. With _sum(), we can get the sum of all the data values along with the count of the data points. As the example output later in this article shows, the result is a tuple holding the data type, the total as a Fraction, and the count.
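Because _sum() is private, it may be safer not to call it in your own code. The sketch below shows one way to get an exact total and a count using only public tools. The helper name exact_sum_and_count is hypothetical, and this is not the module's actual implementation.

```python
from fractions import Fraction


def exact_sum_and_count(data):
    """Hypothetical helper: return (exact total, number of values)."""
    total = Fraction(0)
    count = 0
    for value in data:
        # Fraction represents ints and floats exactly, so no rounding
        # error accumulates while adding.
        total += Fraction(value)
        count += 1
    return total, count


total, count = exact_sum_and_count([10, 203, 20, 30, 40, 50, 60, 70, 80, 100])
print(total, count)
```

Keeping the total as a Fraction delays any rounding until the caller decides which type to convert to.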
7. The _counts() function
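With the _counts() function, we can get the frequency of data points in a set of values. It counts the occurrences of each data point and returns a list of tuples of size 2. The first value of the tuple is the dataset value and the second value is the occurrence count. Two caveats apply. First, _counts() is a private helper that exists only in older Python versions (it is no longer present from Python 3.8 onward). Second, in the versions that include it, the returned list keeps only the values that share the highest count, so every value appears only when all of them occur equally often.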
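Since _counts() is private and not available in every Python version, a similar table of (value, count) tuples can be built with collections.Counter. This is a sketch with public tools, not the module's own code. The helper names value_counts and most_frequent are hypothetical.

```python
from collections import Counter


def value_counts(data):
    """Hypothetical helper: (value, count) pairs, most frequent first."""
    return Counter(data).most_common()


def most_frequent(data):
    """Hypothetical helper: only the pairs that share the highest count."""
    table = value_counts(data)
    if not table:
        return table
    top_count = table[0][1]
    return [pair for pair in table if pair[1] == top_count]


print(value_counts([1, 2, 2, 3, 3, 3]))
print(most_frequent([1, 1, 2, 2, 3]))
```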
Python statistics module functions examples
Let’s look at some examples of using the statistics module functions.
import statistics

data = [10, 203, 20, 30, 40, 50, 60, 70, 80, 100]

res = statistics.mean(data)
print("Mean:", res)

res = statistics.median(data)
print("Median:", res)

res = statistics.median_high(data)
print("Median High value:", res)

res = statistics.median_low(data)
print("Median Low value:", res)

res = statistics.stdev(data)
print("Standard Deviation:", res)

# _sum() is a private helper and may change between Python versions.
res = statistics._sum(data)
print("Sum:", res)

# _counts() is a private helper that exists only in older Python versions;
# on Python 3.8 and later this call raises AttributeError.
res = statistics._counts(data)
print("Count:", res)
Output:
Mean: 66.3
Median: 55.0
Median High value: 60
Median Low value: 50
Standard Deviation: 55.429735301150004
Sum: (<class 'int'>, Fraction(663, 1), 10)
Count: [(10, 1), (203, 1), (20, 1), (30, 1), (40, 1), (50, 1), (60, 1), (70, 1), (80, 1), (100, 1)]
The Python statistics module is useful for getting the mean, median, mode, and standard deviation of numerical datasets. Its functions work on numbers and provide a simple way to calculate these values. However, if you are already using NumPy or Pandas, you can use their functions to calculate these values instead.