To calculate summary statistics in Python, you can use the .describe() method from pandas. The method works on numeric data as well as on object data such as strings or timestamps.
The output contains different fields for the two kinds of data. For numeric data the result includes:
- count
- mean
- standard deviation
- minimum
- 25th percentile
- 50th percentile (median)
- 75th percentile
- maximum
For object data the result includes:
- count
- unique
- top
- freq
Calculate Summary Statistics in Python Using the describe() method
In this tutorial, we will see how to use the .describe() method with numeric and object data.
We will also see how to analyze a large dataset and a timestamp series using the .describe() method.
Let’s get started.
1. Summary Statistics for Numeric data
Let’s define a series holding the numbers from 1 to 6 and get its summary statistics.
We will start by importing pandas.
import pandas as pd
Now we can define a series as:
s = pd.Series([1, 2, 3, 4, 5, 6])
To display summary statistics use:
s.describe()
The complete code and output are as follows:
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5, 6])
s.describe()
Output:
count 6.000000
mean 3.500000
std 1.870829
min 1.000000
25% 2.250000
50% 3.500000
75% 4.750000
max 6.000000
dtype: float64
Let’s understand what each of these values means.
count | Number of non-null entries |
mean | Average of all the entries |
std | Sample standard deviation |
min | Minimum value |
25% | 25th percentile |
50% | 50th percentile (median) |
75% | 75th percentile |
max | Maximum value |
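To make these definitions concrete, here is a minimal sketch that recomputes each field with the corresponding Series method and compares it with the .describe() result. The variable names summary and manual are chosen here for illustration only, and the comparison assumes the default quantile interpolation in pandas.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])
summary = s.describe()

# Recompute every field of the summary with the matching Series method.
manual = pd.Series({
    "count": s.count(),        # number of non-null entries
    "mean": s.mean(),
    "std": s.std(),            # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),
    "50%": s.median(),
    "75%": s.quantile(0.75),
    "max": s.max(),
})

# Largest absolute difference between the two; expected to be zero
# up to floating-point rounding.
print((summary - manual).abs().max())
```

Calling the individual methods is handy when you only need one or two of the statistics rather than the whole summary.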
2. Summary Statistics for Python Object data
Let’s define a series of single-character strings and use the .describe() method on it to calculate summary statistics.
We can define the series as:
s = pd.Series(['a', 'a', 'b', 'c'])
To get the summary statistics use:
s.describe()
The complete code and output are as follows:
import pandas as pd
s = pd.Series(['a', 'a', 'b', 'c'])
s.describe()
Output:
count 4
unique 3
top a
freq 2
dtype: object
Let’s understand what each of these fields means:
count | Number of non-null entries |
unique | Number of distinct entries |
top | Most frequent entry |
freq | Number of times the most frequent entry occurs |
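As a rough cross-check of these four fields, the sketch below derives them by hand with count(), nunique() and value_counts(). The names counts and manual are illustrative and not part of the original example.

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'c'])
summary = s.describe()

# value_counts() sorts entries by frequency, most frequent first.
counts = s.value_counts()

manual = {
    "count": s.count(),        # non-null entries
    "unique": s.nunique(),     # distinct entries
    "top": counts.index[0],    # most frequent entry
    "freq": counts.iloc[0],    # how often that entry occurs
}

# Print each hand-computed field next to the value from describe().
for field, value in manual.items():
    print(field, value, summary[field])
```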
3. Summary statistics of a large data set
You can use pandas to get the summary statistics from a large dataset as well. You just need to import the dataset into a pandas data frame and then use the .describe method.
In this tutorial, we will be using the California Housing dataset as the sample dataset.
Let’s start by importing the CSV dataset and then calling the .describe() method on it.
import pandas as pd
housing = pd.read_csv("/content/sample_data/california_housing.csv")
housing.describe()
Output:

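The result contains the summary statistics (count, mean, std, min, quartiles and max) for every numeric column in the dataset. By default, .describe() on a data frame with mixed column types skips the non-numeric columns.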
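Since the housing file is not reproduced here, the following sketch uses a small made-up data frame to show how .describe() behaves on a data frame. The column names rooms, price and city and all the values are hypothetical; the include="all" argument asks pandas to summarize non-numeric columns as well.

```python
import pandas as pd

# Hypothetical stand-in for a real dataset: two numeric columns, one text column.
df = pd.DataFrame({
    "rooms": [3, 4, 2, 5],
    "price": [250.0, 320.5, 180.0, 410.0],
    "city": ["Fresno", "Oakland", "Fresno", "San Jose"],
})

# Default behaviour: only the numeric columns are summarized.
numeric_only = df.describe()
print(numeric_only)

# include="all" adds the object-style fields (unique, top, freq) for text columns.
everything = df.describe(include="all")
print(everything)
```

In the combined summary, fields that do not apply to a column (for example mean for city) are filled with NaN.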
4. Summary Statistics for timestamp series
You can use .describe to get summary statistics for a timestamp series as well. Let’s start by defining a timestamp series.
import numpy as np
import pandas as pd
s = pd.Series([np.datetime64("2000-01-01"), np.datetime64("2010-01-01"), np.datetime64("2010-01-01"), np.datetime64("2002-05-08")])
Now you can call .describe on this timestamp series.
s.describe()
The complete code and output are as follows:
import numpy as np
import pandas as pd
s = pd.Series([np.datetime64("2000-01-01"), np.datetime64("2010-01-01"), np.datetime64("2010-01-01"), np.datetime64("2002-05-08")])
s.describe()
Output (with pandas versions before 2.0, which treat datetime data like object data by default):
count 4
unique 3
top 2010-01-01 00:00:00
freq 2
first 2000-01-01 00:00:00
last 2010-01-01 00:00:00
dtype: object
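In pandas 1.1 through 1.5, you can also instruct .describe() to treat datetime values as numeric. The result then resembles that of numeric data, with the mean, median, 25th percentile and 75th percentile shown as datetime values. In pandas 2.0 and later the datetime_is_numeric parameter was removed, and a plain s.describe() already returns this numeric-style summary.
On the older versions, this can be done using:
s.describe(datetime_is_numeric=True)
The output is as follows: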
count 4
mean 2005-08-03 00:00:00
min 2000-01-01 00:00:00
25% 2001-10-05 12:00:00
50% 2006-03-05 12:00:00
75% 2010-01-01 00:00:00
max 2010-01-01 00:00:00
You can see that the result contains the mean, median, 25th percentile and 75th percentile as datetime values.
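Because the datetime_is_numeric argument exists only in some pandas releases, a script that has to run on several versions can try the argument first and fall back to a plain call. This is a minimal sketch of that idea, assuming pandas 1.1 or newer; it relies on pandas raising TypeError for an unknown keyword argument.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(
    ["2000-01-01", "2010-01-01", "2010-01-01", "2002-05-08"]
))

try:
    # Accepted by pandas 1.1 through 1.5.
    summary = s.describe(datetime_is_numeric=True)
except TypeError:
    # pandas 2.0 and later: the argument is gone and datetime data
    # is always summarized like numeric data.
    summary = s.describe()

print(summary)
```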
Conclusion
This tutorial was about computing summary statistics in Python. We calculated summary statistics for numeric data, object data, a large dataset and a timestamp series using the .describe() method.