Python Equivalent of Histfit and Fitdist


MATLAB is one of the most widely used tools in engineering and science. It handles tasks ranging from signal processing to data visualization.

Statistics is one such field that can be dealt with easily using MATLAB. In this tutorial, we are going to look at two essential functions of MATLAB used for statistics and their Python counterparts.

The functions are histfit, which plots a histogram of data and overlays a fitted normal density curve, and fitdist, which fits a probability distribution (such as the normal distribution) to data and estimates its parameters.

Read on to learn the Python functions that do the same job!

Histfit in MATLAB

The histfit function plots a histogram of the data and overlays a fitted distribution curve on top of it. A histogram is a bar plot made of bins (rectangles). Each bin covers a range of values on the X-axis, and the height of its bar gives the frequency (count) on the Y-axis. By default, histfit uses a number of bins equal to the square root of the number of elements in the data. It also fits a normal density function, scaled to match the bar counts.

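Its basic syntax is shown below. Optional second and third arguments, as in histfit(data, nbins, dist), set the number of bins and the distribution to fit.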

histfit(data)
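To make the default behaviour concrete, here is a rough Python sketch of what histfit computes before it draws anything. It is not MATLAB's actual implementation. Several parts are assumptions:

- The helper name histfit_counts is hypothetical.
- Rounding the square root up to a whole number of bins is assumed.
- SciPy's norm.fit returns the maximum likelihood sigma, which may differ slightly from MATLAB's estimate.

```python
import math

import numpy as np
from scipy.stats import norm


def histfit_counts(data, nbins=None):
    """Hypothetical helper approximating histfit's computations (no plotting).

    Assumption: the default bin count is sqrt(len(data)) rounded up.
    Returns the bin counts, the bin edges, and a normal curve scaled to counts.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if nbins is None:
        nbins = math.ceil(math.sqrt(n))
    counts, edges = np.histogram(data, bins=nbins)
    mu, sigma = norm.fit(data)              # fitted normal parameters
    bin_width = edges[1] - edges[0]         # np.histogram uses equal-width bins here
    x = np.linspace(edges[0], edges[-1], 200)
    # Scale the PDF by n * bin_width so its area matches the total bar area.
    y = norm.pdf(x, mu, sigma) * n * bin_width
    return counts, edges, x, y


rng = np.random.default_rng(0)
counts, edges, x, y = histfit_counts(rng.standard_normal(1000))
print(len(counts), counts.sum())  # 32 bins holding all 1000 samples
```

The scaling factor is the key design choice. A probability density integrates to 1, while the bars of a count histogram have a total area of n times the bin width. Multiplying the PDF by that factor puts the curve on the same scale as the bars.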

Let us see an example of using this function.

data = randn(1000, 1);
histfit(data, 20, 'normal');
title('Histogram with Normal PDF Fit');

The code is short, but it does exactly what we need. The first line generates 1000 random points from the standard normal distribution using the randn function.

The second line calls histfit. It draws a histogram of the data with 20 bins and overlays a fitted normal distribution curve. The last line sets the title of the plot. The result is a histogram of the data with the normal curve drawn on top.

Histfit in MATLAB

We have plotted a histogram of the data and overlaid a fitted normal distribution curve. But what if we need the estimated parameters of that distribution? Fitdist can help us in this situation!

Fitdist in MATLAB

The fitdist function (short for "fit probability distribution") fits a probability distribution to data. It returns a probability distribution object that holds the estimated parameters.

The second argument selects the type of distribution to fit. The basic syntax is:

pd = fitdist(x,'Normal')
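fitdist also accepts other distribution names, such as 'Exponential' or 'Gamma'. In SciPy, you choose the distribution by picking the distribution object, because every continuous distribution in scipy.stats has a fit method.

SciPy adds a location parameter to each distribution. This sketch therefore fixes it at zero (floc=0) to get the usual one- and two-parameter forms. The synthetic exponential data are an assumption for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
waits = rng.exponential(scale=2.0, size=1000)  # synthetic positive data

# Exponential: fix loc at 0 so only the scale (equal to the mean) is estimated.
loc, scale = stats.expon.fit(waits, floc=0)

# Gamma: returns (shape a, loc, scale); loc is fixed at 0 again.
a, g_loc, g_scale = stats.gamma.fit(waits, floc=0)

print(f"Exponential scale: {scale:.4f}")
print(f"Gamma shape: {a:.4f}, scale: {g_scale:.4f}")
```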

Let us see an example.

data = randn(1000, 1);
pd = fitdist(data, 'Normal');
mu = pd.mu; 
sigma = pd.sigma;  
fprintf('Estimated Mean (mu): %f\n', mu);
fprintf('Estimated Standard Deviation (sigma): %f\n', sigma);

As in the first example, we generate 1000 random values with the randn function and store them in a variable called data. The next line calls fitdist on data with a normal distribution. It stores the result, a probability distribution object, in a new variable, pd. The last few lines read the estimated mean and standard deviation from pd.mu and pd.sigma and print them.

fitdist does not produce a plot. Instead, it returns the estimated parameters (such as the mean and standard deviation) inside the distribution object, and we print them with fprintf.

Fitdist in MATLAB
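MATLAB's fitdist returns a distribution object, pd, rather than bare numbers. The closest SciPy counterpart is a "frozen" distribution, created by calling norm(loc=..., scale=...) with the fitted parameters. It exposes methods such as mean(), std(), pdf(), cdf() and ppf().

The following is a sketch of that idea, not a claim that the two objects are interchangeable. The variable name dist is our choice.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Estimate the parameters, then "freeze" them into a reusable distribution
# object (named dist here; it plays the role of MATLAB's pd).
mu, sigma = norm.fit(data)
dist = norm(loc=mu, scale=sigma)

print(dist.mean(), dist.std())  # the fitted mean and standard deviation
print(dist.cdf(mu))             # 0.5: half of the probability lies below the mean
print(dist.ppf(0.975))          # 97.5th percentile of the fitted normal
```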

Can we use the two functions together? We sure can! Follow the example below.

data = randn(1000, 1);
histfit(data, 20, 'normal');
pd = fitdist(data, 'Normal');
mu = pd.mu; 
sigma = pd.sigma; 
title('Histogram with Normal PDF Fit');
fprintf('Estimated Mean (mu): %f\n', mu);
fprintf('Estimated Standard Deviation (sigma): %f\n', sigma);

We just combined the two code snippets and that is all!

Histfit and Fitdist

Now that we understand the basic functionality of these functions, let us try to obtain equivalent results with Python.

Python Equivalent of Histfit

As we saw, histfit creates a histogram and overlays a fitted distribution curve. Python has no single function that does the same thing. However, we can combine a few functions from NumPy, Matplotlib, and SciPy to get a close equivalent.


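import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
data = np.random.randn(1000)  # 1000 samples from the standard normal distribution
# density=True normalizes the bars so the histogram's total area is 1
plt.hist(data, bins=20, density=True, alpha=0.6, color='b', edgecolor='k')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
mu, sigma = norm.fit(data)  # estimate the normal parameters from the data
p = norm.pdf(x, mu, sigma)  # fitted normal PDF evaluated over the x-range
plt.plot(x, p, 'r', linewidth=2)
plt.title('Histogram with Normal PDF Fit')
plt.show()

The first three lines import the libraries we need, and the next line generates 1000 random points with NumPy's randn function. Matplotlib's hist function then draws a histogram with 20 blue bins (color='b') outlined in black. Setting density=True scales the bars so that their total area is 1, which puts them on the same scale as a probability density. Finally, norm.fit estimates the mean and standard deviation of the data. norm.pdf evaluates the fitted normal curve over the plot's x-range, and plt.plot draws it in red.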

Python Equivalent of Histfit

If you have keen eyes, you may have noticed that the outputs differ. This is expected for two reasons. First, the random samples differ from run to run (and between MATLAB and NumPy). Second, histfit scales the fitted curve to match the bar counts, whereas this Python version uses density=True, so its Y-axis shows probability density instead of counts.
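To mirror histfit's count-based Y-axis more closely, leave out density=True. Then scale the fitted PDF by the number of samples times the bin width, so that the area under the curve matches the total bar area.

This is a sketch under a few assumptions:

- The Agg backend line exists only to run the script without a display.
- The output filename is hypothetical.
- norm.fit's sigma may differ slightly from MATLAB's estimate.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

nbins = 20
# Raw counts (no density=True), so the Y-axis shows frequency as in histfit.
counts, edges, _ = plt.hist(data, bins=nbins, color='b', edgecolor='k', alpha=0.6)

mu, sigma = norm.fit(data)
bin_width = edges[1] - edges[0]
x = np.linspace(edges[0], edges[-1], 200)
# Scale the PDF by (number of samples * bin width) to match the count scale.
plt.plot(x, norm.pdf(x, mu, sigma) * data.size * bin_width, 'r', linewidth=2)
plt.title('Histogram with Normal PDF Fit (count scale)')
plt.savefig('histfit_counts.png')  # or plt.show() with an interactive backend
```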

Python Equivalent of Fitdist

Let us see whether we can reproduce the results of fitdist in Python. Unlike histfit, this function has a fairly direct equivalent: the fit method of SciPy's norm distribution.

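import numpy as np
from scipy.stats import norm
data = np.random.randn(1000)  # 1000 samples from the standard normal distribution
mu, sigma = norm.fit(data)  # estimate the mean and standard deviation of a normal fit
print(f'Estimated Mean (mu): {mu:.6f}')
print(f'Estimated Standard Deviation (sigma): {sigma:.6f}')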

As usual, we import the necessary libraries and generate 1000 random values. The fit method of SciPy's norm distribution (scipy.stats.norm.fit) returns the estimated mean (mu) and standard deviation (sigma), which we then print.

Python Equivalent of Fitdist
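For the normal distribution, norm.fit has simple closed-form results. mu is the sample mean, and sigma is the maximum likelihood estimate, which divides by n.

As far as we know, MATLAB's documentation describes fitdist's normal sigma as the square root of the unbiased variance estimate (dividing by n - 1). Verify this for your MATLAB version. If you need to match it, NumPy's std with ddof=1 computes that quantity. Here is a quick check, assuming a seeded generator for repeatability:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

mu_fit, sigma_fit = norm.fit(data)

mu_hand = data.mean()
sigma_mle = data.std(ddof=0)    # divides by n: what norm.fit returns
sigma_ddof1 = data.std(ddof=1)  # square root of the unbiased variance (divides by n - 1)

print(f"norm.fit:         mu={mu_fit:.6f}, sigma={sigma_fit:.6f}")
print(f"by hand (ddof=0): mu={mu_hand:.6f}, sigma={sigma_mle:.6f}")
print(f"ddof=1 sigma:     {sigma_ddof1:.6f}")
```

With 1000 samples the two sigma values differ only by a factor of sqrt(1000/999), so the gap matters mainly for small data sets.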

Both fitdist in MATLAB and its Python equivalent may give slightly different values on every run. We did not seed the random number generator, so each run draws a new sample.
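To get repeatable numbers, seed the generator. Below is a minimal sketch using NumPy's default_rng; in MATLAB, rng(42) plays the same role. fit_with_seed is a hypothetical helper.

Seeding both languages generally still will not make MATLAB and NumPy draw the same numbers, because they use different generators.

```python
import numpy as np
from scipy.stats import norm


def fit_with_seed(seed):
    """Hypothetical helper: draw a seeded sample and fit a normal distribution."""
    rng = np.random.default_rng(seed)  # seeded generator gives reproducible draws
    data = rng.standard_normal(1000)
    return norm.fit(data)


first = fit_with_seed(42)
second = fit_with_seed(42)
print(first == second)  # True: same seed, same data, same estimates
```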

Conclusion

In this guide to MATLAB and Python, we looked at two MATLAB functions, histfit and fitdist. histfit plots a histogram with a fitted distribution curve, and fitdist estimates the parameters of a distribution. We then recreated their results in Python.

The outputs will not match exactly, and histfit has no single-function equivalent in Python. Even so, combining NumPy, Matplotlib, and SciPy gives a practical equivalent to both MATLAB functions.
