# Matplotlib Histogram from Basic to Advanced

Newspapers often use histograms and pie charts to explain stock market, financial, or COVID-19 data. Charts like these let us see the shape of the data at a glance. In this article, we are going to learn about histograms, from the basics to more advanced plots, to help you with your data analysis or machine learning projects.

## What is a histogram?

A histogram is a type of bar plot used to represent the distribution of numerical data. It divides the entire range of values into a series of intervals, called bins, and counts how many values (the frequency) fall into each one. The X-axis shows the bin ranges and the Y-axis shows the frequency. The matplotlib.pyplot.hist() function plots a histogram.

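To see the binning and counting step on its own, here is a minimal sketch using NumPy's np.histogram(), which computes bin counts and bin edges without drawing anything. The small dataset and the choice of four bins are illustrative assumptions, not values from this article.

```python
import numpy as np

# A small, made-up dataset
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 7])

# Split the range [1, 7] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(data, bins=4)

print(edges)   # bin boundaries: 1.0, 2.5, 4.0, 5.5, 7.0
print(counts)  # [3 3 3 1]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which also includes its right edge; that is why the value 7 is counted.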
## What is the Matplotlib library in Python?

Matplotlib is one of the most commonly used data visualization libraries in Python. It works well for simple as well as complex visualizations.

Let us quickly take a look at the syntax of the matplotlib histogram function:

```python
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None,
                       cumulative=False, bottom=None, histtype='bar', align='mid',
                       orientation='vertical', rwidth=None, log=False, color=None,
                       label=None, stacked=False, *, data=None, **kwargs)
```
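plt.hist() returns three values: n, the height of each bar (a count, or a density when density=True); bins, the bin edges (one more than the number of bins); and patches, the drawn bar objects. The short sketch below uses made-up data to show these return values together with the range and cumulative parameters.

```python
import matplotlib.pyplot as plt

# Made-up data for illustration
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]

# range fixes the histogram limits; cumulative=True turns counts into running totals
n, bins, patches = plt.hist(data, bins=6, range=(1, 7), cumulative=True)

print(n)             # cumulative counts: 1, 3, 6, 8, 9, 10
print(bins)          # 7 edges for 6 bins: 1.0 through 7.0
print(len(patches))  # 6 bar objects

plt.show()
```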

## Importing Matplotlib and Necessary Libraries

Before we begin plotting, let's import Matplotlib and the other libraries we need.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```

Let's start with a basic histogram and then move on to more advanced histogram plots.

## Histogram with Basic Distribution

To create a histogram of a basic distribution, we use NumPy's random number generator to draw 10,000 samples from a normal distribution with a mean of 120 and a standard deviation of 30.

We pass the data and the number of bins (70) to the histogram function. It returns the height of each bin, the bin edges, and the patches (the bar objects that were drawn).

We also pass density=True, which scales the bar heights so that the total area of the histogram equals 1, along with facecolor and alpha to set the bar color and transparency. The histogram type is set to 'bar' with histtype. You can experiment with the number of bins to see how it changes the shape of the histogram.

plt.xlim() and plt.ylim() set the minimum and maximum values shown on the X and Y axes, respectively. plt.grid(True) adds grid lines; if you do not want them, pass False or leave the call out.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Using numpy random function to generate random data
np.random.seed(19685689)

mu, sigma = 120, 30
x = mu + sigma * np.random.randn(10000)

# Plotting the histogram
n, bins, patches = plt.hist(x, 70, histtype='bar', density=True, facecolor='yellow', alpha=0.80)

plt.xlabel('Values')
plt.ylabel('Probability Distribution')
plt.title('Histogram showing Data Distribution')
plt.xlim(50, 180)
plt.ylim(0, 0.04)
plt.grid(True)
plt.show()
```
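Because density=True scales the bars so the total area is 1, the Y-axis reads as a probability density rather than a count. A quick way to check this, sketched here with NumPy only and the same seed and parameters as above, is to multiply each bar height by its bin width and add the results.

```python
import numpy as np

np.random.seed(19685689)
x = 120 + 30 * np.random.randn(10000)

# Same binning as plt.hist(x, 70, density=True), without drawing
heights, edges = np.histogram(x, bins=70, density=True)

# Total area = sum of (height * bin width); should be 1 up to rounding error
area = np.sum(heights * np.diff(edges))
print(area)
```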

## Histogram Plots with Color Distribution

Coloring the bars of a histogram makes differences across the range of your data easier to see. We use plt.subplots() to create the figure and axes, remove the axes spines and the x and y tick marks for a cleaner look, and add grid lines. Passing tight_layout=True adjusts the padding around the plot.

To color the bars, we compute one value per bar from its height (the fifth root of its count), normalize these values to the range 0 to 1, and map each one through the reversed viridis colormap, so bars of different heights get different colors.

```python
import numpy as np
import matplotlib.pyplot as plt
# Importing the package for colors
from matplotlib import colors

# Forming the dataset with numpy random function
np.random.seed(190345678)
N_points = 100000
n_bins = 40

# Creating distribution
x = np.random.randn(N_points)
legend = ['distribution']

# Creating the figure and axes
fig, axs = plt.subplots(1, 1, figsize=(10, 7), tight_layout=True)

# Removing axes spines
for s in ['top', 'bottom', 'left', 'right']:
    axs.spines[s].set_visible(False)

# Removing x, y ticks
axs.xaxis.set_ticks_position('none')
axs.yaxis.set_ticks_position('none')

# Adding grid lines
axs.grid(True, color='pink', linestyle='-.', linewidth=0.6, alpha=0.6)

# Plotting the histogram; N holds the count in each bin
N, bins, patches = axs.hist(x, bins=n_bins)

# One value per bar based on its height, normalized to the range 0 to 1
fracs = (N ** (1 / 5)) / N.max()
norm = colors.Normalize(fracs.min(), fracs.max())

# Setting each bar's color from the reversed viridis colormap
for thisfrac, thispatch in zip(fracs, patches):
    color = plt.cm.viridis_r(norm(thisfrac))
    thispatch.set_facecolor(color)

# Adding labels and a legend
plt.xlabel("X-axis")
plt.ylabel("y-axis")
plt.legend(legend)

plt.show()
```
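The fifth-root scaling above compresses the differences between tall bars. As a minimal sketch of an alternative, the bars can instead be colored in direct proportion to their height with N / N.max(), so the tallest bar always gets the end of the colormap. The seed and the 20 bins are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

np.random.seed(0)
x = np.random.randn(10000)

fig, ax = plt.subplots()
N, bins, patches = ax.hist(x, bins=20)

# Linear mapping: the tallest bar gets 1.0, an empty bar would get 0.0
fracs = N / N.max()
norm = colors.Normalize(0, 1)
cmap = plt.cm.viridis_r

for frac, patch in zip(fracs, patches):
    patch.set_facecolor(cmap(norm(frac)))

plt.show()
```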

## Histogram Plotting with Bars

This one is straightforward. We create random data with NumPy's random function, a 10,000 by 5 array in which each column is treated as a separate dataset, and pass it to hist() with histtype='bar', which draws the five datasets side by side in each bin. You can also set histtype to 'barstacked', 'step', or 'stepfilled'.

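```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(9**7)
n_bins = 15
# 10,000 samples for each of 5 datasets (one per column)
x = np.random.randn(10000, 5)

bar_colors = ['blue', 'pink', 'orange', 'green', 'red']

# histtype='bar' draws the five datasets side by side in each bin
plt.hist(x, n_bins, density=True, histtype='bar', color=bar_colors, label=bar_colors)

plt.legend(prop={'size': 10})

plt.show()
```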
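To compare the four histtype values Matplotlib accepts ('bar', 'barstacked', 'step', and 'stepfilled'), a small sketch can draw the same five-column dataset once per value on a 2 by 2 grid of subplots. The layout and figure size are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(9**7)
x = np.random.randn(10000, 5)
bar_colors = ['blue', 'pink', 'orange', 'green', 'red']

# One subplot per histtype value
fig, axes = plt.subplots(2, 2, figsize=(10, 7), tight_layout=True)
histtypes = ['bar', 'barstacked', 'step', 'stepfilled']

for ax, htype in zip(axes.flat, histtypes):
    ax.hist(x, 15, density=True, histtype=htype, color=bar_colors, label=bar_colors)
    ax.set_title(htype)

axes[0, 0].legend(prop={'size': 8})
plt.show()
```

'bar' and 'barstacked' draw filled rectangles (side by side or stacked), while 'step' draws only outlines and 'stepfilled' fills the area under each outline.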

## KDE Plot and Histogram

Another useful way to present a histogram is together with a KDE (kernel density estimation) curve. A KDE estimates the probability density of the data as a smooth curve, so combining it with a histogram gives two complementary views of the same distribution. For this, we first create a DataFrame of random normal samples by generating random means and standard deviations, passing the means to the loc parameter and the standard deviations to the scale parameter. We then draw the histogram and the KDE on the same subplot.

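```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(9**7)

# Random means (loc) and standard deviations (scale) for three columns
means = np.random.randint(0, 10, size=3)
stds = np.random.randint(1, 4, size=3)
df = pd.DataFrame(np.random.normal(loc=means, scale=stds, size=(1000, 3)),
                  columns=['a', 'b', 'c'])

fig, ax = plt.subplots(figsize=(10, 7))

# Histogram with density=True so it shares a scale with the KDE curves
df.plot.hist(ax=ax, bins=30, density=True, alpha=0.5)

# KDE curves on the same axes (requires SciPy to be installed)
df.plot.kde(ax=ax)

plt.show()
```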
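To see what a KDE curve actually computes, here is a minimal sketch in plain NumPy: it places a Gaussian bump (the kernel) on every sample and averages them. The bandwidth follows Scott's rule, h = σ · n^(-1/5) in one dimension, which is the default rule in SciPy's scipy.stats.gaussian_kde; the helper name gaussian_kde_1d, the data, and the evaluation grid are hypothetical, chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: average of Gaussian kernels centered at each sample."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)  # Scott's rule in one dimension
    # (grid points x samples) matrix of standardized distances
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.mean(axis=1) / h

np.random.seed(9**7)
data = np.random.normal(loc=5, scale=2, size=1000)
grid = np.linspace(data.min() - 3, data.max() + 3, 400)
density = gaussian_kde_1d(data, grid)

fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(data, bins=30, density=True, alpha=0.5, label='histogram')
ax.plot(grid, density, label='KDE')
ax.legend()
plt.show()
```

A smaller bandwidth makes the curve follow individual samples more closely; a larger one smooths it out, much like changing the number of bins in a histogram.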

## Histogram with Multiple Variables

In this example, we use the ramen-ratings dataset to plot a histogram with multiple variables. We select the star ratings for three different ramen styles (Bowl, Cup, and Pack) and assign each to its own variable. We then call hist() three times, once per style, and overlay the three normalized histograms to compare how the ratings are distributed across the styles.

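```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed local copy of the ramen-ratings dataset; adjust the path as needed
df = pd.read_csv('ramen-ratings.csv')
# Convert Stars to numbers; non-numeric entries become NaN and are dropped below
df['Stars'] = pd.to_numeric(df['Stars'], errors='coerce')

x1 = df.loc[df.Style == 'Bowl', 'Stars'].dropna()
x2 = df.loc[df.Style == 'Cup', 'Stars'].dropna()
x3 = df.loc[df.Style == 'Pack', 'Stars'].dropna()

# Shared settings; density=True normalizes each histogram to unit area
kwargs = dict(alpha=0.5, bins=60, density=True, stacked=False)

# Plotting the histograms
plt.hist(x1, **kwargs, histtype='stepfilled', color='b', label='Bowl')
plt.hist(x2, **kwargs, histtype='stepfilled', color='r', label='Cup')
plt.hist(x3, **kwargs, histtype='stepfilled', color='y', label='Pack')
plt.gca().set(title='Histogram of Ratings by Style', ylabel='Probability density')
plt.xlim(2, 5)
plt.legend()
plt.show()
```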
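If you don't have the ramen dataset at hand, the same overlay technique can be tried on synthetic data. The sketch below builds a hypothetical DataFrame with Style and Stars columns (the values are randomly generated, not real ratings) and uses groupby() to draw one normalized histogram per group instead of writing one hist() call per variable.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: random ratings for three groups (not the real dataset)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Style': np.repeat(['Bowl', 'Cup', 'Pack'], 300),
    'Stars': np.clip(np.concatenate([
        rng.normal(3.7, 0.6, 300),
        rng.normal(3.5, 0.7, 300),
        rng.normal(3.8, 0.5, 300),
    ]), 0, 5),
})

fig, ax = plt.subplots(figsize=(10, 7))
kwargs = dict(alpha=0.5, bins=30, density=True, histtype='stepfilled')

# One histogram per group, overlaid on the same axes
for style, stars in df.groupby('Style')['Stars']:
    ax.hist(stars, label=style, **kwargs)

ax.set(title='Histogram of Ratings by Style', xlabel='Stars', ylabel='Probability density')
ax.legend()
plt.show()
```

Looping over groupby() keeps the code the same length no matter how many groups the data contains.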

## Two-Dimensional Histogram

A 2D histogram is another interesting way to visualize your data. It divides the plane into rectangular bins, counts the (x, y) pairs that fall into each one, and shows the counts as colors. We can plot one with the plt.hist2d() function and customize the plot and the bin size just as in the previous examples. Let's look at a very simple example below.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generating random, correlated data
n = 1000
x = np.random.standard_normal(n)
y = 5.0 * x + 3.0 * np.random.standard_normal(n)

fig, ax = plt.subplots(figsize=(10, 7))

# Plotting 2D Histogram
plt.hist2d(x, y, bins=100)
plt.title("2D Histogram")

plt.show()
```
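Under the hood, a 2D histogram is a grid of counts. As a small sketch, NumPy's np.histogram2d() computes that grid directly (plt.hist2d() also returns it as its first value). A coarser 10 by 10 binning and a fixed seed are chosen here for readability; by default the bins span the full data range, so every point lands in exactly one cell.

```python
import numpy as np

np.random.seed(0)
n = 1000
x = np.random.standard_normal(n)
y = 5.0 * x + 3.0 * np.random.standard_normal(n)

# counts[i, j] is the number of points with x in x-bin i and y in y-bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=10)

print(counts.shape)       # (10, 10)
print(int(counts.sum()))  # 1000: every point is counted once
```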

## Conclusion

In summary, we learned five different ways to plot and customize histograms, as well as how to create a histogram with multiple variables from a dataset. These techniques should help you visualize the data in your data science projects.