5 NumPy Data Distributions to Know

Hello, readers! In this article, we will focus on five NumPy data distributions in Python. So, let us get started! 🙂

To begin with, a data distribution gives us an overview of how data values are spread out. It lists all the possible values (or ranges of values) in the data, together with how often each value occurs.
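As a quick illustration of "values plus frequencies", here is a minimal sketch that uses a small hand-made array (hypothetical data, not from any real dataset) and NumPy's np.unique() with return_counts=True to list each distinct value and how often it occurs:

```python
import numpy as np

# Hypothetical sample data used only for illustration.
data = np.array([3, 1, 3, 2, 3, 1])

# Distinct values (sorted) and the number of times each one occurs.
values, counts = np.unique(data, return_counts=True)

print(values)  # [1 2 3]
print(counts)  # [2 1 3]
```

Together, the two arrays describe the distribution of this small dataset: the value 3 is the most frequent, and 2 is the least frequent.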

NumPy's random module (numpy.random) provides functions that generate random samples from many common data distributions.
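The examples in this article call the legacy functions through from numpy import random, so every run prints different numbers. The NumPy documentation recommends the Generator API, created with np.random.default_rng(), for new code; its methods (zipf, pareto, rayleigh, exponential, choice) take the same arguments used here. A minimal sketch, assuming an arbitrary seed value, of how seeding makes results repeatable:

```python
import numpy as np

# Two generators created with the same (arbitrary) seed ...
rng_a = np.random.default_rng(seed=2024)
rng_b = np.random.default_rng(seed=2024)

# ... produce exactly the same samples, which makes examples repeatable.
first = rng_a.exponential(scale=2, size=(2, 4))
second = rng_b.exponential(scale=2, size=(2, 4))

print(np.array_equal(first, second))  # True
```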

NumPy Data Distributions

Let’s work with the following NumPy data distributions:

Zipf distribution
Pareto distribution
Rayleigh distribution
Exponential distribution
Random distribution with choice() function

1. Zipf Distribution

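The Zipf distribution is based on Zipf’s law, which, in its classic form, states that the xth most common element occurs 1/x times as often as the most common element. NumPy generalizes this with an exponent a: the probability of drawing the value x is proportional to 1/x^a, so with a = 2 the value 2 appears about 1/4 as often as the value 1.

NumPy's random.zipf() function draws samples from a Zipf distribution and returns them as an array.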

Syntax:

random.zipf(a,size)

a: distribution parameter (the exponent); it must be greater than 1.
size: shape of the resulting array.

Example:

from numpy import random

data = random.zipf(a=2, size=(2, 4))

print(data)

Output (the values are random, so yours will differ):

[[   2   24    1    1]
 [   4 1116    4    4]]
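To see the law in action, we can draw many samples and compare how often each value appears relative to the value 1. This is a minimal sketch, assuming NumPy's Generator API (np.random.default_rng) with an arbitrary fixed seed; for a = 2, the value k should appear roughly 1/k**2 times as often as 1:

```python
import numpy as np

# Arbitrary fixed seed so the run is reproducible.
rng = np.random.default_rng(seed=42)

a = 2.0
samples = rng.zipf(a, size=100_000)

# Count how often each value (1, 2, 3, ...) was drawn.
values, counts = np.unique(samples, return_counts=True)
freq = dict(zip(values.tolist(), counts.tolist()))

# P(k) is proportional to 1 / k**a, so value k should appear
# about 1 / k**a times as often as value 1.
for k in range(1, 5):
    observed = freq[k] / freq[1]
    expected = 1 / k**a
    print(f"value {k}: observed ratio {observed:.3f}, expected {expected:.3f}")
```

The observed ratios land close to 1, 0.25, 0.111 and 0.0625, while a few rare draws are very large, which explains outliers such as 1116 in the output above.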

2. Pareto Distribution

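It follows Pareto’s law (the 80/20 rule), which states that roughly 20 percent of the causes produce 80 percent of the outcomes. For the classical Pareto distribution, a shape value of about 1.16 (log 5 / log 4) gives exactly this 80/20 split. The pareto() function draws random samples from a Pareto distribution. Note that NumPy actually samples the Pareto II (Lomax) form, whose values start at 0 rather than at a minimum value; this is why the example output contains values smaller than 1.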

Have a look at the below syntax!

random.pareto(a,size)

a: shape parameter; it must be greater than 0. Smaller values give a heavier tail.
size: dimensions of the resultant array.

Example:

from numpy import random

data = random.pareto(a=2, size=(2, 4))

print(data)

Output:

[[2.33897169 0.40735475 0.39352079 2.68105791]
 [0.02858458 0.60243598 1.17126724 0.36481641]]
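Because random.pareto() returns Lomax (Pareto II) samples, the NumPy documentation notes that the classical Pareto distribution with minimum value m is obtained by adding 1 and multiplying by m. A minimal sketch, assuming the Generator API, an arbitrary seed, and illustrative values a = 3 and m = 2 (hypothetical choices, picked so the mean is finite):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

a = 3.0   # shape parameter
m = 2.0   # hypothetical minimum value (scale) of the classical Pareto

lomax = rng.pareto(a, size=200_000)   # Pareto II / Lomax, values in [0, inf)
classical = (lomax + 1) * m           # classical Pareto, values in [m, inf)

# For a > 1: Lomax mean = 1 / (a - 1), classical mean = a * m / (a - 1).
print(lomax.mean(), 1 / (a - 1))
print(classical.min() >= m)
print(classical.mean(), a * m / (a - 1))
```

The shift only moves the starting point of the distribution; the shape parameter a still controls how heavy the tail is.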

3. Rayleigh Distribution

The Rayleigh distribution is used in signal processing, for example to model the amplitude (magnitude) of a signal whose two perpendicular components are independent, zero-mean normal random variables.

Have a look at the below syntax!

random.rayleigh(scale,size)

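scale: the scale parameter σ, which is also the mode (the most likely value) of the distribution; it is not the standard deviation. Larger values stretch the distribution, making it wider and flatter. It must be non-negative; the default is 1.0.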
size: The dimensions of the output array.

Example:

from numpy import random

data = random.rayleigh(scale=2, size=(2, 4))

print(data)

Output:

[[3.79504431 2.24471025 2.3216389  4.01435725]
 [3.1247996  1.08692756 3.03840615 2.35757077]]
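The signal-processing interpretation can be checked numerically: the length of a 2-D vector whose components are independent normal values with standard deviation σ follows a Rayleigh distribution with scale σ, whose mean is σ·sqrt(π/2). A minimal sketch, assuming the Generator API and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sigma = 2.0
n = 100_000

# Samples drawn directly from the Rayleigh distribution.
direct = rng.rayleigh(scale=sigma, size=n)

# Magnitude of a 2-D vector with independent N(0, sigma**2) components.
x = rng.normal(0.0, sigma, size=n)
y = rng.normal(0.0, sigma, size=n)
via_normals = np.hypot(x, y)

expected_mean = sigma * np.sqrt(np.pi / 2)  # theoretical Rayleigh mean
print(direct.mean(), via_normals.mean(), expected_mean)
```

Both sample means come out close to the theoretical value of about 2.507.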

4. Exponential Distribution

The exponential distribution describes the waiting time until the next event occurs, when events happen independently at a constant average rate. For example, it can model the time between customer arrivals at a shop or the time between failures of a machine.

Syntax:

random.exponential(scale, size)

scale: the inverse of the rate (scale = 1/λ, where λ is the average number of events per unit time); it is also the mean waiting time. Default value = 1.0
size: The shape of the output array.

Example:

from numpy import random

data = random.exponential(scale=2, size=(2, 4))

print(data)

Output:

[[0.56948472 0.08230081 1.39297867 5.97532969]
 [1.51290257 0.95905262 4.40997749 7.25248917]]
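Since scale is 1/rate, a process with 0.5 events per minute on average corresponds to scale=2, which is the example above. A minimal sketch, assuming the Generator API, an arbitrary seed, and a hypothetical rate of 0.5 events per minute, checking that the mean wait is close to scale and that P(wait > t) is close to exp(-t / scale):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

rate = 0.5          # hypothetical: on average 0.5 events per minute
scale = 1 / rate    # NumPy expects scale = 1 / rate, i.e. 2.0 minutes

waits = rng.exponential(scale=scale, size=200_000)

# The mean waiting time should be close to the scale.
print(waits.mean())

# P(wait > t) = exp(-t / scale); for t = scale this is exp(-1), about 0.368.
t = 2.0
print((waits > t).mean(), np.exp(-t / scale))
```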

5. Random Distribution with choice() function

The choice() function of the random module draws random samples from a given set of values, where each value can be assigned its own probability. This lets us build a custom discrete distribution.

Each probability lies between 0 and 1: 0 means the value will never be drawn, and 1 means it will be drawn every time.

Syntax:

random.choice(a, size=None, replace=True, p=None)

a: the elements from which the samples are drawn (a 1-D array or list; an integer n is treated like np.arange(n)). Its length must equal the length of p.
size: the shape of the output array, for example (2, 2) for a 2-D array.
replace: whether the same element may be drawn more than once (default True).
p: the probability of each element of a being drawn. The values of p must sum to 1. If p is omitted, all elements are equally likely.

Example:

from numpy import random

data = random.choice([1,3,5,7], p=[0.1, 0.3, 0.2, 0.4], size=(2, 2))

print(data)

Output:

[[7 7]
 [1 3]]
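With enough draws, the relative frequency of each value approaches its probability in p, and probabilities that do not sum to 1 are rejected with a ValueError. A minimal sketch, assuming the Generator API and an arbitrary seed, reusing the values and probabilities from the example above:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

values = [1, 3, 5, 7]
p = [0.1, 0.3, 0.2, 0.4]

draws = rng.choice(values, p=p, size=100_000)

# The relative frequency of each value should approach its probability.
uniq, counts = np.unique(draws, return_counts=True)
observed = counts / draws.size
for v, obs, prob in zip(uniq, observed, p):
    print(f"{v}: observed {obs:.3f}, expected {prob}")

# Probabilities that do not sum to 1 (here 1.1) are rejected.
try:
    rng.choice(values, p=[0.1, 0.3, 0.2, 0.5])
except ValueError as err:
    rejected = True
    print("ValueError:", err)
else:
    rejected = False
```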

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to Python programming, stay tuned.

Till then, happy learning! 🙂


Safa Mulani
