Understanding Probability Density and Distribution Functions

Probability helps us with making investment choices. Since we can’t foresee the future, investors have to go on likelihood and chance instead of certainty. So before an analyst suggests a stock, they’ll gauge the probability it will gain or lose value.

In this article, we’ll look at two types of probability distributions: discrete and continuous. Simply put, a discrete distribution describes outcomes that can be counted, while a continuous distribution describes outcomes that can take any value in a range. We’ll work through common examples, including the uniform, binomial, Poisson, geometric, normal, and t-distributions.

Exploring Probability Distributions for Investment Decisions

Probability distributions are generally divided into two groups: discrete distributions and continuous distributions.

Discrete distributions are those where the random variable can take only a countable set of values, either finite (such as the faces of a die) or countably infinite (such as 0, 1, 2, …). The simplest example is flipping a coin, which has two outcomes: Heads (H) and Tails (T). In this article, we will discuss the discrete uniform, binomial, Poisson, and geometric distributions.

In contrast, a continuous random variable can take any value within a given range. For example, if the range is [2, 5], a value such as 3.145159256… is possible, so the set of possible values is uncountable. We will discuss the continuous uniform distribution, the normal distribution, and the t-distribution in this reading.

Understanding Discrete Probability Distributions

The discrete uniform distribution is the easiest to understand. It has a finite number of possible outcomes, each of which is equally likely. Its probability mass function is

$$f(x) = \frac{1}{n}$$

Here ‘n’ is the total number of possible outcomes.

import matplotlib.pyplot as plt
import numpy as np

# Define parameters
n = 10  # Number of outcomes

# Generate discrete uniform distribution data
x = np.arange(n)  # Possible outcomes (0 to n-1)
p = np.ones(n) / n  # Equal probability for each outcome

# Create the plot
plt.figure(figsize=(8, 6))
plt.bar(x, p, color='skyblue', edgecolor='black')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.title('Discrete Uniform Distribution (n={})'.format(n))
plt.grid(True)
plt.show()

Figure: Discrete Uniform Distribution

In the above plot, each of the ten outcomes has the same probability of being selected, i.e. 1/10 = 10%.
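
As a quick sanity check on the 10% figure, the sketch below draws many outcomes and compares the observed frequencies with 1/n. It uses only the Python standard library; the number of draws and the seed are arbitrary choices made for illustration, not values from the article.

```python
import random
from collections import Counter

n = 10           # Number of equally likely outcomes (0 to n-1)
draws = 100_000  # Number of simulated draws (illustrative choice)

rng = random.Random(42)  # Fixed seed so the run is repeatable
counts = Counter(rng.randrange(n) for _ in range(draws))

# Observed relative frequency of each outcome
empirical = {k: counts[k] / draws for k in range(n)}
theoretical = 1 / n

for outcome, freq in empirical.items():
    print(f"outcome {outcome}: empirical {freq:.4f}, theoretical {theoretical:.4f}")
```

With enough draws, every empirical frequency should sit close to 0.10, which is what the flat bar chart expresses.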

The binomial distribution models the number of successes in n independent trials, where each trial has only two possible outcomes, success or failure, and the same probability of success. Its probability mass function is:

$$f(x) = \binom{n}{x} p^{x} q^{n-x}$$

Here n is the number of trials, p is the probability of success, and q = 1 − p is the probability of failure. The random variable x is the number of successes out of n trials. Let’s look at its Python implementation.

import matplotlib.pyplot as plt
import scipy.stats as stats

# Define parameters
n = 10  # Number of trials
p = 0.4  # Probability of success

# Generate binomial distribution data
x = range(n + 1)  # Possible outcomes (0 to n successes)
p_vals = stats.binom.pmf(x, n, p)  # Probability of each outcome

# Create the plot
plt.figure(figsize=(8, 6))
plt.bar(x, p_vals, color='skyblue', edgecolor='black')
plt.xlabel('Number of successes')
plt.ylabel('Probability')
plt.title('Binomial Distribution (n={}, p={})'.format(n, p))
plt.grid(True)
plt.show()

Figure: Binomial Distribution

With n=10 and p=0.4, the graph shows the probability of each possible number of successes, with x ranging from 0 to 10.
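
To connect the formula with the plotted values, here is a minimal sketch that evaluates the binomial formula directly with the standard library instead of scipy. The helper name `binomial_pmf` and the "at least 6 successes" question are illustrative assumptions, not part of the original example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # Probability of failure
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.4
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                         # Should be 1 (up to rounding)
most_likely = max(range(n + 1), key=lambda x: pmf[x])    # Tallest bar in the plot
p_at_least_6 = sum(pmf[6:])                              # P(X >= 6)

print(f"sum of probabilities: {total:.6f}")
print(f"most likely number of successes: {most_likely}")
print(f"P(X >= 6) = {p_at_least_6:.4f}")
```

Summing slices of the probability list like this is how questions such as "what is the chance of at least 6 successes?" are answered from the distribution.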

The Poisson distribution gives the probability of a given number of occurrences of an event within a specified interval of time, assuming the events happen randomly and independently at a constant average rate. Let’s look at its formula:

$$f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

Here λ is the average number of occurrences in the interval, and ‘x’ is the Poisson random variable, which can be 0, 1, 2, and so on.

Let’s look at its Python code.

import matplotlib.pyplot as plt
import scipy.stats as stats

# Define parameters
lambda_param = 5  # Mean number of occurrences

# Generate Poisson distribution data
x = range(10)  # Possible outcomes (0 to 9 occurrences)
p_vals = stats.poisson.pmf(x, lambda_param)  # Probability of each outcome

# Create the plot
plt.figure(figsize=(8, 6))
plt.bar(x, p_vals, color='skyblue', edgecolor='black')
plt.xlabel('Number of occurrences')
plt.ylabel('Probability')
plt.title('Poisson Distribution (lambda={})'.format(lambda_param))
plt.grid(True)
plt.show()

Figure: Poisson Distribution

The above plot shows the probabilities of 0 to 9 occurrences when λ=5.
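
The Poisson formula can also be evaluated by hand. The sketch below does so with the standard library for the same λ=5; the helper name `poisson_pmf` is an illustrative assumption. It also sums the plotted probabilities, since a Poisson variable has no upper limit and a plot over 0 to 9 cannot cover all of the probability.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 5
pmf = [poisson_pmf(x, lam) for x in range(10)]  # x = 0, 1, ..., 9

covered = sum(pmf)  # P(X <= 9): the share of probability shown in the plot

for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob:.4f}")
print(f"probability covered by x = 0..9: {covered:.4f}")
```

The remaining probability belongs to 10 or more occurrences, which is small for λ=5 but never exactly zero.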

The last discrete distribution we tackle is the geometric distribution. Here, independent trials with the same probability of success are repeated until the first success occurs, and the random variable counts how many trials that takes. Let’s look at its formula:

$$P(X = x) = (1 - p)^{x-1} p$$

‘p’ is the probability of success on each trial, and ‘x’ is the number of the trial on which the first success occurs (x = 1, 2, 3, …). Let’s look at its Python implementation as well.

import matplotlib.pyplot as plt
import scipy.stats as stats

# Define parameters
p = 0.3  # Probability of success

# Generate geometric distribution data
x = range(1, 11)  # Possible outcomes (1 to 10 trials)
p_vals = stats.geom.pmf(x, p)  # Probability of each outcome

# Create the plot
plt.figure(figsize=(8, 6))
plt.bar(x, p_vals, color='skyblue', edgecolor='black')
plt.xlabel('Number of trials until success')
plt.ylabel('Probability')
plt.title('Geometric Distribution (p={})'.format(p))
plt.grid(True)
plt.show()

Figure: Geometric Distribution

The above plot shows, for p=0.3, the probability that the first success occurs on trial x, for x ranging from 1 to 10. The probabilities shrink by a factor of (1 − p) with each additional trial.
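
As a small worked check of the geometric formula, the sketch below computes the same probabilities without scipy and adds two derived quantities: the chance of succeeding within the first 10 trials, and the expected number of trials, 1/p. The helper name `geometric_pmf` is an assumption made for illustration.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p

p = 0.3
pmf = [geometric_pmf(x, p) for x in range(1, 11)]  # Trials 1 to 10

p_within_10 = sum(pmf)   # Equals 1 - (1 - p)**10
expected_trials = 1 / p  # Mean of the geometric distribution

print(f"P(first success on trial 1) = {pmf[0]:.2f}")
print(f"P(first success on trial 2) = {pmf[1]:.2f}")
print(f"P(success within 10 trials) = {p_within_10:.4f}")
print(f"expected number of trials   = {expected_trials:.2f}")
```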

Understanding Continuous Probability Distributions

The continuous uniform distribution is a simple but very important probability distribution in statistics. Its probability density is constant over a given range, so intervals of equal length within that range are equally likely.

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{if } a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

In the above formula, ‘a’ and ‘b’ denote the starting and ending points of the range. Let’s look at how the continuous uniform distribution works in Python.

#import required modules
import matplotlib.pyplot as plt
import numpy as np

# Define parameters
a = 0  # Lower bound
b = 5  # Upper bound
size = 1000  # Number of grid points

# Build an evenly spaced grid of x values over [a, b] (these are not random samples)
x = np.linspace(a, b, size)

# Define the PDF (Probability Density Function)
def pdf(x, a, b):
  # Constant density 1/(b - a) inside [a, b], zero outside
  return np.where((x >= a) & (x <= b), 1 / (b - a), 0.0)

# Calculate PDF values on the grid
y = pdf(x, a, b)

# Plot the PDF
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Continuous Uniform Distribution PDF (a={}, b={})'.format(a, b))
plt.grid(True)
plt.legend()
plt.show()

Figure: Continuous Uniform Distribution

In the above plot, the density is a flat line at height 1/(b − a) = 0.2 over the range [0, 5], so the area under it forms a rectangle with a total area of 1.
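
For a continuous distribution, probabilities come from areas under the density rather than from the density values themselves. The sketch below illustrates this for the same range [0, 5]; the function names and the interval from 1 to 3 are illustrative assumptions.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0, 5

# Probability of landing between 1 and 3: width of the interval times the height
p_1_to_3 = uniform_cdf(3, a, b) - uniform_cdf(1, a, b)

print(f"density inside the range: {uniform_pdf(2, a, b):.2f}")
print(f"density outside the range: {uniform_pdf(6, a, b):.2f}")
print(f"P(1 <= X <= 3) = {p_1_to_3:.2f}")
```

The interval is 2 units wide and the density is 0.2, so the probability is 2 × 0.2 = 0.4.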

The normal distribution, also known as the Gaussian distribution, is a probability distribution whose mean, median, and mode are the same. Its density is a bell-shaped curve that is symmetric around the mean. Let’s look at its formula to analyze it in more depth.

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

It has two parameters: μ (the mean) and σ (the standard deviation).

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

def plot_normal_distribution(method="sample", mean=50, std=10, size=1000, bins=20):
  """
  Plots a normal distribution using either random samples or PDF calculation.

  Args:
      method (str, optional): "sample" to generate random samples, "pdf" to plot PDF. Defaults to "sample".
      mean (float, optional): Mean of the distribution. Defaults to 50.
      std (float, optional): Standard deviation of the distribution. Defaults to 10.
      size (int, optional): Number of samples to generate (only used for "sample" method). Defaults to 1000.
      bins (int, optional): Number of bins for histogram (only used for "sample" method). Defaults to 20.
  """
  if method == "sample":
    data = np.random.normal(mean, std, size)
    plt.hist(data, bins=bins, density=True, edgecolor='black', label="Sample-based")
  elif method == "pdf":
    x = np.linspace(mean - 3*std, mean + 3*std, 1000)
    y = norm.pdf(x, loc=mean, scale=std)
    plt.plot(x, y, label="PDF-based")
  else:
    raise ValueError("Invalid method specified. Choose 'sample' or 'pdf'.")

  plt.xlabel('Value')
  plt.ylabel('Density')
  plt.title('Normal Distribution (μ={}, σ={})'.format(mean, std))
  plt.grid(True)
  plt.legend()
  plt.show()

# Choose your preferred method and customize parameters
plot_normal_distribution(method="sample", mean=75, std=15, size=500)  # Sample-based with custom parameters
#plot_normal_distribution(method="pdf", mean=20, std=5)  # PDF-based with different parameters

Figure: Normal Distribution Curve

The above plot is a histogram of 500 random samples drawn from a normal distribution with a mean of 75 and a standard deviation of 15.
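
One practical consequence of the bell shape is the well-known 68-95-99.7 rule: roughly that share of values falls within 1, 2, and 3 standard deviations of the mean. The sketch below checks it for μ=75 and σ=15 using `statistics.NormalDist` from the standard library (Python 3.8+) rather than scipy; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 75, 15
dist = NormalDist(mu=mu, sigma=sigma)

# Probability of falling within k standard deviations of the mean
within = {
    k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    for k in (1, 2, 3)
}

peak_density = dist.pdf(mu)  # Height of the curve at the mean

for k, prob in within.items():
    print(f"P(within {k} standard deviation(s)) = {prob:.4f}")
print(f"density at the mean: {peak_density:.5f}")
```

These proportions depend only on the number of standard deviations, so they hold for any choice of μ and σ.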

The Student’s t-distribution, or t-distribution, is similar to the normal distribution but has fatter tails. It is typically used when the sample size is small (commonly fewer than 30 observations) and the population standard deviation is unknown. Its shape is controlled by a parameter known as df, or degrees of freedom. Let’s look at its Python code and the subsequent plot.

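import matplotlib.pyplot as plt
import numpy as np
from scipy.special import gamma
from scipy.stats import norm, t

# Define parameters
mu = 50  # Location (center) of both curves
std = 10  # Scale parameter
df = 5  # Degrees of freedom
x = np.linspace(mu - 3*std, mu + 3*std, 1000)  # Range of values

# Define the location-scale t-distribution PDF, including its normalization constant
def t_pdf(x, mu, std, df):
  z = (x - mu) / std  # Standardized distance from the center
  const = gamma((df + 1) / 2) / (np.sqrt(df * np.pi) * gamma(df / 2) * std)
  return const * (1 + z**2 / df)**(-(df + 1) / 2)

# Calculate PDF values and cross-check them against scipy's implementation
y = t_pdf(x, mu, std, df)
assert np.allclose(y, t.pdf(x, df, loc=mu, scale=std))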

# Plot the PDF and a normal distribution for comparison
plt.plot(x, y, label='t-distribution (df={})'.format(df))
plt.plot(x, norm.pdf(x, loc=mu, scale=std), label='Normal distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('t-distribution vs. Normal distribution (μ={}, σ={})'.format(mu, std))
plt.grid(True)
plt.legend()
plt.show()

Figure: T Distribution

The above plot compares a t-distribution with 5 degrees of freedom against a normal distribution with the same center and scale. The t-distribution has a lower peak and heavier tails. Once again, we use the t-distribution when the population standard deviation is unknown and the sample size is small.
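
To put a number on "fatter tails", the sketch below compares the standardized t density with the standard normal density at the center and at 3 units from it, using only the standard library. The function names are illustrative assumptions, and `lgamma` is used in place of `gamma` so that large degrees of freedom do not overflow.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(z, df):
    """Density of the standard t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (sqrt(df * pi) and 0)  # placeholder-free: see below
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + z * z / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return exp(-z * z / 2) / sqrt(2 * pi)

df = 5
center_t, center_normal = t_pdf(0, df), normal_pdf(0)
tail_t, tail_normal = t_pdf(3, df), normal_pdf(3)

print(f"at z=0: t = {center_t:.4f}, normal = {center_normal:.4f}")
print(f"at z=3: t = {tail_t:.4f}, normal = {tail_normal:.4f}")
print(f"tail density ratio at z=3: {tail_t / tail_normal:.1f}x")

# With many degrees of freedom the t density approaches the normal density
print(f"at z=0 with df=1000: t = {t_pdf(0, 1000):.4f}")
```

The t curve is lower at the center and higher in the tails, which is why extreme values are more likely under it than under a normal curve.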

Conclusion

These are just some of the basic distributions that budding investment analysts should have in their toolkit. Please note that there are many more distributions suited to different situations. Moreover, the Central Limit Theorem states that the distribution of the sample mean of independent observations with finite variance approaches a normal distribution as the sample size grows, whatever the shape of the underlying distribution.
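
The Central Limit Theorem can be seen in a small simulation. The sketch below, using only the standard library, repeatedly averages samples from a strongly skewed exponential distribution and checks how the averages behave. The sample size, number of repetitions, and seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # Fixed seed so the run is repeatable

sample_size = 50    # Observations averaged in each sample
repetitions = 5000  # Number of sample means collected

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1."""
    return mean(rng.expovariate(1.0) for _ in range(n))

means = [sample_mean(sample_size) for _ in range(repetitions)]

center = mean(means)                 # Should be near the true mean, 1
spread = stdev(means)                # Should be near 1 / sqrt(sample_size)
expected_spread = 1 / sqrt(sample_size)

# For a normal distribution, about 68% of values lie within one standard deviation
share_within_one_sd = sum(abs(m - center) <= spread for m in means) / repetitions

print(f"mean of sample means:  {center:.4f}")
print(f"stdev of sample means: {spread:.4f} (theory: {expected_spread:.4f})")
print(f"share within one stdev: {share_within_one_sd:.3f}")
```

Even though individual exponential draws are far from bell-shaped, their averages cluster around the true mean in a roughly normal pattern.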

Hope you enjoyed reading!!
