Bayesian Inference in Python: A Comprehensive Guide with Examples

Data-driven decision-making has become essential across various fields, from finance and economics to medicine and engineering. Understanding probability and statistics is crucial for making informed choices today. Bayesian inference, a powerful tool in probabilistic reasoning, allows us to update our beliefs about an event based on new evidence.

Bayes’s theorem, a fundamental result in probability theory, forms the foundation of Bayesian inference. It gives the probability of an event in light of prior knowledge and observed data: combining our initial belief (the prior) with the likelihood of the evidence yields an updated belief, the posterior probability.
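A minimal sketch of that combination, assuming hypothetical data of 7 successes in 10 trials and a flat prior: evaluate prior × likelihood on a grid of candidate values for the unknown success probability θ, then divide by the sum so the posterior weights add up to 1. For this case the exact posterior is Beta(8, 4), whose mean is 8/12 ≈ 0.667, so the grid result can be checked against it.

```python
import numpy as np

# Hypothetical data: 7 successes in 10 independent trials
successes, trials = 7, 10

# Discretize the unknown success probability theta on a grid
theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)  # flat prior over [0, 1]
likelihood = theta**successes * (1 - theta)**(trials - successes)  # binomial kernel

# Posterior is proportional to prior x likelihood; normalize so weights sum to 1
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

posterior_mean = np.sum(theta * posterior)
print(f"Posterior mean of theta: {posterior_mean:.3f}")
```

The same arrays can be passed to `plt.plot(theta, posterior)` to visualize the posterior. Grid approximation works for any prior and likelihood you can evaluate, but it becomes impractical once there are more than a few parameters.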

Put simply, Bayesian inference is a statistical method that uses Bayes’s theorem to update the probability of a hypothesis as new data become available, so that predictions and decisions can draw on both prior knowledge and observed data. In Python, it can be implemented with NumPy, to compute and sample from posterior distributions, and Matplotlib, to visualize them.
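To see what updating "as new data become available" means in practice, the sketch below assumes a Beta prior on a success probability, for which each 0/1 observation updates the parameters in closed form: a success adds 1 to α and a failure adds 1 to β. The outcomes are hypothetical. Feeding the observations one at a time, with each posterior serving as the next prior, ends in the same posterior as a single batch update.

```python
def update_beta(alpha, beta, outcome):
    """One Bayesian update of a Beta(alpha, beta) prior with a single 0/1 outcome."""
    return alpha + outcome, beta + (1 - outcome)

observations = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # hypothetical outcomes (1 = success)

# Sequential updating: the posterior after each observation becomes the next prior
alpha, beta = 1, 1  # uniform Beta(1, 1) prior
for x in observations:
    alpha, beta = update_beta(alpha, beta, x)

# Batch updating with all observations at once
n_success = sum(observations)
batch = (1 + n_success, 1 + len(observations) - n_success)

print((alpha, beta), batch)  # prints (5, 7) (5, 7)
```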

This article will explore Bayesian inference and its implementation using Python, a popular programming language for data analysis and scientific computing. We will start by understanding the fundamentals of Bayes’s theorem and formula, then move on to a step-by-step guide on implementing Bayesian inference in Python. Along the way, we will discuss a real-world example of predicting website conversion rates to illustrate the practical application of this powerful technique.

What is Bayesian Inference?

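Bayesian inference is based on Bayes’s theorem, which combines the prior probability of an event (what we believe before seeing the data) with the likelihood of the observed evidence to produce a posterior probability. As new evidence arrives, the posterior becomes the prior for the next update, so the estimate keeps being refined. Bayes’s theorem states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here P(A) is the prior probability of A, P(B | A) is the likelihood of observing the evidence B if A is true, P(B) is the overall probability of the evidence, and P(A | B) is the posterior probability of A after observing B.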
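To make the formula P(A | B) = P(B | A) P(A) / P(B) concrete, here is a small worked example with hypothetical numbers: a condition with 1% prevalence and a test with 95% sensitivity and a 5% false-positive rate. The denominator P(B) comes from the law of total probability, P(B | A) P(A) + P(B | not A) P(not A).

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) for a binary hypothesis A given evidence B."""
    # P(B) via the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate
p = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # prints 0.161
```

Even with an accurate test, the posterior is only about 16% because the prior is so small: most positive results come from the much larger group without the condition.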

Implementing Bayesian Inference in Python

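The code below estimates the mean μ and standard deviation σ of normally distributed data. It generates 100 synthetic observations with μ = 5 and σ = 2, places a normal prior on μ and a gamma prior on the precision 1/σ², updates both priors with the data in closed form, and then samples from the resulting posteriors. For simplicity, σ is treated as known (its true value) when updating μ, and the sample mean stands in for μ when updating the precision.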

import numpy as np
import matplotlib.pyplot as plt

# Generate some synthetic data
np.random.seed(42)
true_mu = 5
true_sigma = 2
data = np.random.normal(true_mu, true_sigma, size=100)

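# Define the prior hyperparameters
# Prior on mu: Normal(prior_mu_mean, 1 / prior_mu_precision)
prior_mu_mean = 0
prior_mu_precision = 1  # Variance = 1 / precision
# Prior on the precision tau = 1 / sigma^2: Gamma(shape=alpha, rate=beta)
prior_sigma_alpha = 2  # Shape of the Gamma prior
prior_sigma_beta = 2   # Rate of the Gamma prior

# Update the prior on mu, treating sigma as known (true_sigma) for simplicity
n = len(data)
posterior_mu_precision = prior_mu_precision + n / true_sigma**2
posterior_mu_mean = (prior_mu_precision * prior_mu_mean + np.sum(data) / true_sigma**2) / posterior_mu_precision

# Update the Gamma prior on the precision, plugging in the sample mean for mu
posterior_sigma_alpha = prior_sigma_alpha + n / 2
posterior_sigma_beta = prior_sigma_beta + np.sum((data - np.mean(data))**2) / 2

# Draw samples from the posterior distributions
posterior_mu = np.random.normal(posterior_mu_mean, 1 / np.sqrt(posterior_mu_precision), size=10000)
# np.random.gamma takes a scale parameter, so pass 1 / rate; these are precision samples
posterior_precision = np.random.gamma(posterior_sigma_alpha, 1 / posterior_sigma_beta, size=10000)
posterior_sigma = 1 / np.sqrt(posterior_precision)  # Convert precision to standard deviation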

# Plot the posterior distributions
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.hist(posterior_mu, bins=30, density=True, color='skyblue', edgecolor='black')
plt.title(r'Posterior distribution of $\mu$')
plt.xlabel(r'$\mu$')
plt.ylabel('Density')

plt.subplot(1, 2, 2)
plt.hist(posterior_sigma, bins=30, density=True, color='lightgreen', edgecolor='black')
plt.title(r'Posterior distribution of $\sigma$')
plt.xlabel(r'$\sigma$')
plt.ylabel('Density')

plt.tight_layout()
plt.show()

# Calculate summary statistics
mean_mu = np.mean(posterior_mu)
std_mu = np.std(posterior_mu)
print("Mean of mu:", mean_mu)
print("Standard deviation of mu:", std_mu)

mean_sigma = np.mean(posterior_sigma)
std_sigma = np.std(posterior_sigma)
print("Mean of sigma:", mean_sigma)
print("Standard deviation of sigma:", std_sigma)

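Running the script displays histograms of the posterior samples for μ and σ and prints the mean and standard deviation of each. Compare these with the values used to generate the data (μ = 5, σ = 2); because the prior mean of μ is 0, its posterior mean is pulled slightly from the sample mean toward 0.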
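As a sanity check on the update for μ, the sketch below compares the closed-form posterior mean (σ treated as known, as in the script) with a brute-force grid approximation of prior × likelihood. It draws its own synthetic data from `np.random.default_rng(0)`, so its numbers will differ from the script's output; the helper names are illustrative, not from any library.

```python
import numpy as np

def normal_mean_posterior(data, prior_mean, prior_precision, sigma):
    """Closed-form posterior for mu when sigma is treated as known."""
    n = len(data)
    post_precision = prior_precision + n / sigma**2
    post_mean = (prior_precision * prior_mean + np.sum(data) / sigma**2) / post_precision
    return post_mean, post_precision

def grid_posterior_mean(data, prior_mean, prior_precision, sigma):
    """Brute-force check: evaluate prior x likelihood on a dense grid of mu values."""
    mu_grid = np.linspace(-10, 20, 30001)
    log_prior = -0.5 * prior_precision * (mu_grid - prior_mean) ** 2
    # Sum of squared residuals for every grid value, via sufficient statistics
    n, s1, s2 = len(data), np.sum(data), np.sum(data**2)
    log_lik = -0.5 * (s2 - 2 * mu_grid * s1 + n * mu_grid**2) / sigma**2
    log_post = log_prior + log_lik
    weights = np.exp(log_post - log_post.max())  # subtract the max to avoid underflow
    weights /= weights.sum()
    return np.sum(mu_grid * weights)

rng = np.random.default_rng(0)
data = rng.normal(5, 2, size=100)

closed_mean, closed_precision = normal_mean_posterior(data, 0.0, 1.0, 2.0)
grid_mean = grid_posterior_mean(data, 0.0, 1.0, 2.0)
print(f"Closed form: {closed_mean:.4f}, grid: {grid_mean:.4f}")

# The posterior mean is a precision-weighted average of the prior mean and the sample mean
w = (len(data) / 2.0**2) / closed_precision
print(f"Weight on the sample mean: {w:.3f}")
```

With 100 observations and σ = 2, the data contribute a precision of 25 against the prior's 1, so the weight on the sample mean is 25/26 ≈ 0.96; a stronger prior or fewer observations would pull the estimate further toward the prior mean.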

Real-World Example: Predicting Website Conversion Rates with Bayesian Inference

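Now consider a website. We want to estimate its conversion rate: the fraction of visitors who complete a desired action, such as buying a product. Because each visitor either converts or does not, a Beta prior on the conversion rate combined with the binomial likelihood gives a Beta posterior in closed form. The example below uses 1,000 visitors and 50 conversions and starts from a uniform Beta(1, 1) prior.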

import numpy as np
import matplotlib.pyplot as plt

# Observed data
num_visitors = 1000  # Total number of visitors to the website
num_conversions = 50  # Number of conversions (desired actions)

# Prior hyperparameters for the Beta distribution
prior_alpha = 1  # Shape parameter
prior_beta = 1   # Shape parameter

# Update the prior with the observed data to get the posterior parameters
posterior_alpha = prior_alpha + num_conversions
posterior_beta = prior_beta + (num_visitors - num_conversions)

# Generate samples from the posterior Beta distribution
posterior_samples = np.random.beta(posterior_alpha, posterior_beta, size=10000)

# Plot the posterior distribution
plt.figure(figsize=(8, 6))
plt.hist(posterior_samples, bins=30, density=True, color='skyblue', edgecolor='black', alpha=0.7)
plt.title('Posterior Distribution of Conversion Rate')
plt.xlabel('Conversion Rate')
plt.ylabel('Density')
plt.xlim(0, 0.1)  # Limiting x-axis to focus on conversion rates close to zero
plt.show()

# Calculate summary statistics
mean_conversion_rate = posterior_alpha / (posterior_alpha + posterior_beta)
mode_conversion_rate = (posterior_alpha - 1) / (posterior_alpha + posterior_beta - 2)  # Mode of the Beta distribution

print("Mean conversion rate:", mean_conversion_rate)
print("Mode conversion rate:", mode_conversion_rate)

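Running the code plots the posterior distribution of the conversion rate and prints its summary statistics. The posterior is Beta(51, 951), so the mean is 51 / 1002 ≈ 0.051 and the mode is 50 / 1000 = 0.05.

In other words, the estimated conversion rate is about 5%, in line with the observed 50 conversions out of 1,000 visitors.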
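A point estimate hides how uncertain the conversion rate still is. The sketch below reuses the same Beta(51, 951) posterior and summarizes it with a central 95% credible interval and the posterior probability of beating a target; the 4% target is a hypothetical value chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
posterior_alpha = 1 + 50   # Beta(1, 1) prior plus 50 conversions
posterior_beta = 1 + 950   # plus 950 visitors who did not convert

samples = rng.beta(posterior_alpha, posterior_beta, size=100_000)

# Central 95% credible interval from the 2.5th and 97.5th percentiles
lower, upper = np.percentile(samples, [2.5, 97.5])
print(f"95% credible interval: [{lower:.4f}, {upper:.4f}]")

# Posterior probability that the conversion rate exceeds a hypothetical 4% target
prob_above_target = np.mean(samples > 0.04)
print(f"P(rate > 4%) = {prob_above_target:.3f}")
```

Unlike a frequentist confidence interval, the credible interval can be read directly as "given the prior and the data, the conversion rate lies in this range with 95% probability."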

Conclusion

Here you go! You now know the basics of Bayesian inference. In this article, we learned what Bayesian inference is and how to implement it in Python. We also worked through a simple case study in which we estimated the probability that a visitor to a website converts. As mentioned, Bayesian inference is also used heavily in finance, economics, and engineering.

Hope you enjoyed reading it!