A Basic Intro to Python Correlation


Today’s world is full of lifestyle challenges. Lifestyle factors such as smoking, drinking, and obesity are associated with serious health risks. For medical researchers, finding lifestyle patterns and the risks associated with them is therefore of utmost importance.

One statistical tool for finding such patterns is correlation, which measures how strongly two variables are related. Correlation is also especially useful in finance, particularly in portfolio management.

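Correlation analysis measures the strength and direction of the relationship between two variables, indicating whether they tend to move together or independently. The most common measure, the Pearson correlation coefficient, ranges from -1 to +1. A positive correlation means the variables move in the same direction, a negative correlation means they move in opposite directions, and a correlation near zero means there is no linear relationship between them. In Python, NumPy computes correlation coefficients (for example with np.corrcoef), and Matplotlib lets us visualize the relationship with scatter plots.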
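NumPy's np.corrcoef returns the Pearson correlation coefficient, defined as the covariance of x and y divided by the product of their standard deviations. The sketch below computes it by hand and compares it with NumPy's result; the helper name pearson_r and the small data set are illustrative assumptions, not part of the original article.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition:
    covariance of x and y divided by the product of their standard deviations.
    (Hypothetical helper for illustration; the 1/(n-1) factors cancel out.)"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Small illustrative data set
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(pearson_r(x, y))           # manual computation
print(np.corrcoef(x, y)[0, 1])   # NumPy's built-in result
```

Both lines should print the same value (about 0.775 for this data), which confirms that np.corrcoef is computing the Pearson coefficient.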

In this article, we first explain what correlation is, then use Python to generate and plot examples of positive, negative, and zero correlation and discuss why correlation is useful in the real world.


What is Correlation?

Correlation tells us whether two variables move together. More precisely, the correlation coefficient measures the strength of a linear relationship between them. For example, ice cream sales tend to rise as summer arrives and temperatures climb. This is a case of positive correlation. Warm weather plausibly leads people to buy more ice cream, but keep in mind that correlation on its own does not prove that one variable causes the other.
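Because the coefficient measures linear association, a strong relationship that is not linear can still give a value near zero. The sketch below is an illustrative example, not taken from the original article: it uses a U-shaped relationship in which y is completely determined by x.

```python
import numpy as np

# A perfect but non-linear (U-shaped) relationship: y is fully determined by x
x = np.linspace(-5, 5, 101)  # values spread symmetrically around 0
y = x ** 2

# Pearson r only captures the linear part of the relationship
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r for y = x**2: {r:.3f}")
```

The printed value is essentially zero (it may display as -0.000 due to floating-point rounding), even though y depends entirely on x. Looking at a scatter plot before relying on the coefficient alone helps catch cases like this.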

A case of negative correlation is video game time and physical activity: people who spend more time playing video games tend to be less physically active.

An example of zero correlation is shoe size and hair color: there is no consistent relationship between the two.

Visualizing Positive Correlation

In the first example, we generate data with a positive correlation and plot it in Python. The y values are twice the x values plus random noise: the positive slope produces the positive correlation, while the noise keeps it from being perfect.

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data with positive correlation
x = np.random.rand(100) * 5 + 10  # Random values between 10 and 15
y = 2 * x + np.random.randn(100) * 2  # Positive slope (2) plus random noise

# Calculate correlation coefficient
correlation = np.corrcoef(x, y)[0, 1]

# Scatter plot with correlation value
plt.figure(figsize=(8, 6))
plt.scatter(x, y)
plt.title(f"Positive Correlation: {correlation:.2f}", fontsize=16)
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.grid(True)
plt.show()

Now let us look at its output, shown below.

Positive Correlation Output
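Since the slope creates the correlation and the noise weakens it, changing the amount of noise changes the coefficient. This sketch keeps the slope of 2 and the x range from the example above and tries several noise levels. It uses NumPy's newer np.random.default_rng generator with a fixed seed, an assumption made here for reproducibility; the specific noise scales are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so every run gives the same numbers
x = rng.random(100) * 5 + 10         # values between 10 and 15, as in the example above

# Same positive slope (2), increasing amounts of Gaussian noise
results = {}
for noise_scale in [0.5, 2, 8, 32]:
    y = 2 * x + rng.standard_normal(100) * noise_scale
    results[noise_scale] = np.corrcoef(x, y)[0, 1]
    print(f"noise scale {noise_scale:>4}: r = {results[noise_scale]:.2f}")
```

As the noise scale grows, r should fall from close to 1 toward 0; the exact values depend on the seed.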

Visualizing Negative Correlation

Next, we generate and plot data with a negative correlation. The code follows the same pattern as before, except that y now depends on x through a negative slope (-0.8), again with added noise. Let us look at the code.

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data with negative correlation
x = np.random.rand(100) * 20  # Values between 0 and 20
y = -0.8 * x + np.random.randn(100) * 4  # Negative slope with noise

# Calculate correlation coefficient
correlation = np.corrcoef(x, y)[0, 1]

# Create a clear and informative plot
plt.figure(figsize=(8, 6))  # Set appropriate figure size
plt.scatter(x, y)
plt.title(f"Negative Correlation: {correlation:.2f}", fontsize=16)
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.grid(True)  # Add grid for better readability
plt.show()

Let us look at the output of the code above.

Negative Correlation Output
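All three examples index the result of np.corrcoef with [0, 1]. That is because np.corrcoef(x, y) returns a 2x2 correlation matrix rather than a single number. The sketch below reuses the negative-slope setup, with a fixed seed chosen here for reproducibility, and inspects that matrix.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.random(100) * 20                     # values between 0 and 20
y = -0.8 * x + rng.standard_normal(100) * 4  # negative slope with noise

matrix = np.corrcoef(x, y)  # 2x2 correlation matrix
print(np.round(matrix, 2))

# Diagonal entries: each variable's correlation with itself (always 1).
# Off-diagonal entries: the correlation between x and y, stored twice
# because the matrix is symmetric.
r = matrix[0, 1]
print(f"r(x, y) = {r:.2f}")
print(f"r(y, x) = {np.corrcoef(y, x)[0, 1]:.2f}")  # same value: argument order does not matter
```

Picking [0, 1] (or equivalently [1, 0]) extracts the single coefficient relating x and y.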

Visualizing Zero Correlation

Let us move on to our last type of correlation, zero correlation. The code is given below. Unlike the previous examples, y is generated independently of x, with no slope linking the two, so any correlation we measure arises purely by chance.

import numpy as np
import matplotlib.pyplot as plt

# Generate independent data sets with no correlation
x = np.random.rand(100) * 10  # Random values between 0 and 10
y = np.random.rand(100) * 20  # Random values between 0 and 20 (independent of x)

# Calculate correlation coefficient
correlation = np.corrcoef(x, y)[0, 1]

# Create a clear and informative plot
plt.figure(figsize=(8, 6))  # Set appropriate figure size
plt.scatter(x, y)
plt.title(f"Zero Correlation: {correlation:.2f}", fontsize=16)
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.grid(True)  # Add grid for better readability
plt.show()

Zero Correlation Output

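The computed coefficient is close to zero, indicating no linear relationship between the variables. It is rarely exactly zero, because independent random samples still show small chance correlations.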
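Even when x and y are generated independently, the sample coefficient is almost never exactly zero. The sketch below repeats the zero-correlation experiment 1,000 times to show how far chance alone moves r for 100 samples; the trial count and seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_samples, n_trials = 100, 1000

# Repeat the zero-correlation experiment many times with independent x and y
rs = np.empty(n_trials)
for i in range(n_trials):
    x = rng.random(n_samples) * 10  # values between 0 and 10
    y = rng.random(n_samples) * 20  # values between 0 and 20, independent of x
    rs[i] = np.corrcoef(x, y)[0, 1]

print(f"mean r: {rs.mean():.3f}")                   # close to 0
print(f"std of r: {rs.std():.3f}")                  # roughly 1/sqrt(n_samples) = 0.1
print(f"largest |r| seen: {np.abs(rs).max():.3f}")
```

For independent data, the spread of r shrinks roughly like 1/sqrt(n), so with 100 points, values within about ±0.2 of zero are common and do not indicate a real relationship.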

Conclusion

We’ve explored the statistical concept of correlation analysis and have covered three types of correlation:

  • Positive correlation: Variables move in the same direction
  • Negative correlation: Variables move in opposite directions
  • Zero correlation: No linear relationship between variables
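The three cases can also be computed together. Passing several variables to np.corrcoef, one per row, returns a full correlation matrix. The data below is synthetic, built with the same kind of slopes and noise as the earlier examples; the seed, sample size, and noise levels are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.random(200) * 10
positive = 2 * x + rng.standard_normal(200) * 2     # moves with x
negative = -0.8 * x + rng.standard_normal(200) * 2  # moves against x
unrelated = rng.random(200) * 20                    # generated independently of x

# Each row of the input is treated as one variable, giving a 4x4 matrix
matrix = np.corrcoef([x, positive, negative, unrelated])
print(np.round(matrix, 2))

# Row 0 holds x's correlation with every variable
labels = ["positive", "negative", "unrelated"]
for label, r in zip(labels, matrix[0, 1:]):
    print(f"x vs {label}: r = {r:+.2f}")
```

Expect a strong positive value, a strong negative value, and one near zero, matching the three types summarized above.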

We hope you now have a better understanding of these concepts and can put them into practice. Let me know if you have any questions!
