Exploring Spearman Correlation in Python

What Is Spearman Correlation?

Spearman correlation is a statistical measure of the strength and direction of the association between two variables, and Python makes it easy to compute. It does not assume that the relationship is linear or that the variables are normally distributed. Instead of comparing the values themselves, it compares the ranks of the values of the two variables.
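To make the idea of comparing ranks concrete, here is a minimal sketch. The sample values are made up for illustration. It ranks both variables with scipy.stats.rankdata and shows that the Pearson correlation of those ranks matches the Spearman coefficient of the raw values.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, chosen only for illustration
x = np.array([10, 20, 30, 40, 50])
y = np.array([1.2, 0.9, 3.5, 2.8, 7.1])

# Replace each value by its rank (1 = smallest value)
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)

# Pearson correlation computed on the ranks
pearson_on_ranks, _ = stats.pearsonr(rank_x, rank_y)

# Spearman correlation computed on the raw values
rho, _ = stats.spearmanr(x, y)

print("Ranks of x:", rank_x)
print("Ranks of y:", rank_y)
print("Pearson on ranks:", pearson_on_ranks)
print("Spearman on raw values:", rho)
```

The two printed coefficients should agree up to floating-point rounding, which is why Spearman correlation is often described as Pearson correlation applied to ranks.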

It measures the degree to which the ranks of one variable are related to the ranks of the other. The coefficient ranges from -1 to +1: +1 means a perfect positive correlation (as one variable increases, the other always increases too), -1 means a perfect negative correlation (as one increases, the other always decreases), and 0 indicates no monotonic association.
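The sketch below illustrates the two extremes using made-up data. Because only the ordering of the values matters, a strictly increasing but strongly curved relationship still gives a coefficient of +1, and a strictly decreasing one gives -1. The Pearson coefficient is included only for contrast.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11)   # 1, 2, ..., 10

y_up = np.exp(x)       # strictly increasing, but far from a straight line
y_down = -x ** 3       # strictly decreasing

rho_up, _ = stats.spearmanr(x, y_up)
rho_down, _ = stats.spearmanr(x, y_down)

# Pearson measures linear association, so it falls short of 1 here
pearson_up, _ = stats.pearsonr(x, y_up)

print("Spearman, increasing curve:", rho_up)
print("Spearman, decreasing curve:", rho_down)
print("Pearson, increasing curve:", pearson_up)
```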

The Spearman correlation is valuable across numerous disciplines, such as social sciences, biology, engineering, and finance.

Implementing Spearman Correlation in Python

Let’s explore the Spearman correlation in Python, a statistical measure used to determine the strength and direction of monotonic associations between two variables, without assuming a linear relationship or a normal distribution.

We demonstrate its implementation in various examples, including calculating the Spearman correlation coefficient between arrays, generating correlation matrices for multiple arrays, plotting data with a correlation line, and finding rank correlations between DataFrame columns.

Example 1: Calculate Spearman Correlation Coefficient Between Two Arrays

import scipy.stats as stats

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

# Calculate the Spearman correlation coefficient
rho, p_value = stats.spearmanr(x, y)

print("Spearman correlation coefficient:", rho)

After importing the scipy.stats module, we create two arrays, x and y, with the same number of elements. They are perfectly negatively correlated, i.e. as one variable increases, the other decreases. These arrays are passed as arguments to the spearmanr function, which returns two values: the Spearman correlation coefficient rho, and the two-sided p-value for a hypothesis test whose null hypothesis is that the two samples are uncorrelated. The coefficient ranges from -1 to +1, where -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation. Finally, we print the coefficient.

Output:

[Image: output of Example 1]
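As a cross-check on Example 1, the coefficient can also be computed by hand. The sketch below uses the textbook formula for data without ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each observation. The formula is not taken from the example above, and it applies only when no values are tied.

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

# Difference between the rank of each x value and the rank of its y value
d = stats.rankdata(x) - stats.rankdata(y)
n = len(x)

# Textbook formula, valid only when there are no tied values
rho_manual = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rho_scipy, _ = stats.spearmanr(x, y)

print("Manual calculation:", rho_manual)
print("scipy.stats.spearmanr:", rho_scipy)
```

Here the squared rank differences are 16, 4, 0, 4 and 16, which sum to 40, so the formula gives 1 - 240 / 120 = -1.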

Example 2: Calculate Spearman Correlation Matrix Between Multiple Arrays

import numpy as np
import scipy.stats as stats

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Calculate the Spearman correlation matrix
rho, p_value = stats.spearmanr(data, axis=1)

print("Spearman correlation matrix:\n", rho)

We create a 3×3 matrix using NumPy’s array function and pass it to the spearmanr function with axis=1. Setting axis=1 treats each row as a variable and the columns as observations, so the correlation is calculated between the rows. To calculate it between the columns instead, set axis=0, which is the default. Finally, we print the resulting correlation matrix.

Output:

[Image: output of Example 2]
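The effect of the axis argument is easier to see on a non-square array. The following sketch uses a made-up 3×5 array: with axis=1 the three rows are the variables, so the result is a 3×3 matrix, while with axis=0 the five columns are the variables, so the result is a 5×5 matrix.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 3 rows, 5 columns
data = np.array([
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 4, 3, 2, 1],
])

# axis=1: each row is a variable, each column an observation
rho_rows, _ = stats.spearmanr(data, axis=1)

# axis=0 (the default): each column is a variable, each row an observation
rho_cols, _ = stats.spearmanr(data, axis=0)

print("Row-wise matrix shape:", rho_rows.shape)
print("Column-wise matrix shape:", rho_cols.shape)
print("Row-wise correlation matrix:\n", rho_rows)
```

In the row-wise matrix, the entry for the first and third rows is -1 because the third row is the first row reversed.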

Example 3: Plot a Scatter Plot with a Spearman Correlation Line

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)

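# Calculate the Spearman correlation coefficient
rho, p_value = stats.spearmanr(x, y)

# Spearman works on ranks, so plot the ranks rather than the raw values
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)

# Least-squares line through the ranks
slope, intercept = np.polyfit(rank_x, rank_y, 1)
line_x = np.array([rank_x.min(), rank_x.max()])

plt.scatter(rank_x, rank_y)
plt.plot(line_x, slope * line_x + intercept, color='red')
plt.title("Spearman correlation coefficient: {:.2f}".format(rho))
plt.xlabel("rank of x")
plt.ylabel("rank of y")
plt.show()

Unlike the previous examples, here we create two arrays of random data, x and y. After calculating the Spearman correlation coefficient, we convert both arrays to ranks with rankdata, because Spearman correlation is defined on ranks rather than on raw values. We then draw a scatter plot of the ranks and add a red least-squares line fitted to them. Since the data have no ties, the slope of this line equals the Spearman coefficient. Because x and y are generated independently, the coefficient is typically close to 0 and the line is nearly flat. The coefficient itself is shown in the plot title.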

Output:

[Image: output of Example 3]
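Since the random data in Example 3 are generated independently, it helps to compare them with data that really are related. The sketch below uses a fixed seed and a noise level of 0.5, both arbitrary choices made here for reproducibility. It contrasts an independent pair of arrays with a pair where y follows x plus noise.

```python
import numpy as np
from scipy import stats

# Fixed seed so the numbers are reproducible (arbitrary choice)
rng = np.random.default_rng(42)

x = rng.normal(0, 1, 100)
noise = rng.normal(0, 1, 100)

y_independent = rng.normal(0, 1, 100)   # unrelated to x
y_related = x + 0.5 * noise             # increases with x, plus some noise

rho_ind, p_ind = stats.spearmanr(x, y_independent)
rho_rel, p_rel = stats.spearmanr(x, y_related)

print("Independent data: rho = {:.2f}, p-value = {:.3g}".format(rho_ind, p_ind))
print("Related data:     rho = {:.2f}, p-value = {:.3g}".format(rho_rel, p_rel))
```

For the independent pair the coefficient should be close to 0, while for the related pair it should be strongly positive, with a very small p-value.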

Example 4: Calculate Spearman Rank Correlation Between Two DataFrame Columns

import scipy.stats as stats
import pandas as pd

df = pd.read_csv("data.csv")

# Calculate the Spearman rank correlation between two columns
rho, p_value = stats.spearmanr(df["column1"], df["column2"])

print("Spearman rank correlation:", rho)

We load a sample dataset into a pandas DataFrame and calculate the Spearman rank correlation between two of its columns using the spearmanr function from scipy.stats. Finally, we print the result.

Below is the data.csv file used in the above code.

Output:

[Image: output of Example 4]
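pandas can also compute the same coefficient directly, without calling scipy.stats yourself. The sketch below builds a small DataFrame with made-up values as a stand-in for data.csv. It reuses the column names from Example 4 and compares spearmanr with the corr methods of Series and DataFrame using method="spearman".

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for data.csv; the values are made up
df = pd.DataFrame({
    "column1": [3, 1, 4, 1, 5, 9, 2, 6],
    "column2": [2, 7, 1, 8, 2, 8, 1, 8],
})

# Same call as in Example 4
rho_scipy, p_value = stats.spearmanr(df["column1"], df["column2"])

# Equivalent calculation between two columns using pandas
rho_pandas = df["column1"].corr(df["column2"], method="spearman")

# Spearman correlation matrix for every pair of columns
corr_matrix = df.corr(method="spearman")

print("scipy: ", rho_scipy)
print("pandas:", rho_pandas)
print(corr_matrix)
```

The DataFrame version is convenient when a table has many numeric columns, although it returns only the coefficients and not the p-values.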

Conclusion: Importance and Applications of Spearman Correlation

In conclusion, we have explored Spearman correlation and its practical applications in Python. As a measure of monotonic relationships, including non-linear ones, it is relevant in numerous fields, ranging from social sciences to engineering and finance. Understanding and implementing Spearman correlation is an essential skill for anyone working in data analysis. What are some other unique applications of Spearman correlation in your domain of interest?
