Statistical Hypothesis Testing: A Comprehensive Guide

We’ve all heard it – “go to college to get a good job.” The assumption is that higher education leads straight to higher incomes. Elite Indian institutes like the IITs and IIMs are even judged based on the average starting salaries of their graduates. But is this direct connection between schooling and income actually true?

Intuitively, it seems believable. But how can we really prove this assumption that more school = more money? Is there hard statistical evidence either way? Turns out, there are methods to scientifically test widespread beliefs like this – what statisticians call hypothesis testing.

In this article, we’ll dig into the concept of hypothesis testing and the tools to rigorously question conventional wisdom: null and alternate hypotheses, one- and two-tailed tests, paired sample tests, and more.

Statistical hypothesis testing allows researchers to make inferences about populations based on sample data. It involves setting up a null hypothesis, choosing a confidence level, calculating a p-value, and conducting tests such as two-tailed, one-tailed, or paired sample tests to draw conclusions.

What is Hypothesis Testing?

Statistical hypothesis testing is a method for deciding whether sample data provide enough evidence to reject a default claim about a population. Before we conduct any tests, we need to understand some basic terms.

Null Hypothesis

The null hypothesis is where we start our journey. It is the default statement that we assume to be true until the data give us strong evidence against it, and it usually states that there is no effect or no relationship. To test the popular belief that income rises with quality of education, our null hypothesis is that the two are unrelated. It is denoted by H₀.

H₀: Income levels are not correlated with quality of education.

Alternate Hypothesis

The alternate hypothesis is the claim we accept if the data lead us to reject the null hypothesis. It is usually what the researcher hopes to demonstrate. An alternate hypothesis is denoted by H_a. The alternate hypothesis for the above is given below.

H_a: Income levels are positively correlated with quality of education.
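As a rough sketch of how a claim like this could be checked in Python, the snippet below runs a Pearson correlation test with scipy.stats.pearsonr. Note that this test treats "no linear correlation" as its null hypothesis. The education and income figures are invented purely for illustration and are not real data.

```python
from scipy.stats import pearsonr

# Hypothetical data, made up for illustration only
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [22, 25, 30, 31, 40, 38, 52, 60]

alpha = 0.05

# pearsonr returns the correlation coefficient and a two-sided p-value
# for the null hypothesis that the two variables are uncorrelated
r, p_value = pearsonr(years_of_education, income_thousands)

print(f"correlation coefficient r = {r:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis of no correlation, p-value:", p_value)
else:
    print("Fail to reject the null hypothesis of no correlation, p-value:", p_value)
```

A significant result here would only indicate an association in the sample; it would not show that education causes higher income.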

Confidence Level (1-α)

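The confidence level is the long-run proportion of confidence intervals, each built the same way from a fresh sample, that would contain the true parameter value. The most common confidence levels are 95% and 99%. A 95% confidence level corresponds to a significance level α of 0.05, meaning we accept a 5% chance of rejecting a null hypothesis that is actually true. It does not mean that the test is 95% accurate. The confidence level is denoted by 1-α.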
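To make the idea concrete, here is a minimal sketch that builds a 95% confidence interval for a mean using the t-distribution. It reuses the ten exam scores from the examples later in this article, and the variable names are chosen only for this illustration.

```python
import numpy as np
from scipy import stats

# The ten exam scores used in the examples later in this article
scores = np.array([85, 78, 92, 80, 75, 90, 88, 72, 95, 83])

confidence_level = 0.95
alpha = 1 - confidence_level

n = len(scores)
mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean (sample std / sqrt(n))

# Critical t-value for a two-sided interval with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

lower = mean - t_crit * sem
upper = mean + t_crit * sem

print(f"sample mean = {mean:.2f}")
print(f"95% confidence interval = ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level to 99% would widen the interval, because a larger critical value is needed to cover the true mean more often.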

p-value (p)

The p-value represents the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. A lower p-value means the observed result would be less likely if the null hypothesis were true. If our p-value is less than α, we reject the null hypothesis; otherwise, we fail to reject it. Failing to reject is not the same as accepting the null hypothesis: it only means the data do not provide enough evidence against it.
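The sketch below shows how a p-value follows from a t-statistic and how the decision rule is applied. The t-statistic and degrees of freedom are hypothetical values picked for illustration, not results from any dataset.

```python
from scipy import stats

alpha = 0.05
t_statistic = 2.1  # hypothetical test statistic
dof = 20           # hypothetical degrees of freedom

# Right-tailed p-value: probability of a t-value at least this large under H0
p_right_tailed = stats.t.sf(t_statistic, dof)

# Two-tailed p-value: at least this extreme in either direction
p_two_tailed = 2 * stats.t.sf(abs(t_statistic), dof)

for label, p in [("right-tailed", p_right_tailed), ("two-tailed", p_two_tailed)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: p = {p:.4f} -> {decision}")
```

The survival function `sf` is 1 minus the cumulative distribution function, so it gives the area in the upper tail directly.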

Types of Hypothesis Tests

Since we are equipped with the basic terms, let’s go ahead and conduct some hypothesis tests.

Conducting a Two-Tailed Hypothesis Test

In a two-tailed hypothesis test, the effect can go in either direction, i.e. the true value may be either greater than or less than the value stated in the null hypothesis. For example, a medical researcher comparing a treatment with a placebo wants to know whether it changes blood pressure at all, whether by increasing or decreasing it. Let’s look at its Python implementation.

import pandas as pd
from scipy.stats import ttest_ind

# Define alpha value (significance level)
alpha = 0.05

# Generate sample data
data = {
    "student_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "study_method": ["new", "traditional", "new", "traditional", "new", "traditional", "new", "traditional", "new", "traditional"],
    "exam_score": [85, 78, 92, 80, 75, 90, 88, 72, 95, 83]
}

df = pd.DataFrame(data)

# Group data by study method
new_method_scores = df[df["study_method"] == "new"]["exam_score"].tolist()
traditional_method_scores = df[df["study_method"] == "traditional"]["exam_score"].tolist()

# Perform a two-tailed Welch's t-test (equal_var=False does not assume equal variances)
t_statistic, p_value = ttest_ind(new_method_scores, traditional_method_scores, equal_var=False)

# Check if p-value is less than alpha for rejection
if p_value < alpha:
    print("Reject null hypothesis (p-value < alpha):", p_value)
else:
    print("Fail to reject null hypothesis (p-value >= alpha):", p_value)

# Print t-statistic for reference
print("t-statistic:", t_statistic)

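In the above code, we want to know whether the new method (the group study method) leads to different exam scores than the traditional method. Because the test is two-tailed, the hypotheses do not specify a direction. Our null and alternate hypotheses are as follows.

H₀: The mean exam score is the same for the group study method and the traditional method.
H_a: The mean exam score differs between the group study method and the traditional method.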

Figure: Two-Tailed Test Output

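Since the p-value is greater than α, we fail to reject the null hypothesis. This sample does not provide enough evidence that the group study method changes exam scores. That is not proof that the method has no effect, especially with only five students per group.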
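The call to ttest_ind with equal_var=False in the code above is Welch’s t-test. As a sanity check, the sketch below recomputes the t-statistic, the Welch degrees of freedom, and the two-tailed p-value by hand for the same two groups of scores, then compares them with SciPy’s result. The variable names are chosen for this illustration.

```python
import numpy as np
from scipy import stats

# Same scores as in the two-tailed example, split by study method
new = np.array([85, 92, 75, 88, 95])
traditional = np.array([78, 80, 90, 72, 83])

# Squared standard error of each group mean (sample variance / group size)
se2_new = new.var(ddof=1) / len(new)
se2_trad = traditional.var(ddof=1) / len(traditional)

# Welch's t-statistic: difference in means over the combined standard error
t_manual = (new.mean() - traditional.mean()) / np.sqrt(se2_new + se2_trad)

# Welch-Satterthwaite approximation for the degrees of freedom
dof = (se2_new + se2_trad) ** 2 / (
    se2_new**2 / (len(new) - 1) + se2_trad**2 / (len(traditional) - 1)
)

# Two-tailed p-value from the t-distribution
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = stats.ttest_ind(new, traditional, equal_var=False)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should print the same values, which confirms that the library call is doing nothing more mysterious than this arithmetic.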

Conducting a One-Tailed Hypothesis Test

In a one-tailed hypothesis test, we expect the effect to go in one specific direction, i.e. higher or lower. For example, researchers may want to know whether a particular medicine lowers cholesterol levels. Let’s look at its Python code.

import pandas as pd
from scipy.stats import ttest_ind

# Define alpha value (significance level)
alpha = 0.05

# Generate sample data
data = {
    "student_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "study_method": ["new", "traditional", "new", "traditional", "new", "traditional", "new", "traditional", "new", "traditional"],
    "exam_score": [85, 78, 92, 80, 75, 90, 88, 72, 95, 83]
}

df = pd.DataFrame(data)

# Group data by study method
new_method_scores = df[df["study_method"] == "new"]["exam_score"].tolist()
traditional_method_scores = df[df["study_method"] == "traditional"]["exam_score"].tolist()

# Perform a right-tailed t-test (H_a: the new method's mean is greater); `alternative` needs SciPy 1.6+
t_statistic, p_value = ttest_ind(new_method_scores, traditional_method_scores, alternative="greater")

# Check if p-value is less than alpha for rejection
if p_value < alpha:
    print("Reject null hypothesis (p-value < alpha):", p_value)
    print("One-tailed right-tailed t-test suggests higher scores in the new method group.")
else:
    print("Fail to reject null hypothesis (p-value >= alpha):", p_value)

# Print t-statistic for reference
print("t-statistic:", t_statistic)

Our null and alternate hypotheses for this test are given below.

H₀: The Group study method does not increase our marks.
H_a: The group study method increases our marks.

Figure: One-Tailed Test Output

Since the p-value is greater than α, we fail to reject the null hypothesis. This sample does not provide enough evidence that the group study method increases our marks, although it does not prove that the method has no benefit.
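To see how the choice of tail affects the result, the sketch below runs the same comparison with each value of the `alternative` argument of ttest_ind (available in SciPy 1.6 and later). It uses the same scores as the example above with the function’s default settings.

```python
from scipy.stats import ttest_ind

# Same scores as in the one-tailed example, split by study method
new = [85, 92, 75, 88, 95]
traditional = [78, 80, 90, 72, 83]

# Run the same test under each alternative hypothesis
results = {
    alt: ttest_ind(new, traditional, alternative=alt)
    for alt in ("two-sided", "greater", "less")
}

for alt, res in results.items():
    print(f"{alt:>9}: t = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```

The t-statistic is identical in all three cases. Because it is positive here, the "greater" p-value is half the two-sided one, and the "greater" and "less" p-values add up to 1. This is why the direction should be fixed before looking at the data, not picked afterwards to get a smaller p-value.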

Conducting a Paired Sample Test

A paired sample test compares two sets of observations taken from the same subjects, such as measurements before and after a treatment, and tests whether the mean difference between the pairs is zero. For example, we may want to know whether the reaction time of our participants changes after consuming caffeine. Let’s look at an example with Python code.

import pandas as pd
from scipy.stats import ttest_rel

# Define alpha value (significance level)
alpha = 0.05

# Generate sample data
data = {
    "student_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "pre_score": [75, 80, 85, 90, 78, 82, 88, 72, 95, 83],
    "post_score": [85, 92, 90, 88, 75, 90, 84, 77, 98, 87]
}

df = pd.DataFrame(data)

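# Calculate score difference for each student (positive = improvement)
df["score_difference"] = df["post_score"] - df["pre_score"]
print("Mean score difference:", df["score_difference"].mean())

# Perform paired samples t-test (post first, so a positive t-statistic means scores went up)
t_statistic, p_value = ttest_rel(df["post_score"], df["pre_score"])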

# Check if p-value is less than alpha for rejection
if p_value < alpha:
    print("Reject null hypothesis (p-value < alpha):", p_value)
    print("Paired samples t-test suggests significant difference in scores after using the study method.")
else:
    print("Fail to reject null hypothesis (p-value >= alpha):", p_value)

# Print t-statistic for reference
print("t-statistic:", t_statistic)

Similar to the above hypothesis tests, we consider the group study method here as well. Our null and alternate hypotheses are as follows.

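H₀: The mean score is the same before and after using the group study method.
H_a: The mean score differs before and after using the group study method.

Since the p-value is greater than α, we fail to reject the null hypothesis. The p-value here is about 0.055, only slightly above 0.05, so the data hint at an improvement, but the evidence is not strong enough at the 5% significance level.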
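A paired t-test is equivalent to a one-sample t-test on the per-student differences, with a hypothesized mean difference of zero. The sketch below illustrates this with the same pre and post scores; the variable names are chosen for this illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Same pre and post scores as in the paired example
pre = np.array([75, 80, 85, 90, 78, 82, 88, 72, 95, 83])
post = np.array([85, 92, 90, 88, 75, 90, 84, 77, 98, 87])

# Per-student improvement (positive means the score went up)
diff = post - pre

# Paired t-test on the two columns
t_paired, p_paired = ttest_rel(post, pre)

# One-sample t-test on the differences against a mean of zero
t_one_sample, p_one_sample = ttest_1samp(diff, popmean=0)

print(f"mean difference   : {diff.mean():.2f}")
print(f"paired t-test     : t = {t_paired:.4f}, p = {p_paired:.4f}")
print(f"one-sample on diff: t = {t_one_sample:.4f}, p = {p_one_sample:.4f}")
```

The two tests print identical statistics. Pairing removes the variation between students, which is why this design can detect smaller effects than comparing two independent groups of the same size.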

Conclusion

There you go! You are now equipped to perform statistical hypothesis testing on different samples and draw conclusions from them. You need to collect data and decide on the null and alternate hypotheses before running any test. Then, based on those predetermined hypotheses, you decide which type of test to perform: two-tailed, one-tailed, or paired. Statistical hypothesis testing is one of the most powerful tools in the world of research.

Now that you have a grasp on statistical hypothesis testing, how will you apply these concepts to your own research or data analysis projects? What hypotheses are you eager to test?
