Augmented Dickey-Fuller Test In Time-Series Analysis

ADF Test

In this article, we will learn about an important hypothesis test for time-series analysis: the Augmented Dickey-Fuller (ADF) test.

Before building forecasting models, it is important to run this test to find out whether the data we are working with is stationary. Stationarity is a key assumption of many predictive models, such as ARIMA, which assumes the series is stationary once it has been differenced.

Before going into the details of the test, let us first understand the concept of a unit root.

What is a Unit Root Test?

A unit root test is a statistical method used to determine whether a time series is stationary. Stationarity is the property of a time series whose statistical properties, such as mean, variance, and autocorrelation, remain constant over time. A unit root is present when a root of the series' characteristic equation is equal to 1. This indicates that the time series is non-stationary. Mathematically, the model underlying a unit root test can be written as:

$$Y_t = \alpha Y_{t-1} + \beta X_e + \varepsilon_t$$

where Yt is the value of the time series at time t, and Xe is an exogenous variable with coefficient β. An exogenous variable is a separate explanatory variable, which is also a time series. εt is the error term.

If α = 1, the time series has a unit root and is non-stationary. Ignoring the exogenous term, each value is then the previous value plus a new shock, so shocks never die out. Their effects accumulate, the variance of the series grows over time, and the series wanders without reverting to a fixed mean. A random walk is the simplest example. This stochastic trend makes it hard to distinguish a genuine trend from accumulated random fluctuations. In such a case, differencing or other techniques must be used to make the series stationary before any meaningful analysis can be performed.
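The following NumPy simulation makes the effect of α concrete. It is a sketch: the function name `simulate_ar1_paths`, the seed, and the choice α = 0.5 for the stationary case are illustrative assumptions, and the exogenous term is dropped. The code simulates many paths of Yt = αYt-1 + εt and compares their spread at two points in time. With α = 1 the variance grows roughly in proportion to t. With α = 0.5 it settles near 1/(1 − α²) ≈ 1.33. It also checks that differencing a random walk recovers the white-noise shocks.

```python
import numpy as np

def simulate_ar1_paths(alpha, n_steps, n_paths, rng):
    """Simulate Y_t = alpha * Y_{t-1} + e_t for many independent paths, starting at Y_0 = 0."""
    shocks = rng.standard_normal((n_paths, n_steps))
    y = np.zeros((n_paths, n_steps))
    for t in range(1, n_steps):
        y[:, t] = alpha * y[:, t - 1] + shocks[:, t]
    return y, shocks

rng = np.random.default_rng(42)  # illustrative seed
unit_root, unit_root_shocks = simulate_ar1_paths(1.0, 401, 2000, rng)  # random walk
stationary, _ = simulate_ar1_paths(0.5, 401, 2000, rng)                # stationary AR(1)

# Variance across paths at t = 100 and t = 400
for name, y in [("alpha=1.0", unit_root), ("alpha=0.5", stationary)]:
    print(name, "var at t=100:", round(y[:, 100].var(), 2),
          "var at t=400:", round(y[:, 400].var(), 2))

# Differencing the random walk gives back the shocks (white noise), which are stationary
differenced = np.diff(unit_root, axis=1)
print(np.allclose(differenced, unit_root_shocks[:, 1:]))  # prints True
```

For the random walk, the variance at t = 400 comes out close to four times the variance at t = 100. For α = 0.5, the two values are about the same.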

What is the Dickey-Fuller Test?

The Dickey-Fuller test is a statistical test commonly used to check for the presence of a unit root in a time series. Its null hypothesis is that the time series has a unit root, which implies that the series is non-stationary and has a stochastic trend. The Dickey-Fuller test is based on the following model equation:

$$Y_t = c + \beta t + \alpha Y_{t-1} + \Phi \Delta Y_{t-1} + \varepsilon_t$$

where c is a constant, βt is a linear trend term, and Y(t-1) is the first lag of the time series. ΔY(t-1) is the first difference of the series at time (t-1), with coefficient Φ, and εt is the error term.

  • The null hypothesis of the Dickey-Fuller test is that α=1, which implies that there is a unit root in the time series.
  • The alternative hypothesis is that α<1, which implies that the time series is stationary and does not have a unit root.

The Augmented Dickey-Fuller test evolved from the above equation and is one of the most common forms of unit root test.
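The following sketch gives a rough feel for what α measures. It estimates α by ordinary least squares in a simplified levels regression, Yt = c + αYt-1 + εt. The trend and lagged-difference terms are dropped for brevity, and `estimate_alpha` is a hypothetical helper. For a random walk the estimate lands very close to 1, while for a stationary series it lands near the true value below 1. The estimate alone is not the test. Under the null hypothesis its t-statistic does not follow the usual t-distribution, which is why the Dickey-Fuller test has its own critical values.

```python
import numpy as np

def estimate_alpha(y):
    """OLS estimate of alpha in the simplified regression Y_t = c + alpha * Y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
shocks = rng.standard_normal(2000)

random_walk = np.cumsum(shocks)  # alpha = 1 (unit root)

ar1 = np.zeros(2000)             # alpha = 0.5 (stationary)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

print("random walk alpha_hat:", round(estimate_alpha(random_walk), 3))
print("AR(1) alpha_hat:", round(estimate_alpha(ar1), 3))
```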

What is the ADF Test?

The Augmented Dickey-Fuller (ADF) test is an extension of the Dickey-Fuller (DF) test that accounts for higher-order autoregressive processes and other variables that may affect the time series. The DF test is based on a regression of the first difference of the time series on its lagged level. Its test statistic is compared with critical values from a table to determine statistical significance. These are Dickey-Fuller critical values rather than ordinary t-table values, because under the null hypothesis the statistic does not follow a t-distribution.

The ADF test adds lagged terms of the first difference of the time series to the DF regression equation. These terms account for higher-order autoregressive processes, that is, serial correlation in the differences. The regression can also include deterministic terms, such as a constant and a linear trend, to account for other factors that may affect the time series.

As in the DF test, the null hypothesis of the ADF test is that the time series has a unit root, and the alternative hypothesis is that the time series is stationary. The regression equation takes the following form:

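$$\Delta Y_t = c + \beta t + \alpha Y_{t-1} + \Phi_1 \Delta Y_{t-1} + \Phi_2 \Delta Y_{t-2} + \cdots + \Phi_p \Delta Y_{t-p} + \varepsilon_t$$

where:

  • ΔYt is the first difference of the time series at time t
  • c is a constant
  • βt is a linear trend term
  • Yt-1 is the lagged level of the time series at time t-1, and α is its coefficient
  • Φ1, Φ2, …, Φp are the coefficients of the respective lagged differences
  • εt is the error term

Because this equation is written in first differences, α here plays the role of α − 1 in the Dickey-Fuller equation. A unit root therefore corresponds to α = 0, and a stationary series to α < 0.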

How Does the ADF Test Work?

The Augmented Dickey-Fuller (ADF) test can be performed in the following steps:

  1. First, we subtract the lagged value of Y from the current value of Y to obtain the first difference, denoted as ΔYt. We then build the lagged differences ΔYt-1, …, ΔYt-p, which enter the regression alongside the lagged level Yt-1.
  2. We estimate the regression equation using ordinary least squares (OLS). This gives the coefficients of the variables in the equation: the trend coefficient β, the coefficient α of the lagged level Yt-1, and the coefficients Φ1, Φ2, …, Φp of the lagged differences.
  3. We calculate the test statistic, which is the t-statistic of the estimated coefficient α̂:

     $$DF = \frac{\hat{\alpha}}{SE(\hat{\alpha})}$$

     where SE(α̂) is the standard error of α̂ from the augmented regression. Including the lagged differences is what corrects the statistic for serial correlation in the errors.
  4. We compare the test statistic with critical values. These depend on the sample size, the deterministic terms included (constant, trend), and the chosen significance level. If the test statistic is less than (more negative than) the critical value, we reject the null hypothesis that the time series has a unit root and conclude that the series is stationary. If the test statistic is greater than the critical value, we fail to reject the null hypothesis and treat the series as having a unit root, that is, as non-stationary.

Steps to Perform the ADF Test in Python

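The statsmodels package provides a reliable implementation of the ADF test via the adfuller() function in statsmodels.tsa.stattools. It returns a tuple with the following outputs, in this order:

  1. The value of the test statistic
  2. The p-value
  3. The number of lags used in the regression
  4. The number of observations used in the regression
  5. A dictionary of critical values at the 1%, 5%, and 10% levels
  6. The best information-criterion value (returned only when autolag is set)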


The adfuller() function lets you control the lag length of the OLS regression with the optional argument maxlag. If maxlag is not given, it is derived from the sample size as 12 * (nobs/100)^(1/4), rounded up, where nobs is the number of observations in the series.

The autolag parameter decides how many of those lags are actually used. Its default, 'AIC', tries lag lengths from 0 up to maxlag and keeps the one with the lowest Akaike information criterion. This is generally a good approach because it balances model complexity against goodness of fit. The other options are 'BIC' and 't-stat', and setting autolag=None uses exactly maxlag lags.

Step 1: Creating random time-series data

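We'll use randomly generated time-series data to demonstrate the implementation, and we'll import the adfuller() function from the statsmodels library. The values are independent draws from a standard normal distribution (white noise), so this series is stationary by construction.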

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.stattools import adfuller
    
    # Generate a sample time series
    ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2020', periods=1000))
    
    # Plot the time series
    plt.plot(ts)
    plt.title('Sample Time Series')
    plt.show()
    
[Figure: Sample Time Series, a plot of the generated time-series data]

Step 2: Printing the results

    # ADF test: autolag='AIC' picks the number of lagged differences by minimizing AIC
    result = adfuller(ts, autolag='AIC')
    print(f'ADF Statistic: {result[0]}')
    print(f'n_lags: {result[2]}')    # index 2 is the number of lags used
    print(f'p-value: {result[1]}')   # index 1 is the p-value
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'   {key}, {value}')

Output:

    ADF Statistic: -10.539311962050613
    n_lags: 9
    p-value: 8.733479479517842e-19
    Critical Values:
       1%, -3.436972562223603
       5%, -2.864463856182476
       10%, -2.5683268054280175
    

Here the p-value is far below the usual significance level of 0.05, and the ADF statistic is less than (more negative than) all three critical values. We therefore reject the null hypothesis that the time series has a unit root and conclude that the time series is stationary. This is the expected result, since the series was generated as white noise.

Summary

We have learned that the ADF test is a unit root test used to determine whether a time series is stationary or non-stationary. The test regresses the first difference of the series on its lagged level, plus lagged differences and, optionally, a constant and a trend. It then compares the t-statistic of the lagged-level coefficient with Dickey-Fuller critical values to decide whether the time series has a unit root.