ARIMA Model Demonstration in Python

In this article, we’ll learn about a well-known time series forecasting model – the ARIMA model.

Time series data differs from other data in that the observations are recorded at regular intervals of time. This added time component matters because each data point in the series typically depends on the data points that came before it.

A widely used statistical method for time series forecasting is the ARIMA model.

Suppose we need to forecast the sales of apples and we have previous sales records for each day. This problem can be categorized as time-series modeling.

In this article we are going to implement the ARIMA model.

What is ARIMA?

ARIMA stands for Autoregressive Integrated Moving Average. It is based on describing autocorrelations in the data and is one of the popular and powerful time-series algorithms for analyzing and forecasting time series data.

Let’s break down what ARIMA means:

Autoregressive (AR): the dependent relationship between an observation and some number of lagged observations. In other words, past values of the series are used to forecast the next value.
Integrated (I): the differencing operation performed on the series to make it stationary.
Moving average (MA): the dependence of an observation on past forecast errors. In other words, past forecast errors are used to predict future values.
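To make the autoregressive idea concrete, here is a minimal sketch that simulates a series in which each value depends on the previous one, then recovers that dependence from the data. The coefficient 0.7, the series length, and the random seed are illustrative assumptions, not values from the dataset used later.

```python
import numpy as np

# Hypothetical AR(1) process: y[t] = phi * y[t-1] + noise
rng = np.random.default_rng(0)
phi_true = 0.7  # illustrative coefficient
n = 2000

noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + noise[t]

# Estimate phi by regressing each value on the previous one
# (least squares without an intercept)
prev, curr = y[:-1], y[1:]
phi_hat = float(prev @ curr / (prev @ prev))

# One-step-ahead forecast made from the last observation
next_forecast = phi_hat * y[-1]
print(f"estimated phi: {phi_hat:.3f}")
```

The estimate should land close to the coefficient used in the simulation, which is the sense in which past values carry information about the next one.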

1. Parameters of the ARIMA Model

ARIMA requires three parameters, p, d, and q, to build the model.

p: the number of autoregressive lags, i.e. how many past values the auto-regressive part of the model uses.
d: the order of differencing required to make the series stationary. It is associated with the integrated part of the model.
q: the number of moving average lags, i.e. how many past forecast errors the moving average part of the model uses.
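For readers who prefer a formula, the three parameters can be placed in the standard textbook form of an ARIMA(p, d, q) model. The symbols c (a constant), φ and θ (the AR and MA coefficients), ε (the forecast error) and B (the backshift operator) are notation introduced here for illustration and are not used elsewhere in this article.

```latex
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
         + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
         + \varepsilon_t,
\qquad
y'_t = (1 - B)^d \, y_t,
\qquad
B \, y_t = y_{t-1}
```

Here y'_t is the series after differencing d times, the first sum is the autoregressive part with p lags, and the second sum is the moving average part with q lags.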

2. Stationary Series

A stationary series is a series whose statistical properties do not change over time. This means that its mean, variance, and autocovariance are all constant over time.

The AR and MA parts of an ARIMA model assume a stationary series, so a non-stationary series has to be made stationary first. This is what the differencing step (the I in ARIMA) is for.

Some of the popular methods to make a series stationary are differencing and detrending. Statistical tests such as the Augmented Dickey-Fuller test are used to check whether a series is stationary.
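As a rough illustration of differencing, the sketch below builds a made-up series with an upward trend and compares the mean of its first and second halves before and after first-order differencing. The trend slope, noise level, and seed are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical series: a linear trend plus noise, so the mean drifts upward
rng = np.random.default_rng(42)
t = np.arange(200)
series = 0.5 * t + rng.normal(scale=1.0, size=t.size)

# First-order differencing: each value minus the one before it
diffed = np.diff(series, n=1)

def half_means(x):
    """Return the means of the first and second halves of x."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

raw_first, raw_second = half_means(series)
diff_first, diff_second = half_means(diffed)

print(f"raw halves:         {raw_first:.2f} vs {raw_second:.2f}")
print(f"differenced halves: {diff_first:.2f} vs {diff_second:.2f}")
```

The two halves of the raw series have very different means, while the halves of the differenced series have nearly the same mean. Note that differencing shortens the series by one observation.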

Demonstration of the ARIMA Model in Python

We will use the auto_arima function, which automatically searches for suitable parameters for an ARIMA model.

In other words, the function determines the parameters p, d, and q of the ARIMA model for us, which is very convenient because data preparation and parameter tuning can be really time-consuming.

We’ll use the pmdarima module, which provides the auto_arima function. So let’s get right into it.

1. Importing Dataset

The dataset we’ll be using for this demonstration is the Electric_Production dataset (downloaded from Kaggle).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

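# Load the data, using the first column as the index
series = pd.read_csv('Electric_Production.csv', index_col=0)

# Divide into train and validation sets, keeping the time order
split = int(0.7 * len(series))
train = series[:split]
valid = series[split:].copy()  # copy, so a column can be added to it later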

#Plotting the data
plt.figure(figsize = (8,5))
ax = plt.gca()
ax.xaxis.set_major_locator(plt.MaxNLocator(20))
plt.xticks(rotation = 45)
plt.plot(series)
plt.show()

2. Check if the series is stationary

Let’s perform the ‘Augmented Dickey-Fuller Test’ to check whether the data is stationary or not.

# Importing required modules
from pmdarima.arima import ADFTest

adf_test = ADFTest(alpha = 0.05)
adf_test.should_diff(series)

Output: (0.01, False)

The pmdarima.arima module provides the ADFTest class to perform the Augmented Dickey-Fuller Test.

We can also use the adfuller function from the statsmodels.tsa.stattools module to perform the test.

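should_diff returns a tuple: the p-value of the test and a flag that says whether the series should be differenced. Here the p-value of 0.01 is below alpha = 0.05 and the flag is False, so the test rejects the unit-root hypothesis and does not itself call for differencing. The ADF test only looks for a unit root, though. A series that passes it can still show a trend or a seasonal pattern, in which case it is usually differenced before fitting ARIMA.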

This is where auto_arima is useful: when d is left unset, it uses the chosen stationarity test to determine the differencing parameter ‘d’ automatically.

3. Implementing the ARIMA Model

#Importing the module
import pmdarima as pmd

arima_model = pmd.auto_arima(train,
                              start_p=0, d=1, start_q=0,
                              test="adf", suppress_warnings=True,
                              trace=True)

#Summary of the model
arima_model.summary()

We passed several important input arguments to the function:

The time series on which to fit the ARIMA model.
start_p: the starting value of p, the order of the auto-regressive (AR) model, for the search.
start_q: the starting value of q, the order of the moving average (MA) model, for the search.
d: the order of differencing. The default is None, in which case the value is selected automatically based on the test. Here we fix it at 1.
test: the type of unit root test used to detect stationarity when d is not given.

4. Checking Model Performance Using MAPE

Now let’s check how good our model is, using mean absolute percentage error (MAPE) as the performance metric.

#Predict the future values
#np.asarray drops the forecast's own index so the values are assigned by position
valid['predicted'] = np.asarray(arima_model.predict(n_periods = len(valid)))

def MAPE(true, pred): 
    true, pred = np.array(true), np.array(pred)
    return np.mean(np.abs((true - pred) / true)) * 100

MAPE(valid.IPG2211A2N, valid.predicted)

Output:

12.44044096590272

The MAPE value should be as low as possible: a lower MAPE indicates that the model’s forecasts deviate less, in percentage terms, from the actual values.
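To see how the number is computed, here is a small worked example of the same formula on made-up values. The actual and forecast lists are invented for illustration and have nothing to do with the electricity data.

```python
import numpy as np

def mape(true, pred):
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean(np.abs((true - pred) / true)) * 100

actual = [100, 200, 400]    # made-up values
forecast = [110, 180, 400]  # made-up values

# Per-point errors are 10%, 10% and 0%, so the mean is about 6.67%
score = mape(actual, forecast)
print(round(score, 2))
```

Because every error is divided by the actual value, MAPE cannot be used when the actual series contains zeros.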

Conclusion

In this article, we learned about the ARIMA model for time series forecasting and implemented it in Python. We saw why the model needs a stationary series and used the Augmented Dickey-Fuller Test to check for stationarity.

Happy Learning!