In this article, we’ll learn about a well-known time series forecasting model – ARIMA Model.
Time series data is different in the sense that the data is recorded at a constant interval of time. Time series data has an added time component to it and each data point in the series depends on the previous data points.
A widely used statistical method for time series forecasting is the ARIMA model.
Suppose we need to forecast the sales of apples and we have previous sales records for each day. This problem can be categorized as time-series modeling.
In this article we are going to implement the ARIMA model.
What is ARIMA?
ARIMA stands for Autoregressive Integrated Moving Average. It is based on describing autocorrelations in the data and is one of the popular and powerful time-series algorithms for analyzing and forecasting time series data.
Let’s break down what ARIMA means:
- Autoregressive (AR): the dependent relationship between an observation and some number of lagged observations. In other words, past values of the series are used to forecast the next value.
- Integrated (I): the differencing operation applied to the series to make it stationary.
- Moving average (MA): the dependence of an observation on past forecast errors. In other words, a number of past forecast errors are used to predict future values.
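To make the "Integrated" part concrete, here is a minimal sketch of differencing with NumPy. The toy series and its values are made up for illustration and are not part of the dataset used later in this article.

```python
import numpy as np

# Hypothetical toy series with an upward trend (not the article's dataset).
sales = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

# First-order differencing (d = 1): each value minus the previous one.
first_diff = np.diff(sales, n=1)

# Second-order differencing (d = 2): difference the differenced series again.
second_diff = np.diff(sales, n=2)

print(first_diff)   # increments between consecutive observations
print(second_diff)  # changes in those increments
```

Each round of differencing shortens the series by one observation, which is one reason the order of differencing is usually kept small.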
1. Parameters of ARIMA Model.
ARIMA requires three parameters, p, d, and q, to build the model.
- p: the number of autoregressive lags. It is associated with the autoregressive part of the model.
- d: the order of differencing required to make the series stationary. It is associated with the integrated part of the model.
- q: the number of moving average lags. It is associated with the moving average part of the model.
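The three parameters can be read off a single equation. The form below is one common way to write an ARIMA(p, d, q) model. The symbols are assumptions introduced here for illustration: y'_t is the series after differencing d times, c is a constant, the phi terms are autoregressive coefficients, the theta terms are moving average coefficients, and epsilon_t is the forecast error at time t.

```latex
y'_t = c
     + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
     + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
     + \varepsilon_t
```

Here p counts the lagged values of the differenced series, q counts the lagged errors, and d is hidden inside y'_t.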
2. Stationary Series
A stationary series is one whose statistical properties do not change over time: its mean, variance, and autocovariance are all constant over time.
The autoregressive and moving average parts of ARIMA assume a stationary series. If the series is not stationary, it must first be made stationary, which is what the differencing step (d) is for.
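As an informal illustration (not a formal statistical test), one can compare the mean and variance of the first and second half of a series. The sketch below uses a made-up trending series and a hypothetical helper, half_stats, that is not part of any library.

```python
import statistics

def half_stats(values):
    """Return (mean, variance) for the first and second half of a sequence."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )

# Hypothetical series: a linear trend plus a small repeating wiggle.
wiggle = [0.0, 1.0, 0.0, -1.0]
trending = [0.5 * t + wiggle[t % 4] for t in range(40)]

# Differencing once removes the linear trend.
differenced = [b - a for a, b in zip(trending, trending[1:])]

print("trending:   ", half_stats(trending))
print("differenced:", half_stats(differenced))
```

The mean of the trending series shifts noticeably between the two halves, while the differenced series keeps roughly the same mean in both.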
Demonstration of the ARIMA Model in Python
We will use the auto_arima function. It automatically searches for the best-fitting parameters for an ARIMA model.
In other words, the function will automatically determine the parameters p, d, and q of the ARIMA model, which is very convenient as the data preparation and parameter tuning processes end up being really time-consuming.
We will use the pmdarima module, which has the auto_arima function. So let's get right into it.
1. Importing Dataset
The dataset we'll be using for this demonstration is the Electric_Production dataset (downloaded from Kaggle).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

series = pd.read_csv('Electric_Production.csv', index_col=0)

#divide into train and validation set
train = series[:int(0.7*(len(series)))]
valid = series[int(0.7*(len(series))):]

#Plotting the data
plt.figure(figsize=(8,5))
ax = plt.gca()
ax.xaxis.set_major_locator(plt.MaxNLocator(20))
plt.xticks(rotation=45)
plt.plot(series)
plt.show()
2. Check if the series is stationary
Let’s perform the ‘Augmented Dickey-Fuller Test’ to check whether the data is stationary or not.
# Importing required modules
from pmdarima.arima import ADFTest

adf_test = ADFTest(alpha=0.05)
adf_test.should_diff(series)
Output: (0.01, False)
We used the ADFTest class from pmdarima.arima to perform the Augmented Dickey-Fuller Test.
We can also use the statsmodels.tsa.stattools module, which has the adfuller function to perform the test.
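The output is a tuple: the p-value of the test and a boolean that says whether differencing is recommended. Here the p-value (0.01) is below the significance level of 0.05, so the test rejects the null hypothesis of a unit root and should_diff returns False. In other words, the ADF test on its own does not call for differencing this series.
auto_arima can run this kind of test for us: when d is left as None, it selects the differencing order automatically using the unit root test named in its test argument. In the call below we set d = 1 explicitly, so the series is differenced once regardless of the test result.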
3. Implementing the ARIMA Model
#Importing the module
import pmdarima as pmd

# test is only used to choose d when d is None; here d is fixed to 1
arima_model = pmd.auto_arima(train,
                             start_p=0, d=1, start_q=0,
                             test="adf",
                             suppress_warnings=True,
                             trace=True)

#Summary of the model
arima_model.summary()
We passed several important arguments to the function:
- y (the first argument): the time series on which to fit the ARIMA model.
- start_p: the starting value of p, the order of the auto-regressive (AR) model, for the search.
- start_q: the starting value of q, the order of the moving average (MA) model, for the search.
- d: the order of first-differencing. The default is None, in which case the value is selected automatically based on the unit root test.
- test: the type of unit root test used to detect stationarity when d is None.
4. Checking Model Performance Using MAPE
Now let's check how good our model is, using mean absolute percentage error (MAPE) as the performance metric.
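#Predict the future values
valid = valid.copy()  # work on a copy so adding a column does not trigger a pandas warning
# np.asarray drops any index on the forecast so values are assigned by position
valid['predicted'] = np.asarray(arima_model.predict(n_periods=len(valid)))

def MAPE(true, pred):
    true, pred = np.array(true), np.array(pred)
    return np.mean(np.abs((true - pred) / true)) * 100

MAPE(valid.IPG2211A2N, valid.predicted)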
The MAPE value should be as low as possible: a lower value indicates that the model's forecasts are, on average, closer to the actual values.
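A small worked example may help when reading the metric. The numbers below are made up, and the lowercase mape helper is a hypothetical variant that also guards against actual values of zero, where the percentage error is undefined.

```python
import numpy as np

def mape(true, pred):
    """Mean absolute percentage error, in percent."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if np.any(true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((true - pred) / true)) * 100)

# Hypothetical actual and forecast values.
actual = [100, 200, 400]
forecast = [110, 180, 400]

# Absolute percentage errors are 10%, 10%, and 0%, so the mean is 20/3.
score = mape(actual, forecast)
print(round(score, 2))
```

Because each error is divided by the actual value, the same absolute miss counts for more when the actual value is small.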
In this article, we learned about the ARIMA model for time series forecasting and implemented it in Python. We discussed why the model needs a stationary series to perform well and used the Augmented Dickey-Fuller Test to check for stationarity.