Recurrent Neural Networks with Keras: A Comprehensive Guide

Recurrent Neural Networks (RNN)

There is no doubt that deep learning is taking over the world with advanced applications like next-word prediction and autonomous vehicles. Deep learning is used almost everywhere in this era, and fortunately it offers many network architectures we can choose from for each use case.

Recurrent Neural Networks are regarded as a special type of neural network because of their ability to retain past information. These networks have a hidden unit that can store past data. With this capability, RNNs have been used in natural language processing, time series analysis, and speech recognition, where the input data is sequential.

We are going to discuss the architecture of RNNs, and how RNNs can be implemented with the help of the Keras library.

Recurrent Neural Networks (RNNs), fundamental in processing sequential data, utilize hidden layers to store past information, crucial for applications like language processing. While traditional RNNs struggle with long sequences, their successors, LSTMs and GRUs, address this limitation. Keras simplifies RNN implementation, with its SimpleRNN layer offering various parameters like unit count and activation functions, making it a versatile tool for tasks like time series prediction.

Architecture of Recurrent Neural Networks

Let’s first understand what RNNs are before we move on to their architecture. RNNs are a type of neural network with a memory that lets them remember past sequences of data. They were the first neural networks that could take past inputs into account, but they were found to lack the power to process long sequential data, which led to later improvements such as LSTMs and GRUs.

Also read: LSTMs

This brings us to the main question: how do these networks remember?

Hidden units. Yes, these networks have a component called the hidden layer which can store past sequential data. This comes in extremely handy when we are dealing with next-word prediction, where information about the previous words is essential to guess what comes next.

Now, let us discuss the architecture of RNNs.

Figure: Architecture of Recurrent Neural Networks

We can say that block h is the crucial component, as it is the hidden layer. It stores the output of the previous cells, which is essentially the past information we are concerned about.
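To make the recurrence concrete, here is a minimal NumPy sketch of what a simple RNN cell computes at each timestep; the weight shapes and random values are purely illustrative and are not taken from the model built later in this article.

import numpy as np

hidden_size, input_size = 4, 3
W_x = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                   # hidden state starts empty
sequence = np.random.randn(5, input_size)   # 5 timesteps of 3 features each

for x_t in sequence:
    # the new hidden state mixes the current input with the previous state,
    # which is how information from earlier timesteps is carried forward
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h)  # hidden state after processing the whole sequence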

While RNNs were a breakthrough in processing sequential data, they struggle with long sequences because the gradients that carry information across many timesteps tend to vanish during training. For this very reason, their successors, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, were introduced to overcome the long-term dependency problem of RNNs.
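As a rough intuition for why long-term dependencies are hard (an illustrative calculation, not part of the original article): when a gradient is multiplied by a factor smaller than one at every timestep during backpropagation through time, it shrinks exponentially and effectively vanishes.

grad = 1.0
for _ in range(100):
    grad *= 0.9        # a typical per-timestep factor below 1
print(grad)            # roughly 2.7e-5, far too small to drive learning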

Building RNNs with Keras Simplified

The Keras library is one of the most useful libraries for building deep learning models, providing model layers, metrics, loss functions, and much more.

For RNNs, Keras provides a layer called SimpleRNN, which can be used to build an RNN model.

We use this layer as a class, whose signature is given below.

keras.layers.SimpleRNN(
    units,
    activation="tanh",
    use_bias=True,
    kernel_initializer="glorot_uniform",
    recurrent_initializer="orthogonal",
    bias_initializer="zeros",
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    seed=None,
    **kwargs
)

Let us discuss the parameters briefly.

  • units is the dimensionality of the output space, i.e. the number of neurons in the layer
  • activation specifies which activation function to use; the default is tanh
  • use_bias tells the layer whether to use a bias vector
  • dropout is the fraction of the input units to drop during training, which helps avoid overfitting
  • kernel_initializer is the initializer for the kernel weights matrix, which transforms the inputs; the default is glorot_uniform
  • bias_initializer is the initializer for the bias vector; the default is zeros
  • return_sequences is a boolean that determines whether to return the full sequence of outputs or only the last output; the default is False
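As a small usage sketch, here is how a SimpleRNN layer might be configured with a few of these parameters; the unit count, dropout rate, and input shape are illustrative values, not taken from the tutorial below.

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(
    units=16,                 # 16 neurons in the recurrent layer
    activation="tanh",        # the default activation
    dropout=0.2,              # drop 20% of the input connections during training
    return_sequences=False,   # return only the last output of the sequence
    input_shape=(10, 3),      # 10 timesteps, 3 features per step
))
model.add(Dense(1))
model.summary()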

Time Series Prediction Using RNN

In this section, we are going to use Keras’ SimpleRNN layer to model synthetic time series data and predict what the next value could be.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense,SimpleRNN

The pandas library is used to create a data frame, numpy for data transformation and computation, and matplotlib for data visualization. Next, we import Keras’ Sequential model along with the Dense and SimpleRNN layers.

n = 2000                                  # total number of samples
T = 900                                   # index where the train/test split happens
t = np.arange(0, n)
x = np.sin(0.02*t) + np.random.rand(n)    # noisy sine wave
df = pd.DataFrame(x)
df

We generate a noisy sine wave with the help of the NumPy library, store it in the array x, and wrap it in a data frame called df.

Figure: The generated data frame

The data frame has 2000 rows since we initialized n to 2000. To plot this data frame, we use matplotlib’s plot method.

plt.plot(df)
plt.show()

Figure: Visualization of the time series data

Up next, we are splitting the data into train and test sets.

#split
val = df.values
train,test = val[0:T,:],val[T:n,:]

The training set contains the first T (900) values, and the test set contains the remaining n - T (1100) values, from index T to n (indices 900 through 1999).

step = 4
# pad both series by repeating the last value so every window has a target
test = np.append(test, np.repeat(test[-1,], step))
train = np.append(train, np.repeat(train[-1,], step))

# convert a series into (window, next value) pairs for supervised learning
def conmat(data, step):
  X, Y = [], []
  for i in range(len(data)-step):
    d = i+step
    X.append(data[i:d,])
    Y.append(data[d,])
  return np.array(X), np.array(Y)

trainX, trainY = conmat(train, step)
testX, testY = conmat(test, step)

In the above snippet, we pad the train and test arrays by repeating their last value and then convert them into input/target matrices with the conmat function: each row of X is a window of step consecutive values, and the corresponding entry of Y is the value that follows that window.
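To see what conmat produces, here is a tiny illustrative example with a toy array (not part of the tutorial’s data) and step = 2.

toy = np.array([10, 20, 30, 40, 50])
X_toy, Y_toy = conmat(toy, 2)
print(X_toy)  # [[10 20] [20 30] [30 40]]
print(Y_toy)  # [30 40 50]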

trainX = np.reshape(trainX,(trainX.shape[0],1,trainX.shape[1]))
testX = np.reshape(testX,(testX.shape[0],1,testX.shape[1]))
trainX.shape
testX.shape

The SimpleRNN layer expects a three-dimensional input of shape (samples, timesteps, features), but our matrices are only two-dimensional. Hence, we reshape them to add a timesteps dimension of 1, so each sample is a single timestep with step features.
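A quick shape check makes this explicit; the exact sample counts below depend on n, T, and step, so treat them as illustrative.

print(trainX.shape)  # e.g. (900, 1, 4): (samples, timesteps, features)
print(testX.shape)   # e.g. (1100, 1, 4)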

model = Sequential()
model.add(SimpleRNN(units=32,input_shape=(1,step),activation="relu"))
model.add(Dense(8,activation="relu"))
model.add(Dense(1))
model.compile(loss='mean_squared_error',optimizer='rmsprop')
model.summary()

We add a SimpleRNN layer with 32 units and the ReLU activation function to the Sequential model, followed by two Dense layers. The model is compiled with mean squared error as the loss and RMSprop as the optimizer.

Figure: Model summary

model.fit(trainX,trainY,epochs = 50,batch_size=16,verbose=2)
trpredict = model.predict(trainX)
tepredict = model.predict(testX)
predicted = np.concatenate((trpredict,tepredict),axis=0)

We fit the model to our data for 50 epochs with a batch size of 16. The predict method is then used on both the training and test inputs to see how well the model has learned the data.
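The predicted array concatenates the train and test predictions, so one quick way to inspect the fit is to overlay it on the original series. The plotting snippet below is an illustrative sketch rather than part of the original walkthrough.

plt.plot(df.values, label="original")
plt.plot(predicted, label="predicted")
plt.axvline(x=T, color="grey", linestyle="--")  # train/test boundary at index T
plt.legend()
plt.show()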

score = model.evaluate(trainX,trainY,verbose=0)
print(score)

Figure: Performance of the model

As the last step, we are going to predict the next value in the time series data.

new_input = testX[-1, :, :]
new_input = np.reshape(new_input, (1, 1, step))
new_prediction = model.predict(new_input)
print("Predicted Next Data Point:", new_prediction[0, 0])

Figure: Next value prediction

According to the model, the next value in the series data is 0.7758928.

Wrapping Up

While RNNs were the first networks to be associated with memory, they have their own set of issues that were later resolved by LSTMs and GRUs. Simple RNNs remain a good fit for tasks with short sequences and no long-term dependencies. What future enhancements do you think could further evolve the capabilities of RNNs?
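As a pointer toward those successors, swapping the SimpleRNN layer for an LSTM in the same Sequential setup is a one-line change; the sketch below reuses the step variable and data shapes from the tutorial above and is illustrative rather than a tuned model.

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(units=32, input_shape=(1, step), activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="rmsprop")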

References

Keras – RNN

Issues with RNN