Leaky ReLU Activation Function in Neural Networks

An activation function in a neural network is a function applied at each node in a layer, producing an output based on that node's input. Functions such as the Sigmoid function or step functions are commonly used as activation functions in neural networks. One such function is the Rectified Linear Unit (ReLU), which is defined as follows:

ReLU(x) = {
    x,  if x > 0
    0,  if x <= 0
}

It is a linear function for inputs greater than 0 and outputs 0 for inputs less than or equal to 0. It follows the graph below:

ReLU Graph
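As a reference point, here is a minimal Python sketch of ReLU, written in the same scalar style as the leakyrelu() function implemented later in this article:

def relu(x):
  # Pass positive inputs through unchanged; everything else becomes 0
  if x > 0:
    return x
  return 0

print(relu(3))   # 3
print(relu(-3))  # 0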

With ReLU, all negative inputs are simply mapped to an output of 0. There are, however, cases in which negative inputs also play a major role. In such cases, another activation function is preferred: the Leaky Rectified Linear Unit, or Leaky ReLU. It is called Leaky ReLU because it still takes negative inputs into account but diminishes the impact they have on the output. Leaky ReLU is defined as follows:

LeakyReLU(x) = {
    x,    if x > 0
    A*x,  if x <= 0
}
where A is a constant defined in the function. A usually has a very small value, such as 0.01 or 0.05.
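For example, with A = 0.05, an input of -4 produces LeakyReLU(-4) = 0.05 × (-4) = -0.2, while an input of 4 is passed through unchanged as 4.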

Leaky ReLU has the following graph:

Leaky ReLU With A=0.2

It can be seen in the above graph that negative inputs no longer dominate the output. Leaky ReLU can be more effective than ReLU in certain use cases, while in other cases the negative inputs need to be ignored completely; in those cases, ReLU is more useful.

The choice between ReLU and Leaky ReLU can be guided by analysis of the dataset. It depends on the negative features among the input parameters and how they are supposed to act on the neural network. This analysis should be done after all the features have been extracted. Many feature-extraction techniques, for example Principal Component Analysis, create a lot of negative feature values. In such cases, Leaky ReLU is more useful on the dataset after feature extraction, while ReLU might have been more suitable before feature extraction. This article covers activation functions in more detail.
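As a quick illustration of why extracted features often take negative values, here is a small sketch using scikit-learn's PCA on hypothetical, entirely non-negative data (not the diabetes dataset used later in this article); because PCA centers the data before projecting it, the resulting component scores are spread around 0 and include negative values:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical non-negative raw features (e.g. counts or intensities)
rng = np.random.default_rng(0)
raw = rng.uniform(0, 10, size=(100, 6))

# Project onto 3 principal components; PCA subtracts the mean first,
# so the transformed feature values include negatives
pca = PCA(n_components=3)
features = pca.fit_transform(raw)
print((features < 0).any())  # True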

Implementing Leaky ReLU in Python

Leaky ReLU has a simple implementation. It uses a basic if-else statement in Python that checks the input against 0. If the input is greater than 0, it is returned unchanged; if it is less than or equal to 0, it is returned after being multiplied by a constant, called A in this article. The code below demonstrates the implementation:

def leakyrelu(A,x):
  # Scale negative inputs by the small constant A; pass positive inputs through
  if x<0:
    return A*x
  else:
    return x
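A quick check of the function on one positive and one negative input (using 0.2 as an example value for A) behaves as expected:

print(leakyrelu(0.2, 5))   # 5
print(leakyrelu(0.2, -5))  # -1.0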

It should be noted that the leakyrelu() function takes the constant A as well as the input number as its arguments. A usually lies in the range 0 to 1 and has a very small value. The following code plots the graph of the leakyrelu() function:

import matplotlib.pyplot as plt

X=[x for x in range(-10,11)]
Y=[leakyrelu(0.2,x) for x in range(-10,11)]
plt.xlim((-10,10))
plt.ylim((-10,10))
# Draw the x and y axes through the origin to split the plot into quadrants
plt.plot([0,0],[-10,10],color='blue')
plt.plot([-10,10],[0,0],color='blue')
plt.plot(X,Y)
plt.show()

It can be seen in the creation of the list Y that the constant A for the leakyrelu() function has been fixed to 0.2. The two plt.plot() calls drawing lines through the origin split the resulting graph into quadrants. The code produces the following output:

Leaky ReLU With A=0.2

On fixing the constant to 0.09, the following graph is produced:

Leaky ReLU With A=0.09

However, this implementation cannot be used directly in Keras neural networks. Keras expects an activation that operates on whole arrays, for example a function that returns a NumPy array or tensor, rather than a scalar if-else applied to one number at a time. That is a secondary concern, though: Keras provides a layer called LeakyReLU, which can be used to implement Leaky ReLU in Keras neural networks.
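As a rough sketch of what such an array-based version could look like, one possibility is to use NumPy's np.where instead of a scalar if-else (leakyrelu_np is a name chosen here purely for illustration):

import numpy as np

def leakyrelu_np(A, x):
  # Element-wise Leaky ReLU: keep positive entries, scale the rest by A
  x = np.asarray(x, dtype=float)
  return np.where(x > 0, x, A * x)

print(leakyrelu_np(0.2, [-5, 0, 5]))  # [-1.  0.  5.]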

LeakyReLU in Keras Python

Keras provides a LeakyReLU layer in Python. It should be noted that even though Leaky ReLU is an activation function, it is provided as a layer in Keras. Hence, the right way to use LeakyReLU in Keras is to leave the preceding layer with a linear (identity) activation and let the LeakyReLU layer compute the activated output.

This is best demonstrated with an example. In this example, the article tries to predict diabetes in a patient using a neural network. The dataset used has 6 features and 2 output columns. One of the output columns will be discarded because it is redundant. The other output column will be modified to encode normal patients as 0, Type 1 diabetic patients as 1, and Type 2 diabetic patients as -1. These preprocessing steps are shown in the code below:

import pandas as pd

# Load the dataset and drop the redundant 'Class' output column
df=pd.read_csv('Diabetestype.csv')
df.head()
df=df.drop('Class',axis=1)

# Encode the 'Type' column numerically: Normal -> 0, Type1 -> 1, everything else (Type 2) -> -1
typecol=list(df['Type'])
types=[]
for i in range(len(typecol)):
  if typecol[i]=='Normal':
    types.append(0)
  elif typecol[i]=='Type1':
    types.append(1)
  else:
    types.append(-1)

# Replace the text labels with the encoded values
df['Type']=types
print(df)

In the above code snippet, the types list holds all the outputs encoded in numerical format as explained above. The assignment df['Type']=types writes the encoded list back into the DataFrame as the output column. The snippet produces the following output:

DataFrame After Preprocessing
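Incidentally, the same encoding could be written more compactly with pandas' map(); the sketch below is an alternative to the explicit loop above and assumes the third class is stored in the CSV as the string 'Type2':

# Run this instead of the loop, not after it (the labels must still be strings)
df['Type'] = df['Type'].map({'Normal': 0, 'Type1': 1, 'Type2': -1})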

Now the input parameters are separated from the output parameter, and both are divided into training and testing sets, as shown in the code below:

Y=df[['Type']]
# Drop the 'Type' column from the inputs so X contains only the 6 features
X=df.drop('Type',axis=1)
from sklearn.model_selection import train_test_split
Xtrain,Xtest,Ytrain,Ytest=train_test_split(X,Y,test_size=0.2)
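A quick sanity check on the split can be done by printing the shapes of the resulting sets; the exact row counts depend on the size of the dataset:

# Roughly 80% of the rows go to training and 20% to testing
print(Xtrain.shape, Xtest.shape)
print(Ytrain.shape, Ytest.shape)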

The output DataFrame Y contains only the Type column, while the input DataFrame X contains all the remaining parameters. The test size chosen here is 20%. Now a neural network can be created. It will be a basic Keras Sequential model with one hidden layer: the input layer has 6 nodes for the 6 inputs, the hidden layer has 3 nodes, and the output layer has 1 node representing the Type column of the DataFrame. All the layers use Leaky ReLU as their activation function. The following code demonstrates this:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense,LeakyReLU

model=Sequential()
# Hidden layer: 3 nodes, 6 input features, no built-in activation (linear/identity)
model.add(Dense(3,input_dim=6))
model.add(LeakyReLU(alpha=0.05))
# Output layer: 1 node, again followed by a LeakyReLU activation layer
model.add(Dense(1))
model.add(LeakyReLU(alpha=0.05))

It should be noted that no activation functions are explicitly specified in the Dense layers of this network, because each Dense layer is followed by a LeakyReLU activation layer. In the Keras LeakyReLU layer, the constant A is called alpha; here alpha is set to 0.05 in both layers. The input dimension is mentioned only for the hidden layer, which specifies that there are 6 inputs. The first argument of Dense() tells Keras how many nodes the layer has, so the last layer has only 1 node, the output node. Here is a visual representation of the neural network:

Visualisation Of Neural Network
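The same structure can also be inspected programmatically with model.summary(), which lists the Dense and LeakyReLU layers along with their parameter counts:

# Print the layer-by-layer structure and parameter counts
model.summary()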

Now the network can be compiled and run to measure training and testing accuracy. The following code does this:

model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(Xtrain,Ytrain,epochs=100,batch_size=10)

It should be noted that the loss function used in the model is categorical cross-entropy, chosen because the model has 3 different output classes. The Adam optimizer is used, and accuracy is used as the evaluation metric. The model trains for 100 epochs with a batch size of 10 samples. This article covers neural networks in more detail.

The above code snippet produces the following output:

Output Of Neural Network Training Model

This output shows that using LeakyReLU as the activation function produces only 61.96% training accuracy. The low accuracy score is due to the fact that none of the input parameters have negative values. The testing accuracy is measured as follows:

# evaluate() returns the loss followed by the metrics defined in compile()
loss,accuracy=model.evaluate(Xtest,Ytest)

It gives 64.85% accuracy, higher than the training accuracy but still quite low.

Output Of Testing Accuracy Of Model

Even though the model achieves only low accuracy, the example clarifies how LeakyReLU is used as an activation function. Dataset analysis is strongly recommended before settling on Leaky ReLU as the activation function; only datasets with many impactful negative inputs will produce the best results with it.

Conclusion

Leaky ReLU is a powerful yet simple activation function used in neural networks. It is a modified version of ReLU in which negative inputs still have an impact on the output. Leaky ReLU should be used where there are many negative input factors that influence the output; it produces the best results on such datasets. Dataset analysis for choosing the activation function should be done after feature extraction.

Leaky ReLU is not provided as a simple activation argument in Python Keras but as a layer. The preceding layer keeps a linear (identity) activation, and its output is processed by the LeakyReLU layer. Leaky ReLU can improve a neural network over ReLU, but only in certain use cases; in other neural networks, ReLU remains the better choice.