Multilayer Perceptron (MLP) – A Sneak Peek

Multilayer Perceptron

Artificial Neural Networks (ANNs), or simply neural networks (NNs), are used by machine learning engineers for tasks ranging from classifying images with Convolutional Neural Networks (CNNs) to handling time series data with Recurrent Neural Networks (RNNs), which can remember past information.

Ever wondered what these complex architectures are built from? As the word ‘neural’ suggests, they are designed to loosely imitate the structure of the human brain. The basic building block of these networks is the perceptron. A perceptron is the simplest form of a neural network: a binary classifier that assigns each input to one of two classes.

The focus of this post is a variant of the perceptron called the Multilayer Perceptron (MLP).

Multilayer Perceptrons (MLPs) overcome the limitations of the perceptron by handling non-linear data. An MLP consists of input, hidden, and output layers and uses backpropagation to learn efficiently. This article explores MLPs and contrasts them with basic perceptrons.

Also read: Building a Single layer perceptron

Difference Between Perceptron and Multilayer Perceptron (MLP)

The main difference between a perceptron and an MLP lies in the architecture. A perceptron has a simple, single-layer architecture: it takes the input, processes it, and produces the output. An MLP follows the same basic flow, but adds one or more layers between the input and the output.

An MLP has an input layer and an output layer, just like a perceptron. In addition, an MLP has one or more hidden layers. These layers are called hidden because their outputs are not observed directly, yet they are the most important components of the network because they perform the intermediate computations. Combined with non-linear activation functions, the hidden layers enable the network to learn non-linear relationships in complex data.

Let us understand the concept of MLP in detail.

Multilayer Perceptron (MLP)

The main reason the MLP was developed was to overcome the key limitation of the perceptron, which can only handle linearly separable data. An MLP, on the other hand, can be used when the relationship between the input and the output is non-linear.
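A classic example of a non-linear relationship is XOR, which no single perceptron can represent because its two classes cannot be separated by one straight line. The sketch below is only an illustration: the weights and thresholds are hand-picked rather than learned, and the names `step` and `xor_mlp` are made up for this example. It shows how one hidden layer with two neurons is enough to compute XOR.

```python
def step(z):
    """Threshold activation: output 1 when the weighted sum is positive."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two neurons that read the raw inputs.
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are 1
    # Output layer: reads the hidden activations, not the raw inputs.
    return step(h_or - h_and - 0.5)


xor_outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```

In practice the weights are not set by hand; they are learned from data, which is where the training algorithm comes in.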

An MLP is a type of feedforward neural network: the output of each layer is fed forward as the input to the next layer.

Looking at the architecture of an MLP makes this clearer.

MLP Architecture

As discussed earlier, an MLP has three kinds of layers: an input layer, one or more hidden layers, and an output layer.

Architecture of Multilayer Perceptron

Similar to a perceptron, an MLP has an input layer, weights and biases that are learned during training, an activation function, and finally an output layer.

What differs is the addition of one or more hidden layers, and how the MLP handles the error in its output.
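To make these ingredients concrete, here is a minimal pure-Python sketch of a forward pass through one hidden layer and one output layer. The helper name `dense` and all weight and bias values are arbitrary choices for illustration, not values from a trained model.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


x = [1.0, 2.0]  # input layer: two features
# Hidden layer with two ReLU neurons (illustrative weights).
hidden = dense(x, weights=[[0.5, -1.0], [1.0, 1.0]], biases=[0.0, -1.0], activation=relu)
# Output layer with one sigmoid neuron, fed by the hidden activations.
output = dense(hidden, weights=[[1.0, 0.5]], biases=[-1.0], activation=sigmoid)
print(hidden, output)  # [0.0, 2.0] [0.5]
```

Each layer only sees the activations of the layer before it, which is what makes the network feedforward.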

The output the model predicts may differ from the expected output. The error is defined as the difference between the true output (y_actual) and the output the model predicts (y_predicted).

$E = y_{\text{actual}} - y_{\text{predicted}}$

The perceptron training algorithm computes this error and adjusts the weights and bias so that the error is minimized and, eventually, the predicted output matches the expected output.
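As a rough sketch of this training rule, the following pure-Python example trains a single perceptron on the AND gate, which is linearly separable. The function names, the learning rate of 1, and the 10 passes over the data are assumptions made for this illustration.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, labels, lr=1, epochs=10):
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y_actual in zip(samples, labels):
            # E = y_actual - y_predicted, as defined above.
            error = y_actual - predict(weights, bias, x)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
weights, bias = train_perceptron(and_inputs, and_labels)
predictions = [predict(weights, bias, x) for x in and_inputs]
print(predictions)  # [0, 0, 0, 1]
```

When a prediction is already correct the error is zero and nothing changes, so the weights stop moving once every sample is classified correctly.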

The MLP, in contrast, uses the backpropagation learning algorithm to update the weights in all of its layers, including the hidden ones.

Backpropagation Algorithm

The backpropagation algorithm allows the network to iteratively adjust its weights and biases to minimize the error function (E).

After the information is passed through the layers in a feedforward manner, the model computes its predicted output, which is compared with the true output using a loss function to measure the error. The gradients of the loss with respect to the weights and biases are then propagated backward, layer by layer, from the output layer toward the input layer, and each layer's weights and biases are adjusted accordingly. This process repeats until the error stops decreasing or a set number of iterations is reached.
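The loop described above can be sketched in pure Python for a tiny network with one hidden layer. This is a simplified illustration under several assumptions: sigmoid activations in both layers, binary cross-entropy as the loss, plain gradient descent for the update, and XOR as toy data. All function names are made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Feedforward pass: returns hidden activations and the prediction."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y_hat


def loss_and_grads(X, y, params):
    """Mean binary cross-entropy and its gradients via the chain rule."""
    W1, b1, W2, b2 = params
    n = len(X)
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    loss = 0.0
    for x, t in zip(X, y):
        h, y_hat = forward(x, params)
        loss -= (t * math.log(y_hat) + (1 - t) * math.log(1 - y_hat)) / n
        d_out = (y_hat - t) / n  # gradient at the output pre-activation
        gb2 += d_out
        for j, hj in enumerate(h):
            gW2[j] += d_out * hj
            # Pass the error back through W2 and the sigmoid derivative.
            d_hid = d_out * W2[j] * hj * (1 - hj)
            gb1[j] += d_hid
            for i, xi in enumerate(x):
                gW1[j][i] += d_hid * xi
    return loss, (gW1, gb1, gW2, gb2)


def train(X, y, hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        loss, (gW1, gb1, gW2, gb2) = loss_and_grads(X, y, (W1, b1, W2, b2))
        history.append(loss)
        # Gradient descent step: move every parameter against its gradient.
        W1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, gW1)]
        b1 = [b - lr * g for b, g in zip(b1, gb1)]
        W2 = [w - lr * g for w, g in zip(W2, gW2)]
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
params, history = train(X, y)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Libraries such as Keras compute these gradients automatically, so this hand-written version is only meant to show what happens inside a training step.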

Also check: Guide to backpropagation

Simple Implementation

Let us take a look at the basic construction of an MLP with hidden layers.

#import the necessary libraries
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

We will mostly use the Keras library for model building and optimization. The scikit-learn library is imported to create a synthetic dataset and to split it into training and testing sets.

#creating and splitting the dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The synthetic dataset contains 1,000 records with 20 features each and two class labels. It is then split into training and testing sets in an 80:20 ratio.

model = Sequential()
model.add(Dense(units=64, input_dim=20, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

We are building a sequential model with an input layer for 20 features and a hidden layer with 64 units. The activation function used here is ReLU. Another hidden layer is added with 32 units and a ReLU activation function. Lastly, we define the output layer with a single unit using a sigmoid activation function.
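As a quick sanity check on this architecture, the number of trainable parameters can be worked out by hand: each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a made-up name for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 20 input features -> 64 units -> 32 units -> 1 output unit
layer_sizes = [20, 64, 32, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [1344, 2080, 33] 3457
```

Calling `model.summary()` on the Keras model should report the same counts per layer.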

#model compilation
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])
#model training
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy*100:.2f}%")

The model is compiled with the binary cross-entropy loss function and the Adam optimizer, and accuracy is used as the metric to assess its performance. The model is then trained on the training set for 10 epochs (full passes over the training data), with 10% of the training data held out for validation, and is evaluated on the test set. Finally, the test loss and accuracy are printed.
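To show what the loss and the metric measure, here is a small pure-Python sketch of binary cross-entropy and accuracy. The sample labels and probabilities are invented for illustration, and the clipping constant `eps` is an assumption of this sketch that guards against taking the logarithm of zero.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_prob) if (p > threshold) == bool(t))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.4, 0.2]  # made-up sigmoid outputs
bce = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss: {bce:.4f}, accuracy: {acc:.2f}")
```

The third sample is predicted with probability 0.4 for a true label of 1, so it counts as a miss for accuracy and also contributes the largest share of the loss.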

The Adam optimizer is chosen for its efficiency in handling large datasets and sparse gradients. It combines the advantages of two other popular optimizers: AdaGrad and RMSProp. Adam adapts the step size of each parameter during training using running estimates of the gradient's mean and squared magnitude, which gives a good balance of speed and stability and makes it a sensible default for our MLP model.
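The update rule behind Adam can be sketched for a single parameter as follows. This is a simplified illustration rather than the Keras implementation: the function name `adam_minimize`, the toy objective (w - 3)^2, and the learning rate of 0.1 are assumptions made for the example, while the beta and epsilon values are the commonly used defaults.

```python
import math


def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    trajectory = [w]
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size is scaled by the recent gradient magnitude.
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        trajectory.append(w)
    return w, trajectory


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final, trajectory = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
print(f"first step: {trajectory[1]:.4f}, final w: {w_final:.4f}")
```

Note that the first step has a size of roughly `lr` regardless of how large the gradient is, which is one reason Adam tends to be less sensitive to the scale of the gradients than plain gradient descent.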

MLP Training


Multilayer Perceptrons represent a significant evolution in the field of neural networks, offering solutions to complex, non-linear problems that simple perceptrons can’t handle. By understanding the architecture and workings of MLPs, we open up new possibilities in machine learning. As the field continues to innovate, we can’t help but wonder: what applications will MLPs enable in the near future?