Neural Networks in Python – A Complete Reference for Beginners

How To Make A Neural Network In Python

Neural networks are interconnected groups of neurons that perform mathematical computations, and they have gained a lot of popularity because of their successful applications in the field of Artificial Intelligence. In this tutorial, you will learn how to make a neural network that can recognize digits in an image, with a simple implementation using TensorFlow.


What is a neural network?

A neural network is a powerful learning algorithm used in Machine Learning that provides a way of approximating complex functions and learning relationships between data and labels. Neural networks are inspired by the human brain and mimic the way it operates.

Neurons

Inspired by a biological neuron, a single artificial neuron is a tree-like structure that consists of input nodes, a single output, and other components, as shown below:

Artificial Neuron

Components involved in a single neuron are:

  1. Input Nodes: Input nodes contain information in the form of real numerical values. This information is processed by the neuron.
  2. Weights: Between a single input node and the neuron there exists a connection with an associated weight, which determines the fraction of the information that is passed on to the neuron. These weights are the parameters that the neural network learns in order to model a relationship.
  3. Summation: In the next step, all the input nodes along with their associated weights are brought together and a weighted sum is calculated, i.e., ysum = Σ Wj*Ij, or ysum = W1*I1 + W2*I2 + ... + Wn*In.
  4. Activation Function: The result of the summation is the input to a function called the activation function. The activation function decides whether the neuron should activate itself or not, based on the calculated weighted sum. The output of this step is y = f(ysum), where f() is the activation function.
  5. Output Node: The result of the activation function is passed on to the other neurons in the neural network. A minimal code sketch of a single neuron follows this list.
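
To make these components concrete, below is a minimal sketch of a single artificial neuron using NumPy. The input values, weights, and the choice of a sigmoid activation are illustrative assumptions, not taken from any particular library:

import numpy as np

def sigmoid(x):
  # Squashes any real value into the range (0, 1)
  return 1 / (1 + np.exp(-x))

def neuron(inputs, weights):
  # Summation: the weighted sum of all inputs
  y_sum = np.dot(weights, inputs)
  # Activation: apply the activation function to the weighted sum
  return sigmoid(y_sum)

# Example with 3 input nodes and illustrative weights
inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, 0.2])
print(neuron(inputs, weights))  # a single output value in (0, 1)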

Layers

A layer in a neural network consists of nodes/neurons of the same type. It is a stacked aggregation of neurons. To define a layer in a fully connected neural network, we specify 2 properties of a layer:

  1. Units: The number of neurons present in a layer.
  2. Activation Function: An activation function that triggers the neurons present in the layer. Commonly used activation functions (sketched in code after this list) are:
    • ReLU Activation: The Rectified Linear Unit (ReLU) function returns the value itself if it is positive, else returns 0. It is a non-linear activation function.
    • Sigmoid Activation: The sigmoid function maps a value from the range (-∞, ∞) to (0, 1). The sigmoid function is widely used in binary classification problems, where we have only 2 classes to predict, and it represents the probability of one of the classes.
    • Softmax Activation: The softmax function calculates a probability distribution over n events. It takes n values and converts each of them into a value in the range 0 – 1 representing its probability of occurrence. It is used for multi-class classification, where we have more than 2 classes to predict.
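
As a quick illustration, here is how these three activation functions could be written with NumPy (a minimal sketch for intuition; Keras ships its own implementations):

import numpy as np

def relu(x):
  # Returns the value itself where it is positive, else 0
  return np.maximum(0, x)

def sigmoid(x):
  # Maps any real value into the range (0, 1)
  return 1 / (1 + np.exp(-x))

def softmax(x):
  # Converts n values into a probability distribution summing to 1
  exps = np.exp(x - np.max(x))  # subtract the max for numerical stability
  return exps / np.sum(exps)

print(relu(np.array([-2.0, 3.0])))         # [0. 3.]
print(sigmoid(np.array([0.0])))            # [0.5]
print(softmax(np.array([1.0, 2.0, 3.0])))  # three values summing to 1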

Neural Network

When multiple layers are connected in some fashion, a neural network is formed. Thus, a neural network is a stacked aggregation of layers. Layers can be connected in a linear fashion as well as in a tree-like structure, depending on the requirements.

The first layer of the neural network is called the Input Layer, the last layer of the neural network, which gives the output, is called the Output Layer, and all the intermediate layers are called Hidden Layers.

Defining a neural network takes 3 properties:

  1. Architecture: The number and types of layers that you use in your neural network, and how you connect them, define its architecture. Different architectures give different results.
  2. Loss Function: The loss function tells our model how to find the error between the actual value and the value predicted by the model. We want our model to minimize the value of the loss function during training. Commonly used loss functions include mean squared error and categorical cross-entropy; a worked example of the latter follows this list.
  3. Optimizer: The optimizer tells our model how to update the weights/parameters of the model by looking at the data and the loss function value. Commonly used optimizers include SGD, RMSprop, and Adam.
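
To make the loss function idea concrete, here is a small illustrative computation of categorical cross-entropy for a single example; the label and predicted probabilities below are made-up values:

import numpy as np

# One-hot encoded true label for class 2 (out of 3 classes)
y_true = np.array([0.0, 0.0, 1.0])
# Probabilities predicted by a hypothetical model
y_pred = np.array([0.1, 0.2, 0.7])

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.357; a confident, correct prediction would drive this toward 0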

How to make a Neural Network?

In this tutorial, we will make a neural network in Python that can classify digits present in an image, using the TensorFlow module.

1. Importing Modules

First, we will import the modules used in the implementation. We will be using TensorFlow to make the neural network and Matplotlib to display images and plot the metrics.

import tensorflow as tf
import matplotlib.pyplot as plt

2. Exploring the Data

Next, we will load the dataset in our notebook and see what it looks like. We will be using the MNIST dataset, already present in the TensorFlow module, which can be accessed via the API tf.keras.datasets.mnist.

The MNIST dataset consists of 60,000 training images and 10,000 test images, along with labels representing the digit present in each image. Each image is made up of 28×28 grayscale pixels. We will load the dataset using the load_data() method.

mnist = tf.keras.datasets.mnist
(train_images, train_labels) , (test_images, test_labels) = mnist.load_data()

Let's look at the shapes of the above variables and also at what our dataset looks like:

# Printing the shapes
print("train_images shape: ", train_images.shape)
print("train_labels shape: ", train_labels.shape)
print("test_images shape: ", test_images.shape)
print("test_labels shape: ", test_labels.shape)


# Displaying first 9 images of dataset
fig = plt.figure(figsize=(10,10))

nrows=3
ncols=3
for i in range(9):
  fig.add_subplot(nrows, ncols, i+1)
  plt.imshow(train_images[i])
  plt.title("Digit: {}".format(train_labels[i]))
  plt.axis(False)
plt.show()
MNIST Dataset

3. Preprocessing the Data

You should always preprocess your data before using it to train a neural network. Preprocessing the dataset makes it ready to be fed as input to the machine learning model.

Images in our dataset are made up of grayscale pixels in the range 0 – 255. Machine learning models work better if the range of values in the dataset is small, so we convert it to the range 0 – 1 by dividing the pixel values by 255.

We also convert our labels from digit labels to one-hot encoded vectors. A one-hot encoded vector is a binary vector representation of a label in which all elements are 0 except the index of the corresponding label, whose value is 1. We will use the to_categorical() method to convert the labels to one-hot vectors.

For example, for label 2, index 2 will have a 1 and the rest will all be 0 ( [ 0 0 1 0 0 0 0 0 0 0 ] ).

# Converting image pixel values to 0 - 1
train_images = train_images / 255
test_images = test_images / 255

print("First Label before conversion:")
print(train_labels[0])

# Converting labels to one-hot encoded vectors
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

print("First Label after conversion:")
print(train_labels[0])

Its output is:

First Label before conversion:
5
First Label after conversion:
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

4. Build your Neural Network

Building a neural network takes 2 steps: configuring the layers and compiling the model.

Setting up the layers

This will be the architecture of our model:

  1. Flatten Layer: Our input images are 2D arrays. The Flatten layer converts the 2D arrays (of 28 by 28 pixels) into a 1D array (of 28*28 = 784 pixels) by unstacking the rows one after another. This layer just changes the data shape; no parameters/weights are learned.
  2. Hidden Layer: Our only hidden layer is a fully connected Dense layer of 512 nodes (or neurons), each with a ReLU activation function.
  3. Output Layer: The output layer of the neural network is a Dense layer with 10 output neurons, which outputs 10 probabilities, one for each digit 0 – 9, representing the probability of the image being the corresponding digit. The output layer is given the softmax activation function to convert the input activations to probabilities.
Neural Network Architecture

Since the output of each layer is fed as input to only one other layer and all the layers are stacked in a linear fashion, we will use the Sequential() API, which takes a list of layers that come in order one after another.

# Using Sequential() to build layers one after another
model = tf.keras.Sequential([
  
  # Flatten Layer that converts images to 1D array
  tf.keras.layers.Flatten(),
  
  # Hidden Layer with 512 units and relu activation
  tf.keras.layers.Dense(units=512, activation='relu'),
  
  # Output Layer with 10 units for 10 classes and softmax activation
  tf.keras.layers.Dense(units=10, activation='softmax')
])

Compiling the model

Before we train our model, we need to tell it a few things. Here are the 3 attributes given to the model during the model's compile step:

  1. Loss Function: This tells our model how to find the error between the actual label and the label predicted by the model. We want our model to minimize this value during training. We will use the categorical_crossentropy loss function for our model.
  2. Optimizer: This tells our model how to update the weights/parameters of the model by looking at the data and the loss function value. We will use the adam optimizer for our model.
  3. Metrics (Optional): This contains a list of metrics used to monitor the training and test steps. We will use accuracy, the fraction of images that our model classifies correctly.
model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)

5. Training a neural network

Training a neural network normally takes a lot of boilerplate code, including forward propagation, computing the loss using the loss function, backpropagating the error, and updating the weights using the optimizer. However, frameworks like TensorFlow take care of all of this for you.
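
For intuition, this is roughly what a single training step looks like when written by hand with tf.GradientTape; fit() performs the equivalent of this loop over batches for us (a minimal sketch, not the exact internals of fit()):

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(model, images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images, training=True)  # forward propagation
    loss = loss_fn(labels, predictions)         # loss between labels and predictions
  # Backpropagation: gradients of the loss w.r.t. the model weights
  gradients = tape.gradient(loss, model.trainable_variables)
  # Weight update performed by the optimizer
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  return loss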

To train our neural network, we will call the fit() method on the model, which takes:

  1. Training Data: Here we use train_images, the images that we will feed to the neural network.
  2. Training Labels: Here we use train_labels, the labels that represent the output for our training images.
  3. Epochs: The number of times our model will iterate over all the training examples. For example, if we specify 10 epochs, our model will run on all 60,000 training images 10 times.

The fit() method returns a history object that contains, for each epoch, the loss values and the metrics specified at compile time.

history = model.fit(
  x = train_images,
  y = train_labels,
  epochs = 10
)

Its output is:

Epoch 1/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.1994 - accuracy: 0.9412
Epoch 2/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0818 - accuracy: 0.9745
Epoch 3/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0529 - accuracy: 0.9836
Epoch 4/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0372 - accuracy: 0.9883
Epoch 5/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0270 - accuracy: 0.9915
Epoch 6/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0218 - accuracy: 0.9928
Epoch 7/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0169 - accuracy: 0.9942
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0139 - accuracy: 0.9953
Epoch 9/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0122 - accuracy: 0.9961
Epoch 10/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0104 - accuracy: 0.9966

We got a training accuracy of 99.66%, pretty good. Here we can see the loss value decreasing and the accuracy increasing after each epoch. We can also plot these values using Matplotlib.

# Showing plot for loss
plt.plot(history.history['loss'])
plt.xlabel('epochs')
plt.legend(['loss'])
plt.show()

# Showing plot for accuracy
plt.plot(history.history['accuracy'], color='orange')
plt.xlabel('epochs')
plt.legend(['accuracy'])
plt.show()
Loss Value Plot
Accuracy Value Plot

6. Evaluating a neural network

Now that we have trained our neural network, we would like to see how it performs on data our model hasn't seen before. For this, we will use our test dataset and check how accurate the model is on it by calling the evaluate() method on the model.

# Call evaluate to find the accuracy on test images
test_loss, test_accuracy = model.evaluate(
  x = test_images, 
  y = test_labels
)

print("Test Loss: %.4f"%test_loss)
print("Test Accuracy: %.4f"%test_accuracy)

Its output is:

313/313 [==============================] - 1s 2ms/step - loss: 0.0852 - accuracy: 0.9799
Test Loss: 0.0852
Test Accuracy: 0.9799

With our trained model, we can also make predictions on new images and see what our model identifies in them. We make predictions in 2 steps:

  1. Predicting Probabilities: We will use model.predict(), which returns the probabilities of an image belonging to each of the classes. In our example, for a single image it will return 10 probabilities, one for each digit 0 – 9.
  2. Predicting Classes: Now that we have 10 probabilities, the class with the maximum probability is the one predicted by the model. To find it, we will use tf.argmax(), which returns the index with the maximum value.
predicted_probabilities = model.predict(test_images)
predicted_classes = tf.argmax(predicted_probabilities, axis=-1).numpy()

Now we can see what our model has predicted. You can change the index to see the output for different test images.

index=11

# Showing image
plt.imshow(test_images[index])

# Printing Probabilities
print("Probabilities predicted for image at index", index)
print(predicted_probabilities[index])

print()

# Printing Predicted Class
print("Probabilities class for image at index", index)
print(predicted_classes[index])
Neural Network Prediction
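
Because test_labels were one-hot encoded earlier, you can also recover the true class with tf.argmax() and compare it against the prediction. This is a small illustrative check, reusing the variables defined above:

# Recovering the true class from the one-hot encoded label
true_classes = tf.argmax(test_labels, axis=-1).numpy()
print("True class:", true_classes[index])
print("Prediction correct:", predicted_classes[index] == true_classes[index])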

Final Code

import tensorflow as tf
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist
(train_images, train_labels) , (test_images, test_labels) = mnist.load_data()

# Printing the shapes
print("train_images shape: ", train_images.shape)
print("train_labels shape: ", train_labels.shape)
print("test_images shape: ", test_images.shape)
print("test_labels shape: ", test_labels.shape)

# Displaying first 9 images of dataset
fig = plt.figure(figsize=(10,10))

nrows=3
ncols=3
for i in range(9):
  fig.add_subplot(nrows, ncols, i+1)
  plt.imshow(train_images[i])
  plt.title("Digit: {}".format(train_labels[i]))
  plt.axis(False)
plt.show()


# Converting image pixel values to 0 - 1
train_images = train_images / 255
test_images = test_images / 255

print("First Label before conversion:")
print(train_labels[0])

# Converting labels to one-hot encoded vectors
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

print("First Label after conversion:")
print(train_labels[0])


# Defining Model
# Using Sequential() to build layers one after another
model = tf.keras.Sequential([

  # Flatten Layer that converts images to 1D array
  tf.keras.layers.Flatten(),
  
  # Hidden Layer with 512 units and relu activation
  tf.keras.layers.Dense(units=512, activation='relu'),
  
  # Output Layer with 10 units for 10 classes and softmax activation
  tf.keras.layers.Dense(units=10, activation='softmax')
])

model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)

history = model.fit(
  x = train_images,
  y = train_labels,
  epochs = 10
)


# Showing plot for loss
plt.plot(history.history['loss'])
plt.xlabel('epochs')
plt.legend(['loss'])
plt.show()

# Showing plot for accuracy
plt.plot(history.history['accuracy'], color='orange')
plt.xlabel('epochs')
plt.legend(['accuracy'])
plt.show()


# Call evaluate to find the accuracy on test images
test_loss, test_accuracy = model.evaluate(
  x = test_images, 
  y = test_labels
)

print("Test Loss: %.4f"%test_loss)
print("Test Accuracy: %.4f"%test_accuracy)

# Making Predictions
predicted_probabilities = model.predict(test_images)
predicted_classes = tf.argmax(predicted_probabilities, axis=-1).numpy()

index=11

# Showing image
plt.imshow(test_images[index])

# Printing Probabilities
print("Probabilities predicted for image at index", index)
print(predicted_probabilities[index])

print()

# Printing Predicted Class
print("Probabilities class for image at index", index)
print(predicted_classes[index])

Conclusion

Congratulations! Now you know about neural networks and how to make one in Python to classify digit images. Hope you liked it! Stay tuned to learn more!

Thanks for reading!