Optimizing Neural Networks with torch.optim in PyTorch


PyTorch is a popular machine learning library for the Python programming language and a handy tool for building neural networks. Its torch.optim module supplies the optimization algorithms used to train these models.

In this article, we will take an in-depth look at the torch.optim module, its key components, and their implementation in Python.

The torch.optim module in PyTorch provides various optimization algorithms commonly used for training neural networks. These algorithms minimize the loss function by adjusting the weights and biases of the network, ultimately improving the model’s performance.

Recommended: Converting Between Pytorch Tensors and Numpy Arrays in Python

What is torch.optim?

The torch.optim module, as mentioned above, provides us with multiple optimization algorithms that are most commonly used to minimize the loss function during the training of neural networks. In short, these algorithms adjust the weights and biases of the neural network to improve the performance of the model.
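Before diving into the components, here is a minimal sketch of the usual workflow; the tiny model, layer sizes, and random data below are made up purely for illustration. You build an optimizer over the model's parameters, and on each training step you clear the old gradients, backpropagate the loss, and apply the update.

import torch
import torch.nn as nn
import torch.optim as optim

# Minimal sketch of the standard torch.optim workflow (toy model and data)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)  # toy inputs
y = torch.randn(8, 1)  # toy targets

loss = criterion(model(x), y)
optimizer.zero_grad()  # clear gradients left over from the previous step
loss.backward()        # compute gradients of the loss w.r.t. the parameters
optimizer.step()       # update the parameters using the chosen algorithm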

Key Components of torch.optim

1. Optimizer Classes

torch.optim provides various classes, each implementing a specific optimization algorithm. Some popular optimizers are SGD (Stochastic Gradient Descent, which updates the model parameters along the negative gradient to reduce the loss), Adam (which combines momentum with RMSprop-style adaptive learning rates), Adagrad (which adapts each parameter's learning rate based on its historical gradients), and RMSprop (an adaptive optimizer that scales updates by a running average of squared gradients). A small construction sketch is shown below.
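As a rough sketch (the tiny nn.Linear model and the hyperparameter values here are arbitrary placeholders), each of these classes is constructed the same way: pass it the parameters to optimize plus its own hyperparameters.

import torch.nn as nn
import torch.optim as optim

# Arbitrary toy model, used only so there are parameters to optimize
model = nn.Linear(10, 2)

# Each optimizer class takes the parameters to update plus its own hyperparameters
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = optim.Adam(model.parameters(), lr=0.001)
adagrad = optim.Adagrad(model.parameters(), lr=0.01)
rmsprop = optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99)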

2. Parameter Groups

An optimizer in PyTorch can handle multiple parameter groups. A parameter group is essentially a dictionary that pairs a set of parameters with their optimization options (such as learning rate or weight decay). This lets users apply different learning rates or weight decay to different parts of the model, as sketched below.
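Here is a small sketch of parameter groups; the two-layer model and the specific values are assumptions for illustration only. The optimizer receives a list of dictionaries, and any option not set in a group falls back to the default passed to the constructor.

import torch.nn as nn
import torch.optim as optim

# Toy two-layer model used only to illustrate parameter groups
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))

optimizer = optim.SGD(
    [
        {"params": model[0].parameters()},  # uses the default lr below
        {"params": model[2].parameters(), "lr": 0.001, "weight_decay": 1e-4},  # its own options
    ],
    lr=0.01,
)

# Each group is stored as a dictionary in optimizer.param_groups
for group in optimizer.param_groups:
    print(group["lr"], group["weight_decay"])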

3. Learning Rate Schedulers

torch.optim also includes learning rate schedulers that adjust the learning rate during training. Some common schedulers are StepLR, MultiStepLR, and ReduceLROnPlateau. A minimal StepLR sketch follows.
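As a minimal sketch (the step_size, gamma, and toy model are assumed values), StepLR multiplies the optimizer's learning rate by gamma every step_size epochs; scheduler.step() is called once per epoch, after optimizer.step().

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 10 epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... the forward pass, loss.backward(), etc. would go here ...
    optimizer.step()   # placeholder parameter update for this sketch
    scheduler.step()   # advance the schedule once per epoch
    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch + 1}: lr = {scheduler.get_last_lr()}')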

Let us now further understand torch.optim with an example in Python programming language.

Recommended: What Are the Pre-trained Models Available in PyTorch?

Example: SGD Optimizer

In this example, we will create a simple neural network and train it on a dataset using the SGD optimizer. Let us look at the code.

import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network class
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Set random seed for reproducibility
torch.manual_seed(42)

# Define input size, hidden size, and output size
input_size = 10
hidden_size = 20
output_size = 5

# Create an instance of the SimpleNN class
model = SimpleNN(input_size, hidden_size, output_size)

# Define a synthetic dataset
input_data = torch.randn(100, input_size)
target = torch.randn(100, output_size)

# Define a loss function
criterion = nn.MSELoss()

# Define the SGD optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
epochs = 100
for epoch in range(epochs):
    # Forward pass
    output = model(input_data)

    # Compute the loss
    loss = criterion(output, target)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss for every few epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

Let us look at the output below.

SGD Optimizer Output

Thus, we have used the SGD optimizer to minimize the mean squared error loss. The learning rate is set to 0.01 and the model is trained for 100 epochs.
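For intuition, here is a rough sketch of what optimizer.step() does for plain SGD without momentum; the tensor below is a made-up stand-in for a single model parameter. Each parameter is updated as param = param - lr * param.grad.

import torch

lr = 0.01
w = torch.randn(3, requires_grad=True)  # stand-in for a model parameter

loss = (w ** 2).sum()
loss.backward()

with torch.no_grad():
    w -= lr * w.grad  # the same update optimizer.step() applies to each parameter
w.grad.zero_()        # the same reset optimizer.zero_grad() performs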

Example: Adam Optimizer

Let us look at another Python code where we have used Adam Optimizer.

import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np

# Generate synthetic dataset
torch.manual_seed(42)  # For reproducibility

# Generate random data
X = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = 3 * X + 1 + 0.2 * torch.randn(X.size())

# Define a simple linear regression model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

# Instantiate the model
model = LinearRegression()

# Define the Mean Squared Error (MSE) loss
criterion = nn.MSELoss()

# Define the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 1000
losses = []

for epoch in range(num_epochs):
    # Forward pass
    predictions = model(X)
    loss = criterion(predictions, y)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save the loss for plotting
    losses.append(loss.item())

    # Print the loss every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Plot the training progress
plt.plot(range(1, num_epochs+1), losses, label='Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss over Epochs')
plt.legend()
plt.show()

# Make predictions using the trained model
with torch.no_grad():
    predicted_y = model(X)

# Plot the original data and the predicted values
plt.scatter(X.numpy(), y.numpy(), label='Original Data')
plt.plot(X.numpy(), predicted_y.numpy(), 'r-', label='Predicted Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression with Adam Optimizer')
plt.legend()
plt.show()

In the code above, we have used the Adam optimizer to train a simple linear regression model; the training loop runs for 1000 epochs. Let us also look at the output and its plots.

Adam Optimizer Output
Adam Optimizer Plot
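In the example above we only set lr. For reference, and purely as a sketch with illustrative values, these are the main knobs optim.Adam exposes:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)

# Adam's main hyperparameters: lr, the (beta1, beta2) moment decay rates,
# eps for numerical stability, and an optional weight_decay (values are illustrative)
optimizer = optim.Adam(
    model.parameters(),
    lr=0.01,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)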

Summary

torch.optim is a powerful module in PyTorch that simplifies the optimization process for training neural networks. With a wide range of optimization algorithms and useful features like parameter groups and learning rate schedulers, torch.optim helps developers train models and achieve better performance efficiently. As you continue using PyTorch, keep working with torch.optim to build and optimize your neural networks. Which optimizer will you choose for your next project?

Recommended: A Quick Guide to Pytorch Loss Functions