
How to train a Neural Network?

In training, we need to perform the following tasks repeatedly:

  • Forward Pass
  • Calculate Loss
  • Backward Pass
  • Update Weights and Biases

If this process is done correctly, the training loss should decrease, meaning that the model is learning to fit the data. If the loss stays high or keeps increasing, the model is underfitting. If, on the other hand, the model fits the training data almost perfectly but performs poorly on data it has not seen, it is overfitting. We need to find a balance between underfitting and overfitting until we achieve "Generalization." Our ultimate goal is to learn from the training data in a way that also yields accurate predictions on unseen data.

Example: Suppose we have a neural network with two inputs, one hidden layer of two nodes, and one output node. The network is trained on a dataset that contains two inputs (\(x[0]\) and \(x[1]\)) and one output (\(y\)). The goal is to find the weights and biases that produce the correct output (\(y\)) for each input pair (\(x[0]\), \(x[1]\)).

  • Forward Pass: In the forward pass, we feed the input data (\(x[0]\) and \(x[1]\)) into the network and calculate the predicted output (\(\hat{y}\)). This is done by multiplying each input by its corresponding weight, adding the biases, and passing the result through an activation function.

  • Calculate Loss: In this step, we calculate the error between the predicted output (\(\hat{y}\)) and the actual output (\(y\)). This is typically done using a loss function, such as mean squared error.

  • Backward Pass: In the backward pass, we calculate the gradient of the loss with respect to each weight and bias in the network. This is done using the chain rule of differentiation, which allows us to backpropagate the error from the output layer to the input layer.

  • Update Weights and Biases: Finally, we update the weights and biases in the network using the gradients calculated in the backward pass. This is typically done using an optimization algorithm, such as gradient descent, which adjusts the weights and biases in the direction of the steepest decrease in the loss. These four steps are written out as equations after this list.

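Written as equations, these steps look as follows. This is a compact sketch for the two-input example, using the sigmoid activation \(\sigma(z) = 1/(1 + e^{-z})\); note that the code below works in matrix form and, to keep things short, omits the bias terms:

\[ h_j = \sigma\!\left(\sum_i w^{(0)}_{ij}\, x_i + b^{(0)}_j\right), \qquad \hat{y} = \sigma\!\left(\sum_j w^{(1)}_{j}\, h_j + b^{(1)}\right) \]

\[ L = \tfrac{1}{2}\,(y - \hat{y})^2 \]

\[ \frac{\partial L}{\partial w^{(1)}_j} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w^{(1)}_j} = -(y - \hat{y})\,\hat{y}\,(1 - \hat{y})\,h_j \]

\[ w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b} \]

Here \(\eta\) is the learning rate, and the same chain-rule expansion gives the gradients for \(w^{(0)}\) by propagating the error one layer further back.
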
Programmatically

Python Code Example 1:

neural_network_training.py
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, valid when x is already a sigmoid output
def sigmoid_derivative(x):
    return x * (1 - x)

# Input dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])

# Output dataset
y = np.array([[0],[1],[1],[0]])

print(f"Training Data | X: {X}")
print(f"Output Data | y: {y}")
# Seed random numbers for reproducibility
np.random.seed(1)

# Initialize weights randomly with mean 0
weights0 = 2 * np.random.random((2,2)) - 1
weights1 = 2 * np.random.random((2,1)) - 1

# Training loop
for i in range(10000):
    # Forward pass: calculate predicted output
    layer0 = X
    layer1 = sigmoid(np.dot(layer0, weights0))
    layer2 = sigmoid(np.dot(layer1, weights1))

    # Calculate error
    layer2_error = y - layer2

    # Backward pass: calculate gradient
    layer2_delta = layer2_error * sigmoid_derivative(layer2)
    layer1_error = layer2_delta.dot(weights1.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update weights
    weights1 += layer1.T.dot(layer2_delta)
    weights0 += layer0.T.dot(layer1_delta)

# Final prediction
# print(layer2)

# Test dataset
X_test = np.array([[0,1],[1,0]])
print(f"Test data: {X_test}")

# Forward pass for test data
layer0_test = X_test
layer1_test = sigmoid(np.dot(layer0_test, weights0))
layer2_test = sigmoid(np.dot(layer1_test, weights1))

# Prediction
print(f"Prediction for test data: {layer2_test}")
> python3 neural_network_training.py
Training Data | X: [[0 0]
[0 1]
[1 0]
[1 1]]
Output Data | y: [[0]
[1]
[1]
[0]]
Test data: [[0 1]
[1 0]]
Prediction for test data: [[0.86002842]
[0.86003543]]
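
The script above updates the weights with an implicit learning rate of 1 and never prints the loss, so it is hard to see it decreasing. Below is a minimal variation (my own sketch, not part of the original example) that adds an explicit learning rate and prints the mean squared error every 2000 iterations; the exact numbers will differ slightly from the run above:

import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# XOR dataset, same as above
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

np.random.seed(1)
weights0 = 2 * np.random.random((2,2)) - 1
weights1 = 2 * np.random.random((2,1)) - 1
learning_rate = 0.5  # assumed value; tune as needed

for i in range(10000):
    # Forward pass
    layer1 = sigmoid(X.dot(weights0))
    layer2 = sigmoid(layer1.dot(weights1))

    # Loss: mean squared error between target and prediction
    layer2_error = y - layer2
    if i % 2000 == 0:
        print(f"iteration {i}: loss = {np.mean(layer2_error ** 2):.4f}")

    # Backward pass (sigmoid derivative written in terms of the activations)
    layer2_delta = layer2_error * layer2 * (1 - layer2)
    layer1_delta = layer2_delta.dot(weights1.T) * layer1 * (1 - layer1)

    # Gradient-descent style update, scaled by the learning rate
    weights1 += learning_rate * layer1.T.dot(layer2_delta)
    weights0 += learning_rate * X.T.dot(layer1_delta)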

Python Code Example 2:

Now, let's use a different input dataset: all 16 combinations of four binary inputs, where the label is 1 exactly when the number of ones is odd (4-bit parity), and see what happens:

neural_network_training2.py
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, valid when x is already a sigmoid output
def sigmoid_derivative(x):
    return x * (1 - x)

# Input dataset
X = np.array([[0,0,0,0],[0,0,0,1],[0,0,1,0],[0,0,1,1],[0,1,0,0],[0,1,0,1],[0,1,1,0],[0,1,1,1],[1,0,0,0],[1,0,0,1],[1,0,1,0],[1,0,1,1],[1,1,0,0],[1,1,0,1],[1,1,1,0],[1,1,1,1]])

# Output dataset
y = np.array([[0],[1],[1],[0],[1],[0],[0],[1],[1],[0],[0],[1],[0],[1],[1],[0]])

print(f"Training Data | X: {X}")
print(f"Output Data | y: {y}")
# Seed random numbers for reproducibility
np.random.seed(1)

# Initialize weights randomly with mean 0
weights0 = 2 * np.random.random((4,4)) - 1
weights1 = 2 * np.random.random((4,1)) - 1

# Training loop
for i in range(10000):
    # Forward pass: calculate predicted output
    layer0 = X
    layer1 = sigmoid(np.dot(layer0, weights0))
    layer2 = sigmoid(np.dot(layer1, weights1))

    # Calculate error
    layer2_error = y - layer2

    # Backward pass: calculate gradient
    layer2_delta = layer2_error * sigmoid_derivative(layer2)
    layer1_error = layer2_delta.dot(weights1.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update weights
    weights1 += layer1.T.dot(layer2_delta)
    weights0 += layer0.T.dot(layer1_delta)

# Final prediction
# print(layer2)

# Test dataset
X_test = np.array([[1,0,1,1], [0,1,0,0]])
print(f"Test data: {X_test}")

# Forward pass for test data
layer0_test = X_test
layer1_test = sigmoid(np.dot(layer0_test, weights0))
layer2_test = sigmoid(np.dot(layer1_test, weights1))

# Prediction
print(f"Prediction for test data: {layer2_test}")
> python3 neural_network_training2.py
Training Data | X: [[0 0 0 0]
[0 0 0 1]
[0 0 1 0]
[0 0 1 1]
[0 1 0 0]
[0 1 0 1]
[0 1 1 0]
[0 1 1 1]
[1 0 0 0]
[1 0 0 1]
[1 0 1 0]
[1 0 1 1]
[1 1 0 0]
[1 1 0 1]
[1 1 1 0]
[1 1 1 1]]
Output Data | y: [[0]
[1]
[1]
[0]
[1]
[0]
[0]
[1]
[1]
[0]
[0]
[1]
[0]
[1]
[1]
[0]]
Test data: [[1 0 1 1]
[0 1 0 0]]
Prediction for test data: [[0.99995361]
[0.95753402]]
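
Both test rows contain an odd number of ones, so the expected label for each is 1, and the network indeed outputs values close to 1. Note, however, that both test rows also appear in the training set (as they do in the first example), so this checks how well the network has fit the training data rather than how well it generalizes to genuinely unseen inputs. If hard 0/1 labels are wanted instead of probabilities, the sigmoid outputs can be thresholded at 0.5; a small sketch, assuming layer2_test from the script above:

# Threshold the sigmoid outputs at 0.5 to get hard 0/1 class labels
predictions = (layer2_test > 0.5).astype(int)
print(f"Rounded predictions: {predictions.ravel()}")  # expected: [1 1]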
