How to train a Neural Network?
In training, we need to perform the following four tasks repeatedly:
- Forward Pass
- Calculate Loss
- Backward Pass
- Update Weights and Biases
If this process is working correctly, the loss should decrease, meaning the model is fitting the data. If the training loss keeps falling while performance on held-out data gets worse, we are overfitting; if the loss stays high even on the training data, we are underfitting. We need to find a balance between overfitting and underfitting until we achieve "Generalization." Our ultimate goal is a model that performs well not only on the training data but on unseen data as well, so that we can make accurate predictions in practice.
Example: Suppose we have a small neural network with one hidden layer: two inputs, two hidden nodes, and one output node. The network is trained on a dataset that contains two inputs (\(x[0]\) and \(x[1]\)) and one output (\(y\)). The goal is to find the weights and biases that produce the correct output (\(y\)) for each input pair (\(x[0]\), \(x[1]\)).
- Forward Pass: In the forward pass, we feed the input data (\(x[0]\) and \(x[1]\)) into the network and calculate the predicted output (\(\hat{y}\)). This is done by multiplying each input by its corresponding weight, adding the biases, and passing the result through an activation function.
- Calculate Loss: Next, we calculate the error between the predicted output (\(\hat{y}\)) and the actual output (\(y\)). This is typically done using a loss function, such as mean squared error.
- Backward Pass: In the backward pass, we calculate the gradient of the loss with respect to each weight and bias in the network. This is done using the chain rule of differentiation, which allows us to backpropagate the error from the output layer to the input layer.
- Update Weights and Biases: Finally, we update the weights and biases in the network using the gradients calculated in the backward pass. This is typically done using an optimization algorithm, such as gradient descent, which adjusts each parameter in the direction of the steepest decrease in the loss. (A minimal sketch of all four steps follows this list.)
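Here is that sketch: one full training step for a single sigmoid neuron with two inputs, assuming NumPy. The weight, bias, and learning-rate values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training sample: inputs x[0], x[1] and target y
x = np.array([1.0, 0.0])
y = 1.0

# Made-up initial parameters
w = np.array([0.5, -0.3])
b = 0.1
lr = 0.5  # learning rate

# Forward pass: weighted sum, then activation
z = np.dot(w, x) + b
y_hat = sigmoid(z)

# Calculate loss: squared error for this single sample
loss = (y_hat - y) ** 2

# Backward pass, via the chain rule:
# dloss/dy_hat = 2*(y_hat - y), dy_hat/dz = y_hat*(1 - y_hat), dz/dw = x
dloss_dz = 2 * (y_hat - y) * y_hat * (1 - y_hat)
grad_w = dloss_dz * x
grad_b = dloss_dz

# Update weights and bias: step against the gradient
w -= lr * grad_w
b -= lr * grad_b

print(f"loss={loss:.4f}, updated w={w}, b={b:.4f}")
```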
Programmatically, the two examples below run this loop end to end on real datasets.
Python Code Example 1:
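The listing of neural_network_training.py is not reproduced in this section. For reference, here is a minimal NumPy sketch consistent with the output below: a 2-2-1 sigmoid network trained with gradient descent on the XOR dataset. The architecture matches the example above, but the learning rate, epoch count, and random seed are assumptions, so the exact numbers will differ from the run shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(a):
    # Derivative of the sigmoid, expressed via its output a = sigmoid(z)
    return a * (1.0 - a)

# XOR training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

rng = np.random.default_rng(42)  # seed is an arbitrary choice
W1 = rng.normal(size=(2, 2))     # input -> hidden weights
b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1))     # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10_000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(hidden @ W2 + b2)

    # Calculate loss (mean squared error)
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: chain rule, layer by layer
    d_out = 2 * (y_hat - y) * sigmoid_deriv(y_hat) / len(X)
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * sigmoid_deriv(hidden)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)

    # Update weights and biases (gradient descent)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

print("Training Data | X:", X)
print("Output Data | y:", y)
X_test = np.array([[0, 1], [1, 0]])
print("Test data:", X_test)
print("Prediction for test data:", sigmoid(sigmoid(X_test @ W1 + b1) @ W2 + b2))
```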
> python3 neural_network_training.py
Training Data | X: [[0 0]
[0 1]
[1 0]
[1 1]]
Output Data | y: [[0]
[1]
[1]
[0]]
Test data: [[0 1]
[1 0]]
Prediction for test data: [[0.86002842]
[0.86003543]]
Python Code Example 2:
Now, let's use a different input dataset and see what happens:
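Assuming neural_network_training2.py differs from the first script only in its data, the new dataset is the 4-bit parity function: the label is 1 when the number of 1-bits is odd. It could be built like this (the variable names mirror the sketch above and are assumptions; note that with four inputs, a hidden layer of only two units is unlikely to learn parity, so the real script presumably uses a larger hidden layer):

```python
import numpy as np
from itertools import product

# All 16 four-bit patterns, in the same order as the printed output below
X = np.array(list(product([0, 1], repeat=4)))
# Label is the parity of the bits: 1 if the count of 1s is odd, else 0
y = (X.sum(axis=1) % 2).reshape(-1, 1)
X_test = np.array([[1, 0, 1, 1], [0, 1, 0, 0]])
```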
> python3 neural_network_training2.py
Training Data | X: [[0 0 0 0]
[0 0 0 1]
[0 0 1 0]
[0 0 1 1]
[0 1 0 0]
[0 1 0 1]
[0 1 1 0]
[0 1 1 1]
[1 0 0 0]
[1 0 0 1]
[1 0 1 0]
[1 0 1 1]
[1 1 0 0]
[1 1 0 1]
[1 1 1 0]
[1 1 1 1]]
Output Data | y: [[0]
[1]
[1]
[0]
[1]
[0]
[0]
[1]
[1]
[0]
[0]
[1]
[0]
[1]
[1]
[0]]
Test data: [[1 0 1 1]
[0 1 0 0]]
Prediction for test data: [[0.99995361]
[0.95753402]]
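Both test patterns contain an odd number of 1-bits, so their target is 1, and both predictions come out close to 1. Note, however, that these two patterns also appear among the 16 training rows, so this run mainly checks how well the network has fit the training data rather than true generalization to unseen inputs.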