Backpropagation is the heart and soul of how neural networks train on data. At its core, it uses the loss from a neural network to update the weights and biases, thus “fitting” the model to the training data.
Loss is a measure of error for a neural network. The lowest that value can be is 0, which would mean the neural network correctly predicts the output for every set of inputs. In practice, however, training a network almost never reaches a loss of exactly 0. As a rule of thumb, an acceptable loss is between 0 and ~1, although that value changes drastically depending on the use case. You can think of loss as, generally, the opposite of your score on a test: in other words, how many questions you get wrong.
Loss is a calculation based on how “wrong” the network is for a given piece of data. Just like with a test, the calculation method for loss changes based on the prediction type: for regression tasks, where the network must predict a numeric value, we use Mean Squared Error, whereas for classification tasks, akin to multiple choice, we use Cross-Entropy. Because the mathematical basis of Cross-Entropy is beyond the scope of this course, we will focus on Mean Squared Error. Just remember that, like Mean Squared Error, Cross-Entropy also calculates the “incorrectness” of a neural network’s prediction.
Mean Squared Error measures the difference between the predicted value and the actual value, as any regression loss function does. We can calculate it using the following formula:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

In this equation, $y_i$ is the actual value from the data, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points. Squaring the difference does 2 things: it makes the difference positive, and it increases training speed. Because every squared difference is positive, errors above and below the actual value cannot cancel each other out, so the model does not need to worry about whether its output is smaller or greater than the actual value. The model also trains faster because squaring penalizes large errors far more than small ones: a difference of 2 will yield a loss of 4, while a difference of 0.5 yields only 0.25. This pushes the neural network to fix its biggest mistakes first and to aim for the exact value rather than settling for close.
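To make the calculation concrete, here is a minimal sketch in plain Python. It assumes the standard definition of Mean Squared Error (the average of the squared differences between actual and predicted values); the function name `mean_squared_error` and the sample numbers are made up for illustration and are not part of the course material.

```python
def mean_squared_error(actual, predicted):
    """Return the average of the squared differences between two lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one data point is required")
    # Square each difference so that every error counts as a positive value.
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative data (hypothetical): differences are 1, 0 and -2.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

loss = mean_squared_error(actual, predicted)
print(loss)  # (1 + 0 + 4) / 3, about 1.667
```

Notice that the prediction that is off by 2 contributes 4 to the total, four times as much as the prediction that is off by 1.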
Because the mathematics behind backpropagation, part of a larger process called optimization, is beyond the scope of this course, we will instead focus on the concept of optimization. Optimization begins with forward propagation, which is the process of feeding an input into the network, applying each node’s weights and biases, and ending with a single output value, as explained in 3.1. The next step is backpropagation, which does 2 things: first, it calculates the loss as explained above, and second, it adjusts the weights and biases to “train” the model and lower the loss.
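The loop described above can be sketched for the smallest possible "network": a single node with one weight and one bias. This is only an illustration under several assumptions that are not in the course text: the model is `weight * x + bias`, the loss is Mean Squared Error, the update rule is plain gradient descent, and the gradients are worked out by hand for this one-node case rather than by a general backpropagation routine. The names `forward`, `mse`, `train`, and the learning rate and epoch count are all hypothetical choices.

```python
def forward(x, weight, bias):
    """Forward propagation for a single node: apply the weight and the bias."""
    return weight * x + bias


def mse(actual, predicted):
    """Mean Squared Error between two equal-length lists."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def train(inputs, targets, learning_rate=0.05, epochs=500):
    """Repeat forward propagation, loss calculation, and parameter updates."""
    weight, bias = 0.0, 0.0  # arbitrary starting values
    n = len(inputs)
    history = []  # loss recorded at the start of every epoch
    for _ in range(epochs):
        # Step 1: forward propagation.
        predictions = [forward(x, weight, bias) for x in inputs]
        # Step 2a: calculate the loss.
        history.append(mse(targets, predictions))
        # Step 2b: work out how the loss changes with each parameter
        # (hand-derived gradients of MSE for this one-node model).
        grad_weight = (2 / n) * sum(
            (p - y) * x for x, y, p in zip(inputs, targets, predictions)
        )
        grad_bias = (2 / n) * sum(p - y for y, p in zip(targets, predictions))
        # Step 2c: nudge the weight and bias in the direction that lowers the loss.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias, history


# Illustrative data generated from y = 2x + 1 (hypothetical).
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]

weight, bias, history = train(inputs, targets)
print(weight, bias)             # should end up close to 2 and 1
print(history[0], history[-1])  # the loss should fall over training
```

A real neural network has many nodes and layers, so the gradients are computed automatically rather than by hand, but the overall cycle of predict, measure the loss, and adjust is the same.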
Although this video is a bit math-heavy, it provides a nice deep dive on optimization, both in math and in raw Python.