4.3 Log Loss

Understanding Log Loss requires some familiarity with probabilities. Let's classify an image: specifically, let's try to identify whether the animal visible in the image is a cat or a dog.

When using Logistic Regression, you may have noticed that a prediction doesn't produce a strict class name, but rather a probability. Based on this probability, the instance is then assigned the most likely class. For example, take this image:

Instead of outright judging whether this is a cat or a dog, a Machine Learning algorithm might say that it is 59% likely to be a cat. Since this is more likely than it being a dog (whose likelihood would be 100% - 59% = 41%), we will label this image as a Cat.
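As a minimal sketch of this labelling step, assuming a simple 50% cutoff (the function name `predict_label` and the `threshold` parameter are hypothetical, chosen only for illustration):

```python
def predict_label(p_cat, threshold=0.5):
    """Turn a predicted cat probability into a class name.

    Assumption: two classes only, so P(dog) = 1 - P(cat), and the
    cutoff is 0.5 unless a different threshold is passed in.
    """
    return "Cat" if p_cat >= threshold else "Dog"


p_cat = 0.59                 # the model's predicted probability of "cat"
p_dog = round(1 - p_cat, 2)  # the remaining probability goes to "dog"

print(p_dog)                 # probability of "dog"
print(predict_label(p_cat))  # the more likely class
```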

But, as we have seen thus far, models are rarely perfect, and a good error metric is meant to account for their shortcomings. For models that output probabilities, it can be more informative to evaluate the probabilities themselves than to evaluate only whether the predicted classes are correct. To do this, we will compute a corrected probability for each prediction and combine these into a single score.

A corrected probability is the probability that the model assigned to the class that turned out to be correct. If the prediction points in the right direction, the probability is left unchanged; if it points in the wrong direction, we take its complement (one minus the probability). For example, with the image above, if it was indeed a cat, the corrected probability would simply be the predicted probability, 59%. But if it was actually a dog, the corrected probability would be the complement, 100% - 59% = 41%.
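The rule above can be sketched in a few lines of Python. The function name `corrected_probability` and the string labels are hypothetical, and the sketch assumes exactly two classes:

```python
def corrected_probability(p_cat, actual):
    """Return the probability the model gave to the class that was actually correct.

    p_cat  : predicted probability that the image is a cat (between 0 and 1)
    actual : the true class, either "cat" or "dog" (illustrative labels)
    """
    if actual == "cat":
        return p_cat      # the prediction was for the true class: keep it
    return 1 - p_cat      # otherwise use the complement


print(corrected_probability(0.59, "cat"))            # the image really was a cat
print(round(corrected_probability(0.59, "dog"), 2))  # the image really was a dog
```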

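Then, we combine these corrected probability values. The natural way to combine probabilities is to multiply them together. However, since each value is less than one, the product of many of them quickly becomes too small for a computer to keep track of accurately. So instead, we take the logarithm of each of these values and add the logarithms together, which is much easier to track.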

At this point, the logarithms of the corrected probabilities are summed up, and we then take their average. However, our normal convention with error metrics is that a smaller loss is better, and this average does not follow it: the logarithm of a value below one is negative, so better predictions give an average closer to zero, which is a larger number. So instead, we multiply this average by -1. This results in the following formula:
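Written in symbols, the steps described in this section come out as below. The notation is an assumption made here for illustration: $N$ is the number of instances, $p_i$ is the corrected probability of instance $i$, $y_i$ is its actual label (1 for cat, 0 for dog), $\hat{p}_i$ is the predicted probability of cat, and $\log$ is taken to be the natural logarithm.

```latex
\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \log(p_i),
\qquad
p_i =
\begin{cases}
\hat{p}_i     & \text{if } y_i = 1 \\
1 - \hat{p}_i & \text{if } y_i = 0
\end{cases}
```

Substituting the definition of $p_i$ gives the equivalent form that is often seen for two classes:

```latex
\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N}
\Big[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \Big]
```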

Don’t worry if this begins to go over your head; we’ll be using a library function for this, and this information is mostly meant to build intuition.
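To make the intuition concrete, here is a from-scratch sketch of the whole calculation using only Python's standard library. It is not the library function itself (the text does not name one; scikit-learn's `log_loss` is a common choice), and the function name, the 0/1 label encoding, and the `eps` clipping value are all assumptions made for illustration:

```python
import math


def log_loss_sketch(y_true, p_pred, eps=1e-15):
    """Average negative log of the corrected probabilities.

    y_true : list of actual labels, 1 for cat and 0 for dog (assumed encoding)
    p_pred : list of predicted probabilities that each image is a cat
    eps    : small clipping value (an assumption) so we never take log(0)
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)       # keep p strictly between 0 and 1
        corrected = p if y == 1 else 1 - p  # corrected probability
        total += math.log(corrected)        # sum of logarithms
    return -total / len(y_true)             # average, then multiply by -1


# Hypothetical example: three images with their true labels and predictions.
y_true = [1, 0, 1]
p_pred = [0.59, 0.59, 0.90]
print(log_loss_sketch(y_true, p_pred))
```

Note that a confident wrong prediction is punished much more heavily than a hesitant one: a cat predicted at 1% contributes far more loss than a cat predicted at 40%.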

References:

1. https://www.kaggle.com/dansbecker/what-is-log-loss

2. https://www.analyticsvidhya.com/blog/2020/11/binary-cross-entropy-aka-log-loss-the-cost-function-used-in-logistic-regression


Copyright © 2021 Code 4 Tomorrow. All rights reserved. The code in this course is licensed under the MIT License. If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.