Cross-entropy and KL divergence
Cross-entropy is widely used in modern ML to compute the loss for classification
tasks. This post is a brief overview of the math behind it and a related
concept called Kullback-Leibler (KL) divergence.
Information content of a single random event
We'll start with a single event (E) that has probability p. The information
content (or "degree of surprise") of this event occurring is defined as:
The base 2 here is used so that we can count the information in units of bits.
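As a quick sanity check, here is a small Python sketch of this formula (the helper name information_content is just for illustration): a fair coin flip carries 1 bit of surprise, a specific face of a fair die about 2.58 bits, and a near-certain event almost none.

```python
import math

def information_content(p: float) -> float:
    """Information content ("surprise") of an event with probability p, in bits."""
    return -math.log2(p)

print(information_content(0.5))    # fair coin flip -> 1.0 bit
print(information_content(1 / 6))  # one face of a fair die -> ~2.585 bits
print(information_content(0.99))   # near-certain event -> ~0.0145 bits (little surprise)
```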
Thinking about this definition...