Cross-entropy and KL divergence
Cross-entropy is widely used in modern ML to compute the loss for classification
tasks. This post is a brief overview of the math behind it and a related
concept called Kullback-Leibler (KL) divergence.
Information content of a single random event
We'll start with a single event (E) that has probability p. The information
content (or "degree of surprise") of this event occurring is defined as:
The base 2 here is used so that we can count the information in units of bits.
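As a quick sanity check, here is a small Python sketch of this formula (the helper name information_content is just for illustration): a fair coin flip carries 1 bit of surprise, a specific face of a fair die about 2.58 bits, and a near-certain event almost none.

```python
import math

def information_content(p: float) -> float:
    """Information content ("surprise") of an event with probability p, in bits."""
    return -math.log2(p)

print(information_content(0.5))    # fair coin flip -> 1.0 bit
print(information_content(1 / 6))  # one face of a fair die -> ~2.585 bits
print(information_content(0.99))   # near-certain event -> ~0.0145 bits (little surprise)
```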
Thinking about this definition...