# Explain cross entropy loss.

Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions are [0.1 0.5 0.1 0.1 0.2]. How would I calculate the cross entropy loss for this example?

The cross entropy loss formula takes in two distributions, p(x), the true distribution, and q(x), the estimated distribution, defined over the discrete variable x and is given by

```
H(p, q) = −∑_x p(x) log(q(x))
```

For a neural network, the calculation is independent of the following:

- What kind of layer was used.
- What kind of activation was used, although many activations are not compatible with the calculation because their outputs cannot be interpreted as probabilities (e.g., they are negative, greater than 1, or do not sum to 1). Softmax is often used for multiclass classification because it guarantees a well-behaved probability distribution.
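To see why softmax produces a valid probability distribution, here is a minimal sketch (using NumPy, which the original does not mention): every output lands in (0, 1) and the outputs sum to 1, regardless of the input logits.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability;
    # this does not change the result mathematically.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical logits for a 5-class problem
p = softmax(np.array([2.0, 1.0, 0.1, -1.0, 0.5]))
print(p)        # all entries strictly between 0 and 1
print(p.sum())  # sums to 1
```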

For a neural network, you will usually see the equation written in a form where y is the ground-truth vector and ŷ (or some other value taken directly from the last-layer output) is the estimate. For a single example, it looks like this:

```
L = −y ⋅ log(ŷ)
```

where ⋅ is the inner product.

Your example ground truth y assigns all probability to the first class and zero to the others, so those terms vanish and only the matching term from your estimate ŷ contributes:

```
L = −(1 × log(0.1) + 0 × log(0.5) + 0 × log(0.1) + 0 × log(0.1) + 0 × log(0.2))
L = −log(0.1) ≈ 2.303
```
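The arithmetic above can be checked directly as an inner product, here sketched with NumPy (an assumption, not part of the original):

```python
import numpy as np

# Ground truth (one-hot) and softmax predictions from the question
y = np.array([1, 0, 0, 0, 0])
y_hat = np.array([0.1, 0.5, 0.1, 0.1, 0.2])

# Cross entropy for a single example: L = -y . log(y_hat)
loss = -np.dot(y, np.log(y_hat))
print(loss)  # ≈ 2.3026
```

Note that `np.log` is the natural logarithm; using log base 2 or base 10 would scale the loss by a constant factor, but the natural log is the standard convention in deep-learning frameworks.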
Averaged over a dataset or batch of N examples, the cost is

```
J = −(1/N) ∑_{i=1}^{N} y_i ⋅ log(ŷ_i)
```