June 2020
Cross-entropy loss is used mostly for binary classification problems; that is, where each target label is either 1 or 0 and the network outputs the probability that the label is 1.
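Suppose we are given a training dataset, $X = \{x_1, \dots, x_N\}$ and $Y = \{y_1, \dots, y_N\}$, where each $y_i \in \{0, 1\}$. We can then write the network's prediction in the following form:

$$\hat{y}_i = f(x_i; \theta)$$

Here, $\theta$ denotes the parameters of the network (weights and biases), and $\hat{y}_i$ is the predicted probability that $y_i = 1$. We can express this in terms of a Bernoulli distribution, as follows:

$$P(y_i \mid x_i; \theta) = \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i}$$

The probability, given the entire dataset and assuming the samples are independent, is then as follows:

$$P(Y \mid X; \theta) = \prod_{i=1}^{N} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i}$$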
If we take ...
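The Bernoulli likelihood described above can be checked numerically with a minimal sketch. It assumes the predicted probabilities are already available as plain Python floats. The function names `bernoulli_likelihood` and `binary_cross_entropy`, the sample labels and probabilities, and the clipping constant `eps` are illustrative choices, not taken from the text. The second function computes the mean negative log of the likelihood, which is the quantity usually called binary cross-entropy; that this is the step the text goes on to take is an assumption.

```python
import math


def bernoulli_likelihood(y_true, y_pred):
    """Product over samples of y_hat**y * (1 - y_hat)**(1 - y)."""
    likelihood = 1.0
    for y, p in zip(y_true, y_pred):
        # For y == 1 the factor is p; for y == 0 it is 1 - p.
        likelihood *= p if y == 1 else 1.0 - p
    return likelihood


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the probability so that log(0) never occurs (eps is an assumption).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.7, 0.6]

likelihood = bernoulli_likelihood(labels, probs)
loss = binary_cross_entropy(labels, probs)
print(likelihood, loss)
```

For this toy data, `loss` equals `-math.log(likelihood) / len(labels)`, so maximizing the likelihood and minimizing the cross-entropy select the same parameters. Working with the sum of logs rather than the raw product also avoids numerical underflow when the dataset is large.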