Now that we have defined the input, targets, and the network architecture, let's implement the training:
- The first step is to define a loss function, which describes the cost of outputting a wrong sequence of characters, given the input and targets. Because we are predicting the next character considering the previous characters, it's a classification problem and we will use cross-entropy loss. We can do this with the sparse_softmax_cross_entropy_with_logits TF function, which takes as input the logit network output (before the softmax) and the targets as class labels. To reduce the loss over the full sequence and all the batches, we'll use their mean value. First, we have to flatten the targets into one-dimensional vectors to make ...