The following are the questions:
- Can you train on text from a different book to see how well the model is able to generate new text?
- What happens to the generated text if you double the batch size and decrease the learning rate?
- Can you train the model without gradient clipping and see if the result improves?
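
For the last question, it may help to isolate clipping in a single function so it can be switched on and off between runs. The following is only a minimal sketch, assuming clipping by the global L2 norm of all gradients; the names `clip_gradients` and `theta` are hypothetical and not taken from the original training code.

```python
import math

def clip_gradients(grads, theta):
    """Rescale gradients so that their global L2 norm is at most theta.

    grads: list of gradient vectors (each a list of floats), one per parameter.
    theta: hypothetical clipping threshold (an assumption for this sketch).
    """
    # Global norm over every component of every gradient vector.
    norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if norm > theta:
        scale = theta / norm
        return [[g * scale for g in vec] for vec in grads]
    # Norm is already within the threshold: return an unchanged copy.
    return [list(vec) for vec in grads]

# Example: a gradient with norm 5 is scaled down to norm 1.
clipped = clip_gradients([[3.0, 4.0]], theta=1.0)
clipped_norm = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Skipping the call to `clip_gradients` in the training loop, with everything else held fixed, gives the unclipped run to compare against.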