As with any deep learning architecture, there are a few hyperparameters that you can adjust to control and fine-tune the model. The following is the set of hyperparameters we use for this architecture:
- Batch size is the number of sequences running through the network in one pass.
- The number of steps is the number of characters in each sequence the network is trained on. Larger values typically work better, since the network can learn longer-range dependencies, but training takes longer; 100 is usually a good choice here.
- The LSTM size is the number of units in the hidden layers.
- The number of layers is the number of hidden LSTM layers to stack.
- The learning rate controls the step size of the gradient updates during training.
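The hyperparameters above can be grouped into a single configuration object, which keeps them easy to pass around and adjust. This is a minimal sketch; the class name and the default values are illustrative assumptions, not values prescribed by the text (except the 100 steps it suggests):

```python
from dataclasses import dataclass


@dataclass
class HyperParams:
    # Number of sequences run through the network in one pass.
    batch_size: int = 100
    # Number of characters per training sequence; the text suggests 100.
    num_steps: int = 100
    # Number of units in each hidden LSTM layer (illustrative value).
    lstm_size: int = 512
    # Number of stacked hidden LSTM layers (illustrative value).
    num_layers: int = 2
    # Step size for gradient updates (illustrative value).
    learning_rate: float = 0.001


hp = HyperParams()
# Characters consumed per training pass: batch_size * num_steps.
print(hp.batch_size * hp.num_steps)  # → 10000
```

A dataclass like this makes it simple to try a different configuration, for example `HyperParams(lstm_size=256, num_layers=3)`, without touching the training code.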