Next, we compile the model and start training. The compile step specifies three parameters (a minimal compile sketch follows the list below):
- Optimizer: The optimizers we can use are adam, rmsprop, sgd, adadelta, adagrad, adamax, and nadam. For a list of Keras optimizers, please refer to https://keras.io/optimizers:
- sgd stands for stochastic gradient descent. As the name suggests, it updates the weights using the gradient of the loss computed on a randomly sampled mini-batch, which is a stochastic estimate of the true gradient.
- adam stands for adaptive moment estimation. It keeps exponentially decaying moving averages of past gradients and of their squares, and uses them to adapt the learning rate for each parameter. Adam works well and needs very little tuning. It will be used often throughout this book.
- adagrad adapts the learning rate of each parameter based on how often it is updated, which makes it work well for sparse data, and it also needs very little tuning. Because the per-parameter rates adjust automatically, the initial learning rate rarely needs to be tuned by hand.
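To make the compile step concrete, here is a minimal sketch of compiling a small classifier with one of the optimizers listed above. The model architecture, input shape, loss, and metric are illustrative assumptions, not prescriptions from this section; the point is simply that the optimizer can be selected by name, or passed as an object when you want to set its learning rate explicitly:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Illustrative model: a small classifier for 784-dimensional inputs
# (for example, flattened 28x28 images) with 10 output classes.
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax'),
])

# Option 1: pass the optimizer by name (uses its default settings).
model.compile(optimizer='adam',  # or 'sgd', 'rmsprop', 'adagrad', ...
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Option 2: pass an optimizer object to control its hyperparameters.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Passing the name string is the quickest route when the defaults are good enough; constructing the optimizer object is the usual way to tune the learning rate or other optimizer-specific settings.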