April 2017
Checkpointing is a process that saves a snapshot of the application's state at regular intervals, so the application can be restarted from the last saved state in case of failure. This is useful during training of deep learning models, which can often be a time-consuming task. The state of a deep learning model at any point in time is the weights of the model at that time. Keras saves these weights in HDF5 format (for more information, refer to https://www.hdfgroup.org/) and provides checkpointing using its callback API.
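The general idea can be sketched without Keras, using only the Python standard library. The example below is an illustrative toy under stated assumptions, not how Keras implements its callbacks: the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON file format, the `fail_at` parameter, and the "add 0.5 to every weight" update are all hypothetical stand-ins for a real training step and a real HDF5 weights file.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    os.replace swaps the file in a single step, so a crash during the
    write should not leave a half-written checkpoint at `path`.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weights": [0.0, 0.0]}


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Run a toy training loop that resumes from the last checkpoint.

    `fail_at` (hypothetical, for demonstration only) simulates a crash
    just before the given step is executed.
    """
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if step == fail_at:
            raise RuntimeError("simulated crash at step %d" % step)
        # Toy update standing in for one real training step.
        state["weights"] = [w + 0.5 for w in state["weights"]]
        state["step"] = step
        # Save a snapshot at regular intervals.
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "checkpoint.json")
        try:
            train(ckpt, total_steps=10, checkpoint_every=3, fail_at=8)
        except RuntimeError as err:
            print("first run stopped:", err)
        # The second run picks up from the last snapshot (step 6),
        # so only steps 7 to 10 are repeated.
        print("resumed from step", load_checkpoint(ckpt)["step"])
        print("final state:", train(ckpt, total_steps=10, checkpoint_every=3))
```

The trade-off shown here is the usual one: a shorter checkpoint interval means less repeated work after a failure, at the cost of more frequent writes to disk.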
Some scenarios where checkpointing can be useful include the following: