To 8 bits or less.
Weight quantization decreases the size of the stored model, at some expense of prediction accuracy. It still requires the same amount of memory at runtime, however, since the weights are typically expanded back to floating point for computation. Any general-purpose clustering algorithm can be used for quantization, for example k-means.
A standard clustering algorithm is applied to the weights: each floating-point weight is replaced by a few bits identifying its cluster, and only one floating-point centroid is stored per cluster. The network is then retrained to recover accuracy. A code sketch follows after the links below.
https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantization
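A minimal sketch of this clustering-based quantization, assuming NumPy and scikit-learn are available; the function names (quantize_weights, dequantize_weights) are illustrative, not from the linked articles. With 256 clusters each weight is encoded in 8 bits and one float32 centroid is kept per cluster; the retraining step is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights, n_clusters=256):
    """Cluster a weight tensor; return (cluster indices, codebook).

    With n_clusters=256, each weight is encoded in 8 bits and the
    codebook holds one float32 centroid per cluster.
    """
    flat = weights.reshape(-1, 1)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    indices = kmeans.labels_.astype(np.uint8)  # 8-bit codes, one per weight
    codebook = kmeans.cluster_centers_.astype(np.float32).ravel()
    return indices.reshape(weights.shape), codebook

def dequantize_weights(indices, codebook):
    """Reconstruct an approximate float tensor from codes + codebook."""
    return codebook[indices]

# Example: quantize a random 256x256 float32 weight matrix to 8 bits.
w = np.random.randn(256, 256).astype(np.float32)
idx, cb = quantize_weights(w, n_clusters=256)
w_hat = dequantize_weights(idx, cb)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Note that dequantize_weights illustrates the runtime-memory point above: the reconstructed tensor is full float32 again, so only the stored model shrinks.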