The Edge TPU is a small processor capable of executing deep feedforward networks, such as convolutional neural networks. However, it supports only quantized TFLite models. Quantization is an optimization technique that maps each of the model's 32-bit floating-point numbers to the nearest 8-bit fixed-point value. This makes the model smaller and faster, at the cost of a small loss in precision and accuracy.
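To make the idea concrete, here is a rough NumPy sketch of mapping float weights onto 8-bit values with a scale and zero point. The values are made up for illustration, and the exact scheme TFLite uses differs in details (for example, per-channel scales):

```python
import numpy as np

# Hypothetical 32-bit floating-point weights.
weights = np.array([-0.94, -0.12, 0.0, 0.37, 0.88], dtype=np.float32)

# Derive a scale and zero point that map the float range onto [0, 255].
scale = (weights.max() - weights.min()) / 255.0
zero_point = np.round(-weights.min() / scale).astype(np.int32)

# Quantize: each float becomes the nearest 8-bit value.
quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantizing recovers approximations of the originals, with small rounding error.
dequantized = (quantized.astype(np.float32) - zero_point) * scale
```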
TensorFlow supports two types of quantization. The first is post-training quantization, which is applied while converting the TF model into a TFLite model by setting the converter's optimizations attribute to a list containing tf.lite.Optimize.OPTIMIZE_FOR_SIZE. This causes the weights to ...
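A minimal sketch of this conversion is shown below, assuming a trained Keras model saved as my_model.h5 (the file names are hypothetical). Note that newer TensorFlow releases treat OPTIMIZE_FOR_SIZE as an alias for tf.lite.Optimize.DEFAULT:

```python
import tensorflow as tf

# Load a previously trained Keras model (hypothetical file).
model = tf.keras.models.load_model("my_model.h5")

# Convert to TFLite with post-training quantization enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_model = converter.convert()

# Save the quantized TFLite model to disk.
with open("my_model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```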