Chapter 7: Model Optimization
In this chapter, we will learn about model optimization through a technique known as quantization. This is important because, even though capacity such as compute and memory is less of an issue in a cloud environment, latency and throughput are always factors in the quality and quantity of the model's output. Optimizing a model to reduce latency and maximize throughput can therefore help reduce compute cost. In an edge environment, most of the constraints relate to resources such as memory, compute, power consumption, and bandwidth.
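To make the idea of quantization concrete, here is a minimal sketch of the arithmetic behind 8-bit affine quantization, written in plain Python. It is an illustration under stated assumptions, not the TensorFlow tooling itself: the function names `quantize` and `dequantize` and the sample weights are hypothetical, and a real converter may choose ranges and rounding differently.

```python
# Illustrative sketch only: affine (asymmetric) 8-bit quantization of a few floats.
# Function names and sample weights are hypothetical, not a TensorFlow API.

def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Size of one quantization step; fall back to 1.0 if every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer that represents the real value 0.0.
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [scale * (q - zero_point) for q in quantized]


weights = [-1.2, -0.3, 0.0, 0.45, 2.1]  # hypothetical 32-bit float weights
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

# Worst-case difference between the original and the restored weights.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("scale:", scale, "zero point:", zero_point)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of four, and for these sample values the round-trip error stays within about half a quantization step. That is the trade being made: a smaller, cheaper representation in exchange for a small, bounded loss of precision.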
In this chapter, you will learn how to make your model as lean and mean as possible, with acceptable or negligible changes in the model's accuracy. In other ...