Inference on a device
In this approach, the machine learning model is loaded into the client mobile application. To make a prediction, the mobile application runs all the inference computations locally on the device, on its own CPU or GPU. It does not need to communicate with a server for anything related to machine learning.
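As a rough illustration of this flow, the following minimal sketch loads a model that ships with the application and runs a prediction entirely in-process. It is not tied to any particular mobile framework: the `OnDeviceModel` class, the `BUNDLED_MODEL_JSON` constant, and the weights themselves are hypothetical stand-ins for a real model file packaged with the app.

```python
import json
import math

# Hypothetical bundled model: in a real app this would be a file shipped
# inside the application package. The weights are made up for illustration.
BUNDLED_MODEL_JSON = '{"weights": [0.8, -0.4], "bias": 0.1}'


class OnDeviceModel:
    """A tiny logistic-regression model that runs entirely in-process."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    @classmethod
    def load_bundled(cls, raw_json=BUNDLED_MODEL_JSON):
        # Parse the model parameters that were packaged with the application.
        params = json.loads(raw_json)
        return cls(params["weights"], params["bias"])

    def predict(self, features):
        # All computation happens locally; no network request is made.
        if len(features) != len(self.weights):
            raise ValueError("feature count does not match model")
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid


model = OnDeviceModel.load_bundled()
probability = model.predict([1.0, 2.0])
print(f"predicted probability: {probability:.3f}")
```

Because the parameters travel inside the application, the only way to change them in this sketch is to ship a new copy of the bundled data.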
Speed is the major reason for doing inference directly on a device. There is no need to send a request to a server over the network and wait for the reply, so predictions are returned almost instantaneously.
Because the model is bundled with the mobile application, it cannot simply be upgraded in one place and reused everywhere. Updating the model requires releasing a new version of the mobile application and pushing that upgrade to all active users. All this is a big overhead ...