Chapter 12. Bringing Reinforcement Learning from the Lab to the Convenience Store

Nearly all the AI models and services we’ve covered so far have been based on supervised and semi-supervised machine learning, but a new technique called reinforcement learning has recently emerged from research labs to offer almost real-time learning.

Instead of looking for patterns in data, reinforcement learning systems learn by doing: training agents look at the context, make decisions, and get rewards as feedback. In the lab, reinforcement learning agents train inside games like Minecraft, where the context is the current state of the game, and there are a limited number of actions and clear rewards. In the real world, reinforcement learning can be useful for deciding what products to suggest to users, what to have a bot say next, how to phrase an alert, which picture or video or ad to show, or any other optimization problem.
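The loop of looking at the context, making a decision, and getting a reward can be sketched in a few lines of Python. This is only an illustrative sketch, not how any Azure service is implemented: it assumes a simple epsilon-greedy strategy, and the `EpsilonGreedyAgent` class, the `simulate` function, the product names, and the click probabilities are all hypothetical, invented for this example.

```python
import random
from collections import defaultdict


class EpsilonGreedyAgent:
    """Hypothetical agent: keeps an average reward per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon            # fraction of decisions spent exploring
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # times each (context, action) was tried
        self.values = defaultdict(float)  # running average reward for each pair

    def choose(self, context):
        # Explore occasionally so that every action keeps getting feedback.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Otherwise exploit the action with the best average reward so far.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def learn(self, context, action, reward):
        # Incremental update of the running average for this pair.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=5000, seed=0):
    # Made-up environment: the chance that a shopper accepts a suggestion.
    click_prob = {
        ("morning", "coffee"): 0.8, ("morning", "snack"): 0.2,
        ("evening", "coffee"): 0.1, ("evening", "snack"): 0.7,
    }
    env = random.Random(seed + 1)
    agent = EpsilonGreedyAgent(["coffee", "snack"], seed=seed)
    for _ in range(rounds):
        context = env.choice(["morning", "evening"])   # look at the context
        action = agent.choose(context)                 # make a decision
        clicked = env.random() < click_prob[(context, action)]
        agent.learn(context, action, 1.0 if clicked else 0.0)  # get a reward
    return agent


if __name__ == "__main__":
    trained = simulate()
    for key in sorted(trained.values):
        print(key, round(trained.values[key], 2))
```

Nothing here is trained in advance on a labeled dataset: the agent's estimates improve only because each decision produces a reward that feeds into the next one.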

Azure uses reinforcement learning to decide the least disruptive time to reboot VMs that need to be reset or moved to a different physical server. Microsoft Teams uses it to choose the audio jitter buffer for each individual call. That buffer smooths out the way audio packets are played back as the latency of the connection changes during the call, so the sound doesn’t lag, you don’t get dropouts when packets are delayed, and you don’t get choppy, mechanical-sounding speech when the system plays packets that arrive more quickly.

The Personalizer service (one of the ...
