Chapter 12. Under the Hood and Beyond

In this chapter, we’re going to touch on some of the approaches we have used throughout the previous chapters on simulation.

We’ve covered the gist: in simulation-based agent learning, an agent undergoes a training process to develop a policy for its behavior. The policy is a mapping from observations to actions, built up from the agent’s previous observations, the actions it took in response, and the rewards it earned for doing so. Training takes place across a large number of episodes, during which the cumulative reward should increase as the agent improves at the given task. How training proceeds is partly dictated by hyperparameters that control aspects of the agent’s behavior during training, including the algorithm used to produce the behavior model.
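
To make this concrete, here’s a minimal, self-contained sketch of that training loop. The environment, policy, and update step are toy stand-ins invented for illustration (this is not the ML-Agents API); they exist only to show the observation-to-action-to-reward cycle, the episode structure, and where learning happens.

    import random

    class ToyEnvironment:
        """Hypothetical environment: picking action 1 earns a reward."""
        def reset(self):
            self.steps = 0
            return 0.0  # initial observation

        def step(self, action):
            self.steps += 1
            reward = 1.0 if action == 1 else 0.0
            done = self.steps >= 10            # fixed-length episodes
            return float(self.steps), reward, done

    class ToyPolicy:
        """Hypothetical policy: a single probability of choosing action 1."""
        def __init__(self):
            self.p_action_1 = 0.5

        def act(self, observation):
            return 1 if random.random() < self.p_action_1 else 0

        def update(self, actions, rewards):
            # Crude stand-in for a real algorithm's update step (for
            # example, PPO's gradient step): nudge the policy toward
            # action 1 if choosing it paid off on average this episode.
            if sum(rewards) / len(rewards) > 0.5:
                self.p_action_1 = min(1.0, self.p_action_1 + 0.05)

    env, policy = ToyEnvironment(), ToyPolicy()
    for episode in range(200):                 # training runs over many episodes
        observation, done = env.reset(), False
        actions, rewards = [], []
        while not done:
            action = policy.act(observation)   # the policy maps observation to action
            observation, reward, done = env.step(action)
            actions.append(action)
            rewards.append(reward)
        policy.update(actions, rewards)        # learning happens during training
        # The cumulative reward per episode should trend upward as training goes on.

In a real project the environment is your Unity scene and the update is performed by the trainer, but the shape of the loop is the same.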

Once training is complete, inference is used to query the trained agent model for the appropriate behavior (actions) in response to given stimuli (observations); learning has ceased, so the agent will no longer improve at the given task.
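
Continuing the toy sketch above, inference is the same loop with the learning step removed: the policy is only queried, so its behavior no longer changes.

    # Inference with the toy policy trained above: query only, never update.
    observation, done = env.reset(), False
    total_reward = 0.0
    while not done:
        action = policy.act(observation)   # no policy.update() call anywhere
        observation, reward, done = env.step(action)
        total_reward += reward
    print(f"Reward earned at inference: {total_reward}")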

We’ve talked about most of these concepts already:

  • We know about observations, actions, and rewards, and how the mapping between them is used to build up a policy.

  • We know that a training phase occurs over a large number of episodes, and that once this is complete, the agent transitions to inference (the model is only queried, no longer updated).

  • We know we pass a file of hyperparameters to the mlagents-learn process, but we kind of glossed over that part (there’s a sample configuration after this list).

  • We know there are different algorithms to choose from when training, but we haven’t really looked at how they work or how they differ.
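
As a reminder of what that hyperparameter file contains, here is a sketch of a trainer configuration of the kind passed to mlagents-learn. The behavior name MyAgentBehavior and the specific values are placeholders, but the keys follow ML-Agents’ YAML trainer configuration format, and trainer_type is where the algorithm (PPO here) is selected:

    behaviors:
      MyAgentBehavior:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 3.0e-4
          beta: 5.0e-3
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
        network_settings:
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000
        time_horizon: 64
        summary_freq: 10000

A file like this is passed on the command line (for example, mlagents-learn MyAgentBehavior.yaml --run-id=my_run), and the values in it shape how the chosen algorithm trains the policy.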
