Practical Simulations for Machine Learning

by Paris Buttfield-Addison, Mars Buttfield-Addison, Tim Nugent, Jon Manning

June 2022

Beginner to intermediate

331 pages

7h 15m

English

O'Reilly Media, Inc.

Chapter 9. Cooperative Learning

In this chapter, we’re going to take another step forward with our simulations and reinforcement learning, and create a simulation environment in which multiple agents must work together toward a common goal. These sorts of simulations involve cooperative learning: agents usually receive their rewards as a group rather than individually, so even agents that did not contribute to the rewarded actions still share in the reward.

In Unity ML-Agents, the preferred training algorithm and approach for cooperative learning is known as Multi-Agent POsthumous Credit Assignment (or MA-POCA, for short). MA-POCA involves the training of a centralized critic or coach for a group of agents. The MA-POCA approach means agents can still learn what they need to do, even though the group is the entity being rewarded.
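To make the idea of group rewards concrete, here is a minimal Python sketch. It is not ML-Agents code and does not implement MA-POCA: `ToyAgent`, `ToyGroup`, and `add_group_reward` are hypothetical names invented for illustration. It only shows what it means for the group, rather than an individual agent, to be the entity that receives the reward.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not an ML-Agents class."""
    name: str
    reward: float = 0.0


@dataclass
class ToyGroup:
    """Hypothetical group that is rewarded as a single entity."""
    agents: list = field(default_factory=list)

    def add_group_reward(self, amount: float) -> None:
        # Every member receives the same reward, whether or not it
        # contributed to the action that earned it.
        for agent in self.agents:
            agent.reward += amount


pusher = ToyAgent("pusher")
idler = ToyAgent("idler")
group = ToyGroup([pusher, idler])

# Only the pusher did the work, but the group is what gets rewarded.
group.add_group_reward(1.0)

rewards = {agent.name: agent.reward for agent in group.agents}
print(rewards)
```

Both agents end up with the same reward, which is exactly the credit-assignment problem a centralized critic is meant to help with: something has to work out which agents' actions actually mattered.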

Tip

In cooperative learning environments, you can still give rewards to individual agents if you want. We’ll briefly touch on this later. You can also use other algorithms, or just PPO like usual, but MA-POCA has specialized features to make cooperative learning better. You could wire together a collection of PPO-trained agents to get a similar result. We don’t recommend it, though.

A Simulation for Cooperation

Let’s build a simulation environment with a collection of agents that need to work together. This environment has a lot of pieces, so take your time, step through slowly, and take notes if you need to.

Our environment will involve ...