Chapter 1. Introducing Synthesis and Simulation

The world is hungry for data. Machine learning and artificial intelligence are some of the most data-hungry domains around. Algorithms and models are growing ever bigger, and real-world data alone can no longer keep up. Manually creating data, or building real-world systems just to gather it, doesn't scale, and we need new approaches. That's where Unity, and software traditionally used for video game development, steps in.

This book is all about synthesis and simulation, and leveraging the power of modern video game engines for machine learning. Combining machine learning with simulations and synthetic data sounds relatively straightforward on the surface, but in reality the idea of bringing video game technology into the serious business of machine learning scares an unreasonable number of companies away from trying it.

We hope this book will steer you into this world and alleviate your concerns. Three of the authors of this book are video game developers with a significant background in computer science, and one is a dedicated machine learning and data science specialist. Our combined perspectives and knowledge, built over many years across a variety of industries and approaches, are presented here for you.

This book will take you on a journey through the approaches and techniques that can be used to build and train machine learning systems using, and using data generated by, the Unity video game engine. There are two distinct domains in this book: simulation and synthesis. Simulation refers to, for all intents and purposes, building virtual robots (known as agents) that learn to do something inside a virtual world of your own creation. Synthesis refers to building virtual objects or worlds, outputting data about those objects and worlds, and using it to train machine learning systems outside of a game engine.

Both simulation and synthesis are powerful techniques that enable new and exciting approaches to data-centric machine learning and AI.

A Whole New World of ML

We’ll get to the structure of the book shortly, but first, here’s a synopsis of the remainder of this chapter, which is split into four sections:

  • In “The Domains”, we’ll introduce the domains of machine learning that the book explores: simulation and synthesis.

  • In “The Tools”, we’ll meet the tools we’ll be using—the Unity engine, the Unity ML-Agents Toolkit, PyTorch, and Unity Perception—and how they fit together.

  • In “The Techniques”, we’ll look at the techniques we’ll be using for machine learning: proximal policy optimization (PPO), soft actor-critic (SAC), behavioral cloning (BC), and generative adversarial imitation learning (GAIL).

  • And finally, in “Projects”, we’ll summarize the projects that we’ll be building throughout this book, and how they relate to the domains and the tools.

By the end of this chapter, you’ll be ready to dive into the world of simulations and synthesis, you’ll know at a high level how a game engine works, and you’ll see why it’s a nearly perfect tool for machine learning. By the end of the book, you’ll be ready to tackle any problem you can think of that might benefit from game engine-driven simulation or synthesis.

The Domains

The twin pillars of this book are simulation and synthesis. In this section, we’ll unpack exactly what we mean by each of these terms and how this book will explore the concepts.

Simulation and synthesis are core parts of the future of artificial intelligence and machine learning.

Many applications immediately jump out at you: combine simulation with deep reinforcement learning to validate how a new robot will function before building a physical product; create the brain of your self-driving car without the car; build your warehouse and train your pick-and-place robots without the warehouse (or the robots).

Other uses are more subtle: use simulations to synthesize artificial training data, instead of recording information from the real world, and then train traditional machine learning models with it; or take real user activity and, combining behavioral cloning with simulations, use it to add a biological- or human-seeming element to an otherwise perfect, machine-learned task.

A video game engine, such as Unity, can simulate enough of the real world, with enough fidelity, to be useful for simulation-based machine learning and artificial intelligence. Not only can a game engine allow you to simulate enough of a city and a car to test, train, and validate a self-driving car deep learning model, but it can also simulate the hardware down to the level of engine temperatures, power remaining, LIDAR, sonar, x-ray, and beyond. Want to incorporate a fancy, expensive new sensor in your robot? Try it out and see if it might improve performance before you invest a single cent in new equipment. Save money, time, compute power, and engineering resources, and get a better view of your problem space.

Is it literally impossible, or potentially unsafe, to acquire enough of your data? Create a simulation and test your theories. Cheap, unlimited training data is only a simulation away.


There’s not one specific thing that we refer to when we say simulation. Simulation, in this context, can mean practically any use of a game engine to develop a scene or environment where machine learning is then applied. In this book, we use simulation as a term to broadly refer to the following:

  • Using a game engine to create an environment in which certain components act as the agent or agents

  • Giving the agent(s) the ability to move through, or otherwise interact with, the environment and/or other agents

  • Connecting the environment to a machine learning framework to train a model that can operate the agent(s) within the environment

  • Using that trained model to operate with the environment in the future, or connecting the model to a similarly equipped agent elsewhere (e.g., in the real world, with an actual robot)
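These steps form a loop: the agent observes, acts, and receives a reward. That loop can be sketched in plain Python. This is a conceptual sketch only, not ML-Agents code; the `ToyEnvironment` class and its reward values are invented for illustration.

```python
import random

class ToyEnvironment:
    """A stand-in for a Unity scene: a 1D world where an agent
    must reach a target position. Purely illustrative."""

    def __init__(self, target=5):
        self.target = target
        self.position = 0

    def reset(self):
        self.position = 0
        return self.observe()

    def observe(self):
        # The agent's observation: its distance from the target.
        return self.target - self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position += action
        done = self.position == self.target
        reward = 1.0 if done else -0.01  # small penalty per step
        return self.observe(), reward, done

# A random policy standing in for the trained model.
def policy(observation):
    return random.choice([-1, 1])

env = ToyEnvironment()
obs = env.reset()
total_reward, done, steps = 0.0, False, 0
while not done and steps < 1000:
    obs, reward, done = env.step(policy(obs))
    total_reward += reward
    steps += 1
```

In a real project, Unity plays the role of `ToyEnvironment`, and the training framework's job is to replace the random `policy` with a learned one.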


Synthesis is a significantly easier thing to pin down: synthesis, in the context of this book, is the creation of ostensibly fake training data using a game engine. For example, if you were building some kind of image identification machine learning model for a supermarket, you might need to take photos of a box of a specific cereal brand from many different angles and with many different backgrounds and contexts.

Using a game engine, you could create and load a 3D model of a box of cereal and then generate thousands of images of it—synthesizing them—in different angles, backgrounds, and skews, and save them out to a standard image format (JPG or PNG, for example). Then, with your enormous trove of training data, you could use a perfectly standard machine learning framework and toolkit (e.g., TensorFlow, PyTorch, Create ML, Turi Create, or one of the many web services-based training systems) and train a model that can recognize your cereal box.
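The generation step boils down to sampling scene parameters at random and recording the ground-truth label alongside each render. Here's a minimal sketch of the sampling side; the parameter names and ranges are invented, and the actual rendering would happen inside Unity:

```python
import random

random.seed(42)  # reproducible dataset

def sample_scene():
    """Randomize the parameters a game engine would use to render
    one synthetic training image of the cereal box."""
    return {
        "rotation_degrees": (
            random.uniform(0, 360),   # yaw
            random.uniform(-30, 30),  # pitch
            random.uniform(-15, 15),  # roll
        ),
        "background_id": random.randrange(20),      # one of 20 backdrops
        "light_intensity": random.uniform(0.3, 1.5),
        "camera_distance": random.uniform(0.5, 2.0),
        # The ground-truth label comes for free: we placed the object.
        "label": "cereal_box",
    }

# Each record would drive one render; together they form the dataset.
dataset = [sample_scene() for _ in range(10000)]
```

Because you control the scene, every image arrives perfectly labeled, which is exactly what manual data collection struggles to deliver.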

This model could then be deployed to, for example, some sort of on-trolley AI system that helps people shop, guides them to the items on their shopping list, or helps store staff fill the shelves correctly and conduct inventory forecasting.

Synthesis, then, is the creation of the training data using the game engine, and the game engine often has nothing, or very little, to do with the training process itself.

The Tools

This chapter provides you with an introduction to the tools that we’ll be using on our journey. If you’re not a game developer, the primary new tool you’ll encounter is Unity. Unity was traditionally a game engine but is now billed as a real-time 3D engine.

Let’s go one by one through the tools you’ll encounter in this book.

Unity


First and foremost, Unity is a game and visual effects engine. Unity Technologies describes Unity as a real-time 3D development platform. We’re not going to repeat the marketing material from the Unity website for you, but if you’re curious about how the company positions itself, you can check it out.


This book isn’t here to teach you the fundamentals of Unity. Some of the authors of this book have already written several books on that—from a game development perspective—and you can find those at O’Reilly Media if you’re interested. You don’t need to learn Unity as a game developer to make use of it for simulation and synthesis with machine learning; in this book we’ll teach you just enough Unity to be effective at this.

The Unity user interface looks like almost every other professional software package that has 3D features. We’ve included an example screenshot in Figure 1-1. The interface has panes that can be manipulated, a 3D canvas for working with objects, and lots of settings. We’ll come back to the specifics of Unity’s user interface later. You can get a solid overview of its different elements in the Unity documentation.

You’ll be using Unity for both simulation and synthesis in this book.

Figure 1-1. The Unity user interface

The Unity engine comes with a robust set of tools that allow you to simulate gravity, forces, friction, movement, sensors of various kinds, and more. These are exactly the tools needed to build a modern video game. It turns out they're also exactly the tools needed to create simulations and to synthesize data for machine learning. But you probably already guessed that, given that you're reading our book.


This book was written for Unity 2021 and newer. If you’re reading this book in 2023 or beyond, Unity might look slightly different from our screenshots, but the concepts and overall flow shouldn’t have changed much. Game engines tend to, by and large, accumulate features rather than remove them, so the most common sorts of changes you’ll see are icons looking slightly different and things of that nature. For the latest notes on anything that might have changed, head to our special website for the book.

PyTorch via Unity ML-Agents

If you’re in the machine learning space, you’ve probably heard of the PyTorch open source project. As one of the most popular platforms and ecosystems for machine learning in both academia and industry, it’s nearly ubiquitous. In the simulation and synthesis space, it’s no different: PyTorch is one of the go-to frameworks.

In this book, the underlying machine learning that we explore will mostly be done via PyTorch. We won’t be getting into the weeds of PyTorch, because much of the work we’ll be doing with PyTorch will be via the Unity ML-Agents Toolkit. We’ll be discussing the ML-Agents Toolkit momentarily, but essentially all you need to remember is that PyTorch is the engine that powers what the Unity ML-Agents Toolkit does. It’s there all the time, under the hood, and you can tinker with it if you need to, or if you know what you’re doing, but most of the time you don’t need to touch it at all.


We’re going to spend the rest of this section discussing the Unity ML-Agents Toolkit, so if you need a refresher on PyTorch, we highly recommend the PyTorch website, or one of the many excellent books that O’Reilly Media has published on the subject.

PyTorch is a library for tensor computation and deep learning that records the operations you perform into a dynamic (define-by-run) computation graph and can automatically differentiate through it. It supports both training and inference using CPUs and GPUs (and other specialized machine learning hardware), and it runs on a huge variety of platforms ranging from serious ML-optimized servers to mobile devices.
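As a small taste of what's running under the hood, here is PyTorch's define-by-run style in action: the graph is recorded as the operations execute, then differentiated automatically.

```python
import torch

# Define-by-run: the graph is built as these operations execute.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # y = x^2 + 2x

# Differentiate through the recorded graph.
y.backward()

# x.grad now holds dy/dx = 2x + 2, evaluated at x = 3, which is 8.
```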


Because most of the work you’ll be doing with PyTorch in this book is abstracted away, we will rarely be talking in terms of PyTorch itself. So, while it’s in the background of almost everything we’re going to explore, your primary interface to it will be via the Unity ML-Agents Toolkit and other tools.

We’ll be using PyTorch, via Unity ML-Agents, for all the simulation activities in the book.

Unity ML-Agents Toolkit

The Unity ML-Agents Toolkit (which, against Unity branding, we’ll abbreviate to UnityML or ML-Agents much of the time) is the backbone of the work you’ll be doing in this book. ML-Agents was initially released as a bare-bones experimental project and slowly grew to encompass a range of features that enable the Unity engine to serve as the simulation environment for training and exploring intelligent agents and other machine learning applications.

It’s an open source project that ships with many exciting and well-considered examples (as shown in Figure 1-2), and it is freely available via its GitHub project.

Figure 1-2. The “hero image” of the Unity ML-Agents Toolkit, showing some of Unity’s example characters

If it wasn’t obvious, we’ll be using ML-Agents for all the simulation activities in the book. We’ll show you how to get ML-Agents up and running on your own system in Chapter 2. Don’t rush off to install it just yet!

Unity Perception

The Unity Perception package (which we’ll abbreviate to Perception much of the time) is the tool we’ll be using to generate synthetic data. Unity Perception provides a collection of additional features to the Unity Editor that allow you to set scenes up appropriately to create fake data.

Like ML-Agents, Perception is an open source project, and you can find it via its GitHub project.

The Techniques

The ML-Agents Toolkit supports training using either, or a combination of, reinforcement learning and imitation learning techniques. Each of these allows an agent to “learn” a desired behavior through repetitive trial and error—or “reinforcement”—and eventually converge on the ideal behavior for the provided success criteria. What differs between these techniques are the criteria used to assess and optimize agent performance throughout training.

Reinforcement Learning

Reinforcement learning (RL) refers to learning processes that employ explicit rewards. It’s up to the implementation to award “points” for desirable behaviors and to deduct them for undesirable behaviors.

At this point you may be thinking, If I have to tell it what to do and what not to do, what’s the point of machine learning? But let’s think, as an example, of teaching a bipedal agent to walk. Giving an explicit set of instructions for each state change required to walk—the exact degree of rotation each joint should take, in sequence—would be extensive and complex.

But by giving an agent a few points for moving toward a finish line, lots of points for reaching it, negative points when it falls over, and several hundred thousand attempts to get it right, it will be able to figure out the specifics on its own. So, RL’s great strength is in the ability to give goal-centric instructions that require complex behaviors to achieve.
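To make that concrete, here is the same idea in miniature, using tabular Q-learning (not one of the algorithms ML-Agents ships with, but the simplest possible illustration of learning from rewards alone). The agent lives in a five-cell corridor and is never told how to reach the goal, only scored for it; all the numbers here are invented:

```python
import random

random.seed(0)

N_STATES = 5          # cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    new_state = max(0, min(N_STATES - 1, state + action))
    if new_state == N_STATES - 1:
        return new_state, 1.0, True    # reached the goal: lots of points
    return new_state, -0.01, False     # small penalty for dawdling

alpha, gamma, epsilon = 0.5, 0.9, 0.3
for _ in range(500):                   # 500 attempts to get it right
    state, done = 0, False
    while not done:
        # Explore sometimes; otherwise take the best-known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        new_state, reward, done = step(state, action)
        best_next = max(q[(new_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (
            reward + gamma * best_next - q[(state, action)]
        )
        state = new_state

# After training, the greedy policy heads straight for the goal.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)}
```

No step of this code says "move right"; the behavior emerges entirely from the reward scheme, which is RL's whole trick.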

The ML-Agents framework ships with implementations for two different RL algorithms built in: proximal policy optimization (PPO) and soft actor-critic (SAC).


Take note of the acronyms for these techniques and algorithms: RL, PPO, and SAC. Memorize them. We’ll be using them often throughout the book.

PPO is a powerful, general-purpose RL algorithm that’s repeatedly been proven to be highly effective and generally stable across a range of applications. PPO is the default algorithm used in ML-Agents, and it will be used for most of this book. We’ll be exploring in more detail how PPO works a little later on.
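The core of PPO fits in a few lines. Its "proximal" part is a clipped objective that prevents any single update from moving the policy too far from the previous one. Here's a sketch of the per-sample objective; a real trainer averages this over batches and adds value-function and entropy terms:

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective for a single sample.

    ratio:     pi_new(action|state) / pi_old(action|state)
    advantage: how much better the action was than expected
    epsilon:   how far the new policy is allowed to stray
    """
    clipped_ratio = max(1 - epsilon, min(1 + epsilon, ratio))
    # Taking the minimum removes any incentive to push the
    # ratio outside the [1 - epsilon, 1 + epsilon] band.
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action (positive advantage): the gain is capped at (1 + eps) * A.
capped_gain = ppo_clipped_objective(ratio=1.5, advantage=1.0)
# A bad action (negative advantage): the pessimistic branch wins,
# so the objective still pushes the action's probability down.
capped_loss = ppo_clipped_objective(ratio=0.5, advantage=-1.0)
```

That clipping is what makes PPO "generally stable": updates are bounded, so one bad batch can't wreck the policy.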


Proximal policy optimization was created by the team at OpenAI and debuted in 2017. You can read the original paper on arXiv, if you’re interested in diving into the details.

SAC is an off-policy RL algorithm. We’ll get to what that means a little later, but for now, it generally offers a reduction in the number of training cycles needed in return for increased memory requirements. This makes it a better choice for slow training environments when compared to an on-policy approach like PPO. We’ll be using SAC once or twice in this book, and we’ll explore how it works in a little more detail when we get there.
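The "soft" in soft actor-critic refers to entropy: the agent is rewarded not just for success but for keeping its policy random, which encourages exploration. Here's a toy sketch of that soft objective for a two-action policy; all of the numbers are invented:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_objective(expected_reward, probs, alpha):
    """SAC-style objective: reward plus an entropy bonus.
    alpha (the temperature) trades reward off against exploration."""
    return expected_reward + alpha * entropy(probs)

# A confident policy earns slightly more reward...
confident = soft_objective(expected_reward=1.0, probs=[0.99, 0.01], alpha=0.5)
# ...but an exploratory policy collects a bigger entropy bonus,
# so under a high enough temperature, exploration wins out.
exploratory = soft_objective(expected_reward=0.9, probs=[0.5, 0.5], alpha=0.5)
```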


Soft actor-critic was created by the Berkeley Artificial Intelligence Research (BAIR) group and debuted in December 2018. You can read the original release documentation for the details.

Imitation Learning

Like RL, imitation learning (IL) removes the need to define complex instructions in favor of simply setting objectives. But IL goes a step further: it also removes the need to define explicit objectives or rewards. Instead, a demonstration is given—usually a recording of the agent being manually controlled by a human—and rewards are defined intrinsically, based on how closely the agent imitates the demonstrated behavior.

This is great for complex domains in which the desirable behaviors are highly specific or the vast majority of possible actions are undesirable. Training with IL is also highly effective for multistage objectives—where an agent needs to achieve intermediate objectives in a certain order to receive a reward.

The ML-Agents framework ships with implementations for two different IL algorithms built in: behavioral cloning (BC) and generative adversarial imitation learning (GAIL).

BC is an IL algorithm that trains an agent to precisely mimic the demonstrated behavior. Here, BC is only responsible for defining and allocating intrinsic rewards; an existing RL approach such as PPO or SAC is employed for the underlying training process.
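Under the hood, BC is ordinary supervised learning: states are the inputs, and the demonstrator's actions are the labels. Here's a minimal sketch on a made-up one-dimensional task where the "human" demonstrator's policy happens to be action = 2 × state:

```python
import random

random.seed(1)

# Demonstration data: (state, expert_action) pairs recorded from a
# demonstrator whose (hidden) policy is action = 2 * state.
demos = [(s, 2.0 * s) for s in [random.uniform(-1, 1) for _ in range(200)]]

# The agent's policy: a single learned coefficient, action = w * state.
w = 0.0
learning_rate = 0.1
for _ in range(100):                        # epochs over the demo data
    for state, expert_action in demos:
        prediction = w * state
        error = prediction - expert_action
        w -= learning_rate * error * state  # gradient step on squared error

# w converges toward the demonstrator's coefficient, 2.0: the agent
# has cloned the behavior without ever seeing an explicit reward.
```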

GAIL is a generative adversarial approach, applied to IL. In GAIL, two separate models are pitted against each other during training: one is the agent behavior model, which does its best to mimic the given demonstration; the other is a discriminator, which is repeatedly served either a snippet of human-driven demonstrator behavior or agent-driven model behavior and must guess which one it is.


GAIL originated in Jonathan Ho and Stefano Ermon’s paper “Generative Adversarial Imitation Learning”.

As the discriminator gets better at spotting the mimic, the agent model must improve to be able to fool it once again. Likewise, as the agent model improves, the discriminator must establish increasingly strict or nuanced internal criteria to spot the fake. In this back-and-forth, each is forced to iteratively improve.
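This back-and-forth can be shown in miniature. The sketch below strips GAIL down to a single scalar action and a logistic discriminator; real GAIL works over state-action trajectories with neural networks, and every number here is invented:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

# The "demonstration": an expert that always takes action 2.0.
expert_action = 2.0

# The agent's policy: a single parameter, the action it takes.
agent_action = 0.0

# The discriminator: D(x) = sigmoid(w * x + b), trained to output
# 1 for expert actions and 0 for agent actions.
w, b = 0.0, 0.0
lr_d, lr_a = 0.05, 0.05

for _ in range(2000):
    # 1. Discriminator step: get better at telling expert from agent.
    d_expert = sigmoid(w * expert_action + b)
    d_agent = sigmoid(w * agent_action + b)
    w += lr_d * ((1 - d_expert) * expert_action - d_agent * agent_action)
    b += lr_d * ((1 - d_expert) - d_agent)

    # 2. Agent step: nudge the action to look more "expert" to D.
    d_agent = sigmoid(w * agent_action + b)
    agent_action += lr_a * (1 - d_agent) * w

# The agent never compares itself to the expert directly; it only
# hears the discriminator's opinion, yet it converges toward 2.0.
```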


Behavioral cloning is often the best approach for applications in which it is possible to demonstrate all, or almost all, of the conditions that the agent may find itself in. GAIL is instead able to extrapolate new behaviors, which allows imitation to be learned from limited demonstrations.

BC and GAIL can also be used together, often by employing BC in early training and then allocating the partially trained behavior model to be the agent half of a GAIL model. Starting with BC will often make an agent improve quickly in early training, while switching to GAIL in late training will allow it to develop behaviors beyond those that were demonstrated.

Hybrid Learning

Though RL or IL alone will almost always do the trick, they can be combined. An agent can then be rewarded—and its behavior informed—by both explicitly defined rewards for achieving objectives and implicit rewards for effective imitation. The weights of each can even be tuned so that an agent can be trained to prioritize one as the primary objective or both as equal objectives.

In hybrid training, the IL demonstration serves to put the agent on the right path early in training, while explicit RL rewards encourage specific behavior within or beyond that. This is necessary in domains where the ideal agent should outperform the human demonstrator. Because of that early hand-holding, training with RL and IL together can make it significantly faster to train an agent to solve complex problems or navigate a complex environment in a scenario with sparse rewards.


Sparse-reward environments are those in which the agent is rewarded especially infrequently with explicit rewards. In such an environment, the time it takes for an agent to “accidentally” stumble upon a rewardable behavior—and thus receive its first indication of what it should be doing—can waste much of the available training time. But combined with IL, the demonstration can inform on desirable behaviors that work toward explicit rewards.

Together these produce a complex rewards scheme that can encourage highly specific behaviors from an agent, but applications that require this level of complexity for an agent to succeed are few.

Summary of Techniques

This chapter is an introductory survey of concepts and techniques, and you’ll be exposed to and use each of the techniques we’ve looked at here over the course of this book. In doing so, you’ll become more familiar with how each of them works in a practical sense.

The gist of it is as follows:

  • The Unity ML-Agents Toolkit currently provides a selection of training algorithms across two categories:

    • For reinforcement learning (RL): proximal policy optimization (PPO) and soft actor-critic (SAC)

    • For imitation learning (IL): behavioral cloning (BC) and generative adversarial imitation learning (GAIL)

  • These methods can be used independently or together:

    • RL can be used with PPO or SAC alone, or in conjunction with an IL method such as BC.

    • BC can be used alone, as a step on the path to an approach using GAIL, or in conjunction with RL.

  • RL techniques require a set of defined rewards.

  • IL techniques require some sort of provided demonstration.

  • Both RL and IL learn by doing.

We’ll be touching on or directly using all these techniques across the remainder of the book’s exploration of simulation topics.


Projects

This book is a practical, pragmatic work. We want you to get up and running using simulations and synthesis as quickly as possible, and we assume you’d prefer to focus on the implementation whenever possible.

So, while we do explore behind the scenes often, the meat of the book is in the projects we’ll be building together.

The practical, project-based side of the book is split between the two domains we discussed earlier: simulation and synthesis.

Simulation Projects

Our simulation projects will be varied: when you’re building a simulation environment in Unity, there’s a wide range of ways in which the agent that exists in the environment can observe and sense its world.

Some simulation projects will use an agent that observes the world using vector observations: that is, numbers. Whatever numbers you might want to send it. Literally anything you like. Realistically, though, vector observations are usually things like the agent’s distance from something, or other positional information. But really, any number can be an observation.

Some simulation projects will use an agent that observes the world using visual observations: that is, pictures! Because Unity is a game engine, and game engines, like film, have a concept of cameras, you can simply (virtually) mount cameras on your agent and just have it exist in the game world. The view from these cameras can then be fed into your machine learning system, allowing the agent to learn about its world based on the camera input.

The simulation examples we’ll be looking at using Unity, ML-Agents, and PyTorch include:

  • A ball that can roll itself to a target, in Chapter 2 (we know, it sounds too amazing to be true, but it is!)

  • A cube that can push a block into a goal area, in Chapter 4

  • A simple self-driving car navigating a track, in Chapter 5

  • A ball that seeks a coin, trained by imitating human demonstrations, in Chapter 6

  • A ballistic launcher agent that can launch a ball at a target, using curriculum learning, in Chapter 8

  • A group of cubes that work together to push blocks to goals, in Chapter 9

  • Training agents to balance a ball on top of itself, using visual inputs (i.e., cameras) instead of precise measurements, in Chapter 10

  • Connecting to and manipulating ML-Agents with Python, in Chapter 11

Synthesis Projects

Our synthesis projects will be fewer in number than our simulation projects because the domain is a little simpler. We focus on building on the material supplied by Unity to showcase the possibilities of synthesis.

The synthesis examples we’ll be looking at, using Unity and Perception, include:

  • A generator for images of randomly thrown and placed dice, in Chapter 3

  • Improving the dice image generator by changing the floor and colors of the dice, in Chapter 13

  • Generating images of supermarket products to allow for out-of-Unity training on images with complex backdrops and haphazard positioning, in Chapter 14

We won’t focus on the actual training process once you’ve generated your synthesized data, as there are many, many good books and online posts on the subject and we only have so many pages in this book.

Summary and Next Steps

You’ve taken the first steps, and this chapter contained a bit of the required background material. From here onward, we’ll be teaching you by doing. This book has the word practical in the title for a reason, and we want you to get a feel for simulation and synthesis by building projects of your own.


You can find the code for every example at our special website for the book—we recommend downloading the code only when you need it. We’ll also keep the website up-to-date with any changes you should be aware of, so do bookmark it!

In the next chapter, we’ll look at how you can create your first simulation, implement an agent to do something in it, and train a machine learning system using reinforcement learning.
