Chapter 4. Creating a More Advanced Simulation

So far, you’ve been introduced to the basics of simulation and the basics of synthesis. It’s time to dive in a bit further and do some more simulation. Back in Chapter 2, we built a simple simulation environment that showed you how easy it is to assemble a scene in Unity and use it to train an agent.

In this chapter, we’re going to build on the things you’ve already learned and create a slightly more advanced simulation using the same fundamental principles. The simulation environment we’re going to build is shown in Figure 4-1.

Figure 4-1. The simulation we’ll be building

This simulation will consist of a cube, which will again serve as our agent. The agent’s goal will be to push a block into a goal area as quickly as possible.

By the end of this chapter, you’ll have continued to solidify your Unity skills for assembling simulation environments, and have a better handle on the components and features of the ML-Agents Toolkit.

Setting Up the Block Pusher

For a full rundown and discussion of the tools you’ll need for simulation and machine learning, refer back to Chapter 1. This section will give you a quick summary of the bits and pieces you’ll need to accomplish this particular activity.

Specifically, here we will do the following:

  1. Create a new Unity project and set it up for use with ML-Agents.

  2. Create the environment for our block pusher in a scene in that Unity project.

  3. Implement the necessary code to make our block pushing agent function in the environment and be trainable using reinforcement learning.

  4. And finally, train our agent in the environment and see how it runs.

Creating the Unity Project

Once again, we’ll be creating a brand-new Unity project for this simulation:

  1. Open the Unity Hub and create a new 3D project. We’ll name ours “BlockPusher.”

  2. Install the ML-Agents Toolkit package. Refer back to Chapter 2 for instructions.

That’s all! You’re ready to go ahead and make the environment for the block pusher agent to live in.

The Environment

With our empty Unity project ready to go with ML-Agents, the next step is to create the simulation environment. In addition to the agent itself, our simulation environment for this chapter has the following requirements:

  • A floor for the agent to move around on

  • A block for the agent to push around

  • A set of outer walls to prevent the agent from falling into the void

  • A goal area for the agent to push the block into

In the following few sections, we’ll create each of these pieces in the Unity Editor.

The Floor

The floor is where our agent and the block it pushes will live. The floor is very similar to the one created in Chapter 2, but here we’ll also be building walls around it. With the new Unity project open in the editor, we’ll create a new scene and create the floor for our agent (and the block it pushes) to live on:

  1. Open the GameObject menu → 3D Object → Cube. Click on the cube that you’ve created in the Hierarchy view, and as before set its name to “Floor” or something similar.

  2. With the new floor selected, set its position to something suitable, and its scale to (20, 0.35, 20) or something similar, so that it’s a big flat floor with a bit of thickness to it, as shown in Figure 4-2.

    Figure 4-2. The floor for our simulation
    Tip

    The floor is the center of existence for this world. Because everything else is positioned relative to the floor, the floor’s own position doesn’t really matter.

We want our floor to have a little more character this time, so we’re going to give it some color:

  1. Open the Assets menu → Create → Material to create a new material asset in the project (you can see it in the Project view). Rename the material to “Material_Floor” or something similar by right-clicking and selecting Rename (or pressing Return while the material is selected).

  2. Ensure that the new material is selected in the Project view and use the Inspector to set the albedo color to something fancy. We recommend a nice orange color, but anything is fine. Your Inspector should look something like Figure 4-3.

    Figure 4-3. The floor material
  3. Select the floor in the Hierarchy view and drag the new material from the Project view directly onto either the floor’s entry in the Hierarchy view or the empty space at the bottom of the floor’s Inspector. The floor should change color in the Scene view, and the Inspector for the floor should have a new component, as shown in Figure 4-4.

    Figure 4-4. The Inspector for the floor, showing the new Material component

That’s it for the floor! Make sure you save the scene before continuing.

The Walls

Next, we need to create some walls around the floor. Unlike in Chapter 2, we never want the agent to be able to fall off the floor.

To create walls, we’ll once again be using our old, versatile friend, the cube. Back in the Unity scene where you made the floor a moment ago, do the following:

  1. Create a new cube in the scene. Make it the same scale on the x-axis as the floor (so probably about 20), 1 unit high on the y-axis, and around 0.25 on the z-axis. It should look something like Figure 4-5.

    Figure 4-5. The first wall
  2. Create a new material for the walls, give it a nice color, and apply it to the wall you’ve created. Ours is shown in Figure 4-6.

    Figure 4-6. The new wall material
  3. Rename the cube “Wall” or something similar, and duplicate it once. These will be our walls on one axis. Don’t worry about moving them to the right position just yet.

  4. Duplicate one of the walls again and, using the Inspector, rotate it 90 degrees on the y-axis. Once it’s there, duplicate it.

    Tip

    You can switch to the move tool by pressing the W key on your keyboard.

  5. Position the walls by using the move tool while each wall is selected (in either the Scene view or the Hierarchy view) and holding the V key on your keyboard to enter vertex snapping mode. While the V key is held, mouse over the different vertices in the wall’s mesh. Mouse over one of the outer bottom-corner vertices of a wall, and then click and drag on the move handle to snap it to the appropriate upper-corner vertex on the floor. This process is shown in Figure 4-7.

    Figure 4-7. Vertex-snapping on the corner
    Tip

    You can switch between different views in the Scene view using the widget in the upper-right corner, shown in Figure 4-8.

    Figure 4-8. The scene widget
  6. Repeat this for each wall segment. Some of the wall segments will overlap and intersect each other, and that’s fine.

    When you’re done, your walls should look like Figure 4-9. As always, save your scene before you continue.

    Figure 4-9. The four final walls
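If you’d rather type wall positions into the Inspector than vertex-snap, the arithmetic is short. The following plain C# (.NET) sketch is not Unity code; WallCenters is a hypothetical helper. It assumes the floor is centered at the origin with scale (20, 0.35, 20), that each wall is a default 1-unit cube scaled to (20, 1, 0.25), and that each wall rests on the floor’s top face with its outer face flush with the floor’s edge, which is where snapping the wall’s outer bottom corner to the floor’s upper corner puts it.

```csharp
using System;

// Hypothetical helper: center positions of the four walls, assuming a floor centered
// at the origin and default 1-unit cubes (so a cube's scale equals its size).
(double X, double Y, double Z)[] WallCenters(
    double floorSize, double floorThickness, double wallHeight, double wallThickness)
{
    double floorTop = floorThickness / 2;            // floor spans -t/2..t/2 on y
    double y = floorTop + wallHeight / 2;            // wall rests on the floor's top face
    double edge = floorSize / 2 - wallThickness / 2; // outer face flush with the floor edge
    return new[]
    {
        (0.0, y, edge), (0.0, y, -edge),  // the two unrotated walls (long side along x)
        (edge, y, 0.0), (-edge, y, 0.0),  // the two walls rotated 90 degrees on y
    };
}

foreach (var c in WallCenters(20, 0.35, 1, 0.25))
    Console.WriteLine($"({c.X:F3}, {c.Y:F3}, {c.Z:F3})");
```

With these numbers each wall’s center sits at y = 0.675 and 9.875 units from the middle of the floor. Because the walls are as long as the floor, their ends overlap at the corners, just as they do with vertex snapping.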

The Block

The block, at this phase, is the simplest element that we need to create in the editor. Like many of us, it exists to be pushed around (in this case, by the agent). We’ll add the block in the Unity scene:

  1. Add a new cube to the scene, and rename it “Block.”

  2. Use the Inspector to add a Rigidbody component to the block, setting its mass to 10 and its drag to 4, and freezing its rotation on all three axes, as shown in Figure 4-10.

    Figure 4-10. The block’s parameters
  3. Position the block somewhere on the floor. Anywhere is fine.

    Tip

    If you’re having trouble positioning the block precisely on the floor, you can use the move tool in vertex snapping mode, like you did for the walls, and snap the block to one of the corners of the floor (where it will be intersecting with the walls). Then use the directional move tool (by clicking and dragging on the arrows coming out of the block while it’s in move mode) or the Inspector to move it to the desired location.

The Goal

The goal is the location in the scene where the agent needs to push the block. It’s less of a physical thing and more of a concept. But concepts can’t be represented in video game engines, so how do we implement it? That’s a great question, dear reader! We make a plane (a flat area) that we set to a specific color so that the watching human (i.e., us) can tell where the goal area is. The color won’t be used by the agent at all; it’s just for us.

What the simulation actually uses is the collider we’ll add: a big volume of space above the colored ground area. Using C# code, we can tell when something collides with that volume (hence the name “collider”).

Follow these steps to create the goal and its collider:

  1. Create a new plane in the scene and rename it “Goal” or something similar.

  2. Create a new material for the goal and apply it to the goal. We recommend a color that stands out, since this is the area the agent needs to push the block into.

  3. Use the same trick with vertex snapping that you used earlier in “The Walls” to position the goal using the Rect tool (accessible using T on your keyboard) or via the tools selector, shown in Figure 4-11. Position the goal roughly as shown in Figure 4-12.

    Figure 4-11. The tool selector
    Figure 4-12. The goal in position
  4. Using the Inspector, remove the Mesh Collider component from the goal, and use the Add Component button to add a Box Collider component instead.

  5. With the goal selected in the Hierarchy, click the Edit Collider button in the Box Collider component of the goal’s Inspector (shown in Figure 4-13).

    Figure 4-13. Edit Collider button
  6. Use the small green square handles to size the goal’s collider so that it takes up much more of the environment’s volume; that way, when the block enters the collider, it will be detected. Ours is shown in Figure 4-14, but this is not a science; you just need to make it big! You might find it easier just to increase the Box Collider component’s size on its y-axis using the Inspector.

    Figure 4-14. Our large collider, showing the handles

As before, don’t forget to save the scene.

The Agent

Finally (almost), we need to create the agent itself. Our agent is going to be a cube, with the appropriate script (which we’ll also create) attached to it, just like we did with the ball agent in Chapter 2.

Still in the Unity Editor, do the following in your scene:

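  1. Create a new cube and name it “Agent” or something similar. Use the Inspector to add a Rigidbody component to it as well, since the agent script moves the agent by applying forces to its Rigidbody.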

  2. In the agent’s Inspector, select the Add Component button and add a new script. Name it something like “BlockSorterAgent.”

  3. Open the newly created script and add the following using directives at the top:

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
  4. Change the class so that it inherits from Agent instead of MonoBehaviour.

  5. Now you need some properties, starting with a handle for the floor and environment (we’ll get back to assigning these shortly). These go inside the class, before any methods:

    public GameObject floor;
    public GameObject env;
  6. You also need something to represent the bounds of the floor:

    public Bounds areaBounds;
  7. And you need something to represent the goal area and the block that needs to be pushed to the goal:

    public GameObject goal;
    public GameObject block;
  8. Now add two Rigidbody fields to hold the Rigidbody components of the block and the agent:

    Rigidbody blockRigidbody;
    Rigidbody agentRigidbody;

When the agent is initialized, we need to do a few things, so the first thing we’ll make is the Initialize() function:

  1. Add the Initialize() function:

    public override void Initialize()
    {
    
    }
  2. Inside, get a handle on the agent’s and the block’s Rigidbody components:

    agentRigidbody = GetComponent<Rigidbody>();
    blockRigidbody = block.GetComponent<Rigidbody>();
  3. And finally, for the Initialize() function, get a handle on the bounds of the floor:

    areaBounds = floor.GetComponent<Collider>().bounds;

Next, we want to be able to place the agent (and the block) at a random position on the floor when it spawns and at the start of each episode, so we’ll make a GetRandomStartPosition() method. This method is entirely ours, and doesn’t implement a required piece of ML-Agents (unlike the methods we override):

  1. Add the GetRandomStartPosition() method:

    public Vector3 GetRandomStartPosition()
    {
    
    }

    We’ll call this method whenever we want to position something randomly within the floor that’s in our simulation. It will return a random usable position on the floor.

  2. Inside GetRandomStartPosition(), get a handle on the bounds of the floor and the goal:

    Bounds floorBounds = floor.GetComponent<Collider>().bounds;
    Bounds goalBounds = goal.GetComponent<Collider>().bounds;
  3. Now create someplace to store the new point on the floor (we’ll return to this in a bit):

    Vector3 pointOnFloor;
  4. Now, make a timer so that you can see if this process takes too long for some reason:

    var watchdogTimer = System.Diagnostics.Stopwatch.StartNew();
  5. Next, add a variable to store a margin. We’ll use it to keep the random position a small distance away from the edges of the floor:

    float margin = 1.0f;
  6. Now start a do-while loop that keeps picking a random point for as long as the point it picked is inside the goal’s bounds:

    do
    {
    
    } while (goalBounds.Contains(pointOnFloor));
  7. Inside the do, check if the timer has gone on too long, and throw an exception if it did:

    if (watchdogTimer.ElapsedMilliseconds > 30)
    {
        throw new System.TimeoutException
          ("Took too long to find a point on the floor!");
    }
  8. Then, still inside the do, but below the if statement, pick a point on the top face of the floor:

    pointOnFloor = new Vector3(
        Random.Range(floorBounds.min.x + margin, floorBounds.max.x - margin),
        floorBounds.max.y,
        Random.Range(floorBounds.min.z + margin, floorBounds.max.z - margin)
    );

    Adding the margin to the minimum and subtracting it from the maximum keeps the point away from the floor’s edges, so the object always lands on the floor, not in the walls or in space.

  9. After the do-while, return the pointOnFloor that you created:

    return pointOnFloor;
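The loop in GetRandomStartPosition() is rejection sampling: pick a uniform point, throw it away if it lands in the goal, and try again, with a watchdog in case the goal covers too much of the floor. The plain C# (.NET) sketch below shows the same idea outside Unity. The (MinX, MinZ, MaxX, MaxZ) tuples are a hypothetical stand-in for Unity’s Bounds (only x and z matter here), RandomPointOutsideGoal is a hypothetical name, and the floor and goal sizes in the usage lines are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Plain-.NET sketch of GetRandomStartPosition(): rejection sampling on the floor's
// top face, skipping points inside the goal. Boxes are hypothetical
// (MinX, MinZ, MaxX, MaxZ) tuples standing in for Unity's Bounds (x/z only).
var rng = new Random(42);

(double X, double Z) RandomPointOutsideGoal(
    (double MinX, double MinZ, double MaxX, double MaxZ) floor,
    (double MinX, double MinZ, double MaxX, double MaxZ) goal,
    double margin)
{
    var watchdog = Stopwatch.StartNew();
    (double X, double Z) p;
    do
    {
        // Same 30 ms guard as the Unity version, in case the goal covers the floor.
        if (watchdog.ElapsedMilliseconds > 30)
            throw new TimeoutException("Took too long to find a point on the floor!");

        // Uniform sample in [min + margin, max - margin) on each axis.
        p = (floor.MinX + margin + rng.NextDouble() * (floor.MaxX - floor.MinX - 2 * margin),
             floor.MinZ + margin + rng.NextDouble() * (floor.MaxZ - floor.MinZ - 2 * margin));
    } while (p.X >= goal.MinX && p.X <= goal.MaxX && p.Z >= goal.MinZ && p.Z <= goal.MaxZ);
    return p;
}

// Illustrative sizes: a 20 x 20 floor with the goal covering the strip x >= 5.
var floorBox = (MinX: -10.0, MinZ: -10.0, MaxX: 10.0, MaxZ: 10.0);
var goalBox = (MinX: 5.0, MinZ: -10.0, MaxX: 10.0, MaxZ: 10.0);
var spawn = RandomPointOutsideGoal(floorBox, goalBox, 1.0);
Console.WriteLine($"Spawn at x = {spawn.X:F2}, z = {spawn.Z:F2}");
```

Rejection sampling stays cheap as long as the goal covers a modest fraction of the floor; here roughly a quarter of the candidate points are rejected, so the loop almost always finishes in a couple of iterations.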

That’s it for GetRandomStartPosition(). Next, we need a function to call when the agent gets the block to the goal. We’ll use this function to reward the agent for doing the right thing, reinforcing the policy that we want:

  1. Create the GoalScored() function:

    public void GoalScored()
    {
    
    }
  2. Add a call to AddReward():

    AddReward(5f);
  3. And add a call to EndEpisode():

    EndEpisode();

Next, we’ll implement OnEpisodeBegin(), the function that’s called when each training or inference episode begins:

  1. First, we’ll put the function in place:

    public override void OnEpisodeBegin()
    {
    
    }
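  2. And we’ll pick a random number of quarter turns and convert it to an angle in degrees (the integer version of Random.Range() excludes its upper bound, so this gives 0, 1, 2, or 3 quarter turns):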

    var rotation = Random.Range(0, 4);
    var rotationAngle = rotation * 90f;
  3. Now we’ll get a random start position for the block, using the function we created:

    block.transform.position = GetRandomStartPosition();
  4. We’ll reset the block’s velocity and angular velocity to zero, using its Rigidbody, so it doesn’t carry any motion over from the previous episode:

    blockRigidbody.velocity = Vector3.zero;
    blockRigidbody.angularVelocity = Vector3.zero;
  5. We’ll get a random start position for the agent:

    transform.position = GetRandomStartPosition();
  6. And we’ll reset the agent’s velocity and angular velocity to zero as well, using its Rigidbody:

    agentRigidbody.velocity = Vector3.zero;
    agentRigidbody.angularVelocity = Vector3.zero;
  7. Finally, we’ll rotate the whole environment by the random angle, so that the agent can’t simply learn that the goal is always on the same side. The line is shown commented out here; remove the // to enable the rotation:

    //env.transform.Rotate(new Vector3(0f, rotationAngle, 0f));

And that’s it for the OnEpisodeBegin() function. Save your code.
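To see why quarter turns of the whole environment put the goal on every side of the floor, here’s a small plain C# (.NET) sketch, not Unity code. It assumes the environment pivots about the center of the floor at the origin and that the goal sits at a hypothetical position of x = 8, z = 0 before rotating. QuarterTurns is a hypothetical helper that uses the convention in which a positive turn about the vertical axis swings forward (+z) toward right (+x). Like Unity’s integer Random.Range(0, 4), .NET’s Random.Next(0, 4) excludes its upper bound.

```csharp
using System;

// One quarter turn about the vertical axis maps (x, z) to (z, -x):
// forward (0, 1) becomes right (1, 0), right becomes back, and so on.
(int X, int Z) QuarterTurns((int X, int Z) p, int turns)
{
    for (int i = 0; i < turns; i++) p = (p.Z, -p.X);
    return p;
}

var rng = new Random();
int rotation = rng.Next(0, 4);          // 0, 1, 2, or 3 quarter turns
double rotationAngle = rotation * 90.0; // 0, 90, 180, or 270 degrees

var goal = (X: 8, Z: 0);                // hypothetical goal position before rotating
for (int k = 0; k < 4; k++)
    Console.WriteLine($"{k * 90,3} degrees -> goal at {QuarterTurns(goal, k)}");
Console.WriteLine($"This episode: rotate by {rotationAngle} degrees");
```

Each of the four angles puts the goal against a different wall, so over many episodes the agent sees the goal on every side.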

Next, we’re going to implement the Heuristic() function so that we can manually control the agent if we want to:

  1. Create the function Heuristic():

    public override void Heuristic(in ActionBuffers actionsOut)
    {
    
    }
    Note

    Manual control of the agent here is entirely unrelated to the training process. It just exists so that we can verify the agent can move in the environment appropriately.

  2. Get a handle on the discrete actions that the Unity ML-Agents Toolkit passes in, and set the action to 0 by default, so that you always end up with a valid action (0 means “do nothing”) by the end of the call to Heuristic():

    var discreteActionsOut = actionsOut.DiscreteActions;
    discreteActionsOut[0] = 0;
  3. Then, for each key (D, W, A, and S), check whether it’s being pressed, and send the appropriate action:

    if(Input.GetKey(KeyCode.D))
    {
        discreteActionsOut[0] = 3;
    }
    else if(Input.GetKey(KeyCode.W))
    {
        discreteActionsOut[0] = 1;
    }
    else if (Input.GetKey(KeyCode.A))
    {
        discreteActionsOut[0] = 4;
    }
    else if (Input.GetKey(KeyCode.S))
    {
        discreteActionsOut[0] = 2;
    }
    Tip

    These numbers are totally arbitrary. As long as they stay consistent with the actions handled in MoveAgent() and don’t overlap, it doesn’t matter what they are. One number consistently represents one direction (which corresponds to a keypress when under human control).

And that’s all for the Heuristic() function.
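Because the checks are an if/else chain, the keys have a fixed priority: if several are held at once, the first match in D, W, A, S order wins. The plain C# (.NET) sketch below reproduces that logic without Unity. HeuristicAction is a hypothetical helper, and the set of pressed keys stands in for the Input.GetKey() calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the if/else chain in Heuristic(): the first key found wins, 0 means "do nothing".
int HeuristicAction(ISet<char> pressed)
{
    if (pressed.Contains('D')) return 3; // turn right
    if (pressed.Contains('W')) return 1; // forward
    if (pressed.Contains('A')) return 4; // turn left
    if (pressed.Contains('S')) return 2; // backward
    return 0;                            // no movement key held
}

Console.WriteLine(HeuristicAction(new HashSet<char> { 'W' }));      // 1 (forward)
Console.WriteLine(HeuristicAction(new HashSet<char> { 'W', 'D' })); // 3 (D takes priority)

// Each key maps to its own action, so no two directions share a number.
var distinct = new[] { 'D', 'W', 'A', 'S' }
    .Select(k => HeuristicAction(new HashSet<char> { k }))
    .Distinct()
    .Count();
Console.WriteLine($"Distinct actions for D, W, A, S: {distinct}");
```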

Next, we need to implement the MoveAgent() function, which will allow the ML-Agents framework to control the agent for both training and inference purposes:

  1. First, we’ll implement the function:

    public void MoveAgent(ActionSegment<int> act)
    {
    
    }
  2. Then, inside, we’ll zero out the direction and rotation that will be used for the movement:

    var direction = Vector3.zero;
    var rotation = Vector3.zero;
  3. And we’ll assign the action coming in from the Unity ML-Agents Toolkit to something a little more readable:

    var action = act[0];
  4. Now we’ll switch on that action and set the direction or rotation appropriately:

    switch (action)
    {
        case 1:
            direction = transform.forward * 1f;
            break;
        case 2:
            direction = transform.forward * -1f;
            break;
        case 3:
            rotation = transform.up * 1f;
            break;
        case 4:
            rotation = transform.up * -1f;
            break;
        case 5:
            direction = transform.right * -0.75f;
            break;
        case 6:
            direction = transform.right * 0.75f;
            break;
    }
  5. Then, outside the switch, we’ll act on any rotation:

    transform.Rotate(rotation, Time.fixedDeltaTime * 200f);
  6. And we’ll also act on any direction, by applying a force to the agent’s Rigidbody:

    agentRigidbody.AddForce(direction * 1, ForceMode.VelocityChange);

And that’s all for MoveAgent(). Again, save your code.
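Two details of MoveAgent() are easy to miss. First, actions 5 and 6 (strafing left and right) can be chosen by a trained policy, assuming the discrete branch in Behavior Parameters is configured with at least seven actions, but Heuristic() never produces them. Second, transform.Rotate() turns the agent by Time.fixedDeltaTime * 200 degrees each time a turn action is applied; with Unity’s default Fixed Timestep of 0.02 seconds, that’s 4 degrees. The plain C# (.NET) sketch below mirrors the switch statement so you can check the mapping outside Unity; DecodeAction is a hypothetical helper.

```csharp
using System;

// Plain-C# mirror of the switch in MoveAgent(): how far to move along the agent's
// own forward and right axes, and which way to turn (+1 right, -1 left).
(double Forward, double Right, int Turn) DecodeAction(int action) => action switch
{
    1 => (1.0, 0.0, 0),    // forward
    2 => (-1.0, 0.0, 0),   // backward
    3 => (0.0, 0.0, 1),    // turn right
    4 => (0.0, 0.0, -1),   // turn left
    5 => (0.0, -0.75, 0),  // strafe left
    6 => (0.0, 0.75, 0),   // strafe right
    _ => (0.0, 0.0, 0),    // 0 (or anything else): do nothing
};

const double FixedDeltaTime = 0.02;           // Unity's default Fixed Timestep
double degreesPerTurn = FixedDeltaTime * 200; // same factor as in transform.Rotate(...)

for (int a = 0; a <= 6; a++)
    Console.WriteLine($"action {a}: {DecodeAction(a)}");
Console.WriteLine($"Each turn action rotates about {degreesPerTurn:F1} degrees");
```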

Finally, for now, we need to implement the OnActionReceived() function, which doesn’t do much more than pass the received action on to our MoveAgent() function:

  1. Create the function:

    public override void OnActionReceived(ActionBuffers actions)
    {
    
    }
  2. Call your own MoveAgent() function, passing in the discrete actions:

    MoveAgent(actions.DiscreteActions);
  3. And penalize the agent with a small negative reward on every step, using AddReward() so that the per-step penalties accumulate instead of overwriting one another:

    AddReward(-1f / MaxStep);

    This negative reward will hopefully encourage the agent to economize its movement and take as few moves as possible in order to maximize its reward and achieve the goal we want from it.

That’s everything for now. Make sure your code is saved before you continue.
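It helps to put numbers on this reward scheme. The plain C# (.NET) sketch below uses a hypothetical EpisodeReturn helper and assumes that the -1/MaxStep penalty is added once per agent step and accumulates, that Max Step is 5,000 (the value set later in this chapter), and that GoalScored() adds +5. Under those assumptions, an episode that times out totals about -1, while scoring after 500 steps totals about +4.9, so the goal reward always outweighs the time penalty and scoring sooner always earns more.

```csharp
using System;

// Hypothetical helper: total reward for one episode under this chapter's scheme.
// Assumes the -1/maxStep penalty accumulates on every step and +5 is added on a goal.
double EpisodeReturn(int maxStep, int stepsTaken, bool scored)
{
    double perStepPenalty = -1.0 / maxStep;
    double goalReward = scored ? 5.0 : 0.0;
    return stepsTaken * perStepPenalty + goalReward;
}

Console.WriteLine($"Timed out:       {EpisodeReturn(5000, 5000, false):F3}");
Console.WriteLine($"Scored at 500:   {EpisodeReturn(5000, 500, true):F3}");
Console.WriteLine($"Scored at 4,000: {EpisodeReturn(5000, 4000, true):F3}");
```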

The Environment

We need to do a little more administrative work in setting up the environment before we continue, so switch back to your scene in the Unity Editor. We’ll start by creating a GameObject to hold the walls in, just to keep the Hierarchy clean:

  1. Right-click on the Hierarchy view and choose Create Empty. Rename the empty GameObject “Walls,” as shown in Figure 4-15.

    Figure 4-15. The walls object, named
  2. Select all four walls (hold Ctrl, or Cmd on macOS, and click them one by one, or click the first one and then Shift-click the last one) and drag them under the new walls object. It should look like Figure 4-16.

    Figure 4-16. The walls are nicely encapsulated

Now we’ll create an empty GameObject in which to hold the entire environment:

  1. Right-click in the Hierarchy view and choose Create Empty. Rename the empty GameObject “Environment.”

  2. In the Hierarchy view, drag the walls object we just made, plus the agent, floor, block, and goal, into the new environment object. It should look like Figure 4-17 at this point.

Figure 4-17. The environment, encapsulated

Next, we need to configure some things on our agent:

  1. Select the agent in the Hierarchy view, and scroll down to the script you added in the Inspector view. Drag the floor object from the Hierarchy view into the Floor slot in the Inspector.

  2. Do the same for the overall environment GameObject, the goal, and the block. Set Max Step to 5000 in the editor, so that each episode ends after at most 5,000 steps even if the agent never gets the block to the goal. Your Inspector should look like Figure 4-18.

    Figure 4-18. The agent script properties
  3. Now, using the Add Component button in the Inspector for the agent, add a DecisionRequester script and set its Decision Period to 5, as shown in Figure 4-19.

    Figure 4-19. The Decision Requester component, added to the agent and appropriately configured
  4. Add two Ray Perception Sensor 3D components, each with three detectable tags: block, goal, and wall, with the settings shown in Figure 4-20.

    Back in “Letting the Agent Observe the Environment”, we said you can add observations via code or via components. There we did it all via code. Here we’re going to do it all via components. The components in question are the Ray Perception Sensor 3D components that we just added.

    Figure 4-20. The two Ray Perception sensors
    Tip

    We don’t even have a CollectObservations() method in our agent this time, because all the observations are collected by the Ray Perception Sensor 3D components that we added in the editor.

  5. We’ll need to add the tags we just used to the objects we actually want to tag. The tags allow us to refer to objects based on what they’re tagged with, so if something is tagged with “wall,” we can treat it as a wall, and so on. Select the block in the Hierarchy, and use the Inspector to add a new tag, as shown in Figure 4-21.

    Figure 4-21. Adding a new tag
  6. Name the new tag “block,” as shown in Figure 4-22.

    Figure 4-22. Naming a new tag
  7. Then attach the new tag to the block, as shown in Figure 4-23.

    Figure 4-23. Attaching the tag to an object
  8. Repeat this for the goal, using a “goal” tag, and for all four walls, using a “wall” tag. With these in place, the Ray Perception Sensor 3D components we added will only “see” things tagged with “block,” “goal,” or “wall.” As shown in Figure 4-24, we’ve added two layers of Ray Perception sensors, which cast rays out from the object they’re attached to and report back on the first thing each ray hits (in this case, only if it’s a wall, a goal, or a block). The two sensors are staggered at different angles. The rays are only visible in the Unity Editor.

    Figure 4-24. The Ray Perception Sensor 3D components
  9. Finally, add a Behavior Parameters component, using the Add Component button. Name the behavior “Push” and set the parameters as shown in Figure 4-25.

    Figure 4-25. The Behavior Parameters for the agent
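Because the Ray Perception Sensor 3D components supply all of the agent’s observations, it’s worth knowing how many values they produce. As we understand the ML-Agents documentation, each sensor produces (Observation Stacks) × (1 + 2 × Rays Per Direction) × (Num Detectable Tags + 2) values: every ray reports one slot per detectable tag plus two extra values describing whether, and how far away, it hit something. The plain C# (.NET) sketch below uses a hypothetical RayObservationSize helper; the Rays Per Direction and Observation Stacks values are placeholders, so substitute the settings from your own sensors (Figure 4-20).

```csharp
using System;

// Hypothetical helper applying the observation-size formula for one
// Ray Perception Sensor 3D component.
int RayObservationSize(int raysPerDirection, int detectableTags, int observationStacks = 1)
{
    int rays = 1 + 2 * raysPerDirection; // one center ray plus a fan on each side
    int perRay = detectableTags + 2;     // one slot per tag plus two extra values
    return observationStacks * rays * perRay;
}

// Placeholder settings: 3 tags (block, goal, wall) and 3 rays per direction per sensor.
int perSensor = RayObservationSize(raysPerDirection: 3, detectableTags: 3);
Console.WriteLine($"Per sensor: {perSensor}, both sensors: {2 * perSensor}");
```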

Save your scene in the Unity Editor. Now we’ll do some configuration on our block:

  1. Add a new script to the block, named something like “GoalScore.”

  2. Open the script, and add a property to refer to the agent:

    public BlockSorterAgent agent;

    The type of the property you create here should match the class name for the class attached to the agent.

    Tip

    You don’t need to change the parent class to Agent or import any ML-Agents namespaces this time, as this script isn’t an agent. It’s just a regular script.

  3. Add an OnCollisionEnter() function:

    private void OnCollisionEnter(Collision collision)
    {
    
    }
  4. Inside OnCollisionEnter(), add the following code:

    if(collision.gameObject.CompareTag("goal"))
    {
        agent.GoalScored();
    }
  5. Save the script and return to Unity, and with the block selected in the Hierarchy, drag the agent from the Hierarchy into the Agent slot in the new GoalScore script. This is shown in Figure 4-26.

    Figure 4-26. The GoalScore script

Don’t forget to save the scene again.
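The block and the agent are connected by a simple callback: the block knows nothing about rewards; it just notices what it bumped into and, only for the goal, tells the agent. The plain C# (.NET) sketch below models that flow without Unity, with a hypothetical OnBlockCollision helper standing in for OnCollisionEnter() and a delegate standing in for agent.GoalScored(). The sketch compares tags with an exact string match, so “Goal” is not treated as “goal”; keep the tag spelled the same in the editor and in the script.

```csharp
using System;

int goalsScored = 0;

// Stand-in for GoalScore.OnCollisionEnter(): react only to objects tagged "goal".
void OnBlockCollision(string otherTag, Action goalScored)
{
    if (otherTag == "goal")
        goalScored(); // in Unity this is agent.GoalScored(), which rewards and ends the episode
}

Action agentGoalScored = () => goalsScored++;
OnBlockCollision("wall", agentGoalScored); // ignored
OnBlockCollision("goal", agentGoalScored); // counts
OnBlockCollision("Goal", agentGoalScored); // ignored: the tag must match exactly
Console.WriteLine($"Goals scored: {goalsScored}");
```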

Training and Testing

With everything built in both Unity and C# scripts, it’s time to train the agent and see how the simulation works. We’ll be following the same process we followed in “Training with the Simulation”: creating a new YAML file to serve as the hyperparameters for our training.

Here’s how to set up the hyperparameters:

  1. Create a new YAML file to serve as the hyperparameters for the training. Ours is called Push.yaml and includes the following hyperparameters and values:

    behaviors:
      Push:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10
          buffer_size: 100
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000
        time_horizon: 64
        summary_freq: 10000
  2. Next, inside the venv we created earlier in “Setting Up”, fire up the training process by running the following command in your terminal:

    mlagents-learn config/Push.yaml --run-id=PushAgent1
    Note

    Replace config/Push.yaml with the path to the configuration file you just created.

  3. Once the command is up and running, you should see something that looks like Figure 4-27. At this point, you can press the Play button in Unity.

    Figure 4-27. The ML-Agents process begins training

    You’ll know the training process is working when you see output that looks like Figure 4-28.

    Figure 4-28. The ML-Agents process during training
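It can help to work out what the hyperparameters in Push.yaml imply. As we understand ML-Agents’ PPO trainer, it collects buffer_size experiences, then makes num_epoch passes over them in mini-batches of batch_size. The plain C# (.NET) sketch below, using a hypothetical PpoBudget helper that is not part of ML-Agents, does that arithmetic: 100 / 10 = 10 mini-batches per pass, times 3 epochs gives 30 gradient steps per update; max_steps of 500,000 works out to roughly 5,000 policy updates, and a summary_freq of 10,000 gives about 50 progress summaries over the run.

```csharp
using System;

// Hypothetical helper: rough training arithmetic implied by a PPO configuration.
(int MiniBatchesPerUpdate, int GradientStepsPerUpdate, int Updates, int Summaries) PpoBudget(
    int batchSize, int bufferSize, int numEpoch, int maxSteps, int summaryFreq)
{
    int miniBatches = bufferSize / batchSize;   // mini-batches per pass over the buffer
    int gradientSteps = miniBatches * numEpoch; // num_epoch passes per update
    int updates = maxSteps / bufferSize;        // roughly one update each time the buffer fills
    int summaries = maxSteps / summaryFreq;     // progress summaries over the whole run
    return (miniBatches, gradientSteps, updates, summaries);
}

var budget = PpoBudget(batchSize: 10, bufferSize: 100, numEpoch: 3,
                       maxSteps: 500_000, summaryFreq: 10_000);
Console.WriteLine(budget);
```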

When the training is complete, refer back to “When the Training Is Complete” for a refresher on how to find the .nn or .onnx file that’s been generated.

Use the model to run the agent, and watch it go!
