ES on HalfCheetah

In the next example, we'll go beyond the simplest ES implementation and look at how this method can be parallelized efficiently using the shared seed strategy proposed in the paper [1]. To demonstrate this approach, we'll use the HalfCheetah environment from the roboschool library, which we already experimented with in Chapter 15, Trust Regions – TRPO, PPO, and ACKTR. HalfCheetah is a continuous action problem in which a weird two-legged creature gains reward by running forward without injuring itself.
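As a quick refresher, importing roboschool registers its environments with Gym, so HalfCheetah can be created through the standard Gym API. The snippet below is only a minimal sanity check; the environment ID follows the usual roboschool naming, and the printed spaces will depend on the versions you have installed.

```python
import gym
import roboschool  # importing roboschool registers its environments in Gym

# Create the environment and inspect its spaces: HalfCheetah has a
# continuous (Box) action space, which ES will influence indirectly by
# perturbing the policy network's parameters.
env = gym.make("RoboschoolHalfCheetah-v1")
obs = env.reset()
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)
env.close()
```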

First, let's discuss the idea of shared seeds. The performance of the ES algorithm is mostly determined by the speed at which we can gather our training batch, which consists of sampling the noise and checking the total reward of the perturbed policy. ...
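To make the shared seed idea concrete, here is a minimal sketch of the trick (not the book's actual code): each worker perturbs the policy parameters with noise generated from an integer seed and sends back only the (seed, reward) pair, and the master regenerates the same noise from that seed to build its update. The function names, hyperparameters, and the plain NumPy parameter vector are illustrative assumptions.

```python
import numpy as np

NOISE_STD = 0.05       # assumed perturbation scale
LEARNING_RATE = 0.01   # assumed step size


def sample_noise(seed, shape):
    # The same seed deterministically reproduces the same noise,
    # so only the integer seed has to travel between processes.
    rng = np.random.RandomState(seed)
    return rng.normal(size=shape)


def worker_evaluate(params, seed, evaluate_fn):
    # Worker side: perturb the parameters with seeded noise and return
    # only (seed, reward) instead of the full noise tensor.
    # evaluate_fn is a placeholder that runs one episode with the given
    # parameters and returns the total reward.
    noise = sample_noise(seed, params.shape)
    reward = evaluate_fn(params + NOISE_STD * noise)
    return seed, reward


def master_update(params, results):
    # Master side: regenerate every worker's noise from its seed and
    # combine the noise vectors, weighted by the normalized rewards.
    seeds, rewards = zip(*results)
    rewards = np.array(rewards, dtype=np.float32)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = np.zeros_like(params)
    for seed, r in zip(seeds, rewards):
        grad += r * sample_noise(seed, params.shape)
    grad /= len(seeds) * NOISE_STD
    return params + LEARNING_RATE * grad
```

In the parallel version, worker_evaluate runs in separate worker processes, and only the lightweight (seed, reward) tuples are sent back to the master, which keeps the communication cost independent of the number of network parameters.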
