
Deep Reinforcement Learning Hands-On by Maxim Lapan


Results

Let’s now take a look at the results.

The feed-forward model

Convergence on one year of Yandex data requires about 10M training steps, which can take a while: a GTX 1080 Ti trains at 230-250 steps per second. During training, several TensorBoard charts show us what’s going on.
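As a quick check on "a while", the quoted figures can be turned into a wall-clock estimate. This is a plain Python sketch; the function name `training_hours` is ours, and it assumes a constant training speed with no evaluation or logging overhead.

```python
# Rough wall-clock estimate for training, using the figures quoted above:
# ~10M steps to converge, 230-250 steps per second on a GTX 1080 Ti.
# Assumption: the speed stays constant for the whole run.

def training_hours(total_steps: int, steps_per_second: float) -> float:
    """Return the wall-clock time, in hours, needed to run total_steps."""
    return total_steps / steps_per_second / 3600


TOTAL_STEPS = 10_000_000

for speed in (230, 250):
    print(f"{speed} steps/s -> {training_hours(TOTAL_STEPS, speed):.1f} hours")
```

This prints `12.1 hours` for 230 steps/s and `11.1 hours` for 250 steps/s, so a full run takes roughly 11 to 12 hours under these assumptions.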

The following figure shows two charts, reward_100 and steps_100, which plot the average reward (in percent) and the average episode length, respectively, over the last 100 episodes:


Figure 3: The reward plot for the feed-forward version
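The reward_100 and steps_100 values are moving averages over a 100-episode window. Below is a minimal sketch of how such averages can be tracked, assuming each finished episode reports its total reward in percent and its step count; `EpisodeTracker` and its methods are hypothetical names, not the book's code.

```python
from collections import deque


class EpisodeTracker:
    """Hypothetical helper: running means over the last `window` episodes.

    With window=100, mean_reward() and mean_steps() correspond to the
    reward_100 and steps_100 charts described above.
    """

    def __init__(self, window: int = 100):
        # A deque with maxlen drops its oldest entry on every append once full.
        self.rewards = deque(maxlen=window)
        self.lengths = deque(maxlen=window)

    def add_episode(self, reward_pct: float, steps: int) -> None:
        """Record one finished episode: total reward (percent) and length."""
        self.rewards.append(reward_pct)
        self.lengths.append(steps)

    def mean_reward(self) -> float:
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0

    def mean_steps(self) -> float:
        return sum(self.lengths) / len(self.lengths) if self.lengths else 0.0


# Usage: after 150 episodes, only episodes 50..149 are in the window.
tracker = EpisodeTracker(window=100)
for i in range(150):
    tracker.add_episode(reward_pct=float(i), steps=10)
print(tracker.mean_reward())  # 99.5
print(tracker.mean_steps())   # 10.0
```

Because the window is bounded, each mean reflects only recent behavior, which is what makes the charts useful for spotting progress. In a training loop, the two values could be written to TensorBoard once per episode, for example with `SummaryWriter.add_scalar("reward_100", value, step)`.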

The charts show us two good things:

  1. Our agent was able to figure out when to buy and sell the share to get positive ...
