Results

Let’s now take a look at the results.

The feed-forward model

Convergence on one year of Yandex data requires about 10M training steps, which can take a while: a GTX 1080Ti trains at 230-250 steps per second, so a full run takes roughly 11 to 12 hours. During training, several charts in TensorBoard show us what's going on.

The following two charts, reward_100 and steps_100, show the average reward (in percent) and the average episode length, respectively, over the last 100 episodes:

Figure 3: The reward plot for the feed-forward version
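To make the meaning of reward_100 and steps_100 concrete, here is a minimal sketch of how such running averages over the last 100 episodes could be maintained. The class and method names (EpisodeTracker, add_episode, means) are hypothetical and are not taken from the book's code; writing the values to TensorBoard is left out.

```python
from collections import deque


class EpisodeTracker:
    """Hypothetical helper: running means over the most recent episodes."""

    def __init__(self, window=100):
        # deque with maxlen drops the oldest entry once the window is full
        self.rewards = deque(maxlen=window)
        self.steps = deque(maxlen=window)

    def add_episode(self, reward_pct, n_steps):
        # reward_pct: total episode reward in percent; n_steps: episode length
        self.rewards.append(reward_pct)
        self.steps.append(n_steps)

    def means(self):
        # Returns (mean reward, mean steps) over the window,
        # or None if no episode has finished yet
        if not self.rewards:
            return None
        n = len(self.rewards)
        return sum(self.rewards) / n, sum(self.steps) / n


if __name__ == "__main__":
    tracker = EpisodeTracker(window=100)
    # Made-up episode results, for illustration only
    for reward_pct, n_steps in [(0.5, 12), (-0.2, 30), (1.1, 25)]:
        tracker.add_episode(reward_pct, n_steps)
    reward_100, steps_100 = tracker.means()
    print(f"reward_100={reward_100:.3f} steps_100={steps_100:.1f}")
```

Until 100 episodes have finished, the averages cover only the episodes seen so far, so the earliest points on such a chart are noisier than the later ones.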

The charts show us two good things:

Our agent was able to figure out when to buy and sell the share to get positive ...

Deep Reinforcement Learning Hands-On by Maxim Lapan