Resource efficiency

Current deep reinforcement learning algorithms require vast amounts of time, training data, and computational resources in order to reach a desirable level of proficiency. For algorithms such as AlphaGo Zero, where our reinforcement learning algorithm learns to play Go with zero prior knowledge and experience, resource efficiency becomes a major bottleneck for taking such algorithms to commercial scales. Recall that when DeepMind implemented AlphaGo Zero, they needed to train the agent on tens of millions of games using hundreds of GPUs and thousands of CPUs. For AlphaGo Zero to reach a reasonable proficiency, it needs to play a number of games, equivalent to what hundreds of thousands of humans would play in their lifetimes. ...

Get Python Reinforcement Learning Projects now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.