March 2018
Intermediate to advanced
272 pages
7h 53m
English
As our agent transitions from state to state, by taking actions, it receives a reward. The agent can learn online by using each state, action, and reward as training input. After every action, the agent will update it's neural network weights, hopefully getting smarter along the way. This is the basic idea of online learning. The agent learns as it goes, just like you and I do.
The shortcomings of this naive type of online learning are somewhat obvious and two-fold: