Reinforcement learning algorithms concept

Let's build a simple model of reinforcement learning and introduce the basic terminology along the way:

At each time step t, the agent:

  • Executes action a_t
  • Receives observation o_t
  • Receives a reward r_t

At each time step t, the environment:

  • Receives action a_t
  • Generates observation o_{t+1}
  • Generates scalar reward r_{t+1}
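
The exchange described in the two lists above can be sketched as a short loop. This is only an illustration under stated assumptions: ToyEnvironment, RandomAgent, and run_episode are hypothetical names that do not come from the book, and the reward rule (1.0 when the action matches the parity of the step counter) is invented purely so that the loop has something to compute.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: the observation is a step counter, and the
    reward is 1.0 when the action equals the counter's parity, else 0.0."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        # Receives a_t, then generates scalar reward r_{t+1} and observation o_{t+1}.
        reward = 1.0 if action == self.t % 2 else 0.0
        self.t += 1
        return self.t, reward


class RandomAgent:
    """Hypothetical agent that ignores its observation and picks 0 or 1 at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


def run_episode(env, agent, steps=5):
    """Run the agent-environment loop and record (o_t, a_t, r_{t+1}) per step."""
    history = []
    observation = env.reset()
    for _ in range(steps):
        action = agent.act(observation)              # agent executes a_t
        next_observation, reward = env.step(action)  # environment responds
        history.append((observation, action, reward))
        observation = next_observation               # agent receives o_{t+1}
    return history


history = run_episode(ToyEnvironment(), RandomAgent(seed=0))
for observation, action, reward in history:
    print(f"o={observation} a={action} r={reward}")
```

A learning agent would use the recorded rewards to change how it picks actions; the random agent here is a placeholder that keeps the sketch focused on the message flow between the two sides.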

The environment is considered to be non-deterministic: an action a_t chosen on the basis of observation o_t earns a reward, but the same action taken in the same state may earn a different reward on another occasion.
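
To make the idea of a non-deterministic environment concrete, here is a minimal sketch in which the same action in the same state is rewarded only some of the time. The function name noisy_reward and the success probabilities (0.7 and 0.2) are assumptions chosen for illustration and are not taken from the book.

```python
import random


def noisy_reward(state, action, rng):
    """Hypothetical stochastic reward: action 1 pays 1.0 with probability 0.7,
    any other action pays 1.0 with probability 0.2, regardless of state."""
    success_probability = 0.7 if action == 1 else 0.2
    return 1.0 if rng.random() < success_probability else 0.0


rng = random.Random(42)

# Repeat the same action in the same state many times.
rewards = [noisy_reward(state=0, action=1, rng=rng) for _ in range(1000)]
average_reward = sum(rewards) / len(rewards)

print("distinct rewards seen:", sorted(set(rewards)))
print("average reward:", average_reward)
```

Because individual rewards vary, an agent in such an environment has to judge an action by its average outcome over many trials rather than by any single result.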

The agent (intelligent machine) is connected to the environmental context with its observation and ...
