Getting to know your learning agent

As we've seen in our exploration of the Taxi-v2 environment, your agent is a self-driving taxicab whose job is to pick up passengers at a starting location and drop them off at their desired destination as efficiently as possible. The taxi collects a reward when it drops off a passenger at the correct destination and incurs a small penalty for every other step it takes, with a larger penalty for an illegal pickup or drop-off attempt. The following is a rendering of the taxi environment:

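The rewards your agent collects are used to update the Q-table. The Q-table in our model-free algorithm is a lookup table that holds an estimated value (a Q-value) for each state-action pair, so the agent can look up the best-known action for any state.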
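As a rough illustration of the lookup table described above, here is a minimal sketch in plain Python. It assumes the Taxi environment's 500 discrete states and 6 discrete actions; the names `q_table` and `greedy_action`, and the value written into state 42, are hypothetical and chosen only for this example. In practice a NumPy array is a common choice for the table, but a list of lists keeps the sketch dependency-free.

```python
# Sizes assumed for the Taxi environment: 500 discrete states, 6 discrete actions.
N_STATES = 500
N_ACTIONS = 6

# One row per state, one column per action; every estimate starts at zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def greedy_action(q_table, state):
    """Return the index of the action with the highest estimated value in `state`.

    Ties are broken in favor of the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical illustration: pretend the agent has learned that
# action 4 is the most valuable one in state 42.
q_table[42][4] = 1.5
best = greedy_action(q_table, 42)
```

Looking up an action is just a row read followed by an argmax, which is why a table works well when the state and action spaces are small and discrete, as they are here.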

Think of the Q-table as an implementation of a Q-function of the Q form ...
