June 2018
Intermediate to advanced
318 pages
9h 24m
English
MAXQ Value Function Decomposition is one of the frequently used algorithms in HRL; let's see how MAXQ works. In MAXQ Value Function Decomposition, we decompose the value function into a set of value functions for each of the subtasks. Let's take the same example given in the paper. Remember the taxi problem we solved using Q learning and SARSA?
There are four locations in total, and the agent has to pick up a passenger at one location and drop them off at another location. The agent will receive +20 points as a reward for a successful drop off and -1 point for every time step it takes. The agent will also lose -10 points for illegal pickups and drops. So the goal of our agent is to learn to pick up and drop ...