June 2018
Now that the agent is provided with some guidance—reinforcement—the next task is to calculate the reward for each action the agent makes. Take a look at this code:
// Compute reward for an action
float calcReward(float[][] CurrMap, float[][] NextMap) {
    int newGoal = calcGoalPos(NextMap); // first, we calculate the goal position for the next map
    if (newGoal == -1) // if goal position is the initial position (i.e. no move)
        return (size * size + 1); // we reward the agent with 4*4 + 1 = 17 (i.e. maximum reward)
    return -1f; // else we reward -1.0 for each bad move
}
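To make the reward rule easier to try in isolation, here is a minimal, self-contained sketch. It is not the book's implementation: the class name `RewardSketch`, the `GOAL` cell marker, and the body of `calcGoalPos` are hypothetical stand-ins, since the excerpt does not show how the map is encoded or how the goal position is found. The sketch assumes a 4x4 grid and assumes that `calcGoalPos` returns the flattened index of the goal cell, or -1 when no goal cell is present in the map.

```java
class RewardSketch {
    // Assumed grid side length; the comment in the text works with 4*4 + 1 = 17.
    static final int size = 4;

    // Hypothetical marker for the goal cell; the real encoding is not shown in the text.
    static final float GOAL = 1f;

    // Hypothetical stand-in for calcGoalPos: returns the flattened index
    // (row * size + col) of the goal cell, or -1 if no cell holds GOAL.
    static int calcGoalPos(float[][] map) {
        for (int r = 0; r < size; r++) {
            for (int c = 0; c < size; c++) {
                if (map[r][c] == GOAL) {
                    return r * size + c;
                }
            }
        }
        return -1;
    }

    // Same rule as in the text: size * size + 1 when calcGoalPos reports -1
    // for the next map, otherwise -1 for the move.
    static float calcReward(float[][] currMap, float[][] nextMap) {
        int newGoal = calcGoalPos(nextMap);
        if (newGoal == -1) {
            return size * size + 1; // int result is widened to float
        }
        return -1f;
    }

    public static void main(String[] args) {
        float[][] withGoal = new float[size][size];
        withGoal[3][3] = GOAL;                    // goal in the bottom-right cell
        float[][] noGoal = new float[size][size]; // no goal cell anywhere

        System.out.println(calcReward(withGoal, withGoal)); // prints -1.0
        System.out.println(calcReward(withGoal, noGoal));   // prints 17.0
    }
}
```

Note that, as in the original method, the current map is passed in but not used: under these assumptions the reward depends only on what `calcGoalPos` reports for the next map.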