A simple and complete Q-Learner implementation for solving the Mountain Car problem

In this section, we will put all of the code together into a single Python script that initializes the environment, launches the agent's training process, obtains the trained policy, tests the agent's performance, and records how the agent acts in the environment.

#!/usr/bin/env python
import gym
import numpy as np

MAX_NUM_EPISODES = 50000
STEPS_PER_EPISODE = 200  # This is specific to MountainCar. May change with env
EPSILON_MIN = 0.005
max_num_steps = MAX_NUM_EPISODES * STEPS_PER_EPISODE
EPSILON_DECAY = 500 * EPSILON_MIN / max_num_steps
ALPHA = 0.05  # Learning rate
GAMMA = 0.98  # Discount factor
NUM_DISCRETE_BINS = 30  # Number of bins to discretize each observation dim

class ...
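The constants at the top of the script fix two things before any learning happens: how quickly exploration decays and how finely the continuous observations are binned. The following is a minimal sketch of both, not the agent class itself, which is elided in the listing. It assumes that epsilon starts at 1.0 and decreases by EPSILON_DECAY on every step until it reaches EPSILON_MIN, and that observations are binned uniformly between the MountainCar-v0 bounds (position in [-1.2, 0.6], velocity in [-0.07, 0.07]). The names EPSILON_START, OBS_LOW, OBS_HIGH, epsilon_at, and discretize are hypothetical and introduced only for illustration.

```python
# Constants copied from the script.
MAX_NUM_EPISODES = 50000
STEPS_PER_EPISODE = 200
EPSILON_MIN = 0.005
max_num_steps = MAX_NUM_EPISODES * STEPS_PER_EPISODE
EPSILON_DECAY = 500 * EPSILON_MIN / max_num_steps
NUM_DISCRETE_BINS = 30

# Assumption: exploration starts fully random and decays linearly per step.
EPSILON_START = 1.0


def epsilon_at(step):
    """Epsilon after `step` linear decay steps, floored at EPSILON_MIN."""
    return max(EPSILON_MIN, EPSILON_START - step * EPSILON_DECAY)


# Number of steps needed for epsilon to fall from EPSILON_START to EPSILON_MIN.
steps_to_min = (EPSILON_START - EPSILON_MIN) / EPSILON_DECAY

# Assumption: MountainCar-v0 observation bounds as (position, velocity).
OBS_LOW = (-1.2, -0.07)
OBS_HIGH = (0.6, 0.07)


def discretize(obs):
    """Map a continuous (position, velocity) pair to a tuple of bin indices."""
    indices = []
    for value, low, high in zip(obs, OBS_LOW, OBS_HIGH):
        bin_width = (high - low) / NUM_DISCRETE_BINS
        indices.append(int((value - low) / bin_width))
    return tuple(indices)


if __name__ == "__main__":
    print("Total training steps:", max_num_steps)
    print("Per-step epsilon decay:", EPSILON_DECAY)
    print("Steps until epsilon reaches its minimum:", round(steps_to_min))
    print("Bins for obs (-0.5, 0.001):", discretize((-0.5, 0.001)))
```

Under these assumptions, the training budget is 10,000,000 steps and epsilon reaches its floor after roughly 3.98 million of them, so the agent acts almost greedily for the remaining 60% or so of training. With uniform binning, an observation exactly at the upper bound can land in bin index NUM_DISCRETE_BINS, so a Q-table indexed this way would need NUM_DISCRETE_BINS + 1 entries per observation dimension.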
