The Q-learning process

In this experiment, we use a learning rate of 0.01 and a discount factor of 0.8. The following code initializes the environment and the training hyperparameters:

async function qlearning() {
  const episodes = [];
  for (let i = 0; i < 1000; i++) {
    episodes.push(i);
  }

  // Initialize Environment
  const env = new Environment();

  // Initialize the action-value function as the 2-dim tensor
  // with the shape [numState, numActions]
  let actionValue = tf.fill([env.getNumStates(), env.getNumActions()], 10);

  // Learning Rate
  const alpha = 0.01;

  // Discount Value
  const discount = 0.8;

  // Optimization with Q-learning
  // ...
}
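To make the roles of the learning rate and the discount concrete, here is a minimal sketch of a single tabular Q-learning update. It is not the book's implementation: the helper name qUpdate is hypothetical, and a plain nested array stands in for the tensor-backed action-value function so that the arithmetic is easy to follow.

```javascript
// Hypothetical helper (not from the book): one tabular Q-learning update.
// q[state][action] holds the current action-value estimates.
function qUpdate(q, state, action, reward, nextState, alpha, discount) {
  // Best value reachable from the next state under the current estimates.
  const maxNext = Math.max(...q[nextState]);
  // Target: immediate reward plus the discounted best next value.
  const target = reward + discount * maxNext;
  // Move the current estimate a fraction alpha toward the target.
  q[state][action] += alpha * (target - q[state][action]);
  return q[state][action];
}

// Toy table: 2 states, 2 actions, every entry initialized to 10,
// mirroring the fill value used in the listing above.
const q = [[10, 10], [10, 10]];
const updated = qUpdate(q, 0, 1, 1, 1, 0.01, 0.8);
console.log(updated);
```

With alpha = 0.01, each update moves the estimate only 1% of the way toward the target, so the optimistic initial value of 10 decays slowly. The discount of 0.8 controls how much the best next-state value contributes to that target.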

We can update the action-value function by observing the result from the environment in episodes (1,000 episodes ...
