Training the reinforcement learning agent at the Gym
The procedure for training the Q-learning agent may already look familiar, because it shares many lines of code, and a similar structure, with the boilerplate code we used before. Instead of choosing a random action from the environment's action space, we now get the action from the agent using the agent.get_action(obs) method. We also call the agent.learn(obs, action, reward, next_obs) method after sending the agent's action to the environment and receiving the feedback. The training function is listed here:
def train(agent, env):
    best_reward = -float('inf')
    for episode in range(MAX_NUM_EPISODES):
        done = False
        obs = env.reset()
        total_reward = 0.0
        while not done:
            action = agent.get_action(obs)
            next_obs, reward, done, info = env.step(action)
            agent.learn(obs, action, reward, next_obs)
            obs = next_obs
            total_reward += reward
    ...
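To see the loop run end to end without installing anything, here is a minimal, self-contained sketch. Everything outside the body of train is an assumption made for illustration and does not come from the text. CorridorEnv is a hypothetical toy environment that mimics the reset() and four-value step() interface used in the listing. TabularQAgent is a hypothetical stand-in for the Q-learning agent. The value of MAX_NUM_EPISODES is made up. The way best_reward is updated and returned is a guess at one reasonable use of that variable, not necessarily what the original listing does.

```python
import random

MAX_NUM_EPISODES = 200  # assumed value; the text does not specify one


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to the last cell."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.steps += 1
        at_goal = self.pos == self.length - 1
        reward = 1.0 if at_goal else -0.1
        done = at_goal or self.steps >= self.max_steps
        return self.pos, reward, done, {}


class TabularQAgent:
    """Hypothetical epsilon-greedy tabular Q-learning agent."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def get_action(self, obs):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        best = max(self.Q[obs])
        # Break ties between equally good actions at random.
        return self.rng.choice(
            [a for a, q in enumerate(self.Q[obs]) if q == best])

    def learn(self, obs, action, reward, next_obs):
        # One-step Q-learning update toward the bootstrapped target.
        td_target = reward + self.gamma * max(self.Q[next_obs])
        self.Q[obs][action] += self.alpha * (td_target - self.Q[obs][action])


def train(agent, env, num_episodes=MAX_NUM_EPISODES):
    best_reward = -float('inf')
    for episode in range(num_episodes):
        done = False
        obs = env.reset()
        total_reward = 0.0
        while not done:
            action = agent.get_action(obs)
            next_obs, reward, done, info = env.step(action)
            agent.learn(obs, action, reward, next_obs)
            obs = next_obs
            total_reward += reward
        # Assumption: track the best episode return seen so far.
        best_reward = max(best_reward, total_reward)
    return best_reward


env = CorridorEnv()
agent = TabularQAgent(n_states=env.length, n_actions=env.n_actions)
best = train(agent, env)
print("Best episode reward:", best)
```

Because train only relies on get_action, learn, reset, and step, the same function should work unchanged with any agent and environment that expose those four methods with these signatures. Note that newer Gym and Gymnasium releases return five values from step() and a tuple from reset(), so the unpacking would need adjusting there.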