Implementing a Q-learning agent from scratch
In this section, we will implement our intelligent agent step by step. We will implement the well-known Q-learning algorithm using the NumPy library and the MountainCar-v0 environment from the OpenAI Gym library.
Let's revisit the reinforcement learning Gym boilerplate code we used in Chapter 4, Exploring the Gym and its Features:
#!/usr/bin/env python
import gym

env = gym.make("Qbert-v0")
MAX_NUM_EPISODES = 10
MAX_STEPS_PER_EPISODE = 500

for episode in range(MAX_NUM_EPISODES):
    obs = env.reset()
    for step in range(MAX_STEPS_PER_EPISODE):
        env.render()
        # Sample random action. This will be replaced by our agent's action
        # when we start developing the agent algorithms
        action = env.action_space.sample()
        # Send the action to the environment and receive the next_state,
        # reward and whether done or not
        next_state, reward, done, info = env.step(action)
        obs = next_state
        if done is True:
            ...
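The comment in the loop above notes that the randomly sampled action will eventually be replaced by the agent's own action. As a rough preview of what that could look like, here is a minimal sketch of a tabular Q-learning step using NumPy. It is not the implementation developed in this chapter: the names get_action and q_update, the table sizes, and the hyperparameter values are all hypothetical, and the sketch assumes states have already been mapped to integer indices (MountainCar observations are continuous, so they would need to be discretized first).

```python
import numpy as np

# Hypothetical sizes and hyperparameters, chosen only for illustration.
NUM_STATES = 5
NUM_ACTIONS = 3
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # probability of taking a random (exploratory) action

rng = np.random.default_rng(0)
Q = np.zeros((NUM_STATES, NUM_ACTIONS))  # one Q-value per (state, action) pair


def get_action(Q, state, epsilon, rng):
    """Epsilon-greedy choice: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))


def q_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """Apply one tabular Q-learning update to Q in place and return it."""
    # Do not bootstrap from the next state once the episode has ended.
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q


# A single made-up transition: in state 0, action 1 earned reward 1.0
# and led to state 2.
q_update(Q, state=0, action=1, reward=1.0, next_state=2, done=False,
         alpha=ALPHA, gamma=GAMMA)
print(Q[0, 1])
```

In the boilerplate loop, a call like get_action would take the place of env.action_space.sample(), and q_update would run after each env.step(action), once the observation has been converted to a state index.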