FrozenLake-v0 implementation problem
Here we report a basic Q-learning implementation for the FrozenLake-v0 problem.
Import the following two basic libraries:
import gym
import numpy as np
Then, we load the FrozenLake-v0 environment:
environment = gym.make('FrozenLake-v0')
Then, we build the Q-learning table; it has dimensions S x A, where S is the size of the observation space and A is the size of the action space:
S = environment.observation_space.n
A = environment.action_space.n
The FrozenLake environment provides a state for each block, and four actions (that is, the four directions of movement), giving us a 16x4 table of Q-values to initialize:
Q = np.zeros([S,A])
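As a quick standalone sanity check (with the FrozenLake dimensions hard-coded here rather than read from the environment object), the resulting table is a 16x4 array of zeros:

```python
import numpy as np

# FrozenLake-v0 has 16 states (a 4x4 grid) and 4 actions
# (the four directions of movement)
S, A = 16, 4
Q = np.zeros([S, A])
print(Q.shape)  # (16, 4)
```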
Then, we define the learning rate alpha for the training rule and the discount factor gamma:
alpha = .85
gamma = .99
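These two parameters drive the standard Q-learning update rule, Q[s,a] = Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]). As an isolated illustration (the transition values s, a, r, s1 below are made up, not taken from the environment), a single update on a zero-initialized table looks like this:

```python
import numpy as np

alpha = .85   # learning rate
gamma = .99   # discount factor

Q = np.zeros([16, 4])
# Hypothetical transition: in state s, action a led to state s1 with reward r
s, a, r, s1 = 14, 2, 1.0, 15
# Q-learning update rule: move Q[s, a] toward the bootstrapped target
Q[s, a] = Q[s, a] + alpha * (r + gamma * np.max(Q[s1, :]) - Q[s, a])
print(Q[s, a])  # 0.85
```

Since the table starts at zero, the new value is simply alpha * r = 0.85.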
We fix the total number of episodes (trials):
num_episodes = 2000
Then, we initialize rList, the list to which we'll append each episode's cumulative reward to evaluate the algorithm's score:
rList = []
Finally, we start the Q-learning...
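The training cycle can be sketched as follows. To keep the example self-contained without gym installed, a tiny deterministic chain environment stands in for FrozenLake here; the step function, state count, and reward placement are illustrative assumptions, while the loop structure (noisy greedy action choice, Q update, cumulative reward appended to rList) mirrors the recipe described above:

```python
import numpy as np

# Hypothetical stand-in environment: a 4-state chain where action 1 moves
# right, action 0 moves left, and reaching the last state yields reward 1
# and ends the episode.
def step(s, a):
    s1 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = (s1 == 3)
    return s1, (1.0 if done else 0.0), done

S, A = 4, 2
Q = np.zeros([S, A])
alpha, gamma = .85, .99
num_episodes = 2000
rList = []

rng = np.random.default_rng(0)
for i in range(num_episodes):
    s = 0           # reset to the start state
    rAll = 0.0      # cumulative reward of this episode
    done = False
    while not done:
        # Greedy action with decaying noise for exploration
        a = int(np.argmax(Q[s, :] + rng.normal(0, 1. / (i + 1), A)))
        s1, r, done = step(s, a)
        # Q-learning update rule
        Q[s, a] += alpha * (r + gamma * np.max(Q[s1, :]) - Q[s, a])
        rAll += r
        s = s1
    rList.append(rAll)

print("Average reward:", sum(rList) / num_episodes)
```

With the real FrozenLake-v0 environment, step would instead be environment.step(a), and the stochastic (slippery) transitions make the average reward well below 1.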