You're reading from Deep Reinforcement Learning Hands-On Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Product type Paperback

Published in Jun 2018

Publisher Packt

ISBN-13 9781788834247

Length 546 pages

Edition 1st Edition

Languages

Python

Tools

Deep Reinforcement Learning

Concepts

Deep Reinforcement Learning

Author (1):

Maxim Lapan

Q-learning for FrozenLake

The whole example is in the Chapter05/02_frozenlake_q_learning.py file, and it differs only slightly from the previous example. The most obvious change is to our value table. In the previous example, we kept the value of each state, so the key in the dictionary was just a state. Now we need to store values of the Q-function, which has two parameters, state and action, so the key in the value table is now a composite of the two: a (state, action) pair.
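To make the change of key concrete, here is a minimal sketch of the two table layouts. It is not taken from the book's file: the names state_values, q_values, best_action and state_value are hypothetical, and the use of collections.defaultdict is an assumption made for illustration.

```python
import collections

# Previous layout (state values): the key is just the state.
state_values = collections.defaultdict(float)
state_values[3] = 0.5

# New layout (Q-values): the key is a composite (state, action) tuple.
q_values = collections.defaultdict(float)
q_values[(3, 1)] = 0.5


def best_action(q_values, state, n_actions):
    """Return the action with the highest stored Q-value in this state."""
    return max(range(n_actions), key=lambda a: q_values[(state, a)])


def state_value(q_values, state, n_actions):
    """Recover the state value as the maximum Q-value over all actions."""
    return max(q_values[(state, a)] for a in range(n_actions))
```

With a table like this, choosing an action is a plain lookup over the actions available in a state, and the state value can still be recovered as the maximum over actions whenever it is needed.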

The second difference is that the calc_action_value function is gone. We don't need it anymore, because our action values are now stored directly in the value table. Finally, the most important change in the code is in the agent's value_iteration method. Before, it was just a wrapper around the calc_action_value call, which did the job of the Bellman approximation. Now that this function has been replaced by the value table, we need to do the approximation in the value_iteration method itself.
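The idea can be sketched as a single sweep that recomputes every (state, action) entry from observed transition counts and rewards. This is an illustration under stated assumptions, not the book's listing: the function name q_value_iteration_sweep, the table names transits and rewards, the discount GAMMA = 0.9, and the two-state toy environment are all hypothetical.

```python
import collections

GAMMA = 0.9  # assumed discount factor, chosen only for this illustration


def q_value_iteration_sweep(values, transits, rewards, n_states, n_actions,
                            gamma=GAMMA):
    """Update every Q(s, a) in place with one Bellman approximation sweep.

    values:   dict keyed by (state, action) -> Q-value
    transits: dict keyed by (state, action) -> Counter of target states
    rewards:  dict keyed by (state, action, target_state) -> reward
    """
    for state in range(n_states):
        for action in range(n_actions):
            target_counts = transits[(state, action)]
            total = sum(target_counts.values())
            if total == 0:
                continue  # no experience for this pair yet
            action_value = 0.0
            for tgt_state, count in target_counts.items():
                reward = rewards[(state, action, tgt_state)]
                # Value of the target state is the best Q-value available there.
                best_next = max(values[(tgt_state, a)] for a in range(n_actions))
                action_value += (count / total) * (reward + gamma * best_next)
            values[(state, action)] = action_value


# Hypothetical toy problem: from state 0, action 0 reaches state 1 with
# reward 1.0, and action 1 stays in state 0 with reward 0.0.
transits = collections.defaultdict(collections.Counter)
rewards = collections.defaultdict(float)
values = collections.defaultdict(float)

transits[(0, 0)][1] += 1
rewards[(0, 0, 1)] = 1.0
transits[(0, 1)][0] += 1

q_value_iteration_sweep(values, transits, rewards, n_states=2, n_actions=2)
```

Because the table is updated in place, the entry for (0, 1) computed later in the sweep already sees the fresh value of (0, 0).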

Let's look at the code. As it's almost the same, I'll jump directly to the most interesting value_iteration...