You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

Product type Paperback

Published in Jun 2018

Publisher Packt

ISBN-13 9781788834247

Length 546 pages

Edition 1st Edition

Author (1):

Maxim Lapan

Example – GAN on Atari images

Almost every book about DL uses the MNIST dataset to show you the power of DL, which, over the years, has made this dataset extremely boring, like a fruit fly for genetic researchers. To break this tradition, and add a bit more fun to the book, I've tried to avoid well-beaten paths and illustrate PyTorch using something different. You may have heard about generative adversarial networks (GANs), which were invented and popularized by Ian Goodfellow. In this example, we'll train a GAN to generate screenshots of various Atari games.

The simplest GAN architecture consists of two networks: the first acts as a "cheater" (also called the generator), and the second as a "detective" (also called the discriminator). The two networks compete with each other: the generator tries to produce fake data that the discriminator will find hard to distinguish from the samples in your dataset, and the discriminator tries to detect the generated samples. Over time, both networks...
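To make the competition concrete, here is a minimal sketch in plain Python of the two opposing objectives. It assumes the commonly used binary cross-entropy formulation, in which the discriminator outputs the probability that a sample is real. The function names (`bce`, `discriminator_loss`, `generator_loss`) and the sample probabilities are hypothetical and chosen only for illustration; this is not the book's implementation, and no networks are trained here.

```python
import math


def bce(prediction, label, eps=1e-7):
    """Binary cross-entropy for a single probability (hypothetical helper)."""
    # Clamp to avoid log(0) for extreme predictions.
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The "detective": real samples should score near 1, generated ones near 0."""
    real_part = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_part = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_part + fake_part


def generator_loss(d_fake):
    """The "cheater": generated samples should be scored near 1 (taken for real)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Assumed discriminator outputs (probability that a sample is real).
d_real = [0.90, 0.95]         # scores for samples from the dataset
d_fake_caught = [0.10, 0.05]  # generated samples the detective rejects
d_fake_fooled = [0.80, 0.90]  # generated samples the detective accepts

print("D loss, fakes caught:", discriminator_loss(d_real, d_fake_caught))
print("D loss, fakes pass:  ", discriminator_loss(d_real, d_fake_fooled))
print("G loss, fakes caught:", generator_loss(d_fake_caught))
print("G loss, fakes pass:  ", generator_loss(d_fake_fooled))
```

The same discriminator scores on generated samples push the two losses in opposite directions: when the fakes pass as real, the generator's loss falls and the discriminator's loss rises, which is the adversarial tension described above.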