Deep Reinforcement Learning Hands-On

You're reading from Deep Reinforcement Learning Hands-On: Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more.

Product type: Paperback
Published: Jan 2020
Publisher: Packt
ISBN-13: 9781838826994
Length: 826 pages
Edition: 2nd Edition
Author: Maxim Lapan
Table of Contents (28 chapters)

Preface
1. What Is Reinforcement Learning?
2. OpenAI Gym
3. Deep Learning with PyTorch
4. The Cross-Entropy Method
5. Tabular Learning and the Bellman Equation
6. Deep Q-Networks
7. Higher-Level RL Libraries
8. DQN Extensions
9. Ways to Speed up RL
10. Stocks Trading Using RL
11. Policy Gradients – an Alternative
12. The Actor-Critic Method
13. Asynchronous Advantage Actor-Critic
14. Training Chatbots with RL
15. The TextWorld Environment
16. Web Navigation
17. Continuous Action Space
18. RL in Robotics
19. Trust Regions – PPO, TRPO, ACKTR, and SAC
20. Black-Box Optimization in RL
21. Advanced Exploration
22. Beyond Model-Free – Imagination
23. AlphaGo Zero
24. RL in Discrete Optimization
25. Multi-agent RL
26. Other Books You May Enjoy
27. Index

Reinforcement learning

RL is the third camp and lies somewhere in between full supervision and a complete lack of predefined labels. On the one hand, it uses many well-established methods of supervised learning, such as deep neural networks for function approximation, stochastic gradient descent, and backpropagation, to learn data representations. On the other hand, it usually applies them in a different way.

In the next two sections of the chapter, we will explore specific details of the RL approach, including assumptions and abstractions in its strict mathematical form. For now, to compare RL with supervised and unsupervised learning, we will take a less formal, but more easily understood, path.

Imagine that you have an agent that needs to take actions in some environment. (Both "agent" and "environment" will be defined in detail later in this chapter.) A robot mouse in a maze is a good example, but you can also imagine an automatic helicopter trying to perform a roll, or a chess program learning how to beat a grandmaster. Let's go with the robot mouse for simplicity.


Figure 1.1: The robot mouse maze world

In this case, the environment is a maze with food at some points and electricity at others. The robot mouse can take actions, such as turning left or right and moving forward. At each moment, it can observe the full state of the maze to decide which action to take. The robot mouse tries to find as much food as possible while avoiding electric shocks. The food and electricity signals act as the reward that the environment gives the agent (the robot mouse) as additional feedback about the agent's actions. Reward is a very important concept in RL, and we will talk about it later in the chapter. For now, it is enough to know that the agent's final goal is to get as much total reward as possible. In our particular example, the robot mouse could accept a slight electric shock to reach a place with plenty of food; that would be a better outcome for it than standing still and gaining nothing.
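To make this agent-environment loop concrete, here is a minimal sketch in Python. Everything in it (the MouseMaze class, its reward values, and the random policy) is a hypothetical illustration invented for this example, not code from the book; later chapters use OpenAI Gym environments instead.

    import random

    class MouseMaze:
        """Toy maze: the agent moves along a line of cells that
        contain food (+1), electricity (-1), or nothing (0)."""
        def __init__(self):
            self.cells = [0, 1, 0, -1, 0, 1]  # reward available at each cell
            self.position = 0

        def observe(self):
            return self.position              # the full state of the maze

        def step(self, action):
            # Actions: 0 = stand still, 1 = move forward.
            if action == 1 and self.position < len(self.cells) - 1:
                self.position += 1
            return self.cells[self.position]  # reward given by the environment

    env = MouseMaze()
    total_reward = 0
    for _ in range(10):
        state = env.observe()                 # a real policy would use this
        action = random.choice([0, 1])        # placeholder: act at random
        total_reward += env.step(action)
    print("Total reward:", total_reward)

Note that the agent's objective is the total reward accumulated over the whole episode, not any single reward value.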

We don't want to hard-code knowledge about the environment and the best actions to take in every specific situation into the robot mouse; that would take too much effort and could become useless after even a slight change to the maze. What we want is some magic set of methods that will allow our robot mouse to learn, on its own, how to avoid electricity and gather as much food as possible. RL is exactly this magic toolbox, and it behaves differently from supervised and unsupervised learning methods; it doesn't work with predefined labels in the way that supervised learning does. Nobody labels all the images that the robot sees as good or bad, and nobody gives it the best direction to turn in.

However, we're not completely blind as in an unsupervised learning setup; we have a reward system. The reward can be positive for gathering food, negative for electric shocks, or neutral when nothing special happens. By observing the reward and relating it to the actions taken, our agent learns to perform actions better, gather more food, and get fewer electric shocks (a minimal sketch of such reward-driven learning follows below). Of course, RL's generality and flexibility come at a price: RL is considered a much more challenging area than supervised or unsupervised learning. Let's quickly discuss what makes RL tricky.
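As a rough illustration of relating the reward to the actions taken, here is a tabular Q-learning sketch on the same toy maze. Q-learning itself is covered properly from Chapter 5 onward; the environment, hyperparameters, and loop structure below are assumptions made for this example, not the book's code.

    import random

    cells = [0, 1, 0, -1, 0, 1]                # reward available at each cell
    q = {(s, a): 0.0 for s in range(len(cells)) for a in (0, 1)}
    alpha, gamma, eps = 0.1, 0.9, 0.2          # assumed hyperparameters

    for episode in range(200):
        pos = 0
        for _ in range(len(cells)):
            # Epsilon-greedy: mostly exploit learned values, sometimes explore.
            if random.random() < eps:
                action = random.choice([0, 1])
            else:
                action = max((0, 1), key=lambda a: q[(pos, a)])
            new_pos = min(pos + action, len(cells) - 1)
            reward = cells[new_pos]
            # Relate the observed reward back to the action that produced it.
            best_next = max(q[(new_pos, 0)], q[(new_pos, 1)])
            q[(pos, action)] += alpha * (reward + gamma * best_next - q[(pos, action)])
            pos = new_pos

After enough episodes, the learned values in q steer the mouse forward, accepting the one shock on the way to the food, with no labels ever provided; this is exactly the trade-off described above.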
