What Is Reinforcement Learning?
Reinforcement learning (RL) is a subfield of machine learning (ML) that addresses the problem of automatically learning optimal decisions over time. This is a general and common problem that has been studied in many scientific and engineering fields.
In our changing world, even problems that look like static input-output problems can become dynamic once time is taken into account. For example, imagine that you want to solve the simple supervised learning problem of pet image classification with two target classes: dog and cat. You gather the training dataset and implement the classifier using your favorite deep learning (DL) toolkit. After a while, the model converges and demonstrates excellent performance. Great! You deploy it and leave it running for a while. However, after a vacation at some seaside resort, you return to discover that dog grooming fashions have changed and a significant portion of your queries are now misclassified, so you need to update your training images and repeat the process. Not so great!
The preceding example is intended to show that even simple ML problems have a hidden time dimension. This dimension is frequently overlooked, but it can become an issue in a production system. RL is an approach that natively incorporates an extra dimension (which is usually time, but not necessarily) into the learning equations. This places RL much closer to how people understand artificial intelligence (AI).
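To make the hidden time dimension concrete, here is a minimal sketch of the pet classifier scenario. It is a toy under stated assumptions, not a real image classifier: each pet is reduced to a single hypothetical feature (think "fur length"), all the numbers are made up, and the helper names `fit_threshold`, `predict`, and `accuracy` are illustrative only.

```python
# Toy illustration of a model degrading when the data changes over time.
# Assumption: one made-up numeric feature per pet instead of real images.

def fit_threshold(samples):
    """Learn a 1-D decision threshold: the midpoint between the class means."""
    cats = [x for x, label in samples if label == "cat"]
    dogs = [x for x, label in samples if label == "dog"]
    return (sum(cats) / len(cats) + sum(dogs) / len(dogs)) / 2


def predict(threshold, x):
    """Classify a sample as 'dog' if its feature exceeds the threshold."""
    return "dog" if x > threshold else "cat"


def accuracy(threshold, samples):
    """Fraction of samples classified correctly."""
    correct = sum(predict(threshold, x) == label for x, label in samples)
    return correct / len(samples)


# Data gathered before deployment (hypothetical values).
train = [(1.0, "cat"), (2.0, "cat"), (3.0, "cat"),
         (7.0, "dog"), (8.0, "dog"), (9.0, "dog")]

# Data seen after grooming fashions change: some dogs now look "shorter".
later = [(1.0, "cat"), (2.0, "cat"), (3.0, "cat"),
         (4.0, "dog"), (4.5, "dog"), (9.0, "dog")]

threshold = fit_threshold(train)
acc_before = accuracy(threshold, train)   # model looks excellent at first
acc_after = accuracy(threshold, later)    # same model, later data

# Updating the training data and repeating the process restores performance.
new_threshold = fit_threshold(later)
acc_retrained = accuracy(new_threshold, later)

print(f"before: {acc_before:.2f}, after drift: {acc_after:.2f}, "
      f"retrained: {acc_retrained:.2f}")
```

The model itself never changes between the two evaluations; only the world does. That is the sense in which a seemingly static input-output problem carries a time dimension.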
In this chapter, we will discuss RL in more detail and you will become familiar with the following:
- How RL is related to and differs from other ML disciplines: supervised and unsupervised learning
- What the main RL formalisms are and how they are related to each other
- What the theoretical foundations of RL are: Markov decision processes