Summary
In this chapter, we covered two of the best-known families of algorithms in reinforcement learning: policy gradients and actor-critic methods. Policy gradients remain an active area of research, with ongoing work aimed at achieving better benchmark results in reinforcement learning. Topics for further study include Trust Region Policy Optimization (TRPO), natural policy gradients, and Deep Deterministic Policy Gradient (DDPG), which are beyond the scope of this book.
In the next chapter, we will look at the building blocks of Q-Learning, the application of deep neural networks, and many more techniques.