Exercises
Use the following exercises to improve your understanding of RL and the PPO trainer.
- Convert one of the Unity examples to use just visual observations. Hint: use the GridWorld example as a guide, and remember that the agent may need its own camera.
- Alter the CNN configuration of an agent using visual observations in three different ways. You can add more layers, take them away, or alter the kernel filters. Run the training sessions and compare the differences with TensorBoard. (A sketch of one such change follows this list.)
- Convert the GridWorld sample to use vector observations and recurrent networks with memory. Hint: you can borrow several pieces of code from the Hallway example. (See the configuration sketch after this list.)
- Revisit the Ball3D example and set it up to use multiple asynchronous agent training.
- Set up the Crawler example and run it with multiple asynchronous agent training.
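For the CNN exercise, the following is a minimal sketch of the kind of change you might make. It assumes an ML-Agents 0.x release in which the visual encoder is built with tf.layers.conv2d calls in the trainer's models.py; the function name and layer sizes below are illustrative, so match them to whatever your version actually defines:

```python
import tensorflow as tf

# Illustrative variant of a visual encoder, assuming an ML-Agents 0.x
# release that builds its CNN in models.py. The stock encoder in those
# releases used two conv layers; this sketch tries one of the suggested
# edits by appending a third layer.
def create_visual_encoder(image_input):
    # The first two layers mirror the stock encoder:
    # 16 filters, 8x8 kernel, stride 4; then 32 filters, 4x4, stride 2.
    conv1 = tf.layers.conv2d(image_input, 16, kernel_size=[8, 8],
                             strides=[4, 4], activation=tf.nn.elu)
    conv2 = tf.layers.conv2d(conv1, 32, kernel_size=[4, 4],
                             strides=[2, 2], activation=tf.nn.elu)
    # The added third layer: more filters over a smaller receptive field.
    conv3 = tf.layers.conv2d(conv2, 64, kernel_size=[3, 3],
                             strides=[1, 1], activation=tf.nn.elu)
    return tf.layers.flatten(conv3)
```

Change only one thing per training run; otherwise, the TensorBoard comparison won't tell you which edit made the difference.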
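For the memory exercise, the recurrent settings are hyperparameters of the PPO trainer rather than agent code, so beyond borrowing the Hallway code, most of the work happens in the trainer configuration. Here is a minimal sketch, assuming your configuration lives in the usual trainer_config.yaml and that the brain name below matches the one in your scene (adjust both to your project):

```yaml
# Hypothetical trainer_config.yaml entry; the brain name must match
# the learning brain in your own GridWorld scene.
GridWorldLearning:
    use_recurrent: true     # switch the model to a recurrent (LSTM) network
    sequence_length: 64     # timesteps per training sequence
    memory_size: 256        # size of the recurrent memory vector
```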
If you encounter problems running through these samples, be sure to check online; these samples are likely well worn by now, with many other people having tweaked or enhanced them further.