Adversarial self-play
The last example we looked at is best described as a competitive multi-agent training scenario, in which the agents learn by competing against each other to collect bananas or freeze other agents out. In this section, we will look at a similar form of training that pits agent against agent using an inverse reward scheme, called adversarial self-play. Inverse rewards punish an opposing agent whenever the competing agent receives a reward, so one agent's gain is always the other's loss.
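Before we jump into the exercise, here is a minimal, self-contained sketch of what an inverse reward scheme can look like in code. This is not the actual reward code from the Soccer environment; the assign_rewards function, the reward magnitudes, and the time_penalty parameter are all assumptions made purely for illustration.

```python
# Illustrative sketch of an inverse (zero-sum) reward scheme for two
# competing roles, loosely modeled on a striker vs. goalie matchup.
# The function name and reward values are assumptions for this example,
# not values taken from the ML-Agents Soccer environment.

def assign_rewards(goal_scored: bool, time_penalty: float = 0.001):
    """Return (striker_reward, goalie_reward) for one step of play."""
    if goal_scored:
        striker_reward = 1.0   # the striker is rewarded for scoring...
        goalie_reward = -1.0   # ...and the opposing goalie is punished equally
    else:
        # A small existential penalty keeps the striker pressing for a goal,
        # while the goalie is rewarded for every step it keeps the ball out.
        striker_reward = -time_penalty
        goalie_reward = time_penalty
    return striker_reward, goalie_reward


if __name__ == "__main__":
    print(assign_rewards(goal_scored=True))   # (1.0, -1.0)
    print(assign_rewards(goal_scored=False))  # (-0.001, 0.001)
```

The key point is that every reward handed to one role is mirrored by an equal and opposite reward for its opponent, which is what makes the setup zero-sum and drives the adversarial learning.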
Let's see what this looks like in the Unity ML-Agents Soccer (football) example by following this exercise:

- Open up Unity to the SoccerTwos scene located in the Assets/ML-Agents/Examples/Soccer/Scenes folder.
- Run the scene and use the WASD keys to play all four agents. Stop the scene when you are done having fun.
- Expand the Academy object in the Hierarchy window.
- Select the StrikerBrain and switch it to External.
- Select the GoalieBrain and switch it to External.
- From the menu, select File | Build Settings.... Click the Add Open Scene...