Implementing regret matching
Continuing with bandit algorithms, we will explore regret matching, presented here as an improvement on the UCB1 algorithm. We will use the same rock-paper-scissors case, but the approach can be repurposed for other types of games, such as fighting games.
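As a rough orientation before the steps, the general regret-matching rule can be written down as follows. This is the textbook form of the technique, not necessarily the exact update this recipe implements. The symbols are assumptions introduced here for illustration: $u$ is a payoff function, $a_t$ is the action we played in round $t$, $o_t$ is the opponent's action in that round, and $A$ is the set of available actions.

```latex
% Cumulative regret for not having played action a during rounds 1..T
R_T(a) = \sum_{t=1}^{T} \bigl[ u(a, o_t) - u(a_t, o_t) \bigr]

% Only positive regret counts
R_T^{+}(a) = \max\bigl( R_T(a),\, 0 \bigr)

% Chance of playing action a in the next round
p_{T+1}(a) =
\begin{cases}
  \dfrac{R_T^{+}(a)}{\sum_{b \in A} R_T^{+}(b)} & \text{if } \sum_{b \in A} R_T^{+}(b) > 0 \\[2ex]
  \dfrac{1}{|A|} & \text{otherwise}
\end{cases}
```

For example, assume payoffs of +1 for a win, 0 for a draw, and -1 for a loss, and assume all regrets start at zero. If we play rock and the opponent plays paper, our payoff is -1. Scissors would have earned +1 (regret 2), paper would have earned 0 (regret 1), and rock has regret 0. The next round's chances are then 2/3 for scissors, 1/3 for paper, and 0 for rock.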
Getting ready...
You should have read the previous recipe and be familiar with its member variables and data structures. Its member functions are not relevant here, because we will implement a different set for this recipe, but the algorithm builds on the knowledge gained previously.
How to do it...
We will implement the following steps in the same Bandit class we created before:
- Define the required member variables:
    float initialRegret = 10f;
    float[] regret;
    float[] chance;
    RPSAction lastOpponentAction;
    RPSAction[] lastActionRM;
- Define the member function for initialization:
    public void InitRegretMatching()
    {
        if (init)
            return;
        // next steps
    }
- Declare the local variables and...