Q-learning algorithm
Solving a Reinforcement Learning problem requires estimating, during the learning process, an evaluation function. This function must be able to assess, through the sum of the rewards, how good (or otherwise) a policy is. The basic idea of Q-learning is that the algorithm learns the optimal evaluation function over the whole space of states and actions (S × A).
The so-called Q-function provides a mapping of the form Q: S × A → V, where V is the value of the future rewards of an action a ∈ A executed in the state s ∈ S.
Once it has learned the optimal function Q, the agent will of course be able to recognize which action will lead to the highest future reward in a state s.
One of the most commonly used examples for implementing the Q-learning algorithm involves the use of a table. Each cell of the table holds a value Q(s, a) = V, initialized to 0.
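Such a table can be represented directly as a 2-D array, with one row per state and one column per action. A minimal sketch (the state and action counts here are illustrative assumptions, not values from the text):

```python
import numpy as np

n_states, n_actions = 6, 4        # assumed sizes, for illustration only

# Each cell Q[s, a] holds the value V, initialized to 0
Q = np.zeros((n_states, n_actions))
```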
The agent can perform any action a ∈ A, where A is the total set of actions known by the agent. The basic idea of the algorithm is the training rule, ...