Neural Q-learning
Most reinforcement learning algorithms boil down to just three main steps: infer, do, and learn. During the first step, the algorithm selects the best action a in a given state s using the knowledge it has accumulated so far. Next, it performs the action and observes the reward r as well as the next state s'.
Then it improves its understanding of the world using the newly acquired experience (s, a, r, s'). These steps can be formalized through the Q-learning algorithm, which is more or less at the core of Deep Reinforcement Learning.
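To make the loop concrete, here is a minimal sketch in Python. The environment, the agent interface, and the method names (reset, step, select_action, update) are hypothetical stand-ins for illustration, not code from the text:

```python
import random

class ToyEnv:
    """One-step toy environment: action 1 in state 0 earns a reward of 1."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        r = 1.0 if (self.s == 0 and a == 1) else 0.0
        self.s, done = 1, True
        return self.s, r, done

class RandomAgent:
    def select_action(self, s):         # infer: pick an action (randomly here)
        return random.choice([0, 1])
    def update(self, s, a, r, s_next):  # learn: a real agent would improve here
        pass

env, agent = ToyEnv(), RandomAgent()
s, done = env.reset(), False
while not done:
    a = agent.select_action(s)          # infer
    s_next, r, done = env.step(a)       # do: observe reward r and next state s'
    agent.update(s, a, r, s_next)       # learn from the experience (s, a, r, s')
    s = s_next
```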
Introduction to Q-learning
Computing the acquired knowledge from a single experience (s, a, r, s') is a naive way to calculate utility. A more robust approach is to compute the utility of a particular state-action pair (s, a) by recursively considering the utilities of future actions. The utility of your current action is influenced not only by the immediate reward but also by the next best action, as shown in the following formula:
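The recurrence described here is the standard Q-learning formula. In conventional notation (the discount factor γ, which weights future utility against immediate reward, is standard Q-learning notation rather than a symbol defined above):

$$ Q(s, a) = r(s, a) + \gamma \max_{a'} Q(s', a') $$

As a sketch of how this recurrence becomes an update rule, here is a minimal tabular version in Python; the learning rate alpha, the discount gamma, and the toy action set are illustrative assumptions, not values from the text:

```python
from collections import defaultdict

# Minimal tabular Q-learning update (a sketch, not the chapter's code).
# alpha (learning rate) and gamma (discount factor) are assumed values.
alpha, gamma = 0.1, 0.9
Q = defaultdict(float)      # maps (state, action) pairs to utility estimates
actions = [0, 1]            # toy discrete action set

def q_update(s, a, r, s_next):
    # Blend the immediate reward with the utility of the best
    # action available from the next state s'.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One update from zero-initialized values moves Q(s, a) toward the reward.
q_update(s=0, a=1, r=1.0, s_next=1)
print(Q[(0, 1)])            # prints 0.1
```

Iterating this update over many (s, a, r, s') experiences drives Q toward the recurrence above; the neural variant of the algorithm replaces the table with a function approximator.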