Computing the acquired knowledge directly from the experience tuple *(s, r, a, s')* is a naive way to calculate the utility. We need a more robust approach: we calculate the utility of a particular state-action pair *(s, a)* by recursively considering the utilities of future actions. The utility of the current action is influenced not only by the immediate reward but also by the next best action, as shown in the following formula, called the **Q-function**:

*Q(s, a) = r(s, a) + γ max<sub>a'</sub> Q(s', a')*

In the previous formula, *s'* denotes the next state, *a'* denotes the next action, and the reward of taking action *a* in state *s* is denoted by *r(s, a)*, whereas γ is the discount factor, which determines how much weight is given to future rewards relative to the immediate reward.
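The recursive structure of the Q-function can be sketched in a small tabular example. This is a minimal illustration, not a full Q-learning implementation: the states, actions, and reward values below are hypothetical, and the Q-table is assumed to already hold utility estimates for the next state.

```python
# Minimal tabular sketch of the Q-function described above.
# States, actions, rewards, and Q-values are hypothetical illustrations.
GAMMA = 0.9  # discount factor: weight given to future utility

# Q-table: current utility estimates for each (state, action) pair
Q = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.0,
    ("s1", "left"): 2.0, ("s1", "right"): 5.0,
}

def q_value(state, action, reward, next_state, actions):
    """Compute Q(s, a) = r(s, a) + gamma * max over a' of Q(s', a')."""
    best_future = max(Q[(next_state, a_prime)] for a_prime in actions)
    return reward + GAMMA * best_future

# Taking "right" in state s0 yields an (assumed) reward of 1.0 and
# lands in s1, whose best action currently has utility 5.0.
Q[("s0", "right")] = q_value("s0", "right", 1.0, "s1", ["left", "right"])
print(Q[("s0", "right")])  # 1.0 + 0.9 * 5.0 = 5.5
```

Note how the utility of *(s0, right)* is not just the immediate reward of 1.0: the discounted utility of the best follow-up action in *s1* is folded in, which is exactly the recursion the formula expresses.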