In this chapter, we will explore policy-based methods, which is another category of family of reinforcement learning algorithms. In previous chapters, we focused on value-based methods, which estimate the optimal state-action value function. Value-based methods can become computationally expensive for large state or action spaces, and they can struggle with environments where the dynamics of the environment are stochastic.
Policy-based methods, on the other hand, directly learn the optimal policy without estimating ...