Chatbots are taking the world by storm, and are predicted to become more prevalent in the coming years. The coherence of the results obtained from dialogues with these chatbots has to constantly improve if they are to gain widespread acceptance. One way to achieve this would be via the use of reinforcement learning.
In this chapter, we implemented reinforcement learning in the creation of a chatbot. The learning was based on a policy gradient method that focused on the future direction of a dialogue agent, in order to generate coherent and interesting interactions. The datasets that we used were from movie conversations. We proceeded to clean and preprocess the datasets, obtaining the vocabulary from them. We then formulated our policy ...