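The idea behind SeqGAN is to solve two problems. Vanilla GANs are not good at synthesizing discrete data. Discriminator networks are trained to score complete sequences, so they can't directly evaluate partially generated sequences of varying length. To solve the first problem, policy gradients are used to update the generator network, because sampling discrete tokens is not differentiable and the discriminator's gradient cannot flow back into the generator. The second problem is addressed by completing the remaining tokens of each partial sequence with Monte Carlo search (rollouts), so that the discriminator always scores full-length sequences.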
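To make the rollout idea concrete, here is a minimal Python sketch of how a partial sequence can be given a reward. The sequence is completed several times by a rollout policy, and the discriminator's scores on those completions are averaged. A complete sequence is scored directly. `uniform_policy`, `toy_discriminator`, `VOCAB_SIZE`, and `num_rollouts` are hypothetical stand-ins for illustration only. In SeqGAN the rollout policy is typically the generator itself (or a copy of it), and the discriminator is a trained classifier. The sketch uses plain Monte Carlo rollouts (repeated random completion and averaging) with no tree statistics.

```python
import random
from typing import Callable, List, Sequence

Policy = Callable[[List[int], random.Random], int]  # samples the next token
Discriminator = Callable[[List[int]], float]        # P(sequence is real)


def complete(prefix: Sequence[int], seq_len: int, rollout_policy: Policy,
             rng: random.Random) -> List[int]:
    """Fill in the remaining tokens of a partial sequence with the rollout policy."""
    seq = list(prefix)
    while len(seq) < seq_len:
        seq.append(rollout_policy(seq, rng))
    return seq


def rollout_reward(prefix: Sequence[int], seq_len: int, rollout_policy: Policy,
                   discriminator: Discriminator, num_rollouts: int,
                   rng: random.Random) -> float:
    """Monte Carlo estimate of the reward for a (possibly partial) sequence.

    A complete sequence is scored directly by the discriminator. A partial one
    is completed `num_rollouts` times and the discriminator scores are averaged.
    """
    if len(prefix) == seq_len:
        return discriminator(list(prefix))
    scores = [discriminator(complete(prefix, seq_len, rollout_policy, rng))
              for _ in range(num_rollouts)]
    return sum(scores) / num_rollouts


# --- Hypothetical stand-ins, for illustration only ---
VOCAB_SIZE = 4


def uniform_policy(seq: List[int], rng: random.Random) -> int:
    # Picks any token uniformly, ignoring the prefix.
    return rng.randrange(VOCAB_SIZE)


def toy_discriminator(seq: List[int]) -> float:
    # Pretends "real" data is made of token 0 and scores the fraction of 0s.
    return seq.count(0) / len(seq)


rng = random.Random(42)
generated = [0, 0, 2, 0]
# One reward per prefix Y_{1:t}, for t = 1..T.
rewards = [rollout_reward(generated[:t], len(generated), uniform_policy,
                          toy_discriminator, num_rollouts=16, rng=rng)
           for t in range(1, len(generated) + 1)]
print(rewards)
```

Only the last entry (the full sequence) is deterministic here. The earlier entries are noisy averages, and raising `num_rollouts` trades computation for lower variance.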
The reinforcement learning strategy in SeqGAN is designed as follows. Let's assume that at time $t$, the generated sequence is denoted as $Y_{1:t-1} = (y_1, \dots, y_{t-1})$ and that the current action, $y_t$, needs to be given by the generator ...
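The state/action framing can be sketched as a small softmax policy in Python. The state is the prefix $Y_{1:t-1}$, the action is the next token $y_t$, and a REINFORCE (policy-gradient) update raises $\log G_\theta(y_t \mid Y_{1:t-1})$ in proportion to a per-step reward $Q_t$. This is a minimal sketch under stated assumptions. `BigramPolicy` is hypothetical and summarizes the state by its last token only, whereas the SeqGAN generator is an LSTM over the whole prefix. The rewards passed in are placeholders for the Monte Carlo estimates mentioned above.

```python
import math
import random
from typing import Dict, List, Sequence


class BigramPolicy:
    """Hypothetical toy softmax generator G_theta(y_t | Y_{1:t-1}).

    Simplifying assumption: the state Y_{1:t-1} is summarized by its last
    token, so the parameters are one row of logits per previous token,
    plus one extra row for the empty prefix at t = 1.
    """

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.logits: List[List[float]] = [
            [0.0] * vocab_size for _ in range(vocab_size + 1)
        ]

    def _row(self, prefix: Sequence[int]) -> int:
        # Row `vocab_size` stands for the empty prefix.
        return prefix[-1] if prefix else self.vocab_size

    def probs(self, prefix: Sequence[int]) -> List[float]:
        """Softmax distribution over the next token (the action) given the state."""
        row = self.logits[self._row(prefix)]
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self, prefix: Sequence[int], rng: random.Random) -> int:
        """Draw the action y_t ~ G_theta(. | Y_{1:t-1})."""
        return rng.choices(range(self.vocab_size), weights=self.probs(prefix))[0]

    def generate(self, seq_len: int, rng: random.Random) -> List[int]:
        seq: List[int] = []
        for _ in range(seq_len):
            seq.append(self.sample(seq, rng))
        return seq

    def reinforce_update(self, sequence: Sequence[int],
                         rewards: Sequence[float], lr: float) -> None:
        """One REINFORCE step: ascend sum_t Q_t * grad log G_theta(y_t | Y_{1:t-1})."""
        grads: Dict[int, List[float]] = {}
        for t, (y_t, q_t) in enumerate(zip(sequence, rewards)):
            prefix = sequence[:t]
            p = self.probs(prefix)  # parameters from before this update
            g = grads.setdefault(self._row(prefix), [0.0] * self.vocab_size)
            for a in range(self.vocab_size):
                # d/d logit_a of log softmax(logits)[y_t] = 1[a == y_t] - p_a
                g[a] += q_t * ((1.0 if a == y_t else 0.0) - p[a])
        # Apply all gradients at once so every step used the same parameters.
        for r, g in grads.items():
            for a in range(self.vocab_size):
                self.logits[r][a] += lr * g[a]


rng = random.Random(0)
policy = BigramPolicy(vocab_size=3)
seq = policy.generate(4, rng)
# Placeholder rewards; in SeqGAN these would be Q(Y_{1:t-1}, y_t) from rollouts.
step_rewards = [1.0] * len(seq)
before = policy.probs([])[seq[0]]
policy.reinforce_update(seq, step_rewards, lr=0.5)
after = policy.probs([])[seq[0]]
print(before < after)  # True: a positive reward makes the sampled action more likely
```

Accumulating the gradients before applying them keeps the update a proper Monte Carlo policy-gradient estimate, since every time step is evaluated under the same parameters $\theta$.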