
Computer Engineering & Science, 2020, Vol. 42, Issue (09): 1680-1689.


Proximal policy optimization and adversarial learning based dialog generation

CAI Yue1, YOU Jin-guo1,2, DING Jia-man1

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;

   2. Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)

  Received: 2019-11-26  Revised: 2020-03-10  Accepted: 2020-09-25  Online: 2020-09-25  Published: 2020-09-25

Abstract: Dialogue generation is a key research direction in natural language processing, and generative adversarial nets (GAN) have recently been applied successfully to this task. To further improve the quality of generated dialogue, and to address the low training efficiency caused by poor reuse of the rewards returned by the discriminative model during GAN training, this paper proposes a dialogue generation algorithm, PPO_GAN, based on proximal policy optimization (PPO). Within the GAN framework, the generative model produces dialogue and the discriminative model distinguishes generated dialogue from real dialogue. The GAN is trained with proximal policy optimization, which handles the fact that back-propagation through the GAN is non-differentiable when discrete dialogue is generated. By limiting the gradient of each generative-model iteration, the rewards obtained from the discriminative model can be reused while the training of the generative model remains monotonically non-decreasing. Experimental results show that, compared with dialogue generation algorithms such as maximum likelihood estimation and Adver-REGS, PPO_GAN improves both training efficiency and the quality of the generated dialogue.
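
To make the mechanism described above concrete, the following is a minimal illustrative sketch, not the authors' implementation, of a PPO-style clipped objective in which the discriminative model's score on a generated reply serves as the reward; all names and numbers here are hypothetical.

    # Sketch of the clipped surrogate objective used in PPO-style generator updates.
    # Clipping the probability ratio limits how far the generator policy moves per
    # update, so the same discriminator rewards can be reused for several gradient
    # steps while keeping improvement (approximately) monotonic.
    import numpy as np

    def ppo_clipped_loss(new_logprobs, old_logprobs, rewards, eps=0.2):
        ratio = np.exp(new_logprobs - old_logprobs)            # pi_new / pi_old per token
        unclipped = ratio * rewards
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * rewards
        return -np.mean(np.minimum(unclipped, clipped))        # negative objective to minimize

    # Hypothetical example: the discriminator's score for a sampled reply is
    # broadcast to each token as the reward signal.
    old_lp = np.log(np.array([0.30, 0.25, 0.40]))   # token log-probs under the old policy
    new_lp = np.log(np.array([0.35, 0.20, 0.55]))   # token log-probs under the updated policy
    r      = np.array([0.8, 0.8, 0.8])              # discriminator score per token
    print(ppo_clipped_loss(new_lp, old_lp, r))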


Key words: dialog generation, proximal policy optimization (PPO), reinforcement learning, generative adversarial nets (GAN), sequence-to-sequence model