
Computer Engineering & Science, 2020, Vol. 42, Issue (09): 1680-1689.


Proximal policy optimization and adversarial learning based dialog generation

CAI Yue1, YOU Jin-guo1,2, DING Jia-man1

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;

   2. Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)

  Received: 2019-11-26  Revised: 2020-03-10  Accepted: 2020-09-25  Online: 2020-09-25  Published: 2020-09-25

Abstract: Dialogue generation is a key research direction in natural language processing, and generative adversarial nets (GAN) have recently been applied successfully to this task. To further improve the quality of generated dialogue, and to address the low training efficiency caused by poor reuse of the rewards returned by the discriminative model during GAN training, this paper proposes a dialogue generation algorithm, PPO_GAN, based on proximal policy optimization (PPO). Within the GAN framework, the generative model produces dialogue and the discriminative model distinguishes generated dialogue from real dialogue. The GAN is trained with proximal policy optimization, which handles the fact that back-propagation through the GAN is non-differentiable when discrete dialogue is generated. By limiting the gradient of each generative-model iteration, the rewards obtained from the discriminative model can be reused while the training of the generative model remains monotonically non-decreasing. Experimental results show that, compared with dialogue generation algorithms such as maximum likelihood estimation and Adver-REGS, PPO_GAN improves both training efficiency and the quality of the generated dialogue.
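
To make the mechanism described above concrete, the following is a minimal illustrative sketch, not the authors' implementation, of a PPO-style clipped objective in which the discriminative model's score on a generated reply serves as the reward; all names and numbers here are hypothetical.

    # Sketch of the clipped surrogate objective used in PPO-style generator updates.
    # Clipping the probability ratio limits how far the generator policy moves per
    # update, so the same discriminator rewards can be reused for several gradient
    # steps while keeping improvement (approximately) monotonic.
    import numpy as np

    def ppo_clipped_loss(new_logprobs, old_logprobs, rewards, eps=0.2):
        ratio = np.exp(new_logprobs - old_logprobs)            # pi_new / pi_old per token
        unclipped = ratio * rewards
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * rewards
        return -np.mean(np.minimum(unclipped, clipped))        # negative objective to minimize

    # Hypothetical example: the discriminator's score for a sampled reply is
    # broadcast to each token as the reward signal.
    old_lp = np.log(np.array([0.30, 0.25, 0.40]))   # token log-probs under the old policy
    new_lp = np.log(np.array([0.35, 0.20, 0.55]))   # token log-probs under the updated policy
    r      = np.array([0.8, 0.8, 0.8])              # discriminator score per token
    print(ppo_clipped_loss(new_lp, old_lp, r))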


Key words: dialog generation, proximal policy optimization (PPO), reinforcement learning, generative adversarial nets (GAN), sequence-to-sequence model