• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A document representation model based on Wasserstein GAN

MA Yongjun,LI Yajun,WANG Rui,CHEN Haishan   

  1. (College of Computer Science and Information Engineering, Tianjin University of Science & Technology, Tianjin 300457, China)

     
  • Received: 2018-01-22 Revised: 2018-02-28 Online: 2019-01-25 Published: 2019-01-25

Abstract:

Document representation models convert unstructured text into structured data and underpin many natural language processing tasks. Current word-based models cannot handle out-of-vocabulary words or previously unseen documents in document representation tasks. A generative adversarial network (GAN) trains two neural networks against each other so as to learn the distribution of the original data. We propose a Wasserstein adversarial document model (WADM), which uses a denoising autoencoder as its discriminator network and obtains the document representation directly from the autoencoder's hidden layer. Experiments show that the WADM extracts document features accurately and has stronger document representation capability than word-based models.
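
The idea of reading a document representation from the hidden layer of a denoising autoencoder can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DenoisingAutoencoder`, the functions `corrupt` and `critic_objective`, the tanh activation, the masking noise, the use of reconstruction error as the critic score, and all sizes are assumptions made for illustration. The weights are random and untrained, and no generator or training loop is shown.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each entry independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


class DenoisingAutoencoder:
    """Tiny untrained autoencoder over bag-of-words vectors (hypothetical, illustrative only)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(vocab_size)]
                      for _ in range(hidden_size)]
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in range(vocab_size)]
        self.b_enc = [0.0] * hidden_size
        self.b_dec = [0.0] * vocab_size

    def encode(self, x):
        """Hidden-layer activations, used here as the document representation."""
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        """Linear reconstruction of the input vector from the hidden layer."""
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def score(self, x, drop_prob=0.0, rng=None):
        """Mean squared error between the clean input and its reconstruction
        from a corrupted copy (assumed critic score)."""
        noisy = corrupt(x, drop_prob, rng or random.Random(0))
        recon = self.decode(self.encode(noisy))
        return sum((r - v) ** 2 for r, v in zip(recon, x)) / len(x)


def critic_objective(dae, real_docs, fake_docs):
    """Wasserstein-style difference: mean score on real documents minus
    mean score on generated documents (assumed form of the objective)."""
    real = sum(dae.score(d) for d in real_docs) / len(real_docs)
    fake = sum(dae.score(d) for d in fake_docs) / len(fake_docs)
    return real - fake


# Usage: a 6-term bag-of-words vector mapped to a 3-dimensional representation.
dae = DenoisingAutoencoder(vocab_size=6, hidden_size=3)
doc = [1.0, 0.0, 2.0, 0.0, 0.0, 1.0]
representation = dae.encode(doc)
```

In this sketch the representation is simply the output of `encode`, so a document never seen during training can still be mapped to a vector from its term counts alone, which is the property the abstract contrasts with word-based models.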
 

Key words: document representation, generative adversarial network (GAN), denoising autoencoder, neural network