• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A context character feature based neural
network model for Thai word segmentation

TAO Guang-feng, XIAN Yan-tuan, WANG Hong-bin, WANG Shu-juan

  1. (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2016-11-18 Revised: 2016-12-23 Online: 2018-05-25 Published: 2018-05-25

Abstract:

Automatic word segmentation is a fundamental task in natural language processing. Traditional Thai word segmentation methods suffer from complex feature templates and a large search space. To address these problems, this paper proposes a neural network model for Thai word segmentation based on context character features. The proposed model uses the word distribution table to train word representation vectors and applies a multi-layer neural network classifier to segment Thai text. Experimental results on the InterBEST 2009 Thai word evaluation corpus show that the proposed model outperforms the conditional random field model, the Character-Cluster Hybrid segmentation model, and the GLR and N-gram segmentation model. Segmentation precision, recall, and F-value reach 97.27%, 99.26%, and 98.26%, respectively. The model also improves segmentation speed by 112.78% compared with the conditional random field model.
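
To make the idea of context character features concrete, the following is a minimal sketch of how segmentation can be framed as per-character classification. It is not the authors' implementation: the abstract does not state the window size, the tag scheme, or the padding symbol, so the window half-width `k`, the binary "starts a word" labels, and the `PAD` token below are all assumptions made for illustration. The classifier itself is omitted; in the paper's setting a multi-layer neural network would predict one label from each window.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def context_windows(chars: List[str], k: int = 2) -> List[List[str]]:
    """For each character, return it together with k neighbours on each side."""
    padded = [PAD] * k + chars + [PAD] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(chars))]


def boundary_labels(words: List[str]) -> List[int]:
    """Label each character 1 if it starts a word, else 0 (assumed tag scheme).

    Every word is assumed to be non-empty.
    """
    labels: List[int] = []
    for word in words:
        labels.extend([1] + [0] * (len(word) - 1))
    return labels


def decode(chars: List[str], labels: List[int]) -> List[str]:
    """Rebuild words from per-character boundary labels."""
    words: List[str] = []
    for ch, lab in zip(chars, labels):
        if lab == 1 or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # continue the current word
    return words


# Toy example: two Thai words, split into Unicode code points.
gold_words = ["ภาษา", "ไทย"]
chars = list("".join(gold_words))
windows = context_windows(chars, k=2)   # one classifier input per character
labels = boundary_labels(gold_words)    # one training target per character
```

Each entry of `windows` would be mapped to character vectors and passed to the classifier, and `decode` turns the predicted labels back into words. Because every character is labelled independently from a fixed window, no search over whole-sentence segmentations is needed in this sketch.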
 

Key words: Thai word segmentation, neural network model, context character feature, character vector