• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (12): 2238-2242.


Thai sentence segmentation based on Siamese recurrent neural network

XIAN Yan-tuan1,2, ZHANG Zhi-ju1,2, WANG Hong-bin1,2, WEN Yong-hua1,2
  

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;

    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)


  • Received: 2020-07-28   Revised: 2020-11-04   Accepted: 2021-12-25   Online: 2021-12-25   Published: 2021-12-31

Abstract: Thai rarely uses punctuation, and there are no explicit delimiters between sentences. Sentences must therefore be segmented on the basis of semantics, which adds difficulty to natural language processing tasks such as lexical analysis, syntactic analysis, and machine translation. This paper proposes a Thai sentence segmentation method based on a Siamese (dual-path) recurrent neural network. Unlike traditional Thai sentence segmentation methods, it requires no manually defined features; instead, a single shared recurrent neural network encodes the word sequences before and after each candidate boundary. The encoding vectors of the two sequences are then used as features to build a classification model for Thai sentence segmentation. Experimental results on the Orchid97 Thai corpus show that the proposed method outperforms traditional Thai sentence segmentation methods.
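The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Elman-style recurrent cell, random untrained weights, and a logistic output layer, and the names SharedRNNEncoder, BoundaryClassifier, and boundary_probability are hypothetical. What it shows is the Siamese idea itself: one encoder, with one set of weights, is applied to the words on each side of a candidate split point, and the two resulting vectors are concatenated as the features for a binary classifier.

```python
import math
import random


class SharedRNNEncoder:
    """Tiny Elman-style RNN (an assumption; the paper's cell type is not stated here)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.embed = mat(vocab_size, embed_dim)   # word embeddings
        self.w_in = mat(hidden_dim, embed_dim)    # input-to-hidden weights
        self.w_rec = mat(hidden_dim, hidden_dim)  # hidden-to-hidden weights
        self.hidden_dim = hidden_dim

    def encode(self, word_ids):
        """Return the final hidden state for a sequence of word ids."""
        h = [0.0] * self.hidden_dim
        for wid in word_ids:
            x = self.embed[wid]
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h


class BoundaryClassifier:
    """Hypothetical classifier: is the candidate split point a sentence boundary?"""

    def __init__(self, vocab_size, hidden_dim=8, seed=0):
        # A single encoder instance is reused for both sides (the Siamese part).
        self.encoder = SharedRNNEncoder(vocab_size, hidden_dim=hidden_dim, seed=seed)
        rng = random.Random(seed + 1)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def boundary_probability(self, left_ids, right_ids):
        left_vec = self.encoder.encode(left_ids)    # words before the candidate
        right_vec = self.encoder.encode(right_ids)  # words after the candidate
        features = left_vec + right_vec             # concatenated encodings
        score = sum(w * f for w, f in zip(self.w_out, features)) + self.b_out
        return 1.0 / (1.0 + math.exp(-score))       # logistic output


# Usage with made-up word ids; the weights are untrained, so the value is arbitrary.
model = BoundaryClassifier(vocab_size=10)
print(model.boundary_probability([1, 2, 3], [4, 5]))
```

In practice the weights would be learned from labelled boundaries, for example from a corpus such as Orchid97, and the encoder would more likely be a gated recurrent cell from a deep learning library than this hand-written one.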


Key words: Thai language, sentence segmentation, recurrent neural network