• Official Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A Lao word segmentation method based on a bidirectional long short-term memory neural network model

HE Li,ZHOU Lanjiang,ZHOU Feng,GUO Jianyi

  1. (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2018-07-18  Revised: 2018-11-08  Online: 2019-07-25  Published: 2019-07-25

Abstract:

Lao text is written as a continuous string without spaces between words, so it must be segmented into words, the smallest independent and meaningful units of the language. We propose a Lao word segmentation method based on a bidirectional long short-term memory (BLSTM) neural network model, trained on a Lao corpus containing 913,487 manually tagged words. The model recasts Lao word segmentation as a syllable-based sequence tagging task in which each Lao syllable receives one of four tags: begin-word (B), middle-word (M), end-word (E), or single-word (S). First, Lao sentences are split into syllables, and a vector representation is trained for each syllable. Second, these vectors are fed into the BLSTM neural network model, which predicts a label for each syllable. Third, a sequence inference algorithm determines the final label of each syllable. Experiments on the manually labeled word segmentation corpus show that the proposed method achieves an accuracy of 87.48%, clearly outperforming existing word segmentation methods.
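To make the B/M/E/S tagging scheme concrete, the following is a minimal Python sketch of how a segmented sentence maps to per-syllable tags and how tags map back to words. It assumes the sentence has already been split into syllables (Lao syllabification itself is not shown), uses romanized placeholder strings instead of real Lao syllables, and the function names `words_to_tags` and `tags_to_words` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def words_to_tags(words: List[List[str]]) -> List[Tuple[str, str]]:
    """Label every syllable of a segmented sentence with B, M, E or S.

    Hypothetical helper: `words` is a list of words, each given as its
    list of syllables (syllabification is assumed to be done already).
    """
    pairs: List[Tuple[str, str]] = []
    for syllables in words:
        if len(syllables) == 1:
            pairs.append((syllables[0], "S"))  # single-syllable word
            continue
        for i, syl in enumerate(syllables):
            if i == 0:
                tag = "B"  # first syllable of a multi-syllable word
            elif i == len(syllables) - 1:
                tag = "E"  # last syllable of the word
            else:
                tag = "M"  # any interior syllable
            pairs.append((syl, tag))
    return pairs


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from a syllable sequence and its B/M/E/S labels."""
    words: List[List[str]] = []
    current: List[str] = []
    for syl, tag in zip(syllables, tags):
        if tag in ("B", "S") and current:
            # A new word starts while one is still open (malformed input):
            # close the open word rather than merging the two.
            words.append(current)
            current = []
        current.append(syl)
        if tag in ("E", "S"):
            words.append(current)  # word boundary after this syllable
            current = []
    if current:
        words.append(current)  # flush a trailing word that never saw an E
    return words


if __name__ == "__main__":
    # Placeholder syllables standing in for Lao syllables.
    sentence = [["a"], ["b", "c"], ["d", "e", "f"]]
    pairs = words_to_tags(sentence)
    print(pairs)  # [('a', 'S'), ('b', 'B'), ('c', 'E'), ('d', 'B'), ('e', 'M'), ('f', 'E')]
```

Because every word boundary is recoverable from the tags alone, the segmentation problem reduces to predicting one of four labels per syllable, which is what makes a sequence labeling model such as a BLSTM applicable.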
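The abstract does not say which sequence inference algorithm is used. A common choice for B/M/E/S tagging is Viterbi decoding over per-syllable tag scores with illegal transitions forbidden (for example, B followed by S, or a sentence starting with M). The sketch below assumes the network emits a log-probability for each tag at each syllable; the constraint table, the name `viterbi_bmes`, and the example scores are all illustrative assumptions rather than the paper's method or data.

```python
import math
from typing import Dict, List

TAGS = ["B", "M", "E", "S"]

# Assumed hard constraints: which tag may follow which under B/M/E/S.
ALLOWED = {
    "B": {"M", "E"},  # an open word must continue or end
    "M": {"M", "E"},
    "E": {"B", "S"},  # after a word ends, a new word starts
    "S": {"B", "S"},
}
START = {"B", "S"}  # a sentence must begin a word
END = {"E", "S"}    # and must close its last word


def viterbi_bmes(scores: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring legal tag sequence (hypothetical helper).

    scores[t][tag] is an assumed log-probability for syllable t taking tag.
    """
    n = len(scores)
    if n == 0:
        return []
    # best[t][tag]: best total score of a legal path ending in tag at t.
    best = [{tag: (scores[0][tag] if tag in START else -math.inf) for tag in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for tag in TAGS:
            # Pick the best predecessor among those allowed to precede tag.
            prev_tag, prev_score = max(
                ((p, best[t - 1][p]) for p in TAGS if tag in ALLOWED[p]),
                key=lambda item: item[1],
            )
            best[t][tag] = prev_score + scores[t][tag]
            back[t][tag] = prev_tag
    # The final tag must close a word.
    last = max((tag for tag in TAGS if tag in END), key=lambda tag: best[n - 1][tag])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up scores for three syllables.
    scores = [
        {"B": -0.2, "M": -3.0, "E": -3.0, "S": -1.8},
        {"B": -2.0, "M": -2.0, "E": -0.5, "S": -0.4},
        {"B": -0.3, "M": -3.0, "E": -3.0, "S": -1.0},
    ]
    print(viterbi_bmes(scores))  # ['B', 'E', 'S']
```

With these example scores, picking the best tag for each syllable independently would give B, S, B, which is not a legal sequence; the constrained decoder instead returns B, E, S, a two-syllable word followed by a single-syllable word.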

Key words: neural network, syllable, bidirectional long short-term memory, Lao word segmentation