• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (6): 187-190.

• Articles •

Research on the Automatic Identification of Tibetan Sentence Boundaries with Maximum Entropy Classifier

CAI Zangtai   

  1. (School of Computer, Qinghai Normal University, Xining 810008, China)
  • Received: 2011-09-01 Revised: 2011-11-03 Online: 2012-06-25 Published: 2012-06-25

Abstract:

Sentence boundary identification is a foundational task in Tibetan text analysis. It is essential for building parallel corpora between Tibetan and other languages, and it underpins Tibetan-Chinese machine translation. This article proposes a method for identifying Tibetan sentence boundaries, based on an analysis of the ending forms of Tibetan sentences and a study of their boundary rules. The method first applies special rules and word forms to identify sentence boundaries, and then uses a maximum entropy model to resolve the cases that remain ambiguous. In this way it can improve the boundary identification rate for Tibetan sentences.

Key words: Tibetan sentence; boundary identification; maximum entropy model
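
The two-stage procedure summarized in the abstract (rules first, then a maximum entropy model for the ambiguous cases) can be illustrated with a minimal Python sketch. It is not the article's implementation. The rule tables, the placeholder tokens (`END_TOK`, `CONT_TOK`, `AMB_TOK`, `X`, `Y`), the feature set, the toy training data, and the assumption that every shad (U+0F0D) is a candidate boundary are all hypothetical and chosen only for illustration. The classifier is a binary maximum entropy model, which is equivalent to logistic regression, trained here by plain stochastic gradient ascent.

```python
import math

SHAD = "\u0f0d"  # TIBETAN MARK SHAD; assumed here to mark every candidate boundary

# Hypothetical rule tables with placeholder tokens (not the article's rule set).
ALWAYS_END = {"END_TOK"}   # word forms assumed to always close a sentence
NEVER_END = {"CONT_TOK"}   # word forms assumed to never close a sentence


def rule_stage(prev_tok):
    """Return True/False when a rule decides, or None when the case is ambiguous."""
    if prev_tok in ALWAYS_END:
        return True
    if prev_tok in NEVER_END:
        return False
    return None


def features(prev_tok, next_tok):
    """Binary indicator features for one candidate boundary."""
    return ["bias", "prev=" + prev_tok, "next=" + next_tok]


def prob_boundary(weights, feats):
    """Binary maximum entropy (logistic) probability that the candidate is a boundary."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(samples, epochs=200, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in samples:
            error = label - prob_boundary(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


def is_boundary(prev_tok, next_tok, weights):
    """Stage 1: rules. Stage 2: maximum entropy model for the ambiguous cases."""
    decided = rule_stage(prev_tok)
    if decided is not None:
        return decided
    return prob_boundary(weights, features(prev_tok, next_tok)) >= 0.5


def split_sentences(tokens, weights):
    """Cut a token list after every shad that is judged to be a sentence boundary."""
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok == SHAD and i > 0:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else "<EOT>"
            if is_boundary(tokens[i - 1], nxt, weights):
                sentences.append(current)
                current = []
    if current:
        sentences.append(current)
    return sentences


# Toy labelled candidates (invented for illustration): after the ambiguous
# form "AMB_TOK", a boundary occurs before "X" but not before "Y".
TOY_SAMPLES = [
    (features("AMB_TOK", "X"), 1),
    (features("AMB_TOK", "Y"), 0),
]
weights = train_maxent(TOY_SAMPLES)
```

Calling `split_sentences(tokens, weights)` on a tokenized text returns a list of token lists, one per detected sentence. Keeping the rule stage separate means the model is only consulted, and only needs training data, for the candidates that the rules cannot decide.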