• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2008, Vol. 30 ›› Issue (5): 126-128.

• Articles •

Research on Chinese Word Segmentation Techniques

XU Fei (徐飞), SUN Jinguang (孙劲光)

  • Publication date: 2008-05-01  Release date: 2010-05-19

Abstract:

This paper analyzes existing dictionary-based word segmentation algorithms and compares their strengths and weaknesses. On that basis, it proposes superimposing the result sets produced by the forward matching algorithm and the reverse matching algorithm to generate a coarse segmentation result set. A directed graph with non-negative weights is then constructed from the coarse result set, and a shortest-path algorithm is applied to the graph to obtain the final segmentation. Experiments on Nutch show that, compared with the original Nutch search system, the algorithm improves both the accuracy and the speed of Chinese word segmentation, and that it partially resolves the intersection-type (overlapping) ambiguity problem.

Key words: Chinese word segmentation, shortest path, superposition
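
The pipeline summarized in the abstract (forward matching, reverse matching, superposition of the two result sets, a non-negatively weighted directed graph, and a shortest-path search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the dictionary, the edge weights, or the shortest-path variant, so the toy dictionary `TOY_VOCAB`, the weighting rule in `edge_weight` (single-character words cost 2, all other words cost 1), the use of Dijkstra's algorithm, and all function names are assumptions made for illustration only.

```python
import heapq

# Hypothetical toy dictionary; the paper's actual dictionary is not given.
TOY_VOCAB = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, vocab, max_len):
    """Greedy left-to-right longest match; returns (start, end) spans."""
    spans, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            # Fall back to a single character when no dictionary word matches.
            if text[i:j] in vocab or j == i + 1:
                spans.append((i, j))
                i = j
                break
    return spans


def backward_max_match(text, vocab, max_len):
    """Greedy right-to-left longest match; returns (start, end) spans."""
    spans, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - max_len), j):
            if text[i:j] in vocab or i == j - 1:
                spans.append((i, j))
                j = i
                break
    return spans[::-1]


def edge_weight(start, end):
    """Assumed non-negative weight: penalize single-character words."""
    return 2 if end - start == 1 else 1


def segment(text, vocab):
    """Superimpose both coarse results, then take the shortest path 0 -> len(text)."""
    if not text:
        return []
    max_len = max((len(w) for w in vocab), default=1)
    # Superposition: the union of the spans found by both matchers.
    edges = set(forward_max_match(text, vocab, max_len)) | set(
        backward_max_match(text, vocab, max_len)
    )
    graph = {}
    for start, end in edges:
        graph.setdefault(start, []).append(end)

    # Dijkstra over character positions; all weights are non-negative.
    target = len(text)
    dist, prev, heap = {0: 0}, {}, [(0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        if u == target:
            break
        for v in graph.get(u, []):
            nd = d + edge_weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    # Walk the predecessor links back from the end of the text.
    words, pos = [], target
    while pos > 0:
        words.append(text[prev[pos]:pos])
        pos = prev[pos]
    return words[::-1]
```

With this toy dictionary, forward matching on "研究生命起源" yields 研究生 / 命 / 起源, reverse matching yields 研究 / 生命 / 起源, and the shortest path over the superimposed graph selects the latter under the assumed weights. That is one small instance of the intersection-type ambiguity the abstract mentions; a different weighting scheme could select differently.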