• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2013, Vol. 35 ›› Issue (9): 127-131.

• 论文 • Previous Articles     Next Articles

Research of an improved algorithm for Chinese word
segmentation dictionary based on double-array Trie-tree        

YANG Wenchuan,LIU Jian,YU Miao   

  1. (School of Computer Science,Beijing University of Post and Telecommunication,Beijing 100876,China)
  • Received:2013-04-11 Revised:2013-07-24 Online:2013-09-25 Published:2013-09-25

Abstract:

Chinese word segmentation dictionary based on the doublearray Trietree has higher search efficiency, but the dynamic insertion consumes a lot of time. Therefore, an improved algorithm (iDAT) based on doublearray Trietree for Chinese word segmentation dictionary is proposed. The nodes with more branches are handled while the original dictionary is being initialized. After the initialization, a Hash process is performed on the index values of empty sequence in base array. The final Hash table stores the sum of the empty sequences before the current empty sequence. After that, the iDAT is used to carry out the dynamic insertion process. This algorithm adopts Sunday jumps algorithm of single pattern matching. With the reasonable increasement of space, it reduces the the average time complexity of the dynamic insertion process in Trietree. Practical results show it has good operation performance.

Key words: doublearray;Trietree;time complexity;word segmentation dictionary