• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (5): 133-135. doi: 10.3969/j.issn.1007130X.2010.

Research on a Statistics-Based Word Segmentation Method for New Web Words

ZHANG Min (张敏), WANG Chunhong (王春红)

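  1. (Department of Computer Science and Technology, Yuncheng University, Yuncheng, Shanxi 044000)
  • Received: 2009-09-13  Revised: 2009-11-10  Online: 2010-04-28  Published: 2010-05-11
  • Corresponding author: ZHANG Min  E-mail: ycuzhm@126.com
  • About the authors: ZHANG Min (born 1978), male, from Chaohu, Anhui; master's degree, lecturer; research interests: search engines and information processing. WANG Chunhong, associate professor; research interests: database applications and network information systems.
  • Funding:
    Science and Technology Development Project of Shanxi Higher Education Institutions (20091150); Yuncheng University Project (JC2009009)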

Study on New Words of Web Based on Statistical Word Segmentation

ZHANG Min, WANG Chunhong

  1. (Department of Computer Science and Technology, Yuncheng University, Yuncheng 044000, China)
  • Received: 2009-09-13  Revised: 2009-11-10  Online: 2010-04-28  Published: 2010-05-11

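Abstract (Chinese version): This paper studies the word segmentation methods used in information processing. Because current segmentation methods cannot recognize the new words that continually appear on the Web, a new statistics-based segmentation method is designed. The method avoids the complex grammar rules of existing segmentation methods and requires no dictionary support. It handles the continual emergence of new words well and segments quickly, which gives it significant theoretical and practical value.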

Key words (Chinese version): statistical word segmentation, dictionary, feature extraction

Abstract: This paper analyzes the word segmentation methods used in information processing. Because current segmentation methods cannot recognize the new words that keep emerging on the Web, we design a new statistics-based word segmentation method. The method avoids complex grammar rules, requires no dictionary support, and handles the problems caused by newly emerging words. We conclude that it offers better accuracy and is practical and effective in real-world use.

Key words: web; statistical word segmentation; dictionary; feature selection
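
The abstract does not give the algorithm itself, so the following is only a minimal sketch of one common dictionary-free statistical heuristic, not the authors' method. It assumes that a pair of adjacent characters is a candidate word when it occurs often enough and its pointwise mutual information (PMI) is high. The function name `candidate_words` and the thresholds `min_count` and `min_pmi` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def text_runs(text):
    """Yield maximal runs of letters/digits (CJK characters count as letters)."""
    run = []
    for ch in text:
        if ch.isalnum():
            run.append(ch)
        elif run:
            yield "".join(run)
            run = []
    if run:
        yield "".join(run)


def candidate_words(text, min_count=2, min_pmi=1.0):
    """Return {pair: PMI} for adjacent character pairs that look like words.

    Illustrative heuristic only: the thresholds are assumptions, not values
    taken from the paper. No dictionary is consulted at any point.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for run in text_runs(text):
        char_counts.update(run)
        # Pairs are formed inside a run, never across punctuation.
        pair_counts.update(run[i:i + 2] for i in range(len(run) - 1))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    result = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue
        p_pair = count / total_pairs
        p_first = char_counts[pair[0]] / total_chars
        p_second = char_counts[pair[1]] / total_chars
        # PMI compares the observed pair frequency with chance co-occurrence.
        pmi = math.log2(p_pair / (p_first * p_second))
        if pmi >= min_pmi:
            result[pair] = pmi
    return result


if __name__ == "__main__":
    sample = "给力的表现。真给力!他很给力,大家都说给力。"
    # Only the pair "给力" passes both thresholds in this tiny sample.
    print(candidate_words(sample))
```

A real system would need a much larger corpus and would also have to consider candidates longer than two characters. The sketch only shows why such a method can pick up a new word without a dictionary entry for it.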
