Research on a Statistics-Based Word Segmentation Method for New Web Words

Zhang Min, Wang Chunhong

doi:10.3969/j.issn.1007130X.2010.

Computer Engineering & Science (计算机工程与科学), 2010, Vol. 32, Issue 5: 133-135


Article


(Department of Computer Science and Technology, Yuncheng University, Yuncheng 044000, Shanxi, China)

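Zhang Min (b. 1978), male, from Chaohu, Anhui; master's degree; lecturer; research interests: search engines and information processing. Wang Chunhong, associate professor; research interests: database applications and network information systems.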

Received: 2009-09-13

Revised: 2009-11-10

Published online: 2010-05-11

Funding

Shanxi Provincial Higher Education Science and Technology Development Project (20091150); Yuncheng University Project (JC2009009)


Study on New Words of Web Based on Statistical Word Segmentation


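Abstract (translated from Chinese)

This paper studies the word segmentation methods used in information processing. Because current methods cannot recognize the new words that continually appear on the web, we design a new statistics-based segmentation method. The method avoids the complex grammar rules of existing approaches and requires no dictionary support. It handles the problem of continually emerging new words well, segments quickly, and has significant theoretical and practical value.

Keywords: statistical word segmentation; dictionary; feature extraction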

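Cite this article

Zhang Min, Wang Chunhong. Research on a Statistics-Based Word Segmentation Method for New Web Words [J]. Computer Engineering & Science, 2010, 32(5): 133-135. DOI: 10.3969/j.issn.1007130X.2010.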

Abstract

This paper analyzes the word segmentation methods used in information processing. Because current segmentation methods do not recognize the new words that keep emerging on the web, we design a new word segmentation method based on statistics. The method avoids complex grammar rules, does not rely on large dictionaries, and resolves the problems caused by new words. We conclude that the method offers better accuracy and is practical and effective in real use.

Key words: web; statistical word segmentation; dictionary; feature selection
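
The abstract describes a dictionary-free, statistics-based segmenter but does not give its details. As a rough illustration of the general idea only, the sketch below cuts a character string wherever two adjacent characters are weakly associated in a raw corpus, measured by pointwise mutual information (PMI). The choice of PMI, the function names `count_ngrams`, `pmi`, and `segment`, the `threshold` parameter, and the toy corpus are all assumptions made for this example, not the authors' published method.

```python
import math
from collections import Counter


def count_ngrams(corpus):
    """Count single characters and adjacent character pairs in raw text lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams, n_uni, n_bi):
    """Pointwise mutual information of a character pair; -inf if never seen."""
    if bigrams[pair] == 0:
        return float("-inf")
    p_xy = bigrams[pair] / n_bi
    p_x = unigrams[pair[0]] / n_uni
    p_y = unigrams[pair[1]] / n_uni
    return math.log(p_xy / (p_x * p_y))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Split text wherever adjacent characters have PMI below the threshold.

    No dictionary is consulted: the only inputs are corpus counts.
    The threshold value is an illustrative assumption.
    """
    if not text:
        return []
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    words, current = [], text[0]
    for prev, ch in zip(text, text[1:]):
        if pmi(prev + ch, unigrams, bigrams, n_uni, n_bi) >= threshold:
            current += ch          # strongly associated: same word
        else:
            words.append(current)  # weakly associated: word boundary
            current = ch
    words.append(current)
    return words


# Hypothetical toy corpus of two web slang terms, chosen for illustration.
corpus = ["给力", "给力", "山寨", "山寨"]
unigrams, bigrams = count_ngrams(corpus)
print(segment("给力山寨", unigrams, bigrams))  # ['给力', '山寨']
```

Because the boundaries come only from co-occurrence counts, a character sequence that becomes frequent in newly crawled text is kept together without any dictionary update, which is the property the abstract emphasizes. A practical system would need a much larger corpus and a tuned threshold.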
