• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (05): 971-976.


A vector space model combining keywords and cooccurrence word pairs

唐守忠,齐建东   

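  1. (School of Information, Beijing Forestry University, Beijing 100083)
  • Received: 2013-02-25 Revised: 2013-04-24 Published: 2014-05-25 Released: 2014-05-25
  • Supported by:

    Science and Technology Support Project of the 12th Five-Year Plan (2011BAH10B04)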

Vector space model based on keywords and cooccurrence word pairs

TANG Shouzhong,QI Jiandong   

  1. (School of Information,Beijing Forestry University,Beijing 100083,China)
  • Received:2013-02-25 Revised:2013-04-24 Online:2014-05-25 Published:2014-05-25

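Abstract:

This paper proposes a vector space model that combines keyword features with cooccurrence word pair features. First, candidate keywords are extracted from the text by word segmentation and stop-word removal, and keyword features are selected by document frequency. Then, candidate cooccurrence word pairs are formed by pairing the selected keyword features, and support and confidence are defined to select the cooccurrence word pair features. Finally, the keyword features and cooccurrence word pair features are combined to build the vector space model. Text classification experiments show that the proposed model has stronger text classification ability.

Keywords: vector space model, cooccurrence word pair, semantic relatedness, text classification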

Abstract:

A new vector space model is proposed that uses both keywords and cooccurrence word pairs as the representation features of documents. First, keyword candidates are extracted from documents by segmenting the text and removing stop words, and the keyword features are filtered by document frequency. Second, candidate cooccurrence word pairs are constructed pairwise from the obtained keyword features, and a support degree and a confidence degree are defined to filter the cooccurrence word pair features. Finally, the keyword features and the cooccurrence word pair features are combined to construct the vector space model. Text classification experiments show that the proposed model has a stronger text classification ability.
   

Key words: vector space model; cooccurrence word pair; semantic relatedness; text classification
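
The three steps summarized in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas for support and confidence, so support is assumed here to be the fraction of documents containing both words, and confidence is assumed to be the pair's document frequency divided by the smaller of the two words' document frequencies. The function names, the stop-word list, the thresholds, the pair weighting, and the toy corpus are all hypothetical, and the documents are assumed to be already segmented into tokens.

```python
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list


def keyword_features(docs, min_df):
    """Return {term: document frequency} for non-stop-word terms with df >= min_df."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens) - STOP_WORDS)
    return {term: n for term, n in df.items() if n >= min_df}


def pair_features(docs, keyword_df, min_support, min_confidence):
    """Keep keyword pairs that pass the ASSUMED support and confidence thresholds."""
    n_docs = len(docs)
    pair_df = Counter()
    for tokens in docs:
        present = sorted(set(tokens).intersection(keyword_df))
        pair_df.update(combinations(present, 2))
    kept = []
    for (a, b), n_ab in pair_df.items():
        support = n_ab / n_docs  # assumed definition
        confidence = n_ab / min(keyword_df[a], keyword_df[b])  # assumed definition
        if support >= min_support and confidence >= min_confidence:
            kept.append((a, b))
    return sorted(kept)


def vectorize(tokens, keywords, pairs):
    """Concatenate keyword counts with pair weights (assumed: min of the two counts)."""
    counts = Counter(tokens)
    keyword_part = [counts[k] for k in keywords]
    pair_part = [min(counts[a], counts[b]) for a, b in pairs]
    return keyword_part + pair_part


# Toy, pre-segmented corpus (hypothetical data).
docs = [
    ["forest", "fire", "risk", "the"],
    ["forest", "fire", "model"],
    ["forest", "pest", "model"],
    ["text", "model", "of", "forest"],
]

keyword_df = keyword_features(docs, min_df=2)
keywords = sorted(keyword_df)
pairs = pair_features(docs, keyword_df, min_support=0.5, min_confidence=0.6)
vectors = [vectorize(tokens, keywords, pairs) for tokens in docs]
```

Each resulting vector has one dimension per retained keyword followed by one dimension per retained word pair, so a classifier sees both single-word evidence and pairwise cooccurrence evidence. Raw counts are used for brevity; a TF-IDF style weighting could be substituted without changing the structure.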