• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (05): 971-976.

• Articles •

Vector space model based on keywords and co-occurrence word pairs

TANG Shouzhong,QI Jiandong   

  1. (School of Information, Beijing Forestry University, Beijing 100083, China)
  • Received: 2013-02-25 Revised: 2013-04-24 Online: 2014-05-25 Published: 2014-05-25

Abstract:

This paper proposes a new vector space model that represents documents using both keywords and co-occurrence word pairs as features. First, keyword candidates are extracted from the documents by segmenting the text and removing stop words, and the candidates are filtered by document frequency to obtain the keyword features. Second, co-occurrence word pairs are constructed from the retained keyword features, and support and confidence measures are defined to filter the word-pair features. Finally, the keyword features and the co-occurrence word-pair features are combined to construct the vector space model. Text classification experiments show that the proposed model achieves better text classification performance.

Key words: vector space model; co-occurrence word; semantic relationship; text classification
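
The three steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions of support and confidence, so the ones used here are assumptions borrowed from association-rule mining: support is the fraction of documents containing both words, and confidence is the pair's document count divided by the larger of the two words' document frequencies. The function names, thresholds, pair weighting, and toy corpus are likewise hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_features(docs, min_df=2, min_support=0.5, min_confidence=0.6):
    """Select keyword and co-occurrence pair features.

    docs: list of token lists, assumed already segmented with stop words removed.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    # Step 1: keep candidates whose document frequency reaches min_df.
    df = Counter(t for s in doc_sets for t in s)
    keywords = sorted(t for t, c in df.items() if c >= min_df)
    kw_set = set(keywords)

    # Step 2: count the documents in which two retained keywords co-occur.
    pair_df = Counter()
    for s in doc_sets:
        for a, b in combinations(sorted(s & kw_set), 2):
            pair_df[(a, b)] += 1

    pairs = []
    for (a, b), c in sorted(pair_df.items()):
        support = c / n_docs                 # assumed definition
        confidence = c / max(df[a], df[b])   # assumed definition
        if support >= min_support and confidence >= min_confidence:
            pairs.append((a, b))
    return keywords, pairs


def vectorize(doc, keywords, pairs):
    """Step 3: concatenate keyword counts with pair weights.

    The pair weight min(count_a, count_b) is an illustrative choice only.
    """
    counts = Counter(doc)
    kw_part = [counts[k] for k in keywords]
    pair_part = [min(counts[a], counts[b]) for a, b in pairs]
    return kw_part + pair_part


# Toy corpus (hypothetical).
docs = [
    ["forest", "fire", "risk"],
    ["forest", "fire", "model"],
    ["forest", "model", "text"],
    ["text", "model", "vector"],
]
keywords, pairs = build_features(docs)
vec = vectorize(["forest", "fire", "fire", "risk"], keywords, pairs)
print(keywords, pairs, vec)
```

In this sketch the pair features extend the keyword dimensions rather than replace them, so a classifier trained on the resulting vectors sees both individual terms and the term associations that passed the two filters.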