• Official Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

J4, 2012, Vol. 34, Issue (6): 140-145.


A Text Feature Selection Algorithm Based on Analysing the Relationship Between Words

WU Shuang,ZHANG Wensheng,XU Hairui   

(Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Received: 2011-04-29; Revised: 2011-07-15; Online: 2012-06-25; Published: 2012-06-25

Abstract:

Traditional feature selection algorithms usually rely on evaluation functions to select features that discriminate between document categories. However, these methods build the vector space model with individual words as the unit, so they take into account neither the important words in a document nor the relationships between words. To address these shortcomings, this paper presents a new feature selection algorithm based on the relationships between words. The algorithm takes key words into account, mines association rules between words, and validates these rules with correlation analysis, producing a feature space that is closely related to the category attributes. Experiments indicate that the method better expresses the semantic content of documents and achieves good categorization results.

Key words: relationship between words; feature selection; association rule; text categorization
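The abstract describes mining association rules between words and then checking them with a correlation analysis, but it does not give the exact measures. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: each document is treated as a transaction of already-extracted key words, rules are restricted to word pairs counted by brute force (no Apriori pruning), support and confidence use arbitrary illustrative thresholds, and lift stands in for the unspecified correlation check. The function names `mine_word_rules` and `rule_features` and all parameter values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_word_rules(docs, min_support=0.3, min_confidence=0.6, min_lift=1.0):
    """Mine word-pair rules x -> y and keep only positively correlated ones.

    docs: list of documents, each a list of already-extracted key words
    (hypothetical input format). Returns (x, y, support, confidence, lift) tuples.
    """
    n = len(docs)
    # Each document is one transaction: the set of distinct words it contains.
    transactions = [set(doc) for doc in docs]
    word_count = Counter(w for t in transactions for w in t)
    # Count co-occurring word pairs by brute force (no Apriori pruning).
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), ab in pair_count.items():
        support = ab / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = ab / word_count[x]
            if confidence < min_confidence:
                continue
            # Correlation check via lift = P(x, y) / (P(x) * P(y)).
            # lift > 1 means x and y co-occur more often than chance.
            lift = confidence / (word_count[y] / n)
            if lift > min_lift:
                rules.append((x, y, support, confidence, lift))
    return rules


def rule_features(rules):
    """Turn surviving rules into combined word-pair features (hypothetical step)."""
    return sorted({tuple(sorted((x, y))) for x, y, *_ in rules})


if __name__ == "__main__":
    docs = [
        ["stock", "market", "price", "report"],
        ["stock", "market", "trade", "report"],
        ["stock", "price", "trade", "report"],
        ["football", "match", "goal", "report"],
        ["football", "match", "team", "report"],
    ]
    rules = mine_word_rules(docs)
    print(rule_features(rules))
    # → [('football', 'match'), ('market', 'stock'), ('price', 'stock'), ('stock', 'trade')]
```

In this toy collection, "report" occurs in every document, so a rule such as stock -> report reaches confidence 1.0 yet has a lift of exactly 1.0 and is discarded. This is the kind of spurious association that a correlation check is meant to filter out. The sketch mines rules over a single collection; running it separately for each category is one plausible way to tie the resulting features to category attributes, but that is an assumption rather than something the abstract states.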