• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A short text feature extraction method combining term co-occurrence distance and category information
 

MA Huifang1,2,XING Yuying1,WANG Shuang1,ZHANG Xupeng1   

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China;
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China

     
  • Received: 2017-01-03  Revised: 2017-05-26  Online: 2018-09-25  Published: 2018-09-25

Abstract:

Traditional feature weighting methods do not fully consider the semantic relationships between terms or the distribution of terms across categories. To address this problem, we propose a short text feature extraction method that combines term co-occurrence distance with category information. On the one hand, the number of terms lying between two terms in the same short text is taken as their co-occurrence distance, from which a correlation weight between the two terms is calculated. On the other hand, an improved expected cross entropy is used to calculate the weight of a term within a given category. The two weights are then integrated to obtain the weight of every term in each category. Finally, the terms of all categories are sorted in descending order of weight, and the top K terms are selected as the new feature term set. Experiments show that the proposed method improves the effectiveness of short text feature extraction.
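The abstract outlines a three-stage pipeline but does not give the exact formulas, so the Python sketch below only illustrates the stages under explicit assumptions. The pair weight 1/(d + 1) for a co-occurrence distance d is hypothetical. The category weight uses the standard per-category expected cross entropy, P(t) · P(c|t) · log(P(c|t) / P(c)), not the paper's improved variant. The linear mix controlled by `lam` is a hypothetical stand-in for the integration step. The function names `cooccurrence_weights`, `category_ece` and `select_features` and the toy corpus are illustrative only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_weights(tokens):
    """Correlation weights between distinct term pairs in one short text.

    The co-occurrence distance d is the number of terms lying between two
    positions (adjacent terms have d = 0). The weight 1 / (d + 1) is a
    hypothetical choice that favours closer pairs; it is not the paper's formula.
    """
    weights = defaultdict(float)
    for i, j in combinations(range(len(tokens)), 2):  # all position pairs with i < j
        a, b = tokens[i], tokens[j]
        if a == b:
            continue  # ignore repeated occurrences of the same term
        d = j - i - 1  # number of terms between positions i and j
        weights[tuple(sorted((a, b)))] += 1.0 / (d + 1)
    return weights


def category_ece(docs):
    """Standard per-category expected cross entropy, from document frequencies:
    ECE(t, c) = P(t) * P(c|t) * log(P(c|t) / P(c)).
    Pairs with P(c|t) = 0 contribute nothing and are omitted.
    """
    n = len(docs)
    class_count = Counter(label for _, label in docs)
    df = Counter()    # documents containing term t
    df_c = Counter()  # documents of class c containing term t
    for tokens, label in docs:
        for t in set(tokens):
            df[t] += 1
            df_c[(t, label)] += 1
    scores = {}
    for (t, c), n_tc in df_c.items():
        p_t = df[t] / n
        p_c_given_t = n_tc / df[t]
        p_c = class_count[c] / n
        scores[(t, c)] = p_t * p_c_given_t * math.log(p_c_given_t / p_c)
    return scores


def select_features(docs, k, lam=0.5):
    """Rank (term, category) pairs by a combined score and keep the top k distinct terms.

    The combination lam * ECE + (1 - lam) * normalised co-occurrence relevance
    is a hypothetical stand-in for the paper's integration step.
    """
    ece = category_ece(docs)
    # Relevance of t in category c: summed co-occurrence weight of t with other terms in c.
    relevance = defaultdict(float)
    for tokens, label in docs:
        for (a, b), w in cooccurrence_weights(tokens).items():
            relevance[(a, label)] += w
            relevance[(b, label)] += w
    max_rel = max(relevance.values(), default=0.0) or 1.0  # avoid division by zero
    combined = {
        key: lam * score + (1 - lam) * relevance.get(key, 0.0) / max_rel
        for key, score in ece.items()
    }
    # Sort all categories' terms together in descending order of weight.
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for (term, _category), _score in ranked:
        if len(selected) >= k:
            break
        if term not in selected:  # a term may rank highly in several categories
            selected.append(term)
    return selected


if __name__ == "__main__":
    corpus = [
        ("cheap flight deal to paris".split(), "travel"),
        ("hotel booking paris deal".split(), "travel"),
        ("stock market falls sharply".split(), "finance"),
        ("market rally lifts stock prices".split(), "finance"),
    ]
    print(select_features(corpus, k=4))  # the four highest-scoring distinct terms
```

Normalising the co-occurrence relevance by its maximum keeps it in [0, 1], so `lam` trades off the two signals without one dominating purely because of its scale; a multiplicative combination would be an equally plausible reading of the abstract.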

Key words: short text, co-occurrence distance, expected cross entropy, feature extraction