• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A short text classification method combining lexical category features and semantics

MA Hui-fang, ZHOU Ru-nan, JI Yu-gang, LU Xiao-yong

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2015-11-13   Revised: 2015-12-30   Online: 2017-02-25   Published: 2017-02-25

Abstract:

Short texts are difficult to classify because their feature representations are typically very sparse and high-dimensional. We propose a method that classifies short texts by combining lexical and semantic features. First, to construct the term dictionary, we select for each category the lexical features of the words most distinctive of that category. Then, using Latent Dirichlet Allocation (LDA), we extract the optimal topic distribution from a background knowledge repository and use it to construct new features for the short texts. Experiments on classifying Sohu news titles, a typical kind of short text, with SVM and k-NN classifiers show that the method substantially improves classification performance.

Keywords: short text classification, Latent Dirichlet Allocation, lexical features, semantic features, feature selection
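The abstract describes the pipeline only at a high level, so the Python sketch below is an illustration under explicit assumptions and is not the authors' implementation. Assumptions: the chi-square statistic stands in for the paper's unspecified criterion for picking category-distinctive words; `phi` is a topic-word table assumed to come from an LDA model already trained on a background corpus; a document's topic distribution is approximated by a simple fold-in heuristic instead of full LDA inference; and only k-NN is shown (SVM is omitted). All function names, variables, and data are hypothetical.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Chi-square score of each word positively associated with `category`.

    docs: list of token lists; labels: parallel list of category labels.
    Uses document frequency, so a word counts at most once per text.
    """
    n = len(docs)
    in_cat = sum(1 for y in labels if y == category)
    df_all, df_cat = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            df_all[w] += 1
            if y == category:
                df_cat[w] += 1
    scores = {}
    for w, df in df_all.items():
        a = df_cat[w]          # texts in the category that contain w
        b = df - a             # texts outside the category that contain w
        c = in_cat - a         # texts in the category without w
        d = n - in_cat - b     # texts outside the category without w
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom and a * d > b * c:  # keep only words indicative of the category
            scores[w] = n * (a * d - b * c) ** 2 / denom
    return scores


def build_dictionary(docs, labels, top_k):
    """Term dictionary: union of the top_k highest-scoring words per category."""
    vocab = set()
    for cat in sorted(set(labels)):
        scores = chi_square_scores(docs, labels, cat)
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        vocab.update(ranked[:top_k])
    return sorted(vocab)


def topic_distribution(tokens, phi):
    """Approximate topic proportions of a short text from phi[k][w] = p(w | topic k).

    Heuristic fold-in (not full LDA inference): each known token contributes
    p(z | w), assuming a uniform prior over topics; unknown tokens are skipped.
    """
    n_topics = len(phi)
    theta = [0.0] * n_topics
    for w in tokens:
        probs = [phi[k].get(w, 0.0) for k in range(n_topics)]
        total = sum(probs)
        if total > 0:
            for k in range(n_topics):
                theta[k] += probs[k] / total
    s = sum(theta)
    return [t / s for t in theta] if s > 0 else [1.0 / n_topics] * n_topics


def features(tokens, vocab, phi):
    """Binary dictionary features followed by the topic distribution."""
    present = set(tokens)
    return [1.0 if w in present else 0.0 for w in vocab] + topic_distribution(tokens, phi)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(x, train_x, train_y, k=3):
    """Majority vote among the k training texts most cosine-similar to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: -cosine(x, train_x[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy, made-up data: four tokenized "titles" and a hypothetical 2-topic phi.
DOCS = [["stock", "market", "rises"], ["market", "shares", "fall"],
        ["team", "wins", "match"], ["match", "goal", "team"]]
LABELS = ["finance", "finance", "sports", "sports"]
PHI = [{"stock": 0.3, "market": 0.3, "shares": 0.2, "bank": 0.2},
       {"team": 0.3, "match": 0.3, "goal": 0.2, "coach": 0.2}]

VOCAB = build_dictionary(DOCS, LABELS, top_k=2)
TRAIN_X = [features(doc, VOCAB, PHI) for doc in DOCS]
QUERY = ["stock", "shares", "up"]  # no word of QUERY appears in VOCAB
print(VOCAB)  # ['fall', 'market', 'match', 'team']
print(knn_predict(features(QUERY, VOCAB, PHI), TRAIN_X, LABELS, k=3))  # finance
```

In the toy run, the query title shares no word with the term dictionary, so its lexical part is all zeros (the sparseness problem the abstract describes); the topic part still places it near the finance titles, which is the intuition behind adding semantic features.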