• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal





  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070)
  • Received: 2015-11-13 Revised: 2015-12-30 Online: 2017-02-25 Published: 2017-02-25


A short text classification method combining lexical category features and semantics

MA Hui-fang, ZHOU Ru-nan, JI Yu-gang, LU Xiao-yong

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2015-11-13 Revised: 2015-12-30 Online: 2017-02-25 Published: 2017-02-25





Classifying short texts is challenging because they are severely sparse and high-dimensional. We propose a novel approach that classifies short texts by combining lexical and semantic features. To construct the term dictionary, we first select lexical features from the words most distinctive of a given category. We then use Latent Dirichlet Allocation (LDA) to extract the optimal topic distribution from a background knowledge repository and use it to construct new features for the short texts. Experiments on Sohu news titles, which are typical short texts, with SVM and k-NN classifiers show that our method greatly improves the classification results.

Key words: short text classification, Latent Dirichlet Allocation, lexical features, semantic features, feature selection
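
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions: the abstract does not name the criterion for choosing category-distinctive words, so a chi-square score restricted to positive association is used here; the topic distributions are hand-written placeholders standing in for what an LDA model trained on a background corpus would infer; and a 1-nearest-neighbour rule with cosine similarity stands in for the k-NN classifier. All function names and the toy data are hypothetical.

```python
import math

def chi_square(docs, labels, term, category):
    """Chi-square score of `term` for `category`; 0 unless positively associated."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cat = term in tokens, label == category
        if has_term and in_cat:
            a += 1      # term present, document in category
        elif has_term:
            b += 1      # term present, document in another category
        elif in_cat:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, document in another category
    if a * d <= b * c:  # keep only terms that favour this category
        return 0.0
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def select_dictionary(docs, labels, k):
    """Union of the k highest-scoring terms of each category (assumed criterion)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    selected = set()
    for category in sorted(set(labels)):
        scored = [(chi_square(docs, labels, t, category), t) for t in vocab]
        scored = [(s, t) for s, t in scored if s > 0]
        scored.sort(key=lambda st: (-st[0], st[1]))  # ties broken alphabetically
        selected.update(t for _, t in scored[:k])
    return sorted(selected)

def combine_features(tokens, dictionary, topic_dist):
    """Lexical part (dictionary term presence) followed by the topic distribution."""
    lexical = [1.0 if t in tokens else 0.0 for t in dictionary]
    return lexical + list(topic_dist)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def nearest_label(query, train_vectors, train_labels):
    """1-nearest-neighbour classification by cosine similarity."""
    best = max(range(len(train_vectors)),
               key=lambda i: cosine(query, train_vectors[i]))
    return train_labels[best]

# Toy titles (already tokenised) and their categories.
docs = [["stock", "market", "rises"], ["team", "wins", "match"],
        ["market", "falls"], ["team", "loses", "match"]]
labels = ["finance", "sports", "finance", "sports"]

# Placeholder topic distributions; a real system would infer them with LDA.
train_topics = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]

dictionary = select_dictionary(docs, labels, k=1)
train_vectors = [combine_features(tokens, dictionary, topics)
                 for tokens, topics in zip(docs, train_topics)]

# A new title that shares no dictionary word with the training data.
query = combine_features(["shares", "drop"], dictionary, [0.8, 0.2])
label = nearest_label(query, train_vectors, labels)
print(dictionary, label)
```

In this toy run the query title contains none of the selected dictionary terms, so its lexical part is all zeros and the classification rests entirely on the topic part of the vector. This is the situation the abstract attributes to sparse short texts, and it is why the sketch concatenates both feature types rather than using the lexical part alone.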