• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

Optimization and visualization application of CTM model in text classification

MA Chang-lin, YANG Zheng-liang, XIE Luo-di

  1. (School of Computer, Central China Normal University, Wuhan 430079, China)
  • Received: 2016-09-20 Revised: 2016-11-03 Online: 2017-03-25 Published: 2017-03-25

Abstract:

Automatically extracting relevant information from massive text collections has become a major challenge. Text classification is an effective way to address this problem and has attracted much attention; within it, text representation is a critical factor affecting classification results. The correlated topic model (CTM) can be used for text representation: it accurately reflects the correlations between topics while preserving the integrity of the information. Based on this model, we optimize feature selection and the number of topics. The number of topics is determined using perplexity and the log-likelihood function. For feature selection, we adopt a principal component analysis algorithm based on mutual information, which reduces both the data dimensionality and the redundancy of text features. The experimental results are visualized using the R language.
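
The perplexity criterion mentioned above can be illustrated with a minimal sketch. It uses the standard definition of perplexity, the exponential of the negative total log-likelihood divided by the total token count, and assumes that per-document held-out log-likelihoods are already available from a fitted topic model. Python is used here purely for illustration (the paper reports using R). The function names `perplexity` and `pick_num_topics` and all numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Return exp(-(sum of document log-likelihoods) / (total token count))."""
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_topics(candidates):
    """Pick the topic count with the lowest held-out perplexity.

    candidates maps a topic count K to a pair
    (doc_log_likelihoods, doc_lengths) measured on held-out documents.
    Returns (best_K, {K: perplexity}).
    """
    scores = {k: perplexity(ll, n) for k, (ll, n) in candidates.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Made-up held-out log-likelihoods for three candidate topic counts
# (illustrative values only, not results from the paper).
candidates = {
    5: ([-40.0], [10]),
    10: ([-30.0], [10]),
    20: ([-35.0], [10]),
}
best_k, scores = pick_num_topics(candidates)
```

Lower perplexity means the model assigns higher probability to unseen text, so the candidate with the smallest value is preferred. In practice the log-likelihood curve would be inspected alongside it, as the abstract indicates.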
 

Key words: text classification, CTM model, feature selection