• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Core term based mean partition similarity for short text clustering

MA Hui-fang, ZHU Zhi-qiang, CHENG Yu-dan, JIA Jun-jie
 
  

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2016-03-24 Revised: 2016-05-13 Online: 2017-08-25 Published: 2017-08-25

Abstract:

Short texts have extremely sparse, context-dependent features. To address this, we propose a novel top-down short text clustering algorithm based on core term mean partition similarity (CTMPS). CTMPS first determines the probabilistic correlation among the terms in the corpus. Second, it weights the terms of each short text according to this correlation. The terms with the largest weights are taken as the most representative terms of the short text and form its core term set. Drawing on information theory, the mean partition similarity (MPS) is then calculated over the core terms, and the short texts associated with the core term that has the maximum MPS form one class. Experimental results show that CTMPS outperforms the baseline algorithm in terms of both clustering performance and efficiency.
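
The first two steps of the abstract (term correlation, then term weighting and core term selection) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: correlation is approximated by the document-level conditional probability P(b | a), a term's weight is the sum of the probabilities with which the other terms in the same text predict it, and the names `cooccurrence_correlation`, `core_terms` and `top_k` are hypothetical. The MPS computation is not sketched because its definition does not appear in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_correlation(docs):
    """Estimate P(b | a) for every co-occurring term pair (a, b).

    Assumption: the probability is the fraction of texts containing a
    that also contain b. The paper's own definition may differ.
    """
    term_df = Counter()  # number of texts containing each term
    pair_df = Counter()  # number of texts containing both terms
    for doc in docs:
        terms = set(doc)
        term_df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_df[(a, b)] += 1
            pair_df[(b, a)] += 1
    return {(a, b): n / term_df[a] for (a, b), n in pair_df.items()}


def core_terms(doc, corr, top_k=2):
    """Return the top_k highest-weighted terms of one short text.

    Assumption: the weight of term t is the sum of P(t | u) over the
    other terms u in the same text. Ties are broken alphabetically.
    """
    terms = sorted(set(doc))
    weights = {
        t: sum(corr.get((u, t), 0.0) for u in terms if u != t)
        for t in terms
    }
    ranked = sorted(terms, key=lambda t: (-weights[t], t))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy corpus: each short text is already tokenized.
    docs = [
        ["apple", "phone"],
        ["apple", "phone", "screen"],
        ["apple", "fruit"],
    ]
    corr = cooccurrence_correlation(docs)
    for doc in docs:
        print(doc, "->", core_terms(doc, corr, top_k=1))
```

Under this weighting, a term that the other terms in the same text reliably predict ranks highest. A different correlation or weighting scheme would produce different core term sets.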
 

Key words: short text clustering, core term, mean partition similarity, probabilistic correlation, entropy