• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Short text similarity measure based on
co-occurrence distance and discrimination

LIU Wen¹, MA Huifang¹,², TUO Ting¹, CHEN Haibo¹

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2016-12-20 Revised: 2017-02-28 Online: 2018-07-25 Published: 2018-07-25

Abstract:

To address the severe sparseness and high dimensionality typical of short texts, we propose a short text similarity measure based on co-occurrence distance and discrimination. On the one hand, the method uses the co-occurrence distance between terms in each document to determine the co-occurrence distance correlation. On the other hand, it calculates the co-occurrence discrimination to improve the accuracy of the co-occurrence distance correlation, and then calculates the relevance weight of each term in the text. The similarity between two short texts is then calculated from the term weights and the co-occurrence distance between terms. Experimental results show that the proposed method outperforms the baseline algorithm in terms of both performance and efficiency in similarity calculation.
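
As a rough illustration of the first step only, the sketch below measures how close two terms appear within each document and accumulates a score per term pair. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `min_pair_distances` and `distance_correlation` are hypothetical, and scoring a pair by the sum of inverse positional gaps is a stand-in rule, not the measure defined in the paper.

```python
from collections import defaultdict
from itertools import combinations


def min_pair_distances(tokens):
    """Return the smallest positional gap for every pair of distinct terms in one document."""
    positions = defaultdict(list)
    for index, term in enumerate(tokens):
        positions[term].append(index)
    distances = {}
    # Pairs are keyed in alphabetical order so (a, b) and (b, a) never both appear.
    for a, b in combinations(sorted(positions), 2):
        distances[(a, b)] = min(abs(i - j) for i in positions[a] for j in positions[b])
    return distances


def distance_correlation(docs):
    """Sum inverse gaps per term pair over all documents (an assumed scoring rule)."""
    scores = defaultdict(float)
    for tokens in docs:
        for pair, gap in min_pair_distances(tokens).items():
            # Distinct terms occupy distinct positions, so gap is always >= 1.
            scores[pair] += 1.0 / gap
    return dict(scores)


docs = [
    ["short", "text", "similarity"],
    ["text", "similarity", "measure"],
]
correlation = distance_correlation(docs)
```

Under this assumed rule, terms that appear close together in many documents receive higher scores. The discrimination step and the term weighting described in the abstract are not modeled here.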


 

Key words: short text, co-occurrence distance correlation, co-occurrence discrimination, term weighting, similarity calculation