• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A microblog retrieval model combining BTM and graph theory

CAI Chen1,2, LUO Ke1,2

  (1. School of Computer & Communication Engineering, Changsha University of Science and Technology, Changsha 410114;
    2. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation,
    Changsha University of Science and Technology, Changsha 410114, China)
  • Received: 2018-09-27  Revised: 2018-11-30  Online: 2019-08-25  Published: 2019-08-25

Abstract:

Microblog data are voluminous, but each post contains only a few characters, so its features are sparse. To improve retrieval precision, we propose a microblog retrieval model that combines the biterm topic model (BTM) with graph theory. First, lexical semantic correlation is used to compute the correlation between labeled features in microblog text. Second, we construct a biterm topic model and use the Jensen-Shannon divergence (JSD) to compute the correlation between word pairs in the short texts mapped onto the model. Third, we extract entities and their graph structure from CN-DBpedia and use the SimRank algorithm to compute the correlation between entities over that graph. These three correlations together form the model's final correlation. Finally, retrieval experiments are conducted on a Sina Weibo data set. Experimental results show that, compared with a retrieval model combining latent Dirichlet allocation (LDA) with graph theory and a model based on open-data correlation and graph theory, the new model achieves significantly higher MAP, accuracy, and recall, indicating better retrieval performance.
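The abstract names its building blocks but gives no formulas. The Python sketch below illustrates two of them, the Jensen-Shannon divergence between topic distributions and the SimRank recurrence, plus a simple fusion of the three scores. It is an illustration under stated assumptions, not the authors' implementation. Each word is assumed to be represented by a topic distribution derived from the BTM (for example p(z|w)). JSD is computed with base-2 logarithms so it lies in [0, 1] and is turned into a similarity as 1 − JSD. The entity graph is treated as undirected. The decay factor c = 0.8, the iteration count, and the equal weights in the hypothetical `combined_score` helper are illustrative choices. The lexical semantic correlation is passed in as a plain number because the abstract does not say how it is computed.

```python
import math
from itertools import product


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Assumed mapping from divergence to similarity: 1 - JSD."""
    return 1.0 - jsd(p, q)


def simrank(graph, c=0.8, iterations=10):
    """Naive iterative SimRank on an undirected graph {node: set_of_neighbors}.

    s(a, a) = 1; for a != b,
    s(a, b) = c / (|N(a)| * |N(b)|) * sum of s(x, y) over x in N(a), y in N(b).
    """
    nodes = list(graph)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif graph[a] and graph[b]:
                total = sum(sim[(x, y)] for x in graph[a] for y in graph[b])
                new[(a, b)] = c * total / (len(graph[a]) * len(graph[b]))
            else:
                # A node without neighbors is similar only to itself.
                new[(a, b)] = 0.0
        sim = new
    return sim


def combined_score(lexical, topic, entity, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical linear fusion of the three correlations (weights are assumptions)."""
    w1, w2, w3 = weights
    return w1 * lexical + w2 * topic + w3 * entity


if __name__ == "__main__":
    # Hypothetical BTM topic distributions p(z|w) for two words.
    p_w1 = [0.7, 0.2, 0.1]
    p_w2 = [0.6, 0.3, 0.1]
    topic = topic_similarity(p_w1, p_w2)

    # Hypothetical toy entity graph standing in for a CN-DBpedia subgraph.
    graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
    entity = simrank(graph)[("B", "C")]

    lexical = 0.5  # placeholder for the lexical semantic correlation
    print(round(combined_score(lexical, topic, entity), 3))
```

Equal weights are only a neutral default; the abstract states that the three correlations form the final score but does not give the fusion rule, so the weights (or a non-linear combination) would need to follow the full paper.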
 

Key words: microblog, short text, similarity calculation, BTM, graph theory, topic model