• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (06): 1081-1087.

• Artificial Intelligence and Data Mining •

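A short text multi-label classification method combining similarity graph and random walk model

LI Xiao-hong, WANG Shan-shan, MA Yu-yin, MA Hui-fang

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2019-10-12 Revised: 2020-06-02 Accepted: 2021-06-25 Online: 2021-06-25 Published: 2021-06-22
  • Supported by:
    National Natural Science Foundation of China (61762078, 61967013); Higher Education Innovation and Entrepreneurship Fund (2020B-089); Gansu Province Science and Technology Program (20JR5RA518); Natural Science Foundation of Gansu Province (20JR10RA076)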


Abstract: A multi-label short text classification algorithm that combines a similarity graph with a random walk model is proposed. First, a similarity graph is built with the samples and the labels as nodes, and the weights between samples and labels are computed with the help of an external knowledge base, which gives the matching degree between a sample to be predicted and the label set. Second, the multi-label data are mapped onto a multi-label dependency graph, and a random walk with restart is performed on this graph, with the matching degrees obtained in the first step used as the initial prediction values for computing the probability distribution over the nodes. Once this distribution stabilizes, it is taken as the probability distribution over the labels, from which the label set of the text to be predicted is determined. Experimental results show that the proposed algorithm performs well on multi-label text classification and that its classification performance is significantly better than that of comparable algorithms.
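The second stage described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the code below assumes the standard random-walk-with-restart update p ← (1 − c)·W·p + c·p0, assumes that the label dependency graph is weighted by label co-occurrence counts in the training data, and assumes that the final label set is the top-k labels by probability. The function names, the restart probability `restart`, and the matching degrees in the example are hypothetical and are not taken from the paper.

```python
def label_transition_matrix(label_sets, n_labels):
    """Build a column-stochastic matrix W from label co-occurrence counts.

    Assumption: edge weights of the dependency graph are co-occurrence
    counts; the paper may weight the graph differently.
    """
    counts = [[0.0] * n_labels for _ in range(n_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    counts[i][j] += 1.0
    W = [[0.0] * n_labels for _ in range(n_labels)]
    for j in range(n_labels):
        col_sum = sum(counts[i][j] for i in range(n_labels))
        for i in range(n_labels):
            # A label that never co-occurs gets a uniform column,
            # so every column of W still sums to 1.
            W[i][j] = counts[i][j] / col_sum if col_sum > 0 else 1.0 / n_labels
    return W


def step(W, p, p0, restart):
    """One update: p_next = (1 - restart) * W p + restart * p0."""
    n = len(p)
    return [
        (1.0 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
        for i in range(n)
    ]


def random_walk_with_restart(W, p0, restart=0.2, tol=1e-10, max_iter=1000):
    """Iterate the update until the L1 change between iterates is below tol."""
    total = sum(p0)
    p0 = [x / total for x in p0]  # normalize the initial prediction values
    p = list(p0)
    for _ in range(max_iter):
        p_next = step(W, p, p0, restart)
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


def predict_labels(p, k):
    """Assumed decision rule: keep the k most probable labels."""
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]


# Toy usage: four labels, four training texts, made-up matching degrees.
train_label_sets = [{0, 1}, {0, 2}, {1, 2}, {0, 1}]
W = label_transition_matrix(train_label_sets, n_labels=4)
matching_degrees = [0.6, 0.3, 0.1, 0.0]  # stand-in for the first-stage output
probabilities = random_walk_with_restart(W, matching_degrees)
predicted = predict_labels(probabilities, k=2)
```

The restart term keeps the walk anchored to the similarity-based matching degrees, while the transition matrix lets probability flow toward labels that frequently co-occur with the likely ones. In the paper the matching degrees come from the similarity graph and the external knowledge base; here they are supplied by hand.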


Key words: multi-label short text classification, similarity graph, random walk with restart, WordNet