A microblog user interest modeling method based on multi-tag semantic correlation

Computer Engineering & Science

WANG Yanru1, MA Huifang1,2, LIU Haijiao1, WEI Jiahui1

(1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China;

2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)

Received: 2017-06-07 Revised: 2017-09-14 Online: 2018-11-25 Published: 2018-11-25
Funding:
National Natural Science Foundation of China (61363058, 61762078); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201705)

Abstract:

Microblog users use tags to represent their interests and attributes. By analyzing the characteristics of microblog user tags and the limitations of existing microblog recommendation methods, we propose an improved microblog user interest modeling method based on multi-tag semantic correlation. The method addresses the problem that existing tagging methods ignore semantic correlation and the correlation among multiple tags. First, the co-occurrence frequency of each tag pair in the microblog user set is calculated to obtain the inner semantic correlation of the tag pair. Second, paths are constructed from the link tags that connect each tag pair, and the outer semantic correlation of the tag pair is computed via shared entropy. Finally, the two correlations are combined into a tag-pair semantic correlation matrix, which is used to update the user-tag matrix and thereby obtain a microblog user interest model based on multi-tag semantic correlation. A series of experiments and analyses on a large collection of microblog data crawled through the public Sina Weibo API shows that the proposed user interest model performs well and outperforms traditional user interest discovery methods.

Key words:

multi-tag, tag correlation, tag semantic feature, user interest model
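
The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not give the formulas. The Jaccard-style normalization of co-occurrence counts, the weighted sum controlled by `alpha`, and the update rule u' = u + uC are all assumptions made for illustration. In particular, `outer_correlation_stand_in` does not compute shared entropy; it swaps in a simple two-hop path score through a link tag, only to show where the outer correlation enters the pipeline. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def inner_correlation(user_tags):
    """Inner correlation from tag-pair co-occurrence across users.

    Assumption: co-occurrence counts are normalized Jaccard-style; the
    exact normalization used in the paper is not given in the abstract.
    """
    tags = sorted({t for ts in user_tags.values() for t in ts})
    idx = {t: i for i, t in enumerate(tags)}
    n = len(tags)
    count = [0] * n                    # number of users carrying each tag
    co = [[0] * n for _ in range(n)]   # number of users carrying both tags
    for ts in user_tags.values():
        unique = sorted(set(ts))
        for t in unique:
            count[idx[t]] += 1
        for a, b in combinations(unique, 2):
            co[idx[a]][idx[b]] += 1
            co[idx[b]][idx[a]] += 1
    inner = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and co[i][j]:
                inner[i][j] = co[i][j] / (count[i] + count[j] - co[i][j])
    return tags, inner


def outer_correlation_stand_in(inner):
    """Stand-in for the outer correlation (NOT the shared-entropy measure).

    Scores a tag pair by its strongest two-hop path through a link tag k.
    """
    n = len(inner)
    outer = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                outer[i][j] = max(
                    (inner[i][k] * inner[k][j]
                     for k in range(n) if k not in (i, j)),
                    default=0.0,
                )
    return outer


def combine(inner, outer, alpha=0.5):
    """Weighted sum of both correlations; alpha is a hypothetical parameter."""
    n = len(inner)
    return [[alpha * inner[i][j] + (1 - alpha) * outer[i][j]
             for j in range(n)] for i in range(n)]


def update_user_tags(user_vec, corr):
    """Spread each tag's weight to correlated tags: u' = u + u C (assumed rule)."""
    n = len(user_vec)
    return [user_vec[j] + sum(user_vec[i] * corr[i][j] for i in range(n))
            for j in range(n)]


# Toy data: three users and their self-assigned tags (invented for the demo).
user_tags = {
    "u1": ["music", "travel"],
    "u2": ["music", "travel", "food"],
    "u3": ["food", "photography"],
}
tags, inner = inner_correlation(user_tags)
corr = combine(inner, outer_correlation_stand_in(inner))
u1 = [1.0 if t in user_tags["u1"] else 0.0 for t in tags]
print(dict(zip(tags, update_user_tags(u1, corr))))
```

In this toy run, user `u1` never used the tag "photography", yet the updated vector gives it a nonzero weight because "photography" is linked to the user's own tags through "food". That is the kind of effect a correlation-based update of the user-tag matrix is meant to produce.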

WANG Yanru, MA Huifang, LIU Haijiao, WEI Jiahui. A microblog user interest modeling method based on multi-tag semantic correlation[J]. Computer Engineering & Science.
