  • Official Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

A microblog user interest modeling method based on multi-tag semantic correlation

WANG Yanru1, MA Huifang1,2, LIU Haijiao1, WEI Jiahui1

  (1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China;
    2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
  • Received: 2017-06-07  Revised: 2017-09-14  Online: 2018-11-25  Published: 2018-11-25
  • Funding:

    National Natural Science Foundation of China (61363058, 61762078); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201705)

Abstract:

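Microblog users use tags to represent their interests and attributes. By analyzing the characteristics of microblog user tags and the limitations of existing microblog recommendation methods, we propose an improved microblog interest modeling method based on multi-tag semantic correlation. Existing tagging methods ignore both semantic correlation and the correlation among multiple tags. To address this, we first compute the co-occurrence frequency of each tag pair over the set of microblog users to obtain the semantic inner correlation of the pair. Second, we construct paths composed of the linking tags of each tag pair and use shared entropy to compute the semantic outer correlation of the pair. Finally, we combine the two to obtain a tag-pair semantic correlation matrix, which is used to update the user-tag matrix, yielding a microblog user interest model based on multi-tag semantic correlation. A series of experiments and analyses on a large set of microblog posts crawled through the Sina Weibo open API shows that the proposed user interest model performs well.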

 

Keywords: multi-tag, tag correlation, tag semantic features, user interest model
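The first step, deriving a tag pair's inner correlation from how often the two tags co-occur across users, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The input format (one tag set per user), the function name `inner_correlation`, and the Jaccard-style normalization co(a, b) / (n(a) + n(b) - co(a, b)) are all hypothetical, because the exact formula is not given in the abstract.

```python
from collections import Counter
from itertools import combinations


def inner_correlation(user_tags):
    """Inner correlation of every co-occurring tag pair (hypothetical sketch).

    user_tags: iterable of tag collections, one per user (assumed input format).
    Returns {(a, b): score} with a < b. The score uses an assumed Jaccard-style
    normalization co(a, b) / (n(a) + n(b) - co(a, b)), which lies in (0, 1].
    Pairs that never co-occur are omitted (their correlation is treated as 0).
    """
    tag_count = Counter()   # n(t): number of users carrying tag t
    pair_count = Counter()  # co(a, b): number of users carrying both a and b
    for tags in user_tags:
        unique = sorted(set(tags))  # sort so each pair has one canonical order
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): co / (tag_count[a] + tag_count[b] - co)
        for (a, b), co in pair_count.items()
    }


if __name__ == "__main__":
    users = [
        {"music", "travel"},
        {"music", "movies"},
        {"music", "travel", "food"},
    ]
    scores = inner_correlation(users)
    print(round(scores[("music", "travel")], 3))  # 2 / (3 + 2 - 2) -> 0.667
```

A Jaccard-style ratio is used here only because it keeps scores in a comparable range regardless of how popular each tag is. A raw co-occurrence count divided by the number of users would be an equally plausible reading of the description.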

Abstract:

Microblog users commonly use tags to represent their interests and attributes. By analyzing the tag characteristics of microblog users and the limitations of existing microblog recommendation methods, we propose an improved microblog user interest modeling method based on multi-tag semantic correlation. First, the co-occurrence frequency of each tag pair over the set of microblog users is calculated to obtain the inner correlation of the pair. Second, paths composed of the linking tags of each tag pair are constructed, and the outer correlation of the pair is obtained via shared entropy. Finally, the two correlations are combined into a semantic correlation matrix, which is used to update the user-tag matrix and thereby construct the microblog user interest model based on multi-tag semantic correlation. We evaluate the method through a series of experiments on a dataset crawled via the Sina Weibo open API, and the results show that it outperforms traditional user interest discovery methods.
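The last two steps can be illustrated with the NumPy sketch below. Those steps combine the inner and outer correlations into one matrix and then use that matrix to update the user-tag matrix. Everything here is hypothetical. The abstract does not specify the shared-entropy outer correlation, so `outer_correlation` substitutes a simple average of two-hop path strengths through linking tags. The linear blend weight `alpha`, the unit diagonal, and the row normalization after multiplying by the correlation matrix are likewise assumptions made for illustration.

```python
import numpy as np


def outer_correlation(inner):
    """Stand-in for the outer correlation (NOT the paper's shared-entropy formula).

    For each pair (a, b), average the two-hop path strength inner[a, k] * inner[k, b]
    over every linking tag k that is correlated with both a and b.
    """
    inner = np.asarray(inner, dtype=float)
    n = inner.shape[0]
    outer = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            links = [k for k in range(n)
                     if k not in (a, b) and inner[a, k] > 0 and inner[k, b] > 0]
            if links:
                outer[a, b] = np.mean([inner[a, k] * inner[k, b] for k in links])
    return outer


def semantic_correlation(inner, alpha=0.5):
    """Blend inner and outer correlation; alpha is a hypothetical weight."""
    inner = np.asarray(inner, dtype=float)
    corr = alpha * inner + (1 - alpha) * outer_correlation(inner)
    np.fill_diagonal(corr, 1.0)  # assume each tag is fully correlated with itself
    return corr


def update_user_tags(user_tag, corr):
    """Spread each user's tag weights to correlated tags, then L1-normalize rows."""
    updated = np.asarray(user_tag, dtype=float) @ corr
    sums = updated.sum(axis=1, keepdims=True)
    # Rows that sum to zero (users with no tags) stay all-zero.
    return np.divide(updated, sums, out=np.zeros_like(updated), where=sums > 0)


if __name__ == "__main__":
    # Tags: 0 = music, 1 = travel, 2 = food (toy inner-correlation values).
    inner = [[0, 2 / 3, 1 / 3],
             [2 / 3, 0, 1 / 2],
             [1 / 3, 1 / 2, 0]]
    corr = semantic_correlation(inner)
    user_tag = [[1, 0, 0]]  # a user tagged only with "music"
    print(update_user_tags(user_tag, corr))  # proportional to [12, 5, 4] / 21
```

In the toy example, the user tagged only with "music" gains nonzero weight on "travel" and "food" because both tags are correlated with "music". Multiplying by the correlation matrix spreads a user's interest to semantically related tags instead of only reinforcing the tags the user already has.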

Key words: