• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

Video user group classification based on barrage comment sentiment analysis and clustering algorithms

HONG Qing, WANG Siyao, ZHAO Qinpei, LI Jiangfeng, RAO Weixiong

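  1. (School of Software Engineering, Tongji University, Shanghai 200092, China)
  • Received: 2017-10-11 Revised: 2018-01-15 Online: 2018-06-25 Published: 2018-06-25
  • Funding:

    National Natural Science Foundation of China (61572365, 61503286, 61702372); Natural Science Foundation of Shanghai (15ZR1443000); Shanghai Sailing Program for Young Science and Technology Talents (15YF1412600); Science and Technology Commission of Shanghai Municipality project (14DZ1118700); Fundamental Research Funds for the Central Universities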

Video user group classification based on barrage comments sentiment analysis and clustering algorithms

HONG Qing, WANG Siyao, ZHAO Qinpei, LI Jiangfeng, RAO Weixiong

  1. (School of Software Engineering, Tongji University, Shanghai 200092, China)
  • Received: 2017-10-11 Revised: 2018-01-15 Online: 2018-06-25 Published: 2018-06-25

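Abstract:

With the development of digital media and related technologies, the barrage comment system, a new mode of commenting, has emerged and steadily gained popularity. It lets viewers post comments on a video's plot in real time and can also help them understand the video's content. The resulting barrage text data provide new material for short-text processing and real-time data processing. Studying the characteristics of barrage data and the sentiment they express helps us better understand the video plot; studying the similarity between barrage comments, and from it the associations between users, not only gives deeper insight into the characteristics of barrage users and uncovers potential links between different videos, but also supports a more accurate choice of target audience during video production. We first collect and preprocess the barrage text data and then compute the sentiment value of each text. To handle the colloquial nature of barrage text, we build a dictionary of words commonly used in online barrage comments. By improving the traditional k-means clustering algorithm, we classify all users who post barrage comments according to their sentiment values. Such a classification helps us understand the emotional similarities and differences among viewers of a particular type of video.

Key words: barrage comment system, short text analysis, time series, sentiment analysis, user classification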

Abstract:

With the development of digital media and related technologies, barrage comments, a new type of commentary system, have become increasingly popular. They enable viewers to comment on a video's plot in real time and help them understand its content. Barrage comments open up a new area of study in short-text and real-time data processing. By studying the characteristics of barrage comments and the sentiment they express, we can better understand the video plot; by studying the similarity between barrage comments and analyzing the associations between users, we can understand the characteristics of the users and uncover potential connections between different videos, which can also support a more accurate choice of target audience during video production. We first collect and preprocess the barrage comments and then calculate their sentiment values. Because barrage comments are usually colloquial and loosely structured in syntax and grammar, a dictionary of commonly used barrage terms is built. The classic k-means algorithm is adapted to group all users who post barrage comments according to their sentiment values. This classification can help us understand the emotional similarities and differences among viewers watching a particular type of video.
 

Key words: barrage comments system, short text analysis, time series, sentiment analysis, user classification
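
The pipeline outlined in the abstract (score each comment with a word dictionary, aggregate per user, then cluster users by sentiment) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy `LEXICON`, the two per-user features, and the function names are hypothetical, and the clustering step is plain Lloyd's k-means rather than the adapted variant the paper describes.

```python
import random
from collections import defaultdict

# Hypothetical toy lexicon (token -> polarity). The paper's dictionary of
# common barrage words is not reproduced here.
LEXICON = {"awesome": 1.0, "love": 1.0, "lol": 0.5,
           "sad": -0.5, "boring": -1.0, "hate": -1.0}

def sentiment(comment):
    """Mean lexicon score of the known tokens in a comment (0.0 if none)."""
    scores = [LEXICON[t] for t in comment.lower().split() if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def user_features(comments):
    """Map each user to (mean sentiment, share of negative comments)."""
    per_user = defaultdict(list)
    for user, text in comments:
        per_user[user].append(sentiment(text))
    return {u: (sum(v) / len(v), sum(s < 0 for s in v) / len(v))
            for u, v in per_user.items()}

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: each center becomes the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels

# Usage on made-up (user, comment) pairs.
comments = [("u1", "awesome love"), ("u1", "lol"),
            ("u2", "boring"), ("u2", "hate sad"),
            ("u3", "love lol"), ("u4", "sad boring")]
features = user_features(comments)
users = sorted(features)
labels = kmeans([features[u] for u in users], k=2)
groups = dict(zip(users, labels))
```

On this toy input the two mostly positive users end up in one cluster and the two mostly negative users in the other. Real barrage text would also need word segmentation and a far larger dictionary, neither of which is shown here.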