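• Official journal of the China Computer Federation
  • Core journal of Chinese science and technology
  • Chinese-language core journal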

J4 ›› 2013, Vol. 35 ›› Issue (5): 15-19.


Applying data filling to alleviate the sparsity problem for personalized recommendation

XIA Jianxun1,2,4, WU Fei2,3,4, XIE Changsheng2,3,4

  (1. School of Computer and Information Science, Hubei Engineering University, Xiaogan 432100;
    2. Wuhan National Laboratory for Optoelectronics, Wuhan 430074;
    3. Key Laboratory of Data Storage Systems, Ministry of Education of China, Wuhan 430074;
    4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
  • Received: 2012-05-16 Revised: 2012-09-20 Online: 2013-05-25 Published: 2013-05-25
  • Supported by:

    Research Fund of Hubei Engineering University (z2013011); National Basic Research Program of China (973 Program) (2011CB302303); Key Program of the National Natural Science Foundation of China (60933002); Wuhan Chenguang Program (201050231072); Natural Science Foundation of Hubei Province (2010CDB01605); Fundamental Research Funds for the Central Universities (2011QN053, 2011QN032)

Abstract:

Collaborative filtering is to date the most successful and widely used recommendation technique. However, the user-item rating matrix is extremely sparse, which makes recommendations inaccurate. To address this problem, we propose three data filling methods and two recommendation strategies. The three methods fill each unrated entry of the rating matrix with (1) the weighted average of its row and column data; (2) the average of the modes of its row and column data; or (3) the average of the medians of its row and column data. The first recommendation strategy uses the filled values directly as predicted ratings; the second treats the filled rating matrix as a pseudo rating matrix and performs collaborative filtering on it using Pearson correlation similarity. Experiments on the MovieLens data set show that these strategies effectively alleviate the rating sparsity problem and improve recommendation accuracy.

Key words: recommender system; personalized recommendation; collaborative filtering; data filling
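
The filling approaches and the two recommendation strategies summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, the handling of ties or empty rows, or the prediction formula, so the names `fill_ratings` and `predict_pearson`, the row weight `w_row`, the tie rule for the mode, the global fallback statistic, the neighbourhood size `k`, and the mean-centered prediction formula are all assumptions made for illustration. Missing ratings are assumed to be stored as NaN in a users x items array.

```python
import numpy as np

def _stat(values, method):
    """Summary statistic of a 1-D array of observed ratings."""
    if method == "mean":
        return float(values.mean())
    if method == "median":
        return float(np.median(values))
    if method == "mode":
        vals, counts = np.unique(values, return_counts=True)
        return float(vals[counts.argmax()])  # assumption: ties go to the smallest value
    raise ValueError(f"unknown method: {method}")

def fill_ratings(R, method="mean", w_row=0.5):
    """Fill NaN entries with a blend of the row and column statistics.

    w_row is a hypothetical weight on the row statistic; 0.5 gives a plain average.
    """
    R = np.asarray(R, dtype=float)
    observed = ~np.isnan(R)
    # Assumption: rows or columns without any rating fall back to the global statistic.
    fallback = _stat(R[observed], method)
    row_stat = np.array([_stat(r[m], method) if m.any() else fallback
                         for r, m in zip(R, observed)])
    col_stat = np.array([_stat(c[m], method) if m.any() else fallback
                         for c, m in zip(R.T, observed.T)])
    blend = w_row * row_stat[:, None] + (1.0 - w_row) * col_stat[None, :]
    return np.where(observed, R, blend)  # observed ratings are kept unchanged

def predict_pearson(P, user, item, k=2):
    """User-based prediction on a dense pseudo rating matrix P.

    Assumption: mean-centered weighted sum over the k most similar users
    with positive Pearson correlation.
    """
    P = np.asarray(P, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        sims = np.nan_to_num(np.corrcoef(P)[user])  # constant rows yield NaN -> 0
    sims[user] = -np.inf  # never pick the target user as a neighbour
    neighbors = np.argsort(sims)[::-1][:k]
    neighbors = neighbors[sims[neighbors] > 0]
    means = P.mean(axis=1)
    if neighbors.size == 0:
        return float(means[user])
    w = sims[neighbors]
    dev = P[neighbors, item] - means[neighbors]
    return float(means[user] + (w @ dev) / np.abs(w).sum())

# Toy 3 users x 3 items matrix; NaN marks an unrated entry.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 1.0]])
filled = fill_ratings(R, method="mean")   # strategy 1: read predictions from `filled`
pred = predict_pearson(filled, user=0, item=2, k=2)  # strategy 2: CF on the pseudo matrix
```

Under the first strategy the value at `filled[user, item]` is itself the predicted rating; under the second, `filled` only serves as the dense input on which similarities are computed. A real evaluation would hold out known ratings before filling, which this sketch omits.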