• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2015, Vol. 37 ›› Issue (08): 1450-1457.

• 论文 • 上一篇    下一篇

基于满二叉树的二分K-means聚类并行推荐算法

陈平华,陈传瑜   

  1. (广东工业大学计算机学院,广东 广州 510006)
  • 收稿日期:2014-08-25 修回日期:2014-10-11 出版日期:2015-08-25 发布日期:2015-08-25
  • 基金资助:

    广东省教育部产学研结合项目(2012B091100003,2012B091000058);广东省专业镇中小微企业服务平台建设项目(2012B040500034)

A bisecting K-means clustering parallel
recommendation algorithm based on full binary tree 

CHEN Pinghua,CHEN Chuanyu   

  1. (Faculty of Computer,Guangdong University of Technology,Guangzhou 510006,China)
  • Received:2014-08-25 Revised:2014-10-11 Online:2015-08-25 Published:2015-08-25

摘要:

在推荐系统中应用K-means算法聚类可有效降维,然而聚类效果往往依赖于选定的初始中心,并且一旦选定目标簇后,推荐过程只针对目标簇进行,与其他簇无关。针对上述两个问题,提出一种基于满二叉树的二分K-means聚类并行推荐算法。该算法首先反复迭代二分Kmeans算法,迭代过程中使用簇内凝聚度作为分裂阈值,形成一颗满二叉树;然后通过层次遍历将用户归入到K个叶子节点(簇);最后针对K个簇,应用MapReduce框架进行并行推荐预测。MovieLens上的实验结果表明,该算法可大幅度提高推荐系统准确性,同时增强系统可扩展性。

关键词: 满二叉树, K-means, 聚类, 推荐算法, MapReduce

Abstract:

K-means clustering algorithms can effectively reduce dimensions when they are applied to recommendation systems. However, the clustering effect is often dependent on the initial centers. And once the target cluster is selected, the recommendation process is executed only according to the target cluster and has nothing to do with other clusters. To solve these problems, we present a bisecting Kmeans clustering parallel recommendation algorithm based on full binary tree. Firstly, the bisecting K-means clustering algorithm is iterated, and during the iterative process the cluster cohesion level serves as the split threshold to form a full binary tree. Then the active users are classified into k leaf nodes (clusters) using the method of level traversal. Lastly, via the MapReduce framework, the process of recommendation prediction can be parallelized onto the k clusters. Experimental results on the MovieLens show that the proposed algorithm can not only greatly improve the accuracy of the recommendation results but also enhance the system scalability.Key words:  

Key words: full binary tree;K-means;clustering;recommendation algorithm;MapReduce