• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (04): 768-774.

• Article •

基于改进CURE算法的不确定性移动用户数据聚类

高长元1,2,王海晶1,王京1,2   

  1. (1.哈尔滨理工大学管理学院,黑龙江 哈尔滨 150040;2.哈尔滨理工大学高新技术产业发展研究中心,黑龙江 哈尔滨 150040)
  • 收稿日期:2015-04-21 修回日期:2015-06-18 出版日期:2016-04-25 发布日期:2016-04-25
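  • Supported by:

    National Natural Science Foundation of China (71272191, 71072085); Natural Science Foundation of Heilongjiang Province (G201301); Philosophy and Social Sciences Innovation Team Construction Program of Heilongjiang Higher Education Institutions (TD201203)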

An improved CURE algorithm based on the uncertainty of mobile user data clustering

GAO Changyuan1,2,WANG Haijing1,WANG Jing1,2   

  1. (1. College of Management, Harbin University of Science and Technology, Harbin 150040, China;
    2. High-tech Industrial Development Research Center, Harbin University of Science and Technology, Harbin 150040, China)
  • Received:2015-04-21 Revised:2015-06-18 Online:2016-04-25 Published:2016-04-25

摘要:

随着云计算、大数据以及移动互联网的发展,移动终端用户数据呈现出数据量大、噪声大、动态性及不确定性增强的趋势,影响了移动用户数据聚类准确率与效率。针对上述问题,提出了一种改进的层次聚类算法CURE。该算法将原有算法中抽样处理数据的方式用Map Reduce函数实现并行化处理,同时结合区间数的概念,将移动用户数据用一个区间表示,计算其区间距离来适应移动用户数据的不确定性特点,从而提高聚类效率与准确率。最后利用MIT Reality项目数据集进行仿真,仿真结果表明了该方法的有效性及可行性,为移动用户数据的进一步利用及用户的个性化推荐提供支持。

关键词: CURE, 不确定性数据, 移动用户数据, Map Reduce

Abstract:

With the development of cloud computing, big data, and the mobile internet, mobile user data is becoming larger in volume, noisier, more dynamic, and more uncertain, which degrades the accuracy and efficiency of mobile user data clustering. To address this problem, we propose an improved version of the hierarchical clustering algorithm CURE (clustering using representatives). The algorithm replaces the sampling step of the original algorithm with parallel processing implemented through MapReduce functions. In addition, drawing on the concept of interval numbers, it represents each item of mobile user data as an interval and computes interval distances to accommodate the uncertainty of the data, thereby improving both the efficiency and the accuracy of clustering. Finally, the algorithm is applied to the MIT Reality project data set, and the simulation results demonstrate its effectiveness and feasibility. The method supports further use of mobile user data and personalized recommendation for users.

Key words: clustering using representatives (CURE); uncertain data; mobile end-user data; MapReduce
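
The abstract does not give the interval representation or the distance formula, so the following Python sketch is only an illustration of the two ideas it names. It assumes that each user's uncertain measurements are summarized as a (low, high) interval, that the interval distance is the Euclidean distance between interval endpoints, and that the per-partition summaries are combined in a map/reduce style. The function names, the sample records, and the distance definition are all hypothetical and are not taken from the paper.

```python
from functools import reduce
from math import sqrt

def map_partition(records):
    """Map step: summarize one data partition as {user: (low, high)}."""
    out = {}
    for user, value in records:
        lo, hi = out.get(user, (value, value))
        out[user] = (min(lo, value), max(hi, value))
    return out

def merge_intervals(a, b):
    """Reduce step: merge two partial summaries by widening each interval."""
    merged = dict(a)
    for user, (lo, hi) in b.items():
        if user in merged:
            mlo, mhi = merged[user]
            merged[user] = (min(mlo, lo), max(mhi, hi))
        else:
            merged[user] = (lo, hi)
    return merged

def interval_distance(x, y):
    """Assumed definition: Euclidean distance between interval endpoints."""
    return sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

# Hypothetical (user_id, measurement) records split across two partitions.
partitions = [
    [("u1", 10.0), ("u2", 30.0), ("u1", 14.0)],
    [("u2", 26.0), ("u1", 12.0), ("u3", 50.0)],
]

# Each partition is summarized independently, then the summaries are merged.
intervals = reduce(merge_intervals, map(map_partition, partitions), {})
d_u1_u2 = interval_distance(intervals["u1"], intervals["u2"])
```

In a real deployment the map step would run on separate workers and a clustering routine such as CURE would consume the resulting interval distances; here the built-in `map` and `functools.reduce` only mimic that data flow on a single machine.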