• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2012, Vol. 34, Issue (3): 62-66.

• Paper

Research on the SLIQ Parallel Algorithm Based on Cloud Computing

YANG Changchun, SHEN Xiaoling

  1. (School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China)
  • Received: 2011-09-24 Revised: 2011-11-21 Online: 2012-03-26 Published: 2012-03-25
  • Supported by:

    National Natural Science Foundation of China (61003163); Jiangsu Province Science and Technology Fund (BZ2010021)

Abstract:

Cloud computing offers an efficient solution for storing and analyzing massive data, and studying data mining algorithms in this setting has both theoretical significance and practical value. The SLIQ algorithm finds the best split point by traversing the candidate split points one by one and computing a scalability index for each. This is time-consuming, and the algorithm becomes very inefficient as the amount of data grows. This paper studies algorithms for mining decision rules in a cloud computing environment and introduces the MapReduce programming model. On this basis, with the aim of parallelizing SLIQ mining in the cloud, it presents how an improved SLIQ algorithm is applied on the MapReduce programming model.
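The one-by-one split search described in the abstract, and one way it might be spread across map and reduce steps, can be sketched as follows. This is a minimal illustration under stated assumptions, not the improved algorithm of the paper, whose details are not given on this page. It assumes the index being computed is the gini index (the split criterion of SLIQ as originally published), assumes numeric attributes only, and simulates the MapReduce phases in plain Python rather than on a real cluster. All function names (`gini`, `best_split`, `map_record`, `reduce_attribute`, `run`) and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def gini(counts, total):
    """Gini impurity of a class histogram; 0.0 for an empty partition."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(pairs):
    """Scan sorted (value, label) pairs one by one.

    Returns (weighted_gini, threshold) for the best binary split,
    or (inf, None) if all values are equal.
    """
    pairs = sorted(pairs)
    n = len(pairs)
    left = Counter()
    right = Counter(label for _, label in pairs)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1   # move one record from the right side to the left
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold fits between equal values
        n_left = i + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


def map_record(record, label):
    """Map step (assumed design): emit (attribute, (value, label)) pairs."""
    for attribute, value in record.items():
        yield attribute, (value, label)


def reduce_attribute(attribute, pairs):
    """Reduce step (assumed design): best split for a single attribute."""
    score, threshold = best_split(pairs)
    return attribute, score, threshold


def run(records, labels):
    """Simulate map, shuffle, and reduce, then pick the best attribute."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for record, label in zip(records, labels):
        for attribute, pair in map_record(record, label):
            groups[attribute].append(pair)
    results = [reduce_attribute(a, p) for a, p in groups.items()]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    records = [
        {"age": 20, "income": 50},
        {"age": 25, "income": 30},
        {"age": 40, "income": 45},
        {"age": 50, "income": 35},
    ]
    labels = ["no", "no", "yes", "yes"]
    print(run(records, labels))  # ('age', 0.0, 32.5)
```

In this sketch each attribute is handled by an independent reduce call, so the per-attribute scans could run in parallel. The scan inside `best_split` is still sequential, which is the cost the abstract attributes to SLIQ on large data.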

Key words: cloud computing; SLIQ; MapReduce; data mining