• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science


A parallel MRACO-PAM clustering algorithm based on MapReduce
 

ZHAO Bao-wen, XU Hua

  1. (School of IOT, Jiangnan University, Wuxi 214122, China)
  • Received: 2015-11-06 Revised: 2016-09-21 Online: 2017-10-25 Published: 2017-10-25

Abstract:

Cluster analysis is one of the most commonly used data processing techniques, and partitioning around medoids (PAM) has been one of the most popular clustering algorithms since it was proposed in 1990. PAM addresses a weakness of the K-Means algorithm, which is sensitive to outliers and dirty data during clustering. However, the original PAM converges slowly and is inefficient on large datasets because of its high time complexity. To address this problem, we use the ant colony algorithm to strengthen PAM's global and local search capabilities, and propose a parallel MRACO-PAM clustering algorithm based on the MapReduce programming framework. Experimental results show that the parallel MRACO-PAM algorithm converges faster and handles large-scale data with good scalability.
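
To make the idea of medoid-based clustering under a map/reduce decomposition concrete, the following is a minimal sketch in Python. It is not the authors' MRACO-PAM algorithm: it omits the ant colony component entirely and uses a simple alternating k-medoids update rather than PAM's swap search. The function names (`map_assign`, `reduce_update`, `kmedoids_round`) and the toy data are assumptions made purely for illustration, and the map and reduce steps run sequentially in one process instead of on a MapReduce cluster.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, medoids):
    """Map step (hypothetical): emit (index of nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return nearest, point


def reduce_update(members):
    """Reduce step (hypothetical): choose the member with the smallest
    total distance to all other members of the same cluster."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def kmedoids_round(points, medoids):
    """One map/shuffle/reduce round, executed sequentially for illustration."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, medoids)
        groups[key].append(value)  # the "shuffle": group points by medoid index
    # Keep the old medoid if a cluster received no points.
    return [reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(len(medoids))]


# Toy data: two groups on a line, plus one outlier at x = 100.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0), (100.0, 0.0)]
medoids = [(0.0, 0.0), (10.0, 0.0)]

new_medoids = kmedoids_round(points, medoids)
print(new_medoids)  # [(1.0, 0.0), (12.0, 0.0)]
```

In this toy run the outlier at x = 100 lands in the second cluster, yet the updated medoid stays at (12.0, 0.0); the arithmetic mean of that cluster's x-coordinates would be 29.2, which illustrates why a medoid is less affected by outliers than a K-Means centroid. The reduce step is quadratic in cluster size, which is consistent with the abstract's remark that PAM is inefficient on large datasets.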

Key words: MapReduce, ant colony optimization (ACO), partitioning around medoids (PAM), big data, parallel computing