A k-means clustering algorithm parallelization design based on Hash

Computer Engineering & Science



ZHANG Bo,XU Weihong,CHEN Yuantao,ZHU Ling

(School of Computer & Communication Engineering, Changsha University of Science & Technology, Changsha 410114, China)

Received: 2015-07-07; Revised: 2015-09-23; Online: 2016-10-25; Published: 2016-10-26

Abstract

As the traditional kmeans algorithm has poor clustering effect when dealing with massive volume and high dimensional data, and the existing optimization algorithms are not conductive to parallelization, we propose a parallel optimization scheme based on Hash algorithm. We firstly map the massive volume and high dimensional data to a compressed identifier space, then mine the clustering relationship and select the initial clustering center. These steps avoid the sensitivity of the kmeans algorithm to the random selection of the initial clustering center, and reduce the number of iterations. Finally, combined with the MapReduce, the Partition and Combine mechanisms are applied to optimize the parallelization of this algorithm, thus the degree of parallelization and execution efficiency are more strengthened. Experimental results show that the proposed algorithm can improve the clustering accuracy and stability, and has good processing performance as well.

Key words: massive data, Hadoop, Hash, parallel k-means clustering, center selection
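The abstract names the pipeline's stages but not the exact hash function, the rule for choosing centers from the identifier space, or the Hadoop job layout. The Python sketch below fills those gaps with stated assumptions, so it is an illustration rather than the authors' method. Random-hyperplane locality-sensitive hashing, a swapped-in technique, maps each point to a short bit signature (the "compressed identifier"). The means of the k most populated signature buckets then serve as initial centers. Each Lloyd iteration is simulated in-process as map, combine, partition, and reduce stages. All names (`hash_initial_centers`, `map_combine`, `partition`, `kmeans_mapreduce`, `n_bits`, `n_splits`, `n_reducers`) are hypothetical, and no Hadoop API is used.

```python
# Hypothetical sketch: random-hyperplane LSH stands in for the paper's
# unspecified Hash step; Hadoop stages are simulated in-process.
import random
from collections import defaultdict

Point = tuple[float, ...]


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lsh_signature(p: Point, planes: list[Point]) -> tuple[int, ...]:
    """One bit per random hyperplane: 1 if p lies on its non-negative side."""
    return tuple(int(sum(x * w for x, w in zip(p, plane)) >= 0) for plane in planes)


def hash_initial_centers(data: list[Point], k: int, n_bits: int = 4, seed: int = 0) -> list[Point]:
    """Assumed selection rule: means of the k most populated hash buckets."""
    rng = random.Random(seed)
    origin = mean(data)  # center the data so hyperplanes through 0 cut through it
    planes = [tuple(rng.gauss(0.0, 1.0) for _ in origin) for _ in range(n_bits)]
    buckets: dict[tuple[int, ...], list[Point]] = defaultdict(list)
    for p in data:
        shifted = tuple(x - o for x, o in zip(p, origin))
        buckets[lsh_signature(shifted, planes)].append(p)
    if len(buckets) < k:
        raise ValueError("fewer than k non-empty buckets; try a larger n_bits")
    densest = sorted(buckets.values(), key=len, reverse=True)[:k]
    return [mean(b) for b in densest]


def map_combine(split: list[Point], centers: list[Point]) -> dict[int, tuple[Point, int]]:
    """Map: key each point by its nearest center. Combine: pre-sum locally
    into (vector sum, count) per key, so at most k records leave the split."""
    partial: dict[int, tuple[Point, int]] = {}
    for p in split:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        s, n = partial.get(j, ((0.0,) * len(p), 0))
        partial[j] = (tuple(a + b for a, b in zip(s, p)), n + 1)
    return partial


def partition(partials: list[dict[int, tuple[Point, int]]], n_reducers: int) -> list[list[tuple[int, tuple[Point, int]]]]:
    """Route each (cluster id, partial) record to a reducer by key modulo the
    reducer count, mirroring the idea behind Hadoop's default HashPartitioner."""
    routed: list[list[tuple[int, tuple[Point, int]]]] = [[] for _ in range(n_reducers)]
    for part in partials:
        for j, record in part.items():
            routed[j % n_reducers].append((j, record))
    return routed


def kmeans_mapreduce(data: list[Point], k: int, n_splits: int = 2, n_reducers: int = 2,
                     max_iter: int = 20, tol: float = 1e-12, n_bits: int = 4,
                     seed: int = 0) -> tuple[list[Point], int]:
    """Lloyd iterations written as map -> combine -> partition -> reduce rounds."""
    centers = hash_initial_centers(data, k, n_bits, seed)
    splits = [data[i::n_splits] for i in range(n_splits)]
    it = 0
    for it in range(1, max_iter + 1):
        partials = [map_combine(s, centers) for s in splits]
        new_centers = list(centers)  # an empty cluster keeps its old center
        for routed in partition(partials, n_reducers):
            totals: dict[int, tuple[Point, int]] = {}
            for j, (s, n) in routed:  # reduce: merge partial sums per cluster id
                ts, tn = totals.get(j, ((0.0,) * len(s), 0))
                totals[j] = (tuple(a + b for a, b in zip(ts, s)), tn + n)
            for j, (s, n) in totals.items():
                new_centers[j] = tuple(x / n for x in s)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, it
```

For two well-separated groups, `kmeans_mapreduce([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)], k=2)` returns the centers (0.5, 0.5) and (10.5, 10.5) within two iterations. The combine step is what keeps the shuffle small: each split emits at most k (sum, count) records per iteration instead of one record per point.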

ZHANG Bo,XU Weihong,CHEN Yuantao,ZHU Ling.

A k-means clustering algorithm

parallelization design based on Hash

[J]. Computer Engineering & Science.

