A parallel high utility itemset mining algorithm based on Spark

Computer Engineering & Science


HE Deng-ping1,2,3, HE Zong-hao1,2, LI Pei-qiang1,2

(1. School of Telecommunication and Information Engineering,
Chongqing University of Posts and Telecommunications, Chongqing 400065;

2. Research Center of New Telecommunication Technology Applications,
Chongqing University of Posts and Telecommunications, Chongqing 400065;

3. Chongqing Information Technology Designing Company Limited, Chongqing 400021, China)

Received: 2019-03-19 Revised: 2019-04-25 Online: 2019-10-25 Published: 2019-10-25

Abstract


To address the problem that traditional Top-K high utility mining algorithms based on linked-list structures cannot meet mining requirements in big data environments, a parallel high utility itemset mining algorithm based on Spark (STKO) is proposed. First, the TKO algorithm is improved by raising the threshold faster and reducing the search space. Then, on the Spark platform, the original data storage structure is changed and broadcast variables are used to optimize the iterative process, which avoids a large number of recalculations, and a load-balancing scheme is used to mine Top-K high utility itemsets in parallel. Experimental results show that the proposed algorithm can effectively mine high utility itemsets from large data sets.

Key words: data mining, high utility itemset, Spark big data framework, parallelization, Top-K
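
The abstract's two ideas, raising the threshold as candidates are found and shrinking the search space, can be illustrated with a small single-machine sketch. This is not the STKO or TKO algorithm: it enumerates itemsets by brute force, runs without Spark, and assumes positive unit profits. The function names, the sample data, and the use of transaction-weighted utility (TWU) as the pruning bound are all assumptions made for illustration.

```python
import heapq
from itertools import combinations

def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit of `itemset` over transactions containing all of it."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total

def transaction_weighted_utility(itemset, transactions, profit):
    """Upper bound on utility: total utility of every transaction containing `itemset`."""
    return sum(
        sum(q * profit[i] for i, q in t.items())
        for t in transactions
        if all(i in t for i in itemset)
    )

def top_k_high_utility(transactions, profit, k):
    """Return the k itemsets with the highest utility as (utility, itemset) pairs."""
    items = sorted({i for t in transactions for i in t})
    heap = []        # min-heap holding the best k candidates seen so far
    threshold = 0    # raised to the k-th best utility once the heap is full
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Prune: the TWU bound cannot beat the current threshold.
            if transaction_weighted_utility(itemset, transactions, profit) < threshold:
                continue
            u = itemset_utility(itemset, transactions, profit)
            if u == 0:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))

# Hypothetical data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
profit = {"a": 5, "b": 2, "c": 1}
result = top_k_high_utility(transactions, profit, 2)
```

In a Spark setting, a read-only table such as `profit` is the kind of value one might ship to workers as a broadcast variable so that it is not resent with every task; how STKO itself partitions the data and balances load is not described in the abstract.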

HE Deng-ping1,2,3, HE Zong-hao1,2, LI Pei-qiang1,2. A parallel high utility itemset mining algorithm based on Spark [J]. Computer Engineering & Science.

