• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Research and application of a multidimensional
association rules mining algorithm based on Hadoop

 YANG Qing1,2,3, ZHANG Ya-wen1,2, ZHANG Qin1, YUAN Pei-ling1

  (1. School of Computer, Central China Normal University, Wuhan 430079;
    2. Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Wuhan 430079;
    3. National Language Resources Monitor & Research Center for Network Media, Wuhan 430079, China)
  • Received: 2019-07-07 Revised: 2019-09-17 Online: 2019-12-25 Published: 2019-12-25

Abstract:

The traditional Apriori algorithm must scan the data set multiple times, so as data volumes grow rapidly it becomes impractical for big data analysis. To address this problem, an improved parallel Apriori algorithm is designed. First, an IApriori algorithm for multidimensional data is designed using a pruning strategy. Second, the IApriori algorithm is combined with the Hadoop distributed framework to parallelize multidimensional association rule mining, yielding the IPApriori algorithm. The IPApriori algorithm is then applied to correlation analysis for mobile phone user behavior prediction: the main factors affecting the behavior of mobile phone users are analyzed, and possible correlations are identified between user behavior and attributes such as age, gender, time, location, and mobile phone brand. Finally, experiments show that the parallelized algorithm and its structure-building method reduce the system's I/O load and improve the algorithm's execution efficiency.
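
As a rough illustration of the idea described in the abstract, the following Python sketch mines multidimensional frequent itemsets level by level, with support counting split into a map step (per data partition) and a reduce step (summing partial counts). It is not the paper's IApriori or IPApriori implementation: the abstract does not specify the pruning strategy, so the sketch assumes a common one for multidimensional data (a candidate may hold at most one value per dimension) alongside the standard Apriori subset check. The records, dimension names, partitioning, and all function names are hypothetical, and the map and reduce steps are plain Python functions standing in for Hadoop jobs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: every item is a (dimension, value) pair.
RECORDS = [
    {("age", "20-29"), ("gender", "F"), ("brand", "A")},
    {("age", "20-29"), ("gender", "F"), ("brand", "A")},
    {("age", "20-29"), ("gender", "M"), ("brand", "B")},
    {("age", "30-39"), ("gender", "F"), ("brand", "A")},
]


def generate_candidates(frequent, k):
    """Build k-item candidates from the frequent (k-1)-itemsets."""
    items = sorted({item for itemset in frequent for item in itemset})
    candidates = set()
    for combo in combinations(items, k):
        # Assumed dimension prune: at most one value per dimension.
        if len({dim for dim, _ in combo}) < k:
            continue
        # Standard Apriori prune: every (k-1)-subset must be frequent.
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def map_count(partition, candidates):
    """Map step: count each candidate within one data partition."""
    counts = Counter()
    for record in partition:
        for candidate in candidates:
            if candidate <= record:
                counts[candidate] += 1
    return counts


def reduce_counts(partials, min_count):
    """Reduce step: sum partial counts and keep the frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_count}


def mine(partitions, min_count):
    """Level-wise mining; each level is one simulated map/reduce round."""
    candidates = {
        frozenset([item])
        for partition in partitions
        for record in partition
        for item in record
    }
    result = {}
    k = 1
    while candidates:
        partials = [map_count(p, candidates) for p in partitions]
        frequent = reduce_counts(partials, min_count)
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


# Two partitions stand in for two input splits.
partitions = [RECORDS[:2], RECORDS[2:]]
result = mine(partitions, min_count=2)
```

Here `result` maps each frequent itemset to its support count across both partitions. In an actual Hadoop deployment, each level would typically run as a MapReduce job over HDFS input splits, which is where scan and I/O costs arise; the sketch only mirrors the data flow.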
 

Key words: Apriori algorithm, Hadoop, multidimensional association rules, parallelization