• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (03): 422-428.

• Articles

A parallel algorithm for mining high utility itemsets

SONG Wei, JI Honglei, LI Jinhong

  1. (College of Computer, North China University of Technology, Beijing 100144, China)
  • Received: 2013-10-10 Revised: 2014-01-09 Online: 2015-03-25 Published: 2015-03-25
  • Supported by:

    National Natural Science Foundation of China (61105045, 51075423); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PHR201108057); Scientific Research Talent Promotion Program of North China University of Technology (CCXZ201303)

Abstract:

High utility itemsets reflect users' preferences and compensate for a limitation of traditional frequent itemset mining, which measures the importance of an itemset by support alone. For this reason, mining them is becoming a hot research topic in data mining. To cope with ever-growing data sizes, a parallel algorithm for mining high utility itemsets, called PHUIMine (Parallel High Utility Itemset Mine), is proposed. First, a tree structure called the DHUI-tree (dynamic high utility itemset tree) is introduced to record the information needed for mining high utility itemsets; its construction method is described and its dynamic pruning strategy is justified. On this basis, the parallel mining algorithm is presented. Experimental results show that PHUIMine achieves high mining efficiency and low storage overhead.

Key words: data mining; high utility itemset; parallel algorithm; DHUI-tree (dynamic high utility itemset tree)
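
To make the contrast between support and utility concrete, the following is a minimal sketch that enumerates high utility itemsets by brute force on a toy database. It is not PHUIMine and does not build a DHUI-tree; the transactions, the per-unit profits, the threshold, and all function names are hypothetical and chosen only for illustration. It assumes the usual definition in which the utility of an itemset is the sum of quantity times unit profit over the transactions that contain every item of the itemset.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 1, "c": 10}


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over the transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration: exponential in the number of items."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in high_utility_itemsets(transactions, unit_profit, 30).items():
        print(itemset, "support =", support(itemset, transactions), "utility =", u)
```

In this toy data, item b appears in three of the four transactions but has a utility of only 4, while the itemset {b, c} appears in two transactions and has a utility of 42. A superset can therefore be high utility when its subset is not, so the support-based pruning used for frequent itemsets does not carry over directly, and exhaustive enumeration does not scale.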