• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

2012, Vol. 34, Issue (9): 174-179.


Research and Application of Improved Multidimensional Association Rule Mining Algorithms

ZHANG Suqi1, LIANG Zhigang2, HU Lijuan2, DONG Yongfeng2

  1. Tianjin University, Tianjin 300072;
  2. School of Computer Science and Software, Hebei University of Technology, Tianjin 300130, China
  • Received: 2011-07-19 Revised: 2011-10-28 Online: 2012-09-25 Published: 2012-09-25
  • Funding: Natural Science Foundation of Tianjin (10JCZDJC16000)

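Abstract:

Association rule mining is one of the most important and most active research areas in data mining. Taking the Apriori algorithm as its basis, this paper borrows the transaction-compression idea of the AprioriTid algorithm to reduce the time spent repeatedly scanning the database. It also proposes using transaction identifier (TID) lists: the length of each list equals the support count of the corresponding candidate itemset, so computing a support count only requires obtaining the list length, which shortens the comparison time during counting. In addition, an address-index mechanism is introduced when generating frequent itemsets: during pruning, the first element of a candidate itemset is looked up in the address index table to locate it quickly. This avoids repeated scans of the transaction database and effectively reduces both counting time and memory usage. The improved algorithm is applied to data from a scientific research management system to analyze association relationships and extract hidden, valuable information that supports the next stage of research management work. Performance comparisons in experiments show that the improved algorithm is more efficient.

Key words: association rules; data mining; Apriori algorithm; address index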
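The TID-list idea in the abstract (support count equals list length) can be illustrated with a short sketch. It assumes a vertical layout in which every itemset maps to the ascending list of IDs of the transactions containing it, and it derives a candidate's list by intersecting the lists of the two frequent itemsets it was joined from. That intersection step is borrowed from vertical miners such as Eclat and is an assumption; the abstract does not say how the paper builds its lists. Multidimensional records are encoded as hypothetical `attribute=value` items, `min_count` is an absolute support threshold, and all function names are illustrative.

```python
from itertools import combinations


def build_tid_lists(transactions):
    """Map each 1-itemset to the ascending list of transaction IDs that contain it."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tid_lists.setdefault((item,), []).append(tid)
    return tid_lists


def intersect(a, b):
    """Merge-intersect two ascending TID lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count}; each count is just the length of a TID list."""
    level = {k: v for k, v in build_tid_lists(transactions).items() if len(v) >= min_count}
    result = dict(level)
    while level:
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join step: only itemsets sharing the same (k-1)-prefix
            tids = intersect(level[a], level[b])
            if len(tids) >= min_count:  # support count = TID-list length
                next_level[a + (b[-1],)] = tids
        result.update(next_level)
        level = next_level
    return {itemset: len(tids) for itemset, tids in result.items()}


if __name__ == "__main__":
    # Hypothetical research-management records, each a list of attribute=value items
    records = [
        ["title=professor", "degree=PhD", "grade=A"],
        ["title=professor", "degree=PhD", "grade=A"],
        ["title=lecturer", "degree=MSc", "grade=B"],
        ["title=professor", "degree=MSc", "grade=A"],
    ]
    for itemset, count in sorted(frequent_itemsets(records, 2).items()):
        print(itemset, count)
```

Because every count comes from an exact list intersection, the database is read only once to build the 1-itemset lists; subset pruning is omitted here for brevity.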

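The address-index step described in the abstract can be sketched as follows. This is one plausible reading, not the paper's published data structure: it assumes the frequent (k-1)-itemsets are kept as sorted tuples in a single list, and treats the "address" of each first element as the [start, end) range of its block in that list. During pruning, each (k-1)-subset of a candidate is located by jumping to the block for its first element and binary-searching only inside that block, rather than scanning every frequent itemset. The function names are hypothetical.

```python
from bisect import bisect_left
from itertools import combinations


def build_address_index(sorted_itemsets):
    """Map each first element to the [start, end) range of itemsets that begin with it."""
    index = {}
    for pos, itemset in enumerate(sorted_itemsets):
        start, _ = index.get(itemset[0], (pos, pos))
        index[itemset[0]] = (start, pos + 1)
    return index


def is_frequent(subset, sorted_itemsets, index):
    """Jump to the block for subset[0], then binary-search only inside that block."""
    bounds = index.get(subset[0])
    if bounds is None:
        return False  # no frequent itemset starts with this element
    lo, hi = bounds
    pos = bisect_left(sorted_itemsets, subset, lo, hi)
    return pos < hi and sorted_itemsets[pos] == subset


def generate_candidates(frequent_prev):
    """Join frequent (k-1)-itemsets sharing a prefix, then prune with the address index."""
    prev = sorted(frequent_prev)
    index = build_address_index(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # itemsets sharing a prefix are contiguous in sorted order
            cand = a + (b[-1],)
            # Apriori property: every (k-1)-subset of a candidate must itself be frequent
            if all(is_frequent(sub, prev, index)
                   for sub in combinations(cand, len(cand) - 1)):
                candidates.append(cand)
    return candidates


if __name__ == "__main__":
    frequent_2 = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
    print(build_address_index(sorted(frequent_2)))
    print(generate_candidates(frequent_2))
```

Since the block lookup is a dictionary access, each subset check costs a single binary search confined to one block instead of a pass over all frequent itemsets.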