• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

An improved frequent itemsets mining algorithm based on vertical data format

XING Chang-zheng, AN Wei-guo, WANG Xing

  1. (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China)
  • Received: 2015-12-14  Revised: 2016-03-17  Online: 2017-07-25  Published: 2017-07-25

Abstract:

Existing frequent itemset mining algorithms based on the vertical data format rely on intersecting pairs of Tid sets, which is time-consuming and wastes storage space. To address these problems, we propose a vertical-format frequent itemset mining algorithm that combines a triangular matrix with diffsets. The algorithm uses diffsets to avoid the very large Tid sets that arise when mining frequent itemsets in dense data sets. A precondition check determines whether two itemsets need to be joined to generate candidate frequent (k+1)-itemsets, which reduces running time. Storing data in a triangular matrix further saves storage space. Experimental results show that the algorithm greatly reduces both the time cost and the memory overhead of mining frequent itemsets.
 

Key words: frequent itemsets, triangular matrix, diffset, vertical data format
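
The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general techniques it names, not the authors' implementation. It assumes the standard diffset formulation, where the support of an extended itemset is its parent's support minus the size of a set difference, and it assumes a flat upper triangular array of 2-itemset counts used as the check before joining two items. The function names (`vertical_format`, `flat_index`, `pair_support_triangle`, `mine`) and the choice of precondition are hypothetical.

```python
from itertools import combinations


def vertical_format(transactions):
    """Map each item to the set of transaction ids (Tid set) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def flat_index(i, j, n):
    """Position of cell (i, j), i < j, in a strictly upper triangular n x n matrix."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def pair_support_triangle(transactions, items):
    """Count the support of every 2-itemset in one pass, stored as a flat triangle."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    tri = [0] * (n * (n - 1) // 2)
    for t in transactions:
        present = sorted(index[x] for x in set(t) if x in index)
        for i, j in combinations(present, 2):
            tri[flat_index(i, j, n)] += 1
    return tri


def _extend(prefix, members, min_support, frequent):
    """Grow itemsets sharing `prefix`; members are (item, diffset, support) triples."""
    for a, (x, dx, sx) in enumerate(members):
        new_members = []
        for y, dy, _ in members[a + 1:]:
            diff = dy - dx              # d(PXY) = d(PY) - d(PX)
            sup = sx - len(diff)        # sup(PXY) = sup(PX) - |d(PXY)|
            if sup >= min_support:
                frequent[prefix + (x, y)] = sup
                new_members.append((y, diff, sup))
        if new_members:
            _extend(prefix + (x,), new_members, min_support, frequent)


def mine(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    tidsets = vertical_format(transactions)
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)
    frequent = {(i,): len(tidsets[i]) for i in items}
    n = len(items)
    tri = pair_support_triangle(transactions, items)
    for i, x in enumerate(items):
        members = []
        for j in range(i + 1, n):
            # Assumed precondition: skip the join if the pair is already infrequent.
            if tri[flat_index(i, j, n)] < min_support:
                continue
            y = items[j]
            diff = tidsets[x] - tidsets[y]      # d(xy) = t(x) - t(y)
            sup = len(tidsets[x]) - len(diff)   # sup(xy) = sup(x) - |d(xy)|
            frequent[(x, y)] = sup
            members.append((y, diff, sup))
        _extend((x,), members, min_support, frequent)
    return frequent
```

Calling `mine(transactions, min_support)` with a list of item lists and an absolute support threshold returns a dictionary keyed by sorted item tuples. In dense data the diffsets tend to be much smaller than the Tid sets they replace, which is the motivation the abstract gives for using them.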