• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 高性能计算 • 上一篇    下一篇

并行动态位向量频繁闭合序列模式挖掘算法

陈倩,刘云,高钰莹   

  1. (昆明理工大学信息工程与自动化学院,云南 昆明 650500)
  • 收稿日期:2017-11-22 修回日期:2018-01-24 出版日期:2018-10-25 发布日期:2018-10-25
  • 基金资助:

    国家自然科学基金(61262040)

A parallel dynamic bit vector based frequent
closed sequence pattern mining algorithm

CHEN Qian,LIU Yun,GAO Yuying   

  1. (Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500,China)
     
  • Received:2017-11-22 Revised:2018-01-24 Online:2018-10-25 Published:2018-10-25

摘要:

针对在时间和空间上都具有高计算成本的长序列数据库,一个更有效和更紧凑且可以完全提取信息的挖掘模式是当前的研究热点。提出一种并行动态位向量频繁闭合序列模式的挖掘算法(PDBVFCSP),该算法采用多核处理器架构和DBV数据结构相结合的方式,有效加快了序列数据库的处理速度,并对搜索空间进行划分,尽早执行预处理序列的闭合检查,减少了所需的存储空间和挖掘频繁闭合序列模式的执行时间,克服了现有并行挖掘算法通信开销、同步和数据复制等问题。利用重新分配工作的动态负载平衡机制,解决处理器之间的负载均衡问题,最大限度地减少了CPU空闲时间。对DBVVDF算法和PDBVFCSP(24核)算法进行仿真比较,结果表明,PDBVFCSP算法在运行时间、内存使用和可伸缩性等方面都有较优的性能提升,且当内核数增加时,性能更优。
 

关键词: 数据挖掘, 闭合序列模式, 动态位向量, 多核处理器, PDBVFCSP算法

Abstract:

For long sequence databases, which have high computational costs both in time and space, a mining model that is more efficient and compact and can extract information completely is a current research hotspot. We propose a parallel dynamic bit vector based frequent closed sequence pattern mining algorithm (PDBV-FCSP), which combines the multicore processor architecture with the DBV data structure to effectively speed up the processing speed of the sequence database. The search space is divided, and the closed check of the preprocessing sequence is executed as early as possible, which reduces the required storage space and the execution time of mining the frequent closed sequence mode, and overcomes the problems of communication overhead, synchronization and data replication of the existing parallel mining algorithms. The dynamic load balancing mechanism for job redistribution is used to solve the load balancing problem of workloads among processors, thus minimizing the idle CPU time. Simulation results show that, compared with the DBVVDF algorithm, the PDBVFCSP algorithm has better performance in terms of running time, memory usage and scalability. And when the core number increases, the performance is better.
 

Key words: data mining, closed-sequence mode, dynamic bit vector, multi-core processor, PDBV-FCSP algorithm