• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2013, Vol. 35 ›› Issue (3): 145-149.

• 论文 • 上一篇    下一篇

基于分而治之及Hash链表的图分类算法

孙伟1,朱正礼1,2   

  1. (1.南京林业大学信息科学技术学院,江苏 南京 210037;
    2.南京理工大学计算机科学与技术学院,江苏 南京 210094)
  • 收稿日期:2012-05-23 修回日期:2012-09-11 出版日期:2013-03-25 发布日期:2013-03-25
  • 基金资助:

    国家自然科学基金资助项目(61072148)

Graph classification algorithm based on
divide and conquer strategy and Hash linked list  

SUN Wei1,ZHU Zhengli1,2   

  1. (1.School of Information and Technology,Nanjing Forestry University,Nanjing 210037;2.School of Computer Science and Technology,Nanjing University of Science&Technology,Nanjing 210094,China)
  • Received:2012-05-23 Revised:2012-09-11 Online:2013-03-25 Published:2013-03-25

摘要:

主流的图结构数据分类算法大都是基于频繁子结构挖掘策略。这一策略必然导致对全局数据空间的不断重复搜索,从而使得该领域相关算法的效率较低,无法满足特定要求。针对此类算法的不足,采用分而治之方法,设计出一种模块化数据空间和利用Hash链表存取地址及支持度的算法。将原始数据库按照规则划分为有限的子模块,利用gSpan算法对各个模块进行操作获取局部频繁子模式,再利用Hash函数将各模块挖掘结果映射出唯一存储地址,同时记录其相应支持度构成Hash链表,最后得到全局频繁子模式并构造图数据分类器。算法避免了对全局空间的重复搜索,从而大幅度提升了执行效率;也使得模块化后的数据可以一次性装入内存,从而节省了内存开销。实验表明,新算法在分类模型塑造环节的效率较之于主流图分类算法提升了1.2~3.2倍,同时分类准确率没有下降。

关键词: 图数据分类, 分而治之, 模块化数据, Hash链表, 分类效率

Abstract:

The mainstream graph data classification algorithms are based on frequent substructure mining strategy, which inevitably leads to searching the global data space repeatedly and hence the related algorithms have low efficiency and cannot meet specific requirements. Aiming at the disadvantages of such algorithms, firstly, the “divide and conquer” strategy is used to design a modular data space and an algorithm that use the hash linked list to store the address and the support degree. Secondly, the original database is partitioned into a limited number of submodules according to the rules, and the gSpan algorithm is used to handle each submodule to get the locally frequent submodel. Thirdly, Hash functions are used to calculate the unique memory address of the mining result of each module, and construct the Hash linked list by recording the support degree. Finally, the globally frequent submodel is obtained and the graph data classifier is built up. The algorithm avoids searching the global space repeatedly so as to greatly enhance the efficiency as well as saves memory overhead by loading the data into the memory at a time. The experiment results show that, compared with the mainstream graph classification algorithms, the new algorithm has 1.2~3.2 times efficiency improvement in the phase of constructing the submodels and has the same classification accuracy rate.

Key words: graph classification;divide and conquer;modular data;Hash linked list;classification efficiency