• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (09): 1640-1648.

• Artificial Intelligence and Data Mining •

Complex network community detection algorithm based on deep encoder

ZHANG Shi-jin, ZHANG Sheng, TIAN Ji-biao, WU Zhi-qiang, DAI Wei-kai

  1. (College of Information Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China)
  • Received: 2019-10-24 Revised: 2020-03-24 Accepted: 2020-09-25 Online: 2020-09-25 Published: 2020-09-25
  • Supported by:
    National Natural Science Foundation of China (61661037); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ170575); Jiangxi Provincial Postgraduate Innovation Special Fund (YC2018-S370)

Abstract: Complex networks are a typical representation of complex systems, and community structure is one of their most important structural characteristics. To address the low accuracy of current community detection algorithms and their poor suitability for large-scale networks, a Deep Auto-encoder and EForest (DA-EF) algorithm and an influence diffusion index for measuring the similarity between nodes are proposed. DA-EF combines a multi-layer auto-encoder with an EForest (forest encoder) to form a two-level cascade model, transforms the similarity matrix into a low-dimensional, high-order feature matrix through dimensionality reduction and representation learning, and finally applies K-means to obtain the community partition. The cascade structure greatly reduces the time complexity of the algorithm while keeping its depth unchanged. Experiments on synthetic and real-world datasets show that, compared with similar algorithms such as K-means, Spectral, DA-EML and CoDDA, DA-EF achieves the highest normalized mutual information (NMI) and modularity Q values and the shortest clustering running time, giving it advantages in both accuracy and efficiency. The algorithm performance experiments verify the rationality and effectiveness of the cascade structure, the depth of the auto-encoder, and the influence diffusion index.


Key words: complex network, auto-encoder, EForest, community structure, community detection
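
The abstract evaluates community partitions with normalized mutual information (NMI) and modularity Q but does not state their formulas. The following is a minimal sketch of both metrics. It assumes the definitions commonly used in community detection, which the paper may or may not follow exactly: NMI normalized by the mean of the two label entropies, and Newman modularity for an undirected, unweighted graph. The function names `nmi` and `modularity` and the toy graph are illustrative only and do not come from the paper.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """NMI = 2 * I(X; Y) / (H(X) + H(Y)); an assumed, commonly used normalization."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in count_joint.items()
    )
    h_true, h_pred = entropy(count_true), entropy(count_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both partitions place every node in a single community
    return 2.0 * mutual_info / (h_true + h_pred)


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / (2m))^2].

    edges: list of (u, v) pairs of an undirected graph without self-loops.
    community: mapping (dict or list) from node to community label.
    """
    m = len(edges)
    degree = Counter()
    internal = Counter()  # L_c: edges with both endpoints in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = Counter()  # d_c: total degree of the nodes in community c
    for node, d in degree.items():
        degree_sum[community[node]] += d
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
truth = [0, 0, 0, 1, 1, 1]  # ground-truth community of nodes 0..5
found = [1, 1, 1, 0, 0, 0]  # same partition with the labels swapped

print(round(nmi(truth, found), 3))         # 1.0
print(round(modularity(edges, found), 3))  # 0.357
```

NMI requires ground-truth labels, whereas Q depends only on the graph and the partition, so Q remains usable on real networks whose true communities are unknown.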