• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2020, Vol. 42 ›› Issue (08): 1489-1499.

• 人工智能与数据挖掘 • 上一篇    下一篇

融合特征向量中心性与标签熵的标签传播算法

潘曙灿,许青林   

  1. (广东工业大学计算机学院,广东 广州 510000)

  • 收稿日期:2019-12-05 修回日期:2020-03-05 接受日期:2020-08-25 出版日期:2020-08-25 发布日期:2020-08-29
  • 基金资助:
    广东省科技计划项目(2016B030306003)

A label propagation algorithm combining eigenvector centrality and label entropy

PAN Shu-can,XU Qing-lin   

  1. (School of Computer Science,Guangdong University of Technology,Guangzhou 510000,China)
  • Received:2019-12-05 Revised:2020-03-05 Accepted:2020-08-25 Online:2020-08-25 Published:2020-08-29

摘要: 重叠社区结构挖掘旨在发现复杂网络中多个独立社区之间的重叠部分,其在社交、交通、舆情乃至反恐等领域具有广泛的应用。然而,目前基于标签传播的重叠社区挖掘算法在社区结构模糊的网络中表现出较强的随机性,导致准确度不高。针对重叠社区模糊边界导致的不确定性和低准确度问题,提出一种融合特征向量中心性与标签熵的标签传播算法ECLE-LPA。ECLE-LPA通过融合节点的K-核迭代因子与特征向量中心性来计算节点影响力并初始化节点标签,在标签传播过程中,通过节点标签熵和节点间亲密度更新节点标签列表及其标签隶属度,从而较好地克服了社区模糊边界的识别问题。实验结果表明:在Les Miserables、Polbooks、Football、Polblogs和Netscience等真实网络中,ECLE-LPA划分结果的EQ值普遍比对比算法提高了1%~3%;在社区结构模糊的人工网络中,ECLE-LPA划分结果的NMI值比其他标签传播算法提高了10%以上。

关键词: 复杂网络, 重叠社区, K-核迭代因子, 标签传播, 特征向量中心性, 标签熵

Abstract: Overlapping community structure mining aims to discover overlapping parts between multiple independent communities in a complex network, and it has a wide range of applications in social, transportation, public opinion, and even anti-terrorism fields. However, the current overlapping community mining algorithm based on label propagation shows strong randomness in networks with fuzzy community structure, resulting in low accuracy. Aiming at the problem of uncertainty and low accuracy caused by fuzzy boundaries of overlapping communities, a label propagation algorithm combining eigenvector centrality and label entropy (ECLE-LPA) is proposed. ECLE-LPA utilizes the K-nuclear iteration factor and the eigenvector centrality of the node to calculate the node influence and initialize the node label. In the propagation process, the label entropy and the closeness of the nodes are calculated to update the label list and the label membership, so as to overcome the recognition problem of fuzzy boundaries of overlapping communities. The experimental results show that: in real networks such as Les, Polbooks, Football, Polblogs, Netscience, etc., the EQ value of ECLE-LPA algorithm is generally increased by 1%~3% compared with the contrast algorithm. In the artificial network, the NMI value of ECLE-LPA is more than 10% higher than that of the contrast algorithm.

Key words: complex network, overlapping community, K-nuclear iteration factor, label propagation, eigenvector centrality, label entropy