• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Graph Prompt-Based Few-Shot Node Classification by Preserving Node Cluster Distribution

XIE Qiuyuan, LI Qiuyao, CHAI Bianfang

  1. (School of Information Engineering, Hebei GEO University, Shijiazhuang 050030)

Abstract: In graph mining tasks, prototype-based graph prompt learning is widely regarded as an effective way to improve the performance of graph data analysis. In few-shot node classification, however, existing methods make insufficient use of unlabeled data, which leads to inaccurately constructed class prototypes, and they do not fully exploit the topological structure of the graph. These shortcomings limit the effectiveness of graph prompt learning on downstream tasks. To address them, this paper proposes PNCD-GP, a graph prompt learning method that incorporates the cluster distribution of all nodes and aims to improve analysis performance and accuracy by fully exploiting both the cluster distribution of unlabeled data and the graph's topological structure. In the pre-training stage, mask prediction and node-cluster preservation are adopted as optimization strategies, so that the model learns discriminative representations and the gap between upstream and downstream tasks is narrowed. In the prompt-tuning stage, class-prototype virtual nodes are inserted into the original graph as prompts, and high-order information is used to enhance the topology, improving the model's understanding and use of the graph structure; the prompts are learned by preserving the cluster distributions of unlabeled and labeled nodes. The method constructs more accurate prototype vectors and classifies each node by the similarity between the class prototypes and its representation. Experimental results on multiple public graph datasets show that PNCD-GP has significant advantages in both efficiency and accuracy.

Key words: graph mining, graph prompt learning, graph neural network, few-shot, clustering
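The abstract classifies nodes by their similarity to class prototypes and argues that prototypes built only from the few labeled nodes are inaccurate because unlabeled data goes unused. The Python sketch below illustrates that general idea with a standard prototypical classifier plus one soft k-means refinement step over unlabeled nodes. It is not PNCD-GP: the node embeddings are plain toy vectors standing in for a pre-trained encoder's output, there are no virtual prompt nodes or graph edges, and all function names, the cosine similarity and the temperature `tau` are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_prototypes(embeddings: Dict[int, Vector],
                     labels: Dict[int, int]) -> Dict[int, Vector]:
    """Hypothetical helper: class prototype = mean embedding of its labeled nodes."""
    by_class: Dict[int, List[Vector]] = {}
    for node, cls in labels.items():
        by_class.setdefault(cls, []).append(embeddings[node])
    return {cls: mean_vector(vs) for cls, vs in by_class.items()}


def soft_assign(z: Vector, prototypes: Dict[int, Vector],
                tau: float = 0.1) -> Dict[int, float]:
    """Softmax over temperature-scaled cosine similarities to each prototype."""
    scores = {c: cosine(z, p) / tau for c, p in prototypes.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def refine_prototypes(embeddings: Dict[int, Vector], labels: Dict[int, int],
                      prototypes: Dict[int, Vector],
                      tau: float = 0.1) -> Dict[int, Vector]:
    """One soft k-means step (an assumed stand-in, not the paper's objective):
    labeled nodes contribute weight 1 to their own class, unlabeled nodes
    contribute their soft-assignment weights to every class."""
    dim = len(next(iter(prototypes.values())))
    sums = {c: [0.0] * dim for c in prototypes}
    weights = {c: 0.0 for c in prototypes}
    for node, z in embeddings.items():
        if node in labels:
            dist = {labels[node]: 1.0}
        else:
            dist = soft_assign(z, prototypes, tau)
        for c, w in dist.items():
            weights[c] += w
            sums[c] = [s + w * x for s, x in zip(sums[c], z)]
    # Every class has at least one labeled node, so weights[c] > 0.
    return {c: [s / weights[c] for s in sums[c]] for c in prototypes}


def classify(z: Vector, prototypes: Dict[int, Vector]) -> int:
    """Predict the class whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda c: cosine(z, prototypes[c]))


if __name__ == "__main__":
    # Toy embeddings for four nodes; nodes 0 and 1 form the labeled support set.
    embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.9, 0.1], 3: [0.2, 0.8]}
    labels = {0: 0, 1: 1}
    protos = build_prototypes(embeddings, labels)
    protos = refine_prototypes(embeddings, labels, protos)
    preds = {n: classify(z, protos) for n, z in embeddings.items() if n not in labels}
    print(preds)  # {2: 0, 3: 1}
```

The refinement step is the point of contact with the abstract's motivation: unlabeled nodes pull each prototype toward the region where that class's nodes actually cluster, instead of leaving it fixed at the mean of a handful of labeled examples. Cosine similarity is an arbitrary choice here; squared Euclidean distance, as used in prototypical networks, would fit the sketch equally well.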