• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (04): 635-646.

• Computer Network and Information Security •

Semi-supervised website topic classification based on heterogeneous graph neural network

WANG Xie-zhong1, CHEN Xu1, JING Yong-jun1, WANG Shu-yang2

  (1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750000;
    2. School of Electrical and Information Engineering, North Minzu University, Yinchuan 750000, China)
  • Received: 2023-09-06 Revised: 2023-10-17 Accepted: 2024-04-25 Online: 2024-04-25 Published: 2024-04-18

Abstract: The rapid growth in the number of Internet websites has made it difficult for existing methods to classify website topics accurately. URL-based methods, for example, cannot handle topic information that is not reflected in the URL, while content-based methods are limited by data sparsity and have difficulty capturing semantic relationships. To address these problems, a semi-supervised website topic classification method based on a heterogeneous graph neural network, HGNN-SWT, is proposed. The method uses website text features to compensate for the limitations of relying on URL features alone, and it models the sparse relationships between website texts and words as a heterogeneous graph, improving classification performance by exploiting the node and edge relationships in the graph. A neighbor node sampling method based on random walks is introduced that takes into account both the local features of nodes and the global graph structure. In addition, a feature fusion strategy is proposed to capture contextual relationships and feature interactions in website text data. Experimental results on a self-built Chinaz Website dataset show that HGNN-SWT achieves higher accuracy in website topic classification than existing methods.

Key words: website topic, heterogeneous graph neural network, semi-supervised, feature fusion
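
The abstract mentions a text-word heterogeneous graph and random-walk neighbor sampling but gives no implementation details. The following is a minimal illustrative sketch, not the authors' HGNN-SWT code. It assumes a random walk with restart followed by selection of the most frequently visited nodes of each type, a scheme commonly used in heterogeneous graph samplers. The function names, parameters (`walk_length`, `restart_prob`, `top_k`), and toy documents are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_text_word_graph(docs):
    """Build an undirected graph linking each text node to the word nodes it contains."""
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            graph[("text", doc_id)].add(("word", word))
            graph[("word", word)].add(("text", doc_id))
    return graph


def sample_neighbors(graph, start, walk_length=200, restart_prob=0.3, top_k=3, seed=0):
    """Random walk with restart from `start` (an assumed scheme, see lead-in).

    Returns, for each node type, up to `top_k` of the most frequently visited nodes.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_length):
        # With probability restart_prob, jump back to the start node.
        if rng.random() < restart_prob:
            current = start
            continue
        neighbors = sorted(graph[current])  # sorted for reproducibility
        if not neighbors:
            current = start
            continue
        current = rng.choice(neighbors)
        if current != start:
            visits[current] += 1
    # Group the most visited nodes by type, keeping at most top_k per type.
    sampled = defaultdict(list)
    for node, _count in visits.most_common():
        node_type = node[0]
        if len(sampled[node_type]) < top_k:
            sampled[node_type].append(node)
    return dict(sampled)


if __name__ == "__main__":
    # Toy website texts (hypothetical data).
    docs = {
        "site_a": "online shop buy phones",
        "site_b": "buy phones and laptops",
        "site_c": "football news scores",
    }
    graph = build_text_word_graph(docs)
    print(sample_neighbors(graph, ("text", "site_a")))
```

Frequent restarts keep the walk near the start node, which favors local neighbors, while longer excursions can reach texts that share words with it, which reflects some of the wider graph structure. Grouping the visited nodes by type gives each node a bounded set of text and word neighbors for later aggregation.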