• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (7): 1303-1311.

• Artificial Intelligence and Data Mining •

A tumor disease prediction model based on PKUSEG-Text-GCN

GAO Zhiling1, ZHAO Xinyu1,2

  (1. College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China;
    2. New H3C Technologies Co., Ltd., Beijing 310052, China)
  • Received: 2023-12-01 Revised: 2024-09-04 Online: 2025-07-25 Published: 2025-08-25
  • Supported by:
    National Natural Science Foundation of China (62363031)

Abstract: Current disease prediction models focus primarily on local and contextual information in medical record text and do not take global information into account, which limits prediction accuracy. Because graph neural networks can capture global information, this study proposes using a graph convolutional network (GCN) to predict tumor diseases from Chinese electronic medical records (EMRs). First, the medical-domain word segmentation toolkit PKUSEG is used to segment the Chinese EMRs. Next, a text graph is built from the co-occurrence relationships between medical records and words and from the relationships between words within the record text. Finally, a text graph convolutional network (Text-GCN) learns features from this medical text graph, and the trained model is used for tumor disease prediction. Experimental results show that the proposed model improves accuracy by 6% over the best-performing of several baseline models. Moreover, accuracy does not drop noticeably when less data is available, indicating that the model remains robust even with few electronic medical records.

Key words: text graph convolutional network, Chinese word segmentation, tumor disease analysis, tumor disease prediction
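
The three steps in the abstract (segmentation, text graph construction, graph convolution) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the edge weights of the original Text-GCN formulation (TF-IDF for record-word edges, positive PMI over a sliding window for word-word edges), which the abstract does not confirm. The toy records, window size, layer sizes, and random weights are all made up for illustration, and training is omitted.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Step 1 (assumed): records are already segmented. With the pkuseg package this
# would look like pkuseg.pkuseg(model_name="medicine").cut(text); here we use
# made-up, pre-tokenized toy records so the sketch has no external dependency.
docs = [
    ["cough", "chest", "pain", "lung", "nodule"],
    ["stomach", "pain", "gastric", "ulcer"],
    ["lung", "nodule", "cough", "blood"],
]

def build_text_graph(docs, window=3):
    """Step 2: one node per record and per word; returns (vocab, dense adjacency)."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    idx = {w: n_docs + i for i, w in enumerate(vocab)}
    n = n_docs + len(vocab)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops

    # Record-word edges weighted by smoothed TF-IDF (assumption).
    df = Counter(w for d in docs for w in set(d))
    for d_i, doc in enumerate(docs):
        for w, c in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
            adj[d_i][idx[w]] = adj[idx[w]][d_i] = (c / len(doc)) * idf

    # Word-word edges weighted by positive PMI over sliding windows (assumption).
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(set(doc))
        else:
            windows.extend(set(doc[i:i + window]) for i in range(len(doc) - window + 1))
    single = Counter(w for win in windows for w in win)
    pair = Counter(p for win in windows for p in combinations(sorted(win), 2))
    for (a, b), c in pair.items():
        pmi = math.log(c * len(windows) / (single[a] * single[b]))
        if pmi > 0:
            adj[idx[a]][idx[b]] = adj[idx[b]][idx[a]] = pmi
    return vocab, adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    return [[a / math.sqrt(deg[i] * deg[j]) for j, a in enumerate(row)]
            for i, row in enumerate(adj)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def gcn_forward(a_hat, w0, w1):
    """Step 3: two-layer GCN with one-hot node features, softmax(A ReLU(A W0) W1)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(a_hat, w0)]
    logits = matmul(matmul(a_hat, hidden), w1)
    probs = []
    for row in logits:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        probs.append([e / sum(exps) for e in exps])
    return probs

vocab, adj = build_text_graph(docs)
a_hat = normalize(adj)

random.seed(0)
n, hidden_dim, n_classes = len(adj), 4, 2  # hypothetical sizes
w0 = [[random.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(n)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_classes)] for _ in range(hidden_dim)]
probs = gcn_forward(a_hat, w0, w1)  # untrained: rows are class probabilities only
```

In this sketch the first `len(docs)` rows of `probs` correspond to the record nodes. In a trained model, the weights would be fitted with a cross-entropy loss on the labeled records, and the predicted class would be read from those rows.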
