• Official journal of the China Computer Federation
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (09): 1701-1710.

• Artificial Intelligence and Data Mining

Clinical assisted diagnosis based on heterogeneous graph medical record attention network

LI Yong1, FENG Li2, WANG Xia3

  (1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070;
   2. College of Information Engineering, Xinjiang Institute of Technology, Aksu 843100;
   3. Department of Pharmacy, the People’s Hospital of Gansu Province, Lanzhou 730000, China)
  • Received: 2022-04-08 Revised: 2022-06-09 Accepted: 2023-09-25 Online: 2023-09-25 Published: 2023-09-12
  • Funding: National Natural Science Foundation of China (62163033)

Abstract: Automatically extracting valuable information from electronic medical records (EMRs) to assist disease diagnosis is of considerable theoretical and practical importance for clinical decision support and smart-hospital construction. However, the distribution of conditions in EMRs is imbalanced, which leaves some diseases with too few records for assisted diagnosis. In addition, traditional methods ignore the heterogeneity of medical records and their multi-source contextual information. Both problems lower disease prediction accuracy. This paper proposes HCAD, a clinical assisted diagnosis prediction model based on a heterogeneous graph medical record attention network. First, an external medical knowledge graph is constructed to address the imbalance in EMR data. Second, contextual information such as patient condition descriptions and physiological records is integrated, and a node-level attention mechanism and a semantic-relation-level attention mechanism are designed to identify how important individual nodes and different semantic relations are. Finally, hierarchical aggregation yields highly representative patient node embeddings, from which diseases are predicted. Experiments on a real-world EMR dataset show that HCAD is feasible, effective, and interpretable, improving the F1 score by 7.45% on average over the baseline models.

Key words: electronic medical record, heterogeneous graph medical record network, meta-path, disease prediction
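The two-level attention described in the abstract (node-level attention, then semantic-relation-level attention, combined by hierarchical aggregation) can be illustrated with a minimal sketch. It assumes a HAN-style formulation (HAN, the Heterogeneous Graph Attention Network): node-level attention over meta-path-based neighbours, followed by semantic-level attention over meta-paths. The abstract does not give HCAD's equations, so this is not the authors' implementation. The meta-path names (`P-S-P`, `P-K-P`), patient IDs, feature vectors, and the fixed parameters `a`, `W`, `b`, `q` are hypothetical; type-specific feature projections are omitted, and tanh is assumed as the output nonlinearity.

```python
# Sketch of HAN-style hierarchical attention (an assumption, not HCAD's published code).
import math
from typing import Dict, List, Tuple

Vector = List[float]
Neighbors = Dict[str, List[str]]  # node -> meta-path-based neighbours (including itself)


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def node_level_attention(h: Dict[str, Vector], neighbors: Neighbors, a: Vector) -> Dict[str, Vector]:
    """One meta-path: e_ij = LeakyReLU(a^T [h_i || h_j]), alpha_ij = softmax_j(e_ij),
    z_i = tanh(sum_j alpha_ij * h_j). `a` has length 2 * dim."""
    z: Dict[str, Vector] = {}
    for i, nbrs in neighbors.items():
        # h[i] + h[j] concatenates the two lists, i.e. [h_i || h_j]
        alphas = softmax([leaky_relu(dot(a, h[i] + h[j])) for j in nbrs])
        dim = len(h[i])
        agg = [sum(w * h[j][k] for w, j in zip(alphas, nbrs)) for k in range(dim)]
        z[i] = [math.tanh(x) for x in agg]
    return z


def semantic_level_attention(
    z_by_path: Dict[str, Dict[str, Vector]], W: List[Vector], b: Vector, q: Vector
) -> Tuple[Dict[str, float], Dict[str, Vector]]:
    """Score each meta-path as mean_i q^T tanh(W z_i + b), softmax the scores into
    weights beta, and return each node's beta-weighted sum of its embeddings."""
    paths = list(z_by_path)
    scores = []
    for p in paths:
        z = z_by_path[p]
        total = 0.0
        for zi in z.values():
            hidden = [math.tanh(dot(row, zi) + bk) for row, bk in zip(W, b)]
            total += dot(q, hidden)
        scores.append(total / len(z))
    betas = softmax(scores)
    first = z_by_path[paths[0]]  # assumes every meta-path covers the same nodes
    fused = {
        i: [sum(beta * z_by_path[p][i][k] for beta, p in zip(betas, paths)) for k in range(len(first[i]))]
        for i in first
    }
    return dict(zip(paths, betas)), fused


def hierarchical_aggregate(h, meta_paths, a_by_path, W, b, q):
    """Node-level attention per meta-path, then semantic-level attention across meta-paths."""
    z_by_path = {p: node_level_attention(h, nbrs, a_by_path[p]) for p, nbrs in meta_paths.items()}
    return semantic_level_attention(z_by_path, W, b, q)


# Toy example: hypothetical patients, meta-paths and fixed parameters
# (in a trained model a, W, b and q would be learned).
h = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
meta_paths = {
    "P-S-P": {"P1": ["P1", "P3"], "P2": ["P2", "P3"], "P3": ["P3", "P1", "P2"]},  # shared symptom
    "P-K-P": {"P1": ["P1", "P2"], "P2": ["P2", "P1"], "P3": ["P3"]},  # shared knowledge-graph concept
}
a_by_path = {"P-S-P": [0.5, 0.5, 0.5, 0.5], "P-K-P": [0.1, -0.1, 0.2, 0.3]}
betas, patient_repr = hierarchical_aggregate(
    h, meta_paths, a_by_path, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0]
)
print(betas)
print(patient_repr)
```

The `betas` weights indicate how much each (hypothetical) meta-path contributes, which is the kind of signal that supports the interpretability claim; the fused vectors in `patient_repr` would then feed a disease classifier, which this sketch leaves out.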