• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (06): 1121-1132.

• Artificial Intelligence and Data Mining •

DRG medical Q&A research based on both knowledge and data

XU Chun, SUN Enwei, WANG Xiaojie

  1. (School of Information Management, Xinjiang University of Finance & Economics, Urumqi 830012, China)
  • Received: 2024-05-04 Revised: 2024-07-02 Online: 2025-06-25 Published: 2025-06-26
  • Supported by:
    National Natural Science Foundation of China (6226604); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023D01A73)

Abstract: Real electronic medical records annotated with diagnosis related groups (DRG) codes are too scarce for language models to learn text features from, and existing disease coding models struggle to explain their results on complex text. To address this, the paper designs GLM-2B-DRAGON (generative language model-deep bidirectional language-knowledge graph pretraining), a medical question answering model that integrates a medical knowledge graph with a large language model. First, the ChatGLM-6B model is used to extract and update medical entities and their relations, yielding DRG-Net, a knowledge graph that covers DRG codes and other medical knowledge. Second, a cross-modal encoder jointly encodes QA pairs and the knowledge graph, so that information flows in both directions between text and graph and the two complement each other in capturing medical text features. Finally, the interpretability of the answers is verified by visualizing the path weights in the knowledge graph. Experiments show that the proposed model outperforms existing knowledge-graph-enhanced language models on the public dataset CommonsenseQA and the self-built medical dataset MedicalQA.
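
The abstract does not spell out how the cross-modal encoder exchanges information between text and graph. As a rough illustration only, the sketch below assumes a fusion step of the kind used in DRAGON-style models: a special interaction token on the text side and a special interaction node on the graph side are concatenated, passed through a small MLP, and split back, so each side leaves carrying information from the other. All names, dimensions, and weights here (`fuse`, `linear`, `random_matrix`, the toy vectors) are hypothetical and are not taken from the paper.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def fuse(int_token, int_node, w1, b1, w2, b2):
    """Mix the text-side interaction token with the graph-side interaction node.

    The two vectors are concatenated, passed through a two-layer MLP, and
    split back into a text part and a graph part.
    """
    joint = int_token + int_node                  # concatenate text and graph states
    hidden = [math.tanh(h) for h in linear(joint, w1, b1)]
    mixed = linear(hidden, w2, b2)
    d_text = len(int_token)
    return mixed[:d_text], mixed[d_text:]         # updated token, updated node

def random_matrix(rows, cols, rng):
    """Toy weight matrix; a real model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_text, d_graph = 4, 3                            # assumed toy dimensions
d_joint = d_text + d_graph
w1, b1 = random_matrix(d_joint, d_joint, rng), [0.0] * d_joint
w2, b2 = random_matrix(d_joint, d_joint, rng), [0.0] * d_joint

token = [0.2, -0.1, 0.4, 0.0]   # toy interaction-token state from the language model
node = [1.0, 0.5, -0.3]         # toy interaction-node state from the graph encoder
new_token, new_node = fuse(token, node, w1, b1, w2, b2)
```

In a full model this step would sit between a language-model layer and a graph-neural-network layer and be repeated across several layers; the sketch shows a single exchange with untrained weights.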

Key words: disease diagnosis related grouping, medical Q&A, medical knowledge graph, large language model