
Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 150-159.

• Artificial Intelligence and Data Mining •

  • Funding:
    Young Scientists Fund of the Natural Science Foundation of Shandong Province (ZR2022QF114)

An attention-guided dual-granularity cross-modal medical representation learning framework

CHEN Xinran1,LIU Ning1,YAN Zhongmin1,LIU Lei2,CUI Lizhen1   

  1. School of Software, Shandong University, Jinan 250101, China;
    2. Shandong Research Institute of Industrial Technology, Jinan 250100, China
  • Received: 2024-06-27  Revised: 2024-08-29  Accepted: 2025-01-25  Online: 2025-01-25  Published: 2025-01-18



Abstract: Deep learning has achieved remarkable results in medical imaging diagnosis, and models based on deep neural networks can effectively assist doctors in decision-making. However, as model parameter counts grow, and because high-quality medical images must be labeled manually by professional physicians, large-parameter models in the medical domain increasingly face data scarcity. One solution is to guide training with the medical reports paired with medical images, which involves interaction between two modalities; yet cross-modal alignment methods from the general domain fail to capture fine-grained detail and are not fully applicable to the medical domain. To address this issue, an attention-guided dual-granularity cross-modal medical representation learning framework, ADCRL, is proposed to align medical images and reports at both coarse and fine granularity. ADCRL extracts features from medical images and reports at the two granularities, uses an attention-guided module to select image regions likely to be of interest to medical tasks while discarding noisy regions, and aligns the two modalities at both granularities through contrastive-learning proxy tasks. ADCRL trains the model in an unsupervised paradigm to understand the global and fine-grained semantics of the two modalities, and achieves excellent performance on downstream tasks with only limited annotated data. The main contributions are a fine-grained feature selection method and a dual-granularity cross-modal feature learning framework, which is pretrained and validated on publicly available medical datasets.
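The two mechanisms the abstract describes — attention-guided selection of task-relevant image regions and contrastive alignment of paired image/report embeddings — can be sketched generically as follows. This is a minimal illustration, not ADCRL's actual implementation: the function names `select_regions` and `info_nce`, the top-k selection rule, and the temperature value are all assumptions; the paper's attention module and proxy tasks are more involved.

```python
import numpy as np

def select_regions(patch_feats, attn_scores, k):
    """Keep the k patch features with the highest attention scores.

    A toy stand-in for attention-guided region selection: patches with
    low scores are treated as noise and dropped. In a real model the
    scores would come from the image encoder's attention weights.
    """
    idx = np.argsort(attn_scores)[::-1][:k]  # indices of the top-k scores
    return patch_feats[np.sort(idx)]         # keep original spatial order

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    A generic contrastive-alignment objective: matched image/report
    pairs (the diagonal of the similarity matrix) are pulled together,
    mismatched pairs are pushed apart.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

The same loss can be applied at both granularities — once over global image/report embeddings (coarse) and once over selected region/sentence embeddings (fine) — which is the dual-granularity idea in spirit, if not in the paper's exact form.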

Key words: deep learning, medical image, self-supervised learning, contrastive learning, pre-trained model, data augmentation