Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (1): 150-159.
• Artificial Intelligence and Data Mining •
An attention-guided dual-granularity cross-modal medical representation learning framework

CHEN Xinran1, LIU Ning1, YAN Zhongmin1, LIU Lei2, CUI Lizhen1
Abstract: Deep learning has achieved significant results in medical image diagnosis, and models based on deep neural networks can effectively assist physicians in decision-making. However, as model parameter scales grow, large-parameter models in the medical domain increasingly face the challenge of data scarcity, since labeling high-quality medical image data must be done manually by professional physicians. One solution is to introduce the medical reports paired with medical images to guide training, which involves the interaction of two modalities. However, cross-modal alignment methods from the general domain fail to capture fine-grained detail and so cannot be applied directly to the medical domain. To address this issue, an attention-guided dual-granularity cross-modal medical representation learning framework, ADCRL, is proposed to align medical images and reports at both coarse and fine granularity. ADCRL extracts features from medical images and medical reports at two granularities, uses an attention-guided module to select image regions relevant to the medical task while removing noisy regions, and aligns the two modalities at each granularity through contrastive-learning-based proxy tasks. Trained under a self-supervised paradigm, ADCRL learns both the global and the detailed semantics of the two modalities, and achieves excellent downstream performance using only limited annotated data. The main contributions are a fine-grained feature selection method and a dual-granularity cross-modal feature learning framework, which is pretrained and validated on publicly available medical datasets.
Key words: deep learning, medical image, self-supervised learning, contrastive learning, pretraining model, data augmentation
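The two mechanisms the abstract describes can be sketched in a minimal form. This is an illustration, not the authors' implementation: `attention_select` stands in for the attention-guided module (here scoring image regions against a report-derived query vector and keeping the top-k, a hypothetical selection rule), and `info_nce` is the standard symmetric contrastive (InfoNCE) objective of the kind such proxy tasks are based on. All function names, dimensions, and hyperparameters are assumptions.

```python
import numpy as np

def l2norm(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_select(region_feats, query, keep=4):
    """Attention-guided selection (sketch): score each image region
    against a report-derived query vector, keep the `keep` highest-
    scoring regions, drop the rest as noise, and pool the kept
    regions with their attention weights."""
    scores = region_feats @ query            # (R,) relevance per region
    idx = np.argsort(scores)[-keep:]         # indices of the kept regions
    weights = softmax(scores[idx])           # attention over kept regions
    pooled = (weights[:, None] * region_feats[idx]).sum(axis=0)
    return pooled, idx

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched image-report pairs are positives,
    all other pairs in the batch serve as negatives."""
    img, txt = l2norm(img_emb), l2norm(txt_emb)
    logits = img @ txt.T / temperature       # (B, B) similarity matrix
    diag = np.arange(len(img))
    loss_i2t = -np.log(softmax(logits, axis=1)[diag, diag]).mean()
    loss_t2i = -np.log(softmax(logits, axis=0)[diag, diag]).mean()
    return (loss_i2t + loss_t2i) / 2

# Toy usage: 8 image regions, 16-dim features, a batch of 5 pairs.
rng = np.random.default_rng(0)
regions = rng.normal(size=(8, 16))
report_query = rng.normal(size=16)
fine_img_emb, kept = attention_select(regions, report_query, keep=4)

img_batch = rng.normal(size=(5, 16))
txt_batch = rng.normal(size=(5, 16))
loss = info_nce(img_batch, txt_batch)
```

In a dual-granularity setup of this kind, one InfoNCE term would be computed over global (coarse) image/report embeddings and another over the attention-selected (fine) region/phrase embeddings, with the two losses summed during pretraining.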
CHEN Xinran, LIU Ning, YAN Zhongmin, LIU Lei, CUI Lizhen. An attention-guided dual-granularity cross-modal medical representation learning framework[J]. Computer Engineering & Science, 2025, 47(1): 150-159.
URL: http://joces.nudt.edu.cn/EN/Y2025/V47/I1/150