• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering &amp; Science ›› 2026, Vol. 48 ›› Issue (2): 372-380.

• Artificial Intelligence and Data Mining •


• Funding:
    Natural Science Foundation of Ningxia (2023AAC03316); Key Scientific Research Project of Higher Education Institutions of the Education Department of Ningxia Hui Autonomous Region (NYG2022051)

Adaptive fusion method for multimodal entity alignment

WANG Yiyan, WANG Hairong, WANG Yimeng, WANG Wenlong


(1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021;
    2. School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China)
  • Received: 2024-04-08  Revised: 2024-09-26  Online: 2026-02-25  Published: 2026-03-10



Abstract: To address the problems of information loss during feature fusion and of incorrect alignments caused by focusing solely on joint entity vectors in multimodal entity alignment, this paper proposes an adaptive fusion-based multimodal entity alignment method (ADMMEA). The method employs FastText, ResNet-152, and GAT models to extract multimodal entity features, obtaining feature representations for entity names, images, and structural data. It uses the Bray-Curtis dissimilarity matrix and the Levenshtein distance to compute the similarity between source and target entities, generating a distance matrix for each modality. An adaptive fusion strategy fuses the text and image distance matrices, which are then concatenated with the structural information matrix to obtain the final fused matrix. Finally, a ranking approach sorts the fused matrix by similarity score in descending order to perform multimodal entity alignment. Experiments are conducted on the ZH-EN, JA-EN, and FR-EN subsets of the DBP15K dataset, and the results are compared with 13 other methods, including JAPE, RDGCN, MOGNN, and MIMEA. ADMMEA achieves Hits@1 scores of 0.985, 0.995, and 0.994 on ZH-EN, JA-EN, and FR-EN, respectively, validating its effectiveness.
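As a rough illustration of the distance-matrix pipeline the abstract describes, the sketch below computes a Bray-Curtis distance matrix over (pre-extracted) feature vectors, a normalized Levenshtein distance matrix over entity names, and combines them before ranking candidates. The convex-combination weight `alpha` and all function names are illustrative assumptions; the paper's actual adaptive weighting, the feature extractors (FastText, ResNet-152, GAT), and the structural-matrix concatenation step are not reproduced here.

```python
import numpy as np

def bray_curtis_matrix(A, B):
    """Pairwise Bray-Curtis dissimilarity between rows of A and rows of B:
    d(u, v) = sum(|u - v|) / sum(|u + v|)."""
    diff = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    total = np.abs(A[:, None, :] + B[None, :, :]).sum(axis=2)
    return diff / np.maximum(total, 1e-12)  # guard against division by zero

def levenshtein(s, t):
    """Classic edit-distance dynamic program, row by row."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution
        prev = cur
    return prev[n]

def name_distance_matrix(src_names, tgt_names):
    """Length-normalized Levenshtein distances between entity-name strings."""
    D = np.zeros((len(src_names), len(tgt_names)))
    for i, s in enumerate(src_names):
        for j, t in enumerate(tgt_names):
            D[i, j] = levenshtein(s, t) / max(len(s), len(t), 1)
    return D

def fuse(D_text, D_img, alpha=0.5):
    """Convex combination of text and image distance matrices -- one simple
    stand-in for an adaptive fusion weighting (alpha is a placeholder)."""
    return alpha * D_text + (1 - alpha) * D_img

def align(D_fused):
    """Rank target entities per source entity by ascending distance
    (i.e. descending similarity); return the top-1 prediction."""
    return np.argsort(D_fused, axis=1)[:, 0]
```

With gold alignment pairs, Hits@1 is then simply the fraction of rows where `align(D_fused)` returns the gold target index; sorting the full `argsort` output likewise yields Hits@k and MRR.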

Key words: multimodal knowledge graphs, multimodal entity alignment, embedding model, adaptive fusion, matching problem