Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 363-371.
• Artificial Intelligence and Data Mining •
LU Shunyi,HE Qing
Abstract: Traditional text semantic matching methods struggle to capture deep semantic features and the interaction relationships between texts. To address this, this paper proposes an article pair matching model based on multi-feature fusion of pre-trained language models (MF-APM). First, a data augmentation strategy prunes the article content, retaining only the key sentences. Second, the augmented news documents are fed into a Longformer model with a Siamese network architecture to extract deep features of the article content, and document matching information is obtained through attention-based feature fusion. Third, BERT is used to interactively encode the news headlines, and the resulting encoded vectors are passed to a multi-head attention mechanism that extracts deep interactive features of the headlines, yielding headline interaction information. Finally, the semantic features of the headline interaction information and the document matching information are fused by max-pooling feature fusion to predict the relationship between the two texts. In addition, PolyLoss replaces the traditional binary cross-entropy loss function during model training, which reduces the complexity of hyperparameter tuning. The proposed MF-APM model is compared with other matching models on two datasets, CNSE and CNSS. Experimental results show that, compared with the baseline models, MF-APM improves accuracy by 0.41 and 1.59 percentage points on CNSE and CNSS, respectively, and improves F1-score by 4.64 and 1.66 percentage points, respectively, demonstrating improved performance on the article pair matching task.
Key words: pre-trained language model, long text matching, multi-head attention mechanism, attention feature fusion, PolyLoss function
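The two training-side components named in the abstract, PolyLoss and max-pooling feature fusion, can be illustrated with a minimal sketch. The abstract does not state which PolyLoss variant or which fusion shape MF-APM uses, so the code below assumes the common Poly-1 form (cross-entropy plus epsilon times (1 - p_t)) and an element-wise maximum over two equal-length feature vectors. The function names `poly1_bce` and `max_pool_fuse` and the default `epsilon=1.0` are hypothetical and chosen only for illustration.

```python
import math


def poly1_bce(prob, label, epsilon=1.0):
    """Poly-1 loss for a single binary prediction (assumed variant).

    prob:    predicted probability of the positive class, in [0, 1].
    label:   ground-truth class, 0 or 1.
    epsilon: weight of the first polynomial term; epsilon = 0 recovers
             plain binary cross-entropy.
    """
    # Probability the model assigns to the true class.
    p_t = prob if label == 1 else 1.0 - prob
    # Clamp to avoid log(0) on fully confident wrong predictions.
    p_t = min(max(p_t, 1e-12), 1.0)
    cross_entropy = -math.log(p_t)
    return cross_entropy + epsilon * (1.0 - p_t)


def max_pool_fuse(headline_vec, document_vec):
    """Element-wise maximum of two equal-length feature vectors
    (one assumed reading of max-pooling feature fusion)."""
    if len(headline_vec) != len(document_vec):
        raise ValueError("feature vectors must have the same length")
    return [max(h, d) for h, d in zip(headline_vec, document_vec)]


if __name__ == "__main__":
    fused = max_pool_fuse([0.2, -1.0, 3.0], [0.5, -2.0, 1.0])
    print(fused)
    print(poly1_bce(0.5, 1, epsilon=1.0))
```

In this form the only extra hyperparameter relative to binary cross-entropy is `epsilon`, which is consistent with the abstract's remark about reduced tuning complexity, though the value used in the paper is not given here.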
LU Shunyi, HE Qing. Article pair matching model based on multi-feature fusion of pre-trained language models[J]. Computer Engineering & Science, 2026, 48(2): 363-371.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2026/V48/I2/363