• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (3): 500-511.

• Graphics and Images •

An adversarial examples defense method for image reconstruction based on SCViT

ZHANG Xinjun, GUO Jifa

  1. (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China)
  • Received: 2024-05-21  Revised: 2024-09-13  Online: 2026-03-25  Published: 2026-03-25

Abstract: The rapid development of artificial intelligence (AI) has brought great convenience to people's lives, but it has also raised growing concerns about its security. Image classification is a crucial task in computer vision; however, the vulnerability of deep neural networks makes them susceptible to attacks from adversarial examples. Adversarial examples are a significant research direction in AI security, and numerous techniques have emerged for both generating and defending against them. This paper introduces modifications to the vision Transformer (ViT) and proposes a novel model, the similarity comparison vision Transformer (SCViT), for comparing the similarity of image patches. In SCViT, image patches are passed through a linear projection layer and a Transformer encoder to obtain corresponding representation vectors, and the cosine similarity between these vectors is then computed to measure the degree of similarity between image patches. To mitigate the influence of positional encoding on the similarity computation, a small coefficient, denoted α, is introduced before the positional encoding in SCViT. Using SCViT for patch similarity comparison, clean sample patches replace adversarial sample patches one by one, and all of the replaced clean patches are then stitched together to form a new image for classification. Experimental results on the CIFAR-10 dataset demonstrate that selecting an appropriate value of α can enhance the defensive performance of the proposed method, and experiments on the Inception_v3 and Inception_v4 classification models indicate that the method transfers well across different classification networks. Compared with several commonly used image reconstruction defense methods, the proposed method not only achieves superior defensive performance but also demonstrates greater robustness, with image classification accuracy exceeding 80% against four types of attack methods. Additionally, experiments on the CIFAR-100 and ImageNet datasets show that classification accuracy on adversarial examples improves by more than 54 and 46 percentage points, respectively, highlighting the versatility of the proposed method.
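The defense pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SCViT implementation: the linear projection `W`, the positional encoding `pos_enc`, the clean-patch bank, and all shapes are hypothetical stand-ins, and the full Transformer encoder is reduced to a single projection. The sketch shows only the mechanism the abstract names: embed patches with an α-scaled positional encoding, compare embeddings by cosine similarity, and replace each adversarial patch with its most similar clean patch before reassembly.

```python
import numpy as np

def cosine_similarity(u, v):
    # cosine similarity between two representation vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed_patches(patches, W, pos_enc, alpha=0.1):
    # Stand-in for SCViT's linear projection + Transformer encoder.
    # The positional encoding is scaled by a small coefficient alpha,
    # mirroring the paper's idea of damping position's influence on
    # the similarity score.
    flat = patches.reshape(patches.shape[0], -1)
    return flat @ W + alpha * pos_enc

def reconstruct(adv_patches, clean_bank, W, pos_enc, alpha=0.1):
    # Replace each (possibly adversarial) patch with the most similar
    # clean patch from the bank, then return the replaced patches,
    # ready to be stitched back into an image for classification.
    adv_emb = embed_patches(adv_patches, W, pos_enc, alpha)
    bank_emb = clean_bank.reshape(clean_bank.shape[0], -1) @ W
    out = np.empty_like(adv_patches)
    for i, e in enumerate(adv_emb):
        sims = [cosine_similarity(e, b) for b in bank_emb]
        out[i] = clean_bank[int(np.argmax(sims))]
    return out
```

With α = 0 the positional term vanishes and an unperturbed patch always matches itself in the bank; a nonzero α trades off content similarity against positional agreement, which is why the abstract reports that the choice of α affects defensive performance.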


Key words: image classification, adversarial example, image stitching, vision Transformer, Poisson fusion