• A journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (1): 70-78.

• Computer Networking and Information Security •

  • Supported by: Civil Aviation Safety Capability Project (ATSA20220038)

A generic perturbation-based defense framework against backdoor attacks

RAO Yue, MA Xiaoning, CHENG Zhongfeng

  (1. School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China;
    2. TravelSky Technology Limited, Beijing 101318, China)
  • Received: 2024-05-14  Revised: 2024-09-12  Online: 2026-01-25  Published: 2026-01-25


Abstract: Recent studies have shown that deep neural networks (DNNs) are vulnerable to backdoor attacks, which are stealthy and powerful enough to make the model output the results the attacker expects. To address the problem that current backdoor defenses incur high computational overhead while also degrading model accuracy, a generic perturbation-based defense framework is proposed that combines backdoor detection with backdoor elimination. The detection phase generates, over a sample set, generic perturbations that cause the model to misclassify benign samples while leaving backdoor samples unaffected; backdoor samples are then detected efficiently by comparing the model's outputs before and after the perturbation is added to each sample under test. In the elimination phase, the detected backdoor samples are reconstructed using a random primary-color overlay method, mixed with the benign samples, deduplicated, and used to retrain the backdoored model. The framework is validated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, examining the impact of different trigger designs and poisoning ratios on the defense, as well as its effectiveness against specific-label attacks. Experimental results demonstrate that the framework not only significantly reduces the success rate of backdoor attacks under various conditions but also has minimal impact on the classification performance of benign samples. Additionally, compared with previous studies, it substantially improves the defense against specific-label attacks.
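The detection phase described above can be sketched as follows. This is a minimal toy illustration of the idea, not the paper's implementation: the stand-in model, the trigger convention (a "hot" first pixel), and the perturbation values are all illustrative assumptions.

```python
import numpy as np

def detect_backdoor_samples(model_predict, samples, perturbation):
    """Flag samples whose predicted label is unchanged by the generic
    perturbation. The perturbation is crafted to misclassify benign
    inputs while leaving triggered inputs alone, so an unchanged label
    marks a suspected backdoor sample."""
    before = model_predict(samples)
    after = model_predict(np.clip(samples + perturbation, 0.0, 1.0))
    return before == after  # True -> suspected backdoor sample

def toy_predict(x):
    """Toy stand-in model: outputs class 1 whenever the trigger pixel
    (index 0) is hot; otherwise classifies by mean intensity."""
    trigger = x[:, 0] > 0.9
    benign_cls = (x.mean(axis=1) > 0.5).astype(int)
    return np.where(trigger, 1, benign_cls)

rng = np.random.default_rng(0)
benign = rng.uniform(0.0, 0.4, size=(4, 8))  # low intensity -> class 0
backdoored = benign.copy()
backdoored[:, 0] = 1.0                       # stamp the trigger pixel
perturbation = np.full(8, 0.5)               # shifts benign means past 0.5

# Benign predictions flip under the perturbation (not flagged);
# triggered predictions stay at the target class (flagged).
flags_benign = detect_backdoor_samples(toy_predict, benign, perturbation)
flags_backdoor = detect_backdoor_samples(toy_predict, backdoored, perturbation)
```

In the real framework the perturbation is learned against the trained model rather than hand-set, but the decision rule is the same: stability of the output under a perturbation that should flip benign inputs is the detection signal.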


Key words: deep neural network (DNN); generic perturbation; specific label attack; backdoor attack; backdoor defense