• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (4): 617-627.

• Computer Networks and Information Security •

A reinforcement learning-based method for generating adversarial examples against PE malware

ZHANG Chaoran, MA Yuqi, ZHANG Sanfeng, YANG Wang

  (1. School of Cyber Science and Engineering, Southeast University, Nanjing 211189;
    2. Key Laboratory of Computer Network and Information Integration (Southeast University),
    Ministry of Education, Nanjing 211189, China)
  • Received: 2024-02-27 Revised: 2024-09-24 Online: 2026-04-25 Published: 2026-04-29
  • Supported by:
    National Key Research and Development Program of China (2022YFB3104601)

Abstract: This paper proposes a reinforcement learning-based method for generating adversarial examples against PE malware. The method treats adversarial example generation for PE malware as a sequence-to-sequence generation task: it performs sequence modeling on an offline reinforcement learning dataset and uses the sequence generation capability of the Transformer to build the sequence step by step, predicting one action at a time. In addition, an information transmission mechanism is introduced to pass information across episodes during reinforcement learning, which improves data efficiency. Experimental results show that the PE malware adversarial examples generated by the proposed method achieve a higher evasion rate than those of the compared methods and are transferable.

Key words: reinforcement learning, adversarial example, PE malware, malware detection
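
The step-by-step generation described in the abstract (predicting one action at a time until the detector is evaded) and the evasion-rate metric can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `generate_action_sequence`, `evasion_rate`, `toy_detector`, `toy_policy`, and `toy_apply` are hypothetical, the toy detector and padding action stand in for a real malware classifier and real functionality-preserving PE modifications, and the fixed policy stands in for the Transformer, whose architecture and training details the abstract does not give.

```python
from typing import Callable, List, Sequence, Tuple

Action = str
Policy = Callable[[bytes, Sequence[Action]], Action]   # predicts the next action
Detector = Callable[[bytes], bool]                      # True means "flagged as malware"
Modifier = Callable[[bytes, Action], bytes]             # applies one action to a sample


def generate_action_sequence(
    sample: bytes,
    policy: Policy,
    detector: Detector,
    apply_action: Modifier,
    max_steps: int = 10,
) -> Tuple[List[Action], bytes, bool]:
    """Build a modification sequence one predicted action at a time."""
    history: List[Action] = []
    current = sample
    for _ in range(max_steps):
        if not detector(current):
            break  # already evades the detector, stop early
        action = policy(current, history)  # one action per step
        current = apply_action(current, action)
        history.append(action)
    return history, current, not detector(current)


def evasion_rate(evaded_flags: Sequence[bool]) -> float:
    """Fraction of samples whose modified version evades the detector."""
    return sum(evaded_flags) / len(evaded_flags) if evaded_flags else 0.0


# Toy stand-ins (assumptions, not the paper's detector, policy, or actions).
def toy_detector(data: bytes) -> bool:
    return len(data) < 8  # pretend short samples are flagged


def toy_policy(data: bytes, history: Sequence[Action]) -> Action:
    return "append_padding"  # a learned model would choose among many actions


def toy_apply(data: bytes, action: Action) -> bytes:
    return data + b"\x00\x00" if action == "append_padding" else data


actions, modified, evaded = generate_action_sequence(
    b"MZ\x90\x00", toy_policy, toy_detector, toy_apply
)
```

In this toy run the 4-byte sample needs two padding actions before the stand-in detector stops flagging it. A learned policy would replace `toy_policy`, and `evasion_rate` would be computed over the per-sample `evaded` flags of a whole test set.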