• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (05): 826-833.

• Computer Network and Information Security •

Realization of malicious code family classification based on a semi-supervised generative adversarial network

WANG Dong (1, 2), YANG Ke (1, 2), XUAN Jia-xing (1, 2), HAN Yu-tong (3), ZHAO Li-hua (1, 2), WANG Xu-ren (4)

  1. State Grid Electronic Commerce Co., Ltd. (State Grid Xiong’an Financial Technology Group Co., Ltd.), Beijing 100053, China
  2. Blockchain Technology Laboratory, State Grid Corporation of China, Beijing 100053, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  4. College of Information Engineering, Capital Normal University, Beijing 100048, China
  • Received: 2020-09-21 Revised: 2021-01-01 Accepted: 2022-05-25 Online: 2022-05-25 Published: 2022-05-24
  • Funding:
    National Natural Science Foundation of China (61872252); National Key Research and Development Program of China (2018YFB0805005); Science and Technology Project of State Grid Electronic Commerce Co., Ltd. (2500/2020-72001B)

Abstract: With the development of the Internet, malicious code has become massive in volume and increasingly polymorphic, and classifying malicious code into families is one of the challenges facing cyberspace security. This paper combines a semi-supervised generative adversarial network with a deep convolutional network to build a semi-supervised deep convolutional generative adversarial network, and on this basis proposes a malicious code family classification model. Guided by an analysis of malicious code family characteristics, features are extracted from each sample and converted into a one-dimensional grayscale image. A semi-supervised generative adversarial network (SGAN) is then built on a one-dimensional convolutional neural network (1D-CNN), yielding the family classification model SGAN-CNN. Classification performance is improved by optimizing both the feature extraction and the semi-supervised adversarial training algorithm. To evaluate SGAN-CNN, experiments are carried out on the Microsoft Malware Classification Challenge dataset. In 5-fold cross-validation, the model reaches an average classification accuracy of 98.81% on the test set when 80% of the samples are labeled, and 98.01% when only 20% are labeled. It also achieves good classification accuracy when the number of samples is small.

Key words: deep learning, one-dimensional convolutional neural network, semi-supervised learning, generative adversarial network, malicious code classification
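The abstract states only that extracted features are converted into a one-dimensional grayscale image; it does not say which features are used or how long the vector is. The sketch below shows one common way to do this, assuming the raw bytes of a sample are used directly. It parses a hex dump in the style of the challenge's `.bytes` files (an address column followed by hex byte tokens, with `??` for unreadable bytes), maps each byte to one pixel intensity in [0, 1], and truncates or zero-pads to a fixed length. The function names, the choice to map `??` to 0, and the default length of 4096 are illustrative assumptions, not the paper's settings.

```python
def parse_bytes_dump(text):
    """Parse a hex dump in the style of the challenge's `.bytes` files.

    Each line is an address followed by hex byte tokens; unreadable bytes
    appear as '??'. Assumption: '??' is mapped to 0.
    Returns a list of ints in 0..255.
    """
    values = []
    for line in text.splitlines():
        tokens = line.split()
        for tok in tokens[1:]:  # skip the leading address column
            values.append(0 if tok == "??" else int(tok, 16))
    return values


def to_grayscale_vector(byte_values, length=4096):
    """Truncate or zero-pad to a fixed length, then scale each byte to [0, 1].

    Each byte becomes one 'pixel' of a 1D grayscale image that a 1D-CNN
    could consume. The default length is a hypothetical choice.
    """
    clipped = byte_values[:length]
    padded = clipped + [0] * (length - len(clipped))
    return [b / 255.0 for b in padded]


# Toy two-line dump; real files are far longer.
sample = "00401000 56 8D 44 24 08 50 8B F1\n00401008 ?? ?? FF 00"
vec = to_grayscale_vector(parse_bytes_dump(sample), length=16)
print(len(vec), [round(v, 3) for v in vec[:4]])
```

Fixing the length makes every sample the same input size for the convolutional layers; truncation keeps only the start of each file, so in practice the length (or a resampling step) would need to be tuned.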
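For the semi-supervised GAN, a widely used formulation (Odena, 2016; Salimans et al., 2016) gives the discriminator K+1 outputs: one per malware family plus an extra "generated" class. Labeled, unlabeled, and generated samples therefore all contribute to training the classifier. The pure-Python sketch below computes that discriminator loss from raw logits; with the nine families of the Microsoft challenge data, K would be 9. The abstract does not state whether SGAN-CNN uses exactly these three terms or how its 1D-CNN produces the logits, so treat `sgan_discriminator_loss` and its argument layout as hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgan_discriminator_loss(labeled, unlabeled, generated):
    """Discriminator loss for a (K+1)-class semi-supervised GAN (hypothetical helper).

    Every logits vector has K+1 entries; the LAST entry is the 'generated' class.
    labeled:   list of (logits, family_index) pairs, family_index in 0..K-1
    unlabeled: list of logits for real samples without labels
    generated: list of logits for samples produced by the generator
    """
    # Supervised term: cross-entropy over the K real families only,
    # i.e. p(y | x, x is real), as in Salimans et al. (2016).
    sup = -sum(math.log(softmax(logits[:-1])[y]) for logits, y in labeled) / len(labeled)
    # Real unlabeled samples should get a low probability of being generated.
    real = -sum(math.log(1.0 - softmax(logits)[-1]) for logits in unlabeled) / len(unlabeled)
    # Generated samples should get a high probability of being generated.
    fake = -sum(math.log(softmax(logits)[-1]) for logits in generated) / len(generated)
    return sup + real + fake


# Toy example with K = 2 families (3 logits per sample).
zeros = [0.0, 0.0, 0.0]
uninformed = sgan_discriminator_loss([(zeros, 0)], [zeros], [zeros])
confident = sgan_discriminator_loss(
    [([4.0, 0.0, -4.0], 0)],   # labeled sample, confidently family 0
    [[2.0, 2.0, -4.0]],        # unlabeled real sample, confidently not generated
    [[-4.0, -4.0, 4.0]],       # generator output, confidently generated
)
print(uninformed, confident)
```

At test time only the first K outputs matter: the predicted family is the arg-max over them, which is why unlabeled samples can still sharpen the family decision boundaries through the real-versus-generated terms.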