• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (05): 826-833.

• Computer Network and Information Security •

Realization of malicious code family classification based on semi-supervised generative adversarial network

WANG Dong(1,2), YANG Ke(1,2), XUAN Jia-xing(1,2), HAN Yu-tong(3), ZHAO Li-hua(1,2), WANG Xu-ren(4)

  1. State Grid Electronic Commerce Co., Ltd. (State Grid Xiong’an Financial Technology Group Co., Ltd.), Beijing 100053, China
  2. Blockchain Technology Laboratory, State Grid Corporation of China, Beijing 100053, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  4. College of Information Engineering, Capital Normal University, Beijing 100048, China
  • Received:2020-09-21 Revised:2021-01-01 Accepted:2022-05-25 Online:2022-05-25 Published:2022-05-24

Abstract: With the development of the Internet, malicious code has become massive in volume and increasingly polymorphic, and classifying malicious code into families remains one of the challenges of cyber security. By combining a semi-supervised generative adversarial network (SGAN) with a deep convolutional neural network, a multi-family malicious code classification model is proposed. The model, named SGAN-CNN, takes grayscale images of malicious code as features, extracts them with an efficient one-dimensional convolutional neural network (1D-CNN), and is trained within the SGAN framework, so that classification ability improves through both efficient feature extraction and SGAN optimization. To verify the classification ability of the model, experiments are carried out on the Microsoft Malware Classification Challenge data set. Under 5-fold cross-validation, the proposed model achieves an average test-set accuracy of 98.81% at a label rate of 80% and 98.01% at a label rate of 20%, indicating strong experimental performance. In small-sample settings, it also achieves good classification accuracy.

Key words: deep learning, one-dimensional convolutional neural network, semi-supervised learning, generative adversarial network, malicious code classification
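
The data preparation implied by the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the fixed vector length of 1024, the zero padding, and the reading of the 80% and 20% settings as the fraction of training samples that keep their family label are all assumptions made here. It shows one way to turn raw bytes into a one-dimensional grayscale vector, hide labels to simulate a given labeled fraction, and build 5-fold train/test splits.

```python
import random


def bytes_to_gray_vector(data, length=1024):
    """Map each byte to one grayscale pixel in [0, 1].

    The fixed `length` and zero padding are assumptions for illustration.
    """
    pixels = [b / 255.0 for b in data[:length]]
    pixels.extend([0.0] * (length - len(pixels)))  # pad short inputs
    return pixels


def mask_labels(labels, label_rate, seed=0):
    """Keep round(label_rate * n) labels; replace the rest with None."""
    rng = random.Random(seed)
    n_keep = round(label_rate * len(labels))
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [y if i in keep else None for i, y in enumerate(labels)]


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Hypothetical usage: 10 samples, 5 folds, 20% of training labels kept.
toy_labels = [i % 3 for i in range(10)]
for train_idx, test_idx in k_fold_splits(len(toy_labels)):
    train_labels = mask_labels([toy_labels[j] for j in train_idx], 0.2)
```

In a semi-supervised setup of this kind, samples whose label is `None` would still be fed to the discriminator as unlabeled real data, while only the labeled ones contribute to the supervised classification loss.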