An Android malware detection method based on pre-trained language model

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (08): 1433-1442.

• Computer Network and Information Security •

YIN Jie1, HUANG Xiao-yu1, LIU Jia-yin1, NIU Bo-wei2, XIE Wen-wei3,4

(1. Department of Computer Information and Network Security, Jiangsu Police Institute, Nanjing 210031;
2. Cyber Security Guard Corps, Jiangsu Provincial Security Department, Nanjing 210024;
3. Department of Network Security, Trend Micro Incorporated, Nanjing 210012;
4. FOCUSLAB of Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Received: 2022-11-01 Revised: 2023-01-06 Accepted: 2023-08-25 Online: 2023-08-25 Published: 2023-08-18

Abstract

In recent years, Android malware detection methods based on supervised machine learning have made progress. However, because malware samples are difficult to collect, labeled datasets are generally small, which limits the generalization ability of the trained supervised models. To address this problem, a malware detection method that combines unsupervised and supervised learning is proposed. First, a language model is pre-trained on a large number of unlabeled APK samples using unsupervised learning, so that it learns the rich and complex semantic relationships between different operators. Then, the pre-trained language model is fine-tuned on labeled malware samples to give it the ability to detect malware. Experiments on datasets such as Drebin show that the proposed method generalizes better and detects malware more accurately than the baseline method, reaching a maximum accuracy of 98.7%.

Key words: Android, malware detection, pre-trained language model, unsupervised learning
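
The unsupervised pre-training stage described in the abstract can be illustrated with a small sketch. The abstract does not state the pre-training objective, so this example assumes BERT-style masked-token prediction and assumes that the "operators" are instruction opcodes taken from disassembled APKs. The function name `mask_tokens`, the `[MASK]` symbol, the masking probability, and the opcode sequence are all hypothetical choices for illustration, not details from the paper.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, as in BERT-style pre-training


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Build one masked-token training example from a token sequence.

    Returns (masked, labels): masked[i] is MASK where position i was
    hidden, and labels[i] holds the original token at those positions
    (None elsewhere), so a model can be trained to recover it.
    """
    rng = random.Random(seed)  # local generator keeps the example reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical opcode sequence from one unlabeled APK (illustrative only).
opcodes = [
    "const-string",
    "invoke-virtual",
    "move-result-object",
    "if-eqz",
    "invoke-static",
    "return-void",
]

masked, labels = mask_tokens(opcodes, mask_prob=0.3, seed=1)
for original, shown, target in zip(opcodes, masked, labels):
    print(f"{original:20s} -> {shown:20s} target={target}")
```

No malware labels are needed to build these examples, which is why this stage can use a large unlabeled APK corpus. Under the same assumptions, the supervised stage would then reuse the pre-trained model and fine-tune it with a classification output on the smaller labeled set.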

YIN Jie, HUANG Xiao-yu, LIU Jia-yin, NIU Bo-wei, XIE Wen-wei. An Android malware detection method based on pre-trained language model[J]. Computer Engineering & Science, 2023, 45(08): 1433-1442.

