• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (04): 654-664.

• Computer Network and Information Security •

A mobile proxy application traffic identification method based on machine learning

CUI Hong1, ZHAO Shuang2, ZHANG Guang-sheng3, SU Jin-shu2

  (1. FiberHome Telecommunication Technologies Co., Ltd., Wuhan 430074;
    2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073;
    3. Investigation Technology Center of PLA, Beijing 100080, China)
  • Received: 2020-08-16 Revised: 2020-12-14 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20

Abstract: With the rapid development of mobile networks, more users rely on proxy applications to protect their privacy, hide their online behavior, and bypass network restrictions. This brings new challenges to network management and auditing. In addition, malicious attackers can use proxies to hide their identity, which makes their behavior harder to detect and prevent. Proxy application traffic identification therefore plays an important role in network management and security, yet the problem has not been fully studied. Because proxy application traffic is usually encrypted and obfuscated, traditional traffic identification methods cannot be applied effectively. To identify mobile proxy application traffic accurately and quickly, a set of side-channel traffic features that are independent of the payload is proposed. The option field in the TCP header is used for the first time to describe traffic characteristics. Four machine learning algorithms are applied to two identification targets to validate the effectiveness and importance of the proposed feature set. The experimental results show that the proposed features can effectively identify proxy application traffic. With random forest, an accuracy of more than 99% is achieved when identifying whether traffic is forwarded by a proxy application, and the average accuracy is higher than 94% when identifying which proxy application the traffic belongs to. Compared with other methods on the public dataset ISCX VPN-nonVPN, the proposed method achieves better accuracy and faster classification, which makes it more suitable for real-time traffic identification scenarios.

Key words: proxy application traffic identification, mobile application, machine learning, traffic feature, decision tree
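
The abstract does not list which TCP option values the feature set uses, so the following is only a minimal Python sketch of how payload-independent features might be read from the option field of a raw TCP header. The function names `parse_tcp_options` and `option_features`, and the particular features chosen (option order, MSS, window scale, SACK-permitted, timestamp presence), are illustrative assumptions and not the authors' feature set.

```python
import struct


def parse_tcp_options(header: bytes):
    """Return a list of (kind, data) tuples from a raw TCP header.

    Hypothetical helper: it follows the standard TCP option layout
    (kind, length, data) and stops quietly on malformed input.
    """
    if len(header) < 20:
        raise ValueError("a TCP header needs at least 20 bytes")
    # The upper 4 bits of byte 12 give the header length in 32-bit words.
    header_len = (header[12] >> 4) * 4
    if header_len < 20 or header_len > len(header):
        raise ValueError("invalid data offset")

    options = []
    i = 20  # options start right after the fixed 20-byte header
    while i < header_len:
        kind = header[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation: a single padding byte
            options.append((1, b""))
            i += 1
            continue
        if i + 1 >= header_len:
            break  # truncated option
        length = header[i + 1]  # counts the kind and length bytes too
        if length < 2 or i + length > header_len:
            break  # malformed option
        options.append((kind, header[i + 2:i + length]))
        i += length
    return options


def option_features(header: bytes):
    """Build an illustrative feature dict from the TCP option field."""
    opts = parse_tcp_options(header)
    kinds = [kind for kind, _ in opts]
    feats = {
        "option_bytes": (header[12] >> 4) * 4 - 20,
        "option_order": tuple(kinds),
        "mss": None,
        "window_scale": None,
        "sack_permitted": 4 in kinds,
        "has_timestamp": 8 in kinds,
    }
    for kind, data in opts:
        if kind == 2 and len(data) == 2:  # Maximum Segment Size
            feats["mss"] = struct.unpack("!H", data)[0]
        elif kind == 3 and len(data) == 1:  # Window Scale
            feats["window_scale"] = data[0]
    return feats


if __name__ == "__main__":
    # Hand-built SYN header: MSS, SACK-permitted, timestamps, NOP, window scale.
    syn = bytes.fromhex(
        "c35001bb0000000000000000a002faf000000000"
        "020405b40402080a000000010000000001030307"
    )
    print(option_features(syn))
```

Features of this kind never touch the encrypted payload, which is why they can still be computed for obfuscated proxy traffic; in practice they would be combined with other side-channel features before being passed to a classifier such as random forest.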