A mobile proxy application traffic identification method based on machine learning

Computer Engineering & Science, 2022, Vol. 44, Issue (04): 654-664.

• Computer Network and Information Security •

CUI Hong1, ZHAO Shuang2, ZHANG Guang-sheng3, SU Jin-shu2

(1. FiberHome Telecommunication Technologies Co., Ltd., Wuhan 430074, China;
2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
3. Political and Legal Affairs Commission of the Central Military Commission (given as "Investigation Technology Center of PLA" in the journal's English metadata), Beijing 100080, China)

Received: 2020-08-16  Revised: 2020-12-14  Accepted: 2022-04-25  Online: 2022-04-25  Published: 2022-04-20

Abstract

Abstract: With the rapid development of mobile networks, more and more users rely on proxy applications to protect their privacy, hide their online behavior, or bypass network restrictions, which poses new challenges for network management and auditing. Malicious attackers can also use proxy applications to hide their identity, making malicious behavior harder to detect and prevent. Proxy application traffic identification therefore plays an important role in network management and security, yet the problem has not been fully studied. Because proxy application traffic is usually encrypted or obfuscated, traditional traffic identification techniques cannot be applied effectively. To identify mobile proxy application traffic accurately and quickly, a set of payload-independent (side-channel) traffic features is proposed, and the option field of the TCP header is used for the first time to characterize traffic. Classifiers trained with four machine learning algorithms are evaluated on two identification targets to validate the effectiveness of the proposed features, and the importance of each feature category is analyzed. The experimental results show that the proposed features identify proxy application traffic effectively. When identifying whether traffic passes through a proxy, a random forest classifier achieves an overall accuracy above 99%. When identifying which proxy application the traffic belongs to, the overall accuracy is higher than 94%. On the public dataset ISCX VPN-nonVPN, the proposed method achieves higher accuracy and faster identification than the other methods compared, which makes it suitable for real-time traffic identification.

Key words: proxy application traffic identification, mobile application, machine learning, traffic feature, decision tree
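The abstract does not say how the TCP option field is turned into features. As a rough illustration only, the sketch below parses the raw option bytes of a TCP header (the standard kind/length/value layout) and derives a few payload-independent values from them. The function names `parse_tcp_options` and `option_features`, and the particular features chosen, are assumptions made for this example and are not the authors' feature set.

```python
from typing import Dict, List, Tuple

# Standard TCP option kind numbers (IANA TCP parameters registry).
EOL, NOP, MSS, WSCALE, SACK_PERMITTED, TIMESTAMPS = 0, 1, 2, 3, 4, 8


def parse_tcp_options(raw: bytes) -> List[Tuple[int, bytes]]:
    """Split raw TCP option bytes into (kind, data) pairs.

    Parsing stops at End-of-Option-List or at the first malformed option.
    """
    options: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == EOL:            # end of option list
            break
        if kind == NOP:            # single-byte padding, no length field
            options.append((kind, b""))
            i += 1
            continue
        if i + 1 >= len(raw):      # truncated: length byte is missing
            break
        length = raw[i + 1]        # length covers kind + length + data
        if length < 2 or i + length > len(raw):
            break
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


def option_features(raw: bytes) -> Dict[str, object]:
    """Hypothetical feature set built only from the option field."""
    opts = parse_tcp_options(raw)
    kinds = [kind for kind, _ in opts]
    data = dict(opts)              # if a kind repeats, the last one wins
    mss = data.get(MSS, b"")
    wscale = data.get(WSCALE, b"")
    return {
        "opt_count": len(kinds),
        "kind_order": "-".join(str(k) for k in kinds),
        "mss": int.from_bytes(mss, "big") if len(mss) == 2 else 0,
        "window_scale": wscale[0] if len(wscale) == 1 else -1,
        "has_sack_permitted": int(SACK_PERMITTED in kinds),
        "has_timestamps": int(TIMESTAMPS in kinds),
    }


if __name__ == "__main__":
    # Example SYN options: MSS=1460, SACK permitted, timestamps, NOP, wscale=7
    syn_options = bytes.fromhex("020405b40402080a000000010000000001030307")
    print(option_features(syn_options))
```

A dictionary like this could be concatenated with other payload-independent flow statistics and passed to any of the classifiers the abstract mentions; the order and values of the options tend to depend on the sending TCP stack, which is one plausible reason the option field helps to characterize traffic.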

CUI Hong, ZHAO Shuang, ZHANG Guang-sheng, SU Jin-shu. A mobile proxy application traffic identification method based on machine learning[J]. Computer Engineering & Science, 2022, 44(04): 654-664.


