• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue (02): 238-246.

• Computer Network and Information Security •

  • Funding:
    National Key Research and Development Program of China (2022ZD0209105)

rtTorTIM: A real-time Tor traffic identification method based on multi-modal feature fusion and Stacking ensemble learning

WANG Yufei, LIU Qiang, ZHANG Weizhen, WU Xiaojie, LI Jiawen, WANG Yuheng

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2024-07-04 Revised: 2024-08-03 Accepted: 2025-02-25 Online: 2025-02-25 Published: 2025-02-21

Abstract: The Tor network, a representative anonymous network, offers strong privacy protection but also provides a breeding ground for cybercrime, so real-time, high-accuracy identification of Tor traffic is of great practical significance. To address the weak generalization and poor real-time performance of existing approaches, this paper proposes rtTorTIM, a Tor traffic identification method based on multi-modal feature fusion and Stacking ensemble learning. The method first extracts features of Tor traffic from three modalities (host level, flow level, and packet level) and constructs a feature dataset. It then selects random forest, linear regression, and K-nearest neighbor as base learners and uses a linear neural network for decision fusion, yielding a two-layer Stacking traffic classifier. Comparative experiments on the public ISCX Tor 2016 dataset show that rtTorTIM reaches 99% accuracy, precision, and recall in Tor traffic identification and also shows better real-time classification performance.

Key words: Tor anonymous network, multi-modal feature extraction, real-time traffic identification, Stacking ensemble learning, machine learning
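
The two-layer Stacking structure described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the three-value feature vectors are synthetic stand-ins for host-, flow-, and packet-level features; a bag of decision stumps stands in for the random forest; a single sigmoid-activated linear layer stands in for the linear neural network; and the out-of-fold training of the fusion layer is common stacking practice rather than something the abstract specifies. All function names are hypothetical.

```python
import math
import random

def predict_linear(model, x, logistic=False):
    """Score w.x + b; logistic=True squashes it with a sigmoid."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) if logistic else z

def fit_linear(X, y, logistic=False, epochs=200, lr=0.1):
    """Per-sample gradient descent for linear regression or one sigmoid layer."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_linear((w, b), xi, logistic) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def fit_stump(X, y):
    """Best single-feature threshold rule: predict `label` when x[f] > t."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for label in (0, 1):
                errors = sum((label if x[f] > t else 1 - label) != yi
                             for x, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, label)
    return best[1:]

def fit_forest(X, y, rng, n_stumps=15):
    """Bagged stumps: a deliberately small stand-in for a random forest."""
    forest = []
    for _ in range(n_stumps):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def forest_score(forest, x):
    """Fraction of stumps voting for the Tor class."""
    return sum(label if x[f] > t else 1 - label for f, t, label in forest) / len(forest)

def knn_score(X, y, x, k=5):
    """Fraction of the k nearest training samples labelled Tor."""
    nearest = sorted(zip(X, y), key=lambda s: math.dist(s[0], x))[:k]
    return sum(label for _, label in nearest) / k

def fit_base(X, y, rng):
    """Layer 1: train the three base learners on the same feature vectors."""
    return {"forest": fit_forest(X, y, rng), "linear": fit_linear(X, y), "knn": (X, y)}

def base_scores(base, x):
    """One score per base learner; these become the fusion layer's inputs."""
    return [forest_score(base["forest"], x),
            predict_linear(base["linear"], x),
            knn_score(*base["knn"], x)]

def fit_stacking(X, y, rng, folds=4):
    """Layer 2 is trained on out-of-fold base scores, then layer 1 is refit on all data."""
    meta_X = [None] * len(X)
    for k in range(folds):
        held = set(range(k, len(X), folds))
        train = [i for i in range(len(X)) if i not in held]
        base = fit_base([X[i] for i in train], [y[i] for i in train], rng)
        for i in held:
            meta_X[i] = base_scores(base, X[i])
    meta = fit_linear(meta_X, y, logistic=True, lr=0.5)
    return fit_base(X, y, rng), meta

def predict(model, x):
    """Return 1 (Tor) or 0 (non-Tor) from the fused base-learner scores."""
    base, meta = model
    return int(predict_linear(meta, base_scores(base, x), logistic=True) >= 0.5)

def make_flows(n, rng):
    """Synthetic [host, flow, packet]-level features; label 1 = Tor (assumption)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.7 if label else 0.3
        data.append(([rng.gauss(centre, 0.1) for _ in range(3)], label))
    rng.shuffle(data)
    return [x for x, _ in data], [lab for _, lab in data]

rng = random.Random(0)
train_X, train_y = make_flows(80, rng)
test_X, test_y = make_flows(40, rng)
model = fit_stacking(train_X, train_y, rng)
accuracy = sum(predict(model, x) == t for x, t in zip(test_X, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the fusion layer on out-of-fold scores keeps it from seeing base-learner outputs on samples those learners were fitted to, which is the usual reason for the fold loop in stacking. The accuracy printed here reflects only the easily separable synthetic data and says nothing about the results reported for the ISCX Tor 2016 dataset.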