• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (11): 10-14.

• Papers •

A Network Traffic Identification Method Based on an Improved Clustering Algorithm

王宇科1,黎文伟2,苏欣2   

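  1. (1. Network Information Center, Hunan University, Changsha, Hunan 410082; 2. School of Information Science and Engineering, Hunan University, Changsha, Hunan 410082)
  • Received: 2011-06-03 Revised: 2011-09-04 Online: 2011-11-25 Published: 2011-11-25
  • Supported by:

    National Basic Research Program of China (973 Program) (2007CB310702); Key Project of the Hunan Provincial Science and Technology Plan (2009JT1018)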

A Method of Network Traffic Identification Based on Improved Clustering Algorithms

WANG Yuke1,LI Wenwei2,SU Xin2   

  1. (1. Center of Network Information, Hunan University, Changsha 410082;
    2. School of Information Science and Engineering, Hunan University, Changsha 410082, China)
  • Received: 2011-06-03 Revised: 2011-09-04 Online: 2011-11-25 Published: 2011-11-25

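Abstract:

Automatic detection of the applications associated with network traffic is very important for network security and traffic management. However, new network applications such as Peer-to-Peer (P2P) and VoIP use dynamic ports, masquerading, and encrypted flows, which makes them difficult to identify with methods based on port matching or on analysis of packet signature fields. Many studies have proposed clustering algorithms for traffic identification, but existing clustering algorithms have shortcomings in how the cluster centers and the number of clusters are chosen. This paper first uses the Weighting D2 algorithm to improve the selection of the initial cluster centers and uses the NMI value to determine the number of clusters, yielding an improved clustering algorithm, and then proposes an application-layer traffic identification method based on that algorithm. Experiments on application-layer traffic, and on P2P applications in particular, show that the method achieves an identification rate above 90% together with relatively low misidentification and missed-identification rates.

Key words: traffic identification, clustering algorithm, cluster center, number of clusters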

Abstract:

Automatically detecting the applications associated with network traffic is very important for network security and traffic management. However, new applications such as Peer-to-Peer (P2P) and VoIP use dynamic port numbers, masquerading techniques, and encryption, which makes them difficult to identify with methods based on simple port matching or packet payload analysis. Many studies have proposed clustering algorithms for identifying network traffic, but these algorithms have defects in how they choose the cluster centers and the number of clusters. In this paper, we first use the Weighting D2 algorithm to improve the selection of the initial cluster centers and use the NMI (Normalized Mutual Information) value to determine the number of clusters, which yields an improved clustering algorithm. We then propose an application-level traffic identification method based on this algorithm. Experimental results on application-level traffic, especially P2P applications, show that the method reaches an accuracy of 90% or more with a low False Positive Rate and False Rejection Rate.

Key words: traffic identification; clustering algorithm; cluster center; number of clusters
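
The abstract names two ingredients, distance-weighted seeding of the initial cluster centers and NMI-based selection of the number of clusters, without giving details. The following is a minimal sketch of how these two steps might fit together, not the authors' implementation. It assumes that "Weighting D2" means seeding in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen (as in k-means++), that NMI is normalized by the geometric mean of the two entropies, and that the number of clusters is picked by maximizing NMI against application labels on a labeled training set. The function names, the plain k-means loop, and the toy feature vectors are all hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_seed(points, k, rng):
    """Assumed D^2-weighted seeding: the first center is uniform at random,
    each later one is drawn with probability proportional to its squared
    distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations started from the D^2-weighted seeds."""
    centers = d2_seed(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def nmi(a, b):
    """Normalized mutual information of two labelings, I(A;B)/sqrt(H(A)H(B))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(nij / n * math.log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:  # at least one labeling has a single class
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def choose_k(points, app_labels, candidates, seed=0):
    """Assumed selection rule: keep the k whose clustering has the highest
    NMI against the known application labels."""
    best_k, best_score = None, -1.0
    for k in candidates:
        score = nmi(app_labels, kmeans(points, k, random.Random(seed)))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

As a usage illustration, calling `choose_k(flows, apps, range(1, 5))` on a list of per-flow feature tuples `flows` with known application labels `apps` returns the candidate cluster count with the highest NMI together with that score. Which flow features to use is not stated in the abstract, so any concrete choice (packet sizes, inter-arrival times, and so on) would be a further assumption.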