• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A protocol classification algorithm based on improved agglomerative hierarchical clustering

张凤荔1,周洪川1,张俊娇1,刘渊2,张春瑞2   

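  (1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731;
   2. Institute of Computer Application, China Academy of Engineering Physics, Mianyang, Sichuan 621900)
  • Received: 2015-06-22 Revised: 2016-01-29 Online: 2017-04-25 Published: 2017-04-25
  • Funding:

    National NASF Fund (U1230106); Science and Technology Development Foundation of the China Academy of Engineering Physics (2012A0403021); Sichuan Science and Technology Program (2014GZ0109, 2015KZ002); National Natural Science Foundation of China (61472064)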

A protocol classification algorithm based on improved AGNES

ZHANG Feng-li1,ZHOU Hong-chuan1,ZHANG Jun-jiao1,LIU Yuan2,ZHANG Chun-rui2   

  (1. School of Information and Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731;
   2. Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China)

  • Received: 2015-06-22 Revised: 2016-01-29 Online: 2017-04-25 Published: 2017-04-25

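Abstract (translated from the Chinese):

In identifying unknown protocols from bit streams, a key problem is how to separate captured multi-protocol data frames into single-protocol data frames. To address it, an improved agglomerative hierarchical clustering algorithm is proposed. Building on the traditional agglomerative hierarchical clustering approach and incorporating the characteristics of bit-stream data frames, the algorithm defines the similarity between data frames and between clusters, and it extracts clusters that meet the requirements while clustering is still in progress, so that data frames are clustered quickly and effectively. The algorithm also determines the number of clusters automatically, and each resulting cluster carries a similarity evaluation index. Tests on a data set published by Lincoln Laboratory show that the algorithm clusters protocol data frames with high accuracy.

Keywords: unknown protocol, protocol identification, hierarchical clustering algorithm, clustering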

Abstract:

In bit-stream unknown protocol identification, separating multi-protocol data frames into single-protocol data frames is a challenging problem. To solve it, we propose an improved algorithm based on the traditional AGNES (agglomerative nesting) algorithm. Drawing on the features of bit-stream data frames, the algorithm defines two similarity measures: one between data frames and one between clusters. Clustering and the extraction of clusters that meet the requirements proceed simultaneously, so protocol data frames can be clustered quickly and effectively without the number of clusters being supplied as input. Each resulting cluster also carries a similarity evaluation index. Tests on a data set published by Lincoln Laboratory show that the algorithm clusters protocol data frames with high accuracy.

Key words: unknown protocol, protocol identification, hierarchical clustering algorithm, clustering
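
The abstract's idea of merging clusters while extracting finished ones can be sketched roughly as follows. This is only an illustration, not the authors' algorithm: the abstract does not give the similarity definitions, so the bitwise matching ratio in `frame_similarity`, the average linkage in `cluster_similarity`, the `threshold` parameter, and the extraction rule are all assumptions made here.

```python
from itertools import combinations


def frame_similarity(a: bytes, b: bytes) -> float:
    """Assumed measure: fraction of equal bits over the shorter frame."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - differing / (8 * n)


def cluster_similarity(c1, c2) -> float:
    """Assumed measure: average pairwise frame similarity (average linkage)."""
    total = sum(frame_similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_frames(frames, threshold=0.8):
    """Agglomerative clustering that extracts clusters as soon as they
    can no longer merge with any remaining cluster (hypothetical rule).
    The number of clusters is not an input; it follows from `threshold`."""
    active = [[f] for f in frames]   # clusters still eligible for merging
    finished = []                    # clusters already extracted
    while active:
        if len(active) == 1:
            finished.append(active.pop())
            break
        # Find the most similar pair of active clusters.
        (i, j), best = max(
            (((i, j), cluster_similarity(active[i], active[j]))
             for i, j in combinations(range(len(active)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            # Nothing left is similar enough to merge: extract everything.
            finished.extend(active)
            break
        merged = active[i] + active[j]
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        # Extract the merged cluster early if it cannot grow any further.
        if all(cluster_similarity(merged, c) < threshold for c in active):
            finished.append(merged)
        else:
            active.append(merged)
    return finished


# Four toy frames from two made-up "protocols" with distinct headers.
frames = [b"\xAA\xAA\x01", b"\xAA\xAA\x03", b"\x55\x55\xF0", b"\x55\x55\xF1"]
clusters = cluster_frames(frames, threshold=0.8)
```

Removing extracted clusters from the active set shrinks the number of pairwise comparisons in later iterations, which is one plausible reading of why extracting during clustering would speed things up.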