A network traffic classification method based on clustering and noise

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (07): 1207-1215.

• Computer Networks and Information Security •

PANG Xing-long, ZHU Guo-sheng, YANG Shao-long, LI Xiu-yuan

(School of Computer and Information Engineering, Hubei University, Wuhan 430062, China)

Received: 2021-10-12 Revised: 2021-12-14 Accepted: 2022-07-25 Online: 2022-07-25 Published: 2022-07-25

Abstract

Abstract: Labeling real network traffic data inevitably introduces labeling errors, so the label data are contaminated by noise; that is, a sample's observed label differs from its true label. To reduce the negative impact of noisy labels on classifier accuracy, this work considers two ways in which noise is introduced: a sample marked with the wrong (but valid) label type, and a label type that is misspelled. A network traffic classification method based on label noise correction is proposed, which uses clustering and weight division to evaluate and repair the observed samples. Experiments on two network traffic datasets show that, compared with the three label noise repair algorithms STC, CC and ADE, the proposed repair algorithm improves the final classification results under different noise ratios. On the NSL-KDD dataset, the average label repair rate is higher by about 23.00%, 7.58% and 2.05%, respectively; on the MOORE dataset, it is higher by about 35.12%, 10.40% and 4.71%, respectively. The final classification model also shows good classification stability.

Key words: noisy label, network traffic classification, K-means clustering, label repair

PANG Xing-long, ZHU Guo-sheng, YANG Shao-long, LI Xiu-yuan. A network traffic classification method based on clustering and noise[J]. Computer Engineering & Science, 2022, 44(07): 1207-1215.
