• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (06): 1060-1066.

• Artificial Intelligence and Data Mining •

Optimization of anomalous flow detection by k-nearest neighbor flow sequence algorithm

LIU Yun, WANG Zi-yu

  1. (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2020-04-03 Revised: 2020-06-30 Accepted: 2021-06-25 Online: 2021-06-25 Published: 2021-06-22
  • Supported by:
    National Natural Science Foundation of China (61761025)

Abstract: Spatio-temporal anomalous flow detection can reveal anomalous traffic characteristics in urban traffic data. Unlike methods that detect a single anomalous flow in a time series, this paper proposes a k-nearest neighbor flow sequence algorithm (kNNFS) that detects anomalous flow distributions in flow sequences. First, the individual flow observations are measured for each time interval at each location. Then, the observation frequency of each individual flow is computed to build a flow distribution probability database for each time interval at each location. Finally, the distance between a new flow distribution probability and its k nearest neighbors is computed with the KL divergence and compared against a threshold: if the distance is less than the threshold, the new flow distribution probability is added to the flow distribution probability database; otherwise, it is flagged as an anomalous flow distribution. Simulation results show that kNNFS improves on the DPMM and SETMADA algorithms in both detection accuracy and running time.

Key words: spatio-temporal flow sequence, anomalous flow distribution detection, k-nearest neighbor, KL divergence
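
The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not specify: flows are encoded as integer IDs, a small smoothing constant keeps the KL divergence finite, the distance to the k nearest neighbors is taken as the mean of the k smallest KL divergences, and the threshold value is arbitrary. All function names are hypothetical.

```python
import numpy as np


def flow_distribution(flow_ids, n_flows, eps=1e-9):
    """Turn the flow IDs observed at one location in one time interval
    into a probability vector of observation frequencies."""
    counts = np.bincount(np.asarray(flow_ids, dtype=int), minlength=n_flows)
    probs = counts.astype(float) + eps  # smoothing is an assumption of this sketch
    return probs / probs.sum()


def kl_divergence(p, q):
    """KL divergence D(p || q) between two strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def knn_distance(new_dist, library, k):
    """Mean KL divergence from new_dist to its k nearest library entries.
    Averaging over the k neighbors is an assumption of this sketch."""
    dists = sorted(kl_divergence(new_dist, hist) for hist in library)
    return float(np.mean(dists[:k]))


def detect_and_update(new_dist, library, k, threshold):
    """Return True if new_dist is anomalous. Otherwise add it to the
    library (the flow distribution probability database) and return False."""
    if knn_distance(new_dist, library, k) < threshold:
        library.append(new_dist)
        return False
    return True


# Toy data: three flow IDs (0, 1, 2) observed at one location.
library = [flow_distribution(obs, 3)
           for obs in ([0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 1, 2])]
typical = flow_distribution([0, 1, 0, 2], 3)   # resembles the stored distributions
unusual = flow_distribution([2, 2, 2, 2], 3)   # all mass on a single flow

print(detect_and_update(typical, library, k=2, threshold=0.5))
print(detect_and_update(unusual, library, k=2, threshold=0.5))
```

With this toy data the first distribution is accepted and added to the library, while the second is flagged as anomalous. In practice a separate library would be kept for each location and time interval, and both k and the threshold would need tuning on real traffic data.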
