• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (05): 931-939.

• Artificial Intelligence and Data Mining •

Conv-WGAIN: A convolutional generative adversarial imputation network model for missing multivariate time series data

刘子建1,2,丁维龙1,2,邢梦达1,2,李寒1,2,黄晔3   

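  (1. School of Information Science and Technology, North China University of Technology, Beijing 100144;
    2. Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing 100144;
    3. Information Center of the National Defense Mobilization Department of the Central Military Commission, Beijing 100034)
  • Received: 2022-08-31  Revised: 2022-10-28  Accepted: 2023-05-25  Publication date: 2023-05-25  Release date: 2023-05-17
  • Supported by:
    Beijing Natural Science Foundation (4202021)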

Conv-WGAIN: Convolutional generative adversarial imputation net for multivariate time series missing data

LIU Zi-jian1,2, DING Wei-long1,2, XING Meng-da1,2, LI Han1,2, HUANG Ye3

  (1. School of Information Science and Technology, North China University of Technology, Beijing 100144;
    2. Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing 100144;
    3. Information Center of the National Defense Mobilization Department of the Central Military Commission, Beijing 100034, China)
  • Received: 2022-08-31  Revised: 2022-10-28  Accepted: 2023-05-25  Online: 2023-05-25  Published: 2023-05-17

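Abstract: The gas chromatography data of oil-immersed transformers is a type of multivariate time-series sensor data. Equipment or network faults often cause missing values, so the data usually must be imputed into a complete dataset before it can be used for further business analysis. However, existing imputation models cannot, for multivariate time series data, simultaneously address the low imputation efficiency and unreliable imputation quality caused by temporal irregularity and temporal bidirectionality. To address this, a generative adversarial imputation network model named Conv-WGAIN is proposed. Using a constructed imputation feature map, it applies 2D convolution to learn temporal features from both the forward and backward directions and to handle data with irregular time intervals. The Wasserstein distance is introduced in the discriminator to distinguish generated imputed data from real observed data, which improves the stability of the generator. Experiments on a gas chromatography dataset from a real project and 3 public datasets show that the model is generally applicable to multivariate time series with missing data, and that its imputation results under different missing rates are better than those of the compared models, with RMSE reduced by 20.75% to 73.37%.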

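Keywords: generative adversarial imputation network, multivariate time series data, convolutional neural network, Wasserstein distance, missing value imputation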

Abstract: Gas chromatography data of oil-immersed transformers is a kind of multivariate time series, and values are often missing because of equipment or network failures. Imputation is usually required to form a complete dataset for further business analysis and research. However, existing imputation models cannot handle temporal irregularity and temporal bidirectionality in multivariate time series at the same time, which leads to low imputation efficiency and unreliable imputation quality. In this paper, a model named Conv-WGAIN is proposed based on Generative Adversarial Imputation Nets (GAIN). Through a constructed imputation feature map, 2D convolution is used to learn temporal features in both directions while also handling irregular time intervals. The Wasserstein distance is introduced in the discriminator to distinguish generated imputed data from real observed data, which improves the stability of the model. Experiments on gas chromatography datasets from a real project and 3 public datasets show that the model is generally applicable to imputing missing data in multivariate time series, and that Conv-WGAIN outperforms the baselines at different missing rates, reducing RMSE by 20.75% to 73.37%.

Key words: generative adversarial imputation nets, multivariate time series data, convolutional neural network, Wasserstein distance, missing value imputation
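
The abstract reports results as RMSE on imputed values and as a percentage reduction against baselines. The sketch below illustrates one common way such numbers can be computed; it is not the authors' evaluation code. It assumes a binary mask (1 = observed, 0 = missing), the GAIN-style rule of keeping observed entries and filling only the missing ones, and that the reported reduction is relative to the baseline RMSE. All function names and the toy data are hypothetical.

```python
import numpy as np

def combine_observed_and_generated(x, mask, generated):
    """Keep observed entries (mask == 1); fill missing ones (mask == 0) with generated values."""
    return mask * x + (1 - mask) * generated

def rmse_on_missing(x_true, x_imputed, mask):
    """RMSE computed only over the entries that were hidden from the model (mask == 0)."""
    missing = mask == 0
    diff = x_true[missing] - x_imputed[missing]
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse (assumed definition)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Toy data: 2 time steps x 3 variables, with two entries artificially removed.
x_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
mask = np.array([[1, 0, 1],
                 [0, 1, 1]])
x_observed = x_true * mask  # missing entries are zeroed out

# Stand-in for a generator output; values at observed positions are ignored.
generated = np.array([[9.0, 2.5, 9.0],
                      [3.0, 9.0, 9.0]])

x_imputed = combine_observed_and_generated(x_observed, mask, generated)
score = rmse_on_missing(x_true, x_imputed, mask)
print(x_imputed)
print(score)
```

Restricting the error to masked-out positions keeps the metric from being diluted by entries the model merely copies from the input.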