• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (08): 1354-1364.

• High Performance Computing •

Missing value filling for multi-variable urban air quality data based on attention mechanism

MA Si-yuan (马思远) 1,2, JIAO Jia-hui (焦佳辉) 1,2, REN Sheng-qi (任晟岐) 1,2, SONG Wei (宋伟) 1

  1. Henan Academy of Big Data, Zhengzhou University, Zhengzhou 450052, China
  2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2022-08-12  Revised: 2022-09-23  Accepted: 2023-08-25  Online: 2023-08-25  Published: 2023-08-18
  • Funding:
    Open Project of the National High Energy Physics Science Data Center (HT-HEPS-T7-01050200-21-0008); International Cooperation Project of the Henan Province Science and Technology Research Program (172102410065); Key Scientific Research Project of Higher Education Institutions of Henan Province (22A520010)

Abstract: Air pollution seriously affects human health and the sustainable development of society. However, the multi-variable air quality data collected by sensors often contain missing values, which complicates data analysis and processing. Many current methods for analyzing changes in a single air component rely only on the temporal and spatial data of that attribute, ignoring the influence of other air components on its trend within the same time interval. They also struggle to achieve satisfactory results when filling discrete missing data. This paper proposes a Time Attention Model (TAM) based on deep learning. The model uses an attention mechanism to capture both the correlation between different timestamps and the correlation between the time series of different features, and combines these with short-term historical data to fill missing readings in multi-variable air quality data. The model is evaluated on air quality data from Beijing, and the experimental results show that TAM has certain advantages over ten baseline models.

Keywords: air quality, missing data imputation, attention mechanism, deep learning
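
The abstract describes the idea only at a high level, so the following is a minimal, untrained sketch of attention-weighted imputation, not the authors' TAM. It assumes readings arrive as a list of time steps, each a list of feature values with `None` marking a missing reading. Time steps are compared across all of their features by a scaled dot product, and each gap is filled with a softmax-weighted average of that feature's observed values at nearby time steps. The function name `attention_impute`, the `window` parameter, the symmetric neighbourhood (past and future steps), and the sample values are all illustrative assumptions; a learned model would replace the fixed similarity with trained query, key, and value projections.

```python
import math


def attention_impute(rows, window=3):
    """Fill None entries in rows (time steps x features) by attending to nearby time steps."""
    n_steps, n_feats = len(rows), len(rows[0])

    # Per-feature mean and std from observed readings only.
    # Assumption: every feature is observed at least once.
    means, stds = [], []
    for d in range(n_feats):
        vals = [r[d] for r in rows if r[d] is not None]
        m = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
        means.append(m)
        stds.append(sd if sd > 0 else 1.0)

    # Standardized copy; missing readings become 0 so they add nothing to dot products.
    z = [[0.0 if r[d] is None else (r[d] - means[d]) / stds[d]
          for d in range(n_feats)] for r in rows]

    def score(t, s):
        # Scaled dot-product similarity between time steps t and s over all features.
        return sum(a * b for a, b in zip(z[t], z[s])) / math.sqrt(n_feats)

    filled = [list(r) for r in rows]
    for t in range(n_steps):
        for d in range(n_feats):
            if rows[t][d] is not None:
                continue
            # Candidate steps: within `window` of t and with feature d observed.
            lo, hi = max(0, t - window), min(n_steps, t + window + 1)
            cands = [s for s in range(lo, hi) if rows[s][d] is not None]
            if not cands:
                filled[t][d] = means[d]  # fallback: no usable neighbour
                continue
            scores = [score(t, s) for s in cands]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(sc - top) for sc in scores]
            total = sum(weights)
            filled[t][d] = sum(w * rows[s][d] for w, s in zip(weights, cands)) / total
    return filled


# Hypothetical hourly readings for three pollutants; None marks a missing value.
readings = [
    [35.0, 60.0, 1.2],
    [40.0, None, 1.3],
    [None, 75.0, 1.5],
    [55.0, 80.0, None],
    [50.0, 78.0, 1.6],
]
filled = attention_impute(readings, window=2)
```

Because each filled value is a convex combination of observed readings of the same feature, it always stays within the range of the neighbours it attends to, and observed readings are never altered.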