• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (11): 2045-2052.

• Graphics and Images •

  • Supported by: China Postdoctoral Science Foundation (2020M673446)

A smoke recognition method based on CNN and Transformer feature fusion

FU Yan,YANG Xu,YE Ou   

  1. (College of Computer Science & Technology,Xi’an University of Science and Technology,Xi’an 710600,China)
  • Received:2023-08-15 Revised:2023-12-19 Accepted:2024-11-25 Online:2024-11-25 Published:2024-11-27



Abstract: Many current smoke recognition methods suffer from high false alarm rates, partly because most existing convolutional neural networks (CNNs) focus mainly on local information in smoke images during feature extraction and neglect their global features. This bias towards local information easily leads to misjudgments on variable and complex smoke images. Addressing this issue requires capturing the global features of smoke images more accurately, thereby improving the accuracy of smoke recognition methods. Therefore, this paper proposes TCF-Net, a dual-branch smoke recognition method that combines the Inception and Transformer structures. The Inception module is improved to enrich feature diversity while reducing channel redundancy. In addition, the self-attention mechanism of the Transformer is introduced, combining its ability to learn global context information with the CNN's capacity to learn local relative position information. During feature extraction, a feature coupling unit (FCU) is embedded to continuously exchange local features and global information between the two branches, maximizing the retention of both local and global information and enhancing the performance of the method. The proposed method classifies video frames into three states: black smoke, white smoke, and no smoke. Experimental results show that the improved network extracts smoke features more effectively, reducing the false alarm rate while raising accuracy to 97.8%, confirming the strong performance of the method.
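The dual-branch coupling described above can be sketched in a few lines. The snippet below is a minimal illustration of the general idea only; the shapes, random projections, and single-head attention are assumptions for demonstration and do not reproduce the paper's actual Inception branch or FCU design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumptions): CNN branch produces a feature map
# of shape (C, H, W); Transformer branch holds H*W tokens of width D.
C, H, W, D = 8, 4, 4, 16
fmap = rng.standard_normal((C, H, W))       # local features (CNN branch)
tokens = rng.standard_normal((H * W, D))    # global tokens (Transformer branch)

# 1) Couple local -> global: flatten the feature map into per-location
#    vectors, project them into token space, and add them to the tokens.
W_c2t = rng.standard_normal((C, D)) / np.sqrt(C)    # hypothetical projection
local_as_tokens = fmap.reshape(C, H * W).T @ W_c2t  # (H*W, D)
tokens = tokens + local_as_tokens

# 2) Single-head self-attention over the fused tokens (no masking),
#    so every token can aggregate global context.
scores = tokens @ tokens.T / np.sqrt(D)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)             # rows sum to 1
global_ctx = attn @ tokens                          # (H*W, D)

# 3) Couple global -> local: project the global context back to the
#    CNN branch's channel space and add it to the feature map.
W_t2c = rng.standard_normal((D, C)) / np.sqrt(D)    # hypothetical projection
fmap = fmap + (global_ctx @ W_t2c).T.reshape(C, H, W)

# Both branches keep their original shapes after one coupling step.
# fmap: (8, 4, 4), tokens: (16, 16)
```

In the actual TCF-Net, this exchange happens repeatedly during feature extraction rather than once, which is what lets the two branches preserve both local detail and global context.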

Key words: smoke recognition, convolutional neural network, self-attention mechanism, feature fusion