• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (06): 1072-1080.

• Graphics and Images •

Facial expression recognition based on network fusion to improve MobileViT

DENG Xiang-yu, PEI Hao-yuan, SHENG Ying

  1. (College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2023-04-26 Revised: 2023-10-13 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-18

Abstract: From the perspective of lightweight models, this paper proposes a facial expression recognition network that improves MobileViT through network fusion. The network fuses the multi-scale convolution PSConv with an attention mechanism through a residual structure to form the RAPSConv feature reconstruction module. This module extracts multi-scale features more efficiently at a fine-grained level and strengthens the expression of key features, which improves the network's representational ability and yields an end-to-end expression recognition network. In addition, to further reduce the differences among samples of the same expression class, Softmax Loss and Center Loss are used jointly as the loss function, which effectively reduces the misclassification rate. Experimental results show that, on the three natural-scene expression datasets FER2013, FER+, and RAF-DB, the improved network is more accurate than the baseline MobileViT, with accuracy gains of 1.73%, 2.18%, and 1.64%, respectively. The improved network has relatively few parameters and strong robustness, is easy to keep lightweight and to integrate, and is well suited to facial expression recognition in real-world scenarios.

Key words: facial expression recognition, MobileViT, multi-scale convolution PSConv, attention mechanism, network fusion, lightweight network
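
The joint use of Softmax Loss and Center Loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used form of Center Loss (half the squared distance between a feature vector and the center of its class) and a weighted sum of the two terms. The function names, the batch averaging, and the weight `lam` are hypothetical choices for illustration; the abstract does not specify how the authors combine or weight the two losses.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss: cross-entropy of softmax(logits) for the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance from a feature to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits_batch, features_batch, labels, centers, lam=0.01):
    """Batch-mean softmax loss plus lam times batch-mean center loss.

    lam is an assumed weighting factor, not a value taken from the paper.
    centers maps each class label to its current center vector.
    """
    n = len(labels)
    ce = sum(softmax_cross_entropy(z, y)
             for z, y in zip(logits_batch, labels)) / n
    cl = sum(center_loss(x, centers[y])
             for x, y in zip(features_batch, labels)) / n
    return ce + lam * cl
```

The softmax term separates the classes, while the center term pulls features of the same expression class toward a shared center, which is the intra-class compaction the abstract describes. In practice the centers would be updated during training rather than fixed as in this sketch.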