A real-time facial manipulation video detection model based on ensemble learning dual-stream neural network

Computer Engineering & Science, 2023, Vol. 45, Issue (03): 470-477.

• Computer Network and Information Security •

YUAN Ye1,2,3, HUANG Li-qing1,2,3, YE Feng1,2,3, HUANG Tian-qiang1,2,3, LUO Hai-feng1,2,3, XU Chao1,2,3

(1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117;
2. Digital Fujian Institute of Big Data Security Technology, Fuzhou 350117;
3. Fujian Provincial Engineering Research Center of Big Data Analysis and Application, Fuzhou 350117, China)

Received: 2022-10-27 Revised: 2022-12-25 Accepted: 2023-03-25 Online: 2023-03-25 Published: 2023-03-22

Funding: National Natural Science Foundation of China (62072106); Natural Science Foundation of Fujian Province (2020J01168, 2022J01190, 2022J01188); Fujian Provincial University-Industry Cooperation Project (2021H6004); Fujian Provincial Education and Research Project for Young and Middle-aged Teachers (JAT210053, JAT210051)

Abstract

Malicious face manipulation threatens social security and stability, so accurately detecting manipulated face videos is an important problem. To address the poor real-time performance of video detection models, this paper proposes a face manipulation video detection model based on an ensemble learning dual-stream recurrent neural network, and introduces the voting mechanism from ensemble learning. First, the model receives a small number of consecutive frames and extracts spatial features with a convolutional neural network, using central difference convolution to enhance manipulation artifacts in the spatial domain. Second, it computes differences between consecutive frames to enhance manipulation artifacts in the temporal domain, and extracts temporal features with a convolutional neural network. The spatial and temporal feature vectors of the two streams are then concatenated and passed to a recurrent neural network for further feature extraction. During this step, the per-frame feature information is retained as input to the subsequent auxiliary frame-level classifiers, while the final output of the recurrent neural network serves as input to the video-level discriminator. Finally, the voting mechanism of the ensemble model integrates the outputs of the multiple auxiliary frame-level discriminators and the video-level discriminator, and a weight hyperparameter γ balances the importance of the frame-level and video-level discriminators, which helps improve detection accuracy. On the FaceForensics++ dataset, the proposed model improves average accuracy by 0.4% and 1.0% over mainstream detection models. In addition, the proposed model needs only a small number of consecutive frames for manipulation detection, which improves its real-time performance.
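The abstract does not give the exact voting rule, so the following is only a minimal sketch of one way a weight γ could balance the two kinds of discriminators. It assumes each auxiliary frame-level head and the video-level head outputs a probability that the clip is fake, that the frame-level heads vote by averaging, and that γ weights the video-level head against that average. The function name `ensemble_vote`, the `threshold` parameter, and the averaging rule are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_vote(frame_probs, video_prob, gamma=0.5, threshold=0.5):
    """Combine frame-level and video-level fake probabilities.

    Assumed rule (not from the paper): the frame-level heads vote by
    averaging their probabilities, and gamma weights the video-level
    head against that average.
    """
    if not frame_probs:
        raise ValueError("need at least one frame-level probability")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    frame_vote = mean(frame_probs)
    score = gamma * video_prob + (1.0 - gamma) * frame_vote
    # Return the combined score and the binary decision (True = fake).
    return score, score >= threshold


# Four frame-level heads lean "fake" while the video-level head leans "real".
score, is_fake = ensemble_vote([0.9, 0.8, 0.7, 0.6], video_prob=0.4, gamma=0.5)
```

Under this assumed rule, γ = 1 relies on the video-level discriminator alone and γ = 0 relies only on the frame-level vote, so intermediate values trade one off against the other.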

Key words: Deepfake, convolutional neural network, recurrent neural network, voting mechanism, central difference convolution
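Central difference convolution, listed among the keywords, is commonly written as a vanilla convolution minus θ times the center pixel times the sum of the kernel weights. The sketch below illustrates that general formulation on a single-channel image in plain Python. The function name, the `theta` default, and the valid-mode (no padding) choice are assumptions for illustration; the layer configuration used in the paper is not stated on this page.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Valid-mode central difference convolution on a 2D list of numbers.

    out(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn)
    theta = 0 reduces to a plain sliding-window (cross-correlation) filter.
    """
    kh, kw = len(kernel), len(kernel[0])
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel sides must be odd so a center pixel exists")
    h, w = len(image), len(image[0])
    weight_sum = sum(sum(row) for row in kernel)
    out = []
    for i in range(h - kh + 1):
        out_row = []
        for j in range(w - kw + 1):
            # Ordinary weighted sum over the kernel window.
            vanilla = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            # Subtract the center-pixel term scaled by theta.
            center = image[i + kh // 2][j + kw // 2]
            out_row.append(vanilla - theta * center * weight_sum)
        out.append(out_row)
    return out


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
response = central_difference_conv2d(flat, ones, theta=1.0)
```

With θ = 1 a perfectly flat region produces a zero response, so only local intensity differences contribute to the output.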

Cite this article:

YUAN Ye, HUANG Li-qing, YE Feng, HUANG Tian-qiang, LUO Hai-feng, XU Chao. A real-time facial manipulation video detection model based on ensemble learning dual-stream neural network[J]. Computer Engineering & Science, 2023, 45(03): 470-477.

