  • Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue (3): 524-533.

• Artificial Intelligence and Data Mining

Optimization of speech enhancement based on mismatched negative latency

JI Chenguo, JIA Hairong, PEI Yijing, DUAN Shufei

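  1. (College of Electronic Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030619, China)
  • Received: 2023-12-29  Revised: 2024-05-05  Online: 2025-03-25  Published: 2025-04-02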
  • Funding:
    National Natural Science Foundation of China (12004275); Natural Science Foundation of Shanxi Province (20210302123186, 202403021211098)

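Abstract: Existing speech enhancement algorithms suffer from a mismatch between the loss functions they optimize and the metrics used to evaluate them. To address this, a speech metric based on an EEG component is combined with the loss function, which effectively improves the performance of the speech enhancement algorithm. First, it is verified that the latency of the mismatch negativity (MMN), an EEG component, can serve as an objective speech evaluation metric. On this basis, an MMN latency function is proposed and linked to the signal-to-noise ratio (SNR). This addresses the problem that the evaluation metrics commonly used in speech enhancement cannot be used directly as loss functions to optimize enhancement algorithms. Second, the latency function is trained jointly with the learning objective of a conventional neural network, so that the latency function is continually optimized during training. Finally, the latency function is applied to the discriminator loss of a generative adversarial network (GAN). This is combined with a Conformer, which effectively captures long-term dependencies while extracting local features in both the time and frequency dimensions. Experimental results show that optimizing the neural network with an objective speech metric derived from EEG-component evaluation effectively improves speech characteristics. The effectiveness of the proposed algorithm is verified in terms of enhanced speech quality, intelligibility, and distortion.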

Key words: speech enhancement; mismatch negativity; speech quality assessment; generative adversarial network
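The abstract names three ingredients without giving their formulas: an MMN latency function tied to SNR, joint training with a conventional learning objective, and a latency-based discriminator loss. The following is a minimal sketch under explicit assumptions. The logistic SNR-to-latency mapping, its constants (`LAT_MIN_MS`, `LAT_MAX_MS`, `mid_db`, `slope`), the loss weight `lam`, and all function names are hypothetical illustrations, not the authors' function. The discriminator loss follows the MetricGAN-style least-squares formulation, which is assumed here rather than confirmed by the abstract. Scalar discriminator outputs stand in for a real network.

```python
import math
from typing import Sequence

# Hypothetical constants: shortest/longest MMN latency (ms) assumed for the
# illustrative mapping below. These are NOT values reported in the paper.
LAT_MIN_MS = 150.0
LAT_MAX_MS = 250.0


def snr_db(clean: Sequence[float], enhanced: Sequence[float]) -> float:
    """SNR of `enhanced` relative to the clean reference, in dB."""
    signal = sum(c * c for c in clean)
    noise = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    eps = 1e-12  # avoids division by zero and log10(0)
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def latency_from_snr(snr: float, mid_db: float = 5.0, slope: float = 0.3) -> float:
    """Hypothetical latency function: latency falls logistically as SNR rises."""
    z = max(-700.0, min(700.0, slope * (snr - mid_db)))  # clamp to avoid exp overflow
    weight = 1.0 / (1.0 + math.exp(z))  # ~1 at low SNR, ~0 at high SNR
    return LAT_MIN_MS + (LAT_MAX_MS - LAT_MIN_MS) * weight


def latency_score(snr: float) -> float:
    """Map latency to a [0, 1] quality score (1 = shortest latency = best)."""
    return (LAT_MAX_MS - latency_from_snr(snr)) / (LAT_MAX_MS - LAT_MIN_MS)


def joint_loss(clean: Sequence[float], enhanced: Sequence[float], lam: float = 0.5) -> float:
    """Conventional MSE objective plus a latency-based penalty (weight `lam` assumed)."""
    mse = sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)
    return mse + lam * (1.0 - latency_score(snr_db(clean, enhanced)))


def discriminator_loss(d_clean: float, d_enhanced: float, target_score: float) -> float:
    """MetricGAN-style loss: D(clean) -> 1, D(enhanced) -> latency score of enhanced."""
    return (d_clean - 1.0) ** 2 + (d_enhanced - target_score) ** 2


def generator_adv_loss(d_enhanced: float) -> float:
    """The generator pushes the discriminator's predicted score toward 1."""
    return (d_enhanced - 1.0) ** 2


if __name__ == "__main__":
    clean = [math.sin(0.1 * n) for n in range(400)]
    noisy = [c + 0.3 * math.cos(1.7 * n) for n, c in enumerate(clean)]
    better = [c + 0.05 * math.cos(1.7 * n) for n, c in enumerate(clean)]
    for name, sig in (("noisy", noisy), ("better", better)):
        snr = snr_db(clean, sig)
        print(f"{name}: SNR={snr:.1f} dB, latency={latency_from_snr(snr):.1f} ms, "
              f"score={latency_score(snr):.3f}, joint loss={joint_loss(clean, sig):.4f}")
```

Because `latency_score` is bounded in [0, 1], it can serve directly as the discriminator's regression target, which is why MetricGAN-style methods normalize their metric. In a real system these losses would be computed with a differentiable tensor library, so that gradients reach the enhancement network.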
