• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

A voice conversion algorithm based on a multi-spectral feature generative adversarial network
(基于多谱特征生成对抗网络的语音转换算法)

ZHANG Xiao (张筱), ZHANG Wei (张巍), WANG Wen-hao (王文浩), WAN Yong-jing (万永菁)

  1. (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
  • Received: 2019-09-23 Revised: 2019-12-23 Online: 2020-05-25 Published: 2020-05-25

Abstract:

Voice conversion is widely used in education, entertainment, medicine, and other fields. To obtain high-quality converted speech, this paper proposes a voice conversion algorithm based on a multi-spectral feature generative adversarial network. A generative adversarial network converts the voiceprint images generated from spectral feature parameters, and a feature-level multimodal fusion technique lets the network learn multiple kinds of information from different feature domains. This improves the network's perception of speech signals and yields high-quality converted speech with good clarity and intelligibility. Experimental results show that the proposed algorithm clearly outperforms traditional algorithms on both subjective and objective evaluation metrics.

Key words: voice conversion, voiceprint, generative adversarial network, multi-spectral feature, cross-domain reconstruction error
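
The abstract does not say which spectral features are fused or how the fusion is wired into the network. As a rough illustration of what feature-level fusion can look like in general, the sketch below normalizes several spectral feature maps and stacks them as channels of one image-like array that a network could take as input. The function names `normalize` and `fuse_features`, the random placeholder features, and the choice of channel stacking are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def normalize(feature_map):
    """Scale one (frames, bins) feature map to zero mean and unit variance."""
    feature_map = np.asarray(feature_map, dtype=np.float64)
    std = feature_map.std()
    # Guard against constant maps, which would otherwise divide by zero.
    return (feature_map - feature_map.mean()) / (std if std > 0 else 1.0)


def fuse_features(feature_maps):
    """Feature-level fusion (assumed form): stack normalized maps as channels.

    Every map must share the same (frames, bins) shape. The result has shape
    (channels, frames, bins), i.e. a multi-channel image-like array.
    """
    shapes = {np.shape(m) for m in feature_maps}
    if len(shapes) != 1:
        raise ValueError("all feature maps must share one (frames, bins) shape")
    return np.stack([normalize(m) for m in feature_maps], axis=0)


# Placeholder data standing in for two different spectral feature maps
# computed over the same 128 frames and 64 frequency bins (hypothetical).
rng = np.random.default_rng(0)
feature_a = rng.normal(size=(128, 64))
feature_b = rng.normal(loc=3.0, scale=5.0, size=(128, 64))

fused = fuse_features([feature_a, feature_b])
print(fused.shape)  # (2, 128, 64)
```

Normalizing each map before stacking keeps a feature with a larger numeric range from dominating the others, which is one common reason to fuse at the feature level instead of combining the outputs of separate models.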