• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A voice conversion algorithm based on multi-spectral feature generative adversarial network

ZHANG Xiao, ZHANG Wei, WANG Wen-hao, WAN Yong-jing

  1.  (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
  • Received: 2019-09-23 Revised: 2019-12-23 Online: 2020-05-25 Published: 2020-05-25

Abstract:

Voice conversion is widely used in education, entertainment, medicine, and other fields. To obtain high-quality converted speech, this paper proposes a voice conversion algorithm based on a multi-spectral feature generative adversarial network. The algorithm uses a generative adversarial network to convert the voiceprint represented by spectral feature parameters. Feature-level multimodal fusion lets the network learn spectral feature information from multiple feature domains, which improves the network's perception of speech signals. The resulting converted speech is high in quality, with good clarity and intelligibility. Experimental results show that the proposed algorithm significantly outperforms traditional algorithms on both subjective and objective evaluation metrics.
 

Key words: voice conversion, voiceprint, generative adversarial network, multi-spectral feature, cross-domain reconstruction error
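
As an illustration of the feature-level fusion mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that fusion means normalizing each per-frame spectral feature stream and concatenating the streams frame by frame before they reach the network. The abstract does not specify the feature types, so the names `cepstral`, `envelope`, `normalize`, and `fuse_features`, along with the toy values, are hypothetical.

```python
from statistics import mean, pstdev


def normalize(frames):
    """Z-score each feature dimension across frames (the time axis)."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [mean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]  # avoid dividing by zero on constant dimensions
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def fuse_features(*streams):
    """Feature-level fusion: normalize each stream, then concatenate per frame."""
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    normalized = [normalize(s) for s in streams]
    # For each frame index, join that frame's vectors from every stream.
    return [sum(frame_parts, []) for frame_parts in zip(*normalized)]


# Hypothetical toy data: 3 frames, two spectral feature types per frame.
cepstral = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                 # assumed: 2 cepstral coefficients
envelope = [[0.1, 0.2, 0.3], [0.2, 0.2, 0.5], [0.3, 0.2, 0.7]]  # assumed: 3 envelope bins

fused = fuse_features(cepstral, envelope)
print(len(fused), len(fused[0]))
```

Normalizing each stream before concatenation keeps a feature type with a larger numeric range from dominating the fused vector; whether the paper applies this step is not stated in the abstract.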