• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

计算机工程与科学 (Computer Engineering & Science)


采用STRAIGHT模型和深度信念网络的语音转换方法

王民,苏利博,王稚慧,要趁红   

  1. (西安建筑科技大学信息与控制工程学院,陕西 西安 710055)
  • 收稿日期:2015-05-25 修回日期:2015-10-20 出版日期:2016-09-25 发布日期:2016-09-25
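  • Funding:

    Science and Technology Project Plan of the Ministry of Housing and Urban-Rural Development (2016-R2-045); 2014 Science and Technology Plan Project of Beilin District, Xi'an (GX1412)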

Voice conversion using STRAIGHT model and deep belief networks        

WANG Min, SU Li-bo, WANG Zhi-hui, YAO Chen-hong

  1. (School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China)
  • Received:2015-05-25 Revised:2015-10-20 Online:2016-09-25 Published:2016-09-25

摘要:

提出一种将STRAIGHT 模型和深度信念网络DBN相结合实现语音转换的方式。首先,通过STRAIGHT 模型提取出源说话人和目标说话人的语音频谱参数,用提取的频谱参数分别训练两个DBN 得到语音高阶空间的个性特征信息;然后,用人工神经网络ANN将两个具有高阶特征的空间连接并进行特征转换;最后,用基于目标说话人数据训练出的DBN 来对转换后的特征信息进行逆处理得到语音频谱参数,并用STRAIGHT 模型合成具有目标说话人个性化特征的语音。实验结果表明,采用此种方式获得的语音转换效果要比传统的采用GMM 实现语音转换更好,转换后的语音音质和相似度与目标语音更接近。

关键词: 语音转换, STRAIGHT 模型, 深度信念网络, 高阶空间

Abstract:

We propose a voice conversion method that combines the STRAIGHT model with deep belief networks (DBNs). First, the STRAIGHT model extracts the spectral parameters of the source and target speakers' speech, and these parameters are used to train two DBNs, one per speaker, which capture speaker-specific feature information in a high-order space. Second, an artificial neural network (ANN) connects the two high-order feature spaces and converts features from one to the other. Finally, the DBN trained on the target speaker's data performs the inverse processing on the converted features to recover spectral parameters, and the STRAIGHT model synthesizes speech that carries the target speaker's individual characteristics. Experimental results show that the proposed method outperforms the traditional GMM-based voice conversion method: the converted speech is closer to the target speech in both quality and similarity.
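The data flow described in the abstract (source DBN encoding, ANN mapping between the two high-order spaces, target DBN decoding) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the class `ToyDBN`, the function `convert_frame`, the layer sizes, the single-layer ANN, and the random untrained weights are all hypothetical, and the input frame merely stands in for STRAIGHT spectral parameters assumed to be normalized to [0, 1]. STRAIGHT analysis and synthesis, and all training, are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(vec, weights, bias):
    """Affine map plus sigmoid; weights[i][j] links input i to output j."""
    return [
        sigmoid(sum(v * row[j] for v, row in zip(vec, weights)) + bias[j])
        for j in range(len(bias))
    ]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


class ToyDBN:
    """Hypothetical stand-in for a trained DBN: stacked layers with tied
    weights, initialized randomly here (no RBM pre-training is performed)."""

    def __init__(self, sizes, rng):
        self.weights = [
            [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]
            for m, n in zip(sizes[:-1], sizes[1:])
        ]
        self.hid_bias = [[0.0] * n for n in sizes[1:]]
        self.vis_bias = [[0.0] * m for m in sizes[:-1]]

    def encode(self, visible):
        """Bottom-up pass: spectral frame -> high-order features."""
        for w, b in zip(self.weights, self.hid_bias):
            visible = layer(visible, w, b)
        return visible

    def decode(self, hidden):
        """Top-down pass with transposed weights: features -> spectral frame."""
        for w, a in zip(reversed(self.weights), reversed(self.vis_bias)):
            hidden = layer(hidden, transpose(w), a)
        return hidden


def convert_frame(frame, src_dbn, tgt_dbn, ann_w, ann_b):
    """Source frame -> source features -> target features -> target frame."""
    src_features = src_dbn.encode(frame)
    tgt_features = layer(src_features, ann_w, ann_b)  # single-layer ANN (assumed)
    return tgt_dbn.decode(tgt_features)


rng = random.Random(0)
src = ToyDBN([8, 16, 4], rng)  # sizes chosen only for illustration
tgt = ToyDBN([8, 16, 4], rng)
ann_w = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(4)]
ann_b = [0.0] * 4
frame = [rng.random() for _ in range(8)]  # placeholder for one spectral frame
converted = convert_frame(frame, src, tgt, ann_w, ann_b)
```

In a real system the two DBNs would be trained separately on each speaker's spectral parameters and the ANN on time-aligned source/target feature pairs; the sketch only shows how the three components connect at conversion time.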

Key words: voice conversion; STRAIGHT model; deep belief networks; high-order spaces