Low-resource multi-dialect Tibetan synthesis method based on Tibetan character components

Computer Engineering & Science, 2025, Vol. 47, Issue (8): 1503-1510.

WANG Jiawen1,2, GAO Dingguo1,2, NI Qiong1,2, BA Guo1,2

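(1. College of Information Science and Technology, Tibet University, Lhasa 850000, Tibet, China;
2. Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000, Tibet, China)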

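Received: 2024-02-05 Revised: 2024-05-29 Online: 2025-08-25 Published: 2025-08-27
Funding:
National Natural Science Foundation of China (62166038); Sichuan Science and Technology Program (2023YFQ0044); High-Level Talent Cultivation Program of Tibet University (2021-GSP-S126)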

Abstract

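Abstract: Tibetan speech synthesis is an important research direction in artificial intelligence and is significant for advancing Tibetan language information processing. To address three difficulties in Tibetan synthesis (scarce speech corpora, complex text, and dialect diversity), this paper first proposes a corpus processing method based on Tibetan character components to reduce the difficulty of text processing. It then adopts an end-to-end speech synthesis model and explores two low-resource schemes for multi-dialect Tibetan synthesis. Experimental results show that, when trained on a mixed dataset, the proposed method enables a single model to synthesize speech in multiple dialects and improves the naturalness and expressiveness of the speech, reaching an average mean opinion score (MOS) of 4.56.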

Key words: Tibetan character component, low-resource, multi-dialect, Tibetan, speech synthesis
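The abstract does not spell out how text is broken into character components, so the following is only a minimal sketch of one plausible reading: split the text into syllables at the tsheg mark, then treat each Unicode code point in a syllable as a component. The function names `classify` and `to_components`, the category labels, and the code point ranges used for grouping are assumptions made for illustration and are not taken from the paper.

```python
# Sketch only: component-level decomposition of Tibetan text by Unicode code point.
# Names, labels, and ranges are illustrative assumptions, not the paper's method.

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def classify(ch):
    """Return a coarse component label for one code point."""
    cp = ord(ch)
    if 0x0F40 <= cp <= 0x0F6C:
        return "base"           # full-form consonant letters
    if 0x0F90 <= cp <= 0x0FBC:
        return "subjoined"      # consonants stacked under a base letter
    if 0x0F71 <= cp <= 0x0F84:
        return "vowel_or_mark"  # dependent vowel signs and related marks
    return "other"


def to_components(text):
    """Split text into syllables at the tsheg, then into labeled components."""
    syllables = [s for s in text.split(TSHEG) if s]
    return [[(ch, classify(ch)) for ch in syl] for syl in syllables]


# Two syllables written with escapes: U+0F56 U+0F7C U+0F51, tsheg, U+0F66 U+0F90 U+0F51
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51"
for syl in to_components(sample):
    print([f"U+{ord(ch):04X}:{kind}" for ch, kind in syl])
```

A component-level input of this kind keeps the symbol inventory small compared with whole syllables, which is one reason such a representation might help when training data is scarce; whether the paper uses exactly this granularity is not stated in the abstract.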

WANG Jiawen, GAO Dingguo, NI Qiong, BA Guo. Low-resource multi-dialect Tibetan synthesis method based on Tibetan character components[J]. Computer Engineering & Science, 2025, 47(8): 1503-1510.
