• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (06): 1114-1120.

• Artificial Intelligence and Data Mining •

A Multi-Feature Fusion Method for Automatic Tibetan Dialect Identification

GAZANG Cairang (尕藏才让)1,2, GAO Dingguo (高定国)1,2, RENQING Dongzhu (仁青东主)1

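  1. (1. College of Information Science and Technology, Tibet University, Lhasa 850000, Tibet; 2. Demonstration Base for Innovative Talent Cultivation in Tibetan Language Information Technology, Lhasa 850000, Tibet)
  • Received:2024-08-24 Revised:2024-08-29 Online:2025-06-25 Published:2025-06-26
  • Supported by:
    Lhasa Science and Technology Plan Project (LSKJ202405)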

An automatic Tibetan dialect identification method integrating multiple features

GAZANG Cairang1,2, GAO Dingguo1,2, RENQING Dongzhu1

  1. (1. College of Information Science and Technology, Tibet University, Lhasa 850000, China;
    2. Demonstration Base for Innovative Talent Cultivation in Tibetan Language Information Technology, Lhasa 850000, China)
  • Received:2024-08-24 Revised:2024-08-29 Online:2025-06-25 Published:2025-06-26

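Abstract: Tibetan has many dialects with marked internal differences, so research on automatic Tibetan dialect identification is valuable in linguistics, speech information processing, criminal investigation, and public security. At present, common approaches to Tibetan dialect identification rely on various acoustic features and on deep learning models trained on large amounts of data. However, traditional acoustic features cannot fully represent the subtle differences among Tibetan dialects, and deep learning struggles to achieve high-accuracy dialect identification on small datasets. To address this problem, an automatic Tibetan dialect identification method that fuses multiple features is proposed. The method combines Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and short-time energy (STE) values that carry voiced/unvoiced information into a single multi-information fused feature for dialect identification, and uses a bidirectional long short-term memory (Bi-LSTM) network to identify major Tibetan dialects such as U-Tsang, Amdo, and Kham. Experimental results show that the proposed multi-feature fusion method improves identification accuracy by 10.73%, 10.78%, and 59.48% over the single-feature MFCC, GFCC, and STE methods, respectively, and reaches a final identification accuracy of 94.89%, which supports the effectiveness and practicality of the proposed method.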

Keywords: multi-feature fusion, Tibetan dialect, automatic identification

Abstract: Tibetan dialects are numerous and differ significantly from one another, which makes research on their automatic identification valuable in linguistics, speech processing, criminal investigation, and public security. Currently, common methods for Tibetan dialect identification rely on various acoustic features and on deep learning models trained on large amounts of data. However, traditional acoustic features fail to capture the subtle distinctions among Tibetan dialects, and deep learning struggles to achieve high-accuracy dialect identification on small-scale datasets. To address this issue, this paper proposes an automatic Tibetan dialect identification method that integrates multiple features. The method combines Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and short-time energy (STE) values that carry voiced/unvoiced information to construct a fused multi-information feature. A bidirectional long short-term memory (Bi-LSTM) network is then used to identify major Tibetan dialects such as U-Tsang, Amdo, and Kham. Experimental results show that the proposed multi-feature fusion method improves accuracy by 10.73%, 10.78%, and 59.48% over single-feature methods using MFCC, GFCC, and STE, respectively, and ultimately achieves an identification accuracy of 94.89%. These results support the effectiveness and practicality of the proposed method.

Keywords: multi-feature fusion, Tibetan dialect, automatic identification
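
The feature fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three features are combined, so this example assumes simple frame-level concatenation; it also assumes the MFCC and GFCC matrices have already been computed by some other tool, and that STE is the sum of squared samples in each frame. The function names `frame_signal`, `short_time_energy`, and `fuse_features` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(signal) - frame_len
    return [list(signal[s:s + frame_len]) for s in range(0, last_start + 1, hop)]


def short_time_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy: sum of squared samples in each frame (assumed definition)."""
    return [sum(x * x for x in frame) for frame in frame_signal(signal, frame_len, hop)]


def fuse_features(
    mfcc: Sequence[Sequence[float]],
    gfcc: Sequence[Sequence[float]],
    ste: Sequence[float],
) -> List[List[float]]:
    """Concatenate per-frame MFCC, GFCC, and STE into one vector per frame.

    Assumption: fusion is plain concatenation. The three inputs are truncated
    to their common number of frames so that rows stay aligned.
    """
    n_frames = min(len(mfcc), len(gfcc), len(ste))
    return [list(mfcc[i]) + list(gfcc[i]) + [ste[i]] for i in range(n_frames)]


if __name__ == "__main__":
    # Toy data only: real MFCC/GFCC rows would come from an audio feature extractor.
    toy_signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ste = short_time_energy(toy_signal, frame_len=4, hop=2)
    mfcc = [[0.1, 0.2], [0.3, 0.4]]  # placeholder values, 2 frames x 2 coefficients
    gfcc = [[0.5], [0.6]]            # placeholder values, 2 frames x 1 coefficient
    for row in fuse_features(mfcc, gfcc, ste):
        print(row)
```

Each fused row is one time step, so the resulting sequence of rows is the kind of input a sequence classifier such as a Bi-LSTM would consume. In practice the features would likely need normalization before fusion, since raw STE values can be on a very different scale from cepstral coefficients.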