• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (06): 1114-1120.

• Artificial Intelligence and Data Mining •

An automatic Tibetan dialect identification method integrating multiple features

GAZANG Cairang1,2, GAO Dingguo1,2, RENQING Dongzhu1

  (1. College of Information Science and Technology, Tibet University, Lhasa 850000;
   2. Demonstration Base for Innovative Talent Cultivation in Tibetan Language Information Technology, Lhasa 850000, China)
  • Received: 2024-08-24 Revised: 2024-08-29 Online: 2025-06-25 Published: 2025-06-26

Abstract: Tibetan dialects are numerous and differ significantly from one another, so research on their automatic identification is valuable in speech processing, criminal investigation, public security, and linguistics. Common methods for Tibetan dialect identification currently rely on various acoustic features and on deep learning models trained on large datasets. However, traditional acoustic features fail to capture the subtle distinctions among Tibetan dialects, and deep learning struggles to achieve high-precision dialect recognition on small-scale datasets. To address these issues, this paper proposes an automatic Tibetan dialect identification method that integrates multiple features. The method combines Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and short-time energy (STE) values containing voicing information to construct an information-fused feature system. A bidirectional long short-term memory (Bi-LSTM) network is then employed to identify major Tibetan dialects such as U-Tsang, Amdo, and Kham. Experimental results show that the proposed multi-feature fusion method improves accuracy by 10.73%, 10.78%, and 59.48% compared to single-feature methods using MFCC, GFCC, and short-time energy, respectively, and achieves a recognition accuracy of 94.89%. These results support the effectiveness and practicality of the proposed method.

Key words: multi-feature fusion, Tibetan dialect, automatic recognition
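
The abstract does not state how the three feature streams are combined. As a minimal sketch, assuming frame-level concatenation (one common way to fuse features, not necessarily the authors' scheme), the following Python computes short-time energy per frame and appends it to per-frame MFCC and GFCC vectors. The function names, the frame length and hop, and the all-zero MFCC/GFCC placeholders are hypothetical; in practice the coefficients would come from a real feature extractor, and the fused sequence would be the input to the Bi-LSTM classifier, which is not shown here.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D sample list into overlapping frames (trailing partial frame dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frames):
    """Short-time energy: the sum of squared samples within each frame."""
    return [sum(x * x for x in frame) for frame in frames]


def fuse_features(mfcc, gfcc, ste):
    """Concatenate per-frame MFCC, GFCC, and STE into one vector per frame.

    Assumption: fusion by simple concatenation; the source abstract does not
    specify the fusion scheme.
    """
    if not (len(mfcc) == len(gfcc) == len(ste)):
        raise ValueError("all feature streams must have the same number of frames")
    return [list(m) + list(g) + [e] for m, g, e in zip(mfcc, gfcc, ste)]


# Toy input: 400 samples of a 440 Hz sine at an assumed 8 kHz sampling rate.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
frames = frame_signal(samples, frame_len=200, hop=100)
ste = short_time_energy(frames)

# Placeholder coefficients (hypothetical): 13 values per frame for each stream.
mfcc = [[0.0] * 13 for _ in frames]
gfcc = [[0.0] * 13 for _ in frames]

fused = fuse_features(mfcc, gfcc, ste)
print(len(fused), len(fused[0]))  # 3 frames, 27 values per frame
```

Concatenation keeps the frame alignment between streams, which is why the sketch rejects inputs with mismatched frame counts.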