• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2024, Vol. 46 ›› Issue (05): 937-944.

• Artificial Intelligence and Data Mining •

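  • Funding:
    National Natural Science Foundation of China (62376111, U23A20388, U21B2027, 62366027); Yunnan Key Research and Development Program (202303AP140008, 202302AD080003, 202401BC070021, 202103AA080015); Yunnan Science and Technology Talent and Platform Program (202105AC160018)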

An unsupervised phoneme segmentation method for Lao language with multi-feature interaction fusion

LI Xin-jie1,2,WANG Wen-jun1,2,DONG Ling1,2,LAI Hua1,2,YU Zheng-tao1,2,GAO Sheng-xiang1,2   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2023-09-04; Revised: 2023-10-20; Accepted: 2024-05-25; Online: 2024-05-25; Published: 2024-05-30

Abstract: Existing methods give too little consideration to tonal variation in Lao and to the diversity of audio, which leads to inaccurate phoneme segmentation. To address this problem, this paper proposes an unsupervised phoneme segmentation method for Lao based on multi-feature interaction fusion. First, self-supervised features, spectral features, and pitch features are encoded independently, which avoids the limitations of relying on any single feature. Second, these independent features are progressively fused with an attention mechanism, so that the model captures Lao tonal variation and phoneme boundary information more comprehensively. Finally, a learnable framework is adopted to optimize the phoneme segmentation model. Experimental results show that, compared with the baseline methods, the proposed method improves the R-value by 27.88% on the Lao phoneme segmentation task.


Keywords: unsupervised learning, feature fusion, Lao language, phoneme segmentation, speech representation
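
For readers unfamiliar with the R-value reported in the abstract, the following is a minimal sketch of how this metric is commonly computed in the phoneme segmentation literature. It is not taken from the paper: the function names `match_boundaries` and `r_value`, the greedy one-to-one matching, and the 20 ms tolerance are assumptions made for illustration, and the paper's own evaluation settings may differ.

```python
import math


def match_boundaries(ref, hyp, tol=0.02):
    """Count hypothesized boundaries that fall within `tol` seconds of a
    reference boundary, using greedy one-to-one matching.
    The 20 ms default tolerance is an assumption, not a value from the paper."""
    ref, hyp = sorted(ref), sorted(hyp)
    hits, i, j = 0, 0, 0
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1  # spurious hypothesis boundary, skip it
        else:
            i += 1  # missed reference boundary, skip it
    return hits


def r_value(n_hits, n_ref, n_hyp):
    """R-value from hit count, reference count, and hypothesis count.
    Requires n_ref > 0."""
    hit_rate = n_hits / n_ref            # recall
    over_seg = n_hyp / n_ref - 1.0       # over-segmentation (recall / precision - 1)
    r1 = math.sqrt((1.0 - hit_rate) ** 2 + over_seg ** 2)
    r2 = (-over_seg + hit_rate - 1.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 2.0


# Hypothetical boundary times in seconds, for illustration only.
reference = [0.10, 0.25, 0.40]
predicted = [0.11, 0.26, 0.55]
hits = match_boundaries(reference, predicted)
score = r_value(hits, len(reference), len(predicted))
```

An R-value of 1 corresponds to a segmentation that hits every reference boundary with no extra boundaries. Unlike the F1 score, the metric explicitly penalizes over-segmentation, so a model cannot raise it simply by predicting more boundaries.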