• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (06): 1121-1127.

• Artificial Intelligence and Data Mining •

Phoneme-fused text error correction for Burmese speech recognition

陈璐1,2,董凌1,2,王文君1,2,王剑1,2,余正涛1,2,高盛祥1,2   

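  1. (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan; 2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan)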

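  • Received: 2023-09-04 Revised: 2023-10-30 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-19
  • Funding:
    National Natural Science Foundation of China (U21B2027, 61972186); Yunnan High-Tech Industry Development Project (201606); Yunnan Provincial Major Science and Technology Special Plan (202103AA080015, 202302AD080003); Yunnan Provincial Basic Research Program (202001AS070014); Yunnan Provincial Reserve Talents for Academic and Technical Leaders (202105AC160018)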

Text error correction of Burmese speech recognition based on phoneme fusion

CHEN Lu1,2, DONG Ling1,2, WANG Wen-jun1,2, WANG Jian1,2, YU Zheng-tao1,2, GAO Sheng-xiang1,2

  1. (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;
    2. Key Laboratory of Artificial Intelligence in Yunnan Province,
    Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2023-09-04 Revised: 2023-10-30 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-19

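Abstract: Burmese speech recognition transcripts contain many homophone and spacing errors, and correcting erroneous characters with general-purpose textual semantic information alone locates and corrects these errors inaccurately. Because Burmese is a tonal language whose phonemes carry tone information, this paper proposes a phoneme-fused method for correcting Burmese speech recognition text. A parameter-sharing strategy jointly models the transcribed text and its phonemes, and the phoneme information helps detect and correct Burmese homophone and spacing errors. Experiments show that, compared with the ConvSeq2Seq baseline, the proposed method improves F1 on the Burmese speech recognition error correction task by 85.97%, reaching 79.15%.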

Key words: Burmese, speech recognition text error correction, phoneme, shared parameters, BERT

Abstract: Text produced by Burmese speech recognition contains a large number of homophone and space errors. General methods correct erroneous characters using textual semantic information, but they are inaccurate at locating and correcting Burmese space and homophone errors. Considering that Burmese is a tonal language with tone information embedded in its phonemes, this paper proposes a method for correcting errors in Burmese speech recognition text that incorporates phonemes. A parameter sharing strategy is used to jointly model the transcribed texts and their phonemes, and phoneme information is used to assist in detecting and correcting Burmese homophone and space errors. Experimental results show that, compared with the ConvSeq2Seq method, the proposed method increases the F1 value on the Burmese speech recognition correction task by 85.97%, reaching 79.15%.

Key words: Burmese language, speech recognition text correction, phoneme, shared parameter, bidirectional encoder representations from transformers (BERT)
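
The reported F1 figures can be sanity-checked with a little arithmetic. Since the final F1 is 79.15%, the 85.97% increase cannot be an absolute gain in percentage points, so the sketch below assumes it is a relative gain over the ConvSeq2Seq baseline. The baseline value it prints is inferred under that assumption and is not stated in the abstract; the function name `implied_baseline` is hypothetical.

```python
# Assumption: "increased by 85.97%" is a relative gain over the baseline F1,
# i.e. final = baseline * (1 + gain / 100). The abstract does not state the
# baseline F1, so the value computed here is only an inference.

def implied_baseline(final_score: float, relative_gain_pct: float) -> float:
    """Return the baseline b such that b * (1 + gain/100) == final_score."""
    return final_score / (1.0 + relative_gain_pct / 100.0)

final_f1 = 79.15   # F1 (%) reported for the proposed method
gain_pct = 85.97   # reported improvement (%), read here as relative

baseline_f1 = implied_baseline(final_f1, gain_pct)
print(f"Implied ConvSeq2Seq F1: {baseline_f1:.2f}%")  # prints 42.56%
```

Under this reading, the baseline F1 would be roughly 42.56%, which corresponds to an absolute gain of about 36.6 percentage points.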