Text error correction of Burmese speech recognition based on phoneme fusion

Computer Engineering & Science, 2024, Vol. 46, Issue (06): 1121-1127.

CHEN Lu1,2, DONG Ling1,2, WANG Wen-jun1,2, WANG Jian1,2, YU Zheng-tao1,2, GAO Sheng-xiang1,2

(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;
2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500, China)

Received: 2023-09-04 Revised: 2023-10-30 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-19

Funding:
National Natural Science Foundation of China (U21B2027, 61972186); Yunnan High-Tech Industry Development Project (201606); Yunnan Provincial Major Science and Technology Special Program (202103AA080015, 202302AD080003); Yunnan Fundamental Research Program (202001AS070014); Yunnan Reserve Talents for Academic and Technical Leaders (202105AC160018)

Abstract

Abstract: Burmese speech recognition transcripts contain a large number of homophone and spacing errors. General-purpose methods correct erroneous characters using textual semantic information alone, so they locate and correct Burmese spacing and homophone errors inaccurately. Because Burmese is a tonal language whose phonemes carry tone information, this paper proposes a text error correction method for Burmese speech recognition that fuses phonemes. A parameter-sharing strategy jointly models the transcribed text and its phonemes, and the phoneme information assists in detecting and correcting Burmese homophone and spacing errors. Experimental results show that, compared with the ConvSeq2Seq baseline, the proposed method increases the F1 score on the Burmese speech recognition error correction task by 85.97%, reaching 79.15%.
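The reported gain can only be read as a relative improvement, since an absolute gain of 85.97 points could not end at 79.15%. Under that reading, a quick check recovers the baseline F1 that the two figures imply. The value below is derived from the abstract's numbers and is not reported in it; `implied_baseline` is a hypothetical helper name.

```python
def implied_baseline(final_f1, relative_gain_pct):
    """Return the baseline score b such that b * (1 + gain/100) == final_f1."""
    return final_f1 / (1.0 + relative_gain_pct / 100.0)


# Figures taken from the abstract: final F1 of 79.15%, relative gain of 85.97%.
baseline_f1 = implied_baseline(final_f1=79.15, relative_gain_pct=85.97)
print(f"implied ConvSeq2Seq F1: {baseline_f1:.2f}%")  # prints 42.56%
```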

Key words: Burmese language, speech recognition text correction, phoneme, shared parameters, bidirectional encoder representations from transformers (BERT)
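The parameter-sharing idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: a single dense layer stands in for the encoder (the keywords mention BERT, which is not reproduced here), the embeddings are random rather than learned, the tokens `syl_A`, `syl_B` and `ph_X` are placeholders, and the one-phoneme-string-per-syllable alignment and the additive fusion are assumptions made for illustration.

```python
import math
import random


class SharedLayer:
    """A single dense layer; the SAME weights process both input streams."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                        for _ in range(dim)]

    def __call__(self, vec):
        return [math.tanh(sum(w * x for w, x in zip(row, vec)))
                for row in self.weights]


def make_embeddings(vocab, dim, seed):
    """Random lookup table standing in for learned embeddings."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def encode_jointly(syllables, phonemes, text_emb, phone_emb, shared):
    """Fuse the text and phoneme views of each syllable via one shared layer."""
    if len(syllables) != len(phonemes):
        raise ValueError("sketch assumes one phoneme string per syllable")
    fused = []
    for syl, ph in zip(syllables, phonemes):
        h_text = shared(text_emb[syl])    # text stream
        h_phone = shared(phone_emb[ph])   # phoneme stream, same weights
        fused.append([a + b for a, b in zip(h_text, h_phone)])  # assumed fusion
    return fused


DIM = 4
text_emb = make_embeddings(["syl_A", "syl_B"], DIM, seed=1)
phone_emb = make_embeddings(["ph_X"], DIM, seed=2)
shared = SharedLayer(DIM)

# Two different written syllables with the same placeholder pronunciation.
fused = encode_jointly(["syl_A", "syl_B"], ["ph_X", "ph_X"],
                       text_emb, phone_emb, shared)
print(len(fused), len(fused[0]))  # prints: 2 4
```

In this sketch the two homophones receive an identical phoneme component and differ only in their text component, which is the kind of signal a downstream detector could use to flag homophone confusions.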

CHEN Lu, DONG Ling, WANG Wen-jun, WANG Jian, YU Zheng-tao, GAO Sheng-xiang. Text error correction of Burmese speech recognition based on phoneme fusion[J]. Computer Engineering & Science, 2024, 46(06): 1121-1127.

