An unsupervised phoneme segmentation method for Lao language with multi-feature interaction fusion

Computer Engineering & Science, 2024, Vol. 46, Issue (05): 937-944.

LI Xin-jie1,2, WANG Wen-jun1,2, DONG Ling1,2, LAI Hua1,2, YU Zheng-tao1,2, GAO Sheng-xiang1,2

(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)

Received: 2023-09-04 Revised: 2023-10-20 Accepted: 2024-05-25 Online: 2024-05-25 Published: 2024-05-30

Funding:
National Natural Science Foundation of China (62376111, U23A20388, U21B2027, 62366027); Key Research and Development Program of Yunnan Province (202303AP140008, 202302AD080003, 202401BC070021, 202103AA080015); Yunnan Science and Technology Talent and Platform Program (202105AC160018)

Abstract

Abstract: Existing methods give insufficient consideration to the tone changes of the Lao language and to audio diversity, which leads to inaccurate phoneme segmentation. To address this problem, this paper proposes an unsupervised phoneme segmentation method for Lao with multi-feature interaction fusion. First, self-supervised features, spectral features, and pitch features are encoded independently, which avoids the shortcomings of relying on a single feature. Second, the independent features are progressively fused with an attention mechanism, so that the model captures Lao tone changes and phoneme boundary information more comprehensively. Finally, a learnable framework is used to optimize the phoneme segmentation model. Experimental results show that, compared with the baseline methods, the proposed method improves the R-value by 27.88% on the Lao phoneme segmentation task.

Key words: unsupervised learning, feature fusion, Lao language, phoneme segmentation, speech representation
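The abstract reports its result as an R-value but does not define the metric. The sketch below is not the authors' evaluation code. It assumes the definition commonly used in the segmentation literature, which combines the hit rate (HR) with the over-segmentation rate (OS). The 20 ms tolerance, the greedy one-to-one matching, and the names `count_hits` and `r_value` are also assumptions made for illustration.

```python
import math


def count_hits(ref, hyp, tol=0.02):
    """Count predicted boundaries lying within `tol` seconds of a reference
    boundary. Each reference boundary is matched at most once (greedy)."""
    used = set()
    hits = 0
    for h in hyp:
        for i, r in enumerate(ref):
            if i not in used and abs(h - r) <= tol:
                used.add(i)
                hits += 1
                break
    return hits


def r_value(ref, hyp, tol=0.02):
    """R-value from boundary times in seconds (assumed definition):
    R = 1 - (|r1| + |r2|) / 2, with HR and OS expressed as fractions."""
    if not ref:
        raise ValueError("reference boundary list must not be empty")
    hits = count_hits(ref, hyp, tol)
    hr = hits / len(ref)             # hit rate (recall)
    os_ = len(hyp) / len(ref) - 1.0  # over-segmentation rate
    r1 = math.sqrt((1.0 - hr) ** 2 + os_ ** 2)
    r2 = (-os_ + hr - 1.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 2.0


if __name__ == "__main__":
    reference = [0.10, 0.25, 0.40, 0.60]  # hypothetical boundary times
    predicted = [0.11, 0.26, 0.41, 0.59]
    print(r_value(reference, predicted))
```

Under this definition a perfect segmentation scores 1, and the score drops for both missed boundaries and excess boundaries, so a model cannot raise it simply by predicting more boundaries.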

LI Xin-jie, WANG Wen-jun, DONG Ling, LAI Hua, YU Zheng-tao, GAO Sheng-xiang. An unsupervised phoneme segmentation method for Lao language with multi-feature interaction fusion[J]. Computer Engineering & Science, 2024, 46(05): 937-944.

