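Corpus construction for Tibetan voiceprint recognition

Computer Engineering & Science

ZHOU Yan, Shereb Dorje

(Research Center of Tibetan Information Technology, Tibet University, Lhasa 850000, China)

Received: 2017-06-15 Revised: 2017-12-08 Online: 2018-11-25 Published: 2018-11-25
Funding:
Natural Science Foundation of Tibet Autonomous Region (2015ZR145); National Natural Science Foundation of China (61165010)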

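Abstract:

Research on Tibetan voiceprint recognition has only just begun, and a corpus to support it is urgently needed. Drawing on the characteristics of the Tibetan language, we design and build a corpus for Tibetan voiceprint recognition. The corpus comprises a text-dependent part and a text-independent part. The text material is drawn from newspapers and periodicals, literature, education, science and technology, Buddhist studies, history, and the "five sciences" (五明) of traditional culture. The recordings were made by 50 speakers from several different Tibetan dialect regions and yield 9,500 speech recordings, laying a foundation for research on Tibetan voiceprint recognition.

Key words: Tibetan, voiceprint recognition, corpus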


ZHOU Yan, Shereb Dorje. Corpus construction for Tibetan voiceprint recognition[J]. Computer Engineering & Science.