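• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science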

• Artificial Intelligence and Data Mining •

Corpus construction for Tibetan voiceprint recognition

ZHOU Yan, Shereb Dorje

  1. (Research Center of Tibetan Information Technology, Tibet University, Lhasa 850000, China)
  • Received: 2017-06-15 Revised: 2017-12-08 Online: 2018-11-25 Published: 2018-11-25
  • Supported by:

    Natural Science Foundation of Tibet Autonomous Region (2015ZR145); National Natural Science Foundation of China (61165010)

Abstract:

Research on Tibetan voiceprint recognition has only just begun, and a corpus built for this purpose is urgently needed. Taking the characteristics of the Tibetan language into account, we design and build a corpus for Tibetan voiceprint recognition. The corpus consists of a text-dependent part and a text-independent part. Its texts are drawn from newspapers and news reports, literature, education, science and technology, Buddhism, history, and the five sciences of traditional Tibetan culture. The recordings were made by 50 speakers from several different Tibetan dialect regions and comprise 9,500 speech recordings, laying a foundation for research on Tibetan voiceprint recognition.

Key words: Tibetan, voiceprint recognition, corpus