• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (02): 395-400.

• Articles

Construction of a graded spoken interaction corpus of Mandarin Chinese

WANG Yuelong (王跃龙)

  1. (College of Humanities, Huaqiao University, Quanzhou, Fujian 362021, China)
  • Received: 2015-10-13 Revised: 2015-12-16 Online: 2016-02-25 Published: 2016-02-25
  • Supported by:

    Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (Z1534014); Huaqiao University Research Start-up Fund for High-Level Talents (13SKBS219)

Abstract:

This paper describes the construction of a graded spoken interaction (SI) corpus of Mandarin Chinese, the first graded SI corpus in China. The corpus records how students actually interact orally in a test setting. It was compiled from recorded conversations among more than 1,200 students and runs to more than 3,000 minutes, with samples ranging from Grade 1 of primary school to Grade 3 of senior high school. It gives SI researchers transcribed and annotated authentic data, on which quantitative analysis of spoken interaction can be based. In addition, rather than following the usual practice of grading by task difficulty, the corpus grades interaction by conversational features, drawing on Conversation Analysis (CA), as a reference for researchers. It should also inform the establishment of SI grading standards and the compilation of interaction textbooks.

Key words: spoken interaction; graded corpus; grading standard