• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (12): 2324-2330.


Recognition of prosodic phrases using the AdaBoost-SVM ensemble algorithm and chunk information

QIAN Yili1,2, FENG Zhiru1

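  1. (1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi;
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi)
  • Received: 2015-08-10 Revised: 2015-10-19 Online: 2015-12-25 Published: 2015-12-25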
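  • Supported by:

    National Natural Science Foundation of China (61175067); National Natural Science Foundation of China Youth Fund (61005053, 61100138); Shanxi Provincial Science and Technology Infrastructure Platform Construction Project (20150910010102); Shanxi Provincial Youth Science and Technology Research Fund (20120210121); Shanxi Provincial Research Project for Returned Overseas Scholars (2013022)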

Recognition of Chinese prosodic phrases based on AdaBoost-SVM algorithm and chunk information

QIAN Yili1,2,FENG Zhiru1   

  1. (1.School of Computer & Information Technology,Shanxi University,Taiyuan 030006;
    2.Key Laboratory of Computational Intelligence and
    Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan 030006,China)
  • Received:2015-08-10 Revised:2015-10-19 Online:2015-12-25 Published:2015-12-25

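Abstract:

We propose a method for recognizing Chinese prosodic phrases that is based on Chinese chunk structure and uses the AdaBoost-SVM ensemble learning algorithm. First, the corpus undergoes automatic word segmentation, part-of-speech tagging, and initial chunk tagging; merging rules are then derived from the binding tightness between chunks and applied to the initial chunks to obtain the final chunk structure. Second, a Chinese prosodic phrase recognition model is built on this chunk structure with the AdaBoost-SVM ensemble algorithm. In addition, several algorithms are used to build models both with and without chunk information. Comparative experiments show that chunks, which represent shallow syntactic information, make a positive and effective contribution to prosodic phrase recognition, and that the model built with the AdaBoost-SVM ensemble algorithm performs better.

Key words: Chinese chunk; AdaBoost-SVM; prosodic phrase; recognition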

Abstract:

We propose a method for recognizing Chinese prosodic phrases based on chunks and the AdaBoost-SVM algorithm. First, initial chunks are marked on a corpus that has undergone automatic word segmentation and part-of-speech tagging, and they are then merged using rules based on the closeness between initial chunks. Second, a Chinese prosodic phrase recognition model is constructed from the chunk structure with the AdaBoost-SVM ensemble algorithm. We also use various algorithms to build models that do or do not use chunk information. Comparative experimental results show that chunks, which carry shallow syntactic information, make a positive and effective contribution to Chinese prosodic phrase recognition, and that the AdaBoost-SVM model performs better.

Key words: Chinese chunk; AdaBoost-SVM; prosodic phrase; recognition
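
The ensemble step named in the abstract, AdaBoost with SVM base learners, can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the authors' implementation: the base learner is a linear SVM fitted by plain subgradient descent on a weighted hinge loss, the two-dimensional feature vectors are hypothetical stand-ins for the chunk and part-of-speech features, and a label of +1 is taken to mean that a prosodic phrase boundary follows the word.

```python
import math


def train_linear_svm(X, y, sample_weight, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by full-batch subgradient descent on a weighted hinge loss."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi, di in zip(X, y, sample_weight):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                for j in range(dim):
                    grad_w[j] -= di * yi * xi[j]
                grad_b -= di * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(model, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def adaboost_svm(X, y, rounds=5):
    """Discrete AdaBoost: reweight samples each round and retrain the SVM."""
    n = len(X)
    dist = [1.0 / n] * n  # sample distribution, starts uniform
    ensemble = []
    for _ in range(rounds):
        model = train_linear_svm(X, y, dist)
        preds = [svm_predict(model, xi) for xi in X]
        err = sum(d for d, p, yi in zip(dist, preds, y) if p != yi)
        if err >= 0.5:
            break  # base learner is no better than chance
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, model))
        if err == 0.0:
            break  # perfect base learner, further rounds add nothing
        # Raise the weight of misclassified samples, lower the rest.
        dist = [d * math.exp(-alpha * yi * p) for d, yi, p in zip(dist, y, preds)]
        total = sum(dist)
        dist = [d / total for d in dist]
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote of the base learners."""
    score = sum(alpha * svm_predict(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: +1 = boundary after the word, -1 = no boundary.
X_toy = [(2.0, 1.0), (3.0, 2.0), (2.5, -1.0), (-2.0, 1.0), (-3.0, -2.0), (-2.5, 0.5)]
y_toy = [1, 1, 1, -1, -1, -1]
toy_ensemble = adaboost_svm(X_toy, y_toy)
toy_predictions = [ensemble_predict(toy_ensemble, x) for x in X_toy]
```

On this linearly separable toy set the first SVM already classifies every sample, so boosting stops after one round; on harder data, later rounds retrain on samples that earlier learners got wrong.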