• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (12): 2282-2293.

• Article

基于最大熵模型的汉语标点句缺失话题自动识别初探

卢达威1,宋柔2   

  1. (1.北京大学中国语言文学系,北京 100871;2.北京语言大学语言信息处理研究所,北京 100083)
  • Received date: 2015-09-01 Revised date: 2015-11-05 Publication date: 2015-12-25 Release date: 2015-12-25
  • Funding:

    National Natural Science Foundation of China (61171129); National 973 Program of China (2014CB340502)

Automatic recognition of the absent topics in Chinese
punctuation clauses based on maximum entropy model 

LU Dawei1,SONG Rou2   

  1. (1.Department of Chinese Language and Literature,Peking University,Beijing 100871;
    2.Institute of Language Information Processing,Beijing Language and Culture University,Beijing 100083,China)
  • Received:2015-09-01 Revised:2015-11-05 Online:2015-12-25 Published:2015-12-25

Abstract:

This paper addresses the task of determining whether the absent topic of a punctuation clause is the subject or the object of the preceding sentence, and takes this task as the entry point for research on the automatic recognition of absent topics in Chinese punctuation clauses. We first summarize a set of literal features and semantic features that bear on this decision, and then combine rules with a maximum entropy model in automatic recognition experiments. The results show that the F-score reaches 82% for samples involving specific categories of verbs. Analysis of the results indicates that verb features and semantic features contribute most to the task, that neither the rule-based method nor the statistical method can be neglected, and that fine-grained knowledge has an important influence on recognition performance.

Key words: generalized topic structure; new branch topic; automatic recognition; maximum entropy model
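
As a rough illustration of the kind of pipeline the abstract describes (rules consulted first, a maximum entropy model otherwise, evaluation by F-score), the following minimal Python sketch may help. It is not the authors' implementation. The feature names, the toy training samples, the `toy_rule` function, and the training settings are all hypothetical and chosen only so that the example runs.

```python
import math
from collections import defaultdict

LABELS = ("subject", "object")  # candidate sources of the absent topic


def probabilities(weights, feats):
    """P(label | feats) under a log-linear (maximum entropy) model."""
    raw = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(raw.values())  # subtract the max for numerical stability
    exp = {y: math.exp(v - top) for y, v in raw.items()}
    z = sum(exp.values())
    return {y: e / z for y, e in exp.items()}


def train(samples, epochs=200, lr=0.5):
    """Fit feature-label weights by gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = probabilities(weights, feats)
            for f in feats:
                for y in LABELS:
                    # observed count minus expected count of feature (f, y)
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def predict(weights, feats, rules=()):
    """Let hand-written rules decide first; otherwise use the model."""
    for rule in rules:
        decision = rule(feats)
        if decision is not None:
            return decision
    probs = probabilities(weights, feats)
    return max(LABELS, key=probs.get)


def f_score(gold, pred, target):
    """Harmonic mean of precision and recall for one target label."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: each sample is (feature set, gold label).
TRAIN = [
    ({"verb_class=perception", "obj_animate"}, "object"),
    ({"verb_class=perception", "obj_inanimate"}, "object"),
    ({"verb_class=action", "obj_inanimate"}, "subject"),
    ({"verb_class=action", "obj_animate"}, "subject"),
]


def toy_rule(feats):
    """Hypothetical placeholder rule: a marker feature forces 'object'."""
    return "object" if "rule:force_object" in feats else None


WEIGHTS = train(TRAIN)
```

In this sketch, `predict(WEIGHTS, feats, rules=[toy_rule])` returns the rule's decision when the rule fires and the model's most probable label otherwise, which is one simple way of combining the two methods. The paper may combine them differently.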