• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (11): 177-182.

• Papers

Research on a Solving Model of the Collocations Between the Relation Markers in Multiple Compound Sentences

HU Jinzhu1,2, LEI Lili1, YANG Jincai1, SHU Jiangbo3, CHEN Jiangman1

  1. Department of Computer Science, Huazhong Normal University, Wuhan 430079, Hubei, China
  2. Language and Language Education Research Center, Huazhong Normal University, Wuhan 430079, Hubei, China
  3. National Engineering Research Center for E-Learning, Huazhong Normal University, Wuhan 430079, Hubei, China
  • Received: 2011-06-01 Revised: 2011-08-30 Online: 2011-11-25 Published: 2011-11-25
  • Supported by:

    Key Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education of China (10JJD740012); 2011 National Social Science Fund of China project (11BYY052)

Abstract:

Relation words are the connecting components of multiple compound sentences: they link clauses and mark the semantic relations between them, so they matter greatly to the study of such sentences. However, while studying the rule-based automatic identification of relation words in Modern Chinese compound sentences, we found that a considerable number of the relation markers initially identified in multiple compound sentences are pseudo relation words. Each marker must therefore be judged as a true relation word or not, and that judgment rests on determining the collocations between relation markers, which is difficult. To address this problem, this paper proposes two algorithms: (1) using a solution space tree to obtain all collocation sets of the relation markers; (2) pruning the solution space tree to remove useless collocation sets. Experiments show that the two algorithms are general-purpose, that the judgment accuracy reaches 98.9%, and that approximate solutions can still be obtained for the remaining 1.1%, which indicates that the proposed algorithms are feasible for handling multiple compound sentences.

Key words: multiple compound sentences; collocations between relation words; solution space tree
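
The abstract does not give the details of the two algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It walks a solution space tree by backtracking, where each level decides whether one candidate marker stays unpaired or is paired with a later marker, and it prunes a branch as soon as a pair is not a known collocation. The lexicon `LEXICON`, the function name `collocation_sets`, and the sample markers are all hypothetical, and enumeration and pruning are merged into one pass here for brevity.

```python
from typing import List, Set, Tuple

# Hypothetical lexicon of valid (front marker, back marker) collocations.
# These entries are illustrative assumptions, not data from the paper.
LEXICON: Set[Tuple[str, str]] = {
    ("因为", "所以"),
    ("虽然", "但是"),
    ("如果", "就"),
}


def collocation_sets(
    markers: List[str], lexicon: Set[Tuple[str, str]] = LEXICON
) -> List[List[Tuple[int, int]]]:
    """Enumerate every set of index pairs (i, j), i < j, in which each
    marker is used at most once and every pair is a lexicon collocation."""
    results: List[List[Tuple[int, int]]] = []
    used = [False] * len(markers)

    def descend(i: int, chosen: List[Tuple[int, int]]) -> None:
        if i == len(markers):
            # Reached a leaf of the solution space tree: record the set.
            results.append(list(chosen))
            return
        # Branch 1: marker i opens no new pair.
        descend(i + 1, chosen)
        if used[i]:
            return  # already the back half of an earlier pair
        # Branch 2: pair marker i with some later, still-free marker j.
        for j in range(i + 1, len(markers)):
            if used[j]:
                continue
            if (markers[i], markers[j]) not in lexicon:
                continue  # pruning: not a valid collocation
            used[i] = used[j] = True
            chosen.append((i, j))
            descend(i + 1, chosen)
            chosen.pop()
            used[i] = used[j] = False

    descend(0, [])
    return results


if __name__ == "__main__":
    candidates = ["虽然", "因为", "所以", "但是"]
    for pairing in collocation_sets(candidates):
        print([(candidates[i], candidates[j]) for i, j in pairing])
```

Each returned set is one candidate reading of the sentence. Under the assumptions of this sketch, markers that remain unpaired in every returned set would be the natural candidates for pseudo relation words, although a real system would also need to handle markers that can legitimately occur alone.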