• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

A cognitive computational model of generalized topic structure in Chinese text

LU Dawei¹, SONG Rou², SHANG Ying³

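  (1. Department of Chinese Language and Literature, Peking University, Beijing 100871;
   2. School of Information Science, Beijing Language and Culture University, Beijing 100083;
   3. School of Chinese Studies, Beijing Language and Culture University, Beijing 100083, China)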
     
  • Received: 2016-11-30  Revised: 2017-03-22  Published: 2018-07-25  Online: 2018-07-25
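  • Funding:

    Youth Project of Humanities and Social Sciences Research, Ministry of Education of China (16YJC740050); China Postdoctoral Science Foundation (2016M600838)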

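Abstract:

Generalized topic structure (GTS) is a structural form that objectively exists in Chinese text. Based on the idea of the finite-state machine (FSM), we design a computational model that recognizes GTS, carry out a preliminary test of its effectiveness on a relatively large corpus, and analyze its space and time complexity. The model has the following characteristics: iterative control; input and output proceed synchronously, one punctuation clause (P-clause) at a time; no long-distance backtracking; limited backfilling; limited storage; and preserved word order. These are precisely the principles that humans follow in the cognitive processing of topic-comment information, so the model can be regarded as a mechanical model of how humans carry out this cognitive process.

Key words: generalized topic structure, cognition, computational model, punctuation clause, topic sufficient sentence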
