• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (01): 154-162.

• Artificial Intelligence and Data Mining •

Entity disambiguation of Chinese short text using graph model based on topic relations

MA Ying-chao, ZHANG Xiao-bin

  1. (School of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China)
  • Received: 2021-01-19 Revised: 2021-06-10 Accepted: 2023-01-25 Online: 2023-01-25 Published: 2023-01-25
  • Funding:
    Natural Science Foundation of Shaanxi Province (2019JQ-849); Graduate Innovation Fund of Xi’an Polytechnic University (chx2021028)

Abstract: Entity disambiguation is a key supporting technology for applications such as knowledge base construction and information retrieval, and it plays an important role in natural language processing (NLP). In short texts, however, traditional disambiguation methods that model an entity's context features struggle to extract enough features for disambiguation. To address the characteristics of short text, this paper proposes a graph-model disambiguation method for Chinese short text based on topic relations between entities. First, the TextRank algorithm is used to infer topics from a corpus built from knowledge base information, and the inferred topics serve as the representation of the relations between entities. Then, a disambiguation network graph is constructed for the text to be disambiguated by combining these relations with the disambiguation scores given by a BERT-based semantic matching model. Finally, the disambiguation result is obtained through search and ranking. The method is evaluated on the dataset provided by the CCKS2020 short-text entity linking task. The experimental results show that it outperforms other entity linking methods on short-text entity disambiguation and can effectively handle Chinese short-text entity disambiguation when the knowledge base lacks entity relations.

Key words: entity disambiguation, graph model, topic inference, TextRank
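
The pipeline described in the abstract (TextRank topic inference, a matching score per candidate, a graph over candidates, then search and ranking) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the Jaccard overlap used as the topic relation, the weight `alpha`, the exhaustive search, and the toy scores standing in for the BERT-based matching model are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations, product


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with TextRank on an undirected co-occurrence graph."""
    # Link every pair of distinct tokens that co-occur inside the sliding window.
    neighbors = defaultdict(set)
    for i in range(len(tokens)):
        for a, b in combinations(tokens[i:i + window], 2):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)
    # Iterate the PageRank-style update; each pass reads the previous scores.
    scores = {t: 1.0 for t in neighbors}
    for _ in range(iterations):
        scores = {
            t: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[t])
            for t in neighbors
        }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def topic_relation(topics_a, topics_b):
    """Assumed relation measure: Jaccard overlap of two topic keyword sets."""
    a, b = set(topics_a), set(topics_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def disambiguate(candidates, match_score, topics, alpha=0.5):
    """Pick one entity per mention by exhaustive search over joint assignments.

    Score = sum of per-candidate matching scores (toy stand-ins for a
    BERT-based matcher) + alpha * sum of pairwise topic relations.
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        local = sum(match_score[e] for e in assignment)
        coherence = sum(
            topic_relation(topics[a], topics[b])
            for a, b in combinations(assignment, 2)
        )
        score = local + alpha * coherence
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best))


# Hypothetical toy data: entity ids, scores, and topic keywords are made up.
candidates = {"apple": ["apple_fruit", "apple_inc"], "jobs": ["steve_jobs"]}
match_score = {"apple_fruit": 0.55, "apple_inc": 0.45, "steve_jobs": 0.9}
topics = {
    "apple_fruit": ["fruit", "tree", "food"],
    "apple_inc": ["company", "iphone", "technology"],
    "steve_jobs": ["company", "iphone", "founder"],
}
result = disambiguate(candidates, match_score, topics)
print(result)
```

In this toy case the matching score alone would favor `apple_fruit`, but the topic overlap between `apple_inc` and `steve_jobs` raises the joint score of that assignment, so the collective choice is `apple_inc`. In a fuller sketch, the keyword lists in `topics` would come from running `textrank_keywords` over the tokenized knowledge base description of each entity, and exhaustive search would be replaced by a cheaper ranking procedure once the number of mentions grows.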