• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Papers

A WordNet-based hybrid semantic similarity measurement

ZHANG Si-qi (张思琪), XING Wei-wei (邢薇薇), CAI Yuan-yuan (蔡圆媛)

  1. (School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2015-09-10 Revised: 2016-01-21 Online: 2017-05-25 Published: 2017-05-25
  • Supported by:

    National Natural Science Foundation of China (61272353, 61370128, 61428201); Ministry of Education New Century Talents Program (NCET-13-0659); Beijing Higher Education Young Elite Program (YETP0583)

Abstract:

Computing semantic similarity is an important research topic in natural language processing (NLP). Over the past few decades, many semantic similarity measures have been proposed and widely applied to word sense disambiguation, text clustering, and other fields. Based on the WordNet ontology, we improve the information content (IC) computation model and then propose two hybrid semantic similarity measures. Experimental results show that, because they consider both the shortest path distance between concept nodes in WordNet and the IC-based semantic distance, the proposed measures outperform existing methods and produce results closer to human judgment.

Key words: WordNet, semantic similarity, information content, ontology
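
The abstract names two ingredients, the shortest path distance between concept nodes and an IC-based semantic distance, but does not give the formulas. The sketch below only illustrates how such ingredients could be combined; it is not the authors' method. Everything in it is an assumption made for illustration: the toy taxonomy `PARENT` stands in for WordNet, `intrinsic_ic` uses a hyponym-count IC in the style of Seco et al. rather than the improved IC model of the paper, `ic_distance` is the Jiang-Conrath distance, and the weighted sum in `hybrid_similarity` with its `alpha` parameter is hypothetical.

```python
import math

# Toy is-a taxonomy (child -> parent). A stand-in for WordNet, not real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(b))
    for c in ancestors(a):
        if c in seen:
            return c
    raise ValueError("concepts share no common ancestor")


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    lcs = lowest_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def hyponym_count(concept):
    """Number of concepts strictly below the given concept."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Hyponym-based IC (assumed model): 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def ic_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs(a, b))."""
    lcs = lowest_common_subsumer(a, b)
    return intrinsic_ic(a) + intrinsic_ic(b) - 2.0 * intrinsic_ic(lcs)


def hybrid_similarity(a, b, alpha=0.5):
    """Hypothetical hybrid: weighted sum of a path-based and an IC-based similarity."""
    path_sim = 1.0 / (1.0 + path_length(a, b))
    ic_sim = 1.0 / (1.0 + ic_distance(a, b))
    return alpha * path_sim + (1.0 - alpha) * ic_sim


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(hybrid_similarity(*pair), 4))
```

In this toy setup, `dog` and `cat` share the parent `animal`, so both the path term and the IC term rate them as more similar than `dog` and `car`, whose only common ancestor is the root. The choice of `alpha` and of the two component similarities would need to be tuned and validated against human ratings, as the abstract's evaluation suggests.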