• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (2): 172-175.

• Articles •

A Method of Sentence Similarity Computing Based on HowNet

CHENG Chuanpeng (程传鹏), WU Zhigang (吴志刚)

  1. (School of Computer Science, Zhongyuan Institute of Technology, Zhengzhou 450007, Henan, China)
  • Received: 2011-07-23 Revised: 2011-10-08 Online: 2012-02-25 Published: 2012-02-25

Abstract:

Sentence similarity is the basis for measuring document similarity and plays an important role in natural language processing. Existing methods for computing sentence similarity ignore the influence of sentence structure. Building on an analysis of prior work, this paper proposes an improved method. Following HowNet's description of "entity concepts", a semantic hierarchy tree of sememes is constructed, and the similarity between two sememes is computed from their relative positions in the tree. A weighted sum over three kinds of sememes gives the semantic similarity between words. The overall sentence similarity combines the surface similarity of the sentences with the semantics of their words and the relative positions of those words. Experiments show that, under the same test conditions, the similarity values produced by the proposed method agree more closely with human intuition.

Key words: sentence similarity; HowNet; surface similarity; semantic offset
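
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not the paper's actual formulas: the toy sememe tree `PARENT`, the distance-based formula `ALPHA / (ALPHA + d)` (a form often used with HowNet), the constant `ALPHA`, the three-slot representation of a word, the weights in `word_similarity`, the overlap-based `surface_similarity`, and the mixing weight `lam`. The word-position (semantic offset) term is left out because the abstract does not define it.

```python
# Hypothetical toy sememe hierarchy (child -> parent); not HowNet's real taxonomy.
PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "inanimate": "physical",
    "artifact": "inanimate",
}

ALPHA = 1.6  # assumed smoothing constant


def path_to_root(sememe):
    """Return the list of nodes from `sememe` up to the root."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def tree_distance(a, b):
    """Number of edges on the path between two sememes in the tree."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:  # first common ancestor
            return steps_from_a[node] + j
    raise ValueError("sememes do not share a root")


def sememe_similarity(a, b):
    """Similarity decreases as the tree distance grows (assumed formula)."""
    return ALPHA / (ALPHA + tree_distance(a, b))


def word_similarity(w1, w2, weights=(0.5, 0.3, 0.2)):
    """Weighted sum over three sememe slots; a word is a 3-tuple of sememes.

    The slot layout and the weights are hypothetical.
    """
    return sum(wt * sememe_similarity(s1, s2)
               for wt, s1, s2 in zip(weights, w1, w2))


def surface_similarity(s1, s2):
    """Share of identical words in the two sentences (assumed definition)."""
    common = sum(min(s1.count(w), s2.count(w)) for w in set(s1))
    return 2 * common / (len(s1) + len(s2))


def semantic_similarity(s1, s2):
    """Average best-match word similarity, symmetrised over both directions."""
    def one_way(xs, ys):
        return sum(max(word_similarity(x, y) for y in ys) for x in xs) / len(xs)
    return (one_way(s1, s2) + one_way(s2, s1)) / 2


def sentence_similarity(s1, s2, lam=0.3):
    """Linear mix of surface and semantic similarity; `lam` is assumed."""
    return lam * surface_similarity(s1, s2) + (1 - lam) * semantic_similarity(s1, s2)


# Usage: sentences are lists of words, each word a 3-tuple of sememes.
teacher = ("human", "animate", "thing")
dog = ("animal", "animate", "thing")
tool = ("artifact", "inanimate", "thing")
print(sentence_similarity([teacher, tool], [dog, tool]))
```

With these choices, identical sentences score 1 and the score falls as the words move apart in the tree. A positional term could be added as a third weighted component once its definition is known.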