• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

A short text similarity calculation method based on semantics and syntax structure

ZHAO Qian1, JING Qi1, LI Aiping1,2, DUAN Liguo1

  (1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China;
    2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China)
  • Received: 2016-12-12 Revised: 2017-02-15 Online: 2018-07-25 Published: 2018-07-05
  • Funding:

    Open Project of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE20120930); Natural Science Foundation of Shanxi Province (20130110152)

Abstract:

To improve the accuracy of short text semantic similarity calculation, we propose a new calculation method. A short text is first segmented into sentence units, and each sentence undergoes syntactic dependency analysis. Sentence-to-sentence similarity is built on word-to-word similarity. When computing the semantic similarity of two words, we take a new word feature into account, namely its emotional (sentiment) characteristic, and we propose a comprehensive word sense disambiguation method that combines a word's part of speech with the context in which it occurs; word semantic similarity is then computed with the HowNet semantic dictionary. The semantic similarity of two sentences is obtained as a weighted average of the similarities between their words, with weights determined by sentence structure. Finally, the semantic similarity of two short texts is computed with a new method, the binary set method. The accuracy of word similarity and short text similarity reaches 87.63% and 93.77%, respectively, and the experimental results show that the proposed method improves the accuracy of short text semantic similarity.

Key words: word sense disambiguation, emotional characteristic, syntactic dependency analysis, short text semantic similarity
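The abstract describes the pipeline without giving formulas, so the following is only a minimal sketch of the general shape of two of its steps under loudly stated assumptions, not the authors' implementation: word similarity adjusted by an emotion feature, and sentence similarity as a structure-weighted average of word similarities. Everything below is hypothetical: the toy table `TOY_SIM` stands in for HowNet-based similarity (word sense disambiguation is omitted), and the polarity lexicon `POLARITY`, the factor `OPPOSITE_POLARITY_PENALTY`, and the per-role weights keyed by LTP-style dependency labels (`HED`, `SBV`, `VOB`) are invented for illustration. Because the abstract does not say how the binary set method works, `text_sim` uses a plain bidirectional best-match average as a clearly labelled stand-in.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL resources: TOY_SIM replaces HowNet-based word similarity,
# POLARITY replaces a sentiment lexicon. Neither comes from the paper.
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"good", "great"}): 0.9,
    frozenset({"good", "bad"}): 0.8,   # related concepts, opposite sentiment
    frozenset({"movie", "film"}): 0.95,
}
POLARITY: Dict[str, int] = {"good": 1, "great": 1, "bad": -1}

# HYPOTHETICAL weights per dependency role; the paper's weights are not given.
ROLE_WEIGHT: Dict[str, float] = {"HED": 2.0, "SBV": 1.5, "VOB": 1.5}
DEFAULT_WEIGHT = 1.0
OPPOSITE_POLARITY_PENALTY = 0.5  # assumed factor, for illustration only

Token = Tuple[str, str]  # (word, dependency role)


def word_sim(a: str, b: str) -> float:
    """Toy semantic similarity, reduced when the two words' sentiments clash."""
    base = 1.0 if a == b else TOY_SIM.get(frozenset({a, b}), 0.0)
    # Emotion feature: opposite polarities (e.g. good vs bad) are penalised.
    if POLARITY.get(a, 0) * POLARITY.get(b, 0) < 0:
        base *= OPPOSITE_POLARITY_PENALTY
    return base


def directed_sent_sim(s1: List[Token], s2: List[Token]) -> float:
    """Role-weighted average over s1 of each word's best match in s2."""
    if not s1 or not s2:
        return 0.0
    total = weight_sum = 0.0
    for word, role in s1:
        w = ROLE_WEIGHT.get(role, DEFAULT_WEIGHT)
        total += w * max(word_sim(word, other) for other, _ in s2)
        weight_sum += w
    return total / weight_sum


def sent_sim(s1: List[Token], s2: List[Token]) -> float:
    """Symmetric sentence similarity: mean of both matching directions."""
    return (directed_sent_sim(s1, s2) + directed_sent_sim(s2, s1)) / 2


def text_sim(t1: List[List[Token]], t2: List[List[Token]]) -> float:
    """Stand-in aggregation (bidirectional best match), NOT the binary set method."""
    if not t1 or not t2:
        return 0.0
    fwd = sum(max(sent_sim(a, b) for b in t2) for a in t1) / len(t1)
    bwd = sum(max(sent_sim(b, a) for a in t1) for b in t2) / len(t2)
    return (fwd + bwd) / 2


# Example sentences as (word, dependency role) pairs.
s1 = [("movie", "SBV"), ("good", "HED")]
s2 = [("film", "SBV"), ("great", "HED")]
s3 = [("film", "SBV"), ("bad", "HED")]
print(round(sent_sim(s1, s2), 3), round(sent_sim(s1, s3), 3))  # prints: 0.921 0.636
```

With this toy data, "movie good" scores higher against "film great" than against "film bad" (about 0.92 versus 0.64), because the good/bad pair is penalised for opposite sentiment even though its base similarity is nonzero. Averaging both matching directions keeps the sentence score symmetric.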