• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (05): 895-900.

• Article

An entity annotation method using massive knowledge bases

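TANG Xiaoqin1, LIU Libo1, ZHOU Tao2

  (1. School of Mathematics and Computer Science, Ningxia University, Yinchuan, Ningxia 750021; 2. Institute of Agricultural Resources and Environment, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, Ningxia 750002)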
  • Received: 2014-05-16 Revised: 2014-07-30 Online: 2015-05-25 Published: 2015-05-25
  • Funding:

    Supported by the Natural Science Foundation of Ningxia (NZ13053)

An entity linking approach based on massive RDF knowledge bases

TANG Xiaoqin1,LIU Libo1,ZHOU Tao2   

  (1. College of Mathematics and Computer Science, Ningxia University, Yinchuan 750021; 2. Institute of Agricultural Resources and Environment, Ningxia Academy of Agriculture and Forestry, Yinchuan 750002, China)
  • Received: 2014-05-16 Revised: 2014-07-30 Online: 2015-05-25 Published: 2015-05-25

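Abstract (translated from the Chinese):

The Internet has accumulated large amounts of unstructured information such as text and images. RDF, the resource description framework for the Internet proposed by the W3C, is well suited to describing unstructured information on the Web, and a large number of RDF knowledge bases have therefore emerged, such as Freebase, Yago, and DBPedia. RDF knowledge bases contain rich semantic information and can be used to annotate named entities from Web pages, achieving semantic enrichment. Mapping a named entity on a Web page to the corresponding entity in a knowledge base is called entity annotation. Entity annotation comprises two main parts: mapping between entities and annotation disambiguation. Exploiting the characteristics of massive RDF knowledge bases, an effective entity annotation method is proposed. The method uses simple graph weighting and computation to solve the disambiguation problem in entity annotation. The method has been implemented on a cloud platform, and experiments verify its accuracy and scalability.

Key words (translated from the Chinese): RDF knowledge bases; entity annotation; graph weighting; disambiguation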

Abstract:

Large amounts of unstructured data, such as text and images, are produced, distributed, and consumed over the Internet today. The Resource Description Framework (RDF), the resource description standard proposed by the W3C, is well suited to describing this unstructured information. As a consequence, a large number of RDF knowledge bases have been developed, such as Freebase, Yago, and DBPedia. RDF knowledge bases contain rich semantic information, which can be used both to label the named entities on Web pages and to expand their semantic information. The task of mapping named entities from Web pages to the corresponding entities in a knowledge base is called entity linking. Entity linking comprises two main parts: mapping between entities and entity disambiguation. In this paper, we propose an efficient entity linking approach based on the properties of massive RDF knowledge bases. The approach uses simple graph weighting and computation to solve the entity disambiguation problem, and it is implemented on a cloud platform. Experimental results show that our solution is accurate and scalable.

Key words: RDF knowledge bases; entity linking; graph weighting; entity disambiguation
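
The abstract names two steps, mapping between entities and disambiguation by graph weighting, without giving the algorithm itself. The following Python sketch is only a toy illustration of that general idea and should not be read as the authors' method. The knowledge base contents (`LABELS`, `TRIPLES`), the `kb:` identifiers, the function names, and the weighting rule (count the knowledge-base edges that connect a candidate to the candidates of the other mentions in the same text) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: surface form -> candidate entity IDs.
LABELS = {
    "Jordan": ["kb:Michael_Jordan", "kb:Jordan_(country)"],
    "Bulls": ["kb:Chicago_Bulls"],
    "NBA": ["kb:National_Basketball_Association"],
}

# Hypothetical RDF-style triples (subject, predicate, object) between entities.
TRIPLES = [
    ("kb:Michael_Jordan", "kb:playedFor", "kb:Chicago_Bulls"),
    ("kb:Chicago_Bulls", "kb:memberOf", "kb:National_Basketball_Association"),
    ("kb:Michael_Jordan", "kb:league", "kb:National_Basketball_Association"),
    ("kb:Jordan_(country)", "kb:capital", "kb:Amman"),
]


def build_neighbors(triples):
    """Undirected adjacency: entity -> set of directly connected entities."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors


def link_entities(mentions, labels, triples):
    """Map each mention to the candidate entity with the highest graph weight."""
    neighbors = build_neighbors(triples)
    # Step 1 (mapping): look up the candidate entities for every mention.
    candidates = {mention: labels.get(mention, []) for mention in mentions}
    result = {}
    for mention, options in candidates.items():
        # Context: candidates of all other mentions in the same text.
        context = {
            entity
            for other, entities in candidates.items()
            if other != mention
            for entity in entities
        }
        # Step 2 (disambiguation): weight = number of edges into the context.
        weights = {entity: len(neighbors[entity] & context) for entity in options}
        # Ties fall back to the first listed candidate; unknown mentions map to None.
        result[mention] = max(options, key=weights.get) if options else None
    return result


links = link_entities(["Jordan", "Bulls", "NBA"], LABELS, TRIPLES)
print(links)
```

In this toy input, the mention "Jordan" resolves to the basketball player because that candidate shares two edges with the candidates for "Bulls" and "NBA", while the country shares none. A real system over a massive RDF knowledge base would need a distributed index and a more careful weighting scheme, neither of which the abstract specifies.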