• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal
Paper

An Algorithm Based on Attributed Relational Graphs for Name Disambiguation

(School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China)
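HAO Dandan (born 1986), female, from Langfang, Hebei; master's student; research interests: graph mining and link mining. GUO Jingfeng, PhD, professor, CCF member (E200005284S); research interests: database theory and applications, and data mining techniques. ZHENG Chao, master's student; research interest: graph mining.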

Received: 2010-03-13

Revised: 2010-06-14

Published online: 2010-09-08

Funding

Supported by the National Natural Science Foundation of China (60673136)


Abstract (Chinese version)

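The name-sharing problem is pervasive in large-scale databases and digital libraries, and it hampers many research tasks. This paper first proposes a new graph structure, the attributed relational graph (ARG), which intuitively depicts the features of entities and the relationships between them. It then presents ARGResolution, a name disambiguation algorithm built on the ARG framework. The algorithm analyzes the authors who share a name and clusters them according to their pairwise similarity, finally producing result clusters that each correspond to one real entity. Experiments show that mining latent links between authors further improves the quality of name disambiguation and successfully resolves the name-sharing problem.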

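Cite this article

HAO Dandan, GUO Jingfeng, ZHENG Chao. An Algorithm Based on Attributed Relational Graphs for Name Disambiguation[J]. Computer Engineering & Science, 2010, 32(9): 61-64. DOI: 10.3969/j.issn.1007130X.2010.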

Abstract (English version)

The name-sharing problem is widespread in large-scale databases and digital libraries, and it causes trouble for many research tasks. We propose a graph model, the Attributed Relational Graph (ARG), to describe the features of entities and the links between them, and then apply ARGResolution, an algorithm based on the ARG, to distinguish entities that share the same name. The algorithm analyzes the entities, clusters them according to a similarity measure, and eventually obtains a set of clusters, each corresponding to one real entity. Experiments on real datasets show that mining the links between entities improves the quality of name disambiguation and resolves the problem successfully.
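The abstract describes the pipeline only at a high level: represent each occurrence of the shared name by its features and its links to other entities, score pairwise similarity, and cluster. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' ARGResolution. The node schema (`AuthorNode` with attribute and link sets), the weighted Jaccard score with weight `alpha`, the merge `threshold`, and the single-link clustering via union-find are all hypothetical choices made for illustration, as are the toy names and venues.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthorNode:
    """One occurrence of the shared name, e.g. the author slot of one paper.

    Hypothetical, simplified ARG node: `attributes` holds node features
    (venue, title keywords) and `links` holds related entities (co-authors).
    """
    node_id: str
    attributes: set[str] = field(default_factory=set)
    links: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(u: AuthorNode, v: AuthorNode, alpha: float = 0.5) -> float:
    """Assumed score: weighted mix of feature overlap and link overlap."""
    return alpha * jaccard(u.attributes, v.attributes) + (1 - alpha) * jaccard(u.links, v.links)


def cluster(nodes: list[AuthorNode], threshold: float = 0.3, alpha: float = 0.5) -> list[list[str]]:
    """Merge every pair with similarity >= threshold (single-link, via union-find)."""
    parent = {n.node_id: n.node_id for n in nodes}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every unordered pair once and union the similar ones.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if similarity(u, v, alpha) >= threshold:
                parent[find(u.node_id)] = find(v.node_id)

    groups: dict[str, list[str]] = {}
    for n in nodes:
        groups.setdefault(find(n.node_id), []).append(n.node_id)
    # Sort for deterministic output; each inner list is one inferred real-world author.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: three papers whose author field carries the same name.
nodes = [
    AuthorNode("p1", {"KDD", "graph"}, {"Li Wei", "Zhang Min"}),
    AuthorNode("p2", {"KDD", "mining"}, {"Li Wei"}),
    AuthorNode("p3", {"VLDB", "index"}, {"Chen Hao"}),
]
print(cluster(nodes))  # [['p1', 'p2'], ['p3']]
```

With the toy data, p1 and p2 share a venue and a co-author (similarity 0.5 × 1/3 + 0.5 × 1/2 ≈ 0.42), so they are merged, while p3 stays on its own. Threshold-based single-link merging is transitive, so one spurious high-similarity pair can chain two different people together; a stricter linkage rule or a tuned threshold would reduce that risk.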
