• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (9): 61-64.doi: 10.3969/j.issn.1007130X.2010.

• Papers •

A Same-Name Entity Disambiguation Algorithm Based on Attributed Relational Graphs

HAO Dandan, GUO Jingfeng, ZHENG Chao

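  1. (School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004)
  • Received: 2010-03-13 Revised: 2010-06-14 Online: 2010-09-02 Published: 2010-09-08
  • About the authors: HAO Dandan (born 1986), female, from Langfang, Hebei, master's student; research interests: graph mining and link mining. GUO Jingfeng, Ph.D., professor, CCF member (E200005284S); research interests: database theory and applications, data mining. ZHENG Chao, master's student; research interest: graph mining.
  • Funding:

    National Natural Science Foundation of China (60673136)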

An Algorithm Based on Attributed Relational Graphs for Name Disambiguation

HAO Dandan,GUO Jingfeng,ZHENG Chao   

  1. (School of Information Science and Engineering,Yanshan University,Qinhuangdao 066004,China)
  • Received:2010-03-13 Revised:2010-06-14 Online:2010-09-02 Published:2010-09-08

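Abstract (translated from Chinese):

The name-sharing problem is widespread in large-scale databases and digital libraries, and it hampers many research tasks. This paper first proposes a new graph structure, the attributed relational graph (ARG), to depict the features of entities and the relationships among them. It then gives a name disambiguation algorithm built on the ARG framework, ARGResolution, which analyzes the authors that share a name and clusters them according to their similarity, finally yielding result clusters that each correspond to a real entity. Experiments show that mining the latent links between authors further improves the quality of name disambiguation and successfully resolves the name-sharing problem.

Keywords: name sharing, attributes, links, similarity, hierarchical clustering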

Abstract:

Name sharing is widespread in large-scale databases and digital libraries, and it complicates many research tasks. We propose a graph model, the Attributed Relational Graph (ARG), to describe the features of entities and the links between them, and then apply an ARG-based algorithm, ARGResolution, to distinguish entities that share the same name. The algorithm analyzes the entities and clusters them according to a similarity measure, eventually producing a set of clusters that each correspond to one real entity. Experiments on real datasets show that mining the links improves the quality of name disambiguation and resolves the problem successfully.

Key words: name sharing; attributes; links; similarity; hierarchical clustering
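
The abstract names the ingredients (attributes, links, a similarity measure, hierarchical clustering) but not their exact definitions. The following is only a minimal sketch of that general pipeline, not the ARGResolution algorithm itself. It assumes each occurrence of a shared name carries a set of attribute values and a set of linked entities, that similarity is a weighted Jaccard overlap of the two sets, and that clusters are merged by average linkage until no pair reaches a threshold. The sample records, the weight `alpha`, the `threshold` value, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical references that all carry the same author name.
# "attrs" stands in for node attributes, "links" for related entities.
references = [
    {"attrs": {"venue:ICDE", "topic:graph mining"}, "links": {"coauthor:Li", "coauthor:Wang"}},
    {"attrs": {"venue:ICDE", "topic:graph mining"}, "links": {"coauthor:Li"}},
    {"attrs": {"venue:CHI", "topic:hci"}, "links": {"coauthor:Zhao"}},
    {"attrs": {"venue:CHI", "topic:usability"}, "links": {"coauthor:Zhao", "coauthor:Sun"}},
]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reference_similarity(r1, r2, alpha=0.5):
    """Assumed measure: weighted mix of attribute overlap and link overlap."""
    attr_sim = jaccard(r1["attrs"], r2["attrs"])
    link_sim = jaccard(r1["links"], r2["links"])
    return alpha * attr_sim + (1 - alpha) * link_sim


def cluster_similarity(c1, c2, refs):
    """Average-linkage similarity between two clusters of reference indices."""
    total = sum(reference_similarity(refs[i], refs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(refs, threshold=0.3):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(refs))]
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], refs)
            if sim >= best_sim:
                best_pair, best_sim = (a, b), sim
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    print(agglomerative_clusters(references))
```

Each resulting cluster is meant to stand for one real person behind the shared name. Setting `alpha` to 1.0 ignores the link sets entirely, which gives a simple way to compare attribute-only clustering against clustering that also uses links on this toy data.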