• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (5): 92-96. doi: 10.3969/j.issn.1007130X.2010.

• Papers •

高维数据相似性度量方法研究

谢明霞1,2,郭建忠1,张海波3,陈科1   

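  1. (1. Institute of Surveying and Mapping, PLA Information Engineering University, Zhengzhou, Henan 450052; 2. Corps 75719, Wuhan, Hubei 430074; 3. Corps 68029, Lanzhou, Gansu 730020)
  • Received: 2009-11-15  Revised: 2010-02-09  Online: 2010-04-28  Published: 2010-05-11
  • Corresponding author: XIE Mingxia  E-mail: xmx0424@yahoo.cn
  • About the authors: XIE Mingxia (born 1985), female, from Wuhan, Hubei, master's student; research interests: spatial data mining and GIS. GUO Jianzhong, professor; research interest: geographic information systems.
  • Funding:

    National Key Technology R&D Program of China (2007BAH16B03); National 863 Program of China (2009AA12Z228)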

Research on the Similarity Measurement of High Dimensional Data

XIE Mingxia1,2,GUO Jianzhong1,ZHANG Haibo3,CHEN Ke1   

  1. (1. Institute of Surveying and Mapping, Information Engineering University, Zhengzhou 450052;
    2. Corps 75719, Wuhan 430074; 3. Corps 68029, Lanzhou 730020, China)
  • Received: 2009-11-15  Revised: 2010-02-09  Online: 2010-04-28  Published: 2010-05-11
  • Contact: XIE Mingxia  E-mail: xmx0424@yahoo.cn

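Abstract (translated from Chinese):

When distance measures from low-dimensional spaces (such as the Lk norm) are applied to high-dimensional spaces, the contrast between the distances of objects disappears as the dimensionality grows. Research on effective distance or similarity (dissimilarity) measures for high-dimensional data is an important and challenging topic. By analyzing why traditional distance and similarity (dissimilarity) measures are ill-suited to high-dimensional spaces, and by summarizing existing similarity measures applied to high-dimensional data, this paper proposes HDsim(X,Y), an improvement on the high-dimensional similarity function Hsim(X,Y). HDsim(X,Y) integrates the similarity measures for each data type: in handling numerical, binary, and categorical attribute data, it retains the advantages of the original Hsim(X,Y) for numerical data, the Jaccard coefficient for binary data, and the matching ratio for categorical attribute data. A validity analysis and a worked example demonstrate the effectiveness of HDsim(X,Y) in high-dimensional space.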
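
The loss of distance contrast mentioned in the abstract can be observed numerically. The following is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit hypercube, and `lk_distance` and `relative_contrast` are hypothetical helper names. The sample size, seed, and list of dimensions are arbitrary illustrative choices.

```python
import random


def lk_distance(x, y, k=2.0):
    """Minkowski (L_k) distance between two equal-length vectors."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(dim, n_points=200, k=2.0, seed=0):
    """Return (d_max - d_min) / d_min for L_k distances from one random
    query point to n_points random points, all uniform in [0, 1]^dim.
    The uniform distribution is an assumption made for illustration."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        lk_distance(query, [rng.random() for _ in range(dim)], k)
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Print the relative contrast for a few dimensionalities.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed ratio should drop sharply between 2 and 1000 dimensions: the nearest and farthest points end up at almost the same distance from the query, so a ranking by Lk distance carries little information.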

Keywords: high-dimensional data, similarity measurement, attribute similarity, spatial similarity

Abstract:

When distance measures designed for low dimensional space are adopted in high dimensional space, the distances between objects lose their contrast as the dimension increases. The study of efficient methods for distance measurement or similarity (dissimilarity) measurement in high dimensional space is therefore important and challenging. This paper analyzes the inapplicability of traditional measures in high dimensional space, summarizes the existing similarity measures for high dimensional data, and proposes an improved function HDsim(X,Y) to measure the similarity between objects in high dimensional space. HDsim(X,Y) integrates the similarity measures for all kinds of data: it takes full advantage of the original function Hsim(X,Y) in dealing with numerical data, the Jaccard coefficient in dealing with binary data, and the matching ratio in dealing with categorical data. Validity analysis and a case study demonstrate that HDsim(X,Y) is effective in computing the similarity between objects in high dimensional space.
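
To make the idea of one measure per data type concrete, here is a rough sketch of a mixed-type similarity. It is not the paper's HDsim(X,Y), whose definition the abstract does not give. Two things are assumptions: the numeric term uses the per-attribute form 1/(1+|x_i - y_i|) that is commonly given for Hsim, and the attribute-count weighting in the hypothetical `mixed_similarity` function is an illustrative choice.

```python
def numeric_similarity(x, y):
    """Mean of 1 / (1 + |x_i - y_i|); assumed Hsim-style term, in (0, 1]."""
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def jaccard_similarity(x, y):
    """Jaccard coefficient for 0/1 vectors: shared 1s over positions with any 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def matching_ratio(x, y):
    """Fraction of categorical attributes with equal values."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def mixed_similarity(x, y):
    """Hypothetical combination: attribute-count-weighted mean of the
    three per-type similarities. Objects are dicts with optional keys
    'numeric', 'binary', 'categorical', each mapping to a list."""
    measures = {
        "numeric": numeric_similarity,
        "binary": jaccard_similarity,
        "categorical": matching_ratio,
    }
    total = sum(len(x.get(kind, [])) for kind in measures)
    if total == 0:
        raise ValueError("objects have no attributes")
    weighted = 0.0
    for kind, measure in measures.items():
        xs, ys = x.get(kind, []), y.get(kind, [])
        if xs:
            weighted += len(xs) * measure(xs, ys)
    return weighted / total


x = {"numeric": [1.0, 2.0], "binary": [1, 0, 1], "categorical": ["red", "small"]}
y = {"numeric": [1.0, 3.0], "binary": [1, 1, 0], "categorical": ["red", "large"]}
print(mixed_similarity(x, y))  # numeric 0.75, Jaccard 1/3, matching 0.5 -> about 0.5
```

Each per-type score lies in [0, 1], so the weighted mean does too, and an object compared with itself scores 1. Numeric attributes would normally need to be scaled to comparable ranges first; that step is omitted here.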

Key words: high dimensional data; similarity measurement; attribute similarity; spatial similarity

CLC number: