• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2010, Vol. 32 ›› Issue (5): 92-96.

Research on the Similarity Measurement of High Dimensional Data

XIE Mingxia1,2,GUO Jianzhong1,ZHANG Haibo3,CHEN Ke1   

  1. Institute of Surveying and Mapping, Information Engineering University, Zhengzhou 450052, China
  2. Corps 75719, Wuhan 430074, China
  3. Corps 68029, Lanzhou 730020, China
  • Received: 2009-11-15   Revised: 2010-02-09   Online: 2010-04-28   Published: 2010-05-11
  • Contact: XIE Mingxia   E-mail: xmx0424@yahoo.cn

Abstract:

When distance measures designed for low dimensional space are applied in high dimensional space, the distances between objects become increasingly hard to tell apart as the dimension grows. Finding efficient methods for distance measurement, or similarity (dissimilarity) measurement, in high dimensional space is therefore both important and challenging. This paper analyzes why traditional measures are inapplicable in high dimensional space, summarizes the existing methods of similarity measurement for high dimensional data, and on that basis proposes an improved function, HDsim(X,Y), to measure the similarity between objects in high dimensional space. HDsim(X,Y) integrates the similarity measures for different kinds of data: it retains the strengths of the original function Hsim(X,Y) for numerical data, uses the Jaccard coefficient for binary data, and uses the matching ratio for categorical data. A validity analysis and a case study indicate that HDsim(X,Y) is effective in computing the similarity between objects in high dimensional space.
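
The abstract names the three ingredients of HDsim(X,Y) but does not give its formula, so the sketch below only illustrates the general idea of mixing the three measures. It rests on assumptions that are not taken from this page: the numeric term uses the form 1/(1+|x_i - y_i|) averaged over attributes, which is how Hsim is commonly stated in the literature, and the hypothetical function `combined_similarity` weights each part by its number of attributes. The actual HDsim definition in the paper may differ.

```python
from typing import Hashable, Sequence


def hsim_numeric(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of 1 / (1 + |x_i - y_i|) over numeric attributes (assumed Hsim form)."""
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def jaccard_binary(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient: shared 1s divided by positions where either is 1."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Convention chosen here (an assumption): two all-zero vectors count as identical.
    return both / either if either else 1.0


def matching_ratio(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Fraction of categorical attributes on which the two objects agree."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def combined_similarity(num_x, num_y, bin_x, bin_y, cat_x, cat_y) -> float:
    """Hypothetical mix: average the three parts, weighted by attribute count.

    This is an illustrative stand-in, not the paper's HDsim formula.
    """
    parts = []  # (number of attributes, similarity of that part)
    if len(num_x):
        parts.append((len(num_x), hsim_numeric(num_x, num_y)))
    if len(bin_x):
        parts.append((len(bin_x), jaccard_binary(bin_x, bin_y)))
    if len(cat_x):
        parts.append((len(cat_x), matching_ratio(cat_x, cat_y)))
    total = sum(n for n, _ in parts)
    if total == 0:
        raise ValueError("objects have no attributes")
    return sum(n * s for n, s in parts) / total


if __name__ == "__main__":
    s = combined_similarity(
        [1.0, 2.0], [1.0, 3.0],          # numeric attributes
        [1, 1, 0, 0], [1, 0, 1, 0],      # binary attributes
        ["red", "cat"], ["red", "dog"],  # categorical attributes
    )
    print(f"combined similarity: {s:.4f}")
```

Each part lies in [0, 1], so the weighted average does too, and identical objects score 1. Weighting by attribute count is only one plausible design choice; equal weights per data type would be another.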

Key words: high dimensional data, similarity measurement, attribute similarity, spatial similarity

CLC Number: