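• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)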

• Artificial Intelligence and Data Mining

UD-OPTICS: An uncertain data clustering algorithm based on interval number

WU Cuixian (吴翠先)1,2,3, HE Shaoyuan (何少元)1,2

  (1. School of Telecommunication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065;
   2. Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065;
   3. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China)
  • Received: 2018-07-24 Revised: 2018-11-05 Online: 2019-07-25 Published: 2019-07-25

Abstract:

Research on clustering uncertain data generally assumes that the data follow some known distribution, from which a probability density function or probability distribution function representing the data is obtained. In practice, however, it is hard to guarantee that the assumed distribution matches the actual distribution of the uncertain data in an application system. In addition, existing density-based algorithms are sensitive to their initial parameters and, when the uncertain data have non-uniform density, cannot find clusters of arbitrary density. To address these shortcomings, we propose UD-OPTICS, an interval-number-based algorithm that orders uncertain data objects to identify the clustering structure. The algorithm combines interval number theory with statistical information about the uncertain data to represent the data more reasonably. We define an interval core distance and an interval reachability distance, together with low-complexity methods for computing them, and use them to measure the similarity between uncertain data objects, expand clusters, and order objects to identify the clustering structure. The algorithm can thus find clusters of arbitrary density. Experimental results show that UD-OPTICS achieves relatively high clustering accuracy with relatively low computational complexity.

Key words: uncertain data, interval number, density clustering algorithm, OPTICS
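
The abstract names an interval core distance and an interval reachability distance but does not give their formulas. The following is a minimal illustrative sketch only, not the authors' definitions. It assumes that each uncertain object is summarised per dimension as the interval [mean - std, mean + std] of its observations, that two interval objects are compared with a Euclidean-style distance over their lower and upper bounds, and that the core and reachability distances follow the standard OPTICS definitions applied to that distance. All function names and parameters (`to_interval`, `interval_distance`, `eps`, `min_pts`) are hypothetical.

```python
import math
from statistics import mean, pstdev


def to_interval(samples):
    """Summarise repeated observations of one object as per-dimension
    intervals (mean - std, mean + std). Assumed representation."""
    dims = list(zip(*samples))
    return [(mean(d) - pstdev(d), mean(d) + pstdev(d)) for d in dims]


def interval_distance(a, b):
    """Euclidean-style distance between two interval objects, averaging the
    squared gaps of the lower and upper bounds. Assumed metric."""
    return math.sqrt(sum(
        ((a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2) / 2
        for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b)
    ))


def core_distance(idx, objects, eps, min_pts):
    """Standard OPTICS core distance on the interval metric: the distance to
    the min_pts-th neighbour (the object counts as its own first neighbour),
    or infinity if fewer than min_pts objects lie within eps."""
    if min_pts <= 1:
        return 0.0
    within = sorted(
        d for j, other in enumerate(objects) if j != idx
        for d in [interval_distance(objects[idx], other)] if d <= eps
    )
    if len(within) < min_pts - 1:
        return math.inf
    return within[min_pts - 2]


def reachability_distance(p_idx, o_idx, objects, eps, min_pts):
    """Standard OPTICS reachability distance of object p with respect to o."""
    return max(
        core_distance(o_idx, objects, eps, min_pts),
        interval_distance(objects[o_idx], objects[p_idx]),
    )
```

Under these assumptions, an OPTICS-style ordering would repeatedly pick the unprocessed object with the smallest reachability distance. An isolated object gets an infinite core distance, which is how sparse regions are separated from dense ones without fixing a single global density threshold.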