UD-OPTICS: An uncertain data clustering algorithm based on interval number

Computer Engineering & Science

WU Cuixian (1,2,3), HE Shaoyuan (1,2)

(1. School of Telecommunication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065;
2. Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065;
3. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China)

Received: 2018-07-24  Revised: 2018-11-05  Online: 2019-07-25  Published: 2019-07-25

Abstract

Research on clustering algorithms for uncertain data generally assumes that the data follow a particular distribution. A probability density function or probability distribution function representing the uncertain data is then derived from that distribution. However, there is no guarantee that the assumed distribution matches the actual distribution of uncertain data in real application systems. Moreover, existing density-based algorithms are sensitive to their initial parameters and cannot discover clusters of arbitrary density when the uncertain data have non-uniform density. To address these shortcomings, we propose UD-OPTICS, an interval-number-based algorithm for Ordering Points To Identify the Clustering Structure of uncertain data. The algorithm combines interval number theory with statistical information about the uncertain data to represent such data more reasonably. We introduce the concepts of interval core distance and interval reachability distance, together with low-complexity methods for computing them. These distances are used to measure the similarity between uncertain data objects, to expand clusters, and to order the objects so that the clustering structure can be identified. The algorithm can discover clusters of arbitrary density. Experimental results show that UD-OPTICS achieves relatively high clustering accuracy at relatively low complexity.

Keywords: uncertain data, interval number, density-based clustering algorithm, OPTICS
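The abstract names the building blocks of UD-OPTICS, but this page does not give their formulas. The sketch below is therefore a minimal illustration under stated assumptions, not the authors' definitions. Each uncertain object is summarized by one interval [μ − kσ, μ + kσ] per dimension (hypothetical function `to_interval`, hypothetical parameter `k`). Two interval vectors are compared with a Euclidean distance over their lower and upper bounds (hypothetical function `interval_distance`). The classic OPTICS core distance, the reachability distance max(core(p), d(p, q)), and the ordering loop are then run on that distance. In UD-OPTICS the interval core and reachability distances may themselves be interval-valued; here they are collapsed to scalars for simplicity.

```python
import heapq
import math
from statistics import mean, pstdev

UNDEFINED = math.inf  # stands in for OPTICS' "undefined" distance


def to_interval(samples, k=1.0):
    """Hypothetical representation: one interval [mu - k*sigma, mu + k*sigma] per dimension.

    `samples` is a list of observations (tuples) of a single uncertain object.
    """
    dims = list(zip(*samples))
    return [(mean(d) - k * pstdev(d), mean(d) + k * pstdev(d)) for d in dims]


def interval_distance(a, b):
    """Assumed distance between interval vectors: Euclidean over lower and upper bounds."""
    return math.sqrt(sum((al - bl) ** 2 + (au - bu) ** 2
                         for (al, au), (bl, bu) in zip(a, b)))


def core_distance(i, objs, eps, min_pts):
    """Standard OPTICS core distance under interval_distance (the neighbourhood includes i itself)."""
    dists = sorted(interval_distance(objs[i], o) for o in objs)
    neighbours = [d for d in dists if d <= eps]
    return neighbours[min_pts - 1] if len(neighbours) >= min_pts else UNDEFINED


def optics_order(objs, eps, min_pts):
    """Return (ordering, reachability) following the classic OPTICS expansion loop."""
    n = len(objs)
    processed = [False] * n
    reach = [UNDEFINED] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(UNDEFINED, start)]  # priority queue keyed by reachability
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # stale entry left over from an earlier, larger reachability
            processed[p] = True
            order.append(p)
            core = core_distance(p, objs, eps, min_pts)
            if core == UNDEFINED:
                continue  # p is not a core object, so it does not expand the cluster
            for q in range(n):
                d = interval_distance(objs[p], objs[q])
                if processed[q] or d > eps:
                    continue
                new_reach = max(core, d)  # reachability distance of q with respect to p
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(seeds, (new_reach, q))
    return order, reach


if __name__ == "__main__":
    # Four 1-D uncertain objects, each observed twice; the last one lies far away.
    raw = [[(0.0,), (1.0,)], [(0.0,), (2.0,)], [(0.0,), (3.0,)], [(10.0,), (11.0,)]]
    objs = [to_interval(s) for s in raw]
    order, reach = optics_order(objs, eps=2.5, min_pts=2)
    print(order, reach)  # prints [0, 1, 2, 3] [inf, 1.0, 1.0, inf]
```

Objects whose reachability stays at infinity begin a new region of the ordering. Plotting `reach` in `order` gives the usual OPTICS reachability plot, where valleys correspond to clusters. The brute-force neighbour search is O(n²) and was chosen only for clarity. It does not reflect the low-complexity computation claimed in the abstract.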

WU Cuixian, HE Shaoyuan. UD-OPTICS: An uncertain data clustering algorithm based on interval number[J]. Computer Engineering & Science.
