A cost-sensitive imbalanced data classification algorithm based on KPCA-Stacking

Computer Engineering & Science, 2021, Vol. 43, Issue (03): 525-533.

CAO Ting-ting, ZHANG Zhong-lin

(College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Received: 2019-12-31 Revised: 2020-04-27 Accepted: 2021-03-25 Online: 2021-03-25 Published: 2021-03-29
Funding: National Natural Science Foundation of China (61662043)

Abstract

Cost-sensitive learning is an important strategy for classifying imbalanced data, and non-linearity in the data features makes classification harder still. To address both problems, this paper combines cost-sensitive learning with kernel principal component analysis (KPCA) and proposes a cost-sensitive Stacking ensemble algorithm called KPCA-Stacking. First, the original data set is oversampled with the adaptive synthetic sampling method (ADASYN) and then reduced in dimensionality with KPCA. Second, KNN, LDA, SVM, and RF are converted into cost-sensitive algorithms according to the Bayesian risk minimization principle and used as the primary learners of the Stacking ensemble learning framework, with logistic regression as the meta-learner. Comparative experiments against 10 algorithms, including the J48 decision tree, on 5 public datasets show that the cost-sensitive KPCA-Stacking algorithm improves the recognition rate of the minority class to a certain extent and achieves better overall classification performance than a single model.

Key words: imbalanced data, cost-sensitive, KPCA, Stacking, ADASYN oversampling, classification
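
The abstract says the base learners are made cost-sensitive through Bayesian risk minimization, but it gives neither the cost values nor the exact formulation. As a rough illustration only, the sketch below shows the generic minimum-expected-cost decision rule that this principle usually refers to: given estimated class posteriors, predict the class whose expected misclassification cost is lowest. The function name `min_risk_class` and the cost matrix `COST` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def min_risk_class(posteriors: Sequence[float],
                   cost: Sequence[Sequence[float]]) -> int:
    """Return the class index with the smallest expected cost.

    posteriors[j] is the estimated P(class j | x).
    cost[i][j] is the cost of predicting class i when the true class is j.
    """
    # Expected cost (conditional risk) of predicting each class i.
    risks = [
        sum(cost[i][j] * posteriors[j] for j in range(len(posteriors)))
        for i in range(len(cost))
    ]
    # Pick the prediction with the lowest risk (ties go to the lower index).
    return min(range(len(risks)), key=risks.__getitem__)


# Assumed, illustrative cost matrix: class 0 = majority, class 1 = minority.
# Missing a minority sample (predict 0, truth 1) costs 5; a false alarm costs 1.
COST = [[0.0, 5.0],
        [1.0, 0.0]]

# Equal costs reduce the rule to the ordinary most-probable-class decision.
UNIFORM = [[0.0, 1.0],
           [1.0, 0.0]]
```

With these assumed costs, a sample with posteriors `[0.7, 0.3]` is assigned to the minority class (expected costs 1.5 versus 0.7), whereas the uniform matrix keeps it in the majority class. Any classifier that outputs class probabilities, such as the KNN, LDA, SVM, and RF learners named in the abstract, could in principle be wrapped this way, though the paper's actual conversion may differ.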

CAO Ting-ting, ZHANG Zhong-lin. A cost-sensitive imbalanced data classification algorithm based on KPCA-Stacking[J]. Computer Engineering & Science, 2021, 43(03): 525-533.
