A new KNN multi-label classification algorithm based on local positive and negative labeling correlation

Computer Engineering & Science

JIANG Yun, XIAO Xiao, HOU Jin Quan, CHEN Li

(College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China)

Received: 2018-06-13 Revised: 2018-09-14 Online: 2019-10-25 Published: 2019-10-25
Funding:
National Natural Science Foundation of China (61163036); Natural Science Foundation of the Gansu Province Science and Technology Program (1606RJZA047); 2012 Special Fund for Basic Scientific Research of Universities in Gansu Province; Gansu Province University Graduate Supervisor Project (1201-16); Northwest Normal University Third-Phase Knowledge and Innovation Engineering Research Backbone Project (nwnu-kjcxgc-03-67)

Abstract:

In multi-label learning, each sample is represented by a single instance and is associated with multiple class labels. Most existing multi-label learning algorithms exploit label correlations globally, assuming that all samples share the same positive correlations between class labels. In practice, however, different samples share different label correlations, and labels can be not only positively correlated but also mutually exclusive (negatively correlated). To address this problem, we propose PNLC, a k-nearest-neighbor (KNN) multi-label classification algorithm based on local positive and negative pairwise label correlations. First, the feature vectors of the multi-label data are preprocessed to construct, for each class label, the features that are most discriminative for that label. Then, in the training stage, PNLC builds positive and negative local pairwise label correlation matrices from the true labels of the k nearest neighbors of every training sample. Finally, in the test stage, the k nearest neighbors of each test example and their corresponding positive and negative pairwise label relations are obtained, and these relations are used to compute the maximum a posteriori probability and predict the labels of the test example. Experimental results show that PNLC achieves clearly higher classification accuracy than other commonly used multi-label classification algorithms on the yeast and image datasets.

Key words: multi-label learning, positive and negative correlation, label-specific features, k-nearest neighbors (KNN)
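
The abstract does not give the formulas PNLC uses, so the following is only a minimal sketch of what local pairwise label statistics over a k-nearest-neighbor set could look like. The function names `k_nearest` and `local_label_correlation`, the Euclidean distance, the toy data, and the two definitions (`pos` as the fraction of neighbors carrying both labels, `neg` as the fraction carrying the first label but not the second) are all assumptions made for illustration; they are not taken from the paper, and the label-specific feature construction and the maximum a posteriori prediction step are omitted.

```python
import numpy as np

def k_nearest(X_train, x, k):
    """Return indices of the k training rows nearest to x (Euclidean distance)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dist, kind="stable")[:k]

def local_label_correlation(Y_neighbors):
    """Pairwise label statistics over one neighborhood (illustrative definitions).

    pos[a, b]: fraction of neighbors carrying both label a and label b.
    neg[a, b]: fraction of neighbors carrying label a but not label b.
    """
    Y = np.asarray(Y_neighbors, dtype=float)
    k = Y.shape[0]
    pos = Y.T @ Y / k
    neg = Y.T @ (1.0 - Y) / k
    return pos, neg

# Hypothetical toy data: 5 training samples, 2 features, 3 labels.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0]])
Y_train = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])

# Neighborhood of one query point, then its local label statistics.
idx = k_nearest(X_train, np.array([0.02, 0.03]), k=3)
pos, neg = local_label_correlation(Y_train[idx])
```

In this toy neighborhood, labels 0 and 1 co-occur in two of the three neighbors, while label 2 never appears alongside label 0, which is the kind of local mutual exclusion the abstract calls negative correlation.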

JIANG Yun, XIAO Xiao, HOU Jin Quan, CHEN Li. A new KNN multi-label classification algorithm based on local positive and negative labeling correlation[J]. Computer Engineering & Science.
