• A journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

A new kNN multi-label classification algorithm based on local positive and negative label correlations

JIANG Yun, XIAO Xiao, HOU Jinquan, CHEN Li

  1. (College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2018-06-13  Revised: 2018-09-14  Online: 2019-10-25  Published: 2019-10-25
  • Supported by:

    the National Natural Science Foundation of China (61163036); the Natural Science Foundation of the Gansu Provincial Science and Technology Program (1606RJZA047); the 2012 Special Fund for Basic Scientific Research of Gansu Provincial Universities; the Postgraduate Supervisor Project of Gansu Provincial Universities (1201-16); and the Third-Phase Knowledge and Innovation Engineering Research Backbone Project of Northwest Normal University (nwnu-kjcxgc-03-67)

Abstract:

In multi-label learning, each sample is represented by a single instance and is associated with multiple class labels. Most existing multi-label learning algorithms exploit label correlations globally, assuming that positive label correlations are shared by all samples. In practical applications, however, different samples share different label correlations, and labels are not only positively correlated but can also be mutually exclusive (i.e., negatively correlated). To address this problem, we propose a k-nearest-neighbor multi-label classification algorithm based on local positive and negative pairwise label correlations, named PNLC. First, the feature vectors of the multi-label data are preprocessed to construct, for each class label, the features that are most discriminative for that label. Then, in the training stage, PNLC builds local positive and negative pairwise label correlation matrices from the ground-truth labels of the k nearest neighbors of every training sample. Finally, in the test stage, the k nearest neighbors of each test example and their corresponding positive and negative pairwise label relations are identified, and the maximum a posteriori probability is computed from these relations to make the prediction. Experimental results show that PNLC clearly outperforms other well-established multi-label classification algorithms on the yeast and image datasets.

Key words: multi-label learning, positive and negative correlation, label-specific features, kNN
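
The abstract above outlines three steps: label-specific feature construction, building local positive and negative pairwise label relations from the true labels of each sample's k nearest neighbors, and maximum a posteriori prediction. The Python sketch below illustrates only the neighborhood-level part of that idea; it is not the authors' PNLC implementation, and the function names, the simple counting estimates, and the additive scoring rule (weight alpha, 0.5 threshold) are illustrative assumptions.

# Illustrative sketch only -- NOT the authors' PNLC code. It shows how local
# positive/negative pairwise label relations can be read off the ground-truth
# labels of a sample's k nearest neighbors and folded into a per-label score.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_pairwise_relations(Y_knn):
    # Y_knn: (k, q) binary matrix, one row per neighbor, one column per label.
    # pos[j, l]: number of neighbors carrying both label j and label l.
    # neg[j, l]: number of neighbors carrying label j but not label l.
    Y = Y_knn.astype(float)
    pos = Y.T @ Y
    neg = Y.T @ (1.0 - Y)
    np.fill_diagonal(pos, 0.0)   # keep only pairwise (off-diagonal) relations
    np.fill_diagonal(neg, 0.0)
    return pos, neg

def knn_label_scores(X_train, Y_train, x_test, k=10, alpha=0.1):
    # Find the k nearest training samples of the test instance.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]
    Y_knn = Y_train[idx]                      # (k, q) neighbor label matrix
    prior = Y_knn.mean(axis=0)                # label frequency in the neighborhood
    pos, neg = local_pairwise_relations(Y_knn)
    # Reward labels that co-occur with frequent neighborhood labels and
    # penalize labels the neighborhood treats as mutually exclusive.
    # (The paper formulates this step as a maximum a posteriori estimate;
    # the additive rule used here is an assumption.)
    support = (pos - neg) @ prior / max(k, 1)
    scores = prior + alpha * support
    return (scores >= 0.5).astype(int), scores

For example, calling knn_label_scores(X_train, Y_train, X_test[0], k=10) returns a binary label vector and the underlying scores for one test instance; the label-specific feature construction described as the first step of PNLC would replace X_train and x_test with per-label feature representations before the neighbor search.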