• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (07): 1309-1317.

• Data Mining and Artificial Intelligence •


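  • Supported by:
    Support Plan for High-Level Faculty Development in Beijing Municipal Universities (CIT&TCD201904037); China Postdoctoral Science Foundation (2017M620885)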

An MLKNN multi-label classification algorithm based on association rules

YANG Lan-yan1, JIN Min1, ZHANG Ying-chun2, ZHANG Xun1

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  2. Information Network Center, Beijing Technology and Business University, Beijing 100048, China

  • Received: 2019-10-08 Revised: 2020-01-03 Accepted: 2020-07-25 Online: 2020-07-25 Published: 2020-07-27


Abstract: The MLKNN algorithm treats each label independently and therefore ignores the correlations between labels that exist in real-world data. To address this problem, this paper proposes FP-MLKNN, an MLKNN multi-label classification algorithm based on association rules. The algorithm uses an association rule algorithm to mine high-order correlations between labels, and applies the resulting rules to MLKNN in order to improve its classification performance. First, the MLKNN algorithm is used to obtain the feature confidence of each sample. Second, the association rule algorithm is used to mine a set of strong association rules. Third, the two algorithms are fused to construct a multi-label classifier that predicts new labels. The proposed algorithm is then compared experimentally with MLKNN, AdaBoostMH and BPMLL. The results show that it achieves better classification performance than these three algorithms on the yeast, emotions and enron datasets.

Key words: multi-label classification, MLKNN, association rules, high-order correlation
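
The three steps named in the abstract (per-label confidences, strong association rules between labels, fusion of the two) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the mining procedure or the fusion rule, so everything below is an assumption made for illustration: the brute-force miner `mine_rules`, the thresholds `min_support` and `min_confidence`, the fusion function `fuse`, and the toy labels `A`, `B`, `C`. The per-label confidences are taken as given, standing in for the output of an MLKNN-style classifier.

```python
from itertools import combinations


def mine_rules(label_sets, min_support=0.3, min_confidence=0.7, max_antecedent=2):
    """Brute-force mining of rules `antecedent -> single label`.

    Assumption: a stand-in for a real frequent-pattern miner,
    practical only when the number of labels is small.
    """
    n = len(label_sets)
    labels = sorted(set().union(*label_sets))

    def support(items):
        # Fraction of training samples whose label set contains `items`.
        return sum(1 for s in label_sets if items <= s) / n

    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(labels, size):
            a = frozenset(antecedent)
            sup_a = support(a)
            if sup_a < min_support:
                continue
            for target in labels:
                if target in a:
                    continue
                sup_rule = support(a | {target})
                conf = sup_rule / sup_a
                # Keep only "strong" rules: frequent and confident enough.
                if sup_rule >= min_support and conf >= min_confidence:
                    rules.append((a, target, conf))
    return rules


def fuse(confidences, rules, threshold=0.5):
    """Hypothetical fusion step (not taken from the paper).

    If every antecedent label of a rule is already predicted, raise the
    consequent label's score to at least the rule's confidence.
    """
    predicted = {l for l, c in confidences.items() if c >= threshold}
    fused = dict(confidences)
    for antecedent, target, conf in rules:
        if antecedent <= predicted:
            fused[target] = max(fused.get(target, 0.0), conf)
    return fused, {l for l, c in fused.items() if c >= threshold}


# Toy training label sets: A and B almost always occur together.
train_labels = [{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"C"}, {"A", "B"}]
rules = mine_rules(train_labels)

# Assumed per-label confidences for one new sample (e.g. from MLKNN).
scores = {"A": 0.9, "B": 0.4, "C": 0.1}
fused_scores, predicted_labels = fuse(scores, rules)
print(sorted(predicted_labels))  # ['A', 'B']
```

In this toy run the classifier alone would predict only `A`, since `B` scores 0.4. The mined rule `A -> B` lifts `B` above the threshold, which is the kind of label correlation an independent per-label classifier cannot exploit.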