• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2013, Vol. 35, Issue (10): 137-143.


An Ensemble Multilabel Classification Algorithm Based on Feature Selection

LI Ling1, LIU Huawen1,2, MA Zongjie1, ZHAO Jianmin1

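  (1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004; 2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100055)
  • Received: 2013-05-27 Revised: 2013-07-25 Published: 2013-10-25 Online: 2013-10-25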
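  • Funding: National Natural Science Foundation of China (61100119, 61272130, 61272468, 61170108, 61170109); Open Project Fund of the National Laboratory of Pattern Recognition (201204214); China Postdoctoral Science Foundation (2013M530072)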

An ensemble multilabel classification
method using feature selection           

LI Ling1,LIU Huawen2,MA Zongjie2,ZHAO Jianmin1   

  (1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004;
    2. Academy of Mathematics and Systems Science, CAS, Beijing 100055, China)
  • Received: 2013-05-27 Revised: 2013-07-25 Online: 2013-10-25 Published: 2013-10-25

Abstract:

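Like traditional classification methods, multilabel learning suffers from problems caused by high data dimensionality, such as overfitting and the curse of dimensionality. Although a number of multilabel classification algorithms have been proposed, the high dimensionality of multilabel data has not received widespread attention. To address this problem, we use conditional mutual information to measure the correlation between features and class labels and perform feature selection on that basis to identify the important features. Building on this, we propose a new ensemble multilabel classification algorithm. Experimental results show that, compared with classical classification algorithms, the proposed ensemble algorithm achieves better classification performance in most cases.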
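Conditional mutual information I(X;Y|Z) measures how much a feature X tells about a label Y once another variable Z is already known. A minimal sketch of a plug-in (empirical-frequency) estimator follows, assuming the features have already been discretized; the function names `entropy` and `conditional_mutual_information` are hypothetical and not taken from the paper. It uses the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(*columns: Sequence[Hashable]) -> float:
    """Empirical (plug-in) joint entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))  # one tuple per sample
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def conditional_mutual_information(x: Sequence[Hashable],
                                   y: Sequence[Hashable],
                                   z: Sequence[Hashable]) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), estimated from samples."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


if __name__ == "__main__":
    x = [0, 1, 0, 1]
    # Z is constant (carries no information), so I(X;X|Z) equals H(X) = 1 bit.
    print(conditional_mutual_information(x, x, [0, 0, 0, 0]))  # → 1.0
    # Once Z = X is known, X tells nothing more about itself.
    print(conditional_mutual_information(x, x, x))             # → 0.0
```

Plug-in estimates are biased upward on small samples or on variables with many distinct values, so in practice continuous features are usually binned first.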

Keywords: data mining; multilabel classification; feature selection; conditional mutual information

Abstract:

Similar to traditional learning methods, multilabel learning also suffers from the problems, such as overfitting and the curse of dimensionality, which are raised from high dimensionality of data. Although many multilabel learning algorithms have been proposed, the issue of the high dimensionality has not yet received enough attentions. To solve this problem, we exploit the correlation of features to classify labels by using conditional mutual information, and then perform feature selection on data. Furthermore, a new ensemble learning algorithm for multilabel data is proposed. Experiment results on several multilabel data sets show that the proposed algorithm outperforms the wellestablished multilabel learning algorithms in most cases.

Key words: data mining;multilabel learning;feature selection;conditional mutual information