
Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (11): 2049-2055.

• Artificial Intelligence and Data Mining •

Multi-label feature selection based on label co-occurrence relationship

LI Yu-chen¹, WEI Wei¹,², BAI Wei-ming¹, WANG Da¹

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Ministry of Education), Shanxi University, Taiyuan 030006, China

  • Received: 2020-09-10 Revised: 2020-11-23 Accepted: 2021-11-25 Online: 2021-11-25 Published: 2021-11-23
  • Funding:
    National Natural Science Foundation of China (61772323, 61976184)

Abstract: Multi-label data are widespread in the real world, and multi-label feature selection is an important preprocessing step in multi-label learning. Researchers have proposed a number of multi-label feature selection algorithms based on the fuzzy rough set model, but most of them ignore the co-occurrence of labels. To address this problem, the similarity between samples under the label set is evaluated from the co-occurrence relationship between sample labels. This similarity relation is then used to define the fuzzy mutual information between features and labels, and a multi-label feature selection algorithm, LC-FS, is designed by combining it with the maximum-relevance and minimum-redundancy principle. Experimental results on 5 public datasets demonstrate the effectiveness of the proposed algorithm.

Key words: multi-label, feature selection, fuzzy rough set, fuzzy mutual information
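
The abstract does not give the exact definitions used by LC-FS, so the following Python sketch only illustrates the general pipeline it describes: a sample similarity relation built from co-occurring labels, fuzzy mutual information between a feature and the label set, and greedy selection by maximum relevance and minimum redundancy. Everything specific here is an assumption: the Jaccard overlap of label sets stands in for the paper's co-occurrence similarity, the feature relation and entropy follow a commonly used fuzzy-entropy formulation, and all function names are hypothetical.

```python
import math


def feature_relation(column):
    """Fuzzy similarity relation induced by one feature (assumed form)."""
    span = max(column) - min(column)
    if span == 0:  # a constant feature cannot distinguish any samples
        return [[1.0] * len(column) for _ in column]
    return [[1.0 - abs(a - b) / span for b in column] for a in column]


def label_relation(Y):
    """Sample similarity from shared labels: Jaccard overlap (an assumption)."""
    sets = [{k for k, v in enumerate(row) if v} for row in Y]

    def jaccard(a, b):
        union = a | b
        return 1.0 if not union else len(a & b) / len(union)

    return [[jaccard(a, b) for b in sets] for a in sets]


def fuzzy_entropy(R):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with |.| the row sum."""
    n = len(R)
    return -sum(math.log2(sum(row) / n) for row in R) / n


def intersect(R, S):
    """Element-wise minimum, i.e. the intersection of two fuzzy relations."""
    return [[min(r, s) for r, s in zip(rr, ss)] for rr, ss in zip(R, S)]


def fuzzy_mi(R, S):
    """Fuzzy mutual information I(R; S) = H(R) + H(S) - H(R and S)."""
    return fuzzy_entropy(R) + fuzzy_entropy(S) - fuzzy_entropy(intersect(R, S))


def select_features(X, Y, k):
    """Greedy max-relevance, min-redundancy ranking of k feature indices."""
    n_features = len(X[0])
    rels = [feature_relation([row[j] for row in X]) for j in range(n_features)]
    label_rel = label_relation(Y)
    relevance = [fuzzy_mi(R, label_rel) for R in rels]
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(fuzzy_mi(rels[j], rels[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 is constant, feature 1 separates the two label patterns.
X = [[5.0, 0.0], [5.0, 0.0], [5.0, 1.0], [5.0, 1.0]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(select_features(X, Y, 1))  # → [1]
```

In this toy case the constant feature has zero fuzzy mutual information with the label relation, so the informative feature is ranked first. A faithful implementation would replace `label_relation` and the scoring rule with the definitions given in the full paper.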