• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (04): 723-729.

• Artificial Intelligence and Data Mining •

Feature selection based on general importance and runner-root algorithm

WU Shang-zhi, XU Dan-dan, WANG Xu-wen, XIA Ning

  1. (College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2020-09-15 Revised: 2021-01-10 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20
  • Supported by:
    National Natural Science Foundation of China (61561043); Natural Science Foundation of Gansu Province (1010RJZA011)

Abstract: Feature selection is an important data-preprocessing step in machine learning, pattern recognition, data mining, and related fields. Data collected in practice are often high-dimensional and contain a large amount of redundant and noisy data, which increases computation time and can mislead the resulting models. This paper proposes a feature selection algorithm that combines the generalized importance of attribute subsets with the runner-root algorithm, an intelligent optimization method. The runner-root algorithm performs the iterative search. The fitness function, built from the generalized importance of an attribute subset and the size of the selected feature subset, evaluates each candidate subset, so that feature subsets important for decision making are searched for across the entire sample space as far as possible. Experimental results show that the proposed algorithm selects effective feature subsets with which classification models achieve relatively high accuracy.

Key words: intelligent optimization, general importance, runner-root algorithm, feature selection
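
The abstract describes a fitness function that trades off the quality of an attribute subset against its size. The sketch below shows one common way such a function can be written. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not define generalized importance, so the classical rough-set dependency degree is substituted as the quality measure, and the names `dependency_degree`, `fitness`, and the weight `alpha` are hypothetical. The runner-root search itself is not shown.

```python
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Stand-in quality measure (assumption, not the paper's definition).

    Returns the fraction of samples that fall into an equivalence class,
    induced by the attributes in `subset`, whose members all share one label.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)  # values on the chosen attributes
        classes[key].append(label)
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def fitness(rows, labels, subset, n_features, alpha=0.9):
    """Weighted sum of subset quality and subset compactness.

    `alpha` is a hypothetical weight: values near 1 favor quality,
    values near 0 favor smaller subsets.
    """
    quality = dependency_degree(rows, labels, subset)
    compactness = 1.0 - len(subset) / n_features
    return alpha * quality + (1.0 - alpha) * compactness


if __name__ == "__main__":
    # Toy discrete data set: attribute 0 alone determines the label.
    rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
    labels = [0, 0, 1, 1]
    for subset in [(0,), (1,), (0, 1, 2)]:
        print(subset, fitness(rows, labels, subset, n_features=3))
```

In a wrapper of this kind, a search method such as the runner-root algorithm would propose candidate subsets and keep those with the highest fitness. On the toy data, the single informative attribute scores above the full attribute set because both reach the same quality while the smaller subset earns the compactness term.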