• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (02): 282-291.

• Artificial Intelligence and Data Mining •

Feature selection algorithm based on feature weights and improved particle swarm optimization

LIU Zhen-chao (刘振超)1, YUAN Ying-chun (苑迎春)1,2, WANG Ke-jian (王克俭)1,2, HE Chen (何晨)1

  1. College of Information Science and Technology, Hebei Agricultural University, Baoding 071000, China
  2. Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071000, China

  • Received: 2022-08-23 Revised: 2022-10-24 Accepted: 2024-02-25 Online: 2024-02-25 Published: 2024-02-24
  • Funding:
    Hebei Province Higher Education Teaching Reform Research and Practice Project (2020GJJG076)

Abstract: With the development of educational informatization, educational data has come to contain many features with high redundancy, so current classification algorithms achieve unsatisfactory accuracy on it. This paper proposes a hybrid feature selection algorithm (RF-ATPSO) that combines a feature weighting algorithm with an improved particle swarm optimization algorithm. The algorithm first uses RELIEF-F to compute a weight for each feature and filters out redundant features, then uses the improved particle swarm optimization algorithm to search the filtered feature set for the optimal feature subset. Experimental results show that, on 6 UCI public datasets, feature selection with RF-ATPSO improves average accuracy by 10.04%, with the smallest average feature subset size and the fastest convergence. On a feature dataset of student academic performance portraits, the algorithm reaches high classification accuracy with a small feature subset, with an average accuracy of 94.77%, clearly better than other feature selection algorithms. The experiments demonstrate that the algorithm has practical application value.


Key words: feature selection, feature weight, improved particle swarm optimization, t-distribution
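
The two-stage idea in the abstract (weight-based filtering followed by a swarm search over the surviving features) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' RF-ATPSO: the abstract does not give the details of the improved PSO, so a standard binary PSO with a sigmoid transfer function is swapped in, and a simplified two-class Relief stands in for RELIEF-F. All function names, the `keep_ratio` and `size_penalty` parameters, the leave-one-out 1-nearest-neighbour fitness, and the toy data are hypothetical choices made only for this example.

```python
import math
import random


def relief_weights(X, y, n_iter=50, seed=0):
    """Simplified two-class Relief weights (a stand-in for RELIEF-F)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        # A useful feature differs on the nearest miss and agrees on the nearest hit.
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return w


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out 1-nearest-neighbour accuracy using only the columns in feats."""
    if not feats:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        nearest = min((k for k in range(n) if k != i),
                      key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in feats))
        correct += y[nearest] == y[i]
    return correct / n


def binary_pso(fitness, dim, n_particles=10, n_iter=20, seed=0):
    """Standard binary PSO with a sigmoid transfer function; maximises fitness."""
    rng = random.Random(seed)
    pos = [[int(rng.random() < 0.5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp so exp() stays well-behaved
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def select_features(X, y, keep_ratio=0.5, size_penalty=0.01, seed=0):
    """Stage 1: keep the top-weighted features. Stage 2: PSO search within them."""
    w = relief_weights(X, y, seed=seed)
    n_keep = max(1, round(keep_ratio * len(w)))
    kept = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_keep]

    def fitness(mask):
        # Accuracy minus a small penalty (assumed here) that favours smaller subsets.
        feats = [kept[j] for j, on in enumerate(mask) if on]
        return loo_1nn_accuracy(X, y, feats) - size_penalty * len(feats) / n_keep

    mask, _ = binary_pso(fitness, n_keep, seed=seed)
    return sorted(kept[j] for j, on in enumerate(mask) if on)


def make_toy_data(n=40, d=6, seed=1):
    """Synthetic two-class data: column 0 is informative, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[3.0 * label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(d - 1)]
         for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected feature indices:", select_features(X, y))
```

Filtering first shrinks the search space that the swarm has to explore, which is the likely reason a hybrid of this kind can converge faster than a wrapper search over all features. The fitness function here trades accuracy against subset size with a fixed penalty; the paper may weight these differently.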