• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (08): 1467-1473.

• Artificial Intelligence and Data Mining •


Text feature selection based on sine and cosine algorithm

WEN Wu1,2,3,WAN Yu-hui1,2,WEN Zhi-yun1,2   

  (1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
   2. Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
   3. Chongqing Information Technology Designing Co., Ltd., Chongqing 401121, China)
  • Received: 2020-07-10  Revised: 2021-01-19  Accepted: 2022-08-25  Online: 2022-08-25  Published: 2022-08-25


Abstract: To obtain a better feature subset from text and eliminate interfering and redundant features, a hybrid feature selection algorithm combining a filter method with a swarm intelligence algorithm is proposed. First, the information gain of each feature word is computed, and the top-ranked features are retained as a preselected feature set; the sine cosine algorithm then searches this set to obtain the final refined feature subset. To better balance global exploration and local exploitation in the sine cosine algorithm, an adaptive inertia weight is added. To evaluate feature subsets more accurately, a fitness function that weights classification accuracy against the number of selected features is introduced, together with a new position update mechanism. Experimental results on KNN and Bayesian classifiers show that, compared with other feature selection algorithms and the algorithm before improvement, the proposed method achieves higher classification accuracy.
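The pipeline the abstract describes — an information-gain prefiltered feature pool searched by a sine cosine algorithm with an adaptive inertia weight and a fitness function trading off accuracy against subset size — can be sketched roughly as follows. All concrete settings here (population size, the linear inertia-weight schedule, the weighting constant ALPHA, and the toy accuracy function standing in for a real KNN/Bayes cross-validation score) are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 20      # size of the preselected (information-gain ranked) pool; assumption
POP, T_MAX = 8, 30   # population size and iteration budget; assumptions
ALPHA = 0.9          # weight on accuracy vs. subset compactness; assumption
A = 2.0              # SCA amplitude: r1 decays linearly from A to 0

def toy_accuracy(mask):
    # Stand-in for a KNN / naive-Bayes evaluation: the first 5 features
    # are treated as "useful" and the rest as noise (pure illustration).
    useful = mask[:5].sum() / 5.0
    noise = mask[5:].sum() / (N_FEATURES - 5)
    return 0.6 + 0.4 * useful - 0.1 * noise

def fitness(mask):
    # Weighted fitness as in the abstract: reward accuracy, penalize
    # large subsets. The paper's exact weighting is not reproduced here.
    if mask.sum() == 0:
        return 0.0
    return ALPHA * toy_accuracy(mask) + (1 - ALPHA) * (1 - mask.sum() / N_FEATURES)

# Continuous positions in [0, 1]; thresholding at 0.5 gives a binary feature mask.
X = rng.random((POP, N_FEATURES))
best_x = X[0].copy()
best_f = fitness(best_x > 0.5)

for t in range(T_MAX):
    r1 = A - t * (A / T_MAX)               # decaying amplitude (standard SCA)
    w = 0.9 - t * (0.9 - 0.4) / T_MAX      # adaptive inertia weight; assumed schedule
    for i in range(POP):
        r2 = rng.uniform(0, 2 * np.pi, N_FEATURES)
        r3 = rng.uniform(0, 2, N_FEATURES)
        r4 = rng.random(N_FEATURES)
        # Sine or cosine move toward the best-so-far position, chosen per dimension.
        step = np.where(r4 < 0.5,
                        r1 * np.sin(r2) * np.abs(r3 * best_x - X[i]),
                        r1 * np.cos(r2) * np.abs(r3 * best_x - X[i]))
        X[i] = np.clip(w * X[i] + step, 0.0, 1.0)
        f = fitness(X[i] > 0.5)
        if f > best_f:
            best_f, best_x = f, X[i].copy()

selected = np.flatnonzero(best_x > 0.5)
print("best fitness:", round(best_f, 3))
print("selected features:", selected)
```

The inertia weight `w` damps the previous position as iterations proceed, shifting the search from exploration toward exploitation, while the decaying `r1` shrinks the sine/cosine step toward the best-known mask.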

Key words: feature selection, sine cosine algorithm, inertia weight, classification accuracy