• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

A hierarchical clustering feature selection algorithm for ranking learning

孟昱煜,陈绍立,刘兴长   

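  1. (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu 730070)
     
  • Received: 2018-06-11 Revised: 2018-11-15 Online: 2019-12-25 Published: 2019-12-25
  • Funding:

    Natural Science Foundation of Gansu Province (1606RJZA003); Project of the Gansu Provincial Department of Housing and Urban-Rural Development (JK2015-15)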

A hierarchical clustering based feature selection algorithm for ranking learning

MENG Yu-yu, CHEN Shao-li, LIU Xing-chang

  1. (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
  • Received: 2018-06-11 Revised: 2018-11-15 Online: 2019-12-25 Published: 2019-12-25

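Abstract (translated from the Chinese version):

Fast response to user queries is essential for large search systems, and strict back-end latency constraints must be observed when computing the feature relevance of candidate documents. Feature selection improves the efficiency of machine learning. Since fast feature selection in ranking learning usually starts from the single feature with the best ranking performance, this paper first proposes an algorithm that generates the starting points of feature selection by hierarchical clustering, and applies it to two existing fast feature selection methods. In addition, a new method that makes full use of the clustered features is proposed for feature selection. Experiments on two standard datasets show that the algorithm can obtain a smaller feature subset without affecting accuracy, and can also achieve the best ranking accuracy on medium-sized subsets.
 

Keywords: feature selection, ranking learning, hierarchical clustering, greedy search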

Abstract:

Large search systems must respond quickly to user queries, and they must observe strict back-end latency constraints when computing the feature relevance of candidate documents. Feature selection improves the efficiency of machine learning. Because most fast feature selection methods in ranking learning start from the single feature with the best ranking performance, this paper first proposes an algorithm that generates the starting points of fast feature selection by hierarchical clustering, and applies it to two existing fast feature selection algorithms. In addition, a new method that makes full use of the clustered features is proposed for feature selection. Experiments on two standard datasets show that the proposed algorithm can obtain a smaller feature subset without loss of accuracy, and can achieve the best ranking accuracy with a medium-sized subset.
 

Key words: feature selection, ranking learning, hierarchical clustering, greedy search algorithm
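
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of using hierarchical clustering to produce several starting features instead of a single best one. It assumes that features are grouped by average-linkage agglomerative clustering with distance 1 - |Pearson correlation|, and that the starting point of each cluster is its member with the highest single-feature ranking score. The function names, the distance measure, the linkage rule and the example scores are all illustrative assumptions, not the authors' method.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def cluster_features(columns, n_clusters):
    """Average-linkage agglomerative clustering of feature columns.

    Assumption: the distance between two features is 1 - |Pearson correlation|.
    Returns a list of clusters, each a sorted list of feature indices.
    """
    n = len(columns)
    dist = {(i, j): 1.0 - abs(pearson(columns[i], columns[j]))
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [dist[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def starting_points(clusters, feature_scores):
    """Pick the best-scoring feature of each cluster as a starting point."""
    return [max(c, key=lambda f: feature_scores[f]) for c in clusters]


# Toy data: features 0 and 1 are strongly correlated, as are features 2 and 3.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [5.1, 1.2, 3.9, 2.1, 3.0],
]
# Hypothetical single-feature ranking scores (for example, NDCG of each feature alone).
feature_scores = [0.31, 0.42, 0.28, 0.25]

clusters = cluster_features(columns, n_clusters=2)
print(clusters)                                   # [[0, 1], [2, 3]]
print(starting_points(clusters, feature_scores))  # [1, 2]
```

A greedy search could then grow a feature subset from each of these starting points rather than from the single globally best feature. How the starting points are actually combined with the two existing fast selection algorithms is described in the full paper, not in this abstract.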