• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (2): 134-138.

• Articles

Categorization and Comparison of Selective Ensemble Algorithms

赵强利,蒋艳凰,徐 明   

  1. (School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China)
  • Received: 2010-01-06 Revised: 2010-04-25 Online: 2012-02-25 Published: 2012-02-25

Categorization and Comparison of the Ensemble Pruning Algorithms

ZHAO Qiangli,JIANG Yanhuang,XU Ming   

  1. (School of Computer Science,National University of Defense Technology,Changsha 410073,China)
  • Received:2010-01-06 Revised:2010-04-25 Online:2012-02-25 Published:2012-02-25

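Abstract (translated from the Chinese):

Ensemble pruning (selective ensemble) is one of the current research hotspots in machine learning. Because ensemble pruning is an NP-hard problem, researchers mostly use heuristics to transform it into other problems and obtain near-optimal solutions. Since these algorithms differ in their starting points and in the perspectives from which they are described, the large body of existing ensemble pruning algorithms appears complicated and unsystematic. To help researchers quickly understand and apply the latest progress in this area, this paper divides ensemble pruning algorithms into four categories according to the core strategy used in the selection process: iterative optimization, ranking, clustering, and pattern mining. Using 20 commonly used datasets from the UCI repository, it then compares typical algorithms experimentally in three respects: prediction performance, selection time, and the size of the resulting ensemble classifier. Finally, it summarizes the advantages and disadvantages of each category and outlines future research priorities for ensemble pruning.

Key words: ensemble learning; ensemble pruning; ranking method; clustering method; iterative optimization method; pattern mining method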

Abstract:

Ensemble pruning is an active research direction in machine learning. Because ensemble pruning is an NP-hard problem, most researchers use heuristics to obtain near-optimal solutions. Many ensemble pruning approaches already exist in the literature, but because they start from different perspectives, it is difficult to see how they relate to one another. In this paper, ensemble pruning approaches are divided into four categories according to their pruning strategies: optimization-based, ranking-based, clustering-based, and pattern-mining-based. Popular algorithms from each category are then implemented, tested on 20 datasets from the UCI repository, and compared on three facets: prediction performance, pruning time, and the size of the final ensemble. The advantages and disadvantages of each category are analyzed. The paper ends with some conclusions and directions for future work.

Key words: ensemble learning; ensemble pruning; optimization-based pruning; ranking-based pruning; clustering-based pruning; pattern-mining-based pruning
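
To make the ranking-based category named in the abstract concrete, the following is a minimal sketch of one simple way such a method could work: rank the base classifiers by their individual accuracy on a validation set, keep the top few, and combine them by majority vote. It is not the algorithm evaluated in the paper. The function names (`accuracy`, `majority_vote`, `prune_by_ranking`), the ranking criterion, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of samples on which preds agrees with labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def majority_vote(member_preds):
    """Combine per-classifier prediction lists by plurality vote per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_preds)]


def prune_by_ranking(member_preds, labels, keep):
    """Return the indices of the `keep` classifiers with the highest
    individual validation accuracy (assumed ranking criterion)."""
    ranked = sorted(
        range(len(member_preds)),
        key=lambda i: accuracy(member_preds[i], labels),
        reverse=True,
    )
    return sorted(ranked[:keep])


# Hypothetical validation labels and the predictions of five base classifiers.
labels = [0, 1, 1, 0, 1, 0]
member_preds = [
    [0, 1, 1, 0, 1, 0],  # 6 of 6 correct
    [0, 1, 0, 0, 1, 0],  # 5 of 6 correct
    [1, 0, 1, 0, 1, 0],  # 4 of 6 correct
    [1, 0, 0, 1, 1, 0],  # 2 of 6 correct
    [1, 0, 0, 1, 0, 1],  # 0 of 6 correct
]

selected = prune_by_ranking(member_preds, labels, keep=3)
pruned_preds = majority_vote([member_preds[i] for i in selected])
full_preds = majority_vote(member_preds)

print("selected classifiers:", selected)
print("pruned ensemble accuracy:", accuracy(pruned_preds, labels))
print("full ensemble accuracy:", accuracy(full_preds, labels))
```

On this toy data the three-member pruned ensemble votes more accurately than the full five-member ensemble, which is the effect ensemble pruning aims for. Ranking by individual accuracy alone ignores diversity among members, so ranking methods in the literature often use other criteria.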