• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

A parallel multi-classifier fusion approach based on selective ensemble

TAO Xiao-ling1,2, KANG Rui-nan3, LIU Li-yan3

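  (1. Guangxi Collaboration Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin, Guangxi 541004;
    2. Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004;
    3. College of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)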
  • Received: 2016-11-16 Revised: 2017-03-30 Published: 2018-05-25 Online: 2018-05-25
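  • Funding:

    National Natural Science Foundation of China (61363006); Natural Science Foundation of Guangxi (2016GXNSFAA380098); Open Project of the Guangxi Collaboration Innovation Center of Cloud Computing and Big Data (YD16803); Graduate Research Innovation Project of Guilin University of Electronic Technology (2016YJCX94)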

Abstract:

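To address the high time overhead and limited accuracy of multi-classifier fusion, this paper proposes PMCF-SE, a parallel multi-classifier fusion approach based on selective ensemble that combines an improved Bagging method with MapReduce. The approach is built on the MapReduce parallel computing framework. In the Map phase, base classifiers with better classification performance are selected; in the Reduce phase, the base classifiers with greater diversity are selected from among them, and the selected base classifiers are then fused using Dempster-Shafer (D-S) evidence theory. Experimental results show that, in terms of execution efficiency, the method runs more efficiently in a cluster environment than in a stand-alone environment; in terms of classification accuracy, PMCF-SE achieves higher accuracy than the Bagging algorithm for every number of base classifiers tested.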
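The Map-phase selection, Reduce-phase selection and D-S fusion summarized above can be illustrated with a small single-process Python sketch. It is a hedged illustration under stated assumptions, not the authors' PMCF-SE implementation: both phases are simulated by ordinary functions rather than run on Hadoop; the base learner (`NearestCentroid`) is hypothetical; the "improved Bagging" step is approximated by plain bootstrap sampling, keeping a model only if its out-of-bag accuracy reaches an assumed threshold `min_acc`; diversity is measured by pairwise disagreement, with greedy selection of up to `k` models; and each classifier's normalised scores become a basic probability assignment (BPA) by discounting them with its accuracy and assigning the remainder to the whole frame of discernment. Dempster's rule itself is standard: for focal elements B and C of two BPAs, m(A) = (1/(1 - K)) Σ_{B∩C=A} m1(B)·m2(C), where K = Σ_{B∩C=∅} m1(B)·m2(C) is the conflict. All function names and parameter values are illustrative.

```python
import math
import random
from collections import defaultdict


class NearestCentroid:
    """Hypothetical base learner; the paper's actual base classifier is not assumed."""
    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            s = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def scores(self, x):
        # Inverse-distance class scores, normalised so they sum to 1.
        inv = {c: 1.0 / (1e-9 + math.dist(x, m)) for c, m in self.centroids.items()}
        total = sum(inv.values())
        return {c: v / total for c, v in inv.items()}

    def predict(self, x):
        s = self.scores(x)
        return max(s, key=s.get)


def map_phase(split, n_models, min_acc, rng):
    """Simulated mapper: bag on one split, keep models with out-of-bag accuracy >= min_acc."""
    kept, n = [], len(split)
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = [split[i] for i in sorted(set(range(n)) - set(idx))]
        if not oob:
            continue
        model = NearestCentroid().fit([split[i][0] for i in idx], [split[i][1] for i in idx])
        acc = sum(model.predict(x) == y for x, y in oob) / len(oob)
        if acc >= min_acc:
            kept.append((acc, model))
    return kept

def disagreement(a, b, X):
    # Fraction of samples on which two classifiers predict different labels.
    return sum(a.predict(x) != b.predict(x) for x in X) / len(X)

def reduce_phase(candidates, X_val, k):
    """Simulated reducer: keep the most accurate model, then greedily add the most diverse."""
    pool = sorted(candidates, key=lambda t: t[0], reverse=True)
    chosen, pool = pool[:1], pool[1:]
    while pool and len(chosen) < k:
        j = max(range(len(pool)),
                key=lambda i: sum(disagreement(pool[i][1], m, X_val) for _, m in chosen))
        chosen.append(pool.pop(j))
    return chosen

def dempster(m1, m2):
    """Dempster's rule of combination; focal elements are frozensets of labels."""
    combined, conflict = defaultdict(float), 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] += ma * mb
            else:
                conflict += ma * mb  # mass K on empty intersections
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def to_bpa(model, acc, x, frame):
    # Assumed BPA: class scores scaled by accuracy, remaining 1 - acc on the whole frame.
    m = {frozenset([c]): acc * p for c, p in model.scores(x).items()}
    m[frame] = m.get(frame, 0.0) + (1.0 - acc)
    return m

def fuse(chosen, x, frame):
    result = to_bpa(chosen[0][1], chosen[0][0], x, frame)
    for acc, model in chosen[1:]:
        result = dempster(result, to_bpa(model, acc, x, frame))
    singletons = {next(iter(s)): v for s, v in result.items() if len(s) == 1}
    return max(singletons, key=singletons.get)

def demo(seed=0):
    rng = random.Random(seed)
    # Synthetic two-class Gaussian data (illustrative only, not the paper's datasets).
    data = [((rng.gauss(2.0 * c, 1.0), rng.gauss(2.0 * c, 1.0)), c)
            for c in (0, 1) for _ in range(200)]
    rng.shuffle(data)
    train, val, test = data[:240], data[240:320], data[320:]
    splits = [train[i::3] for i in range(3)]  # stand-ins for three input splits
    candidates = [t for s in splits for t in map_phase(s, 10, 0.8, rng)]
    chosen = reduce_phase(candidates, [x for x, _ in val], k=5)
    frame = frozenset({0, 1})
    acc = sum(fuse(chosen, x, frame) == y for x, y in test) / len(test)
    return len(candidates), len(chosen), acc
```

`demo()` returns the number of candidate models kept by the simulated mappers, the number selected by the reducer, and the fused accuracy on synthetic two-class data. Discounting by accuracy is a design choice made for this sketch: as long as a classifier's accuracy is below 1, its BPA keeps some mass on the whole frame, so the conflict K stays below 1 and the normalisation in Dempster's rule is defined.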

Key words: multi-classifier fusion, selective ensemble, D-S evidence theory, MapReduce, parallelization
