• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

Process diversity measurement of ensemble learning based on information entropy

ZHOU Gang, GUO Fu-liang

  1. (Naval University of Engineering, Wuhan 430033, China)
  • Received: 2018-10-29 Revised: 2019-01-08 Online: 2019-09-25 Published: 2019-09-25

Abstract:

The diversity of base classifiers is an important factor in improving the accuracy and generalization ability of ensemble learning. In big data environments, traditional diversity measures that rely on post-hoc verification are computationally inefficient. We therefore propose a process diversity measurement method based on information entropy. The method combines the gain of each attribute in a base classifier with the tree level at which that attribute appears to obtain the joint gain of the attribute set, and evaluates diversity by computing the entropy distance between classifiers. Using this entropy distance, the ensemble learning classifier can be constructed dynamically with the K-means method. A comparative study on the watermelon dataset and typical classification datasets shows that, compared with traditional ensemble learning methods, the proposed method achieves similar accuracy with higher computational efficiency.

Key words: ensemble learning, process diversity, joint gain, K-means, diversity measurement
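
The pipeline outlined in the abstract (level-aware attribute gains, a distance between classifiers, K-means grouping) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: each attribute's gain is divided by its tree level (root = 1), the Euclidean distance between the resulting vectors stands in for the entropy distance, and one tree is kept per cluster. The function names, attribute names, and gain values are hypothetical, not taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def gain_vector(tree, attributes):
    """Level-weighted gain per attribute (assumed weighting: gain / level).

    `tree` maps attribute -> (information gain, tree level), root level = 1.
    Attributes the tree never splits on contribute 0.0.
    """
    return [tree[a][0] / tree[a][1] if a in tree else 0.0 for a in attributes]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means, deterministically seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (kept if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_diverse(trees, attributes, k):
    """Keep, for each cluster, the tree closest to that cluster's centroid."""
    points = [gain_vector(t, attributes) for t in trees]
    labels, centroids = kmeans(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members,
                              key=lambda i: dist(points[i], centroids[c])))
    return chosen


# Hypothetical trees: attribute -> (gain, level). Values are made up.
attributes = ["color", "root", "sound", "texture"]
trees = [
    {"texture": (0.38, 1), "root": (0.14, 2)},
    {"texture": (0.36, 1), "root": (0.16, 2)},
    {"color": (0.11, 1), "sound": (0.14, 2)},
    {"color": (0.12, 1), "sound": (0.12, 2)},
]
chosen = select_diverse(trees, attributes, k=2)
print(chosen)  # one index from the texture-rooted pair, one from the color-rooted pair
```

Because the vectors are built from the trees' structure alone, no validation-set predictions are needed to compare classifiers, which is the efficiency argument the abstract makes against post-hoc diversity measures.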