• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

J4, 2008, Vol. 30, Issue (4): 69-72.


A Committee-Based Misclassification Sampling Algorithm for Active Learning

LONG Jun (龙军), YIN Jian-ping (殷建平), ZHU En (祝恩), ZHAO Wen-tao (赵文涛)

  • Publication date: 2008-04-01; Release date: 2010-05-19


Abstract (Chinese version):

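Active learning actively selects the instances to be labeled and thereby effectively reduces the sample complexity of the learning algorithm. Current active learning algorithms generally adopt the strategy of halving the version space. This paper instead proposes a strategy of reducing the version space by more than half, which avoids the relatively strong assumptions that the halving strategy requires. Based on this strategy, we implement a heuristic active learning algorithm, CBMPMS, that selects as training instances the examples most likely to be misclassified. The algorithm uses two sets of class-probability predictions for each instance. One comes from the current learner. The other comes from a committee of hypotheses drawn at random from the version space. The algorithm computes the entropy of the difference between these predictions and uses it as the instance-selection criterion. Experiments on UCI datasets show that the algorithm outperforms related methods on most of the datasets.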

Keywords: active learning; misclassification sampling; version space reduction

Abstract:

By selecting the most informative instances for labeling, active learners can significantly reduce the sample complexity of learning methods. Leading active learning methods sample so as to halve the version space. We present a sampling strategy that instead reduces the volume of the version space by more than half. Based on this strategy, we propose an active learning method called CBMPMS (Committee-Based Most Possible Misclassification Sampling), which samples the instances most likely to be misclassified by the current classifier. The method first obtains two class-probability distributions for each instance: one predicted by the current classifier and one predicted by a committee drawn at random from the current version space. It then computes the entropy of the difference between them and uses this entropy as the sampling criterion. Experiments on UCI datasets show that the proposed method outperforms traditional sampling methods on most of the selected datasets.

Key words: active learning, misclassification sampling, version space reduction
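The abstract describes the selection criterion only informally. It is an entropy computed from the difference between the class probabilities of the current learner and those of a committee drawn from the version space. Because the exact formula is not given here, the Python sketch below substitutes a simpler proxy for "most likely to be misclassified". The proxy is plainly not the paper's entropy criterion. It scores an instance by the committee's averaged probability that the true label is not the learner's predicted label, 1 - P_C(y_hat | x).

All function names (`committee_average`, `misclassification_score`, `select_query`) are hypothetical. The committee members' probability vectors are assumed to be given, so the step of drawing hypotheses from the version space is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical pool entry: (committee members' class-probability vectors,
#                           current learner's class-probability vector)
PoolEntry = Tuple[Sequence[Sequence[float]], Sequence[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def committee_average(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of all committee members."""
    k = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / k for c in range(n_classes)]


def misclassification_score(member_probs: Sequence[Sequence[float]],
                            learner_probs: Sequence[float]) -> float:
    """Proxy score (an assumption, not the paper's entropy criterion):
    the committee's estimated probability that the true label differs
    from the learner's predicted label, i.e. 1 - P_C(y_hat | x)."""
    y_hat = argmax(learner_probs)
    p_committee = committee_average(member_probs)
    return 1.0 - p_committee[y_hat]


def select_query(pool: Sequence[PoolEntry]) -> int:
    """Return the index of the pool instance with the highest proxy score."""
    scores = [misclassification_score(members, learner) for members, learner in pool]
    return argmax(scores)


pool = [
    # instance 0: committee and learner both favour class 0
    ([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]], [0.9, 0.1]),
    # instance 1: learner predicts class 0, committee leans towards class 1
    ([[0.3, 0.7], [0.2, 0.8], [0.4, 0.6]], [0.6, 0.4]),
    # instance 2: committee is split evenly
    ([[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]], [0.55, 0.45]),
]

print(select_query(pool))  # → 1
```

Instance 1 is chosen because the learner predicts class 0, while the committee places an average of only 0.3 probability on that class. Swapping `misclassification_score` for the paper's entropy-based criterion would leave `select_query` unchanged.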