• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

Comparative study of big data active learning
based on MapReduce and Spark

ZHAI Jun-hai (1,2), QI Jia-xing (1,2), SHEN Chu (1,2), SONG Dan-dan (1,2), WANG Mo-han (1,2), TIAN Shi (1,2)

  1. Hebei Key Laboratory of Machine Learning and Computational Intelligence, Baoding 071002, China
  2. College of Mathematics and Information Science, Hebei University, Baoding 071002, China
  • Received: 2019-04-20  Revised: 2019-06-18  Online: 2019-10-25  Published: 2019-10-25

Abstract:

In our previous work, we proposed a big data active learning algorithm based on MapReduce. In this paper, we port that algorithm to the Spark environment, yielding a Spark-based big data active learning algorithm. We then compare the two algorithms experimentally on four aspects: running time, number of files, number of synchronizations, and memory cost. The comparison yields several conclusions that may be helpful to researchers in related fields.

Key words: big data, machine learning, active learning, instance selection, open source framework