• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 论文 • 上一篇    下一篇

基于Spark的并行遗传算法求解多峰函数极值

刘鹏1,2,叶帅3,孟磊1,2,王灿4   

  1. (1.中国矿业大学物联网(感知矿山)研究中心,江苏 徐州 221008;
    2.矿山互联网应用技术国家地方联合工程实验室,江苏 徐州 221008;
    3.中国矿业大学信息与控制工程学院,江苏 徐州 221116;4.华东计算技术研究所航天产品部,上海 201808)
  • 收稿日期:2017-09-02 修回日期:2017-11-05 出版日期:2018-02-25 发布日期:2018-02-25
  • 基金资助:

    国家自然科学基金(61471361,41302203)

A Spark based parallel genetic algorithm
solving multimodal function extremums
 

LIU Peng1,2,YE Shuai3,MENG Lei1,2,WANG Can4   

  1. (1.Internet of Things Perception Mine Research Center,China University of Mining and Technology,Xuzhou 221008;
    2.National and Local Joint Engineering Laboratory of Internet Application Technology on Mine,Xuzhou 221008;
    3.School of Information and Control Engineering,China University of Mining and Technology,Xuzhou 221116;
    4.Aerospace Products Division,East China Institute of Computing Technology,Shanghai 201808,China)
  • Received:2017-09-02 Revised:2017-11-05 Online:2018-02-25 Published:2018-02-25

摘要:

遗传算法求解多峰函数极值需进行反复多次的迭代运算,面对大数据样本时会出现运算效率过低的现象,这极大地限制了遗传算法的实际应用。经典Hadoop并行平台可在一定程度上提高遗传算法的运行效率,而新一代Spark并行平台可以更加充分地发挥遗传算法的并行潜能。设计并实现了基于Spark的并行遗传算法,在各个子节点上并行执行子种群个体的交叉、变异等操作,达到了高度并行化进化种群以高效求取多峰函数极值的目的。为方便比较,同时设计并实现了单机及Hadoop平台下的相应算法。实验结果表明,处理大数据样本时,相比传统单机和Hadoop平台,基于Spark的并行化遗传算法显著降低了求解多峰函数极值的耗时,大幅提高了算法的效率;同时,由于其并行计算带来的强大随机性,也有效避免了种群单一过早收敛的问题,提高了算法的准确性。

关键词: 遗传算法, 多峰函数, 极值, 并行计算, Spark, Hadoop

Abstract:

The Genetic Algorithm (GA) needs many computation iterations in solving multimodal function extremums, so its running efficiency is too low when dealing with large-scale data, which greatly limits its practical application. The classical parallel platform Hadoop can improve the GA running efficiency to some extent, while the state-of-the-art parallel platform Spark can release much more parallelism of GA by realizing parallel crossover, mutation and other operations on each computing node. For the convenience of comparison, the GA solving multimodal function extremums are designed and implemented on single node, Hadoop and Spark, respectively. Experimental results show that, compared with single node platform and Hadoop platform, the Spark based implementation not only significantly reduces the running time but also effectively avoids the problem of premature convergence because of its powerful randomness, while dealing with large-scale samples.

Key words: genetic algorithm, multimodal function, extremum, parallel computing, Spark, Hadoop