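• Journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science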


Distributed SVM parameter optimization based on Hadoop

WU Yun-wei, NING Qian

  1. (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
  • Received: 2016-01-30 Revised: 2016-06-07 Online: 2017-06-25 Published: 2017-06-25
  • Funding:

    National Basic Research Program of China (973 Program) (2013CB328903-2)

Abstract:

The choice of parameters directly affects an algorithm's classification and prediction accuracy. Among parameter selection methods, global grid search is simple, computationally reliable, and yields a clear optimization effect. This makes it well suited to engineering computations with high reliability requirements, such as tuning the parameters of a fault pattern recognition algorithm for fault diagnosis in complex systems. However, global grid search takes a long time to complete, which still limits its use, especially in systems with strict real-time requirements. Taking the global parameter optimization of the support vector machine (SVM) as an example, this paper addresses the long running time of grid search by performing distributed parameter optimization on the Hadoop platform. HDFS automatically distributes the parameters across the computing nodes, and a distributed parameter optimization model built on the MapReduce computing framework carries out model training, prediction, and parameter optimization. Experimental results show that the approach improves optimization efficiency without degrading algorithm performance.

Key words: Hadoop, MapReduce, support vector machine, grid search, parameter optimization, distributed computing
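The abstract does not describe the model's internals. Below is a minimal, single-machine Python sketch of the general pattern it names: map tasks score disjoint slices of a (C, gamma) grid, and a reduce step keeps the best pair. It is not the authors' implementation, and every function name is hypothetical. Several details are assumptions. The grid uses powers of two, as in common LIBSVM-style grids. A round-robin split of the grid stands in for the input splits Hadoop would create from a parameter file stored in HDFS. 3-fold cross-validation accuracy is the objective. A bias-free RBF-kernel SVM trained by dual coordinate descent replaces a full SMO solver such as LIBSVM's.

```python
import math
from collections import defaultdict
from itertools import product


def rbf(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_svm(xs, ys, c, gamma, epochs=10):
    """Bias-free kernel SVM (no intercept) fitted by dual coordinate descent.

    A simplification chosen for brevity; not the solver used in the paper.
    """
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Dual gradient w.r.t. alpha_i: y_i * f(x_i) - 1.
            f_i = sum(alpha[j] * ys[j] * k[i][j] for j in range(n))
            g = ys[i] * f_i - 1.0
            # Newton step on coordinate i, clipped to the box [0, C] (K_ii = 1 for RBF).
            alpha[i] = min(max(alpha[i] - g / k[i][i], 0.0), c)
    return alpha


def predict(model_xs, model_ys, alpha, gamma, x):
    s = sum(a * y * rbf(xm, x, gamma)
            for a, y, xm in zip(alpha, model_ys, model_xs) if a > 0)
    return 1 if s >= 0 else -1


def cv_accuracy(xs, ys, c, gamma, folds=3):
    """k-fold cross-validation accuracy for one (C, gamma) pair."""
    correct = 0
    for f in range(folds):
        tr = [i for i in range(len(xs)) if i % folds != f]
        te = [i for i in range(len(xs)) if i % folds == f]
        txs, tys = [xs[i] for i in tr], [ys[i] for i in tr]
        alpha = train_svm(txs, tys, c, gamma)
        correct += sum(predict(txs, tys, alpha, gamma, xs[i]) == ys[i] for i in te)
    return correct / len(xs)


def map_task(split, xs, ys):
    # One mapper: score every (C, gamma) pair in its split. All values share
    # the key "best" so that a single reducer sees every result.
    for c, gamma in split:
        yield "best", (cv_accuracy(xs, ys, c, gamma), c, gamma)


def reduce_task(values):
    # Highest CV accuracy wins; among ties, prefer the smaller C.
    return max(values, key=lambda v: (v[0], -v[1]))


def distributed_grid_search(xs, ys, c_exps, g_exps, n_splits=4):
    grid = [(2.0 ** ce, 2.0 ** ge) for ce, ge in product(c_exps, g_exps)]
    # Round-robin slices stand in for Hadoop input splits (assumption).
    splits = [grid[i::n_splits] for i in range(n_splits)]
    shuffled = defaultdict(list)
    for split in splits:  # on a cluster each split would run on its own node
        for key, value in map_task(split, xs, ys):
            shuffled[key].append(value)  # emulates the shuffle: group by key
    return {key: reduce_task(vals) for key, vals in shuffled.items()}["best"]


# Usage on a hypothetical toy data set: points inside vs. outside a circle.
pts = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]
labels = [1 if x * x + y * y <= 4 else -1 for x, y in pts]
acc, best_c, best_gamma = distributed_grid_search(pts, labels, [-1, 1, 3, 5], [-3, -1, 1])
print(f"best C={best_c}, gamma={best_gamma}, CV accuracy={acc:.3f}")
```

The grid points are independent of one another, so the map phase parallelizes without coordination. Only the small (accuracy, C, gamma) tuples cross the shuffle, which is why a single reducer is enough in this sketch.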