• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

Design and implementation of an optimized job scheduling model for the high performance computing environment

王小宁,肖海力,曹荣强   

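  1. (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2016-12-09 Revised: 2017-02-20 Online: 2017-04-25 Published: 2017-04-25
  • Funding:

    National Key Research and Development Program of China (2016YFB0201404); 12th Five-Year Plan 863 Program major project (2014AA01A302)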

Design and implementation of an optimal job scheduling model in the high performance computing environment

WANG Xiao-ning, XIAO Hai-li, CAO Rong-qiang

  1. (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2016-12-09 Revised: 2017-02-20 Online: 2017-04-25 Published: 2017-04-25

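Abstract (Chinese version, translated):

The high performance computing environment aggregates multiple high performance computing resources distributed across different regions and organizations, offers users a unified access point and usage model, and relies on system middleware to match each user job request to a suitable high performance computing resource. As the environment's application programming interface has been opened and the number of job requests has grown sharply, the immediate scheduling model currently in use fails to process a certain number of requests under highly concurrent job submission, for network and other reasons, and it also lacks flexibility. To address this problem, the environment's job scheduling model is optimized: an environment job queue is introduced, the system-level job states are refined, and the job scheduling policy is made more configurable. A system prototype is implemented on the environment middleware SCE. In tests with a single core service handling nearly 200 job submission requests per minute, no job submission errors caused by the system or the network occurred; of 1,000 jobs in total, nearly 500 job submission commands completed within 0.3 s and more than 800 completed within 0.5 s.

Key words: China National Grid, high performance computing environment, grid computing, cloud service, job scheduling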

Abstract:

The high performance computing environment is a computing platform that aggregates multiple distributed high performance computers from different organizations and provides users with unified access and usage patterns. The system middleware matches appropriate high performance computing resources to users' job requests. With the opening of the environment's application programming interface (API) and the substantial increase in the number of job submission requests, some requests fail under highly concurrent job submission because of too many network connections. The job scheduling strategy also lacks flexibility. We propose an optimized job scheduling model for the high performance computing environment, which introduces environment job queues, refines the system-level status of each job, and makes the job scheduling strategy more configurable. We also implement a prototype system based on the middleware SCE. Test results show that no job request fails under a workload of nearly 200 job requests per minute in a single system service. Of 1000 jobs in total, nearly 500 job submissions complete within 0.3 seconds, and more than 800 complete within 0.5 seconds.

Key words: CNGrid, high performance computing environment, grid computing, cloud service, job scheduling
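
The queue-based model summarized in the abstract can be illustrated with a minimal Python sketch. It is not the SCE implementation, which the abstract does not describe in detail: the names `EnvironmentQueue`, `Job`, `JobState`, the two states, the policy names `fifo` and `smallest_first`, and the `send` callback are all hypothetical. The sketch only shows the general idea of accepting a submission into an environment-level queue right away, deferring the network call to the remote cluster, and selecting the dispatch order through a configurable policy.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class JobState(Enum):
    """Hypothetical system-level states; the abstract does not list the real ones."""
    QUEUED = "queued"          # accepted by the environment, not yet sent to a cluster
    DISPATCHED = "dispatched"  # handed over to a remote cluster


@dataclass
class Job:
    job_id: int
    user: str
    cores: int
    state: JobState = JobState.QUEUED


# A policy maps a job to a sort key; jobs with smaller keys are dispatched first.
Policy = Callable[[Job], tuple]

# Hypothetical policy table, standing in for a configuration file entry.
POLICIES: dict[str, Policy] = {
    "fifo": lambda job: (),                      # arrival order only
    "smallest_first": lambda job: (job.cores,),  # fewer cores first
}


class EnvironmentQueue:
    """Environment-level job queue with a configurable dispatch policy (sketch)."""

    def __init__(self, policy_name: str = "fifo") -> None:
        self._policy = POLICIES[policy_name]
        self._heap: list[tuple[tuple, int, Job]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def __len__(self) -> int:
        return len(self._heap)

    def submit(self, job: Job) -> int:
        """Accept the job immediately, without contacting any cluster."""
        job.state = JobState.QUEUED
        heapq.heappush(self._heap, (self._policy(job), next(self._seq), job))
        return job.job_id

    def dispatch_next(self, send: Callable[[Job], bool]) -> Job | None:
        """Try to send the next job; on failure it is re-queued instead of lost."""
        if not self._heap:
            return None
        _, _, job = heapq.heappop(self._heap)
        if send(job):
            job.state = JobState.DISPATCHED
        else:
            # Assumed behaviour: a failed network call leaves the job queued.
            heapq.heappush(self._heap, (self._policy(job), next(self._seq), job))
        return job


if __name__ == "__main__":
    queue = EnvironmentQueue("smallest_first")
    for job in (Job(1, "alice", 64), Job(2, "bob", 8), Job(3, "carol", 8)):
        queue.submit(job)
    while len(queue):
        print(queue.dispatch_next(lambda j: True))
```

Because `submit` only pushes onto an in-memory heap, its latency does not depend on the state of the network link to any cluster, which is the property the reported submission times point to. Swapping the policy name changes the dispatch order without touching the queue code.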