• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

Reliability-aware task scheduling algorithm in cloud computing environments

QI Ping1,2, WANG Fucheng1, WANG Biqing1, LIANG Changyong2

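  (1. Department of Mathematics and Computer Science, Tongling University, Tongling 244000, Anhui; 2. School of Management, Hefei University of Technology, Hefei 230039, Anhui, China)
  • Received: 2018-04-11  Revised: 2018-06-20  Online: 2018-11-25  Published: 2018-11-25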
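  • Funding:

    Key Program of the National Natural Science Foundation of China (71331002); Anhui Provincial Domestic and Overseas Visiting Study Program for Outstanding Young Backbone Talents in Universities (gxfx2017113); Tongling University Talent Research Start-up Fund (2015tlxyrc08)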

A reliability-aware task scheduling algorithm for cloud computing

QI Ping1,2,WANG Fucheng1,WANG Biqing1,LIANG Changyong2   

  (1. Department of Mathematics and Computer Science, Tongling University, Tongling 244000;
    2. School of Management, Hefei University of Technology, Hefei 230039, China)
  • Received: 2018-04-11  Revised: 2018-06-20  Online: 2018-11-25  Published: 2018-11-25

Abstract:

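In cloud computing environments, parallel tasks are vulnerable to resource failures that can prevent them from completing, and dynamically provisioned cloud resources have relatively low reliability. To address these problems, we first introduce a failure recovery mechanism. Because the failure regularity of resources changes dynamically when failures are recoverable, a two-parameter Weibull distribution is used to describe the local characteristics of the failure regularity of resource nodes and communication links in different time periods. Then, based on an analysis of the various interactions among parallel tasks, we propose a resource reliability evaluation model based on variable-parameter failure rules. Finally, the model is incorporated into the particle swarm optimization (PSO) algorithm to obtain RPSO, a reliability-aware, adaptive inertia weight PSO resource scheduling algorithm, so that the reliability of candidate resources is fully taken into account when computing fitness. Simulation results show that, when appropriate failure recovery parameters are chosen, the proposed RPSO algorithm can greatly improve the reliability of cloud services while adding only a small amount of extra failure recovery overhead.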
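The abstract names the two-parameter Weibull distribution, whose reliability function is R(t) = exp(-(t/η)^β) with shape β and scale η, but gives no formulas for how the parameters vary by period or how node and link reliabilities are combined. The following is a minimal sketch under hypothetical assumptions, not the authors' model: each node or link carries a table of per-period (β, η) pairs, "age" counts time since the component's last recovery, and a task succeeds only if both its node and its link survive (independent components in series). The names `WeibullPeriod`, `window_reliability` and `task_reliability` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class WeibullPeriod:
    """Hypothetical failure parameters valid for one time period of a node or link."""
    start: float  # period start time (inclusive)
    end: float    # period end time (exclusive)
    shape: float  # Weibull shape beta > 0 (beta < 1: early failures, beta > 1: wear-out)
    scale: float  # Weibull scale eta > 0 (characteristic life)


def cumulative_hazard(t: float, shape: float, scale: float) -> float:
    """H(t) = (t / scale) ** shape for the two-parameter Weibull distribution."""
    return (t / scale) ** shape if t > 0 else 0.0


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-H(t)): probability of no failure in [0, t]."""
    return math.exp(-cumulative_hazard(t, shape, scale))


def params_at(periods: list[WeibullPeriod], t: float) -> tuple[float, float]:
    """Select the (shape, scale) pair of the period that contains time t."""
    for p in periods:
        if p.start <= t < p.end:
            return p.shape, p.scale
    raise ValueError(f"no failure parameters defined for time {t}")


def window_reliability(periods, age, start_time, duration):
    """Conditional survival R(age + duration) / R(age), computed as
    exp(H(age) - H(age + duration)) to avoid dividing by a tiny R(age).
    Parameters come from the period containing start_time (an assumption)."""
    shape, scale = params_at(periods, start_time)
    return math.exp(cumulative_hazard(age, shape, scale)
                    - cumulative_hazard(age + duration, shape, scale))


def task_reliability(node_periods, link_periods, node_age, link_age,
                     start_time, exec_time, transfer_time):
    """Series assumption: the node must survive the execution and the link
    must survive the data transfer; failures are treated as independent."""
    r_node = window_reliability(node_periods, node_age, start_time, exec_time)
    r_link = window_reliability(link_periods, link_age, start_time, transfer_time)
    return r_node * r_link
```

With β = 1 the Weibull reduces to the exponential distribution and the conditional reliability no longer depends on age; with β > 1 an older (longer-unrecovered) component is less likely to survive the same window, which is one way a recovery mechanism that resets age can raise reliability.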

Key words: cloud computing, failure regularity, failure recovery mechanism, particle swarm optimization, resource scheduling

Abstract:

Parallel tasks in cloud computing environments are vulnerable to resource failures that can prevent them from completing, and dynamically provisioned cloud resources have low reliability. To address these problems, we first introduce a failure recovery mechanism. Because the failure regularity of resources changes dynamically when failures are recoverable, the two-parameter Weibull distribution is used to describe the local characteristics of the failure regularity of resource nodes and communication links in different time periods. Then, based on an analysis of the various interactions among parallel tasks, we propose a resource reliability evaluation model based on variable-parameter failure regularity. Finally, the model is incorporated into the particle swarm optimization (PSO) algorithm to obtain the reliability-aware, adaptive inertia weight PSO resource scheduling algorithm (RPSO), so that the reliability of candidate resources is fully considered when calculating fitness. Simulation results show that, when appropriate failure recovery parameters are selected, the proposed RPSO algorithm can substantially increase the reliability of cloud services while adding only a small amount of extra failure recovery overhead.
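The abstract does not specify RPSO's particle encoding, fitness function, or inertia-weight rule, so the following is only a minimal sketch of how reliability can enter a PSO fitness, under hypothetical assumptions: a particle is a continuous vector decoded into a task-to-VM assignment, each VM runs its tasks sequentially and fails according to its own Weibull (shape, scale) pair, and fitness is makespan divided by schedule reliability, so unreliable resources are penalised. The adaptive inertia weight follows one common fitness-based rule (particles better than the swarm average get a smaller weight); it is not necessarily the authors' rule. All function names and default parameters are illustrative.

```python
import math
import random


def decode(position, n_vms):
    """Map a continuous particle position to a task -> VM assignment."""
    return [int(abs(x)) % n_vms for x in position]


def evaluate(assign, lengths, speeds, weibull):
    """Return (makespan, reliability). Each VM runs its tasks sequentially;
    schedule reliability is the product of per-VM Weibull survival over
    each VM's busy time (independent VMs in series, an assumption)."""
    busy = [0.0] * len(speeds)
    for task, vm in enumerate(assign):
        busy[vm] += lengths[task] / speeds[vm]
    reliability = 1.0
    for vm, t in enumerate(busy):
        shape, scale = weibull[vm]
        reliability *= math.exp(-((t / scale) ** shape))
    return max(busy), reliability


def fitness(position, lengths, speeds, weibull):
    """Lower is better: slow or unreliable schedules both score worse."""
    makespan, rel = evaluate(decode(position, len(speeds)), lengths, speeds, weibull)
    return makespan / max(rel, 1e-12)


def rpso_sketch(lengths, speeds, weibull, n_particles=20, iters=100,
                w_min=0.4, w_max=0.9, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    n, m = len(lengths), len(speeds)
    v_max = m / 2  # velocity clamp keeps moves within a sensible range
    pos = [[rng.uniform(0, m) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    fit = [fitness(p, lengths, speeds, weibull) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        f_min, f_avg = min(fit), sum(fit) / len(fit)
        for i in range(n_particles):
            # Adaptive inertia weight: better-than-average particles exploit
            # (smaller w), the rest explore with w_max.
            if fit[i] <= f_avg and f_avg > f_min:
                w = w_min + (w_max - w_min) * (fit[i] - f_min) / (f_avg - f_min)
            else:
                w = w_max
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Clamp positions to [0, m) so decoding always yields a valid VM.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), m - 1e-9)
            fit[i] = fitness(pos[i], lengths, speeds, weibull)
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return decode(gbest, m), gbest_fit
```

For example, `rpso_sketch([400, 250, 900, 120, 600], [100, 200, 150], [(1.5, 50.0), (1.2, 30.0), (2.0, 80.0)])` returns an assignment of five tasks to three hypothetical VMs together with its fitness. Dividing makespan by reliability is just one way to combine the two objectives; a weighted sum or a reliability constraint would be equally plausible readings of "fully considering reliability when computing fitness".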
 

Key words: cloud computing, failure regularity, failure recovery mechanism, particle swarm optimization, resource scheduling