• Publication of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (3): 422-433.

• High Performance Computing •

  • Funding:
    National Natural Science Foundation of China (62371032); Beijing Natural Science Foundation (4232021); Science and Technology Program of the Ministry of Housing and Urban-Rural Development (Research and Development Project) (2019-K-149); Beijing University of Civil Engineering and Architecture Senior Lecturer Cultivation Program (GJZJ20220803)

Reinforcement learning control for data center refrigeration systems

WEI Dong1,2, JIA Yuchen1, HAN Shaoran3

  (1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;
    2. Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;
    3. Beijing Jingcheng Ruida Electric Engineering Technology Co., Ltd., Beijing 100176, China)
  • Received:2023-07-13 Revised:2024-03-20 Online:2025-03-25 Published:2025-04-01



Abstract: The refrigeration system in data centers needs to operate continuously throughout the year, and its energy consumption cannot be ignored. Moreover, traditional PID control methods struggle to achieve overall energy savings for the system. To address this, a reinforcement learning control method is proposed for data center refrigeration systems, with the control objective of enhancing the overall energy efficiency of the system while meeting cooling requirements. A two-layer hierarchical control structure is designed. The upper optimization layer introduces the multistep prediction-deep deterministic policy gradient (MP-DDPG) algorithm, which leverages DDPG to handle the multi-dimensional continuous action space of the refrigeration system in order to determine the water valve opening of the air handling unit and the optimal setpoints for each loop in the chilling station system. Multistep prediction is employed to enhance algorithm efficiency and to overcome the impact of large system time delay during real-time control. The lower field control layer uses PID control to make the controlled variables track the optimal setpoints obtained from the optimization layer, achieving performance optimization without disrupting the existing field control system. To address the difficulty that model-free reinforcement learning control cannot meet real-time control requirements, a system prediction model is first constructed, and the reinforcement learning controller is trained offline through interaction with this model; online real-time control is then implemented. Experimental results show that, compared with the traditional DDPG algorithm, the learning efficiency of the controller is improved by 50%. Compared with PID and MP-DQN (multistep prediction-deep Q network), the system's dynamic performance is improved, and the overall energy efficiency is increased by approximately 30.149% and 11.6%, respectively.
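The two-layer structure described in the abstract can be sketched minimally: an upper optimization layer (MP-DDPG in the paper) supplies setpoints, and a lower field-control PID loop drives the plant to track them. The first-order plant model, PID gains, and fixed setpoint below are illustrative assumptions for the sketch, not the paper's actual refrigeration system or controller.

```python
# Minimal sketch of the hierarchical scheme: an upper layer outputs a
# setpoint (e.g., a chilled-water temperature); a lower-layer PID tracks it.
# Plant dynamics and gains here are assumed for illustration only.

class PID:
    """Discrete PID controller for the lower field-control layer."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv


def simulate(setpoint, steps=500, dt=0.1, tau=5.0):
    """Track a setpoint on an assumed first-order plant dT/dt = (u - T)/tau."""
    pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=dt)
    temp = 0.0
    for _ in range(steps):
        u = pid.step(setpoint, temp)        # field-control action
        temp += (u - temp) / tau * dt       # plant response
    return temp


if __name__ == "__main__":
    # The upper layer would compute this setpoint; here it is fixed at 7.0.
    print(simulate(7.0))
```

In the paper's scheme, the fixed setpoint above would instead be updated online by the MP-DDPG optimization layer, so the existing PID loops need not be modified.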

Key words: data center refrigeration system, predictive control, reinforcement learning, deep deterministic policy gradient, ensemble learning