• A publication of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1539-1546.

• High Performance Computing •

  • Supported by: Beijing Natural Science Foundation (4212017)

An urgent memory reuse algorithm for computational graphs

CAO Bo-jun, QIAN Ru-yi, XU Yuan-chao

  1. (College of Information Engineering, Capital Normal University, Beijing 100048, China)
  • Received: 2024-02-05 Revised: 2024-04-05 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-19


Abstract: Limited device memory capacity restricts the further scaling of deep neural network models, and memory reuse is one of the few techniques that saves memory without introducing additional overhead. Intermediate tensors in the computational graph occupy most of the memory space and are the primary optimization target of memory reuse algorithms. Existing memory reuse algorithms, typified by the large-tensor-first and short-lifetime-first algorithms, start from a single characteristic: they consider only whether tensor lifetimes overlap and ignore the relative positions of the lifetimes of adjacent tensors, so the more complex the computational graph, the less fully they exploit memory reuse. To address this problem, a new memory reuse algorithm, UMR, is proposed. By analyzing in depth the relative positions of the lifetimes of adjacent tensors in the graph and reusing memory promptly, UMR uncovers more opportunities for memory reuse. Evaluation on real inference models from MLPerf shows that UMR's memory reuse rate is no lower than that of existing mainstream algorithms and reaches the theoretical optimum of memory reuse for these models. Evaluation on relatively complex computational graphs shows that, compared with the large-tensor-first and short-lifetime-first algorithms, UMR reduces memory usage by up to 21.6% and 18.7%, and by 6.5% and 13.2% on average, respectively.
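To make the two baselines the abstract contrasts UMR against concrete, the following minimal sketch (hypothetical names and data; not the authors' code) implements greedy lifetime-based buffer sharing: a tensor may reuse a buffer only if its lifetime overlaps no tensor already placed there, and the two baselines differ only in the order in which tensors are considered (largest size first vs. shortest lifetime first).

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    start: int   # index of the operator that produces the tensor
    end: int     # index of the last operator that consumes it
    size: int    # memory footprint, e.g. in bytes

def overlaps(a: Tensor, b: Tensor) -> bool:
    """Two lifetimes overlap if neither interval ends before the other starts."""
    return a.start <= b.end and b.start <= a.end

def greedy_reuse(tensors, order_key) -> int:
    """Greedily place tensors into shared buffers and return peak memory.
    Each buffer is a dict: its capacity and the tensors placed into it."""
    buffers = []
    for t in sorted(tensors, key=order_key):
        for buf in buffers:
            # Reuse is allowed only against tensors with disjoint lifetimes.
            if all(not overlaps(t, u) for u in buf["tensors"]):
                buf["tensors"].append(t)
                buf["size"] = max(buf["size"], t.size)
                break
        else:
            buffers.append({"size": t.size, "tensors": [t]})
    return sum(b["size"] for b in buffers)

# Illustrative intermediate tensors of a small computational graph
ts = [
    Tensor("a", 0, 2, 100),
    Tensor("b", 1, 3, 80),
    Tensor("c", 3, 5, 100),
    Tensor("d", 4, 6, 60),
]

# Large-tensor-first baseline: place the biggest tensors first
peak_large = greedy_reuse(ts, order_key=lambda t: -t.size)
# Short-lifetime-first baseline: place the shortest-lived tensors first
peak_short = greedy_reuse(ts, order_key=lambda t: t.end - t.start)
```

Both baselines test only pairwise lifetime overlap; the relative position of adjacent lifetimes (e.g. that `a` ends just before `c` begins) is what, per the abstract, UMR additionally exploits to reuse memory promptly.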

Key words: computational graph, memory optimization, memory reuse, memory utilization