
Computer Engineering & Science, 2025, Vol. 47, Issue 12: 2129-2138.

• High Performance Computing •

OpenLM: A multi-platform and high-performance large language model inference framework

LIU Gao, XU Jianliang, ZHANG Xianyi, LIU Xiandong

  (1. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China;
   2. Peng Feng (Beijing) Technology Co., Ltd., Beijing 100080, China)
  • Received: 2025-02-20  Revised: 2025-03-04  Online: 2025-12-25  Published: 2026-01-06

Abstract: As computing devices continue to diversify and computational power grows rapidly, the increasing number of large language models (LLMs) has made efficient multi-model inference across heterogeneous platforms a complex and formidable challenge. To address this, we propose OpenLM, a high-performance inference framework that supports efficient deployment of multiple LLMs on diverse hardware platforms. OpenLM offers broad model compatibility, providing efficient performance support for a wide range of models, and incorporates high-performance computing operators optimized for multiple platforms and architectures to fully exploit the underlying hardware. Its flexible architecture also facilitates rapid integration of the latest models. To further optimize memory consumption (both GPU and CPU), task scheduling, and system stability during inference, the framework introduces PagedAttention, dynamic batching, weight quantization, and KV cache quantization. Experimental results show that these optimization strategies effectively improve inference efficiency, reduce resource overhead, and enhance overall framework performance.
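The abstract names PagedAttention and KV cache quantization among OpenLM's memory optimizations. As a rough illustration of how these two ideas combine, the sketch below (plain NumPy; BlockAllocator, PagedKVCache, the block sizes, and the int8 scheme are all assumptions, not OpenLM's actual API) maps each sequence to a table of fixed-size cache blocks and stores key vectors in quantized form, so cache memory grows block by block instead of being reserved for the maximum sequence length up front.

```python
import numpy as np

BLOCK_SIZE = 16      # tokens stored per cache block (illustrative)
NUM_BLOCKS = 256     # total blocks in the cache pool (illustrative)
HEAD_DIM = 64        # per-head hidden size (illustrative)

class BlockAllocator:
    """Hands out fixed-size KV-cache blocks on demand, so a growing
    sequence never needs one large contiguous allocation."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise MemoryError("KV cache pool exhausted")
        return self.free.pop()

    def release(self, block_id):
        self.free.append(block_id)

def quantize_int8(x):
    """Symmetric per-vector int8 quantization; returns codes and scale."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-8)
    return np.round(x / scale).astype(np.int8), scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

class PagedKVCache:
    """Maps each sequence to a list of block IDs (its block table) and
    stores quantized key vectors inside those blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.keys = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.int8)
        self.scales = np.ones((NUM_BLOCKS, BLOCK_SIZE), dtype=np.float32)
        self.block_table = {}  # seq_id -> [block_id, ...]

    def append(self, seq_id, pos, key_vec):
        blocks = self.block_table.setdefault(seq_id, [])
        block_idx, offset = divmod(pos, BLOCK_SIZE)
        while len(blocks) <= block_idx:      # crossed a block boundary
            blocks.append(self.allocator.allocate())
        bid = blocks[block_idx]
        self.keys[bid, offset], self.scales[bid, offset] = quantize_int8(key_vec)

    def read(self, seq_id, pos):
        bid = self.block_table[seq_id][pos // BLOCK_SIZE]
        off = pos % BLOCK_SIZE
        return dequantize_int8(self.keys[bid, off], self.scales[bid, off])

# Usage: 40 generated tokens for one sequence occupy 3 blocks of 16 slots.
cache = PagedKVCache(BlockAllocator(NUM_BLOCKS))
for t in range(40):
    cache.append(seq_id=0, pos=t, key_vec=np.random.randn(HEAD_DIM))
print(cache.block_table[0])     # three block IDs drawn from the pool
print(cache.read(0, 39).shape)  # (64,)
```

Production frameworks keep these blocks in device memory and fuse the dequantization into the attention kernel itself; the NumPy version above only shows the block-table bookkeeping that the abstract's memory optimizations rely on.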


Key words: deep learning, large language model (LLM), high-performance computing (HPC), LLM inference framework