Multicore NPU based TCP large receive offload

J4 ›› 2016, Vol. 38 ›› Issue (07): 1344-1349.

LI Jie (李杰), CHEN Shuhui (陈曙晖)

(College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China)

Received: 2015-06-25  Revised: 2015-08-11  Online: 2016-07-25  Published: 2016-07-25
Funding:
National Natural Science Foundation of China (61379148)

Abstract:

Ethernet technology is currently developing much faster than memory and CPU technology, so memory access and CPU processing of the network protocol stack have become the bottleneck of TCP performance on end systems. Constantly increasing network bandwidth places a heavy burden on the CPU: approximately 1 GHz of CPU processing resource is needed for the protocol processing of 1 Gbps of network traffic. We therefore use a multicore NPU as the NIC and offload to it the checksum computation and out-of-order packet reassembly functions of the TCP receive data path. The multicore NPU aggregates small TCP packets into fewer but larger packets, which are passed to the protocol stack through the Linux NIC driver. This reduces both the number of packets processed by the network stack and the number of interrupts generated by the NIC, and thereby improves TCP performance on end systems. In experiments on a 10 Gbps Ethernet network, a TCP receive data throughput of 4.9 Gbps is achieved.

Key words: TCP packet reordering; TCP data receive offload; LRO; TOE; multicore NPU
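
The receive-path functions named in the abstract (checksum computation, out-of-order reassembly, and aggregation of small packets into larger ones) can be illustrated with a minimal Python sketch. This is not the authors' NPU implementation. The names `internet_checksum`, `Segment`, and `coalesce`, the dictionary used as a reorder buffer, and the `max_len` limit are all assumptions made for illustration. The checksum shows only the 16-bit one's-complement sum, without the TCP pseudo-header, and the merge step assumes a single flow with no overlapping or retransmitted segments.

```python
from dataclasses import dataclass


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    payload: bytes


def coalesce(segments, next_seq, max_len=65535):
    """Merge the segments of ONE flow into larger in-order payloads.

    Returns (merged payloads, segments still waiting for a gap to be
    filled, updated next expected sequence number).
    """
    pending = {s.seq: s for s in segments}  # hypothetical reorder buffer
    merged, current = [], bytearray()
    while next_seq in pending:
        seg = pending.pop(next_seq)
        # Start a new large packet when the assumed size limit is reached.
        if current and len(current) + len(seg.payload) > max_len:
            merged.append(bytes(current))
            current = bytearray()
        current += seg.payload
        next_seq = (next_seq + len(seg.payload)) % 2**32  # sequence wraparound
    if current:
        merged.append(bytes(current))
    leftover = sorted(pending.values(), key=lambda s: s.seq)
    return merged, leftover, next_seq


# Three segments arrive out of order; a fourth lies beyond a gap.
arrived = [Segment(1000, b"aaaa"), Segment(1008, b"cccc"),
           Segment(1004, b"bbbb"), Segment(1020, b"zz")]
merged, leftover, next_seq = coalesce(arrived, next_seq=1000)
```

In this example the three contiguous segments become a single 12-byte payload, so a host stack would handle one packet instead of three, while the segment at sequence number 1020 stays buffered until the gap before it is filled.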

LI Jie, CHEN Shuhui. Multicore NPU based TCP large receive offload [J]. J4, 2016, 38(07): 1344-1349.
