• Sponsored journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (06): 959-967.

• High Performance Computing •

Exploration of a many-core dataflow hardware architecture based on the Actor model

ZHANG Jia-hao, DENG Jin-yi, YIN Shou-yi, WEI Shao-jun, HU Yang

(School of Integrated Circuits, Tsinghua University, Beijing 100084, China)
• Received: 2023-10-06  Revised: 2023-11-23  Accepted: 2024-06-25  Online: 2024-06-25  Published: 2024-06-17

Abstract: The distributed training of ultra-large-scale AI models poses challenges to the communication capability and scalability of chip architectures. Wafer-level chips integrate a large number of computing cores and interconnect networks on a single wafer, achieving ultra-high computing density and communication performance, which makes them an ideal platform for training ultra-large-scale AI models. This paper proposes AMCoDA, a hardware architecture based on the Actor model, which aims to exploit the highly parallel, asynchronous-message-passing, and scalable nature of the Actor parallel programming model to realize distributed training of AI models on wafer-level chips. The design of AMCoDA spans three levels: the computational model, the execution model, and the hardware architecture. Experiments show that AMCoDA broadly supports the parallel patterns and collective communications used in distributed training, and can deploy and execute complex distributed training strategies flexibly and efficiently.
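To make the Actor-model execution style concrete, below is a minimal, hypothetical Python sketch; it is not AMCoDA's actual API. The Actor class, its mailbox, and the naive all-gather-based all-reduce are illustrative assumptions: each actor owns private state and interacts with peers only through asynchronous messages, here averaging per-actor "gradients" as a stand-in for one collective communication used in distributed training.

```python
# Minimal sketch of Actor-style asynchronous message passing (hypothetical,
# not AMCoDA's real interface). Each actor holds a private gradient shard and
# a mailbox; communication is send-and-continue, never shared memory.
import threading
import queue

class Actor:
    """An actor with a mailbox that processes messages asynchronously."""
    def __init__(self, rank):
        self.rank = rank
        self.peers = []                  # filled in once all actors exist
        self.mailbox = queue.Queue()
        self.grad = float(rank + 1)      # stand-in for a local gradient shard
        self.thread = threading.Thread(target=self.run)

    def send(self, dst, msg):
        # Asynchronous message passing: enqueue into the peer's mailbox
        # and return immediately, without waiting for it to be consumed.
        dst.mailbox.put(msg)

    def run(self):
        # Naive all-gather-based all-reduce: broadcast the local gradient,
        # then average every value that arrives. A wafer-level collective
        # would instead schedule a ring or tree over the on-wafer network.
        for p in self.peers:
            if p is not self:
                self.send(p, self.grad)
        vals = [self.grad]
        for _ in range(len(self.peers) - 1):
            vals.append(self.mailbox.get())  # blocks until a message arrives
        self.grad = sum(vals) / len(vals)

if __name__ == "__main__":
    actors = [Actor(r) for r in range(4)]
    for a in actors:
        a.peers = actors
    for a in actors:
        a.thread.start()
    for a in actors:
        a.thread.join()
    print([a.grad for a in actors])  # all actors converge to 2.5
```

Because every send is non-blocking and each actor only blocks on receives after all its sends are issued, the exchange cannot deadlock; this decoupling of computation from communication is the property the Actor model contributes to scalable distributed training.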


Key words: wafer-level chip, distributed training, Actor model, many-core dataflow architecture