• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (02): 201-205.

• Papers •

Design and implementation of high-radix Montgomery modular multiplication array structures

WU Guiming, XIE Xianghui, WU Dong, ZHENG Fang, YAN Xinkai

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, Jiangsu, China)
  • Received: 2013-08-05 Revised: 2013-10-26 Online: 2014-02-25 Published: 2014-02-25
  • Supported by:

    China Postdoctoral Science Foundation; National 863 Program of China (2013AA010105)

Abstract:

Two linear array structures for high-radix Montgomery modular multiplication are proposed. The two arrays use different parallelization methods, assigning and scheduling tasks along different loop dimensions, and both fully exploit the pipelined parallelism of the algorithm. Both arrays are implemented on a Xilinx XC5VLX330 FPGA for 256-bit modular multiplication with radix 2^16. Experimental results show that both arrays have a latency of 84 clock cycles, with throughputs of 1/17 and 1/21 respectively, which is higher than that of related designs. Both structures achieve a reasonable balance between performance and implementation cost.

Key words: modular multiplication; linear array; FPGA; pipeline
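
For readers unfamiliar with the operation the arrays compute, here is a minimal software sketch of textbook radix-2^16 Montgomery multiplication for 256-bit operands. It is not the authors' array architecture, and the abstract does not say which loop dimension each array is organized along. The names `mont_mul`, `K`, `S` and the example modulus are assumptions chosen for illustration. The outer loop walks the 16 digits of one operand; the inner loop over the digits of the other operand and the modulus is collapsed here into Python big-integer arithmetic.

```python
# Textbook radix-2^K Montgomery multiplication (software sketch only).
# Illustrative names and parameters; not the paper's hardware design.

K = 16             # bits per digit, i.e. radix 2^16 as in the abstract
N_BITS = 256       # operand width as in the abstract
S = N_BITS // K    # number of radix-2^16 digits (16)
MASK = (1 << K) - 1


def mont_mul(a: int, b: int, m: int) -> int:
    """Return a * b * R^-1 mod m, with R = 2^(K*S), m odd, and a, b < m < R."""
    # m' = -m^-1 mod 2^K, precomputed once per modulus (needs Python 3.8+).
    m_prime = (-pow(m, -1, 1 << K)) & MASK
    t = 0
    for i in range(S):                       # outer loop: one digit of a per step
        a_i = (a >> (K * i)) & MASK
        t += a_i * b                         # inner digit loop, collapsed into one multiply
        q = ((t & MASK) * m_prime) & MASK    # quotient digit that zeroes the low K bits
        t = (t + q * m) >> K                 # exact division by the radix
    return t - m if t >= m else t            # single conditional final subtraction


if __name__ == "__main__":
    m = (1 << 255) - 19                      # example odd modulus (an assumption)
    R = 1 << (K * S)
    a, b = 0x1234567890ABCDEF % m, 0xFEDCBA0987654321 % m
    # Check against the definition a * b * R^-1 mod m.
    assert mont_mul(a, b, m) == (a * b * pow(R, -1, m)) % m
```

In hardware, the two loop dimensions (digits of `a` in the outer loop, digits of `b` and `m` in the inner loop) are the natural candidates for distributing work across processing elements, which is the design space the abstract refers to.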