Computer Engineering & Science
A Survey of the Non-Uniform Cache Architecture
Received date: 2009-06-02
Revised date: 2009-12-15
Online published: 2011-02-25
Because of the memory wall problem, cache memory technology has become very important. Growing wire delays are expected to become one of the key factors constraining the design of large caches. To provide a uniform access delay, the traditional cache design methodology must set the cache access time to that of the bank farthest from the processor. Researchers have therefore proposed the Non-Uniform Cache Architecture (NUCA), which is an almost inevitable development for future processors. In NUCA, an access that hits data stored in a bank near the processor completes quickly, while an access to a distant bank takes longer. This paper summarizes the origin, development, and typical designs of NUCA. It also introduces two multiprocessor memory technologies from which NUCA research may borrow ideas. Finally, it presents the authors' view of the key problems in NUCA research and possible solutions.
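The contrast between uniform and non-uniform access time can be illustrated with a minimal sketch. The bank count, the per-bank latencies, and the hit distribution below are assumed values chosen for illustration, not figures from the paper, and the function names are hypothetical.

```python
# Illustrative only: the latencies and hit fractions are assumed values,
# not measurements from the surveyed work.

def uniform_latency(bank_latencies):
    """A uniform design times every access for the farthest (slowest) bank."""
    return max(bank_latencies)

def nuca_average_latency(bank_latencies, hit_fractions):
    """In a non-uniform design, an access costs the latency of the bank it hits."""
    assert len(bank_latencies) == len(hit_fractions)
    assert abs(sum(hit_fractions) - 1.0) < 1e-9
    return sum(lat * frac for lat, frac in zip(bank_latencies, hit_fractions))

# Hypothetical four-bank cache; latency (in cycles) grows with distance.
latencies = [4, 8, 12, 16]
# Hypothetical access mix: most hits land in the banks nearest the processor.
hits = [0.5, 0.25, 0.125, 0.125]

print(uniform_latency(latencies))             # 16
print(nuca_average_latency(latencies, hits))  # 7.5
```

Under these assumptions the non-uniform cache has a lower average access time only because most hits fall in nearby banks. With hits spread evenly across banks the advantage shrinks, which is why data placement matters in NUCA designs.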
WU Junjie, YANG Xuejun. A Survey of the Non-Uniform Cache Architecture[J]. Computer Engineering & Science, 2011, 33(2): 51-60. DOI: 10.3969/j.issn.1007130X.2011.