A Survey of the NonUniform Cache Architecture

WU Junjie,YANG Xuejun

doi:10.3969/j.issn.1007130X.2011.

Computer Engineering & Science, 2011, Vol. 33, Issue 2: 51-60

Article

A Survey of the NonUniform Cache Architecture

Expand

(National Laboratory for Parallel and Distributed Processing, Changsha 410073, China)

Received date: 2009-06-02

Revised date: 2009-12-15

Online published: 2011-02-25

Abstract

Because of the memory wall problem, cache memory technologies have become very important. In the future, growing wire delays will be one of the key factors restricting the design of large caches. To provide a uniform access delay, the traditional cache design methodology has to set the cache access time according to the bank farthest from the processor. Researchers therefore proposed the Non-Uniform Cache Architecture (NUCA), which is an almost inevitable outcome for future processors. If a processor hits data stored in a nearby NUCA bank, the cache access time is short; otherwise, the access time is long. This paper summarizes the origin, development, and typical cases of NUCA. Importantly, it also introduces two multiprocessor memory technologies from which NUCA research may borrow ideas. Finally, the authors present what they regard as the key problems of NUCA research, together with proposed solutions.
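
The contrast between a uniform-latency cache and NUCA can be illustrated with a toy model. The sketch below is not taken from the paper: the cycle counts, the one-dimensional row of banks, and the function names are all assumptions chosen only to show why timing every access for the farthest bank is costly when most hits land in nearby banks.

```python
# Toy latency model. All constants are illustrative assumptions, not measured values.
BANK_ACCESS_CYCLES = 3    # assumed time to read a bank once the request arrives
WIRE_CYCLES_PER_HOP = 2   # assumed one-way wire delay per bank-to-bank hop


def bank_latency(hops: int) -> int:
    """Round-trip latency to a bank that is `hops` away from the processor."""
    return BANK_ACCESS_CYCLES + 2 * WIRE_CYCLES_PER_HOP * hops


def uniform_latency(num_banks: int) -> int:
    """A uniform-latency cache is timed for its farthest bank (bank num_banks - 1)."""
    return bank_latency(num_banks - 1)


def nuca_average_latency(hit_fractions: list[float]) -> float:
    """Average latency when hit_fractions[i] of hits land in bank i (bank 0 is nearest)."""
    return sum(f * bank_latency(i) for i, f in enumerate(hit_fractions))


# Hypothetical workload with good locality: most hits fall in the near banks.
hits = [0.5, 0.25, 0.15, 0.10]
print("uniform:", uniform_latency(len(hits)))
print("NUCA average:", nuca_average_latency(hits))
```

Under these assumed numbers the NUCA average is well below the uniform worst case, and the gap shrinks as more hits fall in distant banks.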

Key words: NUCA; wire delay; locality; multicore; NUMA; COMA

Cite this article

WU Junjie,YANG Xuejun . A Survey of the NonUniform Cache Architecture[J]. Computer Engineering & Science, 2011 , 33(2) : 51 -60 . DOI: 10.3969/j.issn.1007130X.2011.
