Fundamental Latency Trade-offs in Architecting DRAM Caches (Alloy Cache)
Authors: Moinuddin K. Qureshi and Gabriel H. Loh
Venue: MICRO 2012
The authors present a new DRAM cache architecture that optimizes for latency rather than hit rate. They show that every optimization made to improve hit rate must be weighed against the cache latency it adds, which yields a "break-even hit rate" the optimization must reach before it pays off. The authors compare against three key baselines: an infeasibly large SRAM tag array, the prior "LH-Cache", and an ideal latency-optimized cache. They find that "un-optimizations" improve latency while only minimally impacting hit rate. Their final design improves performance by 35% over the baseline and by nearly 20% over the prior LH-Cache. The specific contributions are:
- Moving to a direct-mapped cache provides a significant improvement by reducing hit latency
- Combining the tag store and data store into a single entity improves performance by 21%
- Adding a small per-core predictor of whether data will reside in main memory or in the DRAM cache enhances performance further, to a total of 35%
- These latency optimizations even outperform a design using an SRAM array for tags
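The break-even hit-rate argument can be illustrated with a small average-latency model. This is a sketch with hypothetical cycle counts, not the paper's exact parameters: every access probes the DRAM cache first, so a miss pays the probe latency plus the trip to main memory.

```python
def avg_access_latency(hit_rate, cache_latency, memory_latency):
    """Average latency when every access probes the DRAM cache first.

    A hit costs only the cache latency; a miss costs the cache probe
    plus the main-memory access.
    """
    return (hit_rate * cache_latency
            + (1 - hit_rate) * (cache_latency + memory_latency))


def break_even_hit_rate(base_hit_rate, extra_latency, memory_latency):
    """Hit rate a slower design must reach to match a faster baseline.

    Setting the two average latencies equal and solving shows the
    slower design must recover extra_latency / memory_latency of
    additional hit rate just to break even.
    """
    return base_hit_rate + extra_latency / memory_latency


# Hypothetical numbers: a 50-cycle cache, 200-cycle memory, 50% hit rate.
# An "optimization" that adds 20 cycles of cache latency must lift the
# hit rate to 60% before it delivers any benefit at all.
base = avg_access_latency(0.5, 50, 200)
target = break_even_hit_rate(0.5, 20, 200)
```

With these assumed numbers, `avg_access_latency(0.6, 70, 200)` equals the baseline's 150 cycles exactly, showing why hit-rate optimizations that add latency can easily be net losses.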