Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

October 30, 2018

Authors: Jian Lin, Qingda Lu, .. P. Sadayappan et al.
Venue: HPCA 2008

The authors present an in-depth analysis and optimization of cache partitioning on a real system. They implement partitioning with OS page coloring, which incurs only ~2% overhead. Because the stated goal of the study is primarily to analyze partitioning and its potential, they subtract this overhead from their results. They report significant discrepancies relative to previous studies, which they attribute to simulations that were too short and used datasets that were too small. The real-system approach allows much longer runs with larger datasets.
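To make the page-coloring idea concrete, here is a minimal sketch of how a physical page maps to a cache "color" in a physically indexed cache. The cache geometry in the usage note (4 MB, 16-way, 4 KB pages) and all function names are assumptions chosen for illustration; they are not taken from this summary, and a real implementation lives in the OS page allocator.

```python
def num_colors(cache_bytes, ways, page_bytes):
    """Number of page colors: how many pages fit in one cache way."""
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes):
    """Color of the physical page containing phys_addr.

    Pages with the same color compete for the same cache sets, so an
    OS that controls which colors a process receives controls which
    slice of the cache that process can occupy.
    """
    page_number = phys_addr // page_bytes
    return page_number % num_colors(cache_bytes, ways, page_bytes)


def split_colors(total_colors, colors_for_a):
    """Hypothetical two-way partition: A gets the first colors, B the rest."""
    a = set(range(colors_for_a))
    b = set(range(colors_for_a, total_colors))
    return a, b
```

With the assumed geometry, `num_colors(4 * 1024 * 1024, 16, 4096)` gives 64 colors, so a partition is simply a choice of how many of those 64 colors each co-running program may allocate pages from.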

Benchmarks are divided into 4 categories by sensitivity to cache size; three are listed here:

Red: Highly sensitive to cache size (bzip2, mcf, omnetpp, astar, sphinx3, xalanc)
Yellow: Moderately sensitive (gcc, leslie3d, soplex, Gems, tonto, lbm, perl, cactus, h264)
Green: Marginally sensitive (bwaves, zeus, gromacs, povray, libq, wrf)

They create 27 workloads, each consisting of two benchmarks, chosen so that all combinations of benchmark categories are covered. They demonstrate significant improvement across several metrics: throughput, weighted speedup, fair speedup, and SMT speedup. The performance differential depends strongly on which benchmark categories are combined. They also present a dynamic optimization algorithm, which appears to take a hill-climbing approach. They show that an oracle static partitioning can often outperform the online, dynamic optimization scheme.
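For reference, the four metrics named above can be sketched as follows. These are the definitions commonly used in the cache-partitioning literature; the paper's exact formulations may differ, and the function names and sample IPC values are illustrative assumptions.

```python
def throughput(ipc):
    """Sum of per-program IPCs under the scheme being evaluated."""
    return sum(ipc)


def smt_speedup(ipc, base_ipc):
    """Sum of per-program speedups relative to a baseline IPC."""
    return sum(i / b for i, b in zip(ipc, base_ipc))


def weighted_speedup(ipc, base_ipc):
    """Arithmetic mean of per-program speedups."""
    return smt_speedup(ipc, base_ipc) / len(ipc)


def fair_speedup(ipc, base_ipc):
    """Harmonic mean of per-program speedups.

    The harmonic mean penalizes schemes that speed up one program
    by starving the other, so it rewards balanced gains.
    """
    return len(ipc) / sum(b / i for i, b in zip(ipc, base_ipc))
```

For example, with hypothetical IPCs `[1.2, 0.4]` against baselines `[1.0, 0.8]`, the per-program speedups are 1.2 and 0.5: the weighted speedup is 0.85, while the fair speedup is lower (about 0.71) because the second program is slowed down.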

The authors also study how well different formulations of fairness based on miss rates correlate with actual fairness measured by IPC. They find that none of the policies is a particularly good indicator, and each suffers in particular cases (for different mixes of benchmarks). This motivates using IPC directly in the optimization framework. In this study, they also demonstrate that dynamic optimization works much better when a baseline IPC is known in advance.
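As a rough illustration of an IPC-driven, hill-climbing partition search, the sketch below repeatedly tries shifting one cache color between two programs and keeps the move only if an IPC-based objective improves. This is a generic hill climber written under stated assumptions, not the authors' algorithm: `measure`, the toy IPC curves, and the 64-color total are all hypothetical.

```python
def hill_climb(measure, total_colors, start=None, step=1, max_epochs=100):
    """Search for the number of colors to give program A.

    measure(colors_a) returns the objective (higher is better) observed
    when A holds colors_a colors and B holds the remainder.
    """
    current = total_colors // 2 if start is None else start
    best = measure(current)
    for _ in range(max_epochs):
        # Neighbors: move `step` colors toward A or toward B,
        # always leaving each program at least one color.
        neighbors = [c for c in (current - step, current + step)
                     if 1 <= c <= total_colors - 1]
        if not neighbors:
            break
        score, candidate = max((measure(c), c) for c in neighbors)
        if score <= best:
            break  # no neighbor improves: local optimum reached
        current, best = candidate, score
    return current, best


def toy_fair_speedup(colors_a, total=64, base=(1.0, 1.0)):
    """Hypothetical objective: fair speedup from made-up IPC curves."""
    colors_b = total - colors_a
    ipc_a = colors_a / (colors_a + 8.0)   # A is assumed more cache-hungry
    ipc_b = colors_b / (colors_b + 2.0)
    return 2.0 / (base[0] / ipc_a + base[1] / ipc_b)


if __name__ == "__main__":
    colors_a, score = hill_climb(toy_fair_speedup, 64)
    print(colors_a, 64 - colors_a, round(score, 4))
```

Note that the objective needs a baseline IPC for each program (`base` here), which is consistent with the observation that dynamic optimization works better when the baseline is known in advance. A hill climber like this can also stop at a local optimum, which is one plausible reason an oracle static choice could beat it.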
