AsmDB: Understanding and Mitigating Front-end Stalls in Warehouse-Scale Computers

Authors: Grant Ayers et al.
Venue: ISCA 2019

Previous work has highlighted a significant frontend bottleneck in warehouse-scale computers (WSCs). A variety of solutions have been proposed, both on real hardware and in architecture papers, to mitigate the issue. This paper profiles 90%+ of Google's fleet to analyze, at fine granularity, when, where, and how frontend bottlenecks occur. AsmDB consists of post-processed last branch record (LBR) data, which yields control-flow probabilities and precise information about which instructions triggered I-cache misses. The analysis identifies several core causes of I-cache misses: large jump distances (via either function calls or indirect branches) and cold code being brought into the cache (either as part of a fetched cache block or by prefetching).
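To make the "control flow probabilities" idea concrete, here is a minimal sketch of how sampled branch records could be turned into per-edge probabilities. It is not AsmDB's actual pipeline: the function name `edge_probabilities`, the `(source, target)` tuple format, and the sample addresses are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def edge_probabilities(branch_records):
    """Estimate P(target | source) from sampled (source, target) branch pairs.

    `branch_records` is a hypothetical, simplified stand-in for LBR samples:
    each entry is one taken branch as a (source_address, target_address) tuple.
    """
    edge_counts = Counter(branch_records)

    # Total number of sampled branches leaving each source address.
    out_totals = Counter()
    for (src, _dst), count in edge_counts.items():
        out_totals[src] += count

    # Normalize each edge count by the total for its source.
    probs = defaultdict(dict)
    for (src, dst), count in edge_counts.items():
        probs[src][dst] = count / out_totals[src]
    return dict(probs)


# Made-up addresses, purely for illustration.
samples = [(0x400, 0x480), (0x400, 0x480), (0x400, 0x900), (0x480, 0x400)]
probs = edge_probabilities(samples)
```

In this toy input, two of the three sampled branches out of `0x400` go to `0x480`, so that edge gets a probability of 2/3 and the far jump to `0x900` gets 1/3.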

The remainder of the paper is much more compiler-focused. It describes a software prefetching algorithm that uses AsmDB information to inject prefetches at ideal points in the code. Prefetches must be both timely and accurate. The challenge is exacerbated for software prefetching by varying IPC, as well as by fan-in and fan-out concerns. Fundamentally, software prefetching increases code size because it requires an extra instruction, so even a useful prefetch incurs some overhead. The authors describe in detail how the AsmDB information helps determine exactly where to place each prefetch, and the net result is an overall performance gain of about 0.5-1%. I absolutely loved the analysis component of this paper but was hoping for a bigger perf win, especially considering the complexity of the prefetching implementation.
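The timeliness and fan-out trade-off can be sketched as a backward walk over the control-flow graph from the block that misses. This is a simplified illustration, not the paper's algorithm: the function `candidate_sites`, the graph encoding, the instruction-count window, and all thresholds are hypothetical. The window stands in for timeliness (far enough ahead to hide the miss, close enough that the line is not evicted), and the probability threshold stands in for accuracy under fan-out.

```python
def candidate_sites(miss, preds, size, min_dist, max_dist, min_prob):
    """Find predecessor blocks where a prefetch for `miss` might be injected.

    preds: block -> list of (predecessor, probability that predecessor
           branches to block), e.g. derived from profile data (assumed format).
    size:  block -> instruction count of that block.
    Distance is counted in instructions from the start of the candidate block
    to the start of `miss`. In practice the window would depend on IPC and
    miss latency; here it is simply passed in as an assumption.
    Returns block -> (distance, path probability). The probability follows a
    single path, so it is a lower bound on the true reach probability.
    """
    found = {}
    stack = [(miss, 0, 1.0, frozenset([miss]))]
    while stack:
        block, dist, prob, seen = stack.pop()
        for pred, p in preds.get(block, []):
            if pred in seen:  # do not walk around loops
                continue
            d = dist + size[pred]
            q = prob * p
            # Distance only grows and probability only shrinks as we walk
            # further back, so both checks safely prune the search.
            if d > max_dist or q < min_prob:
                continue
            if d >= min_dist:
                best = found.get(pred)
                if best is None or q > best[1]:
                    found[pred] = (d, q)
            stack.append((pred, d, q, seen | {pred}))
    return found


# Toy graph: A branches to B or C with equal probability; B usually reaches M.
preds = {"M": [("B", 0.9), ("C", 0.1)], "B": [("A", 0.5)], "C": [("A", 0.5)]}
size = {"A": 20, "B": 30, "C": 10, "M": 5}
sites = candidate_sites("M", preds, size, min_dist=25, max_dist=60, min_prob=0.3)
```

Here `B` qualifies (30 instructions ahead, 0.9 probability) and so does `A` through `B` (50 instructions ahead, 0.45 probability), while `C` is rejected because it rarely leads to the miss. A prefetch placed in a low-probability block would mostly pay the code-size cost without the benefit, which is the overhead concern raised above.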
