Posts

Showing posts from September, 2018

Phase Behavior in Serial and Parallel Applications

Authors: Andreas Sembrant, David Black-Schaffer, Erik Hagersten
Venue: IISWC 2012

This paper extends ScarPhase to work in multi-threaded environments. This is made possible by tracking the same data as ScarPhase on a per-thread basis. The authors note that global sharing (of phase IDs and phase predictors) does not improve performance much. This seems to be because new phases, when they appear, appear simultaneously in multiple threads. Global sharing would therefore not improve the accuracy or quality of phase detection; it would only remove redundancy. However, keeping that redundancy may be preferable from an implementation perspective. The paper's primary focus is workload analysis rather than the phase detection algorithm. The authors show that PARSEC displays much less phase behavior than SPEC CPU2006. Additionally, as the number of threads in data-parallel applications scales, phases become increasingly shorter (assuming the same data), and eventuall...
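To make per-thread tracking concrete, here is a minimal sketch. It assumes each thread classifies its own sampled branch-count vectors with a simple leader-follower table, using normalized Manhattan distance and an arbitrary 0.5 threshold. The class names, the threshold, and the clustering scheme are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def normalize(counts):
    """Turn raw sample counts (branch PC -> count) into a distribution summing to 1."""
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()} if total else {}


def manhattan(a, b):
    """L1 distance between two sparse normalized vectors."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


class PhaseTable:
    """Hypothetical leader-follower phase table owned by a single thread."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed similarity cutoff
        self.leaders = []  # one normalized signature per phase ID

    def classify(self, counts):
        sig = normalize(counts)
        best_id, best_dist = None, float("inf")
        for pid, leader in enumerate(self.leaders):
            d = manhattan(sig, leader)
            if d < best_dist:
                best_id, best_dist = pid, d
        if best_id is not None and best_dist <= self.threshold:
            return best_id  # close enough to an existing phase
        self.leaders.append(sig)  # otherwise start a new phase
        return len(self.leaders) - 1


class PerThreadClassifier:
    """Keeps an independent PhaseTable per thread ID (no global sharing)."""

    def __init__(self, threshold=0.5):
        self.tables = defaultdict(lambda: PhaseTable(threshold))

    def classify(self, thread_id, counts):
        return self.tables[thread_id].classify(counts)
```

Because each thread keeps its own table, identical behavior in two threads is learned twice. That duplicated state is the redundancy that global sharing would remove.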

Efficient Software-Based Online Phase Classification (ScarPhase)

Authors: Andreas Sembrant, David Eklov, Erik Hagersten
Venue: IISWC 2011

The authors develop an online phase detection algorithm that approximates Basic Block Vectors (BBVs) by sampling conditional branches, which is made possible by PEBS (Precise Event-Based Sampling). The algorithm runs online on real hardware and uses fixed windows of 100M instructions. They show that although conditional branches encapsulate less information than all branches, conditional branches provide better information when sampling. They evaluate their phase detection algorithm using the coefficient of variation (CoV), comparing the CoV within a phase to the global CoV. They additionally provide a metric that penalizes creating new phases. Finally, they use a Markov predictor, similar to that of Sherwood et al., and show accurate phase prediction as well. Overall, this paper does an excellent job of solving many of the challenges of phase detection: it is feasible on real hardware with a low overhead of less than 2%; address ...
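The evaluation and prediction steps can be sketched as follows. This is a minimal sketch, assuming a per-window metric such as CPI, a per-phase CoV weighted by each phase's share of windows, and a first-order Markov predictor that guesses the successor most often seen after the current phase. The weighting, the predictor order, and all names are assumptions for illustration; the paper's exact formulation may differ.

```python
import statistics
from collections import Counter, defaultdict


def cov(values):
    """Coefficient of variation: population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def weighted_phase_cov(phase_ids, metric):
    """Per-phase CoV of `metric`, weighted by each phase's fraction of windows (assumed)."""
    groups = defaultdict(list)
    for pid, value in zip(phase_ids, metric):
        groups[pid].append(value)
    n = len(metric)
    return sum(len(vals) / n * cov(vals) for vals in groups.values())


class MarkovPhasePredictor:
    """First-order Markov predictor over phase IDs (hypothetical simplification)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # phase -> Counter of next phases
        self.last = None

    def predict(self):
        """Most frequent successor of the last phase; fall back to 'same phase again'."""
        if self.last is None:
            return None
        successors = self.transitions[self.last]
        return successors.most_common(1)[0][0] if successors else self.last

    def update(self, phase_id):
        """Record the observed transition and remember the current phase."""
        if self.last is not None:
            self.transitions[self.last][phase_id] += 1
        self.last = phase_id
```

A good classification yields a weighted per-phase CoV well below the global CoV, since windows in the same phase should behave alike.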

Fundamental Latency Trade-offs in Architecting DRAM Caches (Alloy Cache)

Authors: Moinuddin K. Qureshi and Gabriel H. Loh
Venue: MICRO 2012

The authors present a new DRAM cache architecture that optimizes for latency rather than hit rate. They show that every optimization made to improve hit rate must be weighed against the cache latency it adds, which results in a new "break-even hit rate". The authors compare against three key baselines: an infeasibly large SRAM tag array, the prior "LH-Cache", and an ideal latency-optimized cache. They find that "unoptimizations" improve latency while minimally impacting hit rate. Their final result is a cache that improves performance by 35% over the baseline and by nearly 20% over the previous LH-Cache. The specific contributions are: moving to a direct-mapped cache provides a significant improvement (by reducing latency); combining the tag store and data store into a single entity improves performance by 21%; implementing a small per-core predictor enhances per...
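The break-even idea and the tag-and-data layout can be illustrated with a toy model. This sketch assumes average latency is simply `hit_rate * hit_latency + (1 - hit_rate) * miss_latency`, with miss latency unaffected by the design change (it ignores miss-detection cost, which the paper's predictor addresses). The function and class names and all latency numbers are hypothetical, not figures from the paper.

```python
def avg_latency(hit_rate, hit_latency, miss_latency):
    """Simplified average access latency (assumes a fixed miss latency)."""
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency


def break_even_hit_rate(base_hit_rate, base_hit_latency, new_hit_latency, miss_latency):
    """Hit rate at which a design with `new_hit_latency` matches the baseline's average latency."""
    if miss_latency <= new_hit_latency:
        raise ValueError("miss latency must exceed the new hit latency")
    target = avg_latency(base_hit_rate, base_hit_latency, miss_latency)
    # Solve h * new_hit + (1 - h) * miss == target for h.
    return (miss_latency - target) / (miss_latency - new_hit_latency)


class DirectMappedTADCache:
    """Toy direct-mapped cache storing tag and data together in one entry (hypothetical model)."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.entries = [None] * num_sets  # each entry is (tag, data) or None

    def _split(self, line_addr):
        return line_addr % self.num_sets, line_addr // self.num_sets  # (index, tag)

    def lookup(self, line_addr):
        index, tag = self._split(line_addr)
        entry = self.entries[index]  # a single access yields both tag and data
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def fill(self, line_addr, data):
        index, tag = self._split(line_addr)
        self.entries[index] = (tag, data)  # direct-mapped: replace whatever was there
```

With hypothetical numbers (baseline: 60% hit rate, 40-cycle hits; new design: 30-cycle hits; 100-cycle misses), `break_even_hit_rate(0.6, 40, 30, 100)` is roughly 0.514, so the faster design still wins as long as its hit rate stays above about 51%.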