Predicting inter-thread cache contention on a chip multi-processor architecture
Authors: Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Venue: HPCA 2005
The authors present Prob, a model that predicts the performance impact of co-locating multiple threads on a chip multiprocessor (CMP). The model takes each thread's stack distance profile or circular sequence profile as input. Using probability theory, Prob predicts the cache miss rates of co-located programs with an average error of ~3.8%. Moreover, the model is significantly off only when the predicted impact is already very large and the real impact is even larger.
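For readers unfamiliar with stack distance profiles, here is a minimal sketch of how one can be collected for a single LRU-managed cache set. This is not the Prob model itself, only an illustration of its input. The function names `stack_distance_profile` and `misses_with_ways` and the toy trace are assumptions made for this example, not names from the paper.

```python
def stack_distance_profile(trace, assoc):
    """Count hits per LRU stack position for one cache set.

    Returns a list of assoc + 1 counters: entries 0..assoc-1 count hits
    at each LRU stack position (0 = most recently used), and the last
    entry counts misses.
    """
    stack = []                      # resident lines, most recently used first
    counters = [0] * (assoc + 1)
    for line in trace:
        if line in stack:
            counters[stack.index(line)] += 1   # hit at this stack position
            stack.remove(line)
        else:
            counters[assoc] += 1               # miss (cold or already evicted)
        stack.insert(0, line)                  # accessed line becomes MRU
        del stack[assoc:]                      # evict beyond the associativity
    return counters


def misses_with_ways(counters, ways):
    """Misses the same trace would see if only `ways` LRU ways were available.

    Under LRU, a hit at stack position >= ways becomes a miss.
    """
    return sum(counters[ways:])


# Hypothetical trace of cache-line tags that all map to the same set.
trace = ["a", "b", "c", "a", "b", "d", "a"]
profile = stack_distance_profile(trace, assoc=4)
print(profile)
print(misses_with_ways(profile, 2))
```

The second function shows why the profile is a useful input: once a thread's profile is known, its miss count under a smaller effective share of the cache follows from the counters alone, without re-running the trace.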
This is the first work to model the effects of co-locating threads on a CMP, and it is nonetheless highly accurate. However, the paper offers only a prediction model of those effects; it does not propose a solution to the contention that co-location causes. The study was also done on a two-core system, which was state of the art at the time of publication, so it would be interesting to re-examine the model's accuracy in the many-core era. Lastly, the model predicts cache miss rates, not performance (IPC). Later work shows that miss rates are not always good indicators and that IPC should be used directly, because of auxiliary effects such as bandwidth utilization typically increasing as cache contention increases.