Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
Authors: Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter
Venue: MICRO 2010

This paper presents Thread Cluster Memory Scheduling (TCM), a memory scheduling algorithm that aims to optimize both system throughput and fairness. It does so using three key ideas.

First, threads are clustered as either bandwidth-intensive or non-intensive, on the premise that low-bandwidth threads are more sensitive to latency. The paper presents an example, but an easier way to reason about this is that a low-bandwidth thread needs only a small fraction of memory service time to see a significant performance increase. In other words, prioritizing it has a high return on investment with minimal impact on other threads. As such, non-bandwidth-intensive threads are always given the highest priority.

The second observation to be exploited is the disparity in behavior among high-bandwidth threads. Specifically, a metric called niceness measures a thread's susceptibility to interference and impact on ot...
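
To make the first idea concrete, here is a minimal sketch of splitting threads into a non-intensive and an intensive cluster and ranking the non-intensive cluster first. The summary above does not say how the split is computed, so everything specific here is an assumption for illustration: the `ThreadStats` fields (`mpki`, `bandwidth`), the `cluster_thresh` parameter, and the rule of admitting the least intensive threads until they would exceed a fixed fraction of total bandwidth. Ordering within the intensive cluster is deliberately left unmodeled.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    name: str
    mpki: float       # hypothetical intensity measure (misses per kilo-instruction)
    bandwidth: float  # hypothetical memory service time used in the last interval


def cluster_threads(threads, cluster_thresh=0.1):
    """Split threads into (non_intensive, intensive) lists.

    Assumed rule: walk threads from least to most memory-intensive and admit
    them to the non-intensive cluster while their combined bandwidth stays
    within cluster_thresh * total bandwidth.
    """
    total = sum(t.bandwidth for t in threads)
    budget = cluster_thresh * total
    non_intensive, intensive = [], []
    used = 0.0
    for t in sorted(threads, key=lambda t: t.mpki):
        # Once one thread is classified as intensive, all heavier ones are too.
        if not intensive and used + t.bandwidth <= budget:
            used += t.bandwidth
            non_intensive.append(t)
        else:
            intensive.append(t)
    return non_intensive, intensive


def priority_order(threads, cluster_thresh=0.1):
    """Return threads from highest to lowest priority.

    The non-intensive cluster always ranks above the intensive cluster.
    """
    non_intensive, intensive = cluster_threads(threads, cluster_thresh)
    return non_intensive + intensive


if __name__ == "__main__":
    demo = [
        ThreadStats("a", mpki=0.5, bandwidth=2.0),
        ThreadStats("b", mpki=1.0, bandwidth=3.0),
        ThreadStats("c", mpki=20.0, bandwidth=45.0),
        ThreadStats("d", mpki=30.0, bandwidth=50.0),
    ]
    print([t.name for t in priority_order(demo)])
```

With these made-up numbers, the two light threads together use 5% of the total bandwidth, so they fit under the assumed 10% threshold and are served ahead of the two heavy threads at little cost to them.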