Heracles: Improving Resource Efficiency at Scale
Authors: David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis
Venue: ISCA 2015

This work presents a resource controller and scheduler that improves the throughput of best-effort tasks while preserving the SLO of latency-sensitive applications. It combines several isolation mechanisms: core isolation (taskset), LLC isolation (CAT), power isolation (DVFS), and network traffic isolation (qdisc). The authors show that, because performance is a convex function of these resource allocations, each mechanism can be optimized individually based on the current load of the system and the available latency slack, which the top-level controller polls every 15 seconds. Overall, they increase machine utilization to 90% without violating the SLO, which is defined over 60-second windows. The authors demonstrate three latency-critical workloads: websearch, ml_cluster, and memkeyval, which each stress different combinations of cache, bandwidth, pow...
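
The polling behaviour described in this summary (measure load and latency slack every 15 seconds, then decide whether best-effort work may run or grow) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `Decision` type, and the threshold values (`high_load`, `low_load`, `min_growth_slack`) are hypothetical placeholders chosen for illustration, and only the 15-second period comes from the summary itself.

```python
from dataclasses import dataclass

POLL_INTERVAL_S = 15  # polling period stated in the summary


@dataclass
class Decision:
    be_enabled: bool   # may best-effort (BE) tasks run at all?
    be_may_grow: bool  # may BE tasks be given more resources?


def latency_slack(slo_latency_ms: float, measured_latency_ms: float) -> float:
    """Fraction of the latency SLO that is still unused (negative = violation)."""
    return (slo_latency_ms - measured_latency_ms) / slo_latency_ms


def decide(slo_latency_ms: float, measured_latency_ms: float, load: float,
           be_enabled: bool, high_load: float = 0.85, low_load: float = 0.80,
           min_growth_slack: float = 0.10) -> Decision:
    """One polling step of a hypothetical top-level controller.

    All thresholds are illustrative assumptions, not values from the summary.
    """
    slack = latency_slack(slo_latency_ms, measured_latency_ms)
    if slack < 0:
        # SLO is being violated: stop best-effort work entirely.
        return Decision(be_enabled=False, be_may_grow=False)
    if load > high_load:
        # Latency-sensitive load is high: leave the whole machine to it.
        return Decision(be_enabled=False, be_may_grow=False)
    if load < low_load:
        # Load is low enough to (re-)enable best-effort work.
        be_enabled = True
    # Between the two load thresholds the previous state is kept (hysteresis).
    return Decision(be_enabled=be_enabled,
                    be_may_grow=be_enabled and slack >= min_growth_slack)
```

A driver would call `decide` once every `POLL_INTERVAL_S` seconds with fresh measurements and hand the result to the per-resource controllers (cores, LLC, power, network), which the summary says can each be tuned individually.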