Heracles: Improving Resource Efficiency at Scale

Authors: David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis
Venue:    ISCA 2015

This work presents a resource controller and scheduler that improves the throughput of best-effort (BE) tasks while preserving the SLO of latency-sensitive applications. It combines several isolation mechanisms: core isolation (taskset), LLC isolation (CAT), power isolation (DVFS), and network traffic isolation (qdisc). The authors show that, because performance behaves as a convex function of these resources, each one can be tuned individually based on the current load of the system and the available latency slack, which the top-level controller polls every 15 seconds. Overall, they increase machine utilization to 90% without violating the SLO, which is defined over 60-second windows.
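
The load-and-slack decision described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the Decision type, and the threshold values (high_load, low_slack) are hypothetical placeholders chosen for illustration, and the sub-controllers for cores, cache, power, and network are left out. It only shows how a top-level poll might turn measured tail latency and load into a coarse decision about BE tasks.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    be_enabled: bool   # may best-effort (BE) tasks run at all?
    be_may_grow: bool  # may BE tasks be given additional resources?


def latency_slack(tail_latency_ms: float, slo_target_ms: float) -> float:
    """Fraction of the SLO latency budget that is still unused."""
    return (slo_target_ms - tail_latency_ms) / slo_target_ms


def decide(tail_latency_ms: float, slo_target_ms: float, load: float,
           high_load: float = 0.85, low_slack: float = 0.10,
           be_enabled: bool = True) -> Decision:
    """One poll of a hypothetical top-level controller.

    The thresholds are assumed values for this sketch, not figures
    taken from the summary above.
    """
    slack = latency_slack(tail_latency_ms, slo_target_ms)
    if slack < 0:
        # SLO already violated: stop BE work entirely.
        return Decision(be_enabled=False, be_may_grow=False)
    if load > high_load:
        # Latency-critical load is high: leave the machine to it.
        return Decision(be_enabled=False, be_may_grow=False)
    if slack < low_slack:
        # Little headroom: keep the current BE state but do not grow it.
        return Decision(be_enabled=be_enabled, be_may_grow=False)
    # Comfortable slack: BE tasks may run and grow.
    return Decision(be_enabled=True, be_may_grow=True)


if __name__ == "__main__":
    # In a real deployment this would be called once per polling interval.
    print(decide(tail_latency_ms=5.0, slo_target_ms=10.0, load=0.5))
```

Keeping the decision a pure function of (latency, target, load) makes it easy to test separately from whatever mechanism actually applies the result.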

The authors evaluate three latency-critical workloads (websearch, ml_cluster, and memkeyval), each of which stresses a different combination of cache, bandwidth, power, and network traffic. They show that Heracles continues to meet the SLO 100% of the time when these are co-located with stress-test and best-effort workloads: streaming-LLC, cpu_pwr, iperf, brain, and streetview. The methodology requires some offline profiling; however, they note that this is only because no separate mechanism exists for bandwidth partitioning and measurement, so the bandwidth of each workload must be approximated.
