Real Time Power Estimation and Thread Scheduling via Performance Counters

Authors: Karan Singh, Major Bhadauria, Sally A. McKee
Venue:    ACM SIGARCH Computer Architecture News 2009

This paper presents a methodology for real-time power estimation via performance counters. The study is done entirely on real hardware. The work divides power usage into four buckets: FP Units, Memory, Stalls, and Instructions Retired. The buckets are based on the overall area of the chip itself. The authors use Spearman's rank correlation on the data to choose the best performance counter from each bucket. The four selected counters (the paper appears to predate perf event multiplexing) serve as inputs to piecewise functions that approximate power utilization, yielding median errors of 3.9% to 7.2% across different benchmark suites.
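The counter selection step can be illustrated with a minimal sketch. It assumes each bucket holds a few candidate counters sampled alongside measured power, and it keeps the counter whose Spearman rank correlation with power has the largest magnitude. The counter names, the sample values, and the use of the absolute value are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


def pick_counters(buckets, power):
    """Pick, per bucket, the counter most strongly rank-correlated with power.

    buckets maps bucket name -> {counter name -> list of samples}.
    """
    return {
        bucket: max(counters, key=lambda name: abs(spearman(counters[name], power)))
        for bucket, counters in buckets.items()
    }


# Hypothetical samples: five intervals of measured power and counter rates.
power = [10, 12, 15, 20, 26]
buckets = {
    "Memory": {"l2_misses": [1, 2, 3, 4, 5], "dram_accesses": [3, 1, 4, 2, 5]},
    "FP Units": {"fp_ops": [2, 4, 6, 8, 10]},
}
chosen = pick_counters(buckets, power)
```

Rank correlation only asks whether a counter rises and falls with power, not whether the relationship is linear, which fits a model that later maps counters to power through piecewise functions.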

The work then uses the power estimates to build a proof-of-concept thread scheduler as a user-space program. Each application's current power draw can be estimated, so if the power target is exceeded, an application can be unscheduled or replaced. The details of the algorithm are omitted from the paper. DVFS is also not included, as per-core DVFS was not available at the time of publication. This paper seems to be a strong basis for future work, and there is a large body of related work as well. Its advantages over the prior art of its time may be less relevant today, as some design constraints can be overcome with more modern approaches. One example is the limited number of counters, which multiplexing can overcome at the time scale used in the paper (1 s intervals).
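Since the paper omits the scheduling algorithm, the following is only a hypothetical sketch of one possible policy, not the authors' method. It assumes per-application power estimates in watts are already available, and it greedily suspends the highest-draw application until the estimated total fits the budget. The function name, application names, and numbers are all made up for illustration.

```python
def choose_suspensions(estimated_watts, budget_watts):
    """Return (apps to suspend, remaining estimated total) under a greedy policy.

    estimated_watts maps application name -> estimated power draw in watts.
    Hypothetical policy: suspend the largest consumers first.
    """
    total = sum(estimated_watts.values())
    suspended = []
    for name in sorted(estimated_watts, key=estimated_watts.get, reverse=True):
        if total <= budget_watts:
            break  # already within the power target
        suspended.append(name)
        total -= estimated_watts[name]
    return suspended, total


# Hypothetical per-application estimates for one scheduling interval.
estimates = {"app_a": 30.0, "app_b": 20.0, "app_c": 10.0}
over_budget = choose_suspensions(estimates, budget_watts=35.0)
within_budget = choose_suspensions(estimates, budget_watts=100.0)
```

A user-space scheduler could re-run a decision like this once per sampling interval and act on it with ordinary process control, for example by sending stop and continue signals. That mechanism is likewise an assumption here.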

This paper is a good starting point for finding subsequent work, as it clearly defines a problem and solution space that is still of interest today, a decade later.
