
QoS Policies and Architecture for Cache/Memory in CMP Platforms

Authors: Ravi Iyer, Li Zhao, Fei Guo, Steve Reinhardt (Intel, NC State, U Michigan)
Venue: SIGMETRICS 2007
First and foremost, the biggest thing this paper has going for it is the evaluation methodology. While the paper does not have adequate space to show many test cases (few workloads are shown), it provides performance numbers for both a purely trace-based methodology and a full-system mock-up. What makes this particularly impressive is that the full-system evaluation is done with a modified Linux kernel which adds priority levels to processes as well as new system calls to change QoS bits in platform registers. Unlike other QoS papers, this one focuses on app prioritization. They do so by adjusting two key shared resources: memory bandwidth and cache allocation. The ideas presented don't seem to provide any surprising insight: essentially, giving more resources to a high-priority app (more than its fair share) boosts that app's performance at the expense of the lower-priority apps.
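To make the mechanism concrete, here is a small, purely hypothetical Python sketch of priority-driven resource allocation: each priority level maps to an LLC way quota and a memory bandwidth share, the two resources the paper throttles. The table values, names, and register layout are invented for illustration and are not the paper's actual interface.

QOS_TABLE = {
    # priority level: (LLC ways allowed, fraction of memory bandwidth) -- invented values
    'high':   (12, 0.70),
    'medium': (6,  0.20),
    'low':    (2,  0.10),
}

def qos_settings(priority, total_ways=16):
    # Translate a priority level into a cache way mask and a bandwidth share.
    ways, bw_share = QOS_TABLE[priority]
    way_mask = (1 << ways) - 1   # allow the low `ways` ways of a 16-way LLC
    return way_mask, bw_share

print(qos_settings('high'))      # (4095, 0.7): 12 of 16 ways plus 70% of the bandwidth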

Communist, Utilitarian, and Capitalist Cache Policies on CMPs: Caches as a Shared Resource

Authors: Lisa R. Hsu, Steven Reinhardt (U Michigan), Ravi Iyer, Srihari Makineni (Intel)
Venue: PACT 2006
This paper examines the resulting partitions produced by different LLC cache allocation policies on a multi-core (CMP) system. The overall finding is that LRU-like policies tend to degrade into something like utilitarian policies. While at first glance this may sound good, the metric of utility is raw IPC, which biases allocation toward programs with high levels of ILP. The paper also explains that utilitarian policies can result in poor fairness, and likewise fairness policies can result in poor utility. Additionally, the paper goes on to show that different metrics (raw IPC, misses-per-access, misses-per-instruction) result in drastically different cache partitioning schemes. The paper is more a case study than a proposed solution, and suggests that a more complicated, online policy will be required to target either utility- or fairness-based cache partitioning schemes.
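As a rough illustration of how the choice of objective changes the resulting partition, here is a hedged Python sketch: given hypothetical per-way IPC curves for two co-running apps, it picks the way split that a utilitarian policy (maximize total IPC) versus a fair policy (equalize normalized IPC) would prefer. The curves, names, and scoring are assumptions, not the paper's methodology.

def best_split(ipc_a, ipc_b, objective):
    # ipc_x[w] = IPC of app x when given w of the cache's ways (index 0..total_ways).
    total = len(ipc_a) - 1
    scored = []
    for ways_a in range(1, total):
        a, b = ipc_a[ways_a], ipc_b[total - ways_a]
        if objective == 'utilitarian':
            score = a + b                                      # maximize raw throughput
        else:                                                  # fair / "communist"
            score = -abs(a / ipc_a[total] - b / ipc_b[total])  # equalize normalized IPC
        scored.append((score, ways_a))
    return max(scored)[1]                                      # ways given to app A

With raw IPC as the utility score, the high-ILP app tends to win more ways; swapping in a different metric such as misses-per-instruction would tilt the split, which is the sensitivity to metric choice the summary mentions.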

Characterizing and Predicting Program Behavior and its Variability

Authors: Evelyn Duesterwald, Calin Cascaval, Sandhya Dwarkadas (IBM, Rochester)
Venue: PACT 2003
This paper makes no effort to tune hardware, but shows yet another phase detection and prediction technique. The study is done on real hardware, using IBM Power3 and Power4 architectures. Both of these are extremely outdated, but it makes for an interesting paper nevertheless. They are able to instrument their phase detection technique on OS interrupts at a granularity of 10 ms. At each interrupt, they read various performance counters, analyzing the current hardware state and predicting the future hardware state. It is not entirely clear how they actually determine a phase, but essentially, they are able to accurately predict different performance metrics such as IPC, branch misprediction rate, and L1 cache miss rate. They support the use of a table-based predictor. They also show that most statistics are highly correlated, so a predictor for one metric can li...
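As a rough sketch of what a table-based predictor for a sampled metric (e.g., IPC read every 10 ms) might look like; the quantization width, history length, and last-value fallback are my assumptions, not the paper's design.

from collections import defaultdict

class TablePredictor:
    def __init__(self, history_len=2, quantum=0.1):
        self.history_len = history_len
        self.quantum = quantum           # bucket width used to quantize samples
        self.table = defaultdict(float)  # history pattern -> predicted next value
        self.history = ()

    def _bucket(self, value):
        # Quantize a raw sample so similar values share a table entry.
        return round(value / self.quantum)

    def predict(self):
        # Fall back to the last observed value when the pattern is unseen.
        if self.history in self.table:
            return self.table[self.history]
        return self.history[-1] * self.quantum if self.history else 0.0

    def update(self, observed):
        # Record the outcome for the current history, then shift the history.
        if len(self.history) == self.history_len:
            self.table[self.history] = observed
        self.history = (self.history + (self._bucket(observed),))[-self.history_len:]

# Example: feed a stream of IPC samples taken at each interrupt.
pred = TablePredictor()
for ipc in [1.2, 1.2, 0.4, 1.2, 1.2, 0.4]:
    guess = pred.predict()
    pred.update(ipc)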

Locality Phase Prediction

Authors: Xipeng Shen, Yutao Zhong, Chen Ding (University of Rochester)
Venue: ASPLOS 2004
Locality Phase Prediction presents an offline phase detection and prediction scheme. While previous phase detection works use repeating branches or performance counters to identify program phases, locality phases are defined by data locality. More formally, "a locality phase ... [is] a period of program execution that has stable or slow changing data locality inside the phase but disruptive transition periods between phases." Additionally, it is important to note their definition of a phase: "a phase is a unit of repeating behavior rather than a unit of uniform behavior." The claim is that these types of phases are particularly common in simulation-type programs, which often process over a (data) structure many times as they calculate behavior at each time step. In order to extract the phases, they use data reuse distance, which is va...
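For reference, a naive Python sketch of computing data reuse distance (the number of distinct data elements touched between two accesses to the same element). The paper uses a far more efficient approximate analysis; this version only illustrates the definition.

def reuse_distances(trace):
    last_pos = {}          # address -> index of its previous access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            # Count distinct addresses accessed since the previous access to addr.
            window = trace[last_pos[addr] + 1 : i]
            distances.append(len(set(window)))
        else:
            distances.append(float('inf'))  # first access: infinite reuse distance
        last_pos[addr] = i
    return distances

print(reuse_distances(['a', 'b', 'c', 'a', 'b']))  # [inf, inf, inf, 2, 2]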

Online Phase Detection Algorithms

Authors: Priya Nagpurkar, ..., V.T. Rajan (UC Santa Barbara, IBM Research)
Venue: International Symposium on Code Generation and Optimization (CGO), 2006
The paper presents a novel phase detection algorithm. The main claims are: a client- and machine-independent baseline methodology for evaluating the accuracy of an online phase detector, a metric to compare phase detectors, and an empirical evaluation of numerous phase detectors (using Java applications). The phase detection in this paper builds off the idea that during execution there exist two key states: stable phase and transition period. They use a model which constantly accepts inputs and, when in a stable phase, grows the "trailing window" size to the size of the stable phase. They call this notion an adaptive trailing window policy. The paper uses an offline baseline to evaluate the online phase detection algorithms. The biggest contributions this paper seems to make are: ...
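A hedged sketch of the trailing-window idea, assuming a simple set-similarity measure over per-interval profile elements (e.g., executed methods or branches) and an arbitrary threshold; the detectors evaluated in the paper differ in their similarity metrics and window policies.

def detect_phases(windows, threshold=0.5):
    # windows: iterable of sets of profile elements, one set per fixed interval.
    trailing = set()
    states = []
    for current in windows:
        union = trailing | current
        sim = len(trailing & current) / len(union) if union else 1.0
        if sim >= threshold:
            states.append('stable')
            trailing |= current       # adaptive: grow the trailing window
        else:
            states.append('transition')
            trailing = set(current)   # restart the trailing window
    return states

print(detect_phases([{'f', 'g'}, {'f', 'g'}, {'x', 'y'}, {'x', 'y'}]))
# ['transition', 'stable', 'transition', 'stable']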

Managing Multi-Configuration Hardware via Dynamic Working Set Analysis

Authors: Ashutosh S. Dhodapkar and James E. Smith (U. Wisconsin - Madison)
Venue: ISCA 2002
This paper is one of the first to propose working set signatures to detect and recall program phases. This eliminates the need to re-train for previously seen phases, and can also be used to predict features like the proper cache size. The signature is generated by hashing each branch into an N-bit vector. The vector is unweighted, meaning it simply encapsulates whether or not the hash of a branch was seen. They use 128-byte vectors over 100K-instruction windows (fine grain). This paper should mainly be seen as a building block, as it is some of the earliest work in phase detection. Key issues are: sampling too frequently (every committed branch), too-frequent changes (100K instructions doesn't account for the cost of a context switch), and the reconfiguration target (instruction cache size -- typical programs are now multiple MB). This paper has good informatio...
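A small sketch of the signature mechanism in the style described above, assuming an arbitrary hash and the 128-byte (1024-bit) vector size; the relative-distance comparison is the usual way such signatures are matched, but the details here are illustrative.

SIG_BITS = 1024   # 128-byte signature

def build_signature(branch_pcs):
    sig = [0] * SIG_BITS
    for pc in branch_pcs:
        sig[hash(pc) % SIG_BITS] = 1   # unweighted: only presence of the hashed branch
    return sig

def relative_distance(a, b):
    # |A xor B| / |A or B| -- 0 means identical working sets, 1 means disjoint.
    xor = sum(x != y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return xor / union if union else 0.0

prev = build_signature(range(0x1000, 0x1400, 4))
curr = build_signature(range(0x1000, 0x1400, 4))
print(relative_distance(prev, curr))   # 0.0 -> same phase, reuse the learned configuration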

Phase Tracking and Prediction

Authors: Tim Sherwood, Suleyman Sair, Brad Calder
Venue: ISCA 2003
The primary goal of this paper is to develop a phase detection and prediction mechanism that can guide optimizations for large-scale program behavior. The phases are detected based on the code being executed. Unlike Dhodapkar and Smith's work, their technique uses a basic block vector approach that also captures the time spent executing each code block. This paper also uses branch IPs to build basic block vectors, but hashes them rather than using random projection. They show that a relatively small hash (32 buckets) is sufficient. Additionally, most of a program's execution can be captured using just 20 unique phase IDs/signatures. To determine if a new vector is a new phase, they use the L1 (Manhattan) distance compared against an empirically determined threshold value. To determine the quality of phases, they look at the standard deviation of various counters both globally and for individual phases, showing that each ...
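A minimal sketch of the hashed-BBV comparison described above; the bucket count matches the 32 mentioned, but the hash, normalization, and threshold value are illustrative assumptions (the paper determines its threshold empirically).

NUM_BUCKETS = 32

def hashed_bbv(blocks):
    # blocks: iterable of (block_address, instruction_count) pairs for one interval.
    vec = [0.0] * NUM_BUCKETS
    for addr, insns in blocks:
        vec[hash(addr) % NUM_BUCKETS] += insns   # weight each block by time spent in it
    total = sum(vec)
    return [v / total for v in vec] if total else vec   # normalize by interval length

def classify(vec, phase_table, threshold=0.1):
    # Manhattan (L1) distance to every known phase signature; below threshold => match.
    for phase_id, sig in enumerate(phase_table):
        if sum(abs(a - b) for a, b in zip(vec, sig)) < threshold:
            return phase_id
    phase_table.append(vec)            # unseen behavior: allocate a new phase ID
    return len(phase_table) - 1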

Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Authors: Tim Sherwood, Erez Perelman, Brad Calder (UC San Diego)
Venue: PACT 2001
As the title states, the key problem this paper seeks to solve is finding simulation points. To do this, they come up with basic block vector analysis (BBVA). BBVA uses BBVs, which can be collected in an environment much closer to natively running the program than to simulating it, and as a result is many orders of magnitude quicker than simulation. They use the BBVs to determine the relative sizes of each phase, i.e., how much of the program's execution each one represents. Then, using other techniques such as Fourier analysis, they are also able to analyze the cyclic behavior.
Full Text
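As an illustration of the kind of Fourier analysis mentioned, here is a small numpy sketch that recovers the dominant period of a per-interval metric stream; the input signal and the procedure are assumptions for illustration, not the paper's exact method.

import numpy as np

def dominant_period(signal):
    centered = np.asarray(signal, dtype=float) - np.mean(signal)
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered))
    peak = np.argmax(spectrum[1:]) + 1     # skip the DC component
    return 1.0 / freqs[peak]               # period, in intervals

# A toy signal that repeats every 8 intervals.
sig = [1, 1, 1, 1, 5, 5, 5, 5] * 10
print(dominant_period(sig))                # ~8.0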

Automatically Characterizing Large Scale Program Behavior

Authors: Tim Sherwood, Erez Perelman, Greg Hamerly, Brad Calder (UC San Diego)
Venue: ASPLOS 2002
This paper introduces the concept of Basic Block Vectors (BBVs). A basic block is defined as a piece of code with one entry and one exit. A basic block vector contains an element for each basic block; the value in that element corresponds to the number of times the block was seen multiplied by the number of instructions in the block. Similar phases, or basic blocks, need not be temporally adjacent. One particularly interesting component is their phase detection algorithm. Their basic block vectors often have millions of dimensions, and they use random linear projection to reduce the dimensionality to 15. They run k-means multiple times and score the results using BIC, the Bayesian Information Criterion.
Full Text
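A condensed sketch of that pipeline using numpy and scikit-learn: project the BBVs down to 15 dimensions, cluster for several values of k, and keep the best-scoring clustering. The BIC term here is a crude stand-in for the paper's formulation, and the function names are mine.

import numpy as np
from sklearn.cluster import KMeans

def choose_phases(bbvs, max_k=10, target_dim=15, seed=0):
    # bbvs: 2D array, one (very high dimensional) basic block vector per interval.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((bbvs.shape[1], target_dim))
    reduced = bbvs @ proj                  # random linear projection down to ~15 dims

    best = None
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(reduced)
        # Crude BIC stand-in: fit quality (negative inertia) penalized by parameter count.
        n, d = reduced.shape
        bic = -km.inertia_ - 0.5 * k * d * np.log(n)
        if best is None or bic > best[0]:
            best = (bic, k, km.labels_)
    return best[1], best[2]                # chosen number of phases, per-interval labels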

What the Blog is About

I am a 3rd-year PhD student at the University of Utah. During the course of research we read an incredible number of research papers, and yet forget most of what we've read. This blog is an attempt to keep that information easily accessible.