Posts

Showing posts from February, 2019

Accelerating Two-Dimensional Page Walks for Virtualized Systems

Authors: Ravi Bhargava, Benjamin Serebrin, Francesco Spadini, Srilatha Manne
Venue: ASPLOS 2008
This paper examines the effects of virtualization on address translation. Minimizing the number of exits to the hypervisor is known to be critical for performance. Under virtualization, address translation has two dimensions: guest VA to guest PA, then guest PA to system PA. Traditionally this is handled in software with a "shadow page table." A nested page table instead lets hardware perform the full translation from guest VA to system PA. The added complexity, however, can reduce TLB hit rates, which can degrade performance by up to 70% relative to native execution. The authors therefore introduce the Page Walk Cache (PWC), which, when used in combination with nested paging, can yield up to a 38% performance improvement (or more, according to results reported outside the paper). The authors of this paper perform carefu...
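To make the cost of the two-dimensional walk concrete, here is a minimal sketch that counts memory references per TLB miss. It assumes four-level page tables in both the guest and nested dimensions (as on x86-64) and no caching of intermediate entries; the function names are illustrative and not taken from the paper.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Memory references for a native page walk: one per page-table level."""
    return levels


def nested_walk_refs(guest_levels: int = 4, nested_levels: int = 4) -> int:
    """Memory references for an uncached two-dimensional (nested) page walk.

    Every guest-physical pointer must itself be translated through the
    nested page table: the guest CR3 plus the output of each guest level,
    i.e. (guest_levels + 1) nested walks. On top of that, each guest level
    costs one reference to read the guest page-table entry.
    """
    nested_walks = guest_levels + 1
    return nested_walks * nested_levels + guest_levels


if __name__ == "__main__":
    print("native:", native_walk_refs())   # 4 references
    print("nested:", nested_walk_refs())   # 24 references
```

Under these assumptions a single TLB miss costs up to 24 references instead of 4, which is the overhead that caching intermediate page walk entries aims to cut.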

Taming Performance Variability

Authors: Aleksander Maricq, Dmitry Duplyakin, Ivo Jimenez, Carlos Maltzahn, Ryan Stutsman, Robert Ricci
Venue: OSDI 2018
This paper performs an in-depth statistical analysis to understand the performance variability present in real systems. The goal is to quantify variability and to understand how to tame it from both a researcher's and a cloud provider's perspective. To do so, the authors collect nearly 900,000 data points over the course of 10 months on real systems. A key insight is that the distribution of runs is not normal, so typical parametric analysis with closed-form solutions should not be applied. In fact, typical analysis using the coefficient of variation (CoV) yields significantly different results from analysis that makes no assumptions about the distribution. Thus, the authors use nonparametric techniques to establish confidence intervals and error tolerance. From a researcher's perspective, the authors build a tool which performs such analysis on a given d...
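As an illustration of what a distribution-free confidence interval can look like, here is a minimal sketch of one common nonparametric approach: an order-statistic confidence interval for the median. It is not necessarily the exact procedure used by the authors or their tool; the function name `median_ci` and the sample data are hypothetical.

```python
from math import comb


def median_ci(samples, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Returns (x_(j), x_(n-j+1)) from the sorted sample, where j is the
    largest rank with P(B <= j-1) <= alpha/2 for B ~ Binomial(n, 0.5).
    The interval's coverage is at least `confidence` regardless of the
    underlying distribution.
    """
    xs = sorted(samples)
    n = len(xs)
    alpha = 1.0 - confidence
    cdf = 0.0  # running value of P(B <= k)
    j = 0      # largest valid rank found so far (1-indexed)
    for k in range(n // 2 + 1):
        cdf += comb(n, k) / 2**n
        if cdf > alpha / 2:
            break
        j = k + 1
    if j == 0:
        raise ValueError("too few samples for the requested confidence")
    return xs[j - 1], xs[n - j]


if __name__ == "__main__":
    # Hypothetical run times in seconds; note the heavy right tail.
    runs = [10.2, 10.4, 10.1, 10.3, 14.9, 10.2, 10.5, 10.3, 11.8, 10.4]
    print(median_ci(runs))
```

With only five samples the function raises an error at 95% confidence, which reflects a real constraint: a distribution-free interval needs a minimum number of runs before it can say anything.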

Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices

Authors: Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, Christina Delimitrou
Venue: ASPLOS 2018
Seer presents a framework to diagnose and avoid QoS violations in real time. The motivation, design, and experimental framework in this paper are some of the best and most thorough I have seen in my recent reading. The work begins by discussing the microservice design of cloud providers. Such frameworks have numerous layers of abstraction, are often written in multiple programming languages, and have complex (and changing) dependency graphs. A performance bug in one microservice can cause QoS violations in many others, and diagnosing the root cause can be difficult. The work then builds a complex data collection framework which uses RPC-level tracing and perf counters. When perf counters aren't available, the system uses microbenchmarks to diagnose the bottleneck. This area is particularly complex, and the authors even note that their system is similar to Dapper and Zipkin wh...