Taming Performance Variability
Authors: Aleksander Maricq, Dmitry Duplyakin, Ivo Jimenez, Carlos Maltzahn, Ryan Stutsman, Robert Ricci
Venue: OSDI 2018
This paper performs an in-depth statistical analysis to understand the performance variability present in real systems. The goal is to quantify that variability and to understand how to tame it from both a researcher's and a cloud provider's perspective. To do so, the authors collect nearly 900,000 data points over the course of 10 months on real hardware.
A key insight is that the distribution of runs is non-normal; as such, typical parametric analyses with closed-form solutions should not be applied. In fact, a typical analysis using the coefficient of variation (CoV) yields significantly different results than one that makes no assumptions about the distribution. Thus, the authors use non-parametric techniques to establish confidence intervals and error tolerances. From a researcher's perspective, the authors build a tool that performs such analysis on a given dataset and recommends the number of trials needed to establish given bounds. From a cloud provider's perspective, the same analysis can be used to determine how many machines (and which ones) to omit in order to achieve much tighter bounds on performance.
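As a rough illustration of the non-parametric approach (this is a sketch, not the authors' actual tool), the following Python snippet computes a distribution-free confidence interval for the median from binomial order statistics; the function name and the synthetic benchmark samples are hypothetical.

```python
# A minimal sketch of a non-parametric (distribution-free) confidence
# interval for the median, built from binomial order statistics.
# Illustration only; the "benchmark latencies" below are synthetic.
import numpy as np
from scipy.stats import binom

def median_ci(samples, confidence=0.95):
    """Confidence interval for the median with no normality assumption."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    alpha = 1.0 - confidence
    # Smallest 1-indexed rank l with P(Binom(n, 0.5) <= l - 1) <= alpha/2;
    # the order statistics x_(l) and x_(n+1-l) then bracket the true median
    # with probability >= confidence, whatever the distribution's shape.
    l = max(int(binom.ppf(alpha / 2, n, 0.5)), 1)
    return x[l - 1], x[n - l]

# Synthetic, skewed samples standing in for repeated benchmark trials.
rng = np.random.default_rng(42)
trials = rng.lognormal(mean=3.0, sigma=0.3, size=50)
print(median_ci(trials))  # (low, high) bound on the median, same units
```

Because this interval tightens as the number of trials grows, a tool along these lines can invert the relationship: search for the smallest trial count at which the interval width falls within the desired error tolerance, which is essentially the trial-count recommendation described above.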
Full Paper