Posts

Showing posts from October, 2019

Learning Scheduling Algorithms for Data Processing Clusters

Authors: Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh
Venue: Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM 2019)

This paper uses reinforcement learning to learn a scheduling policy for Spark jobs. The scheduler makes two main decisions: (i) which stage to schedule and (ii) how much parallelism to exploit for that stage. The RL problem is formulated as follows: given the state of the cluster and the job DAGs as input, output a scheduling action. The reward is defined as -T x J, where T is the time step and J is the number of jobs in the system. The decision-making process is particularly difficult because a job can present a dependency DAG of any shape, yet the neural network input is of fixed size. To solve this, a method based on graph convolutional neural networks [1] is used. The RL policy network predicts a composite action of a stage and a maximum parallelism level. To train the network in the case of continuous job arrivals, ...
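To make the reward concrete, here is a minimal sketch of the -T x J signal. It assumes T is the time elapsed between two consecutive scheduling events and J is the number of jobs in the system during that interval; the function name and the trace values are hypothetical and not taken from the paper's code.

```python
def step_rewards(event_times, jobs_in_system):
    """Return -T * J for each interval between consecutive scheduling events.

    event_times: timestamps of scheduling events (assumed, in seconds).
    jobs_in_system: number of jobs present during each interval,
                    so it has one fewer entry than event_times.
    """
    rewards = []
    for k in range(1, len(event_times)):
        elapsed = event_times[k] - event_times[k - 1]  # T for this step
        rewards.append(-elapsed * jobs_in_system[k - 1])  # -T * J
    return rewards


# Hypothetical trace: four scheduling events, three intervals.
event_times = [0.0, 2.0, 5.0, 6.0]
jobs_in_system = [1, 3, 2]

rewards = step_rewards(event_times, jobs_in_system)
total_reward = sum(rewards)
```

Under these assumptions the summed reward is the negative of the total job-seconds spent in the system, so a policy that maximizes it is pushed toward finishing jobs sooner.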

Dune: Safe User-level Access to Privileged CPU Features

Authors: Adam Belay, Andrea Bittau, Ali Mashtizadeh, David Terei, David Mazieres, Christos Kozyrakis
Venue: OSDI 2012

Dune uses virtualization hardware to provide the abstraction of a userspace process, but with safe access to hardware features. The Dune process runs in VMX non-root mode (aka guest ring 0). This enables virtual access to hardware features, such as exception handling and virtual memory. Because the system is built on hardware support, performance is maintained (e.g. normal system call invocations do not cause a VM exit). Furthermore, a Dune process in guest ring 0 can run untrusted code in guest ring 3 and intercept any unwanted behavior, such as unauthorized system calls. Without getting into more details, what is important is that Dune effectively enables kernel bypass and sets the stage for future work such as Shinjuku.

The Linux Scheduler: A Decade of Wasted Cores

Authors: Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Quema, Alexandra Fedorova
Venue: EuroSys 2016

Before diving into this paper, it's worth mentioning that it is presented more like a technical report than a research paper. It's clearly very important work, but the authors present software "bugs" within Linux, and fixes for those bugs. The paper is not trying to establish something completely new (other than a set of tools), but rather offers analysis and fixes.

The paper presents findings that the Linux CFS scheduler breaks a fundamental invariant: ready threads should be scheduled whenever cores are idle. Due to the complexity added to the scheduler to deal with multiprocessors and NUMA domains, the scheduler has issues that prevent this invariant from being met. The CFS scheduler relies on a hierarchy of "scheduling groups" of cores and NUMA domains (scheduling domains). As an aside, it is not clear if th...
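The invariant can be phrased as a simple check over per-core run queues. The sketch below is a simplification under stated assumptions: a core whose run queue length is 0 is idle, a core whose length is 2 or more has at least one thread waiting behind the running one, and thread affinity is ignored. The function name is hypothetical and this is not the authors' tooling.

```python
def find_violations(runqueue_lengths):
    """Return (idle_core, overloaded_core) pairs that break the invariant.

    runqueue_lengths[c] is the number of runnable threads on core c,
    including the one currently running (assumed representation).
    """
    idle = [c for c, n in enumerate(runqueue_lengths) if n == 0]
    overloaded = [c for c, n in enumerate(runqueue_lengths) if n >= 2]
    # Every idle core paired with every core that has a waiting thread.
    return [(i, o) for i in idle for o in overloaded]


# Hypothetical snapshot: core 0 is idle while cores 1 and 3 have waiters.
violations = find_violations([0, 3, 1, 2])
balanced = find_violations([1, 1, 0])
```

A short-lived violation is expected while the load balancer catches up, so a practical check would only flag pairs that persist across several snapshots.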

Shinjuku: Preemptive Scheduling for microsecond-scale Tail Latency

Authors: Kostis Kaffes, Timothy Chong, Jack Tigar Humphries, Adam Belay, David Mazieres, Christos Kozyrakis
Venue: NSDI 2019

Shinjuku is a dataplane operating system that leverages hardware support for virtualization to enable microsecond-scale preemption. In network processing, there is a fundamental tension between optimizing for throughput and optimizing for latency. Interrupt cores too frequently, and throughput will drop because of context switching overheads. Conversely, infrequent preemption can lead to poor tail latency, as short requests can get stuck behind long requests. This is particularly prevalent in bimodal distributions, such as a DB server processing short get() and put() requests while also servicing scans. Shinjuku first greatly reduces context switching overhead by leveraging Dune, enabling direct access to APICs, and applying other optimizations. With low-overhead context switching enabled, the authors then focus on effective preemptive scheduling algorithms. The authors utilize cen...
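The head-of-line blocking problem is easy to see in a toy model. The sketch below is not Shinjuku's scheduler: it assumes a single worker, all requests queued at time 0, zero preemption cost, and a plain round-robin policy with a hypothetical quantum. It only illustrates why preemption helps short requests in a bimodal workload.

```python
from collections import deque


def fcfs(service_times):
    """Run each request to completion in arrival order; return finish times."""
    now, finish = 0, []
    for s in service_times:
        now += s
        finish.append(now)
    return finish


def preemptive(service_times, quantum):
    """Round-robin with a fixed quantum; preempted requests rejoin the tail."""
    queue = deque(enumerate(service_times))
    now, finish = 0, [0] * len(service_times)
    while queue:
        idx, remaining = queue.popleft()
        ran = min(quantum, remaining)
        now += ran
        if remaining > ran:
            queue.append((idx, remaining - ran))  # preempted, not finished
        else:
            finish[idx] = now
    return finish


# Hypothetical workload in microseconds: one long scan, then three short gets.
requests = [500, 1, 1, 1]
fcfs_finish = fcfs(requests)
rr_finish = preemptive(requests, quantum=5)
```

In this toy trace the short requests wait behind the full scan under run-to-completion, but finish within a few microseconds once the scan can be preempted, while the scan itself finishes only slightly later. Real preemption has a cost, which is why reducing context switching overhead comes first.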

Exokernel: An Operating System Architecture for Application-Level Resource Management

Authors: Dawson R. Engler, M. Frans Kaashoek, James O'Toole Jr.
Venue: SOSP 1995

This work presents a solution to a fundamental constraint of modern operating systems: that generality degrades performance, or conversely, that the lower the level of a primitive, the more efficiently it can be implemented. The exokernel operating system centers on a single goal: separating protection from management. The exokernel itself, Aegis, uses secure bindings, visible resource revocation, and an abort protocol to handle protection. However, low-level interfaces are exposed so that "library operating systems" can efficiently implement abstractions tailored to their needs. Some examples of this are explicit memory management, custom IPC (inter-process communication), and exception handling. The details of how this is achieved I would describe as "less than straightforward"; however, there are extensive resources describing it further, such as this book. I think more ...