Posts

Showing posts from November, 2019

Continuous Control with Deep Reinforcement Learning (DDPG)

Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicholas Heess, Tom Erez, Yuval Tassa, David Silver, & Daan Wierstra
Venue: ICLR 2016

This work focuses on solving the problem of a complex environment AND complex actions. Formally, the work presents "an actor-critic, model-free algorithm based on the deterministic policy gradient (DPG) that can operate over continuous action spaces". This work builds on two prior works, namely the Deep Q Network (DQN) and the DPG algorithm. While DQN proposed using a deep neural network to enable RL to perform well in more complex tasks, it suffers from instability in large action spaces. Orthogonally, DPG offers a solution to large action spaces, but cannot support use of a deep network. This work extends DPG to fix the instability issues by adding batch normalization and a target network. Batch normalization normalizes each dimension such that samples in a minibatch have zero mean and unit variance. The target ne...
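The batch normalization step mentioned in the DDPG summary above can be sketched in a few lines. This is only a minimal illustration, not the authors' implementation: the function name `batch_normalize`, the `eps` constant, and the synthetic `states` minibatch are assumptions made for the example, and the learned scale and shift parameters and the running statistics used at test time are left out.

```python
import numpy as np


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature dimension across the samples in a minibatch.

    batch: array of shape (num_samples, num_features).
    eps:   small constant (an assumed value) that avoids division by zero.
    """
    mean = batch.mean(axis=0)  # per-dimension mean over the minibatch
    var = batch.var(axis=0)    # per-dimension variance over the minibatch
    return (batch - mean) / np.sqrt(var + eps)


# Hypothetical minibatch of 64 states whose 3 features live on very different scales.
rng = np.random.default_rng(0)
states = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 20.0, 0.1], size=(64, 3))

normalized = batch_normalize(states)
print(normalized.mean(axis=0))  # each entry is close to 0
print(normalized.var(axis=0))   # each entry is close to 1
```

After normalization every feature has roughly the same scale, which is why the technique helps when state components have different physical units.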

Resource Management with Deep Reinforcement Learning

Authors: Hongzi Mao, Mohammad Alizadeh, Ishai Menache, Srikanth Kandula
Venue: HotNets-XV

This work presents DeepRM, a deep reinforcement learning approach to the bin-packing task of job scheduling in a cluster. The authors use a synthetic environment comprising d resource types. Jobs arrive online and are scheduled at discrete time steps. No preemption occurs. In their simulated framework, they find the RL algorithm improves average slowdown significantly compared to Tetris, Shortest Job First, and Packer. However, average job completion time is slightly higher. Intuitively, this makes sense since the RL algorithm is given a single reward, which is defined with respect to slowdown in this work. Overall, this work marks an important step toward automation of job scheduling in a resource-constrained environment.

RL Formulation

To keep the state representation a fixed size, only the vector representation of M jobs is encoded in the state space, plus a scal...
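To make the difference between the two metrics in the DeepRM summary above concrete, here is a small sketch. It assumes the common definition of slowdown as completion time divided by the job's ideal duration. The helper `schedule_metrics` and the two toy schedules are hypothetical and do not come from the paper.

```python
def schedule_metrics(jobs):
    """Return (average completion time, average slowdown) for a schedule.

    jobs: list of (arrival, duration, finish) tuples, all in time steps.
    Slowdown is assumed here to be completion time / ideal duration.
    """
    completion_times = [finish - arrival for arrival, _, finish in jobs]
    slowdowns = [(finish - arrival) / duration for arrival, duration, finish in jobs]
    n = len(jobs)
    return sum(completion_times) / n, sum(slowdowns) / n


# Two jobs arrive at t=0 on a single resource: one of length 1, one of length 10.
short_first = [(0, 1, 1), (0, 10, 11)]  # short job runs first
long_first = [(0, 10, 10), (0, 1, 11)]  # long job runs first

avg_ct_a, avg_sd_a = schedule_metrics(short_first)
avg_ct_b, avg_sd_b = schedule_metrics(long_first)
print(avg_ct_a, avg_sd_a)
print(avg_ct_b, avg_sd_b)
```

Because slowdown divides by job length, delaying the short job is penalized far more heavily than the same delay would be under raw completion time. A policy rewarded only on slowdown therefore has no direct incentive to minimize completion time.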

FACT: A Framework for Adaptive Contention-aware Thread Migrations

Authors: Kishore Kumar Pusukuri, David Vengerov, Alexandra Fedorova, Vana Kalogeraki
Venue: Computing Frontiers (CF) 2011

This paper presents one of the first applications of machine learning to the thread scheduling problem on multi-core systems. In 2011 (and I believe still today, in 2019), OSes do not factor in the effects of sharing resources such as the cache, prefetcher, memory bus, and memory controller. To schedule tasks effectively, the OS needs to understand how different workloads utilize resources and the overall effects of resource sharing. This paper uses a machine learning approach to predict the effects of potential thread migrations. The work finds that fuzzy rule-based predictors work best, outperforming the default scheduler by ~11% and the prior art by ~2%. The remainder of this post discusses the algorithm and problem setup. This discussion may come across as critical, but it is meant only to be thought-provoking, and counterarguments are welcome. The base algor...

AsmDB: Understanding and Mitigating Front-end Stalls in Warehouse-Scale Computers

Authors: Grant Ayers et al.
Venue: ISCA 2019

Previous works have highlighted a significant frontend bottleneck in warehouse-scale computers (WSC). A variety of solutions have been proposed, both on real hardware and in architecture papers, to mitigate the issue. This paper analyzes more than 90% of Google's fleet to characterize, at fine granularity, when, where, and how frontend bottlenecks occur. AsmDB comprises post-processed last branch record (LBR) data, used to form control flow probabilities, and precise information about which instructions triggered I-cache misses. The analysis shows several core reasons for I-cache misses: large jump distances (either via function call or indirect branch) and cold code being brought into the cache (either via cache blocks or prefetching). The remainder of the paper transitions to being much more compiler focused. It describes a software prefetching algorithm which utilizes AsmDB information to inject prefetches at ideal poin...

SoftSKU: Optimizing Server Architectures for Microservice Diversity @Scale

Authors: Akshitha Sriraman, Abhishek Dhanotia, Thomas F. Wenisch
Venue: ISCA 2019

This work comprises two main parts: a detailed analysis and a tool to tune coarse-grain parameters based on general application (microservice) behavior. The authors analyze workloads in Facebook's datacenter in the categories of Web, Feed, Ads, and Cache, which have varying throughput and latency requirements. The data center workloads exhibit significant front-end stalls (instruction fetch misses), significant branch mispredictions, and significant back-end stalls (mostly data cache misses). uSKU is presented as a tool which automates the process of parameter tuning in an effort to improve system optimization for specific classes of microservices. Core frequency, uncore frequency, core count, code-and-data prioritization, prefetchers, and transparent and static huge pages are explored. Knobs are tested independently and thus do not consider dependent effects (Gaussian process search seem...