Resource Management with Deep Reinforcement Learning

Authors: Hongzi Mao, Mohammad Alizadeh, Ishai Menache, Srikanth Kandula
Venue: HotNets-XV

This work presents DeepRM, a deep reinforcement learning approach to the bin-packing problem of job scheduling in a cluster. The authors use a synthetic environment that comprises d resource types. Jobs arrive online and are scheduled at discrete time steps, with no preemption. In their simulated framework, they find that the RL agent significantly improves average slowdown compared to Tetris, Shortest Job First, and Packer, although average job completion time is slightly higher. Intuitively, this makes sense since the RL algorithm optimizes a single reward, which in this work is defined with respect to slowdown. Overall, this work marks an important step toward automating job scheduling in a resource-constrained environment.
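To make the metric distinction concrete, here is a minimal sketch (not from the paper) of how the two metrics could be computed; the job fields (arrival, finish, duration) are assumed names for illustration.

```python
# Hedged sketch of the two metrics contrasted above; field names are illustrative.
def average_slowdown(jobs):
    # Slowdown of a job = (finish - arrival) / ideal duration; >= 1, larger is worse.
    return sum((j.finish - j.arrival) / j.duration for j in jobs) / len(jobs)

def average_completion_time(jobs):
    # Completion time of a job = finish - arrival (waiting time plus service time).
    return sum(j.finish - j.arrival for j in jobs) / len(jobs)
```

A scheduler tuned for slowdown tends to favor short jobs (their slowdown blows up quickly when delayed), which can lengthen some long jobs and push average completion time up slightly, consistent with the reported results.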

RL Formulation
To keep the state representation fixed in size, only the vector representation of M jobs is encoded in the state, plus a scalar that tracks the number of waiting jobs not yet represented (the backlog). During each time step, the agent makes multiple decisions (actions): in each decision it selects one of the M jobs to schedule, or no job. The agent repeatedly makes decisions until it chooses an invalid action (not enough resources) or no job. At each time step the reward is the sum of -1/T_j over all jobs j currently in the system, where T_j is the ideal (unimpeded) duration of job j, so the cumulative return tracks total slowdown. The system is trained episodically using policy gradients with a baseline. One part that is unclear is how multiple actions per time step can be used to update the model when only a single reward is given for that time step. How is this encoded into the loss function and the corresponding update?
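As a hedged sketch of how such an update might look (not the authors' implementation), one option is to record every intra-timestep decision as its own trajectory step, give intermediate decisions zero reward, and attach the per-timestep reward to the decision that advances time; each decision is then weighted by the return from its position onward, minus a time-based baseline. The policy and env interfaces below are assumed for illustration.

```python
import numpy as np

def compute_returns(rewards, gamma=1.0):
    """Cumulative (optionally discounted) return from each step onward."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def train_iteration(policy, env, num_episodes=20, lr=1e-3):
    # Collect several episodes with the current policy.
    trajectories = []
    for _ in range(num_episodes):
        states, actions, rewards = [], [], []
        state, done = env.reset(), False
        while not done:
            action = policy.sample(state)              # one of the M jobs, or "no job"
            next_state, reward, done = env.step(action)
            # Assumption: env returns 0 reward for intra-timestep decisions and
            # sum_j(-1/T_j) over jobs in the system when the decision advances time.
            states.append(state)
            actions.append(action)
            rewards.append(reward)
            state = next_state
        trajectories.append((states, actions, compute_returns(rewards)))

    # Time-based baseline: average return at each step index across episodes.
    max_len = max(len(r) for _, _, r in trajectories)
    baseline = np.zeros(max_len)
    counts = np.zeros(max_len)
    for _, _, returns in trajectories:
        baseline[:len(returns)] += returns
        counts[:len(returns)] += 1
    baseline /= np.maximum(counts, 1)

    # REINFORCE update: every (state, action) pair, including the extra decisions
    # taken within a single time step, is weighted by its own advantage.
    for states, actions, returns in trajectories:
        advantages = returns - baseline[:len(returns)]
        policy.update(states, actions, advantages, lr)
```

Under this reading, the single per-timestep reward still reaches every decision made in that time step through the cumulative return, which may be how the multi-action update is folded into the loss; the paper itself does not spell this out.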



