Flexible and Efficient Decision-Making for Proactive Latency-Aware Self-Adaptation
Venue: Transactions on Autonomous and Adaptive Systems (TAAS)
Authors: Gabriel A. Moreno, Javier Camara, David Garlan, Bradley Schmerl
The title essentially encapsulates the problem this paper addresses. The setting is a Markov Decision Process, but with deterministic adaptations that do not affect the evolution of the environment. Put another way, decisions do not impact the environment state, only the reward (and penalty). This leads to separate notions of environment state and system state; note that the utility (reward) may be a function of both. The innovation of the paper lies in modeling the fact that, although deterministic, the different adaptation actions have different latencies before their effects show up in the system state.
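To make that split concrete, here is a minimal sketch (not the authors' formalization) of a decision model where adaptation actions change the system state deterministically, the environment evolves stochastically on its own, and utility depends on both. The state names, transition probabilities, and utility function are illustrative placeholders.

```python
import random

# Environment DTMC: hypothetical request-arrival-rate levels (illustrative).
ENV_TRANSITIONS = {
    "low":    {"low": 0.7, "medium": 0.3},
    "medium": {"low": 0.2, "medium": 0.5, "high": 0.3},
    "high":   {"medium": 0.4, "high": 0.6},
}

def apply_action(system_state: int, action: str) -> int:
    """Deterministic effect of an adaptation on the system configuration
    (e.g., number of active servers). The action never changes the environment."""
    if action == "add_server":
        return system_state + 1
    if action == "remove_server":
        return max(1, system_state - 1)
    return system_state  # no-op

def utility(env_state: str, system_state: int) -> float:
    """Utility is a function of both environment and system state."""
    load = {"low": 1.0, "medium": 2.0, "high": 3.0}[env_state]
    revenue = 10.0 * min(load, system_state)   # reward for served load
    cost = 1.0 * system_state                  # penalty for resources in use
    return revenue - cost

def step(env_state: str, system_state: int, action: str):
    """One decision step: deterministic system update, stochastic environment."""
    next_system = apply_action(system_state, action)
    choices, probs = zip(*ENV_TRANSITIONS[env_state].items())
    next_env = random.choices(choices, probs)[0]
    return next_env, next_system, utility(next_env, next_system)

if __name__ == "__main__":
    env, sys_state = "medium", 2
    for action in ["add_server", "none", "remove_server"]:
        env, sys_state, u = step(env, sys_state, action)
        print(action, env, sys_state, round(u, 2))
```

The point of the sketch is only the separation of concerns: the environment transition is the sole source of randomness, while the decision changes which (deterministic) system trajectory accrues the reward.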
The environment evolution is modeled as a discrete-time Markov chain (DTMC), which is used to generate a partial probability tree via an Extended Pearson-Tukey (EP-T) three-point approximation (see the sketch below). To encode latency, the progress of an adaptation is folded into the system state. The authors use Alloy to reduce the search space to only feasible structures, and this component is computed offline. This is essentially how the work differs from the authors' previous work: a large portion of the computation is moved offline. Since all system transitions are deterministic, the feasibility of decisions can be computed offline, and only environment transitions and rewards must be evaluated online. The results of the paper are very compelling, although the actual computation time is not entirely clear.
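For intuition on the tree construction: the EP-T approximation replaces a continuous forecast with three points, the 5th, 50th, and 95th percentiles weighted 0.185, 0.630, and 0.185, so each lookahead step branches only three ways. The sketch below builds such a tree under the assumption of a normal forecast at each node; that distribution choice and the parameter values are mine for illustration, not the paper's environment model.

```python
from dataclasses import dataclass, field
from statistics import NormalDist

# EP-T three-point approximation: (quantile, weight) pairs.
EPT = [(0.05, 0.185), (0.50, 0.630), (0.95, 0.185)]

@dataclass
class Node:
    value: float              # predicted environment value (e.g., arrival rate)
    prob: float               # probability of reaching this node
    children: list = field(default_factory=list)

def build_tree(mean: float, std: float, horizon: int, prob: float = 1.0) -> Node:
    """Expand a 3-ary probability tree of environment forecasts to `horizon` steps."""
    node = Node(value=mean, prob=prob)
    if horizon == 0:
        return node
    dist = NormalDist(mean, std)
    for quantile, weight in EPT:
        child_mean = dist.inv_cdf(quantile)
        node.children.append(build_tree(child_mean, std, horizon - 1, prob * weight))
    return node

def leaves(node: Node):
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

if __name__ == "__main__":
    root = build_tree(mean=100.0, std=10.0, horizon=3)
    total = sum(leaf.prob for leaf in leaves(root))
    print(f"{3**3} leaves, total probability = {total:.3f}")  # ~1.000
```

Because each branch carries one of only three weights, the tree stays small enough to evaluate candidate adaptation paths against it within a short decision horizon.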
Personal note: This is one of the most interesting and mathematically complex papers I've read in a while. It draws from reinforcement learning as well as queuing theory and formal verification, and the use of such sophisticated techniques appears to yield exceedingly positive results. I hope to follow up with the authors separately about some details.
Full Paper