Abstract
We consider a special type of continuous-time Markov decision
process (MDP) that arises when phase-type distributions are used to
model the timing of non-Markovian events and actions. We focus
primarily on the execution of phase-dependent policies. Phases are
introduced into a model to represent relevant execution history, but
there is no physical manifestation of phases in the real world. We
treat phases as partially observable state features and show how a
belief distribution over phase configurations can be derived from
observable state features through the use of transient analysis for
Markov chains. This results in an efficient method for phase tracking
during execution that can be combined with the QMDP
value method for POMDPs to make action choices. We also briefly
discuss how the structure of MDPs with phase transitions can be
exploited in structured value iteration with symbolic representation
of vectors and matrices.
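
The phase-tracking idea can be illustrated with a minimal sketch. It is not taken from the paper; it assumes a single event whose delay follows a phase-type distribution with sub-generator matrix S and initial phase distribution alpha, and that the only observation is that the event has not yet triggered after t time units. The belief over phases is then the transient distribution alpha * exp(S t), renormalized over the surviving probability mass. The transient distribution is computed here by uniformization, a standard technique for Markov chains. The function names, the Erlang-2 example, and the Q-values are all hypothetical.

```python
import math


def transient_distribution(S, alpha, t, tol=1e-12, max_terms=100000):
    """Approximate alpha * exp(S * t) by uniformization.

    S is a (sub-)generator matrix given as a list of rows; alpha is the
    initial distribution over phases. Suitable for moderate values of
    (max exit rate) * t; large values would need a more careful
    treatment of the Poisson weights.
    """
    n = len(S)
    lam = max(-S[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(alpha)
    # One-step matrix of the uniformized discrete-time chain: P = I + S / lam.
    P = [[(1.0 if i == j else 0.0) + S[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of zero jumps
    v = list(alpha)              # alpha * P^k, starting with k = 0
    result = [weight * x for x in v]
    accumulated = weight
    k = 0
    while 1.0 - accumulated > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k    # Poisson probability of k jumps
        accumulated += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


def phase_belief(S, alpha, t):
    """Belief over phases given that the event has not triggered by time t."""
    unnormalized = transient_distribution(S, alpha, t)
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("event has triggered with probability one")
    return [x / total for x in unnormalized]


def qmdp_action(belief, q_values):
    """Pick the action maximizing the belief-weighted Q-value (QMDP rule).

    q_values maps each action name to a list of Q-values, one per phase.
    """
    def expected_value(action):
        return sum(b * q for b, q in zip(belief, q_values[action]))
    return max(q_values, key=expected_value)


# Hypothetical example: an Erlang-2 delay with rate 2 per phase.
S = [[-2.0, 2.0],
     [0.0, -2.0]]
alpha = [1.0, 0.0]
belief = phase_belief(S, alpha, 0.5)

# Hypothetical Q-values for two actions in each of the two phases.
q_values = {"wait": [1.0, 0.0], "act": [0.0, 2.0]}
chosen = qmdp_action(belief, q_values)
```

In this example the two phases are equally likely after 0.5 time units, so the QMDP rule compares a weighted value of 0.5 for "wait" against 1.0 for "act" and selects "act". The Q-values would in practice come from solving the underlying MDP with phases treated as fully observable.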
Full paper: PDF (6 pages, 23 references)
Copyright © 2005, American Association for Artificial Intelligence. All rights reserved.
Presentation: PPT, PDF (20 slides)
Håkan L. S. Younes
Last modified: Mon Mar 13 12:55:25 EST 2006