Abstract:
Controlled stochastic systems occur in
science, engineering, manufacturing, social sciences,
and many other contexts. If the system is modeled
as a Markov decision process (MDP) and will run
ad infinitum, the optimal control policy can
be computed in polynomial time using linear programming.
The problems considered here assume that the time that the
process will run is finite, and based on the size of the input.
There are many factors that compound the complexity of computing
the optimal policy. For instance
there are many factors that compound the complexity of this
computation. For instance, if the controller does not have
complete information about the state of the system, or if
the system is represented in some very succinct manner,
the optimal policy is provably not computable
in time polynomial in the size of the input.
We analyze the computational complexity of evaluating
policies and of determining whether a sufficiently good
policy exists for a MDP, based on a number of confounding factors,
including the observability of the system state; the
succinctness of the representation; the type of policy;
even the number of actions relative to the number of states.
In almost every case, we show that the decision problem is
complete for some known complexity class.
Some of these results are familiar from work by Papadimitriou
and Tsitsiklis and others, but some, such as our PL-completeness
proofs, are surprising. We
include proofs of completeness for natural problems in
the as yet little-studied classes NP(PP).
Back to my homepage