Bayesian exploration for approximate dynamic programming

Citation:

Ryzhov, I.O., et al. Bayesian exploration for approximate dynamic programming. Submitted to Operations Research.
2015_ryzhov_adp.pdf (1 MB)

Abstract:

Approximate dynamic programming (ADP) is a general methodological framework for multi-stage stochastic optimization problems in transportation, finance, energy, and other applications where scarce resources must be allocated optimally. We propose a new approach to the exploration/exploitation dilemma in ADP. First, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; such Bayesian models integrate easily into both parametric and non-parametric value function approximations. Second, we propose a new exploration strategy, based on the concept of the value of information from the optimal learning literature, for systematically exploring the state and action space. Using several resource allocation problems, we demonstrate that our method is highly competitive with other exploration strategies, with the benefits being particularly valuable in online implementations where computation time is less restrictive.
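As a rough illustration of the two ingredients described in the abstract, the Python sketch below is not taken from the paper: it assumes a linear value function approximation V(s) ≈ θᵀφ(s) with a Gaussian belief over θ, a conjugate update from noisy value observations, and a simplified, independent-beliefs knowledge-gradient score as the value of information. All class and function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

class BayesianValueBelief:
    """Gaussian belief over the weights theta of a linear value function
    V(s) ~ theta^T phi(s). A minimal sketch, not the paper's exact model."""

    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.mu = np.zeros(dim)             # posterior mean of theta
        self.cov = prior_var * np.eye(dim)  # posterior covariance of theta
        self.noise_var = noise_var          # variance of observed value estimates

    def mean(self, phi):
        return self.mu @ phi

    def update(self, phi, v_hat):
        """Conjugate Gaussian update after observing a noisy value v_hat
        at a state with feature vector phi (recursive least squares)."""
        gamma = self.noise_var + phi @ self.cov @ phi
        gain = self.cov @ phi / gamma
        self.mu = self.mu + gain * (v_hat - self.mu @ phi)
        self.cov = self.cov - np.outer(gain, self.cov @ phi)

def value_of_information(belief, phis):
    """Knowledge-gradient-style score for each candidate action, treating
    the actions' value estimates as independent (a common simplification)."""
    means = np.array([belief.mean(p) for p in phis])
    scores = np.zeros(len(phis))
    for a, phi in enumerate(phis):
        # Predictive std. dev. of the change in this action's posterior mean
        # after one more observation: (phi' S phi) / sqrt(noise + phi' S phi).
        var = phi @ belief.cov @ phi
        sigma_tilde = var / np.sqrt(belief.noise_var + var)
        if sigma_tilde < 1e-12 or len(phis) < 2:
            continue
        best_other = np.max(np.delete(means, a))
        zeta = -abs(means[a] - best_other) / sigma_tilde
        # KG factor f(z) = z * Phi(z) + phi(z), scaled by sigma_tilde.
        scores[a] = sigma_tilde * (zeta * norm.cdf(zeta) + norm.pdf(zeta))
    return scores

# Example: pick the action maximizing exploitation value plus value of information.
rng = np.random.default_rng(0)
phis = [rng.normal(size=5) for _ in range(4)]  # features of 4 candidate downstream states
belief = BayesianValueBelief(dim=5)
exploit = np.array([belief.mean(p) for p in phis])
action = int(np.argmax(exploit + value_of_information(belief, phis)))
```

In this sketch, actions with a large gap to the current best are still sampled when their posterior uncertainty is high, which is the mechanism by which a value-of-information criterion drives systematic rather than random exploration.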

Last updated on 07/22/2015