Publications

Manuscript
Li, J. & Ryzhov, I.O., Submitted. Convergence rates of epsilon-greedy global optimization under radial basis function interpolation.Abstract

We study a global optimization problem where the objective function can be observed exactly at individual design points with no derivative information. We suppose that the design points are determined sequentially using an epsilon-greedy algorithm, i.e., by sampling uniformly on the design space with a certain probability, and otherwise sampling in a local neighborhood of the solution currently believed to be the best. We derive both pathwise and probabilistic bounds on the rate of convergence to the best solution. Convergence rates improve when the width of the local search neighborhood is made to shrink over time at a suitably chosen speed.

PDF icon 2020_liry_rbf.pdf
Zhou, J. & Ryzhov, I.O., Submitted. Revenue management of observable express service with customer choice.Abstract

We study a stylized queueing model motivated by paid express lanes on highways. There are two parallel, observable queues with finitely many servers; one queue has a faster service rate, but charges a fee to join, and the other is free but slow. Upon arrival, customers see the state of each queue and choose between them by comparing the respective disutility of time spent waiting, subject to random shocks; this framework encompasses both the multinomial logit and exponomial customer choice models. Using a fluid limit analysis, we give a detailed characterization of the equilibrium in this system. We show that social welfare is optimized when the express queue is exactly at (but not over) full capacity; however, in some cases, revenue is maximized by artificially creating congestion in the free queue. The latter behaviour is caused by changes in the price elasticity of demand as the service capacity of the free queue fills up.

PDF icon 2019_zhry_queues.pdf
Chen, Y., et al., Submitted. Data-driven robust resource allocation with isotonic cost functions.Abstract

We consider two-stage planning problems (arising, e.g., in city logistics) where a resource is first divided among a set of independent regions, and then costs are incurred based on the allocation to each region. Costs are assumed to be decreasing in the quantity of the resource, but their precise values are unknown, for example if they represent difficult expected values. We develop a new data-driven uncertainty model for isotonic cost functions, which can be used in conjunction with robust optimization to obtain tractable allocation decisions that significantly improve worst-case performance outcomes. Our model uses a novel uncertainty set construction that rigorously handles isotonic structure by reverse-engineering a statistical goodness-of-fit test with respect to a given sample of data. The practical value of this approach is demonstrated in two applications on realistic transportation networks.

PDF icon 2019_chmarysc_isotonic.pdf
Chen, Y. & Ryzhov, I.O., Submitted. Balancing optimal large deviations in sequential selection.Abstract

In the ranking and selection problem, a sampling budget is allocated among a finite number of designs with the goal of efficiently identifying the best. Allocations of this budget may be static (with no dependence on the random values of the samples) or adaptive (decisions are made based on the results of previous decisions). A popular methodological strategy in the simulation literature is to first characterize optimal static allocations, by using large deviations theory to derive a set of optimality conditions, and then to use these conditions to guide the design of adaptive allocations. We propose a new methodology that can be guaranteed to adaptively learn the solution to these optimality conditions in a computationally efficient manner, without any tunable parameters, and under general assumptions on the sampling distribution.

PDF icon 2019_chry_rates.pdf
Gu, L., Ryzhov, I.O. & Eftekhar, M., Submitted. The facts on the ground: evaluating humanitarian fleet management policies using simulation.Abstract

In humanitarian fleet management, the performance of purchase, assignment, and sales decisions is determined by dynamic interactions between the fleet composition (vehicles that were acquired at different times and have different residual values), the time-varying and uncertain demands on the fleet, and the depreciation of the vehicles as they are exploited. When all of these factors are taken into account, optimal decisions become analytically intractable. We propose to evaluate purchase, assignment, and sales policies in a realistic simulation environment that directly models heterogeneous vehicle attributes and tracks their evolution over time. Using data from a large international humanitarian organization (LIHO), the simulator can identify the rationale behind seemingly ad-hoc decisions by field managers at LIHO. For instance, by selling vehicles later than LIHO recommends, managers are actually reducing their costs; similarly, managers decline to coordinate vehicles between mission types because the merits of "sharing" in this way turn out to be marginal at best.

PDF icon 2019_guryef_logistics.pdf

Finalist, POMS Humanitarian Operations and Crisis Management Best Paper Award, 2019.

Fan, Y., Liao, Y. & Ryzhov, I.O., Submitted. A dynamic screening algorithm for hierarchical binary marketing data.Abstract

In many applications of business and marketing analytics, statistical models are fit using hierarchically structured data: common characteristics of products and customers are represented as categorical variables, and each category can be split up into multiple subcategories at a lower level of the hierarchy. Hundreds of thousands of binary variables may be required to model the hierarchy, necessitating the use of variable selection to screen out large numbers of irrelevant or insignificant features. We propose a new dynamic screening method, based on the distance correlation criterion, designed for hierarchical binary data. Our method can screen out large parts of the hierarchy at the higher levels, avoiding the need to explore many lower-level features and greatly reducing the computational cost of screening. The practical potential of the method is demonstrated in a case study involving a large volume of B2B transaction data.

PDF icon 2018_faliry_ddc.pdf
Gu, L., et al., Submitted. Social behavior and user engagement in competitive online gaming: an empirical analysis.Abstract

We conduct an empirical study of customer engagement, loyalty, and purchase behavior, based on a high-resolution player-level dataset from a major international video game company for one of its premier titles. Based on these data, we first define several metrics to quantify user satisfaction and loyalty, then study the effects on these metrics of in-game user behavior. Our analysis has a particular emphasis on social behavior, as video games have increasingly taken on a social dimension, with social interaction becoming an important motivating factor for users. Our results show that both gameplay and social behavior is strongly connected to short-term user engagement; however, most of the positive effects disappear when we consider long-term retention or user purchases. In many cases, the strength of the social experience appears to detract from the user's commitment to the game, which argues against the strategic investment of resources by game developers into the design of social gameplay.

PDF icon 2019_gu_gaming.pdf
Journal Article
Peng, Y., et al., In Press. Efficient sampling allocation procedures for selecting the optimal quantile. INFORMS Journal on Computing.Abstract

We derive new sampling allocation schemes, based on myopic allocation policies and optimal computing budget allocation, for selecting the design with the optimal quantile in a Bayesian framework, analogous to existing methods in classic ranking and selection for selecting the design with the optimal mean. Theoretical analyses and empirical results demonstrate that these approaches have complementary advantages. To that end, we also provide a switching strategy that has balanced performance in both small-sample and large-sample scenarios.

PDF icon 2016_peng_quantile.pdf
Chen, Y. & Ryzhov, I.O., 2020. Consistency analysis of sequential learning under approximate Bayesian inference. Operations Research , 68 (1) , pp. 295-307. PDFAbstract

Approximate Bayesian inference is a powerful methodology for constructing computationally efficient statistical mechanisms for sequential learning from incomplete or censored information. Approximate Bayesian learning models have proven successful in a variety of operations research and business problems; however, prior work in this area has been primarily computational, and the consistency of approximate Bayesian estimators has been a largely open problem. We develop a new consistency theory by interpreting approximate Bayesian inference as a form of stochastic approximation (SA) with an additional "bias" term. We prove the convergence of a general SA algorithm of this form, and leverage this analysis to derive the first consistency proofs for a suite of approximate Bayesian models from the recent literature.

Qu, H., et al., 2020. Learning demand curves in B2B pricing: a new framework and case study. Production and Operations Management , 29 (5) , pp. 1287-1306. PDFAbstract

In business-to-business (B2B) pricing, a seller seeks to maximize revenue obtained from high-volume transactions involving a wide variety of buyers, products, and other characteristics. Buyer response is highly uncertain, and the seller only observes whether buyers accept or reject the offered prices. These deals are also subject to high opportunity cost, since revenue is zero if the price is rejected. The seller must adapt to this uncertain environment and learn quickly from new deals as they take place. We propose a new framework for statistical and optimal learning in this problem, based on approximate Bayesian inference, that has the ability to measure and update the seller's uncertainty about the demand curve based on new deals. In a case study, based on historical data, we show that our approach offers significant practical benefits.

Huang, Y., et al., 2019. Optimal learning for urban delivery fleet allocation. Transportation Science , 53 (3) , pp. 623-641. PDFAbstract

In a two-tiered city logistics system, an urban logistics company usually partitions the urban area into regions and allocates its delivery fleet (e.g., vehicles, couriers) to these regions. On a daily basis, the delivery station in each region receives the delivery packages from the city distribution centers and delivers them to customers within the region, using its allocated delivery vehicles. A tactical decision in such a city logistics system is the allocation of its delivery fleet to the regions to minimize the expected operational cost of the entire system. However, due to the complexity of the urban delivery operations and the day-to-day variance of the customer demand, an accurate evaluation of the expected operational cost associated with an allocation decision can be very expensive. We propose a learning policy that adaptively selects the fleet allocation to learn the underlying expected operational cost function by incorporating the value of information. Specifically, we exploit the monotonicity of the expected operational cost in the number of allocated delivery vehicles in a region and extend the idea of knowledge gradient with discrete priors (KGDP) with resampling and re-generation (KGDP-R&R). Our numerical results demonstrate the effectiveness of KGDP-R&R against other learning policies as well as its managerial implications as compared to heuristics in practice.

Chen, Y. & Ryzhov, I.O., 2019. Complete expected improvement converges to an optimal budget allocation. Advances in Applied Probability , 51 (1) , pp. 209-235. PDFAbstract

The ranking and selection problem is a well-known mathematical framework for the formal study of optimal information collection. Expected improvement (EI) is a leading algorithmic approach to this problem; the practical benefits of EI have repeatedly been demonstrated in the literature, especially in the widely studied setting of Gaussian sampling distributions. However, it was recently proved that some of the most well-known EI-type methods achieve suboptimal convergence rates. We investigate a recently-proposed variant of EI (known as "complete EI") and prove that, with some minor modifications, it can be made to converge to the rate-optimal static budget allocation without requiring any tuning.

Ryzhov, I.O., et al., 2019. Bayesian exploration for approximate dynamic programming. Operations Research , 67 (1) , pp. 198-214. PDFAbstract

Approximate dynamic programming (ADP) is a general methodological framework for multi-stage stochastic optimization problems in transportation, finance, energy, and other applications where scarce resources must be allocated optimally. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information, and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and non-parametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, while more computationally intensive, it is highly competitive against other exploration strategies.

Ryzhov, I.O., 2018. The local time method for targeting and selection. Operations Research , 66 (5) , pp. 1406-1422. PDFAbstract

We propose a framework for "targeting and selection" (T&S), a new problem class in simulation optimization where the objective is to select a simulation alternative whose mean performance matches a pre-specified target as closely as possible. T&S resembles the more well-known problem of ranking and selection, but presents unexpected challenges: for example, a one-step look-ahead method may produce statistically inconsistent estimates of the values, even under very standard normality assumptions. We create a new and fundamentally different approach, based on a Brownian local time model, that exhibits characteristics of two widely-studied methodologies, namely expected improvement and optimal computing budget allocation. We characterize the asymptotic sampling rates of this method and relate them to the convergence rates of metrics of interest. The local time method outperforms benchmarks in experiments, including problems where the modeling assumptions of T&S are violated.

Marković, N., Ryzhov, I.O. & Schonfeld, P., 2017. Evasive flow capture: a multi-period stochastic facility location problem with independent demand. European Journal of Operational Research , 257 (2) , pp. 687-703. PDFAbstract

We introduce the problem of locating facilities over a finite time horizon with the goal of intercepting stochastic traffic flows that exhibit evasive behavior, which arises when locating weigh-in-motion systems, vehicle inspection stations, tollbooths, or other fixed flow-capturing facilities used for law enforcement. The problem can be formulated as a multi-stage, mixed-integer stochastic program (SP); however, under certain independence assumptions, this can be reformulated as a large two-stage SP, enabling us to solve much larger instances. We additionally propose an algorithm based on Lagrangian relaxation that separates the reformulated SP into a variant of a deterministic knapsack problem and a sum of time-decoupled single-period SPs that can be solved independently. The model and algorithm are tested on instances involving road networks of Nevada and Vermont. A comparison with the previously studied single-period stochastic programming approach shows that the newly proposed multi-period model substantially reduces the expected cost.

Ryzhov, I.O., 2016. On the convergence rates of expected improvement methods. Operations Research , 64 (6) , pp. 1515-1528. PDFAbstract

We consider a ranking and selection problem with independent normal observations, and analyze the asymptotic sampling rates of expected improvement (EI) methods in this setting. Such methods often perform well in practice, but a tractable analysis of their convergence rates is difficult due to the nonlinearity and nonconvexity of the functions used in the EI calculations. We present new results indicating that, for known sampling noise, variants of EI produce simulation allocations that are essentially identical to those chosen by the optimal computing budget allocation (OCBA) methodology, which is known to achieve near-optimal asymptotic performance in ranking and selection. This is the first general equivalence result between EI and OCBA, and it provides insight into the good practical performance of EI. We also derive the limiting allocation for EI under unknown sampling variance.

Winner, INFORMS Simulation Society Outstanding Publication Award, 2017.

Han, B., Ryzhov, I.O. & Defourny, B., 2016. Optimal learning in linear regression with combinatorial feature selection. INFORMS Journal on Computing , 28 (4) , pp. 721-735. PDFAbstract

We present a new framework for sequential information collection in applications where regression is used to learn about a set of unknown parameters, and alternates with optimization to design new data points. Such problems can be handled using the framework of ranking and selection (R&S), but traditional R&S procedures will experience high computational costs when the decision space grows combinatorially. This challenge arises in many applications of business analytics; in particular, we are motivated by the problem of efficiently learning effective strategies for non-profit fundraising. We present a value of information procedure for simultaneously learning unknown regression parameters and unknown sampling noise. We then develop an approximate version of the procedure, based on convex relaxation, that retains good performance and scales better to large problems.

Finalist, INFORMS Washington DC Chapter Student Excellence Award, 2015.

Ryzhov, I.O., Han, B. & Bradić, J., 2016. Cultivating disaster donors using data analytics. Management Science , 62 (3) , pp. 849-866. PDFAbstract

Non-profit organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We propose an empirical model to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recent acquisitions are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has potential to improve success rates from 5.4% to 8.1%.

Ding, Z. & Ryzhov, I.O., 2016. Optimal learning with non-Gaussian rewards. Advances in Applied Probability , 48 (1) , pp. 112-136. PDFAbstract

We propose a novel theoretical characterization of the optimal "Gittins index" policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.

Finalist, INFORMS Junior Faculty Forum Paper Competition, 2014.

Marković, N., Ryzhov, I.O. & Schonfeld, P., 2015. Evasive flow capture: optimal location of weigh-in-motion systems, tollbooths, and security checkpoints. Networks , 65 (1) , pp. 22-42. PDFAbstract

The flow-capturing problem (FCP) consists of locating facilities to maximize the number of flow-based customers that encounter at least one of these facilities along their predetermined travel paths. The FCP literature assumes that if a facility is located along (or “close enough” to) a predetermined path of a flow of customers, that flow is considered captured. However, existing models for the FCP do not consider targeted users who behave noncooperatively by changing their travel paths to avoid fixed facilities. Examples of facilities that targeted subjects may have an incentive to avoid include weigh-in-motion stations used to detect and fine overweight trucks, tollbooths, and security and safety checkpoints. This article introduces a new type of flow-capturing model, called the “evasive flow-capturing problem” (EFCP), which generalizes the FCP and has relevant applications in transportation, revenue management, and security and safety management. We formulate deterministic and stochastic versions of the EFCP, analyze their structural properties, study exact and approximate solution techniques, and show an application to a real-world transportation network.

Winner, Glover-Klingman Prize, 2015.

Ryzhov, I.O., Frazier, P.I. & Powell, W.B., 2015. A new optimal stepsize for approximate dynamic programming. IEEE Transactions on Automatic Control , 60 (3) , pp. 743-758. PDFAbstract

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

Defourny, B., Ryzhov, I.O. & Powell, W.B., 2015. Optimal information blending with measurements in the L2 sphere. Mathematics of Operations Research , 40 (4) , pp. 1060-1088. PDFAbstract
A sequential information collection problem, where a risk-averse decision maker updates a Bayesian belief about the unknown objective function of a linear program, is used to investigate the informational value of measurements performed to refine a robust optimization model. The information is collected in the form of a linear combination of the objective coefficients, subject to random noise. We have the ability to choose the weights in the linear combination, creating a new, nonconvex continuous-optimization problem, which we refer to as information blending. We develop two optimal blending strategies: (1) an active learning method that maximizes uncertainty reduction and (2) an economic approach that maximizes an expected improvement criterion. Semidefinite programming relaxations are used to create efficient convex approximations to the nonconvex blending problem.
Qu, H., et al., 2015. Sequential selection with unknown correlation structures. Operations Research , 63 (4) , pp. 931-948. PDFAbstract

We create the first computationally tractable Bayesian statistical model for learning unknown correlation structures in fully sequential simulation selection. Correlations represent similarities or differences between various design alternatives, and can be exploited to extract much more information from each individual simulation. However, in most applications, the correlation structure is unknown, thus creating the additional challenge of simultaneously learning unknown mean performance values and unknown correlations. Based on our new statistical model, we derive a Bayesian procedure that seeks to optimize the expected opportunity cost of the final selection based on the value of information, thus anticipating future changes to our beliefs about the correlations. Our approach outperforms existing methods for known correlation structures in numerical experiments, including one motivated by the problem of optimal wind farm placement, where real data are used to calibrate the simulation model.

Winner, INFORMS Computing Society Best Student Paper Award, 2012.

Ryzhov, I.O. & Powell, W.B., 2012. Information collection for linear programs with uncertain objective coefficients. SIAM Journal on Optimization , 22 (4) , pp. 1344-1368. PDFAbstract

Consider a linear program (LP) with uncertain objective coefficients, for which we have a Bayesian prior. We can collect information to improve our understanding of these coefficients, but this may be expensive, giving us a separate problem of optimizing the collection of information to improve the quality of the solution relative to the true cost coefficients. We formulate this information collection problem for LPs for the first time and derive a knowledge gradient policy which finds the marginal value of each measurement by solving a sequence of LPs. We prove that this policy is asymptotically optimal and demonstrate its performance on a network flow problem.

Ryzhov, I.O., Powell, W.B. & Frazier, P.I., 2012. The knowledge gradient algorithm for a general class of online learning problems. Operations Research , 60 (1) , pp. 180-195. PDFAbstract

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods. Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated case.

Ryzhov, I.O. & Powell, W.B., 2011. Information collection on a graph. Operations Research , 59 (1) , pp. 188-201. PDFAbstract

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.

Finalist, INFORMS New Jersey Chapter Student Contest, 2009.
Conference Proceedings
Chen, Y. & Ryzhov, I.O., 2019. Balancing optimal large deviations in ranking and selection. Proceedings of the 2019 Winter Simulation Conference , pp. 3368-3379. PDFAbstract

The ranking and selection problem deals with the optimal allocation of a simulation budget to efficiently identify the best among a finite set of unknown values. The large deviations approach to this problem provides very strong performance guarantees for static (non-adaptive) budget allocations. Using this approach, one can describe the optimal static allocation with a set of highly nonlinear, distribution-dependent optimality conditions whose solution depends on the unknown parameters of the output distribution. We propose a new methodology that provably learns this solution (asymptotically) and is very computationally efficient, has no tunable parameters, and works for a wide variety of output distributions.

Xie, W., Zhang, P. & Ryzhov, I.O., 2018. A simulation-based prediction framework for stochastic system dynamic risk management. Proceedings of the 2018 Winter Simulation Conference , pp. 1886-1897. PDFAbstract

We propose a simulation-based prediction framework which can quantify the prediction uncertainty of system future response and further guide operational decisions for complex stochastic systems. Specifically, by exploring the underlying generative process of real-world data streams, we first develop a nonparametric input model which can capture the important properties, including non-stationarity, skewness, componentwise and time dependence. It can improve the prediction accuracy, and the posterior predictive distribution can quantify the prediction uncertainty accounting for both input and stochastic uncertainties. Then, we propose the simulation-based prediction framework which can efficiently search for the optimal operational decisions hedging against the prediction uncertainty and minimizing the expected cost occurring in the planning horizon. The empirical study demonstrates that our approach has promising performance.

Chhabra, M., Das, S. & Ryzhov, I.O., 2018. The promise and perils of myopia in dynamic pricing with censored information. Proceedings of the 27th International Joint Conference on Artificial Intelligence , pp. 4994-5001. PDFAbstract

A seller with unlimited inventory of a digital good interacts with potential buyers with i.i.d. valuations. The seller can adaptively quote prices to each buyer to maximize long-term profits, but does not know the valuation distribution exactly. Under a linear demand model, we consider two information settings: partially censored, where agents who buy reveal their true valuations after the purchase is completed, and completely censored, where agents never reveal their valuations. In the partially censored case, we prove that myopic pricing with a Pareto prior is Bayes optimal and has finite regret. In both settings, we evaluate the myopic strategy against more sophisticated look-aheads using three valuation distributions generated from real data on auctions of physical goods, keyword auctions, and user ratings, where the linear demand assumption is clearly violated. For some datasets, complete censoring actually helps, because the restricted data acts as a “regularizer” on the posterior, preventing it from being affected too much by outliers.

Chen, Y. & Ryzhov, I.O., 2017. Rate-optimality of the complete expected improvement criterion. Proceedings of the 2017 Winter Simulation Conference , pp. 2173-2182. PDFAbstract

Expected improvement (EI) is a leading algorithmic approach to simulation-based optimization. However, it was recently proved that, in the context of ranking and selection, some of the most well-known EI-type methods cause the probability of incorrect selection to converge at suboptimal rates. We investigate a more recent variant of EI (known as "complete EI") that was proposed by Salemi, Nelson, and Staum (2014), and summarize results showing that, with some minor modifications, complete EI can be made to achieve the optimal convergence rate in ranking and selection with independent Gaussian noise. This is the strongest theoretical guarantee available for any EI-type method.

Chen, Y. & Ryzhov, I.O., 2016. Approximate Bayesian inference as a form of stochastic approximation: a new consistency theory with applications. Proceedings of the 2016 Winter Simulation Conference , pp. 534-544. PDFAbstract

Approximate Bayesian inference is a powerful methodology for constructing computationally efficient statistical learning mechanisms in problems where incomplete information is collected sequentially. Approximate Bayesian models have been developed and applied in a variety of different domains; however, this work has thus far been primarily computational, and convergence or consistency results for approximate Bayesian estimators are largely unavailable. We develop a new consistency theory for these learning schemes by interpreting them as stochastic approximation (SA) algorithms with additional "bias" terms. We prove the convergence of a general SA algorithm of this type, and apply this result to demonstrate, for the first time, the consistency of several approximate Bayesian methods from the recent literature.

Finalist, Best Theoretical Paper, Winter Simulation Conference, 2016.

Ryzhov, I.O., 2015. Expected improvement is equivalent to OCBA. Proceedings of the 2015 Winter Simulation Conference , pp. 3668-3677. PDFAbstract

This paper summarizes new theoretical results on the asymptotic sampling rates of expected improvement (EI) methods in fully sequential ranking and selection (R&S). These methods have been widely observed to perform well in practice, and often have asymptotic consistency properties, but rate results are generally difficult to obtain when observations are subject to stochastic noise. We find that, in one general R&S problem, variants of EI produce simulation allocations that are virtually identical to the rate-optimal allocations calculated by the optimal computing budget allocation (OCBA) methodology. This result provides new insight into the good empirical performance of EI under normality assumptions.

Fu, M.C., et al., 2014. Simulation optimization: a panel on the state of the art in research and practice. Proceedings of the 2014 Winter Simulation Conference , pp. 3696-3706. PDFAbstract

The goal of this panel was to discuss the state of the art in simulation optimization research and practice. The participants included representation from both academia and industry, where the latter was represented by participation from a leading software provider of optimization tools for simulation. This paper begins with a short introduction to simulation optimization, and then presents a list of specific questions that served as a basis for discussion during the panel discussion. Each of the panelists was given an opportunity to provide their preliminary thoughts on simulation optimization for this paper, and the remainder of the paper summarizes those points, ranging from assessments of the field from their perspective to initial reactions to some of the posed questions. Finally, one of the panelists who has worked on an online testbed of simulation optimization problems for the research community was asked to provide an update of the status of the Web site.

Chau, M., et al., 2014. Simulation optimization: a tutorial overview and recent developments in gradient-based methods. Proceedings of the 2014 Winter Simulation Conference , pp. 21-35. PDFAbstract

We provide a tutorial overview of simulation optimization methods, including statistical ranking & selection (R&S) methods such as indifference-zone procedures, optimal computing budget allocation (OCBA), and Bayesian value of information (VIP) approaches; random search methods; sample average approximation (SAA); response surface methodology (RSM); and stochastic approximation (SA). In this paper, we provide high-level descriptions of each of the approaches, as well as some comparisons of their characteristics and relative strengths; simple examples will be used to illustrate the different approaches in the talk. We then describe some recent research in two areas of simulation optimization: stochastic approximation and the use of direct stochastic gradients for simulation metamodels. We conclude with a brief discussion of available simulation optimization software.

Qu, H., Ryzhov, I.O. & Fu, M.C., 2013. Learning logistic demand curves in business-to-business pricing. Proceedings of the 2013 Winter Simulation Conference , pp. 29-40. PDFAbstract

This work proposes an approximate Bayesian statistical model for predicting the win/loss probability for a given price in business-to-business (B2B) pricing. This model allows us to learn parameters in logistic regression based on binary (win/loss) data and can be quickly updated after each new win/loss observation. We also consider an approach for recommending target prices based on the approximate Bayesian model, thus integrating uncertainty into decision-making. We test the statistical model and the target price recommendation strategy with synthetic data, and observe encouraging empirical results.

Ding, Z. & Ryzhov, I.O., 2013. Optimal learning with non-Gaussian rewards. Proceedings of the 2013 Winter Simulation Conference , pp. 631-642. PDFAbstract

We propose a theoretical and computational framework for approximating the optimal policy in multiarmed bandit problems where the reward distributions are non-Gaussian. We first construct a probabilistic interpolation of the sequence of discrete-time rewards in the form of a continuous-time conditional Lévy process. In the Gaussian setting, this approach allows an easy connection to Brownian motion and its convenient time-change properties. No such device is available for non-Gaussian rewards; however, we show how optimal stopping theory can be used to characterize the value of the optimal policy, using a free-boundary partial integro-differential equation, for exponential and Poisson rewards. We then solve this problem numerically to approximate the set of belief states possessing a given optimal index value, and provide illustrations showing that the solution behaves as expected.

Chau, M., et al., 2013. An empirical sensitivity analysis of the Kiefer-Wolfowitz algorithm and its variants. Proceedings of the 2013 Winter Simulation Conference , pp. 945-956. PDFAbstract

We investigate the mean-squared error (MSE) performance of the Kiefer-Wolfowitz (KW) stochastic approximation (SA) algorithm and two of its variants, namely the scaled-and-shifted KW (SSKW) in Broadie, Cicek, and Zeevi (2011) and Kesten’s rule. We conduct a sensitivity analysis of KW with various tuning sequences and initial start values and implement the algorithms for two contrasting functions. From our numerical experiments, SSKW is less sensitive to initial start values under a set of pre-specified parameters, but KW and Kesten’s rule outperform SSKW if they begin with well-tuned parameter values. We also investigate the tightness of an MSE bound for quadratic functions, a relevant issue for determining how long to run an SA algorithm. Our numerical experiments indicate the MSE bound for quadratic functions for the KW algorithm is sensitive to the noise level.

Han, B., Ryzhov, I.O. & Defourny, B., 2013. Efficient learning of donor retention strategies for the American Red Cross. Proceedings of the 2013 Winter Simulation Conference , pp. 17-28. PDFAbstract

We present a new sequential decision model for adaptively allocating a fundraising campaign budget for a non-profit organization such as the American Red Cross. The campaign outcome is related to a set of design features using linear regression. We derive the first simulation allocation procedure for simultaneously learning unknown regression parameters and unknown sampling noise. The large number of alternatives in this problem makes it difficult to evaluate the value of information. We apply convex approximation with a quantization procedure and derive a semidefinite programming relaxation to reduce the computational complexity. Simulation experiments based on historical data demonstrate the efficient performance of the approximation.

Qu, H., Ryzhov, I.O. & Fu, M.C., 2012. Ranking and selection with unknown correlation structures. Proceedings of the 2012 Winter Simulation Conference , pp. 144-155. PDFAbstract

We create the first computationally tractable Bayesian statistical model for learning unknown correlations among estimated alternatives in fully sequential ranking and selection. Although correlations allow us to extract more information from each individual simulation, the correlation structure is itself unknown, and we face the additional challenge of simultaneously learning the unknown values and unknown correlations from simulation. We derive a Bayesian procedure that allocates simulations based on the value of information, thus exploiting the correlation structure and anticipating future changes to our beliefs about the correlations. We test the model and algorithm in a simulation study motivated by the problem of optimal wind farm placement, and obtain encouraging empirical results.

Winner, Best Theoretical Paper, Winter Simulation Conference, 2012.

Ryzhov, I.O., Defourny, B. & Powell, W.B., 2012. Ranking and selection meets robust optimization. Proceedings of the 2012 Winter Simulation Conference , pp. 532-542. PDFAbstract

The objective of ranking and selection is to efficiently allocate an information budget among a set of design alternatives with unknown values in order to maximize the decision-maker’s chances of discovering the best alternative. The field of robust optimization, however, considers risk-averse decision makers who may accept a suboptimal alternative in order to minimize the risk of a worst-case outcome. We bring these two fields together by defining a Bayesian ranking and selection problem with a robust implementation decision. We propose a new simulation allocation procedure that is risk-neutral with respect to simulation outcomes, but risk-averse with respect to the implementation decision. We discuss the properties of the procedure and present numerical examples illustrating the difference between the risk-averse problem and the more typical risk-neutral problem from the literature.

Ryzhov, I.O. & Powell, W.B., 2011. Bayesian active learning with basis functions. Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning , pp. 143-150. PDFAbstract

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.

Winner, Best Paper, IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2011.

Ryzhov, I.O., Tariq, A. & Powell, W.B., 2011. May the best man win: simulation optimization for match-making in e-sports. Proceedings of the 2011 Winter Simulation Conference , pp. 4239-4250. PDFAbstract

We consider the problem of automated match-making in a competitive online gaming service. Large numbers of players log on to the service and indicate their availability. The system must then find an opponent for each player, with the objective of creating competitive, challenging games that do not heavily favour either side, for as many players as possible. Existing mathematical models for this problem assume that each player has a skill level that is unknown to the game master. As more games are played, the game master’s belief about player skills evolves according to a Bayesian learning model, allowing the game master to adaptively improve the quality of future games as information is being collected. We propose a new decision-making policy in this setting, based on the knowledge gradient concept from the literature on optimal learning. We conduct simulations to demonstrate the potential of this policy.

Ryzhov, I.O. & Powell, W.B., 2011. The value of information in multi-armed bandits with exponentially distributed rewards. Proceedings of the 2011 International Conference on Computational Science , pp. 1363-1372. PDFAbstract

We consider a class of multi-armed bandit problems where the reward obtained by pulling an arm is drawn from an exponential distribution whose parameter is unknown. A Bayesian model with independent gamma priors is used to represent our beliefs and uncertainty about the exponential parameters. We derive a precise expression for the marginal value of information in this problem, which allows us to create a new knowledge gradient (KG) policy for making decisions. The policy is practical and easy to implement, making a case for value of information as a general approach to optimal learning problems with many different types of learning models.

Ryzhov, I.O. & Powell, W.B., 2010. Approximate dynamic programming with correlated Bayesian beliefs. Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing , pp. 1360-1367. PDFAbstract

In approximate dynamic programming, we can represent our uncertainty about the value function using a Bayesian model with correlated beliefs. Thus, a decision made at a single state can provide us with information about many states, making each individual observation much more powerful. We propose a new exploration strategy based on the knowledge gradient concept from the optimal learning literature, which is currently the only method capable of handling correlated belief structures. The proposed method outperforms several other heuristics in numerical experiments conducted on two broad problem classes.

Ryzhov, I.O., Frazier, P.I. & Powell, W.B., 2010. On the robustness of a one-period look-ahead policy in multi-armed bandit problems. Proceedings of the 2010 International Conference on Computational Science , pp. 1635-1644. PDFAbstract

We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.

Ryzhov, I.O., Valdez-Vivas, M.R. & Powell, W.B., 2010. Optimal learning of transition probabilities in the two-agent newsvendor problem. Proceedings of the 2010 Winter Simulation Conference , pp. 1088-1098. PDFAbstract

We examine a newsvendor problem with two agents: a requesting agent that observes private demand information, and an oversight agent that must determine how to allocate resources upon receiving a bid from the requesting agent. Because the two agents have different cost structures, the requesting agent tends to bid higher than the amount that is actually needed. As a result, the allocating agent needs to adaptively learn how to interpret the bids and estimate the requesting agent’s biases. Learning must occur as quickly as possible, because each suboptimal resource allocation incurs an economic cost. We present a mathematical model that casts the problem as a Markov decision process with unknown transition probabilities. We then perform a simulation study comparing four different techniques for optimal learning of transition probabilities. The best technique is shown to be a knowledge gradient algorithm, based on a one-period look-ahead approach.

Ryzhov, I.O. & Powell, W.B., 2009. The knowledge gradient algorithm for online subset selection. Proceedings of the 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning , pp. 137-144. PDFAbstract

We derive a one-period look-ahead policy for online subset selection problems, where learning about one subset also gives us information about other subsets. The subset selection problem is treated as a multi-armed bandit problem with correlated prior beliefs. We show that our decision rule is easily computable, and present experimental evidence that the policy is competitive against other online learning policies.

Ryzhov, I.O. & Powell, W.B., 2009. A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies. Proceedings of the 2009 Winter Simulation Conference , pp. 1492-1502. PDFAbstract

Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Due to the interaction between technologies, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this problem as an online optimal learning problem with correlated prior beliefs, where the performance of a portfolio of technologies in one project is used to guide choices for future projects. Given the large number of potential portfolios, we propose a learning policy which uses Monte Carlo sampling to narrow down the choice set to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach using knowledge gradients to choose among this reduced set. We present experimental evidence that this policy is competitive against other online learning policies that consider the entire choice set.

Finalist, Best Theoretical Paper, Winter Simulation Conference, 2009.

Book
Powell, W.B. & Ryzhov, I.O., 2012. Optimal Learning, John Wiley and Sons. Amazon Link