Zheng, H., et al., In Press.

Personalized multimorbidity management for patients with type 2 diabetes using reinforcement learning of electronic health records.

Drugs.

AbstractComorbid chronic conditions are common among people with type 2 diabetes. We developed an Artificial Intelligence algorithm, based on Reinforcement Learning (RL), for personalized diabetes and multimorbidity management with strong potential to improve health outcomes relative to current clinical practice. We modeled glycemia, blood pressure and cardiovascular disease (CVD) risk as health outcomes using a retrospective cohort of 16,665 patients with type 2 diabetes from New York University Langone Health ambulatory care electronic health records in 2009 to 2017. We trained a RL prescription algorithm that recommends a treatment regimen optimizing patients’ cumulative health outcomes using their individual characteristics and medical history at each encounter. The RL recommendations were evaluated on an independent subset of patients. The single-outcome optimization RL algorithms, RL-glycemia, RL-blood pressure, and RL-CVD, recommended consistent prescriptions with what observed by clinicians in 86.1%, 82.9% and 98.4% of the encounters. For patient encounters in which the RL recommendations differed from the clinician prescriptions, significantly fewer encounters showed uncontrolled glycemia (A1c > 8% on 35% of encounters), uncontrolled hypertension (blood pressure > 140mmHg on 16% of encounters) and high CVD risk (risk > 20% on 25% of encounters) under RL algorithms than those observed under clinicians (43%, 27% and 31% of encounters respectively; all P < 0.001).

2020_zheng_multimorbidity.pdf Gu, L., Ryzhov, I.O. & Eftekhar, M., In Press.

The facts on the ground: evaluating humanitarian fleet management policies using simulation.

European Journal of Operational Research.

PDFAbstractIn humanitarian fleet management, the performance of purchase, assignment, and sales decisions is determined by dynamic interactions between the fleet composition (vehicles that were acquired at different times and have different residual values), the time-varying and uncertain demands on the fleet, and the depreciation of the vehicles as they are exploited. When all of these factors are taken into account, optimal decisions become analytically intractable. We propose to evaluate purchase, assignment, and sales policies in a realistic simulation environment that directly models heterogeneous vehicle attributes and tracks their evolution over time. Using data from a large international humanitarian organization (LIHO), the simulator can identify the rationale behind seemingly ad-hoc decisions by field managers at LIHO. For instance, by selling vehicles later than LIHO recommends, managers are actually reducing their costs; similarly, managers decline to coordinate vehicles between mission types because the merits of "sharing" in this way turn out to be marginal at best.

Finalist, POMS Humanitarian Operations and Crisis Management Best Paper Award, 2019.

Peng, Y., et al., In Press.

Efficient sampling allocation procedures for selecting the optimal quantile.

INFORMS Journal on Computing.

PDFAbstractWe derive new sampling allocation schemes, based on myopic allocation policies and optimal computing budget allocation, for selecting the design with the optimal quantile in a Bayesian framework, analogous to existing methods in classic ranking and selection for selecting the design with the optimal mean. Theoretical analyses and empirical results demonstrate that these approaches have complementary advantages. To that end, we also provide a switching strategy that has balanced performance in both small-sample and large-sample scenarios.

Chen, Y. & Ryzhov, I.O., 2020.

Consistency analysis of sequential learning under approximate Bayesian inference.

Operations Research , 68 (1) , pp. 295-307.

PDFAbstractApproximate Bayesian inference is a powerful methodology for constructing computationally efficient statistical mechanisms for sequential learning from incomplete or censored information. Approximate Bayesian learning models have proven successful in a variety of operations research and business problems; however, prior work in this area has been primarily computational, and the consistency of approximate Bayesian estimators has been a largely open problem. We develop a new consistency theory by interpreting approximate Bayesian inference as a form of stochastic approximation (SA) with an additional "bias" term. We prove the convergence of a general SA algorithm of this form, and leverage this analysis to derive the first consistency proofs for a suite of approximate Bayesian models from the recent literature.

Qu, H., et al., 2020.

Learning demand curves in B2B pricing: a new framework and case study.

Production and Operations Management , 29 (5) , pp. 1287-1306.

PDFAbstractIn business-to-business (B2B) pricing, a seller seeks to maximize revenue obtained from high-volume transactions involving a wide variety of buyers, products, and other characteristics. Buyer response is highly uncertain, and the seller only observes whether buyers accept or reject the offered prices. These deals are also subject to high opportunity cost, since revenue is zero if the price is rejected. The seller must adapt to this uncertain environment and learn quickly from new deals as they take place. We propose a new framework for statistical and optimal learning in this problem, based on approximate Bayesian inference, that has the ability to measure and update the seller's uncertainty about the demand curve based on new deals. In a case study, based on historical data, we show that our approach offers significant practical benefits.

Huang, Y., et al., 2019.

Optimal learning for urban delivery fleet allocation.

Transportation Science , 53 (3) , pp. 623-641.

PDFAbstractIn a two-tiered city logistics system, an urban logistics company usually partitions the urban area into regions and allocates its delivery fleet (e.g., vehicles, couriers) to these regions. On a daily basis, the delivery station in each region receives the delivery packages from the city distribution centers and delivers them to customers within the region, using its allocated delivery vehicles. A tactical decision in such a city logistics system is the allocation of its delivery fleet to the regions to minimize the expected operational cost of the entire system. However, due to the complexity of the urban delivery operations and the day-to-day variance of the customer demand, an accurate evaluation of the expected operational cost associated with an allocation decision can be very expensive. We propose a learning policy that adaptively selects the fleet allocation to learn the underlying expected operational cost function by incorporating the value of information. Specifically, we exploit the monotonicity of the expected operational cost in the number of allocated delivery vehicles in a region and extend the idea of knowledge gradient with discrete priors (KGDP) with resampling and re-generation (KGDP-R&R). Our numerical results demonstrate the effectiveness of KGDP-R&R against other learning policies as well as its managerial implications as compared to heuristics in practice.

Winner, INFORMS Urban Transportation SIG Outstanding Paper Award, 2020.

Chen, Y. & Ryzhov, I.O., 2019.

Complete expected improvement converges to an optimal budget allocation.

Advances in Applied Probability , 51 (1) , pp. 209-235.

PDFAbstractThe ranking and selection problem is a well-known mathematical framework for the formal study of optimal information collection. Expected improvement (EI) is a leading algorithmic approach to this problem; the practical benefits of EI have repeatedly been demonstrated in the literature, especially in the widely studied setting of Gaussian sampling distributions. However, it was recently proved that some of the most well-known EI-type methods achieve suboptimal convergence rates. We investigate a recently-proposed variant of EI (known as "complete EI") and prove that, with some minor modifications, it can be made to converge to the rate-optimal static budget allocation without requiring any tuning.

Ryzhov, I.O., et al., 2019.

Bayesian exploration for approximate dynamic programming.

Operations Research , 67 (1) , pp. 198-214.

PDFAbstractApproximate dynamic programming (ADP) is a general methodological framework for multi-stage stochastic optimization problems in transportation, finance, energy, and other applications where scarce resources must be allocated optimally. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information, and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and non-parametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, while more computationally intensive, it is highly competitive against other exploration strategies.

Ryzhov, I.O., 2018.

The local time method for targeting and selection.

Operations Research , 66 (5) , pp. 1406-1422.

PDFAbstractWe propose a framework for "targeting and selection" (T&S), a new problem class in simulation optimization where the objective is to select a simulation alternative whose mean performance matches a pre-specified target as closely as possible. T&S resembles the more well-known problem of ranking and selection, but presents unexpected challenges: for example, a one-step look-ahead method may produce statistically inconsistent estimates of the values, even under very standard normality assumptions. We create a new and fundamentally different approach, based on a Brownian local time model, that exhibits characteristics of two widely-studied methodologies, namely expected improvement and optimal computing budget allocation. We characterize the asymptotic sampling rates of this method and relate them to the convergence rates of metrics of interest. The local time method outperforms benchmarks in experiments, including problems where the modeling assumptions of T&S are violated.

Marković, N., Ryzhov, I.O. & Schonfeld, P., 2017.

Evasive flow capture: a multi-period stochastic facility location problem with independent demand.

European Journal of Operational Research , 257 (2) , pp. 687-703.

PDFAbstractWe introduce the problem of locating facilities over a finite time horizon with the goal of intercepting stochastic traffic flows that exhibit evasive behavior, which arises when locating weigh-in-motion systems, vehicle inspection stations, tollbooths, or other fixed flow-capturing facilities used for law enforcement. The problem can be formulated as a multi-stage, mixed-integer stochastic program (SP); however, under certain independence assumptions, this can be reformulated as a large two-stage SP, enabling us to solve much larger instances. We additionally propose an algorithm based on Lagrangian relaxation that separates the reformulated SP into a variant of a deterministic knapsack problem and a sum of time-decoupled single-period SPs that can be solved independently. The model and algorithm are tested on instances involving road networks of Nevada and Vermont. A comparison with the previously studied single-period stochastic programming approach shows that the newly proposed multi-period model substantially reduces the expected cost.

Ryzhov, I.O., 2016.

On the convergence rates of expected improvement methods.

Operations Research , 64 (6) , pp. 1515-1528.

PDFAbstractWe consider a ranking and selection problem with independent normal observations, and analyze the asymptotic sampling rates of expected improvement (EI) methods in this setting. Such methods often perform well in practice, but a tractable analysis of their convergence rates is difficult due to the nonlinearity and nonconvexity of the functions used in the EI calculations. We present new results indicating that, for known sampling noise, variants of EI produce simulation allocations that are essentially identical to those chosen by the optimal computing budget allocation (OCBA) methodology, which is known to achieve near-optimal asymptotic performance in ranking and selection. This is the first general equivalence result between EI and OCBA, and it provides insight into the good practical performance of EI. We also derive the limiting allocation for EI under unknown sampling variance.

Winner, INFORMS Simulation Society Outstanding Publication Award, 2017.

Han, B., Ryzhov, I.O. & Defourny, B., 2016.

Optimal learning in linear regression with combinatorial feature selection.

INFORMS Journal on Computing , 28 (4) , pp. 721-735.

PDFAbstractWe present a new framework for sequential information collection in applications where regression is used to learn about a set of unknown parameters, and alternates with optimization to design new data points. Such problems can be handled using the framework of ranking and selection (R&S), but traditional R&S procedures will experience high computational costs when the decision space grows combinatorially. This challenge arises in many applications of business analytics; in particular, we are motivated by the problem of efficiently learning effective strategies for non-profit fundraising. We present a value of information procedure for simultaneously learning unknown regression parameters and unknown sampling noise. We then develop an approximate version of the procedure, based on convex relaxation, that retains good performance and scales better to large problems.

Finalist, INFORMS Washington DC Chapter Student Excellence Award, 2015.

Ryzhov, I.O., Han, B. & Bradić, J., 2016.

Cultivating disaster donors using data analytics.

Management Science , 62 (3) , pp. 849-866.

PDFAbstractNon-profit organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We propose an empirical model to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recent acquisitions are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has potential to improve success rates from 5.4% to 8.1%.

Ding, Z. & Ryzhov, I.O., 2016.

Optimal learning with non-Gaussian rewards.

Advances in Applied Probability , 48 (1) , pp. 112-136.

PDFAbstractWe propose a novel theoretical characterization of the optimal "Gittins index" policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.

Finalist, INFORMS Junior Faculty Forum Paper Competition, 2014.

Marković, N., Ryzhov, I.O. & Schonfeld, P., 2015.

Evasive flow capture: optimal location of weigh-in-motion systems, tollbooths, and security checkpoints.

Networks , 65 (1) , pp. 22-42.

PDFAbstractThe flow-capturing problem (FCP) consists of locating facilities to maximize the number of flow-based customers that encounter at least one of these facilities along their predetermined travel paths. The FCP literature assumes that if a facility is located along (or “close enough” to) a predetermined path of a flow of customers, that flow is considered captured. However, existing models for the FCP do not consider targeted users who behave noncooperatively by changing their travel paths to avoid fixed facilities. Examples of facilities that targeted subjects may have an incentive to avoid include weigh-in-motion stations used to detect and fine overweight trucks, tollbooths, and security and safety checkpoints. This article introduces a new type of flow-capturing model, called the “evasive flow-capturing problem” (EFCP), which generalizes the FCP and has relevant applications in transportation, revenue management, and security and safety management. We formulate deterministic and stochastic versions of the EFCP, analyze their structural properties, study exact and approximate solution techniques, and show an application to a real-world transportation network.

Winner, Glover-Klingman Prize, 2015.

Ryzhov, I.O., Frazier, P.I. & Powell, W.B., 2015.

A new optimal stepsize for approximate dynamic programming.

IEEE Transactions on Automatic Control , 60 (3) , pp. 743-758.

PDFAbstractApproximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

Defourny, B., Ryzhov, I.O. & Powell, W.B., 2015.

Optimal information blending with measurements in the L2 sphere.

Mathematics of Operations Research , 40 (4) , pp. 1060-1088.

PDFAbstractA sequential information collection problem, where a risk-averse decision maker updates a Bayesian belief about the unknown objective function of a linear program, is used to investigate the informational value of measurements performed to refine a robust optimization model. The information is collected in the form of a linear combination of the objective coefficients, subject to random noise. We have the ability to choose the weights in the linear combination, creating a new, nonconvex continuous-optimization problem, which we refer to as information blending. We develop two optimal blending strategies: (1) an active learning method that maximizes uncertainty reduction and (2) an economic approach that maximizes an expected improvement criterion. Semidefinite programming relaxations are used to create efficient convex approximations to the nonconvex blending problem.

Qu, H., et al., 2015.

Sequential selection with unknown correlation structures.

Operations Research , 63 (4) , pp. 931-948.

PDFAbstractWe create the first computationally tractable Bayesian statistical model for learning unknown correlation structures in fully sequential simulation selection. Correlations represent similarities or differences between various design alternatives, and can be exploited to extract much more information from each individual simulation. However, in most applications, the correlation structure is unknown, thus creating the additional challenge of simultaneously learning unknown mean performance values and unknown correlations. Based on our new statistical model, we derive a Bayesian procedure that seeks to optimize the expected opportunity cost of the final selection based on the value of information, thus anticipating future changes to our beliefs about the correlations. Our approach outperforms existing methods for known correlation structures in numerical experiments, including one motivated by the problem of optimal wind farm placement, where real data are used to calibrate the simulation model.

Winner, INFORMS Computing Society Best Student Paper Award, 2012.

Ryzhov, I.O. & Powell, W.B., 2012.

Information collection for linear programs with uncertain objective coefficients.

SIAM Journal on Optimization , 22 (4) , pp. 1344-1368.

PDFAbstractConsider a linear program (LP) with uncertain objective coefficients, for which we have a Bayesian prior. We can collect information to improve our understanding of these coefficients, but this may be expensive, giving us a separate problem of optimizing the collection of information to improve the quality of the solution relative to the true cost coefficients. We formulate this information collection problem for LPs for the first time and derive a knowledge gradient policy which finds the marginal value of each measurement by solving a sequence of LPs. We prove that this policy is asymptotically optimal and demonstrate its performance on a network flow problem.

Ryzhov, I.O., Powell, W.B. & Frazier, P.I., 2012.

The knowledge gradient algorithm for a general class of online learning problems.

Operations Research , 60 (1) , pp. 180-195.

PDFAbstractWe derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods. Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated case.

Ryzhov, I.O. & Powell, W.B., 2011.

Information collection on a graph.

Operations Research , 59 (1) , pp. 188-201.

PDFAbstractWe derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.

Finalist, INFORMS New Jersey Chapter Student Contest, 2009.