The ranking and selection problem is a well-known mathematical framework for the formal study of optimal information collection. Expected improvement (EI) is a leading algorithmic approach to this problem; the practical benefits of EI have repeatedly been demonstrated in the literature, especially in the widely studied setting of Gaussian sampling distributions. However, it was recently proved that some of the most well-known EI-type methods achieve suboptimal convergence rates. We investigate a recently-proposed variant of EI (known as "complete EI") and prove that, with some minor modifications, it can be made to achieve the optimal asymptotic convergence rate under Gaussian observations. This is the first strong rate-optimality result for any EI-type method.
In many applications of business and marketing analytics, statistical models are fit using hierarchically structured data: common characteristics of products and customers are represented as categorical variables, and each category can be split up into multiple subcategories at a lower level of the hierarchy. Hundreds of thousands of binary variables may be required to model the hierarchy, necessitating the use of variable selection to screen out large numbers of irrelevant or insignificant features. We propose a new dynamic screening method, based on the distance correlation criterion, designed for hierarchical binary data. Our method can screen out large parts of the hierarchy at the higher levels, avoiding the need to explore many lower-level features and greatly reducing the computational cost of screening. The practical potential of the method is demonstrated in a case study involving a large volume of B2B transaction data.
Approximate Bayesian inference is a powerful methodology for constructing computationally efficient statistical mechanisms for sequential learning from incomplete or censored information. Approximate Bayesian learning models have proven successful in a variety of operations research and business problems; however, prior work in this area has been primarily computational, and the consistency of approximate Bayesian estimators has been a largely open problem. We develop a new consistency theory by interpreting approximate Bayesian inference as a form of stochastic approximation (SA) with an additional "bias" term. We prove the convergence of a general SA algorithm of this form, and leverage this analysis to derive the first consistency proofs for a suite of approximate Bayesian models from the recent literature.
We derive new sampling allocation schemes, based on myopic allocation policies and optimal computing budget allocation, for selecting the design with the optimal quantile in a Bayesian framework, analogous to existing methods in classic ranking and selection for selecting the design with the optimal mean. Theoretical analyses and empirical results demonstrate that these approaches have complementary advantages. To that end, we also provide a switching strategy that has balanced performance in both small-sample and large-sample scenarios.
Video games have increasingly taken on a social dimension, with social interaction becoming an important motivating factor for users. Since social gaming helps users derive higher utility from the gaming experience, it may be expected to improve user engagement, retention, and purchase behavior. At the same time, social users may be more vulnerable to churn when one or more individuals from their social network stops playing, and previous work has been mixed on whether, and how, engagement translates to money spent. We conduct an empirical study of the relationship between social interaction and user engagement, retention, and purchase behavior, based on a high-resolution player-level dataset from a major international video game company for one of its premier titles. Our results show that user engagement is highly correlated with certain social dynamics; at the same time, social interaction does not always translate to better retention rates or more purchases. In some cases, high dependence on a small set of friends is positively correlated with churn, indicating a tradeoff between engagement in one title and adoption of others. Early adopters are generally more responsive to the social experience than more casual users.
We propose a framework for "targeting and selection" (T&S), a new problem class in simulation optimization where the objective is to select a simulation alternative whose mean performance matches a pre-specified target as closely as possible. T&S resembles the more well-known problem of ranking and selection, but presents unexpected challenges: for example, a one-step look-ahead method may produce statistically inconsistent estimates of the values, even under very standard normality assumptions. We create a new and fundamentally different approach, based on a Brownian local time model, that exhibits characteristics of two widely-studied methodologies, namely expected improvement and optimal computing budget allocation. We characterize the asymptotic sampling rates of this method and relate them to the convergence rates of metrics of interest. The local time method outperforms benchmarks in experiments, including problems where the modeling assumptions of T&S are violated.
In business-to-business (B2B) pricing, a seller seeks to maximize revenue obtained from high-volume transactions involving a wide variety of buyers, products, and other characteristics. Buyer response is highly uncertain, and the seller only observes whether buyers accept or reject the offered prices. These deals are also subject to high opportunity cost, since revenue is zero if the price is rejected. The seller must adapt to this uncertain environment and learn quickly from new deals as they take place. We propose a new framework for statistical and optimal learning in this problem, based on approximate Bayesian inference, that has the ability to measure and update the seller's uncertainty about the demand curve based on new deals. In a case study, based on historical data, we show that our approach offers significant practical benefits.
Approximate dynamic programming (ADP) is a general methodological framework for multi-stage stochastic optimization problems in transportation, finance, energy, and other applications where scarce resources must be allocated optimally. We propose a new approach to the exploration/exploitation dilemma in ADP. First, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP. Bayesian models can be integrated into both parametric and non-parametric value function approximations, which is vital for practical implementation. Second, we propose a new exploration strategy, based on the concept of value of information from the optimal learning literature, and prove that it systematically explores the state space. We evaluate this strategy using a variety of distinct resource allocation problems and demonstrate that it is highly competitive against other exploration strategies.
We introduce the problem of locating facilities over a finite time horizon with the goal of intercepting stochastic traffic flows that exhibit evasive behavior, which arises when locating weigh-in-motion systems, vehicle inspection stations, tollbooths, or other fixed flow-capturing facilities used for law enforcement. The problem can be formulated as a multi-stage, mixed-integer stochastic program (SP); however, under certain independence assumptions, this can be reformulated as a large two-stage SP, enabling us to solve much larger instances. We additionally propose an algorithm based on Lagrangian relaxation that separates the reformulated SP into a variant of a deterministic knapsack problem and a sum of time-decoupled single-period SPs that can be solved independently. The model and algorithm are tested on instances involving road networks of Nevada and Vermont. A comparison with the previously studied single-period stochastic programming approach shows that the newly proposed multi-period model substantially reduces the expected cost.
We consider a ranking and selection problem with independent normal observations, and analyze the asymptotic sampling rates of expected improvement (EI) methods in this setting. Such methods often perform well in practice, but a tractable analysis of their convergence rates is difficult due to the nonlinearity and nonconvexity of the functions used in the EI calculations. We present new results indicating that, for known sampling noise, variants of EI produce simulation allocations that are essentially identical to those chosen by the optimal computing budget allocation (OCBA) methodology, which is known to achieve near-optimal asymptotic performance in ranking and selection. This is the first general equivalence result between EI and OCBA, and it provides insight into the good practical performance of EI. We also derive the limiting allocation for EI under unknown sampling variance.
Winner, INFORMS Simulation Society Outstanding Publication Award, 2017.
We present a new framework for sequential information collection in applications where regression is used to learn about a set of unknown parameters, and alternates with optimization to design new data points. Such problems can be handled using the framework of ranking and selection (R&S), but traditional R&S procedures will experience high computational costs when the decision space grows combinatorially. This challenge arises in many applications of business analytics; in particular, we are motivated by the problem of efficiently learning effective strategies for non-profit fundraising. We present a value of information procedure for simultaneously learning unknown regression parameters and unknown sampling noise. We then develop an approximate version of the procedure, based on convex relaxation, that retains good performance and scales better to large problems.
Finalist, INFORMS Washington DC Chapter Student Excellence Award, 2015.
Non-profit organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We propose an empirical model to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-mail communications with donors to the American Red Cross during 2009-2011. We find evidence that mailed appeals are more effective when they emphasize disaster preparedness and training efforts over post-disaster cleanup. Including small cards that affirm donors' identity as Red Cross supporters is an effective strategy, while including gift items such as address labels is not. Finally, very recent acquisitions are more likely to respond to appeals that ask them to contribute an amount similar to their most recent donation, but this approach has an adverse effect on donors with a longer history. We show via simulation that a simple design strategy based on these insights has potential to improve success rates from 5.4% to 8.1%.
We propose a novel theoretical characterization of the optimal "Gittins index" policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.
Finalist, INFORMS Junior Faculty Forum Paper Competition, 2014.
The flow-capturing problem (FCP) consists of locating facilities to maximize the number of flow-based customers that encounter at least one of these facilities along their predetermined travel paths. The FCP literature assumes that if a facility is located along (or “close enough” to) a predetermined path of a flow of customers, that flow is considered captured. However, existing models for the FCP do not consider targeted users who behave noncooperatively by changing their travel paths to avoid fixed facilities. Examples of facilities that targeted subjects may have an incentive to avoid include weigh-in-motion stations used to detect and fine overweight trucks, tollbooths, and security and safety checkpoints. This article introduces a new type of flow-capturing model, called the “evasive flow-capturing problem” (EFCP), which generalizes the FCP and has relevant applications in transportation, revenue management, and security and safety management. We formulate deterministic and stochastic versions of the EFCP, analyze their structural properties, study exact and approximate solution techniques, and show an application to a real-world transportation network.
Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.
A sequential information collection problem, where a risk-averse decision maker updates a Bayesian belief about the unknown objective function of a linear program, is used to investigate the informational value of measurements performed to refine a robust optimization model. The information is collected in the form of a linear combination of the objective coefficients, subject to random noise. We have the ability to choose the weights in the linear combination, creating a new, nonconvex continuous-optimization problem, which we refer to as information blending. We develop two optimal blending strategies: (1) an active learning method that maximizes uncertainty reduction and (2) an economic approach that maximizes an expected improvement criterion. Semidefinite programming relaxations are used to create efficient convex approximations to the nonconvex blending problem.
We create the first computationally tractable Bayesian statistical model for learning unknown correlation structures in fully sequential simulation selection. Correlations represent similarities or differences between various design alternatives, and can be exploited to extract much more information from each individual simulation. However, in most applications, the correlation structure is unknown, thus creating the additional challenge of simultaneously learning unknown mean performance values and unknown correlations. Based on our new statistical model, we derive a Bayesian procedure that seeks to optimize the expected opportunity cost of the final selection based on the value of information, thus anticipating future changes to our beliefs about the correlations. Our approach outperforms existing methods for known correlation structures in numerical experiments, including one motivated by the problem of optimal wind farm placement, where real data are used to calibrate the simulation model.
Winner, INFORMS Computing Society Best Student Paper Award, 2012.
Consider a linear program (LP) with uncertain objective coefficients, for which we have a Bayesian prior. We can collect information to improve our understanding of these coefficients, but this may be expensive, giving us a separate problem of optimizing the collection of information to improve the quality of the solution relative to the true cost coefficients. We formulate this information collection problem for LPs for the first time and derive a knowledge gradient policy which finds the marginal value of each measurement by solving a sequence of LPs. We prove that this policy is asymptotically optimal and demonstrate its performance on a network flow problem.
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods. Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated case.
We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.
Finalist, INFORMS New Jersey Chapter Student Contest, 2009.