Research Statement

Yilun Chen∗

∗School of Operations Research and Information Engineering, Cornell University. Email: yc2436@cornell.edu

1 Motivation

My research focuses on understanding and addressing the fundamental challenges of sequential decision-making in the big data era. Such decision-making tasks arise in important practical settings ranging from pricing, inventory control and revenue management to public healthcare and pandemic management. They have long been studied in the operations literature, often boiling down to solving stochastic dynamic programs (DPs). Recently, amidst the ongoing data revolution, profound new challenges have emerged from the growing possibilities and expectations for companies and governments to fully exploit the information extracted from relevant data streams (i.e., to be data-driven) in their decision-making processes. Such data-driven problems generally demand large-scale, more realistic and flexible models across application domains, which renders the associated DPs computationally intractable due to the curse of dimensionality. Coming up with simple, tractable and provably good decision-making rules is a pressing task faced by many practitioners; it continues to drive the development of solutions in academia and motivates my own research. My research vision is to (i) develop new methodologies that lead to simple decision-making solutions with provable performance guarantees; and (ii) devise (near-)optimal and implementable algorithms that practitioners can use to improve their daily operations.

2 Research Accomplishments

2.1 High-dimensional optimal stopping

The fundamental problem of optimal stopping has wide applications across operations management, economics, computer science and control, and is of central importance to applied probability and mathematical finance, with a long and rich history. To list a few examples, modern retailers continually gather sales data and customer information to launch promotions at the right time, and governments collect data (at various levels) on a daily basis to support vital, real-time decisions such as whether to shut down during a pandemic. Over two decades ago, the then-hot topic of pricing complicated financial derivatives motivated the early studies of high-dimensional optimal stopping, which has since attracted considerable research interest and remains highly relevant for modeling practical decision-making tasks in the big data era. Such high-dimensional optimal stopping problems, unlike their low-dimensional, i.i.d., secretary-type counterparts, are notoriously challenging and were long considered intractable.

In my work [1], we prove for the first time that, contrary to popular belief, high-dimensional optimal stopping is fundamentally tractable. More precisely, we develop a new methodology for approximately computing the optimal value and stopping policy that provably achieves any fixed error tolerance efficiently, with runtime scaling polynomially in the time horizon and effectively independent of the dimension. This surprising result is built on our discovery (in the same paper) of an equivalent formulation of the optimal stopping DP in the form of an infinite expansion, in contrast to existing heuristic approaches (ADP, duality, deep learning, etc.) that typically do not come with such joint performance-and-efficiency guarantees. Our discovery also draws a novel connection between optimal stopping and the theory of network flows, which opens up the possibility of leveraging the rich toolbox of combinatorial optimization to solve challenging sequential decision-making problems. The work was awarded first place in the 2019 INFORMS Nicholson Student Paper Competition.
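The classical DP view of optimal stopping that the discussion above starts from is backward induction over a Markov chain, V_t(x) = max(g(x), E[V_{t+1} | x]), whose cost grows with the size of the state space; this is exactly the curse of dimensionality at issue. A minimal illustrative sketch (the chain, payoff and horizon are toy choices, not taken from [1]):

```python
import numpy as np

def optimal_stopping_value(P, g, T):
    """V_T = g; V_t(x) = max(g(x), E[V_{t+1} | x]). Returns V_0."""
    V = g.copy()                      # value at the final period
    for _ in range(T):
        V = np.maximum(g, P @ V)      # stop now vs. continue one more step
    return V

# Toy chain: lazy random walk on {0, 1, 2, 3}, payoff g(x) = x.
P = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.00, 0.00, 0.50, 0.50]])
g = np.arange(4, dtype=float)
V0 = optimal_stopping_value(P, g, T=10)
```

The loop costs O(|state space|^2) per period; on a d-dimensional state space the state count grows exponentially in d, which is why tabular backward induction breaks down and approximation schemes of the kind described above are needed.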
2.2 High-dimensional sequential decision-making

In practice, high dimensionality broadly arises in complex decision-making settings beyond the modeling capability of optimal stopping, imposing severe computational burdens. Motivated by the fact that in many such settings frequent switches from one action to another can be highly costly, undesirable or impossible, in [4] I study a fairly general class of high-dimensional sequential decision-making problems subject to a "limited action change" constraint, and prove that they are in fact tractable. Our model captures many important real-world examples, including the dynamic pricing of a product that requires many raw materials or commodities as input (typically with highly volatile prices) under a limited-price-change business rule, and the evaluation of high-dimensional financial derivatives (swing options) with a restricted number of exercise opportunities. Viewing each time of action change as an optimal stopping time, we leverage the results of [1] to build efficient and provably good approximation algorithms whose runtime is polynomial in the time horizon and independent of the problem dimension, for any fixed error tolerance and fixed action-change budget.

2.3 Online maximum weighted independent set

Sequential decision-making problems with underlying network/graph structure can model various operational tasks faced by online platforms, and have been widely considered in the revenue management literature. In these applications, the graph nodes often represent arriving customers, who in reality are characterized by high-dimensional data streams with possibly non-i.i.d., history-dependent demands. The combination of such high dimensionality and path dependency with the network structure renders the decision-making problems intractable, yet this combination remains underexplored in the literature.
In [2] we overcome this computational challenge for the first time, in the context of high-dimensional online maximum weighted independent set in bounded-degree bipartite graphs. Combining several techniques, including a novel blocking-flow algorithm inspired by [1] and classical network flow algorithms such as Dinic's algorithm, we propose a polynomial-time approximation scheme (PTAS) that exhibits an interesting correlation decay property. Our algorithm's runtime scales only linearly in the time horizon for any fixed error tolerance and maximum degree, and is effectively independent of the underlying dimension.

2.4 Ongoing works

2.4.1 Fast and implementable optimal stopping

One of my immediate goals is to convert the theoretical progress of [1] into practical methods that help practitioners handle real-world tasks. Further building on the connection between optimal stopping and network flow theory, we devise a family of fast and implementable algorithms that combine the core ideas of [1] with several classical network flow algorithms such as Push-Relabel. We perform numerical tests on multiple benchmark examples (standard test problems in the optimal stopping/option pricing literature) to demonstrate the practicality of the proposed algorithms. For instance, our algorithm outputs nearly matching upper and lower bounds on the true prices of benchmark high-dimensional Bermudan knock-out options, improving upon state-of-the-art methods by an order of magnitude. The results are presented in the working paper [3].

2.4.2 Bayesian regret of multi-armed bandits

The multi-armed bandit (MAB) is an important sequential decision-making problem with numerous applications in operations management (including dynamic price experimentation, online advertising, etc.). Over the years, various algorithms, e.g., methods based on upper confidence bounds (UCB) and Thompson sampling (TS), have been proposed for MAB problems in the literature.
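For concreteness, the two index-policy families just mentioned can be sketched for a two-armed Bernoulli bandit as follows. This is a hedged illustration with toy arm means and horizon, not the setting analyzed in the work described below (which studies the Bayesian one-and-a-half-armed case):

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_index(pulls, wins, t):
    # UCB1-style index: empirical mean plus an exploration bonus.
    mean = wins / np.maximum(pulls, 1)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(pulls, 1))
    return mean + bonus

def ts_index(pulls, wins, t):
    # Thompson sampling with a Beta(1, 1) prior: one posterior draw per arm.
    return rng.beta(1 + wins, 1 + pulls - wins)

def run(index_fn, means, T):
    # Play T rounds; each round, pull the arm with the largest index.
    pulls = np.zeros(len(means))
    wins = np.zeros(len(means))
    for t in range(1, T + 1):
        a = int(np.argmax(index_fn(pulls, wins, t)))
        pulls[a] += 1
        wins[a] += rng.binomial(1, means[a])
    return pulls

means = [0.2, 0.8]   # toy arm means: arm 1 is clearly better
pulls_ucb = run(ucb_index, means, T=500)
pulls_ts = run(ts_index, means, T=500)
```

Both policies concentrate their pulls on the better arm; the question raised below is finer: among such policies, each "optimal" in some asymptotic regime, which has the smaller Bayesian regret at higher order?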
In certain asymptotic regimes, many of these algorithms are provably "optimal." However, how to effectively compare the performance of these "optimal" algorithms is a fundamental question that remains unanswered. We propose a solution in the simplified yet canonical case of Bayesian one-and-a-half-armed bandits with Bernoulli rewards. Based on a higher-order asymptotic analysis of each algorithm's regret, we quantify the suboptimality of UCB and TS (compared to the exactly optimal Gittins index policy), providing a framework for properly comparing MAB algorithms. This is work in progress.

2.4.3 High-dimensional inventory control

Inventory control with lost sales and large lead times has long been recognized as intractable due to the high dimensionality introduced by the pipeline vector. Building on the recent progress of [5], we investigate the connections among several simple policies that overcome the curse of dimensionality and enjoy optimal performance guarantees in the asymptotic regime where the lead time approaches infinity. This is work in progress.

2.4.4 Generalization of machine learning models

Today, machine learning models of high complexity and expressiveness (e.g., deep neural networks) have achieved huge empirical success in identifying patterns in all kinds of data. These models often lead to "interpolating" solutions that fit the training data exactly, yet do not exhibit "overfitting" in practice, contradicting classical statistical theory. During my summer internship at IBM Research in 2020, my collaborators and I investigated this disagreement between theory and practice by studying the impact of interpolation on a model's generalization ability. This is an ongoing project.

3 Future plans

Moving forward, I would like to continue pursuing deep theoretical insights and simple practical solutions to address new decision-making challenges emerging from real-world applications. Specifically, I plan to pursue the following directions.
Implementable algorithms and real-world applications. Given the broad range of high-impact real-world problems in which optimal stopping and high-dimensional control play a fundamental role, I am excited to find opportunities to collaborate with practitioners and leverage our methods to make a significant impact. My concrete vision is to devise implementable algorithms and commercial software based on the theory we have developed, to help our partners make better decisions in complex environments and ultimately improve revenue and social wellbeing.

Advancing the theory of sequential decision-making. My Ph.D. work sheds light on a number of important, cutting-edge questions in the theory of sequential decision-making that remain to be explored. These questions often lie at the interface of control, optimization, applied probability and computer science, and their resolution requires diverse tools. My vision is to harness the broad set of skills developed during my Ph.D. studies to pursue answers to these questions. For example, I am particularly interested in understanding the information-theoretic limits on the sample complexity of optimal stopping and sequential decision-making under various computational models (classical, parallel, quantum, etc.). Such results can greatly advance our understanding of the fundamental hardness of certain sequential decision-making problems, and can potentially lead to more efficient algorithms.

References

[1] Y. Chen and D. A. Goldberg. Beating the curse of dimensionality in optimal stopping. Under revision for Mathematics of Operations Research, 2018. First place in the 2019 INFORMS Nicholson Student Paper Competition; finalist in the 2018 INFORMS Section on Finance Best Student Paper Competition.

[2] Y. Chen and D. A. Goldberg. Polynomial time algorithms for the weighted online bipartite independent set problem. Preprint, 2020.

[3] Y. Chen and D. A. Goldberg. Fast and practical algorithms for high-dimensional optimal stopping. Working paper, 2020.

[4] Y. Chen and D. A. Goldberg. A new approach to high-dimensional online decision-making. Preprint to be submitted to Management Science, 2020.

[5] D. A. Goldberg, D. A. Katz-Rogozhnikov, Y. Lu, M. Sharma, and M. S. Squillante. Asymptotic optimality of constant-order policies for lost sales inventory models with large lead times. Mathematics of Operations Research, 41(3):898–913, 2016.