Research Statement
Yilun Chen∗
1 Motivation
My research focuses on understanding and addressing the fundamental challenges of sequential
decision-making in the big data era. Such decision-making tasks arise frequently in important practical
settings ranging from pricing, inventory control and revenue management to public healthcare
and pandemic management. They have long been studied in the operations literature, often boiling
down to solving stochastic dynamic programs (DPs). Recently, amid the ongoing data revolution,
profound new challenges have emerged from the growing ability of, and expectation for, companies
and governments to fully exploit the information extracted from relevant data streams in their
decision-making, i.e., to make decisions in a data-driven manner. Such data-driven problems
generally demand large-scale, more realistic, and more flexible models across application domains,
which renders the associated DPs computationally intractable due to the curse of dimensionality.
Devising simple, tractable, and provably good decision-making rules is a pressing task faced by
many practitioners; it continues to drive the development of solutions in academia, and motivates
my own research. My research vision is to
i. develop new methodologies that lead to simple decision-making solutions with provable performance
guarantees; and
ii. devise (near-)optimal and implementable algorithms for practitioners to improve their daily
operations.
2 Research Accomplishments
2.1 High-dimensional optimal stopping
The fundamental problem of optimal stopping has wide applications across operations management,
economics, computer science and control, and is of central importance to the fields of applied
probability and mathematical finance, with a long and rich history. To give a few examples:
modern retailers continually gather sales data and customer information to launch promotions
at the right time, and governments collect data at various levels on a daily basis in support
of vital, real-time decisions such as whether to shut down during a pandemic. Over two decades
ago, the then-hot topic of pricing complicated financial derivatives motivated the early studies
of high-dimensional optimal stopping, which has since attracted considerable research interest and
remains highly relevant to modeling practical decision-making tasks in the big data era.

∗School of Operations Research and Information Engineering, Cornell University. Email: yc2436@cornell.edu

Such high-dimensional optimal stopping problems, unlike their low-dimensional, i.i.d., secretary-type
counterparts, are notoriously challenging and were long considered intractable.
In my work [1], we prove for the first time that, contrary to popular belief, high-dimensional
optimal stopping is fundamentally tractable. More precisely, we develop a new methodology for
approximately computing the optimal value/stopping policy, that can provably achieve any fixed
error tolerance efficiently (with runtime scaling polynomially in the time horizon and effectively
independent of the dimension). This surprising result is built on our novel discovery (in the same
paper) of an equivalent formulation (in the form of an infinite expansion) of the optimal stopping
DP, in contrast to existing heuristic approaches (ADP, duality, deep learning, etc.), which typically
do not come with theoretically strong joint performance-and-efficiency guarantees. Our discovery also
draws a novel connection between optimal stopping and the theory of network flows, which opens
up the possibility of leveraging the rich tools in combinatorial optimization to solve challenging
sequential decision-making problems. The work was awarded first place in the 2019 INFORMS
Nicholson Student Paper Competition.
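To make the underlying DP concrete, the sketch below computes the classical backward-induction value of a toy one-dimensional stopping problem (an illustrative house-selling instance; this is the classical baseline recursion, not the expansion-based method of [1]):

```python
import random

def stopping_value(horizon, n_samples=100_000, seed=0):
    """Backward-induction value of stopping an i.i.d. Uniform(0,1) stream
    within `horizon` periods: V_T = E[X] and V_t = E[max(X, V_{t+1})].
    In one dimension each step is a single expectation; with a d-dimensional
    Markov state, the analogous recursion must be solved over an exponentially
    large state space -- the curse of dimensionality discussed above."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_samples)]
    v = sum(xs) / n_samples            # last period: accept whatever arrives
    for _ in range(horizon - 1):       # step backwards through time
        v = sum(max(x, v) for x in xs) / n_samples
    return v
```

As expected, the value increases with the horizon (more chances to wait for a high draw) while staying below the maximum possible reward.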
2.2 High-dimensional sequential decision-making
In practice, high dimensionality arises broadly in complex decision-making settings beyond the
modeling capability of optimal stopping, imposing severe computational challenges.
Motivated by the fact that in many such settings frequent switches from one action to another can
be highly costly, undesirable, or simply impossible, in [4] I study a fairly general class of high-dimensional
sequential decision-making problems subject to a “limited action change” constraint, and prove that they are
in fact tractable. Our model captures a large number of important real-world decision-making examples,
including the dynamic pricing of a product that requires many raw materials/commodities
as input (typically with highly volatile prices) under the limited price change business rule, and
the evaluation of high-dimensional financial derivatives (swing options) with a restricted number of
exercising opportunities. Viewing each time point of action change as an optimal stopping time,
we leverage the results of [1] to build efficient and provably good approximation algorithms with
runtime polynomial in the time horizon and independent of the problem dimension for any fixed
error tolerance and fixed action-change budget.
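The reduction described above, treating each action change as a stopping decision, can be illustrated with a minimal deterministic example (the reward numbers are hypothetical; in the settings of [4] the rewards are high-dimensional stochastic processes). With a budget of one change, the value while the change is still available satisfies exactly an optimal stopping recursion:

```python
def best_single_switch(r_stay, r_new):
    """Optimal value with at most one irreversible action change, by backward
    induction. r_stay[t] / r_new[t] are per-period rewards of the initial and
    alternative actions.  V1 = value once committed to the new action;
    V0 = value while the change is still available, an optimal stopping
    recursion: V0[t] = max(r_stay[t] + V0[t+1], V1[t])."""
    T = len(r_stay)
    v1 = [0.0] * (T + 1)   # suffix value after switching
    v0 = [0.0] * (T + 1)   # value with the switch still in hand
    for t in range(T - 1, -1, -1):
        v1[t] = r_new[t] + v1[t + 1]
        v0[t] = max(r_stay[t] + v0[t + 1], v1[t])
    return v0[0]
```

For instance, with `r_stay = [3, 2, 1, 0]` and `r_new = [1, 2, 3, 4]`, the recursion recovers the best switch time found by brute-force enumeration over all switch times.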
2.3 Online maximum weighted independent set
Sequential decision-making problems with underlying network/graph structure can model various
operational tasks faced by online platforms, and have been widely considered in the revenue management
literature. In these applications, the underlying graph nodes often represent arriving
customers, who, in reality, are characterized by high-dimensional data streams with possibly
non-i.i.d., history-dependent demands. The combination of such high dimensionality and path
dependency with the underlying network structure renders the decision-making problems intractable, yet is underexplored
in the literature. In [2] we overcome this crucial computational challenge for the first time in the
context of high-dimensional online maximum weighted independent set in bounded-degree bipartite
graphs. Combining several sophisticated techniques, including a novel blocking flow algorithm
inspired by [1] and classical network flow algorithms such as Dinic’s algorithm, we propose a novel
polynomial time approximation scheme (PTAS) that exhibits an interesting correlation decay property.
Our algorithm’s runtime scales only linearly in the time horizon for any fixed error tolerance
and max degree, and is effectively independent of the underlying dimension.
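To see why bounded degree helps, consider the simplest offline baseline (this greedy rule is only a point of comparison, not the online PTAS of [2]): repeatedly take the heaviest remaining vertex and delete its neighbors. In a graph of maximum degree D this is a (D+1)-approximation, since each chosen vertex v "blocks" at most D+1 optimal vertices, each of weight at most w(v).

```python
def greedy_mwis(weights, adj):
    """Greedy baseline for maximum-weight independent set: pick the heaviest
    remaining vertex, remove it and its neighbors, repeat.  `adj[v]` lists the
    neighbors of vertex v.  Returns the chosen (independent) vertex set."""
    alive = set(range(len(weights)))
    chosen = []
    for v in sorted(range(len(weights)), key=lambda u: -weights[u]):
        if v in alive:                 # v survived all earlier deletions
            chosen.append(v)
            alive.discard(v)
            alive -= set(adj[v])       # neighbors can no longer be chosen
    return chosen
```

On a 4-cycle with weights (5, 1, 4, 1), the greedy rule happens to recover the optimal set {0, 2}; in general it can lose a factor up to D+1, which is the gap the PTAS closes.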
2.4 Ongoing work
2.4.1 Fast and implementable optimal stopping
One of my immediate goals is to convert the theoretical progress of [1] into practical methods that
can help practitioners deal with real-world tasks. Further building on the connection between optimal
stopping and network flow theory, we devise a family of fast and implementable algorithms.
The algorithms combine the core ideas of [1] with several classical network flow algorithms such as
Push-Relabel. We perform numerical tests on multiple benchmark examples (standard test examples
in the optimal stopping/option pricing literature) to demonstrate the practicality of the proposed
algorithms. For instance, our algorithm outputs almost matching upper and lower bounds of the
true prices of the benchmark high-dimensional Bermudan knock-out options, improving upon the
state-of-the-art methods by an order of magnitude. The results are presented in working paper [3].
2.4.2 Bayesian regret of multi-armed bandit
The multi-armed bandit (MAB) problem is an important sequential decision-making problem with numerous
applications in operations management (including dynamic price experimentation, online advertising,
etc.). Over the years, various algorithms (e.g., methods based on upper confidence bounds (UCB) and
Thompson sampling (TS)) have been proposed for MAB problems in the literature. In certain
asymptotic regimes, many of these algorithms are provably “optimal.” However, how to effectively
compare the performance of these “optimal” algorithms is a fundamental question that
remains unanswered. We propose a solution in the simplified yet canonical case of Bayesian
one-and-a-half-armed bandits with Bernoulli rewards. Based on a higher-order asymptotic analysis of the
algorithm’s regret, we quantify the suboptimality of UCB and TS (compared to the exactly optimal
Gittins index policy), providing a framework for properly comparing MAB algorithms. This is a
work in progress.
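For concreteness, here is a minimal simulation of the two index policies mentioned above on a Bernoulli bandit — a textbook UCB1 rule and Thompson sampling with uniform Beta priors. These are illustrative baselines only; the higher-order regret analysis itself is not reproduced here.

```python
import math
import random

def run_bandit(means, horizon, algo, seed=0):
    """Simulate a Bernoulli bandit with true success probabilities `means`
    and return the cumulative pseudo-regret.  `algo` is 'ucb' (UCB1) or
    'ts' (Thompson sampling with Beta(1,1) priors)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    succ = [0] * k                      # observed successes per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if algo == 'ucb':
            if t <= k:                  # pull each arm once to initialize
                arm = t - 1
            else:                       # empirical mean + exploration bonus
                arm = max(range(k), key=lambda i: succ[i] / pulls[i]
                          + math.sqrt(2 * math.log(t) / pulls[i]))
        else:                           # sample from each arm's posterior
            arm = max(range(k), key=lambda i: rng.betavariate(
                1 + succ[i], 1 + pulls[i] - succ[i]))
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        succ[arm] += reward
        regret += best - means[arm]     # pseudo-regret of this pull
    return regret
```

Both policies concentrate their pulls on the better arm, so the cumulative regret grows far more slowly than under uniform play.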
2.4.3 High-dimensional inventory control
Inventory control with lost sales and large lead times has long been recognized as intractable due to
the high dimensionality introduced by the pipeline vector. Building on the recent progress of [5], we
investigate the connection between several simple policies that overcome the curse of dimensionality
and enjoy optimal performance guarantees in the asymptotic regime where lead time approaches
infinity. This is a work in progress.
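As a point of reference, the constant-order policy studied in [5] is simple enough to state in a few lines. The sketch below simulates it under an illustrative Uniform{0,...,4} demand; the demand model and cost parameters are assumptions for the example, not taken from [5].

```python
import random
from collections import deque

def constant_order_cost(q, lead_time, horizon=20_000, h=1.0, p=4.0, seed=0):
    """Average per-period cost (holding cost h per unit, lost-sales penalty p
    per unit) of the constant-order policy: order the same quantity q every
    period regardless of the state.  The pipeline of outstanding orders, whose
    length equals the lead time, is what makes the exact DP high-dimensional;
    the constant-order policy ignores it entirely."""
    rng = random.Random(seed)
    pipeline = deque([q] * lead_time)   # orders placed but not yet arrived
    inventory = 0
    cost = 0.0
    for _ in range(horizon):
        inventory += pipeline.popleft() # order from lead_time periods ago arrives
        demand = rng.randrange(5)       # illustrative Uniform{0,...,4} demand
        sales = min(inventory, demand)
        cost += h * (inventory - sales) + p * (demand - sales)
        inventory -= sales              # unmet demand is lost, not backlogged
        pipeline.append(q)              # place today's constant order
    return cost / horizon
```

Ordering nothing (q = 0) loses every unit of demand, so even a modest constant order improves markedly on it, and the policy's state-independence is exactly what makes it immune to the lead-time dimension.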
2.4.4 Generalization of machine learning models
Today, machine learning models of high complexity and expressiveness (e.g., deep neural networks)
have achieved huge empirical success in identifying patterns in all kinds of data. These models
often lead to “interpolating” solutions that fit the training data exactly, yet do not exhibit
“overfitting” in practice, contradicting classical statistical theory. During my summer internship at
IBM Research in 2020, my collaborators and I investigated this disagreement between theory and
practice by studying the impact of interpolation on a model’s generalization ability. This is an
ongoing project.
3 Future plans
Moving forward, I would like to continue pursuing deep theoretical insights and simple practical
solutions to address new decision-making challenges emerging from real-world applications.
Specifically, I plan to pursue the following directions.
Implementable algorithms and real-world applications. Given the broad range of
high-impact real-world problems in which optimal stopping and high-dimensional control play a
fundamental role, I am excited to find opportunities to collaborate with practitioners, leveraging
our methods to make a significant impact. My concrete vision is to devise implementable algorithms
and commercial software based on the theory we developed, to help our partners make
better decisions in complex environments, and ultimately improve revenue and social well-being.
Advancing the theory of sequential decision-making. My Ph.D. work sheds light on a
number of important and interesting cutting-edge questions in the theory of sequential decision-
making that remain to be explored. These questions often lie at the interface of control, optimization,
applied probability, and computer science, and their resolution requires diverse tools. My vision is
to harness the broad set of skills I developed during my Ph.D. studies to pursue answers to these
questions. For example, I am particularly interested in understanding the information-theoretic
limit on the sample complexity of optimal stopping and sequential decision-making under various
computational models (classical, parallel, quantum etc.). Such results can greatly advance our understanding
of the fundamental hardness of certain sequential decision-making problems, and can
potentially lead to more efficient algorithms.
References
[1] Y. Chen and D. A. Goldberg. Beating the curse of dimensionality in optimal stopping.
Under Revision for Mathematics of Operations Research, 2018.
First Place in the 2019 INFORMS Nicholson Student Paper Competition
Finalist in the 2018 INFORMS Section on Finance Best Student Paper Competition.
[2] Y. Chen and D. A. Goldberg. Polynomial time algorithms for the weighted online bipartite independent
set problem.
Preprint, 2020.
[3] Y. Chen and D. A. Goldberg. Fast and practical algorithms for high-dimensional optimal stopping.
Working paper, 2020.
[4] Y. Chen and D. A. Goldberg. A new approach to high-dimensional online decision-making.
Preprint to be submitted to Management Science, 2020.
[5] D. A. Goldberg, D. A. Katz-Rogozhnikov, Y. Lu, M. Sharma, and M. S. Squillante. Asymptotic optimality
of constant-order policies for lost sales inventory models with large lead times. Mathematics of Operations
Research, 41(3):898–913, 2016.