Research Statement
Lijun Ding, Cornell University
October 19, 2020
My work lies at the intersection of optimization, statistics, and machine learning, where I work on solving large-scale,
high-dimensional optimization problems. My research is broadly motivated by the technological trend of
"high-dimensional but structured data": instead of a small amount of accurate data, we now face huge amounts of data
that are often indirect and incomplete, with more variables and features than samples, yet carrying special
structure, e.g., sparsity and low-rankness. This trend has produced large-scale optimization problems for decision
making, statistical inference, and prediction. However, these problems are traditionally considered intractable
due to (i) the high computational cost of handling the data and the optimization variables, and (ii) the
potential ill-conditioning and NP-hardness arising from nonconvexity in the modeling and from noisy or corrupted
data-generating processes.
My work resolves the above issues by (i) justifying theoretically the good conditioning, implicit regularization,
and benign geometry of these large-scale optimization problems under mild and standard statistical assumptions,
and (ii) developing robust, computationally efficient, and theoretically sound algorithms that learn and exploit
these special structures to solve these large-scale problems. To achieve both, my research brings together
fundamental ideas and sophisticated tools from a broad spectrum of areas, including high-dimensional statistics,
convex and nonconvex optimization, and statistical learning theory. This work can have far-reaching impact on
both practitioners and theoreticians: (i) for practitioners, this new set of optimization algorithms allows for faster
yet robust decision making and prediction; (ii) for theoreticians, the lens of statistics and probability brings new
insight to the fundamental study of optimization as applied to large-scale problems.
In what follows, I first present my prior work on solving large-scale optimization problems, including classical
convex problems such as semidefinite programming and newly arising nonconvex problems, based on ideas and
techniques such as Frank-Wolfe methods, strict complementarity, and the leave-one-out argument. I then
present my plans for several important further directions: (i) bridging convex and nonconvex optimization
algorithmically and theoretically; (ii) efficient sampling and optimization in reinforcement learning.
1 Prior work
1.1 Semidefinite programming: regularity, conditioning, and efficient algorithms
Semidefinite programs (SDPs) form a class of convex optimization problems with remarkable modeling power. Their
applications range from combinatorial problems such as max-cut, optics problems such as phase retrieval, and graph
problems such as the stochastic block model, to modern machine learning problems such as matrix completion
and robust PCA. In all these applications, the SDP enjoys the best theoretical guarantees once it is solved
exactly. However, SDPs are challenging to solve because they involve a matrix variable X ∈ R^{n×n} whose dimension
n can rise into the millions or billions. Most existing algorithms suffer from both the high storage cost, Ω(n²), of
the matrix variable and an extraordinarily high time complexity, Ω(n³). Such complexity forbids their use
in modern large-scale problems. Worse, even if we could run these algorithms, it is not clear that these
problems are well-conditioned or regular enough for the traditional guarantees on the objective value to be
translated into guarantees on metrics such as distance to the solution, which matter more in SDP applications.
My research closes the above gaps by (i) demonstrating that the traditionally considered regularity conditions do
hold in these applications [DU20], and (ii) designing storage- and time-efficient algorithms for solving SDPs [DYC+19, DG20,
DFU20], which converge under strong duality alone and speed up to linear convergence when the further regularity
conditions are satisfied. Hence they are well suited for solving large-scale SDPs.
More precisely, in [DU20] I demonstrate that the traditionally considered regularity conditions, such
as strict complementarity and non-degeneracy of SDPs, are satisfied for Z2 synchronization, the stochastic block model, and
matrix completion. Hence, accuracy measured in objective value does translate to the distance metric, as shown
in my work [DU]. In [DYC+19], I designed a new storage-optimal (and, to the best of my knowledge, potentially the
first such) algorithm that provably solves SDPs at a sublinear convergence rate, requiring only one eigenvector
computation per iteration and the solution of one small-scale SDP with a decision variable of size r, where r is the rank of
the optimal solution, which is expected to be much smaller than n in most applications. In the follow-up works
[DG20, DFU20], I further refined the algorithm by incorporating ideas from the classical spectral bundle method and
Frank-Wolfe, so that it keeps the sublinear rate yet achieves a linear convergence rate once the regularity conditions
are satisfied.
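To make the one-eigenvector-per-iteration idea concrete, the following is a minimal numpy sketch of a Frank-Wolfe step over the spectrahedron {X ⪰ 0, tr(X) = α}, on a toy least-squares objective. It is an illustration of the general technique, not the algorithm of [DYC+19]; the objective, step-size schedule, and iteration count are illustrative choices.

```python
import numpy as np

def frank_wolfe_sdp(M, alpha, iters=300):
    """Minimize ||X - M||_F^2 over the spectrahedron {X PSD, tr(X) = alpha}.

    Each Frank-Wolfe step needs only one extreme eigenvector: the linear
    minimization oracle over the spectrahedron returns a rank-one matrix.
    """
    n = M.shape[0]
    X = np.zeros((n, n))  # the first step (gamma = 1) lands in the feasible set
    for t in range(iters):
        grad = 2.0 * (X - M)
        # argmin_S <grad, S> over the spectrahedron is alpha * v v^T,
        # where v is the eigenvector for the smallest eigenvalue of grad.
        w, V = np.linalg.eigh(grad)
        v = V[:, 0]
        S = alpha * np.outer(v, v)
        gamma = 2.0 / (t + 2.0)  # classical Frank-Wolfe step size
        X = (1 - gamma) * X + gamma * S
    return X
```

Note that the iterate is a convex combination of rank-one atoms, which is what makes low-storage representations possible when the optimal solution has small rank.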
1.2 Statistical nonconvex optimization: analysis framework and efficient algorithms
Modern optimization problems are usually nonconvex, either by modeling choice or by the nature of the underlying
mechanism. One important approach to solving these problems is convex relaxation, which has achieved great
theoretical success, in terms of finding the underlying signal in the signal-processing regime and of generalization
power in the statistics and machine learning regime, once the convex relaxation is solved exactly. However, most
algorithms for solving the relaxations are observed to be slow in general: they usually enjoy only a
sublinear rate of convergence, in both theory and practice. Researchers have therefore turned to directly solving nonconvex
low-rank optimization problems (i.e., optimization problems with rank constraints) via simple gradient-based algorithms. Despite the
remarkable empirical success of these simple algorithms, there is a huge gap between this practice and the theory
of nonconvex algorithms: the simple algorithms lack theoretical guarantees, whereas existing algorithms with
guarantees are often highly artificial and too complicated to implement in practice.
To close this gap, I have proposed a framework for analyzing and explaining the fast convergence of simple gradient-based
algorithms. Applying this framework allowed me to solve an open problem posed back in 2009 [JMD10].
Moreover, inspired by the analysis of these simple algorithms, I have designed efficient, simple-to-implement, and
provably linearly/quadratically convergent algorithms. I explain this line of work in more detail in the
paragraphs that follow.
Stochastic iteration procedure analysis framework Based on a novel technique, the leave-one-out technique,
Professor Chen and I [DC20] introduced a general and powerful framework for analyzing iterative stochastic
procedures arising in nonconvex algorithms. This framework enabled us to solve an open problem from 2009
[JMD10] by establishing the convergence of singular value projection (SVP), a simple gradient-based algorithm for
low-rank matrix completion; in fact, we show that SVP attains a linear convergence rate. The
power of our framework is not limited to simple gradient algorithms; it also applies to other stochastic
iterative procedures. Moreover, via this framework we also show that the sample complexity of a convex relaxation of
low-rank matrix completion is near optimal, which is the best result in the literature.
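For concreteness, the SVP iteration is simple to state: a gradient step on the observed entries followed by projection onto the set of rank-r matrices via a truncated SVD. The sketch below is a minimal illustration, not the analyzed setting of [DC20]; the step size η ≈ 1/p for sampling rate p is the choice commonly used in practice, and the problem sizes are arbitrary.

```python
import numpy as np

def svp_completion(M_obs, mask, r, eta, iters=100):
    """Singular Value Projection for matrix completion (illustrative sketch).

    M_obs: observed entries (zeros elsewhere); mask: boolean observation pattern.
    Each iteration: gradient step on the observed residual, then the best
    rank-r approximation via truncated SVD.
    """
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        G = mask * (M_obs - X)  # negative gradient, supported on observed entries
        U, s, Vt = np.linalg.svd(X + eta * G, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]  # projection onto rank-r matrices
    return X
```

On well-conditioned random instances with enough observations, the iterates converge rapidly to the underlying low-rank matrix, consistent with the linear rate established in [DC20].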
Convex optimization based algorithms Inspired by the analysis of gradient-based algorithms for nonconvex
problems, one of my works [CCD+19] developed algorithms that in practice need to solve only a few (three
to five) small-scale convex optimization problems, each not much costlier than a couple
of iterations of a gradient-based algorithm. In a statistical setting, using tools from random matrix theory,
variational analysis, and spectral theory, we show that our algorithm is almost free of tuning and achieves quadratic
convergence when initialized properly. Empirically, the method not only converges fast but can also be initialized
randomly when the sample size is reasonably large.
Averaged projected gradient method (AVPG) Having established algorithms well suited to quadratic
or norm-based objectives with nonconvex constraints, I further advanced my research to non-quadratic objectives, where
familiar concepts such as restricted isometry no longer hold, and standard nonconvex approaches such as gradient
descent may perform badly in practice and have no satisfactory theory guaranteeing global and efficient
convergence. I demonstrate that the critical component in dealing with non-quadratic objectives is a regularity projection
oracle, which enables global and linear convergence of our proposed algorithm, AVPG [DZC20]. Our results
apply to a wide range of non-quadratic problems, including rank aggregation, one-bit matrix sensing/completion, and,
more broadly, generalized linear models with rank constraints.
2 Future Research
2.1 Bridging convex and nonconvex optimization: algorithm and theory
Having explored both the convex and nonconvex worlds, I want to leverage the insights and algorithms from convex
optimization and use them in solving nonconvex optimization problems.
Algorithmically, I would like to design a method that gets the best of both worlds, even in the setting of
limited observations or weak signals: (1) it generates an output that generalizes both theoretically and empirically
(like some convex methods); (2) it is simple to implement and runs very fast (like some nonconvex methods).
To achieve this, I propose a hybrid approach that combines convex relaxation with nonconvex methods, using approximate
solutions of convex relaxations to initialize algorithms for statistical nonconvex problems.
The rationale behind this hybrid approach is twofold. First, in terms of generalizability, there is hope that in
various important settings the approximate solution is close enough to the true signal to be a good initialization
point for nonconvex algorithms. Second, the proposed approach enjoys low computational cost because neither
phase, the initialization phase nor the nonconvex-algorithm phase, consumes much computational budget.
Indeed, in the initialization phase, although a highly accurate optimal solution is usually unaffordable,
approximate solutions can be obtained at reasonable cost, e.g., via an early stopping rule for iterative algorithms. In
the second phase, many nonconvex algorithms are very fast in practice and in theory when initialized
appropriately.
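A schematic sketch of this two-phase pipeline, on rank-one PSD matrix sensing, is given below. Everything here is an illustrative assumption rather than a published result: phase 1 runs a few early-stopped projected gradient steps on the convex PSD relaxation, and phase 2 runs simple factored (Burer-Monteiro style) gradient descent from that approximate solution. The measurement model, step sizes, and iteration counts are toy choices.

```python
import numpy as np

def hybrid_recover(A, y, r=1, cvx_iters=10, ncvx_iters=500):
    """Two-phase sketch: early-stopped convex phase, then nonconvex refinement.

    A: stack of m symmetric sensing matrices, shape (m, n, n); y_i = <A_i, X*>.
    """
    m, n, _ = A.shape

    def grad_full(X):
        # Gradient of (1/2m) * sum_i (<A_i, X> - y_i)^2.
        resid = np.tensordot(A, X, axes=([1, 2], [0, 1])) - y
        return np.tensordot(resid, A, axes=(0, 0)) / m

    # Phase 1: projected gradient on the convex PSD relaxation, stopped early,
    # so it is cheap and only needs to produce a decent approximate solution.
    X = np.zeros((n, n))
    for _ in range(cvx_iters):
        X = X - 0.25 * grad_full(X)
        w, V = np.linalg.eigh(X)                    # project onto the PSD cone
        X = (V * np.clip(w, 0, None)) @ V.T

    # Initialize the low-rank factor from the approximate convex solution.
    w, V = np.linalg.eigh(X)
    U = V[:, -r:] * np.sqrt(np.clip(w[-r:], 0, None))

    # Phase 2: plain gradient descent on the factored objective
    # (1/2m) * sum_i (<A_i, U U^T> - y_i)^2; the chain-rule factor of 2
    # is absorbed into the step size.
    for _ in range(ncvx_iters):
        resid = np.tensordot(A, U @ U.T, axes=([1, 2], [0, 1])) - y
        G = np.tensordot(resid, A, axes=(0, 0)) / m
        U = U - 0.1 * (G @ U)
    return U
```

The point of the sketch is the division of labor: the convex phase supplies a statistically sound initialization, and the nonconvex phase supplies cheap, fast local refinement.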
Theoretically, I want to understand when effective nonconvex algorithms exist given that we already know the
convex relaxation admits generalizable solutions. The rationale is that the theory of convex relaxation is
well developed, while analyzing nonconvex problems usually requires lengthy, technical, and somewhat ad-hoc
arguments. More importantly, this understanding should help us build intuitive and robust algorithms for previously unexplored
nonconvex problems. I would like to further explore this direction building on my previous work, which has identified a
few conditions and algorithmic oracles that ensure the transfer of knowledge from convex to nonconvex
optimization, such as the regularity projection oracle [DZC20] and the sharpness and growth conditions [DU].
2.2 Efficient sampling and optimization in reinforcement learning
Many real-world problems are best modeled by reinforcement learning, where an agent sequentially makes
decisions and receives rewards from the environment so as to maximize the cumulative reward. The
nonconvexity comes from the functional approximation of the value function or the parametrization of the policy
space. The convergence of algorithms such as policy gradient and Q-learning, and the landscape of the objective (the
cumulative reward), are complicated by the functional approximation error and the intricate dependency along
the data trajectory. Nevertheless, these heuristics have achieved remarkable empirical success in games, robotics, and
even traditional operations research areas such as scheduling. My plan is to understand the intricate statistical
dependence and the geometry of the landscape via certain low-rank structures of the Q-function, the transition
function, or the reward function, and further to design efficient sampling and optimization schemes.
References
[CCD+19] Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, and Dmitriy Drusvyatskiy.
Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence. arXiv
preprint arXiv:1904.10020, 2019.
[DC20] Lijun Ding and Yudong Chen. Leave-one-out approach for matrix completion: Primal and dual analysis.
IEEE Transactions on Information Theory, 2020.
[DFU20] Lijun Ding, Jicong Fan, and Madeleine Udell. kFW: A Frank-Wolfe style algorithm with stronger subproblem
oracles. arXiv preprint arXiv:2006.16142, 2020.
[DG20] Lijun Ding and Benjamin Grimmer. Revisiting spectral bundle methods: Primal-dual (sub)linear convergence
rates. arXiv preprint arXiv:2008.07067, 2020.
[DU] Lijun Ding and Madeleine Udell. A strict complementary slackness approach to growth, error bound, and
sensitivity of solution in conic programming. To appear soon.
[DU20] Lijun Ding and Madeleine Udell. On the regularity and conditioning of low rank semidefinite programs.
arXiv preprint arXiv:2002.10673, 2020.
[DYC+19] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell. An optimal-storage
approach to semidefinite programming using approximate complementarity. arXiv preprint
arXiv:1902.03373, 2019.
[DZC20] Lijun Ding, Yuqian Zhang, and Yudong Chen. Low-rank matrix recovery with non-quadratic loss: projected
gradient method and regularity projection oracle. arXiv preprint arXiv:2008.13777, 2020.
[JMD10] Prateek Jain, Raghu Meka, and Inderjit S Dhillon. Guaranteed rank minimization via singular value
projection. In Advances in Neural Information Processing Systems, pages 937–945, 2010.