Arrow Research search

Author name cluster

Luo Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers (36)

AAAI Conference 2026 Conference Paper

Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance

  • Hongxu Chen
  • Ke Wei
  • Luo Luo

Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice, the variances of stochastic gradient estimators often differ across nodes, yet their impact on algorithm design and complexity remains unclear. To address this issue, we propose D-NSS, a decentralized algorithm with node-specific sampling, and establish a sample complexity bound that depends on the arithmetic mean of the local standard deviations, which is tighter than existing bounds that rely on the worst-case or quadratic mean. We further derive a matching sample complexity lower bound under heterogeneous variance, thereby proving the optimality of this dependence. Moreover, we extend the framework with a variance reduction technique and develop D-NSS-VR, which under the mean-squared smoothness assumption attains an improved sample complexity bound while preserving the arithmetic-mean dependence. Finally, numerical experiments validate the theoretical results and demonstrate the effectiveness of the proposed algorithms.

AAAI Conference 2026 Conference Paper

Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack

  • Jing Xue
  • Zhishen Sun
  • Haishan Ye
  • Luo Luo
  • Xiangyu Chang
  • Guang Dai

Membership inference attack (MIA) has become one of the most widely used and effective methods for evaluating the privacy risks of machine learning models. This attack aims to determine whether a specific sample is part of the model's training set by analyzing the model's output. While traditional membership inference attacks focus on the model's posterior output, such as its confidence on the target sample, we propose IMIA, a novel attack strategy that exploits the process of generating adversarial examples to infer membership. Specifically, we infer whether the target sample is a member from the number of iterations required to generate its adversarial example. We conduct experiments across multiple models and datasets, and our results demonstrate that the number of iterations needed to generate an adversarial example is a reliable feature for membership inference, achieving strong performance in both black-box and white-box attack scenarios. This work provides a new perspective for evaluating model privacy and highlights the potential of adversarial-example-based features for privacy leakage assessment.
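
As a rough illustration of the iteration-counting idea (a minimal sketch, not the authors' exact IMIA pipeline), the snippet below attacks a toy logistic model with an iterative FGSM-style perturbation and uses the number of steps needed to flip the prediction as a membership score; the model, step size eps, and the threshold of 10 are placeholder assumptions.

```python
import numpy as np

def attack_iterations(w, b, x, y, eps=0.05, max_iter=50):
    """Run an iterative FGSM-style attack on a logistic model p(y=1|x) = sigmoid(w.x + b)
    and return how many steps it takes to flip the predicted label."""
    x_adv = x.copy()
    for t in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        if (p >= 0.5) != (y == 1):           # prediction already flipped
            return t - 1
        grad = (p - y) * w                    # gradient of the log-loss w.r.t. the input
        x_adv = x_adv + eps * np.sign(grad)   # move toward misclassification
    return max_iter

# Toy usage: samples the model fits well typically need more iterations to flip,
# so a simple threshold on the count acts as a membership score (threshold is hypothetical).
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1
x, y = rng.normal(size=5), 1
is_member_guess = attack_iterations(w, b, x, y) > 10
```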

NeurIPS Conference 2025 Conference Paper

A Near-Optimal Algorithm for Decentralized Convex-Concave Finite-Sum Minimax Optimization

  • Hongxu Chen
  • Ke Wei
  • Haishan Ye
  • Luo Luo

In this paper, we study distributed convex-concave finite-sum minimax optimization over a network and propose DIVERSE, a decentralized variance-reduced optimistic gradient method with stochastic mini-batch sizes. For strongly-convex-strongly-concave objectives, we show that DIVERSE achieves a linear convergence rate that depends on the global smoothness parameters, yielding sharper computation and communication complexity bounds than existing results. Furthermore, we establish lower complexity bounds which show that our upper bounds are optimal up to a logarithmic factor in terms of local incremental first-order oracle calls, computation rounds, and communication rounds. Numerical experiments demonstrate that our algorithm outperforms existing methods in practice.

ICML Conference 2025 Conference Paper

A Parameter-Free and Near-Optimal Zeroth-Order Algorithm for Stochastic Convex Optimization

  • Kunjie Ren
  • Luo Luo

This paper studies zeroth-order optimization for stochastic convex minimization problems. We propose a parameter-free stochastic zeroth-order method (POEM), which introduces a step-size scheme based on the distance over finite difference and an adaptive smoothing parameter. Our theoretical analysis shows that POEM achieves near-optimal stochastic zeroth-order oracle complexity. Furthermore, numerical experiments demonstrate that POEM outperforms existing zeroth-order methods in practice.
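
The core zeroth-order primitive in this line of work is a finite-difference gradient estimate; the sketch below shows the standard two-point Gaussian-smoothing estimator with a fixed smoothing parameter mu and a fixed step size, whereas POEM's contribution is precisely to set these quantities adaptively (those rules are not reproduced here).

```python
import numpy as np

def two_point_grad(f, x, mu=1e-4, rng=np.random.default_rng(0)):
    """Standard two-point zeroth-order gradient estimator with Gaussian smoothing."""
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

f = lambda x: 0.5 * np.sum(x ** 2)            # toy smooth convex objective
x = np.ones(4)
for _ in range(200):
    x -= 0.1 * two_point_grad(f, x)           # fixed step size, for illustration only
```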

NeurIPS Conference 2025 Conference Paper

Accelerated Evolving Set Processes for Local PageRank Computation

  • Binbin Huang
  • Luo Luo
  • Yanghua Xiao
  • Deqing Yang
  • Baojian Zhou

This work proposes a novel framework based on nested evolving set processes to accelerate Personalized PageRank (PPR) computation. At each stage of the process, we employ a localized inexact proximal point iteration to solve a simplified linear system. We show that the time complexity of such localized methods is upper bounded by $\min\{\tilde{\mathcal{O}}(R^2/\epsilon^2), \tilde{\mathcal{O}}(m)\}$ to obtain an $\epsilon$-approximation of the PPR vector, where $m$ denotes the number of edges in the graph and $R$ is a constant defined via nested evolving set processes. Furthermore, the algorithms induced by our framework require solving only $\tilde{\mathcal{O}}(1/\sqrt{\alpha})$ such linear systems, where $\alpha$ is the damping factor. When $1/\epsilon^2\ll m$, this implies the existence of an algorithm that computes an $\epsilon$-approximation of the PPR vector with an overall time complexity of $\tilde{\mathcal{O}}(R^2 / (\sqrt{\alpha}\epsilon^2))$, independent of the underlying graph size. Our result resolves an open conjecture from the existing literature. Experimental results on real-world graphs validate the efficiency of our methods, demonstrating rapid convergence in the early stages.
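
For context, the quantity being approximated is the standard PPR fixed point $\pi = \alpha e_s + (1-\alpha) P^\top \pi$; a plain power-iteration baseline (not the accelerated nested evolving-set framework of the paper) can be sketched as follows, with the toy graph and tolerance chosen arbitrarily.

```python
import numpy as np

def ppr_power_iteration(A, s, alpha=0.15, tol=1e-8, max_iter=1000):
    """Personalized PageRank by power iteration: pi = alpha*e_s + (1-alpha)*P^T pi."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    pi = np.zeros(n); pi[s] = 1.0
    e_s = np.zeros(n); e_s[s] = 1.0
    for _ in range(max_iter):
        new = alpha * e_s + (1 - alpha) * P.T @ pi
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # small undirected toy graph
print(ppr_power_iteration(A, s=0))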

AAAI Conference 2025 Conference Paper

An Enhanced Levenberg–Marquardt Method via Gram Reduction

  • Chengchang Liu
  • Luo Luo
  • John C.S. Lui

This paper studies the problem of solving systems of nonlinear equations. We propose the Gram-reduced Levenberg–Marquardt method, which reuses the Gram matrix across iterations. Our method has a global convergence guarantee without relying on line search or solving sub-problems. We show that our method has lower computational complexity than existing Levenberg–Marquardt methods for finding a stationary point of the squared norm of the equations. We also show that the proposed method enjoys a local superlinear convergence rate under a non-degeneracy assumption. Experiments on real-world applications in scientific computing and machine learning validate the efficiency of our method.
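
A minimal sketch under one possible reading of "reusing the Gram matrix": refresh $J^\top J$ only every m iterations of an otherwise textbook Levenberg–Marquardt loop. The refresh schedule, damping value, and toy system below are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def lm_gram_reuse(F, J, x, lam=1e-2, m=5, iters=50):
    """Levenberg--Marquardt for F(x)=0 that recomputes the Gram matrix J^T J
    only every m iterations (illustrative schedule, not the paper's exact rule)."""
    d = x.size
    for k in range(iters):
        Jx = J(x)
        if k % m == 0:
            gram = Jx.T @ Jx                        # refresh the Gram matrix
        g = Jx.T @ F(x)                             # gradient of 0.5*||F(x)||^2
        x = x - np.linalg.solve(gram + lam * np.eye(d), g)
    return x

# Toy system: F(x) = (x0^2 - 1, x0*x1 - 1), with root near (1, 1).
F = lambda x: np.array([x[0] ** 2 - 1.0, x[0] * x[1] - 1.0])
J = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]]])
print(lm_gram_reuse(F, J, np.array([2.0, 2.0])))
```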

ICML Conference 2024 Conference Paper

Decentralized Convex Finite-Sum Optimization with Better Dependence on Condition Numbers

  • Yuxing Liu
  • Lesi Chen
  • Luo Luo

This paper studies the decentralized optimization problem in which the local objective on each node is an average of a finite set of convex functions and the global objective is strongly convex. We propose an efficient stochastic variance-reduced first-order method that allows different nodes to build their stochastic local gradient estimators with different mini-batch sizes per iteration. We prove that the upper bound on the computation time of the proposed method depends on the global condition number, which is sharper than previous results that depend only on the local condition numbers. Compared with state-of-the-art methods, our method also requires fewer local incremental first-order oracle calls with comparable communication cost. We further perform numerical experiments to validate the advantage of our method.

AAAI Conference 2024 Conference Paper

Decentralized Gradient-Free Methods for Stochastic Non-smooth Non-convex Optimization

  • Zhenwei Lin
  • Jingfan Xia
  • Qi Deng
  • Luo Luo

We consider decentralized gradient-free optimization for minimizing Lipschitz continuous functions that are neither smooth nor convex. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method+ (DGFM+). Based on randomized smoothing and gradient tracking, DGFM requires only a single-sample zeroth-order oracle call in each iteration, making it less demanding on the computational resources of individual nodes. Theoretically, DGFM achieves a complexity of O(d^(3/2)δ^(-1)ε^(-4)) for obtaining a (δ,ε)-Goldstein stationary point. DGFM+, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to O(d^(3/2)δ^(-1)ε^(-3)). Moreover, experimental results underscore the empirical advantages of our proposed algorithms on real-world datasets.

NeurIPS Conference 2024 Conference Paper

Gradient-Free Methods for Nonconvex Nonsmooth Stochastic Compositional Optimization

  • Zhuanghua Liu
  • Luo Luo
  • Bryan Kian Hsiang Low

Stochastic compositional optimization (SCO) is popular in many real-world applications, including risk management, reinforcement learning, and meta-learning. However, most previous methods for SCO require the smoothness assumption on both the outer and inner functions, which limits their applicability to a wider range of problems. In this paper, we study the SCO problem in which both the outer and inner functions are Lipschitz continuous but possibly nonconvex and nonsmooth. In particular, we propose gradient-free stochastic methods for finding the $(\delta, \epsilon)$-Goldstein stationary points of such problems with non-asymptotic convergence rates. Our results also lead to an improved convergence rate for the convex nonsmooth SCO problem. Furthermore, we conduct numerical experiments to demonstrate the effectiveness of the proposed methods.

AAAI Conference 2024 Conference Paper

Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates

  • Zhuanghua Liu
  • Luo Luo
  • Bryan Kian Hsiang Low

We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian. The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate that depends on the condition number of the problem. This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework, which yields a condition-number-free local superlinear convergence rate. Furthermore, we can boost our method by applying a block update on the Hessian approximation, which leads to an even faster local convergence rate. Numerical experiments show that the proposed methods significantly outperform the baseline methods.
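
The symmetric rank-1 update referenced here is the classical SR1 formula; a single (non-incremental) SR1 Hessian-approximation update with the standard skip safeguard is sketched below. For a quadratic, the gradient difference is exact, so the secant equation $B_{+}s = y$ holds after one update, as the check shows.

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """Classical symmetric rank-1 update of a Hessian approximation B,
    given step s = x_new - x_old and gradient difference y = g_new - g_old."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B                      # skip the update when it is numerically unsafe
    return B + np.outer(r, r) / denom

A = np.array([[3.0, 1.0], [1.0, 2.0]])       # Hessian of a toy quadratic
B = np.eye(2)
s = np.array([1.0, 0.5]); y = A @ s          # exact gradient difference for a quadratic
B = sr1_update(B, s, y)
print(np.allclose(B @ s, y))                 # secant equation holds after the update
```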

JMLR Journal 2024 Journal Article

Near-Optimal Algorithms for Making the Gradient Small in Stochastic Minimax Optimization

  • Lesi Chen
  • Luo Luo

We study the problem of finding a near-stationary point for smooth minimax optimization. The recently proposed extra anchored gradient (EAG) methods achieve the optimal convergence rate for the convex-concave minimax problem in the deterministic setting. However, the direct extension of EAG to stochastic optimization is not efficient. In this paper, we design a novel stochastic algorithm called Recursive Anchored IteratioN (RAIN). We show that RAIN achieves near-optimal stochastic first-order oracle (SFO) complexity for stochastic minimax optimization in both the convex-concave and strongly-convex-strongly-concave cases. In addition, we extend the idea of RAIN to solve structured nonconvex-nonconcave minimax problems, where it also achieves near-optimal SFO complexity.

NeurIPS Conference 2024 Conference Paper

Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

  • Qihao Zhou
  • Haishan Ye
  • Luo Luo

This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure in the objective via mini-batch client sampling and variance reduction. We prove that SVOGS can achieve an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ communication complexity, and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. We verify that all of the above complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$ respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method.

ICML Conference 2024 Conference Paper

On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

  • Yunyan Bai
  • Yuxing Liu
  • Luo Luo

This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting where the individual functions $f_1(\cdot), \ldots, f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ for communication rounds, time cost, and local first-order oracle calls respectively, where $\gamma\in(0, 1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches these lower bounds in expectation.

NeurIPS Conference 2024 Conference Paper

Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition

  • Shihong Ding
  • Long Yang
  • Luo Luo
  • Cong Fang

We study a typical optimization model where the optimization variable is composed of multiple probability distributions. Although this model appears frequently in practice, such as in policy optimization problems, it lacks specific analysis in the general setting. For this optimization problem, we propose a new structural condition/landscape description named generalized quasar-convexity (GQC) beyond the realm of convexity. In contrast to the original quasar-convexity \citep{hinder2020near}, GQC allows an individual quasar-convex parameter $\gamma_i$ for each variable block $i$, where a smaller $\gamma_i$ implies less block-convexity. To minimize the objective function, we consider a generalized oracle termed the internal function, which includes the standard gradient oracle as a special case. We provide optimistic mirror descent (OMD) for multiple distributions and prove that the algorithm can achieve an adaptive $\tilde{\mathcal{O}}((\sum_{i=1}^d 1/\gamma_i)\epsilon^{-1})$ iteration complexity to find an $\epsilon$-suboptimal global solution, without knowing the exact values of $\gamma_i$ in advance, when the objective admits a ``polynomial-like'' structure. Notably, this iteration complexity does not explicitly depend on the number of distributions and is strictly better ($\sum_{i=1}^d 1/\gamma_i$ vs. $d\max_{i\in[1:d]} 1/\gamma_i$) than that of mirror descent methods. We also extend GQC to the minimax optimization problem, proposing the generalized quasar-convexity-concavity (GQCC) condition and a decentralized variant of OMD with regularization. Finally, we show applications of our algorithmic framework to discounted Markov Decision Processes and Markov games, which bring new insights on the landscape analysis of reinforcement learning.
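
The single-block building block of OMD over a probability simplex is the entropic mirror (exponentiated-gradient) step; a standard sketch is below, while the paper's multi-distribution and optimistic variants add per-block parameters and an extrapolation term not shown here.

```python
import numpy as np

def entropic_mirror_step(p, grad, eta):
    """One mirror-descent step on the probability simplex with the entropy regularizer."""
    q = p * np.exp(-eta * grad)
    return q / q.sum()

# Toy usage: minimize <c, p> over the simplex; the mass concentrates on argmin(c).
c = np.array([0.3, 0.1, 0.6])
p = np.ones(3) / 3
for _ in range(100):
    p = entropic_mirror_step(p, c, eta=0.5)
print(p.round(3))
```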

ICML Conference 2024 Conference Paper

Zeroth-Order Methods for Constrained Nonconvex Nonsmooth Stochastic Optimization

  • Zhuanghua Liu
  • Cheng Chen 0015
  • Luo Luo
  • Bryan Kian Hsiang Low

This paper studies the problem of solving nonconvex nonsmooth optimization over a closed convex set. Most previous works tackle such problems by transforming the constrained problem into an unconstrained one that can be solved by techniques developed in the unconstrained setting. However, they only provide asymptotic convergence analysis for their methods. In this work, we provide a non-asymptotic analysis for solving constrained nonconvex nonsmooth optimization. We first generalize the classical gradient mapping and the Frank–Wolfe gap to the nonsmooth setting. Then we introduce novel notions of approximate stationarity in terms of these generalized quantities. We also propose several stochastic zeroth-order algorithms for the problem, along with non-asymptotic convergence guarantees for attaining the proposed approximate stationarity. Finally, we conduct numerical experiments that demonstrate the effectiveness of our algorithms.

NeurIPS Conference 2023 Conference Paper

Block Broyden's Methods for Solving Nonlinear Equations

  • Chengchang Liu
  • Cheng Chen
  • Luo Luo
  • John C. S. Lui

This paper studies quasi-Newton methods for solving nonlinear equations. We propose block variants of both the good and bad Broyden's methods, which enjoy explicit local superlinear convergence rates. Our block good Broyden's method has a faster condition-number-free convergence rate than existing Broyden's methods because it takes advantage of multiple-rank modifications of the Jacobian estimator. On the other hand, our block bad Broyden's method directly estimates the inverse of the Jacobian, which provably reduces the computational cost per iteration. Our theoretical results provide new insights into why the good Broyden's method outperforms the bad Broyden's method in most cases. The empirical results also demonstrate the superiority of our methods and validate our theoretical analysis.
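
For reference, the classical rank-1 "good" Broyden update that these block variants generalize is sketched below on a toy system; the block versions replace the rank-1 correction with a multiple-rank one.

```python
import numpy as np

def broyden_good(F, x, J, tol=1e-10, iters=50):
    """Classical 'good' Broyden method: rank-1 update of the Jacobian estimate J."""
    for _ in range(iters):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(J, -Fx)                   # quasi-Newton step
        x_new = x + s
        y = F(x_new) - Fx
        J = J + np.outer(y - J @ s, s) / (s @ s)      # good Broyden rank-1 update
        x = x_new
    return x

# Toy system F(x) = (exp(x0) - 1, x0 + x1) with root (0, 0).
F = lambda x: np.array([np.exp(x[0]) - 1.0, x[0] + x[1]])
print(broyden_good(F, np.array([0.5, 0.5]), np.eye(2)))
```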

ICML Conference 2023 Conference Paper

Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

  • Lesi Chen
  • Jing Xu 0027
  • Luo Luo

We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}[F(x; \xi)]$, where the component $F(x; \xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. A recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta, \epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.

JMLR Journal 2023 Journal Article

Multi-Consensus Decentralized Accelerated Gradient Descent

  • Haishan Ye
  • Luo Luo
  • Ziang Zhou
  • Tong Zhang

This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation complexity and near-optimal communication complexity. Our theoretical results give an affirmative answer to the open problem of whether there exists an algorithm whose communication complexity (nearly) matches the lower bound depending on the global condition number instead of the local one. Furthermore, the linear convergence of our algorithms only depends on the strong convexity of the global objective and does not require the local functions to be convex. The design of our methods relies on a novel integration of well-known techniques including Nesterov's acceleration, multi-consensus, and gradient tracking. Empirical studies show that our methods outperform existing approaches in machine learning applications.
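
A bare-bones version of the gradient-tracking building block named above (without Nesterov's acceleration or multi-consensus) can be sketched on quadratic local objectives as follows; the three-node topology, mixing matrix W, and step size are arbitrary illustrative choices.

```python
import numpy as np

def decentralized_gradient_tracking(grads, W, x0, eta=0.1, iters=500):
    """Gradient tracking: each node mixes iterates with neighbors via W and tracks
    the average gradient with an auxiliary variable s (no acceleration shown)."""
    n = len(grads)
    x = np.array([x0.copy() for _ in range(n)])           # one iterate per node
    s = np.array([g(x[i]) for i, g in enumerate(grads)])  # gradient trackers
    for _ in range(iters):
        x_new = W @ x - eta * s
        s = W @ s + np.array([g(x_new[i]) - g(x[i]) for i, g in enumerate(grads)])
        x = x_new
    return x.mean(axis=0)

# Three nodes with local objectives f_i(x) = 0.5*||x - c_i||^2.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda x, c=c: x - c for c in centers]
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
print(decentralized_gradient_tracking(grads, W, np.zeros(2)))   # close to the mean of centers, (1, 1)
```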

NeurIPS Conference 2022 Conference Paper

Faster Stochastic Algorithms for Minimax Optimization under Polyak-Łojasiewicz Condition

  • Lesi Chen
  • Boyuan Yao
  • Luo Luo

This paper considers stochastic first-order algorithms for minimax optimization under Polyak-Łojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x, y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x, y)$, where the objective function $f(x, y)$ is $\mu_x$-PL in $x$ and $\mu_y$-PL in $y$, and each $f_i(x, y)$ is $L$-smooth. We prove that SPIDER-GDA can find an $\epsilon$-approximate solution within ${\mathcal O}\left((n + \sqrt{n}\, \kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$, where $\kappa_x\triangleq L/\mu_x$ and $\kappa_y\triangleq L/\mu_y$. For the ill-conditioned case, we provide an accelerated algorithm to further reduce the computational cost. It achieves a $\tilde{{\mathcal O}}\big((n+\sqrt{n}\, \kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$ SFO upper bound when $\kappa_x\geq\sqrt{n}$. Our ideas can also be applied to the more general setting in which the objective function satisfies the PL condition with respect to only one variable. Numerical experiments validate the superiority of the proposed methods.

NeurIPS Conference 2022 Conference Paper

Finding Second-Order Stationary Points in Nonconvex-Strongly-Concave Minimax Optimization

  • Luo Luo
  • Yujun Li
  • Cheng Chen

We study the smooth minimax optimization problem $\min_{\bf x}\max_{\bf y} f({\bf x}, {\bf y})$, where $f$ is $\ell$-smooth, strongly-concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. Most existing works focus on finding a first-order stationary point of the function $f({\bf x}, {\bf y})$ or its primal function $P({\bf x})\triangleq \max_{\bf y} f({\bf x}, {\bf y})$, but few of them focus on achieving a second-order stationary point, which is essential for nonconvex problems. In this paper, we propose a novel approach for minimax optimization, called Minimax Cubic Newton (MCN), which can find an ${\mathcal O}\left(\varepsilon, \kappa^{1.5}\sqrt{\rho\varepsilon}\right)$-second-order stationary point of $P({\bf x})$ using ${\mathcal O}\left(\kappa^{1.5}\sqrt{\rho}\varepsilon^{-1.5}\right)$ second-order oracle calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle calls, where $\kappa$ is the condition number and $\rho$ is the Lipschitz constant of the Hessian of $f({\bf x}, {\bf y})$. In addition, we propose an inexact variant of MCN for high-dimensional problems to avoid calling the expensive second-order oracles. Instead, our method solves the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability but only requires $\tilde{\mathcal O}\left(\kappa^{1.5}\ell\varepsilon^{-2}\right)$ Hessian-vector oracle calls and $\tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\right)$ first-order oracle calls. To the best of our knowledge, this is the first work that considers the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without convex-concave assumptions.

NeurIPS Conference 2022 Conference Paper

Quasi-Newton Methods for Saddle Point Problems

  • Chengchang Liu
  • Luo Luo

This paper studies quasi-Newton methods for strongly-convex-strongly-concave saddle point problems. We propose random Broyden family updates, which have an explicit local superlinear convergence rate of ${\mathcal O}\big(\big(1-1/(d\varkappa^2)\big)^{k(k-1)/2}\big)$, where $d$ is the dimension of the problem, $\varkappa$ is the condition number, and $k$ is the number of iterations. The design and analysis of the proposed algorithm are based on estimating the square of the indefinite Hessian matrix, which differs from classical quasi-Newton methods in convex optimization. We also present two specific Broyden family algorithms with BFGS-type and SR1-type updates, which enjoy a faster local convergence rate of $\mathcal O\big(\big(1-1/d\big)^{k(k-1)/2}\big)$. Our numerical experiments show that the proposed algorithms outperform classical first-order methods.

JMLR Journal 2021 Journal Article

Approximate Newton Methods

  • Haishan Ye
  • Luo Luo
  • Zhihua Zhang

Many machine learning models involve solving optimization problems. Thus, it is important to address large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost per iteration while commanding a high convergence rate. Other efficient stochastic second-order methods have also been proposed. However, the convergence properties of these methods are still not well understood. There are also several important gaps between the current convergence theory and the empirical performance in real applications. In this paper, we aim to fill these gaps. We propose a unifying framework to analyze both local and global convergence properties of second-order methods. Accordingly, we present theoretical results that match the empirical performance in real applications well.

AAAI Conference 2021 Conference Paper

Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices

  • Luo Luo
  • Cheng Chen
  • Guangzeng Xie
  • Haishan Ye

We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario where the algorithm can only take one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and outperforms other deterministic sketching methods empirically. In this paper, we provide a tighter error bound for COD whose leading term accounts for the potential approximate low-rank structure and the correlation of the input matrices. We prove COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. Experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.

NeurIPS Conference 2020 Conference Paper

Decentralized Accelerated Proximal Gradient Descent

  • Haishan Ye
  • Ziang Zhou
  • Luo Luo
  • Tong Zhang

Decentralized optimization has wide applications in machine learning, signal processing, and control. In this paper, we study the decentralized composite optimization problem with a non-smooth regularization term. Many proximal gradient based decentralized algorithms have been proposed in the past. However, these algorithms do not achieve near optimal computational complexity and communication complexity. In this paper, we propose a new method which establishes the optimal computational complexity and a near optimal communication complexity. Our empirical study shows that the proposed algorithm outperforms existing state-of-the-art algorithms.
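
The non-smooth regularizer in such composite problems is typically handled through its proximal operator; for an $\ell_1$ term this is soft-thresholding, shown below as a single centralized proximal gradient step rather than the decentralized scheme of the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient_step(x, grad, eta, lam):
    """One proximal gradient step for a composite objective f(x) + lam*||x||_1."""
    return soft_threshold(x - eta * grad, eta * lam)

x = np.array([0.8, -0.05, 2.0])
print(prox_gradient_step(x, grad=np.zeros(3), eta=1.0, lam=0.1))   # small entries are zeroed out
```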

IJCAI Conference 2020 Conference Paper

Efficient and Robust High-Dimensional Linear Contextual Bandits

  • Cheng Chen
  • Luo Luo
  • Weinan Zhang
  • Yong Yu
  • Yijiang Lian

The linear contextual bandit is a sequential decision-making problem where an agent chooses among actions given their corresponding contexts. As large-scale data sets become increasingly common, we study linear contextual bandits in high-dimensional settings. Recent works focus on employing matrix sketching methods to accelerate contextual bandits. However, the matrix approximation error brings additional terms to the regret bound. In this paper we first propose a novel matrix sketching method called Spectral Compensation Frequent Directions (SCFD). We then propose an efficient approach for contextual bandits by adopting SCFD to approximate the covariance matrices. By maintaining and manipulating sketched matrices, our method only needs O(md) space and O(md) updating time in each round, where d is the dimensionality of the data and m is the sketching size. Theoretical analysis reveals that our method has better regret bounds than previous methods in high-dimensional cases. Experimental results demonstrate the effectiveness of our algorithm and verify our theoretical guarantees.

NeurIPS Conference 2020 Conference Paper

Efficient Projection-free Algorithms for Saddle Point Problems

  • Cheng Chen
  • Luo Luo
  • Weinan Zhang
  • Yong Yu

The Frank-Wolfe algorithm is a classic method for constrained optimization problems. It has recently become popular in many machine learning applications because its projection-free property leads to more efficient iterations. In this paper, we study projection-free algorithms for convex-strongly-concave saddle point problems with complicated constraints. Our method combines Conditional Gradient Sliding with Mirror-Prox, and we show that it only requires $\tilde{\mathcal{O}}(1/\sqrt{\epsilon})$ gradient evaluations and $\tilde{\mathcal{O}}(1/\epsilon^2)$ linear optimizations in the batch setting. We also extend our method to the stochastic setting and propose the first stochastic projection-free algorithms for saddle point problems. Experimental results demonstrate the effectiveness of our algorithms and verify our theoretical guarantees.
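
The projection-free primitive shared by Frank-Wolfe and Conditional Gradient Sliding is a linear minimization oracle over the constraint set; a plain Frank-Wolfe loop over the $\ell_1$ ball is sketched below, while the paper's saddle-point method combines such oracles with Mirror-Prox (not shown).

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, iters=200):
    """Plain Frank-Wolfe over the l1 ball: each step calls only a linear
    minimization oracle (a signed coordinate vertex), never a projection."""
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        i = np.argmax(np.abs(g))
        v = np.zeros_like(x)
        v[i] = -radius * np.sign(g[i])         # vertex minimizing <g, v> over the l1 ball
        gamma = 2.0 / (k + 2.0)                # standard step-size schedule
        x = (1 - gamma) * x + gamma * v
    return x

grad = lambda x: x - np.array([0.3, -0.2, 0.9])   # gradient of 0.5*||x - b||^2
print(frank_wolfe_l1(grad, np.zeros(3)))
```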

ICML Conference 2020 Conference Paper

Lower Complexity Bounds for Finite-Sum Convex-Concave Minimax Optimization Problems

  • Guangzeng Xie
  • Luo Luo
  • Yijiang Lian
  • Zhihua Zhang 0004

This paper studies the lower bound complexity for minimax optimization problems whose objective function is the average of $n$ individual smooth convex-concave functions. We consider algorithms that have access to a gradient and proximal oracle for each individual component. For the strongly-convex-strongly-concave case, we prove that such an algorithm cannot reach an $\varepsilon$-suboptimal point in fewer than $\Omega\left((n+\kappa)\log(1/\varepsilon)\right)$ iterations, where $\kappa$ is the condition number of the objective function. This lower bound matches the upper bound of the existing incremental first-order oracle algorithm stochastic variance-reduced extragradient. We develop a novel construction to show this result, which partitions the tridiagonal matrix of classical examples into $n$ groups. This construction is friendly to the analysis of incremental gradient and proximal oracles, and we also extend the analysis to general convex-concave cases.

JMLR Journal 2020 Journal Article

Nesterov's Acceleration for Approximate Newton

  • Haishan Ye
  • Luo Luo
  • Zhihua Zhang

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted considerable attention because of their low computational cost in each iteration. However, these methods might suffer from poor performance when the Hessian is hard to approximate well in a computationally efficient way. To overcome this dilemma, we resort to Nesterov's acceleration to improve the convergence performance of these second-order methods and propose accelerated approximate Newton. We give a theoretical convergence analysis of accelerated approximate Newton and show that Nesterov's acceleration can improve the convergence rate. Accordingly, we propose an accelerated regularized sub-sampled Newton (ARSSN) method, which performs much better than the conventional regularized sub-sampled Newton both empirically and theoretically. Moreover, we show that ARSSN has better performance than classical first-order methods empirically.

NeurIPS Conference 2020 Conference Paper

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

  • Luo Luo
  • Haishan Ye
  • Zhichao Huang
  • Tong Zhang

We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x}, {\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$ and ${\mathcal Y}$ is a convex and compact set. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications as special cases, such as robust optimization and adversarial training. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $\Phi(\cdot)=\max_{\bf y\in{\mathcal Y}} f(\cdot, {\bf y})$. The most popular algorithm to solve this problem is stochastic gradient descent ascent, which requires $\mathcal O(\kappa^3\varepsilon^{-4})$ stochastic gradient evaluations, where $\kappa$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(\kappa^3\varepsilon^{-3})$, and its dependency on $\varepsilon$ is optimal for this problem.
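
The baseline named in the abstract, stochastic gradient descent ascent, is easy to state; a deterministic toy version on an objective that is strongly concave in y is sketched below, with SREDA's recursive variance-reduced gradient estimates omitted.

```python
def gradient_descent_ascent(gx, gy, x, y, eta_x=0.02, eta_y=0.1, iters=2000):
    """Plain (deterministic) gradient descent ascent: descend in x, ascend in y."""
    for _ in range(iters):
        x = x - eta_x * gx(x, y)
        y = y + eta_y * gy(x, y)
    return x, y

# f(x, y) = x*y - 0.5*y^2 is strongly concave in y; max_y gives Phi(x) = 0.5*x^2,
# so the iterates should approach (0, 0).
gx = lambda x, y: y          # df/dx
gy = lambda x, y: x - y      # df/dy
print(gradient_descent_ascent(gx, gy, x=1.0, y=0.0))
```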

JMLR Journal 2019 Journal Article

Robust Frequent Directions with Application in Online Learning

  • Luo Luo
  • Cheng Chen
  • Zhihua Zhang
  • Wu-Jun Li
  • Tong Zhang

The frequent directions (FD) technique is a deterministic approach for online sketching that has many applications in machine learning. The conventional FD is a heuristic procedure that often outputs rank-deficient matrices. To overcome the rank deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD) by introducing a regularization term. RFD can be derived from an optimization problem. It updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. The experimental studies demonstrate that the proposed method outperforms state-of-the-art second-order online learning algorithms.
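
The base routine being robustified is the standard Frequent Directions sketch; a compact version is below, while RFD additionally maintains an adaptive regularization term updated jointly with the shrinkage (omitted here).

```python
import numpy as np

def frequent_directions(A, ell):
    """Standard Frequent Directions: maintain an ell-row sketch B with B^T B ~ A^T A."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))   # shrink all squared singular values
            B = np.diag(s) @ Vt                             # the last sketch row becomes zero again
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row                               # insert the new row into a free slot
    return B

A = np.random.default_rng(0).normal(size=(100, 10))
B = frequent_directions(A, ell=4)
print(np.linalg.norm(A.T @ A - B.T @ B, 2))                 # B^T B approximates A^T A
```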

ICML Conference 2017 Conference Paper

Approximate Newton Methods and Their Local Convergence

  • Haishan Ye
  • Luo Luo
  • Zhihua Zhang 0004

Many machine learning models are reformulated as optimization problems. Thus, it is important to solve large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention for optimization due to their efficiency at each iteration, rectifying a weakness of the ordinary Newton method, which suffers a high cost per iteration while commanding a high convergence rate. Other efficient stochastic second-order methods have also been proposed. However, the convergence properties of these methods are still not well understood. There are also several important gaps between the current convergence theory and the performance in real applications. In this paper, we aim to fill these gaps. We propose a unifying framework to analyze the local convergence properties of second-order methods. Based on this framework, our theoretical analysis matches the performance in real applications.

AAAI Conference 2017 Conference Paper

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

  • Zihao Chen
  • Luo Luo
  • Zhihua Zhang

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.

IJCAI Conference 2016 Conference Paper

Frequent Direction Algorithms for Approximate Matrix Multiplication with Applications in CCA

  • Qiaomin Ye
  • Luo Luo
  • Zhihua Zhang

Approximate matrix multiplication (AMM) has become increasingly popular because it makes matrix computation suitable for large-scale datasets. Most previous AMM methods are based on the idea of random selection or random projection. In this paper, we propose a deterministic algorithm, FD-AMM, for computing an approximation to the product of two given matrices. Moreover, the algorithm works in a streaming manner. In particular, our approach is inspired by a recently proposed matrix sketching algorithm called Frequent Directions (FD). FD-AMM has a stronger error bound than both random selection and random projection algorithms with respect to the same space complexity. Our approach also leads to an algorithm for computing the Canonical Correlation Analysis (CCA) of two matrices exactly in a streaming manner, which takes less space than the classical method. Experimental results validate the effectiveness of our method.

UAI Conference 2016 Conference Paper

Quasi-Newton Hamiltonian Monte Carlo

  • Tianfan Fu
  • Luo Luo
  • Zhihua Zhang 0004

The Hamiltonian Monte Carlo (HMC) method has become significantly popular in recent years. It is a state-of-the-art MCMC sampler due to its more efficient exploration of the parameter space than standard random-walk proposals. The key idea behind HMC is that it makes use of first-order gradient information about the target distribution. In this paper, we propose a novel dynamics that uses second-order geometric information about the desired distribution. The second-order information is estimated by a quasi-Newton method (e.g., the BFGS method), so it does not bring a heavy computational burden. Moreover, our theoretical analysis guarantees that this dynamics leaves the target distribution invariant. As a result, the proposed quasi-Newton Hamiltonian Monte Carlo (QNHMC) algorithm traverses the parameter space more efficiently than standard HMC and produces a less correlated series of samples. Finally, empirical evaluation on simulated data verifies the effectiveness and efficiency of our approach. We also apply QNHMC to Bayesian logistic regression and online Bayesian matrix factorization problems.
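
For reference, one standard HMC transition with leapfrog integration and a Metropolis correction is sketched below on a Gaussian toy target; QNHMC modifies the dynamics with quasi-Newton curvature estimates, which this sketch does not include.

```python
import numpy as np

def hmc_step(log_p, grad_log_p, x, step=0.1, n_leapfrog=20, rng=np.random.default_rng(0)):
    """One standard HMC transition: leapfrog integration plus a Metropolis correction."""
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new = p_new + 0.5 * step * grad_log_p(x_new)         # half step for the momentum
    for _ in range(n_leapfrog - 1):
        x_new = x_new + step * p_new                        # full step for the position
        p_new = p_new + step * grad_log_p(x_new)            # full step for the momentum
    x_new = x_new + step * p_new
    p_new = p_new + 0.5 * step * grad_log_p(x_new)
    # Accept or reject based on the change in total energy.
    log_accept = (log_p(x_new) - 0.5 * p_new @ p_new) - (log_p(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

log_p = lambda x: -0.5 * x @ x        # standard 2D Gaussian target (up to a constant)
grad_log_p = lambda x: -x
x = np.array([3.0, -3.0])
for _ in range(100):
    x = hmc_step(log_p, grad_log_p, x)
```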

JMLR Journal 2016 Journal Article

SPSD Matrix Approximation via Column Selection: Theories, Algorithms, and Extensions

  • Shusen Wang
  • Luo Luo
  • Zhihua Zhang

Symmetric positive semidefinite (SPSD) matrix approximation is an important problem with applications in kernel methods. However, existing SPSD matrix approximation methods such as the Nyström method only have weak error bounds. In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds. We call it the prototype model because it has more efficient and effective extensions, some of which are highly scalable. Though the prototype model itself is not suitable for large-scale data, it is still useful to study its properties, on which the analysis of its extensions relies. This paper offers novel theoretical analysis, efficient algorithms, and a highly accurate extension. First, we establish a lower error bound for the prototype model and improve the error bound of an existing column selection algorithm to match the lower bound. In this way, we obtain the first optimal column selection algorithm for the prototype model. We also prove that the prototype model is exact under certain conditions. Second, we develop a simple column selection algorithm with a provable error bound. Third, we propose a so-called spectral shifting model to make the approximation more accurate when the eigenvalues of the matrix decay slowly, and the improvement is theoretically quantified. The spectral shifting method can also be applied to improve other SPSD matrix approximation models.
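
The column-selection model analyzed here approximates an SPSD matrix from a subset of its columns as $C W^{+} C^\top$; a minimal sketch with uniform column sampling is below, whereas the paper studies more careful selection rules and their error bounds.

```python
import numpy as np

def nystrom_approx(K, idx):
    """SPSD approximation from selected columns: K ~ C W^+ C^T,
    where C = K[:, idx] and W = K[idx][:, idx]."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Toy SPSD matrix from an RBF kernel on random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)
idx = rng.choice(50, size=10, replace=False)          # uniform column sampling
print(np.linalg.norm(K - nystrom_approx(K, idx)) / np.linalg.norm(K))
```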

ICML Conference 2015 Conference Paper

Support Matrix Machines

  • Luo Luo
  • Yubo Xie
  • Zhihua Zhang 0004
  • Wu-Jun Li

In many classification problems, such as electroencephalogram (EEG) classification and image classification, the input features are naturally represented as matrices rather than vectors or scalars. In general, the structure information of the original feature matrix is useful and informative for data analysis tasks such as classification. One typical kind of structure information is the correlation between columns or rows of the feature matrix. To leverage this structure information, we propose a new classification method that we call the support matrix machine (SMM). Specifically, SMM is defined as a hinge loss plus a so-called spectral elastic net penalty, which is a spectral extension of the conventional elastic net over a matrix. The spectral elastic net enjoys a grouping-effect property, i.e., strongly correlated columns or rows tend to be selected or dropped together. Since the optimization problem for SMM is convex, we devise an alternating direction method of multipliers (ADMM) algorithm for solving it. Experimental results on EEG and face image classification data show that our model is more robust and efficient than state-of-the-art methods.
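
One standard building block for such low-rank-inducing penalties is singular value thresholding, the proximal operator of the nuclear norm; a minimal sketch of that step is below (the full SMM solver alternates it with hinge-loss and consensus updates, which are not shown).

```python
import numpy as np

def singular_value_threshold(W, tau):
    """Proximal operator of tau*||.||_* : soft-threshold the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt          # equivalent to U @ diag(s) @ Vt

W = np.random.default_rng(0).normal(size=(6, 4))
print(np.linalg.matrix_rank(singular_value_threshold(W, tau=1.5)))   # thresholding lowers the rank
```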