Author name cluster

Zebang Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

EWRL Workshop 2025 Workshop Paper

Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning

Riccardo De Santi
Marin Vlastelica
Ya-Ping Hsieh
Zebang Shen
Niao He
Andreas Krause

Adapting large-scale foundational flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as molecular design, protein docking, and creative image generation. Existing principled fine-tuning methods aim to maximize the expected reward of generated samples, while retaining knowledge from the pre-trained model via KL-divergence regularization. In this work, we tackle the significantly more general problem of optimizing general utilities beyond average rewards, including risk-averse and novelty-seeking reward maximization, diversity measures for exploration, and experiment design objectives among others. Likewise, we consider more general ways to preserve prior information beyond KL-divergence, such as optimal transport distances and Rényi divergences. To this end, we introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks, each solvable via scalable established methods. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.

NeurIPS Conference 2025 Conference Paper

Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning

Riccardo De Santi
Marin Vlastelica
Ya-Ping Hsieh
Zebang Shen
Niao He
Andreas Krause

Adapting large-scale foundational flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as molecular design, protein docking, and creative image generation. Existing principled fine-tuning methods aim to maximize the expected reward of generated samples, while retaining knowledge from the pre-trained model via KL-divergence regularization. In this work, we tackle the significantly more general problem of optimizing general utilities beyond average rewards, including risk-averse and novelty-seeking reward maximization, diversity measures for exploration, and experiment design objectives among others. Likewise, we consider more general ways to preserve prior information beyond KL-divergence, such as optimal transport distances and Rényi divergences. To this end, we introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks, each solvable via scalable established methods. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.

ICLR Conference 2025 Conference Paper

Learning to Steer Markovian Agents under Model Uncertainty

Jiawei Huang
Vinzenz Thoma
Zebang Shen
Heinrich H. Nax
Niao He

Designing incentives for an adapting population is a ubiquitous problem in a wide array of economic applications and beyond. In this work, we study how to design additional rewards to steer multi-agent systems towards desired policies \emph{without} prior knowledge of the agents' underlying learning dynamics. Motivated by the limitation of existing works, we consider a new and general category of learning dynamics called \emph{Markovian agents}. We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. Importantly, we focus on learning a \emph{history-dependent} steering strategy to handle the inherent model uncertainty about the agents' learning dynamics. We introduce a novel objective function to encode the desiderata of achieving a good steering outcome with reasonable cost. Theoretically, we identify conditions for the existence of steering strategies to guide agents to the desired policies. Complementing our theoretical contributions, we provide empirical algorithms to approximately solve our objective, which effectively tackles the challenge in learning history-dependent strategies. We demonstrate the efficacy of our algorithms through empirical evaluations.

ICML Conference 2025 Conference Paper

Provable Maximum Entropy Manifold Exploration via Diffusion Models

Riccardo De Santi
Marin Vlastelica
Ya-Ping Hsieh
Zebang Shen
Niao He
Andreas Krause 0001

Exploration is critical for solving real-world decision-making problems such as scientific discovery, where the objective is to generate truly novel designs rather than mimic existing data distributions. In this work, we address the challenge of leveraging the representational power of generative models for exploration without relying on explicit uncertainty quantification. We introduce a novel framework that casts exploration as entropy maximization over the approximate data manifold implicitly defined by a pre-trained diffusion model. Then, we present a novel principle for exploration based on density estimation, a problem well-known to be challenging in practice. To overcome this issue and render this method truly scalable, we leverage a fundamental connection between the entropy of the density induced by a diffusion model and its score function. Building on this, we develop an algorithm based on mirror descent that solves the exploration problem as sequential fine-tuning of a pre-trained diffusion model. We prove its convergence to the optimal exploratory diffusion model under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we empirically evaluate our approach on both synthetic and high-dimensional text-to-image diffusion, demonstrating promising results.

NeurIPS Conference 2025 Conference Paper

Scalable Neural Incentive Design with Parameterized Mean-Field Approximation

Nathan Corecco
Batuhan Yardim
Vinzenz Thoma
Zebang Shen
Niao He

Designing incentives for a multi-agent system to induce a desirable Nash equilibrium is both a crucial and challenging problem appearing in many decision-making domains, especially for a large number of agents $N$. Under the exchangeability assumption, we formalize this incentive design (ID) problem as a parameterized mean-field game (PMFG), aiming to reduce complexity via an infinite-population limit. We first show that when dynamics and rewards are Lipschitz, the finite-$N$ ID objective is approximated by the PMFG at rate $\mathcal{O}(\frac{1}{\sqrt{N}})$. Moreover, beyond the Lipschitz-continuous setting, we prove the same $\mathcal{O}(\frac{1}{\sqrt{N}})$ decay for the important special case of sequential auctions, despite discontinuities in dynamics, through a tailored auction-specific analysis. Built on our novel approximation results, we further introduce our Adjoint Mean-Field Incentive Design (AMID) algorithm, which uses explicit differentiation of iterated equilibrium operators to compute gradients efficiently. By uniting approximation bounds with optimization guarantees, AMID delivers a powerful, scalable algorithmic tool for many-agent (large $N$) ID. Across diverse auction settings, the proposed AMID method substantially increases revenue over first-price formats and outperforms existing benchmark methods.

EWRL Workshop 2024 Workshop Paper

Learning to Steer Markovian Agents under Model Uncertainty

Jiawei Huang
Vinzenz Thoma
Zebang Shen
Heinrich H. Nax
Niao He

We study reward design for steering multi-agent systems towards desired equilibria \emph{without} prior knowledge of the agents' underlying policy learning dynamics model. We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. To handle the model uncertainty, we contribute a new optimization objective targeting at learning a \emph{history-dependent} steering strategy, and establish guarantees on its optimal solution. Theoretically, we identify conditions for the existence of steering strategies to guide agents sufficiently close to the desired policies. Complementing our theoretical contributions, we provide empirically tractable algorithms to approximately solve our objective, which effectively tackles the challenge in efficiently learning history-dependent strategies. We demonstrate the efficacy of our algorithms through empirical evaluations.

NeurIPS Conference 2024 Conference Paper

Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Chenhao Zhou
Zebang Shen
Chao Zhang
Hanbin Zhao
Hui Qian

In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (\SDEPO) for two-player zero-sum stochastic Markov games with continuous state space and finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximations to the state-action value function. We explicitly analyze the approximation error in policy evaluation, and show that \SDEPO\ achieves an $\tilde{O}(\frac{1}{(1-\gamma)^3\epsilon})$ last-iterate convergence to the $\epsilon-$optimal Nash equilibrium, which is independent of the cardinality of the state space. The complexity result matches the best-known results for global convergence of policy gradient algorithms for single agent setting. Moreover, we also propose a practical variant of \SDEPO\ to deal with continuous action space and empirical results demonstrate the practical superiority of the proposed method.

PDF Details DOI

AAAI Conference 2023 Conference Paper

CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems

Jiahao Xie
Chao Zhang
Zebang Shen
Weijie Liu
Hui Qian

Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require the full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to the low client availability. In addition, CDMA is incorporated with a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Entropy-dissipation Informed Neural Network for McKean-Vlasov Type PDEs

Zebang Shen
Zhenfu Wang

The McKean-Vlasov equation (MVE) describes the collective behavior of particles subject to drift, diffusion, and mean-field interaction. In physical systems, the interaction term can be singular, i. e. it diverges when two particles collide. Notable examples of such interactions include the Coulomb interaction, fundamental in plasma physics, and the Biot-Savart interaction, present in the vorticity formulation of the 2D Navier-Stokes equation (NSE) in fluid dynamics. Solving MVEs that involve singular interaction kernels presents a significant challenge, especially when aiming to provide rigorous theoretical guarantees. In this work, we propose a novel approach based on the concept of entropy dissipation in the underlying system. We derive a potential function that effectively controls the KL divergence between a hypothesis solution and the ground truth. Building upon this theoretical foundation, we introduce the Entropy-dissipation Informed Neural Network (EINN) framework for solving MVEs. In EINN, we utilize neural networks (NN) to approximate the underlying velocity field and minimize the proposed potential function. By leveraging the expressive power of NNs, our approach offers a promising avenue for tackling the complexities associated with singular interactions. To assess the empirical performance of our method, we compare EINN with SOTA NN-based MVE solvers. The results demonstrate the effectiveness of our approach in solving MVEs across various example problems.

ICLR Conference 2023 Conference Paper

Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning

Zebang Shen
Jiayuan Ye 0001
Anmin Kang
Seyed Hamed Hassani
Reza Shokri

Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating the risk of this information leakage, using state of the art differentially private algorithms, also does not come for free. Randomized mechanisms can prevent convergence of models on learning even the useful representation functions, especially if there is more disagreement between local models on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model, with differential privacy guarantees, while separately allowing sufficient freedom for local personalization (without releasing it). We prove that in the linear representation setting, while the objective is non-convex, our proposed new algorithm \DPFEDREP\ converges to a ball centered around the \emph{global optimal} solution at a linear rate, and the radius of the ball is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. We empirically evaluate our method with the image classification task on CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over the prior work under the same small privacy budget. The code can be found in this link, https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.

TMLR Journal 2023 Journal Article

Straggler-Resilient Personalized Federated Learning

Isidoros Tziotis
Zebang Shen
Ramtin Pedarsani
Hamed Hassani
Aryan Mokhtari

Federated Learning is an emerging learning paradigm that allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. Despite its success, federated learning faces several challenges related to its decentralized nature. In this work, we develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles, namely (i) data heterogeneity, i.e., data distributions can vary substantially across clients, and (ii) system heterogeneity, i.e., the computational power of the clients could differ significantly. By leveraging previous works in the realm of representation learning (Collins et al., 2021; Liang et al., 2020), our method constructs a global common representation utilizing the data from all clients. Additionally, it learns a user-specific set of parameters resulting in a personalized solution for each individual client. Furthermore, it mitigates the effects of stragglers by adaptively selecting clients based on their computational characteristics, thus achieving for the first time near optimal sample complexity and provable logarithmic speedup. Experimental results support our theoretical findings showing the superiority of our method over alternative personalized federated schemes in system and data heterogeneous environments.

ICLR Conference 2022 Conference Paper

An Agnostic Approach to Federated Learning with Class Imbalance

Zebang Shen
Juan Cerviño
Seyed Hamed Hassani
Alejandro Ribeiro

Federated Learning (FL) has emerged as the tool of choice for training deep models over heterogeneous and decentralized datasets. As a reflection of the experiences from different clients, severe class imbalance issues are observed in real-world FL problems. Moreover, there exists a drastic mismatch between the imbalances from the local and global perspectives, i.e. a local majority class can be the minority of the population. Additionally, the privacy requirement of FL poses an extra challenge, as one should handle class imbalance without identifying the minority class. In this paper we propose a novel agnostic constrained learning formulation to tackle the class imbalance problem in FL, without requiring further information beyond the standard FL objective. A meta algorithm, CLIMB, is designed to solve the target optimization problem, with its convergence property analyzed under certain oracle assumptions. Through an extensive empirical study over various data heterogeneity and class imbalance configurations, we showcase that CLIMB considerably improves the performance in the minority class without compromising the overall accuracy of the classifier, which significantly outperforms previous arts. In fact, we observe the greatest performance boost in the most difficult scenario where every client only holds data from one class. The code can be found here https://github.com/shenzebang/Federated-Learning-Pytorch.

AAAI Conference 2022 Conference Paper

From One to All: Learning to Match Heterogeneous and Partially Overlapped Graphs

Weijie Liu
Hui Qian
Chao Zhang
Jiahao Xie
Zebang Shen
Nenggan Zheng

Recent years have witnessed a flurry of research activity in graph matching, which aims at finding the correspondence of nodes across two graphs and lies at the heart of many artificial intelligence applications. However, matching heterogeneous graphs with partial overlap remains a challenging problem in real-world applications. This paper proposes the first practical learning-to-match method to meet this challenge. The proposed unsupervised method adopts a novel partial optimal transport paradigm to learn a transport plan and node embeddings simultaneously. In a from-one-to-all manner, the entire learning procedure is decomposed into a series of easy-tosolve sub-procedures, each of which only handles the alignment of a single type of nodes. A mechanism for searching the transport mass is also proposed. Experimental results demonstrate that the proposed method outperforms state-ofthe-art graph matching methods.

AAAI Conference 2021 Conference Paper

A Hybrid Stochastic Gradient Hamiltonian Monte Carlo Method

Chao Zhang
Zhijian Li
Zebang Shen
Jiahao Xie
Hui Qian

Recent theoretical analyses reveal that existing Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods need large mini-batches of samples (exponentially dependent on the dimension) to reduce the mean square error of gradient estimates and ensure non-asymptotic convergence guarantees when the target distribution has a nonconvex potential function. In this paper, we propose a novel SG-MCMC algorithm, called Hybrid Stochastic Gradient Hamiltonian Monte Carlo (HSG-HMC) method, which needs merely one sample per iteration and possesses a simple structure with only one hyperparameter. Such improvement leverages a hybrid stochastic gradient estimator that exploits historical stochastic gradient information to control the mean square error. Theoretical analyses show that our method obtains the best-known overall sample complexity to achieve epsilon-accuracy in terms of the 2-Wasserstein distance for sampling from distributions with nonconvex potential functions. Empirical studies on both simulated and real-world datasets demonstrate the advantage of our method.

IJCAI Conference 2020 Conference Paper

Accelerating Stratified Sampling SGD by Reconstructing Strata

Weijie Liu
Hui Qian
Chao Zhang
Zebang Shen
Jiahao Xie
Nenggan Zheng

In this paper, a novel stratified sampling strategy is designed to accelerate the mini-batch SGD. We derive a new iteration-dependent surrogate which bound the stochastic variance from above. To keep the strata minimizing this surrogate with high probability, a stochastic stratifying algorithm is adopted in an adaptive manner, that is, in each iteration, strata are reconstructed only if an easily verifiable condition is met. Based on this novel sampling strategy, we propose an accelerated mini-batch SGD algorithm named SGD-RS. Our theoretical analysis shows that the convergence rate of SGD-RS is superior to the state-of-the-art. Numerical experiments corroborate our theory and demonstrate that SGD-RS achieves at least 3. 48-times speed-ups compared to vanilla minibatch SGD.

PDF Details DOI

AAAI Conference 2020 Conference Paper

Aggregated Gradient Langevin Dynamics

Chao Zhang
Jiahao Xie
Zebang Shen
Peilin Zhao
Tengfei Zhou
Hui Qian

In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for the Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a uniﬁed analysis for different data accessing (e. g. random access, cyclic access and random reshufﬂe) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the ﬁrst time that bounds for I/O friendly strategies such as cyclic access and random reshufﬂe have been established in the MCMC literature. The theoretic results also indicate that methods in AGLD possess the merits of both the low periteration computational complexity and the short mixture time. Empirical studies demonstrate that our framework allows to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

AAAI Conference 2020 Conference Paper

Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Jiahao Xie
Zebang Shen
Chao Zhang
Boyu Wang
Hui Qian

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-round computational costs. To ﬁll this gap, two efﬁcient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-round computational costs. Experimental results demonstrate the efﬁciency of the proposed methods compared to state-of-the-arts.

NeurIPS Conference 2020 Conference Paper

Sinkhorn Barycenter via Functional Gradient Descent

Zebang Shen
Zhenfu Wang
Alejandro Ribeiro
Hamed Hassani

In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence. This problem has recently found applications across various domains, including graphics, learning, and vision, as it provides a meaningful mechanism to aggregate knowledge. Unlike previous approaches which directly operate in the space of probability measures, we recast the Sinkhorn barycenter problem as an instance of unconstrained functional optimization and develop a novel functional gradient descent method named \texttt{Sinkhorn Descent} (\texttt{SD}). We prove that \texttt{SD} converges to a stationary point at a sublinear rate, and under reasonable assumptions, we further show that it asymptotically finds a global minimizer of the Sinkhorn barycenter problem. Moreover, by providing a mean-field analysis, we show that \texttt{SD} preserves the {weak convergence} of empirical measures. Importantly, the computational complexity of \texttt{SD} scales linearly in the dimension $d$ and we demonstrate its scalability by solving a $100$-dimensional Sinkhorn barycenter problem.

NeurIPS Conference 2020 Conference Paper

Sinkhorn Natural Gradient for Generative Models

Zebang Shen
Zhenfu Wang
Alejandro Ribeiro
Hamed Hassani

We consider the problem of minimizing a functional over a parametric family of probability measures, where the parameterization is characterized via a push-forward structure. An important application of this problem is in training generative adversarial networks. In this regard, we propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence. We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically with respect to the desired accuracy. This is in sharp contrast to existing natural gradient methods that can only be carried out approximately. Moreover, in practical applications when only Monte-Carlo type integration is available, we design an empirical estimator for SIM and provide the stability analysis. In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate its efficiency and efficacy of our method.

ICML Conference 2019 Conference Paper

Hessian Aided Policy Gradient

Zebang Shen
Alejandro Ribeiro
Seyed Hamed Hassani
Hui Qian 0001
Chao Mi

Reducing the variance of estimators for policy gradient has long been the focus of reinforcement learning research. While classic algorithms like REINFORCE find an $\epsilon$-approximate first-order stationary point in $\OM({1}/{\epsilon^4})$ random trajectory simulations, no provable improvement on the complexity has been made so far. This paper presents a Hessian aided policy gradient method with the first improved sample complexity of $\OM({1}/{\epsilon^3})$. While our method exploits information from the policy Hessian, it can be implemented in linear time with respect to the parameter dimension and is hence applicable to sophisticated DNN parameterization. Simulations on standard tasks validate the efficiency of our method.

NeurIPS Conference 2019 Conference Paper

Stochastic Continuous Greedy ++: When Upper and Lower Bounds Match

Amin Karbasi
Hamed Hassani
Aryan Mokhtari
Zebang Shen

In this paper, we develop \scg~(\text{SCG}{$++$}), the first efficient variant of a conditional gradient method for maximizing a continuous submodular function subject to a convex constraint. Concretely, for a monotone and continuous DR-submodular function, \SCGPP achieves a tight $[(1-1/e)\OPT -\epsilon]$ solution while using $O(1/\epsilon^2)$ stochastic gradients and $O(1/\epsilon)$ calls to the linear optimization oracle. The best previously known algorithms either achieve a suboptimal $[(1/2)\OPT -\epsilon]$ solution with $O(1/\epsilon^2)$ stochastic gradients or the tight $[(1-1/e)\OPT -\epsilon]$ solution with suboptimal $O(1/\epsilon^3)$ stochastic gradients. We further provide an information-theoretic lower bound to showcase the necessity of $\OM({1}/{\epsilon^2})$ stochastic oracle queries in order to achieve $[(1-1/e)\OPT -\epsilon]$ for monotone and DR-submodular functions. This result shows that our proposed \SCGPP enjoys optimality in terms of both approximation guarantee, i. e. , $(1-1/e)$ approximation factor, and stochastic gradient evaluations, i. e. , $O(1/\epsilon^2)$ calls to the stochastic oracle. By using stochastic continuous optimization as an interface, we also show that it is possible to obtain the $[(1-1/e)\OPT-\epsilon]$ tight approximation guarantee for maximizing a monotone but stochastic submodular set function subject to a general matroid constraint after at most $\mathcal{O}(n^2/\epsilon^2)$ calls to the stochastic function value, where $n$ is the number of elements in the ground set.

IJCAI Conference 2018 Conference Paper

JUMP: a Jointly Predictor for User Click and Dwell Time

Tengfei Zhou
Hui Qian
Zebang Shen
Chao Zhang
Chengwei Wang
Shichen Liu
Wenwu Ou

With the recent proliferation of recommendation system, there have been a lot of interests in session-based prediction methods, particularly those based on Recurrent Neural Network (RNN) and their variants. However, existing methods either ignore the dwell time prediction that plays an important role in measuring user's engagement on the content, or fail to process very short or noisy sessions. In this paper, we propose a joint predictor, JUMP, for both user click and dwell time in session-based settings. To map its input into a feature vector, JUMP adopts a novel three-layered RNN structure which includes a fast-slow layer for very short sessions and an attention layer for noisy sessions. Experiments demonstrate that JUMP outperforms state-of-the-art methods in both user click and dwell time prediction.

ICML Conference 2018 Conference Paper

Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication

Zebang Shen
Aryan Mokhtari
Tengfei Zhou
Peilin Zhao
Hui Qian 0001

Recently, the decentralized optimization problem is attracting growing attention. Most existing methods are deterministic with high per-iteration cost and have a convergence rate quadratically depending on the problem condition number. Besides, the dense communication is necessary to ensure the convergence even if the dataset is sparse. In this paper, we generalize the decentralized optimization problem to a monotone operator root finding problem, and propose a stochastic algorithm named DSBA that (1) converges geometrically with a rate linearly depending on the problem condition number, and (2) can be implemented using sparse communication only. Additionally, DSBA handles important learning problems like AUC-maximization which can not be tackled efficiently in the previous problem setting. Experiments on convex minimization and AUC-maximization validate the efficiency of our method.

IJCAI Conference 2017 Conference Paper

Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

Zebang Shen
Hui Qian
Tongzhou Mu
Chao Zhang

Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problem for learning tasks. While enjoying a provably superior convergence rate, in each iteration, such algorithm only accesses a mini batch of samples and meanwhile updates a small block of variable coordinates, which substantially reduces the amount of memory reference when both the massive sample size and ultra-high dimensionality are involved. Specifically, to obtain an ε-accurate solution, our algorithm requires only O(log(1/ε)/sqrt(ε)) overall computation for the general convex case and O((n+sqrt{nκ})log(1/ε)) for the strongly convex case. Empirical studies on huge scale datasets are conducted to illustrate the efficiency of our method in practice.

IJCAI Conference 2017 Conference Paper

Tensor Completion with Side Information: A Riemannian Manifold Approach

Tengfei Zhou
Hui Qian
Zebang Shen
Chao Zhang
Congfu Xu

By restricting the iterate on a nonlinear manifold, the recently proposed Riemannian optimization methods prove to be both efficient and effective in low rank tensor completion problems. However, existing methods fail to exploit the easily accessible side information, due to their format mismatch. Consequently, there is still room for improvement. To fill the gap, in this paper, a novel Riemannian model is proposed to tightly integrate the original model and the side information by overcoming their inconsistency. For this model, an efficient Riemannian conjugate gradient descent solver is devised based on a new metric that captures the curvature of the objective. Numerical experiments suggest that our method is more accurate than the state-of-the-art without compromising the efficiency.

IJCAI Conference 2016 Conference Paper

Adaptive Variance Reducing for Stochastic Gradient Descent

Zebang Shen
Hui Qian
Tengfei Zhou
Tongzhou Mu

Variance Reducing (VR) stochastic methods are fast-converging alternatives to the classical Stochastic Gradient Descent (SGD) for solving large-scale regularized finite sum problems, especially when a highly accurate solution is required. One critical step in VR is the function sampling. State-of-the-art VR algorithms such as SVRG and SAGA, employ either Uniform Probability (UP) or Importance Probability (IP), which is deficient in reducing the variance and hence leads to suboptimal convergence rate. In this paper, we propose a novel sampling scheme that explicitly computes some Adaptive Probability (AP) at each iteration. Analysis shows that, equipped with AP, both SVRG and SAGA yield provably better convergence rate than the ones with UP or IP, which is confirmed in experiments. Additionally, to cut down the per iteration computation load, an efficient variant is proposed by utilizing AP periodically, whose performance is empirically validated.

AAAI Conference 2016 Conference Paper

Fast Hybrid Algorithm for Big Matrix Recovery

Tengfei Zhou
Hui Qian
Zebang Shen
Congfu Xu

Large-scale Nuclear Norm penalized Least Square problem (NNLS) is frequently encountered in estimation of low rank structures. In this paper we accelerate the solution procedure by combining non-smooth convex optimization with smooth Riemannian method. Our methods comprise of two phases. In the ﬁrst phase, we use Alternating Direction Method of Multipliers (ADMM) both to identify the ﬁx rank manifold where an optimum resides and to provide an initializer for the subsequent reﬁnement. In the second phase, two superlinearly convergent Riemannian methods: Riemannian NewTon (NT) and Riemannian Conjugate Gradient descent (CG) are adopted to improve the approximation over a ﬁx rank manifold. We prove that our Hybrid method of ADMM and NT (HADMNT) converges to an optimum of NNLS at least quadratically. The experiments on large-scale collaborative ﬁltering datasets demonstrate very competitive performance of these fast hybrid methods compared to the state-of-the-arts.

IJCAI Conference 2015 Conference Paper

Simple Atom Selection Strategy for Greedy Matrix Completion

Zebang Shen
Hui Qian
Tengfei Zhou
Song Wang

In this paper we focus on the greedy matrix completion problem. A simple atom selection strategy is proposed to find the optimal atom in each iteration by alternating minimization. Based on this per-iteration strategy, we devise a greedy algorithm and establish an upper bound of the approximating error. To evaluate different weight refinement methods, several variants are designed. We prove that our algorithm and three of its variants have the property of linear convergence. Experiments of Recommendation and Image Recovery are conducted to make empirical evaluation with promising results. The proposed algorithm takes only 700 seconds to process Yahoo Music dataset in PC, and achieves a root mean square error 24. 5 on the test set.