Author name cluster

Wenhao Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers

2 author rows

AAAI Conference 2026 Conference Paper

Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models

Yuchen Zhou
Jiayu Tang
Shuo Yang
Xiaoyan Xiao
Yuqin Dai
Wenhao Yang
Chao Gou
Xiaobo Xia

Vision-Language Models (VLMs), exemplified by CLIP, have emerged as foundational for multimodal intelligence. However, their capacity for logical understanding remains significantly underexplored, resulting in critical **''logical blindspots''** that limit their reliability in practical applications. To systematically diagnose this, we introduce **LogicBench**, a comprehensive benchmark with over 50,000 vision-language pairs across 9 logical categories and 4 diverse scenarios: images, videos, anomaly detection, and medical diagnostics. Our evaluation reveals that existing VLMs, even the state-of-the-art ones, fall at over 40 accuracy points below human performance, particularly in challenging tasks like Causality and Conditionality, highlighting their reliance on surface semantics over critical logical structures. To bridge this gap, we propose **LogicCLIP**, a novel training framework designed to boost VLMs' logical sensitivity through advancements in both data generation and optimization objectives. LogicCLIP utilizes logic-aware data generation and a contrastive learning strategy that combines coarse-grained alignment, a fine-grained multiple-choice objective, and a novel logical structure-aware objective. Extensive experiments demonstrate LogicCLIP's substantial improvements in logical comprehension across all LogicBench domains, significantly outperforming baselines. Moreover, LogicCLIP retains, and often surpasses, competitive performance on general vision-language benchmarks, demonstrating that the enhanced logical understanding does not come at the expense of general alignment. We believe LogicBench and LogicCLIP will be important resources for advancing VLM logical capabilities.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

All You Need is One: Capsule Prompt Tuning with a Single Vector

Yiyang Liu
James Liang
Heng Fan
Wenhao Yang
Yiming Cui
Xiaotian Han
Lifu Huangg
Dongfang Liu

Prompt-based learning has emerged as a parameter-efficient finetuning (PEFT) approach to facilitate Large Language Model (LLM) adaptation to downstream tasks by conditioning generation with task-aware guidance. Despite its successes, current prompt-based learning methods heavily rely on laborious grid searching for optimal prompt length and typically require considerable number of prompts, introducing additional computational burden. Worse yet, our pioneer findings indicate that the task-aware prompt design is inherently limited by its absence of instance-aware information, leading to a subtle attention interplay with the input sequence. In contrast, simply incorporating instance-aware information as a part of the guidance can enhance the prompt-tuned model performance without additional fine-tuning. Moreover, we find an interesting phenomenon, namely "attention anchor", that incorporating instance-aware tokens at the earliest position of the sequence can successfully preserve strong attention to critical structural information and exhibit more active attention interaction with all input tokens. In light of our observation, we introduce Capsule Prompt-Tuning (CaPT), an efficient and effective solution that leverages off-the-shelf, informative instance semantics into prompt-based learning. Our approach innovatively integrates both instance-aware and task-aware information in a nearly parameter-free manner (i. e. , one single capsule prompt). Empirical results demonstrate that our method can exhibit superior performance across various language tasks (e. g. , 84. 03\% average accuracy on T5-Large), serving as an "attention anchor, " while enjoying high parameter efficiency (e. g. , 0. 003\% of model parameters on Llama3. 2-1B).

PDF Details

NeurIPS Conference 2025 Conference Paper

Factor Decorrelation Enhanced Data Removal from Deep Predictive Models

Wenhao Yang
Lin Li
Xiaohui Tao
Kaize Shi

The imperative of user privacy protection and regulatory compliance necessitates sensitive data removal in model training, yet this process often induces distributional shifts that undermine model performance-particularly in out-of-distribution (OOD) scenarios. We propose a novel data removal approach that enhances deep predictive models through factor decorrelation and loss perturbation. Our approach introduces: (1) a discriminative-preserving factor decorrelation module employing dynamic adaptive weight adjustment and iterative representation updating to reduce feature redundancy and minimize inter-feature correlations. (2) a smoothed data removal mechanism with loss perturbation that creates information-theoretic safeguards against data leakage during removal operations. Extensive experiments on five benchmark datasets show that our approach outperforms other baselines and consistently achieves high predictive accuracy and robustness even under significant distribution shifts. The results highlight its superior efficiency and adaptability in both in-distribution and out-of-distribution scenarios.

PDF Details

ICLR Conference 2025 Conference Paper

Neuron based Personality Trait Induction in Large Language Models

Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Xin Zhao 0018
Ji-Rong Wen

Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a neuron based approach for personality trait induction in LLMs, with three major technical contributions. First, we construct PERSONALITYBENCH, a large-scale dataset for identifying and evaluating personality traits in LLMs. This dataset is grounded in the Big Five personality traits from psychology and designed to assess the generative capabilities of LLMs towards specific personality traits. Second, by leveraging PERSONALITYBENCH, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that manipulates the values of these identified personality-related neurons, which enables fine-grained control over the traits exhibited by LLMs without training and modifying model parameters. Extensive experiments validates the efficacy of our neuron identification and trait induction methods. Notably, our approach achieves comparable performance as fine-tuned models, offering a more efficient and flexible solution for personality trait induction in LLMs.

Details

ICML Conference 2025 Conference Paper

Revisiting Differentially Private Algorithms for Decentralized Online Learning

Xiaoyu Wang
Wenhao Yang
Chang Yao 0001
Mingli Song
Yuanyu Wan

Although the differential privacy (DP) of decentralized online learning has garnered considerable attention recently, existing algorithms are unsatisfactory due to their inability to achieve $(\epsilon, 0)$-DP over all $T$ rounds, recover the optimal regret in the non-private case, and maintain the lightweight computation under complex constraints. To address these issues, we first propose a new decentralized online learning algorithm satisfying $(\epsilon, 0)$-DP over $T$ rounds, and show that it can achieve $\widetilde{O}(n(\rho^{-1/4}+\epsilon^{-1}\rho^{1/4})\sqrt{T})$ and $\widetilde{O}(n (\rho^{-1/2}+\epsilon^{-1}))$ regret bounds for convex and strongly convex functions respectively, where $n$ is the number of local learners and $\rho$ is the spectral gap of the communication matrix. As long as $\epsilon=\Omega(\sqrt{\rho})$, these bounds nearly match existing lower bounds in the non-private case, which implies that $(\epsilon, 0)$-DP of decentralized online learning may be ensured nearly for free. Our key idea is to design a block-decoupled accelerated gossip strategy that can be incorporated with the classical tree-based private aggregation, and also enjoys a faster average consensus among local learners. Furthermore, we develop a projection-free variant of our algorithm to keep the efficiency under complex constraints. As a trade-off, the above regret bounds degrade to $\widetilde{O}(n(T^{3/4}+\epsilon^{-1}T^{1/4}))$ and $\widetilde{O}(n(T^{2/3}+\epsilon^{-1}))$ respectively, which however are even better than the existing private centralized projection-free online algorithm.

Details

IJCAI Conference 2025 Conference Paper

Smoothed Online Convex Optimization with Delayed Feedback

Sifan Yang
Wenhao Yang
Wei Jiang
Yuanyu Wan
Lijun Zhang

Smoothed online convex optimization (SOCO), in which the online player incurs both a hitting cost and a switching cost for changing its decisions, has garnered significant attention in recent years. While existing studies typically assume that the gradient information is revealed immediately, such an assumption may not hold in some real-world applications. To overcome this limitation, we investigate SOCO with delayed feedback, and develop two online algorithms that can minimize the dynamic regret with switching cost. Firstly, we extend Mild-OGD, an existing algorithm that adopts the meta-expert framework for online convex optimization with delayed feedback, to account for switching cost. Specifically, we analyze the switching cost in the expert-algorithm of Mild-OGD, and then modify its meta-algorithm to incorporate this cost when assigning the weight to each expert. We demonstrate that our proposed method, Smelt-DOGD can achieve an O(√(dT(P_T+1))) dynamic regret bound with switching cost, where d is the maximum delay and P_T is the path-length. Secondly, we develop an efficient variant to reduce the number of projections per round from O(log T) to 1, yet maintaining the same theoretical guarantee. The key idea is to construct a new surrogate loss defined over a simpler domain for expert-algorithms so that these experts do not need to perform the complex projection operations in each round. Finally, we conduct experiments to validate the effectiveness and efficiency of our algorithms.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Towards Unbiased Information Extraction and Adaptation in Cross-Domain Recommendation

Yibo Wang
Yingchun Jian
Wenhao Yang
Shiyin Lu
Lei Shen
Bing Wang
Xiaoyi Zeng
Lijun Zhang

Cross-Domain Recommendation (CDR) leverages additional knowledge from auxiliary domains to address the long-standing data sparsity issue. However, existing methods typically acquire this knowledge by minimizing the average loss over all domains, overlooking the fact that different domains possess different user-preference distributions. As a result, the acquired knowledge may contain biased information from data-rich domains, leading to performance degradation in data-scarce domains. In this paper, we propose a novel CDR method, which takes domain distinctions into consideration to extract and adapt unbiased information. Specifically, our method consists of two key components: Unbiased Information Extraction (UIE) and Unbiased Information Adaptation (UIA). In the UIE, inspired by distributionally robust optimization, we optimize the worst-case performance across all domains to extract domain-invariant information, preventing the potential bias from auxiliary domains. In the UIA, we introduce a new user-item attention module, which employs domain-specific information from historically interacted items to attend the adaptation of domain-invariant information. To verify the effectiveness of our method, we conduct extensive experiments on three real-world datasets, each of which contains three extremely sparse domains. Experimental results demonstrate the considerable superiority of our proposed method compared to baselines.

PDF Details DOI

UAI Conference 2024 Conference Paper

Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions

Patrick K. Kuiper
Ali Hasan
Wenhao Yang
Yuting Ng
Hoda Bidkhori
Jose H. Blanchet
Vahid Tarokh

The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by definition scarce, the potential for model misspecification error is inherent to these applications, thus DRO estimators are natural. In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes. We study both tractable convex formulations for some problems of interest (e. g. CVaR) and more general neural network based estimators. Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques. Additionally, the proposed method is applied to a real data set of financial returns for comparison to a previous analysis. We established the proposed model as a novel formulation in the multivariate EVT domain, and innovative with respect to performance when compared to relevant alternate proposals.

Details

NeurIPS Conference 2024 Conference Paper

Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Wei Jiang
Sifan Yang
Wenhao Yang
Lijun Zhang

Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Non-stationary Projection-Free Online Learning with Dynamic and Adaptive Regret Guarantees

Yibo Wang
Wenhao Yang
Wei Jiang
Shiyin Lu
Bing Wang
Haihong Tang
Yuanyu Wan
Lijun Zhang

Projection-free online learning has drawn increasing interest due to its efficiency in solving high-dimensional problems with complicated constraints. However, most existing projection-free online methods focus on minimizing the static regret, which unfortunately fails to capture the challenge of changing environments. In this paper, we investigate non-stationary projection-free online learning, and choose dynamic regret and adaptive regret to measure the performance. Specifically, we first provide a novel dynamic regret analysis for an existing projection-free method named BOGD_IP, and establish an O(T^¾ (1+P_T)) dynamic regret bound, where P_T denotes the path-length of the comparator sequence. Then, we improve the upper bound to O(T^¾ (1+P_T)^¼) by running multiple BOGD_IP algorithms with different step sizes in parallel, and tracking the best one on the fly. Our results are the first general-case dynamic regret bounds for projection-free online learning, and can recover the existing O(T^¾) static regret by setting P_T = 0. Furthermore, we propose a projection-free method to attain an O(?^¾) adaptive regret bound for any interval with length?, which nearly matches the static regret over that interval. The essential idea is to maintain a set of BOGD_IP algorithms dynamically, and combine them by a meta algorithm. Moreover, we demonstrate that it is also equipped with an O(T^¾ (1+P_T)^¼) dynamic regret bound. Finally, empirical studies verify our theoretical findings.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Online Composite Optimization Between Stochastic and Adversarial Environments

Yibo Wang
Sijia Chen
Wei Jiang
Wenhao Yang
Yuanyu Wan
Lijun Zhang

We study online composite optimization under the Stochastically Extended Adversarial (SEA) model. Specifically, each loss function consists of two parts: a fixed non-smooth and convex regularizer, and a time-varying function which can be chosen either stochastically, adversarially, or in a manner that interpolates between the two extremes. In this setting, we show that for smooth and convex time-varying functions, optimistic composite mirror descent (OptCMD) can obtain an $\mathcal{O}(\sqrt{\sigma_{1: T}^2} + \sqrt{\Sigma_{1: T}^2})$ regret bound, where $\sigma_{1: T}^2$ and $\Sigma_{1: T}^2$ denote the cumulative stochastic variance and the cumulative adversarial variation of time-varying functions, respectively. For smooth and strongly convex time-varying functions, we establish an $\mathcal{O}((\sigma_{\max}^2 + \Sigma_{\max}^2)\log(\sigma_{1: T}^2 + \Sigma_{1: T}^2))$ regret bound, where $\sigma_{\max}^2$ and $\Sigma_{\max}^2$ denote the maximal stochastic variance and the maximal adversarial variation, respectively. For smooth and exp-concave time-varying functions, we achieve an $\mathcal{O}(d \log (\sigma_{1: T}^2 + \Sigma_{1: T}^2))$ bound where $d$ denotes the dimensionality. Moreover, to deal with the unknown function type in practical problems, we propose a multi-level \textit{universal} algorithm that is able to achieve the desirable bounds for three types of time-varying functions simultaneously. It should be noticed that all our findings match existing bounds for the SEA model without the regularizer, which implies that there is \textit{no price} in regret bounds for the benefits gained from the regularizer.

PDF Details DOI

ICML Conference 2024 Conference Paper

Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization

Wei Jiang 0029
Sifan Yang
Wenhao Yang
Yibo Wang 0005
Yuanyu Wan
Lijun Zhang 0005

This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.

Details

IROS Conference 2024 Conference Paper

RADAR: Robotics Assembly by Demonstration via Augmented Reality

Wenhao Yang
Shi Bai
Yunbo Zhang

With the widespread adoption of robots in high-mix, low-volume manufacturing, and the challenges posed by long-horizon assembly tasks, we introduce the RADAR system—an integrated human-robot collaboration system for Robotic Assembly by Demonstration via Augmented Reality. Existing frameworks lack a comprehensive, cross-task framework for effective assembly collaboration, limiting their applicability in complex tasks. We designed the RADAR system’s conceptual model, detailing its workflow and components. The system integrates human input into robotic metal beam assembly through augmented reality interactions and interfaces. We also developed a task planner that dynamically adjusts human-robot assembly tasks at coarse-fine resolutions. Validating through practical scenarios, particularly the RAMP assembly benchmark, showed that human involvement significantly enhances assembly precision and success rates, proving RADAR’s effectiveness and efficiency in human-robot collaborative assembly.

Details

ICML Conference 2024 Conference Paper

Small-loss Adaptive Regret for Online Convex Optimization

Wenhao Yang
Wei Jiang 0029
Yibo Wang 0005
Ping Yang
Yao Hu 0002
Lijun Zhang 0005

To deal with changing environments, adaptive regret has been proposed to minimize the regret over every interval. Previous studies have established a small-loss adaptive regret bound for general convex functions under the smoothness condition, offering the advantage of being much tighter than minimax rates for benign problems. However, it remains unclear whether similar bounds are attainable for other types of convex functions, such as exp-concave and strongly convex functions. In this paper, we first propose a novel algorithm that achieves a small-loss adaptive regret bound for exp-concave and smooth function. Subsequently, to address the limitation that existing algorithms can only handle one type of convex functions, we further design a universal algorithm capable of delivering small-loss adaptive regret bounds for general convex, exp-concave, and strongly convex functions simultaneously. That is challenging because the universal algorithm follows the meta-expert framework, and we need to ensure that upper bounds for both meta-regret and expert-regret are of small-loss types. Moreover, we provide a novel analysis demonstrating that our algorithms are also equipped with minimax adaptive regret bounds when functions are non-smooth.

Details

NeurIPS Conference 2024 Conference Paper

Universal Online Convex Optimization with $1$ Projection per Round

Wenhao Yang
Yibo Wang
Peng Zhao
Lijun Zhang

To address the uncertainty in function types, recent progress in online convex optimization (OCO) has spurred the development of universal algorithms that simultaneously attain minimax rates for multiple types of convex functions. However, for a $T$-round online problem, state-of-the-art methods typically conduct $O(\log T)$ projections onto the domain in each round, a process potentially time-consuming with complicated feasible sets. In this paper, inspired by the black-box reduction of Cutkosky and Orabona [2018], we employ a surrogate loss defined over simpler domains to develop universal OCO algorithms that only require $1$ projection. Embracing the framework of prediction with expert advice, we maintain a set of experts for each type of functions and aggregate their predictions via a meta-algorithm. The crux of our approach lies in a uniquely designed expert-loss for strongly convex functions, stemming from an innovative decomposition of the regret into the meta-regret and the expert-regret. Our analysis sheds new light on the surrogate loss, facilitating a rigorous examination of the discrepancy between the regret of the original loss and that of the surrogate loss, and carefully controlling meta-regret under the strong convexity condition. With only $1$ projection per round, we establish optimal regret bounds for general convex, exponentially concave, and strongly convex functions simultaneously. Furthermore, we enhance the expert-loss to exploit the smoothness property, and demonstrate that our algorithm can attain small-loss regret for multiple types of convex and smooth functions.

PDF Details DOI

ICML Conference 2023 Conference Paper

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Toshinori Kitamura
Tadashi Kozuno
Yunhao Tang
Nino Vieillard
Michal Valko
Wenhao Yang
Jincheng Mei
Pierre Ménard

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an $\varepsilon$-optimal policy with probability $1-\delta$ under the settings of an infinite-horizon linear MDP, generative model, and G-optimal design. We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. Furthermore, we propose a practical VWLS algorithm for value-based deep RL, Deep Variance Weighting (DVW). Our experiments demonstrate that DVW improves the performance of popular value-based deep RL algorithms on a set of MinAtar benchmarks.

Details

ICML Conference 2023 Conference Paper

Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes

Chuhan Xie
Wenhao Yang
Zhihua Zhang

We study semiparametrically efficient estimation in off-policy evaluation (OPE) where the underlying Markov decision process (MDP) is linear with a known feature map. We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. Consistency and asymptotic normality of our estimator are established under mild conditions, which merely requires the only infinite-dimensional nuisance parameter to be estimated at a $n^{-1/4}$ convergence rate. We also construct an asymptotically valid confidence interval for statistical inference and conduct simulation studies to validate our results. To our knowledge, this is the first work that concerns efficient estimation in the presence of a known structure of MDPs in the OPE literature.

Details

NeurIPS Conference 2022 Conference Paper

Pluralistic Image Completion with Gaussian Mixture Models

Xiaobo Xia
Wenhao Yang
Jie Ren
Yewen Li
Yibing Zhan
Bo Han
Tongliang Liu

Pluralistic image completion focuses on generating both visually realistic and diverse results for image completion. Prior methods enjoy the empirical successes of this task. However, their used constraints for pluralistic image completion are argued to be not well interpretable and unsatisfactory from two aspects. First, the constraints for visual reality can be weakly correlated to the objective of image completion or even redundant. Second, the constraints for diversity are designed to be task-agnostic, which causes the constraints to not work well. In this paper, to address the issues, we propose an end-to-end probabilistic method. Specifically, we introduce a unified probabilistic graph model that represents the complex interactions in image completion. The entire procedure of image completion is then mathematically divided into several sub-procedures, which helps efficient enforcement of constraints. The sub-procedure directly related to pluralistic results is identified, where the interaction is established by a Gaussian mixture model (GMM). The inherent parameters of GMM are task-related, which are optimized adaptively during training, while the number of its primitives can control the diversity of results conveniently. We formally establish the effectiveness of our method and demonstrate it with comprehensive experiments. The implementationis available at https: //github. com/tmllab/PICMM.

PDF Details

NeurIPS Conference 2022 Conference Paper

Semi-infinitely Constrained Markov Decision Processes

Liangyu Zhang
Yang Peng
Wenhao Yang
Zhihua Zhang

We propose a generalization of constrained Markov decision processes (CMDPs) that we call the \emph{semi-infinitely constrained Markov decision process} (SICMDP). Particularly, in a SICMDP model, we impose a continuum of constraints instead of a finite number of constraints as in the case of ordinary CMDPs. We also devise a reinforcement learning algorithm for SICMDPs that we call SI-CRL. We first transform the reinforcement learning problem into a linear semi-infinitely programming (LSIP) problem and then use the dual exchange method in the LSIP literature to solve it. To the best of our knowledge, we are the first to apply tools from semi-infinitely programming (SIP) to solve reinforcement learning problems. We present theoretical analysis for SI-CRL, identifying its sample complexity and iteration complexity. We also conduct extensive numerical examples to illustrate the SICMDP model and validate the SI-CRL algorithm.

PDF Details

ICLR Conference 2020 Conference Paper

On the Convergence of FedAvg on Non-IID Data

Xiang Li 0050
Kaixuan Huang
Wenhao Yang
Shusen Wang
Zhihua Zhang 0004

Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimal.

Details

NeurIPS Conference 2019 Conference Paper

A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

Wenhao Yang
Xiang Li
Zhihua Zhang

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant entropy-regularized MDPs can be cast into our framework. Moreover, under our framework, many regularization terms can bring multi-modality and sparsity, which are potentially useful in reinforcement learning. In particular, we present sufficient and necessary conditions that induce a sparse optimal policy. We also conduct a full mathematical analysis of the proposed regularized MDPs, including the optimality condition, performance error, and sparseness control. We provide a generic method to devise regularization forms and propose off-policy actor critic algorithms in complex environment settings. We empirically analyze the numerical properties of optimal policies and compare the performance of different sparse regularization forms in discrete and continuous environments.

PDF Details