AAAI 2026 Conference Paper
Improving Generalization in Offline Meta-Reinforcement Learning via Cross-task Contexts
- Hongcai He
- Zetao Zheng
- Anjie Zhu
- Deqiang Ouyang
- Jie Shao
Context-based offline meta-reinforcement learning (meta-RL) is a paradigm that integrates meta-learning with offline reinforcement learning. It learns a strategy to extract task-specific contexts from trajectories of meta-training tasks and leverages this strategy to adapt to unseen target tasks. However, existing methods struggle to generate generalizable contexts for adaptation due to context shift, which arises from the context-based policy overfitting to offline data. We argue that leveraging the internal relationships among tasks, rather than treating each task in isolation, is crucial for mitigating the impact of context shift. Hence, we propose a framework of cross-task contexts for improving generalization in meta-RL (CTMRL). Specifically, we design a context quantization variational auto-encoder (CQ-VAE), which clusters the task-specific contexts of meta-training tasks into discrete codes based on the internal relationships among tasks. Cross-task contexts are constructed from these codes and reflect information shared across similar tasks. These cross-task contexts not only serve as high-level structures that capture similarity across tasks but also provide a foundation for hard contrastive learning, which enhances the distinguishability of similar yet distinct tasks, thereby improving the generalization of contexts and facilitating adaptation to unseen target tasks. Evaluations in meta-environments confirm the performance advantage of CTMRL over existing methods.
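The core clustering step of CQ-VAE, mapping a continuous task context to a discrete code shared across similar tasks, can be illustrated with a minimal sketch in the style of vector quantization. The function name `quantize_context`, the toy codebook, and the nearest-neighbor lookup are illustrative assumptions, not the paper's actual encoder or training objective:

```python
import numpy as np

def quantize_context(z, codebook):
    """Map a continuous context embedding z to its nearest codebook entry.

    Returns (index, code): the index of the closest code under L2 distance
    and the code vector itself. This mimics the discrete bottleneck of a
    VQ-VAE-style quantizer; tasks whose contexts map to the same code can
    then share a cross-task context built from that code.
    """
    dists = np.linalg.norm(codebook - z, axis=1)  # distance to each code
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy example: 3 codes in 2-D, and a context embedding close to code 1.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([0.9, 1.1])
idx, code = quantize_context(z, codebook)  # idx == 1
```

In a full implementation the codebook would be learned jointly with the context encoder (e.g., with a straight-through estimator and commitment loss, as in VQ-VAE), rather than fixed as here.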