Arrow Research

Author name cluster

Qian Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers (21)

AAAI Conference 2026 Conference Paper

Reliability-Guaranteed and Reward-Seeking Sequence Modeling for Model-Based Offline Reinforcement Learning

  • Shenghong He
  • Chao Yu
  • Qian Lin
  • Yile Liang
  • Donghui Li
  • Xuetao Ding

As a data-driven learning approach, model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset. Applying conservative quantification to the dynamics model, most existing works on MORL generate trajectories that approximate the real data distribution to facilitate policy learning. However, these methods typically overlook the influence of historical information on environmental dynamics, thus generating unreliable trajectories that fail to align with the true data distribution. In this paper, we propose a new MORL algorithm called Reliability-guaranteed and Reward-seeking Transformer (RT). RT can avoid generating unreliable trajectories through the calculation of cumulative reliability of the trajectories, which is a weighted variational distance between the generated trajectory distribution and the true data distribution. Moreover, by sampling candidate actions with high rewards, RT can efficiently generate high-reward trajectories from the existing offline data, thereby further facilitating policy learning. We theoretically prove the performance guarantees of RT in policy learning, and empirically demonstrate its effectiveness against state-of-the-art model-based methods on several offline benchmark tasks and a large-scale industrial dataset from an on-demand food delivery platform.
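
The trajectory-filtering idea in this abstract can be sketched in a few lines. Below is a minimal Python illustration, assuming a hypothetical per-step reliability score as a stand-in for the paper's weighted variational distance; the function names and the discounted accumulation are illustrative choices, not the authors' implementation.

```python
import numpy as np

def cumulative_reliability(step_reliabilities, discount=0.99):
    """Accumulate per-step reliability scores along a trajectory.

    `step_reliabilities[t]` is a hypothetical proxy for the per-step
    closeness between generated and true dynamics (not the paper's
    exact weighted variational distance).
    """
    weights = discount ** np.arange(len(step_reliabilities))
    return float(np.sum(weights * step_reliabilities))

def filter_trajectories(trajs, reliabilities, threshold):
    """Keep only trajectories whose cumulative reliability clears `threshold`."""
    return [traj for traj, r in zip(trajs, reliabilities)
            if cumulative_reliability(r) >= threshold]
```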

ICML Conference 2025 Conference Paper

Conservative Offline Goal-Conditioned Implicit V-Learning

  • Kaiqiang Ke
  • Qian Lin
  • Zongkai Liu
  • Shenghong He
  • Chao Yu 0004

Offline goal-conditioned reinforcement learning (GCRL) learns a goal-conditioned value function to train policies for diverse goals with pre-collected datasets. Hindsight experience replay addresses the issue of sparse rewards by treating intermediate states as goals but fails to complete goal-stitching tasks where achieving goals requires stitching different trajectories. While cross-trajectory sampling is a potential solution that associates states and goals belonging to different trajectories, we demonstrate that this direct method degrades performance in goal-conditioned tasks due to the overestimation of values on unconnected pairs. To this end, we propose Conservative Goal-Conditioned Implicit Value Learning (CGCIVL), a novel algorithm that introduces a penalty term to penalize value estimation for unconnected state-goal pairs and leverages the quasimetric framework to accurately estimate values for connected pairs. Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks.
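
A minimal PyTorch sketch of the conservative penalty described above: values on unconnected state-goal pairs are pushed down by pairing states with goals shuffled across the batch. The `value_fn` interface and the shuffling heuristic are assumptions for illustration, not the paper's exact construction.

```python
import torch

def conservative_penalty(value_fn, states, goals):
    """Penalize value estimates on (mostly) unconnected state-goal pairs.

    Shuffling goals across the batch pairs states with goals from other
    trajectories, a hypothetical stand-in for the paper's notion of
    unconnected pairs. Adding this term to the loss pushes those values down.
    """
    shuffled_goals = goals[torch.randperm(goals.shape[0])]
    return value_fn(states, shuffled_goals).mean()
```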

NeurIPS Conference 2025 Conference Paper

Consistency of Physics-Informed Neural Networks for Second-Order Elliptic Equations

  • Yuqian Cheng
  • Zhuo Chen
  • Qian Lin

Physics-informed neural networks (PINNs) are widely applied to solving differential equations. However, few studies have discussed their consistency. In this paper, we consider the consistency of PINNs when applied to second-order elliptic equations with Dirichlet boundary conditions. We first provide a necessary and sufficient condition for the consistency of the physics-informed kernel gradient flow algorithm; then, as a direct corollary, when the neural network is sufficiently wide, we obtain a necessary and sufficient condition for the consistency of PINNs based on neural tangent kernel theory. We also estimate non-asymptotic loss bounds for physics-informed kernel gradient flow and PINNs under suitable stronger assumptions. Finally, these results inspire us to construct a notable pathological example on which the PINN method is inconsistent.
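
For readers unfamiliar with the setup, the generic PINN objective for a second-order elliptic problem with Dirichlet boundary conditions can be sketched as follows. This PyTorch sketch shows the standard residual-plus-boundary loss for $-\Delta u = f$ with $u = g$ on the boundary; it illustrates the method studied, not the paper's kernel gradient flow analysis.

```python
import torch

def pinn_loss(net, interior, boundary, f, g):
    """Standard PINN loss for -Laplace(u) = f with Dirichlet data u = g.

    `net` maps (N, d) points to (N, 1) values; `interior` and `boundary`
    are sampled collocation points; `f` and `g` map (N, d) to (N, 1).
    """
    x = interior.clone().requires_grad_(True)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = 0.0
    for i in range(x.shape[1]):  # Laplacian = sum of pure second derivatives
        lap = lap + torch.autograd.grad(grad_u[:, i].sum(), x,
                                        create_graph=True)[0][:, i]
    residual = (-lap - f(x).squeeze(-1)) ** 2          # PDE residual on interior points
    boundary_err = (net(boundary).squeeze(-1)
                    - g(boundary).squeeze(-1)) ** 2    # Dirichlet mismatch on the boundary
    return residual.mean() + boundary_err.mean()
```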

AAAI Conference 2025 Conference Paper

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

  • Zongkai Liu
  • Qian Lin
  • Chao Yu
  • Xiawei Wu
  • Yile Liang
  • Donghui Li
  • Xuetao Ding

Offline Multi-Agent Reinforcement Learning (MARL) is an emerging field that aims to learn optimal multi-agent policies from pre-collected datasets. Compared to the single-agent case, the multi-agent setting involves a large joint state-action space and the coupled behaviors of multiple agents, which bring extra complexity to offline policy optimization. In this work, we revisit existing offline MARL methods and show that in certain scenarios they can be problematic, leading to uncoordinated behaviors and out-of-distribution (OOD) joint actions. To address these issues, we propose a new offline MARL algorithm, named In-Sample Sequential Policy Optimization (InSPO). InSPO sequentially updates each agent's policy in an in-sample manner, which not only avoids selecting OOD joint actions but also carefully accounts for teammates' updated policies to enhance coordination. Additionally, by thoroughly exploring low-probability actions in the behavior policy, InSPO can well address the issue of premature convergence to sub-optimal solutions. Theoretically, we prove that InSPO guarantees monotonic policy improvement and converges to a quantal response equilibrium (QRE). Experimental results demonstrate the effectiveness of our method compared to current state-of-the-art offline MARL methods.
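
The sequential, in-sample update pattern can be sketched structurally: agents are updated one at a time, and each later update conditions on teammates' already-updated policies. The `improve` interface below is hypothetical; it stands in for an in-sample policy-improvement step restricted to actions present in the dataset.

```python
def sequential_update(agents, batch):
    """Update agents one at a time; later agents see earlier agents' new policies.

    `agent.improve` is a hypothetical interface for an in-sample improvement
    step that evaluates only actions appearing in `batch`, so no
    out-of-distribution joint action is ever selected.
    """
    updated = {}
    for i, agent in enumerate(agents):
        updated[i] = agent.improve(batch, teammates={j: updated[j] for j in updated})
    return updated
```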

JMLR Journal 2025 Journal Article

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

  • Haobo Zhang
  • Yicheng Li
  • Weihao Lu
  • Qian Lin

Motivated by studies of neural networks, particularly the neural tangent kernel theory, we investigate the large-dimensional behavior of kernel ridge regression (KRR), where the sample size satisfies $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given a reproducing kernel Hilbert space $\mathcal{H}$ associated with an inner product kernel defined on the unit sphere $\mathbb{S}^{d}$, we assume that the true function $f_{\rho}^{*}$ belongs to the interpolation space $[\mathcal{H}]^{s}$ for some $s>0$ (source condition). We first establish the exact order (both upper and lower bounds) of the generalization error of KRR for the optimally chosen regularization parameter $\lambda$. Furthermore, we show that KRR is minimax optimal when $0 < s \le 1$, while for $s > 1$ KRR fails to achieve minimax optimality, exhibiting the saturation effect. Our results illustrate that the convergence rate with respect to the dimension $d$, varying along $\gamma$, exhibits a periodic plateau behavior, and the convergence rate with respect to the sample size $n$ exhibits a multiple descent behavior. Interestingly, our work unifies several recent studies on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$, respectively.
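
As a concrete reference point, plain kernel ridge regression with a dot-product kernel in the $n \asymp d^{\gamma}$ regime takes only a few lines of numpy. The polynomial kernel and the scaling below are illustrative choices, not the paper's exact setting.

```python
import numpy as np

def krr_fit_predict(X, y, X_test, lam, kernel=lambda A, B: (1.0 + A @ B.T) ** 2):
    """Kernel ridge regression: f(x) = k(x, X) (K + n*lam*I)^{-1} y.

    The default is a simple dot-product (polynomial) kernel, in the spirit
    of the inner product kernels on the sphere studied in the paper.
    """
    n = X.shape[0]
    K = kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return kernel(X_test, X) @ alpha

# Illustrative n ∝ d^gamma scaling with data normalized onto the sphere.
d, gamma = 50, 1.0
n = int(d ** gamma)
X = np.random.randn(n, d); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] + 0.1 * np.random.randn(n)
X_test = np.random.randn(5, d); X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
print(krr_fit_predict(X, y, X_test, lam=1e-3))
```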

NeurIPS Conference 2024 Conference Paper

An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

  • Qian Lin
  • Zongkai Liu
  • Danying Mo
  • Chao Yu

In recent years, significant progress has been made in multi-objective reinforcement learning (RL) research, which aims to balance multiple objectives by incorporating preferences for each objective. In most existing studies, specific preferences must be provided during deployment to indicate the desired policies explicitly. However, designing these preferences depends heavily on human prior knowledge, which is typically obtained through extensive observation of high-performing demonstrations with expected behaviors. In this work, we propose a simple yet effective offline adaptation framework for multi-objective RL problems without assuming handcrafted target preferences, but only given several demonstrations to implicitly indicate the preferences of expected policies. Additionally, we demonstrate that our framework can naturally be extended to meet constraints on safety-critical objectives by utilizing safe demonstrations, even when the safety thresholds are unknown. Empirical results on offline multi-objective and safe tasks demonstrate the capability of our framework to infer policies that align with real preferences while meeting the constraints implied by the provided demonstrations.

NeurIPS Conference 2024 Conference Paper

Improving Adaptivity via Over-Parameterization in Sequence Models

  • Yicheng Li
  • Qian Lin

It is well known that the eigenfunctions of a kernel play a crucial role in kernel regression. Through several examples, we demonstrate that even with the same set of eigenfunctions, the ordering of these functions significantly impacts regression outcomes. Simplifying the model by diagonalizing the kernel, we introduce over-parameterized gradient descent in the realm of sequence models to capture the effects of various orderings of a fixed set of eigenfunctions. Our theoretical results show that the over-parameterized gradient flow can adapt to the underlying structure of the signal and significantly outperform the vanilla gradient flow method. Moreover, we also demonstrate that deeper over-parameterization can further enhance the generalization capability of the model. These results not only provide a new perspective on the benefits of over-parameterization but also offer insights into the adaptivity and generalization potential of neural networks beyond the kernel regime.
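
The simplest instance of this setup is a diagonal sequence model with each coefficient factored as $\theta_k = u_k v_k$. The numpy sketch below (with illustrative step size and initialization) shows the qualitative effect: gradient descent from small initialization fits large coefficients quickly while keeping small ones near zero, unlike the vanilla parameterization.

```python
import numpy as np

def overparam_gd(y, lr=0.1, steps=2000, init=1e-3):
    """Fit theta ~ y in a diagonal sequence model via theta = u * v.

    Gradient descent on (u, v) from small initialization shrinks small
    coefficients far harder than it shrinks large ones, a simple form of
    the adaptivity discussed in the abstract.
    """
    u = np.full_like(y, init)
    v = np.full_like(y, init)
    for _ in range(steps):
        r = u * v - y                          # per-coordinate residual
        u, v = u - lr * r * v, v - lr * r * u  # gradients of 0.5 * r^2 w.r.t. u, v
    return u * v

y = np.array([2.0, 1.0, 0.05, 0.02])  # a few large signals, mostly small noise
print(overparam_gd(y))                # large entries fitted, small ones stay near zero
```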

ICLR Conference 2024 Conference Paper

Off-Policy Primal-Dual Safe Reinforcement Learning

  • Zifan Wu
  • Bo Tang 0018
  • Qian Lin
  • Chao Yu 0004
  • Shangqin Mao
  • Qianlong Xie
  • Xingxing Wang
  • Dong Wang 0022

Primal-dual safe RL methods commonly alternate between the primal update of the policy and the dual update of the Lagrange multiplier. Such a training paradigm is highly susceptible to error in cumulative cost estimation, since this estimation serves as the key bond connecting the primal and dual update processes. We show that this problem causes significant underestimation of cost when using off-policy methods, leading to failure to satisfy the safety constraint. To address this issue, we propose conservative policy optimization, which learns a policy in a constraint-satisfying area by considering the uncertainty in cost estimation. This improves constraint satisfaction but also potentially hinders reward maximization. We then introduce local policy convexification to help eliminate such suboptimality by gradually reducing the estimation uncertainty. We provide theoretical interpretations of the joint coupling effect of these two ingredients and further verify them by extensive experiments. Results on benchmark tasks show that our method not only achieves asymptotic performance comparable to state-of-the-art on-policy methods while using far fewer samples, but also significantly reduces constraint violation during training. Our code is available at https://github.com/ZifanWu/CAL.
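
A minimal sketch of the primal-dual pattern described above, with scalar stand-ins: the dual variable is updated against a pessimistic (mean plus uncertainty) cost estimate. The interface and the `kappa`-scaled uncertainty bonus are assumptions for illustration; the paper's local policy convexification is not shown.

```python
def lagrangian_step(reward_q, cost_q, cost_q_std, lam, budget,
                    lr_lam=0.01, kappa=1.0):
    """One dual step with a conservative (uncertainty-aware) cost estimate.

    Uses mean + kappa * std as a pessimistic cost value, in the spirit of
    conservative policy optimization (hypothetical scalar interface).
    """
    conservative_cost = cost_q + kappa * cost_q_std
    lam = max(0.0, lam + lr_lam * (conservative_cost - budget))  # dual ascent
    policy_objective = reward_q - lam * conservative_cost        # primal objective to maximize
    return lam, policy_objective
```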

JMLR Journal 2024 Journal Article

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

  • Yicheng Li
  • Zixiong Yu
  • Guhan Chen
  • Qian Lin

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb{S}^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks of different depths and various activation functions. After proving that the dynamics of training wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further illustrate the minimax optimality of the wide neural network provided that the ground-truth function $f\in [\mathcal{H}_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that the overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.

NeurIPS Conference 2024 Conference Paper

On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory

  • Guhan Chen
  • Yicheng Li
  • Qian Lin

This paper aims to discuss the impact of random initialization of neural networks in neural tangent kernel (NTK) theory, which is ignored by most recent works on the topic. It is well known that, as the network's width tends to infinity, the neural network with random initialization converges to a Gaussian process $f^{\mathrm{GP}}$, which takes values in $L^{2}(\mathcal{X})$, where $\mathcal{X}$ is the domain of the data. In contrast, to adopt the traditional theory of kernel regression, most recent works introduce a special mirrored architecture and a mirrored (random) initialization to ensure the network's output is identically zero at initialization. It therefore remains a question whether the conventional setting and the mirrored initialization make wide neural networks exhibit different generalization capabilities. In this paper, we first show that the training dynamics of the gradient flow of neural networks with random initialization converge uniformly to those of the corresponding NTK regression with random initialization $f^{\mathrm{GP}}$. We then show that $\mathbf{P}(f^{\mathrm{GP}} \in [\mathcal{H}^{\mathrm{NT}}]^{s}) = 1$ for any $s < \frac{3}{d+1}$ and $\mathbf{P}(f^{\mathrm{GP}} \in [\mathcal{H}^{\mathrm{NT}}]^{s}) = 0$ for any $s \geq \frac{3}{d+1}$, where $[\mathcal{H}^{\mathrm{NT}}]^{s}$ is the real interpolation space of the RKHS $\mathcal{H}^{\mathrm{NT}}$ associated with the NTK. Consequently, the generalization error of the wide neural network trained by gradient descent is $\Omega(n^{-\frac{3}{d+3}})$, and it still suffers from the curse of dimensionality. Thus, NTK theory may not explain the superior performance of neural networks.

JMLR Journal 2024 Journal Article

On the Optimality of Misspecified Spectral Algorithms

  • Haobo Zhang
  • Yicheng Li
  • Qian Lin

In the misspecified spectral algorithms problem, researchers usually assume that the ground-truth function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ for some $s\in (0,1)$. The existing minimax optimal results require $\|f_{\rho}^{*}\|_{L^{\infty}} < \infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all $s\in (0,1)$ is an outstanding problem that has lasted for years. In this paper, we show that spectral algorithms are minimax optimal for any $\alpha_{0}-\frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$.

NeurIPS Conference 2024 Conference Paper

On the Saturation Effects of Spectral Algorithms in Large Dimensions

  • Weihao Lu
  • Haobo Zhang
  • Yicheng Li
  • Qian Lin

The saturation effects, which originally refer to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the regression function is over-smooth, have been observed for almost 20 years and were rigorously proved recently for kernel ridge regression and some other spectral algorithms over a fixed-dimensional domain. The main focus of this paper is to explore the saturation effects for a large class of spectral algorithms (including KRR, gradient descent, etc.) in large-dimensional settings where $n \asymp d^{\gamma}$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large-dimensional settings and show that gradient flow with an early stopping strategy results in an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results for KRR, we can further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimally tuned) spectral algorithms with different qualifications $\tau$. In particular, we find that these exact rate curves (varying along $\gamma$) exhibit the periodic plateau behavior and the polynomial approximation barrier. Consequently, we can fully depict the saturation effects of the spectral algorithms and reveal a new phenomenon in large-dimensional settings (i.e., the saturation effect occurs in the large-dimensional setting as long as the source condition satisfies $s>\tau$, whereas it occurs in the fixed-dimensional setting only when $s>2\tau$).

AAMAS Conference 2024 Conference Paper

Policy-regularized Offline Multi-objective Reinforcement Learning

  • Qian Lin
  • Chao Yu
  • Zongkai Liu
  • Zifan Wu

In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. We extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL problems, to the multi-objective setting in order to achieve the above goal. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations via approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL, in order to simultaneously learn a set of policies using a single policy network, thus reducing the computational cost induced by training a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
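
The preference-conditioned scalarized update rests on a simple primitive: scalarizing vector returns by a preference weight vector, so one preference-conditioned network can be trained across preferences sampled each batch. A minimal numpy sketch, with linear scalarization assumed for illustration:

```python
import numpy as np

def scalarize(reward_vectors, preference):
    """Linear scalarization of vector returns by a normalized preference vector."""
    preference = np.asarray(preference, dtype=float)
    preference = preference / preference.sum()
    return np.asarray(reward_vectors) @ preference

# A policy conditioned on `preference` can be trained on scalarize(R, preference)
# for preferences sampled each batch, so one network covers many trade-offs.
print(scalarize([[1.0, 0.0], [0.5, 0.5]], preference=[0.7, 0.3]))
```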

ICLR Conference 2024 Conference Paper

The optimality of kernel classifiers in Sobolev space

  • Jianfa Lai
  • Zhifan Li
  • Dongming Huang
  • Qian Lin

Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performance of kernel classifiers. With some mild assumptions on the conditional probability $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a kernel classifier using recent advances in the theory of kernel regression. We also obtain a minimax lower bound for Sobolev spaces, which shows the optimality of the proposed classifier. Our theoretical results can be extended to the generalization error of over-parameterized neural network classifiers. To make our theoretical results more applicable in realistic settings, we also propose a simple method to estimate the interpolation smoothness of $2\eta(x)-1$ and apply the method to real datasets.
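
The plug-in construction behind such kernel classifiers is short: with labels in $\{-1,+1\}$, regressing on the labels estimates $2\eta(x)-1$, and the classifier takes its sign. A minimal numpy sketch with a Gaussian kernel chosen for concreteness (the paper's analysis concerns Sobolev kernels):

```python
import numpy as np

def kernel_classifier(X, y, X_test, lam=1e-2, bandwidth=1.0):
    """Plug-in classifier: sign of a KRR estimate of 2*eta(x) - 1.

    Labels y are in {-1, +1}, so the regression target E[y | x]
    equals 2*eta(x) - 1.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    n = X.shape[0]
    alpha = np.linalg.solve(k(X, X) + n * lam * np.eye(n), y)
    return np.sign(k(X_test, X) @ alpha)
```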

NeurIPS Conference 2023 Conference Paper

On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

  • Yicheng Li
  • Haobo Zhang
  • Qian Lin

The widely observed 'benign overfitting phenomenon' in the neural network literature raises a challenge to the 'bias-variance trade-off' doctrine in statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression has recently attracted increasing attention. However, most recent arguments on the learning curve are heuristic and are based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve in the asymptotic sense under a power-law decay condition on the eigenvalues of the kernel and on the target function. The learning curve elaborates the effect and the interplay of the choice of the regularization parameter, the source condition, and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in over-parametrized neural networks only when the noise level is small.

ICML Conference 2023 Conference Paper

On the Optimality of Misspecified Kernel Ridge Regression

  • Haobo Zhang 0004
  • Yicheng Li 0004
  • Weihao Lu 0002
  • Qian Lin

In the misspecified kernel ridge regression problem, researchers usually assume that the ground-truth function $f_{\rho}^{\star} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ for some $s\in (0, 1)$. The existing minimax optimal results require $\left\Vert f_{\rho}^{\star} \right\Vert_{L^{\infty}} < \infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0} \in (0, 1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether KRR is optimal for all $s\in (0, 1)$ is an outstanding problem that has lasted for years. In this paper, we show that KRR is minimax optimal for any $s\in (0, 1)$ when $\mathcal{H}$ is a Sobolev RKHS.

ICLR Conference 2023 Conference Paper

On the Saturation Effect of Kernel Ridge Regression

  • Yicheng Li 0004
  • Haobo Zhang 0004
  • Qian Lin

The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the ground-truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound for KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.

ICML Conference 2023 Conference Paper

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

  • Qian Lin
  • Bo Tang 0018
  • Zifan Wu
  • Chao Yu 0004
  • Shangqin Mao
  • Qianlong Xie
  • Xingxing Wang
  • Dong Wang 0022

Aiming at promoting the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years. However, most existing works in the literature still focus on the online setting, where risky violations of the safety budget are likely to be incurred during training. Moreover, in many real-world applications, the learned policy is required to respond to dynamically determined safety budgets (i.e., constraint thresholds) in real time. In this paper, we target the above real-time budget constraint problem under the offline setting, and propose Trajectory-based REal-time Budget Inference (TREBI) as a novel solution that approaches this problem from the perspective of trajectory distributions. Theoretically, we prove an error bound on the estimation of episodic reward and cost under the offline setting, thus providing a performance guarantee for TREBI. Empirical results on a wide range of simulation tasks and a real-world large-scale advertising application demonstrate the capability of TREBI in solving real-time budget constraint problems under offline settings.

AAAI Conference 2022 Conference Paper

A Semi-supervised Learning Approach with Two Teachers to Improve Breakdown Identification in Dialogues

  • Qian Lin
  • Hwee Tou Ng

Identifying breakdowns in ongoing dialogues helps to improve communication effectiveness. Most prior work on this topic relies on human-annotated data and data augmentation to learn a classification model. While quality labeled dialogue data requires human annotation and is usually expensive to obtain, unlabeled data is easier to collect from various sources. In this paper, we propose a novel semi-supervised teacher-student learning framework to tackle this task. We introduce two teachers, trained on labeled data and perturbed labeled data respectively. We leverage unlabeled data to improve classification in student training, where the two teachers refine the labeling of unlabeled data through teacher-student learning in a bootstrapping manner. Through our proposed training approach, the student can achieve improvements over single-teacher performance. Experimental results on the Dialogue Breakdown Detection Challenge dataset DBDC5 and the Learning to Identify Follow-Up Questions dataset LIF show that our approach outperforms all previously published approaches as well as other supervised and semi-supervised baseline methods.
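
The core of the two-teacher refinement can be sketched as confidence-filtered pseudo-labeling. In this PyTorch sketch the two teachers' predicted distributions are simply averaged and thresholded; the averaging and the threshold value are illustrative assumptions, not the paper's exact bootstrapping procedure.

```python
import torch

def pseudo_label(teacher_a, teacher_b, unlabeled, threshold=0.9):
    """Label unlabeled examples with two teachers, keeping confident ones.

    Probabilities from both teachers (modules returning logits) are
    averaged; examples whose top averaged probability clears `threshold`
    receive pseudo-labels for student training.
    """
    with torch.no_grad():
        probs = 0.5 * (teacher_a(unlabeled).softmax(-1)
                       + teacher_b(unlabeled).softmax(-1))
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled[keep], labels[keep]
```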

ICRA Conference 2019 Conference Paper

Efficient Exact Collision Detection between Ellipsoids and Superquadrics via Closed-form Minkowski Sums

  • Sipu Ruan
  • Karen L. Poblete
  • Yingke Li
  • Qian Lin
  • Qianli Ma 0002
  • Gregory S. Chirikjian

Collision detection has attracted the attention of researchers for decades in the fields of computer graphics, robot motion planning, computer-aided design, etc. A large number of successful algorithms have been proposed and applied, which make use of convex polytopes and bounding volumes as primitives. However, such algorithms depend significantly on the complexity of the meshes. This paper deals with collision detection for shapes with simple and exact mathematical descriptions, such as ellipsoids and superquadrics. These primitives have a wide range of applications in representing complex objects and have far fewer parameters than meshes. The foundation of the proposed collision detection scheme is the closed-form Minkowski sums between ellipsoids and superquadrics in n-dimensional Euclidean space. The basic idea is to shrink the ellipsoid into a point and expand each superquadric into a new offset surface with a closed-form parametric expression. Solutions for detecting the relative position between a point and a general convex differentiable parametric surface in both 2D and 3D are derived, leading to an algorithm for exact collision detection. To compare exact and inexact algorithms, an accuracy metric is introduced based on the Principal Kinematic Formula (PKF). The proposed algorithm is then compared with existing well-known algorithms: Gilbert-Johnson-Keerthi (GJK) and Algebraic Separation Conditions (ASC). The results show that the proposed algorithm performs competitively with these efficient checkers.
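
The shrink-and-expand idea is easiest to see in 2D for a circle against an ellipse, where the offset curve is available in closed form. The numpy sketch below builds that offset curve and tests the circle's center against it numerically (by polygon crossing), whereas the paper keeps the final containment test closed-form as well; all parameters here are illustrative.

```python
import numpy as np

def ellipse_circle_collide(center, r, a, b, n=720):
    """Shrink-and-expand test for a circle vs. the ellipse x^2/a^2 + y^2/b^2 = 1.

    The circle is shrunk to its center and the ellipse is expanded along its
    outward normal by r; for a circle this offset curve is exactly the
    Minkowski-sum boundary. The center is then tested against the offset
    curve, here by an even-odd polygon crossing test.
    """
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    pts = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)
    normals = np.stack([np.cos(t) / a, np.sin(t) / b], axis=1)  # gradient of the implicit form
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    offset = pts + r * normals                                   # closed-form offset curve
    x, y = center
    inside = False
    for (x1, y1), (x2, y2) in zip(offset, np.roll(offset, -1, axis=0)):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

print(ellipse_circle_collide(center=(2.5, 0.0), r=0.6, a=2.0, b=1.0))  # True: they overlap
```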