Author name cluster

Ming Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers

2 author rows

EAAI Journal 2025 Journal Article

Adaptive neural network tracking control for unknown high-order nonlinear systems: A constructive approximation set based approach

Yu-Fa Liu
Yong-Hua Liu
Jin-Wa Wu
Jie Tao
Ming Lin
Chun-Yi Su
Renquan Lu

This article addresses the problem of adaptive neural network (NN) tracking control for unknown high-order nonlinear systems, with a focus on accurately constructing NN approximation sets. To guarantee the local approximation capabilities of NNs, it is crucial that their input signals remain within corresponding compact sets. However, the unknown functions and powers in high-order nonlinear systems make it difficult to determine these sets accurately. To solve this, we introduce a novel adaptive NN tracking control strategy that integrates a signal substitution technique, barrier functions (BFs), and NNs. Specifically, the signal substitution technique converts the original system states into state error variables, along with the desired reference signal and its time derivatives, which serve as part of the NN input. BFs are employed to constrain the state errors, while NNs approximate the transformed unknown system functions. This approach enables precise calculation of bounds for the NN weight estimators, ensuring that the NN approximation sets are constructed. Unlike existing methods, our approach not only proves the existence of NN approximation sets but also provides a constructive design strategy, significantly enhancing the approximation accuracy for unknown nonlinear functions. Simulation results demonstrate the effectiveness and advantages of the proposed method.

NeurIPS Conference 2025 Conference Paper

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang

The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias arising from its group relative advantage function. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning: increasing the scores of positive answers while decreasing those of negative ones. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives used as scoring functions; (3) it employs a simple yet effective constrained optimization approach to enforce the KL divergence constraint. As a result, DisCO offers notable advantages over GRPO and its variants: (i) it completely eliminates difficulty bias by adopting discriminative objectives; (ii) it addresses the entropy instability in GRPO and its variants through the use of non-clipping scoring functions and a constrained optimization approach, yielding long and stable training dynamics; (iii) it allows the incorporation of advanced discriminative learning techniques to address data imbalance, where a significant number of questions have more negative than positive generated answers during training. 
Our experiments on enhancing the mathematical reasoning capabilities of SFT-finetuned models show that DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7% over GRPO and 6% over DAPO across six benchmark tasks for a 1.5B model.
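
As an illustration of the group relative advantage function mentioned in the abstract, the sketch below computes it in the commonly cited form: each sampled answer's reward minus the group mean, divided by the group standard deviation. This is a minimal sketch, not the authors' code; the function name `group_relative_advantages` and the `eps` stabilizer are assumptions made for illustration. With binary rewards, the size of each advantage depends only on the fraction of correct answers in the group, which is one way to see how question difficulty enters the objective.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r_i - mean) / (std + eps).

    `rewards` holds the rewards of all answers sampled for ONE question.
    `eps` is an illustrative stabilizer (an assumption) that avoids
    division by zero when every answer receives the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population variance over the group of sampled answers.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Binary rewards: 1 = correct answer, 0 = incorrect answer.
hard_question = [1, 0, 0, 0]    # 25% of sampled answers are correct
medium_question = [1, 1, 0, 0]  # 50% of sampled answers are correct
solved_question = [1, 1, 1, 1]  # every sampled answer is correct

adv_hard = group_relative_advantages(hard_question)
adv_medium = group_relative_advantages(medium_question)
adv_solved = group_relative_advantages(solved_question)
```

In this sketch the single correct answer to the hard question receives a larger advantage than either correct answer to the medium question, and a group in which every answer has the same reward yields all-zero advantages, so that question contributes no learning signal.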