Arrow Research search

Author name cluster

Lisha Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers (16)

NeurIPS Conference 2025 Conference Paper

Beyond Value Functions: Single-Loop Bilevel Optimization under Flatness Conditions

  • Liuyuan Jiang
  • Quan Xiao
  • Lisha Chen
  • Tianyi Chen

Bilevel optimization, a hierarchical optimization paradigm, has gained significant attention in a wide range of practical applications, notably the fine-tuning of generative models. However, due to the nested problem structure, most existing algorithms require either Hessian-vector calculations or nested-loop updates, which are computationally inefficient in large language model (LLM) fine-tuning. In this paper, building upon the fully first-order penalty-based approach, we propose an efficient value-function-free (PBGD-Free) algorithm that eliminates the loop for solving the lower-level problem and admits fully single-loop updates. Inspired by a landscape analysis of the representation-learning-based LLM fine-tuning problem, we propose a relaxed flatness condition for the upper-level function and prove the convergence of the proposed value-function-free algorithm. We test the proposed algorithm in various applications and demonstrate its superior computational efficiency over state-of-the-art bilevel methods.
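As a hedged illustration of the single-loop penalty idea in this abstract, the sketch below runs alternating gradient steps on a penalized objective F(x, y) = f(x, y) + lam * g(x, y) for a toy quadratic bilevel problem; the functions, step sizes, and penalty weight are illustrative assumptions, not the paper's PBGD-Free algorithm.

```python
import numpy as np

# Toy quadratic bilevel problem (illustrative assumptions, not the paper's setup):
#   upper level:  f(x, y) = 0.5 * ||y - 1||^2
#   lower level:  g(x, y) = 0.5 * ||y - x||^2, whose minimizer is y*(x) = x
# Penalized single-level objective: F(x, y) = f(x, y) + lam * g(x, y).
def pbgd_single_loop(lam=10.0, alpha=0.05, beta=0.05, iters=2000, dim=3):
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=dim), rng.normal(size=dim)
    for _ in range(iters):
        grad_y = (y - 1.0) + lam * (y - x)  # dF/dy
        grad_x = -lam * (y - x)             # dF/dx
        y = y - beta * grad_y               # single loop: one step on y ...
        x = x - alpha * grad_x              # ... then one step on x
    return x, y

x, y = pbgd_single_loop()
print(np.round(x, 3), np.round(y, 3))  # both approach the all-ones bilevel solution
```

On this toy problem the single loop suffices: any stationary point of F forces y = x via the x-gradient and then y = 1 via the y-gradient, recovering the bilevel solution without an inner loop.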

ICML Conference 2025 Conference Paper

Efficient First-Order Optimization on the Pareto Set for Multi-Objective Learning under Preference Guidance

  • Lisha Chen
  • Quan Xiao
  • Ellen Hidemi Fukuda
  • Xinyi Chen
  • Kun Yuan
  • Tianyi Chen

Multi-objective learning under user-specified preferences is common in real-world problems such as multilingual speech recognition under fairness. In this work, we frame such a problem as a semivectorial bilevel optimization problem, whose goal is to optimize a pre-defined preference function subject to the constraint that the model parameters are weakly Pareto optimal. To solve this problem, we convert the multi-objective constraints to a single-objective constraint through a merit function with an easy-to-evaluate gradient, and then use a penalty-based reformulation of the bilevel optimization problem. We theoretically establish the properties of the merit function and the relations between solutions of the penalty reformulation and the constrained formulation. We then propose algorithms to solve the reformulated single-level problem and establish their convergence guarantees. We test the method on various synthetic and real-world problems. The results demonstrate the effectiveness of the proposed method in finding preference-guided optimal solutions to the multi-objective problem.

NeurIPS Conference 2025 Conference Paper

Objective Soups: Multilingual Multi-Task Modeling for Speech Processing

  • A F M Saif
  • Lisha Chen
  • Xiaodong Cui
  • Songtao Lu
  • Brian Kingsbury
  • Tianyi Chen

The need for training multilingual multi-task speech processing (MSP) models that perform both automatic speech recognition and speech-to-text translation is increasingly evident. However, a significant challenge arises from the conflicts among multiple objectives when using a single model. Multi-objective optimization can address this challenge by facilitating the optimization of multiple conflicting objectives and aligning the gradient updates in a common descent direction. While multi-objective optimization helps avoid conflicting gradient updates, a critical issue is that when there are many objectives, such as in MSP, it is often difficult to find a common descent direction. This leads to an important question: is it more effective to separate highly conflicting objectives into different optimization levels or to keep them in a single level? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as objective soup recipes. These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To keep computation and memory overhead low, we incorporate a lightweight layer-selection strategy that detects the most conflicting layers and uses only their gradients when computing the conflict-avoidance direction. We conduct an extensive investigation using the CoVoST v2 dataset for combined multilingual ASR and ST tasks, along with the LibriSpeech and AISHELL-1 datasets for multilingual ASR, to identify highly conflicting objectives and determine the most effective training recipe among the three proposed multi-objective optimization algorithms.
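The layer-selection idea described above can be sketched as a pairwise gradient-conflict score per layer; the scoring rule, names, and data below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

# Score each layer by the worst pairwise cosine similarity of its per-objective
# gradients; the most negative score marks the most conflicting layer.
def layer_conflict_scores(layer_grads):
    # layer_grads: dict layer_name -> (num_objectives, num_params) array
    scores = {}
    for name, G in layer_grads.items():
        Gn = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
        cos = Gn @ Gn.T                    # pairwise cosine similarities
        scores[name] = float(np.min(cos))  # most negative = most conflicting
    return scores

def most_conflicting(layer_grads, k=1):
    scores = layer_conflict_scores(layer_grads)
    return sorted(scores, key=scores.get)[:k]

# Hypothetical per-layer, per-objective gradients for two layers.
grads = {
    "encoder": np.array([[1.0, 0.0], [0.9, 0.1]]),   # nearly aligned objectives
    "decoder": np.array([[1.0, 0.0], [-1.0, 0.1]]),  # strongly conflicting
}
print(most_conflicting(grads, k=1))  # ['decoder']
```

Restricting the conflict-avoidance computation to the top-k layers selected this way is what keeps the per-step overhead small relative to using full-model gradients.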

NeurIPS Conference 2024 Conference Paper

FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning

  • Lisha Chen
  • AFM Saif
  • Yanning Shen
  • Tianyi Chen

Finding specific preference-guided Pareto solutions that represent different trade-offs among multiple objectives is critical yet challenging in multi-objective problems. Existing methods are restrictive in preference definitions and/or their theoretical guarantees. In this work, we introduce a Flexible framEwork for pREfeRence-guided multi-Objective learning (FERERO) by casting it as a constrained vector optimization problem. Specifically, two types of preferences are incorporated into this formulation -- the relative preference defined by the partial ordering induced by a polyhedral cone, and the absolute preference defined by constraints that are linear functions of the objectives. To solve this problem, convergent algorithms are developed with both single-loop and stochastic variants. Notably, this is the first single-loop primal algorithm for constrained optimization to our knowledge. The proposed algorithms adaptively adjust to both constraint and objective values, eliminating the need to solve different subproblems at different stages of constraint satisfaction. Experiments on multiple benchmarks demonstrate the proposed method is very competitive in finding preference-guided optimal solutions. Code is available at https://github.com/lisha-chen/FERERO/.

JMLR Journal 2024 Journal Article

Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

  • Lisha Chen
  • Heshan Fernando
  • Yiming Ying
  • Tianyi Chen

Multi-objective learning (MOL) often arises in machine learning problems when there are multiple data modalities or tasks. One critical challenge in MOL is the potential conflict among different objectives during the optimization process. Recent works have developed various dynamic weighting algorithms for MOL, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not outperform static ones. To understand this theory-practice gap, we focus on a stochastic variant of MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study its generalization performance and its interplay with optimization through the lens of algorithmic stability in the framework of statistical learning theory. We find that the key rationale behind MGDA -- updating along a conflict-avoidant direction -- may hinder dynamic weighting algorithms from achieving the optimal $O(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance unique in MOL. We showcase the generality of our theoretical framework by analyzing other algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.

NeurIPS Conference 2023 Conference Paper

Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

  • Lisha Chen
  • Heshan Fernando
  • Yiming Ying
  • Tianyi Chen

Multi-objective learning (MOL) often arises in emerging machine learning problems when multiple learning criteria or tasks need to be addressed. Recent works have developed various dynamic weighting algorithms for MOL, including MGDA and its variants, whose central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static alternatives. To bridge this gap between theory and practice, we focus on a new variant of stochastic MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study its generalization performance and the interplay with optimization through the lens of algorithmic stability. We find that the rationale behind MGDA -- updating along a conflict-avoidant direction -- may impede dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further highlight the variability of dynamic weights and their impact on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
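A minimal sketch of a MoDo-style update on two toy quadratic objectives, using two independent stochastic gradient samples for the weight step and a projected weight vector on the simplex; the objectives, noise model, and step sizes are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

# Two toy objectives f1 = 0.5||x - A||^2, f2 = 0.5||x - B||^2 (illustrative).
A, B = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def stoch_grads(x, rng, sigma=0.05):
    # stacked noisy gradients, one column per objective: G is d x M
    g1 = (x - A) + sigma * rng.normal(size=2)
    g2 = (x - B) + sigma * rng.normal(size=2)
    return np.stack([g1, g2], axis=1)

def proj_simplex_2(lam):
    # Euclidean projection onto the 2-d probability simplex
    l1 = np.clip((lam[0] - lam[1] + 1.0) / 2.0, 0.0, 1.0)
    return np.array([l1, 1.0 - l1])

def modo(iters=2000, alpha=0.02, gamma=0.005, seed=0):
    rng = np.random.default_rng(seed)
    x, lam = np.zeros(2), np.array([0.5, 0.5])
    for _ in range(iters):
        G1, G2 = stoch_grads(x, rng), stoch_grads(x, rng)        # double sampling
        lam = proj_simplex_2(lam - gamma * (G1.T @ (G2 @ lam)))  # weight step
        x = x - alpha * (stoch_grads(x, rng) @ lam)              # model step
    return x, lam

x, lam = modo()
print(np.round(x, 2), np.round(lam, 2))
```

Using two independent samples G1 and G2 makes G1.T @ (G2 @ lam) an unbiased estimate of the gradient of 0.5||G lam||^2 with respect to lam, which is the point of the double-sampling trick.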

ICML Conference 2022 Conference Paper

Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

  • Momin Abbas
  • Quan Xiao
  • Lisha Chen
  • Pin-Yu Chen
  • Tianyi Chen

Model-agnostic meta learning (MAML) is currently one of the dominating approaches for few-shot meta-learning. Albeit its effectiveness, the optimization of MAML can be challenging due to the innate bilevel problem structure. Specifically, the loss landscape of MAML is much more complex, with possibly more saddle points and local minimizers, than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning.
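The sharpness-aware minimization step that Sharp-MAML builds on can be sketched as a two-stage gradient computation: ascend to a worst-case nearby point, then descend using the gradient taken there. The toy loss and hyper-parameters below are illustrative assumptions, not Sharp-MAML itself.

```python
import numpy as np

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w (illustrative).
def loss_grad(w):
    return w

def sam_step(w, lr=0.1, rho=0.05):
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascend to a worst-case nearby point
    g_adv = loss_grad(w + eps)                   # gradient at the perturbed weights
    return w - lr * g_adv                        # descend with the SAM gradient

w = np.array([1.0, -2.0])
for _ in range(200):
    w = sam_step(w)
print(np.round(w, 4))
```

In Sharp-MAML this kind of perturb-then-descend step is applied within the bilevel MAML structure; here it is shown on a plain loss only to make the two-stage gradient explicit.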

NeurIPS Conference 2022 Conference Paper

Understanding Benign Overfitting in Gradient-Based Meta Learning

  • Lisha Chen
  • Songtao Lu
  • Tianyi Chen

Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting." To understand this phenomenon, we focus on the meta learning settings with a challenging bilevel structure that we term gradient-based meta learning, and analyze its generalization performance under an overparameterized meta linear regression model. While our analysis uses relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation, and benign overfitting in gradient-based meta learning tasks. We corroborate our theoretical claims through numerical simulations.

AAAI Conference 2021 Conference Paper

Uncertain Graph Neural Networks for Facial Action Unit Detection

  • Tengfei Song
  • Lisha Chen
  • Wenming Zheng
  • Qiang Ji

Capturing the dependencies among different facial action units (AUs) is extremely important for the AU detection task. Many studies have employed graph-based deep learning methods to exploit the dependencies among AUs. However, the dependencies among AUs in real-world data are often noisy, and this uncertainty must be taken into consideration. Rather than employing a deterministic model, we propose an uncertain graph neural network (UGN) to learn a probabilistic mask that simultaneously captures both the individual dependencies among AUs and the uncertainties. Further, we propose an adaptive weighted loss function based on the epistemic uncertainties to adaptively vary the weights of the training samples during the training process, accounting for unbalanced data distributions among AUs. We also provide an insightful analysis of how the uncertainties are related to the performance of AU detection. Extensive experiments, conducted on two benchmark datasets, i.e., BP4D and DISFA, demonstrate that our method achieves state-of-the-art performance.

NeurIPS Conference 2019 Conference Paper

Deep Structured Prediction for Facial Landmark Detection

  • Lisha Chen
  • Hui Su
  • Qiang Ji

Existing deep learning based facial landmark detection methods have achieved excellent performance. These methods, however, do not explicitly embed the structural dependencies among landmark points. They hence cannot preserve the geometric relationships between landmark points or generalize well to challenging conditions or unseen data. This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field. We demonstrate its superior performance to existing state-of-the-art techniques in facial landmark detection, especially a better generalization ability on challenging datasets that include large pose and occlusion.

IJCAI Conference 2019 Conference Paper

Embodied Conversational AI Agents in a Multi-modal Multi-agent Competitive Dialogue

  • Rahul R. Divekar
  • Xiangyang Mou
  • Lisha Chen
  • Maíra Gatti de Bayser
  • Melina Alberio Guerra
  • Hui Su

In a setting where two AI agents, embodied as animated humanoid avatars, are engaged in a conversation with one human and each other, we see two challenges. One, determination by the AI agents of which one of them is being addressed. Two, determination by the AI agents of whether they may/could/should speak at the end of a turn. In this work we bring these two challenges together and explore the participation of AI agents in multi-party conversations. In particular, we show two embodied AI shopkeeper agents who sell similar items and aim to win the business of a user by competing with each other on price. In this scenario, we solve the first challenge by using head pose (estimated with deep learning techniques) to determine whom the user is talking to. For the second challenge we use deontic logic to model the rules of a negotiation conversation.

ICRA Conference 2014 Conference Paper

Real-time damping estimation for variable impedance actuators

  • Navvab Kashiri
  • Matteo Laffranchi
  • Jinoh Lee
  • Nikos G. Tsagarakis
  • Lisha Chen
  • Darwin G. Caldwell

Recently-developed variable damping mechanisms have been exploited as a complement to compliant actuators. While accurate knowledge and control of the generated damping is essential for achieving the desired performance, no physical sensor that measures damping exists. This work introduces a novel non-model-based approach for the estimation of time-variant damping in variable impedance actuation systems. The approach is based only on torque and position/velocity measurements, without knowledge of the system's inputs, to ensure the estimation of both intentional and unintentional changes. To this end, a recursive least-squares estimator, modified to achieve proper convergence when estimating time-variant parameters, is exploited. Experiments on a variable physical damping actuator are also presented to validate the performance of the proposed approach.
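The kind of estimator this abstract describes -- recursive least squares modified to track time-variant parameters -- can be sketched with a forgetting factor on a scalar damping model tau = d * omega; the model, constants, and simulated data below are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Recursive least squares with forgetting factor lam on the scalar model
# tau = d * omega, where d is the (time-varying) damping to estimate.
def rls_forgetting(omegas, taus, lam=0.95):
    d_hat, P = 0.0, 1e3                 # initial estimate and covariance
    estimates = []
    for phi, tau in zip(omegas, taus):
        k = P * phi / (lam + phi * P * phi)      # gain
        d_hat = d_hat + k * (tau - phi * d_hat)  # correct with the innovation
        P = (P - k * phi * P) / lam              # forgetting keeps P from vanishing
        estimates.append(d_hat)
    return np.array(estimates)

rng = np.random.default_rng(0)
t = np.arange(500)
d_true = np.where(t < 250, 2.0, 4.0)    # damping steps up mid-experiment
omega = np.sin(0.1 * t) + 1.5           # persistently exciting "velocity" signal
tau = d_true * omega + 0.05 * rng.normal(size=t.size)
est = rls_forgetting(omega, tau)
print(round(est[240], 2), round(est[-1], 2))
```

A plain RLS estimator (lam = 1) would effectively freeze once its covariance shrinks; the forgetting factor is the standard modification that lets the estimate follow the step change in damping.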

IROS Conference 2013 Conference Paper

Link position control of a compliant actuator with unknown transmission friction torque

  • Lisha Chen
  • Matteo Laffranchi
  • Jinoh Lee
  • Navvab Kashiri
  • Nikos G. Tsagarakis
  • Darwin G. Caldwell

This paper proposes a control strategy for a compliant actuator, the CompAct™ actuator, which is equipped with semi-active friction dampers in its transmission system. Both the transmission flexibility and the nonlinearity of the friction-based damping torque make the control of this actuator a non-trivial task. This paper studies a model of the presented actuator and the problem of accurate link position tracking based on a sliding-mode approach that treats the friction torque as an uncertainty. Stability analysis and simulations highlight the effectiveness of the proposed controller in compensating for the deflections and unknown friction torque of the actuator. The performance of the controller is also validated by experimental results that demonstrate the tracking performance of the CompAct™ actuator achieved with the presented control strategy.

ICRA Conference 2013 Conference Paper

Optimal control for maximizing velocity of the CompAct™ compliant actuator

  • Lisha Chen
  • Manolo Garabini
  • Matteo Laffranchi
  • Navvab Kashiri
  • Nikos G. Tsagarakis
  • Antonio Bicchi
  • Darwin G. Caldwell

The CompAct™ actuator features a clutch mechanism placed in parallel with its passive series elastic transmission element and can therefore benefit from the advantages of both series elastic actuators (SEA) and rigid actuators. The actuator is capable of effectively managing the storage and release of the potential energy of the compliant element by the appropriate control of the clutch subsystem. Controlling the timing of the energy storage/release in the elastic element is exploited for improving motion control in this research. This paper analyses how this class of actuation systems can be used to maximize the link velocity of the joint. The dynamic model of the joint is derived and an optimal control strategy is proposed to identify optimal input reference profiles for the actuator (motor position/velocity and clutch activation timing) which permit the link velocity maximization. The effect of compliance of the joint on the performance of the system is studied and the optimal stiffness is analyzed.

JMLR Journal 2013 Journal Article

Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

  • Lisha Chen
  • Andreas Buja

Multidimensional scaling (MDS) is the art of reconstructing point sets (embeddings) from pairwise distance data, and as such it is at the basis of several approaches to nonlinear dimension reduction and manifold learning. At present, MDS lacks a unifying methodology as it consists of a discrete collection of proposals that differ in their optimization criteria, called "stress functions". To correct this situation we propose (1) to embed many of the extant stress functions in a parametric family of stress functions, and (2) to replace the ad hoc choice among discrete proposals with a principled parameter selection method. This methodology yields the following benefits and problem solutions: (a) it provides guidance in tailoring stress functions to a given data situation, responding to the fact that no single stress function dominates all others across all data situations; (b) the methodology enriches the supply of available stress functions; (c) it helps our understanding of stress functions by replacing the comparison of discrete proposals with a characterization of the effect of parameters on embeddings; (d) it builds a bridge to graph drawing, which is the related but not identical art of constructing embeddings from graphs.
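One concrete member of the stress-function family discussed above is a Kruskal-type raw stress; the sketch below minimizes it by plain gradient descent on random toy data. The data, initialization, and step size are illustrative assumptions, not the paper's parametric methodology.

```python
import numpy as np

def stress(X, D):
    # raw stress: sum over pairs of (||x_i - x_j|| - D_ij)^2
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return float(np.sum(np.triu((dist - D) ** 2, k=1)))

def stress_grad(X, D):
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)          # dummy value; diagonal terms zeroed below
    coef = (dist - D) / dist
    np.fill_diagonal(coef, 0.0)
    # d stress / d x_i = 2 * sum_j coef[i, j] * (x_i - x_j)
    return 2.0 * np.sum(coef[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
P = rng.normal(size=(6, 2))                                # ground-truth 2-D points
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)       # target distances
X = rng.normal(size=(6, 2))                                # random initial embedding
s0 = stress(X, D)
for _ in range(500):
    X -= 0.01 * stress_grad(X, D)
print(round(s0, 3), round(stress(X, D), 3))
```

Swapping this raw stress for other members of the family (e.g., normalized or power-transformed variants) changes only the two functions above, which is what makes a parametric family of stress functions convenient to explore.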

IROS Conference 2012 Conference Paper

The role of physical damping in compliant actuation systems

  • Matteo Laffranchi
  • Lisha Chen
  • Nikos G. Tsagarakis
  • Darwin G. Caldwell

Recently, compliance has been considered one of the key physical properties that a robot should incorporate to be able to physically interact with humans and uncertain environments. Apart from the improved ability of interaction, mechanical robustness, and higher safety-related performance, compliance introduces underdamped oscillatory modes and reduces the mechanical natural frequency of the plant to be controlled, making its control much more complex than that of conventional stiff actuators. To overcome these drawbacks, some recent works focus on the incorporation of physical damping within compliant actuators. This work presents an analysis for the quantitative evaluation of the effects of physical damping in compliant robotic joints, demonstrating the improvements (dynamic performance, stability, controllability, tracking precision, and energy efficiency) that can be gained by incorporating physical damping into such flexible transmission systems. Simulation and experimental results validate that these benefits can effectively be achieved on an existing compliant actuator prototype with variable physical damping.