Arrow Research search

Author name cluster

Ji Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

AAAI Conference 2026 Conference Paper

Offline Multi-Objective Bandits: From Logged Data to Pareto-Optimal Policies

  • Ji Cheng
  • Song Lai
  • Shunyu Yao
  • Bo Xue

Offline policy learning from logged data is a critical paradigm for enabling effective decision-making without costly online exploration. However, its application has been largely confined to single-objective problems, a stark contrast to real-world scenarios where decision-making inherently involves navigating multiple, often conflicting, objectives. This paper introduces a comprehensive framework for Offline Multi-Objective Bandits (OffMOB), providing a principled solution to the fundamental challenge of learning Pareto-optimal policies from a static dataset. Our core contribution is a novel algorithm that uniquely integrates the pessimism principle with multi-objective optimization to safely learn from off-policy data. Crucially, our approach transcends the primary limitation of scalarization techniques, which are restricted to finding a single policy for a pre-defined preference. Instead, OffMOB directly approximates the entire Pareto front, learning a single, flexible policy model capable of generating an optimal action for any desired trade-off. To rigorously evaluate performance, we introduce the Tchebycheff sub-optimality metric and establish the first finite-sample generalization bounds for this problem class, proving that our algorithm converges to the true Pareto front under practical data coverage assumptions. Extensive experiments on complex benchmarks demonstrate that OffMOB significantly outperforms existing methods, identifying the complete set of optimal trade-offs where naive extensions fail.

AAAI Conference 2026 Conference Paper

Parametric Pareto Set Learning for Expensive Multi-Objective Optimization

  • Ji Cheng
  • Bo Xue
  • Qingfu Zhang

Parametric multi-objective optimization (PMO) addresses the challenge of solving an infinite family of multi-objective optimization problems, where optimal solutions must adapt to varying parameters. Traditional methods require re-execution for each parameter configuration, leading to prohibitive costs when objective evaluations are computationally expensive. To address this issue, we propose Parametric Pareto Set Learning with multi-objective Bayesian Optimization (PPSL-MOBO), a novel framework that learns a unified mapping from both preferences and parameters to Pareto-optimal solutions. PPSL-MOBO leverages a hypernetwork with Low-Rank Adaptation (LoRA) to efficiently capture parametric variations, while integrating Gaussian process surrogates and hypervolume-based acquisition to minimize expensive function evaluations. We demonstrate PPSL-MOBO's effectiveness on two challenging applications: multi-objective optimization with shared components, where certain design variables must be identical across solution families due to modular constraints, and dynamic multi-objective optimization, where objectives evolve over time. Unlike existing methods that cannot directly solve PMO problems in a unified manner, PPSL-MOBO learns a single model that generalizes across the entire parameter space. By enabling instant inference of Pareto sets for new parameter values without retraining, PPSL-BO provides an efficient solution for expensive PMO problems.

JMLR Journal 2025 Journal Article

Lexicographic Lipschitz Bandits: New Algorithms and a Lower Bound

  • Bo Xue
  • Ji Cheng
  • Fei Liu
  • Yimu Wang
  • Lijun Zhang
  • Qingfu Zhang

This paper studies a multiobjective bandit problem under lexicographic ordering, wherein the learner aims to maximize $m$ objectives, each with different levels of importance. First, we introduce the local trade-off, $\lambda_*$, which depicts the trade-off between different objectives. For the case when an upper bound of $\lambda_*$ is known, i.e., $\lambda\geq\lambda_*$, we develop an algorithm that achieves a general regret bound of $\widetilde{O}(\Lambda^i(\lambda)T^{(d_z^i+1)/(d_z^i+2)})$ for the $i$-th objective, where $i\in\{1,2,\ldots,m\}$, $\Lambda^i(\lambda)=1+\lambda+\cdots+\lambda^{i-1}$, $d_z^i$ is the zooming dimension for the $i$-th objective, and $T$ is the time horizon. Next, we provide a matching lower bound for the lexicographic Lipschitz bandit problem, proving that our algorithm is optimal in terms of $\lambda_*$ and $T$. Finally, for the case where $m=2$, we remove the dependence on the knowledge about $\lambda_*$, albeit at the cost of increasing the regret bound to $\widetilde{O}(\Lambda^i(\lambda_*)T^{(3d_z^i+4)/(3d_z^i+6)})$, which remains optimal in terms of $\lambda_*$. Compared to existing work on lexicographic multi-armed bandits, our approach improves the current regret bound of $\widetilde{O}(T^{2/3})$ and extends the number of arms to infinity. Numerical experiments confirm the effectiveness of our algorithms. [abs] [ pdf ][ bib ] &copy JMLR 2025. ( edit, beta )

IJCAI Conference 2025 Conference Paper

Multi-Objective Neural Bandits with Random Scalarization

  • Ji Cheng
  • Bo Xue
  • Chengyu Lu
  • Ziqiang Cui
  • Qingfu Zhang

Multi-objective multi-armed bandit (MOMAB) problems are crucial for complex decision-making scenarios where multiple conflicting objectives must be simultaneously optimized. However, most existing works are based on the linear assumption of the feedback rewards, which significantly constrains their applicability and efficacy in capturing the intricate dynamics of real-world environments. This paper explores a multi-objective neural bandit (MONB) framework, which integrates the universal approximators, neural networks, with the classical MOMABs. We adopt random scalarization to accommodate the special needs of a practitioner by setting an appropriate distribution on the regions of interest. Using the trade-off capabilities of upper confidence bound (UCB) and Thompson sampling (TS) strategies, we propose two novel algorithms, MONeural-UCB and MONeural-TS. Theoretical and empirical analysis demonstrate the superiority of our methods in multi-objective or multi-task bandit problems, which makes great improvement over the classical linear MOMABs.

NeurIPS Conference 2025 Conference Paper

Neural Evolution Strategy for Black-box Pareto Set Learning

  • Chengyu Lu
  • Zhenhua Li
  • Xi Lin
  • Ji Cheng
  • Qingfu Zhang

Multi-objective optimization problems (MOPs) are prevalent in numerous real-world applications. Recently, Pareto Set Learning (PSL) has emerged as a powerful paradigm for solving MOPs. PSL can produce a neural network for modeling the set of all Pareto optimal solutions. However, applying PSL to black-box objectives, particularly those exhibiting non-separability, high dimensionality, and/or other complex properties, remains very challenging. To address this issue, we propose leveraging evolution strategies (ESs), a class of specialized black-box optimization algorithms, within the PSL paradigm. Traditional ESs capture the complex dimensional dependencies less efficiently, which can significantly hinder their performance in PSL. To tackle this issue, we suggest encapsulating the dependencies within a neural network, which is then trained using a novel gradient estimation method. The proposed method, termed Neural-ES, is evaluated using a bespoke benchmark suite for black-box PSL. Experimental comparisons with other methods demonstrate the efficiency of Neural-ES, underscoring its ability to learn the Pareto sets of challenging black-box MOPs.

AAAI Conference 2024 Conference Paper

Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

  • Ji Cheng
  • Bo Xue
  • Jiaxiang Yi
  • Qingfu Zhang

Multi-objective Stochastic Linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm, however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers' preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners' performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.

AAAI Conference 2024 Conference Paper

Multiobjective Lipschitz Bandits under Lexicographic Ordering

  • Bo Xue
  • Ji Cheng
  • Fei Liu
  • Yimu Wang
  • Qingfu Zhang

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to simultaneously maximize? objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single objective multi-armed bandits is Omega(KlogT). Moreover, this bound becomes vacuous when the arm number K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1,2,...,m}. This bound matches the lower bound of the single objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.

ICRA Conference 2009 Conference Paper

Manipulation at the NanoNewton level: Micrograpsing for mechanical characterization of biomaterials

  • Keekyoung Kim
  • Xinyu Liu 0002
  • Yong Zhang 0046
  • Ji Cheng
  • Xiao Yu Wu
  • Yu Sun 0001

This paper presents the use of a monolithic, force-feedback MEMS (microelectomechanical systems) microgripper for characterizing both elastic and viscoelastic properties of highly deformable hydrogel microcapsules (15–25µm) at wet state during micromanipulation. The single-chip microgripper integrates an electrothermal microactuator and two capacitive force sensors, one for contact detection (force resolution: 38. 5nN) and the other for gripping force measurements (force resolution: 19. 9nN). Through nanoNewton force measurements, closed-loop force control, and visual tracking, the system quantified Young's modulus values and viscoelastic parameters of alginate microcapsules, demonstrating an easy-to-operate, accurate compression testing technique for characterizing soft, micrometer-sized biomaterials.