Arrow Research search

Author name cluster

Fan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

EAAI Journal 2026 Journal Article

A visual detection method for blueberries with different ripeness in a complex orchard environment

  • Chengbiao Fu
  • Fan Chen
  • Anhong Tian

Efficiently and accurately detecting blueberries of different maturity levels in the orchard is the key to smart agriculture. We propose a detection model based on the You Only Look Once (YOLO) version 11 nano (YOLOv11n) architecture for detecting blueberries. It incorporates an Efficient Multi-scale Attention (EMA) module in the shallow backbone, employs a Region-focused Contrastive Loss (RCL), and upgrades the neck with a "Gather and Distribution" (GD) mechanism. We name it YOLOv11-EMA-RCL-GD (abbreviated as YOLOv11-ERG for brevity). The improved model incorporates three key modules: the EMA module enhances background-fruit distinction; the RCL method focuses on local regions for detecting occluded fruits; and the GD mechanism strengthens multi-scale feature fusion. It was trained and tested on our self-constructed blueberry dataset. YOLOv11-ERG achieves a mean Average Precision (mAP) of 91.8% at an Intersection over Union (IoU) threshold of 50% (mAP@0.5), and a mAP of 68.6% at IoU thresholds ranging from 50% to 95% (mAP@0.5-0.95), with 5.93 million parameters and 10.9 Giga Floating-point Operations (GFLOPs). The proposed model outperforms several existing detectors, including Real-Time DEtection TRansformer-Large (RT-DETR-L), Faster Region-based Convolutional Network method (Faster R-CNN), and other YOLO series models. While achieving higher detection performance, the improved model requires just 62.8% of the parameters and 51.2% of the computation of YOLOv11-small (YOLOv11s). This balance therefore fulfills the key need for efficient and accurate detection in smart agriculture. In addition, experimental results show that the RCL method can effectively improve model performance on other public agricultural datasets. The code is publicly available: https://github.com/C-F-Chen/YOLOv11-ERG.
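
For readers unfamiliar with the reported metrics, the sketch below illustrates how mAP@0.5 and mAP@0.5-0.95 differ: a prediction counts as a match only when its IoU with a ground-truth box clears the threshold, and mAP@0.5-0.95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. This is a generic illustration of the COCO-style convention, not code from the YOLOv11-ERG repository.

```python
# Minimal illustration of the IoU thresholds behind mAP@0.5 and mAP@0.5-0.95
# (generic COCO-style convention; not taken from the YOLOv11-ERG repository).

def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (10, 10, 50, 60), (12, 8, 55, 58)
score = iou(pred, gt)

# mAP@0.5 uses the single threshold 0.5; mAP@0.5-0.95 averages AP over these:
thresholds = [0.5 + 0.05 * i for i in range(10)]
matches = {t: score >= t for t in thresholds}
print(score, matches)
```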

AAAI Conference 2026 Conference Paper

CO2-Meter: A Comprehensive Carbon Footprint Estimator for LLMs on Edge Devices

  • Zhenxiao Fu
  • Fan Chen
  • Lei Jiang

LLMs have transformed NLP, yet deploying them on edge devices poses great carbon challenges. Prior estimators remain incomplete, neglecting peripheral energy use, distinct prefill/decode behaviors, and SoC design complexity. This paper presents CO2-Meter, a unified framework for estimating operational and embodied carbon in LLM edge inference. Contributions include: (1) equation-based peripheral energy models and datasets; (2) a GNN-based predictor with phase-specific LLM energy data; (3) a unit-level embodied carbon model for SoC bottleneck analysis; and (4) validation showing superior accuracy over prior methods. Case studies show CO2-Meter's effectiveness in identifying carbon hotspots and guiding sustainable LLM design on edge platforms.
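
As context for how operational and embodied carbon are typically combined in such estimates, here is a minimal back-of-the-envelope sketch. All energy figures, the grid carbon intensity, and the device lifetime below are hypothetical placeholders, and the formula is the standard operational-plus-amortized-embodied accounting rather than CO2-Meter's actual models.

```python
# Hypothetical back-of-the-envelope carbon accounting for one on-device LLM query.
# All numbers are placeholders; CO2-Meter's actual energy/embodied models are far
# more detailed (peripherals, prefill/decode phases, SoC unit-level analysis).

prefill_energy_j = 12.0               # energy of the prefill phase (J), hypothetical
decode_energy_j = 45.0                # energy of the decode phase (J), hypothetical
peripheral_energy_j = 8.0             # display/radio/memory peripherals (J), hypothetical
grid_intensity_gco2_per_kwh = 400.0   # grid carbon intensity, hypothetical

total_energy_kwh = (prefill_energy_j + decode_energy_j + peripheral_energy_j) / 3.6e6
operational_gco2 = total_energy_kwh * grid_intensity_gco2_per_kwh

embodied_gco2_device = 60_000.0       # embodied carbon of the device, hypothetical
queries_over_lifetime = 5_000_000     # amortization base, hypothetical
embodied_gco2 = embodied_gco2_device / queries_over_lifetime

print(f"operational: {operational_gco2:.4f} gCO2, embodied share: {embodied_gco2:.4f} gCO2")
```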

AAAI Conference 2026 Conference Paper

Deep Reinforcement Learning for Scalable Offline Three-Dimensional Packing

  • Hao Yin
  • Hongjie He
  • Fan Chen

With the increasing number of items requiring simultaneous handling in complex logistics, offline three-dimensional packing methods need to plan for larger numbers of items. Existing deep reinforcement learning (DRL)-based packing methods cannot plan for large numbers of items while maintaining high-quality solutions, due to limited exploration space and high computational complexity. To address this issue, this paper proposes a scalable DRL-based packing method. An attention-based pack-Q-network (PQNet) is constructed to learn the optimal packing policy by integrating unpacked items, available spaces, and packed items. To expand the valid exploration space, a bidding-based multi-policy (BBMP) framework composed of multiple PQNets is designed to efficiently explore more latent valid solutions, thus enhancing solution quality. To reduce computational complexity, a training-free dynamic candidate selection (DCS) framework is proposed to incorporate comprehensive item information during execution with minimal computational overhead, which helps in effectively planning large numbers of items. Experimental results show that across item counts of 20~1000, our method consistently outperforms the best-performing baseline at each tested scale by 3.2%~13.1% in space utilization.
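
To make the bidding idea concrete, the toy sketch below shows one way several independently trained policies could "bid" on candidate placements, with the highest-scoring bid winning. The policy interface and the random scorer are hypothetical stand-ins for a trained PQNet; the paper's BBMP details differ.

```python
import random

# Toy illustration of a bidding-style multi-policy selection step.
# Each "policy" scores candidate placements; the best bid across all
# policies wins. The random scorer is a stand-in for a trained PQNet.

def make_policy(seed):
    rng = random.Random(seed)
    def score(placement):
        # Hypothetical stand-in for a learned value estimate of a placement.
        return rng.random()
    return score

policies = [make_policy(s) for s in range(4)]        # several PQNet-like policies
candidates = ["item3@(0,0,rot0)", "item3@(2,1,rot1)", "item7@(0,4,rot0)"]

best = max(
    ((pid, c, policy(c)) for pid, policy in enumerate(policies) for c in candidates),
    key=lambda t: t[2],
)
print("winning bid:", best)
```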

ICLR Conference 2025 Conference Paper

Multi-Reward as Condition for Instruction-based Image Editing

  • Xin Gu 0003
  • Ming Li
  • Libo Zhang 0001
  • Fan Chen
  • Longyin Wen
  • Tiejian Luo
  • Sijie Zhu

High-quality training triplets (instruction, original image, edited image) are essential for instruction-based image editing. Predominant training datasets (e.g., InsPix2Pix) are created using text-to-image generative models (e.g., Stable Diffusion, DALL-E) which are not trained for image editing. Accordingly, these datasets suffer from inaccurate instruction following, poor detail preservation, and generation artifacts. In this paper, we propose to address the training data quality issue with multi-perspective reward data instead of refining the ground-truth image quality. 1) We first design a quantitative metric system based on a best-in-class LVLM (Large Vision Language Model), i.e., GPT-4o in our case, to evaluate the generation quality from 3 perspectives, namely, instruction following, detail preserving, and generation quality. For each perspective, we collect a quantitative score in $0\sim 5$ and text descriptive feedback on the specific failure points in ground-truth edited images, resulting in a high-quality editing reward dataset, i.e., RewardEdit20K. 2) We further propose a novel training framework to seamlessly integrate the metric output, regarded as a multi-reward, into editing models to learn from the imperfect training triplets. During training, the reward scores and text descriptions are encoded as embeddings and fed into both the latent space and the U-Net of the editing models as auxiliary conditions. During inference, we set these additional conditions to the highest score with no text description for failure points, aiming for the best generation outcome. 3) We also build a challenging evaluation benchmark with real-world images/photos and diverse editing instructions, named Real-Edit. Experiments indicate that our multi-reward conditioned model outperforms its no-reward counterpart on two popular editing pipelines, i.e., InsPix2Pix and SmartEdit. Code is released at https://github.com/bytedance/Multi-Reward-Editing.
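
The sketch below shows one generic way per-perspective reward scores can be embedded and injected as an auxiliary condition alongside an existing conditioning vector; the module names, dimensions, and fusion rule are hypothetical placeholders and are not taken from the released Multi-Reward-Editing code.

```python
import torch
import torch.nn as nn

# Generic sketch: embed per-perspective reward scores (0-5) and fuse them into an
# existing conditioning vector. Dimensions/modules are hypothetical placeholders,
# not the released Multi-Reward-Editing implementation.

class RewardCondition(nn.Module):
    def __init__(self, num_rewards=3, cond_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(num_rewards, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, cond, reward_scores):
        # reward_scores: (batch, num_rewards), e.g. instruction-following,
        # detail-preservation, and generation-quality scores in [0, 5].
        return cond + self.proj(reward_scores / 5.0)   # normalize, then fuse

cond = torch.randn(2, 768)                     # e.g. a pooled text embedding
scores = torch.tensor([[5.0, 4.0, 5.0],        # at inference: ask for the best outcome
                       [3.0, 2.0, 4.0]])
fused = RewardCondition()(cond, scores)
print(fused.shape)                             # torch.Size([2, 768])
```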

NeurIPS Conference 2025 Conference Paper

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

  • Fan Chen
  • Zeyu Jia
  • Alexander Rakhlin
  • Tengyang Xie

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{\varepsilon^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.
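
For context, the coverability coefficient appearing in the bound is commonly defined, in prior work on coverage in online RL, as the best all-policy concentrability achievable by a single sequence of data distributions. The display below follows that standard definition; the paper's exact definition may differ in details.

```latex
% Standard definition of the coverability coefficient (following prior work on
% coverability); d_h^\pi is the state-action occupancy of policy \pi at step h.
C_{\mathrm{cov}}
  \;=\;
  \inf_{\mu_1,\dots,\mu_H \in \Delta(\mathcal{S}\times\mathcal{A})}
  \;\sup_{\pi}\;\max_{h\in[H]}\;
  \left\| \frac{d_h^{\pi}}{\mu_h} \right\|_{\infty},
\qquad
\text{so the sample complexity reads } \widetilde{O}\!\left(\frac{C_{\mathrm{cov}}\, H^{3}}{\varepsilon^{2}}\right).
```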

NeurIPS Conference 2025 Conference Paper

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

  • Yurun Yuan
  • Fan Chen
  • Zeyu Jia
  • Alexander Rakhlin
  • Tengyang Xie

Policy-based methods currently dominate reinforcement learning (RL) pipelines for large language model (LLM) reasoning, leaving value-based approaches largely unexplored. We revisit the classical paradigm of Bellman Residual Minimization and introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts this idea to LLMs, yielding a simple yet effective off-policy algorithm that optimizes a single trajectory-level Bellman objective using the model's own logits as $Q$-values. TBRM removes the need for critics, importance-sampling ratios, or clipping, and can operate with only one rollout per prompt. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis. Experiments on standard mathematical-reasoning benchmarks show that TBRM matches or surpasses policy-based baselines, like PPO and GRPO, with comparable or lower computational and memory overhead. Our results indicate that value-based RL might be a principled and efficient alternative for enhancing reasoning capabilities in LLMs. The codebase for TBRM is publicly available at https://github.com/rlx-lab/TBRM.
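
As a rough illustration of what a trajectory-level Bellman residual looks like with outcome-only reward, the sketch below sums per-step residuals along one trajectory (intermediate rewards are zero) and squares the result. The value and Q parameterizations here are toy placeholders; TBRM's actual objective, its use of logits as Q-values, and its KL regularization involve details not reproduced here.

```python
import torch

# Toy trajectory-level Bellman residual with outcome-only reward.
# q[t] plays the role of Q(s_t, a_t); v[t] the role of V(s_{t+1}), with the value
# after the terminal step set to 0. With zero intermediate rewards, summing the
# per-step residuals Q(s_t, a_t) - V(s_{t+1}) and subtracting the terminal reward
# R gives one trajectory-level residual, which is squared to form the loss.
# Schematic of the idea only, not the TBRM objective itself.

T = 5
q = torch.randn(T, requires_grad=True)                 # stand-in Q-values along the trajectory
v = torch.cat([torch.randn(T - 1), torch.zeros(1)])    # V(s_{t+1}); terminal value is 0
R = torch.tensor(1.0)                                  # outcome reward observed at the end

residual = (q - v).sum() - R                           # trajectory-level Bellman residual
loss = residual.pow(2)
loss.backward()
print(float(loss), q.grad)
```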

NeurIPS Conference 2024 Conference Paper

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

  • Fan Chen
  • Dylan J. Foster
  • Yanjun Han
  • Jian Qian
  • Alexander Rakhlin
  • Yunbei Xu

We develop a unifying framework for information-theoretic lower bounds in statistical estimation and interactive decision making. Classical lower bound techniques---such as Fano's method, Le Cam's method, and Assouad's lemma---are central to the study of minimax risk in statistical estimation, yet are insufficient to provide tight lower bounds for \emph{interactive decision making} algorithms that collect data interactively (e.g., algorithms for bandits and reinforcement learning). Recent work of Foster et al. provides minimax lower bounds for interactive decision making using seemingly different analysis techniques from the classical methods. These results---which are proven using a complexity measure known as the \emph{Decision-Estimation Coefficient} (DEC)---capture difficulties unique to interactive learning, yet do not recover the tightest known lower bounds for passive estimation. We propose a unified view of these distinct methodologies through a new lower bound approach called the \emph{interactive Fano method}. As an application, we introduce a novel complexity measure, the \emph{Fractional Covering Number}, which facilitates new lower bounds for interactive decision making that extend the DEC methodology by incorporating the complexity of estimation. Using the fractional covering number, we (i) provide a unified characterization of learnability for \emph{any} stochastic bandit problem, and (ii) close the remaining gap between the upper and lower bounds in Foster et al. (up to polynomial factors) for any interactive decision making problem in which the underlying model class is convex.
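
For readers who want the classical starting point the abstract builds on, Fano's method lower-bounds the minimax error of any estimator over a finite family of well-separated hypotheses via mutual information. The standard textbook inequality is recalled below; this is not the interactive extension developed in the paper.

```latex
% Classical Fano inequality over M hypotheses: J is drawn uniformly from
% {1,...,M}, X is the observed data, and \hat{J}(X) is any estimator of the index.
\inf_{\hat{J}} \; \max_{1 \le j \le M} \; \Pr_{j}\!\left[\hat{J}(X) \neq j\right]
  \;\ge\;
  1 - \frac{I(J; X) + \log 2}{\log M}.
```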

NeurIPS Conference 2024 Conference Paper

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

  • Jiachen Li
  • Xinyao Wang
  • Sijie Zhu
  • Chia-Wen Kuo
  • Lu Xu
  • Fan Chen
  • Jitesh Jain
  • Humphrey Shi

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of efficiently improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo, which incorporates Co-upcycled Top-K sparsely-gated Mixture-of-Experts blocks into both the vision encoder and the MLP connector, thereby enhancing the multimodal LLMs with negligible additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then initializes each expert in the MoE block from the pre-trained MLP block during the visual instruction tuning stage, with auxiliary losses to ensure a balanced loading of experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks within each model size group, all while training exclusively on open-sourced datasets.
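
For intuition about the co-upcycled Top-K gating described above, here is a generic sparsely-gated MoE MLP in PyTorch in which every expert is initialized as a copy of one pre-trained MLP ("upcycling"). Layer sizes, the gating form, and the load-balancing term are simplified placeholders rather than CuMo's implementation.

```python
import copy
import torch
import torch.nn as nn

# Generic Top-K sparsely-gated MoE MLP with experts "upcycled" (copied) from a
# single pre-trained MLP. Simplified illustration; not CuMo's actual code.

class TopKMoE(nn.Module):
    def __init__(self, pretrained_mlp, num_experts=4, k=2, dim=512):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        # Co-upcycling: every expert starts as a copy of the pre-trained MLP.
        self.experts = nn.ModuleList(
            [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]
        )

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.gate(x)                  # (tokens, num_experts)
        weights = logits.softmax(dim=-1)
        topk_w, topk_idx = weights.topk(self.k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)             # which tokens route to expert e
            if mask.any():
                rows = mask.any(dim=-1)
                w = (topk_w * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])

        # Simple load-balancing auxiliary term: encourage uniform expert usage.
        usage = weights.mean(dim=0)
        aux_loss = (usage * usage).sum() * len(self.experts)
        return out, aux_loss

mlp = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
moe = TopKMoE(mlp, num_experts=4, k=2, dim=512)
y, aux = moe(torch.randn(16, 512))
print(y.shape, float(aux))
```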

AAMAS Conference 2024 Conference Paper

Solving Offline 3D Bin Packing Problem with Large-sized Bin via Two-stage Deep Reinforcement Learning

  • Hao Yin
  • Fan Chen
  • Hongjie He

Existing Deep Reinforcement Learning (DRL) algorithms address the 3D Bin Packing Problem (3D-BPP) by decomposing the packing action into three sub-stages. However, this three-stage scheme makes it necessary for information to be passed between subnetworks, which may increase the computational cost of training and inference. This paper proposes a two-stage DRL algorithm, combining index and orientation into a single sub-stage to simplify learning. Additionally, a Bidirectional Cooperative Packing (BCP) method is introduced to compress the action space during position selection while retaining exploration capability. The experimental results show that the two-stage DRL algorithm, which incorporates BCP, achieves a 0.3%-1.7% improvement in space utilization compared to the currently best-performing algorithm.
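
To illustrate the idea of merging item index and orientation into one sub-stage, the snippet below flattens the two choices into a single discrete action and decodes it back. The sizes are arbitrary placeholders, not the paper's implementation.

```python
# Combining item index and orientation into one discrete action, as opposed to
# selecting them in two separate sub-stages. Sizes are arbitrary placeholders.

NUM_ITEMS = 10
NUM_ORIENTATIONS = 6   # axis-aligned orientations of a cuboid

def encode(item_idx, orientation):
    return item_idx * NUM_ORIENTATIONS + orientation

def decode(action):
    return divmod(action, NUM_ORIENTATIONS)   # -> (item_idx, orientation)

action = encode(item_idx=3, orientation=4)
print(action, decode(action))                 # 22 (3, 4)
```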

JMLR Journal 2023 Journal Article

Entropic Fictitious Play for Mean Field Optimization Problem

  • Fan Chen
  • Zhenjie Ren
  • Songbo Wang

We study two-layer neural networks in the mean field limit, where the number of neurons tends to infinity. In this regime, the optimization over the neuron parameters becomes an optimization over probability measures, and by adding an entropic regularizer, the minimizer of the problem is identified as a fixed point. We propose a novel training algorithm named entropic fictitious play, inspired by the classical fictitious play in game theory for learning Nash equilibria, to recover this fixed point, and the algorithm exhibits a two-loop iteration structure. Exponential convergence is proved, and we also verify our theoretical results with simple numerical examples.
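
As a reminder of the underlying problem, entropy-regularized mean-field optimization is typically written as a minimization over probability measures whose minimizer is a Gibbs-type fixed point. The display below follows that standard formulation; notation may differ from the paper's.

```latex
% Entropy-regularized mean-field optimization (standard form; notation may differ
% from the paper). F is the potential over measures and \sigma > 0 the
% regularization strength.
\min_{\mu \in \mathcal{P}(\mathbb{R}^d)} \; F(\mu) + \sigma \int \mu \log \mu \,\mathrm{d}x,
\qquad
\text{with minimizer satisfying }
\mu^{*}(\mathrm{d}x) \;\propto\; \exp\!\left(-\tfrac{1}{\sigma}\,\frac{\delta F}{\delta \mu}(\mu^{*}, x)\right)\mathrm{d}x .
```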

NeurIPS Conference 2023 Conference Paper

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

  • Yu Bai
  • Fan Chen
  • Huan Wang
  • Caiming Xiong
  • Song Mei

Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life---A \emph{single} transformer can adaptively select different base ICL algorithms---or even perform qualitatively different tasks---on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task---noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
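
As a concrete instance of one of the "base" algorithms mentioned above, the snippet below solves ridge regression directly on a small in-context dataset. It is a plain NumPy illustration of the statistical target, not a transformer construction from the paper.

```python
import numpy as np

# Ridge regression on a small "in-context" dataset: the statistical target that,
# per the paper, a transformer can implement in context. Plain NumPy illustration.

rng = np.random.default_rng(0)
d, n, lam = 5, 20, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

x_test = rng.normal(size=d)
print(x_test @ w_ridge, x_test @ w_true)   # ridge prediction vs. noiseless target
```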

NeurIPS Conference 2022 Conference Paper

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

  • Fan Chen
  • Junyu Zhang
  • Zaiwen Wen

As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, essential understanding of offline CMDP problems is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|, |\mathcal{S}|+I\right\} C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-\gamma)^{-1})$ factor. A comprehensive discussion on how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.
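
For reference, the primal-dual view of a CMDP that such algorithms build on is the standard Lagrangian saddle-point formulation shown below; this is the textbook form, and DPDL's deviation control mechanism is not captured here.

```latex
% Standard Lagrangian (saddle-point) formulation of a CMDP with I constraints:
% maximize the reward value subject to constraint values meeting thresholds b_i.
\max_{\pi} \; V_{r}^{\pi}
\quad \text{s.t.} \quad V_{c_i}^{\pi} \ge b_i, \;\; i = 1, \dots, I
\qquad \Longleftrightarrow \qquad
\max_{\pi} \, \min_{\lambda \ge 0} \; V_{r}^{\pi} + \sum_{i=1}^{I} \lambda_i \left( V_{c_i}^{\pi} - b_i \right).
```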

ICRA Conference 2020 Conference Paper

A Bio-Signal Enhanced Adaptive Impedance Controller for Lower Limb Exoskeleton

  • Lin-qing Xia
  • Yachun Feng
  • Fan Chen
  • Xinyu Wu 0001

The problem of human-exoskeleton interaction with uncertain dynamical parameters remains an open-ended research area. It requires an elaborate control strategy design of the exoskeleton to accommodate complex and unpredictable human body movements. In this paper, we propose a novel control approach for the lower limb exoskeleton to realize its task of assisting the human operator in walking. The main challenge of this study was to determine the human lower extremity dynamics, such as the joint torque. For this purpose, we developed a neural network-based torque estimation method that predicts human joint torques from surface electromyogram (sEMG) signals. Then a radial basis function neural network (RBF NN)-enhanced adaptive impedance controller is employed to ensure that the exoskeleton tracks the desired motion trajectory of the human operator. Algorithm performance is evaluated with two healthy subjects and the rehabilitation lower-limb exoskeleton developed by the Shenzhen Institutes of Advanced Technology (SIAT).
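
For context, adaptive impedance controllers of this kind regulate the joint dynamics toward a target impedance model; a generic second-order target relation is shown below. This is the textbook form, not the specific controller or RBF NN compensation derived in the paper.

```latex
% Generic target impedance model for a joint trajectory q with reference q_d:
% M_d, B_d, K_d are the desired inertia, damping, and stiffness, and
% \tau_{\mathrm{ext}} is the external human-exoskeleton interaction torque.
M_d\,(\ddot{q} - \ddot{q}_d) + B_d\,(\dot{q} - \dot{q}_d) + K_d\,(q - q_d) = \tau_{\mathrm{ext}}.
```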

ICRA Conference 2018 Conference Paper

Eddy Current Damper Design for Vibration Suppression in Robotic Milling Process

  • Fan Chen
  • Huan Zhao 0001
  • Han Ding 0001

This paper presents a novel eddy current damper design for chatter suppression in the robotic milling process. The designed eddy current dampers are installed on a milling spindle to damp the tool tip vibrations. The structural design of the eddy current dampers and the working principle of the proposed vibration attenuation method are explained. The finite element method is used to analyze the magnetic flux density and the magnetic force generated by the designed eddy current dampers. The dynamics of the robotic milling system with and without eddy current dampers are modeled, and the damping performance of the proposed method is verified through simulations in both the frequency and time domains. The results show that the peaks of the tool tip frequency response function caused by the spindle and milling tool modes are damped by 3.2 dB and 5.3 dB, respectively, and the chatter stability is improved by about 43% in the high spindle speed zone, compared to the case without eddy current dampers.
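
To see how added damping translates into dB reductions of frequency response function peaks, the snippet below computes a single-degree-of-freedom frequency response with two damping values and reports the peak drop in dB. Parameter values are arbitrary illustrations, unrelated to the actual spindle and tool modes in the paper.

```python
import numpy as np

# Single-DOF frequency response |X/F|(w) = 1 / sqrt((k - m w^2)^2 + (c w)^2),
# evaluated with and without extra damping to show the peak reduction in dB.
# Parameter values are arbitrary, not the paper's spindle/tool modes.

m, k = 1.0, (2 * np.pi * 100.0) ** 2        # 100 Hz natural frequency
w = 2 * np.pi * np.linspace(50, 150, 2000)  # frequency sweep (rad/s)

def frf_peak_db(c):
    mag = 1.0 / np.sqrt((k - m * w**2) ** 2 + (c * w) ** 2)
    return 20 * np.log10(mag.max())

c_base = 2 * 0.02 * np.sqrt(k * m)          # 2% damping ratio (no damper)
c_damped = 2 * 0.05 * np.sqrt(k * m)        # 5% with an added damper

print(f"peak reduced by {frf_peak_db(c_base) - frf_peak_db(c_damped):.1f} dB")
```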