Arrow Research search

Author name cluster

Chen Tang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models

  • Lian Yan
  • Haotian Wang
  • Chen Tang
  • Haifeng Liu
  • Tianyang Sun
  • Liangliang Liu
  • Yi Guan
  • Jingchi Jiang

In the agricultural domain, the deployment of large language models (LLMs) is hindered by the lack of training data and evaluation benchmarks. To mitigate this issue, we propose AgriEval, the first comprehensive Chinese agricultural benchmark with three main characteristics: (1) Comprehensive Capability Evaluation. AgriEval covers six major agricultural categories and 29 subcategories, addressing four core cognitive scenarios—memorization, understanding, inference, and generation. (2) High-Quality Data. The dataset is curated from university-level examinations and assignments, providing a natural and robust benchmark for assessing the capacity of LLMs to apply knowledge and make expert-like decisions. (3) Diverse Formats and Extensive Scale. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended question-and-answer items, establishing it as the most extensive agricultural benchmark available to date. We also present comprehensive experimental results over 51 open-source and commercial LLMs. The experimental results reveal that most existing LLMs struggle to achieve 60 percent accuracy, underscoring the developmental potential of agricultural LLMs. Additionally, we conduct extensive experiments to investigate factors influencing model performance and propose strategies for enhancement.

NeurIPS Conference 2025 Conference Paper

Accelerating Parallel Diffusion Model Serving with Residual Compression

  • Jiajun Luo
  • Yicheng Xiao
  • Jianru Xu
  • Yangxiu You
  • Rongwei Lu
  • Chen Tang
  • Jingyan Jiang
  • Zhi Wang

Diffusion models produce realistic images and videos but require substantial computational resources, necessitating multi-accelerator parallelism for real-time deployment. However, parallel inference introduces significant communication overhead from exchanging large activations between devices, limiting efficiency and scalability. We present CompactFusion, a compression framework that significantly reduces communication while preserving generation quality. Our key observation is that diffusion activations exhibit strong temporal redundancy—adjacent steps produce highly similar activations, saturating bandwidth with near-duplicate data carrying little new information. To address this inefficiency, we seek a more compact representation that encodes only the essential information. CompactFusion achieves this via Residual Compression, which transmits only compressed residuals (step-wise activation differences). Based on empirical analysis and theoretical justification, we show that it effectively removes redundant data, enabling substantial data reduction while maintaining high fidelity. We also integrate lightweight error feedback to prevent error accumulation. CompactFusion establishes a new paradigm for parallel diffusion inference, delivering lower latency and significantly higher generation quality than prior methods. On 4$\times$L20, it achieves a $3.0\times$ speedup while greatly improving fidelity. It also uniquely supports communication-heavy strategies like sequence parallelism on slow networks, achieving a $6.7\times$ speedup over prior overlap-based methods. CompactFusion applies broadly across diffusion models and parallel settings, and integrates easily without requiring pipeline rework. A portable implementation demonstrated on xDiT is publicly available at https://github.com/Cobalt-27/CompactFusion
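The residual-plus-error-feedback idea in the abstract above can be sketched generically. The snippet below is a minimal illustration only, not the paper's implementation: top-k sparsification stands in for CompactFusion's actual compressor, and the class and buffer names are invented for the example.

```python
import numpy as np

def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

class ResidualCodec:
    """Send only the compressed step-wise activation difference, with an
    error-feedback buffer so compression error is carried forward
    instead of being silently dropped."""

    def __init__(self, size, k):
        self.recon = np.zeros(size)  # reconstruction mirrored by both ends
        self.err = np.zeros(size)    # accumulated compression error
        self.k = k

    def encode(self, activation):
        residual = activation - self.recon + self.err  # new info + carried error
        payload = topk_compress(residual, self.k)      # what goes on the wire
        self.err = residual - payload                  # error feedback for next step
        self.recon += payload                          # track receiver-side state
        return payload

    def decode(self, payload):
        self.recon += payload
        return self.recon
```

When k equals the full activation size the scheme is lossless; with a small k it remains stable because the error buffer re-injects whatever was dropped at the next step.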

EAAI Journal 2025 Journal Article

Complex Deng entropy for uncertainty measure in complex evidence theory

  • Chen Tang
  • Fuyuan Xiao

Dempster–Shafer (DS) evidence theory, an extension of probability theory, is widely utilized across various domains due to its adeptness in managing uncertain and imprecise information. Building on DS evidence theory, the complex evidence theory addresses uncertainty in decision-making in a complex plane framework. Uncertainty measurement holds a pivotal role in both evidence theory and probability theory. In this paper, capitalizing on the unique attributes of complex basic belief assignment (CBBA), we propose a novel measure of complex belief entropy, designed to evaluate total uncertainty in complex evidence theory. Notably, the proposed complex belief entropy encompasses not only discord and non-specificity but also interference, shedding light on the interactions among focal elements. Additionally, we conduct a thorough analysis of the properties associated with this newly proposed entropy. Finally, based on the proposed entropy model, a decision-making algorithm is proposed, demonstrating the superiority of the entropy model. Our findings reveal that the proposed complex belief entropy effectively measures the total uncertainty of CBBA in the framework of complex evidence theory.
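For background, the classical real-valued Deng entropy that the complex version generalizes is commonly defined, for a basic belief assignment $m$ over a frame of discernment $X$, as:

```latex
E_d(m) = -\sum_{\substack{A \subseteq X \\ m(A) > 0}} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}
```

where $|A|$ is the cardinality of the focal element $A$; the $2^{|A|}-1$ term counts the non-empty subsets of $A$ and is what lets the measure capture non-specificity in addition to discord.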

AAAI Conference 2025 Conference Paper

Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

  • Chen Tang
  • Ben Abbatematteo
  • Jiaheng Hu
  • Rohan Chandra
  • Roberto Martín-Martín
  • Peter Stone

Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. These challenges notwithstanding, recent advances have enabled DRL to succeed at some real-world robotic tasks. However, state-of-the-art DRL solutions’ maturity varies significantly across robotic applications. In this talk, I will review the current progress of DRL in real-world robotic applications based on our recent survey paper (with Tang, Abbatematteo, Hu, Chandra, and Martín-Martín), with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies, including locomotion, navigation, stationary manipulation, mobile manipulation, human-robot interaction, and multi-robot interaction. The analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. I will also highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. The talk is designed to offer insights for RL practitioners and roboticists toward harnessing RL’s power to create generally capable real-world robotic systems.

IJCAI Conference 2025 Conference Paper

DFMU: Distribution-based Framework for Modeling Aleatoric Uncertainty in Multimodal Sentiment Analysis

  • Chen Tang
  • Tingrui Shen
  • Xinrong Gong
  • Chong Zhao
  • Tong Zhang

In Multimodal Sentiment Analysis (MSA), data noise arising from various sources gives rise to Aleatoric Uncertainty (AU), significantly impacting model performance. Current efforts to address AU have insufficiently explored its sources. They primarily focus on modeling noise rather than implementing targeted modeling based on its origin. Consequently, these approaches struggle to effectively mitigate the influence of AU, resulting in sustained limitations in model performance. Our research identifies that AU primarily stems from two problems: subjective bias in the annotation process and the complex set relationships of sentiment features. To specifically address them, we propose DFMU, a Distribution-based Framework for Modeling Aleatoric Uncertainty, which incorporates an uncertainty modeling block capable of encoding uncertainty distributions and adaptively adjusting optimization objectives. Furthermore, we introduce distribution-based contrastive learning with sentiment-word replacement to better capture the complex relationships among features. Extensive experiments on three public MSA datasets, i.e., MOSI, MOSEI, and SIMS, demonstrate that the proposed model maintains robust performance even under high noise conditions and achieves state-of-the-art results on these popular datasets.

ECAI Conference 2025 Conference Paper

Dynamic Model Fusion for Multi-Source Test-Time Adaptation

  • Yuan Xue 0013
  • Qinting Jiang
  • Yuan Meng
  • Xingxuan Zhang
  • Chen Tang
  • Jingyan Jiang
  • Zhi Wang 0001

Deep Neural Networks suffer significant performance degradation when faced with distribution shifts between training and test data. Test-time adaptation (TTA) has emerged as a practical solution that enables models to adapt to the shifted test distribution. Currently, most existing TTA methods are designed around a single model and thus incorporate limited information from a single data distribution. In practice, pre-trained models derived from diverse source domains are readily accessible, each capturing a distinct data distribution and containing complementary information. To exploit this diversity, we propose Model Fusion-based multi-source Test-Time Adaptation (MFTTA), which constructs a target model by fusing the parameters of multiple source models. Drawing inspiration from deep model fusion, we introduce a fine-grained fusion mechanism governed by an off-policy reinforcement learning agent, which dynamically assigns fusion weights based on the current data distribution. Furthermore, we design a correlation-aware model update strategy that prioritizes the source model most relevant to the incoming test data. Extensive experiments on standard out-of-distribution benchmarks demonstrate that our method effectively integrates knowledge from multiple source models, adapts robustly to dynamic distribution shifts, and alleviates the problem of forgetting in long-term adaptation.
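In its simplest form, the parameter-fusion step described above reduces to a weighted average of the source models' parameters. A minimal sketch follows; the dynamic, RL-assigned per-layer weights are the paper's contribution and appear here only as fixed numbers, with illustrative names throughout.

```python
import numpy as np

def fuse_models(param_sets, weights):
    """Fuse several source models into one target model by a weighted
    average of like-named parameters (weights normalised to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return {name: sum(wi * params[name] for wi, params in zip(w, param_sets))
            for name in param_sets[0]}
```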

JBHI Journal 2025 Journal Article

GlanceSeg: Real-Time Microaneurysm Lesion Segmentation With Gaze-Map-Guided Foundation Model for Early Detection of Diabetic Retinopathy

  • Hongyang Jiang
  • Mengdi Gao
  • Zirong Liu
  • Chen Tang
  • Xiaoqing Zhang
  • Shuai Jiang
  • Wu Yuan
  • Jiang Liu

Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microaneurysms (MAs), resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of MA lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze maps, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting MAs. Finally, a domain knowledge filtering (DKF) module refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and further enhances segmentation performance through fine-tuning using annotations. The clinician-friendly GlanceSeg is able to segment small lesions in real-time, showing potential for clinical applications.

AAAI Conference 2025 Conference Paper

JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration

  • Mingzi Wang
  • Yuan Meng
  • Chen Tang
  • Weixiang Zhang
  • Yijian Qin
  • Yang Yao
  • Yingxin Li
  • Tongtong Feng

The co-design of neural network architectures, quantization precisions, and hardware accelerators offers a promising approach to achieving an optimal balance between performance and efficiency, particularly for model deployment on resource-constrained edge devices. In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. However, effectively automating the design process across the vast search space of those three dimensions poses significant challenges, especially when pursuing extremely low-bit quantization. Specifically, the primary challenges include: (1) Memory overhead on the software side: Low-precision quantization-aware training can lead to significant memory usage due to storing large intermediate features and latent weights for backpropagation, potentially causing memory exhaustion. (2) Time-consuming search on the hardware side: The discrete nature of hardware parameters and the complex interplay between compiler optimizations and individual operators make the accelerator search time-consuming. To address these issues, JAQ mitigates the memory overhead through a channel-wise sparse quantization (CSQ) scheme, selectively applying quantization to the most sensitive components of the model during optimization. Additionally, JAQ designs BatchTile, which employs a hardware generation network to encode all possible tiling modes, thereby speeding up the search for the optimal compiler mapping strategy. Extensive experiments demonstrate the effectiveness of JAQ, achieving approximately 7% higher Top-1 accuracy on ImageNet compared to previous methods and reducing the hardware search time per iteration to 0.15 seconds.

AAAI Conference 2025 Conference Paper

Model Lineage Closeness Analysis

  • Chen Tang
  • Lan Zhang
  • Qi Zhao
  • Xirong Zhuang
  • Xiang-Yang Li

As machine learning model modification techniques are extensively employed to obtain well-performing models at reduced costs, several studies have emerged to determine the presence of a modification relationship (i.e., lineage) between models. However, these methods are not robust to high-impact modification techniques and none of them have addressed the measurement of lineage closeness, which quantifies the degrees of modification. In this work, we visualize the changes in model decision boundaries resulting from different modification techniques and conclude that differences in decision boundaries serve as a precise metric of lineage closeness. Building upon this insight, we propose a modification-type agnostic and task-agnostic method to measure model lineage closeness by calculating mean adversarial distances from data points to decision boundaries and matching rate of data points, with data points selected through an efficient sampling method to reduce computational overhead. Moreover, we propose a novel indirect measurement approach to support lineage closeness measurement for models with different tasks. Finally, comprehensive experiments show that our design achieves an impressive 97% accuracy in lineage determination, and can precisely measure model lineage closeness for different modifications.
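The adversarial-distance ingredient of the measurement above can be illustrated with a simple bisection: step along a fixed direction until the model's predicted label flips, then bisect for the boundary. This is only the distance primitive, not the paper's full lineage-closeness measure; the `predict` callback and the search radius are assumptions of the example.

```python
import numpy as np

def boundary_distance(predict, x, direction, radius=10.0, iters=40):
    """Approximate the distance from x to the decision boundary along
    `direction` by bisecting for the smallest label-flipping step."""
    base = predict(x)
    if predict(x + radius * direction) == base:
        return None  # no label flip within the search radius
    lo, hi = 0.0, radius
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if predict(x + mid * direction) == base:
            lo = mid  # still on the original side
        else:
            hi = mid  # flipped: boundary is closer
    return hi
```

Averaging such distances over sampled points (and directions) gives one scalar per model that can be compared across a lineage, in the spirit of the mean adversarial distance the abstract describes.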

NeurIPS Conference 2025 Conference Paper

SpecEM: Training-Free LLM Ensembling via Iterative Drafting, Verification, and Online Feedback

  • Bo Lv
  • Nayu Liu
  • Chen Tang
  • Xin Liu
  • Yue Yu
  • Ping Luo

Ensembles of generative large language models (LLMs) are a promising way to compensate for individual model limitations, integrating the strengths of different LLMs. Existing LLM ensemble methods, however, face limitations such as first-token delay and challenges in long-range semantic collaboration between models. Moreover, they typically assume equal voting weights for all models during ensemble, ignoring performance differences between models for a given task. In this work, we propose SpecEM, a training-free, plug-and-play LLM ensemble framework that dynamically adjusts each model's contribution in real time based on task performance. Inspired by speculative decoding, SpecEM iteratively performs drafting and verification, allowing models to collaborate semantically at the segment level for integrated output. Furthermore, we introduce an online feedback mechanism with multiplicative weight updates, where each model's voting weight is adjusted on-the-fly according to how often it "outperforms" others during the verification stage, ensuring that stronger models exert greater influence on the ensemble during generation. Experimental results on five popular LLMs (ranging from 7B to 72B parameters) and six benchmark tasks, spanning instruction following, reasoning, commonsense, and general instruction response, demonstrate consistent performance improvements compared to state-of-the-art LLM ensemble methods.
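The online feedback mechanism above rests on a classic multiplicative-weights update. A minimal sketch, assuming a single winning model per draft/verify round (the actual per-segment scoring rule is not specified in the abstract):

```python
def mw_update(weights, winner, eta=0.1):
    """Boost the model whose draft won verification this round by a
    multiplicative factor, then renormalise to keep a distribution."""
    boosted = {m: w * (1.0 + eta if m == winner else 1.0)
               for m, w in weights.items()}
    total = sum(boosted.values())
    return {m: w / total for m, w in boosted.items()}
```

Repeated wins compound multiplicatively, so a consistently stronger model's voting weight grows while the weights always remain a valid distribution.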

TMLR Journal 2025 Journal Article

TempFlex: Advancing MLLMs with Temporal Perception and Natively Scalable Resolution Encoding

  • Zhanyu Wang
  • Chen Tang
  • Haoyu He
  • Kuan Feng
  • Chao Wang
  • Bingni Zhang
  • Xiaolei Xu
  • Shen Wang

Multimodal large language models (MLLMs) have made significant progress across vision-language tasks, yet many designs still suffer from two core limitations. (i) Excessive visual tokens and broken global context: Tiled Patch Encoding fragments high-resolution images, leading to token overload and disrupting global attention modeling. (ii) Lack of temporal reasoning: Most models process video as independent frames using static image encoders, failing to capture temporal dynamics. We present TempFlex-VL, a token-efficient and temporally aware MLLM that addresses both issues through lightweight architectural enhancements. First, we introduce a resolution-agnostic visual encoder that directly processes full images without tiling, preserving global context while substantially reducing visual tokens. Second, we propose Temporal Fiber Fusion (TFF), a plug-and-play module with three complementary pathways: (1) a dynamic local-convolution branch for fine-grained motion, (2) a gated memory accumulator for long-term dependencies, and (3) a periodic encoder for modeling cyclic patterns. These signals are softly fused, enabling the model to adapt to diverse temporal structures without overfitting. To support large-scale video-language pretraining, we curate TempFlex-2M, a high-quality synthetic video–text corpus generated in a single stage via GPT-4o with direct visual prompting. We instantiate TempFlex-VL using two different language backbones, Gemma3-4B and Qwen3-4B, demonstrating the generality of our design across architectures. Both variants achieve state-of-the-art or competitive results on a wide range of image and video benchmarks while markedly improving token efficiency. Code is publicly available at: https://github.com/wang-zhanyu/TempFlex.

AAMAS Conference 2024 Conference Paper

Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

  • Yuxin Chen
  • Chen Tang
  • Ran Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization in Multi-agent Reinforcement Learning (MARL) is challenging. Introducing a diverse set of co-play agents typically boosts the agent’s generalization to unseen co-players. However, the extent to which an agent is influenced by co-players varies across scenarios and environments; thus, the improvement in generalization introduced by diversifying co-players also varies. In this work, we introduce Level of Influence (LoI), a novel metric measuring the interaction intensity among agents within a given scenario and environment. We show that LoI can effectively predict the disparities in the benefits of diversifying co-player distribution across scenarios, offering insights into optimizing training cost for varied situations. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

RLJ Journal 2024 Journal Article

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.
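One natural reading of the LoI-guided allocation described above is to split the fixed training budget across scenarios in proportion to their LoI scores. The proportional rule below is an assumption of this sketch, not necessarily the paper's exact scheme, and the scenario names are illustrative.

```python
def allocate_budget(loi_scores, total_budget):
    """Split a fixed training budget across scenarios in proportion to
    each scenario's Level of Influence (LoI) score."""
    total = sum(loi_scores.values())
    return {s: total_budget * v / total for s, v in loi_scores.items()}
```

High-interaction scenarios, where diversifying co-players helps most, then receive correspondingly more of the compute budget than low-interaction ones.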

RLC Conference 2024 Conference Paper

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

NeurIPS Conference 2023 Conference Paper

Residual Q-Learning: Offline and Online Policy Customization without Value

  • Chenran Li
  • Chen Tang
  • Haruki Nishimura
  • Jean Mercat
  • Masayoshi Tomizuka
  • Wei Zhan

Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is especially appealing for solving complex real-world tasks where handcrafting reward function is difficult, or when the goal is to mimic human expert behavior. However, the learned imitative policy can only follow the behavior in the demonstration. When applying the imitative policy, we may need to customize the policy behavior to meet different requirements coming from diverse downstream tasks. Meanwhile, we still want the customized policy to maintain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of the prior policy while satisfying some additional requirements imposed by a target downstream task. We propose a novel and principled approach to interpret and determine the trade-off between the two task objectives. Specifically, we formulate the customization problem as a Markov Decision Process (MDP) with a reward function that combines 1) the inherent reward of the demonstration; and 2) the add-on reward specified by the downstream task. We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy without knowing the inherent reward or value function of the prior policy. We derive a family of residual Q-learning algorithms that can realize offline and online policy customization, and show that the proposed algorithms can effectively accomplish policy customization tasks in various environments. Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.

NeurIPS Conference 2021 Conference Paper

Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling

  • Chen Tang
  • Wei Zhan
  • Masayoshi Tomizuka

Multi-agent behavior modeling and trajectory forecasting are crucial for the safe navigation of autonomous agents in interactive scenarios. Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling to generate diverse behavior and learn a low-dimensional representation for interacting systems. However, existing literature did not formally discuss if a VAE-based model can properly encode interaction into its latent space. In this work, we argue that one of the typical formulations of VAEs in multi-agent modeling suffers from an issue we refer to as social posterior collapse, i.e., the model is prone to ignoring historical social context when predicting the future trajectory of an agent. It could cause significant prediction errors and poor generalization performance. We analyze the reason behind this under-explored phenomenon and propose several measures to tackle it. Afterward, we implement the proposed framework and experiment on real-world datasets for multi-agent trajectory prediction. In particular, we propose a novel sparse graph attention message-passing (sparse-GAMP) layer, which helps us detect social posterior collapse in our experiments. In the experiments, we verify that social posterior collapse indeed occurs. Also, the proposed measures are effective in alleviating the issue. As a result, the model attains better generalization performance when historical social context is informative for prediction.