Arrow Research search

Author name cluster

Yun Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

AAAI Conference 2026 Conference Paper

Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief

  • Zeguan Xiao
  • Diyang Dou
  • Boya Xiong
  • Yun Chen
  • Guanhua Chen

Large Language Models (LLMs) have achieved remarkable success across a wide range of natural language tasks, but often exhibit overconfidence and generate plausible yet incorrect answers. This overconfidence, especially in models that have undergone Reinforcement Learning from Human Feedback (RLHF), poses significant challenges for reliable uncertainty estimation and safe deployment. In this paper, we propose EAGLE (Expectation of AGgregated internaL bEief), a novel self-evaluation-based calibration method that leverages the internal hidden states of LLMs to derive more accurate confidence scores. Instead of relying on the model's final output, our approach extracts internal beliefs from multiple intermediate layers during self-evaluation. By aggregating these layer-wise beliefs and calculating the expectation over the resulting confidence score distribution, EAGLE produces a refined confidence score that more faithfully reflects the model's internal certainty. Extensive experiments on diverse datasets and LLMs demonstrate that EAGLE significantly improves calibration performance over existing baselines. We also provide an in-depth analysis of EAGLE, including a layer-wise examination of uncertainty patterns, a study of the impact of self-evaluation prompts, and an analysis of the effect of self-evaluation score range.
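The aggregate-then-take-expectation step the abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the `layer_logits` input (per-layer logits over discrete confidence bins) and the bin layout in `score_range` are assumptions.

```python
import numpy as np

def expected_confidence(layer_logits, score_range=np.arange(0.0, 1.1, 0.1)):
    """Aggregate per-layer belief distributions and return the expected
    confidence score. `layer_logits` has shape (num_layers, num_bins):
    for each intermediate layer, the logits the model assigns to each
    candidate confidence bin during self-evaluation."""
    # Softmax each layer's logits into a distribution over confidence bins.
    shifted = layer_logits - layer_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Aggregate across layers by averaging the layer-wise distributions.
    agg = probs.mean(axis=0)
    # Expectation over the resulting confidence-score distribution.
    return float((agg * score_range).sum())
```

With all layers agreeing on one bin, the expectation collapses to that bin's value; disagreement between layers pulls the score toward the middle of the range, which is the calibration effect the method relies on.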

NeurIPS Conference 2025 Conference Paper

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

  • Peng Lai
  • Jianjie Zheng
  • Sijie Cheng
  • Yun Chen
  • Peng Li
  • Yang Liu
  • Guanhua Chen

The growing scale of evaluation tasks has led to the widespread adoption of automated evaluation using LLMs, a paradigm known as “LLM-as-a-judge”. However, improving its alignment with human preferences without complex prompts or fine-tuning remains challenging. Previous studies mainly optimize based on shallow outputs, overlooking rich cross-layer representations. In this work, motivated by preliminary findings that middle-to-upper layers encode semantically and task-relevant representations that are often more aligned with human judgments than the final layer, we propose LAGER, a post-hoc, plug-and-play framework for improving the alignment of LLM-as-a-Judge point-wise evaluations with human scores by leveraging internal representations. LAGER produces fine-grained judgment scores by aggregating cross-layer score-token logits and computing the expected score from a softmax-based distribution, while keeping the LLM backbone frozen and ensuring no impact on the inference process. LAGER fully leverages the complementary information across different layers, overcoming the limitations of relying solely on the final layer. We evaluate our method on the standard alignment benchmarks Flask, HelpSteer, and BIGGen using Spearman correlation, and find that LAGER achieves improvements of up to 7.5% over the best baseline across these benchmarks. Without reasoning steps, LAGER matches or outperforms reasoning-based methods. Experiments on downstream applications, such as data selection and emotional understanding, further show the generalization of LAGER.

IJCAI Conference 2025 Conference Paper

Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction

  • Wentao Gao
  • Jiuyong Li
  • Debo Cheng
  • Lin Liu
  • Jixue Liu
  • Thuc Le
  • Xiaojing Du
  • Xiongren Chen

Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth system. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that utilizes both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.

NeurIPS Conference 2025 Conference Paper

Flux4D: Flow-based Unsupervised 4D Reconstruction

  • Jingkang Wang
  • Henry Che
  • Yun Chen
  • Ze Yang
  • Lily Goli
  • Sivabalan Manivasagam
  • Raquel Urtasun

Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data, without requiring pre-trained supervised models or foundational priors, simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.

YNIMG Journal 2025 Journal Article

Increased spindle-related brain activation in right middle temporal gyrus during N2 than N3 among healthy sleepers: Initial discovery and independent sample replication

  • Yan Shao
  • Yupeng Guo
  • Yun Chen
  • Guangyuan Zou
  • Jie Chen
  • Xuejiao Gao
  • Panpan Lu
  • Yujie Tong

The association between spindle metrics and sleep architecture differs during N2 vs. N3 sleep, but the underlying neural mechanism is not clearly understood. Here, we tested the discrepancy in spindle-related brain activation between N2 and N3 within healthy college students (dataset 1: n = 27, 59% females, median age 23 years), using simultaneous electroencephalography-functional magnetic resonance imaging (EEG-fMRI). To assess the replicability of the finding, we repeated the analysis among normal adults (independent dataset 2: n = 30, 50% females, median age 32 years). The finding from dataset 1 indicated significantly increased blood-oxygen level-dependent signal in the right middle temporal gyrus during N2 compared with N3, which was well replicated in dataset 2. Furthermore, correlation analysis was performed to explore the association between this spindle-related brain activation and N2 and N3 sleep duration during EEG-fMRI. We conducted the correlation analysis in N2 and N3, respectively. The negative association between spindle-related brain activation in the right middle temporal gyrus and sleep duration was only observed in N2. Our findings emphasize the unique role of spindle-related brain activation in the right middle temporal gyrus during N2 in shortening N2 sleep duration.

ICRA Conference 2025 Conference Paper

Towards Neurorobotic Interface for Finger Joint Angle Estimation: A Multi-Stage CNN-LSTM Network with Transfer Learning

  • Yun Chen
  • Xinyu Zhang
  • Hui Li
  • Hongsheng He
  • Wan Shou
  • Qiang Zhang 0028

To maximize the autonomy of individuals with upper limb amputations in daily activities, leveraging forearm muscle information to infer movement intent is a promising research direction. While current prosthetic hand technologies can utilize forearm muscle data to achieve basic movements such as grasping, accurately estimating finger joint angles remains a significant challenge. Therefore, we propose a Multi-Stage Cascade Convolutional Neural Network with Long Short-Term Memory Network, where an upsampling module is introduced before the downsampling module to enhance model generalization. Additionally, we designed a transfer learning (TL) framework based on parameter freezing, where the pre-trained downsampling module is fixed, and only the upsampling module is updated with a small amount of out-of-distribution data to achieve TL. Furthermore, we compared the performance of unimodal and multimodal models, collecting surface electromyography (sEMG) signals, brightness mode ultrasound images (B-mode US images), and motion capture data simultaneously. The results show that on the validation set, the US image model had the lowest error, while on the prediction set, the four-channel sEMG model achieved the lowest error. The performance of the multimodal model in both datasets was intermediate between the unimodal models. On the prediction set, the average normalized root mean square error values for the four-channel sEMG, US images, and sensor fusion models across three subjects were 0.170, 0.203, and 0.186, respectively. By utilizing advanced sensor fusion techniques and TL, our approach can reduce the need for extensive data collection and training for new users, making prosthetic control more accessible and adaptable to individual needs.

YNIMG Journal 2025 Journal Article

Unexpected feedback enhances episodic memory: Exploring signed and unsigned reward prediction errors with EEG

  • Yun Chen
  • Chunyu Zhao
  • Rong Liu
  • Qi Li

Reward prediction error (RPE) is crucial for learning and memory, yet its influence on episodic memory remains underexplored. Previous studies have proposed two models for the influence of RPE on memory: signed RPE (SRPE) and unsigned RPE (URPE) effects. In the current study, thirty participants completed a prediction-feedback assessment integrated with a study-recognition task. Electroencephalogram (EEG) was recorded throughout the experiment. Our results showed that recognition accuracy improved when rewards deviated from expectations, irrespective of the direction (positive or negative) of the deviation. EEG analyses revealed that during the recognition phase, the late positive component was predominantly associated with URPE effects. In the earlier time windows of reward feedback, the feedback-related negativity (FRN) and P300 components reflected SRPE effects. In contrast, robust URPE effects emerged in the later time windows, as evidenced by both univariate and multivariate analyses. Importantly, the URPE effects of representational similarity were correlated with those of subsequent recognition performance. These findings demonstrate that RPE is differentially processed across memory encoding stages, with URPE exerting a dominant influence. Our results highlight the critical role of unexpected feedback in memory formation and provide novel insights into the mechanisms underlying RPE in episodic memory.

NeurIPS Conference 2024 Conference Paper

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

  • Bowen Ping
  • Shuo Wang
  • Hanqing Wang
  • Xu Han
  • Yuzhuang Xu
  • Yukun Yan
  • Yun Chen
  • Baobao Chang

Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs (e.g., WizardMath for math problems). Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. This method employs higher-bit representation for singular vectors corresponding to larger singular values. We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. Additionally, we show that our method is compatible with various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its generalizability.
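The core idea, more bits for the dominant singular directions of the delta and fewer for the long tail, can be sketched as a toy NumPy example. The helper names, the uniform quantizer, and the bit widths here are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

def quantize(x, bits):
    """Toy uniform symmetric quantizer to a given bit width."""
    if bits >= 16:
        return x.astype(np.float16).astype(np.float64)  # near-lossless
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    if scale == 0.0:
        return x.copy()
    return np.round(x / scale) * scale

def mixed_precision_delta(delta, k_hi=8, hi_bits=16, lo_bits=3):
    """Split the delta weights by SVD: the top-k_hi singular directions
    (largest singular values) are stored at high precision, and the
    long-tail residual at low precision."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    hi = (U[:, :k_hi] * s[:k_hi]) @ Vt[:k_hi]  # dominant components
    lo = delta - hi                            # long-tail residual
    return quantize(hi, hi_bits) + quantize(lo, lo_bits)
```

When the delta's singular-value spectrum is long-tailed, the residual has a much smaller dynamic range than the full matrix, so the low-bit quantizer wastes far less precision than it would on the raw delta.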

NeurIPS Conference 2024 Conference Paper

SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation

  • Yixia Li
  • Boya Xiong
  • Guanhua Chen
  • Yun Chen

Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks. Existing CLIP-based approaches perform OOD detection by devising novel scoring functions or sophisticated fine-tuning methods. In this work, we propose SeTAR, a novel, training-free OOD detection method that leverages selective low-rank approximation of weight matrices in vision-language and vision-only models. SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm. Based on SeTAR, we further propose SeTAR+FT, a fine-tuning extension optimizing model performance for OOD detection tasks. Extensive evaluations on ImageNet1K and Pascal-VOC benchmarks show SeTAR's superior performance, reducing the relative false positive rate by up to 18.95% and 36.80% compared to zero-shot and fine-tuning baselines. Ablation studies further validate our approach's effectiveness, robustness, and generalizability across different model backbones. Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
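A minimal sketch of the two ingredients named above, low-rank approximation of a weight matrix plus a greedy search over candidates, might look as follows. The function names, the candidate set, and the validation `score_fn` are hypothetical; the actual method also searches over which weight matrices to modify, not just the rank of one.

```python
import numpy as np

def low_rank_approx(W, rank):
    """Truncated-SVD approximation of a weight matrix."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def greedy_rank_search(W, score_fn, candidate_ranks):
    """Try each candidate rank and keep the post-hoc modification that
    scores best under a validation criterion (e.g. an OOD metric)."""
    best = max(candidate_ranks, key=lambda r: score_fn(low_rank_approx(W, r)))
    return best, low_rank_approx(W, best)
```

Because the modification is post-hoc and the search only evaluates a handful of candidates per matrix, no gradient updates are needed, which is what makes the base method training-free.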

NeurIPS Conference 2023 Conference Paper

Neural Lighting Simulation for Urban Scenes

  • Ava Pun
  • Gary Sun
  • Jingkang Wang
  • Yun Chen
  • Ze Yang
  • Sivabalan Manivasagam
  • Wei-Chiu Ma
  • Raquel Urtasun

Different outdoor illumination conditions drastically alter the appearance of urban scenes, and they can harm the performance of image-based robot perception systems if not seen during training. Camera simulation provides a cost-effective solution to create a large dataset of images captured under different lighting conditions. Towards this goal, we propose LightSim, a neural lighting camera simulation system that enables diverse, realistic, and controllable data generation. LightSim automatically builds lighting-aware digital twins at scale from collected raw sensor data and decomposes the scene into dynamic actors and static background with accurate geometry, appearance, and estimated scene lighting. These digital twins enable actor insertion, modification, removal, and rendering from a new viewpoint, all in a lighting-aware manner. LightSim then combines physically-based and learnable deferred rendering to perform realistic relighting of modified scenes, such as altering the sun location and modifying the shadows or changing the sun brightness, producing spatially- and temporally-consistent camera videos. Our experiments show that LightSim generates more realistic relighting results than prior work. Importantly, training perception models on data generated by LightSim can significantly improve their performance. Our project page is available at https://waabi.ai/lightsim/.

AAAI Conference 2021 Conference Paper

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

  • Guanhua Chen
  • Yun Chen
  • Victor O.K. Li

Lexically constrained neural machine translation (NMT), which leverages pre-specified translations to constrain NMT, has practical significance in interactive translation and NMT domain adaptation. Previous works either modify the decoding algorithm or train the model on augmented datasets. These methods suffer from either high computational overheads or low copying success rates. In this paper, we investigate ATT-INPUT and ATT-OUTPUT, two alignment-based constrained decoding methods. These two methods revise the target tokens during decoding based on word alignments derived from encoder-decoder attention weights. Our study shows that ATT-INPUT translates better while ATT-OUTPUT is more computationally efficient. Capitalizing on both strengths, we further propose EAM-OUTPUT by introducing an explicit alignment module (EAM) to a pretrained Transformer. It decodes similarly to ATT-OUTPUT, except using alignments derived from the EAM. We leverage the word alignments induced from ATT-INPUT as labels and train the EAM while keeping the parameters of the Transformer frozen. Experiments on WMT16 De-En and WMT16 Ro-En show the effectiveness of our approaches on constrained NMT. In particular, the proposed EAM-OUTPUT method consistently outperforms previous approaches in translation quality, with light computational overhead over the unconstrained baseline.

IJCAI Conference 2020 Conference Paper

Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

  • Guanhua Chen
  • Yun Chen
  • Yong Wang
  • Victor O. K. Li

Leveraging lexical constraints is highly significant in domain-specific machine translation and interactive machine translation. Previous studies mainly focus on extending the beam search algorithm or augmenting the training corpus by replacing source phrases with the corresponding target translations. These methods either suffer from heavy computation cost during inference or depend on the quality of a bilingual dictionary pre-specified by the user or constructed with statistical machine translation. In response to these problems, we present a conceptually simple and empirically effective data augmentation approach for lexically constrained neural machine translation. Specifically, we make constraint-aware training data by first randomly sampling phrases of the reference as constraints, and then packing them together into the source sentence with a separation symbol. Extensive experiments on several language pairs demonstrate that our approach achieves superior translation results over existing systems, improving translation of constrained sentences without hurting the unconstrained ones.

AAAI Conference 2018 Short Paper

Dialogue Generation With GAN

  • Hui Su
  • Xiaoyu Shen
  • Pengwei Hu
  • Wenjie Li
  • Yun Chen

This paper presents a Generative Adversarial Network (GAN) to model multi-turn dialogue generation, which trains a latent hierarchical recurrent encoder-decoder simultaneously with a discriminative classifier that makes the prior distribution approximate the posterior. Experiments show that our model achieves better results.

AAAI Conference 2018 Conference Paper

Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

  • Yun Chen
  • Yang Liu
  • Victor Li

While end-to-end neural machine translation (NMT) has achieved notable success in recent years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architectures) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.