Arrow Research search

Author name cluster

Xinyi Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration

  • Siyi Xie
  • Hanxin Zhu
  • Xinyi Chen
  • Tianyu He
  • Xin Li
  • Zhibo Chen

Recent advancements in 4D generation have demonstrated its remarkable capability in synthesizing photorealistic renderings of dynamic 3D scenes. However, despite achieving impressive visual performance, almost all existing methods overlook the generation of spatial audio aligned with the corresponding 4D scenes, posing a significant limitation to truly immersive audiovisual experiences. To mitigate this issue, we propose Sonic4D, a novel framework that enables spatial audio generation for immersive exploration of 4D scenes. Specifically, our method is composed of three stages: 1) To capture both the dynamic visual content and raw auditory information from a monocular video, we first employ pre-trained expert models to generate the 4D scene and its corresponding monaural audio. 2) Subsequently, to transform the monaural audio into spatial audio, we localize and track the sound sources within the 4D scene, where their 3D spatial coordinates at different timestamps are estimated via a pixel-level visual grounding strategy. 3) Based on the estimated sound source locations, we further synthesize plausible spatial audio that varies across different viewpoints and timestamps using physics-based simulation. Extensive experiments have demonstrated that our proposed method generates realistic spatial audio consistent with the synthesized 4D scene in a training-free manner, significantly enhancing the immersive experience for users.
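The physics-based spatialization in stage 3 can be illustrated with a minimal interaural time/level difference sketch. This is not the paper's simulator; the geometry, the 1/r amplitude law, and all constants are illustrative assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C

def itd_ild_db(source, left_ear, right_ear):
    """Interaural time difference (seconds) and level difference (dB)
    for a point source, using straight-line distances and a 1/r law."""
    d_l = np.linalg.norm(np.subtract(source, left_ear))
    d_r = np.linalg.norm(np.subtract(source, right_ear))
    itd = (d_l - d_r) / SPEED_OF_SOUND   # > 0: sound reaches the right ear first
    ild_db = 20.0 * np.log10(d_l / d_r)  # > 0: right ear is louder
    return itd, ild_db

# A source 1 m to the listener's right, ears 0.2 m apart:
itd, ild_db = itd_ild_db((1.0, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.1, 0.0, 0.0))
```

As the tracked source position changes per timestamp, recomputing these cues per viewpoint is the basic mechanism by which view-dependent spatial audio is obtained.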

AAAI Conference 2026 Conference Paper

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling

  • Hao Li
  • Shuai Yang
  • Yilun Chen
  • Xinyi Chen
  • Xiaoda Yang
  • Yang Tian
  • Hanqing Wang
  • Tai WANG

Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong performance in robotic manipulation. However, these models remain constrained by the single-frame image paradigm and fail to fully leverage the temporal information offered by multi-frame histories, as directly feeding multiple frames into VLM backbones incurs substantial computational overhead and inference latency. We propose CronusVLA, a unified framework that extends single-frame VLA models to the multi-frame paradigm. CronusVLA follows a two-stage process: (1) Single-frame pretraining on large-scale embodied datasets with autoregressive prediction of action tokens, establishing an effective embodied vision-language foundation; (2) Multi-frame post-training, which adapts the prediction of the vision-language backbone from discrete tokens to learnable features, and aggregates historical information via feature chunking. CronusVLA effectively addresses the existing challenges of multi-frame modeling while enhancing performance. To evaluate the robustness under temporal and spatial disturbances, we introduce SimplerEnv-OR, a novel benchmark featuring 24 types of observational disturbances and 120 severity levels. Experiments across three embodiments in simulated and real-world environments demonstrate that CronusVLA achieves leading performance and superior robustness, with a 70.9% success rate on SimplerEnv, a 26.8% improvement over OpenVLA on LIBERO, and the highest robustness score on SimplerEnv-OR, showing the promise of efficient multi-frame adaptation for real-world VLA deployment.
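The "feature chunking" idea, caching per-frame backbone features so the expensive encoder runs once per new frame, can be sketched roughly as below. The horizon, feature dimension, and mean-pool aggregator are illustrative assumptions, not CronusVLA's actual architecture:

```python
from collections import deque
import numpy as np

class FrameFeatureCache:
    """Keep the last `horizon` per-frame feature vectors and aggregate
    them into one history feature, so the backbone encodes each frame
    exactly once instead of re-encoding the full window every step."""
    def __init__(self, horizon=4, dim=256):
        self.buf = deque(maxlen=horizon)
        self.dim = dim

    def push(self, feat):
        self.buf.append(np.asarray(feat, dtype=np.float32))

    def aggregate(self):
        if not self.buf:
            return np.zeros(self.dim, dtype=np.float32)
        return np.mean(np.stack(self.buf), axis=0)  # simple mean-pool stand-in

cache = FrameFeatureCache(horizon=4, dim=8)
for t in range(6):                      # 6 frames arrive; only the last 4 are kept
    cache.push(np.full(8, float(t)))
history = cache.aggregate()             # mean of frames 2, 3, 4, 5
```

The point of the design is the cost model: per step, one backbone call plus a cheap aggregation, rather than a backbone call per frame in the history window.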

ICML Conference 2025 Conference Paper

Achieving Linear Speedup and Near-Optimal Complexity for Decentralized Optimization over Row-stochastic Networks

  • Liyuan Liang
  • Xinyi Chen
  • Gan Luo
  • Kun Yuan

A key challenge in decentralized optimization is determining the optimal convergence rate and designing algorithms to achieve it. While this problem has been extensively addressed for doubly-stochastic and column-stochastic mixing matrices, the row-stochastic scenario remains unexplored. This paper bridges this gap by introducing effective metrics to capture the influence of row-stochastic mixing matrices and establishing the first convergence lower bound for decentralized learning over row-stochastic networks. However, existing algorithms fail to attain this lower bound due to two key issues: deviation in the descent direction caused by the adapted gradient tracking (GT) and instability introduced by the Pull-Diag protocol. To address descent deviation, we propose a novel analysis framework demonstrating that Pull-Diag-GT achieves linear speedup—the first such result for row-stochastic decentralized optimization. Moreover, by incorporating a multi-step gossip (MG) protocol, we resolve the instability issue and attain the lower bound, achieving near-optimal complexity for decentralized optimization over row-stochastic networks.
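The row-stochastic setting can be illustrated with a toy gossip loop (not the paper's Pull-Diag-GT algorithm; the mixing weights and node values are made up). Repeated multiplication by a row-stochastic matrix drives the nodes to consensus, but on a weighted rather than uniform average:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Row-stochastic mixing matrix: every row sums to 1 (columns need not).
W = rng.uniform(0.1, 1.0, size=(n, n))
W /= W.sum(axis=1, keepdims=True)

x_init = rng.normal(size=n)      # one scalar value per node
x = x_init.copy()
for _ in range(100):             # repeated gossip: x <- W @ x
    x = W @ x

consensus_gap = x.max() - x.min()
# All nodes agree, but on a weighted average determined by W's left
# Perron vector -- generally not the uniform mean. Correcting this
# bias is part of what row-stochastic algorithms must handle.
```

The consensus value is a convex combination of the initial values, so it always lies between their min and max, but its weights depend on the network, which is the asymmetry the paper's metrics are built to capture.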

ICML Conference 2025 Conference Paper

Efficient First-Order Optimization on the Pareto Set for Multi-Objective Learning under Preference Guidance

  • Lisha Chen
  • Quan Xiao
  • Ellen Hidemi Fukuda
  • Xinyi Chen
  • Kun Yuan
  • Tianyi Chen

Multi-objective learning under user-specified preference is common in real-world problems such as multi-lingual speech recognition under fairness. In this work, we frame such a problem as a semivectorial bilevel optimization problem, whose goal is to optimize a pre-defined preference function, subject to the constraint that the model parameters are weakly Pareto optimal. To solve this problem, we convert the multi-objective constraints to a single-objective constraint through a merit function with an easy-to-evaluate gradient, and then, we use a penalty-based reformulation of the bilevel optimization problem. We theoretically establish the properties of the merit function, and the relations of solutions for the penalty reformulation and the constrained formulation. Then we propose algorithms to solve the reformulated single-level problem, and establish its convergence guarantees. We test the method on various synthetic and real-world problems. The results demonstrate the effectiveness of the proposed method in finding preference-guided optimal solutions to the multi-objective problem.
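Schematically, with generic notation that may differ from the paper's, the reformulation described above reads:

```latex
% Semivectorial bilevel problem: optimize the preference function f_0
% over the weakly Pareto-optimal set of the objectives f_1, ..., f_m:
\min_{\theta} \; f_0(\theta)
\quad \text{s.t.} \quad \theta \in \operatorname{WP}(f_1,\dots,f_m).

% Replace the constraint by a merit function \phi \ge 0 with an
% easy-to-evaluate gradient and \phi(\theta) = 0 iff \theta is weakly
% Pareto optimal, then penalize with weight \rho > 0:
\min_{\theta} \; f_0(\theta) + \rho\,\phi(\theta).
```

The theoretical contribution is then relating stationary points of the penalized single-level problem to solutions of the original constrained formulation.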

YNIMG Journal 2025 Journal Article

MR-guided graph learning of 18F-florbetapir PET enables accurate and interpretable Alzheimer’s disease staging

  • Xinyi Chen
  • Lijuan Chen
  • Weiheng Yao
  • Qiankun Zuo
  • Ye Li
  • Dong Liang
  • Shuqiang Wang
  • Meiyun Wang

PURPOSE: Subtle structural and molecular brain changes make noninvasive early detection and staging of Alzheimer's disease (AD) challenging and critical for effective intervention. This study develops a novel graph convolutional network (GCN) learning framework that integrates amyloid-β PET imaging and MRI structural features, aiming for improved early detection and accurate staging of AD. METHODS: The retrospective study utilized 18F-florbetapir PET scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) as the training dataset (323 scans from 196 subjects: 45 normal controls, 80 mild cognitive impairment/MCI, 71 AD) and two independent datasets for testing (99 scans from 85 subjects: 31 normal controls, 15 MCI, 44 AD). Individual brain graphs were constructed for each PET scan, and a graph learning framework was designed to extract molecular features from PET while integrating structural features from MRI. Performance was evaluated using receiver operating characteristic (ROC) analysis, comparing results against cortical SUVR. Additionally, a biomarker GCN_score was defined based on identified salient regions of interest, with its effectiveness assessed using the Kruskal-Wallis test and Cohen's effect size. RESULTS: The framework achieved AUCs of 89.8% (specificity 83.6%, sensitivity 81.6%) for distinguishing MCI from normal controls and 88.3% (specificity 81.6%, sensitivity 80.6%) for MCI from AD in the ADNI dataset, with comparable performance in external testing. All results significantly outperformed cortical SUVR (DeLong test p < 0.001). The GCN_score demonstrated superior group differentiation (Cohen's effect sizes 1.744 and 1.32) compared to cortical SUVR (0.309 and 0.641). CONCLUSION: The proposed graph-based learning framework effectively integrates PET and MRI features for accurate AD stage distinction, showing significant promise for early detection and facilitating timely intervention.
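Cohen's effect size, used above to compare the GCN_score against cortical SUVR, is a standard quantity and easy to compute; the group scores below are fabricated for illustration only:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference of group means divided by the pooled
    standard deviation (ddof=1 sample variances)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy biomarker scores for two groups (fabricated numbers):
controls = [0.10, 0.20, 0.15, 0.25, 0.18]
patients = [0.80, 0.90, 0.75, 0.85, 0.95]
d = cohens_d(patients, controls)   # well-separated groups -> large d
```

By the usual convention, |d| around 0.8 or above is considered a large effect, which is the sense in which effect sizes of 1.744 and 1.32 indicate strong group separation.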

IJCAI Conference 2025 Conference Paper

Neuromorphic Sequential Arena: A Benchmark for Neuromorphic Temporal Processing

  • Xinyi Chen
  • Chenxiang Ma
  • YuJie Wu
  • Kay Chen Tan
  • Jibin Wu

Temporal processing is vital for extracting meaningful information from time-varying signals. Recent advancements in Spiking Neural Networks (SNNs) have shown immense promise in efficiently processing these signals. However, progress in this field has been impeded by the lack of effective and standardized benchmarks, which complicates the consistent measurement of technological advancements and limits the practical applicability of SNNs. To bridge this gap, we introduce the Neuromorphic Sequential Arena (NSA), a comprehensive benchmark that offers an effective, versatile, and application-oriented evaluation framework for neuromorphic temporal processing. The NSA includes seven real-world temporal processing tasks from a diverse range of application scenarios, each capturing rich temporal dynamics across multiple timescales. Utilizing NSA, we conduct extensive comparisons of recently introduced spiking neuron models and neural architectures, presenting comprehensive baselines in terms of task performance, training speed, memory usage, and energy efficiency. Our findings emphasize an urgent need for efficient SNN designs that can consistently deliver high performance across tasks with varying temporal complexities while maintaining low computational costs. NSA enables systematic tracking of advancements in neuromorphic algorithm research and paves the way for developing effective and efficient neuromorphic temporal processing systems.
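The spiking neuron models benchmarked in NSA are variants of the leaky integrate-and-fire (LIF) neuron. A minimal discrete-time LIF sketch (the decay constant, threshold, and hard reset are illustrative choices):

```python
def lif_forward(inputs, tau=0.9, v_th=1.0):
    """Discrete-time leaky integrate-and-fire neuron: the membrane
    potential decays by `tau` per step, integrates the input, and
    emits a spike (with hard reset) when it crosses `v_th`."""
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x
        s = 1 if v >= v_th else 0
        if s:
            v = 0.0              # hard reset after a spike
        spikes.append(s)
    return spikes

# Constant sub-threshold input: the potential must accumulate over
# several steps before each spike -- the kind of multi-timescale
# temporal dynamics NSA's tasks are designed to exercise.
out = lif_forward([0.4] * 10)
```

A benchmark like NSA then varies the temporal structure of `inputs` across tasks and measures accuracy, speed, memory, and energy for different neuron models in this family.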

ICLR Conference 2025 Conference Paper

Toward Understanding In-context vs. In-weight Learning

  • Bryan Chan
  • Xinyi Chen
  • András György 0001
  • Dale Schuurmans

It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but this ability can also diminish upon further training. We provide a new theoretical understanding of these phenomena by identifying simplified distributional properties that give rise to the emergence and eventual disappearance of in-context learning. We do so by first analyzing a simplified model that uses a gating mechanism to choose between an in-weight and an in-context predictor. Through a combination of a generalization error and regret analysis we identify conditions where in-context and in-weight learning emerge. These theoretical findings are then corroborated experimentally by comparing the behaviour of a full transformer on the simplified distributions to that of the stylized model, demonstrating aligned results. We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
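The stylized model can be caricatured as a learned gate interpolating between a memorized (in-weight) predictor and a context-based (in-context) predictor. Everything below is an illustrative assumption, not the paper's exact construction:

```python
import numpy as np

def sigmoid_gate(score):
    return 1.0 / (1.0 + np.exp(-score))   # gate value in (0, 1)

def gated_prediction(query, context, table, gate_score):
    """Mix an in-weight lookup with an in-context nearest-neighbor copy."""
    in_weight = table.get(query, 0.0)          # "memorized in the weights"
    keys, labels = zip(*context)               # prompt examples (key, label)
    nearest = int(np.argmin([abs(k - query) for k in keys]))
    in_context = labels[nearest]               # copy the closest prompt label
    g = sigmoid_gate(gate_score)
    return g * in_context + (1 - g) * in_weight

table = {2: 10.0}                              # in-weight knowledge
context = [(1, -1.0), (2, 99.0)]               # prompt contradicts the weights
pred_icl = gated_prediction(2, context, table, gate_score=+8.0)  # gate ~ 1
pred_iwl = gated_prediction(2, context, table, gate_score=-8.0)  # gate ~ 0
```

In this caricature, "emergence" and "disappearance" of in-context learning correspond to the training distribution pushing the gate toward 1 or toward 0, which is the trade-off the paper's generalization and regret analysis makes precise.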

NeurIPS Conference 2024 Conference Paper

Preference Learning Algorithms Do Not Learn Preference Rankings

  • Angelica Chen
  • Sadhika Malladi
  • Lily H. Zhang
  • Xinyi Chen
  • Qiuyi Zhang
  • Rajesh Ranganath
  • Kyunghyun Cho

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via ranking accuracy. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the idealized ranking accuracy that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant alignment gap -- i.e., a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to correct even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.
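The ranking-accuracy metric is easy to state concretely: the fraction of preference pairs on which the model assigns higher log-likelihood to the chosen response than to the rejected one. A minimal sketch over toy log-probabilities (the numbers are fabricated for illustration):

```python
def ranking_accuracy(pairs):
    """pairs: list of (logp_chosen, logp_rejected) per preference pair.
    Returns the fraction where the preferred output is ranked higher."""
    wins = sum(1 for lc, lr in pairs if lc > lr)
    return wins / len(pairs)

# Toy log-likelihoods for four preference pairs (chosen, rejected);
# the second pair is a mild ranking error.
pairs = [(-5.0, -7.0), (-3.2, -3.1), (-10.0, -12.5), (-4.0, -4.5)]
acc = ranking_accuracy(pairs)   # 3 of 4 pairs ranked correctly
```

The paper's surprising finding is that this quantity stays below 60% for many preference-tuned models, well short of the idealized accuracy a perfect optimizer of the DPO/RLHF objective would attain.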

NeurIPS Conference 2023 Conference Paper

Online Control for Meta-optimization

  • Xinyi Chen
  • Elad Hazan

Choosing the optimal hyperparameters, including learning rate and momentum, for specific optimization instances is a significant yet non-convex challenge. This makes conventional iterative techniques such as hypergradient descent (Baydin et al., 2017) insufficient in obtaining global optimality guarantees. We consider the more general task of meta-optimization -- online learning of the best optimization algorithm given problem instances -- and introduce a novel approach based on control theory. We show how meta-optimization can be formulated as an optimal control problem, departing from existing literature that uses stability-based methods to study optimization. Our approach leverages convex relaxation techniques in the recently-proposed nonstochastic control framework to overcome the challenge of nonconvexity, and obtains regret guarantees vs. the best offline solution. This guarantees that in meta-optimization, we can learn a method that attains convergence comparable to that of the best optimization method in hindsight from a class of methods.

NeurIPS Conference 2023 Conference Paper

Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

  • Vladimir Feinberg
  • Xinyi Chen
  • Y. Jennifer Sun
  • Rohan Anil
  • Elad Hazan

Adaptive regularization methods that exploit more than the diagonal entries exhibit state of the art performance for many tasks, but can be prohibitive in terms of memory and running time. We find the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-rank sketching approach. We describe a generic method for reducing memory and compute requirements of maintaining a matrix preconditioner using the Frequent Directions (FD) sketch. While previous approaches have explored applying FD for second-order optimization, we present a novel analysis which allows efficient interpolation between resource requirements and the degradation in regret guarantees with rank $k$: in the online convex optimization (OCO) setting over dimension $d$, we match full-matrix $d^2$ memory regret using only $dk$ memory up to additive error in the bottom $d-k$ eigenvalues of the gradient covariance. Further, we show extensions of our work to Shampoo, resulting in a method competitive in quality with Shampoo and Adam, yet requiring only sub-linear memory for tracking second moments.
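The Frequent Directions sketch at the core of this method maintains a small matrix B whose Gram matrix approximates the streamed covariance. Below is the standard textbook FD update (Liberty's deterministic matrix sketching), not Sketchy's full preconditioner; sizes are illustrative:

```python
import numpy as np

def frequent_directions(rows, k):
    """Stream rows of an (n x d) matrix A; maintain a (2k x d) sketch B
    with spectral-norm guarantee ||A^T A - B^T B|| <= ||A||_F^2 / k."""
    d = len(rows[0])
    B = np.zeros((2 * k, d))
    for a in rows:
        zero = np.where(~B.any(axis=1))[0]     # indices of empty rows
        if len(zero) == 0:
            # Sketch is full: rotate to singular directions and shrink
            # every squared singular value by the k-th largest one,
            # zeroing out at least half the rows.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s**2 - s[k - 1]**2, 0.0)
            B = np.sqrt(s2)[:, None] * Vt
            zero = np.where(~B.any(axis=1))[0]
        B[zero[0]] = a                          # insert the new row
    return B

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8))
B = frequent_directions(list(A), k=4)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)      # spectral-norm error
bound = np.linalg.norm(A, "fro") ** 2 / 4       # FD guarantee with k = 4
```

The memory win described in the abstract comes from storing the 2k x d sketch (O(dk)) instead of the full d x d second-moment matrix (O(d^2)).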

ICML Conference 2020 Conference Paper

Calibration, Entropy Rates, and Memory in Language Models

  • Mark Braverman
  • Xinyi Chen
  • Sham M. Kakade
  • Karthik Narasimhan
  • Cyril Zhang
  • Yi Zhang

Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
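The entropy-rate measurement can be sketched as: estimate, per generation step t, the entropy of the model's next-token distribution and check whether it drifts upward with t. A toy version over hand-made distributions (the distributions are fabricated for illustration; the paper measures real LSTMs and Transformers):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a categorical distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical next-token distributions at increasing positions t,
# mimicking a model whose generations get progressively less certain:
dists = [
    [0.90, 0.05, 0.05],    # t = 0: confident
    [0.70, 0.20, 0.10],    # t = 1
    [0.50, 0.30, 0.20],    # t = 2
    [0.40, 0.35, 0.25],    # t = 3: near-uniform
]
rates = [entropy(p) for p in dists]
drift = rates[-1] - rates[0]   # > 0: entropy rate drifts upward over time
```

A calibrated model's entropy rate would stay close to that of the true text distribution; the upward drift shown here is the miscalibration signature the paper detects and then provably corrects.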

NeurIPS Conference 2020 Conference Paper

Online Agnostic Boosting via Regret Minimization

  • Nataly Brukhim
  • Xinyi Chen
  • Elad Hazan
  • Shay Moran

Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules. While in statistical learning numerous boosting methods exist both in the realizable and agnostic settings, in online learning they exist only in the realizable case. In this work we provide the first agnostic online boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret. Our algorithm is based on an abstract (and simple) reduction to online convex optimization, which efficiently converts an arbitrary online convex optimizer to an online booster. Moreover, this reduction extends to the statistical as well as the online realizable settings, thus unifying the 4 cases of statistical/online and agnostic/realizable boosting.

NeurIPS Conference 2018 Conference Paper

Online Learning of Quantum States

  • Scott Aaronson
  • Xinyi Chen
  • Elad Hazan
  • Satyen Kale
  • Ashwin Nayak

Suppose we have many copies of an unknown n-qubit state $\rho$. We measure some copies of $\rho$ using a known two-outcome measurement $E_1$, then other copies using a measurement $E_2$, and so on. At each stage t, we generate a current hypothesis $\omega_t$ about the state $\rho$, using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that $|\mathrm{Tr}(E_i \omega_t) - \mathrm{Tr}(E_i \rho)|$, the error in our prediction for the next measurement, is at least $\varepsilon$ at most $O(n/\varepsilon^2)$ times. Even in the non-realizable setting---where there could be arbitrary noise in the measurement outcomes---we show how to output hypothesis states that incur at most $O(\sqrt{Tn})$ excess loss over the best possible state on the first $T$ measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states, to the online and regret-minimization settings. We give three different ways to prove our results---using convex optimization, quantum postselection, and sequential fat-shattering dimension---which have different advantages in terms of parameters and portability.