Author name cluster

Can Cui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

AAAI Conference 2026 Conference Paper

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Yihao Wang
Pengxiang Ding
Lingxiao Li
Can Cui
Zirui Ge
Xinyang Tong
Wenxuan Song
Han Zhao

Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs significant training costs. In this paper, we investigate how to effectively bridge vision-language (VL) representations to action (A). We introduce VLA-Adapter, a novel paradigm designed to reduce the reliance of VLA models on large-scale VLMs and extensive pre-training. To this end, we first systematically analyze the effectiveness of various VL conditions and present key findings on which conditions are essential for bridging perception and action spaces. Based on these insights, we propose a lightweight Policy module with Bridge Attention, which autonomously injects the optimal condition into the action space. In this way, our method achieves high performance using only a 0.5B-parameter backbone, without any robotic data pre-training. Extensive experiments on both simulated and real-world robotic benchmarks show that VLA-Adapter not only achieves state-of-the-art level performance, but also offers the fast inference speed reported to date. Furthermore, thanks to the proposed advanced bridging paradigm, VLA-Adapter enables the training of a powerful VLA model on a single consumer-grade GPU, greatly lowering the barrier to deploying VLA model.

PDF Details DOI

ICLR Conference 2022 Conference Paper

Learned Simulators for Turbulence

Kimberly L. Stachenfeld
Drummond Buschman Fielding
Dmitrii Kochkov
Miles D. Cranmer
Tobias Pfaff
Jonathan Godwin
Can Cui
Shirley Ho

Turbulence simulation with classical numerical solvers requires high-resolution grids to accurately resolve dynamics. Here we train learned simulators at low spatial and temporal resolutions to capture turbulent dynamics generated at high resolution. We show that our proposed model can simulate turbulent dynamics more accurately than classical numerical solvers at the comparably low resolutions across various scientifically relevant metrics. Our model is trained end-to-end from data and is capable of learning a range of challenging chaotic and turbulent dynamics at low resolution, including trajectories generated by the state-of-the-art Athena++ engine. We show that our simpler, general-purpose architecture outperforms various more specialized, turbulence-specific architectures from the learned turbulence simulation literature. In general, we see that learned simulators yield unstable trajectories; however, we show that tuning training noise and temporal downsampling solves this problem. We also find that while generalization beyond the training distribution is a challenge for learned models, training noise, added loss constraints, and dataset augmentation can help. Broadly, we conclude that our learned simulator outperforms traditional solvers run on coarser grids, and emphasize that simple design choices can offer stability and robust generalization.

Details

AAAI Conference 2021 Conference Paper

Adaptive Knowledge Driven Regularization for Deep Neural Networks

Zhaojing Luo
Shaofeng Cai
Can Cui
Beng Chin Ooi
Yang Yang

In many real-world applications, the amount of data available for training is often limited, and thus inductive bias and auxiliary knowledge are much needed for regularizing model training. One popular regularization method is to impose prior distribution assumptions on model parameters, and many recent works also attempt to regularize training by integrating external knowledge into specific neurons. However, existing regularization methods fail to take account of the interaction between connected neuron pairs, which is invaluable internal knowledge for adaptive regularization for better representation learning as training progresses. In this paper, we explicitly take into account the interaction between connected neurons, and propose an adaptive internal knowledge driven regularization method, CORR-Reg. The key idea of CORR-Reg is to give a higher significance weight to connections of more correlated neuron pairs. The significance weights adaptively identify more important input neurons for each neuron. Instead of regularizing connection model parameters with a static strength such as weight decay, CORR- Reg imposes weaker regularization strength on more significant connections. As a consequence, neurons attend to more informative input features and thus learn more diversified and discriminative representation. We derive CORR-Reg with the Bayesian inference framework and propose a novel optimization algorithm with the Lagrange multiplier method and Stochastic Gradient Descent. Extensive evaluations on diverse benchmark datasets and neural network structures show that CORR-Reg achieves significant improvement over stateof-the-art regularization methods.

PDF Details

ICRA Conference 2021 Conference Paper

Asynchronous Multi-View SLAM

Anqi Joyce Yang
Can Cui
Ioan Andrei Bârsan
Raquel Urtasun
Shenlong Wang

Existing multi-camera SLAM systems assume synchronized shutters for all cameras, which is often not the case in practice. In this work, we propose a generalized multi-camera SLAM formulation which accounts for asynchronous sensor observations. Our framework integrates a continuous-time motion model to relate information across asynchronous multi-frames during tracking, local mapping, and loop closing. For evaluation, we collected AMV-Bench, a challenging new SLAM dataset covering 482 km of driving recorded using our asynchronous multi-camera robotic platform. AMV-Bench is over an order of magnitude larger than previous multi-view HD outdoor SLAM datasets, and covers diverse and challenging motions and environments. Our experiments emphasize the necessity of asynchronous sensor modeling, and show that the use of multiple cameras is critical towards robust and accurate SLAM in challenging outdoor scenes. The supplementary material is located at: https://www.cs.toronto.edu/~ajyang/amv-slam

Details