Arrow Research search

Author name cluster

Xiangyu Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Beyond Tokens: Dynamic Latent Reasoning via Semantic Residual Refinement

  • Fangrui Lv
  • Lei Wang
  • Ruixin Hong
  • Yong Du
  • Xiangyu Wu
  • Tingting Gao
  • Guorui Zhou
  • Changshui Zhang

Chain-of-Thought prompting has remarkably advanced LLM reasoning by generating explicit step-by-step tokens, yet its discrete nature inherently limits expressiveness and efficiency, struggling with abstract, ambiguous, or semantically divergent cognition beyond linguistic tokens. Latent reasoning offers a promising alternative by operating in the model’s internal continuous space for richer cognitive representations. However, existing methods typically rely on finetuning or token interpolation to bridge latent and input spaces, introducing training difficulty or semantic degradation. To address this, we propose Dynamic Latent Reasoning (DyLaR), a training-free framework that preserves semantic fidelity in the latent space. DyLaR introduces a Semantic Residual Refinement module that progressively refines latent inputs by integrating semantic residuals from prior hidden states, thus capturing expressive semantic hierarchies that closely approximate continuous latent representations. To enhance flexibility, DyLaR further incorporates a dynamic switching policy that allows LLMs to alternate between discrete and latent reasoning based on model uncertainty, favoring explicit reasoning when confident and latent exploration under ambiguity. Empirical experiments across knowledge- and reasoning-intensive tasks demonstrate that DyLaR consistently outperforms strong baselines in both effectiveness and token efficiency. Qualitative analyses further illustrate its interpretability and flexibility in navigating complex reasoning scenarios.
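The uncertainty-based switching described in the abstract can be illustrated with a minimal sketch: compute the entropy of the next-token distribution and switch between discrete and latent reasoning depending on a threshold. The `threshold` value and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (nats) of a next-token probability distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def choose_mode(probs, threshold=1.0):
    """Illustrative switching rule: favor explicit (discrete) reasoning
    when the model is confident (low entropy), and latent exploration
    when the distribution is ambiguous (high entropy)."""
    return "discrete" if token_entropy(probs) < threshold else "latent"
```

A peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` yields low entropy and selects discrete reasoning, while a near-uniform distribution selects latent reasoning.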

AAAI Conference 2026 Conference Paper

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

  • Jiaqi Tang
  • Jianmin Chen
  • Wei Wei
  • Xiaogang Xu
  • Runtao Liu
  • Xiangyu Wu
  • Qipeng Xie
  • Jiafei Wu

Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Existing robust MLLMs predominantly rely on implicit training/adaptation that focuses solely on visual encoder generalization, suffering from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To facilitate this approach, we introduce a specialized 11K dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, each annotated with structured chains connecting degradation parameters, perceptual influence, pristine semantic reasoning chain, and conclusion. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMMB, MMStar, and RealWorldQA.

ICLR Conference 2025 Conference Paper

Multi-Label Test-Time Adaptation with Bound Entropy Minimization

  • Xiangyu Wu
  • Feng Yu 0030
  • Yang Yang 0128
  • Qing-Guo Chen
  • Jianfeng Lu 0003

Mainstream test-time adaptation (TTA) techniques endeavor to mitigate distribution shifts via entropy minimization for multi-class classification, inherently increasing the probability of the most confident class. However, when encountering multi-label instances, the primary challenge stems from the varying number of labels per image, and prioritizing only the highest-probability class inevitably undermines the adaptation of other positive labels. To address this issue, we investigate TTA within the multi-label scenario (ML-TTA), developing a Bound Entropy Minimization (BEM) objective to simultaneously increase the confidence of multiple top predicted labels. Specifically, to determine the number of labels for each augmented view, we retrieve a paired caption with yielded textual labels for that view. These labels are allocated to both the view and caption, called the weak label set and strong label set with the same size k. Following this, the proposed BEM considers the highest top-k predicted labels from view and caption as a single entity, respectively, learning both view and caption prompts concurrently. By binding top-k predicted labels, BEM overcomes the limitation of vanilla entropy minimization, which exclusively optimizes the most confident class. Across the MSCOCO, VOC, and NUSWIDE multi-label datasets, our ML-TTA framework equipped with BEM exhibits superior performance compared to the latest SOTA methods, across various model architectures, prompt initializations, and varying label scenarios. The code is available at https://github.com/Jinx630/ML-TTA.
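The core idea of binding the top-k labels into a single entity can be sketched as follows. This is a reading of the abstract, not the paper's exact objective: the top-k probability masses are summed into one "bound" mass before entropy is computed, so minimizing the entropy raises all k labels jointly rather than only the top-1.

```python
import numpy as np

def bound_entropy(probs, k):
    """Illustrative bound-entropy sketch: treat the top-k predicted
    labels as a single entity by summing their probabilities, then
    compute entropy over {bound mass, remaining classes}."""
    p = np.sort(np.asarray(probs, dtype=float))[::-1]  # descending
    bound = p[:k].sum()                                # top-k as one entity
    masses = np.concatenate([[bound], p[k:]])
    masses = masses[masses > 0]
    return float(-(masses * np.log(masses)).sum())
```

With k = 1 this reduces to vanilla entropy; with the probability mass split across the top-k labels the bound entropy is already low, so its gradient does not force a single winner.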

IJCAI Conference 2024 Conference Paper

TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

  • Xiangyu Wu
  • Qing-Yuan Jiang
  • Yang Yang
  • Yi-Feng Wu
  • Qing-Guo Chen
  • Jianfeng Lu

The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i. e. , either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities. Experimental results on VOC2007, MS-COCO, and NUSWIDE datasets demonstrate that our method can surpass state-of-the-art (SOTA) methods across various settings for multi-label image classification tasks. The code is available at https: //github. com/njustkmg/PVP.

ICRA Conference 2023 Conference Paper

Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

  • Dingqi Zhang
  • Antonio Loquercio
  • Xiangyu Wu
  • Ashish Kumar 0007
  • Jitendra Malik
  • Mark W. Mueller

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size and motor constants, and also shows rapid adaptation to unknown disturbances during runtime. The core algorithmic idea is to learn a single policy that can adapt online at test time not only to the disturbances applied to the drone, but also to the robot dynamics and hardware in the same framework. We achieve this by training a neural network to estimate a latent representation of the robot and environment parameters, which is used to condition the behaviour of the controller, also represented as a neural network. We train both networks exclusively in simulation with the goal of flying the quadcopters to goal positions and avoiding crashes to the ground. We directly deploy the same controller trained in the simulation without any modifications on two quadcopters in the real world with differences in mass, size, motors, and propellers with mass differing by 4.5 times. In addition, we show rapid adaptation to sudden and large disturbances up to one-third of the mass of the quadcopters. We perform an extensive evaluation in both simulation and the physical world, where we outperform a state-of-the-art learning-based adaptive controller and a traditional PID controller specifically tuned to each platform individually. Video results can be found at https://youtu.be/U-c-LbTfvoA.

IROS Conference 2021 Conference Paper

Real-time Geo-localization Using Satellite Imagery and Topography for Unmanned Aerial Vehicles

  • Shuxiao Chen
  • Xiangyu Wu
  • Mark W. Mueller
  • Koushil Sreenath

The capabilities of autonomous flight with unmanned aerial vehicles (UAVs) have significantly increased in recent years. However, basic problems such as fast and robust geo-localization in GPS-denied environments still remain unsolved. Existing research has primarily concentrated on improving the accuracy of localization at the cost of long and varying computation time in various situations, which often necessitates the use of powerful ground station machines. In order to make image-based geo-localization online and pragmatic for lightweight embedded systems on UAVs, we propose a framework that is reliable in changing scenes, flexible about computing resource allocation and adaptable to common camera placements. The framework comprises two stages: offline database preparation and online inference. At the first stage, color images and depth maps are rendered as seen from potential vehicle poses quantized over the satellite and topography maps of anticipated flying areas. A database is then populated with the global and local descriptors of the rendered images. At the second stage, for each captured real-world query image, top global matches are retrieved from the database and the vehicle pose is further refined via local descriptor matching. We present field experiments of image-based localization on two different UAV platforms to validate our results.

IROS Conference 2020 Conference Paper

A collision-resilient aerial vehicle with icosahedron tensegrity structure

  • Jiaming Zha
  • Xiangyu Wu
  • Joseph Kroeger
  • Natalia Perez
  • Mark W. Mueller

Aerial vehicles with collision resilience can operate with more confidence in environments with obstacles that are hard to detect and avoid. This paper presents the methodology used to design a collision-resilient aerial vehicle with an icosahedron tensegrity structure. A simplified stress analysis of the tensegrity frame under impact forces is performed to guide the selection of its components. In addition, an autonomous controller is presented to reorient the vehicle from an arbitrary orientation on the ground to help it take off. Experiments show that the vehicle can successfully reorient itself after landing upside-down and can survive collisions at speeds up to 6.5 m/s.

IROS Conference 2020 Conference Paper

In-flight range optimization of multicopters using multivariable extremum seeking with adaptive step size

  • Xiangyu Wu
  • Mark W. Mueller

Limited flight range is a common problem for multicopters. To alleviate this problem, we propose a method for finding the optimal speed and heading of a multicopter when flying a given path to achieve the longest flight range. Based on a novel multivariable extremum seeking controller with adaptive step size, the method (a) does not require any power consumption model of the vehicle, (b) can adapt to unknown disturbances, (c) can be executed online, and (d) converges faster than the standard extremum seeking controller with constant step size. We conducted indoor experiments to validate the effectiveness of this method under different payloads and initial conditions, and showed that it is able to converge more than 30% faster than the standard extremum seeking controller. This method is especially useful for applications such as package delivery, where the size and weight of the payload differ for different deliveries and the power consumption of the vehicle is hard to model.
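The model-free optimization described here can be illustrated with a simple extremum-seeking sketch: probe the cost (e.g. energy per distance) around the current speed, estimate the local slope, and descend. The step-size adaptation below (halving on a slope sign flip) is a crude stand-in assumption for the paper's adaptive rule, not its actual controller.

```python
import numpy as np

def extremum_seek(cost, x0, a=0.2, step0=0.5, iters=60):
    """Gradient-free extremum seeking sketch: measure the cost at
    x +/- a, estimate the local slope from the two probes, and step
    downhill. The step size shrinks when the slope changes sign,
    i.e. when the search starts oscillating around the optimum."""
    x, step, prev_slope = float(x0), step0, None
    for _ in range(iters):
        slope = (cost(x + a) - cost(x - a)) / (2 * a)
        if prev_slope is not None and slope * prev_slope < 0:
            step *= 0.5  # illustrative adaptive step-size rule
        x -= step * slope
        prev_slope = slope
    return x
```

No power-consumption model is required: only the measured cost at the probed operating points, which is what makes the approach attractive when payloads change between flights.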

ICRA Conference 2020 Conference Paper

Using multiple short hops for multicopter navigation with only inertial sensors

  • Xiangyu Wu
  • Mark W. Mueller

In certain challenging environments, such as inside buildings on fire, the main sensors (e.g. cameras, LiDARs and GPS systems) used for multicopter localization can become unavailable. Direct integration of the inertial navigation sensors (the accelerometer and rate gyroscope) is unaffected by external disturbances, but the rapid error accumulation makes a naive application of such a strategy feasible only for very short durations. In this work we propose a motion strategy for reducing the inertial navigation state estimation error of multicopters. The proposed strategy breaks a long-duration flight into multiple short-duration hops between which the vehicle remains stationary on the ground. When the vehicle is stationary, zero-velocity pseudo-measurements are introduced to an extended Kalman filter to reduce the state estimation error. We perform experiments for closed-loop control of a multicopter for evaluation. The mean absolute position estimation error was 3.4% over a total flight distance of 5 m in the experiments, an 80% reduction compared to the standard inertial navigation method without this strategy. An additional experiment with a total flight distance of 10 m demonstrates the ability of this method to navigate a multicopter in a real-world environment. The final trajectory tracking error was 3% of the total flight distance.
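The zero-velocity pseudo-measurement step can be sketched for a toy one-dimensional `[position, velocity]` state. This is an illustrative EKF measurement update under assumed state layout and noise values, not the paper's filter: while the vehicle sits on the ground, a velocity measurement of exactly zero is fed to the filter, collapsing the accumulated velocity uncertainty.

```python
import numpy as np

def zupt_update(x, P, r=1e-4):
    """Zero-velocity pseudo-measurement update for a [pos, vel] state.
    Observing velocity = 0 while stationary shrinks the velocity
    error and, through the cross-covariance, the position error too."""
    H = np.array([[0.0, 1.0]])   # measurement matrix: observe velocity only
    S = H @ P @ H.T + r          # innovation covariance (1x1)
    K = P @ H.T / S              # Kalman gain (2x1)
    x = x + (K * (0.0 - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

After one update, the estimated velocity is pulled to near zero and its variance drops by orders of magnitude, which is what makes the hop-and-rest motion strategy effective against drift.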

ICRA Conference 2019 Conference Paper

Model-free Online Motion Adaptation for Optimal Range and Endurance of Multicopters

  • Andrea Tagliabue
  • Xiangyu Wu
  • Mark W. Mueller

In this work we introduce an approach that allows a quadcopter to find the velocity which maximizes its flight time (endurance) or flight distance (range) while moving along a given path, using on-board power measurement. The proposed strategy is based on Extremum Seeking control and (a) does not require any model of the power consumption of the system, (b) can be executed on-line, and (c) guarantees adaptation to unknown disturbances. We show experimentally that hovering is not the most energy-efficient loitering strategy, and we demonstrate the proposed method's ability to adapt to different aerodynamic disturbances, such as payloads. The method may be especially useful in applications where a quadcopter carries an unknown payload, allowing it to adapt for improved range.