Arrow Research search

Author name cluster

Wanting Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches and is not a full author-disambiguation profile.

4 papers
1 author row

Possible papers

4

AAAI Conference 2026 Conference Paper

Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction

  • Xudong Cai
  • Shuo Wang
  • Peng Wang
  • Yongcai Wang
  • Zhaoxin Fan
  • Wanting Li
  • Tianbao Zhang
  • Jianrong Tao

Reconstructing dense geometry for dynamic scenes from a monocular video is a critical yet challenging task. Recent memory-based methods enable efficient online reconstruction, but they fundamentally suffer from a Memory Demand Dilemma: the memory representation faces an inherent conflict between the long-term stability required for static structures and the rapid, high-fidelity detail retention needed for dynamic motion. This conflict forces existing methods into a compromise, leading to either geometric drift in static structures or blurred, inaccurate reconstructions of dynamic objects. To address this dilemma, we propose Mem4D, a novel framework that decouples the modeling of static geometry from dynamic motion. Guided by this decoupling, we design a dual-memory architecture: 1) the Transient Dynamics Memory (TDM) focuses on capturing high-frequency motion details from recent frames, enabling accurate and fine-grained modeling of dynamic content; 2) the Persistent Structure Memory (PSM) compresses and preserves long-term spatial information, ensuring global consistency and drift-free reconstruction for static elements. By alternating queries to these specialized memories, Mem4D simultaneously maintains static geometry with global consistency and reconstructs dynamic elements with high fidelity. Experiments on challenging benchmarks demonstrate that our method achieves state-of-the-art or competitive performance while maintaining high efficiency.
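The dual-memory idea in the abstract can be illustrated with a minimal sketch: a short buffer for recent-frame detail and a compressed long-term summary, queried alternately. All class names, capacities, and update rules below are illustrative assumptions, not the paper's actual architecture.

```python
from collections import deque

class DualMemory:
    """Toy dual-memory scheme in the spirit of Mem4D's TDM/PSM split."""

    def __init__(self, tdm_capacity=4):
        self.tdm = deque(maxlen=tdm_capacity)  # transient dynamics memory: recent frames
        self.psm = None                        # persistent structure memory: long-term summary
        self.count = 0

    def update(self, frame_feat):
        self.tdm.append(frame_feat)
        # PSM keeps a running mean as a crude stand-in for feature compression.
        self.count += 1
        if self.psm is None:
            self.psm = list(frame_feat)
        else:
            self.psm = [p + (f - p) / self.count
                        for p, f in zip(self.psm, frame_feat)]

    def query(self, dynamic):
        # Alternating queries: dynamic content reads the recent buffer,
        # static content reads the compressed long-term summary.
        if dynamic:
            return list(self.tdm)[-1] if self.tdm else None
        return self.psm
```

The point of the split is that evicting old frames from the TDM never erases the PSM's accumulated static structure, so fast motion detail and long-term stability do not compete for the same slots.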

AAAI Conference 2026 Conference Paper

MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming

  • Shuo Wang
  • Yongcai Wang
  • Zhaoxin Fan
  • Yucheng Wang
  • Maiyue Chen
  • Kaihui Wang
  • Zhizhong Su
  • Wanting Li

Vision-Language Navigation (VLN) tasks often leverage panoramic RGB and depth inputs to provide rich spatial cues for action planning, but these sensors can be costly or less accessible in real-world deployments. Recent approaches based on Vision-Language Action (VLA) models achieve strong results with monocular input, yet they still lag behind methods using panoramic RGB-D information. We present MonoDream, a lightweight VLA framework that enables monocular agents to learn a Unified Navigation Representation (UNR). This shared feature representation jointly aligns navigation-relevant visual semantics (e.g., global layout, depth, and future cues) and language-grounded action intent, enabling more reliable action prediction. MonoDream further introduces Latent Panoramic Dreaming (LPD) tasks to supervise the UNR, which train the model to predict latent features of panoramic RGB and depth observations at both current and future steps based on only monocular input. Experiments on multiple VLN benchmarks show that MonoDream consistently improves monocular navigation performance and significantly narrows the gap with panoramic-based agents.
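The auxiliary-supervision idea described above can be sketched as a combined objective: the main action-prediction loss plus terms that push monocular features to predict latent panoramic RGB and depth targets. The function names, loss forms, and weighting below are assumptions for illustration, not MonoDream's actual training objective.

```python
def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def unified_representation_loss(action_loss, pano_pred, pano_target,
                                depth_pred, depth_target, aux_weight=0.5):
    """Hypothetical MonoDream-style objective: action prediction plus
    Latent Panoramic Dreaming auxiliary terms on panoramic RGB and depth
    latents, predicted from monocular input only."""
    aux = mse(pano_pred, pano_target) + mse(depth_pred, depth_target)
    return action_loss + aux_weight * aux
```

The auxiliary targets are only needed at training time; at deployment the agent still consumes monocular input alone, which is what lets the approach avoid panoramic RGB-D sensors.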

NeurIPS Conference 2025 Conference Paper

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation

  • Shuo Wang
  • Yongcai Wang
  • Wanting Li
  • Xudong Cai
  • Yucheng Wang
  • Maiyue Chen
  • Zhizhong Su
  • Deying Li

Vision-Language Navigation is a critical task for developing embodied agents that can follow natural language instructions to navigate in complex real-world environments. Recent advances by finetuning large pretrained models have significantly improved generalization and instruction grounding compared to traditional approaches. However, the role of reasoning strategies in navigation—an action-centric, long-horizon task—remains underexplored, despite Chain-of-Thought reasoning's demonstrated success in static tasks like question answering and visual reasoning. To address this gap, we conduct the first systematic evaluation of reasoning strategies for VLN, including No-Think (direct action prediction), Pre-Think (reason before action), and Post-Think (reason after action). Surprisingly, our findings reveal the Inference-time Reasoning Collapse issue, where inference-time reasoning degrades navigation accuracy, highlighting the challenges of integrating reasoning into VLN. Based on this insight, we propose Aux-Think, a framework that trains models to internalize structured reasoning patterns through CoT supervision during training, while preserving No-Think inference for efficient action prediction. To support this framework, we release R2R-CoT-320k, a large-scale Chain-of-Thought annotated dataset. Empirically, Aux-Think significantly reduces training effort without compromising performance.
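The train/inference asymmetry described above can be sketched in a few lines: the chain-of-thought appears in the supervision target only at training time, while inference stays No-Think. The tag names and function below are illustrative assumptions, not the paper's actual data format.

```python
def format_target(reasoning, action, train=True):
    """Hypothetical Aux-Think-style target formatting: CoT is supervised
    during training, but at inference the model decodes the action
    directly, avoiding the latency and error compounding of
    inference-time reasoning."""
    if train:
        return f"<think>{reasoning}</think><act>{action}</act>"
    return f"<act>{action}</act>"
```

Because the reasoning tokens are dropped at inference, the model pays the cost of structured reasoning once, during training, rather than on every navigation step.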

TCS Journal 2023 Journal Article

A robust map matching method by considering memorized multiple matching candidates

  • Wanting Li
  • Yongcai Wang
  • Deying Li
  • Xiaojia Xu

Map matching tracks the positions of vehicles on the road network based on positions reported by GPS (Global Positioning System) devices. Balancing localization accuracy and computational efficiency is a key problem in map matching. Existing methods mainly use a Hidden Markov Model (HMM) or historical transportation data to learn the transitional probabilities among road segments. Although the Markov assumption remarkably reduces the number of roads to explore, miss-of-match and matching breaks may occur when the GPS data is highly noisy, and the transitional model must be learned offline. To address these problems, this paper presents Multiple Candidate Matching (MCM) to improve the robustness of map matching. MCM requires neither a pre-trained transitional model nor historical transportation information. MCM memorizes multiple historical matching candidates during the map matching process. It votes among historical and current matchings, but generates only a limited number of road candidates in real time to bound the computational complexity. Variants of MCM for both online and offline map matching are presented, and their properties are analyzed theoretically and experimentally. Numerical experiments on large-scale datasets show that MCM is very promising in terms of accuracy, computational efficiency, and robustness. The matching-break and miss-of-match problems are resolved effectively compared with state-of-the-art map matching methods. Code is open-sourced at https://github.com/lindalee-inlab/MCM.
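The memorized-candidate voting described in the abstract can be sketched as follows: for each GPS fix, keep the k nearest road segments as candidates, then vote over a sliding window of recent candidate sets. This toy version represents each segment by a single reference point and uses rank-weighted votes; both are simplifying assumptions, not MCM's actual geometry or scoring.

```python
from collections import Counter, deque

def match_with_memory(gps_points, segments, k=2, memory=3):
    """Toy multiple-candidate map matching in the spirit of MCM.

    gps_points: list of (x, y) GPS fixes.
    segments: {name: (x, y)} reference points for road segments
              (an illustrative simplification of road geometry).
    Returns one matched segment name per GPS fix.
    """
    history = deque(maxlen=memory)  # memorized candidate sets
    matched = []
    for px, py in gps_points:
        # Rank segments by squared distance to the current fix.
        ranked = sorted(
            segments,
            key=lambda s: (segments[s][0] - px) ** 2 + (segments[s][1] - py) ** 2,
        )
        history.append(ranked[:k])  # memorize top-k candidates, not just the best
        # Vote across the memory window; closer-ranked candidates weigh more.
        votes = Counter()
        for cands in history:
            for rank, seg in enumerate(cands):
                votes[seg] += k - rank
        matched.append(votes.most_common(1)[0][0])
    return matched
```

Because a single noisy fix contributes only one vote set out of the window, an isolated GPS jump cannot flip the match on its own; the match switches only once several consecutive fixes agree, which is the intuition behind avoiding matching breaks.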