Author name cluster

David Chan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu
Heekyung Lee
Jiaxin Ge
Joseph Gonzalez
Trevor Darrell
David Chan

Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications. Existing hallucination mitigation methods typically follow one of two paradigms: generation adjustment, which modifies decoding behavior to align text with visual inputs, and post-hoc verification, where external models assess and correct outputs. While effective, generation adjustment methods often rely on heuristics and lack correction mechanisms, while post-hoc verification is complicated, typically requiring multiple models and tending to reject outputs rather than refine them. In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification. By leveraging a new hallucination-verification dataset containing over 1.3M semi-synthetic samples, along with a novel inference-time retrospective resampling technique, our approach enables VLMs to both detect hallucinations during generation and dynamically revise those hallucinations. Our evaluations show that REVERSE achieves state-of-the-art hallucination reduction, outperforming the best existing methods by up to 12% on CHAIR-MSCOCO and 34% on HaloQuest.


NeurIPS Conference 2025 Conference Paper

LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

Jerome Quenum
Wen-Han Hsieh
Tsung-Han (Patrick) Wu
Ritwik Gupta
Trevor Darrell
David Chan

Segmentation models can recognize a pre-defined set of objects in images. However, segmentation models capable of "reasoning" over complex user queries that implicitly refer to multiple objects of interest remain underexplored, especially in the geospatial domain. Recent advances in "reasoning segmentation" (generating segmentation masks from complex, implicit query text) demonstrate the potential of vision-language models (VLMs) to reason across an open domain of objects. Yet, our experiments reveal that these models struggle when applied to the unique challenges of remote-sensing imagery. To address this gap, we introduce a new dataset which consists of: GRES, a curated geospatial reasoning-segmentation dataset with 27,615 annotations across 9,205 images, and PreGRES, a collection of existing datasets to make up a large-scale multimodal pretraining corpus with over 1M question-answer pairs across 119,279 images. We propose an initial benchmark model, LISAt, a VLM for geospatial analysis that can describe complex remote-sensing scenes, answer detailed queries, and segment objects based on natural-language prompts. LISAt establishes a strong initial geospatial benchmark, outperforming prior foundation models such as RS-GPT4V by 10.04% (BLEU-4) on visual description tasks and surpassing open-domain models on geospatial reasoning segmentation by 143.36% (gIoU). Our model, dataset, and code are available on our project page: https://lisat-bair.github.io/LISAt/.


NeurIPS Conference 2025 Conference Paper

REOrdering Patches Improves Vision Models

Declan Kutscher
David Chan
Yutong Bai
Trevor Darrell
Ritwik Gupta

Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a policy over permutations by optimizing a Plackett-Luce policy using REINFORCE. This approach enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and Functional Map of the World by 13.35%.
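
As a rough illustration of the ideas this abstract names, the sketch below builds row-major and column-major patch orderings and samples a permutation from a Plackett-Luce distribution. It is not the authors' implementation: the function names, the per-patch `logits` parameterization, and the use of plain Python instead of a deep learning framework are all assumptions made for this example.

```python
import math
import random


def row_major(h, w):
    """Patch indices of an h x w grid in raster-scan order."""
    return [r * w + c for r in range(h) for c in range(w)]


def column_major(h, w):
    """The same patch indices, visited column by column."""
    return [r * w + c for c in range(w) for r in range(h)]


def sample_plackett_luce(logits, rng):
    """Sample a permutation by repeatedly picking one unplaced item
    with probability proportional to exp(logit)."""
    remaining = list(range(len(logits)))
    order = []
    while remaining:
        weights = [math.exp(logits[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def log_prob_plackett_luce(logits, order):
    """log P(order) = sum over positions k of
    logits[order[k]] - logsumexp(logits of items not yet placed)."""
    total = 0.0
    for k in range(len(order)):
        rest = [logits[i] for i in order[k:]]
        m = max(rest)
        lse = m + math.log(sum(math.exp(x - m) for x in rest))
        total += logits[order[k]] - lse
    return total
```

In a REINFORCE-style setup, one would sample an ordering, score the downstream model on it, and weight the gradient of `log_prob_plackett_luce` with respect to the logits by that reward. How the paper defines the reward and the prior is not stated in the abstract, so that part is left out here.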


AAMAS Conference 2018 Conference Paper

Rapid Randomized Restarts for Multi-Agent Path Finding: Preliminary Results

Liron Cohen
Sven Koenig
T. K. Satish Kumar
Glenn Wagner
Howie Choset
David Chan
Nathan Sturtevant

Multi-Agent Path Finding (MAPF) is an NP-hard problem with many real-world applications. However, existing MAPF solvers are deterministic and perform poorly on MAPF instances where many agents interfere with each other in a small region of space. In this paper, we enhance MAPF solvers with randomization and observe that their runtimes can exhibit heavy-tailed distributions. This insight leads us to develop simple Rapid Randomized Restart (RRR) strategies with the intuition that multiple short runs will have a better chance of solving such MAPF instances than one long run with the same runtime limit. Our contribution is to show experimentally that the same RRR strategy indeed boosts the performance of two state-of-the-art MAPF solvers, namely M* and ECBS.
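
The restart intuition in this abstract can be sketched with a toy example. The code below is not a MAPF solver and is not taken from the paper: `toy_solver` is a hypothetical stand-in whose runtime is heavy-tailed by construction, and the `(rng, limit)` solver interface, the fixed `cutoff`, and the step-counting budget are assumptions chosen for illustration.

```python
import random


def solve_with_restarts(randomized_solver, total_budget, cutoff, seed=0):
    """Run a randomized solver repeatedly, each run capped at `cutoff` steps,
    until it succeeds or `total_budget` steps have been spent.

    `randomized_solver(rng, limit)` must return (solution_or_None, steps_used).
    Returns (solution_or_None, total_steps_used, number_of_runs).
    """
    rng = random.Random(seed)
    used = 0
    runs = 0
    while used < total_budget:
        limit = min(cutoff, total_budget - used)
        solution, steps = randomized_solver(rng, limit)
        used += steps
        runs += 1
        if solution is not None:
            return solution, used, runs
    return None, used, runs


def toy_solver(rng, limit):
    """Hypothetical stand-in with a heavy-tailed runtime: half of the runs
    need 5 steps, the other half need 10,000 steps."""
    needed = 5 if rng.random() < 0.5 else 10_000
    if needed <= limit:
        return "solved", needed
    return None, limit
```

With a budget of 1,000 steps, `solve_with_restarts(toy_solver, 1000, 10)` gets up to 100 short attempts, whereas `solve_with_restarts(toy_solver, 1000, 1000)` gets a single long attempt that fails whenever the first draw lands in the slow half of the distribution.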
