Author name cluster

Andrew Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Better Training Data Attribution via Better Inverse Hessian-Vector Products

Andrew Wang
Elisa Nguyen
Runshi Yang
Juhan Bae
Sheila McIlraith
Roger Grosse

Training data attribution (TDA) provides insights into which training data is responsible for a learned model behavior. Gradient-based TDA methods such as influence functions and unrolled differentiation both involve a computation that resembles an inverse Hessian-vector product (iHVP), which is difficult to approximate efficiently. We introduce an algorithm (ASTRA) which uses the EKFAC-preconditioner on Neumann series iterations to arrive at an accurate iHVP approximation for TDA. ASTRA is easy to tune, requires fewer iterations than Neumann series iterations, and is more accurate than EKFAC-based approximations. Using ASTRA, we show that improving the accuracy of the iHVP approximation can significantly improve TDA performance.
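
As a rough illustration of the kind of iteration the abstract refers to, the sketch below approximates an iHVP with the update x <- x + P^{-1}(v - Hx), whose fixed point is H^{-1}v. It is a toy under stated assumptions, not the ASTRA implementation: the 2x2 matrix `H`, the vector `v`, and the function names are hypothetical, and a diagonal (Jacobi) preconditioner stands in for EKFAC, which is not reproduced here.

```python
def matvec(H, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def neumann_ihvp(H, v, precond_diag, num_iters):
    """Approximate H^{-1} v with x <- x + P^{-1} (v - H x).

    precond_diag holds the diagonal of P. A constant diagonal gives the
    plain Neumann series with a scalar step; the diagonal of H gives a
    Jacobi preconditioner (an assumed stand-in for EKFAC).
    """
    x = [0.0] * len(v)
    for _ in range(num_iters):
        Hx = matvec(H, x)
        x = [xi + (vi - hxi) / p
             for xi, vi, hxi, p in zip(x, v, Hx, precond_diag)]
    return x


def max_error(x, x_ref):
    return max(abs(a - b) for a, b in zip(x, x_ref))


# Hypothetical symmetric positive-definite "Hessian" and query vector.
H = [[4.0, 1.0], [1.0, 3.0]]
v = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]  # H^{-1} v, worked out by hand

plain_10 = neumann_ihvp(H, v, precond_diag=[5.0, 5.0], num_iters=10)
precond_10 = neumann_ihvp(H, v, precond_diag=[4.0, 3.0], num_iters=10)
precond_200 = neumann_ihvp(H, v, precond_diag=[4.0, 3.0], num_iters=200)
```

On this toy matrix the preconditioned run is closer to the exact answer after the same ten iterations, which is the qualitative effect the abstract describes. How much a preconditioner helps on a real network depends on the Hessian's spectrum.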

NeurIPS Conference 2025 Conference Paper

FEEDBACK FRICTION: LLMs Struggle to Fully Incorporate External Feedback

Dongwei Jiang
Bowei Zhang
Andrew Wang
Nicholas Andrews
Daniel Khashabi

Recent studies have shown LLMs possess some ability to improve their responses when given external feedback. However, it remains unclear how effectively and thoroughly these models can incorporate extrinsic feedback. In an ideal scenario, if LLMs receive near-perfect and complete feedback, we would expect them to fully integrate the feedback and reach correct solutions. In this paper, we systematically investigate LLMs’ ability to incorporate feedback by designing a controlled experimental environment. For each problem, a solver model attempts a solution, then a feedback generator with access to near-complete ground-truth answers produces targeted feedback, after which the solver tries again. We evaluate this pipeline across a diverse range of tasks, including math reasoning, knowledge reasoning, scientific reasoning, and general multi-domain evaluations with state-of-the-art language models including Claude 3.7 with extended thinking. Surprisingly, even under these near-ideal conditions, solver models consistently show resistance to feedback, a limitation that we term FEEDBACK FRICTION. To mitigate this limitation, we experiment with sampling-based strategies like progressive temperature increases and explicit rejection of previously attempted incorrect answers, which yield improvements but still fail to help models achieve target performance. We analyze FEEDBACK FRICTION and find that models’ confidence on specific questions, measured by semantic entropy, predicts feedback resistance: high-confidence predictions remain resistant to external correction. We hope that highlighting this issue in LLMs will help future research in self-improvement.

NeurIPS Conference 2025 Conference Paper

Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data

Andrew Li
Toryn Klassen
Andrew Wang
Parand A. Alamdari
Sheila McIlraith

Grounding language in perception and action is a key challenge when building situated agents that can interact with humans, or other agents, via language. In the past, addressing this challenge has required manually designing the language grounding or curating massive datasets that associate language with the environment. We propose Ground-Compose-Reinforce, an end-to-end, neurosymbolic framework for training RL agents directly from high-level task specifications—without manually designed reward functions or other domain-specific oracles, and without massive datasets. These task specifications take the form of Reward Machines, automata-based representations that capture high-level task structure and are in some cases autoformalizable from natural language. Critically, we show that Reward Machines can be grounded using limited data by exploiting compositionality. Experiments in a custom Meta-World domain with only 350 labelled pretraining trajectories show that our framework faithfully elicits complex behaviours from high-level specifications—including behaviours that never appear in pretraining—while non-compositional approaches fail.

NeurIPS Conference 2025 Conference Paper

InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras

Erich Liang
Roma Bhattacharjee
Sreemanti Dey
Rafael Moschopoulos
Caitlin Wang
Michel Liao
Grace Tan
Andrew Wang

Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, which is often not true for many real-world in-the-wild videos. A major obstacle in this field is a lack of dynamic camera intrinsics benchmarks--existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic changes for consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variations and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit https://influx.cs.princeton.edu/.

ICML Conference 2025 Conference Paper

Learning Extrapolative Sequence Transformations from Markov Chains

Sophia Hager
Aleem Khan
Andrew Wang
Nicholas Andrews

Most successful applications of deep learning involve similar training and test conditions. However, tasks such as biological sequence design involve searching for sequences that improve desirable properties beyond previously known values, which requires novel hypotheses that extrapolate beyond training data. In these settings, extrapolation may be achieved by using random search methods such as Markov chain Monte Carlo (MCMC), which, given an initial state, sample local transformations to approximate a target density that rewards states with the desired properties. However, even with a well-designed proposal, MCMC may struggle to explore large structured state spaces efficiently. Rather than relying on stochastic search, it would be desirable to have a model that greedily optimizes the properties of interest, successfully extrapolating in as few steps as possible. We propose to learn such a model from the Markov chains resulting from MCMC search. Specifically, our approach uses selected states from Markov chains as a source of training data for an autoregressive model, which is then able to efficiently generate novel sequences that extrapolate along the sequence-level properties of interest. The proposed approach is validated on three problems: protein sequence design, text sentiment control, and text anonymization. We find that the autoregressive model can extrapolate as well or better than MCMC, but with the additional benefits of scalability and significantly higher sample efficiency.

ICLR Conference 2024 Conference Paper

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Yangjun Ruan
Honghua Dong
Andrew Wang
Silviu Pitis
Yongchao Zhou
Jimmy Ba
Yann Dubois
Chris J. Maddison

Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks—such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, setting up the environment for each test scenario manually, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tail risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables scalable testing of LM agents against a diverse range of tools and scenarios. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes toolkits and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment.

ICML Conference 2023 Conference Paper

Learning Belief Representations for Partially Observable Deep RL

Andrew Wang
Andrew C. Li
Toryn Q. Klassen
Rodrigo Toro Icarte
Sheila A. McIlraith

Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information at training time, that may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.

NeurIPS Conference 2022 Conference Paper

Operator Splitting Value Iteration

Amin Rakhsha
Andrew Wang
Mohammad Ghavamzadeh
Amir-massoud Farahmand

We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm called OS-Dyna. Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.
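
One way to read the splitting idea for policy evaluation, offered as a sketch rather than the paper's algorithm: write the Bellman equation V = r + γPV with an approximate model P̂ as (I - γP̂)V = r + γ(P - P̂)V, then iterate by solving the left-hand system with the previous iterate on the right. The update rule is an assumption inferred from the abstract, and the two-state chain, the matrices `P` and `P_hat`, and all function names are hypothetical.

```python
def matvec(P, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(p * vi for p, vi in zip(row, v)) for row in P]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def identity_minus(gamma, P):
    n = len(P)
    return [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
            for i in range(n)]


def vi_policy_eval(P, r, gamma, num_iters):
    """Standard iteration V <- r + gamma * P V."""
    V = [0.0] * len(r)
    for _ in range(num_iters):
        PV = matvec(P, V)
        V = [ri + gamma * pv for ri, pv in zip(r, PV)]
    return V


def split_policy_eval(P, P_hat, r, gamma, num_iters):
    """Assumed splitting: (I - gamma*P_hat) V_new = r + gamma*(P - P_hat) V."""
    A = identity_minus(gamma, P_hat)
    V = [0.0] * len(r)
    for _ in range(num_iters):
        PV, PhV = matvec(P, V), matvec(P_hat, V)
        rhs = [ri + gamma * (a - b) for ri, a, b in zip(r, PV, PhV)]
        V = solve(A, rhs)
    return V


# Hypothetical two-state chain under a fixed policy, with a slightly wrong model.
P = [[0.9, 0.1], [0.2, 0.8]]
P_hat = [[0.85, 0.15], [0.25, 0.75]]
r, gamma = [1.0, 0.0], 0.9

v_true = solve(identity_minus(gamma, P), r)
v_vi = vi_policy_eval(P, r, gamma, num_iters=20)
v_split = split_policy_eval(P, P_hat, r, gamma, num_iters=20)
```

Any fixed point of the split update satisfies V = r + γPV, so the model error changes the convergence speed rather than the limit, provided the iteration converges. On this toy chain the split iteration is far closer to `v_true` after 20 steps than the standard one.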

AAAI Conference 2015 Conference Paper

Chance-Constrained Scheduling via Conflict-Directed Risk Allocation

Andrew Wang
Brian Williams

Temporal uncertainty in large-scale logistics forces one to trade off between lost efficiency through built-in slack and costly replanning when deadlines are missed. Due to the difficulty of reasoning about such likelihoods and consequences, a computational framework is needed to quantify and bound the risk of violating scheduling requirements. This work addresses the chance-constrained scheduling problem, where actions’ durations are modeled probabilistically. Our solution method uses conflict-directed risk allocation to efficiently compute a scheduling policy. The key insight, compared to previous work in probabilistic scheduling, is to decouple the reasoning about temporal and risk constraints. This decomposes the problem into a master problem and a subproblem, which can be solved iteratively much more quickly. Through a set of simulated car-sharing scenarios, it is empirically shown that conflict-directed risk allocation computes solutions nearly an order of magnitude faster than prior art, which considers all constraints in a single lump-sum optimization.

AAAI Conference 2015 Conference Paper

Risk-Aware Scheduling throughout Planning and Execution

Andrew Wang

AAAI Conference 2012 Conference Paper

Bayes-Adaptive Interactive POMDPs

Brenda Ng
Kofi Boakye
Carol Meyers
Andrew Wang

We introduce the Bayes-Adaptive Interactive Partially Observable Markov Decision Process (BA-IPOMDP), the first multiagent decision model that explicitly incorporates model learning. As in I-POMDPs, the BA-IPOMDP agent maintains beliefs over interactive states, which include the physical states as well as the other agents’ models. The BA-IPOMDP assumes that the state transition and observation probabilities are unknown, and augments the interactive states to include these parameters. Beliefs are maintained over this augmented interactive state space. This (necessary) state expansion exacerbates the curse of dimensionality, especially since each I-POMDP belief update is already a recursive procedure (because an agent invokes belief updates from other agents’ perspectives as part of its own belief update, in order to anticipate other agents’ actions). We extend the interactive particle filter to perform approximate belief update on BA-IPOMDPs. We present our findings on the multiagent Tiger problem.

YNIMG Journal 2011 Journal Article

Detection of amyloid plaques targeted by USPIO-Aβ1–42 in Alzheimer's disease transgenic mice using magnetic resonance microimaging

Jing Yang
Youssef Zaim Wadghiri
Dung Minh Hoang
Wai Tsui
Yanjie Sun
Erika Chung
Yongsheng Li
Andrew Wang

Amyloid plaques are one of the pathological hallmarks of Alzheimer's disease (AD). The visualization of amyloid plaques in the brain is important to monitor AD progression and to evaluate the efficacy of therapeutic interventions. Our group has developed several contrast agents to detect amyloid plaques in vivo using magnetic resonance microimaging (μMRI) in AD transgenic mice, where we used intra-carotid mannitol to enhance blood–brain barrier (BBB) permeability. In the present study, we used ultrasmall superparamagnetic iron oxide (USPIO) nanoparticles, chemically coupled with Aβ1–42 peptide, to detect amyloid deposition, along with mannitol, for in vivo μMRI by femoral intravenous injection. A 3D gradient multi-echo sequence was used for imaging with a 100 μm isotropic resolution. The amyloid plaques detected by T2*-weighted μMRI were confirmed with matched histological sections. Furthermore, two different quantitative analyses were used. The region of interest-based quantitative measurement of T2* values showed contrast-injected APP/PS1 mice had significantly reduced T2* values compared to wild-type mice. In addition, the scans were examined with voxel-based morphometry (VBM) using statistical parametric mapping (SPM) for comparison of contrast-injected AD transgenic and wild-type mice. The regional differences seen in VBM comparing USPIO-Aβ1–42 injected APP/PS1 and wild-type mice correlated with the amyloid plaque distribution histologically, contrasting with no differences between the two groups of mice without contrast agent injection in regions of the brain with amyloid deposition. Our results demonstrated that both approaches were able to identify the differences between AD transgenic mice and wild-type mice after injection with USPIO-Aβ1–42. The feasibility of using less invasive intravenous femoral injections for amyloid plaque detection in AD transgenic mice facilitates using this method for longitudinal studies in the pathogenesis of AD.

ICRA Conference 1995 Conference Paper

Mechanisms for Haptic Feedback

Raymond Hui
Alain Ouellet
Andrew Wang
Paul G. Kry
Stefan B. Williams
George Vukovich
Walter Peruzzini

This article describes work in progress at the Canadian Space Agency on the design and implementation of haptic devices. Haptic devices are a special class of robotic mechanisms for which structural transparency is a foremost design criterion. Also notable is the fact that three or four degrees of freedom, rather than six as in general robotic tasks, are often sufficient for haptic applications. Furthermore, in order to make these devices readily available to many users, their kinematic models must be simple enough that they can be controlled by inexpensive means. Various three- and four-DOF mechanisms, some of which were recently developed by the Canadian Space Agency, are discussed herein in terms of their suitability for haptic applications. For five- or six-DOF applications, the concept of the virtual handle is introduced, reducing the problems and complexity usually associated with mechanisms with a high number of degrees of freedom.
