Author name cluster

Anthony Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers

3

NeurIPS Conference 2025 Conference Paper

Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

  • Yiwei Yang
  • Chung Peng Lee
  • Shangbin Feng
  • Dora Zhao
  • Bingbing Wen
  • Anthony Liu
  • Yulia Tsvetkov
  • Bill Howe

Spurious correlations occur when models rely on non-essential features that coincidentally co-vary with target labels, leading to incorrect reasoning under distribution shift. We consider spurious correlations in multi-modal Large Vision Language Models (LVLMs) pretrained on extensive and diverse datasets without explicit task supervision. We develop a benchmark by sourcing GPT-4o errors on real-world visual-question-answering (VQA) benchmarks, then curating a subset through LVLM-human annotation and synthetic counterfactual evaluation to identify errors caused by spurious correlations. This process yields SpuriVerse, a novel benchmark comprised of 124 distinct types of spurious correlations extracted from real-world datasets, each containing 1 realistic and 10 synthetic VQA samples for a total of 1364 multiple-choice questions. We evaluate 15 open and closed-source LVLMs on SpuriVerse, finding that even state-of-the-art closed-source models struggle significantly, achieving at best only 35.0% accuracy. Fine-tuning on synthetic examples that emphasize the spurious correlation improves performance to 78.4%, suggesting that training on diverse spurious patterns generalizes to unseen situations: models appear to learn to avoid "shortcuts" and attend to the overall image context.

AAAI Conference 2022 Conference Paper

Learning Parameterized Task Structure for Generalization to Unseen Entities

  • Anthony Liu
  • Sungryull Sohn
  • Mahdi Qazwini
  • Honglak Lee

Real world tasks are hierarchical and compositional. Tasks can be composed of multiple subtasks (or sub-goals) that are dependent on each other. These subtasks are defined in terms of entities (e.g., apple, pear) that can be recombined to form new subtasks (e.g., pickup apple, and pickup pear). To solve these tasks efficiently, an agent must infer subtask dependencies (e.g., an agent must execute pickup apple before place apple in pot), and generalize the inferred dependencies to new subtasks (e.g., place apple in pot is similar to place apple in pan). Moreover, an agent may also need to solve unseen tasks, which can involve unseen entities. To this end, we formulate parameterized subtask graph inference (PSGI), a method for modeling subtask dependencies using first-order logic with subtask entities. To facilitate this, we learn entity attributes in a zero-shot manner, which are used as quantifiers (e.g., is pickable(X)) for the parameterized subtask graph. We show this approach accurately learns the latent structure on hierarchical and compositional tasks more efficiently than prior work, and show PSGI can generalize by modelling structure on subtasks unseen during adaptation.
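
The idea of a parameterized precondition can be illustrated with a small sketch. This is not the PSGI inference method from the paper; it only shows, under assumed names, how a rule written over an entity variable X applies to entities that were never listed in the rule itself. The attribute table, the `has` helper, and the `eligible` function are all hypothetical.

```python
# Hypothetical attribute table; in the paper, attributes are learned, not hand-written.
ATTRIBUTES = {
    "apple": {"pickable"},
    "pear": {"pickable"},
    "pot": {"container"},
    "pan": {"container"},
}


def has(entity, attribute):
    """Return True if the entity carries the given attribute."""
    return attribute in ATTRIBUTES.get(entity, set())


def eligible(subtask, completed):
    """Check a subtask against parameterized preconditions.

    subtask is a tuple such as ("pickup", "apple") or ("place", "apple", "pot").
    completed is the set of subtasks already executed.
    """
    name, *args = subtask
    if name == "pickup":
        (x,) = args
        # pickup(X) is available whenever X is pickable.
        return has(x, "pickable")
    if name == "place":
        x, y = args
        # place(X, Y) requires pickup(X) to have been executed first.
        return (
            has(x, "pickable")
            and has(y, "container")
            and ("pickup", x) in completed
        )
    return False


done = {("pickup", "pear")}
print(eligible(("place", "apple", "pot"), set()))  # pickup apple not done yet
print(eligible(("place", "pear", "pan"), done))    # same rule, different entities
```

Because the rule is stated over attributes rather than specific entity names, adding a new pickable entity to the table makes the same dependency apply to it without writing a new rule.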

NeurIPS Conference 2020 Conference Paper

Predictive Information Accelerates Learning in RL

  • Kuang-Huei Lee
  • Ian Fischer
  • Anthony Liu
  • Yijie Guo
  • Honglak Lee
  • John Canny
  • Sergio Guadarrama

The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics using a contrastive version of the Conditional Entropy Bottleneck (CEB) objective. We refer to these as Predictive Information SAC (PI-SAC) agents. We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels. Our implementation is given on GitHub.
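
As a toy illustration of the quantity defined above, the sketch below computes the mutual information between one past step and one future step of a symmetric two-state Markov chain. It is not the contrastive CEB objective used for PI-SAC; the chain, the `stay_prob` parameter, and both function names are assumptions made for this example.

```python
import math


def mutual_information(joint):
    """Mutual information in nats for a joint distribution given as a 2-D list."""
    p_past = [sum(row) for row in joint]
    p_future = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_past[i] * p_future[j]))
    return mi


def joint_past_future(stay_prob):
    """Joint distribution of (X_past, X_future) for a symmetric two-state chain.

    The chain stays in its current state with probability stay_prob, and its
    stationary distribution is uniform over the two states.
    """
    move_prob = 1.0 - stay_prob
    return [
        [0.5 * stay_prob, 0.5 * move_prob],
        [0.5 * move_prob, 0.5 * stay_prob],
    ]


# A chain whose next state is independent of the current one carries no
# predictive information; a "sticky" chain carries more.
print(mutual_information(joint_past_future(0.5)))
print(mutual_information(joint_past_future(0.9)))
```

When the future is fully determined by the past (`stay_prob = 1.0`), the value reaches log 2 nats, the entropy of a uniform binary variable.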