Author name cluster

Yibo Jiang

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity-disambiguation profile.

12 papers
2 author rows

Possible papers (12)

ICLR 2025 · Conference Paper

Quantifying Generalization Complexity for Large Language Models

  • Zhenting Qi
  • Hongyin Luo
  • Xuliang Huang
  • Zhuokai Zhao
  • Yibo Jiang
  • Xiangjun Fan
  • Himabindu Lakkaraju
  • James R. Glass

While large language models (LLMs) have shown exceptional capabilities in understanding complex queries and performing sophisticated tasks, their generalization abilities are often deeply entangled with memorization, necessitating more precise evaluation. To address this challenge, we introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of LLMs. Scylla disentangles generalization from memorization by assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data through 20 tasks across 5 levels of complexity. Through extensive experiments, we uncover a non-monotonic relationship between task complexity and the performance gap between ID and OOD data, which we term the generalization valley. Specifically, this phenomenon reveals a critical threshold---referred to as critical complexity---where reliance on non-generalizable behavior peaks, indicating the upper bound of LLMs' generalization capabilities. As model size increases, the critical complexity shifts toward higher levels of task complexity, suggesting that larger models can handle more complex reasoning tasks before over-relying on memorization. Leveraging Scylla and the concept of critical complexity, we benchmark 28 LLMs, including open-source models such as the LLaMA and Qwen families and closed-source models such as Claude and GPT, providing a more robust evaluation and a clearer understanding of LLMs' generalization capabilities.
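
As a rough illustration of the measurement behind the generalization valley, the sketch below computes per-level ID/OOD gaps and picks the level where the gap peaks. The accuracy numbers are invented; Scylla's actual tasks, metrics, and protocol are defined in the paper.

    # Illustrative only: the accuracies below are made up, not Scylla outputs.
    id_acc  = {1: 0.95, 2: 0.92, 3: 0.88, 4: 0.70, 5: 0.55}   # in-distribution
    ood_acc = {1: 0.93, 2: 0.85, 3: 0.62, 4: 0.50, 5: 0.42}   # out-of-distribution

    # ID-OOD gap per complexity level; a large gap suggests reliance on
    # memorized, non-generalizable behavior at that level.
    gap = {lvl: id_acc[lvl] - ood_acc[lvl] for lvl in id_acc}

    # Critical complexity: the level where the gap peaks (level 3 here).
    critical_complexity = max(gap, key=gap.get)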

ICLR 2025 · Conference Paper

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

  • Kiho Park 0001
  • Yo Joong Choe
  • Yibo Jiang
  • Victor Veitch

The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has shown how to make this notion precise for representing binary concepts that have natural contrasts (e.g., {male, female}) as _directions_ in representation space. However, many natural concepts do not have natural contrasts (e.g., whether the output is about an animal). In this work, we show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as _vectors_. This allows us to immediately formalize the representation of categorical concepts as polytopes in the representation space. Further, we use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations. We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
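
To make the geometry concrete, here is a small numpy sketch that constructs vectors satisfying the hierarchical relation by hand (child minus parent orthogonal to parent) and forms a point in the polytope of a categorical concept. In the paper these vectors are estimated from Gemma and LLaMA-3 under a particular inner product, not constructed like this.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64
    unit = lambda v: v / np.linalg.norm(v)

    # Parent feature vector (e.g., is_animal), random here for illustration.
    animal = unit(rng.normal(size=d))

    def child_of(parent):
        # Child = parent + an offset orthogonal to the parent vector.
        off = rng.normal(size=d)
        off -= (off @ unit(parent)) * unit(parent)
        return parent + off

    mammal, bird = child_of(animal), child_of(animal)

    # Hierarchical relation holds by construction: (child - parent) is
    # orthogonal to the parent vector.
    assert abs((mammal - animal) @ animal) < 1e-9

    # A categorical concept {mammal, bird} is represented by the polytope
    # spanned by its element vectors; convex mixtures lie inside it.
    point_in_polytope = 0.3 * mammal + 0.7 * bird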

ICML 2025 · Conference Paper

The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)

  • Zihao Wang
  • Yibo Jiang
  • Jiahao Yu
  • Heqing Huang

Large language models (LLMs) that integrate multiple input roles (e.g., system instructions, user queries, external tool outputs) are increasingly prevalent in practice. Ensuring that the model accurately distinguishes messages from each role---a concept we call role separation---is crucial for consistent multi-role behavior. Although recent work often targets state-of-the-art prompt injection defenses, it remains unclear whether such methods truly teach LLMs to differentiate roles or merely memorize known triggers. In this paper, we examine role-separation learning: the process of teaching LLMs to robustly distinguish system and user tokens. Through a simple, controlled experimental framework, we find that fine-tuned models often rely on two proxies for role identification: (1) task type exploitation, and (2) proximity to begin-of-text. Although data augmentation can partially mitigate these shortcuts, it generally leads to iterative patching rather than a deeper fix. To address this, we propose reinforcing invariant signals that mark role boundaries by adjusting token-wise cues in the model's input encoding. In particular, modifying position IDs helps the model learn clearer distinctions and reduces reliance on superficial proxies. By focusing on this mechanism-centered perspective, our work illuminates how LLMs can more reliably maintain consistent multi-role behavior without merely memorizing known prompts or triggers.
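
A minimal sketch of the position-ID mechanism, assuming a HuggingFace-style decoder that accepts explicit position_ids. The token IDs and the gap size below are placeholders, and the paper's exact manipulation may differ.

    import torch

    system_tokens = torch.tensor([101, 5, 6, 7])   # hypothetical system segment
    user_tokens   = torch.tensor([8, 9, 10])       # hypothetical user segment
    input_ids = torch.cat([system_tokens, user_tokens]).unsqueeze(0)

    # Default positions would be 0..6. Opening a fixed gap between the two
    # segments makes the role boundary an invariant positional signal rather
    # than something inferred from raw distance to begin-of-text.
    GAP = 16                                        # illustrative choice
    n_sys, n_usr = len(system_tokens), len(user_tokens)
    position_ids = torch.cat([
        torch.arange(n_sys),
        torch.arange(n_usr) + n_sys + GAP,
    ]).unsqueeze(0)

    # Most decoder-only models accept explicit positions, e.g.:
    # outputs = model(input_ids=input_ids, position_ids=position_ids)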

ICLR 2024 · Conference Paper

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

  • Chaoqi Wang
  • Yibo Jiang
  • Chenghao Yang 0001
  • Han Liu
  • Yuxin Chen 0001

The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it is equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO that incorporates diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergence and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush–Kuhn–Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, our $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
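
The core change is mechanical: where DPO's implicit reward margin uses $\log(\pi/\pi_{\text{ref}})$, the generalized loss uses $f'(\pi/\pi_{\text{ref}})$. A PyTorch sketch under that reading, with sequence-level log-probs assumed precomputed; only the reverse-KL case is guaranteed to recover standard DPO.

    import torch
    import torch.nn.functional as F

    def f_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta, f_prime):
        # logp_* / ref_logp_*: policy and reference-model log-probs of the
        # chosen (w) and rejected (l) responses.
        ratio_w = torch.exp(logp_w - ref_logp_w)
        ratio_l = torch.exp(logp_l - ref_logp_l)
        margin = beta * (f_prime(ratio_w) - f_prime(ratio_l))
        return -F.logsigmoid(margin).mean()

    # Reverse KL: f(u) = u log u, so f'(u) = log u + 1; the +1 cancels in
    # the margin, recovering the standard DPO loss.
    reverse_kl_fprime = lambda u: torch.log(u) + 1
    # Forward KL: f(u) = -log u, so f'(u) = -1/u.
    forward_kl_fprime = lambda u: -1.0 / u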

NeurIPS 2024 · Conference Paper

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

  • Yibo Jiang
  • Goutham Rajendran
  • Pradeep Ravikumar
  • Bryon Aragam

Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts. We mathematically explore this property by studying how transformers, the building blocks of LLMs, can complete such memory tasks. We study a simple latent concept association problem with a one-layer transformer and we show theoretically and empirically that the transformer gathers information using self-attention and uses the value matrix for associative memory.
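
The value-matrix-as-associative-memory claim echoes classical outer-product memories. A numpy sketch of just that retrieval mechanism; the paper's actual setting is a one-layer transformer whose self-attention first aggregates cue tokens from the context.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_facts = 32, 5

    cues  = rng.normal(size=(n_facts, d)) / np.sqrt(d)   # context cue embeddings
    facts = rng.normal(size=(n_facts, d)) / np.sqrt(d)   # stored fact embeddings

    # Value matrix as an outer-product associative memory: V @ cue
    # approximately returns the fact paired with that cue, because random
    # high-dimensional cues are nearly orthogonal.
    V = sum(np.outer(f, c) for f, c in zip(facts, cues))

    retrieved = V @ cues[2]
    print(np.argmax(facts @ retrieved))   # expected: 2, cross-talk permitting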

ICML 2024 · Conference Paper

On the Origins of Linear Representations in Large Language Models

  • Yibo Jiang
  • Goutham Rajendran
  • Pradeep Ravikumar
  • Bryon Aragam
  • Victor Veitch

An array of recent works has argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of next token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. The theory is further substantiated by experiments.
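
What "linear encoding" means can be seen in a toy version of such a latent variable model: if representations are a linear mixture of binary concepts plus noise, a difference-in-means estimator recovers each concept's direction. The paper's contribution is proving that training dynamics produce this structure rather than assuming it; the sketch below only illustrates the end state.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 64, 4, 5000

    W = rng.normal(size=(d, k))                    # one direction per concept
    Z = rng.integers(0, 2, size=(n, k))            # binary latent concepts
    X = Z @ W.T + 0.1 * rng.normal(size=(n, d))    # linearly encoded + noise

    # Difference in means along concept 0 recovers W[:, 0] up to noise,
    # i.e., the concept is encoded as a linear direction.
    direction = X[Z[:, 0] == 1].mean(0) - X[Z[:, 0] == 0].mean(0)
    cos = direction @ W[:, 0] / (np.linalg.norm(direction) * np.linalg.norm(W[:, 0]))
    print(round(float(cos), 3))                    # close to 1.0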

NeurIPS 2023 · Conference Paper

Learning Nonparametric Latent Causal Graphs with Unknown Interventions

  • Yibo Jiang
  • Bryon Aragam

We establish conditions under which latent causal graphs are nonparametrically identifiable and can be reconstructed from unknown interventions in the latent space. Our primary focus is the identification of the latent structure in measurement models without parametric assumptions such as linearity or Gaussianity. Moreover, we do not assume the number of hidden variables is known, and we show that at most one unknown intervention per hidden variable is needed. This extends a recent line of work on learning causal representations from observations and interventions. The proofs are constructive and introduce two new graphical concepts---imaginary subsets and isolated edges---that may be useful in their own right. As a matter of independent interest, the proofs also involve a novel characterization of the limits of edge orientations within the equivalence class of DAGs induced by unknown interventions. These are the first results to characterize the conditions under which causal representations are identifiable without making any parametric assumptions in a general setting with unknown interventions and without faithfulness.

NeurIPS 2023 · Conference Paper

Uncovering Meanings of Embeddings via Partial Orthogonality

  • Yibo Jiang
  • Bryon Aragam
  • Victor Veitch

Machine learning tools often rely on embedding text as vectors of real numbers. In this paper, we study how the semantic structure of language is encoded in the algebraic structure of such embeddings. Specifically, we look at a notion of "semantic independence" capturing the idea that, e.g., "eggplant" and "tomato" are independent given "vegetable". Although such examples are intuitive, it is difficult to formalize such a notion of semantic independence. The key observation here is that any sensible formalization should obey a set of so-called independence axioms, and thus any algebraic encoding of this structure should also obey these axioms. This leads us naturally to use partial orthogonality as the relevant algebraic structure. We develop theory and methods that allow us to demonstrate that partial orthogonality does indeed capture semantic independence. Complementary to this, we also introduce the concept of independence preserving embeddings where embeddings preserve the conditional independence structures of a distribution, and we prove the existence of such embeddings and approximations to them.
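
Partial orthogonality itself is easy to state in code: two embeddings are partially orthogonal given a set of vectors if their residuals, after projecting out that set, are orthogonal. A numpy sketch with synthetic vectors built so the relation holds exactly; with real text embeddings one would test it approximately.

    import numpy as np

    def residual(v, basis):
        # Component of v orthogonal to the span of the rows of `basis`.
        Q, _ = np.linalg.qr(np.asarray(basis).T)
        return v - Q @ (Q.T @ v)

    def partially_orthogonal(u, v, given, tol=1e-8):
        ru, rv = residual(u, given), residual(v, given)
        return abs(ru @ rv) <= tol * np.linalg.norm(ru) * np.linalg.norm(rv)

    rng = np.random.default_rng(0)
    vegetable = rng.normal(size=16)
    r1 = residual(rng.normal(size=16), [vegetable])
    r2 = residual(rng.normal(size=16), [vegetable, r1])
    eggplant = 0.8 * vegetable + r1      # hypothetical embeddings
    tomato   = 0.5 * vegetable + r2

    print(partially_orthogonal(eggplant, tomato, [vegetable]))   # True
    print(abs(eggplant @ tomato) < 1e-8)   # False: not plainly orthogonal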

NeurIPS 2022 · Conference Paper

Invariant and Transportable Representations for Anti-Causal Domain Shifts

  • Yibo Jiang
  • Victor Veitch

Real-world classification problems must contend with domain shift, the (potential) mismatch between the domain where a model is deployed and the domain(s) where the training data was gathered. Methods to handle such problems must specify what structure is held in common between the domains and what is allowed to vary. A natural assumption is that causal (structural) relationships are invariant in all domains. Then, it is tempting to learn a predictor for label $Y$ that depends only on its causal parents. However, many real-world problems are ``anti-causal'' in the sense that $Y$ is a cause of the covariates $X$---in this case, $Y$ has no causal parents and the naive causal invariance is useless. In this paper, we study representation learning under a particular notion of domain shift that both respects causal invariance and that naturally handles the ``anti-causal'' structure. We show how to leverage the shared causal structure of the domains to learn a representation that both admits an invariant predictor and that also allows fast adaptation in new domains. The key is to translate causal assumptions into learning principles that disentangle ``invariant'' and ``non-stable'' features. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed learning algorithm.
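
The paper's learning principles are causal and specific to the anti-causal setting, but the flavor of "enforce invariance across training domains" can be shown with a generic stand-in. The sketch below uses a V-REx-style variance penalty across per-domain risks; this is not the paper's objective, only a common way to operationalize invariance.

    import torch

    def invariance_penalized_risk(per_domain_losses, lam=1.0):
        # per_domain_losses: one scalar training loss per domain.
        # Penalizing the variance of domain risks pushes the model toward
        # features whose predictive relationship is stable across domains.
        risks = torch.stack(per_domain_losses)
        return risks.mean() + lam * risks.var()

    # e.g., losses computed on three training domains:
    total = invariance_penalized_risk(
        [torch.tensor(0.42), torch.tensor(0.47), torch.tensor(0.40)])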

ICRA 2021 · Conference Paper

A Legged Soft Robot Platform for Dynamic Locomotion

  • Boxi Xia
  • Jiaming Fu
  • Hongbo Zhu
  • Zhicheng Song
  • Yibo Jiang
  • Hod Lipson

This paper presents an open-source untethered quadrupedal soft robot platform for dynamic locomotion (e.g., high-speed running and backflipping). The robot is mostly soft (80 vol.%) while driven by four geared servo motors. The robot's soft body and soft legs were 3D printed with gyroid infill using a flexible material, enabling it to conform to the environment and passively stabilize during locomotion in multi-terrain environments. In addition, we simulated the robot in a real-time soft body simulation. With gaits tuned in simulation, the real robot can locomote at a speed of 0.9 m/s (2.5 body lengths/second), substantially faster than most untethered legged soft robots published to date. We hope this platform, along with its verified simulator, can catalyze the development of agile soft robots.

ICML 2020 · Conference Paper

Associative Memory in Iterated Overparameterized Sigmoid Autoencoders

  • Yibo Jiang
  • Cengiz Pehlevan

Recent work showed that overparameterized autoencoders can be trained to implement associative memory via iterative maps, when the trained input-output Jacobian of the network has all of its eigenvalue norms strictly below one. Here, we theoretically analyze this phenomenon for sigmoid networks by leveraging recent developments in deep learning theory, especially the correspondence between training neural networks in the infinite-width limit and performing kernel regression with the Neural Tangent Kernel (NTK). We find that overparameterized sigmoid autoencoders can have attractors in the NTK limit for both training with a single example and multiple examples under certain conditions. In particular, for multiple training examples, we find that the norm of the largest Jacobian eigenvalue drops below one with increasing input norm, leading to associative memory.
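
A rough end-to-end sketch of the phenomenon at finite width with a single training example; the paper's analysis is in the infinite-width NTK limit, so this is only suggestive. Train a sigmoid autoencoder on one point, check the Jacobian eigenvalue criterion, and iterate the map from a corrupted input.

    import torch

    torch.manual_seed(0)
    d, h = 8, 256                          # overparameterized: h >> d
    x = torch.rand(d)                      # the single training example

    net = torch.nn.Sequential(
        torch.nn.Linear(d, h), torch.nn.Sigmoid(),
        torch.nn.Linear(h, d), torch.nn.Sigmoid())
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(3000):                  # train to reconstruct x
        opt.zero_grad()
        ((net(x) - x) ** 2).mean().backward()
        opt.step()

    # Attractor criterion: all eigenvalues of the input-output Jacobian at
    # the (approximate) fixed point should have norm strictly below one.
    J = torch.autograd.functional.jacobian(net, x)
    print(torch.linalg.eigvals(J).abs().max())   # expected < 1 if attractor

    # If so, iterating the map from a corrupted input should recover x.
    with torch.no_grad():
        z = x + 0.1 * torch.randn(d)
        for _ in range(100):
            z = net(z)
        print((z - x).norm())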

AAAI 2018 · Short Paper

StackReader: An RNN-Free Reading Comprehension Model

  • Yibo Jiang
  • Zhou Zhao

Machine comprehension of text is the problem of answering a query based on a given context. Many existing systems use RNN-based units for contextual modeling, combined with attention mechanisms. In this paper, we propose StackReader, an end-to-end neural network model that solves this problem without recurrent neural network (RNN) units or their variants. This simple model is based solely on attention mechanisms and gated convolutional neural networks. Experiments on SQuAD show relatively high accuracy with a significant decrease in training time.
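
The abstract's building blocks are standard enough to sketch. Below is a gated convolutional block (GLU-style: content times sigmoid gate), assumed here to be the kind of gated CNN unit StackReader uses; the paper's exact architecture and attention wiring are not specified in this abstract.

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        # Gated convolution: the conv produces content and gate halves,
        # combined as content * sigmoid(gate). No recurrence anywhere.
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                                  padding=kernel_size // 2)

        def forward(self, x):              # x: (batch, channels, seq_len)
            content, gate = self.conv(x).chunk(2, dim=1)
            return content * torch.sigmoid(gate)

    out = GatedConvBlock(64)(torch.randn(2, 64, 50))   # shape: (2, 64, 50)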