Arrow Research search

Author name cluster

James Thorne

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI Conference 2026 Conference Paper

Margin-Aware Preference Optimization for Aligning Diffusion Models Without Reference

  • Jiwoo Hong
  • Sayak Paul
  • Noah Lee
  • Kashif Rasul
  • James Thorne
  • Jongheon Jeong

Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability—but this creates a fundamental problem we call "reference mismatch." In this paper, we investigate the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models, showing that larger reference mismatch hinders effective adaptation given the same amount of data, e.g., as when learning new artistic styles, or personalizing to specific objects. We demonstrate this phenomenon across text-to-image (T2I) diffusion models and introduce margin-aware preference optimization (MaPO), a reference-agnostic approach that breaks free from this constraint. By directly optimizing the likelihood margin between preferred and dispreferred outputs under the Bradley-Terry model without anchoring to a reference, MaPO transforms diverse T2I tasks into unified pairwise preference optimization. We validate MaPO's versatility across five challenging domains: (1) safe generation, (2) style adaptation, (3) cultural representation, (4) personalization, and (5) general preference alignment. Our results reveal that MaPO's advantage grows dramatically with reference mismatch severity, outperforming both DPO and specialized methods like DreamBooth while reducing training time by 15%. MaPO thus emerges as a versatile and memory-efficient method for generic T2I adaptation tasks.

ICML Conference 2025 Conference Paper

On the Robustness of Reward Models for Language Model Alignment

  • Jiwoo Hong
  • Noah Lee
  • Eunki Kim
  • Guijin Son
  • Woojin Chung
  • Aman Gupta
  • Shao Tang
  • James Thorne

The Bradley-Terry (BT) model is widely practiced in reward modeling for reinforcement learning with human feedback (RLHF). Despite its effectiveness, reward models (RMs) trained with BT model loss as one-way classifiers are prone to over-optimization, losing generalizability to unseen inputs. In this paper, we study the cause of over-optimization and its downstream effects on the RLHF procedure, highlighting the importance of robustness in RMs. First, we show that the excessive dispersion of hidden state norms is the main source of over-optimization. Correspondingly, we propose batch-wise sum-to-zero regularization (BSR) that enforces reward sum for each batch to be zero-centered, constraining the rewards with abnormally large magnitudes. We assess the impact of BSR in improving robustness in RMs through four scenarios of over-optimization, where BSR consistently manifests better robustness on unseen inputs. Then, we compare the plain BT model and BSR on RLHF training and empirically show that robust RMs better align the policy to the gold preference model. Finally, we apply BSR to high-quality data and models, which surpasses state-of-the-art RMs in the 8B scale by adding more than 5% in complex preference prediction tasks. By conducting RLOO training with 8B RMs, AlpacaEval 2. 0, with reducing generation length by 40% while adding a 7% increase in win rate, further highlights that robustness in RMs induces robustness in RLHF training.

NeurIPS Conference 2024 Conference Paper

Block Transformer: Global-to-Local Language Modeling for Fast Inference

  • Namgyu Ho
  • Sangmin Bae
  • Taehyeon Kim
  • Hyunjik Jo
  • Yireun Kim
  • Tal Schuster
  • Adam Fisch
  • James Thorne

We introduce the Block Transformer which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks associated with self-attention. Self-attention requires the key-value (KV) cache of all previous sequences to be retrieved from memory at every decoding step to retrieve context information, leading to two primary bottlenecks during batch inference. First, there is a significant delay in obtaining the first token, as the information of the entire prompt must first be processed to prefill the KV cache. Second, computation of subsequent tokens is bottlenecked by the high memory I/O demand of fetching the entire KV cache, which grows linearly with sequence length, incurring quadratic memory reads overall. We design the Block Transformer to strategically mitigate these costs, by incorporating coarsity and locality into an integrated global-to-local architecture. At the lower layers, we aggregate tokens into fixed size blocks to apply attention across the entire sequence at coarse-grained detail, to capture the global context while minimizing KV cache overhead. At upper layers, we apply attention within each block to decode individual tokens, to model fine-grained details with a lightweight local KV cache. We pretrain vanilla and Block Transformers from scratch and demonstrate that Block Transformers reach 10--20x inference throughput compared to vanilla transformers with equivalent perplexity and zero-shot task performance.

ICLR Conference 2024 Conference Paper

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

  • Seonghyeon Ye
  • Doyoung Kim 0001
  • Sungdong Kim
  • Hyeonbin Hwang
  • Seungone Kim
  • Yongrae Jo
  • James Thorne
  • Juho Kim 0001

Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction. However, previous studies have mainly focused on coarse-grained evaluation (i.e. overall preference-based evaluation), which limits interpretability since it does not consider the nature of user instructions that require instance-wise skill composition. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment Skill Sets), a fine-grained evaluation protocol for both human-based and model-based evaluation which decomposes coarse-level scoring to a skill set-level scoring for each instruction. We experimentally observe that the fine-graininess of evaluation is crucial for attaining a holistic view of model performance and increasing the reliability of the evaluation. Using FLASK, we compare multiple open-source and proprietary LLMs and observe a high correlation between model-based and human-based evaluations.

ICLR Conference 2024 Conference Paper

Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models

  • Seungone Kim
  • Jamin Shin
  • Yejin Choi 0001
  • Joel Jang
  • Shayne Longpre
  • Hwaran Lee
  • Sangdoo Yun
  • Seongjin Shin

Recently, GPT-4 has become the de facto evaluator for long-form text generated by large language models (LLMs). However, for practitioners and researchers with large and custom evaluation tasks, GPT-4 is unreliable due to its closed-source nature, uncontrolled versioning, and prohibitive costs. In this work, we propose PROMETHEUS a fully open-source LLM that is on par with GPT-4’s evaluation capabilities when the appropriate reference materials (reference answer, score rubric) are accompanied. For this purpose, we construct a new dataset – FEEDBACK COLLECTION – that consists of 1K fine-grained score rubrics, 20K instructions, and 100K natural language feedback generated by GPT-4. Using the FEEDBACK COLLECTION, we train PROMETHEUS, a 13B evaluation-specific LLM that can assess any given response based on novel and unseen score rubrics and reference materials provided by the user. Our dataset’s versatility and diversity make our model generalize to challenging real-world criteria, such as prioritizing conciseness, child-readability, or varying levels of formality. We show that PROMETHEUS shows a stronger correlation with GPT-4 evaluation compared to ChatGPT on seven evaluation benchmarks (Two Feedback Collection testsets, MT Bench, Vicuna Bench, Flask Eval, MT Bench Human Judgment, and HHH Alignment), showing the efficacy of our model and dataset design. During human evaluation with hand-crafted score rubrics, PROMETHEUS shows a Pearson correlation of 0.897 with human evaluators, which is on par with GPT-4-0613 (0.882), and greatly outperforms ChatGPT (0.392). Remarkably, when assessing the quality of the generated feedback, PROMETHEUS demonstrates a win rate of 58.62% when compared to GPT-4 evaluation and a win rate of 79.57% when compared to ChatGPT evaluation. Our findings suggests that by adding reference materials and training on GPT-4 feedback, we can obtain effective open-source evaluator LMs.

NeurIPS Conference 2023 Conference Paper

VisAlign: Dataset for Measuring the Alignment between AI and Humans in Visual Perception

  • Jiyoung Lee
  • Seungho Kim
  • Seunghyun Won
  • Joonseok Lee
  • Marzyeh Ghassemi
  • James Thorne
  • Jaeseok Choi
  • O-Kil Kwon

AI alignment refers to models acting towards human-intended goals, preferences, or ethical principles. Analyzing the similarity between models and humans can be a proxy measure for ensuring AI safety. In this paper, we focus on the models' visual perception alignment with humans, further referred to as AI-human visual alignment. Specifically, we propose a new dataset for measuring AI-human visual alignment in terms of image classification. In order to evaluate AI-human visual alignment, a dataset should encompass samples with various scenarios and have gold human perception labels. Our dataset consists of three groups of samples, namely Must-Act (i. e. , Must-Classify), Must-Abstain, and Uncertain, based on the quantity and clarity of visual information in an image and further divided into eight categories. All samples have a gold human perception label; even Uncertain (e. g. , severely blurry) sample labels were obtained via crowd-sourcing. The validity of our dataset is verified by sampling theory, statistical theories related to survey design, and experts in the related fields. Using our dataset, we analyze the visual alignment and reliability of five popular visual perception models and seven abstention methods. Our code and data is available at https: //github. com/jiyounglee-0523/VisAlign.

NeurIPS Conference 2021 Conference Paper

FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

  • Rami Aly
  • Zhijiang Guo
  • Michael Schlichtkrull
  • James Thorne
  • Andreas Vlachos
  • Christos Christodoulopoulos
  • Oana Cocarascu
  • Arpit Mittal

Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i. e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this paper we introduce a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87, 026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict. Furthermore, we detail our efforts to track and minimize the biases present in the dataset and could be exploited by models, e. g. being able to predict the label without using evidence. Finally, we develop a baseline for verifying claims against text and tables which predicts both the correct evidence and verdict for 18% of the claims.