Author name cluster

Jiaming Song

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

53 papers

2 author rows

AAAI Conference 2026 Conference Paper

SABER: Switchable and Balanced Training for Efficient LLM Reasoning

Kai Zhao
Yanjun Zhao
Jiaming Song
Shien He
Lusheng Zhang
Qiang Zhang
Tianjiao Li

Large language models (LLMs) empowered by chain-of-thought reasoning have achieved impressive accuracy on complex tasks but suffer from excessive inference costs and latency when applied uniformly to all problems. We propose SABER (Switchable and Balanced Training for Efficient LLM Reasoning), a reinforcement learning framework that endows LLMs with user‑controllable, token‑budgeted reasoning. SABER first profiles each training example’s base‑model thinking token usage and assigns it to one of the predefined budget tiers. During fine‑tuning, the model is guided by system prompts and length‑aware rewards to respect its assigned budget. In parallel, we incorporate no‑think examples to ensure the model remains reliable even when explicit reasoning is turned off. SABER further supports four discrete inference modes—NoThink, FastThink, CoreThink, and DeepThink, enabling flexible trade‑offs between latency and reasoning depth. Extensive evaluations on math reasoning (MATH, GSM8K), code generation (MBPP), and logical reasoning (LiveBench-Reasoning) demonstrate that SABER achieves high accuracy under tight budgets, graceful degradation, and effective cross-scale and cross‑domain generalization. In particular, SABER‑FastThink cuts reasoning length by 65.4% and yields a 3.6% accuracy gain compared with the base model on the MATH benchmark.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning

Fu-Yun Wang
Keqiang Sun
Yao Teng
Xihui Liu
Jiale Yuan
Jiaming Song
Hongsheng Li

Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. While existing PO methods primarily concentrate on producing favorable outputs, they often overlook the significance of classifier-free guidance (CFG) in mitigating undesirable results. DiffusionNPO addresses this gap by introducing negative preference optimization (NPO), training models to generate outputs opposite to human preferences and thereby steering them away from unfavorable outcomes through CFG. However, prior NPO approaches rely on costly and fragile procedures for obtaining explicit preference annotations (e.g., manual pairwise labeling or reward model training), limiting their practicality in domains where such data are scarce or difficult to acquire. In this work, we propose Self-NPO, specifically truncated diffusion fine-tuning, a data-free approach of negative preference optimization by directly learning from the model itself, eliminating the need for manual data labeling or reward model training. This data-free approach is highly efficient (less than 1% training cost of Diffusion-NPO) and achieves comparable performance of Diffusion-NPO in a data-free manner. We demonstrate that Self-NPO integrates seamlessly into widely used diffusion models, including SD1.5, SDXL, and CogVideoX, as well as models already optimized for human preferences, consistently enhancing both their generation quality and alignment with human preferences.

PDF Details DOI

ICML Conference 2025 Conference Paper

Inductive Moment Matching

Linqi Zhou
Stefano Ermon
Jiaming Song

Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Moment Matching Self-Distillation (MMSD), a new class of generative models for one- or few-step sampling with a single-stage training procedure. Unlike distillation, MMSD does not require pre-training initialization and optimization of two networks; and unlike Consistency Models, MMSD guarantees distribution-level convergence and remains stable under various hyperparameters and standard model architectures. MMSD surpasses diffusion models on ImageNet-256x256 with 2. 13 FID using only 8 inference steps and achieves state-of-the-art 2-step FID of 2. 05 on CIFAR-10 for a model trained from scratch.