Arrow Research search

Author name cluster

Trung Le

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

CTPD: Cross Tokenizer Preference Distillation

  • Truong Nguyen
  • Phi Van Dat
  • Ngan Nguyen
  • Linh Ngo Van
  • Trung Le
  • Thanh Hong Nguyen

While knowledge distillation has seen widespread use in pre-training and instruction tuning, its application to aligning language models with human preferences remains underexplored, particularly in the more realistic cross-tokenizer setting. The incompatibility of tokenization schemes between teacher and student models has largely prevented fine-grained, white-box distillation of preference information. To address this gap, we propose Cross-Tokenizer Preference Distillation (CTPD), the first unified framework for transferring human-aligned behavior between models with heterogeneous tokenizers. CTPD introduces three key innovations: (1) Aligned Span Projection, which maps teacher and student tokens to shared character-level spans for precise supervision transfer; (2) a cross-tokenizer adaptation of Token-level Importance Sampling (TIS-DPO) for improved credit assignment; and (3) a Teacher-Anchored Reference, allowing the student to directly leverage the teacher’s preferences in a DPO-style objective. Our theoretical analysis grounds CTPD in importance sampling, and experiments across multiple benchmarks confirm its effectiveness, with significant performance gains over existing methods. These results establish CTPD as a practical and general solution for preference distillation across diverse tokenization schemes, opening the door to more accessible and efficient alignment of language models.
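
As a rough illustration of the span-alignment idea, the sketch below matches tokens from two tokenizers through shared character offsets; the overlap rule, the example strings, and the function name are illustrative assumptions, not the paper's actual Aligned Span Projection procedure.

```python
# Hypothetical sketch: align two tokenizations via character spans.
# Offsets are (start, end) character positions, as produced by tokenizers
# that report offset mappings.

def overlapping_pairs(teacher_offsets, student_offsets):
    """Return (teacher_idx, student_idx) pairs whose character spans overlap."""
    pairs = []
    for i, (ts, te) in enumerate(teacher_offsets):
        for j, (ss, se) in enumerate(student_offsets):
            if max(ts, ss) < min(te, se):  # non-empty character overlap
                pairs.append((i, j))
    return pairs

# "unhappy" tokenized as ["un", "happy"] (teacher) vs ["unhap", "py"] (student):
teacher = [(0, 2), (2, 7)]
student = [(0, 5), (5, 7)]
print(overlapping_pairs(teacher, student))  # [(0, 0), (1, 0), (1, 1)]
```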

AAAI Conference 2026 Conference Paper

DIET: Machine Unlearning on a Data-Diet

  • Nilakshan Kunananthaseelan
  • Jing Wu
  • Trung Le
  • Gholamreza Haffari
  • Mehrtash Harandi

Machine Unlearning (MU) aims to remove the influence of specific knowledge from a pretrained model. Existing methods often rely on retained training data to preserve utility; such dependence is impractical due to privacy and scalability constraints. A further complication arises when unlearning is applied to vision-language models (VLMs), where entangled multimodal representations make targeted forgetting especially challenging. We propose DIET, a principled retain-data-free unlearning method for VLMs that addresses these challenges by leveraging the geometry of hyperbolic space. The core idea is to push forget embeddings toward class-mismatched prototypes located at the boundary of the hyperbolic space. In hyperbolic geometry, points near the boundary become infinitely distant from interior points. As a result, moving forget embeddings to the boundary makes their influence on the model asymptotically negligible. To formalize this, we guide the forgetting process using the Busemann function, which quantifies directional distance to the boundary. We further develop an adaptive scheme based on optimal transport that selects mismatched prototypes for each forget embedding, enabling flexible unlearning dynamics. Extensive experiments on fine-grained datasets such as Flowers102, OxfordPets, and StanfordCars show that DIET achieves an average forget accuracy of 8.06%, while preserving 69.04% utility using only 16 samples per concept, significantly outperforming the best retain-free baselines with a 117.5% improvement in model utility, and showing competitive performance to retain-data baselines with only a 3.79% drop.
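
To make the geometric intuition concrete, here is a minimal sketch of the Busemann function on the Poincaré ball and of how minimizing it pushes an embedding toward a boundary ("ideal") prototype. The closed form below is the standard Poincaré-ball Busemann function; how DIET actually combines it with optimal-transport prototype selection is not shown.

```python
# Sketch: Busemann function B_xi(x) = log(||xi - x||^2 / (1 - ||x||^2)) for an
# ideal point xi on the boundary of the Poincaré ball (||xi|| = 1). It tends
# to -inf as x approaches xi, so minimizing it drives x toward the boundary.
import torch

def busemann(x, xi, eps=1e-6):
    num = ((xi - x) ** 2).sum(dim=-1)
    den = (1.0 - (x ** 2).sum(dim=-1)).clamp_min(eps)
    return torch.log(num / den)

x = torch.zeros(1, 2, requires_grad=True)   # forget embedding at the origin
xi = torch.tensor([[1.0, 0.0]])             # mismatched boundary prototype
loss = busemann(x, xi).mean()
loss.backward()
print(loss.item(), x.grad)                  # grad = [[-2., 0.]]: descent moves x toward xi
```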

AAAI Conference 2026 Conference Paper

MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models

  • Hoang Tran Vuong
  • Tue Le
  • Quyen Tran
  • Linh Ngo Van
  • Trung Le

Knowledge distillation (KD) is widely recognized as an effective approach for compressing large language models (LLMs). However, standard KD methods often falter when confronted with architectural or tokenization heterogeneity between teacher and student models, which creates a mismatch in their representations. While Optimal Transport (OT) provides a promising solution to align these representations, most OT-based methods rely on a single cost function, which is insufficient to capture the multifaceted discrepancies between models with distinct designs. To address this limitation, we introduce Multi-Cost Wasserstein Knowledge Distillation (MCW-KD), a novel framework that enhances KD by simultaneously optimizing several cost functions within a unified OT formulation. MCW-KD employs specific cost matrices to effectively align both the final hidden states and the output distributions of the models. We also provide a rigorous theoretical foundation for the proposed Multi-Cost Wasserstein Distance, ensuring both mathematical validity and computational tractability. Extensive experiments on instruction-following datasets demonstrate that MCW-KD significantly improves student model performance compared to state-of-the-art KD baselines, especially when teacher and student models have different tokenizers.
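
A minimal sketch of the multi-cost idea, assuming the simplest possible combination (a fixed weighted sum of cost matrices inside a single entropic OT solve); the paper's actual Multi-Cost Wasserstein formulation and its weighting are richer than this.

```python
# Sketch: combine several cost matrices into one OT problem via Sinkhorn.
import numpy as np

def sinkhorn(C, a, b, reg=0.1, iters=200):
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]          # transport plan

rng = np.random.default_rng(0)
C_hidden = rng.random((5, 7))                   # e.g. hidden-state mismatch
C_logit = rng.random((5, 7))                    # e.g. output-distribution mismatch
C = 0.5 * C_hidden + 0.5 * C_logit              # multi-cost combination
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)     # uniform marginals
P = sinkhorn(C, a, b)
print((P * C).sum())                            # Wasserstein-style KD loss
```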

NeurIPS Conference 2025 Conference Paper

Geometry-Aware Collaborative Multi-Solutions Optimizer for Model Fine-Tuning with Parameter Efficiency

  • Van-Anh Nguyen
  • Trung Le
  • Mehrtash Harandi
  • Ehsan Abbasnejad
  • Thanh-Toan Do
  • Dinh Phung

We propose a framework grounded in gradient flow theory and informed by geometric structure that provides multiple diverse solutions for a given task, ensuring collaborative results that enhance performance and adaptability across different tasks. This framework enables flexibility, allowing for efficient task-specific fine-tuning while preserving the knowledge of the pre-trained foundation models. Extensive experiments across transfer learning, few-shot learning, and domain generalization show that our proposed approach consistently outperforms existing Bayesian methods, delivering strong performance with affordable computational overhead and offering a practical solution by updating only a small subset of parameters.

NeurIPS Conference 2025 Conference Paper

SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding

  • Trung Le
  • Hao Fang
  • Jingyuan Li
  • Tung Nguyen
  • Lu Mi
  • Amy L Orsborn
  • Uygar Sümbül
  • Eli Shlizerman

Intracortical Brain-Computer Interfaces (iBCI) decode behavior from neural population activity to restore motor functions and communication abilities in individuals with motor impairments. A central challenge for long-term iBCI deployment is the nonstationarity of neural recordings, where the composition and tuning profiles of the recorded populations are unstable across recording sessions. Existing approaches attempt to address this issue by explicit alignment techniques; however, they rely on fixed neural identities and require test-time labels or parameter updates, limiting their generalization across sessions and imposing additional computational burden during deployment. In this work, we address the problem of cross-session nonstationarity in long-term iBCI systems and introduce SPINT - a Spatial Permutation-Invariant Neural Transformer framework for behavioral decoding that operates directly on unordered sets of neural units. Central to our approach is a novel context-dependent positional embedding scheme that dynamically infers unit-specific identities, enabling flexible generalization across recording sessions. SPINT supports inference on variable-size populations and allows few-shot, gradient-free adaptation using a small amount of unlabeled data from the test session. We evaluate SPINT on three multi-session datasets from the FALCON Benchmark, covering continuous motor decoding tasks in human and non-human primates. SPINT demonstrates robust cross-session generalization, outperforming existing zero-shot and few-shot unsupervised baselines while eliminating the need for test-time alignment and fine-tuning. Our work contributes an initial step toward a robust and scalable neural decoding framework for long-term iBCI applications.
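
One way to read "context-dependent positional embedding" is that each unit's identity vector is inferred from its own recent activity by a shared network, so that no fixed unit ordering is assumed. The sketch below shows only that permutation-equivariance property; the class name and sizes are hypothetical, and SPINT's actual scheme is more elaborate.

```python
# Sketch: per-unit identity embeddings computed by a shared network, so
# permuting the (unordered) units simply permutes their embeddings.
import torch
import torch.nn as nn

class ContextUnitEmbedding(nn.Module):
    def __init__(self, context_len, d_model):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(context_len, d_model),
                                 nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, activity):        # activity: (units, context_len)
        return self.net(activity)       # (units, d_model)

emb = ContextUnitEmbedding(context_len=50, d_model=32)
acts = torch.randn(96, 50)              # 96 unordered units, 50 time bins
ids = emb(acts)
perm = torch.randperm(96)
assert torch.allclose(emb(acts[perm]), ids[perm])  # order does not matter
```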

NeurIPS Conference 2025 Conference Paper

Token-Level Self-Play with Importance-Aware Guidance for Large Language Models

  • Tue Le
  • Hoang Tran
  • Quyen Tran
  • Linh Ngo
  • Mehrtash Harandi
  • Trung Le

Leveraging the power of Large Language Models (LLMs) through preference optimization is crucial for aligning model outputs with human values. Direct Preference Optimization (DPO) has recently emerged as a simple yet effective method by directly optimizing on preference data without the need for explicit reward models. However, DPO typically relies on human-labeled preference data, which can limit its scalability. Self-Play Fine-Tuning (SPIN) addresses this by allowing models to generate their own rejected samples, reducing the dependence on human annotations. Nevertheless, SPIN uniformly applies learning signals across all tokens, ignoring the fine-grained quality variations within responses. As the model improves, rejected samples increasingly contain high-quality tokens, making the uniform treatment of tokens suboptimal. In this paper, we propose SWIFT (Self-Play Weighted Fine-Tuning), a fine-grained self-refinement method that assigns token-level importance weights estimated from a stronger teacher model. Beyond alignment, we also demonstrate that SWIFT serves as an effective knowledge distillation strategy by using the teacher not for logits matching, but for reward-guided token weighting. Extensive experiments on diverse benchmarks and settings demonstrate that SWIFT consistently surpasses both existing alignment approaches and conventional knowledge distillation methods.
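
The mechanics of token-level weighting can be sketched as follows: per-token log-likelihoods are reweighted by scores derived from a teacher, instead of being treated uniformly. The mapping from teacher scores to weights here (a softmax) is an assumption for illustration; SWIFT's estimator and its self-play objective differ in detail.

```python
# Sketch: a token-weighted likelihood term, with weights from teacher scores.
import torch
import torch.nn.functional as F

def weighted_token_nll(student_logits, target_ids, teacher_scores):
    # student_logits: (T, V); target_ids: (T,); teacher_scores: (T,)
    logp = F.log_softmax(student_logits, dim=-1)
    token_logp = logp.gather(-1, target_ids[:, None]).squeeze(-1)   # (T,)
    w = torch.softmax(teacher_scores, dim=0)    # importance weights (assumed form)
    return -(w * token_logp).sum()              # uneven credit across tokens

logits = torch.randn(6, 100, requires_grad=True)
ids = torch.randint(0, 100, (6,))
scores = torch.randn(6)                         # hypothetical teacher rewards
weighted_token_nll(logits, ids, scores).backward()
```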

NeurIPS Conference 2025 Conference Paper

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

  • Haocheng Luo
  • Mehrtash Harandi
  • Dinh Phung
  • Trung Le

Sharpness-aware minimization (SAM) has emerged as a highly effective technique to improve model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, where SAM performance improves monotonically as the micro-batch size for computing perturbations decreases, a phenomenon critical for distributed training yet lacking rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate our theory and method.
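
For reference, here is a minimal sequential sketch of m-SAM, where the SAM perturbation is computed separately on each micro-batch of size m; real deployments parallelize the micro-batches across devices, and RW-SAM replaces this with sharpness-weighted sampling.

```python
# Sketch: one m-SAM step; each micro-batch gets its own ascent perturbation.
import torch

def msam_step(model, loss_fn, x, y, opt, m, rho=0.05):
    opt.zero_grad()
    for xm, ym in zip(x.split(m), y.split(m)):
        loss = loss_fn(model(xm), ym)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        scale = rho / (torch.cat([g.flatten() for g in grads]).norm() + 1e-12)
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.add_(g, alpha=scale.item())        # ascend to the perturbed point
        loss_fn(model(xm), ym).backward()            # accumulate SAM gradient
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.sub_(g, alpha=scale.item())        # undo the perturbation
    opt.step()

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
msam_step(model, torch.nn.functional.cross_entropy, x, y, opt, m=8)
```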

NeurIPS Conference 2024 Conference Paper

Enhancing Domain Adaptation through Prompt Gradient Alignment

  • Hoang Phan
  • Lam Tran
  • Quyen Tran
  • Trung Le

Prior Unsupervised Domain Adaptation (UDA) methods often aim to train a domain-invariant feature extractor, which may hinder the model from learning sufficiently discriminative features. To tackle this, a line of works based on prompt learning leverages the power of large-scale pre-trained vision-language models to learn both domain-invariant and specific features through a set of domain-agnostic and domain-specific learnable prompts. Those studies typically enforce invariant constraints on representation, output, or prompt space to learn such prompts. Differently, we cast UDA as a multiple-objective optimization problem in which each objective is represented by a domain loss. Under this new framework, we propose aligning per-objective gradients to foster consensus between them. Additionally, to prevent potential overfitting when fine-tuning this deep learning architecture, we penalize the norm of these gradients. To achieve these goals, we devise a practical gradient update procedure that can work under both single-source and multi-source UDA. Empirically, our method consistently surpasses other vision language model adaptation methods by a large margin on a wide range of benchmarks. The implementation is available at https://github.com/VietHoang1512/PGA.
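
As a rough sketch of "aligning per-objective gradients", the snippet below uses a PCGrad-style projection to remove the conflicting component between two domain-loss gradients before updating the prompt; PGA's actual procedure, including how it penalizes gradient norms, differs from this simplification.

```python
# Sketch: consensus update from two per-domain gradients on a prompt tensor.
import torch

def consensus_step(prompt, loss_a, loss_b, lr=1e-2):
    ga = torch.autograd.grad(loss_a, prompt, retain_graph=True)[0]
    gb = torch.autograd.grad(loss_b, prompt)[0]
    dot = (ga * gb).sum()
    if dot < 0:   # conflict: project out the component of gb opposing ga
        gb = gb - dot / (ga * ga).sum().clamp_min(1e-12) * ga
    with torch.no_grad():
        prompt -= lr * (ga + gb)

prompt = torch.randn(8, requires_grad=True)       # learnable prompt (toy)
consensus_step(prompt, (prompt ** 2).sum(), ((prompt - 1) ** 2).sum())
```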

NeurIPS Conference 2024 Conference Paper

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

  • Anh Bui
  • Long Vuong
  • Khanh Doan
  • Trung Le
  • Paul Montague
  • Tamas Abraham
  • Dinh Phung

Diffusion models excel at generating visually striking content from text but can inadvertently produce undesirable or harmful content when trained on unfiltered internet data. A practical solution is to selectively remove target concepts from the model, but this may impact the remaining concepts. Prior approaches have tried to balance this by introducing a loss term to preserve neutral content or a regularization term to minimize changes in the model parameters, yet resolving this trade-off remains challenging. In this work, we propose to identify and preserve the concepts most affected by parameter changes, termed adversarial concepts. This approach ensures stable erasure with minimal impact on the other concepts. We demonstrate the effectiveness of our method using the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods in eliminating unwanted content while maintaining the integrity of other unrelated elements. Our code is available at https://github.com/tuananhbui89/Erasing-Adversarial-Preservation.
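
A very small sketch of the "adversarial concepts" selection step as the abstract describes it: among candidate concepts, keep those whose model outputs change most under a parameter update, then preserve them while erasing the target. The ranking criterion and names below are assumptions.

```python
# Sketch: rank concepts by how much a parameter change moves their outputs.
import torch
import torch.nn as nn

def most_affected(model_before, model_after, concept_embs, k=2):
    with torch.no_grad():
        change = model_after(concept_embs) - model_before(concept_embs)
    return change.flatten(1).norm(dim=1).topk(k).indices   # concepts to preserve

before, after = nn.Linear(8, 8), nn.Linear(8, 8)   # stand-ins for the model
concepts = torch.randn(10, 8)                      # candidate concept embeddings
print(most_affected(before, after, concepts))
```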

NeurIPS Conference 2024 Conference Paper

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

  • Haocheng Luo
  • Tuan Truong
  • Tung Pham
  • Mehrtash Harandi
  • Dinh Phung
  • Trung Le

Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM’s training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM’s effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, which limits SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
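
The ingredient Eigen-SAM needs, an estimate of the top Hessian eigenvector, can be obtained by power iteration on Hessian-vector products, as sketched below on a toy quadratic; how the full algorithm uses this vector to steer the SAM perturbation is not shown.

```python
# Sketch: leading Hessian eigenvector via power iteration with HVPs.
import torch

def top_eigenvector(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_g = torch.cat([g.flatten() for g in grads])
    v = torch.randn_like(flat_g)
    v /= v.norm()
    for _ in range(iters):
        hv = torch.autograd.grad(flat_g @ v, params, retain_graph=True)
        v = torch.cat([h.flatten() for h in hv])
        v /= v.norm() + 1e-12
    return v   # approximate top eigenvector (up to sign)

w = torch.tensor([1.0, 1.0], requires_grad=True)
loss = 0.5 * (w[0] ** 2 + 10 * w[1] ** 2)      # Hessian = diag(1, 10)
print(top_eigenvector(loss, [w]))              # close to (0, ±1)
```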

NeurIPS Conference 2023 Conference Paper

AMAG: Additive, Multiplicative and Adaptive Graph Neural Network For Forecasting Neuron Activity

  • Jingyuan Li
  • Leo Scholl
  • Trung Le
  • Pavithra Rajeswaran
  • Amy Orsborn
  • Eli Shlizerman

Latent Variable Models (LVMs) propose to model the dynamics of neural populations by capturing low-dimensional structures that represent features involved in neural activity. Recent LVMs are based on deep learning methodology where a deep neural network is trained to reconstruct the same neural activity given as input and, as a result, to build the latent representation. Without taking past or future activity into account, such a task is non-causal. In contrast, the task of forecasting neural activity based on given input extends the reconstruction task. LVMs that are trained on such a task could potentially capture temporal causality constraints within their latent representations. Forecasting has received less attention than reconstruction due to recording challenges such as limited neural measurements and trials. In this work, we address modeling neural population dynamics via the forecasting task and improve forecasting performance by including a prior that models pairwise neural unit interaction as a multivariate dynamic system. Our proposed model---Additive, Multiplicative, and Adaptive Graph Neural Network (AMAG)---leverages additive and multiplicative message-passing operations analogous to the interactions in neuronal systems and adaptively learns the interaction among neural units to forecast their future activity. We demonstrate the advantage of AMAG compared to non-GNN-based methods on synthetic data and multiple modalities of neural recordings (field potentials from penetrating electrodes or surface-level micro-electrocorticography) from four rhesus macaques. Our results show the ability of AMAG to recover ground-truth spatial interactions and to yield estimates of the future dynamics of the neural population.
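
The additive and multiplicative message-passing operations can be caricatured in a few lines: a learnable interaction matrix contributes additive messages, and a second matrix gates each unit's state multiplicatively by its neighbors. This is an abstraction of AMAG's layers, not the full architecture.

```python
# Sketch: additive + multiplicative message passing over a learned graph.
import torch
import torch.nn as nn

class AddMulGraphLayer(nn.Module):
    def __init__(self, n_units, d):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(n_units, n_units))  # additive graph
        self.M = nn.Parameter(torch.zeros(n_units, n_units))  # multiplicative graph
        self.proj = nn.Linear(d, d)

    def forward(self, h):                 # h: (n_units, d) unit states
        add_msg = self.A @ h              # sum_j A_ij * h_j
        mul_msg = (self.M @ h) * h        # neighbor-gated interaction
        return h + self.proj(add_msg + mul_msg)

layer = AddMulGraphLayer(n_units=30, d=16)
print(layer(torch.randn(30, 16)).shape)   # torch.Size([30, 16])
```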

NeurIPS Conference 2023 Conference Paper

Flat Seeking Bayesian Neural Networks

  • Van-Anh Nguyen
  • Tung-Long Vuong
  • Hoang Phan
  • Thanh-Toan Do
  • Dinh Phung
  • Trung Le

Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution based on observed data. The model sampled from the posterior distribution can be used for providing ensemble predictions and quantifying prediction uncertainty. It is well-known that deep learning models with lower sharpness have better generalization ability. However, existing posterior inferences are not aware of sharpness/flatness in terms of formulation, possibly leading to high sharpness for the models sampled from them. In this paper, we develop theories, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness, hence possibly possessing higher generalization ability. We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest.

TMLR Journal 2023 Journal Article

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

  • Anh Tuan Bui
  • Trung Le
  • He Zhao
  • Quan Hung Tran
  • Paul Montague
  • Dinh Phung

Deep learning models, even state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve a model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples that satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses for simultaneously attacking multiple models). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation to achieve multiple objectives/goals simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without caring whether an objective/goal has been achieved yet. This leads to wasted effort on further improving the goal-achieved tasks, while putting less focus on the goal-unachieved tasks. In this paper, we propose Task Oriented MOO to address this issue, in the context where we can explicitly define goal achievement for a task. Our principle is to only maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments for our Task Oriented MOO on various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach.
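
The principle is easy to state in code: gradients of tasks whose goals are already achieved are dropped from (or down-weighted in) the combined direction, so effort concentrates on unachieved goals. The sketch below uses the crudest version, a hard mask with equal weights; the paper's scheme also maintains the achieved tasks rather than ignoring them outright.

```python
# Sketch: combine per-task gradients, focusing on goal-unachieved tasks.
import numpy as np

def task_oriented_direction(grads, losses, goals):
    # grads: (k, d); a task counts as unachieved while its loss exceeds its goal
    active = np.array([l > g for l, g in zip(losses, goals)])
    if not active.any():
        return np.zeros(grads.shape[1])
    return grads[active].mean(axis=0)

grads = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
losses, goals = [0.1, 0.9, 0.8], [0.2, 0.5, 0.5]   # task 0 already achieved
print(task_oriented_direction(grads, losses, goals))  # averages tasks 1 and 2 only
```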

NeurIPS Conference 2023 Conference Paper

Learning Time-Invariant Representations for Individual Neurons from Population Dynamics

  • Lu Mi
  • Trung Le
  • Tianxing He
  • Eli Shlizerman
  • Uygar Sümbül

Neurons can display highly variable dynamics. While such variability presumably supports the wide range of behaviors generated by the organism, their gene expressions are relatively stable in the adult brain. This suggests that neuronal activity is a combination of its time-invariant identity and the inputs the neuron receives from the rest of the circuit. Here, we propose a self-supervised learning based method to assign time-invariant representations to individual neurons based on a permutation- and population-size-invariant summary of population recordings. We fit dynamical models to neuronal activity to learn a representation by considering the activity of both the individual and the neighboring population. Our self-supervised approach and use of implicit representations enable robust inference against imperfections such as partial overlap of neurons across sessions, trial-to-trial variability, and limited availability of molecular (transcriptomic) labels for downstream supervised tasks. We demonstrate our method on a public multimodal dataset of mouse cortical neuronal activity and transcriptomic labels. We report a >35% improvement in predicting the transcriptomic subclass identity and a >20% improvement in predicting class identity with respect to the state of the art.

NeurIPS Conference 2023 Conference Paper

Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning

  • Van Cuong Pham
  • Cuong Nguyen
  • Trung Le
  • Dinh Phung
  • Gustavo Carneiro
  • Thanh-Toan Do

Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNN performance through deep mutual learning. The proposed approach aims to increase diversity in both network parameter distributions and feature distributions, promoting peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in classification accuracy, negative log-likelihood, and expected calibration error when compared to traditional mutual learning for BNNs.

NeurIPS Conference 2023 Conference Paper

Optimal Transport Model Distributional Robustness

  • Van-Anh Nguyen
  • Trung Le
  • Anh Bui
  • Thanh-Toan Do
  • Dinh Phung

Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model distribution within a Wasserstein ball centered on a given model distribution that maximizes the loss. We have developed theories that enable us to learn the optimal robust center model distribution. Interestingly, our developed theories allow us to flexibly incorporate the concept of sharpness awareness into training, whether it's a single model, ensemble models, or Bayesian Neural Networks, by considering specific forms of the center model distribution. These forms include a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that Sharpness-Aware Minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be seen as a probabilistic extension of SAM. To validate the effectiveness of our framework in the aforementioned settings, we conducted extensive experiments, and the results reveal remarkable improvements compared to the baselines.
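
In schematic form, the model-space robustness problem the abstract describes can be written as below (illustrative notation; the paper's precise formulation, including the Wasserstein cost, carries more structure):

```latex
\min_{\mathbb{P}} \;
\max_{\mathbb{Q} \,:\, \mathcal{W}(\mathbb{Q}, \mathbb{P}) \le \rho} \;
\mathbb{E}_{\theta \sim \mathbb{Q}}\big[\mathcal{L}(\theta)\big]
% Taking \mathbb{P} = \delta_{\theta_0}, a Dirac delta over a single model,
% recovers a SAM-style sharpness-aware objective, per the abstract.
```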

NeurIPS Conference 2022 Conference Paper

STNDT: Modeling Neural Population Activity with Spatiotemporal Transformers

  • Trung Le
  • Eli Shlizerman

Modeling neural population dynamics underlying noisy single-trial spiking activities is essential for relating neural observation and behavior. A recent non-recurrent method - Neural Data Transformers (NDT) - has shown great success in capturing neural dynamics with low inference latency without an explicit dynamical model. However, NDT focuses on modeling the temporal evolution of the population activity while neglecting the rich covariation between individual neurons. In this paper, we introduce the SpatioTemporal Neural Data Transformer (STNDT), an NDT-based architecture that explicitly models responses of individual neurons in the population across time and space to uncover their underlying firing rates. In addition, we propose a contrastive learning loss that works in accordance with the mask modeling objective to further improve the predictive performance. We show that our model achieves state-of-the-art performance at the ensemble level in estimating neural activities across four neural datasets, demonstrating its capability to capture autonomous and non-autonomous dynamics spanning different cortical regions while being completely agnostic to the specific behaviors at hand. Furthermore, STNDT's spatial attention mechanism reveals consistently important subsets of neurons that play a vital role in driving the response of the entire population, providing interpretability and key insights into how the population of neurons performs computation.
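
A compact way to see "explicitly models responses across time and space" is a block that attends alternately along the temporal and the neuron axes of the population tensor, as sketched below; the actual STNDT block (and its contrastive loss) is more involved.

```python
# Sketch: alternating attention over time and neuron axes.
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, d, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):                    # x: (time, neurons, d)
        xt = x.permute(1, 0, 2)              # (neurons, time, d): attend over time
        xt, _ = self.time_attn(xt, xt, xt)
        xs = xt.permute(1, 0, 2)             # (time, neurons, d): attend over neurons
        xs, _ = self.space_attn(xs, xs, xs)
        return xs

blk = SpatioTemporalAttention(d=32)
print(blk(torch.randn(100, 80, 32)).shape)   # torch.Size([100, 80, 32])
```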

NeurIPS Conference 2022 Conference Paper

Stochastic Multiple Target Sampling Gradient Descent

  • Hoang Phan
  • Ngoc Tran
  • Trung Le
  • Toan Tran
  • Nhat Ho
  • Dinh Phung

Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of this single-objective optimization problem. A natural question then arises: "Can we derive a probabilistic version of multi-objective optimization?" To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), enabling us to sample from multiple unnormalized target distributions. Specifically, our MT-SGD conducts a flow of intermediate distributions gradually orienting to multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization, as expected. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach to multi-task learning.
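
A toy rendering of the idea: run SVGD with a direction contributed by each target and combine the directions (here with equal weights, where MT-SGD instead uses a multiple-gradient-descent combination). With two Gaussian targets at -2 and +2, the particles settle in the joint high-likelihood region around 0.

```python
# Sketch: SVGD particles driven by two target score functions at once.
import numpy as np

def rbf_kernel(X, h=1.0):
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[..., None]          # grad of K in its first slot
    return K, gradK

def svgd_direction(X, score):
    K, gradK = rbf_kernel(X)
    return (K @ score(X) + gradK.sum(axis=0)) / len(X)

scores = [lambda X: -(X + 2.0), lambda X: -(X - 2.0)]   # grad log N(-2, I), N(+2, I)
X = np.random.default_rng(0).normal(size=(50, 1))
for _ in range(200):
    X += 0.1 * sum(svgd_direction(X, s) for s in scores) / len(scores)
print(X.mean())   # near 0, the jointly likely region
```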

AAAI Conference 2021 Conference Paper

Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

  • Anh Tuan Bui
  • Trung Le
  • He Zhao
  • Paul Montague
  • Olivier deVel
  • Tamas Abraham
  • Dinh Phung

Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy for collaboration among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, which help us quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines while at the same time detecting a wide range of adversarial examples with nearly perfect accuracy. Our code is available at: https://github.com/tuananhbui89/Crossing-Collaborative-Ensemble.

NeurIPS Conference 2021 Conference Paper

On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources

  • Trung Phung
  • Trung Le
  • Tung-Long Vuong
  • Toan Tran
  • Anh Tran
  • Hung Bui
  • Dinh Phung

Domain adaptation (DA) benefits from rigorous theoretical works that study its insightful characteristics and various aspects, e.g., learning domain-invariant representations and its trade-off. However, this seems not to be the case for the multiple-source DA and domain generalization (DG) settings, which are remarkably more complicated and sophisticated due to the involvement of multiple source domains and the potential unavailability of the target domain during training. In this paper, we develop novel upper bounds for the target general loss, which lead us to define two kinds of domain-invariant representations. We further study the pros and cons, as well as the trade-offs, of enforcing learning each domain-invariant representation. Finally, we conduct experiments to inspect the trade-off of these representations, to offer practical hints regarding how to use them in practice, and to explore other interesting properties of our developed theory.

IJCAI Conference 2021 Conference Paper

TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport

  • Tuan Nguyen
  • Trung Le
  • Nhan Dam
  • Quan Hung Tran
  • Truyen Nguyen
  • Dinh Phung

Using the principle of imitation learning and the theory of optimal transport, we propose in this paper a novel model for unsupervised domain adaptation named Teacher Imitation Domain Adaptation with Optimal Transport (TIDOT). Our model includes two cooperative agents: a teacher and a student. The former agent is trained to be an expert on labeled data in the source domain, whilst the latter aims to work with unlabeled data in the target domain. More specifically, optimal transport is applied to quantify the sum of the distance between the embedded distributions of the source and target data in the joint space and the distance between the predictive distributions of the two agents; by minimizing this quantity, TIDOT can mitigate not only the data shift but also the label shift. Comprehensive empirical studies show that TIDOT surpasses existing state-of-the-art performance on benchmark datasets.

IJCAI Conference 2019 Conference Paper

Learning Generative Adversarial Networks from Multiple Data Sources

  • Trung Le
  • Quan Hoang
  • Hung Vu
  • Tu Dinh Nguyen
  • Hung Bui
  • Dinh Phung

Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs' formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN's effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.

AAAI Conference 2019 Conference Paper

Robust Anomaly Detection in Videos Using Multilevel Representations

  • Hung Vu
  • Tu Dinh Nguyen
  • Trung Le
  • Wei Luo
  • Dinh Phung

Detecting anomalies in surveillance videos has long been an important but unsolved problem. In particular, many existing solutions are overly sensitive to (often ephemeral) visual artifacts in the raw video data, resulting in false positives and fragmented detection regions. To overcome such sensitivity and to capture true anomalies with semantic significance, one natural idea is to seek validation from abstract representations of the videos. This paper introduces a framework of robust anomaly detection using multilevel representations of both intensity and motion data. The framework consists of three main components: 1) representation learning using Denoising Autoencoders, 2) level-wise representation generation using Conditional Generative Adversarial Networks, and 3) consolidating anomalous regions detected at each representation level. Our proposed multilevel detector shows a significant improvement in pixel-level Equal Error Rate, namely 11.35%, 12.32%, and 4.31% improvements on the UCSD Ped 1, UCSD Ped 2, and Avenue datasets respectively. In addition, the model allowed us to detect mislabeled anomalies in the UCSD Ped 1 dataset.

IJCAI Conference 2019 Conference Paper

Three-Player Wasserstein GAN via Amortised Duality

  • Nhan Dam
  • Quan Hoang
  • Trung Le
  • Tu Dinh Nguyen
  • Hung Bui
  • Dinh Phung

We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with three players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be trained reasonably effectively via a simple alternating gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method.

IJCAI Conference 2018 Conference Paper

Geometric Enclosing Networks

  • Trung Le
  • Hung Vu
  • Tu Dinh Nguyen
  • Dinh Phung

Training models to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator $G(\mathbf{z})$ in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths, and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data.
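
The enclosing-ball principle admits a very short sketch: features of both real and generated samples pay a hinge penalty for leaving a shared sphere of center c and radius R, while R itself is kept small. This is an assumption-level rendering; GEN's actual optimization and bijectivity machinery are not shown.

```python
# Sketch: a minimal-enclosing-ball style loss over real and generated features.
import torch

def enclosing_ball_loss(feat_real, feat_gen, c, R):
    def outside(f):   # distance beyond the sphere surface, zero inside
        return torch.relu((f - c).norm(dim=1) - R).mean()
    return outside(feat_real) + outside(feat_gen) + R ** 2   # keep the ball tight

c = torch.zeros(8, requires_grad=True)
R = torch.tensor(1.0, requires_grad=True)
loss = enclosing_ball_loss(torch.randn(32, 8), torch.randn(32, 8), c, R)
loss.backward()
```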

JMLR Journal 2017 Journal Article

Approximation Vector Machines for Large-scale Online Learning

  • Trung Le
  • Tu Dinh Nguyen
  • Vu Nguyen
  • Dinh Phung

One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity -- a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard its risk of compromising the performance. In an online setting context, when an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis for the common loss functions including Hinge, smooth Hinge, and Logistic (i.e., for the classification task) and $\ell_{1}$, $\ell_{2}$, and $\varepsilon$-insensitive (i.e., for the regression task) to characterize the gap between the approximation and optimal solutions. This gap crucially depends on two key factors: the frequency of approximation (i.e., how frequently the approximation operation takes place) and the predefined threshold. We conducted extensive experiments for classification and regression tasks in batch and online modes using several benchmark datasets. The quantitative results show that our proposed AVM obtained comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to maintain the model size.
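
The approximation step has a simple shape: each incoming instance is snapped to an existing "core" point within a threshold, so the model grows only when the stream visits a genuinely new region. The sketch below shows that budgeting effect alone, not AVM's learning updates or risk analysis.

```python
# Sketch: approximate each arriving instance by a nearby stored point.
import numpy as np

def approximate(x, cores, delta):
    if cores:
        d = np.linalg.norm(np.array(cores) - x, axis=1)
        j = int(d.argmin())
        if d[j] <= delta:
            return cores[j]          # reuse the nearby core point
    cores.append(x)                  # otherwise x founds a new core point
    return x

cores = []
for x in np.random.default_rng(0).normal(size=(1000, 2)):
    approximate(x, cores, delta=0.5)
print(len(cores))                    # far fewer core points than 1000 instances
```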

IJCAI Conference 2017 Conference Paper

Discriminative Bayesian Nonparametric Clustering

  • Vu Nguyen
  • Dinh Phung
  • Trung Le
  • Hung Bui

We propose a general framework for discriminative Bayesian nonparametric clustering to promote the inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models.

NeurIPS Conference 2017 Conference Paper

Dual Discriminator Generative Adversarial Nets

  • Tu Nguyen
  • Trung Le
  • Hung Vu
  • Dinh Phung

We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other discriminator, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
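
Schematically, a dual-discriminator objective of the kind the abstract describes can be written as below; the notation (and the role of the weights α, β) is illustrative and may not match the paper exactly.

```latex
\min_{G}\; \max_{D_1, D_2}\;
\alpha\,\mathbb{E}_{x \sim P_{\mathrm{data}}}\!\big[\log D_1(x)\big]
+ \mathbb{E}_{z \sim P_z}\!\big[-D_1(G(z))\big]
+ \mathbb{E}_{x \sim P_{\mathrm{data}}}\!\big[-D_2(x)\big]
+ \beta\,\mathbb{E}_{z \sim P_z}\!\big[\log D_2(G(z))\big]
% With optimal D_1, D_2, minimizing over G corresponds (per the abstract) to
% minimizing a combination of the KL and reverse-KL divergences between the
% data and generator distributions.
```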

IJCAI Conference 2017 Conference Paper

Large-scale Online Kernel Learning with Random Feature Reparameterization

  • Tu Dinh Nguyen
  • Trung Le
  • Hung Bui
  • Dinh Phung

A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, allowing the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. the data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called "reparameterization trick" [Kingma et al., 2014] of lifting the source of randomness of the Fourier components to another space that can be independently sampled, so that stochastic gradients of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
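
The reparameterization trick for random Fourier features fits in a few lines: frequencies are written as a deterministic transform of fixed noise (omega = eps / sigma for an RBF kernel), so gradients reach the kernel parameters through the feature map. The sketch below assumes an RBF kernel with a learnable per-dimension lengthscale.

```python
# Sketch: reparameterized random Fourier features for an RBF kernel.
import torch

D = 128                                          # number of random frequencies
eps = torch.randn(D, 2)                          # fixed source of randomness
log_sigma = torch.zeros(2, requires_grad=True)   # learnable lengthscales

def features(x):                                 # x: (n, 2)
    omega = eps * torch.exp(-log_sigma)          # omega = eps / sigma
    z = x @ omega.T
    return torch.cat([torch.cos(z), torch.sin(z)], dim=1) / D ** 0.5

x = torch.randn(16, 2)
gram = features(x) @ features(x).T               # approximates the RBF Gram matrix
gram.sum().backward()                            # gradients flow to log_sigma
print(log_sigma.grad)
```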

NeurIPS Conference 2016 Conference Paper

Dual Space Gradient Descent for Online Learning

  • Trung Le
  • Tu Nguyen
  • Vu Nguyen
  • Dinh Phung

One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs to use a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in the computational cost. To address all of these aforementioned challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct, and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and conduct extensive experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines.