
AAAI 2026

Beyond Neuron-Level Sparsity: Achieving Faithful and Interpretable LLMs with Mixture of Decoders

Conference Paper · New Faculty Highlights · Artificial Intelligence

Abstract

As large language models (LLMs) scale, ensuring interpretability and privacy becomes critical. This talk addresses these interconnected challenges through novel approaches to model specialization and safety. First, we tackle the dense, distributed nature of LLM representations by casting Mixture-of-Experts (MoE) as a tensor decomposition, enabling specialized experts in a factorized space. Second, we argue that current neuron-level sparsity methods impose a severe accuracy-sparsity trade-off, and we propose a paradigm shift to layer-level sparsity with the Mixture of Decoders (MxD). We explain how MxD uses tensor factorization to expand dense layers into thousands of specialized, full-rank sublayers, and demonstrate that it significantly outperforms alternatives in preserving model faithfulness and performance across LLMs of up to 3B parameters. Finally, we address privacy in open-weight models by proposing a scalable, certifiable algorithm that induces maximal uncertainty on protected instances, proving tight bounds that characterize the resulting privacy-utility trade-off.
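The core idea described above, expanding one dense layer into many expert sublayers that share tensor factors so each expert stays full-rank without materializing a separate weight matrix, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual formulation: the parameterization (shared factors `U`, `V` with per-expert diagonal scalings `C`) and the softmax router `G` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, E = 8, 16  # hidden size and number of expert sublayers (illustrative)

# Shared tensor factors: expert e implicitly has weight W_e = U @ diag(C[e]) @ V,
# a full-rank d x d matrix (hypothetical factorized parameterization).
U = rng.standard_normal((d, d)) / np.sqrt(d)
V = rng.standard_normal((d, d)) / np.sqrt(d)
C = rng.standard_normal((E, d))   # per-expert diagonal scalings
G = rng.standard_normal((E, d))   # router weights (assumed softmax routing)

def mxd_layer(x):
    # Route the input: softmax over expert logits.
    logits = G @ x
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()
    # Mixture of full-rank sublayers, computed without building any W_e:
    #   sum_e g_e * U diag(C_e) V x  =  U @ ((gates @ C) * (V @ x))
    return U @ ((gates @ C) * (V @ x))

x = rng.standard_normal(d)
y = mxd_layer(x)

# Explicit (slow) mixture over materialized expert weights, for verification.
logits = G @ x
g = np.exp(logits - logits.max()); g /= g.sum()
y_ref = sum(g[e] * (U @ np.diag(C[e]) @ V @ x) for e in range(E))
assert np.allclose(y, y_ref)
```

The point of the factorized form is the cost: the gated path needs only three matrix-vector products regardless of `E`, which is what makes scaling to thousands of sublayers plausible.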


Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28,718
Paper id: 270200831943272891