Arrow Research search
Back to ICLR

ICLR 2025

Diffusion-based Neural Network Weights Generation

Conference Paper Accept (Poster) Artificial Intelligence ยท Machine Learning

Abstract

Transfer learning is a cornerstone of modern deep learning, yet it remains constrained by challenges in model selection and the overhead of extensive model storage. In this work, we present Diffusion-based Neural Network Weights Generation, D2NWG, a novel framework that leverages diffusion processes to synthesize task-specific network weights. By modeling the distribution of weights from a diverse ensemble of pretrained models and conditioning the generation process on dataset characteristics, task descriptions, and architectural specifications, D2NWG circumvents the need for storing and searching through massive model repositories. We evaluate D2NWG across multiple experimental settings. On in-distribution tasks, our framework achieves performance that is on par with or superior to conventional pretrained models, while also serving as an effective initialization strategy for novel domains, resulting in faster convergence and a 6\% improvement in few-shot learning scenarios. Extensive ablation studies further indicate that our approach scales robustly with increased diversity and volume of pretrained models. Moreover, D2NWG demonstrates significant promise for large language model applications. In evaluations on the OpenLM leaderboard, our method improved LLaMA-3-2-1B-Instruct performance by 3\% on challenging mathematical reasoning tasks, with a consistent gain of 0.36\% across a range of benchmarks. These findings establish D2NWG as a versatile and powerful framework for neural network weight generation, offering a scalable solution to the limitations of traditional transfer learning.

Authors

Keywords

  • generative hyper-representation learning
  • diffusion model
  • neural network weights generation
  • parameters generation
  • hypernetworks

Context

Venue
International Conference on Learning Representations
Archive span
2013-2025
Indexed papers
10294
Paper id
349765114669089827