Arrow Research search

Author name cluster

Nico Daheim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers

3

ICLR Conference 2025 Conference Paper

Uncertainty-Aware Decoding with Minimum Bayes Risk

  • Nico Daheim
  • Clara Meister
  • Thomas Möllenhoff
  • Iryna Gurevych

Despite their outstanding performance in the majority of scenarios, contemporary language models still occasionally generate undesirable outputs, for example, hallucinated text. While such behaviors have previously been linked to uncertainty, there is a notable lack of methods that actively consider uncertainty during text generation. In this work, we show how Minimum Bayes Risk (MBR) decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method. In short, we account for model uncertainty during decoding by incorporating a posterior over model parameters into MBR’s computation of expected risk. We show that this modified expected risk is useful for both choosing outputs and deciding when to abstain from generation and can provide improvements without incurring overhead. We benchmark different methods for learning posteriors and show that performance improves with prediction diversity. We release our code publicly.

ICLR Conference 2024 Conference Paper

Model Merging by Uncertainty-Based Gradient Matching

  • Nico Daheim
  • Thomas Möllenhoff
  • Edoardo M. Ponti
  • Iryna Gurevych
  • Mohammad Emtiyaz Khan

Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.

ICML Conference 2024 Conference Paper

Variational Learning is Effective for Large Deep Networks

  • Yuesong Shen
  • Nico Daheim
  • Bai Cong
  • Peter Nickl
  • Gian Maria Marconi
  • Clement Bazan
  • Rio Yokota
  • Iryna Gurevych

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https: //github. com/team-approx-bayes/ivon.