Arrow Research search

Author name cluster

Michael McCabe

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

NeurIPS 2025 · Conference Paper

AION-1: Omnimodal Foundation Model for Astronomical Sciences

  • Liam Parker
  • Francois Lanusse
  • Jeff Shen
  • Ollie Liu
  • Tom Hehir
  • Leopoldo Sarra
  • Lucas Meyer
  • Micah Bowles

While foundation models have shown promise across a variety of fields, astronomy lacks a unified framework for joint modeling across its highly diverse data modalities. In this paper, we present AION-1, the first family of large-scale multimodal foundation models for astronomy. AION-1 enables arbitrary transformations between heterogeneous data types using a two-stage architecture: modality-specific tokenization followed by transformer-based masked modeling of cross-modal token sequences. Trained on over 200M astronomical objects, AION-1 demonstrates strong performance across regression, classification, generation, and object retrieval tasks. Beyond astronomy, AION-1 provides a scalable blueprint for multimodal scientific foundation models that can seamlessly integrate heterogeneous combinations of real-world observations. Our model release is entirely open source, including the dataset, training scripts, and weights.
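
As a rough illustration of the two-stage pattern described above (not AION-1's actual code: the tokenizers, vocabulary size, and model dimensions below are invented stand-ins), one might tokenize each modality separately and then train a transformer with a masked-token objective over the concatenated cross-modal sequence:

```python
# Minimal sketch: modality-specific tokenization, then masked modeling over
# the joint token sequence. All names and sizes here are illustrative.
import torch
import torch.nn as nn

VOCAB, D = 1024, 256

class ToyTokenizer(nn.Module):
    """Stand-in for a modality-specific tokenizer (e.g. a VQ encoder)."""
    def __init__(self, in_dim, n_tokens):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_tokens * VOCAB)
        self.n_tokens = n_tokens
    def forward(self, x):                        # x: (batch, in_dim)
        logits = self.proj(x).view(x.size(0), self.n_tokens, VOCAB)
        return logits.argmax(-1)                 # discrete token ids

class MaskedMultimodalModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, D)  # +1 for the [MASK] token
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, VOCAB)
    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

# Tokenize two toy "modalities" (say, an image patch and a spectrum) and
# train on predicting randomly masked tokens across the joint sequence.
tok_img, tok_spec = ToyTokenizer(64, 8), ToyTokenizer(32, 4)
img, spec = torch.randn(2, 64), torch.randn(2, 32)
tokens = torch.cat([tok_img(img), tok_spec(spec)], dim=1)  # cross-modal sequence
mask = torch.rand(tokens.shape) < 0.3
inp = tokens.masked_fill(mask, VOCAB)                      # replace with [MASK]
logits = MaskedMultimodalModel()(inp)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
```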

NeurIPS 2025 · Conference Paper

Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

  • François Rozet
  • Ruben Ohana
  • Michael McCabe
  • Gilles Louppe
  • Francois Lanusse
  • Shirley Ho

The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In image and video generation, this drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems, and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, ranging from architectures to optimizers, that we found critical for training latent-space emulators.
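
A minimal sketch of the latent-emulation setup, assuming a toy linear autoencoder and a simple interpolation-style noise schedule (both placeholders; the paper's actual architectures and schedule may differ):

```python
# Hedged sketch: an autoencoder compresses the physical state, and a denoiser
# is trained in that latent space, conditioned on the previous latent step.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, n=4096, z=64):             # ~64x compression, illustrative
        super().__init__()
        self.enc, self.dec = nn.Linear(n, z), nn.Linear(z, n)

class LatentDenoiser(nn.Module):
    def __init__(self, z=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * z + 1, 256), nn.SiLU(), nn.Linear(256, z))
    def forward(self, z_noisy, z_prev, t):
        return self.net(torch.cat([z_noisy, z_prev, t], dim=-1))

ae, denoiser = AE(), LatentDenoiser()
state_prev, state_next = torch.randn(8, 4096), torch.randn(8, 4096)
z_prev, z_next = ae.enc(state_prev), ae.enc(state_next)

t = torch.rand(8, 1)                               # diffusion time in [0, 1]
noise = torch.randn_like(z_next)
z_noisy = (1 - t) * z_next + t * noise             # toy interpolation schedule
pred = denoiser(z_noisy, z_prev, t)
loss = nn.functional.mse_loss(pred, noise)         # noise-prediction objective
# At inference, the denoiser is iterated from pure noise and the result is
# decoded back to pixel space with ae.dec, so the diffusion cost stays latent.
```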

NeurIPS 2025 · Conference Paper

Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme

  • Rudy Morel
  • Francesco Ramunno
  • Jeff Shen
  • Alberto Bietti
  • Kyunghyun Cho
  • Miles Cranmer
  • Siavash Golkar
  • Olexandr Gugnin

Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. However, in many settings, the information available at a given time represents only a small fraction of what is needed to predict future states, either due to measurement uncertainty or because only part of the state can be observed. This is true, for example, in solar physics, where we can observe the Sun's surface and atmosphere, but its evolution is driven by internal processes for which we lack direct measurements. In this paper, we tackle the probabilistic prediction of partially observable, long-memory dynamical systems, with applications to solar dynamics and the evolution of active regions. We show that standard inference schemes, such as autoregressive rollouts, fail to capture long-range dependencies in the data, largely because they do not integrate past information effectively. To overcome this, we propose a multiscale inference scheme for diffusion models, tailored to physical processes. Our method generates trajectories that are temporally fine-grained near the present and coarser farther away, which enables capturing long-range temporal dependencies without increasing computational cost. When integrated into a diffusion model, our inference scheme significantly reduces the bias of the predicted distributions and improves rollout stability.
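
The multiscale time grid can be illustrated in a few lines; the function name and geometric growth factor below are hypothetical, chosen only to show how gaps stay small near the present and grow farther out:

```python
# Illustrative only: a time grid that is dense near t=0 and geometrically
# coarser later, so one sampled trajectory spans a long horizon at fixed cost.
import numpy as np

def multiscale_grid(horizon, n_steps, growth=1.5):
    """Return n_steps timestamps covering `horizon`, dense near t=0."""
    gaps = growth ** np.arange(n_steps)        # geometrically growing gaps
    times = np.cumsum(gaps)
    return times / times[-1] * horizon

grid = multiscale_grid(horizon=512, n_steps=10)
print(np.round(grid, 1))  # early timestamps a few units apart, late ones ~170 apart
```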

NeurIPS 2025 · Conference Paper

SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation

  • Rey Koki
  • Michael McCabe
  • Dhruv Kedar
  • Josh Myers-Dean
  • Annabel Wade
  • Jebb Stewart
  • Christina Kumler-Bonfanti
  • Jed Brown

The global rise in wildfire frequency and intensity over the past decade underscores the need for improved fire monitoring techniques. To advance deep learning research on wildfire detection and its associated human health impacts, we introduce SmokeViz, a large-scale machine learning dataset of smoke plumes in satellite imagery. The dataset is derived from expert annotations created by smoke analysts at the National Oceanic and Atmospheric Administration, which provide coarse temporal and spatial approximations of smoke presence. To enhance annotation precision, we propose pseudo-label dimension reduction (PLDR), a generalizable method that applies pseudo-labeling to refine datasets with mismatched temporal and/or spatial resolutions. Unlike typical pseudo-labeling applications that aim to increase the number of labeled samples, PLDR keeps the original labels but increases dataset quality by solving for intermediary pseudo-labels (IPLs) that align each annotation to the most representative input data. For SmokeViz, a parent model produces IPLs to identify the single satellite image within each annotation's time window that best corresponds to the smoke plume. This refinement process produces a succinct and relevant deep learning dataset consisting of over 160,000 manual annotations. The SmokeViz dataset is expected to be a valuable resource for developing further wildfire-related machine learning models and is publicly available at https://noaa-gsl-experimental-pds.s3.amazonaws.com/index.html#SmokeViz/.
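
A hedged sketch of the PLDR selection step as described: a parent model scores every frame in an annotation's time window and the best-matching frame is kept. The IoU-based score and the toy convolutional parent below are illustrative assumptions, not NOAA's actual pipeline:

```python
# Illustrative PLDR-style frame selection: keep the frame whose predicted
# smoke map best overlaps the coarse analyst annotation.
import torch

def select_best_frame(parent_model, frames, annotation_mask):
    """frames: (T, C, H, W) candidates within the annotation's time window."""
    with torch.no_grad():
        preds = parent_model(frames).sigmoid()             # (T, 1, H, W) smoke maps
    # Score each frame by IoU between its predicted smoke and the annotation.
    inter = (preds.round() * annotation_mask).sum(dim=(1, 2, 3))
    union = (preds.round() + annotation_mask).clamp(max=1).sum(dim=(1, 2, 3))
    iou = inter / union.clamp(min=1)
    return frames[iou.argmax()], annotation_mask           # refined (image, label) pair

parent = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)   # stand-in parent model
frames = torch.randn(5, 3, 64, 64)                          # 5 frames in the window
mask = (torch.rand(1, 64, 64) > 0.9).float()                # coarse smoke annotation
best_frame, label = select_best_frame(parent, frames, mask)
```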

NeurIPS 2024 · Conference Paper

Multiple Physics Pretraining for Spatiotemporal Surrogate Models

  • Michael McCabe
  • Bruno Régaldo-Saint Blancard
  • Liam Parker
  • Ruben Ohana
  • Miles Cranmer
  • Alberto Bietti
  • Michael Eickenberg
  • Siavash Golkar

We introduce multiple physics pretraining (MPP), an autoregressive, task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously, so that it learns features that are broadly useful across systems and facilitate transfer. To learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid-mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher-dimensional systems, compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
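
The shared embedding and normalization idea might look roughly like the following, where each named field gets its own normalization and projection into one token space so systems with different field sets can feed the same backbone (the field names, dimensions, and normalization choice are assumptions, not the paper's exact recipe):

```python
# Hedged sketch of a shared field embedding across heterogeneous systems.
import torch
import torch.nn as nn

FIELDS = ["vx", "vy", "density", "pressure"]
D = 128

class SharedFieldEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        # One scalar->D projection per known field type, shared across systems.
        self.proj = nn.ModuleDict({f: nn.Linear(1, D) for f in FIELDS})
    def forward(self, fields):                  # {name: (batch, n_points)} subset
        embedded = 0.0
        for name, value in fields.items():
            v = (value - value.mean()) / (value.std() + 1e-6)  # per-field norm
            embedded = embedded + self.proj[name](v.unsqueeze(-1))
        return embedded                         # (batch, n_points, D) shared tokens

embed = SharedFieldEmbedding()
# A compressible system provides all four fields; another system only three.
sys_a = {f: torch.randn(2, 100) for f in FIELDS}
sys_b = {f: torch.randn(2, 100) for f in ["vx", "vy", "pressure"]}
tokens_a, tokens_b = embed(sys_a), embed(sys_b)  # same backbone consumes both
```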

NeurIPS 2024 · Conference Paper

The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

  • Ruben Ohana
  • Michael McCabe
  • Lucas Meyer
  • Rudy Morel
  • Fruzsina J. Agocs
  • Miguel Beneitez
  • Marsha Berger
  • Blakesley Burkhart

Machine learning-based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws on domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, and magneto-hydrodynamic simulations of extra-galactic fluids and supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
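
To illustrate what a unified training interface enables, here is a generic windowed-dataset stand-in; the class below is hypothetical and is not the_well's actual API, which the linked repository documents:

```python
# Generic sketch: serve (past, future) windows from a simulation array so the
# same training loop runs unchanged across heterogeneous datasets.
import torch
from torch.utils.data import DataLoader, Dataset

class WindowedSimDataset(Dataset):
    """Illustrative stand-in yielding (input window, target window) pairs."""
    def __init__(self, traj, n_in=4, n_out=1):
        self.traj, self.n_in, self.n_out = traj, n_in, n_out
    def __len__(self):
        return self.traj.shape[0] - self.n_in - self.n_out + 1
    def __getitem__(self, i):
        x = self.traj[i : i + self.n_in]
        y = self.traj[i + self.n_in : i + self.n_in + self.n_out]
        return x, y

traj = torch.randn(100, 2, 32, 32)               # (time, fields, H, W) toy sim
loader = DataLoader(WindowedSimDataset(traj), batch_size=8, shuffle=True)
x, y = next(iter(loader))                         # x: (8, 4, 2, 32, 32)
```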

TMLR 2023 · Journal Article

Towards Stability of Autoregressive Neural Operators

  • Michael McCabe
  • Peter Harrington
  • Shashank Subramanian
  • Jed Brown

Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging, as they incur significant computational and memory expense; as a result, these systems often rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems, including Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts, as well as longer time horizons without qualitative signs of divergence, compared to the original models for these systems. We open-source our code for reproducibility.
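
The rollout failure mode is easy to reproduce: a one-step map with gain above one amplifies error every step. The snippet below demonstrates this with spectral normalization as one standard way to bound a layer's gain; it is a generic illustration, not necessarily the paper's specific architectural fix:

```python
# Compare autoregressive rollouts of an unconstrained one-step map versus one
# whose spectral norm (largest singular value) is constrained near 1.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

step_raw = nn.Linear(64, 64, bias=False)                 # unconstrained map
step_sn = spectral_norm(nn.Linear(64, 64, bias=False))   # gain bounded near 1

def rollout(step, x, n=50):
    states = [x]
    for _ in range(n):
        states.append(step(states[-1]))
    return torch.stack(states)

x = torch.randn(1, 64)
with torch.no_grad():
    print(rollout(step_raw, x)[-1].norm(), rollout(step_sn, x)[-1].norm())
# The unconstrained map's state norm typically blows up over 50 steps, while
# the constrained map stays bounded.
```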

NeurIPS 2021 · Conference Paper

Learning to Assimilate in Chaotic Dynamical Systems

  • Michael McCabe
  • Jed Brown

The accuracy of simulation-based forecasting in chaotic systems is heavily dependent on high-quality estimates of the system state at the beginning of the forecast. Data assimilation methods are used to infer these initial conditions by systematically combining noisy, incomplete observations and numerical models of system dynamics to produce highly effective estimation schemes. We introduce a self-supervised framework, which we call amortized assimilation, for learning to assimilate in dynamical systems. Amortized assimilation combines deep learning-based denoising with differentiable simulation, using independent neural networks to assimilate specific observation types while connecting the gradient flow between these sub-tasks with differentiable simulation and shared recurrent memory. This hybrid architecture admits a self-supervised training objective which is minimized by an unbiased estimator of the true system state even in the presence of only noisy training data. Numerical experiments across several chaotic benchmark systems highlight the improved effectiveness of our approach compared to widely-used data assimilation methods.
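
A toy sketch of the self-supervised objective described above: assimilate a noisy observation into the state estimate, step a differentiable simulator forward, and score the forecast against the *next* noisy observation (noise2noise-style, minimized by the true state under independent zero-mean noise). The Lorenz-63 simulator and the small network below are stand-ins, not the paper's actual components:

```python
# Hedged sketch of amortized-assimilation-style self-supervised training.
import torch
import torch.nn as nn

def simulate(x, dt=0.01):                     # differentiable Lorenz-63 step
    s, r, b = 10.0, 28.0, 8.0 / 3.0
    dx = torch.stack([s * (x[1] - x[0]),
                      x[0] * (r - x[2]) - x[1],
                      x[0] * x[1] - b * x[2]])
    return x + dt * dx

assimilate = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 3))

# Noisy observations of a true trajectory (the clean states are never seen).
true = torch.tensor([1.0, 1.0, 1.0])
obs = []
for _ in range(10):
    true = simulate(true)
    obs.append(true + 0.1 * torch.randn(3))

est, loss = obs[0], 0.0
for t in range(len(obs) - 1):
    est = est + assimilate(torch.cat([est, obs[t]]))  # fold in current observation
    est = simulate(est)                               # push estimate forward
    loss = loss + ((est - obs[t + 1]) ** 2).mean()    # compare to next noisy obs
loss.backward()                                       # train only on noisy data
```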