Author name cluster

Michael Eickenberg

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

TMLR Journal 2026 Journal Article

From Feature Visualization to Visual Circuits: Effect of Model Perturbation

Geraldin Nanfack
Michael Eickenberg
Eugene Belilovsky

Understanding the inner workings of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are typically interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework, where models are subtly perturbed to alter their interpretations while maintaining performance. However, existing model manipulation methods have two key limitations: (1) they manipulate either synthetic or natural feature visualizations individually, but not both simultaneously, and (2) no work has studied whether circuit-based interpretations are vulnerable to such manipulations. This paper exposes these vulnerabilities by proposing a novel attack called ProxPulse that simultaneously manipulates both types of feature visualizations. Surprisingly, we find that visual circuits exhibit some robustness to ProxPulse. We therefore introduce CircuitBreaker, the first attack targeting entire circuits, which successfully manipulates circuit interpretations, revealing that circuits also lack robustness. The effectiveness of these attacks is validated across a range of pre-trained models, from smaller architectures like AlexNet to medium-scale models like ResNet-50, and larger ones such as ResNet-152 and DenseNet-201 on ImageNet. ProxPulse changes both visualization types with <1\% accuracy drop, while our CircuitBreaker attack manipulates visual circuits with attribution correlation scores dropping from near-perfect to ~0.6 while preserving circuit head functionality.

PDF Details

ICLR Conference 2025 Conference Paper

PETRA: Parallel End-to-end Training with Reversible Architectures

Stéphane Rivaud
Louis Fournier
Thomas Pumir
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon

Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on standard computer vision benchmarks, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.

Details

AAAI Conference 2024 Conference Paper

Adversarial Attacks on the Interpretation of Neuron Activation Maximization

Geraldin Nanfack
Alexander Fulleringer
Jonathan Marty
Michael Eickenberg
Eugene Belilovsky

Feature visualization is one of the most popular techniques used to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, they consist of finding synthetic or natural inputs that maximize neuron activations. This paper introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation. It consists of finetuning a pre-trained model with a specifically introduced loss that aims to maintain model performance, while also significantly changing feature visualization. We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet.

PDF Details DOI

ICML Conference 2024 Conference Paper

MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert M. Gower

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new Polyak-type adaptive learning rates that can be used on top of any momentum method, and require less tuning to perform well. We first develop MoMo, a Mo mentum Mo del based adaptive learning rate for SGD-M (stochastic gradient descent with momentum). MoMo uses momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. Our model also makes use of any known lower bound of the loss function by using truncation, e. g. most losses are lower-bounded by zero. The models is then approximately minimized at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam - which is Adam with our new model-based adaptive learning rate. We show that MoMo attains a $\mathcal{O}(1/\sqrt{K})$ convergence rate for convex problems with interpolation, needing knowledge of no problem-specific quantities other than the optimal value. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound, that are incorporated in our model. We demonstrate that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR, and Imagenet, for recommender systems on the Criteo dataset, for a transformer model on the translation task IWSLT14, and for a diffusion model.

Details

NeurIPS Conference 2024 Conference Paper

Multiple Physics Pretraining for Spatiotemporal Surrogate Models

Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
Miles Cranmer
Alberto Bietti
Michael Eickenberg
Siavash Golkar

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.

PDF Details DOI

TMLR Journal 2024 Journal Article

Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures

Bruno Régaldo-Saint Blancard
Michael Eickenberg

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors, 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective to the initial method and leads to promising results for image denoising under specific circumstances.

PDF Details

ICML Conference 2023 Conference Paper

Can Forward Gradient Match Backpropagation?

Louis Fournier
Stéphane Rivaud
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.

Details

YNIMG Journal 2022 Journal Article

Feature-space selection with banded ridge regression

Tom Dupré la Tour
Michael Eickenberg
Anwar O. Nunez-Elizalde
Jack L. Gallant

Encoding models provide a powerful framework to identify the information represented in brain recordings. In this framework, a stimulus representation is expressed within a feature space and is used in a regularized linear regression to predict brain activity. To account for a potential complementarity of different feature spaces, a joint model is fit on multiple feature spaces simultaneously. To adapt regularization strength to each feature space, ridge regression is extended to banded ridge regression, which optimizes a different regularization hyperparameter per feature space. The present paper proposes a method to decompose over feature spaces the variance explained by a banded ridge regression model. It also describes how banded ridge regression performs a feature-space selection, effectively ignoring non-predictive and redundant feature spaces. This feature-space selection leads to better prediction accuracy and to better interpretability. Banded ridge regression is then mathematically linked to a number of other regression methods with similar feature-space selection mechanisms. Finally, several methods are proposed to address the computational challenge of fitting banded ridge regressions on large numbers of voxels and feature spaces. All implementations are released in an open-source Python package called Himalaya.

Details DOI

ICML Conference 2020 Conference Paper

Decoupled Greedy Learning of CNNs

Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon

A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the network before updating. In recent years multiple authors have considered alternatives that can alleviate this issue. In this context, we consider a simpler, but more effective, substitute that uses minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing for layers or modules in networks to be trained with a potentially linear parallelization in layers. We show theoretically and empirically that this approach converges. Then, we empirically find that it can lead to better generalization than sequential greedy optimization and sometimes end-to-end back-propagation. We show an extension of this approach to asynchronous settings, where modules can operate with large communication delays, is possible with the use of a replay buffer. We demonstrate the effectiveness of DGL on the CIFAR-10 dataset against alternatives and on the large-scale ImageNet dataset.

Details

JMLR Journal 2020 Journal Article

Kymatio: Scattering Transforms in Python

Mathieu Andreux
Tomás Angles
Georgios Exarchakis
Roberto Leonarduzzi
Gaspar Rochette
Louis Thiry
John Zarka
Stéphane Mallat

The wavelet scattering transform is an invariant and stable signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks, including PyTorch and TensorFlow/Keras. The transforms are implemented on both CPUs and GPUs, the latter offering a significant speedup over the former. The package also has a small memory footprint. Source code, documentation, and examples are available under a BSD license at https://www.kymat.io. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2020. ( edit, beta )

PDF Details

ICML Conference 2019 Conference Paper

Greedy Layerwise Learning Can Scale To ImageNet

Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon

Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches using shallow networks, we focus on problems where deep learning is reported as critical for success. We thus study CNNs on image classification tasks using the large-scale ImageNet dataset and the CIFAR-10 dataset. Using a simple set of ideas for architecture and training we find that solving sequential 1-hidden-layer auxiliary problems lead to a CNN that exceeds AlexNet performance on ImageNet. Extending this training methodology to construct individual layers by solving 2-and-3-hidden layer auxiliary problems, we obtain an 11-layer network that exceeds several members of the VGG model family on ImageNet, and can train a VGG-11 model to the same accuracy as end-to-end learning. To our knowledge, this is the first competitive alternative to end-to-end training of CNNs that can scale to ImageNet. We illustrate several interesting properties of these models and conduct a range of experiments to study the properties this training induces on the intermediate layers.

Details

YNIMG Journal 2017 Journal Article

Seeing it all: Convolutional network layers map the function of the human visual system

Michael Eickenberg
Alexandre Gramfort
Gaël Varoquaux
Bertrand Thirion

Convolutional networks used for computer vision represent candidate models for the computations performed in mammalian visual systems. We use them as a detailed model of human brain activity during the viewing of natural images by constructing predictive models based on their different layers and BOLD fMRI activations. Analyzing the predictive performance across layers yields characteristic fingerprints for each visual brain region: early visual areas are better described by lower level convolutional net layers and later visual areas by higher level net layers, exhibiting a progression across ventral and dorsal streams. Our predictive model generalizes beyond brain responses to natural images. We illustrate this on two experiments, namely retinotopy and face-place oppositions, by synthesizing brain activity and performing classical brain mapping upon it. The synthesis recovers the activations observed in the corresponding fMRI studies, showing that this deep encoding model captures representations of brain function that are universal across experimental paradigms.

Details DOI

NeurIPS Conference 2017 Conference Paper

Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D Electronic Densities

Michael Eickenberg
Georgios Exarchakis
Matthew Hirn
Stephane Mallat

We introduce a solid harmonic wavelet scattering representation, invariant to rigid motion and stable to deformations, for regression and classification of 2D and 3D signals. Solid harmonic wavelets are computed by multiplying solid harmonic functions with Gaussian windows dilated at different scales. Invariant scattering coefficients are obtained by cascading such wavelet transforms with the complex modulus nonlinearity. We study an application of solid harmonic scattering invariants to the estimation of quantum molecular energies, which are also invariant to rigid motion and stable with respect to deformations. A multilinear regression over scattering invariants provides close to state of the art results over small and large databases of organic molecules.

PDF Details

YNIMG Journal 2015 Journal Article

Data-driven HRF estimation for encoding and decoding models

Fabian Pedregosa
Michael Eickenberg
Philippe Ciuciu
Bertrand Thirion
Alexandre Gramfort

Despite the common usage of a canonical, data-independent, hemodynamic response function (HRF), it is known that the shape of the HRF varies across brain regions and subjects. This suggests that a data-driven estimation of this function could lead to more statistical power when modeling BOLD fMRI data. However, unconstrained estimation of the HRF can yield highly unstable results when the number of free parameters is large. We develop a method for the joint estimation of activation and HRF by means of a rank constraint, forcing the estimated HRF to be equal across events or experimental conditions, yet permitting it to differ across voxels. Model estimation leads to an optimization problem that we propose to solve with an efficient quasi-Newton method, exploiting fast gradient computations. This model, called GLM with Rank-1 constraint (R1-GLM), can be extended to the setting of GLM with separate designs which has been shown to improve decoding accuracy in brain activity decoding experiments. We compare 10 different HRF modeling methods in terms of encoding and decoding scores on two different datasets. Our results show that the R1-GLM model outperforms competing methods in both encoding and decoding settings, positioning it as an attractive method both from the points of view of accuracy and computational efficiency.

Details DOI

NeurIPS Conference 2015 Conference Paper

Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data

Danilo Bzdok
Michael Eickenberg
Olivier Grisel
Bertrand Thirion
Gael Varoquaux

Imaging neuroscience links human behavior to aspects of brain biology in ever-increasing datasets. Existing neuroimaging methods typically perform either discovery of unknown neural structure or testing of neural structure associated with mental tasks. However, testing hypotheses on the neural correlates underlying larger sets of mental tasks necessitates adequate representations for the observations. We therefore propose to blend representation modelling and task classification into a unified statistical learning problem. A multinomial logistic regression is introduced that is constrained by factored coefficients and coupled with an autoencoder. We show that this approach yields more accurate and interpretable neural models of psychological tasks in a reference dataset, as well as better generalization to other datasets.

PDF Details