Arrow Research search

Author name cluster

Jerome Bolte

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers (4)

TMLR 2025 · Journal Article

A second-order-like optimizer with adaptive gradient scaling for deep learning

  • Jerome Bolte
  • Ryan Boustany
  • Edouard Pauwels
  • Andrei Purica

In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory and compute requirements of standard deep learning methods such as AdamW or SGD. INNAprop is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 (OpenWebText) from scratch and with LoRA fine-tuning (E2E). INNAprop consistently performs close to AdamW, while doing significantly better in our LLM training experiments, achieving faster convergence and higher accuracy with minimal hyperparameter tuning, even at large scale. Our code is public.
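As a rough illustration of the "adaptive gradient scaling" ingredient mentioned in the abstract, the sketch below applies RMSprop-style per-coordinate rescaling to a badly scaled toy quadratic. It is not the paper's INNAprop update (which also involves the INNA second-order-like dynamics); the toy objective, step size, and decay constant are assumptions chosen for the example.

```python
import numpy as np

# RMSprop-style adaptive gradient scaling on a toy quadratic.
# NOT the paper's INNAprop update; only the rescaling component it builds on.

def grad(x):
    # Gradient of f(x) = 0.5 * x^T A x with a badly scaled diagonal A.
    A = np.array([100.0, 1.0])
    return A * x

x = np.array([1.0, 1.0])
v = np.zeros_like(x)            # running second moment of the gradient
lr, beta, eps = 1e-2, 0.99, 1e-8

for step in range(500):
    g = grad(x)
    v = beta * v + (1.0 - beta) * g ** 2    # exponential moving average of g^2
    x = x - lr * g / (np.sqrt(v) + eps)     # per-coordinate rescaled step

print("final iterate:", x)   # both coordinates shrink at a comparable rate
```

Without the rescaling, plain gradient descent on this objective would either diverge along the stiff coordinate or crawl along the flat one; the per-coordinate scaling equalizes progress, which is the behaviour the "prop" part of the name refers to.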

NeurIPS 2025 · Conference Paper

When majority rules, minority loses: bias amplification of gradient descent

  • François Bachoc
  • Jerome Bolte
  • Ryan Boustany
  • Jean-Michel Loubes

Despite growing empirical evidence of bias amplification in machine learning, its theoretical foundations remain poorly understood. We develop a formal framework for majority-minority learning tasks, showing how standard training can favor majority groups and produce stereotypical predictors that neglect minority-specific features. Assuming population and variance imbalance, our analysis reveals three key findings: (i) the close proximity between "full-data" and stereotypical predictors, (ii) the dominance of a region where training the entire model tends to merely learn the majority traits, and (iii) a lower bound on the additional training required. Our results are illustrated through experiments in deep learning for tabular and image classification tasks.
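Finding (i) can be made concrete with a hypothetical toy simulation of the majority/minority setting: with a 95/5 population imbalance and group-specific informative features, a predictor trained by gradient descent on the pooled data ends up close to the majority-only ("stereotypical") predictor. The data-generating process, group sizes, and learning rate below are assumptions for illustration, not the paper's formal framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy majority/minority task (hypothetical): the majority label depends on
# feature 0, the minority label on feature 1.
n_maj, n_min = 950, 50
X_maj = rng.normal(size=(n_maj, 2))
y_maj = X_maj[:, 0]                      # majority-specific feature
X_min = rng.normal(size=(n_min, 2))
y_min = X_min[:, 1]                      # minority-specific feature

X = np.vstack([X_maj, X_min])
y = np.concatenate([y_maj, y_min])

def fit_gd(X, y, lr=0.1, steps=2000):
    """Least-squares linear predictor trained by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

w_full = fit_gd(X, y)           # trained on all data
w_maj  = fit_gd(X_maj, y_maj)   # "stereotypical" majority-only predictor

print("full-data predictor:", w_full)
print("majority predictor :", w_maj)
# The two weight vectors are close: the minority-specific feature
# (coordinate 1) carries a weight near zero in both.
```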

NeurIPS 2023 · Conference Paper

One-step differentiation of iterative algorithms

  • Jerome Bolte
  • Edouard Pauwels
  • Samuel Vaiter

In appropriate frameworks, automatic differentiation is transparent to the user, at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as performant as implicit differentiation for fast algorithms (e.g., superlinear optimization methods). We provide a complete theoretical approximation analysis with specific examples (Newton's method, gradient descent) along with its consequences in bilevel optimization. Several numerical examples illustrate the well-foundedness of the one-step estimator.
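A minimal PyTorch sketch of the idea, assuming a toy root-finding problem F(x, theta) = x^3 - theta solved by Newton's method (the problem and names are mine, not the paper's code): the solver runs without building a graph, and only one extra step is differentiated with respect to theta. Because Newton's method converges superlinearly, this one-step derivative matches the implicit derivative at the solution.

```python
import torch

def newton_step(x, theta):
    """One Newton step for F(x, theta) = x**3 - theta."""
    return x - (x**3 - theta) / (3 * x**2)

theta = torch.tensor(8.0, requires_grad=True)

# Run the solver without tracking gradients (no memory growth with iterations).
x = torch.tensor(1.0)
with torch.no_grad():
    for _ in range(20):
        x = newton_step(x, theta)

# One-step differentiation: detach the converged iterate and differentiate a
# single extra step with respect to theta.
x_star = newton_step(x.detach(), theta)
x_star.backward()

print("x*                :", x_star.item())                       # ~ 2.0
print("one-step d/dtheta :", theta.grad.item())                   # ~ 1/12
print("exact d/dtheta    :", (1.0 / 3.0) * 8.0 ** (-2.0 / 3.0))   # = 1/12
```

For a slowly contracting fixed-point map the one-step derivative only approximates the implicit one, which is why the abstract restricts the "as performant as implicit differentiation" claim to fast (superlinear) algorithms.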

NeurIPS 2022 · Conference Paper

Automatic differentiation of nonsmooth iterative algorithms

  • Jerome Bolte
  • Edouard Pauwels
  • Samuel Vaiter

Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. The asymptotics are well understood for many smooth problems, but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning, and can it be used effectively in machine learning? Is there a connection with classical derivatives? All these questions are addressed under appropriate contractivity conditions in the framework of conservative derivatives, which has proved useful in understanding nonsmooth AD. We characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point that remains within the conservative framework. This has various consequences, in particular the almost-everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with the forward-backward, Douglas-Rachford, and Alternating Direction Method of Multipliers algorithms, as well as the Heavy-Ball method.
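The piggyback recursion itself is easy to see on a smooth toy problem (a stand-in chosen here, not the paper's nonsmooth setting): derivatives are propagated alongside the solver's iterates. Below, gradient descent solves a parametric quadratic, and the piggyback Jacobian converges to the exact derivative of the solution map, A^{-1}; the matrix, step size, and iteration count are assumptions for the example.

```python
import numpy as np

# Piggyback differentiation of a gradient-descent solver for
#   minimize_x  0.5 * x^T A x - theta^T x      =>   x*(theta) = A^{-1} theta
#
# Iterates:   x_{k+1} = x_k - gamma * (A x_k - theta)
# Piggyback:  J_{k+1} = (I - gamma * A) J_k + gamma * I   ->  dx*/dtheta = A^{-1}

A = np.array([[3.0, 1.0], [1.0, 2.0]])
theta = np.array([1.0, -1.0])
gamma = 0.2

x = np.zeros(2)
J = np.zeros((2, 2))              # derivative of the iterate w.r.t. theta
for _ in range(200):
    x = x - gamma * (A @ x - theta)
    J = (np.eye(2) - gamma * A) @ J + gamma * np.eye(2)

print("piggyback Jacobian:\n", J)
print("exact dx*/dtheta  :\n", np.linalg.inv(A))
```

In the nonsmooth case studied in the paper, the single Jacobian recursion is replaced by a set-valued (conservative) object, and the abstract's contractivity conditions are what guarantee that the piggyback iterates still settle on a meaningful limit.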