Arrow Research search

Author name cluster

Florence Regol

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

NeurIPS 2025 Conference Paper

Is the acquisition worth the cost? Surrogate losses for Consistent Two-stage Classifiers

  • Florence Regol
  • Joseph Cotnareanu
  • Theodore Glavas
  • Mark Coates

Recent years have witnessed the emergence of a spectrum of foundation models, covering a broad range of capabilities and costs. Often, we effectively use foundation models as feature generators and train classifiers that use the outputs of these models to make decisions. In this paper, we consider an increasingly relevant setting where we have two classifier stages. The first stage has access to features $x$ and has the option to make a classification decision or defer, while incurring a cost, to a second classifier that has access to features $x$ and $z$. This is similar to the "learning to defer" setting, with the important difference that we train both classifiers jointly, and the second classifier has access to more information. The natural loss for this setting is an $\ell_{01c}$ loss, where a penalty is paid for incorrect classification, as in $\ell_{01}$, but an additional penalty $c$ is paid for consulting the second classifier. The $\ell_{01c}$ loss is unwieldy for training. Our primary contribution in this paper is the derivation of a hinge-based surrogate loss $\ell^c_{hinge}$ that is much more amenable to training but also satisfies the property that $\ell^c_{hinge}$-consistency implies $\ell_{01c}$-consistency.
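A rough reconstruction of this loss from the abstract alone (the notation $h_1$, $h_2$ for the two stages, $r$ for the deferral decision, and $\mathbb{1}[\cdot]$ for the indicator is ours, not necessarily the paper's):

$\ell_{01c}(h_1, h_2, r; x, z, y) = \big(1 - r(x)\big)\,\mathbb{1}[h_1(x) \neq y] + r(x)\,\big(\mathbb{1}[h_2(x, z) \neq y] + c\big)$

with $r(x) \in \{0, 1\}$ indicating whether the first stage defers: a mistake costs 1 at either stage, and routing to the second classifier always adds the fixed consultation cost $c$. The indicators make this hard to optimize directly, which is what motivates the hinge surrogate.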

ICML 2025 Conference Paper

When to retrain a machine learning model

  • Florence Regol
  • Leo Schwinn
  • Kyle Sprague
  • Mark Coates
  • Thomas Markovich

A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners are faced with the difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information, as we usually have access to only a few examples; 2) the nature, extent, and impact of the distribution shift are unknown; and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offer a comprehensive solution. Distribution shift detection falls short as it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes it a poor fit for existing offline reinforcement learning methods; and the online learning formulation overlooks key practical considerations. To address this, we present a principled formulation of the retraining problem and propose an uncertainty-based method that makes decisions by continually forecasting the evolution of model performance evaluated with a bounded metric. Our experiments on classification tasks show that the method consistently outperforms existing baselines on 7 datasets. We thoroughly assess its robustness to varying and mis-specified cost trade-off values.
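As a toy illustration of the cost trade-off described above (not the paper's uncertainty-based forecasting method; the function name and arguments are hypothetical), the retraining decision can be framed as comparing forecast performance loss against the price of a retrain:

import numpy as np

def should_retrain(forecast_perf: np.ndarray, retrained_perf: float,
                   retrain_cost: float) -> bool:
    # forecast_perf: predicted values of the bounded performance metric over
    #   the next horizon steps if we keep the stale model.
    # retrained_perf: performance we expect to restore by retraining now.
    # retrain_cost: price of one retrain, expressed in units of the metric
    #   (the "cost ratio" the abstract refers to).
    expected_regret = float(np.sum(retrained_perf - forecast_perf))
    # Retrain only when the forecast performance loss exceeds the cost.
    return expected_regret > retrain_cost

For example, should_retrain(np.array([0.80, 0.75, 0.70]), 0.85, 0.2) returns True, since the forecast regret of 0.30 exceeds the retraining cost of 0.2.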

ICML 2024 Conference Paper

Interacting Diffusion Processes for Event Sequence Forecasting

  • Mai Zeng
  • Florence Regol
  • Mark Coates

Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. The model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPPs.
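A minimal sketch of the coupling idea, assuming a PyTorch implementation; the layer sizes, the relaxed (continuous) type representation, and the concatenation-based fusion are illustrative choices, not the paper's architecture:

import torch
import torch.nn as nn

class InteractingDenoisers(nn.Module):
    # One denoising branch per process (inter-arrival times, event types);
    # each head reads the concatenation of both branches' features, which
    # is the cross-process interaction the abstract describes.
    def __init__(self, n_types: int = 10, d: int = 32):
        super().__init__()
        self.time_enc = nn.Linear(1, d)          # features of the noisy times
        self.type_enc = nn.Linear(n_types, d)    # features of the noisy types
        self.time_head = nn.Linear(2 * d, 1)           # denoised time estimate
        self.type_head = nn.Linear(2 * d, n_types)     # denoised type logits

    def forward(self, noisy_dt: torch.Tensor, noisy_type: torch.Tensor):
        h_t = torch.relu(self.time_enc(noisy_dt))
        h_k = torch.relu(self.type_enc(noisy_type))
        joint = torch.cat([h_t, h_k], dim=-1)    # each process sees the other
        return self.time_head(joint), self.type_head(joint)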

ICLR 2024 Conference Paper

Jointly-Learned Exit and Inference for a Dynamic Neural Network

  • Florence Regol
  • Joud Chataoui
  • Mark Coates

Large pretrained models, coupled with fine-tuning, are slowly becoming established as the dominant architecture in machine learning. Even though these models offer impressive performance, their practical application is often limited by the prohibitive amount of resources required for every inference. Early-exiting dynamic neural networks (EDNN) circumvent this issue by allowing a model to make some of its predictions from intermediate layers (i.e., early-exit). Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations. As a result, most existing approaches rely on thresholding confidence metrics for the gating mechanism and strive to improve the underlying backbone network and the inference modules. Although successful, this approach has two fundamental shortcomings: 1) the GMs and the IMs are decoupled during training, leading to a train-test mismatch; and 2) the thresholding gating mechanism introduces a positive bias into the predictive probabilities, making it difficult to readily extract uncertainty information. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization capabilities.
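A minimal sketch of one jointly-trainable exit stage, assuming PyTorch; pairing a linear inference module with a linear gate over the same hidden state is an illustrative choice, not the paper's architecture:

import torch
import torch.nn as nn

class EarlyExitStage(nn.Module):
    # One stage of an early-exiting network: an intermediate inference
    # module (IM) and a gating mechanism (GM) read the same hidden state,
    # so the two components can be trained jointly rather than gating on
    # a thresholded confidence score.
    def __init__(self, d_hidden: int, n_classes: int):
        super().__init__()
        self.im = nn.Linear(d_hidden, n_classes)   # intermediate classifier
        self.gm = nn.Linear(d_hidden, 1)           # learned exit decision

    def forward(self, h: torch.Tensor):
        logits = self.im(h)
        exit_prob = torch.sigmoid(self.gm(h))      # probability of exiting here
        return logits, exit_prob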

AAAI 2023 Conference Paper

Diffusing Gaussian Mixtures for Generating Categorical Data

  • Florence Regol
  • Mark Coates

Learning a categorical distribution comes with its own set of challenges. A successful approach taken by state-of-the-art works is to cast the problem in a continuous domain to take advantage of the impressive performance of generative models for continuous data. Amongst them are the recently emerging diffusion probabilistic models, which have the observed advantage of generating high-quality samples. Recent advances for categorical generative models have focused on log-likelihood improvements. In this work, we propose a generative model for categorical data based on diffusion models with a focus on high-quality sample generation, and propose sample-based evaluation methods. The efficacy of our method stems from performing diffusion in the continuous domain while having its parameterization informed by the categorical structure of the target distribution. Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.
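A minimal sketch of the lifting step, assuming one Gaussian mixture component per category; the parameterization below is a toy assumption, not the paper's:

import torch

def lift_to_continuous(labels: torch.Tensor, means: torch.Tensor,
                       noise_scale: float) -> torch.Tensor:
    # labels: (batch,) integer category ids; means: (n_categories, d), one
    # mixture component centre per category. Each categorical sample becomes
    # a continuous point that standard diffusion can operate on.
    centers = means[labels]
    return centers + noise_scale * torch.randn_like(centers)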

AAAI 2022 Conference Paper

Bag Graph: Multiple Instance Learning Using Bayesian Graph Neural Networks

  • Soumyasundar Pal
  • Antonios Valkanas
  • Florence Regol
  • Mark Coates

Multiple Instance Learning (MIL) is a weakly supervised learning problem where the aim is to assign labels to sets or bags of instances, as opposed to traditional supervised learning where each instance is assumed to be independent and identically distributed (i.i.d.) and is to be labeled individually. Recent work has shown promising results for neural network models in the MIL setting. Instead of focusing on each instance, these models are trained in an end-to-end fashion to learn effective bag-level representations by suitably combining permutation invariant pooling techniques with neural architectures. In this paper, we consider modelling the interactions between bags using a graph and employ Graph Neural Networks (GNNs) to facilitate end-to-end learning. Since a meaningful graph representing dependencies between bags is rarely available, we propose to use a Bayesian GNN framework that can generate a likely graph structure for scenarios where there is uncertainty in the graph or when no graph is available. Empirical results demonstrate the efficacy of the proposed technique for several MIL benchmark tasks and a distribution regression task.
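A minimal sketch of the Bayesian averaging idea, assuming PyTorch; `gnn` and `sample_graph` are hypothetical callables standing in for the paper's GNN and graph posterior:

import torch

def bayesian_gnn_predict(bag_feats: torch.Tensor, gnn, sample_graph,
                         n_samples: int = 10) -> torch.Tensor:
    # With no reliable graph over bags, average GNN predictions over graphs
    # drawn from a posterior instead of committing to a single structure.
    preds = [gnn(bag_feats, sample_graph()) for _ in range(n_samples)]
    return torch.stack(preds).mean(dim=0)    # Monte Carlo posterior average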

ICML 2020 Conference Paper

Active Learning on Attributed Graphs via Graph Cognizant Logistic Regression and Preemptive Query Generation

  • Florence Regol
  • Soumyasundar Pal
  • Yingxue Zhang 0001
  • Mark Coates

Node classification in attributed graphs is an important task in multiple practical settings, but it can often be difficult or expensive to obtain labels. Active learning can improve the achieved classification performance for a given budget on the number of queried labels. The best existing methods are based on graph neural networks, but they often perform poorly unless a sizeable validation set of labelled nodes is available in order to choose good hyperparameters. We propose a novel graph-based active learning algorithm for the task of node classification in attributed graphs; our algorithm uses graph cognizant logistic regression, equivalent to a linearized graph-convolutional neural network (GCN), for the prediction phase and maximizes the expected error reduction in the query phase. To reduce the delay experienced by a labeller interacting with the system, we derive a preemptive querying system that calculates a new query during the labelling process, and to address the setting where learning starts with almost no labelled data, we also develop a hybrid algorithm that performs adaptive model averaging of label propagation and linearized GCN inference. We conduct experiments on five public benchmark datasets, demonstrating a significant improvement over state-of-the-art approaches, and we illustrate the practical value of the method by applying it to a private microwave link network dataset.
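A minimal sketch of the expected-error-reduction query rule, assuming NumPy; `expected_error_after` is a hypothetical callable standing in for the model update and risk estimate:

import numpy as np

def select_query(probs: np.ndarray, expected_error_after) -> int:
    # probs: (n_candidates, n_classes) current predictive probabilities.
    # For each candidate node, average the model's post-update error over
    # its possible labels, weighted by the current predictions, and query
    # the node that minimizes this expectation.
    n, k = probs.shape
    scores = np.array([
        sum(probs[i, y] * expected_error_after(i, y) for y in range(k))
        for i in range(n)
    ])
    return int(np.argmin(scores))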

UAI 2020 Conference Paper

Non Parametric Graph Learning for Bayesian Graph Neural Networks

  • Soumyasundar Pal
  • Saber Malekmohammadi
  • Florence Regol
  • Yingxue Zhang 0001
  • Yishi Xu
  • Mark Coates

Graphs are ubiquitous in modelling relational structures. Recent endeavours in machine learning for graph structured data have led to many architectures and learning algorithms. However, the graph used by these algorithms is often constructed based on inaccurate modelling assumptions and/or noisy data. As a result, it fails to represent the true relationships between nodes. A Bayesian framework which targets posterior inference of the graph by considering it as a random quantity can be beneficial. In this paper, we propose a novel non-parametric graph model for constructing the posterior distribution of graph adjacency matrices. The proposed model is flexible in the sense that it can effectively take into account the output of graph based learning algorithms that target specific tasks. In addition, model inference scales well to large graphs. We demonstrate the advantages of this model in three different problem settings: node classification, link prediction and recommendation.
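A minimal sketch of treating the graph as a random quantity, assuming NumPy; the Bernoulli edge model below stands in for the paper's non-parametric construction:

import numpy as np

def sample_adjacency(similarity: np.ndarray, rng=None) -> np.ndarray:
    # similarity: (n, n) nonnegative pairwise node similarities, e.g. taken
    # from the output of a task-specific learning algorithm.
    if rng is None:
        rng = np.random.default_rng()
    probs = similarity / similarity.max()     # edge probabilities in [0, 1]
    draw = rng.random(probs.shape) < probs    # Bernoulli edge draws
    upper = np.triu(draw, 1)                  # drop self-loops, keep one triangle
    return (upper | upper.T).astype(float)    # symmetric adjacency sample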