Arrow Research search

Author name cluster

Mark Girolami

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
1 author row

Possible papers

20

JMLR Journal 2025 Journal Article

Autoencoders in Function Space

  • Justin Bunker
  • Mark Girolami
  • Hefin Lambley
  • Andrew M. Stuart
  • T. J. Sullivan

Autoencoders have found widespread application in both their original deterministic form and in their variational formulation (VAEs). In scientific applications and in image processing it is often of interest to consider data that are viewed as functions; while discretisation (of differential equations arising in the sciences) or pixellation (of images) renders problems finite dimensional in practice, conceiving first of algorithms that operate on functions, and only then discretising or pixellating, leads to better algorithms that smoothly operate between resolutions. In this paper function-space versions of the autoencoder (FAE) and variational autoencoder (FVAE) are introduced, analysed, and deployed. Well-definedness of the objective governing VAEs is a subtle issue, particularly in function space, limiting applicability. For the FVAE objective to be well defined requires compatibility of the data distribution with the chosen generative model; this can be achieved, for example, when the data arise from a stochastic differential equation, but is generally restrictive. The FAE objective, on the other hand, is well defined in many situations where FVAE fails to be. Pairing the FVAE and FAE objectives with neural operator architectures that can be evaluated on any mesh enables new applications of autoencoders to inpainting, superresolution, and generative modelling of scientific data.
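
The structural idea, design on functions first and discretise afterwards, can be seen in a toy sketch. The following hypothetical mesh-invariant autoencoder (a deep-sets-style stand-in with random untrained weights, not the paper's FAE/FVAE architecture) pools pointwise features of $(x, u(x))$ pairs so the latent code does not depend on the mesh, and decodes at arbitrary query coordinates:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical weights for a tiny pointwise encoder and decoder.
    W_enc = rng.normal(size=(2, 16)) / 4   # maps (x, u(x)) pairs to 16 features
    W_lat = rng.normal(size=(16, 4)) / 4   # pooled features to a 4-dim latent code
    W_dec = rng.normal(size=(5, 16)) / 4   # maps (latent code, query x) to features
    w_out = rng.normal(size=16) / 4        # features to a predicted value

    def encode(xs, us):
        # Mesh-invariant encoding: average pointwise features over the mesh,
        # so the latent code is independent of the number of grid points.
        feats = np.tanh(np.c_[xs, us] @ W_enc)
        return feats.mean(axis=0) @ W_lat

    def decode(z, x_query):
        # The decoded function can be evaluated at arbitrary query coordinates.
        return np.tanh(np.c_[np.tile(z, (len(x_query), 1)), x_query] @ W_dec) @ w_out

    u = np.sin  # one function-valued datum
    z10 = encode(np.linspace(0, 1, 10), u(np.linspace(0, 1, 10)))
    z100 = encode(np.linspace(0, 1, 100), u(np.linspace(0, 1, 100)))
    # Codes from two discretisations of the same function nearly agree, and the
    # decoder can then be queried on any mesh (inpainting, superresolution).
    print(np.linalg.norm(z10 - z100), decode(z100, np.linspace(0, 1, 25)).shape)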

NeurIPS Conference 2024 Conference Paper

Generating Origin-Destination Matrices in Neural Spatial Interaction Models

  • Ioannis Zachos
  • Mark Girolami
  • Theodoros Damoulas

Agent-based models (ABMs) are proliferating as decision-making tools across policy areas in transportation, economics, and epidemiology. In these models, a central object of interest is the discrete origin-destination matrix which captures spatial interactions and agent trip counts between locations. Existing approaches resort to continuous approximations of this matrix and subsequent ad-hoc discretisations in order to perform ABM simulation and calibration. This impedes conditioning on partially observed summary statistics, fails to explore the multimodal matrix distribution over a discrete combinatorial support, and incurs discretisation errors. To address these challenges, we introduce a computationally efficient framework that scales linearly with the number of origin-destination pairs, operates directly on the discrete combinatorial space, and learns the agents' trip intensity through a neural differential equation that embeds spatial interactions. Our approach outperforms the prior art in terms of reconstruction error and ground truth matrix coverage, at a fraction of the computational cost. We demonstrate these benefits in two large-scale spatial mobility ABMs in Washington, DC and Cambridge, UK.
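
As a toy illustration of working directly on the discrete support (not the paper's method), one can sample integer-valued origin-destination tables from a trip intensity by drawing each origin's trips from a multinomial over destinations; the gravity-style intensity below is a hypothetical stand-in for the learned neural intensity:

    import numpy as np

    rng = np.random.default_rng(1)
    n_orig, n_dest = 4, 5
    cost = rng.uniform(1, 10, size=(n_orig, n_dest))  # hypothetical travel costs
    probs = np.exp(-0.5 * cost)                       # gravity-style stand-in for
    probs /= probs.sum(axis=1, keepdims=True)         # the learned neural intensity

    trips_out = np.array([120, 80, 200, 50])          # observed origin margins
    # Sample a discrete origin-destination matrix: each row is multinomial, so
    # row sums (a partially observed summary statistic) are matched exactly.
    T = np.vstack([rng.multinomial(n, p) for n, p in zip(trips_out, probs)])
    print(T, T.sum(axis=1))  # integer matrix; row sums equal trips_out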

JMLR Journal 2024 Journal Article

Targeted Separation and Convergence with Kernel Discrepancies

  • Alessandro Barp
  • Carl-Johann Simon-Gabriel
  • Mark Girolami
  • Lester Mackey

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to $(i)$ separate a target $\mathrm{P}$ from other probability measures or even $(ii)$ control weak convergence to $\mathrm{P}$. In this article we derive new sufficient and necessary conditions to ensure $(i)$ and $(ii)$. For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to $\mathrm{P}$. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
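
For concreteness, the squared MMD is $\mathrm{MMD}^2(\mathrm{P}, \mathrm{Q}) = \mathbb{E}[k(X, X')] - 2\,\mathbb{E}[k(X, Y)] + \mathbb{E}[k(Y, Y')]$ with $X, X' \sim \mathrm{P}$ and $Y, Y' \sim \mathrm{Q}$; separation means this vanishes only when $\mathrm{P} = \mathrm{Q}$. A minimal unbiased estimator with a Gaussian kernel (a standard construction, not specific to this paper):

    import numpy as np

    def mmd2_unbiased(x, y, lengthscale=1.0):
        # Unbiased U-statistic estimate of squared MMD with a Gaussian kernel.
        def k(a, b):
            return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * lengthscale ** 2))
        Kxx, Kyy, Kxy = k(x, x), k(y, y), k(x, y)
        n, m = len(x), len(y)
        # Drop the diagonals of the within-sample terms for unbiasedness.
        return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
                - 2 * Kxy.mean())

    rng = np.random.default_rng(0)
    print(mmd2_unbiased(rng.normal(size=500), rng.normal(size=500)))   # near 0
    print(mmd2_unbiased(rng.normal(size=500), rng.normal(2, 1, 500)))  # clearly > 0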

TMLR Journal 2024 Journal Article

Tweedie Moment Projected Diffusions for Inverse Problems

  • Benjamin Boys
  • Mark Girolami
  • Jakiw Pidstrigach
  • Sebastian Reich
  • Alan Mosca
  • Omer Deniz Akyildiz

Diffusion generative models unlock new possibilities for inverse problems as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models have been repurposed for solving inverse problems using Gaussian approximations to the conditional densities of the reverse process, with Tweedie’s formula parameterising the mean, complemented by various heuristics. To address the challenges arising from these approximations, we leverage higher-order information using Tweedie’s formula and obtain a statistically principled approximation. We further provide a theoretical guarantee specifically for posterior sampling, which can lead to a better theoretical understanding of diffusion-based conditional sampling. Finally, we illustrate the empirical effectiveness of our approach for general linear inverse problems on toy synthetic examples as well as image restoration. We show that our method (i) removes the time-dependent step-size hyperparameters required by earlier methods, (ii) brings stability and better sample quality across multiple noise levels, and (iii) is, unlike earlier works, stable with variance-exploding (VE) forward processes.
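
Tweedie's formula itself is easy to state for a variance-exploding process $x_t = x_0 + \sigma_t \varepsilon$: the posterior mean is $\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2 \nabla \log p_t(x_t)$. A small self-contained check on a Gaussian example, where both the score and the exact posterior mean are available in closed form (an illustration of the formula, not of the paper's moment-projection method):

    import numpy as np

    mu0, s0, sigma_t = 1.0, 0.5, 2.0  # prior N(mu0, s0^2), VE noise level sigma_t
    x_t = 3.0                         # an observed noisy state

    # Marginal p_t = N(mu0, s0^2 + sigma_t^2), so the score is analytic.
    var_t = s0 ** 2 + sigma_t ** 2
    score = -(x_t - mu0) / var_t

    tweedie_mean = x_t + sigma_t ** 2 * score           # Tweedie's formula
    exact_mean = mu0 + (s0 ** 2 / var_t) * (x_t - mu0)  # Gaussian posterior mean
    print(np.isclose(tweedie_mean, exact_mean))         # True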

TMLR Journal 2023 Journal Article

Sobolev Spaces, Kernels and Discrepancies over Hyperspheres

  • Simon Hubbert
  • Emilio Porcu
  • Chris J. Oates
  • Mark Girolami

This work extends analytical foundations for kernel methods beyond the usual Euclidean manifold. Specifically, we characterise the smoothness of the native spaces (reproducing kernel Hilbert spaces) that are reproduced by geodesically isotropic kernels in the hyperspherical context. Our results are relevant to several areas of machine learning; we focus on their consequences for kernel cubature, determining the rate of convergence of the worst case error, and expanding the applicability of cubature algorithms based on Stein's method. First, we introduce a characterisation of Sobolev spaces on the $d$-dimensional sphere based on the Fourier--Schoenberg sequences associated with a given kernel. Such sequences are hard (if not impossible) to compute analytically on $d$-dimensional spheres, but often feasible over Hilbert spheres, where $d = \infty$. Second, we circumvent this problem by finding a projection operator that allows us to map from Hilbert spheres to finite-dimensional spheres. Our findings are illustrated for selected parametric families of kernels.
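
The central objects here are classical: by Schoenberg's theorem, a geodesically isotropic positive definite kernel on $\mathbb{S}^d$ (for $d \ge 2$) expands in normalised Gegenbauer polynomials with non-negative summable coefficients, the Fourier--Schoenberg sequence of the abstract. Schematically,

$$ k(\mathbf{x}, \mathbf{y}) = \sum_{n=0}^{\infty} b_{n,d} \, \frac{C_n^{(d-1)/2}(\langle \mathbf{x}, \mathbf{y} \rangle)}{C_n^{(d-1)/2}(1)}, \qquad b_{n,d} \ge 0, \quad \sum_{n=0}^{\infty} b_{n,d} < \infty, $$

and, roughly speaking (the paper makes this precise), the native space is norm-equivalent to the Sobolev space $H^s(\mathbb{S}^d)$ exactly when $(b_{n,d})$ decays algebraically at a rate determined by $s$ and $d$.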

JMLR Journal 2021 Journal Article

Convergence Guarantees for Gaussian Process Means With Misspecified Likelihoods and Smoothness

  • George Wynne
  • François-Xavier Briol
  • Mark Girolami

Gaussian processes are ubiquitous in machine learning, statistics, and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying uncertainty. However, this is only true when the model is well-specified, which is often not the case in practice. In this paper, we study the properties of Gaussian process means when the smoothness of the model and the likelihood function are misspecified. In this setting, an important theoretical question of practical relevance is how accurate the Gaussian process approximations will be given the chosen model and the extent of the misspecification. The answer to this problem is particularly useful since it can inform our choice of model and experimental design. In particular, we describe how the experimental design and choice of kernel and kernel hyperparameters can be adapted to alleviate model misspecification.
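
The object under study is the standard GP posterior mean; a minimal sketch of generic GP regression (not the paper's analysis) shows where the smoothness assumption enters via the Matérn kernel's $\nu$ parameter and where the likelihood enters via the assumed noise level:

    import numpy as np

    def matern32(a, b, ell=0.3):
        # Matern kernel with nu = 3/2, i.e. a once-differentiable model;
        # choosing nu is the smoothness assumption that may be misspecified.
        r = np.abs(a[:, None] - b[None, :]) / ell
        return (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, 30)
    y = np.sin(2 * np.pi * X) + 0.05 * rng.normal(size=30)  # true noise 0.05

    noise_assumed = 0.05  # the likelihood is misspecified if this is wrong
    K = matern32(X, X) + noise_assumed ** 2 * np.eye(30)
    Xs = np.linspace(0, 1, 200)
    mean = matern32(Xs, X) @ np.linalg.solve(K, y)  # GP posterior mean
    print(np.max(np.abs(mean - np.sin(2 * np.pi * Xs))))  # approximation error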

NeurIPS Conference 2019 Conference Paper

Minimum Stein Discrepancy Estimators

  • Alessandro Barp
  • Francois-Xavier Briol
  • Andrew Duncan
  • Mark Girolami
  • Lester Mackey

When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.
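
The unifying object is simple to state: under mild conditions the Langevin Stein operator satisfies $\mathbb{E}_{X \sim p}\big[f(X)^\top \nabla \log p(X) + \nabla \cdot f(X)\big] = 0$, so a supremum over a function class $\mathcal{F}$ yields a discrepancy that vanishes at $p$ and can be minimised over model parameters. Schematically (the paper's DKSD and DSM arise from preconditioning this operator with a diffusion matrix and choosing $\mathcal{F}$ appropriately),

$$ \hat{\theta} = \arg\min_{\theta} \, \sup_{f \in \mathcal{F}} \, \mathbb{E}_{X \sim q} \big[ f(X)^\top \nabla_x \log p_\theta(X) + \nabla_x \cdot f(X) \big], $$

where $q$ is the empirical data distribution. Since $\nabla_x \log p_\theta$ is free of the normalising constant, the estimator remains tractable exactly in the regime where maximum likelihood is not.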

NeurIPS Conference 2019 Conference Paper

Multi-resolution Multi-task Gaussian Processes

  • Oliver Hamelijnck
  • Theodoros Damoulas
  • Kangrui Wang
  • Mark Girolami

We consider evidence integration from potentially dependent observation processes under varying spatio-temporal sampling resolutions and noise levels. We offer a multi-resolution multi-task (MRGP) framework that allows for both inter-task and intra-task multi-resolution and multi-fidelity. We develop shallow Gaussian Process (GP) mixtures that approximate the difficult-to-estimate joint likelihood with a composite one and deep GP constructions that naturally handle biases. In doing so, we generalize existing approaches and offer information-theoretic corrections and efficient variational approximations. We demonstrate the competitiveness of MRGPs on synthetic settings and on the challenging problem of hyper-local estimation of air pollution levels across London from multiple sensing modalities operating at disparate spatio-temporal resolutions.

NeurIPS Conference 2019 Conference Paper

Precision-Recall Balanced Topic Modelling

  • Seppo Virtanen
  • Mark Girolami

Topic models are becoming increasingly relevant probabilistic models for dimensionality reduction of text data, inferring topics that capture meaningful themes of frequently co-occurring terms. We formulate topic modelling as an information retrieval task, where the goal is, based on the latent topic representation, to capture relevant term co-occurrence patterns. We evaluate performance for this task rigorously with regard to two types of errors, false negatives and positives, based on the well-known precision-recall trade-off and provide a statistical model that allows the user to balance between the contributions of the different error types. When the user focuses solely on the contribution of false negatives ignoring false positives altogether our proposed model reduces to a standard topic model. Extensive experiments demonstrate the proposed approach is effective and infers more coherent topics than existing related approaches.

NeurIPS Conference 2017 Conference Paper

Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models

  • Chris Oates
  • Steven Niederer
  • Angela Lee
  • François-Xavier Briol
  • Mark Girolami

This paper studies the numerical computation of integrals, representing estimates or predictions, over the output $f(x)$ of a computational model with respect to a distribution $p(\mathrm{d}x)$ over uncertain inputs $x$ to the model. For the functional cardiac models that motivate this work, neither $f$ nor $p$ possess a closed-form expression and evaluation of either requires $\approx$ 100 CPU hours, precluding standard numerical integration methods. Our proposal is to treat integration as an estimation problem, with a joint model for both the a priori unknown function $f$ and the a priori unknown distribution $p$. The result is a posterior distribution over the integral that explicitly accounts for dual sources of numerical approximation error due to a severely limited computational budget. This construction is applied to account, in a statistically principled manner, for the impact of numerical errors that (at present) are confounding factors in functional cardiac model assessment.

NeurIPS Conference 2015 Conference Paper

Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees

  • François-Xavier Briol
  • Chris Oates
  • Mark Girolami
  • Michael Osborne

There is renewed interest in formulating integration as an inference problem, motivated by obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is to reconcile these probabilistic integrators with rigorous convergence guarantees. In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ). Under FWBQ, convergence to the true value of the integral is shown to be exponential and posterior contraction rates are proven to be superexponential. In simulations, FWBQ is competitive with state-of-the-art methods and outperforms alternatives based on Frank-Wolfe optimisation. Our approach is applied to successfully quantify numerical error in the solution to a challenging model choice problem in cellular biology.
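
The Bayesian quadrature construction that FWBQ builds on is compact enough to sketch: place a GP prior on the integrand, condition on function evaluations, and the integral of the posterior is Gaussian with closed-form mean and variance whenever the kernel mean is available. A minimal sketch for a Gaussian kernel and standard normal integration measure (plain BQ with random nodes, not the Frank-Wolfe node selection of the paper):

    import numpy as np

    ell = 0.8  # kernel lengthscale

    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

    def kmean(x):
        # Closed-form kernel mean: integral of k(x, .) against N(0, 1).
        return ell / np.sqrt(ell ** 2 + 1) * np.exp(-x ** 2 / (2 * (ell ** 2 + 1)))

    rng = np.random.default_rng(0)
    X = rng.normal(size=15)              # random nodes (FWBQ chooses these greedily)
    f = np.cos                           # integrand; true integral is exp(-1/2)
    w = np.linalg.solve(k(X, X) + 1e-10 * np.eye(15), kmean(X))
    post_mean = w @ f(X)
    # Posterior variance: double integral of k minus the explained part.
    post_var = ell / np.sqrt(ell ** 2 + 2) - w @ kmean(X)
    print(post_mean, np.exp(-0.5), post_var)  # estimate vs truth, with uncertainty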

NeurIPS Conference 2009 Conference Paper

Analysis of SVM with Indefinite Kernels

  • Yiming Ying
  • Colin Campbell
  • Mark Girolami

The recent introduction of indefinite SVM by Luss and d'Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov's smooth optimization approach [16, 17] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
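
Schematically, and up to the exact constants and dual constraint set $\mathcal{A}$ used by Luss and d'Aspremont (our paraphrase of the saddle representation described above), the objective solves the SVM dual against a positive semi-definite proxy of the indefinite kernel matrix $K_0$:

$$ \max_{\alpha \in \mathcal{A}} \, \min_{K \succeq 0} \; \alpha^\top \mathbf{1} - \tfrac{1}{2} (\alpha \circ y)^\top K (\alpha \circ y) + \rho \, \lVert K - K_0 \rVert_F^2, $$

where $\circ$ is the elementwise product. It is the quadratic penalty $\rho \lVert K - K_0 \rVert_F^2$ that smooths the resulting function of $\alpha$, yielding the Lipschitz-continuous gradient that Nesterov's method exploits.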

NeurIPS Conference 2008 Conference Paper

Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

  • Ben Calderhead
  • Mark Girolami
  • Neil Lawrence

Identification and comparison of nonlinear dynamical systems using noisy and sparse experimental data is a vital task in many fields; however, current methods are computationally expensive and prone to error, due in part to the nonlinear nature of the induced likelihood surfaces. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods.
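
The key trick, obtaining derivatives from the GP rather than from an ODE solver, can be sketched in a few lines. Assuming an RBF kernel, the posterior mean of the derivative follows by differentiating the cross-covariance (generic gradient matching on hypothetical data, not the paper's full sampler):

    import numpy as np

    ell = 0.5

    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

    def dk(a, b):
        # Derivative of the RBF kernel in its first argument,
        # used for the GP estimate of dx/dt.
        return -(a[:, None] - b[None, :]) / ell ** 2 * k(a, b)

    # Noisy observations of x(t) = exp(theta * t) with true theta = 0.7,
    # so the governing ODE is dx/dt = theta * x.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2, 25)
    x_obs = np.exp(0.7 * t) + 0.01 * rng.normal(size=25)

    Kinv_y = np.linalg.solve(k(t, t) + 0.01 ** 2 * np.eye(25), x_obs)
    x_hat = k(t, t) @ Kinv_y    # GP estimate of the state
    dx_hat = dk(t, t) @ Kinv_y  # GP estimate of its derivative: no ODE solves

    # Gradient matching: least-squares fit of dx/dt = theta * x in theta.
    theta_hat = (dx_hat @ x_hat) / (x_hat @ x_hat)
    print(theta_hat)            # close to 0.7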

NeurIPS Conference 2006 Conference Paper

Data Integration for Classification Problems Employing Gaussian Process Priors

  • Mark Girolami
  • Mingjun Zhong

By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination.
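
The mechanism for integrating heterogeneous sources is a weighted combination of covariance functions, one per source, with the weights inferred rather than hand-tuned. A toy sketch of the covariance construction only (the Variational/EP inference over the weights is the substance of the paper and is omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    X_seq = rng.normal(size=(n, 10))  # e.g. sequence-derived features (hypothetical)
    X_str = rng.normal(size=(n, 3))   # e.g. structure-derived features (hypothetical)

    def rbf(X, ell):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * ell ** 2))

    beta = np.array([0.7, 0.3])       # combination weights; inferred in the paper
    K = beta[0] * rbf(X_seq, 3.0) + beta[1] * rbf(X_str, 1.0)
    # K is a valid covariance (a conic combination of kernels), usable as the
    # GP prior covariance for classification over the integrated data sets.
    print(np.all(np.linalg.eigvalsh(K) > -1e-10))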

NeurIPS Conference 2006 Conference Paper

Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm

  • Robert Jenssen
  • Torbjørn Eltoft
  • Mark Girolami
  • Deniz Erdogmus

We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi's entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.
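
The key quantity is Renyi's quadratic entropy under a Parzen density estimate, which reduces to a kernel-matrix sum: $\hat{H}_2 = -\log\big(\tfrac{1}{N^2}\mathbf{1}^\top K \mathbf{1}\big)$. Writing $\mathbf{1}^\top K \mathbf{1} = \sum_i \lambda_i (\mathbf{1}^\top e_i)^2$ over the eigendecomposition of $K$ suggests ranking eigenvectors by their entropy contribution $\lambda_i (\mathbf{1}^\top e_i)^2$ rather than by $\lambda_i$ alone as kernel PCA does; a sketch of that ranking (our reading of the construction, details as in the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-d2 / 2)                       # Parzen / Gaussian kernel matrix

    lam, E = np.linalg.eigh(K)
    contrib = lam * E.sum(axis=0) ** 2        # lambda_i * (1^T e_i)^2 per eigenvector
    order_maxent = np.argsort(contrib)[::-1]  # kernel MaxEnt ranking
    order_pca = np.argsort(lam)[::-1]         # kernel PCA ranking
    print(order_maxent[:3], order_pca[:3])    # may select different eigenvectors
    Z = E[:, order_maxent[:2]] * np.sqrt(lam[order_maxent[:2]])  # transformed data
    print(Z.shape)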

NeurIPS Conference 2006 Conference Paper

Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation

  • Gavin Cawley
  • Nicola Talbot
  • Mark Girolami

Multinomial logistic regression provides the standard penalised maximum-likelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found application in text processing and microarray classification, where explicit identification of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is integrated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense.
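
A Laplace prior on the weights corresponds to L1 regularisation of the multinomial logistic loss; a minimal proximal-gradient sketch of that penalised problem on synthetic data (with a fixed regularisation strength, whereas the paper's contribution is to integrate that parameter out analytically):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, c = 200, 20, 3
    X = rng.normal(size=(n, d))
    y = rng.integers(c, size=n)
    Y = np.eye(c)[y]                   # one-hot targets

    W = np.zeros((d, c))
    lam, lr = 0.05, 0.1                # fixed penalty; integrated out in the paper
    for _ in range(500):
        P = np.exp(X @ W)
        P /= P.sum(axis=1, keepdims=True)                     # softmax probabilities
        W -= lr * (X.T @ (P - Y) / n)                         # gradient step
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0)  # soft-threshold (L1 prox)
    print((W != 0).mean())             # fraction of retained (sparse) weights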

NeurIPS Conference 2003 Conference Paper

Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles

  • Mark Girolami
  • Ata Kabán

To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual-specific and global models. This paper proposes a linear-time distributed model for finite state symbolic sequences representing traces of individual user activity, by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common, structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicate that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as providing low-complexity and intuitively interpretable representations.
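
Concretely, in a simplicial mixture each user has mixing proportions $\boldsymbol{\theta}$ in the simplex over $K$ shared behavioural patterns, here transition matrices $T_1, \dots, T_K$, and every transition in that user's trace $s_1, \dots, s_T$ mixes the patterns (our schematic reading of the model):

$$ p(s_1, \dots, s_T \mid \boldsymbol{\theta}) = p(s_1) \prod_{t=2}^{T} \sum_{k=1}^{K} \theta_k \, T_k(s_{t-1}, s_t), \qquad \boldsymbol{\theta} \in \Delta^{K-1}, $$

so individual heterogeneity lives only in the low-dimensional proportions $\boldsymbol{\theta}$, while the patterns themselves are shared across the group, which is what yields the compact, interpretable representation.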