Author name cluster

Markus Heinonen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers

2 author rows

ICML Conference 2025 Conference Paper

Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models

Rafal Karczewski
Markus Heinonen
Vikas K. Garg 0001

Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality images by mapping noise to a data distribution. However, recent findings suggest that image likelihood does not align with perceptual quality: high-likelihood samples tend to be smooth, while lower-likelihood ones are more detailed. Controlling sample density is thus crucial for balancing realism and detail. In this paper, we analyze an existing technique, Prior Guidance, which scales the latent code to influence image detail. We introduce score alignment, a condition that explains why this method works and show that it can be tractably checked for any continuous normalizing flow model. We then propose Density Guidance, a principled modification of the generative ODE that enables exact log-density control during sampling. Finally, we extend Density Guidance to stochastic sampling, ensuring precise log-density control while allowing controlled variation in structure or fine details. Our experiments demonstrate that these techniques provide fine-grained control over image detail without compromising sample quality. Code is available at https: //github. com/Aalto-QuML/density-guidance.

ICLR Conference 2025 Conference Paper

Diffusion Models as Cartoonists: The Curious Case of High Density Regions

Rafal Karczewski
Markus Heinonen
Vikas K. Garg 0001

We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost. Code is available at https://github.com/Aalto-QuML/high-density-diffusion

ICLR Conference 2025 Conference Paper

E(3)-equivariant models cannot learn chirality: Field-based molecular generation

Alexandru Dumitrescu
Dani Korpela
Markus Heinonen
Yogesh Verma
Valerii Iakovlev
Vikas K. Garg 0001
Harri Lähdesmäki

Obtaining the desired effect of drugs is highly dependent on their molecular geometries. Thus, the current prevailing paradigm focuses on 3D point-cloud atom representations, utilizing graph neural network (GNN) parametrizations, with rotational symmetries baked in via E(3) invariant layers. We prove that such models must necessarily disregard chirality, a geometric property of the molecules that cannot be superimposed on their mirror image by rotation and translation. Chirality plays a key role in determining drug safety and potency. To address this glaring issue, we introduce a novel field-based representation, proposing reference rotations that replace rotational symmetry constraints. The proposed model captures all molecular geometries including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics.

ICLR Conference 2025 Conference Paper

Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models

Najwa Laabid
Severi Rissanen
Markus Heinonen
Arno Solin
Vikas K. Garg 0001

Graph diffusion models, dominant in graph generative modeling, remain underexplored for graph-to-graph translation tasks like chemical reaction prediction. We demonstrate that standard permutation equivariant denoisers face fundamental limitations in these tasks due to their inability to break symmetries in noisy inputs. To address this, we propose \emph{aligning} input and target graphs to break input symmetries while preserving permutation equivariance in non-matching graph portions. Using retrosynthesis (i.e., the task of predicting precursors for synthesis of a given target molecule) as our application domain, we show how alignment dramatically improves discrete diffusion model performance from $5$\% to a SOTA-matching $54.7$\% top-1 accuracy. Code is available at https://github.com/Aalto-QuML/DiffAlign.

ICLR Conference 2025 Conference Paper

Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs

Severi Rissanen
Markus Heinonen
Arno Solin

The covariance for clean data given a noisy observation is an important quantity in many training-free guided generation methods for diffusion models. Current methods require heavy test-time computation, altering the standard diffusion training process or denoiser architecture, or making heavy approximations. We propose a new framework that sidesteps these issues by using covariance information that is available for free from training data and the curvature of the generative trajectory, which is linked to the covariance through the second-order Tweedie's formula. We integrate these sources of information using (i) a novel method to transfer covariance estimates across noise levels and (ii) low-rank updates in a given noise level. We validate the method on linear inverse problems, where it outperforms recent baselines, especially with fewer diffusion steps.

TMLR Journal 2025 Journal Article

Gaussian Processes with Bayesian Inference of Covariate Couplings

Mattia Rosso
Juho Ylä-Jääski
Zheyang Shen
Markus Heinonen
Maurizio Filippone

Gaussian processes are powerful probabilistic models that are often coupled with ARD capable of uncovering the importance of individual covariates. We develop covariances characterized by affine transformations of the inputs, formalized via a precision matrix between covariates, which can uncover covariate couplings for enhanced interpretability. We study a range of couplings priors from Wishart to Horseshoe and present fully Bayesian inference of such precision matrices within sparse Gaussian processes. We empirically demonstrate the efficacy and interpretability of this approach.

ICML Conference 2025 Conference Paper

Progressive Tempering Sampler with Diffusion

Severi Rissanen
Ruikang Ouyang
Jiajun He 0003
Wenlin Chen
Markus Heinonen
Arno Solin
José Miguel Hernández-Lobato

Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun—at considerable computational cost—whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.

ICLR Conference 2024 Conference Paper

ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs

Yogesh Verma
Markus Heinonen
Vikas K. Garg 0001

Climate and weather prediction traditionally relies on complex numerical simulations of atmospheric physics. Deep learning approaches, such as transformers, have recently challenged the simulation paradigm with complex network forecasts. However, they often act as data-driven black-box models that neglect the underlying physics and lack uncertainty quantification. We address these limitations with ClimODE, a spatiotemporal continuous-time process that implements a key principle of advection from statistical mechanics, namely, weather changes due to a spatial movement of quantities over time. ClimODE models precise weather evolution with value-conserving dynamics, learning global weather transport as a neural flow, which also enables estimating the uncertainty in predictions. Our approach outperforms existing data-driven methods in global and regional forecasting with an order of magnitude smaller parameterization, establishing a new state of the art.

NeurIPS Conference 2024 Conference Paper

Improving robustness to corruptions with multiplicative weight perturbations

Trung Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski

Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet and ImageNet) and neural network architectures (ResNet50, ViT-S/16, ViT-B/16) show that DAMP enhances model generalization performance in the presence of corruptions across different settings. Notably, DAMP is able to train a ViT-S/16 on ImageNet from scratch, reaching the top-1 error of 23. 7% which is comparable to ResNet50 without extensive data augmentations.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Input-gradient space particle inference for neural network ensembles

Trung Q. Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski

Deep Ensembles (DEs) demonstrate improved accuracy, calibration and robustness to perturbations over single neural networks partly due to their functional diversity. Particle-based variational inference (ParVI) methods enhance diversity by formalizing a repulsion term based on a network similarity kernel. However, weight-space repulsion is inefficient due to over-parameterization, while direct function-space repulsion has been found to produce little improvement over DEs. To sidestep these difficulties, we propose First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based on ParVI, which performs repulsion in the space of first-order input gradients. As input gradients uniquely characterize a function up to translation and are much smaller in dimension than the weights, this method guarantees that ensemble members are functionally different. Intuitively, diversifying the input gradients encourages each network to learn different features, which is expected to improve the robustness of an ensemble. Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations.

ICML Conference 2023 Conference Paper

AbODE: Ab initio antibody design using conjoined ODEs

Yogesh Verma
Markus Heinonen
Vikas K. Garg 0001

Antibodies are Y-shaped proteins that neutralize pathogens and constitute the core of our adaptive immune system. De novo generation of new antibodies that target specific antigens holds the key to accelerating vaccine discovery. However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates, some central challenges from multiple tasks including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions. Unlike existing approaches, AbODE uses a single round of full-shot decoding, and elicits continuous differential attention that encapsulates, and evolves with, latent interactions within the antibody as well as those involving the antigen. We unravel fundamental connections between AbODE and temporal networks as well as graph-matching networks. The proposed model significantly outperforms existing methods on standard metrics across benchmarks.

NeurIPS Conference 2023 Conference Paper

Continuous-Time Functional Diffusion Processes

Giulio Franzese
Giulio Corallo
Simone Rossi
Markus Heinonen
Maurizio Filippone
Pietro Michiardi

We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures, and that can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.

ICLR Conference 2023 Conference Paper

Generative Modelling with Inverse Heat Dissipation

Severi Rissanen
Markus Heinonen
Arno Solin

While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.

ICLR Conference 2023 Conference Paper

Latent Neural ODEs with Sparse Bayesian Multiple Shooting

Valerii Iakovlev
Çagatay Yildiz
Markus Heinonen
Harri Lähdesmäki

Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimized in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.

TMLR Journal 2023 Journal Article

Learning Energy Conserving Dynamics Efficiently with Hamiltonian Gaussian Processes

Magnus Ross
Markus Heinonen

Hamiltonian mechanics is one of the cornerstones of the natural sciences. Recently there has been significant interest in learning Hamiltonian systems in a free-form way directly from trajectory data. Previous methods have tackled the problem of learning from many short, low-noise trajectories, but learning from a small number of long, noisy trajectories, whilst accounting for model uncertainty has not been addressed. In this work, we present a Gaussian process model for Hamiltonian systems with efficient decoupled parameterisation, and introduce an energy-conserving shooting method that allows robust inference from both short and long trajectories. We demonstrate the method's success in learning Hamiltonian systems in various data settings.

TMLR Journal 2023 Journal Article

Learning representations that are closed-form Monge mapping optimal with application to domain adaptation

Oliver Struckmeier
Ievgen Redko
Anton Mallasto
Karol Arndt
Markus Heinonen
Ville Kyrki

Optimal transport (OT) is a powerful geometric tool used to compare and align probability measures following the least effort principle. Despite its widespread use in machine learning (ML), OT problem still bears its computational burden, while at the same time suffering from the curse of dimensionality for measures supported on general high-dimensional spaces. In this paper, we propose to tackle these challenges using representation learning. In particular, we seek to learn an embedding space such that the samples of the two input measures become alignable in it with a simple affine mapping that can be calculated efficiently in closed-form. We then show that such approach leads to results that are comparable to solving the original OT problem when applied to the transfer learning task on which many OT baselines where previously evaluated in both homogeneous and heterogeneous DA settings.

NeurIPS Conference 2023 Conference Paper

Learning Space-Time Continuous Latent Neural PDEs from Partially Observed States

Valerii Iakovlev
Markus Heinonen
Harri Lähdesmäki

We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes across various domains.

NeurIPS Conference 2022 Conference Paper

Modular Flows: Differential Molecular Generation

Yogesh Verma
Samuel Kaski
Markus Heinonen
Vikas Garg

Generating new molecules is fundamental to advancing critical applications such as drug discovery and material synthesis. Flows can generate molecules effectively by inverting the encoding process, however, existing flow models either require artifactual dequantization or specific node/edge orderings, lack desiderata such as permutation invariance, or induce discrepancy between encoding and decoding steps that necessitates post hoc validity correction. Inspired by graph PDEs, we circumvent these issues with novel continuous normalizing E(3)-equivariant flows, based on a system of coupled node ODEs, that repeatedly reconcile locally toward globally aligned densities. Our models can be cast as message passing temporal networks, and result in superlative density estimation and molecular generation. In particular, our generated samples achieve state of the art on both the standard QM9 and ZINC250K benchmarks.

ICML Conference 2022 Conference Paper

Tackling covariate shift with node-based Bayesian neural networks

Trung Q. Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski

Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently been introduced as scalable alternatives, which induce epistemic uncertainty by multiplying each hidden node with latent random variables, while learning a point-estimate of the weights. In this paper, we interpret these latent noise variables as implicit representations of simple and domain-agnostic data perturbations during training, producing BNNs that perform well under covariate shift due to input corruptions. We observe that the diversity of the implicit corruptions depends on the entropy of the latent variables, and propose a straightforward approach to increase the entropy of these variables during training. We evaluate the method on out-of-distribution image classification benchmarks, and show improved uncertainty estimation of node-based BNNs under covariate shift due to input perturbations. As a side effect, the method also provides robustness against noisy training labels.

UAI Conference 2022 Conference Paper

Variational multiple shooting for Bayesian ODEs with Gaussian processes

Pashupati Hegde
Çagatay Yildiz
Harri Lähdesmäki
Samuel Kaski
Markus Heinonen

Recent machine learning advances have proposed black-box estimation of \textit{unknown continuous-time system dynamics} directly from data. However, earlier works are based on approximative solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.

ICML Conference 2021 Conference Paper

Continuous-time Model-based Reinforcement Learning

Çagatay Yildiz
Markus Heinonen
Harri Lähdesmäki

Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, and can solve classic control problems in a sample-efficient manner.

NeurIPS Conference 2021 Conference Paper

De-randomizing MCMC dynamics with the diffusion Stein operator

Zheyang Shen
Markus Heinonen
Samuel Kaski

Approximate Bayesian inference estimates descriptors of an intractable target distribution - in essence, an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL-divergence via steepest descent in the Wasserstein space. Parallel to LD, Stein variational gradient descent (SVGD) similarly minimizes the KL, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizes the stochastic diffusion process. We propose de-randomized kernel-based particle samplers to all diffusion-based samplers known as MCMC dynamics. Following previous work in interpreting MCMC dynamics, we equip the Stein-Wasserstein space with a fiber-Riemannian Poisson structure, with the capacity of characterizing a fiber-gradient Hamiltonian flow that simulates MCMC dynamics. Such dynamics discretizes into generalized SVGD (GSVGD), a Stein-type deterministic particle sampler, with particle updates coinciding with applying the diffusion Stein operator to a kernel function. We demonstrate empirically that GSVGD can de-randomize complex MCMC dynamics, which combine the advantages of auxiliary momentum variables and Riemannian structure, while maintaining the high sample quality from an interacting particle system.

ICLR Conference 2021 Conference Paper

Learning continuous-time PDEs from sparse data with graph neural networks

Valerii Iakovlev
Markus Heinonen
Harri Lähdesmäki

The behavior of many dynamical systems follow complex, yet still unknown partial differential equations (PDEs). While several machine learning methods have been proposed to learn PDEs directly from data, previous methods are limited to discrete-time approximations or make the limiting assumption of the observations arriving at regular grids. We propose a general continuous-time differential model for dynamical systems whose governing equations are parameterized by message passing graph neural networks. The model admits arbitrary space and time discretizations, which removes constraints on the locations of observation points and time intervals between the observations. The model is trained with continuous-time adjoint method enabling efficient neural PDE inference. We demonstrate the model's ability to work with unstructured grids, arbitrary time steps, and noisy observations. We compare our method with existing approaches on several well-known physical systems that involve first and higher-order PDEs with state-of-the-art predictive performance.

NeurIPS Conference 2019 Conference Paper

ODE2VAE: Deep generative second order ODEs with Bayesian neural networks

Cagatay Yildiz
Markus Heinonen
Harri Lahdesmaki

We present Ordinary Differential Equation Variational Auto-Encoder (ODE2VAE), a latent second order ODE model for high-dimensional sequential data. Leveraging the advances in deep generative models, ODE2VAE can simultaneously learn the embedding of high dimensional trajectories and infer arbitrarily complex continuous-time latent dynamics. Our model explicitly decomposes the latent space into momentum and position components and solves a second order ODE system, which is in contrast to recurrent neural network (RNN) based time series models and recently proposed black-box ODE techniques. In order to account for uncertainty, we propose probabilistic latent ODE dynamics parameterized by deep Bayesian neural networks. We demonstrate our approach on motion capture, image rotation, and bouncing balls datasets. We achieve state-of-the-art performance in long term motion prediction and imputation tasks.

ICML Conference 2018 Conference Paper

Learning unknown ODE models with Gaussian processes

Markus Heinonen
Çagatay Yildiz
Henrik Mannerström
Jukka Intosalmi
Harri Lähdesmäki

In conventional ODE modelling coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model’s capabilities to infer dynamics from sparse data and to simulate the system forward into future.

UAI Conference 2018 Conference Paper

Variational zero-inflated Gaussian processes with sparse kernels

Pashupati Hegde
Markus Heinonen
Samuel Kaski

Zero-inflated datasets, which have an excess of zero outputs, are commonly encountered in problems such as climate or rare event modelling. Conventional machine learning approaches tend to overestimate the non-zeros leading to poor performance. We propose a novel model family of zero-inflated Gaussian processes (ZiGP) for such zero-inflated datasets, produced by sparse kernels through learning a latent probit Gaussian process that can zero out kernel rows and columns whenever the signal is absent. The ZiGPs are particularly useful for making the powerful Gaussian process networks more interpretable. We introduce sparse GP networks where variable-order latent modelling is achieved through sparse mixing signals. We derive the non-trivial stochastic variational inference tractably for scalable learning of the sparse kernels in both models. The novel output-sparse approach improves both prediction of zero-inflated data and interpretability of latent mixing models.

NeurIPS Conference 2017 Conference Paper

Non-Stationary Spectral Kernels

Sami Remes
Markus Heinonen
Samuel Kaski

We propose non-stationary spectral kernels for Gaussian process regression by modelling the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.