Arrow Research search

Author name cluster

Yongxin Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

ICML Conference 2025 Conference Paper

HyperIV: Real-time Implied Volatility Smoothing

  • Yongxin Yang
  • Wenqi Chen
  • Chao Shu
  • Timothy M. Hospedales

We propose HyperIV, a novel approach for real-time implied volatility smoothing that eliminates the need for traditional calibration procedures. Our method employs a hypernetwork to generate parameters for a compact neural network that constructs complete volatility surfaces within 2 milliseconds, using only 9 market observations. Moreover, the generated surfaces are guaranteed to be free of static arbitrage. Extensive experiments across 8 index options demonstrate that HyperIV achieves superior accuracy compared to existing methods while maintaining computational efficiency. The model also exhibits strong cross-asset generalization capabilities, indicating broader applicability across different market instruments. These key features – rapid adaptation to market conditions, guaranteed absence of arbitrage, and minimal data requirements – make HyperIV particularly valuable for real-time trading applications. We make code available at https: //github. com/qmfin/hyperiv.

NeurIPS Conference 2024 Conference Paper

Generating compositional scenes via Text-to-image RGBA Instance Generation

  • Alessandro Fontanella
  • Petru-Daniel Tudosiu
  • Yongxin Yang
  • Shifeng Zhang
  • Sarah Parisot

Text-to-image diffusion generative models can generate high quality images at the cost of tedious prompt engineering. Controllability can be improved by introducing layout conditioning, however existing methods lack layout editing ability and fine-grained control over object attributes. The concept of multi-layer generation holds great potential to address these limitations, however generating image instances concurrently to scene composition limits control over fine-grained object attributes, relative positioning in 3D space and scene manipulation abilities. In this work, we propose a novel multi-stage generation paradigm that is designed for fine-grained control, flexibility and interactivity. To ensure control over instance attributes, we devise a novel training paradigm to adapt a diffusion model to generate isolated scene components as RGBA images with transparency information. To build complex images, we employ these pre-generated instances and introduce a multi-layer composite generation process that smoothly assembles components in realistic scenes. Our experiments show that our RGBA diffusion model is capable of generating diverse and high quality instances with precise control over object attributes. Through multi-layer composition, we demonstrate that our approach allows to build and manipulate images from highly complex prompts with fine-grained control over object appearance and location, granting a higher degree of control than competing methods.

ICML Conference 2024 Conference Paper

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

  • Yongshuo Zong
  • Ondrej Bohdal
  • Tingyang Yu
  • Yongxin Yang
  • Timothy M. Hospedales

Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the underpinning LLM. To address this issue, we first curate a vision-language safe instruction-following dataset VLGuard covering various harmful categories. Our experiments demonstrate that integrating this dataset into standard vision-language fine-tuning or utilizing it for post-hoc fine-tuning effectively safety aligns VLLMs. This alignment is achieved with minimal impact on, or even enhancement of, the models’ helpfulness. The versatility of our safety fine-tuning dataset makes it a valuable resource for safety-testing existing VLLMs, training new models or safeguarding pre-trained VLLMs. Empirical results demonstrate that fine-tuned VLLMs effectively reject unsafe instructions and substantially reduce the success rates of several black-box adversarial attacks, which approach zero in many cases. The code and dataset will be open-sourced.

ICLR Conference 2023 Conference Paper

ChiroDiff: Modelling chirographic data with Diffusion Models

  • Ayan Das 0003
  • Yongxin Yang
  • Timothy M. Hospedales
  • Tao Xiang 0002
  • Yi-Zhe Song

Generative modelling over continuous-time geometric constructs, a.k.a $chirographic\ data$ such as handwriting, sketches, drawings etc., have been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization however falls short of capturing key properties of chirographic data -- it fails to build holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model-class namely Denoising\ Diffusion\ Probabilistic\ Models or DDPMs for chirographic data that specifically addresses these flaws. Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate up to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show some unique use-cases like stochastic vectorization, de-noising/healing, abstraction are also possible with this model-class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and found it to be better or on par with competing approaches.

ICLR Conference 2023 Conference Paper

MEDFAIR: Benchmarking Fairness for Medical Imaging

  • Yongshuo Zong
  • Yongxin Yang
  • Timothy M. Hospedales

A multitude of work has shown that machine learning-based medical diagnosis systems can be biased against certain subgroups of people. This has motivated a growing number of bias mitigation algorithms that aim to address fairness issues in machine learning. However, it is difficult to compare their effectiveness in medical imaging for two reasons. First, there is little consensus on the criteria to assess fairness. Second, existing bias mitigation algorithms are developed under different settings, e.g., datasets, model selection strategies, backbones, and fairness metrics, making a direct comparison and evaluation based on existing results impossible. In this work, we introduce MEDFAIR, a framework to benchmark the fairness of machine learning models for medical imaging. MEDFAIR covers eleven algorithms from various categories, ten datasets from different imaging modalities, and three model selection criteria. Through extensive experiments, we find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes; while in contrast, state-of-the-art bias mitigation algorithms do not significantly improve fairness outcomes over empirical risk minimization (ERM) in both in-distribution and out-of-distribution settings. We evaluate fairness from various perspectives and make recommendations for different medical application scenarios that require different ethical principles. Our framework provides a reproducible and easy-to-use entry point for the development and evaluation of future bias mitigation algorithms in deep learning. Code is available at https://github.com/ys-zong/MEDFAIR.

TMLR Journal 2023 Journal Article

Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error

  • Ondrej Bohdal
  • Yongxin Yang
  • Timothy Hospedales

Calibration of neural networks is a topical problem that is becoming more and more important as neural networks increasingly underpin real-world applications. The problem is especially noticeable when using modern neural networks, for which there is a significant difference between the confidence of the model and the probability of correct prediction. Various strategies have been proposed to improve calibration, yet accurate calibration remains challenging. We propose a novel framework with two contributions: introducing a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised, and a meta-learning framework that uses DECE to optimise for validation set calibration with respect to model hyper-parameters. The results show that we achieve competitive performance with existing calibration approaches. Our framework opens up a new avenue and toolset for tackling calibration, which we believe will inspire further work on this important challenge.

UAI Conference 2023 Conference Paper

Mixture of Normalizing Flows for European Option Pricing

  • Yongxin Yang
  • Timothy M. Hospedales

We present a mixture of normalizing flows (MoNF) approach to European option pricing with guarantees that its estimations are free from static arbitrage. In contrast to many existing methods that meet economic rationality constraints (e. g. , non-arbitrage) by introducing auxiliary losses, our solution meets those constraints exactly by design. To achieve this, we propose to build a model for risk neutral density using normalizing flows, which results in a pricing model, instead of modelling the option pricing function directly. First, we convert the constraints for direct pricing models to the constraints for models backed by risk neutral density estimation, then we design a specific NF architecture that meets these constraints. Furthermore, we find that employing a mixture of such normalizing flows improves the performance significantly, compared to using a deeper single NF. Finally, we present a mechanism to regularise the proposed model, and this regularisation can serve as a bridge between our method and any sample-based mathematical finance method. The evaluations on five option datasets show superiority of our method compared to mathematical finance solutions and some other neural networks based methods. The code is available at \url{https: //github. com/qmfin/MoNF}.

ICLR Conference 2022 Conference Paper

Augmented Sliced Wasserstein Distances

  • Xiongjie Chen
  • Yongxin Yang
  • Yunpeng Li 0004

While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost. The sliced Wasserstein distance and its variants improve the computational efficiency through the random projection, yet they suffer from low accuracy if the number of projections is not sufficiently large, because the majority of projections result in trivially small values. In this work, we propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs), constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks. It is derived from a key observation that (random) linear projections of samples residing on these hypersurfaces would translate to much more flexible nonlinear projections in the original sample space, so they can capture complex structures of the data distribution. We show that the hypersurfaces can be optimized by gradient ascent efficiently. We provide the condition under which the ASWD is a valid metric and show that this can be obtained by an injective neural network architecture. Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.

ICML Conference 2022 Conference Paper

Loss Function Learning for Domain Generalization by Implicit Gradient

  • Boyan Gao
  • Henry Gouk
  • Yongxin Yang
  • Timothy M. Hospedales

Generalising robustly to distribution shift is a major challenge that is pervasive across most real-world applications of machine learning. A recent study highlighted that many advanced algorithms proposed to tackle such domain generalisation (DG) fail to outperform a properly tuned empirical risk minimisation (ERM) baseline. We take a different approach, and explore the impact of the ERM loss function on out-of-domain generalisation. In particular, we introduce a novel meta-learning approach to loss function search based on implicit gradient. This enables us to discover a general purpose parametric loss function that provides a drop-in replacement for cross-entropy. Our loss can be used in standard training pipelines to efficiently train robust models using any neural architecture on new datasets. The results show that it clearly surpasses cross-entropy, enables simple ERM to outperform some more complicated prior DG methods, and provides state-of-the-art performance across a variety of DG benchmarks. Furthermore, unlike most existing DG approaches, our setup applies to the most practical setting of single-source domain generalisation, on which we show significant improvement.

IJCAI Conference 2022 Conference Paper

Residual Contrastive Learning for Image Reconstruction: Learning Transferable Representations from Noisy Images

  • Nanqing Dong
  • Matteo Maggioni
  • Yongxin Yang
  • Eduardo Pérez-Pellitero
  • Ales Leonardis
  • Steven McDonagh

This paper is concerned with contrastive learning (CL) for low-level image restoration and enhancement tasks. We propose a new label-efficient learning paradigm based on residuals, residual contrastive learning (RCL), and derive an unsupervised visual representation learning framework, suitable for low-level vision tasks with noisy inputs. While supervised image reconstruction aims to minimize residual terms directly, RCL alternatively builds a connection between residuals and CL by defining a novel instance discrimination pretext task, using residuals as the discriminative feature. Our formulation mitigates the severe task misalignment between instance discrimination pretext tasks and downstream image reconstruction tasks, present in existing CL frameworks. Experimentally, we find that RCL can learn robust and transferable representations that improve the performance of various downstream tasks, such as denoising and super resolution, in comparison with recent self-supervised methods designed specifically for noisy inputs. Additionally, our unsupervised pre-training can significantly reduce annotation costs whilst maintaining performance competitive with fully-supervised image reconstruction.

ICLR Conference 2022 Conference Paper

SketchODE: Learning neural sketch representation in continuous time

  • Ayan Das 0003
  • Yongxin Yang
  • Timothy M. Hospedales
  • Tao Xiang 0002
  • Yi-Zhe Song

Learning meaningful representations for chirographic drawing data such as sketches, handwriting, and flowcharts is a gateway for understanding and emulating human creative expression. Despite being inherently continuous-time data, existing works have treated these as discrete-time sequences, disregarding their true nature. In this work, we model such data as continuous-time functions and learn compact representations by virtue of Neural Ordinary Differential Equations. To this end, we introduce the first continuous-time Seq2Seq model and demonstrate some remarkable properties that set it apart from traditional discrete-time analogues. We also provide solutions for some practical challenges for such models, including introducing a family of parameterized ODE dynamics & continuous-time data augmentation particularly suitable for the task. Our models are validated on several datasets including VectorMNIST, DiDi and Quick, Draw!.

NeurIPS Conference 2022 Conference Paper

ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization

  • Qishi Dong
  • Awais Muhammad
  • Fengwei Zhou
  • Chuanlong Xie
  • Tianyang Hu
  • Yongxin Yang
  • Sung-Ho Bae
  • Zhenguo Li

Recent advances on large-scale pre-training have shown great potentials of leveraging a large set of Pre-Trained Models (PTMs) for improving Out-of-Distribution (OoD) generalization, for which the goal is to perform well on possible unseen domains after fine-tuning on multiple training domains. However, maximally exploiting a zoo of PTMs is challenging since fine-tuning all possible combinations of PTMs is computationally prohibitive while accurate selection of PTMs requires tackling the possible data distribution shift for OoD tasks. In this work, we propose ZooD, a paradigm for PTMs ranking and ensemble with feature selection. Our proposed metric ranks PTMs by quantifying inter-class discriminability and inter-domain stability of the features extracted by the PTMs in a leave-one-domain-out cross-validation manner. The top-K ranked models are then aggregated for the target OoD task. To avoid accumulating noise induced by model ensemble, we propose an efficient variational EM algorithm to select informative features. We evaluate our paradigm on a diverse model zoo consisting of 35 models for various OoD tasks and demonstrate: (i) model ranking is better correlated with fine-tuning ranking than previous methods and up to 9859x faster than brute-force fine-tuning; (ii) OoD generalization after model ensemble with feature selection outperforms the state-of-the-art methods and the accuracy on most challenging task DomainNet is improved from 46. 5\% to 50. 6\%. Furthermore, we provide the fine-tuning results of 35 PTMs on 7 OoD datasets, hoping to help the research of model zoo and OoD generalization. Code will be available at \href{https: //gitee. com/mindspore/models/tree/master/research/cv/zood}{https: //gitee. com/mindspore/models/tree/master/research/cv/zood}.

ICLR Conference 2021 Conference Paper

Domain Generalization with MixStyle

  • Kaiyang Zhou
  • Yongxin Yang
  • Yu Qiao 0001
  • Tao Xiang 0002

Though convolutional neural networks (CNNs) have demonstrated remarkable ability in learning discriminative features, they often generalize poorly to unseen domains. Domain generalization aims to address this problem by learning from a set of source domains a model that is generalizable to any unseen domain. In this paper, a novel approach is proposed based on probabilistically mixing instance-level feature statistics of training samples across source domains. Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs.~sketch images). Such style information is captured by the bottom layers of a CNN where our proposed style-mixing takes place. Mixing styles of training instances results in novel domains being synthesized implicitly, which increase the domain diversity of the source domains, and hence the generalizability of the trained model. MixStyle fits into mini-batch training perfectly and is extremely easy to implement. The effectiveness of MixStyle is demonstrated on a wide range of tasks including category classification, instance retrieval and reinforcement learning.

NeurIPS Conference 2021 Conference Paper

EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization

  • Ondrej Bohdal
  • Yongxin Yang
  • Timothy Hospedales

Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters. Nevertheless, existing approaches are relatively expensive as they need to compute second-order derivatives and store a longer computational graph. This cost prevents scaling them to larger network architectures. We present EvoGrad, a new approach to meta-learning that draws upon evolutionary techniques to more efficiently compute hypergradients. EvoGrad estimates hypergradient with respect to hyperparameters without calculating second-order gradients, or storing a longer computational graph, leading to significant improvements in efficiency. We evaluate EvoGrad on three substantial recent meta-learning applications, namely cross-domain few-shot learning with feature-wise transformations, noisy label learning with Meta-Weight-Net and low-resource cross-lingual learning with meta representation transformation. The results show that EvoGrad significantly improves efficiency and enables scaling meta-learning to bigger architectures such as from ResNet10 to ResNet34.

AAAI Conference 2021 Conference Paper

Simple and Effective Stochastic Neural Networks

  • Tianyuan Yu
  • Yongxin Yang
  • Da Li
  • Timothy Hospedales
  • Tao Xiang

Stochastic neural networks (SNNs) are currently topical, with several paradigms being actively investigated including dropout, Bayesian neural networks, variational information bottleneck (VIB) and noise regularized learning. These neural network variants impact several major considerations, including generalization, network compression, robustness against adversarial attack and label noise, and model calibration. However, many existing networks are complicated and expensive to train, and/or only address one or two of these practical considerations. In this paper we propose a simple and effective stochastic neural network (SE-SNN) architecture for discriminative learning by directly modeling activation uncertainty and encouraging high activation variability. Compared to existing SNNs, our SE-SNN is simpler to implement and faster to train, and produces state of the art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.

ICML Conference 2020 Conference Paper

A Tree-Structured Decoder for Image-to-Markup Generation

  • Jianshu Zhang 0001
  • Jun Du 0002
  • Yongxin Yang
  • Yi-Zhe Song
  • Si Wei
  • Lirong Dai 0001

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup. However, for tree-structured representational markup, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases, we then propose a tree-structured decoder that specifically aims at generating a tree-structured markup. Our decoders works sequentially, where at each step a child node and its parent node are simultaneously generated to form a sub-tree. This sub-tree is consequently used to construct the final tree structure in a recurrent manner. Key to the success of our tree decoder is twofold, (i) it strictly respects the parent-child relationship of trees, and (ii) it explicitly outputs trees as oppose to a linear string. Evaluated on both math formula recognition and chemical formula recognition, the proposed tree decoder is shown to greatly outperform strong string decoder baselines.

AAAI Conference 2020 Conference Paper

Deep Domain-Adversarial Image Generation for Domain Generalisation

  • Kaiyang Zhou
  • Yongxin Yang
  • Timothy Hospedales
  • Tao Xiang

Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset of different distribution. To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains. In this paper, we propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically, DDAIG consists of three components, namely a label classifier, a domain classi- fier and a domain transformation network (DoTNet). The goal for DoTNet is to map the source training data to unseen domains. This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classi- fied by the label classifier while fooling the domain classifier. By augmenting the source training data with the generated unseen domain data, we can make the label classifier more robust to unknown domain changes. Extensive experiments on four DG datasets demonstrate the effectiveness of our approach.

AAAI Conference 2020 Conference Paper

Index Tracking with Cardinality Constraints: A Stochastic Neural Networks Approach

  • Yu Zheng
  • Bowei Chen
  • Timothy M. Hospedales
  • Yongxin Yang

Partial (replication) index tracking is a popular passive investment strategy. It aims to replicate the performance of a given index by constructing a tracking portfolio which contains some constituents of the index. The tracking error optimisation is quadratic and NP-hard when taking the 0 constraint into account so it is usually solved by heuristic methods such as evolutionary algorithms. This paper introduces a simple, efficient and scalable connectionist model as an alternative. We propose a novel reparametrisation method and then solve the optimisation problem with stochastic neural networks. The proposed approach is examined with S&P 500 index data for more than 10 years and compared with widely used index tracking approaches such as forward and backward selection and the largest market capitalisation methods. The empirical results show our model achieves excellent performance. Compared with the benchmarked models, our model has the lowest tracking error, across a range of portfolio sizes. Meanwhile it offers comparable performance to the others on secondary criteria such as volatility, Sharpe ratio and maximum drawdown.

NeurIPS Conference 2020 Conference Paper

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

  • Wei Zhou
  • Yiying Li
  • Yongxin Yang
  • Huaimin Wang
  • Timothy Hospedales

Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible and augmented meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits to a variety of continuous control tasks when combined with contemporary OffP-AC methods DDPG, TD3 and SAC.

AAAI Conference 2019 Conference Paper

Disjoint Label Space Transfer Learning with Common Factorised Space

  • Xiaobin Chang
  • Yongxin Yang
  • Tao Xiang
  • Timothy M. Hospedales

In this paper, a unified approach is presented to transfer learning that addresses several source and target domain labelspace and annotation assumptions with a single model. It is particularly effective in handling a challenging case, where source and target label-spaces are disjoint, and outperforms alternatives in both unsupervised and semi-supervised settings. The key ingredient is a common representation termed Common Factorised Space. It is shared between source and target domains, and trained with an unsupervised factorisation loss and a graph-based loss. With a wide range of experiments, we demonstrate the flexibility, relevance and efficacy of our method, both in the challenging cases with disjoint label spaces, and in the more conventional cases such as unsupervised domain adaptation, where the source and target domains share the same label-sets.

ICML Conference 2019 Conference Paper

Feature-Critic Networks for Heterogeneous Domain Generalization

  • Yiying Li
  • Yongxin Yang
  • Wei Zhou
  • Timothy M. Hospedales

The well known domain shift issue causes model performance to degrade when deployed to a new target domain with different statistics to training. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalisation is the recently topical problem of learning a model that generalises to unseen domains out of the box, and various approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning to learn approach, where the auxiliary loss that helps generalisation is itself learned. Beyond conventional domain generalisation, we consider a more challenging setting of heterogeneous domain generalisation, where the unseen domains do not share label space with the seen ones, and the goal is to train a feature representation that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings.

AAAI Conference 2018 Conference Paper

Learning to Generalize: Meta-Learning for Domain Generalization

  • Da Li
  • Yongxin Yang
  • Yi-Zhe Song
  • Timothy Hospedales

Domain shift refers to the well known problem that a model trained in one source domain performs poorly when applied to a target domain with different statistics. Domain Generalization (DG) techniques attempt to alleviate this issue by producing models which by design generalize well to novel testing domains. We propose a novel meta-learning method for domain generalization. Rather than designing a specific model that is robust to domain shift as in most previous DG work, we propose a model agnostic training procedure for DG. Our algorithm simulates train/test domain shift during training by synthesizing virtual testing domains within each mini-batch. The meta-optimization objective requires that steps to improve training domain performance should also improve testing domain performance. This meta-learning procedure trains models with good generalization ability to novel domains. We evaluate our method and achieve state of the art results on a recent cross-domain image classification benchmark, as well demonstrating its potential on two classic reinforcement learning tasks.

AAAI Conference 2017 Conference Paper

Gated Neural Networks for Option Pricing: Rationality by Design

  • Yongxin Yang
  • Yu Zheng
  • Timothy Hospedales

We propose a neural network approach to price EU call options that significantly outperforms some existing pricing models and comes with guarantees that its predictions are economically reasonable. To achieve this, we introduce a class of gated neural networks that automatically learn to divide-and-conquer the problem space for robust and accurate pricing. We then derive instantiations of these networks that are ‘rational by design’ in terms of naturally encoding a valid call option surface that enforces no arbitrage principles. This integration of human insight within data-driven learning provides significantly better generalisation in pricing performance due to the encoded inductive bias in the learning, guarantees sanity in the model’s predictions, and provides econometrically useful byproduct such as risk neutral density.