Arrow Research search

Author name cluster

Mateja Jamnik

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

ICML Conference 2025 Conference Paper

Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

  • Mateo Espinosa Zarlenga
  • Gabriele Dominici
  • Pietro Barbiero
  • Zohreh Shams
  • Mateja Jamnik

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
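
As a concrete illustration of the intervention mechanism described above, here is a minimal numpy sketch (toy random weights and a hypothetical two-stage concept model, not the MixCEM implementation) of replacing a mispredicted concept with an expert-provided value before the task head runs.

    # Minimal sketch of a test-time concept intervention on a concept
    # bottleneck-style model (illustrative only; not the authors' code).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "trained" model: x -> concept probabilities -> task logits.
    W_c = rng.normal(size=(4, 3))   # input (4 features) -> 3 concepts
    W_y = rng.normal(size=(3, 2))   # 3 concepts -> 2 task classes

    def predict_concepts(x):
        return 1 / (1 + np.exp(-x @ W_c))    # sigmoid concept scores

    def predict_label(c):
        return np.argmax(c @ W_y)            # task head runs on concepts

    x = rng.normal(size=4)                   # one (possibly OOD) input
    c_hat = predict_concepts(x)

    # An expert corrects concept 1 ("stripes", say) to its true value.
    intervened = c_hat.copy()
    intervened[1] = 1.0                      # ground-truth concept value

    print("label before intervention:", predict_label(c_hat))
    print("label after  intervention:", predict_label(intervened))

Leakage poisoning, in these terms, is the failure of the second print to improve on the first when x is OOD, even though the corrected concept is now exact.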

TMLR Journal 2025 Journal Article

Do Concept Bottleneck Models Respect Localities?

  • Naveen Janaki Raman
  • Mateo Espinosa Zarlenga
  • Juyeon Heo
  • Mateja Jamnik

Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. These methods assume concept predictions can help understand a model's internal reasoning. In this work, we assess the degree to which such an assumption is true by analyzing whether concept predictors leverage "relevant" features to make predictions, a term we call locality. Concept-based models that fail to respect localities also fail to be explainable because concept predictions are based on spurious features, making the interpretation of the concept predictions vacuous. To assess whether concept-based models respect localities, we construct and use three metrics to characterize when models respect localities, complementing our analysis with theoretical results. Each of our metrics captures a different notion of perturbation and assesses whether perturbing "irrelevant" features impacts the predictions made by a concept predictor. We find that many concept-based models used in practice fail to respect localities because concept predictors cannot always clearly distinguish distinct concepts. Based on these findings, we propose suggestions for alleviating this issue.
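
The locality idea lends itself to a simple probe. The sketch below (a toy linear concept predictor with an assumed relevance mask; the paper's actual metrics are more refined) perturbs features assumed irrelevant to a concept and measures how far the concept prediction moves.

    # Illustrative locality probe, not the paper's exact metrics: a local
    # concept predictor should barely move when only features outside its
    # "relevant" set are perturbed.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=6)                  # toy linear concept predictor

    def concept_score(x):
        return 1 / (1 + np.exp(-x @ w))

    x = rng.normal(size=6)
    relevant = np.array([0, 1])             # features the concept "should" use
    irrelevant = np.setdiff1d(np.arange(6), relevant)

    base = concept_score(x)
    shifts = []
    for _ in range(100):
        x_pert = x.copy()
        x_pert[irrelevant] += rng.normal(size=irrelevant.size)
        shifts.append(abs(concept_score(x_pert) - base))

    # A large mean shift under irrelevant-feature perturbations suggests
    # the predictor violates locality.
    print("mean score shift:", np.mean(shifts))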

AAAI Conference 2025 Conference Paper

Measuring Cross-Modal Interactions in Multimodal Models

  • Laura Wenderoth
  • Konstantin Hemker
  • Nikola Simidjievski
  • Mateja Jamnik

Integrating AI in healthcare can greatly improve patient care and system efficiency. However, the lack of explainability in AI systems hinders their clinical adoption, especially in multimodal decision-making that combines various data sources. The majority of existing explainable AI (XAI) methods focus on unimodal models, which fail to capture cross-modal interactions that are crucial for understanding the combined impact of multiple data sources. Existing methods for quantifying cross-modal interactions are limited to two modalities, rely on labelled data, and depend on model performance, which is problematic in healthcare, where XAI must handle multiple data sources and provide individualised explanations. This paper introduces InterSHAP, a cross-modal interaction score that addresses the limitations of existing approaches. InterSHAP uses the Shapley interaction index to precisely separate and quantify the contributions of the individual modalities and their interactions without approximations. By integrating an open-source implementation with the SHAP package, we enhance reproducibility and ease of use. We show that InterSHAP accurately measures the presence of cross-modal interactions, can handle multiple modalities, and provides detailed explanations at a local level for individual data points. Furthermore, we apply InterSHAP to real medical multimodal datasets, and demonstrate its practical applicability for individualised explanations.
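
With exactly two modalities, the Shapley interaction index reduces to an inclusion-exclusion formula, which the toy sketch below computes by masking one modality at a time (zeroing is used as the mask here; the model and features are hypothetical, and InterSHAP generalises this to many modalities via the SHAP package).

    # Two-player Shapley interaction: I = f(both) - f(m1) - f(m2) + f(none).
    import numpy as np

    def model(img_feats, tab_feats):
        # Hypothetical fused model: the product term is a genuine
        # cross-modal interaction the score should pick up.
        return img_feats.sum() + tab_feats.sum() + img_feats[0] * tab_feats[0]

    img = np.array([0.5, 1.0])
    tab = np.array([2.0, -1.0])
    zero_img, zero_tab = np.zeros_like(img), np.zeros_like(tab)

    interaction = (model(img, tab)
                   - model(img, zero_tab)
                   - model(zero_img, tab)
                   + model(zero_img, zero_tab))
    print("cross-modal interaction:", interaction)   # 1.0 = 0.5 * 2.0

The additive terms cancel out exactly, so only the genuinely cross-modal part of the prediction survives in the score.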

ICLR Conference 2025 Conference Paper

Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

  • Konstantin Hemker
  • Nikola Simidjievski
  • Mateja Jamnik

Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a general-purpose fusion framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for any unimodal encoder that enforces shape consistency between modality representations. It harmonises these representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a model merging method which achieves competitive performance with end-to-end fusion models without any fine-tuning, 2) can operate on any unimodal encoder, and 3) is a model fusion method that, with minimal fine-tuning, surpasses all benchmarks in five out of seven datasets.
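
A loose, assumption-heavy illustration of frequency-domain merging follows (not MM-Lego's actual architecture, which wraps arbitrary encoders and learns these transforms): each modality's representation receives its own per-frequency filter before the spectra are summed, so the merged signal can retain modality-specific structure with less direct interference than naive feature averaging.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 128
    z_img = rng.normal(size=dim)            # output of an image encoder
    z_tab = rng.normal(size=dim)            # output of a tabular encoder

    n_freq = dim // 2 + 1
    h_img = rng.uniform(size=n_freq)        # per-frequency gains; learned in practice
    h_tab = rng.uniform(size=n_freq)

    # Filter each modality's spectrum, then merge and return to feature space.
    Z = h_img * np.fft.rfft(z_img) + h_tab * np.fft.rfft(z_tab)
    z_fused = np.fft.irfft(Z, n=dim)        # shared-shape fused representation
    print(z_fused.shape)                    # (128,)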

AAAI Conference 2025 Conference Paper

Neural Reasoning for Sure Through Constructing Explainable Models

  • Tiansi Dong
  • Mateja Jamnik
  • Pietro Liò

Neural networks remain black-box systems, unsure about their outputs, and their performance may drop unpredictably in real applications. An open question is how to qualitatively extend neural networks, so that they are sure about their reasoning results, or reasoning-for-sure. Here, we introduce set-theoretic relations explicitly and seamlessly into neural networks by extending vector embedding into sphere embedding, so that part-whole relations can explicitly encode set-theoretic relations through sphere boundaries in the vector space. A reasoning-for-sure neural network successfully constructs, within a constant number M of epochs, a sphere configuration as its semantic model for any consistent set-theoretic relation. We implement Hyperbolic Sphere Neural Network (HSphNN), the first reasoning-for-sure neural network for all types of Aristotelian syllogistic reasoning. Its construction process is realised as a sequence of neighbourhood transitions from the current towards the target configuration. We prove M=1 for HSphNN. In experiments, HSphNN achieves the symbolic level rigour of syllogistic reasoning and successfully checks both decisions and explanations of ChatGPT (gpt-3.5-turbo and gpt-4o) without errors. Through prompts, HSphNN improves the performance of gpt-3.5-turbo from 46.875% to 58.98%, and of gpt-4o from 82.42% to 84.76%. We show ways to extend HSphNN for various kinds of logical and Bayesian reasoning, and to integrate it with traditional neural networks seamlessly.
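
The following Euclidean sketch (HSphNN itself works with hyperbolic spheres and constructs configurations by training) shows how sphere boundaries can encode the set-theoretic relations behind syllogistic statements.

    # Spheres as sets: containment, disjointness and overlap of boundaries
    # encode "all", "no" and "some" statements (illustrative only).
    import numpy as np

    def relation(c1, r1, c2, r2):
        d = np.linalg.norm(c1 - c2)
        if d + r1 <= r2:
            return "S1 inside S2"           # all S1 are S2
        if d + r2 <= r1:
            return "S2 inside S1"           # all S2 are S1
        if d >= r1 + r2:
            return "disjoint"               # no S1 is S2
        return "partial overlap"            # some S1 are S2

    greeks = (np.array([0.0, 0.0]), 1.0)    # (centre, radius)
    humans = (np.array([0.5, 0.0]), 2.0)
    stones = (np.array([5.0, 0.0]), 1.0)

    print(relation(*greeks, *humans))       # S1 inside S2
    print(relation(*humans, *stones))       # disjoint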

ICML Conference 2025 Conference Paper

NMA-tune: Generating Highly Designable and Dynamics Aware Protein Backbones

  • Urszula Julia Komorowska
  • Francisco Vargas 0001
  • Alessandro Rondina
  • Pietro Liò
  • Mateja Jamnik

Protein’s backbone flexibility is a crucial property that heavily influences its functionality. Recent work in the field of protein diffusion probabilistic modelling has leveraged Normal Mode Analysis (NMA) and, for the first time, introduced information about large scale protein motion into the generative process. However, obtaining molecules with both the desired dynamics and designable quality has proven challenging. In this work, we present NMA-tune, a new method that introduces the dynamics information to the protein design stage. NMA-tune uses a trainable component to condition the backbone generation on the lowest normal mode of oscillation. We implement NMA-tune as a plug-and-play extension to RFdiffusion, show that the proportion of samples with high quality structure and the desired dynamics is improved as compared to other methods without the trainable component, and we show the presence of the targeted modes in the Molecular Dynamics simulations.

ICLR Conference 2024 Conference Paper

Dynamics-Informed Protein Design with Structure Conditioning

  • Urszula Julia Komorowska
  • Simon V. Mathis
  • Kieran Didi
  • Francisco Vargas 0001
  • Pietro Liò
  • Mateja Jamnik

Current protein generative models are able to design novel backbones with desired shapes or functional motifs. However, despite the importance of a protein’s dynamical properties for its function, conditioning on dynamical properties remains elusive. We present a new approach to protein generative modeling by leveraging Normal Mode Analysis that enables us to capture dynamical properties too. We introduce a method for conditioning the diffusion probabilistic models on protein dynamics, specifically on the lowest non-trivial normal mode of oscillation. Our method, similar to the classifier guidance conditioning, formulates the sampling process as being driven by conditional and unconditional terms. However, unlike previous works, we approximate the conditional term with a simple analytical function rather than an external neural network, thus making the eigenvector calculations approachable. We present the corresponding SDE theory as a formal justification of our approach. We extend our framework to conditioning on structure and dynamics at the same time, enabling scaffolding of the dynamical motifs. We demonstrate the empirical effectiveness of our method by turning the open-source unconditional protein diffusion model Genie into the conditional model with no retraining. Generated proteins exhibit the desired dynamical and structural properties while still being biologically plausible. Our work represents a first step towards incorporating dynamical behaviour in protein design and may open the door to designing more flexible and functional proteins in the future.

NeurIPS Conference 2024 Conference Paper

End-to-End Ontology Learning with Large Language Models

  • Andy Lo
  • Albert Q. Jiang
  • Wenda Li
  • Mateja Jamnik

Ontologies are useful for automatic machine processing of domain knowledge as they represent it in a structured format. Yet, constructing ontologies requires substantial manual effort. To automate part of this process, large language models (LLMs) have been applied to solve various subtasks of ontology learning. However, this partial ontology learning does not capture the interactions between subtasks. We address this gap by introducing OLLM, a general and scalable method for building the taxonomic backbone of an ontology from scratch. Rather than focusing on subtasks, like individual relations between entities, we model entire subcomponents of the target ontology by finetuning an LLM with a custom regulariser that reduces overfitting on high-frequency concepts. We introduce a novel suite of metrics for evaluating the quality of the generated ontology by measuring its semantic and structural similarity to the ground truth. In contrast to standard metrics, our metrics use deep learning techniques to define more robust distance measures between graphs. Both our quantitative and qualitative results on Wikipedia show that OLLM outperforms subtask composition methods, producing more semantically accurate ontologies while maintaining structural integrity. We further demonstrate that our model can be effectively adapted to new domains, like arXiv, needing only a small number of training examples. Our source code and datasets are available at https://github.com/andylolu2/ollm.

TMLR Journal 2024 Journal Article

GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data

  • Andrei Margeloiu
  • Nikola Simidjievski
  • Pietro Lio
  • Mateja Jamnik

Neural networks often struggle with high-dimensional but small sample-size tabular datasets. One reason is that current weight initialisation methods assume independence between weights, which can be problematic when there are insufficient samples to estimate the model's parameters accurately. In such small data scenarios, leveraging additional structures can improve the model's performance and training stability. To address this, we propose GCondNet, a general approach to enhance neural networks by leveraging implicit structures present in tabular data. We create a graph between samples for each data dimension, and utilise Graph Neural Networks (GNNs) to extract this implicit structure and to condition the parameters of the first layer of an underlying predictor network. By creating many small graphs, GCondNet exploits the data's high-dimensionality, and thus improves the performance of an underlying predictor network. We demonstrate GCondNet's effectiveness on 12 real-world datasets, where it outperforms 14 standard and state-of-the-art methods. The results show that GCondNet is a versatile framework for injecting graph-regularisation into various types of neural networks, including MLPs and tabular Transformers. The code is available at https://github.com/andreimargeloiu/GCondNet.
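
The sketch below gives a crude, assumption-laden rendering of the idea (nearest-neighbour graphs over samples, with one mean-aggregation step standing in for a trained GNN): each feature's graph yields one column of the predictor's first-layer weight matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))          # 40 samples, 10 features
    hidden = 8
    v = rng.normal(size=hidden)            # shared node-feature projection

    def feature_graph_embedding(col):
        # Nodes are samples; edges link samples with similar values of
        # this feature (neighbours in sorted order: a crude kNN graph).
        order = np.argsort(col)
        emb = np.outer(col, v)             # node features: (n_samples, hidden)
        smoothed = emb.copy()
        for pos, idx in enumerate(order):  # one mean-aggregation step
            nbrs = order[max(0, pos - 1): pos + 2]
            smoothed[idx] = emb[nbrs].mean(axis=0)
        return smoothed.mean(axis=0)       # graph-level readout: (hidden,)

    # One column of the first-layer weight matrix per data dimension.
    W1 = np.stack([feature_graph_embedding(X[:, j]) for j in range(X.shape[1])],
                  axis=1)
    print(W1.shape)                        # (8, 10): conditions the MLP's first layer

In the real method the GNN and the predictor are trained jointly; the point of the sketch is only the data flow from per-feature sample graphs to first-layer parameters.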

AAAI Conference 2024 System Paper

Generation of Visual Representations for Multi-Modal Mathematical Knowledge

  • Lianlong Wu
  • Seewon Choi
  • Daniel Raggi
  • Aaron Stockdill
  • Grecia Garcia Garcia
  • Fiorenzo Colarusso
  • Peter C.H. Cheng
  • Mateja Jamnik

In this paper we introduce MaRE, a tool designed to generate representations in multiple modalities for a given mathematical problem while ensuring the correctness and interpretability of the transformations between different representations. The theoretical foundation for this tool is Representational Systems Theory (RST), a mathematical framework for studying the structure and transformations of representations. In MaRE’s web front-end user interface, a set of probability equations in Bayesian Notation can be rigorously transformed into Area Diagrams, Contingency Tables, and Probability Trees with just one click, utilising a back-end engine based on RST. At the same time, MaRE produces a table of the cognitive costs that a representation places on a particular user profile, based on the cognitive Representational Interpretive Structure Theory (RIST). MaRE is general and domain independent, applicable to other representations encoded in RST. It may enhance mathematical education and research, facilitating multi-modal knowledge representation and discovery.

NeurIPS Conference 2024 Conference Paper

HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data

  • Konstantin Hemker
  • Nikola Simidjievski
  • Mateja Jamnik

Technological advances in medical data collection, such as high-throughput genomic sequencing and digital high-resolution histopathology, have contributed to the rising requirement for multimodal biomedical modelling, specifically for image, tabular and graph data. Most multimodal deep learning approaches use modality-specific architectures that are often trained separately and cannot capture the crucial cross-modal information that motivates the integration of different data sources. This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet) – a flexible multimodal fusion architecture, which: a) preserves modality-specific structural information, b) captures the cross-modal interactions and structural information in a shared latent space, c) can effectively handle missing modalities during training and inference, and d) enables intuitive model inspection by learning on the raw data input instead of opaque embeddings. We conduct multimodal survival analysis on Whole Slide Images and Multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models, substantially improving over unimodal and multimodal baselines whilst being robust in scenarios with missing modalities. The code is available at https://github.com/konst-int-i/healnet.

NeurIPS Conference 2024 Conference Paper

Multi-language Diversity Benefits Autoformalization

  • Albert Q. Jiang
  • Wenda Li
  • Mateja Jamnik

Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Progress in autoformalization research is hindered by the lack of a sizeable dataset consisting of informal-formal pairs expressing the same essence. Existing methods tend to circumvent this challenge by manually curating small corpora or using few-shot learning with large language models. But these methods suffer from data scarcity and formal language acquisition difficulty. In this work, we create mma, a large, flexible, multi-language, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. Experiments show that language models fine-tuned on mma can produce up to $29$-$31\%$ of statements acceptable with minimal corrections on the miniF2F and ProofNet benchmarks, up from $0\%$ with the base model. We demonstrate that fine-tuning on multi-language formal data results in more capable autoformalization models even on single-language tasks.

ICML Conference 2024 Conference Paper

ProtoGate: Prototype-based Neural Networks with Global-to-local Feature Selection for Tabular Biomedical Data

  • Xiangjian Jiang
  • Andrei Margeloiu
  • Nikola Simidjievski
  • Mateja Jamnik

Tabular biomedical data poses challenges in machine learning because it is often high-dimensional and typically low-sample-size (HDLSS). Previous research has attempted to address these challenges via local feature selection, but existing approaches often fail to achieve optimal performance due to their limitation in identifying globally important features and their susceptibility to the co-adaptation problem. In this paper, we propose ProtoGate, a prototype-based neural model for feature selection on HDLSS data. ProtoGate first selects instance-wise features via adaptively balancing global and local feature selection. Furthermore, ProtoGate employs a non-parametric prototype-based prediction mechanism to tackle the co-adaptation problem, ensuring the feature selection results and predictions are consistent with underlying data clusters. We conduct comprehensive experiments to evaluate the performance and interpretability of ProtoGate on synthetic and real-world datasets. The results show that ProtoGate generally outperforms state-of-the-art methods in prediction accuracy by a clear margin while providing high-fidelity feature selection and explainable predictions. Code is available at https://github.com/SilenceX12138/ProtoGate.

NeurIPS Conference 2024 Conference Paper

Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe

  • Albert Q. Jiang
  • Alicja Ziarko
  • Bartosz Piotrowski
  • Wenda Li
  • Mateja Jamnik
  • Piotr Miłoś

Text embeddings are essential for tasks such as document retrieval, clustering, and semantic similarity assessment. In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pretrained decoder-only language models. Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels. The resulting recipe, which we obtain through extensive experiments, can be used by practitioners to make informed design choices for their embedding models. Specifically, our findings suggest that full fine-tuning and Low-Rank Adaptation fine-tuning produce optimal models at lower and higher computational budgets respectively.

NeurIPS Conference 2024 Conference Paper

TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models

  • Andrei Margeloiu
  • Xiangjian Jiang
  • Nikola Simidjievski
  • Mateja Jamnik

Data collection is often difficult in critical fields such as medicine, physics, and chemistry, yielding typically only small tabular datasets. However, classification methods tend to struggle with these small datasets, leading to poor predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to improve downstream tabular classification performance. However, current tabular generative methods that learn either the joint distribution $ p(\mathbf{x}, y) $ or the class-conditional distribution $ p(\mathbf{x} \mid y) $ often overfit on small datasets, resulting in poor-quality synthetic data, usually worsening classification performance compared to using real data alone. To solve these challenges, we introduce TabEBM, a novel class-conditional generative method using Energy-Based Models (EBMs). Unlike existing tabular methods that use a shared model to approximate all class-conditional densities, our key innovation is to create distinct EBM generative models for each class, each modelling its class-specific data distribution individually. This approach creates robust energy landscapes, even in ambiguous class distributions. Our experiments show that TabEBM generates synthetic data with higher quality and better statistical fidelity than existing methods. When used for data augmentation, our synthetic data consistently leads to improved classification performance across diverse datasets of various sizes, especially small ones. Code is available at https://github.com/andreimargeloiu/TabEBM.
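
A toy rendering of the per-class EBM idea (untrained energy networks and a bare Langevin sampler, not the released TabEBM code) looks like this.

    # One small energy network per class; Langevin dynamics draws
    # class-conditional synthetic rows for augmentation.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def make_energy_net(dim):
        return nn.Sequential(nn.Linear(dim, 32), nn.SiLU(), nn.Linear(32, 1))

    def sample_sgld(energy, n, dim, steps=100, step_size=0.01):
        x = torch.randn(n, dim)
        for _ in range(steps):
            x = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(energy(x).sum(), x)[0]
            # Gradient step on the energy plus Gaussian noise.
            x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
        return x.detach()

    dim, classes = 5, 3
    # In practice each net is fit to its own class's rows first.
    energy_nets = {c: make_energy_net(dim) for c in range(classes)}

    synthetic = sample_sgld(energy_nets[1], n=16, dim=dim)  # rows for class 1
    print(synthetic.shape)                                  # torch.Size([16, 5])

The point of the distinct-models design is that each energy landscape is shaped by one class only, so overlapping classes cannot blur each other's densities.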

ICML Conference 2024 Conference Paper

Understanding Inter-Concept Relationships in Concept-Based Models

  • Naveen Janaki Raman
  • Mateo Espinosa Zarlenga
  • Mateja Jamnik

Concept-based explainability methods provide insight into deep learning systems by constructing explanations using human-understandable concepts. While the literature on human reasoning demonstrates that we exploit relationships between concepts when solving tasks, it is unclear whether concept-based methods incorporate the rich structure of inter-concept relationships. We analyse the concept representations learnt by concept-based models to understand whether these models correctly capture inter-concept relationships. First, we empirically demonstrate that state-of-the-art concept-based models produce representations that lack stability and robustness, and such methods fail to capture inter-concept relationships. Then, we develop a novel algorithm which leverages inter-concept relationships to improve concept intervention accuracy, demonstrating how correctly capturing inter-concept relationships can improve downstream tasks.

ICLR Conference 2023 Conference Paper

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

  • Albert Q. Jiang
  • Sean Welleck
  • Jin Peng Zhou
  • Timothée Lacroix
  • Jiacheng Liu 0010
  • Wenda Li
  • Mateja Jamnik
  • Guillaume Lample

The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from $20.9\%$ to $39.3\%$ on a collection of mathematical competition problems.
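
Schematically, the pipeline can be written as three calls. The two language-model calls and the automated-prover call below are hypothetical stubs: the paper uses large language models for drafting and sketching, and off-the-shelf automated provers to close the open sub-goals.

    def llm(prompt: str) -> str:                 # stub for a language-model call
        raise NotImplementedError

    def automated_prover(subgoal: str) -> bool:  # stub for an automated prover
        raise NotImplementedError

    def draft_sketch_prove(informal_statement: str, formal_statement: str) -> bool:
        # 1) Draft: write (or take a human-written) informal proof.
        informal_proof = llm(f"Prove informally:\n{informal_statement}")
        # 2) Sketch: translate it into a formal proof skeleton whose hard
        #    steps are left as open sub-goals (holes).
        sketch = llm("Translate to a formal proof sketch with holes:\n"
                     f"{formal_statement}\n{informal_proof}")
        holes = [line for line in sketch.splitlines() if "<hole>" in line]
        # 3) Prove: let the automated prover close each remaining hole.
        return all(automated_prover(h) for h in holes)

The sketch stage is what directs the prover's search: each hole is a smaller, easier sub-problem than the original conjecture.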

FSCD Conference 2023 Conference Paper

How Can We Make Trustworthy AI? (Invited Talk)

  • Mateja Jamnik

Not too long ago most headlines talked about our fear of AI. Today, AI is ubiquitous, and the conversation has moved on from whether we should use AI to how we can trust the AI systems that we use in our daily lives. In this talk I look at some key technical ingredients that help us build confidence and trust in using intelligent technology. I argue that intuitiveness, interaction, explainability and inclusion of human domain knowledge are essential in building this trust. I present some of the techniques and methods we are building for making AI systems that think and interact with humans in more intuitive and personalised ways, enabling humans to better understand the solutions produced by machines, and enabling machines to incorporate human domain knowledge in their reasoning and learning processes.

ICML Conference 2023 Conference Paper

Interpretable Neural-Symbolic Concept Reasoning

  • Pietro Barbiero
  • Gabriele Ciravegna
  • Francesco Giannini
  • Mateo Espinosa Zarlenga
  • Lucie Charlotte Magister
  • Alberto Tonda
  • Pietro Liò
  • Frédéric Precioso
  • Mateja Jamnik

Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks, (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii) facilitates the generation of counterfactual examples providing the learnt rules as guidance.
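
Executing one DCR-style rule on concept truth degrees is simple to sketch (illustrative only; in DCR the per-concept relevance and polarity are generated by neural networks operating on concept embeddings).

    # Fuzzy-logic execution of a rule over concept truth degrees.
    import numpy as np

    truth = np.array([0.9, 0.1, 0.8])       # concept truth degrees in [0, 1]
    relevance = np.array([1.0, 0.0, 1.0])   # which concepts the rule uses
    polarity = np.array([1.0, 1.0, 0.0])    # 1 = positive literal, 0 = negated

    # Literal value: t if positive, (1 - t) if negated.
    literals = polarity * truth + (1 - polarity) * (1 - truth)
    # Irrelevant concepts are neutral under a product t-norm (value 1).
    literals = relevance * literals + (1 - relevance)
    score = literals.prod()                 # fuzzy AND -> rule truth degree

    print(score)   # 0.9 * 1 * (1 - 0.8) = 0.18 for the rule "c0 AND NOT c2"

Because the rule is read off the relevance and polarity values, the prediction comes with its own symbolic explanation.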

NeurIPS Conference 2023 Conference Paper

Learning to Receive Help: Intervention-Aware Concept Embedding Models

  • Mateo Espinosa Zarlenga
  • Katie Collins
  • Krishnamurthy Dvijotham
  • Adrian Weller
  • Zohreh Shams
  • Mateja Jamnik

Concept Bottleneck Models (CBMs) tackle the opacity of neural architectures by constructing and explaining their predictions using a set of high-level concepts. A special property of these models is that they permit concept interventions, wherein users can correct mispredicted concepts and thus improve the model's performance. Recent work, however, has shown that intervention efficacy can be highly dependent on the order in which concepts are intervened on and on the model's architecture and training hyperparameters. We argue that this is rooted in a CBM's lack of train-time incentives for the model to be appropriately receptive to concept interventions. To address this, we propose Intervention-aware Concept Embedding models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. Our model learns a concept intervention policy in an end-to-end fashion from which it can sample meaningful intervention trajectories at train-time. This conditions IntCEMs to effectively select and receive concept interventions when deployed at test-time. Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach.

TMLR Journal 2023 Journal Article

TabCBM: Concept-based Interpretable Neural Networks for Tabular Data

  • Mateo Espinosa Zarlenga
  • Zohreh Shams
  • Michael Edward Nelson
  • Been Kim
  • Mateja Jamnik

Concept-based interpretability addresses the opacity of deep neural networks by constructing an explanation for a model's prediction using high-level units of information referred to as concepts. Research in this area, however, has been mainly focused on image and graph-structured data, leaving high-stakes tasks whose data is tabular out of reach of existing methods. In this paper, we address this gap by introducing the first definition of what a high-level concept may entail in tabular data. We use this definition to propose Tabular Concept Bottleneck Models (TabCBMs), a family of interpretable self-explaining neural architectures capable of learning high-level concept explanations for tabular tasks. As our method produces concept-based explanations both when partial concept supervision or no concept supervision is available at training time, it is adaptable to settings where concept annotations are missing. We evaluate our method in both synthetic and real-world tabular tasks and show that TabCBM outperforms or performs competitively compared to state-of-the-art methods, while providing a high level of interpretability as measured by its ability to discover known high-level concepts. Finally, we show that TabCBM can discover important high-level concepts in synthetic datasets inspired by critical tabular tasks (e.g., single-cell RNAseq) and allows for human-in-the-loop concept interventions in which an expert can identify and correct mispredicted concepts to boost the model's performance.

AAAI Conference 2023 Conference Paper

Towards Robust Metrics for Concept Representation Evaluation

  • Mateo Espinosa Zarlenga
  • Pietro Barbiero
  • Zohreh Shams
  • Dmitry Kazhdan
  • Umang Bhatt
  • Adrian Weller
  • Mateja Jamnik

Recent work on interpretability has focused on concept-based explanations, where deep learning models are explained in terms of high-level units of information, referred to as concepts. Concept learning models, however, have been shown to be prone to encoding impurities in their representations, failing to fully capture meaningful features of their inputs. While concept learning lacks metrics to measure such phenomena, the field of disentanglement learning has explored the related notion of underlying factors of variation in the data, with plenty of metrics to measure the purity of such factors. In this paper, we show that such metrics are not appropriate for concept learning and propose novel metrics for evaluating the purity of concept representations in both approaches. We show the advantage of these metrics over existing ones and demonstrate their utility in evaluating the robustness of concept representations and interventions performed on them. In addition, we show their utility for benchmarking state-of-the-art methods from both families and find that, contrary to common assumptions, supervision alone may not be sufficient for pure concept representations.
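
One way to see what such a purity metric can capture, sketched under strong simplifications (this is not the paper's exact definition): if another concept's label is predictable from the representation learned for a given concept, that representation is impure, i.e. it encodes information beyond its own concept.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 1000
    c_j = rng.integers(0, 2, n)                    # ground-truth labels of concept j
    # Representation learned for concept i; its second dimension leaks
    # concept j by construction, so the probe should detect impurity.
    rep_i = np.column_stack([rng.normal(size=n),
                             c_j + 0.5 * rng.normal(size=n)])

    r_tr, r_te, y_tr, y_te = train_test_split(rep_i, c_j, random_state=0)
    clf = LogisticRegression().fit(r_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(r_te)[:, 1])
    print("impurity AUC (0.5 would be pure):", round(auc, 3))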

AAAI Conference 2023 Conference Paper

Weight Predictor Network with Feature Selection for Small Sample Tabular Biomedical Data

  • Andrei Margeloiu
  • Nikola Simidjievski
  • Pietro Liò
  • Mateja Jamnik

Tabular biomedical data is often high-dimensional but has a very small number of samples. Although recent work showed that well-regularised simple neural networks could outperform more sophisticated architectures on tabular data, they are still prone to overfitting on tiny datasets with many potentially irrelevant features. To combat these issues, we propose Weight Predictor Network with Feature Selection (WPFS) for learning neural networks from high-dimensional and small sample data by reducing the number of learnable parameters and simultaneously performing feature selection. In addition to the classification network, WPFS uses two small auxiliary networks that together output the weights of the first layer of the classification model. We evaluate on nine real-world biomedical datasets and demonstrate that WPFS outperforms other standard as well as more recent methods typically applied to tabular data. Furthermore, we investigate the proposed feature selection mechanism and show that it improves performance while providing useful insights into the learning task.

NeurIPS Conference 2022 Conference Paper

Autoformalization with Large Language Models

  • Yuhuai Wu
  • Albert Qiaochu Jiang
  • Wenda Li
  • Markus Rabe
  • Charles Staats
  • Mateja Jamnik
  • Christian Szegedy

Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we show large language models provide new prospects towards this goal. We make the surprising observation that LLMs can perfectly translate a significant portion ($25.3\%$) of mathematical competition problems to formal specifications in Isabelle/HOL. We demonstrate the usefulness of this process by improving a previously introduced neural theorem prover via training on these autoformalized theorems. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from $29.6\%$ to $35.2\%$.

NeurIPS Conference 2022 Conference Paper

Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off

  • Mateo Espinosa Zarlenga
  • Pietro Barbiero
  • Gabriele Ciravegna
  • Giuseppe Marra
  • Francesco Giannini
  • Michelangelo Diligenti
  • Zohreh Shams
  • Frederic Precioso
  • Mateja Jamnik

Deploying AI-powered systems requires trustworthy models supporting effective human interactions, going beyond raw prediction accuracy. Concept bottleneck models promote trustworthiness by conditioning classification tasks on an intermediate level of human-like concepts. This enables human interventions which can correct mispredicted concepts to improve the model's performance. However, existing concept bottleneck models are unable to find optimal compromises between high task accuracy, robust concept-based explanations, and effective interventions on concepts---particularly in real-world conditions where complete and accurate concept supervisions are scarce. To address this, we propose Concept Embedding Models, a novel family of concept bottleneck models which goes beyond the current accuracy-vs-interpretability trade-off by learning interpretable high-dimensional concept representations. Our experiments demonstrate that Concept Embedding Models (1) attain better or competitive task accuracy w.r.t. standard neural models without concepts, (2) provide concept representations capturing meaningful semantics including and beyond their ground truth labels, (3) support test-time concept interventions whose effect in test accuracy surpasses that in standard concept bottleneck models, and (4) scale to real-world conditions where complete concept supervisions are scarce.
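
The core mixing mechanism is compact enough to sketch (consistent with the paper's description, though the surrounding architecture is omitted): each concept has an "active" and an "inactive" embedding, mixed by the predicted concept probability, and intervening simply sets that probability to the ground truth.

    import torch

    torch.manual_seed(0)
    n_concepts, emb_dim = 3, 4
    e_pos = torch.randn(n_concepts, emb_dim)   # "concept active" embeddings
    e_neg = torch.randn(n_concepts, emb_dim)   # "concept inactive" embeddings
    p = torch.tensor([0.9, 0.2, 0.6])          # predicted concept probabilities

    mixed = p[:, None] * e_pos + (1 - p[:, None]) * e_neg   # (3, 4)
    bottleneck = mixed.reshape(-1)             # fed to the label predictor

    # Test-time intervention on concept 1: replace p[1] with its true value.
    p_int = p.clone()
    p_int[1] = 1.0
    mixed_int = p_int[:, None] * e_pos + (1 - p_int[:, None]) * e_neg
    print(bottleneck.shape, mixed_int.shape)   # torch.Size([12]) torch.Size([3, 4])

Because the bottleneck stays high-dimensional even after an intervention, task accuracy need not collapse to what the binary concepts alone could support.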

AAAI Conference 2022 Short Paper

On the Relation between Distributionally Robust Optimization and Data Curation (Student Abstract)

  • Agnieszka Słowik
  • Léon Bottou
  • Sean B. Holden
  • Mateja Jamnik

Machine learning systems based on minimizing average error have been shown to perform inconsistently across important subsets of the data, and this defect is not exposed by a low average error for the entire dataset. In some social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender, ethnic and other groups. Distributionally Robust Optimization (DRO) attempts to address this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on a weighted training dataset. A practical implication of our results is that neither DRO nor curation of the training set represent a complete solution for bias mitigation.
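
A tiny numeric illustration of the relation in question: the worst-group risk minimised by group DRO equals the largest risk attainable by re-weighting the training distribution over groups.

    import numpy as np

    group_losses = np.array([0.2, 0.9, 0.4])   # average loss per subpopulation

    dro_objective = group_losses.max()          # worst expected risk: 0.9

    # Any weighting w (w >= 0, sum w = 1) gives a weighted average risk
    # <= max; the max is attained by putting all weight on the worst group.
    w = np.array([0.0, 1.0, 0.0])
    assert np.isclose(w @ group_losses, dro_objective)

    # A curated (re-weighted) dataset fixes one w in advance, so it can
    # only match DRO if it happens to up-weight the worst-off group.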

NeurIPS Conference 2022 Conference Paper

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

  • Albert Qiaochu Jiang
  • Wenda Li
  • Szymon Tworkowski
  • Konrad Czechowski
  • Tomasz Odrzygóźdź
  • Piotr Miłoś
  • Yuhuai Wu
  • Mateja Jamnik

In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.

ICLR Conference 2020 Conference Paper

Abstract Diagrammatic Reasoning with Multiplex Graph Networks

  • Duo Wang
  • Mateja Jamnik
  • Pietro Liò

Abstract reasoning, particularly in the visual domain, is a complex human ability, but it remains a challenging problem for artificial neural learning systems. In this work we propose MXGNet, a multilayer graph neural network for multi-panel diagrammatic reasoning tasks. MXGNet combines three powerful concepts, namely, object-level representation, graph neural networks and multiplex graphs, for solving visual reasoning tasks. MXGNet first extracts object-level representations for each element in all panels of the diagrams, and then forms a multi-layer multiplex graph capturing multiple relations between objects across different diagram panels. MXGNet summarises the multiple graphs extracted from the diagrams of the task, and uses this summarisation to pick the most probable answer from the given candidates. We have tested MXGNet on two types of diagrammatic reasoning tasks, namely Diagram Syllogisms and Raven Progressive Matrices (RPM). For an Euler Diagram Syllogism task, MXGNet achieves state-of-the-art accuracy of 99.8%. For PGM and RAVEN, two comprehensive datasets for RPM reasoning, MXGNet outperforms the state-of-the-art models by a considerable margin.

AAAI Conference 2020 Short Paper

Bayesian Optimisation for Premise Selection in Automated Theorem Proving (Student Abstract)

  • Agnieszka Słowik
  • Chaitanya Mangla
  • Mateja Jamnik
  • Sean B. Holden
  • Lawrence C. Paulson

Modern theorem provers utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assignment. In this work, we introduce a principled probabilistic framework for heuristic optimisation in theorem provers. We present results using a heuristic for premise selection and the Archive of Formal Proofs (AFP) as a case study.
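
The setup can be sketched as a standard Bayesian-optimisation loop. In the sketch below the prover run is a synthetic stand-in for an AFP benchmark evaluation, and the Gaussian-process and acquisition choices are illustrative assumptions rather than the paper's exact configuration.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)

    def run_prover(theta):             # stand-in for a real benchmark run
        return -(theta - 0.6) ** 2 + 0.05 * rng.normal()

    thetas = list(rng.uniform(0, 1, 3))          # initial heuristic settings
    scores = [run_prover(t) for t in thetas]
    grid = np.linspace(0, 1, 200)

    for _ in range(10):
        gp = GaussianProcessRegressor().fit(np.array(thetas)[:, None], scores)
        mu, sigma = gp.predict(grid[:, None], return_std=True)
        best = max(scores)
        z = (mu - best) / np.maximum(sigma, 1e-9)
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        t_next = grid[int(np.argmax(ei))]        # most promising parameter
        thetas.append(t_next)
        scores.append(run_prover(t_next))

    print("best parameter found:", thetas[int(np.argmax(scores))])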

ECAI Conference 2020 Conference Paper

You Shouldn't Trust Me: Learning Models Which Conceal Unfairness from Multiple Explanation Methods

  • Botty Dimanov
  • Umang Bhatt
  • Mateja Jamnik
  • Adrian Weller

Transparency of algorithmic systems has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [26], even suggests that model explanations can answer the question “Why should I trust you?” Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model’s discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check model fairness.
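
The general shape of such an attack can be sketched as gradient-penalty fine-tuning (hedged: the paper's exact loss differs). The extra term penalises the input gradient of a sensitive feature, so gradient-based importance methods rank it low while task accuracy barely moves.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
    X = torch.randn(256, 5)
    y = (X[:, 0] > 0).long()               # toy task that truly uses feature 0
    SENSITIVE = 0                          # feature whose importance we conceal
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for _ in range(200):
        X.requires_grad_(True)
        logits = model(X)
        task_loss = nn.functional.cross_entropy(logits, y)
        grads = torch.autograd.grad(logits.sum(), X, create_graph=True)[0]
        conceal = grads[:, SENSITIVE].abs().mean()   # importance penalty
        loss = task_loss + 10.0 * conceal
        opt.zero_grad()
        loss.backward()
        opt.step()
        X = X.detach()

    X.requires_grad_(True)
    imp = torch.autograd.grad(model(X).sum(), X)[0][:, SENSITIVE].abs().mean()
    print("post-attack |input grad| on sensitive feature:", float(imp))

The model can still use the sensitive feature internally; only its reported gradient-based importance is suppressed, which is exactly why the paper argues such explanations are unsafe as fairness checks.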

KR Conference 2018 Conference Paper

iCon: A Diagrammatic Theorem Prover for Ontologies

  • Zohreh Shams
  • Mateja Jamnik
  • Gem Stapleton
  • Yuri Sato

Concept diagrams form a visual language that is aimed at non-experts for the specification of ontologies and reasoning about them. Empirical evidence suggests that they are more accessible to ontology users than symbolic notations typically used (e.g., DL, OWL). Here, we report on iCon, an interactive theorem prover for concept diagrams that allows reasoning about ontologies diagrammatically. The input to iCon is a theorem that needs proving to establish how an entailment, in an ontology that needs debugging, is caused by a minimal set of axioms. Such a minimal set of axioms is called an entailment justification. Carrying out inference in iCon provides a diagrammatic proof (i.e., explanation) that shows how the axioms in an entailment justification give rise to the entailment under investigation. iCon proofs are formally verified and guaranteed to be correct.

LPAR Conference 2004 Conference Paper

Can a Higher-Order and a First-Order Theorem Prover Cooperate?

  • Christoph Benzmüller
  • Volker Sorge
  • Mateja Jamnik
  • Manfred Kerber

State-of-the-art first-order automated theorem proving systems have reached considerable strength over recent years. However, in many areas of mathematics they are still a long way from reliably proving theorems that would be considered relatively simple by humans. For example, when reasoning about sets, relations, or functions, first-order systems still exhibit serious weaknesses. While it has been shown in the past that higher-order reasoning systems can solve problems of this kind automatically, the complexity inherent in their calculi and their inefficiency in dealing with large numbers of clauses prevent these systems from solving a whole range of problems. We present a solution to this challenge by combining a higher-order and a first-order automated theorem prover, both based on the resolution principle, in a flexible and distributed environment. By this we can exploit concise problem formulations without forgoing efficient reasoning on first-order subproblems. We demonstrate the effectiveness of our approach on a set of problems still considered non-trivial for many first-order theorem provers.