Arrow Research search

Author name cluster

Ian Davidson

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers (30)

TMLR Journal 2025 Journal Article

CXAD: Contrastive Explanations for Anomaly Detection: Algorithms, Complexity Results and Experiments

  • Ian Davidson
  • Nicolás Kennedy
  • S. S. Ravi

Anomaly/Outlier detection (AD/OD) is often used in controversial applications to detect unusual behavior which is then further investigated or policed. This means an explanation of why something was predicted as an anomaly is desirable not only for individuals but also for the general population and policy-makers. However, existing explainable AI (XAI) methods are not well suited for Explainable Anomaly Detection (XAD). In particular, most XAI methods provide instance-level explanations, whereas a model/global-level explanation is desirable for a complete understanding of the definition of normality or abnormality used by an AD algorithm. Further, existing XAI methods try to explain an algorithm's behavior by finding an explanation of why an instance belongs to a category. However, by definition, anomalies/outliers are chosen because they are different from the normal instances. We propose a new style of model-agnostic explanation, called contrastive explanation, that is designed specifically for AD algorithms. It addresses the novel challenge of providing a model-agnostic and global-level explanation by finding contrasts between the outlier group of instances and the normal group. We propose three formulations: (i) Contrastive Explanation, (ii) Strongly Contrastive Explanation, and (iii) Multiple Strong Contrastive Explanations. The last formulation is specifically for the case where a given dataset is believed to have many types of anomalies. For the first two formulations, we show the underlying problem is in the computational class P by presenting linear and polynomial time exact algorithms. We show that the last formulation is computationally intractable, and we use an integer linear program for that version to generate experimental results. We demonstrate our work on several data sets such as the CelebA image data set, the HateXplain language data set, and the COMPAS data set on fairness. These data sets are chosen as their ground truth explanations are clear or well-known.
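
To make the contrast idea concrete, here is a minimal, hypothetical sketch over binary tag data: it ranks tags by how much more often they occur in the outlier group than in the normal group. This is illustrative only, not the paper's linear- or polynomial-time exact algorithms; all names and data below are made up.

```python
# Illustrative only: rank tags by outlier-vs-normal frequency contrast.
import numpy as np

def contrastive_tags(tags_outlier, tags_normal, k):
    """Return indices of the k tags whose frequency in the outlier group
    most exceeds their frequency in the normal group."""
    contrast = tags_outlier.mean(axis=0) - tags_normal.mean(axis=0)
    return np.argsort(-contrast)[:k]

# toy data: 5 outliers, 20 normals, 8 binary tags (all made up)
rng = np.random.default_rng(0)
outliers = rng.integers(0, 2, size=(5, 8))
normals = rng.integers(0, 2, size=(20, 8))
print(contrastive_tags(outliers, normals, k=3))
```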

AAAI Conference 2025 Conference Paper

Searching for Unfairness in Algorithms’ Outputs: Novel Tests and Insights

  • Ian Davidson
  • S. S. Ravi

As AI algorithms are deployed extensively, the need to ensure the fairness of their outputs is critical. Most existing work takes a "fairness by design" approach, incorporating limited tests for fairness into a limited number of algorithms. Here, we explore a framework that removes these limitations and can be used with the output of any algorithm that allocates instances to one of K categories/classes, such as outlier detection (OD), clustering and classification. The framework can encode standard and novel fairness types beyond simple counting, and, importantly, it can detect intersectional unfairness without being specifically told what to look for. Our experimental results show that both standard and novel types of unfairness exist extensively in the outputs of fair-by-design algorithms, and the counter-intuitive result that such algorithms can actually increase intersectional unfairness.
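
As a rough illustration of the search idea (not the paper's actual tests), the sketch below scans single and intersectional subgroups of a toy K=2 output for allocation-rate gaps; the column names and the simple gap score are assumptions for illustration.

```python
# Illustrative subgroup scan over a toy outlier-detection output.
from itertools import combinations
import pandas as pd

df = pd.DataFrame({'sex':  ['M', 'F', 'M', 'F', 'M', 'F'],
                   'race': ['A', 'A', 'B', 'B', 'A', 'B'],
                   'outlier': [1, 0, 1, 1, 0, 0]})

base = df['outlier'].mean()                    # overall allocation rate
for r in (1, 2):                               # r = 2 covers intersections
    for fields in combinations(['sex', 'race'], r):
        rates = df.groupby(list(fields))['outlier'].mean()
        print(fields, round((rates - base).abs().max(), 2))  # large gap = flag
```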

AAAI Conference 2024 Conference Paper

Cooperative Knowledge Distillation: A Learner Agnostic Approach

  • Michael Livanos
  • Ian Davidson
  • Stephen Wong

Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations on the direction and scope of transfer, which restrict its use: all knowledge is transferred from teacher to student regardless of whether or not that knowledge is useful, the student is the only one learning in this exchange, and typically distillation transfers knowledge only from a single teacher to a single student. We formulate a novel form of knowledge distillation in which many models can act as both students and teachers, which we call cooperative distillation. The models cooperate as follows: a model (the student) identifies specific deficiencies in its performance and searches for another model (the teacher) which encodes learned knowledge into instructional virtual instances via counterfactual instance generation. Because different models may have different strengths and weaknesses, all models can act as either students or teachers (cooperation) when appropriate and only distill knowledge in areas specific to their strengths (focus). Since counterfactuals as a paradigm are not tied to any specific algorithm, we can use this method to distill knowledge between learners of different architectures, algorithms, and even feature spaces. We demonstrate that our approach not only outperforms baselines such as transfer learning, self-supervised learning, and multiple knowledge distillation algorithms on several datasets, but can also be used in settings where the aforementioned techniques cannot.

IJCAI Conference 2021 Conference Paper

Deep Descriptive Clustering

  • Hongjing Zhang
  • Ian Davidson

Recent work on explainable clustering allows describing clusters when the features are interpretable. However, much modern machine learning focuses on complex data such as images, text, and graphs where deep learning is used but the raw features of data are not interpretable. This paper explores a novel setting for performing clustering on complex data while simultaneously generating explanations using interpretable tags. We propose deep descriptive clustering that performs sub-symbolic representation learning on complex data while generating explanations based on symbolic data. We form good clusters by maximizing the mutual information between the empirical distribution on the inputs and the induced clustering labels for clustering objectives. We generate explanations by solving an integer linear program that generates concise and orthogonal descriptions for each cluster. Finally, we allow the explanation to inform better clustering by proposing a novel pairwise loss with self-generated constraints to maximize the clustering and explanation module's consistency. Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance while offering high-quality cluster-level explanations.

JAIR Journal 2021 Journal Article

Generic Constraint-based Block Modeling using Constraint Programming

  • Alex Mattenet
  • Ian Davidson
  • Siegfried Nijssen
  • Pierre Schaus

Block modeling has been used extensively in many domains including social science, spatio-temporal data analysis and even medical imaging. Original formulations of the problem modeled it as a mixed integer programming problem, but were not scalable. Subsequent work relaxed the discrete optimization requirement, and showed that adding constraints is not straightforward in existing approaches. In this work, we present a new approach based on constraint programming, allowing discrete optimization of block modeling in a manner that is not only scalable, but also allows the easy incorporation of constraints. We introduce a new constraint filtering algorithm that outperforms earlier approaches, in both constrained and unconstrained settings, for an exhaustive search and for a type of local search called Large Neighborhood Search. We show its use in the analysis of real datasets. Finally, we show an application of the CP framework for model selection using the Minimum Description Length principle.

ECAI Conference 2020 Conference Paper

A Framework for Determining the Fairness of Outlier Detection

  • Ian Davidson
  • S. S. Ravi

Outlier detection (OD) is a widely studied problem whose goal is to identify points from a data set that are considered anomalous. Among the methods used in AI and data science, OD is perhaps the most controversial as common applications such as credit card fraud, cyber-intrusion and terrorist activity all involve suggesting that someone is committing a serious crime. However, there is little work on fair outlier detection. We show how to determine if an outlier detection algorithm's output is fair with respect to multiple protected status variables (PSVs) by formulating various combinatorial problems which attempt to find an explanation (using the PSVs) that differentiates the outlier group from the normal group. We argue that if there is no solution for these explanation problems, then the output of an algorithm can be considered fair, and give a probabilistic interpretation of our work. Since we prove that the underlying combinatorial problems are computationally intractable (i.e., NP-hard), our approaches cannot be efficiently gamed/side-stepped.

AAAI Conference 2020 Conference Paper

Constraint Programming for an Efficient and Flexible Block Modeling Solver

  • Alex Lucía Mattenet
  • Ian Davidson
  • Siegfried Nijssen
  • Pierre Schaus

Constraint Programming (CP) is a powerful paradigm for solving combinatorial problems. In CP, the user creates a model by declaring variables with their domains and expresses the constraints that need to be satisfied in any solution. The solver is then in charge of finding feasible solutions—a value in the domain of each variable that satisfies all the constraints. The discovery of solutions is done by exploring a search tree that is pruned by the constraints in charge of removing impossible values. The CP framework has the advantage of exposing a rich high-level declarative constraint language for modeling, as well as efficient purpose-specific filtering algorithms that can be reused in many problems. In this work, we harness this flexibility and efficiency for the Block Modeling problem. It is a variant of the graph clustering problem that has been used extensively in many domains including social science, spatio-temporal data analysis and even medical imaging. We present a new approach based on constraint programming, allowing discrete optimization of block modeling in a manner that is not only scalable, but also allows the easy incorporation of constraints. We introduce a new constraint filtering algorithm that outperforms earlier approaches. We show its use in the analysis of real datasets. This is an extended abstract of an earlier publication at CP2019 (Mattenet et al. 2019).

AAAI Conference 2020 Conference Paper

Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability

  • Prathyush Sambaturu
  • Aparna Gupta
  • Ian Davidson
  • S. S. Ravi
  • Anil Vullikanti
  • Andrew Warren

Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects S, a partition π of S (into clusters), and a universe T of tags such that each element in S is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.

AAAI Conference 2020 Conference Paper

Making Existing Clusterings Fairer: Algorithms, Complexity Results and Insights

  • Ian Davidson
  • S. S. Ravi

We explore the area of fairness in clustering from the different perspective of modifying clusterings from existing algorithms to make them fairer whilst retaining their quality. We formulate the minimal cluster modification for fairness (MCMF) problem, where the input is a given partitional clustering and the goal is to minimally change it so that the clustering is still of good quality and fairer. We show, using an intricate case analysis, that for a single protected variable, the problem is efficiently solvable (i.e., in the class P) by proving that the constraint matrix for an integer linear programming (ILP) formulation is totally unimodular (TU). Interestingly, we show that even for a single protected variable, the addition of simple pairwise guidance (to, say, ensure individual level fairness) makes the MCMF problem computationally intractable (i.e., NP-hard). Experimental results on Twitter, Census and NYT data sets show that our methods can modify existing clusterings for data sets in excess of 100,000 instances within minutes on laptops, and find clusterings that are as fair as, but of higher quality than, those produced by fair-by-design clustering algorithms.
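
The TU result is what makes an LP relaxation safe: vertex solutions of a TU system with integer right-hand sides are integral. The sketch below illustrates this on a toy reassignment problem whose constraint matrix is that of a transportation problem (a classic TU family); it is a simplified stand-in for the paper's MCMF ILP, with fairness reduced to hypothetical cluster-size targets for brevity.

```python
# Toy stand-in for MCMF: reassign instances at minimum change subject to
# cluster-size targets. The constraint matrix is totally unimodular, so
# the relaxed LP below returns integral assignments.
import numpy as np
from scipy.optimize import linprog

n, k = 6, 2
current = np.array([0, 0, 0, 1, 1, 1])      # existing clustering
target = [2, 4]                              # desired cluster sizes
cost = np.ones((n, k))
cost[np.arange(n), current] = 0.0            # cost 1 to move, 0 to keep

A_eq, b_eq = [], []
for i in range(n):                           # each instance in one cluster
    row = np.zeros(n * k); row[i * k:(i + 1) * k] = 1
    A_eq.append(row); b_eq.append(1)
for j in range(k):                           # cluster sizes hit targets
    row = np.zeros(n * k); row[j::k] = 1
    A_eq.append(row); b_eq.append(target[j])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
print(res.x.reshape(n, k).round(2))          # integral despite relaxation
```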

IJCAI Conference 2019 Conference Paper

Dense Transformer Networks for Brain Electron Microscopy Image Segmentation

  • Jun Li
  • Yongjun Chen
  • Lei Cai
  • Ian Davidson
  • Shuiwang Ji

The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.

AAAI Conference 2019 Conference Paper

Towards Fluid Machine Intelligence: Can We Make a Gifted AI?

  • Ian Davidson
  • Peter B. Walker

Most applications of machine intelligence have focused on demonstrating crystallized intelligence. Crystallized intelligence relies on accessing problem-specific knowledge, skills and experience stored in long term memory. In this paper, we challenge the AI community to design AIs that can take complete tests of fluid intelligence, which assess the ability to solve novel problems using problem-independent solving skills. Tests of fluid intelligence such as the NNAT are used extensively by schools to determine entry into gifted education programs. We explain the differences between crystallized and fluid intelligence, the importance and capabilities of machines demonstrating fluid intelligence, and pose several challenges to the AI community, including that a machine taking such a test would be considered gifted by school districts in the state of California. Importantly, we show that existing work in seemingly related fields such as transfer, zero-shot, life-long and meta learning (in their current form) is not directly capable of demonstrating fluid intelligence but instead provides task-transductive mechanisms.

IJCAI Conference 2018 Conference Paper

Descriptive Clustering: ILP and CP Formulations with Applications

  • Thi-Bich-Hanh Dao
  • Chia-Tung Kuo
  • S. S. Ravi
  • Christel Vrain
  • Ian Davidson

In many settings just finding a good clustering is insufficient and an explanation of the clustering is required. If the features used to perform the clustering are interpretable then methods such as conceptual clustering can be used. However, in many applications this is not the case particularly for image, graph and other complex data. Here we explore the setting where a set of interpretable discrete tags for each instance is available. We formulate the descriptive clustering problem as a bi-objective optimization to simultaneously find compact clusters using the features and to describe them using the tags. We present our formulation in a declarative platform and show it can be integrated into a standard iterative algorithm to find all Pareto optimal solutions to the two objectives. Preliminary results demonstrate the utility of our approach on real data sets for images and electronic health care records and that it outperforms single objective and multi-view clustering baselines.

ICML Conference 2018 Conference Paper

Extreme Learning to Rank via Low Rank Assumption

  • Minhao Cheng
  • Ian Davidson
  • Cho-Jui Hsieh

We consider the setting where we wish to perform ranking for hundreds of thousands of users, which is common in recommender systems and web search ranking. Learning a single ranking function is unlikely to capture the variability across all users, while learning a ranking function for each person is time-consuming and requires large amounts of data from each user. To address this situation, we propose a Factorization RankSVM algorithm which learns a series of k basic ranking functions and then constructs for each user a local ranking function that is a combination of them. We develop a fast algorithm to reduce the time complexity of the gradient descent solver by exploiting the low-rank structure, and the resulting algorithm is much faster than existing methods. Furthermore, we prove that the generalization error of the proposed method can be significantly better than training individual RankSVMs. Finally, we present some interesting patterns in the principal ranking functions learned by our algorithms.
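
A minimal numpy sketch of the low-rank structure (scoring only, not the paper's training procedure): each user's local ranking function mixes k shared basis functions, so the full user-by-feature weight matrix factors as U times V. All sizes and data here are illustrative.

```python
# Low-rank per-user ranking: m x d weights factor as U (m x k) @ V (k x d).
import numpy as np

rng = np.random.default_rng(0)
m, d, k, n_items = 1000, 20, 5, 100
V = rng.normal(size=(k, d))           # k basis linear ranking functions
U = rng.normal(size=(m, k))           # per-user mixing weights
items = rng.normal(size=(n_items, d))

user = 42
scores = items @ (U[user] @ V)        # this user's local ranking function
print(np.argsort(-scores)[:10])       # the user's top-ranked items
```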

AAAI Conference 2018 Conference Paper

Human Guided Linear Regression With Feature-Level Constraints

  • Aubrey Gress
  • Ian Davidson

Linear regression methods are commonly used by both researchers and data scientists due to their interpretability and their reduced likelihood of overfitting. However, these methods can still perform poorly if little labeled training data is available. Typical methods used to overcome a lack of labeled training data involve exploiting an outside source of labeled data or large amounts of unlabeled data. This includes areas such as active learning, semi-supervised learning and transfer learning, but in many domains these approaches are not always applicable because they require either a mechanism to label data, large amounts of unlabeled data or additional sources of sufficiently related data. In this paper we explore an alternative, non-data-centric approach. We allow the user to guide the learning system through three forms of feature-level guidance which constrain the parameters of the regression function. Such guidance is unlikely to be perfectly accurate, so we derive methods which are robust to some amount of noise, a property we formally prove for one of our methods.
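
One way to picture feature-level guidance (our assumption for illustration, not the paper's exact formulation) is as sign constraints on regression coefficients, which a bounded least-squares solver handles directly:

```python
# Sign-constrained least squares as a stand-in for feature-level guidance:
# the user asserts, e.g., that w0 is non-negative and w1 non-positive.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=30)

lower = [0, -np.inf, -np.inf]        # guidance: w0 >= 0
upper = [np.inf, 0, np.inf]          # guidance: w1 <= 0, w2 free
print(lsq_linear(X, y, bounds=(lower, upper)).x)
```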

NeurIPS Conference 2018 Conference Paper

The Cluster Description Problem - Complexity Results, Formulations and Approximations

  • Ian Davidson
  • Antoine Gourru
  • S. S. Ravi

Consider the situation where you are given an existing k-way clustering π. A challenge for explainable AI is to find a compact and distinct explanation of each cluster, which in this paper uses instance-level descriptors/tags from a common dictionary. Since the descriptors/tags were not given to the clustering method, this is not a semi-supervised learning situation. We show that the feasibility problem of just testing whether any distinct description (not the most compact) exists is generally intractable for just two clusters. This means that unless P = NP, there cannot exist an efficient algorithm for the cluster description problem. Hence, we explore ILP formulations for smaller problems and a relaxed but restricted setting that leads to a polynomial time algorithm for larger problems. We explore several extensions to the basic setting, such as the ability to ignore some instances and composition constraints on the descriptions of the clusters. We show our formulation's usefulness on Twitter data where the communities were found using social connectivity (i.e., the follower relation) but the explanation of the communities is based on behavioral properties of the nodes (i.e., hashtag usage) not available to the clustering method.
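
For intuition, a brute-force feasibility check for a toy two-cluster instance is shown below. Since the abstract proves the general problem intractable, this only makes sense at toy sizes, and the encoding (every instance shares at least one tag with its cluster's descriptor; descriptors pairwise disjoint) is our reading of the setup.

```python
# Toy-size brute force only: the general feasibility problem is intractable.
from itertools import combinations

def feasible(inst_tags, labels, max_size=2):
    """Find pairwise-disjoint tag sets D0, D1 such that every instance in
    cluster c shares at least one tag with Dc, or return None."""
    tags = sorted({t for ts in inst_tags for t in ts})
    def covers(D, c):
        return all(ts & set(D) for ts, l in zip(inst_tags, labels) if l == c)
    for s0 in range(1, max_size + 1):
        for D0 in combinations(tags, s0):
            rest = [t for t in tags if t not in D0]   # enforce disjointness
            for s1 in range(1, max_size + 1):
                for D1 in combinations(rest, s1):
                    if covers(D0, 0) and covers(D1, 1):
                        return D0, D1
    return None

print(feasible([{'a', 'b'}, {'a'}, {'c'}, {'c', 'd'}], [0, 0, 1, 1]))
# -> (('a',), ('c',))
```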

AAAI Conference 2017 Conference Paper

A Framework for Minimal Clustering Modification via Constraint Programming

  • Chia-Tung Kuo
  • S. Ravi
  • Thi-Bich-Hanh Dao
  • Christel Vrain
  • Ian Davidson

Consider the situation where your favorite clustering algorithm applied to a data set returns a good clustering but there are a few undesirable properties. One ad hoc way to fix this is to re-run the clustering algorithm and hope to find a better variation. Instead, we propose to not run the algorithm again but minimally modify the existing clustering to remove the undesirable properties. We formulate the minimal clustering modification problem where we are given an initial clustering produced from any algorithm. The clustering is then modified to: i) remove the undesirable properties and ii) be minimally different to the given clustering. We show the underlying feasibility sub-problem can be intractable and demonstrate the flexibility of our constraint programming formulation. We empirically validate its usefulness through experiments on social network and medical imaging data sets.

ECAI Conference 2016 Conference Paper

A Framework for Actionable Clustering Using Constraint Programming

  • Thi-Bich-Hanh Dao
  • Christel Vrain
  • Khanh-Chuong Duong
  • Ian Davidson

Consider if you wish to cluster your ego network in Facebook so as to find several useful groups, each of which you can invite to a different dinner party. You may require that each cluster contain an equal number of males and females, that the width of a cluster in terms of age is at most 10, and that each person in a cluster should have at least r other people with the same hobby. These are examples of cardinality, geometric and density requirements/constraints respectively that can make the clustering useful for a given purpose. However, existing formulations of constrained clustering were not designed to handle these constraints since they typically deal with low-level, instance-level constraints. We formulate clustering with these cluster-level styles of constraints in a constraint programming (CP) language, which we call actionable clustering. Experimental results show the potential uses of this work to make clustering more actionable. We also show that these constraints can be used to improve the accuracy of semi-supervised clustering.
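
The three constraint styles in the dinner-party example can be checked mechanically; the sketch below does so for one candidate cluster. Field names and thresholds are hypothetical, and this checks a clustering rather than solving for one as the CP formulation does.

```python
# Checking the three cluster-level constraint styles on one candidate cluster.
def satisfies(cluster, max_age_width=10, min_hobby_peers=1):
    males = sum(p['sex'] == 'M' for p in cluster)
    balanced = males == len(cluster) - males                  # cardinality
    ages = [p['age'] for p in cluster]
    compact = max(ages) - min(ages) <= max_age_width          # geometric
    dense = all(sum(q['hobby'] == p['hobby'] for q in cluster) - 1
                >= min_hobby_peers for p in cluster)          # density
    return balanced and compact and dense

c = [{'sex': 'M', 'age': 30, 'hobby': 'chess'},
     {'sex': 'F', 'age': 34, 'hobby': 'chess'}]
print(satisfies(c))  # True
```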

AAAI Conference 2016 Conference Paper

A Framework for Outlier Description Using Constraint Programming

  • Chia-Tung Kuo
  • Ian Davidson

Outlier detection has been studied extensively and employed in diverse applications in the past decades. In this paper we formulate a related yet understudied problem which we call outlier description. This problem often arises in practice when we have a small number of data instances that had been identified to be outliers and we wish to explain why they are outliers. We propose a framework based on constraint programming to find an optimal subset of features that most differentiates the outliers and normal instances. We further demonstrate the framework offers great flexibility in incorporating diverse scenarios arising in practice such as multiple explanations and human in the loop extensions. We empirically evaluate our proposed framework on real datasets, including medical imaging and text corpus, and demonstrate how the results are useful and interpretable in these domains.

AAAI Conference 2015 Conference Paper

Propagating Ranking Functions on a Graph: Algorithms and Applications

  • Buyue Qian
  • Xiang Wang
  • Ian Davidson

Learning to rank is an emerging learning task that opens up a diverse set of applications. However, most existing work focuses on learning a single ranking function, whilst in many real world applications there can be many ranking functions to fulfill various retrieval tasks on the same data set. How to train many ranking functions is challenging due to the limited availability of training data, which is further compounded when plentiful training data is available for only a small subset of the ranking functions. This is particularly true in settings, such as personalized ranking/retrieval, where each person requires a unique ranking function according to their preference, but only the functions of the persons who provide sufficient ratings (of objects, such as movies and music) can be well trained. To address this, we propose to construct a graph where each node corresponds to a retrieval task, and then propagate ranking functions on the graph. We illustrate the usefulness of the idea of propagating ranking functions and our method by exploring two real world applications.
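
A hedged sketch of the propagation idea: treat each retrieval task as a graph node holding the weights of a linear ranking function, and smooth poorly-trained weights toward neighbors with a label-propagation-style update while clamping well-trained tasks. This is illustrative, not the paper's exact algorithm; the graph, alpha, and iteration count are made up.

```python
# Label-propagation-style smoothing of per-task ranking-function weights.
import numpy as np

rng = np.random.default_rng(0)
n_tasks, d = 8, 5
W = rng.random((n_tasks, n_tasks)); W = (W + W.T) / 2
np.fill_diagonal(W, 0)
P = W / W.sum(axis=1, keepdims=True)           # row-normalized task graph

F0 = rng.normal(size=(n_tasks, d))             # initial (possibly noisy) weights
trained = np.zeros(n_tasks, dtype=bool); trained[:3] = True
alpha, F = 0.8, F0.copy()
for _ in range(50):
    F = alpha * P @ F + (1 - alpha) * F0       # borrow from neighbors
    F[trained] = F0[trained]                   # clamp well-trained tasks
print(F.round(2))
```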

IJCAI Conference 2013 Conference Paper

Active Learning from Relative Queries

  • Buyue Qian
  • Xiang Wang
  • Fei Wang
  • Hongfei Li
  • Jieping Ye
  • Ian Davidson

Active learning has been extensively studied and shown to be useful in solving real problems. The typical setting of traditional active learning methods is querying labels from an oracle. This is only possible if an expert exists, which may not be the case in many real world applications. In this paper, we focus on designing easier questions that can be answered by a non-expert. These questions poll relative information as opposed to absolute information and can even be generated from side information. We propose an active learning approach that queries the ordering of the importance of an instance's neighbors rather than its label. We explore our approach on real datasets and make several interesting discoveries, including that querying neighborhood information can be an effective question to ask and sometimes can even yield better performance than querying labels.

AAAI Conference 2013 Conference Paper

Clustering with Complex Constraints — Algorithms and Applications

  • Weifeng Zhi
  • Xiang Wang
  • Buyue Qian
  • Patrick Butler
  • Naren Ramakrishnan
  • Ian Davidson

Clustering with constraints is an important and developing area. However, most work is confined to conjunctions of simple together and apart constraints, which limits their usability. In this paper, we propose a new formulation of constrained clustering that is able to incorporate not only existing types of constraints but also more complex logical combinations beyond conjunctions. We first show how any statement in conjunctive normal form (CNF) can be represented as a linear inequality. Since existing clustering formulations such as spectral clustering cannot easily incorporate these linear inequalities, we propose a quadratic programming (QP) clustering formulation to accommodate them. This new formulation allows us to have much more complex guidance in clustering. We demonstrate the effectiveness of our approach in two applications on text and personal information management. We also compare our algorithm against an existing constrained spectral clustering algorithm to show its efficiency in computational time.
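
The CNF-to-inequality mapping is the key step: a clause over 0/1 variables is satisfied exactly when its literal sum is at least one, e.g. (x1 OR NOT x2 OR x3) becomes x1 + (1 - x2) + x3 >= 1. A tiny self-check of that encoding:

```python
# A CNF clause as one linear inequality over 0/1 variables.
def clause_satisfied(assignment, clause):
    """clause: list of (var, sign) with sign=+1 for x, -1 for NOT x.
    The clause holds iff the literal sum is at least 1."""
    lhs = sum(assignment[v] if s > 0 else 1 - assignment[v]
              for v, s in clause)
    return lhs >= 1

# (x1 OR NOT x2 OR x3) with x1=0, x2=1, x3=1: lhs = 0 + 0 + 1 = 1
print(clause_satisfied({1: 0, 2: 1, 3: 1}, [(1, +1), (2, -1), (3, +1)]))  # True
```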

AAAI Conference 2013 Conference Paper

Formalizing Hierarchical Clustering as Integer Linear Programming

  • Sean Gilpin
  • Siegfried Nijssen
  • Ian Davidson

Hierarchical clustering is typically implemented as a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints. Though exact solvers exist for ILP, we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster. Formalizing hierarchical clustering also has the benefit that relaxing the constraints can produce novel problem variations such as overlapping clusterings. Our experiments show that our formulation is capable of outperforming standard agglomerative clustering algorithms in a variety of settings, including traditional hierarchical clustering as well as learning overlapping clusterings.
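
The "dendrogram properties enforced as linear constraints" amount to the merge-height matrix being an ultrametric, i.e. for every triple the two largest pairwise heights are equal. The check below states that property on a toy instance; this is one standard encoding, not necessarily the paper's exact constraint set.

```python
# Ultrametric check: H[i,j] <= max(H[i,k], H[k,j]) for all triples.
import itertools
import numpy as np

def is_ultrametric(H):
    n = len(H)
    return all(H[i, j] <= max(H[i, k], H[k, j])
               for i, j, k in itertools.permutations(range(n), 3))

# heights from a valid dendrogram: a and b merge at 1, c joins at 2
H = np.array([[0, 1, 2],
              [1, 0, 2],
              [2, 2, 0]])
print(is_ultrametric(H))  # True
```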

ICML Conference 2013 Conference Paper

Joint Transfer and Batch-mode Active Learning

  • Rita Chattopadhyay
  • Wei Fan 0001
  • Ian Davidson
  • Sethuraman Panchanathan
  • Jieping Ye

Active learning and transfer learning are two different methodologies that address the common problem of insufficient labels. Transfer learning addresses this problem by using the knowledge gained from a related and already labeled data source, whereas active learning focuses on selecting a small set of informative samples for manual annotation. Recently, there has been much interest in developing frameworks that combine both transfer and active learning methodologies. A few such frameworks reported in the literature perform transfer and active learning in two separate stages. In this work, we present an integrated framework that performs transfer and active learning simultaneously by solving a single convex optimization problem. The proposed framework computes the weights of source domain data and selects the samples from the target domain data simultaneously, by minimizing a common objective: the distribution difference between the set consisting of reweighted source data plus queried target domain data and the set of unlabeled target domain data. Comprehensive experiments on three real world data sets demonstrate that the proposed method improves the classification accuracy by 5% to 10% over the existing two-stage approach.
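
The "distribution difference" objective can be pictured with an empirical Maximum Mean Discrepancy; MMD is our assumption for the divergence here, and the kernel and data below are illustrative, not the paper's exact formulation.

```python
# A (biased) empirical MMD^2 estimate between two samples with an RBF kernel.
import numpy as np

def rbf(a, b, gamma=0.5):
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def mmd2(X, Y, gamma=0.5):
    return (rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean()
            - 2 * rbf(X, Y, gamma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.5, 1, size=(40, 3))   # reweighted source + queried target proxy
tgt = rng.normal(0.0, 1, size=(40, 3))   # unlabeled target domain data
print(mmd2(src, tgt))                    # shrinks as the distributions align
```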

AAAI Conference 2010 Conference Paper

Semi-Supervised Dimension Reduction for Multi-Label Classification

  • Buyue Qian
  • Ian Davidson

A significant challenge in making learning techniques more suitable for general purpose use in AI is to move beyond i) complete supervision, ii) low dimensional data and iii) a single label per instance. Solving this challenge would allow making predictions for large, high dimensional datasets with multiple (but possibly incomplete) labelings. While other work has addressed each of these problems separately, in this paper we show how to address them together, namely the problem of semi-supervised dimension reduction for multi-labeled classification, SSDR-MC. To our knowledge this is the first paper that attempts to address all challenges together. In this work, we study a novel joint learning framework which performs optimization for dimension reduction and multi-label inference in a semi-supervised setting. The experimental results validate the performance of our approach, and demonstrate the effectiveness of connecting dimension reduction and learning.

IJCAI Conference 2009 Conference Paper

  • Ian Davidson

As A.I. algorithms are applied to more complex domains that involve high dimensional data sets, there is a need to more saliently represent the data. However, most dimension reduction approaches are driven by objective functions that may not suit, or may only partially suit, the end user's requirements. In this work, we show how to incorporate general-purpose domain expertise encoded as a graph into dimension reduction in a way that lends itself to an elegant generalized eigenvalue problem. We call our approach Graph-Driven Constrained Dimension Reduction via Linear Projection (GCDR-LP) and show that it has several desirable properties.
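
A minimal sketch of how a graph-driven projection leads to a generalized eigenvalue problem, in the spirit of GCDR-LP but not its exact objective: maximize projected variance subject to smoothness over an expert-supplied graph, solved as A v = lambda B v. The graph, regularizer, and dimensions are assumptions.

```python
# Graph-regularized linear projection as a generalized eigenproblem:
#   max_v v^T (X^T X) v  s.t.  v^T (X^T L X + I) v = 1
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # 50 instances, 10 features
W = (rng.random((50, 50)) > 0.9).astype(float)   # hypothetical expertise graph
W = np.maximum(W, W.T)
L = np.diag(W.sum(1)) - W                        # graph Laplacian

A = X.T @ X
B = X.T @ L @ X + np.eye(10)                     # +I keeps B positive definite
evals, evecs = eigh(A, B)                        # generalized eigenvalue problem
projection = evecs[:, -2:]                       # top-2 directions
print((X @ projection).shape)                    # reduced 2-D representation
```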

AAAI Conference 2004 Conference Paper

An Ensemble Technique for Stable Learners with Performance Bounds

  • Ian Davidson

Ensemble techniques such as bagging and DECORATE exploit the "instability" of learners, such as decision trees, to create a diverse set of models. However, creating a diverse set of models for stable learners such as naïve Bayes is difficult as they are relatively insensitive to training data changes. Furthermore, many popular ensemble techniques do not have a rigorous underlying theory and often provide no insight into how many models to build. We formally define a stable learner as one having a second-order derivative of the posterior density function and propose an ensemble technique specifically for stable learners. Our ensemble technique, bootstrap model averaging, creates a number of bootstrap samples from the training data, builds a model from each and then sums the joint instance and class probability over all models built. We show that for stable learners our ensemble technique with infinite bootstrap samples approximates posterior model averaging (aka the optimal Bayes classifier (OBC)). For finite bootstrap samples we estimate the increase over the OBC error using Chebychev bounds. We empirically illustrate our approach's usefulness for several stable learners and verify our bound's correctness.
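
A compact sketch of bootstrap model averaging with a stable learner (Gaussian naïve Bayes): train one model per bootstrap sample and average the probability outputs. For brevity this averages class posteriors rather than summing joint instance-and-class probabilities, and the dataset and model count are illustrative.

```python
# Bootstrap model averaging with a stable learner (Gaussian naive Bayes).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

X, y = load_iris(return_X_y=True)
n_models = 25
proba = np.zeros((len(X), 3))
for seed in range(n_models):
    Xb, yb = resample(X, y, random_state=seed)   # one bootstrap sample
    proba += GaussianNB().fit(Xb, yb).predict_proba(X)
pred = (proba / n_models).argmax(axis=1)         # averaged posterior -> class
print((pred == y).mean())
```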