Arrow Research search

Author name cluster

Pedro Domingos

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers
1 author row

Possible papers

27

NeurIPS Conference 2018 Conference Paper

Submodular Field Grammars: Representation, Inference, and Application to Image Parsing

  • Abram Friesen
  • Pedro Domingos

Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production $A \rightarrow BC$ is linear in the length of $A$, in an image there are an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the result a submodular field grammar (SFG). Finding the MAP split of a region into subregions is now tractable, and by exploiting this we develop an efficient approximate algorithm for MAP parsing of images with SFGs. Empirically, we present promising improvements in accuracy when using SFGs for scene understanding, and show exponential improvements in inference time compared to traditional methods, while returning comparable minima.
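
The tractability claim rests on submodularity: a set function $f$ is submodular when $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$ for all $A, B$, and graph-cut energies of the kind used for segmentation satisfy this. A brute-force check on a toy graph (hypothetical weights, not from the paper) illustrates the property:

```python
from itertools import chain, combinations

# Hypothetical region-adjacency graph with nonnegative edge weights.
EDGES = {("a", "b"): 2.0, ("b", "c"): 1.0, ("a", "c"): 0.5, ("c", "d"): 3.0}
NODES = {"a", "b", "c", "d"}

def cut(S):
    """Graph-cut energy: total weight of edges crossing the boundary of S."""
    S = set(S)
    return sum(w for (u, v), w in EDGES.items() if (u in S) != (v in S))

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_submodular(f, ground):
    """Brute-force check of f(A) + f(B) >= f(A|B) + f(A&B) for all A, B."""
    subsets = [set(s) for s in powerset(ground)]
    return all(f(A) + f(B) >= f(A | B) + f(A & B) - 1e-9
               for A in subsets for B in subsets)

print(is_submodular(cut, NODES))                    # True: cuts are submodular
print(is_submodular(lambda S: len(S) ** 2, NODES))  # False: |S|^2 is supermodular
```

Submodularity is what lets the MAP split of a region be found with graph-cut-style algorithms rather than by searching over the exponentially many ways to divide it.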

AAAI Conference 2016 Conference Paper

Learning Tractable Probabilistic Models for Fault Localization

  • Aniruddh Nath
  • Pedro Domingos

In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models, and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs) that can be learned from data, and probabilistically infer the location of the bug. While most previous statistical debugging methods generalize over many executions of a single program, TFLMs are trained on a corpus of previously seen buggy programs, and learn to identify recurring patterns of bugs. Widely-used fault localization techniques such as TARANTULA evaluate the suspiciousness of each line in isolation; in contrast, a TFLM defines a joint probability distribution over buggy indicator variables for each line. Joint distributions with rich dependency structure are often computationally intractable; TFLMs avoid this by exploiting recent developments in tractable probabilistic models (specifically, Relational SPNs). Further, TFLMs can incorporate additional sources of information, including coverage-based features such as TARANTULA. We evaluate the fault localization performance of TFLMs that include TARANTULA scores as features in the probabilistic model. Our study shows that the learned TFLMs isolate bugs more effectively than previous statistical methods or using TARANTULA directly.
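
For context, the TARANTULA suspiciousness score mentioned above is conventionally computed per line from pass/fail coverage counts. A minimal sketch with hypothetical coverage data (this is the baseline feature, not the TFLM model itself):

```python
def tarantula_suspiciousness(passed, failed, total_passed, total_failed):
    """Standard TARANTULA score for one line: the fraction of failing runs
    covering it, normalized by its total coverage fraction. Scores near 1
    mean the line is covered mostly by failing runs."""
    fail_ratio = failed / total_failed if total_failed else 0.0
    pass_ratio = passed / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0.0:
        return 0.0                      # line never executed
    return fail_ratio / (fail_ratio + pass_ratio)

# Hypothetical coverage: line -> (passing runs covering it, failing runs covering it)
coverage = {10: (9, 1), 11: (2, 8), 12: (5, 5)}
scores = {ln: tarantula_suspiciousness(p, f, total_passed=10, total_failed=10)
          for ln, (p, f) in coverage.items()}
print(scores)  # line 11 stands out as the likely bug location
```

Note that each line is scored in isolation, which is exactly the limitation the joint distribution in a TFLM is meant to address.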

IJCAI Conference 2015 Conference Paper

IJCAI-15 Distinguished Paper

Recursive Decomposition for Nonconvex Optimization

  • Abram L. Friesen
  • Pedro Domingos

Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent subfunctions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.

AAAI Conference 2015 Conference Paper

Learning Relational Sum-Product Networks

  • Aniruddh Nath
  • Pedro Domingos

Sum-product networks (SPNs) are a recently proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other’s probability distributions, as well as modeling probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning high-treewidth tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, based on Gens and Domingos’ LearnSPN algorithm for propositional SPN learning. We evaluate the algorithm on three datasets; the RSPN learning algorithm outperforms Markov Logic Networks in both running time and predictive accuracy.
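
The tractability that RSPNs inherit from SPNs comes from bottom-up network evaluation: a single linear-time pass computes any marginal, with marginalized-out variables' indicator leaves set to 1. A minimal propositional SPN sketch (hypothetical two-variable mixture, not the paper's relational architecture):

```python
# Hypothetical node encoding: ("leaf", var, val), ("sum", [(weight, child), ...]),
# or ("prod", [child, ...]).
def eval_spn(node, evidence):
    """One bottom-up pass computes any marginal: unobserved variables'
    indicator leaves evaluate to 1, which marginalizes them out."""
    kind = node[0]
    if kind == "leaf":
        _, var, val = node
        if var not in evidence:
            return 1.0
        return 1.0 if evidence[var] == val else 0.0
    if kind == "sum":
        return sum(w * eval_spn(child, evidence) for w, child in node[1])
    result = 1.0                       # product node
    for child in node[1]:
        result *= eval_spn(child, evidence)
    return result

def bern(var, p):
    """Bernoulli leaf distribution as a weighted sum of the two indicators."""
    return ("sum", [(p, ("leaf", var, 1)), (1 - p, ("leaf", var, 0))])

# Mixture of two product distributions over x1, x2 (complete and decomposable).
spn = ("sum", [(0.6, ("prod", [bern("x1", 0.9), bern("x2", 0.2)])),
               (0.4, ("prod", [bern("x1", 0.3), bern("x2", 0.6)]))])

print(eval_spn(spn, {}))          # 1.0 -- the network is normalized
print(eval_spn(spn, {"x1": 1}))   # P(x1=1) = 0.6*0.9 + 0.4*0.3 = 0.66
```

The same pass that evaluates the network also yields the partition function, which is why high-treewidth models remain tractable in this representation.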

AAAI Conference 2014 Conference Paper

Approximate Lifting Techniques for Belief Propagation

  • Parag Singla
  • Aniruddh Nath
  • Pedro Domingos

Many AI applications need to explicitly represent relational structure as well as handle uncertainty. First order probabilistic models combine the power of logic and probability to deal with such domains. A naive approach to inference in these models is to propositionalize the whole theory and carry out the inference on the ground network. Lifted inference techniques (such as lifted belief propagation; Singla and Domingos 2008) provide a more scalable approach to inference by combining together groups of objects which behave identically. In many cases, constructing the lifted network can itself be quite costly. In addition, the exact lifted network is often very close in size to the fully propositionalized model. To overcome these problems, we present approximate lifted inference, which groups together similar but distinguishable objects and treats them as if they were identical. Early stopping terminates the execution of the lifted network construction at an early stage resulting in a coarser network. Noise-tolerant hypercubes allow for marginal errors in the representation of the lifted network itself. Both of our algorithms can significantly speed up the process of lifted network construction as well as result in much smaller models. The coarseness of the approximation can be adjusted depending on the accuracy required, and we can bound the resulting error. Extensive evaluation on six domains demonstrates great efficiency gains with only minor (or no) loss in accuracy.
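
Lifted network construction is often viewed as color passing: nodes start with colors determined by their evidence and are repeatedly refined by their neighbors' colors, and nodes that still share a color form one supernode. Stopping refinement after a few rounds is one way to realize the early-stopping approximation. A sketch under those assumptions (hypothetical graph; the paper's hypercube representation is not modeled):

```python
from collections import defaultdict

def color_passing(neighbors, evidence, rounds):
    """Group nodes that remain indistinguishable after `rounds` of
    signature refinement (a color-passing view of lifted network
    construction). A small `rounds` yields the coarser, early-stopped
    approximate network."""
    color = {v: str(evidence.get(v)) for v in neighbors}
    for _ in range(rounds):
        # Signature = own color plus multiset of neighbor colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
               for v in neighbors}
        # Relabel signatures with compact ids for the next round.
        ids = {s: i for i, s in enumerate(sorted(set(sig.values()), key=repr))}
        color = {v: ids[sig[v]] for v in neighbors}
    groups = defaultdict(list)
    for v, c in color.items():
        groups[c].append(v)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical star-shaped network: two leaves share evidence, one does not.
neighbors = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
groups = color_passing(neighbors, {"a": True, "b": True}, rounds=2)
print(groups)  # "a" and "b" behave identically and share a supernode
```

Messages then need to be computed once per supernode rather than once per ground node, which is where the efficiency gains come from.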

NeurIPS Conference 2014 Conference Paper

Deep Symmetry Networks

  • Robert Gens
  • Pedro Domingos

The chief difficulty in object recognition is that objects' classes are obscured by a large number of extraneous sources of variability, such as pose and part deformation. These sources of variation can be represented by symmetry groups, sets of composable transformations that preserve object identity. Convolutional neural networks (convnets) achieve a degree of translational invariance by computing feature maps over the translation group, but cannot handle other groups. As a result, these groups' effects have to be approximated by small translations, which often requires augmenting datasets and leads to high sample complexity. In this paper, we introduce deep symmetry networks (symnets), a generalization of convnets that forms feature maps over arbitrary symmetry groups. Symnets use kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension. Like convnets, they are trained with backpropagation. The composition of feature transformations through the layers of a symnet provides a new approach to deep learning. Experiments on NORB and MNIST-rot show that symnets over the affine group greatly reduce sample complexity relative to convnets by better capturing the symmetries in the data.

AAAI Conference 2012 Conference Paper

A Tractable First-Order Probabilistic Logic

  • Pedro Domingos
  • William Webb

Tractable subsets of first-order logic are a central topic in AI research. Several of these formalisms have been used as the basis for first-order probabilistic languages. However, these are intractable, losing the original motivation. Here we propose the first non-trivially tractable first-order probabilistic language. It is a subset of Markov logic, and uses probabilistic class and part hierarchies to control complexity. We call it TML (Tractable Markov Logic). We show that TML knowledge bases allow for efficient inference even when the corresponding graphical models have very high treewidth. We also show how probabilistic inheritance, default reasoning, and other inference patterns can be carried out in TML. TML opens up the prospect of efficient large-scale first-order probabilistic inference.

NeurIPS Conference 2012 Conference Paper

Discriminative Learning of Sum-Product Networks

  • Robert Gens
  • Pedro Domingos

Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.

AAAI Conference 2011 Conference Paper

Coarse-to-Fine Inference and Learning for First-Order Probabilistic Models

  • Chloe Kiddon
  • Pedro Domingos

Coarse-to-fine approaches use sequences of increasingly fine approximations to control the complexity of inference and learning. These techniques are often used in NLP and vision applications. However, no coarse-to-fine inference or learning methods have been developed for general first-order probabilistic domains, where the potential gains are even higher. We present our Coarse-to-Fine Probabilistic Inference (CFPI) framework for general coarse-to-fine inference for first-order probabilistic models, which leverages a given or induced type hierarchy over objects in the domain. Starting by considering the inference problem at the coarsest type level, our approach performs inference at successively finer grains, pruning high- and low-probability atoms before refining. CFPI can be applied with any probabilistic inference method and can be used in both propositional and relational domains. CFPI provides theoretical guarantees on the errors incurred, and these guarantees can be tightened when CFPI is applied to specific inference algorithms. We also show how to learn parameters in a coarse-to-fine manner to maximize the efficiency of CFPI. We evaluate CFPI with the lifted belief propagation algorithm on social network link prediction and biomolecular event prediction tasks. These experiments show CFPI can greatly speed up inference without sacrificing accuracy.

NeurIPS Conference 2010 Conference Paper

Approximate Inference by Compilation to Arithmetic Circuits

  • Daniel Lowd
  • Pedro Domingos

Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2-F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2-G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines.

AAAI Conference 2010 Conference Paper

Efficient Belief Propagation for Utility Maximization and Repeated Inference

  • Aniruddh Nath
  • Pedro Domingos

Many problems require repeated inference on probabilistic graphical models, with different values for evidence variables or other changes. Examples of such problems include utility maximization, MAP inference, online and interactive inference, parameter and structure learning, and dynamic inference. Since small changes to the evidence typically only affect a small region of the network, repeatedly performing inference from scratch can be massively redundant. In this paper, we propose expanding frontier belief propagation (EFBP), an efficient approximate algorithm for probabilistic inference with incremental changes to the evidence (or model). EFBP is an extension of loopy belief propagation (BP) where each run of inference reuses results from the previous ones, instead of starting from scratch with the new evidence; messages are only propagated in regions of the network affected by the changes. We provide theoretical guarantees bounding the difference in beliefs generated by EFBP and standard BP, and apply EFBP to the problem of expected utility maximization in influence diagrams. Experiments on viral marketing and combinatorial auction problems show that EFBP can converge much faster than BP without significantly affecting the quality of the solutions.
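
The redundancy-avoidance idea can be seen on a toy propagation problem: recompute a quantity only along the path a change can reach, and stop as soon as a recomputed value is unchanged. This is a stand-in for message passing, not EFBP itself (real EFBP propagates BP messages and bounds the neglected message change):

```python
def recompute_upward(parent, children, local, value, v):
    """After local[v] changes, recompute values only along the affected
    path toward the root, stopping as soon as a recomputed value is
    unchanged. Here value[v] = local[v] + sum of child values."""
    touched = 0
    while v is not None:
        new = local[v] + sum(value[c] for c in children.get(v, ()))
        touched += 1
        if new == value[v]:
            break                      # change died out: the frontier stops
        value[v] = new
        v = parent.get(v)
    return touched

# Hypothetical chain a -> b -> c -> d.
parent = {"a": "b", "b": "c", "c": "d", "d": None}
children = {"b": ["a"], "c": ["b"], "d": ["c"]}
local = {"a": 1, "b": 1, "c": 1, "d": 1}
value = {"a": 1, "b": 2, "c": 3, "d": 4}   # consistent starting point

local["a"] = 5                              # the "evidence" change
print(recompute_upward(parent, children, local, value, "a"))  # 4: full path updated
print(recompute_upward(parent, children, local, value, "a"))  # 1: nothing changed, stops at once
```

The second call illustrates the payoff: when a change has no downstream effect, almost no work is done, instead of re-running inference from scratch.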

NeurIPS Conference 2010 Conference Paper

Learning Efficient Markov Networks

  • Vibhav Gogate
  • William Webb
  • Pedro Domingos

We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature or its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach compares favorably with thin junction trees and other Markov network structure learners.

IJCAI Conference 2007 Conference Paper

Recursive Random Fields

  • Daniel Lowd
  • Pedro Domingos

A formula in first-order logic can be viewed as a tree, with a logical connective at each node, and a knowledge base can be viewed as a tree whose root is a conjunction. Markov logic [Richardson and Domingos, 2006] makes this conjunction probabilistic, as well as the universal quantifiers directly under it, but the rest of the tree remains purely logical. This causes an asymmetry in the treatment of conjunctions and disjunctions, and of universal and existential quantifiers. We propose to overcome this by allowing the features of Markov logic networks (MLNs) to be nested MLNs. We call this representation recursive random fields (RRFs). RRFs can represent many first-order distributions exponentially more compactly than MLNs. We perform inference in RRFs using MCMC and ICM, and weight learning using a form of backpropagation. Weight learning in RRFs is more powerful than structure learning in MLNs. Applied to first-order knowledge bases, it provides a very flexible form of theory revision. We evaluate RRFs on the problem of probabilistic integrity constraints in databases, and obtain promising results.

IJCAI Conference 2003 Conference Paper

Automatically Personalizing User Interfaces

  • Daniel S. Weld
  • Corin Anderson
  • Pedro Domingos
  • Oren Etzioni
  • Krzysztof Gajos
  • Tessa Lau
  • Steve Wolfinan

Today's computer interfaces are one-size-fits-all. Users with little programming experience have very limited opportunities to customize an interface to their task and work habits. Furthermore, the overhead induced by generic interfaces will be proportionately greater on small form-factor PDAs, embedded applications and wearable devices. Automatic personalization may greatly enhance user productivity, but it requires advances in customization (explicit, user-initiated change) and adaptation (interface-initiated change in response to routine user behavior). In order to improve customization, we must make it easier for users to direct these changes. In order to improve adaptation, we must better predict user behavior and navigate the inherent tension between the dynamism of automatic adaptation and the stability required in order for the user to predict the computer's behavior and maintain control. This paper surveys a decade's work on customization and adaptation at the University of Washington, distilling the lessons we have learned.

IJCAI Conference 2003 Conference Paper

Dynamic Probabilistic Relational Models

  • Sumit Sanghai
  • Pedro Domingos
  • Daniel Weld

Intelligent agents must function in an uncertain world, containing multiple objects and relations that change over time. Unfortunately, no representation is currently available that can handle all these issues, while allowing for principled and efficient inference. This paper addresses this need by introducing dynamic probabilistic relational models (DPRMs). DPRMs are an extension of dynamic Bayesian networks (DBNs) where each time slice (and its dependences on previous slices) is represented by a probabilistic relational model (PRM). Particle filtering, the standard method for inference in DBNs, has severe limitations when applied to DPRMs, but we are able to greatly improve its performance through a form of relational Rao-Blackwellisation. Further gains in efficiency are obtained through the use of abstraction trees, a novel data structure. We successfully apply DPRMs to execution monitoring and fault diagnosis of an assembly plan, in which a complex product is gradually constructed from subparts.

NeurIPS Conference 2001 Conference Paper

Learning from Infinite Data in Finite Time

  • Pedro Domingos
  • Geoff Hulten

We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model $M_{\vec{n}}$ learned by the algorithm using $n_i$ examples in step $i$ ($\vec{n} = (n_1, \ldots, n_m)$), and the model $M_\infty$ that would be learned using infinite examples. Upper-bound the loss $L(M_{\vec{n}}, M_\infty)$ between them as a function of $\vec{n}$, and then minimize the algorithm's time complexity $f(\vec{n})$ subject to the constraint that $L(M_\infty, M_{\vec{n}})$ be at most $\epsilon$ with probability at least $1 - \delta$. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach.

1 An Approach to Large-Scale Learning

Large data sets make it possible to reliably learn complex models. On the other hand, they require large computational resources to learn from. While in the past the factor limiting the quality of learnable models was typically the quantity of data available, in many domains today data is super-abundant, and the bottleneck is the time required to process it. Many algorithms for learning on large data sets have been proposed, but in order to achieve scalability they generally compromise the quality of the results to an unspecified degree. We believe this unsatisfactory state of affairs is avoidable, and in this paper we propose a general method for scaling learning algorithms to arbitrarily large databases without compromising the quality of the results. Our method makes it possible to learn in finite time a model that is essentially indistinguishable from the one that would be obtained using infinite data. Consider the simplest possible learning problem: estimating the mean of a random variable $x$. If we have a very large number of samples, most of them are probably superfluous. If we are willing to accept an error of more than $\epsilon$ with probability at most $\delta$, Hoeffding bounds [4] (for example) tell us that, irrespective of the distribution of $x$, only $n = \frac{1}{2}(R/\epsilon)^2 \ln(2/\delta)$ samples are needed, where $R$ is $x$'s range. We propose to extend this type of reasoning beyond learning single parameters, to learning complex models. The approach we propose consists of three steps: derive an upper bound on the relative loss between the finite-data and infinite-data models, as a function of the number of samples used in each step of the finite-data algorithm; derive an upper bound on the time complexity of the learning algorithm, as a function of the number of samples used in each step; and minimize the time bound (via the number of samples used in each step) subject to target limits on the loss. In this paper we exemplify this approach using the EM algorithm for mixtures of Gaussians. In earlier papers we applied it (or an earlier version of it) to decision tree induction [2] and k-means clustering [3]. Despite its wide use, EM has long been criticized for its inefficiency (see discussion following Dempster et al. [1]), and has been considered unsuitable for large data sets [8]. Many approaches to speeding it up have been proposed (see Thiesson et al. [6] for a survey). Our method can be seen as an extension of progressive sampling approaches like Meek et al. [5]: rather than minimize the total number of samples needed by the algorithm, we minimize the number needed by each step, leading to potentially much greater savings; and we obtain guarantees that do not depend on unverifiable extrapolations of learning curves.

2 A Loss Bound for EM

In a mixture of Gaussians model, each $D$-dimensional data point $x_j$ is assumed to have been independently generated by the following process: 1) randomly choose a mixture component $k$; 2) randomly generate a point from it according to a Gaussian distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$. In this paper we will restrict ourselves to the case where the number $K$ of mixture components and the probability of selection $P(\mu_k)$ and covariance matrix for each component are known. Given a training set $S = \{x_1, \ldots, x_N\}$, the learning goal is then to find the maximum-likelihood estimates of the means $\mu_k$. The EM algorithm [1] accomplishes this by, starting from some set of initial means, alternating until convergence between estimating the probability $p(\mu_k|x_j)$ that each point was generated by each Gaussian (the E step), and computing the ML estimates of the means $\hat{\mu}_k = \sum_{j=1}^{N} w_{jk} x_j / \sum_{j=1}^{N} w_{jk}$ (the M step), where $w_{jk} = p(\mu_k|x_j)$ from the previous E step. In the basic EM algorithm, all $N$ examples in the training set are used in each iteration. The goal in this paper is to speed up EM by using only $n_i < N$ examples in the $i$th iteration, while guaranteeing that the means produced by the algorithm do not differ significantly from those that would be obtained with arbitrarily large $N$. Let $M_{\vec{n}} = (\hat{\mu}_1, \ldots, \hat{\mu}_K)$ be the vector of mean estimates obtained by the finite-data EM algorithm (i.e., using $n_i$ examples in iteration $i$), and let $M_\infty = (\mu_1, \ldots, \mu_K)$ be the vector obtained using infinite examples at each iteration. In order to proceed, we need to quantify the difference between $M_{\vec{n}}$ and $M_\infty$. A natural choice is the sum of the squared errors between corresponding means, which is proportional to the negative log-likelihood of the finite-data means given the infinite-data ones: $L(M_{\vec{n}}, M_\infty) = \sum_{k=1}^{K} \|\hat{\mu}_k - \mu_k\|^2 = \sum_{k=1}^{K} \sum_{d=1}^{D} (\hat{\mu}_{kd} - \mu_{kd})^2$.
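
The single-parameter example above is directly computable. A small helper (illustrative only) evaluates the Hoeffding sample size $n = \frac{1}{2}(R/\epsilon)^2 \ln(2/\delta)$, which notably does not depend on the size of the data set:

```python
import math

def hoeffding_n(value_range, eps, delta):
    """Samples sufficient to estimate a mean so that an additive error
    greater than eps occurs with probability at most delta (Hoeffding)."""
    return math.ceil(0.5 * (value_range / eps) ** 2 * math.log(2.0 / delta))

print(hoeffding_n(1.0, 0.01, 0.05))  # 18445, no matter how large the data set
print(hoeffding_n(1.0, 0.10, 0.05))  # 185: a 10x looser eps needs 100x fewer samples
```

The bound grows quadratically in $1/\epsilon$ but only logarithmically in $1/\delta$, which is why high-confidence guarantees are cheap while high-precision ones are not.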

NeurIPS Conference 2001 Conference Paper

The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank

  • Matthew Richardson
  • Pedro Domingos

The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today’s large search engines.
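
The query-dependent surfer can be prototyped as a small power iteration in which both the random jump and the outlink choice are weighted by page relevance; with uniform relevance it reduces to ordinary PageRank. A sketch (hypothetical link graph and relevance scores; the paper's crawl-time precomputation is not modeled):

```python
def query_pagerank(links, relevance, damping=0.85, iters=100):
    """Random-surfer scores where both the random jump and the choice of
    outlink are biased toward pages relevant to the query; uniform
    relevance recovers ordinary PageRank."""
    pages = list(links)
    total = sum(relevance[p] for p in pages)
    jump = {p: relevance[p] / total for p in pages}   # query-biased jump
    score = dict(jump)
    for _ in range(iters):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            out_rel = sum(relevance[q] for q in links[p])
            if out_rel == 0.0:                 # no relevant outlinks: jump away
                for q in pages:
                    new[q] += damping * score[p] * jump[q]
            else:
                for q in links[p]:
                    new[q] += damping * score[p] * relevance[q] / out_rel
        score = new
    return score

# Hypothetical three-page web; page "c" is the most relevant to the query.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
relevance = {"a": 1.0, "b": 1.0, "c": 5.0}
scores = query_pagerank(links, relevance)
print(max(scores, key=scores.get))  # "c": relevance pulls the surfer toward it
```

Since the transition probabilities depend only on the query-relevance function, scores for common query terms can in principle be computed ahead of time, which is the efficiency argument made in the abstract.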

AAAI Conference 2000 Conference Paper

A Unified Bias-Variance Decomposition for Zero-One and Squared Loss

  • Pedro Domingos

The bias-variance decomposition is a very useful and widely-used tool for understanding machine-learning algorithms. It was originally developed for squared loss. In recent years, several authors have proposed decompositions for zero-one loss, but each has significant shortcomings. In particular, all of these decompositions have only an intuitive relationship to the original squared-loss one. In this paper, we define bias and variance for an arbitrary loss function, and show that the resulting decomposition specializes to the standard one for the squared-loss case, and to a close relative of Kong and Dietterich’s (1995) one for the zero-one case. The same decomposition also applies to variable misclassification costs. We show a number of interesting consequences of the unified definition. For example, Schapire et al.’s (1997) notion of “margin” can be expressed as a function of the zero-one bias and variance, making it possible to formally relate a classifier ensemble’s generalization error to the base learner’s bias and variance on training examples. Experiments with the unified definition lead to further insights.
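
For the squared-loss case the decomposition is exact and easy to verify numerically: with a fixed noise-free target, expected loss over models equals bias squared plus variance. A sketch with made-up predictions (hypothetical numbers, not from the paper's experiments):

```python
def squared_loss_decomposition(predictions, y):
    """Squared-loss case of the decomposition: with a fixed noise-free
    target y, expected loss over models = bias^2 + variance exactly."""
    n = len(predictions)
    mean_pred = sum(predictions) / n                     # the "main" prediction
    loss = sum((p - y) ** 2 for p in predictions) / n    # expected loss
    bias2 = (mean_pred - y) ** 2
    variance = sum((p - mean_pred) ** 2 for p in predictions) / n
    return loss, bias2, variance

# Hypothetical predictions from models trained on different training sets.
preds = [2.0, 3.0, 5.0, 6.0]
loss, bias2, variance = squared_loss_decomposition(preds, y=3.5)
print(loss, bias2 + variance)  # 2.75 2.75 -- the decomposition is exact
```

The paper's contribution is a single definition of bias and variance that reproduces this identity for squared loss while also yielding a meaningful decomposition for zero-one loss.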

IJCAI Conference 1999 Conference Paper

Process-Oriented Estimation of Generalization Error

  • Pedro Domingos

Methods to avoid overfitting fall into two broad categories: data-oriented (using separate data for validation) and representation-oriented (penalizing complexity in the model). Both have limitations that are hard to overcome. We argue that fully adequate model evaluation is only possible if the search process by which models are obtained is also taken into account. To this end, we recently proposed a method for process-oriented evaluation (POE), and successfully applied it to rule induction [Domingos, 1998b]. However, for the sake of simplicity this treatment made a number of rather artificial assumptions. In this paper the assumptions are removed, and a simple formula for error estimation is obtained. Empirical trials show the new, better-founded form of POE to be as accurate as the previous one, while further reducing theory sizes.

AAAI Conference 1996 Short Paper

Fast Discovery of Simple Rules

  • Pedro Domingos

The recent emergence of data mining as a major application of machine learning has led to increased interest in fast rule induction algorithms. These are able to efficiently process large numbers of examples, under the constraint of still achieving good accuracy. If e is the number of examples, many rule learners have O(e^4) asymptotic time complexity in noisy domains, and C4.5RULES has been empirically observed to sometimes require O(e^3) time. Recent advances have brought this bound down to O(e log^2 e), while maintaining accuracy at the level of C4.5RULES’s (Cohen 1995). Ideally, we would like to have an algorithm capable of inducing accurate rules in time linear in e, without becoming too expensive in other factors. This extended abstract presents such an algorithm.

AAAI Conference 1996 Short Paper

Multistrategy Learning: A Case Study

  • Pedro Domingos

Two of the most popular approaches to induction are instance-based learning (IBL) and rule generation. Their strengths and weaknesses are largely complementary. IBL methods are able to identify small details in the instance space, but have trouble with attributes that are relevant in some parts of the space but not others. Conversely, rule induction methods may overlook small exception regions, but are able to select different attributes in different parts of the instance space. The two methods have been unified in the RISE algorithm (Domingos 1995). RISE views instances as maximally specific rules, forms more general rules by gradually clustering instances of the same class, and classifies a test example by letting the nearest rule win. This approach potentially combines the advantages of rule induction and IBL, and has indeed been observed to be more accurate than each on a large number of benchmark datasets. However, it is important to determine if this performance is indeed due to the hypothesized advantages, and to define the situations in which RISE’s bias will and will not be preferable to those of the individual approaches. This abstract reports experiments to this end in artificial domains.

AAAI Conference 1996 Short Paper

Towards a Unified Approach to Concept Learning

  • Pedro Domingos

Rule induction (either directly or by means of decision trees) and case-based learning (forms of which are also known as instance-based, memory-based and nearest-neighbor learning) arguably constitute the two leading symbolic approaches to concept and classification learning. Rule-based methods discard the individual training examples, and remember only abstractions formed from them. At performance time, rules are applied by logical match (i.e., only rules whose preconditions are satisfied by an example are applied to it). Case-based methods explicitly memorize some or all of the examples; they avoid forming abstractions, and instead invest more effort at performance time in finding the most similar cases to the target one.

IJCAI Conference 1995 Conference Paper

Rule Induction and Instance-Based Learning: A Unified Approach

  • Pedro Domingos

This paper presents a new approach to inductive learning that combines aspects of instance-based learning and rule induction in a single simple algorithm. The RISE system searches for rules in a specific-to-general fashion, starting with one rule per training example, and avoids some of the difficulties of separate-and-conquer approaches by evaluating each proposed induction step globally, i.e., through an efficient procedure that is equivalent to checking the accuracy of the rule set as a whole on every training example. Classification is performed using a best-match strategy, and reduces to nearest-neighbor if all generalizations of instances were rejected. An extensive empirical study shows that RISE consistently achieves higher accuracies than state-of-the-art representatives of its "parent" paradigms (PEBLS and CN2), and also outperforms a decision-tree learner (C4.5) in 13 out of 15 test domains (in 10 with 95% confidence).
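
The best-match strategy is simple to sketch: treat each rule as a set of antecedent conditions, score a rule by how many conditions the example violates, and let the nearest rule win; an instance kept as a maximally specific rule then behaves exactly like a nearest-neighbor prototype. (Hypothetical rules and attributes; RISE's specific-to-general search is not shown.)

```python
def rule_distance(rule, example):
    """Number of rule antecedents the example violates; for a maximally
    specific rule (one condition per attribute) this is a nearest-neighbor
    distance over the example's attributes."""
    return sum(1 for attr, val in rule["conds"].items()
               if example.get(attr) != val)

def classify(rules, example):
    """Best-match classification: the nearest rule wins (ties -> first rule)."""
    return min(rules, key=lambda r: rule_distance(r, example))["cls"]

# Hypothetical rule set of the kind RISE might induce from instances.
rules = [
    {"conds": {"shape": "round", "color": "red"}, "cls": "apple"},
    {"conds": {"shape": "long", "color": "yellow"}, "cls": "banana"},
]
print(classify(rules, {"shape": "round", "color": "green"}))  # apple
print(classify(rules, {"shape": "long", "color": "yellow"}))  # banana
```

Note how an example satisfying no rule exactly is still classified by the least-violated rule, which is the behavior that unifies the rule-based and instance-based views.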