Author name cluster

George H. John

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

AIJ Journal 1997 Journal Article

Wrappers for feature subset selection

Ron Kohavi
George H. John

In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
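The wrapper idea in this abstract, scoring candidate feature subsets by running the target learner itself, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy forward search, the 1-nearest-neighbour learner, the leave-one-out accuracy estimate, and the names `loo_accuracy` and `forward_select` are all hypothetical choices made for brevity.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour learner that only
    sees the columns listed in `features` (assumed stand-in learner)."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other example, measured on the chosen features only.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_select(X, y, evaluate):
    """Greedy forward search: repeatedly add the feature that most improves
    the learner's estimated accuracy, and stop when nothing improves it."""
    n_features = len(X[0])
    selected, best_score = [], float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = max((evaluate(X, y, selected + [f]), f) for f in candidates)
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


# Toy data (made up): column 0 separates the classes, column 1 is misleading.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.2), (1.1, 0.8), (1.2, 9.1)]
y = [0, 0, 0, 1, 1, 1]
selected, score = forward_select(X, y, loo_accuracy)
print(selected, score)
```

Because the subset is scored by the learner's own estimated accuracy, swapping in a different learner or dataset can change which features are kept.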

ICML Conference 1995 Conference Paper

Automatic Parameter Selection by Minimizing Estimated Error

Ron Kohavi
George H. John

UAI Conference 1995 Conference Paper

Estimating Continuous Distributions in Bayesian Classifiers

George H. John
Pat Langley

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
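The two density estimates compared in this abstract can be contrasted on a single feature. The sketch below is an illustration only: the toy samples, the bandwidth of 0.5, and the helper names are assumptions, not values or code from the paper. With several features, a naive Bayesian classifier would multiply the per-feature densities within each class.

```python
import math
from statistics import fmean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gaussian(samples):
    """Single-Gaussian estimate: one mean and standard deviation per class."""
    mu, sigma = fmean(samples), pstdev(samples)
    return lambda x: normal_pdf(x, mu, sigma)


def fit_kernel(samples, bandwidth):
    """Kernel estimate: average of Gaussian kernels centred on each sample.
    The bandwidth is a free parameter chosen here by assumption."""
    return lambda x: sum(normal_pdf(x, s, bandwidth) for s in samples) / len(samples)


def classify(x, densities, priors):
    """Pick the class maximising prior * class-conditional density."""
    return max(densities, key=lambda c: priors[c] * densities[c](x))


# Toy data (made up): class A is bimodal around -2 and +2, class B is spread out.
data = {
    "A": [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1],
    "B": [-3.0, -1.5, 0.0, 1.5, 3.0],
}
total = sum(len(v) for v in data.values())
priors = {c: len(v) / total for c, v in data.items()}

gaussian_model = {c: fit_gaussian(v) for c, v in data.items()}
kernel_model = {c: fit_kernel(v, bandwidth=0.5) for c, v in data.items()}

print(classify(0.0, gaussian_model, priors))
print(classify(0.0, kernel_model, priors))
```

At x = 0 the single Gaussian fitted to class A places its peak between the two modes, where A has no samples, so the two models disagree on this toy data; the kernel estimate follows the bimodal shape.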

AAAI Conference 1994 Short Paper

Finding Multivariate Splits in Decision Trees Using Function Optimization

George H. John

We present a new method for top-down induction of decision trees (TDIDT) with multivariate binary splits at the nodes. The primary contribution of this work is a new splitting criterion called soft entropy, which is continuous and differentiable with respect to the parameters of the splitting function. Using simple gradient descent to find multivariate splits and a novel pruning technique, our TDIDT-SEH (Soft Entropy Hyperplanes) algorithm is able to learn very small trees with better accuracy than competing learning algorithms on most datasets examined.

ICML Conference 1994 Conference Paper

Irrelevant Features and the Subset Selection Problem

George H. John
Ron Kohavi
Karl Pfleger

AAAI Conference 1994 Short Paper

When the Best Move Isn’t Optimal: Q-learning with Exploration

George H. John

The most popular delayed reinforcement learning technique, Q-learning (Watkins 1989), estimates the future reward expected from executing each action in every state. If these estimates are correct, then an agent can use them to select the action with maximal expected future reward in each state, and thus perform optimally. Watkins has proved that Q-learning produces an optimal policy (the function mapping states to actions) and that these estimates converge to the correct values given the optimal policy.
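The standard tabular Q-learning update that this abstract describes can be sketched as below. This is a minimal example under assumptions: the five-state chain environment, the uniformly random behaviour policy, and the parameter values are hypothetical and are not taken from the paper.

```python
import random

N_STATES = 5       # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1 only on entering the terminal state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore with a uniformly random policy
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Action with the highest estimated value in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))
```

The agent behaves randomly while learning, yet the greedy policy read off the learned estimates heads for the rewarding terminal state, which illustrates the separation between the exploring behaviour and the policy the estimates describe.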
