Author name cluster

Mark Craven

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2022 Conference Paper

Feature Importance Explanations for Temporal Black-Box Models

Akshay Sood
Mark Craven

Models in the supervised learning framework may capture rich and complex representations over the features that are hard for humans to interpret. Existing methods to explain such models are often specific to architectures and data where the features do not have a time-varying component. In this work, we propose TIME, a method to explain models that are inherently temporal in nature. Our approach (i) uses a model-agnostic permutation-based approach to analyze global feature importance, (ii) identifies the importance of salient features with respect to their temporal ordering as well as localized windows of influence, and (iii) uses hypothesis testing to provide statistical rigor.

PDF Details

AAAI Conference 2019 Conference Paper

Understanding Learned Models by Identifying Important Features at the Right Resolution

Kyubin Lee
Akshay Sood
Mark Craven

In many application domains, it is important to characterize how complex learned models make their decisions across the distribution of instances. One way to do this is to identify the features and interactions among them that contribute to a model’s predictive accuracy. We present a model-agnostic approach to this task that makes the following specific contributions. Our approach (i) tests feature groups, in addition to base features, and tries to determine the level of resolution at which important features can be determined, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model’s loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications.

PDF Details

IJCAI Conference 2011 Conference Paper

A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation Using First-Order Logic

David Andrzejewski
Xiaojin Zhu
Mark Craven
Benjamin Recht

Topic models have been used successfully for a variety of problems, often in the form of application-specific extensions of the basic Latent Dirichlet Allocation (LDA) model. Because deriving these new models in order to encode domain knowledge can be difficult and time-consuming, we propose the Fold·all model, which allows the user to specify general domain knowledge in First-Order Logic (FOL). However, combining topic modeling with FOL can result in inference problems beyond the capabilities of existing techniques. We have therefore developed a scalable inference technique using stochastic gradient descent which may also be useful to the Markov Logic Network (MLN) research community. Experiments demonstrate the expresive power of Fold·all, as well as the scalability of our proposed inference method.

PDF Details DOI

ICML Conference 2009 Conference Paper

Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

David Andrzejewski
Xiaojin Zhu 0001
Mark Craven

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.

Details

UAI Conference 2008 Conference Paper

Learning Hidden Markov Models for Regression using Path Aggregation

Keith Noto
Mark Craven

We consider the task of learning mappings from sequential data to real-valued responses. We present and evaluate an approach to learning a type of hidden Markov model (HMM) for regression. The learning process involves inferring the structure and parameters of a conventional HMM, while simultaneously learning a regression model that maps features that characterize paths through the model to continuous responses. Our results, in both synthetic and biological domains, demonstrate the value of jointly learning the two components of our approach.

Details

NeurIPS Conference 2007 Conference Paper

Multiple-Instance Active Learning

Burr Settles
Mark Craven
Soumya Ray

In a multiple instance (MI) learning problem, instances are naturally organized into bags and it is the bags, instead of individual instances, that are labeled for training. MI learners assume that every instance in a bag labeled negative is actually negative, whereas at least one instance in a bag labeled positive is actually positive. We present a framework for active learning in the multiple-instance setting. In particular, we consider the case in which an MI learner is allowed to selectively query unlabeled instances in positive bags. This approach is well motivated in domains in which it is inexpensive to acquire bag labels and possible, but expensive, to acquire instance labels. We describe a method for learning from labels at mixed levels of granularity, and introduce two active query selection strategies motivated by the MI setting. Our experiments show that learning from instance labels can significantly improve performance of a basic MI learning algorithm in two multiple-instance domains: content-based image recognition and text classification.

PDF Details

ICML Conference 2005 Conference Paper

Supervised versus multiple instance learning: an empirical comparison

Soumya Ray
Mark Craven

Details

NeurIPS Conference 2004 Conference Paper

Markov Networks for Detecting Overalpping Elements in Sequence Data

Mark Craven
Joseph Bockhorst

Many sequential prediction tasks involve locating instances of pat- terns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally repre- sent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental re- sults from a genomics domain show that our models are more accu- rate at locating instances of overlapping patterns than are baseline models based on HMMs. 1 Introduction Hidden Markov models (HMMs) and related probabilistic sequence models have been among the most accurate methods used for sequence-based prediction tasks in genomics, natural language processing and other problem domains. One key limitation of these models, however, is that they cannot represent general overlaps among sequence elements in a concise and natural manner. We present a novel approach to modeling and predicting overlapping sequence elements that is based on undirected Markov networks. Our work is motivated by the task of predicting DNA sequence elements involved in the regulation of gene expression in bacteria. Like HMM-based methods, our approach is able to represent and exploit relationships among different sequence elements of interest. In contrast to HMMs, however, our approach can naturally represent sequence elements that overlap in arbitrary ways. We describe and evaluate our approach in the context of predicting a bacterial genome's genes and regulatory "signals" (together its regulatory elements). Part of the process of understanding a given genome is to assemble a "parts list", often using computational methods, of its regulatory elements. Predictions, in this case, entail specifying the start and end coordinates of subsequences of interest. It is common in bacterial genomes for these important sequence elements to overlap. START END (a) (b) prom prom prom term 1 2 3 1 gene1 gene 2 prom gene term Figure 1: (a) Example arrangement of two genes, three promoters and one terminator in a DNA sequence. (b) Topology of an HMM for predicting these elements. Large circles represent element-specific sub-models and small gray circles represent inter-element sub- models, one for each allowed pair of adjacent elements. Due to the overlapping elements, there is no path through the HMM consistent with the configuration in (a). Our approach to predicting overlapping sequence elements, which is based on dis- criminatively trained undirected graphical models called conditional Markov net- works [5, 10] (also called conditional random fields), uses two key steps to make a set of predictions. In the first step, candidate elements are generated by having a set of models independently make predictions. In the second step, a Markov network is constructed to decide which candidate predictions to accept. Consider the task of predicting gene, promoter, and terminator elements encoded in bacterial DNA. Figure 1(a) shows an example arrangement of these elements in a DNA sequence. Genes are DNA sequences that encode information for constructing proteins. Promoters and terminators are DNA sequences that regulate transcrip- tion, the first step in the synthesis of a protein from a gene. Transcription begins at a promoter, proceeds downstream (left-to-right in Figure 1(a)), and ends at a terminator. Regulatory elements often overlap each other, for example prom2 and prom3 or gene1 and prom2 in Figure 1. One technique for predicting these elements is first to train a probabilistic sequence model for each element type (e. g. [2, 9]) and then to "scan" an input sequence with each model in turn. Although this approach can predict overlapping elements, it is limited since it ignores inter-element dependencies. Other methods, based on HMMs (e. g. [11, 1]), explicitly consider these dependencies. Figure 1(b) shows an example topology of such an HMM. Given an input sequence, this HMM defines a probability distribution over parses, partitionings of the sequence into subsequences corresponding to elements and the regions between them. These models are not nat- urally suited to representing overlapping elements. For the case shown in Figure 1(a) for example, even if the subsequences for gene1 and prom2 match their respective sub-models very well, since both elements cannot be in the same parse there is a competition between predictions of gene1 and prom2. One could expand the state set to include states for specific overlap situations however, the number of states in- creases exponentially with the number of overlap configurations. Alternatively, one could use the factorized state representation of factorial HMMs [4]. These models, however, assume a fixed number of loosely connected processes evolving in parallel, which is not a good match to our genomics domain. Like HMMs, our method, called CMN-OP (conditional Markov networks for over- lapping patterns), employs element-specific sub-models and probabilistic constraints on neighboring elements qualitatively expressed in a graph. The key difference be- tween CMN-OP and HMMs is the probability distributions they define for an input sequence. While, as mentioned above, an HMM defines a probability distribution over partitions of the sequence, a CMN-OP defines a probability distribution over all possible joint arrangements of elements in an input sequence. Figure 2 illustrates this distinction. (a) HMM (b) CMN-OP predicted labels sample space predicted signals sample space end position 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 1 2 3 4 5 6 7 8 3 4 5 6 start position 7 8 Figure 2: An illustration of the difference in the sample spaces on which probability distributions over labelings are defined by (a) HMMs and (b) CMN-OP models. The left side of (a) shows a sequence of length eight for which an HMM has predicted that an element of interest occupies two subsequences, [1: 3] and [6: 7]. The darker subsequences, [4: 5] and [8: 8], represent sequence regions between predicted elements. The right side of (a) shows the corresponding event in the sample space of the HMM, which associates one label with each position. The left side of (b) shows four predicted elements made by a CMN-OP model. The right side of (b) illustrates the corresponding event in the CMN-OP sample space. Each square corresponds to a subsequence, and an event in this sample space assigns a (possibly empty) label to each sub-sequence.

PDF Details

IJCAI Conference 2003 Conference Paper

Hierarchical Hidden Markov Models for Information Extraction

Marios Skounakis
Mark Craven
Soumya Ray

Information extraction can be defined as the task of automatically extracting instances of specified classes or relations from text. We consider the case of using machine learning methods to induce models for extracting relation instances from biomedical articles. We propose and evaluate an approach that is based on using hierarchical hidden Markov models to represent the grammatical structure of the sentences being processed. Our approach first uses a shallow parser to construct a multi-level representation of each sentence being processed. Then we train hierarchical H M M s to capture the regularities of the parses for both positive and negative sentences. We evaluate our method by inducing models to extract binary relations in three biomedical domains. Our experiments indicate that our approach results in more accurate models than several baseline H M M approaches.

PDF

ICML Conference 2002 Conference Paper

Exploiting Relations Among Concepts to Acquire Weakly Labeled Training Data

Joseph Bockhorst
Mark Craven

Details

AIJ Journal 2000 Journal Article

Learning to construct knowledge bases from the World Wide Web

Mark Craven
Dan DiPasquo
Dayne Freitag
Andrew McCallum
Tom Mitchell
Kamal Nigam
Seán Slattery

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e. g. , company, person, employee, product) and relations (e. g. , employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system that has created a knowledge base describing university people, courses, and research projects.

Details DOI

AAAI Conference 1998 Conference Paper

Learning to Extract Symbolic Knowledge from the World Wide Web

Mark Craven
Dayne Freitag
Tom Mitchell

The World Wide Webis a vast source of information accessible to computers, but understandable only to humans. Thegoal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Webinformation, and promote newuses of the Webto support knowledgebased inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled reg-ions of hypertext representing instances of these classes and relations. Giventhese inputs, the system learns to extract information from other pages and hyperlinks on the Web. This paper describes our general approach, several machinelearning algorithms for this task, and promising initial results with a prototype system.

PDF Details

NeurIPS Conference 1995 Conference Paper

Extracting Tree-Structured Representations of Trained Networks

Mark Craven
Jude Shavlik

A significant limitation of neural networks is that the represen(cid: 173) tations they learn are usually incomprehensible to humans. We present a novel algorithm, TREPAN, for extracting comprehensible, symbolic representations from trained neural networks. Our algo(cid: 173) rithm uses queries to induce a decision tree that approximates the concept represented by a given network. Our experiments demon(cid: 173) strate that TREPAN is able to produce decision trees that maintain a high level of fidelity to their respective networks while being com(cid: 173) prehensible and accurate. Unlike previous work in this area, our algorithm is general in its applicability and scales well to large net(cid: 173) works and problems with high-dimensional input spaces.

PDF Details

NeurIPS Conference 1995 Conference Paper

Learning Sparse Perceptrons

Jeffrey Jackson
Mark Craven

We introduce a new algorithm designed to learn sparse percep(cid: 173) trons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAC-learn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand. 1

PDF Details