Arrow Research search

Author name cluster

William W. Cohen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

JAAMAS Journal 2026 Journal Article

Reasoning about Textual Similarity in a Web-Based Information Access System

  • William W. Cohen

The degree to which information sources are pre-processed by Web-based information systems varies greatly. In search engines like Altavista, little pre-processing is done, while in “knowledge integration” systems, complex site-specific “wrappers” are used to integrate different information sources into a common database representation. In this paper we describe an intermediate point between these two models. In our system, information sources are converted into a highly structured collection of small fragments of text. Database-like queries to this structured collection of text fragments are approximated using a novel logic called WHIRL, which combines inference in the style of deductive databases with ranked retrieval methods from information retrieval (IR). WHIRL allows queries that integrate information from multiple Web sites, without requiring the extraction and normalization of object identifiers that can be used as keys; instead, operations that in conventional databases require equality tests on keys are approximated using IR similarity metrics for text. This leads to a reduction in the amount of human engineering required to field a knowledge integration system. Experimental evidence is given showing that many information sources can be easily modeled with WHIRL, and that inferences in the logic are both accurate and efficient.
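
The similarity joins at the heart of WHIRL can be illustrated with a small sketch (this is not the paper's implementation; the function names, tokenization, and threshold are illustrative): tuples from two relations are paired when the TF-IDF cosine similarity of their text fields is high, rather than when keys are exactly equal.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors for a list of token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vecs.append({t: w / norm for t, w in vec.items()})
    return vecs

def soft_join(left, right, threshold=0.3):
    """Join two lists of strings on cosine similarity of their TF-IDF
    vectors instead of key equality, returning similarity-scored pairs."""
    vecs = tfidf_vectors([s.split() for s in left + right])
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            sim = sum(w * b.get(t, 0.0) for t, w in a.items())
            if sim >= threshold:
                pairs.append((left[i], right[j], round(sim, 3)))
    return sorted(pairs, key=lambda p: -p[2])
```

For example, "ACME Inc" and "ACME Incorporated" share no normalized key, but their shared high-IDF token lets the soft join pair them while unrelated names score zero.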

NeurIPS Conference 2024 Conference Paper

Stratified Prediction-Powered Inference for Effective Hybrid Evaluation of Language Models

  • Adam Fisch
  • Joshua Maynez
  • R. A. Hofer
  • Bhuwan Dhingra
  • Amir Globerson
  • William W. Cohen

Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate (but potentially biased) automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for parameters of any dimensionality that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
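
The point-estimate side of this idea can be sketched in a toy mean-estimation setting (the function names and the size-proportional stratum weighting below are illustrative, not the paper's estimator, which also derives confidence intervals): the autorater's average is corrected by the observed human-vs-autorater gap, optionally within each stratum.

```python
def ppi_mean(auto_unlabeled, human, auto_labeled):
    """Classical PPI point estimate of a mean: the autorater's average over
    a large unlabeled pool, plus a bias correction ("rectifier") estimated
    from a small sample where both human and autorater labels exist."""
    rectifier = sum(h - a for h, a in zip(human, auto_labeled)) / len(human)
    return sum(auto_unlabeled) / len(auto_unlabeled) + rectifier

def stratified_ppi_mean(strata):
    """StratPPI-style sketch: apply the PPI correction within each stratum,
    then combine estimates weighted by stratum size, so that a stratum-
    dependent autorater bias is corrected stratum by stratum."""
    total = sum(len(s["auto_unlabeled"]) for s in strata)
    estimate = 0.0
    for s in strata:
        weight = len(s["auto_unlabeled"]) / total
        estimate += weight * ppi_mean(
            s["auto_unlabeled"], s["human"], s["auto_labeled"])
    return estimate
```

In the toy data below, the autorater overestimates by 0.1 in every stratum; the rectifier removes that bias exactly.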

NeurIPS Conference 2024 Conference Paper

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought

  • Gabriel Sarch
  • Lawrence Jang
  • Michael J. Tarr
  • William W. Cohen
  • Kenneth Marino
  • Katerina Fragkiadaki

Large-scale generative language and vision-language models (LLMs and VLMs) excel in few-shot in-context learning for decision making and instruction following. However, they require high-quality exemplar demonstrations to be included in their context window. In this work, we ask: Can LLMs and VLMs generate their own examples from generic, sub-optimal demonstrations? We propose In-Context Abstraction Learning (ICAL), a method that builds a memory of multimodal experience from sub-optimal demonstrations and human feedback. Given a task demonstration that may contain inefficiencies or mistakes, a VLM abstracts the trajectory into a generalized program by correcting inefficient actions and annotating cognitive abstractions: causal relationships, object state changes, temporal subgoals, and task-relevant visual elements. These abstractions are iteratively improved and adapted through human feedback while the agent attempts to execute the trajectory in a similar environment. The resulting examples, when used as exemplars in the prompt, significantly improve decision-making in retrieval-augmented LLM and VLM agents. Moreover, as the agent's library of examples grows, it becomes more efficient, relying less on human feedback and requiring fewer environment interactions per demonstration. Our ICAL agent surpasses the state-of-the-art in dialogue-based instruction following in TEACh, multimodal web agents in VisualWebArena, and action anticipation in Ego4D. In TEACh, we achieve a 12.6% improvement in goal-condition success. In VisualWebArena, our task success rate improves over the SOTA from 14.3% to 22.7% using GPT-4V. In Ego4D action forecasting, we improve over few-shot GPT-4V and remain competitive with supervised models. We show finetuning our retrieval-augmented in-context agent yields additional improvements. Our approach significantly reduces reliance on manual prompt engineering and consistently outperforms in-context learning from action plans that lack such abstractions.

ICML Conference 2023 Conference Paper

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

  • Michiel de Jong
  • Yury Zemlyanskiy
  • Nicholas FitzGerald
  • Joshua Ainslie
  • Sumit Sanghai
  • Fei Sha
  • William W. Cohen

Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.

TMLR Journal 2023 Journal Article

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

  • Wenhu Chen
  • Xueguang Ma
  • Xinyi Wang
  • William W. Cohen

Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is the state-of-the-art method for many of these tasks. CoT uses language models to produce text describing reasoning and computation, and finally the answer to a question. Here we propose “Program of Thoughts” (PoT), which uses language models (mainly Codex) to generate text and programming language statements, and finally an answer. In PoT, the computation can be delegated to a program interpreter, which is used to execute the generated program, thus decoupling complex computation from reasoning and language understanding. We evaluate PoT on five math word problem datasets and three financial-QA datasets in both few-shot and zero-shot settings. We find that PoT has an average performance gain over CoT of around 12% across all datasets. By combining PoT with self-consistency decoding, we can achieve extremely strong performance on all the math datasets and financial datasets. All of our data and code will be released.
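
The delegation step can be sketched in a few lines (an illustrative harness, not the paper's code; the variable name `ans` and the example program are assumptions): the language model emits a program, and a Python interpreter, rather than the model itself, performs the arithmetic.

```python
def solve_with_pot(program: str, answer_var: str = "ans"):
    """Execute a model-generated program with a Python interpreter and read
    off the answer variable, decoupling computation from reasoning.
    (A real system would sandbox this call.)"""
    namespace = {}
    exec(program, namespace)
    return namespace[answer_var]

# A program a code model might emit for: "A shirt costs $25, is discounted
# 20%, and then a 10% tax is applied. What is the final price?"
generated = """
price = 25
discounted = price * (1 - 0.20)
ans = round(discounted * 1.10, 2)
"""
```

Running `solve_with_pot(generated)` returns the interpreter-computed answer, so multi-digit arithmetic never depends on the model's token-level computation.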

AAAI Conference 2023 Conference Paper

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

  • William W. Cohen
  • Wenhu Chen
  • Michiel De Jong
  • Nitish Gupta
  • Alessandro Presta
  • Pat Verga
  • John Wieting

We propose a new knowledge representation (KR) based on knowledge bases (KBs) derived from text via question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving “multi-hop” inferences. However, unlike a traditional KB, this information store is well-aligned with common user information needs. We present one such KB, called a QEDB, and give qualitative evidence that the atomic components are high-quality and meaningful, and that atomic components can be combined in ways similar to the triples in a symbolic KB. We also show experimentally that questions reflective of typical user questions are more easily answered with a QEDB than a symbolic KB.

ICLR Conference 2023 Conference Paper

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

  • Wenhu Chen
  • Hexiang Hu
  • Chitwan Saharia
  • William W. Cohen

Research on text-to-image generation has witnessed significant progress in generating diverse and photo-realistic images, driven by diffusion and auto-regressive models trained on large-scale image-text data. Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as “Chortai (dog)” or “Picarones (food)”. To tackle this issue, we present the Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model that uses retrieved information to produce high-fidelity and faithful images, even for rare or unseen entities. Given a text prompt, Re-Imagen accesses an external multi-modal knowledge base to retrieve relevant (image, text) pairs, and uses them as references to generate the image. With this retrieval step, Re-Imagen is augmented with the knowledge of high-level semantics and low-level visual details of the mentioned entities, and thus improves its accuracy in generating the entities' visual appearances. We train Re-Imagen on a constructed dataset containing (image, text, retrieval) triples to teach the model to ground on both text prompt and retrieval. Furthermore, we develop a new sampling strategy to interleave the classifier-free guidance for text and retrieval condition to balance the text and retrieval alignment. Re-Imagen achieves new SoTA FID results on two image generation benchmarks, COCO (FID = 5.25) and WikiImage (FID = 5.82), without fine-tuning. To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple visual domains. Human evaluation on EntityDrawBench shows that Re-Imagen performs on par with the best prior models in photo-realism, but with significantly better real-world faithfulness, especially on less frequent entities.

ICLR Conference 2023 Conference Paper

Scenario-based Question Answering with Interacting Contextual Properties

  • Haitian Sun
  • William W. Cohen
  • Ruslan Salakhutdinov

In the scenario-based Question Answering (QA) task, models are asked to find answers that are appropriate to the user scenarios associated with the question and identify information that is missing from the scenarios but is necessary for the answers to hold. Scenarios commonly include multiple properties of users, such as age, employment status, and income level for the question “How much can I claim from this benefit?”. The properties relevant to a potential answer are given in a document, which will state conditions necessary for the answer to hold. Documents also may specify how conditions interact with each other, e.g. with text like “one of the conditions below must apply”. Although understanding the relationship between conditions is crucial for solving this challenging QA task, limited work has been done so far in modeling this. In this paper, we propose the T-Reasoner model, which solves this problem with three jointly learned modules: an entailment module which checks whether a condition has been satisfied by the scenario, a decoding module which locates eligible answers from documents, and a reasoning module which infers the relationship between conditions and performs a reasoning step to determine the logically consistent answers and identify missing conditions. T-Reasoner outperforms strong baselines on a synthetic scenario-based QA dataset and achieves a new state-of-the-art on two scenario-based QA benchmarks, outperforming the prior best models by 3-10 points.

NeurIPS Conference 2023 Conference Paper

Subject-driven Text-to-Image Generation via Apprenticeship Learning

  • Wenhu Chen
  • Hexiang Hu
  • Yandong Li
  • Nataniel Ruiz
  • Xuhui Jia
  • Ming-Wei Chang
  • William W. Cohen

Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an “expert model” for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train a massive number of expert models, each specializing in a different subject. The apprentice model SuTI then learns to imitate the behavior of these fine-tuned experts. SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods. On the challenging DreamBench and DreamBench-v2, our human evaluation shows that SuTI significantly outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth.

AAAI Conference 2022 Conference Paper

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

  • Siddhant Arora
  • Danish Pruthi
  • Norman Sadeh
  • William W. Cohen
  • Zachary C. Lipton
  • Graham Neubig

In attempts to “explain” predictions of machine learning models, researchers have proposed hundreds of techniques for attributing predictions to features that are deemed important. While these attributions are often claimed to hold the potential to improve human “understanding” of the models, surprisingly little work explicitly evaluates progress towards this aspiration. In this paper, we conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. They are challenged both to simulate the model on fresh reviews, and to edit reviews with the goal of lowering the probability of the originally predicted class. Successful manipulations would lead to an adversarial example. During the training (but not the test) phase, input spans are highlighted to communicate salience. Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control. For the BERT-based classifier, popular local explanations do not improve their ability to reduce the model confidence over the no-explanation case. Remarkably, when the explanation for the BERT model is given by the (global) attributions of a linear model trained to imitate the BERT model, people can effectively manipulate the model.

ICLR Conference 2022 Conference Paper

Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

  • Michiel de Jong
  • Yury Zemlyanskiy
  • Nicholas FitzGerald
  • Fei Sha
  • William W. Cohen

Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with “mention memory”, a table of dense vector representations of every entity mention in a corpus. The proposed model, TOME, is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks HoVer and FEVER and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally, we demonstrate that the model can generalize to new unseen entities by updating the memory without retraining.
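
The core memory-access operation is ordinary dot-product attention over a table of mention vectors, which can be sketched as follows (an illustrative toy, not TOME's architecture; real mention memories are large and searched approximately rather than scored exhaustively):

```python
import math

def attend_to_memory(query, memory_keys, memory_values):
    """Softmax dot-product attention of one query vector over a table of
    mention key vectors; returns the attention weights and the
    weight-averaged memory value that gets folded back into the model."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(memory_values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, memory_values))
              for i in range(dim)]
    return weights, pooled
```

A query aligned with one stored mention concentrates nearly all of the attention mass on that entry, so the pooled value is dominated by the matching mention's representation.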

NeurIPS Conference 2022 Conference Paper

Transformer Memory as a Differentiable Search Index

  • Yi Tay
  • Vinh Tran
  • Mostafa Dehghani
  • Jianmo Ni
  • Dara Bahri
  • Harsh Mehta
  • Zhen Qin
  • Kai Hui
  • William W. Cohen

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup.

ICLR Conference 2021 Conference Paper

Open Question Answering over Tables and Text

  • Wenhu Chen
  • Ming-Wei Chang
  • Eva Schlinger
  • William Yang Wang
  • William W. Cohen

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over both tabular and textual data and present a new large-scale dataset, Open Table-and-Text Question Answering (OTT-QA), to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging: our baseline model using an iterative retriever and BERT-based reader achieves an exact match score less than 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use “early fusion” to group multiple highly relevant tabular and textual units into a fused block, which provides more context for the retriever to search for. The second technique is to use a cross-block reader to model the cross-dependency between multiple pieces of retrieved evidence with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27%.

ICML Conference 2021 Conference Paper

Reasoning Over Virtual Knowledge Bases With Open Predicate Relations

  • Haitian Sun
  • Patrick Verga
  • Bhuwan Dhingra
  • Ruslan Salakhutdinov
  • William W. Cohen

We present the Open Predicate Query Language (OPQL), a method for constructing a virtual KB (VKB) trained entirely from text. Large Knowledge Bases (KBs) are indispensable for a wide range of industry applications such as question answering and recommendation. Typically, KBs encode world knowledge in a structured, readily accessible form derived from laborious human annotation efforts. Unfortunately, while they are extremely high precision, KBs are inevitably highly incomplete and automated methods for enriching them are far too inaccurate. Instead, OPQL constructs a VKB by encoding and indexing a set of relation mentions in a way that naturally enables reasoning and can be trained without any structured supervision. We demonstrate that OPQL outperforms prior VKB methods on two different KB reasoning tasks and, additionally, can be used as an external memory integrated into a language model (OPQL-LM) leading to improvements on two open-domain question answering tasks.

AAAI Conference 2021 Conference Paper

What’s the Best Place for an AI Conference, Vancouver or _______: Why Completing Comparative Questions is Difficult

  • Avishai Zagoury
  • Einat Minkov
  • Idan Szpektor
  • William W. Cohen

Although large neural language models (LMs) like BERT can be finetuned to yield state-of-the-art results on many NLP tasks, it is often unclear what these models actually learn. Here we study using such LMs to fill in entities in human-authored comparative questions, like “Which country is older, India or ___?”; i.e., we study the ability of neural LMs to ask (not answer) reasonable questions. We show that accuracy in this fill-in-the-blank task is well-correlated with human judgements of whether a question is reasonable, and that these models can be trained to achieve nearly human-level performance in completing comparative questions in three different subdomains. However, analysis shows that what they learn fails to model any sort of broad notion of which entities are semantically comparable or similar; instead the trained models are very domain-specific, and performance is highly correlated with co-occurrences between specific entities observed in the training set. This is true both for models that are pretrained on general text corpora and for models trained on a large corpus of comparison questions. Our study thus reinforces recent results on the difficulty of making claims about a deep model's world knowledge or linguistic competence based on performance on specific benchmark problems. We make our evaluation datasets publicly available to foster future research on complex understanding and reasoning in such models at standards of human interaction.

ICLR Conference 2020 Conference Paper

Differentiable Reasoning over a Virtual Knowledge Base

  • Bhuwan Dhingra
  • Manzil Zaheer
  • Vidhisha Balachandran
  • Graham Neubig
  • Ruslan Salakhutdinov
  • William W. Cohen

We consider the task of answering complex multi-hop questions using a corpus as a virtual knowledge base (KB). In particular, we describe a neural module, DrKIT, that traverses textual data like a KB, softly following paths of relations between mentions of entities in the corpus. At each step the module uses a combination of sparse-matrix TFIDF indices and a maximum inner product search (MIPS) on a special index of contextual representations of the mentions. This module is differentiable, so the full system can be trained end-to-end using gradient based methods, starting from natural language inputs. We also describe a pretraining scheme for the contextual representation encoder by generating hard negative examples using existing knowledge bases. We show that DrKIT improves accuracy by 9 points on 3-hop questions in the MetaQA dataset, cutting the gap between text-based and KB-based state-of-the-art by 70%. On HotpotQA, DrKIT leads to a 10% improvement over a BERT-based re-ranking approach to retrieving the relevant passages required to answer a question. DrKIT is also very efficient, processing up to 10-100x more queries per second than existing multi-hop systems.

NeurIPS Conference 2020 Conference Paper

Faithful Embeddings for Knowledge Base Queries

  • Haitian Sun
  • Andrew Arnold
  • Tania Bedrax Weiss
  • Fernando Pereira
  • William W. Cohen

The deductive closure of an ideal knowledge base (KB) contains exactly the logical queries that the KB can answer. However, in practice KBs are both incomplete and over-specified, failing to answer some queries that have real-world answers. Query embedding (QE) techniques have been recently proposed where KB entities and KB queries are represented jointly in an embedding space, supporting relaxation and generalization in KB inference. However, experiments in this paper show that QE systems may disagree with deductive reasoning on answers that do not require generalization or relaxation. We address this problem with a novel QE method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs. Finally we show that inserting this new QE module into a neural question-answering system leads to substantial improvements over the state-of-the-art.

ICLR Conference 2020 Conference Paper

Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

  • William W. Cohen
  • Haitian Sun
  • R. Alex Hofer
  • Matthew Siegler

We describe a novel way of representing a symbolic knowledge base (KB) called a sparse-matrix reified KB. This representation enables neural modules that are fully differentiable, faithful to the original semantics of the KB, expressive enough to model multi-hop inferences, and scalable enough to use with realistically large KBs. The sparse-matrix reified KB can be distributed across multiple GPUs, can scale to tens of millions of entities and facts, and is orders of magnitude faster than naive sparse-matrix implementations. The reified KB enables very simple end-to-end architectures to obtain competitive performance on several benchmarks representing two families of tasks: KB completion, and learning semantic parsers from denotations.
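
The reified-KB "follow" operation can be sketched with sparse vectors over entities and relations (a toy with Python dicts standing in for the paper's distributed sparse matrices; the entity/relation indices are illustrative): each triple (s, rel, o) routes weight x[s] * r[rel] to entity o, so multi-hop inference is just repeated application.

```python
def follow(x, r, triples, n_entities):
    """Relation-following over a reified KB: x is a sparse weight vector
    over entities, r a sparse weight vector over relations, and each
    triple (s, rel, o) contributes x[s] * r[rel] to output entity o.
    Returns a sparse dict over entities."""
    out = [0.0] * n_entities
    for s, rel, o in triples:
        w = x.get(s, 0.0) * r.get(rel, 0.0)
        if w:
            out[o] += w
    return {e: w for e, w in enumerate(out) if w}

# Toy KB: entities 0=cohen, 1=cmu, 2=pittsburgh;
# relations 0=works_at, 1=located_in.
triples = [(0, 0, 1), (1, 1, 2)]
```

A two-hop query ("where is the place Cohen works located?") is two calls to `follow`, one per hop; in the paper this becomes differentiable sparse-matrix multiplication.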

ICLR Conference 2018 Conference Paper

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

  • Zhilin Yang 0001
  • Zihang Dai
  • Ruslan Salakhutdinov
  • William W. Cohen

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
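
The fix the paper proposes (Mixture of Softmaxes) can be sketched numerically (an illustrative toy, not the paper's model; the logit values and mixture weights are made up): mixing several softmax distributions yields a log-probability matrix whose rank is no longer capped by the embedding dimension, unlike a single softmax over one low-rank logit matrix.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mixture_of_softmaxes(logit_sets, weights):
    """Weighted mixture of K softmax distributions over the same
    vocabulary; mixing happens in probability space, after each softmax,
    which is what breaks the single-softmax rank bottleneck."""
    dists = [softmax(ls) for ls in logit_sets]
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(len(dists[0]))]
```

Note the mixture is taken over probabilities, not logits: averaging logits first would collapse back into a single softmax and reintroduce the bottleneck.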

AAAI Conference 2017 Conference Paper

Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora

  • Lidong Bing
  • Bhuwan Dhingra
  • Kathryn Mazaitis
  • Jong Hyuk Park
  • William W. Cohen

We propose a framework to improve the performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with a very small set of distant labels.
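
The label-propagation step can be sketched on a toy graph (an illustrative propagation rule, not the paper's exact formulation; the damping factor and iteration count are assumptions): seed scores from the well-structured sections spread along edges to nearby noisy candidates, decaying with distance.

```python
def label_propagation(edges, seeds, n_nodes, iters=20, alpha=0.8):
    """Spread seed label scores over an undirected graph: each iteration,
    a node's score is its (fixed) seed score plus alpha times the average
    score of its neighbors, so scores decay with distance from the seeds."""
    nbrs = [[] for _ in range(n_nodes)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    scores = [seeds.get(i, 0.0) for i in range(n_nodes)]
    for _ in range(iters):
        scores = [
            seeds.get(i, 0.0)
            + (alpha * sum(scores[j] for j in nbrs[i]) / len(nbrs[i])
               if nbrs[i] else 0.0)
            for i in range(n_nodes)
        ]
    return scores
```

On a chain seeded at one end, scores fall off monotonically with distance from the seed, which is the behavior used to rank candidate relation examples.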

IJCAI Conference 2017 Conference Paper

Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning

  • Lidong Bing
  • William W. Cohen
  • Bhuwan Dhingra

We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.

IJCAI Conference 2016 Conference Paper

Learning First-Order Logic Embeddings via Matrix Factorization

  • William Yang Wang
  • William W. Cohen

Many complex reasoning tasks in Artificial Intelligence (including relation extraction, knowledge base completion, and information integration) can be formulated as inference problems using a probabilistic first-order logic. However, due to the discrete nature of logical facts and predicates, it is challenging to generalize symbolic representations and represent first-order logic formulas in probabilistic relational models. In this work, we take a rather radical approach: we aim at learning continuous low-dimensional embeddings for first-order logic from scratch. In particular, we first consider a structural gradient based structure learning approach to generate plausible inference formulas from facts; then, we build grounded proof graphs using background facts, training examples, and these inference formulas. To learn embeddings for formulas, we map the training examples into the rows of a binary matrix, and inference formulas into the columns. Using a scalable matrix factorization approach, we then learn the latent continuous representations of examples and logical formulas via a low-rank approximation method. In experiments, we demonstrate the effectiveness of reasoning with first-order logic embeddings by comparing with several state-of-the-art baselines on two datasets in the task of knowledge base completion.

ICML Conference 2016 Conference Paper

Revisiting Semi-Supervised Learning with Graph Embeddings

  • Zhilin Yang 0001
  • William W. Cohen
  • Ruslan Salakhutdinov

We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

IJCAI Conference 2015 Conference Paper

A Soft Version of Predicate Invention Based on Structured Sparsity

  • William Yang Wang
  • Kathryn Mazaitis
  • William W. Cohen

In predicate invention (PI), new predicates are introduced into a logical theory, usually by rewriting a group of closely-related rules to use a common invented predicate as a “subroutine”. PI is difficult, since a poorly-chosen invented predicate may lead to error cascades. Here we suggest a “soft” version of predicate invention: instead of explicitly creating new predicates, we implicitly group closely-related rules by using structured sparsity to regularize their parameters together. We show that soft PI, unlike hard PI, consistently improves over previous strong baselines for structure-learning on two large-scale tasks.
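As a rough illustration of the kind of regularizer involved, a group-lasso penalty over invented rule groups can be computed as follows (the groups, weights, and regularization strength are made up; this is not the paper's model):

```python
import math

# Invented rule-weight groups: closely-related rules share a group, and
# the penalty sums per-group L2 norms (a group lasso), so whole groups
# of rule weights are shrunk toward zero together, a "soft" analogue of
# routing related rules through one invented predicate.
rule_groups = {
    "parent-of rules": [0.8, -0.3, 0.5],
    "sibling-of rules": [0.0, 0.0],
    "located-in rules": [1.2],
}

def group_lasso(groups, lam=0.1):
    return lam * sum(math.sqrt(sum(w * w for w in ws))
                     for ws in groups.values())

penalty = group_lasso(rule_groups)
```

Unlike a plain L1 penalty on individual weights, the group norm is non-differentiable only when an entire group is zero, which is what encourages group-level sparsity.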

ECAI Conference 2012 Conference Paper

Creating Features from a Learned Grammar in a Simulated Student

  • Nan Li 0001
  • Abraham Schreiber
  • William W. Cohen
  • Kenneth R. Koedinger

Understanding and developing intelligent agents that simulate human learning has been a long-standing goal in both artificial intelligence and cognitive science. Although learning agents are able to produce intelligent behavior with less human knowledge engineering than in the past, intelligent agent developers are still required to manually encode much prior domain knowledge. We recently proposed an efficient algorithm that acquires representations of the world using an unsupervised grammar induction algorithm, and integrated this representation learner into a simulated student, SimStudent. In this paper, we use the representation learner to automatically generate a set of feature predicates based on the acquired representation, and provide the automatically generated feature predicates to SimStudent as prior domain knowledge. We show that with the automatically-generated feature predicates, the learning agent can perform at a level comparable to when it is given manually-constructed feature predicates, but without the effort required to create these feature predicates.

ECAI Conference 2010 Conference Paper

A Very Fast Method for Clustering Big Text Datasets

  • Frank Lin
  • William W. Cohen

Large-scale text datasets have long eluded a family of particularly elegant and effective clustering methods that exploit the power of pair-wise similarities between data points, because of the prohibitive cost, in both time and space, of operating on a similarity matrix; the state of the art is at best quadratic in time and space. We present an extremely fast and simple method that also uses the power of all pair-wise similarities between data points, and show through experiments that it matches previous methods in clustering accuracy, and does so in linear time and space, without sampling data points or sparsifying the similarity matrix.

IJCAI Conference 2005 Conference Paper

Learning to Understand Web Site Update Requests

  • William W. Cohen
  • Einat Minkov
  • Anthony Tomasic

Although Natural Language Processing (NLP) for requests for information has been well studied, there has been little prior work on understanding requests to update information. In this paper, we propose an intelligent system that can process natural-language website update requests semi-automatically. In particular, this system can analyze requests, posted via email, to update the factual content of individual tuples in a database-backed website. Users’ messages are processed using a scheme that decomposes their requests into a sequence of entity recognition and text classification tasks. Using a corpus generated by human-subject experiments, we experimentally evaluate the performance of this system, as well as its robustness in handling request types and user-specific language styles not seen in training.

IJCAI Conference 2005 Conference Paper

Stacked Sequential Learning

  • William W. Cohen
  • Vitor R. Carvalho

We describe a new sequential learning scheme called “stacked sequential learning”. Stacked sequential learning is a meta-learning algorithm, in which an arbitrary base learner is augmented so as to make it aware of the labels of nearby examples. We evaluate the method on several “sequential partitioning problems”, which are characterized by long runs of identical labels. We demonstrate that on these problems, sequential stacking consistently improves the performance of non-sequential base learners; that sequential stacking often improves performance of learners (such as CRFs) that are designed specifically for sequential tasks; and that a sequentially stacked maximum-entropy learner generally outperforms CRFs.
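The core augmentation step, extending each example's features with a base learner's predicted labels for nearby sequence positions, can be sketched as follows; the helper name, toy data, and padding value are ours, not the paper's:

```python
def stack_features(x_seq, yhat_seq, window=1):
    """Extend each example's features with the base learner's predicted
    labels for the `window` positions on either side of it."""
    n = len(x_seq)
    out = []
    for i in range(n):
        ctx = [yhat_seq[j] if 0 <= j < n else -1   # -1 pads sequence ends
               for j in range(i - window, i + window + 1) if j != i]
        out.append(list(x_seq[i]) + ctx)
    return out

# Toy sequence of 5 positions with 2 features each; yhat_seq stands in
# for cross-validated predictions from an arbitrary base learner.
x_seq = [[0.1, 1], [0.2, 0], [0.9, 1], [0.8, 1], [0.7, 0]]
yhat_seq = [0, 0, 1, 1, 1]
x_stacked = stack_features(x_seq, yhat_seq, window=1)
# A second-stage learner is then trained on x_stacked instead of x_seq.
```

The neighbor predictions give the second-stage learner a view of label runs, which is why the scheme helps on sequential partitioning problems with long runs of identical labels.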

AAAI Conference 1999 Conference Paper

A Simple, Fast, and Effective Rule Learner

  • William W. Cohen
  • Yoram Singer
  • AT&T Labs - Research

We describe SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy rule-builder. Like the rulesets built by other rule learners, the ensemble of rules created by SLIPPER is compact and comprehensible. This is made possible by imposing appropriate constraints on the rule-builder, and by use of a recently-proposed generalization of AdaBoost called confidence-rated boosting. In spite of its relative simplicity, SLIPPER is highly scalable and an effective learner. Experimentally, SLIPPER scales no worse than O(n log n), where n is the number of examples, and on a set of 32 benchmark problems, SLIPPER achieves lower error rates than RIPPER 20 times, and lower error rates than C4.5rules 22 times.
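As a generic illustration of confidence-rated boosting of a single rule (not SLIPPER itself; the examples, smoothing constant, and confidence formula below are AdaBoost-style assumptions):

```python
import math

# One round of confidence-rated boosting for a single candidate rule.
# Each example is (coverage status, label in {+1, -1}); all values are
# invented stand-ins, not SLIPPER's exact internals.
examples = [("covered", +1), ("covered", +1), ("covered", -1),
            ("uncovered", +1), ("uncovered", -1)]
weights = [1.0] * len(examples)

# Smoothed confidence for the rule, from the weight it covers correctly
# (W_plus) versus incorrectly (W_minus).
W_plus = sum(w for (cov, y), w in zip(examples, weights)
             if cov == "covered" and y > 0)
W_minus = sum(w for (cov, y), w in zip(examples, weights)
              if cov == "covered" and y < 0)
eps = 0.5
c = 0.5 * math.log((W_plus + eps) / (W_minus + eps))

# Reweight covered examples by exp(-y * c): correctly classified ones
# lose weight, mistakes gain weight, uncovered examples are untouched.
weights = [w * math.exp(-y * c) if cov == "covered" else w
           for (cov, y), w in zip(examples, weights)]
```

Repeating this round with a fresh greedy rule each time yields an ensemble whose rules each carry a real-valued confidence rather than a hard vote.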

AAAI Conference 1999 Conference Paper

Recognizing Structure in Web Pages Using Similarity Queries

  • William W. Cohen
  • AT&T Labs - Research

We present general-purpose methods for recognizing certain types of structure in HTML documents. The methods are implemented using WHIRL, a “soft” logic that incorporates a notion of textual similarity developed in the information retrieval community. In an experimental evaluation on 82 Web pages, the structure ranked first by our method is “meaningful” (i.e., a structure that was used in a hand-coded “wrapper”, or extraction program, for the page) nearly 70% of the time. This improves on a value of 50% obtained by an earlier method. With appropriate background information, the structure-recognition methods we describe can also be used to learn a wrapper from examples, or to maintain a wrapper as a Web page changes format. In these settings, the top-ranked structure is meaningful nearly 85% of the time.

AAAI Conference 1997 Conference Paper

Transferring and Retraining Learned Information Filters

  • William W. Cohen

Any system that learns how to filter documents will suffer poor performance during an initial training phase. One way of addressing this problem is to exploit filters learned by other users in a collaborative fashion. We investigate “direct transfer” of learned filters in this setting, a limiting case for any collaborative learning system. We evaluate the stability of several different learning methods under direct transfer, and conclude that symbolic learning methods that use negatively correlated features of the data perform poorly in transfer, even when they perform well in more conventional evaluation settings. This effect is robust: it holds for several learning methods, when a diverse set of users is used in training the classifier, and even when the learned classifiers can be adapted to the new user’s distribution. Our experiments give rise to several concrete proposals for improving generalization performance in a collaborative setting, including a beneficial variation on a feature selection method that has been widely used in text categorization.

AAAI Conference 1996 Conference Paper

Learning Trees and Rules with Set-Valued Features

  • William W. Cohen

In most learning systems examples are represented as fixed-length “feature vectors”, the components of which are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white and black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis-familiaris and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum’s “infinite attribute” representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and problems that arise in propositionalizing first-order representations lend themselves to set-valued features.
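One common way a tree or rule learner can consume such examples is to flatten set-valued features into binary membership tests; a minimal sketch using the abstract's dog example (the helper name and test-string format are ours):

```python
# An example in the abstract's spirit: two nominal features and one
# set-valued feature.
example = {
    "size": "small",
    "species": "canis-familiaris",
    "color": {"white", "black"},
}

def binarize(ex):
    # Nominal features become feature=value tests; set-valued features
    # become one membership test per observed element, so no bound on
    # the number of possible set elements is needed up front.
    feats = set()
    for name, val in ex.items():
        if isinstance(val, (set, frozenset)):
            feats.update(f"{name} contains {v}" for v in val)
        else:
            feats.add(f"{name}={val}")
    return feats

tests = binarize(example)
```

Because tests are generated only for elements actually observed, the feature space grows with the data rather than being fixed in advance, mirroring the “infinite attribute” view.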

AAAI Conference 1994 Conference Paper

Pac-Learning Nondeterminate Clauses

  • William W. Cohen

Several practical inductive logic programming systems efficiently learn “determinate” clauses of constant depth. Recently it has been shown that while nonrecursive constant-depth determinate clauses are pac-learnable, most of the obvious syntactic generalizations of this language are not pac-learnable. In this paper we introduce a new restriction on logic programs called “locality”, and present two formal results. First, the language of nonrecursive clauses of constant locality is pac-learnable. Second, the language of nonrecursive clauses of constant locality is strictly more expressive than the language of nonrecursive determinate clauses of constant depth. Hence, constant-locality clauses are a pac-learnable generalization of constant-depth determinate clauses.

AAAI Conference 1994 Conference Paper

Recovering Software Specifications with Inductive Logic Programming

  • William W. Cohen

We consider using machine learning techniques to help understand a large software system. In particular, we describe how learning techniques can be used to reconstruct abstract Datalog specifications of a certain type of database software from examples of its operation. In a case study involving a large (more than one million lines of C) real-world software system, we demonstrate that off-the-shelf inductive logic programming methods can be successfully used for specification recovery; specifically, Grendel2 can extract specifications for about one-third of the modules in a test suite with high rates of precision and recall. We then describe two extensions to Grendel2 which improve performance on this task: one which allows it to output a set of candidate hypotheses, and another which allows it to output specifications containing determinations. In combination, these extensions enable specifications to be extracted for nearly two-thirds of the benchmark modules with perfect recall and precision of better than 60%.

AAAI Conference 1993 Conference Paper

Cryptographic Limitations on Learning One-Clause Logic Programs

  • William W. Cohen

An active area of research in machine learning is learning logic programs from examples. This paper investigates formally the problem of learning a single Horn clause: we focus on generalizations of the language of constant-depth determinate clauses, which is used by several practical learning systems. We show first that determinate clauses of logarithmic depth are not learnable. Next we show that learning indeterminate clauses with at most k indeterminate variables is equivalent to learning DNF. Finally, we show that recursive constant-depth determinate clauses are not learnable. Our primary technical tool is the method of prediction-preserving reducibilities introduced by Pitt and Warmuth [1990]; as a consequence our results are independent of the representations used by the learning system.

AAAI Conference 1993 Conference Paper

Pac-Learning a Restricted Class of Recursive Logic Programs

  • William W. Cohen

A crucial problem in “inductive logic programming” is learning recursive logic programs from examples alone; current systems such as GOLEM and FOIL often achieve success only for carefully selected sets of examples. We describe a program called FORCE2 that uses the new technique of “forced simulation” to learn two-clause “closed” linear recursive ij-determinate programs; although this class of programs is fairly restricted, it does include most of the standard benchmark problems. Experimentally, FORCE2 requires fewer examples than FOIL, and is more accurate when learning from randomly chosen datasets. Formally, FORCE2 is also shown to be a pac-learning algorithm in a variant of Valiant’s [1984] model, in which we assume the ability to make two types of queries: one which gives an upper bound on the depth of the proof for an example, and one which determines if an example can be proved in unit depth.

AAAI Conference 1992 Conference Paper

Computing Least Common Subsumers in Description Logics

  • William W. Cohen

Description logics are a popular formalism for knowledge representation and reasoning. This paper introduces a new operation for description logics: computing the “least common subsumer” of a pair of descriptions. This operation computes the largest set of commonalities between two descriptions. After arguing for the usefulness of this operation, we analyze it by relating computation of the least common subsumer to the well-understood problem of testing subsumption; a close connection is shown in the restricted case of “structural subsumption”. We also present a method for computing the least common subsumer of “attribute chain equalities”, and analyze the tractability of computing the least common subsumer of a set of descriptions, an important operation in inductive learning.
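For a purely conjunctive toy language, where a description is a set of atomic constraints and a more general description imposes a subset of them, the least common subsumer is simply the intersection. A simplified sketch (the constraint strings are invented, and the language is far simpler than a real description logic):

```python
# A description is modeled as a frozenset of atomic constraints.
d1 = frozenset({"Person", "hasChild.Doctor", "age>=30"})
d2 = frozenset({"Person", "hasChild.Doctor", "employed"})

def subsumes(general, specific):
    # More general descriptions impose a subset of the constraints.
    return general <= specific

def lcs(*descriptions):
    # The intersection of constraint sets subsumes every input and is
    # the most specific description in this toy language that does so.
    out = set(descriptions[0])
    for d in descriptions[1:]:
        out &= d
    return frozenset(out)

common = lcs(d1, d2)
```

The interesting cases in the paper arise precisely where this set-intersection picture breaks down, e.g. for attribute chain equalities, where computing commonalities needs more than intersecting syntactic conjuncts.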

AAAI Conference 1990 Conference Paper

Learning from Textbook Knowledge: A Case Study

  • William W. Cohen

One of the “grand challenges for machine learning” is the problem of learning from textbooks. This paper addresses the problem of learning from texts including omissions and inconsistencies that are clarified by illustrative examples. To avoid problems in natural language understanding, we consider a simplification of this problem in which the text has been manually translated into a logical theory. This learning problem is solvable by a technique that we call analogical abductive explanation based learning (ANA-EBL). Formal evidence and experimental results in the domain of contract bridge show that the learning technique is both efficient and effective.