Arrow Research search

Author name cluster

Dan Roth

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

59 papers
1 author row

Possible papers (59)

TMLR Journal 2026 Journal Article

On Calibration of Multilingual Question Answering LLMs

  • Yahan Yang
  • Soham Dan
  • Dan Roth
  • Insup Lee

Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known about how well their confidences are calibrated. In this paper, we comprehensively benchmark the calibration of several multilingual LLMs (MLLMs) on a variety of QA tasks. We perform extensive experiments, spanning encoder-only, encoder-decoder, and decoder-only QA models (ranging from 110M to 7B parameters) and diverse languages, including both high- and low-resource ones. We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings, and investigate strategies to improve it, including post-hoc methods and regularized fine-tuning. For decoder-only LLMs such as LLaMA2, we additionally find that in-context learning improves confidence calibration on multilingual data. We also conduct several ablation experiments to study the effects of language distance, language corpus size, and model size on calibration, and how multilingual models compare with their monolingual counterparts for diverse tasks and languages. Our experiments suggest that multilingual QA models are poorly calibrated for languages other than English, and that incorporating a small set of cheaply translated multilingual samples during fine-tuning/calibration effectively enhances calibration performance.
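
As a point of reference for what such a calibration benchmark measures, here is a minimal sketch of expected calibration error (ECE), a standard metric in this space; the equal-width binning scheme and the toy inputs are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the per-bin gap
    between accuracy and mean confidence, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Overconfident toy predictions: high confidence, mixed correctness.
print(expected_calibration_error([0.95, 0.9, 0.85, 0.9], [1, 0, 1, 0]))
```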

NeurIPS Conference 2025 Conference Paper

Imbalances in Neurosymbolic Learning: Characterization and Mitigating Strategies

  • Efthymia Tsamoura
  • Kaifu Wang
  • Dan Roth

We study one of the most popular problems in **neurosymbolic learning** (NSL), that of learning neural classifiers given only the result of applying a symbolic component $\sigma$ to the gold labels of the elements of a vector $\mathbf x$. The gold labels of the elements in $\mathbf x$ are unknown to the learner. We make multiple contributions, theoretical and practical, to address a problem that has not been studied so far in this context, that of characterizing and mitigating *learning imbalances*, i.e., major differences in the errors that occur when classifying instances of different classes (aka **class-specific risks**). Our theoretical analysis reveals a unique phenomenon: $\sigma$ can greatly impact learning imbalances. This result sharply contrasts with previous research on supervised and weakly supervised learning, which studies learning imbalances only under data imbalances. On the practical side, we introduce a technique for estimating the marginal of the hidden gold labels using weakly supervised data. Then, we introduce algorithms that mitigate imbalances at training and testing time by treating the marginal of the hidden labels as a constraint. We demonstrate the effectiveness of our techniques using strong baselines from NSL and long-tailed learning, with performance improvements of up to 14%.
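
One simple way to see how an estimated label marginal can act as a test-time constraint is additive logit adjustment, sketched below. This generic recipe (including the function name and the temperature $\tau$) is an assumption for illustration, not the paper's algorithm.

```python
import numpy as np

def marginal_adjusted_argmax(scores, estimated_marginal, tau=1.0):
    """Shift each class score by tau * log(estimated marginal) so that
    predictions are nudged toward the estimated hidden-label marginal."""
    adjusted = scores + tau * np.log(np.asarray(estimated_marginal))
    return adjusted.argmax(axis=-1)

scores = np.array([[2.0, 1.9, 0.1],
                   [0.2, 0.1, 0.3]])
print(marginal_adjusted_argmax(scores, estimated_marginal=[0.1, 0.6, 0.3]))
```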

NeurIPS Conference 2024 Conference Paper

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

  • Tianyue Ou
  • Frank F. Xu
  • Aman Madaan
  • Jiarui Liu
  • Robert Lo
  • Abishek Sridhar
  • Sudipta Sengupta
  • Dan Roth

LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. Obtaining supervised data from humans is costly, and automatic data collection through exploration or reinforcement learning relies on complex environment and content setup, resulting in datasets that lack comprehensive coverage of various scenarios. On the other hand, there is abundant knowledge that may indirectly assist task completion, such as online tutorials that were created for human consumption. In this work, we present Synatra, an approach that effectively transforms this indirect knowledge into direct supervision at scale. We define different types of indirect knowledge, and carefully study the available sources to obtain it, methods to encode the structure of direct demonstrations, and finally methods to transform indirect knowledge into direct demonstrations. We use 100k such synthetically created demonstrations to finetune a 7B CodeLlama model, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks: Mind2Web, MiniWoB++, and WebArena. It also surpasses GPT-3.5 on WebArena and Mind2Web. In addition, while synthetic demonstrations cost only 3% as much as human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains.

NeurIPS Conference 2024 Conference Paper

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

  • Yushi Hu
  • Weijia Shi
  • Xingyu Fu
  • Dan Roth
  • Mari Ostendorf
  • Luke Zettlemoyer
  • Noah A. Smith
  • Ranjay Krishna

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn. Different from prior work, which uses text-to-image models to enable LMs to draw, Sketchpad enables LMs to draw with lines, boxes, marks, etc., which is closer to human sketching and better facilitates reasoning. Sketchpad can also use specialist vision models during the sketching process (e.g., drawing bounding boxes with object detection models, or masks with segmentation models) to further enhance visual perception and reasoning. We experiment on a wide range of math tasks (including geometry, functions, graphs, and chess) and complex visual reasoning tasks. Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). We will release all code and data.

TMLR Journal 2023 Journal Article

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

  • Aarohi Srivastava
  • Abhinav Rastogi
  • Abhishek Rao
  • Abu Awal Md Shoeb
  • Abubakar Abid
  • Adam Fisch
  • Adam R. Brown
  • Adam Santoro

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

NeurIPS Conference 2023 Conference Paper

CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

  • Yangruibo Ding
  • Zijian Wang
  • Wasi Ahmad
  • Hantian Ding
  • Ming Tan
  • Nihal Jain
  • Murali Krishna Ramanathan
  • Ramesh Nallapati

Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This over-simplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing and understanding cross-file context is often required to complete the code correctly. To fill this gap, we propose CrossCodeEval, a diverse and multilingual code completion benchmark that necessitates an in-depth cross-file contextual understanding to complete the code accurately. CrossCodeEval is built on a diverse set of real-world, open-sourced, permissively-licensed repositories in four popular programming languages: Python, Java, TypeScript, and C#. To create examples that strictly require cross-file context for accurate completion, we propose a straightforward yet efficient static-analysis-based approach to pinpoint the use of cross-file context within the current file. Extensive experiments on state-of-the-art code language models like CodeGen and StarCoder demonstrate that CrossCodeEval is extremely challenging when the relevant cross-file context is absent, and we see clear improvements when adding this context into the prompt. However, despite such improvements, the pinnacle of performance remains notably unattained even with the highest-performing model, indicating that CrossCodeEval is also capable of assessing a model's capability in leveraging extensive context to make better code completions. Finally, we benchmark various methods for retrieving cross-file context, and show that CrossCodeEval can also be used to measure the capability of code retrievers.
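
To make the evaluation setting concrete, the sketch below shows one generic way to prepend retrieved cross-file context to a completion prompt, in the spirit of "adding this context into the prompt" above; the comment-based format is an assumed convention, not CrossCodeEval's exact template.

```python
def build_completion_prompt(cross_file_snippets, in_file_prefix, comment="#"):
    """Render retrieved cross-file snippets as comment lines, followed by
    the in-file code the model must complete."""
    context_lines = [
        f"{comment} {line}"
        for snippet in cross_file_snippets
        for line in snippet.splitlines()
    ]
    return "\n".join(context_lines + [in_file_prefix])

print(build_completion_prompt(
    ["def helper(x):", "    return x + 1"],
    "result = helper(",
))
```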

AAAI Conference 2023 Conference Paper

GLUECons: A Generic Benchmark for Learning under Constraints

  • Hossein Rajaby Faghihi
  • Aliakbar Nafar
  • Chen Zheng
  • Roshanak Mirzaee
  • Yue Zhang
  • Andrzej Uszok
  • Alexander Wan
  • Tanawan Premsri

Recent research has shown that integrating domain knowledge into deep learning architectures is effective: it helps reduce the amount of required data, improves the accuracy of the models' decisions, and improves the interpretability of models. However, the research community lacks a unified benchmark for systematically evaluating knowledge integration methods. In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision. In all cases, we model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints. We report the results of these models using a new set of extended evaluation criteria in addition to the task performances, allowing for a more in-depth analysis. This effort provides a framework for a more comprehensive and systematic comparison of constraint integration techniques and for identifying related research challenges. It will facilitate further research for alleviating some problems of state-of-the-art neural models.

NeurIPS Conference 2023 Conference Paper

On Learning Latent Models with Multi-Instance Weak Supervision

  • Kaifu Wang
  • Efthymia Tsamoura
  • Dan Roth

We consider a weakly supervised learning scenario where the supervision signal is generated by a transition function $\sigma$ of labels associated with multiple input instances. We formulate this problem as *multi-instance Partial Label Learning (multi-instance PLL)*, which is an extension of the standard PLL problem. Our problem arises in different fields, including latent structural learning and neuro-symbolic integration. Despite the existence of many learning techniques, limited theoretical analysis has been dedicated to this problem. In this paper, we provide the first theoretical study of multi-instance PLL with a possibly unknown transition $\sigma$. Our main contributions are as follows. First, we propose a necessary and sufficient condition for the learnability of the problem. This condition nontrivially generalizes and relaxes the existing *small ambiguity degree* condition in the PLL literature, since we allow the transition to be deterministic. Second, we derive Rademacher-style error bounds based on the top-$k$ surrogate loss that is widely used in the neuro-symbolic literature. We conclude with empirical experiments for learning with an unknown transition. The empirical results align with our theoretical findings; however, they also expose the issue of scalability in the weak supervision literature.
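
For orientation, partial-label-style problems are often analyzed with a minimal-loss surrogate over the label set left possible by the weak signal. The single-instance form below is a standard textbook sketch, stated here only for intuition; the paper's top-$k$ surrogate and its bounds are defined precisely in the paper itself.

$$\hat{R}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n}\; \min_{y \in S_i} \ell\big(f(x_i), y\big), \qquad S_i \;=\; \{\, y \;:\; \sigma(y) = s_i \,\},$$

where $s_i$ is the observed supervision signal and $S_i$ is the set of labels it does not rule out.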

NeurIPS Conference 2021 Conference Paper

SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables

  • Nafise Moosavi
  • Andreas Rücklé
  • Dan Roth
  • Iryna Gurevych

We introduce SciGen, a new challenge dataset consisting of tables from scientific articles and their corresponding descriptions, for the task of reasoning-aware data-to-text generation. Describing scientific tables goes beyond the surface realization of the table content and requires reasoning over table values. The unique properties of SciGen are that (1) tables mostly contain numerical values, and (2) the corresponding descriptions require arithmetic reasoning. SciGen is the first dataset that assesses the arithmetic reasoning capabilities of generation models on complex input structures, such as tables from scientific articles, and thus it opens new avenues for future research in reasoning-aware text generation and evaluation. The core part of SciGen, including the test data, is annotated by one of the authors of the corresponding articles. Such expert annotations do not scale to large training data sizes. To tackle this, we propose a pipeline for automatically extracting high-quality table-description pairs from the LaTeX sources of scientific articles. We study the effectiveness of state-of-the-art data-to-text generation models on SciGen and evaluate the results using common metrics and human evaluation. Our results and analyses show that adding high-quality unsupervised training data improves correctness and reduces hallucination in generated descriptions; however, the ability of state-of-the-art models is still severely limited on this task.

AAAI Conference 2021 Conference Paper

Visual Pivoting for (Unsupervised) Entity Alignment

  • Fangyu Liu
  • Muhao Chen
  • Dan Roth
  • Nigel Collier

This work studies the use of visual semantic representations to align entities in heterogeneous knowledge graphs (KGs). Images are natural components of many existing KGs. By combining visual knowledge with other auxiliary information, we show that the proposed new approach, EVA, creates a holistic entity representation that provides strong signals for cross-graph entity alignment. Moreover, previous entity alignment methods require human-labelled seed alignments, restricting availability. EVA provides a completely unsupervised solution by leveraging the visual similarity of entities to create an initial seed dictionary (visual pivots). Experiments on the benchmark datasets DBP15k and DWY15k show that EVA offers state-of-the-art performance on both monolingual and cross-lingual entity alignment tasks. Furthermore, we discover that images are particularly useful for aligning long-tail KG entities, which inherently lack the structural contexts necessary for capturing the correspondences. Code release: https://github.com/cambridgeltl/eva; project page: http://cogcomp.org/page/publication_view/927.

NeurIPS Conference 2020 Conference Paper

Learnability with Indirect Supervision Signals

  • Kaifu Wang
  • Qiang Ning
  • Dan Roth

Learning from indirect supervision signals is important in real-world AI applications when, often, gold labels are missing or too costly. In this paper, we develop a unified theoretical framework for multi-class classification when the supervision is provided by a variable that contains nonzero mutual information with the gold label. The nature of this problem is determined by (i) the transition probability from the gold labels to the indirect supervision variables and (ii) the learner's prior knowledge about the transition. Our framework relaxes assumptions made in the literature, and supports learning with unknown, non-invertible and instance-dependent transitions. Our theory introduces a novel concept called \emph{separation}, which characterizes the learnability and generalization bounds. We also demonstrate the application of our framework via concrete novel results in a variety of learning scenarios such as learning with superset annotations and joint supervision signals.

AAAI Conference 2020 Conference Paper

Robust Named Entity Recognition with Truecasing Pretraining

  • Stephen Mayhew
  • Nitish Gupta
  • Dan Roth

Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state-of-the-art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the problem of robustness of NER systems in data with noisy or uncertain casing, using a pretraining objective that predicts casing in text, or a truecaser, leveraging unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance in uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.
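
A minimal sketch of the kind of pretraining target a truecaser uses: predict, per character, whether it should be uppercase, which can be harvested from unlabeled text for free. The function below is an illustrative reduction of the idea, not the paper's model.

```python
def casing_labels(text):
    """Per-character truecasing target: 1 if the character is uppercase
    in well-edited text, else 0. Lowercasing `text` and then predicting
    these labels recovers the original casing."""
    return [int(c.isupper()) for c in text]

print(casing_labels("Maria visited Chicago"))
# -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```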

IJCAI Conference 2020 Conference Paper

TransOMCS: From Linguistic Graphs to Commonsense Knowledge

  • Hongming Zhang
  • Daniel Khashabi
  • Yangqiu Song
  • Dan Roth

Commonsense knowledge acquisition is a key problem for artificial intelligence. Conventional methods of acquiring commonsense knowledge generally require laborious and costly human annotations, which are not feasible on a large scale. In this paper, we explore a practical way of mining commonsense knowledge from linguistic graphs, with the goal of transferring cheap knowledge obtained with linguistic patterns into expensive commonsense knowledge. The result is a conversion of ASER [Zhang et al., 2020], a large-scale selectional preference knowledge resource, into TransOMCS, of the same representation as ConceptNet [Liu and Singh, 2004] but two orders of magnitude larger. Experimental results demonstrate the transferability of linguistic knowledge to commonsense knowledge and the effectiveness of the proposed approach in terms of quantity, novelty, and quality. TransOMCS is publicly available at: https://github.com/HKUST-KnowComp/TransOMCS.

IJCAI Conference 2019 Conference Paper

Learning and Inference for Structured Prediction: A Unifying Perspective

  • Aryan Deshwal
  • Janardhan Rao Doppa
  • Dan Roth

In a structured prediction problem, one needs to learn a predictor that, given a structured input, produces a structured object, such as a sequence, tree, or clustering output. Prototypical structured prediction tasks include part-of-speech tagging (predicting the POS tag sequence for an input sentence) and semantic segmentation of images (predicting semantic labels for the pixels of an input image). Unlike simple classification problems, here there is a need to assign values to multiple output variables while accounting for the dependencies between them. Consequently, the prediction step itself (aka "inference" or "decoding") is computationally expensive, and so is the learning process, which typically requires making predictions as part of it. The key learning and inference challenge stems from the exponential size of the structured output space and depends on its complexity. In this paper, we present a unifying perspective of the different frameworks that address structured prediction problems and compare them in terms of their strengths and weaknesses. We also discuss important research directions, including the integration of deep learning advances into structured prediction, and learning from weakly supervised signals and active querying to overcome the challenges of building structured predictors from small amounts of labeled data.

IJCAI Conference 2019 Conference Paper

Randomized Greedy Search for Structured Prediction: Amortized Inference and Learning

  • Chao Ma
  • F A Rezaur Rahman Chowdhury
  • Aryan Deshwal
  • Md Rakibul Islam
  • Janardhan Rao Doppa
  • Dan Roth

In a structured prediction problem, we need to learn a predictor that can produce a structured output given a structured input (e.g., part-of-speech tagging). The key learning and inference challenge is due to the exponential size of the structured output space. This paper makes four contributions towards the goal of a computationally-efficient inference and training approach for structured prediction that allows one to employ complex models and to optimize for non-decomposable loss functions. First, we define a simple class of randomized greedy search (RGS) based inference procedures that leverage classification algorithms for simple outputs. Second, we develop an RGS-specific learning approach for amortized inference that can quickly produce high-quality outputs for a given set of structured inputs. Third, we plug our amortized RGS inference solver inside the inner loop of parameter-learning algorithms (e.g., structured SVM) to improve the speed of training. Fourth, we perform extensive experiments on diverse structured prediction tasks. Results show that our proposed approach is competitive with or better than many state-of-the-art approaches in spite of its simplicity.
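
A minimal sketch of the randomized-greedy-search idea the paper builds on: restart from random outputs and repeatedly apply the best single-position change. The restart count, scoring interface, and toy objective are illustrative assumptions; the paper's learned variants guide the search with classifiers rather than pure random restarts.

```python
import random

def rgs(score, labels, length, restarts=5, seed=0):
    """Randomized greedy search: from each random start, keep applying
    single-position label changes while they improve the score."""
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(restarts):
        y = [rng.choice(labels) for _ in range(length)]
        improved = True
        while improved:
            improved = False
            for i in range(length):
                for lab in labels:
                    cand = y[:i] + [lab] + y[i + 1:]
                    if score(cand) > score(y):
                        y, improved = cand, True
        if score(y) > best_val:
            best, best_val = y, score(y)
    return best

# Toy objective: prefer alternating tags.
print(rgs(lambda y: sum(a != b for a, b in zip(y, y[1:])), ["N", "V"], 5))
```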

AAAI Conference 2018 Conference Paper

Learning Better Name Translation for Cross-Lingual Wikification

  • Chen-Tse Tsai
  • Dan Roth

A notable challenge in cross-lingual wikification is the problem of retrieving English Wikipedia title candidates given a non-English mention, a step that requires translating names written in a foreign language into English. Creating training data for name translation requires a significant amount of human effort. In order to cover as many languages as possible, we propose a probabilistic model that leverages indirect supervision signals in a knowledge base. More specifically, the model learns name translation from title pairs obtained from the inter-language links in Wikipedia. The model jointly considers word alignment and word transliteration. Compared to 6 other approaches on 9 languages, we show that the proposed model outperforms the others not only on the transliteration metric, but also on the ability to generate target English titles for a cross-lingual wikifier. Consequently, as we show, it improves the end-to-end performance of a cross-lingual wikifier on the TAC 2016 EDL dataset.

NeurIPS Conference 2018 Conference Paper

Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems

  • Mrinmaya Sachan
  • Kumar Avinava Dubey
  • Tom Mitchell
  • Dan Roth
  • Eric Xing

As machine learning becomes more widely used in practice, we need new methods to build complex intelligent systems that integrate learning with existing software, and with domain knowledge encoded as rules. As a case study, we present such a system that learns to parse Newtonian physics problems in textbooks. This system, Nuts&Bolts, learns a pipeline process that incorporates existing code, pre-learned machine learning models, and human engineered rules. It jointly trains the entire pipeline to prevent propagation of errors, using a combination of labelled and unlabelled data. Our approach achieves a good performance on the parsing task, outperforming the simple pipeline and its variants. Finally, we also show how Nuts&Bolts can be used to achieve improvements on a relation extraction task and on the end task of answering Newtonian physics problems.

AAAI Conference 2018 Conference Paper

Question Answering as Global Reasoning Over Semantic Abstractions

  • Daniel Khashabi
  • Tushar Khot
  • Ashish Sabharwal
  • Dan Roth

We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad-coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%.

IJCAI Conference 2018 Conference Paper

Systems AI: A Declarative Learning Based Programming Perspective

  • Parisa Kordjamshidi
  • Dan Roth
  • Kristian Kersting

Data-driven approaches are becoming dominant problem-solving techniques in many areas of research and industry. Unfortunately, current technologies do not make such techniques easy to use, either for application experts who are not fluent in machine learning or for machine learning experts who aim to test ideas on real-world data and need to evaluate them as part of an end-to-end system. We review key efforts made by various AI communities to provide languages for high-level abstractions over the learning and reasoning techniques needed for designing complex AI systems. We classify the existing frameworks based on the type of techniques as well as the data and knowledge representations they use, provide a comparative study of the way they address the challenges of programming real-world applications, and highlight some shortcomings and future directions.

AAAI Conference 2017 Conference Paper

Incidental Supervision: Moving beyond Supervised Learning

  • Dan Roth

Machine Learning and Inference methods have become ubiquitous in our attempt to induce more abstract representations of natural language text, visual scenes, and other messy, naturally occurring data, and to support decisions that depend on it. However, learning models for these tasks is difficult partly because generating the necessary supervision signals is costly and does not scale. This paper describes several learning paradigms that are designed to alleviate the supervision bottleneck and illustrates their benefit in the context of multiple problems, all pertaining to inducing various levels of semantic representations from text. In particular, we discuss (i) Response Driven Learning of models, a learning protocol that supports inducing meaning representations simply by observing the model's behavior in its environment, (ii) the exploitation of Incidental Supervision signals that exist in the data, independently of the task at hand, to learn models that identify and classify semantic predicates, and (iii) the use of weak supervision to combine simple models to support global decisions where joint supervision is not available. While these ideas are applicable in a range of Machine Learning driven fields, we demonstrate them in the context of several natural language applications, from (cross-lingual) text classification, to Wikification, to semantic parsing.

AAAI Conference 2017 Conference Paper

Unit Dependency Graph and Its Application to Arithmetic Word Problem Solving

  • Subhro Roy
  • Dan Roth

We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.

IJCAI Conference 2016 Conference Paper

Cross-Lingual Dataless Classification for Many Languages

  • Yangqiu Song
  • Shyam Upadhyay
  • Haoruo Peng
  • Dan Roth

Dataless text classification [Chang et al., 2008] is a classification paradigm which maps documents into a given label space without requiring any annotated training data. This paper explores a cross-lingual variant of this paradigm, where documents in multiple languages are classified into an English label space. We use CLESA (cross-lingual explicit semantic analysis) to embed both foreign-language documents and an English label space into a shared semantic space, and select the best label(s) for a document using the similarity between the corresponding semantic representations. We illustrate our approach by experimenting with classifying documents in 88 different languages into the same English label space. In particular, we show that CLESA is better than using a monolingual ESA on the target foreign language and translating the English labels into that language. Moreover, the evaluation on two benchmarks, TED and RCV2, showed that cross-lingual dataless classification outperforms supervised learning methods when a large collection of annotated documents is not available.

AAAI Conference 2016 Conference Paper

Labeling the Semantic Roles of Commas

  • Naveen Arivazhagan
  • Christos Christodoulopoulos
  • Dan Roth

Commas and the surrounding sentence structure often express relations that are essential to understanding the meaning of the sentence. This paper proposes a set of relations commas participate in, expanding on previous work in this area, and develops a new dataset annotated with this set of labels. We identify features that are important for achieving good performance on comma labeling and then develop a machine learning method that achieves high accuracy on identifying comma relations, improving over previous work. Finally, we discuss a variety of possible uses, both as syntactic and discourse-oriented features and as constraints for downstream tasks.

IJCAI Conference 2016 Conference Paper

Question Answering via Integer Programming over Semi-Structured Knowledge

  • Daniel Khashabi
  • Tushar Khot
  • Ashish Sabharwal
  • Peter Clark
  • Oren Etzioni
  • Dan Roth

Answering science questions posed in natural language is an important AI challenge. Answering such questions often requires non-trivial inference and knowledge that goes beyond factoid retrieval. Yet, most systems for this task are based on relatively shallow Information Retrieval (IR) and statistical correlation techniques operating on large unstructured corpora. We propose a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts. On a dataset of real, unseen science questions, our system significantly outperforms (+14%) the best previous attempt at structured reasoning for this task, which used Markov Logic Networks (MLNs). It also improves upon a previous ILP formulation by 17.7%. When combined with unstructured inference methods, the ILP system significantly boosts overall performance (+10%). Finally, we show our approach is substantially more robust to a simple answer perturbation compared to statistical correlation methods.
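
To give a flavor of the ILP formulation style (choose an answer and a supporting set of facts subject to global constraints), here is a toy model written with the PuLP library; the variables, scores, and constraints are invented for illustration and are far simpler than the system's actual program.

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# Toy alignment scores between answer options and knowledge-base rows.
score = {("A", "r1"): 0.8, ("A", "r2"): 0.1,
         ("B", "r1"): 0.2, ("B", "r2"): 0.7}
answers, rows = ["A", "B"], ["r1", "r2"]

prob = LpProblem("toy_qa_ilp", LpMaximize)
pick = {a: LpVariable(f"pick_{a}", cat=LpBinary) for a in answers}
use = {k: LpVariable(f"use_{k[0]}_{k[1]}", cat=LpBinary) for k in score}

prob += lpSum(score[k] * use[k] for k in score)   # maximize total support
prob += lpSum(pick.values()) == 1                 # choose exactly one answer
for (a, r), v in use.items():
    prob += v <= pick[a]                          # only the chosen answer may use rows

prob.solve()
print("chosen:", [a for a in answers if pick[a].value() == 1])
```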

IJCAI Conference 2015 Conference Paper

Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations

  • Chenguang Wang
  • Yangqiu Song
  • Dan Roth
  • Chi Wang
  • Jiawei Han
  • Heng Ji
  • Ming Zhang

In knowledge bases or information extraction results, differently expressed relations can be semantically similar (e.g., (X, wrote, Y) and (X, 's written work, Y)). Therefore, grouping semantically similar relations into clusters would facilitate and improve many applications, including knowledge base completion, information extraction, information retrieval, and more. This paper formulates relation clustering as a constrained tripartite graph clustering problem, presents an efficient clustering algorithm, and exhibits the advantage of the constrained framework. We introduce several ways to provide side information via must-link and cannot-link constraints to improve the clustering results. Different from traditional semi-supervised learning approaches, we propose to use the similarity of relation expressions and the knowledge of entity types to automatically construct the constraints for the algorithm. We show improved relation clustering results on two datasets extracted from a human-annotated knowledge base (i.e., Freebase) and open information extraction results (i.e., ReVerb data).

IJCAI Conference 2015 Conference Paper

Saul: Towards Declarative Learning Based Programming

  • Parisa Kordjamshidi
  • Dan Roth
  • Hao Wu

We present Saul, a new probabilistic programming language designed to address some of the shortcomings of programming languages that aim at advancing and simplifying the development of AI systems. Such languages need to interact with messy, naturally occurring data, to allow a programmer to specify what needs to be done at an appropriate level of abstraction rather than at the data level, to be developed on a solid theory that supports moving to and reasoning at this level of abstraction and, finally, to support flexible integration of these learning and inference models within an application program. Saul is an object-functional programming language written in Scala that facilitates these by (1) allowing a programmer to learn, name and manipulate named abstractions over relational data; (2) supporting seamless incorporation of trainable (probabilistic or discriminative) components into the program, and (3) providing a level of inference over trainable models to support composition and make decisions that respect domain and application constraints. Saul is developed over a declaratively defined relational data model, can use piecewise learned factor graphs with declaratively specified learning and inference objectives, and it supports inference over probabilistic models augmented with declarative knowledge-based constraints. We describe the key constructs of Saul and exemplify its use in developing applications that require relational feature engineering and structured output prediction.

AAAI Conference 2015 Conference Paper

Structural Learning with Amortized Inference

  • Kai-Wei Chang
  • Shyam Upadhyay
  • Gourab Kundu
  • Dan Roth

Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method, an approximate learning algorithm that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine in only 10%-24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.
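
The simplest special case of the amortization idea is exact reuse: cache solved inference problems and call the expensive solver only on novel ones. The sketch below shows that special case only; AI-DCD itself relies on conditions under which a cached solution remains provably optimal for a new, merely similar objective.

```python
class AmortizedSolver:
    """Wraps an expensive inference solver with a cache keyed by a
    canonical encoding of the problem (exact-reuse amortization only)."""

    def __init__(self, solver):
        self.solver = solver
        self.cache = {}
        self.solver_calls = 0

    def solve(self, problem):
        key = tuple(sorted(problem.items()))  # canonical, hashable encoding
        if key not in self.cache:
            self.solver_calls += 1
            self.cache[key] = self.solver(problem)
        return self.cache[key]

# Toy usage: argmax over scored labels; the repeated query hits the cache.
amo = AmortizedSolver(lambda p: max(p, key=p.get))
print(amo.solve({"per": 0.9, "loc": 0.1}),
      amo.solve({"per": 0.9, "loc": 0.1}),
      amo.solver_calls)  # -> per per 1
```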

AAAI Conference 2014 Conference Paper

On Dataless Hierarchical Text Classification

  • Yangqiu Song
  • Dan Roth

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought-after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.
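
The core mechanism can be stated in a few lines: embed both labels and documents in a shared semantic space and pick the most similar label. In the sketch below the embeddings are toy vectors; the paper obtains them from semantic representations such as ESA, so treat the numbers and names as placeholders.

```python
import numpy as np

def dataless_classify(doc_vec, label_vecs):
    """Return the index of the label whose semantic embedding has the
    highest cosine similarity with the document embedding."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return int(np.argmax([cosine(doc_vec, lv) for lv in label_vecs]))

doc = np.array([0.9, 0.1, 0.2])            # toy "semantic" document vector
labels = [np.array([1.0, 0.0, 0.0]),       # e.g. "sports"
          np.array([0.0, 1.0, 0.0])]       # e.g. "politics"
print(dataless_classify(doc, labels))      # -> 0
```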

IJCAI Conference 2013 Conference Paper

End-to-End Coreference Resolution for Clinical Narratives

  • Prateek Jindal
  • Dan Roth

Coreference resolution is the problem of clustering mentions into entities and is critical for natural language understanding. This paper studies the problem of coreference resolution in the context of the important domain of clinical text. Clinical text is unique because it requires significant use of domain knowledge to support coreference resolution. It also has specific discourse characteristics which impose several constraints on coreference decisions. We present a principled framework to incorporate knowledge-based constraints in the coreference model. We also show that different pronouns behave quite differently, necessitating the development of distinct ways of resolving different pronouns. Our methods result in significant performance improvements and we report the best results on a clinical corpus that has been used in coreference shared tasks. Moreover, for the first time, we report results for end-to-end coreference resolution on this corpus.

IJCAI Conference 2011 Conference Paper

Learning from Natural Instructions

  • Dan Goldwasser
  • Dan Roth

Machine learning is traditionally formalized and researched as the study of learning concepts and decision functions from labeled examples, requiring a representation that encodes information about the domain of the decision function to be learned. We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions, thus allowing the teacher to communicate the relevant domain expertise to the learner without necessarily knowing anything about the internal representations used in the learning process. In this paper we suggest viewing the process of learning a decision function as a natural language lesson interpretation problem, instead of learning from labeled examples. This interpretation of machine learning is motivated by human learning processes, in which the learner is given a lesson describing the target concept directly, and a few instances exemplifying it. We introduce a learning algorithm for the lesson interpretation problem that gets feedback from its performance on the final task, while learning jointly (1) how to interpret the lesson and (2) how to use this interpretation to do well on the final task. This approach alleviates the supervision burden of traditional machine learning by focusing on supplying the learner with only human-level task expertise for learning. We evaluate our approach by applying it to the rules of the Freecell solitaire card game. We show that our learning approach can eventually use natural language instructions to learn the target concept and play the game legally. Furthermore, we show that the learned semantic interpreter also generalizes to previously unseen instructions.

IJCAI Conference 2011 Conference Paper

Making Better Informed Trust Decisions with Generalized Fact-Finding

  • Jeff Pasternack
  • Dan Roth

Information retrieval may suggest a document, and information extraction may tell us what it says, but which information sources do we trust and which assertions do we believe when different authors make conflicting claims? Trust algorithms known as fact-finders attempt to answer these questions, but consider only which source makes which claim, ignoring a wealth of background knowledge and contextual detail such as the uncertainty in the information extraction of claims from documents, attributes of the sources, the degree of similarity among claims, and the degree of certainty expressed by the sources. We introduce a new, generalized fact-finding framework able to incorporate this additional information into the fact-finding process. Experiments using several state-of-the-art fact-finding algorithms demonstrate that generalized fact-finders achieve significantly better performance than their original variants on both semi-synthetic and real-world problems.
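
For context, the basic fact-finding iteration that this framework generalizes alternates between claim belief and source trust; the sketch below implements one classic variant of that loop with made-up data, without any of the paper's generalizations (extraction uncertainty, claim similarity, source attributes).

```python
def basic_fact_finder(claims_by_source, iterations=20):
    """Alternate: a claim's belief is the sum of its sources' trust;
    a source's trust is the mean belief of its claims (normalized)."""
    claims = {c for cs in claims_by_source.values() for c in cs}
    trust = {s: 1.0 for s in claims_by_source}
    belief = {}
    for _ in range(iterations):
        belief = {c: sum(t for s, t in trust.items()
                         if c in claims_by_source[s]) for c in claims}
        top = max(belief.values()) or 1.0
        belief = {c: b / top for c, b in belief.items()}
        trust = {s: sum(belief[c] for c in cs) / len(cs)
                 for s, cs in claims_by_source.items()}
    return belief

# Two sources back claim "X"; one backs the conflicting claim "Y".
print(basic_fact_finder({"s1": {"X"}, "s2": {"X"}, "s3": {"Y"}}))
```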

IJCAI Conference 2011 Conference Paper

Online Latent Structure Training for Language Acquisition

  • Michael Connor
  • Cynthia Fisher
  • Dan Roth

A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Where do children learning their first languages begin in solving this problem? Even assuming children can derive a rough meaning for the sentence from the situation, how do they begin to map this meaning to the structure and the structure to the form of the sentence? In this paper we use feedback from a semantic role labeling (SRL) task to improve the intermediate syntactic representations that feed the SRL. We accomplish this by training an intermediate classifier using signals derived from latent structure optimization techniques. By using a separate classifier to predict internal structure we see benefits due to knowledge embedded in the classifier's feature representation. This extra structure allows the system to begin to learn using weaker, more plausible semantic feedback.

IJCAI Conference 2009 Conference Paper

Unsupervised Rank Aggregation with Domain-Specific Expertise

  • Alexandre Klementiev
  • Dan Roth
  • Kevin Small
  • Ivan Titov

Consider the setting where a panel of judges is repeatedly asked to (partially) rank sets of objects according to given criteria, and assume that the judges’ expertise depends on the objects’ domain. Learning to aggregate their rankings with the goal of producing a better joint ranking is a fundamental problem in many areas of Information Retrieval and Natural Language Processing, amongst others. However, supervised ranking data is generally difficult to obtain, especially if coming from multiple domains. Therefore, we propose a framework for learning to aggregate votes of constituent rankers with domain-specific expertise without supervision. We apply the learning framework to the settings of aggregating full rankings and aggregating top-k lists, demonstrating significant improvements over a domain-agnostic baseline in both cases.

AAAI Conference 2008 Conference Paper

Active Learning for Pipeline Models

  • Dan Roth

For many machine learning solutions to complex applications, there are significant performance advantages to decomposing the overall task into several simpler sequential stages, commonly referred to as a pipeline model. Typically, such scenarios are also characterized by high sample complexity, motivating the study of active learning for these situations. While most active learning research examines single predictions, we extend such work to applications which utilize pipelined predictions. Specifically, we present an adaptive strategy for combining local active learning strategies into one that minimizes the annotation requirements for the overall task. Empirical results for a three-stage entity and relation extraction system demonstrate a significant reduction in supervised data requirements when using the proposed method.

AAAI Conference 2008 Conference Paper

Importance of Semantic Representation: Dataless Classification

  • Ming-Wei Chang
  • Dan Roth

Traditionally, text categorization has been studied as the problem of training a classifier using labeled data. However, people can categorize documents into named categories without any explicit training because we know the meaning of category names. In this paper, we introduce Dataless Classification, a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts. We propose a model for dataless classification and show that the label name alone is often sufficient to induce classifiers. Using Wikipedia as our source of world knowledge, we get 85.29% accuracy on tasks from the 20 Newsgroups dataset and 88.62% accuracy on tasks from a Yahoo! Answers dataset without any labeled or unlabeled data from the datasets. With unlabeled data, we can further improve the results and show quite competitive performance to a supervised learning algorithm that uses 100 labeled examples.

IJCAI Conference 2007 Conference Paper

Maximum Margin Coresets for Active and Noise Tolerant Learning

  • Dav Zimak
  • Sariel Har-Peled
  • Dan Roth

We study the problem of learning large margin halfspaces in various settings using coresets and show that coresets are a widely applicable tool for large margin learning. A large margin coreset is a subset of the input data sufficient for approximating the true maximum margin solution. In this work, we provide a direct algorithm and analysis for constructing large margin coresets. We show various applications including a novel coreset based analysis of large margin active learning and a polynomial time (in the number of input data and the amount of noise) algorithm for agnostic learning in the presence of outlier noise. We also highlight a simple extension to multi-class classification problems and structured output learning.

JMLR Journal 2005 Journal Article

Generalization Bounds for the Area Under the ROC Curve

  • Shivani Agarwal
  • Thore Graepel
  • Ralf Herbrich
  • Sariel Har-Peled
  • Dan Roth

We study generalization properties of the area under the ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for the bipartite ranking problem. The AUC is a different term than the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define the expected accuracy of a ranking function (analogous to the expected error rate of a classification function), and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive both a large deviation bound, which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which serves to bound the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients; these play the same role in our result as do the standard VC-dimension related shatter coefficients (also known as the growth function) in uniform convergence results for the classification error rate. A comparison of our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC shows that the bound provided by our result can be considerably tighter.
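
For reference, the empirical AUC referred to above is the Wilcoxon-Mann-Whitney statistic computed over the $m$ positive and $n$ negative examples of a sample; the standard form (with ties counted as one half) is:

$$\hat{A}(f) \;=\; \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\Big(\mathbf{1}\big[f(x_i^{+}) > f(x_j^{-})\big] + \tfrac{1}{2}\,\mathbf{1}\big[f(x_i^{+}) = f(x_j^{-})\big]\Big).$$

The bounds discussed above control the deviation of this quantity from the ranking function's expected accuracy.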

IJCAI Conference 2005 Conference Paper

Learning and Inference over Constrained Output

  • Vasin Punyakanok
  • Dan Roth
  • Wen-tau Yih
  • Dav Zimak

We study learning structured output in a discriminative framework where values of the output variables are estimated by local classifiers. In this framework, complex dependencies among the output variables are captured by constraints and dictate which global labels can be inferred. We compare two strategies, learning independent classifiers and inference based training, by observing their behaviors in different conditions. Experiments and theoretical justification lead to the conclusion that using inference based learning is superior when the local classifiers are difficult to learn but may require many examples before any discernible difference can be observed.

IJCAI Conference 2005 Conference Paper

The Necessity of Syntactic Parsing for Semantic Role Labeling

  • Vasin Punyakanok
  • Dan Roth
  • Wen-tau Yih

We provide an experimental study of the role of syntactic parsing in semantic role labeling. Our conclusions demonstrate that syntactic parse information is clearly most relevant in the very first stage – the pruning stage. In addition, the quality of the pruning stage cannot be determined solely based on its recall and precision. Instead it depends on the characteristics of the output candidates that make downstream problems easier or harder. Motivated by this observation, we suggest an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves the performance.

NeurIPS Conference 2004 Conference Paper

A Large Deviation Bound for the Area Under the ROC Curve

  • Shivani Agarwal
  • Thore Graepel
  • Ralf Herbrich
  • Dan Roth

The area under the ROC curve (AUC) has been advocated as an evaluation criterion for the bipartite ranking problem. We study large deviation properties of the AUC; in particular, we derive a distribution-free large deviation bound for the AUC which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on an independent test sequence. A comparison of our result with a corresponding large deviation result for the classification error rate suggests that the test sample size required to obtain an ε-accurate estimate of the expected accuracy of a ranking function with δ-confidence is larger than that required to obtain an ε-accurate estimate of the expected error rate of a classification function with the same confidence. A simple application of the union bound allows the large deviation bound to be extended to learned ranking functions chosen from finite function classes.

NeurIPS Conference 2002 Conference Paper

Constraint Classification for Multiclass Classification and Ranking

  • Sariel Har-Peled
  • Dan Roth
  • Dav Zimak

The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
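
The "single linear classifier in high dimension" reduction can be illustrated with a Kesler-style construction: each multiclass example becomes one binary constraint per competing class, embedded in $\mathbb{R}^{kd}$. The sketch below is a generic version of this classical construction, with names chosen for illustration.

```python
import numpy as np

def kesler_constraints(x, y, num_classes):
    """For example (x, y), emit one vector per competing class y' != y;
    a linear separator w with w @ z > 0 for every such z scores class y
    above y' (weights for class c live in coordinates [c*d, (c+1)*d))."""
    d = len(x)
    out = []
    for y_prime in range(num_classes):
        if y_prime == y:
            continue
        z = np.zeros(num_classes * d)
        z[y * d:(y + 1) * d] = x
        z[y_prime * d:(y_prime + 1) * d] = -x
        out.append(z)
    return out

print(kesler_constraints(np.array([1.0, 2.0]), y=0, num_classes=3))
```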

NeurIPS Conference 2001 Conference Paper

Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

  • Roni Khardon
  • Dan Roth
  • Rocco Servedio

We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.
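
One of the kernels studied can be written compactly: over Boolean vectors, the number of conjunctions (over literals) satisfied by both inputs is $2^{\mathrm{same}(x,y)}$, where same counts agreeing coordinates. The sketch below gives that kernel plus a one-line kernel-Perceptron score, as an illustration of the construction rather than any experimental setup from the paper.

```python
def conjunction_kernel(x, y):
    """Number of conjunctions over literals consistent with both Boolean
    vectors: 2 ** (number of coordinates where x and y agree)."""
    return 2 ** sum(int(a == b) for a, b in zip(x, y))

def kernel_perceptron_score(x, mistakes):
    """Kernel Perceptron prediction: signed sum of kernel values against
    stored mistake examples (x_i, y_i), implicitly one weight per conjunction."""
    return sum(y_i * conjunction_kernel(x_i, x) for x_i, y_i in mistakes)

print(conjunction_kernel([1, 0, 1], [1, 1, 1]))  # agree on 2 coords -> 4
```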

NeurIPS Conference 2000 Conference Paper

The Use of Classifiers in Sequential Inference

  • Vasin Punyakanok
  • Dan Roth

We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem - identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.

NeurIPS Conference 1999 Conference Paper

A SNoW-Based Face Detector

  • Ming-Hsuan Yang
  • Dan Roth
  • Narendra Ahuja

A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a pre-defined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.

IJCAI Conference 1999 Conference Paper

Learning in Natural Language

  • Dan Roth

Statistics-based classifiers in natural language are developed typically by assuming a generative model for the data, estimating its parameters from training data and then using Bayes rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches work. This paper presents a learning theory account of the major statistical approaches to learning in natural language. A class of Linear Statistical Queries (LSQ) hypotheses is defined and learning with it is shown to exhibit some robustness properties. Many statistical learners used in natural language, including naive Bayes, Markov Models and Maximum Entropy models are shown to be LSQ hypotheses, explaining the robustness of these predictors even when the underlying probabilistic assumptions do not hold. This coherent view of when and why learning approaches work in this context may help to develop better learning methods and an understanding of the role of learning in natural language inferences.

IJCAI Conference 1999 Conference Paper

Relational Learning for NLP using Linear Threshold Elements

  • Roni Khardon
  • Dan Roth
  • Leslie G. Valiant

We describe a coherent view of learning and reasoning with relational representations in the context of natural language processing. In particular, we discuss the Neuroidal Architecture, Inductive Logic Programming, and the SNoW system, explaining the relationships among them, and thereby offer an explanation of the theoretical basis for the SNoW system. We suggest that extensions of this system along the lines suggested by the theory may provide new levels of scalability and functionality.

AAAI Conference 1998 Conference Paper

Learning to Resolve Natural Language Ambiguities: A Unified Approach

  • Dan Roth

We analyze a few of the commonly used statistics-based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions on the domain or a specific problem. We present such an approach - a sparse network of linear separators, utilizing the Winnow learning algorithm - and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having a very large number of attributes. In particular, we present an extensive experimental comparison of our approach with other methods on several well studied lexical disambiguation tasks such as context-sensitive spelling correction, prepositional phrase attachment and part of speech tagging. In all cases we show that our approach either outperforms other methods tried for these tasks or performs comparably to the best.

NeurIPS Conference 1997 Conference Paper

Linear Concepts and Hidden Variables: An Empirical Study

  • Adam Grove
  • Dan Roth

Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea or not depends on the robustness with respect to deviations from the postulated model. We study this question experimentally in a restricted, yet non-trivial and interesting case: we consider a conditionally independent attribute (CIA) model which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn CIA with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly-learned linear classifier.

AAAI Conference 1996 Conference Paper

A Connectionist Framework for Reasoning: Reasoning with Examples

  • Dan Roth

We present a connectionist architecture that supports almost instantaneous deductive and abductive reasoning. The deduction algorithm responds in a few steps for single-rule queries and, in general, takes time that is linear in the number of rules in the query. The abduction algorithm produces an explanation in a few steps and the best explanation in time linear in the size of the assumption set. The size of the network is polynomially related to the size of other representations of the domain, and may even be smaller. We base our connectionist model on Valiant's Neuroidal model (Val94) and thus make minimal assumptions about the computing elements, which are assumed to be classical threshold elements with states. Within this model we develop a reasoning framework that utilizes a model-based approach to reasoning (KKS93; KR94b). In particular, we suggest interpreting the connectionist architecture as encoding examples of the domain we reason about, and show how to perform various reasoning tasks with this interpretation. We then show that the representations used can be acquired efficiently from interactions with the environment and discuss how this learning process influences the reasoning performance of the network.

IJCAI Conference 1995 Conference Paper

Default-Reasoning with Models

  • Roni Khardon
  • Dan Roth

Reasoning with model-based representations is an intuitive paradigm, which has been shown to be theoretically sound and to possess some computational advantages over reasoning with formula-based representations of knowledge. In this paper we present more evidence for the utility of such representations. In real life situations, one normally completes a lot of missing "context" information when answering queries. We model this situation by augmenting the available knowledge about the world with context-specific information; we show that reasoning with model-based representations can be done efficiently in the presence of varying context information. We then consider the task of default reasoning. We show that default reasoning is a generalization of reasoning within context, in which the reasoner has many "context" rules, which may be conflicting. We characterize the cases in which model-based reasoning supports efficient default reasoning and develop algorithms that handle efficiently fragments of Reiter's default logic. In particular, this includes cases in which performing the default reasoning task with the traditional, formula-based representation is intractable. Further, we argue that these results support an incremental view of reasoning in a natural way.