Author name cluster

Jaime Carbonell

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers

1 author row

NeurIPS Conference 2021 Conference Paper

Domain Adaptation with Invariant Representation Learning: What Transformations to Learn?

Petar Stojanov
Zijian Li
Mingming Gong
Ruichu Cai
Jaime Carbonell
Kun Zhang

Unsupervised domain adaptation, as a prevalent transfer learning setting, spans many real-world applications. With the increasing representational power and applicability of neural networks, state-of-the-art domain adaptation methods make use of deep architectures to map the input features $X$ to a latent representation $Z$ that has the same marginal distribution across domains. This has been shown to be insufficient for generating optimal representation for classification, and to find conditionally invariant representations, usually strong assumptions are needed. We provide reasoning why when the supports of the source and target data from overlap, any map of $X$ that is fixed across domains may not be suitable for domain adaptation via invariant features. Furthermore, we develop an efficient technique in which the optimal map from $X$ to $Z$ also takes domain-specific information as input, in addition to the features $X$. By using the property of minimal changes of causal mechanisms across domains, our model also takes into account the domain-specific information to ensure that the latent representation $Z$ does not discard valuable information about $Y$. We demonstrate the efficacy of our method via synthetic and real-world data experiments. The code is available at: \texttt{https://github.com/DMIRLAB-Group/DSAN}.

AAAI Conference 2020 Conference Paper

Semi-Supervised Learning on Meta Structure: Multi-Task Tagging and Parsing in Low-Resource Scenarios

KyungTae Lim
Jay Yoon Lee
Jaime Carbonell
Thierry Poibeau

Multi-view learning makes use of diverse models arising from multiple sources of input or different feature subsets for the same task. For example, a given natural language processing task can combine evidence from models arising from character, morpheme, lexical, or phrasal views. The most common strategy with multi-view learning, especially popular in the neural network community, is to unify multiple representations into one unified vector through concatenation, averaging, or pooling, and then build a single-view model on top of the unified representation. As an alternative, we examine whether building one model per view and then unifying the different models can lead to improvements, especially in low-resource scenarios. More specifically, taking inspiration from co-training methods, we propose a semi-supervised learning approach based on multi-view models through consensus promotion, and investigate whether this improves overall performance. To test the multi-view hypothesis, we use moderately low-resource scenarios for nine languages and test the performance of the joint model for part-of-speech tagging and dependency parsing. The proposed model shows significant improvements across the test cases, with average gains of −0.9 ∼ +9.3 labeled attachment score (LAS) points. We also investigate the effect of unlabeled data on the proposed model by varying the amount of training data and by using different domains of unlabeled data.

AAAI Conference 2019 Conference Paper

Gradient-Based Inference for Networks with Output Constraints

Jay Yoon Lee
Sanket Vaibhav Mehta
Michael Wick
Jean-Baptiste Tristan
Jaime Carbonell

Practitioners apply neural networks to increasingly complex problems in natural language processing, such as syntactic parsing and semantic role labeling that have rich output structures. Many such structured-prediction problems require deterministic constraints on the output values; for example, in sequence-to-sequence syntactic parsing, we require that the sequential outputs encode valid trees. While hidden units might capture such properties, the network is not always able to learn such constraints from the training data alone, and practitioners must then resort to post-processing. In this paper, we present an inference method for neural networks that enforces deterministic constraints on outputs without performing rule-based post-processing or expensive discrete search. Instead, in the spirit of gradient-based training, we enforce constraints with gradient-based inference (GBI): for each input at test-time, we nudge continuous model weights until the network’s unconstrained inference procedure generates an output that satisfies the constraints. We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints, but improves accuracy, even when the underlying network is state-of-the-art.

NeurIPS Conference 2019 Conference Paper

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang
Zihang Dai
Yiming Yang
Jaime Carbonell
Russ Salakhutdinov
Quoc Le

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.

AAAI Conference 2019 Conference Paper

Zero-Shot Neural Transfer for Cross-Lingual Entity Linking

Shruti Rijhwani
Jiateng Xie
Graham Neubig
Jaime Carbonell

Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy 17% (absolute) on average over the baseline systems, for the zero-shot scenario. Further, we also investigate the use of language-universal phonological representations which improves average accuracy (absolute) by 36% when transferring between languages that use different scripts.

TCS Journal 2018 Journal Article

Bounds on the minimax rate for estimating a prior over a VC class from independent learning tasks

Liu Yang
Steve Hanneke
Jaime Carbonell

AAMAS Conference 2018 Conference Paper

I Know What You Don't Know: Proactive Learning through Targeted Human Interaction

Abdelwahab Bourai
Jaime Carbonell

Humans communicate extensively through “meta-information” encoded in emitted non-verbal signals. This meta-information not only allows us to analyze an individual’s external emotional state but also certain internal states. For example, humans are able to learn from others thanks to their ability to determine their most knowledgeable peers in a given domain through their interactions with these individuals. As autonomous agents expand into more socially oriented tasks, they must capture and reason through these emitted cues to better understand their human counterparts. In this work, we conduct two experiments. First, we train a model to predict the knowledgeability of speakers using non-verbal features. Next we simulate the process of selecting the most knowledgeable person in a given domain using a proactive learning approach. The results indicate our agent is capable of observing human behavior and using this information to select a specific human for aid on a given question.

NeurIPS Conference 2017 Conference Paper

Active Learning from Peers

Keerthiram Murugesan
Jaime Carbonell

This paper addresses the challenge of learning from peers in an online multitask setting. Instead of always requesting a label from a human oracle, the proposed method first determines if the learner for each task can acquire that label with sufficient confidence from its peers either as a task-similarity weighted sum, or from the single most similar task. If so, it saves the oracle query for later use in more difficult cases, and if not it queries the human oracle. The paper develops the new algorithm to exhibit this behavior and proves a theoretical mistake bound for the method compared to the best linear predictor in hindsight. Experiments over three multitask learning benchmark datasets show clearly superior performance over baselines such as assuming task independence, learning only from the oracle and not learning from peer tasks.

IJCAI Conference 2017 Conference Paper

Completely Heterogeneous Transfer Learning with Attention - What And What Not To Transfer

Seungwhan Moon
Jaime Carbonell

We study a transfer learning framework where source and target datasets are heterogeneous in both feature and label spaces. Specifically, we do not assume explicit relations between source and target tasks a priori, and thus it is crucial to determine what and what not to transfer from source knowledge. Towards this goal, we define a new heterogeneous transfer learning approach that (1) selects and attends to an optimized subset of source samples to transfer knowledge from, and (2) builds a unified transfer network that learns from both source and target knowledge. This method, termed "Attentional Heterogeneous Transfer", along with a newly proposed unsupervised transfer loss, improves upon the previous state-of-the-art approaches on extensive simulations as well as a challenging hetero-lingual text classification task.

IJCAI Conference 2017 Conference Paper

Self-Paced Multitask Learning with Shared Knowledge

Keerthiram Murugesan
Jaime Carbonell

This paper introduces self-paced task selection to multitask learning, where instances from more closely related tasks are selected in a progression of easier-to-harder tasks, to emulate an effective human education strategy, but applied to multitask machine learning. We develop the mathematical foundation for the approach based on iterative selection of the most appropriate task, learning the task parameters, and updating the shared knowledge, optimizing a new bi-convex loss function. This proposed method applies quite generally, including to multitask feature learning, multitask learning with alternating structure optimization, etc. Results show that in each of the above formulations self-paced (easier-to-harder) task selection outperforms the baseline version of these methods in all the experiments.

AAAI Conference 2017 Conference Paper

Vision-Language Fusion for Object Recognition

Sz-Rung Shiang
Stephanie Rosenthal
Anatole Gershman
Jaime Carbonell
Jean Oh

While recent advances in computer vision have caused object recognition rates to spike, there is still much room for improvement. In this paper, we develop an algorithm to improve object recognition by integrating human-generated contextual information with vision algorithms. Specifically, we examine how interactive systems such as robots can utilize two types of context information: verbal descriptions of an environment and human-labeled datasets. We propose a re-ranking schema, MultiRank, for object recognition that can efficiently combine such information with the computer vision results. In our experiments, we achieve up to 9.4% and 16.6% accuracy improvements using the oracle and the detected bounding boxes, respectively, over the vision-only recognizers. We conclude that our algorithm has the ability to make a significant impact on object recognition in robotics and beyond.

NeurIPS Conference 2016 Conference Paper

Adaptive Smoothed Online Multi-Task Learning

Keerthiram Murugesan
Hanxiao Liu
Jaime Carbonell
Yiming Yang

This paper addresses the challenge of jointly learning both the per-task model parameters and the inter-task relationships in a multi-task online learning setting. The proposed algorithm features probabilistic interpretation, efficient updating rules and flexible modulation on whether learners focus on their specific task or on jointly addressing all tasks. The paper also proves a sub-linear regret bound as compared to the best linear predictor in hindsight. Experiments over three multi-task learning benchmark datasets show advantageous performance of the proposed approach over several state-of-the-art online multi-task learning baselines.

JAIR Journal 2016 Journal Article

Learning Concept Graphs from Online Educational Data

Hanxiao Liu
Wanli Ma
Yiming Yang
Jaime Carbonell

This paper addresses an open challenge in educational data mining, i.e., the problem of automatically mapping online courses from different providers (universities, MOOCs, etc.) onto a universal space of concepts, and predicting latent prerequisite dependencies (directed links) among both concepts and courses. We propose a novel approach for inference within and across course-level and concept-level directed graphs. In the training phase, our system projects partially observed course-level prerequisite links onto directed concept-level links; in the testing phase, the induced concept-level links are used to infer the unknown course-level prerequisite links. Whereas courses may be specific to one institution, concepts are shared across different providers. The bi-directional mappings enable our system to perform interlingua-style transfer learning, e.g. treating the concept graph as the interlingua and transferring the prerequisite relations across universities via the interlingua. Experiments on our newly collected datasets of courses from MIT, Caltech, Princeton and CMU show promising results.

AAAI Conference 2015 Conference Paper

Unsupervised Phrasal Near-Synonym Generation from Text Corpora

Dishan Gupta
Jaime Carbonell
Anatole Gershman
Steve Klein
David Miller

Unsupervised discovery of synonymous phrases is useful in a variety of tasks ranging from text mining and search engines to semantic analysis and machine translation. This paper presents an unsupervised corpus-based conditional model: Near-Synonym System (NeSS) for finding phrasal synonyms and near synonyms that requires only a large monolingual corpus. The method is based on maximizing information-theoretic combinations of shared contexts and is parallelizable for large-scale processing. An evaluation framework with crowd-sourced judgments is proposed and results are compared with alternate methods, demonstrating considerably superior results to the literature and to thesaurus look up for multi-word phrases. Moreover, the results show that the statistical scoring functions and overall scalability of the system are more important than language specific NLP tools. The method is language-independent and practically useable due to accuracy and real-time performance via parallel decomposition.

NeurIPS Conference 2014 Conference Paper

Efficient Structured Matrix Rank Minimization

Adams Wei Yu
Wanli Ma
Yaoliang Yu
Jaime Carbonell
Suvrit Sra

We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD; nor (b) resort to augmented Lagrangian techniques; nor (c) solve linear systems per iteration. Instead, we formulate the problem differently so that it is amenable to a generalized conditional gradient method, which results in a practical improvement with low per iteration computational cost. Numerical results show that our approach significantly outperforms state-of-the-art competitors in terms of running time, while effectively recovering low rank solutions in stochastic system realization and spectral compressed sensing problems.

NeurIPS Conference 2013 Conference Paper

Buy-in-Bulk Active Learning

Liu Yang
Jaime Carbonell

In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning.

AAAI Conference 2010 Conference Paper

Learning Spatial-Temporal Varying Graphs with Applications to Climate Data Analysis

Xi Chen
Yan Liu
Han Liu
Jaime Carbonell

An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed $\ell_1$ penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g., cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data.

IJCAI Conference 2007 Conference Paper

Lucian V Lita
Jaime Carbonell

Question answering (QA) is a highly complex task that brings together classification, clustering, retrieval, and extraction. Question answering systems include various statistical and rule-based components that combine and form multiple strategies for finding answers. However, in real-life scenarios efficiency constraints make it infeasible to simultaneously use all available strategies in a QA system. To address this issue, we present an approach for carefully selecting answering strategies that are likely to benefit individual questions, without significantly reducing performance. We evaluate the impact of strategy selection on question answering performance at several important QA stages: document retrieval, answer extraction, and answer merging. We present strategy selection experiments using a statistical question answering system, and we show significant efficiency improvements. By selecting 10% of the available answering strategies, we obtained similar performance when compared to using all of the strategies combined.

IJCAI Conference 2007 Conference Paper

Yan Liu
Jaime Carbonell
Vanathi Gopalakrishnan
Peter Weigele

Protein fold recognition is a crucial step in inferring biological structure and function. This paper focuses on machine learning methods for predicting quaternary structural folds, which consist of multiple protein chains that form chemical bonds among side chains to reach a structurally stable domain. The complexity associated with modeling the quaternary fold poses major theoretical and computational challenges to current machine learning methods. We propose methods to address these challenges and show how (1) domain knowledge is encoded and utilized to characterize structural properties using segmentation conditional graphical models; and (2) model complexity is handled through efficient inference algorithms. Our model follows a discriminative approach so that any informative features, such as those representative of overlapping or long-range interactions, can be used conveniently. The model is applied to predict two important quaternary folds, the triple beta-spirals and double-barrel trimers. Cross-family validation shows that our method outperforms other state-of-the-art algorithms.

IJCAI Conference 2007 Conference Paper

Jingrui He
Jaime Carbonell
Yan Liu

This paper proposes and develops a new graph-based semi-supervised learning method. Different from previous graph-based methods that are based on discriminative models, our method is essentially a generative model in that the class conditional probabilities are estimated by graph propagation and the class priors are estimated by linear regression. Experimental results on various datasets show that the proposed method is superior to existing graph-based semi-supervised learning methods, especially when the labeled subset alone proves insufficient to estimate meaningful class priors.

NeurIPS Conference 2007 Conference Paper

Nearest-Neighbor-Based Active Learning for Rare Category Detection

Jingrui He
Jaime Carbonell

Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - e.g., detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially a variable-scale nearest neighbor process is used to optimize the probability of sampling tightly-grouped minority classes, subject to a local smoothness assumption of the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg’s Interleave method, the prior best technique in the sparse literature on this topic.

AAAI Conference 1999 Conference Paper

Selecting Text Spans for Document Summaries: Heuristics and Metrics

Vibhu Mittal
Mark Kantrowitz
Just Research
Jade Goldstein
Jaime Carbonell
Carnegie Mellon University

Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents an analysis of news-article summaries generated by sentence extraction. Sentences are ranked for potential inclusion in the summary using a weighted combination of linguistic features – derived from an analysis of news-wire summaries. This paper evaluates the relative effectiveness of these features. In order to do so, we discuss the construction of a large corpus of extraction-based summaries, and characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus. Results on our feature set are presented after normalization by this degree of difficulty.

IJCAI Conference 1989 Conference Paper

Towards a General Framework for Composing Disjunctive and Iterative Macro-operators

Peter Shell
Jaime Carbonell

Inducing disjunctive and iterative macro-operators from empirical problem-solving traces provides a more powerful knowledge compilation method than simple linear macro-operators. Whereas earlier work focused on when to create iterative macro-operators, this paper addresses how to form them, combining proven optimization methods such as extraction of loop invariants, with techniques for further optimizing RETE-match efficiency. The disjunctive and iterative composition processes have been implemented in FERMI and its underlying production system language. Empirical results confirm substantial rule-match speedups and system performance improvements in different application domains.