Arrow Research search

Author name cluster

John D. Lafferty

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers


ICML Conference 2025 Conference Paper

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

  • Awni Altabaa
  • John D. Lafferty

Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.

ICLR Conference 2024 Conference Paper

Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers

  • Awni Altabaa
  • Taylor Whittington Webb
  • Jonathan D. Cohen 0003
  • John D. Lafferty

An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the *Abstractor*. At the core of the Abstractor is a variant of attention called *relational cross-attention*. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.
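A minimal Python sketch of the idea behind relational cross-attention follows. It assumes a single head with identity query and key maps and no score scaling, so the Abstractor's actual parameterization may differ. The attention scores come from inner products between objects, but the vectors being mixed are learned symbol vectors rather than object features. The names `relational_cross_attention` and `symbols` are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def relational_cross_attention(objects, symbols):
    """Single-head sketch: scores use object-object inner products (identity
    query/key maps assumed), while the mixed values are the symbol vectors."""
    dim = len(symbols[0])
    outputs = []
    for x_i in objects:
        # Relation of object i to every object j.
        weights = softmax([dot(x_i, x_j) for x_j in objects])
        # Weighted sum of symbols: object features never enter the values.
        outputs.append([sum(w * s[k] for w, s in zip(weights, symbols)) for k in range(dim)])
    return outputs

objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rotated = [[-y, x] for x, y in objects]  # 90-degree rotation preserves inner products
symbols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = relational_cross_attention(objects, symbols)
out_rot = relational_cross_attention(rotated, symbols)
```

Object features enter only through their pairwise inner products. As a result, rotating every object by the same orthogonal map leaves the output unchanged, which is one concrete sense in which relational information is disentangled from object-level features.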

AAAI Conference 2019 Conference Paper

TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts

  • Michihiro Yasunaga
  • John D. Lafferty

Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.

ICML Conference 2018 Conference Paper

Distributed Nonparametric Regression under Communication Constraints

  • Yuancheng Zhu
  • John D. Lafferty

This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to the central machine. Our results give both asymptotic lower bounds and matching upper bounds on the statistical risk under various settings. We identify three regimes, depending on the relationship among the number of machines, the size of data available at each machine, and the communication budget. When the communication budget is small, the statistical risk depends solely on this communication bottleneck, regardless of the sample size. In the regime where the communication budget is large, the classic minimax risk in the non-distributed estimation setting is recovered. In an intermediate regime, the statistical risk depends on both the sample size and the communication budget.

ICML Conference 2018 Conference Paper

Prediction Rule Reshaping

  • Matt Bonakdarpour
  • Sabyasachi Chatterjee
  • Rina Barber
  • John D. Lafferty

Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for computing the estimators, and experiments are performed to demonstrate their performance on four datasets. We find that reshaping methods enforce shape constraints without compromising predictive accuracy.
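The classical pool-adjacent-violators algorithm (isotonic regression) gives a simple illustration of reshaping a fixed prediction rule to satisfy a monotonicity constraint in one variable. It projects predictions, ordered by input, onto the set of non-decreasing sequences. This is a swapped-in textbook technique, not either of the paper's high-dimensional methods. The function `pava` and the prediction values are hypothetical.

```python
def pava(values, weights=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block is [mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical predictions of a pre-trained model on inputs sorted by x.
raw_predictions = [0.1, 0.4, 0.35, 0.5, 0.45, 0.9]
reshaped = pava(raw_predictions)
```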

ICML Conference 2013 Conference Paper

Computation-Risk Tradeoffs for Covariance-Thresholded Regression

  • Dinah Shender
  • John D. Lafferty

We present a family of linear regression estimators that provides a fine-grained tradeoff between statistical accuracy and computational efficiency. The estimators are based on hard thresholding of the sample covariance matrix entries together with ℓ2 regularization (ridge regression). We analyze the predictive risk of this family of estimators as a function of the threshold and regularization parameter. With appropriate parameter choices, the estimate is the solution to a sparse, diagonally dominant linear system, solvable in near-linear time. Our analysis shows how the risk varies with the sparsity and regularization level, thus establishing a statistical estimation setting for which there is an explicit, smooth tradeoff between risk and computation. Simulations are provided to support the theoretical analyses.
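Here is a minimal sketch of the estimator family described above. It assumes centered data with no intercept and thresholds only the off-diagonal entries of the sample covariance S, then solves (T_t(S) + λI) β = Xᵀy / n. The names `thresholded_ridge` and `solve` are hypothetical. The small dense solver is for illustration only and does not exploit the sparsity that makes near-linear-time solvers possible.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def thresholded_ridge(X, y, threshold, lam):
    n, p = len(X), len(X[0])
    # Sample covariance S = X^T X / n and c = X^T y / n (centered data assumed).
    S = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(p)] for i in range(p)]
    c = [sum(X[k][i] * y[k] for k in range(n)) / n for i in range(p)]
    # Hard-threshold small off-diagonal entries, then add the ridge term lam * I.
    A = [[(S[i][j] if i == j or abs(S[i][j]) >= threshold else 0.0)
          + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    return solve(A, c)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta_ols = thresholded_ridge(X, y, threshold=0.0, lam=0.0)
beta_diag = thresholded_ridge(X, y, threshold=10.0, lam=0.0)
```

Setting the threshold to 0 with λ = 0 recovers ordinary least squares. A large threshold zeroes every off-diagonal entry, so each coefficient is estimated separately.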

ICML Conference 2013 Conference Paper

The Bigraphical Lasso

  • Alfredo A. Kalaitzis
  • John D. Lafferty
  • Neil D. Lawrence
  • Shuheng Zhou 0002

The i.i.d. assumption in machine learning is endemic, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs. This structure, a prominent product in spectral graph theory, has appealing properties for regression, enhanced sparsity, and interpretability. To deal with the parameter explosion we introduce L1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions.
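The adjacency matrix of the Cartesian product of two graphs is the Kronecker sum of the factor adjacency matrices, A ⊕ B = A ⊗ I + I ⊗ B. The sketch below assumes the bigraphical lasso parameterizes the precision matrix in this Kronecker-sum form, Ψ ⊕ Θ, and builds it for two small path graphs. The helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron_sum(Psi, Theta):
    """Kronecker sum: Psi ⊗ I + I ⊗ Theta (Cartesian product of the two graphs)."""
    p, n = len(Psi), len(Theta)
    left = kron(Psi, identity(n))
    right = kron(identity(p), Theta)
    return [[l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

path2 = [[0, 1], [1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
grid = kron_sum(path2, path3)  # adjacency of the 2 x 3 grid graph
```

The product of a 2-node path and a 3-node path is the 2 × 3 grid graph, which has 7 edges. For p features and n instances, a full np × np precision matrix has np(np + 1)/2 free entries, while the two factors together have at most p(p + 1)/2 + n(n + 1)/2.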

AAMAS Conference 2007 Conference Paper

Conditional Random Fields for Activity Recognition

  • Douglas L. Vail
  • Manuela M. Veloso
  • John D. Lafferty

Activity recognition is a key component for creating intelligent, multi-agent systems. Intrinsically, activity recognition is a temporal classification problem. In this paper, we compare two models for temporal classification: hidden Markov models (HMMs), which have long been applied to the activity recognition problem, and conditional random fields (CRFs). CRFs are discriminative models for labeling sequences. They condition on the entire observation sequence, which avoids the need for independence assumptions between observations. Conditioning on the observations vastly expands the set of features that can be incorporated into the model without violating its assumptions. Using data from a simulated robot tag domain, chosen because it is multi-agent and produces complex interactions between observations, we explore the differences in performance between the discriminatively trained CRF and the generative HMM. Additionally, we examine the effect of incorporating features which violate independence assumptions between observations; such features are typically necessary for high classification accuracy. We find that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM. In cases where features depend on observations from many time steps, we confirm that CRFs are robust against any degradation in performance.
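A linear-chain CRF scores each label sequence with per-position scores and label-to-label transition scores. The per-position scores may depend on the whole observation sequence. The normalizer log Z is computed with the forward algorithm. The sketch below uses hypothetical score tables and checks the forward recursion against brute-force enumeration of all label sequences. The function names are illustrative.

```python
import itertools
import math

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def crf_log_partition(unary, transition):
    """log Z of a linear-chain CRF via the forward algorithm.
    unary[t][y]: score of label y at step t (may use the entire observation sequence).
    transition[a][b]: score of moving from label a to label b."""
    labels = range(len(unary[0]))
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [log_sum_exp([alpha[a] + transition[a][b] for a in labels]) + unary[t][b]
                 for b in labels]
    return log_sum_exp(alpha)

def brute_force_log_partition(unary, transition):
    """Enumerate every label sequence (exponential cost; for checking only)."""
    K = len(unary[0])
    scores = []
    for path in itertools.product(range(K), repeat=len(unary)):
        s = unary[0][path[0]] + sum(transition[path[t - 1]][path[t]] + unary[t][path[t]]
                                    for t in range(1, len(path)))
        scores.append(s)
    return log_sum_exp(scores)

unary = [[0.5, -0.2], [1.0, 0.3], [-0.4, 0.9]]
transition = [[0.2, -0.1], [0.0, 0.4]]
```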

IROS Conference 2007 Conference Paper

Feature selection in conditional random fields for activity recognition

  • Douglas L. Vail
  • John D. Lafferty
  • Manuela Veloso

Temporal classification, such as activity recognition, is a key component for creating intelligent robot systems. In the case of robots, classification algorithms must robustly incorporate complex, non-independent features extracted from streams of sensor data. Conditional random fields are discriminatively trained temporal models that can easily incorporate such features. However, robots have few computational resources to spare for computing a large number of features from high-bandwidth sensor data, which creates opportunities for feature selection. Creating models that contain only the most relevant features reduces the computational burden of temporal classification. In this paper, we show that ℓ1 regularization is an effective technique for feature selection in conditional random fields. We present results from a multi-robot tag domain with data from both real and simulated robots that compare the classification accuracy of models trained with ℓ1 regularization, which simultaneously smooths the model and selects features; ℓ2 regularization, which smooths to avoid over-fitting, but performs no feature selection; and models trained with no smoothing.
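The proximal operators used in proximal-gradient training show one common way ℓ1 regularization selects features while ℓ2 only shrinks them. The ℓ1 step (soft-thresholding) sets small weights exactly to zero, while the ℓ2 step rescales every weight. This is an illustrative sketch: the optimizer used in the paper may differ, and the weights below are hypothetical.

```python
def l1_prox(w, lam):
    """Proximal step for lam * |w| (soft-thresholding): small weights become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_prox(w, lam):
    """Proximal step for (lam / 2) * w**2: shrinks toward 0 but never reaches it."""
    return w / (1.0 + lam)

weights = [0.05, -0.3, 1.2, 0.0]  # hypothetical CRF feature weights
after_l1 = [l1_prox(w, 0.1) for w in weights]
after_l2 = [l2_prox(w, 0.1) for w in weights]
```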

ICML Conference 2006 Conference Paper

Dynamic topic models

  • David M. Blei
  • John D. Lafferty

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.
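Here is a minimal generative sketch of the state-space idea. It assumes a Gaussian random walk on one topic's natural parameters, β_t = β_{t-1} + noise, and obtains the word distribution at each time slice through the softmax map. The function `simulate_topic_drift` is hypothetical. Inference (the variational Kalman filter and wavelet approximations) is not shown.

```python
import math
import random

def softmax(beta):
    # Maps natural parameters to a multinomial over the vocabulary.
    m = max(beta)
    exps = [math.exp(b - m) for b in beta]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_topic_drift(vocab_size, num_steps, sigma, seed=0):
    rng = random.Random(seed)
    beta = [0.0] * vocab_size  # natural parameters at time 0 (uniform topic)
    topics = []
    for _ in range(num_steps):
        topics.append(softmax(beta))  # word distribution for this time slice
        # Gaussian random walk: beta_t = beta_{t-1} + N(0, sigma^2 I)
        beta = [b + rng.gauss(0.0, sigma) for b in beta]
    return topics

topics = simulate_topic_drift(vocab_size=5, num_steps=4, sigma=0.5, seed=1)
```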

UAI Conference 2004 Conference Paper

Variational Chernoff Bounds for Graphical Models

  • Pradeep Ravikumar
  • John D. Lafferty

Recent research has made significant progress on the problem of bounding log partition functions for exponential family graphical models. Such bounds have associated dual parameters that are often used as heuristic estimates of the marginal probabilities required in inference and learning. However, these variational estimates do not give rigorous bounds on marginal probabilities, nor do they give estimates for probabilities of more general events than simple marginals. In this paper we build on this recent work by deriving rigorous upper and lower bounds on event probabilities for graphical models. Our approach is based on the use of generalized Chernoff bounds to express bounds on event probabilities in terms of convex optimization problems; these optimization problems, in turn, require estimates of generalized log partition functions. Simulations indicate that this technique can result in useful, rigorous bounds to complement the heuristic variational estimates, with comparable computational cost.
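The standard Chernoff bound states that P(X ≥ a) ≤ inf over t > 0 of exp(−ta + log E[exp(tX)]), so the bound depends on a log-partition-like term. The sketch below uses a case where that term is tractable, X ~ Binomial(n, p), and minimizes over a grid of t values. It illustrates only the upper bound, not the paper's graphical-model machinery, and the function names are hypothetical.

```python
import math

def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def chernoff_bound(n, p, k, grid_size=2000, t_max=10.0):
    """Minimize exp(-t*k + log E[exp(tX)]) over a grid of t in (0, t_max]."""
    best = 1.0
    for i in range(1, grid_size + 1):
        t = t_max * i / grid_size
        log_mgf = n * math.log(1 - p + p * math.exp(t))  # log E[exp(tX)]
        best = min(best, math.exp(-t * k + log_mgf))
    return best

exact = binomial_tail(20, 0.3, 12)
bound = chernoff_bound(20, 0.3, 12)
```

By Markov's inequality applied to exp(tX), every grid point gives a valid upper bound, so a coarse grid can loosen the bound but never invalidate it.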

UAI Conference 2002 Conference Paper

Expectation-Propagation for the Generative Aspect Model

  • Thomas P. Minka
  • John D. Lafferty

The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model. We develop an alternative approach that leads to higher accuracy at comparable cost. An extension of Expectation-Propagation is used for inference and then embedded in an EM algorithm for learning. Experimental results are presented for both synthetic and real data sets.