Arrow Research search

Author name cluster

Christopher Meek

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers

37

ICLR Conference 2023 Conference Paper

Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions

  • Ansong Ni
  • Jeevana Priya Inala
  • Chenglong Wang 0005
  • Alex Polozov
  • Christopher Meek
  • Dragomir Radev
  • Jianfeng Gao 0001

Pretrained language models have shown superior performance on many natural language processing tasks, yet they still struggle at multi-step formal reasoning tasks like grade school math problems. One key challenge of finetuning them to solve such math reasoning problems is that many existing datasets only contain one reference solution for each problem, despite the fact that there are often alternative solutions resembling different reasoning paths to the final answer. This way, the finetuned models are biased towards the limited reference solutions, which limits their generalization to unseen examples. To mitigate this issue, we propose to let the model perform sampling during training and learn from both self-sampled fully-correct solutions, which yield the correct answer upon execution, and partially-correct solutions, whose intermediate state matches an intermediate state of a known correct solution. We show that our use of self-sampled correct and partially-correct solutions can benefit learning and help guide the sampling process, leading to more efficient exploration of the solution space. Additionally, we explore various training objectives to support learning from multiple solutions per example and find they greatly affect the performance. Experiments on two math reasoning datasets show the effectiveness of our method compared to learning from a single reference solution with MLE, where we improve PASS@100 from 35.5% to 44.5% for GSM8K, and 27.6% to 36.2% PASS@80 for MathQA. Such improvements are also consistent across different model sizes.

ICLR Conference 2022 Conference Paper

Synchromesh: Reliable Code Generation from Pre-trained Language Models

  • Gabriel Poesia
  • Alex Polozov
  • Vu Le 0002
  • Ashish Tiwari 0001
  • Gustavo Soares
  • Christopher Meek
  • Sumit Gulwani

Large pre-trained language models have been used to generate code, providing a flexible interface for synthesizing programs from natural language specifications. However, they often violate syntactic and semantic rules of their output language, limiting their practical usability. In this paper, we propose Synchromesh: a framework for substantially improving the reliability of pre-trained models for code generation. Synchromesh comprises two components. First, it retrieves few-shot examples from a training bank using Target Similarity Tuning (TST), a novel method for semantic example selection. TST learns to recognize utterances that describe similar target programs despite of differences in surface natural language features. Then, Synchromesh feeds the examples to a pre-trained language model and samples programs using Constrained Semantic Decoding (CSD): a general framework for constraining the output to a set of valid programs in the target language. CSD leverages constraints on partial outputs to sample complete correct programs, and needs neither re-training nor fine-tuning of the language model. We evaluate our methods by synthesizing code from natural language descriptions using GPT-3 and Codex in three real-world languages: SQL queries, Vega-Lite visualizations and SMCalFlow programs. These domains showcase rich constraints that CSD is able to enforce, including syntax, scoping and typing rules. Across all languages, we observe complementary gains from CSD and TST in prediction accuracy and in effectively preventing parsing, type and run-time errors.

NeurIPS Conference 2021 Conference Paper

Few-Shot Learning Evaluation in Natural Language Understanding

  • Subhabrata Mukherjee
  • Xiaodong Liu
  • Guoqing Zheng
  • Saghar Hosseini
  • Hao Cheng
  • Ge Yang
  • Christopher Meek
  • Ahmed Awadallah

Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training. As such, the models are provided far more data than required by humans to achieve strong performance. That has motivated a line of work that focuses on improving few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU resulting in different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES, a benchmark for evaluating the few-shot learning capabilities of NLU models. We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks. We also demonstrate differences between alternative model families and adaptation techniques in the few shot setting. Finally, we discuss several principles and choices in designing the experimental settings for evaluating the true few-shot learning performance and suggest a unified standardized approach to few-shot learning evaluation. We aim to encourage research on NLU models that can generalize to new tasks with a small number of examples. Code and data for CLUES are available at https: //github. com/microsoft/CLUES.

ICLR Conference 2021 Conference Paper

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing

  • Tao Yu 0009
  • Rui Zhang 0037
  • Alex Polozov
  • Christopher Meek
  • Ahmed Hassan Awadallah

Conversational Semantic Parsing (CSP) is the task of converting a sequence of natural language queries to formal language (e.g., SQL, SPARQL) that can be executed against a structured ontology (e.g. databases, knowledge bases). To accomplish this task, a CSP system needs to model the relation between the unstructured language utterance and the structured ontology while representing the multi-turn dynamics of the dialog. Pre-trained language models (LMs) are the state-of-the-art for various natural language processing tasks. However, existing pre-trained LMs that use language modeling training objectives over free-form text have limited ability to represent natural language references to contextual structural data. In this work, we present SCORE, a new pre-training approach for CSP tasks designed to induce representations that capture the alignment between the dialogue flow and the structural context. We demonstrate the broad applicability of SCORE to CSP tasks by combining SCORE with strong base systems on four different tasks (SPARC, COSQL, MWOZ, and SQA). We show that SCORE can improve the performance over all these base systems by a significant margin and achieves state-of-the-art results on three of them.

UAI Conference 2015 Conference Paper

Selective Greedy Equivalence Search: Finding Optimal Bayesian Networks Using a Polynomial Number of Score Evaluations

  • Max Chickering
  • Christopher Meek

We introduce Selective Greedy Equivalence Search (SGES), a restricted version of Greedy Equivalence Search (GES). SGES retains the asymptotic correctness of GES but, unlike GES, has polynomial performance guarantees. In particular, we show that when data are sampled independently from a distribution that is perfect with respect to a DAG G defined over the observable variables then, in the limit of large data, SGES will identify the equivalence class of G after a number of score evaluations that is (1) polynomial in the number of nodes and (2) exponential in various complexity measures including maximum-number-of-parents, maximum-cliquesize, and a new measure called v-width that is at least as small as—and potentially much smaller than—the other two. More generally, we show that for any hereditary and equivalenceinvariant property Π known to hold in G, we retain the large-sample optimality guarantees of GES even if we ignore any GES deletion operator during the backward phase that results in a state for which Π does not hold in the commondescendants subgraph.

ICML Conference 2014 Conference Paper

Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy

  • Dengyong Zhou
  • Qiang Liu 0001
  • John C. Platt
  • Christopher Meek

We propose a method to aggregate noisy ordinal labels collected from a crowd of workers or annotators. Eliciting ordinal labels is important in tasks such as judging web search quality and consumer satisfaction. Our method is motivated by the observation that workers usually have difficulty distinguishing between two adjacent ordinal classes whereas distinguishing between two classes which are far away from each other is much easier. We develop the method through minimax conditional entropy subject to constraints which encode this observation. Empirical evaluations on real datasets demonstrate significant improvements over existing methods.

NeurIPS Conference 2014 Conference Paper

Recursive Inversion Models for Permutations

  • Christopher Meek
  • Marina Meila

We develop a new exponential family probabilistic model for permutations that can capture hierarchical structure, and that has the well known Mallows and generalized Mallows models as subclasses. We describe how one can do parameter estimation and propose an approach to structure search for this class of models. We provide experimental evidence that this added flexibility both improves predictive performance and enables a deeper understanding of collections of permutations.

NeurIPS Conference 2011 Conference Paper

A Model for Temporal Dependencies in Event Streams

  • Asela Gunawardana
  • Christopher Meek
  • Puyang Xu

We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams. We describe a closed-form Bayesian approach to learning these models, and describe an importance sampling algorithm for forecasting future events using these models, using a proposal distribution based on Poisson superposition. We then use synthetic data, supercomputer event logs, and web search query logs to illustrate that our learning algorithm can efficiently learn nonlinear temporal dependencies, and that our importance sampling algorithm can effectively forecast future events.

NeurIPS Conference 2009 Conference Paper

Improving Existing Fault Recovery Policies

  • Guy Shani
  • Christopher Meek

Automated recovery from failures is a key component in the management of large data centers. Such systems typically employ a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we explain how to use data gathered from the interactions of the hand-made controller with the system, to create an optimized controller. We suggest learning an indefinite horizon Partially Observable Markov Decision Process, a model for decision making under uncertainty, and solve it using a point-based algorithm. We describe the complete process, starting with data gathering, model learning, model checking procedures, and computing a policy. While our paper focuses on a specific domain, our method is applicable to other systems that use a hand-coded, imperfect controllers.

UAI Conference 2008 Conference Paper

Inference for Multiplicative Models

  • Ydo Wexler
  • Christopher Meek

The paper introduces a generalization for known probabilistic models such as log-linear and graphical models, called here multiplicative models. These models, that express probabilities via product of parameters are shown to capture multiple forms of contextual independence between variables, including decision graphs and noisy-OR functions. An inference algorithm for multiplicative models is provided and its correctness is proved. The complexity analysis of the inference algorithm uses a more refined parameter than the tree-width of the underlying graph, and shows the computational cost does not exceed that of the variable elimination algorithm in graphical models. The paper ends with examples where using the new models and algorithm is computationally beneficial.

NeurIPS Conference 2008 Conference Paper

MAS: a multiplicative approximation scheme for probabilistic inference

  • Ydo Wexler
  • Christopher Meek

We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses $\epsilon$-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error $\epsilon$. MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize $\epsilon$-decompositions and provide a fast closed-form solution for an $L_2$ approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated.

AIJ Journal 2006 Journal Article

On the incompatibility of faithfulness and monotone DAG faithfulness

  • David Maxwell Chickering
  • Christopher Meek

Cheng, Greiner, Kelly, Bell and Liu [Artificial Intelligence 137 (2002) 43–90] describe an algorithm for learning Bayesian networks that—in a domain consisting of n variables—identifies the optimal solution using O ( n 4 ) calls to a mutual-information oracle. This result relies on (1) the standard assumption that the generative distribution is Markov and faithful to some directed acyclic graph (DAG), and (2) a new assumption about the generative distribution that the authors call monotone DAG faithfulness (MDF). The MDF assumption rests on an intuitive connection between active paths in a Bayesian-network structure and the mutual information among variables. The assumption states that the (conditional) mutual information between a pair of variables is a monotonic function of the set of active paths between those variables; the more active paths between the variables the higher the mutual information. In this paper, we demonstrate the unfortunate result that, for any realistic learning scenario, the monotone DAG faithfulness assumption is incompatible with the faithfulness assumption. Furthermore, for the class of Bayesian-network structures for which the two assumptions are compatible, we can learn the optimal solution using standard approaches that require only O( n 2 ) calls to an independence oracle.

NeurIPS Conference 2005 Conference Paper

Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails

  • Nebojsa Jojic
  • Vladimir Jojic
  • Christopher Meek
  • David Heckerman
  • Brendan Frey

We introduce a new model of genetic diversity which summarizes a large input dataset into an epitome, a short sequence or a small set of short sequences of probability distributions capturing many overlapping subsequences from the dataset. The epitome as a representation has already been used in modeling real-valued signals, such as images and audio. The discrete sequence model we introduce in this paper targets applications in genetics, from multiple alignment to recombination and mutation inference. In our experiments, we concentrate on modeling the diversity of HIV where the epitome emerges as a natural model for producing relatively small vaccines covering a large number of immune system targets known as epitopes. Our experiments show that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains. We also discuss epitome designs that take into account uncertainty about Tcell cross reactivity and epitope presentation. In our experiments, we find that vaccine optimization is fairly robust to these uncertainties.

JMLR Journal 2004 Journal Article

Large-Sample Learning of Bayesian Networks is NP-Hard

  • David Maxwell Chickering
  • David Heckerman
  • Christopher Meek

In this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest structure for which the model is able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is NP-hard, even when any combination of one or more of the following hold: the generative distribution is perfect with respect to some DAG containing hidden variables; we are given an independence oracle; we are given an inference oracle; we are given an information oracle; we restrict potential solutions to structures in which each node has at most k parents, for all k >=3. Our proof relies on a new technical result that we establish in the appendices. In particular, we provide a method for constructing the local distributions in a Bayesian network such that the resulting joint distribution is provably perfect with respect to the structure of the network. [abs] [ pdf ]

UAI Conference 2003 Conference Paper

Large-Sample Learning of Bayesian Networks is NP-Hard

  • Max Chickering
  • Christopher Meek
  • David Heckerman

In this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest model able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is hard, even when we are given an independence oracle, an inference oracle, and/or an information oracle. Our negative results also apply to the learning of discrete-variable Bayesian networks in which each node has at most k parents, for all k > 3.

UAI Conference 2002 Conference Paper

CFW: A Collaborative Filtering System Using Posteriors over Weights of Evidence

  • Carl Myers Kadie
  • Christopher Meek
  • David Heckerman

We describe CFW, a computationally efficient algorithm for collaborative filtering that uses posteriors over weights of evidence. In experiments on real data, we show that this method predicts as well or better than other methods in situations where the size of the user query is small. The new approach works particularly well when the user s query CONTAINS low frequency(unpopular) items.The approach complements that OF dependency networks which perform well WHEN the size OF the query IS large.Also IN this paper, we argue that the USE OF posteriors OVER weights OF evidence IS a natural way TO recommend similar items collaborative - filtering task.

UAI Conference 2002 Conference Paper

Factorization of Discrete Probability Distributions

  • Dan Geiger
  • Christopher Meek
  • Bernd Sturmfels

We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. This result generalizes the well known Hammersley-Clifford Theorem.

JMLR Journal 2002 Journal Article

The Learning-Curve Sampling Method Applied to Model-Based Clustering

  • Christopher Meek
  • Bo Thiesson
  • David Heckerman

We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computational cost of learning a model increases as a function of the sample size of the training data, whereas the accuracy of a model has diminishing improvements as a function of sample size. Thus, the learning-curve sampling method monitors the increasing costs and performance as larger and larger amounts of data are used for training, and terminates learning when future costs outweigh future benefits. In this paper, we formalize the learning-curve sampling method and its associated cost-benefit tradeoff in terms of decision theory. In addition, we describe the application of the learning-curve sampling method to the task of model-based clustering via the expectation-maximization (EM) algorithm. In experiments on three real data sets, we show that the learning-curve sampling method produces models that are nearly as accurate as those trained on complete data sets, but with dramatically reduced learning times. Finally, we describe an extension of the basic learning-curve approach for model-based clustering that results in an additional speedup. This extension is based on the observation that the shape of the learning curve for a given model and data set is roughly independent of the number of EM iterations used during training. Thus, we run EM for only a few iterations to decide how many cases to use for training, and then run EM to full convergence once the number of cases is selected.

JMLR Journal 2000 Journal Article

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization

  • David Heckerman
  • David Maxwell Chickering
  • Christopher Meek
  • Robert Rounthwaite
  • Carl Kadie

We describe a graphical model for probabilistic relationships--an alternative to the Bayesian network--called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.

UAI Conference 2000 Conference Paper

Perfect Tree-like Markovian Distributions

  • Ann Becker
  • Dan Geiger
  • Christopher Meek

We show that if a strictly positive joint probability distribution for a set of binary random variables factors according to a tree, then vertex separation represents all and only the independence relations enclosed in the distribution. The same result is shown to hold also for multivariate strictly positive normal distributions. Our proof uses a new property of conditional independence that holds for these two classes of probability distributions.

UAI Conference 1999 Conference Paper

Quantifier Elimination for Statistical Problems

  • Dan Geiger
  • Christopher Meek

Recent improvement on Tarski's procedure for quantifier elimination in the first order theory of real numbers makes it feasible to solve small instances of the following problems completely automatically: 1. listing all equality and inequality constraints implied by a graphical model with hidden variables. 2. Comparing graphyical models with hidden variables (i.e., model equivalence, inclusion, and overlap). 3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model. 4. Determing whether a given set of independence assertions. We discuss the foundation of quantifier elimination and demonstrate its application to these problems.

AIIM Journal 1997 Journal Article

An evaluation of machine-learning methods for predicting pneumonia mortality

  • Gregory F. Cooper
  • Constantin F. Aliferis
  • Richard Ambrosino
  • John Aronis
  • Bruce G. Buchanan
  • Richard Caruana
  • Michael J. Fine
  • Clark Glymour

This paper describes the application of eight statistical and machine-learning methods to derive computer models for predicting mortality of hospital patients with pneumonia from their findings at initial presentation. The eight models were each constructed based on 9847 patient cases and they were each evaluated on 4352 additional cases. The primary evaluation metric was the error in predicted survival as a function of the fraction of patients predicted to survive. This metric is useful in assessing a model's potential to assist a clinician in deciding whether to treat a given patient in the hospital or at home. We examined the error rates of the models when predicting that a given fraction of patients will survive. We examined survival fractions between 0. 1 and 0. 6. Over this range, each model's predictive error rate was within 1% of the error rate of every other model. When predicting that approximately 30% of the patients will survive, all the models have an error rate of less than 1. 5%. The models are distinguished more by the number of variables and parameters that they contain than by their error rates; these differences suggest which models may be the most amenable to future implementation as paper-based guidelines.

UAI Conference 1997 Conference Paper

Models and Selection Criteria for Regression and Classification

  • David Heckerman
  • Christopher Meek

When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatoryor input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient, because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for repression/classification: one described by Spiegelhalter et al. (1993), and another by Buntine (1993). We contrast these two criteria using the prequential framework of Dawid (1984), and give sufficient conditions under which the criteria agree.

UAI Conference 1997 Conference Paper

Structure and Parameter Learning for Causal Independence and Causal Interaction Models

  • Christopher Meek
  • David Heckerman

This paper discusses causal independence models and a generalization of these models called causal interaction models. Causal interaction models are models that have independent mechanisms where a mechanism can have several causes. In addition to introducing several particular types of causal interaction models, we show how we can apply the Bayesian approach to learning causal interaction models obtaining approximate posterior distributions for the models and obtain MAP and ML estimates for the parameters. We illustrate the approach with a simulation study of learning model posteriors.

UAI Conference 1996 Conference Paper

Asymptotic Model Selection for Directed Networks with Hidden Variables

  • Dan Geiger
  • David Heckerman
  • Christopher Meek

We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node.

UAI Conference 1995 Conference Paper

Causal inference and causal explanation with background knowledge

  • Christopher Meek

This paper presents correct algorithms for answering the following two questions; (i) Does there exist a causal explanation consistent with a set of background knowledge which explains all of the observed independence facts in a sample? (ii) Given that there is such a causal explanation what are the causal relationships common to every such causal explanation?

UAI Conference 1995 Conference Paper

Causal Inference in the Presence of Latent Variables and Selection Bias

  • Peter Spirtes
  • Christopher Meek
  • Thomas S. Richardson 0001

We show that there is a general, informative and reliable procedure for discovering causal relations when, for all the investigator knows, both latent variables and selection bias may be at work. Given information about conditional independence and dependence relations between measured variables, even when latent variables and selection bias may be present, there are sufficient conditions for reliably concluding that there is a causal path from one variable to another, and sufficient conditions for reliably concluding when no such causal path exists.

UAI Conference 1995 Conference Paper

Strong completeness and faithfulness in Bayesian networks

  • Christopher Meek

A completeness result for d-separation applied to discrete Bayesian networks is presented and it is shown that in a strong measure-theoretic sense almost all discrete distributions for a given network structure are faithful; i.e. the independence facts true of the distribution are all and only those entailed by the network structure.