Author name cluster

Stephen Bach

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

NeurIPS Conference 2023 Conference Paper

Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

Cristina Menghini
Andrew Delworth
Stephen Bach

Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i. e. , heuristic labels for unlabeled data, to enhance CLIP via prompt tuning. Conventional pseudolabeling trains a model on labeled data and then generates labels for unlabeled data. VLMs' zero-shot capabilities enable a ``second generation'' of pseudolabeling approaches that do not require task-specific training on labeled data. By using zero-shot pseudolabels as a source of supervision, we observe that learning paradigms such as semi-supervised, transductive zero-shot, and unsupervised learning can all be seen as optimizing the same loss function. This unified view enables the development of versatile training strategies that are applicable across learning paradigms. We investigate them on image classification tasks where CLIP exhibits limitations, by varying prompt modalities, e. g. , textual or visual prompts, and learning paradigms. We find that(1) unexplored prompt tuning strategies that iteratively refine pseudolabels consistently improve CLIP accuracy, by 19. 5 points in semi-supervised learning, by 28. 4 points in transductive zero-shot learning, and by 15. 2 points in unsupervised learning, and (2) unlike conventional semi-supervised pseudolabeling, which exacerbates model biases toward classes with higher-quality pseudolabels, prompt tuning leads to a more equitable distribution of per-class accuracy. The code to reproduce the experiments is at https: //github. com/BatsResearch/menghini-neurips23-code.

PDF Details

NeurIPS Conference 2022 Conference Paper

BigBio: A Framework for Data-Centric Biomedical Natural Language Processing

Jason Fries
Leon Weber
Natasha Seelam
Gabriel Altay
Debajyoti Datta
Samuele Garda
Sunny Kang
Rosaline Su

Training and evaluating language models increasingly requires the construction of meta-datasets -- diverse collections of curated data with clear provenance. Natural language prompting has recently lead to improved zero-shot generalization by transforming existing, supervised datasets into a variety of novel instruction tuning tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBio a community library of 126+ biomedical NLP datasets, currently covering 13 task categories and 10+ languages. BigBio facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero shot language model evaluation. We discuss our process for task schema harmonization, data auditing, contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning. BigBio is an ongoing community effort and is available at https: //github. com/bigscience-workshop/biomedical

PDF Details

NeurIPS Conference 2022 Conference Paper

Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes

Alessio Mazzetto
Cristina Menghini
Andrew Yuan
Eli Upfal
Stephen Bach

We develop a rigorous mathematical analysis of zero-shot learning with attributes. In this setting, the goal is to label novel classes with no training data, only detectors for attributes and a description of how those attributes are correlated with the target classes, called the class-attribute matrix. We develop the first non-trivial lower bound on the worst-case error of the best map from attributes to classes for this setting, even with perfect attribute detectors. The lower bound characterizes the theoretical intrinsic difficulty of the zero-shot problem based on the available information---the class-attribute matrix---and the bound is practically computable from it. Our lower bound is tight, as we show that we can always find a randomized map from attributes to classes whose expected error is upper bounded by the value of the lower bound. We show that our analysis can be predictive of how standard zero-shot methods behave in practice, including which classes will likely be confused with others.

PDF Details

TMLR Journal 2022 Journal Article

Zero-Shot Learning with Common Sense Knowledge Graphs

Nihal V. Nayak
Stephen Bach

Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.

PDF Details

AAAI Conference 2020 Conference Paper

Weakly Supervised Sequence Tagging from Noisy Rules

Esteban Safranchik
Shiying Luo
Stephen Bach

We propose a framework for training sequence tagging models with weak supervision consisting of multiple heuristic rules of unknown accuracy. In addition to supporting rules that vote on tags in the output sequence, we introduce a new type of weak supervision, called linking rules, that vote on how sequence elements should be grouped into spans with the same tag. These rules are an alternative to candidate span generators that require signiﬁcantly more human effort. To estimate the accuracies of the rules and combine their con- ﬂicting outputs into training data, we introduce a new type of generative model, linked hidden Markov models (linked HMMs), and prove they are generically identiﬁable (up to a tag permutation) without any observed training labels. We ﬁnd that linked HMMs provide an average 7 F1 point boost on benchmark named entity recognition tasks versus generative models that assume the tags are i. i. d. Further, neural sequence taggers trained with these structure-aware generative models outperform comparable state-of-the-art approaches to weak supervision by an average of 2. 6 F1 points.

PDF Details

NeurIPS Conference 2012 Conference Paper

Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

Stephen Bach
Matthias Broecheler
Lise Getoor
Dianne O'leary

Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints.

PDF Details

NeurIPS Conference 2010 Conference Paper

A Bayesian Approach to Concept Drift

Stephen Bach
Mark Maloof

To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods.

PDF Details