Arrow Research search

Author name cluster

Daniel S. Weld

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers

52

AAAI Conference 2022 Conference Paper

A Search Engine for Discovery of Scientific Challenges and Directions

  • Dan Lahav
  • Jon Saad-Falcon
  • Bailey Kuehl
  • Sophie Johnson
  • Sravanthi Parasa
  • Noam Shomron
  • Duen Horng Chau
  • Diyi Yang
  • Daniel S. Weld

Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge. In biomedicine, this directly impacts human lives. To address this problem, we present a novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery. We construct and release an expert-annotated corpus of texts sampled from full-length papers, labeled with novel semantic categories that generalize across many types of challenges and directions. We focus on a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics. We apply a model trained on our data to identify challenges and directions across the corpus and build a dedicated search engine. In experiments with 19 researchers and clinicians using our system, we outperform a popular scientific search engine in assisting knowledge discovery. Finally, we show that models trained on our resource generalize to the wider biomedical domain and to AI papers, highlighting its broad utility. We make our data, model and search engine publicly available.

AAAI Conference 2021 Conference Paper

Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

  • Gagan Bansal
  • Besmira Nushi
  • Ece Kamar
  • Eric Horvitz
  • Daniel S. Weld

AI practitioners typically strive to develop the most accurate systems, making an implicit assumption that the AI system will function autonomously. However, in practice, AI systems often are used to provide advice to people in domains ranging from criminal justice and finance to healthcare. In such AI-advised decision making, humans and machines form a team, where the human is responsible for making final decisions. But is the most accurate AI the best teammate? We argue “not necessarily” — predictable performance may be worth a slight sacrifice in AI accuracy. Instead, we argue that AI systems should be trained in a human-centered manner, directly optimized for team performance. We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves. To optimize the team performance for this setting we maximize the team’s expected utility, expressed in terms of the quality of the final decision, cost of verifying, and individual accuracies of people and machines. Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance and show the benefit of modeling teamwork during training through improvements in expected team utility across datasets, considering parameters such as human skill and the cost of mistakes. We discuss the shortcomings of current optimization approaches beyond well-studied loss functions such as log-loss, and encourage future work on AI optimization problems motivated by human-AI collaboration.
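
A minimal sketch of the expected-team-utility idea the abstract describes; the function name, parameters, and numbers below are illustrative, not from the paper:

```python
# Sketch of expected team utility in AI-advised decision making (toy values).
def expected_team_utility(p_ai_correct, p_human_correct, p_accept,
                          reward_correct=1.0, cost_wrong=-5.0, cost_verify=-0.1):
    """Expected utility when a human either accepts the AI's recommendation
    (probability p_accept) or pays a verification cost and solves the task."""
    accept = p_ai_correct * reward_correct + (1 - p_ai_correct) * cost_wrong
    solve = cost_verify + (p_human_correct * reward_correct
                           + (1 - p_human_correct) * cost_wrong)
    return p_accept * accept + (1 - p_accept) * solve

# A slightly less accurate but more predictable AI can yield higher team
# utility if it raises the human's acceptance rate:
print(expected_team_utility(0.95, 0.80, p_accept=0.5))  # 0.20
print(expected_team_utility(0.90, 0.80, p_accept=0.9))  # 0.33
```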

AAAI Conference 2019 Conference Paper

Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

  • Gagan Bansal
  • Besmira Nushi
  • Ece Kamar
  • Daniel S. Weld
  • Walter S. Lasecki
  • Eric Horvitz

AI systems are being deployed to support human decision making in high-stakes domains such as healthcare and criminal justice. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to behavioral changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences. We show that updates that increase AI performance may actually hurt team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes classification tasks show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff across different datasets, enabling more compatible yet accurate updates.
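
A rough sketch of a compatibility-penalized retraining loss in the spirit of the abstract's proposal; the names, penalty form, and lambda weight are our assumptions, and a differentiable surrogate would replace the indicator in practice:

```python
import numpy as np

# Log-loss plus a penalty on "new errors": cases the old model classified
# correctly that the updated model now gets wrong (illustrative only).
def compatible_log_loss(y_true, p_new, p_old, lam=0.5):
    eps = 1e-12
    base = -(y_true * np.log(p_new + eps) + (1 - y_true) * np.log(1 - p_new + eps))
    old_correct = (p_old >= 0.5) == (y_true == 1)
    new_wrong = (p_new >= 0.5) != (y_true == 1)
    penalty = (old_correct & new_wrong).astype(float)
    return np.mean(base + lam * penalty)
```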

AAMAS Conference 2016 Conference Paper

Optimal Testing for Crowd Workers

  • Jonathan Bragg
  • Mausam
  • Daniel S. Weld

Requesters on crowdsourcing platforms, such as Amazon Mechanical Turk, routinely insert gold questions to verify that a worker is diligent and is providing high-quality answers. However, there is no clear understanding of when and how many gold questions to insert. Typically, requesters mix a flat 10–30% of gold questions into the task stream of every worker. This static policy is arbitrary and wastes valuable budget — the exact percentage is often chosen with little experimentation, and, more importantly, it does not adapt to individual workers, the current mixture of spamming vs. diligent workers, or the number of tasks workers perform before quitting. We formulate the problem of balancing between (1) testing workers to determine their accuracy and (2) actually getting work performed as a partially-observable Markov decision process (POMDP) and apply reinforcement learning to dynamically calculate the best policy. Evaluations on both synthetic data and with real Mechanical Turk workers show that our agent learns adaptive testing policies that produce up to 111% more reward than the non-adaptive policies used by most requesters. Furthermore, our method is fully automated, easy to apply, and runs mostly out of the box.
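
A toy rendition of the test-vs-work tradeoff, far simpler than the paper's POMDP formulation: maintain a Beta belief over a worker's accuracy and keep inserting gold questions only while that belief is still wide. All names and thresholds are ours:

```python
from scipy.stats import beta

class WorkerBelief:
    def __init__(self):
        self.a, self.b = 1.0, 1.0            # Beta(1,1) prior over accuracy

    def observe_gold(self, correct):
        if correct:
            self.a += 1
        else:
            self.b += 1

    def should_test(self, width=0.3):
        lo, hi = beta.interval(0.9, self.a, self.b)
        return (hi - lo) > width             # test while the 90% CI is wide

w = WorkerBelief()
while w.should_test():
    w.observe_gold(correct=True)             # pretend gold answers are correct
print(w.a, w.b)                              # belief after testing stops
```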

AIJ Journal 2013 Journal Article

POMDP-based control of workflows for crowdsourcing

  • Peng Dai
  • Christopher H. Lin
  • Mausam
  • Daniel S. Weld

Crowdsourcing, outsourcing of tasks to a crowd of unknown people (“workers”) in an open call, is rapidly rising in popularity. It is already being heavily used by numerous employers (“requesters”) for solving a wide variety of tasks, such as audio transcription, content screening, and labeling training data for machine learning. However, quality control of such tasks continues to be a key challenge because of the high variability in worker quality. In this paper we show the value of decision-theoretic techniques for the problem of optimizing workflows used in crowdsourcing. In particular, we design AI agents that use Bayesian network learning and inference in combination with Partially-Observable Markov Decision Processes (POMDPs) for obtaining excellent cost-quality tradeoffs. We use these techniques for three distinct crowdsourcing scenarios: (1) control of voting to answer a binary-choice question, (2) control of an iterative improvement workflow, and (3) control of switching between alternate workflows for a task. In each scenario, we design a Bayes net model that relates worker competency, task difficulty and worker response quality. We also design a POMDP for each task, whose solution provides the dynamic control policy. We demonstrate the usefulness of our models and agents in live experiments on Amazon Mechanical Turk. We consistently achieve higher-quality results than non-adaptive controllers, while incurring equal or lower cost.
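
A minimal sketch of scenario (1), control of voting, assuming known and equal worker accuracies (the paper instead learns a Bayes net and solves a POMDP); names and thresholds are illustrative:

```python
def posterior_true(prior, votes, accuracy=0.8):
    """P(answer is True | votes), assuming independent workers."""
    p_true, p_false = prior, 1 - prior
    for v in votes:
        p_true *= accuracy if v else (1 - accuracy)
        p_false *= (1 - accuracy) if v else accuracy
    return p_true / (p_true + p_false)

votes = [True, False, True]
p = posterior_true(0.5, votes)
# Request another vote only while confidence is low; the POMDP policy
# instead weighs expected quality gain against the marginal labeling cost.
if max(p, 1 - p) < 0.95:
    print("ask another worker; current confidence:", round(max(p, 1 - p), 3))
```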

UAI Conference 2012 Conference Paper

A Theory of Goal-Oriented MDPs with Dead Ends

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld

Stochastic Shortest Path (SSP) MDPs are a problem class widely studied in AI, especially in probabilistic planning. They describe a wide range of scenarios but make the restrictive assumption that the goal is reachable from any state, i.e., that dead-end states do not exist. Because of this, SSPs are unable to model various scenarios that may have catastrophic events (e.g., an airplane possibly crashing if it flies into a storm). Even though MDP algorithms have been used for solving problems with dead ends, a principled theory of SSP extensions that would allow dead ends, including theoretically sound algorithms for solving such MDPs, has been lacking. In this paper, we propose three new MDP classes that admit dead ends under increasingly weaker assumptions. We present Value Iteration-based as well as more efficient heuristic search algorithms for optimally solving each class, and explore theoretical relationships between these classes. We also conduct a preliminary empirical study comparing the performance of our algorithms on different MDP classes, especially on scenarios with unavoidable dead ends.
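
A small sketch of one of the weaker settings alluded to here: value iteration where a finite penalty D caps the cost of any state, so dead ends receive value D rather than infinity. The MDP encoding is our own toy format:

```python
def vi_with_dead_ends(S, A, T, cost, goal, D=100.0, eps=1e-6):
    """S: states; A(s): actions; T[s][a]: list of (prob, next_state);
    cost(s, a): action cost; goal: set of goal states; D: dead-end penalty."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            if s in goal:
                continue
            q = [cost(s, a) + sum(p * V[t] for p, t in T[s][a]) for a in A(s)]
            new_v = min(min(q) if q else float("inf"), D)   # cap values at D
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            return V
```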

UAI Conference 2012 Conference Paper

Crowdsourcing Control: Moving Beyond Multiple Choice

  • Christopher H. Lin
  • Mausam
  • Daniel S. Weld

To ensure quality results from crowdsourced tasks, requesters often aggregate worker responses and use one of a plethora of strategies to infer the correct answer from the set of noisy responses. However, all current models assume prior knowledge of all possible outcomes of the task. While not an unreasonable assumption for tasks that can be posed as multiple-choice questions (e.g. n-ary classification), we observe that many tasks do not naturally fit this paradigm, but instead demand a free-response formulation where the outcome space is of infinite size (e.g. audio transcription). We model such tasks with a novel probabilistic graphical model, and design and implement LazySusan, a decision-theoretic controller that dynamically requests responses as necessary in order to infer answers to these tasks. We also design an EM algorithm to jointly learn the parameters of our model while inferring the correct answers to multiple tasks at a time. Live experiments on Amazon Mechanical Turk demonstrate the superiority of LazySusan at solving SAT Math questions, eliminating 83.2% of the error and achieving greater net utility compared to the state-of-the-art strategy, majority-voting. We also show in live experiments that our EM algorithm outperforms majority-voting on a visualization task that we design.
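
A stripped-down EM sketch in the spirit of the abstract, though closer to a binary Dawid-Skene model than to the paper's free-response model; all names are ours:

```python
import numpy as np

# Alternate between inferring answers (E-step) and re-estimating worker
# accuracies (M-step). labels[i][j] = worker j's answer to task i (binary).
def em_answers(labels, n_iter=20):
    n_tasks, n_workers = labels.shape
    acc = np.full(n_workers, 0.7)                 # initial accuracy guess
    post = np.full(n_tasks, 0.5)
    for _ in range(n_iter):
        log_odds = np.zeros(n_tasks)
        for j in range(n_workers):
            lr = np.log(acc[j] / (1 - acc[j]))
            log_odds += np.where(labels[:, j] == 1, lr, -lr)
        post = 1 / (1 + np.exp(-log_odds))        # P(true answer is 1)
        for j in range(n_workers):
            agree = np.where(labels[:, j] == 1, post, 1 - post)
            acc[j] = np.clip(agree.mean(), 0.01, 0.99)
    return post, acc

labels = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 1]])
post, acc = em_answers(labels)
print(post.round(3), acc.round(3))
```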

AIJ Journal 2012 Journal Article

Discovering hidden structure in factored MDPs

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld

Markov Decision Processes (MDPs) describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today's solution techniques scale poorly, limiting MDPs' practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with less memory than state-of-the-art techniques. Our algorithms discover two complementary state abstractions — basis functions and nogoods. A basis function is a conjunction of literals; if the conjunction holds true in a state, this guarantees the existence of at least one trajectory to the goal. Conversely, a nogood is a conjunction whose presence implies the non-existence of any such trajectory, meaning the state is a dead end. We compute basis functions by regressing goal descriptions through a determinized version of the MDP. Nogoods are constructed with a novel machine learning algorithm that uses basis functions as training data. Our state abstractions can be leveraged in several ways. We describe three diverse approaches — GOTH, a heuristic function for use in heuristic search algorithms such as RTDP; ReTrASE, an MDP solver that performs modified Bellman backups on basis functions instead of states; and SixthSense, a method to quickly detect dead-end states. In essence, our work integrates ideas from deterministic planning and basis function-based approximation, leading to methods that outperform existing approaches by a wide margin.
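
A tiny illustration of the two abstractions defined here, using a toy state encoding of our own (states as sets of true literals, conjunctions as frozensets):

```python
# A conjunction "holds" in a state if all of its literals are true there.
def holds(conjunction, state):
    return conjunction <= state

basis_fn = frozenset({"at_rover_site", "battery_ok"})  # implies goal reachable
nogood = frozenset({"battery_dead"})                   # implies a dead end

state = {"at_rover_site", "battery_ok", "sunny"}
print(holds(basis_fn, state), holds(nogood, state))    # True False
```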

ICAPS Conference 2012 Conference Paper

Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors

  • Andrey Kolobov
  • Peng Dai
  • Mausam
  • Daniel S. Weld

In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primarily based on domain determinization, a technique more suited to goal-oriented MDPs with small branching factors. Moreover, large branching factors render the existing implementations of RTDP- and LAO-style algorithms inefficient as well. In this paper we present GLUTTON, our planner at IPPC-2011 that performed well on these challenging MDPs. The main algorithm used by GLUTTON is LR2TDP, an LRTDP-based optimal algorithm for finite-horizon problems centered around the novel idea of reverse iterative deepening. We detail LR2TDP itself as well as a series of optimizations included in GLUTTON that help LR2TDP achieve competitive performance on difficult problems with large branching factors: subsampling the transition function, separating out natural dynamics, caching transition function samples, and others. Experiments show that GLUTTON and PROST, the IPPC-2011 winner, have complementary strengths, with GLUTTON demonstrating superior performance on problems with few high-reward terminal states.
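
A skeleton of the reverse-iterative-deepening idea as we read the abstract; solve_lrtdp stands in for an LRTDP-style solver and is an assumed helper, not a real API:

```python
# Solve the horizon-1 problem first, then reuse its value function to
# bootstrap horizon 2, and so on up to the full horizon H.
def lr2tdp(mdp, H, solve_lrtdp):
    V = {}                     # V[(s, k)] = value of state s with k steps to go
    for k in range(1, H + 1):
        # Values from horizon k-1 serve as informed leaf estimates,
        # so each deepening round converges quickly.
        V = solve_lrtdp(mdp, horizon=k, initial_values=V)
    return V
```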

ICAPS Conference 2011 Conference Paper

Heuristic Search for Generalized Stochastic Shortest Path MDPs

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld
  • Hector Geffner

Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of the Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper's main contribution is the description of a new class of MDPs that have well-defined optimal solutions but do not comply with either 1) or 2) above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows a more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.

IJCAI Conference 2011 Conference Paper

Towards Scalable MDP Algorithms

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld

The scalability of algorithms for solving Markov Decision Processes (MDPs) has been a limiting factor for MDPs as a modeling tool. This dissertation develops theoretical and empirical techniques for solving larger MDPs than was possible before, and aims to demonstrate the achieved progress by applying these new algorithms to a real-world problem.

AIJ Journal 2010 Journal Article

Automatically generating personalized user interfaces with Supple

  • Krzysztof Z. Gajos
  • Daniel S. Weld
  • Jacob O. Wobbrock

Today's computer–human interfaces are typically designed with the assumption that they are going to be used by an able-bodied person, who is using a typical set of input and output devices, who has typical perceptual and cognitive abilities, and who is sitting in a stable, warm environment. Any deviation from these assumptions may drastically hamper the person's effectiveness—not because of any inherent barrier to interaction, but because of a mismatch between the person's effective abilities and the assumptions underlying the interface design. We argue that automatic personalized interface generation is a feasible and scalable solution to this challenge. We present our Supple system, which can automatically generate interfaces adapted to a person's devices, tasks, preferences, and abilities. In this paper we formally define interface generation as an optimization problem and demonstrate that, despite a large solution space (of up to 10^17 possible interfaces), the problem is computationally feasible. In fact, for a particular class of cost functions, Supple produces exact solutions in under a second for most cases, and in a little over a minute in the worst case encountered, thus enabling run-time generation of user interfaces. We further show how several different design criteria can be expressed in the cost function, enabling different kinds of personalization. We also demonstrate how this approach enables extensive user- and system-initiated run-time adaptations to the interfaces after they have been generated. Supple is not intended to replace human user interface designers—instead, it offers alternative user interfaces for those people whose devices, tasks, preferences, and abilities are not sufficiently addressed by the hand-crafted designs. Indeed, the results of our study show that, compared to manufacturers' defaults, interfaces automatically generated by Supple significantly improve speed, accuracy and satisfaction of people with motor impairments.
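
A toy version of interface generation as optimization, in the spirit of the abstract's framing: pick one widget per UI element, minimize a cost function, and respect a size budget. The widgets, sizes, and costs are invented:

```python
from itertools import product

# Each element offers (widget, size, cost) alternatives.
elements = {
    "volume":  [("slider", 40, 1.0), ("spin_box", 15, 3.0)],
    "channel": [("list", 60, 1.5), ("combo_box", 20, 2.5)],
}

def best_interface(elements, max_size=70):
    best, best_cost = None, float("inf")
    for combo in product(*elements.values()):
        size = sum(w[1] for w in combo)
        cost = sum(w[2] for w in combo)
        if size <= max_size and cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost

print(best_interface(elements))   # slider + combo_box fits the budget
```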

ICAPS Conference 2010 Conference Paper

Classical Planning in MDP Heuristics: with a Little Help from Generalization

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld

Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space robotics to epilepsy management, it may be possible to have an initial training period when suboptimal performance is permitted. For such problems it is important to be able to identify when this training period is complete, and the computed policy can be used with high confidence in its future performance. A simple principled criterion for identifying when training is complete is that the error bounds on the value estimates of the current policy are sufficiently small that the optimal policy is fixed, with high probability. We present an upper bound on the amount of training data required to identify the optimal policy as a function of the unknown separation gap between the optimal and the next-best policy values. We illustrate with several small problems that by estimating this gap in an online manner, the number of training samples needed to provably reach optimality can be significantly lower than predicted offline using a Probably Approximately Correct framework that requires an input epsilon parameter.
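
A minimal sketch of the stopping criterion described above, using a Hoeffding-style bound; the bound form, names, and numbers are our assumptions:

```python
import math

# Training is "done" when the confidence bounds on the top two policies'
# value estimates no longer overlap, i.e. the bound is smaller than the gap.
def training_complete(v_best, v_second, n_samples, value_range=1.0, delta=0.05):
    bound = value_range * math.sqrt(math.log(2 / delta) / (2 * n_samples))
    return (v_best - v_second) > 2 * bound

print(training_complete(0.80, 0.72, n_samples=500))   # False: bounds overlap
```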

AIJ Journal 2010 Journal Article

Panlingual lexical translation via probabilistic inference

  • Mausam
  • Stephen Soderland
  • Oren Etzioni
  • Daniel S. Weld
  • Kobi Reiter
  • Michael Skinner
  • Marcus Sammer
  • Jeff Bilmes

This paper introduces a novel approach to the task of lexical translation between languages for which no translation dictionaries are available. We build a massive translation graph, automatically constructed from over 630 machine-readable dictionaries and Wiktionaries. In this graph each node denotes a word in some language and each edge (v_i, v_j) denotes a word sense shared by v_i and v_j. Our current graph contains over 10,000,000 nodes and expresses more than 60,000,000 pairwise translations. The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B which in turn translates to word C, what is the probability that C is a translation of A? The paper describes a series of probabilistic inference algorithms that solve this problem at varying precision and recall levels. All algorithms enable us to quantify our confidence in a translation derived from the graph, and thus trade precision for recall. We compile the results of our best inference algorithm to yield PanDictionary, a novel multilingual dictionary. PanDictionary contains more than four times as many translations as in the largest Wiktionary at precision 0.90 and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
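
A toy rendition of the transitive-inference question posed here, treating each dictionary hop as independently sense-preserving with probability r; the graph, node naming, and r are illustrative:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("dog@en", "Hund@de"), ("Hund@de", "hond@nl")])

def translation_confidence(G, a, c, r=0.9):
    """Confidence that c translates a, decaying with path length."""
    try:
        hops = nx.shortest_path_length(G, a, c)
    except nx.NetworkXNoPath:
        return 0.0
    return r ** hops

print(translation_confidence(G, "dog@en", "hond@nl"))   # 0.81 for two hops
```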

IJCAI Conference 2009 Conference Paper

ReTrASE: Integrating Paradigms for Approximate Probabilistic Planning

  • Andrey Kolobov
  • Mausam
  • Daniel S. Weld

Past approaches for solving MDPs have several weaknesses: 1) Decision-theoretic computation over the state space can yield optimal results but scales poorly. 2) Value-function approximation typically requires human-specified basis functions and has not been shown successful on nominal (“discrete”) domains such as those in the ICAPS planning competitions. 3) Replanning by applying a classical planner to a determinized domain model can generate approximate policies for very large problems but has trouble handling probabilistic subtlety [Little and Thiebaux, 2007]. This paper presents RETRASE, a novel MDP solver, which combines decision theory, function approximation and classical planning in a new way. RETRASE uses classical planning to create basis functions for value-function approximation and applies expected-utility analysis to this compact space. Our algorithm is memory-efficient and fast (due to its compact, approximate representation), returns high-quality solutions (due to the decision-theoretic framework) and does not require additional knowledge from domain engineers (since we apply classical planning to automatically construct the basis functions). Experiments demonstrate that RETRASE outperforms winners from the past three probabilistic-planning competitions on many hard problems.

IJCAI Conference 2009 Conference Paper

Domain Independent, Automatic Partitioning for Probabilistic Planning

  • Peng Dai
  • Mausam
  • Daniel S. Weld

Recent progress on external-memory MDP solvers, in particular PEMVI [Dai et al., 2008], has enabled optimal solutions to large probabilistic planning problems. However, PEMVI requires a human to manually partition the MDP before the planning algorithm can be applied — putting an added burden on the domain designer and detracting from the vision of automated planning. This paper presents a novel partitioning scheme, which automatically subdivides the state space into blocks that respect the memory constraints. Our algorithm first applies static domain analysis to identify candidates for partitioning, and then uses heuristic search to generate a ‘good’ partition. We evaluate the usefulness of our method in the context of PEMVI across many benchmark domains, showing that it can successfully solve extremely large problems in each domain. We also compare the performance of automatic partitioning with previously reported results using human-designed partitions. Experiments show that our algorithm generates significantly superior partitions, which speed MDP solving and also yield vast memory savings.

ICAPS Conference 2009 Conference Paper

Focused Topological Value Iteration

  • Peng Dai
  • Mausam
  • Daniel S. Weld

Topological value iteration (TVI) is an effective algorithm for solving Markov decision processes (MDPs) optimally, which 1) divides an MDP into strongly-connected components, and 2) solves these components sequentially. Yet, TVI's usefulness tends to degrade if an MDP has large components, because the cost of the division process isn't offset by gains during solution. This paper presents a new algorithm to solve MDPs optimally, focused topological value iteration (FTVI). FTVI addresses TVI's limitations by restricting its attention to connected components that are relevant for solving the MDP. Specifically, FTVI uses a small amount of heuristic search to eliminate provably sub-optimal actions; this pruning allows FTVI to find smaller connected components, thus running faster. We demonstrate that our new algorithm outperforms TVI by an order of magnitude, averaged across several domains. Surprisingly, FTVI also significantly outperforms popular ‘heuristically-informed’ MDP algorithms such as LAO*, LRTDP, and BRTDP in many domains, sometimes by as much as two orders of magnitude. Finally, we characterize the type of domains where FTVI excels — suggesting a way to make an informed choice of solver.
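
A sketch of the core of plain TVI, the baseline FTVI improves upon: decompose the transition graph into strongly connected components and solve them in reverse topological order. solve_component (ordinary value iteration over one block) is an assumed helper:

```python
import networkx as nx

def topological_vi(states, transitions, solve_component):
    """transitions: dict mapping state -> iterable of successor states."""
    G = nx.DiGraph()
    G.add_nodes_from(states)
    for s, succs in transitions.items():
        for t in succs:
            G.add_edge(s, t)
    V = {}
    condensation = nx.condensation(G)             # DAG of SCCs
    for scc_id in reversed(list(nx.topological_sort(condensation))):
        block = condensation.nodes[scc_id]["members"]
        V.update(solve_component(block, V))       # downstream values are final
    return V
```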

AAAI Conference 2008 Conference Paper

Intelligence in Wikipedia

  • Daniel S. Weld
  • Eytan Adar
  • James Fogarty
  • Kayur Patel

The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed initiative interface designed to encourage communal content creation (CCC). Since IE and CCC are each powerful ways to produce large amounts of structured information, they have been studied extensively — but only in isolation. By combining the two methods in a virtuous feedback cycle, we aim for substantial synergy. While previous papers have described the details of individual aspects of our endeavor [25, 26, 24, 13], this report provides an overview of the project’s progress and vision.

IJCAI Conference 2007 Conference Paper

A Hybridized Planner for Stochastic Domains

  • Mausam
  • Piergiorgio Bertoli
  • Daniel S. Weld

Markov Decision Processes are a powerful framework for planning under uncertainty, but current algorithms have difficulties scaling to large problems. We present a novel probabilistic planner based on the notion of hybridizing two algorithms. In particular, we hybridize GPT, an exact MDP solver, with MBP, a planner that plans using a qualitative (non-deterministic) model of uncertainty. Whereas exact MDP solvers produce optimal solutions, qualitative planners sacrifice optimality to achieve speed and high scalability. Our hybridized planner, HybPlan, is able to obtain the best of both techniques --- speed, quality and scalability. Moreover, HybPlan has excellent anytime properties and makes effective use of available time and memory.

ICAPS Conference 2007 Conference Paper

Evaluating Temporal Planning Domains

  • William Cushing
  • Daniel S. Weld
  • Subbarao Kambhampati
  • Mausam
  • Kartik Talamadupula

The last eight years have seen dramatic progress in temporal planning as highlighted by the temporal track in the last three International Planning Competitions (IPC). However, our recent work (Cushing et al. 2007) showed that most of the competition winning planners are only complete for very restricted forms of temporal planning languages that are in a sense indistinguishable from STRIPS. In this paper we consider the impact of those results on the design of benchmark temporal planning domains, and by extension, the temporal planning competition. We start by setting out to verify our speculation that the competition domains are temporally simple. This turns out to be tricky, and we develop a set of increasingly powerful analytic methods for domain analysis. Our analysis establishes that the benchmark domains are indeed inherently sequential (i.e., do not require concurrency). We suggest some real-world domains with required concurrency, and use a compilation argument to show that these are harder in the sense that they correspond to longer sequential plans. We conclude with the argument that temporal planners should be evaluated on both inherently sequential domains as well as those requiring concurrency.

ICAPS Conference 2006 Conference Paper

Challenges for Temporal Planning with Uncertain Durations

  • Mausam
  • Daniel S. Weld

We investigate the problem of temporal planning with concurrent actions having stochastic durations, especially in the context of extended-state-space based planners. The problem is challenging because stochastic durations lead to an explosion in the space of possible decision-epochs, which exacerbates the familiar challenge of growth in executable action combinations caused by concurrency.

ICAPS Conference 2005 Conference Paper

Concurrent Probabilistic Temporal Planning

  • Mausam
  • Daniel S. Weld

Probabilistic planning problems are often modeled as Markov decision processes (MDPs), which assume that a single unit-length action is executed per decision epoch. However, in the real world it is common to execute several actions in parallel. This paper presents efficient methods for solving probabilistic planning problems with concurrent, durative actions. We extend Concurrent MDPs, MDPs which allow multiple instantaneous actions to be executed simultaneously, by adding explicit action durations. We present two novel admissible heuristics and one inadmissible heuristic to speed up convergence. We also develop a novel notion of hybridizing an optimal and an approximate algorithm to yield a hybrid algorithm, which quickly generates high-quality policies. Experiments show that all our heuristics speed up policy construction significantly. Furthermore, our approximate hybrid algorithm runs up to two orders of magnitude faster than other methods, while producing policies close to optimal.

AIJ Journal 2005 Journal Article

Unsupervised named-entity extraction from the Web: An experimental study

  • Oren Etzioni
  • Michael Cafarella
  • Doug Downey
  • Ana-Maria Popescu
  • Tal Shaked
  • Stephen Soderland
  • Daniel S. Weld
  • Alexander Yates

The KnowItAll system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KnowItAll's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KnowItAll extracted over 50,000 class instances, but suggested a challenge: How can we improve KnowItAll's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall (e.g., “chemist” and “biologist” are identified as sub-classes of “scientist”). List Extraction locates lists of class instances, learns a “wrapper” for each list, and extracts elements of each list. Since each method bootstraps from KnowItAll's domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KnowItAll a 4-fold to 8-fold increase in recall at precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
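
KnowItAll learns its extraction rules; the snippet below only shows a single hand-written Hearst-style pattern of the general flavor described, applied to a made-up sentence:

```python
import re

# "<class> such as <ProperNoun>" extraction pattern (illustrative only).
pattern = re.compile(
    r"\b(?:scientists|cities)\s+such\s+as\s+"
    r"([A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)"
)

text = "He admired scientists such as Marie Curie and visited cities such as Seattle."
for match in pattern.finditer(text):
    print(match.group(1))    # -> Marie Curie, Seattle
```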

AAAI Conference 2004 Conference Paper

Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison

  • Oren Etzioni
  • Doug Downey
  • Tal Shaked
  • Daniel S. Weld

Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve KNOWITALL’s recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Rule Learning learns domain-specific extraction rules. Subclass Extraction automatically identifies sub-classes in order to boost recall. List Extraction locates lists of class instances, learns a “wrapper” for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL’s domain-independent methods, no hand-labeled training examples are required. Experiments show the relative coverage of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 19-fold increase in recall, while maintaining high precision, and discovered 10,300 cities missing from the Tipster Gazetteer.

AAAI Conference 2004 Conference Paper

Solving Concurrent Markov Decision Processes

  • Mausam
  • Daniel S. Weld

Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform the existing algorithms producing up to two orders of magnitude speedup.

IJCAI Conference 2003 Conference Paper

Automatically Personalizing User Interfaces

  • Daniel S. Weld
  • Corin Anderson
  • Pedro Domingos
  • Oren Etzioni
  • Krzysztof Gajos
  • Tessa Lau
  • Steve Wolfman

Today's computer interfaces are one-size-fits-all. Users with little programming experience have very limited opportunities to customize an interface to their task and work habits. Furthermore, the overhead induced by generic interfaces will be proportionately greater on small form-factor PDAs, embedded applications and wearable devices. Automatic personalization may greatly enhance user productivity, but it requires advances in customization (explicit, user-initiated change) and adaptation (interface-initiated change in response to routine user behavior). In order to improve customization, we must make it easier for users to direct these changes. In order to improve adaptation, we must better predict user behavior and navigate the inherent tension between the dynamism of automatic adaptation and the stability required in order for the user to predict the computer's behavior and maintain control. This paper surveys a decade's work on customization and adaptation at the University of Washington, distilling the lessons we have learned.

KER Journal 2001 Journal Article

Combining linear programming and satisfiability solving for resource planning

  • Steven A. Wolfman
  • Daniel S. Weld

Compilation to Boolean satisfiability has become a powerful paradigm for solving artificial intelligence problems. However, domains that require metric reasoning cannot be compiled efficiently to satisfiability even if they would otherwise benefit from compilation. We address this problem by combining techniques from the artificial intelligence and operations research communities. In particular, we introduce the LCNF (Linear Conjunctive Normal Form) representation that combines propositional logic with metric constraints. We present LPSAT (Linear Programming plus SATisfiability), an engine that solves LCNF problems by interleaving calls to an incremental Simplex algorithm with systematic satisfaction methods. We explore several techniques for enhancing LPSAT's efficiency and expressive power by adjusting the interaction between the satisfiability and linear programming components of LPSAT. Next, we describe a compiler that converts metric resource planning problems into LCNF for processing by LPSAT. Finally, the experimental section of the paper explores several optimisations to LPSAT, including learning from constraint failure and randomised cutoffs.
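
A drastically simplified LCNF-style satisfiability check, not LPSAT's actual algorithm: enumerate assignments to propositional trigger variables and test the activated metric constraints for feasibility with an LP solver. The constraints and encoding are invented:

```python
from itertools import product
from scipy.optimize import linprog

clauses = [(0, 1)]                    # propositional part: t0 OR t1
triggered = {0: ([[1.0]], [4.0]),     # t0 activates  x <= 4
             1: ([[-1.0]], [-6.0])}   # t1 activates -x <= -6  (i.e. x >= 6)

for assign in product([False, True], repeat=2):
    if not all(any(assign[v] for v in cl) for cl in clauses):
        continue                      # clause unsatisfied; skip assignment
    A_ub, b_ub = [], []
    for t, on in enumerate(assign):
        if on:
            A_ub += triggered[t][0]
            b_ub += triggered[t][1]
    res = linprog(c=[0.0], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)])
    if res.success:                   # LP feasible: found a full LCNF model
        print("model found:", assign, "x =", res.x)
        break
```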

IJCAI Conference 1999 Conference Paper

Temporal Planning with Mutual Exclusion Reasoning

  • David E. Smith
  • Daniel S. Weld

Many planning domains require a richer notion of time in which actions can overlap and have different durations. The key to fast performance in classical planners (e.g., Graphplan, IPP, and Blackbox) has been the use of a disjunctive representation with powerful mutual exclusion reasoning. This paper presents TGP, a new algorithm for temporal planning. TGP operates by incrementally expanding a compact planning graph representation that handles actions of differing duration. The key to TGP performance is tight mutual exclusion reasoning which is based on an expressive language for bounding mutexes and includes mutexes between actions and propositions. Our experiments demonstrate that mutual exclusion reasoning remains valuable in a rich temporal setting.

IJCAI Conference 1999 Conference Paper

The LPSAT Engine & its Application to Resource Planning

  • Steven A. Wolfman
  • Daniel S. Weld

Compilation to boolean satisfiability has become a powerful paradigm for solving AI problems. However, domains that require metric reasoning cannot be compiled efficiently to SAT even if they would otherwise benefit from compilation. We address this problem by introducing the LCNF representation which combines propositional logic with metric constraints. We present LPSAT, an engine which solves LCNF problems by interleaving calls to an incremental simplex algorithm with systematic satisfaction methods. We describe a compiler which converts metric resource planning problems into LCNF for processing by LPSAT. The experimental section of the paper explores several optimizations to LPSAT, including learning from constraint failure and randomized cutoffs.

ICAPS Conference 1998 Conference Paper

Conditional Effects in Graphplan

  • Corin R. Anderson
  • David E. Smith
  • Daniel S. Weld

Graphplan has attracted considerable interest because of its extremely high performance, but the algorithm's inability to handle action representations more expressive than STRIPS is a major limitation. In particular, extending Graphplan to handle conditional effects is a surprisingly subtle enterprise. In this paper, we describe the space of possible alternatives, and then concentrate on one particular approach we call factored expansion. Factored expansion splits an action with conditional effects into several new actions called components, one for each conditional effect. Because these action components are not independent, factored expansion complicates both the mutual exclusion and backward-chaining phases of Graphplan. As compensation, factored expansion often produces dramatically smaller domain models than does the more obvious full expansion into exclusive STRIPS actions. We present experimental results showing that factored expansion dominates full expansion on large problems.
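
A toy sketch of factored expansion as described here, using an action format of our own: one component per conditional effect, each inheriting the action's precondition plus the effect's condition:

```python
# Split an action with conditional (condition, effect) pairs into components.
action = {
    "name": "move",
    "precond": {"at_a"},
    "effects": [({"fragile"}, {"broken"}),   # conditional effect
                (set(), {"at_b"})],          # unconditional effect
}

components = [
    {"name": f"{action['name']}_{i}",
     "precond": action["precond"] | cond,    # precondition plus condition
     "effect": eff}
    for i, (cond, eff) in enumerate(action["effects"])
]
print(components)
```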

AAAI Conference 1998 Conference Paper

Extending Graphplan to Handle Uncertainty & Sensing Actions

  • Daniel S. Weld

If an agent does not have complete information about the world-state, it must reason about alternative possible states of the world and consider whether any of its actions can reduce the uncertainty. Agents controlled by a contingent planner seek to generate a robust plan, one that accounts for and handles all eventualities, in advance of execution. Thus a contingent plan may include sensing actions which gather information that is later used to select between different plan branches. Unfortunately, previous contingent planners suffered defects such as confused semantics, incompleteness, and inefficiency. In this paper we describe SGP, a descendant of Graphplan that solves contingent planning problems. SGP distinguishes between actions that sense the value of an unknown proposition and those that change its value. SGP does not suffer from the forms of incompleteness displayed by CNLP and Cassandra. Furthermore, SGP is relatively fast.

AIJ Journal 1997 Journal Article

Sound and efficient closed-world reasoning for planning

  • Oren Etzioni
  • Keith Golden
  • Daniel S. Weld

Closed-world inference—an essential component of many planning algorithms—is the process of determining that a logical sentence is false based on its absence from a knowledge base, or the inability to derive it. We describe a novel method for closed-world inference and update over the first-order theories of action used by planning algorithms such as NONLIN, TWEAK and UCPOP. We show the method to be sound and efficient, but incomplete. In our experiments, closed-world inference consistently averaged about 2 milliseconds while updates averaged approximately 1.2 milliseconds. Furthermore, we demonstrate that incompleteness is nonproblematic in practice, since our mechanism makes over 99% of the desired inferences. We incorporated our method into the XII planner, which supports our Internet Softbot (software robot). The technique cut the number of actions executed by the Softbot by a factor of one hundred, and resulted in a corresponding speedup to XII.

AAAI Conference 1996 Conference Paper

Declarative Camera Control for Automatic Cinematography

  • David B. Christianson
  • Li-Wei He
  • Daniel S. Weld

Animations generated by interactive 3D computer graphics applications are typically portrayed either from a particular character's point of view or from a small set of strategically-placed viewpoints. By ignoring camera placement, such applications fail to realize important storytelling capabilities that have been explored by cinematographers for many years. In this paper, we describe several of the principles of cinematography and show how they can be formalized into a declarative language, called the Declarative Camera Control Language (DCCL). We describe the application of DCCL within the context of a simple interactive video game and argue that DCCL represents cinematic knowledge at the same level of abstraction as expert directors by encoding 16 idioms from a film textbook. These idioms produce compelling animations, as demonstrated on the accompanying videotape.

ICAPS Conference 1996 Conference Paper

Least-Commitment Action Selection

  • Marc T. Friedman
  • Daniel S. Weld

The principle of least commitment was embraced early in planning research. Hierarchical task networks (HTNs) reason about high-level tasks without committing to specific low-level steps. Partial-order planning arose from a complementary desire to avoid unnecessary ordering commitments. But most of today’s partial-order planning algorithms commit to a single producing step when supporting an open condition -- even when multiple alternatives exist. Each possibility results in a different child plan, and therefore a branch in the search tree. If the various choices have common features, computation may be duplicated unnecessarily on the various branches. The FABIAN planner attempts to avoid such waste. FABIAN separates the decision to add a step from the choice of which step to add. FABIAN supports an open condition by adding an abstract action to the plan representing the disjunction of possible new supporting steps; only later does it refine this choice to a particular concrete action. We show that this abstraction can lead to exponential savings, and argue that in the worst case FABIAN never explores more than a slightly larger space of plans than does a traditional planner such as SNLP.

ICAPS Conference 1996 Invited Paper

Planning-Based Control of Software Agents

  • Daniel S. Weld

The exponential growth of the Internet has produced a labyrinth of databases and services. Almost any type of information is available somewhere, but most users can't find it, and even experts waste untold time searching for appropriate information sources. Many researchers argue that software agents will remedy the situation by acting as personal assistants and automatically accessing multiple sources, integrating information, and acting on the user's behalf. Indeed, this vision is so widely accepted (at least in the AI community) that I won't dwell on it further. Instead, I'll describe some of the software robots we've built at the University of Washington, explain why I think planning is a crucial technology for their control, summarize the lessons we've learned by their construction, and suggest directions for future work in the area. Note that this paper is not intended to be a comprehensive survey of important softbot projects -- there are far too many interesting systems for me to describe them all. I focus on University of Washington projects, because I am most familiar with this body of work and because the Washington softbots have emphasized planning-based control.

AIJ Journal 1995 Journal Article

An algorithm for probabilistic planning

  • Nicholas Kushmerick
  • Steve Hanks
  • Daniel S. Weld

We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present buridan, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete. We then explore buridan's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning.
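
A forward-simulation estimate of a plan's success probability (our illustration; not necessarily one of the four evaluation algorithms the paper compares). The simulate callable is an assumed stand-in for the stochastic action model:

```python
import random

def success_probability(plan, initial_dist, simulate, goal_test, n=10_000):
    """Estimate P(goal) by sampling an initial state and rolling the plan forward."""
    states, weights = zip(*initial_dist.items())
    hits = 0
    for _ in range(n):
        state = random.choices(states, weights)[0]
        for action in plan:
            state = simulate(state, action)   # samples one stochastic outcome
        hits += goal_test(state)
    return hits / n

# The plan is accepted when the estimate exceeds the success threshold.
```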

AIJ Journal 1994 Journal Article

Partial-order planning

  • Anthony Barrett
  • Daniel S. Weld

Although most people believe that planners that delay step-ordering decisions as long as possible are more efficient than those that manipulate totally ordered sequences of actions, this intuition has received little formal justification or empirical validation. In this paper we do both, characterizing the types of domains that offer performance differentiation and the features that distinguish the relative overhead of three planning algorithms. As expected, the partial-order (nonlinear) planner often has an advantage when confronted with problems in which the specific order of the plan steps is critical. We argue that the observed performance differences are best understood with an extension of Korf's taxonomy of subgoal collections. Each planner quickly solved problems whose subgoals were independent or trivially serializable, but problems with laboriously serializable or nonserializable subgoals were intractable for all planners. Since different plan representations induce distinct search spaces, the subgoals for a given problem may be trivially serializable for one planner, laboriously serializable for another, and nonserializable for a third. We contend that the partial-order representation yields superior performance because it more frequently results in trivial serializability.

ICAPS Conference 1994 Conference Paper

Probabilistic Planning with Information Gathering and Contingent Execution

  • Denise Draper
  • Steve Hanks
  • Daniel S. Weld

Most AI representations and algorithms for plan generation have not included the concept of information-producing actions (also called diagnostics, or tests, in the decision making literature). We present a planning representation and algorithm that models information-producing actions and constructs plans that exploit the information produced by those actions. We extend the BURIDAN (Kushmerick et al. 1994) probabilistic planning algorithm, adapting the action representation to model the behavior of imperfect sensors, and combine it with a framework for contingent action that extends the CNLP algorithm (Peot and Smith 1992) for conditional execution. The result, C-BURIDAN, is an implemented planner that builds plans with probabilistic information-producing actions and contingent execution. C-BURIDAN takes as input a probability distribution over initial world states, a goal expression, a set of action descriptions, and a probability threshold, and produces a contingent plan that makes the goal expression true with a probability no less than the threshold.

AIJ Journal 1992 Journal Article

Reasoning about model accuracy

  • Daniel S. Weld

Although computers are widely used to simulate complex physical systems, crafting the underlying models that enable computer analysis remains a difficult job with only a small number of computer tools for support. Our goal is to mechanize this process by building an automated model management system that evaluates simplifying assumptions and selects appropriate perspectives. In this paper we present our initial results: a framework for dynamically changing model accuracy based on model sensitivity analysis. We show how to perform model sensitivity analysis efficiently when one model is a fitting approximation of the other. Finally, we discuss two implementations of our technique in programs that perform (1) query-directed simplification by adding bounding abstractions, and (2) discrepancy-driven refinement. Our programs use a mixture of qualitative and quantitative techniques with both symbolic and numeric computations.

AAAI Conference 1990 Conference Paper

Approximation Reformulations

  • Daniel S. Weld

Although computers are widely used to simulate complex physical systems, crafting the underlying models that enable computer analysis remains difficult. When a model is created for one task, it is often impossible to reuse the model for another purpose because each task requires a different set of simplifying assumptions. By representing modeling assumptions explicitly as approximation reformulations, we have developed qualitative techniques for switching between models. We assume that automated reasoning proceeds in three phases: 1) model selection, 2) quantitative analysis using the model, and 3) validation that the assumptions underlying the model were appropriate for the task at hand. If validation discovers a serious discrepancy between predicted and observed behavior, a new model must be chosen. We present a domain independent method for performing this model shift when the models are related by an approximation reformulation and describe a Common Lisp implementation of the theory.

AIJ Journal 1990 Journal Article

Exaggeration

  • Daniel S. Weld

Exaggeration is a technique for solving comparative analysis problems by considering extreme perturbations to a system. For example, exaggeration answers the question “What happens to the output temperature of a heat exchanger if fluid flow rate increases?” by simulating the behavior of an exchanger with infinite flow rate. Exaggeration is implemented as a sequence of three phases: transform, simulate, and scale. The transform phase takes a comparative analysis problem and generates the description of an exaggerated system. The simulate phase predicts the behavior of the transformed system. Finally, the scale phase compares the original and exaggerated behaviors to answer the original comparative analysis question. This paper explains the theoretical basis of exaggeration, describes an implementation that has solved over fifty problems, and compares exaggeration with the differential qualitative (DQ) analysis approach to comparative analysis.
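
The heat-exchanger question from this abstract, worked as a limit on a toy output-temperature model of our own (not the paper's model):

```python
import sympy as sp

m, k = sp.symbols("m k", positive=True)    # flow rate, heat-transfer constant
T_in, T_wall = sp.symbols("T_in T_wall")
T_out = T_wall + (T_in - T_wall) * sp.exp(-k / m)

# Exaggerate the flow rate to infinity: the fluid exits before it can heat
# up, so the output temperature drops toward the input temperature.
print(sp.limit(T_out, m, sp.oo))           # -> T_in
```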

AIJ Journal 1988 Journal Article

Comparative analysis

  • Daniel S. Weld

Comparative analysis is the problem of predicting how a system will react to perturbations in its parameters, and why. For example, comparative analysis could be asked to explain why the period of an oscillating spring/block system would increase if the mass of the block were larger. This paper formalizes the problem of comparative analysis and presents a technique, differential qualitative (DQ) analysis, which solves the task, providing explanations suitable for use by design systems, automated diagnosis, intelligent tutoring systems, and explanation-based generalization. DQ analysis uses inference rules to deduce qualitative information about the relative change of system parameters. Multiple perspectives are used to represent relative change values over intervals of time. DQ analysis has been implemented, tested on a dozen examples, and proven sound. Unfortunately, the technique is incomplete; it always terminates, but does not always return an answer.
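
The spring/block example from this abstract, checked symbolically: the period T = 2*pi*sqrt(m/k) grows with the mass because its derivative in m is positive:

```python
import sympy as sp

m, k = sp.symbols("m k", positive=True)    # block mass, spring constant
T = 2 * sp.pi * sp.sqrt(m / k)             # period of the oscillator

dT_dm = sp.diff(T, m)
print(sp.simplify(dT_dm))                  # a manifestly positive expression
print(sp.ask(sp.Q.positive(dT_dm)))        # True: larger mass, longer period
```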

AIJ Journal 1986 Journal Article

The use of aggregation in causal simulation

  • Daniel S. Weld

Aggregation is an abstraction technique for dynamically creating new descriptions of a system's behavior. Aggregation works by detecting repeating cycles of processes and creating a continuous process description of the cycle's behavior. Since this behavioral abstraction results in a continuous process, the powerful transition analysis technique may be applied to determine the system's final state. This paper reports on a program which uses aggregation to perform causal simulation in the domain of molecular genetics. A detailed analysis of aggregation indicates the requirements and limitations of the technique as well as problems for future research.