Arrow Research search

Author name cluster

Eric Horvitz

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

79 papers
2 author rows

Possible papers

79

ICLR Conference 2025 Conference Paper

Improving Instruction-Following in Language Models through Activation Steering

  • Alessandro Stolfo
  • Vidhisha Balachandran
  • Safoora Yousefi
  • Eric Horvitz
  • Besmira Nushi

The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. These vectors are computed as the difference in activations between inputs with and without instructions, enabling a modular approach to activation steering. We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion, providing inference-time control over instruction following. Our experiments across four models demonstrate how we can use the activation vectors to guide models to follow constraints even without explicit instructions and to enhance performance when instructions are present. Additionally, we explore the compositionality of activation steering, successfully applying multiple instructions simultaneously. Finally, we demonstrate that steering vectors computed on instruction-tuned models can transfer to improve base models. Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation. Our code and data are available at https://github.com/microsoft/llm-steer-instruct.
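The central computation described above, a steering vector formed as the difference in mean activations with and without an instruction and added back to a hidden state at inference time, can be sketched in a few lines. The toy activation vectors and the `alpha` scale below are illustrative assumptions; the actual method operates on transformer hidden states at a chosen layer.

```python
# Minimal sketch of difference-of-activations steering on toy vectors.

def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def steering_vector(acts_with_instruction, acts_without):
    """Instruction vector = mean activation difference (with - without)."""
    mw = mean_vec(acts_with_instruction)
    mo = mean_vec(acts_without)
    return [a - b for a, b in zip(mw, mo)]

def steer(hidden, vec, alpha=1.0):
    """Add the scaled steering vector to a hidden state at inference time."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

# Toy activations: the instruction shifts the second coordinate by ~1.
with_instr = [[0.1, 1.0], [0.3, 1.2]]
without    = [[0.1, 0.0], [0.3, 0.2]]
v = steering_vector(with_instr, without)   # approx. [0.0, 1.0]
steered = steer([0.5, 0.5], v, alpha=1.0)  # approx. [0.5, 1.5]
```

Composing instructions, as the abstract describes, amounts to adding several such vectors (possibly with different `alpha` values) before the forward pass continues.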

ICLR Conference 2025 Conference Paper

Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification

  • Santiago Cortes-Gomez
  • Carlos Miguel Patiño
  • Yewon Byun
  • Zhiwei Steven Wu
  • Eric Horvitz
  • Bryan Wilder

There is increasing interest in "decision-focused" machine learning methods which train models to account for how their predictions are used in downstream optimization problems. Doing so can often improve performance on subsequent decision problems. However, current methods for uncertainty quantification do not incorporate any information at all about downstream decisions. We develop a framework based on conformal prediction to produce prediction sets that account for a downstream decision loss function, making them more appropriate to inform high-stakes decision-making. Our approach harnesses the strengths of conformal methods—modularity, model-agnosticism, and statistical coverage guarantees—while incorporating downstream decisions and user-specified utility functions. We prove that our methods retain standard coverage guarantees. Empirical evaluation across a range of datasets and utility metrics demonstrates that our methods achieve significantly lower decision loss compared to standard conformal methods. Additionally, we present a real-world use case in healthcare diagnosis, where our method effectively incorporates the hierarchical structure of dermatological diseases. It successfully generates sets with coherent diagnostic meaning, aiding the triage process during dermatology diagnosis and illustrating how our method can ground high-stakes decision-making on external domain knowledge.
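As background, the standard split conformal procedure that this framework extends can be sketched as follows. The simulated calibration scores, the 3-class toy output, and `alpha = 0.1` are illustrative assumptions; the paper's utility-weighted set construction is not shown here.

```python
import math
import random

random.seed(0)

def conformal_threshold(cal_scores, alpha):
    """(1 - alpha) quantile of calibration scores, using the
    finite-sample conformal rank ceil((n + 1) * (1 - alpha))."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """Keep every class whose nonconformity score 1 - p is at most qhat."""
    return {y for y, p in enumerate(probs) if 1 - p <= qhat}

# Simulated calibration split: score = 1 - model probability of true class.
cal_scores = [1 - random.uniform(0.5, 1.0) for _ in range(200)]
qhat = conformal_threshold(cal_scores, alpha=0.1)

probs = [0.7, 0.2, 0.1]            # toy softmax output for one test input
s = prediction_set(probs, qhat)
print(qhat, s)
```

The decision-aware variant keeps this calibration machinery (and hence the coverage guarantee) but changes which candidate labels enter the set based on a downstream utility function.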

AAAI Conference 2024 Conference Paper

When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming

  • Hussein Mozannar
  • Gagan Bansal
  • Adam Fourney
  • Eric Horvitz

AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving productivity. We pursue mechanisms for leveraging signals about programmers' acceptance and rejection of code suggestions to guide recommendations. We harness data drawn from interactions with GitHub Copilot, a system used by millions of programmers, to develop interventions that can save time for programmers. We introduce a utility-theoretic framework to drive decisions about suggestions to display versus withhold. The approach, conditional suggestion display from human feedback (CDHF), relies on a cascade of models that provide the likelihood that recommended code will be accepted. These likelihoods are used to selectively hide suggestions, reducing both latency and programmer verification time. Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that we can avoid displaying a significant fraction of suggestions that would have been rejected. We further demonstrate the importance of incorporating the programmer's latent unobserved state in decisions about when to display suggestions through an ablation study. Finally, we showcase how using suggestion acceptance as a reward signal for guiding the display of suggestions can lead to suggestions of reduced quality, indicating an unexpected pitfall.
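The display-versus-withhold decision can be illustrated as a simple expected-utility threshold on the predicted acceptance probability. The time costs below are hypothetical numbers, not the paper's measured values, and the single-threshold rule is a simplification of the CDHF cascade.

```python
def expected_gain(p_accept, time_saved=10.0, verify_cost=3.0):
    """Expected seconds gained by displaying a suggestion: an accepted
    suggestion saves time_saved minus the reading cost; a rejected one
    only costs verification time. All costs here are hypothetical."""
    return p_accept * (time_saved - verify_cost) - (1 - p_accept) * verify_cost

def should_display(p_accept, time_saved=10.0, verify_cost=3.0):
    """Show the suggestion only when its expected gain is positive,
    i.e. when p_accept exceeds verify_cost / time_saved."""
    return expected_gain(p_accept, time_saved, verify_cost) > 0

print(should_display(0.5))   # True: likely enough to be worth showing
print(should_display(0.1))   # False: withhold to save verification time
```

Setting the gain to zero gives the break-even acceptance probability `verify_cost / time_saved` (0.3 with these numbers), which is the threshold an acceptance-likelihood model would be compared against.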

AAAI Conference 2022 Conference Paper

A Search Engine for Discovery of Scientific Challenges and Directions

  • Dan Lahav
  • Jon Saad Falcon
  • Bailey Kuehl
  • Sophie Johnson
  • Sravanthi Parasa
  • Noam Shomron
  • Duen Horng Chau
  • Diyi Yang

Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge. In biomedicine, this directly impacts human lives. To address this problem, we present a novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery. We construct and release an expert-annotated corpus of texts sampled from full-length papers, labeled with novel semantic categories that generalize across many types of challenges and directions. We focus on a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics. We apply a model trained on our data to identify challenges and directions across the corpus and build a dedicated search engine. In experiments with 19 researchers and clinicians using our system, we outperform a popular scientific search engine in assisting knowledge discovery. Finally, we show that models trained on our resource generalize to the wider biomedical domain and to AI papers, highlighting its broad utility. We make our data, model and search engine publicly available.

ICML Conference 2021 Conference Paper

Exploiting structured data for learning contagious diseases under incomplete testing

  • Maggie Makar
  • Lauren West
  • David Hooper
  • Eric Horvitz
  • Erica Shenoy
  • John V. Guttag

One of the ways that machine learning algorithms can help control the spread of an infectious disease is by building models that predict who is likely to become infected, making them good candidates for preemptive interventions. In this work we ask: can we build reliable infection prediction models when the observed data is collected under limited and biased testing that prioritizes testing symptomatic individuals? Our analysis suggests that when the infection is highly transmissible, incomplete testing might be sufficient to achieve good out-of-sample prediction error. Guided by this insight, we develop an algorithm that predicts infections and show that it outperforms baselines on simulated data. We apply our model to data from a large hospital to predict Clostridioides difficile infections, a communicable disease characterized by both symptomatically infected and asymptomatic (i.e., untested) carriers. Using a proxy instead of the unobserved untested-infected state, we show that our model outperforms benchmarks in predicting infections.

AAAI Conference 2021 Conference Paper

Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

  • Gagan Bansal
  • Besmira Nushi
  • Ece Kamar
  • Eric Horvitz
  • Daniel S. Weld

AI practitioners typically strive to develop the most accurate systems, making an implicit assumption that the AI system will function autonomously. However, in practice, AI systems often are used to provide advice to people in domains ranging from criminal justice and finance to healthcare. In such AI-advised decision making, humans and machines form a team, where the human is responsible for making final decisions. But is the most accurate AI the best teammate? We argue “not necessarily”: predictable performance may be worth a slight sacrifice in AI accuracy. Instead, we argue that AI systems should be trained in a human-centered manner, directly optimized for team performance. We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves. To optimize the team performance for this setting we maximize the team’s expected utility, expressed in terms of the quality of the final decision, cost of verifying, and individual accuracies of people and machines. Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance, and show the benefit of modeling teamwork during training through improvements in expected team utility across datasets, considering parameters such as human skill and the cost of mistakes. We discuss the shortcomings of current optimization approaches beyond well-studied loss functions such as log-loss, and encourage future work on AI optimization problems motivated by human-AI collaboration.
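The "predictable performance may beat raw accuracy" argument can be made concrete with one line of arithmetic. All accuracies below are hypothetical numbers chosen for illustration, not the paper's experimental values.

```python
def team_accuracy(ai_acc, human_acc_on_overrides, p_error_predicted):
    """Team accuracy when the human overrides the AI only on the fraction
    of AI errors they can anticipate, and accepts it otherwise."""
    caught = (1 - ai_acc) * p_error_predicted   # AI errors the human takes over
    return ai_acc + caught * human_acc_on_overrides

# AI-1: 95% accurate, errors unpredictable, so the human accepts everything.
team1 = team_accuracy(0.95, human_acc_on_overrides=0.80, p_error_predicted=0.0)
# AI-2: 90% accurate, but its errors are predictable; the human solves those.
team2 = team_accuracy(0.90, human_acc_on_overrides=0.80, p_error_predicted=1.0)
print(team1, team2)   # 0.95 vs 0.98: the less accurate AI is the better teammate
```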

JAIR Journal 2020 Journal Article

Blind Spot Detection for Safe Sim-to-Real Transfer

  • Ramya Ramakrishnan
  • Ece Kamar
  • Debadeepta Dey
  • Eric Horvitz
  • Julie Shah

Agents trained in simulation may make errors when performing actions in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult for the agent to discover because the agent is unable to predict them a priori. In this work, we propose the use of oracle feedback to learn a predictive model of these blind spots in order to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: when the agent lacks necessary features to represent the true state of the world, and thus cannot distinguish between numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. Our system learns models for predicting blind spots within unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. These models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach across two domains and demonstrate that it achieves higher predictive performance than baseline methods, and also that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how these biases influence the discovery of blind spots. Further, we include analyses of our approach that incorporate relaxed initial optimality assumptions. (Interestingly, relaxing the assumptions of an optimal oracle and an optimal simulator policy helped our models to perform better.) We also propose extensions to our method that are intended to improve performance when using corrections and demonstrations data.

IJCAI Conference 2020 Conference Paper

Learning to Complement Humans

  • Bryan Wilder
  • Eric Horvitz
  • Ece Kamar

A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks. To date, systems aimed at complementing the skills of people have employed models trained to be as accurate as possible in isolation. We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams by considering the distinct abilities of people and machines. The goal is to focus machine learning on problem instances that are difficult for humans, while recognizing instances that are difficult for the machine and seeking human input on them. We demonstrate in two real-world domains (scientific discovery and medical diagnosis) that human-machine teams built via these methods outperform the individual performance of machines and people. We then analyze conditions under which this complementarity is strongest, and which training methods amplify it. Taken together, our work provides the first systematic investigation of how machine learning systems can be trained to complement human reasoning.

AAAI Conference 2020 Conference Paper

Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations

  • Aditya Modi
  • Debadeepta Dey
  • Alekh Agarwal
  • Adith Swaminathan
  • Besmira Nushi
  • Sean Andrist
  • Eric Horvitz

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. The challenge of doing system-wide optimization is a combinatorial problem. Local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in overall utility of the system’s performance as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system’s operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.

NeurIPS Conference 2019 Conference Paper

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting

  • Aditya Grover
  • Jiaming Song
  • Ashish Kapoor
  • Kenneth Tran
  • Alekh Agarwal
  • Eric Horvitz
  • Stefano Ermon

A learned generative model often produces biased statistics relative to the underlying data distribution. A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio under model and true distributions. When the likelihood ratio is unknown, it can be estimated by training a probabilistic classifier to distinguish samples from the two distributions. We employ this likelihood-free importance weighting method to correct for the bias in generative models. We find that this technique consistently improves standard goodness-of-fit metrics for evaluating the sample quality of state-of-the-art deep generative models, suggesting reduced bias. Finally, we demonstrate its utility on representative applications in a) data augmentation for classification using generative adversarial networks, and b) model-based policy evaluation using off-policy data.
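The classifier-based ratio estimate described above can be sketched with one-dimensional Gaussians standing in for the data distribution and a biased generative model. The distributions, sample sizes, and the plain gradient-descent logistic fit are illustrative assumptions; the idea carried over from the abstract is that the classifier's odds ratio p/(1-p) estimates the density ratio used as an importance weight.

```python
import math
import random

random.seed(1)
real  = [random.gauss(0.0, 1.0) for _ in range(1000)]   # "true" data
model = [random.gauss(0.5, 1.0) for _ in range(1000)]   # biased generator

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit logistic regression p(real | x) = sigmoid(a*x + b) by gradient descent.
pairs = [(x, 1.0) for x in real] + [(x, 0.0) for x in model]
a = b = 0.0
lr = 0.1
for _ in range(200):
    ga = gb = 0.0
    for x, y in pairs:
        err = sigmoid(a * x + b) - y
        ga += err * x
        gb += err
    a -= lr * ga / len(pairs)
    b -= lr * gb / len(pairs)

# Importance weight w(x) = p/(1-p), i.e. the classifier's odds exp(a*x + b).
weights = [math.exp(a * x + b) for x in model]
weighted_mean = sum(w * x for w, x in zip(weights, model)) / sum(weights)
unweighted_mean = sum(model) / len(model)
print(unweighted_mean, weighted_mean)   # reweighting pulls the mean toward 0
```

Here the raw model samples have mean near 0.5 (the bias), while the weighted estimate recovers a mean near the data's true 0, which is the bias correction the abstract applies to goodness-of-fit metrics and downstream applications.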

NeurIPS Conference 2019 Conference Paper

Efficient Forward Architecture Search

  • Hanzhang Hu
  • John Langford
  • Rich Caruana
  • Saurajit Mukherjee
  • Eric Horvitz
  • Debadeepta Dey

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed algorithm is motivated by the feature selection algorithm forward stage-wise linear regression, since we consider NAS as a generalization of feature selection for regression, where NAS selects shortcuts among layers instead of selecting features. In order to reduce the number of trials of possible connection combinations, we train jointly all possible connections at each stage of growth while leveraging feature selection techniques to choose a subset of them. We experimentally show this process to be an efficient forward architecture search algorithm that can find competitive models using few GPU days in both the search space of repeatable network modules (cell-search) and the space of general networks (macro-search). Petridish is particularly well-suited for warm-starting from existing models, which is crucial for lifelong-learning scenarios.

AAAI Conference 2019 Conference Paper

Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution

  • Ramya Ramakrishnan
  • Ece Kamar
  • Besmira Nushi
  • Debadeepta Dey
  • Julie Shah
  • Eric Horvitz

Simulators are being increasingly used to train agents before deploying them in real-world environments. While training in simulation provides a cost-effective way to learn, poorly modeled aspects of the simulator can lead to costly mistakes, or blind spots. While humans can help guide an agent towards identifying these error regions, humans themselves have blind spots and noise in execution. We study how learning about the blind spots of both can be used to manage hand-off decisions when humans and agents jointly act in the real world, in which neither of them is trained or evaluated fully. The formulation assumes that agent blind spots result from representational limitations in the simulation world, which leads the agent to ignore important features that are relevant for acting in the open world. Our approach for blind spot discovery combines experiences collected in simulation with limited human demonstrations. The first step applies imitation learning to demonstration data to identify important features that the human is using but that the agent is missing. The second step uses noisy labels extracted from action mismatches between the agent and the human across simulation and demonstration data to train blind spot models. We show through experiments on two domains that our approach is able to learn a succinct representation that accurately captures blind spot regions and avoids dangerous errors in the real world through transfer of control between the agent and the human.

AAAI Conference 2019 Conference Paper

Reverse-Engineering Satire, or “Paper on Computational Humor Accepted despite Making Serious Advances”

  • Robert West
  • Eric Horvitz

Humor is an essential human trait. Efforts to understand humor have called out links between humor and the foundations of cognition, as well as the importance of humor in social engagement. As such, it is a promising and important subject of study, with relevance for artificial intelligence and human–computer interaction. Previous computational work on humor has mostly operated at a coarse level of granularity, e.g., predicting whether an entire sentence, paragraph, document, etc., is humorous. As a step toward deep understanding of humor, we seek fine-grained models of attributes that make a given text humorous. Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines. The corpus is constructed via Unfun.me, an online game that incentivizes players to make minimal edits to satirical headlines with the goal of making other players believe the results are serious headlines. The edit operations used to successfully remove humor pinpoint the words and concepts that play a key role in making the original, satirical headline funny. Our analysis reveals that the humor tends to reside toward the end of headlines, and primarily in noun phrases, and that most satirical headlines follow a certain logical pattern, which we term false analogy. Overall, this paper deepens our understanding of the syntactic and semantic structure of satirical news headlines and provides insights for building humor-producing systems.

NeurIPS Conference 2019 Conference Paper

Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

  • Andrey Kolobov
  • Yuval Peres
  • Cheng Lu
  • Eric Horvitz

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages). We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters. Experiments on 18.5M URLs crawled daily for 14 weeks show significant advantages of this approach over prior art.

AAAI Conference 2019 Conference Paper

Traffic Updates: Saying a Lot While Revealing a Little

  • John Krumm
  • Eric Horvitz

Taking speed reports from vehicles is a proven, inexpensive way to infer traffic conditions. However, due to concerns about privacy and bandwidth, not every vehicle occupant may want to transmit data about their location and speed in real time. We show how to drastically reduce the number of transmissions in two ways, both based on a Markov random field for modeling traffic speed and flow. First, we show that only a small number of vehicles need to report from each location. We give a simple, probabilistic method that lets a group of vehicles decide on which subset will transmit a report, preserving privacy by coordinating without any communication. The second approach computes the potential value of any location’s speed report, emphasizing those reports that will most affect the overall speed inferences, and omitting those that contribute little value. Both methods significantly reduce the amount of communication necessary for accurate speed inferences on a road network.
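One way to coordinate without communication, sketched here as a guess at the flavor of the abstract's first method rather than its actual construction: if roughly n vehicles occupy a segment, each can transmit independently with probability k/n, so about k reports arrive without any vehicle learning about, or signaling to, the others. The segment size and target count are hypothetical.

```python
import random

random.seed(2)

def reports(n_vehicles, k_target):
    """Each vehicle independently transmits with probability k/n; no
    communication or coordination between vehicles is required."""
    p = min(1.0, k_target / n_vehicles)
    return sum(1 for _ in range(n_vehicles) if random.random() < p)

# About k reports arrive per segment regardless of how many vehicles pass.
avg_reports = sum(reports(100, 3) for _ in range(5000)) / 5000
print(avg_reports)   # close to 3
```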

AAAI Conference 2019 Conference Paper

Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

  • Gagan Bansal
  • Besmira Nushi
  • Ece Kamar
  • Daniel S. Weld
  • Walter S. Lasecki
  • Eric Horvitz

AI systems are being deployed to support human decision making in high-stakes domains such as healthcare and criminal justice. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to behavioral changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences. We show that updates that increase AI performance may actually hurt team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes classification tasks show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff across different datasets, enabling more compatible yet accurate updates.

AAMAS Conference 2018 Conference Paper

Discovering Blind Spots in Reinforcement Learning

  • Ramya Ramakrishnan
  • Ece Kamar
  • Debadeepta Dey
  • Julie Shah
  • Eric Horvitz

Agents trained in simulation may make errors in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult to discover because the agent cannot predict them a priori. We propose using oracle feedback to learn a predictive model of these blind spots to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: The agent does not have the appropriate features to represent the true state of the world and thus cannot distinguish among numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. We learn models to predict blind spots in unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. The models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach on two domains and show that it achieves higher predictive performance than baseline methods, and that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how they influence the discovery of blind spots.

AAAI Conference 2018 Conference Paper

Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science

  • Avi Segal
  • Kobi Gal
  • Ece Kamar
  • Eric Horvitz
  • Grant Miller

Volunteers who help with online crowdsourcing such as citizen science tasks typically make only a few contributions before exiting. We propose a computational approach for increasing users’ engagement in such settings that is based on optimizing policies for displaying motivational messages to users. The approach, which we refer to as Trajectory Corrected Intervention (TCI), reasons about the tradeoff between the long-term influence of engagement messages on participants’ contributions and the potential risk of disrupting their current work. We combine model-based reinforcement learning with offline policy evaluation to generate intervention policies, without relying on a fixed representation of the domain. TCI works iteratively to learn the best representation from a set of random intervention trials and to generate candidate intervention policies. It is able to refine selected policies offline by exploiting the fact that users can only be interrupted once per session. We implemented TCI in the wild with Galaxy Zoo, one of the largest citizen science platforms on the web. We found that TCI was able to outperform the state-of-the-art intervention policy for this domain, and significantly increased the contributions of thousands of users. This work demonstrates the benefit of combining traditional AI planning with offline policy methods to generate intelligent intervention strategies.

NeurIPS Conference 2017 Conference Paper

Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach

  • Emmanouil Platanios
  • Hoifung Poon
  • Tom Mitchell
  • Eric Horvitz

We propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For example, a set of classes may be mutually exclusive, meaning that a data instance can belong to at most one of them. The proposed method is based on the intuition that: (i) when classifiers agree, they are more likely to be correct, and (ii) when the classifiers make a prediction that violates the constraints, at least one classifier must be making an error. Experiments on four real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies and combining multiple classifier outputs. The results emphasize the utility of logical constraints in estimating accuracy, thus validating our intuition.
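A toy version of the agreement intuition, assuming two independent classifiers with equal unknown accuracy: this is a deliberate simplification for illustration, not the paper's probabilistic-logic formulation. For such a pair, P(agree) = a² + (1-a)², so the common accuracy a can be recovered from the agreement rate measured on unlabeled data alone.

```python
import math
import random

random.seed(3)
TRUE_ACC = 0.8    # hidden from the estimator; used only to simulate

def noisy_prediction(label):
    """A classifier that is correct with probability TRUE_ACC."""
    return label if random.random() < TRUE_ACC else 1 - label

labels = [random.randint(0, 1) for _ in range(20000)]   # never seen below
pred_a = [noisy_prediction(y) for y in labels]          # classifier A
pred_b = [noisy_prediction(y) for y in labels]          # classifier B

# For independent classifiers of equal accuracy a:
#   P(agree) = a^2 + (1-a)^2  =>  a = (1 + sqrt(2*P(agree) - 1)) / 2
agree_rate = sum(p == q for p, q in zip(pred_a, pred_b)) / len(labels)
a_hat = (1 + math.sqrt(2 * agree_rate - 1)) / 2
print(a_hat)    # close to the hidden 0.8, using no ground-truth labels
```

The paper generalizes well beyond this: classifiers may differ in accuracy, and logical constraints (such as mutual exclusivity across problems) supply additional error signals when predictions violate them.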

AAAI Conference 2017 Conference Paper

Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

  • Himabindu Lakkaraju
  • Ece Kamar
  • Rich Caruana
  • Eric Horvitz

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned by the predictive model, and then utilizes an explore-exploit strategy for discovering unknown unknowns across these partitions. We demonstrate the efficacy of our framework by varying the underlying causes of unknown unknowns across various applications. To the best of our knowledge, this paper presents the first algorithmic approach to the problem of discovering unknown unknowns of predictive models.

AAAI Conference 2017 Conference Paper

Long-Term Trends in the Public Perception of Artificial Intelligence

  • Ethan Fast
  • Eric Horvitz

Analyses of text corpora over time can reveal trends in beliefs, interest, and sentiment about a topic. We focus on views expressed about artificial intelligence (AI) in the New York Times over a 30-year period. General interest, awareness, and discussion about AI has waxed and waned since the field was founded in 1956. We present a set of measures that captures levels of engagement, measures of pessimism and optimism, the prevalence of specific hopes and concerns, and topics that are linked to discussions about AI over decades. We find that discussion of AI has increased sharply since 2009, and that these discussions have been consistently more optimistic than pessimistic. However, when we examine specific concerns, we find that worries of loss of control of AI, ethical concerns for AI, and the negative impact of AI on work have grown in recent years. We also find that hopes for AI in healthcare and education have increased over time.

AAAI Conference 2017 Conference Paper

On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

  • Besmira Nushi
  • Ece Kamar
  • Eric Horvitz
  • Donald Kossmann

We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple points in the execution workflow. Moreover, errors can propagate, become amplified or be suppressed, making blame assignment difficult. We propose a human-in-the-loop methodology which leverages human intellect for troubleshooting system failures. The approach simulates potential component fixes through human computation tasks and measures the expected improvements in the holistic behavior of the system. The method provides guidance to designers about how they can best improve the system. We demonstrate the effectiveness of the approach on an automated image captioning system that has been pressed into real-world use.

AAAI Conference 2017 Short Paper

Predicting Mortality of Intensive Care Patients via Learning about Hazard

  • Dae Lee
  • Eric Horvitz

Patients in intensive care units (ICUs) are acutely ill and have the highest mortality rates among hospitalized patients. Predictive models and planning systems could forecast and guide interventions to prevent the hazardous deterioration of patients' physiologies, creating opportunities to employ machine learning and inference to assist with the care of ICU patients. We report on the construction of a prediction pipeline that estimates the probability of death by inferring rates of hazard over time, based on patients' physiological measurements. The inferred model provided the contribution of each variable and information about the influence of sets of observations on the overall risks and expected trajectories of patients.
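The abstract's central quantity, a probability of death inferred from time-varying hazard rates, can be illustrated with the standard discrete-time survival identity (a minimal sketch; the function name and the example hazards are illustrative, not from the paper):

```python
def mortality_probability(daily_hazards):
    """Convert a sequence of inferred daily hazard rates into the
    cumulative probability of death over the stay, via the discrete
    survival identity P(death by day T) = 1 - prod_t (1 - h_t)."""
    survival = 1.0
    for h in daily_hazards:
        survival *= (1.0 - h)
    return 1.0 - survival

# Example: rising hazard over a five-day ICU stay.
print(mortality_probability([0.01, 0.02, 0.05, 0.10, 0.15]))
```

Under this identity, survival through day T is the product of the per-day survival terms, so a rise in late-stay hazard sharply increases cumulative risk.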

IJCAI Conference 2016 Conference Paper

Intervention Strategies for Increasing Engagement in Crowdsourcing: Platform, Predictions, and Experiments

  • Avi Segal
  • Ya'akov (Kobi) Gal
  • Ece Kamar
  • Eric Horvitz
  • Alex Bowyer
  • Grant Miller

Volunteer-based crowdsourcing depends critically on maintaining the engagement of participants. We explore a methodology for extending engagement in citizen science by combining machine learning with intervention design. We first present a platform for using real-time predictions about forthcoming disengagement to guide interventions. Then we discuss a set of experiments with delivering different messages to users based on the proximity to the predicted time of disengagement. The messages address motivational factors that were found in prior studies to influence users' engagement. We evaluate this approach on Galaxy Zoo, one of the largest citizen science applications on the web, where we traced the behavior and contributions of thousands of users who received intervention messages over a period of a few months. We found that the amount of user contributions was sensitive to both the timing and nature of the message. Specifically, we found that a message emphasizing the helpfulness of individual users significantly increased users' contributions when delivered according to predicted times of disengagement, but not when delivered at random times. The influence of the message on users' contributions was more pronounced as additional user data was collected and made available to the classifier.

JMLR Journal 2016 Journal Article

Patient Risk Stratification with Time-Varying Parameters: A Multitask Learning Approach

  • Jenna Wiens
  • John Guttag
  • Eric Horvitz

The proliferation of electronic health records (EHRs) frames opportunities for using machine learning to build models that help healthcare providers improve patient outcomes. However, building useful risk stratification models presents many technical challenges including the large number of factors (both intrinsic and extrinsic) influencing a patient's risk of an adverse outcome and the inherent evolution of that risk over time. We address these challenges in the context of learning a risk stratification model for predicting which patients are at risk of acquiring a Clostridium difficile infection (CDI). We take a novel data-centric approach, leveraging the contents of EHRs from nearly 50,000 hospital admissions. We show how, by adapting techniques from multitask learning, we can learn models for patient risk stratification with unprecedented classification performance. Our model, based on thousands of variables, both time-varying and time-invariant, changes over the course of a patient admission. Applied to a held out set of approximately 25,000 patient admissions, we achieve an area under the receiver operating characteristic curve of 0.81 (95% CI 0.78-0.84). The model has been integrated into the health record system at a large hospital in the US, and can be used to produce daily risk estimates for each inpatient. While more complex than traditional risk stratification methods, the widespread development and use of such data-driven models could ultimately enable cost-effective, targeted prevention strategies that lead to better patient outcomes.

IJCAI Conference 2015 Conference Paper

Information Gathering in Networks via Active Exploration

  • Adish Singla
  • Eric Horvitz
  • Pushmeet Kohli
  • Ryen White
  • Andreas Krause

How should we gather information in a network, where each node’s visibility is limited to its local neighborhood? This problem arises in numerous real-world applications, such as surveying and task routing in social networks, team formation in collaborative networks and experimental design with dependency constraints. Often the informativeness of a set of nodes can be quantified via a submodular utility function. Existing approaches for submodular optimization, however, require that the set of all nodes that can be selected is known ahead of time, which is often unrealistic. In contrast, we propose a novel model where we start our exploration from an initial node, and new nodes become visible and available for selection only once one of their neighbors has been chosen. We then present a general algorithm NETEXP for this problem, and provide theoretical bounds on its performance dependent on structural properties of the underlying network. We evaluate our methodology on various simulated problem instances as well as on data collected from a social question-answering system deployed within a large enterprise.
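The selection model described above, where nodes become available only after a neighbor is chosen, can be sketched as a greedy marginal-gain loop (a toy illustration under a coverage-style utility; `net_explore` and the example graph are hypothetical stand-ins, not the paper's NETEXP algorithm):

```python
def net_explore(neighbors, utility, start, budget):
    """Greedy information gathering under local visibility: only nodes
    adjacent to an already-selected node can be selected next.
    `neighbors` maps node -> list of neighbors; `utility` is a set
    function over selected nodes (assumed submodular)."""
    selected = [start]
    frontier = set(neighbors[start])
    while len(selected) < budget and frontier:
        # Pick the visible node with the largest marginal gain.
        best = max(frontier,
                   key=lambda v: utility(selected + [v]) - utility(selected))
        selected.append(best)
        frontier |= set(neighbors[best])  # newly visible neighbors
        frontier -= set(selected)
    return selected

# Toy example: a path graph 0-1-2-3 with a coverage-style utility.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
covered = lambda S: set(S) | {n for v in S for n in graph[v]}
utility = lambda S: len(covered(S))
print(net_explore(graph, utility, 0, budget=3))
```

The loop only ever evaluates nodes on the current frontier, which mirrors the constraint that the algorithm cannot see, let alone select, nodes outside the explored neighborhood.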

IJCAI Conference 2015 Conference Paper

Metareasoning for Planning Under Uncertainty

  • Christopher H. Lin
  • Andrey Kolobov
  • Ece Kamar
  • Eric Horvitz

The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature’s forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning—a process that trades off the cost of planning and the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.

AAAI Conference 2014 Conference Paper

Signals in the Silence: Models of Implicit Feedback in a Recommendation System for Crowdsourcing

  • Christopher Lin
  • Ece Kamar
  • Eric Horvitz

We exploit the absence of signals as informative observations in the context of providing task recommendations in crowdsourcing. Workers on crowdsourcing platforms do not provide explicit ratings about tasks. We present methods that enable a system to leverage implicit signals about task preferences. These signals include types of tasks that have been available and have been displayed, and the number of tasks workers select and complete. In contrast to previous work, we present a general model that can represent both positive and negative implicit signals. We introduce algorithms that can learn these models without exceeding the computational complexity of existing approaches. Finally, using data from a high-throughput crowdsourcing platform, we show that reasoning about both positive and negative implicit feedback can improve the quality of task recommendations.

AAAI Conference 2014 Conference Paper

Stochastic Privacy

  • Adish Singla
  • Eric Horvitz
  • Ece Kamar
  • Ryen White

Online services such as web search and e-commerce applications typically rely on the collection of data about users, including details of their activities on the web. Such personal data is used to maximize revenues via targeting of advertisements and longer engagements of users, and to enhance the quality of service via personalization of content. To date, service providers have largely followed the approach of either requiring or requesting consent for collecting user data. Users may be willing to share private information in return for incentives, enhanced services, or assurances about the nature and extent of the logged data. We introduce stochastic privacy, an approach to privacy centering on the simple concept of providing people with a guarantee that the probability that their personal data will be shared does not exceed a given bound. Such a probability, which we refer to as the privacy risk, can be given by users as a preference or communicated as a policy by a service provider. Service providers can work to personalize and to optimize revenues in accordance with preferences about privacy risk. We present procedures, proofs, and an overall system for maximizing the quality of services, while respecting bounds on privacy risk. We demonstrate the methodology with a case study and evaluation of the procedures applied to web search personalization. We show how we can achieve near-optimal utility of accessing information with provable guarantees on the probability of sharing data.
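The headline guarantee of stochastic privacy, that each user's data is shared with probability at most a stated bound, is easy to state in code (a toy sketch of the guarantee only; the paper's utility-maximizing selection procedures are not reproduced, and all names here are illustrative):

```python
import random

def stochastic_share(users, privacy_risk, seed=None):
    """Include each user's data independently with probability
    `privacy_risk`, so that P(a given user's data is shared) is exactly
    the stated bound -- and hence never exceeds it."""
    rng = random.Random(seed)
    return [u for u in users if rng.random() < privacy_risk]

# Empirically, the realized sharing rate respects the stated bound.
shared = stochastic_share(range(100_000), privacy_risk=0.1, seed=0)
print(len(shared) / 100_000)
```

A real system would choose *which* users to sample so as to maximize service quality, but any such policy must keep each individual's selection probability under the bound, which is the property this sketch isolates.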

AAAI Conference 2013 Conference Paper

Automated Workflow Synthesis

  • Haoqi Zhang
  • Eric Horvitz
  • David Parkes

By coordinating efforts from humans and machines, human computation systems can solve problems that machines cannot tackle alone. A general challenge is to design efficient human computation algorithms or workflows with which to coordinate the work of the crowd. We introduce a method for automated workflow synthesis aimed at ideally harnessing human efforts by learning about the crowd’s performance on tasks and synthesizing an optimal workflow for solving a problem. We present experimental results for human sorting tasks, which demonstrate the benefits of understanding and optimizing the structure of workflows based on observations. Results also demonstrate the benefits of using value of information to guide experiments for identifying efficient workflows with fewer experiments.

IJCAI Conference 2013 Conference Paper

Lifelong Learning for Acquiring the Wisdom of the Crowd

  • Ece Kamar
  • Ashish Kapoor
  • Eric Horvitz

Predictive models play a key role for inference and decision making in crowdsourcing. We present methods that can be used to guide the collection of data for enhancing the competency of such predictive models while using the models to provide a base crowdsourcing service. We focus on the challenge of ideally balancing the goals of collecting data over time for learning and for improving task performance with the cost of workers’ contributions over the lifetime of the operation of a system. We introduce the use of distributions over a set of predictive models to represent uncertainty about the dynamics of the world. We employ a novel Monte Carlo algorithm to reason simultaneously about uncertainty about the world dynamics and the progression of task solution as workers are hired over time to optimize hiring decisions. We evaluate the methodology with experiments on a challenging citizen-science problem, demonstrating how it balances exploration and exploitation over the lifetime of a crowdsourcing system.

IJCAI Conference 2013 Conference Paper

Look versus Leap: Computing Value of Information with High-Dimensional Streaming Evidence

  • Stephanie Rosenthal
  • Dan Bohus
  • Ece Kamar
  • Eric Horvitz

A key decision facing autonomous systems with access to streams of sensory data is whether to act based on current evidence or to wait for additional information that might enhance the utility of taking an action. Computing the value of information is particularly difficult with streaming high-dimensional sensory evidence. We describe a belief projection approach to reasoning about information value in these settings, using models for inferring future beliefs over states given streaming evidence. These belief projection models can be learned from data or constructed via direct assessment of parameters and they fit naturally in modular, hierarchical state inference architectures. We describe principles of using belief projection and present results drawn from an implementation of the methodology within a conversational system.

AAMAS Conference 2012 Conference Paper

Combining Human and Machine Intelligence in Large-scale Crowdsourcing

  • Ece Kamar
  • Severin Hacker
  • Eric Horvitz

We show how machine learning and inference can be harnessed to leverage the complementary strengths of humans and computational agents to solve crowdsourcing tasks. We construct a set of Bayesian predictive models from data and describe how the models operate within an overall crowdsourcing architecture that combines the efforts of people and machine vision on the task of classifying celestial bodies defined within a citizen-science project named Galaxy Zoo. We show how learned probabilistic models can be used to fuse human and machine contributions and to predict the behaviors of workers. We employ multiple inferences in concert to guide decisions on hiring and routing workers to tasks so as to maximize the efficiency of large-scale crowdsourcing processes based on expected utility.

AAAI Conference 2012 Conference Paper

Learning to Learn: Algorithmic Inspirations from Human Problem Solving

  • Ashish Kapoor
  • Bongshin Lee
  • Desney Tan
  • Eric Horvitz

We harness the ability of people to perceive and interact with visual patterns in order to enhance the performance of a machine learning method. We show how we can collect evidence about how people optimize the parameters of an ensemble classification system using a tool that provides a visualization of misclassification costs. Then, we use these observations about human attempts to minimize cost in order to extend the performance of a state-of-the-art ensemble classification system. The study highlights opportunities for learning from evidence collected about human problem solving to refine and extend automated learning and inference.

NeurIPS Conference 2012 Conference Paper

Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

  • Jenna Wiens
  • Eric Horvitz
  • John Guttag

A patient's risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient's pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient risk, considering only the patient's current or aggregate state. We explore representing patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital-acquired colonization with Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient's current state (p < 0.05).

AAAI Conference 2012 Conference Paper

Performance and Preferences: Interactive Refinement of Machine Learning Procedures

  • Ashish Kapoor
  • Bongshin Lee
  • Desney Tan
  • Eric Horvitz

Problem solving procedures have been typically aimed at achieving well-defined goals or satisfying straightforward preferences. However, learners and solvers may often generate rich multiattribute results with procedures guided by sets of controls that define different dimensions of quality. We explore methods that enable people to explore and express preferences about the operation of classification models in supervised multiclass learning. We leverage a leave-one-out confusion matrix that provides users with views and real-time controls of a model space. The approach allows people to consider in an interactive manner the global implications of local changes in decision boundaries. We focus on kernel classifiers and show the effectiveness of the methodology on a variety of tasks.

AAMAS Conference 2012 Conference Paper

Task Routing for Prediction Tasks

  • Haoqi Zhang
  • Eric Horvitz
  • Yiling Chen
  • David Parkes

We describe methods for routing a prediction task on a network where each participant can contribute information and route the task onwards. Routing scoring rules bring truthful contribution of information about the task and optimal routing of the task into a Perfect Bayesian Equilibrium under common knowledge about the competencies of agents. Relaxing the common knowledge assumption, we address the challenge of routing in situations where each agent's knowledge about other agents is limited to a local neighborhood. A family of local routing rules isolates, in equilibrium, routing decisions that depend only on this local knowledge, and these are the only routing scoring rules with this property. Simulation results show that local routing rules can promote effective task routing.

AAMAS Conference 2011 Conference Paper

Jogger: Models for Context-Sensitive Reminding

  • Ece Kamar
  • Eric Horvitz

We describe research on principles of context-sensitive reminding that show promise for serving in systems that work to jog people's memories about information that they may forget. The methods center on the construction and use of a set of distinct probabilistic models that predict (1) items that may be forgotten, (2) the expected relevance of the items in a situation, and (3) the cost of interruption associated with alerting about a reminder. We describe the use of this set of models in the Jogger prototype that employs predictions and decision-theoretic optimization to compute the value of reminders about meetings.

AAAI Conference 2010 Conference Paper

Generalized Task Markets for Human and Machine Computation

  • Dafna Shahaf
  • Eric Horvitz

We discuss challenges and opportunities for developing generalized task markets where human and machine intelligence are enlisted to solve problems, based on a consideration of the competencies, availabilities, and pricing of different problem-solving resources. The approach couples human computation with machine learning and planning, and is aimed at optimizing the flow of subtasks to people and to computational problem solvers. We illustrate key ideas in the context of Lingua Mechanica, a project focused on harnessing human and machine translation skills to perform translation among languages. We present infrastructure and methods for enlisting and guiding human and machine computation for language translation, including details about the hardness of generating plans for assigning tasks to solvers. Finally, we discuss studies performed with machine and human solvers, focusing on components of a Lingua Mechanica prototype.

IJCAI Conference 2009 Conference Paper

  • Dafna Shahaf
  • Eric Horvitz

Autonomous agents that sense, reason, and act in real-world environments for extended periods often need to solve streams of incoming problems. Traditionally, effort is applied only to problems that have already arrived and have been noted. We examine continual computation methods that allow agents to ideally allocate time to solving current as well as potential future problems under uncertainty. We first review prior work on continual computation. Then, we present new directions and results, including the consideration of shared subtasks and multiple tasks. We present results on the computational complexity of the continual-computation problem and provide approximations for arbitrary models of computational performance. Finally, we review special formulations for addressing uncertainty about the best algorithm to apply, learning about performance, and considering costs associated with delayed use of results.

IJCAI Conference 2009 Conference Paper

  • Ece Kamar
  • Eric Horvitz

We develop and test computational methods for guiding collaboration that demonstrate how shared plans can be created in real-world settings, where agents can be expected to have diverse and varying goals, preferences, and availabilities. The methods are motivated and evaluated in the realm of ridesharing, using GPS logs of commuting data. We consider challenges with coordination among self-interested people aimed at minimizing the cost of transportation and the impact of travel on the environment. We present planning, optimization, and payment mechanisms that provide fair and efficient solutions to the rideshare collaboration challenge. We evaluate different VCG-based payment schemes in terms of their computational efficiency, budget balance, incentive compatibility, and strategy proofness. We present the behavior and analyses provided by the ABC ridesharing prototype system. The system learns about destinations and preferences from GPS traces and calendars, and considers time, fuel, environmental, and cognitive costs. We review how ABC generates rideshare plans from hundreds of real-life GPS traces collected from a community of commuters and reflect about the promise of employing the ABC methods to reduce the number of vehicles on the road, thus reducing CO2 emissions and fuel expenditures.

NeurIPS Conference 2009 Conference Paper

Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

  • Ashish Kapoor
  • Eric Horvitz

There has been a clear distinction between induction or training time and diagnosis time active information acquisition. While active learning during induction focuses on acquiring data that promises to provide the best classification model, the goal at diagnosis time focuses on the next features to observe about the test case at hand in order to make better predictions about the case. We introduce a model and inferential methods that break this distinction. The methods can be used to extend case libraries under a budget but, more fundamentally, provide a framework for guiding agents to collect data under scarce resources, focused by diagnostic challenges. This extension to active learning leads to a new class of policies for real-time diagnosis, where recommended information-gathering sequences include actions that simultaneously seek new data for the case at hand and for cases in the training set.

AAMAS Conference 2008 Conference Paper

Mobile Opportunistic Commerce: Mechanisms, Architecture, and Application

  • Ece Kamar
  • Eric Horvitz
  • Chris Meek

We present mechanisms, architectures, and an implementation addressing challenges with mobile opportunistic commerce centering on markets and mechanisms that support the procurement of goods and services in mobile settings. Our efforts seek to extend core concepts from research in electronic commerce to interactions between mobile buyers and brick-and-mortar businesses that have geographically situated retail offices. We focus on efficient mechanisms, infrastructure, and automation that can enable sellers and buyers to take joint advantage of the relationship of the locations of retail offices to the routes of mobile buyers who may have another primary destination. The methods promote automated vigilance about opportunities to buy and sell, and to support negotiations on the joint value to buyers and sellers including buyers’ costs of divergence from their original paths to acquire services and commodities. We extend prior work on auction mechanisms to personal procurement settings by analyzing the dynamics of the cost to buyers based on preexisting plans, location, and overall context. We present mechanisms for auctions in single item, combinatorial, and multiattribute settings that take into consideration personal inconvenience costs within time-sensitive dynamic markets and challenges with privacy and fairness.

IJCAI Conference 2007 Conference Paper

  • Ashish Kapoor
  • Eric Horvitz
  • Sumit Basu

An inescapable bottleneck with learning from large data sets is the high cost of labeling training data. Unsupervised learning methods have promised to lower the cost of tagging by leveraging notions of similarity among data points to assign tags. However, unsupervised and semi-supervised learning techniques often provide poor results due to errors in estimation. We look at methods that guide the allocation of human effort for labeling data so as to get the greatest boosts in discriminatory power with increasing amounts of work. We focus on the application of value of information to Gaussian Process classifiers and explore the effectiveness of the method on the task of classifying voice messages.

IJCAI Conference 2007 Conference Paper

  • Doug Downey
  • Susan Dumais
  • Eric Horvitz

We describe the formulation, construction, and evaluation of predictive models of human information seeking from a large dataset of Web search activities. We first introduce an expressive language for describing searching and browsing behavior, and use this language to characterize several prior studies of search behavior. Then, we focus on the construction of predictive models from the data. We review several analyses, including an exploration of the properties of users, queries, and search sessions that are most predictive of future behavior. We also investigate the influence of temporal delay on user actions, and representational tradeoffs with varying the number of steps of user activity considered. Finally, we discuss applications of the predictive models, and focus on the example of performing principled prefetching of content.

UAI Conference 2007 Conference Paper

On Discarding, Caching, and Recalling Samples in Active Learning

  • Ashish Kapoor
  • Eric Horvitz

We address challenges of active learning under scarce informational resources in non-stationary environments. In real-world settings, data labeled and integrated into a predictive model may become invalid over time. However, the data can become informative again with switches in context and such changes may indicate unmodeled cyclic or other temporal dynamics. We explore principles for discarding, caching, and recalling labeled data points in active learning based on computations of value of information. We review key concepts and study the value of the methods via investigations of predictive performance and costs of acquiring data for simulated and real-world data sets.

JMLR Journal 2006 Journal Article

Considering Cost Asymmetry in Learning Classifiers

  • Francis R. Bach
  • David Heckerman
  • Eric Horvitz

Receiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, the set of classifiers is typically obtained by training once, holding constant the estimated slope and then varying the intercept to obtain a parameterized set of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can compute efficiently the entire ROC curve, and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and its convex upper bounds, usually used for training, leads to a provable and quantifiable difference around extreme asymmetries.

UAI Conference 2005 Conference Paper

Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service

  • Eric Horvitz
  • Johnson Apacible
  • Raman Sarin
  • Lin Liao

We present research on developing models that forecast traffic flow and congestion in the Greater Seattle area. The research has led to the deployment of a service named JamBayes, that is being actively used by over 2,500 users via smartphones and desktop versions of the system. We review the modeling effort and describe experiments probing the predictive accuracy of the models. Finally, we present research on building models that can identify current and future surprises, via efforts on modeling and forecasting unexpected situations.

UAI Conference 2003 Conference Paper

Web-Based Question Answering: A Decision-Making Perspective

  • David Azari
  • Eric Horvitz
  • Susan T. Dumais
  • Eric Brill

We describe an investigation of the use of probabilistic models and cost-benefit analyses to guide resource-intensive procedures used by a Web-based question answering system. We first provide an overview of research on question-answering systems. Then, we present details on AskMSR, a prototype web-based question answering system. We discuss Bayesian analyses of the quality of answers generated by the system and show how we can endow the system with the ability to make decisions about the number of queries issued to a search engine, given the cost of queries and the expected value of query results in refining an ultimate answer. Finally, we review the results of a set of experiments.

UAI Conference 2002 Conference Paper

Coordinate: Probabilistic Forecasting of Presence and Availability

  • Eric Horvitz
  • Paul Koch
  • Carl Myers Kadie
  • Andy Jacobs

We present methods employed in Coordinate, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users' presence and availability. We describe how data is collected about user activity and proximity from multiple devices, in addition to analysis of the content of users' calendars, the time of day, and day of week. We review applications of presence forecasting embedded in the Priorities application and then present details of the Coordinate service that was informed by the earlier efforts.

UAI Conference 2001 Conference Paper

A Bayesian Approach to Tackling Hard Computational Problems

  • Eric Horvitz
  • Yongshao Ruan
  • Carla P. Gomes
  • Henry A. Kautz
  • Bart Selman
  • Max Chickering

We are developing a general framework for using learned Bayesian models for decision-theoretic control of search and reasoning algorithms. We illustrate the approach on the specific task of controlling both general and domain-specific solvers on a hard class of structured constraint satisfaction problems. A successful strategy for reducing the high (and even infinite) variance in running time typically exhibited by backtracking search algorithms is to cut off and restart the search if a solution is not found within a certain amount of time. Previous work on restart strategies has employed fixed cutoff values. We show how to create a dynamic cutoff strategy by learning a Bayesian model that predicts the ultimate length of a trial based on observing the early behavior of the search algorithm. Furthermore, we describe the general conditions under which a dynamic restart strategy can outperform the theoretically optimal fixed strategy.
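The dynamic-cutoff idea, observe a trial's early behavior, predict whether it will run long, and restart if so, can be sketched with a stand-in predictor (illustrative only; the paper learns a Bayesian runtime model, which is replaced here by a noise-free oracle on the early signal):

```python
import random

def solve_with_restarts(sample_run, should_restart, max_restarts=1000):
    """Run trials of a randomized search, cutting a trial off early when
    the predictor flags it as likely to be long. `sample_run` returns
    (early_signal, total_runtime); `should_restart` stands in for the
    paper's learned Bayesian runtime predictor."""
    spent = 0.0
    observe_cost = 1.0  # time spent watching a trial's early behavior
    for _ in range(max_restarts):
        early_signal, runtime = sample_run()
        if should_restart(early_signal):
            spent += observe_cost   # pay only for the observed prefix
            continue
        return spent + runtime      # run this trial to completion
    return spent

# Heavy-tailed runtimes: "bad" trials (30%) take 100x longer, and the
# early signal reveals which kind a trial is (noise-free for brevity).
rng = random.Random(0)
def sample_run():
    bad = rng.random() < 0.3
    return bad, (500.0 if bad else 5.0)

cost = solve_with_restarts(sample_run, should_restart=lambda bad: bad)
print(cost)
```

With heavy-tailed runtimes the expected cost of a fixed-cutoff policy is dominated by the long trials it must partially run; a reliable early predictor pays only the observation cost on those trials.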

UAI Conference 2000 Conference Paper

Collaborative Filtering by Personality Diagnosis: A Hybrid Memory and Model-Based Approach

  • David M. Pennock
  • Eric Horvitz
  • Steve Lawrence
  • C. Lee Giles

The growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. Such systems leverage knowledge about the known preferences of multiple users to recommend items of interest to other users. CF methods have been harnessed to make recommendations about such items as web pages, movies, books, and toys. Researchers have proposed and evaluated many approaches for generating recommendations. We describe and evaluate a new method called personality diagnosis (PD). Given a user's preferences for some items, we compute the probability that he or she is of the same "personality type" as other users, and, in turn, the probability that he or she will like new items. PD retains some of the advantages of traditional similarity-weighting techniques in that all data is brought to bear on each prediction and new data can be added easily and incrementally. Additionally, PD has a meaningful probabilistic interpretation, which may be leveraged to justify, explain, and augment results. We report empirical results on the EachMovie database of movie ratings, and on user profile data collected from the CiteSeer digital library of Computer Science research papers. The probabilistic framework naturally supports a variety of descriptive measurements; in particular, we consider the applicability of a value of information (VOI) computation.
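A minimal sketch of the personality-diagnosis idea, under the assumption (as in the paper) that observed ratings are a user's underlying "personality type" ratings plus Gaussian noise; the data structures and parameter values here are illustrative:

```python
import math

def pd_predict(active_ratings, stored_users, target_item,
               sigma=1.0, rating_values=(1, 2, 3, 4, 5)):
    """Personality-diagnosis sketch: treat each stored user as a candidate
    personality type, weight each type by the Gaussian likelihood of the
    active user's observed ratings, and return the most probable rating
    for the target item. Toy re-implementation, not the paper's code."""
    def likelihood(user):
        l = 1.0
        for item, r in active_ratings.items():
            if item in user:
                l *= math.exp(-((r - user[item]) ** 2) / (2 * sigma ** 2))
        return l

    weights = [likelihood(u) for u in stored_users]
    total = sum(weights) or 1.0
    # Probability of each possible rating for the target item.
    probs = {v: sum(w for w, u in zip(weights, stored_users)
                    if u.get(target_item) == v) / total
             for v in rating_values}
    return max(probs, key=probs.get)
```

Because every stored user contributes to each prediction, the sketch shares the memory-based flavor the abstract describes, while the noise model gives the probabilistic interpretation.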

UAI Conference 2000 Conference Paper

Conversation as Action Under Uncertainty

  • Tim Paek
  • Eric Horvitz

Conversations abound with uncertainties of various kinds. Treating conversation as inference and decision making under uncertainty, we propose a task-independent, multimodal architecture for supporting robust continuous spoken dialog called Quartet. We introduce four interdependent levels of analysis, and describe representations, inference procedures, and decision strategies for managing uncertainties within and between the levels. We highlight the approach by reviewing interactions between a user and two spoken dialog systems developed using the Quartet architecture: Presenter, a prototype system for navigating Microsoft PowerPoint presentations, and the Bayesian Receptionist, a prototype system for dealing with tasks typically handled by front desk receptionists at the Microsoft corporate campus.

UAI Conference 1999 Conference Paper

Attention-Sensitive Alerting

  • Eric Horvitz
  • Andy Jacobs
  • David Hovel

We introduce utility-directed procedures for mediating the flow of potentially distracting alerts and communications to computer users. We present models and inference procedures that balance the context-sensitive costs of deferring alerts with the cost of interruption. We describe the challenge of reasoning about such costs under uncertainty via an analysis of user activity and the content of notifications. After introducing principles of attention-sensitive alerting, we focus on the problem of guiding alerts about email messages. We dwell on the problem of inferring the expected criticality of email and discuss work on the Priorities system, centering on prioritizing email by criticality and modulating the communication of notifications to users about the presence and nature of incoming email.
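The core cost-benefit comparison in attention-sensitive alerting can be written in a few lines. This is a toy version of the decision rule the abstract describes; the probability and cost inputs are assumed to come from the system's inference about message criticality and user activity:

```python
def should_alert_now(p_critical, cost_of_interruption, cost_of_deferral):
    """Alert immediately iff the expected cost of deferring the message
    (probability it is critical times the loss if a critical message
    waits) exceeds the context-sensitive cost of interrupting the user.
    All inputs are illustrative; in the Priorities system they would be
    inferred from user activity and message content."""
    expected_deferral_cost = p_critical * cost_of_deferral
    return expected_deferral_cost > cost_of_interruption
```

For example, a message judged 80% likely to be critical would interrupt a user whose interruption cost is moderate, while a 10%-critical message would be deferred.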

IJCAI Conference 1999 Conference Paper

Thinking Ahead: Continual Computation Policies for Allocating Idle and Real-Time Resources to Solve Future Challenges

  • Eric Horvitz

Research on continual computation centers on developing precomputation policies that can effectively harness available resources to solve future challenges. We focus on integrating a consideration of offline and real-time resources in continual computation. We review precomputation policies for flexible procedures and present strategies that account for the expected future real-time refinement of a result following precomputation. Finally, we address policies that consider the tradeoff between the efficiency of solving current and potential future challenges.

UAI Conference 1998 Conference Paper

Inferring Informational Goals from Free-Text Queries: A Bayesian Approach

  • David Heckerman
  • Eric Horvitz

People using consumer software applications typically do not use technical jargon when querying an online database of help topics. Rather, they attempt to communicate their goals with common words and phrases that describe software functionality in terms of structure and objects they understand. We describe a Bayesian approach to modeling the relationship between words in a user's query for assistance and the informational goals of the user. After reviewing the general method, we describe several extensions that center on integrating additional distinctions and structure about language usage and user goals into the Bayesian models.
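The basic inference the abstract describes can be sketched as a naive-Bayes classifier over query words. The probability tables below are invented for illustration; the paper's models also add further structure about language usage and user goals:

```python
import math

def infer_goal(query_words, goal_priors, p_word_given_goal, eps=1e-6):
    """Bayesian sketch: P(goal | words) is proportional to
    P(goal) * product over words of P(word | goal), computed in log
    space; eps smooths unseen (word, goal) pairs. Returns the most
    probable goal. Tables are hypothetical, not from the paper."""
    scores = {}
    for goal, prior in goal_priors.items():
        log_p = math.log(prior)
        for w in query_words:
            log_p += math.log(p_word_given_goal.get((w, goal), eps))
        scores[goal] = log_p
    return max(scores, key=scores.get)
```

A query phrased in everyday words ("paper") can thus be mapped to a technical help topic ("print") without the user knowing the jargon.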

UAI Conference 1998 Conference Paper

The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users

  • Eric Horvitz
  • John S. Breese
  • David Heckerman
  • David Hovel
  • Koos Rommelse

The Lumière Project centers on harnessing probability and utility to provide assistance to computer software users. We review work on Bayesian user models that can be employed to infer a user's needs by considering a user's background, actions, and queries. Several problems were tackled in Lumière research, including (1) the construction of Bayesian models for reasoning about the time-varying goals of computer users from their observed actions and queries, (2) gaining access to a stream of events from software applications, (3) developing a language for transforming system events into observational variables represented in Bayesian user models, (4) developing persistent profiles to capture changes in a user's expertise, and (5) the development of an overall architecture for an intelligent user interface. Lumière prototypes served as the basis for the Office Assistant in the Microsoft Office '97 suite of productivity applications.

AAAI Conference 1997 Conference Paper

Models of Continual Computation

  • Eric Horvitz

Automated problem solving is viewed typically as the expenditure of computation to solve one or more problems passed to a reasoning system. In response to each problem received, effort is applied to generate a solution and problem solving ends when the solution is rendered. We discuss the notion of continual computation that addresses a broader conception of problem solving by considering the ideal use of the idle time between problem instances. The time is used to develop solutions proactively to one or more expected challenges in the future. We consider analyses for traditional all-or-nothing algorithms as well as more flexible computational procedures. After exploring the allocation of idle time for several settings, we generalize the analysis to consider the case of shifting computation from a current problem to solve future challenges. Finally, we discuss a sample application of the use of continual computation in the setting of diagnostic reasoning.
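For all-or-nothing algorithms, a natural policy of this kind spends idle time on expected future problems in order of their likelihood. The sketch below illustrates that greedy allocation under assumed inputs (candidate problems with probabilities and solution times); it is an illustration of the idea, not the paper's full analysis:

```python
def allocate_idle_time(idle_budget, candidates):
    """Continual-computation sketch: allocate idle time to expected
    future problem instances in order of decreasing likelihood.
    candidates: list of (name, probability, time_to_solve).
    Returns {name: time allocated}."""
    allocation = {}
    remaining = idle_budget
    for name, prob, needed in sorted(candidates, key=lambda c: -c[1]):
        spend = min(needed, remaining)
        if spend > 0:
            allocation[name] = spend
            remaining -= spend
    return allocation
```

With 10 units of idle time and three equally expensive candidates, the most likely challenge is precomputed fully and the next-most-likely partially.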

UAI Conference 1997 Conference Paper

Perception, Attention, and Resources: A Decision-Theoretic Approach to Graphics Rendering

  • Eric Horvitz
  • Jed Lengyel

We describe work to control graphics rendering under limited computational resources by taking a decision-theoretic perspective on perceptual costs and computational savings of approximations. The work extends earlier work on the control of rendering by introducing methods and models for computing the expected cost associated with degradations of scene components. The expected cost is computed by considering the perceptual cost of degradations and a probability distribution over the attentional focus of viewers. We review the critical literature describing findings on visual search and attention, discuss the implications of the findings, and introduce models of expected perceptual cost. Finally, we discuss policies that harness information about the expected cost of scene components.

UAI Conference 1996 Conference Paper

A Graph-Theoretic Analysis of Information Value

  • Kim-Leng Poh
  • Eric Horvitz

We derive qualitative relationships about the informational relevance of variables in graphical decision models based on a consideration of the topology of the models. Specifically, we identify dominance relations for the expected value of information on chance variables in terms of their position and relationships in influence diagrams. The qualitative relationships can be harnessed to generate nonnumerical procedures for ordering uncertain variables in a decision model by their informational relevance.

UAI Conference 1995 Conference Paper

Display of Information for Time-Critical Decision Making

  • Eric Horvitz
  • Matthew Barry

We describe methods for managing the complexity of information displayed to people responsible for making high-stakes, time-critical decisions. The techniques provide tools for real-time control of the configuration and quantity of information displayed to a user, and a methodology for designing flexible human-computer interfaces for monitoring applications. After defining a prototypical set of display decision problems, we introduce the expected value of revealed information (EVRI) and the related measure of expected value of displayed information (EVDI). We describe how these measures can be used to enhance computer displays used for monitoring complex systems. We motivate the presentation by discussing our efforts to employ decision-theoretic control of displays for a time-critical monitoring application at the NASA Mission Control Center in Houston.

UAI Conference 1995 Conference Paper

Exploiting System Hierarchy to Compute Repair Plans in Probabilistic Model-Based Diagnosis

  • Sampath Srinivas
  • Eric Horvitz

The goal of model-based diagnosis is to isolate causes of anomalous system behavior and recommend inexpensive repair actions in response. In general, precomputing optimal repair policies is intractable. To date, investigators addressing this problem have explored approximations that either impose restrictions on the system model (such as a single fault assumption) or compute an immediate best action with limited lookahead. In this paper, we develop a formulation of repair in model-based diagnosis and a repair algorithm that computes optimal sequences of actions. This optimal approach is costly but can be applied to precompute an optimal repair strategy for compact systems. We show how we can exploit a hierarchical system specification to make this approach tractable for large systems. When introducing hierarchy, we also consider the tradeoff between simply replacing a component and decomposing it to repair its subcomponents. The hierarchical repair algorithm is suitable for off-line precomputation of an optimal repair strategy. A modification of the algorithm takes advantage of an iterative deepening scheme to trade off inference time and the quality of the computed strategy.

UAI Conference 1995 Conference Paper

Reasoning, Metareasoning, and Mathematical Truth: Studies of Theorem Proving under Limited Resources

  • Eric Horvitz
  • Adrian C. Klein

In earlier work, we introduced flexible inference and decision-theoretic metareasoning to address the intractability of normative inference. Here, rather than pursuing the task of computing beliefs and actions with decision models composed of distinctions about uncertain events, we examine methods for inferring beliefs about mathematical truth before an automated theorem prover completes a proof. We employ a Bayesian analysis to update belief in truth, given theorem-proving progress, and show how decision-theoretic methods can be used to determine the value of continuing to deliberate versus taking immediate action in time-critical situations.
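The belief-update step the abstract describes is a direct application of Bayes' rule: belief in the truth of a conjecture is revised as theorem-proving progress is observed. A minimal sketch, with likelihoods assumed as inputs (e.g., estimated from past proof runs):

```python
def update_belief(prior_true, p_obs_given_true, p_obs_given_false):
    """One Bayes-rule update of belief that a conjecture is true, given
    an observed signal of theorem-prover progress. The likelihoods are
    assumed inputs; the paper estimates such quantities empirically."""
    numerator = prior_true * p_obs_given_true
    denominator = numerator + (1 - prior_true) * p_obs_given_false
    return numerator / denominator
```

Repeated observations of progress that are more likely under truth than falsity push the belief toward 1, which can then feed the decision of whether to keep deliberating or act.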

UAI Conference 1993 Conference Paper

Reasoning about the Value of Decision-Model Refinement: Methods and Application

  • Kim-Leng Poh
  • Eric Horvitz

We investigate the value of extending the completeness of a decision model along different dimensions of refinement. Specifically, we analyze the expected value of quantitative, conceptual, and structural refinement of decision models. We illustrate the key dimensions of refinement with examples. The analyses of value of model refinement can be used to focus the attention of an analyst or an automated reasoning system on extensions of a decision model associated with the greatest expected value.

UAI Conference 1993 Conference Paper

Utility-Based Abstraction and Categorization

  • Eric Horvitz
  • Adrian C. Klein

We take a utility-based approach to categorization. We construct generalizations about events and actions by considering losses associated with failing to distinguish among detailed distinctions in a decision model. The utility-based methods transform detailed states of the world into more abstract categories comprised of disjunctions of the states. We show how we can cluster distinctions into groups of distinctions at progressively higher levels of abstraction, and describe rules for decision making with the abstractions. The techniques introduce a utility-based perspective on the nature of concepts, and provide a means of simplifying decision models used in automated reasoning systems. We demonstrate the techniques by describing the capabilities and output of TUBA, a program for utility-based abstraction.

UAI Conference 1992 Conference Paper

Dynamic Network Models for Forecasting

  • Paul Dagum
  • Adam Galper
  • Eric Horvitz

We have developed a probabilistic forecasting methodology through a synthesis of belief network models and classical time-series analysis. We present the dynamic network model (DNM) and describe methods for constructing, refining, and performing inference with this representation of temporal probabilistic knowledge. The DNM representation extends static belief-network models to more general dynamic forecasting models by integrating and iteratively refining contemporaneous and time-lagged dependencies. We discuss key concepts in terms of a model for forecasting U.S. car sales in Japan.

UAI Conference 1992 Conference Paper

Reformulating Inference Problems Through Selective Conditioning

  • Paul Dagum
  • Eric Horvitz

We describe how we selectively reformulate portions of a belief network that pose difficulties for solution with a stochastic-simulation algorithm. We employ the selective conditioning approach to target specific nodes in a belief network for decomposition, based on the contribution the nodes make to the tractability of stochastic simulation. We review previous work on BNRAS algorithms, randomized approximation algorithms for probabilistic inference. We show how selective conditioning can be employed to reformulate a single BNRAS problem into multiple tractable BNRAS simulation problems. We discuss how we can use another simulation algorithm, logic sampling, to solve a component of the inference problem that provides a means for knitting the solutions of individual subproblems into a final result. Finally, we analyze tradeoffs among the computational subtasks associated with the selective conditioning approach to reformulation.

UAI Conference 1991 Conference Paper

An Approximate Nonmyopic Computation for Value of Information

  • David Heckerman
  • Eric Horvitz
  • Blackford Middleton

Value-of-information analyses provide a straightforward means for selecting the best next observation to make, and for determining whether it is better to gather additional information or to act immediately. Determining the next best test to perform, given a state of uncertainty about the world, requires a consideration of the value of making all possible sequences of observations. In practice, decision analysts and expert-system designers have avoided the intractability of exact computation of the value of information by relying on a myopic approximation. Myopic analyses are based on the assumption that only one additional test will be performed, even when there is an opportunity to make a large number of observations. We present a nonmyopic approximation for value of information that bypasses the traditional myopic analyses by exploiting the statistical properties of large samples.
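The myopic computation that this paper seeks to improve upon can be sketched for a single binary test over a binary hypothesis. The utility table and probabilities below are invented for illustration; the paper's contribution is a nonmyopic approximation for sequences of such observations:

```python
def expected_utility(p_present, utilities):
    # utilities: {action: (utility_if_present, utility_if_absent)};
    # the decision maker picks the action with highest expected utility.
    return max(p * u1 + (1 - p) * u0
               for u1, u0 in utilities.values()
               for p in [p_present])

def myopic_voi(prior, p_pos_given_present, p_pos_given_absent,
               utilities, test_cost):
    """Myopic value of information for one binary test: expected utility
    after observing the test result, minus the expected utility of acting
    now, minus the test cost. All inputs are illustrative."""
    eu_now = expected_utility(prior, utilities)
    p_pos = prior * p_pos_given_present + (1 - prior) * p_pos_given_absent
    post_pos = prior * p_pos_given_present / p_pos
    post_neg = prior * (1 - p_pos_given_present) / (1 - p_pos)
    eu_observe = (p_pos * expected_utility(post_pos, utilities)
                  + (1 - p_pos) * expected_utility(post_neg, utilities))
    return eu_observe - eu_now - test_cost
```

A positive result means the test is worth performing before acting; the myopic assumption is that no further tests follow this one.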

UAI Conference 1991 Conference Paper

Time-Dependent Utility and Action Under Uncertainty

  • Eric Horvitz
  • Geoffrey Rutledge

We discuss representing and reasoning with knowledge about the time-dependent utility of an agent's actions. Time-dependent utility plays a crucial role in the interaction between computation and action under bounded resources. We present a semantics for time-dependent utility and describe the use of time-dependent information in decision contexts. We illustrate our discussion with examples of time-pressured reasoning in Protos, a system constructed to explore the ideal control of inference by reasoners with limited abilities.
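The tradeoff at the heart of time-dependent utility can be illustrated with a toy model: solution quality improves with deliberation while the value of acting decays with delay, and the agent picks the deliberation time with the best net utility. The curves below are invented for illustration:

```python
def best_stopping_time(quality, time_cost, horizon):
    """Time-dependent utility sketch: net utility at deliberation time t
    is quality(t) - time_cost(t); return the t in [0, horizon] that
    maximizes it. quality and time_cost are assumed (toy) curves."""
    return max(range(horizon + 1), key=lambda t: quality(t) - time_cost(t))
```

With diminishing returns on computation and a linearly growing cost of delay, the optimal policy stops deliberating once marginal refinement no longer pays for the added delay.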

UAI Conference 1990 Conference Paper

Ideal reformulation of belief networks

  • John S. Breese
  • Eric Horvitz

The intelligent reformulation or restructuring of a belief network can greatly increase the efficiency of inference. However, time expended for reformulation is not available for performing inference. Thus, under time pressure, there is a tradeoff between the time dedicated to reformulating the network and the time applied to the implementation of a solution. We investigate this partition of resources into time applied to reformulation and time used for inference. We first describe general principles for computing the ideal partition of resources under uncertainty. These principles have applicability to a wide variety of problems that can be divided into interdependent phases of problem solving. We then present results of our empirical study of the problem of determining the ideal amount of time to devote to searching for clusters in belief networks. In this work, we acquired and made use of probability distributions that characterize (1) the performance of alternative heuristic search methods for reformulating a network instance into a set of cliques, and (2) the time for executing inference procedures on various belief networks. Given a preference model describing the value of a solution as a function of the delay required for its computation, the system selects an ideal time to devote to reformulation.
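The partition decision can be sketched once the two curves the abstract mentions are in hand: how reformulation time reduces inference time, and how solution value depends on total delay. The curves below are invented stand-ins for the distributions the paper acquires empirically:

```python
def ideal_reformulation_time(budget, inference_time, value, step=1):
    """Partition sketch: choose the reformulation time r that maximizes
    the value of the solution as a function of total delay
    r + inference_time(r). inference_time and value are assumed curves
    (in the paper, derived from empirically acquired distributions)."""
    candidates = range(0, budget + 1, step)
    return max(candidates, key=lambda r: value(r + inference_time(r)))
```

With a toy model where each unit of reformulation initially saves three units of inference time but savings bottom out, the ideal policy reformulates exactly until the marginal savings vanish.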

UAI Conference 1987 Conference Paper

Reasoning about Beliefs and Actions Under Computational Resource Constraints

  • Eric Horvitz

Although many investigators affirm a desire to build reasoning systems that behave consistently with the axiomatic basis defined by probability theory and utility theory, limited resources for engineering and computation can make a complete normative analysis impossible. We attempt to move discussion beyond the debate over the scope of problems that can be handled effectively to cases where it is clear that there are insufficient computational resources to perform an analysis deemed as complete. Under these conditions, we stress the importance of considering the expected costs and benefits of applying alternative approximation procedures and heuristics for computation and knowledge acquisition. We discuss how knowledge about the structure of user utility can be used to control value tradeoffs for tailoring inference to alternative contexts. We address the notion of real-time rationality, focusing on the application of knowledge about the expected timewise-refinement abilities of reasoning strategies to balance the benefits of additional computation with the costs of acting with a partial result. We discuss the benefits of applying decision theory to control the solution of difficult problems given limitations and uncertainty in reasoning resources.

UAI Conference 1986 Conference Paper

The myth of modularity in rule-based systems for reasoning with uncertainty

  • David Heckerman
  • Eric Horvitz

In this paper, we examine the concept of modularity, an often cited advantage of the rule-based representation methodology. We argue that the notion of modularity consists of two distinct concepts which we call syntactic modularity and semantic modularity. We argue that when reasoning under certainty, it is reasonable to regard the rule-based approach as both syntactically and semantically modular. However, we argue that in the case of plausible reasoning, rules are syntactically modular but are rarely semantically modular. To illustrate this point, we examine a particular approach for managing uncertainty in rule-based systems called the MYCIN certainty factor model. We formally define the concept of semantic modularity with respect to the certainty factor model and discuss logical consequences of the definition. We show that the assumption of semantic modularity imposes strong restrictions on rules in a knowledge base. We argue that such restrictions are rarely valid in practical applications. Finally, we suggest how the concept of semantic modularity can be relaxed in a manner that makes it appropriate for plausible reasoning.
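For readers unfamiliar with the model under critique: MYCIN combines two certainty factors bearing on the same hypothesis with a fixed parallel-combination function, applied rule by rule as if each rule's contribution were independent of the rest of the knowledge base. That per-rule locality is exactly the semantic-modularity assumption the paper examines. A sketch of the standard combination function:

```python
def combine_cf(cf1, cf2):
    """MYCIN's parallel combination of two certainty factors (each in
    [-1, 1]) bearing on the same hypothesis. This is the standard
    published combination function, shown for illustration."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)
    # Mixed signs: conflicting evidence partially cancels.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))
```

Two confirming rules of strength 0.6 and 0.5 yield 0.8, regardless of what other rules say, which is the independence-like behavior the paper argues rarely holds in plausible reasoning.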

UAI Conference 1985 Conference Paper

The Inconsistent Use of Measures of Certainty in Artificial Intelligence Research

  • Eric Horvitz
  • David Heckerman

There has been a great deal of interest among artificial intelligence researchers in methodologies for using subjective measures of belief for reasoning about uncertain events. We emphasize the fundamental difference between the use of measures of absolute belief and measures of change in belief. We show that expert system investigators have elicited measures of absolute belief and have used them as though they were measures of change in belief. We argue that such use is counterintuitive and formally inconsistent. In our discussion of the inconsistency, we introduce a belief-updating paradigm used frequently in plausible reasoning. Finally, we attempt to characterize the potential impact of the inconsistency.