Arrow Research search

Author name cluster

Michael Kearns

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
1 author row

Possible papers (41)

NeurIPS Conference 2024 Conference Paper

Oracle-Efficient Reinforcement Learning for Max Value Ensembles

  • Marcel Hussing
  • Michael Kearns
  • Aaron Roth
  • Sikata B. Sengupta
  • Jessica Sorrell

Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function approximation and policy gradient techniques often scale poorly and suffer from instability and high variance). One line of research attempting to address these difficulties makes the natural assumption that we are given a collection of base or constituent policies (possibly heuristic) upon which we would like to improve in a scalable manner. In this work we aim to compete with the max-following policy, which at each state follows the action of whichever constituent policy has the highest value. The max-following policy is always at least as good as the best constituent policy, and may be considerably better. Our main result is an efficient algorithm that learns to compete with the max-following policy, given only access to the constituent policies (but not their value functions). In contrast to prior work in similar settings, our theoretical results require only the minimal assumption of an ERM oracle for value function approximation for the constituent policies (and not the global optimal policy or the max-following policy itself) on samplable distributions. We illustrate our algorithm's experimental effectiveness and behavior on several robotic simulation testbeds.
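
To make the object being competed with concrete: once value estimates for each constituent policy are in hand, max-following simply acts, state by state, according to the constituent whose estimate is highest. A minimal sketch of that definition (not the paper's oracle-efficient learning algorithm; all names are illustrative):

```python
import numpy as np

def max_following_action(state, policies, value_estimates):
    """Act according to whichever constituent policy has the highest
    estimated value at the current state.

    policies:        list of callables, state -> action
    value_estimates: list of callables, state -> float (stand-ins for
                     learned value functions, one per constituent policy)
    """
    best = int(np.argmax([v(state) for v in value_estimates]))
    return policies[best](state)
```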

NeurIPS Conference 2024 Conference Paper

Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

  • Martin Bertran
  • Shuai Tang
  • Michael Kearns
  • Jamie Morgenstern
  • Aaron Roth
  • Zhiwei S. Wu

Machine unlearning is motivated by principles of data autonomy. The premise is that a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that these updates expose individuals to high-accuracy reconstruction attacks which allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
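
The leakage is easiest to see in the simplest special case, an intercept-only linear model (the sample mean): an attacker who observes the model before and after a single deletion, and knows the original sample size n, recovers the deleted point exactly. A toy sketch of this effect, not the paper's general attack on full linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100)

n = len(data)
mu_before = data.mean()        # model trained on all n points
mu_after = data[:-1].mean()    # model retrained after deleting data[-1]

# The attacker recovers the deleted point exactly from the two models:
recovered = n * mu_before - (n - 1) * mu_after
assert np.isclose(recovered, data[-1])
```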

AAMAS Conference 2023 Conference Paper

Efficient Stackelberg Strategies for Finitely Repeated Games

  • Natalie Collina
  • Eshwar Ram Arunachaleswaran
  • Michael Kearns

We study Stackelberg equilibria in finitely repeated games, where the leader commits to a strategy that picks actions in each round and can be adaptive to the history of play (i.e., they commit to an algorithm). In particular, we study static repeated games with no discounting. We give efficient algorithms for finding approximate Stackelberg equilibria in this setting, along with rates of convergence depending on the time horizon $T$. In many cases, these algorithms allow the leader to do much better on average than they can in the single-round Stackelberg game. We give two algorithms, one computing strategies with an optimal $1/T$ rate at the expense of an exponential dependence on the number of actions, and another (randomized) approach computing strategies with no dependence on the number of actions but a worse dependence on $T$ of $1/T^{0.25}$. Both algorithms build upon a linear program to produce simple automata leader strategies and induce corresponding automata best-responses for the follower. We complement these results by showing that approximating the Stackelberg value in three-player finite-horizon repeated games is a computationally hard problem via a reduction from balanced vertex cover.

NeurIPS Conference 2023 Conference Paper

Replicable Reinforcement Learning

  • Eric Eaton
  • Marcel Hussing
  • Michael Kearns
  • Jessica Sorrell

The replicability crisis in the social, behavioral, and data sciences has led to the formulation of algorithmic frameworks for replicability, i.e., a requirement that an algorithm produce identical outputs (with high probability) when run on two different samples from the same underlying distribution. While the field is still in its infancy, provably replicable algorithms have been developed for many fundamental tasks in machine learning and statistics, including statistical query learning, the heavy hitters problem, and distribution testing. In this work we initiate the study of replicable reinforcement learning, providing a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-Max in the episodic setting. These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
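
A common primitive behind replicable algorithms, and a useful mental model here, is randomized rounding with shared randomness: round an empirical estimate to a grid whose offset is drawn from a shared seed, so two runs on fresh samples from the same distribution return the identical value with high probability. A minimal sketch of that primitive (an assumption-laden illustration, not the paper's value-iteration or R-Max construction):

```python
import numpy as np

def replicable_mean(sample, grid_width, seed):
    """Round the empirical mean to a randomly offset grid. Two runs that
    share `seed` (and hence the offset) return the identical value with
    high probability, provided `grid_width` exceeds the sampling error."""
    offset = np.random.default_rng(seed).uniform(0.0, grid_width)
    est = float(np.mean(sample))
    return offset + grid_width * round((est - offset) / grid_width)
```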

NeurIPS Conference 2023 Conference Paper

Scalable Membership Inference Attacks via Quantile Regression

  • Martin Bertran
  • Shuai Tang
  • Aaron Roth
  • Michael Kearns
  • Jamie H. Morgenstern
  • Steven Z. Wu

Membership inference attacks are designed to determine, using black-box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many "shadow models", i.e., models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly "black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures. Our code is available at github.com/amazon-science/quantile-mia.
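
A rough sketch of an attack in this style with an off-the-shelf quantile regressor; the feature representation, quantile level, and data split here are illustrative assumptions, not the authors' exact pipeline:

```python
from sklearn.ensemble import GradientBoostingRegressor

def fit_attack(nonmember_features, nonmember_confidences, alpha=0.95):
    """Regress the alpha-quantile of the target model's confidence on
    points known NOT to be in its training set."""
    attack = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    attack.fit(nonmember_features, nonmember_confidences)
    return attack

def is_member(attack, features, observed_confidence):
    """Flag a point as a likely training member if the target model is
    more confident on it than predicted for a typical non-member."""
    threshold = attack.predict(features.reshape(1, -1))[0]  # numpy input
    return observed_confidence > threshold
```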

NeurIPS Conference 2022 Conference Paper

Private Synthetic Data for Multitask Learning and Marginal Queries

  • Giuseppe Vietri
  • Cedric Archambeau
  • Sergul Aydore
  • William Brown
  • Michael Kearns
  • Aaron Roth
  • Ankit Siva
  • Shuai Tang

We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high quality synthetic data for mixed marginal queries, that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.
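
For concreteness, the class-conditional linear threshold query described above can be written down directly; preserving it means real and synthetic data give nearly equal answers. A small sketch with illustrative names:

```python
import numpy as np

def class_conditional_threshold(X, y, w, b, label):
    """Fraction of points with class `label` lying on or above the
    half-space {x : <w, x> >= b}."""
    mask = (y == label)
    return float(((X[mask] @ w) >= b).mean())
```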

NeurIPS Conference 2019 Conference Paper

Average Individual Fairness: Algorithms, Generalization and Experiments

  • Saeed Sharifi-Malvajerdi
  • Michael Kearns
  • Aaron Roth

We propose a new family of fairness definitions for classification problems that combine some of the best properties of both statistical and individual notions of fairness. We posit not only a distribution over individuals, but also a distribution over (or collection of) classification tasks. We then ask that standard statistics (such as error or false positive/negative rates) be (approximately) equalized across individuals, where the rate is defined as an expectation over the classification tasks. Because we are no longer averaging over coarse groups (such as race or gender), this is a semantically meaningful individual-level constraint. Given a sample of individuals and problems, we design an oracle-efficient algorithm (i.e., one that is given access to any standard, fairness-free learning heuristic) for the fair empirical risk minimization task. We also show that given sufficiently many samples, the ERM solution generalizes in two directions: both to new individuals, and to new classification tasks, drawn from their corresponding distributions. Finally we implement our algorithm and empirically verify its effectiveness.

IJCAI Conference 2019 Conference Paper

Equilibrium Characterization for Data Acquisition Games

  • Jinshuo Dong
  • Hadi Elzayn
  • Shahin Jabbari
  • Michael Kearns
  • Zachary Schutzman

We study a game between two firms which each provide a service based on machine learning. The firms are presented with the opportunity to purchase a new corpus of data, which will allow them to potentially improve the quality of their products. The firms can decide whether or not they want to buy the data, as well as which learning model to build on that data. We demonstrate a reduction from this potentially complicated action space to a one-shot, two-action game in which each firm only decides whether or not to buy the data. The game admits several regimes which depend on the relative strength of the two firms at the outset and the price at which the data is being offered. We analyze the game's Nash equilibria in all parameter regimes and demonstrate that, in expectation, the outcome of the game is that the initially stronger firm's market position weakens whereas the initially weaker firm's market position becomes stronger. Finally, we consider the perspective of the users of the service and demonstrate that the expected outcome at equilibrium is not the one which maximizes the welfare of the consumers.
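
Since the reduced game is one-shot with two actions per firm, its pure Nash equilibria can be enumerated by checking unilateral deviations. A toy sketch, with the payoff matrices as assumed inputs:

```python
from itertools import product

def pure_nash_2x2(payoff_a, payoff_b):
    """Enumerate pure Nash equilibria of the reduced 2x2 game.
    payoff_x[i][j] is firm X's payoff when firm A plays i and firm B
    plays j (0 = don't buy the data, 1 = buy)."""
    equilibria = []
    for i, j in product(range(2), range(2)):
        a_ok = payoff_a[i][j] >= payoff_a[1 - i][j]  # A can't gain by deviating
        b_ok = payoff_b[i][j] >= payoff_b[i][1 - j]  # B can't gain by deviating
        if a_ok and b_ok:
            equilibria.append((i, j))
    return equilibria
```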

IJCAI Conference 2019 Conference Paper

Network Formation under Random Attack and Probabilistic Spread

  • Yu Chen
  • Shahin Jabbari
  • Michael Kearns
  • Sanjeev Khanna
  • Jamie Morgenstern

We study a network formation game where agents receive benefits by forming connections to other agents but also incur both direct and indirect costs from the formed connections. Specifically, once the agents have purchased their connections, an attack starts at a randomly chosen vertex in the network and spreads according to the independent cascade model with a fixed probability, destroying any infected agents. The utility or welfare of an agent in our game is defined to be the expected size of the agent's connected component post-attack minus her expenditure in forming connections. Our goal is to understand the properties of the equilibrium networks formed in this game. Our first result concerns the edge density of equilibrium networks. A network connection increases both the likelihood of remaining connected to other agents after an attack and the likelihood of getting infected by a cascading spread of infection. We show that the latter concern primarily prevails and any equilibrium network in our game contains only $O(n\log n)$ edges where $n$ denotes the number of agents. On the other hand, there are equilibrium networks that contain $\Omega(n)$ edges, showing that our edge density bound is tight up to a logarithmic factor. Our second result shows that the presence of attack and its spread through a cascade does not significantly lower social welfare as long as the network is not too dense. We show that any non-trivial equilibrium network with $O(n)$ edges has $\Theta(n^2)$ social welfare, asymptotically similar to the social welfare guarantee in the game without any attacks.

NeurIPS Conference 2018 Conference Paper

Online Learning with an Unknown Fairness Metric

  • Stephen Gillen
  • Christopher Jung
  • Michael Kearns
  • Aaron Roth

We consider the problem of online learning in the linear contextual bandits setting, but in which there are also strong individual fairness constraints governed by an unknown similarity metric. These constraints demand that we select similar actions or individuals with approximately equal probability [DHPRZ12], which may be at odds with optimizing reward, thus modeling settings where profit and social policy are in tension. We assume we learn about an unknown Mahalanobis similarity metric from only weak feedback that identifies fairness violations, but does not quantify their extent. This is intended to represent the interventions of a regulator who "knows unfairness when he sees it" but nevertheless cannot enunciate a quantitative fairness metric over individuals. Our main result is an algorithm in the adversarial context setting that has a number of fairness violations that depends only logarithmically on $T$, while obtaining an optimal $O(\sqrt{T})$ regret bound to the best fair policy.

RLDM Conference 2017 Conference Abstract

Fairness in Reinforcement Learning

  • Shahin Jabbari
  • Matthew Joseph
  • Michael Kearns
  • Jamie Morgenstern
  • Aaron Roth

We initiate the study of fair reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. We define a fairness constraint requiring that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take exponentially many rounds in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
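
The fairness constraint admits a direct single-state check: one action may be played with higher probability than another only if its long-term reward is not (more than eps) lower. A small illustrative sketch of the approximate version, with hypothetical names:

```python
def satisfies_fairness(action_probs, q_values, eps=0.0):
    """Check the (approximate) fairness constraint at one state: action a
    may be preferred (played with higher probability) over action b only
    if Q(a) >= Q(b) - eps."""
    n = len(action_probs)
    return all(q_values[a] >= q_values[b] - eps
               for a in range(n) for b in range(n)
               if action_probs[a] > action_probs[b])
```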

NeurIPS Conference 2016 Conference Paper

Fairness in Learning: Classic and Contextual Bandits

  • Matthew Joseph
  • Michael Kearns
  • Jamie Morgenstern
  • Aaron Roth

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on “chained” confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
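
The chaining idea can be sketched as follows: arms are linked when their confidence intervals overlap, and the algorithm randomizes uniformly over every arm chained to the empirically best one, so no arm is disfavored unless the data can already distinguish it. A toy sketch assuming standard per-arm confidence intervals, not the paper's exact algorithm:

```python
import numpy as np

def chained_arms(lower, upper):
    """Arms linked to the arm with the highest upper confidence bound
    through a chain of overlapping intervals; a fair algorithm plays
    uniformly at random over this set."""
    order = np.argsort(upper)[::-1]        # arms by upper bound, descending
    chain, chain_low = {int(order[0])}, lower[order[0]]
    for a in order[1:]:
        if upper[a] >= chain_low:          # interval overlaps the chain so far
            chain.add(int(a))
            chain_low = min(chain_low, lower[a])
        else:
            break
    return chain
```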

IJCAI Conference 2016 Conference Paper

Tight Policy Regret Bounds for Improving and Decaying Bandits

  • Hoda Heidari
  • Michael Kearns
  • Aaron Roth

We consider a variant of the well-studied multi-armed bandit problem in which the reward from each action evolves monotonically in the number of times the decision maker chooses to take that action. We are motivated by settings in which we must give a series of homogeneous tasks to a finite set of arms (workers) whose performance may improve (due to learning) or decay (due to loss of interest) with repeated trials. We assume that the arm-dependent rates at which the rewards change are unknown to the decision maker, and propose algorithms with provably optimal policy regret bounds, a much stronger notion than the often-studied external regret. For the case where the rewards are increasing and concave, we give an algorithm whose policy regret is sublinear and has a (provably necessary) dependence on the time required to distinguish the optimal arm from the rest. We illustrate the behavior and performance of this algorithm via simulations. For the decreasing case, we present a simple greedy approach and show that the policy regret of this algorithm is constant and upper bounded by the number of arms.
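
For the decaying case, the greedy approach can be sketched simply: after one pull of each arm, repeatedly pull the arm whose most recent reward was highest (with non-increasing rewards, an arm's last observation upper-bounds everything it will pay in the future). A hedged sketch, not the paper's exact pseudocode:

```python
def greedy_decaying(arms, horizon):
    """`arms` is a list of callables; each call returns that arm's next
    reward, assumed non-increasing in the number of pulls of that arm."""
    last = [arm() for arm in arms]                 # pull every arm once
    total = sum(last)
    for _ in range(horizon - len(arms)):
        best = max(range(len(arms)), key=last.__getitem__)
        last[best] = arms[best]()                  # re-pull the current best
        total += last[best]
    return total
```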

AAAI Conference 2015 Conference Paper

Online Learning and Profit Maximization from Revealed Preferences

  • Kareem Amin
  • Rachel Cummings
  • Lili Dworkin
  • Michael Kearns
  • Aaron Roth

We consider the problem of learning from revealed preferences in an online setting. In our framework, each period a consumer buys an optimal bundle of goods from a merchant according to her (linear) utility function and current prices, subject to a budget constraint. The merchant observes only the purchased goods, and seeks to adapt prices to optimize his profits. We give an efficient algorithm for the merchant’s problem that consists of a learning phase in which the consumer’s utility function is (perhaps partially) inferred, followed by a price optimization step. We also give an alternative online learning algorithm for the setting where prices are set exogenously, but the merchant would still like to predict the bundle that will be bought by the consumer, for purposes of inventory or supply chain management. In contrast with most prior work on the revealed preferences problem, we demonstrate that by making stronger assumptions on the form of utility functions, efficient algorithms for both learning and profit maximization are possible, even in adaptive, online settings.
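
As one natural instantiation of the consumer's step: with linear utility over divisible goods and a budget constraint, the optimal bundle is greedy in utility per unit price (a fractional knapsack). A sketch under those assumptions, which may differ in detail from the paper's model:

```python
def optimal_bundle(utilities, prices, budget):
    """Consumer's optimal bundle for linear utility over divisible goods
    in [0, 1]: buy in decreasing order of utility-per-price until the
    budget runs out (possibly a fraction of the last good)."""
    order = sorted(range(len(prices)),
                   key=lambda i: utilities[i] / prices[i], reverse=True)
    bundle = [0.0] * len(prices)
    for i in order:
        if budget <= 0:
            break
        bundle[i] = min(1.0, budget / prices[i])
        budget -= bundle[i] * prices[i]
    return bundle
```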

AAAI Conference 2014 Conference Paper

New Models for Competitive Contagion

  • Moez Draief
  • Hoda Heidari
  • Michael Kearns

In this paper, we introduce and examine two new models for competitive contagion in networks, a game-theoretic generalization of the viral marketing problem. In our setting, firms compete to maximize their market share in a network of consumers whose adoption decisions are stochastically determined by the choices of their neighbors. Building on the switching-selecting framework introduced by Goyal and Kearns, we first introduce a new model in which the payoff to firms comprises not only the number of vertices who adopt their (competing) technologies, but also the network connectivity among those nodes. For a general class of stochastic dynamics driving the local adoption process, we derive upper bounds on (1) the (pure strategy) Price of Anarchy (PoA), which measures the inefficiency of resource use at equilibrium, and (2) the Budget Multiplier, which captures the extent to which the network amplifies the imbalances in the firms’ initial budgets. These bounds depend on the firm budgets and the maximum degree of the network, but no other structural properties. In addition, we give general conditions under which the PoA and the Budget Multiplier can be unbounded. We also introduce a model in which budgeting decisions are endogenous, rather than externally given as is typical in the viral marketing problem. In this setting, the firms are allowed to choose the number of seeds to initially infect (at a fixed cost per seed), as well as which nodes to select as seeds. In sharp contrast to the results of Goyal and Kearns, we show that for almost any local adoption dynamics, there exists a family of graphs for which the PoA and Budget Multiplier are unbounded.

NeurIPS Conference 2013 Conference Paper

Marginals-to-Models Reducibility

  • Tim Roughgarden
  • Michael Kearns

We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle for the Ellipsoid Method is provided by the target problem. This technique may be of independent interest in probabilistic inference.

AAMAS Conference 2012 Conference Paper

Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models

  • Quang Duong
  • Michael Wellman
  • Satinder Singh
  • Michael Kearns

Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. History-dependent graphical multiagent models (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishing two types of agent interactions. This distinction opens the opportunity for learning dependence networks that are different from given graphical structures representing observed agent interactions. We propose a greedy algorithm for learning hGMMs from time-series data, inducing both graphical structure and parameters. Our empirical study employs human-subject experiment data for a dynamic consensus scenario, where agents on a network attempt to reach a unanimous vote. We show that the learned hGMMs directly expressing joint behavior outperform alternatives in predicting dynamic human voting behavior, and end-game vote results. Analysis of learned graphical structures reveals patterns of action dependence not directly reflected in the original experiment networks.

AAAI Conference 2010 Conference Paper

Private and Third-Party Randomization in Risk-Sensitive Equilibrium Concepts

  • Mickey Brautbar
  • Michael Kearns
  • Umar Syed

We consider risk-sensitive generalizations of Nash and correlated equilibria in noncooperative games. We prove that, except for a class of degenerate games, unless a two-player game has a pure Nash equilibrium, it does not have a risk-sensitive Nash equilibrium. We also show that every game has a risk-sensitive correlated equilibrium. The striking contrast between these existence results is due to the different sources of randomization in Nash (private randomization) and correlated equilibria (third-party randomization).

JMLR Journal 2008 Journal Article

Learning from Multiple Sources

  • Koby Crammer
  • Michael Kearns
  • Jennifer Wortman

We consider the problem of learning accurate models from multiple sources of "nearby" data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.

NeurIPS Conference 2007 Conference Paper

Privacy-Preserving Belief Propagation and Sampling

  • Michael Kearns
  • Jinsong Tan
  • Jennifer Wortman

We provide provably privacy-preserving versions of belief propagation, Gibbs sampling, and other local algorithms — distributed multiparty protocols in which each party or vertex learns only its final local value, and absolutely nothing else.

NeurIPS Conference 2006 Conference Paper

A Small World Threshold for Economic Network Formation

  • Eyal Even-Dar
  • Michael Kearns

We introduce a game-theoretic model for network formation inspired by earlier stochastic models that mix localized and long-distance connectivity. In this model, players may purchase edges at distance d at a cost of $d^\alpha$, and wish to minimize the sum of their edge purchases and their average distance to other players. In this model, we show there is a striking "small world" threshold phenomenon: in two dimensions, if $\alpha \geq 2$ then every Nash equilibrium results in a network whose diameter grows as a root of the network size, and thus is unbounded. We contrast our results with those of Kleinberg [8] in a stochastic model, and empirically investigate the "navigability" of equilibrium networks. Our theoretical results all generalize to higher dimensions.

JAAMAS Journal 2006 Journal Article

Cobot in LambdaMOO: An Adaptive Social Statistics Agent

  • Charles Lee Isbell Jr.
  • Michael Kearns
  • Dave Kormann

We describe our development of Cobot, a novel software agent who lives in LambdaMOO, a popular virtual world frequented by hundreds of users. Cobot's goal was to become an actual part of that community. Here, we present a detailed discussion of the functionality that made him one of the objects most frequently interacted with in LambdaMOO, human or artificial. Cobot's fundamental power is that he has the ability to collect social statistics summarizing the quantity and quality of interpersonal interactions. Initially, Cobot acted as little more than a reporter of this information; however, as he collected more and more data, he was able to use these statistics as models that allowed him to modify his own behavior. In particular, Cobot is able to use this data to "self-program," learning the proper way to respond to the actions of individual users, by observing how others interact with one another. Further, Cobot uses reinforcement learning to proactively take action in this complex social environment, and adapts his behavior based on multiple sources of human reward. Cobot represents a unique experiment in building adaptive agents who must live in and navigate social spaces.

NeurIPS Conference 2006 Conference Paper

Learning from Multiple Sources

  • Koby Crammer
  • Michael Kearns
  • Jennifer Wortman

We consider the problem of learning accurate models from multiple sources of "nearby" data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields results for classification and regression generally, and for density estimation within the exponential family. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest.

NeurIPS Conference 2005 Conference Paper

Learning from Data of Variable Quality

  • Koby Crammer
  • Michael Kearns
  • Jennifer Wortman

We initiate the study of learning from multiple sources of limited data, each of which may be corrupted at a different rate. We develop a complete theory of which data sources should be used for two fundamental problems: estimating the bias of a coin, and learning a classifier in the presence of label noise. In both cases, efficient algorithms are provided for computing the optimal subset of data.
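
For the coin-bias problem, the flavor of the result can be illustrated by a simple bound minimization: sort sources by corruption rate and include a prefix, trading the bias introduced by corrupted sources against the variance reduction from more data. An illustrative sketch with a Hoeffding-style deviation term; the paper's exact bound differs:

```python
import numpy as np

def choose_sources(sizes, corruption_rates, delta=0.05):
    """Pick the prefix of sources (sorted by corruption rate) minimizing
    an illustrative error bound: worst-case bias contributed by corrupted
    samples plus a Hoeffding deviation term for the pooled sample."""
    order = np.argsort(corruption_rates)
    sizes = np.asarray(sizes, dtype=float)[order]
    rates = np.asarray(corruption_rates, dtype=float)[order]
    best_k, best_bound = 1, np.inf
    for k in range(1, len(sizes) + 1):
        n = sizes[:k].sum()
        bias = (sizes[:k] * rates[:k]).sum() / n      # corrupted mass
        deviation = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
        if bias + deviation < best_bound:
            best_k, best_bound = k, bias + deviation
    return list(order[:best_k]), best_bound
```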

NeurIPS Conference 2004 Conference Paper

Economic Properties of Social Networks

  • Sham Kakade
  • Michael Kearns
  • Luis Ortiz
  • Robin Pemantle
  • Siddharth Suri

We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations.

NeurIPS Conference 2003 Conference Paper

Algorithms for Interdependent Security Games

  • Michael Kearns
  • Luis Ortiz

Inspired by events ranging from 9/11 to the collapse of the accounting firm Arthur Andersen, economists Kunreuther and Heal [5] recently introduced an interesting game-theoretic model for problems of interdependent security (IDS), in which a large number of players must make individual investment decisions related to security — whether physical, financial, medical, or some other type — but in which the ultimate safety of each participant may depend in a complex way on the actions of the entire population. A simple example is the choice of whether to install a fire sprinkler system in an individual condominium in a large building. While such a system might greatly reduce the chances of the owner's property being destroyed by a fire originating within their own unit, it might do little or nothing to reduce the chances of damage caused by fires originating in other units (since sprinklers can usually only douse small fires early). If "enough" other unit owners have not made the investment in sprinklers, it may not be cost-effective for any individual to do so.
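
A toy version of the sprinkler decision makes the externality concrete: an owner invests only if the cost is below the reduction in expected loss, and that reduction shrinks as more neighbors stay uninvested. A hedged numerical sketch, not the exact form of the Kunreuther-Heal model:

```python
def invest_worthwhile(cost, p_internal, p_external_each, n_uninvested, loss):
    """Installing sprinklers removes the internal fire risk but not the
    risk of a fire spreading in from an uninvested neighbor."""
    p_no_external = (1.0 - p_external_each) ** n_uninvested
    expected_if_invest = (1.0 - p_no_external) * loss + cost
    expected_if_not = (1.0 - (1.0 - p_internal) * p_no_external) * loss
    return expected_if_invest < expected_if_not

# Alone, the investment pays for itself; surrounded by 200 uninvested
# neighbors, the residual external risk swamps the benefit:
print(invest_worthwhile(1.0, 0.05, 0.01, 0, loss=100.0))    # True
print(invest_worthwhile(1.0, 0.05, 0.01, 200, loss=100.0))  # False
```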

NeurIPS Conference 2002 Conference Paper

A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

  • Eric Allender
  • Sanjeev Arora
  • Michael Kearns
  • Cristopher Moore
  • Alexander Russell

We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.

AAAI Conference 2002 Conference Paper

CobotDS: A Spoken Dialogue System for Chat

  • Michael Kearns
  • Satinder Singh
  • Jessica Howe

We describe CobotDS, a spoken dialogue system providing access to a well-known internet chat server called LambdaMOO. CobotDS provides real-time, two-way, natural language communication between a phone user and the multiple users in the text environment. We describe a number of the challenging design issues we faced, and our use of summarization, social filtering and personalized grammars in tackling them. We report a number of empirical findings from a small user study.

NeurIPS Conference 2002 Conference Paper

Nash Propagation for Loopy Graphical Games

  • Luis Ortiz
  • Michael Kearns

We introduce NashProp, an iterative and local message-passing algorithm for computing Nash equilibria in multi-player games represented by arbitrary undirected graphs. We provide a formal analysis and experimental evidence demonstrating that NashProp performs well on large graphical games with many loops, often converging in just a dozen iterations on graphs with hundreds of nodes. NashProp generalizes the tree algorithm of (Kearns et al. 2001), and can be viewed as similar in spirit to belief propagation in probabilistic inference, and thus complements the recent work of (Vickrey and Koller 2002), who explored a junction tree approach. Thus, as for probabilistic inference, we have at least two promising general-purpose approaches to equilibria computation in graphs.

NeurIPS Conference 2001 Conference Paper

An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games

  • Michael Littman
  • Michael Kearns
  • Satinder Singh

We describe a new algorithm for computing a Nash equilibrium in graphical games, a compact representation for multi-agent systems that we introduced in previous work. The algorithm is the first to compute equilibria both efficiently and exactly for a non-trivial class of graphical games.

AAAI Conference 2000 Conference Paper

Cobot in LambdaMOO: A Social Statistics Agent

  • Charles Lee Isbell
  • Michael Kearns
  • Satinder Singh

We describe our development of Cobot, a software agent who lives in LambdaMOO, a popular virtual world frequented by hundreds of users. We present a detailed discussion of the functionality that has made him one of the objects most frequently interacted with in LambdaMOO, human or artificial.

IJCAI Conference 1999 Conference Paper

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

  • Michael Kearns
  • Yishay Mansour
  • Andrew Y Ng

An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor $\gamma$ and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN99].
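
The core recursion is compact enough to sketch: estimate each action's value by drawing C successor states from the simulator and recursing to a fixed depth, so the cost depends on the horizon and C but not on the number of states. A minimal sketch assuming a simulator interface sim(state, action) -> (next_state, reward):

```python
def q_estimates(sim, actions, state, depth, C, gamma):
    """Sparse-sampling estimates of Q(state, a) for every action a, using
    C sampled successors per action; total work is (|actions| * C)^depth,
    independent of the number of states."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(C):
            next_state, reward = sim(state, a)     # generative model call
            future = max(q_estimates(sim, actions, next_state,
                                     depth - 1, C, gamma).values())
            total += reward + gamma * future
        q[a] = total / C
    return q

def plan(sim, actions, state, depth=3, C=5, gamma=0.9):
    q = q_estimates(sim, actions, state, depth, C, gamma)
    return max(q, key=q.get)                       # near-optimal action
```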

NeurIPS Conference 1999 Conference Paper

Approximate Planning in Large POMDPs via Reusable Trajectories

  • Michael Kearns
  • Yishay Mansour
  • Andrew Ng

We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies $\Pi$ in a partially observable Markov decision process (POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample complexity showing that, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class $\Pi$, and exponentially on the horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms. Our measure of complexity generalizes the classical supervised learning notion of VC dimension to the settings of reinforcement learning and planning.

IJCAI Conference 1999 Conference Paper

Efficient Reinforcement Learning in Factored MDPs

  • Michael Kearns
  • Daphne Koller

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) of the DBN. Unlike the original algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.

NeurIPS Conference 1999 Conference Paper

Reinforcement Learning for Spoken Dialogue Systems

  • Satinder Singh
  • Michael Kearns
  • Diane Litman
  • Marilyn Walker

Recently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application of MDP algorithms to dialogue systems faces a number of severe technical challenges. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework, and have applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs. Our experiments demonstrate that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.

NeurIPS Conference 1998 Conference Paper

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

  • Michael Kearns
  • Satinder Singh

In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed. In particular, on the order of only $(N\log(1/\epsilon)/\epsilon^2)(\log N + \log\log(1/\epsilon))$ transitions are sufficient for both algorithms to come within $\epsilon$ of the optimal policy, in an idealized model that assumes the observed transitions are "well-mixed" throughout an $N$-state MDP. Thus, the two approaches have roughly the same sample complexity. Perhaps surprisingly, this sample complexity is far less than what is required for the model-based approach to actually construct a good approximation to the next-state distribution. The result also shows that the amount of memory required by the model-based approach is closer to $N$ than to $N^2$. For either approach, to remove the assumption that the observed transitions are well-mixed, we consider a model in which the transitions are determined by a fixed, arbitrary exploration policy. Bounds on the number of transitions required in order to achieve a desired level of performance are then related to the stationary distribution and mixing time of this policy.
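
For reference, a standard tabular Q-learning update of the kind whose finite-sample behavior is analyzed here (the indirect approach instead fits empirical next-state distributions and runs value iteration offline). A minimal sketch with an illustrative learning-rate schedule, using dict-of-dicts tables:

```python
def q_learning_step(Q, counts, s, a, r, s_next, gamma=0.95):
    """One tabular Q-learning update with a per-pair decaying rate.
    Q maps state -> {action: value}; counts tracks visits per (s, a)."""
    counts[(s, a)] = counts.get((s, a), 0) + 1
    alpha = 1.0 / counts[(s, a)]                   # illustrative schedule
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
```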

NeurIPS Conference 1998 Conference Paper

Inference in Multilayer Networks via Large Deviation Bounds

  • Michael Kearns
  • Lawrence Saul

We study probabilistic inference in large, layered Bayesian networks represented as directed acyclic graphs. We show that the intractability of exact inference in such networks does not preclude their effective use. We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with large numbers of parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit of large networks, and provide rates of convergence.

AAAI Conference 1996 Conference Paper

Boosting Theory Towards Practice: Recent Developments in Decision Tree Induction and the Weak Learning Framework

  • Michael Kearns

One of the original goals of computational learning theory was that of formulating models that permit meaningful comparisons between the different machine learning heuristics that are used in practice [Kearns et al., 1987]. Despite the other successes of computational learning theory, this goal has proven elusive. Empirically successful machine learning algorithms such as C4.5 and the backpropagation algorithm for neural networks have not met the criteria of the well-known Probably Approximately Correct (PAC) model [Valiant, 1984] and its variants, and thus such models are of little use in drawing distinctions among the heuristics used in applications. Conversely, the algorithms suggested by computational learning theory are usually too limited in various ways to find wide application.