Arrow Research search

Author name cluster

Tom M. Mitchell

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

ICLR Conference 2024 Conference Paper

SmartPlay: A Benchmark for LLMs as Intelligent Agents

  • Yue Wu 0001
  • Xuan Tang
  • Tom M. Mitchell
  • Yuanzhi Li

Recent large language models (LLMs) have demonstrated great potential toward intelligent agents and next-gen automation, but a systematic benchmark for evaluating LLMs' abilities as agents is currently lacking. We introduce SmartPlay: both a challenging benchmark and a methodology for evaluating LLMs as agents. SmartPlay consists of 6 different games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft. Each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations. Each game in SmartPlay uniquely challenges a subset of 9 important capabilities of an intelligent LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness. The distinction between the sets of capabilities each game tests allows us to analyze each capability separately. SmartPlay serves not only as a rigorous testing ground for evaluating the overall performance of LLM agents but also as a road-map for identifying gaps in current methodologies. We release our benchmark at https://github.com/microsoft/SmartPlay

NeurIPS Conference 2023 Conference Paper

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals

  • Yue Wu
  • Yewen Fan
  • Paul Pu Liang
  • Amos Azaria
  • Yuanzhi Li
  • Tom M. Mitchell

High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. An auxiliary reward is then provided to a standard A2C RL agent when interaction is detected. Experimentally, various RL algorithms obtain significant improvement in performance and training speed when assisted by our design. Code at github.com/Holmeswww/RnR

NeurIPS Conference 2023 Conference Paper

SPRING: Studying Papers and Reasoning to play Games

  • Yue Wu
  • So Yeon Min
  • Shrimai Prabhumoye
  • Yonatan Bisk
  • Russ R. Salakhutdinov
  • Amos Azaria
  • Tom M. Mitchell
  • Yuanzhi Li

Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement learning (RL) being popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, to read Crafter's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM's answer to the final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. Finally, we show the potential of Crafter as a test bed for LLMs. Code at github.com/holmeswww/SPRING
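The traversal described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the SPRING implementation: the function names, the question texts, and the `ask` callback (a stand-in for an LLM call) are all hypothetical, and the only library used is Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter


def canned_llm(question, context, prior_answers):
    """Hypothetical stand-in for an LLM call; returns a canned string."""
    return f"answer to: {question}"


def traverse_question_dag(questions, dependencies, context, ask=canned_llm):
    """Answer every question node after all the nodes it depends on.

    questions:    node name -> question text (assumed structure)
    dependencies: node name -> set of node names it depends on
    Returns the visiting order and a dict of answers per node.
    """
    # static_order() yields each node only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    answers = {}
    for node in order:
        # Pass only the answers this node depends on.
        prior = {dep: answers[dep] for dep in dependencies.get(node, ())}
        answers[node] = ask(questions[node], context, prior)
    return order, answers


if __name__ == "__main__":
    questions = {
        "observation": "What is visible right now?",
        "needs": "What does the agent need most?",
        "action": "Which action should be taken next?",
    }
    dependencies = {
        "observation": set(),
        "needs": {"observation"},
        "action": {"observation", "needs"},
    }
    order, answers = traverse_question_dag(questions, dependencies, "game context")
    # In this sketch, the answer to the last node plays the role of the action.
    print(order[-1], "->", answers[order[-1]])
```

Passing each node only the answers of its own dependencies keeps every prompt small; whether the original system does the same is not stated in the abstract.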

NeSy Conference 2023 Conference Paper

The Roles of Symbols in Neural-based AI: They are Not What You Think!

  • Daniel L. Silver
  • Tom M. Mitchell

We present a novel neuro-symbolic hypothesis and an architecture for intelligent agents that combines subsymbolic representations for symbols and concepts for learning and reasoning. We argue that symbols will remain critical to the future of intelligent systems NOT because they are the fundamental building blocks of thought, but because they characterize the subsymbolic processes that constitute thought. In [1] we begin by defining terminology for discussing the neural encoding of symbols and concepts, and describing the key questions we seek to answer about neuro-symbolic systems. We then present relevant research results from neuroscience, behavioral (cognitive) science, and artificial intelligence, that yield evidence about the combination of symbolic and subsymbolic processing in humans and current artificial neural networks. Guided by this evidence, we present a novel neuro-symbolic hypothesis and an associated architecture meant to provide a plausible answer to the question of how humans might implement neuro-symbolic reasoning, and how future intelligent agents might be designed to do so as well.

ICLR Conference 2020 Conference Paper

Jelly Bean World: A Testbed for Never-Ending Learning

  • Emmanouil Antonios Platanios
  • Abulhair Saparov
  • Tom M. Mitchell

Machine learning has shown growing success in recent years. However, current machine learning systems are highly specialized, trained for particular problems or domains, and typically on a single narrow dataset. Human learning, on the other hand, is highly general and adaptable. Never-ending learning is a machine learning paradigm that aims to bridge this gap, with the goal of encouraging researchers to design machine learning systems that can learn to perform a wider variety of inter-related tasks in more complex environments. To date, there is no environment or testbed to facilitate the development and evaluation of never-ending learning systems. To this end, we propose the Jelly Bean World testbed. The Jelly Bean World allows experimentation over two-dimensional grid worlds which are filled with items and in which agents can navigate. This testbed provides environments that are sufficiently complex and where more generally intelligent algorithms ought to perform better than current state-of-the-art reinforcement learning approaches. It does so by producing non-stationary environments and facilitating experimentation with multi-task, multi-agent, multi-modal, and curriculum learning settings. We hope that this new freely-available software will prompt new research and interest in the development and evaluation of never-ending learning systems and more broadly, general intelligence systems.

NeurIPS Conference 2020 Conference Paper

Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction

  • Mariya Toneva
  • Otilia Stretcu
  • Barnabas Poczos
  • Leila Wehbe
  • Tom M. Mitchell

How meaning is represented in the brain is still one of the big open questions in neuroscience. Does a word (e.g., bird) always have the same representation, or does the task under which the word is processed alter its representation (answering "can you eat it?" versus "can it fly?")? The brain activity of subjects who read the same word while performing different semantic tasks has been shown to differ across tasks. However, it is still not understood how the task itself contributes to this difference. In the current work, we study Magnetoencephalography (MEG) brain recordings of participants tasked with answering questions about concrete nouns. We investigate the effect of the task (i.e., the question being asked) on the processing of the concrete noun by predicting the millisecond-resolution MEG recordings as a function of both the semantics of the noun and the task. Using this approach, we test several hypotheses about the task-stimulus interactions by comparing the zero-shot predictions made by these hypotheses for novel tasks and nouns not seen during training. We find that incorporating the task semantics significantly improves the prediction of MEG recordings, across participants. The improvement occurs 475-550 ms after the participants first see the word, which corresponds to what is considered to be the ending time of semantic processing for a word. These results suggest that only the end of semantic processing of a word is task-dependent, and pose a challenge for future research to formulate new hypotheses for earlier task effects as a function of the task and stimuli.

JAAMAS Journal 2019 Journal Article

An agent for learning new natural language commands

  • Amos Azaria
  • Shashank Srivastava
  • Tom M. Mitchell

Teaching via natural language is an intuitive way for end users to add functionality to a virtual assistant, enabling them to personalize their assistant with new commands without requiring the intervention of the system developer, who cannot possibly anticipate all of an end user’s needs. In this paper we introduce our Learning by Instruction Agent (LIA), the first virtual assistant, for an email domain, that is capable of learning how to perform new commands taught by end users in natural language. LIA grounds the semantics of each command in terms of primitive executable procedures. When a user provides LIA with a command that it does not understand, it prompts the user to explain the command through a sequence of natural language steps. From this input, LIA learns the meaning of the new command and how to generalize the command to novel situations. For example, having been taught how to “forward an email to Alice”, it can correctly understand “forward this email to Bob”. We show that users who were assigned to interact with LIA completed the task more quickly than users assigned to interact with a non-learning agent. These results demonstrate the potential of natural language teaching to improve the capabilities of intelligent personal assistants. We annotated 4759 natural language statements with their associated computer readable execution commands (logical forms) to form a dataset (which we publicize in this paper). We present the performance of several different parser methods on this dataset.

ICML Conference 2016 Conference Paper

Estimating Accuracy from Unlabeled Data: A Bayesian Approach

  • Emmanouil Antonios Platanios
  • Avinava Dubey
  • Tom M. Mitchell

We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers, and the related question of how outputs from several classifiers performing the same task can be combined based on their estimated accuracies. To answer these questions, we first present a simple graphical model that performs well in practice. We then provide two nonparametric extensions to it that improve its performance. Experiments on two real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies, and combining multiple classifier outputs.

UAI Conference 2014 Conference Paper

Estimating Accuracy from Unlabeled Data

  • Emmanouil Antonios Platanios
  • Avrim Blum
  • Tom M. Mitchell

We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers. This is an important question for any autonomous learning system that must estimate its accuracy without supervision, and also when classifiers trained from one data distribution must be applied to a new distribution (e.g., document classifiers trained on one text corpus are to be applied to a second corpus). We first show how to estimate error rates exactly from unlabeled data when given a collection of competing classifiers that make independent errors, based on the agreement rates between subsets of these classifiers. We further show that even when the competing classifiers do not make independent errors, both their accuracies and error dependencies can be estimated by making certain relaxed assumptions. Experiments on two real-world data sets produce estimates within a few percent of the true accuracy, using solely unlabeled data. These results are of practical significance in situations where labeled data is scarce and shed light on the more general question of how the consistency among multiple functions is related to their true accuracies.
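To make the independent-errors case concrete, here is a minimal sketch for three binary classifiers. It is an illustration under stated assumptions, not the paper's estimator: it assumes binary labels, pairwise independent errors, and error rates below 0.5, and the function names are hypothetical. Under those assumptions two classifiers agree when both are right or both are wrong, so the agreement rate is a_ij = (1 - e_i)(1 - e_j) + e_i e_j, which rearranges to 2 a_ij - 1 = (1 - 2 e_i)(1 - 2 e_j). Three pairwise agreement rates then determine the three error rates.

```python
import math


def agreement_rate(e_i, e_j):
    """Agreement probability of two binary classifiers with independent errors."""
    return (1 - e_i) * (1 - e_j) + e_i * e_j


def error_rates_from_agreement(a12, a13, a23):
    """Recover three error rates from pairwise agreement rates.

    Assumes binary labels, independent errors, and every error rate < 0.5,
    so that each c_ij = 2 * a_ij - 1 is strictly positive.
    """
    c12, c13, c23 = (2 * a - 1 for a in (a12, a13, a23))
    if min(c12, c13, c23) <= 0:
        raise ValueError("agreement rates must exceed 0.5 under these assumptions")
    # With d_i = 1 - 2 * e_i we have c_ij = d_i * d_j, hence d_1^2 = c12 * c13 / c23.
    d1 = math.sqrt(c12 * c13 / c23)
    d2 = math.sqrt(c12 * c23 / c13)
    d3 = math.sqrt(c13 * c23 / c12)
    return tuple((1 - d) / 2 for d in (d1, d2, d3))


if __name__ == "__main__":
    true_errors = (0.1, 0.2, 0.3)
    a12 = agreement_rate(true_errors[0], true_errors[1])
    a13 = agreement_rate(true_errors[0], true_errors[2])
    a23 = agreement_rate(true_errors[1], true_errors[2])
    print(error_rates_from_agreement(a12, a13, a23))
```

In practice the agreement rates would be measured on unlabeled data, so the recovered error rates are estimates whose quality depends on how well the independence assumption holds.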

ECAI Conference 2012 Invited Paper

Never Ending Learning

  • Tom M. Mitchell

We will never really understand learning or intelligence until we can build machines that learn many different things, over years, and become better learners over time. This talk describes our research to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web, and integrates these into its growing knowledge base of beliefs. Each day NELL also learns to read better than yesterday, enabling it to go back to the text it read yesterday, and extract more facts, more accurately. NELL has been running 24 hours/day for over two years now. The result so far is a collection of 15 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)), that NELL is considering at different levels of confidence, along with hundreds of thousands of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. Track NELL's progress at http://rtw.ml.cmu.edu.

YNIMG Journal 2011 Journal Article

Commonality of neural representations of words and pictures

  • Svetlana V. Shinkareva
  • Vicente L. Malave
  • Robert A. Mason
  • Tom M. Mitchell
  • Marcel Adam Just

In this work we explore whether the patterns of brain activity associated with thinking about concrete objects are dependent on stimulus presentation format, whether an object is referred to by a written or pictorial form. Multi-voxel pattern analysis methods were applied to brain imaging (fMRI) data to identify the item category associated with brief viewings of each of 10 words (naming 5 tools and 5 dwellings) and, separately, with brief viewings of each of 10 pictures (line drawings) of the objects named by the words. These methods were able to identify the category of the picture the participant was viewing, based on neural activation patterns observed during word-viewing, and identify the category of the word the participant was viewing, based on neural activation patterns observed during picture-viewing, using data from only that participant or only from other participants. These results provide an empirical demonstration of object category identification across stimulus formats and across participants. In addition, we were able to identify the category of the word that the participant was viewing based on the patterns of neural activation generated during word-viewing by that participant or by all other participants. Similarly, we were able to identify with even higher accuracy the category of the picture the participant was viewing, based on the patterns of neural activation demonstrated during picture-viewing by that participant or by all other participants. The brain locations that were important for category identification were similar across participants and were distributed throughout the cortex where various object properties might be neurally represented. These findings indicate consistent triggering of semantic representations using different stimulus formats and suggest the presence of stable, distributed, and identifiable neural states that are common to pictorial and verbal input referring to object categories.

YNIMG Journal 2009 Journal Article

Modeling fMRI data generated by overlapping cognitive processes with unknown onsets using Hidden Process Models

  • Rebecca A. Hutchinson
  • Radu Stefan Niculescu
  • Timothy A. Keller
  • Indrayana Rustandi
  • Tom M. Mitchell

We present a new method for modeling fMRI time series data called Hidden Process Models (HPMs). Like several earlier models for fMRI analysis, Hidden Process Models assume that the observed data is generated by a sequence of underlying mental processes that may be triggered by stimuli. HPMs go beyond these earlier models by allowing for processes whose timing may be unknown, and that might not be directly tied to specific stimuli. HPMs provide a principled, probabilistic framework for simultaneously learning the contribution of each process to the observed data, as well as the timing and identities of each instantiated process. They also provide a framework for evaluating and selecting among competing models that assume different numbers and types of underlying mental processes. We describe the HPM framework and its learning and inference algorithms, and present experimental results demonstrating its use on simulated and real fMRI data. Our experiments compare several models of the data using cross-validated data log-likelihood in an fMRI study involving overlapping mental processes whose timings are not fully known.

IJCAI Conference 2007 Conference Paper

  • Radu Stefan Niculescu
  • Tom M. Mitchell
  • R. Bharat Rao

The task of learning models for many real-world problems requires incorporating domain knowledge into learning algorithms, to enable accurate learning from a realistic volume of training data. Domain knowledge can come in many forms. For example, expert knowledge about the relevance of variables relative to a certain problem can help perform better feature selection. Domain knowledge about the conditional independence relationships among variables can help learning of the Bayesian Network structure. This paper considers a different type of domain knowledge for constraining parameter estimates when learning Bayesian Networks. In particular, we consider domain knowledge that comes in the form of inequality constraints among subsets of parameters in a Bayesian Network with known structure. These parameter constraints are incorporated into learning procedures for Bayesian Networks, by formulating this task as a constrained optimization problem. The main contribution of this paper is the derivation of closed form Maximum Likelihood parameter estimators in the above setting.

IROS Conference 2007 Conference Paper

Feature selection for grasp recognition from optical markers

  • Lillian Y. Chang
  • Nancy S. Pollard
  • Tom M. Mitchell
  • Eric P. Xing

Although the human hand is a complex biomechanical system, only a small set of features may be necessary for observation learning of functional grasp classes. We explore how to methodically select a minimal set of hand pose features from optical marker data for grasp recognition. Supervised feature selection is used to determine a reduced feature set of surface marker locations on the hand that is appropriate for grasp classification of individual hand poses. Classifiers trained on the reduced feature set of five markers retain at least 92% of the prediction accuracy of classifiers trained on a full feature set of thirty markers. The reduced model also generalizes better to new subjects. The dramatic reduction of the marker set size and the success of a linear classifier from local marker coordinates recommend optical marker techniques as a practical alternative to data glove methods for observation learning of grasping.

JMLR Journal 2006 Journal Article

Bayesian Network Learning with Parameter Constraints

  • Radu Stefan Niculescu
  • Tom M. Mitchell
  • R. Bharat Rao

The task of learning models for many real-world problems requires incorporating domain knowledge into learning algorithms, to enable accurate learning from a realistic volume of training data. This paper considers a variety of types of domain knowledge for constraining parameter estimates when learning Bayesian networks. In particular, we consider domain knowledge that constrains the values or relationships among subsets of parameters in a Bayesian network with known structure. We incorporate a wide variety of parameter constraints into learning procedures for Bayesian networks, by formulating this task as a constrained optimization problem. The assumptions made in module networks, dynamic Bayes nets and context specific independence models can be viewed as particular cases of such parameter constraints. We present closed form solutions or fast iterative algorithms for estimating parameters subject to several specific classes of parameter constraints, including equalities and inequalities among parameters, constraints on individual parameters, and constraints on sums and ratios of parameters, for discrete and continuous variables. Our methods cover learning from both frequentist and Bayesian points of view, from both complete and incomplete data. We present formal guarantees for our estimators, as well as methods for automatically learning useful parameter constraints from data. To validate our approach, we apply it to the domain of fMRI brain image analysis. Here we demonstrate the ability of our system to first learn useful relationships among parameters, and then to use them to constrain the training of the Bayesian network, resulting in improved cross-validated accuracy of the learned model. Experiments on synthetic data are also presented.
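As a small illustration of what a closed-form estimator under an equality constraint can look like, the sketch below handles only the simplest case: a single multinomial distribution in which some parameters are known to be equal. It is not taken from the paper; the function name and input layout are hypothetical. Maximizing the log-likelihood with a Lagrange multiplier for the sum-to-one constraint gives each tied group of size k the shared value (sum of the group's counts) / (k * N), where N is the total count, while untied parameters keep the usual n_i / N.

```python
def mle_with_tied_parameters(counts, tied_groups=()):
    """Maximum likelihood estimate of a multinomial with equality constraints.

    counts:      observed count for each outcome
    tied_groups: disjoint groups of outcome indices whose parameters must be equal
    Illustrative sketch of the simplest equality-constraint case only.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one observation is required")
    # Unconstrained estimate: relative frequencies.
    theta = [c / total for c in counts]
    for group in tied_groups:
        # Tied parameters share the group's pooled frequency equally.
        shared = sum(counts[i] for i in group) / (len(group) * total)
        for i in group:
            theta[i] = shared
    return theta


if __name__ == "__main__":
    # Outcomes 0 and 1 are assumed to share one parameter.
    print(mle_with_tied_parameters([3, 1, 6], tied_groups=[[0, 1]]))
```

Pooling counts across tied parameters is what lets such constraints reduce the amount of training data needed, since each shared parameter is estimated from more observations.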

AAAI Conference 2006 Conference Paper

Extracting Knowledge about Users’ Activities from Raw Workstation Contents

  • Tom M. Mitchell
  • Yifen Huang

A long-standing goal of AI is the development of intelligent workstation-based personal agents to assist users in their daily lives. A key impediment to this goal is the unrealistic cost of developing and maintaining a detailed knowledge base describing the user’s different activities, and which people, meetings, emails, etc. are affiliated with each such activity. This paper presents a clustering approach to automatically acquiring such a knowledge base by analyzing the raw contents of the workstation, including emails, contact person names, and online calendar meetings. Our approach analyzes the distribution of email words, the social network of email senders and recipients, and the results of Google Desktop Search queried with text from online calendar entries and person contact names. For each cluster it constructs, the program outputs a frame-based representation of the corresponding user activity. This paper describes our approach and experimentally assesses its performance over the workstations of three different users.

AAAI Conference 1996 Conference Paper

Challenge Problems for Artificial Intelligence

  • Bart Selman
  • Thomas Dean
  • Tom M. Mitchell

AI textbooks and papers often discuss the big questions, such as "how to reason with uncertainty", "how to reason efficiently", or "how to improve performance through learning." It is more difficult, however, to find descriptions of concrete problems or challenges that are still ambitious and interesting, yet not so open-ended. The goal of this panel is to formulate a set of such challenge problems for the field. Each panelist was asked to formulate one or more challenges. The emphasis is on problems for which there is a good chance that they will be resolved within the next five to ten years.

AIJ Journal 1993 Journal Article

An apprentice-based approach to knowledge acquisition

  • Sridhar Mahadevan
  • Tom M. Mitchell
  • Jack Mostow
  • Lou Steinberg
  • Prasad V. Tadepalli

We explore here the feasibility of learning apprentice programs: interactive knowledge-based assistants that learn by observing and analyzing the problem-solving steps of their users. In particular, we describe a learning apprentice for digital circuit design, called LEAP. LEAP learns feasible ways of decomposing circuit modules into submodules, as well as the recommended method when there are competing feasible decompositions. VBL is an explanation-based learning technique used in LEAP to infer problem-reduction operators for decomposing circuit modules. PED is a general extension of explanation-based learning to incomplete domain theories containing determinations. PED is used in LEAP to learn control rules for ranking alternative decompositions as well as to extend LEAP's partial theory of circuit cost. An experimental study shows that by using this approach LEAP can learn a significant subset of a manually created knowledge base for boolean circuit design. The experimental study also reveals some limitations of LEAP, and more generally suggests directions for further research in building effective learning apprentice systems.

ICRA Conference 1990 Conference Paper

Learning reliable manipulation strategies without initial physical models

  • Alan D. Christiansen
  • Matthew T. Mason
  • Tom M. Mitchell

A description is given of a robot, possessing limited sensory and effectory capabilities but no initial model of the effects of its actions on the world, that acquires such a model through exploration, practice, and observation. By acquiring an increasingly correct model of its actions, it generates increasingly successful plans to achieve its goals. In an apparently nondeterministic world, achieving reliability requires the identification of reliable actions and a preference for using such actions. Furthermore, by selecting its training actions carefully, the robot can significantly improve its learning rate.

AAAI Conference 1983 Conference Paper

An Intelligent Aid for Circuit Redesign

  • Tom M. Mitchell
  • Smadar Kedar-Cabelli
  • Jeffrey Shulman

Digital circuit redesign is a task that requires knowledge of circuit structure, function, and purpose, and of the interrelationships among these. We describe a knowledge-based system, REDESIGN, which assists in the redesign of digital circuits to meet altered functional specifications. REDESIGN assists the user in focusing on an appropriate portion of the circuit, generating possible local changes within the circuit, ranking these possible changes, and detecting undesirable side-effects of redesigns. It provides this assistance by combining two modes of reasoning about circuits: (1) causal reasoning involving analysis of circuit operation, and (2) reasoning about the purposes, or roles, of various circuit modules within the larger circuit. We describe these two modes of reasoning, and the way in which they are combined by REDESIGN to provide aid in circuit redesign.