Arrow Research search

Author name cluster

Murray Campbell

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

ICLR Conference 2024 Conference Paper

On the generalization capacity of neural networks during generic multimodal reasoning

  • Takuya Ito
  • Soham Dan
  • Mattia Rigotti
  • James R. Kozloski
  • Murray Campbell

The advent of the Transformer has led to the development of large language models (LLM), which appear to demonstrate human-like capabilities. To assess the generality of this class of models and a variety of other base neural network architectures to multimodal domains, we evaluated and compared their capacity for multimodal generalization. We introduce a multimodal question-answer benchmark to evaluate three specific types of out-of-distribution (OOD) generalization performance: distractor generalization (generalization in the presence of distractors), systematic compositional generalization (generalization to new task permutations), and productive compositional generalization (generalization to more complex tasks with deeper dependencies). While we found that most architectures fared poorly on most forms of generalization (e.g., RNNs and standard Transformers), models that leveraged cross-attention mechanisms between input domains, such as the Perceiver, fared better. Our positive results demonstrate that for multimodal distractor and systematic generalization, cross-attention is an important mechanism to integrate multiple sources of information. On the other hand, all architectures failed in productive generalization, suggesting fundamental limitations of existing architectures for specific types of multimodal OOD generalization. These results demonstrate the strengths and limitations of specific architectural components underlying modern neural models for multimodal reasoning. Finally, we provide Generic COG (gCOG), a configurable benchmark with several multimodal generalization splits, for future studies to explore.

NeSy Conference 2022 Conference Paper

Combining Fast and Slow Thinking for Human-like and Efficient Decisions in Constrained Environments

  • Marianna Bergamaschi Ganapini
  • Murray Campbell
  • Francesco Fabiano
  • Lior Horesh
  • Jonathan Lenchner
  • Andrea Loreggia
  • Nicholas Mattei
  • Francesca Rossi 0001

Current AI systems lack several important human capabilities, such as adaptability, generalizability, self-control, consistency, common sense, and causal reasoning. We believe that existing cognitive theories of human decision making, such as the thinking fast and slow theory, can provide insights on how to advance AI systems towards some of these capabilities. In this paper, we propose a general architecture that is based on fast/slow solvers and a metacognitive component. We then present experimental results on the behavior of an instance of this architecture, for AI systems that make decisions about navigating in a constrained environment. We show how combining the fast and slow decision modalities, which can be implemented by learning and reasoning components respectively, allows the system to evolve over time and gradually pass from slow to fast thinking with enough experience, and that this greatly helps in decision quality, resource consumption, and efficiency.
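The core arbitration idea in the abstract, slow deliberation that hands off to a fast path once enough experience accumulates, can be sketched as follows. The confidence rule (a simple visit-count threshold), the solver, and all names here are illustrative assumptions, not the paper's actual components.

```python
# Minimal sketch of metacognitive fast/slow arbitration, assuming a
# visit-count threshold as the hand-off criterion (a simplification).
class Metacognition:
    def __init__(self, min_experience=3):
        self.experience = {}            # state -> times solved slowly
        self.cache = {}                 # state -> remembered decision
        self.min_experience = min_experience

    def decide(self, state, slow_solver):
        seen = self.experience.get(state, 0)
        if seen >= self.min_experience:
            # enough experience: answer from the fast, learned path
            return self.cache[state], "fast"
        # otherwise pay for the slow, deliberate solver
        action = slow_solver(state)
        self.cache[state] = action
        self.experience[state] = seen + 1
        return action, "slow"

def slow_solver(state):
    # stand-in for an expensive planner or reasoner
    return state % 2

agent = Metacognition()
modes = [agent.decide(7, slow_solver)[1] for _ in range(5)]
```

Repeated queries on the same state start on the slow path and switch to the fast path once the experience threshold is reached.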

PRL Workshop 2021 Workshop Paper

AI Planning Annotation in Reinforcement Learning: Options and Beyond

  • Junkyu Lee
  • Michael Katz
  • Don Joven Agravante
  • Miao Liu
  • Tim Klinger
  • Murray Campbell
  • Shirin Sohrabi
  • Gerald Tesauro

AI planning and reinforcement learning (RL) both solve sequential decision-making problems, taking fundamentally different approaches. In this work, we aim to bring AI planning and RL closer by investigating the relationship between abstractions in AI planning and the options framework in RL. To this end, we propose annotating RL tasks with AI planning models, allowing us to define options based purely on the planning model. Our experimental investigation shows that these options can be quickly trained offline and can improve the sample efficiency of a reinforcement learning algorithm.
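One way to read "options defined purely on the planning model" is that an operator's preconditions induce an option's initiation set and its effects induce the termination condition. The STRIPS-style operator, the door-unlocking example, and all names below are hypothetical illustrations, not the paper's construction.

```python
# Sketch: deriving an RL option's initiation and termination conditions
# from a STRIPS-style planning operator (assumed, simplified encoding).
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    effects: frozenset

def option_from_operator(op):
    # initiation set: states satisfying the operator's preconditions;
    # termination: states where the desired effects already hold
    initiation = lambda state: op.preconditions <= state
    termination = lambda state: op.effects <= state
    return initiation, termination

unlock = Operator("unlock-door",
                  preconditions=frozenset({"has-key", "at-door"}),
                  effects=frozenset({"door-open"}))
can_start, should_stop = option_from_operator(unlock)
```

The option's inner policy, which the paper trains offline, is omitted here.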

AAAI Conference 2021 Conference Paper

Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Word Embeddings and the Implications to Representation Learning

  • Wei Zhang
  • Murray Campbell
  • Yang Yu
  • Sadhana Kumaravel

Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings, but they fail to measure geometric properties such as asymmetry. For example, it is more natural to say "Ellipses are like Circles" than "Circles are like Ellipses". Such asymmetry has been observed in the word evocation experiment, where one word is used to recall another. These association data have been understudied as a measure of embedding quality. In this paper, we use three well-known evocation datasets for this purpose and study both static embeddings and contextual embeddings, such as BERT. To cope with the dynamic nature of BERT embeddings, we probe BERT's conditional probabilities as a language model, using a large number of Wikipedia contexts to derive a theoretically justifiable Bayesian asymmetry score. The results show that asymmetry judgments and similarity judgments disagree, and that asymmetry judgment aligns with BERT's strong performance on "extrinsic evaluations". This is the first time contextual embeddings' strength has been shown on an intrinsic evaluation, and the asymmetry judgment provides a new perspective for evaluating contextual embeddings and new insights for representation learning.
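The directionality the abstract describes can be made concrete with a toy asymmetry score: a log-ratio of the two directional conditional probabilities. The evocation counts below are invented for illustration; the paper derives its score from BERT's conditional probabilities over Wikipedia contexts, not from raw counts.

```python
# Toy log-ratio asymmetry score over hypothetical evocation counts
# (cue word -> recalled word); a sketch, not the paper's Bayesian score.
import math
from collections import Counter

evoked = Counter({("ellipse", "circle"): 8, ("circle", "ellipse"): 3,
                  ("ellipse", "oval"): 2, ("circle", "ring"): 5})

def cond_prob(target, cue):
    # P(target | cue): fraction of the cue's evocations that hit the target
    total = sum(c for (a, _), c in evoked.items() if a == cue)
    return evoked[(cue, target)] / total

def asymmetry(a, b):
    # positive when "a evokes b" is stronger than "b evokes a"
    return math.log(cond_prob(b, a)) - math.log(cond_prob(a, b))
```

By construction the score is antisymmetric, so "ellipse evokes circle" and "circle evokes ellipse" receive scores of opposite sign.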

IJCAI Conference 2021 Conference Paper

Mental Models of AI Agents in a Cooperative Game Setting (Extended Abstract)

  • Katy Ilonka Gero
  • Zahra Ashktorab
  • Casey Dugan
  • Qian Pan
  • James Johnson
  • Werner Geyer
  • Maria Ruiz
  • Sarah Miller

As more and more forms of AI become prevalent, it becomes increasingly important to understand how people develop mental models of these systems. In this work we study people's mental models of an AI agent in a cooperative word guessing game. We run a study in which people play the game with an AI agent while "thinking out loud"; through thematic analysis we identify features of the mental models developed by participants. In a large-scale study we have participants play the game with the AI agent online and use a post-game survey to probe their mental model. We find that those who win more often have better estimates of the AI agent's abilities. We present three components (global knowledge, local knowledge, and knowledge distribution) for modeling AI systems and propose that understanding the underlying technology is insufficient for developing appropriate conceptual models; analysis of behavior is also necessary.

AAAI Conference 2021 Conference Paper

Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines

  • Keerthiram Murugesan
  • Mattia Atzeni
  • Pavan Kapanipathi
  • Pushkar Shukla
  • Sadhana Kumaravel
  • Gerald Tesauro
  • Kartik Talamadupula
  • Mrinmaya Sachan

Text-based games have emerged as an important test-bed for Reinforcement Learning (RL) research, requiring RL agents to combine grounded language understanding with sequential decision making. In this paper, we examine the problem of infusing RL agents with commonsense knowledge. Such knowledge would allow agents to efficiently act in the world by pruning out implausible actions, and to perform lookahead planning to determine how current actions might affect future world states. We design a new text-based gaming environment called TextWorld Commonsense (TWC) for training and evaluating RL agents with a specific kind of commonsense knowledge about objects, their attributes, and affordances. We also introduce several baseline RL agents which track the sequential context and dynamically retrieve the relevant commonsense knowledge from ConceptNet. We show that agents which incorporate commonsense knowledge in TWC perform better, while acting more efficiently. We conduct user studies to estimate human performance on TWC and show that there is ample room for future improvement.
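The "pruning out implausible actions" idea can be sketched with a ConceptNet-style triple store used as a plausibility filter. The triples, the action format, and the relation name below are invented placeholders; the paper's agents retrieve knowledge dynamically from ConceptNet itself.

```python
# Toy commonsense filter: keep only actions whose (object, AtLocation,
# target) triple appears in an assumed ConceptNet-style knowledge set.
TRIPLES = {("apple", "AtLocation", "fridge"),
           ("dirty sock", "AtLocation", "washing machine")}

def plausible(action):
    verb, obj, target = action
    return (obj, "AtLocation", target) in TRIPLES

candidates = [("put", "apple", "fridge"),
              ("put", "apple", "washing machine"),
              ("put", "dirty sock", "washing machine")]
pruned = [a for a in candidates if plausible(a)]
```

Only the two commonsense-consistent actions survive the filter, shrinking the action space the agent must explore.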

AAAI Conference 2019 Conference Paper

Hybrid Reinforcement Learning with Expert State Sequences

  • Xiaoxiao Guo
  • Shiyu Chang
  • Mo Yu
  • Gerald Tesauro
  • Murray Campbell

Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage state expert sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
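The first stage of the pipeline, recovering unobserved expert actions from consecutive states, can be illustrated on a toy grid. The paper learns a tensor-based inference model; this deterministic lookup is only a stand-in for that stage, and the grid world is an assumption.

```python
# Sketch of inverse-dynamics action inference on an assumed (x, y) grid:
# each state difference maps to exactly one action.
MOVES = {(1, 0): "right", (-1, 0): "left", (0, 1): "up", (0, -1): "down"}

def infer_actions(states):
    """Recover unobserved actions from consecutive expert (x, y) states."""
    return [MOVES[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(states, states[1:])]

demo = [(0, 0), (1, 0), (1, 1), (2, 1)]
```

The agent would then optimize the paper's hybrid objective, a weighted combination of environment return and an imitation term over these inferred actions.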

AAAI Conference 2019 Conference Paper

Learning to Teach in Cooperative Multiagent Reinforcement Learning

  • Shayegan Omidshafiei
  • Dong-Ki Kim
  • Miao Liu
  • Gerald Tesauro
  • Matthew Riemer
  • Christopher Amato
  • Murray Campbell
  • Jonathan P. How

Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for problems they can apply to. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; these agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.

IJCAI Conference 2019 Conference Paper

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration

  • Ritesh Noothigattu
  • Djallel Bouneffouf
  • Nicholas Mattei
  • Rachita Chandra
  • Piyush Madan
  • Kush R. Varshney
  • Murray Campbell
  • Moninder Singh

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
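The orchestration step, a contextual bandit picking between the constrained policy and the reward-maximizing policy, can be sketched as follows. The contexts, the fixed per-context rewards, and the epsilon-greedy rule are illustrative assumptions, not the paper's Pac-Man setup.

```python
# Epsilon-greedy contextual bandit choosing between two stand-in policies;
# contexts and rewards are invented for illustration.
import random
random.seed(0)

def constrained_policy(ctx):
    return 1.0 if ctx == "near_ghost" else 0.2   # follows demonstrated constraints

def reward_policy(ctx):
    return 0.2 if ctx == "near_ghost" else 1.0   # chases environment reward

policies = (constrained_policy, reward_policy)
contexts = ("near_ghost", "clear")
values = {(c, a): 0.0 for c in contexts for a in (0, 1)}
counts = {k: 0 for k in values}

def pick_arm(ctx, eps=0.1):
    # explore with probability eps, otherwise exploit the better arm
    if random.random() < eps:
        return random.randrange(2)
    return max((0, 1), key=lambda a: values[(ctx, a)])

for _ in range(2000):
    ctx = random.choice(contexts)
    arm = pick_arm(ctx)
    reward = policies[arm](ctx)
    counts[(ctx, arm)] += 1
    values[(ctx, arm)] += (reward - values[(ctx, arm)]) / counts[(ctx, arm)]
```

After training, the bandit has learned to defer to the constrained policy near a ghost and to the reward policy otherwise, and its choice at each step remains inspectable, matching the transparency property the abstract highlights.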

ICML Conference 2017 Conference Paper

Local-to-Global Bayesian Network Structure Learning

  • Tian Gao
  • Kshitij P. Fadnis
  • Murray Campbell

We introduce a new local-to-global structure learning algorithm, called graph growing structure learning (GGSL), to learn Bayesian network (BN) structures. GGSL starts at a (random) node and then gradually expands the learned structure through a series of local learning steps. At each local learning step, the proposed algorithm only needs to revisit a subset of the learned nodes, consisting of the local neighborhood of a target, and therefore improves on both memory and time efficiency compared to traditional global structure learning approaches. GGSL also improves on the existing local-to-global learning approaches by removing the need for conflict-resolving AND-rules, and achieves better learning accuracy. We provide theoretical analysis for the local learning step, and show that GGSL outperforms existing algorithms on benchmark datasets. Overall, GGSL demonstrates a novel direction to scale up BN structure learning while limiting accuracy loss.
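The growth loop the abstract describes can be shown as a structural skeleton. Here `neighbors` plays the role of the statistical local learning step (neighborhood discovery from data); the real algorithm performs that step with conditional-independence machinery, not a lookup table, and edge orientation is omitted.

```python
# Skeleton of local-to-global growth: start at one node and expand the
# learned structure via local steps; `neighbors` is an assumed oracle.
def ggsl(neighbors, start):
    learned, edges, frontier = {start}, set(), [start]
    while frontier:
        target = frontier.pop()
        # local step: only the target's neighborhood is revisited
        for nb in neighbors[target]:
            edges.add(frozenset((target, nb)))
            if nb not in learned:
                learned.add(nb)
                frontier.append(nb)
    return edges

skeleton = ggsl({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}, "A")
```

Because each step touches only one node's neighborhood, memory and time stay local, which is the efficiency argument the abstract makes against global approaches.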

AAAI Conference 1983 Conference Paper

A Chess Program That Chunks

  • Murray Campbell

CHUNKER is a chess program that uses chunked knowledge to achieve success. Its domain is a subset of king and pawn endings in chess that has been studied for over 300 years. CHUNKER has a large library of chunk instances, where each chunk type has a property list and each instance has a set of values for these properties. This allows CHUNKER to reason about positions that come up in the search that would otherwise have to be handled by means of additional search. Thus the program is able to solve the most difficult problem of its present domain (a problem that would require 45 ply of search and on the order of 10^13 years of CPU time to be solved by the best of present-day chess programs) in 18 ply and one minute of CPU time. Further, CHUNKER is undoubtedly the world's foremost expert in its domain; it has discovered 2 mistakes in the literature and has been instrumental in discovering a new theorem about the domain that allows the assessing of positions with a new degree of ease and confidence. In this paper we describe CHUNKER's structure and performance, and discuss our plans for extending it to play the whole domain of king and pawn endings.
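The chunk-library organization the abstract describes, types with property lists, instances with values, and lookup replacing search, can be sketched as a table. The chunk type, its properties, and the position keys below are invented placeholders; real CHUNKER chunks encode king-and-pawn configurations.

```python
# Sketch of a chunk library: type -> property list, instance -> values;
# positions matching a known instance are assessed by lookup, not search.
CHUNK_TYPES = {"kp-vs-k": ("result", "moves-to-win")}

CHUNK_LIBRARY = {
    ("kp-vs-k", "Kd6 Pe5 / kd8"): {"result": "win", "moves-to-win": 12},
    ("kp-vs-k", "Kf3 Ph4 / kh5"): {"result": "draw", "moves-to-win": None},
}

def assess(chunk_type, position):
    """Answer from chunked knowledge when possible, else fall back to search."""
    props = CHUNK_LIBRARY.get((chunk_type, position))
    if props is not None:
        return props["result"], "lookup"
    return None, "search"   # a real program would launch a game-tree search
```

The speedup the abstract reports comes from exactly this substitution: a single library hit stands in for many plies of search.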