Arrow Research

Author name cluster

Celso M. de Melo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers (12)

ICRA Conference 2025 Conference Paper

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

  • Corban Rivera
  • Grayson Byrd
  • William Paul
  • Tyler Feldman
  • Meghan Booker
  • Emma Holmes
  • David Handelman
  • Bethany Kemp

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common-sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which result in failures to execute planned actions, largely due to logical fallacies at the high or low level. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural-language-driven robotic platform designed for task execution in unstructured environments. With a focus on the scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self-reflection. ConceptAgent combines these planning enhancements with dynamic, language-aligned 3D scene graphs and large multi-modal pretrained models to perceive, localize, and interact with its environment, enabling reliable task completion. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy-level embodied tasks, outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Ablation studies on moderate-to-hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search in enabling more robust automation in complex state and action spaces. Finally, in real-world mobile manipulation trials conducted in randomized, low-clutter environments, a ConceptAgent-driven Spot robot achieved a 40% task completion rate, demonstrating the performance of our perception system in real-world scenarios.
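
As a concrete illustration of the two planning ideas above, the sketch below pairs a symbolic precondition check (the grounding step) with a prior-weighted rollout search, a much-simplified stand-in for LLM-guided Monte Carlo Tree Search. The toy household domain, the hand-written prior, and every name in it are assumptions for illustration, not ConceptAgent's implementation; in the full system the prior and the self-reflection step would be model calls, stubbed out here so the sketch runs on its own.

```python
import random

# Toy household domain: each action has symbolic preconditions and effects.
ACTIONS = {
    "goto(cup)":       {"pre": set(),                          "add": {"near(cup)"},    "del": {"near(sink)"}},
    "goto(sink)":      {"pre": set(),                          "add": {"near(sink)"},   "del": {"near(cup)"}},
    "pick(cup)":       {"pre": {"near(cup)", "hand_empty"},    "add": {"holding(cup)"}, "del": {"hand_empty"}},
    "place(cup,sink)": {"pre": {"holding(cup)", "near(sink)"}, "add": {"cup_in_sink"},  "del": {"holding(cup)"}},
}
GOAL = "cup_in_sink"

def grounded(state, action):
    """Predicate grounding: an action is only considered if its
    preconditions all hold in the current symbolic state."""
    return ACTIONS[action]["pre"] <= state

def apply_action(state, action):
    return (state | ACTIONS[action]["add"]) - ACTIONS[action]["del"]

def llm_prior(state, action):
    """Stand-in for the LLM heuristic: a real system would ask the model to
    score each action's relevance to the instruction. Here we simply favor
    actions whose effects are not already achieved."""
    add = ACTIONS[action]["add"]
    if GOAL in add:
        return 5.0
    return 0.2 if add <= state else 1.0

def plan(state, iters=300, horizon=6):
    """Prior-weighted rollouts: sample grounded actions in proportion to the
    prior and keep the shortest rollout that reaches the goal."""
    best = None
    for _ in range(iters):
        s, rollout = state, []
        for _ in range(horizon):
            feasible = [a for a in ACTIONS if grounded(s, a)]  # grounding filter
            if not feasible:
                break
            a = random.choices(feasible, [llm_prior(s, a) for a in feasible])[0]
            s, rollout = apply_action(s, a), rollout + [a]
            if GOAL in s:
                if best is None or len(rollout) < len(best):
                    best = rollout
                break
    return best

print(plan({"hand_empty"}))  # e.g. ['goto(cup)', 'pick(cup)', 'goto(sink)', 'place(cup,sink)']
```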

ICRA Conference 2024 Conference Paper

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

  • Qiao Gu
  • Ali Kuwajerwala
  • Sacha Morin
  • Krishna Murthy Jatavallabhula
  • Bipasha Sen
  • Aditya Agarwal
  • Corban Rivera
  • William Paul

For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. To explore the full scope of our experiments and results, we encourage readers to visit our project webpage.
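
The core fusion step lends itself to a short sketch: per-view detections (a 3D centroid plus a vision-language embedding) are merged into graph nodes whenever they agree both geometrically and semantically, so the map stays compact with one feature per object rather than per point. Thresholds, the toy detections, and all names below are illustrative, not ConceptGraphs' code; in the full system, spatial-relation edges between nodes would be added afterwards.

```python
import numpy as np

SEM_T, DIST_T = 0.8, 0.5  # semantic-similarity / centroid-distance thresholds (assumed)

class Node:
    """One object in the scene graph: a running centroid and a running feature."""
    def __init__(self, centroid, feat):
        self.centroid, self.feat, self.views = centroid, feat, 1

    def merge(self, centroid, feat):
        # Running averages keep the node compact: no per-point features.
        self.views += 1
        self.centroid += (centroid - self.centroid) / self.views
        self.feat = self.feat + (feat - self.feat) / self.views

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def fuse(nodes, centroid, feat):
    """Multi-view association: attach a detection to an existing node if it is
    close in both feature and 3D space, otherwise spawn a new node."""
    for n in nodes:
        if cos(n.feat, feat) > SEM_T and np.linalg.norm(n.centroid - centroid) < DIST_T:
            n.merge(centroid, feat)
            return
    nodes.append(Node(centroid, feat))

rng = np.random.default_rng(0)
cup = rng.normal(size=8)  # stand-in for a vision-language embedding
nodes = []
# The same cup seen from two views fuses into one node; a distant, unrelated
# detection becomes its own node.
fuse(nodes, np.array([1.0, 0.0, 0.5]), cup)
fuse(nodes, np.array([1.1, 0.1, 0.5]), cup + 0.01 * rng.normal(size=8))
fuse(nodes, np.array([3.0, 2.0, 0.0]), rng.normal(size=8))
print(len(nodes))  # 2
```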

AAAI Conference 2024 Conference Paper

Entropic Open-Set Active Learning

  • Bardia Safaei
  • Vibashan Vs
  • Celso M. de Melo
  • Vishal M. Patel

Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real-world scenarios where the unlabeled data contains unknown categories. Recently, a few studies have attempted to tackle the AL problem for the open-set setting. However, these methods focus more on selecting known samples and do not efficiently utilize unknown samples obtained during AL rounds. In this work, we propose an Entropic Open-set AL (EOAL) framework, which leverages both known and unknown distributions effectively to select informative samples during AL rounds. Specifically, our approach employs two different entropy scores. One measures the uncertainty of a sample with respect to the known-class distributions. The other measures the uncertainty of the sample with respect to the unknown-class distributions. By utilizing these two entropy scores, we effectively separate the known and unknown samples from the unlabeled data, resulting in better sampling. Through extensive experiments, we show that the proposed method outperforms existing state-of-the-art methods on the CIFAR-10, CIFAR-100, and TinyImageNet datasets. Code is available at https://github.com/bardisafa/EOAL.
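
A short sketch of the two-entropy idea may help: one entropy is computed over the known-class posterior, the other over soft assignments to clusters formed from unknown samples discovered in earlier rounds. How the paper combines the two scores is its contribution; the additive ranking below is only one plausible guess, and all shapes and names are illustrative (see the linked repository for the real criterion).

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def eoal_scores(known_logits, feats, unknown_centers):
    """Two entropy scores per unlabeled sample: uncertainty over the known
    classes, and uncertainty over soft (distance-based) assignments to
    clusters of unknown samples found in earlier AL rounds."""
    ent_known = entropy(softmax(known_logits))
    dists = np.linalg.norm(feats[:, None, :] - unknown_centers[None, :, :], axis=-1)
    ent_unknown = entropy(softmax(-dists))
    return ent_known, ent_unknown

rng = np.random.default_rng(1)
logits = rng.normal(size=(100, 10))   # 100 unlabeled samples, 10 known classes
feats = rng.normal(size=(100, 32))    # their feature embeddings
centers = rng.normal(size=(5, 32))    # 5 clusters of previously found unknowns
ek, eu = eoal_scores(logits, feats, centers)
# One plausible combination (an assumption, not the paper's exact criterion):
# prefer samples uncertain over the known classes that also match no unknown
# cluster cleanly, i.e., likely known but informative.
query = np.argsort(-(ek + eu))[:10]   # 10 samples to send for annotation
print(query)
```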

ICRA Conference 2023 Conference Paper

AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

  • Xijun Wang 0002
  • Ruiqi Xian
  • Tianrui Guan
  • Celso M. de Melo
  • Stephen M. Nogar
  • Aniket Bera
  • Dinesh Manocha

We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also present an efficient temporal reasoning algorithm to capture the action information along the spatial and temporal domains within a controllable computational cost. Our approach has been implemented and evaluated both on desktops with high-end GPUs and on the low-power Robotics RB5 Platform for robots and drones. In practice, we achieve a 6.1-7.4% improvement over SOTA in Top-1 accuracy on the RoCoG-v2 dataset, an 8.3-10.4% improvement on the UAV-Human dataset, and a 3.2% improvement on the Drone Action dataset.
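
The auto-zoom step is easy to picture in code: given a person box from a detector, crop a padded square around the target and resize it to the recognizer's input resolution, so a distant actor in a large aerial frame fills the network input instead of occupying a few pixels. The box format, margin, and sizes below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def auto_zoom(frame, box, out=224, margin=0.2):
    """frame: HxWx3 array; box: (x0, y0, x1, y1) from a person detector."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    side = max(x1 - x0, y1 - y0) * (1 + margin)   # padded square crop
    x0 = int(np.clip(cx - side / 2, 0, w - 1))
    y0 = int(np.clip(cy - side / 2, 0, h - 1))
    x1 = int(np.clip(cx + side / 2, 1, w))
    y1 = int(np.clip(cy + side / 2, 1, h))
    crop = frame[y0:y1, x0:x1]
    # Nearest-neighbor resize via index sampling keeps the sketch dependency-free.
    yi = np.linspace(0, crop.shape[0] - 1, out).astype(int)
    xi = np.linspace(0, crop.shape[1] - 1, out).astype(int)
    return crop[yi][:, xi]

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)   # a 4K aerial frame
zoomed = auto_zoom(frame, (1800, 900, 1900, 1100))  # small, distant person
print(zoomed.shape)  # (224, 224, 3)
```

Because the recognizer then only sees the 224x224 crop rather than the full frame, the downstream compute is fixed regardless of the original resolution, which is what makes the approach viable on edge devices.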

ICRA Conference 2023 Conference Paper

Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

  • Arun V. Reddy
  • Ketul Shah
  • William Paul
  • Rohita Mocharla
  • Judy Hoffman
  • Kapil D. Katyal
  • Dinesh Manocha
  • Celso M. de Melo

Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds, and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data has shown promise as a way to avoid the substantial costs and potential ethical concerns associated with collecting and labeling enormous amounts of data in the real world. However, synthetic data may differ from real data in important ways. This phenomenon, known as domain shift, can limit the utility of synthetic data in robotics applications. To mitigate the effects of domain shift, substantial effort is being dedicated to the development of domain adaptation (DA) techniques. Yet, much remains to be understood about how best to develop these techniques. In this paper, we introduce a new dataset called Robot Control Gestures (RoCoG-v2). The dataset is composed of both real and synthetic videos from seven gesture classes, and is intended to support the study of synthetic-to-real domain shift for video-based action recognition. Our work expands upon existing datasets by focusing the action classes on gestures for human-robot teaming, as well as by enabling investigation of domain shift in both ground and aerial views. We present baseline results using state-of-the-art action recognition and domain adaptation algorithms and offer initial insight on tackling the synthetic-to-real and ground-to-air domain shifts. Instructions on accessing the dataset can be found at https://github.com/reddyav1/RoCoG-v2.
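
For readers unfamiliar with the class of baselines mentioned, the sketch below shows one common domain adaptation recipe for this kind of benchmark: domain-adversarial training with a gradient reversal layer (DANN-style), where gesture labels come only from the synthetic domain and real clips contribute only a domain loss. It is a generic PyTorch illustration under assumed shapes, not the paper's specific baseline or architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips and scales gradients on the
    backward pass, so the encoder learns domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

feat = nn.Sequential(nn.Flatten(), nn.Linear(512, 128), nn.ReLU())  # stand-in video encoder
cls_head = nn.Linear(128, 7)   # 7 gesture classes, as in RoCoG-v2
dom_head = nn.Linear(128, 2)   # synthetic vs. real
opt = torch.optim.Adam(
    [*feat.parameters(), *cls_head.parameters(), *dom_head.parameters()], lr=1e-4
)

def train_step(x_syn, y_syn, x_real, lam=0.1):
    """Gesture labels exist only for synthetic clips; real clips are unlabeled
    and enter only through the adversarial domain loss."""
    f_syn, f_real = feat(x_syn), feat(x_real)
    loss = nn.functional.cross_entropy(cls_head(f_syn), y_syn)
    f_all = torch.cat([f_syn, f_real])
    d = torch.cat([torch.zeros(len(x_syn)), torch.ones(len(x_real))]).long()
    loss = loss + nn.functional.cross_entropy(dom_head(GradReverse.apply(f_all, lam)), d)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Pre-extracted 512-d clip features stand in for real video tensors here.
print(train_step(torch.randn(4, 512), torch.randint(0, 7, (4,)), torch.randn(4, 512)))
```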

IROS Conference 2020 Conference Paper

Vision-Based Gesture Recognition in Human-Robot Teams Using Synthetic Data

  • Celso M. de Melo
  • Brandon Rothrock
  • Prudhvi Gurram
  • Oytun Ulutan
  • B. S. Manjunath

Building successful collaboration between humans and robots requires efficient, effective, and natural communication. Here we study an RGB-based deep learning approach for controlling robots through gestures (e.g., "follow me"). To address the challenge of collecting high-quality annotated data from human subjects, synthetic data is considered for this domain. We contribute a dataset of gestures that includes real videos with human subjects and synthetic videos from our custom simulator. A solution is presented for gesture recognition based on the state-of-the-art I3D model. Comprehensive testing was conducted to optimize the parameters for this model. Finally, to gather insight on the value of synthetic data, several experiments are described that systematically study the properties of synthetic data (e.g., gesture variations, character variety, generalization to new gestures). We discuss practical implications for the design of effective human-robot collaboration and the usefulness of synthetic data for deep learning.
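
The synthetic-data experiments described here amount to controlled training-set mixes; a sketch of how such mixes might be assembled is below. The record fields, the fraction arithmetic, and the character-variety restriction are all assumptions for illustration, not the paper's protocol.

```python
import random

def build_train_set(real, synthetic, syn_fraction=0.5, n_characters=None, seed=0):
    """real/synthetic: lists of clip records, e.g.
    {"path": "clip0007.mp4", "gesture": "follow_me", "character": "f02"}."""
    rng = random.Random(seed)
    pool = list(synthetic)
    if n_characters is not None:  # restrict character variety for that ablation
        keep = set(sorted({c["character"] for c in pool})[:n_characters])
        pool = [c for c in pool if c["character"] in keep]
    # Number of synthetic clips needed so they form syn_fraction of the mix.
    n_syn = len(pool) if syn_fraction >= 1 else int(len(real) * syn_fraction / (1 - syn_fraction))
    return real + rng.sample(pool, min(n_syn, len(pool)))

real = [{"path": f"real_{i}.mp4", "gesture": "follow_me", "character": "human"} for i in range(40)]
syn = [{"path": f"syn_{i}.mp4", "gesture": "follow_me", "character": f"c{i % 8}"} for i in range(200)]
mix = build_train_set(real, syn, syn_fraction=0.6, n_characters=4)
print(len(mix))  # 100: 40 real + 60 synthetic clips from 4 characters
```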

AAMAS Conference 2018 Conference Paper

Social Decisions and Fairness Change When People's Interests Are Represented by Autonomous Agents

  • Celso M. de Melo
  • Stacy Marsella
  • Jonathan Gratch

Recent times have seen the emergence of a new breed of intelligent machines that act autonomously on our behalf, such as autonomous vehicles, drones, and personal assistants. These machines introduce a new interaction paradigm where people instruct, or program, these agents to act on their behalf with others. Here we show that this act of programming changes the way people think about the situation, often leading them to adopt a broader perspective and act more fairly. We present four studies where participants made fairer decisions in ultimatum and negotiation tasks when engaging through an agent representative, compared to direct interaction with others. These findings emphasize the importance of understanding the cognitive factors underlying people’s decision making when designing autonomous machines, if we wish to promote a fairer society.

JAAMAS Journal 2017 Journal Article

Social decisions and fairness change when people’s interests are represented by autonomous agents

  • Celso M. de Melo
  • Stacy Marsella
  • Jonathan Gratch

There has been growing interest in agents that represent people’s interests or act on their behalf, such as automated negotiators, self-driving cars, or drones. Even though people will often interact with others via these agent representatives, little is known about whether people’s behavior changes when acting through these agents, compared to direct interaction with others. Here we show that people’s decisions change in important ways because of these agents; specifically, we show that interacting via agents is likely to lead people to behave more fairly, when compared to direct interaction with others. We argue this occurs because programming an agent leads people to adopt a broader perspective, consider the other side’s position, and rely on social norms—such as fairness—to guide their decision making. To support this argument, we present four experiments: in Experiment 1, we show that people made fairer offers in the ultimatum and impunity games when interacting via agent representatives, compared to direct interaction; in Experiment 2, participants were less likely to accept unfair offers in these games when agent representatives were involved; in Experiment 3, we show that the act of thinking about decisions ahead of time—i.e., under the so-called “strategy method”—can also lead to increased fairness, even when no agents are involved; and, finally, in Experiment 4, we show that participants were less likely to reach an agreement with unfair counterparts in a negotiation setting. We discuss theoretical implications for our understanding of the nature of people’s social behavior with agent representatives, as well as practical implications for the design of agents that have the potential to increase fairness in society.

AAMAS Conference 2016 Conference Paper

"Do as I Say, Not as I Do: " Challenges in Delegating Decisions to Automated Agents

  • Celso M. de Melo
  • Stacy Marsella
  • Jonathan Gratch

There has been growing interest, across various domains, in computer agents that can decide on behalf of humans. These agents have the potential to save considerable time and help humans reach better decisions. One implicit assumption, however, is that, as long as the algorithms that simulate decision-making are correct and capture how humans make decisions, humans will treat these agents similarly to other humans. Here we show that interaction with agents that act on our behalf, or on behalf of others, is richer and more interesting than initially expected. Our results show that, on the one hand, people are more selfish with agents acting on behalf of others than when interacting directly with others. We propose that agents increase the social distance with others, which subsequently leads to increased demand. On the other hand, when people task an agent to interact with others, people show more concern for fairness than when interacting directly with others. In this case, higher psychological distance leads people to consider their social image and the long-term consequences of their actions and, thus, behave more fairly. To support these findings, we present an experiment where people engaged in the ultimatum game, either directly or via an agent, with others or with agents representing others. We show that these patterns of behavior also occur in a variant of the ultimatum game – the impunity game – where others have minimal power over the final outcome. Finally, we study how social value orientation – i.e., people’s propensity for cooperation – impacts these effects. These results have important implications for our understanding of the psychological mechanisms underlying interaction with agents, as well as practical implications for the design of successful agents that act on our behalf or on behalf of others.
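
Since the argument turns on the difference between the two games, a tiny payoff function makes the contrast concrete: in the ultimatum game a rejection destroys both players' payoffs, while in the impunity game it destroys only the responder's, leaving the proposer's share untouched. The pie size and function shape below are illustrative, not the experiment's materials.

```python
def payoffs(offer, accepted, game="ultimatum", pie=10):
    """offer: amount the proposer gives the responder out of `pie`.
    Returns (proposer_payoff, responder_payoff)."""
    if accepted:
        return pie - offer, offer
    if game == "ultimatum":
        return 0, 0            # rejection punishes both players
    return pie - offer, 0      # impunity: the proposer keeps their share anyway

print(payoffs(2, accepted=False, game="ultimatum"))  # (0, 0)
print(payoffs(2, accepted=False, game="impunity"))   # (8, 0)
```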

AAMAS Conference 2011 Conference Paper

The Effect of Expression of Anger and Happiness in Computer Agents on Negotiations with Humans

  • Celso M. de Melo
  • Peter Carnevale
  • Jonathan Gratch

There is now considerable evidence in social psychology, economics, and related disciplines that emotion plays an important role in negotiation. For example, humans make greater concessions in negotiation to an opposing human who expresses anger, and they make fewer concessions to an opponent who expresses happiness, compared to a no-emotion-expression control. However, in AI, despite the wide interest in negotiation as a means to resolve differences between agents and humans, emotion has been largely ignored. This paper explores whether expression of anger or happiness by computer agents, in a multi-issue negotiation task, can produce effects that resemble effects seen in human-human negotiation. The paper presents an experiment where participants play with agents that express emotions (anger vs. happiness vs. control) through different modalities (text vs. facial displays). An important distinction in our experiment is that participants are aware that they negotiate with computer agents. The data indicate that the emotion effects observed in past work with humans also occur in agent-human negotiation, and occur independently of modality of expression. The implications of these results are discussed for the fields of automated negotiation, intelligent virtual agents and artificial intelligence.