Arrow Research search

Author name cluster

Arjun Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

IROS 2024 · Conference Paper

Estimating Perceptual Uncertainty to Predict Robust Motion Plans

  • Arjun Gupta
  • Michelle Zhang
  • Saurabh Gupta 0001

A typical sense-plan-act robotics pipeline is brittle due to the inherent inaccuracies in the output of the sensing module and the planning module's lack of awareness of those inaccuracies. This paper develops a framework to predict uncertainty estimates for neural network-based vision models used for state estimation in robotics pipelines. Our uncertainty estimates are based directly on the image observation data and are explicitly trained to model the error distribution on a held-out calibration set. We also demonstrate how predicted uncertainties can be used to select robust control strategies. We conduct experiments on the mobile manipulation problem of articulating everyday objects (e.g., opening a cupboard) and demonstrate the quality of the estimated uncertainty and its downstream impact on the robustness of inferred control strategies.
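The calibration idea in the abstract can be sketched in a few lines. Everything below is illustrative: a deterministic stand-in replaces real sensing error, and simple binning replaces the paper's learned uncertainty network; the names, threshold, and strategy labels are assumptions, not the paper's API.

```python
import statistics

# Toy calibration set of (observation, ground truth) pairs for a 1-D
# state, e.g. a door-opening angle. The "sensing error" here is a
# deterministic stand-in that grows with the state, mimicking a vision
# model that is less reliable on some inputs.
calib = []
for i in range(100):
    true = i / 100
    err = 0.3 * true * (1 if i % 2 == 0 else -1)
    calib.append((true + err, true))

NBINS = 5

def bin_of(obs):
    return min(NBINS - 1, max(0, int(obs * NBINS)))

# Uncertainty model: expected |error| as a function of the observation
# alone, fit on the held-out calibration set (binning stands in for a
# neural network trained on the same data).
bin_errs = {b: [] for b in range(NBINS)}
for obs, true in calib:
    bin_errs[bin_of(obs)].append(abs(obs - true))

def predicted_uncertainty(obs):
    errs = bin_errs[bin_of(obs)]
    return statistics.mean(errs) if errs else float("inf")

# Downstream use: switch to a cautious control strategy when the
# predicted perceptual uncertainty is high.
def select_strategy(obs, threshold=0.1):
    return "robust" if predicted_uncertainty(obs) > threshold else "nominal"
```

The key property is that the uncertainty estimate depends only on the observation, so it is available at test time before any action is taken.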

ICRA 2023 · Conference Paper

Predicting Motion Plans for Articulating Everyday Objects

  • Arjun Gupta
  • Max E. Shepherd
  • Saurabh Gupta 0001

Mobile manipulation tasks such as opening a door, pulling open a drawer, or lifting a toilet seat require constrained motion of the end-effector under environmental and task constraints. This, coupled with partial information in novel environments, makes it challenging to employ classical motion planning approaches at test time. Our key insight is to cast this as a learning problem, leveraging past experience of solving similar planning problems to directly predict motion plans for mobile manipulation tasks in novel situations at test time. To enable this, we develop a simulator, ArtObjSim, that simulates articulated objects placed in real scenes. We then introduce $\mathbf{SeqIK}+\theta_{0}$, a fast and flexible representation for motion plans. Finally, we learn models that use $\mathbf{SeqIK}+\theta_{0}$ to quickly predict motion plans for articulating novel objects at test time. Experimental evaluation shows improved speed and accuracy at generating motion plans compared to pure search-based methods and pure learning methods.
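The sequential-IK flavor of such a representation can be illustrated on a toy 2-link planar arm: solve inverse kinematics for each predicted end-effector waypoint, warm-starting from the previous joint solution and a predicted initial configuration theta_0. The arm, the arc of waypoints, and the Jacobian-transpose solver below are all assumptions made for illustration, not the paper's $\mathbf{SeqIK}+\theta_{0}$ or its simulator.

```python
import math

L1 = L2 = 1.0  # link lengths of a toy 2-link planar arm

def fk(t1, t2):
    """Forward kinematics: joint angles -> end-effector position."""
    return (L1 * math.cos(t1) + L2 * math.cos(t1 + t2),
            L1 * math.sin(t1) + L2 * math.sin(t1 + t2))

def solve_ik(t1, t2, target, alpha=0.1, iters=500, tol=1e-4):
    """Jacobian-transpose descent on the end-effector position error."""
    for _ in range(iters):
        x, y = fk(t1, t2)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        # Analytic Jacobian of fk: d(x, y)/d(t1) = (-y, x).
        j12 = -L2 * math.sin(t1 + t2)
        j22 = L2 * math.cos(t1 + t2)
        t1 += alpha * (-y * ex + x * ey)
        t2 += alpha * (j12 * ex + j22 * ey)
    return t1, t2

# Predicted waypoints: a door-handle-like arc of radius 1.5 around the base.
waypoints = [(1.5 * math.cos(0.6 + 0.1 * k), 1.5 * math.sin(0.6 + 0.1 * k))
             for k in range(5)]

t1, t2 = 0.0, 1.5             # predicted initial configuration ("theta_0")
plan = []
for wp in waypoints:
    t1, t2 = solve_ik(t1, t2, wp)   # warm start from the previous solution
    plan.append((t1, t2))
```

Warm-starting each solve from the previous waypoint's solution is what keeps the joint trajectory continuous and each IK problem cheap.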

ICLR 2022 · Conference Paper

Learning Value Functions from Undirected State-only Experience

  • Matthew Chang
  • Arjun Gupta
  • Saurabh Gupta 0001

This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s, s', r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning (LAQ), an offline RL method that can learn effective value functions from state-only experience. LAQ learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that have high correlation with value functions learned using ground truth actions. Value functions learned using LAQ lead to sample efficient acquisition of goal-directed behavior, can be used with domain-specific low-level controllers, and facilitate transfer across embodiments. Our experiments in 5 environments ranging from 2D grid world to 3D visual navigation in realistic environments demonstrate the benefits of LAQ over simpler alternatives, imitation learning oracles, and competing methods.
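The core recipe (assign latent actions to unlabeled transitions, then run ordinary Q-learning over them) can be sketched on a toy chain MDP. Here the "latent action" is simply the observed state change, a trivially perfect clustering that stands in for the paper's learned latent-variable future-prediction model; the MDP and hyperparameters are illustrative.

```python
import random

random.seed(0)

# Chain MDP with states 0..4 and reward 1 for arriving at state 4.
# Experience is state-only: (s, s', r) triples with no action labels.
N = 5

def step(s, direction):
    s2 = min(N - 1, max(0, s + direction))
    return s2, 1.0 if s2 == N - 1 else 0.0

data = []
for _ in range(2000):
    s = random.randrange(N - 1)
    s2, r = step(s, random.choice([-1, 1]))
    data.append((s, s2, r))

# "Latent action": cluster transitions by their observed effect,
# a stand-in for the learned future-prediction model.
def latent(s, s2):
    return s2 - s          # one of -1, 0, 1

# Tabular Q-learning over latent actions (the LAQ idea, simplified).
Q = {s: {z: 0.0 for z in (-1, 0, 1)} for s in range(N)}
alpha, gamma = 0.2, 0.9
for _ in range(50):
    for s, s2, r in data:
        z = latent(s, s2)
        target = r + gamma * max(Q[s2].values())
        Q[s][z] += alpha * (target - Q[s][z])

V = [max(Q[s].values()) for s in range(N)]   # learned state values
```

The learned values increase monotonically toward the goal, matching what Q-learning with ground-truth actions would recover on this chain.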

ICLR 2022 · Conference Paper

The Uncanny Similarity of Recurrence and Depth

  • Avi Schwarzschild
  • Arjun Gupta
  • Amin Ghiasi
  • Micah Goldblum
  • Tom Goldstein

It is widely believed that deep neural networks contain layer specialization, wherein networks extract hierarchical features representing edges and patterns in shallow layers and complete objects in deeper layers. Unlike common feed-forward models that have distinct filters at each layer, recurrent networks reuse the same parameters at various depths. In this work, we observe that recurrent models exhibit the same hierarchical behaviors and the same performance benefits as depth despite reusing the same filters at every recurrence. By training models of various feed-forward and recurrent architectures on several datasets for image classification as well as maze solving, we show that recurrent networks have the ability to closely emulate the behavior of non-recurrent deep models, often doing so with far fewer parameters.
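The feed-forward versus recurrent contrast in this abstract reduces to weight sharing, which a toy example makes concrete. The 2-D layers below are untrained and initialized identically on purpose, so the two models compute the same function while differing sharply in parameter count; all sizes and values are illustrative.

```python
# A feed-forward stack uses distinct weights per layer; a recurrent
# model applies one set of weights at every "depth".

def apply_layer(W, b, v):
    out = []
    for i in range(len(W)):
        s = sum(W[i][j] * v[j] for j in range(len(v))) + b[i]
        out.append(max(0.0, s))        # ReLU
    return out

D, DEPTH = 2, 6
W, b = [[0.5, -0.3], [0.2, 0.4]], [0.1, 0.0]

def recurrent_forward(v, steps=DEPTH):
    for _ in range(steps):             # the same weights reused each step
        v = apply_layer(W, b, v)
    return v

# Feed-forward stack: DEPTH independent copies of the layer parameters
# (initialized identically here, so the two models compute the same map).
ff_layers = [(W, b) for _ in range(DEPTH)]

def feedforward_forward(v):
    for Wk, bk in ff_layers:
        v = apply_layer(Wk, bk, v)
    return v

recurrent_params = D * D + D              # one layer's worth: 6
feedforward_params = DEPTH * (D * D + D)  # grows with depth: 36
```

Same effective depth, identical output in this contrived case, but the recurrent model carries 1/DEPTH of the parameters, which is the paper's point about emulating depth with recurrence.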

NeurIPS 2021 · Conference Paper

Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks

  • Avi Schwarzschild
  • Eitan Borgnia
  • Arjun Gupta
  • Furong Huang
  • Uzi Vishkin
  • Micah Goldblum
  • Tom Goldstein

Deep neural networks are powerful machines for visual pattern recognition, but reasoning tasks that are easy for humans may still be difficult for neural models. Humans possess the ability to extrapolate reasoning strategies learned on simple problems to solve harder examples, often by thinking for longer. For example, a person who has learned to solve small mazes can easily extend the very same search techniques to solve much larger mazes by spending more time. In computers, this behavior is often achieved through the use of algorithms, which scale to arbitrarily hard problem instances at the cost of more computation. In contrast, the sequential computing budget of feed-forward neural networks is limited by their depth, and networks trained on simple problems have no way of extending their reasoning to accommodate harder problems. In this work, we show that recurrent networks trained to solve simple problems with few recurrent steps can indeed solve much more complex problems simply by performing additional recurrences during inference. We demonstrate this algorithmic behavior of recurrent networks on prefix sum computation, mazes, and chess. In all three domains, networks trained on simple problem instances are able to extend their reasoning abilities at test time simply by "thinking for longer."
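The "thinking for longer" behavior can be sketched on the prefix-sum task the abstract mentions. A fixed local update rule (standing in for one recurrent step with frozen weights; the rule itself is an illustrative assumption, not the paper's trained network) propagates partial sums one position per application, so the same rule solves longer inputs simply by being applied more times.

```python
def one_step(x, y):
    # y[i] <- x[i] + y[i-1]: each application extends the range of
    # correct prefix sums by at least one position.
    return [x[i] + (y[i - 1] if i > 0 else 0.0) for i in range(len(x))]

def iterate(x, steps):
    """Apply the same fixed rule `steps` times (more steps = more 'thinking')."""
    y = list(x)
    for _ in range(steps):
        y = one_step(x, y)
    return y

def prefix_sums(x):                     # ground truth
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out
```

A step budget sufficient for length-4 inputs (3 applications) fails on length-10 inputs, but the identical rule succeeds on them when simply run for more steps, with no new parameters.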

ICML 2021 · Conference Paper

Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  • Avi Schwarzschild
  • Micah Goldblum
  • Arjun Gupta
  • John Dickerson 0001
  • Tom Goldstein

Data poisoning and backdoor attacks manipulate training data in order to cause models to fail during inference. A recent survey of industry practitioners found that data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. However, it remains unclear exactly how dangerous poisoning methods are and which ones are more effective considering that these methods, even ones with identical objectives, have not been tested in consistent or realistic settings. We observe that data poisoning and backdoor attacks are highly sensitive to variations in the testing setup. Moreover, we find that existing methods may not generalize to realistic settings. While these existing works serve as valuable prototypes for data poisoning, we apply rigorous tests to determine the extent to which we should fear them. In order to promote fair comparison in future work, we develop standardized benchmarks for data poisoning and backdoor attacks.

AAAI 2021 · Short Paper

Text Embedding Bank for Detailed Image Paragraph Captioning

  • Arjun Gupta
  • Zengming Shen
  • Thomas Huang

Existing deep learning-based models for image captioning typically consist of an image encoder to extract visual features and a language model decoder, an architecture that has shown promising results in single high-level sentence generation. However, only the word-level guiding signal is available when the image encoder is optimized to extract visual features. The inconsistency between the parallel extraction of visual features and sequential text supervision limits its success when the length of the generated text is long (more than 50 words). We propose a new module, called the Text Embedding Bank (TEB), to address this problem for image paragraph captioning. This module uses the paragraph vector model to learn fixed-length feature representations from a variable-length paragraph. We refer to the fixed-length feature as the TEB. This TEB module plays two roles to benefit paragraph captioning performance. First, it acts as a form of global and coherent deep supervision to regularize visual feature extraction in the image encoder. Second, it acts as a distributed memory to provide features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to two existing state-of-the-art methods achieves a new state-of-the-art result on the paragraph captioning Stanford Visual Genome dataset.
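The interface the TEB provides, a fixed-length vector from a variable-length paragraph, can be sketched minimally. The paper uses a paragraph-vector model; the hashed bag-of-words below is only a stand-in chosen to show the shape contract, and every name here is an assumption.

```python
import hashlib

DIM = 8  # illustrative fixed embedding size

def paragraph_embedding(text, dim=DIM):
    """Map a paragraph of any length to a unit vector of fixed dimension."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0               # hashed bag-of-words count
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]
```

Because a one-sentence caption and a 50+ word paragraph both map to the same dimensionality, such a vector can serve both as a regularization target for the image encoder and as a whole-paragraph memory for the decoder, which is the role the TEB plays.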

NeurIPS 2020 · Conference Paper

Semantic Visual Navigation by Watching YouTube Videos

  • Matthew Chang
  • Arjun Gupta
  • Saurabh Gupta

Semantic cues and statistical regularities in real-world environment layouts can improve efficiency for navigation in novel environments. This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos. This is challenging because YouTube videos don't come with labels for actions or goals, and may not even showcase optimal behavior. Our method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We observe a relative improvement of 15-83% over end-to-end RL, behavior cloning, and classical methods, while using minimal direct interaction.