Arrow Research

Author name cluster

Carmelo Sferrazza

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

ICLR Conference 2025 Conference Paper

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

  • Yarden As
  • Bhavya Sukhija
  • Lenart Treven
  • Carmelo Sferrazza
  • Stelian Coros
  • Andreas Krause 0001

Reinforcement learning (RL) is ubiquitous in the development of modern AI systems. However, state-of-the-art RL agents require extensive, and potentially unsafe, interactions with their environments to learn effectively. These limitations confine RL agents to simulated environments, hindering their ability to learn directly in real-world settings. In this work, we present ActSafe, a novel model-based RL algorithm for safe and efficient exploration. ActSafe learns a well-calibrated probabilistic model of the system and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics, while enforcing pessimism w.r.t. the safety constraints. Under regularity assumptions on the constraints and dynamics, we show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time. In addition, we propose a practical variant of ActSafe that builds on recent advances in model-based RL and enables safe exploration even in high-dimensional settings such as visual control. We empirically show that ActSafe obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.

ICRA Conference 2025 Conference Paper

Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

  • Joshua Jones
  • Oier Mees
  • Carmelo Sferrazza
  • Kyle Stachowicz
  • Pieter Abbeel
  • Sergey Levine

Interacting with the world is a multi-sensory experience: achieving effective general-purpose interaction requires making use of all available modalities - including vision, touch, and audio - to fill in gaps from partial observation. For example, when vision is occluded while reaching into a bag, a robot should rely on its senses of touch and sound. However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations. In this work, we propose FuSe, a novel approach that enables finetuning visuomotor generalist policies on heterogeneous sensor modalities for which large datasets are not readily available by leveraging natural language as a common cross-modal grounding. We combine a multimodal contrastive loss with a sensory-grounded language generation loss to encode high-level semantics. In the context of robot manipulation, we show that FuSe enables performing challenging tasks that require reasoning jointly over modalities such as vision, touch, and sound in a zero-shot setting, such as multimodal prompting, compositional cross-modal prompting, and descriptions of objects it interacts with. We show that the same recipe is applicable to widely different generalist policies, including both diffusion-based generalist policies and large vision-language-action (VLA) models. Extensive experiments in the real world show that FuSe is able to increase success rates by over 20% compared to all considered baselines.

NeurIPS Conference 2025 Conference Paper

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

  • Michal Nauman
  • Marek Cygan
  • Carmelo Sferrazza
  • Aviral Kumar
  • Pieter Abbeel

Recent advances in language modeling and vision stem from training large models on diverse, multi-task data. This paradigm has had limited impact in value-based reinforcement learning (RL), where improvements are often driven by small models trained in a single-task context. This is because, in multi-task RL, sparse rewards and gradient conflicts make temporal-difference optimization brittle. Practical workflows for generalist policies therefore avoid online training, instead cloning expert trajectories or distilling collections of single-task policies into one agent. In this work, we show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online RL, allowing for robust and scalable multi-task training. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL. We find that, despite its simplicity, the proposed approach leads to state-of-the-art single-task and multi-task performance, as well as sample-efficient transfer to new tasks.

ICRA Conference 2025 Conference Paper

Hand-Object Interaction Pretraining from Videos

  • Himanshu Singh 0002
  • Antonio Loquercio
  • Carmelo Sferrazza
  • Jane Wu
  • Haozhi Qi
  • Pieter Abbeel
  • Jitendra Malik

We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: https://hgaurav2k.github.io/hop/.

ICLR Conference 2025 Conference Paper

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

  • Bhavya Sukhija
  • Stelian Coros
  • Andreas Krause 0001
  • Pieter Abbeel
  • Carmelo Sferrazza

Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., they select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.

NeurIPS Conference 2025 Conference Paper

SOMBRL: Scalable and Optimistic Model-Based RL

  • Lenart Treven
  • Carmelo Sferrazza
  • Florian Dorfler
  • Pieter Abbeel
  • Andreas Krause

We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL (SOMBRL), an approach based on the principle of optimism in the face of uncertainty. SOMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. SOMBRL is compatible with any policy optimizer or planner, and under common regularity assumptions on the system, we show that SOMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, SOMBRL offers a flexible and scalable solution for principled exploration. We evaluate SOMBRL on state-based and visual-control environments, where it displays strong performance across all tasks and baselines. We also evaluate SOMBRL on dynamic RC car hardware and show that SOMBRL outperforms the state of the art, illustrating the benefits of principled exploration for MBRL.

ICLR Conference 2024 Conference Paper

Chain of Hindsight aligns Language Models with Feedback

  • Hao Liu 0055
  • Carmelo Sferrazza
  • Pieter Abbeel

Learning from human preferences is important for language models to match human needs and to align with human and social values. Prior work has achieved remarkable success by learning from human feedback to understand and follow instructions. Nonetheless, these methods are either founded on hand-picked model generations that are favored by human annotators, rendering them inefficient in terms of data utilization and challenging to apply in general, or they depend on reinforcement learning, which often suffers from imperfect reward functions and relies on extremely challenging optimizations. In this work, we propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity. Our idea is inspired by how humans learn from extensive feedback presented in the form of language. We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model, allowing us to take advantage of the language comprehension capabilities of language models. We condition the model on a sequence of model generations paired with feedback. By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors. Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences. We report significant improvements on summarization and dialogue benchmarks, with our approach markedly preferred in human evaluations.

IROS Conference 2024 Conference Paper

The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

  • Carmelo Sferrazza
  • Younggyo Seo
  • Hao Liu 0055
  • Youngwoon Lee
  • Pieter Abbeel

Humans rely on the synergy of their senses for most essential tasks. For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch. This paper draws inspiration from such capabilities and aims to find a systematic approach to fuse visual and tactile information in a reinforcement learning setting. We propose Masked Multimodal Learning (M3L), which jointly learns a policy and visual-tactile representations based on masked autoencoding. The representations jointly learned from vision and touch improve sample efficiency, and unlock generalization capabilities beyond those achievable through each of the senses separately. Remarkably, representations learned in a multimodal setting also benefit vision-only policies at test time. We evaluate M3L on three simulated environments with both visual and tactile observations: robotic insertion, door opening, and dexterous in-hand manipulation, demonstrating the benefits of learning a multimodal policy. Videos of the experiments and the open-source code are available at https://sferrazza.cc/m3l_site.

ICRA Conference 2022 Conference Paper

Leveraging distributed contact force measurements for slip detection: a physics-based approach enabled by a data-driven tactile sensor

  • Pietro Griffa
  • Carmelo Sferrazza
  • Raffaello D'Andrea

Grasping objects whose physical properties are unknown is still a great challenge in robotics. Most solutions rely entirely on visual data to plan the best grasping strategy. However, to match human abilities and be able to reliably pick and hold unknown objects, the integration of an artificial sense of touch in robotic systems is pivotal. This paper describes a novel model-based slip detection pipeline that can predict possibly failing grasps in real-time and signal a necessary increase in grip force. As such, the slip detector does not rely on manually collected data, but exploits physics to generalize across different tasks. To evaluate the approach, a state-of-the-art vision-based tactile sensor that accurately estimates distributed forces was integrated into a grasping setup composed of a six degrees-of-freedom cobot and a two-finger gripper. Results show that the system can reliably predict slip while manipulating objects of different shapes, materials, and weights. The sensor can detect both translational and rotational slip in various scenarios, making it suitable to improve the stability of a grasp.

IROS Conference 2020 Conference Paper

Learning the sense of touch in simulation: a sim-to-real strategy for vision-based tactile sensing

  • Carmelo Sferrazza
  • Thomas Bi
  • Raffaello D'Andrea

Data-driven approaches to tactile sensing aim to overcome the complexity of accurately modeling contact with soft materials. However, their widespread adoption is impaired by concerns about data efficiency and the capability to generalize when applied to various tasks. This paper focuses on both these aspects with regard to a vision-based tactile sensor, which aims to reconstruct the distribution of the three-dimensional contact forces applied on its soft surface. Accurate models for the soft materials and the camera projection, derived via state-of-the-art techniques in the respective domains, are employed to generate a dataset in simulation. A strategy is proposed to train a tailored deep neural network entirely from the simulation data. The resulting learning architecture is directly transferable across multiple tactile sensors without further training and yields accurate predictions on real data, while showing promising generalization capabilities to unseen contact conditions.

IROS Conference 2020 Conference Paper

Vision-Based Proprioceptive Sensing: Tip Position Estimation for a Soft Inflatable Bellow Actuator

  • Peter Werner
  • Matthias Hofer 0003
  • Carmelo Sferrazza
  • Raffaello D'Andrea

This paper presents a vision-based sensing approach for a soft linear actuator, which is equipped with an internal camera. The proposed vision-based sensing pipeline predicts the three-dimensional tip position of the actuator. To train and evaluate the algorithm, predictions are compared to ground truth data from an external motion capture system. An off-the-shelf distance sensor is integrated in a second actuator of the same type, providing only the vertical component of the tip position and serving as a baseline for comparison. The camera-based sensing pipeline runs at 40 Hz in real-time on a standard laptop and is additionally used for closed-loop elongation control of the actuator. It is shown that the approach achieves accuracy comparable to the distance sensor for measuring the linear expansion of the actuator, while additionally providing the full three-dimensional tip position.

IROS Conference 2019 Conference Paper

Transfer learning for vision-based tactile sensing

  • Carmelo Sferrazza
  • Raffaello D'Andrea

Due to the complexity of modeling the elastic properties of materials, the use of machine learning algorithms is continuously increasing for tactile sensing applications. Recent advances in deep neural networks applied to computer vision make vision-based tactile sensors very appealing for their high-resolution and low cost. A soft optical tactile sensor that is scalable to large surfaces with arbitrary shape is discussed in this paper. A supervised learning algorithm trains a model that is able to reconstruct the normal force distribution on the sensor’s surface, purely from the images recorded by an internal camera. In order to reduce the training times and the need for large datasets, a calibration procedure is proposed to transfer the acquired knowledge across multiple sensors while maintaining satisfactory performance.

ICRA Conference 2017 Conference Paper

Implementation of a parametrized infinite-horizon model predictive control scheme with stability guarantees

  • Michael Muehlebach
  • Carmelo Sferrazza
  • Raffaello D'Andrea

This article discusses the implementation of an infinite-horizon model predictive control approach that is based on representing input and state trajectories by a linear combination of basis functions. An iterative constraint sampling strategy is presented for guaranteeing constraint satisfaction over all times. It will be shown that the proposed method converges. In addition, we will discuss the implementation of the resulting (online) model predictive control algorithm on an unmanned aerial vehicle and provide experimental results. The computational efficiency of the algorithm is highlighted by the fact that a sampling rate of 100 Hz was achieved on an embedded platform.

IROS Conference 2016 Conference Paper

Numerical search for local (partial) differential flatness

  • Carmelo Sferrazza
  • Diego Pardo
  • Jonas Buchli

Differential flatness is a property of certain systems that greatly simplifies the generation of optimal and dynamically feasible trajectories. Using a differentially flat model, there is no need to integrate the system dynamics to retrieve the states, and the constraints of the optimization problem are simpler. Recently, the concept of partial differential flatness has been introduced, covering a broader class of systems. In particular, it makes it possible to reduce the need for integration by limiting it to a subset of the states. However, finding an analytical expression for the (partial) differential flatness requires manipulating the equations of motion in a very specific manner so that a series of properties are fulfilled. In general, finding such an analytical model is neither straightforward nor compatible with algorithmic modeling. To tackle this problem, we present a numerical method to find a (partially) differentially flat model of a system around a collection of state and input trajectories. We present results on three underactuated nonlinear systems (cart-pole, planar ballbot, and a 3D quadrotor). As use-case examples, we show online trajectory re-planning tasks. The validity of the trajectories obtained with the locally flat models is verified by forward integrating the original equations of motion together with an optimal stabilizer.