Author name cluster

Oscar Ramirez

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

ICML Conference 2024 Conference Paper

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

Fengdi Che
Chenjun Xiao
Jincheng Mei
Bo Dai 0001
Ramki Gummadi
Oscar Ramirez
Christopher K. Harris
A. Rupam Mahmood

We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird’s counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.

ICLR Conference 2022 Conference Paper

Understanding and Leveraging Overparameterization in Recursive Value Estimation

Chenjun Xiao
Bo Dai 0001
Jincheng Mei
Oscar Ramirez
Ramki Gummadi
Christopher K. Harris
Dale Schuurmans

The theory of function approximation in reinforcement learning (RL) typically considers low capacity representations that incur a tradeoff between approximation error, stability and generalization. Current deep architectures, however, operate in an overparameterized regime where approximation error is not necessarily a bottleneck. To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings. First, we show that classical updates such as temporal difference (TD) learning or fitted-value-iteration (FVI) converge to different fixed points than residual minimization (RM) in the overparameterized linear case. We then develop a unified interpretation of overparameterized linear value estimation as minimizing the Euclidean norm of the weights subject to alternative constraints. A practical consequence is that RM can be modified by a simple alteration of the backup targets to obtain the same fixed points as FVI and TD (when they converge), while universally ensuring stability. Further, we provide an analysis of the generalization error of these methods, demonstrating per iterate bounds on the value prediction error of FVI, and fixed point bounds for TD and RM. Given this understanding, we then develop new algorithmic tools for improving recursive value estimation with deep models. In particular, we extract two regularizers that penalize out-of-span top-layer weights and co-linearity in top-layer features respectively. Empirically we find that these regularizers dramatically improve the stability of TD and FVI, while allowing RM to match and even sometimes surpass their generalization performance with assured stability.

ICRA Conference 2018 Conference Paper

PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning

Aleksandra Faust
Ken Oslund
Oscar Ramirez
Anthony G. Francis
Lydia Tapia
Marek Fiser
James Davidson

We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planners provide roadmaps which connect robot configurations that can be successfully navigated by the RL agent. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. Our results show improvement in task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than that used in training.

ICRA Conference 2017 Conference Paper

Flexible virtual fixture interface for path specification in tele-manipulation

Camilo Perez Quintero
Masood Dehghan
Oscar Ramirez
Marcelo H. Ang
Martin Jägersand

We present the design and implementation of a flexible force-vision-based interface that allows local operators to visually specify a path constraint to a remote robot manipulator online during teleoperation. Using bilateral and unilateral configurations, we compare our system to direct teleoperation through user studies. Three performance metrics (smoothness, error and execution time) and a subjective evaluation (NASA TLX) were used to quantify user performance. The trials show that our system outperforms direct teleoperation and reduces cognitive load. Our findings show that a unilateral teleoperation configuration with visual-force constraints surpasses a bilateral teleoperation configuration in terms of displacement error and variance, and allows users to complete tasks faster and with a smoother trajectory.

ICRA Conference 2016 Conference Paper

CPWalker: Robotic platform for gait rehabilitation in patients with Cerebral Palsy

Cristina Bayon
Oscar Ramirez
M. Dolores del Castillo
José Ignacio Serrano
Rafael Raya
José M. Belda-Lois
Rakel Poveda
Fernando Mollà

Cerebral Palsy (CP) is a disorder of posture and movement due to an imperfection or lesion in the immature brain. CP is often associated with sensory deficits, cognitive impairments, communication and motor disabilities, behaviour issues, seizure disorder, pain and secondary musculoskeletal problems. New strategies are needed to help promote, maintain, and rehabilitate functional capacity, and thereby diminish the dedication and assistance required and the economic demands that this condition places on the patient, the caregivers and society as a whole. This paper describes the conceptualization and development of the integrated CPWalker robotic platform to support novel therapies for CP rehabilitation. This platform (Smart Walker + exoskeleton) is controlled by a multimodal interface that establishes the interaction of children with CP with robot-based therapies. The objective of these therapies is to improve the physical skills of children with CP and similar disorders. The CPWalker concept will promote the earlier incorporation of CP patients into rehabilitation therapy and increase the intensity and frequency of the exercises according to the task, which will enable therapeutic methods to be maintained on a daily basis, with the intention of achieving significant improvements in treatment outcome.

ICRA Conference 2016 Conference Paper

ViTa: Visual task specification interface for manipulation with uncalibrated visual servoing

Mona Gridseth
Oscar Ramirez
Camilo Perez Quintero
Martin Jägersand

We present a human-robot interface (HRI) for semi-autonomous human-in-the-loop control that aims to tackle some of the challenges for robotics in unstructured environments. Our HRI lets the user specify desired object alignments in an image editor as geometric overlays on images. The HRI is based on the technique of visual task specification [1], which provides a well-studied theoretical framework. Tasks are completed using uncalibrated image-based visual servoing (UVS). Our interface is shown to be effective for a versatile set of tasks that span both coarse and fine manipulation. We complete tasks such as inserting a marker in its cap, inserting a small cube in a shape sorter, grasping a circular lid, following a line, grasping a screw, cutting along a line, picking and placing a box, and grasping a cylinder, using a Barrett WAM arm and hand.

ICRA Conference 2015 Conference Paper

VIBI: Assistive vision-based interface for robot manipulation

Camilo Perez Quintero
Oscar Ramirez
Martin Jägersand

Upper-body disabled people can benefit from the use of robot arms to perform everyday tasks. However, the adoption of this kind of technology has been limited by the complexity of robot manipulation tasks and the difficulty in controlling a multiple-DOF arm using a joystick or a similar device. Motivated by this need, we present an assistive vision-based interface for robot manipulation. Our proposal is to replace the direct joystick motor control interface present in a commercial wheelchair-mounted assistive robotic manipulator with a human-robot interface based on visual selection. The scene in front of the robot is shown on a screen, and the user can then select an object with our novel grasping interface. We develop computer vision and motion control methods that drive the robot to that object. Our aim is not to replace user control, but instead to augment user capabilities through our system with different levels of semi-autonomy, while leaving the user with a sense that he/she is in control of the task. Two disabled pilot users were involved at different stages of our research. The first took part in the interface design along with rehab experts. The second performed user studies along with an 8-subject control group to evaluate our interface. Our system reduces robot instruction from a 6-DOF task in continuous space to either a 2-DOF pointing task or a discrete selection task among objects detected by computer vision.
