
Author name cluster

Viktor Makoviychuk

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

ICRA 2024 · Conference Paper

Geometric Fabrics: a Safe Guiding Medium for Policy Learning

  • Karl Van Wyk
  • Ankur Handa
  • Viktor Makoviychuk
  • Yijie Guo
  • Arthur Allshire
  • Nathan D. Ratliff

Robot policies are always subject to complex, second-order dynamics that entangle their actions with the resulting states. In reinforcement learning (RL) contexts, policies bear the burden of deciphering these complicated interactions from massive amounts of experience and complex reward functions in order to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induce straight-line motion towards the action targets in task or joint space. However, straight-line motion in these spaces largely fails to capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely onto the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot, via an appropriate control law, to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. They enable bang-bang-like RL policy actions that remain safe on real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework in general terms and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
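
To make the "behavioral dynamics as action space" idea concrete, here is a minimal toy sketch, not the paper's method: the policy emits a coarse, even bang-bang, target, and a hand-designed second-order system (a spring-damper with a joint-limit barrier, standing in for a geometric fabric) integrates it into smooth, limit-respecting motion. The gains and barrier form are illustrative assumptions.

```python
import numpy as np

q, qd = 0.0, 0.0          # joint position / velocity
q_min, q_max = -1.0, 1.0  # joint limits the dynamics must respect
kp, kd, dt = 40.0, 12.0, 0.01

def behavioral_accel(q, qd, target):
    """Second-order behavioral dynamics: an attractor toward the policy's
    target plus a repulsive barrier that grows near the joint limits."""
    attract = kp * (target - q) - kd * qd
    barrier = 5.0 / (q - q_min) ** 2 - 5.0 / (q_max - q) ** 2
    return attract + barrier

for step in range(300):
    target = 1.0 if step < 150 else -1.0   # bang-bang policy action
    qdd = behavioral_accel(q, qd, target)
    qd += qdd * dt                         # semi-implicit Euler integration
    q += qd * dt

print(f"final q={q:.3f}, stayed within limits: {q_min < q < q_max}")
```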

ICRA 2023 · Conference Paper

DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality

  • Ankur Handa
  • Arthur Allshire
  • Viktor Makoviychuk
  • Aleksei Petrenko
  • Ritvik Singh
  • Jingzhou Liu
  • Denys Makoviichuk
  • Karl Van Wyk

Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups, in our case with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found on the website.
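
A minimal sketch of the domain-randomization recipe the abstract alludes to: sample broad physics and sensing parameters for each episode so that a single policy must adapt to all of them. The parameter names and ranges below are illustrative guesses, not the paper's actual randomization schedule, and the `env.reset` hook is hypothetical.

```python
import random

def sample_randomization():
    """Draw one set of simulation parameters for an episode."""
    return {
        "object_mass_scale": random.uniform(0.5, 1.5),
        "friction": random.uniform(0.3, 1.2),
        "joint_damping_scale": random.uniform(0.8, 1.25),
        "action_latency_steps": random.randint(0, 2),
        "obs_noise_std": random.uniform(0.0, 0.02),
    }

for episode in range(3):
    params = sample_randomization()
    # env.reset(**params)  # hypothetical hook: apply params to the simulator
    print(episode, params)
```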

ICLR 2022 · Conference Paper

Accelerated Policy Learning with Parallel Differentiable Simulation

  • Jie Xu 0028
  • Viktor Makoviychuk
  • Yashraj Narang
  • Fabio Ramos 0001
  • Wojciech Matusik
  • Animesh Garg
  • Miles Macklin

Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17× reduction in training time over the best-performing established RL algorithm. More visual results are provided at: https://short-horizon-actor-critic.github.io/.
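
The core trick, truncated backpropagation through a differentiable simulator with a critic bootstrapping past the window, can be sketched in a few lines of PyTorch. The toy dynamics, network sizes, and hyperparameters below are stand-ins, and critic training (TD regression in the paper) is omitted.

```python
import torch

torch.manual_seed(0)
H = 8                                    # truncated learning window
policy = torch.nn.Linear(2, 1)           # state -> action
critic = torch.nn.Linear(2, 1)           # value at the window cut (training omitted)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def dyn(state, action):
    """Differentiable toy dynamics: a damped point mass pushed by the action."""
    pos, vel = state[..., :1], state[..., 1:]
    vel = 0.9 * vel + 0.1 * action
    return torch.cat([pos + 0.1 * vel, vel], dim=-1)

for _ in range(200):
    state = torch.randn(64, 2)           # a batch of parallel environments
    ret = torch.zeros(64, 1)
    for _ in range(H):                   # gradients flow through H sim steps
        state = dyn(state, policy(state))
        ret = ret - state[..., :1] ** 2  # reward: stay near the origin
    ret = ret + critic(state)            # bootstrap value beyond the window
    loss = -ret.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```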

NeurIPS 2022 · Conference Paper

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

  • Jiayi Weng
  • Min Lin
  • Shengyi Huang
  • Bo Liu
  • Denys Makoviichuk
  • Viktor Makoviychuk
  • Zichen Liu
  • Yufan Song

There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and others aim to improve the system's overall throughput. In this paper, we aim to address a common bottleneck in the RL training system, i.e., parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a curated design for parallelizing RL environments, we have improved the RL environment simulation speed across different hardware setups, ranging from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves one million frames per second for environment execution on Atari environments and three million frames per second on MuJoCo environments. When running EnvPool on a laptop, the speed is 2.8x that of the Python subprocess. Moreover, great compatibility with existing RL training libraries has been demonstrated in the open-source community, including CleanRL, rl_games, DeepMind Acme, etc. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that it takes only five minutes to train agents to play Atari Pong and MuJoCo Ant on a laptop. EnvPool is open-sourced at https://github.com/sail-sg/envpool.
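
For reference, a usage sketch of the batched gym-style interface described in the EnvPool README; exact reset/step signatures vary across EnvPool and gym versions, so treat the return values and shapes below as assumptions.

```python
import numpy as np
import envpool  # pip install envpool

# One C++-managed pool of 16 Atari environments behind a batched interface.
env = envpool.make("Pong-v5", env_type="gym", num_envs=16)
obs = env.reset()                       # batched observations, one row per env
for _ in range(100):
    actions = np.random.randint(0, env.action_space.n, size=16)
    obs, rewards, dones, info = env.step(actions)  # arrays of length 16
print(obs.shape, rewards.shape)
```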

ICRA 2022 · Conference Paper

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

  • Josiah Wong
  • Viktor Makoviychuk
  • Anima Anandkumar
  • Yuke Zhu

Learning performant robot manipulation policies can be challenging due to high-dimensional continuous actions and complex physics-based dynamics. This can be alleviated through an intelligent choice of action space. Operational Space Control (OSC) has been used as an effective task-space controller for manipulation. Nonetheless, its strength depends on the underlying modeling fidelity, and it is prone to failure when there are modeling errors. In this work, we propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors by inferring relevant dynamics parameters from online trajectories. OSCAR decomposes dynamics learning into task-agnostic and task-specific phases, decoupling the dynamics dependencies of the robot from the extrinsics due to its environment. This structure enables robust zero-shot performance under out-of-distribution conditions and rapid adaptation to significant domain shifts through additional finetuning. We evaluate our method on a variety of simulated manipulation problems and find substantial improvements over an array of controller baselines. For more results and information, please visit https://cremebrule.github.io/oscar-web/.
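
For context, a sketch of the classical OSC law that OSCAR builds on: the task-space inertia, computed from the mass matrix and Jacobian, shapes a PD error into joint torques. In OSCAR these dynamics quantities are inferred from data rather than assumed known; here they are simply given, and the gains and toy matrices are illustrative.

```python
import numpy as np

def osc_torques(J, M, x_err, xd_err, kp=150.0, kd=25.0):
    """Classical OSC: J is the task Jacobian (m x n), M the joint-space
    mass matrix (n x n); returns joint torques realizing a task-space PD."""
    M_inv = np.linalg.inv(M)
    lam = np.linalg.inv(J @ M_inv @ J.T)       # task-space inertia
    f_task = lam @ (kp * x_err + kd * xd_err)  # desired task-space force
    return J.T @ f_task                        # map force to joint torques

# Toy 2-DoF arm with a 1-D task (e.g. end-effector height).
J = np.array([[1.0, 0.5]])
M = np.diag([2.0, 1.0])
tau = osc_torques(J, M, x_err=np.array([0.1]), xd_err=np.array([-0.02]))
print("joint torques:", tau)
```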

IROS 2022 · Conference Paper

Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger

  • Arthur Allshire
  • Mayank Mittal
  • Varun Lodaya
  • Viktor Makoviychuk
  • Denys Makoviichuk
  • Felix Widmaier
  • Manuel Wüthrich
  • Stefan Bauer

In-hand manipulation of objects is an important capability for enabling robots to carry out tasks which demand high levels of dexterity. This work presents a robot systems approach to learning dexterous manipulation tasks involving moving objects to arbitrary 6-DoF poses. We show empirical benefits, both in simulation and in sim-to-real transfer, of using keypoint-based representations for object pose in policy observations and reward calculation to train a model-free reinforcement learning agent. By utilizing domain randomization strategies and large-scale training, we achieve a high success rate of 83% on a real TriFinger system, with a single policy able to perform grasping, ungrasping, and finger gaiting in order to achieve arbitrary poses within the workspace. We demonstrate that our policy can generalise to unseen objects, and success rates can be further improved through finetuning. With the aim of assisting further research in learning in-hand manipulation, we provide a detailed exposition of our system and make its codebase available, along with checkpoints trained on billions of steps of experience, at https://s2r2-ig.github.io.
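
The keypoint representation is easy to sketch: replace the position-plus-quaternion pose with the 8 corners of the object's cuboid and reward the mean corner-to-corner distance to the goal pose. The cube size and reward shaping below are illustrative, not the paper's exact values.

```python
import numpy as np
from scipy.spatial.transform import Rotation

CORNERS = 0.03 * np.array([[x, y, z] for x in (-1, 1)
                           for y in (-1, 1) for z in (-1, 1)])  # 6 cm cube

def keypoints(pos, quat):
    """8 corner keypoints of a cube at pose (pos, xyzw quaternion)."""
    return Rotation.from_quat(quat).apply(CORNERS) + pos

def keypoint_reward(pos, quat, goal_pos, goal_quat):
    """Negative mean corner-to-corner distance between current and goal."""
    d = np.linalg.norm(keypoints(pos, quat) - keypoints(goal_pos, goal_quat),
                       axis=-1)
    return -d.mean()

print(keypoint_reward(np.zeros(3), [0, 0, 0, 1],
                      np.array([0.05, 0, 0]), [0, 0, 0.383, 0.924]))
```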

NeurIPS 2021 · Conference Paper

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

  • Viktor Makoviychuk
  • Lukasz Wawrzyniak
  • Yunrong Guo
  • Michelle Lu
  • Kier Storey
  • Miles Macklin
  • David Hoeller
  • Nikita Rudin

Isaac Gym offers a high-performance learning platform to train policies for a wide variety of robotics tasks entirely on GPU. Both physics simulation and neural network policy training reside on the GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through CPU bottlenecks. This leads to blazing-fast training times for complex robotics tasks on a single GPU, with 2-3 orders of magnitude improvement compared to conventional RL training that uses a CPU-based simulator and GPUs for neural networks. We host the results and videos at https://sites.google.com/view/isaacgym-nvidia and Isaac Gym can be downloaded at https://developer.nvidia.com/isaac-gym. The benchmark and environments are available at https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.
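
The abstract's central design point, keeping thousands of environments as rows of GPU tensors so that stepping, rewards, and resets never round-trip through the CPU, can be imitated with a toy batched environment in plain PyTorch. This sketches the pattern only; it is not the Isaac Gym API.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
num_envs = 4096
pos = torch.zeros(num_envs, 1, device=device)
vel = torch.zeros(num_envs, 1, device=device)

def step(actions):
    """One physics step for all environments as a single batched op."""
    global pos, vel
    vel = vel + 0.01 * actions
    pos = pos + 0.01 * vel
    reward = -(pos ** 2).squeeze(-1)           # stay near the origin
    done = pos.abs().squeeze(-1) > 1.0
    pos = torch.where(done.unsqueeze(-1), torch.zeros_like(pos), pos)  # reset
    vel = torch.where(done.unsqueeze(-1), torch.zeros_like(vel), vel)
    return torch.cat([pos, vel], dim=-1), reward, done

obs, r, d = step(torch.randn(num_envs, 1, device=device))
print(obs.shape, r.shape, d.sum().item())
```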

ICML 2021 · Conference Paper

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

  • Anuj Mahajan
  • Mikayel Samvelyan
  • Lei Mao
  • Viktor Makoviychuk
  • Animesh Garg
  • Jean Kossaifi
  • Shimon Whiteson
  • Yuke Zhu

Reinforcement learning in large action spaces is a challenging problem. This is especially true for cooperative multi-agent reinforcement learning (MARL), which often requires tractable learning while respecting various constraints like communication budgets and information about other agents. In this work, we focus on a fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, this poses challenges in accurately representing the optimal value function, thus inducing suboptimality. For policy-gradient methods, it renders the critic ineffective and exacerbates the problem of the lagging critic. We show that, from a learning-theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires accurately modelling the agent interactions in a sample-efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of the different agents. Algorithms derived from Tesseract decompose the Q-tensor across the agents and utilise low-rank tensor approximations to model the agent interactions relevant to the task. We provide PAC analysis for Tesseract-based algorithms and highlight their relevance to the class of rich-observation MDPs. Empirical results in different domains confirm the gains in sample efficiency using Tesseract, as supported by the theory.
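
The tensorised view can be sketched directly: the joint Q-function over n agents is an |A|^n tensor, and a rank-R CP approximation builds it from per-agent factor matrices, keeping the parameter count linear in n. The factor networks and sizes below are illustrative, not the paper's architecture.

```python
import torch

n_agents, n_actions, rank, state_dim = 3, 5, 4, 8

# One factor network per agent: state -> (rank x n_actions) factor matrix.
factors = torch.nn.ModuleList(
    torch.nn.Linear(state_dim, rank * n_actions) for _ in range(n_agents))

def joint_q(state):
    """Joint Q tensor from rank-R CP factors:
    Q[a1, a2, a3] = sum_r f1[r, a1] * f2[r, a2] * f3[r, a3]."""
    fs = [f(state).view(rank, n_actions) for f in factors]
    q = fs[0]                                  # (rank, A)
    for f in fs[1:]:                           # outer product per agent axis
        q = torch.einsum('r...,ra->r...a', q, f)
    return q.sum(dim=0)                        # contract the rank axis

state = torch.randn(state_dim)
Q = joint_q(state)
print(Q.shape)   # (5, 5, 5): exponential tensor, linear parameter count
```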

ICRA 2020 · Conference Paper

In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation

  • Jacky Liang
  • Ankur Handa
  • Karl Van Wyk
  • Viktor Makoviychuk
  • Oliver Kroemer
  • Dieter Fox

Tracking the pose of an object while it is being held and manipulated by a robot hand is difficult for vision-based methods due to significant occlusions. Prior works have explored using contact feedback and particle filters to localize in-hand objects. However, they have mostly focused on the static grasp setting and not on the case where the object is in motion, as doing so requires modeling complex contact dynamics. In this work, we propose using GPU-accelerated parallel robot simulations and derivative-free, sample-based optimizers to track in-hand object poses with contact feedback during manipulation. We use physics simulation as the forward model for robot-object interactions, and the algorithm jointly optimizes the state and the parameters of the simulations so that they better match those of the real world. Our method runs in real time (30 Hz) on a single GPU, and it achieves an average point cloud distance error of 6 mm in simulation experiments and 13 mm in real-world ones.
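
The optimizer loop is easy to sketch in a derivative-free, cross-entropy-method style: sample candidate poses, score each via a forward model (a GPU-parallel physics simulation in the paper; a stand-in error function here), and refit the sampling distribution to the best candidates. Everything below, including the error function, is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
true_pose = np.array([0.02, -0.01, 0.03])      # hidden ground truth (toy)

def forward_model_error(pose):
    """Stand-in for sim-vs-real discrepancy (contact + point-cloud terms)."""
    return np.linalg.norm(pose - true_pose)

mean, std = np.zeros(3), 0.05 * np.ones(3)
for iteration in range(20):                     # CEM-style refinement
    samples = rng.normal(mean, std, size=(64, 3))
    errors = np.array([forward_model_error(s) for s in samples])
    elites = samples[np.argsort(errors)[:8]]    # keep the best 8 candidates
    mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-4

print("estimated pose:", mean.round(4))
```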

ICRA 2019 · Conference Paper

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

  • Yevgen Chebotar
  • Ankur Handa
  • Viktor Makoviychuk
  • Miles Macklin
  • Jan Issac
  • Nathan D. Ratliff
  • Dieter Fox

We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real-world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve policy transfer by matching the policy's behavior in simulation and in the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real-world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt.
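
A toy sketch of the adaptation loop: keep a Gaussian over simulation parameters, compare simulated roll-outs of the current policy against real ones, and refit the distribution toward parameters whose simulated behavior matches reality. The paper's actual update is based on relative entropy policy search; this uses a simpler weighted refit, and the roll-out functions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_rollout(params):
    """Stand-in simulated trajectory statistic under given sim parameters."""
    return params * np.array([1.0, 2.0])

real_feature = np.array([0.6, 1.0])             # from a few real roll-outs

mean, std = np.array([1.0, 1.0]), np.array([0.5, 0.5])
for it in range(30):
    params = rng.normal(mean, std, size=(32, 2))
    discrepancy = np.linalg.norm(
        np.array([sim_rollout(p) for p in params]) - real_feature, axis=1)
    w = np.exp(-discrepancy / discrepancy.mean())   # soft sample weights
    w /= w.sum()
    mean = (w[:, None] * params).sum(axis=0)        # weighted refit
    std = np.sqrt((w[:, None] * (params - mean) ** 2).sum(axis=0)) + 1e-3

print("adapted sim params:", mean.round(3))  # should drift toward ~[0.6, 0.5]
```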