Arrow Research search

Author name cluster

Daniel Nikovski

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers (21)

IROS Conference 2023 Conference Paper

Constrained Dynamic Movement Primitives for Collision Avoidance in Novel Environments

  • Seiji Shaw
  • Devesh K. Jha
  • Arvind U. Raghunathan
  • Radu Corcodel
  • Diego Romeres
  • George Konidaris 0001
  • Daniel Nikovski

Dynamic movement primitives are widely used for learning skills that can be demonstrated to a robot by a skilled human or controller. While their generalization capabilities and simple formulation make them very appealing to use, they provide no strong guarantees of satisfying operational safety constraints for a task. We present constrained dynamic movement primitives (CDMPs), which allow for positional constraint satisfaction in the robot workspace. Our method solves a non-linear optimization to perturb an existing DMP's forcing weights so that the result admits a Zeroing Barrier Function (ZBF), which certifies positional workspace constraint satisfaction. We demonstrate our approach on multiple physical robots under different positional constraints on end-effector movement, such as obstacle avoidance and workspace limitations.
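For readers unfamiliar with the underlying primitive, a minimal unconstrained DMP rollout can be sketched as follows. The gains, Gaussian basis construction, and Euler integration are illustrative assumptions, not the paper's constrained (ZBF-based) optimization:

```python
import numpy as np

def dmp_rollout(y0, g, weights, T=1.0, dt=0.01, alpha=25.0, beta=6.25, alpha_x=4.0):
    # Basis centers placed along the exponential decay of the canonical variable.
    centers = np.exp(-alpha_x * np.linspace(0.0, T, len(weights)))
    widths = 1.0 / np.diff(centers, append=centers[-1] * 0.9) ** 2
    y, z, x = float(y0), 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = x * (g - y0) * (psi @ weights) / (psi.sum() + 1e-10)  # forcing term
        z += dt * (alpha * (beta * (g - y) - z) + f)  # transformation system
        y += dt * z
        x += dt * (-alpha_x * x)                      # canonical system decay
        traj.append(y)
    return np.array(traj)
```

With zero forcing weights the critically damped transformation system simply converges to the goal; learned weights shape the path taken; a CDMP additionally perturbs those weights to respect workspace constraints.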

ICRA Conference 2021 Conference Paper

Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry

  • Siyuan Dong
  • Devesh K. Jha
  • Diego Romeres
  • Sangwoon Kim
  • Daniel Nikovski
  • Alberto Rodriguez 0003

Object insertion is a classic contact-rich manipulation task. The task remains challenging, especially when considering general objects of unknown geometry, which significantly limits the ability to understand the contact configuration between the object and the environment. We study the problem of aligning the object and environment with a tactile-based feedback insertion policy. The insertion process is modeled as an episodic policy that iterates between insertion attempts followed by pose corrections. We explore different mechanisms to learn such a policy based on Reinforcement Learning. The key contribution of this paper is to demonstrate that it is possible to learn a tactile insertion policy that generalizes across different object geometries, and an ablation study of the key design choices for the learning agent: 1) the type of learning scheme: supervised vs. reinforcement learning; 2) the type of learning schedule: unguided vs. curriculum learning; 3) the type of sensing modality: force/torque vs. tactile; and 4) the type of tactile representation: tactile RGB vs. tactile flow. We show that the optimal configuration of the learning agent (RL + curriculum + tactile flow) exposed to 4 training objects yields a closed-loop insertion policy that inserts 4 novel objects with over 85.0% success rate and within 3 to 4 consecutive attempts. Comparisons between F/T and tactile sensing show that while an F/T-based policy learns more efficiently, a tactile-based policy provides better generalization. See supplementary video and results at https://sites.google.com/view/tactileinsertion.

ICML Conference 2020 Conference Paper

Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?

  • Kei Ota
  • Tomoaki Oiki
  • Devesh K. Jha
  • Toshisada Mariyama
  • Daniel Nikovski

Deep reinforcement learning (RL) algorithms have recently achieved remarkable successes in various sequential decision making tasks, leveraging advances in methods for training large deep networks. However, these methods usually require large amounts of training data, which is often a big problem for real-world applications. One natural question to ask is whether learning good representations for states and using larger networks helps in learning better policies. In this paper, we study whether increasing input dimensionality helps improve performance and sample efficiency of model-free deep RL algorithms. To do so, we propose an online feature extractor network (OFENet) that uses neural nets to produce \emph{good} representations to be used as inputs to an off-policy RL algorithm. Even though the high dimensionality of input is usually thought to make learning of RL agents more difficult, we show that the RL agents in fact learn more efficiently with the high-dimensional representation than with the lower-dimensional state observations. We believe that stronger feature propagation together with larger networks allows RL agents to learn more complex functions of states and thus improves the sample efficiency. Through numerical experiments, we show that the proposed method achieves much higher sample efficiency and better performance. Code for the proposed method is available at http://www.merl.com/research/license/OFENet
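The core idea of feeding the agent a higher-dimensional representation can be illustrated with a toy extractor. The random, untrained features and dimensions below are assumptions for illustration; OFENet itself trains its extractor online with an auxiliary next-state prediction loss:

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureExtractor:
    """Toy stand-in for OFENet: maps an observation to a LARGER input."""

    def __init__(self, obs_dim, feat_dim):
        self.W = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(feat_dim, obs_dim))

    def __call__(self, obs):
        hidden = np.tanh(self.W @ obs)
        # DenseNet-style concatenation: the raw observation is kept alongside
        # the extracted features, so dimensionality only ever grows.
        return np.concatenate([obs, hidden])

extractor = FeatureExtractor(obs_dim=4, feat_dim=32)
augmented = extractor(np.ones(4))   # 4 + 32 = 36-dimensional policy input
```

The policy and critic then consume `augmented` instead of the raw 4-dimensional observation.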

ICRA Conference 2020 Conference Paper

Local Policy Optimization for Trajectory-Centric Reinforcement Learning

  • Patrik Kolaric
  • Devesh K. Jha
  • Arvind U. Raghunathan
  • Frank L. Lewis
  • Mouhacine Benosman
  • Diego Romeres
  • Daniel Nikovski

The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for non-linear systems can be a very challenging problem both algorithmically and numerically. However, many robotic manipulation tasks are trajectory-centric, and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, an open-loop trajectory optimization process mostly results in very poor performance when used on the real system. Motivated by these problems, we formulate trajectory optimization and local policy synthesis as a single optimization problem, which is then solved simultaneously as an instance of nonlinear programming. We provide analysis as well as results on the achieved performance of the proposed technique under some simplifying assumptions.

ICRA Conference 2019 Conference Paper

Semiparametrical Gaussian Processes Learning of Forward Dynamical Models for Navigating in a Circular Maze

  • Diego Romeres
  • Devesh K. Jha
  • Alberto Dalla Libera
  • Bill Yerazunis
  • Daniel Nikovski

This paper presents a model learning problem: learning how to navigate a ball to a goal state in a circular maze environment with two degrees of freedom. The motion of the ball in the maze environment is influenced by several non-linear effects, such as dry friction and contacts, which are difficult to model physically. We propose a semiparametric model to estimate the motion dynamics of the ball, based on Gaussian Process Regression equipped with basis functions obtained from physics first principles. The accuracy of this semiparametric model is shown not only in estimation but also in prediction n steps ahead, and it is compared with standard algorithms for model learning. The learned model is then used in a trajectory optimization algorithm to compute ball trajectories. We propose the system presented in the paper as a benchmark problem for reinforcement and robot learning, given its interesting and challenging dynamics and its relative ease of reproducibility.

ICRA Conference 2019 Conference Paper

Sim-to-Real Transfer Learning using Robustified Controllers in Robotic Tasks involving Complex Dynamics

  • Jeroen van Baar
  • Alan Sullivan
  • Radu Corcodel
  • Devesh K. Jha
  • Diego Romeres
  • Daniel Nikovski

Learning robot tasks or controllers using deep reinforcement learning has been proven effective in simulations. Learning in simulation has several advantages. For example, one can fully control the simulated environment, including halting motions while performing computations. Another advantage when robots are involved is that the amount of time a robot is occupied learning a task, rather than being productive, can be reduced by transferring the learned task to the real robot. Transfer learning requires some amount of fine-tuning on the real robot. For tasks which involve complex (non-linear) dynamics, the fine-tuning itself may take a substantial amount of time. In order to reduce the amount of fine-tuning we propose to learn robustified controllers in simulation. Robustified controllers are learned by exploiting the ability to change simulation parameters (both appearance and dynamics) for successive training episodes. An additional benefit of this approach is that it alleviates the precise determination of physics parameters for the simulator, which is a non-trivial task. We demonstrate our proposed approach on a real setup in which a robot aims to solve a maze game, which involves complex dynamics due to static friction and potentially large accelerations. We show that the amount of fine-tuning in transfer learning for a robustified controller is substantially reduced compared to a non-robustified controller.

IROS Conference 2019 Conference Paper

Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning

  • Kei Ota
  • Devesh K. Jha
  • Tomoaki Oiki
  • Mamoru Miura
  • Takashi Nammoto
  • Daniel Nikovski
  • Toshisada Mariyama

In this paper, we propose a reinforcement learning-based algorithm for trajectory optimization for constrained dynamical systems. This problem is motivated by the fact that for most robotic systems, the dynamics may not always be known. Generating smooth, dynamically feasible trajectories could be difficult for such systems. Using sampling-based algorithms for motion planning may result in trajectories that are prone to undesirable control jumps. However, they can usually provide a good reference trajectory which a model-free reinforcement learning algorithm can then exploit by limiting the search domain and quickly finding a dynamically smooth trajectory. We use this idea to train a reinforcement learning agent to learn a dynamically smooth trajectory in a curriculum learning setting. Furthermore, for generalization, we parameterize the policies with goal locations, so that the agent can be trained for multiple goals simultaneously. We show results in both simulated environments and real experiments, for a 6-DoF manipulator arm operated in position-controlled mode, to validate the proposed idea. We compare the proposed ideas against a PID controller which is used to track a designed trajectory in configuration space. Our experiments show that our RL agent trained with a reference path outperformed a model-free PID controller of the type commonly used on many robotic platforms for trajectory tracking.

ICML Conference 2018 Conference Paper

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

  • Yangchen Pan
  • Amir Massoud Farahmand
  • Martha White
  • Saleh Nabi
  • Piyush Grover
  • Daniel Nikovski

Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially-extended action dimensions and enable the agent to control high-dimensional action PDEs. We provide theoretical evidence suggesting that this approach can be more sample efficient compared to a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one.

IJCAI Conference 2018 Conference Paper

Time Series Chains: A Novel Tool for Time Series Data Mining

  • Yan Zhu
  • Makoto Imamura
  • Daniel Nikovski
  • Eamonn Keogh

Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern that preceded it, but the first and last patterns are arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain "hit, hot, dot, dog" from a paragraph. The first and last words have nothing in common, yet they are connected by a chain of words with a small mutual difference. Time Series Chains can capture the evolution of systems, and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce a robust definition of time series chains, and a scalable algorithm that allows us to discover them in massive datasets.
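The mutual-nearest-neighbor structure behind chains can be sketched with a brute-force scan: subsequence i links to j when j is i's nearest neighbor to the right and i is j's nearest neighbor to the left. This O(n^2) toy (function names and plain Euclidean distance are assumptions) stands in for the paper's scalable matrix-profile-based algorithm:

```python
import numpy as np

def longest_chain(ts, m):
    """Return indices of the longest left/right mutual-nearest-neighbor chain
    over all length-m subsequences of ts (brute force, for illustration)."""
    n = len(ts) - m + 1
    subs = [ts[i:i + m] for i in range(n)]

    def nn(i, candidates):
        # Nearest neighbor of subsequence i among the candidate indices.
        return min(candidates, default=None,
                   key=lambda j: np.linalg.norm(subs[i] - subs[j]))

    rnn = [nn(i, range(i + 1, n)) for i in range(n)]  # right nearest neighbors
    lnn = [nn(i, range(i)) for i in range(n)]         # left nearest neighbors

    best = []
    for start in range(n):
        chain, i = [start], start
        # Extend while the right neighbor points back to us from the left.
        while rnn[i] is not None and lnn[rnn[i]] == i:
            i = rnn[i]
            chain.append(i)
        if len(chain) > len(best):
            best = chain
    return best
```

On a drifting series, the discovered chain steps through gradually evolving copies of the same pattern.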

NeurIPS Conference 2017 Conference Paper

Random Projection Filter Bank for Time Series Data

  • Amir-massoud Farahmand
  • Sepideh Pourazarm
  • Daniel Nikovski

We propose Random Projection Filter Bank (RPFB) as a generic and simple approach to extract features from time series data. RPFB is a set of randomly generated stable autoregressive filters that are convolved with the input time series to generate the features. These features can be used by any conventional machine learning algorithm for solving tasks such as time series prediction, classification with time series data, etc. Different filters in RPFB extract different aspects of the time series, and together they provide a reasonably good summary of the time series. RPFB is easy to implement, fast to compute, and parallelizable. We provide an error upper bound indicating that RPFB provides a reasonable approximation to a class of dynamical systems. The empirical results in a series of synthetic and real-world problems show that RPFB is an effective method to extract features from time series.
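A minimal version of the idea, with assumed filter count and pole range, might look like this (using only the final filter state as the summary feature, whereas the full RPFB retains richer filter outputs):

```python
import numpy as np

def rpfb_features(x, n_filters=16, seed=0):
    """Toy Random Projection Filter Bank: run a set of random STABLE AR(1)
    filters y[t] = a*y[t-1] + x[t] over the series and collect features."""
    rng = np.random.default_rng(seed)
    poles = rng.uniform(-0.99, 0.99, size=n_filters)  # stability: |pole| < 1
    feats = np.zeros(n_filters)
    for k, a in enumerate(poles):
        y = 0.0
        for xt in x:          # recursive filtering of the input series
            y = a * y + xt
        feats[k] = y          # final filter state as the extracted feature
    return feats
```

The resulting fixed-length feature vector can then be fed to any conventional learner for prediction or classification.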

ICAPS Conference 2017 Conference Paper

Submodular Function Maximization for Group Elevator Scheduling

  • Srikumar Ramalingam
  • Arvind U. Raghunathan
  • Daniel Nikovski

We propose a novel approach for group elevator scheduling by formulating it as the maximization of a submodular function under a matroid constraint. In particular, we propose to model the total waiting time of passengers using a quadratic Boolean function. The unary and pairwise terms in the function denote the waiting time for single and pairwise allocation of passengers to elevators, respectively. We show that this objective function is submodular. The matroid constraints ensure that every passenger is allocated to exactly one elevator. We use a greedy algorithm to maximize the submodular objective function, and derive provable guarantees on the optimality of the solution. We tested our algorithm using Elevate 8, a commercial-grade elevator simulator that allows simulation with a wide range of elevator settings. We achieve significant improvement over the existing algorithms.
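The greedy step under a partition matroid (each passenger assigned to exactly one elevator) can be sketched as follows; the toy objective below uses unary gains only, whereas the paper's quadratic Boolean objective also has pairwise terms and comes with provable guarantees for the greedy:

```python
import numpy as np

def greedy_assign(gain):
    """Greedy maximization under a partition matroid.
    gain[p, e] is the (toy) value of assigning passenger p to elevator e;
    each step adds the feasible pair with the largest marginal gain."""
    n_pass, n_elev = gain.shape
    assigned, picks = set(), {}
    while len(assigned) < n_pass:
        best = max(((p, e) for p in range(n_pass) if p not in assigned
                    for e in range(n_elev)),
                   key=lambda pe: gain[pe])
        picks[best[0]] = best[1]     # passenger -> elevator
        assigned.add(best[0])
    return picks
```

With pairwise terms added, the marginal gain of a pair would depend on which passengers already share the elevator, which is exactly where submodularity matters.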

AAAI Conference 2016 Conference Paper

Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value

  • Amir-massoud Farahmand
  • Daniel Nikovski
  • Yuji Igarashi
  • Hiroki Konaka

We propose a new class of computationally fast algorithms to find close-to-optimal policies for Markov Decision Processes (MDPs) with a large finite horizon T. The main idea is that instead of planning until the time horizon T, we plan only up to a truncated horizon H ≪ T and use an estimate of the true optimal value function as the terminal value. Our approach to finding the terminal value function is to learn a mapping from an MDP to its value function by solving many similar MDPs during a training phase and fitting a regression estimator. We analyze the method by providing an error propagation theorem that shows the effect of various sources of error on the quality of the solution. We also empirically validate this approach in a real-world application of designing an energy management system for Hybrid Electric Vehicles, with promising results.
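Truncated-horizon planning with a bootstrapped terminal value can be illustrated on a toy MDP. Here the terminal value V_hat is handed in directly; in the paper it is a regression estimate learned from a family of similar MDPs:

```python
import numpy as np

def truncated_plan(P, R, V_hat, H):
    """Backward induction for only H steps, bootstrapping with V_hat.
    P: (A, S, S) transition tensor, R: (A, S) rewards, V_hat: (S,) terminal
    value estimate, H >= 1 truncated horizon."""
    V = V_hat.copy()
    for _ in range(H):
        Q = R + P @ V            # Q[a, s] = R[a, s] + sum_s' P[a, s, s'] V[s']
        V = Q.max(axis=0)        # one backward-induction step
    return V, Q.argmax(axis=0)   # H-step value and greedy first-step policy
```

With an accurate V_hat, the H-step plan approaches the quality of planning over the full horizon T at a fraction of the cost.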

EWRL Workshop 2016 Workshop Paper

Value-Aware Loss Function for Model Learning in Reinforcement Learning

  • Amir-massoud Farahmand
  • Andre Barreto
  • Daniel Nikovski

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on model approximation error and the number of samples.
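The contrast with a probabilistic loss can be shown in a few lines. The fixed value function V and the mean-squared form below are simplifying assumptions for illustration:

```python
import numpy as np

def value_aware_loss(P_hat, P, V):
    """Penalize the model only where transition errors change the predicted
    value: mean over states of ((P_hat - P) V)^2.
    P_hat, P: (S, S) transition matrices; V: (S,) value function."""
    err = (P_hat - P) @ V        # per-state error in expected next-state value
    return float(np.mean(err ** 2))

def log_loss(P_hat, P):
    """Conventional probabilistic (cross-entropy) loss, for contrast."""
    return float(-np.sum(P * np.log(P_hat + 1e-12)))
```

A model that confuses two states of equal value incurs zero value-aware loss even though its transition distribution is wrong, which is the sense in which a probabilistic loss can be an overkill.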

ICRA Conference 2004 Conference Paper

Optimal Parking in Group Elevator Control

  • Matthew Brand
  • Daniel Nikovski

We consider the problem of optimally parking empty cars in an elevator group so as to anticipate and intercept the arrival of new passengers and minimize their waiting times. Two solutions are proposed, for the down-peak and up-peak traffic patterns. We demonstrate that matching the distribution of free cars to the arrival distribution of passengers is sufficient to produce savings of up to 80% in down-peak traffic. Since this approach is not useful for the much harder case of up-peak traffic, we propose a solution based on the representation of the elevator system as a Markov decision process (MDP) model with relatively few aggregated states, and determination of the optimal parking policy by means of dynamic programming on the MDP model.

MFCS Conference 2004 Invited Paper

Theory and Applied Computing: Observations and Anecdotes

  • Matthew Brand
  • Sarah F. Frisken Gibson
  • Neal Lesh
  • Joe Marks
  • Daniel Nikovski
  • Ronald N. Perry
  • Jonathan S. Yedidia

While the kind of theoretical computer science being studied in academe is still highly relevant to systems-oriented research, it is less relevant to applications-oriented research. In applied computing, theoretical elements are used only when strictly relevant to the practical problem at hand. Theory is often combined judiciously with empiricism. And increasingly, theory is most useful when cross-pollinated with ideas and methods from other fields. We will illustrate these points by describing several recent projects at Mitsubishi Electric Research Labs that have heavy mathematical and algorithmic underpinnings. These projects include new algorithms for: traffic analysis; geometric layout; belief propagation in graphical models; dimensionality reduction; and shape representation. Practical applications of this work include elevator dispatch, stock cutting, error-correcting codes, data mining, and digital typography. In all cases theoretical concepts and results are used effectively to solve practical problems of commercial import.

ICAPS Conference 2003 Conference Paper

Decision-Theoretic Group Elevator Scheduling

  • Daniel Nikovski
  • Matthew Brand

We present an efficient algorithm for exact calculation and minimization of expected waiting times of all passengers using a bank of elevators. The dynamics of the system are represented by a discrete-state Markov chain embedded in the continuous phase-space diagram of a moving elevator car. The chain is evaluated efficiently using dynamic programming to compute measures of future system performance such as expected waiting time, properly averaged over all possible future scenarios. An elevator group scheduler based on this method significantly outperforms a conventional algorithm based on minimization of proxy criteria such as the time needed for all cars to complete their assigned deliveries. For a wide variety of buildings, ranging from 8 to 30 floors, and with 2 to 8 shafts, our algorithm reduces waiting times by up to 70% in heavy traffic, and exhibits an average waiting-time speed-up of 20% in a test set of 20,000 building types and traffic patterns. While the algorithm has greater computational costs than most conventional algorithms, it is linear in the size of the building and number of shafts, and quadratic in the number of passengers, and is completely within the computational capabilities of currently existing elevator bank control systems.

IROS Conference 2002 Conference Paper

Learning probabilistic models for optimal visual servo control of dynamic manipulation

  • Daniel Nikovski
  • Illah R. Nourbakhsh

We present an experiment in sequential visual servo control of a dynamic manipulation task with unknown equations of motion and feedback from an uncalibrated camera. Our algorithm constructs a model of a Markov decision process (MDP) by means of grounding states in observed trajectories, and uses the model to find a control policy based on visual input, which maximizes a prespecified optimal control criterion balancing performance and control effort.

IROS Conference 2002 Conference Paper

Learning probabilistic models for state tracking of mobile robots

  • Daniel Nikovski
  • Illah R. Nourbakhsh

We propose a learning algorithm for acquiring a stochastic model of the behavior of a mobile robot, which allows the robot to localize itself along the outer boundary of its environment while traversing it. Compared to previously suggested solutions based on learning self-organizing neural nets, our approach achieves much higher spatial resolution, which is limited only by the control time-step of the robot. We demonstrate the successful operation of the algorithm on a small robot with only three infrared range sensors and a digital compass, and suggest how this algorithm can be extended to learn probabilistic models for full decision-theoretic reasoning and planning.

AAAI Conference 2000 Short Paper

Grounding State Representations in Sensory Experience for Reasoning and Planning by Mobile Robots

  • Daniel Nikovski

We are addressing the problem of learning probabilistic models of the interaction between a mobile robot and its environment and using these models for task planning. This requires modifying the state-of-the-art reinforcement learning algorithms to deal with hidden state and high-dimensional observation spaces of continuous variables. Our approach is to identify hidden states by means of the trajectories leading into and out of them, and perform clustering in this embedding trajectory space in order to compile a partially observable Markov decision process (POMDP) model, which can be used for approximate decision-theoretic planning. The ultimate objective of our work is to develop algorithms that learn POMDP models with discrete hidden states defined (grounded) directly into continuous sensory variables such as sonar and infrared readings.