Author name cluster

Kikuo Fujimura

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

AAAI Conference 2022 Conference Paper

Recursive Reasoning Graph for Multi-Agent Reinforcement Learning

Xiaobai Ma
David Isele
Jayesh K. Gupta
Kikuo Fujimura
Mykel J. Kochenderfer

Multi-agent reinforcement learning (MARL) provides an efficient way for simultaneously learning policies for multiple agents interacting with each other. However, in scenarios requiring complex interactions, existing algorithms can suffer from an inability to accurately anticipate the influence of self-actions on other agents. Incorporating an ability to reason about other agents’ potential responses can allow an agent to formulate more effective strategies. This paper adopts a recursive reasoning model in a centralized-training-decentralized-execution framework to help learning agents better cooperate with or compete against others. The proposed algorithm, referred to as the Recursive Reasoning Graph (R2G), shows state-of-the-art performance on multiple multi-agent particle and robotics games.

ICRA Conference 2021 Conference Paper

Reinforcement Learning for Autonomous Driving with Latent State Inference and Spatial-Temporal Relationships

Xiaobai Ma
Jiachen Li 0001
Mykel J. Kochenderfer
David Isele
Kikuo Fujimura

Deep reinforcement learning (DRL) provides a promising way for learning navigation in complex autonomous driving scenarios. However, identifying the subtle cues that can indicate drastically different outcomes remains an open problem with designing autonomous systems that operate in human environments. In this work, we show that explicitly inferring the latent state and encoding spatial-temporal relationships in a reinforcement learning framework can help address this difficulty. We encode prior knowledge on the latent states of other drivers through a framework that combines the reinforcement learner with a supervised learner. In addition, we model the influence passing between different vehicles through graph neural networks (GNNs). The proposed framework significantly improves performance in the context of navigating T-intersections compared with state-of-the-art baseline approaches.

ICLR Conference 2020 Conference Paper

CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning

Jiachen Yang
Alireza Nakhaei
David Isele
Kikuo Fujimura
Hongyuan Zha

A variety of cooperative multi-agent control problems require agents to achieve individual goals while contributing to collective success. This multi-goal multi-agent setting poses difficulties for recent algorithms, which primarily target settings with a single global reward, due to two new challenges: efficient exploration for learning both individual goal attainment and cooperation for others' success, and credit-assignment for interactions between actions and goals of different agents. To address both challenges, we restructure the problem into a novel two-stage curriculum, in which single-agent goal attainment is learned prior to learning multi-agent cooperation, and we derive a new multi-goal multi-agent policy gradient with a credit function for localized credit assignment. We use a function augmentation scheme to bridge value and policy functions across the curriculum. The complete architecture, called CM3, learns significantly faster than direct adaptations of existing algorithms on three challenging multi-goal multi-agent problems: cooperative navigation in difficult formations, negotiating multi-vehicle lane changes in the SUMO traffic simulator, and strategic cooperation in a Checkers environment.

ICRA Conference 2020 Conference Paper

Driving in Dense Traffic with Model-Free Reinforcement Learning

Dhruv Mauria Saxena
Sangjae Bae
Alireza Nakhaei
Kikuo Fujimura
Maxim Likhachev

Traditional planning and control methods could fail to find a feasible trajectory for an autonomous vehicle to execute amongst dense traffic on roads. This is because the obstacle-free volume in spacetime is very small in these scenarios for the vehicle to drive through. However, that does not mean the task is infeasible since human drivers are known to be able to drive amongst dense traffic by leveraging the cooperativeness of other drivers to open a gap. The traditional methods fail to take into account the fact that the actions taken by an agent affect the behaviour of other vehicles on the road. In this work, we rely on the ability of deep reinforcement learning to implicitly model such interactions and learn a continuous control policy over the action space of an autonomous vehicle. The application we consider requires our agent to negotiate and open a gap in the road in order to successfully merge or change lanes. Our policy learns to repeatedly probe into the target road lane while trying to find a safe spot to move in to. We compare against two model-predictive control-based algorithms and show that our policy outperforms them in simulation. As part of this work, we introduce a benchmark for driving in dense traffic for use by the community.

IROS Conference 2019 Conference Paper

Interaction-aware Decision Making with Adaptive Strategies under Merging Scenarios

Yeping Hu
Alireza Nakhaei
Masayoshi Tomizuka
Kikuo Fujimura

In order to drive safely and efficiently under merging scenarios, autonomous vehicles should be aware of their surroundings and make decisions by interacting with other road participants. Moreover, different strategies should be made when the autonomous vehicle is interacting with drivers having different levels of cooperativeness. Whether the vehicle is on the merge-lane or main-lane will also influence the driving maneuvers since drivers will behave differently when they have the right-of-way than otherwise. Many traditional methods have been proposed to solve decision making problems under merging scenarios. However, these works either are incapable of modeling complicated interactions or require implementing hand-designed rules which cannot properly handle the uncertainties in real-world scenarios. In this paper, we propose an interaction-aware decision making with adaptive strategies (IDAS) approach that can let the autonomous vehicle negotiate the road with other drivers by leveraging their cooperativeness under merging scenarios. A single policy is learned under the multi-agent reinforcement learning (MARL) setting via the curriculum learning strategy, which enables the agent to automatically infer other drivers’ various behaviors and make decisions strategically. A masking mechanism is also proposed to prevent the agent from exploring states that violate common sense of human judgment and increase the learning efficiency. An exemplar merging scenario was used to implement and examine the proposed method.

ICRA Conference 2019 Conference Paper

Interaction-Aware Multi-Agent Reinforcement Learning for Mobile Agents with Individual Goals

Anahita Mohseni-Kabir
David Isele
Kikuo Fujimura

In a multi-agent setting, the optimal policy of a single agent is largely dependent on the behavior of other agents. We investigate the problem of multi-agent reinforcement learning, focusing on decentralized learning in non-stationary domains for mobile robot navigation. We identify a cause for the difficulty in training non-stationary policies: mutual adaptation to sub-optimal behaviors, and we use this to motivate a curriculum-based strategy for learning interactive policies. The curriculum has two stages. First, the agent leverages policy gradient algorithms to learn a policy that is capable of achieving multiple goals. Second, the agent learns a modifier policy to learn how to interact with other agents in a multi-agent setting. We evaluated our approach on both an autonomous driving lane-change domain and a robot navigation domain.

AAMAS Conference 2019 Conference Paper

MARL-PPS: Multi-agent Reinforcement Learning with Periodic Parameter Sharing

Safa Cicek
Alireza Nakhaei
Stefano Soatto
Kikuo Fujimura

We present a multi-agent reinforcement learning algorithm that is a simple, yet effective modification of a known algorithm. External agents are modeled as a time-varying environment, whose policy parameters are updated periodically at a slower rate than the planner to make learning stable and more efficient. Replay buffer, which is used to store the experiences, is also reset with the same large period to draw samples from a fixed environment. This enables us to address challenging cooperative control problems in highway navigation. The resulting Multi-agent Reinforcement Learning with Periodic Parameter Sharing (MARL-PPS) algorithm outperforms the baselines in multi-agent highway scenarios we tested.

ICRA Conference 2019 Conference Paper

Uncertainty-Aware Data Aggregation for Deep Imitation Learning

Yuchen Cui
David Isele
Scott Niekum
Kikuo Fujimura

Estimating statistical uncertainties allows autonomous agents to communicate their confidence during task execution and is important for applications in safety-critical domains such as autonomous driving. In this work, we present the uncertainty-aware imitation learning (UAIL) algorithm for improving end-to-end control systems via data aggregation. UAIL applies Monte Carlo Dropout to estimate uncertainty in the control output of end-to-end systems, using states where it is uncertain to selectively acquire new training data. In contrast to prior data aggregation algorithms that force human experts to visit sub-optimal states at random, UAIL can anticipate its own mistakes and switch control to the expert in order to prevent visiting a series of sub-optimal states. Our experimental results from simulated driving tasks demonstrate that our proposed uncertainty estimation method can be leveraged to reliably predict infractions. Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.

IROS Conference 2018 Conference Paper

Collaborative Planning for Mixed-Autonomy Lane Merging

Shray Bansal
Akansel Cosgun
Alireza Nakhaei
Kikuo Fujimura

Driving is a social activity: drivers often indicate their intent to change lanes via motion cues. We consider mixed-autonomy traffic where a Human-driven Vehicle (HV) and an Autonomous Vehicle (AV) drive together. We propose a planning framework where the degree to which the AV considers the other agent's reward is controlled by a selfishness factor. We test our approach on a simulated two-lane highway where the AV and HV merge into each other's lanes. In a user study with 21 subjects and 6 different selfishness factors, we found that our planning approach was sound and that both agents had shorter merging times when a factor that balances the rewards for the two agents was chosen. Our results on double lane merging suggest it to be a non-zero-sum game and encourage further investigation on collaborative decision making algorithms for mixed-autonomy traffic.

ICRA Conference 2018 Conference Paper

Navigating Occluded Intersections with Autonomous Vehicles Using Deep Reinforcement Learning

David Isele
Reza Rahimi
Akansel Cosgun
Kaushik Subramanian
Kikuo Fujimura

Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics including task completion time and goal success rate and have limited ability to generalize. We then explore a system's ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis provides insight into the intersection handling problem; the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions.

IROS Conference 2018 Conference Paper

Safe Reinforcement Learning on Autonomous Vehicles

David Isele
Alireza Nakhaei
Kikuo Fujimura

There have been numerous advances in reinforcement learning, but the typically unconstrained exploration of the learning process prevents the adoption of these methods in many safety critical applications. Recent work in safe reinforcement learning uses idealized models to achieve their guarantees, but these models do not easily accommodate the stochasticity or high-dimensionality of real world systems. We investigate how prediction provides a general and intuitive framework to constrain exploration, and show how it can be used to safely learn intersection handling behaviors on an autonomous vehicle.

ICRA Conference 2018 Conference Paper

Scalable Decision Making with Sensor Occlusions for Autonomous Driving

Maxime Bouton
Alireza Nakhaei
Kikuo Fujimura
Mykel J. Kochenderfer

Autonomous driving in urban areas requires avoiding other road users with only partial observability of the environment. Observations are only partial because obstacles can occlude the field of view of the sensors. The problem of robust and efficient navigation under uncertainty can be framed as a partially observable Markov decision process (POMDP). In order to bypass the computational cost of scaling the formulation to avoiding multiple road users, this paper demonstrates a decomposition method that leverages the optimal avoidance strategy for a single user. We evaluate the performance of two POMDP solution techniques augmented with the decomposition method for scenarios involving a pedestrian crosswalk and an intersection.

AAMAS Conference 2018 Conference Paper

Utility Decomposition with Deep Corrections for Scalable Planning under Uncertainty

Maxime Bouton
Kyle Julian
Alireza Nakhaei
Kikuo Fujimura
Mykel J. Kochenderfer

Decomposition methods have been proposed in the past to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used where each individual entity is considered independently. The individual utility functions are then combined in real time to solve the global problem. Although these techniques can perform well empirically, they sacrifice optimality. This paper proposes an approach inspired from multi-fidelity optimization to learn a correction term with a neural network representation. Learning this correction can significantly improve performance. We demonstrate this approach on a pedestrian avoidance problem for autonomous driving. By leveraging strategies to avoid a single pedestrian, the decomposition method can scale to avoid multiple pedestrians. We verify empirically that the proposed correction method leads to a significant improvement over the decomposition method alone and outperforms a policy trained on the full scale problem without utility decomposition.

ICRA Conference 2010 Conference Paper

Constrained closed loop inverse kinematics

Behzad Dariush
Youding Zhu
Arjun Arumbakkam
Kikuo Fujimura

This paper introduces a kinematically constrained closed loop inverse kinematics algorithm for motion control of robots or other articulated rigid body systems. The proposed strategy utilizes gradients of collision and joint limit potential functions to arrive at an appropriate weighting matrix to penalize and dampen motion approaching constraint surfaces. The method is particularly suitable for self collision avoidance of highly articulated systems which may have multiple collision points among several segment pairs. In that respect, the proposed method has a distinct advantage over existing gradient projection based methods which rely on numerically unstable null-space projections when there are multiple intermittent constraints. We also show how this approach can be augmented with a previously reported method based on redirection of constraints along virtual surface manifolds. The hybrid strategy is effective, robust, and does not require parameter tuning. The efficacy of the proposed algorithm is demonstrated for a self collision avoidance problem where the reference motion is obtained from human observations. We show simulation and experimental results on the humanoid robot ASIMO.

IROS Conference 2008 Conference Paper

Online and markerless motion retargeting with kinematic constraints

Behzad Dariush
Michael Gienger
Arjun Arumbakkam
Christian Goerick
Youding Zhu
Kikuo Fujimura

Transferring motion from a human demonstrator to a humanoid robot is an important step toward developing robots that are easily programmable and that can replicate or learn from observed human motion. The so-called motion retargeting problem has been well studied and several off-line solutions exist based on optimization approaches that rely on pre-recorded human motion data collected from a marker-based motion capture system. From the perspective of human robot interaction, there is a growing interest in online and marker-less motion transfer. Such requirements have placed stringent demands on retargeting algorithms and limited the potential use of off-line and pre-recorded methods. To address these limitations, we present an online task space control theoretic retargeting formulation to generate robot joint motions that adhere to the robot’s joint limit constraints, self-collision constraints, and balance constraints. The inputs to the proposed method include low dimensional normalized human motion descriptors, detected and tracked using a vision based feature detection and tracking algorithm. The proposed vision algorithm does not rely on markers placed on anatomical landmarks, nor does it require special instrumentation or calibration. The current implementation requires a depth image sequence, which is collected from a single time of flight imaging device. We present online experimental results of the entire pipeline on the Honda humanoid robot - ASIMO.

IROS Conference 2008 Conference Paper

The memory game: Creating a human-robot interactive scenario for ASIMO

Victor Ng-Thow-Hing
Jongwoo Lim
Joel Wormer
Ravi Kiran Sarvadevabhatla
Carlos Rocha
Kikuo Fujimura
Yoshiaki Sakagami

We present a human-robot interactive scenario consisting of a memory card game between Honda’s humanoid robot ASIMO and a human player. The game features perception exclusively through ASIMO’s on-board cameras and both reactive and proactive behaviors specific to different situational contexts in the memory game. ASIMO is able to build a dynamic environmental map of relevant objects in the game such as the table and card layout as well as understand activities from the player such as pointing at cards, flipping cards and removing them from the table. Our system architecture, called the Cognitive Map, treats the memory game as a multi-agent system, with modules acting independently and communicating with each other via messages through a shared blackboard system. The game behavior module can model game state and contextual information to make decisions based on different pattern recognition modules. Behavior is then sent through high-level command interfaces to be resolved into actual physical actions by the robot via a multi-modal communication module. The experience gained in modeling this interactive scenario will allow us to reuse the architecture to create new scenarios and explore new research directions in learning how to respond to new interactive situations.

ICRA Conference 2008 Conference Paper

Whole body humanoid control from human motion descriptors

Behzad Dariush
Michael Gienger
Bing Jian
Christian Goerick
Kikuo Fujimura

Many advanced motion control strategies developed in robotics use captured human motion data as a valuable source of examples to simplify the process of programming or learning complex robot motions. Direct and online control of robots from observed human motion has several inherent challenges. The most important may be the representation of the large number of mechanical degrees of freedom involved in the execution of movement tasks. Attempting to map all such degrees of freedom from a human to a humanoid is a formidable task from an instrumentation and sensing point of view. More importantly, such an approach is incompatible with mechanisms in the central nervous system which are believed to organize or simplify the control of these degrees of freedom during motion execution and motor learning phase. Rather than specifying the desired motion of every degree of freedom for the purpose of motion control, it is important to describe motion by low dimensional motion primitives that are defined in Cartesian (or task) space. In this paper, we formulate the human to humanoid retargeting problem as a task space control problem. The control objective is to track desired task descriptors while satisfying constraints such as joint limits, velocity limits, collision avoidance, and balance. The retargeting algorithm generates the joint space trajectories that are commanded to the robot. We present experimental and simulation results of the retargeting control algorithm on the Honda humanoid robot ASIMO.

IROS Conference 2002 Conference Paper

The intelligent ASIMO: system overview and integration

Yoshiaki Sakagami
Ryujin Watanabe
Chiaki Aoyama
Shinichi Matsunaga
Nobuo Higaki
Kikuo Fujimura

We present the system overview and integration of the ASIMO autonomous robot that can function successfully in indoor environments. The first model of ASIMO is already being leased to companies for receptionist work. In this paper, we describe the new capabilities that we have added to ASIMO. We explain the structure of the robot system for intelligence, integrated subsystems on its body, and their new functions. We describe the behavior-based planning architecture on ASIMO and its vision and auditory system. We describe its gesture recognition system, human interaction and task performance. We also discuss the external online database system that can be accessed using internet to retrieve desired information, the management system for receptionist work, and various function demonstrations.

TCS Journal 2002 Journal Article

Time-minimal paths amidst moving obstacles in three dimensions

Kikuo Fujimura

ICRA Conference 1994 Conference Paper

Motion Planning Amidst Planar Moving Obstacles

Neeraj Aggarwal
Kikuo Fujimura

A method is investigated for finding a collision-free path for a mobile robot with a few degrees of freedom in a time-varying domain. The environment contains a set of obstacles with arbitrary known motion patterns. In a time-varying environment, paths are time-dependent, i.e., a path needs to be specified as a function of time, since connectivity in the environment changes over time. Given a time-varying environment, a start time, and a start location, a method is presented for finding a collision-free path from start to goal points for a finite-size robot subject to a speed bound. Our method makes use of a heuristic approach based on a transient pixel representation.

IROS Conference 1993 Conference Paper

A navigation strategy for cooperative multiple mobile robots

Karansher Singh
Kikuo Fujimura

The problem of navigation of cooperative multiple heterogeneous mobile robots is investigated. The mobile robots vary in size and their capabilities in terms of speeds to navigate through the region and sensor ranges to acquire information about the region. The robots are assumed to have sufficient memory to store the map and to be able to communicate with each other. An algorithm is presented for plane sweep by cooperative multiple mobile robots. The authors' approach makes use of an occupancy grid and the robots' sensors are to sweep all pixels of the grid. The algorithm is discussed in detail and its feasibility is demonstrated by simulation results for the case of two cooperating mobile robots.

IROS Conference 1993 Conference Paper

Motion planning amidst dynamic obstacles in three dimensions

Kikuo Fujimura

Motion planning amidst moving obstacles in three dimensions is considered. Each obstacle is a polyhedron in three dimensions that moves with a constant speed at a fixed direction. Three basic properties are established for time-minimal motions in three dimensions amidst slowly moving obstacles. Representing moving three-dimensional objects with full generality would require space-time for four dimensions. The discussion is based on the idea of three-dimensional collision fronts and does not make explicit use of the time dimension. This makes it possible to treat the problem as in the case of stationary obstacles.

ICRA Conference 1992 Conference Paper

On motion planning amidst transient obstacles

Kikuo Fujimura

The author discusses an important class of dynamic obstacles, that is, obstacles that appear and disappear in the environment. This formulation allows modeling of a number of time-varying situations that can arise in application domains. For example, a motion can be planned in an environment where an agent rearranges the environment by picking up an object and placing it back at another location in the same environment. An algorithm is presented to generate a motion in such a dynamic domain. The algorithm runs in O(n^3 log n) time, where n is the total number of vertices in the environment.

IROS Conference 1992 Conference Paper

Route Planning For Mobile Robots Amidst Moving Obstacles

Kikuo Fujimura

ICRA Conference 1991 Conference Paper

A model of reactive planning for multiple mobile agents

Kikuo Fujimura

Reactive planning is studied for multiple mobile agents. The approach taken is distributed, i.e., each planning agent independently plans its own action based on its map information. An environment contains mobile agents of different capacities with respect to knowledge about the environment, planning algorithms, etc. A model for such reactive agents is described, and simulation results are presented to show their behavior patterns.

ICRA Conference 1990 Conference Paper

Motion planning in a dynamic domain

Kikuo Fujimura
Hanan Samet

Motion planning is studied in a time-varying environment. Each obstacle is a convex polygon that moves in a fixed direction at a constant speed. The robot is a convex polygon that is subject to a speed bound. A method of determining whether or not there is a translational collision-free motion for a polygonal robot from an initial position to a final position and of planning such a motion, if it exists, is presented. The method makes use of the concept of configuration spaces and accessibility. An algorithm is given for motion planning in such an environment, and its time complexity is analyzed.

ICRA Conference 1989 Conference Paper

Time-minimal paths among moving obstacles

Kikuo Fujimura
Hanan Samet

Motion planning for a point robot is studied in a two-dimensional time-varying environment. The obstacle is a convex polygon that moves in a fixed direction at a constant speed. The point to be reached (referred to as the destination point) also moves along a known path. The concept of accessibility from a point to a moving object is introduced, and it is used to define a graph on a set of moving obstacles. The graph is shown to exhibit an important property, that is, if the moving point is able to move faster than any of the obstacles, a time-minimal path is given as a sequence of edges in the graph. An algorithm is described for generating a time-minimal path, and its execution time is analyzed.

ICRA Conference 1988 Conference Paper

Path planning among moving obstacles using spatial indexing

Kikuo Fujimura
Hanan Samet

A method is presented for planning a path in the presence of moving obstacles. Given a set of polygonal moving obstacles, a path is generated for a mobile robot that navigates in the two-dimensional plane. Time is included as one of the dimensions of the model world. This allows the moving obstacles to be regarded as stationary in the extended world. For a solution to be feasible, the robot must not collide with any other moving obstacles and must navigate within the predetermined range of velocity, acceleration, and centrifugal force. A spatial index is used to facilitate geometric search for the path-planning task. Computer simulation results are presented to illustrate the feasibility of this approach.