Arrow Research search

Author name cluster

Akshat Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

64 papers
2 author rows

Possible papers

64

NeurIPS Conference 2025 Conference Paper

IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching

  • Quang Anh Pham
  • Janaka Brahmanage
  • Tien Mai
  • Akshat Kumar

Offline Learning from Observations (LfO) focuses on enabling agents to imitate expert behavior using datasets that contain only expert state trajectories and separate transition data with suboptimal actions. This setting is both practical and critical in real-world scenarios where direct environment interaction or access to expert action labels is costly, risky, or infeasible. Most existing LfO methods attempt to solve this problem through state or state-action occupancy matching. They typically rely on pretraining a discriminator to differentiate between expert and non-expert states, which can introduce errors and instability, especially when the discriminator is poorly trained. While recent discriminator-free methods have emerged, they generally require substantially more data, limiting their practicality in low-data regimes. In this paper, we propose IOSTOM (Imitation from Observation via State Transition Occupancy Matching), a novel offline LfO algorithm designed to overcome these limitations. Our approach formulates a learning objective based on the joint state visitation distribution. A key distinction of IOSTOM is that it excludes actions entirely from the training objective. Instead, we learn an implicit policy that models transition probabilities between states, resulting in a more compact and stable optimization problem. To recover the expert policy, we introduce an efficient action inference mechanism that avoids training an inverse dynamics model. Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency.
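
One way to picture action recovery without an inverse dynamics model is a nearest-neighbor lookup over the offline transition data: given the implicit policy's preferred next state, reuse the logged action of the closest recorded transition. This is an illustrative sketch only, not IOSTOM's actual inference mechanism; the `infer_action` helper and the distance rule are assumptions.

```python
import numpy as np

# Hedged sketch of action recovery without an inverse dynamics model: given the
# implicit policy's preferred next state, look up the transition in the offline
# dataset whose (state, next_state) pair is closest and reuse its logged action.
# This nearest-neighbor rule is an illustration, not IOSTOM's actual mechanism.

def infer_action(state, target_next, dataset):
    # dataset: list of (s, a, s') transitions from the suboptimal offline data
    key = np.concatenate([state, target_next])
    best = min(dataset, key=lambda t: np.linalg.norm(np.concatenate([t[0], t[2]]) - key))
    return best[1]

data = [
    (np.array([0.0]), "left",  np.array([-1.0])),
    (np.array([0.0]), "right", np.array([1.0])),
]
print(infer_action(np.array([0.0]), np.array([0.9]), data))  # right
```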

AAAI Conference 2025 Conference Paper

Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning

  • Janaka Chathuranga Brahmanage
  • Jiajing Ling
  • Akshat Kumar

In many RL applications, ensuring an agent's actions adhere to constraints is crucial for safety. Most previous methods in Action-Constrained Reinforcement Learning (ACRL) employ a projection layer after the policy network to correct the action. However, projection-based methods suffer from issues like the zero-gradient problem and higher runtime due to the usage of optimization solvers. Recently, methods were proposed to train generative models to learn a differentiable mapping between latent variables and feasible actions to address this issue. However, generative models require training using samples from the constrained action space, which is itself challenging. To address such limitations, first, we define a target distribution for feasible actions based on constraint violation signals, and train normalizing flows by minimizing the KL divergence between an approximated distribution over feasible actions and the target. This eliminates the need to generate feasible action samples, greatly simplifying the flow model learning. Second, we integrate the learned flow model with existing deep RL methods, restricting exploration to the feasible action space. Third, we extend our approach beyond ACRL to handle state-wise constraints by learning the constraint violation signal from the environment. Empirically, our approach incurs significantly fewer constraint violations than previous best methods while achieving similar or better quality on several control tasks.
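
The core idea of a violation-signal target distribution can be illustrated with a toy example: feasible actions receive full density, while infeasible ones are exponentially down-weighted by the size of the violation. The specific constraint, the `beta` temperature, and the exponential form below are assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

# Hypothetical sketch: turn a constraint-violation signal into an (unnormalized)
# target density over actions, usable as a flow model's KL target. Feasible
# actions (violation == 0) get full weight; infeasible ones decay with the
# violation magnitude. `beta` and the constraint are illustrative choices.

def violation(a):
    # Example constraint: ||a||_1 <= 1; violation is the amount of excess.
    return max(0.0, np.abs(a).sum() - 1.0)

def target_density(a, beta=10.0):
    return np.exp(-beta * violation(a))

feasible = np.array([0.3, 0.4])      # ||a||_1 = 0.7  -> density 1.0
infeasible = np.array([0.9, 0.6])    # ||a||_1 = 1.5  -> density exp(-5)
print(target_density(feasible), target_density(infeasible))
```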

AAAI Conference 2025 Conference Paper

Offline Safe Reinforcement Learning Using Trajectory Classification

  • Ze Gong
  • Akshat Kumar
  • Pradeep Varakantham

Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints at each time step (derived from global cost constraints), which can result in either overly conservative policies or violation of safety constraints. In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories. Specifically, we first partition the pre-collected dataset of state-action trajectories into desirable and undesirable subsets. Intuitively, the desirable set contains high-reward and safe trajectories, and the undesirable set contains unsafe trajectories and low-reward safe trajectories. Second, we learn a policy that generates desirable trajectories and avoids undesirable trajectories, where (un)desirability scores are provided by a classifier learnt from the dataset of desirable and undesirable trajectories. This approach bypasses the computational complexity and stability issues of the min-max objective employed in existing methods. Theoretically, we also show our approach’s strong connections to existing learning paradigms involving human feedback. Finally, we extensively evaluate our method using the DSRL benchmark for offline safe RL. Empirically, our method outperforms competitive baselines, achieving higher rewards and better constraint satisfaction across a wide variety of benchmark tasks.
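
The partitioning intuition above can be sketched in a few lines: trajectories over a cost budget are unsafe, and safe trajectories are split by reward. The cost budget and the reward threshold (here, the median reward of safe trajectories) are assumptions for this illustration, not the paper's exact rule.

```python
# Illustrative partition of offline trajectories into desirable / undesirable
# sets, following the intuition in the abstract. The cost budget and the
# median-reward threshold are assumptions of this sketch.

def partition(trajectories, cost_budget):
    # trajectories: list of dicts with total "reward" and total "cost"
    safe = [t for t in trajectories if t["cost"] <= cost_budget]
    unsafe = [t for t in trajectories if t["cost"] > cost_budget]
    rewards = sorted(t["reward"] for t in safe)
    threshold = rewards[len(rewards) // 2] if rewards else float("inf")
    desirable = [t for t in safe if t["reward"] >= threshold]
    undesirable = unsafe + [t for t in safe if t["reward"] < threshold]
    return desirable, undesirable

trajs = [
    {"reward": 10.0, "cost": 0.5},   # safe, high reward  -> desirable
    {"reward": 2.0,  "cost": 0.2},   # safe, low reward   -> undesirable
    {"reward": 12.0, "cost": 5.0},   # unsafe             -> undesirable
]
good, bad = partition(trajs, cost_budget=1.0)
print(len(good), len(bad))  # 1 2
```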

AAMAS Conference 2025 Conference Paper

ShipNaviSim: Data-Driven Simulation for Real-World Maritime Navigation

  • Quang Anh Pham
  • Janaka Chathuranga Brahmanage
  • Akshat Kumar

Maritime traffic management in busy ports faces growing challenges due to increased vessel traffic and complex waterway interactions. Strategies such as e-navigation by the International Maritime Organization aim to enhance navigation safety through traffic digitization. Maritime traffic simulation is essential for these systems, offering a virtual environment to model, analyze, and optimize traffic flows. Unlike road traffic, there are few simulators for maritime traffic, and they often lack realism and multi-ship interactions. In this paper, we (a) present ShipNaviSim, a data-driven maritime traffic simulator that utilizes a large-scale dataset spanning two years and electronic navigation charts to model vessel movements in the Singapore Strait, one of the busiest ports in the world; (b) implement and evaluate different imitation learning algorithms, such as behavior cloning, to learn a policy that can accurately simulate real-world vessel movements and multi-ship interactions; (c) develop vessel-specific metrics, such as trajectory curvature and near-miss rate, to validate the learned policy’s alignment with human expert data. Extensive testing shows that our learned agents can behave like human experts, and thus can be used with the simulator for recommending routes for vessels in a hotspot region or generating diverse traffic scenarios to benchmark navigation systems.

AAMAS Conference 2024 Conference Paper

Difference of Convex Functions Programming for Policy Optimization in Reinforcement Learning

  • Akshat Kumar

We formulate the problem of optimizing an agent’s policy within the Markov decision process (MDP) model as a difference-of-convex functions (DC) program. The DC perspective enables optimizing the policy iteratively, where each iteration constructs an easier-to-optimize lower bound on the value function using the well-known concave-convex procedure. We show that several popular policy gradient based deep RL algorithms (for both discrete and continuous state and action spaces, and for stochastic and deterministic policies), such as actor-critic, deterministic policy gradient (DPG), and soft actor-critic (SAC), can be derived from the DC perspective. Additionally, the DC formulation enables more sample-efficient learning approaches by exploiting the structure of the value function lower bound, and when the policy has a simpler parametric form, allows using efficient nonlinear programming solvers. Furthermore, we show that the DC perspective extends easily to constrained RL and partially observable and multiagent settings. Such connections provide new insight on previous algorithms, and also help develop new algorithms for RL.
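
The concave-convex procedure at the heart of the DC perspective can be shown on a toy one-dimensional DC objective: each step linearizes the concave part and minimizes the resulting convex surrogate. The specific objective below is an illustration, not one of the paper's RL formulations.

```python
# Concave-convex procedure (CCP) on a toy DC objective
#   f(x) = x**4 - x**2 = g(x) - h(x),  g(x) = x**4, h(x) = x**2 (both convex).
# Each CCP step linearizes h at x_t and minimizes the convex surrogate
#   g(x) - [h(x_t) + h'(x_t) * (x - x_t)],
# which here has the closed-form minimizer x = (x_t / 2) ** (1/3).

def ccp_step(x_t):
    # argmin_x x**4 - 2*x_t*x  =>  4*x**3 = 2*x_t
    return (x_t / 2.0) ** (1.0 / 3.0)

x = 1.0
for _ in range(50):
    x = ccp_step(x)

print(round(x, 4))  # 0.7071, i.e. 1/sqrt(2), a stationary point of f
```

Each surrogate is a tight upper bound on f at x_t, so the iterates improve monotonically, which mirrors the monotone lower-bound argument for value functions in the paper.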

AAMAS Conference 2024 Conference Paper

Factored MDP based Moving Target Defense with Dynamic Threat Modeling

  • Megha Bose
  • Praveen Paruchuri
  • Akshat Kumar

Moving Target Defense (MTD) has emerged as a proactive defense framework to counteract ever-changing cyber threats. Existing approaches often make assumptions about attacker-side knowledge and behavior, potentially resulting in suboptimal defense. This paper introduces a novel MTD approach, leveraging a Markov Decision Process (MDP) model that eliminates the need for prior knowledge about attacker intentions or payoffs. Our framework seamlessly integrates real-time attacker responses into the defender’s MDP using a dynamic Bayesian network. We use a factored MDP model to enable a more comprehensive and realistic representation of a system with multiple switchable aspects, and to accommodate incremental updates of an attack response predictor as new attack data emerges, ensuring adaptive defense. Empirical evaluations demonstrate the approach’s effectiveness in uncertain scenarios with evolving as well as unknown attack landscapes.

ICML Conference 2024 Conference Paper

Unified Training of Universal Time Series Forecasting Transformers

  • Gerald Woo
  • Chenghao Liu
  • Akshat Kumar
  • Caiming Xiong
  • Silvio Savarese
  • Doyen Sahoo

Deep learning for time series forecasting has traditionally operated within a one-model-per-dataset framework, limiting its potential to leverage the game-changing impact of large pre-trained models. The concept of universal forecasting, emerging from pre-training on a vast collection of time series datasets, envisions a single Large Time Series Model capable of addressing diverse downstream forecasting tasks. However, constructing such a model poses unique challenges specific to time series data: (i) cross-frequency learning, (ii) accommodating an arbitrary number of variates for multivariate time series, and (iii) addressing the varying distributional properties inherent in large-scale data. To address these challenges, we present novel enhancements to the conventional time series Transformer architecture, resulting in our proposed Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai). Trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains, Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models. Code, data, and model weights can be found at https://github.com/SalesforceAIResearch/uni2ts.

NeurIPS Conference 2023 Conference Paper

FlowPG: Action-constrained Policy Gradient with Normalizing Flows

  • Janaka Brahmanage
  • Jiajing Ling
  • Akshat Kumar

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow will transform policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks.
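
The invertible latent-to-feasible mapping can be illustrated with the simplest possible feasible set, a box, where a fixed tanh squashing plays the role that a learned flow plays for general constraints. This is a minimal sketch of the idea, not the paper's learned model.

```python
import numpy as np

# Minimal illustration of the FlowPG idea: an invertible, differentiable map
# from a simple latent space to a feasible action set, so no projection solver
# is needed at action-selection time. Here the feasible set is just a box
# [low, high]; a real normalizing flow would learn the map for general constraints.

def latent_to_action(z, low, high):
    squashed = np.tanh(z)                                  # maps into (-1, 1)
    return low + (squashed + 1.0) * 0.5 * (high - low)

def action_to_latent(a, low, high):
    squashed = 2.0 * (a - low) / (high - low) - 1.0
    return np.arctanh(squashed)

low, high = np.array([-1.0, 0.0]), np.array([1.0, 2.0])
z = np.array([0.3, -1.2])
a = latent_to_action(z, low, high)
assert np.all((low < a) & (a < high))                      # feasible by construction
assert np.allclose(action_to_latent(a, low, high), z)      # invertible
```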

AAMAS Conference 2023 Conference Paper

Knowledge Compilation for Constrained Combinatorial Action Spaces in Reinforcement Learning

  • Jiajing Ling
  • Moritz Lukas Schuler
  • Akshat Kumar
  • Pradeep Varakantham

Action-constrained reinforcement learning (ACRL), where any action taken in a state must satisfy given constraints, has several practical applications such as resource allocation in supply-demand matching, and path planning among others. A key challenge is to enforce constraints when the action space is discrete and combinatorial. To address this, first, we assume an action is represented using propositional variables, and action constraints are represented using Boolean functions. Second, we compactly encode the set of all valid actions that satisfy action constraints using a probabilistic sentential decision diagram (PSDD), a recently proposed knowledge compilation framework. Parameters of the PSDD compactly encode the probability distribution over all valid actions. Consequently, the learning task becomes optimizing PSDD parameters to maximize the RL objective. Third, we show how to embed the PSDD parameters using deep neural networks, and optimize them using a deep Q-learning based algorithm. By design, our approach is guaranteed to never violate any constraint, and does not involve any expensive projection step over the constraint space. Finally, we show how practical resource allocation constraints can be encoded using a PSDD. Empirically, our approach works better than previous ACRL methods, which often violate constraints, and are not scalable as they involve a computationally expensive projection-over-constraints step.
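
The key semantic property of the PSDD encoding, a distribution whose support contains only valid actions, can be shown on a toy constraint by brute-force enumeration. A real PSDD represents this compactly without enumerating; the constraint and the softmax parameterization below are illustrative assumptions.

```python
import itertools
import math

# Conceptual sketch of what the PSDD buys: a distribution supported *only* on
# valid combinatorial actions. Here validity is a toy resource constraint
# (select at most 2 of 4 items); a PSDD would encode this compactly instead of
# enumerating, but the semantics (zero mass on invalid actions) is the same.

def valid(action):
    return sum(action) <= 2

actions = [a for a in itertools.product([0, 1], repeat=4) if valid(a)]
logits = {a: 0.0 for a in actions}   # learnable parameters, one per valid action here
z = sum(math.exp(v) for v in logits.values())
probs = {a: math.exp(v) / z for a, v in logits.items()}

assert abs(sum(probs.values()) - 1.0) < 1e-12
assert all(valid(a) for a in probs)  # no constraint can ever be violated
```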

ICML Conference 2023 Conference Paper

Learning Deep Time-index Models for Time Series Forecasting

  • Gerald Woo
  • Chenghao Liu
  • Doyen Sahoo
  • Akshat Kumar
  • Steven C. H. Hoi

Deep learning has been actively applied to time series forecasting, leading to a deluge of new methods, belonging to the class of historical-value models. Yet, despite the attractive properties of time-index models, such as being able to model the continuous nature of underlying time series dynamics, little attention has been given to them. Indeed, while naive deep time-index models are far more expressive than the manually predefined function representations of classical time-index models, they are inadequate for forecasting, being unable to generalize to unseen time steps due to the lack of inductive bias. In this paper, we propose DeepTime, a meta-optimization framework to learn deep time-index models which overcome these limitations, yielding an efficient and accurate forecasting model. Extensive experiments on real world datasets in the long sequence time-series forecasting setting demonstrate that our approach achieves competitive results with state-of-the-art methods, and is highly efficient. Code is available at https://github.com/salesforce/DeepTime.

AAAI Conference 2023 Conference Paper

Planning and Learning for Non-markovian Negative Side Effects Using Finite State Controllers

  • Aishwarya Srivastava
  • Sandhya Saisubramanian
  • Praveen Paruchuri
  • Akshat Kumar
  • Shlomo Zilberstein

Autonomous systems are often deployed in the open world where it is hard to obtain complete specifications of objectives and constraints. Operating based on an incomplete model can produce negative side effects (NSEs), which affect the safety and reliability of the system. We focus on mitigating NSEs in environments modeled as Markov decision processes (MDPs). First, we learn a model of NSEs using observed data that contains state-action trajectories and severity of associated NSEs. Unlike previous works that associate NSEs with state-action pairs, our framework associates NSEs with entire trajectories, which is more general and captures non-Markovian dependence on states and actions. Second, we learn finite state controllers (FSCs) that predict NSE severity for a given trajectory and generalize well to unseen data. Finally, we develop a constrained MDP model that uses information from the underlying MDP and the learned FSC for planning while avoiding NSEs. Our empirical evaluation demonstrates the effectiveness of our approach in learning and mitigating Markovian and non-Markovian NSEs.

ICAPS Conference 2023 Conference Paper

Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects

  • Siow Meng Low
  • Akshat Kumar
  • Scott Sanner

In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. In the real world, the state representation used may lack sufficient fidelity to specify such safety constraints. Operating based on an incomplete model can often produce unintended negative side effects (NSEs). To address these challenges, first, we associate safety signals with state-action trajectories (rather than just the immediate state-action). This makes our safety model highly general. We also assume categorical safety labels are given for different trajectories, rather than a numerical cost function, which is harder to specify by the problem designer. We then employ a supervised learning model to learn such non-Markovian safety patterns. Second, we develop a Lagrange multiplier method, which incorporates the safety model and the underlying MDP model in a single computation graph to facilitate agent learning of safe behaviors. Finally, our empirical results on a variety of discrete and continuous domains show that this approach can satisfy complex non-Markovian safety constraints while optimizing the agent's reward.
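
The Lagrange-multiplier mechanism can be sketched as simple dual ascent: the multiplier rises while the estimated unsafe-trajectory rate exceeds the allowed threshold and shrinks toward zero once the constraint is satisfied. The threshold, step size, and violation-rate sequence below are illustrative assumptions, not the paper's values.

```python
# Sketch of a Lagrange-multiplier (dual ascent) update for a safety constraint:
# lambda <- max(0, lambda + step * (violation_rate - threshold)).
# All numbers here are illustrative, not taken from the paper.

def dual_ascent(lmbda, violation_rate, threshold, step=0.5):
    return max(0.0, lmbda + step * (violation_rate - threshold))

lmbda = 0.0
for rate in [0.4, 0.3, 0.2, 0.05, 0.05]:   # violation rate falls as the policy improves
    lmbda = dual_ascent(lmbda, rate, threshold=0.1)
print(round(lmbda, 2))  # 0.25
```

The multiplier weights the safety model's penalty inside the combined objective, so a rising multiplier pushes the policy toward safer behavior until the constraint holds.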

AAAI Conference 2023 Conference Paper

Scalable and Globally Optimal Generalized L₁ K-center Clustering via Constraint Generation in Mixed Integer Linear Programming

  • Aravinth Chembu
  • Scott Sanner
  • Hassan Khurram
  • Akshat Kumar

The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorithms provide highly suboptimal solutions that can limit their practical application, reproducibility, and clustering quality. In this paper, we provide a novel scalable and globally optimal solution to a popular variant of the k-center problem known as generalized L₁ k-center clustering that uses L₁ distance and allows the selection of arbitrary vectors as cluster centers. We show that this clustering objective can be reduced to a mixed-integer linear program (MILP) that facilitates globally optimal clustering solutions. However, solving such a MILP may be intractable for large datasets; to remedy this, we present a scalable algorithm that leverages constraint generation to efficiently and provably converge to its global optimum. We further enhance outlier handling through a simple but elegant extension to our MILP objective. We first evaluate our algorithm on a variety of synthetic datasets to better understand its properties and then validate on 20 real benchmark datasets where we compare its performance to both traditional L₁ distance k-center and k-medians baselines. Our results demonstrate significant suboptimality of existing algorithms in comparison to our approach and further demonstrate that we can find optimal generalized L₁ k-center clustering solutions up to an unprecedented 1,000,000 data points.

ICLR Conference 2022 Conference Paper

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting

  • Gerald Woo
  • Chenghao Liu
  • Doyen Sahoo
  • Akshat Kumar
  • Steven C. H. Hoi

Deep learning has been actively studied for time series forecasting, and the mainstream paradigm is based on the end-to-end training of neural network architectures, ranging from classical LSTM/RNNs to more recent TCNs and Transformers. Motivated by the recent success of representation learning in computer vision and natural language processing, we argue that a more promising paradigm for time series forecasting is to first learn disentangled feature representations, followed by a simple regression fine-tuning step -- we justify such a paradigm from a causal perspective. Following this principle, we propose a new time series representation learning framework for long sequence time series forecasting named CoST, which applies contrastive learning methods to learn disentangled seasonal-trend representations. CoST comprises both time domain and frequency domain contrastive losses to learn discriminative trend and seasonal representations, respectively. Extensive experiments on real-world datasets show that CoST consistently outperforms the state-of-the-art methods by a considerable margin, achieving a 21.3% improvement in MSE on multivariate benchmarks. It is also robust to various choices of backbone encoders, as well as downstream regressors. Code is available at https://github.com/salesforce/CoST.

AAAI Conference 2022 Conference Paper

Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

  • Siow Meng Low
  • Akshat Kumar
  • Scott Sanner

Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations thus lowering sample complexity. Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art DRP planner and consistently produces better solution quality with lower variance. We additionally demonstrate that ILBO generalizes well to new problem instances (i.e., different initial states) without requiring retraining.

IJCAI Conference 2022 Conference Paper

Using Constraint Programming and Graph Representation Learning for Generating Interpretable Cloud Security Policies

  • Mikhail Kazdagli
  • Mohit Tiwari
  • Akshat Kumar

Modern software systems rely on mining insights from business sensitive data stored in public clouds. A data breach usually incurs significant (monetary) loss for a commercial organization. Conceptually, cloud security heavily relies on Identity Access Management (IAM) policies that IT admins need to properly configure and periodically update. Security negligence and human errors often lead to misconfiguring IAM policies, which may open a backdoor for attackers. To address these challenges, first, we develop a novel framework that encodes the generation of optimal IAM policies as a constraint programming (CP) problem. We identify reducing dormant permissions of cloud users as an optimality criterion, which intuitively implies minimizing unnecessary datastore access permissions. Second, to make IAM policies interpretable, we use graph representation learning applied to historical access patterns of users to augment our CP model with similarity constraints: similar users should be grouped together and share common IAM policies. Third, we describe multiple attack models and show that our optimized IAM policies significantly reduce the impact of security attacks, using real data from 8 commercial organizations as well as synthetic instances.

AAMAS Conference 2021 Conference Paper

Action Selection for Composable Modular Deep Reinforcement Learning

  • Vaibhav Gupta
  • Daksh Anand
  • Praveen Paruchuri
  • Akshat Kumar

In modular reinforcement learning (MRL), a complex decision making problem is decomposed into multiple simpler subproblems, each solved by a separate module. Often, these subproblems have conflicting goals and incomparable reward scales. A composable decision making architecture requires that even modules authored separately, with possibly misaligned reward scales, can be combined coherently. An arbitrator should consider the different modules’ action preferences to learn effective global action selection. We present a novel framework called GRACIAS that assigns fine-grained importance to the different modules based on their relevance in a given state, and enables composable decision making based on modern deep RL methods such as deep deterministic policy gradient (DDPG) and deep Q-learning. We provide insights into the convergence properties of GRACIAS and also show that previous MRL algorithms reduce to special cases of our framework. We experimentally demonstrate on several standard MRL domains that our approach works significantly better than the previous MRL methods, and is highly robust to incomparable reward scales. Our framework extends MRL to complex Atari games such as Qbert, and has a better learning curve than the conventional RL algorithms.

AAMAS Conference 2021 Conference Paper

Approximate Difference Rewards for Scalable Multiagent Reinforcement Learning

  • Arambam James Singh
  • Akshat Kumar
  • Hoong Chuin Lau

We address the problem of multiagent credit assignment in a large scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.
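
The aggregate-information idea can be illustrated with count-based symmetric agents: when the global reward depends only on how many agents occupy each zone, an agent's difference reward D_i = G(z) - G(z_{-i}) follows from decrementing its zone's count rather than re-simulating the whole system. The congestion penalty used for G below is an assumption of this sketch.

```python
# Toy illustration of count-based difference rewards under agent symmetry.
# G depends only on per-zone agent counts, so removing one agent is a count
# decrement. The congestion penalty is an illustrative choice.

def global_reward(counts, capacity=3):
    # Penalize congestion: each agent above a zone's capacity costs 1.
    return -sum(max(0, c - capacity) for c in counts.values())

def difference_reward(counts, zone, capacity=3):
    without = dict(counts)
    without[zone] -= 1          # remove the agent from its zone
    return global_reward(counts, capacity) - global_reward(without, capacity)

counts = {"A": 5, "B": 2}
print(difference_reward(counts, "A"), difference_reward(counts, "B"))  # -1 0
```

The agent in the congested zone receives a negative difference reward (its presence hurts the team), while the agent in the uncongested zone receives zero, which is exactly the credit-assignment signal DRs are meant to provide.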

ICAPS Conference 2021 Conference Paper

Integrating Knowledge Compilation with Reinforcement Learning for Routes

  • Jiajing Ling
  • Kushagra Chandak
  • Akshat Kumar

Sequential multiagent decision-making under partial observability and uncertainty poses several challenges. Although multiagent reinforcement learning (MARL) approaches have increased the scalability, addressing combinatorial domains is still challenging as random exploration by agents is unlikely to generate useful reward signals. We address cooperative multiagent pathfinding under uncertainty and partial observability where agents move from their respective sources to destinations while also satisfying constraints (e.g., visiting landmarks). Our main contributions include: (1) compiling domain knowledge such as underlying graph connectivity and domain constraints into propositional logic based decision diagrams, (2) developing modular techniques to integrate such knowledge with deep MARL algorithms, and (3) developing fast algorithms to query the compiled knowledge for accelerated episode simulation in RL. Empirically, our approach can tractably represent various types of domain constraints, and outperforms previous MARL approaches significantly both in terms of sample complexity and solution quality on a number of instances.

ICAPS Conference 2021 Conference Paper

Learning and Exploiting Shaped Reward Models for Large Scale Multiagent RL

  • Arambam James Singh
  • Akshat Kumar
  • Hoong Chuin Lau

Many real world systems involve interaction among a large number of agents to achieve a common goal, for example, air traffic control. Several model-free RL algorithms have been proposed for such settings. A key limitation is that the empirical reward signal in the model-free case is not very effective in addressing the multiagent credit assignment problem, which determines an agent's contribution to the team's success. This results in lower solution quality and high sample complexity. To address this, we contribute (a) an approach to learn a differentiable reward model for both continuous and discrete action settings by exploiting the collective nature of interactions among agents, a feature commonly present in large scale multiagent applications; (b) a shaped reward model analytically derived from the learned reward model to address the key challenge of credit assignment; (c) a model-based multiagent RL approach that integrates shaped rewards into well-known RL algorithms such as policy gradient and soft actor-critic. Compared to previous methods, our learned reward models are more accurate, and our approaches achieve better solution quality on synthetic and real world instances of air traffic control, and cooperative navigation with large agent populations.

AAMAS Conference 2021 Conference Paper

Ship-GAN: Generative Modeling Based Maritime Traffic Simulator

  • Chaithanya Basrur
  • Arambam James Singh
  • Arunesh Sinha
  • Akshat Kumar

Modeling vessel movement in a maritime environment is an extremely challenging task given the complex nature of vessel behavior. Several existing multiagent maritime decision making frameworks require access to an accurate traffic simulator. We develop a system using electronic navigation charts to generate realistic and high fidelity vessel traffic data using Generative Adversarial Networks (GANs). Our proposed Ship-GAN uses a conditional Wasserstein GAN to model a vessel’s behavior. The generator can simulate the travel time of vessels across different maritime zones conditioned on vessels’ speeds and traffic intensity. Furthermore, it can be used as an accurate simulator for prior decision making approaches for maritime traffic coordination, which used less accurate models than our approach. Experiments performed on historical data from the heavily trafficked Singapore Strait show that our Ship-GAN system generates data whose statistical distribution is close to the real data distribution, and a better fit than prior methods.

ICAPS Conference 2020 Conference Paper

Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty

  • Jiajing Ling
  • Tarun Gupta 0002
  • Akshat Kumar

We address the problem of multiple agents finding their paths from respective sources to destination nodes in a graph (also called MAPF). Most existing approaches assume that all agents move at a fixed speed, and that a single node accommodates only a single agent. Motivated by the emerging applications of autonomous vehicles such as drone traffic management, we present zone-based path finding (or ZBPF) where agents move among zones, and agents' movements incur uncertain travel times. Furthermore, each zone can accommodate multiple agents (as per its capacity). We also develop a simulator for ZBPF which provides a clean interface from the simulation environment to learning algorithms. We develop a novel formulation of the ZBPF problem using difference-of-convex functions (DC) programming. The resulting approach can be used for policy learning using samples from the simulator. We also present a multiagent credit assignment scheme that helps our learning approach converge faster. Empirical results on a number of 2D and 3D instances show that our approach can effectively minimize congestion in zones, while ensuring agents reach their final destinations.

IJCAI Conference 2019 Conference Paper

Decision Making for Improving Maritime Traffic Safety Using Constraint Programming

  • Saumya Bhatnagar
  • Akshat Kumar
  • Hoong Chuin Lau

Maritime navigational safety is of utmost importance to prevent vessel collisions in heavily trafficked ports, and avoid environmental costs. In case of a likely near miss among vessels, port traffic controllers provide assistance for safely navigating the waters, often at very short lead times. A better strategy is to prevent such situations from arising in the first place. To achieve this, we a) formalize the decision model for traffic hotspot mitigation including realistic maritime navigational features and constraints through consultations with domain experts; and b) develop a constraint programming based scheduling approach to mitigate hotspots. We model the problem as a variant of the resource constrained project scheduling problem to adjust vessel movement schedules such that the average delay is minimized and navigational safety constraints are also satisfied. We conduct a thorough evaluation on key performance indicators using real world data, and demonstrate the effectiveness of our approach in mitigating high-risk situations.

AAMAS Conference 2019 Conference Paper

Graph Based Optimization for Multiagent Cooperation

  • Arambam James Singh
  • Akshat Kumar

We address the problem of solving math programs defined over a graph where nodes represent agents and edges represent interaction among agents. The objective and constraint functions of this program model the task the agent team must perform and the domain constraints. In this multiagent setting, no single agent observes the complete objective and all the constraints of the program. Thus, we develop a distributed message-passing approach to solve this optimization problem. We focus on the class of graph structured linear and quadratic programs (LPs/QPs) which can model important multiagent coordination frameworks such as distributed constraint optimization (DCOP). For DCOPs, our framework models functional constraints among agents (e.g., resource, network flow constraints) in a much more tractable fashion than previous approaches. Our iterative approach has several desirable properties—it is guaranteed to find the optimal solution for LPs, converges for general cyclic graphs, is memory efficient (making it suitable for resource-limited agents), and has an anytime property. Empirically, our approach provides solid results on several standard benchmark problems when compared against previous approaches.

IJCAI Conference 2019 Conference Paper

Multiagent Decision Making and Learning in Urban Environments

  • Akshat Kumar

Our increasingly interconnected urban environments provide several opportunities to deploy intelligent agents---from self-driving cars and ships to aerial drones---that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges---ability to scale to thousands of agents, addressing uncertainty, and partial observability in the environment. In addition, accurate domain models need to be learned from data that is often noisy and available only at an aggregate level. In this paper, I will overview some of our recent contributions towards developing planning and reinforcement learning strategies to address several such challenges present in large-scale urban multiagent systems.

AAAI Conference 2019 Conference Paper

Multiagent Decision Making For Maritime Traffic Management

  • Arambam James Singh
  • Duc Thien Nguyen
  • Akshat Kumar
  • Hoong Chuin Lau

We address the problem of maritime traffic management in busy waterways to increase the safety of navigation by reducing congestion. We model maritime traffic as a large multiagent system with individual vessels as agents, and the VTS authority as the regulatory agent. We develop a maritime traffic simulator based on historical traffic data that incorporates realistic domain constraints such as uncertain and asynchronous movement of vessels. We also develop a traffic coordination approach that provides speed recommendations to vessels in different zones. We exploit the nature of collective interactions among agents to develop a scalable policy gradient approach that can scale up to real world problems. Empirical results on synthetic and real world problems show that our approach can significantly reduce congestion while keeping the traffic throughput high.

ICAPS Conference 2019 Conference Paper

Resource Constrained Deep Reinforcement Learning

  • Abhinav Bhatia
  • Pradeep Varakantham
  • Akshat Kumar

In urban environments, resources have to be constantly matched to the “right” locations where customer demand is present. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in ERS (Emergency Response Systems); vehicles (cars, bikes, among others) have to be matched to docking stations to reduce lost demand in shared mobility systems. Such problems are challenging owing to the demand uncertainty, combinatorial action spaces and constraints on allocation of resources (e.g., total resources, minimum and maximum number of resources at locations and regions). Existing systems typically employ myopic and greedy optimization approaches to optimize resource allocation. Such approaches are typically unable to handle surges or variances in demand patterns well. Recent work has demonstrated the ability of Deep RL methods in adapting well to highly uncertain environments. However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on allocation of resources. To that end, we have developed three approaches on top of the well known actor-critic approach, DDPG (Deep Deterministic Policy Gradient), that are able to handle constraints on resource allocation. We also demonstrate that they are able to outperform leading approaches on simulators validated on semi-real and real data sets.

AAAI Conference 2019 Conference Paper

Successor Features Based Multi-Agent RL for Event-Based Decentralized MDPs

  • Tarun Gupta
  • Akshat Kumar
  • Praveen Paruchuri

Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision making under uncertainty. However, their computational complexity limits their practical impact. To address this, we focus on a class of Dec-MDPs consisting of independent collaborating agents that are tied together through a global reward function that depends upon their entire histories of states and actions to accomplish joint tasks. To overcome the scalability barrier, our main contributions are: (a) We propose a new actor-critic based Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SF), which is a value function representation that decouples the dynamics of the environment from the rewards; (b) We then present Dec-ESR (Decentralized Event based Successor Representation), which generalizes learning for event-based Dec-MDPs using SF within an end-to-end deep RL framework; (c) We also show that Dec-ESR allows useful transfer of information on related but different tasks, and hence bootstraps the learning for faster convergence on new tasks; (d) For validation purposes, we test our approach on a large multi-agent coverage problem which models schedule coordination of agents in a real urban subway network and achieves better quality solutions than previous best approaches.

NeurIPS Conference 2018 Conference Paper

Credit Assignment For Collective Multiagent RL With Global Rewards

  • Duc Thien Nguyen
  • Akshat Kumar
  • Hoong Chuin Lau

Scaling decision theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. We focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions are dependent on their "collective influence" on each other, rather than their identities. Unlike previous work, we address a general setting where system reward is not decomposable among agents. We develop collective actor-critic RL approaches for this setting, and address the problem of multiagent credit assignment, and computing low variance policy gradient estimates that result in faster convergence to high quality solutions. We also develop difference rewards based credit assignment methods for the collective setting. Empirically our new approaches provide significantly better solutions than previous methods in the presence of global rewards on two real world problems modeling taxi fleet optimization and multiagent patrolling, and a synthetic grid navigation domain.

AAAI Conference 2018 Conference Paper

Integrated Cooperation and Competition in Multi-Agent Decision-Making

  • Kyle Wray
  • Akshat Kumar
  • Shlomo Zilberstein

Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model—cooperative-competitive process (CCP)—that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots.

AAAI Conference 2018 Conference Paper

Planning and Learning for Decentralized MDPs With Event Driven Rewards

  • Tarun Gupta
  • Akshat Kumar
  • Praveen Paruchuri

Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint-rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks such that the team reward is given only when the joint-task is completed. Algorithmically, we contribute: 1) A nonlinear programming (NLP) formulation for such an event-based planning model; 2) A probabilistic inference based approach that scales much better than NLP solvers for a large number of agents; 3) A policy gradient based multiagent reinforcement learning approach that scales well even for exponential state spaces. Our inference and RL-based advances enable us to solve a large real-world multiagent coverage problem modeling schedule coordination of agents in a real urban subway network where other approaches fail to scale.

AAAI Conference 2018 Conference Paper

Resource-Constrained Scheduling for Maritime Traffic Management

  • Lucas Agussurja
  • Akshat Kumar
  • Hoong Chuin Lau

We address the problem of mitigating congestion and preventing hotspots in busy water areas such as the Singapore Straits and port waters. Increasing maritime traffic coupled with narrow waterways makes vessel schedule coordination for just-in-time arrival critical for navigational safety. Our contributions are: 1) We formulate the maritime traffic management problem based on the real case study of Singapore waters; 2) We model the problem as a variant of the resource-constrained project scheduling problem (RCPSP), and develop mixed-integer and constraint programming (MIP/CP) formulations; 3) To improve scalability, we develop a combinatorial Benders (CB) approach that is significantly more effective than standard MIP and CP formulations. We also develop symmetry breaking constraints and optimality cuts that further enhance the CB approach’s effectiveness; 4) We develop a realistic maritime traffic simulator using electronic navigation charts of the Singapore Straits. Our scheduling approach on synthetic problems and a real 55-day AIS dataset results in significant reduction of the traffic density while incurring minimal delays.

TIST Journal 2018 Journal Article

Risk-Sensitive Stochastic Orienteering Problems for Trip Optimization in Urban Environments

  • Pradeep Varakantham
  • Akshat Kumar
  • Hoong Chuin Lau
  • William Yeoh

Orienteering Problems (OPs) are used to model many routing and trip planning problems. OPs are a variant of the well-known traveling salesman problem where the goal is to compute the highest reward path that includes a subset of vertices and has an overall travel time less than a specified deadline. However, the applicability of OPs is limited due to the assumption of deterministic and static travel times. To that end, Campbell et al. (2011) extended OPs to Stochastic OPs (SOPs) to represent uncertain travel times. In this article, we make the following key contributions: (1) We extend SOPs to Dynamic SOPs (DSOPs), which allow for time-dependent travel times; (2) we introduce a new objective criterion for SOPs and DSOPs to represent a percentile measure of risk; (3) we provide non-linear optimization formulations along with their linear equivalents for solving the risk-sensitive SOPs and DSOPs; (4) we provide a local search mechanism for solving the risk-sensitive SOPs and DSOPs; and (5) we provide results on existing benchmark problems and a real-world theme park trip planning problem.

AAAI Conference 2017 Conference Paper

Collective Multiagent Sequential Decision Making Under Uncertainty

  • Duc Thien Nguyen
  • Akshat Kumar
  • Hoong Chuin Lau

Multiagent sequential decision making has seen rapid progress with formal models such as decentralized MDPs and POMDPs. However, scalability to large multiagent systems and applicability to real world problems remain limited. To address these challenges, we study multiagent planning problems where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our work exploits recent advances in graphical models for modeling and inference with a population of individuals such as collective graphical models and the notion of finite partial exchangeability in lifted inference. We develop a collective decentralized MDP model where policies can be computed based on counts of agents in different states. As the policy search space over counts is combinatorial, we develop a sampling based framework that can compute open and closed loop policies. Comparisons with previous best approaches on synthetic instances and a real world taxi dataset modeling supply-demand matching show that our approach significantly outperforms them w.r.t. solution quality.

AAMAS Conference 2017 Conference Paper

Coordinating Vessel Traffic to Improve Safety and Efficiency

  • Teck-Hou Teng
  • Hoong Chuin Lau
  • Akshat Kumar

Global increase in trade leads to congestion of maritime traffic at the ports. This often leads to increased maritime incidents or near-miss situations. To improve maritime safety while maintaining efficiency, the movement of vessels needs to be better coordinated. Our work formulates this problem of coordinating the paths of vessels as a multi-agent path-finding (MAPF) problem. To address this problem, we introduce an innovative application of MAPF in the maritime domain known as the Vessel Coordination Module (VCM). Based on the local search paradigm, VCM plans on a joint state space updated using the Electronic Navigation Charts (ENC) and the paths of vessels. We introduce the notion of path quality that measures the number of positions on a vessel path that are too close to some other vessels spatially and temporally. VCM aims to improve the overall path quality of vessels by improving the path quality of selected vessels. Experiments are conducted on the Singapore Straits to evaluate and compare the performance of our proposed approach in heterogeneous maritime scenarios. Our experiment results show that VCM can improve the overall path quality of the vessels.

AAAI Conference 2017 Conference Paper

Decentralized Planning in Stochastic Environments with Submodular Rewards

  • Rajiv Kumar
  • Pradeep Varakantham
  • Akshat Kumar

Decentralized Markov Decision Process (Dec-MDP) provides a rich framework to represent cooperative decentralized and stochastic planning problems under transition uncertainty. However, solving a Dec-MDP to generate coordinated yet decentralized policies is NEXP-Hard. Researchers have made significant progress in providing approximate approaches to improve scalability with respect to the number of agents. However, there has been little or no research devoted to finding guarantees on solution quality for approximate approaches considering multiple (more than two) agents. We have a similar situation with respect to the competitive decentralized planning problem and the Stochastic Game (SG) model. To address this, we identify models in the cooperative and competitive case that rely on submodular rewards, where we show that existing approximate approaches can provide strong quality guarantees (a priori, and for the cooperative case also a posteriori guarantees). We then provide solution approaches and demonstrate improved online guarantees on benchmark problems from the literature for the cooperative case.

AAMAS Conference 2017 Conference Paper

Multiagent Coordination Using Graph Structured Mathematical Optimization

  • Arambam James Singh
  • Akshat Kumar

We address the problem of solving mathematical programs defined over a graph where nodes represent agents and edges represent interaction among agents. We focus on the class of graph structured linear and quadratic programs (LPs/QPs) which can model important multiagent coordination frameworks such as distributed constraint optimization (DCOP). For DCOPs, our framework provides a key benefit of modelling functional constraints among agents (e.g., resource, network flow constraints) in a much more tractable fashion. Our framework is also more general than previous work on solving graph-based LPs/QPs as it can model a richer class of objective functions and constraints than previous work. Our iterative approach has several desirable properties—it is guaranteed to converge to the optimal solution for LPs, works for general cyclic graphs, is memory efficient (making it suitable for resource-limited agents), and has an anytime property. Empirically, our approach provides solid results on several standard benchmark problems when compared against previous approaches.

NeurIPS Conference 2017 Conference Paper

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

  • Duc Thien Nguyen
  • Akshat Kumar
  • Hoong Chuin Lau

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDec-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real world taxi fleet optimization problem show that our new AC approach provides better quality solutions than previous best approaches.

AAAI Conference 2017 Conference Paper

Robust Optimization for Tree-Structured Stochastic Network Design

  • Xiaojian Wu
  • Akshat Kumar
  • Daniel Sheldon
  • Shlomo Zilberstein

Stochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability including spatial conservation planning, pre-disaster network preparation, and river network optimization. A common assumption in previous work is that network parameters (e.g., probability of species colonization) are precisely known, which is unrealistic in real-world settings. We therefore address the robust river network design problem where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but are within some interval bounds. We then develop a planning approach that computes the policies with either high robust ratio or low regret. Empirically, our approach scales well to large river networks. We also provide insights into the solutions generated by our robust approach, which has significantly higher robust ratio than the baseline solution with mean parameter estimates.

ICAPS Conference 2016 Conference Paper

Dual Formulations for Optimizing Dec-POMDP Controllers

  • Akshat Kumar
  • Hala Mostafa
  • Shlomo Zilberstein

Decentralized POMDP is an expressive model for multi-agent planning. Finite-state controllers (FSCs)---often used to represent policies for infinite-horizon problems---offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions.

AAAI Conference 2016 Conference Paper

Robust Decision Making for Stochastic Network Design

  • Akshat Kumar
  • Arambam Singh
  • Pradeep Varakantham
  • Daniel Sheldon

We address the problem of robust decision making for stochastic network design. Our work is motivated by spatial conservation planning where the goal is to take management decisions within a fixed budget to maximize the expected spread of a population of species over a network of land parcels. Most previous work for this problem assumes that accurate estimates of different network parameters (edge activation probabilities, habitat suitability scores) are available, which is an unrealistic assumption. To address this shortcoming, we assume that network parameters are only partially known, specified via interval bounds. We then develop a decision making approach that computes the solution with minimax regret. We provide new theoretical results regarding the structure of the minimax regret solution which help develop a computationally efficient approach. Empirically, we show that previous approaches that work on point estimates of network parameters result in high regret on several standard benchmarks, while our approach provides significantly more robust solutions.

AAMAS Conference 2016 Conference Paper

Robust Influence Maximization (Extended Abstract)

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Akshat Kumar

Influence Maximization [2] is the problem of finding a fixed-size set of nodes, which will maximize the expected number of influenced nodes in a social network. The number of influenced nodes is dependent on the influence strength of edges, which can be very noisy. The noise in the influence strengths can be modeled using a random noise or adversarial noise model. It has been shown that all random processes that independently affect edges of the graph can be absorbed into the activation probabilities themselves, and hence random noise can be captured within the independent cascade model. On the other hand, similar to He et al. [1], we consider adversarial noise where the influence strength for an edge (u, v) can belong to any point in the interval [p̌_{u,v}, p̂_{u,v}], with the exact values chosen by an adversary from this interval. The problems of evaluating robustness of a given solution and computing robust optimal solutions have received scant attention in the literature and are of key interest in this paper. Specifically, we aim to minimize (over all available seed sets) the maximum (over all instantiations of influence strengths) regret. Concretely, the key contributions are: (1) We show that maximum regret for a given solution is attained when the influence strength on each of the edges is set to one of the extreme values of the influence strength intervals on edges. (2) We provide a novel way of considering samples that accounts for the noise in influence strength on all edges. (3) We develop a framework which provides an approach to get an optimal regret solution and, more importantly, a metric to evaluate robustness of a given solution based on the regret-optimal solution. (4) Finally, we show results on evaluating the robustness of the well-known greedy approach. Surprisingly, even without considering noise in influence strengths explicitly, the greedy approach achieves highly robust solutions on small-to-medium scale social network instances.

AAAI Conference 2016 Conference Paper

Shortest Path Based Decision Making Using Probabilistic Inference

  • Akshat Kumar

We present a new perspective on the classical shortest path routing (SPR) problem in graphs. We show that the SPR problem can be recast to that of probabilistic inference in a mixture of simple Bayesian networks. Maximizing the likelihood in this mixture becomes equivalent to solving the SPR problem. We develop the well known Expectation-Maximization (EM) algorithm for the SPR problem that maximizes the likelihood, and show that it does not get stuck in a locally optimal solution. Using the same probabilistic framework, we then address an NP-Hard network design problem where the goal is to repair a network of roads post some disaster within a fixed budget such that the connectivity between a set of nodes is optimized. We show that our likelihood maximization approach using the EM algorithm scales well for this problem taking the form of message-passing among nodes of the graph, and provides significantly better quality solutions than a standard mixed-integer programming solver.

ICAPS Conference 2015 Conference Paper

History-Based Controller Design and Optimization for Partially Observable MDPs

  • Akshat Kumar
  • Shlomo Zilberstein

Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and an adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to the FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches.

ICML Conference 2015 Conference Paper

Message Passing for Collective Graphical Models

  • Tao Sun 0008
  • Daniel Sheldon
  • Akshat Kumar

Collective graphical models (CGMs) are a formalism for inference and learning about a population of independent and identically distributed individuals when only noisy aggregate data are available. We highlight a close connection between approximate MAP inference in CGMs and marginal inference in standard graphical models. The connection leads us to derive a novel Belief Propagation (BP) style algorithm for collective graphical models. Mathematically, the algorithm is a strict generalization of BP—it can be viewed as an extension to minimize the Bethe free energy plus additional energy terms that are non-linear functions of the marginals. For CGMs, the algorithm is much more efficient than previous approaches to inference. We demonstrate its performance on two synthetic experiments concerning bird migration and collective human mobility.

IJCAI Conference 2015 Conference Paper

Probabilistic Inference Based Message-Passing for Resource Constrained DCOPs

  • Supriyo Ghosh
  • Akshat Kumar
  • Pradeep Varakantham

Distributed constraint optimization (DCOP) is an important framework for coordinated multiagent decision making. We address a practically useful variant of DCOP, called resource-constrained DCOP (RC-DCOP), which takes into account agents’ consumption of shared limited resources. We present a promising new class of algorithm for RC-DCOPs by translating the underlying coordination problem to probabilistic inference. Using inference techniques such as expectation-maximization and convex optimization machinery, we develop a novel convergent message-passing algorithm for RC-DCOPs. Experiments on standard benchmarks show that our approach provides better quality than previous best DCOP algorithms and has much lower failure rate. Comparisons against an efficient centralized solver show that our approach provides near-optimal solutions, and is significantly faster on larger instances.

JAIR Journal 2015 Journal Article

Probabilistic Inference Techniques for Scalable Multiagent Decision Making

  • Akshat Kumar
  • Shlomo Zilberstein
  • Marc Toussaint

Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models---NEXP-Complete even for two agents---has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.

ICAPS Conference 2014 Conference Paper

Near-Optimal Nonmyopic Contact Center Planning Using Dual Decomposition

  • Akshat Kumar
  • Sudhanshu Shekhar Singh
  • Pranav Gupta
  • Gyana R. Parija

We address the problem of minimizing staffing cost in a contact center subject to service level requirements over multiple weeks. We handle both the capacity planning and agent schedule generation aspects of this problem. Our work incorporates two unique business requirements. First, we develop techniques that can provide near-optimal staffing for 24/7 contact centers over the long term, up to eight weeks, rather than planning myopically on a week-on-week basis. Second, our approach is usable in an online interactive setting in which staffing managers using our system expect high quality plans within a short time period. Results on large real world and synthetic instances show that our Lagrangian relaxation based technique can achieve a solution within 94% of optimal on average for eight-week problems within ten minutes, whereas a generic integer programming solver can only achieve a solution within 80% of optimal. Our approach is also deployed in a live business environment and reduces headcount by a decile over techniques used previously by our client business units.

ICML Conference 2013 Conference Paper

Approximate Inference in Collective Graphical Models

  • Daniel Sheldon
  • Tao Sun 0008
  • Akshat Kumar
  • Thomas G. Dietterich

We study the problem of approximate inference in collective graphical models (CGMs), which were recently introduced to model the problem of learning and inference with noisy aggregate observations. We first analyze the complexity of inference in CGMs: unlike inference in conventional graphical models, exact inference in CGMs is NP-hard even for tree-structured models. We then develop a tractable convex approximation to the NP-hard MAP inference problem in CGMs, and show how to use MAP inference for approximate marginal inference within the EM framework. We demonstrate empirically that these approximation techniques can reduce the computational cost of inference by two orders of magnitude and the cost of learning by at least an order of magnitude while providing solutions of equal or better quality.

IJCAI Conference 2013 Conference Paper

Automated Generation of Interaction Graphs for Value-Factored Decentralized POMDPs

  • William Yeoh
  • Akshat Kumar
  • Shlomo Zilberstein

The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity: solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application.

UAI Conference 2013 Conference Paper

Collective Diffusion Over Networks: Models and Inference

  • Akshat Kumar
  • Daniel Sheldon
  • Biplav Srivastava

Diffusion processes in networks are increasingly used to model the spread of information and social influence. In several applications in computational sustainability, such as the spread of wildlife, infectious diseases, and traffic mobility patterns, the observed data often consist of only aggregate information. In this work, we present new models that generalize standard diffusion processes to such collective settings. We also present optimization based techniques that can accurately learn the underlying dynamics of the given contagion process, including the hidden network structure, by only observing the time a node becomes active and the associated aggregate information. Empirically, our technique is highly robust and accurately learns network structure with more than 90% recall and precision. Results on real-world flu spread data in the US confirm that our technique can also accurately model infectious disease spread.
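To make the aggregate-observation setting in this abstract concrete, here is a minimal, self-invented sketch (not the paper's model): an independent-cascade diffusion on a small hypothetical graph where only the count of newly active nodes per step is recorded, not their identities.

```python
import random

# Toy independent-cascade diffusion on a hypothetical 4-node directed graph.
# Only aggregate counts of newly active nodes per step are observed,
# mirroring the collective-observation setting described in the abstract.
edges = {0: [1, 2], 1: [3], 2: [3], 3: []}
p_activate = 0.9          # illustrative transmission probability

def simulate(seed_node, rng):
    active = {seed_node}
    frontier = [seed_node]
    counts = [1]          # aggregate observation: 1 node active at step 0
    while frontier:
        new = []
        for u in frontier:
            for v in edges[u]:
                if v not in active and rng.random() < p_activate:
                    active.add(v)
                    new.append(v)
        if new:
            counts.append(len(new))
        frontier = new
    return counts

rng = random.Random(0)
counts = simulate(0, rng)
assert counts[0] == 1 and sum(counts) <= len(edges)
```

A learner in the collective setting would see only a list like `counts`, never which nodes fired, and would have to recover `edges` and `p_activate` from many such traces.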

IJCAI Conference 2013 Conference Paper

Parameter Learning for Latent Network Diffusion

  • Xiaojian Wu
  • Akshat Kumar
  • Daniel Sheldon
  • Shlomo Zilberstein

Diffusion processes in networks are increasingly used to model dynamic phenomena such as the spread of information, wildlife, or social influence. Our work addresses the problem of learning the underlying parameters that govern such a diffusion process by observing the time at which nodes become active. A key advantage of our approach is that, unlike previous work, it can tolerate missing observations for some nodes in the diffusion process. Having incomplete observations is characteristic of offline networks used to model the spread of wildlife. We develop an EM algorithm to address parameter learning in such settings. Since both the E and M steps are computationally challenging, we employ a number of optimization methods such as nonlinear and difference-of-convex programming to address these challenges. Evaluation of the approach on the Red-cockaded Woodpecker conservation problem shows that it is highly robust and accurately learns parameters in various settings, even with more than 80% missing data.

AAAI Conference 2012 Conference Paper

Lagrangian Relaxation Techniques for Scalable Spatial Conservation Planning

  • Akshat Kumar
  • Xiaojian Wu
  • Shlomo Zilberstein

We address the problem of spatial conservation planning in which the goal is to maximize the expected spread of cascades of an endangered species by strategically purchasing land parcels within a given budget. This problem can be solved by standard integer programming methods using the sample average approximation (SAA) scheme. Our main contribution lies in exploiting the separable structure present in this problem and using Lagrangian relaxation techniques to gain scalability over the flat representation. We also generalize the approach to allow the application of the SAA scheme to a range of stochastic optimization problems. Our iterative approach is highly efficient in terms of space requirements and it provides an upper bound over the optimal solution at each iteration. We apply our approach to the Red-cockaded Woodpecker conservation problem. The results show that it can find the optimal solution significantly faster—sometimes by an order-of-magnitude—than using the flat representation for a range of budget sizes.
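The general pattern behind such Lagrangian relaxation bounds can be sketched on a toy problem (a hypothetical 0/1 knapsack, not the conservation model itself): dualize the coupling budget constraint so the relaxed problem separates per item, and update the multiplier by subgradient steps; every dual evaluation upper-bounds the integer optimum.

```python
# Lagrangian relaxation of a toy 0/1 knapsack: maximize sum(v*x) s.t. sum(c*x) <= B.
# Dualizing the budget constraint makes the relaxed problem separable per item,
# and every dual value L(lam) upper-bounds the integer optimum.
values = [10, 7, 5, 3]
costs  = [4, 3, 2, 1]
B = 6

def dual(lam):
    # Each item is chosen independently: pick it iff its reduced value is positive.
    x = [1 if v - lam * c > 0 else 0 for v, c in zip(values, costs)]
    L = (sum(v * xi for v, xi in zip(values, x))
         + lam * (B - sum(c * xi for c, xi in zip(costs, x))))
    return L, x

lam, step = 0.0, 0.5
best_bound = float("inf")
for t in range(100):
    L, x = dual(lam)
    best_bound = min(best_bound, L)
    g = B - sum(c * xi for c, xi in zip(costs, x))   # subgradient of L at lam
    lam = max(0.0, lam - step / (t + 1) * g)         # diminishing step size

# The integer optimum here is 15; the best dual value found (15.5, the LP bound)
# is a valid upper bound available at every iteration.
```

This mirrors the structure the abstract describes: the relaxed problem decomposes, each iteration is cheap, and the dual values give certificates of solution quality without solving the flat integer program.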

AAMAS Conference 2011 Conference Paper

Message-Passing Algorithms for Large Structured Decentralized POMDPs

  • Akshat Kumar
  • Shlomo Zilberstein

Decentralized POMDPs provide a rigorous framework for multi-agent decision-theoretic planning. However, their high complexity has limited scalability. In this work, we present a promising new class of algorithms based on probabilistic inference for infinite-horizon ND-POMDPs, a restricted Dec-POMDP model. We first transform the policy optimization problem to that of likelihood maximization in a mixture of dynamic Bayes nets (DBNs). We then develop the Expectation-Maximization (EM) algorithm for maximizing the likelihood in this representation. The EM algorithm for ND-POMDPs lends itself naturally to a simple message-passing paradigm guided by the agent interaction graph. It is thus highly scalable w.r.t. the number of agents, can be easily parallelized, and produces good quality solutions.

UAI Conference 2011 Conference Paper

Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation

  • Akshat Kumar
  • Shlomo Zilberstein

Computing maximum a posteriori (MAP) estimation in graphical models is an important inference problem with many applications. We present message-passing algorithms for quadratic programming (QP) formulations of MAP estimation for pairwise Markov random fields. In particular, we use the concave-convex procedure (CCCP) to obtain a locally optimal algorithm for the non-convex QP formulation. A similar technique is used to derive a globally convergent algorithm for the convex QP relaxation of MAP. We also show that a recently developed expectation-maximization (EM) algorithm for the QP formulation of MAP can be derived from the CCCP perspective. Experiments on synthetic and real-world problems confirm that our new approach is competitive with max-product and its variations. Compared with CPLEX, we achieve more than an order-of-magnitude speedup in solving optimally the convex QP relaxation.
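For illustration only, here is the CCCP template the abstract builds on, applied to a toy one-dimensional objective rather than the paper's QP: split the objective into a convex part plus a concave part, then repeatedly minimize the convex part plus a linearization of the concave part, which decreases the objective monotonically.

```python
# Illustrative CCCP iteration on a toy objective, not the paper's MAP QP.
# f(x) = x**4 - 3*x**2 decomposed as u(x) = x**4 (convex) + v(x) = -3*x**2 (concave).
# CCCP update: x_{t+1} = argmin_x u(x) + v'(x_t) * x.
# For u(x) = x**4 the argmin has the closed form x = (-v'(x_t) / 4) ** (1/3).

def f(x):
    return x**4 - 3 * x**2

def cccp_step(x):
    slope = -6 * x                       # v'(x) for v(x) = -3*x**2
    target = -slope / 4.0                # solve 4*x**3 + slope = 0
    return (abs(target) ** (1.0 / 3.0)) * (1 if target >= 0 else -1)

x = 0.5
for _ in range(50):
    x_next = cccp_step(x)
    assert f(x_next) <= f(x) + 1e-12     # CCCP monotonically decreases f
    x = x_next

# x converges to a stationary point of f: x = sqrt(1.5) ≈ 1.2247
```

In the paper's setting the convex surrogate is a QP solved by message passing rather than a scalar closed form, but the monotone-decrease guarantee is the same mechanism shown here.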

IJCAI Conference 2011 Conference Paper

Scalable Multiagent Planning Using Probabilistic Inference

  • Akshat Kumar
  • Shlomo Zilberstein
  • Marc Toussaint

Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models (NEXP-Complete even for two agents) has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability.

UAI Conference 2010 Conference Paper

Anytime Planning for Decentralized POMDPs using Expectation Maximization

  • Akshat Kumar
  • Shlomo Zilberstein

Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon Dec-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving Dec-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.

NeurIPS Conference 2010 Conference Paper

MAP Estimation for Graphical Models by Likelihood Maximization

  • Akshat Kumar
  • Shlomo Zilberstein

Computing a maximum a posteriori (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a finite mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation. We experiment on the real-world protein design dataset and show that EM's convergence rate is significantly higher than the previous LP relaxation based approach MPLP. EM achieves a solution quality within 95% of optimal for most instances and is often an order-of-magnitude faster than MPLP.
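As a point of reference for the "graph-based message passing paradigm similar to max-product" mentioned in the abstract, here is plain max-product (Viterbi-style) MAP inference on a tiny chain MRF with made-up potentials; this shows the baseline style of computation, not the paper's EM algorithm.

```python
# Max-product on a 3-node binary chain MRF, verified against brute force.
# Potentials are hypothetical: unary[i][x] and pairwise[e][x][y].
import itertools

unary = [[1.0, 2.0], [1.5, 0.5], [0.2, 3.0]]
pairwise = [[[2.0, 0.5], [0.5, 2.0]],   # edge (0, 1): prefers agreement
            [[2.0, 0.5], [0.5, 2.0]]]   # edge (1, 2)

# Forward pass: m[i][x] = best score of any prefix ending with node i in state x.
m = [unary[0][:]]
back = []
for i in (1, 2):
    scores, ptr = [], []
    for x in (0, 1):
        cand = [m[-1][y] * pairwise[i - 1][y][x] * unary[i][x] for y in (0, 1)]
        scores.append(max(cand))
        ptr.append(cand.index(max(cand)))
    m.append(scores)
    back.append(ptr)

# Backtrack the MAP assignment from the last node.
x2 = m[2].index(max(m[2]))
x1 = back[1][x2]
x0 = back[0][x1]
map_assignment = [x0, x1, x2]

# Brute force over all 8 assignments agrees with the message-passing result.
def score(a):
    s = unary[0][a[0]] * unary[1][a[1]] * unary[2][a[2]]
    return s * pairwise[0][a[0]][a[1]] * pairwise[1][a[1]][a[2]]

best = max(itertools.product((0, 1), repeat=3), key=score)
assert list(best) == map_assignment
```

The EM updates described in the abstract run over the same graph structure and message schedule, but propagate expected sufficient statistics instead of max-marginals.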

AAMAS Conference 2010 Conference Paper

Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms

  • Akshat Kumar
  • Shlomo Zilberstein

Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.

IJCAI Conference 2009 Conference Paper

  • Akshat Kumar
  • Shlomo Zilberstein

Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems.

AAMAS Conference 2009 Conference Paper

Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions

  • Akshat Kumar
  • Shlomo Zilberstein

Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Networked Distributed POMDP). Specifically, CBDP provides orders of magnitude speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings.

AAMAS Conference 2009 Conference Paper

Distributed Constraint Optimization with Structured Resource Constraints

  • Akshat Kumar
  • Boi Faltings
  • Adrian Petcu

Distributed constraint optimization (DCOP) provides a framework for coordinated decision making by a team of agents. Often, during the decision making, capacity constraints on agents' resource consumption must be taken into account. To address such scenarios, an extension of DCOP, Resource Constrained DCOP, has been proposed. However, certain types of resources have an additional structure associated with them, and exploiting it can result in more efficient algorithms than are possible with a general framework. An example is distribution networks, where the flow of a commodity from sources to sinks is limited by the flow capacity of edges. We present a new model of structured resource constraints that exploits the acyclicity and the flow conservation property of distribution networks. We show how this model can be used in efficient algorithms for finding the optimal flow configuration in distribution networks, an essential problem in managing power distribution networks. Experiments demonstrate the efficiency and scalability of our approach on publicly available benchmarks and compare favorably against a specialized solver for this task. Our results extend significantly the effectiveness of distributed constraint optimization for practical multi-agent settings.

AAAI Conference 2008 Conference Paper

H-DPOP: Using Hard Constraints for Search Space Pruning in DCOP

  • Akshat Kumar

In distributed constraint optimization problems, dynamic programming methods have been recently proposed (e. g. DPOP). In dynamic programming many valuations are grouped together in fewer messages, which produce much less networking overhead than search. Nevertheless, these messages are exponential in size. The basic DPOP always communicates all possible assignments, even when some of them may be inconsistent due to hard constraints. Many real problems contain hard constraints that significantly reduce the space of feasible assignments. This paper introduces H- DPOP, a hybrid algorithm that is based on DPOP, which uses Constraint Decision Diagrams (CDD) to rule out infeasible assignments, and thus compactly represent UTIL messages. Experimental results show that H-DPOP requires several orders of magnitude less memory than DPOP, especially for dense and tightly-constrained problems.