Author name cluster

Frits de Nijs

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

AAMAS Conference 2025 Conference Paper

Using Assistance Rewards Without Introducing Bias: Overcoming Sparse Rewards in Multi-Agent Reinforcement Learning

Yue Yang
Bernd Meyer
Frits de Nijs

Reinforcement learning agents may fail to learn good policies when their reward function is too sparse. Auxiliary reward shaping functions can help guide exploration towards the true rewards, but risk producing sub-optimal policies as agents now target a modified objective function. Our paper addresses this challenge by introducing a general framework for incorporating auxiliary reward functions without introducing a bias in the true objective. Agents train an ensemble of reward-function-specific policies, sharing experiences collected with one policy with all other policies in the ensemble. A top-level control policy then learns to choose the best policy to maximize the true objective. We show that this scheme does not affect the convergence properties of the underlying reinforcement learning algorithm, while avoiding potential biasing of the agent’s objective. We also adapt our proposed algorithm using off-policy PPO with MA-Trace correction for state value estimation. To our knowledge, this is the first work to adapt off-policy PPO to a multi-agent setting. We also demonstrate that our approach operates effectively with various assistance reward designs, removing the need for detailed reward function crafting or fine-tuning.


JAIR Journal 2021 Journal Article

Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms

Frits de Nijs
Erwin Walraven
Mathijs M. de Weerdt
Matthijs T. J. Spaan

In domains such as electric vehicle charging, smart distribution grids and autonomous warehouses, multiple agents share the same resources. When planning the use of these resources, agents need to deal with the uncertainty in these domains. Although several models and algorithms for such constrained multiagent planning problems under uncertainty have been proposed in the literature, it remains unclear when which algorithm can be applied. In this survey, we conceptualize these domains and establish a generic problem class based on Markov decision processes. We identify and compare the conditions under which algorithms from the planning literature for problems in this class can be applied: whether constraints are soft or hard, whether agents are continuously connected, whether the domain is fully observable, whether a constraint is momentary (instantaneous) or on a budget, and whether the constraint is on a single resource or on multiple resources. Further, we discuss the advantages and disadvantages of these algorithms. We conclude by identifying open problems that are directly related to the conceptualized domains, as well as in adjacent research areas.


AAAI Conference 2021 Short Paper

Evaluating Meta-Reinforcement Learning through a HVAC Control Benchmark (Student Abstract)

Yashvir S. Grewal
Frits de Nijs
Sarah Goodwin

Meta-reinforcement learning (meta-RL) algorithms promise to leverage prior task experience to quickly learn new unseen tasks. Unfortunately, evaluating meta-RL algorithms is complicated by a lack of suitable benchmarks. In this paper we propose adapting a challenging real-world heating, ventilation and air-conditioning (HVAC) control benchmark for meta-RL. Unlike existing benchmark problems, HVAC control has a broader task distribution, and sources of exogenous stochasticity from price and weather predictions which can be shared across task definitions. This can enable greater differentiation between the performance of current meta-RL approaches, and open the way for future research into algorithms that can adapt to entirely new tasks not sampled from the current task distribution.


AAMAS Conference 2018 Conference Paper

Capacity-aware Sequential Recommendations

Frits de Nijs
Georgios Theocharous
Nikos Vlassis
Mathijs M. de Weerdt
Matthijs T. J. Spaan

Personalized recommendations are increasingly important to engage users and guide them through large systems, for example when recommending points of interest to tourists visiting a popular city. To maximize long-term user experience, the system should consider issuing recommendations sequentially, since by observing the user’s response to a recommendation, the system can update its estimate of the user’s (latent) interests. However, as traditional recommender systems target individuals, their effect on a collective of users can unintentionally overload capacity. Therefore, recommender systems should not only consider the users’ interests, but also the effect of recommendations on the available capacity. The structure in such a constrained, multi-agent, partially observable decision problem can be exploited by a novel belief-space sampling algorithm which bounds the size of the state space by a limit on regret. By exploiting the stationary structure of the problem, our algorithm is significantly more scalable than existing approximate solvers. Moreover, by explicitly considering the information value of actions, this algorithm significantly improves the quality of recommendations over an extension of posterior sampling reinforcement learning to the constrained multi-agent case. We show how to decouple constraint satisfaction from sequential recommendation policies, resulting in algorithms which issue recommendations to thousands of agents while respecting constraints.


AAAI Conference 2018 Conference Paper

Preallocation and Planning Under Stochastic Resource Constraints

Frits de Nijs
Matthijs Spaan
Mathijs de Weerdt

Resource constraints frequently complicate multi-agent planning problems. Existing algorithms for resource-constrained, multi-agent planning problems rely on the assumption that the constraints are deterministic. However, frequently resource constraints are themselves subject to uncertainty from external influences. Uncertainty about constraints is especially challenging when agents must execute in an environment where communication is unreliable, making on-line coordination difficult. In those cases, it is a significant challenge to find coordinated allocations at plan time depending on availability at run time. To address these limitations, we propose to extend algorithms for constrained multi-agent planning problems to handle stochastic resource constraints. We show how to factorize resource limit uncertainty and use this to develop novel algorithms to plan policies for stochastic constraints. We evaluate the algorithms on a search-and-rescue problem and on a power-constrained planning domain where the resource constraints are decided by nature. We show that plans taking into account all potential realizations of the constraint obtain significantly better utility than planning for the expectation, while causing fewer constraint violations.


AAAI Conference 2017 Conference Paper

Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs

Frits de Nijs
Erwin Walraven
Mathijs de Weerdt
Matthijs Spaan

Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding’s inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners.
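
As an illustration only, the general idea of bounding a violation probability with Hoeffding's inequality can be sketched as follows. This is not the paper's algorithm. It assumes that agents consume resources independently, that agent i's consumption lies in an interval of known width, and that a planner can satisfy a (reduced) limit in expectation. The function names `violation_bound`, `hoeffding_margin` and `reduced_limit` are hypothetical.

```python
import math


def violation_bound(widths, t):
    """Hoeffding bound on P(S - E[S] >= t) for a sum S of independent
    variables, where variable i lies in an interval of width widths[i]."""
    if t <= 0:
        return 1.0  # the bound is vacuous for a non-positive margin
    sum_sq = sum(w * w for w in widths)
    if sum_sq == 0:
        return 0.0  # consumption is deterministic, so no deviation occurs
    return math.exp(-2.0 * t * t / sum_sq)


def hoeffding_margin(widths, alpha):
    """Smallest margin t for which the Hoeffding bound equals alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    sum_sq = sum(w * w for w in widths)
    return math.sqrt(sum_sq * math.log(1.0 / alpha) / 2.0)


def reduced_limit(limit, widths, alpha):
    """Limit to satisfy in expectation so that, under the assumptions
    above, the true limit is exceeded with probability at most alpha."""
    return limit - hoeffding_margin(widths, alpha)


if __name__ == "__main__":
    # Hypothetical example: 100 agents, each consuming between 0 and 1 unit.
    widths = [1.0] * 100
    print(reduced_limit(60.0, widths, alpha=0.05))
```

Planning against the reduced limit in expectation trades expected value for a guaranteed violation probability. Because the margin grows as alpha shrinks, a larger violation tolerance leaves more of the resource available to the planner.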


ECAI Conference 2016 Conference Paper

Decoupling a Resource Constraint Through Fictitious Play in Multi-Agent Sequential Decision Making

Frits de Nijs
Matthijs T. J. Spaan
Mathijs de Weerdt

When multiple independent agents use a limited shared resource, they need to coordinate and thereby their planning problems become coupled. We present a resource assignment strategy that decouples agents using marginal utility cost, allowing them to plan individually. We show that agents converge to an expected cost curve by keeping a history of plans, inspired by fictitious play. This performs slightly better than a state-of-the-art best-response approach and is significantly more scalable than a preallocation Mixed-Integer Linear Programming formulation, providing a good trade-off between performance and quality.


AAAI Conference 2015 Conference Paper

Best-Response Planning of Thermostatically Controlled Loads under Power Constraints

Frits de Nijs
Matthijs Spaan
Mathijs de Weerdt

Renewable power sources such as wind and solar are inflexible in their energy production, which requires demand to rapidly follow supply in order to maintain energy balance. Promising controllable demands are air conditioners and heat pumps which use electric energy to maintain a temperature at a setpoint. Such Thermostatically Controlled Loads (TCLs) have been shown to be able to follow a power curve using reactive control. In this paper we investigate the use of planning under uncertainty to pro-actively control an aggregation of TCLs to overcome temporary grid imbalance. We present a formal definition of the planning problem under consideration, which we model using the Multi-Agent Markov Decision Process (MMDP) framework. Since we are dealing with hundreds of agents, solving the resulting MMDPs directly is intractable. Instead, we propose to decompose the problem by decoupling the interactions through arbitrage. Decomposition of the problem means relaxing the joint power consumption constraint, which means that joining the plans together can cause overconsumption. Arbitrage acts as a conflict resolution mechanism during policy execution, using the future expected value of policies to determine which TCLs should receive the available energy. We experimentally compare several methods to plan with arbitrage, and conclude that a best response-like mechanism is a scalable approach that returns near-optimal solutions.


ICAPS Conference 2014 Conference Paper

A Novel Priority Rule Heuristic: Learning from Justification

Frits de Nijs
Tomas Klos

The Resource Constrained Project Scheduling Problem consists of finding start times for precedence-constrained activities which compete over renewable resources, with the goal to produce the shortest schedule. The method of Justification is a very popular post-processing schedule optimization technique which, although it is not clear exactly why, has been shown to work very well, even improving randomly generated schedules over those produced by advanced heuristics. In this paper, we set out to investigate why Justification works so well, and, with this understanding, to bypass the need for Justification by computing a priori the priorities Justification implicitly employs. We perform an exploratory study to investigate the effectiveness of Justification on a novel test set which varies the RCPSP phase-transition parameters across a larger range than existing test sets. We propose several hypotheses to explain the behavior of Justification, which we test by deriving from them several predictions, and a new priority rule. We show that this rule matches the priorities used by Justification more closely than existing rules, making it outperform the most successful priority rule heuristic.
