Arrow Research search

Author name cluster

Daniel Kudenko

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers

28

AAMAS Conference 2025 Conference Paper

Improving the Effectiveness of Potential-based Reward Shaping in Reinforcement Learning

  • Henrik Müller
  • Daniel Kudenko

Potential-based reward shaping is often used to incorporate prior knowledge of how to solve the task into reinforcement learning because it can formally guarantee policy invariance. In this work, we highlight the dependence of effective potential-based reward shaping on the initial Q-values and external rewards, which determine the agent’s ability to exploit the shaping rewards to guide its exploration and achieve increased sample efficiency. We formally derive how a simple linear shift of the potential function can be used to improve the effectiveness of reward shaping without changing the structure of the potential function and thus its implicitly encoded preferences, and without having to adjust the initial Q-values. We verify our theoretical findings on tabular Q-learning and demonstrate the application of our findings in deep reinforcement learning.
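
The role of the shift can be illustrated with the standard potential-based shaping algebra. A minimal sketch, assuming the usual definitions from the policy-invariance literature (not the paper's full derivation):

```latex
% Potential-based shaping reward for a transition s -> s':
\[ F(s, s') = \gamma \Phi(s') - \Phi(s) \]
% A linear shift of the potential by a constant c keeps its structure
% (and the preferences it encodes) intact, but offsets every shaping
% reward by a constant amount:
\[ F_c(s, s') = \gamma\bigl(\Phi(s') + c\bigr) - \bigl(\Phi(s) + c\bigr)
             = F(s, s') + c\,(\gamma - 1) \]
```

For discount factors below 1, the uniform offset c(γ − 1) changes how the shaping rewards weigh against the initial Q-values and external rewards without altering the optimal policy.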

ECAI Conference 2024 Conference Paper

Entity Matching Across Small Networks Using Node Attributes

  • Zahra Ahmadi
  • Zijian Zhang
  • Hoang H. Nguyen
  • Sergio Burdisso
  • Srikanth R. Madikeri
  • Petr Motlíček
  • Erinç Dikici
  • Gerhard Backfried

Entity matching, also known as user identity linkage, is a critical task in data integration. While established techniques primarily focus on large-scale networks, there are several applications where small networks pose challenges due to limited training data and sparsity. This study addresses entity matching in the field of criminology, where small networks are common and the number of known matching nodes is restricted. To support this research, we exploit a multimodal dataset, collected as part of a security-related project, consisting of an intercepted telephone calls network (i.e., ROXSD data) and a network of social forum interactions (i.e., ROXHOOD data) collected in a simulated environment, albeit following a real investigation scenario. To improve accuracy and efficiency, we propose a novel approach for entity matching across these two small networks using node attributes. Existing techniques often focus merely on topology consistency between two networks and overlook valuable information, such as network node attributes, making them vulnerable to structural changes. Inspired by the remarkable success of deep learning, we present UGC-DeepLink, an end-to-end semi-supervised learning framework that leverages user-generated content. UGC-DeepLink encodes network nodes into vector representations, capturing both local and global network structures to align anchor nodes using deep neural networks. A dual learning paradigm and the policy gradient method transfer knowledge and update the linkage. Additionally, node attributes, such as call contents and forum-exchanged texts, enhance the ranking of matching nodes. Experimental results on ROXSD and ROXHOOD demonstrate that UGC-DeepLink surpasses baselines and state-of-the-art methods in terms of identity-match ranking. The code and dataset are available at https://github.com/erichoang/UGC-DeepLink.
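
As a rough illustration of the attribute-augmented ranking idea (the function names, embedding inputs, and combination weight are assumptions for this sketch, not the UGC-DeepLink implementation):

```python
# Hypothetical sketch: rank candidate matches in the target network for
# one source node by blending structural-embedding similarity with a
# text-attribute similarity (e.g., from call contents or forum posts).
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def rank_candidates(src_struct, dst_structs, src_text, dst_texts, alpha=0.7):
    """alpha weights structural similarity against node-attribute (text)
    similarity; all inputs are dense embedding vectors, one per node."""
    scores = [
        alpha * cosine(src_struct, dst_structs[j])
        + (1 - alpha) * cosine(src_text, dst_texts[j])
        for j in range(len(dst_structs))
    ]
    return list(np.argsort(scores)[::-1])  # best-matching candidates first
```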

AAMAS Conference 2024 Conference Paper

Multi-Robot Motion and Task Planning in Automotive Production Using Controller-based Safe Reinforcement Learning

  • Eric Wete
  • Joel Greenyer
  • Daniel Kudenko
  • Wolfgang Nejdl

Using synthesis- and AI-planning-based approaches, recent works investigated methods to support engineers with the automation of design, planning, and execution of multi-robot cells. However, real-time constraints and stochastic processes were not well covered due, e.g., to the high abstraction level of the problem modeling, and these methods do not scale well. In this paper, using probabilistic model checking, we construct a controller and integrate it with reinforcement learning approaches to synthesize the most efficient and correct multi-robot task schedules. Statistical Model Checking (SMC) is applied for system requirement verification. Our method is aware of uncertainties and considers robot movement times, interruption times, and stochastic interruptions that can be learned during multi-robot cell operations. We developed a model-at-runtime that integrates the execution of the production cell and optimizes its performance using a controller-based AI system. For this purpose and to derive the best policy, we implemented and compared AI-based methods, namely, Monte Carlo Tree Search, a heuristic AI-planning technique, and Q-learning, a model-free reinforcement learning method. Our results show that our methodology can choose time-efficient task sequences that consequently improve the cycle time and efficiently adapt to stochastic events, e.g., robot interruptions. Moreover, our approach scales well compared to previous investigations using SMC, which did not reveal any violation of the requirements.

ECAI Conference 2020 Conference Paper

Uniform State Abstraction for Reinforcement Learning

  • John Burden
  • Daniel Kudenko

Potential-Based Reward Shaping combined with a potential function based on appropriately defined abstract knowledge has been shown to significantly improve learning speed in Reinforcement Learning. MultiGrid Reinforcement Learning (MRL) has further shown that such abstract knowledge in the form of a potential function can be learned almost solely from agent interaction with the environment. However, we show that MRL does not extend well to Deep Learning. In this paper we extend and improve MRL to take advantage of modern Deep Learning algorithms such as Deep Q-Networks (DQN). We show that DQN augmented with our approach performs significantly better on continuous control tasks than its vanilla counterpart and than DQN augmented with MRL.

KER Journal 2019 Journal Article

Introspective Q-learning and learning from demonstration

  • Mao Li
  • Tim Brys
  • Daniel Kudenko

One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up the learning significantly further. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.
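
A minimal sketch of the introspection mechanism as described, with the queue capacity and interface as assumed details:

```python
# Bounded priority queue of the agent's own (state, action) decisions,
# scored by a Monte Carlo return estimate; good decisions stay, poor
# ones are rejected, and the survivors serve as demonstrations for
# reward shaping.
import heapq
import itertools

class IntrospectionQueue:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._heap = []                  # min-heap: worst decision on top
        self._tie = itertools.count()    # tie-breaker so states never compare

    def offer(self, mc_return, state, action):
        item = (mc_return, next(self._tie), state, action)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif mc_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)   # evict the poorest decision

    def demonstrations(self):
        """Retained decisions, usable to build a shaping potential; the
        queue can be pre-filled from a human expert's demonstration."""
        return [(s, a) for _, _, s, a in self._heap]
```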

KER Journal 2019 Journal Article

Two-level Q-learning: learning from conflict demonstrations

  • Mao Li
  • Yi Wei
  • Daniel Kudenko

One way to address the low sample efficiency of reinforcement learning (RL) is to employ human expert demonstrations to speed up the RL process (RL from demonstration, or RLfD). The research so far has focused on demonstrations from a single expert. However, little attention has been given to the case where demonstrations are collected from multiple experts, whose expertise may vary on different aspects of the task. In such scenarios, it is likely that the demonstrations will contain conflicting advice in many parts of the state space. We propose a two-level Q-learning algorithm, in which the RL agent not only learns the policy of deciding on the optimal action but also learns to select the most trustworthy expert according to the current state. Thus, our approach removes the traditional assumption that demonstrations come from one single source and are mostly conflict-free. We evaluate our technique on three different domains and the results show that the state-of-the-art RLfD baseline fails to converge or performs similarly to conventional Q-learning. In contrast, the performance level of our novel algorithm increases with more experts being involved in the learning process and the proposed approach has the capability to handle demonstration conflicts well.
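
A sketch of the two-level structure, with tabular Q-tables and a shared TD update as simplifying assumptions (the paper's exact update rules may differ):

```python
# Two coupled learners: a low-level Q-table over (state, action) for the
# control policy, and a high-level Q-table over (state, expert) that
# learns which demonstrator to trust in each state.
from collections import defaultdict
import random

class TwoLevelQ:
    def __init__(self, n_experts, actions, alpha=0.1, gamma=0.99, eps=0.1):
        self.q_action = defaultdict(float)   # (state, action) -> value
        self.q_expert = defaultdict(float)   # (state, expert) -> trust
        self.n_experts, self.actions = n_experts, list(actions)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def select_expert(self, state):
        if random.random() < self.eps:
            return random.randrange(self.n_experts)
        return max(range(self.n_experts),
                   key=lambda e: self.q_expert[(state, e)])

    def update(self, state, action, expert, reward, next_state):
        best_next = max(self.q_action[(next_state, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.q_action[(state, action)]
        self.q_action[(state, action)] += self.alpha * td
        # The same TD signal credits (or discredits) the consulted expert.
        self.q_expert[(state, expert)] += self.alpha * td
```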

AAMAS Conference 2018 Conference Paper

Introspective Reinforcement Learning and Learning from Demonstration

  • Mao Li
  • Tim Brys
  • Daniel Kudenko

Reinforcement learning is a paradigm used to model how an autonomous agent learns to maximize its cumulative reward by interacting with the environment. One challenge faced by reinforcement learning is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping is a technique that can resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge in the learning via a potential function. Past work on reinforcement learning from demonstration directly mapped (sub-optimal) human expert demonstrations to a potential function, which can speed up reinforcement learning. In this paper we propose an introspective reinforcement learning agent that speeds up the learning significantly further. An introspective reinforcement learning agent records its state-action decisions and experiences during learning in a priority queue. Good quality decisions will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up reinforcement learning via reward shaping. An expert agent’s demonstrations can be used to initialise the priority queue before the learning process starts. Experimental validations in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain show that our approach significantly outperforms state-of-the-art approaches to reinforcement learning from demonstration in both domains.

KER Journal 2016 Journal Article

Context-sensitive reward shaping for sparse interaction multi-agent systems

  • Yann-Michaël de Hauwere
  • Sam Devlin
  • Daniel Kudenko
  • Ann Nowé

Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. Both in single- and multi-agent settings this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. This means that, unknown to the agents a priori, the interactions between the agents only occur sporadically in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the globally optimal solution. We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before they actually occur, allowing them to be solved in a timely manner. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.

KER Journal 2016 Journal Article

Overcoming incorrect knowledge in plan-based reward shaping

  • Kyriakos Efthymiadis
  • Sam Devlin
  • Daniel Kudenko

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used in order to guide the agent to the optimal behaviour. However, if the provided knowledge is wrong, it has been shown that the agent will take longer to learn the optimal policy. Previously, in some cases, it was better to ignore all prior knowledge despite it being only partially incorrect. This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge when provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform the previous agent receiving plan-based reward shaping without knowledge revision.

KER Journal 2016 Journal Article

Plan-based reward shaping for multi-agent reinforcement learning

  • Sam Devlin
  • Daniel Kudenko

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
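
A minimal sketch of a plan-derived potential in the spirit described above (the helper `achieved` and the scaling factor `omega` are illustrative assumptions):

```python
# The potential of a state grows with contiguous progress through the
# (joint or individual) STRIPS plan; the shaping reward is then the
# usual PBRS difference F = gamma * phi(s') - phi(s).
def plan_potential(state, plan_steps, achieved, omega=1.0):
    """plan_steps: ordered goals the plan achieves in sequence.
    achieved(state, step) -> bool tests whether a step holds in state."""
    progress = 0
    for step in plan_steps:
        if not achieved(state, step):
            break                 # only contiguous plan progress counts
        progress += 1
    return omega * progress
```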

AAMAS Conference 2016 Conference Paper

Resource Abstraction for Reinforcement Learning in Multiagent Congestion Problems

  • Kleanthis Malialis
  • Sam Devlin
  • Daniel Kudenko

Real-world congestion problems (e.g., traffic congestion) are typically very complex and large-scale. Multiagent reinforcement learning (MARL) is a promising candidate for dealing with this emerging complexity by providing an autonomous and distributed solution to these problems. However, there are three limiting factors that affect the deployability of MARL approaches to congestion problems: learning time, scalability, and decentralised coordination, i.e., no communication between the learning agents. In this paper we introduce Resource Abstraction, an approach that addresses these challenges by allocating the available resources into abstract groups. This abstraction creates new reward functions that provide a more informative signal to the learning agents and aid the coordination amongst them. Experimental work is conducted on two benchmark domains from the literature, an abstract congestion problem and a realistic traffic congestion problem. The current state-of-the-art for solving multiagent congestion problems is a form of reward shaping called difference rewards. We show that the system using Resource Abstraction significantly improves the learning speed and scalability, and achieves the highest possible or near-highest joint performance/social welfare for both congestion problems in large-scale scenarios involving up to 1000 reinforcement learning agents.

JAAMAS Journal 2015 Journal Article

Potential-based reward shaping for finite horizon online POMDP planning

  • Adam Eck
  • Leen-Kiat Soh
  • Daniel Kudenko

In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent’s reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time to improve the breadth of planning and build higher quality plans. Specifically, we extend potential-based reward shaping (PBRS) from RL to online POMDP planning. In our extension, information about belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards beyond its planning horizon, and thus achieve greater cumulative rewards. We develop novel potential functions measuring information useful to agent metareasoning in POMDPs (reflecting on agent knowledge and/or histories of experience with the environment), theoretically prove several important properties and benefits of using PBRS for online POMDP planning, and empirically demonstrate these results in a range of classic benchmark POMDP planning problems.
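
One plausible instantiation of such an information-measuring potential, shown as a sketch (an assumption for illustration, not necessarily the paper's exact choice):

```latex
% Use the negative entropy of the belief state b as the potential, so
% shaping rewards transitions toward more informative beliefs:
\[ \Phi(b) = -H(b) = \sum_{s} b(s) \log b(s) \]
% Shaping term over belief transitions, mirroring PBRS over states:
\[ F(b, b') = \gamma \, \Phi(b') - \Phi(b) \]
```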

AAAI Conference 2014 Conference Paper

Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

  • Tim Brys
  • Ann Nowé
  • Daniel Kudenko
  • Matthew Taylor

Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-)optimal solutions for any objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective’s estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique’s decisions, yielding insights into the nature of the problems being solved.
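
A sketch of adaptive objective selection under one simple assumed confidence measure (ensemble disagreement; the paper's own metric may differ):

```python
# Each objective maintains its own ensemble of Q-estimates; the agent
# acts greedily on the objective whose ensemble agrees most in the
# current state (lowest variance = highest confidence).
import numpy as np

def select_action(q_ensembles, state, actions):
    """q_ensembles[k]: list of callables q(state, action) -> float,
    one ensemble per objective."""
    best_k, best_conf = 0, -np.inf
    for k, ensemble in enumerate(q_ensembles):
        est = np.array([[q(state, a) for a in actions] for q in ensemble])
        conf = -est.var(axis=0).mean()   # low disagreement = high confidence
        if conf > best_conf:
            best_k, best_conf = k, conf
    mean_q = np.mean([[q(state, a) for a in actions]
                      for q in q_ensembles[best_k]], axis=0)
    return actions[int(np.argmax(mean_q))]
```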

ECAI Conference 2014 Conference Paper

Coordinated Team Learning and Difference Rewards for Distributed Intrusion Response

  • Kleanthis Malialis
  • Sam Devlin
  • Daniel Kudenko

Distributed denial of service attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to respond to such attacks. We demonstrate that our approach can scale up significantly using hierarchical communication and coordinated team learning. Furthermore, we incorporate a form of reward shaping called difference rewards and show that the scalability of our system is significantly improved in experiments involving over 100 reinforcement learning agents. We also demonstrate that difference rewards constitute an ideal online learning mechanism for network intrusion response. We compare our proposed approach against a popular state-of-the-art router throttling technique from the network security literature, and we show that our proposed approach significantly outperforms it. We note that our approach can be useful in other related multiagent domains.
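
The difference-rewards form referred to above can be stated compactly (standard definition; the choice of default action is a modelling decision):

```latex
% Each agent i is credited with its marginal contribution to the global
% objective G, where z is the joint state-action and z_{-i} replaces
% agent i's contribution with a default (e.g., the agent removed):
\[ D_i(z) = G(z) - G(z_{-i}) \]
% An agent can only improve D_i by improving G, which aligns each
% agent's local learning signal with the system-level objective.
```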

AAMAS Conference 2013 Conference Paper

Overcoming Erroneous Domain Knowledge in Plan-Based Reward Shaping

  • Kyriakos Efthymiadis
  • Sam Devlin
  • Daniel Kudenko

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used in order to guide the agent to the optimal behaviour. However, if the provided domain knowledge is wrong, it has been shown that the agent will take longer to learn the optimal policy. Previously, in some cases, it was better to ignore all prior knowledge despite it being only partially erroneous. This paper introduces a novel use of knowledge revision to overcome erroneous domain knowledge when provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform the previous agent receiving plan-based reward shaping without knowledge revision.

AAMAS Conference 2013 Conference Paper

Potential-Based Reward Shaping for POMDPs

  • Adam Eck
  • Leen-Kiat Soh
  • Sam Devlin
  • Daniel Kudenko

We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards, and thus achieve greater cumulative rewards.

AAMAS Conference 2012 Conference Paper

Dynamic Potential-Based Reward Shaping

  • Sam Devlin
  • Daniel Kudenko

Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint policy. It has been proven not to alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption is often broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
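
The dynamic extension can be written by adding time as an argument of the potential (this matches the standard statement of dynamic PBRS):

```latex
% Static PBRS:
\[ F(s, s') = \gamma \Phi(s') - \Phi(s) \]
% Dynamic PBRS: the potential may change while learning, so each
% evaluation is stamped with the time t at which it was made:
\[ F(s, t, s', t') = \gamma \, \Phi(s', t') - \Phi(s, t) \]
```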

AAMAS Conference 2011 Conference Paper

Multi-Agent, Reward Shaping for RoboCup KeepAway

  • Sam Devlin
  • Marek Grześ
  • Daniel Kudenko

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of state-based and state-action-based reward shaping in RoboCup KeepAway. The results illustrate that reward shaping can alter both the learning time required to reach a stable joint policy and the final group performance, for better or worse.

AAMAS Conference 2011 Conference Paper

Theoretical Considerations of Potential-Based Reward Shaping for Multi-Agent Systems

  • Sam Devlin
  • Daniel Kudenko

Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
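
The single-agent equivalence being extended can be sketched as follows (Wiewiora's result, stated informally):

```latex
% Q-learning with the shaping reward
\[ F(s, s') = \gamma \Phi(s') - \Phi(s) \]
% and initial values Q_0(s, a) makes the same updates as unshaped
% Q-learning initialised with
\[ Q'_0(s, a) = Q_0(s, a) + \Phi(s), \]
% given the same experience and the same tie-breaking in action selection.
```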

AAMAS Conference 2010 Conference Paper

PAC-MDP Learning with Knowledge-based Admissible Models

  • Marek Grześ
  • Daniel Kudenko

PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that, with high probability, the algorithm performs near-optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide their learning process. In this paper we propose a framework to use partial knowledge about effects of actions in a theoretically well-founded way. Empirical evaluation shows that our proposed method is more efficient than reward shaping, which represents an alternative approach to incorporating background knowledge. Our solution is also very competitive when compared with the Bayesian Exploration Bonus (BEB) algorithm. BEB is not PAC-MDP; however, it can exploit domain knowledge via informative priors. We show how to use the same kind of knowledge in the PAC-MDP framework in a way which preserves all theoretical guarantees of PAC-MDP learning.

ECAI Conference 2008 Conference Paper

Multi-Agent Reinforcement Learning for Intrusion Detection: A case study and evaluation

  • Arturo Servin
  • Daniel Kudenko

In this paper we propose a novel approach to train Multi-Agent Reinforcement Learning (MARL) agents to cooperate to detect intrusions in the form of normal and abnormal states in the network. We present an architecture of distributed sensor and decision agents that learn how to identify normal and abnormal states of the network using Reinforcement Learning (RL). Sensor agents extract network-state information using tile coding as a function approximation technique and send communication signals in the form of actions to decision agents. By means of an online process, sensor and decision agents learn the semantics of the communication actions. In this paper we detail the learning process and the operation of the agent architecture. We also present tests and results of our research work in an intrusion detection case study, using a realistic network simulation where sensor and decision agents learn to identify normal and abnormal states of the network.
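
A minimal tile-coding sketch for the sensor agents' state extraction (the tiling counts, value ranges, and flattening scheme are illustrative assumptions):

```python
# Several offset grids (tilings) map a continuous network-state vector
# to one active tile per tiling; the binary feature vector is 1 at the
# returned indices and 0 elsewhere.
import numpy as np

def active_tiles(x, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    x = np.clip((np.asarray(x, dtype=float) - low) / (high - low),
                0.0, 1.0 - 1e-9)
    tiles = []
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)     # shift each tiling
        coords = np.minimum(((x + offset) * tiles_per_dim).astype(int),
                            tiles_per_dim - 1)
        flat = 0
        for c in coords:                             # row-major flatten
            flat = flat * tiles_per_dim + int(c)
        tiles.append(t * tiles_per_dim ** len(coords) + flat)
    return tiles
```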

AAMAS Conference 2007 Conference Paper

Parallel Reinforcement Learning with Linear Function Approximation

  • Matthew Grounds
  • Daniel Kudenko

In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our proposed method, each agent learns independently in a separate simulation of the single-agent problem. The agents periodically exchange information extracted from the weights of their approximators, accelerating convergence towards the optimal policy. We present empirical results for an implementation on a Beowulf cluster.
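
A sketch of the periodic exchange step under a simple assumed merge rule (plain averaging; the worker interface and the exchange scheme are assumptions, not necessarily the paper's method):

```python
# Each worker runs SARSA(lambda) with a linear function approximator on
# its own simulation; every sync_every steps all workers adopt the
# average of their weight vectors.
import numpy as np

def merge_weights(weight_vectors):
    return np.mean(np.stack(weight_vectors), axis=0)

def train(workers, total_steps=100_000, sync_every=1000):
    """workers: objects exposing .sarsa_lambda_step() and .weights
    (an np.ndarray); both are assumed interfaces for this sketch."""
    for step in range(1, total_steps + 1):
        for w in workers:
            w.sarsa_lambda_step()            # independent learning step
        if step % sync_every == 0:
            merged = merge_weights([w.weights for w in workers])
            for w in workers:
                w.weights = merged.copy()    # adopt the exchanged knowledge
```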

KER Journal 2001 Journal Article

Learning in multi-agent systems

  • Eduardo Alonso
  • Mark d'Inverno
  • Daniel Kudenko
  • Michael Luck
  • Jason Noble

In recent years, multi-agent systems (MASs) have received increasing attention in the artificial intelligence community. Research in multi-agent systems involves the investigation of autonomous, rational and flexible behaviour of entities such as software programs or robots, and their interaction and coordination in such diverse areas as robotics (Kitano et al., 1997), information retrieval and management (Klusch, 1999), and simulation (Gilbert & Conte, 1995). When designing agent systems, it is impossible to foresee all the potential situations an agent may encounter and specify an agent behaviour optimally in advance. Agents therefore have to learn from, and adapt to, their environment, especially in a multi-agent setting.

AAAI Conference 1998 Conference Paper

Feature Generation for Sequence Categorization

  • Daniel Kudenko

The problem of sequence categorization is to generalize, from a corpus of labeled sequences, procedures for accurately labeling future unlabeled sequences. The choice of representation of sequences can have a major impact on this task, and in the absence of background knowledge a good representation is often not known and straightforward representations are often far from optimal. We propose a feature generation method (called FGEN) that creates Boolean features that check for the presence or absence of heuristically selected collections of subsequences. We show empirically that the representation computed by FGEN improves the accuracy of two commonly used learning systems (C4.5 and Ripper) when the new features are added to existing representations of sequence data. We show the superiority of FGEN across a range of tasks selected from three domains: DNA sequences, Unix command sequences, and English text.
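
A small sketch of FGEN-style Boolean features (the heuristic subsequence-selection step is elided, and contiguous substrings are a simplifying assumption):

```python
# Each generated feature tests for the presence or absence of one
# selected subsequence; the Boolean vector is appended to whatever
# representation the learner (e.g., C4.5 or Ripper) already uses.
def contains(seq, sub):
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

def featurize(seq, subsequences):
    return [contains(seq, sub) for sub in subsequences]

# Example on a DNA-like sequence with hand-picked subsequences:
print(featurize("ACGTACGG", ["CGT", "GGA", "ACG"]))   # [True, False, True]
```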