Arrow Research search

Author name cluster

Saaduddin Mahmud

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Causal Explanations for Sequential Decision Making (Abstract Reprint)

  • Samer B. Nashed
  • Saaduddin Mahmud
  • Claudia V. Goldman
  • Shlomo Zilberstein

Stochastic sequential decision-making systems — such as Markov decision processes and their variants — are increasingly used in areas such as transportation, healthcare, and communication. However, the ability to explain these systems’ outputs to non-technical end users has not kept pace with their widespread adoption. This paper addresses that gap by extending prior work and presenting a unified framework for generating causal explanations of agent behavior in sequential decision-making settings, grounded in the structural causal model (SCM) paradigm. Our framework supports the generation of multiple, semantically distinct explanations for agent actions — capabilities that were previously unattainable. In addition to introducing a novel taxonomy of explanations for MDPs to guide empirical investigation, we develop both exact and approximate causal inference methods within the SCM framework. We analyze their applicability and derive run-time bounds for each. This leads to the proposed algorithm, MeanRESP, which operates flexibly across a spectrum of approximations tailored to external constraints. We further analyze the sample complexity and error rates of approximate MeanRESP, and provide a detailed comparison of its outputs — under varying definitions of responsibility — with popular Shapley-value-based methods. Empirically, we performed a series of experiments to evaluate the practicality and effectiveness of the proposed system, focusing on real-world computational demands and the validity and reliability of metrics for comparing approximate and exact causal methods. Finally, we present two user studies that reveal user preferences for certain types of explanations and demonstrate a strong preference for explanations generated by our framework compared to those from other state-of-the-art systems.

AAAI Conference 2026 Conference Paper

Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models

  • Saaduddin Mahmud
  • Mason Nakamura
  • Kyle Hollins Wray
  • Shlomo Zilberstein

Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have likewise been shown to improve alignment and performance by trading additional computation for better output. However, existing prompt optimization approaches are inference strategy agnostic; that is, they optimize prompts without accounting for the inference strategy. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a novel unified framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale, while being aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, called PSST (Prompt Scaling via Sequential Trimming), and establish finite-budget guarantees on the error probability. Finally, we evaluate the effectiveness of PSST on six tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness in aligning black-box LLMs using prompt optimization.
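The Best-of-N Sampling strategy mentioned in this abstract can be sketched in a few lines. The `toy_generate` function and the identity scorer below are illustrative stand-ins for an LLM call and a reward model, not anything from the paper:

```python
import random

def best_of_n(generate, score, prompt, n, seed=0):
    """Draw n candidate outputs and keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: "generation" samples an integer, "scoring" prefers larger ones.
def toy_generate(prompt, rng):
    return rng.randint(0, 100)

best = best_of_n(toy_generate, lambda x: x, "unused prompt", n=16)
```

The extra computation (N calls instead of one) buys a better expected output, which is the compute-for-quality trade-off the abstract describes.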

JAIR Journal 2025 Journal Article

Causal Explanations for Sequential Decision Making

  • Samer B. Nashed
  • Saaduddin Mahmud
  • Claudia V. Goldman
  • Shlomo Zilberstein

Stochastic sequential decision-making systems — such as Markov decision processes and their variants — are increasingly used in areas such as transportation, healthcare, and communication. However, the ability to explain these systems’ outputs to non-technical end users has not kept pace with their widespread adoption. This paper addresses that gap by extending prior work and presenting a unified framework for generating causal explanations of agent behavior in sequential decision-making settings, grounded in the structural causal model (SCM) paradigm. Our framework supports the generation of multiple, semantically distinct explanations for agent actions — capabilities that were previously unattainable. In addition to introducing a novel taxonomy of explanations for MDPs to guide empirical investigation, we develop both exact and approximate causal inference methods within the SCM framework. We analyze their applicability and derive run-time bounds for each. This leads to the proposed algorithm, MeanRESP, which operates flexibly across a spectrum of approximations tailored to external constraints. We further analyze the sample complexity and error rates of approximate MeanRESP, and provide a detailed comparison of its outputs—under varying definitions of responsibility—with popular Shapley-value-based methods. Empirically, we performed a series of experiments to evaluate the practicality and effectiveness of the proposed system, focusing on real-world computational demands and the validity and reliability of metrics for comparing approximate and exact causal methods. Finally, we present two user studies that reveal user preferences for certain types of explanations and demonstrate a strong preference for explanations generated by our framework compared to those from other state-of-the-art systems.
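A sampling-based notion of responsibility, loosely in the spirit of the approximate methods described above, can be illustrated on a toy structural model. The "flip responsibility" below (fraction of sampled contexts where intervening on a variable flips the outcome) is an illustrative stand-in, not MeanRESP's formal definition:

```python
import random

# Toy structural model: binary causes x1, x2, x3; the outcome fires
# when x1 holds together with x2 or x3.
def outcome(x):
    return x[0] and (x[1] or x[2])

def flip_responsibility(i, n=2000, seed=0):
    """Estimate how often flipping variable i changes the outcome."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(n):
        x = [rng.random() < 0.5 for _ in range(3)]
        y = list(x)
        y[i] = not y[i]  # counterfactual intervention on variable i
        flips += outcome(x) != outcome(y)
    return flips / n

resp = [flip_responsibility(i) for i in range(3)]
```

Because x1 is pivotal whenever x2 or x3 holds (probability 3/4) while x2 and x3 are each pivotal only 1/4 of the time, the estimate assigns x1 roughly three times the responsibility of the others.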

AAAI Conference 2025 Conference Paper

MAPLE: A Framework for Active Preference Learning Guided by Large Language Models

  • Saaduddin Mahmud
  • Mason Nakamura
  • Shlomo Zilberstein

The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of interpretability. To address these issues, we introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. MAPLE leverages LLMs to model the distribution over preference functions, conditioning it on both natural language feedback and conventional preference learning feedback, such as pairwise trajectory rankings. MAPLE employs active learning to systematically reduce uncertainty in this distribution and incorporates a language-conditioned active query selection mechanism to identify informative and easy-to-answer queries, thus reducing the burden on humans. We evaluate MAPLE's sample efficiency and preference inference quality across two benchmarks, including a real-world vehicle route planning benchmark using OpenStreetMap data. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
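The active query selection idea in this abstract can be sketched as picking the pairwise comparison that the current hypotheses disagree on most. The hypotheses, trajectory features, and disagreement rule below are hand-written toy values, not MAPLE's LLM-conditioned posterior:

```python
from itertools import combinations

# Candidate reward weights (a stand-in for a posterior over preference functions).
hypotheses = [(1.0, 0.0), (0.75, 0.25), (0.25, 0.75), (0.0, 1.0)]

# Trajectories summarized by two-dimensional feature counts.
trajectories = {"a": (3.0, 0.0), "b": (0.0, 3.0), "c": (2.5, 1.5)}

def utility(w, f):
    return w[0] * f[0] + w[1] * f[1]

def disagreement(pair):
    """Highest when the hypotheses split evenly on which trajectory is better."""
    i, j = pair
    votes = sum(utility(w, trajectories[i]) > utility(w, trajectories[j])
                for w in hypotheses)
    p = votes / len(hypotheses)
    return min(p, 1 - p)

# Query the pair the current hypotheses disagree on most.
query = max(combinations(trajectories, 2), key=disagreement)
```

A ranking on the selected pair rules out roughly half the hypotheses, which is the uncertainty-reduction behavior active learning aims for.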

AAMAS Conference 2024 Conference Paper

Explaining the Behavior of POMDP-based Agents Through the Impact of Counterfactual Information

  • Saaduddin Mahmud
  • Marcell Vazquez-Chanlatte
  • Stefan Witwicki
  • Shlomo Zilberstein

In this work, we consider AI agents operating in Partially Observable Markov Decision Processes (POMDPs)—a widely-used framework for sequential decision making with incomplete state information. Agents operating with partial information take actions not only to advance their underlying goals but also to seek information and reduce uncertainty. Despite rapid progress in explainable AI, research on separating information-driven vs. goal-driven behaviors remains sparse. To address this gap, we introduce a novel explanation generation framework called Sequential Information Probing (SIP), to investigate the direct impact of state information, or its absence, on agent behavior. To quantify the impact we also propose two metrics under this SIP framework called Value of Information (VoI) and Influence of Information (IoI). We then theoretically derive several properties of these metrics. Finally, we present several experiments, including a case study on an autonomous vehicle, that illustrate the efficacy of our method.
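The Value of Information idea can be illustrated on a one-step guessing task: VoI is the gap between acting on the observed state and acting on the prior alone. The definitions here are illustrative, not the paper's formal metrics:

```python
# One-step toy problem: hidden state s in {0, 1}, action = guessed state,
# reward 1 for a correct guess. Prior over the hidden state:
prior = {0: 0.7, 1: 0.3}

# An informed agent observes s and always guesses correctly.
value_informed = sum(p * 1.0 for p in prior.values())

# An uninformed agent must commit to the single best guess under the prior.
value_uninformed = max(prior.values())

# Value of Information: how much the observation improves expected reward.
voi = value_informed - value_uninformed
```

Here the observation is worth 0.3 expected reward; in a sequential POMDP the same comparison is taken over whole policies rather than single guesses.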

IJCAI Conference 2023 Conference Paper

Explanation-Guided Reward Alignment

  • Saaduddin Mahmud
  • Sandhya Saisubramanian
  • Shlomo Zilberstein

Agents often need to infer a reward function from observations to learn desired behaviors. However, agents may infer a reward function that does not align with the original intent because there can be multiple reward functions consistent with its observations. Operating based on such misaligned rewards can be risky. Furthermore, black-box representations make it difficult to verify the learned rewards and prevent harmful behavior. We present a framework for verifying and improving reward alignment using explanations and show how explanations can help detect misalignment and reveal failure cases in novel scenarios. The problem is formulated as inverse reinforcement learning from ranked trajectories. Verification tests created from the trajectory dataset are used to iteratively validate and improve reward alignment. The agent explains its learned reward and a tester signals whether the explanation passes the test. In cases where the explanation fails, the agent offers alternative explanations to gather feedback, which is then used to improve the learned reward. We analyze the efficiency of our approach in improving reward alignment using different types of explanations and demonstrate its effectiveness in five domains.
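Inverse reinforcement learning from ranked trajectories, as formulated above, can be sketched with a Bradley-Terry-style logistic likelihood over preference pairs. The feature counts, rankings, and learning-rate settings below are a toy instance, not the paper's experimental setup:

```python
import math

# Trajectories summarized by feature counts; ranked pairs (better, worse)
# express the tester's preferences: t1 > t3 > t2.
features = {"t1": (2.0, 0.0), "t2": (0.0, 2.0), "t3": (1.0, 1.0)}
ranked_pairs = [("t1", "t3"), ("t3", "t2")]

def fit_reward(steps=200, lr=0.5):
    """Gradient ascent on the logistic likelihood of the ranked pairs."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for better, worse in ranked_pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))  # P(better preferred)
            for i in range(2):
                w[i] += lr * (1.0 - p) * diff[i]
    return w

w = fit_reward()
score = lambda t: sum(wi * fi for wi, fi in zip(w, features[t]))
```

The learned weights reproduce the ranking, but many other weight vectors would too; that ambiguity is exactly the misalignment risk the abstract's explanation-based verification targets.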

IROS Conference 2023 Conference Paper

Learning Constraints on Autonomous Behavior from Proactive Feedback

  • Connor Basich
  • Saaduddin Mahmud
  • Shlomo Zilberstein

Learning from feedback is a common paradigm to acquire information that is hard to specify a priori. In this work, we consider an agent with a known nominal reward model that captures its high-level task objective. Furthermore, the agent operates subject to constraints that are unknown a priori and must be inferred from human interventions. Unlike existing methods, our approach does not rely on full or partial demonstration trajectories or assume a fully reactive human. Instead, we assume access only to sparse interventions, which may in fact be generated proactively by the human, and we only make minimal assumptions about the human. We provide both theoretical bounds on performance and empirical validations of our method. We show that our method enables an agent to learn a constraint set with high accuracy that generalizes well to new environments within a domain, whereas methods that only consider reactive feedback learn an incorrect constraint set that does not generalize well, making constraint violations more likely in new environments.
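Constraint inference from sparse interventions can be illustrated with a simple feature-voting rule: a feature is suspected whenever it appears in an intervened state but never in a state the human let pass. The feature tags and the rule itself are illustrative, not the paper's formulation:

```python
# States carry feature tags; a human intervention on a state is treated as
# evidence that one of its features is constrained.
state_features = {
    "s1": {"mud"}, "s2": {"mud", "shade"}, "s3": {"grass"}, "s4": {"shade"},
}
interventions = ["s1", "s2"]      # human proactively stopped the agent here
unflagged_visits = ["s3", "s4"]   # visited without any intervention

# Features seen in intervened states but never in tolerated states.
seen_bad = set().union(*(state_features[s] for s in interventions))
seen_ok = set().union(*(state_features[s] for s in unflagged_visits))
constraints = seen_bad - seen_ok
```

Because the inferred set is phrased over features rather than specific states, it transfers to new environments within the domain, which is the generalization property the abstract emphasizes.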

AAMAS Conference 2023 Conference Paper

Semi-Autonomous Systems with Contextual Competence Awareness

  • Saaduddin Mahmud
  • Connor Basich
  • Shlomo Zilberstein

Competence modeling is critical for the efficient and safe operation of semi-autonomous systems (SAS) with varying levels of autonomy. In this paper, we extend the notion of competence modeling by introducing a contextual competence model. While previous work on competence-aware systems (CAS) defined the competence of a SAS relative to a single static operator, we present an augmented operator model that is contextualized by Markovian state information capable of capturing multiple operators. Access to such information allows the SAS to account for the stochastic shifts that may occur in the behavior of the operator(s) during deployment and optimize its autonomy accordingly. We show that the extended model, called the Contextual Competence-Aware System (CoCAS), has the same convergence guarantees as CAS, and empirically illustrate the benefit of our approach over both the original CAS model and other relevant work in shared autonomy.

AAMAS Conference 2022 Conference Paper

A Simulation Based Online Planning Algorithm for Multi-Agent Cooperative Environments

  • Rafid Ameer Mahmud
  • Fahim Faisal
  • Saaduddin Mahmud
  • Md. Mosaddek Khan

The Multi-agent Markov Decision Process (MMDP) has been an effective way of modelling sequential decision-making algorithms for multi-agent cooperative environments. However, challenges such as the exponential size of the action space and dynamic changes in the environment limit the efficacy of proposed solutions. This paper proposes a scalable and robust algorithm that can effectively solve MMDPs in real time. Simulation, pruning, and prediction are the three key components of the algorithm. The simulation component enables real-time solutions by using a novel iterative pruning technique, which in turn makes use of the prediction component trained with self-play data. The algorithm is self-sustaining, as it generates new training data from simulation and gradually improves. Furthermore, we show empirical results demonstrating the capabilities of the algorithm and compare them with existing MMDP solvers.
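Simulation-based planning over a joint action space can be sketched as averaged Monte Carlo rollouts per joint action. The two-agent stage game below is a toy stand-in for an MMDP simulator, not the paper's benchmark, and the sketch omits the pruning and prediction components:

```python
import random
from itertools import product

# Toy cooperative stage game: two agents each pick 0 or 1; the simulator
# returns a noisy reward that is highest when both pick 1.
def simulate(joint_action, rng):
    base = {(0, 0): 0.0, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 1.0}[joint_action]
    return base + rng.gauss(0.0, 0.05)

def plan(n_rollouts=200, seed=0):
    """Score each joint action by its averaged simulated return; pick the best."""
    rng = random.Random(seed)
    def avg_return(a):
        return sum(simulate(a, rng) for _ in range(n_rollouts)) / n_rollouts
    return max(product((0, 1), repeat=2), key=avg_return)

best_joint_action = plan()
```

Enumerating `product((0, 1), repeat=2)` already shows the scalability problem the paper attacks: the joint action space grows exponentially with the number of agents, which is what the iterative pruning technique is meant to tame.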

AAAI Conference 2020 Conference Paper

A Particle Swarm Based Algorithm for Functional Distributed Constraint Optimization Problems

  • Moumita Choudhury
  • Saaduddin Mahmud
  • Md. Mosaddek Khan

Distributed Constraint Optimization Problems (DCOPs) are a widely studied constraint handling framework. The objective of a DCOP algorithm is to optimize a global objective function that can be described as the aggregation of several distributed constraint cost functions. In a DCOP, each of these functions is defined by a set of discrete variables. However, in many applications, such as target tracking or sleep scheduling in sensor networks, continuous-valued variables are better suited than discrete ones. Considering this, Functional DCOPs (F-DCOPs) have been proposed that can explicitly model a problem containing continuous variables. Nevertheless, state-of-the-art F-DCOP approaches experience onerous memory or computation overhead. To address this issue, we propose a new F-DCOP algorithm, namely Particle Swarm based F-DCOP (PFD), which is inspired by a meta-heuristic, Particle Swarm Optimization (PSO). Although PSO has been successfully applied to many continuous optimization problems, its potential has not been utilized in F-DCOPs. To be exact, PFD devises a distributed method of solution construction while significantly reducing the computation and memory requirements. Moreover, we theoretically prove that PFD is an anytime algorithm. Finally, our empirical results indicate that PFD outperforms the state-of-the-art approaches in terms of solution quality and computation overhead.
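The PSO machinery underlying PFD can be sketched in centralized, single-objective form; the quadratic "aggregate constraint cost" and all parameter values below are illustrative, and the sketch omits PFD's distributed solution construction:

```python
import random

def pso(cost, dim=2, n_particles=20, iters=100, seed=0):
    """Minimal particle swarm: inertia plus pulls toward personal/global bests."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # personal best positions
    gbest = min(pbest, key=cost)[:]        # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy aggregated cost: a quadratic bowl with its minimum at (1, -2).
solution = pso(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2)
```

Each particle only needs its own state plus the global best, which hints at why a swarm decomposes naturally across agents in the distributed setting.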

IJCAI Conference 2020 Conference Paper

Learning Optimal Temperature Region for Solving Mixed Integer Functional DCOPs

  • Saaduddin Mahmud
  • Md. Mosaddek Khan
  • Moumita Choudhury
  • Long Tran-Thanh
  • Nicholas R. Jennings

Distributed Constraint Optimization Problems (DCOPs) are an important framework for modeling coordinated decision-making problems in multi-agent systems with a set of discrete variables. Later works have extended DCOPs to model problems with a set of continuous variables, named Functional DCOPs (F-DCOPs). In this paper, we combine both of these frameworks into the Mixed Integer Functional DCOP (MIF-DCOP) framework that can deal with problems regardless of their variables' type. We then propose a novel algorithm - Distributed Parallel Simulated Annealing (DPSA), where agents cooperatively learn the optimal parameter configuration for the algorithm while also solving the given problem using the learned knowledge. Finally, we empirically evaluate our approach in DCOP, F-DCOP, and MIF-DCOP settings and show that DPSA produces solutions of significantly better quality than the state-of-the-art non-exact algorithms in their corresponding settings.
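The simulated-annealing core of DPSA can be sketched on a toy mixed assignment with one discrete and one continuous variable; the cost function, neighbor move, and temperature schedule below are illustrative choices, and the sketch omits DPSA's distributed, parallel parameter learning:

```python
import math
import random

def simulated_annealing(cost, neighbor, start, temp=2.0, cooling=0.98,
                        iters=500, seed=0):
    """Accept worse moves with probability exp(-delta/T); T decays each step."""
    rng = random.Random(seed)
    current, best = start, start
    for _ in range(iters):
        cand = neighbor(current, rng)
        delta = cost(cand) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

# Toy mixed assignment: discrete d in {0, 1, 2} plus continuous x;
# the cost is minimized at d = 2, x = 0.5.
def cost(a):
    d, x = a
    return abs(d - 2) + (x - 0.5) ** 2

def neighbor(a, rng):
    d, x = a
    return (rng.choice([0, 1, 2]), x + rng.gauss(0.0, 0.2))

best = simulated_annealing(cost, neighbor, start=(0, 0.0))
```

Because the neighbor move mixes a discrete resample with a continuous perturbation, the same loop handles integer and functional variables uniformly, which is the point of the MIF-DCOP framing; the initial temperature and cooling rate are exactly the kind of parameters DPSA learns cooperatively rather than hand-tuning.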