Arrow Research search

Author name cluster

Mahdi Imani

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers (9)

AAAI Conference 2025 Conference Paper

Learning to Collaborate with Unknown Agents in the Absence of Reward

  • Zuyuan Zhang
  • Hanhan Zhou
  • Mahdi Imani
  • Taeyoung Lee
  • Tian Lan

With the advancement of artificial intelligence (AI), emerging scenarios involving close collaboration between AI and other unknown agents are becoming increasingly common. This sometimes requires training AI agents to collaborate with unknown agents in the absence of a reward function -- which may be unavailable to the AI agents or even undefined by the unknown agents themselves -- thus posing new challenges to existing learning algorithms that often require knowing the shared reward. In this paper, we show that effective teaming with unknown agents can be achieved in the absence of a reward function, through actively modeling other unknown agents and reasoning about their latent rewards from available interaction/observation history. In particular, we propose a novel framework that leverages a kernel density Bayesian inverse learning method for active reward/goal inference and prove that multi-agent reinforcement learning guided by the inferred reward signals can converge to an optimal policy for teaming with unknown agents. The result enables us to develop an adaptive policy update strategy, through the use of a family of pre-trained, goal-conditioned policies, further eliminating the need for online retraining. The proposed solution is evaluated using a wide range of diverse unknown agents with latent and even non-stationary rewards. Our solution significantly increases the teaming performance between AI and unknown agents in the absence of a reward function.
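
As a rough, minimal sketch of the inference loop the abstract describes (our illustration, not the authors' code), the snippet below infers an unknown agent's latent goal from its recent trajectory with a Gaussian kernel density estimate and then switches to a matching goal-conditioned policy; the candidate goals, the agent's dynamics, and the kernel bandwidth are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
goals = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])  # candidate latent goals (assumed)

def kde_likelihood(observations, goal, bandwidth=1.0):
    # Gaussian-kernel density of recent states under a goal hypothesis:
    # trajectories headed toward `goal` put probability mass near it.
    diffs = observations - goal
    return np.exp(-0.5 * np.sum(diffs**2, axis=1) / bandwidth**2).mean()

# Simulate an unknown agent drifting toward goal #1 (assumed dynamics).
true_goal = goals[1]
pos, traj = np.zeros(2), []
for _ in range(20):
    pos = pos + 0.3 * (true_goal - pos) + 0.1 * rng.normal(size=2)
    traj.append(pos.copy())
traj = np.array(traj)

prior = np.ones(len(goals)) / len(goals)
likelihoods = np.array([kde_likelihood(traj[-5:], g) for g in goals])
posterior = prior * likelihoods
posterior /= posterior.sum()

print("posterior over goals:", np.round(posterior, 3))
print("switch to the pre-trained policy conditioned on goal:", goals[np.argmax(posterior)])
```

In the paper's setting this goal posterior would select among a family of pre-trained, goal-conditioned policies, avoiding online retraining.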

RLC Conference 2025 Conference Paper

Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference

  • Yuxin Lin
  • Seyede Fatemeh Ghoreishi
  • Tian Lan
  • Mahdi Imani

Effective collaboration between humans and AI agents is increasingly essential as autonomous systems take on critical roles in domains like disaster response, healthcare, and robotics. However, achieving robust human-AI collaboration remains challenging due to the uncertainty, complexity, and unpredictability of human behavior, which is often difficult to convey explicitly to AI agents. This paper presents a belief-space reinforcement learning framework that enables AI agents to implicitly and probabilistically infer latent human intentions from behavioral data and integrate this understanding into robust decision-making. Our approach models human behavior at both the action (low) and subtask (high) levels, combining these with human and agent state information to construct a comprehensive belief state for the AI agent. We demonstrate that this belief state follows the Markov property, enabling the derivation of an optimal Bayesian policy under human and task uncertainty. Deep reinforcement learning is used to train an offline Bayesian policy across a wide range of human and task uncertainties, allowing real-time deployment to support effective human-AI collaboration. Numerical experiments demonstrate the effectiveness of the proposed policy in terms of cooperation, adaptability, and robustness.
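
A minimal sketch of the belief-state construction (ours, with invented subtasks, intent dynamics, and action likelihoods): a Bayes filter over latent human subtasks whose output belief vector would be concatenated with the environment state before being fed to the offline-trained policy.

```python
import numpy as np

subtasks = ["fetch", "deliver", "idle"]               # assumed latent intents
n = len(subtasks)
transition = np.full((n, n), 0.1) + 0.7 * np.eye(n)   # sticky intent dynamics (rows sum to 1)

# P(observed human action | subtask); rows: subtasks, columns: 4 discrete actions
action_lik = np.array([[0.6, 0.2, 0.1, 0.1],
                       [0.1, 0.6, 0.2, 0.1],
                       [0.2, 0.1, 0.1, 0.6]])

def belief_update(belief, action):
    # predict with the intent dynamics, then correct with the action likelihood
    predicted = transition.T @ belief
    posterior = action_lik[:, action] * predicted
    return posterior / posterior.sum()

belief = np.ones(n) / n
for obs_action in [0, 0, 1, 1, 1]:                    # observed human actions
    belief = belief_update(belief, obs_action)
    print(dict(zip(subtasks, np.round(belief, 3))))

# The policy input would then be, e.g., np.concatenate([env_state, belief]).
```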

RLJ Journal 2025 Journal Article

Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference

  • Yuxin Lin
  • Seyede Fatemeh Ghoreishi
  • Tian Lan
  • Mahdi Imani

(Abstract identical to the RLC 2025 conference version of this paper, listed above.)

ICLR Conference 2024 Conference Paper

Bayesian Optimization through Gaussian Cox Process Models for Spatio-temporal Data

  • Yongsheng Mei
  • Mahdi Imani
  • Tian Lan 0001

Bayesian optimization (BO) has established itself as a leading strategy for efficiently optimizing expensive-to-evaluate functions. Existing BO methods mostly rely on Gaussian process (GP) surrogate models and are not applicable to (doubly-stochastic) Gaussian Cox processes, where the observation process is modulated by a latent intensity function modeled as a GP. In this paper, we propose a novel maximum *a posteriori* inference method for Gaussian Cox processes. It leverages the Laplace approximation and a change-of-kernel technique to transform the problem into a new reproducing kernel Hilbert space, where it becomes computationally more tractable. This enables us to obtain both a functional posterior of the latent intensity function and the covariance of the posterior, thus extending existing works that often focus on specific link functions or estimating the posterior mean. Using the result, we propose a BO framework based on the Gaussian Cox process model and further develop a Nyström approximation for efficient computation. Extensive evaluations on various synthetic and real-world datasets demonstrate significant improvement over state-of-the-art inference solutions for Gaussian Cox processes, as well as effective BO with a wide range of acquisition functions designed through the underlying Gaussian Cox process model.
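
The following is a minimal sketch of MAP intensity inference under simplifying assumptions (gridded 1-D domain, exponential link, Newton iterations for the Laplace mode); the paper's change-of-kernel RKHS construction and Nyström approximation are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 50)
width = grid[1] - grid[0]
base_rate = 40.0                                   # assumed overall event scale

def rbf(a, b, ls=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

K = rbf(grid, grid) + 1e-6 * np.eye(len(grid))
true_f = np.linalg.cholesky(K) @ rng.normal(size=len(grid))
counts = rng.poisson(np.exp(true_f) * width * base_rate)   # simulated bin counts

# Newton iterations for the mode of p(f | counts): Poisson likelihood,
# exponential link, GP prior -- i.e., the Laplace/MAP step.
f = np.zeros(len(grid))
K_inv = np.linalg.inv(K)
for _ in range(30):
    lam = np.exp(f) * width * base_rate
    grad = (counts - lam) - K_inv @ f
    hess = -np.diag(lam) - K_inv
    f -= np.linalg.solve(hess, grad)

print("MAP intensity at the domain midpoint:", np.exp(f[25]) * base_rate)
```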

ECAI Conference 2023 Conference Paper

A Bayesian Optimization Framework for Finding Local Optima in Expensive Multimodal Functions

  • Yongsheng Mei
  • Tian Lan 0001
  • Mahdi Imani
  • Suresh Subramaniam 0001

Bayesian optimization (BO) is a popular global optimization scheme for sample-efficient optimization in domains with expensive function evaluations. The existing BO techniques are capable of finding a single global optimum solution. However, finding a set of global and local optimum solutions is crucial in a wide range of real-world problems, as implementing some of the optimal solutions might not be feasible due to various practical restrictions (e.g., resource limitations, physical constraints). In such domains, if multiple solutions are known, the implementation can be quickly switched to another solution, and the best possible system performance can still be obtained. This paper develops a multimodal BO framework to effectively find a set of local/global solutions for expensive-to-evaluate multimodal objective functions. We consider the standard BO setting with Gaussian process regression representing the objective function. We analytically derive the joint distribution of the objective function and its first-order derivatives. This joint distribution is used in the body of the BO acquisition functions to search for local optima during the optimization process. We introduce variants of the well-known BO acquisition functions to the multimodal setting and demonstrate the performance of the proposed framework in locating a set of local optimum solutions using multiple optimization problems.
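
A minimal sketch of the core identity the framework builds on (illustrative, with an assumed RBF kernel and test function sin(2x)): a GP and its derivative are jointly Gaussian, so the derivative's posterior mean is available in closed form, and candidate local optima appear where it crosses zero.

```python
import numpy as np

ls = 0.3
def k(a, b):                 # RBF kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)
def dk(a, b):                # d k(a, b) / d a: cross-covariance with f'(a)
    return -(a[:, None] - b[None, :]) / ls**2 * k(a, b)

X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(2 * X)            # a multimodal test function
Kxx = k(X, X) + 1e-8 * np.eye(len(X))
alpha = np.linalg.solve(Kxx, y)

xs = np.linspace(0, 2, 200)
mean_df = dk(xs, X) @ alpha                              # posterior mean of f'
sign_change = np.where(np.diff(np.sign(mean_df)) != 0)[0]
print("estimated stationary points:", np.round(xs[sign_change], 2))
# The true stationary point of sin(2x) on [0, 2] is pi/4, roughly 0.79.
```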

IS Journal 2022 Journal Article

Bayesian Optimization for Expensive Smooth-Varying Functions

  • Mahdi Imani
  • Mohsen Imani
  • Seyede Fatemeh Ghoreishi

Bayesian optimization (BO) is a powerful class of data-driven techniques for the maximization of expensive-to-evaluate objective functions. These techniques construct a Gaussian process (GP) regression for representing the objective function according to the latest available function evaluations and sequentially select samples and evaluate the function by maximizing an acquisition function. The primary assumption in most BO policies is that the objective function has a uniform level of smoothness over the input space, modeled by a kernel function. However, the uniform smoothness assumption is likely to be violated in a wide range of practical problems, particularly in domains where the objective function is evaluated differently across regions of the input space (e.g., through different experiments, software, or approximators). This article develops a BO framework capable of optimizing expensive smooth-varying functions. Unlike the existing techniques that rely on a single GP model, the proposed framework constructs a set of local and global GP models to represent the objective function. The predictive mean and variance at any given sample in the input space are computed according to the posterior probabilities of the local and global GP models. Local and global models are adaptively controlled through a single parameter, which can be optimized along with other GP models’ parameters during the optimization process. Using the predicted local and global values, the expected improvement acquisition function is employed as one of the possible acquisition functions for the selection process. The performance of the proposed framework is assessed extensively through two optimization benchmark problems.
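
A minimal sketch of the local/global idea as we read it (the weighting rule below is a marginal-likelihood heuristic of ours, not the article's single blending parameter): fit a short-length-scale GP near the query and a long-length-scale GP on all data, then blend their predictions by each model's posterior score.

```python
import numpy as np

def rbf(a, b, ls):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

def gp_fit(X, y, ls, noise=1e-4):
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = -0.5 * y @ alpha - np.log(np.diag(L)).sum()  # log marginal likelihood (up to constants)
    return alpha, lml

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 4, 40))
y = np.where(X < 2, np.sin(8 * X), 0.2 * X)        # smoothness varies over the input space

x_star = np.array([1.0])
local_mask = np.abs(X - x_star[0]) < 0.5
models = [(X[local_mask], y[local_mask], 0.1),     # local GP, short length scale
          (X, y, 1.0)]                             # global GP, long length scale

means, scores = [], []
for Xi, yi, ls in models:
    alpha, lml = gp_fit(Xi, yi, ls)
    means.append((rbf(x_star, Xi, ls) @ alpha).item())
    scores.append(lml / len(yi))                   # per-point score so data sizes compare fairly
scores = np.array(scores)
w = np.exp(scores - scores.max())
w /= w.sum()
print("blended prediction at x* = 1:", w @ np.array(means))
```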

IS Journal 2021 Journal Article

Optimal Finite-Horizon Perturbation Policy for Inference of Gene Regulatory Networks

  • Mahdi Imani
  • Seyede Fatemeh Ghoreishi

A major goal of systems biology is to model accurately the complex dynamical behavior of gene regulatory networks (GRNs). Despite several advances in GRN inference, two main issues continue to make the problem challenging: 1) nonidentifiability of parameters and 2) limited amounts of data. Thus, it becomes necessary to experimentally perturb or excite the system into different states. This perturbation process switches the expression of genes from active to inactive, or vice versa, at each time point. Another issue is the partial observability of the gene states, which must be inferred indirectly from noisy gene expression measurements. In this article, this latter issue is accounted for by employing the partially observed Boolean dynamical system signal model for the data and applying optimal state estimation. Then, the optimal finite-horizon perturbation policy is derived to achieve the highest possible expected performance for the maximum a posteriori estimator under a small perturbation cost. Performance is assessed through numerical experiments using the well-known p53-MDM2 negative-feedback loop regulatory model and synthetic GRNs.
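
A minimal sketch of the underlying filtering problem (with a toy 3-gene network rather than the p53-MDM2 model, and a hard-coded excitation in place of the derived optimal finite-horizon policy): a partially observed Boolean dynamical system with a binary perturbation input, plus an exact Bayes filter over the 2^3 states for MAP estimation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
states = np.array(list(itertools.product([0, 1], repeat=3)))  # all 2^3 gene states

def net(x):
    # toy regulatory rules: each gene's next value from the current state
    return np.array([x[1] & x[2], 1 - x[0], x[0] | x[1]])

def step(x, perturb, p_noise=0.05):
    nxt = np.bitwise_xor(net(x), perturb)          # perturbation flips chosen genes
    flips = (rng.random(3) < p_noise).astype(int)  # Boolean process noise
    return np.bitwise_xor(nxt, flips)

def obs_lik(y, x, p_err=0.2):
    # each gene is read out with error probability p_err
    agree = int((y == x).sum())
    return (1 - p_err)**agree * p_err**(3 - agree)

belief = np.ones(len(states)) / len(states)
x = np.array([1, 0, 1])
for t in range(6):
    perturb = np.array([1, 0, 0]) if t == 2 else np.zeros(3, int)  # hard-coded excitation
    x = step(x, perturb)
    y = np.bitwise_xor(x, (rng.random(3) < 0.2).astype(int))       # noisy readout
    # predict: push the belief through the dynamics (process noise ignored for brevity)
    pred = np.zeros(len(states))
    for i, s in enumerate(states):
        j = int("".join(map(str, np.bitwise_xor(net(s), perturb))), 2)
        pred[j] += belief[i]
    belief = pred * np.array([obs_lik(y, s) for s in states])      # correct with y
    belief /= belief.sum()
    print(f"t={t}  MAP estimate: {states[np.argmax(belief)]}  true: {x}")
```

The paper's contribution is choosing the perturbation sequence optimally over a finite horizon; the sketch only shows the filtering substrate that such a policy would act on.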

AAAI Conference 2019 Conference Paper

MFBO-SSM: Multi-Fidelity Bayesian Optimization for Fast Inference in State-Space Models

  • Mahdi Imani
  • Seyede Fatemeh Ghoreishi
  • Douglas Allaire
  • Ulisses M. Braga-Neto

Nonlinear state-space models are ubiquitous in modeling real-world dynamical systems. Sequential Monte Carlo (SMC) techniques, also known as particle methods, are a well-known class of parameter estimation methods for this general class of state-space models. Existing SMC-based techniques rely on excessive sampling of the parameter space, which makes their computation intractable for large systems or tall data sets. Bayesian optimization techniques have been used for fast inference in state-space models with intractable likelihoods. These techniques aim to find the maximum of the likelihood function by sequential sampling of the parameter space through a single SMC approximator. Various SMC approximators with different fidelities and computational costs are often available for sample-based likelihood approximation. In this paper, we propose a multi-fidelity Bayesian optimization algorithm for the inference of general nonlinear state-space models (MFBO-SSM), which enables simultaneous sequential selection of parameters and approximators. The accuracy and speed of the algorithm are demonstrated by numerical experiments using synthetic gene expression data from a gene regulatory network model and real data from the VIX stock price index.
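
A conceptual, minimal sketch of simultaneous parameter/approximator selection: a synthetic noisy log-likelihood whose variance shrinks with the particle count stands in for a real SMC approximator, and a crude value-per-cost rule stands in for the paper's acquisition function.

```python
import numpy as np

rng = np.random.default_rng(4)
fidelities = {100: 1.0, 1000: 10.0}       # particle count -> evaluation cost (assumed)

def noisy_loglik(theta, n_particles):
    # stand-in for an SMC likelihood estimate: noise shrinks with particle count
    true_ll = -50 * (theta - 0.7)**2      # pretend the likelihood peaks at theta = 0.7
    return true_ll + rng.normal(scale=5.0 / np.sqrt(n_particles))

thetas = np.linspace(0, 1, 21)
est = np.full(len(thetas), -np.inf)
spend = 0.0
for _ in range(40):
    unseen = np.where(np.isinf(est))[0]
    if len(unseen) > 0:                   # explore unseen parameters cheaply
        i, n = rng.choice(unseen), 100
    else:                                 # refine the incumbent at high fidelity
        i, n = int(np.argmax(est)), 1000
    est[i] = noisy_loglik(thetas[i], n)
    spend += fidelities[n]
print("estimated maximizer:", thetas[np.argmax(est)], "total cost:", spend)
```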

NeurIPS Conference 2018 Conference Paper

Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments

  • Mahdi Imani
  • Seyede Fatemeh Ghoreishi
  • Ulisses M. Braga-Neto

We propose a Bayesian decision-making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Most of the existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework and rely on large data sets acquired by sustained direct interaction with the system or via a simulator. This is not feasible in many applications, due to ethical, economic, and physical constraints. The proposed framework addresses the data poverty issue by decomposing the problem into an offline planning stage that does not rely on sustained direct interaction with the system or simulator and an online execution stage. In the offline stage, parallel Gaussian process temporal difference (GPTD) learning techniques are employed for near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of unknown parameters. In the online stage, the action with the maximum expected return with respect to the posterior distribution of the parameters is selected. This is achieved by an approximation of the posterior distribution using a Markov Chain Monte Carlo (MCMC) algorithm, followed by constructing multiple Gaussian processes over the parameter space for efficient prediction of the means of the expected return at the MCMC samples. The effectiveness of the proposed framework is demonstrated using a simple dynamical system model with continuous state and action spaces, as well as a more complex model for a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data.
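
A drastically simplified sketch of the offline/online split (no GPTD or MCMC; exact toy computations over prior samples instead): offline, returns are computed per action under parameter samples from the prior; online, observed outcomes reweight those samples and the action with the highest posterior expected return is selected.

```python
import numpy as np

rng = np.random.default_rng(5)
prior_samples = rng.uniform(0.01, 0.99, size=200)    # unknown dynamics parameter p

def expected_return(action, p):
    # toy model: action 0 pays 1 with probability p, action 1 pays 0.6 for sure
    return p if action == 0 else 0.6

# offline stage: expected returns per (action, prior sample); no system interaction
offline_q = np.array([[expected_return(a, p) for p in prior_samples] for a in (0, 1)])

# online stage: a few observed action-0 payoffs reweight the prior samples
observations = [1, 1, 0, 1]
log_w = np.array([sum(np.log(p) if o else np.log(1 - p) for o in observations)
                  for p in prior_samples])
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_q = offline_q @ w                          # posterior expected return per action
print("posterior expected returns:", np.round(posterior_q, 3))
print("chosen action:", int(np.argmax(posterior_q)))
```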