Author name cluster

Mehran Mesbahi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

AAAI Conference 2025 Conference Paper

Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems

Joshua Holder
Natasha Jaques
Mehran Mesbahi

The assignment problem is a classic combinatorial optimization problem in which a group of agents must be assigned to a group of tasks such that maximum utility is achieved while satisfying assignment constraints. Given the utility of each agent completing each task, polynomial-time algorithms exist to solve a single assignment problem in its simplest form. However, in many modern-day applications such as satellite constellations, power grids, and mobile robot scheduling, assignment problems unfold over time, with the utility for a given assignment depending heavily on the state of the system. We apply multi-agent reinforcement learning to this problem, learning the value of assignments by bootstrapping from the known polynomial-time greedy solver and then learning from further experience. We then choose assignments using a distributed optimal assignment mechanism rather than by selecting them directly. We demonstrate that this algorithm is theoretically justified and avoids pitfalls experienced by other RL algorithms in this setting. Finally, we show that our algorithm significantly outperforms other methods in the literature, even while scaling to realistic scenarios with hundreds of agents and tasks.
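The following is a minimal illustration of the single-shot assignment problem described above. It is not the paper's method. The helpers `optimal_assignment` and `greedy_assignment` are hypothetical names introduced here.

- `optimal_assignment` finds the exact maximum-utility assignment by brute force over permutations. This only works for tiny instances; the Hungarian algorithm is the usual polynomial-time choice.
- `greedy_assignment` is a simple heuristic that repeatedly takes the best remaining agent-task pair. This greedy rule is an assumption and may differ from the solver used in the paper.

```python
from itertools import permutations


def optimal_assignment(utility):
    """Exact max-utility one-to-one assignment by brute force.

    utility[i][j] is the utility of agent i doing task j (n_agents <= n_tasks).
    Runs in factorial time, so it is only for tiny illustrative instances.
    """
    n_agents, n_tasks = len(utility), len(utility[0])
    best_value, best_perm = float("-inf"), None
    # Each ordered choice of distinct tasks is one feasible assignment.
    for perm in permutations(range(n_tasks), n_agents):
        value = sum(utility[i][perm[i]] for i in range(n_agents))
        if value > best_value:
            best_value, best_perm = value, perm
    return list(best_perm), best_value


def greedy_assignment(utility):
    """Repeatedly take the highest-utility pair whose agent and task are both free."""
    n_agents, n_tasks = len(utility), len(utility[0])
    pairs = sorted(
        ((utility[i][j], i, j) for i in range(n_agents) for j in range(n_tasks)),
        reverse=True,
    )
    assignment, used_tasks, total = [None] * n_agents, set(), 0
    for value, i, j in pairs:
        if assignment[i] is None and j not in used_tasks:
            assignment[i] = j
            used_tasks.add(j)
            total += value
    return assignment, total


utility = [[10, 9], [9, 1]]
print(optimal_assignment(utility))  # ([1, 0], 18)
print(greedy_assignment(utility))   # ([0, 1], 11)
```

On this utility matrix the exact solution pairs agent 0 with task 1 and agent 1 with task 0, for a total of 18. The greedy rule locks in the single best pair (utility 10) first and ends at 11, which shows how greedy selection can fall short of the optimum.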

NeurIPS Conference 2023 Conference Paper

Data-driven Optimal Filtering for Linear Systems with Unknown Noise Covariances

Shahriar Talebi
Amirhossein Taghvaei
Mehran Mesbahi

This paper examines learning the optimal filtering policy, known as the Kalman gain, for a linear system with unknown noise covariance matrices using noisy output data. The learning problem is formulated as a stochastic policy optimization problem, aiming to minimize the output prediction error. This formulation provides a direct bridge between data-driven optimal control and its dual, optimal filtering. Our contributions are twofold. First, we conduct a thorough convergence analysis of the stochastic gradient descent algorithm, adapted for the filtering problem, accounting for biased gradients and stability constraints. Second, we carefully leverage a combination of tools from linear system theory and high-dimensional statistics to derive bias-variance error bounds that scale logarithmically with problem dimension, and, in contrast to subspace methods, the length of output trajectories only affects the bias term.
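The following is a scalar sketch of the formulation. It assumes known `a`, `c`, `Q`, `R` purely so that the objective can be evaluated in closed form; the paper instead works from noisy output data with unknown covariances.

- The predictor is `xhat_{t+1} = a*xhat_t + L*(y_t - c*xhat_t)`.
- Whenever `|a - L*c| < 1`, the steady-state output prediction error is `c^2 * (Q + L^2 R) / (1 - (a - L c)^2) + R`.
- Gradient descent on this objective, using a finite-difference gradient, should recover the steady-state Kalman gain obtained from the Riccati recursion.

All function names and parameter values are hypothetical.

```python
def output_prediction_error(L, a, c, Q, R):
    """Steady-state E[(y_t - c*xhat_t)^2] for the predictor
    xhat_{t+1} = a*xhat_t + L*(y_t - c*xhat_t), with
    x_{t+1} = a*x_t + w_t, y_t = c*x_t + v_t, Var(w) = Q, Var(v) = R."""
    closed_loop = a - L * c
    if abs(closed_loop) >= 1.0:
        return float("inf")  # unstable predictor: the error variance diverges
    # The error e = x - xhat obeys e_{t+1} = (a - L c) e_t + w_t - L v_t.
    error_var = (Q + L * L * R) / (1.0 - closed_loop ** 2)
    return c * c * error_var + R


def kalman_gain_riccati(a, c, Q, R, iters=1000):
    """Steady-state predictor gain from iterating the filtering Riccati recursion."""
    P = Q
    for _ in range(iters):
        P = a * a * P + Q - (a * P * c) ** 2 / (c * c * P + R)
    return a * P * c / (c * c * P + R)


def kalman_gain_by_gradient(a, c, Q, R, L0, step=0.05, iters=1000, h=1e-6):
    """Plain gradient descent on the prediction error with a central
    finite-difference gradient. L0 must give a stable predictor (|a - L0*c| < 1)."""
    L = L0
    for _ in range(iters):
        grad = (output_prediction_error(L + h, a, c, Q, R)
                - output_prediction_error(L - h, a, c, Q, R)) / (2 * h)
        L -= step * grad
    return L


L_star = kalman_gain_riccati(a=0.9, c=1.0, Q=1.0, R=1.0)
L_gd = kalman_gain_by_gradient(a=0.9, c=1.0, Q=1.0, R=1.0, L0=0.9)
print(round(L_star, 4), round(L_gd, 4))  # 0.5377 0.5377
```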

ICML Conference 2018 Conference Paper

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

Maryam Fazel
Rong Ge 0001
Sham M. Kakade
Mehran Mesbahi

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model, 2) they are an “end-to-end” approach, directly optimizing the performance metric of interest, and 3) they inherently allow for richly parameterized policies. A notable drawback is that even in the most basic continuous control problem (that of linear quadratic regulators), these methods must solve a non-convex optimization problem, where little is understood about their efficiency from both computational and statistical perspectives. In contrast, system identification and model-based planning in optimal control theory have a much more solid theoretical footing, where much is known with regard to their computational and statistical properties. This work bridges this gap by showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regard to their sample and computational complexities.
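The following is a scalar sketch of the idea with a deterministic initial state. It is not the paper's algorithm.

- The optimizer only queries the cost of rollouts under the linear policy `u = -k x`.
- It estimates the gradient by central finite differences.
- It should still converge to the gain given by the discrete-time Riccati equation.

All of the following are illustrative assumptions: the system `x_{t+1} = a x_t + b u_t` with `a = 1.2` (open-loop unstable), the horizon, the step size, and the function names. The initial gain must be stabilizing (`|a - b k| < 1`).

```python
def rollout_cost(k, a, b, q, r, x0=1.0, horizon=200):
    """Black-box cost of the policy u = -k x on x_{t+1} = a x_t + b u_t,
    summing q x^2 + r u^2 over a finite horizon (a stand-in for infinite)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


def policy_gradient_lqr(a, b, q, r, k0, step=0.05, iters=1000, h=1e-5):
    """Gradient descent on k using only rollout-cost evaluations
    (central finite differences); k0 must be stabilizing."""
    k = k0
    for _ in range(iters):
        grad = (rollout_cost(k + h, a, b, q, r)
                - rollout_cost(k - h, a, b, q, r)) / (2 * h)
        k -= step * grad
    return k


def riccati_gain(a, b, q, r, iters=1000):
    """Model-based reference: optimal gain from the discrete-time Riccati recursion."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


k_star = riccati_gain(a=1.2, b=1.0, q=1.0, r=1.0)
k_pg = policy_gradient_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.2)
print(round(k_star, 4), round(k_pg, 4))  # 0.7935 0.7935
```

The Riccati routine is included only as a model-based check. The gradient loop itself never reads `a` or `b` directly and sees the system only through `rollout_cost`.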

ICRA Conference 2015 Conference Paper

Efficient leader selection for translation and scale of a bearing-compass formation

Eric Schoof
Airlie Chapman
Mehran Mesbahi

The paper considers the efficient selection of leader agents in a swarm running a distributed bearing-compass formation controller. The leaders apply external control that induces translation and scaling of the formation, providing manipulation methods useful to a human operator. The selection algorithms for maximizing translation and scale draw on modularity and submodularity theory, respectively; consequently, the translation algorithm is guaranteed optimal, while the scale algorithm carries a guaranteed suboptimality bound. For more restricted human-swarm interaction requiring pure translation and scale, a relaxed integer programming algorithm is described that reduces the combinatorial optimization problem to a computationally tractable semidefinite program. The leader selection strategies are demonstrated on a swarm testbed.
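Submodularity is what makes greedy selection attractive here. The sketch below uses a hypothetical stand-in for the paper's actual scaling objective, which is not reproduced here.

- It greedily picks `k` leaders to maximize a coverage function, defined as the number of distinct items covered.
- Coverage is a standard example of a monotone submodular set function.
- For such functions under a cardinality constraint, the greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimum.

```python
def greedy_leader_selection(coverage, k):
    """Greedy maximization of a coverage objective under |leaders| <= k.

    coverage: dict mapping candidate agent -> set of items it covers
    (hypothetical stand-in for the paper's objective). Returns the chosen
    leaders in selection order and the set of items they cover.
    """
    selected, covered = [], set()
    for _ in range(k):
        remaining = [c for c in sorted(coverage) if c not in selected]
        if not remaining:
            break
        # Marginal gain of a candidate = number of newly covered items;
        # max() keeps the first candidate on ties.
        best = max(remaining, key=lambda c: len(coverage[c] - covered))
        selected.append(best)
        covered |= coverage[best]
    return selected, covered


coverage = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1}}
print(greedy_leader_selection(coverage, k=2))  # ([0, 2], {1, 2, 3, 4, 5, 6})
```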

IROS Conference 2012 Conference Paper

Advection on networks with an application to decentralized load balancing

Airlie Chapman
Eric Schoof
Mehran Mesbahi

This paper examines an advection-based protocol for the coordination of a networked, multi-agent system. Diffusion forms the basis of the popular consensus dynamics and is closely related to advection. Motivated by this connection, we examine a discretized version of advection over a network, with the flow field realized through directed graph edges. We aim to demonstrate that the resulting advection protocol forms an attractive set of system dynamics for coordinated control. This paper includes a formulation of the advection dynamics on directed graphs and a presentation of some of their characteristics. We also demonstrate the versatility of the advection dynamics with a decentralized load balancing application.
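The following is a minimal sketch of advection on a weighted directed graph. It may differ in detail from the paper's formulation.

- It assumes a mass-conserving form in which node `i` ships `w_ij * x_i` per unit time along each outgoing edge `i -> j`.
- Unlike consensus, the update conserves the total quantity, which is what makes it a natural fit for load balancing.
- The forward-Euler step assumes `dt` times each node's total outgoing weight is at most 1, so that loads stay nonnegative.

```python
def advection_step(x, edges, dt):
    """One forward-Euler step of advection on a weighted directed graph.

    edges maps (i, j) -> w_ij for a directed edge i -> j; node i ships
    w_ij * x[i] per unit time to node j, so sum(x) is conserved.
    Assumes dt * (total outgoing weight of each node) <= 1.
    """
    flow = [0.0] * len(x)
    for (i, j), w in edges.items():
        moved = w * x[i]
        flow[i] -= moved
        flow[j] += moved
    return [xi + dt * fi for xi, fi in zip(x, flow)]


def simulate_advection(x0, edges, dt=0.1, steps=500):
    x = list(x0)
    for _ in range(steps):
        x = advection_step(x, edges, dt)
    return x


cycle = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
loads = simulate_advection([4.0, 0.0, 0.0, 0.0], cycle)
print([round(v, 6) for v in loads])  # [1.0, 1.0, 1.0, 1.0]
```

On a directed 4-cycle with unit weights, a load of 4 placed on one node spreads until every node holds 1, while the total stays at 4.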

ICRA Conference 2010 Conference Paper

Semi-autonomous networks: Theory and decentralized protocols

Airlie Chapman
Eric Schoof
Mehran Mesbahi

This paper examines the system dynamics of a networked multi-agent system, operating with a consensus-type algorithm, that can be influenced by external agents. We refer to this class of networks as semi-autonomous. We introduce a control scheme for such semi-autonomous networks, involving excitation of the network by the external agents, with the objective of manipulating or steering the network. In this context, we consider the situation where the external agents deliver a constant mean control signal. We proceed to examine the resulting mean and covariance of the network output and relate these quantities to circuit-theoretic notions of the network, quantifying the network's amenability to external signals. Four protocols for tree graphs are then presented, which respectively promote convergence, deter convergence, increase the average variance, and reduce the average variance within the network. These protocols involve decentralized local edge swaps that can be performed in parallel and asynchronously.
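The following is a deterministic sketch of the mean behavior. It makes these assumptions:

- agents are single integrators;
- edges are undirected with unit weight;
- an external agent broadcasts a constant signal `u` to the nodes with `b_i > 0`;
- the noise that drives the paper's covariance analysis is omitted.

For a connected graph with at least one influenced node, every agent's state converges to `u`. The function names are hypothetical.

```python
def semi_autonomous_step(x, neighbors, b, u, dt):
    """Forward-Euler step of dx_i/dt = sum_{j in N(i)} (x_j - x_i) + b_i (u - x_i).

    neighbors[i] lists i's neighbors (undirected, unit weights); b[i] > 0 means
    the external agent broadcasting the constant signal u influences node i.
    """
    return [
        xi + dt * (sum(x[j] - xi for j in neighbors[i]) + b[i] * (u - xi))
        for i, xi in enumerate(x)
    ]


def simulate_semi_autonomous(x0, neighbors, b, u, dt=0.1, steps=2000):
    x = list(x0)
    for _ in range(steps):
        x = semi_autonomous_step(x, neighbors, b, u, dt)
    return x


path = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2, external agent attached to node 0
final = simulate_semi_autonomous([0.0, 0.0, 0.0], path, b=[1.0, 0.0, 0.0], u=5.0)
print([round(v, 6) for v in final])  # [5.0, 5.0, 5.0]
```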

ICRA Conference 2010 Conference Paper

Topology control of dynamic networks in the presence of local and global constraints

Nima Moshtagh
Raman K. Mehra
Mehran Mesbahi

Formation flying (FF) is a critical element in NASA's future deep-space missions. Terrestrial Planet Finder (TPF), NASA's first space-based mission to directly observe planets outside our own solar system, will rely on FF to achieve the functionality and benefits of a large instrument using multiple smaller, lower-cost spacecraft. Many key network design problems for such FF missions can be formulated as optimization problems with local and global constraints. We develop a topology control algorithm that can be used for many network problems in the presence of local constraints, such as collision avoidance, and global constraints, such as network connectivity. The presence of contradictory objectives in topology control problems motivates a game-theoretic approach. We demonstrate that a game-theoretic technique can provide a framework for the design and analysis of many topology control problems in dynamic networks. In particular, we study the problem of motion planning for formation reconfiguration in the presence of constraints on network connectivity and inter-spacecraft collisions.
