Mohammed Abdullah Papers

AAAI Conference 2026 Conference Paper

Constrained Online Convex Optimization with Memory and Predictions

Mohammed Abdullah
George Iosifidis
Salah Eddine Elayoubi
Tijani Chahed

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.
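
To make the two performance measures named above concrete, here is one common way to formalise them. It is a sketch under assumptions rather than the paper's own definitions: the memory length m, the decision set \mathcal{X}, the comparator set \mathcal{X}^\star, and the symbols f_t, g_t, \mathrm{Regret}_T and \mathrm{CCV}_T are notation introduced here for illustration, and the paper may use a different comparator or violation measure.

```latex
% Assumed notation (not taken from the abstract):
%   x_t \in \mathcal{X}  : decision at round t
%   m                    : memory length
%   f_t, g_t             : loss and constraint functions revealed at round t
\begin{align*}
  \text{loss at round } t
    &:\; f_t(x_{t-m}, \dots, x_t), \\
  \text{constraint at round } t
    &:\; g_t(x_{t-m}, \dots, x_t) \le 0, \\
  \mathrm{Regret}_T
    &= \sum_{t=m+1}^{T} f_t(x_{t-m}, \dots, x_t)
       \;-\; \min_{x \in \mathcal{X}^\star} \sum_{t=m+1}^{T} f_t(x, \dots, x), \\
  \mathrm{CCV}_T
    &= \sum_{t=m+1}^{T} \max\bigl\{0,\; g_t(x_{t-m}, \dots, x_t)\bigr\}, \\
  \mathcal{X}^\star
    &= \bigl\{\, x \in \mathcal{X} : g_t(x, \dots, x) \le 0 \ \text{for all } t \,\bigr\}.
\end{align*}
```

Under this reading, "sublinear" means that both \mathrm{Regret}_T / T and \mathrm{CCV}_T / T vanish as T grows, so the learner's average loss approaches that of the best fixed feasible decision while its average constraint violation goes to zero.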


EWRL Workshop 2018 Workshop Paper

Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning

Mohammed Abdullah
Moez Draief
Aldo Pacchiano

We describe an application of the Wasserstein distance to Reinforcement Learning. The Wasserstein distance in question is between the distribution of mappings of a policy's trajectories into some metric space and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so, given an underlying metric, the Wasserstein distance quantifies how different policies are. This can be used to learn multiple policies that differ in terms of such Wasserstein distances by using a Wasserstein regulariser. By changing the sign of the regularisation parameter, one can instead learn a policy whose trajectory-mapping distribution is attracted to a given fixed distribution.
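
The role of the regulariser's sign can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper: trajectories are mapped to real numbers, both samples have the same size (in which case the 1-Wasserstein distance between the empirical distributions is the mean absolute difference of the sorted samples), and the regularised objective is simply the mean return plus a weighted distance. The names `wasserstein_1d`, `regularised_objective` and `lam` are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples on the real line.

    For equal-size, equally weighted samples in one dimension, the optimal
    coupling matches sorted values, so the distance is the mean absolute
    difference between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def regularised_objective(mean_return, embeddings, reference, lam):
    """Hypothetical objective to maximise: return plus lam times the distance.

    lam > 0 rewards being far from the reference distribution (repulsion);
    lam < 0 penalises distance, pulling the policy towards it (attraction).
    """
    return mean_return + lam * wasserstein_1d(embeddings, reference)


# Scalar trajectory embeddings of the current policy and a fixed reference sample.
policy_embeddings = [0.0, 1.0]
reference_sample = [2.0, 3.0]

repelled = regularised_objective(10.0, policy_embeddings, reference_sample, lam=0.5)
attracted = regularised_objective(10.0, policy_embeddings, reference_sample, lam=-0.5)
```

With a positive `lam`, a larger distance raises the objective, which is how several mutually different policies could be encouraged; with a negative `lam`, the same distance lowers it. In higher-dimensional metric spaces the distance would need an optimal-transport solver rather than this sorting shortcut.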

