Author name cluster

Malek Mechergui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2024 Conference Paper

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch

Malek Mechergui
Sarath Sreedharan

Detecting and handling misspecified objectives, such as reward functions, has been widely recognized as one of the central challenges within the domain of Artificial Intelligence (AI) safety research. However, even with the recognition of the importance of this problem, we are unaware of any works that attempt to provide a clear definition for what constitutes (a) misspecified objectives and (b) successfully resolving such misspecifications. In this work, we use the theory of mind, i.e., the human user's beliefs about the AI agent, as a basis to develop a formal explanatory framework, called Expectation Alignment (EAL), to understand the objective misspecification and its causes. Our EAL framework not only acts as an explanatory framework for existing works but also provides us with concrete insights into the limitations of existing methods to handle reward misspecification and novel solution strategies. We use these insights to propose a new interactive algorithm that uses the specified reward to infer potential user expectations about the system behavior. We show how one can efficiently implement this algorithm by mapping the inference problem into linear programs. We evaluate our method on a set of standard Markov Decision Process (MDP) benchmarks.


AAAI Conference 2024 Conference Paper

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI

Malek Mechergui
Sarath Sreedharan

While the question of misspecified objectives has gotten much attention in recent years, most works in this area primarily focus on the challenges related to the complexity of the objective specification mechanism (for example, the use of reward functions). However, the complexity of the objective specification mechanism is just one of many reasons why the user may have misspecified their objective. A foundational cause for misspecification that is being overlooked by these works is the inherent asymmetry in human expectations about the agent's behavior and the behavior generated by the agent for the specified objective. To address this, we propose a novel formulation for the objective misspecification problem that builds on the human-aware planning literature, which was originally introduced to support explanation and explicable behavioral generation. Additionally, we propose a first-of-its-kind interactive algorithm that is capable of using information generated under incorrect beliefs about the agent to determine the true underlying goal of the user.


AAMAS Conference 2023 Conference Paper

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI

Malek Mechergui
Sarath Sreedharan

Value alignment problems arise in scenarios where the specified objectives of an AI agent don’t match the true underlying objectives of its users. While value alignment remains a popular topic within AI safety research, most existing works in this sphere tend to overlook one of the foundational causes for misalignment, namely the inherent asymmetry in human expectations about the agent’s behavior and the behavior generated by the agent for the specified objective. To address this lacuna, we propose a novel formulation for the value alignment problem, named Human-aware goal alignment, that highlights this central challenge related to value alignment. Additionally, we propose a first-of-its-kind interactive goal elicitation algorithm that is capable of using information generated under incorrect beliefs about the agent to determine the true underlying goal of the user.
