A neurosymbolic approach to AI alignment

Benedikt J. Wagner; Artur d’Avila Garcez

doi:10.3233/nai-240729

NAI 2024

Journal Article · Artificial Intelligence · Neurosymbolic AI

Abstract

We propose neurosymbolic integration as an approach to AI alignment via concept-based model explanation. The aim is to give AI systems the ability to learn from human revision and, at the same time, to assist humans in evaluating AI capabilities. The proposed method allows users and domain experts to learn about the data-driven decision-making process of large neural network models and to impose a particular behaviour on such models. The models are queried using a symbolic logic language that acts as a lingua franca between humans and model representations. Interaction with the user then confirms or rejects a revision of the model using logical constraints that can be distilled back into the neural network. We illustrate the approach using the Logic Tensor Network framework alongside Concept Activation Vectors, and apply it to Convolutional Neural Networks and the task of achieving quantitative fairness. Our results illustrate how a logical language can provide users with a formalisation of the model’s decision making whilst allowing them to steer the model towards a given alignment constraint.
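As a rough illustration of the query-then-constrain loop described in the abstract, the sketch below scores how strongly a rule of the form "for all x, Concept(x) implies Positive(x)" holds over a batch. It is not the authors' implementation. All function names are hypothetical, and the fuzzy semantics chosen here (a sigmoid over the projection onto a concept direction, the Reichenbach implication, and a p-mean-error aggregator for the universal quantifier) are assumptions made for the example.

```python
import math


def concept_score(activation, cav):
    """Truth degree of a concept: sigmoid of the projection onto a concept direction.

    `cav` stands in for a Concept Activation Vector; treating the sigmoid of the
    dot product as a fuzzy truth value is an assumption of this sketch.
    """
    dot = sum(a * c for a, c in zip(activation, cav))
    return 1.0 / (1.0 + math.exp(-dot))


def implies(a, b):
    """Reichenbach fuzzy implication: 1 - a + a * b."""
    return 1.0 - a + a * b


def forall(truths, p=2):
    """Universal quantifier as a p-mean-error aggregator over truth degrees."""
    truths = list(truths)
    mean_error = sum((1.0 - t) ** p for t in truths) / len(truths)
    return 1.0 - mean_error ** (1.0 / p)


def rule_satisfaction(concepts, predictions):
    """Satisfaction of: forall x. Concept(x) -> Positive(x)."""
    return forall(implies(c, y) for c, y in zip(concepts, predictions))


def constraint_loss(concepts, predictions):
    """Loss that shrinks as the rule becomes better satisfied."""
    return 1.0 - rule_satisfaction(concepts, predictions)


if __name__ == "__main__":
    # Hypothetical batch: concept truth degrees and model outputs in [0, 1].
    concepts = [0.9, 0.2, 0.7]
    predictions = [0.8, 0.1, 0.3]
    print("satisfaction:", rule_satisfaction(concepts, predictions))
    print("loss:", constraint_loss(concepts, predictions))
```

Under these assumptions, querying the model amounts to reading off the satisfaction value, while a revision accepted by the user could be imposed by adding `constraint_loss` to the training objective of a differentiable implementation.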
