
NAI 2024

A neurosymbolic approach to AI alignment

Journal Article · Artificial Intelligence · Neurosymbolic AI

Abstract

We propose neurosymbolic integration as an approach to AI alignment via concept-based model explanation. The aim is to give AI systems the ability to learn from human revision while also assisting humans in evaluating AI capabilities. The proposed method allows users and domain experts to learn about the data-driven decision-making process of large neural network models and to impose a particular behaviour onto such models. The models are queried using a symbolic logic language that acts as a lingua franca between humans and model representations. Interaction with the user then confirms or rejects a revision of the model using logical constraints that can be distilled back into the neural network. We illustrate the approach using the Logic Tensor Network framework alongside Concept Activation Vectors, and apply it to Convolutional Neural Networks and the task of achieving quantitative fairness. Our results illustrate how a logical language can provide users with a formalisation of the model's decision making whilst allowing them to steer the model towards a given alignment constraint.
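The loop described in the abstract (read concepts off a trained network, evaluate a logical rule over them, turn a confirmed rule back into a training signal) can be illustrated with a minimal, self-contained sketch. Everything below is an assumption for illustration, not the authors' implementation and not the Logic Tensor Network library API. All helper names (`concept_activation_vector`, `concept_score`, `implies`, `forall`, `parity_gap`) are hypothetical. The CAV is computed as a normalised difference of class means, a simpler stand-in for the trained linear classifier used by the original CAV method. The fuzzy operators (Reichenbach implication and a p-mean-error universal quantifier) are one common choice in LTN-style frameworks. A demographic parity gap stands in for "quantitative fairness", and the rule and toy data are made up.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def concept_activation_vector(concept_acts: Sequence[Vector],
                              random_acts: Sequence[Vector]) -> Vector:
    """Unit-length CAV from layer activations (hypothetical helper).

    Simplification: uses the difference of class means instead of the normal
    of a trained linear classifier. Assumes the two means differ.
    """
    diff = [c - r for c, r in zip(mean_vector(concept_acts), mean_vector(random_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def concept_score(activation: Vector, cav: Vector) -> float:
    """Fuzzy truth value in (0, 1): sigmoid of the projection onto the CAV."""
    projection = sum(a * v for a, v in zip(activation, cav))
    return 1.0 / (1.0 + math.exp(-projection))


def implies(a: float, b: float) -> float:
    """Reichenbach fuzzy implication I(a, b) = 1 - a + a * b."""
    return 1.0 - a + a * b


def forall(truths: Sequence[float], p: float = 2.0) -> float:
    """Universal quantifier as a p-mean-error aggregator over truth values."""
    error = sum((1.0 - t) ** p for t in truths) / len(truths)
    return 1.0 - error ** (1.0 / p)


def parity_gap(preds: Sequence[float], groups: Sequence[int]) -> float:
    """Demographic parity gap between group 0 and group 1 on soft predictions."""
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))


# Toy data (made up for illustration): 2-D activations from some hidden layer.
concept_acts = [[2.0, 0.0], [3.0, 1.0]]   # examples showing the concept
random_acts = [[0.0, 0.0], [1.0, 1.0]]    # random counterexamples
cav = concept_activation_vector(concept_acts, random_acts)  # [1.0, 0.0]

batch_acts = [[2.0, 0.0], [-2.0, 0.0]]    # activations for two inputs
preds = [0.9, 0.2]                        # model's soft output for each input
groups = [0, 1]                           # hypothetical protected-attribute labels

# Hypothetical query: "for all x, Concept(x) -> Positive(x)" under fuzzy semantics.
concept_truths = [concept_score(a, cav) for a in batch_acts]
sat = forall([implies(c, y) for c, y in zip(concept_truths, preds)])

# Revision: combine rule violation and a fairness penalty into a loss to minimise.
lam = 0.5  # hypothetical weight on the fairness term
loss = (1.0 - sat) + lam * parity_gap(preds, groups)
```

Minimising a loss like `loss` with a gradient-based optimiser is one way a confirmed constraint could be distilled back into a network. That step would need an autodiff framework, because this pure-Python sketch only evaluates the loss. Raising `p` in `forall` makes the quantifier weight the worst-violating inputs more heavily.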


Keywords

No keywords are indexed for this paper.

Context

Venue: Neurosymbolic Artificial Intelligence
Archive span: 2024-2026
Indexed papers: 43
Paper id: 173869330782335812