Regularizing Black-box Models for Improved Interpretability

Gregory Plumb; Maruan Al-Shedivat; Ángel Alexander Cabrera; Adam Perer; Eric Xing; Ameet Talwalkar

NeurIPS 2020

Conference Paper · Artificial Intelligence · Machine Learning

Abstract

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.
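
To make the fidelity metric mentioned in the abstract concrete, the following is a minimal sketch of one common way to measure it: sample points near an input, fit a local linear explanation to the model's outputs, and report how far the fit deviates from the model. It is not the authors' implementation. The Gaussian neighborhood, the least-squares explainer, and the names `neighborhood_fidelity`, `sigma`, `n_samples`, `linear_model`, and `wiggly_model` are all assumptions made for illustration.

```python
import numpy as np


def neighborhood_fidelity(f, x, sigma=0.1, n_samples=200, seed=0):
    """Mean squared error between f and its best local linear fit around x.

    Lower values mean a linear explanation describes f more faithfully near x.
    The Gaussian neighborhood and least-squares explainer are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Assumed neighborhood: isotropic Gaussian perturbations of x.
    neighbors = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    targets = np.array([f(z) for z in neighbors])
    # Local linear explainer: least-squares fit with an intercept column.
    design = np.hstack([neighbors, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    residuals = design @ coef - targets
    return float(np.mean(residuals ** 2))


def linear_model(z):
    # Hypothetical model that a linear explanation captures exactly.
    return 2.0 * z[0] - z[1] + 0.5


def wiggly_model(z):
    # Hypothetical model that oscillates quickly within the neighborhood.
    return np.sin(20.0 * z[0]) + z[1] ** 2


x = np.array([0.3, -0.2])
fid_linear = neighborhood_fidelity(linear_model, x)
fid_wiggly = neighborhood_fidelity(wiggly_model, x)
```

In this sketch the linear model yields a fidelity error that is zero up to floating-point precision, while the oscillating model yields a much larger one. A training-time regularizer of the kind the abstract describes would need a differentiable version of such a quantity, computed in an automatic differentiation framework; the sketch only evaluates the metric on fixed models.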

Context

Venue: Annual Conference on Neural Information Processing Systems