Arrow Research search
Back to JMLR

JMLR 2023

Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees

Journal Article Articles Artificial Intelligence ยท Machine Learning

Abstract

Influence estimation analyzes how changes to the training data can lead to different model predictions; this analysis can help us better understand these predictions, the models making those predictions, and the data sets they are trained on. However, most influence-estimation techniques are designed for deep learning models with continuous parameters. Gradient-boosted decision trees (GBDTs) are a powerful and widely-used class of models; however, these models are black boxes with opaque decision-making processes. In the pursuit of better understanding GBDT predictions and generally improving these models, we adapt recent and popular influence-estimation methods designed for deep learning models to GBDTs. Specifically, we adapt representer-point methods and TracIn, denoting our new methods TREX and BoostIn, respectively; source code is available at https://github.com/jjbrophy47/treeinfluence. We compare these methods to LeafInfluence and other baselines using 5 different evaluation measures on 22 real-world data sets with 4 popular GBDT implementations. These experiments give us a comprehensive overview of how different approaches to influence estimation work in GBDT models. We find BoostIn is an efficient influence-estimation method for GBDTs that performs equally well or better than existing work while being four orders of magnitude faster. Our evaluation also suggests the gold-standard approach of leave-one-out (LOO) retraining consistently identifies the single-most influential training example but performs poorly at finding the most influential set of training examples for a given target prediction. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2023. ( edit, beta )

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Journal of Machine Learning Research
Archive span
2000-2026
Indexed papers
4180
Paper id
512779079651754380