AAAI Conference 2025 Short Paper
Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks
- Ximing Wen
Pretrained transformer-based Language Models (LMs) are well known for achieving significant improvements on NLP tasks, but their black-box nature, and the resulting lack of interpretability, has been a major concern. My dissertation focuses on developing intrinsically interpretable models that use LMs as encoders while maintaining their superior performance via prototypical networks. I began my research by investigating how to improve the performance of interpretable models for sarcasm detection. My proposed approach captures sentiment incongruity to enhance accuracy while offering instance-based explanations for its classification decisions. Later, we developed a novel white-box multi-head graph attention-based prototypical framework designed to explain the decisions of text classification models without sacrificing the accuracy of the original black-box LMs. In addition, I am extending this attention-based prototypical framework with contrastive learning to redesign an interpretable graph neural network for document classification, aiming to enhance both the interpretability and the performance of the model.