AAAI Conference 2022 Conference Paper
Explainable Survival Analysis with Convolution-Involved Vision Transformer
- Yifan Shen
- Li Liu
- Zhihao Tang
- Zongyi Chen
- Guixiang Ma
- Jiyan Dong
- Xi Zhang
- Lin Yang
Image-based survival prediction models can assist doctors in diagnosing and treating cancer patients. With the advance of digital pathology technologies, large whole slide images (WSIs) provide increased resolution and more detail for diagnosis. However, the gigabyte-size or even terabyte-size WSIs make most models computationally infeasible. For this reason, instead of using the complete WSIs, most existing models use only a pre-selected subset of key patches or patch clusters as input, which might discard important morphological information. In this work, we propose a novel survival analysis model that fully utilizes the complete WSI information. We show that a Vision Transformer (ViT) backbone, combined with convolution operations, is an effective approach to improving prediction performance. Additionally, we present a post-hoc explanation method that identifies the most salient patches and distinct morphological features, making the model more faithful and the results easier for human users to comprehend. Evaluations on two large cancer datasets show that our proposed model is more effective and offers better interpretability for survival prediction. We will make the code publicly available upon acceptance.