Author name cluster

Kaitao Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

AAAI Conference 2025 Conference Paper

Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning

Daoming Zong
Chaoyue Ding
Kaitao Chen
Yinsheng Li
Shuaiyu Wang

Physical commonsense is an essential aspect of human cognition, involving an intuitive understanding of the physical properties and interactions of everyday objects and materials. Though physical commonsense reasoning should inherently be a multisensory task, integrating both video and audio signals, existing physical audiovisual commonsense reasoning (PACR) models predominantly rely on visual information. This reliance leads to spurious correlations and undermines the models’ reasoning and generalization abilities. To counteract this, we introduce a model-agnostic Counterfactual Physical Audiovisual Commonsense Reasoning (CF-PACR) framework aimed at mitigating visual bias-induced spurious effects. Specifically, we construct a traditional PACR model using both audio and visual information as the factual reasoning model. Subsequently, in the counterfactual reasoning model, we isolate visual information to estimate direct effects. Finally, we subtract the direct effects from the total effects across modalities to derive indirect effects, thereby mitigating visual biases. Extensive experiments validate the effectiveness and generalizability of CF-PACR in alleviating the spurious correlations between visual modality and model predictions.
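The subtraction step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `debias_logits`, the scaling factor `alpha`, the choice to subtract in logit space, and the toy numbers are all hypothetical, since the abstract does not give these details.

```python
import numpy as np

def debias_logits(fused_logits, visual_only_logits, alpha=1.0):
    """Subtract the visual-only (direct) effect from the audiovisual (total) effect.

    alpha is a hypothetical scaling factor; it is not taken from the paper.
    """
    return fused_logits - alpha * visual_only_logits

# Toy logits for one question with two answer candidates (made-up numbers).
fused = np.array([[2.0, 1.5]])        # factual model: audio + video
visual_only = np.array([[1.8, 0.2]])  # counterfactual model: video only

debiased = debias_logits(fused, visual_only)

pred_before = fused.argmax(axis=-1)    # answer favored by the biased model
pred_after = debiased.argmax(axis=-1)  # answer after removing the visual shortcut
print(pred_before, pred_after)
```

In this toy case the visual-only branch strongly favors the first answer, so removing its contribution shifts the prediction to the second answer.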


IJCAI Conference 2025 Conference Paper

RPMIL: Rethinking Uncertainty-Aware Probabilistic Multiple Instance Learning for Whole Slide Pathology Diagnosis

Zhikang Zhao
Kaitao Chen
Jing Zhao

Whole slide images (WSIs) are gigapixel digital scans of traditional pathology slides, offering substantial support for cancer diagnosis. Current multiple instance learning (MIL) methods for WSIs typically extract instance features and aggregate these into a single bag feature for prediction. We observe that these MIL methods rely on point estimation, where each bag is mapped to a deterministic embedding. Such MIL methods based on point estimation fail to capture the full spectrum of data variability due to the reliance on a fixed embedding, especially when the number of trainable bags is limited. In this paper, we rethink probabilistic modeling in MIL and propose RPMIL, an uncertainty-aware probabilistic MIL method for whole slide pathology diagnosis. RPMIL learns a probabilistic aggregator to consolidate instance features into dynamic bag feature distributions instead of a deterministic bag feature. Specifically, we employ a variational autoencoder approach to compress multiple instance features into a low-dimensional space with a probabilistic representation and obtain the bag feature distribution parameterized by its mean and variance. Furthermore, we drive the prediction by jointly leveraging the instance feature distribution and bag feature distribution. We evaluate the WSI classification performance on two public datasets: Camelyon16 and TCGA-NSCLC. Extensive experiments demonstrate that our method surpasses point estimation methods in MIL, achieving state-of-the-art performance.
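The idea of replacing a deterministic bag embedding with a distribution over embeddings can be sketched as below. This is a generic variational-autoencoder-style sketch, not RPMIL itself: mean pooling stands in for the paper's aggregator, and the function name `probabilistic_bag_embedding`, the weight matrices `w_mu` and `w_logvar`, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def probabilistic_bag_embedding(instances, w_mu, w_logvar, rng):
    """Map a bag of instance features to a Gaussian over bag embeddings.

    instances: (n_instances, d_in); w_mu, w_logvar: (d_in, d_latent).
    Mean pooling is an assumed stand-in for a learned aggregator.
    """
    pooled = instances.mean(axis=0)
    mu = pooled @ w_mu          # mean of the bag feature distribution
    logvar = pooled @ w_logvar  # log-variance of the bag feature distribution
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterized sample
    # KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return z, mu, logvar, kl

rng = np.random.default_rng(0)
bag = rng.standard_normal((50, 16))             # 50 patches, 16-dim features (toy sizes)
w_mu = 0.1 * rng.standard_normal((16, 4))       # hypothetical projection to the mean
w_logvar = 0.1 * rng.standard_normal((16, 4))   # hypothetical projection to the log-variance

z, mu, logvar, kl = probabilistic_bag_embedding(bag, w_mu, w_logvar, rng)
print(z.shape, float(kl))
```

Each call draws a different sample `z` for the same bag, which is what distinguishes this from a point-estimate embedding; in a trained model the KL term would typically be added to the classification loss as a regularizer.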


AAAI Conference 2024 Conference Paper

CaMIL: Causal Multiple Instance Learning for Whole Slide Image Classification

Kaitao Chen
Shiliang Sun
Jing Zhao

Whole slide image (WSI) classification is a crucial component in automated pathology analysis. Due to the inherent challenges of high-resolution WSIs and the absence of patch-level labels, most of the proposed methods follow the multiple instance learning (MIL) formulation. While MIL has been equipped with excellent instance feature extractors and aggregators, it is prone to learning spurious associations that undermine the performance of the model. For example, relying solely on color features may lead to erroneous diagnoses due to spurious associations between the disease and the color of patches. To address this issue, we develop a causal MIL framework for WSI classification, effectively distinguishing between causal and spurious associations. Specifically, we use the expectation of the intervention P(Y | do(X)) for bag prediction rather than the traditional likelihood P(Y | X). By applying the front-door adjustment, the spurious association is effectively blocked, where the intervened mediator is aggregated from patch-level features. We evaluate our proposed method on two publicly available WSI datasets, Camelyon16 and TCGA-NSCLC. Our causal MIL framework shows outstanding performance and is plug-and-play, seamlessly integrating with various feature extractors and aggregators.
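For reference, the standard front-door adjustment formula for a discrete mediator is given below. The symbol M for the mediator and x' for the summation variable are notation introduced here; the abstract states only that the mediator is aggregated from patch-level features and does not say how the sums are estimated in practice.

```latex
P\bigl(Y \mid do(X = x)\bigr)
  = \sum_{m} P(M = m \mid X = x)
    \sum_{x'} P\bigl(Y \mid M = m, X = x'\bigr)\, P(X = x')
```

The inner sum averages the outcome over the marginal distribution of X rather than conditioning on the observed x, which is what blocks the back-door path through an unobserved confounder.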
