AAAI 2026
MLLM Enriched Explainable Multiple Clustering
Abstract
Multiple clustering aims to uncover diverse latent structures within data, enabling a more comprehensive understanding of complex datasets. However, existing approaches either rely heavily on user-supplied keywords or disregard the clustering types users are interested in, limiting their ability to discover the full range of explainable clusterings of interest, particularly in high-dimensional settings. Furthermore, existing methods insufficiently leverage rich textual semantics and fall short of fully integrating multi-modal information. To address these challenges, we propose MLLM-enriched Multiple Clustering (MLLMMC), a novel framework that leverages a multi-modal large language model (MLLM) to explore explainable, non-redundant clusterings. Specifically, MLLMMC first employs an MLLM to generate sample descriptions, which serve as input for an LLM to perform prompt-driven reasoning and infer latent clustering types; it then merges these with user-specified types to obtain diverse and explainable clustering types. For each selected type, MLLMMC utilizes the MLLM to generate sample-level textual descriptions and aligns them with the corresponding visual features through a cross-attention fusion module, which produces a semantically aligned and enriched representation for the target clustering type. Extensive experiments on six benchmark datasets from diverse domains demonstrate that MLLMMC achieves diverse, explainable, and high-quality clustering outcomes, outperforming state-of-the-art multiple clustering methods by a large margin.
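The cross-attention fusion step described in the abstract — aligning MLLM-generated text descriptions with visual features — can be illustrated with a minimal single-head sketch. This is an assumption-laden toy example, not the paper's implementation: the function name `cross_attention_fuse`, the projection matrices, and the residual connection are all illustrative choices; visual patch features act as queries and text-token features as keys/values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(visual, text, Wq, Wk, Wv):
    """Fuse visual features with text-description features (hypothetical sketch).

    visual: (n_v, d) image-patch features, used as queries
    text:   (n_t, d) MLLM description-token features, used as keys/values
    """
    Q = visual @ Wq
    K = text @ Wk
    V = text @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention: each visual token attends over text tokens.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    # Residual connection keeps the original visual signal in the fused output.
    return visual + attn @ V

rng = np.random.default_rng(0)
d = 8
visual = rng.standard_normal((4, d))   # 4 visual tokens
text = rng.standard_normal((6, d))     # 6 text tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = cross_attention_fuse(visual, text, Wq, Wk, Wv)
print(fused.shape)  # (4, 8): one enriched representation per visual token
```

The fused output has the same shape as the visual input, so it can feed directly into a downstream clustering head for the selected clustering type.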
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: AAAI Conference on Artificial Intelligence
- Archive span: 1980-2026
- Indexed papers: 28718
- Paper id: 862985927098127752