NeurIPS Conference 1999 Conference Paper
Model Selection in Clustering by Uniform Convergence Bounds
- Joachim Buhmann
- Marcus Held
Unsupervised learning algorithms are designed to extract struc(cid: 173) ture from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, Le. , similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in max(cid: 173) imum entropy based annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection cri(cid: 173) terion by finite temperature annealing.