FOCS Conference 2025 Conference Paper
Handling LP-Rounding for Hierarchical Clustering and Fitting Distances by Ultrametrics
- Hyung-Chan An
- Mong-Jen Kao
- Changyeol Lee
- Mu-Ting Lee
We consider the classic correlation clustering problem in the hierarchical setting. Given a complete graph $G=(V, E)$ and $\ell$ layers of input information, where the input of each layer consists of a non-negative weight and a labeling of the edges with either + or -, this problem seeks to compute for each layer a partition of $V$ such that the partition for any non-top layer subdivides the partition of the layer above it and the weighted number of disagreements over the layers is minimized, where the disagreement of a layer is the number of + edges across parts plus the number of - edges within parts. Hierarchical correlation clustering is a natural formulation of the classic problem of fitting distances by ultrametrics, which is also known as numerical taxonomy [1]–[3] in the literature. While single-layer correlation clustering has received wide attention since it was introduced in [4] and major progress has been made in the past three years [5]–[8], little is known about this problem in the hierarchical setting [9], [10]. The lack of understanding and adequate tools is reflected in the large approximation ratio known for this problem, which dates from 2021. In this work we make both conceptual and technical contributions towards the hierarchical clustering problem. We present a simple paradigm that greatly facilitates LP-rounding in hierarchical clustering, illustrated with a delicate algorithm providing a significantly improved approximation guarantee of 25.7846 for the hierarchical correlation clustering problem. Our techniques reveal surprising new properties and advance the current understanding of the formulation presented and subsequently used in [9]–[12] for hierarchical clustering over the past two decades. This provides a unifying interpretation of the core technical problem in hierarchical clustering as the problem of finding cuts with prescribed properties regarding the average distance of certain cut pairs.
We further illustrate this perspective by showing that a direct application of the paradigm and techniques presented in this work gives a simple alternative to the state-of-the-art result presented in [12] for the ultrametric violation distance problem.
Keywords: hierarchical correlation clustering, ultrametric embedding, correlation clustering, linear programming rounding, approximation algorithms
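To make the objective stated in the abstract concrete, the following minimal sketch evaluates the weighted number of disagreements of a given hierarchy of partitions. It is only an illustration of the cost function, not the paper's LP-rounding algorithm. The function names, the data layout (one label dictionary and one list of part identifiers per layer), and the convention that layers are listed from the top (coarsest) layer downward are all assumptions made for this example.

```python
from itertools import combinations


def is_refinement(fine, coarse):
    """Return True if every part of `fine` lies inside a single part of `coarse`.

    Both arguments are lists mapping vertex index -> part identifier.
    """
    parent = {}
    for f, c in zip(fine, coarse):
        # Each fine part must always map to the same coarse part.
        if parent.setdefault(f, c) != c:
            return False
    return True


def layer_disagreements(labels, part):
    """Count '+' edges across parts plus '-' edges within parts for one layer."""
    count = 0
    for u, v in combinations(range(len(part)), 2):
        same = part[u] == part[v]
        sign = labels[(u, v)]
        if (sign == "+" and not same) or (sign == "-" and same):
            count += 1
    return count


def hierarchical_cost(weights, labels, parts):
    """Weighted disagreements over all layers.

    Assumption: layer 0 is the top (coarsest) layer, and each later layer
    must subdivide the partition of the layer before it.
    """
    for upper, lower in zip(parts, parts[1:]):
        if not is_refinement(lower, upper):
            raise ValueError("a lower layer does not subdivide the layer above it")
    return sum(
        w * layer_disagreements(lab, p)
        for w, lab, p in zip(weights, labels, parts)
    )


# Hypothetical toy instance: 4 vertices, 2 layers.
top_labels = {(0, 1): "+", (0, 2): "+", (0, 3): "-",
              (1, 2): "+", (1, 3): "-", (2, 3): "+"}
bottom_labels = {(0, 1): "+", (0, 2): "-", (0, 3): "-",
                 (1, 2): "+", (1, 3): "-", (2, 3): "-"}
example_weights = [2.0, 1.0]
example_parts = [[0, 0, 0, 1],   # top layer: {0, 1, 2}, {3}
                 [0, 0, 1, 2]]   # bottom layer: {0, 1}, {2}, {3}

example_cost = hierarchical_cost(
    example_weights, [top_labels, bottom_labels], example_parts
)
print(example_cost)  # 3.0: one disagreement per layer, weighted 2.0 and 1.0
```

In the toy instance, the top layer has one + edge across parts, namely (2, 3), and the bottom layer has one + edge across parts, namely (1, 2), so the total cost is 2.0 · 1 + 1.0 · 1 = 3.0.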