Arrow Research search

Author name cluster

Pan Peng 0001

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

ICML Conference 2025 Conference Paper

Average Sensitivity of Hierarchical k-Median Clustering

  • Shijie Li
  • Weiqiang He
  • Ruobing Bai
  • Pan Peng 0001

Hierarchical clustering is a widely used method for unsupervised learning with numerous applications. However, in the application of modern algorithms, the datasets studied are usually large and dynamic. If the hierarchical clustering is sensitive to small perturbations of the dataset, the usability of the algorithm will be greatly reduced. In this paper, we focus on the hierarchical $k$ -median clustering problem, which bridges hierarchical and centroid-based clustering while offering theoretical appeal, practical utility, and improved interpretability. We analyze the average sensitivity of algorithms for this problem by measuring the expected change in the output when a random data point is deleted. We propose an efficient algorithm for hierarchical $k$-median clustering and theoretically prove its low average sensitivity and high clustering quality. Additionally, we show that single linkage clustering and a deterministic variant of the CLNSS algorithm exhibit high average sensitivity, making them less stable. Finally, we validate the robustness and effectiveness of our algorithm through experiments.

ICLR Conference 2024 Conference Paper

A Differentially Private Clustering Algorithm for Well-Clustered Graphs

  • Weiqiang He
  • Hendrik Fichtenberger
  • Pan Peng 0001

We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient ($\epsilon$,$\delta$)-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and the misclassification ratio almost matches the one of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $\epsilon$-DP algorithm would result in substantial error.

ICML Conference 2022 Conference Paper

Sublinear-Time Clustering Oracle for Signed Graphs

  • Stefan Neumann 0003
  • Pan Peng 0001

Social networks are often modeled using signed graphs, where vertices correspond to users and edges have a sign that indicates whether an interaction between users was positive or negative. The arising signed graphs typically contain a clear community structure in the sense that the graph can be partitioned into a small number of polarized communities, each defining a sparse cut and indivisible into smaller polarized sub-communities. We provide a local clustering oracle for signed graphs with such a clear community structure, that can answer membership queries, i. e. , “Given a vertex $v$, which community does $v$ belong to? ”, in sublinear time by reading only a small portion of the graph. Formally, when the graph has bounded maximum degree and the number of communities is at most $O(\log n)$, then with $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ preprocessing time, our oracle can answer each membership query in $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ time, and it correctly classifies a $(1-\varepsilon)$-fraction of vertices w. r. t. a set of hidden planted ground-truth communities. Our oracle is desirable in applications where the clustering information is needed for only a small number of vertices. Previously, such local clustering oracles were only known for unsigned graphs; our generalization to signed graphs requires a number of new ideas and gives a novel spectral analysis of the behavior of random walks with signs. We evaluate our algorithm for constructing such an oracle and answering membership queries on both synthetic and real-world datasets, validating its performance in practice.

SODA Conference 2021 Conference Paper

On Testability of First-Order Properties in Bounded-Degree Graphs

  • Isolde Adler
  • Noleen Köhler
  • Pan Peng 0001

We study property testing of properties that are definable in first-order logic (FO) in the bounded-degree graph and relational structure models. We show that any FO property that is defined by a formula with quantifier prefix ∃∗∀∗ is testable (i. e. , testable with constant query complexity), while there exists an FO property that is expressible by a formula with quantifier prefix ∀∗∃∗ that is not testable. In the dense graph model, a similar picture is long known (Alon, Fischer, Krivelevich, Szegedy, Combinatorica 2000), despite the very different nature of the two models. In particular, we obtain our lower bound by a first-order formula that defines a class of bounded-degree expanders, based on zig-zag products of graphs. We expect this to be of independent interest. We then prove testability of some first-order properties that speak about isomorphism types of neighbourhoods, including testability of 1-neighbourhood-freeness, and r -neighbourhood-freeness under a mild assumption on the degrees.

SODA Conference 2020 Conference Paper

Robust Clustering Oracle and Local Reconstructor of Cluster Structure of Graphs

  • Pan Peng 0001

We develop sublinear time algorithms for analyzing the cluster structure of graphs with noisy partial information. A graph G with maximum degree at most d is called ( k, φ in, φ out )-clusterable, if it can be partitioned into at most k parts, such that each part has inner conductance at least φ in and outer conductance at most φ out, where d is assumed to be constant. A graph G is called to be an ∊ -perturbation of a ( k, φ in, φ out )-clusterable graph if there is partition of G with at most k parts (called clusters), such that one can insert/delete at most ϵdn intra-cluster edges to make it a ( k, φ in, φ out )-clusterable graph. We are given query access to the adjacency list of such a graph. We show that one can construct in time a robust clustering oracle for a bounded-degree graph G that is an ∊ -perturbation of a -clusterable graph. Using such an oracle, a typical clustering query (e. g. , I s O utlier ( s ), S ame C luster ( s, t )) can be answered in time and the answers are consistent with a partition of G in which all but vertices belong to a good cluster, i. e. , a set with inner conductance at least, and outer conductance. We also develop a local reconstruction algorithm that takes as input a graph as above, and on any query vertex v, outputs all its neighbors in the reconstructed graph G’, which is guaranteed to be -clusterable (with slightly boosting degree bound). The number of edges changed is at most. Furthermore, the algorithm runs in time (per query) and can answer consistently with the same G ′ for any sequence of queries it gets.

SODA Conference 2019 Conference Paper

Every Testable (Infinite) Property of Bounded-Degree Graphs Contains an Infinite Hyperfinite Subproperty

  • Hendrik Fichtenberger
  • Pan Peng 0001
  • Christian Sohler

One of the most fundamental questions in graph property testing is to characterize the combinatorial structure of properties that are testable with a constant number of queries. We work towards an answer to this question for the bounded-degree graph model introduced in [GR02], where the input graphs have maximum degree bounded by a constant d. In this model, it is known (among other results) that every hyperfinite property is constant-query testable [NS13], where, informally, a graph property is hyperfinite, if for every δ > 0 every graph in the property can be partitioned into small connected components by removing δn edges. In this paper we show that hyperfiniteness plays a role in every testable property, i. e. we show that every testable property is either finite (which trivially implies hyperfiniteness and testability) or contains an infinite hyperfinite subproperty. A simple consequence of our result is that no infinite graph property that only consists of expander graphs is constant-query testable. Based on the above findings, one could ask if every infinite testable non-hyperfinite property might contain an infinite family of expander (or near-expander) graphs. We show that this is not true. Motivated by our counterexample we develop a theorem that shows that we can partition the set of vertices of every bounded degree graph into a constant number of subsets and a separator set, such that the separator set is small and the distribution of k -discs on every subset of a partition class, is roughly the same as that of the partition class if the subset has small expansion.

SODA Conference 2018 Conference Paper

Estimating Graph Parameters from Random Order Streams

  • Pan Peng 0001
  • Christian Sohler

We develop a new algorithmic technique that allows to transfer some constant time approximation algorithms for general graphs into random order streaming algorithms. We illustrate our technique by proving that in random order streams with probability at least 2/3, • the number of connected components of G can be approximated up to an additive error of εn using space, • the weight of a minimum spanning tree of a connected input graph with integer edges weights from {1, …, W } can be approximated within a multiplicative factor of 1 + ε using space, • the size of a maximum independent set in planar graphs can be approximated within a multiplicative factor of 1+ ε using space.

STOC Conference 2016 Conference Paper

Relating two property testing models for bounded degree directed graphs

  • Artur Czumaj
  • Pan Peng 0001
  • Christian Sohler

We study property testing algorithms in directed graphs (digraphs) with maximum indegree and maximum outdegree upper bounded by d . For directed graphs with bounded degree, there are two different models in property testing introduced by Bender and Ron (2002). In the bidirectional model , one can access both incoming and outgoing edges while in the unidirectional model one can only access outgoing edges. In our paper we provide a new relation between the two models: we prove that if a property can be tested with constant query complexity in the bidirectional model, then it can be tested with sublinear query complexity in the unidirectional model. A corollary of this result is that in the unidirectional model (the model allowing only queries to the outgoing neighbors), every property in hyperfinite digraphs is testable with sublinear query complexity.