Arrow Research search

Author name cluster

zengfeng Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

AAAI Conference 2026 Conference Paper

Know Your Neighbors: Subgraph Importance Sampling for Heterophilic Graph Active Learning

  • Wenjie Yang
  • Shengzhong Zhang
  • Chen Ye
  • Jiaxing Guo
  • Tongshan Xu
  • zengfeng Huang

Graph neural networks (GNNs) have demonstrated strong performance in various graph mining tasks but rely heavily on extensively labeled nodes. To improve training efficiency, graph active learning (GAL) has emerged as a solution for selecting the most informative nodes for labeling. However, existing GAL methods are primarily designed for homophilic graphs, where nodes with the same labels are more likely to be connected. In this work, we systematically study active learning on heterophilic graphs, a setting that has received limited attention. Surprisingly, we observe that existing GAL methods fail to consistently outperform random sampling on heterophilic graphs. Through an in-depth investigation, we reveal that these methods implicitly assume homophily even on heterophilic graphs, leading to suboptimal performance. To address this issue, we introduce the principle of "Know Your Neighbors" and propose an active learning algorithm KyN specifically for heterophilic graphs. The core idea of KyN is to provide GNNs with accurate estimations of homophily distribution by labeling nodes together with their neighbors. We implement KyN based on subgraph sampling with probabilities proportional to l1 Lewis weights, which is supported by solid theoretical guarantees. Extensive experiments on diverse real-world datasets, including a large heterophilic graph with over 2 million nodes, demonstrate the effectiveness and scalability of KyN.
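
KyN samples subgraphs with probabilities proportional to l1 Lewis weights. As a rough illustration of the weights themselves (the generic fixed-point iteration of Cohen and Peng, not the paper's subgraph sampler; matrix size and iteration count are arbitrary):

```python
import numpy as np

def l1_lewis_weights(A, iters=50):
    """Approximate l1 Lewis weights of the rows of A via the standard
    fixed-point iteration w_i <- sqrt(a_i^T (A^T diag(1/w) A)^{-1} a_i)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        M_inv = np.linalg.pinv(A.T @ (A / w[:, None]))  # (A^T W^{-1} A)^{-1}
        w = np.sqrt(np.einsum("ij,jk,ik->i", A, M_inv, A))
    return w

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
w = l1_lewis_weights(A)
probs = w / w.sum()  # sampling distribution proportional to the weights
```

A quick sanity check on convergence: at the fixed point the Lewis weights sum to the column dimension d (here 5).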

AAAI Conference 2026 Conference Paper

LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

  • Xiaoran Liu
  • Yuerong Song
  • Zhigeng Liu
  • zengfeng Huang
  • Qipeng Guo
  • Ziwei He
  • Xipeng Qiu

Large Language Diffusion Models, or dLLMs, have emerged as a significant focus in NLP research, with substantial effort directed toward understanding their scalability and downstream task performance. However, their long-context capabilities remain unexplored, lacking systematic analysis or methods for context extension. In this work, we present the first systematic investigation comparing the long-context performance of diffusion LLMs and traditional auto-regressive LLMs. We first identify a unique characteristic of dLLMs: unlike auto-regressive LLMs, they maintain remarkably stable perplexity during direct context extrapolation. Moreover, where auto-regressive models fail outright during the Needle-In-A-Haystack task with context exceeding their pretrained length, we discover dLLMs exhibit a distinct local perception phenomenon, enabling successful retrieval from recent context segments. We explain both phenomena through the lens of Rotary Position Embedding (RoPE) scaling theory. Building on these observations, we propose LongLLaDA, a training-free method that integrates LLaDA with the NTK-based RoPE extrapolation. Our results validate that established extrapolation scaling laws remain effective for extending the context windows of dLLMs. Furthermore, we identify long-context tasks where dLLMs outperform auto-regressive LLMs and others where they fall short. Consequently, this study establishes the first length extrapolation method for diffusion LLMs while providing essential theoretical insights and empirical benchmarks critical for advancing future research on long-context diffusion LLMs.
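
The NTK-based RoPE extrapolation the abstract refers to can be sketched in a few lines. This is a generic illustration of the base-rescaling trick, not the paper's implementation; the head dimension and scale factor are made up:

```python
import numpy as np

def rope_freqs(head_dim, base=10000.0):
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/head_dim)
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def ntk_scaled_freqs(head_dim, scale, base=10000.0):
    # NTK-aware extrapolation: rescale the base so the lowest frequency is
    # stretched by `scale` while the highest stays (almost) unchanged.
    new_base = base * scale ** (head_dim / (head_dim - 2))
    return new_base ** (-np.arange(0, head_dim, 2) / head_dim)

orig = rope_freqs(64)
ext = ntk_scaled_freqs(64, scale=4.0)  # pretend we want a 4x longer context
ratio = orig / ext  # per-dimension stretch: 1.0 at the first dim, 4.0 at the last
```

The stretch interpolates smoothly across dimensions, which is why this scaling extends context length without retraining.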

AAAI Conference 2026 Conference Paper

Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction

  • Yuerong Song
  • Xiaoran Liu
  • Ruixiao Li
  • Zhigeng Liu
  • zengfeng Huang
  • Qipeng Guo
  • Ziwei He
  • Xipeng Qiu

Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive quadratic computational complexity and memory overhead during inference. Current caching techniques accelerate decoding by storing full-layer states, yet impose substantial memory usage that limits long-context applications. Our analysis of attention patterns in dLLMs reveals persistent cross-layer sparsity, with pivotal tokens remaining salient across decoding steps and low-relevance tokens staying unimportant, motivating selective cache eviction. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention via delayed bidirectional sparse caching. By leveraging the stability of token saliency over steps, it retains critical tokens and dynamically evicts unimportant prefix/suffix entries using an attention-guided strategy. Extensive experiments on LLaDA and Dream series demonstrate Sparse-dLLM achieves up to 10 times higher throughput than vanilla dLLMs, with comparable performance and similar peak memory costs, outperforming previous methods in efficiency and effectiveness.
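
A toy version of attention-guided cache eviction, purely for intuition (the function name and the saliency rule here are our own simplification, not Sparse-dLLM's delayed bidirectional scheme):

```python
import numpy as np

def evict_by_saliency(keys, values, attn, keep):
    """Keep the `keep` cached entries that received the most total attention
    mass across recent decoding steps; evict the rest."""
    saliency = attn.sum(axis=0)                       # attention each cached token received
    keep_idx = np.sort(np.argsort(saliency)[-keep:])  # retain original ordering
    return keys[keep_idx], values[keep_idx], keep_idx

rng = np.random.default_rng(1)
K = rng.standard_normal((128, 16))   # cached keys: 128 tokens, head dim 16
V = rng.standard_normal((128, 16))
attn = rng.random((32, 128))         # 32 queries attending over the cache
K2, V2, idx = evict_by_saliency(K, V, attn, keep=32)  # cache shrinks 4x
```

The abstract's observation that token saliency is stable across decoding steps is what makes a one-shot ranking like this reasonable, rather than re-scoring the cache at every step.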

AIJ Journal 2025 Journal Article

Contra2: A one-step active learning method for imbalanced graphs

  • Wenjie Yang
  • Shengzhong Zhang
  • Jiaxing Guo
  • zengfeng Huang

Graph active learning (GAL) is an important research direction in graph neural networks (GNNs) that aims to select the most valuable nodes for labeling to train GNNs. Previous works in GAL have primarily focused on the overall performance of GNNs, overlooking the balance among different classes. However, graphs in real-world applications are often imbalanced, which leads GAL methods to select class-imbalanced training sets, resulting in biased GNN models. Furthermore, due to the high cost of multi-turn queries, there is an increasing demand for one-step GAL methods, where the entire training set is queried at once. These realities prompt us to investigate the problem of one-step active learning on imbalanced graphs. In this paper, we propose a theory-driven method called Contrast & Contract (Contra2) to tackle the above issues. The key idea of Contra2 is that intra-class edges within the majority are dominant in the edge set, so contracting these edges will reduce the imbalance ratio. Specifically, Contra2 first learns node representations by graph contrastive learning (GCL), then stochastically contracts the edges that connect nodes with similar embeddings. We theoretically show that Contra2 reduces the imbalance ratio with high probability. By leveraging a more evenly distributed graph, we can achieve a balanced selection of labeled nodes without requiring any seed labels. The effectiveness of Contra2 is evaluated against various baselines on 11 datasets with different budgets. Contra2 demonstrates remarkable performance, achieving either higher or on-par performance with only half of the annotation budget on some datasets.

NeurIPS Conference 2025 Conference Paper

Geometric Imbalance in Semi-Supervised Node Classification

  • Liang Yan
  • Shengzhong Zhang
  • Bisheng Li
  • Menglin Yang
  • Chen Yang
  • Min Zhou
  • Weiyang Ding
  • Yutong Xie

Class imbalance in graph data presents a significant challenge for effective node classification, particularly in semi-supervised scenarios. In this work, we formally introduce the concept of geometric imbalance, which captures how message passing on class-imbalanced graphs leads to geometric ambiguity among minority-class nodes in the Riemannian manifold embedding space. We provide a rigorous theoretical analysis of geometric imbalance on the Riemannian manifold and propose a unified framework that explicitly mitigates it through pseudo-label alignment, node reordering, and ambiguity filtering. Extensive experiments on diverse benchmarks show that our approach consistently outperforms existing methods, especially under severe class imbalance. Our findings offer new theoretical insights and practical tools for robust semi-supervised node classification.

ICML Conference 2025 Conference Paper

High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions

  • Ruiyuan Huang
  • Zengfeng Huang

Motivated by applications in online bidding and sleeping bandits, we examine the problem of contextual bandits with cross learning, where the learner observes the loss associated with the action across all possible contexts, not just the current round’s context. Our focus is on a setting where losses are chosen adversarially, and contexts are sampled i.i.d. from a specific distribution. This problem was first studied by Balseiro et al. (2019), who proposed an algorithm that achieves near-optimal regret under the assumption that the context distribution is known in advance. However, this assumption is often unrealistic. To address this issue, Schneider & Zimmert (2023) recently proposed a new algorithm that achieves nearly optimal expected regret. It is well-known that expected regret can be significantly weaker than high-probability bounds. In this paper, we present a novel, in-depth analysis of their algorithm and demonstrate that it actually achieves near-optimal regret with high probability. There are steps in the original analysis by Schneider & Zimmert (2023) that lead only to an expected bound by nature. In our analysis, we introduce several new insights. Specifically, we make extensive use of the weak dependency structure between different epochs, which was overlooked in previous analyses. Additionally, standard martingale inequalities are not directly applicable, so we refine martingale inequalities to complete our analysis.

JMLR Journal 2025 Journal Article

Implicit vs Unfolded Graph Neural Networks

  • Yongyi Yang
  • Tang Liu
  • Yangkun Wang
  • zengfeng Huang
  • David Wipf

It has been observed that message-passing graph neural networks (GNN) sometimes struggle to maintain a healthy balance between the efficient / scalable modeling of long-range dependencies across nodes while avoiding unintended consequences such as oversmoothed node representations, sensitivity to spurious edges, or inadequate model interpretability. To address these and other issues, two separate strategies have recently been proposed, namely implicit and unfolded GNNs (that we abbreviate to IGNN and UGNN respectively). The former treats node representations as the fixed points of a deep equilibrium model that can efficiently facilitate arbitrary implicit propagation across the graph with a fixed memory footprint. In contrast, the latter involves treating graph propagation as unfolded descent iterations as applied to some graph-regularized energy function. While motivated differently, in this paper we carefully quantify explicit situations where the solutions they produce are equivalent and others where their properties sharply diverge. This includes the analysis of convergence, representational capacity, and interpretability. In support of this analysis, we also provide empirical head-to-head comparisons across multiple synthetic and public real-world node classification benchmarks. These results indicate that while IGNN is substantially more memory-efficient, UGNN models support unique, integrated graph attention mechanisms and propagation rules that can achieve strong node classification accuracy across disparate regimes such as adversarially-perturbed graphs, graphs with heterophily, and graphs involving long-range dependencies.

ICLR Conference 2025 Conference Paper

Lipschitz Bandits in Optimal Space

  • Xiaoyi Zhu
  • Zengfeng Huang

This paper considers the Lipschitz bandit problem, where the set of arms is continuous and the expected reward is a Lipschitz function over the arm space. This problem has been extensively studied. Prior algorithms need to store the reward information of all visited arms, leading to significant memory consumption. We address this issue by introducing an algorithm named Log-space Lipschitz bandits (Log-Li), which achieves an optimal (up to logarithmic factors) regret of $\widetilde{O}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ while using only $O\left(\log T\right)$ bits of memory. Additionally, we provide a complexity analysis for this problem, demonstrating that $\Omega\left(\log T\right)$ bits of space are necessary for any algorithm to achieve the optimal regret. We also conduct numerical simulations, and the results show that our new algorithm achieves regret comparable to the state-of-the-art while reducing memory usage by orders of magnitude.

ICLR Conference 2025 Conference Paper

MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy

  • Haitian Jiang
  • Renjie Liu 0001
  • Zengfeng Huang
  • Yichuan Wang 0002
  • Xiao Yan 0002
  • Zhenkun Cai
  • Minjie Wang
  • David P. Wipf

Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit transparent, exploitable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size. Our source code is available at https://github.com/haitian-jiang/MuseGNN.

STOC Conference 2025 Conference Paper

Simple and Optimal Algorithms for Heavy Hitters and Frequency Moments in Distributed Models

  • Zengfeng Huang
  • Zhongzheng Xiong
  • Xiaoyi Zhu
  • Zhewei Wei

We consider the problems of distributed heavy hitters and frequency moments in both the coordinator model and the distributed tracking model (also known as the distributed functional monitoring model). We present simple and optimal (up to logarithmic factors) algorithms for $\ell_p$ heavy hitters and $F_p$ estimation ($p \ge 2$) in these distributed models. For $\ell_p$ heavy hitters in the coordinator model, our algorithm requires only one round and uses $\widetilde{O}(k^{p-1}/\varepsilon^p)$ bits of communication. For $p > 2$, this is the first near-optimal result. By combining our algorithm with the standard recursive sketching technique, we obtain a near-optimal two-round algorithm for $F_p$ in the coordinator model, matching a significant result from recent work by Esfandiari et al. (STOC 2024). Our algorithm and analysis are much simpler and have better costs with respect to logarithmic factors. Furthermore, our technique provides a one-round algorithm for $F_p$, which is a significant improvement over a result of Woodruff and Zhang (STOC 2012). Thanks to the simplicity of our heavy hitter algorithms, we manage to adapt them to the distributed tracking model with only a $\log(n)$ increase in communication. For $\ell_p$ heavy hitters, our algorithm has a communication cost of $\widetilde{O}(k^{p-1}/\varepsilon^p)$, representing the first near-optimal algorithm for all $p \ge 2$. By applying the recursive sketching technique, we also provide the first near-optimal algorithm for $F_p$ in the distributed tracking model, with a communication cost of $\widetilde{O}(k^{p-1}/\varepsilon^2)$ for all $p \ge 2$. Even for $F_2$, our result improves upon the bounds established by Cormode, Muthukrishnan, and Yi (SODA 2008) and Woodruff and Zhang (STOC 2012), nearly matching the existing lower bound for the first time.

ICLR Conference 2025 Conference Paper

TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics

  • Lu Yi 0002
  • Jie Peng 0005
  • Yanping Zheng
  • Fengran Mo
  • Zhewei Wei
  • Yuhang Ye 0002
  • Yue Zixuan
  • Zengfeng Huang

Future link prediction is a fundamental challenge in various real-world dynamic systems. To address this, numerous temporal graph neural networks (temporal GNNs) and benchmark datasets have been developed. However, these datasets often feature excessive repeated edges and lack complex sequential dynamics, a key characteristic inherent in many real-world applications such as recommender systems and "Who-To-Follow" on social networks. This oversight has led existing methods to inadvertently downplay the importance of learning sequential dynamics, focusing primarily on predicting repeated edges. In this study, we demonstrate that existing methods, such as GraphMixer and DyGFormer, are inherently incapable of learning simple sequential dynamics, such as "a user who has followed OpenAI and Anthropic is more likely to follow AI at Meta next." Motivated by this issue, we introduce the Temporal Graph Benchmark with Sequential Dynamics (TGB-Seq), a new benchmark carefully curated to minimize repeated edges, challenging models to learn sequential dynamics and generalize to unseen edges. TGB-Seq comprises large real-world datasets spanning diverse domains, including e-commerce interactions, movie ratings, business reviews, social networks, citation networks and web link networks. Benchmarking experiments reveal that current methods usually suffer significant performance degradation and incur substantial training costs on TGB-Seq, posing new challenges and opportunities for future research. TGB-Seq datasets, leaderboards, and example codes are available at https://tgb-seq.github.io/.

TMLR Journal 2025 Journal Article

Understanding Class Bias Amplification in Graph Representation Learning

  • Shengzhong Zhang
  • Wenjie Yang
  • Yimin Zhang
  • Hongwei Zhang
  • zengfeng Huang

Recent research reveals that GNN-based graph representation learning may inadvertently introduce various structural biases. In this work, we discover a phenomenon of structural bias in graph representation learning called class bias amplification, which refers to the exacerbation of performance bias between different classes by the GNN encoder. We conduct an in-depth theoretical study of this phenomenon from a novel spectral perspective. Our analysis suggests that structural disparities between nodes in different classes result in varying local convergence speeds for node embeddings. This phenomenon leads to bias amplification in the classification results of downstream tasks. Based on the theoretical insights, we propose random graph coarsening, which is proved to be effective in dealing with the above issue. Finally, we propose an unsupervised graph contrastive learning model called Random Graph Coarsening Contrastive Learning (RGCCL), which utilizes random coarsening as data augmentation and mitigates class bias amplification by contrasting the coarsened graph with the original graph. Extensive experiments on various datasets demonstrate the advantage of our method when dealing with class bias amplification.

ICLR Conference 2024 Conference Paper

StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning

  • Shengzhong Zhang
  • Wenjie Yang 0006
  • Xinyuan Cao
  • Hongwei Zhang
  • Zengfeng Huang

Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.

NeurIPS Conference 2023 Conference Paper

Adversarially Robust Distributed Count Tracking via Partial Differential Privacy

  • Zhongzheng Xiong
  • Xiaoyi Zhu
  • zengfeng Huang

We study the distributed tracking model, also known as distributed functional monitoring. This model involves $k$ sites each receiving a stream of items and communicating with the central server. The server's task is to track a function of all items received thus far continuously, with minimum communication cost. For count tracking, it is known that there is a $\sqrt{k}$ gap in communication between deterministic and randomized algorithms. However, existing randomized algorithms assume an "oblivious adversary" who constructs the entire input streams before the algorithm starts. Here we consider adaptive adversaries who can choose new items based on previous answers from the algorithm. Deterministic algorithms are trivially robust to adaptive adversaries, while randomized ones may not. Therefore, we investigate whether the $\sqrt{k}$ advantage of randomized algorithms is from randomness itself or the oblivious adversary assumption. We provide an affirmative answer to this question by giving a robust algorithm with optimal communication. Existing robustification techniques do not yield optimal bounds due to the inherent challenges of the distributed nature of the problem. To address this, we extend the differential privacy framework by introducing "partial differential privacy" and proving a new generalization theorem. This theorem may have broader applications beyond robust count tracking, making it of independent interest.

ICML Conference 2023 Conference Paper

On Coresets for Clustering in Small Dimensional Euclidean spaces

  • Lingxiao Huang
  • Ruiyuan Huang
  • Zengfeng Huang
  • Xuan Wu 0002

We consider the problem of constructing small coresets for $k$-Median in Euclidean spaces. Given a large set of data points $P\subset \mathbb{R}^d$, a coreset is a much smaller set $S\subset \mathbb{R}^d$, so that the $k$-Median costs of any $k$ centers w.r.t. $P$ and $S$ are close. Existing literature mainly focuses on the high-dimension case and there has been great success in obtaining dimension-independent bounds, whereas the case for small $d$ is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for $k$-Median in small dimensions. For small $d$, a natural question is whether existing near-optimal dimension-independent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small $d$. In particular, we completely settle the coreset size bound for $1$-d $k$-Median (up to log factors). Interestingly, our results imply a strong separation between $1$-d $1$-Median and $1$-d $2$-Median. As far as we know, this is the first such separation between $k=1$ and $k=2$ in any dimension.

NeurIPS Conference 2023 Conference Paper

Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

  • Divin Yan
  • Gengchen Wei
  • Chen Yang
  • Shengzhong Zhang
  • zengfeng Huang

This paper introduces a new approach to address the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage a graph augmentation technique to estimate the variance and design a regularization term to alleviate the impact of imbalance. Exhaustive tests are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.

NeurIPS Conference 2022 Conference Paper

Lipschitz Bandits with Batched Feedback

  • Yasong Feng
  • zengfeng Huang
  • Tianyu Wang

In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves theoretically optimal (up to logarithmic factors) regret rate $\widetilde{\mathcal{O}}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ using only $ \mathcal{O} \left( \log\log T\right) $ batches. We also provide complexity analysis for this problem. Our theoretical lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves optimal regret rate using minimal communication.

ICML Conference 2022 Conference Paper

Optimal Clustering with Noisy Queries via Multi-Armed Bandit

  • Jinghui Xia
  • Zengfeng Huang

Motivated by many applications, we study clustering with a faulty oracle. In this problem, there are $n$ items belonging to $k$ unknown clusters, and the algorithm is allowed to ask the oracle whether two items belong to the same cluster or not. However, the answer from the oracle is correct only with probability $\frac{1}{2}+\frac{\delta}{2}$. The goal is to recover the hidden clusters with the minimum number of noisy queries. Previous works have shown that the problem can be solved with $O(\frac{nk\log n}{\delta^2} + \text{poly}(k, \frac{1}{\delta}, \log n))$ queries, while $\Omega(\frac{nk}{\delta^2})$ queries is known to be necessary. So, for any values of $k$ and $\delta$, there is still a non-trivial gap between upper and lower bounds. In this work, we obtain the first matching upper and lower bounds for a wide range of parameters. In particular, a new polynomial time algorithm with $O(\frac{n(k+\log n)}{\delta^2} + \text{poly}(k, \frac{1}{\delta}, \log n))$ queries is proposed. Moreover, we prove a new lower bound of $\Omega(\frac{n\log n}{\delta^2})$, which, combined with the existing $\Omega(\frac{nk}{\delta^2})$ bound, matches our upper bound up to an additive $\text{poly}(k, \frac{1}{\delta}, \log n)$ term. To obtain the new results, our main ingredient is an interesting connection between our problem and multi-armed bandit, which might provide useful insights for other similar problems.
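
The $1/\delta^2$ factors in these bounds reflect the cost of de-noising a single pairwise query; a minimal sketch of the standard majority-vote amplification (function names are hypothetical, and this is the textbook repetition argument, not the paper's algorithm):

```python
import math
import random

def noisy_oracle(same, delta, rng):
    # Answers a same-cluster query correctly with probability 1/2 + delta/2.
    return same if rng.random() < 0.5 + delta / 2 else not same

def boosted_query(same, delta, n, rng, c=8):
    # Repeat the query O(log n / delta^2) times and take a majority vote;
    # by a Chernoff bound the majority is wrong with probability poly(1/n).
    m = max(1, int(c * math.log(n) / delta ** 2))
    votes = sum(noisy_oracle(same, delta, rng) for _ in range(m))
    return votes > m / 2

rng = random.Random(0)
answers = [boosted_query(True, delta=0.4, n=1000, rng=rng) for _ in range(200)]
```

Applying this naively to all pairs costs far more than the bounds above; the point of the paper is to spend this per-query budget on far fewer, carefully chosen comparisons.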

NeurIPS Conference 2022 Conference Paper

Transformers from an Optimization Perspective

  • Yongyi Yang
  • zengfeng Huang
  • David P Wipf

Deep learning models such as the Transformer are often constructed by heuristics and experience. To provide a complementary foundation, in this work we study the following problem: Is it possible to find an energy function underlying the Transformer model, such that descent steps along this energy correspond with the Transformer forward pass? By finding such a function, we can reinterpret Transformers as the unfolding of an interpretable optimization process. This unfolding perspective has been frequently adopted in the past to elucidate more straightforward deep models such as MLPs and CNNs; however, obtaining a similar equivalence for more complex models with self-attention mechanisms like the Transformer has thus far remained elusive. To this end, we first outline several major obstacles before providing companion techniques to at least partially address them, demonstrating for the first time a close association between energy function minimization and deep layers with self-attention. This interpretation contributes to our intuition and understanding of Transformers, while potentially laying the groundwork for new model designs.

ICLR Conference 2022 Conference Paper

Why Propagate Alone? Parallel Use of Labels and Features on Graphs

  • Yangkun Wang
  • Jiarui Jin
  • Weinan Zhang 0001
  • Yongyi Yang
  • Jiuhai Chen
  • Quan Gan
  • Yong Yu 0001
  • Zheng Zhang 0001

One of the challenges of graph-based semi-supervised learning over ordinary supervised learning for classification tasks lies in label utilization. The direct use of ground-truth labels in graphs for training purposes can result in a parametric model learning trivial degenerate solutions (e.g., an identity mapping from input to output). In addressing this issue, a label trick has recently been proposed in the literature and applied to a wide range of graph neural network (GNN) architectures, achieving state-of-the-art results on various datasets. The essential idea is to randomly split the observed labels on the graph and use a fraction of them as input to the model (along with original node features), and predict the remaining fraction. Despite its success in enabling GNNs to propagate features and labels simultaneously, this approach has never been analyzed from a theoretical perspective, nor fully explored across certain natural use cases. In this paper, we demonstrate that under suitable settings, this stochastic trick can be reduced to a more interpretable deterministic form, allowing us to better explain its behavior, including an emergent regularization effect, and motivate broader application scenarios. Our experimental results corroborate these analyses while also demonstrating improved node classification performance applying the label trick in new domains.
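
A minimal sketch of the random label split at the heart of the label trick (our own illustrative code; the split ratio and sizes are made up):

```python
import numpy as np

def label_trick_split(labels, train_mask, num_classes, input_frac, rng):
    """Randomly split the observed labels: a fraction become extra input
    features (one-hot), the remainder are the prediction targets."""
    train_idx = np.flatnonzero(train_mask)
    rng.shuffle(train_idx)
    cut = int(input_frac * len(train_idx))
    input_idx, target_idx = train_idx[:cut], train_idx[cut:]
    label_feat = np.zeros((len(labels), num_classes))
    label_feat[input_idx, labels[input_idx]] = 1.0  # reveal only the input fraction
    return label_feat, target_idx

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=100)    # toy labels for 100 nodes, 3 classes
mask = np.zeros(100, dtype=bool)
mask[:40] = True                    # 40 observed/training labels
feat, targets = label_trick_split(y, mask, num_classes=3, input_frac=0.5, rng=rng)
```

Concatenating `feat` with the node features and training the GNN to predict only `targets` avoids the degenerate identity mapping the abstract describes, since no node ever sees its own label as input.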

NeurIPS Conference 2021 Conference Paper

BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation

  • Mingguo He
  • Zhewei Wei
  • zengfeng Huang
  • Hongteng Xu

Many representative graph neural networks, e.g., GPR-GNN and ChebNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose $\textit{BernNet}$, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, our BernNet estimates it by an order-$K$ Bernstein polynomial approximation and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) based on observed graphs and their associated signals and thus achieve the BernNet specialized for the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and it achieves superior performance in real-world graph modeling tasks. Code is available at https://github.com/ivam-he/BernNet.
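
The Bernstein parameterization is easy to write down directly: $h(\lambda)=\sum_{k=0}^{K}\theta_k\binom{K}{k}(1-\tfrac{\lambda}{2})^{K-k}(\tfrac{\lambda}{2})^k$ over the normalized-Laplacian spectrum $\lambda\in[0,2]$. A small scalar evaluation (the coefficients below are toy values chosen to give a band-rejection-like shape, not learned ones):

```python
import numpy as np
from math import comb

def bernstein_filter(lam, theta):
    """Evaluate h(lambda) = sum_k theta_k * C(K,k) * (1-lambda/2)^(K-k) * (lambda/2)^k
    on the normalized-Laplacian spectrum lambda in [0, 2]."""
    K = len(theta) - 1
    x = lam / 2.0
    return sum(t * comb(K, k) * (1 - x) ** (K - k) * x ** k
               for k, t in enumerate(theta))

lam = np.linspace(0, 2, 5)
# Nonnegative coefficients guarantee a nonnegative (well-posed) response.
h = bernstein_filter(lam, theta=[1.0, 0.0, 0.0, 0.0, 1.0])
```

With these coefficients the response is 1 at both ends of the spectrum and dips in the middle, which is the band-rejection behavior the abstract mentions.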

JMLR Journal 2021 Journal Article

Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA

  • zengfeng Huang
  • Xuemin Lin
  • Wenjie Zhang
  • Ying Zhang

A sketch of a large data set captures vital properties of the original data while typically occupying much less space. In this paper, we consider the problem of computing a sketch of a massive data matrix $A\in\mathbb{R}^{n\times d}$ that is distributed across $s$ machines. Our goal is to output a matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still approximates $A$ well in terms of covariance error, i.e., $\|{A^TA-B^TB}\|_2$. Such a matrix $B$ is called a covariance sketch of $A$. We are mainly focused on minimizing the communication cost, which is arguably the most valuable resource in distributed computations. We show that there is a nontrivial gap between deterministic and randomized communication complexity for computing a covariance sketch. More specifically, we first prove an almost tight deterministic communication lower bound, then provide a new randomized algorithm with communication cost smaller than the deterministic lower bound. Based on a well-known connection between covariance sketch and approximate principal component analysis, we obtain better communication bounds for the distributed PCA problem. Moreover, we also give an improved distributed PCA algorithm for sparse input matrices, which uses our distributed sketching algorithm as a key building block.
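
For intuition about what a covariance sketch is, the canonical single-machine construction is Frequent Directions; the sketch below illustrates the object being approximated, not the paper's distributed algorithm:

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions: stream the rows of A into an ell x d sketch B
    with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell (a covariance sketch)."""
    _, d = A.shape
    B = np.zeros((ell, d))
    free = list(range(ell))              # indices of still-empty rows in B
    for row in A:
        if not free:
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            shrunk = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            B = shrunk[:, None] * Vt     # smallest direction is zeroed out
            free = [ell - 1]
        B[free.pop()] = row
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
B = frequent_directions(A, ell=10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
```

In the distributed setting each machine could build such a sketch locally; the paper's contribution is showing how cheaply (in communication) these local summaries can be combined.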

ICML Conference 2021 Conference Paper

Graph Neural Networks Inspired by Classical Iterative Algorithms

  • Yongyi Yang
  • Tang Liu
  • Yangkun Wang
  • Jinjing Zhou
  • Quan Gan
  • Zhewei Wei
  • Zheng Zhang 0001
  • Zengfeng Huang

Despite the recent success of graph neural networks (GNN), common architectures often exhibit significant limitations, including sensitivity to oversmoothing, long-range dependencies, and spurious edges, e.g., as can occur as a result of graph heterophily or adversarial attacks. To at least partially address these issues within a simple transparent framework, we consider a new family of GNN layers designed to mimic and integrate the update rules of two classical iterative algorithms, namely, proximal gradient descent and iterative reweighted least squares (IRLS). The former defines an extensible base GNN architecture that is immune to oversmoothing while nonetheless capturing long-range dependencies by allowing arbitrary propagation steps. In contrast, the latter produces a novel attention mechanism that is explicitly anchored to an underlying end-to-end energy function, contributing stability with respect to edge uncertainty. When combined, we obtain an extremely simple yet robust model that we evaluate across disparate scenarios including standardized benchmarks, adversarially perturbed graphs, graphs with heterophily, and graphs involving long-range dependencies. In doing so, we compare against SOTA GNN approaches that have been explicitly designed for the respective task, achieving competitive or superior node classification accuracy. Our code is available at https://github.com/FFTYYY/TWIRLS. For an extended version of this work, see https://arxiv.org/abs/2103.06064.
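
For readers unfamiliar with IRLS, the classical algorithm the paper's attention mechanism is derived from, here is a minimal one-dimensional instance: least-absolute-deviations regression, where each step solves a weighted least squares with weights inversely proportional to the current residuals. This is the textbook algorithm, not the GNN layer itself; the data and tolerances are illustrative.

```python
def irls_l1_slope(xs, ys, iters=50, eps=1e-8):
    """Fit y ~ beta * x by minimizing sum |y_i - beta * x_i| via IRLS:
    each iteration solves a weighted least squares whose weights
    1/|residual| downweight large residuals, giving robustness to
    outliers (analogous to downweighting spurious edges)."""
    beta = 0.0
    for _ in range(iters):
        w = [1.0 / max(abs(y - beta * x), eps) for x, y in zip(xs, ys)]
        beta = sum(wi * x * y for wi, x, y in zip(w, xs, ys)) / \
               sum(wi * x * x for wi, x in zip(w, xs))
    return beta

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 100.0]  # one gross outlier
print(irls_l1_slope(xs, ys))  # close to the uncorrupted slope 2
```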

NeurIPS Conference 2021 Conference Paper

Understanding Bandits with Graph Feedback

  • Houshuang Chen
  • zengfeng Huang
  • Shuai Li
  • Chihao Zhang

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G=(V, E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $\delta^*$ and the $k$-packing independence number, which capture an upper bound and a lower bound on the regret, respectively. We show that the two notions are inherently connected by aligning them with the linear program of the weakly dominating set and its dual, the fractional vertex packing set. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\left(\left(\delta^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$, where $\alpha$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on graphs with bounded integrality gap for the vertex packing problem, including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.
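
To make the domination notion concrete: a set of arms weakly dominates the graph if playing them observes every other arm. The brute-force check below computes the smallest such integral set on a toy feedback graph; the fractional number $\delta^*$ from the abstract is the LP relaxation of this quantity, which the code does not solve.

```python
from itertools import combinations

def min_weak_dominating_set(V, E):
    """Smallest set D such that every vertex outside D has an in-neighbor
    in D, i.e. triggering the arms in D observes every remaining arm.
    Exhaustive search over subsets; only suitable for tiny graphs."""
    out = {v: set() for v in V}
    for u, v in E:
        out[u].add(v)  # edge u -> v: playing u observes v
    for k in range(1, len(V) + 1):
        for D in combinations(V, k):
            covered = set(D).union(*(out[d] for d in D))
            if covered == set(V):
                return set(D)

# A star: arm 0 observes everyone, the leaves observe nothing.
print(min_weak_dominating_set([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)]))  # {0}
```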

IJCAI Conference 2020 Conference Paper

Joint Representation Learning of Legislator and Legislation for Roll Call Prediction

  • Yuqiao Yang
  • Xiaoqiang Lin
  • Geng Lin
  • zengfeng Huang
  • Changjian Jiang
  • Zhongyu Wei

In this paper, we explore learning representations of legislation and legislators for the prediction of roll call results. The most popular approach for this task is the ideal point model, which relies on historical voting information for representation learning of legislators but largely ignores the context information of the legislative data. We therefore propose to incorporate context information to learn dense representations for both legislators and legislation. For legislators, we incorporate relations among them via graph convolutional neural networks (GCN) for their representation learning. For legislation, we utilize its narrative description via recurrent neural networks (RNN) for representation learning. In order to align the two kinds of representations in the same vector space, we introduce a triplet loss for joint training. Experimental results on a self-constructed dataset show the effectiveness of our model for roll call results prediction compared to some state-of-the-art baselines.
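
The triplet loss mentioned in the abstract has a standard hinge form, sketched below with Euclidean distance and a margin of 1.0; the paper's exact distance function and margin are not stated here, so treat both as illustrative assumptions.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull a legislator embedding (anchor)
    toward legislation it supports (positive) and push it away from
    legislation it opposes (negative), until the gap exceeds the margin."""
    d = lambda u, v: math.dist(u, v)  # Euclidean distance
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# Positive already closer than negative by more than the margin -> zero loss.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0
```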

ICML Conference 2020 Conference Paper

Simple and Deep Graph Convolutional Networks

  • Ming Chen
  • Zhewei Wei
  • Zengfeng Huang
  • Bolin Ding
  • Yaliang Li

Graph convolutional networks (GCNs) are a powerful deep learning approach for graph-structured data. Recently, GCNs and subsequent variants have shown superior performance in various application areas on real-world datasets. Despite their success, most of the current GCN models are shallow, due to the \emph{over-smoothing} problem. In this paper, we study the problem of designing and analyzing deep graph convolutional networks. We propose the GCNII, an extension of the vanilla GCN model with two simple yet effective techniques: \emph{Initial residual} and \emph{Identity mapping}. We provide theoretical and empirical evidence that the two techniques effectively relieve the problem of over-smoothing. Our experiments show that the deep GCNII model outperforms the state-of-the-art methods on various semi- and full-supervised tasks.
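
The two techniques compose inside a single layer: mix the propagated features with the initial features (initial residual, weight alpha), then apply a weight matrix shrunk toward the identity (identity mapping, weight beta). The scalar-feature sketch below uses plain neighbor averaging as the propagation and illustrative hyperparameters; the actual GCNII uses a normalized adjacency with self-loops and full weight matrices.

```python
def gcnii_layer(h, h0, neighbors, w, alpha=0.1, beta=0.5):
    """One GCNII-style layer on scalar node features.
    h: current features, h0: layer-0 features, neighbors: set of
    neighbor indices per node, w: scalar stand-in for the weight matrix."""
    # P h: average each node's value over its neighborhood (incl. self).
    ph = [sum(h[u] for u in nbrs | {v}) / (len(nbrs) + 1)
          for v, nbrs in enumerate(neighbors)]
    out = []
    for v in range(len(h)):
        mixed = (1 - alpha) * ph[v] + alpha * h0[v]        # initial residual
        out.append(max(0.0, mixed * ((1 - beta) + beta * w)))  # identity map + ReLU
    return out

h0 = [1.0, 0.0, 0.0]
neighbors = [{1}, {0, 2}, {1}]  # a path graph 0 - 1 - 2
print(gcnii_layer(h0, h0, neighbors, w=1.0))
```

Because a fraction alpha of $h^{(0)}$ is re-injected at every layer, the features cannot collapse to the propagation operator's fixed point, which is the intuition behind the over-smoothing relief.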

JMLR Journal 2019 Journal Article

Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices

  • zengfeng Huang

Given a large matrix $A\in\mathbb{R}^{n\times d}$, we consider the problem of computing a sketch matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still well approximates $A$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space, and we are interested in minimizing the covariance error $\|A^TA-B^TB\|_2$. The popular Frequent Directions algorithm of Liberty (2013) and its variants achieve optimal space-error tradeoffs. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the question by proving that the time complexity of this problem is equivalent to that of matrix multiplication up to lower order terms. Specifically, we provide new space-optimal algorithms with faster running times and also show that the running times of our algorithms can be improved if and only if the state-of-the-art running time of matrix multiplication can be improved significantly.
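
For reference, the classical Frequent Directions algorithm this paper accelerates is short enough to state in full: keep an $\ell \times d$ buffer, and whenever it fills, take an SVD and shrink every squared singular value by the smallest one, freeing at least one row. This sketch restates Liberty's original algorithm and its standard $2\|A\|_F^2/\ell$ error guarantee, not this paper's faster variant.

```python
import numpy as np

def frequent_directions(A, ell):
    """One-pass Frequent Directions sketch with ell rows."""
    B = np.zeros((ell, A.shape[1]))
    for row in A:  # stream the rows of A
        empty = np.flatnonzero(~B.any(axis=1))
        if empty.size == 0:
            # Buffer full: shrink squared singular values by the smallest,
            # which zeroes out at least the last row of B.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            B = np.sqrt(np.maximum(s**2 - s[-1] ** 2, 0.0))[:, None] * Vt
            empty = np.flatnonzero(~B.any(axis=1))
        B[empty[0]] = row
    return B

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30))
B = frequent_directions(A, ell=10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = 2 * np.linalg.norm(A, "fro") ** 2 / 10
print(err <= bound)  # covariance error within the classical FD guarantee
```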

NeurIPS Conference 2019 Conference Paper

Optimal Sparsity-Sensitive Bounds for Distributed Mean Estimation

  • zengfeng Huang
  • Ziyue Huang
  • Yilei Wang
  • Ke Yi

We consider the problem of estimating the mean of a set of vectors, which are stored in a distributed system. This is a fundamental task with applications in distributed SGD and many other distributed problems, where communication is a main bottleneck for scaling up computations. We propose a new sparsity-aware algorithm, which improves previous results both theoretically and empirically. The communication cost of our algorithm is characterized by Hoyer's measure of sparseness. Moreover, we prove that the communication cost of our algorithm is information-theoretically optimal up to a constant factor in all sparseness regimes. We have also conducted experimental studies, which demonstrate the advantages of our method and confirm our theoretical findings.
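
Hoyer's measure of sparseness, which parameterizes the communication cost above, interpolates between 0 for a uniform vector and 1 for a 1-sparse vector using only the $\ell_1$ and $\ell_2$ norms. The function below is just the measure itself, not the estimation protocol.

```python
import math

def hoyer_sparseness(x):
    """Hoyer's sparseness of a nonzero vector x:
    (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), in [0, 1]."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

print(hoyer_sparseness([0.0, 0.0, 0.0, 5.0]))  # 1.0: fully sparse
print(hoyer_sparseness([1.0, 1.0, 1.0, 1.0]))  # 0.0: fully dense
```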

ICML Conference 2018 Conference Paper

Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices

  • Zengfeng Huang

Given a large matrix $A\in\mathbb{R}^{n\times d}$, we consider the problem of computing a sketch matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still well approximates $A$. We are interested in minimizing the covariance error $\|A^TA-B^TB\|_2$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space. The popular Frequent Directions algorithm of Liberty (2013) and its variants achieve optimal space-error tradeoff. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the time complexity of this problem. In particular, we provide new space-optimal algorithms with faster running times. Moreover, we also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.

FOCS Conference 2014 Conference Paper

The Communication Complexity of Distributed epsilon-Approximations

  • Zengfeng Huang
  • Ke Yi 0001

Data summarization is an effective approach to dealing with the "big data" problem. While data summarization problems have traditionally been studied in the streaming model, the focus is starting to shift to distributed models, as distributed/parallel computation seems to be the only viable way to handle today's massive data sets. In this paper, we study ε-approximations, a classical data summary that, intuitively speaking, preserves approximately the density of the underlying data set over a certain range space. We consider the problem of computing ε-approximations for a data set which is held jointly by k players, and give general communication upper and lower bounds that hold for any range space whose discrepancy is known.
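
The definition of an ε-approximation can be checked directly on a tiny instance: over the range space of 1-D intervals, the smallest valid ε is the worst-case gap between a range's density in the full point set and in the sample. This is a toy check of the definition only, not the distributed protocol; restricting interval endpoints to data points is an assumption that suffices here.

```python
from itertools import combinations

def interval_eps(points, sample):
    """Smallest eps for which `sample` is an eps-approximation of
    `points` over closed 1-D intervals with endpoints at data points."""
    endpoints = sorted(set(points))
    eps = 0.0
    for lo, hi in combinations(endpoints, 2):
        in_p = sum(lo <= x <= hi for x in points) / len(points)
        in_s = sum(lo <= x <= hi for x in sample) / len(sample)
        eps = max(eps, abs(in_p - in_s))
    return eps

points = list(range(100))
sample = list(range(0, 100, 10))     # every 10th point
print(interval_eps(points, sample))  # small: interval densities roughly agree
```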