Arrow Research

Author name cluster

Zhichen Zeng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author record

Possible papers (5)

TMLR · 2026 · Journal Article

Pave Your Own Path: Graph Gradual Domain Adaptation on Fused Gromov-Wasserstein Geodesics

  • Zhichen Zeng
  • Ruizhong Qiu
  • Wenxuan Bao
  • Tianxin Wei
  • Xiao Lin
  • Yuchen Yan
  • Tarek F. Abdelzaher
  • Jiawei Han

Graph neural networks, despite their impressive performance, are highly vulnerable to distribution shifts on graphs. Existing graph domain adaptation (graph DA) methods often implicitly assume a mild shift between source and target graphs, limiting their applicability to real-world scenarios with large shifts. Gradual domain adaptation (GDA) has emerged as a promising approach for addressing large shifts by gradually adapting the source model to the target domain via a path of unlabeled intermediate domains. Existing GDA methods focus exclusively on independent and identically distributed (IID) data with a predefined path, leaving their extension to non-IID graphs without a given path an open challenge. To bridge this gap, we present Gadget, the first GDA framework for non-IID graph data. First (theoretical foundation), the Fused Gromov-Wasserstein (FGW) distance is adopted as the domain discrepancy for non-IID graphs, based on which we derive error bounds for node-, edge-, and graph-level tasks, showing that the target domain error is proportional to the length of the path. Second (optimal path), guided by the error bound, we identify the FGW geodesic as the optimal path, which can be efficiently generated by our proposed algorithm. The generated path can be seamlessly integrated with existing graph DA methods to handle large shifts on graphs, improving state-of-the-art graph DA methods by up to 6.8% in accuracy on real-world datasets.
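
For reference, a standard form of the FGW distance between two attributed graphs, which the abstract adopts as its domain discrepancy. The notation below follows the common formulation (Vayer et al.) and is an illustration, not necessarily the paper's exact definition:

$$
\mathrm{FGW}_\alpha(\mathcal{G}_1, \mathcal{G}_2) = \min_{\pi \in \Pi(\mu, \nu)} \; (1-\alpha) \sum_{i,j} d(x_i, y_j)\, \pi_{ij} \;+\; \alpha \sum_{i,j,k,l} \bigl| C_1(i,k) - C_2(j,l) \bigr|\, \pi_{ij}\, \pi_{kl}
$$

Here $\mu, \nu$ are node distributions, $d(x_i, y_j)$ compares node features across graphs, $C_1, C_2$ encode intra-graph structure (e.g., adjacency or shortest-path matrices), and $\alpha \in [0,1]$ trades off the feature (Wasserstein) term against the structure (Gromov-Wasserstein) term.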

NeurIPS · 2025 · Conference Paper

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

  • Yuheng Zhang
  • Dian Yu
  • Tao Ge
  • Linfeng Song
  • Zhichen Zeng
  • Haitao Mi
  • Nan Jiang
  • Dong Yu

Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences. Many existing alignment approaches rely on the Bradley-Terry (BT) model, which assumes the existence of a ground-truth reward for each prompt-response pair. However, this assumption can be overly restrictive when modeling complex human preferences. In this paper, we drop the BT model assumption and study LLM alignment under general preferences, formulated as a two-player game. Drawing on theoretical insights from learning in games, we integrate optimistic online mirror descent into our alignment framework to approximate the Nash policy. Theoretically, we demonstrate that our approach achieves an $\mathcal{O}(T^{-1})$ bound on the duality gap, improving upon the previous $\mathcal{O}(T^{-1/2})$ result. Meanwhile, it enjoys a linear convergence rate in the last iterate, a property not achieved by previous methods. More importantly, we implement our method and show through experiments that it outperforms state-of-the-art RLHF algorithms across multiple representative benchmarks.
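
A minimal sketch of optimistic online mirror descent with an entropy regularizer (i.e., optimistic multiplicative weights) on a toy two-player zero-sum matrix game, to illustrate the optimism term and the last-iterate duality gap the abstract refers to. The game, step size, and all names are illustrative assumptions, not the paper's alignment implementation:

```python
import numpy as np

def omwu(A, eta=0.1, T=2000):
    """Optimistic multiplicative weights on the zero-sum game min_x max_y x^T A y.
    Entropy-regularized mirror descent with the optimistic term 2*g_t - g_{t-1}."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev, gy_prev = A @ y, A.T @ x  # previous-round gradients
    for _ in range(T):
        gx, gy = A @ y, A.T @ x  # row player's losses, column player's payoffs
        x = x * np.exp(-eta * (2 * gx - gx_prev))  # optimistic extrapolation
        y = y * np.exp(eta * (2 * gy - gy_prev))
        x, y = x / x.sum(), y / y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # matching-pennies-style payoffs
x, y = omwu(A)
gap = (A.T @ x).max() - (A @ y).min()  # duality gap of the last iterate
print(x, y, gap)  # both strategies approach uniform; the gap shrinks toward 0
```

The optimistic term reuses the previous gradient as a prediction of the next one; this extrapolation is what yields the faster $\mathcal{O}(T^{-1})$ duality-gap rate and last-iterate convergence in theory.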

NeurIPS · 2025 · Conference Paper

SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling

  • Yizhao Gao
  • Zhichen Zeng
  • DaYou Du
  • Shijie Cao
  • Peiyuan Zhou
  • Jiaxing Qi
  • Junjie Lai
  • Hayden So

Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic complexity hinders efficiency and scalability, especially for long-context processing. A promising approach is to leverage sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics at the attention head level, struggling to adapt dynamically to different contexts efficiently. We propose SeerAttention, a simple yet effective attention mechanism that directly learns the block-level attention sparsity from the LLM itself. Inspired by the gating mechanism in Mixture of Experts (MoE), SeerAttention augments the conventional attention with a learnable gate that selectively activates important blocks within the attention map. Specifically, the gate first pools the query (Q) and key (K) tensors along the sequence dimension and processes them through learnable linear layers. The resulting matrices are then multiplied together to produce the gating scores, which are used to predict block-level attention sparsity. Combined with our block-sparse FlashAttention kernel, SeerAttention can achieve significant speedup on GPUs. When applied to pre-trained LLMs, SeerAttention only requires training the gate parameters in a lightweight self-distillation manner, allowing rapid convergence. Our evaluation results demonstrate that SeerAttention achieves better model accuracy and lower latency for long-context pre-filling compared to prior methods. Code is available at: https://github.com/microsoft/SeerAttention.
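
A rough PyTorch sketch of the gating idea described above: pool Q and K along the sequence dimension per block, project through learnable linear layers, and multiply to obtain block-level gating scores. The module name, pooling choices, and top-k block selection are assumptions for illustration; the official implementation lives at the repository linked above:

```python
import torch
import torch.nn as nn

class BlockAttnGate(nn.Module):
    """Illustrative block-level attention gate in the spirit of SeerAttention."""
    def __init__(self, head_dim, block_size=64, top_k=16):
        super().__init__()
        self.block_size, self.top_k = block_size, top_k
        self.wq = nn.Linear(head_dim, head_dim, bias=False)
        self.wk = nn.Linear(head_dim, head_dim, bias=False)

    def forward(self, q, k):
        # q, k: (batch, heads, seq_len, head_dim); seq_len divisible by block_size
        b, h, s, d = q.shape
        nb = s // self.block_size
        # pool Q and K along the sequence dimension, one vector per block
        q_blk = q.view(b, h, nb, self.block_size, d).mean(dim=3)
        k_blk = k.view(b, h, nb, self.block_size, d).max(dim=3).values
        # learnable projections, then block-by-block gating scores
        scores = self.wq(q_blk) @ self.wk(k_blk).transpose(-1, -2) / d ** 0.5
        # keep the top-k key blocks per query block -> block-sparse mask
        idx = scores.topk(min(self.top_k, nb), dim=-1).indices
        mask = torch.zeros_like(scores, dtype=torch.bool).scatter_(-1, idx, True)
        return mask  # (b, h, nb, nb); feed to a block-sparse attention kernel
```

Per the abstract, only these gate parameters would be trained, distilled from the frozen model's own attention maps rather than from scratch.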

NeurIPS · 2024 · Conference Paper

Discrete-state Continuous-time Diffusion for Graph Generation

  • Zhe Xu
  • Ruizhong Qiu
  • Yuzhong Chen
  • Huiyuan Chen
  • Xiran Fan
  • Menghai Pan
  • Zhichen Zeng
  • Mahashweta Das

Graphs are a prevalent discrete data structure, and their generation has wide applications such as drug discovery and circuit design. Diffusion generative models, as an emerging research focus, have been applied to graph generation tasks. Overall, according to the space of states and time steps, diffusion generative models can be categorized as discrete- or continuous-state and discrete- or continuous-time. In this paper, we formulate graph diffusion generation in a discrete-state continuous-time setting, which has never been studied in previous graph diffusion models. The rationale for such a formulation is to preserve the discrete nature of graph-structured data while providing flexible sampling trade-offs between sample quality and efficiency. Analysis shows that our training objective is closely related to generation quality, and our proposed generation framework enjoys ideal invariant/equivariant properties with respect to the permutation of node ordering. Our proposed model shows competitive empirical performance against other state-of-the-art graph generation solutions on various benchmarks while flexibly trading off generation quality and efficiency in the sampling phase.
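
To make the discrete-state continuous-time setting concrete, here is a minimal sketch of a two-state continuous-time Markov chain acting on a single adjacency entry (edge present or absent); the rate matrix and times are illustrative, not the paper's noise schedule:

```python
import numpy as np
from scipy.linalg import expm

# States for one adjacency entry: 0 = "no edge", 1 = "edge".
R = np.array([[-0.5,  0.5],
              [ 1.0, -1.0]])  # generator (rate) matrix; rows sum to zero

def transition(t):
    """P(t) = exp(t * R): P[i, j] is the probability of being in state j
    after diffusing for continuous time t, starting from state i."""
    return expm(t * R)

print(transition(0.1)[1])  # short time: an existing edge mostly survives
print(transition(5.0)[1])  # long time: close to the chain's stationary law
```

Because time is continuous, a sampler can discretize the reverse process with as many or as few steps as it likes, which is the quality/efficiency trade-off the abstract mentions.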

AAAI · 2024 · Conference Paper

Hierarchical Multi-Marginal Optimal Transport for Network Alignment

  • Zhichen Zeng
  • Boxin Du
  • Si Zhang
  • Yinglong Xia
  • Zhining Liu
  • Hanghang Tong

Finding node correspondence across networks, namely multi-network alignment, is an essential prerequisite for joint learning on multiple networks. Despite great success in aligning networks in pairs, the literature on multi-network alignment is sparse due to the exponentially growing solution space and lack of high-order discrepancy measures. To fill this gap, we propose a hierarchical multi-marginal optimal transport framework named HOT for multi-network alignment. To handle the large solution space, multiple networks are decomposed into smaller aligned clusters via the fused Gromov-Wasserstein (FGW) barycenter. To depict high-order relationships across multiple networks, the FGW distance is generalized to the multi-marginal setting, based on which networks can be aligned jointly. A fast proximal point method is further developed with guaranteed convergence to a local optimum. Extensive experiments and analysis show that our proposed HOT achieves significant improvements over the state-of-the-art in both effectiveness and scalability.
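
As a flavor of the inner solver, here is a minimal sketch of a proximal point method for plain two-marginal entropic OT, where each outer step is a KL-proximal update solved by Sinkhorn iterations; HOT applies this kind of scheme to the multi-marginal FGW objective, so the cost matrix, parameters, and names below are simplifying assumptions:

```python
import numpy as np

def sinkhorn(K, a, b, n_iter=200):
    """Scale kernel K into a coupling with marginals a and b."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def proximal_point_ot(C, a, b, beta=0.1, n_outer=20):
    """Solve min_pi <C, pi> via pi_{k+1} = argmin <C, pi> + beta * KL(pi || pi_k)."""
    pi = np.outer(a, b)  # initial feasible coupling
    for _ in range(n_outer):
        K = pi * np.exp(-C / beta)  # KL-prox kernel around the current iterate
        pi = sinkhorn(K, a, b)
    return pi

# toy alignment cost between two 3-node networks
C = np.random.rand(3, 3)
a = b = np.ones(3) / 3
print(proximal_point_ot(C, a, b))  # approaches an (unregularized) OT plan
```

Unlike a single entropic regularization, the proximal point iteration anneals the bias away across outer steps, which is why such schemes can converge to a local optimum of the original (unregularized) objective.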