Author name cluster

Qiuhao Zeng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

AAAI Conference 2026 Conference Paper

Graph Domain Adaptation via Homophily-Agnostic Reconstructing Structure

Ruiyi Fang
Shuo Wang
Ruizhi Pu
Qiuhao Zeng
Hao Zheng
Ziyan Wang
Jiale Cai
Zhimin Mei

Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. However, existing GDA methods typically assume that both source and target graphs exhibit homophily, leading existing methods to perform poorly when heterophily is present. Furthermore, the lack of labels in the target graph makes it impossible to assess its homophily level beforehand. To address this challenge, we propose a novel homophily-agnostic approach that effectively transfers knowledge between graphs with varying degrees of homophily. Specifically, we adopt a divide-and-conquer strategy that first separately reconstructs highly homophilic and heterophilic variants of both the source and target graphs, and then performs knowledge alignment separately between corresponding graph variants. Extensive experiments conducted on five benchmark datasets demonstrate the superior performance of our approach, particularly highlighting its substantial advantages on heterophilic graphs.

PDF Details DOI

ICML Conference 2025 Conference Paper

Calibrated Language Models and How to Find Them with Label Smoothing

Jerry Huang
Peng Lu
Qiuhao Zeng

Recent advances in natural language processing (NLP) have opened up greater opportunities to enable fine-tuned large language models (LLMs) to behave as more powerful interactive agents through improved instruction-following ability. However, understanding how this impacts confidence calibration for reliable model output has not been researched in full. In this work, we examine various open-sourced LLMs, identifying significant calibration degradation after instruction tuning in each. Seeking a practical solution, we look towards label smoothing, which has been shown as an effective method to regularize for overconfident predictions but has yet to be widely adopted in the supervised fine-tuning (SFT) of LLMs. We first provide insight as to why label smoothing is sufficient to maintain calibration throughout the SFT process. However, settings remain where the effectiveness of smoothing is severely diminished, in particular the case of large vocabulary LLMs (LV-LLMs). We posit the cause to stem from the ability to become over-confident, which has a direct relationship with the hidden size and vocabulary size, and justify this theoretically and experimentally. Finally, we address an outstanding issue regarding the memory footprint of the cross-entropy loss computation in the label smoothed loss setting, designing a customized kernel to dramatically reduce memory consumption without sacrificing speed or performance in comparison to existing solutions for non-smoothed losses.

Details

ICML Conference 2025 Conference Paper

Homophily Enhanced Graph Domain Adaptation

Ruiyi Fang
Bingheng Li
Jingyu Zhao
Ruizhi Pu
Qiuhao Zeng
Gezheng Xu
Charles X. Ling
Boyu Wang 0004

Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. In this paper, we highlight the significance of graph homophily, a pivotal factor for graph domain alignment, which, however, has long been overlooked in existing approaches. Specifically, our analysis first reveals that homophily discrepancies exist in benchmarks. Moreover, we also show that homophily discrepancies degrade GDA performance from both empirical and theoretical aspects, which further underscores the importance of homophily alignment in GDA. Inspired by this finding, we propose a novel homophily alignment algorithm that employs mixed filters to smooth graph signals, thereby effectively capturing and mitigating homophily discrepancies between graphs. Experimental results on a variety of benchmarks verify the effectiveness of our method.

Details

NeurIPS Conference 2025 Conference Paper

Mamba Modulation: On the Length Generalization of Mamba Models

Peng Lu
Jerry Huang
Qiuhao Zeng
Xinyu Wang
Boxing Chen
Philippe Langlais
Yufei Cui

The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art results across a range of language modeling tasks. However, Mamba’s performance significantly deteriorates when applied to contexts longer than those seen during pre-training, revealing a sharp sensitivity to context length extension. Through detailed analysis, we attribute this limitation to the out-of-distribution behavior of its state-space dynamics, particularly within the parameterization of the state transition matrix $A$. Unlike recent works which attribute this sensitivity to the vanished accumulation of discretization time steps, $\exp(-\sum_{t=1}^N{\Delta}_t)$, we establish a connection between state convergence behavior as the input length approaches infinity and the spectrum of the transition matrix $A$, offering a well-founded explanation of its role in length extension. Next, to overcome this challenge, we propose an approach that applies spectrum scaling to pre-trained Mamba models to enable robust long-context generalization by selectively modulating the spectrum of $A$ matrices in each layer. We show that this can significantly improve performance in settings where simply modulating ${\Delta}_t$ fails, validating our insights and providing avenues for better length generalization of state-space models with structured transition matrices.

PDF Details

ICLR Conference 2025 Conference Paper

On the Benefits of Attribute-Driven Graph Domain Adaptation

Ruiyi Fang
Bingheng Li
Zhao Kang 0001
Qiuhao Zeng
Nima Hosseini Dashtbayaz
Ruizhi Pu
Charles X. Ling
Boyu Wang 0004

Graph Domain Adaptation (GDA) addresses a pressing challenge in cross-network learning, particularly pertinent due to the absence of labeled data in real-world graph datasets. Recent studies attempted to learn domain invariant representations by eliminating structural shifts between graphs. In this work, we show that existing methodologies have overlooked the significance of the graph node attribute, a pivotal factor for graph domain alignment. Specifically, we first reveal the impact of node attributes for GDA by theoretically proving that in addition to the graph structural divergence between the domains, the node attribute discrepancy also plays a critical role in GDA. Moreover, we also empirically show that the attribute shift is more substantial than the topology shift, which further underscore the importance of node attribute alignment in GDA. Inspired by this finding, a novel cross-channel module is developed to fuse and align both views between the source and target graphs for GDA. Experimental results on a variety of benchmark verify the effectiveness of our method.

Details

ICLR Conference 2025 Conference Paper

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention

Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles X. Ling
Boyu Wang 0004

Over recent years, the Transformer has become a fundamental building block for sequence modeling architectures. Yet at its core is the use of self-attention, whose memory and computational cost grow quadratically with the sequence length $N$, rendering it prohibitively expensive for long sequences. A promising approach is top-$k$ attention, which selects only the $k$ most relevant tokens and achieves performance comparable to vanilla self-attention while significantly reducing space and computational demands. However, causal masks require the current query token to only attend to past tokens, preventing existing top-$k$ attention methods from efficiently searching for the most relevant tokens in parallel, thereby limiting training efficiency. In this work, we propose ZETA, leveraging Z-Order Curves for Efficient Top-k Attention, to enable parallel querying of past tokens for entire sequences. We first theoretically show that the choice of key and query dimensions involves a trade-off between the curse of dimensionality and the preservation of relative distances after projection. In light of this insight, we propose reducing the dimensionality of keys and queries in contrast to values and further leveraging Z-order curves to map low-dimensional keys and queries into one-dimensional space, which permits parallel sorting, thereby largely improving the efficiency for top-$k$ token selection. Experimental results demonstrate that ZETA~matches the performance of standard attention on synthetic tasks Associative Recall and outperforms attention and its variants on Long-Range Arena and WikiText-103 language modeling.

Details

AAAI Conference 2024 Conference Paper

Generalizing across Temporal Domains with Koopman Operators

Qiuhao Zeng
Wei Wang
Fan Zhou
Gezheng Xu
Ruizhi Pu
Changjian Shui
Christian Gagné
Shichun Yang

In the field of domain generalization, the task of constructing a predictive model capable of generalizing to a target domain without access to target data remains challenging. This problem becomes further complicated when considering evolving dynamics between domains. While various approaches have been proposed to address this issue, a comprehensive understanding of the underlying generalization theory is still lacking. In this study, we contribute novel theoretic results that aligning conditional distribution leads to the reduction of generalization bounds. Our analysis serves as a key motivation for solving the Temporal Domain Generalization (TDG) problem through the application of Koopman Neural Operators, resulting in Temporal Koopman Networks (TKNets). By employing Koopman Neural Operators, we effectively address the time-evolving distributions encountered in TDG using the principles of Koopman theory, where measurement functions are sought to establish linear transition relations between evolving domains. Through empirical evaluations conducted on synthetic and real-world datasets, we validate the effectiveness of our proposed approach.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Latent Trajectory Learning for Limited Timestamps under Distribution Shift over Time

Qiuhao Zeng
Changjian Shui
Long-Kai Huang
Peng Liu
Xi Chen 0009
Charles X. Ling
Boyu Wang 0004

Distribution shifts over time are common in real-world machine-learning applications. This scenario is formulated as Evolving Domain Generalization (EDG), where models aim to generalize well to unseen target domains in a time-varying system by learning and leveraging the underlying evolving pattern of the distribution shifts across domains. However, existing methods encounter challenges due to the limited number of timestamps (every domain corresponds to a timestamp) in EDG datasets, leading to difficulties in capturing evolving dynamics and risking overfitting to the sparse timestamps, which hampers their generalization and adaptability to new tasks. To address this limitation, we propose a novel approach SDE-EDG that collects the Infinitely Fined-Grid Evolving Trajectory (IFGET) of the data distribution with continuous-interpolated samples to bridge temporal gaps (intervals between two successive timestamps). Furthermore, by leveraging the inherent capacity of Stochastic Differential Equations (SDEs) to capture continuous trajectories, we propose their use to align SDE-modeled trajectories with IFGET across domains, thus enabling the capture of evolving distribution trends. We evaluate our approach on several benchmark datasets and demonstrate that it can achieve superior performance compared to existing state-of-the-art methods.

Details

NeurIPS Conference 2024 Conference Paper

Towards Understanding Evolving Patterns in Sequential Data

Qiuhao Zeng
Long-Kai Huang
Qi Chen
Charles Ling
Boyu Wang

In many machine learning tasks, data is inherently sequential. Most existing algorithms learn from sequential data in an auto-regressive manner, which predicts the next unseen data point based on the observed sequence, implicitly assuming the presence of an \emph{evolving pattern} embedded in the data that can be leveraged. However, identifying and assessing evolving patterns in learning tasks often relies on subjective judgments rooted in the prior knowledge of human experts, lacking a standardized quantitative measure. Furthermore, such measures enable us to determine the suitability of employing sequential models effectively and make informed decisions on the temporal order of time series data, and feature/data selection processes. To address this issue, we introduce the Evolving Rate (EvoRate), which quantitatively approximates the intensity of evolving patterns in the data with Mutual Information. Furthermore, in some temporal data with neural mutual information estimations, we only have snapshots at different timestamps, lacking correspondence, which hinders EvoRate estimation. To tackle this challenge, we propose EvoRate$_\mathcal{W}$, aiming to establish correspondence with optimal transport for estimating the first-order EvoRate. Experiments on synthetic and real-world datasets including images and tabular data validate the efficacy of our EvoRate.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Foresee What You Will Learn: Data Augmentation for Domain Generalization in Non-stationary Environment

Qiuhao Zeng
Wei Wang
Fan Zhou
Charles Ling
Boyu Wang

Existing domain generalization aims to learn a generalizable model to perform well even on unseen domains. For many real-world machine learning applications, the data distribution often shifts gradually along domain indices. For example, a self-driving car with a vision system drives from dawn to dusk, with the sky gradually darkening. Therefore, the system must be able to adapt to changes in ambient illuminations and continue to drive safely on the road. In this paper, we formulate such problems as Evolving Domain Generalization, where a model aims to generalize well on a target domain by discovering and leveraging the evolving pattern of the environment. We then propose Directional Domain Augmentation (DDA), which simulates the unseen target features by mapping source data as augmentations through a domain transformer. Specifically, we formulate DDA as a bi-level optimization problem and solve it through a novel meta-learning approach in the representation space. We evaluate the proposed method on both synthetic datasets and real-world datasets, and empirical results show that our approach can outperform other existing methods.

PDF Details DOI