Arrow Research search

Author name cluster

Jingrui He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers

52

TMLR Journal 2026 Journal Article

Bi-level Hierarchical Neural Contextual Bandits for Online Recommendation

  • Yunzhe Qi
  • Yao Zhou
  • Yikun Ban
  • Allan Stewart
  • Chuanwei Ruan
  • Jiachuan He
  • Shishir Kumar Prasad
  • Haixun Wang

Contextual bandit algorithms aim to identify the optimal choice among a set of candidate arms, based on their contextual information. Among others, neural contextual bandit algorithms have demonstrated generally superior performance compared to conventional linear and kernel-based methods. Nevertheless, neural methods can be inherently unsuitable for handling a large number of candidate arms due to their high computational cost when performing principled exploration. Motivated by the widespread availability of arm category information (e.g., movie genres, retailer types), we formulate contextual bandits as a bi-level online recommendation problem, and propose a novel neural bandit framework, named $\text{H}_{2}\text{N-Bandit}$, which utilizes a bi-level hierarchical neural architecture to mitigate the substantial computational cost found in conventional neural bandit methods. To demonstrate its theoretical effectiveness, we provide regret analysis under general over-parameterization settings, along with a guarantee for category-level recommendation. To illustrate its effectiveness and efficiency, we conduct extensive experiments on multiple real-world data sets, highlighting that $\text{H}_{2}\text{N-Bandit}$ can significantly reduce the computational cost over existing strong non-linear baselines, while achieving better or comparable performance under online recommendation settings.

TMLR Journal 2026 Journal Article

DiffKGW: Stealthy and Robust Diffusion Model Watermarking

  • Tianxin Wei
  • Ruizhong Qiu
  • Yifan Chen
  • Yunzhe Qi
  • Jiacheng Lin
  • Wenxuan Bao
  • Wenju Xu
  • Sreyashi Nag

Diffusion models are known for their supreme capability to generate realistic images. However, ethical concerns, such as copyright protection and the generation of inappropriate content, pose significant challenges for the practical deployment of diffusion models. Recent work has proposed a flurry of watermarking techniques that inject artificial patterns into initial latent representations of diffusion models, offering a promising solution to these issues. However, enforcing a specific pattern on selected elements can disrupt the Gaussian distribution of the initial latent representation. Inspired by watermarks for large language models (LLMs), we generalize the LLM KGW watermark to image diffusion models and propose a stealthy probability adjustment approach DiffKGW that preserves the Gaussian distribution of initial latent representation. In addition, we dissect the design principles of state-of-the-art watermarking techniques and introduce a unified framework. We identify a set of dimensions that explain the manipulation enforced by watermarking methods, including the distribution of individual elements, the specification of watermark shapes within each channel, and the choice of channels for watermark embedding. Through the empirical studies on regular text-to-image applications and the first systematic attempt at watermarking image-to-image diffusion models, we thoroughly verify the effectiveness of our proposed framework through comprehensive evaluations. On all the diffusion models, including Stable Diffusion, our approach induced from the proposed framework not only preserves image quality but also outperforms existing methods in robustness against a wide range of attacks.

AAAI Conference 2026 Conference Paper

Panda: Test-Time Adaptation with Negative Data Augmentation

  • Ruxi Deng
  • Wenxuan Bao
  • Tianxin Wei
  • Jingrui He

Pretrained vision-language models exhibit strong zero-shot classification capabilities, but their predictions degrade significantly under common image corruptions. To improve robustness, many test-time adaptation (TTA) methods adopt positive data augmentation (PDA), which generates multiple views of each test sample to reduce prediction variance. However, these methods suffer from two key limitations. First, it introduces considerable computational overhead due to the large number of augmentations required per image. Second, it fails to mitigate prediction bias, where the model tends to predict certain classes disproportionately under corruption, as PDA operates on corrupted inputs and typically does not remove the corruption itself. To address these challenges, we propose Panda, a novel TTA method based on negative data augmentation (NDA). Unlike positive augmentations that preserve object semantics, Panda generates negative augmentations by disrupting semantic content. It divides images into patches and randomly assembles them from a shared patch pool. These negatively augmented images retain corruption-specific features while discarding object-relevant signals. We then subtract the mean feature of these negative samples from the original image feature, effectively suppressing corruption-related components while preserving class-relevant information. This mitigates prediction bias under distribution shifts. Importantly, Panda allows augmentation to be shared across samples within a batch, resulting in minimal computational overhead. Panda can be seamlessly integrated into existing test-time adaptation frameworks and substantially improve their robustness. Our experiments indicate that Panda delivers superior performance compared to PDA methods, and a wide range of TTA methods exhibit significantly enhanced performance when integrated with Panda.

ICML Conference 2025 Conference Paper

Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting

  • Zhining Liu 0002
  • Ze Yang
  • Xiao Lin 0016
  • Ruizhong Qiu
  • Tianxin Wei
  • Yada Zhu
  • Hendrik F. Hamann
  • Jingrui He

Time-series forecasting plays a critical role in many real-world applications. Although increasingly powerful models have been developed and achieved superior results on benchmark datasets, through a fine-grained sample-level inspection, we find that (i) no single model consistently outperforms others across different test samples, but instead (ii) each model excels in specific cases. These findings prompt us to explore how to adaptively leverage the distinct strengths of various forecasting models for different samples. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models. TimeFuse utilizes meta-features to characterize input time series and trains a learnable fusor to predict optimal model fusion weights for any given input. The fusor can leverage samples from diverse datasets for joint training, allowing it to adapt to a wide variety of temporal patterns and thus generalize to new inputs, even from unseen datasets. Extensive experiments demonstrate the effectiveness of TimeFuse in various long-/short-term forecasting tasks, achieving near-universal improvement over the state-of-the-art individual models. Code is available at https: //github. com/ZhiningLiu1998/TimeFuse.

NeurIPS Conference 2025 Conference Paper

CLIMB: Class-imbalanced Learning Benchmark on Tabular Data

  • Zhining Liu
  • Zihao Li
  • Ze Yang
  • Tianxin Wei
  • Jian Kang
  • Yada Zhu
  • Hendrik Hamann
  • Jingrui He

Class-imbalanced learning (CIL) on tabular data is important in many real-world applications where the minority class holds the critical but rare outcomes. In this paper, we present CLIMB, a comprehensive benchmark for class-imbalanced learning on tabular data. CLIMB includes 73 real-world datasets across diverse domains and imbalance levels, along with unified implementations of 29 representative CIL algorithms. Built on a high-quality open-source Python package with unified API designs, detailed documentation, and rigorous code quality controls, CLIMB supports easy implementation and comparison between different CIL algorithms. Through extensive experiments, we provide practical insights on method accuracy and efficiency, highlighting the limitations of naive rebalancing, the effectiveness of ensembles, and the importance of data quality. Our code, documentation, and examples are available at https: //github. com/ZhiningLiu1998/imbalanced-ensemble.

ICML Conference 2025 Conference Paper

Graph4MM: Weaving Multimodal Learning with Structural Information

  • Xuying Ning
  • Dongqi Fu
  • Tianxin Wei
  • Wujiang Xu
  • Jingrui He

Real-world multimodal data usually exhibit complex structural relationships beyond traditional one-to-one mappings like image-caption pairs. Entities across modalities interact in intricate ways, with images and text forming diverse interconnections through contextual dependencies and co-references. Graphs provide powerful structural information for modeling intra-modal and inter-modal relationships. However, previous works fail to distinguish multi-hop neighbors and treat the graph as a standalone modality, which fragments the overall understanding. This limitation presents two key challenges in multimodal learning: (1) integrating structural information from multi-hop neighbors into foundational models, and (2) fusing modality-specific information in a principled manner. To address these challenges, we revisit the role of graphs in multimodal learning within the era of foundation models and propose Graph4MM, a graph-based multimodal learning framework. To be specific, we introduce Hop-Diffused Attention, which integrates multi-hop structural information into self-attention through causal masking and hop diffusion. Furthermore, we design MM-QFormer, a multi-mapping querying transformer for cross-modal fusion. Through theoretical and empirical analysis, we show that leveraging structures to integrate both intra- and inter-modal interactions improves multimodal understanding beyond treating them as a standalone modality. Experiments on both generative and discriminative tasks show that Graph4MM outperforms larger VLMs, LLMs, and multimodal graph baselines, achieving a 6. 93% average improvement.

TMLR Journal 2025 Journal Article

Hypergraphs as Weighted Directed Self-Looped Graphs: Spectral Properties, Clustering, Cheeger Inequality

  • Zihao Li
  • Dongqi Fu
  • Hengyu Liu
  • Jingrui He

Hypergraphs naturally arise when studying group relations and have been widely used in the field of machine learning. To the best of our knowledge, the recently proposed edge-dependent vertex weights (EDVW) modeling is one of the most generalized modeling methods of hypergraphs, i.e., most existing hypergraph conceptual modeling methods can be generalized as EDVW hypergraphs without information loss. However, the relevant algorithmic developments on EDVW hypergraphs remain nascent: compared to the spectral theories for graphs, its formulations are incomplete, the spectral clustering algorithms are not well-developed, and the hypergraph Cheeger Inequality is not well-defined. To this end, deriving a unified random walk-based formulation, we propose our definitions of hypergraph Rayleigh Quotient, NCut, boundary/cut, volume, and conductance, which are consistent with the corresponding definitions on graphs. Then, we prove that the normalized hypergraph Laplacian is associated with the NCut value, which inspires our proposed HyperClus-G algorithm for spectral clustering on EDVW hypergraphs. Finally, we prove that HyperClus-G can always find an approximately linearly optimal partitioning in terms of both NCut and conductance. Additionally, we provide extensive experiments to validate our theoretical findings from an empirical perspective.

ICML Conference 2025 Conference Paper

Learnable Spatial-Temporal Positional Encoding for Link Prediction

  • Katherine Tieu
  • Dongqi Fu
  • Zihao Li 0006
  • Ross Maciejewski
  • Jingrui He

Accurate predictions rely on the expressiveness power of graph deep learning frameworks like graph neural networks and graph transformers, where a positional encoding mechanism has become much more indispensable in recent state-of-the-art (SOTA) works to record the canonical position information. However, the current positional encoding limits in three aspects, at least: (1) most positional encodings are pre-defined, and fixed functions, which are inadequate to adapt to the complex attributed graphs; (2) a few pioneering works propose the learnable positional encoding but still limited to the structural information, leaving the real-world time-evolving topological and feature information untouched; (3) most positional encodings should be equipped with transformer’s attention mechanism to fully release the power, where the dense or relational attention is often unaffordable on large-scale structured data. Hence, we study the possibility of Learnable Spatial-Temporal Positional Encoding in an effective and efficient manner and then propose a simple temporal link prediction model named L-STEP. Briefly, for L-STEP, we (1) prove the proposed positional learning scheme can preserve the graph property from the spatial-temporal spectral viewpoint, (2) verify that MLPs can fully exploit the expressiveness and reach Transformers’ performance on that encoding, (3) change different initial positional encoding inputs to show robustness, (4) analyze the theoretical complexity and obtain less empirical running time than SOTA, and (5) demonstrate its temporal link prediction out-performance on 13 classic datasets and with 10 algorithms in both transductive and inductive settings using 3 different sampling strategies. Also, L-STEP obtains the leading performance in the newest large-scale TGB benchmark.

ICLR Conference 2025 Conference Paper

Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation

  • Wenxuan Bao
  • Zhichen Zeng 0001
  • Zhining Liu 0002
  • Hanghang Tong
  • Jingrui He

Powerful as they are, graph neural networks (GNNs) are known to be vulnerable to distribution shifts. Recently, test-time adaptation (TTA) has attracted attention due to its ability to adapt a pre-trained model to a target domain, without re-accessing the source domain. However, existing TTA algorithms are primarily designed for attribute shifts in vision tasks, where samples are independent. These methods perform poorly on graph data that experience structure shifts, where node connectivity differs between source and target graphs. We attribute this performance gap to the distinct impact of node attribute shifts versus graph structure shifts: the latter significantly degrades the quality of node representations and blurs the boundaries between different node categories. To address structure shifts in graphs, we propose Matcha, an innovative framework designed for effective and efficient adaptation to structure shifts by adjusting the htop-aggregation parameters in GNNs. To enhance the representation quality, we design a prediction-informed clustering loss to encourage the formation of distinct clusters for different node categories. Additionally, Matcha seamlessly integrates with existing TTA algorithms, allowing it to handle attribute shifts effectively while improving overall performance under combined structure and attribute shifts. We validate the effectiveness of Matcha on both synthetic and real-world datasets, demonstrating its robustness across various combinations of structure and attribute shifts. Our code is available at https://github.com/baowenxuan/Matcha.

NeurIPS Conference 2025 Conference Paper

Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions

  • Wenxuan Bao
  • Ruxi Deng
  • Jingrui He

Pretrained vision-language models such as CLIP achieve strong zero-shot generalization but remain vulnerable to distribution shifts caused by input corruptions. In this work, we investigate how corruptions affect CLIP’s image embeddings and uncover a consistent phenomenon we term as embedding variance collapse, where both intra-class and inter-class variances shrink as corruption severity increases. We find that this collapse is closely tied to performance degradation, with inter-class variance strongly correlated with classification accuracy. To explain this phenomenon, we analyze how corruptions alter the structure of the embedding space. Our theoretical results suggest that the visual encoder tends to encode corruption-related signals, which dilute class-discriminative features and compress the representation geometry. We further show that maximizing inter-class variance, even when estimated from pseudo-labels, can provably enhance embedding quality. Based on this insight, we propose Mint, a simple test-time adaptation method that maximizes pseudo-label-based inter-class variance on the fly using a mean accumulator and a gradient accumulator. Mint operates effectively with small batch sizes and consistently improves performance across multiple corruption benchmarks and CLIP architectures. Our code is available at https: //github. com/baowenxuan/Mint.

NeurIPS Conference 2025 Conference Paper

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

  • Jiaru Zou
  • Ling Yang
  • Jingwen Gu
  • Jiahao Qiu
  • Ke Shen
  • Jingrui He
  • Mengdi Wang

Process Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs). Previous PRMs are primarily trained on model final output responses and struggle to evaluate intermediate thinking trajectories robustly, especially in the emerging setting of trajectory–response outputs generated by frontier reasoning models like Deepseek-R1. In this work, we introduce ReasonFlux-PRM, a novel trajectory-aware PRM explicitly designed to evaluate the trajectory-response type of reasoning traces. ReasonFlux-PRM incorporates both step-level and trajectory-level supervision, enabling fine-grained reward assignment aligned with structured chain-of-thought data. We adapt ReasonFlux-PRM to support reward supervision under both offline and online settings, including (i) selecting high-quality model distillation data for downstream supervised fine-tuning of smaller models, (ii) providing dense process-level rewards for policy optimization during reinforcement learning, and (iii) enabling reward-guided Best-of-N test-time scaling. Empirical results on challenging downstream benchmarks such as AIME, MATH500, and GPQA-Diamond demonstrate that ReasonFlux-PRM-7B selects higher quality data than strong PRMs (e. g. , Qwen2. 5-Math-PRM-72B) and human-curated baselines. Furthermore, ReasonFlux-PRM-7B yields consistent performance improvements, achieving average gains of 12. 1\% in supervised fine-tuning, 4. 5\% in reinforcement learning, and 6. 3\% in test-time scaling. We also release an efficient ReasonFlux-PRM-1. 5B for resource-constrained applications and edge deployment. Our code and models are released at https: //github. com/Gen-Verse/ReasonFlux.

ICLR Conference 2025 Conference Paper

Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency

  • Xinyu He 0003
  • Dongqi Fu
  • Hanghang Tong
  • Ross Maciejewski
  • Jingrui He

Nowadays, temporal heterogeneous graphs attract much research and industrial attention for building the next-generation Relational Deep Learning models and applications, due to their informative structures and features. While providing timely and precise services like personalized recommendations and question answering, this rich information also introduces extra exposure risk for each node in the graph. The distinctive local topology, the abundant heterogeneous features, and the time dimension of the graph data are more prone to expose sensitive information and narrow down the scope of victim candidates, which calls for well-defined protection techniques on graphs. To this end, we propose a Temporal Heterogeneous Graph Generator balancing Privacy, Utility, and Efficiency, named THePUff. More specifically, we first propose a differential privacy algorithm to perturb the input temporal heterogeneous graph for protecting privacy, and then utilize both the perturbed graph and the original one in a generative adversarial setting for THePUff to learn and generate privacy-guaranteed and utility-preserved graph data in an efficient manner. We further propose 6 new metrics in the temporal setting to measure heterogeneous graph utility and privacy. Finally, based on temporal heterogeneous graph datasets with up to 1 million nodes and 20 million edges, the experiments show that THePUff generates utilizable temporal heterogeneous graphs with privacy protected, compared with state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

  • Jiaru Zou
  • Yikun Ban
  • Zihao Li
  • Yunzhe Qi
  • Ruizhong Qiu
  • Ling Yang
  • Jingrui He

Large language models are typically adapted to downstream tasks through supervised fine-tuning on domain-specific data. While standard fine-tuning focuses on minimizing generation loss to optimize model parameters, we take a deeper step by retaining and leveraging the model’s own learning signals, analogous to how human learners reflect on past mistakes to improve future performance. We first introduce the concept of Mistake Log to systematically track the model’s learning behavior and recurring errors throughout fine-tuning. Treating the original transformer-based model as the Pilot, we correspondingly design a Copilot model to refine the Pilot’s inference performance via logits rectification. We name the overall Pilot-Copilot framework the Transformer Copilot, which introduces (i) a novel Copilot model design, (ii) a joint training paradigm where the Copilot continuously learns from the evolving Mistake Log alongside the Pilot, and (iii) a fused inference paradigm where the Copilot rectifies the Pilot’s logits for enhanced generation. We provide both theoretical and empirical analyses on our new learning framework. Experiments on 12 benchmarks spanning commonsense, arithmetic, and recommendation tasks demonstrate that Transformer Copilot consistently improves performance by up to 34. 5%, while introducing marginal computational overhead to Pilot models and exhibiting strong scalability and transferability. Our code is released at https: //github. com/jiaruzouu/TransformerCopilot.

JAIR Journal 2025 Journal Article

Trustworthy Transfer Learning: A Survey

  • Jun Wu
  • Jingrui He

Transfer learning aims to transfer knowledge or information from a source domain to a relevant target domain. In this paper, we understand transfer learning from the perspectives of knowledge transferability and trustworthiness. This involves two research questions: How is knowledge transferability quantitatively measured and enhanced across domains? Can we trust the transferred knowledge in the transfer learning process? To answer these questions, this paper provides a comprehensive review of trustworthy transfer learning from various aspects, including problem definitions, theoretical analysis, empirical algorithms, and real-world applications. Specifically, we summarize recent theories and algorithms for understanding knowledge transferability under (within-domain) IID and non-IID assumptions. In addition to knowledge transferability, we review the impact of trustworthiness on transfer learning, e.g., whether the transferred knowledge is adversarially robust or algorithmically fair, how to transfer the knowledge under privacy-preserving constraints, etc. Beyond discussing the current advancements, we highlight the open questions and future directions for understanding transfer learning in a reliable and trustworthy manner.

ICML Conference 2024 Conference Paper

Class-Imbalanced Graph Learning without Class Rebalancing

  • Zhining Liu 0002
  • Ruizhong Qiu
  • Zhichen Zeng 0001
  • Hyunsik Yoo
  • David Zhou
  • Zhe Xu 0007
  • Yada Zhu
  • Kommy Weldemariam

Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models. Most existing studies are rooted in a class-rebalancing (CR) perspective and address class imbalance with class-wise reweighting or resampling. In this work, we approach the root cause of class-imbalance bias from an topological paradigm. Specifically, we theoretically reveal two fundamental phenomena in the graph topology that greatly exacerbate the predictive bias stemming from class imbalance. On this basis, we devise a lightweight topological augmentation framework BAT to mitigate the class-imbalance bias without class rebalancing. Being orthogonal to CR, BAT can function as an efficient plug-and-play module that can be seamlessly combined with and significantly boost existing CR techniques. Systematic experiments on real-world imbalanced graph learning tasks show that BAT can deliver up to 46. 27% performance gain and up to 72. 74% bias reduction over existing techniques. Code, examples, and documentations are available at https: //github. com/ZhiningLiu1998/BAT.

ICLR Conference 2024 Conference Paper

Contextual Bandits with Online Neural Regression

  • Rohan Deb
  • Yikun Ban
  • Shiliang Zuo
  • Jingrui He
  • Arindam Banerjee 0001

Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption (Foster and Rakhlin, 2020; Foster and Krishnamurthy, 2021). In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ regret for online regression with square loss, which via the reduction implies a ${\mathcal{O}}(\sqrt{K} T^{3/4})$ regret for NeuCBs. Departing from this standard approach, we first show a $\mathcal{O}(\log T)$ regret for online regression with almost convex losses that satisfy QG (Quadratic Growth) condition, a generalization of the PL (Polyak-\L ojasiewicz) condition, and that have a unique minima. Although not directly applicable to wide networks since they do not have unique minima, we show that adding a suitable small random perturbation to the network predictions surprisingly makes the loss satisfy QG with unique minima. Based on such a perturbed prediction, we show a ${\mathcal{O}}(\log T)$ regret for online regression with both squared loss and KL loss, and subsequently convert these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCB, where $L^*$ is the loss of the best policy. Separately, we also show that existing regret bounds for NeuCBs are $\Omega(T)$ or assume i.i.d. contexts, unlike this work. Finally, our experimental results on various datasets demonstrate that our algorithms, especially the one based on KL loss, persistently outperform existing algorithms.

TMLR Journal 2024 Journal Article

DrGNN: Deep Residual Graph Neural Network with Contrastive Learning

  • Lecheng Zheng
  • Dongqi Fu
  • Ross Maciejewski
  • Jingrui He

Recent studies reveal that deep representation learning models without proper regularization can suffer from the dimensional collapse problem, i.e., representation vectors span over a lower dimensional space. In the domain of graph deep representation learning, the phenomenon that the node representations are indistinguishable and even shrink to a constant vector is called oversmoothing. Based on the analysis of the rank of node representations, we find that representation oversmoothing and dimensional collapse are highly related to each other in deep graph neural networks, and the oversmoothing problem can be interpreted by the dimensional collapse of the node representation matrix. Then, to address the dimensional collapse and the oversmoothing together in deep graph neural networks, we first find vanilla residual connections and contrastive learning producing sub-optimal outcomes by ignoring the structured constraints of graph data. Motivated by this, we propose a novel graph neural network named DrGNN to alleviate the oversmoothing issue from the perspective of addressing dimensional collapse. Specifically, in DrGNN, we design a topology-preserving residual connection for graph neural networks to force the low-rank of hidden representations close to the full-rank input features. Also, we propose the structure-guided contrastive learning to ensure only close neighbors who share similar local connections can have similar representations. Empirical experiments on multiple real-world datasets demonstrate that DrGNN outperforms state-of-the-art deep graph representation baseline algorithms. The code of our method is available at the GitHub link: https://github.com/zhenglecheng/DrGNN.

ICML Conference 2024 Conference Paper

Graph Mixup on Approximate Gromov-Wasserstein Geodesics

  • Zhichen Zeng 0001
  • Ruizhong Qiu
  • Zhe Xu 0007
  • Zhining Liu 0002
  • Yuchen Yan
  • Tianxin Wei
  • Lei Ying 0001
  • Jingrui He

Mixup, which generates synthetic training samples on the data manifold, has been shown to be highly effective in augmenting Euclidean data. However, finding a proper data manifold for graph data is non-trivial, as graphs are non-Euclidean data in disparate spaces. Though efforts have been made, most of the existing graph mixup methods neglect the intrinsic geodesic guarantee, thereby generating inconsistent sample-label pairs. To address this issue, we propose GeoMix to mixup graphs on the Gromov-Wasserstein (GW) geodesics. A joint space over input graphs is first defined based on the GW distance, and graphs are then transformed into the GW space through equivalence-preserving transformations. We further show that the linear interpolation of the transformed graph pairs defines a geodesic connecting the original pairs on the GW manifold, hence ensuring the consistency between generated samples and labels. An accelerated mixup algorithm on the approximate low-dimensional GW manifold is further proposed. Extensive experiments show that the proposed GeoMix promotes the generalization and robustness of GNN models.

ICLR Conference 2024 Conference Paper

Neural Active Learning Beyond Bandits

  • Yikun Ban
  • Ishika Agarwal
  • Ziwei Wu
  • Yada Zhu
  • Kommy Weldemariam
  • Hanghang Tong
  • Jingrui He

We study both stream-based and pool-based active learning with neural network approximations. A recent line of works proposed bandit-based approaches that transformed active learning into a bandit problem, achieving both theoretical and empirical success. However, the performance and computational costs of these methods may be susceptible to the number of classes, denoted as $K$, due to this transformation. Therefore, this paper seeks to answer the question: "How can we mitigate the adverse impacts of $K$ while retaining the advantages of principled exploration and provable performance guarantees in active learning?" To tackle this challenge, we propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning. Subsequently, we provide theoretical performance guarantees for both algorithms in a non-parametric setting, demonstrating a slower error-growth rate concerning $K$ for the proposed approaches. We use extensive experiments to evaluate the proposed algorithms, which consistently outperform state-of-the-art baselines.

NeurIPS Conference 2024 Conference Paper

PageRank Bandits for Link Prediction

  • Yikun Ban
  • Jiaru Zou
  • Zihao Li
  • Yunzhe Qi
  • Dongqi Fu
  • Jian Kang
  • Hanghang Tong
  • Jingrui He

Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion. Numerous research efforts have been directed at solving this problem, including approaches based on similarity metrics and Graph Neural Networks (GNN). However, most existing solutions are still rooted in conventional supervised learning, which makes it challenging to adapt over time to changing customer interests and to address the inherent dilemma of exploitation versus exploration in link prediction. To tackle these challenges, this paper reformulates link prediction as a sequential decision-making process, where each link prediction interaction occurs sequentially. We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration. We also introduce a new reward formulation and provide a theoretical performance guarantee for PRB. Finally, we extensively evaluate PRB in both online and offline settings, comparing it with bandit-based and graph-based methods. The empirical success of PRB demonstrates the value of the proposed fusion approach. Our code is released at https: //github. com/jiaruzouu/PRB.

NeurIPS Conference 2024 Conference Paper

Robust Neural Contextual Bandit against Adversarial Corruptions

  • Yunzhe Qi
  • Yikun Ban
  • Arindam Banerjee
  • Jingrui He

Contextual bandit algorithms aim to identify the optimal arm with the highest reward among a set of candidates, based on the accessible contextual information. Among these algorithms, neural contextual bandit methods have shown generally superior performances against linear and kernel ones, due to the representation power of neural networks. However, similar to other neural network applications, neural bandit algorithms can be vulnerable to adversarial attacks or corruptions on the received labels (i. e. , arm rewards), which can lead to unexpected performance degradation without proper treatments. As a result, it is necessary to improve the robustness of neural bandit models against potential reward corruptions. In this work, we propose a novel neural contextual bandit algorithm named R-NeuralUCB, which utilizes a novel context-aware Gradient Descent (GD) training strategy to improve the robustness against adversarial reward corruptions. Under over-parameterized neural network settings, we provide regret analysis for R-NeuralUCB to quantify reward corruption impacts, without the commonly adopted arm separateness assumption in existing neural bandit works. We also conduct experiments against baselines on real data sets under different scenarios, in order to demonstrate the effectiveness of our proposed R-NeuralUCB.

NeurIPS Conference 2024 Conference Paper

Temporal Graph Neural Tangent Kernel with Graphon-Guaranteed

  • Katherine Tieu
  • Dongqi Fu
  • Yada Zhu
  • Hendrik Hamann
  • Jingrui He

_Graph Neural Tangent Kernel_ (GNTK) fuses graph neural networks and graph kernels, simplifies the process of graph representation learning, interprets the training dynamics of graph neural networks, and serves various applications like protein identification, image segmentation, and social network analysis. In practice, graph data carries complex information among entities that inevitably evolves over time, and previous static graph neural tangent kernel methods may be stuck in the sub-optimal solution in terms of both effectiveness and efficiency. As a result, extending the advantage of GNTK to temporal graphs becomes a critical problem. To this end, we propose the temporal graph neural tangent kernel, which not only extends the simplicity and interpretation ability of GNTK to the temporal setting but also leads to rigorous temporal graph classification error bounds. Furthermore, we prove that when the input temporal graph grows over time in the number of nodes, our temporal graph neural tangent kernel will converge in the limit to the _graphon_ NTK value, which implies the transferability and robustness of the proposed kernel method, named **Temp**oral **G**raph **N**eural **T**angent **K**ernel with **G**raphon-**G**uaranteed or **Temp-G$^3$NTK**. In addition to the theoretical analysis, we also perform extensive experiments, not only demonstrating the superiority of Temp-G$^3$NTK in the temporal graph classification task, but also showing that Temp-G^3NTK can achieve very competitive performance in node-level tasks like node classification compared with various SOTA graph kernel and representation learning baselines. Our code is available at https: //github. com/kthrn22/TempGNTK.

NeurIPS Conference 2024 Conference Paper

Towards Editing Time Series

  • Baoyu Jing
  • Shuqi Gu
  • Tianyu Chen
  • Zhiyu Yang
  • Dongsheng Li
  • Jingrui He
  • Kan Ren

Synthesizing time series data is pivotal in modern society, aiding effective decision making and ensuring privacy preservation in various scenarios. Time series are associated with various attributes, including trends, seasonality, and external information such as location. Recent research has predominantly focused on random unconditional synthesis or conditional synthesis. Nonetheless, these paradigms generate time series from scratch and are incapable of manipulating existing time series samples. This paper introduces a novel task, called Time Series Editing (TSE), to synthesize time series by manipulating existing time series. The objective is to modify the given time series according to the specified attributes while preserving other properties unchanged. This task is not trivial due to the inadequacy of data coverage and the intricate relationships between time series and their attributes. To address these issues, we introduce a novel diffusion model, called TEdit. The proposed TEdit is trained using a novel bootstrap learning algorithm that effectively enhances the coverage of the original data. It is also equipped with an innovative multi-resolution modeling and generation paradigm to capture the complex relationships between time series and their attributes. Experimental results demonstrate the efficacy of TEdit for editing specified attributes upon the existing time series data. The project page is at https: //seqml. github. io/tse.

ICLR Conference 2024 Conference Paper

Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

  • Tianxin Wei
  • Bowen Jin
  • Ruirui Li 0002
  • Hansi Zeng
  • Zhengyang Wang
  • Jianhui Sun
  • Qingyu Yin
  • Hanqing Lu

Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on ID or text-based recommendation problems, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements in foundational generative modeling have provided the flexibility and effectiveness necessary to achieve the objective. In light of this, we develop a generic and extensible personalization generative framework, that can handle a wide range of personalized needs including item recommendation, product search, preference prediction, explanation generation, and further user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on the real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.

ICLR Conference 2024 Conference Paper

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

  • Dongqi Fu
  • Zhigang Hua
  • Yan Xie
  • Jin Fang
  • Si Zhang
  • Kaan Sancak
  • Hao Wu
  • Andrey Malevich

Graph transformer has been proven as an effective graph learning method for its adoption of attention mechanism that is capable of capturing expressive representations from complex topological and feature information of graphs. Graph transformer conventionally performs dense attention (or global attention) for every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Therefore, mini-batch training for graph transformers is a promising direction, but limited samples in each mini-batch can not support effective dense attention to encode informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and makes heavy feature engineering offline and independent, such that mini-batch training of graph transformers is possible by loading each node's token list in batches. We further prove this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, only using personalized PageRank may limit information carried by a token list, which could not support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes that enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our $\underline{\textbf{V}}$irtual $\underline{\textbf{C}}$onnection $\underline{\textbf{R}}$anking based $\underline{\textbf{Graph}}$ Trans$\underline{\textbf{former}}$ (VCR-Graphormer). Overall, VCR-Graphormer needs $O(m+klogk)$ complexity for graph tokenization as compared to $O(n^{3})$ of previous works. The [code](https://github.com/DongqiFu/VCR-Graphormer) is provided.

NeurIPS Conference 2023 Conference Paper

Adaptive Test-Time Personalization for Federated Learning

  • Wenxuan Bao
  • Tianxin Wei
  • Haohan Wang
  • Jingrui He

Personalized federated learning algorithms have shown promising results in adapting models to various distribution shifts. However, most of these methods require labeled data on testing clients for personalization, which is usually unavailable in real-world scenarios. In this paper, we introduce a novel setting called test-time personalized federated learning (TTPFL), where clients locally adapt a global model in an unsupervised way without relying on any labeled data during test-time. While traditional test-time adaptation (TTA) can be used in this scenario, most of them inherently assume training data come from a single domain, while they come from multiple clients (source domains) with different distributions. Overlooking these domain interrelationships can result in suboptimal generalization. Moreover, most TTA algorithms are designed for a specific kind of distribution shift and lack the flexibility to handle multiple kinds of distribution shifts in FL. In this paper, we find that this lack of flexibility partially results from their pre-defining which modules to adapt in the model. To tackle this challenge, we propose a novel algorithm called ATP to adaptively learns the adaptation rates for each module in the model from distribution shifts among source domains. Theoretical analysis proves the strong generalization of ATP. Extensive experiments demonstrate its superiority in handling various distribution shifts including label shift, image corruptions, and domain shift, outperforming existing TTA methods across multiple datasets and model architectures. Our code is available at https: //github. com/baowenxuan/ATP.

NeurIPS Conference 2023 Conference Paper

Graph-Structured Gaussian Processes for Transferable Graph Learning

  • Jun Wu
  • Lisa Ainsworth
  • Andrew Leakey
  • Haixun Wang
  • Jingrui He

Transferable graph learning involves knowledge transferability from a source graph to a relevant target graph. The major challenge of transferable graph learning is the distribution shift between source and target graphs induced by individual node attributes and complex graph structures. To solve this problem, in this paper, we propose a generic graph-structured Gaussian process framework (GraphGP) for adaptively transferring knowledge across graphs with either homophily or heterophily assumptions. Specifically, GraphGP is derived from a novel graph structure-aware neural network in the limit on the layer width. The generalization analysis of GraphGP explicitly investigates the connection between knowledge transferability and graph domain similarity. Extensive experiments on several transferable graph learning benchmarks demonstrate the efficacy of GraphGP over state-of-the-art Gaussian process baselines.

NeurIPS Conference 2023 Conference Paper

Meta-Learning with Neural Bandit Scheduler

  • Yunzhe Qi
  • Yikun Ban
  • Tianxin Wei
  • Jiaru Zou
  • Huaxiu Yao
  • Jingrui He

Meta-learning has been proven an effective learning paradigm for training machine learning models with good generalization ability. Apart from the common practice of uniformly sampling the meta-training tasks, existing methods working on task scheduling strategies are mainly based on pre-defined sampling protocols or the assumed task-model correlations, and greedily make scheduling decisions, which can lead to sub-optimal performance bottlenecks of the meta-model. In this paper, we propose a novel task scheduling framework under Contextual Bandits settings, named BASS, which directly optimizes the task scheduling strategy based on the status of the meta-model. By balancing the exploitation and exploration in meta-learning task scheduling, BASS can help tackle the challenge of limited knowledge about the task distribution during the early stage of meta-training, while simultaneously exploring potential benefits for forthcoming meta-training iterations through an adaptive exploration strategy. Theoretical analysis and extensive experiments are presented to show the effectiveness of our proposed framework.

AAAI Conference 2023 Conference Paper

Non-IID Transfer Learning on Graphs

  • Jun Wu
  • Jingrui He
  • Elizabeth Ainsworth

Transfer learning refers to the transfer of knowledge or information from a relevant source domain to a target domain. However, most existing transfer learning theories and algorithms focus on IID tasks, where the source/target samples are assumed to be independent and identically distributed. Very little effort is devoted to theoretically studying the knowledge transferability on non-IID tasks, e.g., cross-network mining. To bridge the gap, in this paper, we propose rigorous generalization bounds and algorithms for cross-network transfer learning from a source graph to a target graph. The crucial idea is to characterize the cross-network knowledge transferability from the perspective of the Weisfeiler-Lehman graph isomorphism test. To this end, we propose a novel Graph Subtree Discrepancy to measure the graph distribution shift between source and target graphs. Then the generalization error bounds on cross-network transfer learning, including both cross-network node classification and link prediction tasks, can be derived in terms of the source knowledge and the Graph Subtree Discrepancy across domains. This thereby motivates us to propose a generic graph adaptive network (GRADE) to minimize the distribution shift between source and target graphs for cross-network transfer learning. Experimental results verify the effectiveness and efficiency of our GRADE framework on both cross-network node classification and cross-domain recommendation tasks.

ICML Conference 2023 Conference Paper

NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning

  • Tianxin Wei
  • Zeming Guo
  • Yifan Chen 0004
  • Jingrui He

Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e. g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)–which reveals the gradient descent dynamics of neural networks–of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https: //github. com/weitianxin/MLP_Fusion.

ICML Conference 2023 Conference Paper

Optimizing the Collaboration Structure in Cross-Silo Federated Learning

  • Wenxuan Bao
  • Haohan Wang
  • Jun Wu 0019
  • Jingrui He

In federated learning (FL), multiple clients collaborate to train machine learning models together while keeping their data decentralized. Through utilizing more training data, FL suffers from the potential negative transfer problem: the global FL model may even perform worse than the models trained with local data only. In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities. As a result, each client only collaborates with the clients having similar data distributions, and tends to collaborate with more clients when it has less data. We evaluate our framework with a variety of datasets, models, and types of non-IIDness. Our results demonstrate that FedCollab effectively mitigates negative transfer across a wide range of FL algorithms and consistently outperforms other clustered FL algorithms.

IJCAI Conference 2022 Conference Paper

A Unified Meta-Learning Framework for Dynamic Transfer Learning

  • Jun Wu
  • Jingrui He

Transfer learning refers to the transfer of knowledge or information from a relevant source task to a target task. However, most existing works assume both tasks are sampled from a stationary task distribution, thereby leading to the sub-optimal performance for dynamic tasks drawn from a non-stationary task distribution in real scenarios. To bridge this gap, in this paper, we study a more realistic and challenging transfer learning setting with dynamic tasks, i. e. , source and target tasks are continuously evolving over time. We theoretically show that the expected error on the dynamic target task can be tightly bounded in terms of source knowledge and consecutive distribution discrepancy across tasks. This result motivates us to propose a generic meta-learning framework L2E for modeling the knowledge transferability on dynamic tasks. It is centered around a task-guided meta-learning problem with a group of meta-pairs of tasks, based on which we are able to learn the prior model initialization for fast adaptation on the newest target task. L2E enjoys the following properties: (1) effective knowledge transferability across dynamic tasks; (2) fast adaptation to the new target task; (3) mitigation of catastrophic forgetting on historical target tasks; and (4) flexibility in incorporating any existing static transfer learning algorithms. Extensive experiments on various image data sets demonstrate the effectiveness of the proposed L2E framework.

NeurIPS Conference 2022 Conference Paper

Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative

  • Tianxin Wei
  • Yuning You
  • Tianlong Chen
  • Yang Shen
  • Jingrui He
  • Zhangyang Wang

This paper targets at improving the generalizability of hypergraph neural networks in the low-label regime, through applying the contrastive learning approach from images/graphs (we refer to it as HyperGCL). We focus on the following question: How to construct contrastive views for hypergraphs via augmentations? We provide the solutions in two folds. First, guided by domain knowledge, we fabricate two schemes to augment hyperedges with higher-order relations encoded, and adopt three vertex augmentation strategies from graph-structured data. Second, in search of more effective views in a data-driven manner, we for the first time propose a hypergraph generative model to generate augmented views, and then an end-to-end differentiable pipeline to jointly learn hypergraph augmentations and model parameters. Our technical innovations are reflected in designing both fabricated and generative augmentations of hypergraphs. The experimental findings include: (i) Among fabricated augmentations in HyperGCL, augmenting hyperedges provides the most numerical gains, implying that higher-order information in structures is usually more downstream-relevant; (ii) Generative augmentations do better in preserving higher-order information to further benefit generalizability; (iii) HyperGCL also boosts robustness and fairness in hypergraph representation learning. Codes are released at https: //github. com/weitianxin/HyperGCL.

NeurIPS Conference 2022 Conference Paper

Deep Active Learning by Leveraging Training Dynamics

  • Haonan Wang
  • Wei Huang
  • Ziwei Wu
  • Hanghang Tong
  • Andrew J Margenot
  • Jingrui He

Active learning theories and methods have been extensively studied in classical statistical learning settings. However, deep active learning, i. e. , active learning with deep learning models, is usually based on empirical criteria without solid theoretical justification, thus suffering from heavy doubts when some of those fail to provide benefits in applications. In this paper, by exploring the connection between the generalization performance and the training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. In particular, we prove that the convergence speed of training and the generalization performance is positively correlated under the ultra-wide condition and show that maximizing the training dynamics leads to a better generalization performance. Furthermore, to scale up to large deep neural networks and data sets, we introduce two relaxations for the subset selection problem and reduce the time complexity from polynomial to constant. Empirical results show that dynamicAL not only outperforms the other baselines consistently but also scales well on large deep learning models. We hope our work inspires more attempts in bridging the theoretical findings of deep networks and practical impacts in deep active learning applications.

NeurIPS Conference 2022 Conference Paper

Distribution-Informed Neural Networks for Domain Adaptation Regression

  • Jun Wu
  • Jingrui He
  • Sheng Wang
  • Kaiyu Guan
  • Elizabeth Ainsworth

In this paper, we study the problem of domain adaptation regression, which learns a regressor for a target domain by leveraging the knowledge from a relevant source domain. We start by proposing a distribution-informed neural network, which aims to build distribution-aware relationship of inputs and outputs from different domains. This allows us to develop a simple domain adaptation regression framework, which subsumes popular domain adaptation approaches based on domain invariant representation learning, reweighting, and adaptive Gaussian process. The resulting findings not only explain the connections of existing domain adaptation approaches, but also motivate the efficient training of domain adaptation approaches with overparameterized neural networks. We also analyze the convergence and generalization error bound of our framework based on the distribution-informed neural network. Specifically, our generalization bound focuses explicitly on the maximum mean discrepancy in the RKHS induced by the neural tangent kernel of distribution-informed neural network. This is in sharp contrast to the existing work which relies on domain discrepancy in the latent feature space heuristically formed by one or several hidden neural layers. The efficacy of our framework is also empirically verified on a variety of domain adaptation regression benchmarks.

ICLR Conference 2022 Conference Paper

EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits

  • Yikun Ban
  • Yuchen Yan
  • Arindam Banerjee 0001
  • Jingrui He

In this paper, we propose a novel neural exploration strategy in contextual bandits, EE-Net, distinct from the standard UCB-based and TS-based approaches. Contextual multi-armed bandits have been studied for decades with various applications. To solve the exploitation-exploration tradeoff in bandits, there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, linear contextual bandits have adopted ridge regression to estimate the reward function and combine it with TS or UCB strategies for exploration. However, this line of works explicitly assumes the reward is based on a linear function of arm vectors, which may not be true in real-world datasets. To overcome this challenge, a series of neural bandit algorithms have been proposed, where a neural network is used to learn the underlying reward function and TS or UCB are adapted for exploration. Instead of calculating a large-deviation based statistical bound for exploration like previous methods, we propose "EE-Net", a novel neural-based exploration strategy. In addition to using a neural network (Exploitation network) to learn the reward function, EE-Net uses another neural network (Exploration network) to adaptively learn potential gains compared to the currently estimated reward for exploration. Then, a decision-maker is constructed to combine the outputs from the Exploitation and Exploration networks. We prove that EE-Net can achieve $\mathcal{O}(\sqrt{T\log T})$ regret and show that EE-Net outperforms existing linear and neural contextual bandit baselines on real-world datasets.

NeurIPS Conference 2022 Conference Paper

Improved Algorithms for Neural Active Learning

  • Yikun Ban
  • Yuheng Zhang
  • Hanghang Tong
  • Arindam Banerjee
  • Jingrui He

We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. In particular, we introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work. Then, the proposed algorithm leverages the powerful representation of NNs for both exploitation and exploration, has the query decision-maker tailored for $k$-class classification problems with the performance guarantee, utilizes the full feedback, and updates parameters in a more practical and efficient manner. These careful designs lead to an instance-dependent regret upper bound, roughly improving by a multiplicative factor $O(\log T)$ and removing the curse of input dimensionality. Furthermore, we show that the algorithm can achieve the same performance as the Bayes-optimal classifier in the long run under the hard-margin setting in classification problems. In the end, we use extensive experiments to evaluate the proposed algorithm and SOTA baselines, to show the improved empirical performance.

AAAI Conference 2021 Conference Paper

Outlier Impact Characterization for Time Series Data

  • Jianbo Li
  • Lecheng Zheng
  • Yada Zhu
  • Jingrui He

For time series data, certain types of outliers are intrinsically more harmful for parameter estimation and future predictions than others, irrespective of their frequency. In this paper, for the first time, we study the characteristics of such outliers through the lens of the influence functional from robust statistics. In particular, we consider the input time series as a contaminated process, with the recurring outliers generated from an unknown contaminating process. Then we leverage the influence functional to understand the impact of the contaminating process on parameter estimation. The influence functional results in a multi-dimensional vector that measures the sensitivity of the predictive model to the contaminating process, which can be challenging to interpret especially for models with a large number of parameters. To this end, we further propose a comprehensive single-valued metric (the SIF) to measure outlier impacts on future predictions. It provides a quantitative measure regarding the outlier impacts, which can be used in a variety of scenarios, such as the evaluation of outlier detection methods, the creation of more harmful outliers, etc. The empirical results on multiple real data sets demonstrate the effectivenss of the proposed SIF metric.

AAAI Conference 2020 Conference Paper

Towards Fine-Grained Temporal Network Representation via Time-Reinforced Random Walk

  • Zhining Liu
  • Dawei Zhou
  • Yada Zhu
  • Jinjie Gu
  • Jingrui He

Encoding a large-scale network into a low-dimensional space is a fundamental step for various network analytic problems, such as node classification, link prediction, community detection, etc. Existing methods focus on learning the network representation from either the static graphs or timeaggregated graphs (e. g. , time-evolving graphs). However, many real systems are not static or time-aggregated as the nodes and edges are timestamped and dynamically changing over time. For examples, in anti-money laundering analysis, cycles formed with time-ordered transactions might be red flags in online transaction networks; in novelty detection, a star-shaped structure appearing in a short burst might be an underlying hot topic in social networks. Existing embedding models might not be able to well preserve such finegrained network dynamics due to the incapability of dealing with continuous-time and the negligence of fine-grained interactions. To bridge this gap, in this paper, we propose a fine-grained temporal network embedding framework named FiGTNE, which aims to learn a comprehensive network representation that preserves the rich and complex network context in the temporal network. In particular, we start from the notion of fine-grained temporal networks, where the temporal network can be represented as a series of timestamped nodes and edges. Then, we propose the time-reinforced random walk (TRRW) with a bi-level context sampling strategy to explore the essential structures and temporal contexts in temporal networks. Extensive experimental results on real graphs demonstrate the efficacy of our FiGTNE framework.

IJCAI Conference 2019 Conference Paper

Deep Multi-Task Learning with Adversarial-and-Cooperative Nets

  • Pei Yang
  • Qi Tan
  • Jieping Ye
  • Hanghang Tong
  • Jingrui He

In this paper, we propose a deep multi-Task learning model based on Adversarial-and-COoperative nets (TACO). The goal is to use an adversarial-and-cooperative strategy to decouple the task-common and task-specific knowledge, facilitating the fine-grained knowledge sharing among tasks. TACO accommodates multiple game players, i. e. , feature extractors, domain discriminator, and tri-classifiers. They play the MinMax games adversarially and cooperatively to distill the task-common and task-specific features, while respecting their discriminative structures. Moreover, it adopts a divide-and-combine strategy to leverage the decoupled multi-view information to further improve the generalization performance of the model. The experimental results show that our proposed method significantly outperforms the state-of-the-art algorithms on the benchmark datasets in both multi-task learning and semi-supervised domain adaptation scenarios.

IJCAI Conference 2018 Conference Paper

A Local Algorithm for Product Return Prediction in E-Commerce

  • Yada Zhu
  • Jianbo Li
  • Jingrui He
  • Brian L. Quanz
  • Ajay A. Deshpande

With the rapid growth of e-tail, the cost to handle returned online orders also increases significantly and has become a major challenge in the e-commerce industry. Accurate prediction of product returns allows e-tailers to prevent problematic transactions in advance. However, the limited existing work for modeling customer online shopping behaviors and predicting their return actions fail to integrate the rich information in the product purchase and return history (e. g. , return history, purchase-no-return behavior, and customer/product similarity). Furthermore, the large-scale data sets involved in this problem, typically consisting of millions of customers and tens of thousands of products, also render existing methods inefficient and ineffective at predicting the product returns. To address these problems, in this paper, we propose to use a weighted hybrid graph to represent the rich information in the product purchase and return history, in order to predict product returns. The proposed graph consists of both customer nodes and product nodes, undirected edges reflecting customer return history and customer/product similarity based on their attributes, as well as directed edges discriminating purchase-no-return and no-purchase actions. Based on this representation, we study a random-walk-based local algorithm for predicting product return propensity for each customer, whose computational complexity depends only on the size of the output cluster rather than the entire graph. Such a property makes the proposed local algorithm particularly suitable for processing the large-scale data sets to predict product returns. To test the performance of the proposed techniques, we evaluate the graph model and algorithm on multiple e-commerce data sets, showing improved performance over state-of-the-art methods.

AAAI Conference 2017 Conference Paper

Finding Cut from the Same Cloth: Cross Network Link Recommendation via Joint Matrix Factorization

  • Arun Reddy Nelakurthi
  • Jingrui He

With the emergence of online forums associated with major diseases, such as diabetes mellitus, many patients are increasingly dependent on such disease-specific social networks to gain access to additional resources. Among these patients, it is common for them to stick to one disease-specific social network, although their desired resources might be spread over multiple social networks, such as patients with similar questions and concerns. Motivated by this application, in this paper, we focus on cross network link recommendation, which aims to identify similar users across multiple heterogeneous social networks. The problem setting is different from existing work on cross network link prediction, which either tries to link accounts of the same user from different social networks, or aims to match users with complementary expertise or interest. To approach the problem of cross network link recommendation, we propose to jointly decompose the user-keyword matrices from multiple social networks, while requiring them to share the same topics and user group-topic association matrices. This constraint comes from the fact that social networks dedicated to the same disease tend to share the same topics as well as the interests of users groups in certain topics. Based on this intuition, we construct a generic optimization framework, provide four instantiations and an iterative optimization algorithm with performance analysis. In the experiments, we demonstrate the superiority of the proposed algorithm over state-of-the-art techniques on various real-world data sets.

IJCAI Conference 2017 Conference Paper

Learning from Data Heterogeneity: Algorithms and Applications

  • Jingrui He

Nowadays, as an intrinsic property of big data, data heterogeneity can be seen in a variety of real-world applications, ranging from security to manufacturing, from healthcare to crowdsourcing. It refers to any inhomogeneity in the data, and can be present in a variety of forms, corresponding to different types of data heterogeneity, such as task/view/instance/oracle heterogeneity. As shown in previous work as well as our own work, learning from data heterogeneity not only helps people gain a better understanding of the large volume of data, but also provides a means to leverage such data for effective predictive modeling. In this paper, along with multiple real applications, we will briefly review state-of-the-art techniques for learning from data heterogeneity, and demonstrate their performance at addressing these real world problems.

IJCAI Conference 2016 Conference Paper

Crowdsourcing via Tensor Augmentation and Completion

  • Yao Zhou
  • Jingrui He

Nowadays, the rapid proliferation of data makes it possible to build complex models for many real applications. Such models, however, usually require large amount of labeled data, and the labeling process can be both expensive and tedious for domain experts. To address this problem, researchers have resorted to crowdsourcing to collect labels from non-experts with much less cost. The key challenge here is how to infer the true labels from the large number of noisy labels provided by non-experts. Different from most existing work on crowdsourcing, which ignore the structure information in the labeling data provided by non-experts, in this paper, we propose a novel structured approach based on tensor augmentation and completion. It uses tensor representation for the labeled data, augments it with a ground truth layer, and explores two methods to estimate the ground truth layer via low rank tensor completion. Experimental results on 6 real data sets demonstrate the superior performance of the proposed approach over state-of-the-art techniques.

IJCAI Conference 2015 Conference Paper

MUVIR: Multi-View Rare Category Detection

  • Dawei Zhou
  • Jingrui He
  • K. Seluk Candan
  • Hasan Davulcu

Rare category detection refers to the problem of identifying the initial examples from underrepresented minority classes in an imbalanced data set. This problem becomes more challenging in many real applications where the data comes from multiple views, and some views may be irrelevant for distinguishing between majority and minority classes, such as synthetic ID detection and insider threat detection. Existing techniques for rare category detection are not best suited for such applications, as they mainly focus on data with a single view. To address the problem of multi-view rare category detection, in this paper, we propose a novel framework named MUVIR. It builds upon existing techniques for rare category detection with each single view, and exploits the relationship among multiple views to estimate the overall probability of each example belonging to the minority class. In particular, we study multiple special cases of the framework with respect to their working conditions, and analyze the performance of MUVIR in the presence of irrelevant views. For problems where the exact priors of the minority classes are unknown, we generalize the MUVIR algorithm to work with only an upper bound on the priors. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed framework, especially in the presence of irrelevant views.

IJCAI Conference 2013 Conference Paper

Improving Traffic Prediction with Tweet Semantics

  • Jingrui He
  • Wei Shen
  • Phani Divakaruni
  • Laura Wynter
  • Rick Lawrence

Road traffic prediction is a critical component in modern smart transportation systems. It provides the basis for traffic management agencies to generate proactive traffic operation strategies for alleviating congestion. Existing work on near-term traffic prediction (forecasting horizons in the range of 5 minutes to 1 hour) relies on the past and current traffic conditions. However, once the forecasting horizon is beyond 1 hour, i. e. , in longer-term traffic prediction, these techniques do not work well since additional factors other than the past and current traffic conditions start to play important roles. To address this problem, in this paper, for the first time, we examine whether it is possible to use the rich information in online social media to improve longer-term traffic prediction. To this end, we first analyze the correlation between traffic volume and tweet counts with various granularities. Then we propose an optimization framework to extract traffic indicators based on tweet semantics using a transformation matrix, and incorporate them into traffic prediction via linear regression. Experimental results using traffic and Twitter data originated from the San Francisco Bay area of California demonstrate the effectiveness of our proposed framework.

ICML Conference 2013 Conference Paper

MILEAGE: Multiple Instance LEArning with Global Embedding

  • Dan Zhang 0007
  • Jingrui He
  • Luo Si
  • Richard D. Lawrence

Multiple Instance Learning (MIL) methods generally represent each example as a collection of instances such that the features for local objects can be better captured, whereas traditional learning methods typically extract a global feature vector for each example as an integral part. However, there is limited research work on which of the two learning scenarios performs better. This paper proposes a novel framework – \emphMultiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors for traditional learning methods are integrated into the MIL setting. MILEAGE can leverage the benefits derived from both learning settings. Within the proposed framework, a large margin method is formulated. In particular, the proposed method adaptively tunes the weights on the two different kinds of feature representations (i. e. , global and multiple instance) for each example and trains the classifier simultaneously. An alternative algorithm is proposed to solve the resulting optimization problem, which extends the bundle method to the non-convex case. Some important properties of the proposed method, such as the convergence rate and the generalization error rate, are analyzed. A series of experiments have been conducted to demonstrate the advantages of the proposed method over several state-of-the-art multiple instance and traditional learning methods.

NeurIPS Conference 2012 Conference Paper

GenDeR: A Generic Diversified Ranking Algorithm

  • Jingrui He
  • Hanghang Tong
  • Qiaozhu Mei
  • Boleslaw Szymanski

Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e. g. , information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

AAAI Conference 2012 Conference Paper

Hierarchical Modeling with Tensor Inputs

  • Yada Zhu
  • Jingrui He
  • Rick Lawrence

In many real applications, the input data are naturally expressed as tensors, such as virtual metrology in semiconductor manufacturing, face recognition and gait recognition in computer vision, etc. In this paper, we propose a general optimization framework for dealing with tensor inputs. Most existing methods for supervised tensor learning use only rank-one weight tensors in the linear model and cannot readily incorporate domain knowledge. In our framework, we obtain the weight tensor in a hierarchical way – we first approximate it by a low-rank tensor, and then estimate the lowrank approximation using the prior knowledge from various sources, e. g. , different domain experts. This is motivated by wafer quality prediction in semiconductor manufacturing. Furthermore, we propose an effective algorithm named H-MOTE for solving this framework, which is guaranteed to converge. The time complexity of H-MOTE is linear with respect to the number of examples as well as the size of the weight tensor. Experimental results show the superiority of H-MOTE over state-of-the-art techniques on both synthetic and real data sets.

IJCAI Conference 2007 Conference Paper

  • Jingrui He
  • Jaime Carbonell
  • Yan Liu

This paper proposes and develops a new graph-based semi-supervised learning method. Different from previous graph-based methods that are based on discriminative models, our method is essentially a generative model in that the class conditional probabilities are estimated by graph propagation and the class priors are estimated by linear regression. Experimental results on various datasets show that the proposed method is superior to existing graph-based semi-supervised learning methods, especially when the labeled subset alone proves insufficient to estimate meaningful class priors.

NeurIPS Conference 2007 Conference Paper

Nearest-Neighbor-Based Active Learning for Rare Category Detection

  • Jingrui He
  • Jaime Carbonell

Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - e. g. detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially a variable-scale nearest neighbor process is used to optimize the probability of sampling tightly-grouped minority classes, subject to a local smoothness assumption of the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a frac- tion of the actively sampled points required by random sampling and by Pelleg’s Interleave method, the prior best technique in the sparse literature on this topic.