Arrow Research

Author name cluster

Bo Long

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

AAAI Conference 2026 Conference Paper

Learning the Latent Structure: A Feature-Centric Approach to Graph Data Augmentation

  • Yu Song
  • Zhigang Hua
  • Yan Xie
  • Bingheng Li
  • Jingzhe Liu
  • Bo Long
  • Jiliang Tang
  • Hui Liu

Graph-structured data plays a pivotal role in modeling complex relationships. However, real-world graphs are often incomplete due to data collection and observational constraints, severely limiting the effectiveness of modern graph learning pipelines. While existing Graph Data Augmentation (GDA) methods attempt to refine graph structures for improved downstream performance, they are typically label-dependent, computationally expensive, and inherently transductive, limiting their applicability in practical scenarios. In this work, we present a novel feature-centric graph data augmentation framework that bypasses explicit structure modeling by operating directly in the embedding space. Through a self-supervised inverse masking process, our method captures latent ties between observed and complete graphs, enabling recovery of unobserved structural signals through refined node representations. To enhance robustness under noisy and sparse supervision, we introduce a message regularizer and a bootstrap strategy for effective training and generalization. Evaluated on ten graph datasets spanning multiple domains, our approach, SelfAug, consistently outperforms state-of-the-art methods in both accuracy and efficiency across inductive and cold-start settings, highlighting its potential as a scalable and generalizable solution for real-world graph learning scenarios.
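
A hedged sketch of the general idea (not the SelfAug implementation; the class name, encoder, and loss below are illustrative, and the paper's message regularizer and bootstrap strategy are omitted): mask node features, train an encoder to reconstruct what was hidden, and use the refined embeddings downstream instead of editing the graph structure.

```python
# Illustrative sketch only; not the SelfAug code.
import torch
import torch.nn as nn

class MaskedFeatureAugmenter(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, mask_rate: float = 0.3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)
        self.mask_rate = mask_rate

    def forward(self, x: torch.Tensor):
        # Hide a random subset of nodes' features, encode the masked view,
        # and score how well the decoder recovers what was hidden.
        masked = torch.rand(x.size(0), 1, device=x.device) < self.mask_rate
        z = self.encoder(torch.where(masked, torch.zeros_like(x), x))
        keep = masked.squeeze(-1)
        recon_loss = ((self.decoder(z)[keep] - x[keep]) ** 2).mean()
        return z, recon_loss  # z: refined embeddings for the downstream model
```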

AAAI Conference 2025 Conference Paper

A Scalable and Effective Alternative to Graph Transformers

  • Kaan Sancak
  • Zhigang Hua
  • Jin Fang
  • Yan Xie
  • Andrey Malevich
  • Bo Long
  • Muhammed Fatih Balin
  • Ümit V. Çatalyürek

Graph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing a self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present the Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches a 169x speedup over optimized attention on a graph with 2M nodes. Further evaluations on a diverse range of benchmarks showcase that it scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA by up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.
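
As a rough illustration of how local propagation plus a global convolution can replace dense attention, here is a hedged sketch; it is not the published GECO architecture, and the FFT-based circular convolution below is just one plausible quasilinear global operator.

```python
# Illustrative sketch, not the published GECO layer.
import torch

def geco_like_layer(x, edge_index, global_filter):
    """x: (n, d) node features; edge_index: (2, m) edges; global_filter: (n, d) learned kernel."""
    src, dst = edge_index
    # Local mixing: mean-aggregate each node's incoming neighbor features.
    local = torch.zeros_like(x).index_add_(0, dst, x[src])
    deg = torch.zeros(x.size(0), 1, device=x.device).index_add_(
        0, dst, torch.ones(src.size(0), 1, device=x.device))
    local = local / deg.clamp(min=1)
    # Global mixing: circular convolution over a fixed node ordering via FFT,
    # costing O(n log n) instead of the O(n^2) of dense attention.
    xf = torch.fft.rfft(x, dim=0)
    kf = torch.fft.rfft(global_filter, dim=0)
    global_mix = torch.fft.irfft(xf * kf, n=x.size(0), dim=0)
    return local + global_mix
```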

ICLR Conference 2025 Conference Paper

Learning Graph Quantized Tokenizers

  • Limei Wang
  • Kaveh Hassani
  • Si Zhang
  • Dongqi Fu
  • Baichuan Yuan
  • Weilin Cong
  • Zhigang Hua
  • Hao Wu

Transformers serve as the backbone architectures of Foundational Models, where domain-specific tokenizers allow them to adapt to various domains. Graph Transformers (GTs) have recently emerged as leading models in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 20 out of 22 benchmarks, including large-scale homophilic and heterophilic datasets. The implementation is publicly available at https://github.com/limei0307/GQT.
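
The RVQ step is concrete enough to sketch. The toy encoder below is a generic residual vector quantizer, not the GQT code; the codebooks are random stand-ins for the learned ones.

```python
# Generic RVQ encoder, illustrative only; GQT learns its codebooks.
import torch

def rvq_encode(z, codebooks):
    """z: (n, d) embeddings; codebooks: list of (K, d) tensors, one per level."""
    residual, tokens = z, []
    for codebook in codebooks:
        dists = torch.cdist(residual, codebook)   # (n, K) distances to codewords
        idx = dists.argmin(dim=1)                 # nearest codeword per node
        tokens.append(idx)
        residual = residual - codebook[idx]       # quantize, keep the residual
    return torch.stack(tokens, dim=1)             # (n, levels) hierarchical token ids

z = torch.randn(100, 16)
books = [torch.randn(32, 16) for _ in range(3)]
print(rvq_encode(z, books).shape)  # torch.Size([100, 3])
```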

TMLR Journal 2025 Journal Article

Preference Discerning with LLM-Enhanced Generative Retrieval

  • Fabian Paischer
  • Liu Yang
  • Linfeng Liu
  • Shuai Shao
  • Kaveh Hassani
  • Jiacheng Li
  • Ricky T. Q. Chen
  • Zhang Gabriel Li

In sequential recommendation, models recommend items based on a user's interaction history. To this end, current models usually incorporate information such as item descriptions and user intent or preferences. User preferences are usually not explicitly given in open-source datasets and thus need to be approximated, for example via large language models (LLMs). Current approaches leverage approximated user preferences only during training and rely solely on the past interaction history for recommendations, limiting their ability to dynamically adapt to changing preferences and potentially reinforcing echo chambers. To address this issue, we propose a new paradigm, preference discerning, which explicitly conditions a generative recommendation model on user preferences expressed in natural language within its context. To evaluate preference discerning, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. Upon evaluating current state-of-the-art methods on our benchmark, we discover that their ability to dynamically adapt to evolving user preferences is limited. To address this, we propose a new method named Mender (Multimodal Preference Discerner), which achieves state-of-the-art performance in our benchmark. Our results show that Mender effectively adapts its recommendations to human preferences, even preferences not observed during training, paving the way toward more flexible recommendation models.
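
In code, the paradigm amounts to placing a preference string in the model's input context. A minimal, purely illustrative sketch (the function name and input layout below are hypothetical, not the paper's API):

```python
# Hypothetical input construction, for illustration only.
def build_input(preference_text: str, history_semantic_ids: list[list[int]]) -> str:
    # Each item is a sequence of semantic-ID tokens; the preference is prepended
    # so the generative model can condition on it at inference time.
    history = " ".join("-".join(str(t) for t in ids) for ids in history_semantic_ids)
    return f"preference: {preference_text} | history: {history} | next item:"

print(build_input("prefers budget sci-fi novels", [[3, 41, 7], [3, 12, 9]]))
```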

TMLR Journal 2025 Journal Article

Unifying Generative and Dense Retrieval for Sequential Recommendation

  • Liu Yang
  • Fabian Paischer
  • Kaveh Hassani
  • Jiacheng Li
  • Shuai Shao
  • Zhang Gabriel Li
  • Yun He
  • Xue Feng

Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner product computation between the user and all item representations. While effective, these approaches incur high memory and computational costs due to the need to store and compare a unique embedding for each item, leading to lower resource efficiency. In contrast, the recently proposed generative retrieval paradigm offers a promising alternative by directly predicting item indices using a generative model trained on semantic IDs that encapsulate items’ semantic information. Despite its potential for large-scale applications, a comprehensive comparison between generative retrieval and sequential dense retrieval under fair conditions is still lacking, leaving open questions regarding performance and resource efficiency trade-offs. To address this, we compare these two approaches under controlled conditions on academic benchmarks and observe performance gaps, with dense retrieval showing stronger ranking performance, while generative retrieval provides greater resource efficiency. Motivated by these observations, we propose LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a hybrid model that combines the strengths of these two widely used approaches. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences between the two methods, and enhancing cold-start item recommendation in the evaluated datasets. This hybrid approach provides insight into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
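
One plausible reading of the hybrid design, sketched below under stated assumptions (this is not the released LIGER code): the generative model proposes a small candidate set, and dense retrieval re-ranks only those candidates by inner product.

```python
# Illustrative candidate re-ranking; not the LIGER implementation.
import torch

def hybrid_retrieve(user_emb, item_embs, generative_candidates, k=10):
    """user_emb: (d,); item_embs: (num_items, d); generative_candidates: LongTensor of item ids."""
    # Dense scoring is restricted to the generative candidate set, so the
    # full item catalog never needs to be scanned at query time.
    scores = item_embs[generative_candidates] @ user_emb
    top = scores.topk(min(k, generative_candidates.numel())).indices
    return generative_candidates[top]
```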

ICLR Conference 2024 Conference Paper

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

  • Dongqi Fu
  • Zhigang Hua
  • Yan Xie
  • Jin Fang
  • Si Zhang
  • Kaan Sancak
  • Hao Wu
  • Andrey Malevich

The graph transformer has proven to be an effective graph learning method thanks to its attention mechanism, which is capable of capturing expressive representations from complex topological and feature information of graphs. Graph transformers conventionally perform dense attention (or global attention) over every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Therefore, mini-batch training for graph transformers is a promising direction, but the limited samples in each mini-batch cannot support effective dense attention to encode informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and makes heavy feature engineering offline and independent, such that mini-batch training of graph transformers becomes possible by loading each node's token list in batches. We further prove that this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, using only personalized PageRank may limit the information carried by a token list, which cannot support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes that enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer). Overall, VCR-Graphormer needs O(m + k log k) complexity for graph tokenization, compared to the O(n^3) of previous works. The code is available at https://github.com/DongqiFu/VCR-Graphormer.
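
The PPR tokenization step can be sketched directly (simplified: the paper ranks over a graph rewired with virtual super nodes, while the toy version below runs plain power iteration on the original graph):

```python
# Simplified PPR token-list sampling; the paper adds virtual connections first.
import numpy as np

def ppr_token_list(adj, node, k=8, alpha=0.15, iters=50):
    """adj: (n, n) row-stochastic transition matrix; returns top-k node ids for `node`."""
    n = adj.shape[0]
    e = np.zeros(n)
    e[node] = 1.0
    p = e.copy()
    for _ in range(iters):  # power iteration for the personalized PageRank vector
        p = alpha * e + (1 - alpha) * (adj.T @ p)
    p[node] = 0.0  # exclude the node itself from its own token list
    return np.argsort(-p)[:k]  # fixed-length token list, computable offline
```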

ICLR Conference 2022 Conference Paper

Givens Coordinate Descent Methods for Rotation Matrix Learning in Trainable Embedding Indexes

  • Yunjiang Jiang
  • Han Zhang 0047
  • Yiming Qiu 0003
  • Yun Xiao
  • Bo Long
  • Wen-Yun Yang

Product quantization (PQ), coupled with a space rotation, is widely used in modern approximate nearest neighbor (ANN) search systems to significantly compress the disk storage for embeddings and speed up the inner product computation. Existing rotation learning methods, however, minimize quantization distortion for fixed embeddings, which is not applicable in an end-to-end training scenario where embeddings are updated constantly. In this paper, based on geometric intuitions from Lie group theory, in particular the special orthogonal group SO(n), we propose a family of block Givens coordinate descent algorithms to learn rotation matrices that are provably convergent on any convex objective. Compared to the state-of-the-art SVD method, the Givens algorithms are much more parallelizable, reducing runtime by orders of magnitude on modern GPUs, and converge more stably according to experimental studies. They further improve upon vanilla product quantization significantly in an end-to-end training scenario.
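
A minimal sketch of the coordinate descent idea, assuming a crude grid search over angles in place of the paper's per-coordinate update; because each step multiplies by a Givens rotation, the iterate stays exactly in SO(n):

```python
# Toy coordinate descent on SO(n); the grid search stands in for the
# paper's actual per-plane update rule.
import numpy as np

def plane_rotation(n, i, j, theta):
    """Givens rotation acting only in the (i, j) coordinate plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

def givens_step(R, objective, i, j, angles=np.linspace(-0.1, 0.1, 21)):
    # Pick the in-plane angle that minimizes the convex objective; the grid
    # brackets zero, so a (near-)identity step is always available.
    best = min(angles, key=lambda t: objective(plane_rotation(len(R), i, j, t) @ R))
    return plane_rotation(len(R), i, j, best) @ R  # product of rotations stays orthogonal
```

Because each update touches only two coordinates, many disjoint (i, j) planes can be updated in parallel, which is the intuition behind the claimed GPU parallelizability.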

AAAI Conference 2022 Conference Paper

Reducing Flipping Errors in Deep Neural Networks

  • Xiang Deng
  • Yun Xiao
  • Bo Long
  • Zhongfei Zhang

Deep neural networks (DNNs) have been widely applied in various domains of artificial intelligence, including computer vision and natural language processing. A DNN is typically trained for many epochs, and then a validation dataset is used to select the DNN from one epoch (we simply call this epoch “the last epoch”) as the final model for making predictions on unseen samples, even though this model usually cannot achieve perfect accuracy on unseen samples. An interesting question is: how many of the test (unseen) samples that a DNN misclassifies in the last epoch were ever correctly classified by the DNN before the last epoch? In this paper, we empirically study this question and find on several benchmark datasets that the vast majority of the misclassified samples in the last epoch were classified correctly at some point before the last epoch, which means that the predictions for these samples were flipped from “correct” to “wrong”. Motivated by this observation, we propose to restrict the behavior changes of a DNN on the correctly classified samples so that the correct local boundaries can be maintained and the flipping error on unseen samples can be largely reduced. Extensive experiments on different benchmark datasets with different modern network architectures demonstrate that the proposed flipping error reduction (FER) approach can substantially improve the generalization, robustness, and transferability of DNNs without introducing any additional network parameters or inference cost, at only a negligible training overhead.
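
One way to picture the proposed restriction (a hedged sketch, not the authors' released loss; variable names are hypothetical): add a penalty that keeps the network's current outputs close to its earlier, correct predictions on those samples.

```python
# Illustrative flipping-error regularizer; not the official FER code.
import torch
import torch.nn.functional as F

def fer_like_loss(logits, targets, old_probs, was_correct, lam=1.0):
    """old_probs: softmax outputs saved from a past epoch; was_correct: bool mask."""
    ce = F.cross_entropy(logits, targets)
    if was_correct.any():
        # KL between current and stored predictions on previously-correct samples,
        # discouraging "correct -> wrong" flips while the task loss trains the rest.
        kl = F.kl_div(F.log_softmax(logits[was_correct], dim=1),
                      old_probs[was_correct], reduction="batchmean")
    else:
        kl = logits.new_zeros(())
    return ce + lam * kl
```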

ICML Conference 2022 Conference Paper

Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile

  • Dong Chen 0017
  • Lingfei Wu 0001
  • Siliang Tang
  • Xiao Yun
  • Bo Long
  • Yueting Zhuang

Recent years have seen a surge of interest in meta-learning techniques for tackling the few-shot learning (FSL) problem. However, the meta-learner is prone to overfitting since there are only a few available samples, which can be identified as sampling noise on a clean dataset. Besides, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset. To address these two challenges, we present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters. Specifically, the main direction is computed in a fast way, where the scale of the calculated matrix is related to the number of gradient steps for the specific task instead of the number of parameters. Furthermore, to obtain a more accurate main direction for Eigen-Reptile in the presence of many noisy labels, we further propose Introspective Self-paced Learning (ISPL). We have theoretically and experimentally demonstrated the soundness and effectiveness of the proposed Eigen-Reptile and ISPL. In particular, our experiments on different tasks show that the proposed method is able to outperform or achieve highly competitive performance compared with other gradient-based methods, with or without noisy labels. The code and data for the proposed method are provided for research purposes at https://github.com/Anfeather/Eigen-Reptile.
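
A rough, illustrative reconstruction of the update (not the released code): take the principal direction of the inner-loop parameter trajectory via an economy SVD, whose small dimension is the number of snapshots rather than the parameter count, and move the meta-parameters along it.

```python
# Illustrative main-direction update; not the official Eigen-Reptile code.
import numpy as np

def eigen_reptile_like_update(meta_params, trajectory, meta_lr=0.1):
    """trajectory: (steps, num_params) inner-loop parameter snapshots."""
    centered = trajectory - trajectory.mean(axis=0)
    # Top right-singular vector = main direction of the task-specific updates.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Orient the direction to agree with the overall inner-loop displacement.
    if direction @ (trajectory[-1] - trajectory[0]) < 0:
        direction = -direction
    return meta_params + meta_lr * direction
```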

NeurIPS Conference 2021 Conference Paper

Learning to Generate Visual Questions with Noisy Supervision

  • Shen Kai
  • Lingfei Wu
  • Siliang Tang
  • Yueting Zhuang
  • Zhen He
  • Zhuoye Ding
  • Yun Xiao
  • Bo Long

The task of visual question generation (VQG) aims to generate human-like questions from an image and potentially other side information (e.g., answer type or the answer itself). Existing works often suffer from a severe one-image-to-many-questions mapping problem, which yields uninformative and non-referential questions. Recent work has demonstrated that by leveraging double visual and answer hints, a model can faithfully generate much higher quality questions. However, visual hints are not naturally available. Although a simple rule-based similarity matching method has been proposed to obtain candidate visual hints, these hints can be very noisy in practice and thus restrict the quality of the generated questions. In this paper, we present a novel learning approach for double-hint-based VQG, which can be cast as a weakly supervised learning problem with noise. The key rationale is that the salient visual regions of interest can be viewed as a constraint to improve the generation procedure for producing high-quality questions. As a result, given the predicted salient visual regions of interest, we can focus on estimating the probability of being ground-truth questions, which in turn implicitly measures the quality of predicted visual hints. Experimental results on two benchmark datasets show that our proposed method outperforms the state-of-the-art approaches by a large margin on a variety of metrics, including both automatic machine metrics and human evaluation.

TIST Journal 2014 Journal Article

Exploiting User Preference for Online Learning in Web Content Optimization Systems

  • Jiang Bian
  • Bo Long
  • Lihong Li
  • Taesup Moon
  • Anlei Dong
  • Yi Chang

Web portal services have become an important medium to deliver digital content (e.g. news, advertisements, etc.) to Web users in a timely fashion. To attract more users to various content modules on the Web portal, it is necessary to design a recommender system that can effectively achieve Web portal content optimization by automatically estimating content item attractiveness and relevance to user interests. The state-of-the-art online learning methodology adapts dedicated pointwise models to independently estimate the attractiveness score for each candidate content item. Although such pointwise models can be easily adapted for online recommendation, there still remain a few critical problems. First, this pointwise methodology fails to use invaluable user preferences between content items. Moreover, the performance of pointwise models decreases drastically when facing the problem of sparse learning samples. To address these problems, we propose exploring a new dynamic pairwise learning methodology for Web portal content optimization in which we exploit dynamic user preferences extracted based on users' actions on portal services to compute the attractiveness scores of content items. In this article, we introduce two specific pairwise learning algorithms, a straightforward graph-based algorithm and a formalized Bayesian modeling one. Experiments on large-scale data from a commercial Web portal demonstrate the significant improvement of pairwise methodologies over the baseline pointwise models. Further analysis illustrates that our new pairwise learning approaches can benefit personalized recommendation more than pointwise models, since the data sparsity is more critical for personalized content optimization.
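
To make the pairwise idea concrete, here is a toy sketch (it is neither of the two published algorithms): extract (loser, winner) pairs from user actions and fit item attractiveness scores with standard Bradley-Terry minorization-maximization updates.

```python
# Toy pairwise-preference scorer, illustrative only.
from collections import defaultdict

def pairwise_scores(preferences, items, iters=20):
    """preferences: list of (loser, winner) item-id pairs from user actions."""
    wins, pairs = defaultdict(int), defaultdict(int)
    for loser, winner in preferences:
        wins[winner] += 1
        pairs[frozenset((loser, winner))] += 1
    score = {i: 1.0 for i in items}
    for _ in range(iters):  # Bradley-Terry minorization-maximization updates
        for i in items:
            denom = sum(pairs[frozenset((i, j))] / (score[i] + score[j])
                        for j in items if j != i)
            score[i] = wins[i] / denom if denom > 0 else score[i]
    return score  # attractiveness scores induced by pairwise preferences
```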

AAAI Conference 2008 Conference Paper

Clustering on Complex Graphs

  • Bo Long
  • Philip S. Yu

Complex graphs, in which multi-type nodes are linked to each other, frequently arise in many important applications, such as Web mining, information retrieval, bioinformatics, and epidemiology. In this study, we propose a general framework for clustering on complex graphs. Under this framework, we derive a family of clustering algorithms including both hard and soft versions, which are capable of learning cluster patterns from complex graphs with various structures and statistical properties. We also establish the connections between the proposed framework and the traditional graph partitioning approaches. The experimental evaluation provides encouraging results to validate the proposed framework and algorithms.

AAAI Conference 2007 Conference Paper

Graph Partitioning Based on Link Distributions

  • Bo Long

Existing graph partitioning approaches are mainly based on optimizing edge cuts and do not take the distribution of edge weights (link distribution) into consideration. In this paper, we propose a general model to partition graphs based on link distributions. This model formulates graph partitioning under a certain distribution assumption as approximating the graph affinity matrix under the corresponding distortion measure. Under this model, we derive a novel graph partitioning algorithm to approximate a graph affinity matrix under various Bregman divergences, which correspond to a large exponential family of distributions. We also establish the connections between edge cut objectives and the proposed model to provide a unified view to graph partitioning.
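
A hedged formalization of the model in my own notation (the paper's exact constraints may differ): partitioning approximates the affinity matrix by a block-structured matrix under a Bregman divergence, and the choice of divergence encodes the assumed link distribution.

```latex
% My notation, not necessarily the paper's: A is the graph affinity matrix,
% C the cluster indicator, B the cluster-level connectivity, and D_phi a
% Bregman divergence (e.g., squared loss corresponds to a Gaussian assumption
% on edge weights).
\min_{C,\, B} \; D_{\phi}\!\left(A \,\middle\|\, C B C^{\top}\right)
\quad \text{s.t.} \quad C \in \{0,1\}^{n \times k},\; C \mathbf{1}_{k} = \mathbf{1}_{n}
```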

ICML Conference 2006 Conference Paper

Spectral clustering for multi-type relational data

  • Bo Long
  • Zhongfei Zhang
  • Xiaoyun Wu
  • Philip S. Yu

Clustering on multi-type relational data has attracted more and more attention in recent years due to its high impact on various important applications, such as Web mining, e-commerce and bioinformatics. However, the research on general multi-type relational data clustering is still limited and preliminary. The contribution of the paper is three-fold. First, we propose a general model, the collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Second, under this model, we derive a novel algorithm, spectral relational clustering, to cluster multi-type interrelated data objects simultaneously. The algorithm iteratively embeds each type of data objects into low dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects. Extensive experiments demonstrate the promise and effectiveness of the proposed algorithm. Third, we show that existing spectral clustering algorithms can be considered special cases of the proposed model and algorithm. This demonstrates the strong theoretical generality of the proposed model and algorithm.
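
The collective factorization model can be written compactly; the following is a hedged reconstruction in approximate notation (the paper's weights and constraints may differ):

```latex
% Approximate notation: R^{(ij)} is the relation matrix between object types
% i and j, C^{(i)} the per-type cluster embedding shared across all relations
% involving type i, and A^{(ij)} the cluster-level association matrix.
\min_{\{C^{(i)}\},\,\{A^{(ij)}\}} \;
  \sum_{i<j} w_{ij} \left\| R^{(ij)} - C^{(i)} A^{(ij)} {C^{(j)}}^{\top} \right\|_F^2
\quad \text{s.t.} \quad {C^{(i)}}^{\top} C^{(i)} = I
```

Sharing each C^{(i)} across every relation involving type i is what lets the hidden structures of different object types interact during the iterative embedding.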