Arrow Research search

Author name cluster

Xia Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
1 author row

Possible papers

JBHI Journal 2026 Journal Article

A 1D Snoring Waveform and 2D Composite Acoustic Feature Graph-Based Multi-Modal Fusion Network for Obstructive Sites Recognition

  • Xia Hu
  • Rui Fang
  • Huiping Luo
  • Jingchun Luo
  • Chen Chen
  • Wei Chen

As a critical factor in the diagnostic work-up and treatment decision-making process for sleep-related breathing disorders, accurate localization of obstructive sites in the upper airway is urgently needed. Snoring, as a dynamic acoustic signal, carries rich information about the sites and degree of obstruction in the upper airway, offering a non-invasive, cost-effective solution for obstructive sites recognition. However, most existing snoring-based methods for recognizing obstructive sites use only limited information (concentrating mainly on either traditional acoustic characteristics or spectrogram features), which may omit dynamic pathological information. Moreover, existing methods proceed from either a one-dimensional (1D) signal or a two-dimensional (2D) image perspective, so complementary information from the other modality may be overlooked. In this paper, a multi-modal framework that combines a 1D snoring waveform and a 2D Composite Acoustic Feature Graph (CAF-Graph) is proposed. The 1D snoring waveform captures fine time structure and local patterns, aiming at learning high-level discriminative representations with neural networks. The 2D CAF-Graph emphasizes the dynamic spatio-temporal and physiological-acoustic characteristics of snoring by concatenating acoustic features related to Prosodic, Formant, Spectral, and Cepstral characteristics. Further, a multi-modal fusion network (BMFNet) integrates independent and interactive information between single-modal features, offering a more comprehensive perspective. The recognition task was formulated as a three-class classification problem: upper (snoring caused by upper-level obstruction), lower (snoring caused by lower-level obstruction), and silence (obstruction without snoring). The proposed method was validated on a clinical dataset collected at the ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, where it reached 81.2% Accuracy, 86.8% Weighted Average Precision, 81.2% Weighted Average Recall, and 82.3% Weighted Average F1-Score. The results demonstrate the effectiveness of multi-modal feature representations for snoring, providing novel insight for obstructive sites recognition tasks.

NeurIPS Conference 2025 Conference Paper

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

  • Tianyi Zhang
  • Mohsen Hariri
  • Shaochen (Henry) Zhong
  • Vipin Chaudhary
  • Yang Sui
  • Xia Hu
  • Anshumali Shrivastava

Large-scale AI models, such as Large Language Models (LLMs) and Diffusion Models (DMs), have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM and DM size by 30% while preserving outputs that are bit-for-bit identical to the original model. DFloat11 is motivated by the low entropy in the BFloat16 weight representation of LLMs, which reveals significant inefficiency in the existing storage format. By applying entropy coding, DFloat11 assigns dynamic-length encodings to weights based on frequency, achieving near information-optimal compression without any loss of precision. To facilitate efficient inference with dynamic-length encodings, we develop a custom GPU kernel for fast online decompression. Our design incorporates the following: (i) compact, hierarchical lookup tables (LUTs) that fit within GPU SRAM for efficient decoding, (ii) a two-phase GPU kernel for coordinating thread read/write positions using lightweight auxiliary variables, and (iii) transformer-block-level decompression to minimize latency. Experiments on Llama 3.3, Qwen 3, Mistral 3, FLUX.1, and others validate our hypothesis that DFloat11 achieves around 30% model size reduction while preserving bit-for-bit identical outputs. Compared to a potential alternative of offloading parts of an uncompressed model to the CPU to meet memory constraints, DFloat11 achieves 2.3–46.2× higher throughput in token generation. With a fixed GPU memory budget, DFloat11 enables 5.7–14.9× longer generation lengths than uncompressed models. Notably, our method enables lossless inference of Llama 3.1 405B, an 810GB model, on a single node equipped with 8×80GB GPUs.
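
The closing claim of this abstract can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes 2 bytes per BFloat16 weight, takes the compressed size to be roughly 70% of the original (as stated in the title), and ignores activation and KV-cache memory. The function and constant names are hypothetical, not part of DFloat11.

```python
def model_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 405e9         # Llama 3.1 405B
BF16_BYTES = 2             # BFloat16 stores each weight in 16 bits
COMPRESSION_RATIO = 0.70   # assumption: "around 30% model size reduction"
NODE_MEMORY_GB = 8 * 80    # one node with 8 GPUs of 80 GB each

uncompressed_gb = model_size_gb(NUM_PARAMS, BF16_BYTES)
compressed_gb = uncompressed_gb * COMPRESSION_RATIO

print(f"uncompressed weights: {uncompressed_gb:.0f} GB")
print(f"compressed weights:   {compressed_gb:.0f} GB")
print(f"node GPU memory:      {NODE_MEMORY_GB} GB")
print("fits uncompressed:", uncompressed_gb <= NODE_MEMORY_GB)
print("fits compressed:  ", compressed_gb <= NODE_MEMORY_GB)
```

Under these assumptions the uncompressed weights (810 GB) exceed the 640 GB of GPU memory on the node, while the compressed weights (about 567 GB) fit with room left for runtime buffers.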

TMLR Journal 2025 Journal Article

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

  • Yang Sui
  • Yu-Neng Chuang
  • Guanchu Wang
  • Jiamu Zhang
  • Tianyi Zhang
  • Jiayi Yuan
  • Hongyi Liu
  • Andrew Wen

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to lengthy and redundant outputs, known as the "overthinking phenomenon". Efficient Reasoning, which seeks to optimize reasoning length while preserving reasoning capabilities, offers practical benefits such as faster processing times, lower energy consumption, and improved responsiveness, especially valuable for reasoning-intensive applications. Despite its potential, efficient reasoning remains in the early stages of research. In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.

NeurIPS Conference 2025 Conference Paper

Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference

  • Jiayi Yuan
  • Hao Li
  • Xinheng Ding
  • Wenya Xie
  • Yu-Jhe Li
  • Wentian Zhao
  • Kun Wan
  • Jing Shi

Large Language Models (LLMs) are now integral across various domains and have demonstrated impressive performance. Progress, however, rests on the premise that benchmark scores are both accurate and reproducible. We demonstrate that the reproducibility of LLM performance is fragile: changing system configuration, such as evaluation batch size, GPU count, and GPU version, can introduce significant differences in the generated responses. This issue is especially pronounced in reasoning models, where minor rounding differences in early tokens can cascade into divergent chains of thought, ultimately affecting accuracy. For instance, under bfloat16 precision with greedy decoding, a reasoning model like DeepSeek-R1-Distill-Qwen-7B can exhibit up to 9% variation in accuracy and 9,000 tokens difference in response length due to differences in GPU count, type, and evaluation batch size. We trace the root cause of this variability to the non-associative nature of floating-point arithmetic under limited numerical precision. This work presents the first systematic investigation into how numerical precision affects reproducibility in LLM inference. Through carefully controlled experiments across various hardware, software, and precision settings, we quantify when and how model outputs diverge. Our analysis reveals that floating-point precision, while critical for reproducibility, is often neglected in evaluation practices. Inspired by this, we develop a lightweight inference pipeline, dubbed LayerCast, that stores weights in 16-bit precision but performs all computations in FP32, balancing memory efficiency with numerical stability. Code is available at https://github.com/nanomaoli/llm_reproducibility.
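
The root cause named in this abstract, non-associative floating-point addition, can be reproduced in a few lines. The sketch below is not taken from the paper's code: it uses Python's 64-bit floats rather than bfloat16, and the chunked summation is only a loose stand-in for a reduction split across batches or devices. The helper names are hypothetical.

```python
def sequential_sum(values):
    """Add floats strictly left to right, with no compensated summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_chunks(values, chunk_size):
    """Sum each chunk separately, then add the partial sums.

    This loosely mimics a reduction whose work is split differently
    depending on batch size or device count.
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# The same three numbers, grouped two different ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The exact sum is 2.0, but the result depends on how the work is split.
values = [1e16, 1.0, -1e16, 1.0]
print(sum_in_chunks(values, 4))  # 1.0
print(sum_in_chunks(values, 2))  # 0.0
```

With lower-precision formats such as bfloat16 the rounding steps are much coarser, so this kind of order-dependent difference appears at far smaller magnitudes.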

AAAI Conference 2024 Conference Paper

Chasing Fairness in Graphs: A GNN Architecture Perspective

  • Zhimeng Jiang
  • Xiaotian Han
  • Chao Fan
  • Zirui Liu
  • Na Zou
  • Ali Mostafavi
  • Xia Hu

There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking and topology rewiring) or fair training strategies (e.g., regularization, adversarial debiasing, and fair contrastive learning). How to achieve fairness in graphs from the model architecture perspective is less explored. More importantly, GNNs exhibit worse fairness performance than multilayer perceptrons because their model architecture (i.e., neighbor aggregation) amplifies biases. To this end, we aim to achieve fairness via a new GNN architecture. We propose Fair Message Passing (FMP), designed within a unified optimization framework for GNNs. Notably, FMP explicitly renders sensitive attribute usage in forward propagation for the node classification task using cross-entropy loss without data pre-processing. In FMP, aggregation is first adopted to utilize neighbors' information, and then the bias mitigation step explicitly pushes the node representation centers of demographic groups together. In this way, the FMP scheme can aggregate useful information from neighbors and mitigate bias to achieve a better trade-off between fairness and prediction performance. Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets. The code is available at https://github.com/zhimengj0326/FMP.

NeurIPS Conference 2024 Conference Paper

Gradient Rewiring for Editable Graph Neural Network Training

  • Zhimeng Jiang
  • Zirui Liu
  • Xiaotian Han
  • Qizhang Feng
  • Hongye Jin
  • Qiaoyu Tan
  • Kaixiong Zhou
  • Na Zou

Deep neural networks are ubiquitously adopted in many applications, such as computer vision, natural language processing, and graph analytics. However, well-trained neural networks can make prediction errors after deployment as the world changes. Model editing involves updating the base model to correct prediction errors with less accessible training data and computational resources. Despite recent advances in model editors in computer vision and natural language processing, editable training in graph neural networks (GNNs) is rarely explored. The challenge with editable GNN training lies in the inherent information aggregation across neighbors, which can lead model editors to affect the predictions of other nodes unintentionally. In this paper, we first observe that the gradients of the cross-entropy loss for the target node and for the training nodes are significantly inconsistent, which indicates that directly fine-tuning the base model using the loss on the target node deteriorates the performance on training nodes. Motivated by the gradient inconsistency observation, we propose a simple yet effective Gradient Rewiring method for Editable graph neural network training, named GRE. Specifically, we first store the anchor gradient of the loss on training nodes to preserve the locality. Subsequently, we rewire the gradient of the loss on the target node to preserve performance on the training nodes using the anchor gradient. Experiments demonstrate the effectiveness of GRE on various model architectures and graph datasets in terms of multiple editing situations. The source code is available at https://github.com/zhimengj0326/Gradient_rewiring_editing.

TMLR Journal 2024 Journal Article

On the Equivalence of Graph Convolution and Mixup

  • Xiaotian Han
  • Hanqing Zeng
  • Yu Chen
  • Shaoliang Nie
  • Jingzhou Liu
  • Kanika Narang
  • Zahra Shakeri
  • Karthik Abinav Sankararaman

This paper investigates the relationship between graph convolution and Mixup techniques. Graph convolution in a graph neural network involves aggregating features from neighboring samples to learn representative features for a specific node or sample. On the other hand, Mixup is a data augmentation technique that generates new examples by averaging features and one-hot labels from multiple samples. One commonality between these techniques is their utilization of information from multiple samples to derive feature representation. This study aims to explore whether a connection exists between the two. Our investigation reveals that, under two mild modifications, graph convolution can be viewed as a specialized form of Mixup that is applied during both the training and testing phases. The two modifications are 1) Homophily Relabel: assigning the target node's label to all its neighbors, and 2) Test-Time Mixup: applying Mixup to the features at test time. We establish this equivalence mathematically by demonstrating that graph convolution networks and simplified graph convolution can be expressed as a form of Mixup. We also empirically verify the equivalence by training an MLP using the two modifications to achieve comparable performance.

JMLR Journal 2023 Journal Article

AutoKeras: An AutoML Library for Deep Learning

  • Haifeng Jin
  • François Chollet
  • Qingquan Song
  • Xia Hu

To use deep learning, one needs to be familiar with various software tools like TensorFlow or Keras, as well as various model architecture and optimization best practices. Despite recent progress in software usability, deep learning remains a highly specialized occupation. To enable people with limited machine learning and programming experience to adopt deep learning, we developed AutoKeras, an Automated Machine Learning (AutoML) library that automates the process of model selection and hyperparameter tuning. AutoKeras encapsulates the complex process of building and training deep neural networks into a very simple and accessible interface, which enables novice users to solve standard machine learning problems with a few lines of code. Designed with practical applications in mind, AutoKeras is built on top of Keras and TensorFlow, and all AutoKeras-created models can be easily exported and deployed with the help of the TensorFlow ecosystem tooling.

NeurIPS Conference 2023 Conference Paper

Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach

  • Zhimeng (Stephen) Jiang
  • Xiaotian Han
  • Hongye Jin
  • Guanchu Wang
  • Rui Chen
  • Na Zou
  • Xia Hu

Fairness in machine learning has attracted increasing attention in recent years. The fairness methods improving algorithmic fairness for in-distribution data may not perform well under distribution shifts. In this paper, we first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation. Subsequently, we analyze the sufficient conditions to guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset, and low prediction difference between the source and target datasets for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR) by considering the worst case within the model weight perturbation ball for each sensitive attribute group. We evaluate the effectiveness of our proposed RFR algorithm on synthetic and real distribution shifts across various datasets. Experimental results demonstrate that RFR achieves better fairness-accuracy trade-off performance compared with several baselines. The source code is available at https://github.com/zhimengj0326/RFR_NeurIPS23.

TMLR Journal 2023 Journal Article

DSpar: An Embarrassingly Simple Strategy for Efficient GNN training and inference via Degree-based Sparsification

  • Zirui Liu
  • Kaixiong Zhou
  • Zhimeng Jiang
  • Li Li
  • Rui Chen
  • Soo-Hyun Choi
  • Xia Hu

Running Graph Neural Networks (GNNs) on large graphs is notoriously inefficient. This is attributed to sparse graph-based operations, which are hard to accelerate on commodity hardware, e.g., GPUs and CPUs. One potential solution is to "sketch" the original graph by removing unimportant edges, so that both training and inference are executed on the sparsified graph with improved efficiency. Traditional graph sparsification work calculates the edge importance score, i.e., effective resistance, from graph topology with theoretical guarantees. However, estimating effective resistance is even more expensive than training the GNN itself. Later, learning-based sparsification methods propose to learn the edge importance from data, but with significant overhead due to the extra learning process. Thus, both of them introduce significant ahead-of-training overhead. In this paper, we experimentally and theoretically show that effective resistance can be approximated using only node degree information, achieving similar node representations on graphs with and without sparsification. Based on this finding, we propose DSpar, which sparsifies the graph once before training based only on node degree information, with negligible ahead-of-training overhead. In practice, for the training phase, DSpar is up to 5.9× faster than the baseline with almost no accuracy drop. For the inference phase, DSpar reduces latency by up to 90%.

NeurIPS Conference 2023 Conference Paper

Fair Graph Distillation

  • Qizhang Feng
  • Zhimeng (Stephen) Jiang
  • Ruiquan Li
  • Yicheng Wang
  • Na Zou
  • Jiang Bian
  • Xia Hu

As graph neural networks (GNNs) struggle with large-scale graphs due to high computational demands, data distillation for graph data promises to alleviate this issue by distilling a large real graph into a smaller distilled graph while maintaining comparable prediction performance for GNNs trained on both graphs. However, we observe that GNNs trained on distilled graphs may exhibit more severe group fairness problems than those trained on real graphs. Motivated by this observation, we propose fair graph distillation, an approach for generating small distilled fair and informative graphs based on the graph distillation method. The challenge lies in the deficiency of sensitive attributes for nodes in the distilled graph, making most debiasing methods (e.g., regularization and adversarial debiasing) intractable for distilled graphs. We develop a simple yet effective bias metric, called coherence, for distilled graphs. Based on the proposed coherence metric, we introduce a framework for fair graph distillation using a bi-level optimization algorithm. Extensive experiments demonstrate that the proposed algorithm can achieve better prediction performance-fairness trade-offs across various datasets and GNN architectures.

NeurIPS Conference 2023 Conference Paper

One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning

  • Shaochen (Henry) Zhong
  • Zaichuan You
  • Jiamu Zhang
  • Sebastian Zhao
  • Zachary LeClaire
  • Zirui Liu
  • Daochen Zha
  • Vipin Chaudhary

Densely structured pruning methods utilizing simple pruning heuristics can deliver immediate compression and acceleration benefits with acceptable benign performances. However, empirical findings indicate such naively pruned networks are extremely fragile under simple adversarial attacks. Naturally, we would be interested in knowing if such a phenomenon also holds for carefully designed modern structured pruning methods. If so, then to what extent is the severity? And what kind of remedies are available? Unfortunately, both the questions and the solution remain largely unaddressed: no prior art is able to provide a thorough investigation on the adversarial performance of modern structured pruning methods (spoiler: it is not good), yet the few works that attempt to provide mitigation often do so at various extra costs with only to-be-desired performance. In this work, we answer both questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods. Solution-wise, we take advantage of Grouped Kernel Pruning (GKP)'s recent success in pushing densely structured pruning freedom to a more fine-grained level. By mixing up kernel smoothness, a classic robustness-related kernel-level metric, into a modified GKP procedure, we present a one-shot-post-train-weight-dependent GKP method capable of advancing SOTA performance on both the benign and adversarial scale, while requiring no extra (in fact, often less) cost than a standard pruning procedure. Please refer to our GitHub repository for code implementation, tool sharing, and model checkpoints.

IJCAI Conference 2023 Conference Paper

Probabilistic Masked Attention Networks for Explainable Sequential Recommendation

  • Huiyuan Chen
  • Kaixiong Zhou
  • Zhimeng Jiang
  • Chin-Chia Michael Yeh
  • Xiaoting Li
  • Menghai Pan
  • Yan Zheng
  • Xia Hu

Transformer-based models are powerful for modeling temporal dynamics of user preference in sequential recommendation. Most of the variants adopt the Softmax transformation in the self-attention layers to generate dense attention probabilities. However, real-world item sequences are often noisy, containing a mixture of true-positive and false-positive interactions. Such dense attentions inevitably assign probability mass to noisy or irrelevant items, leading to sub-optimal performance and poor explainability. Here we propose a Probabilistic Masked Attention Network (PMAN) to identify the sparse pattern of attentions, which is more desirable for pruning noisy items in sequential recommendation. Specifically, we employ a probabilistic mask to achieve sparse attentions under a constrained optimization framework. As such, PMAN can select, in a data-driven fashion, which information is critical to retain and which can be dropped. Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly.

TMLR Journal 2023 Journal Article

Retiring ΔDP: New Distribution-Level Metrics for Demographic Parity

  • Xiaotian Han
  • Zhimeng Jiang
  • Hongye Jin
  • Zirui Liu
  • Na Zou
  • Qifan Wang
  • Xia Hu

Demographic parity is the most widely recognized measure of group fairness in machine learning, which ensures equal treatment of different demographic groups. Numerous works aim to achieve demographic parity by pursuing the commonly used metric ΔDP. Unfortunately, in this paper, we reveal that the fairness metric ΔDP cannot precisely measure the violation of demographic parity, because it inherently has the following drawbacks: i) zero-value ΔDP does not guarantee zero violation of demographic parity, ii) ΔDP values can vary with different classification thresholds. To this end, we propose two new fairness metrics, Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC), to precisely measure the violation of demographic parity at the distribution level. The new fairness metrics directly measure the difference between the distributions of the prediction probability for different demographic groups. Thus our proposed new metrics enjoy: i) zero-value ABCC/ABPC guarantees zero violation of demographic parity; ii) ABCC/ABPC guarantees demographic parity while the classification thresholds are adjusted. We further re-evaluate the existing fair models with our proposed fairness metrics and observe different fairness behaviors of those models under the new metrics.
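
The two drawbacks of ΔDP listed in this abstract can be illustrated on toy data. The sketch below is one plausible reading, not the authors' implementation: it assumes ΔDP is the gap in positive-prediction rates at a given threshold, and it computes ABCC as the area between the two groups' empirical cumulative distribution curves over [0, 1]. All function names and scores are hypothetical.

```python
def positive_rate(scores, threshold):
    """Fraction of scores classified positive at the given threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def delta_dp(scores_a, scores_b, threshold):
    """Assumed thresholded ΔDP: gap between the groups' positive rates."""
    return abs(positive_rate(scores_a, threshold)
               - positive_rate(scores_b, threshold))


def ecdf(scores, x):
    """Empirical CDF: fraction of scores less than or equal to x."""
    return sum(1 for s in scores if s <= x) / len(scores)


def abcc(scores_a, scores_b):
    """Area between the two empirical CDF curves on [0, 1].

    Both CDFs are step functions, so the integral is an exact sum over
    the intervals between consecutive breakpoints.
    """
    points = sorted(set(scores_a) | set(scores_b) | {0.0, 1.0})
    area = 0.0
    for left, right in zip(points, points[1:]):
        area += abs(ecdf(scores_a, left) - ecdf(scores_b, left)) * (right - left)
    return area


# Toy prediction probabilities for two demographic groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.45, 0.45, 0.55, 0.55]

print(delta_dp(group_a, group_b, threshold=0.5))  # 0.0
print(delta_dp(group_a, group_b, threshold=0.3))  # 0.25
print(round(abcc(group_a, group_b), 6))           # 0.15
```

At threshold 0.5 the thresholded gap is zero even though the two score distributions clearly differ, and moving the threshold to 0.3 changes the value. The distribution-level area stays positive regardless of any threshold.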

NeurIPS Conference 2023 Conference Paper

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots

  • Ruixiang (Ryan) Tang
  • Jiayi Yuan
  • Yiming Li
  • Zirui Liu
  • Rui Chen
  • Xia Hu

In the field of natural language processing, the prevalent approach involves fine-tuning pretrained language models (PLMs) using local samples. Recent research has exposed the susceptibility of PLMs to backdoor attacks, wherein the adversaries can embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a honeypot module into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Consequently, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning process of the stem network. Comprehensive experiments conducted on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate ranging from 10% to 40% when compared to prior state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

  • Zirui Liu
  • Guanchu Wang
  • Shaochen (Henry) Zhong
  • Zhaozhuo Xu
  • Daochen Zha
  • Ruixiang (Ryan) Tang
  • Zhimeng (Stephen) Jiang
  • Kaixiong Zhou

As the model size grows rapidly, fine-tuning the large pre-trained language model has become increasingly difficult due to its extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, as they are crucial for gradient calculation. Notably, machine learning models are typically trained using stochastic gradient descent. We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones. By replacing the linear operation with our approximated one in transformers, we can achieve up to 2.7× peak memory reduction with almost no accuracy drop and enable up to 6.4× larger batch size. Under the same hardware, WTA-CRS enables better downstream task performance by applying larger models and/or faster training speed with larger batch sizes. The code is available at https://anonymous.4open.science/r/WTACRS-A5C5/.
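
The idea of an unbiased sub-sampled estimator for a matrix product can be sketched with plain column-row sampling. This is the classic baseline technique, not the winner-take-all variant the paper proposes, and the sampling probabilities (proportional to column norm times row norm) are an assumption chosen for illustration. All names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Exact product of A (m x n) and B (n x p) as nested lists."""
    return [[sum(A[r][i] * B[i][c] for i in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def sampling_probs(A, B):
    """Assumed choice: p_i proportional to ||A[:, i]|| * ||B[i, :]||."""
    weights = [
        math.sqrt(sum(row[i] ** 2 for row in A))
        * math.sqrt(sum(v ** 2 for v in B[i]))
        for i in range(len(B))
    ]
    total = math.fsum(weights)
    return [w / total for w in weights]


def crs_estimate(A, B, probs, k, rng):
    """Average k sampled column-row outer products, each scaled by 1 / p_i."""
    est = [[0.0] * len(B[0]) for _ in range(len(A))]
    for i in rng.choices(range(len(B)), weights=probs, k=k):
        scale = 1.0 / (k * probs[i])
        for r in range(len(A)):
            for c in range(len(B[0])):
                est[r][c] += scale * A[r][i] * B[i][c]
    return est


def expected_estimate(A, B, probs):
    """Expectation of one sample: sum_i p_i * (A[:, i] B[i, :] / p_i)."""
    return [[sum(probs[i] * (A[r][i] * B[i][c] / probs[i])
                 for i in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


A = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]
B = [[2.0, 1.0], [0.5, 4.0], [1.0, 1.0]]
probs = sampling_probs(A, B)

exact = matmul(A, B)
expected = expected_estimate(A, B, probs)
approx = crs_estimate(A, B, probs, k=2, rng=random.Random(0))

print("exact:   ", exact)
print("expected:", expected)
print("sampled: ", approx)
```

Only the sampled columns of `A` and rows of `B` are needed to form the estimate, which is why storing a sub-sample of the activations can suffice. The expectation matches the exact product, so the estimator is unbiased; any single draw is still noisy.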

NeurIPS Conference 2022 Conference Paper

A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking

  • Keyu Duan
  • Zirui Liu
  • Peihao Wang
  • Wenqing Zheng
  • Kaixiong Zhou
  • Tianlong Chen
  • Xia Hu
  • Zhangyang Wang

Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs). Due to the nature of evolving graph structures into the training process, vanilla GNNs usually fail to scale up, limited by the GPU memory space. Up to now, though numerous scalable GNN architectures have been proposed, we still lack a comprehensive survey and fair benchmark of this reservoir to find the rationale for designing scalable GNNs. To this end, we first systematically formulate the representative methods of large-scale graph training into several branches and further establish a fair and consistent benchmark for them via greedy hyperparameter search. In addition, regarding efficiency, we theoretically evaluate the time and space complexity of various branches and empirically compare them w.r.t. GPU memory usage, throughput, and convergence. Furthermore, we analyze the pros and cons of various branches of scalable GNNs and then present a new ensembling training manner, named EnGCN, to address the existing issues. Remarkably, our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets. Our code is available at https://github.com/VITA-Group/Large_Scale_GCN_Benchmarking.

IJCAI Conference 2022 Conference Paper

AutoVideo: An Automated Video Action Recognition System

  • Daochen Zha
  • Zaid Pervaiz Bhat
  • Yi-Wei Chen
  • Yicheng Wang
  • Sirui Ding
  • Jiaben Chen
  • Kwei-Herng Lai
  • Mohammad Qazim Bhat

Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of the modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo features 1) a highly modular and extendable infrastructure following the standard pipeline language, 2) an exhaustive list of primitives for pipeline construction, 3) data-driven tuners to save the effort of pipeline tuning, and 4) an easy-to-use Graphical User Interface (GUI). AutoVideo is released under the MIT license at https://github.com/datamllab/autovideo

NeurIPS Conference 2022 Conference Paper

DreamShard: Generalizable Embedding Table Placement for Recommender Systems

  • Daochen Zha
  • Louis Feng
  • Qiaoyu Tan
  • Zirui Liu
  • Kwei-Herng Lai
  • Bhargav Bhushanam
  • Yuandong Tian
  • Arun Kejariwal

We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains a challenging problem because of 1) the operation fusion of embedding tables, and 2) the generalizability requirement on unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard achieves the reasoning of operation fusion and generalizability with 1) a cost network to directly predict the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and the rewards are estimated with the cost network. Equipped with sum and max representation reductions, the two networks can directly generalize to any unseen tasks with different numbers of tables and/or devices without fine-tuning. Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available.

AAAI Conference 2022 Conference Paper

Orthogonal Graph Neural Networks

  • Kai Guo
  • Kaixiong Zhou
  • Xia Hu
  • Yu Li
  • Yi Chang
  • Xin Wang

Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations. These models rely on message passing and feature transformation functions to encode the structural and feature information from neighbors. However, stacking more convolutional layers significantly decreases the performance of GNNs. Most recent studies attribute this limitation to the over-smoothing issue, where node embeddings converge to indistinguishable vectors. Through a number of experimental observations, we argue that the main factor degrading the performance is the unstable forward normalization and backward gradient resulting from the improper design of the feature transformation, especially for shallow GNNs where over-smoothing has not happened. Therefore, we propose a novel orthogonal feature transformation, named Ortho-GConv, which could generally augment the existing GNN backbones to stabilize the model training and improve the model’s generalization performance. Specifically, we maintain the orthogonality of the feature transformation comprehensively from three perspectives, namely hybrid weight initialization, orthogonal transformation, and orthogonal regularization. By equipping the existing GNNs (e.g., GCN, JKNet, GCNII) with Ortho-GConv, we demonstrate the generality of the orthogonal feature transformation to enable stable training, and show its effectiveness for node and graph classification tasks.
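
As a rough illustration of the orthogonal regularization idea named in the abstract above, the sketch below computes a soft orthogonality penalty, the squared Frobenius norm of W^T W - I, for a weight matrix. This penalty form is a common choice and an assumption here; the abstract does not state the exact regularizer used by Ortho-GConv, and the function name `orthogonality_penalty` is hypothetical.

```python
def orthogonality_penalty(weight):
    """Squared Frobenius norm of (W^T W - I) for a matrix given as a list of rows.

    Assumed penalty form for illustration; not taken from the paper.
    """
    rows, cols = len(weight), len(weight[0])
    penalty = 0.0
    for i in range(cols):
        for j in range(cols):
            # (W^T W)[i][j] is the dot product of columns i and j of W.
            gram = sum(weight[r][i] * weight[r][j] for r in range(rows))
            target = 1.0 if i == j else 0.0
            penalty += (gram - target) ** 2
    return penalty


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 2.0]]
print(orthogonality_penalty(identity))  # 0.0, the columns are orthonormal
print(orthogonality_penalty(scaled))    # 18.0, W^T W = 4I, so the difference is 3I
```

The penalty is zero exactly when the columns of the matrix are orthonormal, so adding it to a training loss would push a feature transformation toward orthogonality.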

AAAI Conference 2022 System Paper

RES: An Interpretable Replicability Estimation System for Research Publications

  • Zhuoer Wang
  • Qizhang Feng
  • Mohinish Chatterjee
  • Xing Zhao
  • Yezi Liu
  • Yuening Li
  • Abhay Kumar Singh
  • Frank M. Shipman

Reliable and faithful research is the cornerstone of breakthrough advancements and disruptive innovations. Assessing the credibility of scientific findings and claims in research publications has long been a time-consuming and challenging task for researchers and decision-makers. In this paper, we introduce RES - an intelligent system that assists humans in analyzing the credibility of scientific findings and claims in research publications in the field of social and behavioral sciences by estimating their replicability. The pipeline of RES consists of four major modules that perform feature extraction, replicability estimation, result explanation, and sentiment analysis respectively. Our evaluation based on human experts’ assessments suggests that RES has achieved adequate performance. RES is also built with a Graphical User Interface (GUI) that is publicly accessible at https://tamu-infolab.github.io/RES/.

IJCAI Conference 2022 Conference Paper

Table2Graph: Transforming Tabular Data to Unified Weighted Graph

  • Kaixiong Zhou
  • Zirui Liu
  • Rui Chen
  • Li Li
  • Soo-Hyun Choi
  • Xia Hu

Learning useful interactions between input features is crucial for tabular data modeling. Recent efforts have started to explicitly model feature interactions with a graph, where each feature is treated as an individual node. However, existing graph construction methods either heuristically formulate a fixed feature-interaction graph based on specific domain knowledge, or simply apply an attention function to compute pairwise feature similarities for each sample. While the fixed graph may be sub-optimal for downstream tasks, sample-wise graph construction is time-consuming during model training and inference. To tackle these issues, we propose a framework named Table2Graph that recasts feature interaction modeling as learning a unified graph. Represented as a probability adjacency matrix, the unified graph learns to model the key feature interactions shared by the diverse samples in the tabular data. To optimize the unified graph well, we employ a reinforcement learning policy to capture the key feature interactions stably. A sparsity constraint is also proposed to keep the learned graph from being overly sparse or overly smooth. The experimental results in a variety of real-world applications demonstrate the effectiveness and efficiency of Table2Graph in terms of prediction accuracy and feature interaction detection.

AAAI Conference 2022 Conference Paper

Towards Debiasing DNN Models from Spurious Feature Influence

  • Mengnan Du
  • Ruixiang Tang
  • Weijie Fu
  • Xia Hu

Recent studies indicate that deep neural networks (DNNs) are prone to show discrimination towards certain demographic groups. We observe that algorithmic discrimination can be explained by the high reliance of the models on fairness sensitive features. Motivated by this observation, we propose to achieve fairness by suppressing the DNN models from capturing the spurious correlation between those fairness sensitive features with the underlying task. Specifically, we first train a bias-only teacher model which is explicitly encouraged to maximally employ fairness sensitive features for prediction. The teacher model then counter-teaches a debiased student model so that the interpretation of the student model is orthogonal to the interpretation of the teacher model. The key idea is that since the teacher model relies explicitly on fairness sensitive features for prediction, the orthogonal interpretation loss enforces the student network to reduce its reliance on sensitive features and instead capture more task-relevant features for prediction. Experimental analysis indicates that our framework substantially reduces the model’s attention on fairness sensitive features. Experimental results on four datasets further validate that our framework has consistently improved model fairness with respect to group fairness metrics, with a comparable or even better accuracy.

AAAI Conference 2021 Conference Paper

A Unified Taylor Framework for Revisiting Attribution Methods

  • Huiqi Deng
  • Na Zou
  • Mengnan Du
  • Weifu Chen
  • Guocan Feng
  • Xia Hu

Attribution methods have been developed to understand the decision making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods are often built upon empirical intuitions and heuristics. There is still no general theoretical framework that can both unify these attribution methods and reveal their rationales, fidelity, and limitations. To bridge the gap, in this paper, we propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework. Based on the reformulations, we analyze the attribution methods in terms of rationale, fidelity, and limitation. Moreover, we establish three principles for a good attribution in the Taylor attribution framework, i.e., low approximation error, correct contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations, and reveal a positive correlation between the attribution performance and the number of principles followed by the attribution method via benchmarking on real-world datasets.
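
For orientation, a first-order Taylor expansion of a model output around a baseline input shows the kind of decomposition a Taylor-based attribution view builds on. The notation below (model f, input x, baseline x̃, per-feature attribution a_i, remainder ε) is assumed for illustration and is not taken from the paper, which may also use higher-order terms.

```latex
f(x) - f(\tilde{x})
  = \sum_{i=1}^{n} \frac{\partial f(\tilde{x})}{\partial x_i}\,(x_i - \tilde{x}_i)
    + \epsilon(x, \tilde{x}),
\qquad
a_i = \frac{\partial f(\tilde{x})}{\partial x_i}\,(x_i - \tilde{x}_i)
```

Read this way, a small remainder ε corresponds to a low approximation error, and the choice of x̃ is the baseline selection that the abstract lists among its principles.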

NeurIPS Conference 2021 Conference Paper

Dirichlet Energy Constrained Learning for Deep Graph Neural Networks

  • Kaixiong Zhou
  • Xiao Huang
  • Daochen Zha
  • Rui Chen
  • Li Li
  • Soo-Hyun Choi
  • Xia Hu

Graph neural networks (GNNs) integrate deep architectures and topological structure modeling in an effective way. However, the performance of existing GNNs would decrease significantly when they stack many layers, because of the over-smoothing issue. Node embeddings tend to converge to similar vectors when GNNs keep recursively aggregating the representations of neighbors. To enable deep GNNs, several methods have been explored recently. But they are developed from either techniques in convolutional neural networks or heuristic strategies. There is no generalizable and theoretical principle to guide the design of deep GNNs. To this end, we analyze the bottleneck of deep GNNs by leveraging the Dirichlet energy of node embeddings, and propose a generalizable principle to guide the training of deep GNNs. Based on it, a novel deep GNN framework -- Energetic Graph Neural Networks (EGNN) is designed. It could provide lower and upper constraints in terms of Dirichlet energy at each layer to avoid over-smoothing. Experimental results demonstrate that EGNN achieves state-of-the-art performance by using deep layers.
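
To make the Dirichlet energy of node embeddings concrete, the sketch below sums squared embedding differences over the edges of a small graph. It uses the plain, unnormalized definition as a simplifying assumption; the paper may use a degree-normalized variant, and the function name is hypothetical.

```python
def dirichlet_energy(edges, embeddings):
    """Sum of squared embedding differences over undirected edges.

    Unnormalized form, assumed here for illustration.
    """
    energy = 0.0
    for i, j in edges:
        energy += sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
    return energy


edges = [(0, 1), (1, 2)]  # a path graph on three nodes
distinct = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(dirichlet_energy(edges, distinct))   # 5.0
print(dirichlet_energy(edges, collapsed))  # 0.0, all embeddings are identical
```

An energy of zero means neighboring nodes have identical embeddings, which is the over-smoothed regime; a lower bound on the energy at each layer rules that collapse out.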

AAAI Conference 2021 Conference Paper

Dynamic Memory based Attention Network for Sequential Recommendation

  • Qiaoyu Tan
  • Jianwei Zhang
  • Ninghao Liu
  • Xiao Huang
  • Hongxia Yang
  • Jingren Zhou
  • Xia Hu

Sequential recommendation has become increasingly essential in various online services. It aims to model the dynamic preferences of users from their historical interactions and predict their next items. The accumulated user behavior records on real systems could be very long. This rich data brings opportunities to track actual interests of users. Prior efforts mainly focus on making recommendations based on relatively recent behaviors. However, the overall sequential data may not be effectively utilized, as early interactions might affect users’ current choices. Also, it has become intolerable to scan the entire behavior sequence when performing inference for each user, since real-world systems require short response times. To bridge the gap, we propose a novel long sequential recommendation model, called Dynamic Memory-based Attention Network (DMAN). It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users. To improve memory fidelity, DMAN dynamically abstracts each user’s long-term interest into its own memory blocks by minimizing an auxiliary reconstruction loss. Based on the dynamic memory, the user’s short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation. Empirical results over four benchmark datasets demonstrate the superiority of our model in capturing long-term dependency over various state-of-the-art sequential models.

IS Journal 2021 Journal Article

Fairness in Deep Learning: A Computational Perspective

  • Mengnan Du
  • Fan Yang
  • Na Zou
  • Xia Hu

Fairness in deep learning has attracted tremendous attention recently, as deep learning is increasingly being used in high-stake decision making applications that affect individual lives. We provide a review covering recent progresses to tackle algorithmic fairness problems of deep learning from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient to diagnose the reasons that lead to algorithmic discrimination. We also discuss fairness mitigation approaches categorized according to three stages of deep learning life-cycle, aiming to push forward the area of fairness in deep learning and build genuinely fair and reliable deep learning systems.

NeurIPS Conference 2021 Conference Paper

Fairness via Representation Neutralization

  • Mengnan Du
  • Subhabrata Mukherjee
  • Guanchu Wang
  • Ruixiang Tang
  • Ahmed Awadallah
  • Xia Hu

Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a lot of instance-level annotations for sensitive attributes, it also does not guarantee that all fairness sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of DNN models by only debiasing the classification head, even with biased representations as inputs? To this end, we propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF) that achieves fairness by debiasing only the task-specific classification head of DNN models. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. The key idea of RNF is to discourage the classification head from capturing spurious correlation between fairness sensitive information in encoder representations with specific class labels. To address low-resource settings with no access to sensitive attribute annotations, we leverage a bias-amplified model to generate proxy annotations for sensitive attributes. Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models with minimal degradation in task-specific performance.
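
The pairing step described in the abstract above, taking samples with the same ground-truth label but different sensitive attributes, can be sketched as follows. Averaging the two encoder representations is an assumption made for illustration; the abstract does not say how the neutralized representation is formed, and all names and the toy data are hypothetical.

```python
def neutralize_pairs(samples):
    """Average representations of sample pairs that share a label but differ
    in sensitive attribute.

    Each sample is a (representation, label, sensitive_attribute) tuple.
    Plain averaging is an assumed neutralization rule, not the paper's.
    """
    neutralized = []
    for idx, (rep_a, label_a, attr_a) in enumerate(samples):
        for rep_b, label_b, attr_b in samples[idx + 1:]:
            if label_a == label_b and attr_a != attr_b:
                mean = [(a + b) / 2 for a, b in zip(rep_a, rep_b)]
                neutralized.append((mean, label_a))
    return neutralized


samples = [
    ([1.0, 0.0], 1, "group_a"),
    ([0.0, 1.0], 1, "group_b"),
    ([4.0, 4.0], 0, "group_a"),
]
print(neutralize_pairs(samples))  # [([0.5, 0.5], 1)]
```

The resulting (representation, label) pairs would then serve as training inputs for the classification head only, leaving the encoder untouched.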

NeurIPS Conference 2021 Conference Paper

Revisiting Time Series Outlier Detection: Definitions and Benchmarks

  • Kwei-Herng Lai
  • Daochen Zha
  • Junjie Xu
  • Yue Zhao
  • Guanchu Wang
  • Xia Hu

Time series outlier detection has been extensively studied with many advanced algorithms proposed in the past decade. Despite these efforts, very few studies have investigated how we should benchmark the existing algorithms. In particular, using synthetic datasets for evaluation has become a common practice in the literature, and thus it is crucial to have a general synthetic criterion to benchmark algorithms. This is a non-trivial task because the existing synthetic methods are very different in different applications and the outlier definitions are often ambiguous. To bridge this gap, we propose a behavior-driven taxonomy for time series outliers and categorize outliers into point- and pattern-wise outliers with clear context definitions. Following the new taxonomy, we then present a general synthetic criterion and generate 35 synthetic datasets accordingly. We further identify 4 multivariate real-world datasets from different domains and benchmark 9 algorithms on the synthetic and the real-world datasets. Surprisingly, we observe that some classical algorithms could outperform many recent deep learning approaches. The datasets, pre-processing and synthetic scripts, and the algorithm implementations are made publicly available at https://github.com/datamllab/tods/tree/benchmark

AAAI Conference 2021 System Paper

TODS: An Automated Time Series Outlier Detection System

  • Kwei-Herng Lai
  • Daochen Zha
  • Guanchu Wang
  • Junjie Xu
  • Yue Zhao
  • Devesh Kumar
  • Yile Chen
  • Purav Zumkhawaka

We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is a primitive, which is an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. Users can freely construct a pipeline using these primitives and perform end-to-end outlier detection with the constructed pipeline. TODS provides a Graphical User Interface (GUI), where users can flexibly design a pipeline with drag-and-drop. Moreover, a data-driven searcher is provided to automatically discover the most suitable pipelines given a dataset. TODS is released under the Apache 2.0 license at https://github.com/datamllab/tods. A video is available on YouTube.

NeurIPS Conference 2020 Conference Paper

Detecting Interactions from Neural Networks via Topological Analysis

  • Zirui Liu
  • Qingquan Song
  • Kaixiong Zhou
  • Ting-Hsiang Wang
  • Ying Shan
  • Xia Hu

Detecting statistical interactions between input features is a crucial and challenging task. Recent advances demonstrate that it is possible to extract learned interactions from trained neural networks. It has also been observed that, in neural networks, any interacting features must follow a strongly weighted connection to common hidden units. Motivated by the observation, in this paper, we propose to investigate the interaction detection problem from a novel topological perspective by analyzing the connectivity in neural networks. Specifically, we propose a new measure for quantifying interaction strength, based upon the well-received theory of persistent homology. Based on this measure, a Persistence Interaction Detection (PID) algorithm is developed to efficiently detect interactions. Our proposed algorithm is evaluated across a number of interaction detection tasks on several synthetic and real-world datasets with different hyperparameters. Experimental results validate that the PID algorithm outperforms the state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Dual Policy Distillation

  • Kwei-Herng Lai
  • Daochen Zha
  • Yuening Li
  • Xia Hu

Policy distillation, which transfers a teacher policy to a student policy has achieved great success in challenging tasks of deep reinforcement learning. This teacher-student framework requires a well-trained teacher model which is computationally expensive. Moreover, the performance of the student model could be limited by the teacher model if the teacher model is not optimal. In the light of collaborative learning, we study the feasibility of involving joint intellectual efforts from diverse perspectives of student models. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether the knowledge distilled from an imperfect and noisy peer learner would be helpful. To address the challenge, we theoretically justify that distilling knowledge from a peer learner will lead to policy improvement and propose a disadvantageous distillation strategy based on the theoretical results. The conducted experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation without the use of expensive teacher models.

IJCAI Conference 2020 Conference Paper

Multi-Channel Graph Neural Networks

  • Kaixiong Zhou
  • Qingquan Song
  • Xiao Huang
  • Daochen Zha
  • Na Zou
  • Xia Hu

The classification of graph-structured data has become increasingly crucial in many disciplines. It has been observed that the implicit or explicit hierarchical community structures preserved in real-world graphs could be useful for downstream classification applications. A straightforward way to leverage the hierarchical structure is to use pooling algorithms to cluster nodes into fixed groups, and shrink the input graph layer by layer to learn the pooled graphs. However, the pool shrinking discards graph details, which makes it hard to distinguish two non-isomorphic graphs, and the fixed clustering ignores the inherent multiple characteristics of nodes. To compensate for the shrinking loss and learn the various characteristics of nodes, we propose the multi-channel graph neural networks (MuchGNN). Motivated by the underlying mechanisms developed in convolutional neural networks, we define tailored graph convolutions to learn a series of graph channels at each layer, and shrink the graphs hierarchically to encode the pooled structures. Experimental results on real-world datasets demonstrate the superiority of MuchGNN over the state-of-the-art methods.

IJCAI Conference 2020 Conference Paper

RLCard: A Platform for Reinforcement Learning in Card Games

  • Daochen Zha
  • Kwei-Herng Lai
  • Songyi Huang
  • Yuanpu Cao
  • Keerthana Reddy
  • Juan Vargas
  • Alex Nguyen
  • Ruzhe Wei

We present RLCard, a Python platform for reinforcement learning research and development in card games. RLCard supports various card environments and several baseline algorithms with unified easy-to-use interfaces, aiming at bridging reinforcement learning and imperfect information games. The platform provides flexible configurations of state representation, action encoding, and reward design. RLCard also supports visualizations for algorithm debugging. In this demo, we showcase two representative environments and their visualization results. We conclude this demo with challenges and research opportunities brought by RLCard. A video is available on YouTube.

NeurIPS Conference 2020 Conference Paper

Towards Deeper Graph Neural Networks with Differentiable Group Normalization

  • Kaixiong Zhou
  • Xiao Huang
  • Yuening Li
  • Daochen Zha
  • Rui Chen
  • Xia Hu

Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues that limit the performance of GNNs as the number of layers increases. This is because the stacked aggregators make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close and unlinked pairs distinct. However, they often ignore the intrinsic community structures and would result in sub-optimal performance. The representations of nodes within the same community/class need to be similar to facilitate the classification, while different classes are expected to be separated in embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.

AAAI Conference 2019 Conference Paper

Deep Bayesian Optimization on Attributed Graphs

  • Jiaxu Cui
  • Bo Yang
  • Xia Hu

Attributed graphs, which contain rich contextual features beyond just network structure, are ubiquitous and have been observed to benefit various network analytics applications. Graph structure optimization, aiming to find the optimal graphs in terms of some specific measures, has become an effective computational tool in complex network analysis. However, traditional model-free methods suffer from the expensive computational cost of evaluating graphs; existing vectorial Bayesian optimization methods cannot be directly applied to attributed graphs and have the scalability issue due to the use of Gaussian processes (GPs). To bridge the gap, in this paper, we propose a novel scalable Deep Graph Bayesian Optimization (DGBO) method on attributed graphs. The proposed DGBO prevents the cubical complexity of the GPs by adopting a deep graph neural network to surrogate black-box functions, and can scale linearly with the number of observations. Intensive experiments are conducted on both artificial and real-world problems, including molecular discovery and urban road network design, and demonstrate the effectiveness of the DGBO compared with the state-of-the-art.

IJCAI Conference 2019 Conference Paper

Experience Replay Optimization

  • Daochen Zha
  • Kwei-Herng Lai
  • Kaixiong Zhou
  • Xia Hu

Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these issues, we propose a novel experience replay optimization (ERO) framework which alternately updates two policies: the agent policy, and the replay policy. The agent is updated to maximize the cumulative reward based on the replayed data, while the replay policy is updated to provide the agent with the most useful experiences. The conducted experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing promise in experience replay learning to improve the performance of off-policy reinforcement learning algorithms.
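
For readers unfamiliar with the baseline the abstract above contrasts against, the sketch below shows a minimal uniform replay buffer. It is not the ERO method: the learned replay policy is not modeled, and the class and method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that replays stored transitions uniformly."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the buffer size.
        size = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), size)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Each transition is (state, action, reward, next_state) with toy values.
    buffer.add((step, 0, 1.0, step + 1))
batch = buffer.sample(2)
print(len(buffer.storage), len(batch))  # 3 2
```

A learned replay policy would replace the uniform draw in `sample` with a selection driven by scores assigned to each stored transition.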

AAAI Conference 2019 Conference Paper

Interpreting Deep Models for Text Analysis via Optimization and Regularization Methods

  • Hao Yuan
  • Yongjun Chen
  • Xia Hu
  • Shuiwang Ji

Interpreting deep neural networks is of great importance to understand and verify deep models for natural language processing (NLP) tasks. However, most existing approaches only focus on improving the performance of models but ignore their interpretability. In this work, we propose an approach to investigate the meaning of hidden neurons of the convolutional neural network (CNN) models. We first employ saliency map and optimization techniques to approximate the detected information of hidden neurons from input sentences. Then we develop regularization terms and explore words in vocabulary to interpret such detected information. Experimental results demonstrate that our approach can identify meaningful and reasonable interpretations for hidden spatial locations. Additionally, we show that our approach can describe the decision procedure of deep NLP models.

AAAI Conference 2019 Conference Paper

Large-Scale Heterogeneous Feature Embedding

  • Xiao Huang
  • Qingquan Song
  • Fan Yang
  • Xia Hu

Feature embedding aims to learn a low-dimensional vector representation for each instance to preserve the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of features have been well-studied, real-world instances often contain multiple types of correlated features or even information within a different modality such as networks. Existing studies such as multiview learning show that it is promising to learn unified vector representations from all sources. However, high computational costs of incorporating heterogeneous information limit the applications of existing algorithms. The number of instances and dimensions of features in practice are often large. To bridge the gap, we propose a scalable framework FeatWalk, which can model and incorporate instance similarities in terms of different types of features into a unified embedding representation. To enable the scalability, FeatWalk does not directly calculate any similarity measure, but provides an alternative way to simulate the similarity-based random walks among instances to extract the local instance proximity and preserve it in a set of instance index sequences. These sequences are homogeneous with each other. A scalable word embedding algorithm is applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.

AAAI Conference 2019 Conference Paper

Robust Negative Sampling for Network Embedding

  • Mohammadreza Armandpour
  • Patrick Ding
  • Jianhua Huang
  • Xia Hu

Many recent network embedding algorithms use negative sampling (NS) to approximate a variant of the computationally expensive Skip-Gram neural network architecture (SGA) objective. In this paper, we provide theoretical arguments that reveal how NS can fail to properly estimate the SGA objective, and why it is not a suitable candidate for the network embedding problem as a distinct objective. We show NS can learn undesirable embeddings, as the result of the “Popular Neighbor Problem.” We use the theory to develop a new method “R-NS” that alleviates the problems of NS by using a more intelligent negative sampling scheme and careful penalization of the embeddings. R-NS is scalable to large-scale networks, and we empirically demonstrate the superiority of R-NS over NS for multi-label classification on a variety of real-world networks including social networks and language networks.
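
For reference, the sketch below evaluates the negative sampling loss as it is commonly written for Skip-Gram embeddings, for one (center, context) pair and a handful of sampled negatives. It shows plain NS, not R-NS; the function names and the toy vectors are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center, context, negatives):
    """-log s(u.v) - sum over negatives of log s(-u.n), with s the sigmoid."""
    loss = -log_sigmoid(dot(center, context))
    for negative in negatives:
        loss -= log_sigmoid(-dot(center, negative))
    return loss


center = [1.0, 0.0]
aligned = negative_sampling_loss(center, [2.0, 0.0], [[-2.0, 0.0]])
misaligned = negative_sampling_loss(center, [-2.0, 0.0], [[2.0, 0.0]])
print(aligned < misaligned)  # True
```

The loss is low when the center vector agrees with its observed context and disagrees with the sampled negatives, which is why the way negatives are drawn matters for the embeddings that result.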

IJCAI Conference 2018 Conference Paper

Contextual Outlier Interpretation

  • Ninghao Liu
  • Donghwa Shin
  • Xia Hu

While outlier detection has been intensively studied in many applications, interpretation is becoming increasingly important to help people trust and evaluate the developed detection models through providing intrinsic reasons why the given outliers are identified. It is a nontrivial task for interpreting the abnormality of outliers due to the distinct characteristics of different detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, contexts where outliers locate, as well as the relation between outliers and the contexts, are usually overlooked in existing interpretation frameworks. To tackle the issues, in this paper, we propose a Contextual Outlier INterpretation (COIN) framework to explain the abnormality of outliers spotted by detectors. The interpretability of an outlier is achieved through three aspects, i.e., outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework.

AAAI Conference 2018 Conference Paper

Link Prediction With Personalized Social Influence

  • Zepeng Huo
  • Xiao Huang
  • Xia Hu

Link prediction in social networks is to infer the new links likely to be formed next or to reconstruct the links that are currently missing. Other than the pure topological network structures, social networks are often associated with rich information of social activities of users, such as tweeting, retweeting, and replying. Social theories such as social influence indicate that social activities could have potential impacts on the neighbors, and links in social media could be the results of the social influence among users. It motivates us to learn and model social influence among users to tackle the link prediction problem. However, this is a non-trivial task since it is challenging to model heterogeneous social activities. Traditional methods often define universal metrics of social influence for all users, but even for the same activity of a user, the influence towards different neighbors might not be the same. It motivates a personalized learning schema. In information theory, if a time-series signal influences another, then the uncertainty in the latter one will be reduced, given the distribution of the former one. Thus, we are motivated to learn social influence based on the timestamps of social activities. Given the timestamps of each user, we use entropy to measure the reduction of uncertainty of his/her neighbors. The learned social influence is then incorporated into a graph based link prediction model to perform joint learning. Through comprehensive experiments, we demonstrate that the proposed framework can perform better than the state-of-the-art methods on different real-world networks.
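
The information-theoretic intuition in the abstract above, that knowing one signal reduces the uncertainty of another, can be sketched with plain entropies. The example assumes activity timestamps have already been binned into aligned binary sequences (1 for active in a time bin) and measures H(target) - H(target | source); the paper's actual estimator may differ, and all names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a discrete sequence."""
    total = len(values)
    counts = Counter(values).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def uncertainty_reduction(source, target):
    """H(target) - H(target | source) for two aligned discrete sequences."""
    joint = entropy(list(zip(source, target)))
    conditional = joint - entropy(source)  # H(Y|X) = H(X,Y) - H(X)
    return entropy(target) - conditional


user = [0, 1, 0, 1]
follower = [0, 1, 0, 1]   # mirrors the user's activity pattern
stranger = [0, 0, 1, 1]   # statistically independent of the user
print(uncertainty_reduction(user, follower))  # 1.0
print(uncertainty_reduction(user, stranger))  # 0.0
```

A larger reduction suggests the neighbor's activity is more predictable from the user's, which is the sense in which a per-pair influence score can be personalized.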

TIST Journal 2018 Journal Article

Understanding and Identifying Rhetorical Questions in Social Media

  • Suhas Ranganath
  • Xia Hu
  • Jiliang Tang
  • Suhang Wang
  • Huan Liu

Social media provides a platform for seeking information from a large user base. Information seeking in social media, however, occurs simultaneously with users expressing their viewpoints by making statements. Rhetorical questions have the form of a question but serve the function of a statement and are an important tool employed by users to express their viewpoints. Therefore, rhetorical questions might mislead platforms assisting information seeking in social media. It becomes difficult to identify rhetorical questions as they are not syntactically different from other questions. In this article, we develop a framework to identify rhetorical questions by modeling some motivations of the users to post them. We focus on two motivations of the users drawing from linguistic theories to implicitly convey a message and to modify the strength of a statement previously made. We develop a quantitative framework from these motivations to identify rhetorical questions in social media. We evaluate the framework using two datasets of questions posted on a social media platform Twitter and demonstrate its effectiveness in identifying rhetorical questions. This is the first framework, to the best of our knowledge, to model the possible motivations for posting rhetorical questions to identify them on social media platforms.

IJCAI Conference 2017 Conference Paper

Accelerated Local Anomaly Detection via Resolving Attributed Networks

  • Ninghao Liu
  • Xiao Huang
  • Xia Hu

Attributed networks, in which network connectivity and node attributes are available, have been increasingly used to model real-world information systems, such as social media and e-commerce platforms. While outlier detection has been extensively studied to identify anomalies that deviate from certain chosen background, existing algorithms cannot be directly applied on attributed networks due to the heterogeneous types of information and the scale of real-world data. Meanwhile, it has been observed that local anomalies, which may align with global condition, are hard to be detected by existing algorithms with interpretability. Motivated by the observations, in this paper, we propose to study the problem of effective and efficient local anomaly detection in attributed networks. In particular, we design a collective way for modeling heterogeneous network and attribute information, and develop a novel and efficient distributed optimization algorithm to handle large-scale data. In the experiments, we compare the proposed framework with the state-of-the-art methods on both real and synthetic datasets, and demonstrate its effectiveness and efficiency through quantitative evaluation and case studies.

IJCAI Conference 2017 Conference Paper

Radar: Residual Analysis for Anomaly Detection in Attributed Networks

  • Jundong Li
  • Harsh Dani
  • Xia Hu
  • Huan Liu

Attributed networks are pervasive in different domains, ranging from social networks, gene regulatory networks to financial transaction networks. This kind of rich network representation presents challenges for anomaly detection due to the heterogeneity of two data representations. A vast majority of existing algorithms assume certain properties of anomalies are given a priori. Since various types of anomalies in real-world attributed networks co-exist, the assumption that prior knowledge regarding anomalies is available does not hold. In this paper, we investigate the problem of anomaly detection in attributed networks generally from a residual analysis perspective, which has been shown to be effective in traditional anomaly detection problems. However, it is a non-trivial task in attributed networks as interactions among instances complicate the residual modeling process. Methodologically, we propose a learning framework to characterize the residuals of attribute information and its coherence with network information for anomaly detection. By learning and analyzing the residuals, we detect anomalies whose behaviors are singularly different from the majority. Experiments on real datasets show the effectiveness and generality of the proposed framework.

AAAI Conference 2016 Conference Paper

Recommendation with Social Dimensions

  • Jiliang Tang
  • Suhang Wang
  • Xia Hu
  • Dawei Yin
  • Yingzhou Bi
  • Yi Chang
  • Huan Liu

The pervasive presence of social media greatly enriches online users’ social activities, resulting in abundant social relations. Social relations provide an independent source for recommendation, bringing about new opportunities for recommender systems. Exploiting social relations to improve recommendation performance attracts a great amount of attention in recent years. Most existing social recommender systems treat social relations homogeneously and make use of direct connections (or strong dependency connections). However, connections in online social networks are intrinsically heterogeneous and are a composite of various relations. While connected users in online social networks form groups, and users in a group share similar interests, weak dependency connections are established among these users when they are not directly connected. In this paper, we investigate how to exploit the heterogeneity of social relations and weak dependency connections for recommendation. In particular, we employ social dimensions to simultaneously capture heterogeneity of social relations and weak dependency connections, and provide principled ways to model social dimensions, and propose a recommendation framework SoDimRec which incorporates heterogeneity of social relations and weak dependency connections based on social dimensions. Experimental results on real-world data sets demonstrate the effectiveness of the proposed framework. We conduct further experiments to understand the important role of social dimensions in the proposed framework.

AAAI Conference 2015 Conference Paper

Burst Time Prediction in Cascades

  • Senzhang Wang
  • Zhao Yan
  • Xia Hu
  • Philip S. Yu
  • Zhoujun Li

Studying the bursty nature of cascades in social media is practically important in many applications such as product sales prediction, disaster relief, and stock market prediction. Although the cascade volume prediction has been extensively studied, how to predict when a burst will come remains an open problem. It is challenging to predict the time of the burst due to the “quick rise and fall” pattern and the diverse time spans of the cascades. To this end, this paper proposes a classification based approach for burst time prediction by utilizing and modeling rich knowledge in information diffusion. Particularly, we first propose a time window based approach to predict in which time window the burst will appear. This paves the way to transform the time prediction task to a classification problem. To address the challenge that the original time series data of the cascade popularity only are not sufficient for predicting cascades with diverse magnitudes and time spans, we explore rich information diffusion related knowledge and model them in a scale-independent manner. Extensive experiments on a Sina Weibo reposting dataset demonstrate the superior performance of the proposed approach in accurately predicting the burst time of posts.

AAAI Conference 2015 Conference Paper

Content-Aware Point of Interest Recommendation on Location-Based Social Networks

  • Huiji Gao
  • Jiliang Tang
  • Xia Hu
  • Huan Liu

The rapid urban expansion has greatly extended the physical boundary of users’ living area and developed a large number of POIs (points of interest). POI recommendation is a task that facilitates users’ urban exploration and helps them filter uninteresting POIs for decision making. While existing work of POI recommendation on location-based social networks (LBSNs) discovers the spatial, temporal, and social patterns of user check-in behavior, the use of content information has not been systematically studied. The various types of content information available on LBSNs could be related to different aspects of a user’s check-in action, providing a unique opportunity for POI recommendation. In this work, we study the content information on LBSNs w.r.t. POI properties, user interests, and sentiment indications. We model the three types of information under a unified POI recommendation framework with the consideration of their relationship to check-in actions. The experimental results exhibit the significance of content information in explaining user behavior, and demonstrate its power to improve POI recommendation performance on LBSNs.

IJCAI Conference 2015 Conference Paper

Learning Geographical Hierarchy Features for Social Image Location Prediction

  • Xiaoming Zhang
  • Xia Hu
  • Zhoujun Li

Image location prediction is to estimate the geolocation where an image is taken. Social image contains heterogeneous contents, which makes image location prediction nontrivial. Moreover, it is observed that image content patterns and location preferences correlate hierarchically. Traditional image location prediction methods mainly adopt a single-level architecture, which is not directly adaptable to the hierarchical correlation. In this paper, we propose a geographically hierarchical bi-modal deep belief network model (GH-BDBN), which is a compositional learning architecture that integrates multi-modal deep learning model with non-parametric hierarchical prior model. GH-BDBN learns a joint representation capturing the correlations among different types of image content using a bi-modal DBN, with a geographically hierarchical prior over the joint representation to model the hierarchical correlation between image content and location. Experimental results demonstrate the superiority of our model for image location prediction.

AAAI Conference 2014 Conference Paper

Online Social Spammer Detection

  • Xia Hu
  • Jiliang Tang
  • Huan Liu

The explosive use of social media also makes it a popular platform for malicious users, known as social spammers, to overwhelm normal users with unwanted content. One effective way for social spammer detection is to build a classifier based on content and social network information. However, social spammers are sophisticated and adaptable to game the system with fast evolving content and network patterns. First, social spammers continually change their spamming content patterns to avoid being detected. Second, reflexive reciprocity makes it easier for social spammers to establish social influence and pretend to be normal users by quickly accumulating a large number of “human” friends. It is challenging for existing anti-spamming systems based on batch-mode learning to quickly respond to newly emerging patterns for effective social spammer detection. In this paper, we present a general optimization framework to collectively use content and network information for social spammer detection, and provide the solution for efficient online processing. Experimental results on Twitter datasets confirm the effectiveness and efficiency of the proposed framework.

IJCAI Conference 2013 Conference Paper

Exploiting Local and Global Social Context for Recommendation

  • Jiliang Tang
  • Xia Hu
  • Huiji Gao
  • Huan Liu

With the fast development of social media, the information overload problem becomes increasingly severe and recommender systems play an important role in helping online users find relevant information by suggesting information of potential interests. Social activities for online users produce abundant social relations. Social relations provide an independent source for recommendation, presenting both opportunities and challenges for traditional recommender systems. Users are likely to seek suggestions from both their local friends and users with high global reputations, motivating us to exploit social relations from local and global perspectives for online recommender systems in this paper. We develop approaches to capture local and global social relations, and propose a novel framework LOCABAL taking advantage of both local and global social context for recommendation. Empirical results on real-world datasets demonstrate the effectiveness of our proposed framework and further experiments are conducted to understand how local and global social context work for the proposed framework.

AAAI Conference 2013 Conference Paper

From Interest to Function: Location Estimation in Social Media

  • Yan Chen
  • Jichang Zhao
  • Xia Hu
  • Xiaoming Zhang
  • Zhoujun Li
  • Tat-Seng Chua

Recent years have witnessed the tremendous development of social media, which attracts a vast number of Internet users. The high-dimension content generated by these users provides a unique opportunity to understand their behavior deeply. As one of the most fundamental topics, location estimation attracts more and more research efforts. Different from the previous literature, we find that user’s location is strongly related to user interest. Based on this, we first build a detection model to mine user interest from short text. We then establish the mapping between location function and user interest before presenting an efficient framework to predict the user’s location with convincing fidelity. Thorough evaluations and comparisons on an authentic data set show that our proposed model significantly outperforms the state-of-the-art approaches. Moreover, the high efficiency of our model also guarantees its applicability in real-world scenarios.

IJCAI Conference 2013 Conference Paper

Social Spammer Detection in Microblogging

  • Xia Hu
  • Jiliang Tang
  • Yanchao Zhang
  • Huan Liu

The availability of microblogging, like Twitter and Sina Weibo, makes it a popular platform for spammers to unfairly overpower normal users with unwanted content via social networks, known as social spamming. The rise of social spamming can significantly hinder the use of microblogging systems for effective information dissemination and sharing. Distinct features of microblogging systems present new challenges for social spammer detection. First, unlike traditional social networks, microblogging allows users to establish some connections between two parties without mutual consent, which makes it easier for spammers to imitate normal users by quickly accumulating a large number of “human” friends. Second, microblogging messages are short, noisy, and unstructured. Traditional social spammer detection methods are not directly applicable to microblogging. In this paper, we investigate how to collectively use network and content information to perform effective social spammer detection in microblogging. In particular, we present an optimization formulation that models the social network and content information in a unified framework. Experiments on a real-world Twitter dataset demonstrate that our proposed method can effectively utilize both kinds of information for social spammer detection.