Author name cluster

Bin Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2026 Conference Paper

Enhancing Pre-training Data Detection in LLMs Through Discriminative and Symmetric Prefix Selection

Kai Sun
Yuxin Lin
Bo Dong
Jingyao Zhang
Bin Shi

The rapid development of large language models (LLMs) has relied on access to high-quality, large-scale datasets, yet growing concerns around data privacy and security have spurred substantial research into pre-training data detection. While state-of-the-art (SOTA) methods such as RECALL and CON-RECALL leverage auxiliary prefixes to enhance detection performance, their dependence on individual prefixes introduces notable instability across varying prefix conditions. To address this, we first conduct a theoretical analysis to assess the impact of prefixes on existing prefix-based methods. Building on the analysis, we propose a novel prefix selection method to identify optimal prefixes. Specifically, our method derives two key criteria Discriminability and Symmetry. These criteria serve to quantify the effectiveness of prefixes in detecting pre-training data, enabling precise selection of high-performing candidate prefixes. Experiments on the WikiMIA dataset demonstrate that our method consistently improves the performance of RECALL and CON-RECALL, achieving gains of up to 21.1% in AUC scores while significantly enhancing robustness.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Scope Delineation Before Localization: A Two-Stage Framework for Enhancing Failure Attribution in Multi-Agent Systems

Kai Sun
Wenqiang Li
Bo Dong
Yuxin Lin
Jingyao Zhang
Bin Shi

Large language models (LLMs) are seeing growing adoption in multi-agent systems. In these systems, efficient failure attribution is critical for ensuring robustness and interpretability. Current LLM-based attribution methods often face challenges with lengthy logs and lacking expert knowledge. Drawing inspiration from human debugging strategies, we propose an automated failure attribution framework, Scope Delineation Before Localization, which operates in two key stages: (1) identifying the failure scope and (2) pinpointing the failure step. By decoupling failure attribution into the two stages, our approach alleviates the reasoning workload of LLMs, enabling more precise failure attribution. To support scope delineation, we further introduce two strategies: Stepwise Scope Delineation and Expertise-Assisted Scope Delineation. Experiments on the Who&When dataset validate the efficacy of our two-stage framework, demonstrating substantial improvements over prior methods (up to 24.27% on step-level accuracy).

PDF Details DOI

AAAI Conference 2025 Conference Paper

Out-of-Distribution Generalization on Graphs via Progressive Inference

Yiming Xu
Bin Shi
Zhen Peng
Huixiang Liu
Bo Dong
Chen Chen

The development and evaluation of graph neural networks (GNNs) generally follow the independent and identically distributed (i.i.d.) assumption. Yet this assumption is often untenable in practice due to the uncontrollable data generation mechanism. In particular, when the data distribution shows a significant shift, most GNNs would fail to produce reliable predictions and may even make decisions randomly. One of the most promising solutions to improve the model generalization is to pick out causal invariant parts in the input graph. Nonetheless, we observe a significant distribution gap between the causal parts learned by existing methods and the ground-truth, leading to undesirable performance. In response to the above issues, this paper presents GPro, a model that learns graph causal invariance with progressive inference. Specifically, the complicated graph causal invariant learning is decomposed into multiple intermediate inference steps from easy to hard, and the perception of GPro is continuously strengthened through a progressive inference process to extract causal features that are stable to distribution shifts. We also enlarge the training distribution by creating counterfactual samples to enhance the capability of the GPro in capturing the causal invariant parts. Extensive experiments demonstrate that our proposed GPro outperforms the state-of-the-art methods by 4.91% on average. For datasets with more severe distribution shifts, the performance improvement can be up to 6.86%.

PDF Details DOI

ICML Conference 2025 Conference Paper

Quantum Optimization via Gradient-Based Hamiltonian Descent

Jiaqi Leng 0001
Bin Shi

With rapid advancements in machine learning, first-order algorithms have emerged as the backbone of modern optimization techniques, owing to their computational efficiency and low memory requirements. Recently, the connection between accelerated gradient methods and damped heavy-ball motion, particularly within the framework of Hamiltonian dynamics, has inspired the development of innovative quantum algorithms for continuous optimization. One such algorithm, Quantum Hamiltonian Descent (QHD), leverages quantum tunneling to escape saddle points and local minima, facilitating the discovery of global solutions in complex optimization landscapes. However, QHD faces several challenges, including slower convergence rates compared to classical gradient methods and limited robustness in highly non-convex problems due to the non-local nature of quantum states. Furthermore, the original QHD formulation primarily relies on function value information, which limits its effectiveness. Inspired by insights from high-resolution differential equations that have elucidated the acceleration mechanisms in classical methods, we propose an enhancement to QHD by incorporating gradient information, leading to what we call gradient-based QHD. This gradient-based QHD achieves faster convergence and significantly increases the likelihood of identifying global solutions. Numerical simulations on challenging problem instances demonstrate that this gradient-based QHD outperforms existing quantum and classical methods by at least an order of magnitude.

Details

AAAI Conference 2025 Conference Paper

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

Yiming Xu
Zhen Peng
Bin Shi
Xu Hua
Bo Dong
Song Wang
Chen Chen

The superiority of graph contrastive learning (GCL) has prompted its application to anomaly detection tasks for more powerful risk warning systems. Unfortunately, existing GCL-based models tend to excessively prioritize overall detection performance while neglecting robustness to structural imbalance, which can be problematic for many real-world networks following power-law degree distributions. Particularly, GCL-based methods may fail to capture tail anomalies (abnormal nodes with low degrees). This raises concerns about the security and robustness of current anomaly detection algorithms and therefore hinders their applicability in a variety of realistic high-risk scenarios. To the best of our knowledge, research on the robustness of graph anomaly detection to structural imbalance has received little scrutiny. To address the above issues, this paper presents a novel GCL-based framework named AD-GCL. It devises the neighbor pruning strategy to filter noisy edges for head nodes and facilitate the detection of genuine tail nodes by aligning from head nodes to forged tail nodes. Moreover, AD-GCL actively explores potential neighbors to enlarge the receptive field of tail nodes through anomaly-guided neighbor completion. We further introduce intra- and inter-view consistency loss of the original and augmentation graph for enhanced representation. The performance evaluation of the whole, head, and tail nodes on multiple datasets validates the comprehensive superiority of the proposed AD-GCL in detecting both head anomalies and tail anomalies.

PDF Details DOI

ECAI Conference 2025 Conference Paper

Text-Attributed Graph Anomaly Detection via Multi-Scale Cross- and Uni-Modal Contrastive Learning

Yiming Xu 0001
Xu Hua
Zhen Peng 0005
Bin Shi
Jiarun Chen
Xingbo Fu
Song Wang 0013
Bo Dong 0001

The widespread application of graph data in various high-risk scenarios has increased attention to graph anomaly detection (GAD). Faced with real-world graphs that often carry node descriptions in the form of raw text sequences, termed text-attributed graphs (TAGs), existing graph anomaly detection pipelines typically involve shallow embedding techniques to encode such textual information into features, and then rely on complex self-supervised tasks within the graph domain to detect anomalies. However, this text encoding process is separated from the anomaly detection training objective in the graph domain, making it difficult to ensure that the extracted textual features focus on GAD-relevant information, seriously constraining the detection capability. How to seamlessly integrate raw text and graph topology to unleash the vast potential of cross-modal data in TAGs for anomaly detection poses a challenging issue. This paper presents a novel end-to-end paradigm for text-attributed graph anomaly detection, named CMUCL. We simultaneously model data from both text and graph structures, and jointly train text and graph encoders by leveraging cross-modal and uni-modal multi-scale consistency to uncover potential anomaly-related information. Accordingly, we design an anomaly score estimator based on inconsistency mining to derive node-specific anomaly scores. Considering the lack of benchmark datasets tailored for anomaly detection on TAGs, we release 8 datasets to facilitate future research. Extensive evaluations show that CMUCL significantly advances in text-attributed graph anomaly detection, delivering an 11. 13% increase in average accuracy (AP) over the suboptimal.

Details

AAAI Conference 2025 Conference Paper

VERO: Verification and Zero-Shot Feedback Acquisition for Few-Shot Multimodal Aspect-Level Sentiment Classification

Kai Sun
Hao Wu
Bin Shi
Samuel Mensah
Peng Liu
Bo Dong

Deep learning approaches for multimodal aspect-level sentiment classification (MALSC) often require extensive data, which is costly and time-consuming to obtain. To mitigate this, current methods typically fine-tune small-scale pretrained models like BERT and BART with few-shot examples. While these models have shown success, Large Vision-Language Models (LVLMs) offer significant advantages due to their greater capacity and ability to understand nuanced language in both zero-shot and few-shot settings. However, there is limited work on fine-tuning LVLMs for MALSC. A major challenge lies in selecting few-shot examples that effectively capture the underlying patterns in data for these LVLMs. To bridge this research gap, we propose an acquisition function designed to select challenging samples for the few-shot learning of LVLMs for MALSC. We compare our approach, Verification and ZERO-shot feedback acquisition (VERO), with diverse acquisition functions for few-shot learning in MALSC. Our experiments show that VERO outperforms prior methods, achieving an F1 score improvement of up to 6.07% on MALSC benchmark datasets.

PDF Details DOI

JMLR Journal 2024 Journal Article

On the Hyperparameters in Stochastic Gradient Descent with Momentum

Bin Shi

Following the same routine as Shi et al. (2023), we continue to present the theoretical analysis for stochastic gradient descent with momentum (SGD with momentum) in this paper. Differently, for SGD with momentum, we demonstrate that the two hyperparameters together, the learning rate and the momentum coefficient, play a significant role in the linear convergence rate in non-convex optimizations. Our analysis is based on using a hyperparameters-dependent stochastic differential equation (hp-dependent SDE) that serves as a continuous surrogate for SGD with momentum. Similarly, we establish the linear convergence for the continuous-time formulation of SGD with momentum and obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Kramers-Fokker-Planck operator. By comparison, we demonstrate how the optimal linear rate of convergence and the final gap for SGD only about the learning rate varies with the momentum coefficient increasing from zero to one when the momentum is introduced. Then, we propose a mathematical interpretation of why, in practice, SGD with momentum converges faster and is more robust in the learning rate than standard stochastic gradient descent (SGD). Finally, we show the Nesterov momentum under the presence of noise has no essential difference from the traditional momentum. [abs] [ pdf ][ bib ] &copy JMLR 2024. ( edit, beta )

PDF Details

AAAI Conference 2024 Conference Paper

RR-PU: A Synergistic Two-Stage Positive and Unlabeled Learning Framework for Robust Tax Evasion Detection

Shuzhi Cao
Jianfei Ruan
Bo Dong
Bin Shi
Qinghua Zheng

Tax evasion, an unlawful practice in which taxpayers deliberately conceal information to avoid paying tax liabilities, poses significant challenges for tax authorities. Effective tax evasion detection is critical for assisting tax authorities in mitigating tax revenue loss. Recently, machine-learning-based methods, particularly those employing positive and unlabeled (PU) learning, have been adopted for tax evasion detection, achieving notable success. However, these methods exhibit two major practical limitations. First, their success heavily relies on the strong assumption that the label frequency (the fraction of identified taxpayers among tax evaders) is known in advance. Second, although some methods attempt to estimate label frequency using approaches like Mixture Proportion Estimation (MPE) without making any assumptions, they subsequently construct a classifier based on the error-prone label frequency obtained from the previous estimation. This two-stage approach may not be optimal, as it neglects error accumulation in classifier training resulting from the estimation bias in the first stage. To address these limitations, we propose a novel PU learning-based tax evasion detection framework called RR-PU, which can revise the bias in a two-stage synergistic manner. Specifically, RR-PU refines the label frequency initialization by leveraging a regrouping technique to fortify the MPE perspective. Subsequently, we integrate a trainable slack variable to fine-tune the initial label frequency, concurrently optimizing this variable and the classifier to eliminate latent bias in the initial stage. Experimental results on three real-world tax datasets demonstrate that RR-PU outperforms state-of-the-art methods in tax evasion detection tasks.

PDF Details DOI

AAAI Conference 2024 Conference Paper

The Evidence Contraction Issue in Deep Evidential Regression: Discussion and Solution

Yuefei Wu
Bin Shi
Bo Dong
Qinghua Zheng
Hua Wei

Deep Evidential Regression (DER) places a prior on the original Gaussian likelihood and treats learning as an evidence acquisition process to quantify uncertainty. For the validity of the evidence theory, DER requires specialized activation functions to ensure that the prior parameters remain non-negative. However, such constraints will trigger evidence contraction, causing sub-optimal performance. In this paper, we analyse DER theoretically, revealing the intrinsic limitations for sub-optimal performance: the non-negativity constraints on the Normal Inverse-Gamma (NIG) prior parameter trigger the evidence contraction under the specialized activation function, which hinders the optimization of DER performance. On this basis, we design a Non-saturating Uncertainty Regularization term, which effectively ensures that the performance is further optimized in the right direction. Experiments on real-world datasets show that our proposed approach improves the performance of DER while maintaining the ability to quantify uncertainty.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

NerCo: A Contrastive Learning Based Two-Stage Chinese NER Method

Zai Zhang
Bin Shi
Haokun Zhang
Huang Xu
Yaodong Zhang
Yuefei Wu
Bo Dong
Qinghua Zheng

Sequence labeling serves as the most commonly used scheme for Chinese named entity recognition(NER). However, traditional sequence labeling methods classify tokens within an entity into different classes according to their positions. As a result, different tokens in the same entity may be learned with representations that are isolated and unrelated in target representation space, which could finally negatively affect the subsequent performance of token classification. In this paper, we point out and define this problem as Entity Representation Segmentation in Label-semantics. And then we present NerCo: Named entity recognition with Contrastive learning, a novel NER framework which can better exploit labeled data and avoid the above problem. Following the pretrain-finetune paradigm, NerCo firstly guides the encoder to learn powerful label-semantics based representations by gathering the encoded token representations of the same Semantic Class while pushing apart that of different. Subsequently, NerCo finetunes the learned encoder for final entity prediction. Extensive experiments on several datasets demonstrate that our framework can consistently improve the baseline and achieve state-of-the-art performance.

PDF Details DOI

JMLR Journal 2023 Journal Article

On Learning Rates and Schrödinger Operators

Bin Shi
Weijie Su
Michael I. Jordan

Understanding the iterative behavior of stochastic optimization algorithms for minimizing nonconvex functions remains a crucial challenge in demystifying deep learning. In particular, it is not yet understood why certain simple techniques are remarkably effective for tuning the learning rate in stochastic gradient descent (SGD), arguably the most basic optimizer for training deep neural networks. This class of techniques includes learning rate decay, which begins with a large initial learning rate and is gradually reduced. In this paper, we present a general theoretical analysis of the effect of the learning rate in SGD. Our analysis is based on the use of a learning-rate-dependent stochastic differential equation (LR-dependent SDE) as a tool that allows us to set SGD distinctively apart from both gradient descent and stochastic gradient Langevin dynamics (SGLD). In contrast to prior research, our analysis builds on the analysis of a partial differential equation that models the evolution of probability densities, drawing insights from Wainwright and Jordan (2006); Jordan (2018). From this perspective, we derive the linear convergence rate of the probability densities, highlighting its dependence on the learning rate. Moreover, we obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Witten-Laplacian, a special case of the Schrödinger operator associated with the LR-dependent SDE. This expression clearly reveals the dependence of the linear convergence rate on the learning rate—the linear rate decreases rapidly to zero as the learning rate tends to zero for a broad class of nonconvex functions, whereas it stays constant for strongly convex functions. Based on this sharp distinction between nonconvex and convex problems, we provide a mathematical interpretation of the benefits of using learning rate decay for nonconvex optimization. [abs] [ pdf ][ bib ] &copy JMLR 2023. ( edit, beta )

PDF Details

IJCAI Conference 2023 Conference Paper

Reinforcement Learning Approaches for Traffic Signal Control under Missing Data

Hao Mei
Junxian Li
Bin Shi
Hua Wei

The emergence of reinforcement learning (RL) methods in traffic signal control (TSC) tasks has achieved promising results. Most RL approaches require the observation of the environment for the agent to decide which action is optimal for a long-term reward. However, in real-world urban scenarios, missing observation of traffic states may frequently occur due to the lack of sensors, which makes existing RL methods inapplicable on road networks with missing observation. In this work, we aim to control the traffic signals in a real-world setting, where some of the intersections in the road network are not installed with sensors and thus with no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the TSC problem in this real-world setting. Specifically, we propose two solutions: 1) imputes the traffic states to enable adaptive control. 2) imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we reveal that our method outperforms conventional approaches and performs consistently with different missing rates. We also investigate how missing data influences the performance of our model.

PDF Details DOI

NeurIPS Conference 2019 Conference Paper

Acceleration via Symplectic Discretization of High-Resolution Differential Equations

Bin Shi
Simon Du
Weijie Su
Michael Jordan

We study first-order optimization algorithms obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov’s accelerated gradient methods (NAGs) and Polyak’s heavy-ball method. We consider three discretization schemes: symplectic Euler (S), explicit Euler (E) and implicit Euler (I) schemes. We show that the optimization algorithm generated by applying the symplectic scheme to a high-resolution ODE proposed by Shi et al. [2018] achieves the accelerated rate for minimizing both strongly convex function and convex function. On the other hand, the resulting algorithm either fails to achieve acceleration or is impractical when the scheme is implicit, the ODE is low-resolution, or the scheme is explicit.

PDF Details