Arrow Research search

Author name cluster

Beibei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers (14)

AAAI Conference 2026 Conference Paper

Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs

  • Jiayu Hu
  • Beibei Li
  • Jiangwei Xia
  • Yanjun Qin
  • Bing Ji
  • Zhongshi He

While Vision-Language Models (VLMs) have garnered increasing attention in the AI community due to their promising practical applications, they exhibit persistent hallucination issues, generating outputs misaligned with visual inputs. Recent studies attribute these hallucinations to VLMs' over-reliance on linguistic priors and insufficient visual feature integration, and propose heuristic decoding calibration strategies to mitigate them. However, the non-trainable nature of these strategies inherently limits their optimization potential. To this end, we propose ALEAHallu, an adversarial parametric editing framework for hallucination mitigation in VLMs that follows an Activate-Locate-Edit-Adversarially paradigm. Specifically, we first construct an activation dataset comprising grounded responses (positive samples attentively anchored in visual features) and hallucinatory responses (negative samples reflecting LLM prior bias and internal knowledge artifacts). Next, we identify critical hallucination-prone parameter clusters by analyzing the differential hidden states of response pairs. These clusters are then fine-tuned using prompts injected with adversarial prefixes, optimized via prompt tuning to maximize visual neglect, thereby forcing the model to prioritize visual evidence over inherent parametric biases. Evaluations on both generative and discriminative VLM tasks demonstrate that ALEAHallu significantly alleviates hallucinations.
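The locating step described above, ranking hidden units by how differently they activate on grounded versus hallucinatory response pairs, can be sketched generically. This is an illustration on assumed toy data, not the paper's actual procedure; the helper `rank_units` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_units(h_grounded, h_hallucinated, k=5):
    # mean absolute activation difference per hidden unit,
    # over paired responses of shape [pairs, hidden_dim]
    diff = np.abs(h_grounded - h_hallucinated).mean(axis=0)
    return np.argsort(diff)[::-1][:k]

# toy activations: units 3 and 7 differ systematically between the two sets
pairs, dim = 200, 16
h_pos = rng.normal(size=(pairs, dim))   # grounded responses
h_neg = h_pos.copy()
h_neg[:, [3, 7]] += 2.0                 # hallucinatory responses
top = rank_units(h_pos, h_neg, k=2)
print([int(i) for i in sorted(top)])  # → [3, 7]
```

The paper's method goes further, fine-tuning the located clusters adversarially; this sketch only shows the differential-activation ranking idea.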

NeurIPS Conference 2025 Conference Paper

Fairshare Data Pricing via Data Valuation for Large Language Models

  • Luyang Zhang
  • Cathy Jiao
  • Beibei Li
  • Chenyan Xiong

Training data is the backbone of large language models (LLMs), yet today's data markets often operate under exploitative pricing, sourcing data from marginalized groups with little pay or recognition. This paper introduces a theoretical framework for LLM data markets, modeling the strategic interactions between buyers (LLM builders) and sellers (human annotators). We begin with theoretical and empirical analyses showing how exploitative pricing drives high-quality sellers out of the market, degrading data quality and long-term model performance. We then introduce fairshare, a pricing mechanism grounded in data valuation that quantifies each data point's contribution. It aligns incentives by sustaining seller participation and optimizing utility for both buyers and sellers. Theoretically, we show that fairshare yields mutually optimal outcomes: it maximizes long-term buyer utility and seller profit while sustaining market participation. Empirically, when training open-source LLMs on complex NLP tasks, including math problems, medical diagnosis, and physical reasoning, fairshare boosts seller earnings and ensures a stable supply of high-quality data, while improving buyers' performance-per-dollar and long-term welfare. Our findings offer a concrete path toward fair, transparent, and economically sustainable data markets for LLMs. Our code will be open sourced.
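The core idea of pricing by per-datum contribution can be illustrated with a classic Shapley-value computation over a toy utility function. This is a generic data-valuation sketch, not the paper's fairshare mechanism; the seller names and the square-root utility are assumptions for the example.

```python
import itertools
import math

# toy per-seller data quality scores (assumed values, for illustration)
quality = {"s1": 0.9, "s2": 0.6, "s3": 0.1}

def utility(sellers):
    # diminishing-returns "model performance" from a coalition of sellers
    return math.sqrt(sum(quality[s] for s in sellers))

def shapley(target):
    # exact Shapley value of one seller under the toy utility
    others = [s for s in quality if s != target]
    n = len(quality)
    value = 0.0
    for k in range(len(others) + 1):
        for subset in itertools.combinations(others, k):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            value += weight * (utility(subset + (target,)) - utility(subset))
    return value

prices = {s: shapley(s) for s in quality}
print(prices)  # higher-quality sellers receive a larger payment
```

By the efficiency property, the payments sum exactly to the full-coalition utility, which is the alignment property a valuation-based pricing scheme relies on.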

NeurIPS Conference 2025 Conference Paper

FracFace: Breaking the Visual Clues—Fractal-Based Privacy-Preserving Face Recognition

  • Wanying Dai
  • Beibei Li
  • Naipeng Dong
  • Guangdong Bai
  • Jin Song Dong

Face recognition is essential for identity authentication, but the rich visual clues in facial images pose significant privacy risks, highlighting the critical importance of privacy-preserving solutions. For instance, numerous studies have shown that generative models can effectively perform reconstruction attacks that restore the original visual clues. To mitigate this threat, we introduce FracFace, a fractal-based privacy-preserving face recognition framework. This approach weakens the visual clues that can be exploited by reconstruction attacks by disrupting the spatial structure in frequency domain features, while retaining the vital visual clues required for identity recognition. To achieve this, we craft a Frequency Channels Refining module that reduces sparsity in the frequency domain. It suppresses visual clues that could be exploited by reconstruction attacks while preserving features indispensable for recognition, thus making these attacks more challenging. More significantly, we design a Frequency Fractal Mapping module that obfuscates deep representations by remapping refined frequency channels into a fractal-based privacy structure. By leveraging the self-similarity of fractals, this module preserves identity-relevant features while enhancing defense capabilities, thereby improving the overall robustness of the protection scheme. Experiments on multiple public face recognition benchmarks demonstrate that FracFace significantly reduces the visual recoverability of facial features while maintaining high recognition accuracy, and that it outperforms state-of-the-art privacy protection approaches.

AAAI Conference 2025 Conference Paper

Graph Agent Network: Empowering Nodes with Inference Capabilities for Adversarial Resilience

  • Ao Liu
  • Wenshan Li
  • Tao Li
  • Beibei Li
  • Guangquan Xu
  • Pan Zhou
  • Wengang Ma
  • Hanyuan Huang

End-to-end training with global optimization has popularized graph neural networks (GNNs) for node classification, yet it has inadvertently introduced vulnerabilities to adversarial edge-perturbing attacks. Adversaries can exploit the inherently open interfaces of GNNs' inputs and outputs, perturbing critical edges and thus manipulating the classification results. Current defenses, because they persist in using global-optimization-based end-to-end training schemes, inherently inherit the vulnerabilities of GNNs. This is specifically evidenced by their inability to defend against targeted secondary attacks. In this paper, we propose the Graph Agent Network (GAgN) to address these vulnerabilities. GAgN is a graph-structured agent network in which each node is designed as a 1-hop-view agent. Through decentralized interactions, agents learn to infer global perceptions and perform tasks such as inferring embeddings, degrees, and neighbor relationships for given nodes. This empowers nodes to filter adversarial edges while carrying out classification tasks. Furthermore, agents' limited view prevents malicious messages from propagating globally in GAgN, thereby resisting global-optimization-based secondary attacks. We prove that single-hidden-layer multilayer perceptrons (MLPs) are theoretically sufficient to achieve these functionalities. Experimental results show that GAgN effectively implements all its intended capabilities and, compared to state-of-the-art defenses, achieves optimal classification accuracy on the perturbed datasets.

AAAI Conference 2025 Conference Paper

Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending Against Poisoning Attacks

  • Ao Liu
  • Wenshan Li
  • Beibei Li
  • Wengang Ma
  • Tao Li
  • Pan Zhou

Recent studies have revealed the vulnerability of graph neural networks (GNNs) to adversarial poisoning attacks on node classification tasks. Current defensive methods require substituting the original GNN with a defense model, regardless of the original's type. This approach, while targeting adversarial robustness, compromises the enhancements developed in prior research to boost GNNs' practical performance. Here we introduce Grimm, the first plug-and-play defense model. Requiring only a minimal interface for extracting features from any layer of the protected GNN, Grimm can seamlessly rectify perturbations. Specifically, we utilize the feature trajectories (FTs) generated by GNNs as they evolve through epochs to reflect the training status of the networks. We then theoretically prove that the FTs of victim nodes inevitably exhibit discriminable anomalies. Consequently, inspired by the natural parallelism between the biological nervous and immune systems, we construct Grimm as a comprehensive artificial immune system for GNNs. Grimm not only detects abnormal FTs and rectifies adversarial edges during training but also operates efficiently in parallel, mirroring the concurrent functionalities of its biological counterparts. We experimentally confirm that Grimm offers four empirically validated advantages: 1) harmlessness, as it does not actively interfere with GNN training; 2) parallelism, ensuring monitoring, detection, and rectification operate independently of the GNN training process; 3) generalizability, demonstrating compatibility with mainstream GNNs such as GCN, GAT, and GraphSAGE; and 4) transferability, as the detectors for abnormal FTs can be efficiently transferred across different systems for one-step rectification.
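The feature-trajectory idea, flagging nodes whose representations keep drifting across epochs while healthy nodes settle, can be sketched with a simple trajectory-length score. This is a hypothetical illustration of the general concept on toy data, not Grimm's actual detector.

```python
import numpy as np

rng = np.random.default_rng(2)

def trajectory_anomaly_scores(fts):
    # fts: [epochs, nodes, dim] snapshots of node features across training;
    # score each node by total trajectory length, z-scored over nodes
    steps = np.linalg.norm(np.diff(fts, axis=0), axis=2)  # [epochs-1, nodes]
    lengths = steps.sum(axis=0)
    return (lengths - lengths.mean()) / lengths.std()

# toy run: node 0 keeps drifting while the others settle quickly
epochs, nodes, dim = 30, 50, 8
fts = rng.normal(scale=0.01, size=(epochs, nodes, dim)).cumsum(axis=0)
fts[:, 0, :] += np.linspace(0.0, 5.0, epochs)[:, None]  # persistent drift
scores = trajectory_anomaly_scores(fts)
print(int(np.argmax(scores)))  # → 0
```

Because the score only reads feature snapshots, a monitor like this can run alongside training without touching the model, which is the plug-and-play property the abstract emphasizes.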

AAAI Conference 2025 Conference Paper

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

  • Xiang Fang
  • Wanlong Fang
  • Changshuo Wang
  • Daizong Liu
  • Keke Tang
  • Jianfeng Dong
  • Pan Zhou
  • Beibei Li

Given video-query pairs consisting of untrimmed videos and sentence queries, temporal sentence grounding (TSG) aims to locate query-relevant segments in these videos. Although previous TSG methods have achieved remarkable success, they train each video-query pair separately and ignore the relationships between different pairs. To this end, we pose a new setting, Multi-Pair TSG, which aims to co-train these pairs. We propose a novel video-query co-training approach, the Multi-Thread Knowledge Transfer Network, to locate a variety of video-query pairs effectively and efficiently. First, we mine the spatial and temporal semantics across different queries so that they cooperate with each other. To learn intra- and inter-modal representations simultaneously, we design a cross-modal contrast module that explores semantic consistency via a self-supervised strategy. To fully align visual and textual representations between different pairs, we design a prototype alignment strategy that 1) matches object prototypes and phrase prototypes for spatial alignment, and 2) aligns activity prototypes and sentence prototypes for temporal alignment. Finally, we develop an adaptive negative selection module that adaptively generates a threshold for cross-modal matching. Extensive experiments show the effectiveness and efficiency of our proposed method.

TIST Journal 2024 Journal Article

An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes

  • Jiayu Hu
  • Senlin Shu
  • Beibei Li
  • Tao Xiang
  • Zhongshi He

Partial Label Learning (PLL) is a typical weakly supervised learning task, which assumes each training instance is annotated with a set of candidate labels containing the ground-truth label. Recent PLL methods adopt identification-based disambiguation to alleviate the influence of false positive labels and achieve promising performance. However, they require all classes in the test set to have appeared in the training set, ignoring the fact that new classes keep emerging in real applications. To address this issue, in this article we focus on the problem of Partial Label Learning with Augmented Classes (PLLAC), where one or more augmented classes are not visible in the training stage but appear in the inference stage. Specifically, we propose an unbiased risk estimator with theoretical guarantees for PLLAC, which estimates the distribution of augmented classes by differentiating the distribution of known classes from that of unlabeled data, and which can be equipped with arbitrary PLL loss functions. Besides, we provide a theoretical analysis of the estimation error bound of the estimator, which guarantees that the empirical risk minimizer converges to the true risk minimizer as the amount of training data tends to infinity. Furthermore, we add a risk-penalty regularization term to the optimization objective to alleviate over-fitting caused by negative empirical risk. Extensive experiments on benchmark, UCI, and real-world datasets demonstrate the effectiveness of the proposed approach.
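The unbiased-risk construction, rewriting an expectation over an unseen class in terms of known-class and unlabeled data, is easiest to see in the simpler positive-unlabeled (PU) setting. The sketch below is that classic PU estimator with a non-negativity clamp echoing the risk-penalty idea; it is not the paper's PLLAC estimator, and the class prior of 0.5 and the toy scores are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss(z):
    # ell(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

def unbiased_risk(scores_pos, scores_unl, prior):
    # R(f) = pi * E_p[l(f,+1)] + ( E_u[l(f,-1)] - pi * E_p[l(f,-1)] )
    pos_term = prior * logistic_loss(scores_pos).mean()
    neg_term = logistic_loss(-scores_unl).mean() - prior * logistic_loss(-scores_pos).mean()
    # clamp the (possibly negative) unlabeled term, in the spirit of the
    # risk penalty against negative empirical risk
    return pos_term + max(neg_term, 0.0)

scores_pos = rng.normal(1.0, 1.0, 1000)   # classifier scores on known-class data
scores_unl = rng.normal(0.0, 1.5, 1000)   # classifier scores on unlabeled data
risk = unbiased_risk(scores_pos, scores_unl, prior=0.5)
print(round(float(risk), 3))
```

Without the clamp, the middle term can go negative on finite samples, which is exactly the over-fitting failure mode the regularization term in the abstract addresses.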

TIST Journal 2024 Journal Article

Multiple-Instance Learning from Pairwise Comparison Bags

  • Shengjie Zhou
  • Senlin Shu
  • Haobo Wang
  • Hongxin Wei
  • Tao Xiang
  • Beibei Li

Multiple-instance learning (MIL) is a significant weakly supervised learning problem in which the training data consist of bags containing multiple instances, together with bag-level labels. Most previous MIL research has required fully labeled bags. However, collecting such data is challenging due to labeling costs or privacy concerns. Fortunately, we can easily collect pairwise comparison information indicating that one bag is more likely to be positive than the other. We therefore investigate a novel MIL problem: learning a bag-level binary classifier only from pairwise comparison bags. To solve this problem, we describe the data generation process and provide a baseline method that trains an instance-level classifier based on unlabeled-unlabeled learning. To achieve better performance, we propose a convex formulation to train a bag-level classifier and give a generalization error bound. Comprehensive experiments show that both the baseline method and the convex formulation achieve satisfactory performance, with the convex formulation performing better.
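Learning from a comparison that one bag is more likely positive than another can be sketched as a logistic loss on the margin between bag-level scores, here with max pooling over linear instance scores. This is a generic baseline sketch, not the paper's convex formulation; the pooling choice, the linear scorer, and the toy bags are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bag_score(w, bag):
    # bag-level score: max pooling over linear instance scores
    return float(np.max(bag @ w))

def pairwise_bag_loss(w, bag_hi, bag_lo):
    # logistic loss on the score margin, pushing score(bag_hi) > score(bag_lo)
    margin = bag_score(w, bag_hi) - bag_score(w, bag_lo)
    return float(np.logaddexp(0.0, -margin))

d = 5
w = rng.normal(size=d)
bag_lo = rng.normal(size=(8, d))                        # "less likely positive" bag
bag_hi = np.vstack([bag_lo[:4],
                    rng.normal(loc=2.0, size=(4, d))])  # contains shifted instances
loss = pairwise_bag_loss(w, bag_hi, bag_lo)
print(loss)
```

Minimizing this loss over many comparison pairs trains a bag-level ranker without ever observing a bag label, which is the supervision regime the abstract describes.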

AAAI Conference 2024 Conference Paper

Towards Inductive Robustness: Distilling and Fostering Wave-Induced Resonance in Transductive GCNs against Graph Adversarial Attacks

  • Ao Liu
  • Wenshan Li
  • Tao Li
  • Beibei Li
  • Hanyuan Huang
  • Pan Zhou

Graph neural networks (GNNs) have recently been shown to be vulnerable to adversarial attacks, where slight perturbations in the graph structure can lead to erroneous predictions. However, current robust models for defending against such attacks inherit the transductive limitations of graph convolutional networks (GCNs). As a result, they are constrained by fixed structures and do not naturally generalize to unseen nodes. Here, we discover that transductive GCNs inherently possess a distillable robustness, achieved through a wave-induced resonance process. Based on this, we foster this resonance to facilitate inductive and robust learning. Specifically, we first prove that the signal formed by GCN-driven message passing (MP) is equivalent to the edge-based Laplacian wave, where, within a wave system, resonance can naturally emerge between the signal and its transmitting medium. This resonance provides inherent resistance to malicious perturbations inflicted on the signal system. We then prove that merely three MP iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph. Consequently, we present Graph Resonance-fostering Network (GRN) to foster this resonance via learning node representations from their distilled resonating subgraphs. By capturing the edge-transmitted signals within this subgraph and integrating them with the node signal, GRN embeds these combined signals into the central node's representation. This node-wise embedding approach allows for generalization to unseen nodes. We validate our theoretical findings with experiments, and demonstrate that GRN generalizes robustness to unseen nodes, whilst maintaining state-of-the-art classification accuracy on perturbed graphs. Appendices can be found on arXiv version: https://arxiv.org/abs/2312.08651

IJCAI Conference 2022 Conference Paper

A Murder and Protests, the Capitol Riot, and the Chauvin Trial: Estimating Disparate News Media Stance

  • Sujan Dutta
  • Beibei Li
  • Daniel S. Nagin
  • Ashiqur R. KhudaBukhsh

In this paper, we analyze the responses of three major US cable news networks to three seminal policing events in the US spanning a thirteen-month period: the murder of George Floyd by police officer Derek Chauvin, the Capitol riot, and Chauvin's conviction and sentencing. We cast the problem of aggregate stance mining as a natural language inference task and construct an active learning pipeline for robust textual entailment prediction. Via a substantial corpus of 34,710 news transcripts, our analyses reveal that the partisan divide in viewership of these three outlets is reflected in the networks' news coverage of these momentous events. In addition, we release a sentence-level, domain-specific text entailment dataset on policing consisting of 2,276 annotated instances.

TIST Journal 2017 Journal Article

Using Online Geotagged and Crowdsourced Data to Understand Human Offline Behavior in the City

  • Yingjie Zhang
  • Beibei Li
  • Jason Hong

The pervasiveness of mobile technologies today has facilitated the creation of massive online crowdsourced and geotagged data from individual users at different locations in a city. Such ubiquitous user-generated data allow us to study the social and behavioral trajectories of individuals across both digital and physical environments. This information, combined with traditional economic and behavioral indicators in the city (e.g., store purchases, restaurant visits, parking), can help us better understand human behavior and interactions with cities. In this study, we take an economic perspective and focus on understanding human economic behavior in the city by examining the performance of local businesses based on values learned from crowdsourced and geotagged data. Specifically, we extract multiple traffic and human mobility features from publicly available data sources using geomapping and geo-social-tagging techniques, and we examine the effects of both static and dynamic features on the booking volume of local restaurants. Our study is instantiated on a unique dataset of restaurant bookings from OpenTable for 3,187 restaurants in New York City from November 2013 to March 2014. Our results suggest that foot traffic can increase local popularity and business performance, while mobility and traffic from automobiles may hurt local businesses, especially well-established chains and high-end restaurants. We also find that, on average, one or more street closures (caused by events or construction projects) nearby lead to a 4.7% decrease in the probability of a restaurant being fully booked during the dinner peak. Our study demonstrates the potential to make the best use of the large volumes and diverse sources of crowdsourced and geotagged user-generated data to create metrics that predict local economic demand in a manner that is fast, cheap, accurate, and meaningful.