
Author name cluster

Hui Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

ICLR Conference 2025 Conference Paper

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

  • Jinghan Li
  • Yuan Gao
  • Jinda Lu
  • Junfeng Fang
  • Congcong Wen
  • Hui Lin
  • Xiang Wang 0010

Graph Anomaly Detection (GAD) is crucial for identifying abnormal entities within networks, garnering significant attention across various fields. Traditional unsupervised methods, which decode encoded latent representations of unlabeled data with a reconstruction focus, often fail to capture critical discriminative content, leading to suboptimal anomaly detection. To address these challenges, we present a Diffusion-based Graph Anomaly Detector (DiffGAD). At the heart of DiffGAD is a novel latent space learning paradigm, meticulously designed to enhance the model's proficiency by guiding it with discriminative content. This innovative approach leverages diffusion sampling to infuse the latent space with discriminative content and introduces a content-preservation mechanism that retains valuable information across different scales, significantly improving the model’s adeptness at identifying anomalies with limited time and space complexity. Our comprehensive evaluation of DiffGAD, conducted on six real-world and large-scale datasets with various metrics, demonstrated its exceptional performance. Our code is available at https://github.com/fortunato-all/DiffGAD
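
For readers unfamiliar with diffusion-based detectors, the following minimal sketch illustrates the general recipe the abstract builds on: corrupt an encoded latent, denoise it with a small network, and use the reconstruction error as an anomaly score. The network, noise schedule, and scoring rule here are illustrative assumptions and not the released DiffGAD code; in practice the denoiser would first be trained with the usual noise-prediction objective.

```python
# Hedged sketch: score anomalies by how well a latent denoiser reconstructs a
# node embedding after DDPM-style corruption. All names are illustrative.
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, z_noisy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Predict the injected noise, conditioned on the diffusion step t.
        return self.net(torch.cat([z_noisy, t[:, None]], dim=-1))

def anomaly_scores(z: torch.Tensor, denoiser: LatentDenoiser,
                   t_frac: float = 0.5) -> torch.Tensor:
    """Higher score = the latent is harder to denoise = more anomalous."""
    a, b = (1.0 - t_frac) ** 0.5, t_frac ** 0.5       # signal / noise scales
    t = torch.full((z.shape[0],), t_frac)
    z_noisy = a * z + b * torch.randn_like(z)         # corrupt the latent
    z_hat = (z_noisy - b * denoiser(z_noisy, t)) / a  # one-step reconstruction
    return (z_hat - z).pow(2).mean(dim=-1)

# Stand-in for node embeddings produced by a graph encoder.
scores = anomaly_scores(torch.randn(100, 32), LatentDenoiser(32))
```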

NeurIPS Conference 2025 Conference Paper

From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

  • Long Ma
  • Zhiyuan Yan
  • Jin Xu
  • Yize Chen
  • Qinglang Guo
  • Zhen Bi
  • Yong Liao
  • Hui Lin

Detecting deepfakes has become an increasingly important topic, especially given the rapid development of AI generation techniques. In this paper, we ask: How can we build a universal detection framework that is effective for most facial deepfakes? One significant challenge is the wide variety of deepfake generators available, resulting in varying forgery artifacts (e.g., lighting inconsistency, color mismatch, etc.). But should we "teach" the detector to learn all these artifacts separately? It is impossible and impractical to elaborate on them all. The core idea, therefore, is to pinpoint the more common and general artifacts shared across different deepfakes. Accordingly, we categorize deepfake artifacts into two distinct yet complementary types: Face Inconsistency Artifacts (FIA) and Up-Sampling Artifacts (USA). FIA arise from the challenge of generating all intricate details, inevitably causing inconsistencies between the complex facial features and the relatively uniform surrounding areas. USA, on the other hand, are the inevitable traces left by the generator's decoder during the up-sampling process. This categorization stems from the observation that existing deepfakes typically exhibit one or both of these artifacts. Building on it, we propose a new data-level pseudo-fake creation framework that constructs fake samples containing only the FIA and USA, without introducing extra, less general artifacts. Specifically, we employ super-resolution to simulate the USA, while using image-level self-blending on diverse facial regions to create the FIA. Surprisingly, we found that with this intuitive design, a standard image classifier trained only on our pseudo-fake data generalizes non-trivially well to previously unseen deepfakes.
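
A minimal sketch of a pseudo-fake of the kind described: re-sample the image to leave up-sampling traces (USA) and blend the result back in only inside a facial mask to create a local inconsistency (FIA). The scale factor, bilinear resampling, and box mask below are assumptions, not the paper's recipe.

```python
# Illustrative pseudo-fake construction: resample to leave up-sampling traces,
# then splice the resampled pixels back only inside a soft facial mask.
import torch
import torch.nn.functional as F

def make_pseudo_fake(img: torch.Tensor, mask: torch.Tensor,
                     scale: float = 0.5) -> torch.Tensor:
    """img: (1, 3, H, W) in [0, 1]; mask: (1, 1, H, W) soft facial-region mask."""
    h, w = img.shape[-2:]
    # Up-sampling artifacts: shrink, then enlarge back to the original size.
    small = F.interpolate(img, scale_factor=scale, mode="bilinear",
                          align_corners=False)
    resampled = F.interpolate(small, size=(h, w), mode="bilinear",
                              align_corners=False)
    # Face inconsistency: keep the resampled pixels only inside the mask.
    return mask * resampled + (1.0 - mask) * img

img = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 64:192, 64:192] = 1.0            # placeholder "facial region"
fake = make_pseudo_fake(img, mask)
```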

ICML Conference 2025 Conference Paper

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

  • Tianwei Lin 0001
  • Wenqiao Zhang
  • Sijing Li
  • Yuqian Yuan
  • Binhe Yu
  • Haoyuan Li 0002
  • Wanggui He
  • Hao Jiang 0014

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained Large Language Models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception (HVP) approach and a three-stage learning strategy (TLS). To effectively train HealthGPT, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate the exceptional performance and scalability of HealthGPT in unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.
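
H-LoRA is described here only at a high level, so the sketch below shows the standard low-rank adaptation it extends: a frozen linear layer plus a trainable low-rank residual. The heterogeneous routing between comprehension and generation adapter groups is not reproduced, and the rank and scaling values are assumptions.

```python
# Minimal sketch of the standard LoRA update that H-LoRA builds on.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # keep the pre-trained weight frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus trainable low-rank residual B @ A.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))
```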

NeurIPS Conference 2025 Conference Paper

Polyline Path Masked Attention for Vision Transformer

  • Zhongchen Zhao
  • Chaodong Xiao
  • Hui Lin
  • Qi Xie
  • Lei Zhang
  • Deyu Meng

Global dependency modeling and spatial position modeling are two core issues of foundational architecture design in current deep learning frameworks. Recently, Vision Transformers (ViTs) have achieved remarkable success in computer vision, leveraging the powerful global dependency modeling capability of the self-attention mechanism. Furthermore, Mamba2 has demonstrated significant potential in natural language processing tasks by explicitly modeling the spatial adjacency prior through its structured mask. In this paper, we propose Polyline Path Masked Attention (PPMA), which integrates the self-attention mechanism of ViTs with an enhanced structured mask of Mamba2, harnessing the complementary strengths of both architectures. Specifically, we first ameliorate the traditional structured mask of Mamba2 by introducing a 2D polyline path scanning strategy and derive its corresponding structured mask, the polyline path mask, which better preserves adjacency relationships among image tokens. Notably, we conduct a thorough theoretical analysis of the structural characteristics of the proposed polyline path mask and design an efficient algorithm for its computation. Next, we embed the polyline path mask into the self-attention mechanism of ViTs, enabling explicit modeling of the spatial adjacency prior. Extensive experiments on standard benchmarks, including image classification, object detection, and segmentation, demonstrate that our model outperforms previous state-of-the-art approaches based on both state-space models and Transformers. For example, our PPMA-T/S/B models achieve 48.7%/51.1%/52.3% mIoU on the ADE20K semantic segmentation task, surpassing RMT-T/S/B by 0.7%/1.3%/0.3%, respectively. Code is available at https://github.com/zhongchenzhao/PPMA.
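
As a rough illustration of attention modulated by a structured, adjacency-aware mask, the sketch below multiplies the attention weights by a distance-decay mask (added in log-space before the softmax). A simple Manhattan-distance decay stands in for the polyline path mask; the paper's actual mask construction and its efficient algorithm are not reproduced.

```python
# Hedged sketch of structured-mask attention with a distance-decay stand-in.
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, grid_hw, gamma=0.9):
    """q, k, v: (N, d) tokens laid out on a grid_hw = (H, W) grid."""
    H, W = grid_hw
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()   # (N, 2)
    dist = torch.cdist(pos, pos, p=1)                 # Manhattan distance
    log_mask = dist * torch.log(torch.tensor(gamma))  # mask M = gamma ** dist
    logits = q @ k.T / q.shape[-1] ** 0.5 + log_mask  # multiplicative mask in log-space
    return F.softmax(logits, dim=-1) @ v

tokens = torch.randn(64, 32)
out = masked_attention(tokens, tokens, tokens, grid_hw=(8, 8))
```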

AAAI Conference 2025 Conference Paper

Structural Entropy Guided Probabilistic Coding

  • Xiang Huang
  • Hao Peng
  • Li Sun
  • Hui Lin
  • Chunyang Liu
  • Jiang Cao
  • Philip S. Yu

Probabilistic embeddings have several advantages over deterministic embeddings, as they map each data point to a distribution, which better describes the uncertainty and complexity of data. Many works focus on adjusting the distribution constraint under the Information Bottleneck (IB) principle to enhance representation learning. However, these proposed regularization terms only consider the constraint on each latent variable, omitting the structural information between latent variables. In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. Specifically, we incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. In addition, as traditional structural information theory is not well suited to regression tasks, we propose a probabilistic encoding tree that converts regression tasks into classification tasks while diminishing the influence of the transformation. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise.
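
The sketch below shows only the generic IB-style probabilistic coding that SEPC starts from: a Gaussian encoder, reparameterized sampling, and a KL regularizer toward a standard-normal prior. The structural entropy regularization loss and the probabilistic encoding tree themselves are not reproduced here.

```python
# Hedged sketch of IB-style probabilistic coding (the baseline SEPC extends).
import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample
        # KL(N(mu, sigma^2) || N(0, I)), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return z, kl

enc = ProbabilisticEncoder(768, 64)
z, kl_loss = enc(torch.randn(16, 768))   # kl_loss is added to the task loss
```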

AAAI Conference 2025 Conference Paper

Towards Robust Visual Question Answering via Prompt-Driven Geometric Harmonization

  • Yishu Liu
  • Jiawei Zhu
  • Congcong Wen
  • Guangming Lu
  • Hui Lin
  • Bingzhi Chen

Visual Question Answering (VQA) has garnered significant attention as a crucial link between vision and language, aimed at generating accurate responses to visual queries. However, current VQA models still struggle with the challenges of minority class collapse and spurious semantic correlations posed by language bias and imbalanced distributions. To address these challenges, this paper proposes a novel Prompt-Driven Geometric Harmonization (PDGH) paradigm, which integrates both geometric structure and information entropy principles to enhance the ability of VQA models to generalize effectively across diverse scenarios. Specifically, our PDGH approach is meticulously designed to generate image-generated prompts that are guided by specific question cues, facilitating a more accurate and context-aware understanding of the visual content. Moreover, we project the prompt-visual-question and visual-question joint representations into a unified hypersphere space, applying feature weight self-orthogonality and prompt-information entropy correction constraints to optimize the margin, further alleviating minority class collapse and correcting language bias. To maintain the geometric integrity of the representation space, we introduce multi-space geometric contrast constraints to minimize the impact of spurious priors introduced during training. Finally, a semantic matrix is constructed for the coordinated joint representation to ensure that the learned instances are semantically consistent and improve reasoning ability. Extensive experiments on various general and medical VQA datasets demonstrate the consistent superiority of our PDGH approach over existing state-of-the-art baselines.
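
Two ingredients named in the abstract can be sketched generically: projecting the joint representation onto a unit hypersphere and penalizing non-orthogonal classifier weight vectors (feature-weight self-orthogonality). The loss weighting, prompt generation, entropy correction, and geometric contrast terms are omitted, and the exact constraints used by PDGH may differ from this illustration.

```python
# Hedged sketch: hypersphere projection plus a weight self-orthogonality penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_orthogonality_penalty(W: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between normalized class-weight vectors (rows of W)."""
    Wn = F.normalize(W, dim=1)
    gram = Wn @ Wn.T
    return (gram - torch.eye(W.shape[0])).pow(2).sum()

joint = torch.randn(16, 128)                 # stand-in joint representation
z = F.normalize(joint, dim=-1)               # projection onto the unit hypersphere
classifier = nn.Linear(128, 10)
logits = classifier(z)
penalty = self_orthogonality_penalty(classifier.weight)
```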

NeurIPS Conference 2024 Conference Paper

Customizing Language Models with Instance-wise LoRA for Sequential Recommendation

  • Xiaoyu Kong
  • Jiancan Wu
  • An Zhang
  • Leheng Sheng
  • Hui Lin
  • Xiang Wang
  • Xiangnan He

Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have been eager to apply LLMs to sequential recommendation. A common paradigm is to convert user behavior sequences into instruction data and fine-tune the LLM with parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA). However, the uniform application of LoRA across diverse user behaviors is insufficient to capture individual variability, resulting in negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA). We treat the sequential recommendation task as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework. This approach encourages different experts to capture various aspects of user behavior. Additionally, we introduce a sequence-representation-guided gate function that generates customized expert participation weights for each user sequence, allowing dynamic parameter adjustment for instance-wise recommendations. In sequential recommendation, iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in mitigating negative transfer and improving recommendation accuracy. Our data and code are available at https://github.com/AkaliKong/iLoRA.
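
A minimal sketch of the instance-wise idea: several LoRA experts share a frozen base layer, and a gate turns a sequence representation into per-instance mixture weights over the experts' low-rank updates. The dimensions, gate design, and expert count are assumptions, not the released iLoRA code.

```python
# Hedged sketch of gated mixture-of-LoRA-experts on a frozen linear layer.
import torch
import torch.nn as nn

class MoELoRA(nn.Module):
    def __init__(self, base: nn.Linear, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        self.A = nn.Parameter(torch.randn(n_experts, rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, base.out_features, rank))
        self.gate = nn.Linear(base.in_features, n_experts)

    def forward(self, x: torch.Tensor, seq_repr: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(seq_repr), dim=-1)          # (B, E) per-instance weights
        delta = torch.einsum("erd,bd->ber", self.A, x)          # low-rank down-projection
        delta = torch.einsum("eor,ber->beo", self.B, delta)     # low-rank up-projection
        return self.base(x) + torch.einsum("be,beo->bo", w, delta)

layer = MoELoRA(nn.Linear(512, 512))
x = torch.randn(8, 512)
out = layer(x, seq_repr=x)   # here the token itself stands in for a sequence summary
```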

AAAI Conference 2024 Conference Paper

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

  • Hui Lin
  • Zhiheng Ma
  • Xiaopeng Hong
  • Qinnan Shangguan
  • Deyu Meng

Transformers have been popular in recent crowd counting work since they overcome the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in Transformers tends to find a homogenized solution in which the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer, a graph-modulated transformer that enhances the network by adjusting the attention and input node features, respectively, on the basis of two different types of graphs. First, an attention graph is proposed to diversify attention maps so that they attend to complementary information. The graph is built upon the dissimilarities between patches, modulating the attention in an anti-similarity fashion. Second, a feature-based centrality encoding is proposed to discover the central positions or importance of nodes. We encode them with a proposed centrality indices scheme to modulate the node features and similarity relationships. Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method. Code is available at https://github.com/LoraLinH/Gramformer.
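
The anti-similarity modulation can be sketched generically as a penalty on the attention logits proportional to patch similarity, so near-duplicate patches are pushed to attend to complementary regions. The cosine-similarity penalty and its weight below are illustrative stand-ins for Gramformer's graph construction.

```python
# Hedged sketch of anti-similarity attention modulation.
import torch
import torch.nn.functional as F

def anti_similarity_attention(q, k, v, feats, beta: float = 1.0):
    """q, k, v, feats: (N, d) patch tokens; feats drive the similarity penalty."""
    fn = F.normalize(feats, dim=-1)
    sim = fn @ fn.T                                      # cosine similarity between patches
    logits = q @ k.T / q.shape[-1] ** 0.5 - beta * sim   # penalize attending to look-alikes
    return F.softmax(logits, dim=-1) @ v

tokens = torch.randn(49, 64)
out = anti_similarity_attention(tokens, tokens, tokens, tokens)
```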

NeurIPS Conference 2024 Conference Paper

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning

  • Tieyuan Chen
  • Huabin Liu
  • Tianyao He
  • Yihang Chen
  • Chaofan Gan
  • Xiao Ma
  • Cheng Zhong
  • Yang Zhang

Video causal reasoning aims to achieve a high-level understanding of video content from a causal perspective. However, current video reasoning tasks are limited in scope, primarily executed in a question-answering paradigm and focusing on short videos containing only a single event and simple causal relationships, lacking comprehensive and structured causality analysis for videos with multiple events. To fill this gap, we introduce a new task and dataset, Multi-Event Causal Discovery (MECD). It aims to uncover the causal relationships between events distributed chronologically across long videos. Given visual segments and textual descriptions of events, MECD requires identifying the causal associations between these events to derive a comprehensive, structured event-level video causal diagram explaining why and how the final result event occurred. To address MECD, we devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model to perform an Event Granger Test, which estimates causality by comparing the predicted result event when premise events are masked versus unmasked. Furthermore, we integrate causal inference techniques such as front-door adjustment and counterfactual inference to address challenges in MECD like causality confounding and illusory causality. Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and VideoLLaVA by 5.7% and 4.1%, respectively.
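
A toy sketch of the mask-based Granger-style test: a premise event is scored as causal if masking it noticeably degrades prediction of the result event. The predictor below is a deliberately trivial stand-in for the paper's mask-based event prediction model.

```python
# Hedged sketch of an Event-Granger-style test via masked vs. unmasked prediction.
import torch

def predict_result(event_feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Toy predictor: mean of the unmasked premise-event features."""
    keep = mask[:, None] * event_feats
    return keep.sum(dim=0) / mask.sum().clamp(min=1.0)

def granger_causality_scores(event_feats: torch.Tensor,
                             result_feat: torch.Tensor) -> torch.Tensor:
    n = event_feats.shape[0]
    full = torch.ones(n)
    base_err = (predict_result(event_feats, full) - result_feat).pow(2).mean()
    scores = torch.empty(n)
    for i in range(n):
        masked = full.clone()
        masked[i] = 0.0                       # drop premise event i
        err = (predict_result(event_feats, masked) - result_feat).pow(2).mean()
        scores[i] = err - base_err            # larger gap -> stronger cause
    return scores

events = torch.randn(5, 16)                   # premise event features
scores = granger_causality_scores(events, result_feat=events.mean(dim=0))
```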

NeurIPS Conference 2023 Conference Paper

GUST: Combinatorial Generalization by Unsupervised Grouping with Neuronal Coherence

  • Hao Zheng
  • Hui Lin
  • Rong Zhao

Dynamically grouping sensory information into structured entities is essential for understanding a world that is combinatorial in nature. However, this grouping ability, and therefore combinatorial generalization, remains challenging for artificial neural networks. Inspired by the evidence that successful grouping is indicated by neuronal coherence in the human brain, we introduce GUST (Grouping Unsupervisely by Spike Timing network), an iterative network architecture with biological constraints that biases the network towards a dynamical state of neuronal coherence which softly reflects the grouping information in the temporal structure of its spiking activity. We evaluate and analyze the model on synthetic datasets. Interestingly, the segregation ability is learned directly from superimposed stimuli with a succinct unsupervised objective. Two learning stages are present, from coarsely perceiving global features to additionally capturing local features. Furthermore, the learned symbol-like building blocks can be systematically composed to represent novel scenes in a biologically plausible manner.

NeurIPS Conference 2022 Conference Paper

Dance of SNN and ANN: Solving binding problem by combining spike timing and reconstructive attention

  • Hao Zheng
  • Hui Lin
  • Rong Zhao
  • Luping Shi

The binding problem is one of the fundamental challenges that prevent artificial neural networks (ANNs) from achieving a compositional understanding of the world as human perception does, because disentangled and distributed representations of generative factors can interfere and lead to ambiguity when complex data with multiple objects are presented. In this paper, we propose a brain-inspired unsupervised hybrid neural network (HNN) that introduces the temporal binding theory originating from neuroscience into ANNs by integrating spike timing dynamics (via spiking neural networks, SNNs) with reconstructive attention (via ANNs). Spike timing provides an additional dimension for grouping, while reconstructive feedback coordinates the spikes into temporally coherent states. Through the iterative interaction of the ANN and SNN, the model continuously binds multiple objects at alternating synchronous firing times in the SNN coding space. The effectiveness of the model is evaluated on five artificially generated datasets of binary images. Through visualization and analysis, we demonstrate that the binding is explainable, soft, flexible, and hierarchical. Notably, the model is trained on single-object datasets without explicit supervision on grouping, yet it can successfully bind multiple objects on test datasets, showing its compositional generalization capability. Further results show its binding ability in dynamic situations.

IJCAI Conference 2021 Conference Paper

Direct Measure Matching for Crowd Counting

  • Hui Lin
  • Xiaopeng Hong
  • Zhiheng Ma
  • Xing Wei
  • Yunfeng Qiu
  • Yaowei Wang
  • Yihong Gong

Traditional crowd counting approaches usually rely on a Gaussian assumption to generate pseudo density ground truth, which suffers from problems such as inaccurate estimation of the Gaussian kernel sizes. In this paper, we propose a new measure-based counting approach that regresses the predicted density maps directly to the scattered point-annotated ground truth. First, crowd counting is formulated as a measure matching problem. Second, we derive a semi-balanced form of the Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching. Third, we propose a self-supervised mechanism by devising a Sinkhorn scale consistency loss to resist scale changes. Finally, an efficient optimization method is provided to minimize the overall loss function. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++ and NWPU, have validated the proposed method.
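
For orientation, the sketch below computes a plain balanced entropic Sinkhorn cost between a predicted density (weights on pixel locations) and point annotations. The paper's semi-balanced Sinkhorn divergence and scale-consistency loss are refinements that are not reproduced here, and all hyper-parameters are assumptions.

```python
# Hedged sketch of balanced entropic Sinkhorn matching between two weighted measures.
import torch

def sinkhorn_cost(xs, a, ys, b, eps=0.05, iters=100):
    """xs: (n, 2) pixel coords with weights a; ys: (m, 2) annotated points with weights b."""
    C = torch.cdist(xs, ys) ** 2                    # squared Euclidean ground cost
    K = torch.exp(-C / eps)
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(iters):                          # Sinkhorn fixed-point updates
        u = a / (K @ v + 1e-9)
        v = b / (K.T @ u + 1e-9)
    P = u[:, None] * K * v[None, :]                 # entropic transport plan
    return (P * C).sum()

xs = torch.rand(64, 2); a = torch.rand(64); a = a / a.sum()   # predicted density sample
ys = torch.rand(10, 2); b = torch.full((10,), 1.0 / 10)       # point annotations
loss = sinkhorn_cost(xs, a, ys, b)
```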

AAAI Conference 2021 Conference Paper

Learning to Count via Unbalanced Optimal Transport

  • Zhiheng Ma
  • Xing Wei
  • Xiaopeng Hong
  • Hui Lin
  • Yunfeng Qiu
  • Yihong Gong

Counting dense crowds through computer vision technology has attracted widespread attention. Most crowd counting datasets use point annotations. In this paper, we formulate crowd counting as a measure regression problem that minimizes the distance between two measures with different supports and unequal total mass. Specifically, we adopt the unbalanced optimal transport (UOT) distance, which remains stable under spatial perturbations, to quantify the discrepancy between predicted density maps and point annotations. An efficient optimization algorithm based on the regularized semidual formulation of UOT is introduced, which alternately learns the optimal transport plan and optimizes the density regressor. The quantitative and qualitative results illustrate that our method achieves state-of-the-art counting and localization performance.
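
A minimal sketch of entropic unbalanced OT via the standard scaling iterations with KL-relaxed marginals, illustrating how a density map and point annotations with unequal total mass can be compared. The paper's semidual formulation, the density regressor, and all hyper-parameters are not reproduced; the values below are assumptions.

```python
# Hedged sketch of unbalanced entropic OT via KL-relaxed Sinkhorn scaling.
import torch

def uot_cost(xs, a, ys, b, eps=0.05, tau=1.0, iters=200):
    C = torch.cdist(xs, ys) ** 2
    K = torch.exp(-C / eps)
    u, v = torch.ones_like(a), torch.ones_like(b)
    p = tau / (tau + eps)                      # softening exponent from the KL relaxation
    for _ in range(iters):
        u = (a / (K @ v + 1e-9)) ** p
        v = (b / (K.T @ u + 1e-9)) ** p
    P = u[:, None] * K * v[None, :]
    return (P * C).sum()

pred_locs = torch.rand(64, 2); pred_mass = torch.rand(64) * 0.02   # predicted density
gt_locs = torch.rand(9, 2);    gt_mass = torch.ones(9)             # one unit of mass per head
loss = uot_cost(pred_locs, pred_mass, gt_locs, gt_mass)
```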

AAAI Conference 2019 Conference Paper

Abstractive Summarization: A Survey of the State of the Art

  • Hui Lin
  • Vincent Ng

The focus of automatic text summarization research has exhibited a gradual shift from extractive methods to abstractive methods in recent years, owing in part to advances in neural methods. Originally developed for machine translation, neural methods provide a viable framework for obtaining an abstract representation of the meaning of an input text and generating informative, fluent, and human-like summaries. This paper surveys existing approaches to abstractive summarization, focusing on the recently developed neural approaches.

NeurIPS Conference 2011 Conference Paper

On fast approximate submodular minimization

  • Stefanie Jegelka
  • Hui Lin
  • Jeff Bilmes

We are motivated by an application to extract a representative subset of machine learning training data and by the poor empirical performance we observe for the popular minimum-norm algorithm. In fact, for our application, minimum norm can have a running time of about O(n^7) (O(n^5) oracle calls). We therefore propose a fast approximate method to minimize arbitrary submodular functions. For a large sub-class of submodular functions, the algorithm is exact. Other submodular functions are iteratively approximated by tight submodular upper bounds and then repeatedly optimized. We show theoretical properties, and empirical results suggest significant speedups over minimum norm while retaining higher accuracies.
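
The "approximate, then optimize, then repeat" idea can be sketched with a generic iterated modular-upper-bound minimizer (in the style of Iyer and Bilmes); the paper itself uses graph-representable approximations solved by minimum cut, which this toy loop does not implement, and the loop only converges to a possibly local minimizer.

```python
# Hedged sketch: minimize a submodular function by repeatedly minimizing a
# tight modular upper bound. Names and the toy objective are illustrative.
import math
from typing import Callable, FrozenSet, Set

def minimize_by_modular_bounds(f: Callable[[FrozenSet[int]], float],
                               ground: Set[int], max_iters: int = 50) -> FrozenSet[int]:
    S: FrozenSet[int] = frozenset()
    for _ in range(max_iters):
        # Tight modular upper bound at S: w_j = f(j | S \ {j}) for j in S,
        # w_j = f(j | empty set) otherwise; its exact minimizer keeps the
        # elements with negative weight.
        w = {j: (f(S) - f(S - {j})) if j in S
                else (f(frozenset({j})) - f(frozenset()))
             for j in ground}
        new_S = frozenset(j for j in ground if w[j] < 0)
        if new_S == S:                    # converged to a (possibly local) minimizer
            return S
        S = new_S
    return S

# Toy submodular objective: concave cost of cardinality minus a modular reward.
weights = {0: 0.5, 1: 2.5, 2: 3.0, 3: 0.4, 4: 1.2}
def f(S: FrozenSet[int]) -> float:
    return 2.0 * math.sqrt(len(S)) - sum(weights[j] for j in S)

print(minimize_by_modular_bounds(f, ground=set(weights)))   # e.g. frozenset({1, 2})
```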