Author name cluster

Hui Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

1 author row

TMLR Journal 2025 Journal Article

Autoregressive Models in Vision: A Survey

Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
Yao Mu
Yuan Yao
Hui Shen

Autoregressive modeling has been a huge success in the field of natural language processing (NLP). Recently, autoregressive models have emerged as a significant area of focus in computer vision, where they excel in producing high-quality visual content. Autoregressive models in NLP typically operate on subword tokens. However, the representation strategy in computer vision can vary across levels, i.e., pixel-level, token-level, or scale-level, reflecting the diverse and hierarchical nature of visual data compared to the sequential structure of language. This survey comprehensively examines the literature on autoregressive models applied to vision. To improve readability for researchers from diverse research backgrounds, we start with preliminary sequence representation and modeling in vision. Next, we divide the fundamental frameworks of visual autoregressive models into three general sub-categories, including pixel-based, token-based, and scale-based models based on the representation strategy. We then explore the interconnections between autoregressive models and other generative models. Furthermore, we present a multifaceted categorization of autoregressive models in computer vision, including image generation, video generation, 3D generation, and multimodal generation. We also elaborate on their applications in diverse domains, including emerging domains such as embodied AI and 3D medical AI, with about 250 related references. Finally, we highlight the current challenges to autoregressive models in vision with suggestions about potential research directions. We have also set up a GitHub repository to organize the papers included in this survey at: https://github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey.

TMLR Journal 2025 Journal Article

Efficient Diffusion Models: A Survey

Hui Shen
Jingxuan Zhang
Boning Xiong
Rui Hu
Shoufa Chen
Zhongwei Wan
Xin Wang
Yu Zhang

Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, and audio, demonstrating their potential to revolutionize digital content creation. However, these capabilities come at the cost of significant computational resources and lengthy generation time, underscoring the critical need to develop efficient techniques for practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient diffusion model topics from the algorithm-level, system-level, and framework perspectives, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at github.com/AIoT-MLSys-Lab/Efficient-Diffusion-Model-Survey. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspire them to contribute to this important and exciting field.

JBHI Journal 2025 Journal Article

EMGANet: Edge-Aware Multi-Scale Group-Mix Attention Network for Breast Cancer Ultrasound Image Segmentation

Jin Huang
Yazhao Mao
Jingwen Deng
Zhaoyi Ye
Yimin Zhang
Jingwen Zhang
Lan Dong
Hui Shen

Breast cancer is one of the most prevalent diseases for women worldwide. Early and accurate ultrasound image segmentation plays a crucial role in reducing mortality. Although deep learning methods have demonstrated remarkable segmentation potential, they still struggle with challenges in ultrasound images, including blurred boundaries and speckle noise. To generate accurate ultrasound image segmentation, this paper proposes the Edge-Aware Multi-Scale Group-Mix Attention Network (EMGANet), which generates accurate segmentation by integrating deep and edge features. The Multi-Scale Group Mix Attention block effectively aggregates both sparse global and local features, ensuring the extraction of valuable information. The subsequent Edge Feature Enhancement block then focuses on cancer boundaries, enhancing the segmentation accuracy. Therefore, EMGANet effectively tackles unclear boundaries and noise in ultrasound images. We conduct experiments on two public datasets (Dataset-B, BUSI) and one private dataset which contains 927 samples from Renmin Hospital of Wuhan University (BUSI-WHU). EMGANet demonstrates superior segmentation performance, achieving an overall accuracy (OA) of 98.56%, a mean IoU (mIoU) of 90.32%, and an ASSD of 6.1 pixels on the BUSI-WHU dataset. Additionally, EMGANet performs well on two public datasets, with a mIoU of 88.2% and an ASSD of 9.2 pixels on Dataset-B, and a mIoU of 81.37% and an ASSD of 18.27 pixels on the BUSI dataset. EMGANet achieves a state-of-the-art segmentation performance of about 2% in mIoU across three datasets. In summary, the proposed EMGANet significantly improves breast cancer segmentation through Edge-Aware and Group-Mix Attention mechanisms, showing great potential for clinical applications.

NeurIPS Conference 2025 Conference Paper

Fully Spiking Neural Networks for Unified Frame-Event Object Tracking

Jingjun Yang
Liangwei Fan
Jinpu Zhang
Xiangkai Lian
Hui Shen
Dewen Hu

The integration of image and event streams offers a promising approach for achieving robust visual object tracking in complex environments. However, current fusion methods achieve high performance at the cost of significant computational overhead and struggle to efficiently extract the sparse, asynchronous information from event streams, failing to leverage the energy-efficient advantages of event-driven spiking paradigms. To address this challenge, we propose the first fully Spiking Frame-Event Tracking framework called SpikeFET. This network achieves synergistic integration of convolutional local feature extraction and Transformer-based global modeling within the spiking paradigm, effectively fusing frame and event data. To overcome the degradation of translation invariance caused by convolutional padding, we introduce a Random Patchwork Module (RPM) that eliminates positional bias through randomized spatial reorganization and learnable type encoding while preserving residual structures. Furthermore, we propose a Spatial-Temporal Regularization (STR) strategy that overcomes similarity metric degradation from asymmetric features by enforcing spatio-temporal consistency among temporal template features in latent space. Extensive experiments across multiple benchmarks demonstrate that the proposed framework achieves superior tracking accuracy over existing methods while significantly reducing power consumption, attaining an optimal balance between performance and efficiency.

NeurIPS Conference 2025 Conference Paper

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
Qinjian Zhao
Hui Shen
Jing Xiong

Multimodal large language models (MLLMs) have shown promising capabilities in reasoning tasks, yet still struggle significantly with complex problems requiring explicit self-reflection and self-correction, especially compared to their unimodal text-based counterparts. Existing reflection methods are simplistic and struggle to generate meaningful, instructive feedback, as the reasoning ability and knowledge limits of pre-trained models are largely fixed during initial training. To overcome these challenges, we propose multimodal Self-Reflection enhanced reasoning with Group Relative Policy Optimization (SRPO), a two-stage reflection-aware reinforcement learning (RL) framework explicitly designed to enhance multimodal LLM reasoning. In the first stage, we construct a high-quality, reflection-focused dataset under the guidance of an advanced MLLM, which generates reflections based on initial responses to help the policy model to learn both reasoning and self-reflection. In the second stage, we introduce a novel reward mechanism within the GRPO framework that encourages concise and cognitively meaningful reflection while avoiding redundancy. Extensive experiments across multiple multimodal reasoning benchmarks (including MathVista, MathVision, Mathverse, and MMMU-Pro) using Qwen-2.5-VL-7B and Qwen-2.5-VL-32B demonstrate that SRPO significantly outperforms state-of-the-art models, achieving notable improvements in both reasoning accuracy and reflection quality.

EAAI Journal 2024 Journal Article

Crucial rather than random: Attacking crucial substructure for backdoor attacks on graph neural networks

Haibin Tong
Huifang Ma
Hui Shen
Zhixin Li
Liang Chang

Backdoor attacks on Graph Neural Networks (GNNs) seek to manipulate the behavior of a GNN model by introducing a particular pattern or trigger into the input graph, misleading the model into making inaccurate predictions. Existing methods for implementing backdoor attacks exhibit two notable limitations: Firstly, predefined substructures lack flexibility and effectiveness in compelling the classifier to associate them with the predicted label. Secondly, random injection locations for these substructures lack stealth and fail to exploit vulnerabilities in the target system. To address the aforementioned limitations, we present a novel approach targeting crucial substructures for backdoor attacks with two core modules. The crucial substructure detection module focuses on identifying predictive-relevant substructures in the input graph, which not only explains the model’s predictions but also suggests which aspects to target for accurate categorization. The graph alignment module transforms the crucial substructures of the non-target class graphs into the crucial substructures for the attacker-chosen target class graphs, modifying few key edges and nodes. We validate the effectiveness of our method on four benchmark datasets, including those from bioinformatics and social networks. The experimental results demonstrate that our approach outperforms the majority of existing baseline methods, achieving an average ASR of 90.77%. To the best of our knowledge, our work is the first to perform backdoor attacks with crucial substructure. Through the proposed methodology, we establish a pioneering direction for refining backdoor attack techniques on GNNs.

YNICL Journal 2019 Journal Article

Changes in default mode network connectivity in different glucose metabolism status and diabetes duration

Huanghui Liu
Jun Liu
Limin Peng
Zhichao Feng
Lu Cao
Huasheng Liu
Hui Shen
Dewen Hu

AIMS/HYPOTHESES: It is now generally accepted that diabetes increases the risk for cognitive impairment, but the precise mechanisms are poorly understood. In recent years, resting-state functional magnetic resonance imaging (rs-fMRI) is increasingly used to investigate the neural basis of cognitive dysfunction in type 2 diabetes (T2D) patients. Alterations in brain functional connectivity may underlie diabetes-related cognitive dysfunction and brain damage. The aim of this study was to investigate the changes in default mode network (DMN) connectivity in different glucose metabolism status and diabetes duration. METHODS: We used a seed-based fMRI analysis to investigate positive and negative DMN connectivity in four groups (39 subjects with normal glucose metabolism [NGM], 23 subjects with impaired glucose metabolism [IGM; i.e., prediabetes], 59 T2D patients with a diabetes duration of <10 years, and 24 T2D patients with a diabetes duration of ≥10 years). RESULTS: Negative DMN connectivity increased and then regressed with deteriorating glucose metabolism status and extending diabetes duration. DMN connectivity showed a significant correlation with diabetes duration. CONCLUSION/INTERPRETATION: This study suggests that DMN connectivity may exhibit distinct patterns in different glucose metabolism status and diabetes duration, providing some potential neuroimaging evidence for early diagnosis and further understanding of the pathophysiological mechanisms of diabetic brain damage.

YNICL Journal 2018 Journal Article

Classification of multi-site MR images in the presence of heterogeneity using multi-task learning

Qiongmin Ma
Tianhao Zhang
Marcus V. Zanetti
Hui Shen
Theodore D. Satterthwaite
Daniel H. Wolf
Raquel E. Gur
Yong Fan

With the advent of Big Data Imaging Analytics applied to neuroimaging, datasets from multiple sites need to be pooled into larger samples. However, heterogeneity across different scanners, protocols and populations, renders the task of finding underlying disease signatures challenging. The current work investigates the value of multi-task learning in finding disease signatures that generalize across studies and populations. Herein, we present a multi-task learning type of formulation, in which different tasks are from different studies and populations being pooled together. We test this approach in an MRI study of the neuroanatomy of schizophrenia (SCZ) by pooling data from 3 different sites and populations: Philadelphia, Sao Paulo and Tianjin (50 controls and 50 patients from each site), which posed integration challenges due to variability in disease chronicity, treatment exposure, and data collection. Some existing methods are also tested for comparison purposes. Experiments show that classification accuracy of multi-site data outperformed that of single-site data and pooled data using multi-task feature learning, and also outperformed other comparison methods. Several anatomical regions were identified to be common discriminant features across sites. These included prefrontal, superior temporal, insular, anterior cingulate cortex, temporo-limbic and striatal regions consistently implicated in the pathophysiology of schizophrenia, as well as the cerebellum, precuneus, and fusiform, middle temporal, inferior parietal, postcentral, angular, lingual and middle occipital gyri. These results indicate that the proposed multi-task learning method is robust in finding consistent and reliable structural brain abnormalities associated with SCZ across different sites, in the presence of multiple sources of heterogeneity.

YNIMG Journal 2018 Journal Article

Impact of global signal regression on characterizing dynamic functional connectivity and brain states

Huaze Xu
Jianpo Su
Jian Qin
Ming Li
Ling-Li Zeng
Dewen Hu
Hui Shen

Recently, resting-state functional magnetic resonance imaging (fMRI) studies have been extended to explore fluctuations in correlations over shorter timescales, referred to as dynamic functional connectivity (dFC). However, the impact of global signal regression (GSR) on dFC is not well established, despite the intensive investigations of the influence of GSR on static functional connectivity (sFC). This study aimed to examine the effect of GSR on the performance of the sliding-window correlation, a commonly used method for capturing functional connectivity (FC) dynamics based on resting-state fMRI and simultaneous electroencephalograph (EEG)-fMRI data. The results revealed that the impact of GSR on dFC was spatially heterogeneous, with some susceptible regions including the occipital cortex, sensorimotor area, precuneus, posterior insula and superior temporal gyrus, and that the impact was temporally modulated by the mean global signal (GS) magnitude across windows. Furthermore, GSR substantially changed the connectivity structures of the FC states responding to a high GS magnitude, as well as their temporal features, and even led to the emergence of new FC states. Conversely, those FC states marked by obvious anti-correlation structures associated with the default mode network (DMN) were largely unaffected by GSR. Finally, we reported an association between the fluctuations in the windowed magnitude of GS and the time-varying EEG power within subjects, which implied changes in mental states underlying GS dynamics. Overall, this study suggested a potential neuropsychological basis, in addition to nuisance sources, for GS dynamics and highlighted the need for caution in applying GSR to sliding-window correlation analyses. At a minimum, the mental fluctuations of an individual subject, possibly related to ongoing vigilance, should be evaluated during the entire scan when the dynamics of FC are estimated.
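
For readers unfamiliar with the sliding-window correlation named in the abstract above, here is a minimal, standard-library-only sketch of the general technique. It is not the authors' pipeline: the function names, the rectangular (unweighted) window, and the toy signals are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one window is constant.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def sliding_window_correlation(x, y, window, step=1):
    """Return one correlation value per window position.

    `window` is the window length in samples and `step` the shift
    between consecutive windows (both hypothetical parameter names).
    """
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and len(x)")
    return [
        pearson(x[start:start + window], y[start:start + window])
        for start in range(0, len(x) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy signals: the second one reverses direction halfway through,
    # so the windowed correlation changes sign over time.
    region_a = [0, 1, 2, 3, 4, 5]
    region_b = [0, 1, 2, 3, 2, 1]
    print(sliding_window_correlation(region_a, region_b, window=3))
```

A single correlation over the full toy series would hide the reversal; the windowed values expose it, which is the sense in which this method captures connectivity dynamics.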

YNIMG Journal 2016 Journal Article

Changes in functional connectivity dynamics associated with vigilance network in taxi drivers

Hui Shen
Zhenfeng Li
Jian Qin
Qiang Liu
Lubin Wang
Ling-Li Zeng
Hong Li
Dewen Hu

An increasing number of neuroimaging studies have suggested that the fluctuations of low-frequency resting-state functional connectivity (FC) are not noise but are instead linked to the shift between distinct cognitive states. However, there is very limited knowledge about whether and how the fluctuations of FC at rest are influenced by long-term training and experience. Here, we investigated how the dynamics of resting-state FC are linked to driving behavior by comparing 20 licensed taxi drivers with 20 healthy non-drivers using a sliding window approach. We found that the driving experience could be effectively decoded with 90% (p < 0.001) accuracy by the amplitude of low-frequency fluctuations in some specific connections, based on a multivariate pattern analysis technique. Interestingly, the majority of these connections fell within a set of distributed regions named “the vigilance network”. Moreover, the decreased amplitude of the FC fluctuations within the vigilance network in the drivers was negatively correlated with the number of years that they had driven a taxi. Furthermore, temporally quasi-stable functional connectivity segmentation revealed significant differences between the drivers and non-drivers in the dwell time of specific vigilance-related transient brain states, although the brain's repertoire of functional states was preserved. Overall, these results suggested a significant link between the changes in the time-dependent aspects of resting-state FC within the vigilance network and long-term driving experiences. The results not only improve our understanding of how the brain supports driving behavior but also shed new light on the relationship between the dynamics of functional brain networks and individual behaviors.

YNICL Journal 2015 Journal Article

Multivariate pattern analysis reveals anatomical connectivity differences between the left and right mesial temporal lobe epilepsy

Peng Fang
Jie An
Ling-Li Zeng
Hui Shen
Fanglin Chen
Wensheng Wang
Shijun Qiu
Dewen Hu

Previous studies have demonstrated differences of clinical signs and functional brain network organizations between the left and right mesial temporal lobe epilepsy (mTLE), but the anatomical connectivity differences underlying functional variance between the left and right mTLE remain uncharacterized. We examined 43 (22 left, 21 right) mTLE patients with hippocampal sclerosis and 39 healthy controls using diffusion tensor imaging. After the whole-brain anatomical networks were constructed for each subject, multivariate pattern analysis was applied to classify the left mTLE from the right mTLE and extract the anatomical connectivity differences between the left and right mTLE patients. The classification results reveal 93.0% accuracy for the left mTLE versus the right mTLE, 93.4% accuracy for the left mTLE versus controls and 90.0% accuracy for the right mTLE versus controls. Compared with the right mTLE, the left mTLE exhibited a different connectivity pattern in the cortical-limbic network and cerebellum. The majority of the most discriminating anatomical connections were located within or across the cortical-limbic network and cerebellum, thereby indicating that these disease-related anatomical network alterations may give rise to a portion of the complex of emotional and memory deficit between the left and right mTLE. Moreover, the orbitofrontal gyrus, cingulate cortex, hippocampus and parahippocampal gyrus, which exhibit high discriminative power in classification, may play critical roles in the pathophysiology of mTLE. The current study demonstrated that anatomical connectivity differences between the left mTLE and the right mTLE may have the potential to serve as a neuroimaging biomarker to guide personalized diagnosis of the left and right mTLE.

YNIMG Journal 2012 Journal Article

Combined structural and resting-state functional MRI analysis of sexual dimorphism in the young adult human brain: An MVPA approach

Lubin Wang
Hui Shen
Feng Tang
Yufeng Zang
Dewen Hu

There has been growing interest recently in the use of multivariate pattern analysis (MVPA) to decode information from high-dimensional neuroimaging data. The present study employed a support vector machine-based MVPA approach to identify the complex patterns of sex differences in brain structure and resting-state function. We also aimed to assess the role of anatomy on functional sex differences during rest. One hundred and forty healthy young Chinese adults (70 men and 70 women) underwent structural and resting-state functional MRI scans. Gray matter density and regional homogeneity (ReHo) were used to map brain structure and resting-state function, respectively. After combining these two feature vectors into one union-vector, a pattern classifier was designed using principal component analysis and linear support vector machine to identify brain areas that had distinct characteristics between the groups. We found that: (1) male and female brains were different with a mean classification accuracy of 89%; (2) sex differences in gray matter density were widely distributed in the brain, notably in the occipital lobe and the cerebellum; (3) men primarily showed higher ReHo in their right hemispheres and women tended to show greater ReHo in their left hemispheres; (4) about 50% of brain areas with functional sex differences exhibited significant positive correlations between gray matter density and ReHo. Our results suggest that sex is an important factor that accounts for interindividual variability in the healthy brain.

YNIMG Journal 2011 Journal Article

Post-treatment with amphetamine enhances reinnervation of the ipsilateral side cortex in stroke rats

Hua-Shan Liu
Hui Shen
Brandon K. Harvey
Priscila Castillo
Hanbing Lu
Yihong Yang
Yun Wang

Amphetamine (AM) treatment has been shown to alter behavioral recovery after ischemia caused by embolism, permanent unilateral occlusion of the common carotid and middle cerebral arteries, or unilateral sensorimotor cortex ablation in rats. However, the behavioral results are inconsistent possibly due to difficulty controlling the size of the lesion before treatment. There is also evidence that AM promotes neuroregeneration in the cortex contralateral to the infarction; however, the effects of AM in the ipsilateral cortex remain unclear. The purpose of this study was to employ T2-weighted imaging (T2WI) to establish controlled criteria for AM treatment and to examine neuroregenerative effects in both cortices after stroke. Adult rats were anesthetized, and the right middle cerebral artery was ligated for 90 min to generate lesions in the ipsilateral cortex. Animals were separated into two equal treatment groups (AM or saline) according to the size of infarction, measured by T2WI at 2 days after stroke. AM or saline was administered to stroke rats every third day starting on day 3 for 4 weeks. AM treatment significantly reduced neurological deficits, as measured by body asymmetry and Bederson's score. T2WI and diffusion tensor imaging (DTI) were used to examine the size of infarction and axonal reinnervation, respectively, before and following treatment on days 2, 10 and 25 after stroke. AM treatment reduced the volume of tissue loss on days 10 and 25. A significant increase in fractional anisotropy ratio was found in the ipsilateral cortex after repeated AM administration, suggesting a possible increase in axonal outgrowth in the lesioned side cortex. Western analysis indicated that AM significantly increased the expression of synaptophysin ipsilaterally and neurofilament bilaterally. AM also enhanced matrix metalloproteinase (MMP) enzymatic activity, determined by MMP zymography in the lesioned side cortex. qRT-PCR was used to examine the expression of trophic factors after the 1st and 2nd doses of AM or saline injection. The expression of BDNF, but not BMP7 or CART, was significantly enhanced by AM in the lesioned side cortex. In conclusion, post-stroke treatment with AM facilitates behavioral recovery, which is associated with an increase in fractional anisotropy activity, enhanced fiber growth in tractography, synaptogenesis, upregulation of BDNF, and MMP activity mainly in the lesioned cortex. Our data suggest that the ipsilateral cortex may be the major target of action in stroke brain after AM treatment.

YNIMG Journal 2010 Journal Article

Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fMRI

Hui Shen
Lubin Wang
Yadong Liu
Dewen Hu

Recently, a functional disconnectivity hypothesis of schizophrenia has been proposed for the physiological explanation of behavioral syndromes of this complex mental disorder. In this paper, we aim at further examining whether syndromes of schizophrenia could be decoded by some special spatiotemporal patterns of resting-state functional connectivity. We designed a data-driven classifier based on machine learning to extract highly discriminative functional connectivity features and to discriminate schizophrenic patients from healthy controls. The proposed classifier consisted of two separate steps. First, we used feature selection based on a correlation coefficient method to extract highly discriminative regions and construct the optimal feature set for classification. Then, an unsupervised-learning classifier combining low-dimensional embedding and self-organized clustering of fMRI was trained to discriminate schizophrenic patients from healthy controls. The performance of the classifier was tested using a leave-one-out cross-validation strategy. The experimental results demonstrated not only high classification accuracy (93.75% for schizophrenic patients, 75.0% for healthy controls), but also good generalization and stability with respect to the number of extracted features. In addition, some functional connectivities between certain brain regions of the cerebellum and frontal cortex were found to exhibit the highest discriminative power, which might provide further evidence for the cognitive dysmetria hypothesis of schizophrenia. This primary study demonstrated that machine learning could extract exciting new information from the resting-state activity of a brain with schizophrenia, which might have potential ability to improve current diagnosis and treatment evaluation of schizophrenia.
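
The leave-one-out cross-validation and per-class accuracy reporting mentioned in the abstract above can be sketched as follows. This is only an illustration of the evaluation scheme: a simple nearest-centroid classifier is swapped in for the paper's embedding-and-clustering classifier, the feature-selection step is omitted, and all function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Stand-in classifier (an assumption), not the method from the paper.
    """
    sums = {}
    counts = {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_per_class_accuracy(X, y):
    """Hold out each sample once, train on the rest, and report
    the fraction of correct predictions separately for each class."""
    correct = {}
    total = {}
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predicted = nearest_centroid_predict(train_X, train_y, X[i])
        total[y[i]] = total.get(y[i], 0) + 1
        correct[y[i]] = correct.get(y[i], 0) + (predicted == y[i])
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Toy, well-separated two-feature data (fabricated for illustration only).
    X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
         [1.0, 0.9], [0.8, 1.0], [0.9, 1.1]]
    y = ["control"] * 3 + ["patient"] * 3
    print(leave_one_out_per_class_accuracy(X, y))
```

Reporting accuracy per class, as the abstract does, shows whether errors fall mainly on patients or on controls, which a single pooled accuracy would hide.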
