Arrow Research

Author name cluster

Bin Xiao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
1 author row

Possible papers

22

AAAI Conference 2026 Conference Paper

Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration

  • Yuetong Liu
  • Yunqiu Xu
  • Yang Wei
  • Xiuli Bi
  • Bin Xiao

Restoring nighttime images affected by multiple adverse weather conditions is a practical yet under-explored research problem, as multiple weather degradations usually coexist in the real world alongside various lighting effects at night. This paper first explores the challenging multi-weather nighttime image restoration task, where various types of weather degradations are intertwined with flare effects. To support the research, we contribute the AllWeatherNight dataset, featuring large-scale nighttime images with diverse compositional degradations. By employing illumination-aware degradation generation, our dataset significantly enhances the realism of synthetic degradations in nighttime scenes, providing a more reliable benchmark for model training and evaluation. Additionally, we propose ClearNight, a unified nighttime image restoration framework, which effectively removes complex degradations in one go. Specifically, ClearNight extracts Retinex-based dual priors and explicitly guides the network to focus on uneven illumination regions and intrinsic texture contents respectively, thereby enhancing restoration effectiveness in nighttime scenarios. Moreover, to more effectively model the common and unique characteristics of multiple weather degradations, ClearNight performs weather-aware dynamic specificity and commonality collaboration that adaptively allocates optimal sub-networks associated with specific weather types. Comprehensive experiments on both synthetic and real-world images demonstrate the necessity of the AllWeatherNight dataset and the superior performance of ClearNight.
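
The Retinex-based dual priors lend themselves to a minimal sketch: split the image into a smooth illumination map and a reflectance (texture) map, the two signals the abstract says guide the network toward uneven illumination regions and intrinsic texture. The function name, box-filter smoothing, and kernel size below are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def retinex_dual_priors(img, kernel=15, eps=1e-4):
        # Classic Retinex assumption I = L * R: estimate a smooth illumination
        # prior L from the per-pixel channel maximum, then divide it out to
        # expose the reflectance (texture) prior R.
        lum = img.max(dim=1, keepdim=True).values                # (B, 1, H, W)
        box = torch.ones(1, 1, kernel, kernel,
                         device=img.device, dtype=img.dtype) / kernel ** 2
        illum = F.conv2d(lum, box, padding=kernel // 2)          # low-pass = illumination
        reflect = img / (illum + eps)                            # intrinsic texture
        return illum, reflect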

AAAI Conference 2026 Conference Paper

SSR: Semantic and Spatial Rectification for CLIP-based Weakly Supervised Segmentation

  • Xiuli Bi
  • Die Xiao
  • Junchao Fan
  • Bin Xiao

In recent years, Contrastive Language-Image Pretraining (CLIP) has been widely applied to Weakly Supervised Semantic Segmentation (WSSS) tasks due to its powerful cross-modal semantic understanding capabilities. This paper proposes a novel Semantic and Spatial Rectification (SSR) method to address the limitations of existing CLIP-based weakly supervised semantic segmentation approaches: over-activation in non-target foreground regions and background areas. Specifically, at the semantic level, the Cross-Modal Prototype Alignment (CMPA) establishes a contrastive learning mechanism to enforce feature space alignment across modalities, reducing inter-class overlap while enhancing semantic correlations, to rectify over-activation in non-target foreground regions effectively; at the spatial level, the Superpixel-Guided Correction (SGC) leverages superpixel-based spatial priors to precisely filter out interference from non-target regions during affinity propagation, significantly rectifying background over-activation. Extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate that our method outperforms all single-stage approaches, as well as more complex multi-stage approaches, achieving mIoU scores of 79.5% and 50.6%, respectively.
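
The semantic-level idea of Cross-Modal Prototype Alignment can be pictured as a standard InfoNCE-style objective that pulls each image feature toward the CLIP text prototype of its class and pushes it away from other classes' prototypes. This is a generic sketch of cross-modal alignment under that assumption, not the paper's exact CMPA loss.

    import torch
    import torch.nn.functional as F

    def prototype_alignment_loss(img_feats, labels, text_protos, tau=0.07):
        # img_feats: (N, D) image features; text_protos: (C, D) class prototypes
        # derived from CLIP text embeddings; labels: (N,) class indices.
        img = F.normalize(img_feats, dim=-1)
        pro = F.normalize(text_protos, dim=-1)
        logits = img @ pro.t() / tau             # temperature-scaled cosine similarities
        return F.cross_entropy(logits, labels)   # align with own class, repel others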

AAAI Conference 2026 Conference Paper

TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

  • Fengli Ran
  • Xiao Pu
  • Bo Liu
  • Xiuli Bi
  • Bin Xiao

Dataset distillation compresses large datasets into compact synthetic ones to reduce storage and computational costs. Among various approaches, distribution matching (DM)-based methods have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of synthetic data and weakens downstream performance. To address this issue, we propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model’s training trajectory. At each training stage, TGDD captures evolving semantics by aligning the feature distribution between the synthetic and original dataset. Meanwhile, it introduces a distribution constraint regularization to reduce class overlap. This design helps synthetic data preserve both semantic diversity and representativeness, improving performance in downstream tasks. Without additional optimization overhead, TGDD achieves a favorable balance between performance and efficiency. Experiments on ten datasets demonstrate that TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.
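
Read as code, the dynamic alignment amounts to matching per-class feature statistics under every saved stage of the training trajectory rather than under a single final encoder. The per-class dict layout and plain mean matching below are illustrative assumptions, not the paper's exact objective.

    import torch

    def trajectory_dm_loss(checkpoints, real_by_class, syn_by_class):
        # checkpoints: feature extractors saved along the training trajectory.
        # real_by_class / syn_by_class: dicts mapping class id -> image batch.
        loss = 0.0
        for f in checkpoints:                        # evolving semantics per stage
            for c in real_by_class:
                mu_real = f(real_by_class[c]).mean(dim=0)
                mu_syn = f(syn_by_class[c]).mean(dim=0)
                loss = loss + ((mu_real - mu_syn) ** 2).sum()
        return loss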

JBHI Journal 2025 Journal Article

ADMM-ESINet: A Deep Unrolling Network for EEG Extended Source Imaging

  • Ke Liu
  • Hang Jiang
  • Hu Yang
  • Jun Zhang
  • Zhenghui Gu
  • Zhuliang Yu
  • Yu Zhang
  • Bin Xiao

Electroencephalography (EEG) source imaging (ESI) methods aim to reconstruct cortical sources from scalp EEG signals, a crucial task for understanding the normal brain as well as brain disorders. Traditional model-driven ESI methods face challenges in real-time reconstruction, while deep neural network (DNN)-based ESI methods often struggle with generalization to new data. To address these issues, we propose ADMM-ESINet, a novel deep unfolding neural network for robust and efficient reconstruction of EEG extended sources. ADMM-ESINet leverages a structured sparsity constraint within a regularization framework and employs the Alternating Direction Method of Multipliers (ADMM) to achieve iterative solutions. By unrolling the ADMM algorithm into a cascaded network architecture, ADMM-ESINet effectively integrates prior knowledge, enabling end-to-end, real-time ESI. Crucially, both the regularization parameters and the spatial transform operator are learned directly from the training data. Numerical results demonstrate that ADMM-ESINet surpasses traditional DNN-based methods in generalization ability and accurately reconstructs the location, extent, and temporal dynamics of extended sources, establishing ADMM-ESINet as a promising method for real-time ESI.
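
The unrolling idea can be made concrete with a schematic cell for an objective of the form min_s ||y - Ls||^2 + lambda*||Ws||_1 with split variable z = Ws: one ADMM iteration becomes a network layer whose transform W, threshold, and step size are learned, matching the abstract's statement that the regularization parameters and spatial transform are learned from data. A sketch consistent with that description, not the paper's architecture:

    import torch
    import torch.nn as nn

    class ADMMCell(nn.Module):
        # One unrolled ADMM iteration; stacking K such cells gives the
        # cascaded, end-to-end trainable network.
        def __init__(self, n_src):
            super().__init__()
            self.W = nn.Linear(n_src, n_src, bias=False)  # learned spatial transform
            self.rho = nn.Parameter(torch.tensor(1.0))    # penalty weight
            self.theta = nn.Parameter(torch.tensor(0.1))  # learned soft threshold
            self.step = nn.Parameter(torch.tensor(0.1))   # learned step size

        def forward(self, s, z, u, y, L):
            # s-update: gradient step on the augmented Lagrangian
            # (s: (B, n_src) sources, y: (B, n_ch) EEG, L: (n_ch, n_src) leadfield).
            grad = (s @ L.t() - y) @ L + self.rho * (self.W(s) - z + u) @ self.W.weight
            s = s - self.step * grad
            # z-update: soft thresholding enforces the sparsity constraint.
            v = self.W(s) + u
            z = torch.sign(v) * torch.clamp(v.abs() - self.theta, min=0)
            # dual update.
            u = u + self.W(s) - z
            return s, z, u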

NeurIPS Conference 2025 Conference Paper

Continual Knowledge Adaptation for Reinforcement Learning

  • Jinwu Hu
  • ZiHao Lian
  • Zhiquan Wen
  • Chenghao Li
  • Guohao Chen
  • Xutao Wen
  • Bin Xiao
  • Mingkui Tan

Reinforcement Learning enables agents to learn optimal behaviors through interactions with environments. However, real-world environments are typically non-stationary, requiring agents to continuously adapt to new tasks and changing conditions. Although Continual Reinforcement Learning facilitates learning across multiple tasks, existing methods often suffer from catastrophic forgetting and inefficient knowledge utilization. To address these challenges, we propose Continual Knowledge Adaptation for Reinforcement Learning (CKA-RL), which enables the accumulation and effective utilization of historical knowledge. Specifically, we introduce a Continual Knowledge Adaptation strategy, which involves maintaining a task-specific knowledge vector pool and dynamically using historical knowledge to adapt the agent to new tasks. This process mitigates catastrophic forgetting and enables efficient knowledge transfer across tasks by preserving and adapting critical model parameters. Additionally, we propose an Adaptive Knowledge Merging mechanism that combines similar knowledge vectors to address scalability challenges, reducing memory requirements while ensuring the retention of essential knowledge. Experiments on three benchmarks demonstrate that the proposed CKA-RL outperforms state-of-the-art methods, achieving an improvement of 4.20% in overall performance and 8.02% in forward transfer. The source code is available at https://github.com/Fhujinwu/CKA-RL.
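
The Adaptive Knowledge Merging mechanism, reduced to essentials: fuse knowledge vectors whose similarity crosses a threshold so the pool stays bounded as tasks accumulate. The greedy rule, plain averaging, and threshold value are assumptions, not the paper's exact procedure.

    import torch
    import torch.nn.functional as F

    def merge_knowledge_vectors(pool, sim_threshold=0.9):
        # pool: list of 1-D task knowledge vectors (torch tensors).
        merged = []
        for v in pool:
            for i, m in enumerate(merged):
                if F.cosine_similarity(v, m, dim=0) > sim_threshold:
                    merged[i] = (m + v) / 2        # fuse similar knowledge
                    break
            else:
                merged.append(v.clone())           # keep distinct knowledge
        return merged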

JBHI Journal 2025 Journal Article

Contrastive Learning Guided Fusion Network for Brain CT and MRI

  • Yuping Huang
  • Weisheng Li
  • Bin Xiao
  • Guofen Wang
  • Dan He
  • Xiaoyu Qiao

Medical image fusion technology provides professionals with more detailed and precise diagnostic information. This paper introduces CLGFusion, a new, efficient CT and MRI fusion network guided by contrastive learning. CLGFusion includes two encoding branches at the feature encoding stage, enabling them to interact and learn from each other. The approach begins by training a single-view encoder to predict the feature representation of an image from varied augmented views; simultaneously, the multi-view encoder is updated as the exponential moving average of the single-view encoder. Contrastive learning is integrated into medical image fusion by creating a feature contrast space without constructing negative samples. This feature contrast space uses the difference between the features of a source image and those of its corresponding augmented image and, combined with a structural similarity loss, continuously guides the network to refine its fusion result for more accurate and efficient image fusion. The approach is an end-to-end unsupervised fusion model. Experimental validation shows that the proposed method performs comparably to state-of-the-art techniques in both subjective evaluation and objective metrics.
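
The coupling between the two branches is a standard exponential-moving-average update, easy to state exactly; the decay value below is a typical choice rather than one taken from the paper.

    import torch

    @torch.no_grad()
    def ema_update(multi_view_encoder, single_view_encoder, decay=0.999):
        # Advance the multi-view encoder toward the single-view encoder's
        # weights; only the single-view branch receives gradients.
        for p_m, p_s in zip(multi_view_encoder.parameters(),
                            single_view_encoder.parameters()):
            p_m.mul_(decay).add_(p_s, alpha=1.0 - decay)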

AAAI Conference 2025 Conference Paper

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

  • Xiuli Bi
  • Jian Lu
  • Bo Liu
  • Xiaodong Cun
  • Yong Zhang
  • Weisheng Li
  • Bin Xiao

Benefiting from large-scale pre-training on text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from a text description. Moreover, given some reference images or videos, parameter-efficient fine-tuning (i.e., LoRA) can generate high-quality customized concepts, e.g., a specific subject or the motion from a reference video. However, combining multiple concepts trained from different references into a single network produces obvious artifacts. To this end, we propose CustomTTT, which can easily customize the appearance and the motion of a given video jointly. In detail, we first analyze prompt influence in current video diffusion models and find that LoRAs are only needed in specific layers for appearance and motion customization. Moreover, since each LoRA is trained individually, we propose a novel test-time training technique that updates the parameters after combination, utilizing the trained customized models. We conduct detailed experiments to verify the effectiveness of the proposed method, which outperforms several state-of-the-art works in both qualitative and quantitative evaluations.
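
The combination step whose artifacts motivate the test-time training is the generic LoRA merge, W' = W + scale * up @ down applied per selected layer; the sketch below shows that merge only and omits the paper's layer-selection logic.

    import torch

    @torch.no_grad()
    def merge_loras(base_weight, loras, scale=1.0):
        # base_weight: (out, in); each LoRA is a (down, up) pair with
        # down: (rank, in) and up: (out, rank), trained independently for
        # appearance or motion customization.
        w = base_weight.clone()
        for down, up in loras:
            w += scale * (up @ down)    # fold the low-rank update into the weight
        return w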

JBHI Journal 2025 Journal Article

DMSACNN: Deep Multiscale Attentional Convolutional Neural Network for EEG-Based Motor Decoding

  • Ke Liu
  • Xin Xing
  • Tao Yang
  • Zhuliang Yu
  • Bin Xiao
  • Guoyin Wang
  • Wei Wu

Objective: Accurate decoding of electroencephalogram (EEG) signals has become more significant for the brain-computer interface (BCI). Specifically, motor imagery and motor execution (MI/ME) tasks enable the control of external devices by decoding EEG signals during imagined or real movements. However, accurately decoding MI/ME signals remains a challenge due to the limited utilization of temporal information and ineffective feature selection methods. Methods: This paper introduces DMSACNN, an end-to-end deep multiscale attention convolutional neural network for MI/ME-EEG decoding. DMSACNN incorporates a deep multiscale temporal feature extraction module to capture temporal features at various levels. These features are then processed by a spatial convolutional module to extract spatial features. Finally, a local and global feature fusion attention module is utilized to combine local and global information and extract the most discriminative spatiotemporal features. Main results: DMSACNN achieves impressive accuracies of 78.20%, 96.34% and 70.90% for hold-out analysis on the BCI-IV-2a, High Gamma and OpenBMI datasets, respectively, outperforming most of the state-of-the-art methods. Conclusion and significance: These results highlight the potential of DMSACNN in robust BCI applications. Our proposed method provides a valuable solution to improve the accuracy of the MI/ME-EEG decoding, which can pave the way for more efficient and reliable BCI systems.
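
A minimal sketch of deep multiscale temporal feature extraction: parallel 1-D convolutions with different kernel lengths over the EEG time axis, concatenated channel-wise. The kernel sizes are illustrative, not the paper's configuration.

    import torch
    import torch.nn as nn

    class MultiScaleTemporalConv(nn.Module):
        # Input: EEG tensor of shape (batch, channels, time).
        def __init__(self, in_ch, out_ch, kernels=(15, 31, 63)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels)

        def forward(self, x):
            # Each branch captures temporal structure at a different scale.
            return torch.cat([b(x) for b in self.branches], dim=1)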

IJCAI Conference 2025 Conference Paper

Efficient Dynamic Ensembling for Multiple LLM Experts

  • Jinwu Hu
  • Yufeng Wang
  • Shuhai Zhang
  • Kai Zhou
  • Guohao Chen
  • Yu Hu
  • Bin Xiao
  • Mingkui Tan

LLMs have demonstrated impressive performance across various language tasks. However, the strengths of LLMs can vary due to different architectures, model sizes, areas of training data, etc. Therefore, ensembling the strengths of different LLM experts is critical to achieving consistent and satisfactory performance on diverse inputs across a wide range of tasks. However, existing LLM ensemble methods are either computationally intensive or incapable of leveraging complementary knowledge among LLM experts for various inputs. In this paper, we propose an efficient Dynamic Ensemble Reasoning paradigm, called DER, to integrate the strengths of multiple LLM experts conditioned on dynamic inputs. Specifically, we model the LLM ensemble reasoning problem as a Markov Decision Process, wherein an agent sequentially takes inputs to request knowledge from an LLM candidate and passes the output to a subsequent LLM candidate. Moreover, we devise a reward function to train a DER-Agent to dynamically select an optimal answering route given the input questions, aiming to achieve the highest performance with as few computational resources as possible. Finally, to fully transfer the expert knowledge from the prior LLMs, we develop a Knowledge Transfer Prompt that enables the subsequent LLM candidates to transfer complementary knowledge effectively. Experiments demonstrate that our method uses fewer computational resources to achieve better performance compared to state-of-the-art baselines. Code and appendix are available at https://github.com/Fhujinwu/DER.
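
The sequential ensemble reads naturally as a short routing loop: the agent picks an expert, the expert answers given the question plus the previous expert's answer (the knowledge-transfer step), and routing stops when the agent decides the answer is good enough. The agent interface and the prompt wording below are hypothetical.

    def der_route(question, experts, agent, max_hops=3):
        # experts: dict mapping expert name -> callable(prompt) -> answer text.
        # agent.select(state) -> (expert_name, stop_flag) is an assumed interface.
        answer, state = None, question
        for _ in range(max_hops):
            name, stop = agent.select(state)
            prompt = question if answer is None else (
                f"Question: {question}\nPrevious expert answer: {answer}\n"
                "Refine or correct this answer.")   # knowledge-transfer prompt
            answer = experts[name](prompt)
            state = answer
            if stop:
                break
        return answer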

NeurIPS Conference 2025 Conference Paper

LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models

  • Yihao Liu
  • Xinqi Lyu
  • Dong Wang
  • Yanjie Li
  • Bin Xiao

Large vision-language models (VLLMs) have driven significant progress in multi-modal systems, enabling a wide range of applications across domains such as healthcare, education, and content generation. Despite the success, the large-scale datasets used to train these models often contain sensitive or personally identifiable information, raising serious privacy concerns. To audit and better understand such risks, membership inference attacks (MIAs) have become a key tool. However, existing MIAs against VLLMs predominantly assume access to full-model logits, which are typically unavailable in many practical deployments. To facilitate MIAs in a more realistic and restrictive setting, we propose a novel framework: label-only membership inference attacks (LOMIA) targeting pre-trained VLLMs where only the model’s top-1 prediction is available. Within this framework, we propose three effective attack methods, all of which exploit the intuition that training samples are more likely to be memorized by the VLLMs, resulting in outputs that exhibit higher semantic alignment and lower perplexity. Our experiments show that our framework surpasses existing label-only attack adaptations for different VLLMs and competes with state-of-the-art logits-based attacks across all metrics on three widely used open-source VLLMs and GPT-4o.
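
One of the stated intuitions, higher semantic alignment for memorized samples, fits in a few lines: score membership by how closely the model's top-1 output matches the ground-truth text. Here `embed` stands for any sentence-embedding function and the decision threshold would be calibrated separately; both are assumptions rather than the paper's attack.

    import torch.nn.functional as F

    def membership_score(model_output_text, ground_truth_text, embed):
        # Training samples tend to be memorized, so the top-1 output should
        # align more closely with the reference text for members.
        a = F.normalize(embed(model_output_text), dim=-1)
        b = F.normalize(embed(ground_truth_text), dim=-1)
        return (a * b).sum().item()   # cosine similarity; threshold -> member or not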

AAAI Conference 2025 Conference Paper

Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning

  • Yang Wei
  • Jingyu Tan
  • Guowen Xu
  • Zhuoran Ma
  • Zhuo Ma
  • Bin Xiao

Substitute training-based data-free black-box attacks pose a significant threat to enterprise-deployed models. These attacks use a generator to synthesize data and query APIs, then train a substitute model to approximate the target model's decision boundary based on the returned results. However, existing attack methods often struggle to produce sufficiently diverse data, particularly for complex target models and extensive target data domains, severely limiting their practical application. To address this gap, we design domain-augmented learning to improve the quality of the synthetic data domain (SDD) generated by the generator from two perspectives. Specifically, (1) To broaden the SDD's coverage, we introduce textual semantic embeddings into the generator for the first time. (2) For enhancing the SDD's discretization, we propose a competitive optimization strategy that forces the generator to self-compete, along with heterogeneity excitation to overcome the constraints of information entropy on diversity. Comprehensive experiments demonstrate that our method is more effective. In non-targeted attacks on the CIFAR-10 and Tiny-ImageNet datasets, our method outperforms the state-of-the-art by 14% and 7% in attack success rate, respectively.

NeurIPS Conference 2025 Conference Paper

StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations

  • Yanjie Li
  • Wenxuan Zhang
  • Xinqi Lyu
  • Yihao Liu
  • Bin Xiao

Recently, text-to-image diffusion models have been widely used for style mimicry and personalized customization through methods such as DreamBooth and Textual Inversion. This has raised concerns about intellectual property protection and the generation of deceptive content. Recent studies, such as Glaze and Anti-DreamBooth, have proposed using adversarial noise to protect images from these attacks. However, recent purification-based methods, such as DiffPure and Noise Upscaling, have successfully attacked these latest defenses, exposing their vulnerabilities. Moreover, existing methods show limited transferability across models, making them less effective against unknown text-to-image models. To address these issues, we propose a novel anti-mimicry method, StyleGuard. We propose a novel style loss that optimizes the style-related features in the latent space so that they deviate from those of the original image, which improves model-agnostic transferability. Additionally, to enhance the perturbation's ability to bypass diffusion-based purification, we design a novel upscale loss that involves ensemble purifiers and upscalers during training. Extensive experiments on the WikiArt and CelebA datasets demonstrate that StyleGuard outperforms existing methods in robustness against various transformations and purifications, effectively countering style mimicry in various models. Moreover, StyleGuard is effective against different style mimicry methods, including DreamBooth and Textual Inversion. The code is available at https://github.com/PolyLiYJ/StyleGuard.
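
A PGD-style sketch of pushing style-related features away from the original: Gram matrices of encoder features stand in for the paper's latent style features, and the perturbation budget and step size are common adversarial-example defaults. All of these choices are assumptions.

    import torch

    def gram(feat):
        # Gram matrix of a (B, C, H, W) feature map: a classic style statistic.
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def style_perturb(image, encoder, steps=50, eps=8 / 255, lr=1 / 255):
        delta = torch.zeros_like(image, requires_grad=True)
        target = gram(encoder(image)).detach()
        for _ in range(steps):
            # Maximize the deviation of style statistics from the original.
            loss = -((gram(encoder(image + delta)) - target) ** 2).sum()
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()
                delta.clamp_(-eps, eps)        # keep the change imperceptible
                delta.grad = None
        return (image + delta).clamp(0, 1).detach()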

AAAI Conference 2024 Conference Paper

Focus Stacking with High Fidelity and Superior Visual Effects

  • Bo Liu
  • Bin Hu
  • Xiuli Bi
  • Weisheng Li
  • Bin Xiao

Focus stacking is a technique in computational photography that synthesizes a single all-in-focus image from images of different focal planes. Previous works struggle to produce a high-quality all-in-focus image that meets two goals: high fidelity to its source images and good visual quality without defects or abnormalities. This paper proposes a novel method based on analyzing and modeling the optical imaging process. In a foreground segmentation and diffusion elimination architecture, the foreground segmentation lets most areas of the all-in-focus image inherit information from the source images to achieve high fidelity, while diffusion elimination models the physical imaging process and specifically addresses the transition region (TR) problem, a long-neglected issue that degrades the visual quality of synthesized images. Extensive experiments on a simulated dataset, an existing realistic dataset, and our proposed BetaFusion dataset show that our method generates high-quality all-in-focus images by achieving both goals simultaneously; in particular, it solves the TR problem and eliminates the resulting visual degradation of synthesized images.

JBHI Journal 2023 Journal Article

HS-Vectors: Heart Sound Embeddings for Abnormal Heart Sound Detection Based on Time-Compressed and Frequency-Expanded TDNN With Dynamic Mask Encoder

  • Lihong Qiao
  • Yonghao Gao
  • Bin Xiao
  • Xiuli Bi
  • Weisheng Li
  • Xinbo Gao

In recent years, auxiliary diagnosis technology for cardiovascular disease based on abnormal heart sound detection has become a research hotspot. Heart sound signals are promising for the preliminary diagnosis of cardiovascular diseases. Previous studies have focused on capturing the local characteristics of heart sounds. In this paper, we investigate a method for mapping heart sound signals with complex patterns to fixed-length feature embeddings, called HS-Vectors, for abnormal heart sound detection. To obtain a full embedding of the complex heart sound, HS-Vectors are computed with the Time-Compressed and Frequency-Expanded Time-Delay Neural Network (TCFE-TDNN) and the Dynamic Masked-Attention (DMA) module. HS-Vectors extract and utilize global and critical heart sound characteristics by masking out irrelevant information. Based on the TCFE-TDNN module, the heart sound signal within a certain time span is projected into a fixed-length embedding. Then, with a learnable mask attention matrix, DMA stats pooling aggregates multi-scale hidden features from different TCFE-TDNN layers and masks out irrelevant frame-level features. Experimental evaluations are performed on a 10-fold cross-validation task using the 2016 PhysioNet/CinC Challenge dataset and a new publicly available pediatric heart sound dataset we collected. Experimental results demonstrate that the proposed method outperforms state-of-the-art models in abnormality detection.
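
Attentive statistics pooling with a learnable per-frame relevance score gives the flavor of DMA stats pooling: low-scoring frames are effectively masked out before a weighted mean and standard deviation form the fixed-length embedding. A generic layer in that spirit, not the paper's exact module.

    import torch
    import torch.nn as nn

    class AttentiveStatsPool(nn.Module):
        # Input: frame-level features of shape (batch, channels, frames).
        def __init__(self, ch):
            super().__init__()
            self.score = nn.Conv1d(ch, 1, kernel_size=1)   # frame relevance

        def forward(self, x):
            w = torch.softmax(self.score(x), dim=-1)       # (B, 1, T) weights
            mean = (w * x).sum(dim=-1)                     # weighted mean (B, C)
            var = (w * (x - mean.unsqueeze(-1)) ** 2).sum(dim=-1)
            return torch.cat([mean, var.clamp(min=1e-8).sqrt()], dim=1)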

AAAI Conference 2023 Conference Paper

i-Code: An Integrative and Composable Multimodal Learning Framework

  • Ziyi Yang
  • Yuwei Fang
  • Chenguang Zhu
  • Reid Pryzant
  • DongDong Chen
  • Yu Shi
  • Yichong Xu
  • Yao Qian

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel merge- and co-attention mechanisms to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five multimodal understanding tasks and single-modality benchmarks, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.

AAAI Conference 2023 Conference Paper

Self-Supervised Image Local Forgery Detection by JPEG Compression Trace

  • Xiuli Bi
  • Wuqing Yan
  • Bo Liu
  • Bin Xiao
  • Weisheng Li
  • Xinbo Gao

For image local forgery detection, existing methods require a large amount of labeled data for training, and most cannot detect multiple types of forgery simultaneously. In this paper, we first analyze the JPEG compression traces, which are mainly caused by different JPEG compression chains, and design a trace extractor to learn such traces. We then use the trace extractor as the backbone and train it in a self-supervised manner to strengthen the discriminative ability of the learned traces. With this, regions with different JPEG compression chains can easily be distinguished within a forged image. Furthermore, our method does not rely on a large amount of training data, and does not even require any forged images for training. Experiments show that the proposed method can detect image local forgery on different datasets without re-training, and maintains stable performance over various types of image local forgery.

JBHI Journal 2022 Journal Article

A Novel Framework With Weighted Decision Map Based on Convolutional Neural Network for Cardiac MR Segmentation

  • Feiyan Li
  • Weisheng Li
  • Xinbo Gao
  • Bin Xiao

For diagnosing cardiovascular disease, an accurate segmentation method is needed. There are several unresolved issues in the complex field of cardiac magnetic resonance imaging, some of which have been partially addressed by using deep neural networks. To solve the two problems of over-segmentation and under-segmentation of anatomical shapes in the short-axis view from different cardiac magnetic resonance sequences, we propose a novel two-stage framework with a weighted decision map based on convolutional neural networks to segment the myocardium (Myo), left ventricle (LV), and right ventricle (RV) simultaneously. The framework comprises a decision map extractor and a cardiac segmenter. A cascaded U-Net++ is used as the decision map extractor to acquire the decision map that decides the category of each pixel. The cardiac segmenter is a multiscale dual-path feature aggregation network (MDFA-Net), which consists of a densely connected network and an asymmetric encoding and decoding network. The input to the cardiac segmenter is derived from the processed original images weighted by the output of the decision map extractor. We conducted experiments on two datasets, from the multi-sequence cardiac magnetic resonance segmentation challenge 2019 (MS-CMRSeg 2019) and the myocardial pathology segmentation challenge 2020 (MyoPS 2020). Test results obtained on MyoPS 2020 show that the average Dice coefficients of the proposed method on the segmentation tasks of Myo, LV and RV are 84.70%, 86.00%, and 86.31%, respectively.

JBHI Journal 2022 Journal Article

Medical Image Fusion and Denoising Algorithm Based on a Decomposition Model of Hybrid Variation-Sparse Representation

  • Guofen Wang
  • Weisheng Li
  • Jiao Du
  • Bin Xiao
  • Xinbo Gao

Medical image fusion technology integrates the contents of medical images of different modalities, thereby assisting users of medical images to better understand their meaning. However, the fusion of medical images corrupted by noise remains a challenge. To solve the existing problems in medical image fusion and denoising algorithms related to excessive blur, unclean denoising, gradient information loss, and color distortion, a novel medical image fusion and denoising algorithm is proposed. First, a new image layer decomposition model based on hybrid variation-sparse representation and weighted Schatten p-norm is proposed. The alternating direction method of multipliers is used to update the structure, detail layer dictionary, and detail layer coefficient map of the input image while denoising. Subsequently, appropriate fusion rules are employed for the structure layers and detail layer coefficient maps. Finally, the fused image is restored using the fused structure layer, detail layer dictionary, and detail layer coefficient maps. A large number of experiments confirm the superiority of the proposed algorithm over other algorithms. The proposed medical image fusion and denoising algorithm can effectively remove noise while retaining the gradient information without color distortion.

NeurIPS Conference 2021 Conference Paper

Focal Attention for Long-Range Interactions in Vision Transformers

  • Jianwei Yang
  • Chunyuan Li
  • Pengchuan Zhang
  • Xiyang Dai
  • Bin Xiao
  • Lu Yuan
  • Jianfeng Gao

Recently, Vision Transformer and its variants have shown great promise on various computer vision tasks. The ability to capture local and global visual dependencies through self-attention is the key to their success. But it also brings challenges due to quadratic computational overhead, especially for high-resolution vision tasks (e.g., object detection). Many recent works have attempted to reduce the cost and improve model performance by applying either coarse-grained global attention or fine-grained local attention. However, both approaches cripple the modeling power of the original self-attention mechanism of multi-layer Transformers, leading to sub-optimal solutions. In this paper, we present focal attention, a new attention mechanism that incorporates both fine-grained local and coarse-grained global interactions. In this new mechanism, each token attends to its closest surrounding tokens at a fine granularity and to tokens far away at a coarse granularity, and thus can capture both short- and long-range visual dependencies efficiently and effectively. With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks. In particular, our Focal Transformer models with a moderate size of 51.1M and a large size of 89.8M achieve 83.6% and 84.0% Top-1 accuracy, respectively, on ImageNet classification at 224×224. When employed as backbones, Focal Transformers achieve consistent and substantial improvements over the current SoTA Swin Transformers [44] across 6 different object detection methods. Our largest Focal Transformer yields 58.7/59.0 box mAPs and 50.9/51.3 mask mAPs on COCO mini-val/test-dev, and 55.4 mIoU on ADE20K for semantic segmentation, creating new SoTA on three of the most challenging computer vision tasks.
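
A drastically simplified 1-D rendering of the mechanism: each query attends to fine-grained keys inside its local window plus an average-pooled, coarse-grained summary of the whole sequence. The real module operates on 2-D feature maps with multiple focal levels; the window and pool sizes here are arbitrary.

    import torch
    import torch.nn.functional as F

    def focal_attention_1d(q, k, v, window=8, pool=4):
        # q, k, v: (batch, tokens, dim). Per-token loop kept for clarity.
        B, N, D = q.shape
        k_coarse = F.avg_pool1d(k.transpose(1, 2), pool).transpose(1, 2)
        v_coarse = F.avg_pool1d(v.transpose(1, 2), pool).transpose(1, 2)
        outs = []
        for i in range(N):
            lo, hi = max(0, i - window), min(N, i + window + 1)
            ks = torch.cat([k[:, lo:hi], k_coarse], dim=1)   # fine local + coarse global
            vs = torch.cat([v[:, lo:hi], v_coarse], dim=1)
            att = torch.softmax(
                q[:, i:i + 1] @ ks.transpose(1, 2) / D ** 0.5, dim=-1)
            outs.append(att @ vs)
        return torch.cat(outs, dim=1)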

AAAI Conference 2020 Conference Paper

3D Human Pose Estimation via Explicit Compositional Depth Maps

  • Haiping Wu
  • Bin Xiao

In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image. First, we propose to use densely-generated limb depth maps, which are well aligned with image cues, to ease the learning of body joint depths. Then, we design a lifting module from 2D pixel coordinates to 3D camera coordinates which explicitly takes the depth values as inputs and is aligned with the camera perspective projection model. We show our method achieves superior performance on the large-scale 3D pose datasets Human3.6M and MPI-INF-3DHP, and sets a new state-of-the-art.
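
The lifting module's stated alignment with the camera perspective projection model is the textbook back-projection: given pixel coordinates, a predicted depth, and the camera intrinsics, recover camera-space 3-D coordinates.

    import numpy as np

    def backproject(u, v, z, fx, fy, cx, cy):
        # Invert the pinhole projection u = fx * X/Z + cx, v = fy * Y/Z + cy.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.array([x, y, z])   # 3-D point in camera space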

YNICL Journal 2019 Journal Article

Quantitative susceptibility mapping based hybrid feature extraction for diagnosis of Parkinson's disease

  • Bin Xiao
  • Naying He
  • Qian Wang
  • Zenghui Cheng
  • Yining Jiao
  • E. Mark Haacke
  • Fuhua Yan
  • Feng Shi

Parkinson's disease is the second most common neurodegenerative disease in the elderly after Alzheimer's disease. The aetiology and pathogenesis of Parkinson's disease (PD) are still unclear, but the loss of dopaminergic cells and the excessive iron deposition in the substantia nigra (SN) are associated with the pathophysiology. As an imaging technique that can quantitatively reflect the amount of iron deposition, Quantitative Susceptibility Mapping (QSM) has been shown to be a promising modality for the diagnosis of PD. In the present work, we propose a hybrid feature extraction method for PD diagnosis using QSM images. First, we extract radiomics features from the SN using QSM and employ machine learning algorithms to classify PD and normal controls (NC). This approach allows us to investigate which features are most vulnerable to the effects of the disease. Along with this approach, we propose a Convolutional Neural Network (CNN) based method which can extract different features from the QSM image to further support the diagnosis of PD. Finally, we combine these two types of features and we find that the radiomics features and CNN features are complementary to each other, which helps further improve the classification (diagnostic) performance. We conclude that: (1) radiomics features from QSM data have significant clinical value for the diagnosis of PD; (2) CNN features are also useful in the diagnosis of PD; and (3) the combination of radiomics features and CNN features can enhance the diagnostic accuracy.

TAAS Journal 2014 Journal Article

Modeling and Defending against Adaptive BitTorrent Worms in Peer-to-Peer Networks

  • Jiaqing Luo
  • Bin Xiao
  • Qingjun Xiao
  • Jiannong Cao
  • Minyi Guo

BitTorrent (BT) is one of the most common Peer-to-Peer (P2P) file sharing protocols. Rather than downloading a file from a single source, the protocol allows users to join a swarm of peers to download and upload from each other simultaneously. Worms exploiting information from BT servers or trackers can cause serious damage to participating peers, which unfortunately has been neglected previously. In this article, we first present a new worm, called the Adaptive BitTorrent worm (A-BT worm), which finds new victims and propagates by sending forged requests to trackers. To reduce its abnormal behavior, the worm estimates the ratio of infected peers and adaptively adjusts its propagation speed. We then build a hybrid model to precisely characterize the propagation behavior of the worm. We also propose a statistical method to automatically detect the worm at the tracker by estimating the variance of the time intervals between requests. To slow down the worm's propagation, we design a safe strategy in which the tracker returns secured peers when it receives a request. Finally, we evaluate the accuracy of the hybrid model and the effectiveness of our detection method and containment strategy through simulations.
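
The statistical detector described above reduces to a variance test: automated worm queries arrive with unusually regular timing, so a low variance of inter-request intervals flags a suspicious peer. The threshold would be calibrated on benign tracker traffic; this sketch assumes it is given.

    import numpy as np

    def looks_like_abt_worm(request_times, var_threshold):
        # request_times: arrival times of one peer's tracker requests.
        intervals = np.diff(np.sort(np.asarray(request_times)))
        return intervals.var() < var_threshold   # too regular -> likely automated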