Arrow Research search

Author name cluster

Hang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

JBHI Journal 2026 Journal Article

CMIS: A Class-Aware Multi-Structure Instance Segmentation Model for Fetal Brain Ultrasound Images With Fuzzy Region-Based Constraints

  • Hang Wang
  • Mingxing Duan
  • Yuhuan Lu
  • Bin Pu
  • Yue Qin
  • Shuihua Wang
  • Kenli Li

Fetal anatomical structure segmentation in ultrasound images is essential for biometric measurement and disease diagnosis. However, current methods focus on a specific plane or a few structures, whereas obstetricians diagnose by considering multiple structures from different planes. In addition, existing methods struggle with segmenting fuzzy regions, which leads to performance degradation. We propose a real-time segmentation method called Class-aware Multi-structure Instance Segmentation (CMIS), designed to segment 19 key structures in 3 fetal brain planes to support brain-disease diagnosis. We extract instance information and generate class-aware attention for each class instead of dense instances to save computing resources and provide more informative details. Then we implement cross-layer and multi-scale fusion to obtain detailed prototypes. Finally, we fuse global attention with local prototypes cropped by boxes to generate masks and randomly perturb the boxes during training to enhance robustness. Moreover, we propose a new fuzzy region-based constraint loss to address the challenge of structures with varying scales and fuzzy boundaries. Extensive experiments on a fetal brain dataset demonstrate that CMIS outperforms 13 competing baselines, with an mDice of 83.41 $\pm$ 0.03% at 37 FPS. CMIS also excels in external experiments on a fetal heart ultrasound dataset, achieving an mDice of 85.73 $\pm$ 0.02%. These results demonstrate the effectiveness of CMIS in segmenting complex anatomical structures in ultrasound and its potential for real-time clinical applications. CMIS is limited to 2D normal standard planes ($\geq$ 19 weeks). Thus, its generalization to abnormal cases and broader datasets remains to be investigated.

AAAI Conference 2026 Conference Paper

QRShield: Exploiting Vulnerabilities of Latent Diffusion Models for Preventing AI Art Plagiarism

  • Xunyue Mo
  • Weibin Wu
  • Qingrui Tu
  • Hang Wang
  • Junxi He
  • Zibin Zheng

Latent Diffusion Models (LDMs) have achieved remarkable success in image generation tasks, yet their low barrier to customization poses severe threats related to art plagiarism. As a countermeasure, adversarial methods have been proposed to protect artworks from plagiarism. However, current methods suffer from limited effectiveness, high cost, and complex optimization. Moreover, their exploration and exploitation of LDM vulnerabilities remain limited, restricting effectiveness and applicability. To address this issue, we analyze the VAE and U-Net components of LDMs, revealing their vulnerabilities. Specifically, we study the response of U-Net to specific structural and frequency patterns in the latent space and find that it is susceptible to high-frequency and periodic latent features. Furthermore, we observe channel correlations during the VAE encoding process. Inspired by these, we propose QRShield, an efficient protection method that exploits the vulnerabilities of LDMs. By constructing high-frequency and periodic features consistent across latent channels and combining them with a momentum-based translation-invariant attack strategy, QRShield achieves stronger and more efficient protection. QRShield significantly improves protection performance in various fine-tuning settings, with over 10% gains in multiple metrics, a threefold increase in generation speed, and nearly 50% reduction in memory usage. Therefore, our work offers a more practical method to prevent AI art plagiarism.

ICLR Conference 2025 Conference Paper

AdaWM: Adaptive World Model based Planning for Autonomous Driving

  • Hang Wang
  • Xin Ye
  • Feng Tao
  • Chenbin Pan
  • Abhirup Mallik
  • Burhaneddin Yaman
  • Liu Ren
  • Junshan Zhang

World model based reinforcement learning (RL) has emerged as a promising approach for autonomous driving, which learns a latent dynamics model and uses it to train a planning policy. To speed up the learning process, the pretrain-finetune paradigm is often used, where online RL is initialized by a pretrained model and a policy learned offline. However, naively performing such initialization in RL may result in dramatic performance degradation during the online interactions in the new task. To tackle this challenge, we first analyze the performance degradation and identify two primary root causes therein: the mismatch of the planning policy and the mismatch of the dynamics model, due to distribution shift. We further analyze the effects of these factors on performance degradation during finetuning, and our findings reveal that the choice of finetuning strategies plays a pivotal role in mitigating these effects. We then introduce AdaWM, an Adaptive World Model based planning method, featuring two key steps: (a) mismatch identification, which quantifies the mismatches and informs the finetuning strategy, and (b) alignment-driven finetuning, which selectively updates either the policy or the model as needed using efficient low-rank updates. Extensive experiments on the challenging CARLA driving tasks demonstrate that AdaWM significantly improves the finetuning process, resulting in more robust and efficient performance in autonomous driving systems.
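The "efficient low-rank updates" mentioned in the abstract above can be illustrated with a generic LoRA-style adapter; this is only a minimal sketch under assumptions I am making (rank, initialization, and which module gets the update are not specified by the abstract, and this is not the authors' implementation):

```python
import numpy as np

class LowRankLinear:
    """A frozen weight matrix W plus a trainable low-rank residual A @ B.

    Selectively finetuning either the policy or the dynamics model with a
    residual of this form touches only rank * (d_in + d_out) parameters
    per layer instead of d_in * d_out. Rank 4 is an illustrative choice.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = weight.shape
        self.weight = weight                        # frozen pretrained weights
        self.A = rng.normal(0.0, 0.01, (d_out, rank))
        self.B = np.zeros((rank, d_in))             # zero init: A @ B = 0, so the
                                                    # adapted layer starts equal to W

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # y = (W + A B) x ; only A and B would receive gradient updates
        return (self.weight + self.A @ self.B) @ x
```

Because `B` starts at zero, finetuning begins exactly at the pretrained behavior and drifts only as far as the low-rank residual allows, which matches the spirit of a selective, mismatch-driven update.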

NeurIPS Conference 2024 Conference Paper

Breaking Semantic Artifacts for Generalized AI-generated Image Detection

  • Chende Zheng
  • Chenhao Lin
  • Zhengyu Zhao
  • Hang Wang
  • Xu Guo
  • Shuai Liu
  • Chao Shen

With the continuous evolution of AI-generated images, the generalized detection of them has become a crucial aspect of AI security. Existing detectors have focused on cross-generator generalization, while it remains unexplored whether these detectors can generalize across different image scenes, e.g., images from different datasets with different semantics. In this paper, we reveal that existing detectors suffer from substantial Accuracy drops in such cross-scene generalization. In particular, we attribute their failures to "semantic artifacts" in both real and generated images, to which detectors may overfit. To break such "semantic artifacts", we propose a simple yet effective approach based on conducting an image patch shuffle and then training an end-to-end patch-based classifier. We conduct a comprehensive open-world evaluation on 31 test sets, covering 7 Generative Adversarial Networks, 18 (variants of) Diffusion Models, and another 6 CNN-based generative models. The results demonstrate that our approach outperforms previous approaches by 2.08\% (absolute) on average regarding cross-scene detection Accuracy. We also notice the superiority of our approach in open-world generalization, with an average Accuracy improvement of 10.59\% (absolute) across all test sets. Our code is available at https://github.com/Zig-HS/FakeImageDetection.

AAAI Conference 2024 Conference Paper

Intrinsic Phase-Preserving Networks for Depth Super Resolution

  • Xuanhong Chen
  • Hang Wang
  • Jialiang Chen
  • Kairui Feng
  • Jinfan Liu
  • Xiaohang Wang
  • Weimin Zhang
  • Bingbing Ni

Depth map super-resolution (DSR) plays an indispensable role in 3D vision. We discover a non-trivial spectral phenomenon: the components of high-resolution (HR) and low-resolution (LR) depth maps manifest the same intrinsic phase, and the spectral phase of RGB is a superset of them, which suggests that a phase-aware filter can assist in the precise use of RGB cues. Motivated by this, we propose an intrinsic phase-preserving DSR paradigm, named IPPNet, to fully exploit inter-modality collaboration in a mutually guided way. In a nutshell, a novel Phase-Preserving Filtering Module (PPFM) is developed to generate dynamic phase-aware filters according to the LR depth flow to filter out erroneous noisy components contained in RGB and then conduct depth enhancement via the modulation of the phase-preserved RGB signal. By stacking multiple PPFM blocks, the proposed IPPNet is capable of reaching a highly competitive restoration performance. Extensive experiments on various benchmark datasets, e.g., NYU v2, RGB-D-D, reach SOTA performance and also well demonstrate the validity of the proposed phase-preserving scheme. Code: https://github.com/neuralchen/IPPNet/.

JBHI Journal 2024 Journal Article

SKGC: A General Semantic-Level Knowledge Guided Classification Framework for Fetal Congenital Heart Disease

  • Yuhuan Lu
  • Guanghua Tan
  • Bin Pu
  • Hang Wang
  • Bocheng Liang
  • Kenli Li
  • Jagath C. Rajapakse

Congenital heart disease (CHD) is the most common congenital disability affecting healthy development and growth, even resulting in pregnancy termination or fetal death. Recently, deep learning techniques have made remarkable progress to assist in diagnosing CHD. One very popular method is directly classifying fetal ultrasound images, recognized as abnormal and normal, which tends to focus more on global features and neglects semantic knowledge of anatomical structures. The other approach is segmentation-based diagnosis, which requires a large number of pixel-level annotation masks for training. However, the detailed pixel-level segmentation annotation is costly or even unavailable. Based on the above analysis, we propose SKGC, a universal framework to identify normal or abnormal four-chamber heart (4CH) images, guided by a few annotation masks, while improving accuracy remarkably. SKGC consists of a semantic-level knowledge extraction module (SKEM), a multi-knowledge fusion module (MFM), and a classification module (CM). SKEM is responsible for obtaining high-level semantic knowledge, serving as an abstract representation of the anatomical structures that obstetricians focus on. MFM is a lightweight but efficient module that fuses semantic-level knowledge with the original specific knowledge in ultrasound images. CM classifies the fused knowledge and can be replaced by any advanced classifier. Moreover, we design a new loss function that enhances the constraint between the foreground and background predictions, improving the quality of the semantic-level knowledge. Experimental results on the collected real-world NA-4CH and the publicly available FEST datasets show that SKGC achieves impressive performance with the best accuracy of 99.68% and 95.40%, respectively. Notably, the accuracy improves from 74.68% to 88.14% using only 10 labeled masks.

IJCAI Conference 2024 Conference Paper

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis

  • Zhoulin Ji
  • Chenhao Lin
  • Hang Wang
  • Chao Shen

Detecting synthetic from real speech is increasingly crucial due to the risks of misinformation and identity impersonation. While various datasets for synthetic speech analysis have been developed, they often focus on specific areas, limiting their utility for comprehensive research. To fill this gap, we propose the Speech-Forensics dataset by extensively covering authentic, synthetic, and partially forged speech samples that include multiple segments synthesized by different high-quality algorithms. Moreover, we propose a TEmporal Speech LocalizaTion network, called TEST, aiming at simultaneously performing authenticity detection, multiple fake segments localization, and synthesis algorithms recognition, without any complex post-processing. TEST effectively integrates LSTM and Transformer to extract more powerful temporal speech representations and utilizes dense prediction on multi-scale pyramid features to estimate the synthetic spans. Our model achieves an average mAP of 83.55% and an EER of 5.25% at the utterance level. At the segment level, it attains an EER of 1.07% and a 92.19% F1 score. These results highlight the model's robust capability for a comprehensive analysis of synthetic speech, offering a promising avenue for future research and practical applications in this field.

AAAI Conference 2023 Conference Paper

Learning Continuous Depth Representation via Geometric Spatial Aggregator

  • Xiaohang Wang
  • Xuanhong Chen
  • Bingbing Ni
  • Zhengyan Tong
  • Hang Wang

Depth map super-resolution (DSR) has been a fundamental task for 3D computer vision. While arbitrary scale DSR is a more realistic setting in this scenario, previous approaches predominantly suffer from the issue of inefficient real-numbered scale upsampling. To explicitly address this issue, we propose a novel continuous depth representation for DSR. The heart of this representation is our proposed Geometric Spatial Aggregator (GSA), which exploits a distance field modulated by arbitrarily upsampled target gridding, through which the geometric information is explicitly introduced into feature aggregation and target generation. Furthermore, bricking with GSA, we present a transformer-style backbone named GeoDSR, which possesses a principled way to construct the functional mapping between local coordinates and the high-resolution output results, empowering our model with the advantage of arbitrary shape transformation ready to help diverse zooming demand. Extensive experimental results on standard depth map benchmarks, e.g., NYU v2, have demonstrated that the proposed framework achieves significant restoration gain in arbitrary scale depth map super-resolution compared with the prior art. Our codes are available at https://github.com/nana01219/GeoDSR.

NeurIPS Conference 2023 Conference Paper

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

  • Cheng Cheng
  • Lin Song
  • Ruoyi Xue
  • Hang Wang
  • Hongbin Sun
  • Yixiao Ge
  • Ying Shan

The contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts, enabling effective zero-shot image recognition. Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and the risk of overfitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner. With a few training samples, our method can enable effective few-shot learning capabilities and generalize to unseen data or tasks without additional fine-tuning, achieving competitive performance and high efficiency. Without bells and whistles, our approach outperforms the state-of-the-art online few-shot learning method by an average of 3.6\% on eight image classification datasets with higher inference speed. Furthermore, our model is simple and flexible, serving as a plug-and-play module directly applicable to downstream tasks. Without further fine-tuning, Meta-Adapter obtains notable performance improvements in open-vocabulary object detection and segmentation tasks.

ICML Conference 2023 Conference Paper

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

  • Hang Wang
  • Sen Lin 0001
  • Junshan Zhang

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can be improved quickly in some cases but become stagnant in other cases, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of "whether and when online learning can be significantly accelerated by a warm-start policy from offline RL?". Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the approximation errors on the finite-time learning performance with inaccurate Actor/Critic updates. Under some general technical conditions, we derive the upper bounds, which shed light on achieving the desired finite-time learning performance in the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithm bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of the bias and error propagation.

AAAI Conference 2022 Conference Paper

Cross-Species 3D Face Morphing via Alignment-Aware Controller

  • Xirui Yan
  • Zhenbo Yu
  • Bingbing Ni
  • Hang Wang

We address cross-species 3D face morphing (i.e., 3D face morphing from human to animal), a novel problem with promising applications in social media and the movie industry. It remains challenging to preserve target structural information and source fine-grained facial details simultaneously. To this end, we propose an Alignment-aware 3D Face Morphing (AFM) framework, which builds semantic-adaptive correspondence between source and target faces across species, via an alignment-aware controller mesh (Explicit Controller, EC) with explicit source/target mesh binding. Based on EC, we introduce Controller-Based Mapping (CBM), which builds semantic consistency between source and target faces according to the semantic importance of different face regions. Additionally, an inference-stage coarse-to-fine strategy is exploited to produce fine-grained meshes with rich facial details from rough meshes. Extensive experimental results on multiple people and animals demonstrate that our method produces high-quality deformation results.

NeurIPS Conference 2021 Conference Paper

Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

  • Hang Wang
  • Sen Lin
  • Junshan Zhang

The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial, because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors accordingly. Motivated by the theoretic findings, we advocate that the ensemble method can be combined with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments are carried out to show that AdaEQ improves learning performance over existing methods on the MuJoCo benchmark.
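The two-step loop this abstract describes (error feedback, then ensemble size adaptation) can be sketched roughly as follows. This is an illustrative assumption about the mechanics, not AdaEQ's actual update rules: min-over-a-random-subset targets are a common ensemble trick, and the grow/shrink rule and tolerance here are placeholders for the paper's bias-bound-driven adaptation.

```python
import numpy as np

def ensemble_target(q_values: np.ndarray, k: int, rng=None) -> float:
    """Pessimistic TD target: minimum over a random subset of k Q-estimates.

    A larger k makes the target more pessimistic (pushes the bias down);
    a smaller k makes it more optimistic (pushes the bias up).
    """
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(q_values), size=k, replace=False)
    return float(q_values[idx].min())

def adapt_ensemble_size(k: int, bias_estimate: float, n_total: int,
                        tol: float = 0.05) -> int:
    """Error-feedback step: grow the subset on measured overestimation,
    shrink it on underestimation, clipped to [2, n_total]. The +/-1 step
    and tolerance are illustrative choices, not the paper's."""
    if bias_estimate > tol:
        k += 1      # overestimating: add pessimism
    elif bias_estimate < -tol:
        k -= 1      # underestimating: remove pessimism
    return int(np.clip(k, 2, n_total))
```

The key point the sketch captures is that the ensemble size is a control knob driven by a running estimate of the bias, rather than a fixed hyperparameter.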

AAMAS Conference 2021 Conference Paper

Distributed Q-Learning with State Tracking for Multi-agent Networked Control

  • Hang Wang
  • Sen Lin
  • Hamid Jafarkhani
  • Junshan Zhang

This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) in a multi-agent network. Existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents. Specifically, we assume that agents maintain local estimates of the global state based on their local information and communications with neighbors. At each step, every agent updates its local estimate of the global state, based on which it solves an approximate Q-factor locally through policy iteration. Assuming a decaying injected excitation noise during the policy evaluation, we prove that the local estimation converges to the true global state, and establish the convergence of the proposed distributed ST-based Q-learning algorithm. The experimental studies corroborate our theoretical results by showing that our proposed method achieves comparable performance with the centralized case.

ICML Conference 2021 Conference Paper

Self-supervised Graph-level Representation Learning with Local and Global Structure

  • Minghao Xu
  • Hang Wang
  • Bingbing Ni
  • Hongyu Guo
  • Jian Tang 0005

This paper studies unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks such as molecule properties prediction in drug and material discovery. Existing methods mainly focus on preserving the local similarity structure between different graph instances but fail to discover the global semantic structure of the entire data set. In this paper, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Specifically, besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters. An efficient online expectation-maximization (EM) algorithm is further developed for learning the model. We evaluate GraphLoG by pre-training it on massive unlabeled graphs followed by fine-tuning on downstream tasks. Extensive experiments on both chemical and biological benchmark data sets demonstrate the effectiveness of the proposed approach.