Arrow Research

Author name cluster

Yanwu Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

AAAI 2025 · Conference Paper

Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

  • Yuxiao Wang
  • Wenpeng Neng
  • Zhenao Wei
  • Yu Lei
  • Weiying Xue
  • Nan Zhuang
  • Yanwu Xu
  • Xinyu Jiang

Human-object contact (HOT) detection aims to accurately identify the areas where humans and objects come into contact. Current methods often fail to account for scenarios where objects block the view, resulting in inaccurate identification of contact areas. To tackle this problem, we propose a perspective-interaction HOT detector called PIHOT, which utilizes a depth map generation model to provide depth information about humans and objects relative to the camera, thereby preventing false interaction detection. Furthermore, we use mask dilation and object restoration techniques to restore the texture details in covered areas, improve the boundaries between objects, and enhance the perception of humans interacting with objects. Moreover, a spatial-awareness perception module is designed to concentrate on characteristic features close to the points of contact. Experimental results show that PIHOT achieves state-of-the-art performance on three benchmark datasets for HOT detection tasks. Compared to the most recent DHOT, our method enjoys average improvements of 13%, 27.5%, 16%, and 18.5% on the SC-Acc., C-Acc., mIoU, and wIoU metrics, respectively.
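
The occlusion cue above boils down to gating human-object feature interactions by depth agreement. A minimal sketch of that idea (hypothetical function and tensor names, not the authors' PIHOT code):

```python
import torch

def depth_gated_interaction(human_feat, object_feat, human_depth, object_depth, tau=0.1):
    """human_feat / object_feat: (N, C) region features;
    human_depth / object_depth: (N,) mean depths from a monocular depth model."""
    # Pairs whose estimated depths disagree are likely occluded, not in contact,
    # so their interaction features are suppressed.
    gate = torch.exp(-torch.abs(human_depth - object_depth) / tau)  # (N,) in (0, 1]
    return gate.unsqueeze(-1) * human_feat * object_feat

# Toy usage: 4 candidate human-object pairs with 16-dim features.
h, o = torch.randn(4, 16), torch.randn(4, 16)
hd, od = torch.rand(4), torch.rand(4)
print(depth_gated_interaction(h, o, hd, od).shape)  # torch.Size([4, 16])
```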

AAAI 2025 · Conference Paper

Semi-IIN: Semi-Supervised Intra-Inter Modal Interaction Learning Network for Multimodal Sentiment Analysis

  • Jinhao Lin
  • Yifei Wang
  • Yanwu Xu
  • Qi Liu

Although multimodal sentiment analysis is a fertile research ground that merits further investigation, current approaches incur high annotation costs and suffer from label ambiguity, which hampers the acquisition of high-quality labeled data. Furthermore, choosing the right interactions is essential because the significance of intra- and inter-modal interactions can differ across samples. To this end, we propose Semi-IIN, a Semi-supervised Intra-inter modal Interaction learning Network for multimodal sentiment analysis. Semi-IIN integrates masked attention and gating mechanisms, enabling effective dynamic selection after independently capturing intra- and inter-modal interactive information. Combined with a self-training approach, Semi-IIN fully utilizes the knowledge learned from unlabeled data. Experimental results on two public datasets, MOSI and MOSEI, demonstrate the effectiveness of Semi-IIN, establishing a new state of the art on several metrics.
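
The dynamic selection step can be pictured as a learned gate mixing two attention branches. A hedged sketch under that reading (module names are mine, not the released Semi-IIN code):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Per-sample gate that mixes intra-modal and inter-modal features."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, intra, inter):
        # intra / inter: (B, dim) features from separate attention branches.
        g = self.gate(torch.cat([intra, inter], dim=-1))  # (B, dim) in (0, 1)
        return g * intra + (1 - g) * inter                # dynamic selection

fusion = GatedFusion(dim=64)
print(fusion(torch.randn(8, 64), torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```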

AAAI 2024 · Conference Paper

MedSegDiff-V2: Diffusion-Based Medical Image Segmentation with Transformer

  • Junde Wu
  • Wei Ji
  • Huazhu Fu
  • Min Xu
  • Yueming Jin
  • Yanwu Xu

The Diffusion Probabilistic Model (DPM) has recently gained popularity in the field of computer vision, thanks to its image generation applications, such as Imagen, Latent Diffusion Models, and Stable Diffusion, which have demonstrated impressive capabilities and sparked much discussion within the community. Recent investigations have further unveiled the utility of DPM in the domain of medical image analysis, as underscored by the commendable performance of medical image segmentation models based on it across various tasks. Although these models were originally underpinned by a UNet architecture, there exists a potential avenue for enhancing their performance through the integration of vision transformer mechanisms. However, we discovered that simply combining these two models resulted in subpar performance. To effectively integrate these two cutting-edge techniques for medical image segmentation, we propose a novel transformer-based diffusion framework, called MedSegDiff-V2. We verify its effectiveness on 20 medical image segmentation tasks with different image modalities. Through comprehensive evaluation, our approach demonstrates superiority over prior state-of-the-art (SOTA) methodologies. Code is released at https://github.com/KidsWithTokens/MedSegDiff.
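
For intuition only, here is one way a reverse-diffusion step over a segmentation mask can be conditioned on the image; the real MedSegDiff-V2 conditioning uses the transformer blocks described in the paper and the linked repository:

```python
import torch
import torch.nn as nn

class TinyCondDenoiser(nn.Module):
    """Predicts the noise on a noisy mask, conditioned on image features."""
    def __init__(self, ch=8):
        super().__init__()
        self.img_enc = nn.Conv2d(1, ch, 3, padding=1)  # image conditioning branch
        self.net = nn.Conv2d(ch + 1, 1, 3, padding=1)  # noise prediction head

    def forward(self, noisy_mask, image):
        # Timestep embedding omitted for brevity.
        cond = self.img_enc(image)                     # (B, ch, H, W)
        return self.net(torch.cat([noisy_mask, cond], dim=1))

model = TinyCondDenoiser()
eps_hat = model(torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32))
print(eps_hat.shape)  # torch.Size([2, 1, 32, 32])
```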

NeurIPS 2024 · Conference Paper

SF-V: Single Forward Video Generation Model

  • Zhixing Zhang
  • Yanyu Li
  • Yushu Wu
  • Yanwu Xu
  • Anil Kag
  • Ivan Skorokhodov
  • Willi Menapace
  • Aliaksandr Siarohin

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through an iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality with significantly reduced computational overhead for the denoising process (i.e., around a $23\times$ speedup compared with SVD and a $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing.
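
Conceptually, the recipe replaces the sampling loop with one student forward pass trained against a video discriminator. A toy sketch of those two pieces (assumed shapes and stand-in modules, not the SF-V training code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Conv3d(3, 3, 3, padding=1)  # stands in for the fine-tuned SVD UNet
disc = nn.Sequential(nn.Conv3d(3, 4, 3, padding=1), nn.Flatten(), nn.LazyLinear(1))

noise = torch.randn(1, 3, 4, 16, 16)     # (batch, channels, frames, H, W)
real = torch.randn(1, 3, 4, 16, 16)

fake = student(noise)                    # one forward pass, no iterative sampling
# Hinge losses over a spatio-temporal discriminator.
d_loss = F.relu(1 - disc(real)).mean() + F.relu(1 + disc(fake.detach())).mean()
g_loss = -disc(fake).mean()
print(d_loss.item(), g_loss.item())
```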

NeurIPS 2023 · Conference Paper

Semi-Implicit Denoising Diffusion Models (SIDDMs)

  • Yanwu Xu
  • Mingming Gong
  • Shaoan Xie
  • Wei Wei
  • Matthias Grundmann
  • Kayhan Batmanghelich
  • Tingbo Hou

Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
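
A loss-level sketch of the stated decomposition, as I read it (a paraphrase with toy modules, not the paper's objective): an adversarial term matches the implicit marginal of the denoised sample, while an explicit term exploits the analytic Gaussian forward kernel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Flatten(), nn.Linear(16, 1))  # toy critic on x_{t-1}

def siddm_style_losses(x_prev_real, x_prev_fake, forward_mean, forward_std):
    # Implicit part: adversarial (non-saturating) matching of the x_{t-1} marginal.
    d_loss = F.softplus(-disc(x_prev_real)).mean() + F.softplus(disc(x_prev_fake)).mean()
    # Explicit part: use the known Gaussian forward kernel q(x_t | x_{t-1})
    # rather than leaving the reverse step fully implicit.
    cond_match = (((x_prev_fake - forward_mean) / forward_std) ** 2).mean()
    return d_loss, cond_match

x_real, x_fake = torch.randn(8, 16), torch.randn(8, 16)
d_loss, cond_match = siddm_style_losses(x_real, x_fake, torch.zeros(8, 16), 1.0)
print(d_loss.item(), cond_match.item())
```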

JBHI 2022 · Journal Article

Hierarchical Amortized GAN for 3D High Resolution Medical Image Synthesis

  • Li Sun
  • Junxiang Chen
  • Yanwu Xu
  • Mingming Gong
  • Ke Yu
  • Kayhan Batmanghelich

Generative Adversarial Networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of Graphics Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models either cannot scale to high resolution or are prone to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by using different configurations for training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: first, the memory demand for training on high-resolution images is amortized among sub-volumes; second, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation. We also demonstrate clinical applications of the proposed model in data augmentation and clinically relevant feature extraction.
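
The memory-amortization trick can be illustrated in a few lines: each training step sees a full low-resolution volume plus one randomly anchored high-resolution sub-volume. A sketch with assumed shapes (not the released HA-GAN code):

```python
import torch
import torch.nn.functional as F

def training_views(volume, scale=4, sub=32):
    """volume: (B, 1, D, H, W) high-resolution 3D image."""
    low = F.interpolate(volume, scale_factor=1 / scale, mode="trilinear")
    d = torch.randint(0, volume.shape[2] - sub + 1, (1,)).item()
    sub_vol = volume[:, :, d:d + sub]  # random sub-volume along depth
    return low, sub_vol, d             # d anchors the sub-volume to `low`

vol = torch.randn(1, 1, 128, 128, 128)
low, sub_vol, d = training_views(vol)
print(low.shape, sub_vol.shape)  # (1, 1, 32, 32, 32) (1, 1, 32, 128, 128)
```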

AAAI 2020 · Conference Paper

Compressed Self-Attention for Deep Metric Learning

  • Ziye Chen
  • Mingming Gong
  • Yanwu Xu
  • Chaohui Wang
  • Kun Zhang
  • Bo Du

In this paper, we aim to enhance the self-attention (SA) mechanism for deep metric learning in visual perception by capturing richer contextual dependencies in visual data. To this end, we propose a novel module, named compressed self-attention (CSA), which significantly reduces the computation and memory cost with a negligible decrease in accuracy with respect to the original SA mechanism, thanks to the following two characteristics: i) it only needs to compute a small number of base attention maps for a small number of base feature vectors; and ii) the output at each spatial location can be simply obtained by an adaptive weighted average of the outputs calculated from the base attention maps. The high computational efficiency of CSA enables application to high-resolution shallow layers in convolutional neural networks with little additional cost. In addition, CSA makes it practical to further partition the feature maps into groups along the channel dimension and compute attention maps for the features in each group separately, thus increasing the diversity of long-range dependencies and accordingly boosting accuracy. We evaluate the performance of CSA via extensive experiments on two metric learning tasks: person re-identification and local descriptor learning. Qualitative and quantitative comparisons with the latest methods demonstrate the significance of CSA for these tasks.
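
A rough sketch of characteristics i) and ii) (hypothetical module, not the paper's implementation): attention is computed only for k base vectors, and each location adaptively mixes the k base outputs:

```python
import torch
import torch.nn as nn

class CompressedSelfAttention(nn.Module):
    def __init__(self, dim, k=4):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(k, dim))  # k base query vectors
        self.mix = nn.Linear(dim, k)                    # per-location mixing weights

    def forward(self, x):  # x: (B, N, dim), N spatial locations
        attn = torch.softmax(self.bases @ x.transpose(1, 2), dim=-1)  # (B, k, N)
        base_out = attn @ x                             # (B, k, dim): k base outputs
        w = torch.softmax(self.mix(x), dim=-1)          # (B, N, k) adaptive weights
        return w @ base_out                             # (B, N, dim)

csa = CompressedSelfAttention(dim=32)
print(csa(torch.randn(2, 100, 32)).shape)  # torch.Size([2, 100, 32])
```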

AAAI 2020 · Conference Paper

Generative-Discriminative Complementary Learning

  • Yanwu Xu
  • Mingming Gong
  • Junxiang Chen
  • Tongliang Liu
  • Kun Zhang
  • Kayhan Batmanghelich

The majority of state-of-the-art deep learning methods are discriminative approaches, which model the conditional distribution of labels given input features. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, which we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high-quality instances in spite of weak supervision. In addition to extensive empirical studies, we theoretically show that our model can retrieve the true conditional distribution from the complementarily labeled data.
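
The complementary-label signal alone can be made concrete in a few lines: push probability mass away from the class the annotator rejected. A toy sketch (my simplification; CCGAN additionally couples this with a conditional GAN):

```python
import torch

def complementary_loss(logits, comp_labels):
    """comp_labels[i] is a class the i-th instance does NOT belong to."""
    p = torch.softmax(logits, dim=-1)
    p_bar = p.gather(1, comp_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1 - p_bar + 1e-8).mean()  # push mass off the rejected class

logits = torch.randn(4, 10, requires_grad=True)
loss = complementary_loss(logits, torch.tensor([3, 7, 0, 9]))
loss.backward()
print(loss.item())
```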

AAAI 2019 · Conference Paper

Oversampling for Imbalanced Data via Optimal Transport

  • Yuguang Yan
  • Mingkui Tan
  • Yanwu Xu
  • Jiezhang Cao
  • Michael Ng
  • Huaqing Min
  • Qingyao Wu

The issue of data imbalance occurs in many real-world applications, especially in medical diagnosis, where normal cases usually far outnumber abnormal cases. To alleviate this issue, one of the most important approaches is oversampling, which seeks to synthesize minority-class samples to balance the numbers of different classes. However, existing methods barely consider the global geometric information involved in the distribution of minority-class samples, and thus may incur a distribution mismatch between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of the data to make synthetic samples follow a distribution similar to that of the minority-class samples. Moreover, we introduce a novel regularization based on the synthetic samples and shift the distribution of the minority-class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of our proposed method in terms of multiple metrics.
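
As a toy illustration of the transport idea (my simplification, not the paper's algorithm), one can project random seed points onto the minority distribution through an entropy-regularized transport plan:

```python
import numpy as np

def sinkhorn_plan(a, b, M, reg=0.1, iters=200):
    """Entropy-regularized OT plan between histograms a (n,) and b (m,)."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # (n, m) transport plan

minority = np.random.randn(20, 2)       # minority-class samples
seeds = np.random.randn(30, 2) * 2      # initial synthetic candidates
M = ((seeds[:, None] - minority[None]) ** 2).sum(-1)
M /= M.max()                            # normalize costs for numerical stability
P = sinkhorn_plan(np.full(30, 1 / 30), np.full(20, 1 / 20), M)
synthetic = (P @ minority) / P.sum(1, keepdims=True)  # barycentric projection
print(synthetic.shape)                  # (30, 2)
```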

NeurIPS 2019 · Conference Paper

Twin Auxilary Classifiers GAN

  • Mingming Gong
  • Yanwu Xu
  • Chunyuan Li
  • Kun Zhang
  • Kayhan Batmanghelich

Conditional generative models have enjoyed significant progress over the past few years. One popular conditional model is the Auxiliary Classifier GAN (AC-GAN), which generates highly discriminative images by extending the loss function of GAN with an auxiliary classifier. However, the diversity of the samples generated by AC-GAN tends to decrease as the number of classes increases. In this paper, we identify the source of the low-diversity issue theoretically and propose a practical solution to the problem. We show that the auxiliary classifier in AC-GAN imposes perfect separability, which is disadvantageous when the supports of the class distributions have significant overlap. To address the issue, we propose the Twin Auxiliary Classifiers Generative Adversarial Net (TAC-GAN), which adds a new player that interacts with the other players (the generator and the discriminator) in GAN. Theoretically, we demonstrate that TAC-GAN can effectively minimize the divergence between the generated and real data distributions. Extensive experimental results show that TAC-GAN can successfully replicate the true data distributions on simulated data, and significantly improves the diversity of class-conditional image generation on real datasets.
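
At the loss level, the extra player can be sketched as a twin classifier trained on generated samples that the generator, in turn, tries to fool (hypothetical names; see the paper for the exact formulation):

```python
import torch
import torch.nn.functional as F

def tac_generator_loss(d_fake, c_fake_logits, ct_fake_logits, y):
    adv = -d_fake.mean()                        # standard GAN generator term
    aux = F.cross_entropy(c_fake_logits, y)     # AC-GAN auxiliary classifier term
    twin = -F.cross_entropy(ct_fake_logits, y)  # fool the twin classifier,
    return adv + aux + twin                     # counteracting diversity collapse

y = torch.tensor([0, 1, 2])
loss = tac_generator_loss(torch.randn(3), torch.randn(3, 5), torch.randn(3, 5), y)
print(loss.item())
```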