Author name cluster

Qingyao Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers

2 author rows

AAAI Conference 2025 Conference Paper

IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter

Xiaojing Zhong
Zhonghua Wu
Xiaofeng Yang
Guosheng Lin
Qingyao Wu

Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model that realistically portrays the person wearing the desired garment. In this paper, we present IPVTON, a novel image-based 3D virtual try-on framework. IPVTON employs score distillation sampling with image prompts to optimize a hybrid 3D human representation, integrating target garment features into diffusion priors through an image prompt adapter. To avoid interference with non-target areas, we leverage mask-guided image prompt embeddings to focus the image features on the try-on regions. Moreover, we impose geometric constraints on the 3D model with a pseudo silhouette generated by ControlNet, ensuring that the clothed 3D human model retains the shape of the source identity while accurately wearing the target garments. Extensive qualitative and quantitative experiments demonstrate that IPVTON outperforms previous methods in image-based 3D virtual try-on tasks, excelling in both geometry and texture.

NeurIPS Conference 2025 Conference Paper

SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios

Lingwei Dang
Ruizhi Shao
Hongwen Zhang
Wei MIN
Yebin Liu
Qingyao Wu

Hand-Object Interaction (HOI) generation has significant application potential. However, current 3D HOI motion generation approaches heavily rely on predefined 3D object models and lab-captured motion data, limiting generalization capabilities. Meanwhile, HOI video generation methods prioritize pixel-level visual fidelity, often sacrificing physical plausibility. Recognizing that visual appearance and motion patterns share fundamental physical laws in the real world, we propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously. To integrate the heterogeneous semantics, appearance, and motion features, our method implements tri-modal adaptive modulation for feature aligning, coupled with 3D full-attention for modeling inter- and intra-modal dependencies. Furthermore, we introduce a vision-aware 3D interaction diffusion model that generates explicit 3D interaction sequences directly from the synchronized diffusion outputs, then feeds them back to establish a closed-loop feedback cycle. This architecture eliminates dependencies on predefined object models or explicit pose guidance while significantly enhancing video-motion consistency. Experimental results demonstrate our method's superiority over state-of-the-art approaches in generating high-fidelity, dynamically plausible HOI sequences, with notable generalization capabilities in unseen real-world scenarios. Project page at https://droliven.github.io/SViMo_project.

EAAI Journal 2024 Journal Article

A region-based convolutional fusion network for typhoon intensity estimation in satellite images

Pengshuai Yin
Huanxin Chen
Huichou Huang
Hanjing Su
Qingyao Wu
Qilin Wan

AAAI Conference 2024 Conference Paper

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

Xiaofeng Yang
Fayao Liu
Yi Xu
Hanjing Su
Qingyao Wu
Guosheng Lin

In recent years, following the success of text guided image generation, text guided 3D generation has gained increasing attention among researchers. Dreamfusion is a notable approach that enhances generation quality by utilizing 2D text guided diffusion models and introducing SDS loss, a technique for distilling 2D diffusion model information to train 3D models. However, the SDS loss has two major limitations that hinder its effectiveness. Firstly, when given a text prompt, the SDS loss struggles to produce diverse content. Secondly, during training, SDS loss may cause the generated content to overfit and collapse, limiting the model's ability to learn intricate texture details. To overcome these challenges, we propose a novel approach called Noise Recalibration algorithm. By incorporating this technique, we can generate 3D content with significantly greater diversity and stunning details. Our approach offers a promising solution to the limitations of SDS loss.

AAAI Conference 2024 Conference Paper

Spatial-Semantic Collaborative Cropping for User Generated Content

Yukun Su
Yiwen Cao
Jingliang Deng
Fengyun Rao
Qingyao Wu

A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide through the client side (mobile and PC). This requires the cropping algorithms to produce the aesthetic thumbnail within a specific aspect ratio on different devices. However, existing image cropping works mainly focus on landmark or landscape images, which fail to model the relations among the multi-objects with the complex background in UGC. Besides, previous methods merely consider the aesthetics of the cropped images while ignoring the content integrity, which is crucial for UGC cropping. In this paper, we propose a Spatial-Semantic Collaborative cropping network (S2CNet) for arbitrary user generated content accompanied by a new cropping benchmark. Specifically, we first mine the visual genes of the potential objects. Then, the suggested adaptive attention graph recasts this task as a procedure of information association over visual nodes. The underlying spatial and semantic relations are ultimately centralized to the crop candidate through differentiable message passing, which helps our network efficiently preserve both the aesthetics and the content integrity. Extensive experiments on the proposed UGCrop5K and other public datasets demonstrate the superiority of our approach over state-of-the-art counterparts.

AAAI Conference 2024 Conference Paper

Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation

Chaowei Fang
Ziyin Zhou
Junye Chen
Hanjing Su
Qingyao Wu
Guanbin Li

Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing. However, fully extracting the target mask with limited user inputs remains challenging. We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs. Regarding the last segmentation result as the initial mask, an iterative refinement process is commonly employed to continually enhance the initial mask. Nevertheless, conventional techniques suffer from sensitivity to the variance in the initial mask. To circumvent this problem, our proposed method incorporates a mask matching algorithm for ensuring consistent inferences from different types of initial masks. We also introduce a target-aware zooming algorithm to preserve object information during downsampling, balancing efficiency and accuracy. Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.

AAAI Conference 2023 Conference Paper

DENet: Disentangled Embedding Network for Visible Watermark Removal

Ruizhou Sun
Yukun Su
Qingyao Wu

Adding a visible watermark to an image is a common copyright protection method for media. Meanwhile, public research on watermark removal can be utilized as an adversarial technology to help the further development of watermarking. Existing watermark removal methods mainly adopt multi-task learning networks, which locate the watermark and restore the background simultaneously. However, these approaches view the task as an image-to-image reconstruction problem, where they only impose supervision after the final output, making the high-level semantic features shared between different tasks. To this end, inspired by the two-stage coarse-refinement network, we propose a novel contrastive learning mechanism to disentangle the high-level embedding semantic information of the images and watermarks, driving the respective network branch more oriented. Specifically, the proposed mechanism is leveraged for watermark image decomposition, which aims to decouple the clean image and watermark hints in the high-level embedding space. This can guarantee that the learned representation of the restored image enjoys more task-specific cues. In addition, we introduce a self-attention-based enhancement module, which promotes the network's ability to capture semantic information among different regions, leading to further improvement on the contrastive learning mechanism. To validate the effectiveness of our proposed method, extensive experiments are conducted on different challenging benchmarks. Experimental evaluations show that our approach can achieve state-of-the-art performance and yield high-quality images. The code is available at: https://github.com/lianchengmingjue/DENet.

AAAI Conference 2022 Conference Paper

Self-Supervised Object Localization with Joint Graph Partition

Yukun Su
Guosheng Lin
Yun Hao
Yiwen Cao
Wenjun Wang
Qingyao Wu

Object localization aims to generate a tight bounding box for the target object, which is a challenging problem that has been deeply studied in recent years. Since collecting bounding-box labels is time-consuming and laborious, many researchers focus on weakly supervised object localization (WSOL). As the recent appealing self-supervised learning technique shows its powerful function in visual tasks, in this paper, we take the early attempt to explore unsupervised object localization by self-supervision. Specifically, we adopt different geometric transformations to image and utilize their parameters as pseudo labels for self-supervised learning. Then, the class-agnostic activation map is used to highlight the target object potential regions. However, such attention maps merely focus on the most discriminative part of the objects, which will affect the quality of the predicted bounding box. Based on the motivation that the activation maps of different transformations of the same image should be equivariant, we further design a siamese network that encodes the paired images and propose a joint graph partition mechanism in an unsupervised manner to enhance the object co-occurrent regions. To validate the effectiveness of the proposed method, extensive experiments are conducted on CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets. Experimental results show that our method outperforms state-of-the-art methods using the same level of supervision, and even outperforms some weakly supervised methods.
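
The idea of using transformation parameters as pseudo labels can be illustrated with a toy sketch. This is not the paper's pipeline: the abstract only says "different geometric transformations", so the choice of quarter-turn rotations, the nested-list image type, and the helper names `rotate90` and `make_pseudo_labeled_views` are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image as a nested list (assumption)


def rotate90(img: Image) -> Image:
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def make_pseudo_labeled_views(img: Image) -> List[Tuple[Image, int]]:
    """Return (view, label) pairs; label k means k clockwise quarter turns.

    The label is derived from the transformation itself, so no human
    annotation is needed to train a model to predict it.
    """
    views = []
    current = [list(row) for row in img]
    for k in range(4):
        views.append((current, k))
        current = rotate90(current)
    return views


views = make_pseudo_labeled_views([[1, 2], [3, 4]])
for view, label in views:
    print(label, view)
```

A network trained to predict the label from the view has to attend to object structure, which is the property the abstract relies on when it reads object regions off the activation maps.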

NeurIPS Conference 2021 Conference Paper

Debiased Visual Question Answering from Feature and Sample Perspectives

Zhiquan Wen
Guanghui Xu
Mingkui Tan
Qingyao Wu
Qi Wu

Visual question answering (VQA) is designed to examine the visual-textual reasoning ability of an intelligent agent. However, recent observations show that many VQA models may only capture the biases between questions and answers in a dataset rather than showing real reasoning abilities. For example, given a question, some VQA models tend to output the answer that occurs frequently in the dataset and ignore the images. To reduce this tendency, existing methods focus on weakening the language bias. Meanwhile, only a few works also consider vision bias implicitly. However, these methods introduce additional annotations or show unsatisfactory performance. Moreover, not all biases are harmful to the models. Some “biases” learnt from datasets represent natural rules of the world and can help limit the range of answers. Thus, how to filter and remove the true negative biases in language and vision modalities remains a major challenge. In this paper, we propose a method named D-VQA to alleviate the above challenges from the feature and sample perspectives. Specifically, from the feature perspective, we build a question-to-answer and vision-to-answer branch to capture the language and vision biases, respectively. Next, we apply two unimodal bias detection modules to explicitly recognise and remove the negative biases. From the sample perspective, we construct two types of negative samples to assist the training of the models, without introducing additional annotations. Extensive experiments on the VQA-CP v2 and VQA v2 datasets demonstrate the effectiveness of our D-VQA method.

TIST Journal 2020 Journal Article

Domain-attention Conditional Wasserstein Distance for Multi-source Domain Adaptation

Hanrui Wu
Yuguang Yan
Michael K. Ng
Qingyao Wu

Multi-source domain adaptation has received considerable attention due to its effectiveness of leveraging the knowledge from multiple related sources with different distributions to enhance the learning performance. One of the fundamental challenges in multi-source domain adaptation is how to determine the amount of knowledge transferred from each source domain to the target domain. To address this issue, we propose a new algorithm, called Domain-attention Conditional Wasserstein Distance (DCWD), to learn transferred weights for evaluating the relatedness across the source and target domains. In DCWD, we design a new conditional Wasserstein distance objective function by taking the label information into consideration to measure the distance between a given source domain and the target domain. We also develop an attention scheme to compute the transferred weights of different source domains based on their conditional Wasserstein distances to the target domain. After that, the transferred weights can be used to reweight the source data to determine their importance in knowledge transfer. We conduct comprehensive experiments on several real-world data sets, and the results demonstrate the effectiveness and efficiency of the proposed method.
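
As a rough illustration of turning per-source distances into transfer weights, the sketch below applies a softmax to negated distances so that closer source domains receive larger weights. The abstract does not give the form of the attention scheme, so the softmax, the `temperature` parameter, and the function name `attention_weights` are assumptions, and the distance values are made-up inputs, not results from the paper.

```python
import math
from typing import List


def attention_weights(distances: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over negated distances: a smaller distance yields a larger weight."""
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical distances from three source domains to the target domain.
weights = attention_weights([0.5, 1.0, 2.0])
print(weights)
```

The resulting weights sum to one, so they can be used directly to reweight the contribution of each source domain's data.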

IJCAI Conference 2019 Conference Paper

Knowledge-enhanced Hierarchical Attention for Community Question Answering with Multi-task and Adaptive Learning

Min Yang
Lei Chen
Xiaojun Chen
Qingyao Wu
Wei Zhou
Ying Shen

In this paper, we propose a Knowledge-enhanced Hierarchical Attention for community question answering with Multi-task learning and Adaptive learning (KHAMA). First, we propose a hierarchical attention network to fully fuse knowledge from input documents and knowledge base (KB) by exploiting the semantic compositionality of the input sequences. The external factual knowledge helps recognize background knowledge (entity mentions and their relationships) and eliminate noise information from long documents that have sophisticated syntactic and semantic structures. In addition, we build multiple CQA models with adaptive boosting and then combine these models to learn a more effective and robust CQA system. Furthermore, KHAMA is a multi-task learning model. It regards CQA as the primary task and question categorization as the auxiliary task, aiming at learning a category-aware document encoder and enhancing the quality of identifying essential information from long questions. Extensive experiments on two benchmarks demonstrate that KHAMA achieves substantial improvements over the compared methods.

TIST Journal 2019 Journal Article

Online Heterogeneous Transfer Learning by Knowledge Transition

Hanrui Wu
Yuguang Yan
Yuzhong Ye
Huaqing Min
Michael K. Ng
Qingyao Wu

In this article, we study the problem of online heterogeneous transfer learning, where the objective is to make predictions for a target data sequence arriving in an online fashion, and some offline labeled instances from a heterogeneous source domain are provided as auxiliary data. The feature spaces of the source and target domains are completely different, thus the source data cannot be used directly to assist the learning task in the target domain. To address this issue, we take advantage of unlabeled co-occurrence instances as intermediate supplementary data to connect the source and target domains, and perform knowledge transition from the source domain into the target domain. We propose a novel online heterogeneous transfer learning algorithm called Online Heterogeneous Knowledge Transition (OHKT) for this purpose. In OHKT, we first seek to generate pseudo labels for the co-occurrence data based on the labeled source data, and then develop an online learning algorithm to classify the target sequence by leveraging the co-occurrence data with pseudo labels. Experimental results on real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithm.

AAAI Conference 2019 Conference Paper

Oversampling for Imbalanced Data via Optimal Transport

Yuguang Yan
Mingkui Tan
Yanwu Xu
Jiezhang Cao
Michael Ng
Huaqing Min
Qingyao Wu

The issue of data imbalance occurs in many real-world applications especially in medical diagnosis, where normal cases are usually much more than the abnormal cases. To alleviate this issue, one of the most important approaches is the oversampling method, which seeks to synthesize minority class samples to balance the numbers of different classes. However, existing methods barely consider global geometric information involved in the distribution of minority class samples, and thus may incur distribution mismatching between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method by exploiting global geometric information of data to make synthetic samples follow a similar distribution to that of minority class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of our proposed method in terms of multiple metrics.

AAAI Conference 2018 Short Paper

A Stratified Feature Ranking Method for Supervised Feature Selection

Renjie Chen
Xiaojun Chen
Guowen Yuan
Wenya Sun
Qingyao Wu

Most feature selection methods usually select the highest rank features which may be highly correlated with each other. In this paper, we propose a Stratified Feature Ranking (SFR) method for supervised feature selection. In the new method, a Subspace Feature Clustering (SFC) is proposed to identify feature clusters, and a stratified feature ranking method is proposed to rank the features such that the high rank features are lowly correlated. Experimental results show the superiority of SFR.

ICML Conference 2018 Conference Paper

Adversarial Learning with Local Coordinate Coding

Jiezhang Cao
Yong Guo
Qingyao Wu
Chunhua Shen
Junzhou Huang
Mingkui Tan

Generative adversarial networks (GANs) aim to generate realistic data from some prior distribution (e.g., Gaussian noises). However, such prior distribution is often independent of real data and thus may lose semantic information (e.g., geometric structure or content in images) of data. In practice, the semantic information might be represented by some latent distribution learned from data, which, however, is hard to be used for sampling in GANs. In this paper, rather than sampling from the pre-defined prior distribution, we propose a Local Coordinate Coding (LCC) based sampling method to improve GANs. We derive a generalization bound for LCC based GANs and prove that a small dimensional input is sufficient to achieve good generalization. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2018 Conference Paper

Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang
Mingkui Tan
Bohan Zhuang
Jing Liu
Yong Guo
Qingyao Wu
Junzhou Huang
Jinhui Zhu

Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.

AAAI Conference 2018 Conference Paper

Double Forward Propagation for Memorized Batch Normalization

Yong Guo
Qingyao Wu
Chaorui Deng
Jian Chen
Mingkui Tan

Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of data using a single minibatch. Consequently, BN can be unstable when the batch size is very small or the data is poorly sampled. In the inference stage, BN often uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are not consistent. Regarding these issues, we propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each batch, the model parameters will change, and the features will change accordingly, leading to the Distribution Shift before and after the update for the considered batch. To alleviate this issue, we present a simple Double-Forward scheme in MBN which can further improve the performance. Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference. Empirical results show that the MBN based models trained with the Double-Forward scheme greatly reduce the sensitivity of data and significantly improve the generalization performance.
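
The notion of computing normalization statistics from several recent batches can be sketched as follows. This is a simplified illustration under stated assumptions, not the MBN algorithm itself: it averages the per-batch means and variances uniformly over a fixed-size memory, works on a single scalar feature, and omits the Double-Forward scheme and any learned scale or shift. The class name `MemorizedStats` and the `memory` parameter are hypothetical.

```python
from collections import deque
from statistics import fmean, pvariance
from typing import List, Tuple


class MemorizedStats:
    """Keep statistics of the last `memory` batches and average them (assumed uniform weights)."""

    def __init__(self, memory: int = 3) -> None:
        self.means = deque(maxlen=memory)      # per-batch means, oldest dropped first
        self.variances = deque(maxlen=memory)  # per-batch population variances

    def update(self, batch: List[float]) -> Tuple[float, float]:
        """Record the current batch and return the memorized mean and variance."""
        self.means.append(fmean(batch))
        self.variances.append(pvariance(batch))
        return fmean(self.means), fmean(self.variances)

    def normalize(self, batch: List[float], eps: float = 1e-5) -> List[float]:
        """Normalize a batch with the memorized statistics instead of its own."""
        mean, var = self.update(batch)
        return [(x - mean) / (var + eps) ** 0.5 for x in batch]


stats = MemorizedStats(memory=2)
first = stats.update([1.0, 3.0])
second = stats.update([5.0, 7.0])
third = stats.update([9.0, 11.0])  # the first batch has now left the memory
print(first, second, third)
```

Because the statistics come from more than one batch, a single small or poorly sampled batch moves them less than it would in standard BN, which is the stability argument made in the abstract.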

IJCAI Conference 2018 Conference Paper

Semi-Supervised Optimal Transport for Heterogeneous Domain Adaptation

Yuguang Yan
Wen Li
Hanrui Wu
Huaqing Min
Mingkui Tan
Qingyao Wu

Heterogeneous domain adaptation (HDA) aims to exploit knowledge from a heterogeneous source domain to improve the learning performance in a target domain. Since the feature spaces of the source and target domains are different, the transferring of knowledge is extremely difficult. In this paper, we propose a novel semi-supervised algorithm for HDA by exploiting the theory of optimal transport (OT), a powerful tool originally designed for aligning two different distributions. To match the samples between heterogeneous domains, we propose to preserve the semantic consistency between heterogeneous domains by incorporating label information into the entropic Gromov-Wasserstein discrepancy, which is a metric in OT for different metric spaces, resulting in a new semi-supervised scheme. Via the new scheme, the target and transported source samples with the same label are enforced to follow similar distributions. Lastly, based on the Kullback-Leibler metric, we develop an efficient algorithm to optimize the resultant problem. Comprehensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.

IJCAI Conference 2017 Conference Paper

Learning Discriminative Correlation Subspace for Heterogeneous Domain Adaptation

Yuguang Yan
Wen Li
Michael Ng
Mingkui Tan
Hanrui Wu
Huaqing Min
Qingyao Wu

Domain adaptation aims to reduce the effort on collecting and annotating target data by leveraging knowledge from a different source domain. The domain adaptation problem will become extremely challenging when the feature spaces of the source and target domains are different, which is also known as the heterogeneous domain adaptation (HDA) problem. In this paper, we propose a novel HDA method to find the optimal discriminative correlation subspace for the source and target data. The discriminative correlation subspace is inherited from the canonical correlation subspace between the source and target data, and is further optimized to maximize the discriminative ability for the target domain classifier. We formulate a joint objective in order to simultaneously learn the discriminative correlation subspace and the target domain classifier. We then apply an alternating direction method of multiplier (ADMM) algorithm to address the resulting non-convex optimization problem. Comprehensive experiments on two real-world data sets demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods.

IS Journal 2014 Journal Article

Cotransfer Learning Using Coupled Markov Chains with Restart

Qingyao Wu
Michael K. Ng
Yunming Ye

This article studies cotransfer learning, a machine learning strategy that uses labeled data to enhance the classification of different learning spaces simultaneously. The authors model the problem as a coupled Markov chain with restart. The transition probabilities in the coupled Markov chain can be constructed using the intrarelationships based on the affinity metric among instances in the same space, and the interrelationships based on co-occurrence information among instances from different spaces. The learning algorithm computes ranking of labels to indicate the importance of a set of labels to an instance by propagating the ranking score of labeled instances via the coupled Markov chain with restart. Experimental results on benchmark data (multiclass image-text and English-Spanish-French classification datasets) have shown that the learning algorithm is computationally efficient, and effective in learning across different spaces.
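
Score propagation over a Markov chain with restart can be sketched with the generic fixed-point iteration r = (1 - alpha) * P r + alpha * q. The sketch below uses a single small chain rather than the coupled intra- and inter-space chains described in the abstract; the transition matrix, the restart probability `alpha`, and the function name `walk_with_restart` are illustrative assumptions.

```python
from typing import List


def walk_with_restart(
    P: List[List[float]],
    restart: List[float],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate r <- (1 - alpha) * P r + alpha * restart until the L1 change is below tol.

    P is column-stochastic: P[i][j] is the probability of moving from node j to node i.
    `restart` is a probability vector concentrated on the labeled instances.
    """
    n = len(restart)
    r = list(restart)
    for _ in range(max_iter):
        nxt = [
            (1 - alpha) * sum(P[i][j] * r[j] for j in range(n)) + alpha * restart[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Toy chain 0 - 1 - 2, with node 0 standing in for a labeled instance.
P = [
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
]
scores = walk_with_restart(P, restart=[1.0, 0.0, 0.0])
print(scores)
```

The restart term keeps pulling probability mass back to the labeled node, so nodes closer to it in the chain end up with higher ranking scores.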
