Arrow Research search

Author name cluster

Hanzi Wang

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

14 papers
1 author row

Possible papers

14

AAAI Conference 2026 Conference Paper

Joint Implicit and Explicit Language Learning for Pedestrian Attribute Recognition

  • Yukang Zhang
  • Lei Tan
  • Yang Lu
  • Yan Yan
  • Hanzi Wang

Pedestrian attribute recognition (PAR) has received increasing attention due to its wide application in video surveillance and pedestrian analysis. Some text-enhanced methods tackle this task by converting attributes into language descriptions to facilitate interactive learning between attributes and visual images. However, these generic descriptions fail to uniquely describe different pedestrian images, missing individual characteristics. In this paper, we propose a Joint Implicit and Explicit Language Guidance Enhancement Learning (JGEL) method, which converts each pedestrian image into a language description with dual language learning to effectively learn enhanced attribute information. Specifically, we first propose an Implicit Language Guidance Learning (ILGL) stream. It projects visual image features into the text embedding space to generate pseudo-word tokens, implicitly modeling image attributes and providing personalized descriptions. Moreover, we propose an Explicit Attribute Enhancement Learning (EAEL) stream that explicitly aligns the pseudo-word tokens generated by ILGL with pedestrian attributes, effectively anchoring them to the attribute concepts in the text embedding space. Extensive experiments show that JGEL has significant advantages in improving the performance of PAR and the challenging zero-shot PAR task.

AAAI Conference 2026 Conference Paper

SAM2-OV: A Novel Detection-Only Tuning Paradigm for Open-Vocabulary Multi-Object Tracking

  • Yangkai Chen
  • Qiangqiang Wu
  • Guangyao Li
  • Junlong Gao
  • Guanglin Niu
  • Hanzi Wang

Open-vocabulary multi-object tracking (OV-MOT) aims to track objects with unseen categories beyond the training set. While existing methods rely on pseudo video sequences synthesized from static images, they struggle to model realistic motion patterns, resulting in limited association performance in real-world scenarios. To alleviate these issues, we propose SAM2-OV, a novel association-learning-free OV-MOT method that adopts a detection-only tuning paradigm, eliminating the need for synthetic sequences or spatiotemporal supervision and substantially reducing the number of learnable parameters. The core of our method is a Unified Detection Module (UDM), which effectively provides object-level prompts to enable SAM2 for OV-MOT. Enabled by UDM, SAM2-OV is the first to integrate SAM2 for OV-MOT, fully unleashing its zero-shot cross-frame association ability. To further enhance object association under occlusion and abrupt motion, we introduce a Motion Prior Assistance Module (MPAM) that incorporates motion cues into the mask selection process. In addition, a Semantic Enhancement Adapter (SEA) distilled from CLIP is used to improve classification generalization. A sparse prompting strategy is also adopted to reduce computational redundancy by triggering detection only on selected keyframes. As only the detection module is tuned on static images, the overall training process remains simple and efficient. Experiments on the TAO dataset demonstrate that SAM2-OV achieves state-of-the-art performance under the TETA metric, particularly on novel categories. Evaluations on the KITTI dataset show the strong zero-shot cross-domain transferability of our SAM2-OV.

NeurIPS Conference 2025 Conference Paper

Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data

  • Shu Chen
  • HongJun Xu
  • Ruichi Zhang
  • Mengke Li
  • Yonggang Zhang
  • Yang Lu
  • Bo Han
  • Yiu-ming Cheung

In the real world, the observed label distribution of a dataset often mismatches its true distribution due to noisy labels. In this situation, noisy label learning (NLL) methods directly integrated with long-tail learning (LTL) methods tend to fail due to a dilemma: NLL methods normally rely on unbiased model predictions to recover the true distribution by selecting and correcting noisy labels, while LTL methods like logit adjustment depend on the true distribution to adjust biased predictions, leading to a deadlock of mutual dependency as defined in this paper. To address this, we propose Unlocker, a bilevel optimization framework that integrates NLL and LTL methods to iteratively disentangle this deadlock. The inner optimization leverages NLL to train the model, incorporating LTL methods to fairly select and correct noisy labels. The outer optimization adaptively determines an adjustment strength, mitigating model bias from over- or under-adjustment. We also theoretically prove that this bilevel optimization problem is convergent by transforming the outer optimization target into an equivalent problem with a closed-form solution. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our method in alleviating model bias and handling long-tailed noisy-label data. Code is available at https://anonymous.4open.science/r/neurips-2025-anonymous-1015/.

NeurIPS Conference 2025 Conference Paper

WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting

  • Kaitao Huang
  • Yan Yan
  • Jing-Hao Xue
  • Hanzi Wang

3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis, which requires visible regions with high fidelity and occluded regions with realism and multi-view consistency. However, existing methods focus on the reconstruction of visible regions, while the generation of occluded regions relies only on the generative prior of 3D GAN. As a result, the generated occluded regions often exhibit poor quality due to the information loss caused by the low bit-rate latent code. To address this, we introduce the warping-and-inpainting strategy to incorporate image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN. Specifically, we first employ a 3D GAN inversion encoder to project the single-view image into a latent code that serves as the input to 3D GAN. Then, we perform warping to a novel view using the depth map generated by 3D GAN. Finally, we develop a novel SVINet, which leverages the symmetry prior and multi-view image correspondence w.r.t. the same latent code to perform inpainting of occluded regions in the warped image. Quantitative and qualitative experiments demonstrate that our method consistently outperforms several state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

Dynamically Anchored Prompting for Task-Imbalanced Continual Learning

  • Chenxing Hong
  • Yan Jin
  • Zhiqi Kang
  • Yizhou Chen
  • Mengke Li
  • Yang Lu
  • Hanzi Wang

Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity from the perspective of recent prompt-based continual learning methods. On top of the above finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that only maintains a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called boosting anchor and stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by only storing a prompt across the data stream, therefore offering a substantial advantage in rehearsal-free CL. Extensive experiments demonstrate that the proposed DAP results in 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP.

AAAI Conference 2024 Conference Paper

Federated Learning with Extremely Noisy Clients via Negative Distillation

  • Yang Lu
  • Lin Chen
  • Yonggang Zhang
  • Yiliang Zhang
  • Bo Han
  • Yiu-ming Cheung
  • Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and then retains rather than discards them, exploiting them via knowledge distillation. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a 'bad teacher' in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance.

AAAI Conference 2024 Conference Paper

Spatial-Contextual Discrepancy Information Compensation for GAN Inversion

  • Ziqiang Zhang
  • Yan Yan
  • Jing-Hao Xue
  • Hanzi Wang

Most existing GAN inversion methods either achieve accurate reconstruction but lack editability or offer strong editability at the cost of fidelity. Hence, how to balance the distortion-editability trade-off is a significant challenge for GAN inversion. To address this challenge, we introduce a novel spatial-contextual discrepancy information compensation-based GAN-inversion method (SDIC), which consists of a discrepancy information prediction network (DIPN) and a discrepancy information compensation network (DICN). SDIC follows a "compensate-and-edit" paradigm and successfully bridges the gap in image details between the original image and the reconstructed/edited image. On the one hand, DIPN encodes the multi-level spatial-contextual information of the original and initial reconstructed images and then predicts a spatial-contextual guided discrepancy map with two hourglass modules. In this way, a reliable discrepancy map that models the contextual relationship and captures fine-grained image details is learned. On the other hand, DICN incorporates the predicted discrepancy information into both the latent code and the GAN generator with different transformations, generating high-quality reconstructed/edited images. This effectively compensates for the loss of image details during GAN inversion. Both quantitative and qualitative experiments demonstrate that our proposed method achieves an excellent distortion-editability trade-off at a fast inference speed for both image inversion and editing tasks. Our code is available at https://github.com/ZzqLKED/SDIC.

TIST Journal 2023 Journal Article

Explicit State Representation Guided Video-based Pedestrian Attribute Recognition

  • Wei-Qing Lu
  • Hai-Miao Hu
  • Jinzuo Yu
  • Shifeng Zhang
  • Hanzi Wang

Pedestrian attribute recognition aims to generate a structured description of pedestrians, which serves an important role in surveillance. Current works usually assume that the images and the specific pedestrian states, including pedestrian occlusion and pedestrian orientation, are given. However, we argue that current works ignore the guidance of the pedestrian state and cannot achieve appropriate performance, since the appearance feature becomes unreliable under variations of the pedestrian state, which are common in practice. Therefore, this paper proposes Explicit State Representation (ExSR) Guided Pedestrian Attribute Recognition to improve accuracy through state learning and attribute fusion among frames. Firstly, the pedestrian state is explicitly represented by concatenating the pedestrian orientation and occlusion, which can be accurately determined by analyzing the pose. Secondly, a state-aware pedestrian attribute fusion method is proposed and divided into two cases, namely the intra-state case and the inter-state case. In the intra-state case, the appearance feature remains stable, and attribute relations, modeled within a single frame by a graph neural network, are propagated to refine the predictions. In the inter-state case, the state changes, attribute-relation propagation is prevented, and the strengths of attribute recognition in each frame are combined to make a reliable judgment on the invisible region. The experimental results demonstrate that ExSR outperforms the state-of-the-art methods on two public databases, benefiting from the explicit introduction of the state into attribute recognition.

AAAI Conference 2023 Conference Paper

MRCN: A Novel Modality Restitution and Compensation Network for Visible-Infrared Person Re-identification

  • Yukang Zhang
  • Yan Yan
  • Jie Li
  • Hanzi Wang

Visible-infrared person re-identification (VI-ReID), which aims to search identities across different spectra, is a challenging task due to the large cross-modality discrepancy between visible and infrared images. The key to reducing the discrepancy is to filter out identity-irrelevant interference and effectively learn modality-invariant person representations. In this paper, we propose a novel Modality Restitution and Compensation Network (MRCN) to narrow the gap between the two modalities. Specifically, we first reduce the modality discrepancy by using two Instance Normalization (IN) layers. Next, to counteract the discriminative information removed by the IN layers while further reducing modality differences, we propose a Modality Restitution Module (MRM) and a Modality Compensation Module (MCM) to respectively distill modality-irrelevant and modality-relevant features from the removed information. Then, the modality-irrelevant features are used to restitute the normalized visible and infrared features, while the modality-relevant features are used to compensate for the features of the other modality. Furthermore, to better disentangle the modality-relevant features and the modality-irrelevant features, we propose a novel Center-Quadruplet Causal (CQC) loss to encourage the network to effectively learn the modality-relevant features and the modality-irrelevant features. Extensive experiments are conducted to validate the superiority of our method on the challenging SYSU-MM01 and RegDB datasets. More remarkably, our method achieves 95.1% in terms of Rank-1 and 89.2% in terms of mAP on the RegDB dataset.

IJCAI Conference 2022 Conference Paper

Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features

  • Xinyi Shang
  • Yang Lu
  • Gang Huang
  • Hanzi Wang

Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail distribution, which frequently appears in real FL applications. In this paper, we reveal an intriguing fact that the biased classifier is the primary factor leading to the poor performance of the global model. Motivated by the above finding, we propose a novel and privacy-preserving FL method for heterogeneous and long-tailed data via Classifier Re-training with Federated Features (CReFF). The classifier re-trained on federated features can produce performance comparable to that of one re-trained on real data, in a privacy-preserving manner without information leakage of local data or class distribution. Experiments on several benchmark datasets show that the proposed CReFF is an effective solution to obtain a promising FL model under heterogeneous and long-tailed data. Comparative results with the state-of-the-art FL methods also validate the superiority of CReFF. Our code is available at https://github.com/shangxinyi/CReFF-FL.

AAAI Conference 2022 Conference Paper

When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework

  • Xinyi Zou
  • Yan Yan
  • Jing-Hao Xue
  • Si Chen
  • Hanzi Wang

Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound expression training data, which are often laboriously collected under the professional instruction of psychology. In this paper, we study compound FER in the cross-domain few-shot learning setting, where only a few images of novel classes from the target domain are required as a reference. In particular, we aim to identify unseen compound expressions with the model trained on easily accessible basic expression datasets. To alleviate the problem of limited base classes in our FER task, we propose a novel Emotion Guided Similarity Network (EGS-Net), consisting of an emotion branch and a similarity branch, based on a two-stage learning framework. Specifically, in the first stage, the similarity branch is jointly trained with the emotion branch in a multi-task fashion. With the regularization of the emotion branch, we prevent the similarity branch from overfitting to sampled base classes that are highly overlapped across different episodes. In the second stage, the emotion branch and the similarity branch play a "two-student game" to alternately learn from each other, thereby further improving the inference ability of the similarity branch on unseen compound expressions. Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method against several state-of-the-art methods.

AAAI Conference 2020 Conference Paper

End-to-End Learning of Object Motion Estimation from Retinal Events for Event-Based Object Tracking

  • Haosheng Chen
  • David Suter
  • Qiangqiang Wu
  • Hanzi Wang

Event cameras, which are asynchronous bio-inspired vision sensors, have shown great potential in computer vision and artificial intelligence. However, the application of event cameras to object-level motion estimation or tracking is still in its infancy. The main idea behind this work is to propose a novel deep neural network to learn and regress a parametric object-level motion/transform model for event-based object tracking. To achieve this goal, we propose a synchronous Time-Surface with Linear Time Decay (TSLTD) representation, which effectively encodes the spatio-temporal information of asynchronous retinal events into TSLTD frames with clear motion patterns. We feed the sequence of TSLTD frames to a novel Retinal Motion Regression Network (RMRNet) to perform an end-to-end 5-DoF object motion regression. Our method is compared with state-of-the-art object tracking methods that are based on conventional cameras or event cameras. The experimental results show the superiority of our method in handling various challenging environments, such as fast motion and low illumination conditions.

AAAI Conference 2019 Conference Paper

Hypergraph Optimization for Multi-Structural Geometric Model Fitting

  • Shuyuan Lin
  • Guobao Xiao
  • Yan Yan
  • David Suter
  • Hanzi Wang

Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraphs to represent the complex relationships between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noises and outliers), which will significantly increase the computational burden. In order to overcome the above problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can directly apply spectral clustering to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.

NeurIPS Conference 2009 Conference Paper

The Ordered Residual Kernel for Robust Motion Subspace Clustering

  • Tat-Jun Chin
  • Hanzi Wang
  • David Suter

We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.