Arrow Research

Author name cluster

Zhengming Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

NeurIPS Conference 2025 Conference Paper

$i$MIND: Insightful Multi-subject Invariant Neural Decoding

  • Zixiang Yin
  • Jiarui Li
  • Zhengming Ding

Decoding visual signals holds an appealing potential to unravel the complexities of cognition and perception. While recent reconstruction tasks leverage powerful generative models to produce high-fidelity images from neural recordings, they often pay limited attention to the underlying neural representations and rely heavily on pretrained priors. As a result, they provide little insight into how individual voxels encode and differentiate semantic content or how these representations vary across subjects. To mitigate this gap, we present an insightful Multi-subject Invariant Neural Decoding ($i$MIND) model, which employs a novel dual-decoding framework (both biometric and semantic decoding) to offer neural interpretability in a data-driven manner and deepen our understanding of brain-based visual functionalities. Our $i$MIND model operates through three core steps: establishing a shared neural representation space across subjects using a ViT-based masked autoencoder, disentangling neural features into complementary subject-specific and object-specific components, and performing dual decoding to support both biometric and semantic classification tasks. Experimental results demonstrate that $i$MIND achieves state-of-the-art decoding performance with minimal scalability limitations. Furthermore, $i$MIND empirically generates voxel-object activation fingerprints that reveal object-specific neural patterns and enable investigation of subject-specific variations in attention to identical stimuli. These findings provide a foundation for more interpretable and generalizable subject-invariant neural decoding, advancing our understanding of voxel semantic selectivity as well as neural vision processing dynamics.

IJCAI Conference 2025 Conference Paper

A Simple yet Effective Hypergraph Clustering Network

  • Qianqian Wang
  • Bowen Zhao
  • Zhengming Ding
  • Xiangdong Zhang
  • Quanxue Gao

Hypergraph clustering has gained significant attention due to its capability of capturing high-order structural information. Among different approaches, contrastive learning-based methods leverage self-supervised learning and data augmentation, exhibiting impressive performance. However, most of them come with the following limitations: 1) augmentation strategies like feature dropout can potentially disrupt the intrinsic clustering structure of hypergraphs; 2) high computational demands hinder their real-world application. To address the above issues, we propose a simple yet effective Hypergraph Clustering Network framework (HCN). Specifically, HCN replaces the hypergraph convolution operation with smoothing preprocessing, which avoids high computational complexity. Besides, to retain the intrinsic structure, it develops two key modules: the self-diagonal consistency module and the structure alignment module. They respectively align the similarity matrix with the identity matrix and the structural affinity matrix, which ensures intra-cluster compactness and inter-cluster separability. Extensive experiments on five benchmark datasets demonstrate HCN's superiority over state-of-the-art methods.
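The smoothing-preprocessing idea the abstract describes can be sketched as a one-off propagation of node features over the normalized hypergraph operator, computed once before clustering rather than inside a trainable convolution. This is an illustrative sketch, not the paper's implementation; the exact operator form and the number of propagation steps `k` are assumptions.

```python
import numpy as np

def hypergraph_smoothing(X, H, k=2):
    """Smoothing preprocessing: propagate node features k times over the
    normalized hypergraph operator, with no learnable convolution involved.
    X: (n_nodes, d) feature matrix; H: (n_nodes, n_edges) incidence matrix."""
    Dv = H.sum(axis=1)                                 # node degrees
    De = H.sum(axis=0)                                 # hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    S = Dv_is @ H @ De_inv @ H.T @ Dv_is               # propagation operator
    for _ in range(k):
        X = S @ X
    return X

# toy hypergraph: nodes {0,1} share one hyperedge, nodes {2,3} share another
H = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
X_smooth = hypergraph_smoothing(X, H, k=2)  # nodes sharing an edge converge
```

After smoothing, features of nodes inside the same hyperedge are pulled together, so a cheap similarity matrix built from them already reflects the high-order structure.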

NeurIPS Conference 2025 Conference Paper

Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

  • Xiaolin Sun
  • Feidi Liu
  • Zhengming Ding
  • Zizhan Zheng

Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is especially a concern in vision-based environments where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, after closer investigation, we found that the effectiveness of the current defenses is due to a fundamental weakness of the existing $l_p$ norm-constrained attacks, which can barely alter the semantics of image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack to go beyond this limitation. Our attack is able to generate perturbed states that are semantically different from the true states while remaining realistic and history-aligned to avoid detection. Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. The results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, indicating the importance of developing more robust policies.

NeurIPS Conference 2025 Conference Paper

Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback

  • Janet Wang
  • Yunbei Zhang
  • Zhengming Ding
  • Jihun Hamm

Paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability cannot be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered as a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, deteriorating the model performance. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal Large Language Models (MLLMs) reveals their strong visual reasoning capabilities, making them adept candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation. Our method creatively translates expert-defined criteria into actionable feedback for image synthesis of DMs, significantly improving clinical accuracy while reducing the direct human workload. Experiments demonstrate that our method greatly improves the clinical quality of synthesized skin disease images, with outputs aligning with dermatologist assessments. Additionally, augmenting training data with these synthesized images improves diagnostic accuracy by +9.02% on a challenging 20-condition skin disease classification task, and by +13.89% in the few-shot setting.
Beyond image synthesis, MAGIC illustrates a task-centric alignment paradigm: instead of adapting MLLMs to niche medical tasks, it adapts tasks to the evaluative strengths of general-purpose MLLMs by decomposing domain knowledge into attribute-level checklists. This design offers a scalable and reliable path for leveraging foundation models in specialized domains.

IJCAI Conference 2025 Conference Paper

Enhanced Unsupervised Discriminant Dimensionality Reduction for Nonlinear Data

  • Qianqian Wang
  • Mengping Jiang
  • Wei Feng
  • Zhengming Ding

Linear Discriminant Analysis (LDA) is a classical supervised dimensionality reduction algorithm. However, LDA focuses more on global structure and overly depends on reliable data labels. For data with outliers and nonlinear structures, LDA cannot effectively capture the true structure of the data. Moreover, the subspace dimension learned by LDA must be smaller than the number of clusters, which limits its practical applications. To address these issues, we propose a novel unsupervised LDA method that combines centerless K-means and LDA. This method eliminates the need to calculate cluster centroids and improves model robustness. By fusing centerless K-means and LDA into a unified framework and deducing the connection between K-means and manifold learning, this method captures both the local manifold structure and the discriminative structure. Additionally, the dimensionality of the subspace is not restricted. This method not only overcomes the limitations of traditional LDA but also improves the model's adaptability to complex data. Extensive experiments on seven datasets demonstrate the effectiveness of the proposed method.
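The "centerless" trick rests on a standard identity: the within-cluster scatter around a centroid equals the sum of pairwise squared distances inside the cluster, scaled by the cluster size, so centroids never need to be computed or updated. A minimal numerical check of that identity (illustrative only, not the paper's full model):

```python
import numpy as np

def wcss_with_centroid(X):
    """Classical within-cluster scatter around the explicit centroid."""
    mu = X.mean(axis=0)
    return float(np.sum((X - mu) ** 2))

def wcss_centerless(X):
    """The same quantity from pairwise distances only -- no centroid needed."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.sum() / (2 * n))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
a, b = wcss_with_centroid(X), wcss_centerless(X)  # identical up to rounding
```

Because the pairwise form depends only on distances between samples, it also plugs naturally into manifold-style affinity formulations, which is the connection the abstract exploits.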

NeurIPS Conference 2025 Conference Paper

PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

  • Taotao Jing
  • Tina Chen
  • Renran Tian
  • Yaobin Chen
  • Joshua Domeyer
  • Heishiro Toyoda
  • Rini Sherony
  • Zhengming Ding

Accurately modeling pedestrian intention and understanding driver decision-making processes are critical for the development of safe and socially aware autonomous driving systems. However, existing datasets primarily emphasize observable behavior, offering limited insight into the underlying causal reasoning that informs human interpretation and response during traffic interactions. To address this gap, we introduce PSI, a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human-annotated textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations offer a unique foundation for developing and benchmarking models that combine predictive performance with interpretable and human-aligned reasoning. PSI supports standardized tasks and evaluation protocols across multiple dimensions, including pedestrian intention prediction, driver decision modeling, reasoning generation, and trajectory forecasting. By enabling causal and interpretable evaluation, PSI advances research toward autonomous systems that can reason, act, and explain in alignment with human cognitive processes.

NeurIPS Conference 2025 Conference Paper

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

  • Wei Wang
  • Haifeng Xia
  • Chao Huang
  • Zhengming Ding
  • Cong Wang
  • Haojie Li
  • Xiaochun Cao

In domain adaptation (DA), joint maximum mean discrepancy (JMMD), as a famous distribution-distance metric, aims to measure the joint probability distribution difference between the source domain and target domain. However, it is still not fully explored and is especially hard to apply within a subspace-learning framework, as its empirical estimation involves a tensor-product operator whose partial derivative is difficult to obtain. To solve this issue, we deduce a concise JMMD based on the Representer theorem that avoids the tensor-product operator, and we obtain two essential findings. First, we reveal the uniformity of JMMD by proving that previous marginal, class conditional, and weighted class conditional probability distribution distances are three special cases of JMMD with different label reproducing kernels. Second, inspired by graph embedding, we observe that the similarity weights, which strengthen the intra-class compactness in the graph of the Hilbert-Schmidt independence criterion (HSIC), take opposite signs in the graph of JMMD, revealing why JMMD degrades feature discrimination. This motivates us to propose a novel loss, JMMD-HSIC, which jointly considers JMMD and HSIC to promote the discrimination of JMMD. Extensive experiments on several cross-domain datasets demonstrate the validity of our revealed theoretical results and the effectiveness of our proposed JMMD-HSIC.
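Concretely, the empirical JMMD can be written as an MMD whose kernel is the product of a feature kernel and a label kernel; this product is exactly where the tensor-product operator enters. A small sketch of that estimator (the Gaussian kernel choice and the soft-label representation are assumptions, not the paper's derivation):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def jmmd2(Xs, Ys, Xt, Yt, gamma=1.0):
    """Squared JMMD estimate: an MMD whose joint kernel is the elementwise
    product of a feature kernel and a label kernel (the tensor product)."""
    Kss = rbf(Xs, Xs, gamma) * rbf(Ys, Ys, gamma)
    Ktt = rbf(Xt, Xt, gamma) * rbf(Yt, Yt, gamma)
    Kst = rbf(Xs, Xt, gamma) * rbf(Ys, Yt, gamma)
    return float(Kss.mean() + Ktt.mean() - 2.0 * Kst.mean())

rng = np.random.default_rng(1)
Xs, Ys = rng.normal(size=(30, 2)), rng.normal(size=(30, 1))
same = jmmd2(Xs, Ys, Xs, Ys)           # identical joint samples -> 0
shifted = jmmd2(Xs, Ys, Xs + 3.0, Ys)  # covariate shift -> clearly positive
```

Swapping the label kernel (one-hot, class-weighted, or constant) is what collapses JMMD to the marginal or class-conditional distances mentioned in the first finding.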

ICRA Conference 2025 Conference Paper

RoBiFusion: A Robust and Bidirectional Interaction Camera-LiDAR 3D Object Detection Framework

  • Xubin Wen
  • Haifeng Xia
  • Zhengming Ding
  • Siyu Xia

Camera-LiDAR 3D object detection is currently becoming a crucial component in the field of autonomous driving perception. However, previous models only performed feature fusion in the deep-level BEV hierarchy when dealing with camera-LiDAR feature fusion. This approach lacks interaction with the shallow-level sensor features, which is beneficial in constructing the corresponding BEV features. However, a simple shallow-level feature interaction can introduce sensor noise caused by intrinsic and extrinsic camera calibration errors. To address this, we propose RoBiFusion, a novel camera-LiDAR 3D object detection framework designed for effective sensor feature interaction and mitigating sensor noise interference. This framework consists of three submodules: the Camera-LiDAR Feature Matching module, the LiDAR-to-Camera module, and the Camera-to-LiDAR module. Firstly, in the Camera-LiDAR Feature Matching module, we use the cross-attention module to dynamically match the camera features and the LiDAR features, which solves the problem of feature inconsistency caused by noise in the camera's intrinsic and extrinsic parameters. Secondly, in the LiDAR-to-Camera module, we propose a novel depth representation that can effectively mitigate LiDAR noise interference. Thirdly, in the Camera-to-LiDAR module, we introduce deformable attention to help LiDAR features capture instance-level semantic features. Additionally, we design a novel differentiable and efficient grid sample module to accelerate the process, since the bilinear grid sample module in deformable attention is time-consuming and not deployment-friendly. We compared RoBiFusion to the state-of-the-art BEVFusion on the nuScenes dataset and found that RoBiFusion surpasses BEVFusion by 1.5% mAP and 2.4% NDS. Furthermore, we designed a series of ablation experiments to verify the effectiveness of the aforementioned modules.
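The grid-sample bottleneck the authors replace is, at its core, bilinear interpolation of a feature map at continuous coordinates. Below is a minimal NumPy sketch of the plain reference operation only; the paper's module is a faster, deployment-friendly variant whose internals are not shown here.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D map at a continuous (x, y) location with bilinear weights.
    Assumes (x + 1, y + 1) stays inside the map; no padding handling."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x1]
            + (1 - wx) * wy * img[y1, x0] + wx * wy * img[y1, x1])

img = np.arange(16, dtype=float).reshape(4, 4)
center = bilinear_sample(img, 1.5, 1.5)  # mean of the 4 surrounding pixels
exact = bilinear_sample(img, 2.0, 1.0)   # lands exactly on img[1, 2]
```

Deformable attention issues many such fractional-coordinate lookups per query, which is why an efficient sampling kernel dominates the runtime budget.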

AAAI Conference 2025 Conference Paper

Supportive Negatives Spectral Augmentation for Source-Free Cross-Domain Segmentation

  • Kexin Zheng
  • Haifeng Xia
  • Siyu Xia
  • Ming Shao
  • Zhengming Ding

Source-free domain adaptation (SFDA) aims to transfer knowledge from the well-trained source model and adapt it to the target data distribution. SFDA methods are suitable for the medical image segmentation task due to their data-privacy protection and achieve promising performance. However, cross-domain distribution shift makes it difficult for the adapted model to provide accurate decisions on several hard instances and negatively affects model generalization. To overcome this limitation, a novel method, `supportive negatives spectral augmentation' (SNSA), is presented in this work. Concretely, SNSA includes an instance selection mechanism to automatically discover a few hard samples for which the source model produces incorrect predictions, and an active learning strategy is then adopted to re-calibrate their predictive masks. Moreover, SNSA deploys spectral augmentation between hard instances and others to encourage the source model to gradually capture and adapt to the attributes of the target distribution. Considerable experimental studies demonstrate that annotating merely 4%~5% of negative instances from the target domain significantly improves segmentation performance over previous methods.

ICML Conference 2025 Conference Paper

Unified K-Means Clustering with Label-Guided Manifold Learning

  • Qianqian Wang 0001
  • Mengping Jiang
  • Zhengming Ding
  • Quanxue Gao

K-Means clustering is a classical and effective unsupervised learning method attributed to its simplicity and efficiency. However, it faces notable challenges, including sensitivity to random initial centroid selection, a limited ability to discover the intrinsic manifold structures within nonlinear datasets, and difficulty in achieving balanced clustering in practical scenarios. To overcome these weaknesses, we introduce a novel framework for K-Means that leverages manifold learning. This approach eliminates the need for centroid calculation and utilizes a cluster indicator matrix to align the manifold structures, thereby enhancing clustering accuracy. Beyond the traditional Euclidean distance, our model incorporates Gaussian kernel distance, K-nearest neighbor distance, and low-pass filtering distance to effectively manage data that is not linearly separable. Furthermore, we introduce a balanced regularizer to achieve balanced clustering results. The detailed experimental results demonstrate the efficacy of our proposed methodology.

IROS Conference 2023 Conference Paper

IDA: Informed Domain Adaptive Semantic Segmentation

  • Zheng Chen 0016
  • Zhengming Ding
  • Jason M. Gregory
  • Lantao Liu

Mixup-based data augmentation has been validated to be a critical stage in the self-training framework for unsupervised domain adaptive semantic segmentation (UDA-SS), which aims to transfer knowledge from a well-annotated (source) domain to an unlabeled (target) domain. Existing self-training methods usually adopt the popular region-based mixup techniques with a random sampling strategy, which unfortunately ignores the dynamic evolution of different semantics across various domains as training proceeds. To improve the UDA-SS performance, we propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes the data based on class-level segmentation performance, which aims to emphasize small-region semantics during mixup. In our IDA model, the class-level performance is tracked by an expected confidence score (ECS). We then use a dynamic schedule to determine the mixing ratio for data in different domains. Extensive experimental results reveal that our proposed method is able to outperform the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to Cityscapes. Code link: https://github.com/ArlenCHEN/IDA.git
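The class-level tracking can be sketched as an exponential moving average of per-class confidence, with mixup sampling biased toward low-ECS (poorly learned, often small-region) classes. The EMA form, the softmax temperature, and all names here are illustrative assumptions, not the paper's exact schedule.

```python
import numpy as np

def update_ecs(ecs, batch_conf, momentum=0.9):
    """Track per-class expected confidence as an exponential moving average."""
    return momentum * ecs + (1 - momentum) * batch_conf

def class_mix_probs(ecs, temperature=1.0):
    """Bias mixup class sampling toward low-confidence classes."""
    w = np.exp((1.0 - ecs) / temperature)
    return w / w.sum()

ecs = np.array([0.9, 0.5, 0.2])                  # e.g. road, sign, pole
ecs = update_ecs(ecs, np.array([0.8, 0.6, 0.3])) # fold in a new batch
probs = class_mix_probs(ecs)                     # hardest class sampled most
```

The EMA smooths noisy per-batch confidences so the schedule reacts to genuine trends rather than single-batch fluctuations.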

IJCAI Conference 2023 Conference Paper

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

  • Qucheng Peng
  • Zhengming Ding
  • Lingjuan Lyu
  • Lichao Sun
  • Chen Chen

Source-free domain adaptation transfers the source-trained model to the target domain without exposing the source data, dispelling concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the black-box setting allows using only the outputs of the source model, yet it suffers even more severely from overfitting on the source domain because the source model's weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for black-box domain adaptation with both input-level and network-level regularization. At the input level, we design a new data augmentation technique, Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. At the network level, we develop a Subnetwork Distillation mechanism to transfer knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.
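Frequency-domain mixing of this kind typically exploits the fact that an image's phase spectrum carries most of its semantic content. The sketch below interpolates amplitude spectra while keeping one image's phase; whether this matches the paper's exact Phase MixUp formulation is an assumption, so treat it only as the general family of technique.

```python
import numpy as np

def phase_mixup(x1, x2, lam=0.7):
    """Interpolate amplitude spectra while keeping x1's phase spectrum.
    Phase is widely held to carry image semantics, so the mix preserves
    the task-relevant content of x1 while perturbing its style."""
    F1, F2 = np.fft.fft2(x1), np.fft.fft2(x2)
    amp = lam * np.abs(F1) + (1 - lam) * np.abs(F2)
    mixed = np.fft.ifft2(amp * np.exp(1j * np.angle(F1)))
    return np.real(mixed)

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 8))
self_mix = phase_mixup(x, x, lam=0.3)  # mixing an image with itself is identity
```

Because only amplitudes are interpolated, the augmented sample keeps x1's object layout, which is what makes the interpolation "highlight task-relevant objects."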

AAAI Conference 2023 Conference Paper

TrEP: Transformer-Based Evidential Prediction for Pedestrian Intention with Uncertainty

  • Zhengming Zhang
  • Renran Tian
  • Zhengming Ding

With rapid development in hardware (sensors and processors) and AI algorithms, automated driving techniques have entered the public's daily life and achieved great success in supporting human driving performance. However, due to the high contextual variations and temporal dynamics in pedestrian behaviors, the interaction between autonomous-driving cars and pedestrians remains challenging, impeding the development of fully autonomous driving systems. This paper focuses on predicting pedestrian intention with a novel transformer-based evidential prediction (TrEP) algorithm. We develop a transformer module to capture the temporal correlations among the input features within pedestrian video sequences, and a deep evidential learning model to capture the AI uncertainty under scene complexities. Experimental results on three popular pedestrian intent benchmarks have verified the effectiveness of our proposed model over the state-of-the-art. The algorithm performance can be further boosted by controlling the uncertainty level. We systematically compare human disagreements with AI uncertainty to further evaluate AI performance in confusing scenes. The code is released at https://github.com/zzmonlyyou/TrEP.git.
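Deep evidential learning for classification is commonly parameterized with a Dirichlet distribution: non-negative evidence shifts the Dirichlet concentrations, and the total evidence yields a closed-form uncertainty mass. This is a generic sketch of that family (the ReLU evidence head is a common choice and an assumption here, not necessarily TrEP's).

```python
import numpy as np

def evidential_head(logits):
    """Map raw outputs to Dirichlet parameters and a closed-form uncertainty.
    Evidence e >= 0 (ReLU), alpha = e + 1; the uncertainty mass
    u = K / sum(alpha) shrinks as total evidence grows."""
    evidence = np.maximum(logits, 0.0)
    alpha = evidence + 1.0
    K = alpha.shape[-1]
    prob = alpha / alpha.sum(-1, keepdims=True)  # expected class probabilities
    u = K / alpha.sum(-1)
    return prob, u

p0, u0 = evidential_head(np.zeros(3))                 # no evidence: uniform, u = 1
p1, u1 = evidential_head(np.array([10.0, 0.0, 0.0]))  # strong class-0 evidence
```

Thresholding `u` is what lets the model defer on confusing scenes, which is how "controlling the uncertainty level" can boost effective accuracy.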

IJCAI Conference 2022 Conference Paper

Adversarial Bi-Regressor Network for Domain Adaptive Regression

  • Haifeng Xia
  • Pu Wang
  • Toshiaki Koike-Akino
  • Ye Wang
  • Philip Orlik
  • Zhengming Ding

Domain adaptation (DA) aims to transfer the knowledge of a well-labeled source domain to facilitate unlabeled target learning. When turning to specific tasks such as indoor (Wi-Fi) localization, it is essential to learn a cross-domain regressor to mitigate the domain shift. This paper proposes a novel method, Adversarial Bi-Regressor Network (ABRNet), to seek a more effective cross-domain regression model. Specifically, a discrepant bi-regressor architecture is developed to maximize the difference between the two regressors and thereby discover uncertain target instances far from the source distribution, and an adversarial training mechanism is then adopted between the feature extractor and the dual regressors to produce domain-invariant representations. To further bridge the large domain gap, a domain-specific augmentation module is designed to synthesize two intermediate domains, one source-similar and one target-similar, to gradually eliminate the original domain mismatch. The empirical studies on two cross-domain regressive benchmarks illustrate the power of our method in solving the domain adaptive regression (DAR) problem.

AAAI Conference 2022 Conference Paper

Cross-Domain Collaborative Normalization via Structural Knowledge

  • Haifeng Xia
  • Zhengming Ding

Batch Normalization (BN) as an important component assists Deep Neural Networks in achieving promising performance for extensive learning tasks by scaling distribution of feature representations within mini-batches. However, the application of BN suffers from performance degradation under the scenario of Unsupervised Domain Adaptation (UDA), since the estimated statistics fail to concurrently describe two different domains. In this paper, we develop a novel normalization technique, named Collaborative Normalization (CoN), for eliminating domain discrepancy and accelerating the model training of neural networks for UDA. Unlike typical strategies only exploiting domain-specific statistics during normalization, our CoN excavates cross-domain knowledge and simultaneously scales features from various domains by mimicking the merits of collaborative representation. Our CoN can be easily plugged into popular neural network backbones for cross-domain learning. On the one hand, theoretical analysis guarantees that models with CoN promote discriminability of feature representations and accelerate convergence rate; on the other hand, empirical study verifies that replacing BN with CoN in popular network backbones effectively improves classification accuracy in most learning tasks across three cross-domain visual benchmarks.
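The core mechanism can be sketched as normalizing a mini-batch with blended cross-domain statistics instead of purely domain-specific ones. The convex-combination form and the `alpha` parameter below are illustrative assumptions rather than CoN's exact collaborative rule.

```python
import numpy as np

def collaborative_norm(x, src_mean, src_var, alpha=0.5, eps=1e-5):
    """Normalize a target mini-batch (n, C) with blended cross-domain
    statistics rather than purely domain-specific ones."""
    t_mean, t_var = x.mean(axis=0), x.var(axis=0)
    mean = alpha * src_mean + (1 - alpha) * t_mean
    var = alpha * src_var + (1 - alpha) * t_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=(64, 4))
# alpha = 0 recovers plain per-batch (BN-style) standardization
out = collaborative_norm(x, np.zeros(4), np.ones(4), alpha=0.0)
```

Because both domains are scaled toward shared statistics, their feature distributions are pulled together at every normalization layer rather than only at the final loss.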

AAAI Conference 2021 Conference Paper

Balanced Open Set Domain Adaptation via Centroid Alignment

  • Mengmeng Jing
  • Jingjing Li
  • Lei Zhu
  • Zhengming Ding
  • Ke Lu
  • Yang Yang

Open Set Domain Adaptation (OSDA) is a challenging domain adaptation setting which allows the existence of unknown classes on the target domain. Although existing OSDA methods are good at classifying samples of known classes, they ignore the classification ability for the unknown samples, making them unbalanced OSDA methods. To alleviate this problem, we propose a balanced OSDA method which can recognize the unknown samples while maintaining high classification performance for the known samples. Specifically, to reduce the domain gaps, we first project the features to a hyperspherical latent space. In this space, we propose to bound the centroid deviation angles to not only increase the intra-class compactness but also enlarge the inter-class margins. With the bounded centroid deviation angles, we employ the statistical Extreme Value Theory to recognize the unknown samples that are misclassified into known classes. In addition, to learn better centroids, we propose an improved centroid update strategy based on sample reweighting and adaptive update rate to cooperate with centroid alignment. Experimental results on three OSDA benchmarks verify that our method can significantly outperform the compared methods and reduce the proportion of the unknown samples being misclassified into known classes.
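The angular test underlying this can be sketched as: normalize features onto the hypersphere, measure the deviation angle to each known-class centroid, and flag a sample as unknown when even its nearest centroid is beyond a bound. The fixed threshold below stands in for the paper's Extreme Value Theory tail model, so this is only the geometric skeleton.

```python
import numpy as np

def deviation_angle(z, c):
    """Angle (degrees) between a feature and a class centroid on the sphere."""
    z, c = z / np.linalg.norm(z), c / np.linalg.norm(c)
    return float(np.degrees(np.arccos(np.clip(z @ c, -1.0, 1.0))))

def is_unknown(z, centroids, bound=40.0):
    """Flag a sample as unknown if even the nearest centroid exceeds the bound."""
    return min(deviation_angle(z, c) for c in centroids) > bound

centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
known = is_unknown(np.array([1.0, 0.05]), centroids)   # ~2.9 deg  -> known
novel = is_unknown(np.array([-1.0, -1.0]), centroids)  # 135 deg   -> unknown
```

Working on the hypersphere makes the decision depend only on direction, so domain-specific magnitude differences drop out before the angular bound is applied.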

AAAI Conference 2021 Conference Paper

Generative Partial Visual-Tactile Fused Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Jiahua Dong
  • Yuyang Liu
  • Zhengming Ding

Visual-tactile fused sensing for object clustering has achieved significant progress recently, since the involvement of the tactile modality can effectively improve clustering performance. However, missing-data (i.e., partial data) issues always happen due to occlusion and noise during the data collecting process. This issue is not well solved by most existing partial multi-view clustering methods because of the heterogeneous-modality challenge, and naively employing these methods would inevitably induce a negative effect and further hurt the performance. To solve the mentioned challenges, we propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering. More specifically, we first extract features from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other, which can compensate for missing samples and align the visual and tactile modalities naturally by adversarial learning. Finally, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.

NeurIPS Conference 2021 Conference Paper

Implicit Semantic Response Alignment for Partial Domain Adaptation

  • Wenxiao Xiao
  • Zhengming Ding
  • Hongfu Liu

Partial Domain Adaptation (PDA) addresses the unsupervised domain adaptation problem where the target label space is a subset of the source label space. Most state-of-the-art PDA methods tackle the inconsistent label space by assigning weights to classes or individual samples, in an attempt to discard the source data that belongs to the irrelevant classes. However, we believe samples from those extra categories would still contain valuable information to promote positive transfer. In this paper, we propose the Implicit Semantic Response Alignment to explore the intrinsic relationships among different categories by applying a weighted schema on the feature level. Specifically, we design a class2vec module to extract the implicit semantic topics from the visual features. With an attention layer, we calculate the semantic response according to each implicit semantic topic. Then semantic responses of source and target data are aligned to retain the relevant information contained in multiple categories by weighting the features, instead of samples. Experiments on several cross-domain benchmark datasets demonstrate the effectiveness of our method over the state-of-the-art PDA methods. Moreover, we provide in-depth analyses to further explore implicit semantic alignment.

ECAI Conference 2020 Conference Paper

Adaptive Local Neighbors for Transfer Discriminative Feature Learning

  • Wei Wang 0335
  • Zhihui Wang 0001
  • Haojie Li
  • Juan Zhou
  • Zhengming Ding

In Domain Adaptation (DA), how to reduce the distributional differences across domains and preserve the data structures are two critical issues in obtaining domain-invariant features. Existing DA methods either preserve the Local Manifold Structure (LMS) or the Global Discriminative Consistency (GDC), but fail to take the two into account simultaneously. Therefore, the extracted features are either short of discriminative ability or sensitive to multimodally distributed data. Moreover, the local neighbor relationships among data points are mostly established in the original data space, which is unreliable, especially for data with large noise. To this end, this paper proposes a novel DA approach, i.e., Adaptive Local Neighbors for Transfer Discriminative Feature Learning, to leverage LMS and GDC in a unified transfer feature learning model, where we only focus on the GDC between local neighbors, so that the extracted features are more discriminative and robust to multimodally distributed data. Moreover, the data points' local neighbors are revealed adaptively in the learned subspace, so the model is insensitive to data noise. Compared with the state-of-the-art methods, the proposed approach achieves higher performance on different cross-domain image classification tasks, with a notable 3.0% improvement on the Office10+Caltech10 dataset.

AAAI Conference 2020 Conference Paper

Bi-Directional Generation for Unsupervised Domain Adaptation

  • Guanglei Yang
  • Haifeng Xia
  • Mingli Ding
  • Zhengming Ding

Unsupervised domain adaptation facilitates the unlabeled target domain relying on well-established source domain information. Conventional methods that forcefully reduce the domain discrepancy in the latent space destroy the intrinsic data structure. To balance the mitigation of the domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers, interpolating two intermediate domains to bridge the source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency on target sample prediction. Extensive experiments verify that our proposed model outperforms the state-of-the-art on standard cross-domain visual benchmarks.

AAAI Conference 2020 Conference Paper

Domain Conditioned Adaptation Network

  • Shuang Li
  • Chi Liu
  • Qiuxia Lin
  • Binhui Xie
  • Zhengming Ding
  • Gao Huang
  • Jian Tang

Tremendous research efforts have been made to advance deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distributions differ to a large extent. In this paper, we relax the shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge can be explored appropriately. As far as we know, this is the first work to explore domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across the two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.
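A domain-conditioned channel attention can be sketched as a squeeze-and-excitation style gate whose MLP also sees a domain embedding, so source and target inputs can excite different channels. The shapes, the concatenation, and the two-layer MLP here are illustrative assumptions, not DCAN's published architecture.

```python
import numpy as np

def domain_conditioned_gate(feat, domain_emb, W1, W2):
    """Squeeze-and-excitation style channel gate conditioned on a domain
    embedding. feat: (C,) globally pooled channel descriptor."""
    h = np.maximum(W1 @ np.concatenate([feat, domain_emb]), 0.0)  # ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ h)))                        # per-channel sigmoid
    return gate * feat                                            # re-scaled channels

rng = np.random.default_rng(4)
C, D, Hd = 8, 4, 16                       # channels, domain-emb dim, hidden dim
W1 = 0.1 * rng.normal(size=(Hd, C + D))   # small weights keep gates unsaturated
W2 = 0.1 * rng.normal(size=(C, Hd))
out = domain_conditioned_gate(np.ones(C), rng.normal(size=D), W1, W2)
```

Feeding the same pooled descriptor with two different domain embeddings yields two different gates, which is the "domain-wise channel activation" idea in miniature.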

IJCAI Conference 2018 Conference Paper

Adaptive Graph Guided Embedding for Multi-label Annotation

  • Lichen Wang
  • Zhengming Ding
  • Yun Fu

Multi-label annotation is challenging since a large amount of well-labeled training data is required to achieve promising performance. However, providing such data is expensive, while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which utilizes limited labeled data together with large-scale unlabeled data to improve learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be assigned the correct multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenarios. Experimental results demonstrate that our approach outperforms other state-of-the-art methods.
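The propagation scheme builds on standard graph-based label propagation. The numpy sketch below shows only that generic building block (the closed-form solution in the style of Zhou et al.), not the adaptive graph or the jointly learned embedding that AG2E adds on top:

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.5):
    """Closed-form graph label propagation: F* = (1 - alpha) (I - alpha S)^{-1} Y.

    S : (n, n) normalized affinity matrix over labeled + unlabeled points
    Y : (n, k) initial label matrix (all-zero rows for unlabeled points)
    """
    n = S.shape[0]
    # Solving the linear system is cheaper and more stable than explicit inversion.
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
```

On a toy graph with two disconnected pairs and one labeled node per pair, the unlabeled node in each pair inherits its neighbor's label.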

AAAI Conference 2018 Conference Paper

Discriminative Semi-Coupled Projective Dictionary Learning for Low-Resolution Person Re-Identification

  • Kai Li
  • Zhengming Ding
  • Sheng Li
  • Yun Fu

Person re-identification (re-ID) is a fundamental task in automated video surveillance. In real-world visual surveillance systems, a person is often captured at quite low resolution, so we often need to perform low-resolution person re-ID, where images captured by different cameras have great resolution divergence. Existing methods cope with this problem via complicated and time-consuming strategies, which makes them less favorable in practice, and their performance is far from satisfactory. In this paper, we design a novel Discriminative Semi-coupled Projective Dictionary Learning (DSPDL) model to solve this problem effectively and efficiently. Specifically, we propose to jointly learn a pair of dictionaries and a mapping to bridge the gap between low(er)- and high(er)-resolution person images. Besides, we develop a novel graph regularizer to incorporate positive and negative image-pair information in a parameterless fashion. Meanwhile, we adopt the efficient and powerful projective dictionary learning technique to boost efficiency. Experiments on three public datasets show the superiority of the proposed method over state-of-the-art ones.

AAAI Conference 2018 Conference Paper

Latent Discriminant Subspace Representations for Multi-View Outlier Detection

  • Kai Li
  • Sheng Li
  • Zhengming Ding
  • Weidong Zhang
  • Yun Fu

Identifying multi-view outliers is challenging because of the complex data distributions across different views. Existing methods cope with this problem by exploiting pairwise constraints across different views to obtain new feature representations, based on which certain outlier score measurements are defined. Due to the use of pairwise constraints, it is complicated and time-consuming for existing methods to detect outliers from three or more views. In this paper, we propose a novel method capable of detecting outliers from any number of data views. Our method first learns latent discriminant representations for all view data and defines a novel outlier score function based on these latent discriminant representations. Specifically, we represent multi-view data by a global low-rank representation shared by all views and residual representations specific to each view. By analyzing the view-specific residual representations of all views, we can obtain the outlier score for every sample. Moreover, we raise the problem of detecting a third type of multi-view outlier that is neglected by existing methods. Experiments on six datasets show our method outperforms the existing ones in identifying all types of multi-view outliers, often by large margins.

AAAI Conference 2018 Conference Paper

Learning Transferable Subspace for Human Motion Segmentation

  • Lichen Wang
  • Zhengming Ding
  • Yun Fu

Temporal data clustering is a challenging task. Existing methods usually explore a data self-representation strategy, which may hinder clustering performance in insufficient or corrupted data scenarios. In real-world applications, a large amount of related labeled data is often easily accessible. To this end, we propose a novel transferable subspace clustering approach that explores useful information from relevant source data to enhance clustering performance on target temporal data. We transform the original data into a shared low-dimensional and distinctive feature space by jointly seeking an effective domain-invariant projection. In this way, the well-labeled source knowledge helps obtain a more discriminative target representation. Moreover, a graph regularizer is designed to incorporate temporal information and preserve more sequence knowledge in the learned representation. Extensive experiments on three human motion datasets illustrate that our approach outperforms state-of-the-art temporal data clustering methods.

IJCAI Conference 2018 Conference Paper

Robust Multi-view Representation: A Unified Perspective from Multi-view Learning to Domain Adaption

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

Multi-view data are extensively accessible nowadays thanks to various types of features, different viewpoints, and sensors, which tend to facilitate better representation in many key applications. This survey covers the topic of robust multi-view data representation, centered around several major visual applications. First, we formulate a unified learning framework that can model most existing multi-view learning and domain adaptation methods in this line. Following this, we conduct a comprehensive discussion across these two problems by reviewing the algorithms on both topics, including multi-view clustering, multi-view classification, zero-shot learning, and domain adaptation. We further present more practical challenges in multi-view data analysis. Finally, we discuss future research directions, including incomplete, unbalanced, and large-scale multi-view learning. This review, spanning the literature through future directions, would benefit the AI community.

AAAI Conference 2017 Conference Paper

Feature Selection Guided Auto-Encoder

  • Shuyang Wang
  • Zhengming Ding
  • Yun Fu

Recently, the auto-encoder and its variants have demonstrated promising results in extracting effective features. Specifically, its basic idea of encouraging the output to be as similar as possible to the input ensures that the learned representation can faithfully reconstruct the input data. However, a problem arises in that not all hidden units are useful for compressing discriminative information, while many units mainly represent task-irrelevant patterns. In this paper, we propose a novel algorithm, Feature Selection Guided Auto-Encoder, a unified generative model that integrates feature selection and the auto-encoder. Our proposed algorithm can thus distinguish task-relevant units from task-irrelevant ones to obtain the most effective features for future classification tasks. Our model not only performs feature selection on the learned high-level features, but also dynamically drives the auto-encoder to produce more discriminative units. Experiments on several benchmarks demonstrate our method's superiority over state-of-the-art approaches.

IJCAI Conference 2017 Conference Paper

From Ensemble Clustering to Multi-View Clustering

  • Zhiqiang Tao
  • Hongfu Liu
  • Sheng Li
  • Zhengming Ding
  • Yun Fu

Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noise in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. In this way, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, the spectral ensemble clustering task is also incorporated in our framework with a carefully designed constraint, making MVEC a unified optimization framework for achieving the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.
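The low-rank-plus-sparse step can be sketched with a simplified alternating scheme: singular-value thresholding recovers the low-rank consensus part, and soft thresholding isolates sparse, view-specific noise. This is a generic RPCA-style stand-in, not MVEC's exact optimization, and the thresholds `tau` and `lam` are illustrative:

```python
import numpy as np

def lowrank_sparse_split(M, lam=0.1, tau=1.0, n_iter=50):
    """Split M into L (low-rank part) + S (sparse noise) by alternating
    singular-value thresholding and elementwise soft thresholding."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Shrink singular values of (M - S) to get the low-rank part.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - tau, 0.0)) @ Vt
        # Soft-threshold the residual to get the sparse part.
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```

On a rank-one matrix with a few large corrupted entries, the scheme leaves only a small dense residual (bounded by `lam` per entry) unexplained.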

AAAI Conference 2017 Conference Paper

Multi-View Clustering via Deep Matrix Factorization

  • Handong Zhao
  • Zhengming Ding
  • Yun Fu

Multi-View Clustering (MVC) has garnered more attention recently since many real-world data are composed of different representations or views. The key is to explore complementary information to benefit the clustering problem. In this paper, we present a deep matrix factorization framework for MVC, where semi-nonnegative matrix factorization is adopted to learn the hierarchical semantics of multi-view data in a layerwise fashion. To maximize the mutual information from each view, we enforce the nonnegative representation of each view in the final layer to be the same. Furthermore, to respect the intrinsic geometric structure of each view's data, graph regularizers are introduced to couple the output representations of the deep structures. As a non-trivial contribution, we provide a solution based on an alternating minimization strategy, followed by a theoretical proof of convergence. Superior experimental results on three face benchmarks show the effectiveness of the proposed deep matrix factorization model.
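The layerwise building block is a single semi-NMF factorization, X ≈ ZH with H ≥ 0 and Z unconstrained; a deep version factorizes H again at the next layer. The numpy sketch below uses the standard multiplicative updates of Ding et al. for one layer and is illustrative rather than the paper's implementation:

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """One semi-NMF layer: X ≈ Z @ H with H >= 0 and Z unconstrained.
    Uses least-squares updates for Z and multiplicative updates for H."""
    rng = np.random.default_rng(0)
    H = np.abs(rng.standard_normal((k, X.shape[1])))  # nonnegative init
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)                     # closed-form Z update
        A, B = Z.T @ X, Z.T @ Z
        Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2   # positive/negative parts
        Bp, Bn = (np.abs(B) + B) / 2, (np.abs(B) - B) / 2
        H *= np.sqrt((Ap + Bn @ H) / (An + Bp @ H + eps))   # keeps H >= 0
    return Z, H
```

Stacking layers (factorizing H into Z₂H₂, and so on) yields the hierarchical representation; the final-layer H is what the multi-view framework ties across views.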

IJCAI Conference 2016 Conference Paper

Coupled Marginalized Auto-Encoders for Cross-Domain Multi-View Learning

  • Shuyang Wang
  • Zhengming Ding
  • Yun Fu

In cross-domain learning, a more challenging problem arises when the domain divergence involves more than one dominant factor, e.g., different viewpoints, various resolutions, and changing illumination. Fortunately, an intermediate domain can often be found to build a bridge between them and facilitate the learning problem. In this paper, we propose a Coupled Marginalized Denoising Auto-encoders framework to address this cross-domain problem. Specifically, we design two marginalized denoising auto-encoders, one for the target and the other for the source and the intermediate domain. To better couple the learning of the two denoising auto-encoders, we incorporate a feature mapping that transfers knowledge between the intermediate domain and the target one. Furthermore, a maximum margin criterion, i.e., intra-class compactness and an inter-class penalty, is imposed on the output layer to seek more discriminative features across different domains. Extensive experiments on two tasks demonstrate the superiority of our method over state-of-the-art methods.
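A marginalized denoising auto-encoder layer admits a closed-form solution: instead of feeding explicitly corrupted inputs, the reconstruction loss is marginalized over dropout corruption (the mDA/mSDA formulation of Chen et al.). The numpy sketch below shows that generic single-layer solution only; the coupling feature mapping and the maximum margin criterion from the paper are omitted:

```python
import numpy as np

def marginalized_da(X, p=0.5):
    """Closed-form marginalized denoising auto-encoder layer (mDA style).

    X : (d, n) data matrix, columns are samples
    p : per-feature dropout (corruption) probability
    Returns nonlinear hidden features of the same shape as X.
    """
    Xb = np.vstack([X, np.ones((1, X.shape[1]))])   # append a bias row
    d = Xb.shape[0]
    q = np.full(d, 1.0 - p); q[-1] = 1.0            # bias feature is never corrupted
    S = Xb @ Xb.T                                   # scatter matrix
    Q = S * np.outer(q, q)                          # E[x_tilde x_tilde^T]
    np.fill_diagonal(Q, q * np.diag(S))             # diagonal survives with prob q_i
    P = S[:-1, :] * q                               # E[x x_tilde^T], bias output dropped
    W = np.linalg.solve(Q.T, P.T).T                 # W = P Q^{-1}
    return np.tanh(W @ Xb)                          # nonlinear hidden representation
```

Because the expectation over corruptions is computed analytically, the layer is trained by one linear solve rather than stochastic gradient descent, which is what makes stacking and coupling such auto-encoders cheap.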

AAAI Conference 2016 Conference Paper

Pose-Dependent Low-Rank Embedding for Head Pose Estimation

  • Handong Zhao
  • Zhengming Ding
  • Yun Fu

Head pose estimation via embedding models has demonstrated its effectiveness in recent works. However, most previous methods focus only on the manifold relationship among poses, while overlooking the underlying global structure among subjects and poses. To build a robust and effective head pose estimator, we propose a novel Pose-dependent Low-Rank Embedding (PLRE) method, designed to exploit a discriminative subspace that keeps within-pose samples close and between-pose samples far apart. Specifically, low-rank embedding is employed under a multi-task framework, where each subject can naturally be considered one task. Then, two novel terms are incorporated to align the multiple tasks and pursue a better pose-dependent embedding. One is a cross-task alignment term, which constrains each low-rank coefficient to share a similar structure. The other is a pose-dependent graph regularizer, developed to capture the manifold structure of the same pose across different subjects. Experiments are conducted on the CMU-PIE, MIT-CBCL, and extended YaleB databases with different levels of random noise, comparing against six embedding-model-based baselines. The consistently superior results demonstrate the effectiveness of our proposed method.

AAAI Conference 2016 Conference Paper

Robust Multi-View Subspace Learning through Dual Low-Rank Decompositions

  • Zhengming Ding
  • Yun Fu

Multi-view data are highly common nowadays, since various viewpoints and different sensors tend to facilitate better data representation. However, data from different views show a large divergence. Specifically, each sample lies in two kinds of structures, the class structure and the view structure, which are intertwined with one another in the original feature space. To address this, we develop a Robust Multi-view Subspace Learning algorithm (RMSL) through dual low-rank decompositions, which seeks a low-dimensional view-invariant subspace for multi-view data. Through the dual low-rank decompositions, RMSL aims to disentangle the two intertwined structures from each other in the low-dimensional subspace. Furthermore, we develop two novel graph regularizers to guide the dual low-rank decompositions in a supervised fashion. In this way, the semantic gap across different views is mitigated, so that RMSL can preserve more within-class information and reduce the influence of view variance to seek a more robust low-dimensional subspace. Extensive experiments on two multi-view benchmarks, e.g., face and object images, demonstrate the superiority of our proposed algorithm over state-of-the-art algorithms.

AAAI Conference 2016 Conference Paper

Spectral Bisection Tree Guided Deep Adaptive Exemplar Autoencoder for Unsupervised Domain Adaptation

  • Ming Shao
  • Zhengming Ding
  • Handong Zhao
  • Yun Fu

Learning with limited labeled data is always a challenge in AI problems, and one promising way forward is transferring well-established source domain knowledge to the target domain, i.e., domain adaptation. In this paper, we extend deep representation learning to the domain adaptation scenario and propose a novel deep model called the "Deep Adaptive Exemplar AutoEncoder (DAE2)". Different from conventional denoising autoencoders using corrupted inputs, we assign semantics to the input-output pairs of the autoencoders, which allows us to gradually extract discriminant features layer by layer. To this end, we first build a spectral bisection tree to generate source-target data compositions as the training pairs fed to the autoencoders. Second, a low-rank coding regularizer is imposed to ensure the transferability of the learned hidden layer. Finally, a supervised layer is added on top to transform the learned representations into discriminant features. The problem above can be solved iteratively in an EM fashion. Extensive experiments on domain adaptation tasks including object, handwritten digit, and text classification demonstrate the effectiveness of the proposed method.

IJCAI Conference 2015 Conference Paper

Deep Linear Coding for Fast Graph Clustering

  • Ming Shao
  • Sheng Li
  • Zhengming Ding
  • Yun Fu

Clustering has been one of the most critical unsupervised learning techniques and has been widely applied to data mining problems. As one of its branches, graph clustering enjoys popularity due to its appealing performance and strong theoretical support. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Different from conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust despite local distortions. In addition, we use the proposed linear coders as the building blocks of a deep structure to further refine features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time complexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.

IJCAI Conference 2015 Conference Paper

Deep Low-Rank Coding for Transfer Learning

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

Recent research on transfer learning exploits deep structures for discriminative feature representation to tackle cross-domain disparity. However, few such methods jointly perform feature learning and knowledge transfer in a unified deep framework. In this paper, we develop a novel approach, called Deep Low-Rank Coding (DLRC), for transfer learning. Specifically, discriminative low-rank coding is achieved under the guidance of an iterative supervised structure term for each single layer. In this way, both the marginal and conditional distribution gaps between the two domains are mitigated. In addition, a marginalized denoising feature transformation is employed to guarantee that the learned single-layer low-rank coding is robust despite corruptions or noise. Finally, by stacking multiple layers of low-rank codings, we manage to learn robust cross-domain features from coarse to fine. Experimental results on several benchmarks demonstrate the effectiveness of our proposed algorithm in improving recognition performance on the target domain.

AAAI Conference 2014 Conference Paper

Latent Low-Rank Transfer Subspace Learning for Missing Modality Recognition

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

We consider an interesting problem in this paper: using transfer learning in two directions to compensate for missing knowledge in the target domain. Transfer learning is often exploited as a powerful tool to mitigate the discrepancy between different databases used for knowledge transfer. It can also be used for knowledge transfer between different modalities within one database. However, in either case, transfer learning will fail if the target data are missing. To overcome this, we consider knowledge transfer between different databases and modalities simultaneously in a single framework, where missing target data from one database are recovered to facilitate the recognition task. We refer to this framework as the Latent Low-rank Transfer Subspace Learning method (L2TSL). We first propose using a low-rank constraint as well as dictionary learning in a learned subspace to guide knowledge transfer between and within different databases. We then introduce a latent factor to uncover the underlying structure of the missing target data. Next, transfer learning in two directions is proposed to integrate an auxiliary database for transfer learning with missing target data. Experimental results on multi-modality knowledge transfer with missing target data demonstrate that our method can successfully inherit knowledge from the auxiliary database to complete the target domain, and therefore enhance performance when recognizing data from a modality without any training data.