Arrow Research

Author name cluster

Zhengming Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

NeurIPS Conference 2025 Conference Paper

$i$MIND: Insightful Multi-subject Invariant Neural Decoding

  • Zixiang Yin
  • Jiarui Li
  • Zhengming Ding

Decoding visual signals holds an appealing potential to unravel the complexities of cognition and perception. While recent reconstruction tasks leverage powerful generative models to produce high-fidelity images from neural recordings, they often pay limited attention to the underlying neural representations and rely heavily on pretrained priors. As a result, they provide little insight into how individual voxels encode and differentiate semantic content or how these representations vary across subjects. To mitigate this gap, we present an insightful Multi-subject Invariant Neural Decoding ($i$MIND) model, which employs a novel dual-decoding framework (both biometric and semantic decoding) to offer neural interpretability in a data-driven manner and deepen our understanding of brain-based visual functionalities. Our $i$MIND model operates through three core steps: establishing a shared neural representation space across subjects using a ViT-based masked autoencoder, disentangling neural features into complementary subject-specific and object-specific components, and performing dual decoding to support both biometric and semantic classification tasks. Experimental results demonstrate that $i$MIND achieves state-of-the-art decoding performance with minimal scalability limitations. Furthermore, $i$MIND empirically generates voxel-object activation fingerprints that reveal object-specific neural patterns and enable investigation of subject-specific variations in attention to identical stimuli. These findings provide a foundation for more interpretable and generalizable subject-invariant neural decoding, advancing our understanding of voxel semantic selectivity as well as neural vision processing dynamics.

IJCAI Conference 2025 Conference Paper

A Simple yet Effective Hypergraph Clustering Network

  • Qianqian Wang
  • Bowen Zhao
  • Zhengming Ding
  • Xiangdong Zhang
  • Quanxue Gao

Hypergraph clustering has gained significant attention due to its capability of capturing high-order structural information. Among different approaches, contrastive learning-based methods leverage self-supervised learning and data augmentation, exhibiting impressive performance. However, most of them come with the following limitations: 1) augmentation strategies like feature dropout can potentially disrupt the intrinsic clustering structure of hypergraphs; 2) high computational demands hinder their real-world application. To address the above issues, we propose a simple yet effective Hypergraph Clustering Network framework (HCN). Specifically, HCN replaces the hypergraph convolution operation with smoothing preprocessing, which avoids high computational complexity. Besides, to retain the intrinsic structure, it develops two key modules: the self-diagonal consistency module and the structure alignment module. They respectively align the similarity matrix with the identity matrix and the structural affinity matrix, which ensures intra-cluster compactness and inter-cluster separability. Extensive experiments on five benchmark datasets demonstrate HCN's superiority over state-of-the-art methods.
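The smoothing-preprocessing idea the abstract describes can be sketched as a one-off propagation of node features over the normalized hypergraph operator, computed once before clustering rather than inside a trainable convolution. This is an illustrative sketch, not the paper's implementation; the exact operator form and the number of propagation steps `k` are assumptions.

```python
import numpy as np

def hypergraph_smoothing(X, H, k=2):
    """Smoothing preprocessing: propagate node features k times over the
    normalized hypergraph operator, with no learnable convolution involved.
    X: (n_nodes, d) feature matrix; H: (n_nodes, n_edges) incidence matrix."""
    Dv = H.sum(axis=1)                                 # node degrees
    De = H.sum(axis=0)                                 # hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    S = Dv_is @ H @ De_inv @ H.T @ Dv_is               # propagation operator
    for _ in range(k):
        X = S @ X
    return X

# toy hypergraph: nodes {0,1} share one hyperedge, nodes {2,3} share another
H = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
X_smooth = hypergraph_smoothing(X, H, k=2)  # nodes sharing an edge converge
```

After smoothing, features of nodes inside the same hyperedge are pulled together, so a cheap similarity matrix built from them already reflects the high-order structure.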

NeurIPS Conference 2025 Conference Paper

Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

  • Xiaolin Sun
  • Feidi Liu
  • Zhengming Ding
  • Zizhan Zheng

Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is especially a concern in vision-based environments where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, after closer investigation, we found that the effectiveness of the current defenses is due to a fundamental weakness of the existing $l_p$ norm-constrained attacks, which can barely alter the semantics of image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack to go beyond this limitation. Our attack is able to generate perturbed states that are semantically different from the true states while remaining realistic and history-aligned to avoid detection. Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. The results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, indicating the importance of developing more robust policies.

NeurIPS Conference 2025 Conference Paper

Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback

  • Janet Wang
  • Yunbei Zhang
  • Zhengming Ding
  • Jihun Hamm

Paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability cannot be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered as a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, deteriorating the model performance. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal Large Language Models (MLLMs) reveals their strong visual reasoning capabilities, making them adept candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation. Our method creatively translates expert-defined criteria into actionable feedback for image synthesis of DMs, significantly improving clinical accuracy while reducing the direct human workload. Experiments demonstrate that our method greatly improves the clinical quality of synthesized skin disease images, with outputs aligning with dermatologist assessments. Additionally, augmenting training data with these synthesized images improves diagnostic accuracy by +9.02% on a challenging 20-condition skin disease classification task, and by +13.89% in the few-shot setting.
Beyond image synthesis, MAGIC illustrates a task-centric alignment paradigm: instead of adapting MLLMs to niche medical tasks, it adapts tasks to the evaluative strengths of general-purpose MLLMs by decomposing domain knowledge into attribute-level checklists. This design offers a scalable and reliable path for leveraging foundation models in specialized domains.

IJCAI Conference 2025 Conference Paper

Enhanced Unsupervised Discriminant Dimensionality Reduction for Nonlinear Data

  • Qianqian Wang
  • Mengping Jiang
  • Wei Feng
  • Zhengming Ding

Linear Discriminant Analysis (LDA) is a classical supervised dimensionality reduction algorithm. However, LDA focuses more on global structure and overly depends on reliable data labels. For data with outliers and nonlinear structures, LDA cannot effectively capture the true structure of the data. Moreover, the subspace dimension learned by LDA must be smaller than the number of clusters, which limits its practical applications. To address these issues, we propose a novel unsupervised LDA method that combines centerless K-means and LDA. This method eliminates the need to calculate cluster centroids and improves model robustness. By fusing centerless K-means and LDA into a unified framework and deducing the connection between K-means and manifold learning, this method captures both the local manifold structure and the discriminative structure. Additionally, the dimensionality of the subspace is not restricted. This method not only overcomes the limitations of traditional LDA but also improves the model's adaptability to complex data. Extensive experiments on seven datasets demonstrate the effectiveness of the proposed method.
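The "centerless" trick rests on a standard identity: the within-cluster scatter around a centroid equals the sum of pairwise squared distances inside the cluster, scaled by the cluster size, so centroids never need to be computed or updated. A minimal numerical check of that identity (illustrative only, not the paper's full model):

```python
import numpy as np

def wcss_with_centroid(X):
    """Classical within-cluster scatter around the explicit centroid."""
    mu = X.mean(axis=0)
    return float(np.sum((X - mu) ** 2))

def wcss_centerless(X):
    """The same quantity from pairwise distances only -- no centroid needed."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.sum() / (2 * n))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
a, b = wcss_with_centroid(X), wcss_centerless(X)  # identical up to rounding
```

Because the pairwise form depends only on distances between samples, it also plugs naturally into manifold-style affinity formulations, which is the connection the abstract exploits.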

NeurIPS Conference 2025 Conference Paper

PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

  • Taotao Jing
  • Tina Chen
  • Renran Tian
  • Yaobin Chen
  • Joshua Domeyer
  • Heishiro Toyoda
  • Rini Sherony
  • Zhengming Ding

Accurately modeling pedestrian intention and understanding driver decision-making processes are critical for the development of safe and socially aware autonomous driving systems. However, existing datasets primarily emphasize observable behavior, offering limited insight into the underlying causal reasoning that informs human interpretation and response during traffic interactions. To address this gap, we introduce PSI, a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human-annotated textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations offer a unique foundation for developing and benchmarking models that combine predictive performance with interpretable and human-aligned reasoning. PSI supports standardized tasks and evaluation protocols across multiple dimensions, including pedestrian intention prediction, driver decision modeling, reasoning generation, and trajectory forecasting. By enabling causal and interpretable evaluation, PSI advances research toward autonomous systems that can reason, act, and explain in alignment with human cognitive processes.

NeurIPS Conference 2025 Conference Paper

Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation

  • Wei Wang
  • Haifeng Xia
  • Chao Huang
  • Zhengming Ding
  • Cong Wang
  • Haojie Li
  • Xiaochun Cao

In domain adaptation (DA), joint maximum mean discrepancy (JMMD), as a famous distribution-distance metric, aims to measure the joint probability distribution difference between the source domain and target domain. However, it is still not fully explored and is especially hard to apply within a subspace-learning framework, as its empirical estimation involves a tensor-product operator whose partial derivative is difficult to obtain. To solve this issue, we deduce a concise JMMD based on the Representer theorem that avoids the tensor-product operator, and we obtain two essential findings. First, we reveal the uniformity of JMMD by proving that previous marginal, class conditional, and weighted class conditional probability distribution distances are three special cases of JMMD with different label reproducing kernels. Second, inspired by graph embedding, we observe that the similarity weights, which strengthen the intra-class compactness in the graph of the Hilbert-Schmidt independence criterion (HSIC), take opposite signs in the graph of JMMD, revealing why JMMD degrades feature discrimination. This motivates us to propose a novel loss, JMMD-HSIC, which jointly considers JMMD and HSIC to promote the discrimination of JMMD. Extensive experiments on several cross-domain datasets demonstrate the validity of our revealed theoretical results and the effectiveness of our proposed JMMD-HSIC.
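Concretely, the empirical JMMD can be written as an MMD whose kernel is the product of a feature kernel and a label kernel; this product is exactly where the tensor-product operator enters. A small sketch of that estimator (the Gaussian kernel choice and the soft-label representation are assumptions, not the paper's derivation):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def jmmd2(Xs, Ys, Xt, Yt, gamma=1.0):
    """Squared JMMD estimate: an MMD whose joint kernel is the elementwise
    product of a feature kernel and a label kernel (the tensor product)."""
    Kss = rbf(Xs, Xs, gamma) * rbf(Ys, Ys, gamma)
    Ktt = rbf(Xt, Xt, gamma) * rbf(Yt, Yt, gamma)
    Kst = rbf(Xs, Xt, gamma) * rbf(Ys, Yt, gamma)
    return float(Kss.mean() + Ktt.mean() - 2.0 * Kst.mean())

rng = np.random.default_rng(1)
Xs, Ys = rng.normal(size=(30, 2)), rng.normal(size=(30, 1))
same = jmmd2(Xs, Ys, Xs, Ys)           # identical joint samples -> 0
shifted = jmmd2(Xs, Ys, Xs + 3.0, Ys)  # covariate shift -> clearly positive
```

Swapping the label kernel (one-hot, class-weighted, or constant) is what collapses JMMD to the marginal or class-conditional distances mentioned in the first finding.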

ICRA Conference 2025 Conference Paper

RoBiFusion: A Robust and Bidirectional Interaction Camera-LiDAR 3D Object Detection Framework

  • Xubin Wen
  • Haifeng Xia
  • Zhengming Ding
  • Siyu Xia

Camera-LiDAR 3D object detection is currently becoming a crucial component in the field of autonomous driving perception. However, previous models only performed feature fusion in the deep-level BEV hierarchy when dealing with camera-LiDAR feature fusion. This approach lacks interaction with the shallow-level sensor features, which is beneficial in constructing the corresponding BEV features. However, a simple shallow-level feature interaction can introduce sensor noise caused by intrinsic and extrinsic camera calibration errors. To address this, we propose RoBiFusion, a novel camera-LiDAR 3D object detection framework designed for effective sensor feature interaction and mitigating sensor noise interference. This framework consists of three submodules: the Camera-LiDAR Feature Matching module, the LiDAR-to-Camera module, and the Camera-to-LiDAR module. Firstly, in the Camera-LiDAR Feature Matching module, we use the cross-attention module to dynamically match the camera features and the LiDAR features, which solves the problem of feature inconsistency caused by noise in the camera's intrinsic and extrinsic parameters. Secondly, in the LiDAR-to-Camera module, we propose a novel depth representation that can effectively mitigate LiDAR noise interference. Thirdly, in the Camera-to-LiDAR module, we introduce deformable attention to help LiDAR features capture instance-level semantic features. Additionally, we design a novel differentiable and efficient grid sample module to accelerate the process, since the bilinear grid sample module in deformable attention is time-consuming and not deployment-friendly. We compared RoBiFusion to the state-of-the-art BEVFusion on the nuScenes dataset and found that RoBiFusion surpasses BEVFusion by 1.5% mAP and 2.4% NDS. Furthermore, we designed a series of ablation experiments to verify the effectiveness of the aforementioned modules.
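The grid-sample bottleneck the authors replace is, at its core, bilinear interpolation of a feature map at continuous coordinates. Below is a minimal NumPy sketch of the plain reference operation only; the paper's module is a faster, deployment-friendly variant whose internals are not shown here.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2D map at a continuous (x, y) location with bilinear weights.
    Assumes (x + 1, y + 1) stays inside the map; no padding handling."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x1]
            + (1 - wx) * wy * img[y1, x0] + wx * wy * img[y1, x1])

img = np.arange(16, dtype=float).reshape(4, 4)
center = bilinear_sample(img, 1.5, 1.5)  # mean of the 4 surrounding pixels
exact = bilinear_sample(img, 2.0, 1.0)   # lands exactly on img[1, 2]
```

Deformable attention issues many such fractional-coordinate lookups per query, which is why an efficient sampling kernel dominates the runtime budget.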

AAAI Conference 2025 Conference Paper

Supportive Negatives Spectral Augmentation for Source-Free Cross-Domain Segmentation

  • Kexin Zheng
  • Haifeng Xia
  • Siyu Xia
  • Ming Shao
  • Zhengming Ding

Source-free domain adaptation (SFDA) aims to transfer knowledge from the well-trained source model and adapt it to the target data distribution. SFDA methods are suitable for the medical image segmentation task due to their data-privacy protection and achieve promising performance. However, cross-domain distribution shift makes it difficult for the adapted model to provide accurate decisions on several hard instances and negatively affects model generalization. To overcome this limitation, a novel method, `supportive negatives spectral augmentation' (SNSA), is presented in this work. Concretely, SNSA includes an instance selection mechanism to automatically discover a few hard samples for which the source model produces incorrect predictions, and an active learning strategy is then adopted to re-calibrate their predictive masks. Moreover, SNSA deploys spectral augmentation between hard instances and others to encourage the source model to gradually capture and adapt to the attributes of the target distribution. Considerable experimental studies demonstrate that annotating merely 4%~5% of negative instances from the target domain significantly improves segmentation performance over previous methods.

ICML Conference 2025 Conference Paper

Unified K-Means Clustering with Label-Guided Manifold Learning

  • Qianqian Wang 0001
  • Mengping Jiang
  • Zhengming Ding
  • Quanxue Gao

K-Means clustering is a classical and effective unsupervised learning method attributed to its simplicity and efficiency. However, it faces notable challenges, including sensitivity to random initial centroid selection, a limited ability to discover the intrinsic manifold structures within nonlinear datasets, and difficulty in achieving balanced clustering in practical scenarios. To overcome these weaknesses, we introduce a novel framework for K-Means that leverages manifold learning. This approach eliminates the need for centroid calculation and utilizes a cluster indicator matrix to align the manifold structures, thereby enhancing clustering accuracy. Beyond the traditional Euclidean distance, our model incorporates Gaussian kernel distance, K-nearest neighbor distance, and low-pass filtering distance to effectively manage data that is not linearly separable. Furthermore, we introduce a balanced regularizer to achieve balanced clustering results. The detailed experimental results demonstrate the efficacy of our proposed methodology.

IROS Conference 2023 Conference Paper

IDA: Informed Domain Adaptive Semantic Segmentation

  • Zheng Chen 0016
  • Zhengming Ding
  • Jason M. Gregory
  • Lantao Liu

Mixup-based data augmentation has been validated to be a critical stage in the self-training framework for unsupervised domain adaptive semantic segmentation (UDA-SS), which aims to transfer knowledge from a well-annotated (source) domain to an unlabeled (target) domain. Existing self-training methods usually adopt the popular region-based mixup techniques with a random sampling strategy, which unfortunately ignores the dynamic evolution of different semantics across various domains as training proceeds. To improve the UDA-SS performance, we propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes the data based on class-level segmentation performance, which aims to emphasize small-region semantics during mixup. In our IDA model, the class-level performance is tracked by an expected confidence score (ECS). We then use a dynamic schedule to determine the mixing ratio for data in different domains. Extensive experimental results reveal that our proposed method is able to outperform the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to Cityscapes. Code link: https://github.com/ArlenCHEN/IDA.git
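The class-level tracking can be sketched as an exponential moving average of per-class confidence, with mixup sampling biased toward low-ECS (poorly learned, often small-region) classes. The EMA form, the softmax temperature, and all names here are illustrative assumptions, not the paper's exact schedule.

```python
import numpy as np

def update_ecs(ecs, batch_conf, momentum=0.9):
    """Track per-class expected confidence as an exponential moving average."""
    return momentum * ecs + (1 - momentum) * batch_conf

def class_mix_probs(ecs, temperature=1.0):
    """Bias mixup class sampling toward low-confidence classes."""
    w = np.exp((1.0 - ecs) / temperature)
    return w / w.sum()

ecs = np.array([0.9, 0.5, 0.2])                  # e.g. road, sign, pole
ecs = update_ecs(ecs, np.array([0.8, 0.6, 0.3])) # fold in a new batch
probs = class_mix_probs(ecs)                     # hardest class sampled most
```

The EMA smooths noisy per-batch confidences so the schedule reacts to genuine trends rather than single-batch fluctuations.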

IJCAI Conference 2023 Conference Paper

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

  • Qucheng Peng
  • Zhengming Ding
  • Lingjuan Lyu
  • Lichao Sun
  • Chen Chen

Source-free domain adaptation transfers the source-trained model to the target domain without exposing the source data, dispelling concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. Hence, the black-box setting allows using only the outputs of the source model, yet it suffers even more severely from overfitting on the source domain because the source model's weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for black-box domain adaptation with both input-level and network-level regularization. At the input level, we design a new data augmentation technique, Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. At the network level, we develop a Subnetwork Distillation mechanism to transfer knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.
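Frequency-domain mixing of this kind typically exploits the fact that an image's phase spectrum carries most of its semantic content. The sketch below interpolates amplitude spectra while keeping one image's phase; whether this matches the paper's exact Phase MixUp formulation is an assumption, so treat it only as the general family of technique.

```python
import numpy as np

def phase_mixup(x1, x2, lam=0.7):
    """Interpolate amplitude spectra while keeping x1's phase spectrum.
    Phase is widely held to carry image semantics, so the mix preserves
    the task-relevant content of x1 while perturbing its style."""
    F1, F2 = np.fft.fft2(x1), np.fft.fft2(x2)
    amp = lam * np.abs(F1) + (1 - lam) * np.abs(F2)
    mixed = np.fft.ifft2(amp * np.exp(1j * np.angle(F1)))
    return np.real(mixed)

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 8))
self_mix = phase_mixup(x, x, lam=0.3)  # mixing an image with itself is identity
```

Because only amplitudes are interpolated, the augmented sample keeps x1's object layout, which is what makes the interpolation "highlight task-relevant objects."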

AAAI Conference 2023 Conference Paper

TrEP: Transformer-Based Evidential Prediction for Pedestrian Intention with Uncertainty

  • Zhengming Zhang
  • Renran Tian
  • Zhengming Ding

With rapid development in hardware (sensors and processors) and AI algorithms, automated driving techniques have entered the public's daily life and achieved great success in supporting human driving performance. However, due to the high contextual variations and temporal dynamics in pedestrian behaviors, the interaction between autonomous-driving cars and pedestrians remains challenging, impeding the development of fully autonomous driving systems. This paper focuses on predicting pedestrian intention with a novel transformer-based evidential prediction (TrEP) algorithm. We develop a transformer module to capture the temporal correlations among the input features within pedestrian video sequences, and a deep evidential learning model to capture the AI uncertainty under scene complexities. Experimental results on three popular pedestrian intent benchmarks have verified the effectiveness of our proposed model over the state-of-the-art. The algorithm performance can be further boosted by controlling the uncertainty level. We systematically compare human disagreements with AI uncertainty to further evaluate AI performance in confusing scenes. The code is released at https://github.com/zzmonlyyou/TrEP.git.
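Deep evidential learning for classification is commonly parameterized with a Dirichlet distribution: non-negative evidence shifts the Dirichlet concentrations, and the total evidence yields a closed-form uncertainty mass. This is a generic sketch of that family (the ReLU evidence head is a common choice and an assumption here, not necessarily TrEP's).

```python
import numpy as np

def evidential_head(logits):
    """Map raw outputs to Dirichlet parameters and a closed-form uncertainty.
    Evidence e >= 0 (ReLU), alpha = e + 1; the uncertainty mass
    u = K / sum(alpha) shrinks as total evidence grows."""
    evidence = np.maximum(logits, 0.0)
    alpha = evidence + 1.0
    K = alpha.shape[-1]
    prob = alpha / alpha.sum(-1, keepdims=True)  # expected class probabilities
    u = K / alpha.sum(-1)
    return prob, u

p0, u0 = evidential_head(np.zeros(3))                 # no evidence: uniform, u = 1
p1, u1 = evidential_head(np.array([10.0, 0.0, 0.0]))  # strong class-0 evidence
```

Thresholding `u` is what lets the model defer on confusing scenes, which is how "controlling the uncertainty level" can boost effective accuracy.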

IJCAI Conference 2022 Conference Paper

Adversarial Bi-Regressor Network for Domain Adaptive Regression

  • Haifeng Xia
  • Pu Wang
  • Toshiaki Koike-Akino
  • Ye Wang
  • Philip Orlik
  • Zhengming Ding

Domain adaptation (DA) aims to transfer the knowledge of a well-labeled source domain to facilitate unlabeled target learning. When turning to specific tasks such as indoor (Wi-Fi) localization, it is essential to learn a cross-domain regressor to mitigate the domain shift. This paper proposes a novel method, Adversarial Bi-Regressor Network (ABRNet), to seek a more effective cross-domain regression model. Specifically, a discrepant bi-regressor architecture is developed to maximize the difference between the two regressors and thereby discover uncertain target instances far from the source distribution, and an adversarial training mechanism is then adopted between the feature extractor and the dual regressors to produce domain-invariant representations. To further bridge the large domain gap, a domain-specific augmentation module is designed to synthesize two intermediate domains, one source-similar and one target-similar, to gradually eliminate the original domain mismatch. The empirical studies on two cross-domain regressive benchmarks illustrate the power of our method in solving the domain adaptive regression (DAR) problem.

AAAI Conference 2022 Conference Paper

Cross-Domain Collaborative Normalization via Structural Knowledge

  • Haifeng Xia
  • Zhengming Ding

Batch Normalization (BN) as an important component assists Deep Neural Networks in achieving promising performance for extensive learning tasks by scaling distribution of feature representations within mini-batches. However, the application of BN suffers from performance degradation under the scenario of Unsupervised Domain Adaptation (UDA), since the estimated statistics fail to concurrently describe two different domains. In this paper, we develop a novel normalization technique, named Collaborative Normalization (CoN), for eliminating domain discrepancy and accelerating the model training of neural networks for UDA. Unlike typical strategies only exploiting domain-specific statistics during normalization, our CoN excavates cross-domain knowledge and simultaneously scales features from various domains by mimicking the merits of collaborative representation. Our CoN can be easily plugged into popular neural network backbones for cross-domain learning. On the one hand, theoretical analysis guarantees that models with CoN promote discriminability of feature representations and accelerate convergence rate; on the other hand, empirical study verifies that replacing BN with CoN in popular network backbones effectively improves classification accuracy in most learning tasks across three cross-domain visual benchmarks.
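The core mechanism can be sketched as normalizing a mini-batch with blended cross-domain statistics instead of purely domain-specific ones. The convex-combination form and the `alpha` parameter below are illustrative assumptions rather than CoN's exact collaborative rule.

```python
import numpy as np

def collaborative_norm(x, src_mean, src_var, alpha=0.5, eps=1e-5):
    """Normalize a target mini-batch (n, C) with blended cross-domain
    statistics rather than purely domain-specific ones."""
    t_mean, t_var = x.mean(axis=0), x.var(axis=0)
    mean = alpha * src_mean + (1 - alpha) * t_mean
    var = alpha * src_var + (1 - alpha) * t_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=(64, 4))
# alpha = 0 recovers plain per-batch (BN-style) standardization
out = collaborative_norm(x, np.zeros(4), np.ones(4), alpha=0.0)
```

Because both domains are scaled toward shared statistics, their feature distributions are pulled together at every normalization layer rather than only at the final loss.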

AAAI Conference 2021 Conference Paper

Balanced Open Set Domain Adaptation via Centroid Alignment

  • Mengmeng Jing
  • Jingjing Li
  • Lei Zhu
  • Zhengming Ding
  • Ke Lu
  • Yang Yang

Open Set Domain Adaptation (OSDA) is a challenging domain adaptation setting which allows the existence of unknown classes on the target domain. Although existing OSDA methods are good at classifying samples of known classes, they ignore the classification ability for the unknown samples, making them unbalanced OSDA methods. To alleviate this problem, we propose a balanced OSDA method which can recognize the unknown samples while maintaining high classification performance for the known samples. Specifically, to reduce the domain gaps, we first project the features to a hyperspherical latent space. In this space, we propose to bound the centroid deviation angles to not only increase the intra-class compactness but also enlarge the inter-class margins. With the bounded centroid deviation angles, we employ the statistical Extreme Value Theory to recognize the unknown samples that are misclassified into known classes. In addition, to learn better centroids, we propose an improved centroid update strategy based on sample reweighting and adaptive update rate to cooperate with centroid alignment. Experimental results on three OSDA benchmarks verify that our method can significantly outperform the compared methods and reduce the proportion of the unknown samples being misclassified into known classes.
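The angular test underlying this can be sketched as: normalize features onto the hypersphere, measure the deviation angle to each known-class centroid, and flag a sample as unknown when even its nearest centroid is beyond a bound. The fixed threshold below stands in for the paper's Extreme Value Theory tail model, so this is only the geometric skeleton.

```python
import numpy as np

def deviation_angle(z, c):
    """Angle (degrees) between a feature and a class centroid on the sphere."""
    z, c = z / np.linalg.norm(z), c / np.linalg.norm(c)
    return float(np.degrees(np.arccos(np.clip(z @ c, -1.0, 1.0))))

def is_unknown(z, centroids, bound=40.0):
    """Flag a sample as unknown if even the nearest centroid exceeds the bound."""
    return min(deviation_angle(z, c) for c in centroids) > bound

centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
known = is_unknown(np.array([1.0, 0.05]), centroids)   # ~2.9 deg  -> known
novel = is_unknown(np.array([-1.0, -1.0]), centroids)  # 135 deg   -> unknown
```

Working on the hypersphere makes the decision depend only on direction, so domain-specific magnitude differences drop out before the angular bound is applied.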

AAAI Conference 2021 Conference Paper

Generative Partial Visual-Tactile Fused Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Jiahua Dong
  • Yuyang Liu
  • Zhengming Ding

Visual-tactile fused sensing for object clustering has achieved significant progress recently, since the involvement of the tactile modality can effectively improve clustering performance. However, missing-data (i.e., partial data) issues always happen due to occlusion and noise during the data collecting process. This issue is not well solved by most existing partial multi-view clustering methods because of the heterogeneous-modality challenge, and naively employing these methods would inevitably induce a negative effect and further hurt the performance. To solve the mentioned challenges, we propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering. More specifically, we first extract features from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other, which can compensate for missing samples and align the visual and tactile modalities naturally by adversarial learning. Finally, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.

NeurIPS Conference 2021 Conference Paper

Implicit Semantic Response Alignment for Partial Domain Adaptation

  • Wenxiao Xiao
  • Zhengming Ding
  • Hongfu Liu

Partial Domain Adaptation (PDA) addresses the unsupervised domain adaptation problem where the target label space is a subset of the source label space. Most state-of-the-art PDA methods tackle the inconsistent label space by assigning weights to classes or individual samples, in an attempt to discard the source data that belongs to the irrelevant classes. However, we believe samples from those extra categories would still contain valuable information to promote positive transfer. In this paper, we propose the Implicit Semantic Response Alignment to explore the intrinsic relationships among different categories by applying a weighted schema on the feature level. Specifically, we design a class2vec module to extract the implicit semantic topics from the visual features. With an attention layer, we calculate the semantic response according to each implicit semantic topic. Then semantic responses of source and target data are aligned to retain the relevant information contained in multiple categories by weighting the features, instead of samples. Experiments on several cross-domain benchmark datasets demonstrate the effectiveness of our method over the state-of-the-art PDA methods. Moreover, we provide in-depth analyses to further explore implicit semantic alignment.

ECAI Conference 2020 Conference Paper

Adaptive Local Neighbors for Transfer Discriminative Feature Learning

  • Wei Wang 0335
  • Zhihui Wang 0001
  • Haojie Li
  • Juan Zhou
  • Zhengming Ding

In Domain Adaptation (DA), how to reduce the distributional differences across domains and preserve the data structures are two critical issues in obtaining domain-invariant features. Existing DA methods either preserve the Local Manifold Structure (LMS) or the Global Discriminative Consistency (GDC), but fail to take the two into account simultaneously. Therefore, the extracted features are either short of discriminative ability or sensitive to multimodally distributed data. Moreover, the local neighbor relationships among data points are mostly established in the original data space, which is unreliable, especially for data with large noise. To this end, this paper proposes a novel DA approach, i.e., Adaptive Local Neighbors for Transfer Discriminative Feature Learning, to leverage LMS and GDC in a unified transfer feature learning model, where we only focus on the GDC between local neighbors, so that the extracted features are more discriminative and robust to multimodally distributed data. Moreover, the data points' local neighbors are revealed adaptively in the learned subspace, so the model is insensitive to data noise. Compared with the state-of-the-art methods, the proposed approach achieves higher performance on different cross-domain image classification tasks, with a notable 3.0% improvement on the Office10+Caltech10 dataset.

AAAI Conference 2020 Conference Paper

Bi-Directional Generation for Unsupervised Domain Adaptation

  • Guanglei Yang
  • Haifeng Xia
  • Mingli Ding
  • Zhengming Ding

Unsupervised domain adaptation facilitates the unlabeled target domain relying on well-established source domain information. Conventional methods that forcefully reduce the domain discrepancy in the latent space destroy the intrinsic data structure. To balance the mitigation of the domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers, interpolating two intermediate domains to bridge the source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency on target sample prediction. Extensive experiments verify that our proposed model outperforms the state-of-the-art on standard cross-domain visual benchmarks.

AAAI Conference 2020 Conference Paper

Domain Conditioned Adaptation Network

  • Shuang Li
  • Chi Liu
  • Qiuxia Lin
  • Binhui Xie
  • Zhengming Ding
  • Gao Huang
  • Jian Tang

Tremendous research efforts have been made to advance deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distributions differ to a large extent. In this paper, we relax the shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge can be explored appropriately. As far as we know, this is the first work to explore domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across the two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.
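A domain-conditioned channel attention can be sketched as a squeeze-and-excitation style gate whose MLP also sees a domain embedding, so source and target inputs can excite different channels. The shapes, the concatenation, and the two-layer MLP here are illustrative assumptions, not DCAN's published architecture.

```python
import numpy as np

def domain_conditioned_gate(feat, domain_emb, W1, W2):
    """Squeeze-and-excitation style channel gate conditioned on a domain
    embedding. feat: (C,) globally pooled channel descriptor."""
    h = np.maximum(W1 @ np.concatenate([feat, domain_emb]), 0.0)  # ReLU
    gate = 1.0 / (1.0 + np.exp(-(W2 @ h)))                        # per-channel sigmoid
    return gate * feat                                            # re-scaled channels

rng = np.random.default_rng(4)
C, D, Hd = 8, 4, 16                       # channels, domain-emb dim, hidden dim
W1 = 0.1 * rng.normal(size=(Hd, C + D))   # small weights keep gates unsaturated
W2 = 0.1 * rng.normal(size=(C, Hd))
out = domain_conditioned_gate(np.ones(C), rng.normal(size=D), W1, W2)
```

Feeding the same pooled descriptor with two different domain embeddings yields two different gates, which is the "domain-wise channel activation" idea in miniature.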

IJCAI Conference 2018 Conference Paper

Adaptive Graph Guided Embedding for Multi-label Annotation

  • Lichen Wang
  • Zhengming Ding
  • Yun Fu

Multi-label annotation is challenging since a large amount of well-labeled training data is required to achieve promising performance. However, providing such data is expensive, while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which utilizes limited labeled data together with large-scale unlabeled data to improve learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be assigned the correct multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenarios. Experimental results demonstrate that our approach outperforms other state-of-the-art methods.
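The propagation scheme builds on standard graph-based label propagation. The numpy sketch below shows only that generic building block (the closed-form solution in the style of Zhou et al.), not the adaptive graph or the jointly learned embedding that AG2E adds on top:

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.5):
    """Closed-form graph label propagation: F* = (1 - alpha) (I - alpha S)^{-1} Y.

    S : (n, n) normalized affinity matrix over labeled + unlabeled points
    Y : (n, k) initial label matrix (all-zero rows for unlabeled points)
    """
    n = S.shape[0]
    # Solving the linear system is cheaper and more stable than explicit inversion.
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
```

On a toy graph with two disconnected pairs and one labeled node per pair, the unlabeled node in each pair inherits its neighbor's label.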

AAAI Conference 2018 Conference Paper

Discriminative Semi-Coupled Projective Dictionary Learning for Low-Resolution Person Re-Identification

  • Kai Li
  • Zhengming Ding
  • Sheng Li
  • Yun Fu

Person re-identification (re-ID) is a fundamental task in automated video surveillance. In real-world visual surveillance systems, a person is often captured at quite low resolution, so we often need to perform low-resolution person re-ID, where images captured by different cameras have great resolution divergence. Existing methods cope with this problem via complicated and time-consuming strategies, which makes them less favorable in practice, and their performance is far from satisfactory. In this paper, we design a novel Discriminative Semi-coupled Projective Dictionary Learning (DSPDL) model to solve this problem effectively and efficiently. Specifically, we propose to jointly learn a pair of dictionaries and a mapping to bridge the gap between low(er)- and high(er)-resolution person images. Besides, we develop a novel graph regularizer to incorporate positive and negative image-pair information in a parameterless fashion. Meanwhile, we adopt the efficient and powerful projective dictionary learning technique to boost efficiency. Experiments on three public datasets show the superiority of the proposed method over state-of-the-art ones.

AAAI Conference 2018 Conference Paper

Latent Discriminant Subspace Representations for Multi-View Outlier Detection

  • Kai Li
  • Sheng Li
  • Zhengming Ding
  • Weidong Zhang
  • Yun Fu

Identifying multi-view outliers is challenging because of the complex data distributions across different views. Existing methods cope with this problem by exploiting pairwise constraints across different views to obtain new feature representations, based on which certain outlier score measurements are defined. Due to the use of pairwise constraints, it is complicated and time-consuming for existing methods to detect outliers from three or more views. In this paper, we propose a novel method capable of detecting outliers from any number of data views. Our method first learns latent discriminant representations for all view data and defines a novel outlier score function based on these latent discriminant representations. Specifically, we represent multi-view data by a global low-rank representation shared by all views and residual representations specific to each view. By analyzing the view-specific residual representations of all views, we can obtain the outlier score for every sample. Moreover, we raise the problem of detecting a third type of multi-view outlier that is neglected by existing methods. Experiments on six datasets show our method outperforms the existing ones in identifying all types of multi-view outliers, often by large margins.

AAAI Conference 2018 Conference Paper

Learning Transferable Subspace for Human Motion Segmentation

  • Lichen Wang
  • Zhengming Ding
  • Yun Fu

Temporal data clustering is a challenging task. Existing methods usually explore a data self-representation strategy, which may hinder clustering performance in insufficient or corrupted data scenarios. In real-world applications, a large amount of related labeled data is often easily accessible. To this end, we propose a novel transferable subspace clustering approach that explores useful information from relevant source data to enhance clustering performance on target temporal data. We transform the original data into a shared low-dimensional and distinctive feature space by jointly seeking an effective domain-invariant projection. In this way, the well-labeled source knowledge helps obtain a more discriminative target representation. Moreover, a graph regularizer is designed to incorporate temporal information and preserve more sequence knowledge in the learned representation. Extensive experiments on three human motion datasets illustrate that our approach outperforms state-of-the-art temporal data clustering methods.

IJCAI Conference 2018 Conference Paper

Robust Multi-view Representation: A Unified Perspective from Multi-view Learning to Domain Adaption

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

Multi-view data are extensively accessible nowadays thanks to various types of features, different viewpoints, and sensors, which tend to facilitate better representation in many key applications. This survey covers the topic of robust multi-view data representation, centered around several major visual applications. First, we formulate a unified learning framework that can model most existing multi-view learning and domain adaptation methods in this line. Following this, we conduct a comprehensive discussion across these two problems by reviewing the algorithms on both topics, including multi-view clustering, multi-view classification, zero-shot learning, and domain adaptation. We further present more practical challenges in multi-view data analysis. Finally, we discuss future research directions, including incomplete, unbalanced, and large-scale multi-view learning. This review, spanning the literature through future directions, would benefit the AI community.

AAAI Conference 2017 Conference Paper

Feature Selection Guided Auto-Encoder

  • Shuyang Wang
  • Zhengming Ding
  • Yun Fu

Recently, the auto-encoder and its variants have demonstrated promising results in extracting effective features. Specifically, its basic idea of encouraging the output to be as similar as possible to the input ensures that the learned representation can faithfully reconstruct the input data. However, a problem arises in that not all hidden units are useful for compressing discriminative information, while many units mainly represent task-irrelevant patterns. In this paper, we propose a novel algorithm, Feature Selection Guided Auto-Encoder, a unified generative model that integrates feature selection and the auto-encoder. Our proposed algorithm can thus distinguish task-relevant units from task-irrelevant ones to obtain the most effective features for future classification tasks. Our model not only performs feature selection on the learned high-level features, but also dynamically drives the auto-encoder to produce more discriminative units. Experiments on several benchmarks demonstrate our method's superiority over state-of-the-art approaches.

IJCAI Conference 2017 Conference Paper

From Ensemble Clustering to Multi-View Clustering

  • Zhiqiang Tao
  • Hongfu Liu
  • Sheng Li
  • Zhengming Ding
  • Yun Fu

Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noise in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. In this way, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, the spectral ensemble clustering task is also incorporated in our framework with a carefully designed constraint, making MVEC a unified optimization framework for achieving the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.
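The low-rank-plus-sparse step can be sketched with a simplified alternating scheme: singular-value thresholding recovers the low-rank consensus part, and soft thresholding isolates sparse, view-specific noise. This is a generic RPCA-style stand-in, not MVEC's exact optimization, and the thresholds `tau` and `lam` are illustrative:

```python
import numpy as np

def lowrank_sparse_split(M, lam=0.1, tau=1.0, n_iter=50):
    """Split M into L (low-rank part) + S (sparse noise) by alternating
    singular-value thresholding and elementwise soft thresholding."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Shrink singular values of (M - S) to get the low-rank part.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - tau, 0.0)) @ Vt
        # Soft-threshold the residual to get the sparse part.
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```

On a rank-one matrix with a few large corrupted entries, the scheme leaves only a small dense residual (bounded by `lam` per entry) unexplained.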

AAAI Conference 2017 Conference Paper

Multi-View Clustering via Deep Matrix Factorization

  • Handong Zhao
  • Zhengming Ding
  • Yun Fu

Multi-View Clustering (MVC) has garnered more attention recently since many real-world data are composed of different representations or views. The key is to explore complementary information to benefit the clustering problem. In this paper, we present a deep matrix factorization framework for MVC, where semi-nonnegative matrix factorization is adopted to learn the hierarchical semantics of multi-view data in a layerwise fashion. To maximize the mutual information from each view, we enforce the nonnegative representation of each view in the final layer to be the same. Furthermore, to respect the intrinsic geometric structure of each view's data, graph regularizers are introduced to couple the output representations of the deep structures. As a non-trivial contribution, we provide a solution based on an alternating minimization strategy, followed by a theoretical proof of convergence. Superior experimental results on three face benchmarks show the effectiveness of the proposed deep matrix factorization model.
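The layerwise building block is a single semi-NMF factorization, X ≈ ZH with H ≥ 0 and Z unconstrained; a deep version factorizes H again at the next layer. The numpy sketch below uses the standard multiplicative updates of Ding et al. for one layer and is illustrative rather than the paper's implementation:

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """One semi-NMF layer: X ≈ Z @ H with H >= 0 and Z unconstrained.
    Uses least-squares updates for Z and multiplicative updates for H."""
    rng = np.random.default_rng(0)
    H = np.abs(rng.standard_normal((k, X.shape[1])))  # nonnegative init
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)                     # closed-form Z update
        A, B = Z.T @ X, Z.T @ Z
        Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2   # positive/negative parts
        Bp, Bn = (np.abs(B) + B) / 2, (np.abs(B) - B) / 2
        H *= np.sqrt((Ap + Bn @ H) / (An + Bp @ H + eps))   # keeps H >= 0
    return Z, H
```

Stacking layers (factorizing H into Z₂H₂, and so on) yields the hierarchical representation; the final-layer H is what the multi-view framework ties across views.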

IJCAI Conference 2016 Conference Paper

Coupled Marginalized Auto-Encoders for Cross-Domain Multi-View Learning

  • Shuyang Wang
  • Zhengming Ding
  • Yun Fu

In cross-domain learning, a more challenging problem arises when the domain divergence involves more than one dominant factor, e.g., different viewpoints, various resolutions, and changing illumination. Fortunately, an intermediate domain can often be found to build a bridge between them and facilitate the learning problem. In this paper, we propose a Coupled Marginalized Denoising Auto-encoders framework to address this cross-domain problem. Specifically, we design two marginalized denoising auto-encoders, one for the target and the other for the source and the intermediate domain. To better couple the learning of the two denoising auto-encoders, we incorporate a feature mapping that transfers knowledge between the intermediate domain and the target one. Furthermore, a maximum margin criterion, i.e., intra-class compactness and an inter-class penalty, is imposed on the output layer to seek more discriminative features across different domains. Extensive experiments on two tasks demonstrate the superiority of our method over state-of-the-art methods.
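A marginalized denoising auto-encoder layer admits a closed-form solution: instead of feeding explicitly corrupted inputs, the reconstruction loss is marginalized over dropout corruption (the mDA/mSDA formulation of Chen et al.). The numpy sketch below shows that generic single-layer solution only; the coupling feature mapping and the maximum margin criterion from the paper are omitted:

```python
import numpy as np

def marginalized_da(X, p=0.5):
    """Closed-form marginalized denoising auto-encoder layer (mDA style).

    X : (d, n) data matrix, columns are samples
    p : per-feature dropout (corruption) probability
    Returns nonlinear hidden features of the same shape as X.
    """
    Xb = np.vstack([X, np.ones((1, X.shape[1]))])   # append a bias row
    d = Xb.shape[0]
    q = np.full(d, 1.0 - p); q[-1] = 1.0            # bias feature is never corrupted
    S = Xb @ Xb.T                                   # scatter matrix
    Q = S * np.outer(q, q)                          # E[x_tilde x_tilde^T]
    np.fill_diagonal(Q, q * np.diag(S))             # diagonal survives with prob q_i
    P = S[:-1, :] * q                               # E[x x_tilde^T], bias output dropped
    W = np.linalg.solve(Q.T, P.T).T                 # W = P Q^{-1}
    return np.tanh(W @ Xb)                          # nonlinear hidden representation
```

Because the expectation over corruptions is computed analytically, the layer is trained by one linear solve rather than stochastic gradient descent, which is what makes stacking and coupling such auto-encoders cheap.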

AAAI Conference 2016 Conference Paper

Pose-Dependent Low-Rank Embedding for Head Pose Estimation

  • Handong Zhao
  • Zhengming Ding
  • Yun Fu

Head pose estimation via embedding models has demonstrated its effectiveness in recent works. However, most previous methods focus only on the manifold relationship among poses, while overlooking the underlying global structure among subjects and poses. To build a robust and effective head pose estimator, we propose a novel Pose-dependent Low-Rank Embedding (PLRE) method, designed to exploit a discriminative subspace that keeps within-pose samples close and between-pose samples far apart. Specifically, low-rank embedding is employed under a multi-task framework, where each subject can naturally be considered one task. Then, two novel terms are incorporated to align the multiple tasks and pursue a better pose-dependent embedding. One is a cross-task alignment term, which constrains each low-rank coefficient to share a similar structure. The other is a pose-dependent graph regularizer, developed to capture the manifold structure of the same pose across different subjects. Experiments are conducted on the CMU-PIE, MIT-CBCL, and extended YaleB databases with different levels of random noise, comparing against six embedding-model-based baselines. The consistently superior results demonstrate the effectiveness of our proposed method.

AAAI Conference 2016 Conference Paper

Robust Multi-View Subspace Learning through Dual Low-Rank Decompositions

  • Zhengming Ding
  • Yun Fu

Multi-view data are highly common nowadays, since various viewpoints and different sensors tend to facilitate better data representation. However, data from different views show a large divergence. Specifically, each sample lies in two kinds of structures, the class structure and the view structure, which are intertwined with one another in the original feature space. To address this, we develop a Robust Multi-view Subspace Learning algorithm (RMSL) through dual low-rank decompositions, which seeks a low-dimensional view-invariant subspace for multi-view data. Through the dual low-rank decompositions, RMSL aims to disentangle the two intertwined structures from each other in the low-dimensional subspace. Furthermore, we develop two novel graph regularizers to guide the dual low-rank decompositions in a supervised fashion. In this way, the semantic gap across different views is mitigated, so that RMSL can preserve more within-class information and reduce the influence of view variance to seek a more robust low-dimensional subspace. Extensive experiments on two multi-view benchmarks, e.g., face and object images, demonstrate the superiority of our proposed algorithm over state-of-the-art algorithms.

AAAI Conference 2016 Conference Paper

Spectral Bisection Tree Guided Deep Adaptive Exemplar Autoencoder for Unsupervised Domain Adaptation

  • Ming Shao
  • Zhengming Ding
  • Handong Zhao
  • Yun Fu

Learning with limited labeled data is always a challenge in AI problems, and one promising way forward is transferring well-established source domain knowledge to the target domain, i.e., domain adaptation. In this paper, we extend deep representation learning to the domain adaptation scenario and propose a novel deep model called the "Deep Adaptive Exemplar AutoEncoder (DAE2)". Different from conventional denoising autoencoders using corrupted inputs, we assign semantics to the input-output pairs of the autoencoders, which allows us to gradually extract discriminant features layer by layer. To this end, we first build a spectral bisection tree to generate source-target data compositions as the training pairs fed to the autoencoders. Second, a low-rank coding regularizer is imposed to ensure the transferability of the learned hidden layer. Finally, a supervised layer is added on top to transform the learned representations into discriminant features. The problem above can be solved iteratively in an EM fashion. Extensive experiments on domain adaptation tasks including object, handwritten digit, and text classification demonstrate the effectiveness of the proposed method.

IJCAI Conference 2015 Conference Paper

Deep Linear Coding for Fast Graph Clustering

  • Ming Shao
  • Sheng Li
  • Zhengming Ding
  • Yun Fu

Clustering has been one of the most critical unsupervised learning techniques and has been widely applied to data mining problems. As one of its branches, graph clustering enjoys popularity due to its appealing performance and strong theoretical support. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Different from conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust despite local distortions. In addition, we use the proposed linear coders as the building blocks of a deep structure to further refine features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time complexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.

IJCAI Conference 2015 Conference Paper

Deep Low-Rank Coding for Transfer Learning

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

Recent research on transfer learning exploits deep structures for discriminative feature representation to tackle cross-domain disparity. However, few such methods jointly perform feature learning and knowledge transfer in a unified deep framework. In this paper, we develop a novel approach, called Deep Low-Rank Coding (DLRC), for transfer learning. Specifically, discriminative low-rank coding is achieved under the guidance of an iterative supervised structure term for each single layer. In this way, both the marginal and conditional distribution gaps between the two domains are mitigated. In addition, a marginalized denoising feature transformation is employed to guarantee that the learned single-layer low-rank coding is robust despite corruptions or noise. Finally, by stacking multiple layers of low-rank codings, we manage to learn robust cross-domain features from coarse to fine. Experimental results on several benchmarks demonstrate the effectiveness of our proposed algorithm in improving recognition performance on the target domain.

AAAI Conference 2014 Conference Paper

Latent Low-Rank Transfer Subspace Learning for Missing Modality Recognition

  • Zhengming Ding
  • Ming Shao
  • Yun Fu

We consider an interesting problem in this paper: using transfer learning in two directions to compensate for missing knowledge in the target domain. Transfer learning is often exploited as a powerful tool to mitigate the discrepancy between different databases used for knowledge transfer. It can also be used for knowledge transfer between different modalities within one database. However, in either case, transfer learning will fail if the target data are missing. To overcome this, we consider knowledge transfer between different databases and modalities simultaneously in a single framework, where missing target data from one database are recovered to facilitate the recognition task. We refer to this framework as the Latent Low-rank Transfer Subspace Learning method (L2TSL). We first propose using a low-rank constraint as well as dictionary learning in a learned subspace to guide knowledge transfer between and within different databases. We then introduce a latent factor to uncover the underlying structure of the missing target data. Next, transfer learning in two directions is proposed to integrate an auxiliary database for transfer learning with missing target data. Experimental results on multi-modality knowledge transfer with missing target data demonstrate that our method can successfully inherit knowledge from the auxiliary database to complete the target domain, and therefore enhance performance when recognizing data from a modality without any training data.