Author name cluster

Yilong Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers (28)

AAAI Conference 2026 Conference Paper

MTRL-CG: Multi-Task Reinforcement Learning Method with Spectral Clustering-Based Task Grouping

  • Wenjia Meng
  • Teng Zhang
  • Haoliang Sun
  • Yilong Yin

Multi-task reinforcement learning (RL) aims to enhance agent performance across multiple tasks by enabling effective knowledge transfer. However, existing methods adopt a fully shared policy across all tasks without explicitly distinguishing between related and conflicting ones, making them suffer from the negative interference issue, where updates beneficial to one task adversely affect others and degrade overall performance. In this paper, we propose a multi-task reinforcement learning method with spectral clustering-based task grouping (MTRL-CG), which leverages spectral clustering to group related tasks and separate conflicting ones, enabling group-wise policy learning to mitigate negative interference. We first quantify inter-task affinity by measuring the influence of task-specific updates on others within a shared model, and construct an affinity matrix to capture these relationships. Spectral clustering is then applied to partition tasks via spectral embedding and k-means clustering. Each task group is trained with a dedicated policy network to promote focused learning. Built upon the Soft Actor-Critic (SAC) algorithm, MTRL-CG can be readily integrated into existing SAC-based multi-task RL methods. Extensive experiments on the Meta-World benchmark demonstrate the effectiveness of the proposed MTRL-CG method.
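
The grouping step described above maps onto standard spectral clustering machinery. Below is a minimal numpy/scikit-learn sketch of that pipeline, assuming a task-affinity matrix has already been measured; the affinity construction, its sign convention, and the number of groups are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_tasks(affinity: np.ndarray, n_groups: int) -> np.ndarray:
    """Partition tasks by spectral clustering on a task-affinity matrix.

    affinity[i, j] is assumed to measure how much a gradient update for
    task j helps task i (hypothetical sign convention); we symmetrize it
    and shift it to be non-negative before building the graph Laplacian.
    """
    A = (affinity + affinity.T) / 2.0          # symmetrize
    A = A - A.min()                            # make edge weights non-negative
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Spectral embedding: eigenvectors of the n_groups smallest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :n_groups].copy()
    # Row-normalize, then k-means in the embedded space
    embedding /= np.maximum(np.linalg.norm(embedding, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(embedding)

# Example: 6 tasks with two latent groups
rng = np.random.default_rng(0)
block = lambda v: np.full((3, 3), v) + 0.05 * rng.standard_normal((3, 3))
affinity = np.block([[block(1.0), block(-0.5)], [block(-0.5), block(1.0)]])
print(group_tasks(affinity, n_groups=2))   # e.g. [0 0 0 1 1 1]
```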

AAAI Conference 2026 Conference Paper

PEOCH: Online Cross-Modal Hashing with Semi-Supervised Streaming Data Driving Prototype Evolution

  • Xiao Kang
  • Xingbo Liu
  • Shuo Pan
  • Xuening Zhang
  • Xiushan Nie
  • Yilong Yin

The exponential growth of streaming multi-modal data presents critical challenges for cross-modal retrieval: distribution shifts, modality gaps, and scarce labels. Semi-supervised online cross-modal hashing has gained increasing interest due to its ability to encode complex streaming data and update hash functions simultaneously. Nevertheless, existing methods can hardly generate high-quality unsupervised hash codes, which fundamentally limits diversity and flexibility during the retrieval process. To this end, we propose a novel method named Prototype Evolution Online Cross-modal Hashing (PEOCH). By driving prototype evolution with semi-supervised streaming data, precise and stable hash codes are generated for both labeled and unlabeled data. Specifically, two prototype updates with stability guarantees are conducted: labeled samples push semantic knowledge into the supervised prototypes, while unlabeled samples are clustered to generate unsupervised prototypes. Simultaneously, a co-optimization mechanism is designed to ensure the prototypes continuously evolve while preserving consistency across the entire data stream. Besides, an elasticity regularizer integrates discriminability and smoothness constraints, improving the reliability of the prototypes. Extensive experiments on three benchmark datasets demonstrate that PEOCH outperforms state-of-the-art methods, achieving an average improvement of 6.7% in mAP@all across various retrieval tasks.

AAAI Conference 2026 Conference Paper

Retriever Encoder Selection Matters for In-Context Learning-based Medical Segmentation

  • Fan Wang
  • Zhongyi Han
  • Yongshun Gong
  • Yilong Yin

In-context learning-based medical segmentation (ICLM) enables foundation models to generalize to unseen cases without retraining. To enhance performance on test queries, existing methods typically follow a two-stage process: (1) using a retrieval encoder (RE) to map both queries and training samples into a shared feature space, and (2) retrieving and utilizing the top-k most similar training samples. While current methods fix the RE and focus on optimizing stage (2), we show that the choice of RE in stage (1) alone can account for over 70% of the performance variation, highlighting RE selection as a critical yet often overlooked factor in ICLM. In this paper, we conduct an analysis of RE selection and make two main findings: (1) dynamically selecting the RE for each query outperforms selecting a fixed RE for the entire task; and (2) feature-space heuristics (e.g., intra-class compactness and inter-class separability) fail to predict RE quality. Motivated by these findings, we propose the instance-adaptive retrieval encoder selection (IRES) method, which selects the optimal RE for each query based on output predictions. IRES is based on the intuition that a good RE retrieves relevant demonstrations, helping the ICL model generate more accurate and stable segmentation masks. Thus, we introduce the shape stability score (S³), which evaluates the morphological stability of predicted masks under iterative erosion. Experiments show S³ correlates strongly with true RE quality (Pearson > 0.8), serving as a reliable selection proxy. To reduce S³'s per-query cost, we propose parallel prediction with reciprocal neighbor reuse (P2R), which accelerates inference by parallelizing encoding and reusing encoder selections across reciprocal neighbors, avoiding redundant computation. Built on S³ and P2R, IRES improves ICLM performance across FUNDUS, Brain MRI, and Chest X-ray datasets, with up to a 10.6% gain on fundus segmentation.
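
The shape stability score is described above only at the level of "stability under iterative erosion"; the scipy sketch below shows one plausible way to operationalize that idea, where a hypothetical score tracks how much foreground survives repeated erosion. The function name, the survival-fraction definition, and the iteration count are assumptions rather than the paper's exact S³.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def shape_stability_score(mask: np.ndarray, n_iters: int = 5) -> float:
    """Hypothetical proxy for S^3: how gracefully a binary mask shrinks
    under iterative morphological erosion. Fragmented, noisy predictions
    collapse quickly; clean anatomical shapes shrink slowly.
    (The exact S^3 definition is in the paper; this is an assumed variant.)
    """
    mask = mask.astype(bool)
    area0 = max(int(mask.sum()), 1)
    survived, current = [], mask
    for _ in range(n_iters):
        current = binary_erosion(current)          # one erosion step
        survived.append(current.sum() / area0)     # fraction still foreground
    return float(np.mean(survived))

# A solid disk survives erosion far better than salt-and-pepper noise
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
noise = np.random.default_rng(0).random((64, 64)) < 0.3
print(shape_stability_score(disk), shape_stability_score(noise))
```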

NeurIPS Conference 2025 Conference Paper

From Pretraining to Pathology: How Noise Leads to Catastrophic Inheritance in Medical Models

  • Hao Sun
  • Zhongyi Han
  • Hao Chen
  • Jindong Wang
  • Xin Gao
  • Yilong Yin

Foundation models pretrained on web-scale data drive contemporary transfer learning in vision, language, and multimodal tasks. Recent work shows that mild label noise in these corpora may lift in-distribution accuracy yet sharply reduce out-of-distribution generalization, an effect known as catastrophic inheritance. Medical data is especially sensitive because annotations are scarce, domain shifts are large, and pretraining sources are noisy. We present the first systematic analysis of catastrophic inheritance in medical models. Controlled label-corruption experiments expose a clear structural collapse: as noise rises, the skewness and kurtosis of feature and logit distributions decline, signaling a flattened representation space and diminished discriminative detail. These higher-order statistics form a compact, interpretable marker of degradation in fine-grained tasks such as histopathology. Guided by this finding, we introduce a fine-tuning objective that restores skewness and kurtosis through two scalar regularizers added to the task loss. The method leaves the backbone unchanged and incurs negligible overhead. Tests on PLIP models trained with Twitter pathology images, as well as other large-scale vision and language backbones, show consistent gains in robustness and cross-domain accuracy under varied noise levels.
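
The proposed objective adds two scalar regularizers that restore higher-order feature statistics. Here is a minimal PyTorch sketch of that idea, assuming batch-level skewness/kurtosis estimates and illustrative target values and loss weights; the paper's exact formulation may differ.

```python
import torch

def moment_regularizers(features: torch.Tensor, target_skew: float = 1.0,
                        target_kurt: float = 4.0):
    """Two scalar regularizers pushing batch feature statistics back toward
    non-degenerate higher-order moments. Targets and the squared-error form
    are illustrative assumptions.
    features: (batch, dim) activations feeding the classifier head.
    """
    mu = features.mean(dim=0)
    sigma = features.std(dim=0) + 1e-6
    z = (features - mu) / sigma
    skew = (z ** 3).mean()              # batch skewness, averaged over dims
    kurt = (z ** 4).mean()              # batch kurtosis, averaged over dims
    r_skew = (skew - target_skew) ** 2
    r_kurt = (kurt - target_kurt) ** 2
    return r_skew, r_kurt

# Usage inside a fine-tuning step (the 0.1 weights are hypothetical):
# loss = task_loss + 0.1 * r_skew + 0.1 * r_kurt
feats = torch.randn(128, 256)
r_s, r_k = moment_regularizers(feats)
print(r_s.item(), r_k.item())
```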

AAAI Conference 2025 Conference Paper

Generalized Debiased Semi-Supervised Hashing for Large-Scale Image Retrieval

  • Xingbo Liu
  • Xuening Zhang
  • Xiushan Nie
  • Yang Shi
  • Yilong Yin

Semi-supervised hashing has shown promising efficacy in large-scale image retrieval, learning similarity-preserving codes from both labeled and unlabeled data. To enable the use of advanced supervised hashing techniques, pseudo labels are widely applied. However, existing methods typically suffer from a biased learning issue due to pseudo-label noise, which can be further aggravated during optimization. Although such bias can adversely affect hashing accuracy, it has not been investigated sufficiently. In view of this, we present a comprehensive discussion of the potential causes of bias, covering the processes of pseudo-labeling, hash learning, and optimization. Accordingly, a novel Generalized Debiased Semi-supervised Hashing (GDSH) method is proposed as a unified solution to mitigate these biases. Specifically, reliable pseudo labels are first predicted via a robust label completion strategy. Second, a debiased hash learning module is designed by combining label denoising and similarity updating, which not only refines the supervision but also yields hash codes that are semantically debiased at both the category and sample levels. Finally, a discrete semi-supervised hashing algorithm is proposed to alleviate the bias arising from optimization. Experimental results on three single-label and three multi-label image benchmarks demonstrate that GDSH remarkably outperforms state-of-the-art methods in different semi-supervised settings.

IJCAI Conference 2025 Conference Paper

Improving Generalization in Meta-Learning via Meta-Gradient Augmentation

  • Ren Wang
  • Haoliang Sun
  • Yuxiu Lin
  • Xinxin Zhang
  • Yilong Yin

Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing methods address this by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work proposes a data-independent Meta-Gradient Augmentation (MGAug) method from the perspective of gradient regularization. The key idea is first to break the rote memories by network pruning to address memorization overfitting in the inner loop, then use the gradients of pruned sub-networks to augment meta-gradients, alleviating overfitting in the outer loop. Specifically, we explore three pruning strategies, including random width pruning, random parameter pruning, and a newly proposed catfish pruning that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories. The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines.

TIST Journal 2025 Journal Article

MGRL4RE: A Multi-Graph Representation Learning Approach for Urban Region Embedding

  • Meng Chen
  • Zechen Li
  • Hongwei Jia
  • Xin Shao
  • Jun Zhao
  • Qiang Gao
  • Min Yang
  • Yilong Yin

Using multi-modal data to learn region representations has gained popularity for its ability to reveal diverse socioeconomic features in cities. However, many studies focus solely on semantic features from points-of-interest (POIs), neglecting the issue of spatial imbalance. This article introduces a Multi-Graph Representation Learning framework for Region Embedding (MGRL4RE), which leverages both inter-region and intra-region correlations through two main components: multi-graph construction based on various region correlations and multi-graph representation learning. The construction module creates a multi-graph reflecting various correlations among regions, utilizing geo-tagged POIs, region data, and human mobility data. Specifically, we assess a region’s importance relative to its spatial context (neighborhood) and develop spatially invariant semantic features to address spatial imbalance. Furthermore, the representation learning module generates comprehensive and effective region representations via multi-view embedding fusion. Our extensive experiments across various downstream tasks, including land use clustering, region popularity prediction, and crime prediction, confirm that our model significantly outperforms existing state-of-the-art region embedding methods.

ICLR Conference 2025 Conference Paper

Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model

  • Rundong He
  • Yicong Dong
  • Lanzhe Guo
  • Yilong Yin
  • Tailin Wu

Semi-supervised learning (SSL) effectively leverages unlabeled data and has been proven successful across various fields. Current safe SSL methods believe that unseen classes in unlabeled data harm the performance of SSL models. However, previous methods for assessing the impact of unseen classes on SSL model performance are flawed: they fix the size of the unlabeled dataset and adjust the proportion of unseen classes within the unlabeled data to assess the impact. This process contravenes the principle of controlling variables. Adjusting the proportion of unseen classes in unlabeled data alters the proportion of seen classes, meaning the decreased classification performance on seen classes may not be due to an increase in unseen-class samples in the unlabeled data, but rather to a decrease in seen-class samples. Thus, the prior flawed assessment standard that "unseen classes in unlabeled data can damage SSL model performance" may not always hold true. This paper strictly adheres to the principle of controlling variables, maintaining the proportion of seen classes in unlabeled data while only changing the unseen classes across five critical dimensions, to investigate their impact on SSL models in terms of global robustness and local robustness. Experiments demonstrate that unseen classes in unlabeled data do not necessarily impair the performance of SSL models; in fact, under certain conditions, unseen classes may even enhance performance.

AAAI Conference 2025 Conference Paper

Semi-Supervised Online Cross-Modal Hashing

  • Xiao Kang
  • Xingbo Liu
  • Xuening Zhang
  • Wen Xue
  • Xiushan Nie
  • Yilong Yin

Online cross-modal hashing has gained increasing interest due to its ability to encode streaming data and update hash functions simultaneously. Existing online methods often assume either fully supervised or completely unsupervised settings. However, they overlook the prevalent and challenging scenario of semi-supervised cross-modal streaming data, where diverse data types, including labeled/unlabeled, paired/unpaired, and multi-modal, are intertwined. To address this issue, we propose Semi-Supervised Online Cross-modal Hashing (SSOCH). It presents an alignment-free pseudo-labeling strategy that extracts semantic information from unlabeled streaming data without relying on pairing relations. Furthermore, we design an online tri-consistent preserving scheme, integrating pseudo-labeled data regularization, discriminative label embedding, and fine-grained similarity preservation. This scheme fully explores consistency across data annotation, modalities, and streaming chunks, improving the model's adaptiveness in these challenging scenarios. Extensive experiments on benchmark datasets demonstrate the superiority of SSOCH under various scenarios, highlighting the importance of semi-supervised learning for online cross-modal hashing.

AAAI Conference 2025 Conference Paper

Towards Macro-AUC Oriented Imbalanced Multi-Label Continual Learning

  • Yan Zhang
  • Guoqiang Wu
  • Bingzheng Wang
  • Teng Pang
  • Haoliang Sun
  • Yilong Yin

In Continual Learning (CL), while existing work primarily focuses on the multi-class classification task, there has been limited research on Multi-Label Learning (MLL). In practice, MLL datasets are often class-imbalanced, making the task inherently challenging, a problem that is even more acute in CL. Due to its sensitivity to imbalance, Macro-AUC is an appropriate and widely used measure in MLL. However, there is no research that specifically optimizes Macro-AUC in Multi-Label Continual Learning (MLCL). To fill this gap, in this paper, we propose a new memory replay-based method to tackle the imbalance issue for Macro-AUC-oriented MLCL. Specifically, inspired by recent theoretical work, we propose a new Reweighted Label-Distribution-Aware Margin (RLDAM) loss. Furthermore, to be compatible with the RLDAM loss, a new memory-updating strategy named Weight Retain Updating (WRU) is proposed to maintain the numbers of positive and negative instances of the original dataset in memory. Theoretically, we provide generalization analyses of the RLDAM-based algorithm in terms of Macro-AUC, separately in the batch MLL and MLCL settings. To our knowledge, this is the first work to offer theoretical generalization analyses in MLCL. Finally, a series of experimental results illustrates the effectiveness of our method over several baselines.
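
As a rough illustration of what a reweighted label-distribution-aware margin loss can look like, the PyTorch sketch below combines the LDAM-style margin Delta = C / n^(1/4) (from the original LDAM paper) with inverse-frequency reweighting per label. This is an assumed reconstruction from the abstract, not the paper's exact RLDAM loss.

```python
import torch
import torch.nn.functional as F

def rldam_like_loss(logits: torch.Tensor, targets: torch.Tensor,
                    n_pos: torch.Tensor, n_neg: torch.Tensor,
                    C: float = 1.0) -> torch.Tensor:
    """LDAM-style reweighted margin loss for multi-label outputs
    (assumed form). Each label gets a margin shrinking with its
    positive/negative counts, and terms are reweighted by inverse
    class frequency.
    logits, targets: (batch, n_labels), targets in {0, 1};
    n_pos, n_neg: (n_labels,) training counts per label.
    """
    m_pos = C / n_pos.clamp(min=1).pow(0.25)       # per-label positive margin
    m_neg = C / n_neg.clamp(min=1).pow(0.25)
    # Margin-shifted logistic losses: softplus(x) = log(1 + exp(x))
    pos_loss = F.softplus(-(logits - m_pos))       # want logit > m_pos
    neg_loss = F.softplus(logits + m_neg)          # want logit < -m_neg
    w_pos = (n_pos + n_neg) / (2.0 * n_pos.clamp(min=1))   # inverse frequency
    w_neg = (n_pos + n_neg) / (2.0 * n_neg.clamp(min=1))
    loss = targets * w_pos * pos_loss + (1 - targets) * w_neg * neg_loss
    return loss.mean()

logits = torch.randn(16, 5)
targets = torch.randint(0, 2, (16, 5)).float()
n_pos = torch.tensor([100.0, 10.0, 500.0, 5.0, 50.0])
print(rldam_like_loss(logits, targets, n_pos, 1000.0 - n_pos).item())
```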

NeurIPS Conference 2025 Conference Paper

VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning

  • Wenhao Li
  • Qiangchang Wang
  • Xianjing Meng
  • Zhibin Wu
  • Yilong Yin

Few-shot learning (FSL) aims to recognize novel concepts from only a few labeled support samples. Recent studies enhance support features by incorporating additional semantic information (e.g., class descriptions) or designing complex semantic fusion modules. However, these methods still suffer from hallucinating semantics that contradict the visual evidence due to the lack of grounding in actual instances, resulting in noisy guidance and costly corrections. To address these issues, we propose a novel framework, bridging Vision and Text with LLMs for Few-Shot Learning (VT-FSL), which constructs precise cross-modal prompts conditioned on Large Language Models (LLMs) and support images, seamlessly integrating them through a geometry-aware alignment mechanism. It mainly consists of Cross-modal Iterative Prompting (CIP) and Cross-modal Geometric Alignment (CGA). Specifically, the CIP conditions an LLM on both class names and support images to generate precise class descriptions iteratively in a single structured reasoning pass. These descriptions not only enrich the semantic understanding of novel classes but also enable the zero-shot synthesis of semantically consistent images. The descriptions and synthetic images act respectively as complementary textual and visual prompts, providing high-level class semantics and low-level intra-class diversity to compensate for limited support data. Furthermore, the CGA jointly aligns the fused textual, support, and synthetic visual representations by minimizing the kernelized volume of the 3-dimensional parallelotope they span. It captures global and nonlinear relationships among all representations, enabling structured and consistent multimodal integration. The proposed VT-FSL method establishes new state-of-the-art performance across ten diverse benchmarks, including standard, cross-domain, and fine-grained few-shot learning scenarios. Code is available at https://github.com/peacelwh/VT-FSL.
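
The geometric alignment term has a compact closed form: the squared volume of the parallelotope spanned by the three representations equals the determinant of their Gram matrix, and kernelizing replaces inner products with kernel evaluations. Below is a PyTorch sketch under an assumed RBF kernel; the paper's kernel choice and any weighting are not specified here.

```python
import torch

def kernelized_volume(text: torch.Tensor, support: torch.Tensor,
                      synth: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Squared volume of the parallelotope spanned by three class
    representations in kernel feature space: the determinant of their
    3x3 Gram matrix (Gram determinant = squared volume). Minimizing it
    pulls the three modalities together. RBF kernel is an assumption.
    """
    reps = torch.stack([text, support, synth])      # (3, dim)
    sq_dists = torch.cdist(reps, reps) ** 2         # pairwise ||.||^2
    gram = torch.exp(-gamma * sq_dists)             # RBF Gram matrix
    return torch.det(gram)                          # squared volume

t, s, g = torch.randn(3, 64)
print(kernelized_volume(t, s, g).item())   # -> alignment loss term
```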

JBHI Journal 2024 Journal Article

Biomarkers-Aware Asymmetric Bibranch GAN With Adaptive Memory Batch Normalization for Prediction of Anti-VEGF Treatment Response in Neovascular Age-Related Macular Degeneration

  • Peng Zhao
  • Xian Song
  • Xiaoming Xi
  • Xiushan Nie
  • Xianjing Meng
  • Yi Qu
  • Yilong Yin

The emergence of anti-vascular endothelial growth factor (anti-VEGF) therapy has revolutionized the treatment of neovascular age-related macular degeneration (nAMD). Post-therapeutic optical coherence tomography (OCT) imaging facilitates the prediction of therapeutic response to anti-VEGF therapy for nAMD. Although the generative adversarial network (GAN) is a popular generative model for post-therapeutic OCT image generation, it is challenging in practice to gather sufficient pre- and post-therapeutic OCT image pairs, resulting in overfitting. Moreover, the available GAN-based methods ignore local details, such as the biomarkers that are essential for nAMD treatment. To address these issues, a Biomarkers-aware Asymmetric Bibranch GAN (BAABGAN) is proposed to efficiently generate post-therapeutic OCT images. Specifically, one branch, termed the source branch, is developed to learn prior knowledge with a high degree of transferability from large-scale data. The source branch then transfers knowledge to another branch, termed the target branch, which is trained on small-scale paired data. To boost transferability, a novel Adaptive Memory Batch Normalization (AMBN) is introduced in the source branch, which learns more effective global knowledge that is impervious to noise via a memory mechanism. In addition, a novel Adaptive Biomarkers-aware Attention (ABA) module is proposed to encode biomarker information into the latent features of the target branch to learn finer local details of biomarkers. Experimental results show that the proposed method outperforms traditional GAN models and can produce high-quality post-therapeutic OCT images from limited datasets.

AAAI Conference 2024 Conference Paper

DiffAIL: Diffusion Adversarial Imitation Learning

  • Bingzheng Wang
  • Guoqiang Wu
  • Teng Pang
  • Yan Zhang
  • Yilong Yin

Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks. The currently popular approach is the Adversarial Imitation Learning (AIL) framework, which matches expert state-action occupancy measures to obtain a surrogate reward for forward reinforcement learning. However, the traditional discriminator is a simple binary classifier that does not learn an accurate distribution, which may result in failing to identify expert-level state-action pairs induced by the policy interacting with the environment. To address this issue, we propose a method named Diffusion Adversarial Imitation Learning (DiffAIL), which introduces the diffusion model into the AIL framework. Specifically, DiffAIL models the state-action pairs as unconditional diffusion models and uses the diffusion loss as part of the discriminator's learning objective, which enables the discriminator to better capture expert demonstrations and improve generalization. Experimentally, the results show that our method achieves state-of-the-art performance and significantly surpasses expert demonstrations in two benchmark settings: the standard state-action setting and the state-only setting.
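
The core mechanism, using a diffusion model's denoising loss over state-action pairs as a learned density signal for the discriminator, can be sketched compactly. The network size, noise schedule, time embedding, and the mapping from loss to reward below are all assumptions; only the idea of an unconditional diffusion loss on (s, a) pairs comes from the abstract.

```python
import torch
import torch.nn as nn

class DiffusionDiscriminator(nn.Module):
    """Unconditional diffusion model over state-action pairs, usable as
    the discriminator's density estimate (minimal DDPM-style sketch)."""

    def __init__(self, sa_dim: int, T: int = 100):
        super().__init__()
        self.T = T
        betas = torch.linspace(1e-4, 2e-2, T)                 # assumed schedule
        self.register_buffer("abar", torch.cumprod(1.0 - betas, dim=0))
        self.eps_net = nn.Sequential(
            nn.Linear(sa_dim + 1, 256), nn.SiLU(), nn.Linear(256, sa_dim))

    def diffusion_loss(self, sa: torch.Tensor) -> torch.Tensor:
        """Per-sample denoising loss; low loss <=> pair looks expert-like."""
        t = torch.randint(0, self.T, (sa.size(0),), device=sa.device)
        abar_t = self.abar[t].unsqueeze(1)
        noise = torch.randn_like(sa)
        noisy = abar_t.sqrt() * sa + (1 - abar_t).sqrt() * noise
        t_feat = (t.float() / self.T).unsqueeze(1)            # crude time embedding
        pred = self.eps_net(torch.cat([noisy, t_feat], dim=1))
        return ((pred - noise) ** 2).mean(dim=1)

disc = DiffusionDiscriminator(sa_dim=20)
expert_sa = torch.randn(32, 20)
# Per-sample losses can feed the discriminator objective / surrogate reward.
print(disc.diffusion_loss(expert_sa).shape)   # torch.Size([32])
```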

AAAI Conference 2024 Conference Paper

Exploring Channel-Aware Typical Features for Out-of-Distribution Detection

  • Rundong He
  • Yue Yuan
  • Zhongyi Han
  • Fan Wang
  • Wan Su
  • Yilong Yin
  • Tongliang Liu
  • Yongshun Gong

Detecting out-of-distribution (OOD) data is essential to ensure the reliability of machine learning models deployed in real-world scenarios. Different from most previous test-time OOD detection methods that focus on designing OOD scores, we delve into the challenges in OOD detection from the perspective of typicality and regard the feature's high-probability region as the feature's typical set. However, the existing typical-feature-based OOD detection method implies an assumption: the proportion of typical feature sets for each channel is fixed. According to our experimental analysis, each channel contributes differently to OOD detection. Adopting a fixed proportion for all channels causes several channels to lose too many typical features or incorporate too many abnormal features, resulting in low performance. Therefore, exploring channel-aware typical features is crucial to better separating ID and OOD data. Driven by this insight, we propose expLoring channel-Aware tyPical featureS (LAPS). First, LAPS obtains the channel-aware typical set by calibrating the channel-level typical set with the global typical set from the mean and standard deviation. Then, LAPS rectifies the features into channel-aware typical sets to obtain channel-aware typical features. Finally, LAPS leverages the channel-aware typical features to calculate the energy score for OOD detection. Theoretical and visual analyses verify that LAPS achieves a better bias-variance trade-off. Experiments verify the effectiveness and generalization of LAPS under different architectures and OOD scores.
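
A minimal PyTorch sketch of the pipeline as the abstract describes it: calibrate per-channel typical intervals from training statistics, rectify (clamp) features into them, and score with the energy function. The blending rule `lam`, the interval width `k`, and the temperature `T` are illustrative assumptions, not the paper's calibration.

```python
import torch

def laps_energy_score(feats, mu, sigma, fc_weight, fc_bias,
                      lam: float = 0.5, k: float = 1.5, T: float = 1.0):
    """Clamp penultimate features into per-channel "typical" intervals
    calibrated from training statistics, then compute the energy score.
    feats: (batch, C); mu, sigma: (C,) training-set channel statistics;
    fc_weight, fc_bias: the classifier head reused for scoring.
    """
    # Channel-aware width: blend each channel's std with the global std
    width = k * (lam * sigma + (1 - lam) * sigma.mean())
    typical = feats.clamp(min=mu - width, max=mu + width)   # rectify features
    logits = typical @ fc_weight.t() + fc_bias
    # Negative energy: larger values indicate in-distribution data
    return T * torch.logsumexp(logits / T, dim=1)

C, n_cls = 512, 10
feats = torch.randn(8, C)
mu, sigma = torch.zeros(C), torch.ones(C)
W, b = torch.randn(n_cls, C) / C ** 0.5, torch.zeros(n_cls)
print(laps_energy_score(feats, mu, sigma, W, b))
```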

AAAI Conference 2023 Conference Paper

Discriminability and Transferability Estimation: A Bayesian Source Importance Estimation Approach for Multi-Source-Free Domain Adaptation

  • Zhongyi Han
  • Zhiyan Zhang
  • Fan Wang
  • Rundong He
  • Wan Su
  • Xiaoming Xi
  • Yilong Yin

Source-free domain adaptation (SFDA) transfers a single source model to the unlabeled target domain without accessing the source data. As intelligent systems develop across various fields, a zoo of source models is increasingly available, giving rise to a new setting called multi-source-free domain adaptation (MSFDA). We find that the critical inborn challenge of MSFDA is how to estimate the importance (contribution) of each source model. In this paper, we shed new Bayesian light on the fact that the posterior probability of source importance connects to discriminability and transferability. We propose Discriminability And Transferability Estimation (DATE), a universal solution for source importance estimation. Specifically, a proxy discriminability perception module is equipped with habitat uncertainty and density to evaluate each sample's surrounding environment. A source-similarity transferability perception module quantifies the data distribution similarity and encourages the transferability to be reasonably distributed with a domain diversity loss. Extensive experiments show that DATE can precisely and objectively estimate the source importance and outperforms prior arts by non-trivial margins. Moreover, experiments demonstrate that DATE can take the most popular SFDA networks as backbones and turn them into advanced MSFDA solutions.

AAAI Conference 2023 Conference Paper

Exposing the Self-Supervised Space-Time Correspondence Learning via Graph Kernels

  • Zheyun Qin
  • Xiankai Lu
  • Xiushan Nie
  • Yilong Yin
  • Jianbing Shen

Self-supervised space-time correspondence learning is emerging as a promising way of leveraging unlabeled video. Currently, most methods adapt contrastive learning with negative-sample mining, or reconstruction adapted from the image domain, which requires dense affinity across multiple frames or optical flow constraints. Moreover, video correspondence predictive models need to mine more inherent properties in videos, such as structural information. In this work, we propose VideoHiGraph, a space-time correspondence framework based on a learnable graph kernel. Treating the video as a spatial-temporal graph, the learning objectives of VideoHiGraph are formulated in a self-supervised manner to predict unobserved hidden graphs via graph kernels. We learn a representation of the temporal coherence across frames in which pairwise similarity defines the structured hidden graph, such that a biased random-walk graph kernel along the sub-graph can predict long-range correspondence. Then, we learn a refined representation across frames at the node level via a dense graph kernel. The self-supervision of model training is formed by the structural and temporal consistency of the graph. VideoHiGraph achieves superior performance and demonstrates its robustness across label propagation benchmarks involving objects, semantic parts, keypoints, and instances. Our algorithm implementations have been made publicly available at https://github.com/zyqin19/VideoHiGraph.

AAAI Conference 2023 Conference Paper

Off-Policy Proximal Policy Optimization

  • Wenjia Meng
  • Qian Zheng
  • Gang Pan
  • Yilong Yin

Proximal Policy Optimization (PPO) is an important reinforcement learning method that has achieved great success in sequential decision-making problems. However, PPO suffers from sample inefficiency because it cannot make use of off-policy data. In this paper, we propose an Off-Policy Proximal Policy Optimization method (Off-Policy PPO) that improves the sample efficiency of PPO by utilizing off-policy data. Specifically, we first propose a clipped surrogate objective function that can utilize off-policy data and avoid excessively large policy updates. Next, we theoretically establish the stability of optimizing the proposed surrogate objective by demonstrating that the degree of policy update it permits is consistent with that of PPO. We then describe the implementation details of the proposed Off-Policy PPO, which iteratively updates policies by optimizing the proposed clipped surrogate objective. Finally, experimental results on representative continuous control tasks validate that our method outperforms state-of-the-art methods on most tasks.
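
For reference, the clipping idea at the heart of the method can be written in a few lines: the surrogate takes importance ratios against the behavior policy that produced the replayed data and clips them to bound the update. This is a generic PyTorch sketch of PPO-style clipping on off-policy samples, not the paper's exact objective.

```python
import torch

def clipped_surrogate(logp_new: torch.Tensor, logp_behavior: torch.Tensor,
                      advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate evaluated on replayed (off-policy) data:
    ratios are taken against the behavior policy that collected the samples,
    and clipping keeps the policy update bounded.
    """
    ratio = torch.exp(logp_new - logp_behavior)           # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return torch.min(unclipped, clipped).mean()           # maximize this

logp_new = torch.randn(64, requires_grad=True)
logp_b, adv = torch.randn(64), torch.randn(64)
loss = -clipped_surrogate(logp_new, logp_b, adv)          # minimize the negative
loss.backward()
```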

ICML Conference 2023 Conference Paper

Towards Understanding Generalization of Macro-AUC in Multi-label Learning

  • Guoqiang Wu
  • Chongxuan Li
  • Yilong Yin

Macro-AUC is the arithmetic mean of the class-wise AUCs in multi-label learning and is commonly used in practice. However, its theoretical understanding is largely lacking. To address this, we characterize the generalization properties of various learning algorithms based on the corresponding surrogate losses w.r.t. Macro-AUC. We theoretically identify a critical factor of the dataset affecting the generalization bounds: the label-wise class imbalance. Our results on the imbalance-aware error bounds show that the widely used univariate loss-based algorithm is more sensitive to label-wise class imbalance than the proposed pairwise and reweighted loss-based ones, which probably implies its worse performance. Moreover, empirical results on various datasets corroborate our theoretical findings. Technically, to establish these results, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.
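
For reference, Macro-AUC as described in the abstract, with the per-label AUC written as the usual pairwise ranking statistic (notation assumed):

```latex
% Macro-AUC over C labels; S_j^+ / S_j^- denote the positive / negative
% instances of label j and f_j the label-j score function (notation assumed).
\mathrm{AUC}_j \;=\; \frac{1}{|S_j^+|\,|S_j^-|}
    \sum_{x \in S_j^+} \sum_{x' \in S_j^-}
    \mathbb{1}\!\left[ f_j(x) > f_j(x') \right],
\qquad
\text{Macro-AUC} \;=\; \frac{1}{C} \sum_{j=1}^{C} \mathrm{AUC}_j .
```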

NeurIPS Conference 2023 Conference Paper

Unified 3D Segmenter As Prototypical Classifiers

  • Zheyun Qin
  • Cheng Han
  • Qifan Wang
  • Xiushan Nie
  • Yilong Yin
  • Lu Xiankai

The task of point cloud segmentation, comprising semantic, instance, and panoptic segmentation, has been mainly tackled by designing task-specific network architectures, which often lack the flexibility to generalize across tasks, thus resulting in a fragmented research landscape. In this paper, we introduce ProtoSEG, a prototype-based model that unifies semantic, instance, and panoptic segmentation tasks. Our approach treats these three homogeneous tasks as a classification problem with different levels of granularity. By leveraging a Transformer architecture, we extract point embeddings to optimize prototype-class distances and dynamically learn class prototypes to accommodate the end tasks. Our prototypical design enjoys simplicity and transparency, powerful representational learning, and ad-hoc explainability. Empirical results demonstrate that ProtoSEG outperforms concurrent well-known specialized architectures on 3D point cloud benchmarks, achieving 72.3%, 76.4%, and 74.2% mIoU for semantic segmentation on S3DIS, ScanNet V2, and SemanticKITTI, 66.8% mCov and 51.2% mAP for instance segmentation on S3DIS and ScanNet V2, and 62.4% PQ for panoptic segmentation on SemanticKITTI, validating the strength of our concept and the effectiveness of our algorithm. The code and models are available at https://github.com/zyqin19/PROTOSEG.
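
The unifying classification rule is nearest-prototype assignment in embedding space. Here is a minimal PyTorch sketch of that inference step; the Transformer feature extractor, dynamic prototype learning, and training losses are omitted, and the shapes are illustrative.

```python
import torch

def prototype_classify(point_embeds: torch.Tensor,
                       prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each point embedding to its nearest learned class prototype,
    the core classification rule a prototype-based segmenter optimizes.
    point_embeds: (n_points, dim); prototypes: (n_classes, dim).
    """
    dists = torch.cdist(point_embeds, prototypes)   # prototype-class distances
    return dists.argmin(dim=1)                      # per-point class labels

embeds = torch.randn(4096, 128)
protos = torch.randn(13, 128)                       # e.g. 13 S3DIS classes
print(prototype_classify(embeds, protos).shape)     # torch.Size([4096])
```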

JBHI Journal 2022 Journal Article

Learning Binary Semantic Embedding for Large-Scale Breast Histology Image Analysis

  • Xingbo Liu
  • Xiao Kang
  • Xiushan Nie
  • Jie Guo
  • Shaohua Wang
  • Yilong Yin

With the progress of clinical imaging and machine learning, the computer-assisted diagnosis of breast histology images has attracted broad attention. Nonetheless, the adoption of computer-assisted diagnosis has been hindered by the poor interpretability of conventional classification models. In view of this issue, we propose a novel method for Learning Binary Semantic Embedding (LBSE). In this study, bit-balance and uncorrelation constraints, double supervision, discrete optimization, and asymmetric pairwise similarity are seamlessly integrated for learning binary semantic-preserving embeddings. Moreover, a fusion-based strategy is carefully designed to handle the intractable problem of parameter setting, saving huge amounts of time for parameter tuning. Based on this efficient and effective embedding, classification and retrieval are performed simultaneously to provide interpretable, image-based inference and model-assisted conclusions for breast histology images. Extensive experiments conducted on three benchmark datasets validate the superiority of LBSE in different situations.

AAAI Conference 2022 Conference Paper

Not All Parameters Should Be Treated Equally: Deep Safe Semi-supervised Learning under Class Distribution Mismatch

  • Rundong He
  • Zhongyi Han
  • Yang Yang
  • Yilong Yin

Deep semi-supervised learning (SSL) aims to utilize a sizeable unlabeled set to train deep networks, thereby reducing the dependence on labeled instances. However, the unlabeled set often carries unseen classes that cause the deep SSL algorithm to lose generalization. Previous works focus on the data level: they attempt to remove unseen-class data or assign them lower weights, but cannot eliminate their adverse effects on the SSL algorithm. Rather than focusing on the data level, this paper turns attention to the model parameter level. We find that only partial parameters are essential for seen-class classification, termed safe parameters. In contrast, the other parameters tend to fit irrelevant data, termed harmful parameters. Driven by this insight, we propose Safe Parameter Learning (SPL) to discover safe parameters and make the harmful parameters inactive, such that we can mitigate the adverse effects caused by unseen-class data. Specifically, we first design an effective strategy to divide all parameters in the pre-trained SSL model into safe and harmful ones. Then, we introduce a bi-level optimization strategy to update the safe parameters and kill the harmful parameters. Extensive experiments show that SPL outperforms the state-of-the-art SSL methods on all the benchmarks by a large margin. Moreover, experiments demonstrate that SPL can be integrated into the most popular deep SSL networks and be easily extended to handle other cases of class distribution mismatch.

AAAI Conference 2020 Short Paper

Focusing on Detail: Deep Hashing Based on Multiple Region Details (Student Abstract)

  • Quan Zhou
  • Xiushan Nie
  • Yang Shi
  • Xingbo Liu
  • Yilong Yin

Hashing, which aims to convert multimedia data into a set of short binary codes while preserving the similarity of the original data, has been widely studied in recent years for its fast retrieval efficiency and high performance. The majority of existing deep supervised hashing methods only utilize the semantics of a whole image in learning hash codes but ignore the local image details, which are important in hash learning. To fully utilize the detailed information, we propose a novel deep multi-region hashing (DMRH), which learns hash codes from local regions and obtains the final hash codes of the image by fusing the local hash codes corresponding to local regions. In addition, we propose a self-similarity loss term to address the imbalance problem (i.e., the number of dissimilar pairs is significantly larger than that of similar ones) of methods based on pairwise similarity.

ICML Conference 2020 Conference Paper

Learning to Learn Kernels with Variational Random Features

  • Xiantong Zhen
  • Haoliang Sun
  • Yingjun Du
  • Jun Xu 0019
  • Yilong Yin
  • Ling Shao 0001
  • Cees G. M. Snoek

We introduce kernels with random Fourier features in the meta-learning framework for few-shot learning. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as a variational inference problem by deriving an evidence lower bound under the meta-learning framework. To incorporate shared knowledge from related tasks, we propose a context inference of the posterior, which is established by an LSTM architecture. The LSTM-based inference network can effectively integrate the context information of previous tasks with task-specific information, generating informative and adaptive features. The learned MetaVRF can produce kernels of high representational power with a relatively low spectral sampling rate and also enables fast adaptation to new tasks. Experimental results on a variety of few-shot regression and classification tasks demonstrate that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
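
The underlying building block is the random Fourier feature map of Rahimi and Recht, where an explicit low-dimensional feature z(x) makes z(x)ᵀz(y) approximate a shift-invariant kernel. In MetaVRF the spectral samples are inferred per task by the LSTM-based network; the sketch below simply draws them from a Gaussian, which recovers an RBF kernel.

```python
import torch

def random_fourier_features(x: torch.Tensor, omega: torch.Tensor,
                            b: torch.Tensor) -> torch.Tensor:
    """Random Fourier feature map: z(x)^T z(y) approximates a
    shift-invariant kernel k(x - y) whose spectral density generated
    `omega` (Rahimi & Recht). In MetaVRF, `omega` is the task-specific
    latent variable; here it is just Gaussian, giving an RBF kernel.
    x: (n, d); omega: (d, D) spectral samples; b: (D,) random phases.
    """
    D = omega.shape[1]
    return (2.0 / D) ** 0.5 * torch.cos(x @ omega + b)

n, d, D = 100, 16, 256
x = torch.randn(n, d)
omega = torch.randn(d, D)                 # spectral samples for an RBF kernel
b = 2 * torch.pi * torch.rand(D)
z = random_fourier_features(x, omega, b)
K_approx = z @ z.t()                      # approximates exp(-||x - y||^2 / 2)
print(K_approx.shape)                     # torch.Size([100, 100])
```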

IJCAI Conference 2020 Conference Paper

Towards Accurate and Robust Domain Adaptation under Noisy Environments

  • Zhongyi Han
  • Xian-Jin Gui
  • Chaoran Cui
  • Yilong Yin

In non-stationary environments, learning machines usually confront the domain adaptation scenario, where the data distribution changes over time. Previous domain adaptation works have achieved great success in theory and practice. However, they lose robustness in noisy environments where the labels and features of examples from the source domain become corrupted. In this paper, we report our attempt towards achieving accurate noise-robust domain adaptation. We first give a theoretical analysis that reveals how harmful noise influences unsupervised domain adaptation. To eliminate the effect of label noise, we propose an offline curriculum learning scheme for minimizing a newly defined empirical source risk. To reduce the impact of feature noise, we propose a proxy-distribution-based margin discrepancy. We seamlessly transform our methods into an adversarial network that performs efficient joint optimization for them, successfully mitigating the negative influence of both data corruption and distribution shift. A series of empirical studies shows that our algorithm remarkably outperforms the state of the art, with over 10% accuracy improvements in some domain adaptation tasks under noisy environments.

AAAI Conference 2019 Short Paper

Jointly Multiple Hash Learning

  • Xingbo Liu
  • Xiushan Nie
  • Yingxin Wang
  • Yilong Yin

Hashing can compress heterogeneous high-dimensional data into compact binary codes while preserving the similarity, facilitating efficient retrieval and storage; it has thus recently received much attention from information retrieval researchers. Most existing hashing methods first predefine a fixed length (e.g., 32, 64, or 128 bits) for the hash codes and then learn them at this fixed length. However, one sample can be represented by various hash codes of different lengths, and there must be associations and relationships among these different hash codes because they represent the same sample. Therefore, harnessing these relationships should boost the performance of hashing methods. Inspired by this possibility, in this study, we propose a new model, jointly multiple hash learning (JMH), which can learn hash codes with multiple lengths simultaneously. In the proposed JMH method, three types of information are used for hash learning: hash codes with different lengths, the original features of the samples, and the labels. In contrast to existing hashing methods, JMH can learn hash codes with different lengths in one step, and users can select appropriate hash codes for their retrieval tasks according to their requirements in terms of accuracy and complexity. To the best of our knowledge, JMH is one of the first attempts to learn multi-length hash codes simultaneously. In addition, in the proposed model, discrete and closed-form solutions for the variables can be obtained by cyclic coordinate descent, making the proposed model much faster to train. Extensive experiments were performed on three benchmark datasets, and the results demonstrate the superior performance of the proposed method.

IJCAI Conference 2019 Conference Paper

Supervised Short-Length Hashing

  • Xingbo Liu
  • Xiushan Nie
  • Quan Zhou
  • Xiaoming Xi
  • Lei Zhu
  • Yilong Yin

Hashing can compress high-dimensional data into compact binary codes while preserving the similarity, facilitating efficient retrieval and storage. However, when retrieving with extremely short hash codes learned by existing methods, performance cannot be guaranteed because of severe information loss. To address this issue, we propose a novel supervised short-length hashing (SSLH). In the proposed SSLH, mutual reconstruction between the short-length hash codes and the original features is performed to reduce semantic loss. Furthermore, to enhance the robustness and accuracy of the hash representation, a robust estimator term is added to fully utilize the label information. Extensive experiments conducted on four image benchmarks demonstrate the superior performance of the proposed SSLH with short-length hash codes. In addition, the proposed SSLH also outperforms existing methods when using long-length hash codes. To the best of our knowledge, this is the first linear hashing method that focuses on both short- and long-length hash codes while maintaining high precision.

JBHI Journal 2018 Journal Article

Multiscale Rotation-Invariant Convolutional Neural Networks for Lung Texture Classification

  • Qiangchang Wang
  • Yuanjie Zheng
  • Gongping Yang
  • Weidong Jin
  • Xinjian Chen
  • Yilong Yin

We propose a new multiscale rotation-invariant convolutional neural network (MRCNN) model for classifying various lung tissue types on high-resolution computed tomography. MRCNN employs Gabor-local binary patterns, which introduce a good property in image analysis: invariance to image scales and rotations. In addition, we offer an approach to deal with the problems caused by the imbalanced number of samples between different classes in most existing works, accomplished by changing the overlap size between adjacent patches. Experimental results on a public interstitial lung disease database show the superior performance of the proposed method over the state of the art.