Arrow Research search

Author name cluster

Jun Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers

21

AAAI Conference 2026 Conference Paper

Heterogeneous Complementary Distillation

  • Liuchi Xu
  • Hao Zheng
  • Lu Wang
  • Lisheng Xu
  • Jun Cheng

Knowledge distillation (KD) transfers the "dark knowledge" from a complex teacher model to a compact student model. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to ResNet18, faces challenges due to differences in spatial feature representations. Traditional KD methods are mostly designed for homogeneous architectures and hence struggle to bridge this disparity effectively. Although heterogeneous KD approaches have recently been developed to address these issues, they often incur high computational costs and complex designs, or rely overly on logit alignment, which limits their ability to leverage complementary features. To overcome these limitations, we propose Heterogeneous Complementary Distillation (HCD), a simple yet effective framework that integrates complementary teacher and student features to align representations in shared logits. These logits are decomposed and constrained to facilitate diverse knowledge transfer to the student. Specifically, HCD processes the student's intermediate features through a convolutional projector and adaptive pooling, concatenates them with the teacher's features from the penultimate layer, and then maps them via the Complementary Feature Mapper (CFM) module, comprising a fully connected layer, to produce shared logits. We further introduce Sub-logit Decoupled Distillation (SDD), which partitions the shared logits into n sub-logits that are fused with the teacher's logits to rectify classification. To ensure sub-logit diversity and reduce redundant knowledge transfer, we propose an Orthogonality Loss (OL). By preserving student-specific strengths and leveraging teacher knowledge, HCD enhances robustness and generalization in students. Extensive experiments on the CIFAR-100, fine-grained (e.g., CUB200, Aircraft), and ImageNet-1K datasets demonstrate that HCD outperforms state-of-the-art KD methods, establishing it as an effective solution for heterogeneous KD.
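The sub-logit partitioning and orthogonality constraint described in the abstract can be sketched in a few lines of numpy. The shapes, the exact form of the loss, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def split_shared_logits(shared, n_sub):
    """Partition shared logits of shape (batch, n_sub * n_classes)
    into n_sub sub-logits of shape (batch, n_sub, n_classes)."""
    b, d = shared.shape
    return shared.reshape(b, n_sub, d // n_sub)

def orthogonality_loss(sub_logits):
    """Mean squared off-diagonal cosine similarity between sub-logits:
    pushes the n sub-logits toward carrying distinct, non-redundant knowledge."""
    z = sub_logits / (np.linalg.norm(sub_logits, axis=-1, keepdims=True) + 1e-12)
    gram = np.einsum('bic,bjc->bij', z, z)         # pairwise cosine similarities
    off_diag = gram - np.eye(sub_logits.shape[1])  # zero out the diagonal
    return float(np.mean(off_diag ** 2))
```

Under this sketch, two identical sub-logits incur a positive penalty, while mutually orthogonal sub-logits incur none.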

EAAI Journal 2025 Journal Article

A looped-type functional for non-fragile fuzzy sampled-data control of doubly fed induction generator-based wind energy conversion systems with failures

  • Xiaoqing Li
  • Kaibo Shi
  • Liang Han
  • Jun Cheng
  • Wei Sun
  • Zhinan Peng

In our investigation, by integrating the fuzzy modeling method and a looped-type Lyapunov functional in tandem with the free-matrix-based integral inequality technique, the non-fragile fault-tolerant fuzzy sampled-data control (FTFSDC) problem is examined for doubly fed induction generator (DFIG)-based wind energy conversion systems (WECSs) that are prone to semi-Markovian jump-type actuator failures (SMJAFs). First and foremost, the fuzzy modeling method, which is well suited to highly nonlinear systems, is exploited to handle the imprecision of DFIG-based WECSs. Subsequently, to delineate the stochastic actuator failures (SAFs), the FTFSDC framework is reconstructed with semi-Markovian jump faulty coefficients undergoing partially unknown transition rates (PUTRs), which offers stronger modeling precision and reflects practical scenarios more accurately. More concretely, the PUTRs considered in this article include completely unknown elements and uncertain but bounded elements as two special situations. Thirdly, in a departure from earlier sampled-data control (SDC) mechanisms, a memory signal τ is subtly embedded into the communication channel, and the sampling patterns, in conjunction with the control actions, are performed in a sawtooth-characteristic manner. Following that, by adequately capturing the characteristic information of the entire sampling intervals [t_k, t_{k+1}) and [t_k − τ, t_{k+1} − τ) simultaneously, an augmented dual-sided looped-type Lyapunov functional is put forward. The superiority of the newly constructed Lyapunov functional is that more sawtooth structure information of the actual sampling patterns is exploited, and some of the symmetric matrices involved are no longer restricted to be positive definite, which significantly enhances design freedom and flexibility. Afterwards, with the assistance of mathematical deduction and stochastic analysis techniques, sufficient conditions under which the underlying plant is asymptotically stable are derived in the form of linear matrix inequalities (LMIs). In conclusion, a corresponding numerical simulation is conducted to thoroughly substantiate the efficacy and feasibility of the theoretical outcomes.

AAAI Conference 2025 Conference Paper

Debiased Distillation for Consistency Regularization

  • Lu Wang
  • Liuchi Xu
  • Xiong Yang
  • Zhenhua Huang
  • Jun Cheng

Knowledge distillation transfers "dark knowledge" from a large teacher model to a smaller student model, yielding a highly efficient network. To improve the network's generalization ability, existing works use a larger temperature coefficient for knowledge distillation. Nevertheless, these methods may lower the target category's confidence and lead to ambiguous recognition of similar samples. To mitigate this issue, some studies introduce intra-batch distillation to reduce prediction discrepancy. However, these methods overlook the inconsistency between background information and the target category, which may increase prediction bias due to noise disturbance. Additionally, label imbalance caused by random sampling and the batch size can undermine the reliability of network generalization. To tackle these challenges, we propose a simple yet effective Intra-class Knowledge Distillation (IKD) method that facilitates knowledge sharing within the same class to ensure consistent predictions. First, we initialize a matrix and a vector to store the logits and class counts provided by the teacher, respectively. Then, in the first epoch, we accumulate the sum of logits and sample counts per class and perform KD to prevent knowledge omission. Finally, in subsequent training, we update the matrix to obtain the average logits and compute the KL divergence between the student's output and the updated matrix according to the label index. This process ensures intra-class consistency and improves the student's performance. Furthermore, the method theoretically reduces prediction bias by ensuring intra-class consistency. Extensive experiments on the CIFAR-100, ImageNet-1K, and Tiny-ImageNet datasets validate the superiority of IKD.
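The per-class accumulation and label-indexed KL step described above might look roughly like the following numpy sketch; the temperature T, the function names, and the update schedule are assumptions for illustration:

```python
import numpy as np

def update_class_means(mat, counts, teacher_logits, labels):
    """Accumulate teacher logits per class (the matrix) and per-class
    sample counts (the vector), as in the first-epoch pass."""
    for z, y in zip(teacher_logits, labels):
        mat[y] += z
        counts[y] += 1
    return mat, counts

def ikd_kl(student_logits, labels, mat, counts, T=4.0):
    """KL divergence between the student's softened output and the
    per-class average teacher logits, indexed by each sample's label."""
    avg = mat[labels] / counts[labels, None]   # class-average teacher logits

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p = softmax(avg / T)                       # intra-class target
    q = softmax(student_logits / T)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

When the student's logits already match its class average, the divergence vanishes, which is the intra-class consistency the method enforces.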

IJCAI Conference 2025 Conference Paper

Human-Imperceptible, Machine-Recognizable Images

  • Fusheng Hao
  • Fengxiang He
  • Yikai Wang
  • Fuxiang Wu
  • Jing Zhang
  • Dacheng Tao
  • Jun Cheng

Massive human-related data is collected to train neural networks for computer vision tasks. This exposes a major conflict for software engineers between better developing AI systems and keeping their distance from the sensitive training data. To reconcile this conflict, the paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become "human-imperceptible, machine-recognizable" via one of two encryption strategies: (1) randomly shuffling equally sized patches and (2) mixing up sub-patches. Minimal adaptations are then made to a vision transformer to enable it to learn on the encrypted images for vision tasks, including image classification and object detection. Extensive experiments on ImageNet and COCO show that the proposed paradigm achieves accuracy comparable to competitive methods. Decrypting the encrypted images requires solving an NP-hard jigsaw puzzle or an ill-posed inverse problem, which is empirically shown to be intractable for various attackers, including a powerful vision transformer-based attacker. We thus show that the proposed paradigm renders the encrypted images human-imperceptible while preserving machine-recognizable information.

TMLR Journal 2025 Journal Article

Mixture of Balanced Information Bottlenecks for Long-Tailed Visual Recognition

  • Yifan Lan
  • Cai xin
  • Jun Cheng
  • Shan Tan

Deep neural networks (DNNs) have achieved significant success in various applications with large-scale and balanced data. However, data in real-world visual recognition are usually long-tailed, bringing challenges to efficient training and deployment of DNNs. Information bottleneck (IB) is an elegant approach for representation learning. In this paper, we propose a balanced information bottleneck (BIB) approach, in which loss function re-balancing and self-distillation techniques are integrated into the original IB network. BIB is thus capable of learning a sufficient representation with essential label-related information fully preserved for long-tailed visual recognition. To further enhance the representation learning capability, we also propose a novel structure of mixture of multiple balanced information bottlenecks (MBIB), where different BIBs are responsible for combining knowledge from different network layers. MBIB facilitates an end-to-end learning strategy that trains representation and classification simultaneously from an information theory perspective. We conduct experiments on commonly used long-tailed datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018. Both BIB and MBIB reach state-of-the-art performance for long-tailed visual recognition.

EAAI Journal 2025 Journal Article

Multi-objective weighted average algorithm: a novel algorithm for multi-objective optimization problems and its application in engineering problems

  • Jun Cheng
  • Wim De Waele

Numerous meta-heuristic algorithms struggle with degraded performance when addressing multi-objective optimization problems due to the challenge of balancing two goals: accurately estimating Pareto-optimal solutions and ensuring their broad distribution across objectives. While the Weighted Average Algorithm (WAA) excels in single-objective optimization, its scalarization-based mechanism fundamentally conflicts with multi-objective requirements. To bridge this gap, we propose the Multi-Objective Weighted Average Algorithm (MOWAA) with three key innovations: (1) a hybrid exploration-exploitation mechanism integrating adaptive mutation and crossover operations; (2) an elitist archive management system using efficient non-dominated sorting across three critical solution sets; and (3) a novel roulette-wheel-based leader selection strategy that dynamically balances convergence and diversity. To verify the performance of the developed MOWAA, the numerical benchmark test functions (CEC2009, ZDT and DTLZ) and four engineering problems (the Binh and Korn (BNH), Constraint (CONSTR), Srinivas and Deb (SRN), and 10-bar Truss (BAR TRUSS)) are used in comparison with three multi-objective optimization algorithms. The results show that MOWAA achieves better optimization performance than comparative algorithms, with Pareto-optimal solutions exhibiting excellent convergence and coverage. Finally, applying MOWAA to an Artificial Neural Network (ANN) model using an experimental dataset on surface waviness (in mm) of Wire Arc Additive Manufacturing (WAAM) components enhances predictive accuracy by balancing optimization of prediction error and variance. Compared to single-objective optimization methods, the MOWAA approach effectively captures the complex relationships between process parameters and waviness in the WAAM process.
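The non-dominated sorting at the heart of the elitist archive described above rests on Pareto dominance, which can be sketched as follows (assuming all objectives are minimized; this is a generic illustration, not MOWAA's implementation):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(front):
    """Filter a list of objective vectors down to its non-dominated subset,
    the core operation behind an elitist Pareto archive."""
    return [a for a in front
            if not any(dominates(b, a) for b in front if b is not a)]
```

For example, in a two-objective problem the vector (3, 3) is dominated by (2, 2) and would be dropped from the archive, while (1, 5), (2, 2), and (5, 1) are mutually non-dominated and all survive.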

IROS Conference 2025 Conference Paper

Occlusion-Aware 6D Pose Estimation with Visual Observation Guided Diffusion Model

  • Yanbin Xiong
  • Buzhen Huang
  • Hui Ma
  • Yu Liu
  • Jun Cheng

Category-level 6D pose estimation in cluttered and occluded environments is a challenging task. Most existing methods rely on deterministic point-based correspondences to estimate target poses, which cannot account for the uncertainty of occluded objects and thus yield inferior performance. In this paper, we propose a diffusion model guided by occlusion-aware observations to adaptively refine object poses in occluded and cluttered scenes. Specifically, we first extract various 2D and 3D features from an RGB-D image to construct the conditions of the diffusion model. In the reverse diffusion process, the model is guided by implicit correspondences, perception distance, and occlusion relationships to refine the noisy pose sampled from a standard Gaussian distribution. With several denoising steps, our method can produce accurate results that are consistent with image observations in occluded scenarios. The experimental results show that the proposed method outperforms baseline methods on major metrics in occlusion scenarios. Furthermore, our approach can also be applied to robotic grasping and manipulation tasks, as demonstrated by grasping experiments in a cluttered environment on a physical UR5 robot.

IROS Conference 2025 Conference Paper

SDF-guided Keyframe Selection: Novel Boost for NeRF SLAM Loop Closure

  • Hui Ma
  • Yu Liu
  • Jun Cheng

In the domain of Simultaneous Localization and Mapping (SLAM), loop closure is a linchpin for achieving accurate and consistent 3D environment mapping. However, the process is fraught with abrupt light changes and motion blur. These elements introduce uncertainties and inaccuracies in the data captured by sensors, severely undermining the system's robustness. To address this critical challenge, we present a novel SDF-guided keyframe selection algorithm tailored for loop closure. Our approach capitalizes on the geometric insights provided by the Signed Distance Function (SDF) to meticulously choose keyframes, effectively mitigating the impact of noisy data. By doing so, we enhance the reliability of loop closure, refine the accuracy of 3D map reconstructions, and fortify the overall stability of the system. Our algorithm's efficacy is substantiated through comprehensive experiments on datasets such as Replica, ScanNet, and TUM RGB-D. Notably, it can be easily integrated as a plug-and-play module into diverse existing methods, enhancing their performance across different scenarios. Real-world trials using a hand-held LeTMC-520 camera for indoor scene reconstruction further validate its practicality and effectiveness.

IROS Conference 2025 Conference Paper

The Sampling-Gaussian for Stereo Matching

  • Baiyu Pan
  • Bowen Yao
  • Jichao Jiao
  • Jianxin Pang
  • Jun Cheng

The soft-argmax operation is widely adopted in neural network-based stereo matching methods to enable differentiable regression of disparity. However, networks trained with soft-argmax tend to predict multimodal probability distributions due to the absence of explicit constraints on the shape of the distribution. Previous methods leveraged Laplacian distributions and cross-entropy for training but failed to effectively improve accuracy and even increased the network's processing time. In this paper, we propose a novel method called Sampling-Gaussian as a substitute for soft-argmax. It improves accuracy without increasing inference time. We innovatively interpret the training process as minimizing the distance in vector space and propose a combined loss of L1 loss and cosine similarity loss, leveraging a normalized discrete Gaussian distribution for supervision. Moreover, we identify two issues in previous methods and propose extending the disparity range and employing bilinear interpolation as solutions. We have conducted comprehensive experiments to demonstrate the superior performance of our Sampling-Gaussian method. The experimental results show better accuracy on five baseline methods across four datasets, with particularly significant improvements on small datasets and for models with weaker generalization capabilities. Our method is easy to implement, and the code is available online.
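The normalized discrete Gaussian supervision target and the combined L1 + cosine loss can be sketched as below; sigma, the loss weighting alpha, and the function names are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def gaussian_target(d_gt, n_disp, sigma=1.0):
    """Normalized discrete Gaussian over disparity bins, centered at the
    ground-truth disparity d_gt, used as the supervision distribution."""
    bins = np.arange(n_disp)
    g = np.exp(-0.5 * ((bins - d_gt) / sigma) ** 2)
    return g / g.sum()

def sg_loss(pred, target, alpha=1.0):
    """Combined L1 + cosine-similarity loss between predicted and target
    disparity distributions (alpha weights the cosine term)."""
    l1 = np.abs(pred - target).sum()
    cos = (pred @ target) / (np.linalg.norm(pred) * np.linalg.norm(target))
    return float(l1 + alpha * (1.0 - cos))
```

A prediction that matches the Gaussian target exactly yields zero loss, and any deviation is penalized both elementwise (L1) and directionally (cosine) in the distribution's vector space.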

NeurIPS Conference 2024 Conference Paper

Diffusion Priors for Variational Likelihood Estimation and Image Denoising

  • Jun Cheng
  • Shan Tan

Real-world noise removal is crucial in low-level computer vision. Due to the remarkable generation capabilities of diffusion models, recent attention has shifted towards leveraging diffusion priors for image restoration tasks. However, existing diffusion priors-based methods either consider simple noise types or rely on approximate posterior estimation, limiting their effectiveness in addressing structured and signal-dependent noise commonly found in real-world images. In this paper, we build upon diffusion priors and propose adaptive likelihood estimation and MAP inference during the reverse diffusion process to tackle real-world noise. We introduce an independent, non-identically distributed likelihood combined with the noise precision (inverse variance) prior and dynamically infer the precision posterior using variational Bayes during the generation process. Meanwhile, we rectify the estimated noise variance through local Gaussian convolution. The final denoised image is obtained by propagating intermediate MAP solutions that balance the updated likelihood and diffusion prior. Additionally, we explore the local diffusion prior inherent in low-resolution diffusion models, enabling direct handling of high-resolution noisy images. Extensive experiments and analyses on diverse real-world datasets demonstrate the effectiveness of our method. Code is available at https://github.com/HUST-Tan/DiffusionVI.

ICRA Conference 2024 Conference Paper

Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices

  • Baiyu Pan
  • Jichao Jiao
  • Jianxing Pang
  • Jun Cheng

In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtained a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. Firstly, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely-used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.

IROS Conference 2024 Conference Paper

Self-Supervised Monocular Depth Estimation with Effective Feature Fusion and Self Distillation

  • Zhenfei Liu
  • Chengqun Song
  • Jun Cheng
  • Jiefu Luo
  • Xiaoyang Wang

Monocular depth estimation, which obtains scene depth information from a single image, is an important task in the field of computer vision. Constrained by the limitations of convolutional networks in long-distance modeling and by the underutilization of datasets, the generalization of existing models is not satisfactory. In this paper, we propose an adaptive backbone named Internal Fusion Transformer to improve generalization ability over convolutional backbones such as HRNet, and a Bilateral Attention module that pays more attention to low-level semantic features than previous fusion methods. Meanwhile, we introduce three data augmentation methods, namely cropping-resizing (cr), cropping-shuffling (cs), and mirroring (mi), for self distillation, and discuss their contributions to model performance. Our model is trained on the KITTI dataset and, without fine-tuning, tested on the NYUv2 and Make3D datasets to confirm its generalization. The experimental results illustrate the effectiveness of our design. Our model also demonstrates better performance than other models on the KITTI dataset.

AAAI Conference 2024 Conference Paper

SuperJunction: Learning-Based Junction Detection for Retinal Image Registration

  • Yu Wang
  • Xiaoye Wang
  • Zaiwang Gu
  • Weide Liu
  • Wee Siong Ng
  • Weimin Huang
  • Jun Cheng

Keypoint-based approaches, which superimpose two or more images from different views based on keypoint detection and description, have proven promising for retinal image registration. However, existing approaches suffer from ineffective keypoint detector and descriptor training. Meanwhile, the non-linear mapping from the 3D retinal structure to 2D images is often neglected. In this paper, we propose a novel learning-based junction detection approach for retinal image registration, which enhances both keypoint detector and descriptor training. To improve keypoint detection, it uses multi-task vessel detection to regularize model training, which helps to learn more representative features and reduces the risk of over-fitting. To achieve effective training for keypoint description, a new constrained negative sampling approach is proposed to compute the descriptor loss. Moreover, we also account for the non-linearity between retinal images from different views during matching. Experimental results on the FIRE dataset show that our method achieves a mean area under the curve of 0.850, which is 12.6% higher than the 0.755 achieved by the state-of-the-art method. All code is available at https://github.com/samjcheng/SuperJunction.

JBHI Journal 2024 Journal Article

Target-Guided Diffusion Models for Unpaired Cross-Modality Medical Image Translation

  • Yimin Luo
  • Qinyu Yang
  • Ziyi Liu
  • Zenglin Shi
  • Weimin Huang
  • Guoyan Zheng
  • Jun Cheng

In a clinical setting, the acquisition of certain medical image modalities is often unavailable due to various considerations such as cost and radiation. Therefore, unpaired cross-modality translation techniques, which involve training on unpaired data and synthesizing the target modality under the guidance of the acquired source modality, are of great interest. Previous methods for synthesizing target medical images establish a one-shot mapping through generative adversarial networks (GANs). As promising alternatives to GANs, diffusion models have recently received wide interest in generative tasks. In this paper, we propose a target-guided diffusion model (TGDM) for unpaired cross-modality medical image translation. For training, to encourage our diffusion model to learn more visual concepts, we adopt a perception-prioritized weighting scheme (P2W) in the training objectives. For sampling, a pre-trained classifier is adopted in the reverse process to suppress modality-specific remnants from the source data. Experiments on both brain MRI-CT and prostate MRI-US datasets demonstrate that the proposed method achieves visually realistic results that mimic vivid anatomical sections of the target organ. In addition, we have also conducted a subjective assessment based on the synthesized samples to further validate the clinical value of TGDM.

TMLR Journal 2024 Journal Article

When low-vision task meets dense prediction tasks with less data: an auxiliary self-trained geometry regularization

  • Zaiwang Gu
  • Weide Liu
  • Xulei Yang
  • Chuan-Sheng Foo
  • Jun Cheng

Many deep learning methods are data-driven and often converge to local minima due to limited training data. This poses a challenge in domains where acquiring adequate data for model training or fine-tuning is difficult, such as generalized few-shot semantic segmentation (GFSSeg) and monocular depth estimation (MDE). To this end, we propose a self-trained geometry regularization framework that uses geometric knowledge to enhance model training or fine-tuning in scenarios with limited training data. Specifically, we propose to leverage low-level geometry information extracted from the training data and define a novel regularization term, which is a plug-and-play module jointly trained with the primary task via multi-task learning. Our proposed regularization relies neither on extra manual labels or data during training nor on extra computation at inference. We demonstrate the effectiveness of this regularization on the GFSSeg and MDE tasks. Notably, it improves the state-of-the-art GFSSeg method by 5.61% and 4.26% mIoU on novel classes on PASCAL and COCO in the 1-shot scenario. In MDE, it achieves relative reductions in SILog error of 16.6% and 9.4% for two recent methods on the KITTI dataset.

EAAI Journal 2023 Journal Article

Multi-scale split dual calibration network with periodic information for interpretable fault diagnosis of rotating machinery

  • Yongyi Chen
  • Dan Zhang
  • Hongjie Ni
  • Jun Cheng
  • Hamid Reza Karimi

Conventional intelligent fault diagnosis algorithms based on signal processing and pattern recognition place high demands on expert experience and have poor generalization performance, and hence may not perform well in complex industrial fields. Meanwhile, the data acquisition system may suffer from cyber attacks while collecting vibration signals. The vibration signal has a very low signal-to-noise ratio (SNR), which seriously affects the accuracy of fault diagnosis. Aiming at the problem of fault diagnosis under low SNR, a new fault diagnosis framework based on a Multi-scale Split Dual Calibration Network with Periodic Information (PI-MSDCN) is proposed in this paper. In this framework, a periodic block is constructed to automatically learn the periodic information of vibration signals through the neural network. The learned periodic information and the raw vibration signal are used as the input data of the MSDCN. Specifically, the MSDCN uses convolution kernels of different sizes for different channels of the input features to generate multi-scale features, and obtains mixed-domain attention features for the features at each scale. Then, the attention feature is used as a threshold to adaptively remove redundant information from the multi-scale features. Finally, in order to calibrate the contribution of different scale features to fault diagnosis, the mixed-domain attention coefficients are applied to the corresponding features to obtain richer multi-scale attention features. Experimental studies under different levels of interference demonstrate that the average accuracy of the proposed method is 92.91% (±5.08%), which is superior to other existing results in the literature.

AAAI Conference 2023 Conference Paper

Reject Decoding via Language-Vision Models for Text-to-Image Synthesis

  • Fuxiang Wu
  • Liu Liu
  • Fusheng Hao
  • Fengxiang He
  • Lei Wang
  • Jun Cheng

Transformer-based text-to-image synthesis generates images from abstractive textual conditions and achieves promising results. Since transformer-based models predict visual tokens step by step at test time, early errors are hard to correct and tend to propagate. To alleviate this issue, the common practice is to draw multiple paths from the transformer-based model and re-rank the images decoded from those paths to find the best one, filtering out the others. This procedure of decoding and then excluding images can be inefficient. To improve the effectiveness and efficiency of decoding, we exploit a reject decoding algorithm with tiny multi-modal models to enlarge the search space and exclude useless paths as early as possible. Specifically, we build tiny multi-modal models to evaluate the similarities between partial paths and the caption at multiple scales. Then, we propose a reject decoding algorithm that excludes the lowest-quality partial paths at the inner steps. Thus, under the same computing load as the original decoding, we can search across more paths to improve decoding efficiency and synthesis quality. Experiments conducted on the MS-COCO dataset and large-scale datasets show that the proposed reject decoding algorithm can exclude useless paths and enlarge the search space, improving synthesis quality while consuming less time.

EAAI Journal 2022 Journal Article

A review of visual SLAM methods for autonomous driving vehicles

  • Jun Cheng
  • Liyan Zhang
  • Qihong Chen
  • Xinrong Hu
  • Jingcao Cai

Autonomous driving vehicles require a precise localization and mapping solution in different driving environments. In this context, Simultaneous Localization and Mapping (SLAM) technology is a well-studied solution. Light Detection and Ranging (LIDAR) and camera sensors are commonly used for localization and perception. However, over ten or twenty years of evolution, the LIDAR-SLAM method does not seem to have changed much. Compared with LIDAR-based schemes, visual SLAM has strong scene recognition ability with the advantages of low cost and easy installation. Indeed, in the field of autonomous driving, researchers are trying to replace LIDAR sensors with cameras alone, or to integrate other sensors on the basis of the camera. Based on the current state of research, this review covers visual SLAM technologies. In particular, we first illustrate the typical structure of visual SLAM. Secondly, state-of-the-art studies of visual and visual-based (i.e., visual-inertial, visual-LIDAR, visual-LIDAR-IMU) SLAM are comprehensively reviewed, and the positioning accuracy of our previous work is compared with well-known frameworks on public datasets. Finally, the key issues and future development trends of visual SLAM technologies for autonomous driving applications are discussed.

IJCAI Conference 2021 Conference Paper

Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification

  • Haocong Rao
  • Shihao Xu
  • Xiping Hu
  • Jun Cheng
  • Bin Hu

Skeleton-based person re-identification (Re-ID) is an emerging open topic providing great value for safety-critical applications. Existing methods typically extract hand-crafted features or model skeleton dynamics from the trajectory of body joints, while they rarely explore valuable relation information contained in body structure or motion. To fully explore body relations, we construct graphs to model human skeletons from different levels, and for the first time propose a Multi-level Graph encoding approach with Structural-Collaborative Relation learning (MG-SCR) to encode discriminative graph features for person Re-ID. Specifically, considering that structurally-connected body components are highly correlated in a skeleton, we first propose a multi-head structural relation layer to learn different relations of neighbor body-component nodes in graphs, which helps aggregate key correlative features for effective node representations. Second, inspired by the fact that body-component collaboration in walking usually carries recognizable patterns, we propose a cross-level collaborative relation layer to infer collaboration between different level components, so as to capture more discriminative skeleton graph features. Finally, to enhance graph dynamics encoding, we propose a novel self-supervised sparse sequential prediction task for model pre-training, which facilitates encoding high-level graph semantics for person Re-ID. MG-SCR outperforms state-of-the-art skeleton-based methods, and it achieves superior performance to many multi-modal methods that utilize extra RGB or depth features. Our codes are available at https://github.com/Kali-Hac/MG-SCR.

IJCAI Conference 2020 Conference Paper

Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification

  • Haocong Rao
  • Siqi Wang
  • Xiping Hu
  • Mingkui Tan
  • Huang Da
  • Jun Cheng
  • Bin Hu

Gait-based person re-identification (Re-ID) is valuable for safety-critical applications, and using only 3D skeleton data to extract discriminative gait features for person Re-ID is an emerging open topic. Existing methods either adopt hand-crafted features or learn gait features by traditional supervised learning paradigms. Unlike previous methods, we for the first time propose a generic gait encoding approach that can utilize unlabeled skeleton data to learn gait representations in a self-supervised manner. Specifically, we first propose to introduce self-supervision by learning to reconstruct input skeleton sequences in reverse order, which facilitates learning richer high-level semantics and better gait representations. Second, inspired by the fact that motion's continuity endows temporally adjacent skeletons with higher correlations (“locality”), we propose a locality-aware attention mechanism that encourages learning larger attention weights for temporally adjacent skeletons when reconstructing current skeleton, so as to learn locality when encoding gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are built using context vectors learned by locality-aware attention, as final gait representations. AGEs are directly utilized to realize effective person Re-ID. Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods with extra RGB or depth information.
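The locality idea above, larger attention weights for temporally adjacent skeletons, can be illustrated with a fixed Gaussian bias added to raw attention scores. The paper learns its locality-aware weights, so this is only a hedged sketch of the mechanism, with sigma and the function name as assumptions:

```python
import numpy as np

def locality_aware_weights(scores, t, sigma=2.0):
    """Reweight raw attention scores over a skeleton sequence with a
    Gaussian locality bias centered at the current frame t, so that
    temporally adjacent skeletons receive larger attention weights."""
    n = len(scores)
    bias = -0.5 * ((np.arange(n) - t) / sigma) ** 2  # log-domain locality bias
    z = scores + bias
    e = np.exp(z - z.max())                          # stable softmax
    return e / e.sum()
```

With uniform raw scores, the resulting weights peak at the current frame and decay with temporal distance, which is exactly the "locality" correlation structure the abstract appeals to.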

AIIM Journal 2020 Journal Article

Speckle reduction of OCT via super resolution reconstruction and its application on retinal layer segmentation

  • Qifeng Yan
  • Bang Chen
  • Yan Hu
  • Jun Cheng
  • Yan Gong
  • Jianlong Yang
  • Jiang Liu
  • Yitian Zhao

Optical coherence tomography (OCT) is a rapidly developing non-invasive three-dimensional imaging approach that has been widely used in the examination and diagnosis of eye diseases. However, speckle noise is often introduced during the image acquisition process and may obscure anatomical structures such as the retinal layers. In this paper, we propose a novel method to reduce the speckle noise in 3D OCT scans by introducing a new super-resolution approach. It uses a multi-frame fusion mechanism that merges multiple scans of the same scene, and exploits sub-pixel movements to recover missing signals within a pixel, which significantly improves image quality. To evaluate the effectiveness of the proposed speckle noise reduction method, we have applied it to retinal layer segmentation. Results show that the proposed method produces promising enhancement performance and enables deep learning-based methods to obtain more accurate retinal layer segmentation results.