Arrow Research search

Author name cluster

Zhong Ji

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

IJCAI Conference 2022 Conference Paper

Learning from Students: Online Contrastive Distillation Network for General Continual Learning

  • Jin Li
  • Zhong Ji
  • Gang Wang
  • Qiang Wang
  • Feng Gao

The goal of General Continual Learning (GCL) is to preserve learned knowledge and learn new knowledge with constant memory from an infinite data stream where task boundaries are blurry. Distilling the model's responses to reserved samples between the old and the new models is an effective way to achieve promising performance on GCL. However, it accumulates the old model's inherent response bias and is not robust to model changes. To this end, we propose an Online Contrastive Distillation Network (OCD-Net) to tackle these problems, which exploits the merit of the student model at each time step to guide its own training process. Concretely, the teacher model is devised to help the student model consolidate the learned knowledge; it is trained online by integrating the model weights of the student model to accumulate the new knowledge. Moreover, our OCD-Net incorporates both relation and adaptive response to help the student model alleviate catastrophic forgetting, which also helps the teacher model preserve the learned knowledge. Extensive experiments on six benchmark datasets demonstrate that our proposed OCD-Net significantly outperforms state-of-the-art approaches by 3.26%~8.71% with various buffer sizes. Our code is available at https://github.com/lijincm/OCD-Net.
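A teacher "trained online via integrating the model weights of the student", as this abstract describes, is commonly realized as an exponential-moving-average (EMA) update. A minimal NumPy sketch of that update rule (the momentum value and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def ema_update(teacher_w, student_w, momentum=0.99):
    """One online teacher step: blend the student's current weights
    into the teacher instead of training the teacher directly."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

# Toy weights: the teacher starts at zero, the student stays at one.
teacher = np.zeros(4)
student = np.ones(4)
for _ in range(100):
    teacher = ema_update(teacher, student)
# After k steps the teacher reaches 1 - momentum**k, i.e. it drifts
# smoothly toward the student while damping step-to-step changes.
```

The smoothing is what makes such a teacher robust to abrupt model changes: any single student update moves the teacher by at most `(1 - momentum)` of the difference.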

IJCAI Conference 2022 Conference Paper

Masked Feature Generation Network for Few-Shot Learning

  • Yunlong Yu
  • Dingyi Zhang
  • Zhong Ji

In this paper, we present a feature-augmentation approach called Masked Feature Generation Network (MFGN) for Few-Shot Learning (FSL), a challenging task that attempts to recognize novel classes from a few visual instances per class. Most feature-augmentation approaches tackle FSL tasks by modeling the intra-class distributions. We extend this idea further to explicitly capture the intra-class variations in a one-to-many manner. Specifically, MFGN consists of an encoder-decoder architecture: an encoder that performs as a feature extractor and extracts the feature embeddings of the available visual instances (the unavailable instances are treated as masked), along with a decoder that performs as a feature generator and reconstructs the feature embeddings of the unavailable visual instances from both the available feature embeddings and the masked tokens. Equipped with this generative architecture, MFGN produces nontrivial visual features for novel classes with limited visual instances. In extensive experiments on four FSL benchmarks, MFGN performs competitively and outperforms the state-of-the-art competitors on most of the few-shot classification tasks.

IJCAI Conference 2021 Conference Paper

Step-Wise Hierarchical Alignment Network for Image-Text Matching

  • Zhong Ji
  • Kexin Chen
  • Haoran Wang

Image-text matching plays a central role in bridging the semantic gap between vision and language. The key to achieving precise visual-semantic alignment lies in capturing the fine-grained cross-modal correspondence between image and text. Most previous methods rely on single-step reasoning to discover the visual-semantic interactions, which lacks the ability to exploit multi-level information to locate the hierarchical fine-grained relevance. Different from them, in this work, we propose a step-wise hierarchical alignment network (SHAN) that decomposes image-text matching into a multi-step cross-modal reasoning process. Specifically, we first achieve local-to-local alignment at the fragment level, followed by global-to-local and global-to-global alignment at the context level sequentially. This progressive alignment strategy supplies our model with more complementary and sufficient semantic clues to understand the hierarchical correlations between image and text. The experimental results on two benchmark datasets demonstrate the superiority of our proposed method.

AAAI Conference 2020 Conference Paper

GTNet: Generative Transfer Network for Zero-Shot Object Detection

  • Shizhen Zhao
  • Changxin Gao
  • Yuanjie Shao
  • Lerenhan Li
  • Changqian Yu
  • Zhong Ji
  • Nong Sang

We propose a Generative Transfer Network (GTNet) for zero-shot object detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge Transfer Module. The Object Detection Module can learn large-scale seen domain knowledge. The Knowledge Transfer Module leverages a feature synthesizer to generate unseen class features, which are applied to train a new classification layer for the Object Detection Module. In order to synthesize features for each unseen class with both the intra-class variance and the IoU variance, we design an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN consists of three unit models: Class Feature Generating Unit (CFU), Foreground Feature Generating Unit (FFU), and Background Feature Generating Unit (BFU). CFU generates unseen features with the intra-class variance conditioned on the class semantic embeddings. FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively. We evaluate our method on three public datasets and the results demonstrate that our method performs favorably against the state-of-the-art ZSD approaches.

AAAI Conference 2020 Conference Paper

SGAP-Net: Semantic-Guided Attentive Prototypes Network for Few-Shot Human-Object Interaction Recognition

  • Zhong Ji
  • Xiyao Liu
  • Yanwei Pang
  • Xuelong Li

Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning that learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate the above challenges. Because the intrinsic characteristic of HOI is diverse and interactive, we propose a Semantic-Guided Attentive Prototypes Network (SGAP-Net) to learn a semantic-guided metric space where HOI recognition can be performed by computing distances to attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same class in HOI. In addition, we design a novel decision method to alleviate the biases produced by different patterns of the same action in HOI. Finally, to realize the task of few-shot HOI, we reorganize two HOI benchmark datasets, i.e., HICO-FS and TUHOI-FS. Extensive experimental results on both datasets demonstrate the effectiveness of our proposed SGAP-Net approach.
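Recognition "by computing distances to prototypes of each class", as this abstract describes, follows the standard prototypical-network recipe: a prototype is the mean embedding of a class's support instances, and a query is assigned to its nearest prototype. A minimal NumPy sketch (the semantic guidance and attention that distinguish SGAP-Net are omitted; data and names are illustrative):

```python
import numpy as np

def class_prototypes(features, labels):
    """Mean embedding per class: the 'prototype' of that class."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(query, classes, protos):
    """Classify a query embedding by Euclidean distance to each prototype."""
    dists = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(dists))]

# Toy 2-way 2-shot episode: two support instances per class.
feats = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = np.array([0, 0, 1, 1])
classes, protos = class_prototypes(feats, labels)
pred = nearest_prototype(np.array([0.1, 0.0]), classes, protos)
```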

IJCAI Conference 2019 Conference Paper

Dual-Path in Dual-Path Network for Single Image Dehazing

  • Aiping Yang
  • Haixin Wang
  • Zhong Ji
  • Yanwei Pang
  • Ling Shao

Recently, deep learning-based single image dehazing methods have become a popular approach. However, existing dehazing approaches operate directly on the original hazy image, which easily results in image blurring and noise amplification. To address this issue, this paper proposes a DPDP-Net (Dual-Path in Dual-Path network) framework that employs a hierarchical dual-path network. Specifically, the first-level dual-path network consists of a Dehazing Network and a Denoising Network, where the Dehazing Network is responsible for haze removal in the structural layer and the Denoising Network deals with noise in the textural layer. The second-level dual-path network lies within the Dehazing Network, which comprises an AL-Net (Atmospheric Light Network) and a TM-Net (Transmission Map Network). Concretely, the AL-Net estimates the non-uniform atmospheric light, while the TM-Net estimates the transmission map that reflects the visibility of the image. The final dehazed image is obtained by nonlinearly fusing the outputs of the Denoising Network and the Dehazing Network. Extensive experiments demonstrate that our proposed DPDP-Net achieves competitive performance against state-of-the-art methods on both synthetic and real-world images.

NeurIPS Conference 2018 Conference Paper

Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning

  • Yunlong Yu
  • Zhong Ji
  • Yanwei Fu
  • Jichang Guo
  • Yanwei Pang
  • Zhongfei (Mark) Zhang

Zero-Shot Learning (ZSL) is generally achieved by aligning the semantic relationships between the visual features and the corresponding class semantic descriptions. However, using global features to represent fine-grained images may lead to sub-optimal results since they neglect the discriminative differences of local regions. Besides, different regions contain distinct discriminative information, so the important regions should contribute more to the prediction. To this end, we propose a novel stacked semantics-guided attention (S2GA) model that obtains semantically relevant features by using individual class semantic features to progressively guide the visual features to generate an attention map for weighting the importance of different local regions. By feeding both the integrated visual features and the class semantic features into a multi-class classification architecture, the proposed framework can be trained end-to-end. Extensive experimental results on CUB and NABird datasets show that the proposed approach yields a consistent improvement on both fine-grained zero-shot classification and retrieval tasks.
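One hop of semantics-guided attention, as this abstract describes, scores each local region against a class semantic vector, normalizes the scores into an attention map, and pools the regions accordingly. A simplified NumPy sketch of a single such hop (the paper stacks several; all shapes and values here are illustrative):

```python
import numpy as np

def semantic_guided_attention(regions, semantic):
    """Score each local region feature against the class semantic
    vector, softmax over regions, and return the weights plus the
    attention-pooled global feature."""
    scores = regions @ semantic                 # (R,) region relevance
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ regions                  # weighted sum of regions
    return weights, pooled

# Three local region features; the semantic vector favors the first axis.
regions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
semantic = np.array([1.0, 0.0])
weights, pooled = semantic_guided_attention(regions, semantic)
# The region best aligned with the semantic vector gets the largest weight.
```

In the stacked setting, `pooled` (or a transform of it) would guide the next attention hop, progressively refining which regions matter.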