Author name cluster

Xiaozhong Ji

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

IJCAI Conference 2024 Conference Paper

UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation

Qingdong He
Jinlong Peng
Zhengkai Jiang
Kai Wu
Xiaozhong Ji
Jiangning Zhang
Yabiao Wang
Chengjie Wang

3D open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space. However, existing works not only fail to fully utilize all the available modal information in the 3D domain but also lack sufficient granularity in representing the features of each modality. In this paper, we propose a unified multimodal 3D open-vocabulary scene understanding network, namely UniM-OV3D, which aligns point clouds with image, language and depth. To better integrate global and local features of the point clouds, we design a hierarchical point cloud feature extraction module that learns fine-grained feature representations. Further, to facilitate the learning of coarse-to-fine point-semantic representations from captions, we propose using hierarchical 3D caption pairs, capitalizing on geometric constraints across various viewpoints of 3D scenes. Extensive experiments demonstrate the effectiveness and superiority of our method in open-vocabulary semantic and instance segmentation, achieving state-of-the-art performance on both indoor and outdoor benchmarks such as ScanNet, ScanNet200, S3DIS and nuScenes. Code is available at https://github.com/hithqd/UniM-OV3D.
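The abstract does not spell out how novel categories are recognized at inference time. A common pattern in open-vocabulary 3D segmentation, assumed here rather than taken from UniM-OV3D, is to embed category names with a text encoder and give each point the label whose text embedding is most similar to that point's feature. The sketch below uses a hypothetical helper (`classify_points`) and toy 3-dimensional vectors in place of real point and text features.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify_points(point_feats, text_embeds):
    """Assign each point the category whose text embedding is most similar.

    point_feats: per-point feature vectors, assumed to already live in the
                 shared point/text embedding space (hypothetical inputs).
    text_embeds: dict mapping category name -> text embedding; categories can
                 be added at inference time without retraining.
    """
    return [max(text_embeds, key=lambda name: cosine(f, text_embeds[name]))
            for f in point_feats]

# Toy embeddings, purely illustrative (not real text-encoder outputs).
text_embeds = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],  # a "novel" category added only at inference
}
points = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.95]]
print(classify_points(points, text_embeds))  # ['chair', 'table', 'lamp']
```

Because the label set is just the keys of `text_embeds`, extending the vocabulary only requires embedding new category names.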


AAAI Conference 2021 Conference Paper

Frequency Consistent Adaptation for Real World Super Resolution

Xiaozhong Ji
Guangpin Tao
Yun Cao
Ying Tai
Tong Lu
Chengjie Wang
Jilin Li
Feiyue Huang

Recent deep-learning-based Super-Resolution (SR) methods have achieved remarkable performance on images with known degradation. However, these methods tend to fail on real-world scenes, since the Low-Resolution (LR) images produced by an idealized degradation (e.g., bicubic downsampling) deviate from the real source domain. The domain gap between these LR images and real-world images is clearly visible in their frequency density, which inspires us to explicitly narrow the undesired gap caused by incorrect degradation. From this point of view, we design a novel Frequency Consistent Adaptation (FCA) that ensures frequency-domain consistency when applying existing SR methods to real scenes. We estimate degradation kernels from images in an unsupervised manner and generate the corresponding LR images. To provide useful gradient information for kernel estimation, we propose a Frequency Density Comparator (FDC) that distinguishes the frequency density of images at different scales. Based on the domain-consistent LR-HR pairs, we train easy-to-implement Convolutional Neural Network (CNN) SR models. Extensive experiments show that the proposed FCA improves the performance of SR models in real-world settings, achieving state-of-the-art results with high fidelity and plausible perception, thus providing a novel and effective framework for real-world SR applications.
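The abstract does not define "frequency density" precisely. As a rough illustration only, the sketch below uses a hypothetical proxy, the share of spectral energy in equal-width frequency bands, and shows how a smoothing degradation (a circular [1/4, 1/2, 1/4] blur standing in for an idealized down-sampling kernel) drains energy from the highest band. The helper names (`band_energy_profile`, `box_blur`) are illustrative and are not FCA or FDC.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_energy_profile(x, num_bands=4):
    """Hypothetical 'frequency density' proxy: share of spectral energy per band.

    The mean is removed so the DC term does not dominate; bins 0..n//2
    (DC to Nyquist) are split into num_bands equal-width bands.
    """
    mean = sum(x) / len(x)
    spec = dft([v - mean for v in x])
    half = len(x) // 2 + 1
    power = [abs(spec[k]) ** 2 for k in range(half)]
    total = sum(power)
    bands = [0.0] * num_bands
    for k, p in enumerate(power):
        bands[min(k * num_bands // half, num_bands - 1)] += p
    return [e / total for e in bands]

def box_blur(x):
    """Circular [1/4, 1/2, 1/4] smoothing, a stand-in for an idealized degradation."""
    n = len(x)
    return [0.25 * x[t - 1] + 0.5 * x[t] + 0.25 * x[(t + 1) % n] for t in range(n)]

n = 32
# Low-frequency sinusoid plus fine alternating texture at the Nyquist frequency.
sharp = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * (-1) ** t for t in range(n)]
blurred = box_blur(sharp)

sharp_profile = band_energy_profile(sharp)
blurred_profile = band_energy_profile(blurred)
print([round(e, 3) for e in sharp_profile])
print([round(e, 3) for e in blurred_profile])
```

The sharp signal splits its energy evenly between the lowest and highest bands, while the blurred one keeps almost nothing in the top band, a toy version of the kind of frequency mismatch the abstract describes.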


NeurIPS Conference 2021 Conference Paper

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

Guangpin Tao
Xiaozhong Ji
Wenzhuo Wang
Shuo Chen
Chuming Lin
Yun Cao
Tong Lu
Donghao Luo

Deep-learning-based Super-Resolution (SR) methods have exhibited promising performance in the non-blind setting, where the blur kernel is known; however, the blur kernels of Low-Resolution (LR) images in practical applications are usually unknown. This can cause a significant performance drop when the degradation process of the training images deviates from that of real images. In this paper, we propose a novel blind SR framework that super-resolves LR images degraded by arbitrary blur kernels through accurate kernel estimation in the frequency domain. To the best of our knowledge, this is the first deep learning method that performs blur kernel estimation in the frequency domain. Specifically, we first demonstrate that feature representations in the frequency domain are more conducive to blur kernel reconstruction than those in the spatial domain. Next, we present a Spectrum-to-Kernel (S2K) network to estimate general blur kernels in diverse forms. We use a conditional GAN (CGAN) combined with an SR-oriented optimization target to learn the end-to-end translation from degraded images' spectra to unknown kernels. Extensive experiments on both synthetic and real-world images demonstrate that our proposed method substantially reduces blur kernel estimation error, thus enabling off-the-shelf non-blind SR methods to work effectively in the blind setting, and outperforms state-of-the-art blind SR methods by an average of 1.39 dB and 0.48 dB for Gaussian kernels and 6.15 dB and 4.57 dB for motion kernels at scale factors ×2 and ×4, respectively.
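A standard fact that plausibly motivates frequency-domain kernel estimation is the convolution theorem: blurring a signal by circular convolution multiplies its spectrum by the kernel's spectrum, so the kernel's imprint is a per-frequency factor rather than a spatially spread pattern. The 1D sketch below checks this numerically with toy signals; all names and values are hypothetical. Dividing spectra recovers the kernel only because the sharp signal is known here (a non-blind illustration). S2K instead targets the blind case, learning the spectrum-to-kernel mapping with a conditional GAN, so this division is not the paper's method.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    """Spatial-domain blur: y[t] = sum_s x[(t - s) mod n] * h[s]."""
    n = len(x)
    return [sum(x[(t - s) % n] * h[s] for s in range(n)) for t in range(n)]

n = 16
sharp = [0.8 ** t for t in range(n)]         # toy signal whose DFT has no zeros
kernel = [0.5, 0.3, 0.2] + [0.0] * (n - 3)   # toy 3-tap blur kernel
blurred = circular_convolve(sharp, kernel)

X, H, Y = dft(sharp), dft(kernel), dft(blurred)
# Convolution theorem: the spatial blur becomes a per-frequency product Y = X * H.
max_err = max(abs(Y[k] - X[k] * H[k]) for k in range(n))

# Non-blind illustration only: divide the spectra and invert to recover the kernel.
recovered = [v.real for v in idft([Y[k] / X[k] for k in range(n)])]
print(max_err < 1e-9, [round(v, 6) for v in recovered[:3]])
```

The geometric toy signal is chosen because its spectrum is guaranteed to be nonzero at every frequency, which keeps the division well defined; real images offer no such guarantee, which is part of why blind estimation is hard.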
