Arrow Research search

Author name cluster

Wenming Tan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers


AAAI Conference 2023 Conference Paper

MaskBooster: End-to-End Self-Training for Sparsely Supervised Instance Segmentation

  • Shida Zheng
  • Chenshu Chen
  • Xi Yang
  • Wenming Tan

This paper introduces sparsely supervised instance segmentation, in which datasets provide fully annotated bounding boxes but only sparsely annotated masks. A direct solution to this task is self-training, which has not yet been fully explored for instance segmentation. In this paper, we propose MaskBooster for sparsely supervised instance segmentation (SpSIS) with comprehensive usage of pseudo masks. MaskBooster features (1) dynamic and progressive pseudo masks from an online-updating teacher model, (2) refinement of binary pseudo masks with the help of a bounding-box prior, and (3) knowledge distillation of the inter-class prediction distribution for soft pseudo masks. As an end-to-end and universal self-training framework, MaskBooster can empower fully supervised algorithms and boost their segmentation performance on SpSIS. Abundant experiments on the COCO and BDD100K datasets validate the effectiveness of MaskBooster. Specifically, on different COCO protocols and on BDD100K, we surpass the sparsely supervised baseline by a large margin for both Mask R-CNN and ShapeProp. On datasets with similar annotation budgets, MaskBooster on SpSIS also outperforms the weakly and semi-supervised instance segmentation state of the art.
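The "bounding box prior" in step (2) can be read as a hard constraint: an instance's mask cannot extend beyond its annotated box. A minimal NumPy sketch of that reading (the function name and `(x1, y1, x2, y2)` box format are illustrative, not from the paper):

```python
import numpy as np

def refine_mask_with_box(pseudo_mask, box):
    """Zero out pseudo-mask pixels outside the annotated box.

    A hypothetical reading of MaskBooster's box prior: the fully
    annotated boxes bound where an instance's mask may appear.
    """
    x1, y1, x2, y2 = box
    refined = np.zeros_like(pseudo_mask)
    refined[y1:y2, x1:x2] = pseudo_mask[y1:y2, x1:x2]
    return refined

# A noisy 6x6 pseudo mask that leaks outside the box (1, 1, 4, 4).
mask = np.ones((6, 6), dtype=np.uint8)
clean = refine_mask_with_box(mask, (1, 1, 4, 4))
```

Clipping is only one part of the refinement; the paper's teacher model also updates the masks progressively during training.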

AAAI Conference 2022 Conference Paper

Dual Decoupling Training for Semi-supervised Object Detection with Noise-Bypass Head

  • Shida Zheng
  • Chenshu Chen
  • Xiaowei Cai
  • Tingqun Ye
  • Wenming Tan

Pseudo bounding boxes from the self-training paradigm are inevitably noisy for semi-supervised object detection. To cope with that, a dual decoupling training framework is proposed in the present study, i.e. clean and noisy data decoupling, and classification and localization task decoupling. In the first decoupling, two-level thresholds are used to categorize pseudo boxes into three groups, i.e. clean backgrounds, noisy foregrounds and clean foregrounds. With a specially designed noise-bypass head focusing on noisy data, backbone networks can extract coarse but diverse information; meanwhile, an original head learns from clean samples for more precise predictions. In the second decoupling, we take advantage of the two-head structure for better evaluation of localization quality, so that the category label and location of a pseudo box can remain independent of each other during training. The approach of two-level thresholds is also applied to group pseudo boxes into three sections of different location accuracy. We outperform existing works by a large margin on VOC datasets, reaching 54.8 mAP (+1.8), and even up to 55.9 mAP (+1.5) by leveraging MS-COCO train2017 as extra unlabeled data. On the MS-COCO benchmark, our method also achieves about 1.0 mAP improvement averaged across protocols compared with the prior state of the art.
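The two-level thresholding in the first decoupling can be sketched directly: a pseudo box's confidence score decides its group. The threshold values 0.3/0.7 below are placeholders, not the paper's settings:

```python
def group_pseudo_boxes(scores, low=0.3, high=0.7):
    """Split pseudo boxes into three groups with two-level thresholds.

    Scores below `low` count as clean background, scores above `high`
    as clean foreground, and the band in between as noisy foreground,
    which would be routed to the noise-bypass head.
    """
    groups = {"clean_background": [], "noisy_foreground": [], "clean_foreground": []}
    for i, score in enumerate(scores):
        if score < low:
            groups["clean_background"].append(i)
        elif score < high:
            groups["noisy_foreground"].append(i)
        else:
            groups["clean_foreground"].append(i)
    return groups

groups = group_pseudo_boxes([0.1, 0.5, 0.9])
```

The same mechanism with a localization-quality score in place of the classification score would give the three sections of location accuracy used in the second decoupling.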

NeurIPS Conference 2022 Conference Paper

SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization

  • Chuanyang Zheng
  • Zheyang Li
  • Kai Zhang
  • Zhi Yang
  • Wenming Tan
  • Jun Xiao
  • Ye Ren
  • Shiliang Pu

Vision Transformers (ViTs) yield impressive performance across various vision tasks. However, their heavy computation and memory footprint make them inaccessible to edge devices. Previous works prune ViTs using importance criteria determined independently by each individual component. Because the heterogeneous components of a ViT play distinct roles, these approaches lead to suboptimal performance. In this paper, we introduce joint importance, which for the first time integrates the essential structure-aware interactions between components, to perform collaborative pruning. Based on a theoretical analysis, we construct a Taylor-based approximation to evaluate the joint importance. This guides pruning toward a more balanced reduction across all components. To further reduce the algorithm's complexity, we incorporate the interactions into the optimization function under some mild assumptions. Moreover, the proposed method can be seamlessly applied to various tasks including object detection. Extensive experiments demonstrate the effectiveness of our method. Notably, the proposed approach outperforms existing state-of-the-art approaches on ImageNet, increasing accuracy by 0.7% over the DeiT-Base baseline while saving 50% of FLOPs. On COCO, we are the first to show that 70% of the FLOPs of Faster R-CNN with a ViT backbone can be removed with only a 0.3% mAP drop. The code is available at https://github.com/hikvision-research/SAViT.
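SAViT's contribution is the *joint* importance that couples heterogeneous components; the per-component first-order Taylor criterion it builds on can be sketched as |Σ g·w|, the approximate loss change when a structure is zeroed out. Array layout below is illustrative (one row per prunable structure), and the cross-component interaction terms are deliberately omitted:

```python
import numpy as np

def taylor_importance(weights, grads):
    """First-order Taylor importance per prunable structure.

    Each row holds one structure's weights; |sum(grad * weight)|
    approximates the loss change if that structure is removed.
    SAViT's joint importance additionally models interactions
    between components, which this per-component sketch omits.
    """
    return np.abs((weights * grads).sum(axis=1))

w = np.array([[0.5, -0.5], [0.01, 0.02]])   # two structures
g = np.array([[0.2, 0.2], [1.0, 1.0]])      # their loss gradients
scores = taylor_importance(w, g)
```

Note how the first structure's large weights score low because their gradient contributions cancel; an independent magnitude-based criterion would rank it higher.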

AAAI Conference 2022 Conference Paper

SOIT: Segmenting Objects with Instance-Aware Transformers

  • Xiaodong Yu
  • Dahu Shi
  • Xing Wei
  • Ye Ren
  • Tingqun Ye
  • Wenming Tan

This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers. Inspired by DETR, our method views instance segmentation as a direct set prediction problem and effectively removes the need for many hand-crafted components like RoI cropping, one-to-many label assignment, and non-maximum suppression (NMS). In SOIT, multiple queries are learned to directly reason a set of object embeddings of semantic category, bounding-box location, and pixel-wise mask in parallel under the global image context. The class and bounding-box can be easily embedded by a fixed-length vector. The pixel-wise mask, especially, is embedded by a group of parameters to construct a lightweight instance-aware transformer. Afterward, a full-resolution mask is produced by the instance-aware transformer without involving any RoI-based operation. Overall, SOIT introduces a simple single-stage instance segmentation framework that is both RoI- and NMS-free. Experimental results on the MS COCO dataset demonstrate that SOIT outperforms state-of-the-art instance segmentation approaches significantly. Moreover, the joint learning of multiple tasks in a unified query embedding can also substantially improve the detection performance. Code is available at https://github.com/yuxiaodongHRI/SOIT.
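The "group of parameters" mask embedding can be read in a dynamic-head style: each query predicts the weights of a tiny per-instance head that is applied to shared pixel features, so no RoI cropping is needed. A toy NumPy sketch with made-up shapes (the paper's instance-aware transformer is more elaborate than this single linear head):

```python
import numpy as np

rng = np.random.default_rng(0)

def instance_aware_masks(mask_params, features):
    """Apply each query's predicted 1x1 weights to shared features.

    mask_params: (num_queries, C) weights decoded from the query
    embeddings; features: (C, H, W) shared pixel features. Every
    query yields one full-size binary mask without RoI cropping.
    """
    C, H, W = features.shape
    logits = mask_params @ features.reshape(C, H * W)
    return (logits.reshape(-1, H, W) > 0).astype(np.uint8)

params = rng.normal(size=(3, 8))      # 3 queries, 8-dim mask embedding
feats = rng.normal(size=(8, 16, 16))  # shared feature map
masks = instance_aware_masks(params, feats)
```

Because masks, classes, and boxes are all decoded from the same query set, no NMS step is needed to deduplicate instances.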

AAAI Conference 2020 Conference Paper

Towards Accurate Low Bit-Width Quantization with Multiple Phase Adaptations

  • Zhaoyi Yan
  • Yemin Shi
  • Yaowei Wang
  • Mingkui Tan
  • Zheyang Li
  • Wenming Tan
  • Yonghong Tian

Low bit-width model quantization is highly desirable when deploying a deep neural network on mobile and edge devices. Quantization is an effective way to reduce the model size with a low bit-width weight representation. However, an unacceptable accuracy drop hinders the development of this approach. One possible reason is that weights within a quantization interval are directly assigned to its center. At the same time, some quantization methods are limited by the variety of different network models. Accordingly, in this paper we propose Multiple Phase Adaptations (MPA), a framework designed to address these two problems. First, weights in the target interval are assigned to the center by gradually spreading the quantization range. During the MPA process, the accuracy drop can be compensated for by the still-unquantized parts. Moreover, as MPA introduces no hyperparameters that depend on particular models or bit-widths, the framework can be conveniently applied to various models. Extensive experiments demonstrate that MPA achieves higher accuracy than most existing methods on classification tasks with AlexNet, VGG-16 and ResNet.
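The phased assignment can be sketched as partial quantization with a growing capture radius: in each phase, only weights already close to a quantization level are snapped to it, while the rest stay full precision and can keep adapting to compensate for the accuracy drop. The function name, levels, and radius below are illustrative, not the paper's:

```python
import numpy as np

def phase_quantize(w, levels, radius):
    """Snap only weights within `radius` of their nearest level.

    Remaining weights stay full precision and continue training;
    growing `radius` over successive phases spreads the quantization
    range until every weight is assigned to a center.
    """
    levels = np.asarray(levels, dtype=float)
    nearest = levels[np.abs(w[:, None] - levels[None, :]).argmin(axis=1)]
    return np.where(np.abs(w - nearest) <= radius, nearest, w)

w = np.array([0.02, 0.3, 0.49])
quantized = phase_quantize(w, levels=[0.0, 0.5], radius=0.05)
```

With radius 0.05 only the first and last weights snap to their levels; the middle weight survives this phase unquantized and would be captured in a later phase with a larger radius.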