Arrow Research

Author name cluster

Zhanzhan Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers · 1 author row

Possible papers (8)

IJCAI 2023 · Conference Paper

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

  • Huawen Shen
  • Xiang Gao
  • Jin Wei
  • Liang Qiao
  • Yu Zhou
  • Qiang Li
  • Zhanzhan Cheng

Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as an image captioning problem, i.e., take a single-table image as input and output a table structure description in a specific text format, e.g., HTML. Following the impressive success of Transformers in text generation tasks, these methods use a Transformer architecture to predict the HTML table text in an autoregressive manner. However, tables come in a large variety of shapes and sizes. Autoregressive models usually suffer from error accumulation as the length of the predicted text increases, which results in unsatisfactory performance on large tables. In this paper, we propose a novel image-to-text based TSR method that alleviates the error accumulation problem and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture, with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting the HTML cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates the HTML tag sequence with a vertical-and-horizontal two-step 'scanning', which better fits the inherent 2D structure of image data; (2) it performs substantially better on large tables (long sequence prediction), since it alleviates the error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.
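A minimal sketch of the cascaded row-then-cell control flow described above, assuming generic PyTorch Transformer decoders and made-up dimensions; it illustrates the two-step idea, not the authors' implementation. The per-row autoregressive cell loop is collapsed to a single pass for brevity.

```python
import torch
import torch.nn as nn

D, N_ROW_QUERIES, VOCAB = 256, 64, 32  # assumed sizes

class RowThenCellDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=D, nhead=8, batch_first=True)
        # Each TransformerDecoder deep-copies the layer, so the two decoders
        # have independent parameters.
        self.row_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.cell_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.row_queries = nn.Parameter(torch.randn(N_ROW_QUERIES, D))
        self.tag_head = nn.Linear(D, VOCAB)

    def forward(self, img_feats):                    # img_feats: (B, HW, D)
        B = img_feats.size(0)
        # Step 1: predict all row tags in parallel (non-autoregressive).
        q = self.row_queries.unsqueeze(0).expand(B, -1, -1)
        row_feats = self.row_decoder(q, img_feats)   # (B, N_ROW_QUERIES, D)
        row_logits = self.tag_head(row_feats)
        # Step 2: each row feature conditions the cell decoder; in the paper,
        # cells within a row are decoded autoregressively, collapsed here to
        # a single parallel pass.
        cell_logits = self.tag_head(self.cell_decoder(row_feats, img_feats))
        return row_logits, cell_logits

row_logits, cell_logits = RowThenCellDecoder()(torch.randn(2, 1024, D))
```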

AAAI 2022 · Conference Paper

PMAL: Open Set Recognition via Robust Prototype Mining

  • Jing Lu
  • Yunlu Xu
  • Hao Li
  • Zhanzhan Cheng
  • Yi Niu

Open Set Recognition (OSR) is an emerging topic. Besides recognizing predefined classes, the system needs to reject unknowns. Prototype learning is a promising way to handle the problem, as its ability to improve the intra-class compactness of representations is much needed for discriminating between knowns and unknowns. In this work, we propose a novel Prototype Mining And Learning (PMAL) framework. It adds a prototype mining mechanism before the phase of optimizing the embedding space, explicitly considering two crucial properties of the prototype set: high quality and diversity. Concretely, a set of high-quality candidates is first extracted from the training samples based on data uncertainty learning, avoiding interference from unexpected noise. Considering the multifarious appearance of objects even within a single category, a diversity-based strategy for filtering the prototype set is proposed. Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between knowns and unknowns. Extensive experiments verify the two desirable characteristics (i.e., high quality and diversity) embraced by prototype mining and show the remarkable performance of the proposed framework compared to the state of the art.
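A minimal sketch of the two-stage mining idea: keep low-uncertainty candidates, then greedily pick a diverse subset (farthest-point selection). The quantile threshold and scoring here are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def mine_prototypes(feats, uncertainty, k, quality_quantile=0.3):
    # feats: (N, D) embeddings of one class; uncertainty: (N,) noise estimate.
    keep = uncertainty <= np.quantile(uncertainty, quality_quantile)  # quality pool
    pool = feats[keep]
    # Greedy diversity: start from the most certain sample, then repeatedly
    # add the pool sample farthest from all chosen prototypes.
    chosen = [int(np.argmin(uncertainty[keep]))]
    for _ in range(min(k, len(pool)) - 1):
        d = np.linalg.norm(pool[:, None] - pool[chosen][None], axis=-1).min(axis=1)
        d[chosen] = -np.inf  # never re-pick an already chosen prototype
        chosen.append(int(np.argmax(d)))
    return pool[chosen]

protos = mine_prototypes(np.random.randn(200, 64), np.random.rand(200), k=5)
```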

AAAI 2021 · Conference Paper

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

  • Liang Qiao
  • Ying Chen
  • Zhanzhan Cheng
  • Yunlu Xu
  • Yi Niu
  • Shiliang Pu
  • Fei Wu

Recently, end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region-of-interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such a framework, the recognition part is highly sensitive to the detection results (e.g., the compactness of text contours). To address this problem, in this paper we propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO, in which character sequences can be directly recognized without any RoI operation. Concretely, a position-aware mask attention module is developed to generate attention weights for each text instance and its characters. It allows different text instances in an image to be allocated to different feature map channels, which are further grouped as a batch of instance features. Finally, a lightweight sequence decoder is applied to generate the character sequences. It is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting and can be trained end-to-end with only coarse position information (e.g., a rectangular bounding box) and text annotations. Experimental results show that the proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks, i.e., ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
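A minimal sketch of the channel-grouping intuition behind mask attention: per-instance attention maps pool a shared feature map into a batch of instance features that a lightweight decoder reads characters from. All shapes, the random attention, and the toy decoder are illustrative assumptions.

```python
import torch

B, C, H, W, N_INST = 2, 256, 32, 32, 4
feats = torch.randn(B, C, H, W)          # shared backbone feature map
attn = torch.softmax(torch.randn(B, N_INST, H * W), dim=-1)  # one map per instance

# Attention-weighted pooling: each instance gathers its own feature vector
# directly from the shared map, with no RoI cropping involved.
inst_feats = torch.einsum('bnp,bcp->bnc', attn, feats.flatten(2))  # (B, N_INST, C)
decoder = torch.nn.Linear(C, 97)         # toy stand-in for the sequence decoder
char_logits = decoder(inst_feats)        # (B, N_INST, 97)
```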

AAAI 2021 · Conference Paper

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

  • Chengwei Zhang
  • Yunlu Xu
  • Zhanzhan Cheng
  • Shiliang Pu
  • Yi Niu
  • Fei Wu
  • Futai Zou

Arbitrary text appearance poses a great challenge for scene text recognition. Existing works mostly handle the problem by considering shape distortion, including perspective distortion, line curvature, and other style variations. Rectification (i.e., spatial transformers) as a preprocessing stage is one popular and extensively studied approach. However, chromatic difficulties in complex scenes have received little attention. In this work, we introduce a new learnable, geometry-unrelated rectification, the Structure-Preserving Inner Offset Network (SPIN), which allows color manipulation of the source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream task, giving neural networks the ability to actively transform input intensities rather than only perform spatial rectification. It can also serve as a complementary module to known spatial transformations and work with them both independently and collaboratively. Extensive experiments show that the proposed transformation outperforms existing rectification networks and achieves performance comparable to the state of the art.
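A minimal sketch of a learnable chromatic (intensity) rectification in the spirit described above: a tiny network predicts mixing weights over a bank of gamma curves and remaps the input intensities before recognition. The gamma bank and the predictor are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class IntensityRectifier(nn.Module):
    def __init__(self, gammas=(0.25, 0.5, 1.0, 2.0, 4.0)):
        super().__init__()
        self.register_buffer('gammas', torch.tensor(gammas))
        self.predict = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(1, len(gammas)), nn.Softmax(dim=-1))

    def forward(self, x):                       # x: (B, 1, H, W) in [0, 1]
        w = self.predict(x)                     # (B, K) per-image mixing weights
        # Apply every gamma curve, then blend them with the predicted weights.
        curves = x.unsqueeze(1) ** self.gammas.view(1, -1, 1, 1, 1)  # (B, K, 1, H, W)
        return (w.view(*w.shape, 1, 1, 1) * curves).sum(dim=1)

out = IntensityRectifier()(torch.rand(2, 1, 32, 100))  # remapped grayscale crops
```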

NeurIPS 2021 · Conference Paper

STEP: Out-of-Distribution Detection in the Presence of Limited In-Distribution Labeled Data

  • Zhi Zhou
  • Lan-Zhe Guo
  • Zhanzhan Cheng
  • Yu-Feng Li
  • Shiliang Pu

Existing semi-supervised learning (SSL) studies typically assume that unlabeled and test data are drawn from the same distribution as the labeled data. However, in many real-world applications, it is desirable to have SSL algorithms that not only classify samples drawn from the same distribution as the labeled data but also detect out-of-distribution (OOD) samples drawn from an unknown distribution. In this paper, we study a setting called semi-supervised OOD detection. Compared with previous OOD detection settings, it poses two main challenges: i) labeled and in-distribution data are scarce; ii) OOD samples may be unseen during training. Efforts in this direction remain limited. We present STEP, an approach that significantly improves OOD detection performance by introducing a new technique, Structure-Keep Unzipping. It learns a new representation space in which OOD samples can be separated well, and an efficient optimization algorithm is derived to solve the objective. Comprehensive experiments across various OOD detection benchmarks clearly show that our STEP approach outperforms other methods by a large margin and achieves remarkable detection performance on several benchmarks.
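Structure-Keep Unzipping itself is specific to the paper; below is only a generic illustration of the surrounding setup, assuming embeddings already exist: score each unlabeled sample by its distance to the few labeled in-distribution anchors and flag far-away points as OOD candidates.

```python
import numpy as np

def ood_scores(unlabeled, labeled_id):
    # unlabeled: (M, D); labeled_id: (n, D) with small n. Higher score = more OOD.
    d = np.linalg.norm(unlabeled[:, None] - labeled_id[None], axis=-1)  # (M, n)
    return d.min(axis=1)  # distance to the nearest labeled in-distribution sample

scores = ood_scores(np.random.randn(1000, 32), np.random.randn(20, 32))
flagged = scores > np.quantile(scores, 0.95)  # illustrative rejection threshold
```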

IJCAI 2021 · Conference Paper

Towards Robust Model Reuse in the Presence of Latent Domains

  • Jie-Jing Shao
  • Zhanzhan Cheng
  • Yu-Feng Li
  • Shiliang Pu

Model reuse aims to adapt well pre-trained models to a new target task without access to the raw data. It attracts much attention since it reduces the resources required for learning. Previous model reuse studies typically operate in a single-domain scenario, i.e., the target samples arise from one single domain. In practice, however, the target samples often arise from multiple latent or unknown domains; e.g., images of cars may arise from latent domains such as photo, line drawing, cartoon, etc. Single-domain methods may no longer be feasible for multiple latent domains and may sometimes even lead to performance degradation. To address this issue, in this paper we propose the MRL (Model Reuse for multiple Latent domains) method. Both domain characteristics and pre-trained models are considered in exploring the instances of the target task. Theoretically, these considerations are packed into a bi-level optimization framework with a reliable generalization guarantee. Moreover, through an ensemble of multiple models, model robustness is improved with a theoretical guarantee. Empirical results on diverse real-world datasets clearly validate the effectiveness of the proposed algorithms.
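A generic sketch of the ensembling intuition (not the paper's bi-level algorithm): softly assign each target sample to latent domains and combine the pre-trained models' predictions with those weights. The domain centers and toy models are stand-in assumptions.

```python
import numpy as np

def ensemble_predict(x, domain_centers, models):
    # x: (D,) target sample; domain_centers: (K, D); models: K callables -> probs.
    d = np.linalg.norm(domain_centers - x, axis=1)
    w = np.exp(-d) / np.exp(-d).sum()                 # soft latent-domain assignment
    probs = np.stack([m(x) for m in models])          # (K, C) per-model predictions
    return (w[:, None] * probs).sum(axis=0)           # domain-weighted ensemble

models = [lambda x, i=i: np.eye(3)[i % 3] for i in range(4)]  # toy fixed predictors
pred = ensemble_predict(np.ones(8), np.random.randn(4, 8), models)
```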

AAAI 2020 · Conference Paper

Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

  • Liang Qiao
  • Sanli Tang
  • Zhanzhan Cheng
  • Yunlu Xu
  • Yi Niu
  • Shiliang Pu
  • Fei Wu

Many approaches have recently been proposed to detect irregular scene text and have achieved promising results. However, their localization results may not serve the subsequent text recognition part well, mainly for two reasons: 1) recognizing arbitrary-shaped text is still a challenging task, and 2) prevalent non-trainable pipeline strategies between text detection and text recognition lead to suboptimal performance. To handle this incompatibility problem, in this paper we propose an end-to-end trainable text spotting approach named Text Perceptron. Concretely, Text Perceptron first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies without extra parameters. It unites text detection and the subsequent recognition part into a single framework and helps the whole network achieve global optimization. Experiments show that our method achieves competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and ICDAR 2015, and clearly outperforms existing methods on the irregular text benchmarks SCUT-CTW1500 and Total-Text.
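The paper defines STM precisely; below is only a minimal stand-in for the general idea of differentiably resampling a detected (possibly skewed) text region into a regular shape with a sampling grid, so gradients flow from recognition back to detection. The corner points are assumed inputs.

```python
import torch
import torch.nn.functional as F

def rectify_quad(feat, corners, out_h=8, out_w=32):
    # feat: (1, C, H, W); corners: (4, 2) normalized to [-1, 1], ordered
    # top-left, top-right, bottom-right, bottom-left.
    tl, tr, br, bl = corners
    u = torch.linspace(0, 1, out_w)[None, :, None]    # (1, out_w, 1)
    v = torch.linspace(0, 1, out_h)[:, None, None]    # (out_h, 1, 1)
    top = (1 - u) * tl + u * tr                       # interpolate along the edges
    bottom = (1 - u) * bl + u * br
    grid = ((1 - v) * top + v * bottom).unsqueeze(0)  # (1, out_h, out_w, 2)
    return F.grid_sample(feat, grid, align_corners=True)  # regularized patch

patch = rectify_quad(torch.randn(1, 64, 40, 40), torch.tensor(
    [[-0.8, -0.5], [0.9, -0.7], [0.8, 0.6], [-0.7, 0.4]]))
```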

AAAI 2019 · Conference Paper

Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection

  • Yunlu Xu
  • Chengwei Zhang
  • Zhanzhan Cheng
  • Jianwen Xie
  • Yi Niu
  • Shiliang Pu
  • Fei Wu

This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only video-level label supervision and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels via an attention mechanism that learns class-variable attention weights, which helps relieve noise from the background or other actions. Secondly, we build temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. To generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM), fused with the attention weights. Experiments on the THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms state-of-the-art weakly-supervised methods and performs on par with fully-supervised counterparts.
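A minimal sketch of 1D Grad-CAM over temporal features, the ingredient ST-GradCAM builds on (the segregation and attention fusion are specific to the paper). The tiny classifier head and threshold are illustrative stand-ins.

```python
import torch
import torch.nn as nn

feats = torch.randn(1, 64, 100, requires_grad=True)   # (B, C, T) clip features
head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 20))

logits = head(feats)                  # video-level class scores
logits[0, 3].backward()               # gradient of one chosen action class
weights = feats.grad.mean(dim=2)      # (B, C) channel importance per class
cam = torch.relu((weights.unsqueeze(2) * feats).sum(dim=1))  # (B, T) temporal CAM
proposal = (cam > 0.5 * cam.max()).nonzero()  # illustrative thresholded time steps
```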