Arrow Research

Author name cluster

Cheng-Lin Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
1 author row

Possible papers (23)

AAAI Conference 2025 Conference Paper

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

  • Yi Chen
  • Jian Xu
  • Xu-Yao Zhang
  • Wen-Zhuo Liu
  • Yang-Yang Liu
  • Cheng-Lin Liu

With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most current large multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text for downstream tasks. Therefore, the number of visual tokens directly affects the training and inference speed of the model. There has been significant work on token pruning for vision transformers, but for large multimodal models, relying only on visual information for token pruning or compression may lead to significant loss of important information. On the other hand, the textual input in the form of a question may contain valuable information that can aid in answering the question, providing additional knowledge to the model. To address the potential oversimplification and excessive pruning that can occur with most purely visual token pruning methods, we propose a text-information-guided dynamic visual token recovery mechanism that does not require training. This mechanism leverages the similarity between the question text and visual tokens to recover visually meaningful tokens with important text information while merging other, less important tokens, achieving efficient computation for large multimodal models. Experimental results demonstrate that our proposed method achieves performance comparable to the original approach while compressing the visual tokens to an average of 10% of the original quantity.
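The recovery mechanism described above boils down to ranking visual tokens by their similarity to the question text, keeping the top-scoring ones, and merging the rest. A minimal sketch of that idea (the function name, tensor shapes, and merge rule are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def recover_and_merge(visual_tokens, text_tokens, keep_ratio=0.1):
    """Keep the visual tokens most similar to the question text and merge the
    rest into a single averaged token (illustrative only)."""
    # Cosine similarity between every visual token and every text token.
    v = visual_tokens / np.linalg.norm(visual_tokens, axis=1, keepdims=True)
    t = text_tokens / np.linalg.norm(text_tokens, axis=1, keepdims=True)
    sim = v @ t.T                       # (num_visual, num_text)
    score = sim.max(axis=1)             # text relevance of each visual token

    k = max(1, int(keep_ratio * len(visual_tokens)))
    keep_idx = np.argsort(score)[-k:]   # most text-relevant tokens
    drop_idx = np.setdiff1d(np.arange(len(visual_tokens)), keep_idx)

    kept = visual_tokens[keep_idx]
    merged = visual_tokens[drop_idx].mean(axis=0, keepdims=True)
    return np.concatenate([kept, merged], axis=0)

# Toy usage: 576 visual tokens and 12 question-text tokens of dimension 64.
out = recover_and_merge(rng.standard_normal((576, 64)), rng.standard_normal((12, 64)))
print(out.shape)  # (58, 64): ~10% of the visual tokens plus one merged token
```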

NeurIPS Conference 2025 Conference Paper

SolidGeo: Measuring Multimodal Spatial Math Reasoning in Solid Geometry

  • Peijie Wang
  • Chao Yang
  • Zhong-Zhi Li
  • Fei Yin
  • Dekang Ran
  • Mi Tian
  • Zhilong Ji
  • Jinfeng Bai
  • Cheng-Lin Liu

Geometry is a fundamental branch of mathematics and plays a crucial role in evaluating the reasoning capabilities of multimodal large language models (MLLMs). However, existing multimodal mathematics benchmarks mainly focus on plane geometry and largely ignore solid geometry, which requires spatial reasoning and is more challenging than plane geometry. To address this critical gap, we introduce SolidGeo, the first large-scale benchmark specifically designed to evaluate the performance of MLLMs on mathematical reasoning tasks in solid geometry. SolidGeo consists of 3,113 real-world K–12 and competition-level problems, each paired with visual context and annotated with difficulty levels and fine-grained solid geometry categories. Our benchmark covers a wide range of 3D reasoning subjects such as projection, unfolding, spatial measurement, and spatial vectors, offering a rigorous testbed for assessing solid geometry. Through extensive experiments, we observe that MLLMs encounter substantial challenges in solid geometry math tasks, with a considerable performance gap relative to human capabilities on SolidGeo. Moreover, we analyze the performance, inference efficiency, and error patterns of various models, offering insights into the solid geometric mathematical reasoning capabilities of MLLMs. We hope SolidGeo serves as a catalyst for advancing MLLMs toward deeper geometric reasoning and spatial intelligence. The dataset is released at https://huggingface.co/datasets/HarryYancy/SolidGeo/
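Since the abstract points to a Hugging Face dataset, one quick way to inspect it would be the `datasets` library; this assumes the default configuration loads directly, and the split and column names printed below come from the release itself rather than anything documented here:

```python
from datasets import load_dataset  # pip install datasets

# Repository id taken from the abstract; splits and fields are only printed,
# not assumed.
solidgeo = load_dataset("HarryYancy/SolidGeo")
print(solidgeo)                                  # available splits and sizes
first_split = next(iter(solidgeo.values()))
print(first_split.column_names)                  # fields defined by the release
print(first_split[0])                            # one problem record
```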

NeurIPS Conference 2024 Conference Paper

Happy: A Debiased Learning Framework for Continual Generalized Category Discovery

  • Shijie Ma
  • Fei Zhu
  • Zhun Zhong
  • Wenzhuo Liu
  • Xu-Yao Zhang
  • Cheng-Lin Liu

Constantly discovering novel concepts is crucial in evolving environments. This paper explores the underexplored task of Continual Generalized Category Discovery (C-GCD), which aims to incrementally discover new classes from unlabeled data while maintaining the ability to recognize previously learned classes. Although several settings have been proposed to study the C-GCD task, they have limitations that do not reflect real-world scenarios. We thus study a more practical C-GCD setting, which includes more new classes to be discovered over a longer period, without storing samples of past classes. In C-GCD, the model is initially trained on labeled data of known classes, followed by multiple incremental stages where the model is fed with unlabeled data containing both old and new classes. The core challenge involves two conflicting objectives: discovering new classes and preventing forgetting of old ones. We delve into these conflicts and identify that models are susceptible to prediction bias and hardness bias. To address these issues, we introduce a debiased learning framework, namely Happy, characterized by hardness-aware prototype sampling and soft entropy regularization. For the prediction bias, we first introduce clustering-guided initialization to provide robust features. In addition, we propose soft entropy regularization to assign appropriate probabilities to new classes, which can significantly enhance the clustering performance of new classes. For the hardness bias, we present hardness-aware prototype sampling, which can effectively reduce the forgetting issue for previously seen classes, especially for difficult classes. Experimental results demonstrate that our method proficiently manages the conflicts of C-GCD and achieves remarkable performance across various datasets, e.g., 7.5% overall gains on ImageNet-100. Our code is publicly available at https://github.com/mashijie1028/Happy-CGCD.
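One common reading of soft entropy regularization is a penalty on the batch-averaged prediction that keeps new classes from being starved of probability mass. A rough sketch under that assumption (not necessarily the paper's exact loss):

```python
import numpy as np

def soft_entropy_reg(probs, eps=1e-8):
    """Negative entropy of the batch-mean prediction: minimizing this value
    pushes the average class assignment toward a flatter distribution, so new
    classes receive non-trivial probability mass."""
    mean_p = probs.mean(axis=0)                  # marginal over the batch
    return np.sum(mean_p * np.log(mean_p + eps))

# Toy usage: 32 unlabeled samples, 10 old + 5 new classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(15), size=32)
print(soft_entropy_reg(probs))  # added to the clustering loss with a small weight
```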

NeurIPS Conference 2024 Conference Paper

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

  • Wenzhuo Liu
  • Fei Zhu
  • Shijie Ma
  • Cheng-Lin Liu

Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem has been overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, a uniform input size conflicts with real-world scenarios where images naturally vary in resolution. Modifying the preset resolution of a model may severely degrade its performance. In this work, we propose to enhance the model's adaptability to resolution variation by optimizing the patch embedding. The proposed method, called Multi-Scale Patch Embedding (MSPE), substitutes the standard patch embedding with multiple variable-sized patch kernels and selects the best parameters for different resolutions, eliminating the need to resize the original image. Our method does not require high-cost training or modifications to other parts of the model, making it easy to apply to most ViT models. Experiments on image classification, segmentation, and detection tasks demonstrate the effectiveness of MSPE, yielding superior performance on low-resolution inputs and performing comparably to existing methods on high-resolution inputs.
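The selection step can be pictured as picking, from a bank of patch-kernel sizes, the one whose resulting token grid best matches the grid the ViT was trained on. The candidate sizes and selection rule below are illustrative assumptions, not the paper's learned parameters:

```python
import numpy as np

def select_patch_size(resolution, candidates=(8, 12, 16, 24, 32), target_grid=14):
    """Pick, from a bank of patch-embedding kernel sizes, the one whose
    resulting token grid is closest to the grid the ViT was trained with."""
    grids = np.array([resolution // p for p in candidates])
    best = int(np.argmin(np.abs(grids - target_grid)))
    return candidates[best], int(grids[best])

for res in (112, 224, 448):
    patch, grid = select_patch_size(res)
    print(f"input {res}px -> patch kernel {patch}px, {grid}x{grid} tokens")
```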

IJCAI Conference 2023 Conference Paper

A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

  • Ming-Liang Zhang
  • Fei Yin
  • Cheng-Lin Liu

Geometry problem solving (GPS) is a high-level mathematical reasoning task requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation, and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate research on GPS, we build a new large-scale and finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotations and interpretable solution programs. Experiments on PGPS9K and the existing dataset Geometry3K validate the superiority of our method over state-of-the-art neural solvers. Our code, dataset, and appendix material are available at https://github.com/mingliangzhang2018/PGPS.

AAAI Conference 2023 Conference Paper

Social Relation Reasoning Based on Triangular Constraints

  • Yunfei Guo
  • Fei Yin
  • Wei Feng
  • Xudong Yan
  • Tao Xue
  • Shuqi Mei
  • Cheng-Lin Liu

Social networks essentially have a graph structure where persons act as nodes and the edges connecting nodes denote social relations. The prediction of social relations therefore relies on the context in graphs to model higher-order constraints among relations, which has not been sufficiently exploited by previous works. In this paper, we formulate the paradigm of higher-order constraints in social relations as triangular relational closed-loop structures, i.e., triangular constraints, and further introduce the triangular reasoning graph attention network (TRGAT). Our TRGAT employs the attention mechanism to aggregate features with triangular constraints in the graph, thereby exploiting higher-order context to reason about social relations iteratively. Besides, to acquire better feature representations of persons, we introduce node contrastive learning into relation reasoning. Experimental results show that our method outperforms existing approaches significantly, with higher accuracy and better consistency in generating social relation graphs.

IJCAI Conference 2022 Conference Paper

Plane Geometry Diagram Parsing

  • Ming-Liang Zhang
  • Fei Yin
  • Yi-Han Hao
  • Cheng-Lin Liu

Geometry diagram parsing plays a key role in geometry problem solving, wherein primitive extraction and relation parsing remain challenging due to the complex layout and between-primitive relationships. In this paper, we propose a powerful diagram parser based on deep learning and graph reasoning. Specifically, a modified instance segmentation method is proposed to extract geometric primitives, and a graph neural network (GNN) is leveraged to perform relation parsing and primitive classification, incorporating geometric features and prior knowledge. All the modules are integrated into an end-to-end model called PGDPNet to perform all the sub-tasks simultaneously. In addition, we build a new large-scale geometry diagram dataset named PGDP5K with primitive-level annotations. Experiments on PGDP5K and the existing dataset IMP-Geometry3K show that our model remarkably outperforms state-of-the-art methods on four sub-tasks. Our code, dataset, and appendix material are available at https://github.com/mingliangzhang2018/PGDP.

NeurIPS Conference 2021 Conference Paper

Class-Incremental Learning via Dual Augmentation

  • Fei Zhu
  • Zhen Cheng
  • Xu-Yao Zhang
  • Cheng-Lin Liu

Deep learning systems typically suffer from catastrophic forgetting of past knowledge when acquiring new skills continually. In this paper, we emphasize two dilemmas in class-incremental learning, representation bias and classifier bias, and present a simple and novel approach that employs explicit class augmentation (classAug) and implicit semantic augmentation (semanAug) to address the two biases, respectively. On the one hand, we propose to address the representation bias by learning transferable and diverse representations. Specifically, we investigate the feature representations in incremental learning based on spectral analysis and present a simple technique called classAug, which lets the model see more classes during training in order to learn representations transferable across classes. On the other hand, to overcome the classifier bias, semanAug implicitly generates an infinite number of instances of old classes in the deep feature space, which poses tighter constraints to maintain the decision boundary of previously learned classes. Without storing any old samples, our method can perform comparably with representative data-replay-based approaches.
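The classAug idea of letting the model see more classes can be illustrated with a simple between-class mixing scheme that assigns each mixed pair an auxiliary label; the mixing rule here is a hedged simplification rather than the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def class_augment(x, y, num_classes, lam=0.5):
    """Fuse sample pairs from two different classes and give each ordered class
    pair its own auxiliary label, enlarging the label space the model sees
    during training (an illustrative simplification of classAug)."""
    idx = rng.permutation(len(x))
    keep = y != y[idx]                               # only mix across classes
    x_mix = lam * x[keep] + (1.0 - lam) * x[idx][keep]
    # Map the ordered class pair (a, b) to a fresh label beyond the originals.
    y_mix = num_classes + y[keep] * num_classes + y[idx][keep]
    return np.concatenate([x, x_mix]), np.concatenate([y, y_mix])

x = rng.standard_normal((8, 3072))                   # e.g. flattened 32x32x3 images
y = rng.integers(0, 10, size=8)
x_aug, y_aug = class_augment(x, y, num_classes=10)
print(x_aug.shape, np.unique(y_aug))
```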

AAAI Conference 2021 Conference Paper

Graph-to-Graph: Towards Accurate and Interpretable Online Handwritten Mathematical Expression Recognition

  • Jin-Wen Wu
  • Fei Yin
  • Yan-Ming Zhang
  • Xu-Yao Zhang
  • Cheng-Lin Liu

Recent handwritten mathematical expression recognition (HMER) approaches treat the problem as an image-to-markup generation task where the handwritten formula is translated into a sequence (e.g., LaTeX). The encoder-decoder framework is widely used to solve this image-to-sequence problem. However, (i) for structured mathematical formulas, the hierarchical structure in neither the formula nor the markup has been explored adequately. In addition, (ii) existing image-to-markup methods cannot explicitly segment the mathematical symbols in the formula corresponding to each target markup token. In this paper, we address the above issues by formulating HMER as a graph-to-graph (G2G) learning problem. Graphs are more flexible and general for structure representation and learning compared with images or sequences. At the core of our method lies the embedding of the input formula and output markup into graphs over primitives, with Graph Neural Networks (GNNs) to explore the structural information, and a novel sub-graph attention mechanism to match primitives in the input and output graphs. We conduct extensive experiments on the CROHME datasets to demonstrate the benefits of the proposed G2G model. Our method yields significant improvements over previous state-of-the-art image-to-markup systems. Moreover, it explicitly resolves the symbol segmentation problem while still being trained end-to-end, making the whole system much more accurate and interpretable.

AAAI Conference 2021 Conference Paper

Proxy Graph Matching with Proximal Matching Networks

  • Hao-Ru Tan
  • Chuang Wang
  • Si-Tong Wu
  • Tie-Qiang Wang
  • Xu-Yao Zhang
  • Cheng-Lin Liu

Estimating feature point correspondence is a common technique in computer vision. A line of recent data-driven approaches utilizing graph neural networks has improved matching accuracy by a large margin. However, these learning-based methods require a lot of labeled training data, which are expensive to collect. Moreover, we find most methods are sensitive to global transforms, for example, a random rotation. On the contrary, classical geometric approaches are immune to rotational transformation, though their performance is generally inferior. To tackle these issues, we propose a new learning-based matching framework, which is designed to be rotationally invariant. The model takes only geometric information as input. It consists of three parts: a graph neural network to generate high-level local features, an attention-based module to normalize the rotational transform, and a global feature matching module based on proximal optimization. To justify our approach, we provide a convergence guarantee for the proximal method for graph matching. The overall performance is validated by numerical experiments. In particular, our approach is trained on synthetic random graphs and then applied to several real-world datasets. The experimental results demonstrate that our method is robust to rotational transforms and achieves strong matching accuracy.
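As a generic stand-in for the proximal matching module (not the paper's algorithm), the sketch below builds a rotation-invariant affinity from intra-set pairwise distances and turns it into a soft assignment with Sinkhorn-style row/column normalization:

```python
import numpy as np

def sinkhorn_match(affinity, n_iter=50, tau=0.1):
    """Turn a pairwise affinity matrix into a (near) doubly-stochastic soft
    assignment by alternating row/column normalization (Sinkhorn iterations)."""
    m = np.exp(affinity / tau)
    for _ in range(n_iter):
        m /= m.sum(axis=1, keepdims=True)   # normalize rows
        m /= m.sum(axis=0, keepdims=True)   # normalize columns
    return m

# Toy usage: match 5 points against a rotated copy using only intra-set
# pairwise distances, so the score is unaffected by the global rotation.
rng = np.random.default_rng(0)
pts = rng.standard_normal((5, 2))
theta = np.pi / 3
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pts_rot = pts @ rot.T
d1 = np.linalg.norm(pts[:, None] - pts[None], axis=-1)        # intra-set distances
d2 = np.linalg.norm(pts_rot[:, None] - pts_rot[None], axis=-1)
affinity = -np.abs(d1.sum(axis=1)[:, None] - d2.sum(axis=1)[None])
print(np.argmax(sinkhorn_match(affinity), axis=1))            # expected: 0..4
```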

AAAI Conference 2019 Conference Paper

Data-Distortion Guided Self-Distillation for Deep Neural Networks

  • Ting-Bing Xu
  • Cheng-Lin Liu

Knowledge distillation is an effective technique that has been widely used for transferring knowledge from one network to another. Despite its effective improvement of network performance, the dependence on accompanying assistive models complicates the training process of a single network, requiring large memory and time cost. In this paper, we design a more elegant self-distillation mechanism to transfer knowledge between different distorted versions of the same training data without relying on accompanying models. Specifically, the potential capacity of a single network is excavated by learning consistent global feature distributions and posterior distributions (class probabilities) across these distorted versions of the data. Extensive experiments on multiple datasets (i.e., CIFAR-10/100 and ImageNet) demonstrate that the proposed method can effectively improve the generalization performance of various network architectures (such as AlexNet, ResNet, Wide ResNet, and DenseNet), outperforming existing distillation methods with little extra training effort.
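The posterior-consistency part of this scheme can be sketched as a symmetric KL penalty between the class probabilities predicted for two distorted views of the same batch; the feature-distribution term and the exact divergence used in the paper are omitted here:

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """Mean KL divergence between row-wise probability distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1).mean()

def distortion_consistency_loss(probs_a, probs_b):
    """Symmetric KL between the posteriors the network predicts for two
    differently distorted versions of the same images; minimizing it provides
    the mutual-distillation signal without any teacher model."""
    return 0.5 * (kl(probs_a, probs_b) + kl(probs_b, probs_a))

# Toy usage: posteriors over 10 classes for two augmented views of 8 images.
rng = np.random.default_rng(0)
a = rng.dirichlet(np.ones(10), size=8)
b = rng.dirichlet(np.ones(10), size=8)
print(distortion_consistency_loss(a, b))
```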

AAAI Conference 2018 Conference Paper

Fully Convolutional Network Based Skeletonization for Handwritten Chinese Characters

  • Tie-Qiang Wang
  • Cheng-Lin Liu

Structural analysis of handwritten characters relies heavily on robust skeletonization of strokes, which has not been solved well by previous thinning methods. This paper presents an effective fully convolutional network (FCN) to extract stroke skeletons for handwritten Chinese characters. We combine the holistically-nested architecture with regressive dense upsampling convolution (rDUC) and the recently proposed hybrid dilated convolution (HDC) to generate pixel-level predictions for skeleton extraction. We evaluate our method on character images synthesized from the online handwritten dataset CASIA-OLHWDB and achieve higher accuracy of skeleton pixel detection than traditional thinning algorithms. We also conduct skeleton-based character recognition experiments using convolutional neural network (CNN) classifiers on offline/online handwritten datasets, obtaining accuracies comparable to recognition on the original character images. This implies that the skeletonization loses little shape information.

IJCAI Conference 2018 Conference Paper

Multi-task Layout Analysis for Historical Handwritten Documents Using Fully Convolutional Networks

  • Yue Xu
  • Fei Yin
  • Zhaoxiang Zhang
  • Cheng-Lin Liu

Layout analysis is a fundamental process in document image analysis and understanding. It consists of several sub-processes such as page segmentation, text line segmentation, baseline detection, and so on. In this work, we propose a multi-task layout analysis method that uses a single FCN model to solve the above three problems simultaneously. The FCN is trained to segment the document image into different regions and detect the center line of each text line by classifying pixels into different categories. By supervised learning on document images with pixel-wise labels, the FCN can extract discriminative features and perform pixel-wise classification accurately. After pixel-wise classification, post-processing steps are taken to reduce noise, correct wrong segmentations, and identify overlapping regions. Experimental results on the public dataset DIVA-HisDB, containing challenging medieval manuscripts, demonstrate the effectiveness and superiority of the proposed method.

IJCAI Conference 2017 Conference Paper

Diverse Neuron Type Selection for Convolutional Neural Networks

  • Guibo Zhu
  • Zhaoxiang Zhang
  • Xu-Yao Zhang
  • Cheng-Lin Liu

The activation function of neurons is a prominent element in deep learning architectures for obtaining high performance. Inspired by neuroscience findings, we introduce and define two types of neurons with different activation functions for artificial neural networks: excitatory and inhibitory neurons, which can be adaptively selected by self-learning. Based on this definition of neurons, we not only unify the mainstream activation functions but also discuss the complementarity among these types of neurons. In addition, through the cooperation of excitatory and inhibitory neurons, we present a compositional activation function that leads to new state-of-the-art performance compared to rectified linear units. Finally, we hope that our framework not only provides a basic unified view of existing activation neurons to guide future design, but also contributes neurobiological explanations that can be treated as a window to bridge the gap between biology and computer science.
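A toy rendering of the excitatory/inhibitory idea is an activation that mixes a non-negative (excitatory) branch and a non-positive (inhibitory) branch through a gate that would be learned per unit; the specific form below is an illustration, not the paper's compositional function:

```python
import numpy as np

def excitatory(x):
    return np.maximum(x, 0.0)            # ReLU-like, responds to positive input

def inhibitory(x):
    return np.minimum(x, 0.0)            # passes only the negative response

def compositional_activation(x, gate):
    """Per-unit convex combination of the two neuron types; `gate` in [0, 1]
    would be learned per channel during training (gate=1 recovers ReLU)."""
    return gate * excitatory(x) + (1.0 - gate) * inhibitory(x)

x = np.linspace(-2.0, 2.0, 5)
print(compositional_activation(x, gate=0.7))
```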

AAAI Conference 2016 Conference Paper

Large-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver

  • Yan-Ming Zhang
  • Xu-Yao Zhang
  • Xiao-Tong Yuan
  • Cheng-Lin Liu

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning approaches. Typically, it predicts the labels of unlabeled data by minimizing a quadratic objective induced by the graph, which is unfortunately a procedure of polynomial complexity in the sample size n. In this paper, we address this scalability issue by proposing a method that approximately solves the quadratic objective in nearly linear time. The method consists of two steps: it first approximates the graph by a minimum spanning tree, and then solves the tree-induced quadratic objective function in O(n) time, which is the main contribution of this work. Extensive experiments show a significant scalability improvement over existing scalable semi-supervised learning methods.
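The two-step pipeline (approximate the graph with a spanning tree, then solve the tree-induced quadratic objective) can be sketched with SciPy; a generic sparse solve stands in here for the paper's O(n) tree elimination, and the toy graph and labels are made up:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse.linalg import spsolve

def tree_ssl_predict(W, y, labeled, lam=10.0):
    """Graph SSL on a spanning-tree approximation of the graph W.
    Solves min_f f'Lf + lam * sum_{i labeled} (f_i - y_i)^2 on the tree."""
    # minimum_spanning_tree keeps the smallest edge weights, which suits a
    # distance graph; for affinity weights one would negate or invert first.
    T = minimum_spanning_tree(sp.csr_matrix(W))
    A = T + T.T                                         # symmetric tree adjacency
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A  # tree Laplacian
    fit = sp.diags(labeled.astype(float))               # fit term on labeled nodes
    return spsolve((L + lam * fit).tocsc(), lam * labeled * y)

# Toy chain graph with 6 nodes; the two endpoints carry labels -1 and +1.
W = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
y = np.array([-1.0, 0, 0, 0, 0, 1.0])
labeled = np.array([1, 0, 0, 0, 0, 1])
print(tree_ssl_predict(W, y, labeled))                  # interpolates between the labels
```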

IS Journal 2016 Journal Article

Pattern Recognition, Part 1 [Guest editors' introduction]

  • Cheng-Lin Liu
  • Brian Lovell
  • Dacheng Tao
  • Massimo Tistarelli

This special issue reports the advances in pattern recognition theory and applications, particularly, the basic issues of pattern classification and image analysis and their applications. The selected articles address image feature representation, contextual pattern analysis, compact low-rank sparse representation for abnormal event detection in video, customer churn prediction using ensemble learning, online text-independent writer identification using deep convolutional neural network, image depth ordering reasoning, and brain MR image tumor segmentation, respectively.

IS Journal 2016 Journal Article

Pattern Recognition, Part 2

  • Cheng-Lin Liu
  • Brian Lovell
  • Dacheng Tao
  • Massimo Tistarelli

This second part of the special issue on pattern recognition reports the advances in pattern recognition for visual data. The selected articles address visual categorization by cross-domain dictionary learning, facial expression recognition, face sketch-photo matching, heartbeat rate measurement from facial video, driver gaze estimation, nighttime vehicle detection, and overlaid arrow detection in biomedical images, respectively.

TIST Journal 2015 Journal Article

A Sparse Projection and Low-Rank Recovery Framework for Handwriting Representation and Salient Stroke Feature Extraction

  • Zhao Zhang
  • Cheng-Lin Liu
  • Ming-Bo Zhao

In this article, we consider the problem of simultaneous low-rank recovery and sparse projection. More specifically, a new Robust Principal Component Analysis (RPCA)-based framework called Sparse Projection and Low-Rank Recovery (SPLRR) is proposed for handwriting representation and salient stroke feature extraction. In addition to recovering a low-rank component encoding principal features and identifying errors or missing values in a given data matrix, as in RPCA, SPLRR also learns a similarity-preserving sparse projection for extracting salient stroke features and embedding new inputs for classification. These properties make SPLRR applicable to handwriting recognition and stroke correction and enable online computation. A cosine-similarity-style regularization term is incorporated into the SPLRR formulation for encoding the similarities of local handwriting features. The sparse projection and low-rank recovery are calculated from a convex minimization problem that can be efficiently solved in polynomial time. In addition, a supervised extension of SPLRR is also elaborated. The effectiveness of SPLRR is examined through extensive handwritten digit repairing, stroke correction, and recognition experiments on benchmark problems. Compared with other related techniques, SPLRR delivers strong generalization capability and state-of-the-art performance for handwriting representation and recognition.

IJCAI Conference 2011 Conference Paper

Pattern Field Classification with Style Normalized Transformation

  • Xu-Yao Zhang
  • Kaizhu Huang
  • Cheng-Lin Liu

Field classification is an extension of the traditional classification framework that breaks the i.i.d. assumption. In field classification, patterns occur as groups (fields) of homogeneous styles. By utilizing style consistency, classifying groups of patterns is often more accurate than classifying single patterns. In this paper, we extend Bayes decision theory and develop the Field Bayesian Model (FBM) to deal with field classification. Specifically, we propose to learn a Style Normalized Transformation (SNT) for each field. Via the SNTs, the data of different fields are transformed into a uniform style space (i.i.d. space). The proposed model is a general and systematic framework, under which many probabilistic models can be easily extended for field classification. To transfer the model to unseen styles, we propose a transductive model called the Transfer Bayesian Rule (TBR) based on self-training. We conducted extensive experiments on face, speech, and a large-scale handwriting dataset, and obtained significant error rate reductions compared to state-of-the-art methods.
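A heavily simplified, moment-matching stand-in for the Style Normalized Transformation is to map each field's statistics onto the global training statistics; the learned SNT in the paper is estimated differently, so treat this only as intuition:

```python
import numpy as np

def style_normalize(field_x, global_mean, global_std, eps=1e-8):
    """Map one field (a group of patterns written in a consistent style) toward
    a common style space by matching its first- and second-order statistics to
    the global training statistics. A moment-matching simplification, not the
    paper's SNT estimation procedure."""
    f_mean, f_std = field_x.mean(axis=0), field_x.std(axis=0) + eps
    return (field_x - f_mean) / f_std * global_std + global_mean

# Toy usage: a writer-specific field shifted and rescaled relative to the corpus.
rng = np.random.default_rng(0)
corpus = rng.normal(0.0, 1.0, size=(1000, 16))
field = rng.normal(2.0, 3.0, size=(20, 16))
normalized = style_normalize(field, corpus.mean(axis=0), corpus.std(axis=0))
print(normalized.mean().round(2), normalized.std().round(2))  # ~0.0 and ~1.0
```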

AAAI Conference 2010 Conference Paper

Gaussian Process Latent Random Field

  • Guoqiang Zhong
  • Wu-Jun Li
  • Dit-Yan Yeung
  • Xinwen Hou
  • Cheng-Lin Liu

The Gaussian process latent variable model (GPLVM) is an unsupervised probabilistic model for nonlinear dimensionality reduction. A supervised extension, called discriminative GPLVM (DGPLVM), incorporates supervisory information into GPLVM to enhance classification performance. However, its limitation of the latent space dimensionality to at most C − 1 (C is the number of classes) leads to unsatisfactory performance when the intrinsic dimensionality of the application is higher than C − 1. In this paper, we propose a novel supervised extension of GPLVM, called the Gaussian process latent random field (GPLRF), which enforces the latent variables to be a Gaussian Markov random field with respect to a graph constructed from the supervisory information. In GPLRF, the dimensionality of the latent space is no longer restricted to at most C − 1. This makes GPLRF much more flexible than DGPLVM in applications. Experiments conducted on both synthetic and real-world data sets demonstrate that GPLRF performs comparably with DGPLVM and other state-of-the-art methods on data sets with intrinsic dimensionality at most C − 1, and dramatically outperforms DGPLVM on data sets whose intrinsic dimensionality exceeds C − 1.
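In rough notation, the construction amounts to a GPLVM likelihood combined with a graph-Laplacian (Gaussian Markov random field) prior on the latent coordinates; the exact weighting and kernel choices in the paper may differ from this sketch:

```latex
% Hedged sketch of the GPLRF objective: GPLVM likelihood plus a GMRF prior
% over the N x q latent matrix X, where L is the Laplacian of the graph built
% from the supervisory information and K_X is the kernel matrix on X.
\begin{aligned}
p(X) &\propto \exp\!\Big(-\tfrac{\beta}{2}\,\operatorname{tr}\!\big(X^{\top} L X\big)\Big),\\
\max_{X}\;\; & \log p(Y \mid X) + \log p(X)
 = -\tfrac{D}{2}\log\lvert K_X\rvert
   - \tfrac{1}{2}\operatorname{tr}\!\big(K_X^{-1} Y Y^{\top}\big)
   - \tfrac{\beta}{2}\operatorname{tr}\!\big(X^{\top} L X\big) + \text{const}.
\end{aligned}
```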

AAAI Conference 2010 Conference Paper

Transductive Learning on Adaptive Graphs

  • Yan-Ming Zhang
  • Yu Zhang
  • Dit-Yan Yeung
  • Cheng-Lin Liu
  • Xinwen Hou

Graph-based semi-supervised learning methods are based on some smoothness assumption about the data. As a discrete approximation of the data manifold, the graph plays a crucial role in the success of such graph-based methods. In most existing methods, graph construction makes use of a predefined weighting function without utilizing label information even when it is available. In this work, by incorporating label information, we seek to enhance the performance of graph-based semi-supervised learning by learning the graph and the label inference simultaneously. In particular, we consider a particular setting of semi-supervised learning called transductive learning. Using the LogDet divergence to define the objective function, we propose an iterative algorithm to solve the optimization problem, which has a closed-form solution in each step. We perform experiments on both synthetic and real data to demonstrate improvements in both the learned graph and classification accuracy.
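A schematic reading of the joint objective is a graph-smoothness term over the label matrix plus a LogDet (Burg) divergence that keeps the learned graph close to the initial one; the exact variables and constraints in the paper may differ from this sketch:

```latex
% Schematic joint objective: smoothness of the label matrix F on the learned
% graph, plus a LogDet (Burg) divergence tying the learned (regularized)
% Laplacian L to an initial Laplacian L_0; lambda balances the two terms.
\min_{F,\; L \succ 0}\;\;
  \operatorname{tr}\!\big(F^{\top} L F\big) \;+\; \lambda\, D_{\mathrm{ld}}(L, L_0),
\qquad
D_{\mathrm{ld}}(A, B) \;=\; \operatorname{tr}\!\big(A B^{-1}\big) \;-\; \log\det\!\big(A B^{-1}\big) \;-\; n .
```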