Arrow Research search

Author name cluster

Ping Guo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

AAAI Conference 2026 Conference Paper

CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models

  • Ping Guo
  • Qingfu Zhang
  • Xi Lin

The discovery of symbolic solutions—mathematical expressions, logical rules, and algorithmic structures—is fundamental to advancing scientific and engineering progress. However, traditional methods often struggle with search efficiency and fail to integrate knowledge effectively. While recent large language model-based (LLM-based) approaches have demonstrated improvements in search efficiency, they lack the ability to continually refine and expand upon discovered solutions and their underlying knowledge, limiting their potential for open-ended innovation. To address these limitations, we introduce CoEvo, a novel framework that leverages large language models within an evolutionary search methodology to continually generate and refine symbolic solutions. CoEvo integrates a dynamic knowledge library, enabling open-ended innovation of solutions through effective knowledge management. Additionally, CoEvo leverages multiple representations of solutions—including natural language, mathematical expressions, and code—to further enhance search efficiency. By combining the reasoning capabilities of LLMs with the exploratory power of evolutionary algorithms, CoEvo significantly improves the efficiency and scope of symbolic discovery. Our experimental results demonstrate that this method not only enhances the efficiency of searching for symbolic solutions but also supports the ongoing discovery process, akin to human scientific endeavors. This study represents a first effort in conceptualizing the search for symbolic solutions as a lifelong, iterative process, marking a significant step towards harnessing LLMs in the perpetual pursuit of scientific and engineering breakthroughs.
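The loop the abstract describes, LLM-driven variation, evolutionary selection, and a growing knowledge library, can be sketched generically. This is a minimal sketch under stated assumptions: `evolve`, `toy_propose`, and the `knowledge` list are illustrative stand-ins rather than CoEvo's actual API, and a random perturbation substitutes for the LLM proposal step so the sketch runs offline.

```python
import random

def evolve(fitness, propose, population, generations=200, seed=0):
    """Generic LLM-in-the-loop evolutionary search: keep the fittest
    half as parents, ask the proposal step for refined children, and
    log the running best into a knowledge library."""
    rng = random.Random(seed)
    knowledge = []  # stand-in for CoEvo's dynamic knowledge library
    for _ in range(generations):
        parents = sorted(population, key=fitness)[:max(2, len(population) // 2)]
        children = [propose(p, knowledge, rng) for p in parents]
        population = parents + children  # elitism: parents survive
        knowledge.append(min(population, key=fitness))  # retain best-so-far
    return min(population, key=fitness)

# Toy instance: search for an integer near 42. A random perturbation
# stands in here for the LLM's mutation/refinement step.
def toy_propose(candidate, knowledge, rng):
    return candidate + rng.randint(-5, 5)

best = evolve(lambda x: abs(x - 42), toy_propose, population=[0, 100])
```

Because parents always survive, the best fitness is monotone over generations, which is why even a blind proposal step converges on this toy problem.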

AAAI Conference 2026 Conference Paper

RSOD: Reliability-Guided Sonar Image Object Detection with Extremely Limited Labels

  • Chengzhou Li
  • Ping Guo
  • Guanchen Meng
  • Qi Jia
  • Jinyuan Liu
  • Zhu Liu
  • Xiaokang Liu
  • Yu Liu

Object detection in sonar images is a key technology in underwater detection systems. Compared to natural images, sonar images contain fewer texture details and are more susceptible to noise, making it difficult for non-experts to distinguish subtle differences between classes. As a result, non-experts cannot provide precise annotation data for sonar images. Therefore, designing effective object detection methods for sonar images with extremely limited labels is particularly important. To address this, we propose a teacher-student framework called RSOD, which aims to fully learn the characteristics of sonar images and develop a pseudo-label strategy suitable for these images to mitigate the impact of limited labels. First, RSOD calculates a reliability score by assessing the consistency of the teacher's predictions across different views. To leverage this score, we introduce an object mixed pseudo-label method to tackle the shortage of labeled data in sonar images. Finally, we optimize the performance of the student by implementing a reliability-guided adaptive constraint. By taking full advantage of unlabeled data, the student can perform well even in situations with extremely limited labels. Notably, on the UATD dataset, our method, using only 5% of labeled data, achieves results competitive with our baseline algorithm trained on 100% labeled data. We also collected a new dataset to provide more valuable data for research in the field of sonar.
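Cross-view consistency scoring of the kind the abstract mentions can be illustrated with box-IoU agreement: run the teacher on two augmented views and score how well each box from one view is matched in the other. This is a generic sketch of the idea, not RSOD's exact reliability formula.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def reliability(view_a_boxes, view_b_boxes):
    """Mean best-match IoU between the teacher's predictions on two
    augmented views: identical predictions score 1.0, disjoint ones 0.0."""
    if not view_a_boxes or not view_b_boxes:
        return 0.0
    return sum(max(iou(a, b) for b in view_b_boxes)
               for a in view_a_boxes) / len(view_a_boxes)
```

A pseudo-label whose reliability is high would then be trusted more when supervising the student.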

IROS Conference 2025 Conference Paper

ContextCache: Task-Aware Lifecycle Management for Memory-Efficient LLM Agent Deployment

  • Tao Liu
  • Ping Guo
  • Dong Feng
  • Peng Wang

LLM-based agents have demonstrated remarkable capabilities in multi-step reasoning and task execution across domains such as robotics and autonomous systems. However, deploying these agents on resource-constrained platforms presents a fundamental challenge: minimizing latency while optimizing memory usage. Existing caching techniques (KVCache, PrefixCache, PromptCache) improve inference speed by reusing cached context but overlook LLM dependency relationships in agent workflows, leading to excessive memory usage or redundant recomputation across LLM calls. To address this, we propose ContextCache, a task-aware lifecycle management framework that optimizes context fragment caching for multi-step LLM agents. ContextCache predicts the lifespan of each context fragment and dynamically allocates and releases GPU memory accordingly. We evaluate our approach on a newly constructed dataset, covering logistics coordination, assembly tasks, and health management. Experimental results demonstrate a 15% reduction in memory usage compared to state-of-the-art caching strategies, with no loss in inference efficiency, making our approach well-suited for real-world deployment in resource-constrained environments.
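The lifecycle idea in the abstract, predict how long a context fragment will be needed and release its memory as soon as that lifespan is exhausted, can be sketched with a tiny use-counted cache. The class and its fields are illustrative, not ContextCache's actual interface.

```python
class LifecycleCache:
    """Toy lifecycle-managed cache: each fragment carries a predicted
    number of remaining uses and is freed once that count reaches zero."""

    def __init__(self):
        self.store = {}  # key -> [value, remaining_uses]

    def put(self, key, value, predicted_uses):
        self.store[key] = [value, predicted_uses]

    def get(self, key):
        if key not in self.store:
            return None  # cache miss: caller must recompute the fragment
        value, remaining = self.store[key]
        if remaining <= 1:
            del self.store[key]  # lifespan exhausted: release memory eagerly
        else:
            self.store[key][1] = remaining - 1
        return value
```

Compared with size-triggered eviction (LRU and friends), releasing on predicted last use frees memory as early as the task graph allows, which is the 15% saving the abstract reports.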

NeurIPS Conference 2025 Conference Paper

Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining

  • Ping Guo
  • Yubing Ren
  • Binbin Liu
  • Fengze Liu
  • Haobin Lin
  • Yifan Zhang
  • Bingni Zhang
  • Taifeng Wang

Large language models (LLMs) have become integral to a wide range of applications worldwide, driving an unprecedented global demand for effective multilingual capabilities. Central to achieving robust multilingual performance is the strategic allocation of language proportions within training corpora. However, determining optimal language ratios is highly challenging due to intricate cross-lingual interactions and sensitivity to dataset scale. This paper introduces CLIMB (Cross-Lingual Interaction-aware Multilingual Balancing), a novel framework designed to systematically optimize multilingual data allocation. At its core, CLIMB introduces a cross-lingual interaction-aware language ratio, explicitly quantifying each language’s effective allocation by capturing inter-language dependencies. Leveraging this ratio, CLIMB proposes a principled two-step optimization procedure—first equalizing marginal benefits across languages, then maximizing the magnitude of the resulting language allocation vectors—significantly simplifying the inherently complex multilingual optimization problem. Extensive experiments confirm that CLIMB can accurately measure cross-lingual interactions across various multilingual settings. LLMs trained with CLIMB-derived proportions consistently achieve state-of-the-art multilingual performance, and even perform competitively with open-source LLMs trained on more tokens.

ICRA Conference 2025 Conference Paper

FitnessAgent: A Unified Agent Framework for Open-Set and Personalized Fitness Evaluation

  • Zhenhui Tang
  • Jiahao Li
  • Ping Guo
  • Bowen Tian
  • Qingjun Xing
  • Xuyang Xing
  • Peng Wang 0099

Robotic systems face challenges in performing open-set and personalized fitness evaluations, especially when adapting to new exercises and individual user needs. This paper introduces FitnessAgent, a unified agent framework designed to address these challenges. Unlike traditional systems that rely on pre-trained neural networks or fixed rule-based criteria, FitnessAgent can assess any exercise without prior training, adapting evaluation metrics based on expert knowledge and user-specific requirements. The system breaks down fitness evaluation tasks into combinations of metrics, each calculated using measurable operators such as angles, distances, and positions. By leveraging a set of primitive, exercise-agnostic operators, a large language model (LLM)-based planner dynamically selects and combines these operators for each task. The open-set capability of FitnessAgent is validated through experiments on both the widely-used Functional Movement Screen dataset and a newly collected isometric pose dataset. Results highlight the system's flexibility in handling new movements and its ability to adapt to personalized evaluation criteria without the need for code or algorithm modifications. FitnessAgent offers a scalable and personalized solution for fitness evaluation, making it well-suited for robotic applications that require adaptability to diverse user needs.

NeurIPS Conference 2025 Conference Paper

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

  • Zhixun Chen
  • Ping Guo
  • Wenhan Han
  • Yifan Zhang
  • Binbin Liu
  • Haobin Lin
  • Fengze Liu
  • Yan Zhao

Data quality is a critical driver of large language model performance, yet existing model-based selection methods focus almost exclusively on English, neglecting other languages that are essential in the training mix for multilingual LLMs. We introduce MuRating, a scalable framework that transfers high-quality English data-quality signals into a multilingual autorater, capable of handling 17 languages. MuRating aggregates multiple English autoraters via pairwise comparisons to learn unified document quality scores, then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs. Applied to web data, MuRating selects balanced subsets of English and multilingual content to pretrain LLaMA-architecture models of 1.2B and 7B parameters. Compared to strong baselines, including QuRater, FineWeb2-HQ, AskLLM, and DCLM, our approach increases average accuracy on both English benchmarks and multilingual evaluations. Extensive analyses further validate that pairwise training provides greater stability and robustness than pointwise scoring, underscoring the effectiveness of MuRating as a general multilingual data-selection framework.
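Aggregating pairwise quality judgments into scalar document scores, the step the abstract describes, is commonly done with a Bradley-Terry-style fit. The sketch below shows only that aggregation step under that assumption; it is not MuRating's actual training procedure, which trains a neural autorater on the projected judgments.

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) index pairs via
    the standard minorize-maximize update; returns one score per item."""
    wins = [0] * n_items
    for w, _ in comparisons:
        wins[w] += 1
    p = [1.0] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for w, l in comparisons:
            d = 1.0 / (p[w] + p[l])
            denom[w] += d
            denom[l] += d
        p = [wins[i] / denom[i] if denom[i] else p[i] for i in range(n_items)]
        s = sum(p)
        p = [x * n_items / s for x in p]  # normalize for identifiability
    return p

# Item 0 beats 1 twice, 1 beats 2 twice, 0 beats 2 once.
scores = bradley_terry(3, [(0, 1), (0, 1), (1, 2), (1, 2), (0, 2)])
```

The abstract's finding that pairwise training is more stable than pointwise scoring mirrors a known property of such models: only relative preferences, not absolute labels, need to be consistent.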

AAAI Conference 2025 Conference Paper

PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation

  • Dong Feng
  • Ping Guo
  • Encheng Peng
  • Mingmin Zhu
  • Wenhao Yu
  • Peng Wang

Manipulating human poses based on natural language is an emerging research field that has traditionally focused on coarse commands such as “walking” or “dancing.” However, fine-grained pose manipulation, like instructing “put both hands in front of the stomach,” remains underexplored. In this paper, we introduce PoseLLaVA, a pioneering model that integrates SMPL-based pose representations into the multimodal LLaVA framework. Through a novel pose encoder-decoder mechanism, PoseLLaVA achieves precise alignment between pose, textual, and visual modalities, enabling detailed control over pose manipulation tasks. PoseLLaVA excels in three key tasks: pose estimation, generation, and adjustment, all driven by detailed language instructions. We further introduce PosePart, a fine-grained pose-adjustment dataset in which each sample contains an initial pose and a target pose, along with specific instructions for adjustments, mimicking the guidance a human instructor might provide. Extensive evaluations across these tasks demonstrate significant improvements over existing methods, including metrics such as MPJPE and PA-MPJPE, which measure SMPL reconstruction errors, and Recall rates, which assess feature alignment across modalities. Specifically, PoseLLaVA reduces MPJPE errors by more than 20% compared to state-of-the-art methods in pose adjustment and generation tasks. Additionally, we demonstrate the feasibility of combining PoseLLaVA with generative models, such as diffusion, for pose image editing, highlighting its potential applications in language-controlled pose manipulation.

NeurIPS Conference 2025 Conference Paper

SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning

  • Yiting Wang
  • Wanghao Ye
  • Ping Guo
  • Yexiao He
  • Ziyao Wang
  • Bowei Tian
  • Shwai He
  • Guoheng Sun

Optimizing Register Transfer Level (RTL) code is crucial for improving the efficiency and performance of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper introduces SymRTLO, a neuron-symbolic framework that integrates LLMs with symbolic reasoning for the efficient and effective optimization of RTL code. Our method incorporates a retrieval-augmented system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, a fast verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods. We will release the code as open source upon the paper's acceptance.

IROS Conference 2024 Conference Paper

Image to Patterning: Density-specified Patterning of Micro-structured Surfaces with a Mobile Robot

  • Annalisa T. Taylor
  • Malachi Landis
  • Yaoke Wang
  • Todd D. Murphey
  • Ping Guo

Micro-structured surfaces possess useful properties such as friction modification, anti-fouling, and hydrophobicity. However, manufacturing these surfaces in an affordable, scalable, and efficient manner remains challenging. Standard coverage methods for surface patterning require precise placement of micro-scale features over meter-scale surfaces with expensive tooling for support. In this work, we address the scalability challenge in surface patterning by designing a mobile robot with a credit-card-sized footprint to generate micro-scale divots using a modulated tool tip. We provide a control architecture with a target feature density to specify surface coverage, eliminating the dependence on individual indentation locations. Our robot produces high-fidelity surface patterns and achieves automatic coverage of a surface from sophisticated target images. We validate an exemplary application of such micro-structured surfaces by controlling the friction coefficients at different locations according to the density of indentations. These results show the potential for compact robots to perform scalable manufacturing of functional surfaces, switching the focus from precision machines to small-footprint devices tasked with matching only the density of features.

IROS Conference 2023 Conference Paper

Disentangled Discriminator for Unsupervised Domain Adaptation on Object Detection

  • Yangguang Zhu
  • Ping Guo
  • Haoran Wei
  • Xin Zhao
  • Xiangbin Wu

Object detection plays an important role in computer vision tasks such as autonomous driving, robotics, etc. Typically, a detection model is first trained on collected data and then deployed in the real world. However, a discrepancy exists between training (source) and testing (target) data, which degrades the detection model's performance in the real world. To mitigate the negative effects, Unsupervised Domain Adaptation (UDA) methods learn the features of a shared domain via a discriminator. However, existing discriminators consider only in-distribution adversarial learning, ignoring the out-of-distribution data of individual domains. In this paper, we propose a disentangled discriminator to consider the in-distribution data and outliers separately. It aligns the source and target data with split branches under a gated strategy. We combine the disentangled discriminator with a Teacher-Student (T-S) framework that trains the student using labeled source data and unlabeled target data under a self-training mechanism. Specifically, the teacher network, which is updated from the student network's parameters via an exponential moving average, predicts pseudo labels for unlabeled data. The quality of pseudo labels can be improved after alleviating the domain discrepancy, thanks to the disentangled discriminator. Extensive experiments on benchmarks demonstrate the superiority of the proposed method. Specifically, we achieve 53.9% mAP on Foggy Cityscapes, which is 7.2% higher than the Oracle.
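The exponential-moving-average teacher update named in the abstract is a standard mechanism (as in Mean Teacher-style self-training); sketched here over plain parameter dictionaries rather than this paper's detector.

```python
def ema_update(teacher, student, decay=0.999):
    """Update teacher parameters toward an exponential moving average
    of the student's parameters; the teacher then emits pseudo labels."""
    for name, s_param in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_param
    return teacher
```

With a decay near 1, the teacher changes slowly, so its pseudo labels are more stable than the student's raw predictions.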

NeurIPS Conference 2023 Conference Paper

EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning

  • Ping Guo
  • Xiangpeng Wei
  • Yue Hu
  • Baosong Yang
  • Dayiheng Liu
  • Fei Huang
  • Jun Xie

Expressing universal semantics common to all languages is helpful to understand the meanings of complex and culture-specific sentences. The research theme underlying this scenario focuses on learning universal representations across languages with the usage of massive parallel corpora. However, due to the sparsity and scarcity of parallel data, there is still a big challenge in learning authentic "universals" for any two languages. In this paper, we propose Emma-X: an EM-like Multilingual pre-training Algorithm, to learn Cross-lingual universals with the aid of excessive multilingual non-parallel data. Emma-X unifies the cross-lingual representation learning task and an extra semantic relation prediction task within an EM framework. Both the extra semantic classifier and the cross-lingual sentence encoder approximate the semantic relation of two sentences, and supervise each other until convergence. To evaluate Emma-X, we conduct experiments on xrete, a newly introduced benchmark containing 12 widely studied cross-lingual tasks that fully depend on sentence-level representations. Results reveal that Emma-X achieves state-of-the-art performance. Further geometric analysis of the built representation space with three requirements demonstrates the superiority of Emma-X over advanced models.

AAAI Conference 2023 Conference Paper

Learning to Know Myself: A Coarse-to-Fine Persona-Aware Training Framework for Personalized Dialogue Generation

  • Yunpeng Li
  • Yue Hu
  • Yajing Sun
  • Luxi Xing
  • Ping Guo
  • Yuqiang Xie
  • Wei Peng

A critical challenge for open-domain dialogue agents is to generate persona-relevant and consistent responses. Due to the nature of persona sparsity in conversation scenarios, previous persona-based dialogue agents trained with Maximum Likelihood Estimation tend to overlook the given personas and generate responses irrelevant or inconsistent with personas. To address this problem, we propose a two-stage coarse-to-fine persona-aware training framework to improve the persona consistency of a dialogue agent progressively. Specifically, our framework first trains the dialogue agent to answer the constructed persona-aware questions, making it highly sensitive to the personas to generate persona-relevant responses. Then the dialogue agent is further trained with a contrastive learning paradigm by explicitly perceiving the difference between the consistent and the generated inconsistent responses, forcing it to pay more attention to the key persona information to generate consistent responses. By applying our proposed training framework to several representative baseline models, experimental results show significant boosts on both automatic and human evaluation metrics, especially the consistency of generated responses.

IJCAI Conference 2022 Conference Paper

Corner Affinity: A Robust Grouping Algorithm to Make Corner-guided Detector Great Again

  • Haoran Wei
  • Chenglong Liu
  • Ping Guo
  • Yangguang Zhu
  • Jiamei Fu
  • Bing Wang
  • Peng Wang

The corner-guided detector enjoys the potential ability to yield precise bounding boxes. However, unreliable corner pairs, generated by heuristic grouping guidance, hinder the development of this detector. In this paper, we propose a novel corner grouping algorithm, termed Corner Affinity, to significantly boost the reliability and robustness of corner grouping. The proposed Corner Affinity couples two interacting factors: 1) the structure affinity (SA), which generates preliminary corner pairs from the corresponding object's shallow structural information, and 2) the context affinity (CA), which optimizes corner pairs by embedding deeper semantic features of the affiliated instances. Equipped with Corner Affinity, a detector can produce high-quality bounding boxes from well-paired corner keypoints. Experimental results show the superiority of our design on multiple benchmark datasets. Specifically, for the CornerNet baseline, the proposed Corner Affinity brings AP gains of 5.8% on COCO, 35.8% on CityPersons, and 17.2% on UCAS-AOD without bells and whistles.

NeurIPS Conference 2022 Conference Paper

HumanLiker: A Human-like Object Detector to Model the Manual Labeling Process

  • Haoran Wei
  • Ping Guo
  • Yangguang Zhu
  • Chenglong Liu
  • Peng Wang

Popular object detection models generate bounding boxes in a different way than we humans do. For example, modern detectors yield an object box either by regressing its center and width/height (center-guided detectors) or by grouping paired estimated corners (corner-guided detectors). However, that is not how we manually label an object, due to the high degrees of freedom in searching for centers and the low efficiency of grouping corners. Empirically, humans take two steps to locate an object bounding box manually: 1) click the mouse at the top-left corner of the object, and then drag the mouse to the bottom-right corner; 2) refine the corner positions to make the bounding box more precise, if necessary. Inspired by this manual labeling process, we propose a novel human-like detector, termed HumanLiker, which is devised as a two-stage end-to-end detector that simulates the two steps above. Like humans in manual labeling, HumanLiker can effectively avert both the thorny center searching and heuristic corner grouping. Different from the mainstream detector branches, i.e., the center/corner-guided methods, HumanLiker provides a new paradigm that integrates the advantages of both branches to balance detection efficiency and bounding box quality. On the MS-COCO test-dev set, HumanLiker achieves 50.2%/51.6% and 53.8%/55.6% in terms of AP with ResNeXt-101 and Swin Transformer backbones in single/multi-scale testing, outperforming current popular center/corner-guided baselines (e.g., DETR/CornerNet) by a large margin, with far fewer training epochs and higher inference FPS. Code will be available soon.
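The two manual-labeling steps translate directly into a box parameterization: a click point, a drag offset, and optional per-corner refinements. This schematic sketch uses illustrative names and is not HumanLiker's prediction head.

```python
def box_from_clicks(top_left, drag, refine_tl=(0.0, 0.0), refine_br=(0.0, 0.0)):
    """Step 1: 'click' the top-left corner and 'drag' to the bottom-right.
    Step 2: optionally refine both corners with small offsets.
    Returns a box as (x1, y1, x2, y2)."""
    x1 = top_left[0] + refine_tl[0]
    y1 = top_left[1] + refine_tl[1]
    x2 = top_left[0] + drag[0] + refine_br[0]
    y2 = top_left[1] + drag[1] + refine_br[1]
    return (x1, y1, x2, y2)
```

Anchoring on one corner plus an extent avoids both the center search of center-guided heads and the cross-image corner matching of corner-guided ones.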

EAAI Journal 2021 Journal Article

PEAVC: An improved minimum vertex cover solver for massive sparse graphs

  • Jiaqi Gu
  • Ping Guo

Several important applications related to complex network analysis require finding small vertex covers in massive sparse graphs. To fulfill this task, this paper proposes a general algorithm framework named PEAF, which includes a preprocessing stage, a solving stage, and an inverse-processing stage. Based on PEAF, a minimum vertex cover (MinVC) solver PEAVC is developed, which uses PreP to reduce the graph, BGVC to obtain a vertex cover of bipartite-graph components, FastVC2 to solve the connected components left, and Inv_PreP to get a vertex cover of the original problem. Computational experiments on 90 massive real-world benchmark graphs indicate that PreP can reduce the vertex number by 83.25% on average, which is superior to other graph reduction methods. PEAVC performs remarkably well, discovering 5 best-known results (new upper bounds) never reported in the literature, matching the best-known results for 63 other instances, and obtaining exact MinVCs for 55 instances. Experiments also show that PEAVC achieves extremely high performance.
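Graph-reduction preprocessing of the kind PreP performs can be illustrated with the classic degree-1 rule: a pendant vertex's only neighbor can always be placed in the cover without loss of optimality. This is a sketch of one standard reduction, not PEAVC's actual rule set.

```python
from collections import defaultdict

def degree_one_reduction(edges):
    """Repeatedly apply the degree-1 rule: if v has a single neighbor u,
    put u in the cover and delete both. Returns (partial_cover, remaining_edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = set()
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:       # vertex removed earlier in this pass
                continue
            if len(adj[v]) == 1:
                (u,) = adj[v]
                cover.add(u)       # u covers v's only edge, optimally
                for w in list(adj[u]):   # delete u and all its edges
                    adj[w].discard(u)
                    if not adj[w]:
                        del adj[w]
                adj.pop(u, None)
                adj.pop(v, None)
                changed = True
    remaining = {frozenset((a, b)) for a in adj for b in adj[a]}
    return cover, remaining

cover, remaining = degree_one_reduction([(1, 2), (2, 3), (3, 4), (4, 5)])
```

On a 5-vertex path this rule alone solves the instance; on massive sparse real-world graphs, cascades of such rules are what let PreP shrink the vertex set so dramatically before the local-search solver runs.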

IROS Conference 2021 Conference Paper

RaP-Net: A Region-wise and Point-wise Weighting Network to Extract Robust Features for Indoor Localization

  • Dongjiang Li
  • Jinyu Miao
  • Xuesong Shi
  • Yuxin Tian
  • Qiwei Long
  • Tianyu Cai
  • Ping Guo
  • Hongfei Yu

Feature extraction plays an important role in visual localization. Unreliable features on dynamic objects or repetitive regions will interfere with feature matching and greatly challenge indoor localization. To address the problem, we propose a novel network, RaP-Net, to simultaneously predict region-wise invariability and point-wise reliability, and then extract features by considering both of them. We also introduce a new dataset, named OpenLORIS-Location, to train the proposed network. The dataset contains 1553 images from 93 indoor locations. Various appearance changes between images of the same location are included and can help the model to learn the invariability in typical indoor scenes. Experimental results show that the proposed RaP-Net trained with the OpenLORIS-Location dataset achieves excellent performance in the feature matching task and significantly outperforms state-of-the-art feature algorithms in indoor localization. The RaP-Net code and dataset are available at https://github.com/ivipsourcecode/RaP-Net.

ICRA Conference 2019 Conference Paper

Customized Object Recognition and Segmentation by One Shot Learning with Human Robot Interaction

  • Ping Guo
  • Lidan Zhang
  • Lu Cao
  • Yingzhe Shen
  • Xuesong Shi
  • Haibing Ren
  • Yimin Zhang 0002

There are two difficulties in applying state-of-the-art object recognition/detection/segmentation methods to robotic applications. First, most deep learning models heavily depend on large amounts of labeled training data, which are expensive to obtain for each individual application. Second, the object categories must be pre-defined in the dataset, which is not practical for scenarios with varying object categories. To alleviate the reliance on pre-defined big data, this paper proposes a customized object recognition and segmentation method. It aims to recognize and segment any object defined by the user, given only one annotation. There are three steps in the proposed method. First, the user takes an exemplar video of the target object with the robot, defines its name, and masks its boundary on only one frame. Then the robot automatically propagates the annotation through the exemplar video based on a proposed data generation method. In the meantime, a segmentation model continuously updates itself on the generated data. Finally, only a lightweight segmentation net is required at the testing stage to recognize and segment the user-defined object in any scene.

IROS Conference 2018 Conference Paper

PCAOT: A Manhattan Point Cloud Registration Method Towards Large Rotation and Small Overlap

  • Ping Guo
  • Wei Hu 0002
  • Haibing Ren
  • Yimin Zhang 0002

Point cloud registration is a popular research topic and has been widely used in many tasks, such as robot mapping and localization. It is a challenging problem when the overlap is small or the rotation is large. The problem has not been well solved by existing methods such as the iterative closest point (ICP) algorithm and its variants. In this paper, a novel method named principal coordinate alignment with overlap tuning (PCAOT) is proposed based on the Manhattan world assumption. It solves two key problems together: transformation estimation and overlap estimation. The overlap is represented by a 3D cuboid, and the transformation is computed only within the overlap region. Instead of finding point correspondences as in traditional methods, we estimate the rotation by principal coordinate alignment, which is faster than ICP and its variants and less sensitive to small overlaps and large rotations. Evaluations demonstrate that our method achieves much better results than ICP and its variants when the overlap ratio is smaller than 50% or the rotation angle is larger than 60°. In particular, it remains effective when the overlap ratio is less than 30% or the rotation angle is larger than 90°.
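The core idea, aligning principal coordinates instead of matching individual points, can be shown in 2D with plain covariance analysis. This is a toy sketch under stated assumptions: PCAOT itself works on 3D Manhattan-world scenes with overlap tuning, and the principal axis carries a sign ambiguity, so the recovered angle is only defined modulo pi.

```python
import math

def principal_angle(points):
    """Orientation of a 2D point set's principal axis, from its covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed form for the leading eigenvector's angle of a 2x2 covariance.
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)

def estimate_rotation(source, target):
    """Rotation (mod pi) aligning the principal axes of two clouds,
    with no point correspondences required."""
    return principal_angle(target) - principal_angle(source)
```

Because no per-point correspondence is needed, the estimate degrades far more gracefully than ICP when the clouds only partially overlap.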

IJCAI Conference 2015 Conference Paper

Cross-Domain Collaborative Filtering with Review Text

  • Xin Xin
  • Zhirun Liu
  • Chin-Yew Lin
  • Heyan Huang
  • Xiaochi Wei
  • Ping Guo

Most existing cross-domain recommendation algorithms focus on modeling ratings while ignoring review texts. The review text, however, contains rich information, which can be utilized to alleviate data sparsity limitations and interpret transfer patterns. In this paper, we investigate how to utilize the review text to improve cross-domain collaborative filtering models. The challenge lies in the existence of non-linear properties in some transfer patterns. Given this, we extend previous transfer learning models in collaborative filtering from linear mapping functions to non-linear ones, and propose a cross-domain recommendation framework with the review text incorporated. Experiments demonstrate that, for new users with sparse feedback, utilizing the review text yields a 10% improvement in the AUC metric, and the non-linear method outperforms the linear ones by 4%.

AAAI Conference 2014 Conference Paper

Locality-Constrained Low-Rank Coding for Image Classification

  • Ziheng Jiang
  • Ping Guo
  • Lihong Peng

Low-rank coding (LRC), originated from matrix decomposition, has recently been introduced into image classification. Following the standard bag-of-words (BOW) pipeline, coding the data matrix in the sense of low-rankness incorporates contextual information into the traditional BOW model, capturing the dependency relationship among neighboring patches. It differs from the traditional sparse coding paradigms, which encode patches independently. Current LRC-based methods use the ℓ1 norm to increase the discrimination and sparseness of the learned codes. However, such methods fail to consider the local manifold structure between the data space and the dictionary space. To solve this problem, we propose a locality-constrained low-rank coding (LCLR) algorithm for image representations. By using the geometric structure information as a regularization term, we can obtain more discriminative representations. In addition, we present a fast and stable online algorithm to solve the optimization problem. In the experiments, we evaluate LCLR on four benchmarks, including one face recognition dataset (extended Yale B), one handwritten digit recognition dataset (USPS), and two image datasets (Scene13 for scene recognition and Caltech101 for object recognition). Experimental results show that our approach outperforms many state-of-the-art algorithms even with a linear classifier.
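The family of objectives the abstract describes, reconstruction plus a low-rank term plus a locality regularizer, can be written generically as follows. This is a generic form for illustration, with λ and β as trade-off weights, and is not necessarily the paper's exact formulation:

```latex
\min_{Z}\; \|X - DZ\|_F^2 \;+\; \lambda \,\|Z\|_* \;+\; \beta \sum_{i} \|d_i \odot z_i\|_2^2
```

Here X is the patch feature matrix, D the dictionary, Z the code matrix, \|Z\|_* the nuclear norm that encourages low rank (and hence shared structure) across neighboring patches, and d_i a locality adaptor, as in locality-constrained linear coding, that penalizes code entries on dictionary atoms far from the i-th feature.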