EAAI, 2026, Journal Article
An end-to-end wavelet-based irregular transformer with Gumbel sampling for spatiotemporal welding prediction
- Changhui Liu
- Ke Jin
- Jianzhi Sun
- Lin Deng
- Jiewu Leng
- Xin Li
- Qian Li
- Qingcheng Yang
Welding prediction plays a vital role in ensuring assembly precision and minimizing rework in thin-walled structures. Data-driven approaches have attracted increasing attention; however, most existing methods rely on oversimplified input representations, overlook irregular temporal dynamics, and focus solely on single-variable prediction, limiting their applicability in complex welding scenarios. Motivated by these challenges, this study develops an end-to-end Wavelet Irregular Transformer with Gumbel Sampling, designed to achieve accurate spatiotemporal prediction of welding-induced deformation and residual stress. From the artificial intelligence perspective, the model incorporates a large language model-based embedding initializer that compresses and contextualizes step-level simulation parameters, and an adaptive parameterized Gumbel keyframe extractor that dynamically identifies the most informative temporal segments. This design enables efficient learning over ultra-long welding sequences while maintaining high-fidelity temporal representations. From the engineering application perspective, a channel-aware wavelet encoder–decoder is developed to fuse multi-frequency and multi-channel features, improving spatial coherence and capturing coupled stress–strain interactions. Validation on a dedicated thin-plate welding dataset, supplemented by physical experiments, shows that the proposed method achieves superior accuracy, robustness, and computational efficiency compared with optimized encoder–decoder and sequence-modeling baselines. The proposed Wavelet Irregular Transformer with Gumbel Sampling achieves a deformation mean absolute error of 0.033 mm and a root mean square error of 0.045 mm on the test set, reducing the deformation mean absolute error by 92.0% and the root mean square error by 82.1% compared with the best uniform-sampling baseline, while requiring 76.7% fewer billion floating-point operations than a full-sequence Transformer.
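The abstract does not specify how the adaptive parameterized Gumbel keyframe extractor is implemented; the sketch below only illustrates the underlying Gumbel-softmax / Gumbel-top-k trick that such an extractor typically builds on. The function names, the per-frame score vector, and the temperature value are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, seed=None):
    """Relaxed (soft) sample from a categorical over sequence frames.

    logits : hypothetical per-frame importance scores.
    tau    : temperature; lower values push the output toward one-hot.
    """
    rng = np.random.default_rng(seed)
    # Standard Gumbel(0, 1) noise via the inverse-CDF transform.
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

def select_keyframes(logits, k, seed=None):
    """Draw k distinct keyframe indices via the Gumbel-top-k trick:
    perturb the scores with Gumbel noise and keep the k largest."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    return np.argsort(logits + g)[-k:][::-1]

# Toy 10-step welding sequence with two strongly informative frames.
scores = np.array([0.1, 0.2, 3.0, 0.1, 0.2, 0.1, 2.5, 0.1, 0.2, 0.1])
probs = gumbel_softmax(scores, tau=0.5, seed=0)   # soft frame weights
idx = select_keyframes(scores, k=2, seed=0)       # hard keyframe picks
```

In a trained model the scores would be produced by a learned module and the soft weights would carry gradients back to it, which is what makes the sampling step end-to-end differentiable.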