Author name cluster

Jianguo Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

AAAI Conference 2026 Conference Paper

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

Jian Wu
Hang Yu
Bingchang Liu
Yang Wenjie
Peng Di
Jianguo Li
Yue Zhang

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for fine-tuning risks introducing noise and degrading performance. Strategic data selection is thus crucial, requiring a method that is both accurate and efficient. Existing approaches, categorized as similarity-based and direct optimization methods, struggle to simultaneously achieve these goals. In this paper, we introduce LAMDAS (LLM as an implicit classifier for domain-specific Data Selection), a novel approach that leverages the pre-trained LLM itself as an implicit classifier, thereby bypassing explicit feature engineering and computationally intensive optimization process. LAMDAS reframes data selection as a one-class classification problem, identifying candidate data that "belongs" to the target domain defined by a small reference dataset. Extensive experimental results demonstrate that LAMDAS not only exceeds the performance of full-data training using a fraction of the data but also outperforms nine state-of-the-art (SOTA) baselines under various scenarios. Furthermore, LAMDAS achieves the most compelling balance between performance gains and computational efficiency compared to all evaluated baselines.

PDF Details DOI

ICLR Conference 2025 Conference Paper

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

Ziran Qin
Yuchen Cao 0007
Mingbao Lin
Wen Hu
Shixuan Fan
Ke Cheng
Weiyao Lin
Jianguo Li

Large language models (LLMs) excel at processing long sequences, boosting demand for key-value (KV) caching. While recent efforts to evict KV cache have alleviated the inference burden, they often fail to allocate resources rationally across layers with different attention patterns. In this paper, we introduce Cascading and Adaptive KV cache Eviction (CAKE), a novel approach that frames KV cache eviction as a ``cake-slicing problem.'' CAKE assesses layer-specific preferences by considering attention dynamics in both spatial and temporal dimensions, allocates rational cache size for layers accordingly, and manages memory constraints in a cascading manner. This approach enables a global view of cache allocation, adaptively distributing resources across diverse attention mechanisms while maintaining memory budgets. CAKE also employs a new eviction indicator that considers the shifting importance of tokens over time, addressing limitations in existing methods that overlook temporal dynamics. Comprehensive experiments on LongBench and NeedleBench show that CAKE maintains model performance with only 3.2\% of the KV cache and consistently outperforms current baselines across various models and memory constraints, particularly in low-memory settings. Additionally, CAKE achieves over 10$\times$ speedup in decoding latency compared to full cache when processing contexts of 128K tokens with FlashAttention-2. Our code is available at https://github.com/antgroup/cakekv.

Details

NeurIPS Conference 2025 Conference Paper

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Hongyuan Tao
Ying Zhang
Zhenhao Tang
Hongen Peng
Xukun Zhu
Bingchang Liu
Yingguang Yang
Ziyin Zhang

Recent advances in Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. Current solutions predominantly rely on proprietary LLM agents, which introduce unpredictability and limit accessibility, raising concerns about data privacy and model customization. This paper investigates whether open-source LLMs can effectively address repository-level tasks without requiring agent-based approaches. We demonstrate this is possible by enabling LLMs to comprehend functions and files within codebases through their semantic information and structural dependencies. To this end, we introduce Code Graph Models (CGMs), which integrate repository code graph structures into the LLM's attention mechanism and map node attributes to the LLM's input space using a specialized adapter. When combined with an agentless graph RAG framework, our approach achieves a 43. 00% resolution rate on the SWE-bench Lite benchmark using the open-source Qwen2. 5-72B model. This performance ranks first among open weight models, second among methods with open-source systems, and eighth overall, surpassing the previous best open-source model-based method by 12. 33%.

PDF Details

ICLR Conference 2025 Conference Paper

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions

Zhihao He
Hang Yu 0002
Zi Gong
Shizhan Liu
Jianguo Li
Weiyao Lin

Recent advancements in Transformer-based large language models (LLMs) have set new standards in natural language processing. However, the classical softmax attention incurs significant computational costs, leading to a $O(T)$ complexity for per-token generation, where $T$ represents the context length. This work explores reducing LLMs' complexity while maintaining performance by introducing Rodimus and its enhanced version, Rodimus$+$. Rodimus employs an innovative data-dependent tempered selection (DDTS) mechanism within a linear attention-based, purely recurrent framework, achieving significant accuracy while drastically reducing the memory usage typically associated with recurrent models. This method exemplifies semantic compression by maintaining essential input information with fixed-size hidden states. Building on this, Rodimus$+$ combines Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach, effectively leveraging the complementary semantic, token, and head compression techniques. Our experiments demonstrate that Rodimus$+$-1.6B, trained on 1 trillion tokens, achieves superior downstream performance against models trained on more tokens, including Qwen2-1.5B and RWKV6-1.6B, underscoring its potential to redefine the accuracy-efficiency balance in LLMs. Model code and pre-trained checkpoints are open-sourced at https://github.com/codefuse-ai/rodimus.

Details

ICLR Conference 2024 Conference Paper

AmortizedPeriod: Attention-based Amortized Inference for Periodicity Identification

Hang Yu 0002
Cong Liao
Ruolan Liu
Jianguo Li
Yun Hu
Xinzhe Wang

Periodic patterns are a fundamental characteristic of time series in natural world, with significant implications for a range of disciplines, from economics to cloud systems. However, the current literature on periodicity detection faces two key challenges: limited robustness in real-world scenarios and a lack of memory to leverage previously observed time series to accelerate and improve inference on new data. To overcome these obstacles, this paper presents AmortizedPeriod, an innovative approach to periodicity identification based on amortized variational inference that integrates Bayesian statistics and deep learning. Through the Bayesian generative process, our method flexibly captures the dependencies of the periods, trends, noise, and outliers in time series, while also considering missing data and irregular periods in a robust manner. In addition, it utilizes the evidence lower bound of the log-likelihood of the observed time series as the loss function to train a deep attention inference network, facilitating knowledge transfer from the seen time series (and their labels) to unseen ones. Experimental results show that AmortizedPeriod surpasses the state-of-the-art methods by a large margin of 28.5% on average in terms of micro $F_1$-score, with at least 55% less inference time.

Details

NeurIPS Conference 2024 Conference Paper

DeepITE: Designing Variational Graph Autoencoders for Intervention Target Estimation

Hongyuan Tao
Hang Yu
Jianguo Li

Intervention Target Estimation (ITE) is vital for both understanding and decision-making in complex systems, yet it remains underexplored. Current ITE methods are hampered by their inability to learn from distinct intervention instances collaboratively and to incorporate rich insights from labeled data, which leads to inefficiencies such as the need for re-estimation of intervention targets with minor data changes or alterations in causal graphs. In this paper, we propose DeepITE, an innovative deep learning framework designed around a variational graph autoencoder. DeepITE can concurrently learn from both unlabeled and labeled data with different intervention targets and causal graphs, harnessing correlated information in a self or semi-supervised manner. The model's inference capabilities allow for the immediate identification of intervention targets on unseen samples and novel causal graphs, circumventing the need for retraining. Our extensive testing confirms that DeepITE not only surpasses 13 baseline methods in the Recall@k metric but also demonstrates expeditious inference times, particularly on large graphs. Moreover, incorporating a modest fraction of labeled data (5-10\%) substantially enhances DeepITE's performance, further solidifying its practical applicability. Our source code is available at https: //github. com/alipay/DeepITE.

PDF Details DOI

ICML Conference 2024 Conference Paper

DUPLEX: Dual GAT for Complex Embedding of Directed Graphs

Zhaoru Ke
Hang Yu 0002
Jianguo Li
Haipeng Zhang 0004

Current directed graph embedding methods build upon undirected techniques but often inadequately capture directed edge information, leading to challenges such as: (1) Suboptimal representations for nodes with low in/out-degrees, due to the insufficient neighbor interactions; (2) Limited inductive ability for representing new nodes post-training; (3) Narrow generalizability, as training is overly coupled with specific tasks. In response, we propose DUPLEX, an inductive framework for complex embeddings of directed graphs. It (1) leverages Hermitian adjacency matrix decomposition for comprehensive neighbor integration, (2) employs a dual GAT encoder for directional neighbor modeling, and (3) features two parameter-free decoders to decouple training from particular tasks. DUPLEX outperforms state-of-the-art models, especially for nodes with sparse connectivity, and demonstrates robust inductive capability and adaptability across various tasks. The code will be available upon publication.

Details

ICLR Conference 2024 Conference Paper

Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression

Yufeng Zhang
Hang Yu 0002
Jianguo Li
Weiyao Lin

Learned lossless data compression has garnered significant attention recently due to its superior compression ratios compared to traditional compressors. However, the computational efficiency of these models jeopardizes their practicality. This paper proposes a novel system for improving the compression ratio while maintaining computational efficiency for learned lossless data compression. Our approach incorporates two essential innovations. First, we propose the Finite-State AutoRegressive (FSAR) entropy coder, an efficient autoregressive Markov model based entropy coder that utilizes a lookup table to expedite autoregressive entropy coding. Next, we present a Straight-Through Hardmax Quantization (STHQ) scheme to enhance the optimization of discrete latent space. Our experiments show that the proposed lossless compression method could improve the compression ratio by up to 6\% compared to the baseline, with negligible extra computational time. Our work provides valuable insights into enhancing the computational efficiency of learned lossless data compression, which can have practical applications in various fields. Code is available at https://github.com/alipay/Finite_State_Autoregressive_Entropy_Coding.

Details

AIJ Journal 2024 Journal Article

Functional Relation Field: A Model-Agnostic Framework for Multivariate Time Series Forecasting

Ting Li
Bing Yu
Jianguo Li
Zhanxing Zhu

Details DOI

TMLR Journal 2024 Journal Article

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Ziyin Zhang
Chaoyu Chen
Bingchang Liu
Cong Liao
Zi Gong
Hang Yu
Jianguo Li
Rui Wang

In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software engineering (SE) with natural language processing (NLP) by discussing the perspectives of both sides: SE applies language models for development automation, while NLP adopts SE tasks for language model evaluation. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE, and identify key challenges and potential future directions in this domain.

PDF Details

NeurIPS Conference 2023 Conference Paper

BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis

Zelin Ni
Hang Yu
Shizhan Liu
Jianguo Li
Weiyao Lin

Bases have become an integral part of modern deep learning-based models for time series forecasting due to their ability to act as feature extractors or future references. To be effective, a basis must be tailored to the specific set of time series data and exhibit distinct correlation with each time series within the set. However, current state-of-the-art methods are limited in their ability to satisfy both of these requirements simultaneously. To address this challenge, we propose BasisFormer, an end-to-end time series forecasting architecture that leverages learnable and interpretable bases. This architecture comprises three components: First, we acquire bases through adaptive self-supervised learning, which treats the historical and future sections of the time series as two distinct views and employs contrastive learning. Next, we design a Coef module that calculates the similarity coefficients between the time series and bases in the historical view via bidirectional cross-attention. Finally, we present a Forecast module that selects and consolidates the bases in the future view based on the similarity coefficients, resulting in accurate future predictions. Through extensive experiments on six datasets, we demonstrate that BasisFormer outperforms previous state-of-the-art methods by 11. 04% and 15. 78% respectively for univariate and multivariate forecasting tasks. Code isavailable at: https: //github. com/nzl5116190/Basisformer.

PDF Details

NeurIPS Conference 2023 Conference Paper

Neural Lad: A Neural Latent Dynamics Framework for Times Series Modeling

Ting Li
Jianguo Li
Zhanxing Zhu

Neural ordinary differential equation (Neural ODE) is an elegant yet powerful framework to learn the temporal dynamics for time series modeling. However, we observe that existing Neural ODE forecasting models suffer from two disadvantages: i) controlling the latent states only through the linear transformation over the local change of the observed signals may be inadequate; ii) lacking the ability to capture the inherent periodical property in time series forecasting tasks; To overcome the two issues, we introduce a new neural ODE framework called \textbf{Neural Lad}, a \textbf{Neural} \textbf{La}tent \textbf{d}ynamics model in which the latent representations evolve with an ODE enhanced by the change of observed signal and seasonality-trend characterization. We incorporate the local change of input signal into the latent dynamics in an attention-based manner and design a residual architecture over basis expansion to depict the periodicity in the underlying dynamics. To accommodate the multivariate time series forecasting, we extend the Neural Lad through learning an adaptive relationship between multiple time series. Experiments demonstrate that our model can achieve better or comparable performance against existing neural ODE families and transformer variants in various datasets. Remarkably, the empirical superiority of Neural Lad is consistent across short and long-horizon forecasting for both univariate, multivariate and even irregular sampled time series.

PDF Details

ICLR Conference 2022 Conference Paper

Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting

Shizhan Liu
Hang Yu 0002
Cong Liao
Jianguo Li
Weiyao Lin
Alex X. Liu
Schahram Dustdar

Accurate prediction of the future given the past based on time series data is of paramount importance, since it opens the door for decision making and risk management ahead of time. In practice, the challenge is to build a flexible but parsimonious model that can capture a wide range of temporal dependencies. In this paper, we propose Pyraformer by exploring the multiresolution representation of the time series. Specifically, we introduce the pyramidal attention module (PAM) in which the inter-scale tree structure summarizes features at different resolutions and the intra-scale neighboring connections model the temporal dependencies of different ranges. Under mild conditions, the maximum length of the signal traversing path in Pyraformer is a constant (i.e., $\mathcal O(1)$) with regard to the sequence length $L$, while its time and space complexity scale linearly with $L$. Extensive numerical results show that Pyraformer typically achieves the highest prediction accuracy in both single-step and long-range forecasting tasks with the least amount of time and memory consumption, especially when the sequence is long.

Details

IJCAI Conference 2022 Conference Paper

Regularized Graph Structure Learning with Semantic Knowledge for Multi-variates Time-Series Forecasting

Hongyuan Yu
Ting Li
Weichen Yu
Jianguo Li
Yan Huang
Liang Wang
Alex Liu

Multivariate time-series forecasting is a critical task for many applications, and graph time-series network is widely studied due to its capability to capture the spatial-temporal correlation simultaneously. However, most existing works focus more on learning with the explicit prior graph structure, while ignoring potential information from the implicit graph structure, yielding incomplete structure modeling. Some recent works attempts to learn the intrinsic or implicit graph structure directly, while lacking a way to combine explicit prior structure with implicit structure together. In this paper, we propose Regularized Graph Structure Learning (RGSL) model to incorporate both explicit prior structure and implicit structure together, and learn the forecasting deep networks along with the graph structure. RGSL consists of two innovative modules. First, we derive an implicit dense similarity matrix through node embedding, and learn the sparse graph structure using the Regularized Graph Generation (RGG) based on the Gumbel Softmax trick. Second, we propose a Laplacian Matrix Mixed-up Module (LM3) to fuse the explicit graph and implicit graph together. We conduct experiments on three real-word datasets. Results show that the proposed RGSL model outperforms existing graph forecasting algorithms with a notable margin, while learning meaningful graph structure simultaneously. Our code and models are made publicly available at https: //github. com/alipay/RGSL. git.

PDF Details DOI

AAAI Conference 2019 Conference Paper

Composite Binary Decomposition Networks

You Qiaoben
Zheng Wang
Jianguo Li
Yinpeng Dong
Yu-Gang Jiang
Jun Zhu

Binary neural networks have great resource and computing efficiency, while suffer from long training procedure and non-negligible accuracy drops, when comparing to the fullprecision counterparts. In this paper, we propose the composite binary decomposition networks (CBDNet), which first compose real-valued tensor of each layer with a limited number of binary tensors, and then decompose some conditioned binary tensors into two low-rank binary tensors, so that the number of parameters and operations are greatly reduced comparing to the original ones. Experiments demonstrate the effectiveness of the proposed method, as CBDNet can approximate image classification network ResNet-18 using 5. 25 bits, VGG-16 using 5. 47 bits, DenseNet-121 using 5. 72 bits, object detection networks SSD300 using 4. 38 bits, and semantic segmentation networks SegNet using 5. 18 bits, all with minor accuracy drops. 1

PDF Details

TCS Journal 2017 Journal Article

Neighbor sum distinguishing total coloring of planar graphs without 5-cycles

Shan Ge
Jianguo Li
Changqing Xu

Details DOI

IJCAI Conference 2015 Conference Paper

Optimal Bayesian Hashing for Efficient Face Recognition

Qi Dai
Jianguo Li
Jun Wang
Yurong Chen
Yu-Gang Jiang

In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overload and storage cost.

PDF Details

IJCAI Conference 2007 Conference Paper

Jianguo Li
Changshui Zhang
Tao Wang
Yimin Zhang

Bayesian network classifiers (BNC) have received considerable attention in machine learning field. Some special structure BNCs have been proposed and demonstrate promise performance. However, recent works show that structure learning in BNs may lead to a non-negligible posterior problem, i. e, there might be many structures have similar posterior scores. In this paper, we propose a generalized additive Bayesian network classifiers, which transfers the structure learning problem to a generalized additive models (GAM) learning problem. We first generate a series of very simple BNs, and put them in the framework of GAM, then adopt a gradient-based algorithm to learn the combining parameters, and thus construct a more powerful classifier. On a large suite of benchmark data sets, the proposed approach outperforms many traditional BNCs, such as naive Bayes, TAN, etc, and achieves comparable or better performance in comparison to boosted Bayesian network classifiers.

PDF Details