Arrow Research search

Author name cluster

Disen Lan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

NeurIPS 2025 Conference Paper

Improving Bilinear RNN with Closed-loop Control

  • Jiaxi Hu
  • Yongqi Pan
  • Jusen Du
  • Disen Lan
  • Tang Tang
  • Qingsong Wen
  • Yuxuan Liang
  • Weigao Sun

Recent efficient sequence modeling methods, such as Gated DeltaNet, TTT, and RWKV-7, have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, resulting in a bilinear recursive structure. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency on both language modeling and vision tasks.
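For intuition, here is a minimal sequential PyTorch sketch of a Delta-rule-style bilinear recurrence with a scalar-plus-low-rank state transition, the model family the abstract describes. The update form and the gate names (`alpha`, `beta`) are illustrative assumptions; Comba's exact feedback corrections and its chunk-wise parallel Triton kernel are not reproduced here.

```python
import torch

def bilinear_step(S, q, k, v, alpha, beta):
    # Scalar-plus-low-rank state transition (illustrative form, not Comba's exact rule):
    #   S_t = alpha_t * (I - beta_t * k_t k_t^T) S_{t-1} + k_t v_t^T
    S = alpha * (S - beta * torch.outer(k, k @ S)) + torch.outer(k, v)
    return S, q @ S  # read the new state out with the query

def naive_recurrence(q, k, v, alpha, beta):
    # Sequential reference loop; efficient implementations use
    # chunk-wise parallel kernels instead.
    T, d_k = k.shape
    S = torch.zeros(d_k, v.shape[-1])
    outs = []
    for t in range(T):
        S, o = bilinear_step(S, q[t], k[t], v[t], alpha[t], beta[t])
        outs.append(o)
    return torch.stack(outs)

# Toy usage: T = 8 steps, 4-dim keys/values, scalar gates per step.
T, d = 8, 4
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
alpha, beta = torch.rand(T), torch.rand(T)
out = naive_recurrence(q, k, v, alpha, beta)  # (T, d)
```

The interaction between the state `S` and the key `k` in the transition term is what makes the recurrence bilinear, as opposed to the purely diagonal or gated transitions of earlier state-space models.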

ICML 2025 Conference Paper

Liger: Linearizing Large Language Models to Gated Recurrent Structures

  • Disen Lan
  • Weigao Sun
  • Jiaxi Hu
  • Jusen Du
  • Yu Cheng 0001

Transformers with linear recurrent modeling offer linear-time training and constant-memory inference. Despite their demonstrated efficiency and performance, pretraining such non-standard architectures from scratch remains costly and risky. The linearization of large language models (LLMs) transforms pretrained standard models into linear recurrent structures, enabling more efficient deployment. However, current linearization methods typically introduce additional feature map modules that require extensive fine-tuning, and they overlook the gating mechanisms used in state-of-the-art linear recurrent models. To address these issues, this paper presents Liger, short for Linearizing LLMs to gated recurrent structures. Liger is a novel approach for converting pretrained LLMs into gated linear recurrent models without adding extra parameters. It repurposes the pretrained key matrix weights to construct diverse gating mechanisms, facilitating the formation of various gated recurrent structures while avoiding the need to train additional components from scratch. Using lightweight fine-tuning with Low-Rank Adaptation (LoRA), Liger restores the performance of the linearized gated recurrent models to match that of the original LLMs. Additionally, we introduce Liger Attention, an intra-layer hybrid attention mechanism that recovers 93% of Transformer-based LLM performance using only 0.02% of the pre-training tokens during linearization, achieving competitive results across multiple benchmarks, as validated on models ranging from 1B to 8B parameters.
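As a rough illustration of the weight-repurposing idea, the sketch below derives a per-channel decay gate from a pretrained key projection and uses it in a gated linear recurrence. The module and variable names are hypothetical; Liger's actual gate constructions, LoRA fine-tuning, and Liger Attention mechanism are described in the paper.

```python
import torch
import torch.nn as nn

class RepurposedGateRecurrence(nn.Module):
    # Hypothetical sketch: reuse a pretrained key projection to produce a
    # gate, rather than training a new gating module from scratch.
    def __init__(self, w_k: nn.Linear):
        super().__init__()
        self.w_k = w_k  # stands in for the frozen pretrained key projection W_K

    def forward(self, x, q, k, v):
        # x: (T, d_model); q, k: (T, d_k); v: (T, d_v)
        g = torch.sigmoid(self.w_k(x))            # (T, d_k) decay gates in (0, 1)
        S = x.new_zeros(k.shape[-1], v.shape[-1])
        outs = []
        for t in range(x.shape[0]):
            # Gated linear recurrence: S_t = diag(g_t) S_{t-1} + k_t v_t^T
            S = g[t].unsqueeze(-1) * S + torch.outer(k[t], v[t])
            outs.append(q[t] @ S)
        return torch.stack(outs)

# Toy usage with a randomly initialized stand-in for the pretrained W_K.
w_k = nn.Linear(64, 32, bias=False)
layer = RepurposedGateRecurrence(w_k)
x = torch.randn(10, 64)
q, k, v = (nn.Linear(64, 32, bias=False)(x) for _ in range(3))
out = layer(x, q, k, v)  # (10, 32)
```

Because the gate is computed from weights the LLM already has, no new parameters are introduced; only lightweight adaptation (e.g., LoRA) is needed afterwards.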

ICML 2025 Conference Paper

TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting

  • Yifan Hu 0006
  • Guibin Zhang
  • Peiyuan Liu
  • Disen Lan
  • Naiqi Li
  • Dawei Cheng
  • Tao Dai 0001
  • Shu-Tao Xia

Time series forecasting methods generally fall into two main categories: Channel Independent (CI) and Channel Dependent (CD) strategies. While CI overlooks important covariate relationships, CD captures all dependencies without distinction, introducing noise and reducing generalization. Recent advances in Channel Clustering (CC) aim to refine dependency modeling by grouping channels with similar characteristics and applying tailored modeling techniques. However, coarse-grained clustering struggles to capture complex, time-varying interactions effectively. To address these challenges, we propose TimeFilter, a GNN-based framework for adaptive and fine-grained dependency modeling. After constructing the graph from the input sequence, TimeFilter refines the learned spatial-temporal dependencies by filtering out irrelevant correlations while preserving the most critical ones in a patch-specific manner. Extensive experiments on 13 real-world datasets from diverse application domains demonstrate the state-of-the-art performance of TimeFilter. The code is available at https://github.com/TROUBADOUR000/TimeFilter.
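The filtration step can be pictured as sparsifying a learned patch-to-patch adjacency so that each patch keeps only its most relevant dependencies. The top-k masking below is a generic stand-in for illustration only; TimeFilter's actual learned, patch-specific filtering is defined in the paper and repository.

```python
import torch

def filter_adjacency(adj: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k strongest edges per patch node and zero the rest.
    Generic top-k sparsification, not TimeFilter's learned filtration."""
    idx = adj.topk(k, dim=-1).indices                  # (N, k) strongest neighbors
    mask = torch.zeros_like(adj).scatter_(-1, idx, 1.0)
    return adj * mask

# Example: dense similarity graph over 16 patches with 32-dim embeddings.
emb = torch.randn(16, 32)
adj = torch.softmax(emb @ emb.T, dim=-1)               # dense spatial-temporal graph
sparse_adj = filter_adjacency(adj, k=4)                # patch-specific pruning
```

Pruning per node (rather than with one global threshold) is what makes the filtering patch-specific: each patch retains a dependency set tailored to its own neighborhood.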

AAAI 2024 Conference Paper

Diffusion Language-Shapelets for Semi-supervised Time-Series Classification

  • Zhen Liu
  • Wenbin Pei
  • Disen Lan
  • Qianli Ma

Semi-supervised time-series classification can effectively alleviate the scarcity of labeled data. However, existing approaches usually ignore model interpretability, making it difficult for humans to understand the principles behind a model's predictions. Shapelets are a set of discriminative subsequences that show high interpretability in time-series classification tasks, and shapelet learning-based methods have demonstrated promising classification performance. Unfortunately, without enough labeled data, the shapelets learned by existing methods are often poorly discriminative and even dissimilar to any subsequence of the original time series. To address this issue, we propose the Diffusion Language-Shapelets model (DiffShape) for semi-supervised time-series classification. In DiffShape, a self-supervised diffusion learning mechanism is designed that uses real subsequences as a condition; this increases the similarity between the learned shapelets and real subsequences by exploiting a large amount of unlabeled data. Furthermore, we introduce a contrastive language-shapelets learning strategy that improves the discriminability of the learned shapelets by incorporating natural language descriptions of the time series. Experiments on the UCR time series archive reveal that the proposed DiffShape achieves state-of-the-art performance and exhibits superior interpretability over baselines.
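The contrastive language-shapelets strategy can be sketched as a symmetric InfoNCE objective that pulls each shapelet embedding toward the embedding of its paired text description. This is a generic CLIP-style stand-in, an assumption rather than DiffShape's exact loss; the diffusion mechanism conditioned on real subsequences is omitted.

```python
import torch
import torch.nn.functional as F

def language_shapelet_loss(shapelet_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE between shapelet and text-description embeddings.
    Illustrative stand-in for DiffShape's contrastive objective."""
    s = F.normalize(shapelet_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature                 # (B, B) pairwise similarities
    labels = torch.arange(len(s), device=s.device) # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Toy usage: a batch of 8 shapelets paired with 8 text descriptions.
loss = language_shapelet_loss(torch.randn(8, 128), torch.randn(8, 128))
```

Aligning shapelets with natural language descriptions gives each learned subsequence a human-readable anchor, which is the source of the interpretability gain the abstract claims.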