Author name cluster

Haodong Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers

1 author row

TMLR Journal 2025 Journal Article

Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors

Haodong Lu
Xinyu Zhang
Kristen Moore
Jason Xue
Lina Yao
Anton van den Hengel
Dong Gong

Continual learning (CL) enables deep neural networks to acquire new knowledge over time while mitigating catastrophic forgetting of previously learned information. The powerful generalization ability of pre-trained models (PTMs), such as the Contrastive Language-Image Pre-training (CLIP) model, has inspired a range of CL methods targeting new and specialized tasks, further bridging the gap between PTMs and continual adaptation. Leveraging its multi-modal visual and textual representations, CLIP offers a natural paradigm for CL, where new tasks can be accommodated by incrementally learning lightweight parameters, particularly prompts. However, existing prompt-based CL methods for PTMs often rely on complex designs built upon specific assumptions, such as intricate regularization schemes for prompt pools, specialized routing mechanisms, or multi-stage incrementation processes. While these approaches improve performance, they frequently introduce additional, and possibly unnecessary, complexity, underutilizing CLIP's intrinsic capabilities. In this paper, we propose a concise CL approach for CLIP based on incremental prompt tuning that fully exploits its multi-modal structure and the stability of textual representations. Our method, Textual Prototype-guided Prompt Tuning (TPPT), introduces textual prototypes not merely as static classifiers, as in existing methods, but as stable anchors to guide the learning of visual prompts, thereby shaping the embedding space (i.e., TPPT-V). We show that our bidirectional supervision strategy enables more effective learning of new knowledge while reducing forgetting. To further close the vision-language gap during CL, we activate the language branch and extend our approach to jointly optimize both visual and textual prompts (i.e., TPPT-VT). We also introduce a relational diversity regularization on the textual anchors to prevent embedding space collapse and mitigate correlated forgetting. Extensive experiments and analyses demonstrate the effectiveness of our proposed approach, highlighting the benefits of leveraging CLIP's intrinsic guidance for continual adaptation.
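
The abstract above does not state the loss functions, so the following is only a rough sketch of the general idea of using fixed textual embeddings as anchors for visual features. It assumes a generic CLIP-style cosine-similarity softmax and a simple mean pairwise similarity penalty; the names `anchor_loss` and `anchor_diversity_penalty`, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def anchor_loss(image_feat, anchors, label, temperature=0.07):
    """Cross-entropy of one image feature against per-class textual anchors.

    Assumption: logits are temperature-scaled cosine similarities,
    as in generic CLIP-style classification.
    """
    logits = [cosine(image_feat, a) / temperature for a in anchors]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def anchor_diversity_penalty(anchors):
    """Mean cosine similarity over distinct anchor pairs (lower is more diverse).

    Assumption: a stand-in for a diversity regularizer, not the paper's term.
    """
    n = len(anchors)
    sims = [cosine(anchors[i], anchors[j])
            for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)


# Toy example: two orthogonal class anchors in a 2-D embedding space.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = anchor_loss([0.9, 0.1], anchors, label=0)
misaligned = anchor_loss([0.1, 0.9], anchors, label=0)
```

In this toy setup the loss is smaller when the image feature points toward its class anchor, which is the pull that a textual anchor would exert on the visual prompts during training.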


NeurIPS Conference 2025 Conference Paper

Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning

Ruilin Tong
Haodong Lu
Yuhang Liu
Dong Gong

Continual learning (CL) aims to incrementally train a model on a sequence of tasks while maintaining performance on previously seen ones. Despite their effectiveness in mitigating forgetting, data storage and replay may be infeasible due to privacy or security constraints, and are impractical or unavailable for arbitrary pre-trained models. Data-free or exemplar-free CL aims to continually update models with new tasks without storing previous data. In addition to regularizing updates, we employ model inversion to synthesize data from the trained model, anchoring learned knowledge through replay without retaining old data. However, model inversion in predictive models faces two key challenges. First, generating inputs (e.g., images) solely from highly compressed output labels (e.g., classes) often causes drift between synthetic and real data. Replaying on such synthetic data can contaminate and erode knowledge learned from real data, further degrading inversion quality over time. Second, performing inversion is usually computationally expensive, as each iteration requires backpropagation through the entire model and many steps are needed for convergence. These problems are more severe with large pre-trained models such as Contrastive Language-Image Pre-training (CLIP) models. To improve model inversion efficiency, we propose a Per-layer Model Inversion (PMI) approach inspired by the faster convergence of single-layer optimization. The inputs optimized from PMI provide strong initialization for full-model inversion, significantly reducing the number of iterations required for convergence. To address feature distribution shift, we model class-wise feature distribution using a Gaussian distribution and preserve distributional information with a contrastive model. Sampling features for inversion ensures alignment between synthetic and real feature distributions. Combining PMI and feature modeling, we demonstrate the feasibility of incrementally training models on new classes by generating data from pseudo image features mapped through semantic-aware feature projection. Our method shows strong effectiveness and compatibility across multiple CL settings.
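
The class-wise Gaussian feature modeling mentioned above can be illustrated with a minimal sketch. It assumes a diagonal covariance (one standard deviation per feature dimension), which the abstract does not specify; the function names `fit_class_gaussians` and `sample_features` and the toy data are hypothetical, and the inversion step that would turn sampled features into images is not shown.

```python
import random
import statistics


def fit_class_gaussians(features, labels):
    """Estimate a per-class mean and per-dimension standard deviation.

    Assumption: diagonal covariance, chosen here only for simplicity.
    """
    by_class = {}
    for feat, label in zip(features, labels):
        by_class.setdefault(label, []).append(feat)
    stats = {}
    for label, feats in by_class.items():
        dims = list(zip(*feats))  # transpose: one tuple per feature dimension
        mean = [statistics.fmean(d) for d in dims]
        std = [statistics.pstdev(d) for d in dims]
        stats[label] = (mean, std)
    return stats


def sample_features(stats, label, n, rng):
    """Draw n pseudo feature vectors from the Gaussian stored for one class."""
    mean, std = stats[label]
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]


# Toy example: three 2-D feature vectors from two classes.
features = [[0.0, 1.0], [2.0, 3.0], [10.0, 10.0]]
labels = [0, 0, 1]
stats = fit_class_gaussians(features, labels)
pseudo = sample_features(stats, label=0, n=4, rng=random.Random(0))
```

Only the per-class statistics need to be kept between tasks, so old features can be resampled later without storing any of the original data.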
