
Author name cluster

Dulan Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers · 1 author row

Possible papers (2)

AAAI Conference 2026 · Conference Paper

Perturbing to Preserve: Defending Fragile Knowledge in Online Continual Learning

  • Dulan Zhou
  • Zijian Gao
  • Kele Xu

Online continual learning requires models to learn from non‑stationary data streams while retaining prior knowledge. We identify an overlooked phenomenon—knowledge fragility—where correctly learned instances are rapidly forgotten after minor parameter updates. Our analysis attributes this fragility to a temporal–spatial dual mechanism: temporal instability, where high-frequency parameter oscillations cause forgetting to outpace adaptation; and spatial vulnerability, where fragile instances lie in sharp, high‑curvature regions of the loss landscape that are extremely sensitive to optimization noise. These insights motivate PDFK (Perturbing to Defend Fragile Knowledge), a unified framework that defends fragile knowledge along both dimensions. Temporally, we apply exponential moving averaging to smooth parameter evolution and stabilize long‑term memory. Spatially, we inject minimal structured perturbations with a consistency constraint to flatten sharp regions and enhance robustness. PDFK requires no task‑boundary annotations. Extensive experiments demonstrate that PDFK substantially improves knowledge retention and outperforms strong baselines under diverse and challenging continual learning settings.
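The temporal and spatial defenses described in the abstract map naturally onto two small pieces of training code. The PyTorch-style sketch below is a hypothetical illustration under stated assumptions, not the authors' PDFK implementation: the function names, the random weight perturbation (a generic flatness-style surrogate for the paper's structured perturbation), and the hyperparameters eps, lam, and ema_decay are all assumptions.

```python
import copy

import torch
import torch.nn.functional as F


def ema_update(ema_model, model, decay=0.999):
    """Temporal defense: exponential moving average of the online weights,
    smoothing high-frequency parameter oscillations."""
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)


def pdfk_style_step(model, ema_model, optimizer, x, y,
                    eps=0.01, lam=1.0, ema_decay=0.999):
    """One online update: task loss at the clean weights, plus a consistency
    loss under a small random weight perturbation (spatial flattening)."""
    optimizer.zero_grad()

    # Task loss and backward pass at the unperturbed weights.
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    task_loss.backward()
    target = F.softmax(logits.detach(), dim=1)  # consistency target

    # Perturb the weights in place, keeping a backup for restoration.
    backup = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.add_(eps * torch.randn_like(p) * p.abs())

    # Consistency: predictions under perturbation should match the clean ones.
    logits_pert = model(x)
    consistency = F.kl_div(F.log_softmax(logits_pert, dim=1), target,
                           reduction="batchmean")
    (lam * consistency).backward()

    # Restore the unperturbed weights, then apply the accumulated gradients.
    with torch.no_grad():
        for p, b in zip(model.parameters(), backup):
            p.copy_(b)
    optimizer.step()

    ema_update(ema_model, model, ema_decay)
    return task_loss.item(), consistency.item()


# Usage: create ema_model = copy.deepcopy(model) once before the stream
# starts; test-time predictions can then be taken from ema_model.
```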

AAAI Conference 2025 · Conference Paper

Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning

  • Zijian Gao
  • Shanhao Han
  • Xingxing Zhang
  • Kele Xu
  • Dulan Zhou
  • Xinjun Mao
  • Yong Dou
  • Huaimin Wang

Logit-based knowledge distillation (KD) is commonly used to mitigate catastrophic forgetting in class-incremental learning (CIL) caused by data distribution shifts. However, strictly matching logit values between the student and teacher models conflicts with the cross-entropy (CE) loss objective of learning new classes, leading to significant recency bias (i.e., unfairness). To address this issue, we rethink the overlooked limitations of KD-based methods through empirical analysis. Inspired by our findings, we introduce a plug-and-play pre-processing method that normalizes the logits of both the student and teacher across all classes, rather than just the old classes, before distillation. This approach allows the student to focus on both old and new classes, capturing intrinsic inter-class relations from the teacher. By doing so, our method avoids the inherent conflict between KD and CE, maintaining fairness between old and new classes. Additionally, recognizing that overconfident teacher predictions can hinder the transfer of inter-class relations (i.e., dark knowledge), we extend our method to capture intra-class relations among different instances, ensuring fairness within old classes. Our method integrates seamlessly with existing logit-based KD approaches, consistently enhancing their performance across multiple CIL benchmarks without incurring additional training costs.
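As a rough illustration of the normalize-before-distillation idea, the sketch below standardizes both the student's and teacher's logits over all classes (old and new) per sample before a standard temperature-scaled distillation loss. The normalization choice (per-sample zero mean and unit variance), the temperature, and the function names are assumptions, not the authors' exact formulation.

```python
import torch.nn.functional as F


def normalize_logits(logits, eps=1e-6):
    """Per-sample standardization over ALL classes (old and new), so the
    distillation signal is not dominated by the scale of either model."""
    mean = logits.mean(dim=1, keepdim=True)
    std = logits.std(dim=1, keepdim=True)
    return (logits - mean) / (std + eps)


def fair_kd_loss(student_logits, teacher_logits, tau=2.0):
    """Logit-based KD applied after normalizing both sides across all
    classes; a plausible pre-process, not the paper's exact method."""
    s = normalize_logits(student_logits) / tau
    t = normalize_logits(teacher_logits.detach()) / tau
    return F.kl_div(F.log_softmax(s, dim=1),
                    F.softmax(t, dim=1),
                    reduction="batchmean") * (tau ** 2)
```

In a CIL training loop this term would simply be added to the usual cross-entropy on the current task's labels, e.g. loss = ce_loss + alpha * fair_kd_loss(student_logits, teacher_logits), where alpha is a weighting hyperparameter.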