When Transformers Forget: A Study of Catastrophic Forgetting in Continual Learning for NLP Tasks
Abstract
Catastrophic forgetting remains one of the most stubborn practical problems in deploying large language models in continual learning scenarios, yet the scale at which it manifests in modern transformer architectures is poorly characterised. This paper presents a systematic empirical study of forgetting behaviour across five transformer architectures — BERT, RoBERTa, T5, GPT-2, and LLaMA-3.1-8B — when sequentially fine-tuned on six NLP benchmarks spanning text classification, question answering, and named entity recognition. We measure forgetting through three complementary lenses: performance drop on previously learned tasks, representational drift in intermediate layers, and attention pattern disruption. Our results reveal that decoder-only models exhibit significantly higher forgetting rates than encoder architectures on sequential classification tasks, whereas encoder models degrade more sharply when task types shift between token-level and sequence-level objectives. We further show that simple replay-based mitigations reduce average forgetting by 34–41% without architectural changes, and that the forgetting trajectory is highly predictable from early fine-tuning dynamics, opening the door to adaptive early-stopping strategies. These findings carry direct implications for practitioners deploying models in production environments where new task data arrives continuously.
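For reference, the first of these lenses (performance drop on previously learned tasks) is typically formalised as average forgetting. The abstract does not state the exact definition used in the study, so the formulation below is the standard one and should be read only as a sketch, where $a_{t,i}$ denotes accuracy on task $i$ after training on task $t$, and $T$ is the number of tasks seen so far:

\[
F = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( \max_{t \in \{i, \dots, T-1\}} a_{t,i} - a_{T,i} \right)
\]

Under this reading, each term compares the best accuracy a task ever reached during the sequence with its accuracy after the final task, so $F$ grows as earlier tasks degrade.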
