When Transformers Forget: A Study of Catastrophic Forgetting in Continual Learning for NLP Tasks
Abstract
Catastrophic forgetting remains one of the most stubborn practical problems in deploying large language models in continual learning scenarios, yet the scale at which it manifests in modern transformer architectures is poorly characterised. This paper presents a systematic empirical study of forgetting behaviour across five transformer architectures — BERT, RoBERTa, T5, GPT-2, and LLaMA-3.1-8B — when sequentially fine-tuned on six NLP benchmarks spanning text classification, question answering, and named entity recognition. We measure forgetting through three complementary lenses: performance drop on previously learned tasks, representational drift in intermediate layers, and attention pattern disruption. Our results reveal that decoder-only models exhibit significantly higher forgetting rates than encoder architectures on sequential classification tasks, whereas encoder models degrade more sharply when task types shift between token-level and sequence-level objectives. We further show that simple replay-based mitigations reduce average forgetting by 34–41% without architectural changes, and that the forgetting trajectory is highly predictable from early fine-tuning dynamics, opening the door to adaptive early-stopping strategies. These findings carry direct implications for practitioners deploying models in production environments where new task data arrives continuously.
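For reference, the first of these lenses (performance drop on previously learned tasks) is typically formalised as average forgetting. The abstract does not state the exact definition used in the study, so the formulation below is the standard one and should be read only as a sketch, where $a_{t,i}$ denotes accuracy on task $i$ after training on task $t$, and $T$ is the number of tasks seen so far:

\[
F = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( \max_{t \in \{i, \dots, T-1\}} a_{t,i} - a_{T,i} \right)
\]

Under this reading, each term compares the best accuracy a task ever reached during the sequence with its accuracy after the final task, so $F$ grows as earlier tasks degrade.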
