ESGEventDB: A Corporate ESG Controversy Database for Risk Scoring and Text-Guided Analytics
Main article
Abstract
This paper introduces ESGEventDB, an event-level corporate ESG controversy database covering 12,268 events involving 3,841 publicly listed companies across 68 countries from 2015 to 2024. To our knowledge, few publicly available databases provide granular, event-level controversy records with validated severity scores, source metadata, and structured company identifiers suitable for both quantitative risk modelling and NLP research. Events are sourced from news wires, regulatory filings, NGO reports, and social media, then classified into 42 subcategories across the three ESG pillars using a fine-tuned FinBERT pipeline followed by domain-expert validation (inter-annotator κ = 0.84). Severity scores are assigned from text-derived indicators, independently of market outcomes; out-of-sample validation shows that critical-severity events are associated with significantly more negative 90-day cumulative abnormal returns. The public release comprises metadata, labels, and source URLs under CC BY 4.0; full source texts are available under restricted access because of third-party copyright constraints. Benchmark experiments on two tasks, pillar classification (macro F1 = 0.831 ± 0.008) and severity classification (macro F1 = 0.724 ± 0.011), show that domain-adapted ESG-BERT outperforms general-purpose models. The database and replication code will be released upon acceptance through a Zenodo repository with a permanent DOI.
