Supply-Chain Disruption Forecasting from News and Logistics Databases
Abstract
Geopolitical crises, extreme weather events, port congestion surges, and cargo flight disruptions propagate through global supply chains in ways that are observable days to weeks before their downstream impact appears as stockouts, delivery delays, or revenue losses. Yet no unified, open database currently links unstructured news event streams to structured logistics records (port dwell times, air cargo delays, purchase-order completion rates, and inventory snapshots) within a single schema that supports reproducible forecasting research. This paper introduces SupplyDisruptDB, a relational database system that integrates news and logistics data streams into a coherent, risk-oriented schema of six entities: NewsEvent, PortStatus, FlightCargo, PurchaseOrder, InventorySnapshot, and RiskScore. A five-stage ingestion pipeline applies named-entity recognition, sentiment scoring, and geospatial event parsing to news corpora, fuses the extracted signals with port congestion indices derived from AIS (Automatic Identification System) data and OAG flight delay records, enforces structured quality-control validation, and computes per-SKU supply-chain risk scores using an LSTM-based forecasting model. The system is validated on a 36-month corpus spanning 187,400 news events, 14,280 port status records across 38 major seaports, 312,600 air cargo flight segments, 94,700 purchase-order records, and 61,200 inventory snapshots across 12 industry verticals. On this corpus, SupplyDisruptDB supports delivery-delay prediction with a mean absolute error of 1.84 days (LSTM, full database), stockout-event detection with an F1 score of 0.871, and a risk lead time of 8.4 days, where lead time is the advance warning horizon before a disruption event reaches a critical inventory threshold. The database is released as open-source software under the Apache 2.0 license with a documented REST and GraphQL API, a Python client library, and reproducible experiment notebooks, providing a reusable foundation for supply-chain resilience research and operational risk intelligence.
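The three headline metrics can be made concrete with a small sketch. The abstract does not give formulas, so the code below assumes the standard definitions of mean absolute error and F1, and it assumes that risk lead time is the mean number of days between the first high-risk alert and the date inventory crosses its critical threshold. The function names and all toy values are hypothetical illustrations, not part of SupplyDisruptDB or its corpus.

```python
from datetime import date


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted delays (in days)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def f1_score(actual, predicted):
    """F1 for binary stockout labels (1 = stockout occurred, 0 = no stockout)."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_lead_time_days(alert_breach_pairs):
    """Mean days from first alert to threshold breach (assumed definition)."""
    if not alert_breach_pairs:
        raise ValueError("at least one (alert, breach) pair is required")
    gaps = [(breach - alert).days for alert, breach in alert_breach_pairs]
    return sum(gaps) / len(gaps)


# Toy inputs, invented purely for illustration.
toy_mae = mean_absolute_error([3.0, 0.0, 5.0, 2.0], [2.0, 1.0, 5.0, 4.0])
toy_f1 = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
toy_lead = mean_lead_time_days([
    (date(2024, 3, 1), date(2024, 3, 9)),
    (date(2024, 5, 10), date(2024, 5, 20)),
])

print(toy_mae, toy_f1, toy_lead)
```

Under these assumed definitions, a larger lead time means earlier warning, while lower MAE and higher F1 indicate better delay and stockout predictions respectively.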
