Main article

Aoife Brennan
Department of Marine Science and Environmental Studies, University of Galway, Galway H91 TK33, Ireland
Lars Eriksson
Department of Physical Geography and Ecosystem Science, Lund University, 223 62 Lund, Sweden
MacLachlan*
School of Geography and Sustainable Development, University of Dundee, Dundee DD1 4HN, UK
f.z.maclachlan@dundee.ac.uk

DOI: https://doi.org/10.63646/datamind.2025.030305

Abstract

Marine plastic pollution is recognised as a global environmental crisis, yet evidence-based policy design and AI-driven monitoring systems are constrained by the severe fragmentation and inconsistency of existing observational datasets. Plastic abundance records derived from beach surveys, drone overpasses, vessel trawls, satellite remote sensing, and port activity logs are dispersed across incompatible registries, differ in spatial resolution, depth coverage, temporal cadence, and polymer-classification schemes, and lack the linked governance metadata required to evaluate the effectiveness of regulatory interventions. This paper introduces OceanPlasticDB, an open, schema-documented, multi-source marine plastic observation database integrating 593,500 georeferenced records from six primary observational programmes spanning 1990 to 2024 across the global ocean. The database resolves three systematic deficiencies of existing data products: (i) a hierarchical label-harmonisation pipeline standardises plastic-type annotations across 12 polymer classes using ISO 472 nomenclature and YOLO-v8 confidence-weighted relabelling; (ii) ocean-state covariates (HYCOM sea-surface currents, ERA5 winds, AIS port activity indices) are collocated to each observation to support drift modelling and source attribution; and (iii) a structured policy register encoding 214 national and supranational plastic governance events (bans, levies, extended producer responsibility schemes) enables pre- and post-intervention analysis. The storage architecture comprises PostGIS for georeferenced observations, TimescaleDB for time-series ocean state data, Apache Parquet for drone and satellite image tiles, and Neo4j for river-to-coast plastic source-to-sink pathway graphs. Experimental validation demonstrates that a YOLO-v8 hotspot detector trained on OceanPlasticDB achieves F1 = 0.891, outperforming single-source baselines by up to 25 percentage points. A Lagrangian drift model fine-tuned with database-collocated current features reduces 72-hour trajectory RMSE from 15.7 km to 7.3 km. A difference-in-differences policy analysis of 38 port single-use ban events documents a statistically significant 28% reduction in plastic density (p = 0.003) in treated coastal zones within 24 months. The full database, pipeline code, and benchmark harness are released under CC BY 4.0 with a persistent DOI.
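
The difference-in-differences comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook two-group, two-period form computed on group means; the abstract does not give the authors' actual specification, so the function name `did_estimate` and all density values below are hypothetical and are not drawn from OceanPlasticDB.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-period difference-in-differences on group means.

    Returns the change in the treated group minus the change in the
    control group, which removes any trend shared by both groups.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical plastic densities (items per 100 m of shoreline).
# These numbers are invented for illustration only.
treated_pre = [50.0, 60.0, 40.0]    # treated zones before a ban
treated_post = [40.0, 50.0, 30.0]   # treated zones after a ban
control_pre = [48.0, 52.0, 50.0]    # untreated zones, same early period
control_post = [47.0, 51.0, 49.0]   # untreated zones, same later period

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)

# Express the effect relative to the treated group's pre-ban mean.
relative_effect = effect / mean(treated_pre)

print(f"DiD effect: {effect:.1f} items per 100 m")
print(f"Relative change: {relative_effect:.0%}")
```

A negative estimate indicates that density fell more in treated zones than in control zones over the same window. A full analysis would typically add zone and time fixed effects and report uncertainty, which this sketch omits.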

Article details

How to Cite

Brennan, A., Eriksson, L., & MacLachlan, M. (2025). OceanPlasticDB: A Marine Plastic Observation Database for Environmental AI and Policy Evaluation. DATAMIND, 3(3), 60-75. https://doi.org/10.63646/datamind.2025.030305