Main article

Yujie Huang
Navigation College, Dalian Maritime University, Dalian 116026, Liaoning, China
Zhipeng Zhang*
Navigation College, Dalian Maritime University, Dalian 116026, Liaoning, China
zpzhang@dlmu.edu.cn
Hao Chen
Navigation College, Dalian Maritime University, Dalian 116026, Liaoning, China
Wei Liu
School of Transportation, Wuhan University of Technology, Wuhan 430063, Hubei, China

Abstract

Marine accident investigation reports constitute a critical yet substantially underutilized repository of maritime safety knowledge. These reports contain detailed causal chain analyses, contributing factor assessments, and remedial recommendations produced by professional marine investigators, yet their unstructured narrative format prevents systematic computational exploitation for safety pattern mining, risk quantification, and preventive decision support. Knowledge graphs (KGs) offer a principled representation for structuring accident causation knowledge as typed entity-relation networks that support both human-interpretable visualization and machine-executable semantic reasoning. However, constructing high-quality, domain-specific maritime safety KGs from narrative reports requires resolving four challenges: specialized maritime terminology, multi-hop causal chain extraction, hazard factor coupling identification, and disambiguation of entities with context-dependent meanings. This paper proposes an automated knowledge extraction pipeline that leverages large language models (LLMs) and makes two methodological contributions: a chain-of-thought (CoT) plus one-shot prompting strategy that guides LLMs to reason step by step through causal attribution before extracting entities and relations, and a comprehensive quality assessment framework that evaluates KG accuracy via semantic fidelity metrics and KG utility via graph complexity indicators. Applied to 700 marine accident investigation reports from the China Maritime Safety Administration (CMSA) spanning 2010–2023, the pipeline constructs a maritime safety KG containing 12,847 entities across 23 types and 31,562 typed relations across 18 relation categories.
Evaluation shows that the CoT plus one-shot strategy achieves F1-scores of 0.895 for entity extraction, 0.852 for relation extraction, and 0.871 for event extraction, representing improvements of 22.4%, 24.8%, and 21.9%, respectively, over GPT-4 zero-shot baselines. Hazard factor coupling analysis reveals that human error co-occurs with navigation mistakes in 81% of collision accidents and with fatigue in 76% of night-time incidents, providing actionable insights for targeted safety interventions. The framework advances maritime safety informatics by enabling scalable, data-driven knowledge construction from the large corpus of existing accident reports.
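
To illustrate the kind of statistic reported in the hazard factor coupling analysis, the sketch below computes a co-occurrence rate under one plausible reading: the share of accidents of a given type in which two factors both appear. The abstract does not specify the exact definition used in the paper, so the record layout, the function name `co_occurrence_rate`, the factor labels, and the toy data are all assumptions made for illustration only.

```python
# Hypothetical record layout (not from the paper):
# each accident is a pair (accident_type, set of hazard factor labels).
Record = tuple[str, frozenset[str]]


def co_occurrence_rate(
    records: list[Record],
    accident_type: str,
    factor_a: str,
    factor_b: str,
) -> float:
    """Share of accidents of `accident_type` whose factors include both labels."""
    # Keep only the factor sets of accidents with the requested type.
    subset = [factors for kind, factors in records if kind == accident_type]
    if not subset:
        return 0.0  # No accidents of this type: define the rate as zero.
    both = sum(1 for factors in subset if factor_a in factors and factor_b in factors)
    return both / len(subset)


# Toy data, invented purely to exercise the function.
records: list[Record] = [
    ("collision", frozenset({"human error", "navigation mistake"})),
    ("collision", frozenset({"human error", "navigation mistake", "fatigue"})),
    ("collision", frozenset({"equipment failure"})),
    ("collision", frozenset({"human error"})),
    ("grounding", frozenset({"human error", "navigation mistake"})),
]

rate = co_occurrence_rate(records, "collision", "human error", "navigation mistake")
print(f"{rate:.2f}")  # 2 of the 4 toy collisions contain both factors
```

In a knowledge graph setting, the factor sets would presumably be read from the extracted entity-relation network rather than from hand-built tuples; that step is omitted here.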

Article details

How to Cite

Huang, Y., Zhang, Z., Chen, H., & Liu, W. (2024). Automated Maritime Safety Knowledge Graph Construction Using Large Language Models with Chain-of-Thought Prompting and Quality Assessment Framework. Journal of Intelligent Industrial Convergence, 4(1), 1-12. https://doi.org/10.63646/jiic.2024.040101