Main article

Hongtao Lin
School of Pharmacy, Wenzhou Medical University, Wenzhou 325035, China
Yuwei Zhao*
School of Public Health, Anhui Medical University, Hefei 230032, China
zhaoyw@ahmu.edu.cn
Jianhua Sun
Department of Information Engineering, Henan University of Chinese Medicine, Zhengzhou 450046, China
Xueting Wei
Department of Clinical Pharmacology, Guangdong Pharmaceutical University, Guangzhou 510006, China

DOI: https://doi.org/10.63646/datamind.2023.010102

Abstract

Pharmacovigilance databases such as the FDA Adverse Event Reporting System (FAERS), the WHO VigiBase, the European EudraVigilance system, and the Japanese JADER store millions of spontaneous adverse drug event reports, yet their structural heterogeneity, sparse coding, and limited interpretability still constrain rapid safety signal discovery. Rather than offering a generic review, this article treats the pharmacovigilance database itself as the central object of study. It documents the schema, field dictionary, ingestion pipeline, quality controls, and reusable access interfaces of a multi-source pharmacovigilance lakehouse, which we then use for explainable adverse-event surveillance. The principal contribution is an end-to-end design that converts the relational and free-text records into a drug-event-patient-attribute knowledge graph, indexes the underlying narrative text and structured rows with dense vector representations, and exposes both layers to a graph-based retrieval-augmented generation (GraphRAG) pipeline guided by causal prompts. We describe how the architecture relates to the relational, graph-database, vector-store, and lakehouse layers behind it, and we report a runnable experiment on a working subset of 1,284,569 case reports drawn from FAERS from Q1 2014 through Q4 2022. The system raises signal recall from 64.1 percent under reporting-odds-ratio baselines to 83.6 percent, lifts evidence-chain correctness from 54.3 percent for an ungrounded large language model baseline to 86.4 percent when graph retrieval and causal prompting are combined, and reduces expert audit time per flagged case from 42.8 minutes to 12.7 minutes. Field coverage, missingness, and noise rates are reported for every source database, and the full schema, dictionaries, and access notebooks are released under an open license. These results indicate that database-centric design choices, rather than model size, dominate practical safety surveillance value.
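For readers unfamiliar with the reporting-odds-ratio baseline named in the abstract, the following minimal Python sketch shows how such a disproportionality statistic is commonly computed from a 2x2 contingency table of case counts. It is an illustration only, not the authors' implementation. The function name `reporting_odds_ratio`, the `RorResult` container, and the signal rule (at least three cases and a 95 percent confidence lower bound above 1) are assumptions chosen for the example, and the counts in the usage lines are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RorResult:
    ror: float       # point estimate of the reporting odds ratio
    ci_low: float    # lower bound of the confidence interval
    ci_high: float   # upper bound of the confidence interval
    is_signal: bool  # True if the (assumed) signal rule is met


def reporting_odds_ratio(a: int, b: int, c: int, d: int,
                         min_cases: int = 3, z: float = 1.96) -> RorResult:
    """Compute the ROR from a 2x2 table of spontaneous report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if min(a, b, c, d) <= 0:
        # Zero cells make the ratio or its standard error undefined;
        # a continuity correction is a common alternative, not applied here.
        raise ValueError("all four cell counts must be positive")

    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ror = math.log(ror)
    ci_low = math.exp(log_ror - z * se)
    ci_high = math.exp(log_ror + z * se)

    # Assumed signal rule: enough cases and a lower bound above 1.
    is_signal = a >= min_cases and ci_low > 1.0
    return RorResult(ror, ci_low, ci_high, is_signal)


# Hypothetical counts for one drug-event pair (not taken from FAERS).
result = reporting_odds_ratio(a=20, b=80, c=100, d=9800)
print(f"ROR={result.ror:.2f}, 95% CI=({result.ci_low:.2f}, {result.ci_high:.2f}), "
      f"signal={result.is_signal}")
```

A baseline of this kind scores each drug-event pair independently from counts alone, which is why it offers no evidence chain for an auditor to inspect.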

Article details

How to Cite

Lin, H., Zhao, Y., Sun, J., & Wei, X. (2023). GraphRAG for Adverse-Event Surveillance: Applying Pharmacovigilance Databases to Explainable Safety Signal Discovery. DATAMIND, 1(1), 6-19. https://doi.org/10.63646/datamind.2023.010102