Main article

Mei-Xian Zhang*
Department of Biomedical Informatics and Data Science, Peking University, Beijing 100871, China
mei.zhang@hsc.pku.edu.cn
David Osei-Bonsu
School of Public Health, University of Ghana, Accra LG 13, Ghana

Abstract

Electronic health record (EHR) systems generate vast, heterogeneous, and temporally structured data streams, and detecting anomalies in these streams is critical for patient safety, regulatory compliance, and clinical research integrity. Existing approaches are often trained on single-institution datasets, evaluated on narrow anomaly taxonomies, or reliant on domain-specific feature engineering that limits generalizability. We present NeuralGuard, a multi-modal ensemble framework that integrates Isolation Forest for unsupervised baseline profiling, XGBoost for structured feature classification, a bidirectional LSTM for temporal sequence modeling, and a Transformer-based attention encoder for context-aware representation learning. NeuralGuard is trained and evaluated on three publicly available, real-world clinical benchmarks: MIMIC-IV (52,473 critical care and 31,912 emergency department records), the eICU Collaborative Research Database (139,367 records), and the PhysioNet Challenge 2019 dataset (18,928 records). Together these form a combined evaluation corpus of 242,680 patient records spanning 2008–2022 and six classes (medication errors, vitals crises, laboratory outliers, duplicate entries, data corruption, and normal controls). NeuralGuard achieves an AUROC of 0.961 and a macro-averaged F1-score of 0.943 on the combined test set, outperforming five recent baseline methods by 2.7–11.4% in AUROC. Ablation experiments show that the Transformer component contributes the largest marginal AUROC gain (+0.027) and that SHAP-based explainability preserves 99.5% of detection performance. Cross-dataset generalization experiments show that within-dataset AUROC values (0.955–0.963) exceed cross-dataset AUROC values (0.812–0.882), motivating future domain-adaptation research. NeuralGuard is released as open-source software alongside the complete preprocessing pipeline, enabling replication and community-driven extension.
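For readers unfamiliar with the macro-averaged F1-score reported above, the following is a minimal sketch of how that metric is conventionally computed over a multi-class label set. It is not taken from the NeuralGuard codebase: the class label strings and the function name `macro_f1` are hypothetical, chosen only to mirror the six classes named in the abstract, and a class with no true or predicted instances is assumed to score 0.0.

```python
from collections import Counter

# Hypothetical label strings mirroring the six classes named in the abstract.
CLASSES = [
    "medication_error",
    "vitals_crisis",
    "lab_outlier",
    "duplicate_entry",
    "data_corruption",
    "normal",
]


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); assumed 0.0 when the class never appears.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not drawn from any dataset.
    y_true = ["normal", "normal", "lab_outlier", "duplicate_entry"]
    y_pred = ["normal", "lab_outlier", "lab_outlier", "duplicate_entry"]
    subset = ["normal", "lab_outlier", "duplicate_entry"]
    print(round(macro_f1(y_true, y_pred, subset), 4))
```

Because every class contributes equally to the mean, rare classes such as duplicate entries weigh as much as normal controls, which is why macro averaging is a common choice for imbalanced anomaly-detection benchmarks.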

Article details

How to Cite

Zhang, M.-X., & Osei-Bonsu, D. (2026). NeuralGuard: A Multi-Modal Ensemble Framework for Real-Time Anomaly Detection in Large-Scale Electronic Health Record Databases Using MIMIC-IV, eICU-CRD, and PhysioNet Benchmarks. DATAMIND, 4(2), 1-15. https://doi.org/10.63646/datamind.2026.040201