Hallucination Risks in Generative Deep Learning for Wearable Cardiovascular Monitoring: A Systematic Review of Quantitative Evaluation Methods
Abstract
Background: Generative deep learning is increasingly used to denoise, complete, translate, and synthesize wearable photoplethysmography and electrocardiography signals. These models can also create hallucinated cardiovascular structure that appears plausible while altering rhythms, morphology, or downstream risk estimates.

Objective: This systematic methods review evaluates quantitative approaches for detecting and managing hallucination risk in generative wearable cardiovascular monitoring.

Methods: We followed PRISMA-aligned evidence mapping and coded 46 reports published from 2017 to 2026 that involved generative time-series modeling, wearable cardiovascular signals, or decision-linked uncertainty evaluation. Metrics were grouped by signal fidelity, physiological feature preservation, distributional realism, downstream task utility, calibration, out-of-distribution stress testing, fairness, and expert review.

Results: Pointwise error and signal-to-noise measures were the most common evaluation tools, but they were weak proxies for local clinical harm when paired clean targets were unavailable. Physiological feature metrics and downstream classifiers were more decision-relevant, yet they could miss subgroup failures and model-induced rhythm artifacts. Only a small subset of reports quantified uncertainty calibration or used deferral analysis.

Conclusion: No single metric adequately evaluates hallucination risk. We propose a layered evaluation framework that combines paired fidelity, physiological constraints, task-specific decision loss, uncertainty calibration, and stress testing before generative models are deployed in wearable cardiovascular monitoring.
