Detecting Hallucinations in 1D Generative Models: A Decision-Theoretic Approach to Quality Control for Synthetic Biosignals
Abstract
Generative deep-learning models are increasingly used to denoise, harmonise, and adapt one-dimensional biosignals before they are passed to clinical classification pipelines. While these models can substantially narrow the gap between the training domain and the data observed in deployment, they are also prone to producing visually plausible outputs that nonetheless contain spurious features, a phenomenon we describe as one-dimensional hallucination. Standard quality indicators, such as reconstruction error, Fréchet-style distances, or perceptual similarity, are either unavailable in unpaired settings or insensitive to small but clinically meaningful artefacts. This paper proposes a decision-theoretic framework that evaluates synthetic biosignals from the perspective of the downstream task they are intended to support. The predictive entropy of a frozen, well-calibrated atrial-fibrillation classifier acts as a proxy for the Bayes-optimal misclassification risk of the adapted example, allowing per-instance trustworthiness scoring without requiring ground-truth references for the generated signals themselves. Using a 1D pix2pix denoiser trained on a heavily augmented variant of a public photoplethysmography dataset, we show that classifier entropy is a reliable selector of high-utility outputs, that a 75% retention threshold recovers the performance of the unaugmented baseline, and that calibration improves measurably (UCE_total = 0.087 → 0.038) after generative adaptation. The approach formalises a heuristic widely used in practice (judging synthetic data by what a downstream model does with it) and turns it into a quantitative quality-control instrument suitable for wearable-health pipelines and similar safety-aware applications.
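
As a concrete illustration of the selection rule summarised in the abstract, the following minimal sketch computes per-instance predictive entropy from a frozen classifier's softmax outputs and retains the 75% lowest-entropy examples. The `frozen_classifier.predict_proba` call and the array names are illustrative assumptions, not an API defined in this paper.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of class probabilities."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def retain_low_entropy(probs: np.ndarray, retention: float = 0.75) -> np.ndarray:
    """Boolean mask keeping the `retention` fraction of examples
    with the lowest predictive entropy, i.e. the most trusted outputs."""
    h = predictive_entropy(probs)
    cutoff = np.quantile(h, retention)
    return h <= cutoff

# Hypothetical usage with a frozen downstream classifier:
# probs = frozen_classifier.predict_proba(adapted_signals)
# mask = retain_low_entropy(probs, retention=0.75)
# trusted_signals = adapted_signals[mask]
```

Low entropy here stands in for low estimated misclassification risk; the quantile cutoff implements the retention threshold rather than a fixed entropy value, so the rule adapts to the score distribution of each batch.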
