On-Device Deception Monitoring for Clinical LLM Assistants: A Privacy-Preserving Biomedical AI Safety Framework
Main article
Abstract
Clinical large language model (LLM) assistants are increasingly positioned as triage aides, discharge educators, documentation companions, and patient-facing conversational interfaces. Their usefulness depends not only on factual accuracy but also on whether their hidden or intermediate reasoning is safe. A clinical assistant may produce a plausible final answer while masking weak evidence, overstating guideline support, avoiding uncertainty, or aligning with the user's preference instead of the patient's risk profile. Cloud-based oversight can detect some unsafe behavior, but it requires transmitting sensitive clinical text, introduces a service dependency, and is poorly matched to bedside devices, home-monitoring gateways, and low-connectivity care settings. This article proposes an on-device deception monitoring framework for clinical LLM assistants. The framework adapts self-supervised reasoning-trace monitoring, contrastive representation learning, entropy-filtered self-labeling, and clinician-centered escalation to biomedical contexts. It treats reasoning traces as local safety evidence rather than as cloud data, and it separates ordinary uncertainty from clinically meaningful deception patterns such as unsupported reassurance, contraindication suppression, false guideline attribution, hidden task substitution, and patient-preference sycophancy. A scenario-based evaluation protocol is developed with 720 simulated clinical prompts across emergency triage, medication counseling, oncology follow-up, chronic disease self-management, and postoperative rehabilitation. The illustrative analysis shows that on-device contrastive monitoring lowers the Deceptive Trace Rate from 31.6% under output-only checking to 22.8%, and to 14.7% when combined with a clinician review queue, while substantially reducing privacy exposure compared with cloud teacher monitoring.
The contribution is a privacy-preserving biomedical AI safety architecture that links edge deployment, trace-level risk analytics, and clinical governance. The framework is not presented as a replacement for physician judgment; rather, it provides a deployable safety layer for keeping clinical LLM assistance auditable, privacy-aware, and accountable.
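To make the size of the reported effect easier to read, the following minimal sketch computes the absolute and relative drop in Deceptive Trace Rate from the three percentages quoted in the abstract. The helper name `reduction` and the relative-reduction framing are illustrative assumptions; the abstract reports only the three rates themselves.

```python
def reduction(baseline_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration; not part of the framework itself.
    """
    absolute = baseline_pct - new_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


# Deceptive Trace Rates as quoted in the abstract (percent).
DTR_OUTPUT_ONLY = 31.6   # output-only checking (baseline)
DTR_ON_DEVICE = 22.8     # on-device contrastive monitoring
DTR_WITH_REVIEW = 14.7   # on-device monitoring plus clinician review queue

for label, dtr in [
    ("on-device monitoring", DTR_ON_DEVICE),
    ("on-device monitoring + clinician review", DTR_WITH_REVIEW),
]:
    abs_drop, rel_drop = reduction(DTR_OUTPUT_ONLY, dtr)
    print(f"{label}: {abs_drop:.1f} points absolute, {rel_drop:.1f}% relative")
```

Under these figures, on-device monitoring alone corresponds to a drop of 8.8 percentage points (about 27.8% relative to the output-only baseline), and adding the clinician review queue corresponds to 16.9 points (about 53.5% relative).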
