On-Device Deception Monitoring for Clinical LLM Assistants: A Privacy-Preserving Biomedical AI Safety Framework

Daniel Azman Rahman; Nur Aisyah Ismail; Farhan Hakim Abdullah; Mei Ling Tan; Sarah Amira Yusuf

doi:10.63646/jaihbe.2024.020401

Open Access PDF

Published 2024-12-30

Daniel Azman Rahman

Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pahang, Malaysia. Email: danielrahman@umpsa.edu.my

Nur Aisyah Ismail

Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia

Farhan Hakim Abdullah

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia

Mei Ling Tan

Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia

Sarah Amira Yusuf*

Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pahang, Malaysia. Email: sarahamira@umpsa.edu.my

Abstract

Clinical large language model (LLM) assistants are increasingly positioned as triage aides, discharge educators, documentation companions, and patient-facing conversational interfaces. Their usefulness depends not only on factual accuracy but also on the safety of their hidden or intermediate reasoning. A clinical assistant may produce a plausible final answer while masking weak evidence, overstating guideline support, avoiding uncertainty, or aligning with the user's preference instead of the patient's risk profile. Cloud-based oversight can detect some unsafe behavior, but it requires transmission of sensitive clinical text, introduces service dependency, and is poorly matched to bedside devices, home-monitoring gateways, and low-connectivity care settings. This article proposes an on-device deception monitoring framework for clinical LLM assistants. The framework adapts self-supervised reasoning-trace monitoring, contrastive representation learning, entropy-filtered self-labeling, and clinician-centered escalation to biomedical contexts. It treats reasoning traces as local safety evidence rather than as cloud data, and it separates ordinary uncertainty from clinically meaningful deception patterns such as unsupported reassurance, contraindication suppression, false guideline attribution, hidden task substitution, and patient-preference sycophancy. A scenario-based evaluation protocol is developed with 720 simulated clinical prompts across emergency triage, medication counseling, oncology follow-up, chronic disease self-management, and postoperative rehabilitation. The illustrative analysis shows that on-device contrastive monitoring lowers the Deceptive Trace Rate from 31.6% under output-only checking to 22.8%, and to 14.7% when combined with a clinician review queue, while substantially reducing privacy exposure compared with cloud teacher monitoring. 
The contribution is a privacy-preserving biomedical AI safety architecture that links edge deployment, trace-level risk analytics, and clinical governance. The framework is not presented as a replacement for physician judgment; rather, it provides a deployable safety layer for keeping clinical LLM assistance auditable, privacy-aware, and accountable.

Keywords: clinical LLM assistants; deception monitoring; on-device AI; biomedical AI safety; privacy-preserving healthcare; reasoning trace analytics

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Rahman, D. A., Ismail, N. A., Abdullah, F. H., Tan, M. L., & Yusuf, S. A. (2024). On-Device Deception Monitoring for Clinical LLM Assistants: A Privacy-Preserving Biomedical AI Safety Framework. Journal of AI in Healthcare and Biomedical Engineering, 2(4), 1-17. https://doi.org/10.63646/jaihbe.2024.020401
