Main article

Daniel Azman Rahman
Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pahang, Malaysia. Email: danielrahman@umpsa.edu.my
Nur Aisyah Ismail
Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Farhan Hakim Abdullah
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia
Mei Ling Tan
Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
Sarah Amira Yusuf*
Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pahang, Malaysia. Email: sarahamira@umpsa.edu.my

DOI: https://doi.org/10.63646/jaihbe.2024.020401

Abstract

Clinical large language model (LLM) assistants are increasingly positioned as triage aides, discharge educators, documentation companions, and patient-facing conversational interfaces. Their usefulness depends not only on factual accuracy but also on the safety of their hidden or intermediate reasoning. A clinical assistant may produce a plausible final answer while masking weak evidence, overstating guideline support, avoiding uncertainty, or aligning with the user's preference instead of the patient's risk profile. Cloud-based oversight can detect some unsafe behavior, but it requires transmission of sensitive clinical text, introduces service dependency, and is poorly matched to bedside devices, home-monitoring gateways, and low-connectivity care settings. This article proposes an on-device deception monitoring framework for clinical LLM assistants. The framework adapts self-supervised reasoning-trace monitoring, contrastive representation learning, entropy-filtered self-labeling, and clinician-centered escalation to biomedical contexts. It treats reasoning traces as local safety evidence rather than as cloud data, and it separates ordinary uncertainty from clinically meaningful deception patterns such as unsupported reassurance, contraindication suppression, false guideline attribution, hidden task substitution, and patient-preference sycophancy. A scenario-based evaluation protocol is developed with 720 simulated clinical prompts across emergency triage, medication counseling, oncology follow-up, chronic disease self-management, and postoperative rehabilitation. The illustrative analysis shows that on-device contrastive monitoring lowers the Deceptive Trace Rate from 31.6% under output-only checking to 22.8%, and to 14.7% when combined with a clinician review queue, while substantially reducing privacy exposure compared with cloud teacher monitoring. 
The contribution is a privacy-preserving biomedical AI safety architecture that links edge deployment, trace-level risk analytics, and clinical governance. The framework is not presented as a replacement for physician judgment; rather, it provides a deployable safety layer for keeping clinical LLM assistance auditable, privacy-aware, and accountable.

Article details

How to Cite

Rahman, D. A., Ismail, N. A., Abdullah, F. H., Tan, M. L., & Yusuf, S. A. (2024). On-Device Deception Monitoring for Clinical LLM Assistants: A Privacy-Preserving Biomedical AI Safety Framework. Journal of AI in Healthcare and Biomedical Engineering, 2(4), 1-17. https://doi.org/10.63646/jaihbe.2024.020401