Privacy-Preserving Deception Detection for Healthcare Edge LLMs: Contrastive Chain-of-Thought Monitoring in Clinical Decision Support

Mingzhu Chen; Ruijie Wang; Liping Xu

doi:10.63646/jaihbe.2025.030102


Published 2025-03-30

Mingzhu Chen

School of Health Informatics, Anhui Medical University, Hefei, Anhui, China

Ruijie Wang

School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, China

Liping Xu*

Department of Biomedical Engineering, Chongqing Medical University, Chongqing, China
lipingxu@cqmu.edu.cn


Abstract

Large language models (LLMs) deployed at the clinical edge present a previously uncharacterized risk: deceptive alignment, in which a model produces plausibly safe reasoning chains while pursuing internal objectives misaligned with patient welfare. Existing mitigation pipelines rely on heavyweight cloud-based teacher models (e.g., GPT-4o) for Chain-of-Thought (CoT) annotation. This introduces unacceptable privacy liabilities under healthcare data-protection regulations (HIPAA, GDPR). This paper presents a fully self-supervised, privacy-preserving framework for detecting clinical deception in edge-deployed LLMs. Instead of binary cross-entropy (BCE) classification, we use contrastive representation learning with a Triplet Loss. The loss projects CoT hidden states into a structured semantic manifold in which clinically deceptive and safe reasoning patterns form geometrically separable clusters. The framework also uses entropy-filtered self-labeling and differentially private federated aggregation (ε = 1.5, δ = 10⁻⁵). Together, these let our lightweight monitor (0.1% of the backbone's parameters) eliminate cloud dependency at both training and inference. We evaluate on ClinDeceptionBench, a new benchmark of 240 adversarial clinical scenarios spanning six deception taxonomies. There, the proposed Gemma-3-4B-IT implementation achieves a Deception Tendency Rate (DTR; lower is better) of 36.7%, a 3.4 percentage-point improvement over the BCE baseline (40.1%), while maintaining strict non-disclosure of protected health information (PHI). The privacy cost is bounded: DTR rises by 1.5 percentage points relative to a non-private counterpart. Edge benchmarking on the NVIDIA Jetson Orin Nano confirms deployment feasibility at 28 ms per token with 7.5 W peak power. This work establishes the first geometric, privacy-preserving foundation for self-supervised clinical deception monitoring, turning CoT transparency from a regulatory vulnerability into a forensic safety instrument.

Keywords: Healthcare Edge AI; Clinical Decision Support; Deceptive Alignment; Contrastive Learning; Privacy-Preserving; Chain-of-Thought; Federated Learning
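The abstract names three training-time components: a Triplet Loss over CoT embeddings, entropy-filtered self-labeling, and differentially private federated aggregation. The Python sketch below shows one generic way each could look. It is not the authors' implementation, and it rests on several assumptions:

- Embeddings are plain float vectors; the actual monitor works on projected CoT hidden states.
- The margin and entropy threshold are illustrative values.
- Aggregation follows a generic clip-then-add-Gaussian-noise scheme in the style of DP-FedAvg.
- All function names (`triplet_loss`, `entropy_filter`, `dp_federated_average`, and helpers) are hypothetical.
- Calibrating `noise_multiplier` to the reported budget (ε = 1.5, δ = 10⁻⁵) would require a privacy accountant, which is not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Hypothetical usage: anchor and positive share a label (e.g. both 'safe'
    CoT embeddings); the negative has the other label (e.g. 'deceptive').
    """
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_filter(probs: Sequence[float], max_entropy: float) -> List[int]:
    """Return indices of confident self-labels (entropy strictly below threshold)."""
    return [i for i, p in enumerate(probs) if binary_entropy(p) < max_entropy]


def clip(update: Sequence[float], max_norm: float) -> Vector:
    """Scale a client update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(updates: Sequence[Sequence[float]], max_norm: float,
                         noise_multiplier: float, rng: random.Random) -> Vector:
    """Clip each client update, sum, add Gaussian noise, then average.

    The noise standard deviation on the summed update is
    noise_multiplier * max_norm. Mapping this to a concrete (epsilon, delta)
    budget requires a privacy accountant, which is not implemented here.
    """
    if not updates:
        raise ValueError("need at least one client update")
    clipped = [clip(u, max_norm) for u in updates]
    dim = len(clipped[0])
    summed = [sum(u[j] for u in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, sigma)) / len(clipped) for s in summed]


if __name__ == "__main__":
    safe_a, safe_b, deceptive = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
    print(triplet_loss(safe_a, safe_b, deceptive))  # 0.0: negative already beyond the margin
    print(entropy_filter([0.98, 0.55, 0.03], max_entropy=0.2))  # [0, 2]
    noisy = dp_federated_average([[3.0, 4.0], [0.3, 0.4]], max_norm=1.0,
                                 noise_multiplier=1.0, rng=random.Random(0))
    print(noisy)  # noisy average of the clipped updates
```

Clipping bounds each client's contribution to the summed update. That bounded sensitivity is what allows Gaussian noise of scale `noise_multiplier * max_norm` to yield a differential-privacy guarantee. The triplet loss reaches zero once every deceptive (negative) embedding sits at least `margin` farther from the anchor than the safe (positive) one.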

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Chen, M., Wang, R., & Xu, L. (2025). Privacy-Preserving Deception Detection for Healthcare Edge LLMs: Contrastive Chain-of-Thought Monitoring in Clinical Decision Support. Journal of AI in Healthcare and Biomedical Engineering, 3(1), 17-32. https://doi.org/10.63646/jaihbe.2025.030102
