Evidence-Grounded AI Analytics for Explainable Mental Health Screening Using Large Language Models
Abstract
Depression is among the most prevalent and disabling mental disorders worldwide, yet timely and accurate diagnosis remains a persistent public-health challenge. Large Language Models (LLMs) show promise as auxiliary tools for clinical screening because of their strong language understanding and generation capabilities, but their direct deployment in psychiatric decision support is hindered by hallucination, opacity, and the absence of traceable evidence. To address these limitations, this paper proposes an evidence-grounded analytics framework that integrates Retrieval-Augmented Generation (RAG) with an agent-based two-stage diagnostic pipeline. In the first stage, an LLM agent extracts salient symptom phrases from user-provided text and formulates a query against a structured knowledge base derived from authoritative clinical practice guidelines. In the second stage, the retrieved evidence is passed back to the LLM, which produces a diagnostic conclusion together with explicit citations to the supporting guideline excerpts. We instantiate the framework on four open-weight LLMs (Gemma-3, Qwen-3, DeepSeek-R1, and Llama-3.1) at the 4–8B parameter scale and evaluate it on a public dataset of 100 simulated counseling samples. Compared with direct prompting, the augmented framework increases accuracy by up to 17 percentage points (Llama-3.1: 57% → 74%) and precision by up to 17.31 percentage points (Gemma-3: 76.81% → 94.12%), while maintaining competitive recall. The paper makes two contributions: (i) a unified RAG-Agent diagnostic architecture that grounds LLM outputs in verifiable clinical evidence, substantially reducing false positives and improving interpretability; and (ii) a comprehensive empirical study across heterogeneous LLM families that demonstrates the cross-model generality of the approach. Our results suggest that evidence-grounded LLM analytics constitute a viable pathway toward safe and trustworthy AI deployment in mental-health screening.
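
To make the two-stage flow concrete, the following is a minimal Python sketch under stated assumptions, not the implementation evaluated in this paper. The names `GuidelineExcerpt`, `retrieve`, `screen`, and `stub_llm` are hypothetical, the knowledge-base entries are placeholders rather than real guideline text, and simple token overlap stands in for whatever retriever the framework actually uses. The LLM is modeled as any function that maps a prompt string to a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

LLM = Callable[[str], str]  # assumption: an LLM is any prompt -> text function


@dataclass
class GuidelineExcerpt:
    excerpt_id: str  # hypothetical identifier used when citing evidence
    text: str


# Placeholder entries for illustration only; these are NOT real guideline text.
KNOWLEDGE_BASE: List[GuidelineExcerpt] = [
    GuidelineExcerpt("G1", "Placeholder excerpt describing persistent low mood and loss of interest."),
    GuidelineExcerpt("G2", "Placeholder excerpt describing sleep disturbance and fatigue."),
    GuidelineExcerpt("G3", "Placeholder excerpt describing transient stress after a life event."),
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, kb: List[GuidelineExcerpt], top_k: int = 2) -> List[GuidelineExcerpt]:
    """Rank excerpts by token overlap with the query (a stand-in retriever)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(e.text)), e) for e in kb]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matching excerpts
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep KB order
    return [e for _, e in scored[:top_k]]


def screen(user_text: str, llm: LLM, kb: List[GuidelineExcerpt], top_k: int = 2) -> Dict[str, object]:
    # Stage 1: the agent extracts symptom phrases, which become the retrieval query.
    symptoms = llm("Extract salient symptom phrases, comma-separated:\n" + user_text)
    evidence = retrieve(symptoms, kb, top_k)

    # Stage 2: the retrieved evidence is passed back to the LLM with the original text.
    context = "\n".join(f"[{e.excerpt_id}] {e.text}" for e in evidence)
    conclusion = llm(
        "Evidence:\n" + context
        + "\n\nText:\n" + user_text
        + "\n\nGive a screening conclusion and cite excerpt ids in brackets."
    )
    return {
        "symptoms": symptoms,
        "citations": [e.excerpt_id for e in evidence],
        "conclusion": conclusion,
    }


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Extract"):
        return "low mood, fatigue"
    return "Screening conclusion placeholder [G1][G2]"


if __name__ == "__main__":
    result = screen("I have felt low mood and fatigue for weeks.", stub_llm, KNOWLEDGE_BASE)
    print(result["citations"])
    print(result["conclusion"])
```

In practice `stub_llm` would be replaced by a call to one of the open-weight models, and `retrieve` by a retriever over the guideline-derived knowledge base. Returning the excerpt identifiers alongside the conclusion is what lets each output be traced back to its supporting evidence.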
