Main article

Wenjun Hao
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Liping Xu
Department of Psychology, Zhejiang Normal University, Jinhua 321004, China
Yuxin Tan
School of Information Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
Jianhao Zhou
College of Public Health, Wenzhou Medical University, Wenzhou 325035, China
Mingxuan Cao
School of Management, Hangzhou Normal University, Hangzhou 311121, China
Qing Wei*
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
qing.wei@hdu.edu.cn

DOI: https://doi.org/10.63646/jaiaa.2025.030302

Abstract

Depression is among the most prevalent and disabling mental disorders worldwide, yet timely and accurate diagnosis remains a persistent public-health challenge. Large Language Models (LLMs) have shown promise as auxiliary tools in clinical screening because of their strong language understanding and generation capabilities, but their direct deployment in psychiatric decision support is hindered by hallucination, opacity, and the absence of traceable evidence. To address these limitations, this paper proposes an evidence-grounded analytics framework that integrates Retrieval-Augmented Generation (RAG) with an agent-based two-stage diagnostic pipeline. In the first stage, an LLM agent extracts salient symptom phrases from a user-provided text and formulates a query against a structured knowledge base derived from authoritative clinical practice guidelines. In the second stage, the retrieved evidence is fed back to the LLM, which produces a diagnostic conclusion together with explicit citations to the supporting guideline excerpts. We instantiate the framework on four open-weight LLMs (Gemma-3, Qwen-3, DeepSeek-R1, and Llama-3.1) at the 4–8B parameter scale and evaluate it on a public dataset of 100 simulated counseling samples. Compared with direct prompting, the augmented framework increases accuracy by up to 17 percentage points (Llama-3.1: 57% → 74%) and precision by up to 17.31 percentage points (Gemma-3: 76.81% → 94.12%), while maintaining competitive recall. The paper makes two contributions: (i) a unified RAG-Agent diagnostic architecture that grounds LLM outputs in verifiable clinical evidence, substantially reducing false positives and improving interpretability; and (ii) a comprehensive empirical study across heterogeneous LLM families demonstrating the cross-model generality of the approach. Our results suggest that evidence-grounded LLM analytics constitute a viable pathway for safe and trustworthy AI deployment in mental-health screening.
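
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LLM`, `Excerpt`, `extract_symptoms`, `retrieve`, `diagnose`, and `screen` are hypothetical, the prompts are invented for illustration, and the keyword-overlap retriever is a stand-in for whatever retrieval method the paper actually uses over its guideline knowledge base. The model is assumed to be any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the model is wrapped as "prompt in, completion out".
LLM = Callable[[str], str]


@dataclass
class Excerpt:
    ref: str   # citation label for a knowledge-base entry (hypothetical)
    text: str  # body of the guideline excerpt


def extract_symptoms(user_text: str, llm: LLM) -> List[str]:
    """Stage 1: ask the model for symptom phrases, one per line."""
    reply = llm("List the symptom phrases in this text, one per line:\n" + user_text)
    return [line.strip().lower() for line in reply.splitlines() if line.strip()]


def retrieve(symptoms: List[str], kb: List[Excerpt], top_k: int = 2) -> List[Excerpt]:
    """Toy retriever: rank excerpts by how many symptom words they contain."""
    words = {w for phrase in symptoms for w in phrase.split()}
    scored = [(sum(w in ex.text.lower() for w in words), ex) for ex in kb]
    # sorted() is stable, so ties keep their knowledge-base order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [ex for score, ex in ranked if score > 0][:top_k]


def diagnose(user_text: str, evidence: List[Excerpt], llm: LLM) -> Dict[str, object]:
    """Stage 2: feed the evidence back to the model and keep the citation labels."""
    cited = "\n".join(f"[{ex.ref}] {ex.text}" for ex in evidence)
    prompt = (
        "Using only the evidence below, state whether the text suggests "
        "depression and cite the excerpts you relied on.\n"
        f"Evidence:\n{cited}\nText:\n{user_text}"
    )
    return {"conclusion": llm(prompt), "citations": [ex.ref for ex in evidence]}


def screen(user_text: str, kb: List[Excerpt], llm: LLM) -> Dict[str, object]:
    """Run both stages: extract symptoms, retrieve evidence, then conclude."""
    symptoms = extract_symptoms(user_text, llm)
    evidence = retrieve(symptoms, kb)
    return diagnose(user_text, evidence, llm)
```

Calling `screen(text, kb, llm)` with a list of `Excerpt` objects and any model wrapper returns a dictionary holding the model's conclusion and the labels of the excerpts supplied as evidence. Keeping the citation labels outside the generated text means they can be checked against the knowledge base independently of what the model writes.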

Article details

How to Cite

Hao, W., Xu, L., Tan, Y., Zhou, J., Cao, M., & Wei, Q. (2025). Evidence-Grounded AI Analytics for Explainable Mental Health Screening Using Large Language Models. Journal of AI Analytics and Applications, 3(3), 19–35. https://doi.org/10.63646/jaiaa.2025.030302