Evidence-Grounded AI Analytics for Explainable Mental Health Screening Using Large Language Models

Wenjun Hao; Liping Xu; Yuxin Tan; Jianhao Zhou; Mingxuan Cao; Qing Wei

doi:10.63646/jaiaa.2025.030302

Published 2025-09-30

Wenjun Hao

School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Liping Xu

Department of Psychology, Zhejiang Normal University, Jinhua 321004, China

Yuxin Tan

School of Information Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

Jianhao Zhou

College of Public Health, Wenzhou Medical University, Wenzhou 325035, China

Mingxuan Cao

School of Management, Hangzhou Normal University, Hangzhou 311121, China

Qing Wei*

School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
qing.wei@hdu.edu.cn

Abstract

Depression is among the most prevalent and disabling mental disorders worldwide, yet timely and accurate diagnosis remains a persistent public-health challenge. Large Language Models (LLMs) have shown promise as auxiliary tools in clinical screening because of their strong language understanding and generation capabilities, but their direct deployment in psychiatric decision support is hindered by hallucination, opacity, and the absence of traceable evidence. To address these limitations, this paper proposes an evidence-grounded analytics framework that integrates Retrieval-Augmented Generation (RAG) with an agent-based two-stage diagnostic pipeline. In the first stage, an LLM agent extracts salient symptom phrases from user-provided text and formulates a query against a structured knowledge base derived from authoritative clinical practice guidelines. In the second stage, the retrieved evidence is fed back to the LLM, which produces a diagnostic conclusion together with explicit citations to the supporting guideline excerpts. We instantiate the framework on four open-weight LLMs (Gemma-3, Qwen-3, DeepSeek-R1, and Llama-3.1) at the 4–8B parameter scale and evaluate it on a public dataset of 100 simulated counseling samples. Compared with direct prompting, the augmented framework increases accuracy by up to 17 percentage points (Llama-3.1: 57% → 74%) and precision by up to 17.31 percentage points (Gemma-3: 76.81% → 94.12%), while maintaining competitive recall. This paper makes two contributions: (i) a unified RAG-Agent diagnostic architecture that grounds LLM outputs in verifiable clinical evidence, substantially reducing false positives and improving interpretability; and (ii) a comprehensive empirical study across heterogeneous LLM families demonstrating the cross-model generality of the approach. Our results suggest that evidence-grounded LLM analytics constitute a viable pathway for safe and trustworthy AI deployment in mental-health screening.
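
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: both LLM calls are replaced by deterministic stubs (a phrase lexicon for symptom extraction and an evidence-count rule for the conclusion), retrieval is plain token overlap rather than whatever retriever the paper uses, and the knowledge-base entries are illustrative placeholders, not real guideline text. All names (`Excerpt`, `KNOWLEDGE_BASE`, `SYMPTOM_LEXICON`, `extract_symptoms`, `retrieve`, `diagnose`, `min_evidence`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    ref_id: str  # citation key returned alongside the conclusion
    text: str    # excerpt body (placeholder wording, not real guideline text)


# Assumption: a toy stand-in for the structured guideline knowledge base.
KNOWLEDGE_BASE = [
    Excerpt("G1", "persistent low mood lasting most of the day"),
    Excerpt("G2", "loss of interest or pleasure in usual activities"),
    Excerpt("G3", "sleep disturbance such as insomnia"),
]

# Assumption: a phrase lexicon stands in for the stage-1 LLM agent.
SYMPTOM_LEXICON = {
    "can't sleep": "sleep disturbance insomnia",
    "no interest": "loss of interest",
    "feel down": "low mood",
}


def extract_symptoms(user_text):
    """Stage 1 (stub): map phrases found in the text to query strings."""
    lowered = user_text.lower()
    return [query for phrase, query in SYMPTOM_LEXICON.items() if phrase in lowered]


def retrieve(queries, top_k=2):
    """Rank excerpts by token overlap with the combined query."""
    query_tokens = set(" ".join(queries).split())
    scored = []
    for excerpt in KNOWLEDGE_BASE:
        overlap = len(query_tokens & set(excerpt.text.split()))
        if overlap:
            scored.append((overlap, excerpt))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [excerpt for _, excerpt in scored[:top_k]]


def diagnose(user_text, min_evidence=2):
    """Stage 2 (stub): flag only when enough excerpts support the symptoms."""
    evidence = retrieve(extract_symptoms(user_text))
    return {
        "flagged": len(evidence) >= min_evidence,
        "citations": [excerpt.ref_id for excerpt in evidence],
    }


if __name__ == "__main__":
    print(diagnose("I feel down and can't sleep at night"))
```

The point of the sketch is the contract between the stages: the conclusion is returned together with the identifiers of the excerpts that support it, so a reviewer can trace every positive flag back to retrieved evidence, and a text with no matching evidence is not flagged.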

Keywords: Depression Screening; Large Language Models; Retrieval-Augmented Generation; LLM Agents; Explainable AI; Evidence-Based Reasoning

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Hao, W., Xu, L., Tan, Y., Zhou, J., Cao, M., & Wei, Q. (2025). Evidence-Grounded AI Analytics for Explainable Mental Health Screening Using Large Language Models. Journal of AI Analytics and Applications, 3(3), 19-35. https://doi.org/10.63646/jaiaa.2025.030302
