Main article

Chiara Moretti
EURECOM, Sophia Antipolis, France, 06410
James Okwu*
African Institute for Mathematical Sciences, Cape Town, South Africa, 7945
j.okwu@aims.ac.za
Hana Kobayashi
African Institute for Mathematical Sciences, Cape Town, South Africa, 7945

Abstract

Retrieval-augmented generation (RAG) has become a standard approach for grounding large language model outputs in factual, updateable knowledge bases. Two dominant retrieval paradigms have emerged: vector-based retrieval (VectorRAG), which uses dense embedding similarity to identify relevant passages, and graph-based retrieval (GraphRAG), which traverses explicit knowledge graph structures to surface related entities and relationships. Although both approaches are widely adopted, rigorous head-to-head comparisons under controlled conditions remain scarce. This paper presents an empirical comparison of VectorRAG and GraphRAG across four knowledge-intensive task families (single-hop QA, multi-hop QA, entity disambiguation, and temporal reasoning) using three corpora of varying knowledge graph density. We evaluate five implementations: naive RAG with FAISS, HyDE-augmented retrieval, Microsoft's GraphRAG, NebulaGraph RAG, and a hybrid architecture combining both paradigms. Results show that GraphRAG outperforms VectorRAG by 12–23 percentage points on multi-hop and temporal tasks, but is only competitive with VectorRAG on single-hop factoid tasks while being substantially more expensive to construct and maintain. The hybrid architecture achieves the best overall performance across all task types at intermediate cost. We release all evaluation code and experimental logs to facilitate replication.
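
To make the two retrieval paradigms named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the evaluated implementations (FAISS, HyDE, Microsoft's GraphRAG, NebulaGraph RAG); the function names, the hand-written toy embeddings, and the example triples are all hypothetical and chosen only for illustration. Vector retrieval is approximated by a cosine-similarity ranking, and graph retrieval by a bounded breadth-first traversal from a seed entity.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_retrieve(query_vec, passages, k=2):
    """VectorRAG-style step (illustrative): rank passage ids by similarity.

    `passages` maps a passage id to its embedding. A real system would get
    embeddings from a trained encoder; here they are hand-written.
    """
    ranked = sorted(
        passages,
        key=lambda pid: cosine(query_vec, passages[pid]),
        reverse=True,
    )
    return ranked[:k]


def graph_retrieve(seed, edges, max_hops=2):
    """GraphRAG-style step (illustrative): collect triples within max_hops.

    `edges` is a list of (head, relation, tail) triples. Traversal is
    breadth-first and follows edges in the head -> tail direction only.
    """
    adjacency = {}
    for head, relation, tail in edges:
        adjacency.setdefault(head, []).append((relation, tail))

    visited = {seed}
    queue = deque([(seed, 0)])
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes at the hop limit
        for relation, tail in adjacency.get(node, []):
            triples.append((node, relation, tail))
            if tail not in visited:
                visited.add(tail)
                queue.append((tail, depth + 1))
    return triples


# Toy data (assumed, not taken from the paper's corpora).
toy_passages = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.8, 0.6, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
toy_edges = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "EU"),
]

top_passages = vector_retrieve([1.0, 0.1, 0.0], toy_passages, k=2)
two_hop_facts = graph_retrieve("Marie Curie", toy_edges, max_hops=2)
```

In this toy setting, the vector step returns whichever passages lie closest to the query in embedding space, while the graph step chains explicit relations from the seed entity. That chaining is the property a multi-hop question depends on.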

Article details

How to Cite

Moretti, C., Okwu, J., & Kobayashi, H. (2023). GraphRAG vs. VectorRAG: An Empirical Comparison of Retrieval Architectures for Knowledge-Intensive Tasks. DATAMIND, 1(4), 1-4. https://doi.org/10.63646/