Adaptive Multi-Modal Data Lakehouse Architecture for Low-Latency AI Feature Retrieval

Hao Ren; Priya Natarajan; Marcus Feldmann

doi:10.63646/datamind.2023.010206

Received 2023-01-19

Accepted 2023-05-27

Published 2023-06-30

Hao Ren

School of Computing, Pacific Institute of Technology, Singapore 138632

Priya Natarajan*

Department of Computer Science, Northbridge University, Cambridge CB2 1TN, United Kingdom
p.natarajan@northbridge.ac.uk

Marcus Feldmann

School of Computing, Pacific Institute of Technology, Singapore 138632

Abstract

Modern artificial-intelligence applications increasingly depend on retrieving two kinds of signal at inference time: precomputed structured features keyed by an entity, and nearest-neighbour embeddings of unstructured content such as text, images, and audio. The data-lakehouse paradigm has unified the storage of these heterogeneous assets under open columnar formats, but lakehouse engines are optimised for high-throughput analytical scans rather than for the millisecond-scale point and similarity lookups that online inference requires. In practice, teams bridge this gap by copying features into a separate low-latency key-value store and embeddings into a dedicated vector database, producing duplicated storage, operational overhead, and train–serve skew. This paper formulates low-latency multi-modal feature retrieval from a lakehouse as a concrete systems-engineering problem and presents AdaLH, an adaptive serving architecture that keeps a single open-format source of truth while exposing a unified retrieval interface that joins a structured point lookup with an approximate-nearest-neighbour search in one request. AdaLH introduces three mechanisms: an access-aware tier controller that promotes hot entities and embeddings into an in-memory row plus graph index while demoting cold data to object storage; a cost-based retrieval planner that estimates per-tier latency and routes the structured and vector legs of a request independently before fusing them; and an incremental materialisation pipeline that preserves point-in-time consistency between offline training and online serving. We evaluate AdaLH against four baselines—a two-tier lakehouse-plus-cache stack, a lakehouse-only scan engine, a split vector-database design, and a monolithic in-memory store—on a workload that mixes a ninety-five-feature view with top-ten embedding search over a corpus of one hundred million vectors. 
AdaLH attains a p99 end-to-end latency of 8.7 ms, a 3.1× improvement over the split design and a 20.7× improvement over the lakehouse-only engine, while sustaining 410 thousand requests per second and maintaining recall@10 above 0.95. An ablation shows that the planner, the controller, and co-located fusion each contribute materially to tail latency, and a calibration study confirms that the planner predicts per-request latency with a mean absolute percentage error of 7.6 percent. All code, configurations, datasets, and a data dictionary are released openly.

Keywords: Data lakehouse; feature store; vector retrieval; low-latency serving; approximate nearest neighbour; AI data infrastructure; adaptive tiering

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Ren, H., Natarajan, P., & Feldmann, M. (2023). Adaptive Multi-Modal Data Lakehouse Architecture for Low-Latency AI Feature Retrieval. DATAMIND, 1(2), 58-81. https://doi.org/10.63646/datamind.2023.010206
