Main article

Elena Petrova
School of Infocomm, Republic Polytechnic, Woodlands Avenue 9, Singapore 738964, Singapore
Yuki Tanaka
School of Infocomm, Republic Polytechnic, Woodlands Avenue 9, Singapore 738964, Singapore
Marcus Adebayo*
School of Computing, National University of Singapore, Singapore 119077
marcus.adebayo@nus.edu.sg

DOI: https://doi.org/10.63646/datamind.2025.030107

Abstract

Machine learning systems depend on data whose quality can degrade silently between training and inference. Missing values, distributional shifts, constraint violations, and stale records can erode model accuracy without triggering conventional database alarms. This paper introduces DataQualityOps, a framework for window-based continuous data quality observability tailored to databases that feed AI and machine learning pipelines. The system profiles incoming data against learned statistical baselines, validates records against declarative quality constraints, and aggregates dimension-level scores into a composite data quality index that serves as a CI/CD gate for model retraining and deployment decisions. We define five quality dimensions relevant to AI-critical databases—completeness, consistency, timeliness, uniqueness, and distributional stability—and implement anomaly detectors for each. Evaluation on a twelve-week simulated production workload with injected quality degradation events shows that DataQualityOps detects quality anomalies with a macro-averaged F1-score of 0.92 across five degradation categories, identifies degradation onset within a median of 1.3 hours, and reduces undetected degradation events under the simulated monitoring protocol by 78 percent compared to a manual inspection baseline. The framework integrates with standard MLOps toolchains and provides actionable quality diagnostics that enable data engineers to intervene before degraded data contaminates model predictions.
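As an illustration of the aggregation and gating idea summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It scores one window of records on two of the five dimensions (completeness and uniqueness), combines them with a weighted mean, and compares the result to a threshold. The function names, the equal weights, and the 0.9 threshold are all hypothetical choices made for this example; the abstract does not specify how DataQualityOps weights its dimensions or where it sets the gate.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def completeness(rows: List[Row], fields: List[str]) -> float:
    """Fraction of cells in the window that are not missing (None)."""
    total = len(rows) * len(fields)
    if total == 0:
        return 1.0  # assumption: an empty window is treated as fully complete
    present = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return present / total


def uniqueness(rows: List[Row], key: str) -> float:
    """Fraction of rows in the window that carry a distinct key value."""
    if not rows:
        return 1.0  # assumption: an empty window has no duplicates
    return len({r.get(key) for r in rows}) / len(rows)


def composite_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of dimension scores (hypothetical aggregation rule)."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight


def passes_gate(index: float, threshold: float = 0.9) -> bool:
    """True if the window is clean enough to let retraining or deployment proceed."""
    return index >= threshold


# One window of incoming records: one missing value and one duplicate id.
window = [
    {"id": 1, "x": 1.0},
    {"id": 2, "x": None},
    {"id": 2, "x": 3.0},
    {"id": 4, "x": 4.0},
]

scores = {
    "completeness": completeness(window, ["id", "x"]),
    "uniqueness": uniqueness(window, "id"),
}
weights = {"completeness": 1.0, "uniqueness": 1.0}  # hypothetical equal weights
index = composite_index(scores, weights)
print(scores, index, passes_gate(index))
```

In a pipeline, a check like `passes_gate` would run as a step before retraining, and a failing window would block the job and surface the per-dimension scores so that an engineer can see which dimension degraded.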

Article details

How to Cite

Petrova, E., Tanaka, Y., & Adebayo, M. (2025). DataQualityOps: Continuous Data Quality Observability for AI-Critical Databases. DATAMIND, 3(1), 96-107. https://doi.org/10.63646/datamind.2025.030107