Precision-Aware Workload Analytics for AI Systems: A Unified Monitoring Framework Across PyTorch and Scikit-learn Pipelines
Abstract
The computational cost of modern machine-learning (ML) and deep-learning (DL) workloads has become a first-class concern in applied AI research, particularly as workloads grow in size and as hardware diversity widens. Conventional efficiency indicators such as wall-clock time, joules consumed, or CO2-equivalent emissions depend strongly on the physical machine on which an experiment is executed, which makes reproducible comparison across laboratories and across hardware generations difficult. This paper develops a unified monitoring framework that measures computational workload at two complementary levels: the algorithmic level, captured by floating-point-operation (FLOP) counts, and the hardware level, captured by bit-operation (BOP) counts that incorporate operand precision. The framework is implemented as a hardware-agnostic, backend-pluggable Python pipeline that intercepts operations dynamically in PyTorch through the dispatcher layer and wraps estimator methods analytically in Scikit-learn. Using a structured evaluation across three canonical workloads (a fully connected classifier on tabular data, a convolutional model on image data, and a small transformer on text), we show that FLOP counts alone systematically misrepresent the efficiency benefits of quantization, whereas BOP counts provide a more faithful view of hardware-level effort. Aggregation over training and inference phases, combined with precision-aware scaling, yields a reproducible efficiency fingerprint that is stable across CPU and GPU backends to within a narrow interval. The framework preserves the structure of existing experimental pipelines and adds only a thin supervisory layer. The contribution is not a single tool but an analytics pattern that connects algorithmic complexity, numerical precision, and practitioner workflow into one coherent monitoring surface.
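
To make the dispatcher-level interception concrete, the sketch below shows one minimal way such a counter can be realized with PyTorch's public TorchDispatchMode hook. The OpCounter class, its restriction to plain 2-D matrix multiplies, and the BOP weighting (each multiply-accumulate scaled by the product of the two operand bit-widths, a convention common in the quantization literature) are illustrative assumptions for this sketch, not the framework's actual implementation.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OpCounter(TorchDispatchMode):
    """Counts FLOPs and precision-scaled BOPs for 2-D matrix multiplies.

    BOP convention used here (an illustrative assumption, not necessarily
    the paper's exact definition): each multiply-accumulate (MAC) is
    weighted by the product of the two operand bit-widths.
    """

    def __init__(self):
        super().__init__()
        self.flops = 0
        self.bops = 0

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        out = func(*args, **kwargs)  # execute the op beneath this mode
        if func is torch.ops.aten.mm.default:  # (m, k) @ (k, n)
            a, b = args[0], args[1]
            m, k = a.shape
            n = b.shape[1]
            macs = m * k * n
            self.flops += 2 * macs  # one multiply + one add per MAC
            bits_a = a.element_size() * 8  # operand bit-width from dtype
            bits_b = b.element_size() * 8
            self.bops += macs * bits_a * bits_b
        return out

if __name__ == "__main__":
    with OpCounter() as counter:
        x = torch.randn(8, 16)
        w = torch.randn(16, 4)
        _ = x @ w  # a 2-D matmul dispatches to aten.mm
    print(f"FLOPs = {counter.flops}, BOPs = {counter.bops}")
```

Because the mode observes every aten-level operation, the same pattern extends to convolutions, attention, and other kernels by adding per-operator shape rules; lowering the operand dtype (e.g. float32 to int8) leaves the FLOP count unchanged while shrinking the BOP count, which is the distinction the abstract draws.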
