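A Data-Driven Computational Pipeline for Multi-Class Network Behavior Recognition Using Hybrid Feature Transformation and Distributed Verification Mechanisms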

Haoran Wei
School of Computing, University of Portsmouth, Portsmouth PO1 3HE, United Kingdom
Daniela M. Santos*
School of Computing and Mathematical Sciences, University of Greenwich, London SE10 9LS, United Kingdom
daniela.santos@greenwich.ac.uk
Junfeng Li
School of Computing, University of Portsmouth, Portsmouth PO1 3HE, United Kingdom
Aravind R. Nair
School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield AL10 9AB, United Kingdom

DOI: https://doi.org/10.63646/datamind.2025.030106

Abstract

The recognition of network behavior from traffic data has become a central problem in modern cybersecurity, yet the difficulty is no longer the scarcity of data but the challenge of turning heterogeneous, imbalanced, and rapidly evolving traffic into reliable multi-class decisions that operators can trust. This study proposes and evaluates a data-driven computational pipeline that recognizes several classes of network behavior at once, combining a hybrid feature-transformation stage with a distributed verification mechanism. The hybrid stage fuses three complementary representations of each network flow: statistical descriptors that summarize volume and timing, spectral coefficients that capture periodic structure, and learned embeddings produced by a compact autoencoder. The fused descriptor is classified by a convolutional-recurrent model trained with class balancing, and the resulting verdicts are confirmed by a lightweight distributed verification layer in which several independent nodes must reach consensus before an alert is committed. The pipeline is evaluated on harmonized records drawn from three widely used benchmarks, covering normal traffic and seven attack families, using stratified training and a held-out test partition. Across the multi-class task, the pipeline reaches a macro-averaged F1-score of 0.969 and a weighted accuracy of 98.4 percent, with per-class recall above 0.95 for every attack family, including rare minority classes. An ablation analysis shows that each transformation branch contributes measurably and that fusing all three is consistently better than any single representation. The distributed verification layer adds only a few milliseconds of median latency while scaling to sixty-four nodes and reducing committed false positives by roughly a third.
The article argues that treating representation and verification as joint design choices, rather than afterthoughts, materially improves both accuracy and trustworthiness, and it offers a practical blueprint for building behavior-recognition pipelines that remain dependable under class imbalance and concept drift.
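
The verification step described in the abstract, in which several independent nodes must agree before an alert is committed, can be illustrated with a minimal sketch. The abstract does not specify the consensus rule, so the strict-majority threshold, the function name `commit_alert`, and the label strings used here are assumptions chosen for illustration, not the authors' protocol.

```python
from collections import Counter
from typing import Optional, Sequence


def commit_alert(votes: Sequence[str]) -> Optional[str]:
    """Return the attack label to commit, or None if no alert is committed.

    ASSUMPTION: a strict majority of node verdicts is required. The
    article's actual consensus rule is not given in the abstract.
    """
    if not votes:
        return None
    # Find the most common verdict among the verification nodes.
    label, count = Counter(votes).most_common(1)[0]
    majority = len(votes) // 2 + 1  # integer threshold avoids float rounding
    # Commit only attack verdicts backed by a strict majority of nodes.
    if label != "normal" and count >= majority:
        return label
    return None


if __name__ == "__main__":
    # Hypothetical verdicts from three verification nodes.
    print(commit_alert(["ddos", "ddos", "normal"]))
    print(commit_alert(["ddos", "normal", "probe"]))
```

Under this assumed rule, a single node that misclassifies benign traffic cannot commit an alert on its own, which is one plausible way a consensus layer could lower committed false positives.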

Article details

How to Cite

Wei, H., Santos, D. M., Li, J., & Nair, A. R. (2025). A Data-Driven Computational Pipeline for Multi-Class Network Behavior Recognition Using Hybrid Feature Transformation and Distributed Verification Mechanisms. DATAMIND, 3(1), 76-95. https://doi.org/10.63646/datamind.2025.030106