EduRiskDB: A Student Learning and Dropout-Risk Database for Explainable Educational Analytics

Sophia Hartmann
Faculty of Education and Technology, Polytechnic University of Porto, Porto, Portugal
Marcus Oliveira*
Faculty of Education and Technology, Polytechnic University of Porto, Porto, Portugal
marcus.oliveira@ipp.pt
Annika Svensson
Faculty of Education and Technology, Polytechnic University of Porto, Porto, Portugal
Dario Esposito
Faculty of Education and Technology, Polytechnic University of Porto, Porto, Portugal

DOI: https://doi.org/10.63646/datamind.2023.010405

Abstract

Predicting student dropout before it occurs requires more than predictive models: it requires purpose-built, ethically governed data infrastructure that integrates learning-platform clickstreams, assessment outcomes, assignment-submission patterns, forum engagement, attendance records, and student-support interactions. This paper introduces EduRiskDB, a hybrid relational, graph, and vector database containing 148,392 student-term records drawn from four European higher-education institutions over six academic years (2017–2023). EduRiskDB is designed for reproducible experimentation in dropout-risk modelling and explainable educational analytics. We describe its schema, field dictionary, indexing strategy, data-quality controls, ethics-compliance pipeline, and open-access interfaces in detail. A benchmark experiment evaluates six dropout-prediction models on EduRiskDB; the augmented XGBoost configuration achieves an AUC-ROC of 0.903 with a mean early-warning lead time of 7.8 weeks, outperforming all baseline models. SHAP attribution identifies cumulative click sequences, seven-day assignment lag, and attendance streaks as the three most predictive signals, and SHAP dependence plots confirm non-linear interactions. EduRiskDB is archived on Zenodo under a CC BY 4.0 licence and updated semi-annually.
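The two headline benchmark metrics can be illustrated with a minimal, standard-library Python sketch. AUC-ROC is computed here in its rank-based (Mann–Whitney) form: the probability that a randomly chosen student who dropped out receives a higher risk score than a randomly chosen student who did not, with ties counted as one half. The abstract does not define early-warning lead time, so the definition below (weeks between the first week a student's predicted risk reaches a threshold and the week that student drops out) is an assumption. The function names `auc_roc` and `mean_lead_time`, the 0.5 threshold, and the toy data are likewise hypothetical and are not EduRiskDB's documented evaluation protocol.

```python
from statistics import mean


def auc_roc(labels, scores):
    """Rank-based AUC-ROC: share of (positive, negative) pairs in which the
    positive (dropout) example scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_lead_time(weekly_scores, dropout_week, threshold=0.5):
    """Hypothetical lead-time metric: for each student who dropped out, the
    number of weeks between the first week their risk score reached
    `threshold` and their dropout week. Students who dropped out without
    being flagged beforehand are excluded; returns 0.0 if none qualify."""
    leads = []
    for student, scores in weekly_scores.items():
        if student not in dropout_week:
            continue  # only students who actually dropped out contribute
        d = dropout_week[student]
        # weeks (0-indexed) before dropout in which the score hit the threshold
        flagged = [w for w, s in enumerate(scores) if s >= threshold and w < d]
        if flagged:
            leads.append(d - flagged[0])
    return mean(leads) if leads else 0.0
```

For example, `auc_roc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic and is meant only for clarity; at the scale of 148,392 records, a sort-based rank computation or an established metrics library would be the practical choice.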

How to Cite

Hartmann, S., Oliveira, M., Svensson, A., & Esposito, D. (2023). EduRiskDB: A Student Learning and Dropout-Risk Database for Explainable Educational Analytics. DATAMIND, 1(4), 49–63. https://doi.org/10.63646/datamind.2023.010405