Multiscript Language Technologies and Digital Inclusion: A Sociotechnical Study of Kazakh Script Conversion
Abstract
Kazakh is a major Turkic language written and read in Arabic-based, Cyrillic-based, and Latin-based scripts across different communities, regions, archives, and digital platforms. This multiscript condition is often treated as a narrow technical problem of transliteration accuracy, yet it also shapes who can search for public information, access education, preserve family records, participate in e-government, and maintain linguistic identity in data-driven societies. This paper presents a sociotechnical study of Kazakh script conversion that connects neural conversion methods, loanword-aware prompting, corpus governance, and digital inclusion. Building on a secondary analysis of recent Kazakh multiscript conversion benchmarks, the study reinterprets character error rate (CER) and word error rate (WER) as proxies for accessibility friction, institutional reliability, and cultural continuity. The analysis shows that prompt-constrained Transformer conversion substantially reduces word-level friction across six conversion directions, but it also reveals that model accuracy alone is insufficient for inclusive deployment. Script conversion systems affect people through interface design, standardization choices, metadata practices, education policies, data rights, and community trust. The paper contributes a multilayer framework that links script ecology, linguistic resources, conversion services, access settings, governance arrangements, and inclusion outcomes. It further proposes design principles for transparent, auditable, and community-sensitive Kazakh language technologies. The findings suggest that multiscript conversion should be governed not merely as an automation service but as digital public infrastructure for linguistic equity.
