Prompt-Constrained Transformer Analytics for Low-Resource Script Conversion in Kazakh Language Processing

Aigerim Nurzhanova; Miras Daurenov; Saltanat Ibrayeva

doi:10.63646/jaiaa.2025.030403


Published 2025-12-30

Aigerim Nurzhanova

Department of Computer Science and Software Engineering, Korkyt Ata Kyzylorda University, Kyzylorda, Kazakhstan

Miras Daurenov

School of Information Technologies, Almaty Technological University, Almaty, Kazakhstan

Saltanat Ibrayeva*

Department of Applied Linguistics and Digital Humanities, M. Kozybayev North Kazakhstan University, Petropavl, Kazakhstan
s.ibrayeva@ku.edu.kz


Abstract

This article develops a prompt-constrained Transformer analytics framework for low-resource script conversion in Kazakh language processing. Kazakh is written in Arabic-based, Cyrillic-based, and Latin-based orthographies, and conversion among these scripts is complicated by non-one-to-one grapheme mappings, vowel harmony, consonant alternation, agglutinative morphology, regional lexical preferences, and large numbers of loanwords. Instead of treating script conversion as a purely mechanical transliteration task, the proposed framework frames it as a constrained sequence analytics problem in which lexical prompts provide soft but explicit guidance to a Transformer encoder-decoder model. The framework integrates a multiscript lexical prompt bank, morphological screening, prompt-conditioned attention, and post-conversion error analytics. A controlled benchmark design is introduced to evaluate six conversion directions (every ordered pair of the three scripts) across general news, educational text, public-service notices, and technology-related terminology. The results indicate that prompt constraints reduce character error rate (CER) and word error rate (WER) relative to attention-only baselines, with the greatest gains on loanwords, proper names, and morphologically inflected forms. The article further analyzes the trade-offs among conversion accuracy, robustness, latency, explainability, and data governance. The findings suggest that prompt-constrained analytics offers a practical pathway for low-resource language technologies because it combines neural contextual learning with auditable linguistic knowledge. The study contributes to artificial intelligence analytics by showing how structured prompts can turn scarce linguistic resources into deployable conversion intelligence for multilingual digital services.

Keywords: Low-resource language processing; Kazakh script conversion; Transformer analytics; Prompt constraints; Loanword normalization; Multiscript NLP; Cross-attention

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Nurzhanova, A., Daurenov, M., & Ibrayeva, S. (2025). Prompt-Constrained Transformer Analytics for Low-Resource Script Conversion in Kazakh Language Processing. Journal of AI Analytics and Applications, 3(4), 33-48. https://doi.org/10.63646/jaiaa.2025.030403
