Main article

Yinyan Shi
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
Qiang Chen
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
Chuang Xia
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
Chuang Xia
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
Xiaochan Wang*
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
xcwang@njau.edu.cn
Xuekai Huang
College of Engineering, Nanjing Agricultural University, Nanjing 210031, Jiangsu, China
Kenji Tanaka
Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
Hiroshi Yamamoto
Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan

Abstract

Wheat diseases and nutrient stress represent critical threats to global food security, causing annual yield losses estimated at 10–28% in major producing regions. Timely and accurate spatial mapping of stress distribution is essential for precision intervention, yet conventional scouting methods are labor-intensive, subjective, and unable to capture fine-grained spatial heterogeneity at the field scale. This paper proposes an end-to-end framework that integrates Unmanned Aerial Vehicle (UAV) hyperspectral imaging with a transformer-based semantic segmentation model, SegFormer-B4, for simultaneous detection and spatial mapping of four wheat canopy classes: healthy canopy, stripe rust (Puccinia striiformis), powdery mildew (Blumeria graminis), and nitrogen deficiency. Hyperspectral imagery across 128 spectral bands (400–1000 nm) was acquired using a DJI M300 RTK UAV equipped with a Specim AFX10 pushbroom sensor over winter wheat fields in Jiangsu and Zhejiang provinces during the heading-to-filling growth stages. A dataset of 4,680 annotated image patches (256 × 256 pixels) was constructed through systematic sampling and multi-strategy data augmentation. The Mix Transformer (MiT-B4) encoder, pre-trained on ImageNet-22K and fine-tuned on the wheat hyperspectral dataset, captures multi-scale spatial-spectral features through hierarchical overlapping patch embeddings and efficient self-attention. Comparative evaluation against six baseline architectures (FCN-8s, U-Net with VGG-16, DeepLabv3+ with MobileNetV2, PSPNet with ResNet-50, Swin-T UperNet, and SegFormer-B2) demonstrates that SegFormer-B4 achieves a mean Intersection over Union (MIoU) of 92.8%, mean Pixel Accuracy (MPA) of 95.6%, Precision of 94.9%, and Recall of 94.6%, representing improvements of 3.9–20.4 percentage points in MIoU over the baselines. Disease area estimation on 12 independent field plots yields a maximum relative error below 2%, confirming strong practical applicability. Ablation analysis reveals that spectral band selection and multi-scale feature fusion together contribute 6.5 MIoU points over the base encoder, underscoring the critical role of hyperspectral feature exploitation in agricultural stress detection. The proposed framework provides a scalable, data-driven foundation for early warning systems and site-specific crop management.
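For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MIoU, MPA, macro-averaged precision, and the relative area error could be computed from integer label maps. It is not the authors' evaluation code: the function names (`confusion_matrix`, `segmentation_metrics`, `relative_area_error`), the class indexing (0 to 3), and the macro-averaging convention are assumptions made for illustration. Under the convention used here, MPA coincides with macro-averaged recall; since the abstract reports different values for MPA and Recall, the paper presumably averages at least one of them differently.

```python
import numpy as np

NUM_CLASSES = 4  # assumed order: healthy, stripe rust, powdery mildew, nitrogen deficiency


def confusion_matrix(y_true, y_pred, num_classes=NUM_CLASSES):
    """Count pixels per (true class, predicted class); rows are ground truth."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    flat_index = y_true * num_classes + y_pred
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def _safe_div(num, den):
    """Element-wise division that yields NaN where the denominator is zero."""
    out = np.full(num.shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out


def segmentation_metrics(cm):
    """Macro-averaged metrics from a confusion matrix (classes with no pixels are skipped)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled class c, but predicted otherwise
    iou = _safe_div(tp, tp + fp + fn)
    per_class_accuracy = _safe_div(tp, tp + fn)
    precision = _safe_div(tp, tp + fp)
    return {
        "MIoU": float(np.nanmean(iou)),
        "MPA": float(np.nanmean(per_class_accuracy)),
        "Precision": float(np.nanmean(precision)),
    }


def relative_area_error(y_true, y_pred, class_id):
    """|predicted area - reference area| / reference area for one class.

    Area is proportional to pixel count, so the ground sampling distance cancels.
    """
    true_pixels = int(np.sum(np.asarray(y_true) == class_id))
    pred_pixels = int(np.sum(np.asarray(y_pred) == class_id))
    if true_pixels == 0:
        raise ValueError("reference mask contains no pixels of this class")
    return abs(pred_pixels - true_pixels) / true_pixels


# Toy 2 x 4 label maps, purely illustrative (not data from the paper).
y_true = np.array([[0, 0, 1, 1], [2, 2, 3, 3]])
y_pred = np.array([[0, 0, 1, 2], [2, 2, 3, 3]])
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred))
print(metrics)
print(relative_area_error(y_true, y_pred, class_id=1))
```

In practice the confusion matrix would be accumulated over all test patches before the metrics are averaged, rather than computed per patch.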

Article details

How to Cite

Shi, Y., Chen, Q., Xia, C., Xia, C., Wang, X., Huang, X., Tanaka, K., & Yamamoto, H. (2026). UAV Hyperspectral Imaging and Transformer-Based Semantic Segmentation for Multi-Class Wheat Disease Stress Detection in Precision Agriculture. Journal of Intelligent Industrial Convergence, 6(1), 1-14. https://doi.org/10.63646/jiic.2026.060101