UAV Hyperspectral Imaging and Transformer-Based Semantic Segmentation for Multi-Class Wheat Disease Stress Detection in Precision Agriculture
Abstract
Wheat diseases and nutrient stress are critical threats to global food security, with annual yield losses estimated at 10-28% in major producing regions. Timely and accurate spatial mapping of stress distribution is essential for precision intervention, yet conventional scouting is labor-intensive, subjective, and unable to capture fine-grained spatial heterogeneity at the field scale. This paper proposes an end-to-end framework that integrates Unmanned Aerial Vehicle (UAV) hyperspectral imaging with a transformer-based semantic segmentation model, SegFormer-B4, to detect and spatially map four wheat canopy classes simultaneously: healthy canopy, stripe rust (Puccinia striiformis), powdery mildew (Blumeria graminis), and nitrogen deficiency. Hyperspectral imagery in 128 spectral bands (400-1000 nm) was acquired with a DJI M300 RTK UAV carrying a Specim AFX10 pushbroom sensor over winter wheat fields in Jiangsu and Zhejiang provinces during the heading-to-filling growth stages. A dataset of 4,680 annotated image patches (256 x 256 pixels) was constructed through systematic sampling and multi-strategy data augmentation. The Mix Transformer (MiT-B4) encoder, pre-trained on ImageNet-22K and fine-tuned on the wheat hyperspectral dataset, captures multi-scale spatial-spectral features through hierarchical overlapping patch embeddings and efficient self-attention. In a comparative evaluation against six baseline architectures (FCN-8s, U-Net with VGG-16, DeepLabv3+ with MobileNetV2, PSPNet with ResNet-50, Swin-T UperNet, and SegFormer-B2), SegFormer-B4 achieves a mean Intersection over Union (MIoU) of 92.8%, a mean Pixel Accuracy (MPA) of 95.6%, a Precision of 94.9%, and a Recall of 94.6%, improving MIoU by 3.9-20.4 percentage points over the baselines. Disease area estimation on 12 independent field plots yields a maximum relative error below 2%, supporting the practical applicability of the approach. Ablation analysis shows that spectral band selection and multi-scale feature fusion together contribute 6.5 MIoU points over the base encoder, underscoring the importance of exploiting hyperspectral features in agricultural stress detection. The proposed framework provides a scalable, data-driven foundation for early warning systems and site-specific crop management.
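For readers unfamiliar with the segmentation metrics quoted above, MIoU and MPA are conventionally derived from a per-class confusion matrix. The following is a minimal sketch under the standard definitions, not the evaluation code used in this study; the function names, the class ordering, and the toy label sequences are illustrative assumptions.

```python
from typing import List, Sequence

# Assumed class ordering for illustration only.
CLASSES = ["healthy", "stripe_rust", "powdery_mildew", "nitrogen_deficiency"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, truly otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm: List[List[int]]) -> float:
    """Average of per-class pixel accuracy TP / (TP + FN) over classes present."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs)


# Toy example: eight flattened pixel labels, two per class.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 2, 3, 2]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
miou = mean_iou(cm)
mpa = mean_pixel_accuracy(cm)
print(f"MIoU = {miou:.4f}, MPA = {mpa:.4f}")
```

In practice the confusion matrix would be accumulated over all pixels of the test patches before the per-class ratios are averaged, so that large and small patches are weighted by pixel count.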
