EAAI Journal 2026 Journal Article
An enhanced you only look once model for multi-class apple detection in natural orchard environments
- Xiaohang Liu
- Zhao Zhang
- Jiangfan Yu
- Wanjia Hua
- Xu Li
- Han Li
- Man Zhang
- Chayan Kumer Saha
Multi-class apple detection can improve the efficiency of automatic apple-picking robots. Existing studies classified apples into four occlusion types but struggled with clustered fruits and could not balance precision, speed, and model size. A robust Apple State You Only Look Once version 8 medium (AS-YOLOv8m) model was therefore proposed to classify apples into 11 classes according to their occlusion and clustering conditions. The core innovations were: (i) a cross-stage partial bottleneck module with deformable convolution, designed to enhance feature extraction and the modeling of geometric transformations; (ii) a space-to-depth convolution module embedded in the backbone network to improve small-target detection; (iii) removal of the large-target detection head to reduce model size; and (iv) the wise intersection over union box loss function, used to balance the loss contributions of high- and low-quality anchor boxes. The model was trained (5,845 images), validated (1,948 images), and tested (1,950 images) on 9,743 apple images, which were augmented from 1,149 original captures collected in commercial orchards under diverse lighting conditions. AS-YOLOv8m achieved a mean average precision of 95.8% across 11 classes, higher than the 95.4% obtained with 4 classes, and outperformed the other comparison models (<95.1%) and prior research results (<91.3%). The detection speed was 76.9 frames per second, and the model size was 36.2 megabytes. With its real-time capability, small model size, and high detection precision, the AS-YOLOv8m model is a promising multi-class apple detection method for further improving the effectiveness and efficiency of robotic picking.
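As an illustration of innovation (ii), the rearrangement step behind a space-to-depth module can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `space_to_depth`, the scale factor of 2, the channel ordering, and the use of plain nested lists instead of tensors are all choices made here for illustration. The idea is that each 2×2 spatial block is moved into the channel dimension, so resolution is halved without discarding any pixel values, which is why the operation is attractive for small targets. In a full module this rearrangement would typically be followed by a non-strided convolution, which is omitted here.

```python
def space_to_depth(x, scale=2):
    """Rearrange a [C][H][W] nested list into [C*scale*scale][H//scale][W//scale].

    Hypothetical helper for illustration only; the channel ordering
    (offset-major, then input channel) is an assumption.
    """
    channels = len(x)
    height = len(x[0])
    width = len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    # Each (dy, dx) offset selects one pixel from every scale x scale block.
    for dy in range(scale):
        for dx in range(scale):
            for ch in range(channels):
                sub = [
                    [x[ch][i][j] for j in range(dx, width, scale)]
                    for i in range(dy, height, scale)
                ]
                out.append(sub)
    return out


# Toy single-channel 4x4 feature map holding the values 0..15 row by row.
feature_map = [[[row * 4 + col for col in range(4)] for row in range(4)]]
packed = space_to_depth(feature_map)

# Every input value survives; only the layout changes.
flat_in = sorted(v for ch in feature_map for r in ch for v in r)
flat_out = sorted(v for ch in packed for r in ch for v in r)
```

On the toy input, the single 4×4 channel becomes four 2×2 channels, and `flat_in` equals `flat_out`, which shows that the downsampling is lossless, in contrast to strided convolution or pooling.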