Apnet: Lightweight network for apricot tree disease and pest detection in real-world complex backgrounds
Plant Methods volume 21, Article number: 4 (2025)
Abstract
Apricot trees are a critical agricultural resource and hold a significant role within the agricultural domain. Conventional methods for detecting their pests and diseases are notably labor-intensive. Many conditions affecting apricot trees manifest distinct visual symptoms that are well suited to precise identification and classification via deep learning. Despite this, the field currently lacks extensive, realistic datasets and deep learning strategies crafted specifically for apricot trees. This study introduces ATZD01, a publicly accessible dataset encompassing 11 categories of apricot tree pests and diseases, meticulously compiled under genuine field conditions. Building on this dataset, we propose APNet, a novel convolutional object detection framework, together with a dedicated module, the Adaptive Thresholding Algorithm (ATA), tailored to the detection of apricot tree afflictions. Experimental evaluations reveal that APNet attains an accuracy of 87.1% on ATZD01, surpassing all other leading algorithms tested, thereby affirming the effectiveness of our dataset and model. The code and dataset will be made available at https://github.com/meanlang/ATZD01.
Introduction
The apricot tree, a deciduous species of the genus Prunus in the family Rosaceae, is highly valued for its significant economic importance, drought tolerance, and adaptability to poor soils. This economically critical fruit tree is widely cultivated across various temperate regions worldwide. Its primary value lies in its edible fruits and medicinal properties. According to relevant research and statistical data, global apricot production is substantial and holds a prominent economic position in several countries. For instance, Turkey, a leading producer of apricots, accounts for approximately 22% of global production in certain years. Within Turkey, specific regions like Malatya Province contribute approximately 37.83% of the country’s total apricot yield [1, 2]. On a global scale, the average fresh apricot production from 2000 to 2007 was approximately 2.67 million metric tons. Furthermore, predictive models estimate that global apricot production will remain steady at around 3.76 million metric tons annually during the period from 2018 to 2025 [3].
Over thirty distinct diseases and pests afflict apricot trees, with major threats including Bacterial shot hole, Gummosis, Scabbed disease, Brown rot, Anthracnose, and pest infestations such as Mites, Carposina sasakii Matsumura, Aromia bungii, Cnidocampa flavescens Walker, and Hyphantria cunea. These ailments are primarily responsible for significant reductions in apricot yield [4]. Cultivating apricots in regions like Western Europe, North Africa, East Asia, and West Asia presents challenges due to the difficulty growers face in accurately diagnosing these afflictions, thereby complicating targeted management practices. In modern agriculture, timely and precise detection of such diseases and pests is crucial for ensuring crop yield and profitability, particularly for economically and medicinally valuable fruit trees like apricots. Traditionally, the detection of diseases and pests in orchards has relied heavily on manual inspections by agricultural experts or empirical judgments by farmers [5], a process that is both inefficient and difficult to scale in extensive apricot orchards.
With the rapid advancement of deep learning technologies within the realm of computer vision, the application of these methodologies for detecting crop diseases and pests has proven to be a robust solution [6,7,8,9]. Empirical observations and experimental results demonstrate that common diseases and pests affecting apricot trees display pronounced visual characteristics, which form a solid foundation for the deployment of deep learning techniques in their detection and classification [10]. Within contemporary agricultural practices, the use of visual algorithms for diagnosing and managing crop afflictions has established itself as a definitive trend, thereby enhancing the efficiency of crop management and cultivation strategies [11].
Despite the satisfactory performance exhibited by some algorithms in the domain of crop disease and pest detection, the specific field of apricot tree disease and pest detection continues to face numerous unresolved challenges [12]. A primary concern is the absence of a comprehensive, publicly accessible dataset dedicated to apricot tree afflictions. The available datasets for apricot tree disease and pest detection remain non-public and are primarily composed of images that do not originate from real-world conditions [13]. Such limitations tend to oversimplify the complexities involved in detecting diseases and pests in apricot trees. The dissemination and sharing of datasets are critical in this research field as they enable the comparison and exchange of diverse methodologies and theories, and also facilitate the enhancement and expansion of these datasets by future researchers, thereby driving forward the progress of research in apricot tree disease and pest detection. Additionally, within the broader field of crop disease and pest detection, a notable discrepancy exists across datasets concerning the same crop. Specifically, when models are trained on one dataset and tested on another, there is a significant decline in their performance [14]. This issue predominantly stems from many existing crop datasets comprising a substantial proportion of images sourced from the internet or created under artificial conditions, which drastically undermines their applicability in practical scenarios. Another notable challenge involves the intricate visual characteristics of apricot tree diseases and pests, where there is considerable similarity across different classes (e.g., Bacterial shot hole and Scabbed disease) and diverse visual manifestations within the same class (e.g., Bacterial shot hole displaying different characteristics on fruits versus leaves). These factors significantly compromise the accuracy and robustness of detection models, underscoring the need for more nuanced and contextually accurate datasets.
To enhance the practical application of our research, we have carefully curated a new, diverse, and temporally varied real-world dataset for apricot tree disease and pest detection, designated ATZD01. Compared to existing collections, this dataset stands out due to its distinct characteristics. Primarily, the original images are exclusively sourced from authentic real-world scenarios, captured across three distinct orchards over various time periods. This approach ensures that the dataset reflects complex scene transitions and backgrounds, closely aligning with actual application environments. Secondly, the imagery was captured under two distinct weather conditions, sunny and cloudy. Notably, images taken during sunny conditions include a substantial number affected by direct sunlight, introducing significant lighting variations. Finally, ATZD01 encompasses 6,055 images and 20,272 samples, with the majority containing multiple specimens across various classes, thus offering a rich diversity in viewing perspectives. To the best of our knowledge, ATZD01 represents the largest and most comprehensive public dataset available in the realm of apricot tree disease and pest detection. A more detailed exposition is provided in Section ATZD01 Dataset.
To address the second challenge, we propose the design of a bespoke network tailored to the complex visual characteristics exhibited by various apricot tree diseases and pests. Specifically, we have developed a new module called the Adaptive Thresholding Algorithm (ATA), which is dedicated to detecting pests and diseases in apricot trees. This module is designed to reconstruct the neck and backbone of the target detection network, enabling more effective utilization of shallow features that retain detailed information, thereby enhancing the network’s focus on multiple targets and challenging samples. Furthermore, we have introduced a new detection head, Dyhead, to further leverage multi-scale feature information, thereby enhancing the overall precision and robustness of the model.
In summary, our contributions can be summarized as follows:
-
We have collected and released a new, challenging, and large dataset, ATZD01. Unlike existing datasets, ATZD01 is publicly available and sets a more realistic and challenging benchmark for apricot tree disease and pest detection tasks.
-
We have introduced a comprehensive network, named APNet, specifically designed for the detection of diseases and pests in apricot trees. This network effectively identifies 11 distinct classes of apricot tree afflictions and demonstrates superior performance compared to all previously tested models.
-
We have developed a specialized module, ATA, specifically for apricot tree pest and disease detection, which reconstructs the neck network and backbone of the detection model. This enhancement utilizes detailed information to further augment the model’s ability to detect challenging samples and improve its overall resistance to interference.
Related work
With the widespread application of deep learning technologies in agriculture, particularly in the detection of crop pests, researchers have developed various deep learning-based algorithms that significantly enhance detection accuracy and efficiency [15]. Notably, foundational networks such as Fast R-CNN [16], SSD [17], Mask R-CNN [18], and the YOLO series [19] have been extensively adapted and refined for deployment in the detection of diseases and pests across a range of crops.
Li and Rai [20] utilized ResNet18 to directly study the identification and classification of diseased and healthy apple leaves, achieving an accuracy rate of 98.5%. However, the dataset used was overly simplistic, limiting its applicability to more realistic and complex field scenarios. Thenmozhi and Reddy [21] proposed an effective deep CNN model, employing transfer learning to fine-tune a pretrained model. This approach significantly enhanced the model’s transferability for classifying insect species across three public insect datasets. Sardogan et al. [22] employed a classification method based on CNN models and Learning Vector Quantization (LVQ) algorithms for the detection of diseases in tomato leaves. This approach effectively utilized information from regions of interest, enhancing the precision of disease identification. Zhang et al. [23] proposed a pest classification and identification method that incorporates hard sample mining and residual networks. This approach significantly improved accuracy and enhanced the network’s capability to detect small targets. Afzaal et al. [24] enhanced the robustness and noise immunity of their model by employing a Mask R-CNN with a ResNet backbone for detecting strawberry diseases under seven complex background conditions, achieving an average precision of 82.43%. This approach significantly improved the model’s performance in challenging environments. Liu et al. [25] introduced an enhanced CNN methodology for identifying grape leaf diseases. The model employs depthwise separable convolutions instead of standard convolutions to reduce overfitting and the number of parameters. By applying an initialized architecture, the model enhances its ability to extract multi-scale features, thereby adapting to grape leaf lesions of varying sizes. This approach resulted in an accuracy of 97.22%.
In the detection of crop diseases and pests, attention mechanisms have been demonstrated to be succinct and effective. Consequently, many researchers opt to employ various attention mechanisms and networks to tailor the detection processes to specific crop afflictions. Xue et al. [26] enhanced the YOLOv5 detection framework by integrating self-attention and convolutional attention modules, achieving an accuracy of 82.6% in detecting diseases and pests on tea tree leaves. Hu et al. [27] proposed a deep neural network, YOLO-GBS, which incorporates a global context attention mechanism to locate targets against complex backgrounds. The model incorporates a self-attention mechanism that leverages global contextual information, significantly improving both accuracy and robustness, and achieving a mAP of 79.8%. Zhao et al. [28] developed SEV-Net, a deep learning-based attention network model for plant disease recognition. They embedded an improved channel and spatial attention module into the residual blocks of ResNet, reducing information redundancy between channels and focusing on the most informative regions of the feature maps. This integration significantly enhanced both the speed of recognition and the accuracy of detection, achieving an accuracy of 95.37%.
In the field of apricot tree disease and pest detection using deep learning, current research is notably sparse, primarily due to the absence of publicly available large-scale datasets specific to apricot tree afflictions. Türkoğlu and Hanbay [29] applied CNN models to the domain of apricot tree disease detection, employing K-Nearest Neighbor (KNN) algorithms to classify deep features extracted by various CNN architectures. This approach achieved high accuracy in identifying specific diseases affecting apricot trees. Han et al. [30] integrated deep learning techniques with multiple data augmentation strategies to develop an Adaptive Sampling Latent Variable Network (ASLVN) framework, combined with spatial state attention. This innovative approach significantly improved the detection capability of apricot tree pests and diseases in complex environments, achieving an accuracy of 90%. As shown in Table 1, we have summarized the advantages and limitations of each key algorithm.
Fig. 1 Illustration of Sample Categories in the ATZD01 Dataset. The categories are as follows: (a). Mite Infestation; (b). Bacterial Shot Hole; (c). Gummosis; (d). Scabbed disease; (e). Brown Rot; (f). Anthracnose; (g). Carposina sasakii Matsumura Infestation; (h). Aromia Bungii Infestation; (i). Cnidocampa Flavescens Walker Infestation; (j). Hyphantria Cunea Infestation; (k). Chilocorus Rubidus Hope
ATZD01 dataset
Overview of previous datasets
Existing apricot tree disease and pest detection datasets, despite their deep learning-based initiatives as demonstrated in Table 2, have enabled the development of models that exhibit discernible capabilities in real-world scenarios through the use of annotated training data. However, these datasets suffer from five significant limitations. Firstly, both datasets are either non-public or semi-public, which stifles further progress in the field as other researchers cannot validate the performance of deep learning models on the same datasets. Secondly, the existing datasets do not provide a sufficient number of samples, particularly in comparison with real apricot orchard scenes; the largest dataset [30] contains only 9612 samples as shown in Table 2. Thirdly, there is an inadequacy of real samples within these datasets, which include images sourced from the internet and simulated settings. While these samples may enhance detection accuracy, they fail to mimic actual field conditions accurately. Fourthly, the current datasets focus only on pests without considering natural predators, which limits the comprehensive understanding of the apricot orchard ecosystem by management personnel. Lastly, there is a lack of challenging samples in the existing datasets, with no significant variation in scenes or lighting conditions. These limitations highlight the necessity of creating a larger, more realistic, publicly accessible dataset for apricot tree disease and pest detection.
Description of ATZD01
In response to the aforementioned limitations, we endeavored to emulate the working environments of orchard cultivators and forest conservationists as closely as possible by assembling a new, diverse, and multi-scenario apricot tree disease and pest detection dataset (ATZD01). This dataset was compiled over a five-month period using six different devices, including a Canon 6D DSLR camera. Images were captured under varying weather conditions, both sunny and overcast, throughout the morning, afternoon, and dusk periods. These images were collected from three distinct apricot orchards and subsequently annotated manually by agricultural experts and volunteers.
Figure 1 illustrates the 11 categories of ATZD01, which encompass five major diseases, five primary pests of apricot trees, and one beneficial insect species. For ease of subsequent experimental demonstration, we have defined abbreviations as follows: Mite Infestation, where mites typically aggregate on the underside of apricot leaves, causing leaf damage and affecting photosynthesis; Bacterial Shot Hole (BSH), known to induce extensive leaf drop and significantly impact apricot yields; Gummosis, leading to necrosis in the subcortical tissues of apricot trees, resulting in yellowing leaves and premature leaf drop; Scabbed disease, detrimental to apricot fruit and prone to causing fruit cracking; Brown Rot, primarily affecting apricot fruit, resulting in fruit decay and shriveling; Anthracnose, a major threat to apricot fruit, often causing extensive fruit rot and substantial losses; Carposina sasakii Matsumura Infestation (CMI), frequently causing fruit deformities and facilitating fruit rot; Aromia bungii Infestation (ABI), an omnivorous pest impacting both fruit and tree trunks, reducing apricot yield and lifespan; Cnidocampa Flavescens Walker Infestation (CFWI), known to consume apricot leaf tissues, potentially stripping leaves down to petioles and main veins, with toxic effects on human skin; Hyphantria cunea Infestation (HCI), capable of defoliating apricot trees entirely by gradually consuming all leaf tissues; and Chilocorus Rubidus Hope (CRH), a predator of pests such as Didesmococcus koreanus that provides partial protection to apricot orchards. We believe that thorough detection of these 11 categories can assist orchard managers and cultivators in more effectively protecting and cultivating apricot orchards.
Figure 2 displays a selection of challenging samples from the ATZD01 dataset. We believe that these challenging samples make the dataset more representative of actual detection scenarios and enhance the robustness and interference resistance of the models trained on it. Compared to existing datasets, the novel features of ATZD01 can be summarized as follows:
-
More Categories, Images, and Samples: To our knowledge, ATZD01 is currently the largest publicly available dataset for apricot tree pest and disease detection. ATZD01 comprises 20,272 samples and 6,055 images, significantly exceeding the size of other datasets.
-
Complex Scenes and Backgrounds: ATZD01 includes the most challenging samples. As shown in Fig. 2, we have collected images with direct sunlight, long-distance multiple-sample images, images of the underside of leaves, and low-light samples. These aspects were not emphasized in previous datasets. These images introduce complex backgrounds and scene variations, thus making ATZD01 more appealing, challenging, and closer to real-world application scenarios.
-
Dramatic Lighting Variations Across Multiple Time Periods: ATZD01 was collected over a period of 5 months, encompassing both sunny and cloudy days during morning, afternoon, and dusk. It thus simulates real-world application scenarios better than previous datasets while also introducing more dramatic lighting variations.
Proposed method
Considering the limited hardware capabilities of the devices used for image collection in apricot orchards, this paper employs YOLOv8n as the foundational network due to its minimal weight and parameter requirements. However, YOLOv8n encounters several challenges in practical detection applications. Firstly, the complexity of the background environment in apricot tree pest and disease detection tasks significantly interferes with the detection process, particularly when dealing with multiple and small targets. Additionally, variations in lighting conditions cause the visual characteristics of the same pest or disease to differ across images, leading to potential false positives or missed detections in YOLOv8n. To address these issues, we have designed a specialized module called ATA, tailored specifically for apricot tree pest and disease detection; this module completely replaces the C2f module. Additionally, we introduce a new detection head, DyHead, as referenced in [32]. Based on these modifications, we propose APNet, a network specifically designed for pest and disease detection in apricot orchards. The architecture of APNet is illustrated in Fig. 3.
ATA
Although the C2f strategy can enhance model performance in general object detection tasks, it encounters specific challenges in apricot tree pest and disease detection, where multi-scale targets are prevalent. The primary aim of C2f is to improve scale invariance by detecting the same target across multiple scales, thereby enhancing detection capabilities. While this approach does preserve more fine-grained features, it often fails to effectively retain deeper, coarse-grained features during the transition from coarse to fine resolutions. Additionally, in the complex scenarios characteristic of apricot tree pest and disease detection, the inability of C2f to effectively utilize contextual information from coarse-grained features limits the overall performance of the model [33]. To address these challenges, as illustrated in Fig. 4, we propose a novel Adaptive Thresholding Algorithm (ATA) that plays a crucial role in optimizing feature selection and enhancing attention to relevant targets in complex scenes. The ATA dynamically adjusts the detection threshold based on contextual information from the image. It integrates global average pooling with a local enhancement mechanism to compute thresholds that adapt to varying local illumination and background conditions. By selectively emphasizing high-confidence regions, this approach effectively reduces false positives and improves the model’s robustness under fluctuating lighting conditions and cluttered backgrounds, ensuring accurate predictions even when pests or diseases are partially occluded or appear under challenging lighting.
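To make this mechanism concrete, the sketch below shows one plausible PyTorch reading of the adaptive-thresholding idea described above. The exact layers of ATA are not published here, so the module name, layer choices, and soft-gating form are illustrative assumptions rather than the implemented network.

```python
import torch
import torch.nn as nn

class AdaptiveThresholdSketch(nn.Module):
    """Illustrative reading of ATA (layer choices are assumptions): a global
    statistic from average pooling biases a locally enhanced map to form a
    per-location threshold; responses below it (likely background) are
    softly suppressed."""
    def __init__(self, channels: int):
        super().__init__()
        self.global_fc = nn.Linear(channels, channels)                 # global context
        self.local_conv = nn.Conv2d(channels, channels, 3, padding=1)  # local enhancement

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_fc(x.mean(dim=(2, 3)))                         # (B, C) global statistic
        thresh = torch.sigmoid(self.local_conv(x) + g[:, :, None, None])
        return x * torch.sigmoid(x - thresh)                           # soft adaptive thresholding
```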
ApNeck
In the task of detecting diseases and pests in apricot trees, the most challenging samples often involve multiple or small targets appearing simultaneously within complex environments. This scenario not only necessitates the critical information provided by deep features but also requires the extensive detail encapsulated in shallow features to enrich the final feature representation, thereby enhancing its discriminative power. To better utilize the detailed information in shallow features while maintaining the network’s focus on critical information without overly emphasizing shallow layers, we have designed a new neck structure, ApNeck, as depicted in Fig. 5. ApNeck is specifically designed to integrate shallow and deep features, aiming to enhance the model’s performance in complex detection scenarios. By adopting a multi-branch approach, it balances these two types of features, first processing shallow features to capture fine-grained details, and then integrating deep features to provide contextual information and abstract representations. This hybrid approach ensures that the model effectively handles both detailed and contextual information, thereby improving its accuracy and robustness when detecting challenging samples, such as pests hidden beneath leaves or in low-light environments.
While precision is crucial, computational complexity is equally significant, as it determines the feasibility of deploying the trained model on the intended devices. To reduce computational demands while maintaining accuracy, we utilize Partial Convolution (PConv) to further filter extraneous information from feature maps [34]. This approach performs spatial feature extraction on a subset of the input channels without touching the others. PConv designates either the first or the last \(c_p\) consecutive channels of the complete feature map as representatives, effectively eliminating unnecessary calculations. Define h and w as the height and width of the input feature map, c as the number of input channels, \(c_p\) as the number of channels engaged in the convolution, and k as the size of the convolution kernel. The computational load of PConv is then given by:
\(F_{\text{PConv}} = h \times w \times k^2 \times c_p^2\)
Because only \(c_p\) channels are involved in spatial feature extraction, the information in the remaining channels is preserved for subsequent layers. This approach significantly reduces both the computational load and the memory access requirements.
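A minimal PyTorch sketch of PConv, following the description in [34], is given below; the `partial_ratio` default is an assumption chosen for illustration.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Sketch of Partial Convolution after Chen et al. [34]: a k x k convolution
    runs on the first c_p channels only, while the remaining channels pass
    through untouched, so FLOPs drop to h * w * k^2 * c_p^2."""
    def __init__(self, channels: int, partial_ratio: float = 0.25, k: int = 3):
        super().__init__()
        self.cp = max(1, int(channels * partial_ratio))   # channels that are convolved
        self.conv = nn.Conv2d(self.cp, self.cp, k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x[:, :self.cp], x[:, self.cp:]       # split along the channel dim
        return torch.cat((self.conv(head), tail), dim=1)  # untouched channels preserved
```

With the default ratio of 1/4, only c/4 channels are convolved, so the spatial convolution costs 1/16 of a full k × k convolution over all c channels.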
In the task of detecting diseases and pests in apricot trees, the real-world environment presents an exceedingly complex challenge due to the presence of a vast amount of extraneous information, which complicates the feature extraction process for the model. A common approach to mitigate this issue involves channel dimension reduction to model cross-channel relationships for visual feature extraction. However, this strategy may inadvertently disrupt the extraction of critical information. To preserve the specific disease and pest information in each channel while reducing computational demands, we restructure certain channels into batch dimensions and group channel dimensions into multiple sub-features. This guarantees an even spread of spatial semantic features across each feature group. In particular, alongside encoding global information to recalibrate channel weights in every parallel branch, cross-dimensional interactions are utilized to amalgamate output features from both branches, thereby capturing pixel-level pairwise relationships.
Initially, we utilize a parallel substructure to prevent the loss of detailed features caused by complex sequential processing and deep convolution, thereby averting performance degradation. This substructure also leverages pixel-level attention features and aggregates multi-scale spatial structural information by placing \(1 \times 1\) and \(3 \times 3\) convolutions in parallel, effectively reducing computational latency. For any given input feature map \(X \in R^{C \times H \times W}\), we divide it into G sub-features, denoted as \(X=\left[ X_0, X_1, \ldots , X_{G-1}\right]\), where \(X_i \in R^{C//G \times H \times W}\) and typically \(G \ll C\); attention weights are then applied to enhance the focal areas within each sub-feature.
We utilize three branches to derive weight descriptors from the grouped feature maps. In the branch equipped with the \(1 \times 1\) convolution, global average pooling is applied to encode channels along their dimension, whereas the \(3 \times 3\) convolution branch skips normalization and average pooling to preserve multi-scale feature representations. 2D global average pooling is then applied to the output of the \(1 \times 1\) branch, enabling the capture of comprehensive spatial information. Subsequently, the output features of the two branches are reshaped into the corresponding dimensional forms \(R^{1 \times C//G}\) and \(R^{C//G \times H \times W}\). The pooling formula is as follows:
\(z_c = \frac{1}{H \times W} \sum _{j=1}^{H} \sum _{i=1}^{W} x_c(i, j)\)
Cross-spatial learning across the three branches broadens the feature space and adeptly captures the dependencies among them, conserving spatial structural details within the channels and retaining accurate positional information. A Softmax normalization function is applied to the 2D global average pooling output to accommodate the subsequent linear transformations. Finally, the output features of the three branches are aggregated to generate two spatial attention weights that emphasize the connections between contextual pixels, ensuring that the final output retains the same dimensions as the input feature map (\(X \in R^{C \times H \times W}\)).
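The following sketch condenses the grouped, parallel \(1 \times 1\)/\(3 \times 3\) design and the cross-spatial weighting into a compact PyTorch module. It is a simplified reading of the mechanism described above (cf. the cross-spatial attention of Ouyang et al. [33]); the group count, fusion order, and final sigmoid are assumptions rather than the exact published layers.

```python
import torch
import torch.nn as nn

class CrossSpatialAttentionSketch(nn.Module):
    """Condensed reading of the grouped parallel-branch attention: channels are
    folded into the batch dimension as G groups, a 1x1 and a 3x3 branch run in
    parallel, and each branch's pooled descriptor (Softmax-normalized) weights
    the other branch's spatial map to form two spatial attention weights."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.g = groups
        c = channels // groups
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.gap = nn.AdaptiveAvgPool2d(1)            # 2D global average pooling
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        xg = x.reshape(b * self.g, ch // self.g, h, w)    # groups -> batch dimension
        f1, f3 = self.conv1x1(xg), self.conv3x3(xg)       # parallel branches
        q1 = self.softmax(self.gap(f1).flatten(1))        # (b*G, C//G) descriptors
        q3 = self.softmax(self.gap(f3).flatten(1))
        w13 = torch.einsum('nc,nchw->nhw', q1, f3)        # two cross-spatial
        w31 = torch.einsum('nc,nchw->nhw', q3, f1)        # attention weights
        weights = torch.sigmoid(w13 + w31).unsqueeze(1)   # (b*G, 1, H, W)
        return (xg * weights).reshape(b, ch, h, w)        # same shape as input
```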
Dyhead
In the task of detecting pests and diseases in apricot trees, the current detection head displays significant limitations that impede its ability to adapt to the unique requirements of this task. First, the existing detection head employs a traditional single-scale prediction structure, which is suboptimal for managing the multi-scale targets prevalent in apricot orchard imagery. Second, it fails to consider a substantial amount of contextual information within these images, thereby lacking a comprehensive global perspective. Furthermore, due to its limited parameter size, it is inadequate at capturing spatial structural information from the features. To overcome these challenges, we adopt DyHead, a detection head based on a dynamic mechanism. DyHead dynamically adjusts the weights of various features, thereby enhancing the extraction of multi-scale features. The configuration of a single DyHead unit, as well as multiple concatenated DyHead units, is depicted in Fig. 6. Unlike traditional detection heads that use fixed receptive fields, DyHead dynamically adjusts the attention weights of each feature map based on the scale and relevance of the detected targets. It employs a multi-level attention mechanism: scale-aware attention ensures that features from different scales (such as small and large targets) are appropriately weighted; spatial attention focuses on the regions of the image that contain the most informative content, particularly where targets overlap or appear under challenging visual conditions; and task-aware attention further adjusts the focus based on the specific type of pest or disease being detected, thereby enhancing detection accuracy across categories. Owing to this multi-level attention mechanism, DyHead maintains high detection accuracy even when small pests or diseases are located in difficult-to-detect areas or surrounded by clutter.
DyHead utilizes a self-attention mechanism to effectively integrate scale-aware attention, spatial attention, and task-aware attention. Specifically, given a three-dimensional feature tensor \(F \in \textrm{R}^{L \times S \times C}\) at the detection layer, the attention computation is formulated as follows:
\(W(F)=\pi _C\left( \pi _S\left( \pi _L(F) \cdot F\right) \cdot F\right) \cdot F\)
where F represents an input three-dimensional tensor \(L \times S \times C\), while \(\pi _L(\cdot )\), \(\pi _S(\cdot )\), and \(\pi _C(\cdot )\) denote scale-aware attention, spatial-aware attention, and task-aware attention, respectively. These attentions are applied to the L, S, and C dimensions of the tensor F.
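A toy PyTorch rendering of this nested attention may help. Each \(\pi\) below is approximated by pooling over the other two dimensions followed by a learned sigmoid gate; the published DyHead [32] instead uses hard-sigmoid gating, deformable convolution for \(\pi _S\), and a dynamic ReLU for \(\pi _C\), so this sketch only mirrors the structure of the equation.

```python
import torch
import torch.nn as nn

class DyHeadSketch(nn.Module):
    """Toy rendering of W(F) = pi_C(pi_S(pi_L(F) . F) . F) . F for a tensor of
    shape (batch, L, S, C). Each pi_* is a pooled statistic passed through a
    learned sigmoid gate; see Dai et al. [32] for the full mechanism."""
    def __init__(self, L: int, S: int, C: int):
        super().__init__()
        self.gate_L = nn.Sequential(nn.Linear(L, L), nn.Sigmoid())  # scale-aware
        self.gate_S = nn.Sequential(nn.Linear(S, S), nn.Sigmoid())  # spatial-aware
        self.gate_C = nn.Sequential(nn.Linear(C, C), nn.Sigmoid())  # task-aware

    def forward(self, F: torch.Tensor) -> torch.Tensor:
        F = F * self.gate_L(F.mean(dim=(2, 3)))[:, :, None, None]     # pi_L(F) . F
        F = F * self.gate_S(F.mean(dim=(1, 3)))[:, None, :, None]     # pi_S(.) . F
        return F * self.gate_C(F.mean(dim=(1, 2)))[:, None, None, :]  # pi_C(.) . F
```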
Experiments and discussions
Dataset
ATZD01 is a publicly available large-scale dataset specifically designed for detecting pests and diseases in apricot trees. It includes data collected from three apricot orchards located in different geographical regions. The images were captured under various seasonal and climatic conditions, as well as at different times of day, including morning, afternoon, and evening. This approach was employed to account for the influence of lighting variations on pest and disease detection. The ATZD01 dataset consists of 6,055 images and 20,272 samples, covering 11 categories of apricot tree pests and diseases. These categories encompass five major diseases, five key pests, and one beneficial insect species. Each image typically contains multiple instances of various pests and diseases, offering a diverse and challenging set of samples.
All images underwent standardized preprocessing, which included resizing, data augmentation (such as rotation, scaling, cropping, and flipping), and annotation. To ensure the fairness and effectiveness of model training and evaluation, the dataset was split into training, validation, and test sets in an 8:1:1 ratio.
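For reproducibility, the 8:1:1 split can be expressed directly in PyTorch. The sketch below uses a placeholder dataset of 6,055 indices in place of the real annotated images, and the fixed seed is an assumption.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset: one index per ATZD01 image (6,055 in total).
dataset = TensorDataset(torch.arange(6055))
n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)            # 4844 and 605
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],    # 8:1:1 -> 4844/605/606
    generator=torch.Generator().manual_seed(0))        # assumed fixed seed
print(len(train_set), len(val_set), len(test_set))
```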
Experiments environment
To ensure the efficiency and scalability of the model, we conducted experiments on a high-performance hardware platform and implemented the model using mainstream deep learning frameworks.
In terms of hardware, all experiments were conducted on a high-performance computing server equipped with dual NVIDIA RTX 3090 GPUs and an E5-2680 CPU, which facilitated the acceleration of both model training and inference processes. Additionally, the server is outfitted with 64GB of RAM, enabling large-scale data processing and parallel computation, thereby ensuring efficiency when handling complex scenarios and large datasets. For the software environment, we set up the experimental platform on the Ubuntu 20.04 operating system. The model implementation was based on Python 3.8, utilizing the PyTorch 1.10 framework, with key libraries including NumPy, OpenCV, Matplotlib, and scikit-learn. Furthermore, as detailed in Table 3, we adjusted several hyperparameters to optimize the training process and enhance its efficiency.
Evaluation metrics
To assess the experimental outcomes, we utilize Precision (P), Recall (R), Mean Average Precision (mAP), F1-Score (F1), and Giga Floating-Point Operations (GFLOPs) as evaluation metrics [35,36,37,38]. Precision is defined as the proportion of accurately predicted positive observations among all predicted positives. The formula for calculation is as follows:
\(P = \frac{TP}{TP + FP}\)
Recall is defined as the proportion of actual positives that are correctly identified by the model. The formula for calculation is as follows:
\(R = \frac{TP}{TP + FN}\)
where TP refers to the number of samples correctly predicted in the detection results, FP represents the number of samples incorrectly predicted, and FN denotes the count of actual positives that were not detected. The mAP considers the precision and recall for m categories, providing a comprehensive reflection of the network’s performance. The formula for calculation is as follows:
\(mAP = \frac{1}{m} \sum _{i=1}^{m} AP_i\)
In the term mAP, m stands for mean, with the average taken over the m categories. AP50% refers to the average precision for a category when the Intersection over Union (IoU) threshold in the confusion matrix is set at 50%. mAP50% is calculated by averaging the AP of all categories, reflecting the model’s ability to maintain high precision as the recall rate varies; a higher value indicates that the model can maintain high precision at a high recall rate. Meanwhile, mAP50%-95% represents the average mAP across IoU thresholds ranging from 50% to 95% in increments of 5%. GFLOPs is a metric used to measure model complexity; a lower value indicates that the model requires less computational power. The F1 score balances precision and recall, making it a more effective measure of the model’s performance on imbalanced datasets. The formula for calculation is as follows:
\(F1 = \frac{2 \times P \times R}{P + R}\)
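These definitions translate directly into code, as sketched below. The counts in the usage line are hypothetical and chosen only so the output lands near the precision and recall values reported later for APNet.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall and F1 exactly as defined above; mAP additionally
    averages AP over the m categories and over IoU thresholds."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical counts for one run:
print(detection_metrics(tp=850, fp=126, fn=274))  # ~(0.871, 0.756, 0.810)
```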
Comparative experiments
To ascertain the superiority of APNet over currently popular object detection models and to further assess the algorithm’s performance on various metrics, we conducted a comparative analysis between APNet and other mainstream algorithms, as detailed in Table 4. To ensure that the algorithm meets the lightweight requirements necessary for apricot tree pest and disease detection, we selected the lightest model among the discussed algorithms for evaluation.
According to the results in Table 4, it is evident that SSD and Faster R-CNN, despite their historical significance in object detection, exhibit large parameter counts and GFLOPs, making them unsuitable for lightweight and real-time applications such as apricot tree pest and disease detection. While YOLOv10 achieves the best results in terms of lightweight metrics, including the smallest parameter count (2.3M) and lowest GFLOPs (6.7), its overall detection performance in metrics such as precision (81.3%) and mAP50-95 (41.4%) does not meet the demanding requirements of this task. In contrast, APNet stands out by achieving the highest performance across all major metrics, including precision (87.1%), recall (75.6%), and mAP50-95 (43.6%). This highlights the model’s ability to detect complex and small-scale targets effectively, even in challenging environments. Additionally, APNet achieves a balanced trade-off between accuracy and efficiency, with its GFLOPs (8.0) and parameter count (2.79M) being close to the best values, making it highly competitive for deployment on resource-constrained devices. Compared to YOLOv8, APNet significantly improves across all six metrics, particularly excelling in precision and mAP50-95, demonstrating the robustness of the proposed innovations such as the ATA and DyHead. These improvements justify the slight increase in computational cost and parameter count, offering substantial gains in detection accuracy and reliability. Overall, APNet’s performance validates the effectiveness of its tailored design for real-world apricot tree pest and disease detection.
To further illustrate the distinctions between APNet and other high-performing object detection algorithms on the ATZD01 dataset, we present heatmaps based on GRAD-CAM [49] in Fig. 7. We selected three images from different environmental conditions as examples: from top to bottom, these images showcase targets in a complex background with multiple samples, low-light conditions with dual samples, and standard conditions. The figure clearly demonstrates that RT-DETR, due to the transformer architecture’s self-attention mechanism, boasts a wider receptive field, making it more efficient in pinpointing various targets amidst intricate backgrounds. However, it struggles to focus on challenging targets in low-light scenarios. YOLOv5, YOLOv8, and YOLOv10 all exhibit certain limitations due to their focus on deeper features, thus overlooking the coarse-grained details provided by shallower features. This results in their inability to focus on all targets when faced with hard samples. Although YOLOv8 performs relatively well in low-light conditions, it still neglects some potential challenging targets. Enhanced by the redesigned ATA and the newly introduced DyHead, APNet effectively utilizes both coarse-grained information and detailed features, dynamically adjusting the weights of different features. This results in exceptional performance across a wide range of complex scenarios.
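For readers who wish to reproduce such heatmaps, the sketch below implements the core of Grad-CAM [49] with forward/backward hooks. It assumes a model that returns class logits; for a detector like APNet, the score of a chosen detection would replace the class logit, and the target layer would be picked from the backbone or neck.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM: weight each channel of the target layer's activation
    by the mean gradient of the chosen score, sum, ReLU, and upsample."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, class_idx]        # assumes (batch, n_classes) logits
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)                     # channel importance
    cam = torch.relu((w * feats[0]).sum(dim=1, keepdim=True)).detach()
    cam = F.interpolate(cam, size=image.shape[-2:], mode='bilinear',
                        align_corners=False)                        # back to image size
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalize to [0, 1]
```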
As illustrated in Fig. 8, we selected three images featuring challenging targets: from top to bottom, these are images of small targets with multiple samples, low-light conditions, and multiple targets under direct sunlight. The figure also displays the differences between APNet and three other algorithms. It is evident from the image that the other three algorithms exhibit certain limitations, specifically their inability to effectively utilize detail information from shallow features and insufficient use of spatial structural information. This inadequacy leads to numerous false positives and missed detections across the three types of challenging targets. In images with multiple samples, the other three algorithms all exhibited instances of missed detections and tended to overlook less conspicuous targets. In low-light condition images, RT-DETR missed detections, whereas YOLOv10 generated false positives. When presented with images containing multiple targets under direct sunlight, both RT-DETR and YOLOv8 struggled to detect targets set against intricate backgrounds, and YOLOv10 once again produced false positives. Conversely, APNet, leveraging the dynamic weight adjustment of DyHead and the enhancement of detailed information by ATA, demonstrated superior performance over the other three algorithms in scenarios featuring challenging targets.
To further analyze the training process of APNet, we present the variation trends of the loss function and related evaluation metrics in Fig. 9. From the figure, it can be observed that both the loss function curve and the evaluation metric curves exhibit a smooth and consistent convergence trend. This indicates that the hyperparameter configuration is appropriate and that the model is capable of efficient training. Moreover, the loss curves for the training and validation sets remain closely aligned, with no significant divergence observed in later stages. This suggests that the model demonstrates strong generalization capabilities on the complex apricot pest and disease dataset.
As shown in Table 5, we conducted K-FOLD experiments to further demonstrate the performance of APNet on imbalanced datasets. The K-FOLD cross-validation results demonstrate the robustness and generalizability of APNet across varying data splits. With an average precision of 87.5%, recall of 79.7%, and F1-score of 83.4%, the model maintains stable and high performance across all folds. The slight variations between folds reflect APNet’s ability to adapt to diverse scenarios without overfitting, showcasing its reliability in real-world apricot tree pest and disease detection tasks. This consistent performance further validates the model’s effectiveness and practical applicability.
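A minimal sketch of the K-FOLD protocol using scikit-learn's `KFold` follows; the fold count of 5 is an assumption, and the `train_and_eval` helper is hypothetical, returning the reported fold averages as placeholders so the script runs end to end.

```python
import numpy as np
from sklearn.model_selection import KFold

def train_and_eval(train_idx, val_idx):
    """Hypothetical stand-in for one full APNet training + evaluation run on
    the given split; the constants are the reported averages, used only as
    placeholders."""
    return 0.875, 0.797, 0.834

indices = np.arange(6055)                              # one index per image
kf = KFold(n_splits=5, shuffle=True, random_state=0)   # assumed 5 folds
fold_scores = [train_and_eval(tr, va) for tr, va in kf.split(indices)]
print(np.mean(fold_scores, axis=0))                    # mean P / R / F1
```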
Ablation studies
To further explore the efficacy and specific performance enhancements provided by the designed ATA and the introduced DyHead, we conducted detailed ablation studies on the various architectural modifications of the model, as detailed in Table 6. Analysis of the experimental results indicates that both ApNeck and DyHead independently enhance the overall network’s detection performance, with ApNeck showing more substantial improvements. This is primarily due to its more effective utilization of detailed information from lower-level features, which significantly strengthens the robustness of the extracted features. DyHead excels in dynamically aggregating multi-scale information, thus enhancing the overall model performance. Specifically, we investigated the impact of PConv on ApNeck. Integrating PConv enhances the model’s ability to allocate weights to crucial and detailed information, preventing the model from focusing excessively on minutiae and instead distributing attention weights more judiciously. We observed that combining ApNeck and DyHead yields the best results.
As depicted in Fig. 10, we selected images from three different scenarios for analysis (direct sunlight background, complex multi-target background, and complex multi-sample multi-target background) and visualized them using GRAD-CAM heatmaps. The images illustrate that while using ApNeck or DyHead alone outperforms the baseline network, each still fails to focus effectively on all samples and lacks sufficient resistance to interference. When combined, however, the model not only utilizes detailed information to enhance attention to hard samples but also greatly reduces missed detections, allocating attention weights more rationally.
To further investigate the precision of APNet across each category within the ATZD01 dataset and to compare it item-by-item with the baseline network YOLOv8, Fig. 11 illustrates the precision differences between the two algorithms across all 11 categories. As depicted, APNet demonstrates improved precision in all 11 categories compared to YOLOv8. This enhancement is attributed to the incorporation of more detailed information and multi-scale features. It is worth mentioning that APNet exhibits remarkable advancements in detecting categories like Gummosis and CMI, which often emerge amidst intricate backgrounds, as well as Scabbed Disease, prevalent in settings exposed to direct sunlight. Particularly for the most challenging categories, Mite Infestation and BSH, APNet benefits from the redesigned ATA, which enhances multi-scale information processing. This enhancement enables APNet to effectively detect difficult samples such as Mite Infestation and BSH. As a result, the detection precision for these two most challenging categories of apricot tree diseases is significantly improved, owing to the richly informative final features provided by ATA.
To further demonstrate the robust discriminative capability of APNet for detecting pests and diseases in apricot trees, we employed t-SNE [50], as illustrated in Fig. 12, to visualize the feature distributions of the raw ATZD01 dataset, a basic Convolutional Neural Network (CNN) [19], and our specially designed APNet. It is evident that while the basic CNN is capable of categorizing and detecting within the ATZD01 dataset, there are still numerous mixed and isolated samples. In contrast, APNet significantly ameliorates this issue, enhancing the clustering of samples within the same categories and substantially reducing the number of isolated samples. This visualization clearly demonstrates APNet’s superior capability to organize and discriminate data compared with traditional CNN approaches.
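The t-SNE visualization can be reproduced with scikit-learn as sketched below; the random feature matrix stands in for real per-sample embeddings (e.g., pooled activations from APNet's final neck stage, an assumption for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-ins for real per-sample feature vectors and their 11 category labels.
features = np.random.default_rng(0).random((500, 256))
labels = np.random.default_rng(1).integers(0, 11, 500)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap='tab20', s=8)
plt.title('t-SNE of APNet features on ATZD01')
plt.show()
```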
Discussions
APNet’s superior performance over mainstream models stems from its innovative design tailored for the complexities of apricot tree pest and disease detection. The Adaptive Thresholding Algorithm (ATA) dynamically adjusts detection thresholds, allowing the model to maintain high precision under varying lighting conditions and complex backgrounds. Meanwhile, the DyHead module enhances multi-scale feature aggregation, effectively detecting small or overlapping targets that challenge other models.
The ApNeck module further strengthens APNet by integrating shallow and deep features, enabling it to balance fine-grained detail with contextual information. These innovations collectively address the limitations of models like YOLOv8 and YOLOv10, which lack similar adaptability to real-world agricultural scenarios. Additionally, APNet’s lightweight design and optimized training strategy ensure high accuracy and generalizability, as evidenced by its consistent performance in K-FOLD experiments. Compared to traditional models like Faster R-CNN, APNet achieves a better balance between efficiency and accuracy, making it highly practical for resource-constrained environments. Its robust design not only ensures precise detection but also establishes it as a reliable tool for real-world pest and disease management. Future work could focus on further improving computational efficiency and expanding its applicability to broader datasets.
Conclusions
This study proposes APNet, a novel lightweight deep learning framework specifically designed for the detection of apricot tree pests and diseases in real-world complex environments. Addressing the limitations of existing models, APNet incorporates innovative components such as the ATA and DyHead module, which enhance its ability to handle challenging scenarios, including varied lighting, intricate backgrounds, and small or overlapping targets.
The experimental results, validated through comparative studies and K-FOLD cross-validation, demonstrate APNet’s superiority over mainstream detection models. With a precision of 87.1%, recall of 75.6%, and mAP50-95 of 43.6%, APNet consistently outperforms state-of-the-art algorithms while maintaining a lightweight architecture with only 2.79M parameters and 8.0 GFLOPs. These results confirm its robust detection capabilities and practicality for deployment on resource-constrained devices.
The necessity of this study lies in addressing the gap in reliable and efficient detection models tailored to the specific challenges of agricultural applications. Unlike traditional models, which either lack efficiency or fail to generalize well to real-world datasets, APNet provides a comprehensive solution that balances high accuracy with computational efficiency. Its ability to adapt to complex scenarios ensures practical applicability in the timely management of pests and diseases, potentially reducing agricultural losses. In summary, APNet sets a new benchmark in apricot tree pest and disease detection by achieving state-of-the-art performance across multiple evaluation metrics. The study highlights the critical role of customized architectures in addressing domain-specific challenges, offering a practical and scalable solution for real-world agricultural problems. Future research will focus on further optimizing APNet’s efficiency and extending its applicability to other crops and agricultural contexts, ensuring broader impact and usability.
Availability of data and materials
No datasets were generated or analysed during the current study.
References
Durmaz S, Ağır HB. Assessing the effect of El Niño-southern oscillation on apricot yield in Malatya province, Türkiye. Appl Fruit Sci. 2024;66(6):2231–8. https://doi.org/10.1007/s10341-024-01199-1.
Poyraz S, Gül M. The development of apricot production and foreign trade in the world and in Turkey. Development. 2022;22(2):601–16.
Uzundumlu AS, Karabacak T, Ali A. Apricot production forecast of the leading countries in the period of 2018–2025. Emirates J Food Agric (EJFA). 2021. https://doi.org/10.9755/ejfa.2021.v33.i8.2744.
Amari K, Ruiz D, Gómez G, Sánchez-Pina MA, Pallás V, Egea J. An important new apricot disease in Spain is associated with hop stunt viroid infection. Euro J Plant Pathol. 2007;118:173–81. https://doi.org/10.1007/s10658-007-9127-7.
Çayir A, Yenidoğan I, Dağ H. Feature extraction based on deep learning for some traditional machine learning methods. In: 2018 3rd International Conference on Computer Science and Engineering (UBMK), 2018. https://doi.org/10.1109/UBMK.2018.8566383. IEEE
Qin Y, Wu B, Lei X, Feng L. Prediction of tree crown width in natural mixed forests using deep learning algorithm. Forest Ecosyst. 2023;10:100109. https://doi.org/10.1016/j.fecs.2023.100109.
Panchbhai KG, Lanjewar MG, Malik VV, Charanarur P. Small size CNN (CAS-CNN), and modified MobileNetV2 (CAS-ModMobNet) to identify cashew nut and fruit diseases. Multim Tools Applic. 2024. https://doi.org/10.1007/s11042-024-19042-w.
Srinivasu PN, Lakshmi GJ, Narahari SC, Shafi J, Choi J, Ijaz MF. Enhancing medical image classification via federated learning and pre-trained model. Egyptian Infor J. 2024;27:100530. https://doi.org/10.1016/j.eij.2024.100530.
Lanjewar MG, Morajkar P, Payaswini P. Modified transfer learning frameworks to identify potato leaf diseases. Multimed Tools Applic. 2024;83(17):50401–23. https://doi.org/10.1007/s11042-023-17610-0.
Wu B, Liang A, Zhang H, Zhu T, Zou Z, Yang D, Tang W, Li J, Su J. Application of conventional UAV-based high-throughput object detection to the early diagnosis of pine wilt disease by deep learning. Forest Ecol Manage. 2021;486:118986. https://doi.org/10.1016/j.foreco.2021.118986.
Yu R, Ren L, Luo Y. Early detection of pine wilt disease in Pinus Tabuliformis in north China using a field portable spectrometer and UAV-based hyperspectral imagery. Forest Ecosyst. 2021;8:44. https://doi.org/10.1186/s40663-021-00328-6.
Li D, Wei Y, Zhu R. A comparative study on point cloud down-sampling strategies for deep learning-based crop organ segmentation. Plant Methods. 2023;19(1):124. https://doi.org/10.1186/s13007-023-01099-7.
Ferreira MP, Almeida DRA, Almeida Papa D, Minervino JBS, Veras HFP, Formighieri A, Santos CAN, Ferreira MAD, Figueiredo EO, Ferreira EJL. Individual tree detection and species classification of Amazonian palms using UAV images and deep learning. Forest Ecol Manage. 2020;475:118397. https://doi.org/10.1016/j.foreco.2020.118397.
Martinelli F, Scalenghe R, Davino S, Panno S, Scuderi G, Ruisi P, Villa P, Stroppiana D, Boschetti M, Goulart LR, et al. Advanced methods of plant disease detection: a review. Agron Sustain Dev. 2015;35:1–25. https://doi.org/10.1007/s13593-014-0246-1.
Ferentinos KP. Deep learning models for plant disease detection and diagnosis. Comput Elect Agricul. 2018;145:311–8. https://doi.org/10.1016/j.compag.2018.01.009.
Girshick R. Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, 2015. https://doi.org/10.1109/ICCV.2015.169
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC. Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, 2016:21–37. https://doi.org/10.1007/978-3-319-46448-0_2. Springer
He K, Gkioxari G, Dollár P, Girshick R. Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, 2017:2961–2969. https://doi.org/10.1109/TPAMI.2018.2844175
Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:779–788. https://doi.org/10.1109/CVPR.2016.91
Li X, Rai L. Apple leaf disease identification and classification using resnet models. In: 2020 IEEE 3rd International Conference on Electronic Information and Communication Technology (ICEICT), 2020:738–742. https://doi.org/10.1109/ICEICT51264.2020.9334214. IEEE
Thenmozhi K, Reddy US. Crop pest classification based on deep convolutional neural network and transfer learning. Comput Elect Agricul. 2019;164:104906. https://doi.org/10.1016/j.compag.2019.104906.
Sardogan M, Tuncer A, Ozen Y. Plant leaf disease detection and classification based on cnn with lvq algorithm. In: 2018 3rd International Conference on Computer Science and Engineering (UBMK), 2018. https://doi.org/10.1109/UBMK.2018.8566635. IEEE
Zhang M, Chen Y, Zhang B, Pang K, Lv B. Recognition of pest based on faster rcnn. In: Signal and Information Processing, Networking and Computers: Proceedings of the 6th International Conference on Signal and Information Processing, Networking and Computers (ICSINC), 2020:62–69. https://doi.org/10.1007/978-981-15-4163-6_8. Springer
Afzaal U, Bhattarai B, Pandeya YR, Lee J. An instance segmentation model for strawberry diseases based on mask R-CNN. Sensors. 2021;21(19):6565. https://doi.org/10.3390/s21196565.
Liu B, Ding Z, Tian L, He D, Li S, Wang H. Grape leaf disease identification using improved deep convolutional neural networks. Front Plant Sci. 2020;11:1082. https://doi.org/10.3389/fpls.2020.01082.
Xue Z, Xu R, Bai D, Lin H. Yolo-tea: a tea disease detection model improved by yolov5. Forests. 2023;14(2):415. https://doi.org/10.3390/f14020415.
Hu Y, Deng X, Lan Y, Chen X, Long Y, Liu C. Detection of rice pests based on self-attention mechanism and multi-scale feature fusion. Insects. 2023;14(3):280. https://doi.org/10.3390/insects14030280.
Zhao Y, Chen J, Xu X, Lei J, Zhou W. Sev-net: Residual network embedded with attention mechanism for plant disease severity detection. Concurr Comput: Pract Exper. 2021;33(10):6161. https://doi.org/10.1002/cpe.6161.
Türkoğlu M, Hanbay D. Apricot disease identification based on attributes obtained from deep learning algorithms. In: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), 2018. https://doi.org/10.1109/IDAP.2018.8620831. IEEE
Han B, Duan P, Zhou C, Su X, Yang Z, Zhou S, Ji M, Xie Y, Chen J, Lv C. Implementation and evaluation of spatial attention mechanism in apricot disease detection using adaptive sampling latent variable network. Plants. 2024;13(12):1681. https://doi.org/10.3390/plants13121681.
Zhang J, Qi C, Mecha P, Zuo Y, Ben Z, Liu H, Chen K. Pseudo high-frequency boosts the generalization of a convolutional neural network for cassava disease detection. Plant Methods. 2022;18(1):136. https://doi.org/10.1186/s13007-022-00969-w.
Dai X, Chen Y, Xiao B, Chen D, Liu M, Yuan L, Zhang L. Dynamic head: Unifying object detection heads with attentions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021:7373–7382. https://doi.org/10.1109/CVPR46437.2021.00729
Ouyang D, He S, Zhang G, Luo M, Guo H, Zhan J, Huang Z. Efficient multi-scale attention module with cross-spatial learning. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023:1–5. https://doi.org/10.1109/ICASSP49357.2023.10096516. IEEE
Chen J, Kao S-h, He H, Zhuo W, Wen S, Lee C-H, Chan S-HG. Run, don’t walk: chasing higher flops for faster neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023:12021–12031. https://doi.org/10.1109/CVPR52729.2023.01157
Bardin M, Gullino ML. Fungal diseases. Integ Pest Dis Manage Greenhouse Crops. 2020. https://doi.org/10.1007/978-3-030-22304-5_3.
Devaux A, Goffart J-P, Kromann P, Andrade-Piedra J, Polar V, Hareau G. The potato of the future: opportunities and challenges in sustainable Agri-food systems. Potato Res. 2021;64(4):681–720. https://doi.org/10.1007/s11540-021-09501-4.
Tsedaley B. Late blight of potato (phytophthora infestans) biology, economic importance and its management approaches. J Biol Agricul Healthcare. 2014;4(25):215–25.
Lanjewar MG, Panchbhai KG, Charanarur P. Lung cancer detection from CT scans using modified DenseNet with feature selection methods and ML classifiers. Expert Syst Applic. 2023;224:119961. https://doi.org/10.1016/j.eswa.2023.119961.
Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv Neural Inform Proc Syst. 2015;28:6.
Mathew MP, Mahesh TY. Leaf-based disease detection in bell pepper plant using yolo v5. Signal Image Video Process. 2022. https://doi.org/10.1007/s11760-021-02024-y.
Ge Z, Liu S, Wang F, Li Z, Sun J. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021). https://doi.org/10.48550/arXiv.2107.08430
Qadri SAA, Huang N-F, Wani TM, Bhat SA. Plant disease detection and segmentation using end-to-end yolov8: A comprehensive approach. In: 2023 IEEE 13th International Conference on Control System, Computing and Engineering (ICCSCE), pp. 155–160 (2023). https://doi.org/10.1109/ICCSCE58721.2023.10237169. IEEE
Xu X, Jiang Y, Chen W, Huang Y, Zhang Y, Sun X. Damo-yolo: A report on real-time object detection design. arXiv preprint arXiv:2211.15444 (2022). https://doi.org/10.48550/arXiv.2211.15444
Wang C-Y, Yeh I-H, Mark Liao H-Y. Yolov9: Learning what you want to learn using programmable gradient information. In: European Conference on Computer Vision, 2025:1–21. https://doi.org/10.1007/978-3-031-72751-1_1. Springer
Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002 (2017)
Wang C, He W, Nie Y, Guo J, Liu C, Wang Y, Han K. Gold-yolo: efficient object detector via gather-and-distribute mechanism. Adv Neural Inform Proc Syst. 2024;36:6.
Zhao Y, Lv W, Xu S, Wei J, Wang G, Dang Q, Liu Y, Chen J. Detrs beat yolos on real-time object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024:16965–16974
Wang A, Chen H, Liu L, Chen K, Lin Z, Han J, Ding G. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024)
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, 2017:618–626. https://doi.org/10.1109/ICCV.2017.74
Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Machine Learn Res. 2008;9:11.
Funding
This work was supported by the following projects: The Department of Science and Technology of Liaoning Province, No.2022JH2/101300274; The Department of Education of Liaoning Province, No.LNYJG2023117.
Author information
Contributions
ML: conceptualization, methodology, software, investigation, formal analysis, writing-original draft; ZT: data curation, funding acquisition; WY: data curation, visualization, polishing the writing; SL: resources, supervision; KF: software, validation; ZZ: visualization, writing-review and editing; YJ: conceptualization, resources, supervision, writing-review and editing. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, M., Tao, Z., Yan, W. et al. Apnet: Lightweight network for apricot tree disease and pest detection in real-world complex backgrounds. Plant Methods 21, 4 (2025). https://doi.org/10.1186/s13007-025-01324-5
DOI: https://doi.org/10.1186/s13007-025-01324-5