Dual discriminator GAN-based synthetic crop disease image generation for precise crop disease identification
Plant Methods volume 21, Article number: 46 (2025)
Abstract
Deep learning-based computer vision technology significantly improves the accuracy and efficiency of crop disease detection. However, the scarcity of crop disease images leads to insufficient training data, limiting the accuracy of disease recognition and the generalization ability of deep learning models. Therefore, increasing the number and diversity of high-quality disease images is crucial for enhancing disease monitoring performance. We design a frequency-domain and wavelet image augmentation network with a dual discriminator structure (FHWD). The first discriminator distinguishes between real and generated images, while the second high-frequency discriminator is specifically used to distinguish between the high-frequency components of both. High-frequency details play a crucial role in the sharpness, texture, and fine-grained structures of an image, which are essential for realistic image generation. During training, we combine the proposed wavelet loss and Fast Fourier Transform loss functions. These loss functions guide the model to focus on image details through multi-band constraints and frequency domain transformation, improving the authenticity of lesions and textures, thereby enhancing the visual quality of the generated images. We compare the generation performance of different models on ten crop diseases from the PlantVillage dataset. The experimental results show that the images generated by FHWD contain more realistic leaf disease lesions, with higher image quality that better aligns with human visual perception. Additionally, in classification tasks involving nine types of tomato leaf diseases from the PlantVillage dataset, FHWD-enhanced data improve classification accuracy by an average of 7.25% for VGG16, GoogLeNet, and ResNet18 models. Our results show that FHWD is an effective image augmentation tool that effectively addresses the scarcity of crop disease images and provides more diverse and enriched training data for disease recognition models.
Introduction
In recent years, with the development of artificial intelligence (AI) technology, the efficiency and accuracy of crop disease monitoring have been greatly improved [1,2,3,4]. Through automated image recognition and analysis, AI can quickly detect and identify crop disease symptoms, conduct large-scale monitoring in real time, and provide timely alerts [5, 6]. For example, Naqvi et al. proposed a new deep learning and optimization framework for the classification of apple and cucumber leaf diseases. In experiments on the apple and cucumber datasets, accuracy rates of 94.8% and 94.9% were achieved, respectively, and when compared with state-of-the-art techniques, the proposed framework demonstrated superior performance [7]. This helps farmers take effective measures before the disease spreads, reducing crop losses. However, crop disease monitoring still faces the issue of insufficient image data. Image data for many specific crops or disease types are scarce, leading to a lack of diverse data for training deep learning models, which affects the accuracy of disease identification [8, 9]. To address data scarcity and class imbalance, Shah et al. proposed FuzzyShallow, a novel framework that combines fuzzy deep learning and optimization for the recognition of citrus fruit diseases and agricultural land cover. Experimental results demonstrated that the improved model achieved accuracy rates of 98% and 96.5% on the Mendeley and NWPU datasets, respectively, thereby highlighting the critical role of enhancing image quality and optimizing feature selection in scenarios with limited data [10]. Additionally, the lack of sufficient data limits the model’s generalization ability, making it difficult for the model to adapt to different environments and disease variations, thus reducing its performance in practical applications. Conventional data augmentation [11] and image generation [12] methods can somewhat alleviate the issue of insufficient data. However, the detailed disease images these methods produce are not sufficiently realistic, making it difficult for deep learning models to learn enough discriminative features, which affects the models’ classification accuracy and generalization capability.
Data augmentation methods have been proven effective in machine vision [13,14,15]. They can be mainly divided into two categories: conventional data augmentation (such as geometric transformations, color transformations, and noise addition) and model-based augmentation methods, such as generative adversarial network (GAN) [16], Swap Autoencoder (SwapAE) [17], and denoising diffusion probabilistic model (DDPM) [18].
Conventional augmentation methods generate more and different forms of data by applying transformations and modifications to the original disease images, thereby enhancing the model's generalization capability. Such methods usually must be selected and combined based on specific tasks and data characteristics. For example, Ullah et al. proposed AppViT, a hybrid vision model that combines convolutional modules and multi-head self-attention mechanisms, to enable timely diagnosis of apple leaf diseases on lightweight devices [19]. Haikal et al. employed data augmentation methods such as Cutmix, Cutout, Mixup, and FMix and their combinations to enhance the performance of rice leaf detection models [20]. Wagle et al. expanded the dataset by combining data augmentation methods such as adding noise, applying blurring, and color perturbations. They conducted classification validation of healthy and diseased tomato leaf images using the ResNet101 model, achieving a test classification accuracy of 99.99% and a validation accuracy of 95.83% [21]. Shaji et al. enhanced a dataset containing 2,215 rice leaf images to 12,684 images by comprehensively applying seven different conventional data augmentation techniques, resulting in a test classification accuracy of 99.55% for the ResNet-32 model [22]. Conventional data augmentation methods have significantly improved the classification accuracy and robustness of models in crop disease identification, especially in situations with limited data or complex environments. Although these conventional data augmentation techniques can alleviate the scarcity of individual data to some extent, they cannot fundamentally increase the diversity of individual data [23].
In contrast, generative models learn the distribution characteristics of disease images to generate new data samples that are similar to the original data, thereby effectively increasing the scale and diversity of the dataset. Wang et al. proposed a Generative Adversarial Classification Network (GACN), which both balances the dataset by generating synthetic images to enhance CNN performance and functions as an independent classifier for disease recognition [24]. Zhang et al., aiming to reduce the difficulty of acquiring disease data for Chinese cabbage, utilized a Cycle-consistent Generative Adversarial Network (CycleGAN) to simulate and generate Verticillium wilt feature images from multiple fields. They then achieved rapid and accurate detection of Verticillium wilt in Chinese cabbage by leveraging an improved version of YOLOv8 [25]. Haruna et al. utilized StyleGAN2-ADA for data augmentation of four major rice disease images while using the variance of the Laplacian filter to discard blurry and poorly generated images [26]. Xin et al. developed an augmented GAN model based on the Wasserstein GAN loss function (AWGAN) and performed augmentation on corn disease leaf images in the PlantVillage dataset, achieving good results in tests with models such as AlexNet, VGG16, and ResNet18 [27]. Cap et al. developed the LeafGAN model with a self-attention mechanism, which enhances the performance of plant disease diagnosis by converting healthy leaf images into disease-infected leaf images. In experiments on classifying five types of cucumber diseases, data augmentation through LeafGAN significantly improved the diagnostic performance of the model [28]. Li et al. introduced SugarcaneGAN, combining a lightweight U-RSwinT as the extraction module and generator for sugarcane leaf images to expand crop disease image samples. Experimental results showed that, compared to conventional GAN models such as LeafGAN and CycleGAN, SugarcaneGAN significantly improved data quality in complex backgrounds, achieving an FID (Fréchet Inception Distance) score that is 24% lower than that of LeafGAN and 34% lower than that of CycleGAN [29]. Chen et al. proposed using transfer learning to apply diffusion models on a multiclass weed dataset to generate high-quality weed recognition images [30].
Unlike GANs such as BigGAN, StyleGAN2, and StyleGAN3, diffusion models achieved the best balance between sample fidelity and diversity, achieving the lowest Fréchet Inception Distance. Wang et al. developed a diffusion model based on Efficient Channel Attention (ECA) [31] and an Inception-SSNet classification model based on Inception-v3 [32] to classify six major disease types in the leaves of Panax notoginseng, including gray mold, powdery mildew, viral infections, anthracnose, yellow–brown spots, and round spots. Experimental results showed that the proposed ECA-based diffusion model achieved an FID of 42.73, with a 74.71% improvement over the baseline model. Additionally, testing the classification model using the generated dataset of Panax notoginseng leaf diseases improved the classification accuracy of 11 mainstream classification models [33]. Li et al. proposed a method for data augmentation using DDPM to generate synthetic images and pre-trained the Swin Transformer model on the synthetic dataset generated by DDPM. They then fine-tuned the model on the original citrus leaf images for disease classification through transfer learning [34]. Diffusion models require multiple iterative steps to complete the noise addition and denoising processes, resulting in a generation speed significantly lower than that of GANs, especially evident in high-resolution image generation. The generated images require stepwise sampling and cannot be generated in parallel like GANs, leading to lower efficiency of diffusion models in practical applications.
In summary, GAN-based data augmentation methods can generate high-quality images, effectively expand data samples, and thus improve the performance of deep learning models [35]. Although GAN provides powerful data augmentation capabilities, GAN models tend to fit low-frequency information and lack the ability to generate high-quality high-frequency information [36]. Image distortion can arise when generating subtle disease spot features and leaf texture details. This is particularly crucial in the field of crop leaf diseases. The disease spot characteristics of certain leaves are highly similar, and minor image distortions may lead to misjudgments in disease diagnosis.
Therefore, based on the GAN and frequency domain image translation (FDIT) [37] models, we propose a novel data augmentation model with dual discriminators (FHWD). Unlike many GAN models [38, 39] that only consider pixel space information, FHWD integrates both pixel space and frequency domain information, effectively improving the accuracy of capturing high-frequency details in crop disease leaf images generated by existing GAN-based data augmentation models. This allows the generated images to maintain good structure and detail, thereby enhancing the overall quality of the generated images. The FHWD model can be widely applied in field crops, providing strong support for disease monitoring, early warning, and precision agricultural management. Additionally, the FHWD model can be effective in vineyards, apple orchards, and other fruit orchards, accurately detecting and monitoring diseases, and providing early warning to assist in the efficient management and scientific decision-making of smart orchards.
The main contributions of this paper are as follows:
1. We propose a data augmentation model called FHWD, which adopts a dual discriminator architecture consisting of a standard discriminator and a high-frequency discriminator. The standard discriminator is responsible for distinguishing generated images from real images, while the high-frequency discriminator focuses on evaluating the high-frequency information in the generated images. The two work in tandem: the standard discriminator guides the model to generate images that closely resemble real images by assessing their overall realism, while the high-frequency discriminator further refines the evaluation of detailed features, such as texture and edge information, ensuring that the generated images are not only realistic overall but also consistent with real images in terms of high-frequency details.
2. In the high-frequency discriminator, we not only introduce an LSGAN (Least Squares Generative Adversarial Network) [40] loss to reduce high-frequency artifacts during the generation process, but also design a Fourier loss, which measures the error by transforming the images into Fourier space, further enhancing the frequency information of the images. In the standard discriminator, we design a wavelet loss, which minimizes the difference between the original image and the generated image in the wavelet domain to supplement leaf detail information, thereby improving the quality of the generated images.
3. We validate our method on the PlantVillage dataset, covering ten types of crop diseases, and compare it with existing algorithms from multiple perspectives. The experimental results show that the FHWD model can effectively generate high-quality, detail-rich crop disease images and significantly improve the performance of multiple classification models in recognizing tomato leaf diseases.
Materials and methods
Dataset acquisition
The crop disease images used in this study are sourced from the publicly available PlantVillage [41] dataset. We selected disease categories based on the shape, size, and distribution density of the lesions. In terms of shape, we considered the diversity of both regular shapes (such as circular and elliptical) and irregular shapes to ensure the model can adapt to different morphological features. The lesion size covers a range from small early-stage lesions to large, extensive lesions representing severe diseases, testing the model's applicability across different disease severities. Regarding distribution density, we chose categories with dispersed lesions (such as early infections) and highly concentrated lesions (such as late-stage severe infections) to evaluate the model's ability to handle complex distribution patterns. This selection strategy comprehensively tests the model's adaptability and robustness.
We selected ten disease types (apple black rot, corn gray leaf spot, corn common rust, corn northern leaf blight, potato early blight, potato late blight, pepper bacterial spot, tomato early blight, grape black rot, and strawberry leaf scorch) to construct ten data subsets, which were used for image generation experiments to validate the broad effectiveness of the FHWD model. The detailed information of the generation experiment dataset is shown in Table 1, which presents the leaf types, disease characteristics, and the number of images.
We selected images of nine types of tomato leaf diseases (tomato bacterial spot, tomato early blight, tomato late blight, tomato leaf mold, tomato Septoria leaf spot, tomato spider mite, tomato target spot, tomato yellow leaf curl virus, and tomato mosaic virus disease) and performed data augmentation using FHWD. Classification experiments were conducted using these augmented images to verify the effectiveness of the generated images in classification tasks. Detailed information on the classification experiment’s dataset is given in Table 2, presenting the leaf types, disease characteristics, and the number of images (Fig. 1).
The proposed FHWD model
The FHWD model consists of two parts: the discriminator and the generator. The discriminator part is designed with a dual discriminator structure. This structure includes a standard discriminator and a high-frequency discriminator, both of which adopt the same network architecture, as shown in Fig. 2. The effectiveness of this dual discriminator structure lies in its ability to address the limitations of traditional discriminators, which often only handle low-frequency information and overlook high-frequency details. Therefore, the dual discriminator structure can more comprehensively assess the quality of generated images, improving both the detail representation and overall consistency in image generation tasks. The traditional generative adversarial network (GAN) loss function, particularly the discriminator based on logistic loss, often overlooks the high-frequency information of the image, which may result in generated images lacking details and texture. The standard discriminator module introduces two loss functions, namely Logistic Loss and Wavelet Loss, which are primarily responsible for evaluating the global structural consistency between generated images and real images. The high-frequency discriminator module incorporates LSGAN Loss and FFT Loss, which are mainly used to assess the similarity between generated images and real images in terms of high-frequency detail information, such as details and texture, as well as overall structure. The generator module includes Pixel Space Loss and Fourier Space Loss, which generate fake samples indistinguishable from real data.
The high-frequency discriminator requires high-frequency information from both real and generated images. Therefore, the first step is to extract the high-frequency components from the images, as shown in Fig. 3, where the minus sign denotes subtracting the low-pass filtered image from the grayscale image.
In the standard discriminator, Logistic Loss is used to differentiate between real and generated images during the discrimination process. This loss function is based on log-likelihood estimation, encouraging the discriminator to label real images as 'real' (label 1) and generated images as 'fake' (label 0). The generator learns to produce more realistic images by minimizing this loss function. Traditional GAN loss functions typically focus only on the overall structure (low-frequency information), neglecting the representation of details and local textures. Wavelet Loss compensates for this deficiency by introducing the optimization of high-frequency information. By evaluating the image in the wavelet domain, it ensures that the generated images maintain consistency not only at the low-frequency level (global structure) but also at the high-frequency level (details and textures).
In the high-frequency discriminator, LSGAN Loss evaluates the authenticity of high-frequency information in generated images. The standard GAN discriminator mainly evaluates images in pixel space (i.e., the spatial domain), but this approach tends to overlook high-frequency details. FFT Loss provides a more precise way to assess high-frequency information by transforming the images into the frequency domain, particularly focusing on details and textures.
In the generator, Pixel Space Loss helps ensure that the local details of the generated images are consistent with their overall appearance in pixel space. Directly comparing the high-frequency and low-frequency components of the generated and real images guarantees matching details and global features. Fourier Space Loss constrains the frequency characteristics of images in Fourier space, ensuring that the generated images are consistent with the global structure and local details of the real images in the frequency domain.
The generated images maintain authenticity in global structure through the combined effects of these losses and achieve high fidelity in local details and high-frequency information.
The specific steps are as follows:
1. The generated and original images are converted to grayscale, removing color and lighting information irrelevant to texture and edges, via the following equation:
$$\begin{array}{c}x=rgb2gray\left(y\right)\end{array}$$(1)
where \(y\) is the original color image, and \(x\) is the converted grayscale image.
2. A Gaussian filter is applied to the grayscale images through convolution to obtain low-frequency images. The Gaussian kernel is defined as:
$$\begin{array}{c}{K}_{\sigma }\left[i,j\right]=\frac{1}{2\pi {\sigma }^{2}}\,{e}^{-\frac{{i}^{2}+{j}^{2}}{2{\sigma }^{2}}}\end{array}$$(2)
where \(\left[i,j\right]\) is the spatial position within the kernel, and \(\sigma\) is the standard deviation of the Gaussian function.
The convolution operation is expressed as:
$$\begin{array}{c}{x}_{L}\left[i,j\right]=\sum_{m}\sum_{n}k\left[m,n\right]\cdot x\left[i+m,j+n\right]\end{array}$$(3)
where \(x\) is the input image, \({x}_{L}\) is the low-frequency image obtained by convolving the input image with the Gaussian kernel, and \(m\), \(n\) are the indices of the 2D Gaussian kernel.
3. The high-frequency image is obtained by subtracting the low-frequency image from the grayscale image, thereby revealing sharp edges and texture information:
$$\begin{array}{c}{x}_{H}=x-{x}_{L}\end{array}$$(4)
The obtained high-frequency image is input into the high-frequency discriminator, which outputs a score for each high-frequency image. This score assesses the quality of the high-frequency components in the generated images. If the quality of the high-frequency components in the generated image is good, the score will be higher; otherwise, it will be lower. Based on the output scores from the high-frequency discriminator, the FHWD model updates the gradients to guide the generator in producing images with better high-frequency components, thereby enhancing the high-frequency details of the generated images.
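To make the extraction pipeline concrete, the following Python sketch implements Eqs. (1)–(4) with NumPy and SciPy. The Gaussian kernel parameter `sigma` and the function names are illustrative choices of ours, not values reported in this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rgb2gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to grayscale (Eq. 1)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def extract_high_frequency(rgb: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Return the high-frequency component x_H = x - x_L (Eqs. 2-4)."""
    x = rgb2gray(rgb.astype(np.float32))
    x_low = gaussian_filter(x, sigma=sigma)  # Gaussian low-pass filtering (Eqs. 2-3)
    return x - x_low                         # sharp edges and textures remain (Eq. 4)

# Both real and generated images are processed in the same way before being
# passed to the high-frequency discriminator (placeholder random images here).
real = np.random.rand(256, 256, 3)
fake = np.random.rand(256, 256, 3)
real_hf, fake_hf = extract_high_frequency(real), extract_high_frequency(fake)
```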
Loss function
The workflow of the six loss functions in the FHWD model has the following configuration: First, the standard discriminator distinguishes between real and generated images using Logistic Loss. This loss function assists the discriminator in determining whether the input image originates from the real data distribution, thereby guiding the generator to produce more realistic images. Subsequently, we designed Wavelet Loss, which incorporates frequency signals, particularly the high-frequency components that the generator tends to overlook during training. By incorporating this high-frequency information, Wavelet Loss ensures consistency at different frequency levels, thereby effectively preserving the authenticity of both the image details and overall structure. By minimizing the sum of these two loss functions (Logistic Loss and Wavelet Loss), the generator is able to continuously refine its generation process and learn how to produce more realistic images.
Secondly, the high-frequency discriminator utilizes LSGAN Loss to focus on the high-frequency components of the generated images, comparing them with the high-frequency components of the real images. This loss function effectively reduces high-frequency artifacts in the generated images, promoting consistency in edges, details, and textures, further enhancing the clarity and realism of the generated images. Meanwhile, we designed FFT Loss, which aims to correct the generator’s tendency to overlook high-frequency components and improve the retention of high-frequency information. By minimizing the sum of these two loss functions, the generator is enabled to produce more realistic image details and a complete image structure.
Finally, the generator uses Pixel Space Loss to compare the generated images with the real images in the pixel space, calculating the loss of high-frequency and low-frequency components to ensure similarity in details and overall structure. Additionally, Fourier Space Loss performs a Fourier transform on both the generated images and the real images to calculate the loss of their frequency characteristics, ensuring consistency with the global structure and local details of the real images in the frequency domain, thereby enhancing the realism of the generated images.
Logistic loss
In GANs [16], Logistic Loss is utilized by the discriminator to differentiate between real images and generated images, thereby guiding the generator to produce more realistic images. The discriminator's objective is to maximize the logarithmic loss associated with the predicted probability of real images while minimizing the logarithmic loss associated with the predicted probability of generated images. The generator aims to minimize the logarithmic loss assigned to the generated images by the discriminator, thereby bringing the predicted probability of the generated images closer to that of the real images and enhancing the quality of the generated outputs. The Logistic Loss is defined as follows:
where \(D\left({x}_{i}\right)\) is the predicted probability of the discriminator for the real image \({x}_{i}\); \(D\left(G\left({z}_{i}\right)\right)\) is the predicted probability of the discriminator for the generated image \(G\left({z}_{i}\right)\); and \({y}_{i}\) is the true label (1 for real images, 0 for generated images).
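For reference, a minimal PyTorch sketch of this objective is given below. It assumes the discriminator returns raw logits and uses binary cross-entropy, which is one common implementation of the logistic loss; it is not taken verbatim from the paper.

```python
import torch
import torch.nn.functional as F

def logistic_d_loss(d_real_logits: torch.Tensor, d_fake_logits: torch.Tensor) -> torch.Tensor:
    """Discriminator side: real images -> label 1, generated images -> label 0."""
    real_loss = F.binary_cross_entropy_with_logits(d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss

def logistic_g_loss(d_fake_logits: torch.Tensor) -> torch.Tensor:
    """Generator side: push the discriminator's prediction for generated images toward 'real'."""
    return F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))
```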
LSGAN loss
To make the generated samples more realistic, the high-frequency discriminator adopts the training strategy of LSGAN and employs a least squares loss function. It first computes the mean squared error (MSE) loss between the discriminator score for the high-frequency components of the real samples and 1. Next, the MSE loss between the discriminator score for the high-frequency components of the generated samples and 0 is calculated. Finally, the sum of these two MSE losses is used as the loss function for the high-frequency discriminator. This method helps make the generated samples clearer, with richer detailed information, bringing them closer to the real data distribution. The loss function of the high-frequency discriminator has the following form:
$$\begin{array}{c}{\text{L}}_{\text{LSGAN}}=\mathbb{E}\left[{\left(\text{D}\left({\text{x}}_{\text{H}}\right)-1\right)}^{2}\right]+\mathbb{E}\left[{\left(\text{D}\left({\text{G}\left(\text{E}\left(\text{x}\right)\right)}_{\text{H}}\right)\right)}^{2}\right]\end{array}$$
where \({\text{x}}_{\text{H}}\) denotes the high-frequency components of the real samples; \(\text{D}\left({\text{x}}_{\text{H}}\right)\) is the predicted output of the discriminator for these components, which is driven toward 1; and \(\text{D}\left({\text{G}\left(\text{E}\left(\text{x}\right)\right)}_{\text{H}}\right)\) is the predicted output of the discriminator for the high-frequency components of the generated samples, which is driven toward 0.
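A minimal sketch of this least-squares objective is shown below (PyTorch); the tensor names are illustrative, and the 1/0 targets follow the description above.

```python
import torch

def lsgan_hf_discriminator_loss(d_real_hf: torch.Tensor, d_fake_hf: torch.Tensor) -> torch.Tensor:
    """Sum of the two MSE terms: real high-frequency scores -> 1, generated -> 0."""
    real_term = torch.mean((d_real_hf - 1.0) ** 2)
    fake_term = torch.mean(d_fake_hf ** 2)
    return real_term + fake_term

def lsgan_hf_generator_loss(d_fake_hf: torch.Tensor) -> torch.Tensor:
    """Generator side: drive high-frequency scores of generated images toward 1."""
    return torch.mean((d_fake_hf - 1.0) ** 2)
```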
Wavelet loss
Wavelet transform is a classical technique widely used in image compression, employed to separate low-frequency approximations and high-frequency detail information from the original image [42]. In image generation, wavelet transform has been introduced into recent generative adversarial networks [43,44,45,46] to enhance the visual quality of output images. When determining whether a generated image is realistic, the discriminator may lack sensitivity to high-frequency details such as textures and edges. By incorporating features derived from the wavelet transform, the discriminator can more accurately distinguish the differences in high-frequency information between real and generated images, thereby improving the effectiveness of adversarial training.
Our designed wavelet loss enhances the standard discriminator's focus on details through multi-band constraints, ultimately improving the overall visual quality of generated images. Specifically, we apply the Daubechies wavelet transform [47] to each image, decomposing the original and generated images into four components: one low-frequency component and three high-frequency components. The high-frequency components are subdivided into three subbands corresponding to the horizontal, vertical, and diagonal directions. In calculating the loss, we apply L1 loss constraints to the low-frequency component, horizontal subband, and vertical subband, as shown in Fig. 4a, b, d, ensuring that the features of the generated images are consistent with those of the original images across various frequency domains. Meanwhile, the loss for the diagonal subband is computed using mean squared error (MSE), as the features in the diagonal direction contain both high-frequency horizontal and vertical components, indicating that this subband encompasses more complex texture details, as illustrated in Fig. 4c.
Using mean squared error as the loss function, we can more sensitively reflect the changes in these details. The wavelet loss function has the following form:
$$\begin{array}{c}{\text{L}}_{\text{wavelet}}={\Vert {\text{LL}}_{\text{real}}-{\text{LL}}_{\text{fake}}\Vert }_{1}+{\Vert {\text{LH}}_{\text{real}}-{\text{LH}}_{\text{fake}}\Vert }_{1}+{\Vert {\text{HL}}_{\text{real}}-{\text{HL}}_{\text{fake}}\Vert }_{1}+{\Vert {\text{HH}}_{\text{real}}-{\text{HH}}_{\text{fake}}\Vert }_{2}^{2}\end{array}$$
where \({\text{LL}}_{\text{real}}\) and \({\text{LL}}_{\text{fake}}\) represent the low-frequency components obtained from the wavelet transform of the real image and the generated image, respectively; \({\text{LH}}_{\text{real}}\) and \({\text{LH}}_{\text{fake}}\) are the high-frequency components in the horizontal direction obtained from the wavelet transform of the real and generated images, respectively; \({\text{HL}}_{\text{real}}\) and \({\text{HL}}_{\text{fake}}\) are the high-frequency components in the vertical direction obtained from the wavelet transform of the real and generated images, respectively; \({\text{HH}}_{\text{real}}\) and \({\text{HH}}_{\text{fake}}\) are the high-frequency components in the diagonal direction obtained from the wavelet transform of the real and generated images, respectively; \({\Vert \cdot \Vert }_{1}\) is the L1 norm (L1 loss), which is used to calculate the difference between the low-frequency component, the horizontal subband, and the vertical subband; \({\Vert \cdot \Vert }_{2}^{2}\) is the square of the L2 norm (MSE loss), which is used to calculate the difference between the diagonal subbands.
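The subband-wise constraints can be sketched as follows. For readability the example uses PyWavelets on NumPy arrays with a 'db2' Daubechies wavelet; the specific wavelet order is our assumption, and in actual GAN training a differentiable wavelet transform would be required so that gradients can flow back to the generator.

```python
import numpy as np
import pywt

def wavelet_loss(real: np.ndarray, fake: np.ndarray, wavelet: str = "db2") -> float:
    """Single-level DWT: L1 on the LL, LH, and HL subbands, MSE on the HH subband."""
    LL_r, (LH_r, HL_r, HH_r) = pywt.dwt2(real, wavelet)
    LL_f, (LH_f, HL_f, HH_f) = pywt.dwt2(fake, wavelet)

    def l1(a, b):   # L1 for the low-frequency, horizontal, and vertical subbands
        return np.mean(np.abs(a - b))

    def mse(a, b):  # MSE for the diagonal subband, which carries more complex textures
        return np.mean((a - b) ** 2)

    return l1(LL_r, LL_f) + l1(LH_r, LH_f) + l1(HL_r, HL_f) + mse(HH_r, HH_f)
```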
Fast fourier transform (FFT) loss
The Fourier transform provides a way to convert signals from the time domain to the frequency domain, allowing one to observe the different frequency components contained within the signal. GAN models typically tend to fit low-frequency components while neglecting high-frequency components, resulting in generated images lacking high-frequency details [37]. To address this limitation, we introduce an FFT loss for the high-frequency discriminator to help preserve high-frequency information. Specifically, we employ the two-dimensional Fast Fourier Transform (FFT) to convert both the real and generated images into the frequency domain, compute their amplitude spectra, and calculate the L1 loss between them to maintain the overall structure of the generated images. The FFT loss compensates for the limitation of traditional pixel-level loss functions in capturing high-frequency information, thereby further enhancing image generation performance, particularly in terms of detail and texture representation. The Fast Fourier Transform (FFT) loss function has the following form:
where \(real\_img1\) is the original image 1; \(real\_img2\) is the original image 2, \(fake\_img2\) is the image generated using the structural information of original image 1 and the texture information of original image 2; ⊕ is concatenation; \(\text{N}\) is the number of elements in the tensor. Finally, \(\text{FFT}\_\text{mag}\) calculates the magnitude of the Fourier transform of the image.
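The core of the FFT loss, an L1 distance between amplitude spectra, can be sketched as follows in PyTorch. For simplicity the sketch compares a single batch of real images with a batch of generated images and omits the concatenation of two source images described above.

```python
import torch

def fft_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Amplitude spectrum of a batch of images with shape (N, C, H, W)."""
    return torch.abs(torch.fft.fft2(img, dim=(-2, -1)))

def fft_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """L1 distance between the amplitude spectra of real and generated images."""
    return torch.mean(torch.abs(fft_magnitude(real) - fft_magnitude(fake)))
```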
Pixel space loss
We separate the image into high-frequency and low-frequency components in the pixel space. The high-frequency components represent the structure and details of the image, while the low-frequency components represent large, smooth areas such as color and lighting in the image. After obtaining the high-frequency and low-frequency components, we constructed a reconstruction loss in the pixel space to enhance the similarity between the input image and the generator's output in both the low-frequency and high-frequency components. In addition to the reconstruction loss, we employed a translation matching loss, which adjusts the high-frequency components to maintain consistency between the high-frequency details of the generated image and the original image, helping the model generate high-frequency information more effectively.
The reconstruction loss function in the pixel space is expressed below:
where \(x\) is the input image, \(G\left(E\left(x\right)\right)\) is the generated image, while \({x}_{L}\), \({\left(G\left(E\left(x\right)\right)\right)}_{L}\), and \({\left(G\left(E\left(x\right)\right)\right)}_{H}\) represent the real image, the low-frequency part of the generated image, and the high-frequency part, respectively.
The translation matching loss in the pixel space can be expressed as follows:
where \({z}_{c}^{source}\) and \({z}_{s}^{ref}\) are the original image’s content and the reference image’s style, respectively.
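A minimal sketch of these two pixel-space terms is given below, under the assumption (following FDIT) that both are L1 distances computed separately on the low- and high-frequency parts; a simple box blur stands in for the Gaussian low-pass filter, and all function names are ours.

```python
import torch
import torch.nn.functional as F

def box_blur(x: torch.Tensor, k: int = 9) -> torch.Tensor:
    """Stand-in low-pass filter (box filter via average pooling)."""
    pad = k // 2
    return F.avg_pool2d(F.pad(x, (pad, pad, pad, pad), mode="reflect"), k, stride=1)

def split_frequencies(x: torch.Tensor):
    """Split an image batch (N, C, H, W) into low- and high-frequency parts."""
    x_low = box_blur(x)
    return x_low, x - x_low

def pixel_reconstruction_loss(x: torch.Tensor, x_rec: torch.Tensor) -> torch.Tensor:
    """L1 between low-frequency parts plus L1 between high-frequency parts."""
    x_l, x_h = split_frequencies(x)
    r_l, r_h = split_frequencies(x_rec)
    return torch.mean(torch.abs(x_l - r_l)) + torch.mean(torch.abs(x_h - r_h))

def pixel_translation_matching_loss(x_source: torch.Tensor, x_translated: torch.Tensor) -> torch.Tensor:
    """Keep the high-frequency details of the translated output close to the source content image."""
    _, src_h = split_frequencies(x_source)
    _, out_h = split_frequencies(x_translated)
    return torch.mean(torch.abs(src_h - out_h))
```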
Fourier space loss
In the Fourier space, the signals in the frequency domain enable the model to learn the frequency information of the image from a global perspective. Therefore, designing frequency-based training objectives facilitates the preservation of frequency information that visually reflects the important structures and details of the image during the training process. The aforementioned FFT Loss primarily computes the L1 loss of the amplitudes between the generated image and the original image in the frequency domain through a high-frequency discriminator. This approach ensures the high-frequency components' similarity between the generated and original images and maintains consistency in the overall structure. This paper also employs the Fourier space loss function proposed by Cai et al. [37], where the generator constrains the generated image by utilizing frequency information in both the pixel space and frequency space, allowing the generated image to retain more structural and detailed information. The reconstruction loss function in the frequency space has the following form:
where \({F}^{R}\) denotes the Fourier transform of the image converted to the real domain. The translation matching loss in the frequency space is derived as follows:
where \({F}_{H}^{R}\) is the corresponding high-frequency component.
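One plausible reading of the frequency-space reconstruction term is sketched below: the Fourier transform is mapped to the real domain by stacking its real and imaginary parts, and an L1 distance is taken between these representations; the translation matching term would apply the same idea to the high-frequency components only. This interpretation of \(F^{R}\) is our assumption based on FDIT, not a definitive implementation.

```python
import torch

def fft_real_domain(img: torch.Tensor) -> torch.Tensor:
    """Fourier transform mapped to the real domain by stacking real and imaginary parts."""
    spec = torch.fft.fft2(img, dim=(-2, -1))
    return torch.cat([spec.real, spec.imag], dim=1)  # shape (N, 2C, H, W)

def fourier_reconstruction_loss(x: torch.Tensor, x_rec: torch.Tensor) -> torch.Tensor:
    """L1 between the real-domain Fourier representations of the input and its reconstruction."""
    return torch.mean(torch.abs(fft_real_domain(x) - fft_real_domain(x_rec)))
```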
Overall loss
Considering all the aforementioned losses, the overall loss can be divided into generator loss, standard discriminator loss, and high-frequency discriminator loss.
1. The generator loss can be summarized as Eq. (13):
$$\begin{array}{c}{\text{L}}_{\text{G}}={\text{L}}_{\text{recon}}+{\text{L}}_{\text{g}}+{\text{L}}_{\text{gh}}+{\uplambda }_{1}{\text{L}}_{\text{rec},\text{pix}}+{\uplambda }_{2}{\text{L}}_{\text{trans},\text{pix}}+{\uplambda }_{3}{\text{L}}_{\text{rec},\text{fft}}+{\uplambda }_{4}{\text{L}}_{\text{trans},\text{fft}}\end{array}$$(13)
where \({\text{L}}_{\text{recon}}\) is the reconstruction loss commonly used in Generative Adversarial Networks (GANs) to measure the difference between generated images and real images, \({\text{L}}_{\text{g}}\) is the standard adversarial loss of the generator, and \({\text{L}}_{\text{gh}}\) is the loss that guides the generator to produce high-frequency information.
2. The standard discriminator loss can be summarized as Eq. (14):
$$\begin{array}{c}{\text{L}}_{\text{D}}={\text{L}}_{\text{gan}}+{\text{L}}_{\text{wavelet}}\end{array}$$(14)
where \({\text{L}}_{\text{gan}}\) is the standard GAN (logistic) loss used to train the discriminator, and \({\text{L}}_{\text{wavelet}}\) is the wavelet-based loss, which introduces additional constraints on the frequency content of the image.
3. The high-frequency discriminator loss can be summarized as Eq. (15):
$$\begin{array}{c}{\text{L}}_{\text{HD}}={\text{L}}_{\text{LSGAN}}+{\text{L}}_{\text{fft}}\end{array}$$(15)
where \({\text{L}}_{\text{LSGAN}}\) is the LSGAN loss, a least-squares variant of the traditional GAN loss, and \({\text{L}}_{\text{fft}}\) is the FFT loss, which is based on the Fast Fourier Transform.
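Putting the pieces together, the three objectives of Eqs. (13)–(15) can be assembled as in the sketch below; the individual loss terms are assumed to be computed elsewhere, and the λ values are placeholders rather than the paper's settings.

```python
# Placeholder weights for the lambda coefficients in Eq. (13); the paper does not
# report their exact values in this section.
LAMBDAS = {"rec_pix": 1.0, "trans_pix": 1.0, "rec_fft": 1.0, "trans_fft": 1.0}

def generator_total_loss(l_recon, l_g, l_gh, l_rec_pix, l_trans_pix, l_rec_fft, l_trans_fft):
    """Eq. (13): adversarial terms plus weighted pixel- and Fourier-space constraints."""
    return (l_recon + l_g + l_gh
            + LAMBDAS["rec_pix"] * l_rec_pix
            + LAMBDAS["trans_pix"] * l_trans_pix
            + LAMBDAS["rec_fft"] * l_rec_fft
            + LAMBDAS["trans_fft"] * l_trans_fft)

def standard_discriminator_loss(l_gan, l_wavelet):
    """Eq. (14): logistic GAN loss plus wavelet loss."""
    return l_gan + l_wavelet

def high_frequency_discriminator_loss(l_lsgan, l_fft):
    """Eq. (15): LSGAN loss plus FFT loss on the high-frequency components."""
    return l_lsgan + l_fft
```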
Results
In the validation experiments for image generation, we selected ten types of crop diseases: apple black rot, corn gray leaf spot, corn common rust, corn northern leaf blight, potato early blight, potato late blight, pepper bacterial spot, tomato early blight, grape black rot, and strawberry leaf scorch, to evaluate the effectiveness of FHWD in generating disease images. Among them, the structure of corn disease leaves is the most complex, with corn common rust presenting small and dense spots, corn gray leaf spot showing elliptical lesions confined to the spaces between parallel leaf veins, and corn northern leaf blight exhibiting elongated, spindle-shaped lesions. To assess the generative performance of FHWD, we compared it with high-performing models such as StyleGAN2 [48], SwapAE [17], FDIT [37], DDPM [18], StyleGAN3-T [49], and DDGAN (Denoising Diffusion Generative Adversarial Network) [50]. In addition, to further validate the effectiveness of FHWD in generating disease images, we used FHWD for data augmentation. We conducted classification task verification using representative and widely applied deep neural network models such as VGG16 [51], GoogLeNet [52], ResNet18 [53], SE-ResNet18 [54], PreActResNet18 [55], Swin Transformer [56], and Vision Transformer [57].
Experimental metrics
To evaluate the quality of generated images, we utilized the FID (Frechet Inception Distance) [58] and KID (Kernel Inception Distance) [59] metrics. FID measures the differences in overall distribution between generated and real samples, providing a global perspective. In contrast, KID assesses the maximum mean discrepancy between distributions, focusing more on local aspects. If a generative model performs well in global distribution (e.g., capturing the overall feature distribution) but lacks local generation quality (e.g., inconsistencies in the details or quality of samples from specific categories), it may result in a favorable FID but a poor KID. Therefore, to better evaluate the quality of the images generated by the model, this paper employs both FID and KID as evaluation metrics. FID is a popular metric for evaluating the quality of generated images. It provides a comprehensive and reliable quality assessment by comparing the distributions of generated images and real images in the feature space. A lower score indicates a higher quality of the generated images. The calculation formula for FID is as follows:
$$\begin{array}{c}\text{FID}={\Vert {\mu }_{x}-{\mu }_{y}\Vert }_{2}^{2}+{T}_{r}\left({\sum }_{x}+{\sum }_{y}-2{\left({\sum }_{x}{\sum }_{y}\right)}^{\frac{1}{2}}\right)\end{array}$$
where \(x\) is the set of real images, and \(y\) is the set of generated images. \({\mu }_{x}\) and \({\mu }_{y}\) represent the means of the features of the real and generated images, respectively. At the same time, \({\sum }_{x}\) and \({\sum }_{y}\) denote the covariances of the real images and the generated images. Here, \({T}_{r}\) denotes the trace of a matrix (the sum of its diagonal elements), and \({({\sum }_{x}{\sum }_{y})}^\frac{1}{2}\) is the square root of a matrix.
KID is computed as the squared maximum mean discrepancy between the two feature sets:
$$\begin{array}{c}\text{KID}=\frac{1}{m\left(m-1\right)}\sum_{i\ne j}k\left({x}_{i},{x}_{j}\right)+\frac{1}{n\left(n-1\right)}\sum_{i\ne j}k\left({y}_{i},{y}_{j}\right)-\frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}k\left({x}_{i},{y}_{j}\right)\end{array}$$
where \({x}_{i}\) and \({x}_{j}\) are the feature samples of the real images, and \({y}_{i}\) and \({y}_{j}\) are the feature samples of the generated images. \(m\) and \(n\) represent the number of samples for the real images and generated images, respectively, and \(k\left(\cdot \right)\) is a kernel function.
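For reference, the sketch below computes FID and an unbiased KID estimate from pre-extracted feature vectors (e.g., Inception-v3 pooling features); the polynomial kernel is the commonly used default for KID and is an assumption here, since the text only specifies a generic kernel function \(k(\cdot)\).

```python
import numpy as np
from scipy import linalg

def fid(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """Frechet Inception Distance between two feature sets of shape (N, D)."""
    mu_x, mu_y = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_x = np.cov(feat_real, rowvar=False)
    cov_y = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_x @ cov_y)        # matrix square root of cov_x @ cov_y
    if np.iscomplexobj(covmean):
        covmean = covmean.real                   # discard tiny imaginary round-off
    return float(np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x + cov_y - 2.0 * covmean))

def kid(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """Unbiased KID (squared MMD) with the polynomial kernel k(a, b) = (a.b / D + 1)^3."""
    d = feat_real.shape[1]
    kern = lambda a, b: (a @ b.T / d + 1.0) ** 3
    m, n = len(feat_real), len(feat_fake)
    k_xx, k_yy, k_xy = kern(feat_real, feat_real), kern(feat_fake, feat_fake), kern(feat_real, feat_fake)
    term_x = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_y = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * k_xy.mean())
```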
Experimental setup
During the training process of FHWD, the Adam optimizer is used to update the weights and bias parameters of each layer in the network. The initial learning rate for the generator and discriminator is set to 0.002, with the learning rate gradually decreasing linearly as the number of training iterations increases. The exponential decay rate for the first moment estimate is set to 0.5, and the exponential decay rate for the second moment estimate is set to 0.99. The update rate for the generator and discriminator in the model is 1:1, meaning that after the generator is updated once, the discriminator will also be updated once. The total number of iterations for the entire model is set to 50,000.
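The optimizer configuration described above corresponds to the following PyTorch sketch; the tiny placeholder networks stand in for the actual FHWD generator and the two discriminators.

```python
import torch
import torch.nn as nn

TOTAL_ITERS = 50_000

# Placeholder modules standing in for the FHWD generator and the two discriminators.
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
discriminators = nn.ModuleList([nn.Conv2d(3, 1, 3), nn.Conv2d(1, 1, 3)])

# Adam with beta1 = 0.5, beta2 = 0.99 and an initial learning rate of 0.002.
g_opt = torch.optim.Adam(generator.parameters(), lr=0.002, betas=(0.5, 0.99))
d_opt = torch.optim.Adam(discriminators.parameters(), lr=0.002, betas=(0.5, 0.99))

# Learning rate decays linearly toward zero over the 50,000 iterations.
linear_decay = lambda step: max(0.0, 1.0 - step / TOTAL_ITERS)
g_sched = torch.optim.lr_scheduler.LambdaLR(g_opt, lr_lambda=linear_decay)
d_sched = torch.optim.lr_scheduler.LambdaLR(d_opt, lr_lambda=linear_decay)

# Generator and discriminators are updated alternately at a 1:1 ratio; call
# g_sched.step() and d_sched.step() once per training iteration.
```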
In the classification task validation experiments, models such as VGG16 [51], GoogLeNet [52], ResNet18 [53], SE-ResNet18 [54], PreActResNet18 [55], Swin Transformer [56], and Vision Transformer [57] adopted the same settings: the initial learning rate was set to 0.001, and for each group of experiments, the learning rate was multiplied by 0.2 at the 20th, 60th, 120th, and 160th epochs, training for a total of 200 epochs. The weight decay was set to 5e-4, and the Nesterov momentum was set to 0.9. Additionally, a warm-up training strategy was employed to ensure stability during the early stages of network training.
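A sketch of this classifier training schedule is shown below, using torchvision's ResNet18 as a stand-in; we interpret the milestones and the total of 200 as epochs, and the warm-up length (five epochs) is our assumption since it is not stated above.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=9)  # nine tomato disease classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)

WARMUP_EPOCHS = 5                # assumed warm-up length
MILESTONES = [20, 60, 120, 160]  # learning rate multiplied by 0.2 at these epochs

def lr_factor(epoch: int) -> float:
    if epoch < WARMUP_EPOCHS:    # linear warm-up for stability in early training
        return (epoch + 1) / WARMUP_EPOCHS
    return 0.2 ** sum(epoch >= m for m in MILESTONES)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
# scheduler.step() is called once per epoch over the 200 training epochs.
```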
Experimental results and analysis
Quantitative analysis of image generation quality
Table 3 compares the FID metrics for generated images from StyleGAN2 [48], SwapAE [17], FDIT [37], DDPM [18], StyleGAN3-T [49], DDGAN [50], and the FHWD model across ten diseases: apple black rot, corn gray leaf spot, corn common rust, corn northern leaf blight, potato early blight, potato late blight, pepper bacterial spot, tomato early blight, grape black rot, and strawberry leaf scorch.
Table 3 shows the performance differences among different models in generating various leaf disease images. The FHWD model consistently achieved the lowest FID value across all disease image categories, indicating that it generates the best quality images. StyleGAN2 had the highest FID value across all disease categories, suggesting that it generates the poorest quality images. SwapAE, FDIT, DDPM, StyleGAN3-T, and DDGAN models generate images with varying results across different disease categories, but it is evident that simpler leaf structures lead to better generation performance. Compared to the FDIT model, the FHWD model showed improvements in generating images for the ten leaf diseases, with FID values reduced by 2.52, 4.77, 4.05, 6.22, 0.33, 2.37, 3.19, 3.47, 3.25, and 4.9, respectively. Additionally, the FHWD model performed the best in generating corn leaf diseases with rich lesion details, achieving an average reduction of 5.01 in FID values. This result demonstrates that the proposed FHWD method can effectively improve the quality of generated disease images.
The average FID metrics for the generated images of the ten leaf diseases show that the FHWD model has an average FID value of 68.67, indicating that the quality of images it generates is overall the closest to the real images. The DDPM model has an average FID of 75.9, showing better performance in generating corn leaf diseases with complex lesion features. The SwapAE model has an average FID value of 76.19, which is close to DDPM, indicating relatively stable performance across different disease categories. StyleGAN2 has the highest average FID of 111.69, significantly higher than other methods, suggesting a large discrepancy between the generated images and real images. The FDIT model has an average FID value of 72.12, ranking second, with image quality slightly worse than FHWD. The StyleGAN3-T model has an average FID value of 73.39, with slightly lower image quality compared to FDIT and FHWD. The average FID value of DDGAN is 79.99, only lower than StyleGAN2, showing the poorest performance in generating corn leaf diseases, especially corn gray leaf spot. However, DDGAN performs relatively well in other disease categories, indicating its advantages in generating simpler disease images.
The outstanding performance of the FHWD model is attributed to its unique dual-discriminator design, particularly the inclusion of the high-frequency discriminator, which effectively preserves the detailed information of the images and significantly improves the quality of the generated images. Compared to other models, the FHWD model maintains the consistency of the generated images with the real images regarding global structure and excels in handling details. In the standard discriminator, the introduction of Wavelet Loss ensures the integrity of the overall information during the image generation process. Meanwhile, the high-frequency discriminator focuses on optimizing high-frequency information such as details and textures, using LSGAN Loss for fine constraints on the high-frequency components, making the generated images more realistic regarding edges, details, and textures. Additionally, the introduction of FFT Loss constrains the global frequency information of the images in the frequency domain, which not only enhances the fidelity of high-frequency details in the generated images but also ensures the coordination between details and the overall structure. This multi-layered loss design, which combines frequency domain and spatial domain constraints, gives the FHWD model a significant advantage in image generation quality compared to other models.
In summary, the FHWD model's dual improvements in global structure and detail handling have resulted in better generation performance across various types of disease images, particularly in preserving image details and enhancing image quality, far surpassing other models.
As shown in Table 4, the proposed FHWD model consistently exhibits the lowest KID value across all categories of leaf disease images, with an average KID of 0.0056. This indicates that the images generated by FHWD are more detailed and realistic compared to other models, particularly in generating corn leaf diseases, where the textures and lesion information are effectively restored. On the other hand, StyleGAN2 has the highest KID value across all leaf categories, with an average KID value of 0.0756, suggesting that it generates the poorest quality images. This is especially evident in the corn northern leaf blight and corn gray leaf spot, where the KID values are 0.1285 and 0.1219, respectively, significantly higher than those of other models. The average KID value of DDGAN is 0.0301, second only to StyleGAN2, indicating its overall poor visual performance in generating images of the ten leaf diseases, especially in corn gray leaf spot, where it performs the worst with a KID value of 0.0733. SwapAE and DDPM perform relatively well, with KID values lower than StyleGAN2, achieving average KID values of 0.0124 and 0.0159, respectively. However, compared to the FDIT and FHWD models, SwapAE and DDPM still show a significant gap. Among the other better-performing models, StyleGAN3-T has a KID value of 0.0084, and FDIT has a KID value of 0.0075. Therefore, in terms of the detail and realism of the generated disease images, the images produced by FDIT and StyleGAN3 are inferior to those generated by FHWD.
Ablation study
We analyze the relationship between the different loss functions and the dual-discriminator structure in the FHWD model, using corn common rust as an example.
When using Wavelet Loss, FFT Loss, and Dual Discriminator individually, the model's performance improves compared to using no loss functions or dual discriminators at all. FID values decrease by 1.22, 1.06, and 2.12, and KID values decrease by 0.0043, 0.0054, and 0.0046, respectively. When these losses are combined in pairs, except for the combination of Wavelet Loss and FFT Loss, which results in FID and KID values slightly higher by 0.39 and 0.0005 compared to using Dual Discriminator alone, all other combinations result in lower FID and KID values than using them individually. Combining Wavelet Loss, FFT Loss, and Dual Discriminator together achieved the best performance, with FID and KID values of 88.84 and 0.012, respectively (see Table 5).
Visualization of the generation process
To evaluate the quality and stability of the generated images, we present the image generation process of different models using corn common rust disease as an example, as shown in Table 6. Table 6 shows that the images generated by each model in the first round are chaotic noise. By the 1000th iteration, all models except StyleGAN2 and DDGAN had begun to fit the contours and backgrounds of the leaves.
After 10,000 iterations, the FHWD model can generate sharply textured diseased leaves, while the other models require more iterations to achieve similar results. The FHWD model significantly outperforms the others in generating the small and dense lesions associated with corn common rust. The images generated by StyleGAN2 remain quite blurry, with no lesions yet visible on the leaves; the lesions produced by the StyleGAN3-T model exhibit noticeable grid-like artifacts. The images generated by DDGAN are also blurry. Although both the SwapAE and FDIT models can generate leaves with lesions, there is still a significant gap in the quality of the generated lesions compared to the DDPM and FHWD models.
After 20,000 iterations, StyleGAN2, SwapAE, and DDGAN can effectively reproduce leaf veins and lesion details. Additionally, the disease leaves generated by the StyleGAN3-T model continue to have the grid-like artifact issue. By the 50,000th iteration, aside from the leaves generated by StyleGAN2, which still lack clarity and have fuzzy lesions, the other five models produce visually indistinguishable leaves. The FHWD model can generate finely detailed and realistic lesions by the 10,000th iteration, indicating that it learns and fits the data features more quickly and efficiently, demonstrating a stronger ability to capture the details and local features of the data.
Visualization of generated images
Figure 5 depicts the leaves affected by different corn diseases generated by six models. These images provide a visual comparison to evaluate each model's generation performance further. From Fig. 5, it can be observed that corn common rust predominantly features small spots, while corn gray leaf spot and corn northern leaf blight include both large and small spots. The disease spots generated by StyleGAN2 for corn common rust leaves are relatively sparse. Its generated corn gray leaf spot and corn northern leaf blight leaves only roughly fit the leaf contours and large spots compared to other models, lacking the texture details of the leaves. The SwapAE and FDIT models also perform poorly in generating small spots. The image quality of the StyleGAN3-T model remains subpar, with significant distortion of the corn leaf edges and insufficient generation of texture and detail information, resulting in an overall visual quality that is noticeably inferior to that of DDPM and FHWD. In contrast, the images generated by the DDPM and FHWD models are nearly indistinguishable to the naked eye, warranting further quantitative analysis.
CAM visualization
To conduct an in-depth comparison of the original images and the images generated by six models (StyleGAN2, SwapAE, FDIT, DDPM, StyleGAN3-T, and FHWD), we further employed the CAM (Class Activation Mapping) [60] technique to visually analyze the generated images for three types of corn diseases. CAM allows one to intuitively observe the focal areas that the model concentrates on when identifying specific regions of diseased leaves, enabling us to assess the quality of the generated images and the model's ability to capture disease features. It visually illustrates the focal areas that deep neural network models pay attention to and aids in understanding how the models make classification decisions based on different regions of the input images. Additionally, it helps us to meticulously compare the similarities and differences between the images generated by each model and the real diseased leaves.
As shown in Fig. 6, the ResNet50 was applied to classify the disease in images generated by the six models. The red areas indicate regions that ResNet50 paid more attention to, while the blue areas indicate regions with less focus. The attention area of the original image is not focused on the disease spots. ResNet50 did not show a clear preference across the images generated by the six models for corn northern leaf blight disease, which contains less detailed information. In contrast, for corn common rust, the images generated by the StyleGAN2 model did not draw ResNet50's attention to the key disease areas, indicating a deficiency in capturing and reproducing disease details. The images generated by the other five models successfully directed ResNet50's attention to the disease spot regions, with DDPM and FHWD performing particularly well by enabling ResNet50 to capture the details of the spots accurately. For corn gray leaf spot disease, FHWD attracted ResNet50's attention to the larger spots and focused on the leaf texture and smaller spot areas, demonstrating FHWD's advantage in capturing details when generating disease images.
T-SNE visualization
To demonstrate the proximity of the generated disease images to the real images in the distribution space, we employed t-SNE (t-Distributed Stochastic Neighbor Embedding) [61] technology to conduct a comparative analysis of the image quality generated by six models across three types of corn diseases. Using t-SNE, we can compress high-dimensional image data into two or three-dimensional space, allowing complex data structures to be visually represented. This method is particularly suited for revealing the feature distributions captured by different generative models, helping us understand and compare the performance differences of each model in generating corn disease leaf images.
As shown in Fig. 7, the distributions of the three types of disease images generated by the FHWD model overlap well with the distributions of the real images, indicating that the disease images generated by the FHWD model are visually closer to the real images. The brown points representing the corn gray leaf spot and corn northern leaf blight generated by StyleGAN2 show little to no overlap with the blue points representing the real images, suggesting that the quality of the leaves generated by StyleGAN2 for these two diseases is poor and still has a significant gap compared to the real images. The images of corn common rust and corn northern leaf blight disease generated by the SwapAE, FDIT, and StyleGAN3-T models exhibit a more uniform distribution with the real images, with higher quality than those generated for corn gray leaf spot. However, the distributions of the corn northern leaf blight disease and corn gray leaf spot images generated by the DDPM model still show a noticeable boundary with the real image distributions, indicating that there is still a gap between the generated images and the real images.
Analysis of classification experimental results
The experiments trained classifiers on nine types of tomato leaf diseases (bacterial spot, early blight, late blight, leaf mold, gray leaf spot, spider mites, target spot, mosaic disease, and yellow leaf curl disease) using data enhanced by conventional methods, FDIT, and the FHWD model, in order to observe the impact of different data augmentation techniques on the classification models. As shown in Table 2, there are 50 original images for each type of disease, and we utilized data augmentation methods to expand each disease category to 200 images. The conventional data augmentation methods included adding Gaussian noise, brightness adjustments, contrast changes, saturation variations, and random flipping. Each augmentation technique was applied randomly, ensuring that the generated images had a certain degree of variability each time.
To comprehensively evaluate the performance of different deep learning architectures in classification tasks, this study selected VGG16, GoogLeNet, ResNet18, SE-ResNet18, and PreActResNet18 as classification validation models. These models represent various design philosophies, including convolutional networks, residual networks, and attention mechanisms, allowing us to assess the applicability and robustness of different augmentation methods across various networks. The classification performance of the models was evaluated using metrics such as precision, recall, F1 score, and accuracy, derived from the confusion matrix [62]. As shown in Table 7, the FHWD model performed the best across all classification models. Compared to the FDIT model, the FHWD model improved classification accuracy by 0.87%, recall by 0.93%, and F1 score by 0.95%. This indicates that the FHWD model is more effective at improving the performance of classification models when dealing with complex features in deep networks.
For the VGG16 model, as a shallow network with relatively limited feature extraction capabilities, the classification accuracy after applying the FHWD augmentation was only 93.83%. In the absence of data augmentation, the classification accuracy of ViT was 93.96%. With traditional augmentation methods and FDIT augmentation, accuracy improved to 95.51% and 95.29%, respectively, but still fell short of the results obtained with FHWD augmentation. After applying FHWD augmentation, the classification performance of ViT further improved, with precision reaching 95.54%, and recall and F1 scores being 95.33% and 95.33%, respectively. For the Swin Transformer, after applying FHWD augmentation, the precision was 95.37%, with recall and F1 scores of 95.11% and 95.07%, respectively, outperforming the ViT model. In contrast, the FHWD augmentation method performed exceptionally well on the GoogLeNet model, with precision, recall, and F1 scores reaching 97.62%, 97.56%, and 97.56%, respectively. It is worth noting that the improvement from conventional data augmentation methods was relatively limited. In some cases, excessive noise or irrelevant features were introduced, which led to a decline in model performance. The FDIT augmentation method also generated less realistic images, resulting in lower classification accuracy than FHWD. This further demonstrates the significant advantages of the FHWD method in data augmentation.
In summary, different classification models exhibit varying adaptability to image augmentation methods, but data enhanced by the FHWD model yields the most significant performance improvement across classifiers. As shown in Table 8, the average classification performance of the nine classification models under different augmentation methods indicates that FHWD achieves the best precision, recall, and F1 score, reaching 95.41%, 95.33%, and 95.32%, respectively. This shows that the data enhanced by the FHWD model have the highest quality and diversity, providing superior training samples for the classification models. FHWD improves the quality of generated images by refining both the global structure and the high-frequency details, allowing classifiers to learn the features of crop disease images more accurately during training; this not only improves the visual quality of the images but also provides more diverse and realistic training data, significantly boosting classification performance and accuracy.
Conclusion
This study introduced a method for generating crop disease images based on frequency domain and wavelet data augmentation (FHWD). This method enhances the original GAN model by adding a high-frequency discriminator, which uses filters to isolate and specifically evaluate the high-frequency detail information of the images. Additionally, wavelet loss and Fast Fourier Transform (FFT) loss are used to constrain the image generation process on the standard discriminator and high-frequency discriminator, ensuring that the generated images align with the real images in both global structure and detail.
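To make the frequency-domain constraints concrete, the following is a minimal sketch of an FFT amplitude loss and a one-level Haar wavelet loss in PyTorch. It is illustrative only: the sub-band weights, normalization, and the way the terms are combined with the adversarial loss are our assumptions, not the exact FHWD formulation.

```python
import torch
import torch.nn.functional as F

def fft_loss(fake, real):
    """L1 distance between the amplitude spectra of generated and real images."""
    fake_amp = torch.abs(torch.fft.fft2(fake, norm="ortho"))
    real_amp = torch.abs(torch.fft.fft2(real, norm="ortho"))
    return F.l1_loss(fake_amp, real_amp)

def haar_subbands(x):
    """One-level Haar decomposition of an (N, C, H, W) tensor into LL, LH, HL, HH sub-bands."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (-a - b + c + d) / 2
    hl = (-a + b - c + d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def wavelet_loss(fake, real, high_freq_weight=2.0):
    """Multi-band L1 constraint, weighting the high-frequency sub-bands more heavily (assumed weights)."""
    weights = (1.0, high_freq_weight, high_freq_weight, high_freq_weight)
    loss = 0.0
    for w, (f_band, r_band) in zip(weights, zip(haar_subbands(fake), haar_subbands(real))):
        loss = loss + w * F.l1_loss(f_band, r_band)
    return loss

# Assumed usage inside a generator update (loss weights are placeholders):
# g_loss = adversarial_loss + 1.0 * fft_loss(fake, real) + 1.0 * wavelet_loss(fake, real)
```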
The FHWD model effectively guides the generator in producing refined high-frequency details of disease images. In tasks involving the generation of richly detailed agricultural disease images, the FHWD model demonstrates significant advantages in image quality compared to other generative models. Experimental results show that FHWD effectively enhances the performance of generating crop disease images and improves the details and textures of the generated images, providing new perspectives and feasible solutions for downstream tasks such as disease image classification.
The images generated by FHWD are visually similar to real images, but differences in texture remain. In addition, the FHWD model suffers from slow training, especially on large-scale datasets. Future research will focus on improving the GAN architecture, exploring new high-frequency detail enhancement techniques, and adjusting the training strategy to further improve both the realism of the generated images and the speed of generation.
Data availability
Data will be made available on request.
References
Pacal I, Kunduracioglu I, Alma MH, Deveci M, Kadry S, Nedoma J, et al. A systematic review of deep learning techniques for plant diseases. Artif Intell Rev. 2024;57:304. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10462-024-10944-7.
Pacal I. Enhancing crop productivity and sustainability through disease identification in maize leaves: exploiting a large dataset with an advanced vision transformer model. Expert Syst Appl. 2024;238: 122099. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.eswa.2023.122099.
Kunduracioglu I, Pacal I. Advancements in deep learning for accurate classification of grape leaves and diagnosis of grape diseases. J Plant Dis Prot. 2024;131:1061–80. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s41348-024-00896-z.
Pacal I, Işık G. Utilizing convolutional neural networks and vision transformers for precise corn leaf disease identification. Neural Comput Appl. 2024;37:2479–96. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00521-024-10769-z.
Haider A, Arsalan M, Hong JS, Sultan H, Ullah N, Park KR. Multi-scale and multi-receptive field-based feature fusion for robust segmentation of plant disease and fruit using agricultural images. Appl Soft Comput. 2024;167: 112300. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.asoc.2024.112300.
Haider A, Arsalan M, Nam SH, Sultan H, Park KR. Computer-aided fish assessment in an underwater marine environment using parallel and progressive spatial information fusion. J King Saud Univ Comput Inf Sci. 2023;35:211–26. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jksuci.2023.02.016.
Naqvi SAF, Khan MA, Hamza A, Alsenan S, Alharbi M, Teng S, et al. Fruit and vegetable leaf disease recognition based on a novel custom convolutional neural network and shallow classifier. Front Plant Sci. 2024;15:1469685. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpls.2024.1469685.
Wang C, Zhang J, He J, Luo W, Yuan X, Gu L. A two-stream network with complementary feature fusion for pest image classification. Eng Appl Artif Intell. 2023;124: 106563. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.engappai.2023.106563.
Joseph DS, Pawar PM, Chakradeo K. Real-time plant disease dataset development and detection of plant disease using deep learning. IEEE Access. 2024;12:16310–33. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ACCESS.2024.3358333.
Shah A, Khan MA, Alzahrani AI, Alalwan N, Hamza A, Manic S, et al. FuzzyShallow: a framework of deep shallow neural networks and modified tree growth optimization for agriculture land cover and fruit disease recognition from remote sensing and digital imaging. Measurement. 2024;237: 115224. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.measurement.2024.115224.
Cireşan DC, Meier U, Gambardella LM, Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 2010;22:3207–20. https://doiorg.publicaciones.saludcastillayleon.es/10.1162/NECO_a_00052.
Gurumurthy S, Sarvadevabhatla RK, Radhakrishnan VB. Deligan: generative adversarial networks for diverse and limited data. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR.2017.525
Taylor L, Nitschke G. Improving deep learning with generic data augmentation. In: 2018 IEEE symposium series on computational intelligence (SSCI), 2018. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/SSCI.2018.8628742
Yun S, Han D, Oh SJ, Chun S, Choe J, Yoo Y. Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF international conference on computer vision, 2019. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ICCV.2019.00612
Takahashi R, Matsubara T, Uehara K. Ricap: Random image cropping and patching data augmentation for deep cnns. In: Asian conference on machine learning, 2018. https://proceedings.mlr.press/v95/takahashi18a.html
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inform Proc Syst. 2014;27:2672–2680.
Park T, Zhu JY, Wang O, Lu J, Shechtman E, Efros A, et al. Swapping autoencoder for deep image manipulation. Adv Neural Inform Proc Syst. 2020;33:7198–7211. https://dl.acm.org/doi/abs/10.5555/3495724.3496328
Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inform Proc Syst. 2020;33:6840–6851. https://dl.acm.org/doi/10.5555/3495724.3496298
Ullah W, Javed K, Khan MA, Alghayadh FY, Bhatt MW, Al Naimi, et al. Efficient identification and classification of apple leaf diseases using lightweight vision transformer (ViT). Discover Sustainability. 2024;5:116. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s43621-024-00307-1
Haikal ALA, Yudistira N, Ridok A. Comprehensive mixed-based data augmentation for detection of rice leaf disease in the wild. Crop Prot. 2024:106816. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cropro.2024.106816
Wagle SA, Harikrishnan R, et al. Effect of data augmentation in the classification and validation of tomato plant disease with deep learning methods. Trait Signal. 2021;38:1657–1670. https://doiorg.publicaciones.saludcastillayleon.es/10.18280/ts.380609
Shaji AP, Hemalatha S. Data augmentation for improving rice leaf disease classification on residual network architecture. In: 2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ACCAI53970.2022.9752495
Li T, Asai M, Kato Y, Fukano Y, Guo W. Channel Attention GAN-Based Synthetic Weed Generation for Precise Weed Identification. Plant Phenomics. 2024;6:0122. https://doiorg.publicaciones.saludcastillayleon.es/10.34133/plantphenomics.0122
Wang X, Cao W. GACN: generative adversarial classified network for balancing plant disease dataset and plant disease recognition. Sensors. 2023;23:6844. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/s23156844.
Zhang J, Zhang D, Liu J, Zhou Y, Cui X, Fan X. DSCONV-GAN: a UAV-BASED model for Verticillium Wilt disease detection in Chinese cabbage in complex growing environments. Plant Methods. 2024;20:186. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13007-024-01303-2.
Haruna Y, Qin S, Mbyamm Kiki MJ. An improved approach to detection of rice leaf disease with gan-based data augmentation pipeline. Appl Sci. 2023;13:1346. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/app13031346.
Xin M, Ang LW, Palaniappan S. A data augmented method for plant disease leaf image recognition based on enhanced GAN model network. J. Inform. Web Eng. 2023;2:1–12. https://doiorg.publicaciones.saludcastillayleon.es/10.33093/jiwe.2023.2.1.1
Cap QH, Uga H, Kagiwada S, Iyatomi H. Leafgan: An effective data augmentation method for practical plant disease diagnosis. IEEE Trans Autom Sci Eng. 2020;19:1258–67. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/TASE.2020.3041499.
Li X, Li X, Zhang M, Dong Q, Zhang G, Wang Z, et al. SugarcaneGAN: A novel dataset generating approach for sugarcane leaf diseases based on lightweight hybrid CNN-Transformer network. Comput Electron Agric. 2024;219: 108762. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.compag.2024.108762.
Chen D, Qi X, Zheng Y, Lu Y, Huang Y, Li Z. Synthetic data augmentation by diffusion probabilistic models to enhance weed recognition. Comput Electron Agric. 2024;216: 108517. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.compag.2023.108517.
Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR42600.2020.01155
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR.2016.308
Wang R, Zhang X, Yang Q, Lei L, Liang J, Yang L. Enhancing Panax notoginseng Leaf Disease Classification with Inception-SSNet and Image Generation via Improved Diffusion Model. Agronomy. 2024;14:1982. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/agronomy14091982.
Li Y, Guo J, Qiu H, Chen F, Zhang J. Denoising Diffusion Probabilistic Models and Transfer Learning for citrus disease diagnosis. Front Plant Sci. 2023;14:1267810. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpls.2023.1267810.
Rahman ZU, Asaari MSM, Ibrahim H, Abidin ISZ, Ishak MK. Generative Adversarial Networks (GANs) for Image Augmentation in Farming: A Review. IEEE Access. 2024;12:179912–43. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ACCESS.2024.3505989.
Zhang L, Chen X, Tu X, Wan P, Xu N, Ma K. Wavelet knowledge distillation: Towards efficient image-to-image translation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR52688.2022.01214
Cai M, Zhang H, Huang H, Geng Q, Li Y, Huang G. Frequency domain image translation: More photo-realistic, better identity-preserving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ICCV48922.2021.01367
Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196. 2017. https://arxiv.org/abs/1710.10196
Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ICCV.2017.244
Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S. Least squares generative adversarial networks. In: Proceedings of the IEEE international conference on computer vision, 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ICCV.2017.304
Hughes D, Salathé M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint arXiv:1511.08060. 2015. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.1511.08060
Phung H, Dao Q, Tran A. Wavelet diffusion models are fast and scalable image generators. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR52729.2023.00983
Gal R, Hochberg DC, Bermano A, Cohen-Or D. Swagan: A style-based wavelet-driven generative model. ACM T Graphic. 2021;40:1–11.
Yang M, Wang Z, Chi Z, Feng W. Wavegan: Frequency-aware gan for high-fidelity few-shot image generation. In: European conference on computer vision. 2022.
Wang Z, Chi Z, Zhang Y. FreGAN: exploiting frequency components for training GANs under limited data. Adv Neural Inform Proc Syst. 2022;35:33387–99.
Zhang Z, Zhan W, Sun Y, Peng J, Zhang Y, Guo Y, et al. Mask-guided dual-perception generative adversarial network for synthesizing complex maize diseased leaves to augment datasets. Eng Appl Artif Intell. 2024;136: 108875.
Daubechies I. Ten lectures on wavelets. Soc Ind Appl Math. 1992. https://doiorg.publicaciones.saludcastillayleon.es/10.1063/1.4823127.
Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR42600.2020.00813
Karras T, Aittala M, Laine S, Härkönen E, Hellsten J, et al. Alias-free generative adversarial networks. Adv Neural Inform Proc Syst. 2021;34:852–63. https://doiorg.publicaciones.saludcastillayleon.es/10.5555/3540261.3540327.
Xiao Z, Kreis K, Vahdat A. Tackling the generative learning trilemma with denoising diffusion gans. arXiv preprint. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.2112.07804.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.1409.1556.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR.2015.7298594
He K, Zhang X, Ren S ,Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR.2016.90
Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/TPAMI.2019.2913372
He K, Zhang X, Ren S, Sun J. Identity Mappings in Deep Residual Networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14; Springer: Cham, Switzerland, 2016. pp. 630-645. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/978-3-319-46493-0_38
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.2103.14030
Dosovitskiy A, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.2010.11929
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv Neural Inform Proc Syst. 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.1706.08500.
Bińkowski M, Sutherland DJ, Arbel M, Gretton A. Demystifying MMD GANs. arXiv preprint arXiv:1801.01401. 2018. http://arxiv.org/abs/1801.01401
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/CVPR.2016.319
Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605.
Karimi Z. Confusion matrix. Encyclopedia of Machine Learning and Data Mining. 2021. p. 260.
Acknowledgements
The authors would like to thank the Anhui Agricultural University Application Technology Research Institute of Agricultural Big Data for their helps.
Funding
This research was funded by the National Natural Science Foundation of China, grant numbers 32472007, 62306008 and 62301006; the National Key Research and Development Program, grant number 2023YFD1802200; the Key Project of Anhui Province's Science and Technology Innovation Tackle Plan, grant number 202423k09020040; the Natural Science Research Project of the Education Department of Anhui Province of China, grant number 2023AH051020; the Natural Science Foundation of Anhui Province, grant number 2308085MF21; and the University Synergy Innovation Program of Anhui Province, grant numbers GXXT-2022-046, GXXT-2022-055 and GXXT-2022-040.
Author information
Authors and Affiliations
Contributions
Conceptualization: CW and LG; methodology: CW and YX; software: YX; validation: LX; formal analysis: YX; investigation: LG; writing—original draft preparation: CW and YX; writing—review and editing: CW and QW; visualization: YX; supervision: LG; project administration: LG; funding acquisition: LG, CW and QW. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, C., Xia, Y., Xia, L. et al. Dual discriminator GAN-based synthetic crop disease image generation for precise crop disease identification. Plant Methods 21, 46 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13007-025-01361-0
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13007-025-01361-0