GrainNet: efficient detection and counting of wheat grains based on an improved YOLOv7 model
Plant Methods volume 21, Article number: 44 (2025)
Abstract
Background
Seed testing plays a crucial role in improving crop yields. In actual seed testing processes, factors such as grain sticking and complex imaging environments can significantly affect the accuracy of wheat grain counting, directly impacting the effectiveness of seed testing. However, most existing methods primarily focus on simple counting tasks and lack general applicability.
Results
To enable fast and accurate counting of wheat grains under severe adhesion and complex scenarios, this study collected images of wheat grains from different varieties, backgrounds, densities, imaging heights, adhesion levels, and other natural conditions using various imaging devices and constructed a comprehensive wheat grain dataset through data enhancement techniques. We propose a wheat grain detection and counting model called GrainNet, which significantly improves the counting performance and detection speed across diverse conditions and adhesion levels by incorporating lightweight and efficient feature fusion modules. Specifically, the model incorporates an Efficient Multi-scale Attention (EMA) mechanism, effectively mitigating the interference of background noise on detection results. Additionally, the ASF-Gather and Distribute (ASF-GD) module optimizes the feature extraction component of the original YOLOv7 network, improving the model’s robustness and accuracy in complex scenarios. Ablation experiments validate the effectiveness of the proposed methods. Compared with classic models such as Faster R-CNN, YOLOv5, YOLOv7, and YOLOv8, the GrainNet model achieves better detection performance and computational efficiency in various scenarios and adhesion levels. The mean Average Precision reached 93.15%, the F1 score was 0.946, and the detection speed was 29.10 frames per second (FPS). A comparative analysis with manual counting results revealed that, for wheat grain counting tasks, the GrainNet model achieved the highest coefficient of determination (0.93) and the lowest Mean Absolute Error (5.97), with a counting accuracy of 94.47%.
Conclusions
Overall, the GrainNet model presented in this study enables accurate and rapid recognition and quantification of wheat grains, which can provide a reference for effective seed examination of wheat grains in real scenarios. Related content can be accessed through the following link: https://github.com/1371530728/grainnet.git.
Introduction
Wheat is a major food crop in China. Achieving sustainable wheat production while improving yield and quality is crucial to ensure national food security [1]. Wheat grain detection and counting are key indicators for assessing yield and are essential for crop examination and cultivation management [2]. Therefore, accurate and efficient detection and counting of wheat grains is crucial for variety selection and crop evaluation [3]. However, wheat grain detection is challenging due to complex agricultural environments, varying degrees of overlapping shading, and dense sticking of grains. Traditional wheat grain counting relies mainly on manual operations, which are inefficient and costly. Current research focuses mainly on the segmentation and counting of simple crop grains, whereas detection in cases of severe adhesion and complex scenarios still presents certain challenges. With the advent of smart agriculture, the application of intelligent and automated inspection techniques has become increasingly important. Therefore, an in-depth study of wheat grain detection and recognition is necessary.
Traditional image processing methods separate the grains from the background and then calculate the number of grains in each region using techniques such as dilation and erosion [4], the watershed algorithm [5], and feature point matching [6, 7]. Extensive research has been conducted on identifying, detecting, and counting crop seeds. Song et al. applied the watershed algorithm and an improved target segmentation algorithm to segment and extract soybean seed images, achieving a counting accuracy of 95.2% [8]. Mao et al. proposed a contour fitting image segmentation method based on the fusion of angular point features to address overlapping seeds in sesame seed examinations, which achieved an accuracy rate of 97.28% for different sesame seed varieties [9]. Researchers have explored wheat grain segmentation and counting extensively. Liu et al. applied the Otsu algorithm to obtain binary images of wheat seeds and leveraged the relationship between the feature points and the number of seeds to achieve wheat seed counting [10]. Zhao et al. employed a global thresholding approach for the segmentation and extraction of wheat grain regions, but this method was only effective when the adhesion among the grains was minimal [11]. However, such feature extraction methods based on traditional image processing rely on predefined features, which can limit their generalizability and reduce their robustness [12]. In addition, these methods often require specific imaging environments, increasing the cost of image acquisition and requiring manual feature extraction. Image processing techniques are also susceptible to changes in external conditions, such as variations in imaging height, which can inhibit accurate seed grain recognition. These limitations may cause poor generalizability and robustness, making it difficult to apply these methods to practical seed grain detection scenarios.
In recent years, target detection techniques based on deep learning have been widely applied in agriculture, including plant pest control [13,14,15,16], plant counting [12, 17,18,19] and plant phenotype detection [20,21,22]. These techniques can learn and extract relevant features from complex backgrounds and dense object distributions, improve recognition accuracy, and effectively overcome the limitations of traditional methods [23]. Deep learning algorithms for target recognition can be categorized into single- and two-stage approaches, with the former being faster and less complex [24]. Deep learning methods have significant advantages in crop seed detection and counting. For example, Yang et al. proposed a method for generating and enhancing synthetic images based on area randomization with soybean seeds as the research object. They trained the dataset generated by this method on a Mask RCNN network, significantly reducing the cost of image annotation and network computation and achieving effective segmentation and counting of soybean seeds [25]. Zang et al. proposed a YOLOv5s model incorporating the ECA attention mechanism to detect small-scale wheat spikes, addressing the issue of counting overlapping and occluded spikes [26]. He et al. combined YOLOv5 with a back-propagation (BP) neural network to detect the pod number and pod phenotypic traits in single soybean plants and achieved an average accuracy of 91.7% [27]. Deep learning technologies have also been adopted for counting wheat grains. Wang et al. developed a seed detection model with a vibratory separation device for the automatic identification and counting of wheat seeds [28]. Pan et al. established a wheat grain recognition model based on an improved Mask R-CNN, effectively segmenting seed contours and counting grains, but it performs poorly under heavy seed adhesion [29]. Xu et al. proposed a deep learning algorithm, CBAM-HRNet, which incorporates the convolutional block attention module and addresses the double counting of grains on the same side; their results indicate that the method can effectively detect the number of wheat grains [30]. However, these studies focused mainly on simple target occlusion problems in complex contexts. When multiple targets are occluded and the degree of adhesive occlusion is high, the algorithm can only utilize unmasked local features to identify the targets. Consequently, occluded targets may be mistaken for neighboring targets, leading to missed detections.
Although most current deep learning-based seed detection methods demonstrate high detection accuracy, they are often associated with high parameter counts and computational complexity. These methods typically either sacrifice some accuracy to enhance detection speed or improve accuracy at the expense of increased computational cost. For instance, convolutional neural networks (CNNs) extract more abstract features through multiple convolutional layers to improve detection accuracy; however, this also results in increased computation and longer inference times. Moreover, the training of deep models often relies on high-performance computational resources, which exacerbates computational complexity. In practical applications, real-time performance is a critical requirement, especially in scenarios demanding rapid responses, where computational complexity and detection speed are key concerns. Therefore, achieving an effective balance between detection speed, accuracy, and computational complexity remains a significant research problem in the field of seed recognition.
To achieve fast and accurate counting of wheat grains in cases of severe adhesion and complex scenarios, this study proposed an improved wheat grain detection model, GrainNet, which is based on YOLOv7. Compared to other advanced models, GrainNet exhibits better detection performance, with a precision of 97.32%, a recall of 92.05%, an F1-score of 0.946, and a mAP@0.5 of 93.15%. These metrics surpass Faster R-CNN, YOLOv5, YOLOv7, and YOLOv8, indicating that the model is better at completing wheat grain detection tasks under severe particle adhesion and complex environments.
The main contributions of this study are as follows.
(1) A diverse dataset of wheat grains was created in this study. Creating a comprehensive dataset with diverse wheat varieties, heights, backgrounds, and densities can facilitate the evaluation of the detection performance of the algorithm in various scenarios.
(2) A GrainNet model based on the improved YOLOv7 algorithm was proposed. The innovative adaptive spatial feature gathering and distribution (ASF-GD) mechanism effectively addresses the issues of miscounting and missed counts in densely adhered wheat grain scenarios through adaptive spatial feature extraction and feature guidance. The introduction of the efficient multi-scale attention (EMA) module reduces background noise interference with detection results, enhancing the model’s performance in complex scenarios for both detection and counting.
The remainder of this paper is organized as follows. Section 2 details the materials and methods, including dataset acquisition and processing, the target detection algorithm, and accuracy assessment metrics. Section 3 provides a thorough analysis of the experimental results, and Section 4 discusses the findings. Finally, Section 5 provides a summary of this study.
Materials and methods
Overview of the workflow
To propose a diverse wheat grain dataset with different environment complexities and test the performance of our proposed model for wheat grain detection and counting under different levels of adhesion, this study consisted of four main steps: data collection, data augmentation, model construction, and performance evaluation. The workflow is illustrated in Fig. 1.
The specific steps are as follows.
(1) Data collection. Two types of imaging devices, covering three different varieties, three heights, three adhesion levels, and six background conditions, were used to capture images of the wheat grains. This created a diverse and representative dataset, enhancing the model’s ability to recognize and adapt to wheat grains under various conditions.
(2) Data Augmentation. To improve the model’s generalization ability and avoid overfitting, data augmentation techniques were applied to the dataset and its labels. The specific operations included random rotation, flipping, translation, brightness adjustment, cutout, mixup, and mosaic processing, further enriching the training dataset and enhancing the model’s stability in practical applications.
(3) Model construction. A deep learning model, GrainNet, was built to integrate lightweight modules, the feature fusion module (ASF-GD), and the EMA attention mechanism, which optimizes the model’s ability to extract features from densely adhered small target grains and improves the accuracy and efficiency of wheat grain counting.
(4) Model evaluation. The model was compared with classic object detection models such as Faster-RCNN, YOLOv5, YOLOv7, and YOLOv8 and comprehensively validated in multiple dimensions, including accuracy, recall, F1 score, and inference speed, to demonstrate its advantages in the wheat grain counting task. A linear correlation between the actual and model-predicted grain counts was established to further verify the effectiveness of this method, with the aim of supporting wheat grain variety testing.
Data acquisition and preprocessing
The wheat grain data used in the experiment were obtained from samples collected in the experimental fields of the Jiaozuo Academy of Agricultural and Forestry Sciences, Henan Province. Three wheat varieties were selected based on grain morphology and color. The wheat grain image acquisition setup, which used a Mi10 camera (5792 × 4344 pixels) and a Canon EOS 200D II camera (6000 × 4000 pixels), is depicted in Fig. 1. Stability during image capture was ensured using a fixed tripod and lifting platform, maintaining a vertical distance between the platform and equipment within 10–20 cm. The camera was adjusted to capture images at heights of 10, 15, and 20 cm under six different background conditions.
Diverse images of wheat grains used in this study were collected to explore the impact of different conditions on wheat grain detection. The color, shape, and texture characteristics are essential for wheat kernel recognition. To ensure sample diversity, images of the wheat kernels were collected from various varieties, backgrounds, densities, imaging heights, adhesion levels, and other natural conditions. This approach brought the obtained data closer to the actual working environment of wheat kernel detection and enhanced the robustness of the model to image quality at different resolutions. During image acquisition, a random quantity of wheat kernels was scattered on the platform and distributed by gentle shaking to avoid adhesion caused by human factors. The subjects of this study were three high-yield experimental wheat varieties, 0275, 1708, and 5630, cultivated by the Agricultural Science Institute. The three wheat varieties had their own distinct grain colors and morphological traits at maturity. On the basis of differences in grain color, these varieties were labeled “G” (green), “Y” (yellow), and “B” (brown), a labeling method that facilitates the distinction of different wheat varieties. To explore the impacts of the grain distribution density, imaging height, and background on the detection effectiveness, three distribution densities, namely, low-density adhesion, medium-density adhesion, and high-density adhesion, were utilized during image capture. On the basis of previous studies, localized image areas containing 2–20 grains were classified as having low-density adhesion, 21–50 grains as having medium-density adhesion, and more than 50 grains as having high-density adhesion. A total of 1,198 original wheat grain images were obtained, and manual annotation was performed using LabelImg to construct a diverse dataset. Figure 2 shows examples from the wheat grain dataset.
Considering the impact of real-world detection environments on wheat grain detection results, the original images were subjected to data enhancement to prevent model overfitting and bias [31]. Specific methods include random rotation, flipping, translation, image brightness adjustment, cutout, mixup, and mosaic techniques. Mixup combines two images, whereas mosaic randomly combines four images, as shown in Fig. 3. After image enhancement, 4552 images were generated and divided into training and validation sets at an 8:2 ratio, resulting in 3642 images for training and 910 for validation. The training set was used to train the model, and the validation set was used to assess the detection capabilities of the final model. The detailed information of the dataset is shown in Table 1.
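Of these operations, mixup and mosaic are the two compound augmentations; the following is a minimal sketch of both, assuming OpenCV and NumPy and omitting the matching label-coordinate transforms:

```python
import cv2
import numpy as np

def mixup(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # Blend two same-size images; the grain boxes of both images are retained.
    lam = np.random.beta(alpha, alpha)
    blended = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return blended.astype(np.uint8)

def mosaic(imgs: list, out_size: int = 640) -> np.ndarray:
    # Tile four images into a 2x2 grid; box coordinates must be scaled and
    # shifted into the matching quadrant (omitted here for brevity).
    assert len(imgs) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for i, img in enumerate(imgs):
        row, col = divmod(i, 2)
        canvas[row * half:(row + 1) * half,
               col * half:(col + 1) * half] = cv2.resize(img, (half, half))
    return canvas
```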
GrainNet model structure
This study proposed a novel GrainNet model based on the YOLOv7 [32] network, which significantly improved the performance and detection speed of wheat grain counting under various detection conditions and adhesion levels. The structure is shown in Fig. 4. The GrainNet model is composed of an ASF-Gather and Distribute (ASF-GD) module, EMA attention mechanism, and lightweight module. The ASF-GD module plays a key role in the neck part of the model. By integrating multiscale feature maps and applying the Channel Position Attention Mechanism (CPAM) [33], the ASF-GD module enhances the model’s sensitivity to grains of different shapes and sizes, effectively capturing key features in adhesive regions and improving the recognition accuracy for adhered grains. The EMA module was designed to address the challenges in wheat grain detection and counting under various scenarios. By leveraging multi-scale attention mechanisms, the model adaptively highlights key features, suppresses background noise, and enhances detection and counting performance in complex scenes. To enhance computational efficiency and inference speed, the model introduced lightweight modules, ensuring fast and accurate detection and counting of wheat grains even in highly adhesive and complex scenarios.
ASF-Gather and distribute (ASF-GD) mechanism
The ASF-Gather and Distribute (ASF-GD) module is a comprehensive feature extraction module designed in this study to improve the detection and counting capabilities of wheat grains. This module consists of a Scale Sequence Feature Fusion (SSFF) module, a Gather-and-Distribute (GD) mechanism [34], and a channel position attention mechanism (CPAM). These components work closely together to enhance the model’s adaptability and accuracy in complex scenarios.
GD mechanism
The GD mechanism is the core mechanism that improves the detection and counting of wheat grains. It consists of the Low-Gather and Distribute branch (Low-GD) and the High-Gather and Distribute branch (High-GD), which replace the original network’s PANet upsampling and downsampling fusion stages, as shown in Fig. 5. This mechanism enhances the information fusion capabilities of the neck part without significantly increasing the computation latency, thereby improving the model’s detection accuracy for different grains. Specifically, the GD mechanism first uses the Low-GD branch to extract local features from shallow feature maps and expands them to a global scale through upsampling. Moreover, the High-GD branch captures global context information from high-level feature maps and fuses it with lower-level features through downsampling. This allows bidirectional information flow and fusion between the upper and lower levels, improving the model’s ability to capture multiscale features.
In the GD mechanism, the feature alignment module (FAM) performs precise upsampling or downsampling of the enhanced features from the SSFF module to ensure spatial consistency across layers. This is especially important for detecting wheat grains of various sizes, as it effectively maintains feature stability. Next, the information fusion module (IFM) globally fuses these aligned features during the postprocessing stage, generating a unified global feature representation and improving the model’s overall understanding of wheat grain features. Finally, these features are distributed to different levels through a slicing operation, making effective use of both local and global information. This mechanism enhances the model’s adaptability to various grain shapes and densities while maintaining high computational efficiency, significantly improving its detection capabilities in complex backgrounds and adhesion scenarios.
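The exact FAM and IFM layouts follow [34]; purely as a hedged sketch of the gather-and-distribute flow described above, the three steps can be expressed in PyTorch as follows (the channel sizes and bilinear alignment are illustrative assumptions, not the paper’s exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatherDistribute(nn.Module):
    """Sketch of gather-and-distribute: align multi-level features (FAM),
    fuse them globally (IFM), then slice the fused map back per level."""
    def __init__(self, channels=(128, 256, 512)):
        super().__init__()
        self.splits = channels
        self.fuse = nn.Conv2d(sum(channels), sum(channels), kernel_size=1)

    def forward(self, feats):
        # FAM: resize every level to the middle level's spatial size.
        target = feats[1].shape[-2:]
        aligned = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in feats]
        # IFM: global fusion of the aligned features.
        fused = self.fuse(torch.cat(aligned, dim=1))
        # Distribute: slice channels per level and inject back (resized).
        outs = []
        for f, chunk in zip(feats, fused.split(self.splits, dim=1)):
            inj = F.interpolate(chunk, size=f.shape[-2:], mode="bilinear",
                                align_corners=False)
            outs.append(f + inj)
        return outs
```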
SSFF module and CPAM module
The scale sequence feature fusion (SSFF) module plays a pivotal role in multi-scale information processing, increasing the neural network’s ability to extract information across various scales. By integrating features from different network layers, the SSFF enriches the feature representations, thereby enhancing the model performance across objects of diverse sizes. Moreover, the channel and location attention mechanism (CPAM) enhances feature representation by focusing on informative channels and localizing small objects, providing effective attention guidance. This improves the model’s accuracy in detection and segmentation tasks.
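As a minimal sketch of the scale-sequence idea (assuming the levels have already been projected to a common channel count; the 3-D convolution fuses the stacked scale axis, per the general SSFF design rather than this paper’s exact layer settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSFF(nn.Module):
    """Stack equally sized multi-level features along a 'scale' axis and
    fuse them with a single 3-D convolution."""
    def __init__(self, channels: int, num_levels: int = 3):
        super().__init__()
        self.conv3d = nn.Conv3d(channels, channels,
                                kernel_size=(num_levels, 3, 3),
                                padding=(0, 1, 1))

    def forward(self, feats):
        # Upsample deeper levels to the shallowest level's spatial size.
        size = feats[0].shape[-2:]
        seq = [F.interpolate(f, size=size, mode="nearest") for f in feats]
        x = torch.stack(seq, dim=2)          # (B, C, L, H, W) scale sequence
        return self.conv3d(x).squeeze(2)     # fuse scales -> (B, C, H, W)
```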
EMA attention mechanism
The attentional mechanism can effectively capture critical target information, enabling the model to prioritize essential details crucial for diverse computer vision tasks [35]. To enhance the model’s ability to distinguish wheat kernels from background noise, the efficient multiscale attention (EMA) [36] mechanism was integrated into the head of YOLOv7. By applying attention across multiple scales, the EMA mechanism dynamically focuses on key feature regions of wheat grains, reducing interference from background noise and improving detection and counting performance in complex scenarios. The structure of this attention mechanism is illustrated in Fig. 6.
EMA extracts the attention weights of grouped feature graphs via three parallel branches: 1 × 1, 1 × 1, and 3 × 3. In the 1 × 1 branch, the channels were first encoded in the H and W directions using adaptive global mean pooling. The encoded channel features were then convolved in the image height direction to avoid dimensionality reduction in the 1 × 1 branch. The output of the 1 × 1 convolution was decomposed into two vectors that were adjusted via a sigmoid function to approximate the bivariate binomial distribution of the linear convolution. These adapted weights were applied to the output of the 1 × 1 branch after adaptive reweighting of the feature variables. The 3 × 3 branch employs a single 3 × 3 convolution to capture multiscale features that serve as the output of the branch.
The output of the 1 × 1 branch underwent 2-D global average pooling, followed by a Softmax function applied to the transformed result. This output was then multiplied pointwise by the output of the 3 × 3 branch to obtain the first spatial attention map. Similarly, the output of the 3 × 3 branch underwent a dot product operation with the output of the 1 × 1 branch followed by the same operations to obtain the second spatial attention map. Within each group, the output feature maps were processed with a sigmoid function and adaptive feature variable selection was performed to obtain the global contextual information. The three branches of the EMA attention mechanism synergistically combined channel and spatial attention benefits, effectively capturing both global and local spatial features. This approach efficiently extracts multi-scale spatial information.
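The following is a hedged PyTorch sketch of the three-branch scheme described above, following the published EMA design [36] (`channels` must be divisible by `groups`; the layer choices are the reference implementation’s, not necessarily this paper’s exact settings):

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient multi-scale attention over grouped feature maps."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.g = groups
        c = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.conv1 = nn.Conv2d(c, c, kernel_size=1)    # shared 1x1 conv
        self.conv3 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(c, c)

    def forward(self, x):
        b, ch, h, w = x.shape
        g = x.reshape(b * self.g, ch // self.g, h, w)
        # 1x1 branch: directional pooling, shared conv, split, sigmoid gates.
        xh = self.pool_h(g)                             # (bg, c, h, 1)
        xw = self.pool_w(g).permute(0, 1, 3, 2)         # (bg, c, w, 1)
        y = self.conv1(torch.cat([xh, xw], dim=2))      # (bg, c, h+w, 1)
        ah, aw = torch.split(y, [h, w], dim=2)
        x1 = self.gn(g * ah.sigmoid() * aw.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local multi-scale context.
        x2 = self.conv3(g)
        # Cross-spatial learning: each branch gates the other.
        ctx1 = x1.mean(dim=(2, 3)).softmax(dim=1).unsqueeze(1)  # (bg, 1, c)
        ctx2 = x2.mean(dim=(2, 3)).softmax(dim=1).unsqueeze(1)
        m = ctx1 @ x2.flatten(2) + ctx2 @ x1.flatten(2)         # (bg, 1, h*w)
        attn = m.reshape(b * self.g, 1, h, w).sigmoid()
        return (g * attn).reshape(b, ch, h, w)
```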
Lightweight module
To meet the storage and computing demands of advancing mobile devices, reducing the computational workload and model parameters is crucial. To optimize the model structure, improvements were made to ELAN and SPPCSPC by introducing partial convolution (PConv) [37], and P-ELAN and P-SPPCSPC structures were constructed. PConv significantly reduces computational load and memory usage by selectively performing convolution operations on certain channels, thereby enhancing the model’s inference efficiency. In the P-ELAN and P-SPPCSPC modules, PConv minimizes computational redundancy in multiscale feature aggregation, improves feature extraction efficiency and ensures rapid response capability when handling complex scenes and adhesive grains.
Specifically, PConv uses redundant information within the feature map to selectively apply conventional convolution to certain input channels while keeping the others unchanged. This mechanism allows sequential or regular memory access patterns to treat either the first or last contiguous block of channels as representative of the entire feature map, thereby reducing the computational load. By maintaining equal channel numbers for both the input and output feature maps, this optimization significantly lowers the model complexity. Overall, this optimized design greatly reduces the computational burden while still ensuring high accuracy for wheat grain detection and counting tasks. The structural diagram of the lightweight module is shown in Fig. 7. The relevant formulas are presented in Eqs. (1)–(4), where \(F_{\mathrm{PConv}}\) and \(F_{\mathrm{Conv}}\) represent the computational loads of PConv and standard convolution, respectively; \(\mathrm{MAC}_{\mathrm{PConv}}\) and \(\mathrm{MAC}_{\mathrm{Conv}}\) denote the memory access volumes of PConv and standard convolution, respectively; h and w denote the height and width of the feature map, respectively; k represents the kernel size; \(c_p\) represents the number of channels involved in the convolution; c denotes the number of channels in the input feature map; and r denotes the convolution participation ratio.
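The displayed equations did not survive extraction. A plausible reconstruction of Eqs. (1)–(4), following the PConv analysis in [37], is:

$$F_{\mathrm{PConv}} = h \times w \times k^2 \times c_p^2 \qquad (1)$$

$$F_{\mathrm{Conv}} = h \times w \times k^2 \times c^2 \qquad (2)$$

$$\mathrm{MAC}_{\mathrm{PConv}} = h \times w \times 2c_p + k^2 \times c_p^2 \approx 2 \times h \times w \times c_p \qquad (3)$$

$$r = \frac{c_p}{c} \qquad (4)$$

In code, the operation reduces to convolving only a leading slice of the channels. A minimal PyTorch sketch (the 1/4 participation ratio is the default suggested in [37], not a value confirmed by this paper):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution sketch: convolve the first c_p = c / ratio
    channels and pass the remaining channels through untouched."""
    def __init__(self, channels: int, ratio: int = 4, kernel: int = 3):
        super().__init__()
        self.cp = channels // ratio                    # channels that are convolved
        self.conv = nn.Conv2d(self.cp, self.cp, kernel,
                              padding=kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x[:, :self.cp], x[:, self.cp:]    # split along channel axis
        return torch.cat([self.conv(head), tail], dim=1)
```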
Model evaluation index
To assess the validity of the model, P (Precision), R (Recall), F1 (F1 score), mAP (mean Average Precision), Params (parameter count), FPS (Frames Per Second), and FLOPs (floating-point operations) were used to evaluate the recognition performance of the wheat grain detection model. P denotes the ratio of correctly identified positive samples to the total number of detected targets. R represents the percentage of correctly identified targets out of the total intended targets. TP denotes correctly detected positive grain samples, FP represents negative samples incorrectly detected as positive, and FN represents positive samples that were missed (incorrectly detected as negative). The calculation formulas are as follows:
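The displayed formulas did not survive extraction; the standard definitions consistent with the description above are:

$$P = \frac{TP}{TP + FP} \times 100\%$$

$$R = \frac{TP}{TP + FN} \times 100\%$$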
The F1 score integrates precision and recall to assess the accuracy and completeness of the model. Generally, a higher F1 score indicates greater model stability. The calculation formula is as follows:
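Reconstructed in its standard form:

$$F1 = \frac{2 \times P \times R}{P + R}$$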
The mAP represents the average precision (AP) across all categories, offering a comprehensive evaluation of the model’s detection performance across various categories. A higher mAP value generally indicates better model detection performance. IoU was adopted to measure the agreement between the predicted and actual bounding boxes, with higher IoU thresholds demanding higher prediction accuracy. In this study, mAP@0.5 and mAP@0.5:0.95 were applied as the evaluation metrics, where mAP@0.5 denotes the average precision calculated at an IoU threshold of 0.5, and mAP@0.5:0.95 denotes the average of the APs calculated at IoU thresholds from 0.5 to 0.95 in increments of 0.05. Wheat grains were categorized into three groups for this study, and the mAP was computed via the following formula:
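Reconstructed in its standard form, with N = 3 grain categories:

$$AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$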
Params denotes the total number of parameters within a trained model and is used to quantify the size and complexity of the model [38]. FLOPs represent the number of floating-point operations necessary for a model to execute one forward propagation or backpropagation, which is crucial for assessing the computational complexity of the model. The calculation equations are as follows:
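The displayed equations did not survive extraction; a standard reconstruction for a single convolutional layer (bias terms omitted; the factor of 2 counts multiplications and additions) is:

$$\mathrm{Params} = k_w \times k_h \times C_i \times C_0, \qquad \mathrm{FLOPs} = 2 \times H \times W \times k_w \times k_h \times C_i \times C_0$$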
where \(k_w\) and \(k_h\) represent the convolution kernel width and height, respectively; \(C_0\) represents the number of output channels; \(C_i\) represents the number of input channels; and W and H represent the width and height of the feature map, respectively.
Additionally, this study employed the RMSE (Root Mean Square Error), R2 (coefficient of determination), and MAE (Mean Absolute Error) as evaluation metrics to assess the performance of the model for wheat grain counting. R2 was adopted to quantify the correlation between the manually counted and model-predicted kernel numbers. RMSE and MAE measure the discrepancies between the manually counted and model-predicted kernels. Higher R2 values indicate better model fitting, whereas lower RMSE and MAE values indicate improved counting accuracy. These metrics are defined as follows:
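Reconstructed in their standard form, using the symbols defined below:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$$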
where \(x_i\) and \(y_i\) are the numbers of manually counted and predicted grains in the i-th image, respectively; \(\bar{x}\) is the average number of manually counted grains; and n is the total number of images in the verification set.
All experiments in this study were conducted with the following hardware and software configuration: a desktop workstation featuring a 13th-generation Intel(R) Core(TM) i9-13900K processor (32 processors) @ 3.0 GHz paired with an NVIDIA GeForce RTX 4080 graphics card (32 GB × 2). The operating system was Windows 10 (64-bit), with PyCharm 2022 as the development environment and Python 3.10 as the programming language. The deep learning framework was PyTorch 1.10 with CUDA 11.8 and cuDNN 7.6.5 for optimized training performance. The experiments used an input image size of 640 × 640, ran for 200 epochs, and employed a batch size of 16, an initial learning rate of 0.01, a weight decay of 0.0005, and a momentum parameter of 0.937. The model parameters were refined via an SGD optimizer.
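As a hedged sketch of this training configuration (`model`, `train_loader`, and the composite detection loss are placeholders standing in for the actual GrainNet pipeline, not code from the paper):

```python
import torch

# Optimizer settings as reported above: SGD, lr 0.01, momentum 0.937,
# weight decay 0.0005; images are assumed to be 640 x 640, batch size 16.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.937,
    weight_decay=0.0005,
)

for epoch in range(200):                      # 200 epochs
    for images, targets in train_loader:      # batches of 16 images
        loss = model(images, targets)         # assumed composite detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```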
Results
Ablation experiments
To assess the effects of various enhancement strategies on model detection performance, we conducted ablation tests on the wheat grain dataset under consistent configuration conditions. Model performance was evaluated using metrics such as mAP@0.5, mAP@0.5:0.95, F1 scores, FLOPs, and Params. The original YOLOv7 model served as the baseline, with systematic incremental improvements. The experimental results are listed in Table 2.
In terms of the lightweight design, we introduced the P-ELAN and P-SPPCSPC lightweight modules and analyzed the changes in model parameters and computations in experiments No. 1 and No. 2. The results showed that in experiment No. 1, the model’s parameter count was reduced by 4.51 M, and the computational cost decreased by 20.30 G. In experiment No. 2, the parameter count and computational cost were further reduced by 10.11 M and 31.80 G, respectively. In terms of feature fusion, the introduction of the ASF-GD feature fusion module in experiment No. 3 significantly improved the model’s detection performance. Specifically, mAP@0.5 increased from 91.59 to 92.54%, mAP@0.5:0.95 increased from 59.09 to 62.29%, and the F1 score increased from 0.912 to 0.934. After introducing the EMA attention mechanism in experiment No. 4, the model’s detection performance improved once again. Specifically, mAP@0.5 increased by 0.61% to 93.15%, and the F1 score increased by 1.2% to 0.946. Overall, by integrating the lightweight module, the ASF-GD mechanism, and the EMA attention module, the GrainNet model achieves improvements of 1.59% in mAP@0.5, 8.83% in mAP@0.5:0.95, and 2.71% in F1 score. Compared with the original YOLOv7 model, the enhanced model notably improves performance while reducing the computational cost by 15.31% and the number of parameters by 16.30%.
Comparative experiments
To evaluate the performance of different models in the wheat grain detection task, we compared the performance of five target detection models (Faster-RCNN, YOLOv5, YOLOv7, YOLOv8, and GrainNet) under identical training conditions. Specifically, the performance metrics include precision, recall, F1 score, and mAP@0.5, as shown in Table 3. The results showed that GrainNet surpassed the other models across all metrics, achieving an overall accuracy of 97.32%, an F1 score of 0.946, and an mAP@0.5 of 93.15%. This comparison demonstrated that GrainNet significantly outperformed the other models in terms of overall performance, validating its effectiveness for wheat grain detection in various scenarios. In summary, GrainNet significantly improved detection accuracy and enhanced reliability in complex scenarios.
Table 4 compares the performance of the YOLOv7 and GrainNet models across the accuracy rate, recall rate, average accuracy, and F1 score for the three wheat varieties. The results demonstrated that GrainNet enhanced the wheat grain detection accuracy for the “G”, “Y”, and “B” varieties by 0.65%, 2.17%, and 1.20%, respectively, compared with the original YOLOv7 model. Correspondingly, recall rates improved by 6.38%, 0.73%, and 5.15%, respectively. At an IoU threshold of 0.5, the average accuracy increased by 2.79%, 0.25%, and 1.78%, and the F1 values increased by 3.49%, 1.28%, and 3.08%, respectively. GrainNet maintained a detection speed of 29.10 frames/sec, ensuring its suitability for real-time applications. In summary, the improved GrainNet model effectively and accurately detected wheat grains in complex scenarios, thereby surpassing the performance of the original YOLOv7 model.
To evaluate the detection performance of GrainNet in scenarios involving highly adhered wheat grains, we conducted a comparative experiment with YOLOv8. Detection was performed on three types of wheat grain images with high adhesion, captured at a height of 20 cm, and the results are presented in Fig. 8. For the detection of the “G,” “Y,” and “B” wheat varieties, GrainNet demonstrated superior performance compared to YOLOv8. The upper left corner of each image displays the counting results labeled with a black number, where “GT” denotes the actual number of wheat grains. The experimental results show that YOLOv8 had more missed counts across all three varieties compared to GrainNet. Additionally, YOLOv8 exhibited instances of miscounting in the detection of the ‘B’ and ‘Y’ varieties, whereas GrainNet did not exhibit similar issues. The results indicate that GrainNet demonstrates superior accuracy in identifying and counting wheat grains in scenarios with high adhesion, compared to YOLOv8.
Comparison of counting performance
To assess the overall counting performance of the model, we conducted tests on the wheat grain target counts in each image from the validation set. Table 5 presents summary statistics of the manually counted grains, detailing the mean, minimum, maximum, and coefficient of variation (CV) per variety. The grain counts per image ranged widely from 10 to 300, with CV values above 95%, indicating substantial variability across all varieties. These randomly selected images cover a wide range of grain counts, reflecting the diversity of the dataset.
To comprehensively evaluate the performance of the proposed GrainNet model, we conducted a detailed comparative analysis of the five target detection models. Table 6 presents the performance of GrainNet in wheat grain counting compared with the four other detection models. The results indicated that GrainNet exhibited outstanding performance, surpassing the other models in both count predictions and accuracy. Specifically, GrainNet achieved an R² value of 0.93, indicating a strong fit with the actual counts, and the MAE was only 5.97, demonstrating the smallest counting deviation. Additionally, the counting accuracy of GrainNet reached 94.47%, significantly outperforming the Faster R-CNN and YOLO series models. These findings demonstrated GrainNet’s significant improvement over other models in counting wheat grains, validating the effectiveness of the proposed approach.
Figure 9 illustrates the performance of the various models in dense wheat grain detection tasks. The upper left corner of each image displays the counting results labeled with a black number, where “GT” denotes the actual number of wheat grains. The original image contains 300 seed targets of varieties “B”, “G”, and “Y”. Figure 9 shows that the GrainNet model excels in dense grain-detection tasks, demonstrating the best detection effectiveness. Following GrainNet, YOLOv8 and YOLOv7 exhibited minimal missed detections. YOLOv5 also performed well but missed a few seeds. The YOLOv8, YOLOv7, and YOLOv5 models all had a small number of false detections. In contrast, the two-stage detection model, Faster R-CNN, missed more targets. These findings reaffirmed GrainNet’s superiority in handling dense wheat grain detection tasks, highlighting its advantages in terms of accuracy and miss rate compared with other models.
Evaluation of model generalization performance
Model performance over varieties
To further validate the proposed method, we manually counted 200 randomly selected images of wheat grains from a 2024 experimental field at the Academy of Agricultural and Forestry Sciences in Jiaozuo City, Henan Province, China. Figure 10 shows that GrainNet significantly outperformed YOLOv7 in wheat grain counting tasks. Specifically, GrainNet achieved higher accuracy and consistency in detecting three distinct wheat varieties, with R² values of 0.94, 0.93, and 0.93, respectively, indicating a strong correlation with manual counting results. These findings indicate that the GrainNet model effectively adapts to morphological differences among wheat varieties, ensuring robust detection and performance. In contrast, YOLOv7 exhibited larger detection errors and struggled with complex or high-density grain scenarios.
Evaluation of the influence of different imaging heights and varying degrees of grain adhesion on the model
To investigate the influence of different imaging heights and levels of grain adhesion on the detection results, we designed and conducted two sets of experiments to evaluate the model’s adaptability under varying conditions. The results are shown in Fig. 11.
In the first set of experiments, we examined the effect of imaging heights (10, 15, and 20 cm) on the detection performance of three wheat varieties (Variety B, Variety G, and Variety Y) under high-density adhesion conditions, as shown in Fig. 11(a), (b), and (c). The results indicate that at imaging heights of 10 and 15 cm, the detection performance for all varieties remained relatively stable, with minimal fluctuations in the model’s predicted counts, demonstrating that GrainNet can maintain stable detection accuracy at these distances. However, when the imaging height was increased to 20 cm, the detection performance declined slightly, particularly for Variety B.
In the second set of experiments, we examined the effect of varying adhesion levels (low, medium, and high density) on the detection performance of wheat grains from the three varieties at a fixed imaging height of 20 cm, as shown in Fig. 11(d), (e), and (f). The results indicate that in low and medium adhesion scenarios, the GrainNet model effectively detected wheat grains from all three varieties, with a relatively consistent distribution of predicted counts. This demonstrates the model’s capability to handle grain recognition tasks at these adhesion levels. However, when adhesion reached a high density, the model’s detection accuracy for Variety B was slightly lower than that of Varieties G and Y.
To evaluate the performance of the model in complex environments more comprehensively, we further examined the influence of imaging height and adhesion degree on model performance; the specific results are shown in Table 7.
In terms of imaging height, we tested the effect of different imaging heights (10, 15, and 20 cm) on wheat grain detection performance under high-density adhesion. The experimental results show that the model’s MAE was below 1 at imaging heights of 10 cm and 15 cm, demonstrating excellent performance. However, the MAE and RMSE values increased when the imaging height rose to 20 cm, indicating a decline in model performance. This shows that as the imaging height increases, the detection accuracy of the model is affected to some extent.
In terms of the degree of adhesion, we tested the effects of low-, medium-, and high-density adhesion on the wheat grain detection performance of the GrainNet model at a fixed imaging height of 20 cm. The experimental results show that the model’s MAE was below 4 under low- and medium-density adhesion, where the model performed well. However, the MAE and RMSE values increased significantly under high-density adhesion, indicating that high-density adhesion leads to more errors and greater detection difficulty.
In general, imaging height and degree of adhesion have significant effects on the detection performance of the model. Moderate imaging heights and adhesion levels preserve the model’s accuracy, whereas large imaging heights and high-density adhesion degrade performance. These experimental results are consistent with the conclusions drawn from Fig. 11, further verifying the influence of these factors on model performance.
Impact of different imaging backgrounds on the model
To investigate the effect of different backgrounds on wheat grain detection, we tested the detection performance of the “G” wheat variety against six different backgrounds at a fixed imaging height of 20 cm. The results are shown in Fig. 12. In conditions of severe grain adhesion, GrainNet exhibited fewer missed detections across different backgrounds compared to the original YOLOv7 model, demonstrating strong robustness and adaptability.
Discussion
Hyperparameter sensitivity analysis
To assess the impact of key hyperparameters on model performance, we performed sensitivity analyses for the learning rate and batch size. Because a comprehensive hyperparameter search on the complete dataset would be computationally expensive, we randomly selected 500 images from the entire dataset, ensuring that different detection conditions (including background, variety, imaging height, and adhesion) were covered, to obtain representative experimental results, as shown in Table 8.
(1) Analysis of the influence of learning rate.
The learning rate is crucial to the convergence rate and final performance of the model. The experimental results show that although a higher learning rate (0.1) converges quickly in the early stages of training, it readily causes instability in model performance. When the learning rate was reduced to 0.01, the performance of the model improved and reached its best (P = 93.0%, R = 81.3%, mAP@0.5 = 84.0%). However, when the learning rate dropped to 0.001, the performance of the model decreased significantly, with an mAP@0.5 of only 35.9%, indicating that too small a learning rate slows training and can even prevent the model from converging. Therefore, we chose a learning rate of 0.01 as the optimal value to strike a balance between convergence speed and final accuracy.
(2) Analysis of the impact of batch size.
The choice of batch size also affects the stability and performance of the model. The experimental results show that a small batch size led to poor model performance, with an mAP@0.5 of only 71.5%, which may be due to unstable gradient updates and low convergence efficiency. When the batch size was increased to 8, model performance improved significantly, with mAP@0.5 rising to 80.8%. When the batch size was further increased to 16, the model achieved its best performance, with a precision of 93.0%, a recall of 81.3%, and an mAP@0.5 of 84.0%. However, when the batch size was increased to 32, mAP@0.5 dropped to 82.6%, possibly because the larger batch size reduced the frequency of gradient updates and affected the generalization ability of the model. Therefore, we ultimately chose a batch size of 16 to ensure training stability while obtaining the best performance.
In summary, the hyperparameter sensitivity analysis in this study shows that the learning rate and batch size have significant effects on the performance of the target detection model. A reasonable learning rate achieves a good balance between convergence speed and final accuracy, and an appropriate batch size helps improve the stability of gradient estimation. For the momentum parameter, a value of 0.937 was adopted and combined with the SGD optimizer for training. This setting is based on a study by Geoffrey Hinton et al. in the journal Science [39], which indicates that a momentum parameter of 0.937 can effectively accelerate convergence and enhance model stability. Future research could explore automated hyperparameter optimization methods, such as Bayesian optimization or deep learning optimization algorithms combined with grid search, to further improve model performance.
ANOVA statistical test
To verify the statistical significance of the GrainNet model in terms of R2, mean absolute error (MAE), and root mean square error (RMSE), this study employed a one-way analysis of variance (ANOVA) test [40]. The performance of GrainNet was compared with that of YOLOv7, YOLOv8, Faster R-CNN, and YOLOv5 to examine whether the differences in the given performance metrics are statistically significant. The null hypothesis (H0) states that the average performance of the models across all groups is equal. The null hypothesis is given by Eq. 14 [41, 42]. The difference in means is said to be significant if and only if the alternative hypothesis (H1), given by Eq. 15, is true. To conduct the ANOVA, each model group was independently trained and tested five times. The results of the variance analysis tests for R2, MAE, and RMSE are presented in Tables 9, 10, and 11.
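Equations 14 and 15 did not survive extraction; the standard one-way ANOVA hypotheses they denote are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad (14)$$

$$H_1: \exists\, i, j \ \text{such that} \ \mu_i \neq \mu_j \qquad (15)$$

In practice, the test reduces to one call per metric. A minimal sketch with SciPy, where the per-run scores are placeholders for the measured values in Tables 9, 10, and 11:

```python
from scipy.stats import f_oneway

# Five independent training runs per model (placeholder values only;
# the actual per-run R^2 scores are reported in Table 9).
r2_runs = {
    "GrainNet":     [0.92, 0.93, 0.92, 0.92, 0.93],
    "YOLOv7":       [0.84, 0.83, 0.85, 0.84, 0.84],
    "YOLOv8":       [0.88, 0.87, 0.88, 0.89, 0.87],
    "YOLOv5":       [0.91, 0.90, 0.91, 0.92, 0.90],
    "Faster R-CNN": [0.78, 0.77, 0.79, 0.78, 0.76],
}
f_stat, p_value = f_oneway(*r2_runs.values())  # reject H0 when p < 0.05
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```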
The results of the analysis of variance (ANOVA) indicate that there are significant differences in the means of different object detection models across the three performance metrics (R², MAE, and RMSE) (P < 0.05, F-statistic far exceeds the critical value). Therefore, we can reject the null hypothesis that “all models perform the same,” suggesting significant statistical differences among the models. However, ANOVA only reveals overall differences and does not directly indicate which models are superior. To further identify which models exhibit significant differences, we conducted a post hoc Least Significant Difference (LSD) test and calculated 95% confidence intervals (CI). The results showed that, in terms of R², MAE, and RMSE, GrainNet outperformed the other models. Specifically, GrainNet achieved the best performance in R² (0.921), MAE (5.57), and RMSE (20.53), with R² improving by 9.75%, and MAE and RMSE decreasing by 38.9% and 32.5%, respectively, compared to the baseline model YOLOv7. Additionally, the ANOVA for all three metrics showed extremely significant differences (P < 1e-15, F-value > 100), indicating a statistically significant performance difference between GrainNet and YOLOv7. To further assess the practical significance of the improvements, we quantified the effect size of GrainNet relative to YOLOv7, with effect sizes for R², MAE, and RMSE being 2.31, 1.83, and 1.67, respectively, all greater than 1.5, indicating that these improvements are practically meaningful and beyond the range of random fluctuations. In terms of model stability, Faster R-CNN performed the worst across all metrics, particularly exhibiting large errors in MAE and RMSE. YOLOv8 and YOLOv7 performed at an intermediate level, weaker than GrainNet. Although the performance difference between GrainNet and YOLOv5 was not statistically significant, GrainNet performed better across all three metrics (R², MAE, and RMSE), with a smaller standard deviation, indicating more stable error control. Moreover, the average direction of the differences across all performance metrics was consistent, showing improved accuracy and reduced errors. Therefore, we conclude that the overall performance of the GrainNet model is superior to that of the other models.
Attention mechanism analysis
Different attention mechanisms have varying impacts on model performance, and the selection of an attention mechanism should be adapted to the specific requirements of the task. Guo et al. classified attention mechanisms in computer vision into six categories: channel attention, spatial attention, temporal attention, branch attention, channel and spatial attention, and spatial and temporal attention [43]. ECA and SE are examples of channel attention mechanisms, which focus on enhancing the representation power of important channels by weighting channel feature responses, typically relying on global information. CBAM, on the other hand, combines both channel and spatial attention mechanisms, applying weighting in both the channel and spatial dimensions to comprehensively capture key image information. However, these mechanisms emphasize single-feature processing, which makes it difficult to effectively integrate multi-scale information, thus limiting performance. In contrast, EMA improves the model’s feature representation capability by aggregating multi-scale features, incorporating information from different receptive fields. Unlike traditional attention mechanisms, EMA emphasizes the efficient fusion of features across multiple scales, rather than just applying weights in the channel or spatial dimensions. This approach allows EMA to capture image information more comprehensively and ensures more efficient information flow within the network, avoiding the issues of local information loss or inefficient information propagation common in traditional mechanisms. Compared to CBAM, SE, and ECA, EMA requires fewer parameters, making it more suitable for lightweight models.
Experimental results show that, with fewer parameters and similar computational complexity, EMA outperforms the other mechanisms in the YOLOv7 improvement experiments. In the wheat grain counting task, adding EMA to the model resulted in an mAP@0.5 of 89.22% and an mAP@0.5:0.95 of 56.18%, demonstrating its effectiveness. Therefore, the EMA attention mechanism was selected for the GrainNet model to enhance wheat grain detection performance. Table 12 presents the impact of different attention mechanisms on model performance. Figure 13 displays the confusion matrix for the various attention mechanisms.
Limitations and future prospects
Although the GrainNet model demonstrates strong detection accuracy and stability, it still has some limitations in practical applications. First, the dataset used for training and testing consisted primarily of samples collected from experimental fields at the Agricultural and Forestry Science Research Institute in Jiaozuo, Henan Province. This geographic limitation may affect the model’s ability to generalize to other regions or environments. In particular, its detection performance needs further validation when handling wheat varieties with similar grain colors or when subjected to varying climatic conditions. Additionally, the detection accuracy decreases in high-density grain adhesion scenarios, especially for Variety B, suggesting that the model’s ability to differentiate between closely adhered grains with similar morphological features requires further improvement. Moreover, although GrainNet’s detection speed meets most real-time detection requirements, its real-time performance and processing efficiency still have room for optimization in larger-scale or higher-resolution applications.
To address these limitations, future studies should focus on several areas. First, expanding the dataset by incorporating samples from diverse geographic regions, wheat varieties, extreme lighting conditions, and complex backgrounds will enhance the model’s generalization capabilities. Integrating advanced image-processing algorithms or attention mechanisms may be beneficial for improving the detection accuracy in high-density adhesion scenarios. In addition, leveraging hyperspectral imagery and depth information can further boost the robustness of the model under challenging conditions. Finally, optimizing the model’s computational efficiency through techniques such as model compression and knowledge distillation could make GrainNet more suitable for large-scale agricultural automation applications.
Conclusions
The efficient and accurate counting of wheat grains is critical for seed testing. This study introduced GrainNet, a novel counting method based on the YOLOv7 network that was optimized for identifying and counting adherent wheat kernels under complex conditions. We created a wheat grain dataset and applied data augmentation to enhance the model’s ability to learn robust wheat grain features. GrainNet improved upon the YOLOv7 architecture by substituting PConv for standard convolutions to reduce the parameters and computational load. Furthermore, the enhancements in the neck network leveraged the ASF-GD mechanism to enhance feature information transfer across network layers, thereby promoting deeper integration of the information. In addition, the EMA attention module in the head part of the model enhanced small-target recognition in the images. The experimental results on the constructed dataset demonstrated GrainNet’s outstanding counting performance, with a coefficient of determination of 0.93, an MAE of 5.97, an RMSE of 23.15, and an accuracy rate of 94.47%. The model consistently performed well across various detection conditions, as supported by ablation studies that confirmed the efficacy of the enhanced modules in enhancing network performance.
The method proposed in this study markedly enhanced the efficiency and accuracy of wheat seed detection and counting. The results indicated an overall accuracy of 97.32%, recall rate of 92.05%, and mAP@0.5 of 93.15%, demonstrating strong performance across various practical scenarios. These findings demonstrate the robustness and reliability of the model. Therefore, the GrainNet model proposed in this study can effectively detect wheat grain targets in diverse environments, offer significant technical support, and serve as a valuable reference for agricultural wheat grain seed testing.
Data availability
Data is provided within the manuscript.
References
Pang DH, Wang H, Chen P, et al. Spider mites detection in wheat field based on an improved RetinaNet. Agriculture-Basel. 2022;12(12):2160. https://doi.org/10.3390/agriculture12122160.
Liang N, Sun SS, Yu JJ, et al. Novel segmentation method and measurement system for various grains with complex touching. Comput Electron Agric. 2022;202:107351. https://doi.org/10.1016/j.compag.2022.107351.
Zou Y, Tian ZF, Cao JW, et al. Rice grain detection and counting method based on TCLE-YOLO model. Sensors-Basel. 2023;23(22):9129. https://doi.org/10.3390/s23229129.
Zhou DX, Wu GP, Yang HW, et al. The improvement of adhesive grain particles image segmentation algorithm based on the mathematical morphology. J Agricultural Mechanization Res. 2010;32(7):49–52. https://doi.org/10.3969/j.issn.1003-188X.2010.07.013.
Wu WH, Zhou L, Chen J, et al. GainTKW: a measurement system of thousand kernel weight based on the Android platform. Agronomy-Basel. 2018;8(9):178. https://doi.org/10.3390/agronomy8090178.
Chen YM, Lin P. Automatically determining the segmentation lines between images of adherent rice grains. Appl Eng Agric. 2017;33(5):603–9. https://doi.org/10.13031/aea.11213.
Yang SQ, Ning JF, He DJ. Identification of tipcap of agricultural kernel based on Harris algorithm. Trans Chin Soc Agricultural Mach. 2011;42(3):166–9.
Song CX, Yu CY, Xing YC, et al. Algorithm for acquiring multi-phenotype parameters of soybean seed based on OpenCV. Trans Chin Soc Agricultural Eng. 2022;38(20):156–63. https://doi.org/10.11975/j.issn.1002-6819.2022.20.018.
Mao YW, Han JY, Liu CZ. Automated flax seeds testing methods based on machine vision. Smart Agric. 2024;6(1):135–46. https://doi.org/10.12133/j.smartag.SA202309011.
Liu T, Chen W, Wang YF, et al. Rice and wheat grain counting method and software development based on Android system. Comput Electron Agric. 2017;141:302–9. https://doi.org/10.1016/j.compag.2017.08.011.
Zhao HM, Ge CJ, Jia JQ, et al. Study on high-throughput phenotyping system of wheat grains based on image analysis. Shandong Agricultural Sci. 2021;53(6):113–20. https://doi.org/10.14083/j.issn.1001-4942.2021.06.020.
Huang YN, Qian YR, Wei HY, et al. A survey of deep learning-based object detection methods in crop counting. Comput Electron Agric. 2023;215:108425. https://doi.org/10.1016/j.compag.2023.108425.
Deng J, Zhang XH, Yang ZQ, et al. Pixel-level regression for UAV hyperspectral images: deep learning-based quantitative inverse of wheat stripe rust disease index. Comput Electron Agric. 2023;215:108434. https://doi.org/10.1016/j.compag.2023.108434.
Sajitha P, Andrushia AD, Anand N, et al. A review on machine learning and deep learning image-based plant disease classification for industrial farming systems. J Ind Inf Integr. 2024;38:100572. https://doi.org/10.1016/j.jii.2024.100572.
Wang BB, Zhang CX, Li YY, et al. An ultra-lightweight efficient network for image-based plant disease and pest infection detection. Precis Agric. 2023;24(5):1836–61. https://doi.org/10.1007/s11119-023-10020-0.
Xing S, Lee HJ. Crop pests and diseases recognition using DANet with TLDP. Comput Electron Agric. 2022;199:107144. https://doi.org/10.1016/j.compag.2022.107144.
Bai YF, Yu JZ, Yang SQ, et al. An improved YOLO algorithm for detecting flowers and fruits on strawberry seedlings. Biosyst Eng. 2024;237:1–12. https://doi.org/10.1016/j.biosystemseng.2023.11.008.
Ghosal S, Zheng BY, Chapman SC, et al. A weakly supervised deep learning framework for sorghum head detection and counting. Plant Phenomics. 2019;2019:1525874. https://doi.org/10.34133/2019/1525874.
Li ZP, Zhu YJ, Sui SS, et al. Real-time detection and counting of wheat ears based on improved YOLOv7. Comput Electron Agric. 2024;218:108670. https://doi.org/10.1016/j.compag.2024.108670.
Cardellicchio A, Solimani F, Dimauro G, et al. Detection of tomato plant phenotyping traits using YOLOv5-based single stage detectors. Comput Electron Agric. 2023;207:107757. https://doi.org/10.1016/j.compag.2023.107757.
Duc NT, Ramlal A, Rajendran A, et al. Image-based phenotyping of seed architectural traits and prediction of seed weight using machine learning models in soybean. Front Plant Sci. 2023;14:1206357. https://doi.org/10.3389/fpls.2023.1206357.
Murphy KM, Ludwig E, Gutierrez J, et al. Deep learning in image-based plant phenotyping. Annu Rev Plant Biol. 2024;75(1). https://doi.org/10.1146/annurev-arplant-070523-042828.
Chen JC, Ma BX, Ji C, et al. Apple inflorescence recognition of phenology stage in complex background based on improved YOLOv7. Comput Electron Agric. 2023;211:108048. https://doi.org/10.1016/j.compag.2023.108048.
Du XQ, Cheng HC, Ma ZH, et al. DSW-YOLO: a detection method for ground-planted strawberry fruits under different occlusion levels. Comput Electron Agric. 2023;214:108304. https://doi.org/10.1016/j.compag.2023.108304.
Yang S, Zheng LH, He P, et al. High-throughput soybean seeds phenotyping with convolutional neural networks and transfer learning. Plant Methods. 2021;17(1):50. https://doi.org/10.1186/s13007-021-00749-y.
Zang HC, Wang YJ, Ru LY, et al. Detection method of wheat spike improved YOLOv5s based on the attention mechanism. Front Plant Sci. 2022;13:993244. https://doi.org/10.3389/fpls.2022.993244.
He HT, Ma XD, Guan HO, et al. Recognition of soybean pods and yield prediction based on improved deep learning model. Front Plant Sci. 2023;13:1096619. https://doi.org/10.3389/fpls.2022.1096619.
Wang L, Zhang Q, Feng TC, et al. Wheat grain counting method based on YOLO v7-ST model. Trans Chin Soc Agricultural Mach. 2023;54(10):188–197, 204. https://doi.org/10.6041/j.issn.1000-1298.2023.10.018.
Pan WT, Sun ML, Yuan Y, et al. Identification method of wheat grain phenotype based on deep learning of ImCascade R-CNN. Smart Agric. 2023;5(3):110–20. https://doi.org/10.12133/j.smartag.SA202304006.
Xu X, Geng Q, Gao F, et al. Segmentation and counting of wheat spike grains based on deep learning and textural feature. Plant Methods. 2023;19(1):77. https://doi.org/10.1186/s13007-023-01062-6.
Bosquet B, Cores D, Seidenari L, et al. A full data augmentation pipeline for small object detection based on generative adversarial networks. Pattern Recognit. 2023;133:108998. https://doi.org/10.1016/j.patcog.2022.108998.
Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 2022. https://doi.org/10.48550/arXiv.2207.02696.
Kang M, Ting CM, Ting FF, et al. ASF-YOLO: a novel YOLO model with attentional scale sequence fusion for cell instance segmentation. Image Vis Comput. 2024;147:105057. https://doi.org/10.1016/j.imavis.2024.105057.
Wang CC, He W, Nie Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism. 2023. https://doi.org/10.48550/arXiv.2309.11331.
Cheng HX, Qiao QY, Luo XL, et al. Object detection algorithm for UAV aerial image based on improved YOLOv8. Radio Eng. 2024;54(4):871–81. https://doi.org/10.3969/j.issn.1003-3106.2024.04.010.
Ouyang DL, He S, Zhan J, et al. Efficient multi-scale attention module with cross-spatial learning. 2023. https://doi.org/10.48550/arXiv.2305.13563.
Chen JR, Kao SH, He H, et al. Run, don't walk: chasing higher FLOPS for faster neural networks. 2023. https://doi.org/10.48550/arXiv.2303.03667.
Li ZJ, Yan JF, Zhou J, et al. An efficient SMD-PCBA detection based on YOLOv7 network model. Eng Appl Artif Intell. 2023;124:106492. https://doi.org/10.1016/j.engappai.2023.106492.
Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–7. https://doi.org/10.1126/science.1127647.
Singh V, Shrivastava S, Kumar Singh S, et al. StaBle-ABPpred: a stacked ensemble predictor based on BiLSTM and attention mechanism for accelerated discovery of antibacterial peptides. Brief Bioinform. 2022;23(1):bbab439. https://doi.org/10.1093/bib/bbab439.
Singh V, Gupta I, Jana PK. A novel cost-efficient approach for deadline-constrained workflow scheduling by dynamic provisioning of resources. Future Generation Comput Syst. 2018;79:95–110.
Singh V, Gupta I, Jana PK. An energy efficient algorithm for workflow scheduling in IaaS cloud. J Grid Comput. 2020;18(3):357–76.
Guo MH, Xu TX, Liu JJ, et al. Attention mechanisms in computer vision: a survey. Comput Visual Media. 2022;8(3):331–68.
Funding
This work was supported by the Fundamental Research Funds for the Universities of Henan Province (242300420221), the National Major Scientific Research Achievement Cultivation Fund (NSFRF240101), the Henan Provincial Postdoctoral Research Launch Project (202103072), the Henan Polytechnic University Doctoral Fund Project (B2021-19), and Henan Polytechnic University’s High-Level Talent Development Program for the Establishment of the “Double First-Class” Discipline in Surveying and Mapping Science and Technology (GCCYJ202427 and BZCG202301).
Author information
Contributions
Xin Wang: Conceptualization, Investigation, Methodology, Supervision, Software, Data curation, Visualization, Writing—original draft, Writing—review and editing. Changchun Li: Project administration, Funding acquisition, Supervision, Writing—review and editing. Chenyi Zhao: Methodology, Visualization, Data curation, Software. Yinghua Jiao: Methodology, Resources. Hengmao Xiang: Methodology, Supervision, Writing—review and editing. Xifang Wu: Writing—review and editing, Funding acquisition. Huabin Chai: Writing—review and editing, Funding acquisition. All authors read and approved the final manuscript.
Ethics declarations
Ethics and Consent to Participate declarations
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, X., Li, C., Zhao, C. et al. GrainNet: efficient detection and counting of wheat grains based on an improved YOLOv7 modeling. Plant Methods 21, 44 (2025). https://doi.org/10.1186/s13007-025-01363-y