
Spectral estimation of the aboveground biomass of cotton under water–nitrogen coupling conditions

Abstract

Aims

Hyperspectral remote sensing technology can quickly obtain above-ground biomass (AGB) information of cotton, playing an important role in realizing accurate management for cotton cultivation.

Methods

Using Tahe-2 as the research object, nitrogen application rates were set to 0 (N0), 100 (N1), 150 (N2), 200 (N3) and 250 (N4) kg ha−1 and irrigation amounts to 4500 (W1), 6000 (W2) and 7500 (W3) m³ ha−1 under water–nitrogen coupling conditions. After correlation analysis between cotton AGB and canopy spectral reflectance, the intersection of the feature wavelengths screened by the successive projection algorithm (SPA) and the highly significant wavelengths was used as the input vector for modeling. Support vector machine (SVM), regression tree (RT), and convolutional neural network (CNN) models were built, and their accuracy was compared.

Results

The results revealed the following: (1) The AGB of cotton at the bud stage was highest under the W1N2 gradient. At the flowering stage, the highest AGB was observed under the W3N2 gradient. At the boll stage, the highest AGB was under the W3N0 gradient. (2) The optimal spectral model based on SVM for cotton AGB identification had higher R2 values and lower RMSE values at the boll stage, with R2 = 0.76, RMSE = 0.35 g and RPD = 17.59. The optimal spectral model based on RT had higher R2 values and lower RMSE values at the bud stage, with R2 = 0.79, RMSE = 0.24 g and RPD = 16.18. The optimal spectral model based on CNN also had higher R2 values and lower RMSE values at the bud stage, with R2 = 0.70, RMSE = 0.42 g and RPD = 4.50. These results indicated that the inversion effect at the bud stage was better than at other stages.

Conclusions

In terms of model testing, the RT model was found to be the most accurate for estimating cotton AGB, outperforming SVM and CNN.

Highlights

The AGB of cotton at the bud, flowering and boll stages was highest under W1N2, W3N2 and W3N0, respectively.

The intersection between the feature wavelengths screened by SPA and the highly significant wavelengths between 325 and 1075 nm was used as the input vector for modeling.

Wavelengths between 700 and 800 nm are well suited to the indirect inversion of AGB changes from spectral changes.

The RT model demonstrated higher R2 values and lower RMSE values compared to SVM and CNN.

Introduction

Cotton is an important cash crop and has the most extensive planting area in Xinjiang, covering approximately one third of the total cultivated land [1]. Biomass is an important agronomic parameter that reflects vegetation life activities; it is closely related to crop growth and yield and is widely used as an indicator in agricultural monitoring [2, 3]. However, in the traditional agricultural production process, changes in cotton growth cannot be detected accurately and quickly [4,5,6].

At present, the use of hyperspectral technology or of UAV and satellite remote sensing to study plant biomass and estimation models has advanced significantly. Hojo et al. [7] used open-source satellite and multi-source data to estimate above-ground biomass (AGB) at two test sites in Hokkaido, Japan. They found that the machine learning model performed well and that the importance of variables (e.g., L-band backscattering, canopy height model, slope, slope direction, average annual temperature, annual precipitation, and forest type) varied with geographical location. Their research also showed that non-synthetic aperture radar (SAR) variables play an important role in large-scale mapping. Liu et al. [8] carried out a ground spectrum measurement experiment on typical herbs in the temperate grassland of Inner Mongolia. They found that the overall accuracy of random forest (RF) and support vector machine (SVM) models was low, although SVM identified some plant species (such as mugwort and scallion) with high accuracy, and that the improved convolutional neural network (CNN) model had the highest accuracy. Tang et al. [9] established leaf area index (LAI) and AGB estimation models by collecting spectral data of winter wheat and calculating the relevant spectral indices. They found that combining the first-order differential spectral index with RF gave the best modeling method, with an accuracy high enough to provide a reference for crop monitoring and parameter estimation. Yang et al. [10] pointed out that optimized spectral indices are the best input variables for the RF model, significantly improving the prediction accuracy of AGB while reducing the number of input variables. Ling et al. [11] studied the ability of spectral features, image texture and their combination to estimate the AGB of winter wheat. They found that different models performed differently at different growth stages and that combining spectral and image texture features effectively improved the accuracy of AGB estimation, especially at the late seedling stage.

Studies on crop agronomic parameters have focused mainly on the physiological and ecological parameters and AGB of crops and have established an optimal spectral estimation model by comparing the correlation between the AGB of crops and the spectral reflectance [12,13,14,15]. However, most of the current studies on cotton AGB estimation focus on the influence of a single factor (such as irrigation or fertilization), and the studies on water-nitrogen coupling conditions are limited. In addition, the importance of variable selection in hyperspectral data analysis has not been fully explored.

In summary, cotton AGB is a key indicator to measure the growth status and yield potential of cotton, which directly affects the final yield and economic benefits of cotton, and has become the object of inversion for most scholars in smart agriculture. Therefore, the main contents of this study are as follows: (1) To clarify the variation characteristics of cotton AGB in cotton fields at different growth stages under different water-nitrogen coupling conditions, so as to provide a scientific basis for nutrient management of cotton in precision agriculture; (2) To establish a diagnostic model of water and nitrogen nutrition of cotton at different growth stages based on the combination of spectral characteristics and SPA, and to establish the optimal model to achieve rapid and non-destructive monitoring of cotton growth and nutrient requirements.

Materials and methods

Overview of the study area

Figure 1 gives an overview of the study location. The research area is located at the Batuan National Field Scientific Observation and Research Station in Aksu Prefecture, Xinjiang, in the Tarim Basin. The area has a typical warm temperate, extremely continental arid desert climate, with hot summers, cold winters, long sunshine duration, scarce precipitation and intense evaporation. The altitude is 1025 m above sea level, and the annual average temperature in the area is 11.5 °C. The average annual precipitation is 48.2 mm, and the frost-free period lasts 213 days.

Fig. 1 Research area

Experimental design

This experiment employed a split-plot design (SPD) with graded levels of nitrogen application and irrigation. Five nitrogen application levels were established, 0 (N0), 100 (N1), 150 (N2), 200 (N3) and 250 (N4) kg ha−1, together with three irrigation levels, 4500 (W1), 6000 (W2) and 7500 (W3) m³ ha−1, applied across the 15 plots. Each plot had a total area of 1/45 ha. Three rows of cotton were planted in each plot, and two representative cotton plants were selected from each row for spectral determination. Samples were collected at three growth stages: bud stage, flowering stage, and boll stage. A total of 90 samples were collected at each stage; 2/3 of these samples were randomly selected as the training set, and the remaining samples were used as the validation set. SVM, RT and CNN were employed to establish the cotton AGB estimation models, and the accuracy of these models was evaluated through analysis and comparison.
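The treatment layout and the 2/3 to 1/3 split described above can be sketched as follows. This is only an illustration: the function name `split_samples`, the integer sample ids and the fixed seed are assumptions made for reproducibility, not details reported in the study.

```python
import random

def split_samples(sample_ids, train_fraction=2 / 3, seed=0):
    """Randomly split sample ids into a training list and a validation list."""
    rng = random.Random(seed)          # fixed seed: an assumption for repeatability
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 3 irrigation levels x 5 nitrogen levels give the 15 treatment plots.
treatments = [f"W{w}N{n}" for w in (1, 2, 3) for n in range(5)]

# 90 samples per growth stage, identified here by hypothetical ids 0..89.
train_ids, valid_ids = split_samples(range(90))
print(len(treatments), len(train_ids), len(valid_ids))
```

With 90 samples per stage, this yields 60 training samples and 30 validation samples.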

Data acquisition

Collection and determination of the cotton samples

Cotton samples were collected from June to August 2023. The total area of the cotton field in the test area was 0.33 hm². The selected cotton variety was Tahe 2. Each plot was planted with three rows of film, and each row of film covered six rows of cotton. Three cotton plants were selected as representative samples at two points, each within an area of 0.5 m × 0.5 m, on each film. The underground parts of the plants were removed, and the aboveground parts were placed in a constant-temperature drying oven. The oven was held at 105 °C for half an hour, after which the samples were dried to constant weight at 85 °C. The dried samples were then weighed promptly to obtain the measured cotton AGB value.

Determination of the spectral data of the cotton canopy

In this study, an ASD FieldSpec HandHeld 2 portable ground spectrometer was used to obtain spectral information from cotton leaves. The instrument measures spectral reflectance over a spectral range of 325–1075 nm, with a spectral resolution of 1 nm and a sampling interval of 1.4 nm. Reflectance was measured on cotton canopies that had no pests or diseases, no missing seedlings or ridges, and a uniform growth state. The spectral reflectance of 10 cotton leaves was measured at each point, and the average value was taken as the reflectance of that point.

Data processing and analysis

Screening feature band

The original band was smoothed by Savitzky-Golay convolution [16]. The correlation between the cotton AGB and the canopy spectral reflectance was analyzed under the coupling conditions of water and nitrogen in different periods. The highly significant wavelengths (R > 0.267) were extracted, and the intersection between the feature wavelengths screened by SPA and the highly significant wavelengths between 325 and 1075 nm was used as the input vector for modeling.
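The correlation screening step can be sketched as below. This is a minimal illustration rather than the authors' procedure: the function names and toy reflectance values are hypothetical, and the absolute value of the correlation coefficient is used on the assumption that strongly negative correlations also count as highly significant. The 0.267 threshold is the one stated above and belongs to the study's sample size, not to the five-sample toy data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def highly_significant(reflectance_by_wavelength, agb, threshold=0.267):
    """Return the wavelengths whose |r| with AGB exceeds the threshold."""
    return sorted(
        wavelength
        for wavelength, reflectance in reflectance_by_wavelength.items()
        if abs(pearson_r(reflectance, agb)) > threshold
    )

# Toy data: five samples, three wavelengths (values are made up).
agb = [30.0, 35.0, 40.0, 45.0, 50.0]
spectra = {
    700: [0.20, 0.22, 0.25, 0.27, 0.30],   # rises with AGB
    900: [0.50, 0.40, 0.50, 0.40, 0.50],   # unrelated to AGB
    1050: [0.60, 0.55, 0.50, 0.45, 0.40],  # falls with AGB
}
selected = highly_significant(spectra, agb)
print(selected)
```

The resulting wavelength list would then be intersected with the SPA selection to form the model input.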

SPA is a forward variable selection algorithm that minimizes the collinearity of the vector space and is used mainly for screening the spectral feature wavelengths. With the help of vector projection analysis, wavelengths are projected onto other wavelengths. By comparing the size of the projection vectors, the wavelength with the maximum projection vector is regarded as the selected wavelength, and the final feature wavelengths are chosen according to the correction model [17]. In this study, MATLAB R2022a was used to screen the feature wavelengths of the original wavelength range of 325–1075 nm.
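The projection step of SPA can be illustrated with a short sketch. It covers only the forward selection by successive orthogonal projections; the choice of the starting wavelength and of the final number of wavelengths by a calibration model is omitted. The function name `spa_select`, the toy vectors and the fixed starting index are assumptions, and the study itself ran SPA in MATLAB R2022a.

```python
import math

def spa_select(columns, n_select, start=0):
    """Forward selection by successive projections.

    columns: one list of reflectances per wavelength (equal lengths).
    Returns the indices of the selected wavelengths in selection order.
    """
    work = [list(col) for col in columns]   # working copies that get projected
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]            # last selected (already projected) vector
        ref_norm_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Remove the component of this wavelength along the reference vector.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_norm_sq
            work[j] = [a - coef * b for a, b in zip(col, ref)]
            norm = math.sqrt(sum(v * v for v in work[j]))
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)               # largest projection = least collinear
    return selected

# Toy spectra: 4 wavelengths x 3 samples; wavelength 1 nearly duplicates wavelength 0.
columns = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.1],
    [3.0, -1.0, 0.5],
    [0.5, 0.4, -1.0],
]
chosen = spa_select(columns, n_select=3)
print(chosen)
```

In this toy case the nearly collinear wavelength is the one left out, which is the behavior that makes SPA useful for reducing redundancy among adjacent hyperspectral bands.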

Model construction and accuracy checking

In this study, MATLAB R2022a was used to run the relevant models. A support vector machine (SVM), regression tree (RT) and convolutional neural network (CNN) were used to establish the optimal model for the AGB of cotton.

SVM analysis is an efficient supervised learning pattern recognition algorithm with structural risk minimization as its core concept. It constructs the optimal classification hyperplane by mapping inseparable low-dimensional data to high-dimensional space using a kernel function. SVM has the advantages of strong adaptability, global optimization, simple structure, and high accuracy for small sample, nonlinear, and high-dimensional data estimation. It has been widely used in near-Earth hyperspectral analysis and inversion [18, 19]. In this study, the MATLAB R2022a SVR model type with a radial basis function (RBF) kernel (g = 0.8) was used, and the optimal kernel function and penalty factor (c = 4.0) were selected for modeling.
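For reference, the RBF kernel can be written out explicitly. The parameter names g and c suggest a LIBSVM-style parameterization; under that assumption, which the text does not confirm, the kernel between two input vectors of selected reflectances is

$$K({x_i},{x_j}) = \exp \left( { - g{{\left\| {{x_i} - {x_j}} \right\|}^2}} \right),\quad g = 0.8$$

and the penalty factor bounds the dual coefficients of the support vector regression, \(0 \le \alpha_i \le c\) with c = 4.0. A larger g makes the kernel more local, while a larger c penalizes training errors more heavily.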

RT is an application of the decision tree algorithm to regression problems. It subdivides the data space step by step through binary partitioning of the input features, forming a tree-like structure. At each leaf node, the model provides a predicted value, typically the mean of the target variable for the samples contained in that node [20]. In MATLAB R2022a, the RT model type was applied after the dimensionality of the original data had been reduced.
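A single binary partition, the building block of a regression tree, can be sketched as follows. This is an illustration with one feature and made-up values, using the common squared-error split criterion as an assumption; a full tree applies the same search recursively, and the study used the MATLAB R2022a implementation rather than this code.

```python
def best_split(x, y):
    """Find the threshold on one feature that minimizes the summed squared error.

    Returns (threshold, left_mean, right_mean); each leaf predicts its mean.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("all feature values are identical; no split exists")
    return best[1:]

def predict(x_value, split):
    """Predict with a one-split tree (a stump)."""
    threshold, left_mean, right_mean = split
    return left_mean if x_value <= threshold else right_mean

# Made-up reflectance at one wavelength and AGB in grams.
reflectance = [0.10, 0.12, 0.15, 0.30, 0.33, 0.35]
agb = [30.0, 32.0, 31.0, 50.0, 52.0, 51.0]
split = best_split(reflectance, agb)
print(split, predict(0.11, split), predict(0.40, split))
```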

CNN is a feedforward neural network that automatically extracts features from input images, requiring fewer parameters for training and thus being more convenient to optimize. Typical CNN architectures generally contain an input layer, hidden layers (convolutional, activation, pooling, and fully connected layers), and an output layer [21]. Here, the Stochastic Gradient Descent with Momentum (SGDM) optimization algorithm was used to train the neural network. SGDM extends basic stochastic gradient descent by adding a momentum term to accelerate convergence and reduce oscillations during training. The MATLAB R2022a CNN model type was adopted, with a training batch size of 30 samples, a maximum of 2000 training iterations, an initial learning rate of 0.01, and a learning rate decay factor of 0.5. The learning rate was multiplied by the decay factor every 400 iterations, giving 0.01 × 0.5 after the first adjustment.
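The training schedule can be sketched as below, reading the description as a piecewise-constant decay in which the learning rate is halved every 400 iterations. That reading, the momentum value of 0.9 and the one-parameter quadratic loss are assumptions made for illustration; none of them is reported in the study.

```python
def learning_rate(iteration, initial=0.01, factor=0.5, period=400):
    """Piecewise-constant decay: multiply by `factor` every `period` iterations."""
    return initial * factor ** (iteration // period)

def sgdm_step(weight, velocity, gradient, lr, momentum=0.9):
    """One SGDM update; momentum=0.9 is an assumed value."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
weight, velocity = 0.0, 0.0
for iteration in range(2000):
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = sgdm_step(weight, velocity, gradient, learning_rate(iteration))

print(learning_rate(0), learning_rate(400), learning_rate(1999), round(weight, 4))
```

Under this reading the rate drops four times within 2000 iterations, ending at 0.01 × 0.5⁴ = 0.000625.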

The accuracy of the models was evaluated using the coefficient of determination (R2), root mean square error (RMSE), and relative predictive deviation (RPD). The RPD is the ratio of the standard deviation (SD) of the measured values to the RMSE of the predictions; it is dimensionless and increases as the prediction error shrinks relative to the spread of the data. A higher validation R2, a lower RMSE, and an RPD greater than 1.4 [22] indicate better prediction performance.

$${R^2} = 1 - \frac{{\sum\nolimits_{i = 1}^n {{{(AG{B_i} - AGB{P_i})}^2}} }}{{\sum\nolimits_{i = 1}^n {{{(AG{B_i} - \overline {AGB} )}^2}} }}$$
(1)
$${\rm{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^n {{{(AG{B_i} - AGB{P_i})}^2}} }}{n}}$$
(2)
$${\rm{RPD}} = {\rm{SD/RMSE}}$$
(3)
$${\rm{SD}} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^n {{{(AG{B_i} - \overline {AGB} )}^2}} }$$
(4)

Where \(AGB_i\) is the measured above-ground biomass of cotton for sample i, \(AGBP_i\) is the value predicted by the cotton above-ground biomass estimation model, \(\overline{AGB}\) is the mean of the measured above-ground biomass, and n is the number of samples.
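The four formulas translate directly into a few lines of code. The sketch below uses made-up measured and predicted values, and the function name `evaluate` is hypothetical.

```python
import math

def evaluate(measured, predicted):
    """Return (R2, RMSE, RPD) following Eqs. (1)-(4)."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    r2 = 1 - ss_res / ss_tot          # Eq. (1)
    rmse = math.sqrt(ss_res / n)      # Eq. (2)
    sd = math.sqrt(ss_tot / n)        # Eq. (4)
    return r2, rmse, sd / rmse        # Eq. (3)

# Made-up AGB values in grams.
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 52.0, 58.0]
r2, rmse, rpd = evaluate(measured, predicted)
print(round(r2, 3), round(rmse, 3), round(rpd, 3))
```

When all three metrics are computed on the same samples with these definitions, RPD equals 1/√(1 − R2), which offers a quick consistency check on reported values.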

Results

Changes in the AGB of cotton during the different periods with different amounts of irrigation and nitrogen application

The dynamic plots of cotton biomass reflect the changes in the dry weight of cotton across growth stages. Biomass reflects the health of crop ecosystems and changes in the plant's ecological environment, and therefore indicates crop growth, development and productivity [23]. Applying different amounts of irrigation and nitrogen during different periods makes it possible to observe cotton growth more directly and to determine the effect on the above-ground biomass of cotton. The trends in AGB under different water and nitrogen conditions are based on the aboveground dry weights of all the sampled cotton plants.

Fig. 2 Changes in cotton AGB during different periods under different water and nitrogen conditions (5% significance level)

Figure 2 shows the AGB of cotton at each growth stage under different water and nitrogen conditions. At the bud stage, the AGB under the W1 and W2 gradients was greater than that under W3, and the W1 gradient was superior to the W2 gradient. The AGB under the W1N2 gradient was the highest, at 37.53 g. At the 5% significance level, the biomass under the W1N2 gradient was significantly different from those under the W2N3 and W2N4 gradients. At the flowering stage, the W1 and W2 gradients again had high AGB, but the W3 gradient had the best overall performance. The biomass under the W3N2 gradient was the highest, at 55.57 g. At the 5% significance level, the results under the W3N2 gradient were not significantly different from those under the W1N0, W3N0, W3N1 and W3N4 gradients but were significantly higher than those under the other gradients. At the boll stage, the excessive water supplied under the W3 gradient was not conducive to normal growth and development, leading to decreased biomass accumulation [24]. The W2 gradient better met the water demand of the boll stage, and its overall AGB trend was higher than that of the other two water treatments. Nevertheless, the highest AGB at the boll stage, 73.82 g, was observed under the W3N0 gradient. At the 5% significance level, the results under the W3N0 gradient differed significantly from those under the W1N3, W2N1, W2N2, W3N1, W3N2, W3N3 and W3N4 gradients.

Correlation analysis of the cotton AGB under different irrigation and nitrogen applications

Excel 2007 software was utilized to conduct correlation analyses and to map the AGB and spectral reflectance of cotton under varying water and nitrogen conditions across different periods, as illustrated in Fig. 3:

Fig. 3 Correlation between the AGB and spectral reflectance of cotton during different periods

Under water and nitrogen conditions, a significant positive correlation was observed between the AGB and the reflectance of cotton in the 325–712 nm band at the bud stage, with a very significant positive correlation in the 325–704 nm band. At the flowering stage, significant positive correlations were observed between the AGB and reflectance in the 325–346 nm, 358–731 nm, 835–949 nm, and 997–1075 nm bands. Among these, very significant positive correlations were found in the 325–340 nm, 403–421 nm, 522–668 nm, and 692–729 nm bands, while very significant negative correlations were observed in the 1027–1075 nm band. At the boll stage, significant positive correlations were noted between the AGB and reflectance in the 687–696 nm, 715–1050 nm, and 1063–1068 nm bands, with a very significant positive correlation in the 725–1048 nm band.

Extraction of the spectral characteristics of the cotton canopy in different periods

The characteristic bands of the cotton spectral data at each growth stage were obtained through SPA. The selection of characteristic wavelength variables at the bud stage is shown in Fig. 4.

Fig. 4 Extraction results based on the bud stage spectral feature wavelengths

When 20 characteristic variables were selected, the RMSE reached its optimal value of 9.3021; the selected wavelengths were 334–344 nm, 351 nm, 388 nm, 502 nm, 657 nm, 721 nm, 817 nm, 951 nm, 965 nm, and 1066 nm.

The spectral data of the flowering period were selected using their characteristic wavelength variables, as shown in Fig. 5.

Fig. 5 Extraction results based on the flowering stage spectral feature wavelengths

When 20 characteristic variables were selected, the RMSE reached its optimal value of 35.2364; the selected wavelengths were 325, 331, 334–344, 357, 547, 666, 701, 753, and 928 nm.

The spectral data of the boll period were selected with the characteristic wavelength variables, as shown in Fig. 6.

Fig. 6 Extraction results based on the boll stage spectral feature wavelengths

When 20 characteristic variables were selected, the RMSE reached its optimal value of 40.0069; the selected wavelengths were 325, 334–344, 401, 490, 632, 699, 983, 1041, 1061, and 1075 nm.

The screening of characteristic wavelengths via SPA significantly reduces the complexity of the spectral data. At the bud, flowering, and boll stages, 20 spectral wavelengths were screened, accounting for only 2.66% of the original spectral wavelengths (ranging from 325 to 1075 nm). The characteristic wavelengths corresponding to different growth stages are listed in Table 1.
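The stated proportion follows from counting the wavelengths in the original range, assuming the reflectance is tabulated at 1 nm steps from 325 to 1075 nm inclusive:

$$\frac{{20}}{{1075 - 325 + 1}} = \frac{{20}}{{751}} \approx 2.66\%$$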

Table 1 Results of cotton AGB extraction

Estimation modeling of the cotton AGB using hyperspectral technology

Descriptive statistical analysis of cotton AGB

The total, training and validation samples of cotton biomass in the different periods are listed in Table 2.

Table 2 Descriptive statistical analysis of the AGB of cotton

As shown in Table 2, a minimal difference is observed between the average values of the training set and the validation set. The maximum value of the overall data is located in the training set, while the minimum value is found in the validation set, with a minimal difference between them. The average AGB of cotton ranges from approximately 34 to 59 g.

Modeling and establishment of cotton AGB estimation

Combining Fig. 3 and Table 1, the intersection of the wavelengths with highly significant correlations between cotton AGB and spectral reflectance and the wavelengths extracted by SPA was selected as the input vector for modeling. Based on this selection, a spectral index for cotton AGB estimation was constructed. The estimation model of cotton AGB was developed using the following wavelengths: 334–344 nm, 351 nm, 388 nm, 502 nm, and 657 nm for the bud stage; 325 nm, 331 nm, 334–344 nm, 547 nm, 666 nm, and 701 nm for the flowering stage; and 983 nm and 1041 nm for the boll stage.
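The intersection step can be reproduced from the reported values for two of the stages. The sketch below, limited to the bud and boll stages, takes the SPA wavelengths and the very significant correlation bands from the Results; the helper `expand` and the assumption of integer wavelengths at 1 nm steps are illustrative.

```python
def expand(ranges):
    """Turn (start, end) pairs in nm into a set of integer wavelengths."""
    return {w for start, end in ranges for w in range(start, end + 1)}

# Wavelengths screened by SPA, as reported for each stage.
spa = {
    "bud": expand([(334, 344)]) | {351, 388, 502, 657, 721, 817, 951, 965, 1066},
    "boll": expand([(334, 344)]) | {325, 401, 490, 632, 699, 983, 1041, 1061, 1075},
}

# Bands with very significant correlations between AGB and reflectance.
highly_significant = {
    "bud": expand([(325, 704)]),
    "boll": expand([(725, 1048)]),
}

# Model inputs: wavelengths present in both selections.
inputs = {stage: sorted(spa[stage] & highly_significant[stage]) for stage in spa}
for stage, wavelengths in inputs.items():
    print(stage, wavelengths)
```

The bud stage keeps 334–344, 351, 388, 502 and 657 nm, and the boll stage keeps 983 and 1041 nm, matching the wavelengths listed above.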

For inversion modeling, three analysis methods—SVM, RT, and CNN—were employed, with the selected characteristic wavelengths serving as the input vectors. The dataset was divided into a training set and a validation set to perform the inversion of cotton AGB. The test results are presented in Table 3.

Table 3 Cotton AGB estimation model based on machine learning

In the three cotton growth stages, the coefficient of determination (R2) of the training and validation sets for the SVM inversion model ranged from 0.57 to 0.76, while the RMSE varied between 0.35 g and 9.82 g. Most RPD values for the training sets were above 10, whereas only the validation set at the boll stage exceeded 10. The estimation model established during the boll stage using the training set yielded high R2 values, with the training set achieving R2 = 0.76, RMSE = 0.35 g, and RPD = 17.59, and the validation set achieving R2 = 0.73, RMSE = 0.74 g, and RPD = 10.14. For the RT inversion model, the R2 of the training and validation sets ranged from 0.48 to 0.79, and the RMSE varied between 0.10 g and 12.54 g. Most RPD values for both the training and validation sets were below 5, except for the bud stage, where both the training and validation sets exceeded 10, indicating good overall inversion performance. The estimation model established at the bud stage using the training dataset achieved higher R2 values, with the training set reaching R2 = 0.79, RMSE = 0.24 g, and RPD = 16.18, and the test set reaching R2 = 0.64, RMSE = 0.10 g, and RPD = 35.49. For the CNN inversion model, the R2 of the training and validation sets ranged from 0.54 to 0.70, and the RMSE varied between 0.42 g and 3.99 g. All RPD values for both the training and validation sets were below 10. The estimation model established at the bud stage using the training dataset achieved higher R2 values, with the training set reaching R2 = 0.70, RMSE = 0.42 g, and RPD = 4.50, and the test set reaching R2 = 0.55, RMSE = 0.46 g, and RPD = 4.62.

The AGB of cotton in different periods could be effectively inverted using these three modeling methods; however, the results from the modeling and inversion of cotton AGB in different periods showed clear differences upon comparison. As shown in Table 3, most R2 values were above 0.58, and the RMSE was generally within 10 g. At the bud stage, the RT model achieved higher R2 values and lower RMSE values compared to SVM and CNN, reaching R2 = 0.79, RMSE = 0.24 g, and RPD = 16.18. At the flowering stage, the RT model outperformed SVM and CNN, with R2 = 0.71, RMSE = 1.84 g, and RPD = 4.10. At the boll stage, the SVM model achieved higher R2 values and lower RMSE values compared to RT and CNN, with R2 = 0.76, RMSE = 0.35 g, and RPD = 17.59. Considering the inversion effects across the three periods of cotton, the SVM model at the boll stage, the RT model at the bud stage, and the CNN model at the bud stage can be used as the spectral models for cotton AGB identification. The training results are shown in Fig. 7. From the perspective of model testing, the RT model demonstrated higher R2 values and lower RMSE values compared to SVM and CNN, with the modeling effect following the order RT > SVM > CNN.

Fig. 7 Fitting diagram of the optimal cotton period training and validation based on machine learning

Discussion

Effects of water-nitrogen coupling on cotton biomass

Water and nitrogen are the key factors affecting cotton growth and biomass formation. Reasonable irrigation and nitrogen fertilizer application can significantly promote the growth of cotton and increase its biomass and yield [25]. Under drought conditions, increased irrigation and nitrogen fertilizer application can increase the plant height, LAI, and AGB of cotton [26]. In this study, the effects of water-nitrogen coupling on the AGB of cotton showed significant phased and interactive characteristics. In the bud stage, proper irrigation and moderate nitrogen fertilizer application (such as the W1N2 treatment) significantly promoted cotton growth and laid a foundation for subsequent development. At flowering, higher irrigation and moderate nitrogen fertilizer levels (such as the W3N2 treatment) further promoted biomass accumulation, which may be related to the high water and nutrient requirements at this stage. In the boll stage, although a high irrigation rate (W3) is usually not conducive to the normal growth of cotton, the biomass reached its highest level without nitrogen fertilizer application (N0). This may indicate that the sensitivity of cotton to water and nitrogen fertilizer changes during the boll stage and that excessive nitrogen fertilizer may inhibit further biomass accumulation. In addition, previous studies have shown that the interaction of water and nitrogen has an important effect on cotton yield and water use efficiency (WUE). Although total irrigation and nitrogen fertilizer application have significant positive effects on cotton seed yield, the interaction between water and nitrogen is not significant [27]. When optimizing water and nitrogen management strategies, the needs of different growth stages must be taken into account to maximize biomass and yield. Therefore, in precision agriculture, irrigation and fertilization strategies need to be optimized according to the cotton growth stage and environmental conditions to improve resource utilization efficiency and crop productivity.

Possibility of the crop biomass estimation via spectral techniques

Spectral technology can quickly acquire a large amount of spectral data that reflect the growth of crops and has obvious advantages in estimation modeling [28]. Ling Z et al. [11] used SPA to study the AGB of winter wheat at 400–1000 nm and reported that four effective wavelengths were selected at the seedling stage: 583.35 nm, 763.55 nm, 929.18 nm and 940.64 nm. In the late seedling stage and the whole growth stage, 12 and 14 effective wavelengths were screened, respectively, which were mainly distributed between 700 and 800 nm and 900–1000 nm. In this study, the original spectra of cotton at the bud stage, flowering stage, and boll stage were screened via SPA, and the effective wavelengths were mainly between 300 and 400 nm, 700–800 nm, and 1000–1075 nm. The differences may be due to the different crops studied, but there are also similarities: the 700–800 nm range is common to this study and the study above. It is highly advantageous to construct a vegetation index based on the original spectrum as a predictor of the AGB of crops. The combination or optimization of multiple vegetation indices can improve the accuracy of modeling [28, 29]. Liu Y et al. [30] used potato as the research object and, by optimizing the vegetation index, selected the 734 nm and 742 nm wavelengths, which were highly correlated with the AGB of potato, as the basis for subsequent modeling. In this study, wavelengths with extremely significant correlations between cotton AGB and canopy spectral reflectance were extracted, sensitive wavelengths were screened by SPA, and the intersection of the two was used as the input vector for subsequent modeling; the final characteristic wavelengths between 700 and 800 nm were 701 nm, 721 nm, and 753 nm.

Therefore, the 700–800 nm spectral band continues to show significance in different studies, which further emphasizes its key role in monitoring plant physiological characteristics [31]. This band is closely related to chlorophyll absorption and reflects the efficiency of photosynthesis, which is a key factor in determining plant biomass and yield. For cotton, the 700–800 nm band is particularly important because it is closely related to chlorophyll content and leaf structure. Studies have shown that this band can effectively distinguish between healthy and stressed plants, highlighting its potential for early detection of physiological changes [32]. Our results are consistent with previous studies that have identified the 700–800 nm band as an important indicator for predicting biomass and yield of various crops [31]. The strong correlation between this band and cotton biomass observed in this study is consistent with the physiological role of this spectral region in photosynthesis [32]. Chlorophyll absorbs light in this band and converts light energy into chemical energy, which drives plant growth and development. Therefore, spectral technology has been extensively researched in crop estimation modeling, which also fully explains the universality and accuracy of spectral technology in this field.

Advantages of using machine learning modeling methods to estimate the crop biomass

To improve the accuracy of spectral monitoring, appropriate spectral analysis methods are needed. An inversion model must above all be broadly applicable and stable. Regression trees are often preferred for deriving agronomic advice because they provide intuitive results [33, 34]. Nayak et al. [35] used a variety of machine learning methods to estimate and evaluate the precision of wheat yield in the northwest Indo-Gangetic Plain, where the R2 of the classification and regression tree models was between 0.4 and 0.5. In this study, the R2 of the regression tree (RT) was between 0.4 and 0.8, and RT gave better inversion results than the support vector machine (SVM) and the convolutional neural network (CNN); SVM in turn outperformed CNN. These differences emphasize the importance of selecting the right model for the specific agricultural context. Although CNNs show great potential in image recognition tasks, they did not perform as well as RT and SVM in our study. This difference could be attributed to the small sample size and the complexity of the spectral data, which together may not provide enough information to train a robust CNN model [36, 37]. In contrast, RT and SVM demonstrated better generalization capabilities, highlighting their applicability to small-scale agricultural datasets. A recent study by Li W et al. [38] compared various machine learning algorithms for biomass estimation and found that the RT model outperformed the SVM and CNN models with limited data and high nonlinearity. This is consistent with our finding that the RT model performed best at the bud stage.

Li L et al. [17] used hyperspectral technology to estimate the vertical heterogeneity of nitrogen concentration in winter rapeseed leaves through two years of field experiments and reported that the overall R2 of the SVM model under first-order derivative reflectance treatment was above 0.80 and the RPD was approximately 2.40, indicating good prediction performance. In this study, the SVM model was used to estimate AGB at the three growth stages of cotton from feature wavelengths screened from the original spectra. The boll-stage model was the best SVM model for cotton AGB identification, with an R2 of 0.76, an RMSE of 0.35 g, and an RPD of 17.59. However, the inversion results for the other two stages were weaker than those reported with first-derivative reflectance. This finding suggests that proper pretreatment of the original spectrum, including combinations of two or more pretreatments, could further enhance model accuracy [39, 40], extract more information on crop growth and nutrition, and improve the accuracy of spectral estimation. Therefore, different pretreatment methods should be tested when modeling cotton AGB to optimize the model.
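For readers comparing the R2, RMSE, and RPD values quoted above, the following minimal sketch shows one common way to compute the three statistics. The definitions are assumptions (R2 as 1 − SSE/SST, RMSE with divisor n, RPD as the sample standard deviation of the observations divided by RMSE); this section does not state which definitions the study used, and the numbers are synthetic.

```python
import math

def r2_rmse_rpd(observed, predicted):
    """Return (R2, RMSE, RPD) under the definitions assumed in the lead-in."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / n)
    sd = math.sqrt(sst / (n - 1))  # sample standard deviation of the observations
    rpd = sd / rmse
    return r2, rmse, rpd

# Synthetic observed and predicted AGB values (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]

r2, rmse, rpd = r2_rmse_rpd(observed, predicted)
print(round(r2, 3), round(rmse, 3), round(rpd, 3))
```

Under these particular definitions, and when all three statistics are computed on the same samples, RPD equals sqrt(n / (n − 1)) / sqrt(1 − R2), so the two measures carry closely related information. Statistics computed on different sample sets or with other definitions need not satisfy this relation.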

Our study highlights the important role of the 700–800 nm spectral band in estimating cotton biomass under coupled water and nitrogen conditions. The strong correlation of this band with chlorophyll content and photosynthetic efficiency makes it a reliable indicator for monitoring plant health and productivity. By using this spectral band, farmers and researchers can develop more accurate and efficient models for estimating biomass, which is critical for optimizing agricultural management practices. Future studies should explore combining this band with other spectral regions to improve the accuracy of biomass estimation models; for example, combining data from the 700–800 nm band with data from the near-infrared (NIR) and short-wave infrared (SWIR) regions can provide a more comprehensive understanding of plant health and growth dynamics [41]. Future research should also explore ensemble methods and deep learning techniques to enhance the robustness and accuracy of the model, since combining multiple machine learning models can improve generalization and reduce overfitting [42]. Finally, spectral data should be integrated with other information sources, such as meteorological data, soil moisture data, and crop growth models. This multi-source approach can provide a more complete understanding of crop health and growth, thereby improving the accuracy and reliability of biomass estimation models.
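As a rough illustration of what combining multiple models could look like, the sketch below averages the predictions of two models and compares RMSE values. This is not something the study performed; the prediction lists are synthetic stand-ins for RT and SVM outputs, and unweighted averaging is only one of many possible ensemble schemes.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def average_predictions(*prediction_sets):
    """Unweighted average of several models' predictions, sample by sample."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]

# Synthetic AGB observations and hypothetical model outputs (illustrative only).
observed = [2.0, 2.5, 3.1, 3.8]
rt_pred = [2.3, 2.4, 3.5, 3.6]
svm_pred = [1.8, 2.7, 2.9, 4.1]

ensemble_pred = average_predictions(rt_pred, svm_pred)
rmse_rt = rmse(observed, rt_pred)
rmse_svm = rmse(observed, svm_pred)
rmse_ensemble = rmse(observed, ensemble_pred)
print(round(rmse_rt, 3), round(rmse_svm, 3), round(rmse_ensemble, 3))
```

The RMSE of an averaged prediction can never exceed the mean of the individual RMSE values, and it is much lower when the models' errors partly cancel, as they do in this synthetic case.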

Conclusion

In this study, hyperspectral techniques combined with machine learning methods are used to establish an estimation model of cotton AGB, and the characteristics of cotton growth under water-nitrogen coupling conditions are discussed. The main conclusions are as follows:

  1. Under different water-nitrogen coupling conditions, cotton AGB shows significant stage differences. AGB at the bud stage and flowering stage is more affected by irrigation and fertilization levels, while AGB at the boll stage is more sensitive to irrigation. Optimizing irrigation and fertilization strategies is therefore important for increasing cotton biomass and yield.

  2. A comparison of the SVM, RT, and CNN machine learning models shows that the RT model performs best at the bud stage, while the SVM model performs better at the boll stage. These results indicate that the models differ in their suitability for different cotton growth stages, and future studies should further optimize model selection and parameter adjustment.

  3. This study highlights the crucial role of the 700–800 nm band in estimating cotton AGB, which is closely related to plant photosynthetic efficiency and chlorophyll content. Combining this band with the feature bands selected by SPA can effectively improve the prediction accuracy and computational efficiency of the model.
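To make the regression tree comparison more concrete, the sketch below finds the single best split of a one-node regression tree by minimizing the summed squared error of the two child means, which is the usual CART criterion. It is a simplified illustration with synthetic data and hypothetical variable names, not the RT configuration used in this study.

```python
def best_split(x, y):
    """Return (threshold, left_mean, right_mean, sse) for the best single split.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None when x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid threshold between identical x values
        left = [value for _, value in pairs[:i]]
        right = [value for _, value in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[3]:
            best = (threshold, left_mean, right_mean, sse)
    return best

def predict(split, value):
    """Predict with a one-split tree: the mean of the child the value falls into."""
    threshold, left_mean, right_mean, _ = split
    return left_mean if value <= threshold else right_mean

# Synthetic reflectance at one wavelength and AGB for six samples (illustrative only).
reflectance_760 = [0.40, 0.42, 0.45, 0.60, 0.63, 0.66]
agb = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]

split = best_split(reflectance_760, agb)
print(split[0], predict(split, 0.41), predict(split, 0.65))
```

A full regression tree applies this search recursively to each child node and across all input wavelengths, which is why the resulting rules stay easy to read even when the relation between reflectance and AGB is nonlinear.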

Data availability

The authors do not have permission to share data. The data that support the findings of this study are available on request from the corresponding author.

References

  1. Arshad MU, Yunfeng Z, Hanif S, Fatima F. Impact of climate change and technological advancement on cotton production: evidence from Xinjiang region, China. J Agricultural Sci Technol. 2022;24:1519–31.

  2. Li Z, Wan S, Chen G, Han Y, Lei Y, Ma Y, Xiong S, Mao T, Feng L, Wang G. Effects of irrigation regime on soil hydrothermal microenvironment, cotton biomass, and yield under non-film drip irrigation system in cotton fields in Southern Xinjiang, China. Ind Crops Prod. 2023;198:116738–54.

  3. Qiu Y, Li X, Tang Y, Xiong S, Han Y, Wang Z, Feng L, Wang G, Yang B, Lei Y. Directly linking plant N, P and K nutrition to biomass production in cotton-based intercropping systems. Eur J Agron. 2023;151:126960–71.

  4. Meng B, Yi S, Liang T, Yin J, Sun Y. Modeling alpine grassland above ground biomass based on remote sensing data and machine learning algorithm: A case study in the East of Tibetan plateau, China. IEEE J Sel Top Appl Earth Observations Remote Sens. 2020;13:2986–95.

  5. Kumar P, Krishna AP, Rasmussen TM, Pal MK. Rapid evaluation and validation method of above ground forest biomass estimation using optical remote sensing in Tundi reserved forest area, India. Int J Geo-Information. 2021;10:29.

  6. Sainuddin FV, Malek G, Rajwadi A, Nagar PS, Asok SV, Reddy CS. Estimating Above-Ground biomass of the regional forest landscape of Northern Western Ghats using machine learning algorithms and Multi-sensor remote sensing data. J Indian Soc Remote Sens. 2024;52:885–902.

  7. Hojo A, Avtar R, Nakaji T, Tadono T, Takagi K. Modeling forest above-ground biomass using freely available satellite and multisource datasets. Ecol Informatics: Int J Ecoinformatics Comput Ecol. 2023;74:101973–8.

  8. Liu H, Wang H, Li X, Qu T, Zhang Y, Lu Y, Yang Y, Liu J, Zhao X, Su J, Luo D. Identification of constructive species and degraded plant species in the temperate typical grassland of inner Mongolia based on hyperspectral data. Agriculture. 2023;13:399.

  9. Tang Z, Guo J, Xiang Y, Lu X, Wang Q, Wang H, Cheng M, Wang H, Wang X, An J, Abdelghany A, Li Z, Zhang F. Estimation of leaf area index and Above-Ground biomass of winter wheat based on optimal spectral index. Agronomy. 2022;12:1729.

  10. Yang H, Li F, Wang W, Yu K. Estimating Above-Ground biomass of potato using random forest and optimized hyperspectral indices. Remote Sens. 2021;13:2339.

  11. Ling Z, Qun C, Tao J, Zhang Y, Lei Y, Zhao J, Huang L. Estimation of aboveground biomass for winter wheat at the later growth stage by combining digital texture and spectral analysis. Agronomy. 2023;13:865.

  12. Oliveira FM, Carneiro MF, Ortiz VB, Thurmond M, Oliveira LP, Bao Y, Sanz-Saez A, Tedesco D. Predicting below and above-ground peanut biomass and maturity using multi-target regression. Comput Electron Agric. 2024;218:108647–60.

  13. Pennacchi JP, Virlet N, Rodrigues ADBJP, Parry MAJ, Feuerhelm D, Hawkesford M, Carmo-Silva E. A predictive model of wheat grain yield based on canopy reflectance indices and theoretical definition of yield potential. Theoretical Experimental Plant Physiol. 2022;34:537–50.

  14. Fugen J, Mykola K, Kaisen M, Song C, Long J, Hua S. Estimating the aboveground biomass of coniferous forest in Northeast China using spectral variables, land surface temperature and soil moisture. Sci Total Environ. 2021;785:147335–50.

  15. Sun S, Zuo Z, Yue W, Morel J, Parsons D, Liu J, Peng J, Cen H, He Y, Shi J, Li X, Zhou Z. Estimation of biomass and nutritive value of grass and clover mixtures by analyzing spectral and crop height data using chemometric methods. Comput Electron Agric. 2021;192:106571–81.

  16. Savitzky A, Golay EJM. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem. 1964;36:1627–39.

  17. Li L, Jákli B, Lu P, Ren T, Ming J, Liu S, Wang S, Lu J. Assessing leaf nitrogen concentration of winter oilseed rape with canopy hyperspectral technique considering a non-uniform vertical nitrogen distribution. Ind Crops Prod. 2018;116:1–14.

  18. Cherkassky V, Ma Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004;17:113–26.

  19. Hong Y, Liu Y, Chen Y, Liu Y, Yu L, Liu Y, Cheng H. Application of fractional-order derivative in the quantitative estimation of soil organic matter content through visible and near-infrared spectroscopy. Elsevier. 2019;337:758–69.

  20. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. 1st ed. Routledge; 2017.

  21. Sun Z, Guo X, Xu Y, Zhang S, Cheng X, Hu Q, Wang W, Xue X. Image recognition of male oilseed rape (brassica napus) plants based on convolutional neural network for UAAS navigation applications on supplementary pollination and aerial spraying. Agriculture. 2022;12:62.

  22. Saeys W, Mouazen AM, Ramon H. Potential for onsite and online analysis of pig manure using visible and near infrared reflectance spectroscopy. Biosyst Eng. 2005;91:393–402.

  23. Yilmaz E, Gürbüz T, Dadelen N, Wzorek M. Impacts of different irrigation water levels on the yield, water use efficiency, and fiber quality properties of cotton (Gossypium hirsutum L.) irrigated by drip systems. Euro-Mediterranean J Environ Integr. 2021;6:1–7.

  24. Papastylianou PT, Argyrokastritis IG. Effect of limited drip irrigation regime on yield, yield components, and fiber quality of cotton under mediterranean conditions. Agric Water Manage. 2014;142:127–34.

  25. Hou X, Fan J, Zhang F, Hu W, Xiang Y. Optimization of water and nitrogen management to improve seed cotton yield, water productivity and economic benefit of mulched drip-irrigated cotton in Southern Xinjiang, China. Field Crops Res. 2024;308:109301.

  26. Wei K, Wang Q, Deng M, Lin S, Guo Y. Response of cotton growth, yield, and water and nitrogen use efficiency to nitrogen application rate and ionized brackish water irrigation under film-mulched drip fertigation. Front Plant Sci. 2024;15:1361202.

  27. Wang Z, Zhang K, Shao G, Lu J, Gao Y, Song E. Water and nitrogen use efficiencies in cotton production: A meta-analysis. Field Crops Res. 2024;309:109322.

  28. Zhang Y, Wang R. Estimation of aboveground biomass of vegetation based on Landsat 8 OLI images. Heliyon. 2022;8:e11099.

  29. Martina C, Daniele C, Giovanni C, Luca B, Nicolò P, Dario P, Laura M, Antonio V, Luigi D, Pietro MG. Improved estimation of herbaceous crop aboveground biomass using UAV-derived crop height combined with vegetation indices. Precision Agric. 2022;24:587–606.

  30. Liu Y, Fan Y, Feng H, Chen R, Bian M, Ma Y, Yue J, Yang G. Estimating potato above-ground biomass based on vegetation indices and texture features constructed from sensitive bands of UAV hyperspectral imagery. Comput Electron Agric. 2024;220:108918.

  31. Zhang Y, Liao B, Li F, Eneji AE, Du M, Tian X. Growth, leaf anatomy, and photosynthesis of cotton (Gossypium hirsutum L.) seedlings in response to four light-emitting diodes and high pressure sodium lamp. J Cotton Res. 2024;7:8.

  32. Sai S, Kumar S, Gaur A, Goyal S, Chamola V, Hussain A. Unleashing the power of generative AI in agriculture 4.0 for smart and sustainable farming. Cogn Comput. 2025;17:63.

  33. Di Mauro G, Cipriotti PA, Gallo S, Rotundo JL. Environmental and management variables explain soybean yield gap variability in central Argentina. Eur J Agron. 2018;99:186–94.

  34. Krupnik TJ, Ahmed ZU, Timsina J, Yasmin S, Hossain F, Al Mamun A, Mridha AI, McDonald AJ. Untangling crop management and environmental influences on wheat yield variability in Bangladesh: an application of non-parametric approaches. Agric Syst. 2015;139:166–79.

  35. Nayak HS, Silva JV, Parihar CM, Krupnik TJ, Sena DR, Kakraliya SK, Jat HS, Sidhu HS, Sharma PC, Jat ML, Sapkota TB. Interpretable machine learning methods to explain on-farm yield variability of high productivity wheat in Northwest India. Field Crops Res. 2022;287:108640.

  36. Mirzabozorg SAAS, Abedi M, Yousefi M. Enhancing training performance of convolutional neural network algorithm through an autoencoder-based unsupervised labeling framework for mineral exploration targeting. Geochemistry. 2024;84:126197.

  37. Khaing KK, Htwe NA, Lewis A. A comparative analysis of convolutional neural networks (CNN) and long short-term memory (LSTM) networks for forecasting stock prices over a one-week horizon at Yangon stock exchange. Genetic Evolutionary Computing. 2025;1321:463–72.

  38. Li W, Luo Y, Tang C, Zhang K, Ma X. Boosted fuzzy granular regression trees. Math Probl Eng. 2021;2021:9958427.

  39. Xin T, Limin D, Tingxi L, Zhenlei Y, Yixuan W, Singh Vijay P. Estimation of grassland aboveground biomass combining optimal derivative and raw reflectance vegetation indices at peak productive growth stage. Geocarto Int. 2023;38:1–27.

  40. Lv X. Estimation of cotton leaf area index (LAI) based on spectral transformation and vegetation index. Remote Sens. 2021;14:136.

  41. Botero-Valencia J, García-Pineda V, Valencia-Arias A, Valencia J, Reyes-Vera E, Mejia-Herrera M, Hernández-García R. Machine learning in sustainable agriculture: systematic review and research perspectives. Agriculture. 2025;15:377.

  42. Oumer MA, Burton M, Kassie M. Dynamics of multiple sustainable agricultural intensification practices adoption: application of the intertemporal multivariate probit model. PLoS ONE. 2025;20:e0314172.

Acknowledgements

We are grateful to the National Aksu Farmland Ecosystem Field Scientific Observation and Research Station of the Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, for supporting this study.

Funding

This research was supported by a project of the Chinese Academy of Sciences (GJ05040103) and the Scientific and Technological Plan Project of Xinjiang Production and Construction Corps Alar City (2022XX001).

Author information

Contributions

Conceptualization, J.W.; investigation, S.Q. and F.L.; methodology, S.Q.; formal analysis, J.S.; resources, C.C.; writing-original draft, S.Q.; writing-review and editing, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jiaqiang Wang.

Ethics declarations

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals performed by any of the authors.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Qiao, S., Wang, J., Li, F. et al. Spectral estimation of the aboveground biomass of cotton under water–nitrogen coupling conditions. Plant Methods 21, 34 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13007-025-01358-9

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13007-025-01358-9

Keywords