A cotton organ segmentation method with phenotypic measurements from a point cloud using a transformer
Plant Methods volume 21, Article number: 37 (2025)
Abstract
Cotton phenomics plays a crucial role in understanding and managing the growth and development of cotton plants. The segmentation of point clouds, a process that underpins the measurement of plant organ structures through 3D point clouds, is necessary for obtaining precise phenotypic parameters. This study proposes a cotton point cloud organ semantic segmentation method named TPointNetPlus, which combines the PointNet++ and Transformer algorithms. Firstly, a dedicated point cloud dataset for cotton plants is constructed using multi-view images. Secondly, the Transformer attention module is introduced into the PointNet++ model to increase the accuracy of feature extraction. Finally, organ-level cotton plant point cloud segmentation is performed using the HDBSCAN algorithm, successfully segmenting cotton leaves, bolls, and branches from the entire plant and obtaining their phenotypic feature parameters. The research results indicate that the TPointNetPlus model achieved a high accuracy of 98.39% in leaf semantic segmentation. The correlation coefficients between the measured and predicted values of three phenotypic parameters (plant height, leaf area, and boll volume) ranged from 0.95 to 0.97, demonstrating the accurate predictive capability of the model for these key traits. The proposed method, which enables automated data analysis from a plant's 3D point cloud to phenotypic parameters, provides a reliable reference for in-depth studies of plant phenotypes.
Introduction
The climate and soil conditions in Xinjiang provide an ideal environment for the growth of cotton, making it one of the most significant cotton-producing regions in China [1]. The prosperity of the cotton industry is directly linked to the livelihoods of local farmers and the economic development of the region [2]. Research on cotton phenotypes allows for a more accurate understanding of the plant's physiological status, adaptability, and response to environmental changes [3]. Such research is crucial for increasing cotton yield, improving quality, and cultivating more resilient varieties. Precise measurements of various cotton traits, such as plant height, leaf morphology, and boll size, enable a better assessment of the growth conditions of plants under different varieties or treatments, providing targeted recommendations for breeding and cultivation [4]. However, traditional methods often rely on manual measurements and observations, leading to subjectivity and a high workload [5]. Two-dimensional image measurement methods, including pixel-based analysis, feature extraction and matching, geometric shape fitting, and visual measurement techniques, may encounter challenges related to image quality, deformation, feature extraction stability, scale, and computational complexity [6].
The use of three-dimensional point cloud technology for obtaining plant organ parameters offers several advantages, including precise structural information, non-invasive measurement, comprehensive data retrieval, adaptability to complex environments, and automation for efficiency. This method serves as a valuable tool for plant phenotyping research and agricultural production [7,8,9]. Various technologies used for point cloud acquisition include laser scanning (Lidar), structured light, time-of-flight (ToF) cameras, stereo vision, multi-view photogrammetry, panoramic photography, and sonar scanning. These technologies utilize sensors like lasers, light patterns, cameras, and sound waves to measure surface attributes and generate three-dimensional coordinate data in the form of point clouds [10,11,12]. The generation of such data has facilitated the use of deep learning for point cloud segmentation [13, 14].
Plant point cloud segmentation and measurement technology based on deep learning combines deep learning algorithms with point cloud processing, enabling accurate segmentation of plant structures and precise measurement of phenotypic parameters. By utilizing deep learning algorithms such as convolutional neural networks, this technology can learn and extract complex plant features, providing robust support for point cloud segmentation and feature extraction. Techniques such as PointNet++ have been applied to plant point cloud segmentation, demonstrating their potential in identifying and analyzing different plant parts, such as leaves, stems, and fruits [15,16,17]. By segmenting plant point clouds into these parts, researchers can finely measure and analyze the phenotypic parameters of each part, such as plant height and leaf area [18, 19]. This technology has found widespread applications in agricultural research, plant science, and agricultural production, offering a new approach to understanding plant growth, adaptability, and environmental response [20]. The study provides a scientific basis for breeding and agricultural management.
However, despite the significant advancements made by PointNet++ in local feature extraction, its application in plant point cloud segmentation and measurement still has limitations. Specifically, PointNet++ may not fully capture all the fine local features when dealing with the complex structures and diverse morphological characteristics of plants. Its hierarchical structure has limitations in capturing large-scale and long-distance dependencies, potentially leading to information loss. Additionally, PointNet++ shows insufficient robustness to noise and occlusions commonly encountered in practical agricultural scenarios, affecting the accuracy of segmentation and measurement.
This study presents an effective approach for addressing the difficulties in extracting plant phenotypes from 3D point clouds. This approach comprises acquiring plant point clouds, creating appropriate datasets, and partitioning plant point clouds. Firstly, a dedicated point cloud dataset for cotton plants is constructed using multi-view images. Secondly, the Transformer attention module is introduced into the PointNet++ model to increase the accuracy of feature extraction. Finally, organ-level cotton plant point cloud segmentation is performed using the HDBSCAN algorithm, successfully segmenting cotton leaves, bolls, and branches from the entire plant and obtaining their phenotypic feature parameters. Our primary contributions can be outlined as follows:
- We constructed a high-precision, dense point cloud dataset of cotton plants using Structure from Motion (SfM) 3D structural reconstruction. The dataset consists of over 724 high-quality partial and complete point clouds, documenting the growth of the plants over time. Each cotton plant is represented by a 3D point cloud containing 40,960 points.
- We integrated the Transformer module into the PointNet++ network to improve the accuracy of cotton plant point cloud organ segmentation.
- For the leaves, bolls, and branches obtained after instance segmentation, the correlation coefficients between the values computed from the point cloud and the measured true values are greater than 0.9, so the method can be applied in actual production.
Related works
Plant point cloud segmentation and measurement
Despite the significant progress of deep learning-based plant point cloud segmentation and measurement technology in plant phenotyping research, some challenges still exist. Meng et al. designed Vv-Net [21] to voxelize point clouds. Since transforming point clouds into images or voxels cannot effectively utilize the spatial features of point clouds and increases the data processing cost and structured noise to varying degrees, Xu et al. [22] constructed a convolution kernel with a dynamic convolutional weight matrix and proposed position adaptive convolution (PAConv), in which the weight coefficients of the matrix are adaptively learned by a score network from the relative positional relationships of the points. To address the problem that point clouds must first be transformed into images or voxels when 2D or 3D convolution is used for point cloud semantic segmentation, in addition to designing point convolutions that take point clouds as input, GNNs can be constructed to build a special graph structure over the point clouds [23,24,25]; graph convolution can then be used to explore the neighboring information of each point, better utilizing the spatial features of the point clouds and improving the segmentation accuracy. The dynamic graph convolutional neural network (DGCNN) designed by Wang et al. [23] takes the input N points as centers, computes the respective K nearest neighbors layer by layer to dynamically construct the local neighborhood graph, and then uses edge convolution to compute the edge features between each center point and its nearest neighbors. However, the fixed size of the edge features prevents the model from performing well at different scales and with different numbers of input points. Therefore, we constructed a point cloud semantic segmentation network based on the deep learning network PointNet++ to address the current problem of insufficient accuracy in plant organ segmentation.
Transformer
In the field of computer vision, the transformer model processes global information mainly through an attention mechanism; the core idea is to compute an output representation for each location by attending to different parts of the input sequence [26,27,28,29,30]. DGANet [31] further differentiates each edge of the constructed local graph by integrating a dilated graph attention module implemented with an offset attention mechanism to better learn the edge features. PAN [32] is based on a novel local attention edge convolution layer and a point-by-point spatial attention module. Although the attention mechanism allows the model to filter and learn the most important information [33, 34], researchers often need to expend much effort designing special attention modules for different tasks, such as channel attention and spatial attention [35], and the computational complexity of these attention modules varies, which hinders parallel computation. By introducing the multi-head attention mechanism, the transformer can focus on different aspects of the input sequence at the same time, thus capturing global information effectively. In addition, the transformer model can be strengthened by stacking multiple transformer layers to further enhance its modeling capabilities. This paper introduces the integration of the Transformer module into the PointNet++ network, resulting in a novel point cloud transformer structure tailored to enhance segmentation accuracy for 3D point clouds within the PointNet++ network.
The self-attention mechanism of Transformer is able to capture global features and long-range dependencies in point cloud data more effectively. This enhances the ability to understand and capture details of complex plant structures, especially when dealing with subtle and complex features. Additionally, Transformer exhibits greater robustness in the face of noise and occlusion, improving the overall understanding and segmentation accuracy of point cloud data through information balancing and optimization on a global scale. Therefore, adding Transformer to the PointNet++ network encoder can improve the accuracy of cotton plant organ segmentation.
Materials and methods
Overview
The proposed method for cotton organ segmentation and measurement is illustrated in the flowchart in Fig. 1, presenting a comprehensive framework. This method is divided into four fundamental components: image data acquisition, generation of a cotton plant point cloud, segmentation of the cotton organ point cloud, and extraction of organ phenotypic parameters.
The flowchart of our method. Firstly, image data are acquired using advanced imaging techniques to capture detailed information about the cotton plant. These data are then used to generate a point cloud that represents the three-dimensional structure of the cotton plant. The next step involves segmenting the point cloud to isolate individual cotton organs. Finally, phenotypic parameters of these organs are extracted.
Cotton3D dataset
Data acquisition
The Xinjiang cotton experiment was conducted in 2021 at the East District of Tarim University in Alar city, located in the Aral Reclamation Area of Xinjiang. This area is situated at the northern edge of the Taklamakan Desert and at the confluence of the Aksu, Hotan, and Yarkant Rivers in the upper reaches of the Tarim River.
In this experiment, the cotton plants at later growth stages were tall, with a mean plant height of approximately 1.1 m and a mean spread of 50-60 cm between the tips of the longest leaves on either side of the plant at the same height. The cotton plant was placed on a table at the centre of the scene. Due to the lack of light in the laboratory, three additional photographic lights were added to supplement the lighting, placed 120 degrees apart horizontally and positioned 2 m above the ground. The actual shooting environment is shown in Fig. 2. Because the plants were too large and difficult to fix in place, a motorized turntable was unsuitable for rotational shooting: the resulting shaking would produce a large amount of noise or cause image alignment to fail. Therefore, the images in this experiment were acquired purely by hand. When photographing the cotton plants, the distance between the camera and the cotton plant was approximately 30 cm. Pictures were taken at upward, level, and overhead shooting angles, and one picture was taken at the same height at intervals of approximately 6 degrees, depending on the visual angle of the camera. Approximately 60 pictures were taken at each shooting angle, totaling approximately 180 pictures per cotton plant.
Point cloud preprocessing
3D reconstruction of cotton plants Utilizing RealityCapture's Structure-from-Motion (SfM) technology, the same target object is photographed multiple times from different angles. The photos are then filtered to exclude blurred, out-of-focus, and dissimilar photos to avoid compromising the accuracy of the reconstruction. Next, the images are imported into the software, and a dense reconstruction algorithm generates the spatial point cloud. Multiple stereo matching algorithms are then used to reconstruct the images. Once the reconstruction is complete, camera calibration parameters are applied to eliminate lens aberrations, outliers are removed automatically, and the point cloud is concatenated to form a triangular mesh file for post-measurement.
Point cloud normalization Point cloud normalization is the process of scaling point cloud data to a specified range along three coordinate axes \(\left(X, Y, Z\right)\). Typically, point cloud normalization scales the point cloud into a unit cube with the center point of the cube as the origin \(\left(0, 0, 0\right)\) and the length of its side as 1. The purpose of point cloud normalization is to map different sizes of point cloud data into the same scale space, which is beneficial for subsequent data processing and analysis.
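To make the normalization step concrete, the following minimal NumPy sketch maps a point cloud into a unit cube centred at the origin, as described above; the function name and the synthetic example cloud are only illustrative and are not taken from the authors' code.

```python
import numpy as np

def normalize_to_unit_cube(points: np.ndarray) -> np.ndarray:
    """Scale an (N, 3) point cloud into a unit cube centred at the origin.

    After normalization, every coordinate lies within [-0.5, 0.5].
    """
    bbox_min = points.min(axis=0)
    bbox_max = points.max(axis=0)
    center = (bbox_min + bbox_max) / 2.0      # bounding-box centre becomes (0, 0, 0)
    side = (bbox_max - bbox_min).max()        # longest axis-aligned extent becomes side length 1
    return (points - center) / side

# Example: a synthetic cloud roughly the size of a cotton plant (metres)
cloud = np.random.rand(40960, 3) * np.array([0.6, 0.6, 1.1])
unit_cloud = normalize_to_unit_cube(cloud)
print(unit_cloud.min(axis=0), unit_cloud.max(axis=0))   # all values within [-0.5, 0.5]
```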
Data labels The data labels for the 3D reconstruction of cotton are key information used to identify and describe the point cloud data, which helps to distinguish between different objects, parts, or features. These labels can include object labels, part labels, and feature labels. In this paper, when CloudCompare software was used to label the reconstructed 3D point cloud data of cotton plants, the following steps were taken: the point cloud data file of a cotton plant was opened, and the leaves, bolls, and branches were classified and labeled as required. All points in the point cloud of leaf organs were labeled 1, those in the point cloud of boll organs were labeled 2, and those in the point cloud of branch organs were labeled 3, with the remaining points given a value of 0. In CloudCompare, the point cloud data are labeled interactively through the view interface.
Cotton3D dataset
The Cotton3D dataset is a comprehensive collection of 3D point cloud data that captures the growth cycle of cotton plants. The original cotton plant point cloud data were acquired using a camera and SfM techniques, so the dataset exhibits greater morphological irregularities and discontinuities in surface features. The dataset consists of more than 724 high-quality partial and complete point clouds detailing plant growth over time. Each cotton plant is represented by a three-dimensional point cloud containing 40,960 points. The dataset was further expanded with data augmentation methods such as random rotation (angle_sigma = 0.05, angle_clip = 0.1), random noise (sigma = 0.01, clip = 0.05), point-order shuffling, and random scaling (scale_low = 0.95, scale_high = 1.05). The data with better plant integrity were finally selected and fed into the network to complete the segmentation task. The Cotton3D dataset used for the segmentation task is shown in Table 1.
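As an illustration of the augmentation settings listed above (angle_sigma, angle_clip, sigma, clip, scale_low, scale_high), the following NumPy sketch mirrors commonly used PointNet++-style augmentation routines; the function names are ours, and the authors' exact augmentation code may differ.

```python
import numpy as np

def rotate_perturbation(points, angle_sigma=0.05, angle_clip=0.1):
    """Apply small random rotations about the x, y, and z axes (angles in radians)."""
    angles = np.clip(angle_sigma * np.random.randn(3), -angle_clip, angle_clip)
    cx, cy, cz = np.cos(angles)
    sx, sy, sz = np.sin(angles)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return points @ (Rz @ Ry @ Rx).T

def jitter(points, sigma=0.01, clip=0.05):
    """Add clipped Gaussian noise to every point."""
    return points + np.clip(sigma * np.random.randn(*points.shape), -clip, clip)

def random_scale(points, scale_low=0.95, scale_high=1.05):
    """Apply an isotropic random scaling factor."""
    return points * np.random.uniform(scale_low, scale_high)

def shuffle_points(points, labels):
    """Shuffle the point order; the per-point labels are permuted identically."""
    order = np.random.permutation(len(points))
    return points[order], labels[order]
```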
Transformer
Transformer in point cloud processing captures global relationships through its self-attention mechanism, thereby enhancing the richness and accuracy of feature representation. The transformer architecture [36] is as follows (Fig. 3): First, the point cloud data with input dimensions \(\left\{N, K, d+C\right\}\), where \(N\) is the number of points, \(K\) is the number of neighbors per point, and \(d+C\) is the feature dimension, passes through a Multi-Layer Perceptron (MLP) to extract initial features, denoted as \(\begin{array}{ccc}{q}_{j}^{i-1},& {k}_{j}^{i-1},& {v}_{j}^{i-1}\end{array}=MLP\left(X\right)\), where \(X\in {\mathbb{R}}^{N\times K\times (d+C)}\). Next, position encoding \(P\) is added to the initial features to incorporate positional information, resulting in \(F={v}_{j}^{i-1}+P\). The processed features then undergo a series of element-wise operations: element-wise subtraction \(S={q}_{j}^{i-1}-{k}_{j}^{i-1}\), element-wise addition \(A=S+P\), and element-wise multiplication \(M=A\odot F\), where \(\odot\) denotes element-wise multiplication. Finally, the results of these operations are merged and processed by another MLP to produce the final output \({F}_{out} = MLP(M)\), which has the same dimensions as the input \(\left\{N, K, d+C\right\}\). This architecture enhances the feature representation of the point cloud, enabling the Transformer to more effectively process point cloud data and improve processing performance.
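The sequence of operations described above can be sketched as a small PyTorch module; the class name and hidden width are our own choices, and, following the description literally, no softmax normalization is applied to the attention weights (a standard Point Transformer-style layer would typically normalize them).

```python
import torch
import torch.nn as nn

class PointAttentionBlock(nn.Module):
    """Minimal sketch of the described attention block for grouped features {N, K, d+C}."""

    def __init__(self, in_dim: int, hid_dim: int = 64):
        super().__init__()
        self.to_qkv = nn.Linear(in_dim, 3 * hid_dim)            # q, k, v from one MLP
        self.pos_mlp = nn.Sequential(                           # position encoding P from xyz offsets
            nn.Linear(3, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim))
        self.out_mlp = nn.Sequential(                           # final MLP back to the input width
            nn.Linear(hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, in_dim))

    def forward(self, feats: torch.Tensor, rel_xyz: torch.Tensor) -> torch.Tensor:
        # feats: (N, K, d+C) grouped neighbour features; rel_xyz: (N, K, 3) neighbour offsets
        q, k, v = self.to_qkv(feats).chunk(3, dim=-1)   # q, k, v = MLP(X)
        p = self.pos_mlp(rel_xyz)                       # position encoding P
        f = v + p                                       # F = v + P
        s = q - k                                       # element-wise subtraction S
        a = s + p                                       # element-wise addition A = S + P
        m = a * f                                       # element-wise multiplication M = A ⊙ F
        return self.out_mlp(m)                          # F_out = MLP(M), same shape as the input

# Example shapes: 1024 sampled centres, 32 neighbours each, 64-dimensional features
block = PointAttentionBlock(in_dim=64)
out = block(torch.randn(1024, 32, 64), torch.randn(1024, 32, 3))
print(out.shape)   # torch.Size([1024, 32, 64])
```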
The advantages of the transformer model include its parallel processing capability and its ability to model long sequences. Moreover, this approach can better capture the semantic information in the input sequence because the self-attention mechanism can attend to all positions in the sequence at the same time. However, compared with that of traditional RNN and CNN models, the computational complexity of the transformer model is greater, requiring a large amount of computational resources and training data. In recent years, the visual transformer has become one of the mainstream models in the field of computer vision, and its application scenarios and model structure have been continuously extended and improved. For example, models such as DETR [30] and ViT [37] are improved versions of the visual transformer that have undergone many optimizations for 2D vision tasks. Inputting all the pixels of a 2D image into the fully connected layer for simultaneous computation is unrealistic, as it greatly exceeds the capacity of the fully connected layer. In ViT, the network first slices the input image into multiple 16 × 16 patches, and the patches are then used as the smallest units input to the multi-head attention mechanism, which significantly reduces the computational difficulty and achieves excellent performance.
Instance segmentation of point clouds
Semantic segmentation
The improved PointNet++ network is used for semantic segmentation of cotton plant point clouds. PointNet++ [38] achieves better descriptions of local and overall features than PointNet does. PointNet fundamentally learns the information of each point to obtain a spatial encoding, after which the features of all the points are aggregated to form the point cloud's global features. PointNet, however, is unable to acquire structural information between the points of the point cloud or more suitable local features because of its network structure. Therefore, constructing local features of the point cloud is a crucial part of the network design, as only when the neural network is provided with sufficient receptive fields can it acquire better features. PointNet++ uses the spatial information between points as a criterion for dividing the point cloud. This method divides the point cloud into overlapping local regions and then spatially encodes the points in each local region to obtain the local features of each region. The design of local features takes into account the structural and geometric information of the point cloud and can improve the semantic segmentation of the point cloud. In this study, the incorporation of the Transformer into the feature extraction component of PointNet++ addresses the limitations of the original model and compensates for the lack of localized features and the limited ability to capture global relationships in point cloud data, as shown in Fig. 4.
The network structure of TPointNetPlus semantic segmentation. The left part is the coding part of the network, which gradually downsamples the point cloud and mainly realizes the local feature space coding of the point cloud, while the right half is the decoding part of the network, which upsamples each point in the point cloud and gradually restores the structure to the original point cloud, and simultaneously realizes the feature information aggregation of each point in the point cloud. Finally, each point is semantically discriminated by the fully connected layer
In the sampling process, the Euclidean distance between points is used as the distance metric; for points \({p}_{i}\left({x}_{i}, {y}_{i}, {z}_{i}\right)\) and \({p}_{j}\left({x}_{j}, {y}_{j}, {z}_{j}\right)\) it is calculated as in Eq. (1):
$$d\left({p}_{i},{p}_{j}\right)=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{2}+{\left({y}_{i}-{y}_{j}\right)}^{2}+{\left({z}_{i}-{z}_{j}\right)}^{2}} \tag{1}$$
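PointNet++'s sampling layer typically selects its centroids with farthest point sampling under this Euclidean metric; the following NumPy sketch illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Select n_samples indices of points that are mutually far apart (Euclidean distance)."""
    n = len(points)
    selected = np.zeros(n_samples, dtype=np.int64)
    dist_to_set = np.full(n, np.inf)                    # distance of every point to the chosen set
    selected[0] = np.random.randint(n)                  # random first centroid
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist_to_set = np.minimum(dist_to_set, d)        # update distances to the chosen set
        selected[i] = int(dist_to_set.argmax())         # pick the farthest remaining point
    return selected
```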
The PointNet layer, on the other hand, utilizes a simple PointNet structure to form a local spatial feature extraction module. The function of the underlying PointNet network is to map an unordered set of point clouds \(\left\{{x}_{1}, {x}_{2}, \cdots , {x}_{n}\right\}\) onto a single vector using a function of the following form (Eq. (2)):
$$f\left({x}_{1}, {x}_{2}, \cdots , {x}_{n}\right)=\gamma \left(\underset{i=1,\dots ,n}{\mathit{MAX}}\left\{h\left({x}_{i}\right)\right\}\right) \tag{2}$$
where the networks \(\gamma\) and \(h\) usually behave as multi-layer perceptrons, and \(h\) corresponds to the encoding of the local spatial information of the point cloud. Through the PointNet layer, the information of the points in the local space is finally aggregated into a one-dimensional vector.
The realization process involves performing point feature propagation through distance-based interpolation and aggregating the features of the corresponding points in the corresponding coding layer through cross-layer skip link concatenation. When carrying out point feature propagation from the \({N}_{l}\) layer to the \({N}_{l-1}\) layer, assuming that we want to obtain the features of point \(A\) in the \({N}_{l-1}\) layer, we first use KNN interpolation to find the three nearest points of point \(A\) in the \({N}_{l}\) layer and carry out an inverse-distance weighted summation of their features, with \(p=2\) and \(k=3\) in Eq. (3). The interpolated features are then combined, through cross-layer skip link concatenation, with the features of the corresponding point set from the SA (set abstraction) layer obtained during encoding, and the concatenated features are aggregated through a single PointNet layer structure. These feature-aggregating upsampling steps are repeated until the original point cloud size is restored.
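The inverse-distance-weighted interpolation referred to by Eq. (3) (with \(p=2\) and \(k=3\)) can be sketched as follows; this is a hedged illustration of the feature propagation step only, with the subsequent skip-link concatenation and PointNet layer omitted.

```python
import torch

def interpolate_features(xyz_query, xyz_known, feats_known, k=3, p=2, eps=1e-8):
    """Inverse-distance-weighted feature propagation (k nearest points, power p).

    xyz_query:   (M, 3) points of the denser layer N_{l-1}
    xyz_known:   (S, 3) points of the sparser layer N_l
    feats_known: (S, C) features attached to the sparser layer
    returns:     (M, C) interpolated features for the denser layer
    """
    dists = torch.cdist(xyz_query, xyz_known)             # (M, S) pairwise Euclidean distances
    knn_d, knn_i = dists.topk(k, dim=1, largest=False)    # the k nearest known points per query
    w = 1.0 / (knn_d ** p + eps)                          # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)                    # normalize the weights
    return (feats_known[knn_i] * w.unsqueeze(-1)).sum(dim=1)
```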
Clustering algorithm
After semantic recognition using TPointNetPlus, this study utilized the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) [39] algorithm to achieve organ-level instance segmentation of cotton (Fig. 5). HDBSCAN is a clustering method that combines DBSCAN with hierarchical clustering, both of which are well known. It is a density-based spatial clustering technique that determines the mutual reachability distances between neighboring points and core points to construct a mutual reachability graph. The final step is to apply hierarchical clustering and cluster-tree compression to form the clusters.
HDBSCAN does not require users to predetermine the number of clusters or the distance threshold around cluster points. The reachability distance design can handle clusters with different densities, and the hierarchical structure constructed from density-based clustering facilitates more efficient extraction of discontinuities. The mutual reachability distance between two points is defined as Eq. (4):
$${d}_{mreach-k}\left(p,q\right)=\mathit{max}\left\{{c}_{k}\left(p\right), {c}_{k}\left(q\right), d\left(p,q\right)\right\} \tag{4}$$
where \(d\left(p,q\right)\) denotes the distance between points \(p\) and \(q\), and the core distance \({c}_{k}\left(p\right)=d\left(p,{N}^{k}\left(p\right)\right)\) denotes the distance between core point \(p\) and its kth nearest neighboring point.
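A minimal sketch of this instance-segmentation step with the open-source hdbscan package is shown below; the min_cluster_size value is a hypothetical choice that would need tuning for the Cotton3D point clouds.

```python
import numpy as np
import hdbscan  # open-source implementation of the HDBSCAN algorithm

def split_instances(organ_points: np.ndarray, min_cluster_size: int = 200):
    """Cluster the points of one semantic class (e.g. all leaf points) into instances.

    Returns per-point instance labels (-1 marks noise) and the number of instances;
    for the leaf class, the instance count corresponds to the estimated leaf number.
    """
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(organ_points)      # (N,) instance ids
    n_instances = int(labels.max()) + 1 if len(labels) else 0
    return labels, n_instances

# leaf_points: (N, 3) xyz of all points that TPointNetPlus labelled as "leaf"
# instance_labels, n_leaves = split_instances(leaf_points)
```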
Phenotype parameter extraction
Cotton leaf area The area of a leaf is often calculated by converting the point cloud data into a mesh through triangulation. Let there be a point cloud with \(N\) points, each represented by coordinates \(\left({x}_{i},{y}_{i},{z}_{i}\right)\) for \(i=1,2,...,N\), and let \(M\) be the number of triangles in the mesh. For each triangle \(j\), defined by three vertices \(\left({x}_{j1},{y}_{j1},{z}_{j1}\right),\left({x}_{j2},{y}_{j2},{z}_{j2}\right),\left({x}_{j3},{y}_{j3},{z}_{j3}\right)\), the lengths of the three sides are computed as \({a}_{j}=||{P}_{j2}-{P}_{j1}||, {b}_{j}=||{P}_{j3}-{P}_{j2}||, {c}_{j}=||{P}_{j1}-{P}_{j3}||\), the semi-perimeter is calculated as \({s}_{j}=\frac{{a}_{j}+{b}_{j}+{c}_{j}}{2}\), Heron's formula is applied to compute the triangle area \({A}_{j}=\sqrt{{s}_{j}\cdot \left({s}_{j}-{a}_{j}\right)\cdot \left({s}_{j}-{b}_{j}\right)\cdot \left({s}_{j}-{c}_{j}\right)}\), and the areas of all the triangles are summed to obtain the total surface area \(S={\sum }_{j=1}^{M}{A}_{j}\).
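This mesh-area computation can be written directly in NumPy; the sketch below assumes the leaf has already been triangulated (e.g. by a surface-reconstruction tool) into a vertex array and a triangle-index array.

```python
import numpy as np

def mesh_surface_area(vertices: np.ndarray, triangles: np.ndarray) -> float:
    """Total surface area of a triangle mesh via Heron's formula.

    vertices:  (N, 3) point coordinates
    triangles: (M, 3) integer indices of the three vertices of each triangle
    """
    p1, p2, p3 = (vertices[triangles[:, i]] for i in range(3))
    a = np.linalg.norm(p2 - p1, axis=1)                  # side lengths a_j, b_j, c_j
    b = np.linalg.norm(p3 - p2, axis=1)
    c = np.linalg.norm(p1 - p3, axis=1)
    s = (a + b + c) / 2.0                                # semi-perimeter s_j
    areas = np.sqrt(np.maximum(s * (s - a) * (s - b) * (s - c), 0.0))  # Heron's formula
    return float(areas.sum())                            # S = sum of all triangle areas
```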
Cotton plant height The \(x\), \(y\), and \(z\) coordinates of all the points in the segmented cotton plant stem were extracted, normalized, and processed to obtain a normalized coordinate matrix. The elements of each row in the matrix were summed, compressed into a single column, and the square root of the sum was taken to obtain the centroid of the organ. The distance from the organ's centroid to the vertex was then calculated, allowing the edge length of the reference point cloud to be deduced. This edge length was subsequently used to determine the scale factor. By multiplying the differences between the maximum and minimum values in the \(x\), \(y\), and \(z\) directions by the scale factor, estimated values for various organs of the cotton plant were obtained.
where \({H}_{stem}\) is the estimated value of the cotton organ, \({F}_{reference}\) is the actual value of the reference, \({E}_{reference}\) is the projected value of the reference, and \({F}_{stem}\) is the actual value of the cotton organ. \(S\) is the proportionality adjustment factor, which was set to 0.94 based on experimental verification.
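One plausible reading of this scaling procedure is sketched below; the function name and the exact way the reference ratio is combined with the adjustment factor \(S\) are our assumptions, since the full equation is not reproduced here.

```python
import numpy as np

def estimate_organ_size(points, f_reference, e_reference, s=0.94):
    """Convert the bounding-box extent of a segmented organ into real-world units.

    points:      (N, 3) normalized point cloud of the segmented organ
    f_reference: measured (actual) size of a reference object
    e_reference: size of the same reference computed from the point cloud
    s:           proportionality adjustment factor (0.94 in this study)
    """
    scale = (f_reference / e_reference) * s           # point-cloud units -> real units (assumed form)
    extent = points.max(axis=0) - points.min(axis=0)  # max - min along x, y, z
    return extent * scale                             # e.g. the z-extent approximates plant height
```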
Boll volumes To compute the volume of a point cloud, a common method involves voxelization. In this process, the continuous space occupied by the point cloud is discretized into small cubic elements known as voxels. Let there be \(N\) points in a point cloud, with coordinates \(\left({x}_{i}, {y}_{i}, {z}_{i}\right)\), where \(i=1,2,\dots ,N\). The voxel size is denoted as \(\Delta x \times \Delta y \times \Delta z\). By mapping each point to discrete voxel coordinates \(\left({v}_{x}, {v}_{y}, {v}_{z}\right)\), calculated as \(v_{x} = \left\lfloor {\frac{{x_{i} }}{\Delta x}} \right\rfloor ,\;v_{y} = \left\lfloor {\frac{{y_{i} }}{\Delta y}} \right\rfloor ,\;v_{z} = \left\lfloor {\frac{{z_{i} }}{\Delta z}} \right\rfloor\), the point cloud is voxelized. The volume of each voxel is \(\Delta x \times \Delta y \times \Delta z\), and the total volume is obtained by multiplying the number of occupied voxels by the volume of each voxel. This voxel-based approach provides an estimate of the point cloud volume, offering a discrete representation suitable for computational analysis.
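A compact NumPy sketch of this voxel counting is given below; the voxel edge length is an illustrative choice, not a value reported in the paper.

```python
import numpy as np

def voxel_volume(points: np.ndarray, voxel_size: float = 0.005) -> float:
    """Estimate the volume of a point cloud by counting the occupied cubic voxels.

    points:     (N, 3) xyz coordinates of a segmented cotton boll
    voxel_size: edge length of the cubic voxels (same unit as the coordinates)
    """
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)  # (v_x, v_y, v_z) of each point
    n_occupied = len(np.unique(voxel_ids, axis=0))              # each occupied voxel counted once
    return n_occupied * voxel_size ** 3                         # total volume

# volume = voxel_volume(boll_points, voxel_size=0.005)  # 5 mm voxels (hypothetical setting)
```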
Evaluation metric
Evaluation metrics for point cloud segmentation
The evaluation metrics for the cotton organ point cloud segmentation results include the accuracy, the intersection over union (IoU), and the cross-entropy loss function. In calculating the accuracy, the predicted values (\({p}_{i}\)) generated by the TPointNetPlus model were compared with the actual labels (\({y}_{i}\)) of the point cloud to evaluate the degree of consistency between them. The IoU metric evaluates the degree of overlap between the predicted label set and the actual label set, whereas the cross-entropy loss function quantifies the difference between the predicted and actual values [40]. To address the category imbalance problem, a weighted cross-entropy loss function was used to assign a weight (\({w}_{i}\)) to each category based on the number of points in each category. This weighting mechanism prioritizes categories with fewer points, ensuring a fair assessment of the model's performance across all categories [41]. In addition to the TPointNetPlus module itself, factors such as the architecture of the network model, the quality and diversity of the dataset, and the design of the training process also affect the accuracy of the model.
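In PyTorch, the weighted cross-entropy described above can be set up as follows; the per-class point counts are hypothetical and only illustrate the inverse-frequency weighting idea.

```python
import torch
import torch.nn as nn

# hypothetical numbers of labelled points per class: other, leaf, boll, branch
counts = torch.tensor([120_000.0, 450_000.0, 60_000.0, 90_000.0])
weights = counts.sum() / (len(counts) * counts)   # rarer classes receive larger weights

criterion = nn.CrossEntropyLoss(weight=weights)
# logits: (B * N, 4) per-point class scores; labels: (B * N,) ground-truth ids in {0, 1, 2, 3}
logits = torch.randn(8 * 40960, 4)
labels = torch.randint(0, 4, (8 * 40960,))
loss = criterion(logits, labels)
print(loss.item())
```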
Evaluation metrics for measured cotton organs
The true values of the measured cotton organ lengths, widths, and heights were compared with the model-calculated estimates of the cotton organs, and accuracy was assessed by the correlation coefficient (\(R\)), root mean square error (\(RMSE\)), and margin of error (\(\delta\)). The predicted values of the cotton plant organ phenotypic data, \(\widehat{\text{Y}}:\left\{{\widehat{\text{Y}}}_{1},{\widehat{\text{Y}}}_{2},\cdots ,{\widehat{\text{Y}}}_{\text{n}}\right\}\), and the true values of the cotton plant organ phenotypic data, \(\text{Y}:\left\{{\text{Y}}_{1},{\text{Y}}_{2},\cdots ,{\text{Y}}_{\text{n}}\right\}\), were obtained. The correlation coefficient is calculated as shown in Eqs. 10–13:
$$\text{Cov}\left(\widehat{Y},Y\right)=\text{E}\left[\left(\widehat{Y}-\text{E}\left(\widehat{Y}\right)\right)\left(Y-\text{E}\left(Y\right)\right)\right] \tag{10}$$
$${\sigma }_{\widehat{Y}}=\sqrt{\text{E}\left[{\left(\widehat{Y}-\text{E}\left(\widehat{Y}\right)\right)}^{2}\right]} \tag{11}$$
$${\sigma }_{Y}=\sqrt{\text{E}\left[{\left(Y-\text{E}\left(Y\right)\right)}^{2}\right]} \tag{12}$$
$$R=\frac{\text{Cov}\left(\widehat{Y},Y\right)}{{\sigma }_{\widehat{Y}}{\sigma }_{Y}} \tag{13}$$
where \(\text{E}\left(\widehat{\text{Y}}\right)\) and \(\text{E}\left(\text{Y}\right)\) are the overall means of \(\widehat{Y}\) and \(Y\), respectively; \(\text{Cov}\left(\widehat{\text{Y}},\text{Y}\right)\) is the population covariance; and \({\sigma }_{\widehat{\text{Y}}}\) and \({\sigma }_{Y}\) are the standard deviations of \(\widehat{Y}\) and \(Y\), respectively. \(R\) is the correlation coefficient; the higher its value, the more accurate the prediction.
\(RMSE\) is a commonly used measure of the difference between the predicted and actual observed values of a model and is used to assess how well the model fits the given data. The smaller this value, the more valid the model. \(RMSE\) is calculated as shown in Eq. 14:
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}} \tag{14}$$
where \(n\) is the number of samples, and \({Y}_{i}\) and \({\widehat{Y}}_{i}\) are the actual organ measurements and the organ estimates, respectively.
The margin of error is calculated as \(\delta =\frac{\Delta }{L}\), where \(\Delta\) is the absolute value of the difference between the algorithm's estimate and the actual measurement, and \(L\) is the actual measurement of the trait.
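The three metrics can be computed together as in the short NumPy sketch below; the example values are hypothetical and serve only to show the calculation.

```python
import numpy as np

def evaluate(pred: np.ndarray, true: np.ndarray):
    """Correlation coefficient R, RMSE, and per-sample margin of error for organ measurements."""
    r = np.corrcoef(pred, true)[0, 1]              # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((true - pred) ** 2))    # root mean square error
    delta = np.abs(pred - true) / true             # margin of error relative to the measurement
    return r, rmse, delta

# hypothetical leaf-area values (cm^2): predicted vs. manually measured
pred = np.array([52.1, 60.4, 47.8, 71.0])
true = np.array([54.0, 61.2, 49.5, 73.3])
print(evaluate(pred, true))
```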
Results
Detailed settings
The training environment for this experiment was an Intel(R) Xeon(R) CPU E5-2678 v3 at 2.50 GHz and an NVIDIA GeForce RTX 2080Ti 10 GB GPU, with Ubuntu 18.04 as the operating system. The deep learning environment consisted of CUDA 11.6, cuDNN 8.1.1, PyTorch 1.11, and Python 3.7. Implementation details are listed in Table 2.
To maximize the accuracy of the modeling results, hyperparameter experiments were conducted in this study, as shown in Table 3. TPointNetPlus was trained with the Adam optimizer using an initial learning rate of 0.5, a batch size of 6, and 500 training epochs. All networks were trained end-to-end using stochastic gradient-based optimization. These hyperparameters produced the model with the highest test accuracy.
Results of the improved network
The training process
After training the TPointNetPlus network, this study tested the point clouds of cotton plants on the network for organ segmentation. The results, depicted in Fig. 6a, showed that TPointNetPlus had the highest accuracy. Compared to PointNet, both TPointNetPlus and PointNet++ had higher accuracy, with PointNet's accuracy being limited to less than 80% due to its inability to capture localized details. TPointNetPlus had an accuracy more than 5% greater than that of PointNet++. The performance of the networks was also evaluated by analysing the loss values during training (Fig. 6b). Initially, PointNet had highly unstable values with large fluctuations, while TPointNetPlus had some regions with greater fluctuations in loss than PointNet++. However, from \(Epoch=100\) onwards, TPointNetPlus began to level off, while PointNet++ and PointNet still exhibited large fluctuations. By \(Epoch=500\), the loss values had decreased to less than 0.5 for PointNet, less than 0.4 for PointNet++, and below 0.2 for TPointNetPlus. After training, the average loss for TPointNetPlus was approximately 0.08, whereas the average loss for PointNet++ was approximately 0.14.
Quantitative comparison
Similarly, the data in the test set were input into the PointNet and PointNet++ networks. The experimental results showed that the TPointNetPlus network outperformed the PointNet and PointNet++ networks on several evaluation metrics, such as accuracy, F1 score, and mIoU, for leaf, boll, branch, and overall segmentation of the cotton plant. As shown in Table 4, the PointNet network had the lowest accuracy of the compared networks. In terms of accuracy, TPointNetPlus achieved the greatest improvement in individual organ segmentation. Compared to the other networks (PointNet and PointNet++), TPointNetPlus improved by 18.65%, 18.12%, 13.14%, and 5% for leaves, bolls, branches, and overall, respectively. This is because leaves have a larger area with distinctive overall features, whereas bolls and branches are smaller and more dispersed. Therefore, the segmentation accuracy for cotton leaves was more than 80%, while the accuracy of boll and branch segmentation was less than 40%. In terms of overall segmentation accuracy, there was not much difference between TPointNetPlus and PointNet++, but there was a significant difference in the segmentation accuracy scores for each organ. This indicates that TPointNetPlus is more effective than PointNet++ in extracting local detail features.
According to the F1 score, the performance for leaves, bolls, branches, and the overall plant exceeded 90%, with the value for leaves exceeding 99%. This indicates that TPointNetPlus achieved a good balance between precision and recall. Because the branch features are thin and narrowly dispersed, the mIoU value of TPointNetPlus for branches is only 80.37%. However, the corresponding value for the PointNet network is only 19.06%, which is enough to demonstrate the effectiveness of TPointNetPlus in local feature extraction. This finding suggests that the model can accurately capture the location and shape of the target during segmentation.
Qualitative comparison
The point cloud plants were segmented into leaves, bolls, and branches at the seedling, bud, and boll stages of the cotton growth cycle. The visualization results of point cloud organ segmentation of cotton plants show that TPointNetPlus outperforms PointNet++ and PointNet in segmenting main stems, branches, and bolls, as shown in Fig. 7. The segmentation results of PointNet confirm the quantitative results in Table 4: PointNet performs poorly on all three metrics reviewed, and Fig. 7 shows that most points were misclassified. From leaf and stem segmentation at the seedling stage (Fig. 7a), to leaf, boll, and stem segmentation at the bud stage (Fig. 7b), and to leaf, boll, and stem segmentation at the boll stage (Fig. 7c), misclassifications become increasingly frequent. Compared to the ground truth, PointNet++ and TPointNetPlus have some errors in organ segmentation, but the overall segmentation effect is good. PointNet++ makes errors in branch and leaf segmentation, especially at the top of the plant at the boll stage. This is due to the thinness of the branches and trunks at the top of the cotton plant and insufficient feature extraction by the network.
The results of the cotton plant organ segmentation visualization showed that TPointNetPlus and PointNet++ both achieved overall accuracies greater than 90%. However, there were several false predictions in both networks. TPointNetPlus misclassified a part of the branch stem as a leaf in some cases (Fig. 8d), while the PointNet++ misprediction was more prominent (Fig. 8c). Additionally, parts of the branches were incorrectly predicted as leaves. These misclassified branch regions were connected to the leaves and had thin branches, but this misclassification was not as prominent in the TPointNetPlus inference as it was in the PointNet++ inference. Moreover, TPointNetPlus successfully segmented cotton bolls, while the other two models did not.
Another source of incorrect predictions is incorrect manual labelling. Because of manual labelling, many leaf stalks attached to leaves are labelled as leaves, and the network learns to segment them as leaves. As a result, in some cases the branch portion attached to the leaf blade was incorrectly segmented. Similarly, the network incorrectly categorizes thicker branches as leaves because they are similar in shape to smaller blade sections; however, due to their small size, they were manually labelled as part of the branch in the ground truth.
Results of instance segmentation
The TPointNetPlus network was used to perform semantic segmentation on a cotton plant point cloud, yielding the points belonging to leaves, cotton bolls, and branches. To further distinguish individual leaves and cotton bolls, the HDBSCAN algorithm was used to conduct instance segmentation. Figure 9 illustrates the successful segmentation of cotton plants in the boll stage; these plants had more than 15 leaves and 5 or more cotton bolls. The successful semantic and instance segmentation of leaves and cotton bolls provides an ideal basis for accurate measurements in the following stages.
Results of phenotypic parameter extraction
Cotton leaf area
Figure 10 illustrates the process and analysis of measuring the cotton leaf area. Initially, a cotton leaf was segmented from the cotton plant point cloud (Fig. 10a). The actual leaf area was measured using an LA-S series plant image analyser (Fig. 10d). The predicted value was obtained by triangulating the point cloud of the leaf (Fig. 10b) to form a triangular mesh representation of the leaf (Fig. 10c). Across the entire test dataset, the estimated leaf area based on the predicted segment showed a high correlation with the actual measurements, with an R2 value exceeding 0.96 (Fig. 10e). The root mean square error for leaf area was relatively low (RMSE = 3.41), indicating accurate predictions of leaf area in most cases. A comparison between the ground truth and predicted values demonstrated that the estimated leaf area was generally equal to or smaller than the ground truth values, without exceeding the actual measurements (Fig. 10e). Since the estimation of leaf area relies on the triangular mesh, accurate prediction of the leaf points is crucial, and occasional rare segmentation errors in the leaf area do not affect the overall area calculation.
Cotton leaf area trait extraction and correlation with the ground truth. a Postprocessed sample from the test set. A cotton leaf is segmented in red. b A complete point cloud dataset of a leaf. c The point cloud of the leaf is triangulated, the area of each triangle mesh is calculated, and the areas of all the triangles are summed to obtain the total surface area of the entire surface. d Obtaining the ground truth values using the LA-S series plant imager by Wanshen. e Correlation of leaf area extracted from predicted and ground truth segments
Boll volumes
The process of measuring the volume of cotton bolls is demonstrated in Fig. 11. The test data showed that accurate predictions were made (Fig. 11a). The volumetric representation of the cotton boll point cloud, obtained through voxelization, enables the calculation of the boll volume based on the point cloud density (Fig. 11b). The boll volumes derived from the predicted segments were strongly correlated with the ground truth measurements, with an R2 value greater than 0.95 (Fig. 11e). The root mean square error for boll volume was low (RMSE = 24.47), indicating precise predictions of cotton boll volume in most cases. A comparison between the ground truth values and predictions showed that the estimated boll volume was usually equal to or lower than the ground truth values without surpassing the actual measurements (Fig. 11e).
Cotton boll trait extraction and correlation with the ground truth. a Postprocessed sample from the test set. The cotton boll is segmented in red. b A complete point cloud dataset of a boll. c The point cloud is voxelized, with each voxel's volume represented by the points it contains. The total cotton boll volume was computed by summing the individual voxel volumes. d The boll volume of the cotton plants was manually measured using a ruler. e Correlation of cotton boll volume extracted from predicted and ground truth segments
Leaf number
The process of counting cotton leaves and analyzing the results is illustrated in Fig. 12. Initially, the TPointNetPlus network segments the point cloud model of the cotton plant to identify leaves (Fig. 12a). Subsequently, each leaf is extracted using the HDBSCAN clustering algorithm (Fig. 12b). Finally, the number of leaves is statistically analyzed using the counting function (Fig. 12c). The estimated number of leaves based on the predicted segments shows a high correlation with the actual values, with an R-squared value exceeding 0.98 (Fig. 12d). The root mean square error (RMSE) for leaf number is relatively low at RMSE = 0.62, indicating accurate prediction of leaf number in most cases. The experimental results demonstrate that this process can effectively and accurately identify and quantify the leaves of cotton plants, providing valuable data for further research and analysis.
Discussion
3D reconstruction of cotton plants
3D reconstruction of cotton plants is essential for acquiring point cloud data, with its quality directly affecting data accuracy. Currently, different 3D reconstruction techniques have been used to construct 3D plant models for phenotyping [42]. Although LiDAR scanners can provide highly accurate point cloud data, they may be affected by plant density and shading [43]. Dense vegetation may cause the laser beam to lose some information as it passes through the plant layers, and occlusion between plants may complicate the task of obtaining a complete plant structure. A 3D time-of-flight (ToF) camera has been shown to rapidly acquire 3D images of plants, but its resolution and accuracy are relatively low [44,45,46]. 3D laser scanning typically acquires data from only one point of view and therefore may present challenges when dealing with complex plant structures [43, 47, 48]. This single viewpoint may not capture all the details of the plant surface, especially if the plant's underside or leaves overlap. The contour shape-based method is efficient for measuring plant volume, stem height, and the surface area of individual leaves, but it may not be robust to localized occlusion or leaf overlap, leading to a decrease in measurement accuracy [49].
The multi-view-based approach [50] examines plants from multiple perspectives simultaneously, providing more comprehensive and accurate information about their structure. This method addresses issues such as occlusion and lack of detail that can arise from a single viewpoint. However, the point cloud collection method used in this article has limitations: it cannot be applied on a large scale outdoors and requires destructive sampling. Additionally, collecting 180 images involves a significant workload, and the reconstruction time is lengthy. Despite these drawbacks, the multi-view 3D reconstruction method achieves more precise depth information and integrates geometric morphology with texture information to create a more realistic and lifelike 3D model of the cotton plant. This method can assist botanists and ecologists in better studying the growth patterns and interactions of cotton, deepening their understanding of its structure and function, and diagnosing pests and diseases.
Cotton plant height
In this study, we measured and analyzed the height of cotton plants. However, the dataset only includes 5 sets of data, which is relatively small and may affect the reliability and representativeness of the results. The following will analyze the reasons for the insufficient data and its impact on the study's findings, as well as propose potential improvements for future research.
The process of measuring plant height in the tested cotton plants and the analysis of the results are illustrated in Fig. 13. The prediction accuracy was consistent across all test cases, accurately predicting the bottom 1 cm region of the main stem (Fig. 13b). There was a high correlation between the main stem traits estimated from the predicted segments and those estimated from the ground truth throughout the entire test set, with R2 values exceeding 0.97 (Fig. 13d). The RMSE was low (RMSE = 0.26), indicating accurate identification of the lowest and highest points in most cases. A comparison between the ground truth and predicted values demonstrated that the estimated plant height values were equal to or less than the ground truth values and did not exceed them (Fig. 13c). Since plant height estimation relies on correctly predicting the highest and lowest points, occasional missegmentations in the middle of the plant had no impact on the resulting heights.
Main stem trait extraction and correlation with the ground truth. a Postprocessed sample from the test set. The main stem is segmented in red. b The height of the cotton plants was manually measured using a ruler. c Mainstem height estimation and selection of the bottom 1 cm region. d Correlations of main stem diameter and height extracted from the predicted and ground truth segments
Analysis of the clustering results
Density-based spatial clustering of applications with noise (DBSCAN), as a density clustering algorithm, has shown significant advantages in many respects. Its robustness enables it to handle noise and outliers efficiently without the need to know the number of clusters in advance [51]. Compared to other clustering algorithms, DBSCAN adapts better to irregularly shaped clusters, is able to discover clusters of arbitrary shapes, and is sensitive to data with large variations in density [52]. In addition, the algorithm performs well in terms of scalability and is suitable for clustering tasks on large-scale datasets [51]. However, DBSCAN has several drawbacks. Its performance is sensitive to clusters with large differences in density in the data, and the parameters may need to be adjusted to accommodate clusters with different densities [53]. In addition, DBSCAN may face the curse of dimensionality when dealing with high-dimensional data and needs to be used with caution [54].
Compared to HDBSCAN, the main disadvantage of DBSCAN is the need to prespecify some parameters, such as the neighborhood radius and the minimum number of points in a neighborhood, whereas HDBSCAN is relatively more adaptive and does not require an explicit density threshold [55]. HDBSCAN is also hierarchical in nature and is able to perform clustering at different density levels, making it more suitable for clusters with different density levels [56]. Both DBSCAN and HDBSCAN yield correct results when splitting cotton organ instances in general. However, DBSCAN produced incorrect results when segmenting leaf instances with more complex morphology and closer proximity (Fig. 14).
Conclusion
The proposed TPointNetPlus, a cotton point cloud organ instance segmentation method, seamlessly integrates deep learning and clustering algorithms to enhance the accuracy of phenotypic organ structure measurements through 3D point clouds. The creation of a dedicated point cloud dataset for cotton plants, coupled with the incorporation of the attention module transformer into the PointNet++ model, contributes to precise feature extraction. The application of the HDBSCAN algorithm for organ-level cotton plant point cloud segmentation successfully isolates cotton leaves, bolls, and branches, providing accurate phenotypic feature parameters. The research outcomes highlight the exceptional semantic segmentation accuracy of TPointNetPlus (98.38%) for cotton leaves. Correlation coefficients between the measured values of key phenotypic parameters demonstrated the model's reliability in predicting traits such as plant height, leaf area, and boll volume. This automated method involves translating a plant's 3D point cloud into phenotypic parameters, which can be applied in fields such as cotton breeding and plant physiology.
Availability of data and materials
No datasets were generated or analysed during the current study.
References
Li N, et al. Impact of climate change on cotton growth and yields in Xinjiang, China. Field Crops Res. 2020. https://doi.org/10.1016/j.fcr.2019.107590.
Lin HX, et al. Rationality of Xinjiang to undertake textile industry based on water resources carrying capacity. Desalin Water Treat. 2021;219:147–56.
Pabuayon ILB, et al. High-throughput phenotyping in cotton: a review. J Cotton Res. 2019. https://doi.org/10.1186/s42397-019-0035-0.
Das Choudhury S, et al. Leveraging image analysis to compute 3D plant phenotypes based on voxel-grid plant reconstruction. Front Plant Sci. 2020. https://doi.org/10.3389/fpls.2020.521431.
Yang WN, et al. Crop phenomics and high-throughput phenotyping: past decades, current challenges, and future perspectives. Mol Plant. 2020;13(2):187–214.
Li ZB, et al. A review of computer vision technologies for plant phenotyping. Comput Electron Agric. 2020. https://doi.org/10.1016/j.compag.2020.105672.
Khan Z, et al. Estimation of vegetation indices for high-throughput phenotyping of wheat using aerial imaging. Plant Methods. 2018. https://doi.org/10.1186/s13007-018-0287-6.
Huang LW, et al. Real-time motion tracking for indoor moving sphere objects with a LiDAR sensor. Sensors. 2017. https://doi.org/10.3390/s17091932.
Perez AJ, Perez-Cortes JC, Guardiola JL. Simple and precise multi-view camera calibration for 3D reconstruction. Comput Ind. 2020. https://doi.org/10.1016/j.compind.2020.103256.
Zhong YJ, et al. Multi-view 3d reconstruction from video with transformer. In IEEE International Conference on Image Processing (ICIP). 2022. Bordeaux, France.
Xie CQ, Yang C. A review on plant high-throughput phenotyping traits using UAV-based sensors. Comput Electron Agric. 2020. https://doi.org/10.1016/j.compag.2020.105731.
Feng L, et al. A comprehensive review on recent applications of unmanned aerial vehicle remote sensing with various sensors for high-throughput plant phenotyping. Comput Electron Agric. 2021. https://doi.org/10.1016/j.compag.2021.106033.
Qi CR, et al. PointNet: deep learning on point sets for 3D classification and segmentation. In 30th IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2017. Honolulu, HI.
Thomas H, et al. KPConv: flexible and deformable convolution for point clouds. In IEEE/CVF international conference on computer vision (ICCV). 2019. Seoul, South Korea.
Wang F, Bryson M. Tree segmentation and parameter measurement from point clouds using deep and handcrafted features. Remote Sens. 2023. https://doi.org/10.3390/rs15041086.
Sun P, Yuan X, Li D. Classification of individual tree species using UAV LiDAR based on transformer. Forests. 2023. https://doi.org/10.3390/f14030484.
Saeed F, et al. Cotton plant part 3D segmentation and architectural trait extraction using point voxel convolutional neural networks. Plant Methods. 2023;19(1):33.
Ruiming D, et al. PST: plant segmentation transformer for 3D point clouds of rapeseed plants at the podding stage. J Photogramm Remote Sens. 2023;195:380–92.
Li H, et al. Automatic branch-leaf segmentation and leaf phenotypic parameter estimation of pear trees based on three-dimensional point clouds. Sensors. 2023. https://doi.org/10.3390/s23094572.
Liu B, et al. TSCMDL: multimodal deep learning framework for classifying tree species using fusion of 2-D and 3-D features. IEEE Trans Geosci Remote Sens. 2023;61:1–11.
Meng HY, et al. VV-NET: Voxel VAE net with group convolutions for point cloud segmentation. In IEEE/CVF International Conference on Computer Vision (ICCV). 2019. Seoul, South Korea.
Xu MT, et al. PAConv: position adaptive convolution with dynamic kernel assembling on point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021. Electr Network.
Wang Y, et al. Dynamic graph CNN for learning on point clouds. ACM Trans Graph. 2019;38(5):1–12.
Zeng Z, et al. RG-GCN: a random graph based on graph convolution network for point cloud semantic segmentation. Remote Sens. 2022. https://doi.org/10.3390/rs14164055.
Xu QG, et al. Grid-GCN for fast and scalable point cloud learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020. Electr Network.
Wang F, et al. Residual attention network for image classification. In 30th IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2017. Honolulu, HI.
Wu B, et al. Visual transformers: token-based image representation and processing for computer vision. 2020. abs/2006.03677.
Dosovitskiy A, et al. An image is worth 16x16 words: transformers for image recognition at scale. 2020. abs/2010.11929.
Zhang H, et al. Self-attention generative adversarial networks. in 36th international conference on machine learning (ICML). 2019. Long Beach, CA.
Carion N, et al. End-to-end object detection with transformers. in European conference on computer vision. 2020. Springer.
Wan J, et al. DGANet: a dilated graph attention-based network for local feature extraction on 3D point clouds. Remote Sens. 2021. https://doi.org/10.3390/rs13173484.
Feng MT, et al. Point attention network for semantic segmentation of 3D point clouds. Pattern Recogn. 2020. https://doi.org/10.1016/j.patcog.2020.107446.
Wen X, et al. CF-SIS: semantic-instance segmentation of 3D point clouds by context fusion with self-attention. In Proceedings of the 28th ACM International Conference on Multimedia. 2020. p. 1661–1669.
Niu Z, Zhong G, Yu H. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.
Guo M-H, et al. Attention mechanisms in computer vision: a survey. Comput Visual Media. 2022;8(3):331–68.
Wen X, et al. PMP-Net++: point cloud completion by transformer-enhanced multi-step point moving paths. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):852–67.
Dosovitskiy A, et al. An image is worth 16x16 words: transformers for image recognition at scale. 2020.
Qi CR, et al. PointNet plus plus: deep hierarchical feature learning on point sets in a metric space. In 31st Annual Conference on Neural Information Processing Systems (NIPS). 2017. Long Beach, CA.
Campello R, et al. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data. 2015. https://doi.org/10.1145/2733381.
Ho Y, Wookey S. The real-world-weight cross-entropy loss function: modeling the costs of mislabeling. IEEE Access. 2020;8:4806–13.
Zhou DF, et al. IoU Loss for 2D/3D Object Detection. In 7th International Conference on 3D Vision (3DV). 2019. Quebec City, Canada.
Shi WN, et al. Plant-part segmentation using deep learning and multi-view vision. Biosyst Eng. 2019;187:81–95.
Thapa S, et al. A novel LiDAR-based instrument for high-throughput, 3D measurement of morphological traits in maize and sorghum. Sensors. 2018. https://doi.org/10.3390/s18041187.
Chaivivatrakul S, et al. Automatic morphological trait characterization for corn plants via 3D holographic reconstruction. Comput Electron Agric. 2014;109:109–23.
Guan HO, et al. Three-dimensional reconstruction of soybean canopies using multisource imaging for phenotyping analysis. Rem Sens. 2018. https://doi.org/10.3390/rs10081206.
Vázquez-Arellano M, et al. Leaf area estimation of reconstructed maize plants using a time-of-flight camera based on different scan directions. Robotics. 2018. https://doi.org/10.3390/robotics7040063.
Paulus S, et al. Low-cost 3d systems: suitable tools for plant phenotyping. Sensors. 2014;14(2):3001–18.
Garrido M, et al. 3D maize plant reconstruction based on georeferenced overlapping LiDAR point clouds. Rem Sens. 2015;7(12):17077–96.
Golbach F, et al. Validation of plant part measurements using a 3D reconstruction method suitable for high-throughput seedling phenotyping. Mach Vis Appl. 2016;27(5):663–80.
Pound MP, et al. Automated recovery of three-dimensional models of plant shoots from multiple color images. Plant Physiol. 2014;166(4):1688-U801.
Ester M, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In kdd. 1996.
Ankerst M, et al. OPTICS: Ordering points to identify the clustering structure. In 1999 ACM SIGMOD International Conference on Management of Data. 1999. Philadelphia, Pa.
Schubert E, et al. DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Trans Database Syst. 2017. https://doi.org/10.1145/3068335.
Kriegel HP, et al. Outlier detection in axis-parallel subspaces of high dimensional data. In 13th Pacific-Asia Conference on Knowledge and Data Mining. 2009. Bangkok, Thailand.
Campello RJ, Moulavi D, Sander J. Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining. 2013. Springer.
McInnes L, Healy J. Accelerated hierarchical density based clustering. In 17th IEEE International Conference on Data Mining (ICDMW). 2017. New Orleans, LA.
Funding
This work was supported by the Ministry of Education Industry-University Cooperation Collaborative Education Program (No. 220505876092353), the First-class Undergraduate Programmes Foundation in Computer Graphics at Tarim University (No. TDYLKC202231), and the Xinjiang Production and Construction Corps Science and Technology Program (No. 2022DB002).
Author information
Authors and Affiliations
Contributions
CJS and FYL wrote the manuscript with contributions from all authors. SQS performed the experiments. CJS analyzed the results. HG collected the cotton data. LYS set up the simple experimental apparatus. HG performed the image analysis. All authors have revised and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
This research contains no materials, procedures, or case studies related to humans and/or animals.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, FY., Geng, H., Shang, LY. et al. A cotton organ segmentation method with phenotypic measurements from a point cloud using a transformer. Plant Methods 21, 37 (2025). https://doi.org/10.1186/s13007-025-01357-w
DOI: https://doi.org/10.1186/s13007-025-01357-w