Soybean Root System Architecture Trait Study through Genotypic, Phenotypic, and Shape-Based Clusters

We report a root system architecture (RSA) traits examination of a larger scale soybean accession set to study trait genetic diversity. Suffering from the limitation of scale, scope, and susceptibility to measurement variation, RSA traits are tedious to phenotype. Combining 35,448 SNPs with an imaging phenotyping platform, 292 accessions (replications = 14) were studied for RSA traits to decipher the genetic diversity. Based on literature search for root shape and morphology parameters, we used an ideotype-based approach to develop informative root (iRoot) categories using root traits. The RSA traits displayed genetic variability for root shape, length, number, mass, and angle. Soybean accessions clustered into eight genotype- and phenotype-based clusters and displayed similarity. Genotype-based clusters correlated with geographical origins. SNP profiles indicated that much of US origin genotypes lack genetic diversity for RSA traits, while diverse accession could infuse useful genetic variation for these traits. Shape-based clusters were created by integrating convolution neural net and Fourier transformation methods, enabling trait cataloging for breeding and research applications. The combination of genetic and phenotypic analyses in conjunction with machine learning and mathematical models provides opportunities for targeted root trait breeding efforts to maximize the beneficial genetic diversity for future genetic gains.


Introduction
Root system architecture (RSA) is essential for water and nutrient acquisition, microbe interaction, nutrient storage, and structural anchorage and impacts grain yield [1,2]. Crop breeding programs including soybean rarely utilize RSA as selection criteria; therefore, RSA traits have developed indirectly in crop species [3]. Researchers are cognizant of the genetic and phenotypic complexity that is inherent at the organismal level and promote standardization in terminology and removal of redundancies for the measurement of every conceivable trait [4][5][6]. However, RSA studies have been hindered by trait, measurement, and environment complexity. The plethora of root traits identified through different studies and software further complicate the identification of opportunities to select the most informative and relevant suite of traits [5,[7][8][9][10][11]. A recent focus on the investigation of root trait methodologies has significantly advanced trait measurement capability and capacity [4,[12][13][14][15][16][17]. Continual efforts are needed to utilize genomics and phenomics tools to study the RSA trait variation for application in crop breeding and research programs [18].
Various systems have been introduced to study RSA traits including methods that focus on the controlled environment (lab bench, growth chamber, and greenhouse) and in the field environment [4,13,16,17]. Controlled environments provide the ease of use, speed, and scalability required for crop breeding programs. Field environment methods can provide higher immediate applicability despite being more resource intensive and with lower trait heritability [4]. Researchers have attempted to gain insight through a balanced approach utilizing higher throughput systems with advanced technology together with field-based validation leveraging the advantages of both artificial and field-based methods while reducing their drawbacks [9,[19][20][21]. This insight will ensure a comprehensive understanding of RSA traits, their genome to phenome relationship, and trait selection targets for cultivar development. However, due to higher trait heritability and ease of scale, controlled environment experiments serve as a strong foundation for future RSA trait studies.
In each crop species, one of the first steps in utilizing traits for practical breeding outcomes starts with the exploration of its genetic diversity [11,22,23]. Limited information is available in soybean [Glycine max L. (Merr.)] for RSA traits, although some QTL studies with a limited number of genotypes have been published [24][25][26]. Soybean is an interesting crop for these studies due to a severe genetic bottleneck reported in cultivated varieties [27]. Despite the narrow genetic diversity of soybean cultivars in the USA, nonroot trait-focused studies have reported the value of germplasm banks being well equipped with large and useful genetic diversity [28][29][30][31][32]. Therefore, an initial scan to catalogue the diversity of root phenotypes within soybean genetic diversity is needed.
Morphology parameters are useful to classify roots into different types and to correlate root type to environmental advantages such as nutrient acquisition and drought or flood tolerance. For example, crop species with deep rooting systems have been correlated to adaptation in drought prone environments [1,33,34] while those with shallow, fibrous root systems have been shown to have efficient phosphorus uptake in phosphorus-deficient soils [35][36][37][38][39][40][41][42]. The "steep, deep, and cheap" root type in maize has been promoted for efficient and effective water and nitrogen acquisition [43]. A highly competitive root with fast-growing characteristics and efficient root placement, including deep roots, to chase moisture through the soil profile is most suitable for water deficit crop growing environments [1,34,44,45]. Soybean taproots that elongate faster from germination also have been shown to burrow deeper into the soil profile, have increased root densities at depth, and are better able access to water in drought situations [46,47]. A current dilemma is that optimum root architecture is based on the assumption that deep roots need to be complemented with shallow lateral roots to efficiently forage for soil immobile nutrients [36,48,49]. This dichotomy creates a need to further explore the elusive optimum root architecture. The initial step should be the compilation of reported root shape categories available in the literature (see Materials and Methods; iRoot categories), which can be accomplished through the combination of genetic and phenotypic information. For example, singlenucleotide polymorphisms (SNPs) can help determine genetic variability and create genotype-based clusters. Similarly, genetic diversity and RSA can be studied on a trait by trait or overall trait basis using phenotypic information in conjunction with principal component analysis (PCA), and hierarchical and k-means statistical methodologies allowing further insights to be drawn using these relationships [50].
The overall objective of this research was to catalogue soybean root trait diversity in controlled environment conditions and to investigate the correlation between genotype, phenotype-and nonroot phenotype/country of origin-based descriptors. This study was carried out using 292 diverse soybean accessions from the USDA core collection primarily in maturity groups II and III together with a subset of the Soybean Nested Association Mapping (SoyNAM) parents. These accessions were studied in controlled environment conditions and phenotyped with an imaging platform at 6 (6 d), 9 (9 d), and 12 (12 d) days after germination. Genotypebased clusters (GBC) were created using SNP data generated by 50K Illumina chip [51] in which genotypes were separated using hierarchical cluster analysis. Eight phenotype-based clusters (PBC) were created based on hierarchical clustering of thirteen root traits derived from the experimental study. Root shape-based clusters were created using averaged, smoothed, normalized, and compressed (high-level features) root shape outline data generated by Elliptic Fourier Transformation (EFT) and a Convolutional Neural Network (CNN). We created informative root (iRoot) categories based on characteristic root types described in the literature such as drought tolerant or nutrient foraging as a method looking beyond individual root traits to capture the essence of differing root shapes and characteristics. Our results indicate that soybean accessions for RSA traits are genetically diverse and cluster-specific trends and differentiations were evident. The US accessions showed limited genetic diversity, suggesting it could benefit from the infusion of useful RSA trait diversity in breeding programs. The approach present in this paper is applicable to other crops for RSA-focused breeding and research applications.

Experimental and Technical Design.
We developed a mobile, low-cost, and high-resolution root phenotyping system composed of an imaging platform to establish a seamless end-to-end pipeline previously described in Falk et al. [52]. The platform includes obtaining root samples through image-based trait processing and extraction of biologically relevant time series data on root growth and development for phenomics, genomics, and plant breeding applications. The seedling growth system component allowed for easy removal from the growth chamber and nondestructive imaging at multiple time points (6, 9, and 12 days) after germination. The imaging platform component consisted of a Canon T5i digital SLR camera (lens: EF-S 18-55 mm f/3.5-5.6 IS II) (Canon USA, Inc., Melville, NY) mounted to an aluminum T-slot extrusion frame (80/20 Inc., Columbia City, IN) with two softbox photography lights (Neewer; Shenzen, China), four 70-watt CFL bulbs in total, to provide consistent illumination. Together with a connected laptop computer, the entire system was assembled on a utility cart (Uline, Pleasant Prairie, WI) creating a small, mobile imaging station. Images were captured via a laptop computer using Smart Shooter 3 remote capture software [53] allowing for automatic image naming via the affixed barcode, optimizing time and reducing human transcription error.
Root phenotyping software, ARIA 2.0 [52], was used to batch process over 12,000 images. Prior to image processing, color thresholding app extension in MATLAB (MathWorks, Inc., Natick, MA) was used to interactively develop a rule to generate a binary image for segmenting the foreground (root) and the background (blue germination paper). The developed rule was to convert the RGB image to HSV and consider the pixels with a hue (H) value higher than 175°as the blue background. The rule did not work for some images where the background was infrequently oversaturated by light reflection. In these cases, the RGB image was converted to Lab color space and a threshold value for L (lightness) channel was selected heuristically for the background. These rules were implemented in the ARIA 2.0. Bulk image sets were automatically processed through the ARIA 2.0 software for root segmentation and skeletonization, followed by root system architecture trait information extraction (Table 1). Seedling shoot and root dry weights were also collected at 12 days after germination.

Plant Materials.
The diversity panel used in this experiment consisted of plant introductions (PIs) from the USDA core collection and Soybean Nested Association Mapping (SoyNAM) parental lines [54]. Selections from the SoyNAM panel included lines with diverse ancestry (n = 10) and high- yielding elite lines (n = 13), which were combined with the USDA core collection landraces (n = 269) to assemble a genetically diverse panel that has previously been genotyped [17] (Supplementary files 1 and 2). These genotypes consisted of a wide range of geographies (12 countries of origin), maturity groups (groups 1 (n = 19), 2 (n = 115), 3 (n = 156), and 4 (n = 2)), and growth habit (determinate (n = 88), semideterminate (n = 34), and indeterminate (n = 164)) along with various other morphological and seed quality traits.
2.3. Seedling Growth. The protocol for seed germination and transplanting is described by Falk et al. [52]. Ten seeds of each genotype were germinated in paper rolls in which two of the ten seedlings were transplanted at five days onto wet blue germination paper for further growth and root trait phenotyping. Two blue germination papers (30 cm × 45 cm) (Anchor Paper, Minneapolis, MN); each containing seedlings of each genotype was placed together, attached with binder clips as a growth pouch unit. Each growth pouch unit was suspended by the rungs of a growth chamber shelf with the lower 3 cm of the paper submerged in water as a wick to bring moisture to the roots. The growth chambers were 175 cm by 100 cm and contained standard metal grate shelves 1.3 cm by 35 cm slots (Controlled Environments Ltd., Winnipeg, Canada). The growth chambers contained a plastic tote on the floor providing a water depth of 5 cm allowing each growth paper unit to be submerged to 3 cm. Growth chambers were set at 25°C during a 16-hour day, 22°C for an 8-hour night. Growth chamber light intensity was measured at 300 and 350 μmol photons (m -2 s -1 ) using a Li-250A light meter (Li-Cor Biosciences, Lincoln, NE, USA).

Statistical Analysis.
All statistical analyses were performed using R programming language unless otherwise specified. To evaluate the 292 genotypes, we first eliminated outliers using Tukey's boxplot method [55] before calculating the best linear unbiased predictor (BLUP) values for each trait utilizing a mixed model and the "lme4" package [56,57]. In the model (Equation (1)), y ik is the response variable of the i th genotype at the k th block (i.e., growth chamber used), μ is the total mean, g i is the genetic effect of the i th genotype, b k is the block effect, and e ik is the experimental error following N ð0, σ 2 e Þ. All factors were considered random effects. Broad sense heritability was calculated on an entry-mean basis using Equation (2), where σ 2 g is the genotypic variance and r is the number of replications (r = 14). Tukey's honestly significant differences (HSD) [58] were used to identify statistical differences between genotype-based clusters (GBC) in which MSE is the mean squared error, q is the test statistic found in the q-table, and S a is the number of observations of the a th group (Equation (3)). To identify excessive correlation, a collinearity test of the predictor variables was performed using a variance inflation factor of five as a threshold to quantify the severity; R is the regression coefficient of the j th variable with respect to the rest of the variables (Equation (4)). Fixation indices were calculated using the Hudson F ST approach using the fst.hudson function in the KRIS R package where n i is the sample size andp i is the sample allele frequency in population i for i ϵ f1, 2g (Equation (5)).
Tukey′s HSD = q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi Variance inflation factor = 1 2.5. Informative Root (iRoot) Categories. We created iRoot categories in an effort to (a) reduce dimensionality (looking at a category rather than individual traits), (b) aggregate traits, as opposed to focusing on an individual root trait for increased biological relevance, and (c) identify specific trait measurements and statistical analyses to quantify differing root shapes based on a previous scientific work found in the literature [7,9,[59][60][61]. All 292 genotypes were ranked from 1 (highest numerically) to 292 (lowest) based on each root trait. These rank scores of genotypes are summed for the specific traits that compile each iRoot category.  (Figure 1). For example, a particular genotype ranked highly in the nutrient foraging iRoot category would display high values in three root traits: WID, TRL_GR, and TRLUpper. To be clear, iRoot categories selected roots that display root trait characteristics affiliated with the quality (e.g., deep roots for drought tolerance) in the growth chamber experiment, not to be confused with genotypes displaying "classical drought tolerance characteristics" in the field environment. Additionally, the analysis of all iRoot categories was restricted to images from nine days after germination (9 d) as particular slow growing genotypes lacked significant lateral roots at 6 d while other fast growing genotypes were subject to outgrow the germination paper medium by 12 d. The nutrient foraging iRoot category was based from previous reports [36] which describe this phenotype as maximizing the distribution of lateral roots in the topsoil at a minimal metabolic cost to outperform competitor genotypes in nutrient poor soil. The phenotype was created to contain a wide root system with a high ratio for total root length in the upper 1/3 of the root system as well as a fast TRL_GR. The drought tolerant iRoot category followed the steep, deep, and cheap paradigm [43]   Hierarchical clustering was used to group similar traits. Symbols (shape and color) denote RSA traits used in the corresponding iRoot index (cumulative trait scores). a high total root length growth rate while selecting steep lateral root angles (low angle is more advantageous) and minimizing root solidity (NWA/CVA) thus minimizing spatial density and therefore the metabolic cost. The beard iRoot type [62] maximizes total root length, the number of lateral root branches, and length distribution (TRLUpper/TRL-Lower) while minimizing total root width, lateral root angle, and root solidity (NWA/CVA) thus maximizing root density. The umbrella category [62] maximizes primary root length, root width, convex area, shallow lateral root angle, and length distribution (root length in the upper 1/3 over root length in the lower 2/3 of the root system). Finally, the maximum iRoot category was created for this study as an effort to identify genotypes that maximize the phenotypic potential without a particular environment in mind. As previously stated, these iRoot categories were created for dimension reduction, to increase of biological relevance and to facilitate comparison with a previous scientific work.
2.6. RSA Trait Correlations. Correlations between 49 plant traits were obtained using Pearson's correlation by implementing cor function in the "stats" package. Traits were grouped using hierarchical clustering using complete linkage with the hclust function. Visualization was performed using the "corrplot" package.

Phenotype-Based Clusters.
Aside from typical trait-based analytical approaches, we explored alternate methods to group and correlate genotypes and their root trait phenotypes. While iRoot traits were picked using a heuristic approach, PBC were generated using unsupervised hierarchical clustering. Genotypes were grouped into eight PBC clusters based on 13 phenotypic root trait values (TRL, PRL, WID, CVA, LRB, VOL, LRA, SOL2, LED, RHZO, TRL_GR, TRLUpper, and Root_weight). A heatmap was created using the "heatmap2" and "dendextend" packages to display the interactions between the genotypic relationships and phenotypic trait values across the 13 root traits. In this manuscript, better performance indicates a higher value of the root trait, with LRA and SOL2 being exceptions. These are exceptions as steep root angle (lower number) and low solidity can be advantageous root traits.

Genotype-Based Clusters.
Genotype-based clusters were formed by grouping the 292 genotypes into eight groups based on hierarchical clustering of the SNP data. These accessions, from the USDA soybean germplasm collection, have been genotyped using the Illumina SoySNP50k iSelect Bead-Chip (Illumina, San Diego, USA), which detected the segregation of 42,509 SNPs [51]. Using preprocessing steps to eliminate SNPs below a minor allele frequency of 0.05 and monomorphic SNPs, 35,448 SNPs were identified and used for subsequent analysis. Principal component analysis (PCA) was performed on SNP data using the prcomp function the "stats" package and graphed using the "ggplot2" package. Nei's genetic similarity was used to construct a pairwise distance matrix using all polymorphic SNPs [63][64][65]. Hierarchical cluster analysis using Ward's minimum variance produced a linkage dendrogram using the "dendextend" and "circlize" packages [66]. The optimal number of SNPbased clusters was determined using the iterative k-means approach in which the procedure successively increases the number of clusters and measures the goodness of fit based on the Bayes Information Criterion (BIC). Eight genotypebased clusters (GBC) were inferred from the inflection point in the BIC curve (Supplementary Figure S1).
2.9. Mean Root Shape and Shape Clusters. Shape-based clusters were formed by grouping the 292 genotypes into eight groups based on the mean root shape of each genotype, with clustering of the root shapes via a CNN algorithm. Mean root shape outline was generated from all the root images at day 9 for each genotype using Elliptical Fourier Transformation (EFT) [67] (Supplementary Figure S2). In brief the steps are as follows: (a) collect all the segmented root images for a genotype, (b) dilate the images with a 50 × 50 kernel, (c) extract the root shape outline, (d) perform EFT on the outline, (e) calculate mean Fourier descriptors for all the roots (n ≤ 14) for a genotype, and (f) reconstruct mean shape outline with N Fourier harmonics. Here, a dilation step (b) was necessary to capture the root shape faithfully by EFT. Additionally, in the reconstruction step (f), we used N = 5 harmonics to capture only the basic shape of the roots. Shape-based clustering was performed on the mean root shape using a CNN and k-means algorithms. In brief, the steps are as follows: (a) fill the mean shape outline and pad the image along the width direction to make it square, (b) use a convolution autoencoder network to convert the images into eight-dimensional (high-level feature) vectors (Supplementary Figure S3), and (c) cluster the roots into eight groups using k-means clustering with Euclidean distance as the distance metric.

Genetic Diversity for Root System Architecture (RSA)
Traits. Descriptive statistics, broad sense heritability and ANOVA analysis of root, shoot, and seed traits are reported ( Table 2). Accessions displayed a range of phenotypic expression across the three imaging days. Representative phenotypic root traits for a particular maturity group, growth habit, or diversity was not identified. Genotypes were a significant source of variation for all but one trait (width-depth ratio (WDR) at 12 d). Large variation was observed for a majority of traits evidenced through comparison of mean, median, and trait ranges for RSA traits. Broad sense heritability (H 2 ) across RSA traits ranged 0.26-0.93 (6 d), 0.14-0.92 (9 d), and 0.04-0.93 (12 d). Minimal differences of the traits were observed among the diverse, elite, and landrace groups. Supplementary Table S1 displays the statistically significant root traits between diversity groupings. PRL, PRA, and DEP of accessions with diverse ancestry, and elite lines were often larger than landraces at 9 d and 12 d based on Tukey's HSD metrics. Additionally, WDR of landraces at 12 d was higher than elite lines which could be attributable to elite lines' increased DEP. No perceptible relationship patterns were detected among genotypes when comparisons were made between maturity groups and growth habits.    Due to the close correlation of many of the root traits, from here on in this paper, we narrow the focus to 13 root traits (TRL, PRL, WID, CVA, LRB, VOL, LRA, SOL2, LED, RHZO, TRL_GR, TRLUpper, and Root_weight) that define the iRoot categories. While some of the 13 root traits showed collinearity, we report on all to ensure comparison with previous literature [7,9,[59][60][61] (Supplementary Table S2). These thirteen root traits show a range of phenotypic expression ( Figure 2). For effective visualization and analysis, mean values of the traits for the top 10 ranked iRoot genotypes were employed as a reference (Figure 2, Table 3).
The highest ranked soybean genotypes in each of the iRoot category were as follows: nutrient foraging-PI 479713, drought tolerant-PI 458506, umbrella type-PI 438139, beard type-PI 430596, and maximum-PI 507487 ( Figure 2). Particular accessions often scored high rankings in two or three different iRoot categories. For example, genotypes PI 507487 and PI 578367 ranked in the top ten soybean lines for three separate iRoot categories including maximum,  nutrient foraging, and umbrella. Soybean genotypes PI 458506 and PI 507491 scored in top ten for maximum, nutrient foraging, and drought tolerant, while PI 89134 placed in top ten for maximum, beard, and drought tolerant categories (Supplementary Figure S5). The maximum and foraging iRoot categories have a substantial overlap (7 of the top 10) of genotypes representing both categories. These results reveal that despite being calculated from different root traits, iRoot categories such as maximum, nutrient foraging, and umbrella displayed similarities due to trait correlations.

Phenotype-Based Clusters.
Aside from typical trait-based analytical approaches, we explored alternate methods to group and correlate genotypes, their genotypic data, root shape, and individual root trait phenotypes. Eight phenotype-based clusters (PBC) were created based on hierarchical clustering of thirteen root traits used to create the iRoot categories. Due to the nature of the analysis and its reporting, PBC were arranged so that high performing genotypes constituted PBC "A" (n = 3) while a decreasing gradient was formed to the lowest performing PBC "H" (n = 4) ( Table 4, see also Supplementary Table S3). The majority of the 292 genotypes fell into PBC "C," "D," and "E" creating a bell-like distribution curve from PBC 1 to PBC 8. One exception to the decreasing gradient was the SOL2 root trait and, to a lesser extent, LRA. Lower angles provide a steeper root angle of attack, and such genotypes can cover a large area with limited root length scavenging soil more efficiently.

Genotype-Based Clusters.
Genotype-based clusters were formed by grouping the 292 genotypes into eight groups based on clustering the SNP data. The first two principal components explain 11.3% and 6.2% of the genetic variation ( Figure 3). Visually, the PCA scaffold was predominately soybean accessions from China, while clusters of accessions representing USA, Japan, and Russia were also evident ( Table 4). To further visualize the relationship between country of origin and GBC, we created a dendrogram representation based on genetic distance and the genetic clustering of the 292 soybean genotypes (Supplementary Figure S6). The USA accessions comprised a majority of GBC "A," Japan in "E," and Korea and Russia in "B." Soybean accessions from China were represented in all eight GBC. GBC "B" and "E" accessions were primarily of determinate growth habit. Mean fixation indices, based on SNP values, were calculated between GBC to display genetic diversity ( Table 5). The ranking of iRoot categories showed that GBC "B," "C," "D," "E," and "H" have representatives in the top 25 representative genotypes of each iRoot categories while GBC "F" has none (Supplementary Table S5).

GBC-PBC Relationship.
To visualize the relationship between PBC and GBC, we created a dendrogram representation containing PBC and GBC information of the 292 soybean genotypes (Figure 4). The distance between the tree's branches is based on genetic relatedness, the branch color represents the GBC of the genotypes, the leaf text denotes the country of origin, and the leaf text color represents the PBC of the genotype. A phenotype-based clustering algorithm created a gradient from high to low root trait values. The first PBC with genotypes having numerically high trait values we labeled "A," while the last PBC with very low trait values we labeled "H." PBC "A" was given a numerical value as "1" with a gradient of reducing root trait values to PBC "H" as "8." These numerical values allowed us to correlate GBC and PBC. We ranked the grouping of each GBC by calculating the mean PBC of the genotypes within the GBC cluster: 1 being high and 8 being low. Genotype-based clusters with a PBC mean closer to 1 correspond to genotypes demonstrating high root trait values, while a mean closer to 8 suggests low trait values. The resulting mean PBC values for each GBC from "A" to "H" are 4.00, 3.72, 3.35, 4.15, 4.12, 6.88, 4.55, and 3.7, respectively (Table 4). Genotype-based clusters with a mean < 4: "B" (3.72), "C" (3.35), and "H" (3.7), are comprised of genotypes with numerically higher trait values. Genotypes comprising GBC "F" and "G" display mean PBC values of 6.88 and 4.55 consisting of lower root trait values.
To correlate between PBC and country of origin, the mean PBC value for each country, from low to high PBC mean, was 3.47 for Russia, 3.65 for Korea, 3.99 for China, 4.17 for USA, 4.29 for Other, and 4.46 for Japan. To correlate between iRoot categories and PBC, we focused on PBC values of only the top 10 genotypes of each iRoot category. The top 10 genotypes of the maximum and drought tolerant categories come from high performing PBC "A," "B," and "C" while the top 10 nutrient foraging came solely from PBC "A" and "B." The top 10 genotypes of the umbrella type representatives are from mid-performing PBC "B," "C," and "D"; and the beard type has 10 representatives in relatively lower performing PBC "C," "D," and "E."  Table  4: Summary data for genotype-, phenotype-, and shape-based clusters. Summary in respect to country of origin, maturity group (MG), growth habit, diversity, root traits (TRL, PRL, WID, CVA, LRB, VOL, LRA, SOL2, LED, RHZO, TRL_GR, TRLUpper, and Root_weight) at 9 days after germination, iRoot category rankings, and phenotype-based cluster mean value. iRoot mean rank is calculated from the scores of all the genotypes within the specific GBC, PBC, or shape-based cluster. PBC mean was calculated by averaging the PBC groupings of genotypes that comprised the cluster.    A two-way clustering heatmap was used to further illustrate the comparison of genotype and phenotype performance ( Figure 5). The dendrogram tree on the left-hand side groups the 292 genotypes based on genetic relatedness with GBC identified by color. The first branch division of the tree indicates that GBC "G" and "H" are genetically distinct from the other 6 GBC. Genotypes from these to clusters have also contained lower root trait values (Table 4). On the red-black heatmap to the right of the dendrogram, color is based on iRoot category with black signifying high (closer to 1) and red signifying low (closer to 292) ranking for the iRoot category. Genetically distinct GBC "G" and "H" show poor ranking (red) across most of the iRoot categories with exception of the beard type. GBC "C" displays high root trait values (black) for three iRoot categories. Other GBC such as "A," "B," and "E" are definitive with branches within each cluster containing high values with others low. A second dendrogram above the heatmap connects phenotypic traits based on hierarchical relatedness of their numerical values. Here, the iRoot categories of umbrella, maximum, and nutrient foraging form a distinct group. The largest grouping of root traits consisted of WID, CVA, TRLUpper, TRL_GR, TRL, and RHZO; and this group remained consistent at 6 d, 9 d, and 12 d while the remaining 7 traits did not form tight groups (Supplementary Figure S7).
The heatmap on the right-side is based on phenotypic performance of 13 root traits with blue being high and orange being low trait values. GBC "F" and "G" were generally grouped together as identified by the red (left) and orange (right) shading denoting low value iRoot category performance. Conversely, GBC "B," "C," and "H" perform well with black (left) and blue (right) shading. Further examination within each GBC indicates that particular subbranches within each cluster often show higher values than others which is evident for iRoot categories and individual traits. For example, this is evident in the lowest-most subbranch of the large GBC "B." This subbranch contains 19 genotypes which display high values for umbrella, maximum, nutrient foraging, and drought-tolerant iRoot types as well as for many root traits.
To further explore the relationship between genotypic data, phenotypic data, and iRoot categories, iRoot category ranking (for each genotype from 1 to 292) was averaged for each GBC (Table 4). The results reflect whether certain  genotypic clusters display iRoot features. In step with earlier results, GBC "C" produced the highest average ranking in umbrella, nutrient foraging, and maximum iRoot types as well as having the highest overall average ranking (116 of 292) ( Table 4). GBC "F" produced the worst ranking in nutrient foraging, drought tolerant, umbrella, and maximum iRoot types as well as the worst overall average ranking (239 of 292). Ranking results for GBC "D" displayed a high ranking for the beard iRoot type (68 of 292). Interestingly, the average iRoot scores of the 19 genotypes that make up a small, aforementioned genetically divergent subbranch of GBC "B" are 89, 104, 66, 95, and 164 out of a possible 292 for maximum, drought tolerant, nutrient foraging, umbrella, and beard iRoot types, respectively. These results suggest that these 19 genetically related accessions display many desirable phenotypic attributes in this experimental environment. Genotypes from Russia and Korea produced higher than average iRoot rankings (121 of 292) each in relation to China = 147, Other = 148, USA = 159, and Japan = 164. Finally, maximum root type iRoot rank results were plotted on the genomic data that produced PCA plot for visualization (Figure 3(b)). The presence of plane separation between red (low root trait values) and blue (high root trait values) data points displays the evidence of the correspondence between the performance of soybean accessions and genotypic data.
Differences appear when comparing genotype-and phenotype-based clustering methods across 13 root traits in which GBC "C" performed well and GBC "F" performed poorly (Figures 6(a) and 6(b)). Genotypic-based clusters "B," "E," and "H" display consistency in relation to the other clusters. The trend lines from 6 d to 12 d show that GBC "F" and "G" consistently have lower trait value performance (Supplementary Figure S8). GBC "A" performed similarly to GBC "B," "C," and "E" for the majority of root traits with the exception of PRL and DEP. PRL and DEP have a higher trait value compared to the mean at 6 d, 9 d, and 12 d. The LRB trait has 1.1% more lateral root branches than the mean on 12 d (Supplementary Table S4). In GBC "A," with the exception of LG05-4464, the 23 genotypes from the USA displayed relatively similar trait values to one another (data not shown).
LG05-4464, a line with diverse ancestry, ranked first of 23, in TRL and TRArea measurements as well as WID, and thus outpaced the others within the USA-dominated GBC "A." 3.6. Shape-Based Clusters. Shape-based clustering into eight groups was performed on 9 d roots. Mean values of shapebased clusters (SBC) labeled "A," "B," "C," and "D" display high values across the 13 root traits while SBC "G" and "H" display low values (Figure 6(c)). SBC "A" has high values in the 13 traits used in iRoot estimation while SBC "H" has generally low values (Table 4). Cross-referencing SBC "A" to GBC, individuals were derived from higher trait valued GBC including GBC "B" (n = 17), "H" (n = 6), "C" (n = 4), "D" (n = 4), and "E" (n = 4). GBC "F" contains eight genotypes which all correlate to low trait valued shape clusters, four being in SBC "G" and four being in SBC "H". SBC "A" has a higher iRoot ranking than other SBCs for maximum, drought tolerant, nutrient foraging, and umbrella types while SBC "H" has the highest ranking for beard type (  Figure 5: Heatmap displaying relationships between genotypes and phenotypes. Correlations between genotype (y-axis) and phenotype (x -axis) for 9 d after germination using data from 292 soybean accession studies for 13 root traits (TRL, PRL, WID, CVA, LRB, VOL, LRA, SOL2, LED, RHZO, TRL_GR, TRLUpper, and Root_weight) and 5 iRoot categories (nutrient foraging, drought tolerant, umbrella, beard, and maximum) (replications per accession = 14). The dendrogram on the y-axis was developed using hierarchical clustering using SNP data; the x-axis displays iRoot categories and 13 root traits that comprise iRoot categories.

Mean Root Shape
Outline of the iRoots. Mean root shape profiles of the top 10 ranked genotypes of each iRoot at 9d were generated using five harmonics of Elliptic Fourier descriptors as described in Materials and Methods (Figure 7). The shape profiles were created solely from images of the top 10 ranked genotypes of each iRoot category. Initial impression suggests the outlines look similar; however, each profile is subtly distinct due to a combination of shape, size, and structure. The nutrient foraging and maximum representations display a similar shape while the beard shape is visually distinct from the other categories. The droughttolerant based shape shows a reduced upper to lower width ratio compared to maximum, umbrella, and nutrient foraging which were less distinct from each other.

Discussion
Over 12,000 images were collected over three time points throughout the course of this experiment. ARIA 2.0-based trait extraction provided over 500,000 data points. The best linear unbiased predictors (BLUPs) were calculated to act as a weighted genotypic mean for all root system architecture traits as well as seed, root, and shoot dry weights. Broad sense heritability of root traits was equal or above similar previous studies [7,59,61] with 15 traits being above 0.9 and 27 being above 0.8. These heritability results could be reflective of the number of replications (14) Figure S9). Strong correlations between root traits concurred with an a priori hypothesis that Root_weight correlated with long TRL and increased LRB, CVA, VOL, and WDR. The root traits providing metrics for length and width generally clustered together, while those for weight, volume, and root number congregated together. Additionally, correlations between traits exhibited the highest values (measured using cumulative correlation intensity) at 6 d and lowest at 12 d. This effect could be the result of lower trait variability at earlier growth stages.
This study is aimed at connecting genomic and phenomic information to elucidate genetic diversity present for RSA traits. Using genomic distance approaches, genotypes were grouped into eight clusters for analysis using SNP data (GBC) and eight clusters using the phenotypic data (PBC) using data from 13 root traits. We explored the trait expression in depth to determine the extent of genetic diversity as it is important for breeding applications. Selection efforts in the last century have only indirectly targeted root architecture traits, while the primary efforts have been to select for seed yield under the influence of agronomic and management practices including plant population density, fertilizer application, water availability/irrigation, and soil types. Implicit assumptions are made that above ground trait variability and expression are mirrored in below ground RSA traits. This knowledge could be leveraged in the future as studies have shown a positive relationship between a common bean leaf surface area and a root surface area [69], allowing for indirect selection of root through shoot assessment. Our results show that GBC "A", composed primarily of US accessions, did not have full expression of phenotypic diversity for RSA traits. GBC "A" at 9 d displayed average root trait expression with the exception of above average PRL and DEP (two correlated traits; at 9 d; r = 0:996) suggesting US cultivars exhibit increased depth suggesting potential drought tolerance characteristics. However, GBC "A" did not display differences in LRA or TRL_GR, with other traits reported for positive drought response. A field study is needed to ascertain if the US germplasm accessions have been indirectly selected for long PRL (reported in this paper using controlled environment conditions), which has been linked to increased drought tolerance [33,34], and for LRA and other surrogates of drought tolerance response. Additionally, high root trait values such as TRL have not been correlated to increased seed yield in soybean. While some US accessions were cultivars, the lack of RSA trait diversity presents opportunities to further improve the genetic potential of soybean cultivars. While our interest is predominantly focused on studying the US accessions, these approaches and results can be useful for breeders and researchers worldwide to understand the full complement of genetic and trait diversity in their programs and targeted regions. Strong performing accessions can be identified individually or as a cluster using genomic population structure to direct further exploration using genomic selection with a multivariate approach bridging genomics and phenomics data. For example, subbranches within genomic-based clusters contained many highly related genotypes displaying high root trait values. The near-exhaustive analyses we performed did not uncover significant correlations to geographic origin, climatic zone, and root phenotype. Our scope of inference is limited to 6, 9, and 12 days after germination, and additional studies are needed to confirm if this lack of relationship is maintained at later growth stages. The iRoot categories were generated by leveraging phenomics data to identify specific trait measurements and statistical analyses to quantify differing root shapes. While these iRoot categories have been reported in different crops, we were interested in integrating information from multiple crops to study and explain the root trait diversity in soybean. The iRoot categories were based on a previous work of the root scientific community including nutrient foraging [36], drought tolerant [43], beard [62], and umbrella [62]. The maximum iRoot category was developed to identify the greatest root growth potential without a particular environment in mind. The umbrella iRoot category was based on Liao et al.'s study, who use the common bean as an archetype of umbrella shape describing it as P-foraging noting that "basal roots tend to be shallow in the phosphorus-rich topsoil and tap roots tend to be deep for water in the subsoil." The nutrient foraging iRoot category, similar to the umbrella type, was created to capture a root phenotype that could optimize nutrient acquisition in low-fertility soils [36]. The nutrient foraging iRoot is composed of a wide root system with a high ratio for total root length in the upper 1/3 of the root system as well as a fast growth rate. The beard shape iRoot was noted as the ideal rice root type, "moderately dispersed yet uniformly distributed adventitious and lateral roots so as to keep most roots in the topsoil for phosphorus and a few roots in the subsoil for water" [59]. The drought tolerant iRoot category was developed to "chase the water"; in other words, fast growing, steep, deep roots provide yield security during drought [36,49]. Potentially, soybean genotypes with a dominant, rapidly elongating taproot could lead to a deeper root system and enhanced water acquisition. Uga et al. report that steeper root angles in rice have also been correlated to higher yield in drought environments [70]. Our observation of the correlation between shallow LRA and TRL growth rate (r = 0:28) suggests that, genetically, roots may have a predisposition to both traits and require further testing in field tests. Lab-based root angles have been shown to correlate to drought tolerance by other studies [37,71] including Rellán-Álvarez et al. who noted that water-deficient Arabidopsis roots grow at a steeper angle in soil-filled rhizotrons compared to the well-watered treatment and serve as an optimal starting point for larger scale genetic studies [9]. The maximum iRoot, which successfully correlated phenotypic root traits with genotypic based population structure visualized in Figure 3(b), suggests that genotypic information can predict certain population groups which may have potential use in breeding. The next step in this root trait research is to develop and perform a large-scale study to correlate controlled environment and field-based results. Our preliminary investigations of top ranked iRoot genotypes in a field environment experiment display visual similarities worth further investigation (Figure 8). Additional experiments are needed to verify the seed yield performance associated with different iRoot categories.
We combined mathematical functions Elliptical Fourier Transformation (EFT) and machine learning (ML) approaches on image data to generate shape profiles and shaped-based clusters, which have previously not been reported for root-related trait studies. The EFT approach is advantageous to explore root shape diversity allowing for systematic root outline analysis while maintaining the integrity of the shape. In our efforts, we generated mean shape profiles of the iRoot categories and the profiles qualitatively capture the expected difference among the categories. The shapebased clusters removed the human annotation steps and helped to segregate strong performing and weak performing genotypes into different shape-based clusters. SBC "A" contained genotypes with high mean trait values that were grouped into high performing PBCs. Additionally, the SBC "A" was also the highest ranking for 4 of 5 iRoot categories. Genotypes grouped within SBC "G" and "H" performed poorly for trait values, PBC, and iRoot categories. These results suggest that high performing genotypes can be identified solely from their root shape and present an attractive application for phenomics in breeding and research. However, this approach requires further validation for comparison with state-of-the-art shape profile generation and applicability to field performance through plant breeding efforts.
We propose that RSA trait research for practical breeding outcomes will benefit from further studies in high throughput phenotyping systems that can do the following: (a) connect artificial and field environment studies, (b) make correlations between easily assessed and difficultly assessed traits to determine optimal balance of traits to focus on, (c) understand the physiology behind drivers for yield using large plant populations in specific and diverse environments, and (d) integrate genomics and phenomics pipeline for breeding decision-making. Root trait research requires infusion of analytics, such as leveraging advanced sensors coupled with computer vision and ML-based feature extraction methods [72][73][74][75][76][77]. Finally, robotics-assisted phenotyping is transforming above ground trait studies [78], and there is a need for robotics-based phenotyping solution for end-to-end phenotyping platforms and root excavation without information loss. Unlike above ground traits, root systems still do not have well-described growth stages and are often described by corresponding length or depth which lacks the inclusion of development stages [79,80]. An understanding of root growth stages and developmental process stages and processes is integral to adapting root development into mathematical growth models, which will help breeders develop more efficient crops. These abovementioned solutions will help integrate multitrait objective functions [81] for above and below ground trait selections for furthering genetic gains. There is a need to deploy RSA traits in prescriptive cultivar development [82] and continue to explore and identify trait predictors for phenomics-assisted breeding [83].

Summary
In this study, we explored informative root categories (iRoot), built on a previous literature in different crops, leveraging data to identify specific trait measurements and statistical analyses to quantify iconic root shapes. Results demonstrate that superior root trait values and root shape correlate to specific genomic clusters. In addition, USderived genotypes have long primary roots but fail to show further root trait values indicating room for improvement of the RSA of US germplasm. However, other groupings of genotypically related accessions did show high root trait values. Our study demonstrates the relevance of ML and computer vision-based software for the study of RSA traits. These tools can be useful for discovering and characterizing new traits and advancing time series-based studies on the growth and development of root systems. While we now can correlate root values and shape to genomic clusters, there is a need to connect controlled environment studies to fieldbased studies to improve methods of data collection. There is a large inventory of genetic variation among the world's soybean germplasm collection which provides the base for future crop improvement for RSA traits. After centuries of indirect selection for RSA, there is a pressing need to harness and implement quality soybean RSA diversity in cultivar development programs. Building upon the correlations of root phenotype and shape to genomic regions with improved phenotyping and ML techniques, we have captured the diversity of RSA available within the soybean germplasm core collection and can now leverage our results to develop improved soybeans using traditional and phenomics integrated plant breeding approaches.

Data Availability
The ARIA 2.0 code is freely available at the address: https:// bitbucket.org/baskargroup/aria2/src/master/. Analysis code is freely available at the address: https://github.com/ mighster/ARIA2.0. Root extracted data, raw images and/or segmented masks are available upon request.