Introduction
Soybean (Glycine max (L.) Merr.) is one of the world's major legume and oil crops in terms of total production and trade (Chen and Nelson, 2005; Hymowitz and Shurtleff, 2005; FAO, 2020). However, the genetic improvement of this essential crop has been challenged by its extremely narrow genetic base (Gizlice et al., 1993, 1994, 1996; Salado-Navarro et al., 1993; Sneller, 1994; Singh and Hymowitz, 1999; Cornelious and Sneller, 2002). To sustain its genetic diversity, over 170,000 soybean accessions have been conserved across 17 countries globally; the Chinese National Crop Genebank maintains 31,575 accessions (Qiu et al., 2011) and the soybean genetic resource centre of the United States Department of Agriculture (USDA) maintains over 18,000 accessions (Soybase, 2020). These germplasm collections have made significant contributions to production and breeding programmes, since they possess several unique genes that can be utilized for the genetic improvement of the crop (Qiu et al., 2011; Soybase, 2020). Populations derived from the genetic recombination of biparental crosses of diverse parents might be vital sources of higher genetic variability (Helms et al., 1997; Kisha et al., 1997). Some reports show that the contributions of improved varieties (e.g. elite × elite crosses) and elite breeding lines developed by breeding programmes are rising relative to those of germplasm collections and landraces.
For instance, in China, elite breeding lines from crosses and cultivars contributed to 36% of the soybean varieties released in the 1950s and 86% of those released between 1993 and 2004, relative to landraces, traditional varieties and wild relatives (Qiu et al., 2011).
Progress in genetic improvement of any crop depends on the presence of genetic diversity within the populations under selection. Knowledge of the genetic diversity of crops is very important for designing strategies to establish core collections and enhance utilization of the germplasm by breeding programmes. Soybean production is gaining importance in several sub-Saharan African (SSA) countries, such as Nigeria, Ghana, Uganda, Ethiopia, Zambia and Malawi (FAO, 2020), where the crop is cultivated across a wide range of agro-ecological conditions. The soybean improvement programme at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, holds a soybean collection of more than 1800 accessions, including high-yielding cultivars, but lacks information on the extent of their genetic diversity. These accessions have been phenotypically screened for priority traits that include disease resistance, pod shattering tolerance, early and medium maturity, efficient natural nodulation, lodging tolerance and yield improvement (Tefera et al., 2009; Chigeza et al., 2019). These authors also reported that, following the screening efforts, more than 100 IITA-bred soybean varieties were released by the National Agricultural Research Systems across several countries in SSA.
Numerous studies have been carried out to determine the degree of genetic variation of varieties and breeding lines of soybean and their relatedness (Keim et al., 1992; Gizlice et al., 1994; Sneller, 1994; Sneller et al., 1997; Kisha et al., 1998; Nelson et al., 1998). Due to its commercial value, soybean has been the subject of advanced genomic studies by the private and public sectors. A number of molecular genomic resources, including single-nucleotide polymorphisms (SNPs), are available in public databases such as Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html), SoyKB (http://soykb.org/) and Soybase (https://soybase.org/). Despite the abundance of genomic resources, no molecular characterization has been performed on the soybean breeding lines at IITA. Therefore, the objective of this study was to assess the extent of genetic diversity of breeding lines and varieties developed by the IITA soybean breeding programme.
Materials and methods
Plant material and DNA isolation
A total of 65 soybean genotypes (17 released varieties and 48 elite breeding lines) from IITA's soybean breeding programme were used in this study (Table 1). Although most of these varieties were released in Nigeria, some lines were released in Ghana, Benin, Togo, Democratic Republic of the Congo, Uganda and Ethiopia between 1989 and 2011. All genotypes in the TGx 1987 series are rust-resistant, as they were developed from crosses with the rust-resistant donor parent UG-5, and they also carry other desirable traits. TGx 1835-10E is a variety released in Nigeria for early maturity and rust resistance. Soy104 and UG-5 are unimproved but are sources of rust resistance genes. TGx 1440-1E and TGx 1448-2E are suitable as trap varieties for depleting the seed bank of Striga hermonthica.
a Res, resistant; Susc, susceptible.
DNA extraction and genotyping
For SNP genotyping, the 65 soybean genotypes were planted and grown as seedlings for 3 weeks, after which fresh young leaves were harvested and bulked from all 10 seedlings per genotype and ground into a fine powder using liquid nitrogen. The genomic DNA of each plant sample was extracted using a miniprep Dellaporta extraction protocol (Dellaporta et al., 1983). The quality of the extracted DNA samples was checked on a 1% agarose gel prepared in 150 ml of 1× TBE buffer, and the DNA was quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies).
GoldenGate assay-based SNP genotyping
The genomic DNA extracted from each of the 65 soybean genotypes was sent to the Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center (SGIL-BARC), Beltsville, Maryland. DNA quality was checked again at SGIL-BARC, and the samples were immediately used for the GoldenGate assay. The GoldenGate assay was performed as per the procedure described by Hyten et al. (2008).
Statistical analyses
Genetic diversity indices
SNP markers with a minor allele frequency (MAF) less than 0.05 were filtered out, resulting in 1223 informative loci used in the analyses. The genetic properties of SNPs, such as MAF, polymorphic information content (PIC) and percentage of polymorphic loci (% P) were calculated to quantify the genetic diversity within and among the 65 soybean genotypes. In addition, genetic diversity indices, such as total number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), number of private alleles, gene diversity (He), observed heterozygosity (Ho) and number of loci with private alleles were computed using PowerMarker (Liu and Muse, 2005) and GenAlEx version 6.41 (Peakall and Smouse, 2012) software.
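The per-locus indices listed above can be illustrated with a minimal Python sketch. It is not the PowerMarker or GenAlEx implementation; it assumes bi-allelic SNPs coded as 0, 1 or 2 copies of the alternate allele (None for a missing call), uses the standard uncorrected formulas, and the function names `locus_stats` and `filter_by_maf` are hypothetical.

```python
import math


def locus_stats(genotypes):
    """Summarise one bi-allelic SNP locus.

    `genotypes` holds one value per individual: 0, 1 or 2 copies of the
    alternate allele, or None for a missing call (assumed coding).
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        raise ValueError("no genotype calls at this locus")
    n = len(calls)
    p = sum(calls) / (2 * n)           # alternate-allele frequency
    q = 1.0 - p                        # reference-allele frequency
    homozygosity = p * p + q * q       # sum of squared allele frequencies
    he = 1.0 - homozygosity            # gene diversity (expected heterozygosity)
    return {
        "maf": min(p, q),                               # minor allele frequency
        "he": he,
        "ho": sum(1 for g in calls if g == 1) / n,      # observed heterozygosity
        "pic": he - 2.0 * p * p * q * q,                # PIC for two alleles
        "ne": 1.0 / homozygosity,                       # effective number of alleles
        "shannon": -sum(f * math.log(f) for f in (p, q) if f > 0),
    }


def filter_by_maf(loci, threshold=0.05):
    """Keep only loci whose MAF is at least `threshold`."""
    return {name: calls for name, calls in loci.items()
            if locus_stats(calls)["maf"] >= threshold}
```

For example, `filter_by_maf({"snp_a": [0, 1, 2, 1], "snp_b": [0] * 39 + [1]})` keeps only the hypothetical locus `snp_a`, because `snp_b` has a MAF of 0.0125.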
Population structure
The inherent population structure within the genotypes was characterized, based on the common attributes of the genotypes, using three complementary clustering approaches. In the first, a distance-based hierarchical clustering analysis, a pairwise genetic distance (identity-by-state, IBS) matrix was calculated among all individuals using PLINK 5 (Purcell et al., 2007). A hierarchical cluster dendrogram was then built from the IBS matrix with Ward's minimum variance method, using the Analyses of Phylogenetics and Evolution (ape) package (Paradis et al., 2004) implemented in R (R Core Team, 2015). The second approach was a model-based maximum likelihood estimation of ancestral subpopulations using ADMIXTURE (Alexander et al., 2009). ADMIXTURE assumes that the loci are in linkage equilibrium and that the ancestral populations are in Hardy–Weinberg equilibrium (Frichot et al., 2014). In the ADMIXTURE analysis, the number of subpopulations (K) varied from 2 to 12, and the value of K exhibiting a low cross-validation error was selected (Alexander and Lange, 2011). The third approach was an assumption-free discriminant analysis of principal components (DAPC), implemented in R using the ‘adegenet’ package (Jombart, 2008). In DAPC, the SNP data were first transformed by principal component analysis, and k-means clustering was then used to identify and describe clusters and subgroups of genetically related individuals. The model best supported by the Bayesian information criterion (BIC) was selected. The most appropriate K was selected based on the results of the hierarchical clustering and the ADMIXTURE analysis.
The membership probabilities of each genotype for the different groups were obtained, and results of the three complementary approaches (the hierarchical tree/dendrogram, ADMIXTURE and DAPC) were compared.
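The pairwise IBS distance that underlies the first clustering approach can be sketched as follows. This is an illustrative pure-Python version, not the PLINK implementation; it assumes bi-allelic genotypes coded 0/1/2 (None for missing calls), and the function names and line labels are hypothetical. The resulting matrix would then be passed to a hierarchical clustering routine.

```python
def ibs_distance(g1, g2):
    """Return 1 minus the proportion of alleles shared identical-by-state.

    At each locus two individuals share 2 - |a - b| alleles out of 2;
    loci with a missing call in either individual are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)
        compared += 2
    if compared == 0:
        raise ValueError("no loci genotyped in both individuals")
    return 1.0 - shared / compared


def ibs_distance_matrix(samples):
    """Build a symmetric distance matrix from {name: genotype list}."""
    names = sorted(samples)
    matrix = [[ibs_distance(samples[a], samples[b]) for b in names]
              for a in names]
    return names, matrix
```

Two lines with identical genotypes have a distance of 0, and two lines that are homozygous for opposite alleles at every locus have a distance of 1.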
Based on the number of clusters inferred from the three complementary approaches, analysis of molecular variance (AMOVA) was computed to estimate molecular variation within and among the genotypes using GenAlEx 6.41 and PowerMarker V3.25 software. The extent of genetic variance explained by population structure was derived from the AMOVA, fixation index (FST) and standardized FST (F′ST) based on Wright's F-statistics (Wright, 1978).
Results
Genetic diversity
The MAF ranged from 0.02 to 0.50, with an average of 0.23. The highest PIC value of 0.38 was recorded for the markers BARC-064873-18956, BARC-051149-11016, BARC-029669-06297 and BARC-019787-04375; the lowest value was 0.02 and the average was 0.25. The percentage of polymorphic loci recorded in this study (85%) was high, indicating that the SNP markers used were efficient in detecting polymorphism (Fig. 1). Across all soybean samples, the average effective number of alleles, gene diversity, observed heterozygosity and Shannon's information index were 1.53, 0.31, 0.19 and 0.25, respectively (Table 2). These values were considered adequate for differentiating the studied soybean genotypes.
The presence of private alleles was considered as an additional factor to differentiate the population. A total of 108 private alleles were identified among the 65 soybean genotypes. When the genotypes were grouped by the clusters identified, the number of private alleles detected in cluster 2 was about 3–9-fold higher than in the other two clusters.
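Private alleles are those observed in only one group. A minimal sketch of the count per cluster is given below; it assumes that the alleles observed in each cluster have already been collected as (locus, allele) pairs, and the function name and the toy labels in the usage note are hypothetical rather than data from this study.

```python
def private_alleles(cluster_alleles):
    """Return, per cluster, the alleles found in no other cluster.

    `cluster_alleles` maps a cluster name to the set of (locus, allele)
    pairs observed in at least one genotype of that cluster.
    """
    private = {}
    for name, alleles in cluster_alleles.items():
        # Pool every allele seen in the remaining clusters.
        others = set().union(
            *(a for other, a in cluster_alleles.items() if other != name)
        )
        private[name] = alleles - others
    return private
```

With three toy clusters that all carry allele A at `snp_1`, while only `cluster_2` also carries allele T at that locus, `private_alleles` reports `("snp_1", "T")` as private to `cluster_2`.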
Genetic relatedness
All three complementary methods used to determine the number of clusters among the 65 soybean genotypes showed the presence of three major clusters with few sub-clusters (Fig. 2). Cluster 1 consisted of 19 genotypes, most of which were derived from crosses with Uganda's UG-5, and included the genotype from the USA (SOY104); all are rust-resistant except TGx 536-02D, which has been released in Nigeria, Benin and Ghana since 1985 and is susceptible to rust. Cluster 2 was the largest, with 26 genotypes, all of which are susceptible to leaf rust. Some genotypes in this cluster (i.e. TGx 1440-1E, TGx 1448-2E, TGx 1937-1F, TGx 1908-8F and TGx 1910-14F) are early flowering, resistant to lodging and pod shattering, and have good nodulation ability. These genotypes were released in Nigeria, Cameroon, Togo, Mozambique, Ghana, Benin, Côte d'Ivoire, Kenya and Malawi. Cluster 3 had 20 genotypes, of which only two (TGx 1961-1F and TGx 1835-10E) were resistant to leaf rust; the latter has been released in Nigeria, Uganda, Kenya and Cameroon since 2008. The other genotypes in this cluster are susceptible to soybean leaf rust but possess other desirable agronomic traits. Although the clusters are discrete and well separated from one another, each is reasonably heterogeneous in terms of the genotypes' attributes. Genotypes with the same pedigree clustered together, which indicates that the SNP markers were efficient in grouping genetically related genotypes. Very little intermixing of rust-resistant and rust-susceptible genotypes was observed across the clusters (Fig. 2). The DAPC method, using discriminant functions (Fig. 3), maximized the diversity between the three clusters while minimizing the diversity within clusters. The first three principal components cumulatively explained 31.95% of the variation.
The three genetically distinct groups identified using DAPC were consistent with the groups identified by the hierarchical cluster dendrogram and ADMIXTURE. Both the cross-validation error from ADMIXTURE and the BIC from DAPC declined rapidly from K = 1 to K = 3, indicating that the samples can be grouped into three major clusters. The results of the three approaches were consistent and showed good correspondence, suggesting that the population structure of the samples had been reliably identified.
The extent of genetic variation among the genotypes was further revealed by AMOVA (Table 3). Most of the variation (71%; P < 0.001) occurred among the individual genotypes, while 11% (P < 0.001) of the total variation was ascribed to differences among the three clusters detected by the three complementary approaches used to determine the population structure. The remaining 18% (P < 0.001) was accounted for by variation within the 65 genotypes. The estimated fixation index (FST) was 0.11 (P < 0.001), indicating moderate genetic differentiation among the clusters.
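The link between the AMOVA partition and the fixation index can be made explicit with a short sketch. It assumes, as an interpretation of the reported percentages rather than a statement from the analysis itself, a three-level partition (among clusters, among individuals within clusters, within individuals) and the usual ratios of variance components; the function name `f_statistics` is hypothetical.

```python
def f_statistics(among_clusters, among_individuals, within_individuals):
    """Derive F-statistics from three AMOVA variance components.

    The components may be given as absolute variances or as percentages
    of the total; only their ratios matter.
    """
    total = among_clusters + among_individuals + within_individuals
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    within_clusters = among_individuals + within_individuals
    return {
        # Share of the total variance that lies among clusters.
        "FST": among_clusters / total,
        # Share of the within-cluster variance that lies among individuals.
        "FIS": (among_individuals / within_clusters
                if within_clusters > 0 else float("nan")),
        # Share of the total variance that is not within individuals.
        "FIT": (among_clusters + among_individuals) / total,
    }
```

Under these assumptions, `f_statistics(11, 71, 18)["FST"]` evaluates to 0.11, matching the reported fixation index.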
***Significant at the 0.001 probability level.
Discussion
Genetic characterization information is vital for designing future hybridization plans in the IITA soybean breeding programme and for the partners who receive IITA's elite soybean lines for use in national soybean improvement programmes across Africa and beyond. In this study, the estimates of genetic diversity and population structure of the studied soybean genotypes proved informative, and the markers showed good discriminatory power.
The number of alleles per locus measures genetic variation at the gene level. In a population of a self-fertilizing species, such as soybean, lower allelic diversity and heterozygosity are commonly expected (Wright, 1921). Accordingly, the average PIC, MAF, Shannon's information index, gene diversity and observed heterozygosity were all <0.5 in this study. The average PIC value of 0.25 was moderately informative and implies that the SNP markers have differentiating power, since PIC cannot exceed 0.50 in bi-allelic markers (Singh et al., 2013). Comparable average PIC values were reported in ryegrass (Roldàn-Ruiz et al., 2000), soybean (Chen et al., 2017) and wheat (Eltaher et al., 2018). The high percentage of polymorphic loci and the other genetic diversity indices measured also indicate the existence of variability among the soybean genotypes. The occurrence of private alleles among the genotypes indicates that the germplasm contains diverse, unique and potentially favourable alleles, yet to be exploited, that may contribute positively to soybean breeding. Thus, these observations imply the presence of diversity within the genotypes and demonstrate that the selected markers were informative and useful for further soybean genetic diversity studies.
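The ceiling on PIC for bi-allelic markers can be checked numerically. The sketch below uses the standard two-allele forms, gene diversity 2pq and PIC = 2pq − 2p²q², and scans the minor allele frequency from 0 to 0.5; the function names are hypothetical. Under these formulas gene diversity peaks at 0.5 and PIC at 0.375, both at p = 0.5, which agrees with the highest PIC of 0.38 recorded in this study.

```python
def gene_diversity(p):
    """Expected heterozygosity of a bi-allelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pic_biallelic(p):
    """Polymorphic information content of a bi-allelic locus."""
    q = 1.0 - p
    return 2.0 * p * q - 2.0 * p * p * q * q


# Scan minor allele frequencies from 0 to 0.5 in steps of 0.001.
grid = [i / 1000 for i in range(0, 501)]
best_maf = max(grid, key=pic_biallelic)
max_pic = pic_biallelic(best_maf)
```

Because PIC rises monotonically with MAF over this range, a marker's PIC can be read as a direct function of how balanced its two alleles are.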
For germplasm characterization, earlier reports concluded that large numbers of SNPs would be required to replace the highly polymorphic SSRs in diversity and relatedness studies (Hamblin et al., 2007; Semagn et al., 2014). The average genetic distance (similarity) among a set of genotypes measures genetic diversity at the population level (Lu and Bernardo, 2001). The three diverse but complementary clustering analyses employed in this study differentiated the 65 genotypes from each other and assigned them to three different groups, which indicates substantial genetic diversity. The genotypes in each cluster share common features, which largely correspond to their pedigree and agronomic traits. TGx 1835-10E, released for leaf rust resistance (Table 1), was found in the cluster comprising genotypes with efficient natural nodulation, pod shattering resistance, medium maturity and high yield. The two Striga trap varieties, TGx 1440-1E and TGx 1448-2E, which are known to deplete the seed bank of S. hermonthica through suicidal germination, were found in the same cluster as other genotypes that are early maturing, resistant to lodging and high yielding. All the genotypes derived from crosses made to UG-5, a rust-resistant parent, were found in the same cluster because all of them have rust resistance features (Table 1), and hence, according to Tantasawat et al. (2011), share genetic homology. Some correspondence between the clustering pattern and the pedigree of the soybean genotypes was observed. These results indicate that the different pedigrees of the soybean genotypes played an essential role in maintaining genetic variation, as genotypes with similar pedigrees were clustered together by the SNP markers. Lee et al. (2008) suggested that soybean genotypes originating from different genetic backgrounds could have important genetic differences. The markers successfully differentiated the studied germplasm, and the genetic distance observed within the genotypes indicates that good recombination to produce superior progenies can be achieved from crosses between genetically dissimilar genotypes (Narvel et al., 2000).
Assessment of genetic relatedness among parental lines will help breeders identify the most diverse cross combinations useful to enhance genetic gain from the population. Therefore, knowledge of the relationships among the IITA soybean breeding lines can facilitate the selection of diverse parental lines carrying priority traits for recombination.
The 71% variation among individual genotypes is consistent with the expectation that populations of self-fertilizing species show high differentiation. The FST value in this study was 0.11, which may be regarded as moderate according to Wright's qualitative guidelines (Wright, 1978). These observations indicate the presence of diversity between the genotypes and demonstrate the high informativeness and usefulness of the selected markers for future soybean genetic diversity studies.
SNPs have emerged as powerful tools for many genetic applications. Their unique features include abundance in the genome and the ability to reveal polymorphism at the single-base level. We aimed to optimize the use of the SNP markers by providing relevant information on their discriminatory power in characterizing the IITA soybean germplasm. The markers used were powerful for detecting genetic diversity among and within the soybean populations. We suggest using this validated set of SNP markers, as they span the whole genome and provided a biologically sound classification of the genotypes. The results of this study will help to conserve, utilize and manage IITA's soybean germplasm effectively. Determination of the extent of genetic variability present in this germplasm will furnish soybean breeders with the required information and decision-making tools for effective parental selection in their breeding programmes.
Acknowledgements
This research was supported by the core fund provided by the International Institute of Tropical Agriculture (IITA), and support from Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center (SGIL-BARC), Beltsville, Maryland.
Conflict of interest
None.