Introduction
Soybean (Glycine max (L.) Merr.) is an important legume crop that provides oil and protein for human and animal consumption, and its oil is a major source for biodiesel production (Pimentel and Patzek, 2005; Singh et al., 2007). Most elite soybean cultivars were developed from a narrow genetic base derived from a limited number of ancestral lines. Wild soybean (Glycine soja Sieb. and Zucc.) is presumed to be the undomesticated progenitor and closest relative of G. max. Therefore, the diverse accessions of G. soja are expected to be a valuable genetic resource for soybean crop improvement. Recently, whole-genome sequencing of G. max var. Williams 82 and of wild soybean (G. soja var. IT182932) was completed in the USA and Korea, respectively (Kim et al., 2010; Schmutz et al., 2010). The consensus sequence of G. soja spanned 915.4 Mb, covering 97.65% of the Williams 82 genome sequence. G. soja, which is distributed in Eastern Asia, including eastern China, Korea, Japan and eastern Russia, has important phenotypic characteristics and specific alleles that are not present in G. max (Hymowitz and Singh, 1987; Carter et al., 2004). Interestingly, Kuroda et al. (2009) reported that a higher proportion of rare alleles is present in wild soybean from Korea than in that from other countries. Several studies have used electrophoresis or simple sequence repeat (SSR) markers to estimate genetic diversity within Korean wild soybean (Yu and Kiang, 1993; Choi et al., 1999; Lee et al., 2008). However, only a limited number of G. soja accessions from Korea were used in those studies. In this study, genotypes of 733 G. soja accessions collected from Korea were used to estimate genetic diversity and population structure.
Materials and methods
A total of 733 G. soja accessions originating from Korea were used in this study (see online supplementary Table S1). Genomic DNA was extracted from young leaves using a modified CTAB procedure (Keim et al., 1988). Based on the soybean genetic map (Song et al., 2004), a total of 21 SSR markers were randomly selected from among those with higher polymorphic information content (PIC). The M13-tailed PCR method, with primers labelled with one of the fluorescent dyes 6-FAM, NED or HEX (Applied Biosystems, Foster City, CA, USA), was used to measure the size of the PCR products (Schuelke, 2000). SSR alleles were resolved on an ABI 3730xl automatic DNA sequencer (Applied Biosystems). Allelic differences in SSR markers were analysed using the GeneMapper software (version 3.7; Applied Biosystems). For overall genetic diversity assessment of the collection, the observed number of alleles and rare alleles, genetic diversity, PIC per locus and Shannon's information index were calculated using POPGENE 1.31 and POWERMARKER 3.25 (Nei, 1973; Yeh et al., 1999; Liu and Muse, 2005; see online supplementary Table S2). To estimate genetic differentiation among the populations, FST statistics were computed with AMOVA as implemented in Arlequin 3.11 (Excoffier et al., 2005; see online supplementary Table S3). For population structure analysis, STRUCTURE 2.3.4 was run with a burn-in period of 10,000 Markov chain Monte Carlo iterations, a run length of 10,000 iterations and a model allowing for admixture and correlated allele frequencies (Pritchard et al., 2000; Fig. 1(A) and (B)). Ten independent runs were performed for each assumed number of populations (K) from 1 to 15. 
For each value of K, the estimated log probability of the data, ln Pr(X|K), was used to calculate ΔK (Evanno et al., 2005). Principal coordinate analysis was carried out based on Nei's genetic distance matrix among the populations using GenAlEx 6.5 (Nei, 1978; Peakall and Smouse, 2012; Fig. 1(C) and online supplementary Table S4).
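The ΔK calculation described above can be illustrated with a minimal Python sketch. It assumes the replicate ln Pr(X|K) values have already been read from the STRUCTURE output into a dictionary; the function name `evanno_delta_k` and the toy values are hypothetical and are not data from this study. The sketch uses the second-order difference of the run means divided by the standard deviation across runs, which is one common way of computing the statistic of Evanno et al. (2005). Because the statistic needs both K − 1 and K + 1, runs for K from 1 to 15 yield ΔK only for K from 2 to 14.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K from replicate log-likelihoods.

    lnp_by_k maps each tested K to a list of ln Pr(X|K) values,
    one value per independent STRUCTURE run.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the smallest and largest tested K.
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when runs are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / sd
    return delta


# Toy values for illustration only (not results from this study).
toy_lnp = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4480.0, -4470.0, -4490.0],
    4: [-4475.0, -4460.0, -4490.0],
}
delta_k = evanno_delta_k(toy_lnp)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

In the toy data the likelihood rises sharply from K = 1 to K = 2 and then plateaus, so the largest ΔK falls at K = 2, which is the pattern the method is designed to detect.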
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128063454-66603-mediumThumb-S1479262114000239_fig1g.jpg?pub-status=live)
Fig. 1 Model-based populations of Glycine soja accessions in Korea. (A) ΔK values for detecting a true K in the STRUCTURE analysis. (B) Population structure of 733 G. soja accessions (K = 2). (C) Principal coordinate analysis of three populations (P1, P1P2 and P2).
Results and discussion
The genetic diversity and population structure of 733 G. soja accessions collected in Korea were evaluated using 21 SSR markers (see online supplementary Tables S1 and S2). A total of 539 alleles were identified, ranging from 18 (Satt070) to 33 (Satt237 and Satt382) per locus, with an average allelic richness of 25.7 alleles per locus (see online supplementary Table S2). Mean values of the estimated Shannon's information index and genetic diversity across all markers were 2.58 and 0.882, respectively. PIC values ranged from 0.722 (Satt184) to 0.945 (Satt373), with an average value of 0.873. In previous studies using wild soybean from China, Japan, Russia and Korea, the SSR loci produced 364 (18.2 per locus), 405 (20.25 per locus), 462 (23.1 per locus) and 322 (16.1 per locus) alleles, respectively (Kuroda et al., 2009; Wen et al., 2009). Although the average number of alleles in wild soybean from Korea was lower than that in wild soybean from other countries, the number of rare alleles in wild soybean from Korea was much higher (Kuroda et al., 2009). By comparison, 539 alleles (25.7 per locus) were detected in the Korean wild soybean collection evaluated in the present study. However, because different markers and populations were employed, it is difficult to determine from these comparisons whether the average number of alleles in wild soybean from Korea is truly higher than that in wild soybean from other countries. Interestingly, 406 rare alleles (75% of the total number), present at a frequency of less than 5%, were identified in this study. Among these, 59 (10% of the total number) were unique alleles, among which only one allele was detected in an SSR marker (see online supplementary Table S2), suggesting that many accession-specific alleles were present in the collection. 
AMOVA of populations P1 and P2 revealed that 7.1% of the variation was due to differences among the populations and 92.9% was due to differences within the populations (see online supplementary Table S3). This indicates that there is significant geographical differentiation in the Korean wild soybean.
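The per-locus summary statistics reported in this section (number of alleles, rare alleles, genetic diversity, PIC and Shannon's information index) all follow from the allele frequencies at a locus. The Python sketch below is an illustration only, not the POPGENE or POWERMARKER implementation: it uses the uncorrected gene diversity 1 − Σp², the standard PIC formula 1 − Σp² − Σ(i<j) 2p_i²p_j², and Shannon's index −Σp ln p. The function name `locus_summary`, the input layout (one allele size per sampled allele copy) and the toy allele sizes are assumptions made for the example.

```python
from collections import Counter
from math import log


def locus_summary(alleles, rare_threshold=0.05):
    """Summarise one SSR locus from a list of observed allele sizes."""
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    # Identity: sum over i<j of 2 * p_i^2 * p_j^2 = (sum p^2)^2 - sum p^4
    cross = sum_sq ** 2 - sum(p ** 4 for p in freqs)
    return {
        "n_alleles": len(freqs),
        "n_rare": sum(1 for p in freqs if p < rare_threshold),
        "gene_diversity": 1.0 - sum_sq,
        "pic": 1.0 - sum_sq - cross,
        "shannon": -sum(p * log(p) for p in freqs),
    }


# Toy locus (hypothetical allele sizes in bp, not data from this study).
toy_locus = [150] * 50 + [152] * 30 + [154] * 17 + [160] * 3
summary = locus_summary(toy_locus)
print(summary)
```

In the toy locus the 160 bp allele has a frequency of 3% and is therefore counted as rare under the 5% threshold used in this study.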
Population structure was estimated with STRUCTURE 2.3.4 using the 21 SSR markers. The maximum of the ad hoc measure ΔK was observed at K = 2 (Fig. 1(A)), indicating that the entire collection was divided into two main populations (namely P1 and P2; Fig. 1(B)). Of the 733 G. soja accessions, 609 were assigned to population P1 and 64 were assigned to population P2 with membership probabilities >70%. The remaining 60 accessions were classified as admixed forms (P1P2). In the principal coordinate analysis, population P2 was divided into three subpopulations according to their positions relative to the cross lines of the plot (Fig. 1(C)). As shown in online supplementary Table S4, the main populations and subpopulations reflected the G. soja collection sites. Subpopulation 1, consisting of 22 accessions from Gangwon-do and Gyeonggi-do, is positioned at the upper left side in Fig. 1(C). Subpopulation 2, consisting of 11 accessions from Eumseong-gun and Goesan-gun in Chungcheongbuk-do, is positioned at the bottom right side. Subpopulation 3, consisting of 31 accessions from diverse provinces, is placed together with the P1 and P1P2 populations. Based on its geographical features, the southern part of Korea can be divided into two regions separated by the Taebaek and Sobaek mountains. G. soja accessions from populations P1 and P1P2 were distributed throughout the country, while most of the accessions from population P2, which were divided into three subgroups, were distributed west of the Taebaek and Sobaek mountains.
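The assignment rule used above (membership probability >70% for P1 or P2, otherwise admixed) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the membership coefficients are taken as already extracted from the STRUCTURE output for K = 2, and the function name `assign_population`, the accession identifiers and the coefficient values are hypothetical.

```python
from collections import Counter


def assign_population(q, labels=("P1", "P2"), threshold=0.70):
    """Assign an accession from its membership coefficients.

    q holds one coefficient per cluster, in the same order as labels.
    An accession with no coefficient above the threshold is labelled
    as admixed by joining the cluster labels (here "P1P2").
    """
    for label, coeff in zip(labels, q):
        if coeff > threshold:
            return label
    return "".join(labels)


# Hypothetical membership coefficients (not data from this study).
toy_q = {
    "acc_001": (0.95, 0.05),
    "acc_002": (0.10, 0.90),
    "acc_003": (0.55, 0.45),
    "acc_004": (0.72, 0.28),
}
assignments = {acc: assign_population(q) for acc, q in toy_q.items()}
tally = Counter(assignments.values())
print(assignments, tally)
```

Because the threshold is above 0.5, at most one coefficient can exceed it, so the order in which the clusters are checked does not affect the result.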
Overall, the Korean G. soja collection showed a high mean number of alleles, a large number of rare alleles and high genetic diversity. Combined with next-generation sequencing technologies, this collection should be a key resource for identifying new genes for soybean improvement with regard to seed yield and resistance to abiotic and biotic stress.
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1479262114000239
Acknowledgements
This study was supported principally by a grant from the Next-Generation BioGreen 21 Program (PJ009091), by the Rural Development Administration, and partly by the National Institute of Crop Science Research Program (PJ009330) as a post-doctoral fellowship.