Introduction
The detection and characterization of genetic variation present in germplasm collections is important for plant breeding programmes seeking to widen the genetic base of breeding populations. Barley (Hordeum vulgare L.) is globally the fourth most important cereal crop. It is grown across a wide range of temperate and semi-arid environments, primarily for animal feed and as a feedstock for beer production. A major public barley germplasm collection is curated by the International Center for Agricultural Research in the Dry Areas (ICARDA), but the molecular diversity (commonly measured using one or a combination of molecular marker systems) present in this collection has been rather poorly characterized to date.
Microsatellites (or simple sequence repeats, SSRs) are distributed throughout eukaryotic genomes. Some are present within coding sequence, but the majority, particularly in large-genome species such as barley, are located within the non-coding fraction of the genome. A number of attempts have been made to use SSRs to measure genetic diversity in barley, and most have exploited allelic variation in microsatellites derived from genomic DNA (gSSRs) (Liu et al., 1996; Russell et al., 1997; Struss and Plieske, 1998). Few gSSRs have been characterized as to whether or not they assay a stretch of transcribed DNA. However, the wide availability of extensive amounts of barley expressed sequence tag (EST) sequence has now allowed for the systematic development of both expressed SSRs (eSSRs; Varshney et al., 2005) and genic single nucleotide polymorphisms (SNPs; Kota et al., 2001a). Because the latter generate a simple binary output, which is well suited to automatic data collection systems, their use is gaining momentum (Kota et al., 2001a, b; Kanazin et al., 2002; Russell et al., 2004; Rostoks et al., 2006).
However, SNP genotyping in large germplasm collections is a capital-intensive process, unless the markers can be converted into cleaved amplified polymorphic sequence (CAPS) assays (Kota et al., 2007; Varshney et al., 2007a, c).
The present study was undertaken with the following objectives: (1) to generate a comparative assessment of the potential of gSSRs, eSSRs and SNPs to estimate genetic diversity; (2) to investigate the relationship between genetic diversity and provenance; and (3) to analyse the genetic structure of the ICARDA barley germplasm collection.
Material and methods
Genotypes and genotyping
A sample set of 70 accessions from the ICARDA barley collection (containing 223 genotypes) was chosen to maximize diversity on the basis of provenance. In all, seven African (21 accessions), ten Asian (21 accessions), nine Middle Eastern (26 accessions) and two European (two accessions) countries were represented (further details are given in Table 1). DNA was isolated as described by Thiel et al. (2003).
a cv, cultivars belonging to H. vulgare convar. vulgare; their names are given in parentheses.
The set of 20 gSSRs was selected from those developed by Liu et al. (1996), Struss and Plieske (1998), Ramsay et al. (2000) and Li et al. (2003) on the basis of high-quality amplification profiles and distribution across all the linkage groups. PCR and fragment detection followed methods described by Röder et al. (1998). The set of 20 eSSRs was assembled from those developed by Thiel et al. (2003) and Varshney et al. (2006), and forms part of the core set defined by Varshney et al. (2007c). PCR conditions for the eSSRs followed Thiel et al. (2003), except that fluorescent-dye-labelled primer pairs were used for amplification, the amplicons were separated on an ABI377 device and alleles were scored using GenoTyper 3.7 (Applied Biosystems). SNP primer pairs were selected from those derived by Kota et al. (2007), and PCR conditions followed Kota et al. (2001a). A CAPS assay was applied to 16 out of the 20 SNPs, while the remaining four were assayed using pyrosequencing (for experimental details of these assays, see Varshney et al., 2007c). The 60 marker loci were distributed across all seven linkage groups of barley.
Diversity analysis
While SSR allelic data were scored using GenoTyper, the SNP marker profiles were scored manually: each allele was scored as present (1) or absent (0) at each SSR and SNP locus. Polymorphism information content (PIC) values were calculated following Anderson et al. (1993).
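As an illustration of the PIC calculation, the following is a minimal sketch that assumes the form in which the Anderson et al. (1993) measure is usually stated: one minus the sum of squared allele frequencies at a locus. The function name, the handling of missing calls and the example fragment sizes are hypothetical and are not taken from the study.

```python
from collections import Counter

def pic(alleles):
    """Return 1 - sum(p_i^2) over the allele frequencies p_i at one locus.

    `alleles` holds one allele call per accession; None marks a missing
    genotype and is ignored (an assumption of this sketch).
    """
    counts = Counter(a for a in alleles if a is not None)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no scored alleles at this locus")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical fragment sizes (bp) at one SSR locus in eight accessions.
example_calls = [152, 152, 154, 158, 158, 158, None, 160]
example_pic = pic(example_calls)
```

Under this definition a locus with a single allele has a PIC of zero, and the value rises as alleles become more numerous and more evenly frequent.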
Genetic similarities (GSs) were calculated in the NTSYS-pc 2.11 software package (Biostatistics Inc., USA: Rohlf, 1998) for each marker pair, using Dice's similarity coefficient. Sequential, agglomerative, hierarchical and nested clustering was employed for the construction of unweighted pair group method with arithmetic mean (UPGMA) dendrograms. The correlations between matrices were calculated using the Mantel (1967) test, employing 10,000 random iterations in the non-parametric test calculator (Mantel version 2.0, Liedloff, 1999).
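Dice's coefficient for two accessions scored as 0/1 allele vectors can be sketched as follows. This is an illustrative standard-library implementation, not the NTSYS-pc routine; the accession labels and profiles are hypothetical, and returning 0.0 when neither profile carries any allele is an assumption of the sketch.

```python
def dice(a, b):
    """Dice similarity 2*n11 / (2*n11 + n10 + n01) for two 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of alleles")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)  # equals 2*n11 + n10 + n01
    return 2.0 * shared / total if total else 0.0

def similarity_matrix(profiles):
    """Pairwise Dice similarities for a dict mapping label -> 0/1 profile."""
    names = list(profiles)
    return [[dice(profiles[p], profiles[q]) for q in names] for p in names]

# Hypothetical presence/absence profiles over five alleles.
profiles = {
    "acc1": [1, 0, 1, 1, 0],
    "acc2": [1, 1, 0, 1, 0],
    "acc3": [0, 1, 0, 0, 1],
}
gs = similarity_matrix(profiles)
```

Because shared absences (0/0) do not enter the coefficient, two accessions are not counted as similar merely for both lacking a rare allele.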
Results
Genotypic variation
Allele number among the gSSR loci ranged from four (Bmac0040) to eight (Ebmac0788), giving a mean of 5.7 alleles per locus. The PIC values ranged from 0.59 (GBMS192) to 0.82 (Ebmac0788) (mean 0.74 ± 0.06) (Table 2). The 20 eSSR markers generated a mean of 9.5 alleles per locus, with allele number varying between 4 (GBM1212) and 17 (GBM1461). The mean eSSR PIC value was 0.73 ± 0.08 [ranging from 0.56 (GBM1029) to 0.88 (GBM1461)] (Table 3). The 20 SNP markers provided a total of 23 SNP datapoints, because two of the markers assayed by pyrosequencing, GBS0461 and GBS0576, yielded two and three SNP datapoints, respectively. The PIC values of the assayed SNPs ranged from 0.09 (GBS0136) to 0.50 (GBS0526), with a mean of 0.39 ± 0.10 per SNP (Table 4).
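The maximum SNP PIC of 0.50 is the value one would expect at the upper bound for a two-allele locus. This short derivation assumes that PIC is computed as one minus the sum of squared allele frequencies and that each SNP datapoint is biallelic; p denotes the frequency of one of the two alleles.

```latex
\mathrm{PIC} = 1 - \sum_{i} p_i^{2}
\quad\Longrightarrow\quad
\mathrm{PIC}_{\text{biallelic}} = 1 - p^{2} - (1 - p)^{2} = 2p(1 - p) \le \tfrac{1}{2},
\qquad \text{with equality at } p = \tfrac{1}{2}.
```

Under these assumptions a SNP cannot exceed a PIC of 0.50, whereas a multi-allelic SSR can, so the two marker classes are not compared on an equal scale.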
a In parentheses, the identity of the restriction enzyme (CAPS assay), or the SNP position (pos) numbers (pyrosequencing assay).
Genetic diversity
Accessions derived from the Middle East were characterized by the highest PIC values within each marker type (Fig. 1). To compare diversity among geographic regions, the number of alleles at each microsatellite locus was counted within each of the four regions; allele numbers were relatively higher in the Asian region (Fig. 2). The eSSRs tended to display more alleles per locus than did the gSSRs across all the geographical regions (Fig. 2). However, the eSSR and gSSR PIC values were indistinguishable from one another within each geographical region (except Europe). The SNP loci had lower PIC values than the SSRs. When the gSSR, eSSR and SNP GS matrices were correlated, the highest correlation was between the gSSRs and eSSRs (r = 0.86, P < 0.05), followed by that between the eSSRs and the SNPs (r = 0.74, P < 0.05). The correlation between the gSSRs and the SNPs was nevertheless still statistically significant (r = 0.67, P < 0.05).
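The matrix correlations reported above come from a Mantel test. The sketch below shows one way such a permutation test can be written; it is not the Liedloff program, and the one-sided P value, the seeding and the toy matrices are assumptions made for illustration.

```python
import random
from math import sqrt

def upper(m, order=None):
    """Upper-triangle entries of square matrix m, optionally relabelled."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(m1, m2, permutations=10000, seed=0):
    """Return (r, one-sided P) for two symmetric similarity matrices."""
    rng = random.Random(seed)
    x = upper(m1)
    r_obs = pearson(x, upper(m2))
    n = len(m2)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel rows and columns of m2 together
        if pearson(x, upper(m2, order)) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)

# Toy GS matrices for four hypothetical accessions.
gs_a = [[1.0, 0.8, 0.3, 0.2], [0.8, 1.0, 0.4, 0.3],
        [0.3, 0.4, 1.0, 0.7], [0.2, 0.3, 0.7, 1.0]]
gs_b = [[1.0, 0.7, 0.2, 0.3], [0.7, 1.0, 0.5, 0.2],
        [0.2, 0.5, 1.0, 0.6], [0.3, 0.2, 0.6, 1.0]]
r, p = mantel(gs_a, gs_b, permutations=999, seed=1)
```

Rows and columns are permuted together because the entries of a similarity matrix are not independent observations, which is why an ordinary significance test for a correlation coefficient is not appropriate here.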
Genetic structure of the genotype set
Since a significant correlation existed between the three GS matrices, the allelic data (349 alleles) were pooled to examine the genetic structure of the sample. The overall mean GS was 0.43, ranging from 0.16 (IG138235–IG27683) to 0.87 (IG137726–IG138228). The resulting UPGMA dendrogram suggested the presence of two major clusters (I and II), both of which could be classified into sub-clusters. Cluster I was formed from two sub-clusters (IA and IB), comprising sub-sub-clusters IA-i, IA-ii, IB-i and IB-ii. Similarly, cluster II comprised three sub-clusters (IIA-i, IIA-ii and IIB). Six out of these seven sub-clusters were further divisible into the sub-sub-clusters designated p and q (Fig. 3, Supplementary Table 1, available online only at http://journals.cambridge.org). Cluster IA-i p included 75% of the cluster I Asian accessions, while 48% of the African accessions fell into sub-cluster IIA-ii p. In contrast, the Middle East-derived materials were distributed across clusters I and II, with 31% of the accessions in IA-i q and 23% in IIA-i q.
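For readers unfamiliar with UPGMA, the sketch below shows how a dendrogram can be derived from a GS matrix. It is a simplified illustration, not the NTSYS-pc SAHN procedure: taking distance as 1 − GS and breaking ties by first occurrence are assumptions of the sketch, and the accession labels and similarities are hypothetical.

```python
def upgma(names, sim):
    """Cluster labels from a symmetric similarity matrix by UPGMA.

    Returns (tree, merges): tree is a nested tuple of labels and merges
    lists (left, right, distance) in the order the clusters were joined.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {(i, j): 1.0 - sim[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps every original pair equally weighted.
            dist[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del dist[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        merges.append((tree_a, tree_b, d))
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merges

# Hypothetical GS matrix for four accessions forming two tight pairs.
labels = ["acc1", "acc2", "acc3", "acc4"]
gs = [[1.0, 0.9, 0.3, 0.3],
      [0.9, 1.0, 0.3, 0.3],
      [0.3, 0.3, 1.0, 0.8],
      [0.3, 0.3, 0.8, 1.0]]
tree, merges = upgma(labels, gs)
```

In this toy case the two tight pairs join first and are then linked at the mean between-pair distance, which is the pattern a dendrogram with two major clusters reflects.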
Discussion
Comparison of marker systems
SSR markers are inherently multi-allelic, and allele number across the 40 loci assayed ranged from 4 to 17. This multi-allelism has established SSRs as an effective marker platform in current crop diversity studies (Gupta and Varshney, 2000). The expectation is that the allele number at gSSR loci is likely to be higher than that at eSSR loci, as non-coding DNA tends to be more tolerant of sequence expansion (Russell et al., 2004; Varshney et al., 2005). Surprisingly, this expectation was not borne out in the present study, where the mean allele number at the eSSR loci was 9.5, while at the gSSR loci it was only 5.7. We suppose that this apparent inconsistency arose because, while the gSSR set was a random sample (Varshney et al., 2007b), the eSSR set represents a ‘core set’, pre-selected on the basis of a high PIC (Varshney et al., 2007c). In contrast to the number of alleles detected, no significant difference was observed between the gSSR and the eSSR PIC values (0.74 ± 0.06 and 0.73 ± 0.08, respectively). This suggests that although a greater number of alleles was present at the eSSR loci, the frequencies of the rarer alleles were too low to have any marked effect on the informativeness of the loci.
In terms of generating comparable genotyping datasets, the highest correlation (r = 0.86, P < 0.05) was observed between the gSSR and eSSR marker data. This is plausible, as both marker types are multi-allelic and randomly distributed across the genome. The second strongest correlation (r = 0.74, P < 0.05), between the eSSR and SNP marker datasets, can be explained by the shared origin of these two marker types in the transcribed part of the genome, as shown in Kota et al. (2001a) and Varshney et al. (2007c).
Germplasm diversity and relationships
Accessions originating from the Middle East and Asia were more genetically diverse than those from Africa or Europe. Specifically, the Middle Eastern accessions displayed higher PIC values, while the Asian accessions tended to show a greater number of alleles. These tendencies agree well with outcomes reported by Malysheva-Otto et al. (2006), and are predictable, given that the Middle East and Central Asia have been identified as major centres of origin and/or diversification of barley (Pozzi et al., 2004). The germplasm collection showed an overall mean GS of 0.43. However, the sampling strategy for the 70 accessions was not random, so caution needs to be exercised in extrapolating the sample mean GS levels to the collection as a whole.
The clustering exercise identified a substantial degree of association between provenance and genotype. Thus, sub-clusters IA-i p and IIA-ii p were formed from accessions sharing, respectively, Asian and African provenance. The small clusters IB-i, IB-ii, IIA-i q, IIA-ii q and IIB were all dominated by accessions of similar provenance. However, two moderately sized clusters (IA-i q and IIA-i p) were geographically heterogeneous. These patterns are suggestive of a narrowing of the genetic base of barley due to breeding for adaptation in certain geographical regions. This applies less to accessions from the Middle East, which group with both Asian and African materials, consistent with the accepted view that the centre of origin of barley is in the Fertile Crescent (Pozzi et al., 2004), from where it spread to both Africa (Orabi et al., 2007) and Asia (Pandey et al., 2006). There does therefore seem to be scope for barley improvement via the exploitation of geographically exotic materials.
In summary, the present study has shown that the core set eSSRs display a level of polymorphism comparable to that of gSSRs and SNPs. It appears that more genetic diversity is present among accessions from the Middle East or Asia than among those from Africa, so this diversity could benefit the improvement of barley in less diversity-rich regions.
Acknowledgements
We thank Jan Valkoun (ICARDA) for providing seed of the barley accessions. This research was financially supported in part by the GTZ project 2002.7860.6-001.00 and Contract 1060503, provided by BMZ (Bundesministerium für Wirtschaftliche Zusammenarbeit und Entwicklung), Germany.