Introduction
Analysing a large number of accessions of a species is a major constraint in diversity studies. Studying a subset can be more efficient, provided that the subset spans the full range of variation (Frankel, 1984). Core subsets are assembled to facilitate the intensive study and utilization of genetic resources as well as to improve their conservation (Frankel and Brown, 1984).
Core subsets have been formed on the basis of passport and phenotypic data (Amalraj et al., 2006) or, for many species, molecular markers (Belaj et al., 2012; Hu et al., 2014). The latter reflect changes that have occurred at the DNA level but are not necessarily expressed in the phenotype of the organism, and their use can lead to significant gains in the number of alleles retained in a sample compared with random sampling. In addition, the M (maximization) strategy is considered one of the best methods for constructing core collections because genetic markers are used to sample a core collection while maximizing allele richness at each marker locus (Schoen and Brown, 1993). The M strategy is regarded as the most powerful approach for selecting entries with the most diverse alleles and for eliminating the redundancy that comes from non-informative alleles, which arise from co-ancestry and certain assortative mating systems (Franco et al., 2006).
Apricot (Prunus armeniaca L.) is one of the most important fruit crops of the Mediterranean Basin and was probably first domesticated in China (Bailey and Hough, 1975). The first domesticated forms would then have spread through Central Asia and the Irano-Caucasian area (Vavilov, 1992). A study of apricot diffusion in the Mediterranean Basin showed that apricot in this region is structured into at least three major gene pools: Irano-Caucasian, North Mediterranean and South Mediterranean. Starting from the Irano-Caucasian area, apricot spread to the Mediterranean Basin along two diffusion routes: one through the countries north of the Mediterranean Sea and one through the North African countries (Bourguiba et al., 2012).
The Maghreb region of North Africa is characterized by original apricot genetic resources, with the coexistence of traditional local cultivars propagated by grafting and seed-propagated accessions specific to the oasis agroecosystems, all sharing a common gene pool that originated from the Irano-Caucasian area (Bourguiba et al., 2012). North African apricot germplasm harbours unique traits such as high sugar content, early flowering and ripening, and a low chilling requirement (Crossa-Raynaud, 1960). Its conservation is therefore essential for breeding programmes worldwide. However, because this germplasm is maintained in situ, severe genetic erosion has been reported as a consequence of both urbanization and the selection of more economically profitable crops (Krichen et al., 2009).
In this study, our aim was to examine the genetic diversity and population structure of 183 apricot accessions from Algeria, Morocco and Tunisia, and to establish the first core collection representative of the genetic diversity encountered in the North African region, using 24 informative SSR markers. Our results will help to optimize the conservation of apricot genetic resources and to facilitate their use in modern breeding programmes.
Materials and methods
Plant material
We evaluated 183 North African apricot (Prunus armeniaca L.) accessions composed of 80 accessions from Tunisia, 69 accessions from Morocco and 34 accessions from Algeria. Except for 7 Moroccan accessions obtained from an ex situ germplasm collection maintained at the National Institute of Agronomic Research of Meknes (INRAM), all the studied apricot accessions were collected in situ from the main apricot growing areas in each country: Messaad in Algeria; Dadès Valley, Drâa Valley, Moulouya Valley and Ziz Valley in Morocco; and North (Ras Jebel and Testour), Centre (Kairouan, Mahdia and Sfax), South (Gabès and Jerba) and the Oasis (Gafsa, Midess, Tameghza, Degache, Nefta and Tozeur) in Tunisia.
Apricots from the North, Centre and South of Tunisia as well as from the INRAM of Morocco were propagated by grafting, while all the remaining accessions were propagated by seeds (Table S1, available online).
DNA extraction and genotyping procedure
Total genomic DNA was extracted from 150 mg of fresh young leaves of a single plant per accession using the DNeasy Plant Mini Kit (Qiagen) following the manufacturer's instructions with one slight modification: the addition of 1% (w/v) PVP-40 to the AP1 buffer solution. DNA concentrations were estimated by spectrofluorometry and the samples were diluted to a final concentration of 20 ng/μl.
A set of 24 microsatellite loci was selected for genotyping according to their location on the Prunus reference genetic map (Joobeur et al., 1998; Aranzana et al., 2003), as they are homogeneously distributed across the eight linkage groups of the Prunus genome (Table 1). Polymerase chain reaction (PCR) amplification was performed in a final volume of 20 μl containing 20 ng of template DNA, 2 mM MgCl2, 4 pmol of the reverse primer, 1 pmol of the forward primer, 0.2 mM of each deoxynucleotide triphosphate and 1 unit of Taq DNA polymerase (Sigma, St. Louis, MO, USA). The forward primer was 5′-labelled with one of three fluorophores: 6FAM, NED or HEX. Reactions were carried out in a thermal cycler (MasterCycler ep gradient S; Eppendorf, Hamburg, Germany). After an initial denaturation step of 5 min at 94°C, 35 cycles of amplification were performed (30 s at 94°C, 1 min at the locus-specific annealing temperature and 1 min at 72°C), followed by a final extension step of 10 min at 72°C. PCR products were separated using an automatic capillary sequencer (ABI PRISM 3130 Genetic Analyzer; Applied Biosystems, Foster City, CA, USA). Allele sizes were determined using the GeneMapper 3.7 software (Applied Biosystems, Foster City, CA, USA).
Table 1 Diversity statistics for 24 SSR loci studied in the 183 apricot accessions

Na, observed number of alleles; *, observed alleles with frequencies less than 5%; Ne, effective number of alleles; He, expected heterozygosity; Ho, observed heterozygosity; I, Shannon's information index.
a, Hagen et al. (2004); b, Dirlewanger et al. (2002); c, Aranzana et al. (2002); d, Yamamoto et al. (2002); e, Cipriani et al. (1999); f, Testolin et al. (2000).
Statistical analysis
Genetic diversity parameters for each SSR locus were calculated as the observed number of alleles (Na), the effective number of alleles (Ne), the observed heterozygosity (Ho), the expected heterozygosity (He) and Shannon's information index (I) using POPGENE v. 1.32 (Yeh and Boyle, 1997).
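As an illustration of how these per-locus parameters relate to allele frequencies, the following minimal sketch computes them for one locus from diploid genotypes. It is not the POPGENE implementation: the function name `locus_diversity` and the toy genotypes are hypothetical, the expected heterozygosity is taken here as 1 − Σp² without any small-sample correction, and Shannon's index uses the natural logarithm, all of which are assumptions of this sketch.

```python
from collections import Counter
from math import log


def locus_diversity(genotypes):
    """Diversity statistics for one SSR locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid accession,
    with missing data already removed (an assumption of this sketch).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    return {
        "Na": len(counts),                                  # observed number of alleles
        "Ne": 1.0 / sum_p2,                                 # effective number of alleles
        "Ho": sum(1 for a, b in genotypes if a != b) / n,   # observed heterozygosity
        "He": 1.0 - sum_p2,                                 # expected heterozygosity (uncorrected)
        "I": -sum(p * log(p) for p in freqs),               # Shannon's information index
    }


# Toy genotypes (allele sizes in bp), purely illustrative.
toy_locus = [(100, 100), (100, 104), (104, 108), (100, 104)]
stats = locus_diversity(toy_locus)
```

Averaging each statistic over the loci would give collection-level means of the kind reported in the Results.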
To infer the genetic structure of the apricot germplasm, the model-based Bayesian clustering method implemented in STRUCTURE v. 2.2 (Pritchard et al., 2000) was used. STRUCTURE was run under the admixture model with correlated allele frequencies, with the assumed number of genetic clusters (K) varying from 1 to 10 and ten independent replicate runs for each K value. Each run consisted of a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) iterations. The ΔK statistic defined by Evanno et al. (2005), which is based on the rate of change in the log probability of the data between successive K values, was used to identify the most likely number of clusters. No prior information was used to define the clusters.
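The ΔK criterion can be sketched as follows, assuming the log probabilities of the data, ln P(D), from the replicate runs at each K have been collected into a dictionary. The function name `evanno_delta_k` and the toy values are hypothetical. This sketch uses the common formulation in which the second-order difference is taken on the replicate means and divided by the standard deviation of ln P(D) at K; it is not code from STRUCTURE itself.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of ln P(D) values of its replicate runs.
    """
    avg = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in avg and k + 1 in avg:
            sd = stdev(lnp_by_k[k])
            if sd == 0:
                continue  # delta K is undefined when replicates are identical
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = second_diff / sd
    return delta


# Toy ln P(D) values with two replicates per K, purely illustrative.
toy_lnp = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -894], 4: [-889, -891]}
delta_k = evanno_delta_k(toy_lnp)
best_k = max(delta_k, key=delta_k.get)
```

Note that ΔK cannot be evaluated for the smallest and largest K tested, since both neighbouring K values are required.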
The core collection was developed with the advanced M (maximization) strategy, using the modified heuristic algorithm implemented in PowerCore v. 1.0 (Kim et al., 2007). PowerCore selects accessions so as to maximize the number of alleles retained from the SSR data. The size of the final core collection is therefore not known a priori and depends on the levels of variability and redundancy present in the collection. The significance of the differences in richness between the entire and core collections was tested with the two-tailed Mann–Whitney U test using StatsDirect. To study sampling efficiency, the total number of alleles captured by the M strategy was assessed in samples of increasing size.
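The idea of selecting accessions to maximize allele coverage, and of tracking how many alleles are captured as the sample grows, can be illustrated with a simple greedy procedure. This is only a sketch of the general principle and is not the heuristic implemented in PowerCore; the function names `greedy_core` and `coverage_curve` and the toy accessions are hypothetical.

```python
def greedy_core(accessions):
    """Greedy allele-coverage selection.

    accessions: dict mapping accession name to a set of (locus, allele) pairs.
    Returns accession names in the order selected, stopping once every
    allele in the collection is represented.
    """
    all_alleles = set().union(*accessions.values())
    covered = set()
    remaining = dict(accessions)
    core = []
    while covered != all_alleles:
        # Pick the accession adding the most uncovered alleles
        # (ties broken alphabetically for reproducibility).
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core


def coverage_curve(core, accessions):
    """Fraction of all alleles captured after each successive core entry."""
    all_alleles = set().union(*accessions.values())
    covered = set()
    curve = []
    for name in core:
        covered |= accessions[name]
        curve.append(len(covered) / len(all_alleles))
    return curve


# Toy collection of four accessions at two loci, purely illustrative.
toy = {
    "A": {("L1", 100), ("L1", 104), ("L2", 200)},
    "B": {("L1", 100), ("L2", 200)},
    "C": {("L1", 108), ("L2", 204)},
    "D": {("L1", 104), ("L2", 204)},
}
toy_core = greedy_core(toy)
toy_curve = coverage_curve(toy_core, toy)
```

In this toy case accessions B and D are redundant, so the core size is determined by the data rather than fixed in advance, which mirrors the behaviour described above.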
Results
Microsatellite diversity
The 24 selected mapped SSR markers were used to genotype the 183 apricot accessions (Table 1). A total of 192 alleles were detected; the number of alleles per locus (Na) ranged from 2 (BPPCT001) to 16 (UDP98-409), with a mean of eight. Of these, 76 were regarded as rare alleles, with frequencies below 5% across all accessions. The effective number of alleles (Ne) ranged from 1.168 (AMPA119) to 7.578 (UDP98-409), with a mean of 2.986. Nei's diversity index (He) varied from 0.169 (AMPA109) to 0.870 (UDP98-409), with an average of 0.593. Ho values ranged from 0.129 (UDP98-409) to 0.855 (AMPA119), with an average of 0.406. Shannon's information index ranged from 0.335 (AMPA119) to 2.179 (UDP98-409), with a mean value of 1.211 (Table 1). Zhang et al. (2014) studied 94 apricot samples from China with 21 SSR markers and found an average of 15.14 alleles per locus. Although the sets of SSR markers differ, the comparison suggests that the apricot germplasm in North Africa harbours moderate genetic diversity.
Population structure analysis
The population structure of the North African apricot accessions was assessed using the model-based Bayesian method implemented in STRUCTURE. The results indicated K = 4 (ΔK = 0.805) as the most likely number of clusters, suggesting the presence of four genetic clusters in the total panel (Fig. 1). Of the 183 accessions, 142 were clearly assigned to one of the four genetic clusters with a membership probability of at least 0.7, and the remaining 41 accessions, with membership probabilities < 0.7, were considered admixed (Table S2, available online). Cluster 1 comprised 38 accessions from the different landraces prospected in Morocco. Cluster 2 included 27 accessions belonging to the ‘North Mediterranean’ gene pool as established by Bourguiba et al. (2012). These accessions belonged to the variety population ‘Canino’ and were located mainly in Morocco and, to a lesser extent, in Algeria and Tunisia. Cluster 3 grouped 46 accessions from the oasis regions of Tunisia and the Messaad region in Algeria. Finally, cluster 4 consisted of 31 apricots originating from northern, central and southern Tunisia, all of which were propagated by grafting. Except for cluster 2, the accessions were assigned to a genetic cluster according to both their geographic origin and their mode of propagation. The existence of cluster 2 confirmed gene exchanges between the northern and southern countries of the Mediterranean region, as attested by Bourguiba et al. (2012). These findings were also supported by the percentage of accessions admixed between cluster 2 and the three other clusters (63.4%; Table S2, available online).
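The assignment rule described above can be sketched as a simple threshold on the membership coefficients. The function name `assign_cluster`, the treatment of a coefficient exactly equal to 0.7 as assigned, and the example coefficient vectors are assumptions made for illustration only.

```python
def assign_cluster(q, threshold=0.7):
    """Assign an accession to a cluster from its membership coefficients.

    q: list of membership coefficients, one per cluster (summing to about 1).
    Returns the 1-based cluster number if the largest coefficient reaches
    the threshold, otherwise the label "admixed".
    """
    best = max(range(len(q)), key=q.__getitem__)
    return best + 1 if q[best] >= threshold else "admixed"


# Hypothetical membership vectors for K = 4, purely illustrative.
examples = {
    "acc_1": [0.05, 0.85, 0.05, 0.05],
    "acc_2": [0.40, 0.40, 0.10, 0.10],
    "acc_3": [0.70, 0.10, 0.10, 0.10],
}
assignments = {name: assign_cluster(q) for name, q in examples.items()}
```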

Fig. 1 Population structure of North African apricot accessions inferred from STRUCTURE analysis and presented by country. The whole accession panel was divided into four main genetic clusters. Each colour represents one cluster. Each individual is represented by a vertical line and the length of the coloured segment shows the proportion of each cluster within the accession.
Development of a core collection
To develop a preliminary core collection that covers all alleles present in the entire apricot collection with a smaller number of accessions, the SSR data were analysed with PowerCore v. 1.0, which uses a heuristic search based on the advanced M strategy to maximize the genetic diversity captured. On this basis, a core collection comprising 53.55% of the entire collection (98/183; Table S2, available online) was constructed, capturing 99.47% of the total alleles.
To compare allelic richness and evenness between the core collection of 98 accessions and the entire collection, Nei's gene diversity (He) and the Shannon–Weaver diversity index (I) were computed for each SSR marker (Fig. 2), and two-tailed Mann–Whitney U tests were performed. The distributions of He (Fig. 2(a)) and I (Fig. 2(b)) over the 24 markers were highly similar for the core (red) and the whole (blue) collections, although the mean He (0.617) and I (1.282) in the core set were higher than those in the whole collection (He = 0.593 and I = 1.211; Table 2). In addition, the mean number of alleles per locus in the core set (Na = 7.958) was almost equal to that in the whole collection, while the effective number of alleles in the core set (Ne = 3.179) was higher than in the original germplasm (Ne = 2.986). No significant difference was observed for Na, Ne, He or I between the core and the whole collections, as indicated by the two-tailed Mann–Whitney U tests (Table 2). The constructed core collection gave a relatively good representation of accessions from the four genetic clusters identified in North Africa (Table 3). The representation of clusters 1, 2 and 3 in the core set varied from 50 to 59.25%. Only 11 out of 31 accessions (35.48%) from cluster 4, which was composed of graft-propagated accessions from Tunisia, were retained in the core set. However, more than 65% of the admixed accessions from the entire collection were represented in the core set (Table 3).
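For readers wishing to reproduce this kind of comparison, a two-tailed Mann–Whitney U test on per-locus values (for example, the 24 He values of the entire collection against those of the core) can be sketched as below. This is a plain-Python illustration using the tie-corrected normal approximation without continuity correction; it is not the procedure used by StatsDirect, which may compute the P value differently. The function name `mann_whitney_u` and the toy samples are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return min(u1, u2), 1.0             # all values identical
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))              # two-tailed P value
    return min(u1, u2), p


# Toy samples, purely illustrative.
u_sep, p_sep = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_same, p_same = mann_whitney_u([1, 2, 3], [1, 2, 3])
```

With small samples an exact test is generally preferable to the normal approximation; with 24 loci per group the approximation is a reasonable first check.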

Fig. 2 Comparison of (a) Nei's diversity index and (b) Shannon's information index over the 24 SSR markers between the core (red) and the entire (blue) collections.
Table 2 Comparison of diversity parameters between the entire collection and the constructed core collection based on 98 out of the 183 accessions

* P value of the two-tailed Mann–Whitney U test comparing the entire and the core collections.
Table 3 Distribution of apricot accessions from the entire and core collections among the four genetic clusters identified by the STRUCTURE analysis of the entire collection

%D, differences as percentage.
Overall, the core collection developed herein retained allelic richness and evenness and was well representative of the 183 accessions.
Sampling efficiency and population structure
The sampling efficiency (the ability to capture genetic diversity) of the M strategy was assessed for increasing core sample sizes and plotted in Fig. 3. The results indicated that the number of alleles captured in a core collection increased with its size, but with diminishing returns: approximately 20% of the allelic diversity was captured with the first accessions, 55% with 5 accessions and more than 99% with about half of the collection (98/183 accessions).

Fig. 3 Sampling efficiency as a function of core collection size for North African apricots, based on the M strategy implemented in PowerCore.
Regarding the distribution of the four genetic clusters identified by STRUCTURE within the different core subsets, the share of clusters 1 and 2 was similar across subsets, while the share of clusters 3 and 4 increased from the core subset of 5 accessions (0%) to the final core collection of 98 accessions (11.22 and 27.55%, respectively; Table S3, available online). Conversely, the percentage of admixed accessions decreased from 60% (core subset of 5 accessions) to 27.55% (core collection of 98 accessions). From the core subset of 20 accessions onwards, accessions from cluster 3 and admixed accessions were the ones mostly retained.
Discussion
In this study, we assessed the patterns of genetic diversity and the underlying population structure of a large and original apricot collection from North Africa using 24 informative SSR markers evenly distributed across the eight linkage groups of the Prunus genome.
A total of 192 alleles, with an average of 8 alleles per locus, were detected in 183 apricot accessions using 24 SSR markers. Furthermore, 32 alleles (16.67%) were detected only once and 76 alleles (39.58%) were considered rare. Among the SSR loci, UDP98-409 generated the highest number of alleles. Similar results were reported for Turkish apricot germplasm (Yilmaz et al., 2012), attesting to the high level of polymorphism of this locus in apricot.
The gene diversity index (He = 0.593) was lower than that detected in 94 apricot accessions from China using 21 SSR markers (He = 0.792; Zhang et al., 2014) and in 39 apricot accessions from Iran using 10 SSR markers (He = 0.63; Raji et al., 2014). Such results are consistent with the East-West gradient of genetic diversity reported by Khadari et al. (2006) and Bourguiba et al. (2012), which is related to the history of diffusion and domestication of apricot.
The genetic structure of our panel was studied with the model-based Bayesian clustering method implemented in the STRUCTURE software, which has proved to be an effective method for studying genetic relationships within a germplasm collection (Song et al., 2014). Four main genetic clusters were identified according to both the geographic origin and the mode of propagation of the accessions. These results are in agreement with previous studies that considered only Tunisian apricot germplasm (Bourguiba et al., 2010; Krichen et al., 2010). In addition, the genetic structure highlighted the presence in North Africa of accessions from the ‘North Mediterranean’ gene pool (Bourguiba et al., 2012), corresponding mainly to the variety population of the cultivar ‘Canino’. Such results attest to frequent gene exchanges between the northern and southern countries of the Mediterranean Basin, which were most likely mediated by human migration.
By employing PowerCore with the advanced M strategy, we constructed a core collection of 98 diverse accessions representing 53.55% of the whole panel. Owing to its high sampling efficiency, this strategy captured 99.47% of the allelic diversity existing in the whole panel. Brown (1989) pointed out that core collections generally account for the entire genetic resources with 5–10% of cultivars, or a total of not more than 3000 individuals. The much higher sampling percentage obtained here could be explained by two reasons: (1) the base collection (183 apricot accessions) is small compared with collections of field crops, resulting in a higher sampling percentage and (2) most of the accessions in the base collection are landraces, which possess greater allelic richness than bred accessions, so that only a high sampling percentage can ensure 100% allele coverage. Other studies have also reported high sampling percentages, such as 39.5% for wild tomatoes (Rao et al., 2011) and 32.2% for pear (Song et al., 2014). The core collection comprised 98 accessions: 19 from Algeria, 38 from Morocco and 41 from Tunisia. In addition, 21 out of 43 graft-propagated accessions (48.84%) and 77 out of 140 seed-propagated accessions (55%) were included in the core set. Regarding the genetic clusters identified by STRUCTURE, clusters 1, 2 and 3 were similarly represented in the core collection, with 50 to 59.25% of their accessions retained. However, only 35.48% of the accessions from cluster 4, which was composed of graft-propagated accessions from Tunisia, were retained in the core set. Such results reflect the greater genetic similarity among clonally propagated accessions than among seed-propagated ones.
Therefore, the core set retained allelic richness and evenness, and can be deemed an appropriate choice for applications involving genetic conservation in apricot and as a source of beneficial genes for some important agronomical traits. In addition, a detailed evaluation of this core collection may be useful to maximize polymorphism in a QTL mapping population.
The analysis of the sampling efficiency of the M strategy in increasingly large samples revealed that 55% of the total alleles in the entire collection were captured with a sample of only 5 accessions (approximately 3% of the whole collection), attesting to a high level of redundancy in the entire collection. It also indicated that the genetic clusters strongly influenced the construction of the core collection. Across the successive core subsets, the share of clusters 1 and 2 remained similar, the share of clusters 3 and 4 increased and the share of admixed accessions decreased. Finally, from the core subset of 20 accessions onwards, accessions from cluster 3, which was composed of seed-propagated material from Algeria and Tunisia, as well as admixed accessions were the ones mostly retained.
In conclusion, the approach followed in this work made it possible to develop a small representative sample of apricot accessions, which will make it easier to search for allelic variation in genes of interest and provides a more efficient basis for future genetic association studies.
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1479262115000313
Acknowledgements
The authors would like to thank Dr Ali Mamouni and Dr Samia Trabelsi for kindly providing apricot leaf material from Morocco and Algeria, respectively, and Dr Sylvain Santoni and A. Weber for their help in microsatellite genotyping. This work was supported by a bilateral Franco-Tunisian initiative within the framework of the CMCU project (05G0904), with helpful assistance from the Tunisian ‘Ministère de l'Enseignement Supérieur et de la Recherche Scientifique’.