Presize: an approach for precise estimation of core collection size using the Similarity Elimination (SimEli) method

Published online by Cambridge University Press:  16 September 2014

R. Ramesh Krishnan
Affiliation:
Molecular Biology Laboratory-1, Host Plant Improvement, Central Sericultural Research and Training Institute, Srirampura, Manandavadi Road, Mysore 570008, Karnataka, India
B. B. Bindroo
Affiliation:
Director, Central Sericultural Research and Training Institute, Srirampura, Manandavadi Road, Mysore 570008, India
V. Girish Naik*
Affiliation:
Molecular Biology Laboratory-1, Host Plant Improvement, Central Sericultural Research and Training Institute, Srirampura, Manandavadi Road, Mysore 570008, Karnataka, India
*Corresponding author. E-mail: vgirishnaik@yahoo.com

Abstract

Core collections are an integral part of biotechnology-aided modern-day crop improvement programmes and are utilized for a variety of applications including conventional plant breeding, association mapping and resequencing, among others. Since their advent, determination of core collection size has been based on the size of the whole collection. In this study, we precisely estimated the size of the core collection based on the diversity of the whole collection using the Similarity Elimination method. For each elimination cycle, allele retention and pairwise and mean genetic distances were calculated and used as the criteria for the precise estimation of the core collection size. We sampled a coconut core collection with 266 entries by retaining the diversity of the whole collection. During the elimination process, accessions with very rare alleles were eliminated first when compared with those having rare and common alleles. Therefore, our results support the hypothesis that the less frequent alleles seldom contribute to the genetic distance when compared with common alleles. In conclusion, presize can be efficiently utilized in any crop for the precise estimation of core collection size.

Type
Short Communication
Copyright
Copyright © NIAB 2014 

Introduction

Core collections are sampled and utilized for a variety of applications in crop improvement programmes. The diversity and size of the core collection play a crucial role in its effective utilization. A good core collection should represent maximum diversity in a minimum number of entries, without containing similar accessions (Krishnan et al., 2014). An array of sampling methodologies is in practice, including stratum-based methods, genetic distance sampling (Jansen and van Hintum, 2007), the maximization method (Schoen and Brown, 1993), Core Hunter (De Beukelaer et al., 2012), genetic distance optimization (Odong et al., 2011), Groupwise sampling (Guruprasad et al., 2014) and the Similarity Elimination (SimEli) method (Krishnan et al., 2014). All of these methodologies sample either a diverse or a representative core collection (Odong et al., 2013). A diverse core collection retains maximum diversity in a minimum number of entries, whereas a representative core collection preserves the genetic structure of the whole collection.

Since the advent of the core collection, the size of the core subset has been determined based on the size of the whole collection. It is assumed that the size of the core collection is proportional to the diversity of the whole collection (Bhattacharjee et al., 2006). The majority of studies have sampled 5–20% of entries irrespective of the diversity of the whole collection (Reddy et al., 2005). In some cases, core collections of different sizes (e.g. 10, 20 and 30%) were sampled and the best-performing core collection among them was selected (Wang et al., 2007). In this study, we precisely estimate the size of the core collection based on the diversity of the whole collection using the SimEli method. This approach was developed based on our previously reported SimEli methodology (Krishnan et al., 2014). Therefore, we refer the reader to the SimEli article as a prerequisite for the proper understanding of this work.

Experimental

All computations were carried out in R (R Development Core Team, 2013) using either appropriate packages or our custom scripts. A genotypic dataset of 1014 coconut accessions profiled using 30 simple sequence repeat (SSR) markers was utilized in this study (Odong et al., 2011; Krishnan et al., 2014). SSR marker allele data were converted into allele frequencies and used in the calculation of modified Rogers' genetic distance. The SimEli method accepts any pairwise genetic distance between accessions in the whole collection and involves two steps: (1) selection criterion – the pair of accessions having the least distance is identified and (2) elimination criterion – one accession of the pair is eliminated based on one of several elimination criteria. In this study, the ‘accession to rest of the accession’ distance was used as the elimination criterion. Pairwise and mean genetic distances, and allele retention were measured for each elimination cycle in the SimEli method (Fig. 1). We repeated this elimination cycle until all the accessions in the whole collection were eliminated. Our aim was to monitor the rate of change in diversity measures during the elimination process, and to determine the precise size of the core collection, in which the diversity of the whole collection is retained in a minimum number of entries.
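The elimination cycle described above can be illustrated with a minimal Python sketch (the study itself used R). The function names `modified_rogers` and `simeli_trace` are hypothetical, and two points are assumptions rather than details stated here: that the member of the closest pair with the smaller mean distance to the remaining accessions is the one eliminated, and that the "pairwise genetic distance" recorded per cycle is the distance of the selected pair. The loop stops when one accession remains.

```python
import numpy as np

def modified_rogers(freq, n_loci):
    """Pairwise modified Rogers' distance.

    freq: (n_accessions, n_alleles) allele frequencies, all loci concatenated.
    d(x, y) = sqrt( sum_alleles (x - y)^2 / (2 * n_loci) )
    """
    gram = freq @ freq.T
    sq_norm = np.diag(gram)
    # Squared Euclidean distance via the Gram matrix (avoids a 3-D array).
    sq = sq_norm[:, None] + sq_norm[None, :] - 2.0 * gram
    sq = np.maximum(sq, 0.0)          # guard against tiny negative round-off
    np.fill_diagonal(sq, 0.0)
    return np.sqrt(sq / (2.0 * n_loci))

def simeli_trace(dist, freq):
    """Eliminate one accession per cycle and record diversity measures.

    Returns (order, trace); each trace row is
    (core_size, alleles_retained, closest_pair_distance, mean_distance).
    """
    alive = list(range(dist.shape[0]))
    order, trace = [], []
    while len(alive) > 1:
        sub = dist[np.ix_(alive, alive)].astype(float)
        # Selection criterion: the pair with the least distance.
        np.fill_diagonal(sub, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        pair_d = float(sub[i, j])
        # Elimination criterion (assumed direction): drop the member of the
        # pair that is, on average, closer to the rest of the accessions.
        np.fill_diagonal(sub, 0.0)
        rest_mean = sub.sum(axis=1) / (len(alive) - 1)
        drop = i if rest_mean[i] <= rest_mean[j] else j
        order.append(alive.pop(drop))
        # Diversity measures of the accessions that remain.
        n = len(alive)
        alleles = int((freq[alive].sum(axis=0) > 0).sum())
        if n > 1:
            mean_d = float(dist[np.ix_(alive, alive)].sum() / (n * (n - 1)))
        else:
            mean_d = 0.0
        trace.append((n, alleles, pair_d, mean_d))
    return order, trace

# Toy data (hypothetical): 4 accessions, 1 locus with 3 alleles.
toy_freq = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
toy_order, toy_trace = simeli_trace(modified_rogers(toy_freq, n_loci=1), toy_freq)
```

In the toy data the first two accessions are identical, so one of them is eliminated in the first cycle without any loss of alleles.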

Fig. 1 Rate of change in allele retention, and pairwise and mean genetic distances during the elimination cycle. Green line represents pairwise genetic distance, black line represents mean genetic distance and blue line represents the number of alleles retained during each elimination cycle.

Results and discussion

A total of 173 alleles were recorded in the whole collection, and all the alleles were retained until the size of the core collection reached 532 accessions. One allele was lost when the 532nd accession was removed, and the rest of the alleles were retained until the size of the core collection was reduced to 266 accessions (26.23% of the whole collection). The eliminated allele is a very rare allele, which was recorded in only one of the 1014 coconut accessions. It was probably recorded due to a scoring error or non-specific amplification. Once the core collection was reduced below 266 accessions, allele retention decreased progressively in each subsequent elimination cycle (Fig. 1). Alleles with very low frequency were eliminated initially, followed by rare alleles and, finally, common alleles (Fig. 2). Our results support the hypothesis that a marker with rare and very rare alleles contributes very little to the genetic distance when compared with one having more common alleles. Moreover, the utility of these very rare alleles in crop improvement programmes is debatable (Odong et al., 2011; Zhang et al., 2011).
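As an illustration of how a core size could be read off such an allele retention curve, the following Python sketch returns the smallest core size that still retains a chosen number of alleles. The function name `estimate_core_size`, the `tolerated_loss` parameter and the toy trace are hypothetical and are not the coconut data; only the final percentage uses figures reported above.

```python
def estimate_core_size(trace, total_alleles, tolerated_loss=0):
    """Smallest core size retaining at least total_alleles - tolerated_loss alleles.

    trace: iterable of (core_size, alleles_retained) pairs, one per cycle.
    Returns None if no recorded size meets the target.
    """
    target = total_alleles - tolerated_loss
    sizes = [size for size, alleles in trace if alleles >= target]
    return min(sizes) if sizes else None

# Hypothetical toy trace: (core size, alleles retained).
toy_trace = [(6, 5), (5, 5), (4, 4), (3, 4), (2, 3), (1, 1)]
strict = estimate_core_size(toy_trace, total_alleles=5)                       # 5
relaxed = estimate_core_size(toy_trace, total_alleles=5, tolerated_loss=1)    # 3

# Share of the whole collection for the size reported in this study.
core_percent = round(100 * 266 / 1014, 2)                                     # 26.23
```

Setting `tolerated_loss=1` mirrors the situation described above, in which a single very rare allele is allowed to drop out before the core size is fixed.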

Fig. 2 Rate of change in allele retention. A total of 173 alleles in the whole collection were grouped into ten quantiles with an equal number of alleles. Retention of alleles in the ten quantiles was measured during the elimination cycle.
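The grouping behind Fig. 2 can be sketched in Python as follows, assuming that alleles are ranked by their frequency in the whole collection before being split into groups. The function names are hypothetical, and because 173 alleles do not divide evenly by ten, `numpy.array_split` yields groups that differ in size by at most one.

```python
import numpy as np

def allele_quantile_groups(allele_freq, n_groups=10):
    """Rank alleles by whole-collection frequency (rarest first) and split
    the ranked indices into n_groups groups of nearly equal size."""
    ranked = np.argsort(allele_freq, kind="stable")
    return np.array_split(ranked, n_groups)

def retention_by_group(groups, present):
    """Fraction of alleles in each group still present in the current core.

    present: boolean vector, True where an allele occurs in the core.
    """
    return [float(present[g].mean()) for g in groups]

# Hypothetical toy example: 4 alleles, 2 groups.
toy_freq = np.array([0.5, 0.01, 0.2, 0.05])
toy_groups = allele_quantile_groups(toy_freq, n_groups=2)
toy_retention = retention_by_group(toy_groups, np.array([True, False, True, True]))
```

Calling `retention_by_group` after every elimination cycle gives one retention curve per frequency group.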

During the initial elimination cycles, the mean genetic distance of the core collection showed a steady decrease due to the elimination of similar diverse accessions. These similar diverse accessions are genetically distant from the rest of the collection and are likely to be genetically close to each other. After the elimination of these similar diverse entries, the mean genetic distance showed a steady increase until the size of the core collection reached 19 accessions. The rate of change in pairwise genetic distance was maximum between the initial and final elimination cycles, indicating that the majority of the accessions were separated by a genetic distance of 0.4–0.6.

On the basis of these observations, core collection size can be precisely estimated by monitoring the change in allele retention, and mean and pairwise genetic distances. To achieve maximum allelic richness, the size of the coconut core collection can be set to 266 (26.23%), where all the SSR alleles in the whole collection were retained in the core collection. As has been discussed previously (Krishnan et al., 2014), allelic richness of the core collection can be increased by using expected heterozygosity (He) as the elimination criterion. For sampling distant entries, pairwise and mean genetic distances can be used instead of allelic richness to determine the size of the core collection. Therefore, the selection of this criterion should be based on the objective of the core collection, such as whether to sample a collection with high allelic richness or high pairwise genetic distance among the entries. The presize approach can be efficiently utilized to study the contribution of trait variations or alleles to the diversity of the whole collection and to precisely estimate the size of the core collection.
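For reference, expected heterozygosity is commonly computed per locus as one minus the sum of squared allele frequencies, averaged over loci. The Python sketch below follows that standard definition; the function name, the `locus_of_allele` labelling and the equal weighting of accessions when pooling frequencies are assumptions, and how He enters the elimination step is described in the SimEli article rather than here.

```python
import numpy as np

def expected_heterozygosity(freq, locus_of_allele):
    """Mean over loci of 1 - sum(p^2).

    freq: (n_accessions, n_alleles) allele frequencies per accession.
    locus_of_allele: length n_alleles array giving the locus of each column.
    Frequencies are pooled by averaging over accessions (assumed weighting).
    """
    locus_of_allele = np.asarray(locus_of_allele)
    pooled = np.asarray(freq, dtype=float).mean(axis=0)
    per_locus = [
        1.0 - float(np.sum(pooled[locus_of_allele == locus] ** 2))
        for locus in np.unique(locus_of_allele)
    ]
    return float(np.mean(per_locus))

# Hypothetical toy example: locus 0 has two alleles, locus 1 is fixed.
toy_freq = np.array([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0]])
toy_he = expected_heterozygosity(toy_freq, [0, 0, 1])
```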

Acknowledgements

The authors thank the reviewers for their constructive and insightful comments. They thank the Generation Challenge Program for providing the coconut dataset in the public domain (http://gcpcr.grinfo.net). The authors also thank all the researchers involved in the generation of the dataset. They acknowledge the use of the adegenet, cluster, ggplot2, Hmisc and reshape2 R packages.

References

Bhattacharjee, R, Khairwal, IS, Bramel, PJ and Reddy, KN (2006) Establishment of a pearl millet [Pennisetum glaucum (L.) R. Br.] core collection based on geographical distribution and quantitative traits. Euphytica 155: 35–45.
De Beukelaer, H, Smýkal, P, Davenport, GF and Fack, V (2012) Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search. BMC Bioinformatics 13: 312.
Guruprasad, Krishnan, RR, Dandin, S and Naik, VG (2014) Groupwise sampling: a strategy to sample core entries from RAPD marker data with application to mulberry. Trees 28: 723–731.
Jansen, J and van Hintum, T (2007) Genetic distance sampling: a novel sampling method for obtaining core collections using genetic distances with an application to cultivated lettuce. Theoretical and Applied Genetics 114: 421–428.
Krishnan, RR, Sumathy, R, Ramesh, SR, Bindroo, BB and Naik, VG (2014) SimEli: Similarity Elimination method for sampling distant entries in development of core collections. Crop Science 54: 1–9.
Odong, TL, van Heerwaarden, J, Jansen, J, van Hintum, TJL and van Eeuwijk, FA (2011) Statistical techniques for defining reference sets of accessions and microsatellite markers. Crop Science 51: 2401.
Odong, TL, Jansen, J, van Eeuwijk, FA and van Hintum, TJL (2013) Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. Theoretical and Applied Genetics 126: 289–305.
R Development Core Team (2013) Version 2.12.3. R Foundation for Statistical Computing, Vienna, Austria.
Reddy, LJ, Upadhyaya, HD, Gowda, CLL and Singh, S (2005) Development of core collection in pigeonpea [Cajanus cajan (L.) Millspaugh] using geographic and qualitative morphological descriptors. Genetic Resources and Crop Evolution 52: 1049–1056.
Schoen, DJ and Brown, AH (1993) Conservation of allelic richness in wild crop relatives is aided by assessment of genetic markers. Proceedings of the National Academy of Sciences of the United States of America 90: 10623–10627.
Wang, JC, Hu, J, Xu, HM and Zhang, S (2007) A strategy on constructing core collections by least distance stepwise sampling. Theoretical and Applied Genetics 115: 1–8.
Zhang, H, Zhang, D, Wang, M, Sun, J, Qi, Y, Li, J, Wei, X, Han, L, Qiu, Z, Tang, S and Li, Z (2011) A core collection and mini core collection of Oryza sativa L. in China. Theoretical and Applied Genetics 122: 49–61.