
Sample size and haplotype richness in population samples of the lichen-forming ascomycete Xanthoria parietina

Published online by Cambridge University Press:  06 August 2009

Louise LINDBLOM
Affiliation:
Department of Biology and Museum of Natural History, University of Bergen, P.O. Box 7800, NO-5020 Bergen, Norway. Email: louise.lindblom@bm.uib.no

Abstract

The fine-scale genetic variation and population structure of lichen-forming fungi are little known, and sampling strategies are rarely recommended or discussed. I tested whether the sample sizes of molecular data sets used in recent population studies of Xanthoria parietina revealed all haplotypes potentially present and, accordingly, quantified how many haplotypes were potentially missing in the samples. Data sets from two geographical regions in Scandinavia were concatenated, and I investigated whether the sampling reached saturation at two levels: 1) individual-based using rarefaction curves and 2) population-based using species accumulation curves. At both levels, the matrices of two molecular markers (IGS and ITS) were analysed separately. The molecular markers show similar and parallel patterns in all analyses. Rarefaction analyses did not reveal different patterns for populations in different habitats, i.e., bark and rock. Species accumulation curves estimated with the Chao 1 richness estimator indicated that 23% of the IGS and 8% of the ITS haplotypes were not detected. Corresponding figures from an abundance-based coverage estimator (ACE) were 37% and 18%. Pilot studies are recommended to determine appropriate sample sizes for genetic-based population studies of lichen-forming fungi.

Type
Research Article
Copyright
Copyright © British Lichen Society 2009

Introduction

The number of publications examining fine-scale genetic variation and population structure of lichen-forming fungi based on DNA variation has increased markedly in recent years (e.g., Zoller et al. 1999; Walser et al. 2005; Lindblom & Ekman 2006, 2007; Werth & Sork 2008). However, sampling strategies are rarely explicitly recommended or discussed. It would be unrealistic for the population geneticist to sample all individuals in a population to ensure that all haplotypes (haploid genotypes; the term is used here as equivalent to allele or combination of alleles) are observed. The only practical approach is to take samples from the total populations studied. Textbooks on population genetics are, however, reluctant to suggest sampling methods and ideal sample sizes. The number of individuals, both the number per population and total number, and the number of populations sampled in population studies should depend on the aims of the study and also on the organism studied. A few authors go so far as to note that it is appropriate to perform pilot studies in order to achieve optimal sampling (Baverstock & Moritz 1996). In the rare cases where a concrete number of sampled individuals is suggested in a publication, it ranges from four to six (when the differentiation between populations is high) or 20 to 40 (when the differentiation between populations is low) (Karp et al. 1998) and even up to hundreds of individuals (Weir 1996b). An unusually definitive number was given by Weir (1996a), who recommended that more than 20 individuals should be sampled, from at least five localities each, to provide for statistical testing via resampling methods. For fungi, McDonald (1997) stated that sample sizes of 10 individuals or less are usually meaningless, whereas samples of 30 to 100 individuals can be ‘quite reliable’. In practice it seems that between 20 and 30 individuals are often sampled [e.g., Rosquist & Prentice 2002 (20–30); Kull & Oja 2007 (15–25); Prentice et al. 2006 (up to 26); Werth et al. 2006 (up to 24); but see also Walser et al. 2005 (c. 45) and Werth & Sork 2008 (18)]; scientific or statistically well-founded arguments for the number chosen are rarely presented. Furthermore, it is almost never mentioned that sampling haploid organisms differs from sampling diploid organisms in that n haploid individuals will give n gene copies, whereas n diploid individuals will give 2n gene copies (twice as many). Many statistical methods make inferences from the number of gene copies and not from the number of individuals in the sample.

Various inventory methods to sample the area exhaustively are applied when field ecologists set out to determine the species diversity of a locality. Biologists conducting surveys of species richness have long recognised that it is virtually impossible to detect all species and their relative abundance within a limited number of samples. Completing an exhaustive total inventory that finds all species present is not possible. The invested sampling effort represents a trade-off between the cost as determined by time and effort involved in collecting and identifying individuals and the probability of encountering new species. The rate at which species are added to the inventory provides important clues about species richness (and the species abundance distribution) of the total assemblage (Magurran 2004). Haplotype richness (the number of haplotypes) in a population genetic study could be considered analogous to species richness (alpha diversity). Hence, the rate at which new haplotypes are added to the inventory would provide important clues about the haplotype richness of the population as a whole.

Lindblom & Ekman (2006, 2007) investigated the evolutionary dynamics of lichen-forming ascomycetes through the genetic diversity and fine-scale population structure of Xanthoria parietina by using sequence variation in two regions of the nuclear ribosomal DNA. Thirty-three individuals were sampled from each of 14 populations in two field study sites in Norway and Sweden. The sample size in each population was chosen following recommendations from members of the research group and a textbook (Karp et al. 1998) that approximately 25 individuals per population would probably be enough. Five individuals were then added to the number to secure a buffer should anything fail during molecular work in the laboratory (DNA extraction, PCR amplification, sequencing). Finally, three individuals were added, making a total sample size of 33. An anonymous reviewer on a version of one manuscript (Lindblom & Ekman 2006) remarked that “…the sample size per population is around the minimum amount suggested to be analyzed in text books on population genetics”, but did not indicate which textbooks were referred to. The number of populations (seven at each site) was chosen for practical reasons, because it was the maximum number that could be reached given the sample size, time, and funding for molecular work.

The objective of the present study is to assess whether the sample sizes of molecular data sets in recent population studies of Xanthoria parietina revealed all haplotypes potentially present and to quantify the potentially missing haplotypes in the samples using methods commonly applied in species richness studies. The results thus estimate the completeness of the haplotype sampling. I know of only one previous attempt to make similar estimates for populations of lichen-forming fungi (Printzen et al. 2003).

Methods

Data sources

Concatenated versions of molecular population data sets published by Lindblom & Ekman (2006, 2007) were analysed. The data sets were sampled from seven Xanthoria parietina populations in Norway (Lindblom & Ekman 2006) and seven in Sweden (Lindblom & Ekman 2007) and sample sizes from all populations were more or less equal (c. 30 thalli) (Table 1). One data set consisted of 439 sequences represented by 17 haplotypes of a part of the intergenic spacer (IGS) marker and the other data set consisted of 452 sequences represented by 21 haplotypes of the internal transcribed spacer (ITS) marker. Both IGS and ITS represent non-coding regions of the nuclear ribosomal DNA; hence they are putatively selectively neutral or near-neutral.

Table 1. Populations of Xanthoria parietina from which the two data sets (IGS and ITS) were obtained, the resulting number of sequences, total number of haplotypes, number of haplotypes observed only once (singletons), number of haplotypes observed twice (doubletons), habitat, and estimated population size (Lindblom & Ekman 2006, 2007).

Data analyses

To quantify the haplotypes found in relation to the total number of haplotypes present, I focused on two levels: 1) analyses of rarefaction curves based on individuals per population (individual-based) and 2) analyses of population-based species accumulation curves. The distinction between individual-based and population-based assessments was recognised and explained in detail by Gotelli & Colwell (2001) and is also explained in Magurran (2004). “Deducing the total number of existing haplotypes in populations based on the number of sampled haplotypes” was treated as analogous to “estimating species richness defined as total number of species (S) in species surveys”.

To quantify the number of haplotypes found in each population, rarefaction (individual-based re-sampling curves; Gotelli & Colwell 2001) as implemented by the vegan package in the software R 2.6.2 (R Development Core Team 2008) was used. The function rarefy is based on Hurlbert's (1971) formulation and gives the expected species richness in random subsamples of a given size drawn from the community (Oksanen et al. 2009). Using rarefaction, the haplotype richness for each population was analysed and depicted. Rarefaction curves are randomized plots of the number of observed species (here, haplotypes) against the number of observations. An estimate of the total number of haplotypes is obtained by fitting a curve to the observed data; the asymptotic value is equivalent to the total number of haplotypes.
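As an illustration of Hurlbert's formulation, the expected number of haplotypes in a random subsample of n individuals can be written as the sum, over haplotypes, of the probability that each haplotype appears at least once in the subsample. The following is a minimal Python sketch of that formula, not the vegan implementation used in the study; the function name and the haplotype counts are hypothetical and chosen only for illustration.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of haplotypes in a random subsample of n individuals
    drawn without replacement, following Hurlbert's (1971) formula."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the sample total")
    denom = comb(total, n)
    # For each haplotype: 1 minus the probability that it is absent
    # from the subsample (comb returns 0 when n exceeds total - c).
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


# Hypothetical haplotype counts for one population of 30 thalli
# (illustrative only; not data from the study).
counts = [20, 6, 2, 1, 1]

# Rarefaction curve: expected richness for every subsample size 1..30.
curve = [rarefy(counts, n) for n in range(1, sum(counts) + 1)]
```

Plotting curve against subsample size gives one rarefaction curve of the kind shown per population; a curve that is still rising at the full sample size suggests that further haplotypes remain undetected.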

To assess whether the total sample size was an accurate reflection of the total number of haplotypes present, population-based species accumulation curves (Gotelli & Colwell 2001; Magurran 2004) were applied. Analyses to test if the total population sampling reached saturation were performed using the software EstimateS version 8.0.0 (Colwell 2005). Estimations were calculated with 1000 randomizations and the default option ‘without replacement’. The software selects a single sample at random, computes the requested estimators and indices based on that sample, selects a second sample, re-computes the estimators using the pooled data from both samples, selects a third, re-computes, and so on until all samples in the matrix are included. Samples are added to the analysis in random order, without replacement, i.e., each sample is selected once. Each randomization accumulates the samples in a different order, but includes all samples. The final value for the averaged random-order species accumulation curve therefore matches the total number of observed species.
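The randomization procedure described above can be sketched in a few lines of Python. This is an illustrative re-implementation of the general idea (random sample order, pooling without replacement, averaging over runs), not the EstimateS code; the function name, the seed argument and the example populations are assumptions made for the sketch.

```python
import random


def accumulation_curve(samples, runs=1000, seed=0):
    """Mean number of distinct haplotypes after pooling 1, 2, ..., k samples,
    averaged over random sample orders (each sample used once per run)."""
    rng = random.Random(seed)
    k = len(samples)
    totals = [0] * k
    for _ in range(runs):
        order = list(range(k))
        rng.shuffle(order)          # a new random order for every run
        seen = set()
        for step, idx in enumerate(order):
            seen.update(samples[idx])   # pool this sample with the earlier ones
            totals[step] += len(seen)   # richness after step + 1 samples
    return [t / runs for t in totals]


# Hypothetical populations, each given as the set of haplotype labels
# observed in it (illustrative only; not data from the study).
populations = [{"A", "B"}, {"A", "C"}, {"A"}, {"B", "D"}]
curve = accumulation_curve(populations, runs=200)
```

Because every run includes all samples, the last point of the averaged curve always equals the total number of observed haplotypes, which is why the shape of the curve, rather than its end point, carries the information about saturation.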

In addition to species accumulation curves, there are two different statistical approaches: parametric methods and non-parametric methods (Magurran 2004). Non-parametric methods are generally considered more efficient and have been shown to be more accurate and precise, independent of sampling effort (Hortal et al. 2006). For an overview of non-parametric methods see Chazdon et al. (1998). To test how well the number of observed haplotypes in sampled populations reflected the total number of haplotypes present and, accordingly, how many haplotypes were potentially missing in the samples, two non-parametric methods were applied using the software EstimateS: 1) the Chao 1 richness estimator, and 2) the abundance-based coverage estimator (ACE). The Chao 1 richness estimator (Chao 1984; Colwell & Coddington 1994) estimates the mean among runs as well as a variance formula based on the lower and upper bounds of the log-linear confidence intervals (Chao 1987; Colwell 2005). The method is based on the concept that rare species carry the most information about missing ones (Chao 2005). Hence, species observed only once (singletons) or twice (doubletons) are used to estimate the number of missing species. ACE is an estimator first introduced by Chao & Lee (1992) and modified and introduced in the ecological literature by Chazdon et al. (1998). The total number of species in a community is estimated using a sample-coverage estimate: the proportion of all individuals in rare species that are not singletons (Chao 2005). Observed frequencies are separated into two groups, rare and abundant. For the abundant species, only presence/absence information is needed because they would always be detected. To define which species are rare, a cut-off value of ten is used; hence, a species (or in this case haplotype) is considered rare if it is observed ten times or fewer.
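The two estimators can be sketched in Python from their commonly published forms. This is a hedged illustration, not the EstimateS implementation: the choice of the classic Chao 1 formula with a bias-corrected fallback when there are no doubletons, the fallback of ACE to Chao 1 when the coverage estimate is zero, the function names, and the example counts are all assumptions made for this sketch.

```python
def chao1(counts):
    """Chao 1 richness estimate from per-haplotype abundance counts.
    Classic form S_obs + F1^2 / (2 * F2); when there are no doubletons the
    bias-corrected form S_obs + F1 * (F1 - 1) / 2 is used (an assumption)."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = counts.count(1)   # singletons
    f2 = counts.count(2)   # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def ace(counts, rare_cutoff=10):
    """Abundance-based coverage estimate (ACE) with the usual cut-off of ten."""
    counts = [c for c in counts if c > 0]
    rare = [c for c in counts if c <= rare_cutoff]
    s_abund = len(counts) - len(rare)   # abundant haplotypes: presence only
    s_rare = len(rare)
    n_rare = sum(rare)                  # individuals belonging to rare haplotypes
    f1 = rare.count(1)
    if n_rare == 0:
        return float(s_abund)
    coverage = 1 - f1 / n_rare          # share of rare individuals not singletons
    if coverage == 0:
        # All rare haplotypes are singletons: ACE is undefined, so this sketch
        # falls back to Chao 1 (an assumption of the sketch).
        return float(chao1(counts))
    # Squared coefficient of variation of the rare-haplotype abundances.
    gamma_sq = max(
        (s_rare / coverage) * sum(c * (c - 1) for c in rare)
        / (n_rare * (n_rare - 1)) - 1,
        0.0,
    )
    return s_abund + s_rare / coverage + (f1 / coverage) * gamma_sq


# Hypothetical pooled haplotype counts (illustrative only; not study data).
counts = [200, 100, 50, 5, 3, 2, 2, 1, 1, 1]
estimates = {"Sobs": len(counts), "Chao1": chao1(counts), "ACE": ace(counts)}
```

Both functions use only the abundance vector, so they can be applied either to a single population or to the pooled data set; the difference between each estimate and Sobs is the estimated number of undetected haplotypes.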

Results

The sampled population data sets and the number of haplotypes observed only once (singletons) or twice (doubletons) in each population sample are summarized in Table 1.

Rarefaction analyses for each population did not reveal different patterns for populations in different habitats of bark and rock (Fig. 1).

Fig. 1. Rarefaction curves for each of the 14 populations of Xanthoria parietina using haplotypes instead of species. A, IGS molecular marker; B, ITS molecular marker. Solid lines mark curves for populations sampled from rock, broken lines mark curves for populations sampled from bark.

Species accumulation curves failed to reach an asymptote for either marker (Fig. 2), indicating that sampling was not nearing completeness when all populations were included.

Fig. 2. Species accumulation curves (Sobs), using haplotypes of Xanthoria parietina instead of species, showing the mean cumulative number of haplotypes in relation to the number of sampled populations (each population corresponds to c. 30 individuals). The original sampling was permuted 1000 times. ♦, IGS molecular marker; 󰂟, ITS molecular marker.

For the IGS marker, analytical estimates performed with the Chao 1 richness estimator showed that 22 haplotypes potentially exist in the total community, whereas estimations performed with the ACE method gave 27 potential haplotypes. In the real data sets, 17 haplotypes were observed, which means that 63% (ACE) to 77% (Chao 1) of haplotypes potentially present were found with the sample size used (missing 23–37%). For the ITS marker, analytical estimates performed with the Chao 1 richness estimator showed that 25 haplotypes potentially exist in the total community, whereas estimations performed with the ACE method gave 28 potential haplotypes. In the real data sets, 23 haplotypes were observed, which means that 82% (ACE) to 92% (Chao 1) of haplotypes potentially present were found with the sample size used (missing 8–18%).
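The percentages above follow directly from dividing the observed number of haplotypes by each estimated total. A short Python check of that arithmetic, using only the figures reported in this section (the function and variable names are illustrative):

```python
def percent_detected(observed, estimated):
    """Share of the estimated haplotype total that was actually observed."""
    return 100 * observed / estimated


# (observed, estimated) pairs as reported in the Results.
reported = {
    ("IGS", "Chao 1"): (17, 22),
    ("IGS", "ACE"): (17, 27),
    ("ITS", "Chao 1"): (23, 25),
    ("ITS", "ACE"): (23, 28),
}

# Rounded percentage detected and missing for each marker and estimator.
summary = {
    key: (round(percent_detected(obs, est)), 100 - round(percent_detected(obs, est)))
    for key, (obs, est) in reported.items()
}

for (marker, estimator), (found, missing) in summary.items():
    print(f"{marker} {estimator}: {found}% detected, {missing}% missing")
```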

Discussion

This study was performed to quantify the population sampling effort of a previous project (Lindblom & Ekman 2006, 2007) and to examine to what extent the haplotypes potentially present in the populations were sampled and how many haplotypes were potentially missing from the sample.

In the project it was shown that bark populations of X. parietina contain a higher number of haplotypes than rock populations (Lindblom & Ekman 2006, 2007). Hence, I expected the rarefaction analyses to reveal differences between habitats in how successfully the sampling covered all haplotypes present in the populations. Rarefaction analyses for each population showed that this is not the case (Fig. 1). In the analyses of both IGS and ITS, the rarefaction curves seem to form two distinct clusters. However, these clusters do not unequivocally correspond to the curves from each habitat, bark and rock. The populations in the lower cluster of both IGS (Fig. 1A) and ITS (Fig. 1B) are closer to reaching saturation at the sample size used. The lower cluster consists of populations from both bark, with a higher number of haplotypes, and rock, with a lower number of haplotypes.

Population-based species accumulation curves indicated that the total sampling of populations was not entirely satisfactory (Fig. 2). There was a slight tendency for both curves to approach an asymptote, but they did not show that sampling was complete. A recent study (Printzen et al. 2003), using a similar approach on three population samples in the range of 39 to 212 individuals, gave similar results, i.e., that saturation was not reached. The authors argued that saturation was reached in one of the three areas, the one with the smallest sample size, but this is virtually impossible to observe in the illustration provided (Printzen et al. 2003: 1478).

What percentage of the haplotypes actually present in the populations was sampled? How many haplotypes were potentially missing in the total sample? The analytical methods give slightly different answers. In this study, the number of putatively present haplotypes estimated by ACE was higher than for Chao 1 for both molecular markers. Whereas Chao 1 indicated that 23% of the IGS and 8% of the ITS haplotypes were not observed, the corresponding figures from ACE were 37% and 18%. Both approaches are non-parametric methods, which are considered more efficient (Magurran 2004) and were recently demonstrated to be more accurate and precise, independent of sampling effort (Hortal et al. 2006). The difference between the two markers (IGS and ITS) concerning how many haplotypes are missing has not been demonstrated before, and is puzzling. One possible explanation is that the rate at which new haplotypes are encountered with increasing sample size is lower for IGS because of a more patchy distribution, i.e., less random due to spatial autocorrelation, than for ITS. Chazdon et al. (1998) found that Chao 1 and ACE measures are sensitive to patchiness. Since there is an ecological patchiness in the sampling (populations were growing either on bark or on rock) and population differentiation has been shown previously (Lindblom & Ekman 2006, 2007), results from the present study could indicate that IGS haplotypes are, in some way, more dependent on substratum than ITS. Neither of the two markers used here is coding and, hence, both are regarded as selectively neutral or near-neutral. It is not known how such a correlation could arise.

For field collectors making species inventories, a so-called ‘stopping rule’ is invaluable. This is an indication of the point beyond which further sampling is no longer necessary or quite simply too costly (see an overview in Magurran 2004). In the present case, where the number of haplotypes is counted instead of species, a stopping rule would be extremely difficult if not impossible to apply since field sampling and haplotype identification are necessarily temporally and spatially separated by the laboratory work that is done. In the studies investigated here (Lindblom & Ekman 2006, 2007), using a sample size of approximately 30 individuals seems to have covered a substantial part of, but not all, haplotypes present in the populations. Population genetic analysis methods normally account for uncertainty due to sample size. However, I think that it is important to assess the minimum number of samples per population and/or number of populations needed to provide reasonable statistical power. Pilot studies are strongly recommended for genetic-based population studies of lichen-forming fungi.

I am grateful to John-Arvid Grytnes for generous advice and comments on several versions of the manuscript and to Cathy Jenks for language corrections. Stefan Ekman is thanked for suggesting the Xanthoria parietina project back in 2000.

References

Baverstock, P. R. & Moritz, C. (1996) Project design. In Molecular Systematics (Hillis, D. M., Moritz, C. & Mable, B. K., eds): 17–27. Sunderland: Sinauer.
Chao, A. (1984) Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11: 265–270.
Chao, A. (1987) Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43: 783–791.
Chao, A. (2005) Species richness estimation. In Encyclopedia of Statistical Sciences (Balakrishnan, N., Read, C. B. & Vidakovic, B., eds): 7909–7916. New York: Wiley.
Chao, A. & Lee, S.-M. (1992) Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87: 210–217.
Chazdon, R. L., Colwell, R. K., Denslow, J. S. & Guariguata, M. R. (1998) Statistical methods for estimating species richness of woody regeneration in primary and secondary rain forests of NE Costa Rica. In Forest Biodiversity Research, Monitoring and Modeling: Conceptual Background and Old World Case Studies (Dallmeier, F. & Comiskey, J. A., eds): 285–309. Paris: Parthenon Publishing.
Colwell, R. K. (2005) EstimateS: Statistical Estimation of Species Richness and Shared Species from Samples. Version 8.0.0. Persistent url: <purl.oclc.org/estimates>
Colwell, R. K. & Coddington, J. A. (1994) Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society London B 345: 101–118.
Gotelli, N. J. & Colwell, R. K. (2001) Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters 4: 379–391.
Hortal, J., Borges, P. A. V. & Gaspar, C. (2006) Evaluating the performance of species richness estimators: sensitivity to sample grain size. Journal of Animal Ecology 75: 274–287.
Hurlbert, S. H. (1971) The non-concept of species diversity: a critique and alternative parameters. Ecology 52: 577–586.
Karp, A., Isaac, P. G. & Ingram, D. S. (1998) Molecular Tools for Screening Biodiversity: Plants and Animals. London: Chapman & Hall.
Kull, T. & Oja, T. (2007) Low allozyme variation in Carex loliacea (Cyperaceae), a declining woodland sedge. Annales Botanici Fennici 44: 267–275.
Lindblom, L. & Ekman, S. (2006) Genetic variation and population differentiation in Xanthoria parietina on the island Storfosna, central Norway. Molecular Ecology 15: 1545–1559.
Lindblom, L. & Ekman, S. (2007) New evidence corroborates population differentiation in Xanthoria parietina. Lichenologist 39: 259–271.
Magurran, A. E. (2004) Measuring Biological Diversity. Malden: Blackwell Publishing.
McDonald, B. A. (1997) The population genetics of fungi: tools and techniques. Phytopathology 87: 448–453.
Prentice, H. C., Lönn, M., Rosquist, G., Ihse, M. & Kindström, M. (2006) Gene diversity in a fragmented population of Briza media: grassland continuity in a landscape context. Journal of Ecology 94: 87–97.
Printzen, C., Ekman, S. & Tønsberg, T. (2003) Phylogeography of Cavernularia hultenii: evidence of slow genetic drift in a widely disjunct lichen. Molecular Ecology 12: 1473–1486.
R Development Core Team (2008) R: a Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rosquist, G. & Prentice, H. C. (2002) Genetic variation in Scandinavian Anthericum liliago (Anthericaceae): allopolyploidy, hybridization and immigration history. Plant Systematics and Evolution 236: 55–72.
Walser, J.-C., Holderegger, R., Gugerli, F., Hoebbe, S. E. & Scheidegger, C. (2005) Microsatellites reveal regional population differentiation and isolation in Lobaria pulmonaria, an epiphytic lichen. Molecular Ecology 14: 457–467.
Weir, B. S. (1996a) Intraspecific differentiation. In Molecular Systematics (Hillis, D. M., Moritz, C. & Mable, B. K., eds): 385–405. Sunderland: Sinauer.
Weir, B. S. (1996b) Genetic Analysis II: Methods for Discrete Population Genetic Data. Sunderland: Sinauer.
Werth, S. & Sork, V. L. (2008) Local genetic structure in a North American epiphytic lichen, Ramalina menziesii (Ramalinaceae). American Journal of Botany 95: 568–576.
Werth, S., Wagner, H. H., Holderegger, R., Kalwij, J. M. & Scheidegger, C. (2006) Effect of disturbances on the genetic diversity of an old-forest associated lichen. Molecular Ecology 15: 911–921.
Zoller, S., Lutzoni, F. & Scheidegger, C. (1999) Genetic variation within and among populations of the threatened lichen Lobaria pulmonaria in Switzerland and implications for its conservation. Molecular Ecology 8: 2049–2059.