Experimental
As decisions about the rejuvenation of seed lots conserved in ex situ collections are primarily based on germination test results, knowledge about the reliability of these results is of crucial importance. Rejuvenation is a costly activity that carries the risks of genetic drift, genetic shift and contamination, all of which compromise authenticity (van Hintum et al., 2007; van de Wouw et al., 2011). Therefore, unnecessary rejuvenation based on unreliable germination test results should be avoided, while samples that do need rejuvenation should be reliably identified.
In publications that analyse seed storage behaviour in genebanks (e.g. Walters et al., 2005; Nagel et al., 2009), the reliability of the germination test results has not been discussed.
Germination tests are used as a proxy for seed viability. The ‘Genebank Standards’ (FAO/IPGRI, 1994) recommend seed viability testing soon after receipt of a seed lot, followed by retesting at intervals of 5 or 10 years, depending on the expected storage life or the initial germination. Owing to high testing costs and the substantial number of seeds needed for testing, genebanks generally test at a lower frequency. For example, the Centre for Genetic Resources, the Netherlands (CGN) tests at approximately half to one-third of the recommended frequency.
CGN outsources its germination tests to ISTA-certified testing agencies in the Netherlands. They apply ISTA protocols to the genebank material, using the options these protocols offer to break dormancy and to extend the observation period. Only in specific cases are other protocols used, such as for potato, where the temperature and dormancy-breaking methods are different. A consistent deviation is that usually only 200 instead of 400 seeds of genebank accessions are tested in order to avoid rapid seed depletion. At CGN, since 2001, 5–10% of the annually tested seed lots have been retested in order to obtain reliability estimates. The 641 accessions in the randomly chosen and anonymized subsets involved all 25 CGN crops, ranging from a single sample of lamb's lettuce to 145 lettuce samples.
Pairwise plotting of the test results revealed large discrepancies (Fig. 1). The probability that pairwise differences were due to sampling effects was tested with the χ² distribution, using a two-tailed test. This allowed us to focus on the pairs whose difference exceeded what chance alone could explain at the 5% significance level. Since small pairwise differences may already be significant near the margins of the distribution range, further analysis was limited to those significantly different pairs where at least one value was smaller than 90%. These pairs, designated as ‘suspicious’, comprised 116 (18.1%) of the 641 seed lots, whereas at the significance level used at most 5% would be expected from chance alone.
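The screening rule described above can be sketched in a few lines. The text does not specify the exact form of the χ² test, so this minimal sketch assumes a Pearson χ² test on the 2×2 table of germinated and non-germinated seeds, without continuity correction and with 200 seeds per test; the function names `chi2_pair_pvalue` and `is_suspicious` are hypothetical.

```python
import math


def chi2_pair_pvalue(germ1, germ2, n1=200, n2=200):
    """P-value of a Pearson chi-square test (1 df) comparing two germination counts.

    Assumption: 2x2 table of germinated / non-germinated seeds, no continuity
    correction. The source only states that a chi-square test was used.
    """
    total = n1 + n2
    col_totals = (germ1 + germ2, total - germ1 - germ2)
    if col_totals[0] == 0 or col_totals[1] == 0:
        return 1.0  # both tests all-or-nothing and identical: no difference to test
    chi2 = 0.0
    for row, n in (((germ1, n1 - germ1), n1), ((germ2, n2 - germ2), n2)):
        for observed, col_total in zip(row, col_totals):
            expected = n * col_total / total
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_suspicious(germ1, germ2, n1=200, n2=200, alpha=0.05, ceiling=0.90):
    """Significant pairwise difference AND at least one result below 90%."""
    significant = chi2_pair_pvalue(germ1, germ2, n1, n2) < alpha
    one_below_ceiling = min(germ1 / n1, germ2 / n2) < ceiling
    return significant and one_below_ceiling
```

Under these assumptions, a pair of 80% and 65% (160 and 130 of 200 seeds) is flagged, whereas a pair of 99% and 94% (198 and 188 of 200 seeds) is significantly different but not flagged, because both values are at least 90%.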
To compare among different crops, population types (cultivated, wild) and test years, a correction was made to avoid confounding of the underlying factors. The chance $P_{CPY}$ of a crop C with population type P tested in year Y generating a suspicious result was calculated as:
$$P_{CPY} = P_0 + P_C + P_P + P_Y + \varepsilon$$
where $P_0$ represents the overall probability of a suspicious result (18.1%), the factors $P_C$, $P_P$ and $P_Y$ represent the additional probabilities based on the crop, population type and year, respectively, and $\varepsilon$ is a normally distributed error. The factors were estimated by minimizing the variance of $\varepsilon$, i.e. the sum of the squared differences between the expected and observed number of suspicious results at the different estimated factor levels. Subsequently, the corrected chance of suspicious results per crop ($P_0 + P_C$), population type ($P_0 + P_P$) or year ($P_0 + P_Y$) could be calculated (Table 1).
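As an illustration of how such an additive correction can be estimated, the sketch below fits an overall probability plus crop, population-type and year effects by least squares. The fitting algorithm is not described in the text; backfitting (repeatedly averaging the residuals per factor level) is used here as a simple stand-in, and it minimizes squared errors per record rather than per factor level. The function name `fit_additive` and the record layout are assumptions.

```python
def fit_additive(records, n_iter=200):
    """Least-squares fit of y = p0 + crop effect + type effect + year effect.

    records: list of (crop, pop_type, year, y), where y is 1 for a suspicious
    pair and 0 otherwise (proportions also work).
    Returns (p0, (crop_effects, type_effects, year_effects)).
    Assumption: backfitting is a stand-in; the source does not name its method.
    """
    p0 = sum(rec[3] for rec in records) / len(records)  # overall probability
    effects = ({}, {}, {})
    for rec in records:
        for f in range(3):
            effects[f][rec[f]] = 0.0
    for _ in range(n_iter):
        for f in range(3):
            sums = {level: 0.0 for level in effects[f]}
            counts = {level: 0 for level in effects[f]}
            for rec in records:
                # Residual after removing p0 and the other two factors
                others = sum(effects[g][rec[g]] for g in range(3) if g != f)
                sums[rec[f]] += rec[3] - p0 - others
                counts[rec[f]] += 1
            for level in sums:
                effects[f][level] = sums[level] / counts[level]
    return p0, effects
```

The corrected chance for, say, a hypothetical crop level `"lettuce"` is then `p0 + effects[0]["lettuce"]`, which corresponds to the quantity $P_0 + P_C$ in the text.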
Table 1, note a: Data were corrected based on an additive model with crop, population type and year as factors.
The size of the pairwise differences also depended on the average germination level of the tested pair. For very high averages, e.g. 97%, the difference cannot be larger than 6% (100 and 94% for the first and second test, respectively). For this reason, the pairwise differences were classified into ten groups of approximately similar size according to the average germination result of the pairs, and the standard deviation of the pairwise differences was calculated for each group. Dividing these values by √2 gave an estimate of the standard deviation of the errors in the germination results (Supplementary Fig. S1, available online only at http://journals.cambridge.org). For a group with average germination between 77.5 and 82.5%, including 49 pairs of results, the standard deviation of the errors in the germination results was 8.0%. Based on the binomial distribution, a sample size of 200 seeds and a germination level of 80%, a value of 2.8% would be expected if the error were due to sampling effects only.
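The division by √2 follows from the fact that the difference of two independent test results, each with error standard deviation σ, has variance 2σ². A minimal sketch of the grouping and estimation step is given below. It assumes the ten groups are formed by sorting the pairs on their average and splitting them into equally sized chunks, and that the sample standard deviation is used; the text does not specify either choice, and the function name `error_sd_by_group` is hypothetical.

```python
import math
import statistics


def error_sd_by_group(pairs, n_groups=10):
    """Estimate the error SD of a single germination test per germination level.

    pairs: list of (first_result, second_result) in percent.
    Returns a list of (mean germination level, estimated error SD) per group.
    Assumption: equal-count groups after sorting on the pair average.
    """
    ordered = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    size = len(ordered) / n_groups
    results = []
    for g in range(n_groups):
        group = ordered[round(g * size):round((g + 1) * size)]
        if len(group) < 2:
            continue  # a standard deviation needs at least two pairs
        diffs = [first - second for first, second in group]
        level = statistics.mean((first + second) / 2 for first, second in group)
        # SD of a difference of two independent errors is sqrt(2) * SD of one error
        results.append((level, statistics.stdev(diffs) / math.sqrt(2)))
    return results
```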
Discussion
The results of germination tests are of vital importance for genebank management decisions, and hence should be sufficiently reliable. Sample size is an important factor influencing reliability. For example, a recommended sample of 200 seeds (FAO/IPGRI, 1994) results in a sampling error of 2.8% at a germination level of 80%, while this value is 5.7% when only 50 seeds are used.
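The sampling errors quoted above follow from the binomial standard deviation of a proportion, √(p(1 − p)/n). The short sketch below reproduces them; the function name `sampling_sd_percent` is hypothetical, and the 400-seed case is included only because it is the standard ISTA sample size mentioned earlier.

```python
import math


def sampling_sd_percent(germination, n_seeds):
    """Binomial standard deviation of a germination percentage.

    germination: true germination as a fraction (e.g. 0.80).
    n_seeds: number of seeds tested.
    """
    return 100.0 * math.sqrt(germination * (1.0 - germination) / n_seeds)


for n_seeds in (50, 200, 400):
    # Prints 5.7 for 50 seeds, 2.8 for 200 seeds and 2.0 for 400 seeds
    print(n_seeds, round(sampling_sd_percent(0.80, n_seeds), 1))
```

Because the error shrinks only with the square root of the sample size, quadrupling the number of seeds halves the sampling error.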
For the tests reported here, observed error levels were substantially higher, e.g. 8.0% at 80% germination using an average number of 181.6 seeds. This implies that a sample with a true germination of 80% can show test results with a 95% confidence interval of 62.6 to 97.4%. The causes of these observed errors could be many. Some crops appeared more error prone than others, crop wild relatives appeared more difficult than crops, and in some testing years the discrepancies were substantially higher than in others (Table 1). Underlying causes may include dormancy, misjudgement of seedling health or unequal composition of different samples of a seed lot. Whether these factors explain the observed differences remains a subject of further study.
Most other well-established genebanks do not outsource their germination tests, which allows them to make optimal use of in-house experience. Our results indicate that genebanks in general should be aware of the potential problems due to low reliability of germination data, and therefore should critically examine their test procedures and the implications of the test results for decisions on genebank operations.
Acknowledgements
The authors would like to thank Liesbeth de Groot (CGN) for preparing the data from the germination control tests. The study was part of the Programme for Statutory Research Tasks regarding Genetic Resources (WOT-03-436) and the Fundamental Research Programme on Sustainable Agriculture (KB-12-005.03-003), both funded by the Dutch Ministry of Economic Affairs, Agriculture and Innovation. The authors are grateful to Bert Visser, Chris Kik and Liesbeth de Groot for their comments on an earlier version of the manuscript, and to two anonymous reviewers for their constructive suggestions for further improvements.