Introduction
Many studies related to seed germination behaviour focus on predictive models such as the hydrothermal time (HTT) model (Gummerson, Reference Gummerson1986; Bradford, Reference Bradford2002; Finch-Savage, Reference Finch-Savage, Benech-Arnold and Sanchez2004; Watt and Bloomberg, Reference Tribouillois, Dürr, Demilly, Wagner and Justes2012; Mesgaran et al., Reference Mesgaran, Mashhadi, Alizade, Hunt, Young and Cousens2013; Hay et al., Reference Hay, Mead and Bloomberg2014; Hardegree et al., Reference Hardegree, Walter, Boehm, Olsoy, Clark and Pierson2015). HTT models describe the time course of seed germination for a population of seeds germinated under specific temperature (T) and water potential (Ψ) conditions, since these are the two main factors determining the speed of seed germination and the proportion of the seed population that completes germination (Gummerson, Reference Gummerson1986; Roberts, Reference Roberts1988; Probert, 2000; Bradford, Reference Bradford2002). The germination experiment usually consists of many identical dishes with n seeds each, which are subject to factorial combinations of constant water potential and temperature. Germination of seeds in the dishes is monitored at pre-specified time points. The cumulative proportion of seeds germinated in each dish at each time point is then recorded, and the cumulative proportion of the population that has completed germination (g(t)) is modelled in the HTT model as a function of time since sowing (t) and seedbed water potential and temperature. The parameters of the HTT model are usually estimated using either a linear regression (e.g. Bradford, Reference Bradford2002) or non-linear least squares estimation (e.g. Watt et al., Reference Watt, Xu and Bloomberg2010) with a suitably transformed response, or using a generalized linear regression model with a suitable link function (Hay et al., Reference Hay, Mead and Bloomberg2014)].
There are, however, several problems with the above approaches. Firstly, while it is explicitly assumed that the new seeds have germinated exactly at the time of observation, in fact they must have germinated at any time between the current and the previous time of observation. Secondly, by analyzing proportions rather than binomial counts, one loses information. For example, although 1 out of 2 and 1000 out of 2000 both refer to the same proportion of 50%, the uncertainty (i.e. standard error) associated with this estimate is 10 times larger for the first case. It is therefore important, to include both germination counts and the total number of seeds involved in a germination experiment. The censoring inherent in the experiment, because observation stops at some time and the fate of the remaining seeds is unknown, is also not accounted for. Finally, there is always a lag period in a germination experiment after the seeds are sown, but before germination occurs. It is common to drop these initial ‘nil’ observations, thus losing further information. To solve these problems, several authors have proposed the use of survival likelihood rather than analysis of cumulative germination (McNair et al., Reference McNair, Sunkara and Frobish2012; Ritz et al., Reference Ritz, Pipper and Streibig2013; Hay et al., Reference Hay, Mead and Bloomberg2014; Onofri et al., Reference Onofri, Benincasa, Mesgaran and Ritz2018).
A further problem with modelling such germination experiments is that the observations are not independent and identically distributed (i.i.d.), but rather correlated within the dish. Onofri et al. (Reference Onofri, Mesgaran, Neve and Cousens2014) suggest the use of jackknife resampling to remedy this. It can be applied within the survival modelling framework in a way similar to the non-linear modelling framework. However, jackknife resampling is a non-parametric approach, which does not require the modeller to state explicitly which parameters differ from dish to dish and which stay constant. The two-stage approach proposed by Jensen et al. (Reference Jensen, Andreasen, Streibig, Keshtkar and Ritz2017) uses a weighted mixed-effects model on germination curve parameter estimates to account for that. However, the authors mention that the two-step analysis has to be carried out separately for each parameter of interest and may not optimally utilize germination data from individual germination tests which were not run for long enough time to characterize the entire germination curve.
Finally, in the case of a non-differentiable g(t), numerical optimization algorithms are required for both the non-linear model and the survival model, and these do not always work well. For example, a grid-search based solution is feasible for some problems (Shayanfar et al., Reference Shayanfar, Sharifiamina, Moot, Moltchanova and Bloomberg2019), but for others the computational burden is too great.
Bayesian inference is becoming more and more popular among statistical practitioners due to its flexibility, the possibility of taking into account prior knowledge based on the nature of the phenomenon under study, prior experimental results or expert elicitation and, finally, its ability to continuously update the state of knowledge. As Humplík et al. (Reference Humplík, Dostál, Ugena, Spíchal, De Diego, Vencálek and Fürst2020) say in their recent paper on the application of Bayesian statistics to germination modelling, ‘the Bayesian computational method properly handles uncertainty in time-to-event data and it is capable to reliably answer questions that are difficult to address by classical methods’. Here, we set the HTT model within the Bayesian modelling framework to demonstrate how all the above problems can be solved in an elegant and computationally efficient way. We also show how the above framework provides a wealth of inferential information allowing, among other things, for easy comparison of HTT parameters among different species and comparison of HTT models with different assumed frequency distributions for parameters.
We use experimental germination data for two species of clover − subterranean (or sub) clover (Trifolium subterraneum L.) and white clover (Trifolium repens L.), to illustrate the performance of the above methods. These two species are widely used in agricultural pastures in many temperate regions of the world (Burdon, Reference Burdon1983; Gibson and Cope, Reference Gibson, Cope and Taylor1985; McGuire, Reference McGuire and Taylor1985; Smetham, Reference Smetham2003; Frame and Laidlaw, Reference Frame, Laidlaw, Reynolds and Frame2005; Teixeira et al., Reference Spiegelhalter, Best, Carlin and Van Der Linde2015). Sub clover is a winter active annual with ‘compulsory’ germination in autumn, whereas white clover is a summer active perennial where annual germination is not compulsory to maintain the population.
We hypothesize that (1) there will be differences in the HTT model parameters of these two species, which may reflect their different life histories and ecological niches and (2) the three different statistical approaches (non-linear least squares, survival and Bayesian) may differ in their ability to accurately estimate HTT model parameters and therefore detect differences in the HTT model parameters between the two species.
Materials and methods
Germination experiment
Sub clover seeds (‘Napier’, DM32Ø45RC) and white clover seeds (‘Nomad’) (Seed Force Ltd, Christchurch, New Zealand 8441) were screened carefully to remove empty or broken seeds from the population. Then 50 seeds were placed on filter papers in sealed 500-ml snap-top containers and incubated in the dark in factorial combinations of seedbed temperature (5, 10, 15, 20, 25, 30 and 35°C) and water potential (0, −0.18, −0.37, −0.63 and −0.95 MPa). There were three replicates for each combination of temperature and water potential. Full details of seed preparation and treatment during the experiment are given in Shayanfar et al. (Reference Shayanfar, Sharifiamina, Moot, Moltchanova and Bloomberg2019). Seed germination was monitored every 8 h through the first 4 days, then inspected every 24 h for 25 days. Seeds were considered to have completed germination when a ≥2 mm radicle protrusion occurred, at which point they were counted, then removed from the container and discarded.
The HTT model
The HTT model is usually expressed as
where g is the germination percentile observed at time t, Ψ and T are the seedbed water potential and temperature, respectively, and constants are T b, the base threshold temperature, θ HT, the HTT constant, Ψb(50), the 50th percentile of the seed population's base water potential distribution and $\sigma _{{\rm \Psi }_b}$, the standard deviation of the Ψb values in the population. Here, f(g) is the inverse cumulative distribution function (cdf). Often, the inverse normal cdf, i.e. probit is chosen – in this paper, we will refer to such a model as the Gaussian model – but a wide range of other options, such as, for example, Weibull and Gamma distributions are possible (Watt et al., Reference Watt, Xu and Bloomberg2010; Mesgaran et al., Reference Mesgaran, Mashhadi, Alizade, Hunt, Young and Cousens2013). Given the data, the above model can be estimated using the non-linear least squares regression and jackknife resampling as described in Onofri et al. (Reference Onofri, Mesgaran, Neve and Cousens2014).
The basic HTT model predicts faster seed germination rates with higher temperature treatments, since an increase in T requires a decrease in t for any specific value of f(g). However, germination rates typically reach a maximum at optimum temperatures (often in the range of 15–25°C) before declining to zero at the ceiling temperature for germination. This process is commonly described as ‘thermo-inhibition’ (Horowitz and Taylorson, Reference Horowitz and Taylorson1983; Hills and van Staden, Reference Hills and van Staden2003). To account for thermo-inhibition at supra-optimal temperatures, Rowse and Finch-Savage (Reference Rowse and Finch-Savage2003) proposed a modification to the HTT model, where
Here, I z is an indicator function, which equals to one if the subscript is true and is zero otherwise, and the criterion is, T > Td. Td is the temperature above which thermo-inhibition of seed germination will occur. Thermo-inhibition is modelled by the term $[ {{\rm \Psi }_b\lpar {50} \rpar + k\lpar {T-T_d} \rpar I_{T \gt T_d}} ] $ becoming larger at a rate of k for every degree Celsius of seedbed temperature above Td, so that the term t must become larger for any specific value of f(g).
Note that the addition of I z makes the HTT model function non-differentiable. This in turn may affect the convergence, speed and general feasibility of the optimization (whether minimizing squared errors or maximizing the likelihood), since many algorithms assume the object function to be smooth, i.e. differentiable (Shor, Reference Shor1985). In Shayanfar et al. (Reference Shayanfar, Sharifiamina, Moot, Moltchanova and Bloomberg2019), the optimization was done as follows:
(i) A grid of T d values was set up.
(ii) The non-linear least squares estimators were found conditional on each value of the grid and the corresponding sum of squared errors (SSE) was recorded.
(iii) The T d value associated with the smallest SSE and the corresponding non-linear least squares estimators were then finally chosen.
Note that the above algorithm means that the model needs to be re-estimated for every possible value of T d. The grid resolution defines the precision with which T d can be estimated and affects the precision of other parameters. In this study, we used the values in the range 15–30°C with 0.1°C steps, resulting in 151 iterations. Using jackknife to obtain standard errors for the estimates means that the process needs to be repeated as many times as there are dishes in the experiment (Onofri et al., Reference Onofri, Mesgaran, Neve and Cousens2014). Thus, in our case, the model was re-estimated 93 × 151 = 14,043 times per species.
The survival likelihood
As explained in the introduction, the above HTT model ignores the actual data generating mechanism i.e. it does not account for the fact that the new seeds did not sprout at the observation time, but instead germinated sometime between the current and the previous observation instances. It also loses some information by considering proportions only and thus ignoring the number of seeds in each dish. To more precisely account for the experimental process, one can make use of the survival modelling framework as described by Ritz et al. (Reference Ritz, Pipper and Streibig2013).
Let
describe the cumulative probability of germination at time t. And assume that the dish with n seeds is observed at time points 0 < t 1 < t 2 < ⋅ ⋅ ⋅ < t k < t k+1. Let x j denote the number of new seeds germinated between t j−1 and t j, where t 0 = 0 and t k+1 = +∞, so that
The probability of a seed germinating between t j−1 and t j is g(t j) − g(t j−1), and the observed data can be modelled via multinomial likelihood as follows:
where g(t 0) = g(0) = 0 and g(t k+1) = 1. Note that the last group, the seeds which have not germinated before the end of the experiment, includes both the seeds which are capable of germinating later, and the seeds that would never germinate (dead or dormant).
The corresponding log-likelihood
can be optimized with respect to the parameters of interest using a suitable optimization routine, and the jackknife estimates may be obtained by dropping one dish at a time and re-estimating the parameters in a manner similar to that described by Onofri et al. (Reference Onofri, Mesgaran, Neve and Cousens2014) for the non-linear model. Note, however, that if the likelihood function is non-differentiable, the possible caveats regarding the performance of an optimization algorithm mentioned in the previous section still apply. Therefore, the following algorithm was used to find the profile maximum likelihood estimator:
(i) A grid of T d values was set up.
(ii) The likelihood was maximized conditional on each value of the grid and recorded.
(iii) The T d value associated with the greatest likelihood and the corresponding parameter estimates were chosen.
The Bayesian framework for the survival likelihood
Now that we have used the likelihood function to explicitly describe the data generating mechanism, it is possible to frame the analysis within the Bayesian paradigm. For that, it is necessary to specify the prior distribution of the parameters involved. In the absence of other information, vague conjugate priors are often used. Accordingly, we have chosen to use the following structure:
Note, that, as is customary in Bayesian inference, the second parameter of the normal distribution here is precision (i.e. the inverse variance). The priors for k and T d were chosen to be somewhat more informative due to the nature of these parameters and their role in the model. The variation between dishes was incorporated explicitly into the parameter Ψb(50) via the prior:
where j is the dish number, and furthermore
and
The parameters μΨ and τΨ thus reflect the average water potential and its variability among the dishes.
The Weibull distribution function
To illustrate the model selection, we have also considered a Weibull germination model:
This model allows for an asymmetric germination function and the additional parameter η influences the degree of asymmetry with η = 3.5 corresponding to an approximately normal distribution. In the Bayesian framework, to facilitate convergence, the parameter η was given an informative prior:
All the analyses have been done in R (R Core Team, Reference Probert and Fenner2018). The non-linear model was estimated using the nls function and the survival likelihood was estimated using the optim function. Jackknife resampling was used to obtain estimator uncertainty for these models. To estimate the parameters of the Bayesian models, a Markov Chain Monte Carlo algorithm was written and implemented in R. For the Gaussian model, 200,000 iterations were run with 100,000 burn-in, and for the Weibull model, 1,000,000 iterations were run with 500,000 burn-in. The convergence was checked visually using trace and posterior density plots. A regularly thinned posterior sample of size 1000 was used for the inference. The posterior distribution has been summarized using mean, standard deviation, median and 95% credible intervals.
The Bayesian model comparison has been done via the deviance information criterion (DIC) (Spiegelhalter et al., Reference Teixeira, Lucas and Moot2014). The smaller DIC corresponds to a statistically better model, and a difference of at least 3 is considered statistically relevant.
Results
Observed germination behaviour
Under optimum seedbed water potential (0 MPa), the maximum cumulative germination percentages for sub clover were 98% (achieved over all temperatures from 5 to 35°C) and for white clover 94% (at T = 20°C). Germination time courses (percentage of seeds with completed germination, plotted versus time from the sowing of seeds) are shown for both sub clover and white clover in Fig. 1. Sub clover maintained high maximum cumulative germination percentages over the full range of temperatures (5–35°C) under optimum seedbed water potential. This is unusual, as most species decline in maximum cumulative germination percentages at temperatures approaching minimum and ceiling temperature for germination. The maximum cumulative germination percentage of sub clover was also quite resilient to changes in water potential and only began to significantly decline when seedbed water potential was −0.63 MPa or less. In contrast, white clover maximum cumulative germination percentages declined markedly above 25°C even when water potential was 0 MPa and declined to <20% when water potential was −0.63 or −0.95 MPa, even at optimum temperatures (Fig. 1).
Under optimum conditions of temperature and water potential, both species achieved rapid germination, but white clover germination rate declined rapidly for temperatures above and below optimum temperature, and as water potentials decreased progressively down to −0.95 MPa (Fig. 1). In contrast again, the germination rate for sub clover was more rapid and more resilient to varying temperatures and water potential conditions than for white clover.
HTT model of germination
The HTT models provided reasonably accurate descriptions of in vitro germination behaviour of both sub and white clover (Fig. 1), although model fit was less accurate for germination under extreme conditions. The parameter estimates assuming Gaussian and Weibull distribution for the base water potential are shown in Tables 1 and 2, respectively. For the nls model, the estimators are based on non-linear least squares and for the survival model, the parameter estimates are based on maximum likelihood. The standard errors for these two methods are based on jackknife resampling. For the Bayesian model, posterior means and standard deviations are reported. It should be noted that the parameter estimates are similar between the models. Table 3 shows that the root mean squared errors (RMSEs) for the models are similar as well, with the nls performing the best. However, it should be remembered that the RMSEs are based on comparing predicted values at observation times to the values observed without taking into account the censoring and the interval nature of observation. The lower RMSE for the nls may simply reflect the fact that the nls does not take these factors into account.
For the Bayesian framework, in order to decide whether Gaussian distribution is more appropriate than the Weibull distribution, we have used DIC to compare the models. For the white clover, the DICs for the Gaussian and Weibull distributions were 12071.62 and 12,270.43, respectively, indicating a very strong preference for the Gaussian (ΔDIC = 198.81). For the sub clover, the DICs for the Gaussian and Weibull distributions were 16,836.21 and 16,726.35, respectively, indicating a very strong preference for the Weibull distribution model (ΔDIC = 109.86).
To perform a comparison between the species within the Bayesian framework, DIC can be used to compare the model fitted with the same set of parameters for both species to the model fitted with species-specific parameters. For the Gaussian case, the corresponding values of DIC are 28,944.4 and 28,907.8, respectively, indicating that the species-specific model is more accurate (ΔDIC = 36.57). The plots of posterior distribution densities for the individual HTT parameters are shown in Fig. 2. The two species differed markedly in terms of Tb, μΨ, and Td, but not in terms of other parameters. For the Weibull model (not shown), there was also a clear difference in terms of the σ Ψb parameter. In addition, the posterior means for the skewness parameter (η) were 2.85 and 2.15 for the white clover and the sub clover, respectively, with the corresponding 95% CI of (2.57, 3.15) and (2.01,2.31), respectively, demonstrating a clear difference between the species.
The posterior estimates for the dish-specific random effects [Ψb(50)]j based on the Gaussian model are shown in Fig. 3. The between-dish variation (inverse precision) was on average slightly higher for the white clover than for the sub clover. For the white clover, the posterior mean estimated precision was 101.1 (95% CI: 66.4, 143.9), whereas for the sub clover, the posterior mean estimated precision was 102.6 (95% CI 69.4, 147.0). However, as the high degree of overlap between the 95% credible intervals implies, the difference is not substantial.
Discussion and conclusion
We have applied a Bayesian modelling framework to the problem of estimating the parameters of the HTT model and compared the process and the results with some of the previously reported approaches (nls and the survival model). We have found no differences in fit accuracy (which is somewhat expected given that the same model is fitted to the same data), but we have found the Bayesian approach computationally faster, especially when applied to the case of the non-differentiable HTT model.
In general, the Bayesian framework is often preferred when dealing with complex multi-parameter models drawing on data from multiple experiments. There are many reasons for this: the Bayesian approach is very flexible in terms of model specification; it allows the inclusion of information from experts or previous experiments via the prior distribution, leading to seamless sequential updating of the state of knowledge; it can deal with a large number of parameters (in some cases, in their thousands) and it allows for model comparison via DIC. Because our model set-up was based on the survival analysis principles and allowed for random effects, it used more information by modelling the number of germinated seeds rather than the proportion alone and accounted explicitly for the various aspects of the data generating mechanism such as censoring, interval observation, within-dish autocorrelation and between-dish variation.
The estimation process, i.e. convergence and the speed thereof, is not impacted by the use of a non-differentiable function within the HTT model. It should also be noted that the jackknife resampling method used to obtain uncertainty for the estimators in the non-linear regression model and the maximum likelihood-based survival model requires the parameters to be re-estimated as many times as there are dishes in the experiment. The computational time required is thus linearly dependent on the number of dishes. This is not the case for the Bayesian model estimating algorithm, where the proposed dish-specific random effects are drawn all at once, and their number has little effect on computational efficiency.
Finally, the Bayesian modelling framework also allows for easy comparison of HTT models (for example, using Gaussian versus Weibull frequency distributions for modelling the seed population's base water potential distribution). It also makes it easy to differentiate or group species according to their HTT model parameters. There is now an extensive literature comparing seed germination traits of different species in response to differences in seedbed temperature and water potential. In some studies, these germination traits of species are specified as HTT parameters which are then modelled as functions of life history, taxonomic or functional classification, or edaphic or climatic conditions within their natural, cultivated or adventive range (e.g. Allen et al., Reference Allen, Meyer, Khan, Black, Bradford and Vázquez-Ramos2000; Köchy and Tielbörger, Reference Köchy and Tielbörger2007; Kos and Poschlod, Reference Kos and Poschlod2008; Dürr et al., Reference Dürr, Dickie, Yang and Pritchard2015; Tribouillois et al., Reference Watt and Bloomberg2016).
In this study HTT models that accounted for a species effect (sub versus white clover) were superior to those that did not account for species. Inspection of credibility intervals and estimated posterior distributions for the Bayesian model (Tables 2 and 3, Fig. 2) shows that it is credible that Tb, Td, mu.Ψb50, and the skewness of the distribution for mu.Ψb are different for the two species, whereas θHT, σ Ψb and k are not. This allows the following inferences: under similar conditions, the two species will germinate at approximately the same speed (θHT), and thermo-inhibition will increase with supra-optimal temperatures at the same rate (k). However, sub clover will germinate more rapidly than white clover at cooler temperatures (Tb = 1.72 versus 2.46°C) or under drier soil conditions (mu.Ψb50 = −0.67 versus −0.38 MPa), and thermo-inhibition will not commence until much higher temperatures (Td = 27.4 versus 18.9°C) so that germination will be more resilient to dry conditions at supra-optimal temperatures. Finally, although both species have a similar frequency distribution of base water potential (σΨb) for the Gaussian model, using a Weibull model shows that the frequency distribution for sub clover is right skewed so that a larger proportion of the sub clover population will have base water potentials lower than the mean. This implies that sub clover will have a higher proportion of fast-germinating seeds than white clover under any specific conditions of T and Ψ.
Figure 1 shows that the HTT model fit was less accurate for germination under extreme conditions. Due to considerations of space, we only show the results for the Bayesian method, but the results for the non-linear regression and the survival analysis are similar (as can also be seen from the Table 3). The HTT model is a parsimonious description of germination behaviour across the whole range of temperature and water potentials where germination is possible. Inspection of plots for published HTT models shows that they are often less accurate for more extreme conditions of temperature and water potential, and most accurate at near-optimal conditions. It is thus a feature of the HTT model, rather than the model estimation technique. While the model fit can be improved by breaking the data into subsets and fitting separate models to each of the subsets (e.g. Mesgaran et al., Reference Mesgaran, Mashhadi, Alizade, Hunt, Young and Cousens2013; Hardegree et al., Reference Hardegree, Walter, Boehm, Olsoy, Clark and Pierson2015, Reference Hardegree, Moffet, Walters, Sheley and Flerchinger2017), such improved accuracy is achieved at the cost of generality and parsimony.
The germination traits we have found are consistent with the characteristics of the two species as pastoral crop species. Sub clover is a winter annual species of Mediterranean origin. It is used predominantly to provide feed through autumn, winter and early spring in Australasian pastures. Under these circumstances, the germination and growth of sub clover commence with autumn rainfall and it remains in a vegetative rosette until increasing day lengths in spring promote flowering before it sets seed to avoid dry summer conditions. In contrast, white clover is the most widely used temperate perennial legume used to provide high-quality feed in dairy and sheep pastures where consistent rainfall or irrigation allows it to grow through summer. Ecologically and agronomically, germination under a wide range of conditions is more important for sub clover which must re-establish from seed each year compared with white clover which also relies on stolon fragments to perennate.
This study used germination data for white clover ‘Nomad’ and sub clover ‘Napier’ DM32Ø45RC, therefore results and conclusions from this study may not apply to other cultivars since these are species that are widely distributed within their natural and adventive ranges and have recognized cultivars, ecotypes or subspecies that differ in their adaptation to climatic and edaphic conditions (Burdon, Reference Burdon1983; Teixeira et al., Reference Spiegelhalter, Best, Carlin and Van Der Linde2015). Having said that differences in germination behaviour among cultivars and subspecies would be amenable to the same analysis using the HTT model, as the present comparison between one cultivar each of two clover species.
Supplementary material
To view supplementary material for this article, please visit : https://doi.org/10.1017/S0960258520000082
Acknowledgements
Seeds used in this study were provided by Seed Force Ltd. The authors acknowledge the financial support of Farshid Ghaderifar (Gorgan University of Agricultural Sciences and Natural Resources) and Roland Stead, and the technical assistance of Stephen Stilwell (Lincoln University).