OPTIMAL PLANNING OF COCOA CLONAL SELECTION PROGRAMMES

Published online by Cambridge University Press:  25 April 2013

F. OWUSU-ANSAH*
Affiliation:
Cocoa Research Institute of Ghana, PO Box 8 Tafo-Akim, Ghana
R. N. CURNOW
Affiliation:
Department of Mathematics and Statistics, School of Mathematical and Physical Sciences, University of Reading, Reading, UK
Y. ADU-AMPOMAH
Affiliation:
Cocoa Research Institute of Ghana, PO Box 8 Tafo-Akim, Ghana
*
Corresponding author. Email: bywasahad@yahoo.com

Summary

Data from three cocoa (Theobroma cacao) clonal selection trials are used to investigate the genetic and environmental components of variation in yield and the percentage of total pods affected by black pod disease (Phytophthora pod rot). Simulations based on these estimated components of variation are then used to discuss the best future choice of the numbers of clones, replicates and years of harvest to maximise selection advances in the traits measured. The three main conclusions are the need to increase the number of clones at the expense of the number of replicates of each clone, the diminishing returns from additional years of harvesting and the importance of widening the genetic base of the clones chosen to be tested.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2013 

INTRODUCTION

Clones are the most efficient means of utilizing genetic variation (Simmonds, 1979), though selection over several generations is expected to be less efficient than the selection of seedlings. Most of the commercial tree crops grown in the world today are clones. However, cocoa (Theobroma cacao) clonal selection in Ghana and the rest of Africa, the source of over 68% of the global cocoa supplies, proved difficult initially, a difficulty recently attributed to epigenetic effects (Pang and Lockwood, 2008). This led to a reduced interest in clonal selection in favour of bi-parental crosses (hybrids). There has recently been a renewed interest in cocoa clonal selection as it is believed to be a possible means of increasing yield by reducing the high inter-tree yield variation recorded in seedling material (e.g. Adomako and Adu-Ampomah, 2003) and by fixing disease-resistance traits. Pang and Lockwood (2008) have suggested simple recurrent selection as the appropriate breeding strategy, with different planting densities for the seedling and clonal phases. Currently, a large-scale cocoa clonal selection programme is underway in Ghana aimed at selecting high-yielding genotypes in the presence of the major fungal disease, black pod disease (Phytophthora pod rot).

The optimal use of resources in clonal selection programmes requires knowledge of the likely genetic and environmental components of the variation in the traits to be improved. The only available data from experimental studies of clonal cocoa trees in Ghana come from harvests of three trials between 1957 and 1964. The data from these trials will be analysed and used, via simulation, to study the optimal choice in the final stage of a selection trial of the number of clones to be tested, the number of replications of each clone and the number of years of harvest.

MATERIALS AND METHODS

The data analysed in this paper and used to estimate the parameters for the simulation of alternative selection programmes come from three clonal trials, namely M2, N4 and D8. The design was always a randomised complete block design. A few changes to the randomisations in D8 were necessary to avoid having neighbouring clones that were incompatible for fertilisation. The trials varied in the number of new clones, the number of replicates and the number of years of harvest (Table 1). Two clones were included in each trial as standards. The basis on which the standards were chosen is not clearly known for the first two trials; those in the third trial were chosen because they had the highest yields in the earlier trials (Lockwood et al., 2007). The three traits of interest are the total number of pods per tree, referred to as total yield, the percentage of diseased pods, referred to as per cent diseased, and the number of healthy pods per tree, referred to as healthy pods. The diseased pods had most often been affected by black pod disease. However, in the two years of the M2 and N4 trials for which separate figures are available, the proportions of bad pods that were not diseased but were affected by rodent damage ranged from 1.3 to 40%, with the most damage occurring in the third year. The effect of this complication is discussed later. The conclusions drawn from the trials are recorded in Lockwood et al. (2007) and Lockwood (1971).

Table 1. Layout, level and number of years of recording for the three clonal trials.

*Each pair of trials has one standard in common. In D8, one of the standards was duplicated.

The experimental units in the analysis are the plots. Because of tree death and the removal of trees with swollen shoot virus infection, some plots had, at the time of harvest, fewer than the prescribed number of trees; for example, there was an 8% loss in M2. There were no obvious differences between clones in the number of missing trees. There appeared to be no consistent pattern in the relationship of the plot means to the number of trees harvested, suggesting no effects of reduced competition with fewer trees or selective losses related to the performance of the trees. The analysis of the data was therefore based on the means per tree for each plot.

Various transformations of the data were applied in an attempt to satisfy the assumptions that are needed in the analysis and for the simulations. For total yield and per cent diseased, the square-root transformation achieved overall the closest approximation to normality of the residuals, variances independent of the means, normality of the distribution of the clonal means and a linear relationship between clonal means for total yield and per cent diseased. The analysis concentrated on total yield and per cent diseased with the number of healthy pods derived later from these two quantities.

Using the total yields as an example, the model for the square root of the total yield, $y_{ijk}$, of the plot with new clone $i$ in block $j$ in year $k$ was

\begin{equation*} y_{ijk} = \mu + c_i + b_j + p_k + (bp)_{jk} + (cp)_{ik} + e_{ijk} ,\end{equation*}
where $c$ represents the clones, $b$ the blocks and $p$ the years, and $e$ is the sum of the interaction of all three factors and the contribution of the individual plot to the square-root yield. The effects of blocks and years will be treated as fixed effects and the clones and clones by years (clones × years) terms as random effects, with the clones considered a sample from a larger population of clones that could have been tested. There appeared to be no pattern in the clones × years interaction terms to suggest that some clones were more or less variable between years or that there was more variation in some years than others. This justifies treating clones × years interactions as random. Because the same trees and plots are measured in the different years, the analysis has to take account of the correlations of the error terms, $e_{ijk}$, in the different years. The analysis is therefore a repeated measures analysis (Crowder and Hand, 1990; Kenward and Roger, 1997) and was carried out using GENSTAT 11. The clones by blocks (clones × blocks) interaction term contributes to the correlation of the same plot in different years and was therefore absorbed into the residual term $e_{ijk}$.

In addition to modelling the two basic traits separately, the genetic and environmental correlations between the two traits also need to be modelled and estimated. The genetic correlations between the two traits, for both the clonal effects and the clones × years interactions, were derived from covariances estimated by halving the difference between the variance of the sum of the two effects and the sum of the two separate variances. For the environmental components, the correlations were calculated by treating the two traits as a factor in the repeated measures analysis (Payne et al., 2008).
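The identity behind the halving rule is Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). The following Python fragment is a minimal sketch of that arithmetic only; the function names and the variance values are hypothetical and are not estimates from the trials, and the analyses themselves were carried out in GENSTAT.

```python
import math

def covariance_from_sum(var_sum, var_a, var_b):
    """Cov(a, b) recovered from Var(a + b), Var(a) and Var(b)."""
    return (var_sum - var_a - var_b) / 2.0

def correlation_from_sum(var_sum, var_a, var_b):
    """Correlation implied by the covariance and the two separate variances."""
    cov = covariance_from_sum(var_sum, var_a, var_b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical variance components (illustrative, not taken from Table 3).
var_yield, var_disease, var_of_sum = 0.25, 0.16, 0.49
r = correlation_from_sum(var_of_sum, var_yield, var_disease)
print(round(r, 3))
```

With estimated rather than true variance components the result can fall outside the range from -1 to 1, so in practice such an estimate may need to be checked or constrained.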

Analyses were also used to compare the performance of the standard varieties with the average of the new clones and with each other.

In summary, these analyses allowed us to estimate, for the two basic traits, the genetic and environmental variances and correlations of the clonal effects and the clones × years interactions, and the variances and correlations of the residuals, eijk.

The simulations based on each of the three trials were planned to investigate the effects of varying the number of clones tested, the amount of replication and the number of years of harvest contributing to the data used in the selection. The total number of plots was kept constant and equal to the number in the trial being studied, for example, 108 for trial M2. The values, on the square-root scales, for total yield and per cent diseased respectively were generated as

\begin{equation*} y_{ik} = m_k + c_i + (cp)_{ik} + e_{ik} \quad {\rm and}\quad y_{ik}^* = m_k^* + c_i^* + (cp)_{ik}^* + e_{ik}^* ,\end{equation*}
where $m_k$ and $m_k^*$ are the average values of $y_{ik}$ and $y_{ik}^*$ in year $k$ over all the new clones in the relevant clonal trial, and the random $c$, $cp$ and $e$ terms are generated from multivariate normal distributions with zero means and with variances and covariances estimated from the analysis of the appropriate clonal trial (see Table 3).
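A minimal Python sketch of how one clone's simulated values for the two traits might be generated from these equations is given below. It is an illustration under stated assumptions and not the authors' R code: the parameter values are hypothetical, each pair of effects is drawn as a bivariate normal pair, and the residuals are drawn independently in each year, which ignores the between-year residual correlations that the repeated measures analysis allows for.

```python
import math
import random

def correlated_pair(rng, sd_a, sd_b, corr):
    """Draw one (a, b) pair from a zero-mean bivariate normal distribution."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    a = sd_a * z1
    b = sd_b * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
    return a, b

def simulate_clone(rng, year_means, year_means_star, components):
    """Return [(y_ik, y*_ik), ...] for one clone i over all years k.

    components maps 'clone', 'clone_year' and 'residual' to a tuple
    (sd for yield, sd for per cent diseased, correlation), square-root scales.
    """
    # The clonal effects c_i and c*_i are drawn once and shared by all years.
    c, c_star = correlated_pair(rng, *components["clone"])
    values = []
    for m_k, m_star_k in zip(year_means, year_means_star):
        cp, cp_star = correlated_pair(rng, *components["clone_year"])
        # Simplifying assumption: residuals independent between years.
        e, e_star = correlated_pair(rng, *components["residual"])
        values.append((m_k + c + cp + e, m_star_k + c_star + cp_star + e_star))
    return values

# Hypothetical parameter values, NOT the Table 3 estimates.
components = {
    "clone": (0.5, 0.3, 0.2),
    "clone_year": (0.3, 0.2, 0.1),
    "residual": (0.6, 0.4, 0.0),
}
rng = random.Random(2013)
clone_values = simulate_clone(rng, [4.0, 5.0, 5.5], [0.5, 0.6, 0.55], components)
print(clone_values)
```

Drawing the clonal pair once outside the loop is what makes the same clone tend to be good or bad in every year, while the clone by year and residual pairs vary from year to year.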

The fixed effects of blocks do not have to be included in the simulation because in selection the only interest is in differences between the clonal means. Differences between the averages in the different years are only important when we later estimate the yield of non-diseased pods. For each combination of the number of clones, number of replicates and number of harvest years used in the selection, the clones were ranked according to their means for each of the two basic traits and also, to provide theoretical upper limits to the gains possible, on the basis of their true values, $m + c_i$ and $m^* + c_i^*$, where, although irrelevant to the ranking, $m$ and $m^*$ are the averages of the $m_k$ and $m_k^*$ values over the different years.

The best clones are likely to be similar in performance, and more than one is likely to be selected for further evaluation. The effectiveness of each simulated selection programme has therefore been measured by the average of the true values, $c_i$ and $c_i^*$, for the best three clones. To provide values to compare with the performance of the standard clones and in terms that are easily appreciated, these true values are transformed back to the original yield scales. The back transformation that translates a $c$ value on the square-root scale to a value on the original yield or per cent diseased scale has to be of the form $(A + c)^2$. Because $c$ has zero mean, the average of $(A + c)^2$ over the clones is $A^2 + v_c$, so the value of $A$ that results in the correct average for all the new clones is $A = (M - v_c)^{1/2}$, where $M$ is the average value of the trait on the original scale and $v_c$, the expected value of $c^2$, is the variance of the $c$ values for the trait concerned. Hence, the transformation is $((M - v_c)^{1/2} + c)^2$.
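The choice of A can be checked numerically. The Python sketch below takes A as the square root of M minus v_c, the form that also appears in the healthy-pods expression, and confirms that the back-transformed values then average to M. The values of M and of the clonal standard deviation are hypothetical, and the two-point set of c values is used only because it has zero mean and a known variance.

```python
import math

def back_transform(c, M, v_c):
    """Map a clonal effect c (square-root scale) to the original scale."""
    A = math.sqrt(M - v_c)
    return (A + c) ** 2

# Hypothetical values: original-scale mean M and clonal standard deviation s.
M, s = 30.0, 0.8
# Two equally likely effects, -s and +s, have mean zero and variance s**2.
values = [back_transform(c, M, s * s) for c in (-s, s)]
mean_back = sum(values) / len(values)
print(mean_back)
```

The cross term 2Ac cancels on averaging because c has zero mean, which leaves A squared plus v_c, that is, M.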

The clones can also be ranked by their numbers of healthy pods by using the relation of the number of healthy pods to the total number of pods and the percentage of diseased pods, using the formula for the number of healthy pods, $y_{ik}^2(1 - y_{ik}^{*2})$. The corresponding true values are $(m + c_i)^2(1 - (m^* + c_i^*)^2)$. To obtain the effectiveness of selection measured by the number of healthy pods on the original scale requires the calculation for the selected clones of the quantity $((M - v_c)^{1/2} + c)^2(1 - ((M^* - v_c^*)^{1/2} + c^*)^2)$. The small correlation between the $c$ and $c^*$ values will result in this being an approximate calculation, so that the average value of all the new clones will not be exactly the value for the data analysed.
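As a small illustration of the healthy-pods relation, the Python sketch below computes healthy pods from the two square-root-scale values and ranks a few clones by the result. It assumes, as the form of the expression implies, that the diseased trait is held as a proportion between 0 and 1 before the square root is taken; the clone labels and values are hypothetical.

```python
def healthy_pods(sqrt_yield, sqrt_prop_diseased):
    """Healthy pods = total pods x (1 - proportion diseased)."""
    return sqrt_yield ** 2 * (1.0 - sqrt_prop_diseased ** 2)

# Hypothetical clone values on the square-root scales: (yield, diseased).
clones = {"A": (5.0, 0.5), "B": (5.5, 0.7), "C": (4.5, 0.3)}
ranking = sorted(clones, key=lambda name: healthy_pods(*clones[name]), reverse=True)
print(ranking)
```

In this made-up example the highest-yielding clone does not come first, because its larger diseased proportion outweighs its extra pods.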

All results are based on the average of 2000 replications of each simulation. In addition to mean values, the simulations provide estimates of the variability of the responses to selection. The simulation study was carried out using the R statistical package.
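The trade-off being simulated can be sketched in a much reduced form. The Python code below is an illustration only, written in Python rather than the R used for the study, and it makes several simplifying assumptions that are not in the paper: a single trait, a single harvest year, no clones × years interaction, and hypothetical standard deviations. For a fixed number of plots it compares the average true value of the three clones chosen on their observed means with the value that would be obtained if the true values were known.

```python
import math
import random

def selection_gain(rng, n_plots, n_reps, sd_clone, sd_resid, n_select=3):
    """One simulated trial: (gain from observed ranking, gain from true ranking)."""
    n_clones = n_plots // n_reps
    true = [rng.gauss(0.0, sd_clone) for _ in range(n_clones)]
    # The error of a clone mean shrinks with the square root of the replication.
    se_mean = sd_resid / math.sqrt(n_reps)
    observed = [t + rng.gauss(0.0, se_mean) for t in true]
    chosen = sorted(range(n_clones), key=lambda i: observed[i], reverse=True)[:n_select]
    achieved = sum(true[i] for i in chosen) / n_select
    oracle = sum(sorted(true, reverse=True)[:n_select]) / n_select
    return achieved, oracle

def average_gain(n_plots, n_reps, sd_clone, sd_resid, n_sims=2000, seed=1):
    """Average both gains over n_sims simulated trials."""
    rng = random.Random(seed)
    total_achieved = total_oracle = 0.0
    for _ in range(n_sims):
        achieved, oracle = selection_gain(rng, n_plots, n_reps, sd_clone, sd_resid)
        total_achieved += achieved
        total_oracle += oracle
    return total_achieved / n_sims, total_oracle / n_sims

# 108 plots as in M2; the standard deviations are hypothetical.
for reps in (2, 3, 4, 6):
    achieved, oracle = average_gain(108, reps, sd_clone=0.5, sd_resid=1.0)
    print(reps, 108 // reps, round(achieved, 3), round(oracle, 3))
```

Fewer replicates allow more clones, which raises the oracle value, but also enlarge the error of each clone mean, which widens the gap between the achieved and oracle values; the optimum lies where these two effects balance.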

RESULTS

Statistical analysis of the three clonal trials

For comparative purposes, the analyses were confined to the first four years of each trial. The implications of the further two years of harvest available for the D8 trial will be discussed later. Each of the three trials included two commercially grown clones as standards against which to judge the performance of the new clones. There was one standard in common to each pair of trials. Table 2 shows the mean values of the yield and the per cent diseased pods for each standard and for the average of all the new clones. The superior performance of the better standard compared with the average performance of the new clones in both traits is clear and statistically significant. There are also statistically significant differences between the disease proportions of the two standards in each trial but for yield only in M2 and N4. The variation between the new clones, discussed later, provides the probabilities in Table 2 that a new clone in each trial will outperform the best standard. The low values of these probabilities show that a large number of clones are needed to provide a reasonable expectation that some clones will outperform the standards.
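The link between a low individual probability and the need for many clones can be illustrated with a short calculation. The Python sketch below assumes that true clonal values are normally distributed on the square-root scale and that clones are independent; the paper does not state exactly how the Table 2 probabilities were computed, and the numerical values used here are hypothetical.

```python
from statistics import NormalDist

def prob_outperform(mean_new, sd_clone, best_standard, higher_is_better=True):
    """Probability that one new clone's true value beats the best standard."""
    below = NormalDist(mean_new, sd_clone).cdf(best_standard)
    # For yield a higher value is better; for per cent diseased a lower one is.
    return 1.0 - below if higher_is_better else below

def prob_at_least_one(p, n_clones):
    """Probability that at least one of n independent clones beats the standard."""
    return 1.0 - (1.0 - p) ** n_clones

# Hypothetical square-root-scale values, not taken from Table 2.
p = prob_outperform(mean_new=5.0, sd_clone=0.5, best_standard=6.0)
for n in (13, 18, 27, 54):
    print(n, round(prob_at_least_one(p, n), 3))
```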

Table 2. Mean values per year for pods per tree (yield) and per cent diseased (disease) of standard clones and new clones and probability that a new clone will outperform the best standard.

*N > B represents new clone outperforms best standard.

There were, as expected, large and statistically significant differences between years for the yields and for the proportions diseased in all three trials. There were statistically significant differences between the proportions diseased in the different blocks of the design in all three trials but only in N4 for the yields.

Analyses of the total yields and proportions diseased for the new clones provide the estimates of the variation and covariation attributable to differences between the clones and to the interaction of clones and years shown in Table 3. The estimates of the clonal and clones × years variance components were statistically significant or nearly statistically significant at p = 0.05 in M2 and N4 but in D8 only the clones × years variance component for yield was statistically significant.

Table 3. The standard deviations and correlations (r) of the clone, clone by year and residual components (the last averaged over the four years) for total pods (yield) and per cent diseased (disease) for the three trials (square root scales).

Analyses of the number of healthy pods, again on a square-root scale, showed that the clonal differences for this trait were highly correlated with the clonal differences for total yield, the minimum value of the correlation coefficient being 0.94. The clones × years interactions were also highly correlated with minimum value again 0.94. This justifies concentrating on the total yield and the per cent diseased as the traits to be used in selection.

The residual variability is attributable to differences between plots and to clones-by-blocks interactions. The variances showed appreciable variation between years and the correlations between the same plot in different years did not display any obvious pattern and so an unstructured repeated measures analysis was used. The residual variation is summarised in Table 3 by the standard deviations and correlations of the two traits when the data have been averaged over the four years.

The positive correlations in M2 for the clone and clone × year components indicate that simultaneously improving yield and reducing the incidence of the disease will be particularly difficult with the clones included in this trial.

Optimal selection indices (Baker, 1986) were calculated to see whether including the other basic trait in the selection criterion improved selection gains. There were virtually no advantages in using selection indices in M2 and D8. There were gains from using an index in N4, particularly when per cent diseased is used to select for yield, which increases the selection gain by 14%.

Simulations

Results from the simulation studies based on the estimates of the parameters of the M2 trial are given in Figures 1–3. The figures show, respectively, the consequences of using total yield to select for total yield, per cent diseased to select for per cent diseased, and healthy pods to select for healthy pods. Each figure shows the average value for the new clones and the value for the best standard. Also shown are the plots against the number of replicates, or equivalently the number of new clones tested, of the results of selection based on the cumulative yields over a single year, two years, three years and four years, together with the theoretical effects of selection if the true values of the trait were known for the new clones. In all cases, the performance of the best standard far exceeds the average performance of all the new clones. The theoretical curves for selection based on knowing the true values of the clones decrease, as expected, as the number of clones tested decreases. The points at which the curves for the theoretical values and the best standard cross show the number of new clones that would be needed so that the average true value of the best three clones is equal to that of the best standard. Comparisons of the theoretical curves with the curves for the differing numbers of years of harvest provide an indication of the effects of errors in selecting the best new clones. The balance has always to be between selecting from a small number of clones with relatively small errors or from a larger number of clones but with larger errors involved.

Figure 1. The yield response of the best three clones ranked by yield for different numbers of years of harvest and different numbers of clones, compared with the true clonal values, best standard and average of new clones.

Figure 2. The per cent diseased pods response of the best three clones ranked by per cent diseased pods for different numbers of years of harvest and different numbers of clones, compared with the true clonal values, best standard and average of new clones.

Figure 3. The healthy pods response of the best three clones ranked by healthy pods for different numbers of years of harvest and different numbers of clones, compared with the true clonal values, best standard and average of new clones.

Figures 1–3 and the corresponding figures for the other two trials showed that there are only very small advantages in basing selection on more than three years of harvest and that the optimal number of replicates is always either three or four compared with a minimum of six replicates in the trials analysed. Results that are based on six years of record in D8 showed the same pattern as the four years of records in the other two trials.

The effect of rodent-damaged pods contributing to the number of diseased pods in the analyses was assessed by restricting the analysis of M2 and N4 to the third and fourth years. In these years, rodent-damaged pods were separated from diseased pods. The optimal number of clones to test and number of years to harvest were not affected. However, particularly in M2, the selection gains in both the basic traits would have been greater if the rodent-damaged pods had been excluded from the count of diseased pods.

The consequences of selecting clones on the basis of one trait on the selection advance in another trait are shown in Table 4.

Table 4. Average performance of best three clones as compared to the average of new clones and best standards for total yield, per cent diseased and healthy pods.

The table shows the estimated average performance of the best three new clones for each of the trials and for all combinations of the trait used in the selection and the trait being improved. The selection is assumed to be based on three years of harvest and four replicates of each new clone. Four replicates imply that 27, 24 and 26 new clones would be tested in the three trials compared respectively with the 18, 16 and 13 new clones in the actual trials. Also shown for comparison are the mean values for all new clones and the values of the best standard.
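The clone numbers quoted above follow from holding the number of plots given to new clones fixed. The Python sketch below reproduces that arithmetic from the figures in the text; the replicate counts it prints for the actual trials are implied by the arithmetic rather than quoted from Table 1.

```python
# New clones that four replicates would allow, and new clones actually tested.
proposed_clones = {"M2": 27, "N4": 24, "D8": 26}
actual_clones = {"M2": 18, "N4": 16, "D8": 13}

def plot_budget(trial, reps=4):
    """Plots for new clones implied by the proposed design."""
    return proposed_clones[trial] * reps

def implied_replicates(trial):
    """Replicates per new clone in the actual trial for the same plot budget."""
    return plot_budget(trial) / actual_clones[trial]

for trial in ("M2", "N4", "D8"):
    print(trial, plot_budget(trial), implied_replicates(trial))
```

The 108 plots obtained for M2 agree with the total stated for that trial in the description of the simulations.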

As expected from the low values in all three trials of the relevant clonal correlation coefficients, the ranking of the clones by per cent diseased does not lead to any appreciable changes in the total yields or healthy yields of the selected clones and conversely. The high correlations between the total yields and the healthy yields of the different clones in all three trials result in the selection advances in each of these two traits being very similar whether total or healthy yields are used in the selection. Note that in only one case, N4 with selection by disease and evaluated by disease, do the average values of the best three clones achieve as good a value as the best standard.

The replicate simulation runs allow the estimation of the variation and therefore the predictability of the responses to selection in an individual trial. Again taking the results based on three years of harvest and four replicates of each clone, the coefficients of variation over replicates of the selection advances for yield, percentage disease and healthy yield when selected on the basis of the same trait values varied from about 9% to near 20%. This means that about 5% of the responses in future trials would be expected to differ from their predicted values by, at best, more than 18% and, at worst, by 40%. Between about 60 and 80% of this variability can be attributed to the inevitable variation in the true values of the best three clones in each trial with the remainder being caused by the errors in selecting the best clones. The estimated selection advances being based on averages over 2000 simulations are very accurate with coefficients of variation varying from 0.2 to 0.4%.
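The figures in this paragraph are consistent with two standard rules of thumb, sketched in Python below. The sketch assumes that responses are roughly normally distributed, so that about 5% fall more than two standard deviations from the mean, and that the 2000 simulation runs are independent, so that the coefficient of variation of their average is the single-trial value divided by the square root of 2000. Whether the authors computed the figures in exactly this way is not stated.

```python
import math

def two_sd_band(cv_percent):
    """Deviation (%) exceeded by about 5% of trials under approximate normality."""
    return 2.0 * cv_percent

def cv_of_average(cv_percent, n_sims):
    """Coefficient of variation (%) of an average over n_sims independent runs."""
    return cv_percent / math.sqrt(n_sims)

for cv in (9.0, 20.0):
    print(cv, two_sd_band(cv), round(cv_of_average(cv, 2000), 2))
```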

DISCUSSION

The detailed statistical analysis of the results from the three clonal selection trials has shown the importance of balancing the choice of the number of clones tested with the number of replicates of each clone and of choosing an appropriate number of years on which to base the selection of the best clones. The analysis was necessary to provide information about the extent and nature of the various sources of the genetic and environmental variation and covariation affecting the traits of interest. This then allowed the assessment through simulation of alternative selection programmes aimed at improving the yields and decreasing the proportions of diseased pods.

Assuming, as is often the case, that there are more clones available for testing that are thought to be of comparable worth to those chosen for testing, larger selection advances using the same number of plots could often be obtained by testing more clones but with fewer replications. The expected gains from increasing the number of clones tested may have been overestimated if block sizes have to be increased to accommodate the extra clones. The use of incomplete block or lattice designs (Mead et al., 2013) or of analyses based on modelling the spatial variation (Durbán et al., 2003; Gilmour et al., 1997) should reduce this overestimation. The more usual criterion when choosing the number of replicates and therefore the number of clones to test has been the requirement to achieve statistically significant differences between the chosen clones and those not chosen or between the chosen clones and some standard clones. However, criteria based on statistical significance are largely irrelevant when the purpose of the trial is to select those clones likely to be worth further assessment. The maximisation of the expected yield of the selected varieties, in our case clones, was discussed in relation to annual crops by Finney (1958a, 1958b) and Curnow (1961). They also considered the choice of the number of stages or years of selection but ignored the effects of varieties by years interactions. Gauch and Zobel (1996) used the criterion of expected yield of selected genotypes in selection experiments but considered only a single stage of selection.

Increasing the number of harvests on which the selection is based will usually improve the efficiency of the selection. However, the returns from obtaining more years of data are often small and may not justify the consequent delays in the further testing and eventual release for commercial use of the selected clones.

Simulations of the effects of basing the selection on each trait individually have shown the compromises that may have to be made to achieve desirable selection advances in the values of the different traits.

The main conclusions in this paper are that, for the relative amounts of genetic and environmental variation and covariation in the particular trials analysed, an optimal programme would probably have just four replicates of each clone and have harvests recorded for three years. Even with an optimal programme, the best new clones are unlikely to be as good as the standard clones, suggesting that the search for new clones should be widened to seek new genetic material.

Only if clonal trials are thoroughly analysed, including the estimation of the genetic and environmental contributions to the variation and covariation of the different traits, can future trials be more effective in selecting clones that will outperform the existing standard clones.

Acknowledgements

This paper (CRIG/CC/2/23) is published with the permission of the Executive Director of the Cocoa Research Institute of Ghana. F. Owusu-Ansah acknowledges the financial support of the Rothamsted International African Fellowship Programme, Dr G. Lockwood for providing the data and answering queries, and the Department of Mathematics and Statistics of the University of Reading for providing office and computing facilities. We thank Dr. Sue Welham for assisting with the repeated measures analysis and a referee for helpful comments on the submitted paper.

REFERENCES

Adomako, B. and Adu-Ampomah, Y. (2005). Assessment of the yield of individual cacao trees in four field trials. In Proceedings of the International Workshop on Cocoa Breeding for Improved Production Systems, 2003. Accra, Ghana: INGENIC/Ghana COCOBOD, 41–49.
Baker, R. J. (1986). Selection Indices in Plant Breeding. Boca Raton, FL: CRC Press.
Crowder, M. J. and Hand, D. J. (1990). Analysis of Repeated Measures. London: Chapman & Hall.
Curnow, R. N. (1961). Optimal programmes for varietal selection (with discussion). Journal of the Royal Statistical Society B 23:282–318.
Durbán, M., Hackett, C. A., McNicol, J. W., Newton, A. C., Thomas, T. B. W. and Currie, I. D. (2003). The practical use of semi-parametric models in field trials. Journal of Agricultural, Biological, and Environmental Statistics 8:48–66.
Finney, D. J. (1958a). Statistical problems of plant selection. Bulletin of the International Statistical Institute 36:242–268.
Finney, D. J. (1958b). Plant selection for yield improvement. Euphytica 7:83–106.
Gauch, H. G. and Zobel, R. W. (1996). Optimal replication in selection experiments. Crop Science 36:838–843.
Gilmour, A. R., Cullis, B. R. and Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2(3):269–293.
Kenward, M. G. and Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53:983–997.
Lockwood, G. (1971). Early results from a trial of Upper Amazon cocoa clones in Ghana. Experimental Agriculture 7:321–327.
Lockwood, G., Owusu-Ansah, F. and Adu-Ampomah, Y. (2007). Heritability of single plant yield and incidence of black pod disease in cocoa. Experimental Agriculture 43:455–462.
Mead, R., Gilmour, S. G. and Mead, A. (2013). Statistical Principles for the Design of Experiments: Applications to Real Experiments. New York: Cambridge University Press.
Pang, J. T. Y. and Lockwood, G. (2008). A re-interpretation of hybrid vigour in cocoa. Experimental Agriculture 44:329–338.
Payne, R., Welham, S. and Harding, S. (2008). A Guide to REML in Genstat. Hertfordshire, UK: VSN International.
Simmonds, N. W. (1979). Principles of Crop Improvement. New York: Longman.