Knowledge of the behaviour of lactation curves is crucial for decision-making in commercial dairies (Macciotta et al. Reference Macciotta, Vicario, Di Mauro and Cappio-Borlino2004, Reference Macciotta, Vicario and Cappio-Borlino2005). This is important not only at the population or dairy level, but also at the individual level (Dekkers et al., Reference Dekkers, en Hag and Weersink1998; Vargas et al. Reference Vargas, Koops, Herrero and Van Arendonk2000). Moreover, with models having a mechanistic interpretation it is possible to make inferences about management and physiology from parameter values through the construct of the model (Ehrlich, Reference Ehrlich2013). Ehrlich also demonstrates that lactation curve shape varies significantly between herds, suggesting that fitted parameter values could be used as a metric for monitoring herd performance, and points out that if lactation models are used to predict future daily milk production for incomplete lactations, residuals between predicted and actual daily production can be used to quantify the response to an intervention. The estimation of lactation curves provides parameters that are critical in determining the reproductive cost per animal (De Vries, Reference De Vries2006). In the bio-economic model for dairy farms proposed by Cabrera & Giordano (Reference Cabrera and Giordano2010), some of the inputs needed when deriving reproductive cost per animal under a given dairy production system are the parameters derived from fitting lactation curves.
Several mathematical functions have been proposed for fitting the values of daily yields according to days in milk. Those functions may be linear or nonlinear in the parameters and may use a different number of constants for a better fitting of the observed data. The incomplete gamma function proposed by Wood (Reference Wood1967), with three parameters, is one of the most popular and valid nonlinear models to date for describing lactation curves. Other nonlinear models, such as the diphasic model (Grossman & Koops, Reference Grossman and Koops1988) and the MilkBot model proposed by Ehrlich (Reference Ehrlich2011) have been widely used for fitting lactation curves. In both of these models, the number of parameters to be estimated is larger than in the Wood model. The addition of extra parameters aims at capturing certain movements of lactation curves, some of which are of genetic and others of environmental nature (Macciotta et al. Reference Macciotta, Dimauro, Rassu, Steri and Pulina2011). A simplified 3-parameter version of the MilkBot model has been suggested for use where data do not include daily milk weights covering the first weeks of lactation. This would match the parameter count of the Woods model with little decrease in precision expected (Ehrlich, Reference Ehrlich2013). All three models are a set of functions used as a standard for modelling normal milk production and comparing changes in productivity over time in different dairy production systems. The widely used derived parameters from a lactation curve are peak yield, days in milk (DIM) to peak yield and 305-d yield.
Mostert et al. (Reference Mostert, Theron and Kanfer2003) classified the productive performance of commercial dairies in two groups based on milk production: a high production group (above the national average milk yield) and a low production group, and showed how this partition affected the shape of the lactation curve. They found that 305-d yield was higher for high production dairies during the whole lactation. In addition, high production dairies showed steeper lactation curves and curves of low production dairies were flatter. As a consequence, the resulting persistence was higher in lower production dairies than in higher production ones. These results were also reported in other studies, such as in Olori & Galesloot (Reference Olori and Galesloot1999) for Holstein cows in Ireland, for the Dutch goat population (Galesloot, Reference Galesloot2000) and for Holsteins in Hungary (Galesloot, personal communication).
Catillo et al. (Reference Catillo, Macciotta, Carretta and Cappio-Borlino2002) used lactation curves to show that calving season affects milk production. According to Tekerli et al. (Reference Tekerli, Akinci, Dogan and Ackan2000), peak and milk yields were higher for cows that calved in fall and winter. Moreover, others studies have concluded that primiparous cows produce less milk than multiparous cows in early lactation, with lower peak yields, but are more persistent, with 305-d yields being significantly lower (Rao & Sundaresan, Reference Rao and Sundaresan1979; Singh & Shukla, Reference Singh and Shukla1985; Keown et al. Reference Keown, Everett, Empet and Wadell1986; Tekerli et al. Reference Tekerli, Akinci, Dogan and Ackan2000; Rekik et al. Reference Rekik, Gara, Hamouda and Hammami2003; Silvestre et al. Reference Silvestre, Martins, Santos, Ginja and Colaço2009). Therefore, patterns of lactation curves depend not only on time or days in milk, but also on several uncontrolled genetic and environmental factors (Nicolò et al. Reference Nicolò, Macciotta, Cappio-Borlino and Pulina2004; Dijkstra et al. Reference Dijkstra, Lopez, Bannink, Dhanoa, Kebreab, Odongo, Fathi Nasri, Behera, Hernandez Ferrer and France2010). Ehrlich (Reference Ehrlich2013) suggested that significant differences in parameters between herds imply uncontrolled genetic and environmental factors. This fact produces high variability among lactation curves which cannot be appropriately modelled by means of classical fixed effects models with a unique random term (error term). Macciotta et al. (Reference Macciotta, Vicario and Cappio-Borlino2005) cited the great variation that characterises parameter values estimated on individual lactation curves. Mixed models (models with fixed and random effects) (West et al. Reference West, Welch and Gatecki2007) allow handling extra variability associated with factors that could affect each lactation curve differently.
In numerous works that study the fitting of lactation curves, the models are estimated assuming not only the existence of only one variance component, but also the independence of the data from consecutive productions. However, it should be noted that milk yield test records may be serially correlated because they are measurements taken over time on the same individual (longitudinal data). The correlation between those records may be high as a result not only of the animal's genetics but also of the environment such as the short-term effect of, for example, feeding (Ali & Schaeffer, Reference Ali and Schaeffer1987; Carvalheira et al. Reference Carvalheira, Blake, Pollak, Quaas and Duran-Castro1998; Ammon & Spilke, Reference Ammon and Spilke2005; Macciotta et al. Reference Macciotta, Mele, Conte, Serra, Cassandro, Dal Zotto, Cappio-Borlino, Pagnacco and Secchiari2008). It can be expected that measurements that are closer together in time are more correlated than those that are more distant (positive autocorrelation). Moreover, measurements recorded on the same cow might be more correlated than those recorded on different animals because they share a common contribution from the same individual (Macciotta et al. Reference Macciotta, Dimauro, Rassu, Steri and Pulina2011). Lack of modelling of these autocorrelation patterns in the dataset can lead to erroneous conclusions, especially when evaluating treatment differences (Nicolò et al. Reference Nicolò, Macciotta, Cappio-Borlino and Pulina2004). This difficulty can be also addressed using mixed models by modelling the covariance structure of the error terms in the model or, alternatively, by incorporating random effects associated with one (or more) of the parameters of the mean structure of the model. Therefore, mixed models are a powerful tool for analysing lactation curve data. Advances in the development of a statistical method for the treatment of normal longitudinal data (West et al. Reference West, Welch and Gatecki2007; Fitzmaurice et al. Reference Fitzmaurice, Davidian, Verbeke and Molenberghs2008) have been scarcely explored in applications of milk production (Quintero et al. Reference Quintero Vélez, Serna Gallo and Cerón-Muñoz2007). The nonlinear characteristic of the lactation curve models makes the mixed model estimation computationally harder than estimation of linear functions. Using contemporary statistical modelling techniques, such as nonlinear mixed models, can help improve population average estimation of lactation curves and predictions made for each cow.
The aim of this work was to compare the fit of lactation curves achieved from three nonlinear models (Wood, MilkBot, and diphasic models) using two statistical approaches: a nonlinear fixed-effect model and a nonlinear mixed model composed of the same lactation curve models carrying an additional random effect at the cow level.
Materials & methods
Data
We worked with a database that included 19 382 records obtained from monthly test data conducted for one year (2008), for a total of 2167 complete lactations, all longer than 149 d. In this dataset, there is one lactation per cow. 984 were first parity lactations and the rest were third parity lactations. We used separately these parity lactations to fit the alternative models because those two groups showed two very different shapes of lactation curve (Figs. 1 & 2). We did not worked with the second parity lactation as Dematawewa et al. (Reference Dematawewa, Pearson and VanRaden2007), because as is known have an intermediate behaviour and was not of our interest. Furthermore, we worked with 2008 data because they were highly depurated. The dairies involved are located in the centre and south of Santa Fe and Córdoba provinces in Argentina. The lactations selected were associated with dairies with high reproductive performance (average annual pregnancy ≥16%) and production efficiency (total liters produced per lactation ≥7·200 l). The lactations included in this study were those that started between March and August (fall + winter), which are considered to be the cool seasons.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170520080122-24989-mediumThumb-S0022029917000085_fig1g.jpg?pub-status=live)
Fig. 1. Observed lactation curves from 30 randomly selected lactations (a) and raw average lactation curve observed for all lactations (n = 984) (b) of first lactation cows.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170520080122-59518-mediumThumb-S0022029917000085_fig2g.jpg?pub-status=live)
Fig. 2. Observed lactation curves from 30 randomly selected lactations (a) and raw average lactation curve observed for all lactations (n = 1183) (b) for third lactation cows.
Statistical analyses
All three non-linear functions, Wood, MilkBot, and diphasic, were fitted to each of the recorded lactations as fixed effect models.
The model of the incomplete gamma function (Wood, Reference Wood1967) (1) was
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn1.gif?pub-status=live)
where Y (t) is milk yield on day t of lactation, a is a parameter representing the production at the beginning of lactation; b and c are parameters associated with the increase and decrease of the slopes of the lactation curve, respectively. Typical lactation curves have positive b and c values. Once the fixed parameters were estimated, peak yield (Y max) (2) and days in milk at the peak yield DIMpeak (3) were estimated using the following formulas:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn2.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn3.gif?pub-status=live)
The MilkBot function (Ehrlich, Reference Ehrlich2011) (4) for each lactation for milk yield on day t of lactation Y (t):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn4.gif?pub-status=live)
This function consists of four parameters: parameter a called scale, which is a multiplier that determines the total amount of milk yield and is expressed in l/d, it must be a positive number; parameter b, ramp, controls the rate of increase in milk production in early lactation and is expressed in days; parameter c, offset, represents the offset in time between calving and the time when the highest rate of increase in milk production occurs, and is expressed in days; finally, parameter d, or decay, controls the loss of productive capacity and is expressed as day−1. With these parameters, peak yield (Y max) (5) and days in milk at peak DIMpeak (6) were estimated using the following formulas:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn5.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn6.gif?pub-status=live)
The third function adjusted in this work was the diphasic function (Grossman & Koops, Reference Grossman and Koops1988) (7), which is based on the sum of two logistic functions, defined as:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170519114556929-0509:S0022029917000085:S0022029917000085_eqn7.gif?pub-status=live)
where Y(t) is milk yield of day t (t are the days in milk; DIM), tanh is the hyperbolic tangent, for the first phase, a 1 and a 2 are half the asymptote of the total production (L) in phase one and two, respectively; b 1 and b 2 are the production rate relative to a 1 (d−1) or a 2 of phase one and two, respectively. Finally, c 1 and c 2 are the peak time (d) of phase one and two, respectively. Liters associated with peak production of phase i were calculated as the product of parameters a and b of a same phase.
After fitting the three nonlinear models as fixed effects models, we fit the same models as mixed models with an extra random component at the parameter related with the magnitude of milk production. Therefore, for the Wood and MilkBot model was the parameter a, and for the diphasic model was choose the parameter a
1. This random component is assumed normally distributed with zero mean and variance
$\sigma _a^2 $
, which should be interpreted as an unobservable variable that represents a random deviation of the coefficient a of the ith lactation from the population parameter. Such random deviation is assumed to be independent of the error term. Fixed and Mixed Nonlinear Models were fitted with PROC NLMIXED in SAS (SAS, 2008).
Comparison criteria
The significance of the variance component associated with the additional random effect was evaluated by comparing, for the three functions (Wood, Milkbot, and diphasic), the mixed model with the analogous fixed effects model, using a likelihood ratio test (LRT). The LRT statistic was calculated by computing the maximum log-likelihood of the reference model (with a higher number of parameters, the mixed model) and the maximum log-likelihood of the simpler model (Fixed effects model). Twice the difference of log-likelihoods is usually compared with a chi-square distribution with degrees of freedom equal to the difference between the numbers of parameters estimated by the two models, except when comparison includes a mixed model with an extra random effect and a fixed model (Molenberghs & Verbeke, Reference Molenberghs and Verbeke2007). In this case, which was our particular case, a chi-square distribution with 0·5 degrees of freedom is used to evaluate the approximate significance of the log-likelihood differences (Molenberghs & Verbeke, Reference Molenberghs and Verbeke2007). This test was performed with restricted maximum likelihood (REML) estimators (Searle et al. Reference Searle, Casella and McCulloch1992). Then, if the test is significant, the correct model is the most complex (mixed model); otherwise, the reduced model (fixed model) is the appropriate one.
The Akaike Information Criterion (AIC; Sakamoto et al. Reference Sakamoto, Ishiguro and Kitagawa1986) and Bayesian Information Criterion (BIC; Schwarz, Reference Schwarz1978) were also used as criteria for model selection. These indices can be used to rank different models and choose the one of lowest value.
The Durbin-Watson (DW) test (Durbin, Reference Durbin1970) was used to evaluate the hypothesis of independence between the errors of each fitted model. This test assumes that the errors are normally distributed with zero mean and equal variance. The null hypothesis for this test is that the errors are independent, against the alternative hypothesis that the errors are correlated with a first-order autoregressive structure. The test statistic is d = 2 (1 − r), where r is the sample autocorrelation of the model residuals. If the errors are independent, d will be close to 2. By contrast, if the errors are strongly autocorrelated, d will be far from 2.
Finally, to test the predictive value of any of the models we used the cross-validation method. We randomly chose 75% of the data to be in a calibration set, and the remainder 25% of the data was the validation set. The parameters were estimated from the calibration set using all the studies models and validated on the validation set. The models were fitted to individual monthly test data yields within each parity class and parameters were estimated by nonlinear regression using the NLMIXED procedure. The root mean square error (RMSE) was used for comparing models within fixed and random models; thus, those with the lowest values are the ones of best fit (Kobayashi & Salam, Reference Kobayashi and Salam2000). In order to compare two models, we used the ratio between RMSE's, selecting the model with the lowest RMSE as referent. This ratio was used as measurement of the relative efficient (RE) of a given model with regard to the reference model.
Results & discussion
Observed lactation curves
In the first lactation, peak yield (27 l) occurred between 78–86 DIM. In the third lactation, peak yield (33 l) occurred between 57–64 DIM (Tables 1 & 2). Consequently, in agreement with other authors (Rekik et al. Reference Rekik, Gara, Hamouda and Hammami2003; Macciotta et al. Reference Macciotta, Vicario, Di Mauro and Cappio-Borlino2004; Dematawewa et al. Reference Dematawewa, Pearson and VanRaden2007; Silvestre et al. Reference Silvestre, Martins, Santos, Ginja and Colaço2009; Cole et al. Reference Cole, De Vries and Null2011), third lactation cows (mature cows) reached peak yield earlier and with higher milk yield than first-lactation cows. The rate at which production declines after reaching the peak was also higher in third lactation than in first lactation cows.
Table 1. Parameter estimates and goodness of fit criteria for three models of lactation curves, as fixed effects model and mixed model with random effect associated at the parameter a for the Wood and MilkBot model and at the parameter a 1 for the Diphasic † model. First lactation cows calved in the cool season
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170520080122-66441-mediumThumb-S0022029917000085_tab1.jpg?pub-status=live)
RE, Relative efficient; AIC, Akaike information criterion; BIC, Bayesian information criterion; DW, Durbin-Watson statistic; −2log L., −2 log(likelihood); LRT, likelihood ratio test; na, not available
† For the diphasic model, each parameter was estimated for the two phases of the function separately, a: a 1 and a 2; b: b 1 and b 2; c: c 1 and c 2
‡ Standard error between parentheses
§ Milk yield in the Wood model = a(b/c)b × e − b; in the MilkBot = (a) × ((1 − e((c − (c − b × Log((2 × b × d)/(1 + b × d))))/b)/2) × e(−d × (c − b × Log((2 × b × d)/(1 + b × d))))); and in the Diphasic model = ab
¶ Days in milk to Peak in the Wood model = b/c; in the MilkBot model = b × (Log((2 × d × b)/(d × b + 1))) + c; and in the Diphasic model = c 1 and c 2 for each phase.
Table 2. Parameter estimates and goodness of fit criteria for three models of lactation curves as fixed effects model and mixed model with random effect associated with parameter a for the Wood and MilkBot model and at the parameter a 1 for the Diphasic † model. Third lactating cows calved in the cool season
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170520080122-95619-mediumThumb-S0022029917000085_tab2.jpg?pub-status=live)
RE, Relative efficient; AIC, Akaike information criterion; BIC, Bayesian information criterion; DW, Durbin-Watson statistic; −2log L., −2 log(likelihood); LRT, likelihood ratio test; na, not available
† For the diphasic model, each parameter was estimated for the two phases of the function separately, a: a 1 and a 2; b: b 1 and b 2; c: c 1 and c 2
‡ Standard error between parentheses
§ Milk yield in the Wood model = a(b/c)b × e − b; in the MilkBot = (a)×((1 − e((c-(c − b × Log((2 × b × d)/(1 + b × d))))/b)/2)×e(−d×(c − b × Log((2 × b × d)/(1 + b × d))))); and in the Diphasic model = ab
¶ Days in milk to Peak in the Wood model = b/c; in the MilkBot model = b × (Log((2 × d × b)/(d × b + 1))) + c; and in the Diphasic model = c 1 and c 2 for each phase
Lactation curves observed from 30 randomly selected lactations and average lactation curve observed for all first lactations (n = 984) are presented in Fig. 1. Lactation curves observed in 30 randomly selected lactations and average lactation curve observed for all lactations (n = 1183) for third lactation cows are presented in Fig. 2. In both figures, the variability between animals of the same group becomes apparent.
Fitted lactation curves
Estimates of the model parameters for fitting the lactation curves for the Wood (Reference Wood1967), MilkBot (Ehrlich, Reference Ehrlich2011), and diphasic (Grossman & Koops, Reference Grossman and Koops1988) models for first lactation calving are shown in Table 1. According to the goodness of fit criteria used, AIC, BIC, and LRT, the diphasic model had the best fit for first lactations. However, it had the worst relative efficiency (RE) values. In terms of goodness of fit of the observed data, a low RMSE might be also a consequence of an over-parameterised model (Grossman & Koops, Reference Grossman and Koops1988; Reference Grossman and Koops2003; Dematawewa et al. Reference Dematawewa, Pearson and VanRaden2007). Over-parameterised models fit the available data well, but may fail to make good predictions with new data. Over-parameterisation also produces estimation problems because of parameter correlations (singularity or near singularity in the Hessian matrix), and standard errors of estimation could become not available (Tables 1 & 2; Dematawewa et al. Reference Dematawewa, Pearson and VanRaden2007).
For the Wood and MilkBot classical models (fixed effects), while majority of the information criteria suggest selecting MilkBot, there are no differences between the RE criteria. Therefore, following the principle of parsimony, selecting Wood model with three parameters would be a good option. According to Dematawewa et al. (Reference Dematawewa, Pearson and VanRaden2007), the Wood model compared with other models not included in this study, presented the lowest RMSE values. Since Wood and MilkBot model use the same exponential decay function, they would be expected to function very similarly in the extended lactations which were the focus of that study.
The comparison of each of the fixed effects models with their respective fits with mixed models with an additional random effect showed an improvement of fit according to AIC and BIC. Both approaches (fixed and mixed models) exhibit a statistically significant difference according to the LRT. Moreover, as shown by the DW test, the inclusion of the random effect at the cow level reduced the correlation between repeated test day records on the same lactations and, in this case, on the same cow. DW statistic values increased, approaching the expected value of 2 for withe noise (uncorrelated data), and even the predictive ability of the models improved significantly under the mixed model framework (Dematawewa et al. Reference Dematawewa, Pearson and VanRaden2007).
The estimate of the variance of the random effect was relatively high in the MilkBot and diphasic models, highlighting the high variability between individual lactation curves and the need to adjust estimates/forecasts to the different characteristics of different lactations, even within a single group. This was not reflected in the Wood model. However, according to the comparison criteria, predictions are improved by adding a random effect to the Wood model.
The additional random effect produced a better fit from a statistical standpoint, increasing the predictive potential (Calegario & Mastri, Reference Calegario and Mastri2005), but did not affect the estimates of dairy parameters derived from lactation curves, such as DIM to peak and peak yield. The derived parameter that was most sensitive to statistical modelling was DIM to peak, especially in the MilkBot model. However, using MilkBot or Wood models generated greater differences than the addition or not of a random effect to model correlations between test day records. Nonetheless, these random terms are timely because as concluded by Lindstrom & Bates (Reference Lindstrom and Bates1990), incorporating extra random deviations gives flexibility to the model, eliminating the need to adjust different functional forms for individual curves of the same population. It should be noted that the interpretation of the derived parameters in the fixed and mixed models differ slightly: while in the fixed model they represent averages through the population of dairies considered, in the mixed model they represent averages for ‘typical’ cows, i.e., cows whose random effects are average.
Estimates of the parameters for fits made to lactation curves with the Wood (Reference Wood1967), MilkBot (Ehrlich, Reference Ehrlich2011) and diphasic (Grossman & Koops, Reference Grossman and Koops1988) models for third lactation cows calved in cool seasons are shown in Table 2. Although the parameter values are higher in third lactation cows, except for DIM to peak, the same, but more pronounced, behaviour of residuals is observed when comparing with first lactations cows at the model level. A correct fit of diphasic models with random cow effects was not accomplished because of correlated parameter estimates.
Moreover, curves for lactations that started in hot seasons (September to February) were also fitted and the effect of heat stress was reflected in the indicator peak yield in both lactations studied (first and third lactation), resulting in a reduction of between 2 to 3 l at peak yield. The indicator DIM to peak was also affected, with a reduction of 2 and 15 DIM until peak yield occurs in the first and in the third lactations, respectively (data not shown). The effect of heat stress on milk production was previously described (Tekerli et al., Reference Tekerli, Akinci, Dogan and Ackan2000; Catillo et al. Reference Catillo, Macciotta, Carretta and Cappio-Borlino2002; Rekik et al. Reference Rekik, Gara, Hamouda and Hammami2003; Silvestre et al. Reference Silvestre, Martins, Santos, Ginja and Colaço2009). At the model level, similar behaviours to those observed in both lactations started in the cool season were found (data not shown).
Conclusions
Fitted lactation curves are useful to estimate dairy productive indicators. The three nonlinear models for lactation curves had a better fit when an additional random effect was included to model the variability between lactations and data correlation from successive milk yield tests. However, the inclusion of random effects did not significantly affect production parameters derived from the curve, such as milk yield or days in milk to peak. In the dairy farms studied, peak yield occurred at about 80 DIM, with an average peak production of 27 l. We recommend adding a random cow effect when the fitted lactation curve is used for making predictions. The diphasic model was computationally complex because of its over-parameterisation, and therefore it proved to be impractical for fitting lactation curves such as those in the pasture-based production systems. Moreover, since the differences in goodness of fit between Wood and MilkBot models were not important, the most parsimonious model (Wood) was selected to derive the productive indicators.
We thank the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) for financial support.