Introduction
Milk composition and its variation during lactation can indicate imbalances in the health or nutrition of dairy cows. In particular, changes in the fat to protein ratio, milk urea nitrogen content, and concentration of ketone bodies provide information on energy, protein and crude fibre supply, and on metabolic imbalances. Likewise, somatic cell count (SCC) and, for example, the concentrations of minerals and lactose in raw milk are indicators of udder diseases and helpful tools for monitoring udder health (Friggens et al. 2007; Brandt et al. 2010). Since monitoring of animal health and metabolic indicators is increasingly part of herd management in dairy production, knowledge of daily changes in milk composition can assist this process. An analytical tool based on near-infrared spectroscopy (NIRS) that can analyse milk during the daily milking routine could provide that information. The robustness of near-infrared (NIR) equipment makes this technology generally suitable for in-line application in milking parlours, with the aim of analysing individual cows' milk during milking. Analysing raw milk with NIRS has been the topic of several studies, either in a static measurement setup (Tsenkova et al. 1999, 2001, 2006; Schmilovitch et al. 2000; Chen et al. 2002; Saranwong & Kawano, 2008; Aernouts et al. 2011) or in an in-line application (Kawasaki et al. 2005, 2008).
Our own investigations showed good accuracy in predicting fat, protein, lactose and urea content as well as SCC in milk with NIRS, both in a static measurement setup (Melfsen et al. 2012a) and in an in-line application (Melfsen et al. 2012b). These results and those in the literature of the last decade suggest that NIRS is a promising tool for predicting the concentration of some milk constituents. However, in most studies on the use of NIRS in milk, prediction was done on randomised test sets closely connected with the calibration set. In practical farm application, the calibration model has to be applied to samples from farms, cows or seasonal circumstances that either were not part of the calibration or have changed in the meantime (independent test set). Chemometric models should be highly robust towards external variations to achieve sufficient prediction accuracy (Wang et al. 1991; Naes et al. 2004). In contrast to manufactured products, it is rather difficult to include all future variability in calibrations for natural animal products. The composition of raw milk is influenced by within-cow variability (cow age, lactation number, lactation stage, breed, health status, oestrus and gestation status), within-nutrition variability (diet rationing, energy and protein supply, feeding time, pasture management), within-farm variability (number of milkings per day) as well as climatic and seasonal variability, such as temperature and daylight hours (Neitz & Robertson, 1991). Additional influences of the individual animal on NIR spectra are based to a large extent on scattering effects of fat globules (Tsenkova et al. 2000, 2009; Cattaneo et al. 2009). Furthermore, variations in measuring instruments and environment must be included in calibration models to achieve accurate prediction results. This is of particular importance when the spectroscopic signal from the compound of interest in milk is rather small (Thomas & Ge, 2000). Most of these effects are missing from the multivariate calibration models for milk composition analysis described in the literature. Robust calibration models that include large amounts of biological variability exist in the literature mainly for plant products such as wheat (Delwiche & Norris, 1993), kiwifruit (McGlone & Kawano, 1998) and apples (Peirs et al. 2001, 2003).
The aim of this study was to evaluate and compare the robustness and accuracy of NIRS predictions of raw milk constituents obtained with different multivariate calibration models comprising internal and external effects.
Material and Methods
NIR spectra of raw cow milk were acquired during the milking process. At the same time, corresponding subsamples of each milking were taken for reference analysis in the laboratory. Measurements were done on three different farms, each applying a different feeding diet, over a period of six months.
NIR spectra acquisition
An experimental measuring setup was designed, consisting of a commercially available milk meter (LactoCorder, WMB AG, Balgach, Switzerland), a measuring cell designed as a flow-through chamber, a sample bottle, a contact reflection sensor head (PSS-H-B01) and a diode array spectrometer system with an InGaAs (indium gallium arsenide) detector (PSS-1720; both Polytec GmbH, Waldbronn, Germany). The setup was installed in the long milk tube of one milking place per farm.
The measuring cell and the bypass system built on it were attached to the sampling valve of the milk meter as described in Melfsen et al. (2012b). The valve diverted a representative sample of 6·25% of the raw milk during the milking process to the measuring cell, which was designed in such a way that the milk flow after the cell could be directed either into a sample bottle or back to the milk tube. The contact reflection sensor head was attached to the measuring cell and connected to the spectrometer. Near-infrared spectra were acquired in the measuring cell in diffuse reflection mode in the wavelength region of 851–1649 nm every 500 ms while the milk was flowing through the cell. For every 2 kg of milk, the spectra were averaged, and the corresponding subsample of raw milk in the bypass system was collected in a sample bottle for reference analysis after passing the measuring cell. In a final step, the spectra of each complete milking per cow were additionally averaged to predict constituents in composite milk samples.
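The averaging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the acquisition software used in the study: it assumes that each spectrum arrives tagged with the cumulative milk yield reported by the milk meter (the record layout and the function names `average_spectra` and `average_per_segment` are hypothetical), and that the whole-milking spectrum is the unweighted mean of all spectra of that milking.

```python
from collections import defaultdict

def average_spectra(spectra):
    """Element-wise mean of equally long spectra (lists of reflectance values)."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]

def average_per_segment(records, segment_kg=2.0):
    """Group (cumulative_yield_kg, spectrum) records into yield segments and average each.

    Segment 0 covers 0 <= yield < 2 kg, segment 1 covers 2 <= yield < 4 kg, and so on.
    The record layout is an assumption made for this sketch.
    """
    segments = defaultdict(list)
    for cumulative_kg, spectrum in records:
        segments[int(cumulative_kg // segment_kg)].append(spectrum)
    return {idx: average_spectra(group) for idx, group in sorted(segments.items())}

# Toy milking: three-point "spectra" recorded at increasing cumulative yield.
records = [
    (0.5, [0.25, 0.50, 0.75]),
    (1.5, [0.50, 0.75, 1.00]),
    (2.5, [0.125, 0.25, 0.50]),
    (3.9, [0.375, 0.50, 0.75]),
]
segment_means = average_per_segment(records)               # one mean spectrum per 2 kg
whole_milking = average_spectra([s for _, s in records])   # composite-sample spectrum
```

With the stated diversion of 6·25%, each 2 kg segment corresponds to about 0·125 kg of bypassed milk available for the reference subsample.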
Milk sampling
Bypassed milk was taken for reference analysis each time 2 kg of milk had passed the milk meter. An aliquot of 45 ml of each milk sample was preserved with bronopol after sampling and sent to the laboratory for reference analysis of fat, protein and lactose (MilkoScan FT+; Foss-Electric A/S, Hillerød, Denmark). The rest of the bypassed milk from all subsamples of one complete cow milking was pooled to obtain a milk sample of the total cow milking. This sample was also sent to the laboratory for reference analysis.
Dairy farms and measurement frequency
In-line measurements were done on three different farms located in the federal state of Schleswig-Holstein, Germany: one commercial dairy farm (ME); one research dairy farm (Max-Rubner-Institute Kiel, in Schaedtbek; SB); and one education and research dairy farm (Chamber of Agriculture Schleswig-Holstein, in Futterkamp; FK). Farm ME was visited six times for measurements and milk sampling, farms SB and FK three times each. An overview of the measurement sequence, total number of milk samples and feeding rations is provided in Fig. 1. Each visit to farm ME comprised three consecutive milkings (two evening and one morning milking). The data sets of two consecutive visits were combined into one data set (MEa1 + a2; MEb1 + b2; MEc1 + c2), referred to as visit a, b, and c, respectively. Milk samples at farm ME were taken during 154 cow milkings of 95 different cows (total 1313 samples). The cows were in their first to eleventh lactation and 22–490 d in milk (DIM; $\bar x = 2{\cdot}6$ lactations and 217 DIM). Each visit (a, b, and c) to farms SB and FK comprised four consecutive milkings (two evening and two morning milkings). Milk samples at farm SB were taken during 116 cow milkings of 51 different cows (total 937 samples). The cows were in their first to ninth lactation and 39–467 d in milk ($\bar x = 2{\cdot}4$ lactations and 189 DIM).
Milk samples at farm FK were taken during 84 cow milkings of 72 different cows (total 869 samples). The cows were in their first to fifth lactation and 7–312 d in milk ($\bar x = 2{\cdot}3$ lactations and 165 DIM).
The number of analysed cow milkings depended on the number of visits, the total number of cows and the type of milking parlour. The number of cows that appeared at more than one milking in the analysis was incidental, because cows arrived in random order at the milking place at which the measuring setup was installed. All farms had the same cow breed (German Holstein) with a high lactation performance (9500–10 600 l per 305-d lactation).
Calibration and validation sets
A total of 3119 NIR spectra and milk samples were acquired (spectra taken during parts of the milking process (n = 2765) and averaged spectra of complete milkings (n = 354)). In a first step, spectra were merged into one dataset per visit per farm for farms SB and FK (SBa–SBc; FKa–FKc), and per two consecutive visits for farm ME (MEa–MEc) (Fig. 1). Additional farm datasets contained all spectra and reference results of the respective farm (MEtotal, SBtotal, FKtotal). In a second step, datasets were combined into different calibration and validation sets (CV-sets) (Table 1). The nomenclature of the CV-sets is based on the farm in the validation set on the one hand and on the status of the calibration set (random internal (RDM), internal (INT), external (EXT), combined (EXT + 1/3 & EXT + 2/3)) on the other hand.
Fully random internal calibration models (RDM): NIR spectra of the complete dataset per farm were subdivided into two random sample sets: a calibration set containing two-thirds of all samples and a test set containing the remaining samples.
Internal calibration models (INT): NIR spectra of visits a and b were combined into one calibration set per farm. Validation was done on visit c to the same farm (MEc; SBc; FKc, Table 1).
External calibration models (EXT): All spectral information of two farms was combined into one calibration set. Validation was done on the milk samples of visit c to the remaining farm, i.e. on the same validation set as in INT (Table 1). The aim of the external calibration sets was to evaluate their robustness in predicting milk constituents in milk samples of an unknown third farm. In a further step, the external calibration sets were complemented with spectral information from either visit a (EXT + 1/3) or visits a and b (EXT + 2/3) of the validated farm. The validation set remained the same as in EXT and INT (Table 1).
Cal set: calibration set; val set: validation set
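The five CV-set types described above can be sketched as follows. This is a minimal illustration, not the procedure used in the chemometric software: the sample records (dictionaries with `farm` and `visit` keys), the function name `build_cv_sets` and the fixed random seed are assumptions made for the example.

```python
import random

FARMS = ("ME", "SB", "FK")
VISITS = ("a", "b", "c")

def build_cv_sets(samples, target_farm, seed=0):
    """Return {name: (calibration, validation)} for one validated farm.

    `samples` is a list of dicts with at least the keys "farm" and "visit"
    (a hypothetical layout chosen for this sketch).
    """
    own = [s for s in samples if s["farm"] == target_farm]
    other = [s for s in samples if s["farm"] != target_farm]

    def visits(*wanted):
        return [s for s in own if s["visit"] in wanted]

    # RDM: a random two-thirds of the farm's samples calibrate, the rest validate.
    shuffled = own[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)

    val_c = visits("c")  # shared validation set for INT, EXT, EXT+1/3 and EXT+2/3
    return {
        "RDM": (shuffled[:cut], shuffled[cut:]),
        "INT": (visits("a", "b"), val_c),
        "EXT": (other, val_c),
        "EXT+1/3": (other + visits("a"), val_c),
        "EXT+2/3": (other + visits("a", "b"), val_c),
    }

# Toy data: three samples per farm and visit.
samples = [
    {"farm": f, "visit": v, "id": f"{f}{v}{i}"}
    for f in FARMS for v in VISITS for i in range(3)
]
cv_sets = build_cv_sets(samples, "FK")
```

In this layout, only RDM draws calibration and validation samples from the same visits; all other sets validate on visit c of the target farm, which is never part of the calibration.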
Preprocessing of NIR data
Chemometric tools (SL Calibration Wizard v.1.1.0; SensoLogic GmbH, Norderstedt, Germany) were used for preprocessing of spectra, calibration of the different milk constituents, and validation of results. The NIR reflectance spectra were normalised in order to reduce scattering effects. The applied normalisation scales the measured reflection to an ordinate between 0 and 1, where the minimum and maximum reflection values of each spectrum correspond to no (i.e. 0) and total (i.e. 1) reflection, respectively, in the normalised spectra. The PLS-1 method (partial least squares) was used for calibration of each constituent. Cross-validation (20 cross-validation segments) was used in the calibration process to estimate the number of principal components for the calibration. Validation was based on the previously created test set. Validation criteria for each PLS model were the coefficient of determination ($R^2$), Root Mean Square Error of Prediction (RMSEP) and Ratio of Prediction to Deviation (RPD). In general, the RPD is defined as the ratio of the sd of the validation set to the Standard Error of Prediction (SEP) when there is no or only a small bias (Williams, 2001). Since the RMSEP, unlike the bias-corrected SEP, includes the bias and therefore remains a valid error measure when bias is high, RPD was calculated in this study as the ratio of the sd of the validation set to RMSEP.
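The normalisation and the validation criteria described above can be written out in a short Python sketch. It is not the implementation of the software used in the study; the function names are hypothetical, the bias is assumed to be defined as predicted minus reference value, and the sd and SEP are assumed to use n - 1 in the denominator. The PLS calibration itself and $R^2$ are left out.

```python
import math

def minmax_normalise(spectrum):
    """Scale one reflectance spectrum so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        raise ValueError("flat spectrum cannot be min-max normalised")
    return [(r - lo) / (hi - lo) for r in spectrum]

def validation_metrics(reference, predicted):
    """Return bias, SEP, RMSEP and RPD (sd of the reference values / RMSEP)."""
    n = len(reference)
    residuals = [p - y for p, y in zip(predicted, reference)]  # assumed sign convention
    bias = sum(residuals) / n
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    # SEP is the bias-corrected standard deviation of the residuals, so that
    # RMSEP^2 = bias^2 + (n - 1) / n * SEP^2.
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    mean_ref = sum(reference) / n
    sd_ref = math.sqrt(sum((y - mean_ref) ** 2 for y in reference) / (n - 1))
    return {"bias": bias, "SEP": sep, "RMSEP": rmsep, "RPD": sd_ref / rmsep}

normalised = minmax_normalise([2.0, 4.0, 6.0])

# Toy validation set (fat content in %): every prediction is 0.5 too high.
reference = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 4.5, 5.5, 6.5]
metrics = validation_metrics(reference, predicted)
```

In the toy example the predictions carry a constant offset, so SEP is zero while RMSEP equals the bias. An RPD based on SEP would be undefined here, whereas the RMSEP-based RPD still reflects the prediction error.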
Statistical significance of differences between the varying CV-sets was tested by comparing absolute values of the residuals of NIR predicted and laboratory analysed milk contents (IBM SPSS Statistics, Version 19).
Results
An overview of the range, mean and sd of the fat, protein and lactose contents in the reference milk samples of each visit per farm is shown in Table 2. Since reference milk samples were taken every 2 kg of total milking, large ranges of the single milk contents were covered.
The data of all three milk contents in the calibration and validation sets of all created datasets were normally distributed.
Statistical calibration performance
The statistical performances of fat, protein and lactose prediction for the fully random internal CV-sets (ME-RDM; SB-RDM; FK-RDM) are shown in Table 3. Excellent calibration results were achieved with respect to RPD values in all three farms for predicting fat content in milk. Very good RPD values up to 6·36 were achieved when protein content in milk was predicted. RPD of lactose prediction showed useful results for the farms ME and SB and a good accuracy in farm FK. Comparable RMSEP were achieved in all three farms for the prediction of fat and protein content. RMSEP for the prediction of lactose content was considerably lower at FK compared with the other farms. The bias was rather small for all milk constituents with regard to the range of data.
SEC: standard error of calibration; RMSECV: root mean square error of cross validation; RMSEP: root mean square error of prediction; RPD: ratio of sd of the validation set to RMSEP
In Table 4 the model performances of farm internal CV-sets are shown (ME-INT; SB-INT; FK-INT). Compared with the results in Table 3, considerably higher bias can be observed for fat, protein and lactose prediction. Especially absolute values of protein and lactose bias were rather high with up to 0·0955 and −0·0701%, respectively. Excellent calibration results were still achieved regarding RPD values of fat prediction for all three farms. Slightly higher RMSEP than in RDM can be observed for predicting fat content (Tables 3 & 4).
The prediction of protein content in the internal CV-set INT achieved a much lower accuracy than in RDM, with RPD values around 2·0. An insufficient accuracy was achieved for all three farms when lactose content in milk was predicted. NIR prediction results in INT for protein and lactose showed a significantly higher deviation from the laboratory values compared with validation of randomised selected samples in RDM (Kruskal–Wallis-Test (P ≤ 0·001)).
Model results of CV-sets with external validation spectra (ME-EXT; SB-EXT; FK-EXT) are shown in Table 4. According to the RPD classification by Williams (2001), excellent prediction accuracy was achieved with regard to the fat content. For protein calibration, the RPD values were suitable for analytical purposes in most NIR applications in agricultural products with low sd. RPD values for lactose prediction in raw milk samples were rather small and classified as not recommended for analytical purposes.
Even though results in EXT did not achieve the same accuracy as in RDM, the prediction of protein content in raw milk in EXT was realised with a significantly higher accuracy than in INT (Kruskal–Wallis-Test (P ≤ 0·001); Table 4). In addition, bias was smaller in EXT compared with the high bias levels in INT.
In the extended external calibration sets, the previous external data sets were supplemented with information from the farm whose spectra were used for validation. The statistical performances of calibration and validation for the extended external calibration sets EXT + 1/3 and EXT + 2/3, with different amounts of additional farm information, are shown in Table 5. The fat content in raw milk was predicted with an accuracy classified as excellent, given the high RPD values in all six calibration sets. Suitable to good calibrations were developed for predicting protein content in milk. The RPD values of lactose prediction were rather small in all calibration sets.
Comparing EXT + 1/3 and EXT + 2/3 (Table 5) with results from EXT (Table 4), slightly improved calibration results were observed the more farm information was added. RMSEP of fat prediction was comparable with EXT. RMSEP was improved significantly with regard to lactose content in milk (Kruskal–Wallis-Test (P ≤ 0·05)). Moderate bias values existed for CV-sets FK-EXT + 1/3 and FK-EXT + 2/3. The bias for prediction of fat and lactose in milk was rather high in FK-EXT + 1/3.
Discussion
The variation in the content of the milk constituents, shown in Table 2, was in accordance with typical ranges of subsamples from total milkings (Nielsen et al. 2005). The changes in milk contents during the course of milking have a positive effect on the performance of the calibration since they strongly influence spectral variability. Including subsamples of cow milkings in the calibration sets is particularly important for predicting the whole range of milk contents during the course of milking with sufficient accuracy in a practical in-line application.
Comparing the prediction results of the RDM data sets with calibration results from other randomised data sets in the literature (Tsenkova et al. 1999, 2001, 2006; Schmilovitch et al. 2000; Chen et al. 2002; Kawasaki et al. 2008; Saranwong & Kawano, 2008), considerably improved or comparable prediction accuracy was achieved for all three farms with regard to the protein and lactose content. An important problem of fully randomised data sets is that they generally do not account for the variability in future milk samples and in samples of different origin. Naturally, the accuracy of validation results from such fully randomised models is over-optimistic with regard to future samples (Peirs et al. 2003), a fact that is ignored in the RDM data set as well as in most calibration models of raw milk contents in the literature. The milk samples in a calibration set should span the natural range of variability of influences on the milk spectra in order to predict milk constituents in future unknown milk samples with sufficient accuracy (Naes et al. 2004). The distributions of milk content variation in randomised calibration and validation sets are nearly identical, which leads to higher prediction accuracy than with most independent validation sets.
Nevertheless, additional instrumental and environmental variations must be included in calibration models to achieve accurate prediction results with high robustness in commercial farm applications, especially when the spectroscopic signal from the compound of interest is rather small (Thomas & Ge, 2000; Williams & Norris, 2001). Since spectroscopic signals in raw milk spectra in the selected wavelength range are dominated by absorption bands of O–H and C–H bonds, this is of particular importance for the milk constituents protein and lactose.
Thomas & Ge (2000) suggest two different ways of including sufficient sample variability in the calibration set to obtain robust calibration models. The first, controlled calibration, selects or prepares concentration series that include the designated concentrations of the constituent and is useful for manufactured products. Due to the high variability of agricultural products, this method seems insufficient for creating robust calibration models for the prediction of future samples (Dhanoa et al. 1999; Peirs et al. 2003; Sileoni et al. 2011). The second, passive or natural calibration, uses a global calibration that includes most of the biological variability over a sufficient period of time with samples of different origin. Despite the time-consuming development of natural calibrations (Thomas & Ge, 2000) and the continuous effort of updating or replacing them when new varieties are introduced to the model (Guthrie et al. 1998), this approach is preferable for reaching adequate prediction accuracy in agricultural products (Peirs et al. 2003; Naes et al. 2004). Since variability in NIR spectra of raw milk is influenced not only by the milk constituents but also by external disturbances such as season, farm or cow (see above), a large data set of calibration spectra is required to cover this variation and to achieve sufficient robustness.
The approach of an internal calibration set (ME-INT; SB-INT; FK-INT) or the procedure of an external calibration set with calibration data of different origin (ME-EXT; SB-EXT; FK-EXT) is a more adequate solution for independent validation. As expected, the results were much poorer for these internal CV-sets compared with fully randomised CV-sets. Both calibration sets include the intra-farm variance of different cows with different lactation stage, lactation number and health status. Nevertheless, a lack of robustness of the calibration model can be observed, which made it difficult to predict milk contents in future raw milk samples with satisfactory accuracy. According to Tsenkova et al. (2000, 2009), Cattaneo et al. (2009) and Melfsen et al. (2012c), spectral information of individual cows is dominant in raw milk spectra, a factor that was underrepresented in the calibration sets in INT and EXT. Only a few of the cows were represented both in the calibration and in the validation sets in INT. Since the spectral variability of these cows was already included when the calibration was created, better prediction results for their milk contents can be expected. Indeed, the RMSEP for the milk samples from these cows was significantly smaller (Mann–Whitney U-Test (P < 0·001)) than the RMSEP for milk samples from cows that were new in the validation set (results not plotted). This was apparent for the constituents protein and lactose on all three farms.
The objective of the analysis was to investigate whether a large external data set for calibration, with or without additional information from the validated farms and cows, avoids these effects and hence improves the robustness of the calibration models.
Even though no information on the farm, the cows or the feeding was present in the form of raw milk spectra in the model development of external calibrations, these sets provided equal or even better prediction accuracy compared with internal CV-sets. Since sample numbers were considerably higher in the external CV-sets, the improved prediction accuracy might have been caused by that fact. However, additional models developed with sample numbers randomly reduced to the level of the internal CV-sets still showed improved prediction results (results not plotted). Most likely, the additional variability introduced when more than one farm was used to build the calibration sets represented feeding and seasonal changes better than INT and hence was responsible for a more robust calibration. The calibration sets included milk samples from a larger number of cows, a higher variability of lactation characteristics, several different diets, as well as a higher variability of instrumental and environmental conditions.
Further improvements were observed when external calibration sets were provided with additional spectral information from that farm at which the validation was done. In this case, clear improvements for RMSEP of prediction of lactose content in milk and smaller bias values for protein and lactose prediction were observed. An increase in the amount of farm information in the external calibration sets led to an increased accuracy for predicting lactose content in milk and a further reduction of bias values for all milk constituents. Unexpectedly large bias values were still observed, however, when fat content in test set FKc was predicted with the calibration sets in FK-EXT + 1/3 and FK-EXT + 2/3.
Conclusion
In this study, the robustness and accuracy of NIRS predictions of raw milk constituents obtained with different multivariate models comprising internal and external effects were evaluated and compared. The results underlined that excellent to good accuracy was achieved for predicting fat, protein and lactose content in milk in fully randomised calibration and validation sets. In most cases, the variability of future milk samples is missing in randomised test sets, which causes insufficient robustness towards independent spectra of future measurements. In consequence, validation on temporally independent spectra achieved much poorer prediction results, especially for the prediction of protein and lactose content. Prediction was improved when calibration was done on external spectra of other farms, which was probably due to the larger amount of sample variation in terms of feeding diets and cows in external calibration sets. Further improvements were achieved when additional information from the farm at which the validation was done was added to the calibration set. In summary, randomly created calibration and validation sets reflect the accuracy of raw milk content analysis only to a minor degree. A robust calibration that covers most of the variability of raw milk samples is more suitable and beneficial than calibration sets of single farms. For the creation of robust calibrations, it is recommended to include as much cow, feeding and seasonal variation as possible.
This project was co-financed by the European Regional Development Fund (ERDF). We gratefully acknowledge the ‘Zukunftsprogramm Wirtschaft Schleswig-Holstein’, Polytec GmbH and SensoLogic GmbH for financial and technical support. Furthermore we thank the Chamber of Agriculture, Schleswig-Holstein, the Max-Rubner Institute, Kiel and the Melfsen & Partner GbR for enabling sampling, and the Institute for Animal Breeding and Husbandry, University Kiel for technical support.