
Comparison of different response variables in genomic prediction using GBLUP and ssGBLUP methods in Iranian Holstein cattle

Published online by Cambridge University Press:  23 May 2022

Mohamadreza Afrazandeh
Affiliation:
Department of Animal Science, Faculty of Agriculture Sciences and Food Industries, Science and Research Branch, Islamic Azad University, Tehran, Iran
Rostam Abdolahi-Arpanahi*
Affiliation:
Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, USA
Mokhtar Ali Abbasi
Affiliation:
Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
Nasser Emam Jomeh Kashan
Affiliation:
Department of Animal Science, Faculty of Agriculture Sciences and Food Industries, Science and Research Branch, Islamic Azad University, Tehran, Iran
Rasoul Vaez Torshizi
Affiliation:
Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
Author for correspondence: Rostam Abdolahi-Arpanahi, Email: rostam.abdollahi@uga.edu

Abstract

We compared the reliability and bias of genomic evaluations of Holstein bulls for milk, fat and protein yield obtained with two methods: genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP). Four response variables were used as dependent variables: estimated breeding value (EBV), daughter yield deviation (DYD), and de-regressed proofs based on Garrick (DRPGR) and VanRaden (DRPVR). The effects of three methods for weighting the diagonal elements of the residual variance matrix were also explored. The reliability and the absolute deviation from 1 of the regression coefficient of the response variable on genomic prediction (Dev) were estimated in the validation population for both GBLUP and ssGBLUP. In ssGBLUP, the genomic prediction reliability and Dev from un-weighted DRPGR for milk yield were 0.44 and 0.002, respectively. In GBLUP, the corresponding values from un-weighted EBV for fat yield were 0.52 and 0.008, respectively. Moreover, un-weighted DRPGR performed well in ssGBLUP for fat yield, with reliability and Dev of 0.49 and 0.001, respectively, whereas for protein yield the weighted DRPGR in ssGBLUP gave 0.38 and 0.056, respectively. In general, ssGBLUP with un-weighted DRPGR for milk and fat yield and with weighted DRPGR for protein yield outperformed the other models. The average reliability of genomic predictions for the three traits from ssGBLUP was 0.39, which was 0.98 percentage points higher than the average reliability from GBLUP. Likewise, the Dev of genomic predictions was lower in ssGBLUP than in GBLUP: the average Dev for the three traits was 0.110 and 0.144, respectively. In conclusion, genomic prediction using ssGBLUP outperformed GBLUP in terms of both reliability and bias.

Type
Research Article
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press on behalf of Hannah Dairy Research Foundation

Genomic selection has been adopted as a standard tool for genetic evaluation in different livestock species (Misztal et al., 2020). It requires a reference population of animals with both phenotypic records and genotypes for single-nucleotide polymorphism (SNP) markers. The target phenotypic values of the reference population are usually called response variables. Historically, the estimated breeding values (EBVs) of animals were predicted from phenotypic and pedigree information, and the EBVs of the reference population were then used as response variables to estimate SNP effects. The genomic estimated breeding values (GEBVs) of animals were obtained as the sum of these effects, and candidates were selected according to their GEBVs. Nowadays, genomic selection methodology benefits from the genomic relationship matrix (G) (VanRaden, 2008), which contains the genomic relationships among animals and allows conceptual comparisons between pedigree-based and genome-based predictions. However, information from a bull's relatives is used to predict his EBV and, through the relationships captured by G, again in genomic prediction. Therefore, using the EBV as the response variable causes double-counting of information (Garrick et al., 2009).

To avoid double-counting information derived from a bull's relatives, the daughter yield deviation (DYD) and de-regressed proofs (DRP) have been proposed as response variables. The DYD of a bull is derived from the information of his daughters only: it is the average performance of the daughters adjusted for all fixed effects and for the EBVs of the bull's mates (Mrode, 2014). Therefore, using DYD as a response variable for genomic evaluation does not cause double-counting of information. The DRP is obtained from the EBV, either by dividing the EBV of the animal by its reliability (Goddard, 1985) or by setting up the complete mixed model equations (MME) for all animals in the pedigree (Jairath et al., 1998). To avoid reconstructing the complete MME, simpler strategies have been proposed (Garrick et al., 2009; VanRaden et al., 2009). Comparing these response variables in terms of bias is an important issue in genomic evaluation. In a study on Chinese Holstein cattle, EBV, DRPGR (Garrick et al., 2009) and DRPJR (Jairath et al., 1998) were compared as response variables for genomic evaluation, and DRPJR outperformed DRPGR and EBV in terms of accuracy and unbiasedness (Song et al., 2018).

Genomic breeding values can be predicted with either GBLUP or ssGBLUP. GBLUP has the same form as the conventional BLUP model, but the inverse of the numerator relationship matrix $(\mathbf{A}^{-1})$ is replaced by the inverse of the genomic relationship matrix $(\mathbf{G}^{-1})$ (Hayes et al., 2009). The genomic relationships estimate the proportion of chromosome segments shared by individuals, because high-density genotyping identifies alleles that are identical in state (Forni et al., 2011). In ssGBLUP, in contrast, the A matrix is replaced by the H matrix, which combines the G and A matrices (Misztal et al., 2009). In some cases, A and G are not on the same scale (Misztal, 2017), and optimal scaling factors are needed to blend $\mathbf{G}^{-1}$ with the inverse of the pedigree relationship matrix for the genotyped animals $(\mathbf{A}_{22}^{-1})$ (Vitezica et al., 2011; Misztal et al., 2013).

The reliabilities of the response variables generally differ among bulls. Therefore, weighted analyses are carried out in genomic prediction to account for the heterogeneous residual variances among bulls that result from these differences. As a result, the performance of ssGBLUP and GBLUP can be affected not only by the type of response variable but also by the weighting of residuals. In a genomic prediction study in which DRP was used as the response variable, the reliability of GEBV from ssGBLUP was 2.1% higher than that from GBLUP (Gao et al., 2012). In another study, ssGBLUP based on DRP led to slightly higher reliability than GBLUP (Koivula et al., 2012). The aim of this study was therefore to estimate the reliability and bias of genomic prediction with GBLUP and ssGBLUP using four response variables (EBV, DYD and two DRPs) with weighted and un-weighted residuals, using data from Iranian Holstein dairy cattle.

Materials and methods

Data

The phenotypic records, pedigree and genotypes of Iranian Holstein cattle were provided by the Animal Breeding Center of Iran. The dataset consisted of 651 985, 479 268 and 425 151 records of milk (MY), fat (FY) and protein (PY) yield, respectively, from cows whose sires were born in 1989–2014. There were 101 bulls genotyped with low-density (<20 k), 749 bulls with medium-density (20–60 k) and 759 bulls with high-density (>60 k) SNP chips. SNPs with a minor allele frequency (MAF) below 0.01, a call rate below 0.95 or a Hardy–Weinberg equilibrium test P-value below α/n (α = 0.05, n = number of SNPs) were removed with the QCf90 software (Misztal et al., 2002). All genotypes were then imputed to 40 k with FImpute (Sargolzaei et al., 2014). Finally, 1609 genotyped bulls and 41 135 SNPs were retained for analysis.
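The three SNP filters above can be sketched as follows. This is a minimal illustration, not QCf90 itself: it assumes genotypes coded 0/1/2 with `np.nan` for missing calls and uses a one-degree-of-freedom chi-square test for Hardy–Weinberg equilibrium, which may differ from the test QCf90 actually implements. The function name `snp_qc_mask` is hypothetical.

```python
import math

import numpy as np


def snp_qc_mask(geno, maf_min=0.01, call_rate_min=0.95, alpha=0.05):
    """Boolean mask of SNPs passing MAF, call-rate and HWE filters.

    geno: (n_animals, n_snps) array coded 0/1/2, with np.nan for missing calls.
    """
    n_snps = geno.shape[1]
    called = ~np.isnan(geno)
    call_rate = called.mean(axis=0)
    hwe_threshold = alpha / n_snps  # alpha / n, as in the text
    keep = np.zeros(n_snps, dtype=bool)
    for j in range(n_snps):
        g = geno[called[:, j], j]
        if call_rate[j] < call_rate_min or g.size == 0:
            continue
        p = g.mean() / 2.0  # frequency of the allele counted by the 0/1/2 code
        if min(p, 1.0 - p) < maf_min:
            continue
        # Observed vs expected genotype counts under Hardy-Weinberg equilibrium
        q = 1.0 - p
        observed = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
        expected = g.size * np.array([q * q, 2.0 * p * q, p * p])
        stat = float(((observed - expected) ** 2 / expected).sum())
        p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
        keep[j] = p_value >= hwe_threshold
    return keep


geno = np.zeros((100, 3))                      # SNP 0 is monomorphic (fails MAF)
geno[:, 1] = [0] * 25 + [1] * 50 + [2] * 25    # SNP 1 is exactly in HWE
geno[:, 2] = 1                                 # SNP 2: all heterozygous (fails HWE)
mask = snp_qc_mask(geno)
```

Imputation and the final SNP count depend on the chips and FImpute run and are not reproduced here.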

The de-regressed proofs were estimated with two de-regression methods, those of Garrick et al. (2009) and VanRaden et al. (2009), and the DYD was calculated as described by Mrode (2014). These were then used as response variables for GEBV prediction. The two de-regression methods and DYD were calculated as follows:

VanRaden method (DRPVR)

$$DRP_i = \left[{\displaystyle{{EBV_i-PA_i} \over {R_i^2 }}} \right] + PA_i$$

where $DRP_i$ is the DRP of bull i, $PA_i$ is the average EBV of the parents of bull i, $EBV_i$ is the estimated breeding value of bull i, and $R_i^2$ is the reliability of $DRP_i$, calculated as:

$$R_i^2 = \displaystyle{{ERC_{P_i}} \over {ERC_{P_i} + ERC_{PA_i} + 1}}$$

where $ERC_{P_i}$ is the effective record contribution from the progeny of bull i and $ERC_{PA_i}$ is the effective record contribution from the parents of bull i. They were estimated with the following formulas (VanRaden and Wiggans, 1991):

$$ERC_{P_i} = \left[{\lambda \displaystyle{{REL_{EBV_i}} \over {( {1-REL_{EBV_i}} ) }}} \right]-ERC_{PA_i}$$
$$ERC_{PA_i} = \lambda \displaystyle{{REL_{PA_i}} \over {( {1-REL_{PA_i}} ) }}$$

where $\lambda = (1-h^2)/h^2$, $REL_{EBV_i}$ is the reliability of $EBV_i$ and $REL_{PA_i}$ is the reliability of $PA_i$.
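The VanRaden de-regression above can be evaluated for a single bull as in the following sketch. It simply codes the equations as printed in this section (including the `+ 1` in the denominator of $R_i^2$); the function and argument names are hypothetical, and all reliabilities are assumed to lie strictly between 0 and 1.

```python
def drp_vanraden(ebv, pa, rel_ebv, rel_pa, h2):
    """De-regressed proof of one bull following the DRPVR equations above.

    Returns (drp, r2, erc_p). Reliabilities must lie strictly between 0 and 1.
    """
    lam = (1.0 - h2) / h2                               # lambda = (1 - h^2) / h^2
    erc_pa = lam * rel_pa / (1.0 - rel_pa)              # ERC of the parent average
    erc_p = lam * rel_ebv / (1.0 - rel_ebv) - erc_pa    # ERC of the progeny
    r2 = erc_p / (erc_p + erc_pa + 1.0)                 # R_i^2 as printed above
    drp = (ebv - pa) / r2 + pa                          # de-regress the deviation from PA
    return drp, r2, erc_p


# Example: h^2 = 0.3, REL_EBV = 0.8, REL_PA = 0.3
drp, r2, erc_p = drp_vanraden(ebv=110.0, pa=100.0, rel_ebv=0.8, rel_pa=0.3, h2=0.3)
```

Because $R_i^2 < 1$, the DRP lies further from the parent average than the EBV does; this is what removes the shrinkage toward PA that is built into the EBV.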

Garrick method (DRPGR)

The equations solved to obtain the DRP of each animal are as follows (Garrick et al., 2009):

$$\begin{bmatrix} Z_{PA}^{\prime} Z_{PA} + 4\lambda & -2\lambda \\ -2\lambda & Z_i^{\prime} Z_i + 2\lambda \end{bmatrix} \begin{bmatrix} PA \\ EBV \end{bmatrix} = \begin{bmatrix} y_{PA}^{\ast} \\ y_i^{\ast} \end{bmatrix}$$

The elements of the matrices are:

$$Z_{PA}^{\prime} Z_{PA} = \lambda(0.5\alpha - 4) + 0.5\lambda\sqrt{\alpha^2 + 16/\delta}$$
$$\alpha = 1/(0.5 - R_{PA}^2)$$
$$Z_i^{\prime} Z_i = \delta\, Z_{PA}^{\prime} Z_{PA} + 2\lambda(2\delta - 1)$$
$$\delta = (0.5 - R_{PA}^2)/(1 - R_{EBV}^2)$$
$$y_i^{\ast} = -2\lambda\, PA_i + (Z_i^{\prime} Z_i + 2\lambda)\, EBV_i$$
$$DRP_i = y_i^{\ast}/(Z_i^{\prime} Z_i) + PA_i$$

The reliability of DRP was calculated as:

$$R_{DRP_i}^2 = 1-\;\lambda /( {Z_i^{\rm^{\prime}} Z_i + \lambda } ) $$
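The Garrick de-regression can be sketched per bull as below. The sketch (hypothetical function name; $R_{PA}^2$ is assumed to be on the animal's scale, as defined for DYD in this section, so it must be below 0.5, and $R_{EBV}^2 < 1$) computes $\alpha$, $\delta$, $Z_{PA}'Z_{PA}$, $Z_i'Z_i$, $y_i^\ast$ and $R_{DRP_i}^2$ from the equations above; the DRP itself then follows from the expression for $DRP_i$. A useful check is that inverting the 2 × 2 left-hand side reproduces the reliability of the EBV that was de-regressed.

```python
import math

import numpy as np


def garrick_deregression(ebv, pa, rel_ebv, rel_pa, h2):
    """Quantities of the DRPGR equations above for a single bull.

    rel_pa is the reliability of PA on the animal's scale (must be < 0.5).
    """
    lam = (1.0 - h2) / h2
    alpha = 1.0 / (0.5 - rel_pa)
    delta = (0.5 - rel_pa) / (1.0 - rel_ebv)
    zz_pa = lam * (0.5 * alpha - 4.0) + 0.5 * lam * math.sqrt(alpha**2 + 16.0 / delta)
    zz_i = delta * zz_pa + 2.0 * lam * (2.0 * delta - 1.0)
    y_star_i = -2.0 * lam * pa + (zz_i + 2.0 * lam) * ebv
    r2_drp = 1.0 - lam / (zz_i + lam)
    return {"lam": lam, "zz_pa": zz_pa, "zz_i": zz_i,
            "y_star_i": y_star_i, "r2_drp": r2_drp}


res = garrick_deregression(ebv=110.0, pa=100.0, rel_ebv=0.8, rel_pa=0.3, h2=0.3)

# Check: the EBV element of the inverted left-hand side gives back REL_EBV
lam = res["lam"]
lhs = np.array([[res["zz_pa"] + 4.0 * lam, -2.0 * lam],
                [-2.0 * lam, res["zz_i"] + 2.0 * lam]])
rel_ebv_back = 1.0 - lam * np.linalg.inv(lhs)[1, 1]
```

The information content $Z_i'Z_i$ is chosen so that this consistency holds, which is why $R_{DRP_i}^2$ is slightly lower than the reliability of the EBV it came from.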

Daughter yield deviation method (DYD)

The DYD was calculated as follows (Mrode, 2014):

$$DYD_i = \frac{\sum_{prog=1}^{k} u_{prog}\, n_{prog}\,(2YD_{prog} - \hat{a}_{mate})}{\sum_{prog=1}^{k} u_{prog}\, n_{prog}} + PA_i$$

where k is the number of daughters of bull i, $YD_{prog}$ is the yield deviation of a daughter of bull i, i.e. her performance expressed as a deviation from the population average and adjusted for all effects except the additive genetic and residual effects, and $\hat{a}_{mate}$ is the breeding value of the mate of bull i. $u_{prog}$ equals 1 if the mate of bull i is known and 2/3 if it is unknown. The reliability of DYD was calculated with the following formulas (VanRaden et al., 2009):

$$\begin{aligned} R_{DYD}^2 &= \frac{DE_{prg}}{DE_{prg} + 1} \\ DE_{prg} &= \frac{R_{EBV}^2}{1 - R_{EBV}^2} - \frac{R_{PA}^2}{1 - R_{PA}^2} \end{aligned}$$

where $R_{PA}^2$ is the reliability of the parent average EBV of bull i, $(R_{sire}^2 + R_{dam}^2)/4$, and $DE_{prg}$ is the number of daughter equivalents contributed by the information from the daughters.
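The DYD weighting and its reliability can be sketched as follows. Daughters are represented as hypothetical `(u, n, yd, a_mate)` tuples holding $u_{prog}$, $n_{prog}$, $YD_{prog}$ and $\hat{a}_{mate}$; setting $\hat{a}_{mate}$ to 0 for an unknown mate is an assumption made only for the example. The sketch returns the weighted-average term of the DYD equation (to which $PA_i$ is added as shown there) and evaluates $R_{DYD}^2$ with $R_{PA}^2 = (R_{sire}^2 + R_{dam}^2)/4$.

```python
def dyd_weighted_average(daughters):
    """Weighted-average term of the DYD equation.

    daughters: iterable of (u, n, yd, a_mate); u = 1 if the mate is known, else 2/3.
    """
    daughters = list(daughters)
    num = sum(u * n * (2.0 * yd - a_mate) for u, n, yd, a_mate in daughters)
    den = sum(u * n for u, n, _, _ in daughters)
    return num / den


def dyd_reliability(rel_ebv, rel_sire, rel_dam):
    """R^2_DYD from the EBV reliability and the reliabilities of the parents."""
    rel_pa = (rel_sire + rel_dam) / 4.0  # reliability of the parent average
    de_prg = rel_ebv / (1.0 - rel_ebv) - rel_pa / (1.0 - rel_pa)  # daughter equivalents
    return de_prg / (de_prg + 1.0)


daughters = [(1.0, 1.0, 10.0, 4.0),        # mate known
             (2.0 / 3.0, 1.0, 6.0, 0.0)]   # mate unknown, a_mate assumed 0
avg = dyd_weighted_average(daughters)
r2_dyd = dyd_reliability(rel_ebv=0.8, rel_sire=0.9, rel_dam=0.5)
```

Subtracting the parent-average daughter equivalents from those implied by the EBV is what leaves only the daughters' contribution in $R_{DYD}^2$.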

GBLUP and ssGBLUP methods

Using EBV, DYD and the two DRPs as response variables, the GEBVs of bulls were predicted with the GBLUP and ssGBLUP methods. The statistical model was:

$${\boldsymbol y} = 1\mu + {\bf Z} {\boldsymbol g}+ {\boldsymbol e}$$

where ${\boldsymbol y}$ is the vector of response variables, μ is the overall mean, $\mathbf{1}$ is a vector of ones, $\mathbf{Z}$ is the incidence matrix relating ${\boldsymbol g}$ to ${\boldsymbol y}$, ${\boldsymbol g}$ is the vector of additive genetic effects (of the genotyped bulls in GBLUP, and of all animals in H in ssGBLUP), and ${\boldsymbol e}$ is the vector of residuals. The additive genetic effects are normally distributed as $N(0, \mathbf{G}\sigma_g^2)$ in GBLUP or $N(0, \mathbf{H}\sigma_g^2)$ in ssGBLUP, where $\sigma_g^2$ is the additive genetic variance, $\mathbf{G}$ is the genomic relationship matrix (VanRaden, 2008) and $\mathbf{H}$ is the matrix that combines G and A. The dimensions of G and H were 1609 × 1609 and 5133 × 5133, respectively. The residuals are normally distributed as $N(0, \mathbf{D}\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and $\mathbf{D}$ is a diagonal matrix with elements $d_{ii} = 1/W_i$, where $W_i$ is the weight of bull i.
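For a small example, the weighted model above can be solved through Henderson's mixed model equations, with $\mathbf{D}^{-1} = \mathrm{diag}(W_i)$ and $\lambda = \sigma_e^2/\sigma_g^2$. The sketch takes the inverse relationship matrix directly ($\mathbf{G}^{-1}$ for GBLUP, $\mathbf{H}^{-1}$ for ssGBLUP); the function name and the dense `numpy` algebra are assumptions for illustration only, and large evaluations normally rely on sparse or iterative solvers instead.

```python
import numpy as np


def solve_weighted_gblup(y, Z, K_inv, weights, lam):
    """Solve the MME for y = 1*mu + Z g + e with Var(e) = D sigma_e^2, D = diag(1/w).

    K_inv: inverse relationship matrix (G^-1 for GBLUP, H^-1 for ssGBLUP).
    lam:   sigma_e^2 / sigma_g^2.
    Returns (mu_hat, g_hat).
    """
    y = np.asarray(y, dtype=float)
    X = np.ones((y.size, 1))
    R_inv = np.diag(np.asarray(weights, dtype=float))  # D^{-1} = diag(W)
    lhs = np.block([
        [X.T @ R_inv @ X, X.T @ R_inv @ Z],
        [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + lam * K_inv],
    ])
    rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]


# Toy case: four unrelated bulls (K = I), un-weighted (all W_i = 1), lambda = 1
y = np.array([1.0, 2.0, 3.0, 6.0])
mu_hat, g_hat = solve_weighted_gblup(y, np.eye(4), np.eye(4), np.ones(4), lam=1.0)
```

With unrelated bulls and equal weights, each $\hat{g}_i$ is simply $(y_i - \bar{y})/(1+\lambda)$, i.e. the response variable shrunk toward the mean; relationships in G or H and unequal weights change how strongly each bull is shrunk.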

Because the G matrix was not positive definite, it was blended as 95% G and 5% $\mathbf{A}_{22}$. The H matrix combines pedigree and genomic information (Legarra et al., 2009), and its inverse was constructed as follows (Aguilar et al., 2010; Christensen and Lund, 2010):

$$\mathbf{H}^{-1} = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \tau(0.95\mathbf{G} + 0.05\mathbf{A}_{22})^{-1} - \omega\mathbf{A}_{22}^{-1} \end{bmatrix} + \mathbf{A}^{-1}$$
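A dense toy construction of $\mathbf{G}$ and $\mathbf{H}^{-1}$ is sketched below. It assumes VanRaden's (2008) first method for G with allele frequencies estimated from the genotypes themselves (an assumption; base-population frequencies can be used instead), blends G with 5% of $\mathbf{A}_{22}$ as described above, and adds the correction to the genotyped block of $\mathbf{A}^{-1}$. Explicit inverses like these are only practical for small examples, and the function names are hypothetical.

```python
import numpy as np


def vanraden_g(geno):
    """Genomic relationship matrix, VanRaden (2008) method 1; geno coded 0/1/2."""
    p = geno.mean(axis=0) / 2.0          # allele frequencies from the data
    M = geno - 2.0 * p                   # centre each SNP column
    return M @ M.T / (2.0 * np.sum(p * (1.0 - p)))


def h_inverse(A, genotyped, G, tau=1.0, omega=1.0, a_weight=0.05):
    """H^-1 = A^-1 + [[0, 0], [0, tau*Gb^-1 - omega*A22^-1]], Gb = 0.95 G + 0.05 A22.

    genotyped: row indices of the genotyped animals in A, in the same order as G.
    """
    idx = np.ix_(genotyped, genotyped)
    A22 = A[idx]
    Gb = (1.0 - a_weight) * G + a_weight * A22   # blended G, positive definite
    H_inv = np.linalg.inv(A)
    H_inv[idx] += tau * np.linalg.inv(Gb) - omega * np.linalg.inv(A22)
    return H_inv


# Toy pedigree: animals 0 and 1 are unrelated founders, animal 2 is their offspring
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
genotyped = [1, 2]
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(2, 50)).astype(float)  # two genotyped animals
H_inv = h_inverse(A, genotyped, vanraden_g(geno), tau=1.0, omega=0.6)
```

With only two genotyped animals, the centred G has rank 1 and is singular, which illustrates why the 5% blending with $\mathbf{A}_{22}$ is needed before inversion. If G equals $\mathbf{A}_{22}$ and τ = ω = 1, the correction vanishes and $\mathbf{H}^{-1}$ reduces to $\mathbf{A}^{-1}$.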

$\mathbf{A}^{-1}$ is the inverse of the pedigree-based relationship matrix and $\mathbf{A}_{22}^{-1}$ is the inverse of the block of A for the genotyped individuals. The pedigree comprised the 1609 genotyped bulls and their ancestors up to three generations back, so A was 5133 × 5133. The scaling factors τ and ω account for the reduced genetic variance and the different pedigree depths, respectively, making $\mathbf{G}^{-1}$ compatible with $\mathbf{A}_{22}^{-1}$ and $\mathbf{A}^{-1}$. Several values of τ and ω were tested, and for each response variable the values giving the lowest bias and the highest reliability were selected as optimal.
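The search for τ and ω can be organised as a simple grid search. The sketch below is hypothetical: `evaluate` stands for a user-supplied function that reruns ssGBLUP for given scaling factors and returns the validation reliability and $b_1$, and ranking candidates first by Dev and then by reliability is an assumed way of combining the two criteria, not necessarily the rule used in this study.

```python
import itertools


def select_scaling_factors(evaluate, taus, omegas):
    """Return (tau, omega, reliability, b1) of the best grid point.

    evaluate(tau, omega) -> (reliability, b1) is a hypothetical callback.
    Candidates are ranked by Dev = |1 - b1|, ties broken by higher reliability.
    """
    best_key, best = None, None
    for tau, omega in itertools.product(taus, omegas):
        reliability, b1 = evaluate(tau, omega)
        key = (abs(1.0 - b1), -reliability)
        if best_key is None or key < best_key:
            best_key, best = key, (tau, omega, reliability, b1)
    return best


# Made-up response surface, only to show the call
def toy_evaluate(tau, omega):
    return 0.40 - 0.01 * abs(tau - 1.0), 1.0 - 0.5 * (omega - 0.6)


best = select_scaling_factors(toy_evaluate, [0.5, 1.0, 1.5], [0.2, 0.6, 1.0])
```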

Weights

For GEBV prediction with GBLUP and ssGBLUP, the residual variance matrix was weighted with three different formulas:

  (1) When EBV or DYD was used as the response variable, the weight was $R^2/(1-R^2)$, where $R^2$ is the reliability of the response variable. This weight is referred to as $W_{classic}$.

  (2) When DRPGR or DRPVR was used as the response variable, $ERC_P$ was used as the weight (VanRaden and Wiggans, 1991).

  (3) For DRPGR and DRPVR, the weight was also computed with the formula of Garrick et al. (2009):

    $$W_{GR_i} = ( {1-h^2} ) /[ {( c + ( {1-r_i^2 } ) /( {r_i^2 } ) ) \times h^2} ] $$

where c is the proportion of genetic variance not captured by the markers, set to 0.1, and $r_i^2$ is the reliability of the de-regressed proof of bull i, calculated according to Garrick et al. (2009).
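The two closed-form weights can be written as below, with each diagonal element of D set to $1/W_i$; the $ERC_P$ weight comes from the VanRaden equations given earlier and is not repeated. Function names are hypothetical.

```python
def w_classic(r2):
    """W_classic = R^2 / (1 - R^2), used for EBV and DYD."""
    return r2 / (1.0 - r2)


def w_garrick(r2, h2, c=0.1):
    """W_GR = (1 - h^2) / [(c + (1 - r^2) / r^2) * h^2] (Garrick et al., 2009)."""
    return (1.0 - h2) / ((c + (1.0 - r2) / r2) * h2)


def residual_diagonal(weights):
    """Diagonal of D: d_ii = 1 / W_i."""
    return [1.0 / w for w in weights]


h2 = 0.3
for r2 in (0.5, 0.9, 0.99):
    print(r2, round(w_classic(r2), 3), round(w_garrick(r2, h2), 3))
```

As $r_i^2 \to 1$, $W_{GR_i}$ approaches $(1-h^2)/(c\,h^2)$ (about 23.3 for $h^2 = 0.3$ and $c = 0.1$), so c caps the weight given to highly reliable bulls, whereas $R^2/(1-R^2)$ grows without bound.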

Genomic prediction

Two datasets were prepared to assess prediction performance for the three traits: (1) a full dataset containing all records of cows whose sires were born in 1989–2014, from which EBV, DYD and the two DRPs of bulls were calculated as benchmarks; and (2) a reduced dataset containing records of cows whose sires were born in 1989–2012, from which EBV, DYD and the two DRPs were calculated for use in genomic prediction. Bulls from the reduced dataset born in 1989–2012 were assigned to the reference population, and bulls born in 2013–2014 were assigned to the validation population. The validation population (used to assess the reliability and bias of genomic predictions) included only genotyped bulls with no daughters in the reduced dataset but at least 10 daughters in the full dataset. Table 1 summarizes the number of bulls and progeny and the sizes of the reference and validation populations for the three traits in GBLUP and ssGBLUP.

Table 1. Number of bulls, progeny and the size of reference and validation populations for milk (MY), fat (FY) and protein (PY) yields in GBLUP and ssGBLUP methods
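The assignment of bulls to the reference and validation populations can be sketched as a filter over bull records. The `Bull` fields are hypothetical, and requiring at least one daughter in the reduced dataset for reference bulls is an assumption (a bull needs a response variable to contribute to the reference population).

```python
from dataclasses import dataclass


@dataclass
class Bull:
    bull_id: str
    birth_year: int
    genotyped: bool
    daughters_reduced: int  # daughters with records in the reduced dataset
    daughters_full: int     # daughters with records in the full dataset


def split_bulls(bulls):
    """Return (reference_ids, validation_ids) following the rules in the text."""
    reference, validation = [], []
    for b in bulls:
        if 1989 <= b.birth_year <= 2012 and b.daughters_reduced > 0:
            reference.append(b.bull_id)
        elif (2013 <= b.birth_year <= 2014 and b.genotyped
              and b.daughters_reduced == 0 and b.daughters_full >= 10):
            validation.append(b.bull_id)
    return reference, validation


bulls = [
    Bull("A", 2005, True, 120, 150),
    Bull("B", 2013, True, 0, 35),
    Bull("C", 2014, True, 0, 6),    # too few daughters for validation
    Bull("D", 2013, False, 0, 40),  # not genotyped
]
ref, val = split_bulls(bulls)
```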

Validation

The reliability of genomic predictions for the studied traits was measured as the squared correlation between the genomic prediction (obtained from the reduced dataset) and the response variable (EBV, DYD or one of the two DRPs from the full dataset), divided by the average reliability of the response variable in the validation population (Gao et al., 2012).
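As a sketch, the validation reliability defined above is the squared Pearson correlation divided by the mean reliability of the response variable; the function name and inputs are illustrative.

```python
import numpy as np


def validation_reliability(gebv, rv, rv_reliability):
    """Squared correlation between GEBV and RV divided by the mean RV reliability."""
    r = np.corrcoef(gebv, rv)[0, 1]
    return r ** 2 / np.mean(rv_reliability)


rel = validation_reliability([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0], [0.8] * 4)
```

Because the response variable is itself measured with error, dividing by its average reliability rescales the squared correlation toward what it would be against true breeding values, so the result is never smaller than the raw $r^2$.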

To assess the bias of genomic predictions for each method, the following regression model was used:

$$RV_i = b_0 + b_1 X_{p_i} + e_i$$

where $RV_i$ is the response variable (EBV, DYD or one of the two DRPs) of the ith validation bull obtained from the full dataset; $b_0$ is the intercept; $b_1$ is the regression coefficient, indicating bias in dispersion of the predictions; $X_{p_i}$ is the genomic prediction of the ith bull obtained from the reduced dataset; and $e_i$ is the residual. The absolute deviation of $b_1$ from 1, $Dev = |1 - b_1|$, was used as the measure of bias.
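A minimal sketch of the bias regression, using `numpy.polyfit` for the ordinary least-squares fit of RV on the genomic prediction (function name illustrative):

```python
import numpy as np


def dispersion_bias(rv, gebv):
    """Regress RV on GEBV; return (b0, b1, Dev) with Dev = |1 - b1|."""
    b1, b0 = np.polyfit(np.asarray(gebv, dtype=float), np.asarray(rv, dtype=float), 1)
    return b0, b1, abs(1.0 - b1)


gebv = np.array([1.0, 2.0, 3.0, 4.0])
rv = 2.0 + 0.9 * gebv                  # predictions slightly over-dispersed
b0, b1, dev = dispersion_bias(rv, gebv)
```

A $b_1$ below 1 indicates that the genomic predictions are over-dispersed (inflated) and a value above 1 that they are under-dispersed; Dev discards the direction and keeps only the size of the departure from 1.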

Results and discussion

Using the full dataset and a single-trait model, the heritability estimates (± standard error) for MY, FY and PY were 0.30 (±0.003), 0.21 (±0.004) and 0.24 (±0.004), respectively. These estimates are lower than the values of 0.39, 0.29 and 0.31 reported for MY, FY and PY, respectively, in Canadian Holstein cattle (Oliveira et al., 2018), but follow the same ranking among traits. The average reliabilities of EBV and DRPVR across the three traits were 0.80 and 0.79, respectively. EBV had the highest reliability, followed by DRPVR, DRPGR and DYD (Fig. 1).

Fig. 1. The reliability of response variables for milk (MY), fat (FY) and protein (PY) yields. EBV, estimated breeding value; DYD, daughter yield deviation; VR, de-regressed proof estimated by VanRaden's formula; GR, de-regressed proof estimated by Garrick's formula.

DRPVR had the highest correlation with EBV (r ≈ 0.99) and DYD the lowest (r ≈ 0.96) (Fig. 2). The average correlation of 0.99 between DRPVR and EBV across the three traits shows that DRPVR was almost identical to EBV.

Fig. 2. The correlation of DYD, DRPGR and DRPVR with EBV for milk (MY), fat (FY) and protein (PY) yields. EBV, estimated breeding value; DYD, daughter yield deviation; VR, de-regressed proof estimated by VanRaden's formula; GR, de-regressed proof estimated by Garrick's formula.

Comparison of different response variables in genomic prediction

Genomic prediction depends on how accurately marker effects are estimated, which in turn depends on the information contained in the response variables. The reliability, regression coefficient ($b_1$) and absolute deviation of the regression coefficient from 1 (Dev) obtained with GBLUP and ssGBLUP in the validation population are presented in Table 2. In ssGBLUP, the reliability and Dev from un-weighted DRPGR for MY were 0.44 and 0.002, respectively. In GBLUP, the reliability and Dev from un-weighted EBV for FY were 0.52 and 0.008, respectively. In ssGBLUP, the reliability and Dev from un-weighted DRPGR for FY were 0.49 and 0.001, respectively, and those from weighted DRPGR for PY were 0.38 and 0.056, respectively (Table 2).

Table 2. Reliabilities ($R^2$), regression coefficients ($b_1$) and the absolute deviation of the regression coefficients from 1.0 (Dev) for milk (MY), fat (FY) and protein (PY) yields in GBLUP and ssGBLUP methods

Response variables: de-regressed proofs based on VanRaden (DRPVR) and Garrick (DRPGR), estimated breeding value (EBV) and daughter yield deviation (DYD). $W_{classic}$, $W_{GR}$ and $ERC_P$ denote the three methods for weighting the diagonal elements of the residual variance matrix in the estimation of genomic breeding values; $W_{without}$ denotes the un-weighted analysis. Optimal scaling factors for MY, FY and PY: with EBV as the response variable, τ = 1 and ω = 0.4, 0.2 and 0.5, respectively; with DRPGR, τ = 1, 0.2 and 1.3 and ω = 0.6, 0.6 and 0.1, respectively; with DRPVR, τ = 1 and ω = 0.6, 0.6 and 1, respectively; with DYD, τ = 1.5, 1 and 1.5 and ω = 1, 0.7 and 1, respectively.

The reliabilities of the response variables differ among bulls, and incorporating this variability in the D matrix can result in more reliable predictions (Vandenplas and Gengler, 2015). Therefore, two weighting methods for D ($W_{GR}$ and $ERC_P$) were compared for the DRPGR and DRPVR response variables, and $W_{classic}$ was applied to EBV and DYD. $W_{GR}$ is based on the heritability, the reliability and the proportion of genetic variance not explained by markers (Garrick et al., 2009). In this study, c in $W_{GR}$ was set to 0.1, following other studies (Song et al., 2018). Song et al. (2018) reported that when c is very close to zero, the reliability of genomic prediction is higher and the bias is lower.

When EBV or DRPVR was used as the response variable, the bias of prediction was highest for MY and PY. This agrees with a simulation study of different de-regression methods, in which DRPVR and EBV were reported to perform only modestly (Calus et al., 2016). In this study, the bias was lower with DRPGR than with DRPVR as the response variable, in line with the conclusion of the same simulation study that de-regression with DRPGR is superior to DRPVR (Calus et al., 2016). Guo et al. (2010) also found that GEBV predicted from EBV as the response variable were biased. In the present study, the bias with DYD was low, which is attributed to the absence of double-counting in the analysis.

Comparison of ssGBLUP and GBLUP methods in genomic prediction

The scaling factors used to combine $\mathbf{G}^{-1}$ (τ) and $\mathbf{A}_{22}^{-1}$ (ω) in H had little effect on validation reliabilities but a large effect on bias. The optimal scaling factors differed among response variables and traits. For EBV, the optimal values for MY, FY and PY were τ = 1 and ω = 0.4, 0.2 and 0.5, respectively. For DRPGR, they were τ = 1, 0.2 and 1.3 and ω = 0.6, 0.6 and 0.1, respectively. For DRPVR, τ = 1 for all traits and ω = 0.6, 0.6 and 1, respectively. For DYD, they were τ = 1.5, 1 and 1.5 and ω = 1, 0.7 and 1, respectively. These differences among response variables and traits are attributed to differences in the formulation of the response variables and in the genetic architecture of the traits. Ideal scaling factors are specific to the population and trait (Oliveira et al., 2019). The scaling factors estimated here for milk, fat and protein yield of Iranian Holstein cattle can serve as a reference for other studies of this population.

In ssGBLUP, the average reliability of genomic predictions for the three traits was 0.39, which was 0.98 percentage points higher than the average reliability from GBLUP. Moreover, the bias of predictions was lower in ssGBLUP than in GBLUP: the average Dev for the three traits was 0.11 in ssGBLUP and 0.14 in GBLUP. Of the τ and ω parameters used to construct H, ω has been shown to reduce bias in genomic prediction (Tsuruta et al., 2011). Optimal scaling factors have been suggested to decrease the possible inflation of genomic predictions (Misztal et al., 2013), and using optimal scaling factors in H reduces bias and increases the reliability of prediction (Oliveira et al., 2019).

In the present study, the reference population was relatively small (818 bulls), compared with 3,045 bulls (Gao et al., 2012) and 5,160 bulls (Song et al., 2018) in other studies. The accuracy of genomic evaluation depends on the heritability of the trait, the prediction method and the number of animals in the reference population (Goddard, 2009). The high reliability of predictions from ssGBLUP therefore suggests that the method can be used in populations with a small number of genotyped animals.

In the routine multi-step genomic evaluation, the EBV, the de-regressed EBV, the direct genomic value (DGV) and finally the GEBV are computed in sequence, and the G matrix is used to predict the DGV. In the present study, a multi-step approach was used, but with the H matrix instead of the G matrix for the ssGBLUP predictions. Using the H matrix increased the reliability and reduced the bias of predictions.

In conclusion, the type of response variable and whether the residuals are weighted affect the prediction performance of the statistical methods. In ssGBLUP, un-weighted DRPGR for MY and FY and weighted DRPGR for PY outperformed the other response variables. Overall, ssGBLUP outperformed GBLUP in terms of both reliability and bias.

Acknowledgements

The authors gratefully acknowledge the assistance of Animal Breeding Center of Iran for providing the data. We also thank Dr Guosheng Su from the University of Aarhus for helpful comments.

References

Aguilar, I, Misztal, I, Johnson, D, Legarra, A, Tsuruta, S and Lawlor, T (2010) Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93, 743–752.
Calus, M, Vandenplas, J, Ten Napel, J and Veerkamp, R (2016) Validation of simultaneous deregression of cow and bull breeding values and derivation of appropriate weights. Journal of Dairy Science 99, 6403–6419.
Christensen, OF and Lund, MS (2010) Genomic prediction when some animals are not genotyped. Genetics Selection Evolution 42, 1–8.
Forni, S, Aguilar, I and Misztal, I (2011) Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genetics Selection Evolution 43, 1–7.
Gao, H, Christensen, OF, Madsen, P, Nielsen, US, Zhang, Y, Lund, MS and Su, G (2012) Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population. Genetics Selection Evolution 44, 1–8.
Garrick, DJ, Taylor, JF and Fernando, RL (2009) Deregressing estimated breeding values and weighting information for genomic regression analyses. Genetics Selection Evolution 41, 1–8.
Goddard, M (1985) A method of comparing sires evaluated in different countries. Livestock Production Science 13, 321–331.
Goddard, M (2009) Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257.
Guo, G, Lund, M, Zhang, Y and Su, G (2010) Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables. Journal of Animal Breeding and Genetics 127, 423–432.
Hayes, BJ, Bowman, PJ, Chamberlain, AJ and Goddard, ME (2009) Invited review: genomic selection in dairy cattle: progress and challenges. Journal of Dairy Science 92, 433–443.
Jairath, L, Dekkers, J, Schaeffer, L, Liu, Z, Burnside, E and Kolstad, B (1998) Genetic evaluation for herd life in Canada. Journal of Dairy Science 81, 550–562.
Koivula, M, Strandén, I, Su, G and Mäntysaari, EA (2012) Different methods to calculate genomic predictions: comparisons of BLUP at the single nucleotide polymorphism level (SNP-BLUP), BLUP at the individual level (G-BLUP), and the one-step approach (H-BLUP). Journal of Dairy Science 95, 4065–4073.
Legarra, A, Aguilar, I and Misztal, I (2009) A relationship matrix including full pedigree and genomic information. Journal of Dairy Science 92, 4656–4663.
Misztal, I (2017) Studies on inflation of GEBV in single-step GBLUP for type. Interbull Bulletin 51.
Misztal, I, Tsuruta, S, Strabel, T, Auvray, B, Druet, T and Lee, D (2002) BLUPF90 and related programs (BGF90). In Proceedings of the 7th World Congress on Genetics Applied to Livestock Production, pp. 743–744.
Misztal, I, Aguilar, I, Johnson, D, Legarra, A, Tsuruta, S and Lawlor, T (2009) A unified approach to utilize phenotypic, full pedigree and genomic information for a genetic evaluation of Holstein final score. Interbull Bulletin 40, 240–240.
Misztal, I, Tsuruta, S, Aguilar, I, Legarra, A, VanRaden, P and Lawlor, T (2013) Methods to approximate reliabilities in single-step genomic evaluation. Journal of Dairy Science 96, 647–654.
Misztal, I, Lourenco, D and Legarra, A (2020) Current status of genomic evaluation. Journal of Animal Science 98, 101–115.
Mrode, RA (2014) Linear Models for the Prediction of Animal Breeding Values. CABI, Wallingford, UK.
Oliveira, HR, Silva, FF, Brito, L, Rocha Guarini, A, Jamrozik, J and Schenkel, F (2018) Comparing deregression methods for genomic prediction of test-day traits in dairy cattle. Journal of Animal Breeding and Genetics 135, 97–106.
Oliveira, HR, Lourenco, DAL, Masuda, Y, Misztal, I, Tsuruta, S, Jamrozik, J, Brito, LF, Silva, FF and Schenkel, FS (2019) Application of single-step genomic evaluation using multiple-trait random regression test-day models in dairy cattle. Journal of Dairy Science 102, 2365–2377.
Sargolzaei, M, Chesnais, JP and Schenkel, FS (2014) A new approach for efficient genotype imputation using information from relatives. BMC Genomics 15, 1–12.
Song, H, Li, L, Zhang, Q, Zhang, S and Ding, X (2018) Accuracy and bias of genomic prediction with different de-regression methods. Animal: An International Journal of Animal Bioscience 12, 1111–1117.
Tsuruta, S, Misztal, I, Aguilar, I and Lawlor, T (2011) Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins. Journal of Dairy Science 94, 4198–4204.
Vandenplas, J and Gengler, N (2015) Strategies for comparing and combining different genetic and genomic evaluations: a review. Livestock Science 181, 121–130.
VanRaden, PM (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91, 4414–4423.
VanRaden, P and Wiggans, G (1991) Derivation, calculation, and use of national animal model information. Journal of Dairy Science 74, 2737–2746.
VanRaden, P, Van Tassell, C, Wiggans, G, Sonstegard, T, Schnabel, R, Taylor, J and Schenkel, F (2009) Invited review: reliability of genomic predictions for North American Holstein bulls. Journal of Dairy Science 92, 16–24.
Vitezica, Z-G, Aguilar, I, Misztal, I and Legarra, A (2011) Bias in genomic predictions for populations under selection. Genetics Research 93, 357–366.