Buffaloes are animals of great socioeconomic importance, especially in developing countries because of their contribution to meat and milk production and labor. The adaptability of these animals to the Brazilian climate has led to a significant increase in the buffalo herd in the 1980s. At present, the Brazilian buffalo herd comprises three million animals and there are four breeds officially recognized by the Brazilian Buffalo Association: Mediterranean, Murrah, Jafarabadi (Bubalus bubalis, 2n = 50), and Carabao (Bubalus carabensis, 2n = 48). In Brazil, the mozzarella cheese has gained popularity in local cooking and demand is high on the national market. Buffalo milk contains high levels of fat, protein and total solids, a fact that results in greater yields of milk products and consequent higher remuneration of producers.
Genome-wide association studies (GWAS) permit the investigation of the genetic architecture of traits through the identification of single nucleotide polymorphisms (SNPs), exploring the genome regions where this SNP is found and thus identifying genes or quantitative trait loci (QTLs) that are associated with the phenotypic expression of the trait studied (Utsunomiya et al. Reference Utsunomiya, Carmo, Neves, Carvalheiro, Matos, Zavarez, Ito, O'Brien, Solkner, Porto-Neto, Schenkel, McEwan, Cole, da Silva, Van Tassell, Sonstegard and Garcia2013). Studies that permit the identification of genes and/or chromosome regions related to the phenotypic expression of milk production traits and their components in buffalo cows should enable the construction of genome libraries for this species and thus contribute to a better understanding of the genetic control of these traits. In addition, such studies could lead to the development of a smaller specific panel for buffaloes, which would render this genomic technology more accessible for the producer.
Since only some individuals can be genotyped because of the high cost, Wang et al. (Reference Wang, Misztal, Aguilar, Legarra and Muir2012) proposed single-step GWAS (ssGWAS). This method combines all available information (phenotype, genotype, and pedigree) in a single step, in addition to the phenotypic data of animals that were not genotyped. The use of a model that takes into consideration all available information should result in gains since a more representative sample of the population is analyzed.
The ssGWAS can be an important tool for the detection of genomic regions considering a larger number of animals than the traditional GWAS, which contributes to the increase of the power of the analyses, besides allowing the adjustment of the same models used for complex traits genetic evaluations. Thus, due to the lack of reports using this methodology in buffalo dairy traits, the objective of the present study was to identify regions in the buffalo genome that could be related to milk production and milk quality traits in Brazilian buffalo cows using information from genotyped and non-genotyped individuals.
Material and methods
Phenotypic and genotypic data
Data of milk yield adjusted to 305 d (MY) and milk fat (%F) and protein (%P) percentage of cows from 12 Brazilian Murrah buffalo herds were used. The database contained 10 507 lactations (average 10 observations within each lactation) of 3355 buffalo cows monitored monthly between 1995 and 2014. The relationship matrix contained 15 495 animals. The contemporary groups were formed by the concatenation of herd and year and season of calving. Descriptive statistics of studied traits are detailed in the online Supplementary Table S1.
The animals were genotyped using the 90 K Axiom® Buffalo Genotyping panel (Affymetrix). Quality control was performed adopting the following exclusion criteria for SNPs: minor allele frequency (MAF) <0·05, call rate <0·95, and p-value for Hardy–Weinberg equilibrium <10−6. Coincident SNPs were also eliminated. The criterion for exclusion of the sample was a call rate <0·90. After application of the criteria adopted, the number of genotyped animals was 322 (289 females and 33 males) and 45 330 SNPs were considered in the present study.
Genome-wide association analysis using information from genotyped and non-genotyped animals (ssGWAS)
The genomic estimated breeding values (GEBVs) were obtained with the BLUPF90 program (Misztal et al. Reference Misztal, Aguilar, Legarra and Vitezica2002), applying a single-trait repeatability animal model that considered the contemporary groups as fixed effects, age of cow at calving as covariate (linear and quadratic), and additive genetic, permanent environmental (repeated lactations) and residual effects as random effects. The general model can be described as:
where y is a vector of phenotype values; b, a and pe are the solution vectors of fixed, additive genetic (GEBV) and permanent environmental, respectively; X, Z, and W are incidence matrices relating phenotypes with solution vectors b, a and m, respectively; and e is a vector of residuals assuming e ~ N (0, I$\sigma _e^2 $). In this analyses we assumed pe ~ N(0, I$\sigma _{pe}^2 $) and a ~ N(0, H$\sigma _a^2 $), where H is a matrix composed by A (pedigree relationship matrix) and G (the genomic relationship matrix of 322 individuals). The inverse of H matrix (H −1) is constructed according Aguilar et al. (Reference Aguilar, Misztal, Johnson, Legarra, Tsuruta and Lawlor2010) as:
where G −1 is the inverse of the genomic relationship matrix and $A_{22}^{-1} $ is the inverse of the numerator relationship matrix of genotyped animals $A_{22}^{-1} $. The G matrix is calculated according to (VanRaden, Reference VanRaden2008).
The SNP effects were obtained from the GEBVs (a) of genotyped animals in an iterative manner as described by Wang et al. (Reference Wang, Misztal, Aguilar, Legarra and Muir2012). In the present study, two iterations were performed to estimate the SNP effects. In the first iteration, weighting factors equal to one were assumed, which were calculated as a function of the squared effects of the SNPs and allele frequencies and used in the second iteration.
The variance of each SNP was calculated according to Zhang et al. (Reference Zhang, Liu, Ding, Bijma, de Koning and Zhang2010) and percentage of genetic variance explained by each SNP was calculated as described by Wang et al. (Reference Wang, Misztal, Aguilar, Legarra, Fernando, Vitezica, Okimoto, Wing, Hawken and Muir2014). The 10 SNPs explaining the highest percentage of additive genetic variance were selected for the investigation of genes.
Prospecting genes based on the selected SNPs
A chip specific for buffaloes was used in the ssGWAS analyses. Although the SNPs included in the chip were selected based on buffalo sequence data, the position of the markers and annotated genes were obtained using the cattle genome as a reference. Thus, the UMD 3·1 bovine genome sequence assembly (Zimin et al. Reference Zimin, Delcher, Florea, Kelley, Schatz, Puiu, Hanrahan, Pertea, Van Tassell, Sonstegard, Marçais, Roberts, Subramanian, Yorke and Salzberg2009) was used for prospecting genes next to the SNPs that explained the highest proportion of additive genetic variance. The SNPs were attributed to the genes if they were located in the genome sequence of the gene or within 20 kb from the 5’ and 3’ end of the first and last exon, respectively.
Further details of the analytical methods are provided in the online Supplementary File.
Results and discussion
The present results revealed the most relevant SNPs in terms of the proportions of genetic variance explained for the three traits studied (Fig. 1). The 10 SNPs of largest effect for MY, %F and %P explained 7·46, 9·94 and 6·57% of the genetic variance, respectively (Table 1). This small participation of SNPs in the genetic variation was also verified by El-Halawany et al. (Reference El-Halawany, Abdel-Shafy, Shawky, Abdel-Latif, Al-Tohamy and El-Moneim2017), who stated that only 23% of the genetic variation of milk yield can be attributed to the markers used.
Chromosome: UMD 3·1 Bos taurus
In this study, genomic regions of interest were identified on 7, 6 and 8 chromosomes for MY, FP and FP, respectively (Table 1). In other studies with buffaloes, a lower number of chromosomes with regions of interest were observed. Liu et al. (Reference Liu, Liang, Camanille, Plastow, Zhang, Wang, Salzano, Gasparrini, Cassandro and Yang2018) describe important SNPs on the BTA3 chromosome in Mediterranean buffalo for protein percentage, and El-Halawany et al. (Reference El-Halawany, Abdel-Shafy, Shawky, Abdel-Latif, Al-Tohamy and El-Moneim2017) describe milk yield-related SNPs on chromosomes BTA1, BTA5, BTA6 and BTA27.
The SNPs that explained the highest proportions of additive genetic variance in the traits are not the same as those reported by Iamartino et al. (Reference Iamartino, Williams, Sonstegard, Reecy, Van Tassell, Nicollazi, Biffani, Biscarini, Schroeder, de Oliveira, Colleta, Garcia, Ali, Ramunno, Pasquariello, Drummond, Bastianetto, Fritz and Knoltes2013) who studied milk production of buffaloes in Italy, or by de Camargo et al. (Reference De Camargo, Aspilcueta-Borquis, Fortes, Porto-Neto, Cardoso, Santos, Lehnert, Reverter, Moore and Tonhati2015) who used the same dataset. These divergent results can be attributed to the use of different statistical methods, which resulted in the identification of different regions in the genome and of new candidate genes related to the traits analyzed in the present study.
For the milk production trait, three genes of the same family were identified (ADAMTS12, ADAMTS8 and ADAMTS15). These genes participate in the process of angiogenesis (Kumar et al. Reference Kumar, Rao and Ge2012). The relationship of angiogenesis with milk production presumably relates to supply of precursors. Thus, increased vascularization of the alveoli of the udder is potentially associated with higher milk yield.
Another gene identified for %F is the ABCC4 gene. This gene has been associated with the number of services per conception in buffaloes (De Camargo et al. Reference De Camargo, Aspilcueta-Borquis, Fortes, Porto-Neto, Cardoso, Santos, Lehnert, Reverter, Moore and Tonhati2015), and exerts effects on both productive and reproductive traits.
The BDNF gene was identified for %P. This gene has already been associated with the same trait in cattle (Zielke et al. Reference Zielke, Bortfeldt, Tetens and Brockmann2011), indicating that some genes can act in the same order of importance in different species.
Conclusion
The results of ssGWAS revealed genomic regions related to milk yield and milk fat and protein percentage in buffalo cows. The study also showed that these chromosome regions explaining the highest proportion of additive genetic variance harbor candidate genes with biological functions that could be directly related to the traits studied.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0022029918000766.
The authors thank the financial support offered by grant 2013/24427-3 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, São Paulo, SP, Brazil) and National Council for Scientific and Technological Development (CNPq).