Introduction
Cotton (Gossypium spp.) is the leading fibre crop in the world, and secondarily an important source of edible oil and protein meals. The five largest producers of cottonseed oil from 1995 to 2003 (China, 27%; United States, 12%; former Soviet Union, 10%; India, 11%; Pakistan, 9%) accounted for 70% of global output. Cottonseed kernels contain 27.83–45.6% protein and 28.24–44.05% oil (Sun et al., 1987). In addition to flavour stability, cottonseed oil also has superior nutritive qualities; it has a 3:1 ratio of unsaturated to saturated fatty acids, which meets the recommendations of many health professionals. Cottonseed meal is used principally as a protein concentrate in feed for livestock (http://www.cottonseed.com/publications).
Protein and oil concentration, kernel index and kernel percentage in cotton are controlled by multiple genes (Singh et al., 1985; Dani and Kohel, 1989; Ye et al., 2003) and are strongly influenced by the environment (Kohel and Cherry, 1983; Singh et al., 1985; Ye et al., 2003). Seed traits may be simultaneously controlled by seed nuclear genes, cytoplasmic genes and maternal nuclear genes (Ye et al., 2003). Previous studies indicated significant negative correlation coefficients between oil content and protein content (Kohel and Cherry, 1983; Chen et al., 1986; Sun et al., 1987). Such factors may hinder progress in the improvement of these traits in conventional cotton breeding programmes.
Genetic mapping provides an essential tool to understand the genetic architecture of quantitative traits at the molecular level. DNA markers linked to quantitative trait loci (QTL) controlling seed protein content have been identified in soybean (Chung et al., 2003; Panthee et al., 2005), rice (Tan et al., 2001), barley (See et al., 2002) and field pea (Tar'an et al., 2004). DNA markers associated with loci controlling seed oil content or fatty acid composition have been identified in soybean (Kianian et al., 1999), rapeseed (Zhao et al., 2006), sunflower (Bert et al., 2003; Pérez-Vich et al., 2004), oilseed mustard (Gupta et al., 2004) and canola (Hu et al., 2006). However, no such results have been reported in cotton.
A major function of proteins in nutrition is to supply adequate amounts of required amino acids, which can be divided into two groups, essential and non-essential. Essential amino acids cannot be synthesized in animals, but play a crucial role in metabolic processes; they are lysine (Lys), histidine (His), leucine (Leu), isoleucine (Ile), valine (Val), methionine (Met), threonine (Thr), tryptophan (Trp) and phenylalanine (Phe) (D'Mello, 2003). Lysine is an important amino acid for humans and animals; cotton kernels contain on average 2.37% Lys (dry weight of kernel powder basis), higher than rice (2.15%) and lower than wheat (2.7%) (Chen et al., 1986).
Despite the central role of amino acids in animal nutrition, little information is available about the genetic control of amino acid content in cotton. Molecular analysis provides a good tool to understand the genetics of a trait in detail and to optimize its improvement through breeding. Such work has been conducted in corn (Wang and Larkins, 2001) and soybean (Panthee et al., 2006). However, there are no such reports for cotton.
The objective of the present study was to identify and map genomic regions associated with kernel index, kernel percentage, hull percentage, kernel oil percentage, kernel protein percentage and amino acid composition, to facilitate the selection of these traits in cotton trait introgression and breeding.
Materials and methods
Plant material and trait measurements
The mapping population of 140 BC1 plants was derived from the backcross [(TM-1 (G. hirsutum L.) × Hai7124 (G. barbadense L.)) × TM-1]. TM-1, a genetic standard accession of G. hirsutum, was kindly provided by Drs R.J. Kohel and J. Yu from the Southern Plains Agricultural Research Center, College Station, Texas. All BC1 plants were grown in the field during summer seasons and in a greenhouse during the winter at the Jiangpu Cotton Research Station of Nanjing Agricultural University (JCRSNAU), Nanjing, China, and were self-pollinated to produce BC1S1 seeds. Grafting was performed to maintain the BC1 mapping population and to provide enough BC1S1 seeds.
All BC1S1 plants and the two parents were grown at JCRSNAU in 2003 and 2004 under field conditions. The BC1S1 plots were planted in two fully randomized replications, with two rows per plot in 2003 and one row per plot in 2004; each row measured 5.5 m. Eight rows of parents were randomly planted among the BC1S1 plots. The row spacing and plant spacing were 0.8 m and 0.4 m, respectively, in both years.
Normally opened bolls were harvested, and their seeds ginned and acid-delinted. Seeds were dried at 38°C in a forced-air oven to equalize moisture contents among samples. They were then separated into seed hull and kernel to determine kernel index (KID), hull index (HID) and kernel percentage (KP). Kernel index and hull index were the weight (g) of 100 kernels and hulls, respectively. Assays for kernel oil percentage (OP) and protein (N × 6.25) percentage (PP) (on a dry weight of kernel powder basis) were made in accordance with the standard methods described in Ye et al. (2003). Kernels were ground into powder and dried to equilibrium at 38°C to equalize moisture contents for amino acid analysis. Amino acid content (% dry weight of kernel powder) was determined using a Biochrom30 amino acid analyser (Biochrom Ltd., Cambridge, UK), according to the manufacturer's instructions. In total, 17 amino acids, including threonine (Thr), valine (Val), methionine (Met), isoleucine (Ile), leucine (Leu), histidine (His), lysine (Lys), phenylalanine (Phe), tryptophan (Trp), aspartic acid (Asp), serine (Ser), glutamic acid (Glu), proline (Pro), glycine (Gly), alanine (Ala), cysteine (Cys) and arginine (Arg), were examined.
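The trait definitions above can be illustrated with a minimal sketch. KID, HID and the N × 6.25 protein conversion follow the text; the formula for KP is not stated in the source, so it is assumed here to be kernel weight as a percentage of kernel plus hull weight. The function names and the sample values are hypothetical.

```python
def kernel_traits(kernel_weight_g, hull_weight_g, n_seeds):
    """Return (KID, HID, KP) for a sample of n_seeds dissected seeds.

    KID and HID are weights (g) per 100 kernels or hulls, as in the text.
    KP is ASSUMED to be kernel weight as a percentage of kernel + hull weight.
    """
    if n_seeds <= 0:
        raise ValueError("n_seeds must be positive")
    total = kernel_weight_g + hull_weight_g
    if total <= 0:
        raise ValueError("sample weight must be positive")
    kid = 100.0 * kernel_weight_g / n_seeds
    hid = 100.0 * hull_weight_g / n_seeds
    kp = 100.0 * kernel_weight_g / total
    return kid, hid, kp


def protein_percentage(nitrogen_pct, factor=6.25):
    """Convert kernel nitrogen (% dry weight) to crude protein (N x 6.25)."""
    return nitrogen_pct * factor


# Hypothetical sample: 50 seeds giving 3.1 g of kernels and 2.4 g of hulls.
kid, hid, kp = kernel_traits(3.1, 2.4, 50)
print(f"KID={kid:.2f} g, HID={hid:.2f} g, KP={kp:.1f}%")
print(f"PP={protein_percentage(5.6):.2f}%")
```

Under this assumed definition, KP equals KID / (KID + HID) × 100, so the three traits are not independent of one another.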
Construction of the SSR linkage map
QTL mapping was conducted based on the cotton genetic map (Han et al., 2006). This linkage map was first constructed by Song et al. (2005) with the same TM-1/Hai7124//TM-1 BC1 population and enhanced with expressed sequence tag–simple sequence repeat (EST–SSR) technology by Han et al. (2004, 2006). The present map consists of 918 marker loci assigned to 32 linkage groups, covering 5136.5 cM with an average distance of 5.60 cM between adjacent markers (Song et al., 2005). A complete assignment of the cotton chromosomes was established by Wang et al. (2006) using translocation and bacterial artificial chromosome–fluorescence in situ hybridization (BAC–FISH) technology. Only segments of linkage groups associated with QTL detected in the present experiment are shown.
QTL mapping
QTL analyses were carried out with QTLNetwork-2.0 software (http://ibi.zju.edu.cn/software/qtlnetwork), using mixed-model based composite interval mapping (MCIM) with a 10 cM window size and a 1 cM walking speed. A 10 cM filtration window was used to decide whether two adjacent test-statistic peaks represented two distinct QTL. For all traits, 1000 permutations of the combined data from the two environments were used to calculate the critical F value at the 5% probability level. QTL effects were estimated by the Markov chain Monte Carlo (MCMC) method.
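The idea behind a permutation-derived critical F value can be sketched as follows. This is not the MCIM procedure implemented in QTLNetwork: it substitutes a simple single-marker one-way ANOVA for the mixed model and assumes backcross genotypes coded 0 (TM-1 homozygote) and 1 (heterozygote). All function names and the simulated data are hypothetical.

```python
import math
import random


def f_statistic(genotypes, phenotypes):
    """One-way ANOVA F for a marker with two genotype classes coded 0/1."""
    groups = {0: [], 1: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    n = len(groups[0]) + len(groups[1])
    if not groups[0] or not groups[1] or n <= 2:
        return 0.0  # marker not segregating or too few plants: no test
    grand = (sum(groups[0]) + sum(groups[1])) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand) ** 2
        ss_within += sum((y - mean) ** 2 for y in values)
    if ss_within == 0.0:
        return math.inf if ss_between > 0.0 else 0.0
    return ss_between / (ss_within / (n - 2))  # df = 1 and n - 2


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide critical F: the (1 - alpha) quantile of the maximum F
    over all markers after shuffling phenotypes among plants."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(f_statistic(m, shuffled) for m in markers))
    maxima.sort()
    k = min(n_perm - 1, max(0, math.ceil((1.0 - alpha) * n_perm) - 1))
    return maxima[k]


# Simulated example: 140 plants, 5 markers, a true effect at marker 0 only.
rng = random.Random(1)
n_plants = 140
markers = [[rng.randint(0, 1) for _ in range(n_plants)] for _ in range(5)]
phenotypes = [10.0 + 2.0 * g + rng.gauss(0.0, 1.0) for g in markers[0]]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
observed = [f_statistic(m, phenotypes) for m in markers]
print(f"critical F = {threshold:.2f}")
print("observed F:", [round(f, 2) for f in observed])
```

Taking the maximum statistic across markers in each permutation is what makes the threshold genome-wide rather than per-marker.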
Results
Phenotypic variation and trait correlation
Significant differences in KID, KP, HID, OP and PP between TM-1 and Hai7124 were detected in 2003 and 2004. On average, TM-1 had higher values of KID, KP and PP, and lower values of HID and OP, than Hai7124 (Table 1). Of the 17 amino acids examined, four essential and five non-essential amino acids differed significantly between the two parents. Except for Cys, for which the skewness exceeded 1.0, all traits were approximately normally distributed (skewness < 1.0) and were therefore suitable for QTL analysis (Table 1). All the traits showed transgressive segregation in both directions in the BC1S1 population.
t-values: *difference significant at α = 0.05; **difference significant at α = 0.01.
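The skewness screen applied to the trait distributions can be sketched as below. The source does not say which skewness estimator was used; the moment-based (population) coefficient is assumed here, and the function names and the 1.0 limit applied to the absolute value are illustrative.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (an assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0.0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def passes_skewness_screen(values, limit=1.0):
    """True when the absolute skewness is below the chosen limit."""
    return abs(skewness(values)) < limit


# Hypothetical trait values: one symmetric sample, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]
print(skewness(symmetric), passes_skewness_screen(symmetric))
print(skewness(right_tailed), passes_skewness_screen(right_tailed))
```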
Correlation coefficients between phenotypic traits, based on average data, are given in Table 2. Significant positive correlations were detected between KID and both HID and KP, and between KP and both PP and OP. PP was significantly negatively correlated with OP. No significant correlation was found between HID and the remaining traits. Positive correlations were detected between PP and all the amino acids.
*, **: Significant at 0.05, 0.01 level, respectively.
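As an illustration of how such coefficients and their significance can be obtained, the sketch below computes Pearson's r and the corresponding t statistic with n - 2 degrees of freedom. The source does not name the correlation method, so Pearson's coefficient is an assumption, and the trait values are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with t tables at n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical line means for protein (PP) and oil (OP) percentage.
pp = [38.1, 39.4, 40.2, 41.0, 42.3, 43.1]
op = [34.9, 34.1, 33.8, 33.0, 32.2, 31.5]
r = pearson_r(pp, op)
print(f"r = {r:.3f}, t = {t_statistic(r, len(pp)):.2f}")
```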
Single-locus analysis of quantitative trait loci for seed quality
Using joint analysis, 11 significant QTL for 10 seed quality traits were identified (Table 3; Fig. 1); none was detected for KID, HID, Cys or Val.
a Range means the support interval of QTL position.
b A is the additive genetic effect estimated at the testing point. Positive values mean that the TM-1 genotype has a positive effect on the trait. Negative values indicate that the heterozygote genotype has a positive effect on the trait.
c H2(a) is the proportion of phenotypic variation explained by the additive effect A.
On chromosome D12, a QTL for kernel percentage (KP), qKP-D12-1, was identified between an SSR marker BNL1227_180 and an SRAP marker DC1SA21_B, explaining 46.28% of the phenotypic variation (PV). The allele from Hai7124 increased KP. For kernel oil percentage (OP), a significant QTL, qOP-D8-1, was detected in a region of 9.2 cM between two SSR markers, BNL3860_190 and NAU1369_400, explaining 29.35% of PV. The TM-1 genotype increased OP. For kernel protein percentage (PP), a significant QTL, qPP-D9-1, was mapped between BNL1672_140 and BNL3031_190, explaining 22.25% of PV. The heterozygote genotype increased PP.
For aspartic acid (Asp), a significant QTL, qAsp-A11-1, was detected in a region of 7.6 cM on chromosome A11, explaining 22.12% of PV. The heterozygote genotype increased Asp percentage in the kernel. For serine (Ser), a significant QTL, qSer-A8-1, was identified between NAU1531_170 and NAU537_220, explaining 23.66% of PV. The allele from the G. hirsutum parent TM-1 increased kernel Ser percentage. Two significant QTL, qGly-A11-1 and qGly-A8-1, were detected for glycine (Gly). The former explained 16.97% of PV, while the latter explained 10.89% of PV. The positive alleles at qGly-A11-1 came from Hai7124 and the positive alleles at qGly-A8-1 from TM-1.
For isoleucine (Ile), a significant QTL, qIle-D3-1, was mapped in a region of 9.4 cM between NAU1028_225 and BNL359_180, explaining 25.88% of PV. The heterozygote genotype increased kernel Ile percentage. For leucine (Leu), a significant QTL was detected between SSR markers BNL3145_290 and NAU652_105 on chromosome D2, explaining 20.32% of PV. The heterozygote genotype increased kernel Leu percentage. For phenylalanine (Phe), a significant QTL, qPhe-A8-1, was detected between SSR markers BNL3792_225 and BNL3474_193 on chromosome A8, explaining 31.13% of PV. The positive allele came from TM-1. For arginine (Arg), a significant QTL was identified in a region of 10.1 cM between two SSR markers, NAU1190_205 and NAU797_170, on chromosome A5, explaining 18.21% of PV. The heterozygote genotype increased kernel Arg percentage.
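The direction of a QTL effect and the share of PV it explains can be approximated at a single flanking marker, as in the sketch below. This is a simplified stand-in for the MCIM estimates reported above, not a reproduction of them. Genotypes are assumed to be coded 0 for the TM-1 homozygote and 1 for the heterozygote, the sign convention mirrors the Table 3 footnote (positive when the TM-1 genotype has the higher mean), and the function name and data are hypothetical.

```python
def marker_effect(genotypes, phenotypes):
    """Return (effect, pct_pv) at one marker in a backcross.

    effect : mean of TM-1 homozygotes (0) minus mean of heterozygotes (1)
    pct_pv : between-genotype sum of squares as a percentage of the total
    """
    hom = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    het = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    if not hom or not het:
        raise ValueError("both genotype classes must be present")
    values = hom + het
    grand = sum(values) / len(values)
    mean_hom = sum(hom) / len(hom)
    mean_het = sum(het) / len(het)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = (len(hom) * (mean_hom - grand) ** 2
                  + len(het) * (mean_het - grand) ** 2)
    pct_pv = 100.0 * ss_between / ss_total if ss_total > 0.0 else 0.0
    return mean_hom - mean_het, pct_pv


# Hypothetical kernel oil percentages for eight BC1S1 lines.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1]
oil_pct = [35.2, 34.8, 36.0, 35.5, 33.1, 33.9, 32.8, 34.0]
effect, pct_pv = marker_effect(genotypes, oil_pct)
print(f"effect = {effect:+.2f}, PV explained = {pct_pv:.1f}%")
```

A positive value in this sketch would correspond to the TM-1 genotype increasing the trait, and a negative value to the heterozygote increasing it.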
Epistatic quantitative trait loci for amino acid composition
Two epistatic interactions, one for Cys and one for Leu, were detected between the same two marker intervals, CIR099_80–CIR099_90 on D13 and Y1278B–CIR156_150 on D11. The interaction for Cys explained 9.55% of PV, while that for Leu explained 4.43% of PV (Table 4, Fig. 1). This indicates that the two regions on D11 and D13 were associated with Cys and Leu simultaneously, which agrees with the significant positive correlation between Cys and Leu. Such effects may have been due to linkage of multiple QTL, or to pleiotropic effects of a single gene on multiple traits (Jiang et al., 2000).
a Chr is the chromosome number of the points tested.
b Ranges are the position support intervals of the two QTL.
c H2(aa) is the proportion of phenotypic variation explained by the epistatic effect AA.
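A two-locus interaction of the kind reported in Table 4 can be illustrated with the four genotype-class means of a backcross. The sketch assumes each locus is coded 0 (TM-1 homozygote) or 1 (heterozygote), which corresponds to a model y = mu + a1·x1 + a2·x2 + aa·x1·x2 with x = ±1; the unweighted contrast of cell means below estimates aa only under that assumed coding and is not the mixed-model estimate produced by QTLNetwork. Names and data are hypothetical.

```python
def epistasis_contrast(geno_a, geno_b, phenotypes):
    """Interaction contrast (m00 - m01 - m10 + m11) / 4 from cell means."""
    cells = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for a, b, y in zip(geno_a, geno_b, phenotypes):
        cells[(a, b)].append(y)
    if any(not values for values in cells.values()):
        raise ValueError("every two-locus genotype class needs at least one plant")
    mean = {key: sum(values) / len(values) for key, values in cells.items()}
    return (mean[(0, 0)] - mean[(0, 1)] - mean[(1, 0)] + mean[(1, 1)]) / 4.0


# Hypothetical data: purely additive loci give a zero contrast,
# whereas matching genotypes at both loci raising the trait give a positive one.
locus_d13 = [0, 0, 1, 1]
locus_d11 = [0, 1, 0, 1]
print(epistasis_contrast(locus_d13, locus_d11, [1.0, 2.0, 3.0, 4.0]))
print(epistasis_contrast(locus_d13, locus_d11, [10.0, 8.0, 8.0, 10.0]))
```

A non-zero contrast at the same pair of intervals for two traits is consistent with either linked QTL or pleiotropy; this calculation cannot tell the two apart.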
Discussion
The complex genetic basis of cottonseed quality traits has limited the progress of conventional breeding methods. Additionally, little attention has been paid to breeding lines of high nutritional quality. Molecular markers linked to QTL controlling seed oil content (Diers et al., 1992; Mansur et al., 1993; Goldman et al., 1994; Alrefai et al., 1995; Tanhuanpaa et al., 1995a, b; Brummer et al., 1997; Kianian et al., 1999), seed protein content (Chung et al., 2003; Panthee et al., 2005) and amino acids (Wang and Larkins, 2001; Panthee et al., 2006) have been identified in soybean, rapeseed, maize and oat. This is the first report on QTL mapping of cottonseed nutritional quality traits based on a new SSR genetic map using a G. hirsutum × G. barbadense population. Eleven QTL and two epistatic interaction effects were identified for 11 seed quality traits based on 2 years of phenotypic data. Both G. hirsutum and G. barbadense contributed some alleles that increase, and others that decrease, KP, PP, OP and amino acid contents, suggesting that there exists substantial opportunity to breed cotton varieties that transgress the current range of seed phenotypes. The molecular markers identified herein provide a useful base for genetic manipulation of these seed quality traits in breeding programmes.
Cottonseed kernels contain a high percentage of protein, which is mainly water soluble (Zhang et al., 1998). Zhu et al. (1995) showed that cottonseed protein is mainly composed of subunits of 46, 41, 30, 27, 23, 16 and 14 kDa, and that protein subunits of 34 and 18 kDa are specific for varieties with the A and AD genomes, respectively. Significant variation in protein components and in the relative content of the protein subunits was also reported among varieties. Electrophoresis of cottonseed ethanol-soluble proteins detected three special bands distinctive to all eight G. hirsutum varieties studied and three special bands distinctive to all four G. barbadense varieties (Gao et al., 2003). These studies indicate a high polymorphism in cottonseed protein components. In this paper, only one significant QTL (qPP-D9-1) for total protein percentage was identified (Table 3). This may be because total protein content did not reflect large variations in protein components between the two parents. The same reason may partly explain why only one QTL for kernel oil content was detected. The original purpose of breeding a G. hirsutum × G. barbadense population was to construct a saturated molecular genetic map of tetraploid cotton and to integrate the high yield of G. hirsutum and good fibre quality of G. barbadense; thus, little attention was paid to seed quality variation in selecting parental varieties, which may have resulted in low polymorphism at loci controlling seed quality traits.
The nutritional quality of protein is determined by its amino acid composition. Panthee et al. (2006) detected genomic regions associated with amino acid composition in soybean, and found that most of the amino acid QTL had been reported previously as protein QTL in one or more populations. In the present study, a single QTL and/or epistatic QTL was associated with three essential amino acids (Ile, Leu and Phe), as well as five non-essential amino acids (Ser, Asp, Gly, Arg and Cys) in cotton seeds (Tables 3 and 4). These QTL may be useful for obtaining a balanced amino acid composition of seed protein by breeding.
Only QTL showing associations in two environments (two years) are reported here, and most of them explained more than 10% of PV, indicating that they are major QTL (Falconer and Mackay, 1996). However, these putative QTL need to be validated in more environments and populations before they are used in marker-assisted selection.
Acknowledgements
This program was supported financially by the National High-tech Program (2004AA211172), National Science Foundation in China (30070483, 30270806), the Key Project of Chinese Ministry of Education (10418), Postdoctoral Science Foundation of China (2005037736), Jiangsu High-tech Project (BG2002306) and IAEA (12846).