Introduction
Soybean [Glycine max (L.) Merr.] belongs to the family Leguminosae, sub-family Papilionatae, genus Glycine and subgenus Soja (Hymowitz, 2004). As a food crop, it has high nutritional and health value: its seed is rich in protein (40%) and oil (20%), and contains appreciable amounts of saponin, isoflavones and phospholipids. With its versatile end-uses, soybean today is grown all over the world for food, feed, oil, industrial materials and bio-energy. The global area planted to the crop has increased markedly over the past 40 years. FAO statistics (http://faostat.fao.org) show, for example, that over the period 1970–2008 the planting area in China rose from 8.0 to 9.1 Mha and total production from 8.8 to 15.6 Mt. Over the same period, the world planting area grew from 29.5 to 96.9 Mha, and production from 43.7 to 231.0 Mt (Fig. 1).
Soybean was domesticated in China about 4500 years ago, during the ancient Huangdi period. Its earliest recorded mention in the literature occurs in ‘The Book of Songs’, which notes that its maturity is rather variable. During the Western Jin Dynasty (265–316 CE), Guo Yigong in his book ‘Guang Zhi’ ascribed names to particular soybean cultivars. Various seed-coat colours (white, yellow and black) were referred to in the time of the Yuan Dynasty (1279–1368 CE). Wild soybean was first recorded in the ‘Herbal for Relief of Famines’, written by Zhu Su in 1406 CE (Ming Dynasty). By the late 18th century (Qing Dynasty), Cheng Yaotian in his book ‘Jiu Gu Kao’ was able to describe three types of soybean grown in southern China, namely the spring type (Liuyuehuang or ‘June yellow’), the summer type (Bayuebai or ‘August white’) and the autumn type (Dongdou or ‘winter bean’). Soybean landraces were transported to Japan, Indonesia, The Philippines and Vietnam between the first century CE and the Age of Discovery (15th–16th centuries) (Hymowitz and Newell, 1980), and were later introduced to Europe and America.
Chinese soybean germplasm has played a key role not only in domestic but also in world soybean breeding and production. Since World War II, the level of soybean production has increased greatly worldwide. The pedigree of many high-yielding cultivars can be traced back to Chinese germplasm. It has been estimated that 50% of the nuclear DNA and 83% of the cytoplasmic DNA of northern US cultivars originated in China (Gizlice et al., 1994). Soybean production began in Brazil in 1960, and the crop has since become a major cash crop there, to the extent that Brazil now ranks as the second largest producer in the world. The 69 cultivars used in Brazil are thought to trace back to just 26 ancestors, of which 11 contributed 89% of their genetic make-up (Hiromoto and Vello, 1986). Since six of the key progenitors of US soybean germplasm also feature in the ancestry of the Brazilian cultivars, these two major gene pools are closely related to one another. Similarly, the soybean output of Argentina is based largely on US and Brazilian material, and is thus derived indirectly from Chinese germplasm. Chinese germplasm has also contributed significantly to cultivar development in Korea, Japan, Europe and Australia.
Speciation and distribution in the genus Glycine
The primary germplasm pool of the genus Glycine is represented by the subgenus Soja (Harlan and de Wet, 1971; Hymowitz, 2004), which includes the two annual species Glycine soja Sieb. & Zucc. (wild soybean) and G. max (L.) Merr. (cultivated soybean). G. soja is endemic to China, far eastern Russia, Japan and Korea, but 90% of its geographic range lies within China. In contrast, G. max is distributed very widely (Hymowitz, 2004). Although both are predominantly self-pollinating species, outcrossing is more frequent in G. soja (2.4–19.0%, Kiang et al., 1992; Fujita et al., 1997) than in G. max (1–2.5%, Chiang and Kiang, 1987; Ahrent and Caviness, 1994). Chromosome pairing at pachytene has demonstrated that the genomes of the two species are highly homologous to one another (Singh and Hymowitz, 1998). The mode of inheritance for most traits in interspecific G. soja × G. max crosses is similar to that found in intraspecific crosses within G. max. Natural hybridization between the wild and the cultivated soybean in an experimental situation can reach 6% per maternal plant, but on average remains below 1% (Nakayama and Yamaguchi, 2002).
The distribution of G. soja within China has been thoroughly surveyed. Its northern limit runs from Yixiken Tahe to Mohe County (Heilongjiang Province) and its southern limit from Xiangzhou County (Guangxi Province) to Yingde County (Guangdong Province); it reaches Fuyuan County (Heilongjiang Province) in the east and the Shangchayu and Xiachayu regions of Tibet in the west. In north-eastern China, the upper altitude bound for the species is 1300 m asl, rising to 1700 m asl in the Yellow River and Yangzi River valleys, and 2250 m asl in Tibet. Its highest recorded location is 2650 m asl in Ninglang County (Yunnan Province). With the exception of Qinghai, Xinjiang and Hainan, this species is present in all the Chinese provinces (Fig. 2). G. max, in contrast, is grown all over the country (Fig. 3), and is planted throughout the year, depending on the local cropping system. Chinese soybean cultivars are classified as spring types (Ssp), summer types (Ssu) or autumn types (Sau). Wang (1987) devised a cultivar classification system for 22,595 soybean accessions, based on a combination of these three planting types with maturity time, seed-coat colour and 100-seed weight; this produced a set of 403 groups, somewhat fewer than the theoretical maximum of 480 (Zhou et al., 1998).
The subgenus Glycine includes 23 perennial species. Twenty of these are endemic to Australia; the exceptions are Glycine dolichocarpa Tateishi & Ohashi, Glycine tabacina (Labill.) Benth. and Glycine tomentella Hayata. The first of these has been collected in Taiwan in China (Hsieh et al., 2001). G. tabacina and G. tomentella are also found in Fujian Province and in Taiwan (Hsieh et al., 2001; Gao et al., 2002), in areas which do not overlap with the range of G. soja. The perennial species are highly variable in terms of their morphology, chromosome number and sequence variation, and are adapted to a diverse set of climatic and edaphic conditions (Hymowitz, 2004). Wide crosses can be made between G. tomentella and G. max (Ladizinsky et al., 1979; Chung and Kim, 1990) or G. soja (Hymowitz et al., 1998), and provide a potential means of enhancing genetic diversity. The lack of any geographical overlap within China between the perennial species and G. soja presents a problem in trying to explain the origin of the annual type.
Germplasm collection and conservation
Qiu and Chang (2010) have provided a detailed overview of the history of soybean germplasm collection in China. Records show that diversity in China was already substantial before the 19th century. Local chronicles name a number of landraces, notably Ciguqing, Dengxifeng, Niutabian, Shanzibai, Houzimao, Hebaodou, Suidaohuang and Mashizi. Germplasm collection was initiated in China in the early 20th century. The Gongzhuling Agricultural Experimental Station (Jilin Province) was established in 1913, and its researchers collected and evaluated germplasm from north-eastern China. Meanwhile, in southern China, Professor Shou Wang, based at Jinling University (Nanjing), was also collecting germplasm in the 1920s, and some of this collection was evaluated by the pioneer soybean breeder Professor Jinlin Wang. The American plant explorer P. H. Dorsett was the first foreigner to collect Chinese soybean accessions systematically, and between 1924 and 1926 managed to dispatch ~1500 accessions to the US. This was followed by a joint collection with W.J. Morse between 1929 and 1931 in Japan, Korea and China, which gathered some 4500 further accessions (Hymowitz, 1984). Although only a small proportion of the Morse–Dorsett collection has survived, it includes lines later selected as cultivars, among them Richland, all of which have been important parents in the US soybean breeding programme. In 1929, N.I. Vavilov investigated the distribution of soybean in China, Japan and North Korea, and assembled a small collection of cultivated and wild types (Kurlovich et al., 2000). As a result, the Chinese soybean spread all over the world, and the number of G. max accessions currently conserved has reached at least 170,000 across 70 countries (Nelson, 2009a, b). The Asian G. soja collection remains relatively small, but is larger than that of the combined 20 perennial species.
The largest single collection of soybean is curated by China. Of its 31,575 accessions, 18,780 are local landraces, 2370 are local breeding lines, 1500 are modern Chinese cultivars, 2156 are cultivars bred overseas and 6644 are G. soja. The three perennial Glycine spp. present in China are represented by 125 accessions. Thus, most of this collection is native to China (Figs 2 and 3).
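The composition figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are those given in the text, while the decision to treat every category other than "cultivars bred overseas" as native material is an assumption made here for the calculation. Under that reading, the listed categories sum to the stated total once the 125 perennial accessions are included.

```python
# Accession counts as quoted in the text for the Chinese national collection.
chinese_collection = {
    "local landraces": 18780,
    "local breeding lines": 2370,
    "modern Chinese cultivars": 1500,
    "cultivars bred overseas": 2156,
    "G. soja": 6644,
    "perennial Glycine spp.": 125,
}


def composition(counts):
    """Return (total, {category: percentage share}) for a dict of counts."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = composition(chinese_collection)

# Assumption: everything except overseas-bred cultivars counts as native.
native = total - chinese_collection["cultivars bred overseas"]
native_share = 100.0 * native / total

print(f"total accessions: {total}")
print(f"native share: {native_share:.1f}%")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The same helper can be applied to any other collection for which category counts are available.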
Four strategies, namely long-term, medium-term and replicated storage, along with in situ conservation, have been applied to conserve Chinese soybean germplasm. The long-term and medium-term storage facilities are located at the National Crop Genebank at the Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China. In long-term storage, germplasm is held at −18°C and 50 ± 7% relative humidity, while the medium-term material is stored at −4 to 0°C. Each accession in medium-term storage is regenerated in the field every 5–10 years to ensure viability, as well as to multiply seed to meet demand. The replicated storage option requires that each accession is duplicated and conserved at the National Germplasm Storage Facility in Qinghai. In situ conservation is primarily applied to G. soja, since its populations typically show high levels of genetic heterogeneity. The more than 40 in situ conservation sites are located in 15 provinces and regions. The populations are well differentiated from one another (Zhao et al., 2009). Their genetic diversity is generally assessed by genotyping ~40 individuals at >20 marker loci (Guan et al., 2006; Zhao et al., 2006; Zhu et al., 2006), a strategy which is designed to capture at least 90% of the total genetic diversity present.
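To give a feel for why a sample of about 40 individuals can capture most of the variation in a population, the sketch below applies a standard sampling calculation: the probability that an allele at frequency p appears at least once among k independently sampled gene copies is 1 − (1 − p)^k. This is not the procedure used in the cited studies. The function name, the default of one independent gene copy per individual (a conservative choice for a predominantly selfing species) and the example frequencies are all assumptions made for illustration.

```python
def capture_probability(p, n_individuals, copies_per_individual=1):
    """Probability that an allele at frequency p is sampled at least once.

    Assumes gene copies are drawn independently from the population.
    copies_per_individual=1 treats each (largely homozygous) selfed plant
    as one independent draw; use 2 for a random-mating diploid.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_individuals < 0 or copies_per_individual < 1:
        raise ValueError("sample sizes must be non-negative")
    k = n_individuals * copies_per_individual
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical allele frequencies, chosen only to show the trend.
    for freq in (0.01, 0.05, 0.10):
        prob = capture_probability(freq, 40)
        print(f"allele frequency {freq:.2f}: capture probability {prob:.3f}")
```

Under these assumptions, alleles of moderate frequency are very likely to be sampled, whereas rare alleles are often missed, which is why sample size and the number of loci are set together.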
The second largest soybean collection in the world is curated by the USDA. In contrast to the Chinese collection, the bulk of US accessions (~93%) represent introductions from elsewhere. The current USDA collection comprises 19,557 G. max accessions, derived from 87 countries, 1181 G. soja accessions and 1038 representatives of the 20 perennial species. Most of the G. max accessions originate from China, Japan or Korea, although a serious effort has also been made to collect materials from Australia, Vietnam, India, Indonesia, Germany, Moldova, France, Africa and South America. A number of smaller collections are maintained by Brazil, Russia, South Korea, India, Thailand and Japan. Redundancy in the global collections may be as high as 70% (Nelson, 2009a, b).
Germplasm evaluation
Phenotyping diversity and gene discovery
The seed (coat colour, cotyledon colour, hilum colour, seed shape and 100-seed weight) and plant (flower colour, pubescence colour, leaf shape, plant height and podding habit) characteristics, along with the maturity time and contents of seed protein and oil of the accessions conserved under long-term storage, have been recorded in various germplasm catalogues. Full descriptions of released cultivars, including coloured images of plant and seed, have been published in three series of the Chinese Soybean Cultivar Catalogue (Wang, 1982; Chang and Sun, 1991; Chang et al., 1996; Qiu et al., 2007a). A set of 132 defined traits, classified into six categories, has been integrated into the system (Qiu and Chang, 2006). However, the G. soja accessions have only been evaluated with respect to their morphological traits, with a sample tested for various seed quality characteristics (Institute of Crop Germplasm Resource, Chinese Academy of Agricultural Sciences, 1990).
A large-scale evaluation of 17 traits was carried out on >20,000 conserved accessions (Qiu et al., 2002), but none of these traits was scored in every accession. The evaluation rate differed from trait to trait, averaging 35% (Table 1). The seed protein content of 21,050 cultivated (Qiu et al., 2002) and 6115 wild accessions (Wang et al., 1998) was assessed. Among the cultivated soybean, protein content ranged from 29.3 to 52.9% (mean 44.31%), while among the wild soybean, the range was 29.0–55.7% (mean 45.4%). Thus, there was an indication that the protein content of the wild soybean tended to be superior to that of the cultivated soybean. The mean oil content of cultivated soybeans across the whole country was 17.2%, which is 6.2 percentage points more than is present in wild soybean seed; the proportion of oleic acid was 23.2% (7.7 points more than in the wild soybean), that of linoleic acid was 53.5% (2.6 points less than in the wild soybean), while that of linolenic acid was 8.0% (4.2 points less than in the wild soybean). A screen of over 10,000 accessions from various production regions in China led to the identification of 144, 318 and 11 accessions as resistance sources to soybean cyst nematode (SCN) races 1, 3 and 4, respectively (Coordinative Group of Evaluation of SCN, 1993). A further screen of 9729 accessions for rust resistance produced no immune or highly resistant material, and detected just 83 accessions having a medium level of resistance (Tan et al., 1997; Shan et al., 2000). Shan et al. (2008) have, however, identified a highly resistant accession of G. soja.
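The oil and fatty acid comparisons above are stated as cultivated means plus differences from the wild soybean, so the implied wild means can be recovered by subtraction. The short sketch below does this, assuming that the quoted differences are absolute percentage points rather than relative changes; the variable names are illustrative.

```python
# Cultivated means (%) and their differences from wild soybean, as quoted
# in the text. Assumption: differences are absolute percentage points.
cultivated = {"oil": 17.2, "oleic acid": 23.2,
              "linoleic acid": 53.5, "linolenic acid": 8.0}
difference_vs_wild = {"oil": 6.2, "oleic acid": 7.7,
                      "linoleic acid": -2.6, "linolenic acid": -4.2}

# Implied wild mean = cultivated mean minus (cultivated - wild) difference.
implied_wild = {
    trait: round(cultivated[trait] - difference_vs_wild[trait], 1)
    for trait in cultivated
}

for trait, value in implied_wild.items():
    print(f"{trait}: cultivated {cultivated[trait]}%, implied wild {value}%")
```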
A number of screens have been made on the basis of individual trait performance. In south China, the protein content of soybean seed ranged from 30 to 53%. Ten cultivars with stable high protein content (>49%) were selected as elite cultivars from among 1656 cultivars (Huang, 1989). Two accessions, namely N23547 and N23697, were selected from a screen of 265 cultivated and 141 wild soybeans on the basis of their high oil, high oleic acid, high linoleic acid and low linolenic acid content (Zheng et al., 2008). A further analysis of protein content, oil content and soluble protein content among 264 cultivars released since 2003 and 190 bred before 1995 (Li et al., 2004) showed that the protein content of the earlier cultivars was similar to that of the more recent releases. However, oil content has changed over time, with high oil content cultivars predominating since 2003. Qiu et al. (2007b) have also described the use of genetic mapping, based on populations created from elite lines, to define the inheritance of a number of key seed quality and other agronomic traits. In addition to the 17 core traits described above, sub-collections of the material have been evaluated for traits of particular interest to genetic research and breeding (Table 2). For example, in a survey of 78 perennial accessions, Qu et al. (2008) identified G. tabacina leaves as synthesizing high quantities of daidzein and genistein, both of which are of pharmacological interest. The soybean aphid (Aphis glycines Matsumura) is a native of eastern Asia, but by 2000 had appeared in the US, prompting a high level of interest in securing sources of resistance. A small number of potentially resistant accessions were eventually identified in China (Fan, 1988; Sun et al., 1991).
The detailed list of sources of biotic and abiotic stress tolerances, along with certain seed quality traits, is given in Supplementary Tables S1–S3, available online only at http://journals.cambridge.org.
Assessment of genetic diversity based on molecular markers
Although diversity can be evaluated using various types of phenotypic data, genotypic variation has proven particularly useful for the assessment of levels of genetic diversity in germplasm and for the identification of rare alleles. Using a combined phenotypic and genotypic approach, various attempts have been made to define the intra- and interspecific levels of diversity present in G. max and G. soja (Powell et al., 1996; Wu et al., 2001; Kuroda et al., 2006; Li et al., 2010). Genetic relationships between accessions have also been established by classifying nucleotide variation in the chloroplast genome (Sakai et al., 2003). Some 6.8% of G. soja accessions show evidence of introgression from G. max (Kuroda et al., 2006). Overall, about 20% of accessions are not easily classifiable into one or other of the two species, and thus suggest a history of extensive post-domestication hybridization between G. max and G. soja (Li et al., 2010).
Genotypic analysis has also succeeded in identifying a degree of geographical differentiation among both wild and cultivated soybean populations, even where the overall level of genetic variance (particularly in the G. soja populations) is relatively low. The gene pools of Japanese, Korean, Russian and Chinese G. soja appear to be rather distinct from one another (Abe et al., 2003; Wang and Takahata, 2007; Li et al., 2010). Sequence variation with respect to the nuclear, mitochondrial and chloroplast genomes has been used to show that G. soja from the north of Japan is quite distinct from that adapted to the south (Tozuka et al., 1998; Xu et al., 2002; Kuroda et al., 2006). Similarly, in Korea, G. soja originating from the nine South Korean Provinces could be assigned to one of three geographical groups (Lee et al., 2008); while in China, three clusters corresponding to the northern region, the Huanghuai region and the southern region were recognized from a combined genotypic (DNA markers) and phenotypic (morphological traits) analysis (Ding et al., 2008; Wen et al., 2009; Li et al., 2010). The variation in morphological traits suggests that geographical variation, meaning most probably photoperiod and mean temperature, exerts a strong selective force.
The genetic structure of G. max was also consistent with provenance (Xu and Gai, 2003; Abe et al., 2003; Dong et al., 2004; Li et al., 2008a, b, c). Other similar studies have underlined that sets of cultivated accessions from particular countries generally represent distinct gene pools (Qiu et al., 1997a, b; Hirata et al., 1999; Kuroda et al., 2006; Guan et al., 2009). In China, production region and planting type together were highly influential (Qiu et al., 2009). The seven-cluster structure, which emerged from a microsatellite-based dataset acquired from a collection of 1863 Chinese landraces, was in good overall accordance with the production area and the planting season (Li et al., 2008a, b, c).
A number of potential centres of genetic diversity of G. soja have been suggested in China, based on genotypic data (Xu et al., 1987; Dong et al., 2001). An analysis based on a set of 12 qualitative and quantitative traits led Dong et al. (2001) to propose three centres, with the one in the North-east being the primary one. However, a study of nuclear and cytoplasmic DNA sequence polymorphisms has demonstrated that genetic diversity was highest in populations from southern China (Shimamoto et al., 1998; Xu et al., 1999; Wen et al., 2009). Populations sampled in South Korea, China and Japan all show high levels of genetic diversity (Lee et al., 2008), and Abe et al. (1992) concluded from an isozyme-based analysis of 383 Japanese and 28 Korean accessions that Korea was the most likely centre of genetic diversity.
The distribution of genetic diversity shown by Chinese G. max populations is rather uneven (Zhou et al., 1998; Dong et al., 2004; Xie et al., 2005; Li et al., 2008a, b, c; Wang et al., 2008a, b). The diversity present in a collection of >22,000 entries, as measured from the standard set of qualitative and quantitative traits, indicated that the original centre of cultivated soybean within China was a strip running from the South-east to the North-east (Zhou et al., 1998). Dong et al. (2004) suggested that the lower section of the Yellow River valley was a probable centre of diversity, based on an analysis of 15 qualitative and quantitative traits, a conclusion reinforced by microsatellite-based genotyping (Li et al., 2008a, b, c). The latter authors identified the Huanghuai region (32.0–40.5°N, 105.4–122.2°E), along the central and downstream parts of the Yellow River valley, as being the most genetically diverse. However, a second region, SsuM (South region, Summer-sowing type, Modeled), was also considered a plausible centre of genetic diversity, since it featured substantial allelic richness and the largest observed number of region-specific alleles.
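The diversity statistics referred to in this section (gene diversity, allelic richness and region-specific alleles) can be illustrated with a minimal sketch. It is not the analysis pipeline of the cited studies. The function names, the toy region labels and the microsatellite allele sizes are hypothetical, gene diversity is computed as the uncorrected form 1 − Σp², and allele number is a raw count without the rarefaction that a formal allelic richness estimate would apply.

```python
from collections import Counter


def gene_diversity(alleles):
    """Uncorrected gene diversity (expected heterozygosity) at one locus."""
    n = len(alleles)
    if n == 0:
        raise ValueError("no alleles supplied")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def allele_number(alleles):
    """Raw count of distinct alleles (no rarefaction for sample size)."""
    return len(set(alleles))


def region_specific_alleles(alleles_by_region):
    """Map each region to the alleles observed there and nowhere else."""
    result = {}
    for region, alleles in alleles_by_region.items():
        elsewhere = set()
        for other, other_alleles in alleles_by_region.items():
            if other != region:
                elsewhere.update(other_alleles)
        result[region] = set(alleles) - elsewhere
    return result


# Hypothetical allele sizes (bp) at a single microsatellite locus.
toy_data = {
    "region_A": [150, 150, 152, 154, 156, 156],
    "region_B": [150, 150, 150, 152, 152, 158],
}

for region, alleles in toy_data.items():
    print(region, allele_number(alleles), round(gene_diversity(alleles), 3))
print(region_specific_alleles(toy_data))
```

In practice these statistics would be averaged over many loci and corrected for unequal sample sizes before regions are compared.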
The relationship between G. max and G. soja has also been explored using DNA-based marker technology, including restriction fragment length polymorphism (Gai et al., 2000), microsatellite and single nucleotide polymorphism (SNP) markers (Li et al., 2010). This approach suggested four locations in China for the domestication process (Qiu and Chang, 2010). Domestication was responsible for a massive loss of rare alleles (81% of those present in G. soja are not represented in G. max), implying a narrowing of the genetic base available for soybean improvement (Hyten et al., 2008a, b). Guo et al. (2010) have estimated that only about 2% of the variation present in G. soja has been retained in the domesticated G. max gene pool. The recent acquisition of the genome sequence of soybean (Schmutz et al., 2010) and the development of high-throughput SNP discovery systems (Hyten et al., 2010) will doubtless produce many new insights into the history of soybean domestication.
Gene diversity based on association mapping
Genotype/phenotype association analyses require diversity at both levels. Zhang et al. (2008) used such an approach in G. soja to establish that the photoperiod-dependent circadian rhythmic expression of the GmCRY1a protein is correlated with photoperiod-induced flowering and the latitudinal distribution of soybeans. On this basis, it was proposed that the genes affecting the expression of GmCRY1a play an important role in determining the latitudinal distribution of both the cultivated and the wild forms. Shi et al. (2010) have used a set of genotypic (SSR) data, collected from 105 food-grade soybean lines, to identify a sub-set of 13 SSR loci (mapping to 11 chromosomes) likely to be linked to genes affecting seed oil content, and a set of 19 SSR loci (distributed over 14 chromosomes) likely to be linked to genes determining protein content. Twelve of these SSR loci were associated with genes controlling both protein and oil content. In a search for markers for resistance to SCN, Li et al. (2009) applied SNP technology to a set of phenotypically characterized germplasm. This approach succeeded in identifying a genetic marker for resistance located within a proposed candidate gene, and led to the development of an effective marker-assisted selection strategy for breeding resistance to this pathogen. In a screen of both G. max and G. soja populations, Wen et al. (2008a) were able to show that linkage disequilibrium was greater in G. max than in G. soja. Of the 60 SSR loci tested, 27 in G. max and 34 in G. soja were associated with variation for traits such as plant height, oil content and total protein content. Many of these SSR loci have also been linked to quantitative trait loci in segregation studies (Wen et al., 2008a). Some elite alleles present in G. max and G. soja were also discovered in an additional study (Wen et al., 2008b).
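Linkage disequilibrium, which underlies the association analyses described above, is commonly summarized by the squared correlation r² between alleles at two loci. The sketch below is a minimal illustration rather than the method of the cited studies. It assumes biallelic loci coded 0/1 and known two-locus haplotypes (a reasonable simplification for homozygous inbred lines), whereas SSR loci are typically multi-allelic and need a generalized measure. The function name and the example data are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Squared allelic correlation r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs,
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n            # freq of allele 1, locus 1
    p_b = sum(b for _, b in haplotypes) / n            # freq of allele 1, locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


# Hypothetical haplotype samples.
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(complete_ld), ld_r_squared(no_ld))
```

Higher r² between a marker and a causal locus makes a marker-trait association easier to detect, but it also widens the chromosomal interval in which the causal gene may lie.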
Establishment of core collections
Although a large number of soybean accessions are now conserved in various genebanks or in situ, fewer than 1% of these have been used for breeding. In order to broaden the cultivar genetic base, a primary core collection has been assembled from the overall Chinese soybean germplasm collection, using a combination of passport data and morphological traits (Qiu et al., 2003). This set of accessions has since been comprehensively genotyped using SSR markers (Li et al., 2008a, b, c). Further sub-sets were constructed, one from the G. soja collection (Zhao et al., 2005; Wang et al., 2006a, b; Qiu et al., 2009) and another from the G. max mini core collection (Qiu et al., 2009). In addition to these two core collections, Brown (1989) had earlier established a perennial core collection.
Since this time, a number of other core collections have been established to address the needs of soybean breeders; these variously comprise the ancestral parental lines of modern cultivars (Gizlice et al., 1994; Gai and Zhao, 2001), accessions resistant to soybean mosaic virus (Mi et al., 2003) and to SCN (Ma et al., 2006; Duan et al., 2008), accessions showing high phosphorus use efficiency (Zhao et al., 2004), and accessions displaying good quality in the context of summer-planted vegetable soybean (Han et al., 2008).
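The general idea behind a core collection, namely retaining as much of the allelic variation as possible in a small number of accessions, can be sketched with a simple greedy selection. This is not the procedure used to build the collections cited above, which relied on passport data and morphological traits. The function name, the accession labels and the allele sets are hypothetical, and a greedy choice is only one of several possible selection strategies.

```python
def greedy_core(accessions, core_size):
    """Pick accessions one at a time, each adding the most uncaptured alleles.

    accessions: dict mapping accession name -> set of (locus, allele) pairs.
    Returns (list of chosen names, set of alleles captured by the core).
    Ties are broken alphabetically so the result is deterministic.
    """
    remaining = dict(accessions)
    captured = set()
    core = []
    while remaining and len(core) < core_size:
        best = max(sorted(remaining),
                   key=lambda name: len(remaining[name] - captured))
        core.append(best)
        captured |= remaining.pop(best)
    return core, captured


# Hypothetical genotypes at two loci.
toy_accessions = {
    "acc1": {("L1", "a"), ("L2", "x")},
    "acc2": {("L1", "a"), ("L2", "y")},
    "acc3": {("L1", "b"), ("L2", "y")},
    "acc4": {("L1", "a"), ("L2", "x")},
}

core, captured = greedy_core(toy_accessions, core_size=2)
all_alleles = set().union(*toy_accessions.values())
coverage = 100.0 * len(captured) / len(all_alleles)
print(core, f"{coverage:.0f}% of alleles captured")
```

In a real application the selection would normally be stratified by region and planting type first, so that the core also reflects the geographical structure of the full collection.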
The Chinese soybean core collection has been the most extensively studied and utilized (Qiu et al., 2009). It can be used as part of a strategy for trait discovery, for example in screening for the presence of seed allergens (Yaklich et al., 1999). A reduced ‘mini core’ collection has been exploited both for assessing the level of genetic diversity (Li et al., 2010; Song et al., 2010) and for phenotyping purposes (Zhang et al., 2006). The four alleles of GmTFL1 (a gene controlling determinate growth; Tian et al., 2010) represented in the mini core collection were the same as those identified in US G. soja and G. max germplasm. More importantly, accessions from the mini core collection have been incorporated into Chinese soybean breeding programmes as donors for both crossing and backcrossing with local elite cultivars (Qiu et al., 2009).
Germplasm enhancement
Germplasm enhancement is the foundation for breeding and has played a key role in raising the level of soybean production since 1950. Of the 206 cultivars released in China during the 1950s, 64% were derived from selection within landraces, and the remainder by a variety of crossing methods. Thereafter, crosses between breeding lines gradually replaced both those between landraces and those between a landrace and a breeding line. Within-landrace selections declined to 32% by the 1960s and 16% by the 1970s (Peng, Reference Peng1988). Among the 605 soybean cultivars released in China in the period 1993–2004, over 86% were the result of hybridization between elite lines or cultivars (Qiu et al., Reference Qiu, Wang and Chang2007a, Reference Qiu, Wang, Zhou, Chen and Changb).
Main utilization of G. max
Landraces have been a valuable resource for the development of modern soybean cultivars. These populations tend to be diverse at both the phenotypic and the genotypic levels (Li et al., Reference Li, Guan, Liu, Ma, Wang, Li, Lin, Luan, Chen, Yan, Guan, Zhu, Ning, Marinus, Smulders, Li, Piao, Cui, Yu, Guan, Chang, Hou, Shi, Zhang, Zhu and Qiu2008a, Reference Li, Li, Chang and Zhangb, Reference Li, Zheng and Hanc). The landraces Jinyuan, Silihuang, Baimei and Duludou have all had a noteworthy impact on the Chinese soybean breeding programme. Individually, each is an ancestor of between 313 and 577 cultivars, and is within the pedigree of 24–44% of cultivars released prior to 2005. However, they were used as a direct parent of only 17% of derived cultivars (compared with 25% for released cultivars and 47% for breeding lines; Xiong et al., Reference Xiong, Zhao and Gai2008). The cultivar 58-161, a high-yielding, large-seeded selection from the landrace Binhaidabaihua, was released in Jiangsu in 1964. This cultivar became the ancestor of 61 cultivars, of which 12 were generated from a cross involving 58-161 as a parent; 33 other cultivars have a genetic contribution of at least 25% from 58-161 (Zhao et al., Reference Zhao, Cui and Gai1998). Binhaidabaihua and Tongshantianedan were ancestors of 62 and 61 cultivars, respectively (Qiu et al., Reference Qiu, Nelson and Vodkin1997a, Reference Qiu, Zhao and Gaib). Tokachi Nagaha, a soybean introduction from Japan, was an ancestor of 195 Chinese soybean cultivars released before 2005. The contribution of Tokachi Nagaha to these cultivars ranged from 0.78% to 50.0%, with 77.3% of the cultivars having a contribution of more than 6.25% (Guo et al., Reference Guo, Chang, Zhang, Zhang, Guan and Qiu2007).
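The percentage contributions quoted above follow from simple pedigree arithmetic: under the usual assumption that each parent passes on half of its genome, an ancestor's expected contribution halves with every generation. The sketch below illustrates this with a purely hypothetical pedigree (the names A, X1, X2, C1, C2 and C3 are placeholders, not crosses taken from the cited studies), and it ignores selection, which can shift the realized contribution away from the expectation.

```python
from fractions import Fraction

# Hypothetical pedigree: each cultivar maps to its two parents.
# Names are placeholders, not real crosses from the cited studies.
PEDIGREE = {
    "C1": ("A", "X1"),   # A is a direct parent of C1
    "C2": ("C1", "X2"),  # A is a grandparent of C2
    "C3": ("C1", "C2"),  # A enters C3 through both parents
}


def contribution(cultivar, ancestor, pedigree=PEDIGREE):
    """Expected fraction of the cultivar's genome tracing back to the ancestor.

    Assumes each parent passes on exactly half of its genome and that
    no selection has taken place.
    """
    if cultivar == ancestor:
        return Fraction(1)
    if cultivar not in pedigree:  # founder with no recorded parents
        return Fraction(0)
    parent_1, parent_2 = pedigree[cultivar]
    return (contribution(parent_1, ancestor, pedigree)
            + contribution(parent_2, ancestor, pedigree)) / 2


if __name__ == "__main__":
    for name in ("C1", "C2", "C3"):
        share = contribution(name, "A")
        print(f"{name}: {share} ({float(share):.2%})")
```

On this reading, a direct parent contributes 1/2 (50%), a grandparent 1/4 (25%), and an ancestor that appears once, seven generations back, 1/128 (about 0.78%); the 6.25% threshold corresponds to 1/16, i.e. a single appearance four generations back. An ancestor that enters through both parents, as for the hypothetical C3, contributes the average of the two parental values (3/8).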
Certain elite breeding lines and cultivars have been particularly important in the Chinese soybean breeding programme. Fu et al. (Reference Fu, Chen, Jiang, Jing, Fu and Lu2002) identified a set of 21 lines as being key crossing parents. The most widely grown Chinese cultivar Zhonghuang13 was selected from a cross between the elite line Zhongzuo 90 052-76 and the cultivar Yudou 8 (Wang et al., Reference Wang, Guan, Liu, Chang and Qiu2006a, Reference Wang, Guan, Guan, Li, Ma, Dong, Liu, Zhang, Zhang, Liu, Chang, Xu, Li, Lin, Luan, Yan, Ning, Zhu, Cui, Piao and Liub). Combinations among a set of eight elite lines have been responsible for 19 cultivars released in Henan Province, including some with a protein content as high as 50% (Yudou 12) and 52% (Zheng 85 558), and others (Zheng 133 and Shang 7608) with excellent resistance to insects. Ke4430-20 is another important parent, giving rise to 42 Heilongjiang Province cultivars by 2005. Five of these cultivars have a 50% genetic contribution, and 21 of them have a contribution of 25% from Ke4430-20 (Liu et al., Reference Liu, Gai and Lv2005). These data indicate that 58-161 and Ke4430-20 not only served as direct parents, each contributing 50% of the alleles, but also featured in the pedigrees of the parents of other cultivars. Guan et al. (Reference Guan, Qin, Hu, Chen, Chang, Liu and Qiu2009) found that many genes present on linkage group G of the successful cultivar Hefeng 25, a descendant of Ke4430-20, had Ke4430-20 alleles, which may have been responsible for the cultivar's high yield and good adaptation. This linkage group contains a 113 cM block which is conserved in five of the 26 independent cultivars analyzed by Lorenzen et al. (Reference Lorenzen, Lin and Shoemaker1996), and four other cultivars derived from sister crosses share a recombination event in a common 1 cM region of the same linkage group.
An SSR-based association study of 163 cultivars belonging to the five major family pedigrees in the Huanghuai valley and southern China was reported by Zhang et al. (Reference Zhang, Zhao and Gai2009). The five key ancestors were 58-161, Xudouintra-cultivar selection 1, Qihuang 1, Nannong 493-1 and Nannong 1138-2. The same set of elite alleles is present in all five pedigrees, although the frequencies differ. This evidence has been taken to indicate that elite alleles do tend to accumulate in elite lines, explaining perhaps why elite lines are generally preferred to landraces as breeding parents.
In addition to indigenous landraces, plant introduction has been a consistent source of parental material. Gizlice et al. (Reference Gizlice, Carter and Burton1994) concluded that just 35 ancestors contributed over 95% of the alleles present in a set of 258 North American cultivars released between 1947 and 1988, and that just five of them accounted for over 55% of the genetic background of public sector North American cultivars. Indian cultivars have been classified into two groups, depending on their breeding history. The first group represents direct selections from exotic and indigenous materials (e.g. Bragg, Lee, Hardee, Monetta, CO1, VL Soya 2), while the second group is formed from cultivars derived by crossing or mutagenesis (Karmakar and Bhatnagar, Reference Karmakar and Bhatnagar1996). Brazilian germplasm was assembled from materials imported from Japan, Taiwan and the US, and these were used to adapt the crop to the various production areas in the country. A pedigree analysis of 651 Chinese cultivars released from 1923 to 1995 showed that of the 348 ancestors identified, 46 were introductions, mainly from two countries (24 from the US and 13 from Japan) (Gai et al., Reference Gai, Zhao, Cui and Qiu1998). The ten most often used introductions prior to 1995 were Mamotan, Tokachi Nagaha, Yaki 1, Amsoy, Clark 63, Beeson, Wilkin, Williams, Magnolia and Amur41; Tokachi Nagaha and Yaki 1 were from Japan, Amur41 was introduced from the former Soviet Union, and the other seven cultivars were all from the US. Williams and Amsoy were also the most often used parental cultivars in North America (Mikel et al., Reference Mikel, Diers, Nelson and Smith2010). The Japanese cultivar Tokachi Nagaha was a parent of 52 Chinese cultivars, while Yaki 1 was a parent of 20. Extending the survey period to 2005, Xiong et al. (Reference Xiong, Zhao and Gai2008) found that at least 287 cultivars had Tokachi Nagaha in their pedigree.
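Statements such as "35 ancestors contributed over 95% of the alleles" rest on averaging each ancestor's pedigree contribution over a set of cultivars and then accumulating the ranked shares. The following is a minimal sketch of that bookkeeping, assuming the per-cultivar contributions are already known; the cultivar and ancestor names and all numbers are hypothetical, and the cited studies used coefficient-of-parentage analyses that are more involved than this illustration.

```python
# Hypothetical per-cultivar ancestor contributions (each row sums to 1).
# Names and values are illustrative only, not data from the cited studies.
TOY_CONTRIBUTIONS = {
    "cv1": {"A": 0.5, "B": 0.5},
    "cv2": {"A": 0.5, "C": 0.25, "D": 0.25},
    "cv3": {"A": 0.25, "B": 0.25, "C": 0.5},
    "cv4": {"A": 0.75, "B": 0.25},
}


def genetic_base(contributions):
    """Average each ancestor's contribution over all cultivars."""
    n_cultivars = len(contributions)
    base = {}
    for per_cultivar in contributions.values():
        for ancestor, share in per_cultivar.items():
            base[ancestor] = base.get(ancestor, 0.0) + share / n_cultivars
    return base


def ancestors_needed(base, threshold):
    """Smallest number of top-ranked ancestors whose cumulative share
    reaches the threshold; None if the threshold is never reached."""
    running = 0.0
    ranked = sorted(base.values(), reverse=True)
    for count, share in enumerate(ranked, start=1):
        running += share
        if running >= threshold - 1e-12:  # small tolerance for float error
            return count
    return None


if __name__ == "__main__":
    base = genetic_base(TOY_CONTRIBUTIONS)
    print(base)
    print(ancestors_needed(base, 0.55), ancestors_needed(base, 0.95))
```

With these toy values, ancestor A alone accounts for half of the genetic base, two ancestors are needed to pass 55% and all four to pass 95%, which mirrors, on a very small scale, the concentration reported for the North American gene pool.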
Zhongdou 27 (featuring an isoflavone content of 3704 μg/g) was derived from a cross between Zhongdou 19 and L81-4590 (an introduction from the US). The search for resistance to SCN has prompted many introductions. In a survey of 130 SCN-resistant cultivars, Anand (Reference Anand and Colyer1991) reported that most of the resistance was inherited from PI88788 and the cultivar Peking. PI437654 was shown to be resistant to nearly all races of SCN (Anand et al., Reference Anand, Gallo, Baker and Hartwig1988). The resistance genes against SCN races 1 and 3 in Brazilian materials were obtained from Peking and/or PI437654 via the cultivar Hartwig (Arias et al., Reference Arias, Dias, Carneiro, Oliveira, Ferraz de – Toledo, Carrão-Panizzi, Pipolo, Moreira, Kaster and Bertagnolli2009). Cultivars Zhonghuang 26 (Dan 8 × PI437654) and Kangxianchong 8 (Dongxiaoli × Franklin) were also resistant to SCN race 3 (Wang et al., Reference Wang, Guan, Liu, Chang and Qiu2006a, Reference Wang, Guan, Guan, Li, Ma, Dong, Liu, Zhang, Zhang, Liu, Chang, Xu, Li, Lin, Luan, Yan, Ning, Zhu, Cui, Piao and Liub; Tian et al., Reference Tian, Wang, Lee, Li, Specht, Nelson, McClean, Qiu and Ma2010). The other SCN-resistant cultivars Kangxian 1, Kangxian 2 and Kangxianchong 4 all carry a resistance gene from Franklin (Xu et al., Reference Xu, Wang, Zhan and Li2010). Seven elite selections resistant to SCN races 1 and 4 were developed from the cross Hartwig × Jin 1261 (Liu et al., Reference Liu, Lu, Chang and Qiu2008). Much of the germplasm registered in Crop Science consists of improved experimental lines carrying genes or combinations of genes transferred from exotic germplasm. These lines are, in general, not agronomically acceptable as cultivars in themselves, but are valuable as parental lines (Carter et al., Reference Carter, Nelson, Sneller, Cui, Spect and Boerna2004).
Passport data for the leading ancestors of North American germplasm show that most originated in China (Carter et al., Reference Carter, Nelson, Sneller, Cui, Spect and Boerna2004). Cultivars recommended for the northern US were mostly developed from spring-sown types introduced from North-east China. The Chinese cultivars Mandarin, Richland, S-100, Mukden and CNS contributed 43.9% of the genetic base present in the northern US breeding gene pool during the period 1947–1988 (Carter et al., Reference Carter, Nelson, Sneller, Cui, Spect and Boerna2004). Likewise, the southern US cultivars were developed mainly from spring-sown type introductions from southern China. For example, CNS and S-100 are common to the pedigree of the majority of the southern US cultivars. In addition, Chinese cultivars have also provided an important genetic base for cultivar development in Brazil, Japan and Europe (Chang, Reference Chang1989).
Genetic utilization of G. soja
As the probable ancestor of cultivated soybean, G. soja has been used to some extent in breeding programmes. It has been recognized as a potential source for increasing seed protein content since the 1940s, and was used as a parent to develop the elite lines Longpin 8807 and 8802-1, which both had a seed protein content over 45%, and a combined protein and oil content over 63% (Lin, Reference Lin1996; Qi et al., Reference Qi, Lin, Wei, Yang and Liu2005). Brummer et al. (Reference Brummer, Graef, Orf, Wilcox and Shoemaker1997) identified a major quantitative trait locus for protein concentration on G. soja linkage group I, and the evidence suggests that its introgression into cultivated germplasm, especially if supported by marker-assisted selection, will be feasible (Sebolt et al., Reference Sebolt, Shoemaker and Diers2000). Some derivatives of interspecific crosses produced multi-branching plants bearing small seeds (7–12 g per 100 seeds) (Lin, Reference Lin1996), but showing high yield and pronounced tolerance to abiotic stress (Yang and Ji, Reference Yang and Ji1999; Wang and Li, Reference Wang and Li2000). Small seed is desirable for producing either sprouts or natto (a fermented product). Some food-grade cultivars are of this type, notably NC114 and NC115 (Carter and Burton, Reference Carter and Burton2007), the Jilinxiaoli cultivars (Wang et al., Reference Wang, Lin, Luan, Li, Guan, Li, Ma, Liu, Chang and Qiu2008a, Reference Wang, Wang, Dong, Yu and Mengb), and the Longjiangxiaoli and Hongfengxiali cultivars. Wild soybean is also a potential source of high isoflavone content (Hwang et al., Reference Hwang, Lee, Jeong, Dhakal, Lee and Seo2009), and has been used to breed the cultivar Jilinxiaoli No. 7, which has an isoflavone content of 5.86 g/kg. Resistance against Cercospora sojina Hara and SCN has been incorporated into elite cultivated lines from G. soja by hybridization or via the pollen tube pathway (Yang et al., Reference Yang, Wang, Ma, Wang and Jiang2005; Tian et al., Reference Tian, Zhou, Wu, Yang, Li and Gao2007).
Potential utilization of perennial species
The wild perennial relatives of soybean harbour a number of unique genes. Hymowitz et al. (Reference Hymowitz, Sing and Kollipara1998) have provided a contemporary summary of attempts to use wide crosses from these species, but in practice, the contribution of the perennial relatives to soybean breeding has been minimal. Singh and Nelson (Reference Singh and Nelson2009) successfully crossed PI441001 (G. tomentella) with the cultivar Dwight, treated the hybrid progeny with colchicine and then backcrossed the resulting synthetic allopolyploid with Dwight five times to derive a series of monosomic alien addition lines. Progeny of these lines segregated a small proportion of disomic alien addition lines, which are seen as a reservoir of new genes for future soybean improvement.
Prospects
Integrating genetic diversity into modern soybean breeding will make the best use of the genetic resources available to the global soybean research community. The recently completed soybean (Schmutz et al., Reference Schmutz, Cannon, Schlueter, Ma, Mitros, Nelson, Hyten, Song, Thelen, Cheng, Xu, Hellsten, May, Yu, Sakurai, Umezawa, Bhattacharyya, Sandhu, Valliyodan, Lindquist, Peto, Grant, Shu, Goodstein, Barry, Griggs, Abernathy, Du, Tian, Zhu, Gill, Joshi, Libault, Sethuraman, Zhang, Shinozaki, Nguyen, Wing, Cregan, Specht, Grimwood, Rokhsar, Stacey, Shoemaker and Jackson2010) and wild soybean (Kin et al., Reference Kin, Lee, Van, Kim, Jeong, Choi, Kim, Lee, Park, Ma, Kim, Kim, Park, Lee, Kim, Kim, Shin, Jang, Kin, Liu, Chaisan, Kang, Lee, Kim, Moon, Schmutz, Jackson, Bhak and Lee2010) genome sequences are expected to greatly facilitate the process of gene discovery and utilization. This process promises to be much accelerated by the development of high-throughput SNP technology. The US soybean collection is presently being characterized in this way at >50,000 SNP loci, and key genes in selected soybean accessions are being resequenced (Hyten et al., Reference Hyten, Cannon, Song, Weeks, Fickus, Shoemaker, Specht, Farmer, May and Cregan2010). Combined with an ever-increasing number of molecular markers and the refinement of genotyping and sequencing technologies and bioinformatic analysis, genetic diversity analysis offers an opportunity to gain a deeper understanding of the evolution of soybean as a crop, and facilitates the discovery of novel alleles and genes which can be exploited for soybean improvement.
Germplasm accessions are seldom used in cultivar development programmes because they typically yield poorly and are less well adapted than modern cultivars. Few breeding programmes can afford the time to develop the necessary bridging materials, while modern cultivars and high yielding breeding lines can be used directly. As fewer than 1% of conserved accessions have been incorporated into practical breeding programmes, the genetic base of modern cultivars has become rather narrow. The annualized rate of yield improvement in the period 1924–1978 is nearly identical to that achieved from 1978 to 2009, so breeders have clearly succeeded in making genetic gains despite the narrowing of the genetic base. Nevertheless, incorporating new germplasm should be encouraged, especially given the new genetic technology now available (Nelson, Reference Nelson2009a, Reference Nelsonb).
Varietal development is carried out in both the public and private sectors. Public sector breeders tend to emphasize germplasm enhancement, breeding methodology and further development in molecular technology. The importance of genetic diversity research needs to be particularly recognized, especially because the private sector is not in a position to sustain the required long-term investment. The genetic gain achieved by soybean breeders with respect to yield potential is a major driver of the increased productivity of soybean. As the world population continues to increase and cropping area decreases, the breeding community will face continuing challenges to maintain the upward pressure on yield, as well as having to address new needs with respect to seed quality and better levels of tolerance to both biotic and environmental stresses. The bridging parental lines needed to meet these challenges can only come from the soybean germplasm gene pool.
Acknowledgements
This research was supported by Crop Germplasm Conservation (NB04-22-6, NB04-23-3, NB05-070401-22-06, NB06-070401-05, NB07-2130315-06, NB08-2130315-06 and NB08-213031506), National Key Technologies R&D Program in the 11th Five-Year Plan (No. 2006BAD13B05) and the State Key Basic Research and Development Plan of China (973) (Nos. G1998010203, 2004CB117203 and 2010CB125900), and International Scientific Cooperative Project (No. 2008DFA330550).