Introduction
Recently, Carter et al. (2004) provided a thorough review of soybean genetic diversity, and in this issue, Qiu et al. (2011) have provided updated and detailed information about research on soybean germplasm. The purpose of this article is not to duplicate those reviews but to use the history of soybean germplasm in the United States to help understand the various transitions that have occurred in the management of genetic resources and the changes that will be needed if we are to meet the challenges of the future. The focus will be on the USDA Soybean Germplasm Collection, but the concepts are applicable to any self-pollinated species. In a time of rapidly evolving genetic and genomic technologies, germplasm collections can become an even more important resource for plant breeders to develop improved cultivars that will produce the food, feed and fibre that will sustain a growing world population, and have a much greater role in the research of other plant scientists to understand the basic biology of critical plant functions. However, reaching this potential will require changes in how germplasm collections are managed and how those resources are utilized by those who can benefit most from their effective use. Germplasm collections have often been managed based on theoretical principles for maintaining maximum genetic diversity without sufficient consideration as to how that diversity will actually be maintained, evaluated and utilized.
The beginning of the USDA Soybean Germplasm Collection
The soybean first arrived in the United States in 1765 (Hymowitz and Harlan, 1983), but neither that introduction nor others that occurred over the next 100 years were preserved or permanently established soybean as a crop in the U.S. The oldest extant accession in the USDA Soybean Germplasm Collection was introduced into the U.S. sometime prior to 1880, but the exact origin and circumstances of that introduction have been lost (Bernard et al., 1987). Formal record keeping of plant introductions began in the U.S. in 1898. This function was assigned to the U.S. Department of Agriculture, Division of Botany. The first plant introduction number (PI 1) was assigned to a Brassica oleracea (cabbage) accession from Russia. The first soybean recorded in the system (PI 480) came from Siberia in the same year (USDA, 1898). Both of these accessions were collected by Prof. Niels E. Hansen of the Agricultural College of South Dakota during a visit to Russia, central Asia and Siberia. Unfortunately, that soybean accession, like most that were introduced through the first half of the 20th century, was not preserved. At this time, soybean was not a crop but a scientific curiosity. New introductions were saved if they were perceived to have value, and the others were discarded. Between 1898 and 1949, when the USDA Soybean Germplasm Collection (hereafter referred to as the USDA Collection) was established, 8225 soybean introductions were brought into the U.S. More than 5000 of these lines came from the plant collecting expeditions of P. H. Dorsett and W. J. Morse in China, Korea and Japan between 1924 and 1932 (Dorsett, 1927; Bernard et al., 1987; Piper and Morse, 1910, 1923, 1925).
As important as these collections were, only a very few of the major ancestral lines that formed the genetic base of current U.S. soybean production came from these expeditions. Based on the ancestral contributions calculated by Gizlice et al. (1994), the Dorsett and Morse expeditions provide about 13% of the current genetic base, and most of that (8%) was contributed by the major ancestral line Richland collected in Jilin, China in 1926. In 1949, only 1677 of these soybean accessions and other lines introduced prior to 1898 were maintained in individual U.S. research programmes (Bernard et al., 1987), and those accessions became the initial USDA Collection. This advanced the concept that germplasm should be preserved even when the immediate value was unknown. Beginning in 1949, soybean germplasm introduced into the U.S. was carefully preserved.
Rationale for pure-lining self-pollinated accessions
In 1954, Dr. Richard Bernard was appointed curator of the USDA Northern Soybean Germplasm Collection. He made two observations that produced significant changes in how the USDA Collection was managed (R. L. Bernard, personal communication). Original seed samples had been preserved at Beltsville, MD for early soybean introductions. These samples had not been stored in a controlled environment, and most were decades old so, by the 1950s, none were viable. However, they did allow for a physical comparison of seeds from the original sample with those currently in the USDA Collection. Based on seed coat and hilum colour, and seed size and shape, Dr. Bernard determined that many of the current seed lots were different from the original samples. The accessions that differed were recorded and are reported in Bernard et al. (1987). The second observation also involved changes in accessions. The names associated with some Japanese accessions in the USDA Collection were descriptive. Based on those names, some of the accessions should have been glabrous, but by the mid-1950s, most of them were pubescent. These observations highlight two critical aspects of managing a germplasm collection. There could have been many reasons why some of the germplasm accessions were different from the original sources, but a key factor in these errors is the lack of an accurate description for each accession. Without these descriptions, errors in the seed lots were not detected and corrected, and accessions were irrevocably changed. It is possible that this could explain the change in the glabrous accessions as well, except that there is a more likely explanation. In the Midwest U.S., the potato leafhopper (Empoasca fabae) causes severe damage to soybeans without pubescence and greatly reduces plant growth and seed yield.
We demonstrated this effect by planting equal quantities of seeds of near-isogenic soybean lines that differed only at the P1 locus (Nagai and Saito, 1923; Palmer et al., 2004), which controls the glabrous trait, in a mixed population. The stunting effect of the leafhoppers was compounded by the shading of the more vigorous pubescent plants. Natural selection removed all of the glabrous plants after three cycles of bulk harvesting the plots and replanting that seed (unpublished results). Because of these experiences, Bernard concluded that the optimum strategy for maintaining self-pollinated species in a germplasm collection was to maintain all accessions as pure lines.
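How quickly bulk harvesting can remove a disadvantaged type can be illustrated with a simple deterministic selection model. The sketch below is only an illustration: the relative seed yield of glabrous plants (`relative_fitness = 0.1`) is an assumed value chosen for the example, not a figure measured in the experiment described above, and the function name is hypothetical.

```python
def glabrous_fraction(initial_fraction, relative_fitness, cycles):
    """Return the glabrous seed fraction before and after each bulk cycle.

    relative_fitness is the seed yield of a glabrous plant relative to a
    pubescent plant (assumed value; not taken from the article).
    """
    fractions = [initial_fraction]
    p = initial_fraction
    for _ in range(cycles):
        glabrous_seed = p * relative_fitness   # seed contributed by glabrous plants
        pubescent_seed = (1.0 - p) * 1.0       # seed contributed by pubescent plants
        p = glabrous_seed / (glabrous_seed + pubescent_seed)
        fractions.append(p)
    return fractions


# Equal quantities of both types at planting, three bulk harvest cycles.
fractions = glabrous_fraction(0.5, 0.1, 3)
for cycle, fraction in enumerate(fractions):
    print(f"cycle {cycle}: glabrous fraction = {fraction:.4f}")
```

Under this assumed tenfold yield disadvantage, the glabrous type falls to roughly one seed in a thousand after three cycles, which in a plot of ordinary size is effectively complete loss.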
Historically, the standard practice is that all germplasm accessions should be maintained as collected so as not to lose any of the variation in the original sample. In theory, this seems like an ideal strategy for preserving genetic diversity, but in practice, it is simply not workable. Heterogeneous accessions are at constant risk of change both during storage (from unequal seed longevity) and during regeneration (from natural selection, genetic drift and mechanical contamination) and, as previously described, those changes can lead to permanent loss of diversity. It is possible to mitigate each of these risk factors, but doing so can greatly increase the costs of curation, and in reality, the risks can only be lessened and not eliminated. Maintaining separate samples of original seeds to use for future regeneration has been suggested as a way of reducing genetic drift and natural selection, but it is impractical because original samples are generally much too small to provide the needed reserve. Even if large samples were obtained, doing so would increase storage costs, would not guarantee accession integrity, and would help only for a finite period of time, until the original sample is exhausted or dies. Assuming that all genotypes will produce some seeds each time the accession is grown, the effects of natural selection and genetic drift could be eliminated by storing an equal number of seeds from each plant in the regeneration plot. This also assumes that each time the accession is grown, the sample size is sufficiently large to include even the very low frequency genotypes that may exist in heterogeneous accessions. This could be a practical solution for very small germplasm collections, but the labour costs required to implement such a strategy would be prohibitively high for any major collection. None of these strategies addresses the concern of accidental seed contamination. It is not practically possible to accurately describe a heterogeneous accession.
When seed contamination occurs, it is highly unlikely that it will be recognized and remedied, and, no matter what precautions are taken to ensure seed purity, accidental seed contamination or mislabelled seed lots will occur in large germplasm seed collections.
In pure-lined collections, each accession is descended from a single seed in the original seed lot. This requires careful classification of each plant grown from the original seed lot. At maturity, multiple single plants are harvested to represent each identified phenotype. The following year, each plant row is again characterized, and each row with a different phenotype is harvested and added as a new accession. Rows that are segregating are discarded based on the assumption that these rows are the product of a recent cross pollination and both parents would be available. There are certainly risks associated with pure-lining accessions. Practically, pure lining can only select on phenotypic variation, so it is always possible that genotypic variation could be overlooked and lost. Because it is possible to select on both qualitative traits that are generally easily classified and quantitative variation that can be distinguished even when it cannot be precisely described, the odds of genotypic differences not being expressed in some phenotypic variation are low. Although wild soybean (Fujita et al., 1997) can have higher levels of cross pollination than cultivated soybean (Ahrent and Caviness, 1994), both species are highly self-pollinated, so achieving homozygosity is a natural process. For domesticated species such as soybean, primitive varieties are commonly derived by pure line selection, and heterogeneity is as likely to have come from accidental contamination (perhaps in a previous germplasm collection) as from inherent variation. For wild species, populations should be extensively sampled to effectively capture the natural variation in multiple accessions (Zhu et al., 2007). Approximately 75% of the accessions in the USDA Collection were phenotypically homozygous and homogeneous when they were received.
Although there is a risk that the pure lines will not preserve all of the variation in the original sample, this risk occurs once, and then the integrity of each accession can be economically and predictably maintained forever. Genetic drift and natural selection are not factors in compromising the integrity of the accession, and accidental seed contamination can be detected and removed because each accession has a precisely known description.
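The two-year pure-lining procedure described above can be outlined as a small data-processing sketch. Everything in it is hypothetical and chosen only for illustration: the function names, the representation of a phenotype as a tuple of descriptors, the plant identifiers, and the choice of keeping two single plants per phenotype.

```python
from collections import defaultdict


def select_single_plants(plants, per_phenotype=2):
    """Year 1: group classified plants by phenotype and keep a few of each."""
    by_phenotype = defaultdict(list)
    for plant_id, phenotype in plants:
        by_phenotype[phenotype].append(plant_id)
    return {ph: ids[:per_phenotype] for ph, ids in by_phenotype.items()}


def new_accessions(rows):
    """Year 2: keep one uniform row per distinct phenotype.

    Rows showing more than one phenotype are treated as segregating and
    are discarded.
    """
    kept = {}
    for row_id, row_phenotypes in rows:
        distinct = set(row_phenotypes)
        if len(distinct) != 1:
            continue  # segregating (or empty) row: discard
        phenotype = distinct.pop()
        kept.setdefault(phenotype, row_id)  # first uniform row represents the phenotype
    return kept


# Hypothetical phenotypes: (seed coat colour, pubescence colour).
yellow_tawny = ("yellow", "tawny")
black_gray = ("black", "gray")

plants = [("p1", yellow_tawny), ("p2", yellow_tawny),
          ("p3", black_gray), ("p4", yellow_tawny)]
selected = select_single_plants(plants)

rows = [("p1", [yellow_tawny] * 3),
        ("p2", [yellow_tawny, black_gray, yellow_tawny]),  # segregating row
        ("p3", [black_gray] * 2)]
accessions = new_accessions(rows)
print(selected)
print(accessions)
```

In this toy case the original seed lot yields two pure-line accessions, one per phenotype, and the segregating row from plant `p2` is dropped.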
Growth of the USDA Soybean Germplasm Collection
Beginning in 1949, all soybean germplasm introduced into the U.S. was carefully preserved, but for the next 25 years, there was little active effort to expand the collection. A major turning point for germplasm collections, in general, in the U.S. was the southern leaf blight epidemic (caused by Helminthosporium maydis, Nisikado & Miyake) (Sprague, 1971) of 1970 in maize. This publicly highlighted the perils of genetic uniformity of crops in modern agriculture and increased the visibility of and support for germplasm programmes (Committee on Genetic Vulnerability, National Academy of Science, 1972; Committee on Germplasm Resources, 1978). This period also coincided with an impetus to intentionally increase the holdings of the USDA Collection with a particular focus on Asia. From 1949 to 1974, an average of 72 accessions was added to the USDA Collection each year. From 1975 to 2000, this number increased to over 500 annually. Some of these accessions were collected by U.S. scientists in cooperation with host countries, but nearly 90% of the new additions were obtained through exchanges with other collections. Worldwide, the era of soybean collection is coming to an end. There are only a few places in Asia where primitive varieties, not currently in any germplasm collection, are being grown by farmers, but these isolated pockets are rapidly disappearing, as cultivars from new and expanding scientific breeding programmes are being widely distributed and grown. This loss of genetic diversity that has been previously managed by farmers increases the responsibility of ex situ collections and the importance of germplasm exchange among these collections.
A major remaining challenge is the collection of wild soybean, Glycine soja. Phenotypically, the wild soybean has less variation than the cultivated soybean in plant and seed pigments, seed size, pubescence characteristics and plant type. Although there are fewer variations, there are significant differences among wild soybean accessions. Significant differences for mean seed size (1.9 vs. 2.8 g/100 seeds) were found among populations in southwestern Hokkaido (Ohara et al., 1989). Morphological differences among G. soja populations may be related to adaptation to specific environments as two distinct types, twining and branching, were found in contrasting ecological niches on the Saru River in Hokkaido (Ohara and Shimamoto, 1994). There are some phenotypic traits for which the wild soybean does have more variation. Chen and Nelson (2004a) identified much greater diversity in leaflet size and shape within wild soybean accessions from Russia, China, Japan and Korea, and they found that variation was associated with geographical origin. They also found large differences in early plant growth not observed in soybean (Chen and Nelson, 2006). Although there are mixed results for measures of phenotypic diversity, all reported research clearly shows that the wild soybean is genotypically much more diverse than soybean. Both Li and Nelson (2002) and Chen and Nelson (2004b), using Random Amplified Polymorphic DNA markers, identified much greater genetic diversity in wild soybean than in soybean. Using variation within gene sequences, Hyten et al. (2006) concluded that approximately half of the diversity in wild soybean was not transferred through the genetic bottleneck of domestication.
Certainly, some of that loss is responsible for the improved characteristics that make soybean an agronomic crop, but it is highly likely that valuable alleles for soybean improvement exist within this species. There is ample evidence that there is great genetic diversity within and among wild soybean populations (Kiang et al., 1992; Dong et al., 2001; Lee et al., 2008; Wen et al., 2009; Li et al., 2010). The total ex situ collection of wild soybean is only a small percentage of the worldwide soybean germplasm holdings and should be increased. With the current development of high throughput systems for single-nucleotide polymorphism (SNP) markers and the potential of extensive accession sequencing in the future, extraction of useful alleles from wild soybean will very likely become much easier, if not routine.
Germplasm evaluation
Germplasm evaluation is the critical first step in effective germplasm utilization. Without evaluation, germplasm collections have limited utility and unless they are used or have the potential to be used, they have no practical value. Pure-lined accessions have an enormous advantage for evaluation. It is not possible to accurately evaluate heterogeneous accessions because of the uncertainty in separating genetic from environmental variation. As a very simple case, we find that hilum colour expressivity in some germplasm accessions is much more variable than in standard soybean cultivars. For some accessions, there can be a wide range of hilum colour among seeds within a seed lot. If the extremes are selected and planted as two seed lots, the same variation will be found within each harvested seed lot. This same variation is very likely to be observed for economically useful traits such as disease resistance. Unless one is certain that each seed lot is homozygous and homogeneous, this type of variation would most likely be classified as genetic differences. If useful variation is identified within a heterogeneous accession, it must be preserved as a pure line if it is to be used. It is very inefficient to have to locate the needed genotype within a mixed seed lot each time it is needed. In reality, only pure-lined accessions, either intentionally or by default, can be used because germplasm is introgressed one ovule or pollen grain at a time.
An argument against pure lining is that rare alleles are likely to be lost in the initial selection. This is a valid argument; however, maintaining and finding rare alleles in heterogeneous accessions is also problematic. We have already discussed the negative impact that genetic drift and natural selection can have on heterogeneous populations and particularly on rare alleles. If rare alleles are maintained within heterogeneous populations, there still exists the major challenge of finding them. If there were an allele with a frequency of 5% in a heterogeneous accession, one would have to evaluate 90 plants to have a 99% chance of finding that allele, assuming it could be definitively identified on a single plant. With an allelic frequency of 1%, 490 plants would need to be evaluated. If nothing is known of the allelic frequencies within accessions or whether or not an accession is actually heterogeneous, as is most likely to be the case with most self-pollinated accessions, it would not be cost effective to evaluate 100–500 plants/accessions in an attempt to find a potential rare allele. This would be the best case scenario for qualitative traits that could be reasonably identified on a single plant. The problem would become much greater for quantitative traits where replicated trials would be needed. Rare alleles are not likely ever to be identified in heterogeneous accessions, because even if they survive natural selection and genetic drift, the cost of finding them is prohibitive.
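The sample sizes in this argument follow from a simple probability calculation. As a minimal sketch, assume each plant independently carries the rare type with probability equal to its frequency f in the seed lot; the chance of seeing it at least once among n plants is then 1 - (1 - f)^n, and the smallest adequate n can be solved for directly. The function name and the independence assumption are ours, not the article's.

```python
import math


def plants_needed(frequency, confidence=0.99):
    """Smallest n such that 1 - (1 - frequency)**n >= confidence.

    Assumes independent sampling and that the rare type can be
    recognized with certainty on a single plant.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


for f in (0.05, 0.01):
    print(f"frequency {f:.0%}: {plants_needed(f)} plants for a 99% chance")
```

Under these assumptions the calculation gives 90 plants for a 5% frequency, matching the figure above, and about 459 plants for a 1% frequency, somewhat fewer than the 490 quoted, which may reflect a different confidence level or sampling assumption. Either way, the conclusion is unchanged: several hundred plants per accession would be needed.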
All of the accessions in the USDA Collection are evaluated for basic agronomic traits, major seed composition components and descriptive characters. These data are critically important in providing an accurate description of each accession and helping to select accessions that will be most useful in a research or breeding programme. These data are published in technical bulletins (Nelson et al., 1987; Nelson et al., 1988; Coble et al., 1991; Bernard et al., 1998; Hill et al., 2001; Hill et al., 2005; Hill et al., 2008; Peregrine et al., 2008). Many accessions are also evaluated for important disease and insect resistance traits, amino-acid composition, and some abiotic stresses. All of the data collected are available through the Internet at http://www.ars-grin.gov/npgs/. There are more than 700,000 data points currently in our database. A core collection was recently selected for the USDA Collection (Oliveira et al., 2010), and this can be used as a guide for selectively testing additional soybean accessions.
Not many years ago, evaluation data were considered a major limiting factor in utilizing germplasm collections, but for many collections, additional evaluation data are no longer the limiting factor in making the collection useful. For example, we have evaluation data for over 6800 accessions for resistance to Fusarium virguliforme, which causes sudden death syndrome (SDS). Even though there are fewer than 100 accessions with moderate resistance, it is questionable whether additional resistance sources would be useful at this time unless accessions with immunity or near immunity could be found. For plant breeders, this is a difficult disease for which to breed host plant resistance. Resistance is quantitative (Meksem et al., 1999; Iqbal et al., 2001; Njiti et al., 2002; Farias Neto et al., 2007), producing consistent symptoms in the field is very difficult (Scherm and Yang, 1996), and although more consistent results can be obtained in the greenhouse, those data can be different from those collected in field trials (Njiti et al., 2001; Farias Neto et al., 2008). Most breeders are going to be very hesitant to cross with a new source of SDS resistance that is most likely to come from an agronomically inferior line unless there is good evidence that this line can provide new alleles for resistance. Without that knowledge, the breeder may spend years developing a new resistant, high-yielding line that in the end will have the same resistance alleles as are already available in elite cultivars or improved germplasm.
Characterizing genotypic differences among phenotypically similar accessions is a much more challenging activity than identifying additional sources of resistance, but it should be a high priority activity because it is critical for effective utilization of germplasm collections. An excellent example of what can be done is represented by the work of Tian et al. (2010). Using a Basic Local Alignment Search Tool (BLAST) search with a known sequence from an Arabidopsis thaliana gene that affects inflorescence type (TFL1) against the soybean whole genome sequence (Schmutz et al., 2010), four gene models were identified that were homologous to TFL1. Using the known map position of the soybean Dt1 locus, Glyma19g37890.1 was proposed as the candidate gene for Dt1. Inserting that allele into Arabidopsis duplicated the function of TFL1. Sequencing that gene from a diverse selection of soybean and wild soybean germplasm identified four single base changes. Three of the changes corresponded to known phenotypic differences, but knowing these genotypes allows for classification of stem termination types that could not be separated phenotypically because of genetic background effects. As more agronomically important genes are identified and sequenced and technological advances reduce the cost of this research, finding useful, new diversity through sequence variation will become a powerful tool for germplasm utilization.
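The final step of that example, finding single base changes among sequences of the same gene from different accessions, can be sketched in a few lines. This is a toy illustration only: the sequences and line names below are invented, the sequences are assumed to be already aligned and of equal length, and the function is not the method used in the cited study.

```python
def single_base_changes(reference, samples):
    """Return {position: {sample_name: base}} for bases differing from reference.

    Positions are 1-based. Sequences must already be aligned to the
    reference and have the same length.
    """
    changes = {}
    for name, sequence in samples.items():
        if len(sequence) != len(reference):
            raise ValueError(f"{name} is not the same length as the reference")
        for position, (ref_base, base) in enumerate(zip(reference, sequence), start=1):
            if base != ref_base:
                changes.setdefault(position, {})[name] = base
    return changes


# Invented toy sequences; real gene sequences would be far longer.
reference = "ATGGCACGT"
samples = {
    "line_a": "ATGGCACGT",
    "line_b": "ATGACACGT",
    "line_c": "ATGGCACTT",
}
changes = single_base_changes(reference, samples)
print(changes)
```

Grouping the changes by position makes it easy to see which accessions share a variant, which is the information needed to classify lines that look alike phenotypically.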
DNA markers are an essential tool for germplasm management. Not many years ago, it would not have been financially feasible to consider characterizing entire germplasm collections at a significant number of marker loci, but that has changed. The United Soybean Board in the U.S. is currently investing approximately 3.5 million U.S. dollars to characterize all of the annual accessions in the USDA Collection with more than 40,000 SNP markers. This is a cost of about $175/accession, and the more than 800 million data points that this research will generate will provide an unprecedented resource for germplasm research. Re-sequencing several soybean germplasm accessions is already being discussed, and it is very likely that some time in the not too distant future, the cost of a complete genome sequence for a plant accession may not be much more expensive than the current characterization with SNP markers. Genotypic characterization is enormously useful for understanding and utilizing germplasm, but for self-pollinated species, unambiguous results can only be obtained from pure-lined accessions.
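The figures quoted for the SNP project are mutually consistent, as a quick check shows. The sketch below uses only the rounded numbers given in the text (total investment, cost per accession, number of markers); the number of accessions is derived from them and is therefore approximate.

```python
total_cost_usd = 3_500_000        # approximate investment stated in the text
cost_per_accession_usd = 175      # approximate cost per accession stated in the text
snp_markers = 40_000              # the text says "more than 40,000"

# Number of accessions implied by the two cost figures.
accessions_implied = total_cost_usd // cost_per_accession_usd

# One data point per accession per marker.
data_points = accessions_implied * snp_markers

print(f"implied accessions: {accessions_implied:,}")
print(f"data points: {data_points:,}")
```

The implied total of about 20,000 accessions and 800 million data points agrees with the "more than 800 million" stated above once the marker count exceeds 40,000.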
Germplasm distribution
The ultimate value of any germplasm collection is determined by how the accessions in that collection are used. The USDA Soybean Germplasm Collection is one of the most intensely used collections in the world. Of all of the collections in the USDA National Plant Germplasm System, it is the only collection that has an annual average distribution rate that is greater than the total number of accessions in the inventory. Figure 1 shows the growth of the USDA Collection and the annual sample distribution since 1991 when the portions of the USDA Soybean Germplasm Collection maintained in Stoneville, MS and in Urbana, IL were consolidated into a single collection. Over the past 15 years, the USDA Collection has had an average distribution of 25,809 seed packets/year, while the average size of the Collection during that time was only 19,536 accessions. During that period, we have distributed seeds of 99% of the accessions in the collection. More than 75% of the 211 accessions that have not been distributed are perennial Glycine, and the remaining are accessions recently added to the Collection. The most requested accession is the old cultivar Peking, the first widely used source of soybean cyst nematode (SCN) resistance. It has been requested 607 times. Only two other accessions have been requested more than 500 times: Williams 82, the line recently used to create the first soybean genomic sequence, and PI 88788, the currently most widely used source of SCN resistance in the U.S. There were 26 accessions requested more than 200 times, and 18 of those lines are U.S. cultivars. The remaining eight accessions include six sources of SCN resistance and the sources of Rpp1 and Rpp2, Asian soybean rust resistance alleles. All of the highly requested cultivars were released more than 20 years ago, and some are 75 years old.
The most often requested wild soybean line is PI 468916, which was a parent of the population that was used to create the first soybean linkage map using DNA markers (Keim et al., 1990). Over 790 accessions have been requested an average of three times/year for the past 15 years. Fifty-nine percent of the USDA Collection has been requested an average of once a year, and 97% of the USDA Collection has been requested more than once during the past 15 years.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921032101192-0168:S147926211000047X:S147926211000047X_fig1g.gif?pub-status=live)
Fig. 1 Growth of the USDA Soybean Germplasm Collection since 1991 and the number of seed samples distributed/year.
To effectively manage a germplasm collection, it is important to understand the needs of the users, and to understand the needs of users, it is instructive to examine who is requesting germplasm and what types of germplasm are being requested. The USDA Collection is divided into eight sub-collections as follows: introduced G. max, G. soja, germplasm releases, old cultivars, modern cultivars, private cultivars, isolines, genetic types and perennials. The introduced G. max and G. soja sub-collections are self-explanatory. The old cultivars are a group of introduced G. max that were given names in the U.S. or were selected in the U.S. but have unknown parentage. Some, but not all, were used as early commercial cultivars. The modern cultivars are U.S. cultivars with known pedigrees released by public institutions beginning in the 1940s. This collection of 523 cultivars includes almost all publicly released U.S. soybean cultivars with the exception of some cultivars that cannot be distributed without restrictions. The private cultivar collection was initially formed with cultivars that were farmer selections from the late 1940s through the 1960s. About 20 years ago, the decision was made to preserve cultivars developed by private industry that met the following criteria: (1) has been among the leading varieties within a maturity group for total acreage for several years, has some unique feature that merits preservation or has been a significant part of published research, (2) has a publicly available pedigree, (3) can be distributed and used without restrictions, (4) has the approval of the organization that developed the cultivar and (5) is no longer sold as a commercial variety. Approximately 60 cultivars have been added to this sub-collection in less than 20 years. Germplasm releases that are registered in Crop Science and can be distributed without restriction are maintained in the collection.
Since many of the releases have a limited time of usefulness, each release is evaluated every 10 years. Those that have not been requested are no longer retained in the active collection but are still maintained in the National Center for Genetic Resources Preservation at Fort Collins, Colorado. The isoline sub-collection is largely a collection of near-isogenic lines that were developed through backcrossing using principally three recurrent parents (Harosoy, Clark and Williams). These lines have been used extensively in the genetic study of qualitative traits and in essentially discovering the first quantitative trait loci (QTL) in soybean by isolating major genes affecting quantitative traits through backcrossing. The discovery of major genes affecting the time of flowering and maturity is a notable example (Bernard, 1971). These lines have also been very useful in determining the impact of allelic substitutions, especially for disease resistance and seed composition traits. The genetic type collection contains mutants not known to occur in other lines within the collection. Approximately 80% of the entries in this sub-collection have been genetically characterized.
Examining the distribution patterns of each of the sub-collections provides important information about the needs of germplasm users. There are a variety of statistics that can be used in this process. To compare the usage of the various types of germplasm, the average number of times each accession within the sub-collection had been requested and the average time that the accessions in the sub-collection had been available during the past 15 years were calculated (Table 1). The latter is needed since accessions are continually being added to the USDA Collection and germplasm releases are being dropped from the active collection when they are no longer requested. The introduced soybean and wild soybean sub-collections averaged more than one request/accession per year. The much smaller wild soybean collection is likely a factor in the slightly higher demand/accession for wild soybean than for soybean accessions. The modern and old cultivars are being requested at a rate more than double that of the next most requested category of germplasm, with each cultivar being requested three times/year on average. All of the other specialty collections have very similar rates of requests and, given that many of the entries in these collections would have only very specific uses, the high rate of usage is somewhat surprising. The least requested sub-collection is that of the perennial species.
Table 1 Number of accessions in each sub-collection of the USDA Soybean Germplasm Collection, the average and total number of requests and the average number of years accessions in each sub-collection were available between 1995 and 2009
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921032101192-0168:S147926211000047X:S147926211000047X_tab1.gif?pub-status=live)
Over the past 15 years, the number of seed lots distributed from each sub-collection is generally similar to the number of accessions in each sub-collection (Table 1). The introduced soybean collection is 81% of the whole collection and 81.7% of the seed lots distributed. Most of the smaller sub-collections (isolines, private, germplasm releases and genetic types) make up slightly smaller percentages of the distributions than percentages of the USDA Collection. The percentage of perennial accessions distributed is much lower than the percentage in the USDA Collection. This lack of activity is not surprising since few research programmes are actively working with perennial Glycine species. These diverse species will very likely be an important source of genetic diversity for soybean breeders in the future. Wild soybean accessions make up a slightly higher percentage of the requests than of the total accessions in the USDA Collection. As was demonstrated with the average distribution rate, the two small collections that are exceptions are the old and modern varieties, whose percentages of the distributions are more than twice their percentages of the USDA Collection. These distribution numbers indicate the importance of having a source of seeds for these types of accessions for research purposes. Gene bank may not be an appropriate alternative name for a germplasm collection because, in many cases, we are preserving genotypes and not just genes.
A final way of analyzing distribution records is to determine who is requesting germplasm (Table 2). Although we keep more detailed records, for this manuscript, the requestors are divided into six categories: foreign commercial companies, foreign public institutions, domestic commercial companies, domestic institutions, U.S. government agencies and unaffiliated individuals. Over the past 15 years, 25% of the accessions distributed went outside the United States. Of those foreign distributions, 5% went to 58 private companies and 95% went to 239 public institutions. Within the U.S., 76% of the seed lots distributed went to public institutions and 24% to private industry. Slightly more than half of the public distributions went to scientists within the Agricultural Research Service (ARS) of the United States Department of Agriculture and the remainder to 156 colleges and universities and 33 other public institutions. Twenty-four percent of the domestic distributions went to 145 commercial companies within the U.S. We had 130 individuals not associated with public institutions or private research companies who requested seeds, but the total number of seed lots sent to these people was only 0.3% of the total.
Table 2 Categories of requestors, and types and numbers of germplasm accessions requested between 1995 and 2009
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921032101192-0168:S147926211000047X:S147926211000047X_tab2.gif?pub-status=live)
Each of these categories of users has a different need for germplasm. We can assess differences in requesting patterns by comparing the percentage of distributions of each class of accessions with the percentage of total distributions that were sent to each class of requestors. Those instances where requests for specific categories of accessions were much larger or smaller than the overall percentage of requests made are highlighted. The foreign commercial companies requested only 1% of the seed lots distributed, but 3% of all modern cultivar requests came from this group, and modern cultivars represented 12% of the seed lots sent to these companies (Table 2). Approximately 20% of our distributions went to foreign public institutions, but one-third of the requests for isolines, privately developed cultivars, germplasm releases and genetic type accessions came from these institutions. ARS scientists and other public institution scientists in the U.S. each account for about 30% of the seed lot distributions. ARS scientists requested 38% of all wild soybean seed lots distributed but only 18% of the privately developed cultivars. University and other public institution scientists requested 37% of the old cultivars that were distributed but only 18 and 15% of the privately developed cultivars and germplasm releases, respectively. Because developers of germplasm releases provide most of the domestic distribution for the first 5 years after release, this statistic most certainly underestimates the demand for this type of germplasm. Private industry in the U.S. received 19% of all distributions. They requested only 10% of the perennial accessions, but 29% of the privately developed cultivars distributed. Germplasm collections meet different needs for different users. For many users, having access to specific genotypes that are not available from any other source is critically important. Maintaining all aspects of the genetic resources of a crop is needed to fully meet the expectations of the great variety of users of a modern germplasm collection.
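The comparison of requesting patterns can be illustrated with a short Python sketch that uses only percentages quoted in the text above. The "representation ratio" (a requestor class's share of the requests for one accession type divided by its share of all distributions) is an assumption introduced here for illustration and is not a statistic reported in Table 2; the dictionary keys and the function name are likewise hypothetical.

```python
# Share of all seed lots sent to each requestor class (percent, from the text;
# the two 30% values are described in the text only as "about 30%").
overall_share = {
    "foreign commercial": 1,
    "ARS": 30,
    "U.S. public (non-ARS)": 30,
    "U.S. private industry": 19,
}

# Share of the requests for one accession type that came from a requestor
# class (percent, from the text).
type_share = {
    ("foreign commercial", "modern cultivars"): 3,
    ("ARS", "wild soybean"): 38,
    ("ARS", "privately developed cultivars"): 18,
    ("U.S. public (non-ARS)", "old cultivars"): 37,
    ("U.S. public (non-ARS)", "germplasm releases"): 15,
    ("U.S. private industry", "perennial species"): 10,
    ("U.S. private industry", "privately developed cultivars"): 29,
}


def representation_ratio(requestor, accession_type):
    """Illustrative ratio: >1 means the class requests this accession type
    more often than its overall share of distributions would suggest."""
    return type_share[(requestor, accession_type)] / overall_share[requestor]


for requestor, accession_type in type_share:
    ratio = representation_ratio(requestor, accession_type)
    label = "over" if ratio > 1 else "under"
    print(f"{requestor} / {accession_type}: ratio {ratio:.2f} ({label}-represented)")
```

A ratio well above or below 1 corresponds to the instances that the text describes as much larger or smaller than the overall percentage of requests.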
The users of the USDA Collection have changed dramatically over the past 40 years. Research on soybeans in private industry was essentially non-existent in 1970, so users were primarily the few soybean breeders, pathologists and occasionally physiologists at public institutions. As recently as 1991, the first year in which the USDA Southern and Northern Soybean Germplasm Collections were consolidated, only 7803 seed lots were distributed (Fig. 1). Over the past 15 years, we have distributed seeds to 1791 individuals (Table 2). Although we try to keep our records to a single contact/laboratory, our efforts are not perfect, and some of these individuals are graduate students, technicians and post-docs who are working in the same laboratory. Reporting multiple persons/laboratory is most likely to occur in the statistics of domestic public institutions. However, these nearly 1800 individuals do represent over 200 private companies and 429 public research organizations, government agencies and universities. In 2008, there were only 70 countries with soybean production of greater than 1000 metric tons (http://www.geohive.com/charts/ag_soybean.aspx, verified 13 September 2010), and in 2009, there were 31 states in the United States with reported soybean production (http://www.soystats.com/2009/page_15.htm, verified 13 September 2010). We have distributed soybean germplasm to 69 foreign countries and all 50 states within the United States. The users of soybean germplasm now go far beyond the traditional agricultural scientists.
Conclusions
Maintaining heterogeneous germplasm accessions for self-pollinated species has long been the standard for germplasm collections. In theory, this seems like a good strategy, but in practice, it has many shortcomings. Genetic drift and natural selection will alter the composition of these accessions and inevitably lead to loss of diversity unless expensive and time-consuming sampling measures are instituted each time the accession is grown. Without an accurate description of each accession, accidental seed or pollen contamination is unlikely to be found and removed. Rare alleles, if they survive, will only be found by testing large numbers of individual plants from each accession, and evaluation for quantitative traits will be an even greater challenge. Collections of heterogeneous accessions will be excluded from taking advantage of the current capacity to extensively use DNA markers and, eventually, whole genome sequencing. Heterogeneous accessions of self-pollinated species may have been the standard of the past, but pure-lined accessions will be the standard for the future.
The USDA Soybean Germplasm Collection has been maintaining pure-lined accessions for over 50 years. It is the most intensively used collection in the USDA National Plant Germplasm System with an average distribution of 132% of the accessions in inventory/year over the past 15 years. Analysis of these distribution records shows the great diversity of clientele that utilize this collection and the varied needs of these germplasm users. Users of the USDA Soybean Germplasm Collection are no longer limited to traditional agricultural scientists. All users expect accessions to be consistent from year to year, so that research results can be repeated even with procedures that can be done on a single seed. To meet the needs of a very diverse research community, it is important to maintain not only extensive collections of land races and wild species but also commercial cultivars, genetic stocks and improved germplasm releases. All of these accessions play a vital role in the research to understand and use genetic diversity.
There are increasing trends to restrict the exchange of genetic resources within the U.S. and internationally, but the USDA has maintained a policy of free exchange for all germplasm within the National Plant Germplasm System. As ex situ collections become the only source of soybean germplasm in the future, exchange among collections will become more important. It is imperative that these collections are well maintained and accurately described if this critically important diversity is to be preserved and the integrity of each accession is to be retained. This high level of quality control is necessary for the maximum usefulness of germplasm exchange. It will allow accurate data as well as correct genotypes to be moved between collections and from collections to soybean scientists. As new technology allows a greater understanding of the genetic diversity among germplasm accessions, the challenges created by the enormous diversity available will likewise increase. The great genetic challenge of the future will be to identify the allelic diversity within the species, and perhaps related species, for the loci affecting economically important traits of soybean. Accomplishing such a task will take more, not less, cooperation. The future of soybean breeding and much of soybean research is in our germplasm collections, and new genetic technologies will continue to increase the value of these collections.