In this issue of Plant Genetic Resources, we have collected articles that describe our current understanding of genetic diversity in ex situ collections of legume germplasm. These articles are not a comprehensive description for all legume crop species, but the range covered gives a reasonable flavour of current concerns in both inbreeding and outbreeding species (clover, Abberton and Thomas; sainfoin, Hayot-Carbonero et al.). For soybean (Qui et al., Nelson), chickpea (Upadhyaya et al., Krishnamurthy et al.) and lentil (Singh et al., Tulli et al.), this is viewed from more than one perspective, while for common bean, Díaz et al. consider the structure of germplasm where both the Andean and Mesoamerican domesticates were introduced.
A combination of methodological developments has changed the amount of data available to us for the understanding of genetic diversity. This has in turn changed our view of the ways to approach germplasm collections generally, including ways of generating and of analysing data. Both of these topics are discussed in several of the articles in this volume. The article on pea germplasm by Smykal et al., for example, includes a comparison of the results from different Bayesian modelling methods.
Developments in molecular taxonomy have come to maturity since the seminal paper by Chase et al. (1993) first attempted a comprehensive description of angiosperm molecular phylogeny. The promise of that study has come good, and for legumes, a detailed phylogeny with many divergence times between groups has been presented by Lavin et al. (2005). Some of these dates were established with reference to the fossil record, while others are interpolated from patterns of sequence divergence. This structure (simplified in Fig. 1) provides many insights. One challenging conclusion relates to the positioning of the Cicereae as an outgroup to both the Vicieae and Trifolieae, with paraphyly in the latter. One of the few intra-generic divergence times given by Lavin et al. (2005), within the genus Cicer, has a basal divergence at 14.8 My. This is greater than several inter-generic divergence times; some genera diverged much more recently, and presumably species more recently again. Jing et al. (2005) estimated the age of alleles segregating within Pisum at between 1.5 and 5 My. Together, these pieces of information show how taxonomic divisions can represent rather different divergence times within and between species or genera depending on context.

Fig. 1 Simplified legume taxonomy from Lavin et al. (2005); a time-line and the divergences dated by Lavin et al. (2005) are indicated to the right. The divergence time for chickpea is between 24.7 and 32.1 My and for sainfoin is between 32.1 and 39 My based on the average of flanking nodes. The position of Cajanus (pigeonpea) is not clear from Lavin et al., but it groups with the Phaseoleae that include Phaseolus and Glycine according to Doyle and Luckow (2003).
The insights from molecular taxonomy allow us to form evolutionary hypotheses more clearly, and underscore the coherence and unity of legumes as a group. This coherence is reflected in our understanding of the conservative nature of the structure of legume nuclear genomes (Choi et al., 2004; Kaló et al., 2004; George et al., 2008; Hougaard et al., 2008; Timko et al., 2008; Bertioli et al., 2009; Muchero et al., 2009) and conservation of gene function (Domoney et al., 2006). For example, the Dt1 gene of soybean (Liu et al., 2010) discussed by Nelson in this volume corresponds to Det in pea (Foucher et al., 2003) and most likely the equivalent growth habit determinant in faba bean (Avila et al., 2006).
The ease with which molecular markers can be developed for crop species has enabled us to obtain high resolution descriptions of the pattern of genetic variation within species. In general, we also find that genetic variation is unevenly partitioned with respect to eco-geographical factors, so an emerging question is the extent to which the consequent correlations reflect local adaptation and hence may help direct us to specific subsets of germplasm for different purposes. It seems that the genotypic description of germplasm collections is a powerful tool, especially for inbreeding species. The extent to which genotype data can remain associated with germplasm collections is addressed by Nelson's article in this volume, and is a matter of widespread concern. If accessions are heterogeneous, then what is the detection limit for particular alleles, and how does this propagate to ‘missing data’ in subsequent analyses? One approach has been to replicate germplasm collections from the single individuals that were the material for genotypic analysis (Jing et al., 2010), but this may not be appropriate for much larger collections.
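The question of a detection limit in heterogeneous accessions can be made concrete with a simple sampling calculation. The sketch below is illustrative only and is not taken from any article in this volume. It assumes that an accession is a mixture of fully inbred (homozygous) lines, that individuals are drawn independently, and that an individual carrying the allele is always scored correctly. Under those assumptions, an allele at frequency p within the accession is seen at least once among n genotyped individuals with probability 1 - (1 - p)^n. The function names are hypothetical.

```python
import math


def detection_probability(allele_freq: float, n_sampled: int) -> float:
    """Probability that at least one of n_sampled individuals carries the allele.

    Assumes independent draws of homozygous individuals, each carrying the
    allele with probability allele_freq (an assumption of this sketch).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    if n_sampled < 0:
        raise ValueError("n_sampled must be non-negative")
    return 1.0 - (1.0 - allele_freq) ** n_sampled


def min_sample_size(allele_freq: float, target: float = 0.95) -> int:
    """Smallest n for which the detection probability reaches the target."""
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n and round up to a whole individual.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - allele_freq))


if __name__ == "__main__":
    # Illustrative frequencies only; these are not data from any collection.
    for p in (0.5, 0.1, 0.05):
        print(p, min_sample_size(p), round(detection_probability(p, 10), 3))
```

On these assumptions, genotyping a handful of individuals per accession will routinely miss alleles present at a frequency of a few per cent, which is one way in which heterogeneity turns into apparent ‘missing data’. Outbreeding species, where individuals may be heterozygous, would need the calculation to be made per gamete rather than per individual.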
The concept of ‘core’ and ‘composite’ collections, for example in the discussion of the pigeonpea collection by Upadhyaya et al., is another recurring theme. This type of sampling represents a way of simplifying analyses, but there is a diversity of opinions on how best to accomplish this simplification. Simplified sets of germplasm have the advantage that they can be managed more easily and phenotyping experiments become feasible, but they have the disadvantage that some variation is lost. It seems likely that a combination of information on genotype and provenance may in the future aid in the definition of different sets of material appropriate for different purposes.
This introduction has discussed the linked themes of phylogenetic relationships between, and diversity within, crop species, a combination central to much of N. I. Vavilov's work. His interest in legumes is presented in several of the articles in this volume, and indeed the genus that bears his name is also discussed by Smykal et al. It is sobering to think of the insights he brought to biology, and especially to plant genetics, with what seems relatively little data. What would he have made of our data and concerns? It seems to me that this creates two distinct challenges. One is to see through the fog of data to find clear and general principles. The other is to use the precision to hand for practical purposes in crop improvement.