Introduction
Cowpea [Vigna unguiculata (L.) Walp.] is a grain legume grown in savannah regions of the tropics and subtropics. It is adapted to very harsh environments and is considered a drought-tolerant legume. The major cowpea-growing countries are Nigeria, Niger, Mali, Senegal, Togo, Benin, Ghana and Chad in West Africa; Tanzania, Somalia, Kenya, Zambia, Zimbabwe, Botswana and Mozambique in eastern and southern Africa; India, Pakistan, Sri Lanka, the Philippines, Bangladesh, Indonesia and China in Asia; and Brazil, the West Indies, Cuba and the southern USA in the Americas. The Food and Agriculture Organization (FAO) estimates that 3.7 million tonnes of cowpea dry grains were produced worldwide in 2003. Nigeria produced 2.1 million tonnes of this total, making it the world's largest producer, followed by Niger (400,000 tonnes) and Mali (100,000 tonnes). The total area sown to cowpea worldwide was 9.8 million ha, with about 91% of this in West Africa. While the world average yield was 378 kg ha⁻¹, it was 440 kg ha⁻¹ in Nigeria and 114 kg ha⁻¹ in Niger.
The value of cowpea lies in its high protein content, its ability to tolerate drought, and the fact that it fixes atmospheric nitrogen, which allows it to grow on, and improve, poor soils. The major pests attacking cowpea plants are flower thrips (Megalurothrips sjostedti), legume pod borer (Maruca vitrata) and a complex of pod-sucking bugs. Bruchid (Callosobruchus maculatus) causes damage to stored dry cowpea grains. Fungal diseases affecting cowpea include stem and root rots and leaf spot diseases. Viruses cause mosaic diseases and mottle symptoms in cowpea. The parasitic weed Striga gesnerioides can severely damage cowpea plants. Losses due to pest attack or disease can be as high as 90% (Singh, 2002).
Cowpea belongs to the family Leguminosae, subfamily Papilionoideae, tribe Phaseoleae and genus Vigna. Vigna is a large and immensely variable genus consisting of more than 85 species divided into seven subgenera. Cowpea falls under the subgenus Ceratotropis under the species unguiculata. Cowpea is a diploid with chromosome number 2n = 2x = 22 and has many wild relatives, some cross-compatible with it and others not. For example, V. rhomboidea (Ng, 1995) is a wild perennial creeping or climbing herb, which develops a large woody root stock and shows incompatibility with all other taxa within V. unguiculata. The centre of diversity of wild V. unguiculata and V. rhomboidea is in southern and south-eastern Africa, whereas the centre of diversity of cultivated cowpea V. unguiculata is in West Africa in the area comprising savannah regions of northern Nigeria, southern Niger, Burkina Faso, northern Benin, Togo and northern Cameroon (Padulosi, 1993).
The International Institute of Tropical Agriculture (IITA) genebank holds over 15,000 accessions of cultivated cowpea collected from 89 countries. At present, over 12,000 accessions have been characterized for 28 agrobotanical descriptors. Germplasm collections were originally set up to preserve the genetic diversity of crop species and their wild relatives. Given that such genetic diversity of crops has an economic value, conservation for use has been the driving force behind many genebanks. The sheer number of accessions making up germplasm collections can be an obstacle to their full evaluation and use in crop improvement or breeding programmes. The genetic diversity of such a large collection may not have been adequately evaluated for various biotic and abiotic stresses, because evaluating it in detail is expensive and time consuming. This task could be more easily fulfilled by developing subsets of the whole collection, called active working collections by Harlan (1972) and core collections by Frankel and Brown (1984). A core collection should include a maximum of the genetic variation contained in the whole collection with minimal repetitiveness, ideally conserving at least 70% of the alleles in the whole collection (Brown, 1989a, b). The ninth activity of the Global Plan of Action addresses the issue of promoting the use of germplasm through expanding characterization and evaluation, and developing a number of core collections (FAO http://www.fao.org/WAICENT/FaoInfo/Agricult/AGP/AGPS/GpaEN/gpaact9.htm).
Core collections have been established for many crop species, using morphological and genetic marker variation; e.g. bean Phaseolus vulgaris L. (Tohme et al., 1995); barley Hordeum vulgare (Knüpffer and van Hintum, 1995); chickpea Cicer arietinum L. (Hannan et al., 1994; Upadhyaya et al., 2001); peanut Arachis hypogaea L. (Holbrook et al., 1993; Upadhyaya et al., 2003); quinoa Chenopodium quinoa (Ortiz et al., 1998); sweetpotato Ipomoea batatas L. (Huamán et al., 1999); potato Solanum tuberosum L. (Huamán et al., 2000a, b); lentil Lens culinaris Medic (Erskine and Muehlbauer, 1991); and cassava Manihot esculenta Crantz (Cordeiro et al., 1995). A core collection made up of 699 accessions from 7737 cowpea accessions is available from the GRIN-USDA collection at Griffin (http://www.ars-grin.gov/ars/NoPlains/FtCollins/plants.htm). The objective of this study was to develop a core collection of cowpea using the 28 agrobotanical characters in the larger world cowpea collection held at IITA.
Materials and methods
The cowpea collection at IITA consists of 15,003 accessions from 89 countries (Table 1), with the majority of the accessions from Africa. Africa is the primary centre of diversity for cowpea and this collection should represent the vast genetic diversity of this crop. Of these, 12,594 accessions are landraces collected from farmers' fields and 10,277 of these have been characterized using the cowpea descriptors. Advanced cultivars and breeding or research lines constitute 1422 accessions and 1291 of these have been characterized. Of the 923 accessions of unknown biological status, 880 have been characterized. Sixty-four accessions are weedy or wild cowpea species.
Table 1 Number of accessions based on their biological status in entire and core collections of cowpea at IITA

Frequency distribution of accessions between entire and core collections by biological status: χ² = 8.80; P ≤ 0.18.
Data on the 28 agronomical and botanical descriptors (see Table 3) were collected following the International Board for Plant Genetic Resources descriptor list (IBPGR, 1983). Although the germplasm was characterized over several years, the majority of the descriptors are qualitative and considered highly heritable, so the influence of environment is expected to be minimal, if any. For each accession the qualitative traits were converted to binary values, recording presence or absence in each trait class. The data recorded on the quantitative traits were standardized using the range of each variable to eliminate scale differences (Upadhyaya et al., 2006).
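The data preparation described above can be illustrated with a minimal Python sketch. It assumes that "standardized using the range" means min-max scaling, (x − min)/(max − min), which the text does not spell out; the trait names and values are hypothetical and are not taken from the IITA database.

```python
def one_hot(values):
    """Convert one qualitative trait into presence/absence columns, one per class."""
    classes = sorted(set(values))
    return classes, [[1 if v == c else 0 for c in classes] for v in values]


def range_standardize(values):
    """Rescale a quantitative trait by its range (assumed: (x - min) / (max - min))."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant trait carries no information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records for four accessions.
flower_colour = ["purple", "white", "purple", "mixed"]
pod_length_cm = [12.0, 18.0, 15.0, 24.0]

classes, colour_cols = one_hot(flower_colour)
pod_scaled = range_standardize(pod_length_cm)

# One row per accession: binary class columns followed by the scaled trait.
matrix = [cols + [x] for cols, x in zip(colour_cols, pod_scaled)]
for row in matrix:
    print(row)
```

After this step every column lies between 0 and 1, so no single descriptor dominates the Euclidean distances used for clustering.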
The entire collection was stratified by biological status followed by the country of origin. For each country and biological status, accessions with complete information on the 28 descriptors were chosen for clustering into groups. In certain cases a full set of 28 descriptors was not available. In such cases, the minimum number of descriptors that could include a maximum number of accessions was determined. For example, there are 103 accessions from Togo, but only 73 had complete data for 17 descriptors for cluster analysis. From these 73, a set of 18 accessions was selected based on the relative size of clusters. From the remaining 30 accessions, a few were also included in the core set: a proportionate number of accessions was chosen randomly from those accessions that were not included for cluster analysis due to lack of data.
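The allocation of a fixed number of picks across clusters can be sketched as follows, using the Togo figures (73 clustered accessions, 18 selected). The cluster sizes are hypothetical, since the paper does not report them, and the rounding rule (floor of the proportional quota, at least one pick per cluster, remainder assigned to the most under-represented clusters) is an assumption rather than the authors' documented procedure.

```python
import random


def allocate(cluster_sizes, total):
    """Split `total` picks across clusters in proportion to size, at least one each."""
    k, n = len(cluster_sizes), sum(cluster_sizes)
    if not k <= total <= n:
        raise ValueError("total must lie between number of clusters and accessions")
    quotas = [total * s / n for s in cluster_sizes]
    picks = [max(1, int(q)) for q in quotas]
    while sum(picks) != total:
        step = 1 if sum(picks) < total else -1
        # Clusters that can still grow (not exhausted) or shrink (stay at >= 1).
        candidates = [i for i in range(k)
                      if (step == 1 and picks[i] < cluster_sizes[i])
                      or (step == -1 and picks[i] > 1)]
        # Grow the most under-represented cluster or shrink the most over-represented.
        i = max(candidates, key=lambda j: step * (quotas[j] - picks[j]))
        picks[i] += step
    return picks


def draw(clusters, total, seed=0):
    """Randomly choose the allocated number of accessions within each cluster."""
    rng = random.Random(seed)
    picks = allocate([len(c) for c in clusters], total)
    return [rng.sample(members, p) for members, p in zip(clusters, picks)]


# Hypothetical cluster sizes summing to the 73 clustered Togo accessions.
sizes = [30, 20, 12, 8, 2, 1]
ids = iter(range(1, 74))
clusters = [[f"acc{next(ids):03d}" for _ in range(s)] for s in sizes]

print(allocate(sizes, 18))  # -> [7, 4, 3, 2, 1, 1]
chosen = draw(clusters, 18)
```

Guaranteeing one pick per cluster slightly over-represents very small clusters, which is the intended effect when the goal is to capture rare types.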
The data were subjected to a hierarchical cluster algorithm based on Euclidean distances between accessions, using the method of Ward (1963), and the number of clusters retained was that at which R² (the squared multiple correlation) exceeded 75% (SAS Institute, 1989). Ward's algorithm is often used for the agglomerative cluster hierarchy when a precise solution for a specified number of groups is not practical. Given N sets, this procedure reduces them to N − 1 mutually exclusive sets by considering the union of all possible N(N − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, which reflects the criterion chosen. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. For each country by biological group, the percentage of accessions to be chosen was determined according to the size of the collection, the importance of cowpea in the country's agriculture and the country's relevance to the centre of diversity for cowpea. The number of accessions chosen in each cluster group in the core collection was determined as a proportion of the number of accessions in the cluster group relative to the total number of accessions in the entire collection. At least one accession from each cluster group was chosen to ensure that all the cluster groups were represented. Accessions were chosen randomly within cluster members.
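The clustering step can be illustrated with a naive pure-Python sketch of Ward's agglomeration. It is not the SAS procedure used in the study: it merges, at each stage, the pair of clusters whose union adds least to the within-cluster sum of squares, and it assumes that the R² rule means "keep merging while R² stays above 0.75". The nine data points are hypothetical.

```python
def ward_clusters(points, r2_min=0.75):
    """Ward agglomeration; stop before R^2 would fall to r2_min or below."""
    n_pts, dim = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n_pts for d in range(dim)]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in range(dim))
    if total_ss == 0:  # all accessions identical: a single group
        return [list(range(n_pts))], 1.0
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    within_ss = 0.0
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        if 1 - (within_ss + cost) / total_ss <= r2_min:
            break  # the next merge would lose too much between-group variation
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        size = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / size for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]  # b > a, so index a is unaffected
        within_ss += cost
    return [members for members, _ in clusters], 1 - within_ss / total_ss


# Hypothetical standardized data: three well-separated groups of three accessions.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
groups, r2 = ward_clusters(points)
print(len(groups), round(r2, 3))
```

The pairwise search makes this sketch cubic in the number of accessions, so it is suitable only for illustrating the logic on small strata; a production analysis would use an established statistical package.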
The diversity for each trait in the entire and core collections was measured using the Shannon–Weaver diversity index (Shannon and Weaver, 1946). Homogeneity of distribution of the entire and core collections for each trait was measured by comparing the frequency distribution of the trait using the Chi-squared (χ²) test. For the quantitative traits the means and standard deviations of the entire and core collections were also used as a measure for comparison. Correlations between the descriptors in the core and entire collections were estimated independently, to assess whether any of the trait correlations, which may be under genetic control, were lost in the sampling of the core collection.
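The two comparison statistics can be sketched in a few lines of Python. The index is computed as H' = −Σ pᵢ ln pᵢ over the classes of a trait; the χ² statistic is computed here for a 2 × k table (entire versus core), which is an assumption because the text does not state how the table was laid out. The class counts are hypothetical.

```python
import math


def shannon_weaver(counts):
    """H' = -sum(p_i * ln p_i) over the classes of one trait."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def chi_square_homogeneity(entire, core):
    """Pearson chi-square for a 2 x k table of class counts; returns (statistic, df)."""
    col_totals = [e + c for e, c in zip(entire, core)]
    grand = sum(col_totals)
    stat = 0.0
    for row in (entire, core):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected > 0:  # skip classes absent from both collections
                stat += (observed - expected) ** 2 / expected
    return stat, len(entire) - 1


# Hypothetical class counts for one qualitative trait with three classes.
entire = [500, 300, 200]
core_proportional = [50, 30, 20]   # same class proportions as the entire collection
core_skewed = [70, 20, 10]         # over-samples the first class

print(round(shannon_weaver(entire), 3), round(shannon_weaver(core_proportional), 3))
print(chi_square_homogeneity(entire, core_proportional))
print(chi_square_homogeneity(entire, core_skewed))
```

A core that reproduces the class proportions of the entire collection has the same index and a χ² statistic of zero; departures from proportionality raise the statistic, which is then compared with the χ² distribution on k − 1 degrees of freedom.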
Results and discussion
A cowpea core collection of 2062 accessions, covering most of the diversity found in the entire collection of 15,003 accessions of cultivated cowpea held at the IITA genebank, was defined. The analysis of frequency distribution between the entire and this core collection showed heterogeneity between their distributions (χ² = 478; P = 0.001). This was expected, as a sampling procedure that ensures representation from all countries, including those with very small collections, results in a higher proportion of accessions in the core from these countries. West Africa and East and Central Africa, which constitute the major centres of diversity for cultivated cowpea, are well represented in the core collection. Therefore, this core collection should cover most of the diversity of this crop.
The frequency distribution between the core and entire collections based on the biological status was homogeneous (Table 1). This suggests that the core collection shows a similar pattern of distribution to the entire collection for biological status. Although the number of accessions selected from the wild and weedy forms is rather small (6), a priori information suggests that this group might contain useful genes for certain traits. Hence, depending on need, it may be useful to define a separate set containing all the wild and weedy accessions, as their number is small.
The 28 agronomic and botanical descriptors used for grouping of accessions cover a wide range of phenotypic variation (see Table 3). For most countries a major proportion (>70%) of accessions had information on the full set of 28 descriptors. The clustering was based on the accessions with full information. From the remaining accessions that were not included due to lack of descriptor data, a few accessions were selected randomly, representing the same percentage as those with information. Similarly, for countries where data for all the 28 descriptors were not available, the clustering was based on the number of descriptors that accounted for most accessions from that country. This process resulted in a lower number of descriptors for 11 countries in the landrace biological status [e.g. in Yemen and Oman the descriptor number of nodes (IBPGR, 1983) was missing, while Brazil, Russia and Central African Republic had the highest number of descriptors missing, i.e. 8]. As the stratification was within a country and biological status, this approach would not bias the grouping strategy. It makes use of the maximum information available across a large number of accessions, and it covers the diversity within each country and biological group.
The major interest after developing a core collection was to examine how much of the diversity present in the entire collection was retained in the core collection. For each quantitative trait this was measured by comparing the mean and the standard deviation of the trait in the core and entire collections (Table 2). For all the quantitative traits the core collection retained the diversity that was present in the entire collection, suggesting that a non-biased procedure was used for defining the core and also that the diversity was fully represented in the core for these traits.
Table 2 Number of accessions, mean and standard deviation for the quantitative traits in the entire collection and in the core collection of cowpea at IITA

For qualitative traits with distinct classes, the comparison of the frequency distribution showed differences in the distribution for some traits (Table 3). Often the interest in a trait is in the diversity of the trait and how it is retained in the core. For all the descriptors, the diversity in the core and in the entire collection was similar. Cowpea descriptors are very diverse in the entire collection held at IITA genebank, and the ensuing core collection retains this diversity across all the descriptors.
Table 3 Shannon–Weaver diversity index and comparison of frequency distribution (χ²) for qualitative (*) and quantitative descriptors between the entire and core collections of cowpea held at the IITA genebank

Traits are often correlated with each other, and while developing a core collection these associations could be lost. The correlations between the descriptors in the core and entire collections showed no significant differences for the qualitative traits. Only pod length and number of branches, which showed no relationship in the entire collection, showed a significant relationship in the core collection (Table 4). Early flowering was associated with small leaflets, fewer pods and fewer seeds per pod, and this association was retained in the core but the magnitude was lower.
Table 4 Correlation coefficients between quantitative traits in the entire collection (below diagonal) and core collection (above diagonal) of cowpea held at the IITA genebank. Absolute values of 0.020 or more are highly significant (P < 0.01)

This cowpea core collection, constituted to cover the entire diversity in the collection held at IITA, will open up new opportunities for assessing and evaluating germplasm for various biotic and abiotic stresses. Such evaluation should facilitate the discovery of genes for resistance and other traits of interest. Linking these to both descriptor information and geographical information will make it possible to probe the collections further in search of these genes in different genetic and agronomic backgrounds.
One of the major challenges today for genebank curators is to ensure effective and economical ways of conserving diversity. Duplication and redundancy in collections have always been a major challenge faced by the curators. Core collections represent the entire diversity in the crop but have limited use to those users who are not interested in entire diversity but rather in accessions meeting a specific diversity of traits or domain. With the modern revolution in information and computing technology it is possible to sample diversity based on traits of interest, e.g. disease resistance, using procedures such as core selector (van Hintum, 1999; Mahalakshmi et al., 2003).
Core collections should not be considered static but dynamic to accommodate the various needs. If sufficient neutral molecular markers were available to cover the entire genome of cowpea, it would be possible to relate the diversity found outside the centres of diversity to those from the centre of diversity. The availability of molecular markers provides new avenues to probe these collections in search of new allelic forms of existing genes, and also in discovery of new genes. Seedborne viruses are one of the major challenges limiting the movement of cowpea germplasm worldwide. Many countries have stringent regulations for importing seeds. Virus testing and the production of pathogen- and pest-free seeds of cowpea are currently under way at IITA, but the cleaning of 15,003 accessions of cowpea needs a planned strategy, which will allow the movement of seeds and also cover users' needs. Development of this core collection has served this purpose and need. The seeds of this cowpea core collection are virus tested and available for distribution from the IITA genebank (iita@cgiar.org).