Introduction
Cassava has an edible, starchy tuberous root and is well adapted to poor soils, requiring minimal inputs compared with other crops (FAO Report 2006). Tanzania is one of the major cassava producers in Africa, yet its average cassava productivity is 8.5 t/ha (Mkamilo and Jeremiah, 2005), well below the reported productivity of 14–35 t/ha (Lebot, 2009). Biotic and abiotic stresses have contributed to this low productivity. Productivity can be improved by developing and deploying improved varieties that are tolerant to these stresses. Cassava is a highly genetically diverse crop (Hershey, 1994) that is vegetatively propagated through stem cuttings. It can also reproduce sexually, and farmers incorporate the resulting seedlings into their stocks, generating new landraces (Pujol et al., 2007) or adding to morphologically identical landraces (Elias et al., 2001). The term landrace used here refers to a set of clones identified by farmers by a single name. Cassava farmers keep a diverse set of landraces to suit their preferences (Mtunguja et al., 2014), a practice that has increased genetic diversity and biodiversity. Less widely grown landraces have been reported to be more heterogeneous than commonly grown ones, owing to the selection of a few desirable traits during the breeding of commercial varieties (Raman et al., 2010). Assessing the diversity of germplasm maintained by farmers is important for guiding germplasm conservation efforts and maintaining biodiversity. It also guides crop improvement efforts aimed at developing varieties for commercial exploitation.
Morphological descriptors are very important in cassava germplasm identification despite reports of their low discriminatory power (Benesi et al., 2010; Kawuki et al., 2009). Molecular markers are more efficient in measuring genetic diversity, as they directly quantify genetic variability at the DNA level. Next-generation sequencing (NGS) technologies identify thousands of markers across the entire genome of interest in a single experiment (Varshney et al., 2009), making single nucleotide polymorphisms (SNPs) ideal genetic markers (Ching et al., 2002). Genotyping-by-sequencing (GBS) using the TASSEL pipeline provides a robust bioinformatics tool for efficiently processing raw sequence data to identify high-resolution SNPs across the entire genome.
SNP genotyping has become an attractive high-throughput molecular technology with a relatively low cost per data point (Rafalski, 2002; Oliveira et al., 2014). SNPs have been shown to have high discriminatory power compared with other molecular markers and to be suitable for cassava genotyping (Kawuki et al., 2009; Garcia-Lor et al., 2013; Oliveira et al., 2014). However, a large number of SNPs is required to achieve the discrimination level provided by simple sequence repeat (SSR) markers (Kawuki et al., 2009). This is feasible because NGS generates thousands of SNPs in a single experiment. SNPs have advantages over SSR markers owing to their abundance in the genome, greater reliability of results and cost-effectiveness (Mammadov et al., 2012). The association of SNPs with traits of agricultural importance will be a target of future breeding programmes for improved cassava productivity. In this study, morphological descriptors and SNPs were used to assess the genetic diversity of cassava landraces. This information on unexploited alleles will be useful to breeders in cassava improvement programmes.
Materials and methods
A total of 52 farmer-preferred cassava landraces with their passport data were collected from the eastern zone of Tanzania in February 2012, and used for this study.
Morphological characterization
The collected landraces were planted at Chambezi, Bagamoyo in three replicates in April 2012. Morphological descriptors (Table 1) were recorded at 6 and 9 months after planting according to Fukuda et al. (2010). Each morphological trait was scored numerically, and the scores were then transformed into a binary matrix (Tairo et al., 2008). Clustering was performed by the unweighted pair group method with arithmetic mean (UPGMA) using NTSYS 2.1 (Rohlf, 2009).
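The binary transformation and distance step can be sketched as below. This is a minimal illustration, not the NTSYS 2.1 procedure: it assumes each descriptor is one-hot encoded (one 0/1 column per possible state, which is one way 24 descriptors could yield 82 binary markers) and uses the Jaccard distance 1 − J. Because some distances reported in the Results exceed 1, the authors' distance transformation presumably differs. The function names and toy scores are hypothetical.

```python
from itertools import combinations


def to_binary(scores, n_states):
    """One-hot encode numeric descriptor scores into a 0/1 matrix.

    scores   -- one list per landrace, holding a 1-based state code per descriptor
    n_states -- number of possible states of each descriptor
    """
    rows = []
    for row in scores:
        bits = []
        for state, k in zip(row, n_states):
            # one column per possible state; 1 marks the observed state
            bits.extend(1 if s == state else 0 for s in range(1, k + 1))
        rows.append(bits)
    return rows


def jaccard_distance(a, b):
    """1 - (shared presences / presences in either); shared absences are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


def distance_matrix(rows):
    """Symmetric matrix of pairwise Jaccard distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(rows[i], rows[j])
    return d


# Three hypothetical landraces scored on three descriptors with 2, 3 and 2 states.
scores = [[1, 2, 1], [1, 2, 2], [2, 3, 2]]
binary = to_binary(scores, [2, 3, 2])
dist = distance_matrix(binary)
print(binary[0])   # [1, 0, 0, 1, 0, 1, 0]
print(dist[0][1])  # 0.5
```

Ignoring shared absences matters for one-hot data: two landraces both lacking a rare state would otherwise look artificially similar.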
Table 1 Morphological traits used for the characterization of cassava landraces
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-58029-mediumThumb-S1479262115000453_tab1.jpg?pub-status=live)
Molecular characterization using NGS
DNA extraction
Fresh young cassava leaves were collected and stored at −80°C, and DNA was extracted according to Dellaporta et al. (1990). DNA concentration and quality (ratio 230/210) were determined using a NanoDrop spectrophotometer, and DNA quality was also checked by electrophoresis on a 1% agarose gel.
Library preparation and Illumina genome sequencing
Libraries were prepared following the RESCAN procedure (Monson-Miller et al., 2012). The procedure involved digestion of genomic DNA with the restriction enzyme NlaIII, size selection (~500 bp), ligation of Illumina Y adapters, clean-up of the ligated products, enrichment by PCR, pooling of the libraries into a single sample and, finally, sequencing. Library concentrations were determined using SYBR Green so that the samples could be balanced properly during pooling.
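The digestion and size-selection steps can be approximated in silico, for example to estimate how many NlaIII fragments fall near the ~500 bp target. This sketch assumes NlaIII cuts after its recognition site on the top strand (CATG^), keeps only fragments bounded by two cut sites (assumed to be the ones that can receive an adapter at each end) and uses a hypothetical 450–550 bp window; it ignores other details of the RESCAN protocol.

```python
def nlaiii_fragments(seq):
    """Lengths of fragments lying between two consecutive NlaIII sites.

    NlaIII recognises CATG and cuts after the G on the top strand (CATG^),
    so each cut position is the index just past a CATG occurrence.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find("CATG")
    while pos != -1:
        cuts.append(pos + 4)
        pos = seq.find("CATG", pos + 1)
    # sequence ends are not restriction ends, so only internal fragments count
    return [b - a for a, b in zip(cuts, cuts[1:])]


def size_select(lengths, low=450, high=550):
    """Keep fragments inside a hypothetical size-selection window (bp)."""
    return [n for n in lengths if low <= n <= high]


toy = "AA" + "CATG" + "A" * 10 + "CATG" + "TT" + "CATG" + "GG"
print(nlaiii_fragments(toy))           # [14, 6]
print(size_select([14, 6, 480, 900]))  # [480]
```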
Identification of SNP markers for the cassava landraces using GBS
Raw read quality scores were assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads with low quality scores (Phred score < 20) and reads with adapter contamination were filtered out. The barcode information of the reads from the original Fastq file, in combination with the master tag count file, was used to parse the tag counts for each landrace. Only tags detected in at least 45 of the 52 landraces were used for SNP identification. Quality-filtered reads were then processed through the TASSEL-GBS pipeline, which split the reads into individual samples based on their barcodes and aligned them to the reference cassava genome [Mesculenta: cassavaV5_0.chromosomes (Phytozome v10)].
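The read-quality and tag-presence filters described above can be sketched as follows. The sketch assumes Phred+33 quality encoding and interprets "Phred score < 20" as a mean read quality below 20, which is only one possible reading. The nested-dictionary tag counts and all function names are hypothetical and do not reproduce the TASSEL file formats.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def passes_quality(qual, min_q=20):
    """Keep a read whose mean quality is at least min_q."""
    return mean_phred(qual) >= min_q


def keep_tags(tag_counts, min_taxa=45):
    """Keep tags seen (count > 0) in at least min_taxa landraces.

    tag_counts -- dict mapping tag sequence -> dict of landrace -> read count
    """
    return {
        tag
        for tag, per_taxon in tag_counts.items()
        if sum(1 for c in per_taxon.values() if c > 0) >= min_taxa
    }


# Hypothetical counts for two tags across 52 landraces.
landraces = [f"L{i}" for i in range(52)]
tags = {
    "TAG_A": {name: (3 if i < 50 else 0) for i, name in enumerate(landraces)},
    "TAG_B": {name: (3 if i < 40 else 0) for i, name in enumerate(landraces)},
}
print(passes_quality("IIII"), passes_quality("++++"))  # True False
print(keep_tags(tags))                                 # {'TAG_A'}
```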
Key settings for SNP calling were as follows. First, the minimum minor allele frequency (mnMAF) was 0.02: only SNPs whose minor allele (the less common allele) passed this frequency, i.e. appeared in at least one taxon, were reported. Second, the minimum locus coverage was 0.9: only SNPs detected from tags present in at least 90% of landraces were reported. Third, the minimum inbreeding coefficient was set at −0.1; a negative value was used because the samples were not inbred lines of a selfing species. SNPs were then filtered and merged if they had the same pair of alleles and their mismatch rate did not exceed 0.1. SNPs were further filtered on a minimum minor allele frequency (mnMAF) of 0.02 and a minimum site coverage (mnSCov) of 0.9, i.e. coverage of the position in at least 90% of landraces. SNPs were identified from each tag locus alignment for all landraces using TOPM and TagsByTaxa, and the SNPs identified for individual landraces were aligned across all chromosomes (Glaubitz et al., 2014). Allele frequencies were calculated and a dendrogram was constructed using NTSYS 2.1 (Rohlf, 2009).
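The three site-level thresholds can be illustrated with a per-site calculation. This is a hedged sketch rather than TASSEL's implementation: genotypes are assumed to be coded as allele dosages (0, 1, 2, or None when missing), coverage is the fraction of landraces with a call, and the inbreeding coefficient is computed as F = 1 − Ho/He with He = 2p(1 − p), without any sample-size correction a real pipeline might apply. A strongly negative F flags an excess of heterozygotes, which in GBS data can point to collapsed paralogous loci.

```python
def site_stats(genotypes):
    """Return (minor allele frequency, coverage, F) for one SNP site.

    genotypes -- allele dosage per landrace: 0, 1 (heterozygote), 2, or None
    """
    called = [g for g in genotypes if g is not None]
    coverage = len(called) / len(genotypes)
    if not called:
        return 0.0, coverage, None
    p = sum(called) / (2 * len(called))                  # frequency of the counted allele
    maf = min(p, 1 - p)
    he = 2 * p * (1 - p)                                 # expected heterozygosity
    ho = sum(1 for g in called if g == 1) / len(called)  # observed heterozygosity
    f = 1 - ho / he if he > 0 else None                  # undefined at monomorphic sites
    return maf, coverage, f


def keep_site(genotypes, mn_maf=0.02, mn_cov=0.9, mn_f=-0.1):
    """Apply the minimum MAF, coverage and inbreeding-coefficient thresholds."""
    maf, cov, f = site_stats(genotypes)
    return maf >= mn_maf and cov >= mn_cov and f is not None and f >= mn_f


print(keep_site([0] * 8 + [2] * 2))             # True: MAF 0.2, full coverage, F = 1
print(keep_site([1] * 10))                      # False: all heterozygous, F = -1
print(keep_site([0] * 8 + [2] + [None, None]))  # False: coverage 9/11 < 0.9
```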
Results
Morphological characterization
A total of 24 morphological descriptors, generating 82 markers, were used to characterize the 52 cassava landraces (Table 1). Leaf vein and petiole colour descriptors divided the landraces into two main groups. About 50% of landraces had green leaf veins and 42.3% had red petioles. Cluster analysis of the similarity matrix by UPGMA produced a dendrogram (Fig. 1) that separated the landraces into three clusters (I, II and III) at a genetic distance (GD) of 1.02 (similarity coefficient 0.69). Cluster I, comprising 43 landraces, was further divided into sub-clusters A, B and C, and sub-cluster IB was further divided into three subgroups (i, ii and iii).
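The UPGMA step, and how a dendrogram is cut into clusters at a chosen distance (as with the three clusters obtained at GD 1.02), can be sketched in plain Python. This is an illustrative re-implementation of average-linkage clustering, not NTSYS 2.1; the distance matrix, labels and cut-off in the example are hypothetical.

```python
def upgma_clusters(names, dist, cut):
    """UPGMA (average-linkage) clustering stopped at a distance cut-off.

    names -- landrace labels; dist -- symmetric distance matrix (list of lists)
    Clusters are merged while their average distance is <= cut.
    Returns the resulting clusters as sorted lists of labels.
    """
    clusters = {i: [i] for i in range(len(names))}
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), best = min(d.items(), key=lambda kv: kv[1])
        if best > cut:
            break
        merged = clusters.pop(a) + clusters.pop(b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, members in clusters.items():
            # UPGMA distance = mean of all original pairwise distances between members
            total = sum(dist[i][j] for i in merged for j in members)
            d[(c, next_id)] = total / (len(merged) * len(members))
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(names[i] for i in m) for m in clusters.values())


names = ["A", "B", "C", "D"]
dist = [[0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.85],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.85, 0.2, 0.0]]
print(upgma_clusters(names, dist, 0.5))  # [['A', 'B'], ['C', 'D']]
```

Because UPGMA merge levels never decrease, stopping the agglomeration at the cut-off gives the same groups as cutting the finished dendrogram at that height.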
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-92492-mediumThumb-S1479262115000453_fig1g.jpg?pub-status=live)
Fig. 1 Dendrogram showing the diversity of 52 cassava landraces, generated from 24 morphological descriptors using genetic distances based on the Jaccard coefficient.
Cluster IA contained eight landraces, including Beba and Sinawangu, which were the closest landraces in this cluster with GD of 0.28. The landraces in this cluster had green branch colour and cream colour stem epidermis. Cluster IB subgroup (i) had 11 landraces, which were characterized by green apical leaves and seven leaf lobes. Closest landraces in this subgroup were Mwarusha and Mwalimuhamisi, with GD of 0.36. Cluster IB subgroup (ii) comprised 13 landraces characterized by dark green leaf colour. Nyamkagile and Kasunga were the closest landraces in this subgroup with GD of 0.15 (similarity coefficient 0.93). Nyamkagile was collected from Mkuranga and Kasunga from Muheza. In this study, these two landraces were found to be morphologically the same using the 24 morphological descriptors. Cluster IC contained 11 landraces and was characterized by having seven leaf lobes and light green leaf colour.
Cluster II comprised five landraces characterized by an orange stem cortex and lanceolate leaves. Cluster III consisted of four landraces characterized by green branches, purplish green apical leaves and an oblong-lanceolate central leaflet. Principal component analysis (PCA) (Supplementary Fig. S1, available online) revealed that the first three components accounted for 74% of the variability observed among the farmer-preferred landraces, and further supported the clustering obtained by UPGMA.
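The share of variability captured by the first principal components equals the sum of the leading eigenvalues of the covariance matrix divided by their total. A minimal pure-Python sketch follows. It assumes PCA on the covariance of the data matrix and uses the cyclic Jacobi eigenvalue method for symmetric matrices; the software used in the study may standardize variables or use a different algorithm, and the toy data are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix (rows = landraces, columns = variables)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    cen = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in cen) / (n - 1) for j in range(m)] for i in range(m)]


def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance(rows, k=3):
    """Fraction of total variance captured by the first k principal components."""
    eig = jacobi_eigenvalues(covariance(rows))
    return sum(eig[:k]) / sum(eig)


data = [[2, 1], [-2, 1], [2, -1], [-2, -1]]  # hypothetical scores
print(round(explained_variance(data, k=1), 3))  # 0.8
```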
Molecular characterization
Number and distribution of SNPs in cassava chromosomes
A total of 17,393 variant positions were identified in this study and aligned across the chromosomes (Fig. 2). The abundance of SNPs varied markedly among the 18 cassava chromosomes across all 52 samples (Fig. 2). The highest number of SNPs (1335) was found on chromosome 2, followed by chromosome 14 with 1315 SNPs, and the lowest number (734) on chromosome 18. A dendrogram relating the chromosomes by the SNPs they contained (Fig. 3) divided them into three major clusters: cluster I comprised chromosomes 4, 9 and 7; cluster II comprised chromosomes 3, 14, 10, 1, 2, 11, 8 and 13; and cluster III comprised chromosomes 18, 17, 5, 12, 6, 16 and 15.
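Per-chromosome SNP counts like those summarized in Fig. 2 are a simple tally over the aligned variant positions. A minimal sketch, assuming the variants are available as (chromosome, position) pairs; the chromosome labels and positions below are hypothetical, not study data.

```python
from collections import Counter


def snps_per_chromosome(sites):
    """Count variant positions per chromosome; sites are (chromosome, position) pairs."""
    return Counter(chrom for chrom, _pos in sites)


def most_and_least(counts):
    """Chromosomes with the highest and lowest SNP counts."""
    ranked = counts.most_common()
    return ranked[0], ranked[-1]


sites = [("chr2", 1200), ("chr2", 5400), ("chr14", 300),
         ("chr2", 9100), ("chr14", 7000), ("chr18", 50)]
counts = snps_per_chromosome(sites)
print(most_and_least(counts))  # (('chr2', 3), ('chr18', 1))
print(sum(counts.values()))    # 6 variant positions in total
```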
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-49106-mediumThumb-S1479262115000453_fig2g.jpg?pub-status=live)
Fig. 2 Distribution of SNPs across the 18 cassava chromosomes based on the reference cassava genome [Mesculenta: cassavaV5_0.chromosomes (Phytozome v10)].
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-68850-mediumThumb-S1479262115000453_fig3g.jpg?pub-status=live)
Fig. 3 Tree based on SNPs aligned per cassava chromosome, derived from 52 cassava landraces.
Genetic relationship
The neighbour-joining dendrogram constructed using GDs derived from SNPs grouped the cassava landraces into five major clusters (Fig. 4). Cluster I comprised 27 landraces and was further divided into six subgroups. Subgroup 1A comprised six landraces, namely Dide, Beba, Sufi, Kiguuchaninga, Kichooko and Moshiwataa; Dide and Beba, which appeared to be genetically identical, were also clustered together in the morphological analysis. Subgroup 1B contained four landraces, namely Magereza, Cheusi, Mwarusha and Dihanga. Subgroup 1C also comprised four landraces, namely Makaniki, Kilusungu, Nyamato and Kabangi. Subgroup 1D comprised three landraces, namely Mwanamkenyonga, Kiroba and Bwanamrefu. Subgroup 1E comprised four landraces, namely Shemakange, Kichongameno, Mbiliti and Rasta. Finally, subgroup 1F consisted of six landraces, namely Barawa, Nyamkagile, Mgeni, Mamosi, Tandika and Nyamwali. Cluster II comprised four landraces, namely Kalolo, Cheupe, Kigoma and Msenene; Msenene and Cheupe were also clustered together (in cluster IC) in the morphological analysis. Cluster III contained 15 landraces, which were divided into three subgroups. Subgroup 3A comprised Cosmas, Kikombe, Mshelisheli and Mahiza. Subgroup 3B comprised Tengatele, Kasunga, Sinawangu, Kibandameno, Ponjoo, Shibatumbo, Mwamuonage, Pushuli, Pusuu and Mbande. Landraces Shibatumbu and Pusuu clustered together in the morphological analysis, as did Ponjoo and Mbande, at a GD of 0.67. Subgroup 3C consisted of a single landrace, Mzungu mwekundu.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-79545-mediumThumb-S1479262115000453_fig4g.jpg?pub-status=live)
Fig. 4 Dendrogram obtained using the unweighted pair group method with arithmetic mean (UPGMA) showing the diversity of 52 cassava landraces based on the genetic distance matrix derived from SNP markers.
Cluster IV comprised four landraces, namely Mfaransa, Mwalimuhamisi, Heya and Kitingishandevu. Mwalimuhamisi and Kitingishandevu were in the same subgroup (GD 0.6) in the dendrogram generated from morphological descriptors. Cluster V comprised two landraces, namely Mbega and Mzungumweupe. Landraces in clusters IV and V branched earlier than those in the other clusters, suggesting that they may be older than the landraces in the other clusters and may have played a parental or ancestral role.
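Neighbour joining, the method named in the text for the SNP dendrogram, repeatedly joins the pair of taxa that minimizes Q(i, j) = (n − 2)d(i, j) − Σ_k d(i, k) − Σ_k d(j, k) and replaces them with a new internal node. The plain-Python sketch below follows the standard Saitou–Nei formulation; it is not the implementation used in the study, and the toy matrix is a small additive example rather than study data.

```python
def neighbour_joining(names, dist):
    """Saitou-Nei neighbour joining on a symmetric distance matrix.

    Returns a list of joins (left, right, left_branch, right_branch) followed by
    the final edge (left, right, length) connecting the last two nodes.
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # row sums
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))  # branch to a
        lb = d[(a, b)] - la                                     # branch to b
        u = f"({a},{b})"  # label of the new internal node
        for c in nodes:
            if c not in (a, b):
                d[(u, c)] = d[(c, u)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(u, u)] = 0.0
        joins.append((a, b, la, lb))
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    joins.append((a, b, d[(a, b)]))
    return joins


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
for step in neighbour_joining(names, dist):
    print(step)
```

Unlike UPGMA, neighbour joining does not assume a constant rate of change, so its branch lengths can differ between sister taxa.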
Genetic diversity
The PCA plot (Supplementary Fig. S2, available online) and the genetic distance matrix (Table 2) showed the diversity and polymorphism of the 52 cassava landraces collected from farmers' fields. The first three principal components explained 69% of the variability present in the germplasm. These analyses further support the diversity shown by the dendrogram (Fig. 4) constructed from the GDs derived from SNPs.
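A genetic distance matrix such as Table 2 can be derived from SNP genotypes in several ways. One common choice, assumed here rather than taken from the study, is an allele-sharing (identity-by-state) distance: the mean per-site allele difference over SNPs called in both landraces, which is 0 for identical genotypes and 1 for opposite homozygotes at every site. Near-zero values, such as the GD of 0.002 reported in the Discussion for Pushuli and Pusuu, are what such a distance gives for near-identical clones. The dosage coding and names are hypothetical.

```python
def allele_sharing_distance(g1, g2):
    """Mean allele difference per SNP between two landraces.

    g1, g2 -- allele dosages (0, 1, 2) per SNP site, None where the call is missing.
    Only sites called in both landraces are compared.
    """
    shared = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    if not shared:
        return float("nan")
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def snp_distance_matrix(genotypes):
    """Pairwise distances for a dict mapping landrace name -> dosage list."""
    names = list(genotypes)
    return {(a, b): allele_sharing_distance(genotypes[a], genotypes[b])
            for a in names for b in names}


geno = {"L1": [0, 1, 2, None], "L2": [0, 1, 2, 2], "L3": [2, 1, 0, 0]}
d = snp_distance_matrix(geno)
print(d[("L1", "L2")], round(d[("L1", "L3")], 3), d[("L2", "L3")])  # 0.0 0.667 0.75
```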
Table 2 Genetic distance matrix across the cassava chromosomes based on single nucleotide polymorphism data from the 52 Tanzanian cassava landraces
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170306064702-89872-mediumThumb-S1479262115000453_tab2.jpg?pub-status=live)
OTU, operational taxonomic units.
Discussion
The present study shows that sequencing-based genotyping can cost-effectively generate robust SNP data for genetic studies. The genetic relationships and diversity of 52 farmer-preferred cassava landraces in Tanzania were successfully characterized using morphological and NGS data. Both morphological and SNP data revealed considerable variability among the 52 landraces, and cluster analysis did not group landraces according to geographic location. In general, the internal branches of the SNP dendrogram were short while the external branches were long, indicating that within-group variability was higher than between-group variability. This suggests that, within a geographical location, the cultivars were more diverse and there was no evidence of misnaming of landraces. Farmers have previously been reported to be knowledgeable in identifying known varieties within their locality (Mkumbira et al., 2003). Raji et al. (2007) reported that planting material is exchanged among farmers over a wide agroecological area and that, in the process, landraces are given different names and lose their original identity. This is consistent with the high diversity observed among landraces within the same locality in the present study.
Morphological descriptors allow rapid characterization of germplasm, especially for traits with high heritability (Benesi et al., 2010; Mezette et al., 2013). Colour traits, the shape of the central leaf lobe and branching habit form the basis for clustering and are used by farmers to distinguish cultivars (Elias et al., 2001; Benesi et al., 2010).
In this study, landraces Nyamkagile and Kasunga, collected from different regions, had a GD of 0.15 (similarity coefficient 0.93), suggesting that they were morphologically the same (possibly actual duplicates). However, SNP analysis revealed that they were genetically different. Benesi et al. (2010) observed the same trend: cultivars Mutuvi and Depwete were 100% morphologically similar but clustered separately in amplified fragment length polymorphism analysis.
Landraces Dide and Beba were clustered together by both morphological characterization and SNP analysis. Mwarusha and Mwalimuhamisi, however, were clustered together by morphological characterization with 85% similarity (GD 0.4) but were separated by SNP markers. Conversely, morphological descriptors separated Pushuli and Pusuu at a GD of 1.18 (65% similarity), whereas SNP analysis showed them to be nearly genetically identical (GD 0.002). Kizito et al. (2007) reported that cultivars are sometimes renamed when introduced to new fields. Pushuli and Pusuu are therefore likely duplicates, and their morphological differences suggest possible intra-variety diversity, as reported by Elias et al. (2001). Vieira et al. (2007) suggested that a more complete understanding of the degree of genotype divergence requires considering the molecular and morphological data separately.
The clustering of chromosomes by their SNPs (Fig. 3) reflects the nature of the SNPs found in the cassava genome. Chromosomes 4, 9 and 7 contained related SNPs, and chromosomes 8 and 3 were also closely related. This implies that chromosomes within the same cluster may carry similar alleles. Therefore, when developing markers to aid selection for agronomically important traits, common SNP markers found on one of these chromosomes could be used to avoid the cost of screening more SNPs. Conversely, if unique SNPs were desired, SNPs that overlap between chromosomes could be avoided. Our SNP analysis provides a foundation for deeper exploration of cassava diversity and gene–trait relationships, and for their use in future cassava improvement.
From this study, we infer that morphological traits offer a quick tool for evaluating germplasm diversity. The study showed high morphological variability among cassava landraces, confirming the diversity observed in farmers' fields. It also confirmed that farmers were knowledgeable in identifying and distinguishing the landraces in their fields, and that they play a significant role in shaping biodiversity by growing diverse landraces on their farms. The morphological descriptors showed low discriminating power, especially for morphologically similar genotypes: similar landraces could not be easily distinguished using morphological descriptors, but SNP analysis was able to differentiate them. SNP markers distinguished the morphologically similar landraces Kasunga and Nyamkagile and, conversely, showed that the morphologically different landraces Pusuu and Pushuli were genetically near-identical, demonstrating the advantage of SNPs for discriminating closely related individuals. SNP analysis confirmed the morphological characterization of landraces Dide and Beba. Both morphological and molecular analyses placed Mbega among the most divergent landraces. This collection showed a wide range of genetic diversity and represents a valuable resource for trait improvement, enabling farmer-preferred traits to be captured, exploited and incorporated in future cassava breeding programmes. For example, Kiroba, a popular landrace that has been recommended for its tolerance to cassava mosaic disease, a major cassava disease, clustered together with Bwanamrefu and Mwanamkenyonga. These landraces can also be evaluated for tolerance to the major cassava viral diseases.
Data generated from this study will help the breeders to devise more appropriate and cost-effective breeding strategies, and will aid in deciding which germplasm to conserve.
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1479262115000453
Acknowledgements
This study was funded by Tanzania's Commission for Science and Technology (COSTECH) and Borlaug Leadership Enhancement in Agriculture Program (LEAP). Borlaug LEAP funded the DNA sequencing work through a grant to the University of California-Davis by the United States Agency for International Development (USAID). The authors are grateful to Dr Alois Kullaya and Dr Fred Tairo for helping in the morphological data collection and analysis. They also gratefully acknowledge the farmers who provided the cassava cuttings and valuable information on the germplasm.