Introduction
Oil palm (Elaeis guineensis Jacq.) (2n = 2x = 32; 1.8 Gb) is a perennial oil crop with a high oil yield per hectare compared with other vegetable oilseeds. The global production of oil palm in 2012–2013 was 55,969 thousand metric tons (USDA, 2014). The harvested area of oil palm in 2011 was 3.60% of the 252.83 million hectares of total harvested oilseed area. Of the total 153.95 Mt of vegetable oil produced in 2011, oil palm contributed the largest share (36.30%) of any crop, compared with soyabean (26.9%) and rapeseed (15.4%) (MPOC, 2014).
Simple sequence repeat (SSR) markers are useful for assessing genetic variation within and between populations, genome mapping and paternity confirmation (Kale et al., 2012). Next-generation sequencing (NGS) is a technology for large-scale genome sequencing and for exploring SSR markers in plants. The Illumina platform is an NGS approach that allows sequencing without genomic library construction, and it is more accurate and more cost and time effective than conventional sequencing technologies (Van et al., 2013). The main objective of our study was to use NGS to shorten the time required for constructing and sequencing a genomic library of oil palm breeding material in order to isolate and pre-evaluate a large number of SSR markers.
Experimental
Three oil palm clones (A, B and C) were obtained from a breeding population of Golden Tenera Limited Partnership (Krabi, Thailand). Each clone comprised F2 plants derived from each F1 plant. Clone A was developed from a cross between Ulu Remis Dura and Dumpy AVROS Pisifera, clone B was derived from Deli Dura and Dumpy Pisifera (SP540P), while clone C was derived from Deli Dura and Dumpy Pisifera (S29/36P). Altogether, six Dura (D) and five Pisifera (P) plants were selected from about 100 F2 plants (10% selection intensity) to establish a set of parents for further testing. All selected plants possessed slow stem growth with no crown disease. The selected Dura plants gave an average yield of over 4 tons/year and over 25% oil/bunch at 7 years old. Clone A included palms no. D1, D2, D4, P2 and P7; clone B included D3, D5, D6, P3 and P4; while clone C was derived from a single ortet, P5.
Genomic DNA was extracted from young leaves following Tanya et al. (2011). One of the maternal parents (D4) was chosen for sequencing on the Illumina HiSeq platform. Genome sequence assembly was carried out using ABySS 1.3.2 (Simpson et al., 2009). SSR motifs were identified with MISA (Thiel et al., 2003), and PRIMER3 (Rozen and Skaletsky, 2000) was used to design the SSR primers. PCR amplification was carried out in 10 μl reactions containing 10 ng DNA, 10× PCR buffer with MgCl2 (Vivagen, Seoul, Republic of Korea), 2.5 mM dNTPs, 10 μM primer pairs and 1 U Taq polymerase. The amplification program consisted of an initial denaturation at 95°C for 10 min, followed by 30 cycles of 95°C for 30 s, 54°C for 30 s and 72°C for 30 s, and a final extension at 72°C for 5 min in a PTC-100™ Thermal Controller (MJ Research, USA). PCR products were analysed on the Fragment Analyzer™ Automated CE System. A dendrogram was constructed by the UPGMA method using NTSYS-pc version 2.20e (Rohlf and Sokal, 1981). Polymorphic information content (PIC), expected heterozygosity (He) and observed heterozygosity (Ho) were calculated using PowerMarker 3.25 (Liu and Muse, 2005).
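The motif search step can be illustrated with a minimal Python sketch of a MISA-style scan for perfect repeats. It is not the MISA program itself. The minimum repeat counts in `MIN_REPEATS` are an assumption, since the thresholds used in the study are not stated, and the function names `find_ssrs`, `canonical` and `is_primitive` are hypothetical. The `canonical` helper groups a motif with its rotations and reverse complement, which is the convention behind labels such as AAT/ATT and AG/CT.

```python
import re

# Assumed minimum repeat counts per motif length (not given in the text).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def canonical(motif):
    """Smallest rotation of the motif or of its reverse complement, so that
    AAT, ATA, TAA, ATT, TTA and TAT all map to 'AAT'."""
    rev = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rev) for i in range(len(m))]
    return min(rotations)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-base unit followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical contig fragment containing three perfect repeats.
contig = "GC" + "A" * 12 + "GCGC" + "AAT" * 6 + "CCT" + "AG" * 7 + "TT"
for start, motif, count in find_ssrs(contig):
    print(start, motif, canonical(motif), count)
```

Because the scan takes the leftmost match, the reported motif is whichever rotation starts the repeat in the sequence; passing it through `canonical` gives the grouped class used when motif frequencies are tabulated.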
Discussion
Using 386,996,504 reads of clone D4, de novo assembly generated a total of 218,183 contigs. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession no. JRVM00000000. The version described in this paper is version JRVM01000000. The SSR motif search in these contigs identified 76,032 mono-nucleotide (58.11%), 42,532 di-nucleotide (32.51%), 6604 tri-nucleotide (5.05%), 4859 tetra-nucleotide (3.71%), 585 penta-nucleotide (0.45%) and 228 hexa-nucleotide (0.17%) repeats. Among mono-nucleotides, A/T were more abundant (99.72%) than C/G (0.28%), as reported in E. oleifera (Zaki et al., 2012). Feng et al. (2009) did not consider mono-nucleotides for analysis. Among di-nucleotides, the AT/AT motif (51.13%) was the most abundant repeat, followed by AG/CT (23.24%) and AC/GT (23.20%). Tri-nucleotides comprised mainly AAT/ATT (41.3%) and AAG/CTT (32.98%). The most abundant tetra-nucleotide motifs were ACAT/ATGT (71.89%) and AAAT/ATTT (16.26%), while the most abundant penta-nucleotide motifs were AATAT/ATATT (34.02%) and AAATT/AATTT (25.98%). Zhao et al. (2013) worked in date palm (Phoenix dactylifera L.) and found the di-nucleotides AG/CT (85.7%), AC/GT (8.2%), AT/TA (5.4%) and GC/CG (0.7%), while among the tri-nucleotides AGG (26.8%) and AAG (9.3%) were most abundant. Ting et al. (2010) reported SSR mining in an oil palm (E. guineensis) EST database and found that the major di-nucleotides were AG/CT (66.9%), AT/AT (21.9%) and AC/GT (10.9%), and the major tri-nucleotides were AAG/CTT (23.3%), AGG/CCT (13.7%) and AAT/ATT (10.8%). Our analysis found 130,840 SSR motifs in the 499,254,157 bp of sequence examined (Table 1). Primers for SSR markers could be designed for only 763 (11.55%) of the 6604 tri-nucleotide sequences.
The other motifs had insufficient flanking sequences to design a pair of primers. The designed markers were classified into ten classes (Table 1). Class 1 was the most abundant (41.3%), whereas class 8 (0.1%) and class 10 (0.3%) were the least. The (AAT)n motifs in class 1 and (AAG)n in class 2 gave the largest numbers, at 100 and 70 SSR primers, respectively. Out of the 763 SSR primer pairs, 144 were used for pre-evaluation on the 11 elite oil palms. The numbers of primers designed from classes 1, 2, 3, 6 and 9 were 48, 39, 27, 20 and 10 pairs, respectively. These motifs were commonly found in oil palm, i.e. E. guineensis EST-SSR (Singh et al., 2008) and E. oleifera SSR (Noorharriza et al., 2012). Out of the 144 SSR markers, 61 were amplifiable. Of these, 18.03, 31.15, 26.23, 13.11 and 11.48% came from classes 1, 2, 3, 6 and 9, respectively. These primers detected 371 polymorphic alleles, ranging from 3 to 11 alleles per locus (average of six alleles). The highest and lowest PIC values were 0.75 and 0.62, with an average of 0.68, similar to the previous report by Noorharriza et al. (2012) (PIC = 0.63). He (0.72) was higher than Ho (0.41), implying that the oil palm samples might have been affected by repeated rounds of breeding cycles that caused deviation from Hardy–Weinberg equilibrium. The higher Ho in our work compared with that reported in E. oleifera (Ho = 0.16) by Noorharriza et al. (2012) indicated that the E. guineensis used in our experiment had higher heterozygosity. The E. oleifera samples were collected from isolated areas in four countries of South-Central America. The dendrogram separated the palms into three groups at a Jaccard's coefficient of 0.45 (Fig. 1), with a cophenetic correlation of 0.81. We conclude that NGS technology using the Illumina HiSeq platform is a powerful tool for sequencing and developing valuable molecular markers from large and complex genomes.
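To make the marker statistics concrete, the following Python sketch computes PIC, He and Ho for a single locus from diploid genotypes. It assumes the commonly used definitions: He = 1 − Σp², PIC = He − ΣΣ 2p_i²p_j² over allele pairs, and Ho as the fraction of heterozygous individuals. PowerMarker may apply a sample-size correction that is not reproduced here. The function name `locus_stats` and the fragment sizes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Return (PIC, He, Ho) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid palm.
    Uses uncorrected allele frequencies (an assumption, see lead-in).
    """
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    freqs = [c / len(alleles) for c in counts.values()]

    # Expected heterozygosity: chance that two random alleles differ.
    he = 1 - sum(p * p for p in freqs)
    # PIC subtracts the share of matings that are uninformative.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # Observed heterozygosity: fraction of palms carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    return pic, he, ho


# Hypothetical fragment sizes (bp) scored at one SSR locus in four palms.
example = [(150, 150), (150, 156), (156, 162), (162, 162)]
pic, he, ho = locus_stats(example)
print(f"PIC={pic:.3f} He={he:.3f} Ho={ho:.3f}")
```

Averaging these per-locus values over all scored loci gives summary figures of the kind reported above; an Ho well below He at many loci is what points to a departure from Hardy–Weinberg proportions.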
Table 1 Summary of sequencing information, genome assembly, SSR screen and frequency of the tri-nucleotide repeat motif used for primer design
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128204004-03547-mediumThumb-S1479262115000143_tab1.jpg?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921031732150-0178:S1479262115000143:S1479262115000143_fig1g.gif?pub-status=live)
Fig. 1 A dendrogram of genetic relatedness among 11 oil palm parents based on UPGMA clustering. Cluster analysis clearly separated these into three groups with a 0.45 Jaccard's similarity coefficient. Samples from the same clone tended to group together.
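The grouping shown in the dendrogram can be sketched in plain Python as average-linkage (UPGMA) clustering on Jaccard similarity, stopping at a chosen cutoff. This is an illustration only and does not reproduce NTSYS-pc; the function names, the sample names and the allele labels are hypothetical, and the cophenetic correlation is not computed.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of scored alleles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_similarity(profiles, c1, c2):
    """Mean pairwise Jaccard similarity between members of two clusters."""
    total = sum(jaccard(profiles[x], profiles[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def upgma_groups(profiles, cutoff):
    """Merge the most similar clusters (average linkage) until the best
    average similarity drops below cutoff; return the remaining groups."""
    clusters = [(name,) for name in profiles]
    while len(clusters) > 1:
        best = max(
            ((average_similarity(profiles, c1, c2), c1, c2)
             for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]),
            key=lambda t: t[0],
        )
        similarity, c1, c2 = best
        if similarity < cutoff:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return clusters


# Hypothetical allele sets for four palms (labels are placeholders).
profiles = {
    "D1": {"a", "b", "c"},
    "D2": {"a", "b", "c", "d"},
    "P3": {"a", "e", "f", "g"},
    "P4": {"e", "f", "h"},
}
print(upgma_groups(profiles, cutoff=0.45))
```

In this toy data D1 and D2 join at a similarity of 0.75, while P3 and P4 would only join at 0.40, so a cutoff of 0.45 leaves three groups. Stopping the merges at a cutoff is equivalent to cutting the tree at that level, because average linkage never produces a merge at a higher similarity than an earlier one.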
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1479262115000143
Acknowledgements
Puntaree Taeprayoon was granted a scholarship (CHE-PhD-SW) from the Commission on Higher Education (CHE). The authors thank (1) Center of Excellence in Oil Palm Biotechnology for Renewable Energy, CHE, (2) Center for Advanced Studies for Agriculture and Food (CASAF), Kasetsart University, (3) Next Generation BioGreen 21 Program (code no. PJ0110262015), Rural Development Administration, Republic of Korea, and (4) Golden Tenera Limited Partnership, Krabi, Thailand.