Introduction
Molecular tools such as microsatellites are useful for investigating the population dynamics and social biology of forest and agricultural pests (Lundquist & Klopfenstein, 2001; Ellis et al., 2009). Successful implementation of these tools has been hindered by the difficulty associated with the isolation of genetic markers de novo for each study species. Historically, the development of microsatellite markers was undertaken by screening small-insert genomic DNA libraries or repeat-enriched libraries (see Zane et al., 2002). The success of these time-consuming and expensive procedures has been limited by their dependence on the repeat motif of the probes used; probes found to be useful for one species may be less useful for another because the frequency of each repeat motif differs between species and taxa (Gao et al., 2009). There are numerous examples where, despite extensive investment of time and resources, microsatellite isolation has failed or resulted in a very low yield of polymorphic markers (see, for example, Schmuki et al., 2006; Arthofer et al., 2007). However, recent advances in the technology and accessibility of high-throughput genomic sequencing (see review by Hudson, 2008) are providing a much more efficient and cost-effective method for the acquisition of genetic markers, including microsatellites (Abdelkrim et al., 2009; Santana et al., 2009).
The eusocial ambrosia beetle, Austroplatypus incompertus (Schedl) (Coleoptera: Platypodidae), excavates and inhabits multi-branched tunnels in a horizontal plane within the heartwood of living trees (Kent & Simpson, 1992). Infested trees are of the genus Eucalyptus, including Eucalyptus pilularis, a critical species in plantations and re-growth forests across the eastern tablelands of Australia (Kent, 1997). Economic loss to the timber industry in Australia results from timber downgrade due to visual and structural defects inflicted by the beetles' tunnelling and by the growth of associated fungi (Kent, 2008). As with many other forest pests (especially bark or wood-boring beetles), direct behavioural observations are not feasible because of the beetles' cryptic lifestyle. We therefore resolved to use molecular markers to explore their social system, population structure and dispersal.
In our initial investigations, a traditional microsatellite isolation technique based on a partial genomic library (identical to that reported in Smith & Stow, 2008) indicated a very low frequency of microsatellites in the genome of A. incompertus. Through this work, we isolated and characterized only one informative microsatellite locus (Smith et al., 2009). Investigations using a biotin enrichment protocol identified an additional four microsatellite markers (protocol briefly described below). Advancements in next-generation sequencing signify the end of such laborious techniques. A next-generation sequencing platform enabled us to characterize a further nine microsatellite markers, demonstrating the utility of this approach for other projects requiring the use of co-dominant molecular markers.
Methods and results
Sample collection and DNA extraction
Emerging beetles were collected from 32 galleries within Olney State Forest, NSW, Australia (33.09932°, 151.34210°). Each Austroplatypus incompertus gallery is characterized by a single entrance tunnel (Kent & Simpson, 1992). Brass gauze micro-cages (as outlined in Kent, 2001) were placed on the entrances of galleries during the beetles' emergence period, and beetles were collected and stored in 90% ethanol. Genomic DNA was extracted from the whole body of one individual from each gallery using ammonium acetate precipitation (Bruford et al., 1998).
Microsatellite development
DNA from five individuals was combined to construct each enriched genomic library as described in Banks et al. (2007). Briefly, DNA was digested with the enzymes RsaI, AluI and HaeIII (New England Biolabs), and fragments of 200–900 bp were ligated to oligo-adapters and annealed to biotinylated probes (dGA10 and dCA10). These fragments were selectively purified using streptavidin magnetic particles (Promega) and amplified using polymerase chain reaction (PCR). The annealing protocol and PCR were repeated, the fragments integrated into plasmids, and TA-cloning (pCR 2.1-TOPO, Invitrogen) was undertaken. The whole process from DNA extraction and digest to TA cloning was repeated for five libraries. Two of these libraries resulted in preferential enrichment of a small number of sequences (14 clones sequenced from each library revealed no microsatellites and many clones contained identical sequences). Sequences obtained from the remaining three libraries revealed that 15 of 74 clones harboured a microsatellite (di-, tri-, tetra- or hexa-nucleotides repeated five or more times), ten of which were imperfect/compound microsatellites. Primers were developed using PRIMER 3, version 0.4 (Rozen & Skaletsky, 2000) for the remaining five microsatellites, four of which amplified (using the PCR protocols described below) and were polymorphic (Ai3, Ai4, Ai5 and Ai6; table 1).
Table 1. Characteristics of 13 polymorphic microsatellite loci in the ambrosia beetle, Austroplatypus incompertus, isolated from an enriched genomic library (EGL) or using high-throughput sequencing (454).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713205849-33872-mediumThumb-S0007485311000137_tab1.jpg?pub-status=live)
Repeat motif, the repeat motif found in the clone or contig; N, sample size; NA, number of alleles; Size, size of the allele in the clone/contig; HO, observed heterozygosity; HE, expected heterozygosity; PHWE, P-value for the test of Hardy-Weinberg equilibrium; PIC, polymorphic information content.
* denotes significant deviation at P<0.004 after Bonferroni correction.
Given the low number of repeats found using the above approach and the recent advancements in pyrosequencing technology, the Roche GS-FLX System (454 Life Sciences) was used to generate 62,650 reads of the genome of A. incompertus. AGRF (www.agrf.com.au) performed this sequencing using a total of 10 μg of DNA pooled from 16 individual beetles and 1/8th of a 70×75 PicoTiterPlate, following the techniques outlined by Margulies et al. (2005). FASTA files were generated from the raw .sff files using Roche's 454 sffinfo program with default parameters and quality clipping points as calculated during signal processing. Post-filtering, the average read length was 145 bp with a mean GC content of 28.3%. Screening these sequences with MSATCOMMANDER 0.8.2 (Faircloth, 2008) using default settings enabled us to identify 385 cases with eight or more repeats of di- to hexa-nucleotide repeat motifs (excluding imperfect microsatellites). Di-, tri- and hexa-nucleotide repeat classes were the most abundant (approximately 46%, 22% and 21% of cases, respectively). Utilising the ‘Design Primers’ option within MSATCOMMANDER, flanking sequences considered suitable for primer design were identified in 49 cases (28 di-nucleotides, 19 tri-nucleotides, 1 tetra- and 1 hexa-nucleotide repeat). We subsequently used PRIMER 3 version 0.4 (Rozen & Skaletsky, 2000) to design 14 primer pairs. Nine primer sets yielded consistent PCR products and were polymorphic (table 1).
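The screening criterion described above (perfect di- to hexa-nucleotide motifs repeated eight or more times) can be illustrated with a minimal Python sketch. This is not MSATCOMMANDER's own algorithm or interface; the function names are hypothetical and the regular-expression approach is an assumption chosen for brevity.

```python
import re


def is_reducible(motif):
    """True if the motif is itself a shorter unit repeated (e.g. 'AA', 'ACAC')."""
    return motif in (motif + motif)[1:-1]


def find_perfect_microsatellites(seq, min_repeats=8, min_motif=2, max_motif=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Motif lengths 2..6 correspond to di- through hexa-nucleotide repeats.
    Illustrative only: imperfect and compound repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for motif_len in range(min_motif, max_motif + 1):
        # One captured motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                # Skip homopolymers and motifs already covered by a shorter class.
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    read = "GGT" + "AC" * 10 + "TTG"  # toy read, not real data
    print(find_perfect_microsatellites(read))
```

Applied to each read in a FASTA file, a scan of this kind yields candidate loci; whether enough flanking sequence remains for primer design still has to be checked separately.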
DNA amplification and genotyping
All microsatellite loci were amplified using PCR with cost-effective M13 fluorochrome-labelled primer sets following the methods of Schuelke (2000). PCR reactions were performed with an MJ Research PTC100 thermocycler in 10-μl volumes containing 50–100 ng of genomic DNA, 0.1–0.5 μM of each primer, 200 μM of each dNTP, 2.0 mM MgCl2, 1 U DNA polymerase (Promega) and 1 μl reaction buffer (200 mM (NH4)2SO4, 750 mM Tris-HCl pH 8.8, 0.1% (v/v) Tween® 20).
Following optimization, the amplification conditions offering the highest resolution for these primers were: initial denaturation for 3 min at 94°C, followed by seven ‘touchdown’ cycles of denaturation at 94°C for 30 s, annealing for 45 s at successively lower temperatures (60, 59, 58, 57, 56, 54 and 52°C) and extension at 72°C for 1 min. On completion of the last touchdown cycle, another 33 cycles were carried out at 50°C annealing temperature with a final step at 72°C for 10 min. Amplicons were subsequently electrophoresed on an ABI 3130 Genetic Analyser (Applied Biosystems). The sizes of alleles were estimated against an internal size standard LIZ using Peak Scanner™ Software v1.0 (Applied Biosystems). We characterized the genetic diversity at each locus within 32 individuals (one from each of 32 galleries) collected from one region in Olney State Forest, NSW, Australia. The number of alleles and their approximate sizes were established and recorded. At least 15% of samples were re-analysed at each locus to ensure data integrity and scoring consistency.
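The cycling conditions above can be written out as an explicit step list, which makes the touchdown schedule easy to check. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes that the 33 later cycles reuse the same denaturation (30 s) and extension (1 min) times as the touchdown cycles, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Step:
    label: str
    temp_c: int
    seconds: int


# Annealing temperatures of the seven touchdown cycles, as given in the text.
TOUCHDOWN_ANNEAL_C = [60, 59, 58, 57, 56, 54, 52]
FINAL_ANNEAL_C = 50
FINAL_CYCLES = 33


def build_program():
    """Return the full thermocycler program as an ordered list of steps."""
    steps = [Step("initial denaturation", 94, 180)]
    anneal_temps = TOUCHDOWN_ANNEAL_C + [FINAL_ANNEAL_C] * FINAL_CYCLES
    for cycle, anneal in enumerate(anneal_temps, start=1):
        # Assumption: every cycle uses 30 s denaturation and 60 s extension.
        steps.append(Step(f"cycle {cycle} denaturation", 94, 30))
        steps.append(Step(f"cycle {cycle} annealing", anneal, 45))
        steps.append(Step(f"cycle {cycle} extension", 72, 60))
    steps.append(Step("final extension", 72, 600))
    return steps


def total_hold_seconds(steps):
    """Sum of programmed hold times, excluding temperature ramping."""
    return sum(step.seconds for step in steps)


if __name__ == "__main__":
    program = build_program()
    print(len(program), "steps;", total_hold_seconds(program) / 60, "min of hold time")
```

Under these assumptions the program comprises 40 cycles, and the programmed hold times sum to 6180 s (103 min) before ramping is taken into account.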
Genetic analyses
No evidence for allelic drop-out, scoring error due to stutter or null alleles was found using MICRO-CHECKER (van Oosterhout et al., 2004). The observed and expected heterozygosity (HO and HE) and the polymorphic information content (PIC, an indicator of the utility of the marker for linkage or population genetic studies), based on allele frequencies pooled across all samples, were determined using CERVUS v3.03 (Kalinowski et al., 2007). Deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium were examined using the software GENEPOP v 3.4 (Raymond & Rousset, 1995). The observed heterozygosity was lower than that expected under HWE in all cases (table 1). The deficiency in heterozygotes is unlikely to be a technical artefact, as it occurs in the majority of markers. Further investigations are required to examine whether this is due to population subdivision (a Wahlund effect) or inbreeding. One locus (Ai15) deviated significantly from HWE (after sequential Bonferroni adjustment for multiple comparisons, P<0.004). Numbers of alleles per locus ranged from 2 to 17, and observed and expected heterozygosities from 0.344 to 0.767 and from 0.507 to 0.860, respectively. No significant linkage disequilibrium was detected between any locus pairs.
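For readers unfamiliar with the summary statistics reported in table 1, the following minimal Python sketch shows how HO, HE and PIC can be computed for a single locus from diploid genotypes. It is not the CERVUS implementation: the function names are hypothetical, the small-sample correction applied to HE (the factor 2N/(2N − 1)) is an assumption about which estimator is wanted, and PIC follows the standard formula 1 − Σp_i² − ΣΣ2p_i²p_j². The Bonferroni helper assumes α = 0.05 divided across 13 tests (one per locus); the source reports only the resulting threshold of P<0.004.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes, unbiased=True):
    """Summarise one locus from a list of (allele_a, allele_b) genotypes."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]

    # Observed heterozygosity: proportion of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity under Hardy-Weinberg proportions.
    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq
    if unbiased:
        he *= 2 * n / (2 * n - 1)  # assumed small-sample correction

    # Polymorphic information content.
    pic = 1.0 - sum_sq - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {"n": n, "n_alleles": len(counts), "ho": ho, "he": he, "pic": pic}


def bonferroni_threshold(alpha=0.05, n_tests=13):
    """Per-test significance threshold; alpha and n_tests are assumptions."""
    return alpha / n_tests


if __name__ == "__main__":
    toy = [(100, 100), (100, 102), (102, 102), (100, 102)]  # toy genotypes, not real data
    print(locus_summary(toy, unbiased=False))
    print(round(bonferroni_threshold(), 4))
```

A heterozygote deficit of the kind reported here corresponds to `ho` falling below `he` at a locus.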
Discussion
The success of enrichment protocols for microsatellite isolation is often influenced by the repeat motif of the probes used. Lacking any prior information on the frequency of different repeats in this genus, we chose two di-nucleotide repeats for which the annealing conditions were similar (CA and GA). The sequences obtained from 454 sequencing in this study indicate that the most abundant di-nucleotide repeat motifs were TA and GA. Our probes were therefore not complementary to one of the more frequent microsatellite motifs (TA). An inherent problem with the creation of enriched genomic libraries is the difficulty in predicting a priori which repeat motifs are most frequent in the genome. The use of more than a few probes is not always an adequate solution because of the cost and laboratory time required.
The development of microsatellite markers using the enrichment procedures described here entailed approximately ten weeks of laboratory effort (including 2–3 weeks for primer testing) and cost approximately $700 (AUD) per polymorphic locus developed. This cost is primarily due to the preferential enrichment of a small number of sequences in several of the enriched genomic libraries, a technical issue which appears to occur quite commonly with these protocols (Arthofer et al., 2007; Banks et al., 2007). Our study concurs with that of Csencsics et al. (2010), who found that less than three weeks of laboratory effort is required for marker development using 454 sequencing (plus a supplier-dependent period of time for the 454 sequencing to be completed). We expended only $450 (AUD) for each polymorphic locus developed using 454 sequencing, constituting a significant reduction in the cost of microsatellite isolation.
An increasing number of studies are using next-generation sequencing techniques to identify microsatellites. The low cost and effort reported for such approaches suggest that development of these markers is no longer a serious impediment to population-level studies (Csencsics et al., 2010; Perry & Rowe, 2010). Using 454 pyrosequencing, we tested 14 primer pairs and characterised nine microsatellite loci for A. incompertus. An additional 300 sequences were obtained which contained microsatellites but lacked sufficient flanking regions for primer development. As improvements in technology enable longer reads, the efficiency and yield of microsatellites using this approach will be further enhanced.
Although the distribution and occurrence of microsatellites among different taxa is difficult to compare without accurate genome size estimates, this study confirms the premise that microsatellites are relatively infrequent in A. incompertus. Just 0.6% of all contigs obtained using 454 sequencing contained a microsatellite. Using identical microsatellite search criteria, >5% of contigs harboured a microsatellite in the Australian gummy shark Mustelus antarcticus (Boomer & Stow, 2010), greater than 2% of contigs in the gecko Gehyra variegata (Duckett & Stow, 2010) and greater than 1.7% of contigs in the frog Leiopelma hochstetteri (Clay et al., 2010). The increased use of high-throughput sequencing technology for marker development will enable a more thorough exploration of the factors influencing the abundance and type of microsatellites across taxa and may lead to some definitive explanations as to the function of repetitive DNA such as microsatellites.
This is among the first few studies to validate the 454 sequencing method of microsatellite acquisition for a significant pest species, one where microsatellite yield using traditional techniques has been low. The microsatellites described here will enable us to examine the mating system and colony structure of A. incompertus and illuminate historical and current dispersal patterns. Information on patterns of dispersal will help manage the risk of timber defect by A. incompertus in re-growth forests or plantations.
Acknowledgements
This work was partially funded by the 2008 Maxwell Ralph Jacobs Award to Shannon Smith from the Institute of Foresters of Australia. The manuscript benefited from the comments of two anonymous reviewers. We would like to gratefully acknowledge Alf Britton and Deborah Kent for field assistance and Mike Gardner for support with the 454 sequencing.