Introduction
The oriental fruit moth (OFM) Grapholita molesta (Lepidoptera: Tortricidae) is a major pest of stone and pome fruit, especially species of Rosaceae (Rothschild & Vickers, 1991). Larvae of the OFM cause damage by boring into twigs as well as fruits. Assumed to be native to China, this moth has spread during the last century into other stone-fruit growing regions, including Europe, Africa, South and North America, New Zealand and Australia (Quaintance & Wood, 1916; Rothschild & Vickers, 1991).
The population genetics of the OFM has been widely studied on both global and national scales (Torriani et al., 2010; Kirk et al., 2013; Zheng et al., 2013) because of the OFM's threat to fruit industries and its complicated invasion history. The high genetic diversity in the Chinese populations provided strong evidence that China was the native range of the OFM (Kirk et al., 2013; Zheng et al., 2013). Further studies of population genetic structure across China and South Korea revealed that this moth originated in Southwestern China and dispersed northward (Wei et al., 2015). Global studies of the OFM indicated a weak but significant genetic structure on a continental scale (Kirk et al., 2013). Although no pattern of isolation by distance was found within invasive populations of South Africa or Italy (Timm et al., 2008; Torriani et al., 2010), geographic distance was found to be the main factor affecting genetic structure and gene flow in the invasive populations of Brazil (Silva Brandão et al., 2015). Structured populations from different orchards within an area indicated that a selective host switch had occurred in certain segments of the population (Zheng et al., 2013). Also, pest management methods, such as fruit bagging, can have an important impact on the levels of genetic diversity and the genetic structures of OFM populations (Zheng et al., 2015).
In previous genetic studies of the OFM, four types of markers were used: amplified fragment length polymorphisms (Timm et al., 2008), microsatellites (Torriani et al., 2010), mitochondrial genes (Wei et al., 2015) and single nucleotide polymorphisms (Silva Brandão et al., 2015). Among these markers, microsatellites were the most frequently used (Torriani et al., 2010; Kirk et al., 2013; Zheng et al., 2013; Wei et al., 2015; Zheng et al., 2015) because of their high levels of polymorphism, co-dominance and easy detection by polymerase chain reaction (PCR). However, microsatellite development is challenging, especially for species of Lepidoptera (Zhang, 2004). The first set of ten microsatellite markers for the OFM was developed from an enriched library of genomic DNA (Torriani et al., 2010). These microsatellite loci showed relatively high frequencies of null alleles, ranging from 0.005 to 0.208 in Torriani et al. (2010), from 0.07 to 0.25 in Kirk et al. (2013), from 0.112 to 0.241 in Zheng et al. (2015), from 0.13 to 0.28 in Zheng et al. (2013) and from 0.04 to 0.31 in Wei et al. (2015).
A new set of nine microsatellite markers was developed using a high-throughput genomic sequencing approach; however, the null allele frequency was still very high, ranging from 0.05 to 0.33 (Kirk et al., 2013).
Null alleles are a common issue with microsatellite markers in eukaryotic genomes, especially in Lepidoptera (Sinama et al., 2011). High null allele frequencies complicate microsatellite genotyping (Anthony et al., 2001; Habel et al., 2008) and limit the transferability of markers between species of the same genus (Flanagan et al., 2002) as well as between populations of the same species (Jiggins et al., 2005). Microsatellite null alleles are caused by flanking sequence variants (point mutations, insertions or deletions) that affect primer binding sites (Wang, 1994), unlike null alleles of isozyme markers (Primmer et al., 1995). Although software programs, such as Micro-checker (Van Oosterhout et al., 2004), GENEPOP (Raymond & Rousset, 1995; Rousset, 2008) and FreeNA (Chapuis & Estoup, 2007), have been developed to estimate the frequencies of null alleles, the effects of null alleles on analyses of genetic diversity and population structure need further empirical evaluation.
In this study, we developed novel sets of microsatellite markers for the OFM from the genomic sequences and validated them in four natural populations from the native range of China. Additionally, the effects of null alleles on population genetics analyses were assessed.
Materials and methods
Samples and DNA extraction
A larva of the OFM collected from Aksu Prefecture, Xinjiang Province, China was sampled for genomic sequencing. Eight individuals from eight different locations of China were used for the initial testing of the primer pairs. In total, 95 OFM larvae collected between 2010 and 2012 from four geographic locations across China, including 17 from Shilin, Yunnan Province (SL) (E103°19′48.00″, N24°51′48.96″), 39 from Chengdu, Sichuan Province (CD) (E104°03′53.48″, N30°39′30.96″), 20 from Nanjing, Jiangsu Province (NJ) (E118°47′48.76″, N32°03′36.92″) and 19 from Shenyang, Liaoning Province (SY) (E123°25′53.29″, N41°48′20.59″), were used for population-level analyses. All of the specimens were stored in absolute ethanol and frozen at −80°C prior to the DNA extraction and stored at the Integrated Pest Management Laboratory of the Beijing Academy of Agriculture and Forestry Sciences. Genomic DNA was extracted from a segment of an individual larva using a DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.
Sequencing and assembly of the OFM genome
A library with a 500-bp insert size was constructed using the Illumina TruSeq DNA PCR-Free HT Library Prep Kit (Illumina, San Diego, CA, USA). The MiSeq Reagent Kit v3 (Illumina, San Diego, CA, USA) was used to sequence the prepared library on an Illumina MiSeq Sequencer. Low-quality reads were removed using SolexaQA software (Cox et al., 2010), and the remaining reads were assembled using IDBA with k-mer sizes of 80–240 (Peng et al., 2010).
Genome-wide microsatellite survey and primer design
We surveyed all of the potential microsatellite loci in the assembled genomic sequences using the software MSDB (Du et al., 2013), with minimum repeat numbers of 250 (an extremely high value chosen to exclude mononucleotide motifs), 5, 5, 5, 5 and 5 for the mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs, respectively. Repeat motifs were grouped into classes; for example, among the dinucleotides, (AC)n, (CA)n, (TG)n and (GT)n were considered the same class (Jurka & Pethiyagoda, 1995). The program QDD (Meglécz et al., 2010) was used to isolate microsatellites, and primers were designed according to the method of Wang et al. (2016). The annealing temperature for each primer was set between 58 and 62°C, and the difference in the annealing temperatures within a primer pair was <4°C. The primer pairs output by QDD were further filtered by the following stringent criteria: (i) the microsatellites had to be pure and specific; (ii) the design strategy of ‘A’ was used; and (iii) the distance between the 3′ end of a primer and its target region had to be at least 10 bp. Putative genes of validated microsatellite loci were identified using the BLASTX algorithm against the GenBank database with a maximum e-value of 1e-5.
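As a rough illustration of the survey thresholds described above, the following Python sketch scans a sequence for perfect di- to hexanucleotide repeats with at least five repeat units and collapses motifs into classes by rotation and reverse complement, so that AC, CA, TG and GT share one class. It is not MSDB or QDD: the function names `find_microsatellites` and `motif_class` are hypothetical, and the regular-expression approach is an assumption made only for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_class(motif):
    """Canonical label shared by all rotations of a motif and of its reverse complement."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    variants = []
    for m in (motif, revcomp):
        variants.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(variants)

def find_microsatellites(seq, min_repeats=5, motif_sizes=(2, 3, 4, 5, 6)):
    """Return a list of (start, motif, repeat_count) for perfect repeats in seq."""
    seq = seq.upper()
    hits = []
    for size in motif_sizes:
        # One captured unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are a shorter unit repeated, e.g. "AA" or "ACAC".
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits

# Illustrative sequence only, not OFM data.
print(find_microsatellites("GGG" + "AC" * 6 + "TTT"))
print(motif_class("GT"))
```

Mononucleotide runs are excluded here by the shorter-unit check rather than by a very high minimum repeat number, which is a design choice of this sketch and not of the software used in the study.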
Primer validation and polymorphism detection
For primer validation, a primer C tail (PC tail) (5′ CAGGACCAGGCTACCGTG 3′) was added to the 5′ end of each candidate forward primer (Schuelke, 2000; Blacket et al., 2012) to reduce the cost. Eight OFM larvae from eight geographical populations were used for the initial test. The final amplification volume was 10 µl, including 0.5 µl of template DNA (5–20 ng µl−1), 5 µl of Master Mix (Promega, Madison, WI, USA), 0.08 µl of PC tail modified forward primer (10 mM), 0.16 µl of reverse primer (10 mM), 0.32 µl of fluorescence-labeled PC tail (10 mM) and 3.94 µl of ddH2O. The amplification program was performed under the following conditions: 4 min at 94°C; 35 cycles of 30 s at 94°C, 30 s at 56°C and 45 s at 72°C, followed by a final 10-min extension at 72°C. The amplified PCR fragments were analyzed on the ABI 3730xl DNA Analyzer (Applied Biosystems) using the GeneScan 500 LIZ size standard (Applied Biosystems).
Primer pairs for 64 loci (GenBank accession numbers: KX711549–KX711612) were selected for the initial test based on two criteria: (1) at most one primer pair was retained per scaffold and (2) the repeat motif of the expected amplicon was at least 3 bp long. Primer pairs with amplification rates lower than 75% were excluded from genotyping (marked as ‘non-amplification’ or ‘low-success-rate’ in table S1). Those that failed to amplify target sequences were retested using annealing temperatures of 53 and 50°C. During genotyping with GENEMAPPER version 4.0 (Applied Biosystems), loci that showed more than two peaks in one individual (marked as ‘non-specific amplification’ in table S1) and those that had fewer than two alleles among the eight test individuals (marked as ‘no polymorphism’ in table S1) were excluded from large-scale examination. The remaining primer pairs were validated in 95 individuals from four natural populations, following the same procedure as in the previous steps.
Genotypes were scored using GENEMAPPER version 4.0 (Applied Biosystems). Stuttering and large allele dropout were detected using MICRO-CHECKER version 2.2.3 (Van Oosterhout et al., 2004) and checked back in GENEMAPPER. The null allele frequencies were estimated using the software FreeNA (Chapuis & Estoup, 2007). Allele frequencies, observed heterozygosity (H O), expected heterozygosity (H E) and the polymorphic information content (PIC) were calculated using the macros Microsatellite Tools (Park, 2001). We used GENEPOP version 4.0.11 (Raymond & Rousset, 1995) to test for deviation from Hardy–Weinberg equilibrium (HWE) at each locus/population pair and for linkage disequilibrium (LD) among loci within each population. The allelic richness (A R) of each locus and the inbreeding coefficient (F IS) within each population were calculated using the software FSTAT version 2.9.3 (Goudet, 1995). The program LOSITAN (Antao et al., 2008) was used to detect loci potentially under selection with two options: ‘neutral mean F ST’ and ‘force mean F ST’.
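To make the per-locus statistics named above concrete, the sketch below computes H O, H E, PIC and F IS for one locus in one population from diploid genotypes. It is an illustration under stated assumptions, not a reimplementation of the programs used in this study: H E is taken as 1 − Σp², PIC follows the standard Botstein formula, F IS is taken as 1 − H O/H E, and the null allele frequency uses Brookfield's first estimator, (H E − H O)/(1 + H E), which is swapped in for simplicity and is not the estimator implemented in FreeNA. The function name `locus_stats` and the example genotypes are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one locus in one population."""
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]

    h_obs = sum(1 for a, b in genotypes if a != b) / n
    h_exp = 1 - sum(p * p for p in freqs)  # simple (uncorrected) expected heterozygosity

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = h_exp - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )

    f_is = None if h_exp == 0 else 1 - h_obs / h_exp
    # Brookfield (1996) estimator 1; an assumption of this sketch, not FreeNA's method.
    null_brookfield = (h_exp - h_obs) / (1 + h_exp)

    return {"h_obs": h_obs, "h_exp": h_exp, "pic": pic,
            "f_is": f_is, "null_brookfield": null_brookfield}

# Made-up allele sizes (bp) for four individuals.
print(locus_stats([(100, 100), (100, 104), (104, 104), (100, 100)]))
```

A heterozygote deficit (H O below H E) raises both F IS and the estimated null allele frequency in this sketch, which is why the two quantities are hard to separate from genotype data alone.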
Selection of marker panels
To assess the influence of null alleles on genetic diversity and population genetic structure, we established three marker panels based on the following criteria: (i) all of the loci (ALL), (ii) the one-third of the loci (GM3-S11, GM3-S13, GM5-S18, GM3-S22, GM3-S31, GM3-S34, GM5-S44, GM3-S46 and GM3-S64) with the lowest null allele frequencies (LNA) and (iii) the one-third of the loci (GM3-S04, GM3-S12, GM3-S15, GM3-S33, GM3-S35, GM3-S41, GM3-S49, GM3-S51 and GM3-S61) with the highest null allele frequencies (HNA). The three marker panels were used independently to estimate genetic diversity and to analyze population structure.
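The panel definition above amounts to ranking loci by null allele frequency and taking the lowest and highest thirds. A minimal sketch follows; the function name `build_panels` and the placeholder frequencies are hypothetical and do not reproduce the values in table 2.

```python
def build_panels(null_freq):
    """null_freq: dict mapping locus name -> mean null allele frequency."""
    ranked = sorted(null_freq, key=null_freq.get)  # ascending frequency
    third = len(ranked) // 3
    return {
        "ALL": ranked,
        "LNA": ranked[:third],   # lowest null allele frequencies
        "HNA": ranked[-third:],  # highest null allele frequencies
    }

# Placeholder values for 27 hypothetical loci.
example = {"locus%02d" % i: i / 100 for i in range(1, 28)}
panels = build_panels(example)
print(len(panels["LNA"]), len(panels["HNA"]))
```

With 27 loci, each of the LNA and HNA panels contains nine loci, matching the panel sizes listed above.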
Population genetic structure analyses
Pairwise differentiation (F ST) among the four populations was calculated using both GENEPOP version 4.0.11 (Raymond & Rousset, 1995) and FreeNA (Chapuis & Estoup, 2007). The latter applies the excluding null alleles (ENA) correction to reduce the effect of null alleles on estimates of genetic differentiation. Paired t-tests were used to compare the F ST values calculated by these two methods within each marker panel.
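The paired comparison described above can be sketched as follows. With four populations there are six pairwise F ST values per method, giving five degrees of freedom. The function name `paired_t` is hypothetical, the input values are made up, and the P-value lookup against the t distribution is deliberately left out to keep the example within the standard library.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Made-up uncorrected and corrected pairwise F ST values for six population pairs.
uncorrected = [0.110, 0.125, 0.082, 0.151, 0.113, 0.094]
corrected = [0.108, 0.127, 0.080, 0.149, 0.115, 0.093]
print(paired_t(uncorrected, corrected))
```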
The population genetic structure was analyzed using STRUCTURE version 2.3.4 (Pritchard et al., 2000). For cluster identification, we tested K values from 1 to 10, with 30 replicate runs for each K, a burn-in of 100,000 iterations and 200,000 subsequent Markov Chain Monte Carlo iterations. The results were submitted to the online software STRUCTURE HARVESTER version 0.6.94 (Earl & vonHoldt, 2011) (http://taylor0.biology.ucla.edu/structureHarvester/) to determine the optimal K value using the Delta (K) method. The replicate runs were then aligned using CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) and visualized using DISTRUCT version 1.1 (Rosenberg, 2004). To quantify the distinction rate, individuals were considered correctly assigned to their respective populations when the Q-value obtained with CLUMPP was at least 0.6. Because STRUCTURE assumes HWE and linkage equilibrium within clusters, we additionally performed a discriminant analysis of principal components (DAPC), which does not rely on these assumptions, using the R package ADEGENET 1.4–2 (Jombart et al., 2008). After importing the data, we chose the optimal number of clusters and principal components (PCs); other parameters and settings were determined by the script.
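The two decision rules in this paragraph, the Delta K criterion and the Q ≥ 0.6 assignment threshold, can be sketched as below. The Delta K calculation here uses the replicate means of ln P(D) for the second difference and the replicate standard deviation at K for the denominator, which is one common reading of the Evanno et al. formula and may differ in detail from what STRUCTURE HARVESTER does. The function names `delta_k` and `count_correct` and all input values are hypothetical.

```python
from statistics import mean, stdev

def delta_k(ln_prob):
    """ln_prob: dict mapping K -> list of ln P(D) values across replicate runs.

    Returns Delta K for every K that has both neighbours K-1 and K+1.
    Raises ZeroDivisionError if the replicates at some K are all identical.
    """
    means = {k: mean(v) for k, v in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(ln_prob[k])
    return result

def count_correct(q_rows, expected_cluster, threshold=0.6):
    """Count individuals whose Q for their population's cluster reaches the threshold."""
    return sum(1 for q, c in zip(q_rows, expected_cluster) if q[c] >= threshold)

# Made-up ln P(D) values with two replicates per K.
runs = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -894], 4: [-885, -889]}
dk = delta_k(runs)
print(max(dk, key=dk.get))
print(count_correct([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]], [0, 0, 1]))
```

Note that Delta K is undefined for the smallest and largest K tested, so K = 1 can never be selected by this criterion.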
Results
Genomic sequencing and characteristics of the OFM's microsatellites
In total, 4916 Mb of paired-end sequence data with a read length of 300 bp were generated by the Illumina MiSeq system. Raw data sequences were submitted to the Short Read Archive of the National Center for Biotechnology Information under accession number SRP074918. The high-quality reads were assembled into 65,534 scaffolds with a total length of 243 Mb, a mean size of 2061 bp, an N50 of 2611 bp and approximately 20-fold coverage.
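For readers unfamiliar with the assembly statistics reported above, the sketch below shows how an N50 is commonly computed from scaffold lengths and how the approximate fold coverage follows from the reported totals. The function names `n50` and `fold_coverage` are hypothetical, and dividing the raw yield (4916 Mb) by the assembly length (243 Mb) is an assumption about how the roughly 20-fold figure was obtained.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")

def fold_coverage(sequenced_mb, assembly_mb):
    """Sequenced bases divided by assembled length."""
    return sequenced_mb / assembly_mb

# Toy scaffold lengths, not the OFM assembly.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
# Totals reported in the text.
print(round(fold_coverage(4916, 243), 1))
```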
We isolated 56,674 microsatellites from the genomic sequences. No mononucleotide repeats were obtained under our selection parameters, while 36,789 (64.913%) dinucleotide, 14,135 (24.941%) trinucleotide, 5005 (8.831%) tetranucleotide, 605 (1.068%) pentanucleotide and 140 (0.247%) hexanucleotide repeats were included (table S1). The average length of the different nucleotide repeats ranged from 13 to 35 bp, while the corresponding frequencies and densities decreased from 75.55 to 0.29 loci Mb−1 and 956.975 to 10.03 bp Mb−1, respectively. The number of microsatellites decreased as the motif length increased. There were far more dinucleotide repeats than other repeats in our study, as is common in other species of Lepidoptera (A'Hara & Cottrell, 2013; Pavinato et al., 2013), Thysanoptera (Yang et al., 2012) and Coleoptera (Bouanani et al., 2014).
Validation and null alleles of microsatellite loci
In total, 11,584 loci were suitable for primer design. We retained one primer pair for each locus during primer design with the software QDD. Of the 64 primer pairs that passed the stringent selection criteria and were tested on eight individuals, 36 were retained, while the other 28 were discarded because they had an amplification rate lower than 75%, generated more than two peaks during genotyping or showed low polymorphism. After the second round of screening, 27 primer pairs were retained (table 1, table S2). The BLASTX search showed that 14 loci are located in putative coding regions, while the others might be from non-coding regions of the OFM genome.
BLASTx, results using the BLASTx algorithm against GenBank.
*Indicates the locus deviated from Hardy–Weinberg equilibrium (HWE) in one population.
**Indicates the locus deviated from HWE in two populations.
***Indicates the locus deviated from HWE in three populations.
****Indicates the locus deviated from HWE in all four populations.
Among the 27 validated microsatellites, six loci had low frequencies of null alleles in the four populations, ranging from 0.002 to 0.078, ten loci had moderate frequencies, ranging from 0.140 to 0.190, and 11 loci had high frequencies, ranging from 0.218 to 0.325 (table 2). Among the four tested populations, CD had the highest null allele frequency (0.213), followed by NJ (0.185), SL (0.176) and SY (0.166).
ALL, LNA and HNA in the first column indicate the three marker panels established based on the frequencies of null alleles.
Genetic diversity of the OFM based on the three marker panels
Population-level tests showed that all of the loci were highly polymorphic. The H O and H E varied from 0.081 to 0.702 and from 0.194 to 0.850, respectively. The allele numbers ranged from 4 to 40, with an average value of 13.7 per locus. The inbreeding coefficient ranged from −0.500 to 1.000, with an average value of 0.486, and the PIC was distributed between 0.167 and 0.810, with an average value of 0.544 (table 3), after applying Holm's correction (Gaetano, 2013).
A R, allelic richness per locus; H O, observed heterozygosity; H E, expected heterozygosity; F IS, inbreeding coefficient; HWE, average P-value of Hardy–Weinberg equilibrium; PIC, polymorphic information content.
Within the marker panel ALL, 12 loci showed significant deviation from HWE (P < 0.05) in different populations (table 3), which might be caused by heterozygote deficiencies and the presence of null alleles, corroborating the results previously observed in most lepidopteran microsatellites (An et al., 2014). The H O value was lower than the H E value in all of the loci among all of the populations. A high F IS was found in the loci GM3-S01, GM3-S15, GM5-S23, GM3-S32, GM3-S35, GM3-S41, GM3-S49, GM3-S51 and GM3-S61 in some populations, and might be caused by the Wahlund Effect in the CD population (Wahlund, 1928; Dharmarajan et al., 2013). Thirty-six pairs of loci (11 pairs in the CD population, 12 pairs in the SL population, 10 pairs in the NJ population and 3 pairs in the SY population) of the 1404 pairwise comparisons between each pair of loci in the four populations showed a significant LD after correction for multiple tests. Since no pair of loci showed LD in all tested populations of the OFM, the presence of LD is unlikely to be due to physical linkage, consistent with previous studies (Torriani et al., 2010; Wei et al., 2015).
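The count of 1404 comparisons follows from 27 loci taken two at a time (351 pairs) in each of four populations. The sketch below reproduces that count and shows a Holm step-down correction of the kind mentioned for the diversity statistics; applying Holm's procedure to the LD tests is an assumption of this example, since the text does not state which multiple-test correction was used there. The function names `ld_comparisons` and `holm_reject` and the P-values are hypothetical.

```python
import math

def ld_comparisons(n_loci, n_populations):
    """Number of locus-pair tests across all populations."""
    return math.comb(n_loci, 2) * n_populations

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one True/False per input P-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P-values are retained
    return reject

print(ld_comparisons(27, 4))
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```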
When the marker panels LNA and HNA were used for genetic diversity estimations, the average A R values for the four populations calculated from HNA were the highest, followed by those from LNA and ALL, although there were no statistically significant differences (table 4). The H O values for these two panels were clearly lower than the H E values; however, the differences between H O and H E calculated from the HNA panel were larger than those from LNA, which indicates that the presence of null alleles might explain the low H O and the deviation from HWE. The F IS values of loci from HNA (0.715, 0.702, 0.666 and 0.714 in CD, NJ, SL and SY, respectively) were approximately two to four times those from LNA (0.334, 0.235, 0.211 and 0.183 in CD, NJ, SL and SY, respectively).
The neutrality test showed that S22 was a candidate for positive selection and S12 for balancing selection. The other 25 markers showed no signal of selection. S12, S22 and another ten loci deviated significantly from HWE based on the average P-value (<0.05) across populations (table 3).
Population genetic structure of the OFM using three marker panels
No significant differences were observed between the uncorrected pairwise F ST values and those corrected for null alleles using the excluding null alleles method (paired t-test: ALL, P = 0.99995; LNA, P = 0.99988; HNA, P = 0.99989) (table S3), which suggests that null alleles did not affect this analysis.
A STRUCTURE analysis using all 27 microsatellites (marker panel ALL) divided the individuals from the four geographic populations into five clusters. The NJ and SY populations together formed one cluster, the SL population formed another one, while the CD population was separated into three distinct clusters (fig. 1a). To avoid the model assumptions of STRUCTURE, an additional DAPC was performed, which clearly divided the individuals from the four sampled populations into three genetic clusters (fig. 2a). This was in agreement with the clusters determined by STRUCTURE, although the CD population was regarded as one integral unit rather than three clusters. Our analyses indicated that the microsatellite markers validated in our study were powerful for detecting the population genetic structure.
When LNA and HNA were subjected to STRUCTURE analyses, the populations were clearly divided into three clusters. To make the results comparable and consistent, the STRUCTURE results of the ALL panel were also drawn for K = 3 (fig. 1b). Almost all of the loci in the three marker panels displayed a very high degree of distinction in the four natural populations (fig. 1b–d). The average posterior probability (Q) was high in all cases (CD: ALL, Q = 0.969; LNA, Q = 0.863; HNA, Q = 0.811; SL: ALL, Q = 0.972; LNA, Q = 0.966; HNA, Q = 0.924; NJ: ALL, Q = 0.956; LNA, Q = 0.894; HNA, Q = 0.837; SY: ALL, Q = 0.981; LNA, Q = 0.922; and HNA, Q = 0.850), although 1, 4 and 10 individuals were not correctly assigned to their particular populations in the ALL, LNA and HNA panels, respectively. Similar results for the population genetic structure were obtained using DAPC (fig. 2b, c).
Discussion
The genome sizes of sequenced lepidopteran species are usually several hundred megabases, such as Plutella xylostella (343 Mb; You et al., 2013), Papilio machaon (281 Mb), Papilio xuthus (244 Mb) (Li et al., 2015) and Papilio glaucus (376 Mb; Cong et al., 2015). Although the size of the OFM genome is currently not available, it is assumed that the 243-Mb sequences obtained in our study represent a large portion of the OFM genome.
Microsatellites are versatile molecular markers in population genetics and evolution (Estoup & Angers, 1998). Changes, such as mutations and substitutions in flanking region sequences, may prevent the primer from annealing to template DNA during amplification of the microsatellite locus by PCR, resulting in a null allele. The preferential amplification of short alleles owing to inconsistent DNA template quality or quantity, and slippage during PCR amplification (Gagneux et al., 1997), might also result in microsatellite null alleles. Additionally, the enzyme activity can be reduced heavily at the end of the amplification reaction, leaving an unavoidable null allele. Thus, markers should be validated in multiple populations to minimize null allele occurrence (Guichoux et al., 2011). In this study, we used eight individuals from eight geographical populations in the native range of the OFM for initial testing and 95 individuals from four natural populations for population-level validation. All loci used for examination had trinucleotide repeats except for GM5-S18, GM5-S23 and GM5-S44 (table 1), which had pentanucleotide repeats. High, moderate and low null allele frequencies were found among the three loci with pentanucleotide repeats, which might indicate that there is no relationship between motif size and null allele frequency (table 2). Nevertheless, high null allele frequencies were present in most of the remaining loci.
Our study, as well as previous reports (Torriani et al., 2010; Kirk et al., 2013; Zheng et al., 2013; Wei et al., 2015; Zheng et al., 2015), indicated that the OFM is plagued by null alleles, which is common in lepidopteran species (Sinama et al., 2011; Jiang et al., 2014), but not in all species (Lebigre et al., 2015; Wang et al., 2016). The null allele frequencies in CD were higher than in the other three populations for almost all of the loci developed in this study. This is consistent with the high genetic diversity in the CD population that was revealed in a previous study (Wei et al., 2015), and this may lead to a high mutation rate in the microsatellite-flanking sequences.
The presence of null alleles can sometimes cause heterozygosity deficiencies that lead to deviations from HWE. Because null alleles create false homozygotes, they are problematic for the exclusion of the true parents in parentage analyses (Dakin & Avise, 2004). The presence of null alleles may reduce the power to assign individuals to populations correctly, erroneously inflate levels of genetic differentiation (Carlsson, 2008), and lead to underestimates of population genetic diversity in estimators that rely on HWE (Sousa et al., 2005; Chapuis & Estoup, 2007). In our study, the deviation of most loci from HWE, low H O values and relatively high F IS may indicate that the null alleles affect the estimations of genetic diversity, although the weak flight ability of the OFM might also cause high rates of inbreeding in the species (Torriani et al., 2010; Wei et al., 2015). The results of the neutrality test also suggest that selection might contribute to the deviation at some loci.
The three panels, with different null allele frequencies, divided the four populations into three groups using STRUCTURE and DAPC, which is congruent with a previous study (Wei et al., 2015). When we used all of the loci for a STRUCTURE analysis, the CD population was divided into three clusters. The presence of subpopulations might be caused by the specimen sampling location and by the model implemented in the STRUCTURE algorithm (Pritchard et al., 2000; Falush et al., 2003). There were 10, 4 and 1 individuals that were not assigned to their particular populations correctly when the marker panels HNA, LNA and ALL were used, respectively, indicating that null alleles affect individual assignments, as previously reported (Carlsson, 2008). The STRUCTURE and the DAPC results were congruent with the analysis of Wei et al. (2015), in which the CD, SL, NJ and SY populations were divided into CD, SL, CE (Central) and NO (Northern) groups. As shown in table 4, the average A R values for the four populations calculated from HNA were the highest and those from LNA were lower. The H O values for these two panels were clearly lower than the H E values, indicating that the presence of null alleles might cause the low H O and the deviation from HWE. The F IS values of loci from HNA were much higher than those from LNA. However, when LNA and HNA were subjected to STRUCTURE analyses, the populations were clearly divided into three clusters. It can be concluded that null alleles influenced estimations of genetic diversity parameters but not the inferred genetic structure of the OFM.
Conclusions
We characterized the distribution of microsatellites in the genomic sequences and developed a novel set of microsatellite markers for the OFM. A population-level validation showed that the microsatellites of the OFM were plagued by the presence of null alleles. Comparisons among the three marker panels showed that null alleles could influence the genetic diversity and individual assignments, but not the division of the population genetic structure. The microsatellites developed in our study are useful markers for further genetic studies of the OFM.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0007485316000936.
Acknowledgements
The research was funded by the Beijing Natural Science Foundation (Grant no. 6162010), National Basic Research Program of China (Grant no. 2013CB127600) and the National Natural Science Foundation of China (Grant no. 31472025).