INTRODUCTION
In recent years, single nucleotide polymorphisms (SNPs) have become the marker of choice for estimating genetic diversity and inferring the evolutionary history of genes and species populations. The choice of genetic markers (coding or non-coding) is of prime importance, because inferences are strongly influenced by functional differences between genomic regions and thus by the differential evolutionary pressures these regions experience in populations (Das et al. 2004). Numerous studies in model organisms have indicated that limiting the choice of genetic markers is the first step in answering specific evolutionary questions. For example, in Drosophila, a careful choice of markers across the genome has made it possible to distinguish the role of natural selection from that of genetic drift (Baines et al. 2004; Das et al. 2004). This approach has not been explored much in pathogenic organisms; thus, our understanding of the evolutionary genetics of drug resistance and host invasion mechanisms is still underdeveloped.
Malaria is a severe, vector-borne infectious disease, endemic to many tropical and subtropical countries, with 300–500 million cases and 1 million deaths per year (WHO, 2008). Among the 4 species of the genus Plasmodium that cause human malaria, Plasmodium falciparum and P. vivax are the 2 most prevalent. While the former causes severe, complicated and often fatal malaria, the latter is considered rarely lethal but can reside inside human hosts for longer periods and burdens populations with high morbidity. This basic epidemiological difference has diverted more research attention towards P. falciparum, with comparatively less focus on P. vivax. However, the recently published genome sequence of P. vivax (Carlton et al. 2008a), and associated comparative genomic studies between P. vivax and P. falciparum (Carlton et al. 2008a, b; Das et al. 2009), have provided baselines for understanding basic genomic differences between these two species, renewing hopes of controlling both forms of malaria.
Several past population genetic studies of P. vivax have provided important information on the evolutionary history of this species (Cornejo and Escalante, 2000; Cui et al. 2003; Jongwutiwes et al. 2005; Leclerc et al. 2004; Lim et al. 2005; Mu et al. 2005). However, the molecular genetic markers used in these studies (microsatellites, SNPs) were not uniform, nor were the genetic regions (nuclear or mitochondrial genome) the same. In most cases, known functional genes (MSP, CSP, Duffy binding protein genes, etc.), mostly antigenic targets of vaccine development, exhibited very high genetic diversity (Figtree et al. 2000; Cole-Tobian and King, 2003; Rayner et al. 2005; Prajapati et al. 2006; Grynberg et al. 2008; Hawkins et al. 2008). On the other hand, limited genetic diversity has been reported for some drug targets (Na et al. 2004; Imwong et al. 2005; Leclerc et al. 2005).
Thus, the evolutionary history of different genes and populations of the species is still unclear, primarily because of the selection of non-uniform genetic markers in different studies. To understand the evolutionary history more clearly, it is very important to disentangle the differential roles of natural selection and demography on different genomic regions (Das et al. 2004). It is therefore of prime importance to choose appropriate genetic markers to seek answers to the questions at hand, if apposite measures (e.g., population-specific drugs and vaccines) for malaria control are to be taken. This is especially relevant in countries like India where about half of the total malaria cases are reported to be due to P. vivax infection (Singh et al. 2009). In India, past studies of this species were restricted to a few enzymes (Joshi et al. 1989, 1997) and genes (Kim et al. 2006; Prajapati et al. 2006; Rajesh et al. 2007; Hawkins et al. 2008; Thakur et al. 2008), and utilization of genetic diversity data for evolutionary inferences about genes and populations of Indian P. vivax has been impractical until now.
We report here the development of 11 putatively neutral DNA fragments from P. vivax, located in a chromosomal region that is highly conserved and syntenic with P. falciparum. We used these markers to estimate different population genetic parameters in a population sample of P. vivax collected in the northeastern region of India. Different tests of neutrality indicate that these loci could be used to infer the origin and evolutionary history of one of the most widespread human malaria parasites.
MATERIALS AND METHODS
Plasmodium vivax genetic region and details of the DNA fragments
We selected a ~200 kb DNA region located on chromosome 13 of P. vivax (Carlton et al. 2008a) that is in conserved synteny with P. falciparum chromosome 14 and contains 2 of the most important genes for the survival of the parasite, aspartic protease (PvPM4) and circumsporozoite protein (CSP). We scanned the ~200 kb region present on ctg_6877 of P. vivax (CM000454; data retrieved from PlasmoDB release 5.4, http://www.plasmodb.org) from nucleotide position 1 725 175 to 1 933 181. This region was found to contain 44 genes (Fig. 1), of which 41 are orthologous to P. falciparum genes. Boundaries of predicted genes in P. vivax were based on P. falciparum gene models, which have been used to annotate genes in P. vivax. To develop putatively neutral DNA markers in P. vivax, we randomly selected 27 non-coding DNA fragments (17 intergenic and 10 introns; Fig. 1) and designed primers (PrimerSelect, a component of DNASTAR, Madison, WI, USA) to amplify 300–750 bp DNA fragments of the non-coding regions. A detailed list of primers is provided in Table 1 and the location of each primer pair is indicated in Fig. 1. DNA sequences reported in this article can be found in GenBank under Accession numbers HM045910 to HM046161.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195343-47511-mediumThumb-S0031182010000533_fig1g.jpg?pub-status=live)
Fig. 1. Schematic overview of the ~200 kb DNA region located on chromosome 13 of Plasmodium vivax. No component of the figure is drawn to scale.
Table 1. Details of the primer sequences employed in amplification of 17 non-coding DNA fragments in Indian Plasmodium vivax along with their annealing temperatures (°C)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195645-04974-mediumThumb-S0031182010000533_tab1.jpg?pub-status=live)
Study area and molecular methods
Finger-prick blood samples from P. vivax-positive malaria patients in Sonapur, Assam, a northeastern state of India (26°11′N, 91°44′E), were collected in clinical studies during March–April 2007. We chose Assam for this study because the region is highly endemic for malaria (Joshi et al. 2007) and P. vivax is one of the predominant species. All necessary clearances were obtained from the ethical committee of the National Institute of Malaria Research, New Delhi, India, and informed consent was obtained from the patients. Blood samples were spotted on Whatman filter paper, dried and stored at room temperature in individually sealed plastic bags containing desiccant packets until DNA extraction was performed. Genomic DNA was extracted from these dried blood spots using the QIAamp mini DNA isolation kit (Qiagen, Germany) following the standard Qiagen protocol. The genomic DNA from each isolate was then analysed using Plasmodium species-specific primers (Johnston et al. 2006) to rule out mixed infection with P. falciparum. Sixteen samples showing pure P. vivax infection by PCR were included in the study. All amplification reactions were carried out in a final volume of 25 μl, which included 0·3 μM of each primer (forward and reverse), dNTPs at 0·2 mM, 2·5 mM MgCl₂ and 1 unit of Taq DNA polymerase (Bangalore Genei) with 1X Taq DNA polymerase buffer. Amplification conditions included an initial denaturation at 94°C (5 min) and 35 cycles of denaturation at 94°C (50 sec), annealing at a fragment-specific temperature (Table 1) and extension at 72°C for 50 sec. A final extension step lasted for 5 min at 72°C. Five μl of amplified PCR product for each DNA fragment were run on a 2% agarose gel to check the quality of amplification.
If a single band was present and no primer-dimers were visible, the PCR products were taken forward for sequencing. In that case, the products were purified with Exo-SAP (Fermentas, Life Sciences) in a thermal cycler at 37°C for 30 min and 85°C for another 15 min. For sequencing reactions we used about 2–4 μl of the purified PCR product, 4 μl of big dye terminator (BDT) ready reaction mix, and 0·8 pmol of primer, and cycle sequencing was performed in a thermal cycler as follows: initial denaturation at 95°C for 5 min, followed by 25 cycles of denaturation at 95°C for 10 sec, annealing at 50°C for 5 sec and extension at 60°C for 4 min. The sequencing reaction products were then transferred to a 96-well plate and DNA sequencing was performed on a 3730XL DNA Analyzer (Applied Biosystems), an in-house facility of NIMR. Each fragment was sequenced in both the forward and reverse directions (2X coverage), and the reads were assembled and edited using the SeqMan computer program (DNASTAR, Madison, WI, USA). Homologous DNA fragments were then aligned using the MegAlign program of DNASTAR following the ClustalW algorithm. Insertion and deletion mutations were not considered in the analyses, since mutation rates of SNPs and indels are reported to be dissimilar (Väli et al. 2008) and would thus affect uniform inference of population genetic parameters (see below).
Estimation of population genetic parameters
We considered only SNPs in estimating population genetic parameters of our Indian population of P. vivax. Based on the SNP data, several haplotypes (alleles) were derived for each of the 11 fragments. Estimations of different diversity parameters, such as haplotype diversity (Hd) (Nei, 1987), and 2 measures of nucleotide diversity, θw (Watterson, 1975) and π (Tajima, 1983), were performed using DnaSP version 4.50 (Rozas et al. 2003). Whereas estimation of π is based on the mean number of pair-wise nucleotide differences in a sample (Tajima, 1983), θw is based on the number of segregating mutations (Watterson, 1975). To test whether the observed allele frequency spectrum was in accordance with the expectations from the neutral model of evolution in each DNA fragment, Tajima's D (TD) (Tajima, 1989) was calculated, which is based on the normalized discrepancy between π and θw. A test of significance of TD was conducted after coalescent simulation following a standard neutral model using DnaSP. We also calculated other tests of neutrality, namely Fu and Li's D* (FLD) and Fu and Li's F* (FLF) (Fu and Li, 1993). Both the FLD and the FLF rely on the differences between the number of polymorphic sites in external branches (polymorphisms unique to an extant sequence) and the number of polymorphic sites in internal phylogenetic branches (polymorphisms shared by extant sequences) (Zhang and Ge, 2007). For both tests, negative values indicate an excess of low-frequency polymorphisms, whereas positive values indicate an excess of intermediate-frequency polymorphisms (Tajima, 1989; Fu and Li, 1993; Fu, 1997).
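As a rough illustration of how these summary statistics relate to one another, the following Python sketch computes the number of segregating sites (S), π, θw and Tajima's D from a set of aligned, gap-free sequences. It is not the DnaSP implementation: the function name `diversity_stats` and the toy alignment are hypothetical, S is counted here as segregating sites rather than total mutations, and the significance test by coalescent simulation is not included.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (S, pi, theta_w, tajima_d) for aligned, gap-free sequences.

    pi and theta_w are totals for the whole alignment; divide them by the
    alignment length to obtain per-site values. tajima_d is None when D is
    undefined (no segregating sites, or fewer than 4 sequences).
    """
    n = len(seqs)
    if n < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])

    # S: number of alignment columns with more than one nucleotide state.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)

    # pi: mean number of nucleotide differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: S scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1

    if S == 0 or n < 4:
        return S, pi, theta_w, None

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    tajima_d = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return S, pi, theta_w, tajima_d


# Hypothetical toy alignment of 4 sequences (not data from this study).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
S, pi, theta_w, d = diversity_stats(toy)
print(S, round(pi, 3), round(theta_w, 3), d)
```

In the toy alignment the variants are at intermediate frequency, so π slightly exceeds θw and D is positive; an alignment dominated by singletons would instead give a negative D.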
Furthermore, 2 measures of linkage disequilibrium (LD) were estimated: one between each possible pair of SNPs, by estimating r² (Weir, 1996) using Haploview (Barrett et al. 2005), and the other between haplotypes among the 11 DNA loci, by estimating the index of association, I_A (Brown et al. 1980; Haubold et al. 1998), using Multilocus 1.3 (Agapow and Burt, 2001). Correlation between LD values and the physical distance between each pair of SNPs was also calculated using Pearson's correlation coefficient (r). We also estimated the recombination parameter (R = 4Nₑr) (Hudson, 1987) for only 4 polymorphic DNA fragments using DnaSP 4.50, as no realistic estimate could be obtained for the other 7 fragments. Correlation between nucleotide diversity and the recombination rate was also computed using Pearson's correlation coefficient (r).
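The pairwise LD measure can be sketched in a few lines of Python. The sketch assumes that each isolate carries a single haploid genotype, so that the two-SNP haplotype of an isolate is observed directly, and that both SNPs are biallelic; the function name `r_squared` and the example allele strings are hypothetical, and this is not the Haploview implementation.

```python
from math import comb


def r_squared(snp_a, snp_b):
    """r^2 between two biallelic SNPs typed on the same haploid isolates.

    snp_a and snp_b are equal-length sequences of allele labels, one entry
    per isolate, in the same isolate order.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) == 0:
        raise ValueError("both SNPs must be typed on the same isolates")
    n = len(snp_a)
    ref_a, ref_b = snp_a[0], snp_b[0]  # arbitrary reference alleles

    p_a = sum(x == ref_a for x in snp_a) / n
    p_b = sum(y == ref_b for y in snp_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(snp_a, snp_b)) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Hypothetical alleles in 4 isolates (not data from this study).
print(r_squared("AAGG", "CCTT"))  # alleles always co-occur: complete LD
print(r_squared("AAGG", "CTCT"))  # alleles combine at random: no LD

# Number of distinct SNP pairs that can be formed from 18 SNPs.
print(comb(18, 2))
```

With 18 SNPs there are 153 distinct pairs, which is the number of comparisons shown in the LD plot (Fig. 4).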
RESULTS
Distribution of SNPs in introns and intergenic regions
Out of the 27 selected non-coding fragments (17 intergenic and 10 introns), only 17 fragments (10 intergenic and 7 introns) could be amplified and sequenced (Fig. 1); this discrepancy might be due to sequence polymorphism at the primer-binding sites. We sequenced a total of 7012 bp of non-coding genomic DNA from each isolate of Indian P. vivax in which all 17 fragments could be successfully sequenced. Since amplification and/or sequencing of all 17 DNA fragments was not possible in all 16 isolates, the number of isolates varied among fragments (from a minimum of 8 to a maximum of 16). Out of the 17 amplifiable fragments, only 11 (7 intergenic and 4 introns) were found to be polymorphic for single nucleotide mutations. In total, 18 SNPs (termed PVS1–PVS18) were detected in these 11 polymorphic DNA fragments, of which only 2 were singletons (SNPs that occur only once in the population). Considering that this is the first report of SNPs in Indian P. vivax, we consider all these SNPs to be novel. In 3 of the 15 isolates for one intergenic fragment (P13), 2 adjacent nucleotides had overlapping peaks for CC/AT and another 3 isolates had overlapping peaks for CT/GC, which might be due to a mutation in the infection (Joy et al. 2008). This is suggested because all the other fragments in these isolates had a single allele. In 1 intron fragment (P3), a fixed difference was found in the Indian isolates when compared to the P. vivax reference sequence (http://www.plasmodb.org). Moreover, the number of SNPs varied between fragment types; the intergenic fragments contained 13 SNPs and the introns only 5. On average, 1 SNP per 390 nucleotides was found in the non-coding DNA fragments of Indian P. vivax. For intergenic regions this frequency was somewhat higher (1 SNP in 320 bases) than in the introns (1 SNP in 571 bases).
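The SNP densities quoted above follow from simple arithmetic, which can be checked with a short Python sketch. Only the total sequenced length (7012 bp), the SNP counts (13 intergenic, 5 intronic) and the reported densities come from the text; the per-category sequence lengths are not stated, so the values derived below are back-calculated approximations and should be read as an assumption-laden consistency check.

```python
# Values taken from the text.
total_bp = 7012          # non-coding sequence per isolate (all 17 fragments)
snps_intergenic = 13
snps_intron = 5
total_snps = snps_intergenic + snps_intron

# Overall density: one SNP per this many nucleotides.
overall_bp_per_snp = total_bp / total_snps
print(round(overall_bp_per_snp))  # 390

# Back-calculated (approximate) lengths implied by the reported densities
# of 1 SNP per 320 bases (intergenic) and 1 SNP per 571 bases (introns).
implied_intergenic_bp = snps_intergenic * 320
implied_intron_bp = snps_intron * 571
implied_total_bp = implied_intergenic_bp + implied_intron_bp
print(implied_intergenic_bp, implied_intron_bp, implied_total_bp)
```

The implied total differs from 7012 bp by only a few bases, which is what rounding of the two densities would produce if they were computed over all sequenced fragments of each type.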
Haplotype and nucleotide diversity
With SNPs segregating in 11 DNA fragments, we ascertained different haplotypes and estimated the haplotype diversity (Hd) (Nei, 1987) for each fragment. The number of haplotypes ranged from 1 to 5 and the Hd value ranged from 0·125 to 0·822 (Table 2). The average Hd for introns was 0·238 and for intergenic regions 0·349 (Table 2), indicating that haplotype diversity is greater in intergenic DNA fragments than in introns. This pattern is very similar to that of the density of SNPs (Table 2), where more SNPs were found to be segregating in intergenic DNA fragments (see above). Furthermore, 2 different measures of nucleotide diversity, θw (Watterson, 1975) per site and π (Tajima, 1983) per site, were estimated separately for each of the 11 polymorphic fragments (Table 2) and varied across the fragments. The average θw per site was 0·0012 for intergenic regions and 0·0005 for introns, and the average values of π per site were 0·0015 (intergenic regions) and 0·0007 (introns). Since the DNA fragments come from a contiguous DNA region, we determined the pattern of nucleotide diversity across the whole ~200 kb region by plotting the values of θw and π of all 17 fragments across the region (Fig. 2). Both estimates of nucleotide diversity are largely parallel to each other, with values of π being higher than those of θw in the majority of the fragments. However, it is clear that the introns display less nucleotide diversity than the intergenic regions (Fig. 2), corresponding to the results for the number of SNPs and haplotype diversity, and also in agreement with the average θw and average π for these 2 types of fragments (see above). Interestingly, both measures of nucleotide diversity fluctuate drastically across the whole region (in comparison to the average π and θw values, Fig. 2).
The 2 most diverse fragments were the 4th (intergenic) and the 13th (intergenic), and nucleotide diversity drops in and around these 2 fragments (Fig. 2). Furthermore, 1 of the fragments, P16 (Fig. 1), was found to be monomorphic and the adjoining P17 (Figs 1 and 2) had only a single SNP. Interestingly, 3 fragments (P13, P14 and P15) display notably higher nucleotide diversities than the adjacent downstream fragments P16 and P17, which flank the circumsporozoite protein (CSP) gene (Fig. 1).
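For readers who wish to see how a haplotype diversity value arises from haplotype counts, the sketch below implements the usual sample-size-corrected estimator, Hd = n/(n − 1) × (1 − Σpᵢ²), where pᵢ is the frequency of the i-th haplotype among n isolates. The function name `haplotype_diversity` and the example haplotype labels are hypothetical, and the code is not the DnaSP routine used in the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Sample-size-corrected haplotype diversity for a list of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)


# Hypothetical example: 16 isolates, one of which carries a variant haplotype.
sample = ["hapA"] * 15 + ["hapB"]
print(round(haplotype_diversity(sample), 3))  # 0.125

# A monomorphic fragment has zero haplotype diversity.
print(haplotype_diversity(["hapA"] * 16))  # 0.0
```

As an illustration only, a fragment typed in 16 isolates with a single variant haplotype yields Hd = 0·125, the lower end of the range reported above; the actual haplotype counts behind each value are those given in Table 2.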
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195348-84680-mediumThumb-S0031182010000533_fig2g.jpg?pub-status=live)
Fig. 2. Pattern of variation in nucleotide diversity across non-coding DNA fragments in the ~200 kb DNA region of Indian Plasmodium vivax.
Table 2. Details of the DNA fragments and population genetic parameters in Indian Plasmodium vivax
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195715-63004-mediumThumb-S0031182010000533_tab2.jpg?pub-status=live)
* Total number of nucleotide base pairs.
Tests of neutrality
To determine whether any of the 11 polymorphic DNA fragments deviate from the standard neutral model of molecular evolution, we conducted several statistical tests. TD (Tajima, 1989) values were calculated and plotted across the studied fragments of Indian P. vivax (Fig. 3). Apart from 3 fragments (2 intergenic and 1 intron), TD values were nominally positive in all other 8 fragments; however, none of the values was significantly different from zero, suggesting that all 11 DNA fragments are putatively evolving neutrally in this Indian population of P. vivax. Interestingly, however, the lowest TD value in the study was observed in P17 (TD=−1·1622, Table 2, Fig. 3), whereas the adjoining polymorphic fragment (P15) has a considerably higher TD value (1·0766). Similarly, the FLD (Fu and Li, 1993) values were generally found to be positive across the fragments but none of them was statistically significant (Table 2). As for TD, the lowest (and negative) value of FLD was found for P17.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195346-30031-mediumThumb-S0031182010000533_fig3g.jpg?pub-status=live)
Fig. 3. Pattern of Tajima's D values across 11 polymorphic non-coding DNA fragments in Indian Plasmodium vivax.
Linkage disequilibrium (LD) and recombination
To determine whether the 18 SNPs detected in Indian P. vivax were evolving independently of each other, we estimated linkage disequilibrium (LD) between all possible pairs of SNPs. The results indicate 2 instances of statistically significant r² values: one between the 3rd (P2) and 5th (P4) SNPs and the other between the 8th and 9th SNPs of a single fragment (P6) (Fig. 4). Notably, the ASP gene is located between the putatively neutral DNA fragments P2 and P4. We also tested the correlation of the LD values with the physical distance (in nucleotide base pairs) between pairs of SNPs and found no correlation (Pearson's r=−0·082, P=0·312), indicating that the 2 statistically significant LDs cannot simply be attributed to physical closeness. Furthermore, to check whether all 11 polymorphic DNA fragments were randomly associated, we calculated the index of association (I_A) (Brown et al. 1980; Haubold et al. 1998) among the fragments. The observed I_A value of 0·189 was statistically non-significant (P>0·05), indicating no association among the haplotypes of the DNA loci presently studied in this northeastern Indian population of P. vivax. The intragenic recombination parameter (R = 4Nₑr) (Hudson, 1987) was also estimated for polymorphic DNA fragments and showed a positive and statistically significant correlation (r=0·972, P<0·05) with nucleotide diversity (π).
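The index of association compares the observed variance of pairwise multilocus distances with the variance expected if loci were independent. The Python sketch below follows one common formulation, I_A = V_O/V_E − 1, where V_O is the variance of the number of loci at which two isolates differ and V_E is the sum of the per-locus variances. It is a minimal sketch and not the Multilocus program: the function name `index_of_association`, the use of population variances, and the example profiles are assumptions, and the randomization test used to assess significance is omitted.

```python
from itertools import combinations
from statistics import pvariance


def index_of_association(profiles):
    """I_A for haploid multilocus profiles.

    profiles: list of tuples, one per isolate, giving the haplotype label
    observed at each locus (same locus order in every tuple).
    """
    if len(profiles) < 3:
        raise ValueError("need at least three isolates")
    n_loci = len(profiles[0])
    pairs = list(combinations(profiles, 2))

    # For every locus, a 0/1 distance for each pair of isolates.
    per_locus = [[int(x[j] != y[j]) for x, y in pairs] for j in range(n_loci)]

    # Total distance per pair: number of loci at which the two isolates differ.
    totals = [sum(d) for d in zip(*per_locus)]

    v_obs = pvariance(totals)
    v_exp = sum(pvariance(d) for d in per_locus)
    if v_exp == 0:
        raise ValueError("at least one locus must be polymorphic")
    return v_obs / v_exp - 1


# Hypothetical profiles at 2 loci (not data from this study).
linked = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
shuffled = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(index_of_association(linked))    # loci perfectly associated
print(index_of_association(shuffled))  # every allele combination present
```

Perfectly associated loci inflate V_O and give a positive I_A, whereas freely combining loci give values near or below zero in small samples.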
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626195343-72336-mediumThumb-S0031182010000533_fig4g.jpg?pub-status=live)
Fig. 4. LD plot (r²) between 153 possible pairs of SNPs in Indian Plasmodium vivax. The strength of statistical significance of LD between a pair of SNPs is represented by the darkness of the boxes. The two completely black squares thus represent statistically significant LD; one between PVS3 and PVS5 and the other between PVS8 and PVS9.
DISCUSSION
The present study aims to develop putatively neutral DNA fragments for understanding the demographic history of a widespread malaria pathogen, P. vivax. Since ideal genetic markers for uncovering the population genetic structure and demography of this species must satisfy several criteria, we considered each of these criteria while developing such markers in this study. Out of the 27 non-coding DNA fragments located in a region of P. vivax chromosome 13 that is syntenic with P. falciparum chromosome 14, only 11 were found to be suitable. This is because the 2 principal criteria for choosing a fragment for population genetic studies (amplification and intra-population variability) were fulfilled only by these 11 non-coding DNA fragments, in addition to several other criteria (see below).
The distribution of SNPs, the associated genetic diversity estimates (e.g., haplotype diversity, nucleotide diversity) and other population genetic parameters (see below) seem to confirm the suitability of these 11 DNA fragments. Sequencing and sequence alignment of all 11 fragments in a malaria-endemic population sample from India, however, revealed different patterns of genetic variation across isolates, fragments and types of genetic region (introns and intergenic regions). The general observation of comparatively higher genetic variation in the intergenic regions corroborates earlier observations in this species (Feng et al. 2003) in another ~100 kb region of P. vivax that is syntenic with P. falciparum. Although no other study in P. vivax has compared the level of diversity in these 2 types of non-coding genetic regions, Jongwutiwes et al. (2002) have reported a complete absence of polymorphism in the introns of the MSP4 and MSP5 genes in P. falciparum as compared to the flanking intergenic regions and exons. Similar observations supporting a low level of polymorphism in introns have also been reported in humans and P. falciparum (Cereb et al. 1997; Hughes, 2000; Volkman et al. 2001). Since introns might be influenced by natural selection that acts on the functional part of their host genes, and introns themselves might bear some functional elements (Fedorova and Fedorov, 2003), the loss of genetic diversity in introns might be due to indirect and direct influences of natural selection (Parsch, 2003).
Although different tests of neutrality provided different results for the 11 non-coding DNA fragments, in no case was a statistically significant deviation from the standard neutral model of molecular evolution found. Taking this together with the results from the different diversity estimates (see above), all 11 fragments could thus be considered putatively neutral genetic markers for population genetic studies in P. vivax. This contention is further substantiated by the fact that all 11 genetic loci were found to be evolving independently of each other (i.e., the I_A value was statistically non-significant).
Overall, P. vivax isolates from the northeastern Indian population sample were found to be fairly highly polymorphic, both in the extent of diversity in SNPs and in all the other employed measures of nucleotide diversity. Interestingly, 2 instances of statistically significant association between different pairs of SNPs were detected. One pair surrounds the aspartic protease (ASP) gene. This seems to be a local event, as none of the other 151 possible combinations showed statistically significant LD (covering the whole ~120 kb region), nor was the I_A value statistically significant among the 11 DNA fragments. Similar findings of very rare LD together with high genetic diversity have also been reported for microsatellites studied in Asian samples of P. vivax (Imwong et al. 2007). This warrants further investigation to see if a definite haplotype structure exists in this particular region, as the parasite ASP gene is considered to be a potential anti-malarial target (Coombs et al. 2001). If such an association is found to exist in other populations as well, one of these SNPs in LD could potentially be considered for association mapping. Moreover, the sudden drop in nucleotide diversity in the fragments P16 (completely monomorphic) and P17 (only 1 SNP) compared to the immediately upstream fragments P13, P14 and P15 is equally interesting. Furthermore, estimates of the tests of neutrality (TD and FLD) were found to be negative in P17. Low diversity and negative TD in putatively neutral genetic regions bear the hallmarks of a recent selective sweep, i.e., genetic hitchhiking (Braverman et al. 1995).
Since the circumsporozoite protein gene (CSP) is flanked by the P16 and P17 DNA fragments and is considered to be an important vaccine candidate in malaria parasites (Cui et al. 2003), this particular DNA region is worth analysing further. All such contentions require verification using several other natural populations as well. The positive correlation between nucleotide diversity and rate of recombination is a novel finding in P. vivax, although it is a known phenomenon in several other organisms (Begun and Aquadro, 1992; Andolfatto and Przeworski, 2001). Whether this phenomenon holds in P. vivax in general remains to be seen, since the present study is limited to a single population sample from India.
In conclusion, the putatively neutral genetic markers developed in this study could be helpful in inferring population genetic structure and demography using SNP markers in a comparatively less-studied malaria pathogen, and in tracing the origin and migration history of this species. Moreover, since India is one of the very few countries where P. vivax and P. falciparum malaria occur at almost equal frequency (Singh et al. 2009), studies of this type could help not only in understanding the molecular epidemiology of P. vivax, but also in devising new measures for an enduring solution to P. vivax malaria infection in the future.
ACKNOWLEDGEMENTS
We thank Dr Vas Dev, in charge of the Guwahati Field Unit of NIMR, for his support in sample collection and V. K. Dua, Director-in-charge, NIMR, New Delhi, for providing facilities. We are indebted to Drs Jane Carlton and Steven Sullivan for suggestions, scientific comments and language modifications of the manuscript. We are also thankful to the two anonymous reviewers for critical and constructive suggestions on an earlier version of the manuscript. All the members of the Evolutionary Genomics and Bioinformatics Laboratory have provided valuable scientific suggestions and help in the present study.
FINANCIAL SUPPORT
The work has been supported by a Junior Research Fellowship to B.G. and the intramural funding of the Indian Council of Medical Research, New Delhi, India.