Published online by Cambridge University Press: 18 November 2004
The pond snail Lymnaea stagnalis is an intermediate vector for the liver fluke Fasciola hepatica, a common parasite of ruminants and humans. Yet, despite being a disease of medical and economic importance, as well as a potentially useful comparative tool, the genetics of the relationship between Lymnaea and Fasciola has barely been investigated. As a complement to forthcoming F. hepatica expressed sequence tags (ESTs), we generated 1320 ESTs from L. stagnalis central nervous system (CNS) libraries. We estimate that these sequences derive from 771 different genes, of which 374 showed significant similarity to proteins in public databases, and 169 were similar to ESTs from the snail vector Biomphalaria glabrata. These L. stagnalis ESTs will provide insight into the function of the snail CNS, as well as the molecular components of behaviour and response to parasitism. In the future, the comparative analysis of Lymnaea/Fasciola with Biomphalaria/Schistosoma will help to understand both conserved and divergent aspects of the host-parasite relationship. The L. stagnalis ESTs will also assist gene prediction in the forthcoming B. glabrata genome sequence. The dataset is available for searching on the world-wide web at http://zeldia.cap.ed.ac.uk/mollusca.html.
Although fascioliasis is often only considered to be a disease of economic importance, up to 17 million people are infected world-wide (Esteban, Bargues & Mas-Coma, 1998; Hurtrez-Bousses et al. 2001). The Lymnaeid snails are the principal intermediate vectors for the disease, caused by the liver fluke Fasciola hepatica (Hurtrez-Bousses et al. 2001). Snails become infected with F. hepatica when a motile miracidium hatches from parasite eggs in faeces-contaminated water and penetrates their body. The cycle is completed when cercariae emerge from the snail, attach to submerged vegetation and are eaten by the definitive host, most frequently a ruminant. The life-cycle of F. hepatica in Lymnaea stagnalis is typical of the digenean platyhelminthes and similar to that of schistosomes in Biomphalaria glabrata, suggesting that a comparison between different systems may help understand commonalities in the host-parasite relationship (Bayne, Hahn & Bender, 2001; Sorensen & Minchella, 2001).
The Lymnaea – Fasciola pairing has been investigated with respect to life-history, prevalence, and population dynamics of the host-parasite interaction, but to a much lesser extent than has Biomphalaria and its Schistosoma parasites (Hurtrez-Bousses et al. 2001; Mas-Coma, Funatsu & Bargues, 2001; Sorensen & Minchella, 2001). Fasciolid genetics has rarely been investigated (Vignoles, Dreyfuss & Rondelaud, 2002; Meunier et al. 2004). As of April 2004, only 140 F. hepatica and 165 L. stagnalis DNA sequences had been submitted to GenBank. In comparison, the S. mansoni genome project is approaching completion (http://www.sanger.ac.uk/Projects/S_mansoni/), and F. hepatica ESTs will be produced shortly, following an initiative at the Sanger Centre (http://www.sanger.ac.uk/Projects/S_mansoni/). Large-scale sequencing of B. glabrata expressed sequence tags (ESTs) is under way, a BAC library is available, and the genome will shortly be sequenced to 4 to 6-fold coverage by the National Human Genome Research Institute (http://biology.unm.edu/biomphalaria-genome/; http://www.genome.gov/11007951).
While the outcome of an attack by a schistosome parasite is genetically determined (Bayne et al. 2001; Jones et al. 2001), genetic factors that control the resistance to trematode infection are relatively poorly understood in both L. stagnalis (Hoek et al. 1997) and the definitive hosts (Piedrafita et al. 2004). In contrast, an understanding of the genetic basis of the host-parasite relationship is more advanced in B. glabrata, where candidate genes involved in parasite clearance (Knight et al. 1998, 1999; Jones et al. 2001), and genes that are upregulated during parasitosis (Léonard et al. 2001; Miller et al. 2001; Raghavan et al. 2003; Nowak et al. 2004) have been isolated (reviewed most recently by Knight, Ongele & Lewis, 2000; Lockyer et al. 2004). Ultimately, only detailed studies of snail genes that are expressed upon exposure to a parasite will improve our understanding of snail resistance to infection.
L. stagnalis has been used as a model in studies of neuronal function (Smit et al. 1992, 2001; Smit, Hoek & Geraerts, 1993), toxicology (Frantsevich et al. 1996; Salanki, 2000), the molecular basis of behaviour and learning (Chase, 2002; Lukowiak et al. 2003), and the evolution of body asymmetry (Hosoiri, Harada & Kuroda, 2003). Snails in general are useful for understanding how natural selection operates (Davison, 2002). Thus, the present study was initiated with the aim of rapidly identifying genes in L. stagnalis, and comparing them with genes in B. glabrata and other organisms, through EST analysis. In this approach, DNA sequences derived from randomly selected cDNAs are used to define the genes expressed by an organism. It is a useful technique where no prior sequence information is available (Daub et al. 2000; Kenyon et al. 2003). As a first step, two L. stagnalis central nervous system (CNS) cDNA libraries were used, both because it has been reported previously that neuropeptide genes moderate behaviour during parasitosis (Hoek et al. 1997), and because L. stagnalis is a model for neuronal function.
Two lambda cDNA libraries were prepared previously using similar methods, both described in detail by Hoek et al. (1997) and Sadamoto et al. (2004). Both lab stocks derive from the same wild collection (circa 1960), from Eempolder, The Netherlands. The libraries were plated on E. coli XL-1-Blue cells and recombinant clones picked at random. The cDNA inserts of clones were amplified by PCR using the universal T3 and T7PL primers and sized on a 1·2% agarose gel. PCR products were cleaned and sequenced using the SAC primer (GGGAACAAAAGCTGGAG) with Big Dye v3.0 and an ABI 3730 automated sequencer. The phage stocks were archived at −70 °C, and are available for research purposes on request. Several clones were also completely sequenced, using an internal, clone-specific sequencing primer.
Vector and poor quality sequences were removed using an automated method, implemented in TRACE2DBEST, then clustered on the basis of sequence identity using CLOBB, both programs within PARTIGENE (Parkinson et al. 2004; available from http://www.nematodes.org/PartiGene/). Briefly, CLOBB is an iterative clustering method where sequences are grouped together on the basis of BLAST similarity. The program identifies ‘superclusters’ of related clusters and attempts to avoid expansion of chimeric clusters. A consensus sequence was derived for each cluster. Each consensus sequence was compared to the public databases (GenBank non-redundant nucleotide and protein databases and B. glabrata ESTs) using the BLAST algorithms (Altschul et al. 1990, 1997) within PARTIGENE. Genes were assigned to gene classes, first by translating them using prot4EST v2.0 (James Wasmuth, unpublished; downloaded from http://www.nematodes.org/PartiGene/), then gene ontology annotating them using GOblet (Hennig, Groth & Lehrach, 2003). The output clusters (e.g. LSC00010) and analyses are available at http://zeldia.cap.ed.ac.uk/mollusca.html. The website also has an equivalent cluster analysis for B. glabrata, and each cluster is fully searchable, by keyword and BLAST.
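The grouping of ESTs on the basis of pairwise similarity can be illustrated with a minimal sketch. This is not CLOBB itself: it assumes that pairwise hits passing some similarity cutoff have already been computed (for example from BLAST output) and simply places any two sequences linked by a hit in the same cluster, i.e. single-linkage clustering with a union-find structure. The function name and the EST identifiers are hypothetical.

```python
def cluster_by_hits(seq_ids, hits):
    """Single-linkage clustering: sequences joined by any hit share a cluster.

    seq_ids: iterable of sequence identifiers.
    hits: iterable of (id_a, id_b) pairs that passed a similarity cutoff.
    Returns a list of sorted clusters, largest first.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for s in parent:
        clusters.setdefault(find(s), []).append(s)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical identifiers and hits, for illustration only.
example = cluster_by_hits(
    ["est1", "est2", "est3", "est4"],
    [("est1", "est2"), ("est2", "est3")],
)
print(example)
```

Plain single-linkage merging of this kind does nothing to stop a chimeric sequence from joining two unrelated clusters, which is the problem that CLOBB's handling of superclusters is described as addressing.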
Prediction of signal peptides and cleavage sites was carried out using SignalP 3.0 (Bendtsen et al. 2004) and ProP 1.0 (Duckert, Brunak & Blom, 2004). Sequence conservation plots were produced using the Sequence Manipulation Suite 2 (http://bioinformatics.org/sms2/), using pre-defined groups of similar amino acids (GAVLI, FYW, CM, ST, KRH, DENQ, P). In the plots, positions with identical amino acids (above a threshold proportion) are shaded black, whereas positions with different, but same-group amino acids are shaded grey.
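The shading rule for the conservation plots can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code of the Sequence Manipulation Suite: the threshold value of 0.8, the treatment of gaps (they count towards the column total) and the function name are all assumptions. Only the amino acid groups are taken from the text.

```python
from collections import Counter

# Groups of similar amino acids, as listed in the text.
GROUPS = ["GAVLI", "FYW", "CM", "ST", "KRH", "DENQ", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def shade_column(column, threshold=0.8):
    """Classify one alignment column as 'black', 'grey' or 'none'.

    column: string with one character per aligned sequence.
    threshold: assumed minimum proportion of sequences that must agree.
    """
    n = len(column)
    residues = [aa for aa in column.upper() if aa in GROUP_OF]
    if not residues:
        return "none"
    # Black: the most common residue reaches the threshold.
    top_residue_count = Counter(residues).most_common(1)[0][1]
    if top_residue_count / n >= threshold:
        return "black"
    # Grey: residues differ, but enough fall within one similarity group.
    top_group_count = Counter(GROUP_OF[aa] for aa in residues).most_common(1)[0][1]
    if top_group_count / n >= threshold:
        return "grey"
    return "none"


print(shade_column("KKKK"), shade_column("KRKH"), shade_column("KDGF"))
```

A column of identical lysines is shaded black, a column mixing K, R and H is shaded grey because all three belong to the KRH group, and a column with residues from four different groups is left unshaded.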
Phylogenetic analyses were performed using PAUP* (version 4.0b10). Evolutionary trees were constructed using the neighbour-joining method (NJ) with distances corrected for multiple hits by using the general time-reversible (GTR) model with between-site rate heterogeneity accounted for by incorporating a proportion of invariant sites and gamma-distributed rates into the model. The rate matrix, base frequencies, proportion of invariant sites and shape parameter (alpha) of the gamma distribution (based on 16 rate categories) were estimated using likelihood by iteration from an initial neighbour-joining tree. The parameters estimated from the initial tree were then used to build a new neighbour-joining tree and the parameters re-estimated. This process was repeated until there was no further improvement in likelihood.
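The iterative procedure described above has a simple control flow, sketched here in Python. The sketch does not call PAUP*: `build_tree` and `fit_params` are hypothetical callables standing in for neighbour-joining tree construction and likelihood-based parameter estimation, and the tolerance and round limit are assumptions. The toy functions at the end exist only to show that the loop terminates.

```python
def iterate_nj(build_tree, fit_params, tol=1e-6, max_rounds=50):
    """Alternate tree building and parameter estimation until lnL stops improving.

    build_tree(params) -> tree   (params=None requests the initial tree)
    fit_params(tree)   -> (params, log_likelihood)
    """
    tree = build_tree(None)               # initial neighbour-joining tree
    params, best_lnl = fit_params(tree)   # first parameter estimates
    for _ in range(max_rounds):
        new_tree = build_tree(params)     # rebuild tree with current estimates
        new_params, lnl = fit_params(new_tree)
        if lnl - best_lnl <= tol:         # no further improvement: stop
            break
        tree, params, best_lnl = new_tree, new_params, lnl
    return tree, params, best_lnl


# Toy stand-ins (not a phylogenetic model): the "tree" is just an integer
# and the likelihood improves until that integer reaches 3.
def toy_build(params):
    return 0 if params is None else params


def toy_fit(tree):
    return min(tree + 1, 3), -100.0 + 10 * min(tree, 3)


result = iterate_nj(toy_build, toy_fit)
print(result)
```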
The lambda inserts of the cDNA library developed by Hoek et al. (1997) were randomized with respect to orientation. This made single pass sequencing less efficient (only half of all sequencing reactions work due to problems sequencing through the poly-A tail), so only 33 sequences from this library were used. The remainder of the sequences (1287) were from the library developed by Sadamoto et al. (2004). From a total of 1320 sequences longer than 100 bp (GenBank Accession numbers CN809706-CN811025), the analysis yielded 771 clusters (see http://zeldia.cap.ed.ac.uk/mollusca.html), with 374 (48%) showing significant BLASTx similarity to another sequence in the non-redundant databases. In total, 650 clusters contained one sequence only (these singletons account for 49% of the ESTs), and 253 (39%) of these singletons had significant similarity to another sequence in the databases (Fig. 1).
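The proportions quoted above can be checked directly from the counts given in the text. The short calculation below uses only those counts; the variable names are ours. Percentages are shown to one decimal place, so 48.5 corresponds to the 48% reported in the text, which appears to have been rounded down.

```python
n_ests = 1320                # ESTs longer than 100 bp
n_clusters = 771             # clusters (putative genes)
n_with_hit = 374             # clusters with a significant BLASTx hit
n_singletons = 650           # clusters containing one EST
n_singletons_with_hit = 253  # singletons with a significant hit

# The accession range CN809706-CN811025 should span exactly the 1320 ESTs.
n_accessions = 811025 - 809706 + 1


def pct(part, whole):
    """Percentage to one decimal place."""
    return round(100 * part / whole, 1)


hit_pct = pct(n_with_hit, n_clusters)                      # clusters with a hit
singleton_est_pct = pct(n_singletons, n_ests)              # ESTs that are singletons
singleton_cluster_pct = pct(n_singletons, n_clusters)      # clusters that are singletons
singleton_hit_pct = pct(n_singletons_with_hit, n_singletons)
ests_per_cluster = round(n_ests / n_clusters, 2)

print(n_accessions, hit_pct, singleton_est_pct,
      singleton_cluster_pct, singleton_hit_pct, ests_per_cluster)
```

Singletons make up about 84% of clusters but only about 49% of ESTs, because the remaining clusters each contain several sequences.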
Fig. 1. Assessment of redundancy of the Lymnaea stagnalis EST dataset. The ESTs were clustered into putative genes. The graph shows the relative abundance of each EST-size-class of cluster. Most genes were represented by a single EST.
In total, 276 L. stagnalis clusters were assigned Gene Ontology (GO) terms using GOblet, and higher-level GO terms were extracted for the 3 GO domains (Fig. 2). Of these, 261 had terms for molecular function assigned, and within these ‘binding’ and ‘catalytic activity’ were the most common GO assignments (Fig. 2A). Of the clusters with GO molecular function terms 12% were assigned to ‘transporter activity’ or ‘signal transduction’, including ion transporters, protein transporters and transmembrane neuroreceptors. In the GO biological process domain (Fig. 2B), just over half of the assignments were to ‘physiological processes’ such as metabolism (including neurotransmitter and amine metabolism), stimulus response, and secretion of neurotransmitters.
Fig. 2. Annotated GO terms and protein domains in the Lymnaea stagnalis EST dataset. (A) GO Molecular Function (GO:0003674; 261 matches). (B) GO Biological Process (GO:0008150; 252 matches). In the Cellular Process subcategory (GO:0009987), the largest groups were associated with cell communication, cell growth and/or maintenance and cell death. In the Physiological Process subcategory (GO:0007582), the largest groups were associated with metabolism, stimulus response, and neurophysiological processes.
Genes reported previously in L. stagnalis (165 DNA sequences, 304 proteins; GenBank May 2004) include high copy-number genes isolated for phylogenetic analyses (Remigio & Blair, 1997; Remigio & Hebert, 2003), and high abundance peptides expressed in the central nervous system (e.g. Smit et al. 1992, 1993, 2001; Kellett et al. 1994; Table 1). Only a small proportion of the ESTs (~9%) were derived from previously isolated L. stagnalis sequences, but 12 of the most abundant ESTs that we isolated have been isolated previously as cDNAs (Table 1). We also identified neuropeptides novel to L. stagnalis (Fig. 2), an immune defence gene (Fig. 3), and an abundant gene family with features suggestive of secretion and processing to form smaller peptide signal molecules (Fig. 4; 8 peptides were derived from 10 different DNA sequences).
Table 1. Genes expressed abundantly in the Lymnaea stagnalis central nervous system (Many of the abundant clusters have been identified previously in L. stagnalis or a related mollusc. In contrast, ~52% of the other sequences showed no significant similarity to other proteins.)
Fig. 3. Conservation plot of achacin-like partial protein sequences from Aplysia punctata (AY442281, AAR14186, AAR14187, CAC19361, CAC19362), Aplysia californica (AAN78211), Achatina fulica (P35903), Biomphalaria glabrata (CN476061, CN810215, CN810515, CN810525, CN810863) and Lymnaea stagnalis (CN476059, CN445880, CN445878, CN476093; cluster LSC00366). Both the L. stagnalis and B. glabrata sequences are partial.
Fig. 4. A putative novel gene family in Lymnaea stagnalis identified by EST analysis (clusters LSC00481 and LSC00059). (A) Sequence conservation plot of the predicted near full length translations of 8 different peptides, including the signal region (*). Two putative lysine (K)/arginine (R) propeptide cleavage sites are present (#). (B) Neighbour-joining DNA phylogeny of the gene family. Two divergent groups were recovered, with one containing 4 separate groups. Bootstrap support >95% is shown by an asterisk.
As the public sequence databases are growing exponentially, BLAST-based analysis of similarities can only represent a snapshot in time of the relationships and putative functions of genes identified through EST sequencing. The B. glabrata EST set is also growing rapidly, but in May 2004, 169 or 22% of the L. stagnalis cluster consensus sequences had a significant match to a B. glabrata sequence. Abundant sequences in L. stagnalis are also more likely to have been isolated in B. glabrata, because clusters of two or more sequences more often have a B. glabrata sequence match (37%) compared with singleton clusters (16%). Some of these similarities to B. glabrata are shown in Table 2.
ESTs are an effective way of sampling the expressed genome of an organism, and are a route to rapid identification of conserved genes in otherwise neglected taxa. Thus, many of the L. stagnalis ESTs are derived from genes that are probably universally present in animals, but had not previously been reported from the superphylum Lophotrochozoa, of which Mollusca is a member (Table 3). These genes may be useful for deeper phylogenetic analyses (e.g. Elongation factor 1 alpha, clusters LSC00016, LSC00124), as will the discovery of 7 of the expected 13 mitochondrial genome-encoded genes (Table 4).
The recently announced B. glabrata genome project will provide an invaluable resource to facilitate gene discovery and genetic mapping in Biomphalaria and other snails. Complementary to that, the L. stagnalis ESTs that we have isolated will facilitate gene prediction in the complete Biomphalaria genome, because although many (~78%) do not have current BLAST hits to a Biomphalaria EST, we expect that most of these genes are conserved between these two relatively closely related species. However, out of necessity, we here concentrate our analysis on genes that have been isolated in both species or other species (~48% of the total), especially those which are interesting from a parasitological perspective.
There has been an extensive and rapidly increasing effort to elucidate the genes involved in conferring resistance of Biomphalaria to infection by schistosomes and other parasites, work which was reviewed most recently by Knight et al. (2000) and Lockyer et al. (2004), and described extensively in a special issue of Parasitology (vol. 123, issue 7). While an in-depth review of the molecular genetics of the B. glabrata/S. mansoni interaction is beyond the scope of this discussion, we were interested to know if there were any orthologues of the putative resistance genes (or genes linked to resistance genes) in our dataset, even though the snails had not been exposed to parasites.
The response to parasitism is poorly understood in L. stagnalis, and has been studied in greater depth at the molecular genetic level in B. glabrata (Knight et al. 1998, 1999; Jones et al. 2001; Léonard et al. 2001; Miller et al. 2001; Raghavan et al. 2003; Nowak et al. 2004). In both species, the main line of defence is the circulating haemocyte cell type but, at least in B. glabrata, a range of soluble factors such as haemolymph proteins and cytokine-like components also contribute to resistance (Bayne et al. 2001). For example, upon exposure to infection by trematodes, B. glabrata snails increase production of fibrinogen-related proteins (FREPs) in the haemolymph, which recognize and precipitate trematode antigens (Adema et al. 1997, 1999; Léonard et al. 2001; Zhang et al. 2004). We did not identify any expressed FREPs in L. stagnalis, which is perhaps not surprising since the snails had not been exposed to a parasite, and the libraries were CNS-derived.
Differential display experiments have also shown that reverse transcriptases are abundantly expressed in infected B. glabrata, but only at moderate levels in uninfected snails (Raghavan et al. 2003). The reverse transcriptases probably derive from endogenous retrovirus activity, suggesting that infected B. glabrata may be a compromised host in which pathogens are more free to replicate. Reverse transcriptase sequences were also rare in the L. stagnalis EST dataset (CN809742, CN809820, CN810394, CN811017 only), but their presence is still intriguing.
More significantly, we isolated heat shock protein (HSP70; clusters LSC00649, LSC00099; GenBank AF025477), which has been shown to be upregulated in schistosome-resistant B. glabrata (Jones et al. 2001; Lockyer et al. 2004). On a similar note, we also isolated a gene with globin domains from L. stagnalis (LSC00360), an orthologue of which is again differentially expressed in B. glabrata (Lockyer et al. 2004). Similarly, several neuropeptides that were also isolated in L. stagnalis, including pedal peptide, show divergent expression during parasitosis of B. glabrata, perhaps as a result of the parasite modulating snail behaviour (Hoek et al. 1997).
Recently, Nowak et al. (2004) used suppression subtractive hybridization to enrich for transcripts that are expressed in a resistant strain of B. glabrata. Eighty-eight unique ESTs were isolated, and further screening showed that 22 of these were significantly up-regulated in exposed infected snails. The majority of the ESTs were novel. Comparing these putative resistance genes to the L. stagnalis dataset identified 7 putative orthologues, 2 of which were identified by Nowak et al. (2004) as differentially expressed. One gene, cytochrome c oxidase subunit VIb (CN809827, cluster LSC00086 in L. stagnalis; CD760681 in B. glabrata), is also associated with an RFLP marker that distinguishes resistant from susceptible strains of snails (Knight et al. 1998). The other is a gene of unknown function (CN810899, cluster LSC00764 in L. stagnalis; CD760608 in B. glabrata), with no homologues in any other organisms.
Several other putative immune defence genes were also isolated, including an orthologue of achacin from Achatina fulica (Obara et al. 1992), and ink toxin or aplysianin from the sea-hare (Aplysia punctata; Butzke et al. 2004), both of which are antibacterial amino oxidases (Tossi & Sandri, 2002). Multiple copies of a B. glabrata achacin/aplysianin orthologue are also present in dbEST. Finally, Knight et al. (1999) used genetic crosses to identify two RAPD markers (both repetitive sequences within the genome) that co-segregate with resistance to schistosomes. Similar repetitive sequences are not present in our EST data set.
Many of the most abundant sequences from our L. stagnalis CNS cDNA library corresponded to previously identified L. stagnalis sequences, though some, such as ovulation prohormone precursor and APGWamide, are known from their protein sequence only (Ebberink et al. 1985; Smit et al. 1992). Moreover, at least 8 of the most abundant ESTs (Table 1) corresponded to known neuropeptides (e.g. APGWamide, FMRFamide, Smit et al. 1992, 1993; preproLYCP, Kellett et al. 1994). One L. stagnalis cluster (CN810391, CN810195; LSC00324) had 100% amino acid identity (63% DNA) with the achatin neuropeptide, a gene previously isolated in Achatina fulica (Giant African land snail; Satake et al. 1999) and Helix lucorum (garden snail). Another L. stagnalis cluster (CN809782; LSC00018) had 100% identity (over 46 amino acids) with APGWamide, first isolated in Aplysia californica (U85585). Additional neuropeptide genes were also identified in the lower copy number clusters, bringing the known total to more than 30.
Few of the previously identified L. stagnalis neuropeptides had significant similarity to B. glabrata sequences, probably because no CNS ESTs have been generated from the latter snail, and the expression of many genes will be tissue-specific. Instead, many of the L. stagnalis and B. glabrata joint hits were conserved ribosomal or structural proteins, which are presumably expressed in most tissues. Studies on B. glabrata have thus far focused on haemocytes (e.g. Raghavan et al. 2003) and other tissues including the ovotestis and haemopoietic organ, which emphasizes that a range of other tissue-specific libraries should be sampled to fully elucidate the transcriptome.
Of particular interest for future research are the abundant ESTs that have no sequence similarity to other genes (~52%), yet which must still have an important function. For example, one EST cluster from L. stagnalis proved to contain a remarkable diversity of different, but closely related, sequences (10 sequences were derived). It will be interesting to see whether the same gene family exists in B. glabrata (which may require a CNS EST library, or else a direct search) and more distantly related snails and molluscs.
From an evolutionary perspective snails are part of a large clade, the Lophotrochozoa, that includes not only their trematode platyhelminth parasites, but also annelids (e.g. earthworms), bryozoans, and rotifers, amongst other phyla (Winnepenninckx et al. 1995; Peterson & Eernisse, 2001). The group in general has been under-represented by recent genome sequencing efforts, except for the schistosome genome project and an earthworm EST project (www.earthworms.org; Sturzenbaum et al. 2003). The situation is rapidly changing with the recent addition of a large number of oyster Crassostrea sp. ESTs (>5000; Jenny et al. 2002; Gueguen et al. 2003) and the drive to sequence the B. glabrata genome (http://www.genome.gov/11007951). Thus, it is not surprising that several genes were identified in the L. stagnalis EST dataset that have not been previously described from any lophotrochozoan species. As about 48% of the clusters hit another protein in the non-redundant databases, and only 24% were significantly similar to any lophotrochozoan sequences, we estimate that around 190 genes were isolated for the first time in the Lophotrochozoa, several of which are especially interesting from an evolutionary perspective.
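The estimate of around 190 newly isolated lophotrochozoan genes follows from the figures above, as the short calculation below shows. It assumes that the 24% refers to all 771 clusters, which the text does not state explicitly; the variable names are ours.

```python
n_clusters = 771          # total clusters
n_with_protein_hit = 374  # clusters with a hit in the non-redundant databases (~48%)
frac_lophotrochozoan = 0.24  # assumed to be a fraction of all clusters

# Clusters similar to a known lophotrochozoan sequence.
n_lophotrochozoan = frac_lophotrochozoan * n_clusters

# Clusters with a database hit but no lophotrochozoan match.
n_new_to_clade = n_with_protein_hit - n_lophotrochozoan

print(round(n_lophotrochozoan), round(n_new_to_clade))
```

Under that assumption about 185 clusters match a lophotrochozoan sequence, leaving about 189 clusters whose only significant matches lie outside the clade, consistent with the estimate of around 190.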
The homeobox protein PBX1 (known in Drosophila melanogaster as extradenticle) is a transcription factor involved in development that has been described previously in both the Ecdysozoa and Deuterostomia (Monica et al. 1991; Rauskolb, Peifer & Wieschaus, 1993). The cDNA clone corresponding to the EST was recovered from stocks and sequenced using internal primers, yielding a complete open reading frame and ~230 bp of the 5′ untranslated region (CN809997). Interestingly, though there are many more sequences from the earthworm (Lumbricus rubellus; ~12000) and Schistosoma mansoni (~156000) in GenBank, the earthworm PBX1 has not been isolated and there is only one putative PBX1 EST from S. mansoni. Similarly, an alpha coat protein (involved with the traffic of proteins through the secretory pathway), a dynein (microtubule-based motor protein), a ruvB-like protein (branch migration of Holliday junctions), and finally a DEAD-box peptide were also novel lophotrochozoan isolates. Another interesting cluster was a putative glycosylhydrolase family 9 gene (GHF9). GHF9 endoglucanases degrade cellulose, yet few orthologues have previously been isolated from the Metazoa, giving rise to the suggestion that the gene was gained by horizontal gene transfer from bacteria, though other explanations are possible (Lo, Watanabe & Sugimura, 2003). This clone was also recovered and sequenced. Initial analyses indicate that it is the orthologue of a previously identified abalone (Haliotis discus) gene, lending support to the hypothesis that these GHF9 genes were anciently present in animals (Lo et al. 2003).
Some of the existing L. stagnalis DNA sequences were produced for phylogenetic reconstruction, in particular mitochondrial DNA sequences. Previously, these analyses have been limited to 16S rRNA and cytochrome oxidase subunit I genes, because of the lack of conserved primers available for molluscs (Remigio & Blair, 1997; Remigio & Hebert, 2003). The EST data set contained 5 additional mitochondrial genes: NADH dehydrogenase subunits II and IV, cytochrome oxidase subunit II, ATP synthase subunit VI, and 12S rRNA, which can now be used, in conjunction with complete and partial mitochondrial genomes from other taxa, to develop further conserved primer sets. Sequencing and characterization of the complete L. stagnalis mitochondrial genome should also be straightforward. The B. glabrata mitochondrial genome has recently been sequenced (Dejong, Emery & Adema, 2004), and adding a L. stagnalis genome to the dataset could improve our understanding of the evolution of snails and their relationships with parasites.
For similar reasons, several of the ESTs may be useful for phylogenetic reconstruction at a variety of systematic levels. Highly conserved genes may be useful for recovery of branching orders of deep divergences. For example, a concatenated dataset of beta-tubulin, alpha-tubulin, elongation factor 1 alpha (EF1alpha) and actin was used to attempt to resolve the higher order systematics of Eukaryota (Baldauf et al. 2000). Beta-tubulin has also been used to investigate microsporidian/fungal phylogeny (Keeling, Luker & Palmer, 2000) and the relationships between ‘jakobid’ flagellates (Edgcomb et al. 2001). However, 4 distinct beta-tubulin paralogues were recovered in the L. stagnalis ESTs. Phylogenetic analysis of these genes with other mollusc beta-tubulins revealed that at least 1 of the gene duplications that generated these 4 paralogues in L. stagnalis may be ancient (Fig. 5). This argues for caution in the use of this gene in phylogenetic analysis. In contrast, EF1alpha has almost always been found to be a single copy gene, and this study, the earthworm EST project, and the schistosome genome project have each recovered only a single copy of EF1alpha. Phylogenetic analysis using EF1alpha (LSC00016, LSC00124) recovers the expected tree (not shown).
Fig. 5. Neighbour-joining DNA phylogeny of paralogous beta-tubulin genes (665 bp). The 4 beta tubulin paralogues from L. stagnalis may be the result of ancient duplication.
Finally, these ESTs may be useful in illuminating some of the many other fascinating aspects of snail biology. L. stagnalis is a model for the evolution of body asymmetry because populations are sometimes polymorphic for shell and body asymmetry (sinistral and dextral coiling; Hosoiri et al. 2003; Shibazaki, Shimizu & Kuroda, 2004). This coiling asymmetry has been shown to be genetically determined, and it will be interesting to isolate the gene(s) responsible and compare them to orthologues in humans and nematodes. The ESTs provide a source of marker loci that could be developed into PCR fragment size, restriction fragment length polymorphism or single nucleotide polymorphism markers to facilitate genetic mapping (e.g. Choi et al. 2004; Komulainen et al. 2003).
These ESTs will also provide additional molecular markers for analysis of the function of the snail CNS, the molecular components of behaviour and response to parasitism. In particular, the diversity of neuropeptides discovered by EST sequencing suggests that neurohormonal control may be more complex than previously modelled. Comparative analysis with the Biomphalaria/Schistosoma pair will help to identify both conserved and divergent aspects of the host-parasite relationship.
This work was supported by funding from the Royal Society (A.D.), NERC and Wellcome (M.B.) for which the authors are very grateful. Thanks to Jen Daub, Claire Whitton, Katelyn Fenn, James Wasmuth, Alasdair Anthony, and Ralf Schmid for technical assistance. Ronald van Kesteren and Hisayo Sadamoto kindly supplied the cDNA libraries.
Table 2. Selected genes isolated from both Lymnaea stagnalis and Biomphalaria glabrata
Table 3. Genes isolated from Lymnaea stagnalis that have not been previously described in the Lophotrochozoa
Table 4. Genes in the Lymnaea stagnalis EST data set that derive from the mitochondrial genome