INTRODUCTION
Gyrodactylus salaris Malmberg, 1957 is a monogenean flatworm that is among the most serious threats to wild and farmed Atlantic salmon (Salmo salar) today. Ectoparasitic, viviparous and with a direct life-cycle, this species can occur in such high numbers on its hosts, and throughout entire river systems, that it can cause the collapse of fish stocks. Losses are estimated at more than 15% of the total wild salmon catch in Norway alone, with concomitant costs from the loss of fisheries, tourism and the need to survey and eradicate infected stocks, amounting to more than USD 50 million per year (Mo et al. Reference Mo, Norheim and Hellesnes2004). Salmon stocks in 46 watercourses in Norway are threatened or have been lost (Hansen et al. Reference Hansen, Bachmann and Bakke2003; Mo and Norheim, Reference Mo and Norheim2005), and other fish species such as rainbow trout (Oncorhynchus mykiss), brook trout (Salvelinus fontinalis) and Arctic charr (S. alpinus) readily act as suitable hosts (Bakke et al. Reference Bakke, Harris and Cable2002; Robertsen et al. Reference Robertsen, Hansen, Bachmann and Bakke2006). In Europe, particularly in the UK, economically important salmon populations are at severe risk from the introduction of G. salaris with live fish, especially rainbow trout (Peeler and Thrush, Reference Peeler and Thrush2004). No convincing, parasite-specific control measures are yet available. Presently, the only methods to eradicate G. salaris include killing all fish by the use of the biocide rotenone or by aqueous aluminium to selectively kill the parasites (Soleng et al. Reference Soleng, Poleo, Alstad and Bakke1999, Reference Soleng, Poleo and Bakke2005; Poléo et al. Reference Poléo, Schjolden, Hansen, Bakke, Mo, Rosseland and Lydersen2004). Despite all eradication attempts, the parasite is still increasing its geographical distribution and in approximately 30% of the treated rivers, the parasite has reappeared (Mo and Norheim, Reference Mo and Norheim2005). Many EU states have instigated codes of practice for the movement of G. salaris (see Peeler et al. Reference Peeler, Thrush, Paisley and Rodgers2006), but there are few tools available to accurately track and understand the origins, spread and epidemiology of infections (Bakke et al. Reference Bakke, Cable and Harris2006).
The accurate diagnosis of infection is central to the control of G. salaris. Traditional methods for identification of gyrodactylids are based on morphometry of the attachment sclerites. However, more than 400 nominal species have been described within the genus Gyrodactylus (Harris et al. Reference Harris, Shinn, Cable and Bakke2004), and additional species are continuously discovered as new suitable host species are examined. Also, within G. salaris, there is significant variation in morphology. Its hook morphology closely resembles those of other related species. Discriminating among them requires high quality scanning electron micrographs and the application of sophisticated statistical tools (Shinn et al. Reference Shinn, Gibson and Sommerville2001). However, the discrimination between G. salaris and G. thymalli Zitnan, 1960 has proved difficult, since at least 1 Norwegian G. thymalli population fell within the morphological variation of G. salaris (Shinn et al. Reference Shinn, Hansen, Olstad, Bachmann and Bakke2004; unpublished observations). Thus, morphology alone does not easily lend itself to routine use in a diagnostic laboratory. To date, molecular identification methods have mainly used the internal transcribed spacers (ITS) of the nuclear ribosomal DNA (rDNA), since they display a low degree of variation within species (Zietara et al. Reference Zietara, Huyse, Lumme and Volckaert2002; Huyse et al. Reference Huyse, Audenaert and Volckaert2003; Matejusová et al. Reference Matejusová, Gelnar, Verneau, Cunningham and Littlewood2003). However, although G. thymalli has been described as a distinct species based on morphological, ecological and pathological grounds, no variation was detected in the ITS region when compared with G. salaris (see Cunningham, Reference Cunningham1997). Limited success has been achieved with the sequencing of the intergenic spacer (IGS) of rDNA (Sterud et al. Reference Sterud, Mo, Collins and Cunningham2002; Cunningham et al. Reference Cunningham, Collins, Malmberg and Mo2003; Hansen et al. Reference Hansen, Martinsen, Bakke and Bachmann2006), but a suite of alternative markers is still needed for the detection of population variation, to further understand the taxonomy and biology of the parasite, and to study its transmission and dispersion in space and time.
Mitochondrial (mt) genomes offer a wealth of informative characters, useful in phylogenetic and population genetic studies (Boore et al. Reference Boore, Macey and Medina2005). Many commercially and economically important parasites are now the focus of mitogenomic studies, as their mt genomes are now relatively easy to characterize and can be used as the basis for the design of molecular markers (Place et al. Reference Place, Feng, Steven, Fourcade and Boore2005). Thus far, partial sequences of the cytochrome c oxidase subunit 1 gene (cox1) have been used in studying the phylogeographic structure of G. salaris and G. thymalli (see Hansen et al. Reference Hansen, Bachmann and Bakke2003, Reference Hansen, Martinsen, Bakke and Bachmann2006; Meinila et al. Reference Meinila, Kuusela, Zietara and Lumme2004). These studies have revealed a high level of intra- and interspecific variation and could not confirm monophyly of either species. However, more variable regions are likely to be present in the other parts of the mt genome, offering higher resolution for studies on more recent evolutionary processes in G. salaris, such as host switching and speciation processes. Comparing the mt genome sequences of different species will identify regions of high sequence divergence. In the present study, we sequenced and characterized the mt genome of G. salaris, the first for any monogenean. We describe the gene order, codon usage, tRNA features and gene boundaries, and compare these features with those found in other parasitic flatworms. This study provides the first step to defining new molecular mt markers, which will assist in studying the evolution, ecology and epidemiology of this important fish pathogen and its control.
MATERIALS AND METHODS
Sample collection and preparation
Parasite samples of G. salaris from wild Atlantic salmon parr were collected in the river Signaldalselva, North Norway in September 2001. This river is infected via migrating infected parr from the river Skibotnelva, which, in turn, has been infected by an accidental introduction of Swedish salmon parr. Hansen et al. (Reference Hansen, Bachmann and Bakke2003) sequenced the partial cox1 of this strain (haplotype B), and found this haplotype in other Swedish rivers which drain into the Baltic Sea and are close to the Hölle hatchery, Sweden. The authors proposed that River Signaldalselva is most likely infected with G. salaris specimens originating from the Hölle hatchery, which is the type locality for this species. The samples were kept in 96% (v/v) ethanol at 5°C. For genomic DNA extraction, a total of 5–10 parasites were briefly air dried to remove the ethanol, pooled in 5 μl of H20 and digested by the addition of 5 μl of lysis solution, 1× PCR buffer (Eurogentec; 16 mm (NH4)2SO4, 67 mm Tris-HCl (pH 8·8 at 25°C), 0·01% Tween-20), 0·45% (v/v) Tween 20, 0·45% (v/v) NP 40 and 60 μg/ml of proteinase K (Sigma). The sample was incubated at 65°C for 25 min, followed by 10 min at 95°C to inactivate the proteinase.
Conventional and long polymerase chain reaction (PCR)
Part of the cytochrome b gene (cytb) was amplified (in 25 μl reaction volumes) using universal primers CytbF – CytbR (see Table 1) (Boore and Brown, Reference Boore and Brown2000) and puRe Taq Ready-to-Go PCR beads (Amersham Biosciences). The solution consisted of 20 ng of gDNA and 40 pm of each PCR primer; beads contained 1·5 U Taq polymerase, 10 mm Tris-HCl (pH 9·0), 50 mm KCl, 1·5 mm MgCl2, 200 μm each dNTP and stabilisers including BSA. Cycling conditions were as follows: initial denaturation for 3 min at 95°C, followed by 40 cycles of 30 sec at 95°C, 30 sec at 45°C, 2 min at 72°C, and a final extension for 10 min at 72°C. Specific reverse and forward primers (see Table 1) were designed from the sequence of this fragment and from partial cox1 from G. salaris available from GenBank (Accession no. AY258372). The mt genome was amplified by long PCR using the GeneAmp XL PCR kit (Applied Biosystems) or the Expand Long Template PCR System (Roche Applied Science) in 3 fragments employing primer pairs Gsal_CO1_544F – U12SR (2015 bp) Gsal_16SF – Gsal_CytbR (6109 bp) and Gsal_CytbF – Gsal_MIT3R (6537 bp). Additional overlapping fragments were amplified, using, for example, primer pairs U12SF – UNAD5R, UND1F – Gsal_CO1_204R). Cycling conditions were: an initial denaturation for 30 sec at 94°C, followed by 40 cycles of 20 sec at 94°C, 30 sec at 58–65°C, 8 min at 64–68°C, and final extension of 10 min at 68°C.
* General and long PCR primers have also been used as sequencing primers.
Cloning and sequencing
PCR products, covering the whole genome, were individually cloned using the TOPO®XL PCR Cloning Kit (Invitrogen), following manufacturer's instructions. Ten clones from each of the cloning reactions were grown for 15 h in 3 ml volumes of Luria-Bertani (LB) medium, shaking (220 rpm) at 37°C. Plasmid DNA was extracted using QIAprep Spin Miniprep Kit (Qiagen). Clones were examined for inserts by digestion with the restriction endonuclease EcoRI (Invitrogen). Two positive clones from each of the reactions were selected for sequencing, carried out using Big Dye Chemistry (version 1.1) in a 3730 DNA Analyser (Applied Biosystems). The flanking regions of the inserts were sequenced with forward and reverse M13 primers, and sequence identity was verified using the Basic Local Alignment Search Tool (BLAST) (available at www.ncbi.nih.gov/BLAST/). The remaining fragments were sequenced by primer walking (Table 1).
Annotation of sequences
Contiguous sequence fragments were assembled using Sequencher™ 4.5 (GeneCodes Corporation). Genome annotation was performed using MacVector® 7.2.3 (Accelrys). The sequence identity of open reading frames (ORFs) was verified using BLAST, and individual genes were aligned with published mt genomes of other flatworms to identify start and stop codons. The flatworms included: Digenea – Schistosoma japonicum (Accession no. NC_002544), S. mekongi (NC_002529), S. mansoni (NC_002545), S. haematobium (NC_008074), S. spindale (NC_008067), Paragonimus westermani (NC_002354) and Fasciola hepatica (NC_002546); Cestoda – Echinococcus granulosus (NC_008075), E. multilocularis (NC_000928), Hymenolepis diminuta (NC_002767), Taenia crassiceps (NC_002547), T. asiatica (NC_004826) and T. solium (NC_004022). The program tRNAscan-SE 1.21 (available at www.genetics.wustl.edu/eddy/tRNAscan-SE/) was used to identify tRNAs and their secondary structures (Lowe and Eddy, Reference Lowe and Eddy1997). The tRNAs, which were not detected by tRNA scan-SE 1.21, were identified by searching for the conserved motif YUxxxR, where xxx denotes the anticodon, and by detecting stem and loop regions by eye. The 2 rRNA genes were identified by BLAST searches and boundaries were determined by the terminal ends of adjacent genes and verified by comparison with homologous genes for other flatworms available from the GenBank database. Non-coding regions were scanned for repeat elements using the program Tandem Repeats Finder (Benson, Reference Benson1999).
RESULTS AND DISCUSSION
The circular mitochondrial genome of G. salaris is 14 790 bp long and has an overall A+T content of 62·3%. The complete annotated genome has been deposited in GenBank under Accession no. DQ988931. All the genes recognized previously in the mt genomes of other flatworms were identified in G. salaris. The annotated mt genome is depicted in Fig. 1, and details of gene boundaries are given in Table 2. As for other flatworms, all genes are transcribed from the same strand, and the genome lacks atp8 and is relatively compact. Fig. 2 shows that the length of individual protein-coding genes of G. salaris falls within the range of that found for other flatworm taxa for which mt genome sequences are available in the GenBank. Thus, the length differences among different taxa relates mainly to non-coding regions, which, for example, can range from 1·5 to 10 kb in S. mansoni (see Després et al. Reference Després, Imbert-Establet, Combes, Bonhomme and Monnerot1991). The order of protein-coding and rRNA genes is consistent with that of the Neodermata, in that they follow the pattern of most digeneans (except some species of Schistosoma) and all cestodes for which mt genome sequence data are available (Littlewood et al. Reference Littlewood, Lockyer, Webster, Johnston and Le2006). Only short, partial, multi-gene regions of non-neodermatan (‘turbellarian’) flatworms exist – the macrostomid Microstomum lineare (AY228756) and the polyclad Pseudostylochus intermedius (AB049114), and these show markedly different gene orders (cf. Littlewood et al. Reference Littlewood, Lockyer, Webster, Johnston and Le2006). Of the mt genomes of the Neodermata characterized thus far, only the gene order for members of the genus Schistosoma appears to vary (Le et al. Reference Le, Humair, Blair, Agatsuma, Littlewood and McManus2001).
G. salaris from the Signaldalselva river system (Troms County, North Norway) was chosen for mt genome characterization, since the mitochondrial haplotype of this strain should be the same as that of the type specimens of G. salaris. This particular haplotype was also found in G. salaris from the Vindelälv and Tornioälv river systems that drain into the Baltic Sea. These rivers are believed to have been originally infected by dumping salmon smolt in 1975 from the Hölle hatchery, Sweden, the type locality for G. salaris (see Johnsen et al. Reference Johnsen, Møkkelgjerd and Jensen1999; Hansen et al. Reference Hansen, Bachmann and Bakke2003). With a known provenance, the present mt genome may now provide a reference for further mitogenomic studies, testing hypotheses regarding the ecology, epidemiology and population genetics of G. salaris.
Start and stop codons
All initiation codons predicted are ATG, whereas TAG (for cytb, nad3 and nad5) and TAA (for cox3, nad4l, nad4, atp6, nad2, nad1, cox1, cox2 and nad6) are used as stop codons. Other invertebrate initiation codons, such as ATC and GTG, were not identified. Protein-coding genes were translated using the flatworm (rhabditophoran) mitochondrial code (see Table 9 in GenBank; Telford et al. Reference Telford, Herniou, Russell and Littlewood2000).
tRNA secondary structures, and overlap between adjacent genes
All of the 22 tRNA genes were identified. All but 8 were identified using the program tRNAscan; the remaining tRNA genes were identified by examining for putative anti-codon sequences and determining characteristic flanking stem and loop features. The 2 leucine tRNAs code for CTA and TTA (trnL1 and trnL2, respectively), and the 2 serine tRNAs code for AGC and TCA (trnS1 and trnS2, respectively). Three tRNA genes overlap with other genes, namely trnV (39 bp with nad2), trnS1 (5 bp with trnP), trnS2 (2 bp with nad3). Of all of the predicted secondary structures for the tRNA genes, trnS1, trnS2 and trnC lack the DHU arm, and trnP appears to lack the TΨC arm.
Nucleotide composition and codon usage
The overall nucleotide composition of all coding sites is: A (29·1%), C (17·2%), G (20·5%), and T (33·2%) (see Table 3). The genome has an overall A+T content of 62·3%. The protein-coding genes display a relatively low A+T content, particularly at the third codon positions (55·2%). The highest A+T content is in the rRNA and tRNA genes (67·6 and 66·9%). These findings fall within the ranges reported for other mt genomes of flatworms (Le et al. Reference Le, Blair and McManus2002; Johnston Reference Johnston, Maule and Marks2006). The codons predicted to be most frequently used are AUA (214), CUA (187), UUC (169), UUA (150), and the least frequently used (G+C-rich) codons are CCG, CGC, CGT (11), CAG and CGG (9), not including the stop codons UAA (9) and UAG (3) (Table 4).
Non-coding regions
There are 18 non-coding regions ranging from 4 bp to 112 bp. Typical control regions are not readily identifiable within the mt genomes of flatworms. However, the origin of replication may be located in the non-coding region between cox2 and trnE (positions 11244–11354), as this region folds with hairpins but without a T-rich loop (features of the control region as described by Wolstenholme (Reference Wolstenholme1992); nevertheless, this designation remains putative). Control regions are often targeted as a source of genetic markers, since they tend to vary considerably between species. Therefore, this region warrants characterization for additional strains and species of gyrodactylids. The 2 large non-coding regions, NC1 and NC2 (see Fig. 1) are 798 bp and 767 bp long, with an A+T content of 63 and 64·7%, respectively. NC1 is situated between trnP and atp6 and NC2 within a cluster of tRNA genes (trnW, trnL1, trnQ, trnM, NC2, trnS2, trnL2 and trnR). No repetitive regions were found in either of these non-coding regions. An alignment of NC1 and NC2 shows that almost the entire region (722 bp) is conserved (see Fig. 3), with only 6 bp (<0·1%) difference (Fig. 2). The conserved fragment begins in frame on a recognized initiation codon, ATG. However, in spite of extensive searches, no significant ORFs (for putative genes) were detected. The stop codon TAA occurred frequently in the 3 reading frames within NC1 and NC2. Repeat regions, particularly tandemly repeated regions, can yield variable length polymorphisms through slippage events, but NC1 and NC2 are separated by several kilobases from one another, leading us to speculate as to their possible origin and function within the mt genome. One may hypothesize that the NC1 and NC2 are not of mitochondrial but of nuclear origin. However, these non-coding regions have been identified in different PCR products amplified by long PCR. None of the coding genes detected in these long PCR products displayed degeneration which is typical for nuclear copies of mt DNA. Sequences obtained from PCR products using different primer pairs were consistently the same. Furthermore, independent PCR and sequencing conducted at the Natural History Museums in London and in Oslo confirmed the results. Therefore, we conclude that a nuclear origin of the non-coding NC1 and NC2 regions can be ruled out.
In summary, the mt genome of G. salaris, from a pathogenic population infecting Atlantic salmon parr in northern Norway, has been sequenced and annotated. Short non-coding regions, a putative control region and 2 longer, almost identical but separated non-coding regions, offer the opportunity of discovering rapidly evolving regions of the genome suitable for the definition of strain-specific markers. The variability in these and other regions may now be explored by comparing G. salaris from various geographical locations with reference strains and with other species of Gyrodactylus. These markers can then be used to ‘genotype’ G. salaris occurring on other salmonid hosts (e.g. Oncorhynchus mykiss, Salvelinus fontinalis and S. alpinus) to recognize host-associated haplotypes, and to genetically characterize G. salaris strains with varying degrees of virulence and host-specificity to salmon (Mo, Reference Mo2006; Olstad et al. Reference Olstad, Robertsen, Bachmann and Bakke2006). Having available mt markers will enable tools to be developed for studying the transmission dynamics of various species of Gyrodactylus between/among different host species and river systems, providing crucial information for an improved understanding of the spread and epidemiology of this pathogen. Additional mt genomes from a diversity of flatworm taxa, including polyopisthocotylean monogeneans and turbellarians, will allow an assessment of the phylogenetic utility of mt genomes at deeper evolutionary levels within the phylum.
We are grateful to Dr Catherine Collins and Dr Carey Cunningham for providing starting material for this project. Dr Le Thanh Hoa provided useful primer sequences prior to their publication. We thank Julia Llewellyn-Hughes and Claire Griffin of the NHM Sequencing Unit for expert technical assistance. T. H. was funded by a Marie Curie Fellowship (Meif-ct-2004-501684). D. T. J. L. and B. L. W. were funded by the Wellcome Trust, through a Senior Research Fellowship (043965) to D. T. J. L. L. P., T. A. B. and L. B. were supported by the National Centre for Biosystematics (currently NCB 146515/420; co-funded by NHM, UiO, and the Norwegian Research Council, NRC) and the ‘Wild Salmon Program’ (currently NRC 145861/720). Note on authorship: T. H. and D. T. J. L. conceived the project. T. A. M. and T. B. provided parasite samples. T. H. sequenced and annotated the genome with assistance from B. L. W. and D. T. J. L. L. P. assisted with primer design. T. H. and D. T. J. L. wrote the manuscript.