Expressed sequence tag analysis of Sarcoptes scabiei
Published online by Cambridge University Press: 09 October 2003
Abstract
Sarcoptes scabiei is an important parasitic mite in both man and animals. Little is known about the molecular interactions between this pathogen and its host. This is in part explained by the paucity of mite-derived material, including antigens. To extend the knowledge of the molecular repertoire in S. scabiei, we have performed a gene survey by an expressed sequence tag (EST) analysis. A total of 1020 ESTs were generated from an S. scabiei cDNA library. The average sequence read was 510 bp after editing and the overall sequencing success was 89%. Clustering of the sequences resulted in 76 clusters, comprising 36% of the ESTs. Sequence similarity searches showed that almost half of the S. scabiei ESTs could be assigned a putative identity. Many of these transcripts shared similarity with genes involved in basic metabolism and cellular organization. In the data set we also identified several proteases and other types of potential allergens implicated in various disease mechanisms. A relatively large fraction of the ESTs coded for different proteins carrying protease inhibitor-like domains. The clones with no similarity to previously identified genes constituted 11% of our transcripts. The EST data generated in this study will be a valuable resource in further studies of the biology of S. scabiei and in the identification of genes that can serve as potential targets in the control of the parasite.
Type: Research Article
Copyright © 2003 Cambridge University Press
INTRODUCTION
The parasite Sarcoptes scabiei is a pathogenic mite of man and animals that is prevalent worldwide. It burrows into the stratum corneum of the skin where mechanical injury and cytological changes, combined with a mounting immune response, lead to a dermal disease (Burgess, 1994). The disease is called scabies in man and mange in animals. Scabies is usually associated with an intense itch and a range of other symptoms including consistent cutaneous eruption and the formation of lesions (Cabrera, Agar & Dahl, 1993; Chouela et al. 2002). The most severe form of the disease is crusted scabies, a condition that is often confined to individuals with a malfunctioning immune system (Donabedian & Khazan, 1992; Schlesinger, Oelrich & Tyring, 1994). If untreated, this condition can have fatal consequences. Recent estimates put the prevalence of scabies at about 300 million infected humans, the majority of them in developing countries (Walker & Johnstone, 2000). S. scabiei also occurs in a wide range of wild and domestic animals (Burgess, 1994). In particular, canines are susceptible to infection and mange is a highly contagious disease among dogs. In pigs, S. scabiei is probably the most important ectoparasite (Davies, 1995). Affected pigs scratch continuously and, as the disease progresses, erythema and abrasive lesions form and eventually the animals may lose condition. Animal welfare is not the only concern: the disease also has significant economic consequences. The cost due to mange has been estimated at hundreds of millions of dollars per year for the US pig industry alone (Arends & Ritzhaupt, 1995). Most cases of zoonotic scabies are contracted from dogs. However, in humans an infection of animal origin is normally less severe than one acquired by human-to-human transfer, and it disappears in a few weeks (Burgess, 1994).
Despite the importance of S. scabiei as a pathogen in man and animals, very few basic studies employing molecular tools have been done (Kemp et al. 2002). Thus, there is only a limited amount of molecular data available that can provide us with clues to the infection process or the molecular targets for the host's immune response to the parasite. For instance, Walton and colleagues have used molecular fingerprinting to look at the genetic relationship between S. scabiei isolated from dogs and S. scabiei from humans (Walton et al. 1999 a, b). We have previously isolated several immunodominant antigens by screening an S. scabiei cDNA expression library. One of them is paramyosin, which provokes a specific antibody response in animals during infection (Mattsson, Ljunggren & Bergstrom, 2001). To extend our analysis of the molecular repertoire in S. scabiei we have now completed an expressed sequence tag (EST) project. In an EST analysis, single-pass sequences are generated from randomly selected cDNAs and can be used to survey and define the genes expressed by an organism (or specific life-stage or cell type) (Fields, 1994). In the current work we have analysed more than 1000 ESTs with the aim of identifying genes that can serve as potential targets in the control of the parasite, as well as increasing our basic understanding of the parasite. The generated data will also position S. scabiei in a wider molecular evolutionary context.
MATERIALS AND METHODS
The S. scabiei cDNA library
Mites of both sexes and of different developmental stages were isolated from the skin of red foxes (Vulpes vulpes) as described previously (Bornstein & Zakrisson, 1993). PolyA RNA was isolated from the mites and transcribed into cDNA, which was ligated into the EcoRI–XhoI sites of the UNI-ZAP XR vector (Stratagene, La Jolla, CA). For more details on the cDNA library see Mattsson et al. (2001).
Template preparation
The S. scabiei cDNA library was diluted in LB-media in order to avoid overlapping plaque formation on the agar plates. Escherichia coli strain XL-1 Blue MRF′ (Stratagene) was infected with the diluted phage cDNA library followed by an overnight incubation at 42 °C. Single plaques containing recombinant clones were randomly picked and transferred to sterile tubes with 100 μl of SM buffer (100 mM NaCl, 50 mM Tris–HCl (pH 7·5); 10 mM MgSO4). The agar plugs were incubated in SM-buffer between 15 and 60 min before samples were used as templates in the individual PCR reactions. The cDNA inserts were amplified by PCR in 40 μl vol. reactions using the vector specific primers T3 and T7. The reaction mixtures contained 10 mM Tris–HCl, pH 8·3, 50 mM KCl, 1·5 mM MgCl2, 0·25 μM of each primer, 200 μM of each deoxynucleotide, 1 μl of template and 1 U of AmpliTaq DNA polymerase (Applied Biosystems, Foster City, CA). The amplifications were done in either a PE 2400 (Applied Biosystems) or a PTC 200 (MJ Research, Waltham, MA). After 2 initial 5 min incubations at 93 °C and 48 °C, respectively, the cDNA was amplified for 30 cycles. Each cycle consisted of 3 min of extension at 72 °C, 1 min of denaturation at 93 °C and 1 min of annealing at 48 °C. The PCR ended with a final extension for 10 min at 72 °C. All PCR reactions were purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany) and were eluted with 30 μl of dH2O. To analyse the amplification results, 2 μl of each purified PCR sample was loaded onto 1% agarose gels containing ethidium bromide. Amplicons larger than 500 bp were chosen for DNA sequencing. All selected clones were prepared for long-time storage in SM-buffer with the addition of chloroform.
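The cycling programme described above can be written down as data to check the programmed run time. The following is a minimal sketch, not part of the original protocol: the variable and function names are illustrative assumptions, and ramp times between temperatures are ignored.

```python
# Thermal-cycling programme as stated in the text.
# Each step is (temperature in degrees C, duration in minutes).
# The data layout and names are illustrative assumptions.
INITIAL_STEPS = [(93, 5), (48, 5)]          # 2 initial 5 min incubations
CYCLE_STEPS = [(72, 3), (93, 1), (48, 1)]   # extension, denaturation, annealing
N_CYCLES = 30
FINAL_STEPS = [(72, 10)]                    # final extension


def total_minutes(initial, cycle, n_cycles, final):
    """Sum the programmed hold times, ignoring ramping between steps."""
    minutes_per_cycle = sum(minutes for _, minutes in cycle)
    return (
        sum(minutes for _, minutes in initial)
        + n_cycles * minutes_per_cycle
        + sum(minutes for _, minutes in final)
    )


run_minutes = total_minutes(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES, FINAL_STEPS)
print(run_minutes)  # 170
```

Under these assumptions the hold times add up to 170 min (10 + 30 × 5 + 10); the actual run would be somewhat longer because of ramping.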
DNA sequencing
The purified templates were sequenced in single-pass reactions with the T3 primer using BigDye terminators according to the manufacturers' descriptions (Applied Biosystems). Unincorporated terminators were removed by ethanol precipitation and the sequencing reactions were separated on an ABI 377 automated sequencer (Applied Biosystems).
Bioinformatics
All sequences were base called using ABIprism version 3.2 (Applied Biosystems) and the software PHRED (Ewing & Green, 1998; Ewing et al. 1998), and subsequently edited in a semi-automatic manner to remove 5′ and 3′ end vector and adaptor sequence as well as low quality 3′ sequence including the polyA tail. Edited sequences of 100 bp or longer were compared with public databases (GenBank non-redundant nucleotide and protein databases) using the BLAST family of algorithms with default parameters (BLOSUM62, gap existence and extension penalties 11 and 1, E=10 and word size 3 without complexity filtering) (Altschul et al. 1990; Gish & States, 1993). Where applicable, ESTs were assigned a putative functionality. The threshold for significant/informative matches was set to a BLASTX similarity score value ≥50 and P≤10⁻⁴. Sequences without significant database matches were considered novel (Porcel et al. 2000). The ESTs were assembled into clusters using the PHRAP software version 0.990319 with a minimum overlap score of 44 and other parameters at default settings (http://www.phrap.org). This corresponds to a minimum of 14 bp overlaps in the establishment of the clusters. The consensus sequences of the clusters were also compared to the GenBank non-redundant protein database using BLASTX as above. The databases of Gadfly (http://hedgehog.lbl.gov) and Flybase (FlyBase, 1999) were accessed through the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa et al. 2002). This protein database-managing tool is useful to classify protein matches functionally, especially when GenBank entries provide very little information about the putative function. The reason for this is that there is a time-lag between updates in the NCBI database and the actual ongoing or finished genome/EST projects. For the analysis of the predicted polypeptides from Drosophila melanogaster, release 2 of the database available from the Berkeley Drosophila Genome Project (www.fruitfly.org) was used.
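The length cut-off and the threshold for informative matches can be expressed as a simple decision rule. The sketch below is an illustration only and not the software used in this study: the BestHit record, the classify function and the example values are hypothetical, and it assumes that the best BLASTX hit for each EST has already been parsed into a score and a P value.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text.
MIN_LENGTH_BP = 100   # edited sequences shorter than this were not searched
MIN_SCORE = 50.0      # BLASTX similarity score must be >= 50
MAX_P_VALUE = 1e-4    # P must be <= 10^-4


@dataclass
class BestHit:
    """Hypothetical record: one EST and its best BLASTX hit, if any."""
    est_id: str
    length_bp: int
    score: Optional[float] = None
    p_value: Optional[float] = None


def classify(hit: BestHit) -> str:
    """Label an EST as 'too_short', 'informative' or 'novel'."""
    if hit.length_bp < MIN_LENGTH_BP:
        return "too_short"
    if hit.score is None or hit.p_value is None:
        return "novel"  # no database match at all
    if hit.score >= MIN_SCORE and hit.p_value <= MAX_P_VALUE:
        return "informative"
    return "novel"      # a match exists but is below the threshold


# Made-up example records, for illustration only.
examples = [
    BestHit("EST_A", 510, 212.0, 1e-55),
    BestHit("EST_B", 430, 38.0, 0.02),
    BestHit("EST_C", 95, 80.0, 1e-10),
    BestHit("EST_D", 300),
]
labels = {hit.est_id: classify(hit) for hit in examples}
print(labels)
```

Both conditions must hold for a match to count as informative, so a high score with a weak P value (or the reverse) is still treated as novel in this sketch.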
Identifiers used in the dataset
All ESTs have been given a project-specific identity of the form SASnnnn, e.g. SAS0110, where SAS denotes Sarcoptes scabiei and 0110 is the number. The example SAS0110 corresponds to an EST that codes for a Cofilin/Actin depolymerizing factor (ADF) homologue and contains a cofilin/ADF domain. All clones submitted to dbEST (Expressed Sequence Tags database; http://www.ncbi.nlm.nih.gov/dbEST/index.html) were also given a unique identifier called ESSUmmmm (EST, Sarcoptes scabiei, Uppsala) with a consecutive number. The EST clone SAS0110 received the dbEST identifier ESSU0082. The ESSU identifier was introduced in order to obtain a chronological order even if the SAS succession was broken. Both the SAS and ESSU identifiers are valid search terms at the dbEST.
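The two identifier schemes are regular enough to be parsed mechanically. The following sketch assumes that both numbers are always written with exactly four digits, as in the examples SAS0110 and ESSU0082; that width, and the function name, are assumptions rather than a documented rule.

```python
import re

# Assumed patterns: a fixed prefix followed by exactly four digits.
IDENTIFIER_PATTERNS = {
    "SAS": re.compile(r"SAS(\d{4})"),
    "ESSU": re.compile(r"ESSU(\d{4})"),
}


def parse_identifier(identifier: str) -> tuple[str, int]:
    """Return (scheme, number) for a SAS or ESSU identifier."""
    for scheme, pattern in IDENTIFIER_PATTERNS.items():
        match = pattern.fullmatch(identifier)
        if match:
            return scheme, int(match.group(1))
    raise ValueError(f"not a SAS or ESSU identifier: {identifier!r}")


print(parse_identifier("SAS0110"))   # ('SAS', 110)
print(parse_identifier("ESSU0082"))  # ('ESSU', 82)
```

Because the ESSU numbers are consecutive while the SAS series has gaps, a lookup table between the two would still be needed to convert one identifier into the other; the parser only validates the format.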
RESULTS AND DISCUSSION
Generation and overall features of the EST dataset
In order to accelerate gene discovery and to gain insight into the biology of S. scabiei, a total of 1020 cDNA sequences was generated from a mixed life-stage library constructed from the mite. The SAS identifier goes up to SAS1027; the discrepancy between the actual number of ESTs and the identifier number is due to 7 clones that were given identifiers but were never included in the dataset. The single-pass sequencing was always performed from the 5′-end of the cDNA clones. The inserts of the clones that were put forward for sequencing ranged from 500 to ~3500 bp. The average sequence read was 510 bp after editing. The overall sequencing success was 89%. The majority of the 116 SAS clones that were excluded consisted mostly, if not exclusively, of poly A tail sequences. Even though the cut-off for sequencing an amplicon was set to 500 bp, a few selected clones contained multiple inserts that resulted in short sequences after final editing. We therefore used a 100 bp cut-off for the BLASTX searches. In the final dataset this corresponded to 6 sequences that are less than 150 bp. A cluster analysis of the 904 sequences submitted to dbEST (GenBank accession numbers BG817579-BG817974, BM521860-BM522350, BM564941-BM564942, CA305267-CA305281) formed a total of 76 clusters and 576 singletons. A majority of the clusters (67%) were assembled from only 2–3 sequences. The largest cluster contained 27 sequences and includes transcripts coding for cytochrome c oxidase subunit I. This finding is not very surprising since cytochrome oxidase catalyses the transfer of electrons to oxygen in the respiratory chain and is essential for most cell types. The second largest cluster comprised 19 sequences and displays similarity to a class of cysteine-rich proteins (see below for details). In all, 62 of the clusters have a significant match in the BLASTX analysis.
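The headline figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that each GenBank accession range is inclusive and consecutive, and that the 116 excluded clones account for the whole difference between generated and submitted sequences; the function and variable names are illustrative.

```python
import re

# Accession ranges and counts as quoted in the text.
ACCESSION_RANGES = [
    "BG817579-BG817974",
    "BM521860-BM522350",
    "BM564941-BM564942",
    "CA305267-CA305281",
]
GENERATED = 1020
EXCLUDED = 116
SINGLETONS = 576


def range_size(accession_range: str) -> int:
    """Number of accessions in an inclusive range such as 'BG817579-BG817974'."""
    first, last = accession_range.split("-")
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unexpected accession range: {accession_range!r}")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


submitted = sum(range_size(r) for r in ACCESSION_RANGES)
in_clusters = submitted - SINGLETONS

print(submitted)                         # 904
print(GENERATED - EXCLUDED)              # 904
print(f"{submitted / GENERATED:.1%}")    # 88.6%
print(in_clusters)                       # 328
print(f"{in_clusters / submitted:.1%}")  # 36.3%
```

Under these assumptions the accession ranges add up to the 904 submitted sequences, the sequencing success rounds to the stated 89%, and the 328 clustered ESTs correspond to about 36% of the dataset.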
The redundancy of the cDNA library was established to be 37% using those sequences that assembled into clusters, which is in keeping with previous studies (Daub et al. 2000). On the other hand, further analysis suggests that sequence polymorphisms are present among different SAS ESTs, which would indicate that some sequences originate from different gene copies and thus that the actual redundancy of the cDNA library is lower. The fraction of clones with sequences corresponding to ribosomal RNAs was 3·3%. The highest number of matches for the SAS dataset is with genes from the fruit fly D. melanogaster, and putative function was assigned using KEGG, FlyBase and Gadfly.
A total of 5 clones were excluded from the dataset since they were clearly of host origin. The 5 sequences all coded for cytochrome c oxidase subunit III and matched sequences from dog (Canis familiaris). We could not observe any other top-scoring matches with sequences from red fox or other members of the Canidae family. The Canidae family is represented by more than 24000 sequences and V. vulpes by 100 sequences, including cytochrome b and cytochrome c oxidase subunit I and II. This suggests a low level of host cDNA contamination.
Compilation of the identified genes in S. scabiei
Sequence similarity searches in public databases revealed that 48% of the S. scabiei ESTs could be assigned a putative identity (Table 1). Depending on the identity, these transcripts have been classified into different groups according to function (Fig. 1). The functions assigned to various ESTs are given in Table 2 (for a full list contact the corresponding author). The majority of matches were with sequences with a known or partly described function. However, a considerable fraction corresponds to evolutionarily conserved sequences of unknown function. Interestingly, 50% of our ESTs matched 1 or more of the 14334 possible polypeptides from the D. melanogaster ORF dataset.

Fig. 1. A functional classification of the Sarcoptes scabiei ESTs. The transcripts with putative identities in S. scabiei were divided into functional categories. Transcripts involved in metabolism represent the largest group. The group designated ‘others’ contains ESTs that match proteins with a known function, which cannot be easily classified in one of the major groups and are too few to form unique groups.
Table 1. A summary of the sequence similarity searches in public databases with the Sarcoptes scabiei EST dataset (The P values in the BLASTX searches ranged from 10⁻⁴ to 10⁻¹⁰² for the ESTs that had an informative match. The sequences used in the analysis ranged from 104 bp to 848 bp.)

Table 2. Functional classification of individual ESTs from Sarcoptes scabiei (The threshold for informative matches was set to a BLASTX similarity score value ≥50 and P≤10⁻⁴. The table does not include the ESTs that had no informative match (a total of 372).)

The largest fraction of transcripts with a putative identity coded for proteins involved in metabolism (28%), including 55 transcripts coding for cytochrome c components, cytochrome P450 and cytochrome b. Transcripts corresponding to proteins involved in cellular and structural organization form the second largest group. The most common were actin-related and keratin-associated proteins. A number of myosins were also identified but they have not been classified in any detail. We have previously identified and over-expressed paramyosin from S. scabiei (Mattsson et al. 2001), one of the major structural components of thick filaments in invertebrates. However, in this study we did not identify any EST corresponding to paramyosin.
A large group of transcripts corresponded to proteins involved in translation. Predictably this group was dominated by ribosomal proteins, but also included initiation and elongation factors. The absence of several of the ribosomal proteins can be explained by the fact that we only analysed amplicons that were longer than 500 bp. Many of the smaller ribosomal proteins are transcribed from shorter mRNAs (Daub et al. 2000; Mattsson & Soldati, 1999). Interestingly, only 1 EST assigned to tRNA synthesis was found. The group of ESTs representing protein destination accounted for 7% and included the subgroups protein modification, protein folding and stabilisation, and proteolysis. The group included various heat-shock proteins, protein phosphatases and prefoldin. Signal transduction, detoxification, intracellular trafficking and transport proteins were the smallest groups, together adding up to only 6%. An example from the detoxification group is SAS0751, which shared similarity with a glutathione-S-transferase. This clone overlapped with SAS0635. However, the latter clone did not have any significant match with any sequence available in the non-redundant databases.
The group with ESTs matching proteins involved in cell growth, division and DNA synthesis represented transcripts with similarities to cell division control proteins, cyclin-dependent kinases and a cell division checkpoint protein. The part of the dataset denoted ‘others’ contained ESTs with matches to proteins with a known function, which cannot be easily classified in one of the major groups and are too few to form unique groups (Table 2). The functional classifications of motifs on the other hand are represented by ESTs coding for protein kinases, DNA-binding and Zn-finger motifs.
Proteases
We have identified several ESTs that are similar to previously described allergens/antigens from various mites. For instance, these include clones that share homology with the allergen Lep D 13, a fatty acid-binding protein isolated from the dust mite Lepidoglyphus destructor (Eriksson et al. 2001), and with tropomyosin, called Mag44, from the house dust mite Dermatophagoides farinae (Aki et al. 1995). Of particular interest are a number of ESTs with similarity to various proteolytic enzymes. The clone SAS0725 encoded a cysteine protease that shares its highest homology with Der f 1, a group 1 allergen from D. farinae. The group 1 allergens belong to the papain-like cysteine protease family and are primarily found in mite faeces (Yasuhara et al. 2001). In addition, 5 of our SAS-clones corresponded to Eur m 1 (Kent et al. 1992), which represents another group 1 cysteine protease, albeit from the house dust mite Euroglyphus maynei (Thomas & Smith, 1998). The group 1 enzymes represent a major type of house dust mite allergen in humans and the cysteine protease activity is thought to induce the pathogenic processes of allergy (Shakib & Gough, 2000; Shakib, Schulz & Sewell, 1998). In parasitic organisms cysteine proteases play many important roles. They are essential for general catabolic functions as well as protein processing. However, in parasites they may also be pivotal for immunoevasion, cell and tissue invasion or destruction, excystment/encystment and exsheathing (Sajid & McKerrow, 2002). To our knowledge no experimental data have so far been presented on the action or identity of any proteases from S. scabiei. In contrast, much more is known in the closely related sheep scab mite Psoroptes ovis (Kenyon & Knox, 2002; Nisbet & Billingsley, 2000). This parasite is a non-burrowing mite and causes a dermatitis that has the characteristics of a hypersensitivity reaction similar to scabies and sarcoptic mange. The actions of several proteases, including various cysteine proteases, have been implicated as important factors for the establishment of psoroptic mange (Kenyon & Knox, 2002; Nisbet & Billingsley, 2000).
We found 2 EST clones with homology to different cathepsins. The first was an aspartic proteinase similar to a cathepsin D. In P. ovis cathepsin D-like enzymes have been localized to the digestive cells and are involved in the lysosomal degradation of nutrients (Nisbet & Billingsley, 2000). A cathepsin D-like enzyme is also present in a third member of the Sarcoptoidea superfamily, Psoroptes cuniculi, where it represents the major endopeptidase activity in mite extracts (Nisbet & Billingsley, 1999). The second cathepsin found in the S. scabiei EST sequence set was a lysosomal cysteine protease corresponding to cathepsin L. A recombinant form of cathepsin L from the tick Boophilus microplus hydrolyses a range of substrates including synthetic as well as natural proteins like haemoglobin, vitellin and gelatin (Renard et al. 2000, 2002). Additionally, in the SAS dataset we have identified several clones linked to the function of the proteasome and related functions.
Abundantly expressed cysteine-rich proteins in S. scabiei
In our S. scabiei dataset a total of 32 ESTs shared similarity with a scavenger receptor cysteine-rich protein (SRCRP) from purple sea urchin (Strongylocentrotus purpuratus). In the cluster analysis these individual ESTs formed 3 separate contigs: c49, c69 and c76. A number of polymorphisms among the ESTs in the c69 cluster suggested that it might be composed of at least 2 separate transcripts. However, complementary sequencing is necessary to define the 3 contigs definitively. A BLAST search with the contigs gave the highest matches with 3 unidentified ORFs from the malaria mosquito Anopheles gambiae genome project (Holt et al. 2002). All 3 have 1 or more trypsin inhibitor-like cysteine-rich (TIL) domains, a domain that is also present in the SRCRP. The TIL domain typically contains 10 cysteine residues that form 5 disulphide bridges and can be found in a variety of proteins, e.g. in trypsin inhibitors and in different types of extracellular proteins. Furthermore, the whole or part of the TIL domain was found in an additional 9 S. scabiei ESTs. By using NCBI's conserved domain architecture retrieval tool (CDART), all 3 A. gambiae ORFs were found to be part of a group with similar domain architectures that is in part defined by accessory gland protein Acp62 from D. melanogaster (Wolfner et al. 1997).
Based on the available information we cannot assign any exact function to the 3 related EST clusters. The presence of a TIL domain and its relation to the Acp62-group of proteins is intriguing. More importantly, in our analysis more than 3% of the investigated clones belong to this group, suggesting that the mRNA levels from this family of genes are highly abundant in S. scabiei. Several possible roles could be assigned to this class of proteins. The secretion of proteins with trypsin inhibitory domains might increase parasite survival simply by blocking various host immune molecules or in other ways interfere with proteolytic actions. The proteins might also be necessary to regulate the actions of proteases secreted by the mites. Whatever function this group of proteins has in the life-cycle of S. scabiei it clearly merits further investigations.
Concluding remarks
One of the most important types of information that can be gained from an EST dataset is guidance in choosing candidate genes for future studies. Some of the transcripts identified in this study are obvious choices, such as potential virulence factors, including proteases and various allergen-related proteins. Not only are these types of proteins fundamental for a basic understanding of the parasite–host interaction, but they can also be turned into valuable tools for the immunological control of S. scabiei. Transcripts that correspond to proteins of unknown function, in particular those that appear very frequently in the EST collection, represent an intriguing group. The mere fact that they are over-represented in the dataset suggests that the parasite requires large amounts of their protein products, which makes them worthy of further investigation. The rapid growth of the different molecular databases will certainly be helpful in identifying the roles for the hitherto unknown transcripts/proteins, as proven by data-mining strategies (Conklin et al. 2000). Nevertheless, biochemical studies are indispensable if we want to begin to understand the molecular interplay between S. scabiei and its host.
This study was funded by the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS). We thank Arvid Uggla, Lena Åström and Anh-Nhi Tran for valuable discussion and Katarina Näslund for skilful technical assistance.