INTRODUCTION
Oxygen generates an array of reactive oxygen species (ROS) such as the superoxide radical (O2•−), which are scavenged by specialized antioxidant machinery in aerobic organisms (Scandalios, 2005). The key components of this machinery are superoxide dismutases (SODs) that catalyse the disproportionation of superoxide radicals to hydrogen peroxide and oxygen (for a review see Wolfe-Simon et al. 2005). There are 3 main types of these enzymes: Cu/Zn-, Mn-, and Fe-SODs. While all 3 varieties are used in higher eukaryotes, such as plants and animals, Fe-SODs predominate in protozoans including parasitic lineages (see, for example, Fink and Scandalios, 2002; Brydges and Carruthers, 2003).
One of these parasitic lineages is the Trypanosomatidae, which contains medically important species such as Trypanosoma brucei (the causative agent of African sleeping sickness) and Leishmania donovani (Kala Azar or visceral leishmaniasis) (for a recent review see Simpson et al. 2006). Molecular phylogenetic analyses clearly demonstrate that these parasites are closely related to bodonids and diplonemids (Simpson et al. 2006). Interestingly, the sister group to the collective trypanosomatid, bodonid, diplonemid lineage is the Euglenoidea, a class of protists with many photosynthetic species such as Euglena gracilis (Simpson et al. 2006). Based on these evolutionary relationships, Hannaert et al. (2003) suggested that trypanosomatid parasites harboured a eukaryotic alga-derived plastid in their evolutionary past (see also Martin and Borst, 2003).
T. brucei has 4 Fe-containing superoxide dismutases, designated as TbSODA, TbSODB1, TbSODB2, and TbSODC (Dufernez et al. 2006; Wilkinson et al. 2006). TbSODB1 has no targeting signal, TbSODB2 carries a C-terminal peroxisome/glycosome import sequence, while TbSODA and TbSODC have N-terminal extensions that resemble mitochondrial-targeting signals (Dufernez et al. 2006; Wilkinson et al. 2006). In agreement with these findings, targeting studies using green fluorescent protein (GFP) demonstrated that TbSODB1 localizes to the cytosol, TbSODB2 to glycosomes, and TbSODA and TbSODC to mitochondria (Dufernez et al. 2006; Wilkinson et al. 2006).
In this paper, we investigate the N-terminal extensions of Fe-SODCs of Trypanosoma and Leishmania in detail. We demonstrate that they carry bipartite targeting sequences composed of a signal peptide-like domain followed by a transit peptide-like domain. Since comparable pre-sequences are found in proteins targeted to multi-membrane plastids, we propose that this class of dismutases was initially imported into a plastid present in the ancestors of trypanosomatids. When plastids later were lost from this parasitic lineage, selection must have favoured re-direction of SODC into the mitochondrion.
MATERIALS AND METHODS
Sequences analysed
SOD sequences from Trypanosoma and Leishmania were identified by BLAST searches of sequence databases and genome projects, including GenBank (http://www.ncbi.nlm.nih.gov), GeneDB (http://www.genedb.org), and The Wellcome Trust Sanger Institute (http://www.sanger.ac.uk). A non-redundant set of 11 SODA and 9 SODC dismutases was selected for further analyses (see Table 1S in Supplementary material, in Online version only).
Based on investigations of known targeting/localization of trypanosomatid proteins we assembled a set of 48 trypanosomatid nuclear-encoded, mitochondrial matrix-targeted proteins from two databases: GenBank (http://www.ncbi.nlm.nih.gov) and UniProt (http://www.expasy.uniprot.org). Annotations and accession numbers are given in Table 2S in the Supplementary material (in Online version only).
Twelve amino acid sequences of class II nuclear-encoded, plastid-targeted proteins of E. gracilis were inferred from EST sequences identified by Durnford and Gray (2006). We also included 15 inferred amino acid sequences of nuclear-encoded, plastid-targeted proteins from peridinin dinoflagellates. All of these proteins belong to class II and come from corresponding gene and EST sequences (for their detailed description see Patron et al. 2005). A complete list of Euglena and dinoflagellate plastid proteins used in these studies can be found in Table 3S in the Supplementary material (in Online version only).
Prediction of targeting signals and subcellular localizations
We used 30 programs and software tools to predict targeting signals and subcellular localizations of SODAs, SODCs, mitochondrial matrix-targeted proteins of the Trypanosomatidae, and Euglena and dinoflagellate plastid proteins. These tools can be divided into 4 groups: (i) programs specializing in predicting signal peptides; (ii) programs that distinguish different kinds of N-terminal targeting signals, such as signal peptides, mitochondrial transit peptides, and plastid transit peptides; (iii) programs specializing in predicting mitochondrial transit peptides or mitochondrial localizations; and (iv) programs that find different subcellular localizations of a protein. A full list of these programs and references is available in Table 4S in the Supplementary material (in Online version only).
Based on the results generated, we calculated the predictability that each sequence contained a mitochondrial transit peptide and/or mitochondrial localization, as well as a signal peptide and/or some localization that requires the presence of this peptide (extracellular, secretory pathway, ER, plasma membrane, lysosome, the Golgi apparatus). Because all these latter localizations require at least temporary residence within the endomembrane system, we refer to them collectively as endomembrane localization. Predictabilities of the targeting signals and/or intracellular localizations were expressed as percentages of positive predictions among the total potential predictions.
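The predictability measure described above can be illustrated with a minimal Python sketch. The function name, the boolean encoding of program outputs, and the example values are assumptions made for illustration only; they are not taken from the analysis itself.

```python
def predictability(predictions):
    """Percentage of positive predictions among all potential predictions.

    `predictions` holds one boolean per program able to make the given
    prediction (True = the signal or localization was predicted).
    """
    if not predictions:
        raise ValueError("no predictions available for this sequence")
    return 100.0 * sum(predictions) / len(predictions)


# Hypothetical outcome of 8 programs for one sequence (illustrative values).
mito_calls = [True, True, False, True, False, False, True, False]
print(predictability(mito_calls))  # 50.0
```

Per-group values such as those quoted in the Results would then be summaries (for example, medians) of these per-sequence percentages.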
Calculation of substitution rates in SODCs
SODC amino acid sequences were aligned by MSA 2.1 (Gupta et al. 1995) and edited in GeneDoc 2.6 (Nicholas and Nicholas, 1997). A best-fit amino acid substitution model was selected by ProtTest 1.2.7 (Abascal et al. 2005) according to the Akaike Information Criterion (AIC), Second-order AIC (AICc), and Bayesian Information Criterion (BIC); this turned out to be the JTT+Γ model, which was used in subsequent analyses. Site substitution rates were computed with the program CODEML from the PAML 3.15 package (Yang, 1997), given the tree inferred by PHYML 2.4.4 (Guindon and Gascuel, 2003) and a JTT+Γ model; all parameters were estimated assuming 40 rate categories for the discrete-gamma model.
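For readers unfamiliar with the three model-selection criteria, the following Python sketch shows their standard definitions and how the lowest score identifies the preferred model. It does not reproduce ProtTest; the log-likelihoods, parameter counts, and alignment length are hypothetical, and the use of alignment length as the sample size is an assumption.

```python
import math


def information_criteria(log_likelihood, n_params, n_sites):
    """Return (AIC, AICc, BIC) for one fitted substitution model."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return aic, aicc, bic


# Hypothetical fits: model name -> (log-likelihood, number of free parameters).
candidates = {
    "JTT": (-5250.0, 15),
    "JTT+G": (-5200.0, 16),
    "WAG+G": (-5215.0, 16),
}
n_sites = 300  # assumed sample size (alignment length)

scores = {
    name: information_criteria(lnl, k, n_sites)
    for name, (lnl, k) in candidates.items()
}
# The model with the smallest criterion value is preferred.
best_by_aic = min(scores, key=lambda name: scores[name][0])
print(best_by_aic)  # JTT+G
```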
Statistical analyses
The non-parametric ANOVA Kruskal-Wallis test was used to determine the statistical significance of support for competing hypotheses. These tests and correspondence analyses were carried out using Statistica software (StatSoft, Inc., 2006). The Benjamini-Hochberg multiple comparisons procedure for controlling the false discovery rate (Benjamini and Hochberg, 1995), run in R 2.4.1 (R Development Core Team, 2006), was employed in comparisons of amino acid content of pre-sequences and predictabilities of the targeting signals and/or intracellular localizations.
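The Benjamini-Hochberg step-up procedure can be sketched in a few lines of Python. This is an illustrative re-implementation rather than the R code used in the study; the FDR level of 0.05 and the example P values are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending P values) with
    p_(k) <= (k / m) * q, then reject the k hypotheses with smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical P values from five pairwise comparisons.
p = [0.001, 0.045, 0.025, 0.2, 0.012]
print(benjamini_hochberg(p))  # [True, False, True, False, True]
```

Because the rule is step-up, a comparison can be declared significant even if a smaller P value failed its own threshold, provided a larger-ranked P value passes.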
RESULTS
Analysis of targeting signals of SODAs and SODCs from Trypanosoma and Leishmania
Fe-SODAs
Hydropathy profiles indicate that Trypanosoma SODAs possess an N-terminal hydrophilic domain corresponding to the entire pre-sequence (Fig. 1), suggesting that they are, indeed, imported into the mitochondrial matrix. In support of this hypothesis, 86% (values are given as median percentages) of computational predictions found a mitochondrial transit peptide and/or mitochondrial localization (Fig. 2). In contrast, predictability of a signal peptide and/or localization within the endomembrane system (e.g. ER) was only 2% (Fig. 2).

Fig. 1. Analyses of the N-terminal extensions of trypanosomatid mitochondrion-targeted SODAs, which carry single domain pre-sequences. A hydrophobic stretch is discernible in Leishmania pre-sequences. There are striking similarities between hydropathy profiles of SODAs and trypanosomatid mitochondrial matrix-targeted proteins (see Fig. 1S in Supplementary material, in Online version only); this is especially clear in the case of LdcSODA and AAK64280.1, Tb427SODA and O79469, TcoSODA and XP_843727.1 as well as TvSODA and XP_847200.1. The thick black landscape curve shows the hydrophobicity profiles of sequences according to the Kyte-Doolittle scale (Kyte and Doolittle, 1982) assuming a sliding window length of 9 residues. Regions present in the mature proteins are shaded in grey. Short coloured vertical lines under the profile show the distribution of particular amino acid residues grouped in 3 physicochemical classes: hydroxylated (Ser, Thr, Tyr) in yellow, basic (Arg, His, Lys) in blue, and acidic (Asp, Glu) in red.

Fig. 2. Median predictability of targeting signals and subcellular localizations for Euglena and dinoflagellate plastid proteins, SODCs, trypanosomatid mitochondrial matrix-targeted proteins, and SODAs. Interestingly, a signal peptide and/or endomembrane localization were found with the same predictability for SODCs and plastid proteins. Moreover, the predictability of a mitochondrial transit peptide and/or mitochondrial localization for SODCs was intermediate between those for plastid proteins and mitochondrial proteins.
Unlike Trypanosoma SODAs, Leishmania presequences possess a hydrophobic domain (Fig. 1). Concordant with this finding, a signal peptide and/or endomembrane localization was predicted for Leishmania SODAs at 21% (Fig. 2). Even so, prediction of a mitochondrial transit peptide and/or mitochondrial localization was considerably higher at 57% (Fig. 2). This suggests that, although the N-terminal extensions of Leishmania SODAs may bear some resemblance to signal peptides, their potential for mitochondrial targeting is much stronger.
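The hydropathy profiles referred to here are based on the Kyte-Doolittle scale with a sliding window of 9 residues (see the legend to Fig. 1). A minimal Python sketch of such a profile follows; the function name and the toy sequence are hypothetical, and plotting details of the published figures are not reproduced.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle hydropathy for every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[res] for res in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical pre-sequence: a hydrophobic start followed by a hydrophilic tail.
toy = "MLLLAVLLALLSSRRTKDE"
profile = hydropathy_profile(toy)
print(len(profile))  # 11 windows for a 19-residue sequence
```

In such a profile, positive window means mark hydrophobic stretches (as in signal peptide-like domains) and negative means mark hydrophilic ones.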
Fe-SODCs
Hydropathy profile analyses of SODCs yielded unexpected results. In contrast to SODAs, SODCs carry bipartite pre-sequences composed of a hydrophobic domain followed by a hydrophilic domain. The first domain encompasses ~20 aa, whereas the second is ~70 aa (Fig. 3). The N-terminal position of the hydrophobic domain in Trypanosoma and Leishmania SODC pre-sequences is suggestive of a signal peptide. Computational programs found a signal peptide and/or endomembrane localization for these dismutases with 52% predictability (Fig. 2). In turn, a mitochondrial transit peptide and/or mitochondrial localization was recognized with lower predictability (40%) (Fig. 2).

Fig. 3. Analyses of the N-terminal extensions of mitochondrion-targeted SODCs of the Trypanosomatidae, which carry bipartite pre-sequences with a hydrophobic domain followed by a hydrophilic domain. In this way, they strongly resemble N-terminal extensions of Euglena and dinoflagellate plastid proteins (see Fig. 2S in Supplementary material in Online version only). Please compare, for example, LiSODC with EEL00003797, LmSODC with AAW79321, TcrB1SODC with EEL00002416, and TvSODC with AAW79349.
Further clues to the character of the SODC N-terminal extensions come from programs specialized in identifying signal peptides. These programs found a signal peptide in SODCs with the predictability of 63% (quartile range: 63%−75%), while the same prediction for SODAs was effectively 0% (quartile range: 0%−25%).
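The median and quartile range reported for a group of sequences can be obtained from the per-sequence predictabilities as in the Python sketch below. The quartile convention (the "inclusive" method of Python's statistics module) is an assumption, since the convention used by the statistical software is not stated, and the input values are hypothetical.

```python
import statistics


def median_and_quartiles(values):
    """Return (median, lower quartile, upper quartile) of a list of percentages."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical per-sequence predictabilities (in %).
toy = [0.0, 25.0, 50.0, 75.0, 100.0]
print(median_and_quartiles(toy))  # (50.0, 25.0, 75.0)
```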
Comparison of SODA and SODC pre-sequences with N-terminal extensions of trypanosomatid mitochondrial matrix-targeted proteins
The data presented above indicate that SODAs carry typical mitochondrial targeting signals, whereas those of SODCs appear unusual for mitochondrial-targeted proteins. Therefore, we further investigated the possibility that SODC N-terminal extensions fall within the normal diversity of trypanosomatid mitochondrial targeting signals. To test this hypothesis, we compared the pre-sequences of SODAs and SODCs with those of trypanosomatid mitochondrial matrix-targeted proteins, focusing on length, hydropathy profile, as well as predictions of targeting signals and intracellular localizations.
The N-terminal extensions of trypanosomatid mitochondrial matrix-targeted proteins are composed of 7–52 aa (Table 1). As expected, at 31–36 aa, SODA pre-sequences fall within an ideal length range for mitochondrial matrix proteins (P=0·16). In contrast, the N-terminal extensions of SODCs contain 90–101 aa (Table 1). This is significantly longer (almost 2×) than the longest pre-sequences of other determined matrix proteins (P<0·0001).
Table 1. Lengths of the pre-sequences of SODAs and SODCs, Euglena and dinoflagellate plastid proteins, and mitochondrial matrix-targeted proteins of the Trypanosomatidae given in amino acid residues

Although hydropathy profiles of the N-terminal extensions of trypanosomatid mitochondrial matrix-targeted proteins have somewhat varied patterns, their common feature is the presence of one-domain targeting signals (see Fig. 1S in Supplementary material, in Online version only). In this way, they clearly resemble SODAs. By contrast, none of the 48 trypanosomatid mitochondrial matrix-targeting sequences analysed resembles the peculiar, bipartite pre-sequences present in SODCs (Fig. 3).
Predictability of a signal peptide and/or endomembrane localization was very low for trypanosomatid mitochondrial matrix-targeted proteins, amounting to only 5% (Fig. 2). Interestingly, computational programs found almost the same (4%) predictability for SODAs. In contrast, predictability of a signal peptide and/or endomembrane localization for SODCs was more than 10 times higher (52%). This is similar to predictions of mitochondrial transit peptide and/or mitochondrial localization for mitochondrial matrix proteins and SODAs, at 63% and 72% respectively (Fig. 2). For SODCs, however, predictability of such a peptide and/or localization was only 40%.
The above comparisons demonstrate that SODA N-terminal extensions are typical for trypanosomatid mitochondrial matrix-targeted proteins, whereas SODC pre-sequences clearly deviate from the matrix targeting signals in each of the features analysed.
N-terminal extensions of Trypanosoma and Leishmania SODCs resemble pre-sequences of euglenoid and dinoflagellate plastid proteins
Hydropathy profiles and bioinformatic analyses both distinguish 2 domains in Trypanosoma and Leishmania SODC pre-sequences. The N-terminal domains are hydrophobic and resemble signal peptides, whereas the C-terminal domains are hydrophilic and possess features of transit peptides. Interestingly, this kind of bipartite targeting signal, known as a class II pre-sequence, is present on proteins imported into multi-membrane plastids such as those of Euglena (Durnford and Gray, 2006) and dinoflagellates (Patron et al. 2005) (see also Ishida, 2005; Hempel et al. 2007 for reviews on other complex plastids). The N-terminal extensions of trypanosomatid SODCs and class II plastid pre-sequences also display striking similarities in their hydropathy profiles (see Fig. 3, and Fig. 2S in Supplementary material, in Online version only). In view of these resemblances, we performed detailed comparisons of the N-terminal extensions of SODCs with those of Euglena and dinoflagellate class II plastid proteins.
Pre-sequence lengths
The N-terminal extensions of Euglena and dinoflagellate plastid proteins comprise 40 to 175 aa (Table 1). Consequently, the ~100 aa pre-sequences of Trypanosoma and Leishmania SODCs fall nicely within the range of these class II plastid proteins. In fact, differences in pre-sequence length between SODCs and plastid proteins, as well as those between Euglena proteins and dinoflagellate proteins, are indistinguishable statistically (P=1·0). These striking results, along with the observation that SODC N-terminal extensions do not come close to overlapping the maximum size for mitochondrial matrix proteins and SODAs (P<0·0001) (see Table 1), are consistent with SODC having an alternative evolutionary history from other mitochondrial-directed proteins.
Targeting properties
Predictability of a signal peptide and/or endomembrane localization for euglenoid and dinoflagellate plastid proteins was 52% (Fig. 2). Interestingly, we found no difference in the predictabilities of such a peptide and/or localization for SODCs. This suggests that the pre-sequences of SODCs possess relatively well-preserved signal peptides. By contrast, predictability of a signal peptide and/or endomembrane localization for trypanosomatid mitochondrial matrix-targeted proteins and SODAs was more than 10 times lower (see Fig. 2).
A mitochondrial transit peptide and/or mitochondrial localization were recognized in Euglena and dinoflagellate plastid proteins with the predictability of 28% (Fig. 2). These results are not surprising given that these pre-sequences possess hydrophilic domains that function as N-terminal targeting signals in protein import into plastids and mitochondria (Bruce, 2001; Neupert and Herrmann, 2007). In accordance with these findings, transit peptide-like domains of many proteins imported into multi-membrane plastids have features of mitochondrial transit peptides (see, for example, Brydges and Carruthers, 2003). It is noteworthy that the 40% predictability of the mitochondrial transit peptide and/or mitochondrial localization for SODCs is intermediate between those of plastid proteins (28%) and trypanosomatid mitochondrial matrix-targeted proteins (63%) or SODAs (72%) (see Fig. 2). Thus, it is reasonable to postulate that the N-terminal extensions of SODCs originally were plastid bipartite targeting signals, which underwent a transformation into mitochondrial transit peptides.
Amino acid composition provides additional support for the plastidic nature of the N-terminal extensions of dismutases C
It is well established that targeting information of an N-terminal extension is contained in physico-chemical features of its amino acids rather than in its specific sequence order (see, for example, Bruce, 2001; Patron et al. 2005; Neupert and Herrmann, 2007). Therefore, we compared the contents of particular amino acids and their physico-chemical properties among pre-sequences.
Frequencies of particular amino acids
The N-terminal extensions of trypanosomatid SODCs have comparable amino acid frequencies to Euglena and dinoflagellate class II plastid protein pre-sequences (Table 5S in Supplementary material, in Online version only). The only clear differences were for Ala (P=0·031) and Tyr (P=0·011). In contrast, the similar amino acid compositions of SODC and plastid protein pre-sequences are quite different from those of mitochondrial proteins and SODAs (see Table 5S). First, we identified substantial differences in Glu (E) and Asp (D) content between the N-terminal extensions of SODCs and SODAs (P=0·015 and P=0·005 for E and D, respectively) as well as those of mitochondrial proteins (P<0·0001 for both amino acids). In fact, pre-sequences of SODAs and other mitochondrion-targeted proteins are nearly devoid of these amino acids. Second, both SODC and plastid protein pre-sequences are characterized by reduced numbers of Met residues compared to mitochondrial proteins (P=0·003 and P<0·0001, respectively) and to SODAs (P=0·033 and P=0·006, respectively). Third, there are significant differences in the frequencies of Tyr (P<0·0001) and Asp (P=0·003) between the N-termini of SODCs and mitochondrial proteins. Finally, SODC pre-sequences have fewer Leu residues than do SODAs (P=0·041).
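As an illustration of how such compositional data can be derived, the Python sketch below computes per-residue frequencies of a pre-sequence and the content of the 3 residue classes named in the legend to Fig. 1 (hydroxylated, basic, acidic). The function names and the toy sequence are hypothetical; the statistical comparisons themselves are not reproduced.

```python
from collections import Counter

# Residue classes as listed in the legend to Fig. 1.
CLASSES = {
    "hydroxylated": set("STY"),
    "basic": set("RHK"),
    "acidic": set("DE"),
}


def amino_acid_frequencies(pre_sequence):
    """Return {residue: percentage of the pre-sequence length}."""
    seq = pre_sequence.upper()
    if not seq:
        raise ValueError("empty pre-sequence")
    counts = Counter(seq)
    return {res: 100.0 * n / len(seq) for res, n in counts.items()}


def class_content(pre_sequence):
    """Return {class name: percentage of residues belonging to that class}."""
    seq = pre_sequence.upper()
    if not seq:
        raise ValueError("empty pre-sequence")
    return {
        name: 100.0 * sum(res in members for res in seq) / len(seq)
        for name, members in CLASSES.items()
    }


# Hypothetical 10-residue pre-sequence (illustrative only).
toy = "MLRRSLDEEA"
freqs = amino_acid_frequencies(toy)
print(freqs["E"])          # 20.0
print(class_content(toy))  # {'hydroxylated': 10.0, 'basic': 20.0, 'acidic': 30.0}
```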
Physico-chemical amino acid groups
Significant discrepancies in the proportions of acidic, basic, polar, and hydrophobic/non-polar amino acids also were identified between the pre-sequences of SODCs/plastid proteins and those of mitochondrial proteins/SODAs. The differences described below are statistically significant at P<0·01 unless otherwise noted. First, the N-terminal extensions of SODCs and plastid proteins contain more acidic residues than those of SODAs or mitochondrial proteins (plastid proteins versus SODAs, P=0·047). Second, SODC pre-sequences are richer in polar residues than SODA and mitochondrial pre-sequences (P=0·014). Lastly, decreased levels of hydrophobic and non-polar amino acids distinguish SODC pre-sequences from all other pre-sequence groups (versus mitochondrial proteins, P=0·01).
Correspondence analysis
To better visualize differences in the amino acid composition of various pre-sequences, we performed correspondence analyses (see Fig. 3S in Supplementary material, in Online version only). In agreement with our hypothesis that SODCs are descended from plastid-targeted proteins, they occur in close proximity to plastid proteins but clearly are separated from SODAs/mitochondrial proteins.
Signal peptide-like domains of Trypanosoma and Leishmania SODCs have undergone gradual degeneration
Hydropathy profiles of Trypanosoma and Leishmania SODCs show that this character is strongly conserved in transit peptide-like domains, but highly variable in signal peptide-like domains (Fig. 3). It is even possible to identify sequences at what appear to be various stages of evolutionary degradation. In the signal peptide of L. infantum, the main hydrophobic domain has begun to divide into 2 smaller domains (Fig. 3). In L. major this division is still incomplete, whereas in L. braziliensis 2 separate domains clearly are resolved. A similar fragmentation of the signal peptide-like domain is observable in the SODCs of T. congolense and T. vivax (Fig. 3).
These results suggest an ongoing degeneration of the signal peptide region of trypanosomatid SODCs, a process we argue is related to their re-direction to the mitochondrion. This hypothesis is further supported in comparisons of amino acid substitution rates in different regions of SODC (Fig. 4). Inferred substitution rates in the transit peptide are the same as in mature proteins, whereas they are several times higher in domains predicted to be signal peptides with high probability. Moreover, there are numerous insertions and deletions in these signal peptide-like regions (see Fig. 4S in Supplementary material, in Online version only). Comparable indels are absent from the transit peptide domain, and these differences characterize all sequences analysed.

Fig. 4. Substitution rate and prediction of a signal peptide along the alignment of SODC N-terminal extensions.
DISCUSSION
Trypanosomatids encode 4 iron-containing superoxide dismutases, with 2 of them, SODA and SODC, targeted to the mitochondrion (Dufernez et al. 2006; Wilkinson et al. 2006). Although transported into the same organelle, SODA and SODC differ dramatically in the lengths of their pre-sequences (see Table 1 and Dufernez et al. 2006; Wilkinson et al. 2006). Considering these discrepancies, Dufernez et al. (2006) suggested that the two kinds of Fe-SODs are delivered to distinct mitochondrial subcompartments, with SODA transported into the mitochondrial matrix and SODC targeted to the inter-membrane space.
Our detailed bioinformatics studies support matrix residence of SODA, with pre-sequences clearly resembling those of trypanosomatid mitochondrial matrix-targeted proteins in length, amino acid composition, and targeting properties. In contrast, our analyses cast doubt on targeting of Trypanosoma and Leishmania SODC to the mitochondrial inter-membrane space. Although their pre-sequences do carry well-distinguished hydrophobic domains (Fig. 3), these are localized at their N-termini, not near their centres as is typical of inter-membrane space-targeted proteins (Neupert and Herrmann, 2007). Moreover, SODCs sharply deviate from established trypanosomatid mitochondrial inner membrane/inter-membrane space-targeted proteins. For example, cytochrome c1 in T. brucei is devoid of any N-terminal targeting signal, whereas the Rieske protein from T. brucei carries an N-terminal extension composed of only 17 aa (Priest and Hajduk, 1996, 2003).
SODC pre-sequences possess an N-terminal signal peptide followed by a hydrophilic domain (Fig. 3), the latter similar to both mitochondrial and plastid transit peptides. In this way, they resemble the N-terminal extensions of proteins transported into multi-membrane plastids (Ishida, 2005; Patron et al. 2005; Durnford and Gray, 2006; Hempel et al. 2007), which evolved from eukaryotic algae by secondary and tertiary endosymbioses (for a review see Bodył, 2005). Moreover, there are compelling similarities between pre-sequences of SODCs and those of Euglena and dinoflagellate plastid proteins of class II in length, hydropathy profiles, amino acid composition, and targeting properties (see Table 1 and Figs 2 and 3). Considering these resemblances, we propose that SODC originally was imported into a eukaryotic alga-derived plastid. According to this model, the Trypanosomatidae initially possessed 1 iron-containing superoxide dismutase targeted to the mitochondrial matrix (Fig. 5). The gene encoding this dismutase underwent duplication, giving rise to SODA and SODC paralogues, both targeted to the mitochondrion; this key event could have occurred either before or after acquisition of the eukaryotic plastid. Subsequent linking of a signal peptide to SODC caused it to be re-targeted to the plastid, presumably a highly advantageous adaptation. After the trypanosomatid plastid lost its ability to photosynthesize, however, its production of ROS decreased drastically, and mutations that further re-directed SODC to the mitochondrion were favoured by selection. Although the plastid has been lost completely, evidence of its prior existence lies in the peculiar, plastid-like pre-sequences of SODC proteins.

Fig. 5. Hypothetical evolutionary pathway of trypanosomatid SODs. An ancestor of trypanosomatids initially possessed a gene encoding a Fe-SOD, equipped with a classical mitochondrial transit peptide and targeted to the mitochondrion. After acquisition of a eukaryotic alga-derived plastid, this gene underwent duplication, giving rise to SODA and SODC. In a next evolutionary step, selection favoured acquisition of a signal peptide by SODC and its import into the plastid. The subsequent loss of the plastid resulted in adaptation of the N-terminal targeting signal of SODC as a mitochondrial transit peptide, and its re-direction to the mitochondrion.
The above evolutionary scenario is most consistent with the framework provided by phylogenetic analyses of Fe-SODs, which demonstrate that SODA and SODC are paralogues that were duplicated early in trypanosomatid evolution (see Figs 5 and 6 in Dufernez et al. 2006). Interestingly, apicomplexan parasites possess a Fe-SOD that carries a SODC-like pre-sequence composed of a signal peptide followed by a transit peptide (Brydges and Carruthers, 2003). In accordance with our evolutionary hypothesis for SODCs, the apicomplexan dismutase is dually targeted to the apicomplexan mitochondrion and its complex plastid surrounded by 3 or 4 membranes (Pino et al. 2007).
Alternative evolutionary scenarios for SODCs such as vertical inheritance from an ancient cyanobacterium-derived plastid (see Nozaki et al. 2003), or horizontal gene transfer from an alga with complex plastids (e.g. cryptophyte, heterokont or chlorarachniophyte; Ishida, 2005; Hempel et al. 2007) are inconsistent with the well-supported phylogenetic position of SODC sequences that branch with trypanosomatid SODA sequences and next with sequences of other excavates represented by trichomonads (Dufernez et al. 2006). These trees indicate rather a horizontal transfer of an ε-proteobacterial sod gene to the ancestor of Excavata (see Archibald et al. 2003) and a successive vertical inheritance of the acquired dismutase gene by all excavate taxa including trypanosomatids (Dufernez et al. 2006). Moreover, it could be postulated that SODCs were first targeted to the endomembrane system (e.g. ER) with the help of a signal peptide, and later re-directed to the mitochondrion through acquisition of a hydrophilic domain. However, comparison of SODs from ε-proteobacteria with those of eukaryotes demonstrates that the sequences of bacterial proteins are strikingly similar to the trichomonad dismutases that are devoid of any pre-sequences (data not shown). Consequently, the regions encoding pre-sequences of SODAs and SODCs must have been added only in the trypanosomatid lineage, and likely functioned from the beginning as mitochondrial/plastid targeting signals.
In addition, no scenario of a de novo origin of the SODC pre-sequence for mitochondrial targeting explains why the appended hydrophilic domain would resemble comparable domains of Euglena and dinoflagellate plastid proteins in length, hydropathy profile, and amino acid composition. Taken together, available data are most consistent with the hypothesis that bipartite pre-sequences of SODCs evolved initially to function as plastid targeting signals.
Our proposed evolutionary pathway for SODC can be tested through isolation and sequencing of the SOD genes from algae with complex plastids. If they do not show a phylogenetic affinity with SODCs, this will be further support for the argument that trypanosomatid dismutases evolved vertically, as postulated by our model. Characterization of additional Fe-SODs from all other main lineages of the excavate superassemblage could supply further support for our hypothesis, by showing that they form a monophyletic clade with SODCs and SODAs, as already has been shown for trichomonad dismutases. Finally, GFP studies should be undertaken to determine whether the signal peptide-like domain in the pre-sequences of SODCs is still able to target a protein to complex plastids.
We are very grateful to Dr D. G. Durnford for supplying EST clusters of E. gracilis and helpful discussions about their annotation, to Dr J. W. Stiller for critical comments on a first draft of this paper and English editing, to K. Moszczyński for help with figures, and to two anonymous referees for their helpful comments and suggestions. This work was supported by funds from the research grant BW/2020/2007 to A. B.