Eukaryotic Cell, December 2006, p. 2079-2091, Vol. 5, No. 12
1535-9778/06/$08.00+0 doi:10.1128/EC.00222-06
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada E3B 5A3 (1), and Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5 (2)
Received 12 July 2006/ Accepted 15 September 2006
Plants, green algae, and red algae have plastids derived from an endosymbiotic cyanobacterium, with two membranes enveloping the chloroplast (34). Protein targeting to these plastids is fairly well understood; generally, the transit peptides direct newly synthesized proteins to the outer envelope membrane, where they interact with receptors and other components of the translocation apparatus so that protein import and subsequent sorting can take place (33). Many of the translocation components present in the outer and inner envelopes have been identified in plants (31).
Many protists, however, possess secondary plastids that are believed to have arisen from endosymbiosis with a eukaryotic alga. These organisms have complex plastids with either three membranes around the chloroplast, as occurs in the dinoflagellates and Euglena spp., or four membranes, as in the stramenopiles and haptophytes (34). The presence of additional membranes surrounding the plastid would seem to necessitate additional targeting information, complicating the process of translocation. We know, for example, that during the evolution of secondary plastids, genes from the endosymbiont were functionally transferred to the host's nuclear genome. These genes must then be expressed and their protein products targeted back to the organelle, and this process is undoubtedly more complicated than that in the case of primary plastids. A significant hurdle in this pathway is the necessity to acquire appropriate targeting information that allows nucleus-encoded proteins to be directed to the plastid and to traverse additional membranes in the process. Understanding the mechanism of targeting and translocation in organisms with complex plastids has been key to understanding how the transition from algal symbiont to plastid occurred (12, 35, 47, 50, 63, 74).
In protists with four membranes around the plastid, the outermost membrane often has ribosomes attached and is typically continuous with the endoplasmic reticulum (ER) (23). Proteins directed to these plastids possess bipartite targeting sequences, with an N-terminal signal sequence (24) that directs them to the chloroplast ER, where they are cotranslationally imported across the first membrane (4, 7, 30). The domain after the signal sequence is the predicted transit peptide for transport across the inner two membranes, in a process likely to resemble translocation across plant chloroplast envelopes (43).
The euglenophytes and dinoflagellates have plastids with three membranes, the outermost of which lacks bound ribosomes. In both cases, plastid proteins are targeted through the endomembrane system (49, 53, 67, 70, 71). From studies of several complete, publicly available Euglena gracilis plastid protein sequences (13, 25, 27, 28, 38, 44, 52, 56, 61, 64, 66, 73), it was predicted that the plastid proteins have an N-terminal signal sequence, an inference that was confirmed by both in vitro (38) and in vivo (70, 71) experimental approaches. Following the signal sequence is the predicted transit peptide, which is sufficient for translocation across plant chloroplast membranes (29), and a hydrophobic region that acts as a "stop-transfer" sequence to prevent complete transport into the ER, such that the mature protein remains in the cytoplasm (69). The protein is then targeted to the plastid, likely via a vesicular transport system (67). Also described for Euglena are tripartite transit sequences that possess an additional hydrophobic domain predicted to target proteins to the thylakoid lumen (73).
Because relatively few Euglena plastid protein sequences are publicly available, the study we report here more comprehensively examines the characteristics of plastid-targeting sequences. Since many of the known Euglena proteins, including all of those for which biochemical analyses of targeting have been conducted, are encoded as polyproteins, we sought to determine whether all plastid proteins are likely to proceed to the plastid via a similar pathway in this organism. By examining the targeting sequences of a large number of plastid proteins, the majority of which are not organized as polyproteins, we have been able to define the characteristics that can be used to identify Euglena plastid-targeted proteins with high confidence and to infer modes of transport to the plastid.
To search for plastid-targeted proteins, the 9,461 clusters were translated in three reading frames (plus orientation), and the longest open reading frame (ORF) of >19 amino acids starting with a methionine was retained for further analyses (http://maven.smith.edu/~vvouille/sumCGI/translator.html). Screening for plastid-targeted proteins was carried out in several rounds. First, all ORFs were screened for the presence of a signal sequence using the program SignalP3 (6, 51; http://www.cbs.dtu.dk/services/SignalP/). Any ORFs with a signal sequence predicted with the hidden Markov model (HMM) or the artificial neural network (NN) were retained. All selected ORFs were then rescreened, and those having a clear role in plastid function and/or those whose top BLASTnr hit was plant, algal, or cyanobacterial in origin were segregated for further consideration. Finally, the putative plastid-targeted proteins were screened further according to the following criteria: (i) the top BLAST hit (NCBI nonredundant database) was plant/algal or cyanobacterial and/or the protein has a clear role in plastid function, and (ii) the BLASTp E value was ≤1e-05. The ORF was considered to possess a complete transit sequence when (i) there was evidence for a spliced leader sequence (TTTTTTTCG) at the 5' end of the cDNA that would indicate that the cDNA was full length (72), (ii) there was an extension of the ORF toward the N terminus upstream of the first region of evident amino acid sequence similarity following a BLASTp search, and (iii) the beginning of the mature protein was identified by comparison with orthologous proteins.
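The ORF selection step described above can be sketched as follows. This is only an illustration, not the translator tool cited in the text: the function names are hypothetical, the standard genetic code is assumed, and an ORF that runs to the 3' end of a cluster without a stop codon is accepted, which is an assumption made here because EST clusters are often partial.

```python
from itertools import product

# Standard genetic code in TCAG order (assumed to apply to these nuclear transcripts).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate one forward reading frame; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_met_orf(seq, min_len=20):
    """Return the longest Met-initiated ORF (>19 residues) over the three
    plus-strand frames, or None if no frame contains one."""
    best = ""
    for frame in range(3):
        # Each stop-delimited segment is searched for its first methionine.
        for segment in translate(seq, frame).split("*"):
            start = segment.find("M")
            if start == -1:
                continue
            orf = segment[start:]
            if len(orf) >= min_len and len(orf) > len(best):
                best = orf
    return best or None
```

Calling `longest_met_orf(cluster_sequence)` on each cluster would give the candidate protein that is passed to signal sequence prediction.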
Potential membrane-spanning regions were identified using the hidden Markov model-based program TMHMM (39; http://www.cbs.dtu.dk/services/TMHMM/). Hydrophobicity plots were generated using the ProtScale program at the ExPASy site (http://www.expasy.org/tools/protscale.html), using a Kyte-Doolittle scale with a sliding window length of 7 or 19 residues, as indicated. The amino acid content of peptides was calculated using the PEPSTATS program in the EMBOSS package, available at AnaBench (http://anabench.bcm.umontreal.ca/anabench/Anabench-Jsp/Welcome.jsp). Sequence logo displays were generated using the online program WebLogo (weblogo.berkeley.edu/logo.cgi).
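For readers who want to reproduce a hydropathy profile locally, the sliding-window calculation can be sketched as below. This is a minimal illustration of a Kyte-Doolittle window average, not the ProtScale implementation: the function name is hypothetical, all positions in the window are weighted equally (an assumption), and residues outside the 20 standard amino acids raise a `KeyError`.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq.

    Element i covers residues i .. i + window - 1 (0-based), so its
    score is conventionally plotted at the window centre, i + window // 2.
    """
    if window < 1 or window > len(seq):
        return []
    scores = [KD[aa] for aa in seq.upper()]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile
```

A window of 19 smooths the profile enough to highlight membrane-spanning stretches, whereas a window of 7 resolves sharper local changes such as the drop in hydrophobicity after a helix.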
Nucleotide sequence accession numbers. All individual EST sequences have been deposited in the NCBI dbEST database under accession numbers EG565093 to EG565263.
TABLE 1. EST clusters
FIG. 1. Characteristics of class I targeting sequences of Euglena. (A) Averaged TMHMM probabilities for 70 class I proteins identified in this study. Because the region upstream of the first TMH is of variable length (range, 2 to 32 amino acids; mean, 12.7 ± 6.7 amino acids), the data were normalized to a starting TMHMM probability of 0.1, which corresponds to the beginning of a predicted membrane-spanning region, and then averaged. The error bars show 2 standard errors. Key features of a Euglena class I targeting sequence are depicted above the graph. (B) Overview (MacClade) of amino acid categories of the targeting sequences of selected plastid-targeted proteins. Colors represent different amino acids, as follows: gray, hydrophobic and nonpolar (A, C, F, G, I, L, M, P, V, W, and Y); red, acidic (D and E); purple, basic (H, K, and R); yellow, hydroxylated (S and T); and blue, polar (Q and N). (C) Sequence logo plot showing occurrence of amino acids around the signal sequence cleavage site (arrow) predicted by SignalP (neural net). The y axis is displayed as bits, as described at weblogo.berkeley.edu/logo.cgi.
The location of the second TMH is remarkably consistent, at 60 ± 8 amino acids following the end of the first predicted TMH, with a range of 45 to 94 amino acids. We designate this localization the "60 ± 8 rule" (Fig. 1A). The properties of the amino acids within the targeting regions of selected plastid-localized proteins are shown in Fig. 1B. In this figure, the hydrophobic regions (gray) are obvious. The presence of the two TMH motifs separated by 60 ± 8 amino acids had excellent discriminating power for identifying potential plastid-targeted proteins. For class I targeting sequences, the TMHMM program was able to predict upwards of 95% of the plastid proteins simply by searching for N-terminal regions with TMHs according to the 60 ± 8 rule. If we combined the entire set of predicted plastid proteins (all classes), the TMHMM program would have an overall success rate of 82%. In cases where the TMHMM probability did not meet the threshold for formal TMH prediction (Table 1, underlined values), the probability of a TMH was usually between 0.3 and 0.9, and the success rate would be very high if the threshold was reduced in subsequent rounds of screening. Rescreening the entire population of ORFs using the 60 ± 8 rule detected all of the class I proteins listed in Table 1, including isoforms, plus an additional 25 proteins classified as unknowns (data not shown). The TP domains of dinoflagellates, whose plastid leader sequences have a similar structure (49), are about half the size (25 ± 8 residues) (data not shown) of those of Euglena proteins.
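The spacing criterion can be expressed as a simple check on predicted helix coordinates. The sketch below is an assumption-laden illustration rather than the screening procedure actually used: helices are given as hypothetical `(start, end)` pairs in 1-based residue numbering, spacing is taken as the distance from the end of the first helix to the start of the second, and the acceptance window defaults to the observed range of 45 to 94 residues because the text does not state the tolerance applied during rescreening.

```python
def tmh_spacing(tmhs):
    """Distance from the end of the first TMH to the start of the second.

    tmhs is a list of (start, end) tuples, 1-based and inclusive, in any order.
    Returns None when fewer than two helices are predicted.
    """
    ordered = sorted(tmhs)
    if len(ordered) < 2:
        return None
    (_, end1), (start2, _) = ordered[0], ordered[1]
    return start2 - end1


def matches_spacing_rule(tmhs, lo=45, hi=94):
    """True if the first two TMHs are separated as in class I presequences.

    The default window (45 to 94) is the observed range reported in the text;
    a tighter window around 60 could be passed instead.
    """
    spacing = tmh_spacing(tmhs)
    return spacing is not None and lo <= spacing <= hi
```

For example, helices at residues 13 to 35 and 95 to 117 give a spacing of 60 and pass the check, whereas a single predicted helix does not.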
Class IB proteins (Table 1) also possess two predicted TMHs separated by 60 ± 8 amino acids, but they have a third hydrophobic domain with a mean distance of 17 residues (range, 7 to 25 residues) downstream of the end of TMH2 (Fig. 2). This region resembles a prokaryotic signal sequence and is postulated to function in the targeting of proteins to the thylakoid lumen (73). We identified five proteins that are homologous to thylakoid lumen-localized proteins and for which biochemical evidence for this location exists (four of these class IB proteins are shown in Fig. 2, along with a lumen-targeted class II protein [see below]). Two additional proteins are predicted to function in the lumen, based on their annotation as well as their possession of a putative lumen-targeting domain (LTD). Three of the seven class IB proteins (ascorbate peroxidase, HCF136, and OEE2) contain a double Arg immediately preceding the third hydrophobic domain (data not shown); another two class IB proteins (PSI-III and cytochrome c6) have the same motif within six amino acids of the start of the hydrophobic LTD, suggesting that the twin-arginine translocation (Tat) pathway (58) is functional in Euglena.
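A search for the twin-arginine motif near the lumen-targeting domain might look like the following sketch. The function name and the coordinate convention are hypothetical, and "within six amino acids" is interpreted here as at most six residues lying between the Arg-Arg pair and the first LTD residue, which is an assumption about how the distance was counted.

```python
def twin_arg_distance(seq, ltd_start, max_upstream=6):
    """Locate the nearest Arg-Arg pair upstream of the LTD.

    ltd_start is the 1-based position of the first LTD residue.
    Returns the number of residues between the RR pair and the LTD
    (0 means the pair immediately precedes the LTD), or None if no
    pair is found within max_upstream residues.
    """
    seq = seq.upper()
    first_ltd_index = ltd_start - 1  # convert to 0-based index
    for dist in range(max_upstream + 1):
        i = first_ltd_index - dist - 2  # start index of the candidate pair
        if i < 0:
            break
        if seq[i:i + 2] == "RR":
            return dist
    return None
```

A return value of 0 corresponds to the arrangement described for ascorbate peroxidase, HCF136, and OEE2, and values from 1 to 6 correspond to the looser arrangement described for PSI-III and cytochrome c6.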
FIG. 2. Kyte-Doolittle hydropathy plots for class IB plastid-targeting sequences of Euglena. Hydrophobicity plots for five confirmed lumen-targeted proteins are shown. The analyses were conducted with a window size of 19, and the hydrophobic regions (positive scores) corresponding to the TMHs of the signal sequence (SS) and the stop-transfer sequence (ST) are indicated with black bars. The hydrophobic region corresponding to the LTD is indicated with gray bars. Oxygen-evolving enhancer 3 (OEE3) has a class II targeting sequence and thus lacks the typical ST region. TP, transit peptide; MP, mature protein.
FIG. 3. Characteristics of class II targeting sequences of Euglena plastid proteins. (A) Scatter plot showing TMHMM probability for the first 100 amino acids. Because the region before the first TMH is of variable length, the data were normalized to a starting TMHMM probability of 0.1. In all cases, a second TMH 60 ± 8 amino acids downstream from the first was absent. The hydrophobic region centered at position 45 is the LTD of OEE3. (B) Overview (MacClade) of amino acid categories of the targeting sequences of class II plastid-targeted proteins. Colors represent defined categories of amino acids, as indicated in the legend to Fig. 1. The black arrowhead indicates the predicted signal sequence cleavage site.
To test whether the class II TP region was similar to that of class I targeting sequences and to determine the chemical properties of both TP domains compared to the TPs of green algae and plants, we examined their amino acid compositions. We also compared these compositions to those of the mature region of proteins with class I targeting domains as well as selected Chlamydomonas proteins. The amino acid composition was calculated from the entire intervening region between the TMH regions of class I proteins (the predicted transit peptide), the estimated transit peptide from class II proteins that was located after the signal sequence and before the predicted start of the mature protein, and the entire coding region from all proteins having class I targeting sequences.
The data for selected amino acids and amino acid categories are shown in the form of box-and-whisker plots (Fig. 4). Since plastid transit peptides are reportedly enriched in hydroxylated amino acids and deficient in acidic amino acids (76), we analyzed a priori these amino acid categories in the putative TPs of class I and class II targeting sequences of Euglena in addition to a selection of 25 predicted TPs from Chlamydomonas proteins (Fig. 4). The region immediately downstream of the signal sequence in class I and II targeting sequences was significantly enriched in Ser and Thr (22% and 17%, respectively) compared to the mature regions of proteins with class I targeting sequences (11%) (one-way analysis of variance [ANOVA] and Tukey's test at the 0.05 level). The TPs of Chlamydomonas proteins were similarly enriched in Ser/Thr (17%) compared to the mature portions of the proteins (11%) (Fig. 4). The putative transit peptide regions of class I and II targeting sequences were also significantly depleted in acidic amino acids (Asp and Glu) compared to the mature regions of the same proteins (Fig. 4) (one-way ANOVA and Tukey's test at the 0.05 level). The predicted transit peptide regions were also found to have a higher Ala and Pro content than the mature portions of proteins (Fig. 4) (one-way ANOVA and Tukey's test at the 0.05 level). However, given that 20 tests were conducted and that the amino acid composition is not truly independent, there is a possibility that some of these differences could have arisen by chance. Although the Chlamydomonas TP exhibited a clear elevation in Ala content, there was no difference in the amount of Pro compared to that in the Euglena TPs. In terms of charged amino acids, the TP region is deficient in acidic amino acids, yet there is little significant change in the content of basic (His, Lys, and Arg) residues compared to the mature regions of the same proteins. However, examination of Lys and Arg separately reveals discrimination against Lys in the TP regions of class I and II targeting sequences (mean, 1.6% and 2.1%, respectively) compared to the mature proteins (mean, 5.8%; P < 0.001 [Kruskal-Wallis]) (Fig. 4). There were no significant differences in Arg content between the predicted transit peptides and the mature portions of the same proteins. Chlamydomonas TPs discriminate strongly against acidic amino acids (mean, 0.2%) and have an elevated content of Arg compared to the mature regions of the same proteins. Unlike Euglena, Chlamydomonas shows no bias against Lys in the TP. Without exception, the amino acid compositions of the Euglena class I and II transit peptides were the same, and both were significantly different from the composition of the mature protein (Fig. 4).
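The per-region percentages compared above can be computed with a few lines of code. The sketch below is an illustration only and does not reproduce PEPSTATS or the statistical tests: the function name and category labels are hypothetical, and the category memberships follow the groupings used in the text (Ser/Thr, Asp/Glu, and His/Lys/Arg).

```python
# Amino acid categories as grouped in the text; the labels are illustrative.
CATEGORIES = {
    "Ser/Thr": "ST",
    "acidic": "DE",
    "basic": "HKR",
    "Ala": "A",
    "Pro": "P",
    "Lys": "K",
    "Arg": "R",
}


def composition(seq):
    """Return the percentage of residues in seq that fall in each category."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return {
        name: 100.0 * sum(seq.count(aa) for aa in members) / n
        for name, members in CATEGORIES.items()
    }
```

Applying `composition` separately to the region between TMH1 and TMH2 and to the mature protein of each class I sequence would give the paired values that underlie a plot such as Fig. 4.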
FIG. 4. Amino acid composition analyses of the predicted TPs of class I and II targeting sequences compared to the mature proteins (MP). The amino acid compositions of the intervening region between TMH1 and TMH2 of class I targeting sequences (TP, I; n = 70), the predicted transit peptide region for class II proteins (TP, II; n = 13), and the mature protein regions from class I proteins (MP, I; n = 70) were determined. Also shown are the amino acid compositions of Chlamydomonas reinhardtii TPs (TP, Cr; n = 25) and mature proteins (MP, Cr; n = 25). Box-and-whisker plots were used to represent the data and are based on quartiles around the median value. The box encloses 50% of the data, with 25% above and below the median (solid line). Each whisker represents the data range of an additional 25% of the data. The existence of outliers beyond the 5% and 95% confidence ranges is indicated with a solid dot where applicable. Categories indicated with different letters on the plot are significantly different (one-way ANOVA and Tukey's test at the 0.05 level). All data were normally distributed except for the Lys content in class II peptides, in which case nonparametric statistics were used to assess differences.
FIG. 5. Amino acid composition analysis of the plastid TP domain of class I targeting sequences. Each TP region was divided into three equal segments (TP1-3), and the basic (H, K, and R), acidic (D and E), and serine/threonine (Ser/Thr) contents were calculated. These values were compared to the averaged amino acid composition of the mature protein (MP).
Stop-transfer sequences are a predicted feature of class I proteins. Stop-transfer sequences function to halt the cotranslational import of proteins into the ER and serve an important role in determining the orientation of a protein in the membrane (8). For Euglena, it has been proposed that the second TMH acts as a stop-transfer sequence (69). From analysis of a large number of proteins with class I targeting sequences, it is clear that a stop-transfer sequence is a common motif in Euglena plastid-targeted proteins. In a few cases (Table 1), the second TMH region was not predicted by the TMHMM program, and the probability of having a TMH ranged from 0.1 to 0.9. Nevertheless, in these cases, subsequent hydropathy plots confirmed that these targeting domains are still strongly hydrophobic (data not shown) and therefore likely to have the same stop-transfer function. Immediately following the second TMH and within six residues of its end, ca. 80% of proteins of this class have two or more basic amino acids, and 97% of proteins have at least one. Only 2 of the 71 class I proteins lack a positively charged residue immediately after the TMH. The sharp change in polarity immediately after the second hydrophobic region, particularly towards positively charged residues, is apparent in the hydropathy plots encompassing this region (Fig. 6A). Class I polypeptides display a sharp decline in hydrophobicity immediately following the second TMH, a feature that presumably acts to block further insertion into the membrane. In class IB proteins, an additional hydrophobic region, the lumen-targeting domain, is located 25 to 30 amino acids further downstream. In contrast, class II polypeptides do not exhibit this sharp increase in polarity immediately after the hydrophobic section of the predicted signal sequence. The sequence logo illustrates the common occurrence of basic amino acids immediately after the hydrophobic domain of the stop-transfer domain (Fig. 6B), which is not observed after the signal sequences of class II proteins. These differences provide additional evidence that the TMH of a class II protein is not simply the second TMH of a 5'-truncated cDNA encoding a class I protein.
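The count of basic residues just downstream of the second TMH can be sketched as follows. This is an illustrative helper with a hypothetical name; it assumes 1-based helix coordinates and treats His, Lys, and Arg as basic, following the category definition in the legend to Fig. 1.

```python
def basic_after_tmh(seq, tmh_end, window=6, basic="HKR"):
    """Count basic residues in the `window` residues following a TMH.

    tmh_end is the 1-based position of the last TMH residue, so the
    Python slice seq[tmh_end:tmh_end + window] starts at the first
    residue after the helix.
    """
    downstream = seq[tmh_end:tmh_end + window].upper()
    return sum(downstream.count(aa) for aa in basic)
```

A result of two or more would place a presequence in the majority group described above, while a result of zero after the only predicted TMH is the pattern reported for class II signal sequences.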
FIG. 6. (A) Kyte-Doolittle hydrophobicity profiles for the stop-transfer region of class I targeting sequences and the region immediately following the signal sequence of class II targeting domains. Plots begin 10 amino acid residues upstream of the start of the second TMH (for class I proteins) or the first TMH (for class II proteins), and the hydrophobicity profiles were calculated with a window size of 7 residues. The thick lines are the mean scores, and the thin lines on either side represent the 95% confidence intervals. The black bars above the hydrophobic regions indicate the location of the predicted TMH. (B) Sequence logo plot of class IA sequences when the second transmembrane helices (TMH2) were aligned. Only the regions immediately before and after TMH2 are shown.
FIG. 7. Alignment of targeting sequences from selected Euglena plastid-targeted proteins. (A) Comparison of FNR and CP29 targeting sequences. Identical amino acids are white on a black background. (B) Second group of proteins possessing similar targeting sequences. Identical amino acids compared to the top sequence are indicated by white letters on a black background. The hydrophobic regions of the signal sequence and stop-transfer domains are indicated by lines above the appropriate amino acids. The mature portions of the proteins, if shown, are indicated with double underlining.
In Euglena, proteins targeted to the plastid do not fully insert into the ER lumen or the membrane during translation due to the presence of a stop-transfer domain, so the majority of the protein remains exposed in the cytoplasm (69). Indeed, in class I proteins, the presequence has a second hydrophobic region followed by positively charged amino acids, both of which are characteristics typical of stop-transfer sequences (14, 41). Although 2 of the 70 class I proteins lack positively charged amino acids immediately after the second TMH, such residues are not an absolute requirement for a stop-transfer function, with the effectiveness of targeting depending on a combination of hydrophobicity, length, and charge (14, 41, 60). The presence of a functioning stop-transfer motif in a plastid presequence is unique to Euglena and dinoflagellates. Both groups have three plastid membranes, leading Nassoury et al. (49) to suggest that the stop-transfer sequence arose from a mechanistic requirement driven by the number of plastid membranes. It is generally agreed that Euglena and dinoflagellates are phylogenetically distant; thus, the similarities between their targeting sequences, and presumably the underlying transport mechanisms, would appear to be convergent as part of a necessary step in protein targeting.
Although targeting in organisms with complex plastids first requires import of the protein into the ER, little is known about subsequent mechanisms of targeting to the plastid. In organisms with three plastid membranes, such as euglenophytes and dinoflagellates, targeting from the ER to the outer plastid membrane involves vesicular transport via the Golgi system (49, 53). The segregation of plastid-bound proteins into the proper vesicles may involve receptors located in the endomembrane system that recognize the transit peptide and direct the protein to its appropriate destination. This pathway is analogous to that in animal and fungal systems, where receptors within the endomembrane system, such as the classic mannose-6-phosphate receptor system for targeting to the lysosome (22), are able to recognize features of the protein and ensure proper localization. Ultimately, cytoplasmic sorting factors, such as adaptins (9), may play a role in the accumulation of plastid-targeted proteins and their segregation to vesicles destined for the plastid. Such cytosolic factors could participate in the recognition of receptors that bind to plastid-targeted proteins and/or specific motifs just beyond the stop-transfer domain of the targeted protein itself to facilitate targeting. One potential series of residues includes the cluster of basic amino acids that immediately follows the stop-transfer domain. The importance of short, cytoplasm-exposed targeting motifs for intracellular sorting is well known (9). For Euglena, Sláviková et al. (67) determined that this cytoplasm-exposed portion of the presequence is not required for plastid import in vitro, but they suggested that it may function in vesicle routing.
Of particular interest here is our discovery of plastid-targeted proteins lacking the putative stop-transfer sequence (class II), implying that these proteins are inserted entirely into the ER, leaving a soluble portion within the ER lumen and a membrane portion integrated within the ER membrane, once the signal sequence is removed. Given that the Euglena class II proteins comprise both soluble and membrane proteins, it is unlikely that other domains within the mature protein could impart a similar stop-transfer effect to compensate for the lack of such a region in the presequence. The targeting route for class II proteins is conceptually similar to the targeting of proteins to the remnant plastid (apicoplast) in apicomplexans; apicoplast proteins lack the stop-transfer sequence and are targeted to the plastid via the ER (19), presumably by vesicular transport. Thus, for correct targeting, the putative transit peptide, and possibly the mature protein, must contain features that would be recognized by specific cofactors or receptors that are localized to the ER lumen, not the cytoplasm. Since class II transit peptides are predicted to lack the stop-transfer sequence and thus the cytoplasm-exposed region just beyond, redirection to the plastid must be facilitated solely by interaction with targeting factors that bind to the TP and allow these precursors to "hitchhike" in vesicles with the class I proteins. An alternative, albeit unlikely, mechanism is that the class II signal sequence acts as a signal anchor, with the N terminus facing the ER lumen. However, in this orientation the transit peptide would be facing the cytoplasm and presumably would be inaccessible to the targeting machinery.
Even more surprising is the resemblance of this class of targeting sequence to those of dinoflagellates, whose plastid-targeted proteins also exhibit a similar proportion of presequences lacking stop-transfer domains (55), with the remainder resembling class I proteins. As possible explanations for the dinoflagellates, Patron et al. (55) ruled out the evolutionary history of the gene transfer or final destination of the protein, suggesting instead that the "physical characteristics" of the plastid-targeted protein may determine the nature of its presequence. In support of the latter hypothesis, they found that the class I and II distinction was conserved between proteins in two dinoflagellates examined. If "physical characteristics" were the main factor determining the mode of transport, then we would predict that Euglena would exhibit a similar distribution of proteins having class I and II presequences. Some similarities are clearly evident, such as with phosphoribulose kinase and oxygen-evolving enhancer 3 (PsbQ), which lack a stop-transfer sequence in both dinoflagellates and Euglena. However, other dinoflagellate proteins with class II (and III) targeting sequences are class I proteins in Euglena (acyl carrier protein, carbonic anhydrase, cytochrome c6, and the PSII 11-kDa protein). Although the sample size for comparison is small, there do not appear to be any obvious inherent functional or physical properties that would require a class I versus class II targeting sequence. In vitro import assays should help to define the functional requirements of the different classes of presequence and determine whether either is essential for the import of specific proteins.
With the exception of apicomplexans, complex plastids with four plastid membranes often have ribosomes attached to the outer membrane (chloroplast ER [CER]). However, the primary plastid-sorting mechanism must still occur after cotranslational import across/into the ER membrane, since in diatoms the signal sequences of ER and plastid-resident proteins are functionally equivalent (37), and in a raphidophyte, few ribosomes are bound to the CER (30). Thus, once inserted into the endomembrane system, the plastid-bound proteins still have to be targeted to and traverse at least three membranes, similar to the situation in Euglena and dinoflagellates. Though not involving the Golgi dictyosomes, a vesicular transfer between the CER and the third membrane has been proposed (23), and a recent report supports such a mechanism (37). Apicomplexans, in particular, provide a valuable model system for dissecting the targeting process in complex plastids with four membranes, with several studies indicating not only that there is partially redundant targeting information in the presequences of apicoplast-targeted proteins (26, 81, 82) but also that there is a distinction between the information for targeting and that for import into the apicoplast (26). Recent work has even identified proteins that interact with the TP and that may be involved in sorting from the ER to the apicoplast (82).
In Euglena, the region between the signal sequence and the stop-transfer sequence in class I proteins functions as a TP (67). This region and the TPs of class II proteins possess characteristics typical of most TPs. These similarities include enrichment in Ser/Thr (S/T bias) and Ala. S/T bias is a common feature of most transit peptides of plants and algae (2, 4, 11, 15, 20, 49, 54, 55, 59, 76). Some notable exceptions to this rule include apicomplexans (77, 78) and nucleomorph-encoded plastid proteins from the cryptomonad alga Guillardia theta (57). Replacement of all Ser/Thr residues in the TP of Plasmodium had no effect on plastid targeting, demonstrating a lack of a requirement for such residues (78). Although an elevated Ser/Thr content is evidently dispensable in apicomplexans, it remains one of the more consistent features of most TPs, which may reflect a requirement for phosphorylation-dependent binding of 14-3-3 proteins as part of a preinsertion guidance complex (46).
Euglena TPs also have an overall positive charge, an apparently universal feature of TPs (57), that is primarily due to a reduction in the content of acidic amino acids. Of particular interest is the asymmetric distribution of acidic amino acids in the TP, with the first two-thirds being deficient in such residues, whereas the remaining third has a composition resembling that of the mature protein. This asymmetry may reflect a distinction between functional TPs (with a bias against acidic residues) and regions having a different function. The importance of a TP depleted in acidic residues was demonstrated in Plasmodium, where the replacement of basic with acidic amino acids eliminated apicoplast targeting (19). Interestingly, Euglena TPs are also deficient in Lys (but not Arg) and have biases in favor of Ala and Pro compared to mature proteins, which are also features of the TPs of the chlorarachniophyte Bigelowiella natans (59). Some of the shared features of TPs, such as a bias against acidic amino acids and a bias in favor of some hydrophobic residues, may be due to a requirement for binding of import factors, such as molecular chaperones (Hsp70) (83). Although the biological significance of the biased amino acid composition in TPs is not entirely understood, and despite any differences in primary structure, TPs from diverse plastid types are functionally sufficient in heterologous import assays (3, 32, 42, 49, 67, 79).
The striking amino acid similarity between certain plastid-targeting sequences is surprising. In general, transit peptides lack evident sequence similarity, even among paralogs of the same gene family, so the detection of clusters of related targeting sequences may shed light on how targeting sequences were acquired following transfer of the endosymbiont's genes to the host nucleus during plastid evolution. Reports of highly similar plastid and mitochondrial TPs are relatively rare, but the examples can be separated into two categories. In the first case, homologs from different species exhibit a greater-than-expected similarity within the TP region compared to that of the mature proteins, which is attributed to a conserved functional role (80). The second category includes unrelated proteins that possess highly similar TPs (1, 5, 40, 45), which is what we observe in Euglena. This similarity is often attributed to exon shuffling, as introns commonly separate the transit peptide from the mature protein (36, 45). There are also reports of transit peptide acquisition through insertion into preexisting genes for plastid (5)- and mitochondrion-targeted (1) proteins. Thus, the newly transferred genes would acquire not only the targeting mechanism but also the regulatory sequences required for expression, in the so-called "lucky insertion scenario" (21). Although we lack the appropriate genomic information from Euglena to be able to completely assess the mechanism of TP acquisition, a genomic sequence for an LHCII gene of this organism does have an intron that roughly separates the predicted targeting domain from the mature protein (48), suggesting exon shuffling as a potential mode of TP acquisition. However, the similarity of the rpL3 and rpL21 presequences to a small portion of the LHCI mature protein (GFDPLGL) (Fig. 7) suggests that TP acquisition by insertion into a preexisting copy of the LHCI gene is also a strong possibility. 
The maintenance of a continued high degree of conservation between rpL21-rpL3 and CP29-FNR could also imply recent recombination, or perhaps alternative splicing, as described for rice mitochondrion-targeted rpS14 and SDHB proteins (40). The pronounced sequence conservation within these regions also raises the possibility that these targeting sequences have an additional function(s) in the cell, either before or after cleavage, as proposed for some mammalian signal sequences (10, 17).
In summary, we have characterized two distinct classes of Euglena plastid presequences, i.e., classes I and II, that differ by the presence and absence of a predicted stop-transfer sequence, respectively, revealing an additional level of complexity in the protein transport mechanism. In addition to enhancing our ability to predict Euglena presequences, we expect that the characteristics of these TPs will stimulate further import studies, both in vitro and in vivo, seeking to dissect the processes of targeting and import into the complex plastids of Euglena.
We are grateful to Patrick Keeling for sharing a paper on dinoflagellate targeting sequences prior to publication. We also thank Steve Heard and Penny Humby for helpful discussions on statistics. The technical assistance of H. Rissler, who isolated RNAs from Euglena for the construction of two of the five cDNA libraries sequenced for this study, is acknowledged.
Published ahead of print on 22 September 2006.
-subunit of R-phycoerythrin and its possible mode of transport into the plastid of red algae. J. Biol. Chem. 268:16208-16215.