Eukaryotic Cell, April 2002, p. 304-310, Vol. 1, No. 2
1535-9778/02/$04.00+0 DOI: 10.1128/EC.1.2.304-310.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
The Canadian Institute for Advanced Research, Program in Evolutionary Biology, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
Received 12 December 2001/ Accepted 29 January 2002
We have chosen diplomonads, a group of anaerobic protists, to test the hypothesis that LGT from prokaryotes to protists is occurring at an appreciable rate. There is an ongoing whole-genome sequence project for Giardia lamblia, the most studied diplomonad, that releases newly generated sequences on a regular basis to GenBank (19). In our laboratory we have focused on another diplomonad, the Atlantic salmon parasite Spironucleus barkhanus (25), and we are, together with our collaborators, performing a sequence survey project. Genes that were laterally transferred from prokaryotes to the diplomonad lineage are expected to show higher similarity to prokaryotic genes than to any eukaryotic gene. Therefore, candidate laterally transferred genes are easily identified in these projects with similarity searches against the public sequence databases. Such genes have been further analyzed with phylogenetic methods that will give a much more reliable indication of whether the gene originated via LGT. Furthermore, if the gene is present in both the Giardia and Spironucleus projects, the phylogeny will show whether the gene has a common source in both lineages or it was a recent acquisition in one of the two lineages. Here we present such an analysis of the sud gene, a diplomonad gene with greater similarities to prokaryotic than to eukaryotic homologues. Sequences with the highest similarities to this gene corresponded to a conserved gene pair, sudA and sudB, present in Thermotoga maritima, Treponema pallidum, and Pyrococcus furiosus, suggesting an LGT event from a prokaryote to a eukaryotic ancestor of diplomonads. The gene pair encodes the two subunits of sulfide dehydrogenase (15). However, the C-terminal part of the diplomonad protein also shows high similarity to the small subunit of eubacterial glutamate synthase (gltD).
Glutamate synthase is a complex iron-sulfur flavoprotein that participates in ammonia assimilation processes. The eubacterial version of the enzyme is composed of two subunits. The small subunit is a flavin adenine dinucleotide-dependent NADPH oxidoreductase which shows sequence similarity to several other protein domains and enzyme subunits. Indeed, the small subunit of eubacterial glutamate synthase has been proposed to be a prototype of protein domains or enzyme subunits used in many different contexts to transfer electrons from NAD(P)H to an acceptor protein or protein domain (28). In the case of glutamate synthase, the small subunit provides electrons to the large subunit, which binds L-glutamine and 2-oxoglutarate and forms L-glutamate (28).
Sulfide dehydrogenase is an iron-sulfur flavoprotein without glutamate synthase activity (15). The enzyme consists of two subunits: the α-subunit, which is clearly homologous to the small subunit of glutamate synthase, and the β-subunit, which does not show any sequence similarity to the large subunit of glutamate synthase at all (15). Thus, the α-subunit of sulfide dehydrogenase is an example of an enzymatic subunit that is homologous (and very similar) to the small subunit of glutamate synthase but that is used in a different context. The two subunits of sulfide dehydrogenase are encoded by sudA and sudB, two genes found next to each other on the chromosome in P. furiosus (15). Sulfide dehydrogenase not only catalyzes the reduction of polysulfide to H2S in P. furiosus, but it has also been shown to function as a reduced ferredoxin-NADP oxidoreductase (15, 18). These dual functions of the protein complicate the interpretation of the physiological role for the enzyme. Indeed, the biological significance of the sulfide dehydrogenase activity in P. furiosus has recently been questioned, since the activity of the enzyme did not respond to changes in the availability of elemental sulfur (1).
To determine if the diplomonad gene shared a common ancestry with other sulfide dehydrogenases that would be suggestive of a shared function, and to elucidate its origin in the genomes of diplomonads, we retrieved all homologues of gltD and gltD-like genes from the public databases, studied the gene organization patterns around the homologues, and performed phylogenetic analyses. We identified two cases of prokaryote-to-eukaryote LGT, showing that interdomain LGT involving microbial eukaryotes may occur relatively frequently.
PCR assays and nucleotide sequencing. The gene for sulfide dehydrogenase, a fusion of sudA and sudB, was amplified from S. barkhanus genomic DNA by degenerate PCR under standard conditions with the following amplification parameters: denaturation at 95°C for 30 s, annealing at 48°C for 1 min, and extension at 72°C for 3 min. Forty cycles were performed and a final extension step of 72°C for 10 min was used. The forward primer GLTDf1 (5'-GGTCGAGTHTGYCCHYARGA-3') and the reverse primer GLTDr1 (5'-CAATCCATDGCDACRTTDCC-3') were used. The PCR product was purified and sequenced directly. Uneven PCR (7) was used to obtain the 5' end of the gene. Exact match primers from the obtained sequence were used together with arbitrary short primers in a reaction that used two radically different annealing temperatures (42 and 55°C) in alternate annealing steps, as described previously (7). This procedure yielded a PCR product that covered the 5' end of the gene, which was cloned. Several independent clones were sequenced.
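The degenerate primers above use IUPAC ambiguity codes (H, Y, R, D), so each primer stands for a pool of concrete oligonucleotides. The following is a minimal illustrative sketch, not part of the original protocol, that counts and enumerates that pool. The function names `degeneracy` and `expand` are hypothetical helpers introduced here; only the two primer sequences come from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)

GLTDF1 = "GGTCGAGTHTGYCCHYARGA"  # forward primer, as given in the text
GLTDR1 = "CAATCCATDGCDACRTTDCC"  # reverse primer, as given in the text

# Each ambiguous position multiplies the pool size by its number of bases.
print(degeneracy(GLTDF1), degeneracy(GLTDR1))
```

The pool size is simply the product of the number of bases allowed at each position, which is one way to judge how dilute any single exact-match oligonucleotide is in the reaction.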
Phylogenetic analysis.
All published gltD, gltD-like, and sudA sequences and all available unpublished sudA sequences were retrieved from the National Center for Biotechnology Information. The Dictyostelium discoideum Genome Project database (http://dicty.sdsc.edu) was also searched for homologues, with a negative result. Although the N-terminal part of eukaryotic dihydropyrimidine dehydrogenase has been shown to be homologous to the small subunit of glutamate synthase (28), we did not include it in our data set since the gene is only distantly related to gltD. This resulted in a data set with 399 unambiguously aligned positions. Sequences with >85% amino acid sequence identities were excluded from the data set to reduce the computational time. At the initial phylogenetic analysis, five sequences failed the χ² tests for deviation of amino acid frequencies implemented in TREE-PUZZLE, version 4.02 (26) (a partial Entamoeba histolytica gltD-like sequence, a Rhodobacter capsulatus gltD-like sequence, a Plasmodium falciparum gltS sequence, and gltD sequences from Pseudomonas aeruginosa and Campylobacter jejuni). These sequences were excluded since the currently available phylogenetic methods cannot deal with strong amino acid compositional biases in the data (13). This procedure resulted in a 53-taxon data set with 399 amino acids (aa). Protein maximum likelihood (ML) phylogenies were inferred using PROML within the PHYLIP package, version 3.6 (11), with a Dayhoff (PAM 001) substitution model (the only supported model in version 3.6a of PROML) and a mixed four-category discrete-gamma model of among-site rate variation plus invariable sites (PAM plus Γ plus Inv). Ten random additions with global rearrangements were used to find the optimal tree (shown below in Fig. 1). The Γ shape parameter, α, was estimated to be 1.07, and the fraction of invariable sites, Pinv, was 0.01 using TREE-PUZZLE, version 4.02 (26). Protein ML distance bootstrap values for bipartitions were calculated by analysis of 500 resampled data sets using PUZZLEBOOT (http://www.tree-puzzle.de) with a Jones-Taylor-Thornton (JTT) substitution model and a mixed eight-category discrete-gamma model (α = 1.25) of among-site rate variation plus invariable sites (Pinv = 0.01) (JTT plus Γ plus Inv). Due to the computational intensity, full ML bootstrapping was not performed on the 53-taxon data set.
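The compositional screening step can be illustrated with a small sketch. It compares each sequence's amino acid counts with the frequencies pooled over the whole alignment and flags sequences whose chi-square statistic exceeds the 5% critical value for 19 degrees of freedom. This is an assumption-laden illustration in the spirit of such a test, not TREE-PUZZLE's own code; the function names, the pooled-frequency expectation, and the toy sequences are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Chi-square critical value for 19 degrees of freedom (20 amino acids - 1)
# at the 5% significance level.
CHI2_CRIT_DF19_P05 = 30.14

def composition(seq):
    """Counts of the 20 standard amino acids; gaps and other symbols are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    return [counts[a] for a in AMINO_ACIDS]

def chi2_composition(alignment):
    """Map each sequence name to its chi-square statistic against pooled frequencies."""
    per_seq = {name: composition(seq) for name, seq in alignment.items()}
    totals = [sum(col) for col in zip(*per_seq.values())]
    grand = sum(totals)
    freqs = [t / grand for t in totals]
    stats = {}
    for name, observed in per_seq.items():
        n = sum(observed)
        stat = 0.0
        for o, f in zip(observed, freqs):
            expected = n * f
            if expected > 0:  # skip residues absent from the whole alignment
                stat += (o - expected) ** 2 / expected
        stats[name] = stat
    return stats

def biased_sequences(alignment, critical=CHI2_CRIT_DF19_P05):
    """Names of sequences whose composition deviates beyond the critical value."""
    stats = chi2_composition(alignment)
    return sorted(name for name, stat in stats.items() if stat > critical)

# Toy data (hypothetical): nine compositionally even sequences and one skewed one.
demo = {f"balanced_{i}": AMINO_ACIDS * 5 for i in range(9)}
demo["skewed"] = "A" * 100
print(biased_sequences(demo))
```

Because the expected counts are derived from the pooled alignment, a strongly biased sequence also shifts the expectation for every other sequence, which is one reason such tests are usually rerun after outliers are removed.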
FIG. 1. The ML tree of the inferred amino acid sequence of gltD, sudA, and gltD-like genes and their gene arrangements. See Materials and Methods for gene nomenclature. The different typefaces of the species names indicate whether the species belongs to the eukaryotes (all capitals), Archaea (lowercase, normal), or Bacteria (lowercase italic). Three main parts of the tree are indicated with large boxes, as follows: a clade with mostly gltD genes (A); a clade with only sudA genes (B); and a paraphyletic group with different gltD-like genes (C). The tree was arbitrarily rooted between the gltD clade and the paraphyletic group of gltD-like genes. Black boxes represent the homologous part of the gltD, gltD-like, and sudA sequences (used in analysis), open boxes represent gltB genes, gray boxes represent sudB genes, and boxes with dotted borders represent unique N- or C-terminal extensions of the gltD-like genes. Note that there is no detectable sequence homology between the extensions represented by dotted boxes. A thin line between two boxes indicates neighboring genes on the chromosome. Two boxes attached to each other indicate fusion of the genes. Protein ML-distance bootstrap values >50% for bipartitions are shown.
χ² amino acid composition test implemented in TREE-PUZZLE, version 4.02 (26), and were removed from further analysis. Protein ML phylogenies were inferred as described above except that an eight-category discrete-gamma model was used (α = 0.88 with no invariable sites detected). Protein ML bootstrap values for bipartitions were calculated by analysis of 500 resampled data sets using PROML within the PHYLIP package, version 3.6a (11), with one random addition with global rearrangements.
Nucleotide sequence accession numbers. The G. lamblia sudA sequence was assembled using the following single-pass reads deposited in GenBank by the Josephine Bay Paul Center for Comparative Molecular Biology and Evolution (19) with the following accession numbers: AC030366, AC031039, AC031040, AC035392, AC043148, AC048816, AC051395, AC062548, AC084971, AC084972, AC086120, and AC082857. The S. barkhanus sequence reported here was deposited in GenBank under the accession number AF455034.
Phylogenetic analysis of gltD and its homologues.
The protein ML tree of the gene sequences of gltD and its homologues is shown in Fig. 1. The phylogenetic tree can be divided into three distinct parts: one well-supported clade (Fig. 1A) that includes mostly gltD sequences; one clade with sudA sequences (Fig. 1B); and a paraphyletic group (Fig. 1C) with six gltD-like sequences. However, the root of the tree is unknown and could be within one of these groupings. Within clade A, three distinct groups are found, two well-supported clades consisting of α- and γ-proteobacteria, respectively, and a large clade with a monophyletic eukaryotic clade nested within low-GC gram-positive bacteria, indicating an LGT from this eubacterial group to an early eukaryote (see discussion below) (Fig. 1A). With a few exceptions (see below), all eubacterial sequences within the gltD clade form a gene pair with gltB, which encodes the large subunit of glutamate synthase (Fig. 1A). This indicates that the glutamate synthase operon structure initially identified in Escherichia coli (5) is conserved within eubacteria, with the exception that the gene order between gltB and gltD is reversed in α-proteobacteria. In eukaryotes, glutamate synthase is encoded by a single gene, gltS, which arose by gene fusion of the eubacterial gltB and gltD genes (Fig. 1A).
Interestingly, there are also three eubacterial gltD-like sequences, two aegA genes from E. coli (6) and the dsrL gene from Allochromatium vinosum (GenBank accession number U84760), nested among the gltD sequences (Fig. 1A). These genes probably encode functions distinct from glutamate synthase; they have N- or C-terminal extensions unrelated to the large subunit of glutamate synthase (Fig. 1) and they are not linked to gltB. It is likely that these enzymes have originated from gene duplications of gltD, followed by fusions with different domains, and they have acquired new functions in these γ-proteobacteria.
A clade consisting of sulfide dehydrogenase genes. A second large set of gene order conservation, including the P. furiosus sulfide dehydrogenase genes, was found for clade B (Fig. 1B). The sudB genes are found upstream of, or fused to, the sudA genes in all species in the clade except A. aeolicus. Gene order conservation is rare in distantly related prokaryotes and is very unlikely to indicate ancient organization that was retained solely by chance (29). Therefore, the gene order conservation between the sudA and sudB homologues (Fig. 1) strongly suggests that the gene products functionally, and probably physically, associate (interact) in vivo. Three iron-cluster-binding motifs have been proposed for the P. furiosus sulfide dehydrogenase (15). These show a high degree of conservation within the clade; all positions are conserved in all species containing the sud gene pair except in G. lamblia, which has substitutions in 2 of the 11 proposed iron-binding positions (Fig. 2). The monophyly of sudA (Fig. 1), the gene order conservation of sudA and sudB (Fig. 1), and the conservation of the iron-cluster-binding motifs strongly suggest that the gene pair (or a single gene in the diplomonad case) encodes sulfide dehydrogenases in these organisms. The most likely origin for the gene in the two diplomonads is by LGT, since the gene sequences are nested among prokaryotes and the gene is only distantly related to other eukaryotic sequences (see discussion below).
FIG. 2. Amino acid alignment of the iron-cluster binding motifs in sulfide dehydrogenase, proposed by Hagen et al. (15). The numbers above the alignments refer to the amino acid position in the P. furiosus SudB (cluster I) and SudA (cluster II and III) proteins. The proposed ligands are indicated by shadowed boxes. Note that cluster I has been proposed to have a novel Asp(Cys)3 binding motif (15).
Several cases of LGT within the sulfide dehydrogenase clade. Preliminary individual phylogenetic reconstructions showed congruity between the sudA and sudB gene topologies (data not shown), suggesting that they share a common history. We therefore concatenated the two proteins in a combined phylogeny to increase the resolution within the sulfide dehydrogenase clade. Five nodes in the protein ML tree of the sud genes show strong bootstrap support (Fig. 3). The Pyrococcus and the diplomonad sequences form two strongly supported monophyletic clusters, as expected (Fig. 3). The Fibrobacter succinogenes sequence forms a clade with the Treponema denticola sequence, with the Porphyromonas gingivalis sequence as an immediate outgroup. Finally, the low-GC gram-positive Enterococcus faecalis forms a strongly supported monophyletic grouping with T. maritima (Fig. 3), which is inconsistent with accepted views of eubacterial phylogeny (4) and which strongly suggests an LGT event between the two lineages. This is unlikely to be a rare case of LGT in the history of the sud genes. The scattered distribution of the sud genes among microbes from the three domains of life is unlikely to be solely due to differential gene loss; clearly, that would require a vast number of independent gene losses in all three domains of life. Thus, propagation of the sud genes via multiple LGT events combined with vertical transmission is the most parsimonious explanation of the distribution of sud genes among microbes. Since the gene is only found in diplomonads among the studied eukaryotes and the diplomonads are nested within a prokaryotic clade (Fig. 1 and 3), an ancestor of diplomonads almost certainly acquired the gene from a prokaryotic source via LGT. Unfortunately, the donor lineage cannot be determined due to the limited resolution in the phylogenetic reconstructions (Fig. 1B and 3).
FIG. 3. Phylogeny of concatenated sudA and sudB protein sequences. Protein ML bootstrap and ML distance bootstrap support values are shown above and below the branches, respectively. Only support values >50% for bipartitions are shown.
Eukaryotic gltS may have a eubacterial origin via LGT. The LGT of the sud genes from a prokaryote to the diplomonad lineage may not be the only case of prokaryote-to-eukaryote transfer in the evolution of the gltD and gltD-like genes. Interestingly, eukaryotic gltS is clearly nested (bootstrap support value of 96) within a bacterial clade consisting of mostly gram-positive bacteria (Fig. 1A). This is unlikely the result of an endosymbiotic gene replacement from the mitochondria or chloroplast, since the eukaryotic sequences fail to show affinity to proteobacteria or cyanobacteria, the ancestors of mitochondria and chloroplasts, respectively. Therefore, an origin via LGT independent of the eukaryotic organelles, perhaps from a gram-positive eubacterium in the common ancestor of animals, plants, and fungi, is the most likely scenario.
The P. falciparum gltS sequence (excluded from the large phylogenetic analysis due to its strong amino acid composition bias) showed affinity to the large bacterial clade in preliminary phylogenetic analyses but failed to branch with the other eukaryotic gltS sequences (data not shown). However, the P. falciparum sequence appears to have experienced the same fusion event as the other eukaryotic gltS sequences, indicating that the failure of this sequence to group with other eukaryotic gltS sequences is likely an artifact caused by its extreme amino acid composition bias.
Two examples of gene fusion events of prokaryotic operons in eukaryotes. The gene arrangements of glutamate synthase and sulfide dehydrogenase in prokaryotes versus eukaryotes are strikingly similar; in both cases a conserved gene pair occurs in prokaryotes that corresponds to a single fused gene in eukaryotes. Furthermore, in both cases, the eukaryote homologues probably originated via LGT from a prokaryotic source. The probability of successful transfers of the two subunits simultaneously was certainly increased by the conserved gene arrangements within prokaryotes. The gene fusions in eukaryotes likely reflect the differences in gene regulation between eukaryotes and prokaryotes. Fusions of genes that code for different subunits of the same enzyme are probably strongly selectively advantageous in eukaryotes, which lack operon structures, because they provide an easy solution for coregulation of the expression of the different subunits.
The gene fusions reported in this study probably resulted from two distinct fusion events: one that created the eukaryotic glutamate synthase gene and one that yielded the diplomonad sulfide dehydrogenase gene. Both appear to have occurred relatively soon after the LGT events. However, the evolutionary signal in gene fusions should not be overstated. The presence of similar fusions in distantly related eukaryotes does not necessarily mean that the genes are the result of a single transfer followed by a fusion. Two independent transfers of the same prokaryotic operon to two different eukaryotic lineages can easily lead to two independent gene fusions, since selection for coordinate regulation probably strongly favors such fusion events. Therefore, analyses of gene fusions should be interpreted in conjunction with phylogenetic analyses based on gene sequences.
LGT might play an important role in protistan genome evolution. Although it is always difficult to discern mechanisms for ancient evolutionary events, it has been suggested that a gene transfer ratchet mechanism may exist for eukaryotes whereby genes from prokaryotes taken up as food are incorporated into the eukaryotic nucleus (9). Such a mechanism would explain the transfer of sulfide dehydrogenase from a prokaryote to the common ancestor of diplomonads, which likely engulfed prokaryotes as food, as well as the putative transfer of glutamate synthase from a gram-positive eubacterium to a common ancestor of fungi, animals, and plants, an ancestor that most certainly was a single-celled organism.
Our analysis of sulfide dehydrogenase, together with several recently reported cases of prokaryote-to-eukaryote gene transfers (3, 8, 12, 16, 27), indicates that LGT may play an important role in protistan genome evolution. In addition, the putative transfer of glutamate synthase suggests that LGT also shaped the genomes of the protist ancestors of major multicellular eukaryote lineages.
J.O.A. is supported by a Postdoctoral Fellowship from the Wenner-Gren Foundations. A.J.R. is supported by the CIAR Program in Evolutionary Biology. This work was supported by a Natural Sciences and Engineering Research Council Genomics Project, grant 228263-99, awarded to A.J.R.