INTRODUCTION
Bioactive small molecules generated by biosynthetic pathways operating in bacteria can profoundly influence the population structures of mixed microbial communities by serving as enzymatic cofactors, signaling molecules, or toxins. Excepting the lantibiotics, however, natural products that acquire unusual amino acids through modification of ribosomally translated precursor peptides have frequently been overlooked (
23).
The radical
S-adenosylmethionine (rSAM) domain family (
30) is a large superfamily of proteins with diverse members that generate a radical species by reductive cleavage of SAM. All radical SAM proteins discussed in this paper belong to Pfam (
9) family PF04055. A few radical SAM enzymes have long been known to modify peptides or proteins; PqqE cross-links a tyrosine to a glutamate in an intramolecular cyclization of PqqA as the first step in pyrroloquinoline quinone (PQQ) biosynthesis (
35), and AlbA performs three intramolecular cyclizations from cysteine side chains to synthesize the antilisterial bacteriocin subtilosin A from its precursor (
18). The methylthiotransferase RimO modifies ribosomal protein S12 (
20), and several families activate cognate enzymes dependent on a glycyl radical active site. However, functional diversity within the family is so great (
10) that mere classification as a radical SAM enzyme says very little about its molecular target or biological process. An extensive list of 68 nonoverlapping subgroups resolved by TIGRFAMs hidden Markov models (HMMs) within the radical SAM domain superfamily is summarized in Table S1 in the supplemental material. Among these, RlmN is a methyltransferase acting on 23S RNA (
33), MiaB is a methylthiotransferase acting on tRNA (
20), and SplB is a DNA repair enzyme, spore photoproduct lyase, that directly repairs thymine dimers (
27). Radical SAM enzymes for cofactor biosynthesis include biotin synthase BioB, tyrosine lyase ThiH (thiamine pyrophosphate), lipoyl synthase LipA (
6), CofG and CofH (coenzyme F420) (
11), the coproporphyrinogen dehydrogenase HemN (heme) and NirJ (heme d1) (
4), and two enzymes of menaquinone biosynthesis via futalosine (
16). HydG and HydE act in metal cluster assembly in iron-only hydrogenases (
25), NifB acts in nitrogenase metal cluster assembly, and HmdB acts in 5,10-methenyltetrahydromethanopterin hydrogenase metallocofactor biosynthesis. Additional characterized radical SAM families have roles in lipid metabolism, small-molecule transformations, and natural product biosynthesis.
While much is known about individual radical SAM enzymes, the family as a whole exposes the limits both of legacy annotation available from public archives and of the automated annotation pipelines currently in use. Most annotations are overly generic (“radical SAM domain protein” or “FeS oxidoreductases”), while specific functional assignments such as “coenzyme PQQ synthesis protein E” or “arylsulfatase regulator” (a misnomer for an anaerobic sulfatase maturase) have propagated incorrectly to large numbers of homologs whose function, on inspection, clearly must differ. Many radical SAM-containing biological systems are sufficiently rare and sparsely distributed that simple clustering methods such as the COG (clusters of orthologous groups) algorithm based on bidirectional best hit linkages (
31) necessarily lump together proteins that differ in function, hindering inference about their roles in biological systems. Clearly, a large-scale reexamination of subfamilies within the radical SAM superfamily, with considerations of molecular phylogenetic trees, genome contexts, and system reconstructions performed during protein family construction (
29), would serve the scientific community well. Findings from such efforts are discussed here. The work has resulted in many new protein family definitions, included in TIGRFAMs release 10.1, both for the radical SAM families themselves and for the additional protein families that are their partners in the same biological systems.
A number of radical SAM enzymes involved in peptide modification show mutual sequence similarity in the region C terminal to the region described by Pfam model PF04055 (
2,
12). PqqE and AlbA were discussed above. In
Streptococcus thermophilus, a radical SAM enzyme (family TIGR04080) introduces a cyclization between amino acids 2 (Lys) and 6 (Trp) at the KxxxW motif in the peptide AKGDGWVKM to create the possible quorum sensing system molecule Pep1357C (
17). Recently, we described two new classes of peptide-modifying radical SAM enzymes. Family TIGR03962 is a putative radical SAM maturase for mycofactocin, whose precursor peptide shows remarkable sequence conservation across dozens of species throughout a taxonomic range that includes many actinobacteria and several
Chloroflexi,
Clostridia,
Deltaproteobacteria, and
Archaea (
12). Family TIGR04064 contains putative maturases (
12) for ribosomally translated natural product precursors of the Nif11-like and nitrile hydratase-like leader peptide families (
14). These precursors appear to share a cleavage and export system but assort promiscuously with different classes of maturases, including lantibiotic synthases, cyclodehydratases, and radical SAM enzymes. All these radical SAM enzymes known or presumed to act on peptide targets carry additional 4Fe-4S cluster-binding motifs that they share with anaerobic sulfatase-maturating enzymes (
2). The emerging picture suggests that additional close homologs may also act on peptide precursors. It should be noted, however, that several other members of this subgroup of radical SAM enzymes with extended C-terminal homology act on substrates that do not have ribosomal origin. Exceptions include BtrN, involved in synthesizing butirosin, an aminoglycoside antibiotic (
36), and NirJ from heme d1 biosynthesis.
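The KxxxW example above can be checked with a short pattern search. This is only an illustrative sketch: the peptide string is the one quoted in the text, while the function name and the literal reading of "KxxxW" as Lys, any three residues, then Trp are assumptions made here for illustration.

```python
import re

# Peptide quoted in the text for the Streptococcus thermophilus system.
PEPTIDE = "AKGDGWVKM"

# Assumed literal reading of "KxxxW": Lys, any three residues, Trp.
# A lookahead is used so that overlapping matches would also be reported.
KXXXW = re.compile(r"(?=K...W)")

def find_kxxxw(seq):
    """Return 1-based (Lys, Trp) position pairs for every KxxxW match."""
    return [(m.start() + 1, m.start() + 5) for m in KXXXW.finditer(seq)]

print(find_kxxxw(PEPTIDE))  # [(2, 6)]
```

The single hit at positions 2 and 6 agrees with the Lys and Trp residues named in the text. A motif search of this kind only locates candidate sites; it says nothing about whether a cross-link actually forms.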
A prevailing notion is that most natural products made in bacteria by posttranslational modification from ribosomally translated peptides are bacteriocins, peptide antibiotics able to kill rival bacteria (
23). This view, though well supported for lantibiotics, may have set too narrow a focus in experimental approaches to other ribosomal natural products; we will use the term ribosomally translated natural product (RTNP) rather than “putative bacteriocin” in the remainder of the discussion. Pyrroloquinoline quinone (PQQ), for example, is an RTNP but serves as a redox cofactor rather than as a bacteriocin.
In silico analysis of the mycofactocin system shows that its signatures in comparative genomics analyses follow the same “bioinformatics grammar” as do cofactors such as PQQ (invariant residues in the propeptide region, cooccurrence and coclustering with paralogous sets of cofactor-dependent enzymes, and no exporter), rather than the grammar of bacteriocins (conservation mostly in the leader peptide, cooccurrence and coclustering with export transporters, tandem paralogs commonly observed), and so mycofactocin is predicted to be a novel redox factor (
12). Meanwhile, the first methanobactin (a copper-binding metallophore) for which the structure is known is now shown to derive from a translated peptide (
19). These reports contribute to recent expansions in our recognition of RTNPs (
28), the roles of radical SAM enzymes in their syntheses, and possible novel metabolic roles for their products.
The functionally highly diverse radical SAM domain family represents approximately 0.5% of total proteins in anaerobic bacteria and many hundreds of different biological roles, and yet only a small number have been examined experimentally. Here, we have undertaken a broad study of the radical SAM family aimed at delineating subgroups where each approximates, as well as possible, the whole of a set of enzymes that share one particular function. If that enzyme happens to belong to a pathway that has multiple protein components, getting the granularity right for the radical SAM protein creates a mechanism through which phylogenetic profiling methods (
15) help identify and build decision rules for recognizing those additional protein families that belong to the same system. Results from these analyses have identified a number of novel rSAM-containing genomic systems, described here, including a number with new RTNP precursor families. The interpretation of some of these radical SAM/RTNP systems suggests both new experimental strategies to look for microbial natural products and a broadened set of expectations for their possible biological roles.
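The idea that a well-delineated radical SAM family helps identify its partner families can be illustrated with a toy phylogenetic profile comparison. The sketch below is not the profiling method cited in the text; it substitutes a plain Jaccard similarity between genome presence sets, and every family name and genome identifier in it is hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of genome identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_partners(query, profiles):
    """Rank all other families by profile similarity to the query family."""
    q = profiles[query]
    scores = [(name, jaccard(q, genomes))
              for name, genomes in profiles.items() if name != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical presence/absence data: family -> genomes in which it is found.
profiles = {
    "rSAM_X": {"g1", "g3", "g5"},
    "peptide_Y": {"g1", "g3", "g5"},          # codistributed with rSAM_X
    "transporter_W": {"g1", "g3", "g4"},      # partly overlapping
    "housekeeping_Z": {"g1", "g2", "g3", "g4", "g5", "g6"},  # universal
}

for name, score in rank_partners("rSAM_X", profiles):
    print(f"{name}\t{score:.2f}")
```

In this toy data the perfectly codistributed family ranks first, while a universal housekeeping family scores no better than a partial overlap. If the query family were drawn too broadly, its profile would approach the universal one and the signal would be lost, which is the granularity problem described above.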
DISCUSSION
We undertook an analysis of the radical SAM family by using phylogenetic profiling approaches, in which codistributed protein families each provide information to guide the proper construction of the other. The results of this analysis included several apparent discoveries of new peptide modification systems. Because the comparative genomics methods require multiple copies of a system to exist among the collected 1,466 reference genomes analyzed, we did not attempt to study systems with rarities comparable to those of the subtilosin A, YydG, and KxxxW systems. Therefore, there may be many additional undiscovered radical SAM-mediated peptide modification systems. Those that we did identify, however, featured radical SAM enzymes with considerable mutual sequence similarity C terminal to the region covered by Pfam model PF04055. This additional region always contained a Cys-rich motif for an additional 4Fe-4S binding site, either as described previously (
2) or in modified form. We constructed an HMM, TIGR04085, that readily identifies a branch of the radical SAM domain superfamily that appears highly enriched in peptide- and protein-modifying enzymes. The combined signature PF04055 plus TIGR04085, for a protein that is neither NirJ nor an anaerobic sulfatase maturase, marks a protein as a candidate peptide maturase. The often very small genes that encode ribosomal natural product precursors are easily overlooked; the identification of a new good marker for modified peptide precursors will aid in the detection of additional natural product biosynthesis systems.
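The combined-signature rule stated above can be written as a small filter over per-protein model hits. This is a minimal sketch under stated assumptions: the two accessions PF04055 and TIGR04085 come from the text, but the labels used for the NirJ and anaerobic sulfatase maturase exclusions are placeholders, since the text does not give accessions for them, and the protein names are invented.

```python
RSAM_DOMAIN = "PF04055"        # radical SAM domain model named in the text
CTERM_DOMAIN = "TIGR04085"     # C-terminal domain model named in the text

# Placeholder labels (hypothetical): substitute whatever models are used to
# recognize NirJ and anaerobic sulfatase maturases in a given pipeline.
EXCLUDED = frozenset({"NirJ_model", "anSME_model"})

def is_candidate_peptide_maturase(hits, excluded=EXCLUDED):
    """Apply the rule: both domains present, and no excluded family hit."""
    hits = set(hits)
    has_both = RSAM_DOMAIN in hits and CTERM_DOMAIN in hits
    return has_both and not (hits & set(excluded))

# Hypothetical proteins with the set of models each one matches.
proteins = {
    "protA": {"PF04055", "TIGR04085"},
    "protB": {"PF04055", "TIGR04085", "NirJ_model"},
    "protC": {"PF04055"},
}
candidates = [p for p, hits in proteins.items()
              if is_candidate_peptide_maturase(hits)]
print(candidates)  # ['protA']
```

A protein that passes this filter is only a candidate; a nearby small open reading frame that could encode the precursor peptide would still need to be found.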
Several of our newly described peptide modification systems show very different kinds of signatures in comparative genomics analyses than are typical among known bacteriocin production systems. To provide an interpretation of the SCIFF system, we examined apparent features from its “bioinformatics grammar.” We introduced this approach previously when we showed that the mycofactocin system exhibited a number of signatures more consistent with a role as a molecular cofactor or redox carrier than a role as a bacteriocin (
12). Biological systems that differ entirely in their makeup, such that no protein from the first system shows any sequence similarity to any component of the second, may obey similar sets of constraints if they perform similar roles, such as both producing bacteriocins or both producing a cofactor. For the different types of systems in which the core feature marking the system is a peptide maturase next to a target peptide, it is possible to identify additional aspects of its bioinformatics grammar as an aid to making well-formed hypotheses about possible biological roles.
Features in the grammars that distinguish bacteriocins from cofactor biosynthesis systems are summarized in
Table 2. Bacteriocins must be exported, while cofactors usually remain inside the cell. Consequently, bacteriocin biosynthesis loci typically are flanked by transporter genes. Bacteriocin families evolve, presumably, under strong positive selection, such that the propeptide region typically shows greater sequence divergence than the leader peptide. There may be several paralogous target peptides in the genome together with a single maturase. A polypeptide serving as a cofactor precursor, by contrast, will be encoded by a single-copy gene and will exhibit several invariant residues, such as the Glu and Tyr that are cross-linked in the first step in the biosynthesis of PQQ. Patterns of phylogenetic distribution in collections of hundreds to thousands of reference genomes clearly encode key clues to a system's role and contrast sharply between the SCIFF system (virtually universal in
Clostridia) and the His-Xaa-Ser system (sporadically distributed and regularly surrounded by markers of transposition and DNA integration). The SCIFF system is missing from only one complete genome classified as
Clostridia (
Halothermothrix orenii H 168) and two draft genomes, which makes it substantially better conserved than endospore formation, for example. It occurs in just three other
Firmicutes, plus two species classified outside the
Firmicutes by the current NCBI taxonomy tree:
Bacteroides capillosus ATCC 29799 and
Bacteroides pectinophilus ATCC 43243. But these two genomes have numerous markers for endospore formation shared with low-GC Gram-positive sporeformers, have no markers of outer membranes, have closest matches of housekeeping proteins to other
Clostridia, and clearly are misnamed and misclassified. Unlike typical cofactor biosynthesis systems, the SCIFF gene pair does not show coclustering or cooccurrence with paralogous families of cofactor-dependent enzymes, nor with any transcription factor. Instead, it has a tendency to appear in genomes next to the universal
secD gene. In fact, while the genes for the YajC and SecD subunits of the Sec complex are adjacent in species as widely separated as
Escherichia coli,
Mycobacterium tuberculosis, and
Staphylococcus aureus, the SCIFF gene pair occurs near
secD in more than 50 species and in several species occurs between
yajC and
secD with no other intervening genes. This unusual location suggests constitutive expression. The near-perfect correspondence between the gene pair and classification within the
Clostridia, the hint of constitutive expression, and its colocalization with housekeeping genes would seem to argue against the hypothesis that the SCIFF system makes an episodically produced metabolite such as a pheromone or a bacteriocin.
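The contrast between the two grammars can be made concrete as a simple tally of features. The sketch below is an illustration only: the feature list paraphrases the distinctions drawn in the text, the unweighted vote counting is an assumption made here rather than a method from this work, and the example locus is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LocusFeatures:
    """Hypothetical per-locus observations, one flag per grammar feature."""
    invariant_propeptide_residues: bool
    near_cofactor_dependent_enzymes: bool
    near_export_transporter: bool
    tandem_paralogous_precursors: bool
    conservation_mostly_in_leader: bool

def grammar_votes(f):
    """Count features pointing toward each grammar (unweighted, assumed)."""
    cofactor = sum([
        f.invariant_propeptide_residues,
        f.near_cofactor_dependent_enzymes,
        not f.near_export_transporter,
        not f.tandem_paralogous_precursors,
    ])
    bacteriocin = sum([
        f.conservation_mostly_in_leader,
        f.near_export_transporter,
        f.tandem_paralogous_precursors,
    ])
    return {"cofactor_like": cofactor, "bacteriocin_like": bacteriocin}

# A locus with the features the text attributes to cofactor systems.
pqq_like = LocusFeatures(True, True, False, False, False)
print(grammar_votes(pqq_like))  # {'cofactor_like': 4, 'bacteriocin_like': 0}
```

A tally like this can only frame a hypothesis. The SCIFF system, as described above, fits neither grammar cleanly, which is the kind of case where a fixed scorecard would mislead and the individual observations matter more than the totals.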
The His-Xaa-Ser repeat peptide system suggests novel chemistries for peptide modification. The precursor peptides are accompanied by not one but two radical SAM enzymes. In this system, comparative genomics identifies a presumptive peptide target that is longer than most bacteriocin precursors. However, only one small region of that protein family, the His-rich tripeptide repeat region, shows notable conservation, and we propose that region as the peptide modification target. A single enzyme creates three different cross-links through cysteine side chains in subtilosin A (
18), while cyclodehydratases are shown to act at multiple sites, and on heterologous targets, in thiazole/oxazole-modified microcin precursors (
21). It is likely that multiple His-Xaa-Ser sites are modified and that for each repeat the two enzymes act sequentially, although it is unclear if the target would be the histidine residue of each repeat, the serine, or both.
A surprising feature of the His-Xaa-Ser system, although one fully consistent with its highly sporadic species distribution, is that genes immediately neighboring its four-gene cassette are enriched in mobility markers: transposases, integrases, plasmid partitioning proteins, mobilization proteins, primases, restriction system proteins, toxin-antitoxin system (addiction module) proteins, and various phage protein homologs. At least 16 of 36 His-Xaa-Ser systems have such markers identifiable within three genes on one (10 cases) or both (6 cases) sides of the His-Xaa-Ser four-gene cassette. This exceeds the 25% rate that we observe for neighborhoods of Pfam model PF04013 family restriction systems and the 10 to 15% rates that we observe for one arsenite and one tellurite resistance marker. In general, proteins flanking His-Xaa-Ser systems and associated with mobility lack pairwise homology to each other, suggesting that the His-Xaa-Ser system does not indicate the presence of any one specific type of mobile element. The sporadic distribution and coclustering with mobility markers are inconsistent with housekeeping functions but could be consistent with either participation in mechanisms of lateral transfer or provision of a rapidly selectable trait such as immunity to an antibiotic, a toxic metal, or phage infection. Interestingly, histidine often serves to provide a metal-binding ligand (
22). Furthermore, a recent study established the ribosomal origin of methanobactin-OB3b (
19), a highly modified peptide natural product used by methanotrophs to acquire copper and change its redox state from Cu(II) to Cu(I). Natural products made from His-Xaa-Ser repeat-containing precursors conceivably could resemble siderophores and methanobactins more closely than bacteriocins, binding to and perhaps conferring resistance to one or more toxic heavy metals. The actual function, however, is unknown.
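The enrichment comparison above reduces to simple arithmetic, sketched here. The counts (10 one-sided and 6 two-sided cases among 36 systems) and the 25% reference rate are taken from the text; the exact binomial tail is an added illustration, not an analysis reported in this work, and it assumes independent neighborhoods and treats the reference rate as known, neither of which is guaranteed.

```python
from math import comb

def upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

one_side, both_sides, total = 10, 6, 36   # counts quoted in the text
flagged = one_side + both_sides           # 16 systems with mobility markers
fraction = flagged / total                # 16/36, roughly 44%

reference_rate = 0.25                     # rate quoted for PF04013 neighborhoods
tail = upper_tail(flagged, total, reference_rate)

print(f"observed fraction: {fraction:.3f}")
print(f"P(X >= {flagged}) at a {reference_rate:.0%} rate: {tail:.4f}")
```

Because the text reports "at least 16", the observed fraction is a lower bound, so the comparison with the reference rates is conservative in that respect.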
We have presented evidence that family TIGR04081 contains naturally occurring selenoproteins and inferred that the family undergoes radical SAM-mediated posttranslational modifications. We hypothesize that the mature form may function as a selenium-containing bacteriocin, that is, a selenobacteriocin. According to this hypothesis, the Se atom remains in the mature form of the natural product derived from the precursor peptide and may form a part of a novel peptide modification. In
Fig. 3, a column that is always Cys or SeCys occurs near the boundary that separates the consistently homologous N-terminal domain from a region that tends to be repetitive and low in complexity even in species with Cys instead of SeCys. All predicted SeCys residues are flanked on at least one side, and usually both, by glycine residues, as often observed for Cys residues subject to cyclization-forming modifications in other RTNP precursors.
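The glycine-flanking observation can be checked mechanically on any aligned or unaligned precursor sequence. The sketch below assumes the one-letter code U for selenocysteine; the example sequence is invented for illustration and is not a member of family TIGR04081.

```python
def secys_flanks(seq, residue="U", flank="G"):
    """For each selenocysteine, report (1-based position, Gly before, Gly after)."""
    report = []
    for i, aa in enumerate(seq):
        if aa != residue:
            continue
        before = i > 0 and seq[i - 1] == flank
        after = i + 1 < len(seq) and seq[i + 1] == flank
        report.append((i + 1, before, after))
    return report

# Hypothetical sequence: one SeCys flanked on both sides, one on a single side.
example = "MKTAGUGSSGUAK"
for pos, before, after in secys_flanks(example):
    print(pos, "Gly before" if before else "-", "Gly after" if after else "-")
```

Passing `residue="C"` applies the same check to the species that carry Cys at the equivalent position.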
An alternative hypothesis, however, is that the selenium atom, or sulfur atom in those species with Cys instead of SeCys at the equivalent sequence position, contributes to the molecular mechanism of modifications that occur elsewhere on the peptide, as a new example of substrate-assisted catalysis (
7). Substrate-assisted catalysis would help to guarantee that the peptide-modifying action of the radical SAM enzyme works only on appropriate targets while still allowing great plasticity in the local sequence environment around the sites to be modified. The principle of substrate-assisted catalysis may help explain the occurrence of somewhat longer leader peptides in recently discovered RTNP precursor families that show large paralogous family expansions (
14). It would therefore be of great interest to learn experimentally whether selenium remains part of the mature product or is removed with the leader peptide.
Nearly all known families in the selenoproteome are enzymes, most resembling thiol oxidoreductases with CxxU, UxxC, or related motifs. A recent study of the selenoproteome boosted the number of prokaryotic selenocysteine-containing protein families to over 50, with as few as two members found
in silico taken as sufficient to identify a new family (
38). Selenocysteine-containing RTNP families, however, present a special challenge because, in contrast to enzymes, RTNP families tend to be individually rare and tend to show weak if any sequence conservation in their propeptide regions. These traits interfere both with BLAST-based search sensitivity and with ratification of the family during manual review. The recognition here of a first selenocysteine-containing RTNP precursor family may now assist in the discovery of additional families.
The work presented here contributes many new protein family definitions for use in automated annotation pipelines, which should produce more accurate genomic annotation of radical SAM enzymes in the future. In particular, it presents bioinformatics-based evidence for new families of modified peptides and new families of peptide modification enzymes. Recent bioinformatics and experimental work is expanding our catalog of, and appreciation for, natural products made from ribosomally produced peptides (
28). The new systems that we have described are considerably more widespread than previously studied model systems with radical SAM involvement in peptide modification other than PQQ biosynthesis (
Table 1). This work defines a domain (TIGR04085), found in the C-terminal region of a subset of radical SAM enzymes, for which member proteins show several variants of a previously noted iron-sulfur cluster-binding motif (see Fig. S1 in the supplemental material). This domain defines a molecular marker for probable peptide modification systems, perhaps the most abundant of any type. Several of these new biosynthetic systems are widely distributed in bacteria and should be investigated for biological roles other than bacteriocin-like antimicrobial activity.