Bioinformatics analysis of the phylum Bacteroidetes.
The large and diverse phylum Bacteroidetes comprises Gram-stain-negative, chemo-organotrophic, non-spore-forming, rod-shaped bacteria (47), divided into six classes (48, 49). Members have colonized all types of habitats, including soil, ocean, freshwater, and the gastrointestinal tract of animals (38). Species from the mostly anaerobic Bacteroidia class are predominantly found in gastrointestinal tracts, while environmental Bacteroidetes belong primarily to the Flavobacteriia, Cytophagia, Chitinophagia, Saprospiria, and Sphingobacteriia classes (48, 49). Environmental studies based on the amplicon diversity of adenylation and ketosynthase domains provided a first glimpse of the phylum's genetic potential for NP biosynthesis (46). To map the BGC potential of the Bacteroidetes phylum systematically, we selected publicly available, closed, and annotated genomes, in addition to some whole genome shotgun (WGS) projects available at the time of data processing. In total, 600 genomes were analyzed using the “antibiotics and secondary metabolite analysis shell” (antiSMASH 5.0) (50). The total number of BGCs, as well as the specific numbers of NRPS, PKS, and hybrid BGCs, was assigned to each strain and placed in the taxonomic context of the Bacteroidetes phylum, based on a phylogenetic tree calculated from complete 16S rRNA gene sequences (Fig. 1A). Mapping BGC numbers and types onto the phylogenetic tree enabled comparisons between classes and genera. In most cases, a slight positive linear correlation between genome size and the number of secondary metabolite BGCs per genome was observed, a phenomenon known from other bacteria (51) (Fig. 1A and B).
Bacteria of the classes Bacteroidia and Flavobacteriia, which have the smallest average genome sizes (3.78 and 3.51 Mbps) and low average BGC numbers per strain (1.15 and 3.19 BGCs), appear less relevant for NP discovery. Exceptions are found in the genera Kordia (5.33 Mbps and 10 BGCs on average, three unique genomes analyzed [n = 3]) and Chryseobacterium (4.43 Mbps and 6.19 BGCs on average, n = 48), with up to five BGCs of the NRPS and/or PKS type. Many strains of these classes are pathogens and inhabit environments characterized by higher stability and lower complexity (e.g., guts) (38). NP production is an adaptive mechanism providing evolutionary fitness under changing environmental conditions and in the presence of growth competitors (52). Accordingly, the most prolific bacterial NP producers, such as the Actinomycetes (30) and Myxobacteria (53), are mainly found in highly competitive environments such as soil. This correlation can also be seen within the Bacteroidetes phylum: in contrast to the anaerobic and pathogenic species, a higher BGC load was observed in the free-living and aerobic classes. The Sphingobacteriia and Cytophagia classes have average genome sizes of 5.55 Mbps and 5.57 Mbps, respectively, and average BGC loads of 5.79 and 4.63 per strain. An outlier is the genus Pedobacter, with up to 22 BGCs on a single genome. Nevertheless, our analysis revealed that the Chitinophagia class surpasses all other classes of the phylum with respect to the number of BGCs per genome: on average, the class harbors 11.4 BGCs per strain with an average genome size of 6.64 Mbps. Within this class, the genus Chitinophaga (n = 47) harbors on average 15.7 BGCs per strain with an average genome size of 7.51 Mbps. Thirty percent of its BGCs belong to the NRPS and PKS classes, including rare trans-AT PKS BGCs. The genus with the second highest BGC load within the Bacteroidetes phylum is Taibaiella, which also belongs to the Chitinophagia class, with a 27% smaller genome size and on average 19% fewer BGCs (Fig. 1C and D).
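The class- and genus-level averages above amount to grouping per-strain antiSMASH results by taxon and averaging genome size and BGC count. A minimal sketch, assuming one hypothetical record per strain; the records are placeholders, not the analyzed genomes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-strain records: (class, genome size in Mbp, BGC count).
# Illustrative values only.
records = [
    ("Chitinophagia", 7.4, 16),
    ("Chitinophagia", 6.1, 9),
    ("Flavobacteriia", 3.3, 2),
    ("Flavobacteriia", 3.7, 4),
    ("Bacteroidia", 3.9, 1),
]

def class_averages(rows):
    """Group strains by taxon; return {taxon: (mean size, mean BGCs, n strains)}."""
    groups = defaultdict(list)
    for taxon, size_mbp, n_bgcs in rows:
        groups[taxon].append((size_mbp, n_bgcs))
    return {
        taxon: (mean(s for s, _ in vals), mean(b for _, b in vals), len(vals))
        for taxon, vals in groups.items()
    }

for taxon, (size, bgcs, n) in sorted(class_averages(records).items()):
    print(f"{taxon}: {size:.2f} Mbp, {bgcs:.2f} BGCs per strain (n = {n})")
```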
To discover novel chemistry, the sheer number of BGCs is less important than their divergence, which is predicted to translate into structural diversity of the encoded metabolites (20, 54). We therefore expanded the computational analysis by examining the sequence and compositional similarity of the BGCs detected in the 600 genomes using the “biosynthetic gene similarity clustering and prospecting engine” (BiG-SCAPE v1.0.0) (26). BiG-SCAPE creates a distance matrix by calculating the distance between every pair of BGCs in the data set. The distance combines three metrics: the percentage of shared domain types (Jaccard index), the similarity between aligned domain sequences (domain sequence similarity), and the similarity of domain pair types (adjacency index). Comparative analysis of the Bacteroidetes BGCs against the integrated MIBiG (Minimum Information about a Biosynthetic Gene cluster, v1.4) database (55) enabled their comparison with 1,796 deposited BGCs and, consequently, with the metabolites these encode. In total, 2,594 BGCs were detected in 415 of the 600 genomes analyzed and grouped, with a default cutoff of c = 0.6, into a sequence similarity network of 306 gene cluster families (GCFs). Only 12 GCFs clustered with MIBiG reference BGCs of known function. Together, these 12 GCFs comprise 11.5% (298 BGCs) of all detected Bacteroidetes BGCs. Nine of them belong to the BiG-SCAPE BGC classes “PKSother,” “Terpenes,” or “Other.” These GCFs encode known NP classes such as biotin, ectoine, N-acyl glycine, and eicosapentaenoic acid, as well as products of the NRPS-independent siderophore (NIS) synthetase type, namely desferrioxamine and bisucaberin B (56), which form two connected though distinct clouds (Fig. 2). A Cytophagales-specific GCF of the terpene class includes MIBiG BGC0000650, encoding the carotenoid flexixanthin (57). The Bacteroidetes are well-known producers of flexirubin-like pigments (aryl polyenes), which is reflected in a conserved biosynthesis across several genera (58, 59). The flexirubin gene cluster cloud (GCC) covers 268 BGCs from 252 individual strains and five of the six analyzed classes. In our analysis, only the newly formed Saprospiria class (49) was an exception. However, since the analysis included only two Saprospiria strains, it does not yet allow a conclusive assessment of this class's capability to produce these yellow pigments. The flexirubin GCC can be divided into at least five distinct GCFs. Resolved at the class level, this revealed a Flavobacteriia-specific family including BGC0000838 from Flavobacterium johnsoniae UW101 (58) as well as a Chitinophagia-specific family including BGC0000839 from Chitinophaga pinensis DSM 2588 (59). The latter is in turn directly connected to a third GCF, which mainly comprises BGCs from the genus Chitinophaga but does not include a reference BGC.
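The three-part BiG-SCAPE distance described above can be sketched as a weighted combination, d = 1 − (w_J·JI + w_DSS·DSS + w_AI·AI). This is a simplified illustration, not BiG-SCAPE's code: BiG-SCAPE uses class-specific weights and further refinements, whereas the weights below are illustrative assumptions, and DSS is passed in precomputed because it requires domain alignments. The domain strings are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def adjacent_pairs(domains):
    """Unordered pairs of neighbouring domain types along a BGC."""
    return {frozenset(pair) for pair in zip(domains, domains[1:])}

def combined_distance(domains_a, domains_b, dss, weights=(0.2, 0.75, 0.05)):
    """Distance d = 1 - (w_J * JI + w_DSS * DSS + w_AI * AI).

    JI: Jaccard index of domain types; AI: adjacency index (Jaccard index of
    adjacent domain pairs); DSS: domain sequence similarity in [0, 1].
    The default weights are illustrative assumptions.
    """
    w_j, w_dss, w_ai = weights
    ji = jaccard(set(domains_a), set(domains_b))
    ai = jaccard(adjacent_pairs(domains_a), adjacent_pairs(domains_b))
    return 1.0 - (w_j * ji + w_dss * dss + w_ai * ai)

# Hypothetical domain strings of two NRPS clusters.
bgc_1 = ["C", "A", "PCP", "E", "C", "A", "PCP", "TE"]
bgc_2 = ["C", "A", "PCP", "C", "A", "PCP", "TE"]
print(round(combined_distance(bgc_1, bgc_2, dss=0.8), 3))
```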
In addition, the reference BGCs described to encode the bioactive NPs monobactam SQ 28,332 (60, 61), elansolids (40, 62), and pinensins (41) are annotated to three distinct GCFs (Fig. 2A). The monobactam BGC (BGC0001672) was unique and only identified in its described producer strain, Flexibacter sp. ATCC 35103. Elansolids and pinensins represent patent-protected chemical entities active against Gram-positive bacteria and against filamentous fungi and yeasts, respectively. The complete elansolid-encoding BGC (BGC0000178), almost 80 kbp in size, was identified in the genomes of Chitinophaga sp. YR627 and Chitinophaga pinensis DSM 2588 (=DSM 28390 [63]); the genetic potential to produce elansolid had already been proposed for C. pinensis (64) (Fig. S1A). Besides the original producer strain, Chitinophaga sancti DSM 21134 (65), these strains provide alternative bioresources to access these polyketide-derived macrolides. In addition, both strains harbor the pinensin BGC directly co-localized with the elansolid-type BGC. With the chosen BiG-SCAPE parameters, this co-localization leads to an artificial connection between the two GCFs. Manual curation revealed a total of six strains carrying a pinensin-like BGC in their genome (Fig. S1B). Alignment of the RiPP core peptides revealed that only the amino acid sequence from Chitinophaga sp. YR627 was identical to the described pinensin sequence. The other strains show amino acid sequence variations, pointing toward structural variance (Fig. S1C).
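The core-peptide comparison can be expressed as a position-by-position check of pre-aligned sequences against the reference. This sketch assumes the alignment has already been done (equal length, `-` for gaps); the sequences are obviously synthetic placeholders, not the pinensin core peptide or its variants.

```python
def compare_core_peptides(reference, variants):
    """Return {strain: [(position, reference residue, variant residue), ...]}.

    Sequences must be pre-aligned to the reference; an empty list means the
    variant is identical to the reference core peptide.
    """
    report = {}
    for strain, seq in variants.items():
        if len(seq) != len(reference):
            raise ValueError(f"{strain}: sequence is not aligned to the reference")
        report[strain] = [
            (pos, ref_aa, var_aa)
            for pos, (ref_aa, var_aa) in enumerate(zip(reference, seq), start=1)
            if ref_aa != var_aa
        ]
    return report

# Synthetic placeholder sequences (hypothetical, not real core peptides).
reference_core = "ACDEFGHIKL"
variants = {"strain_1": "ACDEFGHIKL", "strain_2": "ACDEYGHIKV"}
for strain, diffs in compare_core_peptides(reference_core, variants).items():
    print(strain, "identical" if not diffs else diffs)
```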
With 306 GCFs identified and only 12 of them linked to known BGCs and their metabolites, the sequence and compositional similarity analysis revealed a BGC diversity within the Bacteroidetes phylum that differs from the composition of the known BGCs deposited in the MIBiG database.
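Grouping BGCs into families from a pairwise distance matrix and a cutoff (c = 0.6 above) can be sketched with a union-find over all pairs at or below the cutoff. Note the substitution: BiG-SCAPE defines GCFs with affinity propagation within the network, while this sketch stops at plain connected components (single linkage), which can merge families BiG-SCAPE would keep apart. The distances are hypothetical.

```python
def gene_cluster_families(bgc_ids, distances, cutoff=0.6):
    """Connect BGC pairs whose distance is at or below `cutoff` and return the
    connected components (single linkage via union-find) as sorted lists."""
    parent = {b: b for b in bgc_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    families = {}
    for b in bgc_ids:
        families.setdefault(find(b), []).append(b)
    return sorted(sorted(members) for members in families.values())

# Hypothetical pairwise distances between four BGCs.
dist = {("bgc1", "bgc2"): 0.35, ("bgc2", "bgc3"): 0.82, ("bgc3", "bgc4"): 0.41}
print(gene_cluster_families(["bgc1", "bgc2", "bgc3", "bgc4"], dist))
```

Counting how many resulting families contain a reference entry then gives the annotated share, analogous to the 12 of 306 GCFs reported above.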
Extending this similarity network analysis to taxonomic relations at the phylum level showed that Bacteroidetes BGCs of the RiPP, NRPS, PKS, and hybrid NRPS/PKS classes are unique compared with BGCs of any other phylum (Fig. 2B). This in turn provides strong evidence of a generally high potential for discovering novel metabolites from this phylum. The majority (~66%) of all GCFs belonging to the above-mentioned BGC classes are found in the Chitinophagia class, although this class represents only 13.7% of the analyzed strains. Within this class, the genus Chitinophaga can be prioritized in terms of BGC number and composition. Considering the many additional unique and novel RiPP, NRPS, PKS, and hybrid BGCs (not depicted in the network), this is a strong indication that the biosynthetic potential of this genus is far from being fully exploited. It can be considered the most promising starting point for the discovery of novel metabolites within this phylum.
Metabolomics of the Chitinophaga.
Based on the genomic data evaluation, we selected the genus Chitinophaga for a bioactivity-guided NP discovery program. NPs are considered nonessential for bacterial growth and reproduction; rather, they provide evolutionary fitness and are thus expressed as an adaptive response to changing environmental conditions. Consequently, the identified BGC potential is not expected to translate fully into the metabolite pattern actually produced under laboratory conditions (22). A common strategy to address this gap is cultivation in several media variants, exposing the strains to various stress conditions, e.g., nutrient depletion (66, 67). As a consequence of nutrient depletion, bacteria enter the stationary phase and reduce or even cease growth, which often coincides with the induction of secondary metabolite production (66, 68). To trigger these events, we cultivated a diverse set of 25 Chitinophaga strains (Table S1) in five different media for 4 and 7 days.
The metabolites were extracted from freeze-dried culture broths with methanol, and the organic extracts were subsequently analyzed by ultra-high performance liquid chromatography-high resolution mass spectrometry (UHPLC-QTOF-HR-MS). LC-MS data sets from a total of 250 extracts (and media controls) were examined, allowing the definition of strain-specific molecular features. In an initial step, features (represented by m/z, retention time, and isotope pattern) were calculated for all extracts. Curation of all data sets was necessary to filter out background noise and confirm the authenticity of the defined features. This curation step helped to avoid false uniqueness due to concentrations near the detection limit and reduced the risk of picking up background noise. Furthermore, a single NP can give rise to multiple mass spectrometric features, e.g., through the formation of different ion adducts and in-source fragment ions, which adds to the complexity of these data sets. The final data set consisted of 93,526 features. These were aligned into 4,188 buckets, a bucket being defined as an m/z and retention time (RT) region hosting all features with matching m/z and RT (69). We created a chemotype-barcoding matrix of this complex data set, allowing its visualization and evaluation (Fig. 3A). After normalizing the data set by removing the 1,452 buckets congruent with the media controls, we identified a total of 2,736 buckets representing metabolites produced by the investigated Chitinophaga set.
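The bucketing step, grouping features that match in m/z and RT and then discarding buckets shared with media controls, can be sketched as a greedy tolerance-window assignment. This is not the software used in the study; the 10 ppm and 0.1 min tolerances and the `Feature` record are illustrative assumptions. With the reported counts, 4,188 buckets minus 1,452 media buckets leaves the 2,736 metabolite buckets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    sample: str   # extract or media-control identifier
    mz: float     # measured m/z
    rt: float     # retention time in minutes

def bucket_features(features, ppm_tol=10.0, rt_tol=0.1):
    """Greedily assign features to buckets defined by m/z (ppm) and RT windows.

    A feature joins the first bucket whose seed lies within both tolerances;
    otherwise it seeds a new bucket. Returns [(seed_mz, seed_rt, samples)].
    """
    buckets = []
    for f in sorted(features, key=lambda f: (f.mz, f.rt)):
        for seed_mz, seed_rt, samples in buckets:
            if abs(f.mz - seed_mz) / seed_mz * 1e6 <= ppm_tol and abs(f.rt - seed_rt) <= rt_tol:
                samples.add(f.sample)
                break
        else:  # no matching bucket found
            buckets.append((f.mz, f.rt, {f.sample}))
    return buckets

def remove_media_buckets(buckets, media_controls):
    """Drop buckets containing any feature from a media-control sample."""
    return [b for b in buckets if not (b[2] & media_controls)]

feats = [Feature("s1", 500.2500, 5.00), Feature("s2", 500.2502, 5.05),
         Feature("media", 300.1000, 2.00), Feature("s1", 300.1001, 2.02),
         Feature("s3", 800.4000, 9.00)]
kept = remove_media_buckets(bucket_features(feats), {"media"})
print(len(kept), "metabolite buckets")
```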
The detected buckets were analyzed for their presence across all combinations of cultivation medium and incubation time, and the strains were ordered according to 16S rRNA sequence similarity. To facilitate data interpretation and to identify strain-specific as well as conserved metabolites, all buckets were sorted by their frequency of appearance. Comparative visualization of the short (4 days) and prolonged (7 days) incubation times revealed differences in the chemotype profiles of the strains. Most strains metabolize the majority of media components within the first 4 days of cultivation, as indicated by the small number of “media buckets” still present in the extracts after this incubation period. A second population of rather slow-growing strains appeared to shift their metabolic profile relative to the respective media control only after 7 days of incubation (e.g., C. caeni KCTC 62265, C. dinghuensis DSM 29821, C. niastensis DSM 24859, C. barathri, and C. cymbidii). In particular, Chitinophaga sp. DSM 18078 required a prolonged incubation time to metabolize the media ingredients, producing 77.4% more metabolite buckets after 7 days of incubation than at the earlier sampling time. Across the whole data set, on average 26.6% (92.4 buckets) more metabolite buckets were detected after 7 than after 4 days of incubation. In contrast, a fraction of bacterial metabolite buckets disappeared from the extracts of five strains after prolonged incubation, showing the necessity of varying cultivation conditions to access as comprehensive a metabolite profile as possible for each investigated strain. The many media-specific buckets (colored) compared with those produced in various media (black) also show the effect of varying the bacterial nutrient supply (Fig. 3A). In combination with the varied cultivation period, this led to an average of approximately 46 unique buckets per strain. In particular, C. flava KCTC 62435 and its close relative C. eiseniae DSM 22224 stood out, producing 305 and 150 unique metabolite buckets, respectively (Fig. 3B), which marks them as the most prolific producers in this set. In total, 1,154 buckets (~42%) of the entire data set were identified in only one strain's data set, representing strain-specific metabolites (Fig. 3C). This high proportion of strain-specific metabolites reflects the heterogeneity of the investigated strain set. In parallel, the data set reveals structural diversity in terms of molecular size (up to 5145.051 Da) and polarity range (Fig. S2).
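Counting strain-specific buckets reduces to computing, for each bucket, the number of strains in which it occurs, then keeping buckets with a prevalence of one. A minimal sketch with hypothetical presence/absence data; with the reported totals, 1,154 of 2,736 buckets corresponds to the ~42% stated above.

```python
from collections import Counter

def uniqueness_summary(presence):
    """presence: {strain: set of bucket ids detected in any medium or time point}.

    Returns ({strain: number of buckets found in no other strain},
             fraction of all buckets that are strain-specific).
    """
    prevalence = Counter(b for buckets in presence.values() for b in buckets)
    unique_per_strain = {
        strain: sum(1 for b in buckets if prevalence[b] == 1)
        for strain, buckets in presence.items()
    }
    n_unique = sum(1 for n in prevalence.values() if n == 1)
    return unique_per_strain, n_unique / len(prevalence)

# Hypothetical presence/absence data for three strains.
presence = {"strain_A": {"b1", "b2", "b3"}, "strain_B": {"b3", "b4"}, "strain_C": {"b3"}}
per_strain, fraction = uniqueness_summary(presence)
print(per_strain, f"{fraction:.0%} strain-specific")
```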
Analysis of the conserved metabolites (which include primary and secondary metabolites) revealed 357 buckets present in at least 10 of the 25 analyzed strains, while only seven buckets were detected in samples of all Chitinophaga strains investigated. Based on their MS2 fragmentation patterns, which share the loss of long carbon chains, we postulated a structural relationship between four of them and assigned them as hitherto unknown (amino/phospho) lipids.
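The conserved-bucket counts correspond to a prevalence threshold: buckets detected in at least a given number of strains (10 of 25 above), and the core set detected in every strain. A minimal sketch with hypothetical data:

```python
from collections import Counter

def conserved_buckets(presence, min_strains):
    """Bucket ids detected in at least `min_strains` strains.

    presence: {strain: set of bucket ids}. Using the total number of strains
    as `min_strains` returns the core buckets shared by every strain.
    """
    prevalence = Counter(b for buckets in presence.values() for b in buckets)
    return {b for b, n in prevalence.items() if n >= min_strains}

# Hypothetical data for three strains.
presence = {"s1": {"b1", "b2"}, "s2": {"b1", "b3"}, "s3": {"b1", "b2"}}
print(sorted(conserved_buckets(presence, 2)), sorted(conserved_buckets(presence, len(presence))))
```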
Next, the complete LC-MS data set was examined for the presence of structurally characterized microbial NPs (~1,700) deposited in our in-house database on the basis of accurate m/z, RT, and isotope pattern. No known compound was rediscovered. Considering the known bias of the database toward NPs from classical NP-producing taxa such as Actinobacteria, Myxobacteria, and fungi, this is consistent with the low overlap with these taxa also found in our BGC analysis. In a complementary approach, the LC-MS/MS data of the entire data set were compared with in silico fragments of >40k NPs deposited in the commercial database AntiBase (70). Consistently, no database-recorded NP was identified within our data set except falcitidin (71), an acyltetrapeptide produced by a Chitinophaga strain. Although this finding shows the general applicability of this workflow, the underrepresentation of NPs isolated from the phylum Bacteroidetes, also in public databases, remains a severe limitation for the comprehensive categorization of their omics data.
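The in-house database search on accurate m/z and RT can be sketched as a tolerance-window lookup. The tolerances (5 ppm, 0.2 min) and the dictionary layout are assumptions; isotope-pattern scoring and the in silico MS/MS comparison against AntiBase are not modelled here.

```python
def dereplicate(buckets, reference_db, ppm_tol=5.0, rt_tol=0.2):
    """Match measured buckets against known NPs by m/z (ppm) and RT windows.

    buckets: {bucket_id: (mz, rt)}; reference_db: {name: (mz, rt)}.
    Returns {bucket_id: [matching names]} for buckets with at least one hit.
    """
    hits = {}
    for bucket_id, (mz, rt) in buckets.items():
        names = [
            name for name, (ref_mz, ref_rt) in reference_db.items()
            if abs(mz - ref_mz) / ref_mz * 1e6 <= ppm_tol and abs(rt - ref_rt) <= rt_tol
        ]
        if names:
            hits[bucket_id] = names
    return hits

# Hypothetical measured buckets and reference entries.
measured = {"b1": (500.2500, 5.0), "b2": (600.0000, 3.0)}
known = {"known_np": (500.2505, 5.1)}
print(dereplicate(measured, known))
```

An empty result for every bucket corresponds to the zero rediscovery rate reported above.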
As a consequence, these data give high confidence in the investigated strain portfolio and show that the metabolite spectrum produced by these strains is largely underexplored and distinct from the metabolites produced by classical NP-producing taxa. Even though strains of the genus Chitinophaga are phylogenetically closely related, they produce heterogeneous metabolites, with a high number of strain-specific metabolites that increases the likelihood of chemical novelty.
Chitinopeptins, new CLPs from C. eiseniae and C. flava.
To link these untapped buckets to bioactivity, the organic crude extracts were screened against a panel of opportunistic microbial pathogens. In particular, methanol extracts of Chitinophaga eiseniae DSM 22224 and Chitinophaga flava KCTC 62435, the two strains with the highest level of metabolic uniqueness, exhibited strong activity against Candida albicans FH2173. Bioactivity- and UHPLC-HR-ESI-MS-guided fractionation led to the identification of six new cyclic lipodepsipeptides.
C. eiseniae produced the two tetradecalipodepsipeptides chitinopeptins A and B (1 and 2), with molecular formulae C82H137N17O28 (1, [M+H]+ 1809.0052) and C81H135N17O28 (2, [M+H]+ 1794.9907), whereas C. flava produced the four pentadecalipodepsipeptides chitinopeptins C1+C2 and D1+D2 (3 to 6), with molecular formulae C84H140N18O30 (3 and 4, [M+H]+ 1882.0177) and C83H138N18O30 (5 and 6, [M+H]+ 1868.0059) (Fig. 4). All compounds were present in the MS spectra as pairs of [M+3H]3+ and [M+2H]2+ ions (Fig. S3A). All six native peptides appeared to be highly stable, as only low yields of fragment ions were obtained with the electrospray ionization source (Fig. S3B), even at elevated collision energies (up to 55 eV). This prevented MS/MS-based structure prediction and structural relationship analysis by molecular networking.
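Because the chitinopeptins were observed as [M+2H]2+ and [M+3H]3+ ions, it is useful to compute the m/z expected for each charge state from a molecular formula. The sketch below uses standard monoisotopic masses of 1H, 12C, 14N, and 16O and handles only simple formulas without brackets or charges; it is an illustration, not the processing software used in the study.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812

def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C82H137N17O28'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass

def protonated_mz(formula, z):
    """m/z of the [M+zH]z+ ion."""
    return (monoisotopic_mass(formula) + z * PROTON) / z

formula_1 = "C82H137N17O28"  # chitinopeptin A, formula as given in the text
for z in (1, 2, 3):
    print(f"[M+{z}H]{z}+  m/z {protonated_mz(formula_1, z):.4f}")
```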
The chemical structures of the six compounds were determined by extensive NMR studies using 1D 1H, 1D 13C, DQF-COSY, TOCSY, ROESY, multiplicity-edited HSQC, and HMBC spectra (Fig. 4). Analysis of the compounds in most “standard” NMR solvents was hampered by either extreme line broadening (DMSO, MeOH, pyridine) or poor solubility (H2O, acetone). However, a 1:1 mixture of H2O and CD3CN gave NMR spectra of high quality and confirmed the presence of peptides. To obtain good dispersion of the amide resonances, 1H spectra were acquired at different temperatures between 290 and 305 K. A temperature of 299 K or 300 K, respectively, was found to be the best compromise between signal dispersion and line broadening (Fig. S4 to S31 and Tables S2 and S3).
The first compound studied was chitinopeptin A. Analysis of the NMR spectra revealed the presence of several canonical amino acids (1 Thr, 1 Ala, 1 Ile, 1 Ser, 1 Lys, and 2 Leu) and several hydroxylated amino acids (3 β-OH Asp, 1 β-OH Phe, 1 β-OH Ile). In addition, one N-methyl Val and one 2,3-diaminopropionic acid (Dap) moiety could be assigned. The amino acid sequence was established by correlations in the ROESY (NH(i)/NH(i+1), NH(i+1)/Hα(i)) and HMBC spectra (C′(i)/NH(i+1)). Cyclization was indicated by the 1H chemical shift of the β-proton of the Thr residue in position 2 (5.23 ppm) and by the HMBC correlation between the carbonyl carbon of the C-terminal β-OH Ile (position 14) and the β-proton of the Thr residue.
Aside from the amino acids, a modified fatty acid residue was identified as 2,9-dimethyl-3-aminodecanoic acid. HMBC correlations between its carboxyl carbon (C1) and the N-methyl group of the N-methyl Val proved its position at the N-terminus of the peptide. The structure of chitinopeptin B was almost identical to that of CLP 1, the only difference being the substitution of 2,9-dimethyl-3-aminodecanoic acid by 3-amino-9-methyldecanoic acid.
CLPs 3 and 4 were isolated as a 5:4 mixture of two components. One of the main differences from the structures above is an additional Dap residue, inserted between the fatty acid moiety and N-methyl Val. Both components contain an Ile instead of a Leu (CLPs 1 and 2) at position 10. Furthermore, the 2,9-dimethyl-3-aminodecanoic acid is replaced by a 3-hydroxy-9-methyldecanoic acid. Components 3 and 4 differ in the connectivity of the Dap in position 9: in component 4, the peptide bond is formed between the α-amino group and the carbonyl group of Lys, whereas in component 3, the β-amino group (side chain) is connected to the carbonyl group of Lys. CLPs 5 and 6 form the same pair of structures as CLPs 3 and 4; in contrast to the previous ones, however, both components contain a Val in position 10 instead of Leu or Ile, respectively.
The absolute stereochemistry of the amino acids in the CLPs was determined by advanced Marfey's analysis (72). Comparison of the RTs with those of (commercially available) reference amino acids allowed the identification of nine of 14 (CLPs 1 and 2) and 10 of 15 amino acids (CLPs 3 to 6), respectively. The RTs of N-methyl-L-Val, D-allo-Thr, D-Ala, D-allo-Ile (2× for CLPs 3 and 4), L-Ser, D-Lys, L-Dap (2× for CLPs 3 to 6), D-Leu (only CLPs 1 and 2), L-Leu, and D-Val (only CLPs 5 and 6) matched those of the references. Assigning L- and D-leucine within structures 1 and 2 to positions 13 and 10, respectively, was possible because position 10 was the only variable position in all six depsipeptides. Either Ile (CLPs 3 and 4) or Val (CLPs 5 and 6) was identified at this position, in each case with D-configuration. Therefore, it can be assumed that D-Leu is present at position 10 in CLPs 1 and 2 (Fig. S32 and S33).
Authentic samples of the β-hydroxyamino acids or suitable precursors were synthesized using modified literature procedures. All four stereoisomers of β-hydroxyaspartic acid were obtained from (−)-dibenzyl D-tartrate or (+)-dibenzyl L-tartrate, respectively, according to a procedure described by Breuning et al. (73). While the anti-isomers are directly accessible, the syn-isomers were obtained by selective base-induced epimerization of the azido intermediates and separation of the two isomers by HPLC. Cbz-protected L-isomers of the β-hydroxyphenylalanines and β-hydroxyisoleucines were synthesized starting from an orthoester-protected L-serine aldehyde, initially described by Blaskovich and Lajoie (74–76). To obtain the corresponding D-isomers for analytical purposes, racemic samples of the amino acids were produced by racemization of the orthoester-protected L-serine aldehyde by simple chromatography on silica (74). Advanced Marfey's analysis determined (2S,3S)-3-hydroxyaspartic acid, (2S,3R)-3-hydroxyphenylalanine, and (2S,3R)-3-hydroxyisoleucine as the absolute stereochemistry of the β-hydroxyamino acids in all six CLPs (Fig. S34 to S36).
To the best of our knowledge, these are the first CLPs described from the genus Chitinophaga and, after the recently described isopedopeptins (12), the second CLP family from the entire phylum. They contain a high number of non-proteinogenic amino acids (12 of 14 amino acids in CLPs 1 and 2, and 13 of 15 in CLPs 3 to 6). β-Hydroxylations of Asp, Phe, and Ile are the most abundant modifications, and N-methyl Val and Dap are incorporated into the peptide backbone. Furthermore, during LC-MS analysis, a mass shift of 52.908 Da accompanied by an emerging UV maximum at 310 nm, as well as a shift in RT, was observed for compounds 1 to 6. This was traced back to coordination of the compounds to iron impurities during LC analysis, which was confirmed by adding Fe(III)-citrate to the compounds prior to LC-MS analysis (Fig. S37). We postulated that the RT shift is due to a conformational change and altered polarity as a consequence of iron complexation. Iron coordination is a known feature of siderophores, which bacteria produce under iron limitation (77). To investigate the impact of iron on CLP production, the producer strain C. eiseniae was cultured in 3018 medium (for composition, see Materials and Methods) supplemented with different iron concentrations. However, the overall production of CLPs 1 and 2 and their iron complexes was not repressed by increased iron levels, in contrast to the iron-responsive production of classical siderophores (Fig. S38).
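As a plausibility check of the reported 52.908 Da shift, one can compute the mass change expected if a single Fe(III) ion replaces three protons of the peptide. This 1:1 stoichiometry with loss of three protons is an assumption made for illustration; the text does not specify the binding mode. Under that assumption, the expected shift is about 52.911 Da, within about 4 mDa of the observed value.

```python
# Monoisotopic masses in u (56Fe and 1H).
FE56 = 55.934936
H_ATOM = 1.00782503207

def iron_complex_shift(n_h_replaced=3):
    """Mass change when one Fe binds and displaces `n_h_replaced` hydrogens.

    Assumes Fe(III) replaces three acidic protons, so the complex carries the
    same net charge as the free ligand; the mass difference then applies to
    every [M+zH]z+ ion (the m/z shift is this value divided by z).
    """
    return FE56 - n_h_replaced * H_ATOM

print(f"expected mass shift: {iron_complex_shift():.3f} Da")
```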
Chitinopeptins A to D were tested against eight Gram-negative and five Gram-positive bacteria, as well as against three filamentous fungi and C. albicans (Table S4). All tested CLPs were active against M. catarrhalis ATCC 25238 and B. subtilis DSM 10, with MICs down to 2 µg/mL. The tetradecalipodepsipeptides 1 and 2 were active against C. albicans FH2173 at 4 to 8 µg/mL, whereas the pentadecalipodepsipeptides showed weaker activity, with MICs of 16 µg/mL. Screening against filamentous fungi revealed MICs of 16 µg/mL against Z. tritici MUCL45407, while no activity was observed against A. flavus ATCC 9170 and F. oxysporum ATCC 7601. To investigate the impact of iron binding on bioactivity, CLP 1 was tested as a representative also in its iron-complexed form, confirmed by LC-MS analysis (Fig. S37). The iron complex showed reduced, although not completely suppressed, potency compared with the iron-free form (Table S4).
BGCs corresponding to chitinopeptins.
To identify the BGCs encoding chitinopeptin biosynthesis, we scanned the genomes of C. eiseniae (FUWZ01.1) and C. flava (QFFJ01.1) for NRPS-type BGCs matching the structural features of the molecules, taking into account the number and predicted substrate specificity of the A domains, the overall composition of the NRPS assembly line, as well as precursor supply and post-assembly modifications. In C. eiseniae and C. flava, we identified BGCs consistent with the respective CLP structures (Fig. 5A). Furthermore, the positions of all epimerization domains within the detected NRPS genes, which encode the conversion of L- to D-amino acids, agree with the determined stereochemistry of the molecules. These domains are embedded in the NRPS assembly lines in the classical manner. Neither trans-encoded racemases nor C domains catalyzing the epimerization of amino acids were observed, in contrast to, e.g., the BGC of the stechlisins, CLPs produced by Pseudomonas sp. (78).
In our BiG-SCAPE network, these BGCs from C. eiseniae and C. flava were part of one GCF, and manual inspection identified two further related BGCs in the genomes of C. oryziterrae JCM16595 (WRXO01.1) and C. niastensis DSM 24859 (PYAW01.1) (Fig. 5B). Besides the structural NRPS genes, further genes are conserved among all four BGCs; they are predicted to encode an ATP-binding cassette (ABC) transporter, a transcription factor, an S8 family serine peptidase, a thioester reductase domain, and a metallo-β-lactamase fold metallo-hydrolase. The biosynthesis of these CLPs requires β-hydroxylation of the precursors Asp, Phe, and Ile. This structural feature is also present in chloramphenicol, whose biosynthesis was shown to involve the diiron monooxygenase CmlA, which catalyzes substrate hydroxylation by dioxygen activation. CmlA coordinates two metal ions within a His-X-His-X-Asp-His motif and possesses a thioester reductase domain (79). Both features are also present in all four detected BGCs, encoded by two separate genes annotated as thioester reductase domain and metallo-β-lactamase fold metallo-hydrolase (genes 3 and 4) (Fig. 5B and Table S5). Moreover, all four BGCs contain genes with sequence similarity to sbnA and sbnB, encoding enzymes that catalyze the synthesis of the non-proteinogenic amino acid Dap (80, 81) (Table S6). Genes for precursor amino acid supply are a common feature of BGCs for NPs described from the Firmicutes and Actinobacteria phyla (82–87). Interestingly, an sbnB homologue is missing from the BGC of C. eiseniae. To rule out an error during genome sequencing, assembly, and annotation, we amplified the respective section by PCR and confirmed the published genome sequence. However, C. eiseniae carries additional sbnA and sbnB homologues that are not encoded in cis. These are located in other NRPS-type BGCs (FUWZ01000006, location: 555,550 to 557,479 and FUWZ01000004, location: 273,583 to 275,522) and potentially function in trans to compensate for the absence of the gene within the chitinopeptin A and B BGC. In general, we observed that the Dap subcluster sbnA/B is an abundant genetic feature of Bacteroidetes secondary metabolism, present in many BGCs. This feature is not restricted to Chitinophaga but extends to other genera; consequently, Dap is predicted to be a frequent structural feature of Bacteroidetes NPs. Indeed, Dap moieties are present in the known antibacterial Bacteroidetes compounds isopedopeptins (12) and TAN-1057 A to D (43).
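Screening predicted proteins for the His-X-His-X-Asp-His metal-binding motif mentioned above can be sketched with a regular expression, where `.` stands for any residue X. The lookahead also reports overlapping hits. The protein fragment below is hypothetical, not a sequence from the chitinopeptin BGCs, and a motif hit alone does not establish function.

```python
import re

# His-X-His-X-Asp-His; the lookahead lets overlapping occurrences be reported.
HXHXDH = re.compile(r"(?=H.H.DH)")

def find_hxhxdh(protein_seq):
    """Return 1-based start positions of His-X-His-X-Asp-His motifs."""
    return [m.start() + 1 for m in HXHXDH.finditer(protein_seq.upper())]

# Hypothetical protein fragment.
print(find_hxhxdh("MSTLHAHYDHGKE"))
```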
Considering the composition of the structural NRPS genes in terms of A domain number and predicted A domain substrate specificity (88), we propose that C. oryziterrae and C. niastensis have the potential to add further structural variety to the chitinopeptins (Table S7). The NRPS gene of C. niastensis encodes 16 A-domain-containing modules and is thus predicted to produce hexadecapeptides. Furthermore, an initial PKS I module replaces the C-starter domains present in the three other BGCs of this GCF. This points toward the attachment of carboxylic acid residues to the peptide scaffold, resulting in further compound diversification. Because C. oryziterrae was not available for cultivation, we inspected the extracts of C. niastensis for putative further CLPs. Indeed, although detected only in traces, possible products could be identified based on comparable RTs and isotope patterns, with m/z 680.9817 [M+3H]3+ and 685.6545 [M+3H]3+ (Fig. S39).
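To compare such triply charged candidates with the known chitinopeptins, the observed m/z can be converted back to a neutral monoisotopic mass via M = z·(m/z − m_proton). A minimal sketch; it simply inverts the [M+zH]z+ relation and makes no structural assignment.

```python
PROTON = 1.007276466812  # proton mass in u

def neutral_mass(mz, z):
    """Neutral monoisotopic mass M from the m/z of an [M+zH]z+ ion."""
    return z * (mz - PROTON)

def ion_mz(mass, z):
    """m/z of the [M+zH]z+ ion for neutral mass M (inverse of neutral_mass)."""
    return mass / z + PROTON

for mz in (680.9817, 685.6545):
    print(f"m/z {mz} (3+) -> M = {neutral_mass(mz, 3):.4f} Da")
```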