INTRODUCTION
Xylan is the most abundant hemicellulose present in cell walls of higher plants, especially cereal grains and hardwoods (1). The xylan main chain is composed of β-1,4-linked d-xylopyranosyl (d-Xylp) residues that can bear substitutions at O-2 and/or O-3 positions. l-arabinofuranosyl (l-Araf), 4-O-methyl glucuronyl (d-MeGlcAp), and acetyl residues are frequent main-chain substituents, and l-Araf moieties can be esterified by ferulate at their O-5 position. The nature of xylan backbone decorations varies depending on the species, the tissue, and the stage of development of the plant (2). Generally, graminaceous plants are rich in glucuronoarabinoxylan (GAX), while glucuronoxylan (GX) is found in dicots, the difference between these two categories being the relative amounts of l-Araf and d-MeGlcAp present. Complete xylan degradation requires an extensive arsenal of enzymes that can act synergistically (3). The main chain is depolymerized by β-d-xylanases (EC 3.2.1.8) that hydrolyze internal β-1,4 bonds, while decorations are removed by a variety of accessory enzymes, including α-l-arabinofuranosidases (EC 3.2.1.55), α-d-glucuronidases (EC 3.2.1.139), feruloyl esterases (EC 3.1.1.73), and acetyl xylan esterases (EC 3.1.1.72). Finally, β-d-xylosidases (EC 3.2.1.37) break down xylooligosaccharides, removing d-Xylp from the nonreducing end (4).
Xylanases are mainly found in the glycoside hydrolase (GH) families 5, 8, 10, 11, 30, and 43 in the CAZy database (www.cazy.org) (5). The GH10 family constitutes a monospecific family that includes only endo-xylanases. Enzymes from this family perform catalysis via a retaining mechanism (6), and their canonical three-dimensional (3D) structure is a TIM barrel, (β/α)8, which is the most commonly observed protein fold in the Protein Data Bank (PDB) (2,077 occurrences) and which forms an active-site cleft able to accommodate up to seven xylosyl backbone units (7). In addition, according to the Pfam database (http://pfam.xfam.org/), 20 to 30% of β-d-xylanases are multidomain proteins, comprising catalytic domains associated with accessory or helper domains, such as carbohydrate binding modules (CBMs). The latter have been attributed various roles, including the ability to target specific regions in substrates (8), disrupt polysaccharide structure (9), or anchor enzymes to bacterial surfaces (10). In multidomain proteins, individual domains are defined as the structural, functional, or evolutionary units of proteins (11) and can be regarded as biological equivalents of components in complex devices whose parts can be interchanged. In most proteins, domains are organized sequentially, with one domain following another. However, around 10% to 20% of domain combinations are discontinuous, with one domain being inserted into another (12).
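As a rough illustration of the difference between sequential and discontinuous domain organization, the following Python sketch represents each domain as one or more residue ranges and reports which domains lie inside a gap of another domain. The `Domain` class, the helper names, and all residue coordinates are hypothetical and chosen only for illustration; they are not taken from any annotated sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    segments: tuple  # ((start, end), ...) residue ranges, 1-based, inclusive


def is_discontinuous(domain):
    """A domain is discontinuous if it spans more than one sequence segment."""
    return len(domain.segments) > 1


def inserted_domains(host, others):
    """Names of domains lying entirely within a gap between two host segments."""
    segs = sorted(host.segments)
    # Gaps are the residue ranges between consecutive host segments.
    gaps = [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(segs, segs[1:])]
    found = []
    for d in others:
        if any(g0 <= s and e <= g1 for s, e in d.segments for g0, g1 in gaps):
            found.append(d.name)
    return found


# Hypothetical coordinates for an insertional GH10|CBM4-like architecture.
gh10 = Domain("GH10", ((30, 150), (480, 700)))
cbm_1 = Domain("CBM4-1", ((160, 300),))
cbm_2 = Domain("CBM4-2", ((320, 470),))

# Hypothetical sequential architecture: catalytic domain followed by a CBM.
gh10_seq = Domain("GH10", ((30, 350),))
cbm_seq = Domain("CBM", ((370, 500),))

print(is_discontinuous(gh10), inserted_domains(gh10, [cbm_1, cbm_2]))
print(is_discontinuous(gh10_seq), inserted_domains(gh10_seq, [cbm_seq]))
```

In this toy representation, a sequential architecture has one segment per domain and no gaps, whereas an insertional architecture shows up as a host domain with two segments flanking the inserted domains.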
Termites are wood-feeding animals that are considered an abundant source of biomass-degrading enzymes (13). Termites produce very few endogenous lignocellulose-degrading enzymes, and their gut microbiome is mainly responsible for their ability to capture nutrients and energy from plant biomass (14, 15). Over the last decade, numerous metagenomic studies have revealed the enzyme arsenals of termite gut microbiomes and detected promising enzymes for industrial use (16–20). Notably, Gram-negative Bacteroidetes, the dominant phylum in many animal digestive systems (21–25), employ finely tuned glycan utilization systems. The paradigm for this type of system was provided by the well-studied starch utilization system (Sus) (26). In Sus-like systems, several proteins are encoded by genes found in a cluster (known as a polysaccharide utilization locus, or PUL) and act in a coordinated manner to bind and hydrolyze complex sugars and use them for metabolism (27). A xylan utilization system (Xus) that is composed of two outer membrane polysaccharide-binding proteins (XusB and XusD), two transporter proteins (XusA and XusC), and two outer membrane proteins (XusE and Xyn10C) was previously described in rumen and human digestive systems (28). Each of these proteins is expressed from a cluster of tandem genes that are organized as xusA-xusB-xusC-xusD (or sometimes only xusC-xusD), followed by xusE and xyn10C, the latter encoding a CBM-containing GH10 β-d-xylanase (28).
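The gene order described above can be expressed as a simple pattern check. The Python sketch below is a minimal illustration that assumes a locus is given as an ordered list of gene names; the function names and the example loci are hypothetical, and real annotation pipelines would work from predicted protein families rather than gene labels.

```python
# Gene orders described in the text: the full cluster, or the shorter
# variant that starts at xusC.
FULL = ["xusA", "xusB", "xusC", "xusD", "xusE", "xyn10C"]
MINIMAL = ["xusC", "xusD", "xusE", "xyn10C"]


def contains_run(genes, pattern):
    """True if `pattern` occurs as a contiguous run in `genes`."""
    genes = list(genes)
    n = len(pattern)
    return any(genes[i:i + n] == pattern for i in range(len(genes) - n + 1))


def classify_locus(genes):
    """Label a locus as 'full', 'minimal', or 'none' (hypothetical labels)."""
    if contains_run(genes, FULL):
        return "full"
    if contains_run(genes, MINIMAL):
        return "minimal"
    return "none"


# Hypothetical example loci.
print(classify_locus(FULL))
print(classify_locus(["hyp1", "xusC", "xusD", "xusE", "xyn10C"]))
print(classify_locus(["xusC", "xusE", "xusD", "xyn10C"]))
```

The full pattern is tested first because it contains the minimal pattern as a suffix.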
According to previous data, the CBMs in Xyn10C are inserted into the polypeptide sequence of the GH10 catalytic domain between structural elements β3 and α3 of the TIM barrel (29). The expression of Xyn10C was shown to be induced by xylan (30), along with that of the other xylanases, XynA and XynB. The most effective inducer was shown to be a xylooligosaccharide with a degree of polymerization (DP) around 35, similar to the hydrolysates of Xyn10C (31). Altogether, this is consistent with the hypothesis that Xyn10C serves as a functional homologue of the Bacteroides thetaiotaomicron VPI-5482 SusG protein, initiating xylan metabolism through extracellular hydrolysis of polymeric substrates (28). In this regard, it has been proposed that Xyn10C be used as a functional marker of xylan degradation in the human gut (28, 30, 32). The potential roles and distributions of Xyn10C have recently attracted considerable attention but have not yet been fully described (29, 30, 32, 33).
Previously, a putative xus locus assigned to the genus Bacteroides, Gram-negative anaerobic bacteria, was identified in a metagenomic library from the microbiome of a fungus-growing termite, Pseudacanthotermes militaris (17). This xus locus is composed of eight different open reading frames (ORFs) encoding putative XusC/D-like proteins, a protein of unknown function (UNK), a GH10 containing an insertion of two CBM4s (GH10|CBM4), GH115, GH11, a putative transporter protein, GH10, and GH43 (Fig. 1A). The GH10|CBM4 protein, designated P. militaris 25 (Pm25) here, presents an insertional modular structure homologous to that of the Xyn10C protein (Fig. 1B). Here, we describe the characterization of Pm25 and discuss its activity with respect to its unusual multidomain organization. The potential function of UNK was also investigated.
(This research was conducted by H. Wu in partial fulfillment of the requirements for a Ph.D. degree from Toulouse University [34].)
DISCUSSION
Unlike the vast majority of multimodular enzymes, which display a sequential arrangement of their modules, the enzyme described here is characterized by a discontinuous organization that involves the insertion of two CBM domains into one GH10 xylanase domain. In this regard, it is significant that the sequence similarity network (SSN) analysis performed using the amino acid sequence of M6 in place of that of Pm25 located the sequence within the same cluster, even though the CBMs were omitted (data not shown). This suggests that the Pm25 GH10 domain forms part of a distinct group and implies that the intercalated GH10 arrangement is robust from an evolutionary standpoint. Moreover, the biochemical data described here demonstrate that, despite its discontinuous organization, Pm25 is a fully functional xylanase.
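To make the notion of a sequence "located within the same cluster" concrete, the following Python sketch shows the general idea behind a similarity network: sequences are nodes, pairs whose similarity score reaches a cutoff are joined by an edge, and clusters are the connected components. It is only a toy illustration and not the procedure used in this work; the sequence labels, the scores, and the threshold are all hypothetical.

```python
def ssn_clusters(nodes, pair_scores, threshold):
    """Group nodes into connected components of the thresholded score graph."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join every pair whose score passes the cutoff.
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical labels and pairwise scores (arbitrary units).
nodes = ["GH10_other", "M6_like", "Xyn10C_like"]
scores = {
    ("M6_like", "Xyn10C_like"): 80,
    ("M6_like", "GH10_other"): 20,
    ("Xyn10C_like", "GH10_other"): 25,
}
print(ssn_clusters(nodes, scores, threshold=40))
```

With these made-up scores, the catalytic-domain-only sequence still groups with the Xyn10C-like sequence, which mirrors the kind of observation reported for M6.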
The first Pm25 analog was identified in a rumen-based member of the Bacteroidetes phylum (29). More have since been found in human gut bacteria (30, 32, 46), with Pm25 being the first described in termite gut. Several studies have revealed the importance of Pm25-like GH10 enzymes in xylan utilization systems (29, 30, 32, 33). Using SSN analysis, we have shown that Pm25-like xylanases are exclusively linked to Bacteroidetes and are mostly (44 out of 61) adjacent to a susC-susD-(unk) cluster. This evidence of strong conservation is consistent with the fact that, in their native hosts, the genes encoding Pm25 homologs are highly induced or expressed during growth on xylan (30, 46). In addition, our data show that the UNK protein upstream of Pm25 is a xylan-binding protein that strengthens the xylan utilization function of this core cluster (46), suggesting that it is an analogue of SusE. This is also supported by the fact that, like SusE, UNK is predicted by SignalP to have a lipoprotein signal peptide (47). Taken together, these findings lead us to conclude that each component in the core cluster is essential for xylan utilization by members of the Bacteroidetes phylum in the gut ecosystem.
The in vivo function of Pm25 homologs in gut Bacteroidetes has not yet been fully established, although it has been suggested that they are functional homologs of SusG (28). SusG is a cell surface-bound GH13 α-amylase that catalyzes the initial cleavage of polysaccharides (48). In our study, we also predict that Pm25 bears an N-terminal signal peptide that directs it to the cell surface, consistent with a proposal that was previously made for a Pm25 homolog (32). Moreover, SusG displays negligible activity compared to periplasmic α-amylases (48), an observation that is consistent with our findings. Indeed, compared to other xylanases (41, 49), both Pm25 and similar enzymes display quite poor catalytic efficiency toward polysaccharides (33, 50) and oligosaccharides (32). This trend is also observed in other polysaccharide-degrading systems, such as mannan utilization loci from members of the Bacteroidetes phylum (51) and the xylan-degrading system in the Proteobacteria phylum (43). Such low activity most likely reflects the function of these enzymes. SusG-like proteins probably have a carbohydrate surveillance function, while highly active intracellular enzymes are charged with complete oligosaccharide breakdown prior to sugar catabolism. This clever and “selfish” strategy ensures that readily metabolizable sugars are not released into the environment, where they could be used by other bacteria that lack a specific glycan utilization machinery (52).
Remarkably, we found that Pm25 remains active over a broad pH range, maintaining more than 80% of its maximum activity at pH 9.0. This observation correlates well with results obtained for the Pm25 homologs Bacteroides intestinalis Xyn10C (BiXyn10C) and BiXyn10A, which were identified in the human gut microbiome (50). Given that alkaline-stable xylanases are sought after for use in applications such as paper pulp biobleaching, Pm25 might constitute a useful starting point for enzyme engineering aimed at improving its hydrolytic properties.
So far, we have been unable to obtain structural data pertaining to Pm25, and none is available for its closest homologs. Therefore, at this stage it is difficult to speculate on the exact topology and molecular determinants of its active site. Nevertheless, to gain some understanding, we have examined similarities with the family GH10 xylanase Cellvibrio japonicus Xyn10C (CjXyn10C), which displays approximately 30% identity to Pm25 and whose structure is known (PDB entry 1US3). Like Pm25, CjXyn10C exhibits rather poor activity on XOS, ascribed to weak substrate binding in subsite −2 (43). In contrast to most other GH10 enzymes, subsite −2 of CjXyn10C contains G295 in place of the E residue whose side chain can hydrogen bond to the substrate. According to sequence alignment, Pm25 also lacks this key E residue in subsite −2, an observation that might explain its poor ability to hydrolyze X4 (43, 53). Therefore, the −3 subsite, which has a rather strong affinity (2.76 kcal/mol) compared to others (53), is probably involved as a glycone subsite in the degradation of X4, compensating for the poor −2 subsite. Taken together, these observations lead us to propose a hypothetical subsite map of the Pm25 active site with XOS (Fig. 5D).
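To give a feel for the magnitude of a subsite affinity of 2.76 kcal/mol, the Python sketch below converts a binding free energy into the corresponding multiplicative contribution to an association constant, using K = exp(A/RT). It assumes that the affinity is reported as a positive value for favorable binding, as is usual in subsite mapping, and it assumes a temperature of 25 °C, which is not stated in this section; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_to_binding_factor(affinity_kcal_per_mol, temperature_k=298.15):
    """Multiplicative contribution of one subsite to the association constant.

    Assumes positive affinity means favorable binding: K = exp(A / (R * T)).
    """
    return math.exp(affinity_kcal_per_mol / (R_KCAL * temperature_k))


# Value quoted in the text for subsite -3; temperature is an assumption.
factor = affinity_to_binding_factor(2.76)
print(f"Subsite contribution to K at 25 C: about {factor:.0f}-fold")
```

Under these assumptions, an affinity of this size corresponds to a contribution on the order of a hundredfold, which helps explain how a strong −3 subsite could offset weak binding at −2.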
The two CBM4s that are inserted into Pm25 clearly contribute to the binding and degradation of complex biomass. Our results reveal that this is especially true when both CBMs are functional and suggest that binding of large ligands involves a cooperativity phenomenon (Fig. 8). However, based on the PULDB database, the number of CBM domains in Pm25 homologs varies from one to three, and the CBMs belong to different families (CBM4 or CBM22) or are unclassified. This suggests that the SusG-like functions can be fulfilled by enzymes that are not configured in an identical way. Moreover, it confirms that the TIM-barrel fold in the GH10 family is quite accommodating in terms of insertions at the β3/α3 loop.
Apparently, unlike many highly active periplasmic endoglucanases, such as SusA (48) and CjXyn10D (43), extracellular enzymes such as SusG, CjXyn10A, and CjXyn10C are generally appended to CBMs (43). Therefore, it is of interest to discuss the reason for this. CBM58 in SusG (54) and the CBM4s in Pm25 appear to improve the ability of the enzymes to hydrolyze insoluble substrates (Fig. 8), while CBM15 in CjXyn10C does not play an important role in catalysis, irrespective of whether the substrate is soluble or not (43). However, our data suggest that the affinity of Pm25 for soluble substrates is mostly derived from the binding ability of the CBMs (Table 2). In light of this observation, we propose that CBMs in membrane-associated enzymes temporarily retain soluble oligosaccharides before their import into the cell. This implies that the function of the CBM4 domains would be relatively independent of that of the GH10 domain. In this regard, it is noteworthy that the first structure of a SusG protein (54), in which a CBM58 domain is inserted into the B domain of the GH13 α-amylase domain, shows that CBM58 does not form hydrogen bonds with the catalytic domain, an observation that argues in favor of an independent function. Regarding Pm25, evidence for an independent function of the CBM4 domains is provided by the fact that the xylan-degrading profile of the Pm25 wild type was almost identical to that of the CBM-deleted version, M6 (Fig. 7), and the fact that the xylan binding affinity of the CBMs was relatively unaltered when the CBM domains were separated from the GH10 domain (Table 2). Finally, it is also useful to recall that the affinity values determined for subsites −4 and −3 of Pm25 and M6 were nearly identical. Therefore, we believe that the catalytic center of Pm25 and the binding surfaces of the CBM4 domains are disconnected, an organization that corresponds to independent functions and contributes to low enzyme reaction rates (55).
In conclusion, focusing on a termite gut-derived enzyme, we have provided further insight into the properties and function of Xyn10C-like enzymes that form part of core xylan utilization systems. This system appears to be evolutionarily successful, since it is conserved in termite gut, rumen, and human gut. Therefore, the role of the CBM insertion is an interesting question. In this respect, we have characterized the enzyme and shown that the CBM4 domains can be excised without loss of catalytic function. Regarding the enzyme’s substrate specificity, although it is difficult to speculate on the group of polysaccharides that might be preferential substrates in the termite gut environment, we have shown that it is better adapted for the hydrolysis of arabinoxylans than glucuronoxylans, which is consistent with the fact that the host termite feeds on crops such as sugarcane rather than wood.