Computing gene expression profiles of the host and the microbiome from dual RNA-seq data
First, we extracted high-quality RNA (RNA integrity number [RIN] scores ≥ 8.6; Table S1) from the intestinal sites (cecum, transverse colon, and rectum) of five common marmosets (Supplemental Note Section 1) and carried out dual RNA-seq with NovaSeq 6000 v1.5 (100 bp paired-end). The obtained RNA read data amounted to a total of 4,724 M reads (mean 157,450,033 paired-end reads/sample; Table S2; Supplemental Note Section 2).
Next, as the RNA read data obtained by dual RNA-seq contains a mixture of reads derived from both the host and the microbiome, it is necessary to separate these read data
in silico (Table S3; Supplemental Note Section 3). The pipeline established in this study separates the host-derived read data and the microbiome-derived read data by aligning the RNA reads to the host reference genome (
Fig. 1B). To avoid transcriptomic cross-contamination of read data between host and microbiome read data, we adjusted the pipeline parameters using simulated RNA read data from the host and the microbiome (Supplemental Note Section 4). As a result, we achieved a precision of 1.0 in separating the read data of the host and the microbiome, and our established pipeline completely separated the reads derived from the host and those from the microbiome (Table S9 posted at
https://doi.org/10.5281/zenodo.10425960).
After the above preprocessing, we calculated the gene expression levels (transcripts per million [TPM]) for both the host and the microbiome. As a result, we detected 16,578 host-expressed genes and 8,533 microbiome-expressed genes in total (Supplemental Note Sections 5 and 6). To examine the expression patterns across different sites and across different individuals for both host genes and microbiome genes, we applied principal component analysis (PCA) to the gene expression profiles for visualization. There is a clear separation between the host gene expression profiles for each intestinal site, and the host samples were not grouped based on individuals (
Fig. 2A and B). The microbiome profiles at the cecum and rectum, which are the beginning and end of the colon, respectively, were separated by PC1, with samples from the transverse colon, situated between the cecum and rectum, spanning across both sites (
Fig. 2C). For the microbiome profiles as well as for the host ones, samples were not separated based on individuals (
Fig. 2D). These results suggest that both host and microbiome gene expression profiles are more variable across different sites compared with those across individuals.
Spatial variations in host gene expression among intestinal sites
Of the 16,578 host genes detected in three regions of the common marmoset’s intestinal tract—cecum, transverse colon, and rectum—a total of 2,615 genes showed significant expression variations among these sites (FDR < 0.01) (Table S4; Supplemental Note Section 5). Furthermore, upon conducting functional pathway analysis, 40 pathways were identified, all of which were significantly upregulated in the cecum compared with the transverse colon (normalized enrichment score > 1.5, FDR < 0.1; Table S10 posted at
https://doi.org/10.5281/zenodo.10425960). In particular, pathways with high normalized enrichment scores, such as the Fc epsilon RI signaling pathway, T cell receptor signaling pathway, leukocyte transendothelial migration, Yersinia infection, and B cell receptor signaling pathway, were detected, indicating that pathways related to immune functions were upregulated in the cecum relative to the transverse colon (
Fig. 3A). Additionally, a substantial number of genes involved in these functional pathways were found to be significantly upregulated in the cecum (FDR < 0.01). For example, among the genes specifically upregulated in the cecum, we found genes related to B cell receptor signaling. These include
CD19, which acts as a coreceptor for the B cell antigen receptor complex and augments B cell activation (
22);
CD79A and
CD79B, components of the B cell antigen receptor that generate signals upon antigen recognition (
23); and
CD22 and
CD72, which inhibit the signal transduction of the B cell antigen receptor (
24,
25). In addition, genes associated with T cell receptor signaling were also found as being highly expressed specifically in the cecum. These include
CD28, a co-stimulatory receptor for T cell receptors that activates T cells (
26);
CD3D, a component of CD3 that plays a role in signal transduction during T cell activation (
27);
CD40LG, which is induced on the T cell surface following T cell receptor (TCR) activation (
28); and
CTLA4, which attenuates T cell activation and fine tunes immune responses (
29) (
Fig. 3B).
Interpreting host-microbiome interactions through spatial gene co-expression network analysis within the intestine
We constructed the unified gene co-expression network composed of 1,359 gene nodes, which embodies co-expression information on host-host, host-microbiome, and microbiome-microbiome interactions (Tables S11 to S13 posted at
https://doi.org/10.5281/zenodo.10425960; Supplemental Note Section 7). Subsequently, to identify densely co-expressed gene sets within this network, we implemented the clique-based module extraction algorithm proposed in this study. As a result, we then identified 27 gene modules containing both host and microbiome genes (Tables S14 and S15 posted at
https://doi.org/10.5281/zenodo.10425960). Furthermore, upon conducting a functional pathway enrichment analysis on these gene modules, we found that 70.4% (19 out of 27) of gene modules could be functionally characterized (Supplemental Note Section 8).
The modules we identified were characterized by functional pathways including the B cell receptor signaling pathway, fructose and mannose metabolism, oxidative phosphorylation, and biosynthesis of amino acids (
Fig. 4). In addition, we provided visualizations of the bacterial species that contribute to microbiome gene expression within these identified modules, together with host and microbiome genes (
Fig. 5 through 8).
Module_07, as shown in
Fig. 5, was identified as a module with a positive correlation between the host gene fructose-1,6-bisphosphatase 1 (
FBP1), which is highly expressed in the transverse colon, and 40 microbiome genes. These 40 gut microbiome genes included those associated with fructose-mannose metabolism, such as K01840 (
manB), K00850 (
pfkA), and K01818 (
fucI), as well as those related to oxidative phosphorylation, such as K00241 (
sdhC,
frdC), K00426 (
cydB), K02112 (
atpD), and K02121 (
ntpE,
atpE). The most substantial contributor to the expression of these microbiome genes was
Bacteroides vulgatus.
FBP1 is known to have reduced expression in various tumor tissues, including the esophagus (
30), liver (
31), lung (
32), and ovarian (
33) cancer. It has been confirmed that its expression increases with fructose treatment in intestinal organoids (
34). Previous studies have identified
FBP1 as a differentially expressed gene in the small intestine of germ-free mice, suggesting that the intestinal microbiome influences the expression of the host’s
FBP1 gene (
35), but the bacterial species and genes affecting the expression of the
FBP1 gene have not been identified.
Bacteroides vulgatus, which contributed to the expression of fructose-mannose metabolism-related microbiome genes observed in Module_07, has been reported to release fructose by decomposing polysaccharides (
36,
37). In this module, the gene K00850 (
pfkA) involved in the fructo-oligosaccharide degradation and the gene K01818 (
fucI) involved in the 2′-fucosyllactose degradation were detected, revealing a novel association between polysaccharide degradation genes encoded by
Bacteroides vulgatus and the host gene
FBP1.
In Module_01, the B cell receptor signaling pathway was enriched (
Fig. 6). This module featured covariation between host B cell-specific genes, including
CD19,
CD22,
CD79B, and
PTPN6, which were highly expressed in the cecum, and the tryptophan synthesis gene K01696 (
trpB) from the microbiome. The upregulation of these host B cell-specific genes indicates an increase in B cells within the cecum. This tryptophan synthesis gene K01696 (
trpB) expression was contributed to by
Parabacteroides distasonis. Tryptophan is an essential amino acid that, upon uptake by intestinal cells and subsequent metabolism, can activate the Aryl hydrocarbon receptor and influence B cell immune responses (
38). Tryptophan is also known to ameliorate autoimmune diseases, such as inflammatory bowel disease (
39).
Parabacteroides distasonis has been suggested to be involved in various autoimmune conditions in several studies (
40,
41). For instance,
Parabacteroides distasonis levels have been shown to be lower in multiple sclerosis (MS) patients compared with healthy controls, and transplanting the intestinal microbiota from MS patients to germ-free mice resulted in increased severity of autoimmune encephalomyelitis symptoms (
40). While previous studies have reported associations between tryptophan and autoimmune diseases via B cell immune responses, as well as between autoimmune diseases and
Parabacteroides distasonis, this study identifies a correlation between the tryptophan synthesis gene K01696 (
trpB) from
Parabacteroides distasonis and B cell-specific genes (
CD19,
CD22,
CD79B, and
PTPN6).
In Module_21, a group consisting of three host genes and 40 microbiome genes was identified. The three host genes included
MMP10 and two uncharacterized genes LOC103795407 and
CCDC162P, all of which showed high expression in the cecum (
Fig. 7). The microbiome genes that showed a positive correlation with these genes included K01491 (
folD), K00605 (
gcvT), K01803 (
tpiA), K00831 (
serC), K15633 (
gpmI), and K00024 (
mdh), all of which are involved in carbon metabolism pathways.
Bacteroides vulgatus was identified as the bacteria contributing to the expression of these microbiome genes. The
MMP10 gene plays a crucial role in maintaining epithelial barrier function and is necessary for resolving colonic epithelial damage (
42). A study examining the host transcriptome and microbiota composition of colorectal cancer tumors found high expression of
MMP10 in tumor tissues and a significant increase in the genus
Bacteroides, indicating a link between the host gene
MMP10 and the genus
Bacteroides (
43). These findings are consistent with our observations. Furthermore, in the identified Module_21, not only the relationship between the host gene
MMP10 and
Bacteroides vulgatus but also a group of microbiome genes related to carbon metabolism pathways showing a positive correlation with the host gene
MMP10 was identified. One hypothesis explaining the observed positive correlation between carbon metabolism-related genes encoded by
Bacteroides vulgatus and the host gene
MMP10 is that the microbiome may form colonies within the host’s mucosal layer. Mucin, the principal component of mucus secreted by the gastrointestinal mucosal epithelium, contains galactose and promotes bacterial settlement and adhesion (
44), serving as a nutrient source for certain bacteria.
Bacteroides vulgatus possesses mucosal adherence factors, allowing access to host-derived carbon sources (
45). The upregulation of
MMP10 in the cecum might represent a protective response against colonization by
Bacteroides vulgatus. The detection of carbon metabolism-related genes of
Bacteroides vulgatus in module_21 suggests they might be utilizing carbon sources derived from mucin or dietary glucose. Specifically, while
Bacteroides vulgatus cannot degrade human mucin independently, it can acquire carbon sources from mucin in co-culture with
Akkermansia muciniphila (
46), which possesses genes detected in this module. When
Akkermansia muciniphila converts galactose to glucose,
Bacteroides vulgatus might then activate these carbon metabolism-related genes to utilize this glucose (Fig. S1). Our findings indicate a relationship between the carbon metabolism-related genes of
Bacteroides vulgatus and the host gene
MMP10; however, further experiments are required to ascertain whether the upregulation of
MMP10 serves a protective role against colonization by
Bacteroides vulgatus and
Akkermansia muciniphila or if
Bacteroides vulgatus utilizes mucin as a carbon source.
In Module_18, 3 host genes and 54 microbiome genes were identified, and the expression of these microbiome genes was further influenced by five bacterial species. The relationships between these host genes and microbiome genes, as well as between the host genes and bacterial species, were consistent with existing knowledge about bile acid synthesis and degradation (
Fig. 8). This module included the host gene
CYP27A1, which is involved in bile acid production (
47), and
NR1H3, a host gene that codes for a bile acid receptor (
48). These genes were upregulated in the cecum and transverse colon compared with the rectum. The microbiome genes interacting with these host genes included K03664 (
smpB), a gene essential for bacterial survival when upregulated in the presence of secondary bile acid deoxycholic acid, K02835 (
prfA), which positively regulates the expression of a gene coding for bile acid hydrolase, and K03527 (
ispH,
lytB), a gene involved in bile acid sensitivity. Additionally,
Bacteroides vulgatus contributed to the expression of all these microbiome genes, which aligns with previous studies showing an increase in
Bacteroides spp. correlating with the expression of the host gene
NR1H3 (
49).