Recovery and sequencing of in vivo-selected transposon libraries.
In order to recover the Tn library from harvested tissue, ~10⁵ to 10⁶ CFU from lungs and thoracic lymph nodes were plated. Samples from 4 cattle were lost due to fungal contamination; therefore, the samples processed represent 20 cattle. Lung samples were plated from all 20 animals, and thoracic lymph node samples were plated from 6 cattle. Bacteria were grown for 4 to 6 weeks before harvesting for genomic DNA extraction and sequencing (see Table S1 for the assignment of sequencing files). The insertion density of each output library was compared to that of the input library (Fig. S2). Libraries recovered from lung lesions from 20 different cattle contained an average of 14,456 unique mutants, and those recovered from the thoracic lymph nodes contained an average of 16,210 unique mutants. Given that the input library contained 27,419 unique mutants, this represents an ~40 to 50% reduction in insertion density in the output libraries compared with the input. Good coverage of coding sequences (CDSs) was nonetheless maintained, as the output libraries still contained insertions in, on average, 68 to 70% of the open reading frames. Given the loss of diversity in the individual output libraries, we pooled samples from the lungs and, separately, from the lymph nodes. The pooled lung samples had insertions at 33,039 of 66,931 permissive TA sites, and the pooled lymph node samples at 25,072 of 66,931, representing ~50% and ~38% saturation, respectively. Therefore, using this approach, the diversity of the input pool was maintained.
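The saturation and reduction figures above are simple ratios of counts. A minimal Python sketch reproducing them from the numbers in the text (the function names are illustrative and not taken from any Tn-seq package):

```python
TA_SITES = 66_931        # permissive TA sites in the genome (from the text)
INPUT_MUTANTS = 27_419   # unique mutants in the input library (from the text)


def saturation(unique_insertions: int, ta_sites: int = TA_SITES) -> float:
    """Fraction of permissive TA sites carrying at least one insertion."""
    return unique_insertions / ta_sites


def percent_reduction(output_mutants: float, input_mutants: float = INPUT_MUTANTS) -> float:
    """Percentage of input mutants no longer detected in an output library."""
    return 100.0 * (1.0 - output_mutants / input_mutants)


if __name__ == "__main__":
    print(f"input library saturation: {saturation(INPUT_MUTANTS):.1%}")
    print(f"pooled lungs saturation:  {saturation(33_039):.1%}")
    print(f"pooled nodes saturation:  {saturation(25_072):.1%}")
    # The average unique-mutant counts per output library are used here.
    print(f"mean lung library reduction: {percent_reduction(14_456):.0f}%")
    print(f"mean node library reduction: {percent_reduction(16_210):.0f}%")
```

Run as a script, this gives 41.0% saturation for the input library, 49.4% and 37.5% for the pooled lung and node libraries, and reductions of 47% and 41%, matching the ~50%, ~38%, and ~40 to 50% figures quoted above.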
Calculation of the log2 fold change in read counts between the input and output libraries allowed us to measure the impact of each insertion on the survival of mutants in cattle. To determine statistically significant changes in the representation of mutants between the input and output libraries, we analyzed pooled samples from the lungs (20 cattle) and pooled samples from the lymph nodes (6 cattle). However, because cattle are genetically more heterogeneous than standard laboratory mice, we also calculated the log2 fold change between the output and input libraries for each individual sample, in addition to the pooled data set. The entire data set is shown in Table S3, and a volcano plot is shown in Fig. S3.
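To make the per-sample versus pooled distinction concrete, here is a sketch of a depth-normalized log2 fold change for one gene. It assumes simple reads-per-million normalization and a pseudocount of 1; both are illustrative choices, and TRANSIT applies its own normalization methods (such as TTR), so this is not the exact calculation behind Table S3. All counts are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(out_reads: float, out_total: float,
                     in_reads: float, in_total: float,
                     pseudocount: float = 1.0) -> float:
    """log2(output / input) for one gene after depth normalization (assumed RPM)."""
    # Scale to reads per million so libraries sequenced to different depths compare.
    out_rpm = out_reads / out_total * 1e6
    in_rpm = in_reads / in_total * 1e6
    # The pseudocount keeps the ratio finite when a mutant is lost from the output.
    return math.log2((out_rpm + pseudocount) / (in_rpm + pseudocount))


# Hypothetical counts for a single gene: (reads mapped to the gene, total library reads).
INPUT_LIB = (2_000, 10_000_000)
OUTPUT_LIBS = [(100, 8_000_000), (40, 12_000_000), (300, 9_000_000)]

# Per-animal estimates, reported alongside the pooled analysis.
per_sample = [log2_fold_change(r, t, *INPUT_LIB) for r, t in OUTPUT_LIBS]

# Pooled estimate: sum reads across animals before taking the ratio.
pooled = log2_fold_change(sum(r for r, _ in OUTPUT_LIBS),
                          sum(t for _, t in OUTPUT_LIBS), *INPUT_LIB)

print("per-sample:", [round(x, 2) for x in per_sample])
print("mean of per-sample:", round(mean(per_sample), 2))
print("pooled:", round(pooled, 2))
```

Pooling sums reads before the ratio is taken, so a deeply sequenced animal contributes more to the pooled estimate than to the simple mean of per-sample values.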
A comparison of the mean log2 fold change between lung and lymph node samples showed strong correlation (Spearman’s rho, 0.88; P < 2.2e-16) (Fig. S4). TRANSIT resampling was performed to compare the composition of the mutant population in the lungs and thoracic lymph nodes of paired cattle; it was also applied to compare all thoracic lymph node samples with all lung samples. TRANSIT analysis found no statistically significant differences, indicating no detectable difference in mutant composition between the tissue sites.
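Spearman’s rho is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. A standard-library sketch follows; in practice `scipy.stats.spearmanr` is the usual tool, and the per-gene values below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean log2 fold changes for five genes in each tissue.
lung = [-6.8, -2.1, 0.3, -0.4, -4.9]
node = [-7.2, -0.5, 0.1, -0.9, -3.8]
print(round(spearman_rho(lung, node), 3))
```

Because only ranks enter the calculation, rho rewards genes being ordered similarly in both tissues rather than having identical fold changes.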
Using an adjusted P value cutoff of ≤0.05 and a log2 fold change cutoff of −1.5 in either the lungs or the lymph nodes, insertions in 300 genes caused significant attenuation in cattle. Of these genes, 220 had previously been described as required in vivo in M. tuberculosis H37Rv in standard mouse models through whole-genome Tn screens, representing ~73% overlap with the previous literature (8–10). These genes are given in Table S3 (“Significant genes” tab). No insertion mutants were significantly over-represented in the output libraries. Although Mb0025 was over-represented in both lungs and nodes (log2 fold change of 7 to 8 in the pooled analyses), the significance cutoffs were not reached, which may reflect the lower number of TA sites in this gene and the resulting limited statistical power. Mb0025 overlaps with Mb0024 and is the result of a frameshift mutation in the AF2122/97 genome. This mutation is also found in other assembled M. bovis genomes, and we found no evidence that this frameshift is not conserved in global collections (20). Mb0024 and Mb0025 are orthologs of the 5′ and 3′ ends, respectively, of Rv0024, which is annotated as a p60 homologue involved in cell-to-cell spread (21). The functionality of Mb0024 and Mb0025, and the impact of the transposon insertion, are not known.
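The hit-calling rule above can be written as a small filter. This sketch assumes that both criteria must be met within the same tissue and that a hit in either tissue suffices; the gene rows are hypothetical, and the overlap figure simply recomputes 220 of 300.

```python
from dataclasses import dataclass

PADJ_CUTOFF = 0.05   # adjusted P value threshold from the text
LFC_CUTOFF = -1.5    # log2 fold change threshold from the text


@dataclass
class GeneResult:
    gene: str
    lfc_lung: float
    padj_lung: float
    lfc_node: float
    padj_node: float


def passes(lfc: float, padj: float) -> bool:
    """Both criteria must hold within the same tissue (an assumption)."""
    return padj <= PADJ_CUTOFF and lfc <= LFC_CUTOFF


def is_attenuated(r: GeneResult) -> bool:
    # "in either lungs or lymph node": a hit in one tissue is sufficient.
    return passes(r.lfc_lung, r.padj_lung) or passes(r.lfc_node, r.padj_node)


def percent_overlap(n_shared: int, n_hits: int) -> float:
    return 100.0 * n_shared / n_hits


# Hypothetical rows: a clear hit, a strongly depleted gene that misses the P cutoff,
# and a gene with no fitness defect.
rows = [
    GeneResult("geneA", -3.2, 0.001, -2.8, 0.03),
    GeneResult("geneB", -7.0, 0.20, -6.5, 0.31),
    GeneResult("geneC", -0.4, 0.90, 0.2, 0.95),
]
hits = [r.gene for r in rows if is_attenuated(r)]
print(hits)
print(f"{percent_overlap(220, 300):.1f}% of the 300 cattle hits were known from mouse screens")
```

The second row mirrors the situation described for Mb0025 and for esxA/esxB: a large fold change alone does not make a hit when the adjusted P value is out of reach.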
Comparison with mutations known to cause attenuation in the MTBC.
Insertions in the RD1-encoded ESX-1 type VII secretion system, which secretes the virulence factors and immunodominant antigens EsxA (ESAT-6) and EsxB (CFP-10), are expected to cause attenuation (22). The impacts of insertions in this region are summarized in Fig. 3 and are also available in Table S3 (“RD regions” tab) and Fig. S5. Insertions in genes encoding the structural components of the apparatus (eccB1, eccCa1, eccCb1, and eccD1) were significantly attenuating according to the criteria (adjusted P value cutoff of ≤0.05 and a log2 fold change of −1.5). Insertions in eccA1, which also codes for a structural component of the apparatus, were not attenuating despite good insertion saturation in this gene. This is supported by the work of others showing that deletion of eccA1 in Mycobacterium marinum leads to only a partial secretion defect (23). Insertions in the accessory genes espJ, espK, and espH had no impact. The lack of attenuation seen in espK mutants is supported by other studies showing that this gene is dispensable for secretion through the apparatus and is not required for virulence of M. bovis in guinea pigs (24, 25). Insertions in esxA and esxB resulted in severe attenuation (log2 fold change of −6 to −7.5) but did not reach the significance cutoff (adjusted P ≤ 0.05). This is likely due to the small number of TA sites in these genes, which makes it challenging to measure mutant frequency, despite the pooled approach.
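Why few TA sites limit significance can be illustrated with a simplified exact permutation test on per-site counts, assuming one pooled library per condition. This is not TRANSIT’s implementation (its resampling draws many random permutations of normalized counts, across replicates where available); it only shows that the number of distinct relabellings, and therefore the smallest attainable P value, shrinks with the number of sites. Counts are hypothetical.

```python
from itertools import combinations
from math import comb


def exact_permutation_p(input_counts, output_counts):
    """One-sided exact permutation p-value for depletion in the output.

    Every relabelling of the pooled per-TA-site counts into groups of the
    original sizes is enumerated; p is the fraction of relabellings whose
    (output mean - input mean) is at least as negative as the observed one.
    """
    pooled = list(input_counts) + list(output_counts)
    n_out, n_in = len(output_counts), len(input_counts)

    def diff(out_idx):
        out_set = set(out_idx)
        out = [pooled[i] for i in out_idx]
        inp = [pooled[i] for i in range(len(pooled)) if i not in out_set]
        return sum(out) / n_out - sum(inp) / n_in

    observed = sum(output_counts) / n_out - sum(input_counts) / n_in
    splits = list(combinations(range(len(pooled)), n_out))
    hits = sum(1 for s in splits if diff(s) <= observed + 1e-12)
    return hits / len(splits)


def min_attainable_p(sites_per_group: int) -> float:
    """Smallest p the test above can return with this many TA sites per group."""
    return 1 / comb(2 * sites_per_group, sites_per_group)


# Hypothetical gene in which every TA site loses all reads in the output library.
for n_sites in (2, 3, 5):
    inp = [40 + 5 * i for i in range(n_sites)]
    out = [0] * n_sites
    print(n_sites, "TA sites:", round(exact_permutation_p(inp, out), 4),
          "(floor", round(min_attainable_p(n_sites), 4), ")")
```

Even with complete depletion, a gene with three TA sites cannot do better than a raw P of 0.05 under this model, before any multiple-testing adjustment across the genome.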
High levels of attenuation were seen in genes involved in the synthesis of the cell wall virulence lipids PDIMs (ppsABCDE and mas, with log2 fold changes of −7 to −7.5 [Table S3, “Mycolipids” tab]). PDIM synthesis is well known to be required for the survival of M. tuberculosis and M. bovis in mice and guinea pigs (26, 27). MmpL7 is involved in PDIM transport, and there is evidence that it is phosphorylated by the serine-threonine kinase PknD (28). PknD-MmpL7 interactions are thought to be perturbed in M. bovis AF2122/97, as pknD is split into two coding sequences in the bovine pathogen by a frameshift mutation (29). The data presented here suggest that MmpL7 is required in vivo in cattle despite the frameshift mutation in pknD.
Iron restriction is thought to be a mechanism by which the host responds to mycobacterial infection, although some cellular compartments may be more restrictive than others (30). Insertions in many of the genes involved in mycobactin synthesis (Mb2406-Mb2398c, mbtJ-mbtH) were attenuating in cattle (Fig. 4; Table S3, “Mycobactin synthesis” tab). As mycobactin is required for iron acquisition, this confirms that, like other members of the MTBC, M. bovis needs to scavenge iron from the host for survival (10, 16).
The role of cholesterol catabolism in M. tuberculosis is well documented; it is required for both energy generation and manipulation of the immune response (31–33). Cholesterol uptake is mediated by the Mce4 transporter encoded by the mce4 operon Rv3492c-Rv3501c (Mb3522c-Mb3531c) (34, 35). It has been suggested that an alternative cholesterol acquisition pathway operates in M. bovis BCG Danish because, unlike insertions in genes of the downstream catabolic pathway, insertions in the mce4 operon do not cause attenuation in this strain (16). In contrast, our study shows that cholesterol transport via the Mce4 transporter is required in M. bovis AF2122/97 (Fig. 4; Table S3, “mce4 operons” tab). Interestingly, the significance cutoff (adjusted P ≤ 0.05) was reached only in the pooled lymph node samples, but it is difficult to say whether this indicates that Mce4 is required only in the lymph nodes or reflects stochastic effects. The lymph nodes are the site of engagement with the adaptive immune system and the site of persistence for M. tuberculosis in nonhuman primates (36). The requirement for the Mce4 transporter corroborates work in M. tuberculosis showing that Mce4 is required for persistence in mice (8, 34). A comparison of the fitness impact of insertions in genes of the cholesterol catabolic pathway in M. bovis AF2122/97, M. bovis BCG Danish and Pasteur, and M. tuberculosis H37Rv is given in Fig. 5.
Early stages of cholesterol catabolism involve the oxidation of cholesterol to cholestenone, a reaction catalyzed by the 3β-hydroxysteroid dehydrogenase (hsd) encoded by Rv1106c/Mb1136c. Rv1106c is not required for the survival of M. tuberculosis in immunocompetent mice or guinea pigs, which is thought to be due to the availability of other carbon sources in vivo, including glycolytic substrates (8–10, 37). Insertions in Mb1136c in M. bovis AF2122/97 were attenuating (Table S3, “Cholesterol catabolism” tab), which may reflect the inability of M. bovis AF2122/97 to utilize glycolytic substrates due to a disrupted pyruvate kinase (pykA) gene (38, 39). In a recent extended Tn screen utilizing diverse mouse genotypes, Tn insertions in hsd reduced fitness in a small panel of selected genotypes, indicating that there may be a host genetic component to the requirement for cholesterol oxidation by hsd (40). Given the potential use of host cholesterol metabolites, specifically cholestenone, as diagnostic biomarkers, this observation might have applications in the development of diagnostics (41).
Genes that are differentially expressed between Mycobacterium bovis AF2122/97 and Mycobacterium tuberculosis H37Rv.
Several studies have identified key expression differences between M. bovis AF2122/97 and M. tuberculosis H37Rv (29, 42, 43). We examined the data set for insights into the role of differentially expressed genes and transcriptional regulators during infection. One important regulatory system in M. tuberculosis H37Rv is the two-component system PhoPR, and deletions in the phoPR genes alongside fadD26 are the attenuating mutations in the live vaccine MTBVAC (44–46). Our data show that insertions in both phoPR and fadD26 caused attenuation (Fig. 6 and Table S3, “phoPR regulon” and “Mycolipids” tabs). This reinforces the role of this system in virulence despite the presence of a single nucleotide polymorphism (SNP) in the sensor kinase gene phoR that affects signaling through the system in M. bovis AF2122/97 (44). However, care should be taken when using the data set to make inferences about the genetic requirements of field strains. For instance, fadD26 contains nonsynonymous SNPs in global M. bovis collections (20). Signal potentiation via phoR is required for secretion of ESAT-6 through the ESX-1 secretion system, and M. bovis AF2122/97 is known to carry compensatory mutations elsewhere in the genome, e.g., in the espACD operon, that restore ESAT-6 secretion despite the deficient signaling system (44, 47). Our data also show that Tn insertions in espA, espB, and espC (required for ESAT-6 secretion) and in mprA, a transcriptional regulator of that operon (48), caused attenuation, emphasizing the relevance of ESAT-6 as a virulence factor.
Studies comparing expression during in vitro growth of M. bovis AF2122/97 and M. tuberculosis H37Rv show that genes involved in sulfolipid (SL-1) biosynthesis are expressed at lower levels in M. bovis AF2122/97 than in M. tuberculosis H37Rv (29, 42). Interestingly, insertions in genes involved in SL-1 biosynthesis (Mb3850 to Mb3856) were not attenuating in vivo (Table S3, “Mycolipids” tab), reinforcing the limited importance of SL-1 for M. bovis AF2122/97 in vivo, at least at the stages of infection studied here.
One of the most highly attenuating insertions occurred in Mb0222/Rv0216. This gene has been shown to be highly (>10-fold) overexpressed in M. bovis AF2122/97 compared with M. tuberculosis H37Rv, but its physiological function is not currently known. The secreted antigens MPB70 and MPB83, encoded by Mb2900 and Mb2898, respectively, are also overexpressed in M. bovis AF2122/97 and play a role in host-specific immune responses; however, insertions in these genes did not cause attenuation in vivo in our data set (49).
Novel attenuated mutations.
We identified 80 genes required for survival of M. bovis AF2122/97 in cattle that had not previously been described as required in vivo through transposon mutagenesis screens of M. tuberculosis in standard laboratory mouse models (8–10) (see Table S3, “Significant genes” tab). Insertions in some of these genes have been shown to cause attenuation of M. tuberculosis in standard mouse models through the use of single mutants (50–53). While this publication was being written, a large-scale Tn-seq study using over 120 M. tuberculosis libraries and several diverse mouse genotypes (the collaborative cross mouse panel [54]) was reported. This study captured the genes required for survival under a greater variety of host microenvironments than studies performed in standard mouse models (40). It identified a larger set of “adaptive” virulence genes that are required only in a small subset of mouse strains, including genes required in immunodeficient mice. Interestingly, in that study, insertions in hsd were attenuating only in immunodeficient mice. A direct comparison of our data set with the study by Smith et al. revealed a further 15 genes that were required in at least two mouse strains and were hence classified as “adaptive” virulence genes, with specific host genetic components contributing to fitness. This leaves a subset of 65 genes that are required for optimal fitness of M. bovis AF2122/97 during infection of cattle and have not previously been identified as required for survival of M. tuberculosis in any mice by transposon mutagenesis screens.
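The classification in this paragraph is set arithmetic over gene lists. A sketch with hypothetical gene identifiers follows; it assumes the “adaptive” set (genes required in at least two strains of the extended mouse panel) has already been computed, and the final lines recompute the counts given in the text.

```python
def partition_hits(cattle_hits, mouse_invivo, adaptive):
    """Split cattle hits by prior evidence from M. tuberculosis Tn screens."""
    known = cattle_hits & mouse_invivo        # previously required in standard mouse models
    novel = cattle_hits - mouse_invivo        # not found in standard mouse screens
    adaptive_only = novel & adaptive          # required in >=2 strains of the extended panel
    unexplained = novel - adaptive            # no prior Tn-seq support in any mice
    return known, novel, adaptive_only, unexplained


# Hypothetical gene identifiers for illustration only.
cattle = {"g1", "g2", "g3", "g4", "g5"}
mouse = {"g1", "g2", "g9"}
extended_panel = {"g3", "g8"}

known, novel, adaptive_only, unexplained = partition_hits(cattle, mouse, extended_panel)
print(sorted(known), sorted(novel), sorted(adaptive_only), sorted(unexplained))

# The same bookkeeping with the counts reported in the text.
total_hits, known_hits, adaptive_hits = 300, 220, 15
print(total_hits - known_hits, total_hits - known_hits - adaptive_hits)
```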
Genes required for phenolic glycolipid synthesis. Insertions in Mb2971c/Rv2947c (pks15/1) and in Mb2972c/Rv2948c (fadD22) were attenuating in M. bovis AF2122/97 (Fig. 7), but these genes are not required in vivo in M. tuberculosis H37Rv, including in the extended panel of mouse genotypes (8–10, 40). Both pks15/1 and fadD22 are involved in the early stages of phenolic glycolipid (PGL) synthesis and are involved in virulence (55). The requirement for these genes in M. bovis AF2122/97 but not in M. tuberculosis H37Rv is consistent with the fact that Tn-seq studies in M. tuberculosis are often carried out using lineage 4 strains (H37Rv and CDC1551) that harbor a frameshift mutation in the pks15/1 gene, which renders them unable to synthesize PGLs. This removes the requirement for these genes in vivo in lineage 4 strains of M. tuberculosis. The pks15/1 gene has previously been reported to be required for survival of an M. bovis isolate from New Zealand in a guinea pig model of infection (56).
Insertions in genes involved in post-translational modifications such as glycosylation were attenuating in M. bovis AF2122/97, although these genes are not required in vivo in M. tuberculosis H37Rv. Rv1002c is thought to add mannose groups to secreted proteins, and overexpression of this protein in M. smegmatis was recently shown to enhance survival in vivo and inhibit proinflammatory cytokine production (57). The substrates of this protein mannosyltransferase are thought to include several secreted lipoproteins, among them LpqW, which is involved in the insertion of the virulence lipid LAM at the mycobacterial cell surface (57, 58).
Finally, insertions in aspC and glpD2 were attenuating in M. bovis AF2122/97, but these genes are not required in vivo in M. tuberculosis H37Rv. An examination of the in vitro essentiality literature showed that both genes are essential in vitro in M. tuberculosis H37Rv grown on standard media but not in M. bovis AF2122/97 (11, 12, 19, 59). Tn-seq information on aspC and glpD2 is therefore likely to be lacking for M. tuberculosis H37Rv, because Tn mutants in these genes will not be represented in the input pool. The absence of insertion mutants in these genes in the most recent large-scale M. tuberculosis H37Rv Tn-seq study supports this (40).
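The point about input pools can be made explicit: a gene with no insertions in the input library has no measurable in vivo fitness, which is different from having a fitness defect. A minimal sketch, assuming counts are already depth-normalized and using hypothetical gene names:

```python
import math
from typing import Optional


def fitness_or_none(input_reads: float, output_reads: float,
                    pseudocount: float = 1.0) -> Optional[float]:
    """log2 fold change, or None if the gene had no insertions in the input pool."""
    if input_reads == 0:
        # A mutant that never entered the input library cannot be depleted in vivo,
        # so in vivo fitness is undefined (e.g. genes essential for growth in vitro).
        return None
    return math.log2((output_reads + pseudocount) / (input_reads + pseudocount))


# Hypothetical, already depth-normalized counts: (input, output).
genes = {"geneX": (0, 0), "geneY": (127, 7), "geneZ": (50, 48)}
for name, (inp, out) in genes.items():
    lfc = fitness_or_none(inp, out)
    print(name, "not assessable" if lfc is None else round(lfc, 2))
```

Returning None rather than a large negative number keeps in vitro-essential genes from being misread as in vivo attenuated.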
aspC (Mb0344c/Rv0337c) encodes an aspartate aminotransferase involved in the utilization of amino acids (aspartate) as a nitrogen source (60). This provides evidence that M. bovis utilizes aspartate in vivo, as observed in M. tuberculosis (61). glpD2 (Mb3303c/Rv3302c) encodes a membrane-bound glycerol-3-phosphate dehydrogenase. In Escherichia coli, the membrane-bound glycerol-3-phosphate dehydrogenase GlpD is an essential enzyme functioning at the central junction of respiration, glycolysis, and phospholipid biosynthesis; it catalyzes the oxidation of glycerol-3-phosphate to dihydroxyacetone phosphate (DHAP), donating electrons to the electron transport chain (62). The essentiality of glpD2 in vitro in M. tuberculosis H37Rv might be explained by the use of glycerol during in vitro growth of this species. The contribution of the membrane-bound GlpD2 to the donation of electrons to the electron transport chain has been suggested but not yet explored in the MTBC (63). Given the interest in the electron transport chain as a chemotherapeutic target in M. tuberculosis, the data presented here suggest that inhibition of GlpD2 might be a fruitful approach in the development of new drugs for the treatment of TB in humans (64). The role of this gene in M. bovis AF2122/97 in vivo is perhaps surprising given the disruptions in glycerol-phosphate uptake and in the pathways that phosphorylate glycerol in M. bovis AF2122/97 (65). However, M. tuberculosis is thought to catabolize membrane-derived glycerophospholipids, which may provide an alternative source of glycerol-3-phosphate in members of the complex (66).
In this study, we have identified the genes required for survival of M. bovis AF2122/97 in cattle. The data set correlates well with preexisting knowledge. However, in addition to known requirements, we have uncovered novel virulence factors that had not previously been described in members of the complex. In this way, we both corroborate and expand current knowledge of tuberculosis.