INTRODUCTION
Bacteriophages (or phages) infecting the pathogenic bacterium
Clostridioides difficile have received increasing attention lately due to their potential as alternative therapeutic agents and also because of their possible contribution to the biology and virulence of their host (
1,
2).
C. difficile still represents a major threat to human health, and the epidemiology of this pathogen is a topic of great clinical importance (
3). Indeed, there is considerable genetic and phenotypic diversity among isolates, and how this diversity affects virulence or clinical outcome is unclear (
4). Prophages, i.e., integrated phages (viruses), contribute to the virulence of many bacterial pathogens and represent an important source of genetic diversity among strains of the same species (
5). Previous studies in
C. difficile suggest a great diversity of inducible prophages (
6–9), some of which likely play a role in the biology of the host (
10). For example, phages phiCD119 and phiCD38-2 were shown to repress and induce toxin production, respectively (
11,
12). The genomes of phiCDHM1 and some related phages carry genes that are predicted to participate in quorum sensing (
13). The phiSemix9 phage genome was recently shown to carry a complete and functional binary toxin locus (CdtLoc) (
14). There are currently 2,500+ complete or draft
C. difficile genomes available in public repositories (
15,
16). Analyzing all of them for the presence of prophages could provide very useful data to better understand the diversity and epidemiology of
C. difficile and the potential contribution of phages to the biology and virulence of this pathogen.
A little more than 20
C. difficile phages have been fully characterized and their genomes sequenced (
17–19). One general observation is that all
C. difficile phages are temperate members of either the
Myoviridae or the
Siphoviridae family of the order
Caudovirales, i.e., phages with contractile or noncontractile tails, respectively (
17,
20), and most genomes are ∼30 to 55 kbp in size. The
Myoviridae phage genomes characterized so far generally share significant DNA homology and tend to form phylogenetically related clusters. On the contrary, a limited number of
Siphoviridae phages have been described and sequenced (e.g., phiCD38-2, phiCD111, phiCD146, phiCD6356, and phiCD24-1 [
9,
19,
21]), and they are more distantly related to each other genetically (
22).
We had previously isolated and sequenced the genome of a large 131-kbp temperate phage that we called phiCD211 (accession no.
NC_029048.2) (
19). Almost concomitantly, Wittmann and colleagues reported a large phage genome of 131 kbp, called phiCDIF1296T (accession no.
CP011970), that they identified as an episome while sequencing the genome of
C. difficile strain DSM1296
T (also known as ATCC 9689) (
23). The authors reported neither the functionality of phiCDIF1296T nor the isolation and observation of phage particles, but it turned out that phiCDIF1296T and phiCD211 are the same phage, found in the same original strain (DSM1296
T = ATCC 9689).
In the present study, we demonstrate that phiCD211/phiCDIF1296T is a functional phage and further confirm by electron microscopy observations of viral particles that it is a member of the
Siphoviridae family of phages. We used PROKKA (
24) with the curated PHASTER database (
25) to improve the previous genome annotation. We also used the recently published PhageTerm software to determine the nature of the phage genome's termini (
26), which had not been determined previously. In addition, we used Bowtie2 (
27) to screen raw sequencing reads from 2,584
C. difficile genomes available in public repositories to determine the prevalence of phiCD211/phiCDIF1296T and related prophages. Finally, whole-prophage comparisons revealed the extreme genetic diversity and genome plasticity of phiCD211/phiCDIF1296T-like phages.
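As a rough illustration of how read mapping can be turned into a prevalence estimate, the following Python sketch assumes that per-position read depths over the phiCD211 reference have already been extracted from the Bowtie2 alignments of each genome (for instance with a tool such as samtools depth). The function names, the 50% breadth cutoff, and the toy depth values are hypothetical; they are not taken from the study, whose actual detection criteria are not stated in this section.

```python
from typing import Iterable, Mapping


def breadth_of_coverage(depths: Iterable[int], min_depth: int = 1) -> float:
    """Return the fraction of reference positions covered by >= min_depth reads."""
    depth_list = list(depths)
    if not depth_list:
        return 0.0
    covered = sum(1 for d in depth_list if d >= min_depth)
    return covered / len(depth_list)


def prevalence(
    per_genome_depths: Mapping[str, Iterable[int]],
    min_breadth: float = 0.5,  # hypothetical cutoff, not from the study
    min_depth: int = 1,        # hypothetical cutoff, not from the study
) -> float:
    """Return the fraction of genomes whose reads cover enough of the reference."""
    if not per_genome_depths:
        return 0.0
    positives = [
        name
        for name, depths in per_genome_depths.items()
        if breadth_of_coverage(depths, min_depth) >= min_breadth
    ]
    return len(positives) / len(per_genome_depths)


# Toy per-position depths for three hypothetical genomes (4-bp "reference").
toy_depths = {
    "genome_A": [3, 2, 0, 5],  # 3 of 4 positions covered
    "genome_B": [0, 0, 0, 1],  # 1 of 4 positions covered
    "genome_C": [1, 1, 1, 1],  # all positions covered
}
print(prevalence(toy_depths))
```

A breadth-based criterion is only one possible choice. A stricter screen could also require a minimum mean depth, and partial coverage may indicate a related rather than identical prophage.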
DISCUSSION
The morphology of phiCD211 had never been described before, although we had released the genome sequence of this phage in 2014 and cited the sequence in 2015 (
19). In addition, the functionality of phiCDIF1296T had never been demonstrated, and only the genome sequence was published (
23). With its large capsid measuring 76 nm in diameter and long tail of 449 nm in length, phiCD211 (phiCDIF1296T) represents the largest phage known to infect
C. difficile, since all previously characterized phages have capsids with a diameter of ∼50 nm and tails ∼100 to 350 nm in length (
6,
7,
9). Of interest, phiCD211 is morphologically similar to the
Lactococcus lactis phage 949, the largest
Siphoviridae phage known to infect this species, which has a capsid of 79 ± 7 nm in diameter and a tail of 500 ± 27 nm in length (
33). Accordingly, some structural proteins from phiCD211 share sequence homology with structural proteins from phage 949 and other large clostridial phages (e.g., the
Clostridium botulinum phage c-st). Only a few other phages from different bacterial species have similarly long tails, such as the
Lactobacillus plantarum Siphoviridae phage B2 that has a 500-nm tail (
43), the
Bacillus cereus Myoviridae phage 11 that has a 485-nm tail (
44), or the
Bacillus megaterium Myoviridae phage G, with a tail of 455 nm in length (
45).
At 131 kbp, the phiCD211 genome is also the largest phage genome identified in
C. difficile, although our PHASTER analyses revealed the presence of even larger phiCD211-like prophages in certain isolates (e.g., ERR339803 at ∼170 kbp). However, the functionality of these larger phages needs to be determined experimentally. Of note, we could detect distant homology with certain phages only when searching the PHASTER database, which comprises phage sequences exclusively; BLASTp searches against larger nonredundant protein databases failed to detect some of these hits. Since phage genomes are often organized similarly (
46), the function of certain distantly related proteins with limited homology, such as structural proteins, can be predicted from the genomic location of the corresponding hit. Hence, we suggest using the PHASTER database for future phage genome annotation, which should reduce the number of hits corresponding to proteins of unknown function.
Unfortunately, we were unable to find a suitable host to propagate phiCD211. This was not surprising considering the rather narrow host spectrum observed so far with
C. difficile phages (
9,
18,
47,
48). This may be due to the presence of wide-spectrum antiphage systems (
49), including a functional CRISPR-Cas system (
19,
50). Host range also likely depends on the presence of a suitable host receptor. Finding a susceptible host will be important for testing the influence of the prophage on bacterial phenotypes. Alternatively, curing the prophage from the host could be possible, but this might be more complex in
C. difficile than in other bacteria, such as
Escherichia coli, for which genetic tools are more accessible (
51).
The genome of phiCD211 carries interesting genes that could influence drug resistance, like ORF8, encoding a putative AcrB/AcrD/AcrF protein. Members of this protein family generally function as multidrug resistance transporters, and their inactivation in
E. coli was shown to affect biofilm formation and increase antimicrobial susceptibility (
37). Also of interest, ORF46 encodes a putative EzrA septation ring formation regulator. During cell division and cytokinesis, polymerization of the cytoskeleton protein FtsZ is required to allow Z-ring formation that will eventually lead to proper septation at mid-cell. EzrA is conserved in low G+C Gram-positive bacteria and has been shown to bind to FtsZ to control its polymerization
in vitro and
in vivo. A 2-fold overexpression of EzrA resulted in a filamentous cell phenotype in
B. subtilis (
52). Interestingly, the ATCC 9689 strain that we used in our study (CD211) forms filamentous cells in culture and sporulates very poorly (
39), which is consistent with the hypothesis that the phiCD211-encoded EzrA homolog could be responsible for this phenotype.
A gene encoding a putative YyaC-like spore germination protease was also identified in phiCD211 (ORF68). Spore germination is an important step in the
C. difficile life cycle, since it is required for vegetative growth, colonization, toxin production, and, thus, initiation of
C. difficile infection (
53). Small acid-soluble proteins (SASP) protect DNA within the spore and need to be removed during germination. In
Clostridium acetobutylicum, a YyaC homolog has been shown to have proteolytic activity on SASP (
38). Hence, ORF68 in phiCD211 possibly influences spore germination.
Finally, we identified a putative CRISPR-associated Cas3-HD endonuclease protein (ORF154) located close to two short CRISPR arrays. In type I CRISPR systems, such as in
E. coli, the Cas3 nuclease acts in combination with the Cascade complex to nick, unwind, and degrade the target DNA (
54). Cas3 homologs have been identified in the type I-B CRISPR system of
C. difficile, but the function of this protein has not been demonstrated experimentally (
19). It will be interesting to study the functionality of this phage-encoded Cas3 protein as well as the associated CRISPR arrays. CRISPR systems are generally encoded on the chromosome of the host, but some have been identified in phage genomes (
55,
56). A number of
C. difficile phage genomes carry CRISPR arrays (without
cas genes), and here we present an example of a phage-encoded
cas gene in
C. difficile (
19,
50). Whether it is active and plays some role in CRISPR-mediated phage interference will need to be investigated.
Whole-phage genomic comparisons revealed pervasive gene rearrangements in phiCD211-like prophages. Several gene insertions and deletions could be identified, as well as large DNA inversions. The presence of several putative transposases and integrases suggests that this large phage acquired additional genes through recombination with other phages and plasmids. Acquisition of large transposons that integrated into the prophage is also very likely. The presence of possible recombination hotspots probably further promoted genome evolution. This likely explains the chimeric nature and apparent genome plasticity of phiCD211-like prophages.
In conclusion, phiCD211 and phiCD211-like prophages are highly prevalent in C. difficile and might contribute to important phenotypes, including sporulation, germination, antimicrobial susceptibility, and resistance to invading DNA, such as phages. It will be interesting to perform functional assays to validate some of the predictions we made using bioinformatic approaches.