INTRODUCTION
Competition for survival in nature drives organisms to continuously adapt and evolve, leading to the evolution of species over time (
1,
2). A constant battle between prokaryotes and parasitic mobile genetic elements (MGEs), most notably viruses, provides a vivid illustration of this principle. To avoid extermination by viruses, which are estimated to outnumber their prokaryotic hosts by an order of magnitude (
3), cells have evolved numerous defense strategies. To avoid extinction, phages have evolved countermeasures to overcome specific defenses of their hosts. Prokaryotic adaptive clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated) immunity systems have received much attention due to their unique mechanism of action and significance for biotechnology and biomedicine. These RNA-guided defenses consist of CRISPR arrays and associated
cas genes. During CRISPR adaptation, the hosts integrate short sequences derived from infectious agents' genomes as spacers into the CRISPR arrays. During CRISPR interference, the Cas proteins guided by short CRISPR RNAs (crRNAs) transcribed from the array recognize and eliminate invading pathogen genomes with sequences complementary to crRNA spacers (
4–6).
One way that bacteriophages and other MGEs can evade CRISPR-Cas immunity is by modifying or removing targeted DNA sequences from their genomes (
7–9). However, this strategy has limitations, particularly when CRISPR-Cas targets essential regions. Another strategy is to avoid recognition by CRISPR-Cas (and other DNA-targeting host defenses, such as restriction-modification systems) by extensively modifying the invader’s DNA or creating excluded compartments in infected cells that make invader DNA inaccessible to host defense systems (
8,
10,
11). Yet another common strategy relies on anti-CRISPR proteins (Acrs) that are encoded by MGEs, and inhibit CRISPR-Cas immunity by diverse mechanisms (
12).
The number of identified and experimentally characterized Acrs is steadily growing (
13), and an online list of them is continuously updated (
tinyurl.com/anti-crispr). Known Acrs inhibit CRISPR interference by preventing target binding, target cleavage, or crRNA interaction with Cas interference proteins (
14). Most Acrs are small proteins, with many having a highly negative overall charge and, therefore, likely acting as DNA mimics (
15–18).
Within phage genomes,
acr genes are often paired with anti-CRISPR-associated (
aca) genes. The Aca proteins are transcription factors that contain a DNA-binding helix-turn-helix (HTH) domain and regulate
acr transcription (
19). Genes coding for small proteins with the AP-2 DNA-binding domain are frequently observed in
acr loci as well (
20). While the diversity of Acrs poses a significant challenge for their identification by means of bioinformatics (
21), the "guilt-by-association" approach involving analysis of sequences flanking
aca-like genes has met with considerable success (
19,
22). Another strategy involves the analysis of prokaryotic genomes containing CRISPR arrays with spacers that match sequences in the host’s own genome. In these cases, self-immunity is often prevented by Acrs encoded in prophages (
19).
Interest in discovering new Acr proteins is driven by their potential applications, including the development of phage therapy for pathogenic bacteria (
14). A virulent phage that can efficiently overcome host CRISPR-Cas defense by employing Acrs would be a preferred candidate for therapeutic application. Phage therapy is considered a promising alternative to antimicrobial treatments against the widespread anaerobic spore-forming bacterium
Clostridioides difficile, which poses a significant threat to human health all over the world (
23–25). The type I-B CRISPR-Cas system of
C. difficile is highly active and limits infection by phages (
24,
26–28). Aside from
in silico predictions, no anti-CRISPR proteins targeting type I-B CRISPR-Cas systems have been characterized yet (
29).
In this paper, we report the discovery of a new Acr protein that inhibits interference by the C. difficile CRISPR-Cas system. This protein, which we name AcrIB2, is encoded by the temperate C. difficile phage φCD38-2. Sequence analysis suggests that proteins similar to AcrIB2 are common in clostridial phages. Most C. difficile strains encode two sets of type I-B cas genes. We show that the products of one cas gene set play no role in CRISPR interference, at least in laboratory settings. It follows that AcrIB2 targets CRISPR interference provided by proteins encoded by the remaining, active cas gene set. Counterintuitively, the operon encoding the set of cas genes functional in interference is incomplete: it lacks the genes required for CRISPR adaptation. In contrast, the operon encoding the seemingly non-functional interference genes also encodes the adaptation genes. These findings hint at a potential functional specialization between the duplicated cas operons of C. difficile, the nature of which remains to be determined.
DISCUSSION
Anti-CRISPR proteins have evolved in response to the co-evolutionary arms race between prokaryotes and their viruses. These proteins exhibit a wide range of structural and functional diversity, and only a small fraction of them have been identified and functionally characterized to date (
37). The discovery of Acr proteins has a wide range of applications, including phage therapy of pathogenic bacteria, where Acrs can inhibit the CRISPR-Cas system of the host, thus increasing the ability of the phage to clear the infection (
14).
The
C. difficile CRISPR-Cas system provides a potent defense against MGEs and presumably contributes to the pathogen’s survival in the phage-rich microbiome of the colon. Multiple spacers targeting phage genomes infecting
C. difficile have been identified (
26). All currently identified phages of
C. difficile are temperate and are capable of either inserting their genetic material into the bacterial genome or existing as episomes (
24). Therefore, it is likely that
C. difficile phages evolved anti-CRISPR mechanisms to protect themselves from CRISPR targeting while in the lysogenic state. However, no such mechanisms have been defined.
In this work, we describe bioinformatic identification followed by experimental validation of the first anti-CRISPR protein that inhibits the type I-B CRISPR-Cas system of
C. difficile. The putative clostridial Acrs identified in several
C. difficile phages are similar to each other but share no identifiable sequence similarity to known Acrs. The validated
acr gene of
C. difficile phage φCD38-2 (
acrIB2), together with two unknown-function genes upstream, is located immediately downstream of a long cluster of capsid, DNA packaging, tail, and lysis protein genes and is transcribed in the same direction (
38). Immediately downstream from
acrIB2 is a putative lysogenic conversion region that is transcribed in the opposite direction. In a stable lysogen containing the φCD38-2 episome, the
acrIB2 gene along with other upstream genes (
Fig. 1B) is highly transcribed (
38). The generally conserved location of the
acrIB2-like genes may be due to the necessity to control anti-CRISPR gene expression, synchronizing it with the infection process. Phages that encode
acrIB2 homologs belong to different morphological classes (siphoviruses and myoviruses) and likely rely on different developmental strategies. While some phages encode the AP2 domain protein that was used in the search, others, including φCD38-2, which encodes the validated AcrIB2 protein, do not (
Fig. 1B). Some of the unknown-function genes that are adjacent to
acrIB2 gene homologs in these phages may encode novel Aca proteins. Interestingly, the majority of phages possess a highly conserved gene of an unknown function downstream from
acrIB2 homologous genes. Of particular interest is phage φCD211 (
39). Its genome is much larger than the genomes of other phages encoding AcrIB2 homologs. In the immediate neighborhood of its
acrIB2-like gene, there is an open reading frame coding for a short C-terminal fragment of a Cas3-like protein and a 4-spacer CRISPR array targeting some known
C. difficile phages (
39). It is possible that this locus is used in inter-phage warfare, as are other prophage-located and prophage-targeting CRISPR arrays in several
C. difficile strains (
26,
40).
Our top hits for the AP2 domain protein-encoding gene in
C. difficile phages are neighbored by genes encoding split AcrIB2 homologs. Presumably, these phages either encode a unique split anti-CRISPR protein or produce a fusion protein as a result of +1 translational frameshifting between
gp28 and
gp29 open reading frames (ORFs) as previously described in other bacteriophages (
41–43).
AcrIB2 has a very strong effect on CRISPR interference against conjugating plasmids in the self-targeting model when expressed from an inducible promoter. In the biologically more relevant context of a φCD38-2 lysogenic strain, its effects are milder, increasing survival in the self-targeting model ca. 10-fold. Although this was not specifically tested in this study, it is reasonable to assume that the protective effects of AcrIB2 in the context of phage infection would also be partial and likely linked to the replication cycle of the phage. We attempted to delete the acrIB2 gene from the φCD38-2 genome. This attempt was unsuccessful, perhaps because multiple copies of the phage episome exist in φCD38-2 lysogens, making it difficult to select the desired clones.
To identify proteins interacting with AcrIB2, copurification assays were performed on extracts from wild-type C. difficile 630Δerm cells, utilizing a self-targeting plasmid that co-expressed functional N-terminally Strep-tagged AcrIB2. Extracts from cells containing the empty vector served as controls. Trypsin digestion and LC-MS-MS analysis identified 1,116 proteins, with 840 exhibiting a twofold-change difference (P ≤ 0.05) between test and control cells across biological replicates. Notably, Cas3 from both partial and full operons (22% amino acid sequence identity) was significantly enriched in the AcrIB2 sample, providing tentative evidence that AcrIB2 binds both Cas3 proteins (Fig. S4). The AcrIB2 sample also showed enrichment in numerous DNA and RNA-binding proteins involved in DNA replication, repair, topology, and structural chromosome maintenance, as well as various transcriptional regulators, RNA polymerase subunits, and nucleases (Table S3). These findings suggest a potential AcrIB2 mechanism of action related to DNA mimicry.
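The enrichment criterion described above (at least a twofold change with P ≤ 0.05 between test and control extracts) can be illustrated with a minimal sketch. The protein names, abundance values, and P values below are invented for illustration only, and the function name `enriched_proteins` is hypothetical; the sketch keeps only proteins that are more abundant in the tagged sample and is not the analysis pipeline used in the study.

```python
from math import log2

def enriched_proteins(records, min_fold=2.0, max_p=0.05):
    """Return (name, log2 fold change) for proteins enriched in the test sample.

    records: iterable of (name, test_abundance, control_abundance, p_value).
    Only enrichment (test > control) is reported in this sketch.
    """
    hits = []
    for name, test_abundance, control_abundance, p_value in records:
        # Skip entries for which a ratio cannot be computed.
        if test_abundance <= 0 or control_abundance <= 0:
            continue
        log2_fc = log2(test_abundance / control_abundance)
        if log2_fc >= log2(min_fold) and p_value <= max_p:
            hits.append((name, round(log2_fc, 2)))
    return hits

# Made-up values, not data from the study.
example = [
    ("Cas3_partial", 800.0, 100.0, 0.001),
    ("Cas3_full", 450.0, 150.0, 0.02),
    ("housekeeping_X", 210.0, 200.0, 0.60),
    ("regulator_Y", 900.0, 300.0, 0.20),
]

print(enriched_proteins(example))
```

In this toy input, the third entry fails the fold-change threshold and the fourth fails the significance threshold, so only the first two are returned.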
The AcrIB2 protein, along with its homologs derived from other
C. difficile phages, exhibits a substantial presence of negatively charged and aromatic amino acids (53% of the protein sequence), corroborating the LC-MS-MS analysis results and suggesting a potential role as a DNA mimic (
Fig. 5A). Predictions of the secondary structure reveal a predominance of alpha-helix motifs within the protein structure (
Fig. 5B). The AcrIB2 structure predicted with the AlphaFold tool reveals clustering of negatively charged residues along the long axis of the protein (
Fig. 5C), consistent with the DNA mimicry hypothesis regarding the mechanism of action of AcrIB2. The negatively charged positions are conserved among AcrIB2 homologs, suggesting their essentiality (
Fig. 5A and D). In the predicted structure, the split that occurs when an AcrIB2 homolog is encoded by two separate genes lies in an unstructured linker (
Fig. 5D) and should not prevent the C-terminal fragment of the protein from making tight interactions with the N-terminal part that makes a structurally compact core from which a linker with conserved negatively charged residues (D92, E94, E95;
Fig. 5D) protrudes. The mechanism of action of AcrIB2 could thus involve binding to Cas3, making it unable to interact with DNA-bound Cascade and thus preventing target DNA destruction.
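The compositional argument above (a high proportion of negatively charged and aromatic residues) can be checked for any protein sequence with a short sketch such as the following. The residue sets are an assumption (D and E as negatively charged; F, W, and Y as aromatic; histidine is not counted), the net-charge estimate ignores pH and termini, and the toy sequence is not the AcrIB2 sequence, which is given in Table S5.

```python
ACIDIC = set("DE")      # negatively charged side chains (assumed set)
AROMATIC = set("FWY")   # aromatic side chains (histidine excluded by assumption)

def residue_fraction(sequence, residues):
    """Fraction of positions in `sequence` occupied by any residue in `residues`."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(aa in residues for aa in seq) / len(seq)

def net_charge_estimate(sequence):
    """Crude net charge: (K + R) minus (D + E), ignoring histidine and termini."""
    seq = sequence.upper()
    return seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")

# Toy sequence for illustration only.
toy = "MDEEFWYKDE"
print(residue_fraction(toy, ACIDIC))             # share of D/E
print(residue_fraction(toy, AROMATIC))           # share of F/W/Y
print(residue_fraction(toy, ACIDIC | AROMATIC))  # combined share
print(net_charge_estimate(toy))
```

A strongly negative estimate together with a high acidic fraction is the kind of signature that motivates the DNA mimicry hypothesis, although it does not by itself demonstrate a mechanism.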
Most
C. difficile strains contain two
cas operons, and their individual contribution to interference was not explored before the present study. Surprisingly, our results demonstrate that the mutant lacking the partial
cas operon exhibited a complete loss of CRISPR interference activity, which indicates that this operon plays the primary role in the CRISPR defense that is inhibited by AcrIB2. Upon heterologous expression in
E. coli, the full
cas operon led to a reduction in the transformation rate of CRISPR-targeted plasmids, albeit with modest efficiency compared to natural CRISPR interference in
C. difficile (
26). Since the partial operon lacks the adaptation module, spacer acquisition must be driven by the products of the full operon. Indeed, we have recently shown that the adaptation module is functional in naive adaptation when expressed from a plasmid (
27). Interestingly, both
cas operons are associated with general stress response SigB-dependent promoters, but we observed a stronger effect of
sigB mutation on the full
cas operon expression as compared to the partial
cas operon (
44). This differential expression could suggest a potential role of the full
cas operon under stressful conditions. While the function of the interference module of the full
C. difficile cas operon remains to be determined, it is tempting to speculate that it may have a regulatory function in concert with specific crRNAs or, together with the products of the adaptation module, be responsible for primed adaptation.
In conclusion, the identification of a new anti-CRISPR protein targeting
C. difficile type I-B CRISPR-Cas contributes to a better knowledge of the phage-host relationship and coevolution of defense and counter-defense systems for this important human pathogen and opens interesting perspectives for further developments of applications in biotechnology and health. Apart from its potential applications in phage therapy and phage selection (
45), AcrIB2 can also be leveraged as a control element for the endogenous CRISPR-Cas editing tool (
33). Moreover, AcrIB2 holds promise for enhancing the efficacy of the newly developed phage-delivered CRISPR-Cas antimicrobial, which triggers the self-elimination of
C. difficile caused by the activity of the endogenous CRISPR-Cas system (
46).
MATERIALS AND METHODS
Bioinformatic search of putative anti-CRISPR
The guilt-by-association bioinformatic method was used to identify the putative type I-B anti-CRISPR protein. The method is based on a chain search of homologs of
acr and
aca genes using BLAST (
47). Uncharacterized ORFs were identified with ORFfinder NCBI (
48). The identification of other putative
acr and
aca loci in
C. difficile phages and prophages was made by BLAST search (
47). The list of clostridial phages and identified putative Acrs can be found in Table S4.
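The neighborhood step of a guilt-by-association search can be sketched as follows: starting from an anchor gene (for example, an aca-like or AP2 domain gene found by BLAST), small ORFs on the same strand within a few genes of the anchor are collected as Acr candidates. The window size, the length cutoff, the same-strand requirement, and all gene names and coordinates are assumptions made for this illustration; they are not parameters reported in this study.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int            # genomic start coordinate
    end: int              # genomic end coordinate
    strand: str           # "+" or "-"
    protein_length: int   # length of the encoded protein in amino acids

def acr_candidates(genes, anchor_names, window=3, max_length=200):
    """Collect small same-strand ORFs located within `window` genes of an anchor."""
    ordered = sorted(genes, key=lambda g: g.start)
    candidates = []
    for i, gene in enumerate(ordered):
        if gene.name not in anchor_names:
            continue
        lo = max(0, i - window)
        hi = min(len(ordered), i + window + 1)
        for neighbor in ordered[lo:hi]:
            if (neighbor.name not in anchor_names
                    and neighbor.strand == gene.strand
                    and neighbor.protein_length <= max_length
                    and neighbor.name not in candidates):
                candidates.append(neighbor.name)
    return candidates

# Hypothetical locus, for illustration only.
locus = [
    Gene("orfA", 100, 400, "+", 100),
    Gene("orfB", 450, 900, "+", 150),
    Gene("aca_like", 950, 1200, "+", 83),
    Gene("orfC", 1250, 2900, "+", 550),
    Gene("orfD", 3000, 3300, "-", 100),
]

print(acr_candidates(locus, {"aca_like"}, window=2))
```

Candidates returned by such a filter would then serve as queries for the next round of the chain search, and each would still require experimental validation.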
Plasmid construction
The nucleic acid and amino acid sequences of Acrs used in this study are listed in Table S5. The list of plasmids used for this study is summarized in Table S6. The putative
acr gene from φCDHM13 phage was cloned into the protospacer and self-targeting plasmids (pRPF185 derivatives) together with regulatory elements (P
tet promoter, ribosome binding site (RBS), and terminator) as a gBlock (dsDNA) from IDT (France). Cloning was performed by Gibson assembly using the NEB Gibson Assembly Master Mix (E2611)
(49). The resulting constructs were transformed into
E. coli NEB beta cells (New England BioLabs) and verified by Sanger sequencing.
To construct editing plasmids, approximately 800 bp long flanking regions of partial and full
cas operon of the
C. difficile 630Δ
erm strain were amplified by PCR and introduced into the pMSR vector (
50) using Gibson assembly reaction (
50). The resulting constructs were transformed into
E. coli NEB beta cells (New England BioLabs) and verified by Sanger sequencing. The list of primers used for this study is summarized in Table S7.
Bacterial strains and growth conditions
All bacterial strains used in this study are listed in Table S6. C. difficile was cultivated in an anaerobic chamber (Jacomex, France) filled with an atmosphere of 5% H2, 5% CO2, and 90% N2. Both liquid cultures and plate growth were conducted using brain heart infusion (BHI) medium (Difco) at 37°C. When working with strains carrying plasmids, thiamphenicol (Tm) was added at a final concentration of 15 µg/mL to overnight cultures and 7.5 µg/mL to day cultures. To induce the Ptet promoter of pRPF185 derivatives in C. difficile, the non-antibiotic tetracycline analog anhydrotetracycline (ATc) was added at a final concentration of 100 ng/mL. The E. coli strains were cultured in lysogeny broth (LB) medium at 37°C supplemented with 100 µg/mL ampicillin and 15 µg/mL chloramphenicol when required.
Plasmid conjugation and estimation of conjugation efficiency
All plasmids were transformed into the E. coli strain HB101 (RP4). Transformants were then mated with C. difficile cells on BHI agar plates for 8 hours (for C. difficile 630) or 24 hours (for C. difficile R20291) at 37°C. C. difficile transconjugants were subsequently selected on BHI agar plates containing Tm (15 µg/mL), D-cycloserine (Cs) (25 µg/mL), and cefoxitin (Cfx) (8 µg/mL).
To estimate conjugation efficiency, after the mating step, C. difficile conjugation mixture was serially diluted and plated on BHI agar supplemented with Tm, Cs, and Cfx, or Cs and Cfx only. Then the ratio of C. difficile transconjugants to the total number of CFU/mL was estimated.
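The ratio described above can be written out as a short calculation. This is a minimal sketch: the colony counts, dilution factors, and plated volume are invented for illustration, and the function names are hypothetical.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """CFU/mL of the undiluted mixture from a colony count on one plate.

    dilution_factor: fold dilution of the plated sample (e.g. 1e6 for 10^-6).
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml

def conjugation_efficiency(transconjugant_cfu_ml, total_cfu_ml):
    """Ratio of transconjugants (Tm + Cs + Cfx plates) to total C. difficile (Cs + Cfx plates)."""
    if total_cfu_ml <= 0:
        raise ValueError("total CFU/mL must be positive")
    return transconjugant_cfu_ml / total_cfu_ml

# Invented counts: 42 colonies at a 10^-2 dilution on selective plates,
# 65 colonies at a 10^-6 dilution on plates without Tm, 0.1 mL plated each.
transconjugants = cfu_per_ml(42, 1e2, 0.1)
total = cfu_per_ml(65, 1e6, 0.1)
print(f"{conjugation_efficiency(transconjugants, total):.2e}")
```

Comparing this ratio between a targeted plasmid and a non-targeted control gives a measure of CRISPR interference, with lower values for the targeted plasmid indicating stronger interference.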
Growth assays
C. difficile carrying either plasmid, maintained in 7.5 µg/mL Tm, was grown to an optical density at 600 nm (OD600) of 0.4–0.5, after which ATc inducer was added to a final concentration of 100 ng/mL. Cultures were then either transferred to a 96-well plate to record growth curves on a CLARIOstar Plus plate reader or serially diluted and plated on BHI + Tm (15 µg/mL) plates at the indicated time points and grown overnight before CFU counting.
For the drop tests, C. difficile carrying either plasmid was serially diluted from starting OD600 of 0.4 and spotted on BHI Tm plates (15 µg/mL) with or without ATc inducer (100 ng/mL). Plates were incubated at 37°C for 24 hours or 48 hours and photographed.
Microscopy
For phase-contrast microscopy, C. difficile carrying either plasmid, maintained in 7.5 µg/mL Tm, was grown to an OD600 of 0.4–0.5, after which ATc inducer was added to a final concentration of 100 ng/mL. After 3 hours of incubation at 37°C, 1 mL of culture was centrifuged at 3,500 rpm for 5 minutes, and the pellet was resuspended in 20 µL of sterile H2O. Cells were immobilized on the slide with 1.2% agarose. Images were captured on a Leica DM1000 microscope using a Flexacam C1 12 MP camera with the LAS X software.
High-throughput sequencing of total genomic DNA
Total genomic DNA was purified with the NucleoSpin Microbial DNA Mini kit (Macherey-Nagel). For library preparation, the NEBNext Ultra II DNA Library Prep kit for Illumina (NEB) was used, and the sequencing was carried out on an Illumina platform (NovaSeq 6000).
Raw reads were trimmed using Trimmomatic v0.39 (ILLUMINACLIP:NexteraPE-PE.fa:2:30:10, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:20). Reads were then aligned to the reference genome using the Bowtie2 aligner in end-to-end alignment mode with one allowed mismatch (
51). Only reads with unique alignment were retained for further analysis.
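For readers who wish to reproduce the trimming step, the settings listed above map onto a paired-end Trimmomatic call roughly as sketched below. The sketch only assembles the argument list and does not run anything; the jar path, FASTQ file names, and output naming scheme are hypothetical, and the interpretation of the reported settings as the five standard Trimmomatic steps is an assumption.

```python
def trimmomatic_pe_command(jar, fwd, rev, out_prefix, adapters="NexteraPE-PE.fa"):
    """Build a paired-end Trimmomatic argument list with the reported settings."""
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter clipping
        "LEADING:3",                         # trim low-quality leading bases
        "TRAILING:3",                        # trim low-quality trailing bases
        "SLIDINGWINDOW:4:15",                # 4-base window, mean quality 15
        "MINLEN:20",                         # drop reads shorter than 20 nt
    ]
    # Paired and unpaired outputs for each mate (hypothetical naming scheme).
    outputs = [
        f"{out_prefix}_1P.fastq.gz", f"{out_prefix}_1U.fastq.gz",
        f"{out_prefix}_2P.fastq.gz", f"{out_prefix}_2U.fastq.gz",
    ]
    return ["java", "-jar", jar, "PE", fwd, rev, *outputs, *steps]

# Hypothetical file names, for illustration only.
command = trimmomatic_pe_command(
    "trimmomatic-0.39.jar", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_trimmed"
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a system where Java and Trimmomatic are installed.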
BAM files were analyzed using the Rsamtools package, and reads with MAPQ scores equal to 42 were selected for downstream coverage analysis and calculating the mean coverage across the genome (
34,
52).
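The coverage calculation can be illustrated with a small, dependency-free sketch. The analysis in this study was done in R with Rsamtools; the Python version below is a stand-in that assumes alignments have already been extracted as (start, end, MAPQ) tuples with 0-based, half-open coordinates, and the function name and toy reads are hypothetical.

```python
def mean_coverage(alignments, genome_length, required_mapq=42):
    """Mean per-base depth over the genome from reads with the required MAPQ.

    alignments: iterable of (start, end, mapq), 0-based half-open intervals.
    """
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end, mapq in alignments:
        if mapq != required_mapq:
            continue
        start = max(0, start)
        end = min(genome_length, end)
        if start >= end:
            continue
        delta[start] += 1
        delta[end] -= 1
    total_depth = 0
    running_depth = 0
    for position in range(genome_length):
        running_depth += delta[position]
        total_depth += running_depth
    return total_depth / genome_length

# Toy example: two retained reads of 5 bases on a 10-base genome,
# plus one read discarded because of its lower MAPQ.
toy_reads = [(0, 5, 42), (3, 8, 42), (2, 6, 10)]
print(mean_coverage(toy_reads, 10))
```

Dividing the depth over a region of interest, such as a prophage, by this genome-wide mean gives a relative copy-number estimate for that region.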
Deletion of cas operons in C. difficile
An allele-coupled exchange mutagenesis approach described previously (
50) was used to delete the partial and full
cas operons from the
C. difficile 630Δ
erm strain. Editing plasmids were conjugated into
C. difficile. Transconjugants were selected on BHI supplemented with Cs, Cfx, and Tm and then restreaked onto fresh BHI plates containing Tm twice in a row to ensure the purity of the single crossover integrant. The purified colonies were then streaked onto BHI plates with ATc (100 ng/mL) to ensure the selection of cells where the plasmid had been excised and lost. If the plasmid was still present, the toxin was produced at lethal levels, and colonies did not form in the presence of ATc. Growing colonies were tested for the success of the deletion by PCR and Sanger sequencing.
AlphaFold structure prediction
The AcrIB2 amino-acid sequence was used as input to the MMseqs2 homology search program (
53) with three iterations against the UniRef30_2202 database to generate a multiple sequence alignment (MSA). This MSA was filtered with HHfilter using the parameters “id” = 100, “qid” = 25, and “cov” = 50, resulting in 68 homologous sequences; full-length sequences were then retrieved and realigned with MAFFT (
54) using the default FFT-NS-2 protocol. Then five independent runs of the AlphaFold2 (
55) algorithm with six recycles were performed with this input MSA and without template search, using a local instance of the ColabFold (
56) interface on a local cluster equipped with an NVIDIA Ampere A100 80 GB GPU card. Each run generated five structural models. The best model out of 25 was picked using the predicted local distance difference test (pLDDT) confidence score as a metric and used for further structural analysis (pLDDT for this model: 83.5). The qualitative electrostatic surface was generated using PyMOL (
57) (local protein contact potential). The evolutionary conservation scores were generated using the AlphaFold2 MSA as an input to the Rate4Site (
58) program, which computes the relative evolutionary rate for each site.
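The model selection step (best of 25 models by pLDDT) can be sketched as follows, assuming that each model is summarized by the mean of its per-residue pLDDT values. The model names and scores are invented, and how ColabFold stores and ranks these scores internally is not reproduced here.

```python
def mean_plddt(per_residue_scores):
    """Average per-residue pLDDT for one model."""
    return sum(per_residue_scores) / len(per_residue_scores)

def best_model(models):
    """Return (name, mean pLDDT) of the model with the highest mean pLDDT.

    models: dict mapping a model name to its list of per-residue pLDDT scores.
    """
    name, scores = max(models.items(), key=lambda item: mean_plddt(item[1]))
    return name, mean_plddt(scores)

# Invented scores for two hypothetical models (three residues each).
example_models = {
    "run1_model3": [90.0, 80.0, 70.0],
    "run2_model1": [85.0, 85.0, 88.0],
}

print(best_model(example_models))
```

With five runs of five models each, the dictionary would hold 25 entries, and the selected entry corresponds to the structure used for downstream analysis.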