All plasmids used in this study are listed in Supplementary Table 1, and sgRNA target sequences are shown in Supplementary Table 2. Sequences and maps for the CRISPRi selection vectors are available as Supplementary data 1 (GenBank files). Oligonucleotides and synthetic double-stranded DNA fragments were obtained from Integrated DNA Technologies. PCRs were performed with Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs) or Invitrogen Platinum SuperFi II DNA Polymerase (Thermo Fisher Scientific). Expression vectors were created using classical restriction enzyme cloning or Golden Gate assembly (40). Restriction enzymes were obtained from New England Biolabs and T4 DNA ligase from Jena Bioscience. Agarose gel electrophoresis was used to analyze PCR and restriction products. Bands of the expected size were cut out and DNA was extracted with a Zymoclean Gel DNA Recovery Kit (Zymo Research). For all cloning steps, chemically competent E. coli K12 DH5α cells (New England Biolabs) were used. Antibiotics were used at the following concentrations: carbenicillin, 50 µg mL−1; chloramphenicol, 25 µg mL−1; kanamycin, 50 µg mL−1. Plasmid DNA was purified with the ZR Plasmid Miniprep or ZymoPure II Midiprep kit (both Zymo Research). The integrity of all plasmids was verified by Sanger sequencing (GENEWIZ Europe).
AcrIIA4 (Listeria monocytogenes) (19) and AcrIIA5 (Phage D4276) (14) coding sequences were codon-optimized for expression in E. coli, obtained as double-stranded gene fragments and cloned into pBAD24 (pBAD24-sfGFPx1 was a gift from Sankar Adhya & Francisco Malagon, Addgene plasmid #51558) (41) via unique EcoRI and HindIII restriction sites, resulting in the plasmids pBAD24-AcrIIA4 and pBAD24-AcrIIA5.
To construct plasmids co-expressing dSpyCas9 and sgRNA scaffolds, a second multiple cloning site was introduced into pBbA5 (pBbA5c-RFP was a gift from Jay Keasling, Addgene plasmid #35281) (42) by PCR, resulting in pBbA5C_sgMCS. A fragment encoding an E. coli codon-optimized dSpyCas9 was PCR amplified from vector pdCas9 (gift from Luciano Marraffini, Addgene plasmid #46569) (43) and cloned into pBbA5C_sgMCS using EcoRI and BamHI restriction sites, thereby generating pBbA5C_sgMCS_d_Spy_Cas9. An sgRNA expression cassette with an RFP reporter-targeting spacer sequence was ordered as overlapping DNA oligonucleotides, extended by PCR and cloned into pBbA5c-spCas9 using NotI and SalI restriction sites, resulting in pBbA5C_sgMCS_d_Spy_Cas9_RFP_guide. To generate the RFP reporter plasmid, a bacterial codon-optimized RFP-encoding sequence was PCR amplified from pBbA5c-RFP; the Anderson promoter J23102 (http://parts.igem.org/Promoters/Catalog/Anderson) was included in the 5’-extension of the forward primer. A sequence encoding an SsrA degradation tag (44) (AANDENYADAS; corresponds to part BBa_M0052 in iGEM parts registry - parts.igem.org) was included as a 5’-extension of the reverse primer to reduce the half-life of RFP in bacterial cells. The resulting PCR fragments were cloned into pJUMP27 (pJUMP27-1AsfGFP was a gift from Chris French, Addgene plasmid #126974) (45) via XbaI and PstI restriction sites, thereby resulting in pJUMP27_J23102_RFP_M0052.
The single-codon mutational libraries of AcrIIA4 and AcrIIA5 were generated by back-to-back PCR on pBAD24-AcrIIA4 and pBAD24-AcrIIA5 with forward primers containing NNB overhangs as previously described (46). PCRs were performed for each position individually to avoid PCR bias, followed by gel extraction. The purified fragments were treated with KLD enzyme mix (New England Biolabs) and transformed into chemically competent E. coli DH5α cells. Cells were grown in LB supplemented with carbenicillin. Note that sub-libraries corresponding to a single codon were grown individually to stationary phase in a 96 deep-well plate. Cultures were then combined at equal volumes and further grown in 50 mL LB carbenicillin until saturation, followed by extraction of plasmid DNA.
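To illustrate what an NNB overhang encodes, the following minimal sketch enumerates the degenerate codon set (N = A, C, G or T; B = C, G or T) and translates it with the standard genetic code. The variable names are illustrative and not taken from the original analysis code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate bases used in the NNB scheme.
N = "ACGT"
B = "CGT"

# All codons reachable with an NNB overhang.
nnb_codons = ["".join(c) for c in product(N, N, B)]

# Amino acids covered and stop codons that remain in the library.
encoded_aas = sorted({CODON_TABLE[c] for c in nnb_codons} - {"*"})
stop_codons = [c for c in nnb_codons if CODON_TABLE[c] == "*"]

print(len(nnb_codons))   # 48 codons per position
print(len(encoded_aas))  # 20 amino acids
print(stop_codons)       # ['TAG']
```

Excluding adenine at the third position removes two of the three stop codons (TAA and TGA) while still covering all 20 amino acids.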
Single-mutant Acr variants for bacterial validation experiments were created via back-to-back PCR on template vectors pBAD24-AcrIIA4 and pBAD24-AcrIIA5, incorporating the mutations into the 5’-extension of one primer, followed by KLD treatment.
The vectors co-encoding firefly luciferase, a firefly luciferase gene-targeting sgRNA and Renilla luciferase (for normalization purposes) used for the dual luciferase assay experiments in mammalian cells, as well as vector pCMV-AcrIIA4, were previously reported by us (32). Single-codon substitutions were introduced into the Acr expression constructs via back-to-back PCR by incorporating the mutations into the 5’-extension of one primer, followed by KLD treatment.
Transformation of E. coli with Acr libraries
For library generation, chemically competent bacterial cells were first co-transformed with the dSpyCas9 plasmid and the RFP reporter plasmid and selected on LB containing kanamycin and chloramphenicol. An overnight culture from a single bacterial colony was used to inoculate Super Optimal Broth (SOB) medium supplemented with kanamycin and chloramphenicol, the culture was grown to an optical density at 600 nm (OD600) of 0.5, and chemically competent cells were prepared according to the Inoue transformation protocol (47). AcrIIA4 and AcrIIA5 libraries were transformed into these chemically competent cells by heat shock at 42°C for 1 min. To estimate the transformation efficiency and the corresponding library complexity, 0.1% of the total transformation volume was plated on LB agar plates containing carbenicillin, chloramphenicol and kanamycin (Supplementary Tables 3 and 4). The remaining cells were grown in liquid LB medium containing carbenicillin, chloramphenicol and kanamycin until stationary phase, followed by cryopreservation in aliquots.
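The estimate of library complexity from the plated 0.1% aliquot amounts to a simple scaling, sketched below. The colony count and the library size are hypothetical placeholders chosen only for illustration; the measured values are those reported in Supplementary Tables 3 and 4.

```python
def estimate_transformants(colonies: int, plated_fraction: float = 0.001) -> float:
    """Scale the colony count on the test plate up to the whole transformation."""
    if not 0 < plated_fraction <= 1:
        raise ValueError("plated_fraction must be in (0, 1]")
    return colonies / plated_fraction


def fold_coverage(transformants: float, library_size: int) -> float:
    """Average number of independent transformants per library member."""
    return transformants / library_size


# Hypothetical example: 250 colonies on the plate that received 0.1% of the
# transformation, and a library of 100 positions x 48 NNB codons.
colonies = 250                # assumption, not a measured value
library_size = 100 * 48       # assumption, not the actual library size

total = estimate_transformants(colonies)
print(f"estimated transformants: {total:.0f}")
print(f"fold coverage: {fold_coverage(total, library_size):.1f}")
```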
Fluorescence-activated cell sorting
Cryopreserved cells were thawed and grown in LB medium containing carbenicillin, chloramphenicol and kanamycin until stationary phase, followed by induction with 1 mM IPTG and 4 mM arabinose at a starting OD600 of 0.05. Following a 12-hour incubation period, cells were collected by centrifugation at 5000×g for 10 minutes, the supernatant was removed, and cells were resuspended in 10 volumes of 1× PBS buffer.
Fluorescence-activated cell sorting (FACS) experiments were performed on a BD FACSAria™ Fusion flow cytometer at the ZMBH flow cytometry core facility (Heidelberg University). E. coli cells were first gated using the forward-scatter area (FSC-A) and side-scatter area (SSC-A). Bacterial cells were then sorted into eight fractions according to their RFP intensity, with 150,000 total cells collected per fraction. Collected cells were either frozen for DNA extraction followed by deep amplicon sequencing or recovered by growing the cells in LB medium containing the necessary antibiotics for subsequent characterization of fluorescence enrichments. In the latter case, cells were washed in PBS twice and grown overnight in LB medium containing carbenicillin, chloramphenicol and kanamycin. The cultures from the individual fractions were then induced with 1 mM IPTG and 4 mM arabinose at a starting OD600 of 0.05 and analyzed on a BD FACSAria™ Fusion flow cytometer 12 hours later as described above.
Amplicon Deep Sequencing
FACS-sorted E. coli cell fractions were lysed and DNA was extracted for subsequent first-stage PCR amplification of AcrIIA4 or AcrIIA5 genes with primers containing Nextera XT index overhangs. PCR amplicons were purified by gel extraction and the concentration of each sample was adjusted to 25 ng/µl for the second-stage barcoding PCR using TG Nextera XT Index Kit v2 Set A (Illumina) according to the manufacturer’s protocol. Libraries were sequenced at the EMBL Heidelberg GeneCore facility on the Illumina MiSeq system using 2×250 paired-end sequencing reagents (MiSeq Reagent Kit v2, 500-cycles).
NGS data analysis
Paired-end reads for each sorted fraction were assembled by their overlap and filtered to remove reads corresponding to sequences not contained in the original library. Only reads corresponding to the wild-type Acr and reads containing mutations in exactly one codon relative to wild-type were kept for further analysis. Read counts of the remaining reads were augmented with pseudocounts and normalized within each sorting fraction. Then, for each single mutant, read counts were normalized across fractions, resulting in a distribution of read counts over fractions for each mutant.
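The two normalization steps can be sketched as follows. This is a minimal illustration rather than the original analysis code: the pseudocount value of 1 and the function name are assumptions, and the count matrix is made-up example data.

```python
import numpy as np


def read_distribution(counts, pseudocount: float = 1.0) -> np.ndarray:
    """Convert raw read counts into a per-variant distribution over fractions.

    counts: array of shape (n_variants, n_fractions) with raw read counts.
    pseudocount: value added to every count (assumed here to be 1).
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    # Step 1: normalize within each sorting fraction (columns sum to one),
    # which corrects for differences in sequencing depth between fractions.
    within = counts / counts.sum(axis=0, keepdims=True)
    # Step 2: normalize each variant across fractions (rows sum to one).
    return within / within.sum(axis=1, keepdims=True)


# Made-up counts for three variants in eight sorted fractions.
example_counts = np.array([
    [900, 50, 10, 5, 0, 0, 0, 0],
    [0, 0, 5, 20, 100, 400, 300, 50],
    [10, 10, 10, 10, 10, 10, 10, 10],
])
dist = read_distribution(example_counts)
print(dist.round(3))
```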
Acr mutant activity regression
As a proxy for Acr activity, linear regression of the binned distribution and mean of log fluorescence intensities was performed for a set of single Acr mutants (benchmarks). To this end, log fluorescence intensity data from flow cytometry for the benchmark mutant set were binned (bin width 0.5, range from 0 to 12) and bins normalized to sum to one. An affine regression model relating the read distribution across sorting fractions to the distribution of log fluorescence intensity was zero initialized and fitted using gradient descent to minimize squared error under L2 regularization. Model fits were evaluated using mean square error calculated via leave-one-out cross-validation on the set of benchmark mutants.
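A minimal sketch of such a regression is given below, assuming an input matrix X of read distributions over the eight sorting fractions and a target matrix Y of binned log fluorescence distributions (24 bins of width 0.5 between 0 and 12). The learning rate, regularization strength, number of iterations and all function names are illustrative assumptions and not the values used in the study.

```python
import numpy as np


def bin_log_fluorescence(log_values, width=0.5, lo=0.0, hi=12.0):
    """Histogram log fluorescence values and normalize the bins to sum to one."""
    edges = np.arange(lo, hi + width, width)
    hist, _ = np.histogram(log_values, bins=edges)
    return hist / max(hist.sum(), 1)


def fit_affine(X, Y, l2=1e-3, lr=0.1, steps=2000):
    """Fit Y ~ X @ W + b by gradient descent on MSE plus an L2 penalty on W."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))  # zero initialization
    b = np.zeros(Y.shape[1])
    for _ in range(steps):
        resid = X @ W + b - Y
        W -= lr * (2.0 * X.T @ resid / n + 2.0 * l2 * W)
        b -= lr * 2.0 * resid.mean(axis=0)
    return W, b


def loocv_mse(X, Y, **fit_kwargs):
    """Mean squared error from leave-one-out cross-validation."""
    errors = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        W, b = fit_affine(X[keep], Y[keep], **fit_kwargs)
        errors.append(np.mean((X[i] @ W + b - Y[i]) ** 2))
    return float(np.mean(errors))


# Synthetic benchmark set: 10 mutants, 8 fractions, 24 fluorescence bins.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(8), size=10)
Y = np.stack([
    bin_log_fluorescence(rng.normal(4 + 4 * x[4:].sum(), 0.7, size=2000))
    for x in X
])
W, b = fit_affine(X, Y)
print("training MSE:", np.mean((X @ W + b - Y) ** 2))
print("LOOCV MSE:", loocv_mse(X, Y))
```

Because the rows of X are probability distributions, their entries are bounded by one, which keeps a fixed step size of this magnitude stable; for differently scaled inputs the learning rate would need to be adapted.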
Log fluorescence intensity distributions were predicted for all single Acr mutants. As a measure of confidence for the predicted distribution, the relative entropy of the underlying NGS read distribution was computed relative to a uniform distribution. Large relative entropy indicates a distribution with a majority of probability mass focused on one sequenced fraction (high confidence), whereas a low relative entropy indicates a distribution close to the uniform distribution (low confidence).
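The confidence measure corresponds to the Kullback-Leibler divergence between a read distribution p over n fractions and the uniform distribution, which equals log(n) minus the Shannon entropy of p. A minimal sketch follows; the function name is illustrative and the natural logarithm is an assumption.

```python
import numpy as np


def relative_entropy_to_uniform(p) -> float:
    """Kullback-Leibler divergence D(p || uniform), in nats."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nonzero = p > 0  # terms with p = 0 contribute zero
    return float(np.sum(p[nonzero] * np.log(p[nonzero] * p.size)))


uniform = np.full(8, 1 / 8)
one_hot = np.array([0, 0, 0, 1, 0, 0, 0, 0])

print(relative_entropy_to_uniform(uniform))  # 0.0 (low confidence)
print(relative_entropy_to_uniform(one_hot))  # log(8), about 2.079 (high confidence)
```

For eight sorting fractions the measure therefore ranges from 0 (reads spread evenly) to log(8) (all reads in a single fraction).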
Mammalian cell culture
HEK293T (human embryonic kidney) cells were cultured at 5% CO2 and 37°C in a humidified incubator and passaged every 2-3 days. Cells were maintained in DMEM (Thermo Fisher Scientific) supplemented with 10% (v/v) fetal calf serum (Thermo Fisher Scientific), 100 U mL−1 penicillin and 100 µg mL−1 streptomycin (Thermo Fisher Scientific). Prior to the assays, cells were checked for mycoplasma contamination by qPCR (Mycoplasma Check, GATC Eurofins).
In a 96-well culture plate, 12,500 cells were seeded and transfected the next day with Lipofectamine 3000 (Thermo Fisher Scientific) according to the manufacturer’s protocol. Cells were co-transfected with (i) 33 ng of a plasmid co-expressing Renilla and firefly luciferase as well as an sgRNA targeting the firefly reporter gene, (ii) 33 ng of a plasmid encoding SpyCas9 with C- and N-terminal nuclear localization signals (NLS) and an N-terminal 3xFlag tag, and (iii) either 11 ng, 33 ng or 99 ng of a vector encoding human codon-optimized Acr mutant variants. A stuffer plasmid (pUC19) was used to top up plasmid levels per sample to 165 ng, thereby keeping the total amount of DNA constant across all samples. At 72 hours post-transfection, cells were washed with PBS and firefly and Renilla luciferase activities were measured using the Dual-Glo luciferase assay system (Promega) according to the manufacturer’s instructions. First, one volume of Dual-Glo reagent was added to each well. Following a ten-minute incubation, lysates were transferred to a white 96-well plate and firefly luciferase photon counts were measured using a FLUOstar Omega multimode reader (BMG Labtech). Subsequently, one volume of Dual-Glo Stop & Glo was added to quench the firefly signal and activate Renilla luciferase, samples were incubated for ten minutes, and Renilla luciferase photon counts were measured. To calculate the reported luciferase activity values, firefly luciferase photon counts were normalized to Renilla luciferase photon counts in each sample.
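The DNA amounts and the normalization described above can be checked with a short calculation. The sketch below uses the plasmid amounts stated in the text; the photon counts are made-up example values and the function names are illustrative.

```python
def stuffer_ng(acr_ng: float, reporter_ng: float = 33, cas9_ng: float = 33,
               total_ng: float = 165) -> float:
    """Amount of pUC19 stuffer needed to reach the fixed total DNA amount."""
    stuffer = total_ng - reporter_ng - cas9_ng - acr_ng
    if stuffer < 0:
        raise ValueError("plasmid amounts exceed the total DNA amount")
    return stuffer


def normalized_luciferase(firefly_counts: float, renilla_counts: float) -> float:
    """Firefly photon counts normalized to Renilla photon counts."""
    return firefly_counts / renilla_counts


for acr in (11, 33, 99):
    print(f"{acr} ng Acr vector -> {stuffer_ng(acr)} ng stuffer")
# 11 ng Acr vector -> 88 ng stuffer
# 33 ng Acr vector -> 66 ng stuffer
# 99 ng Acr vector -> 0 ng stuffer

# Made-up photon counts for a single well.
print(normalized_luciferase(firefly_counts=12000, renilla_counts=48000))  # 0.25
```

At the highest Acr dose the three plasmids already add up to 165 ng, so no stuffer DNA is required.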
Protein production for Acr single point mutants
Bacterial codon-optimized sequences of AcrIIA4 mutants were PCR amplified and cloned into pET-28b(+) using BsaI sites. Constructs were expressed with an N-terminal His-tag. Plasmids were transformed into BL21(DE3) competent cells (Thermo Fisher Scientific). Protein expression was conducted in LB medium supplemented with kanamycin, and cells were induced with 0.5 mM IPTG at an OD600 of 0.6, followed by incubation at 18°C for 16 hours. Next, cells were harvested by centrifugation, resuspended in lysis buffer (50 mM Tris-HCl pH 7.5, 200 mM NaCl, 1 mM DTT, 1 mM PMSF, and 0.3 mg mL−1 lysozyme) and sonicated. Subsequently, samples were centrifuged, and the cleared lysates were incubated with 1 mL HisPur Ni-NTA Resin (Thermo Fisher Scientific), washed and eluted. Eluates were desalted using 7K MWCO Zeba Spin Desalting columns and stored in phosphate-buffered saline (PBS). The flow-through was collected and concentrated using 3K MWCO Pierce Protein Concentrators PES (Thermo Fisher Scientific), followed by flash freezing.
Biolayer interferometry binding data were collected with a Gator Bioanalysis System and processed using the instrument’s integrated software. For the binding assay, His-tagged designs were loaded onto anti-NTA biosensors (Gator probes) at 5 µg mL−1 in binding buffer (PBS pH 7.4) for 120 s. Homemade CRISPR-Cas9 RNP was diluted from 200 nM to 6.25 nM in binding buffer. After baseline measurement in binding buffer alone, the binding kinetics were monitored by dipping the biosensors into wells containing the Cas9 RNP at the indicated concentration (association step) and then dipping the sensors back into binding buffer (dissociation step).
Article Title: A deep mutational scanning platform to characterize the fitness landscape of anti-CRISPR proteins
Deep mutational scanning is a powerful method to explore the mutational fitness landscape of proteins. Its adaptation to anti-CRISPR proteins, which are natural CRISPR-Cas inhibitors and key players in the co-evolution of microbes and phages, would facilitate their in-depth characterization and optimization. Here, we developed a robust anti-CRISPR deep mutational scanning pipeline in Escherichia coli combining synthetic gene circuits based on CRISPR interference with flow cytometry-coupled sequencing and mathematical modeling. Using this pipeline, we created and characterized comprehensive single point mutation libraries for AcrIIA4 and AcrIIA5, two potent inhibitors of Streptococcus pyogenes Cas9. The resulting mutational fitness landscapes revealed that both Acrs possess a considerable mutational tolerance as well as an intrinsic redundancy with respect to Cas9 inhibitory features, suggesting evolutionary pressure towards high plasticity and robustness. Finally, to demonstrate that our pipeline can inform the optimization and fine-tuning of Acrs for genome editing applications, we cross-validated a subset of AcrIIA4 mutants via gene editing assays in mammalian cells and in vitro affinity measurements. Together, our work establishes deep mutational scanning as a powerful method for anti-CRISPR protein characterization and optimization.