Materials and Methods

Cas13d is a compact RNA-targeting type VI CRISPR effector positively modulated by a WYL domain-containing accessory protein

STAR METHODS

Contact for Reagent and Resource Sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contacts, David Scott (dscott@arbor.bio) and David Cheng (dcheng@arbor.bio). The authors plan to make the reagents widely available to the academic community through Addgene, subject to an MTA.

Experimental Model and Subject Details

Endura electrocompetent E. coli
E. coli were electroporated according to the manufacturer’s protocols. After mixing 25 µL of thawed cells with DNA, the E. coli were electroporated with a Bio-Rad Gene Pulser Xcell (Bio-Rad) using a 1.0 mm cuvette at settings of 10 µF, 600 Ohms, and 1800 Volts. Immediately after the pulse, 975 µL of Recovery Media (Lucigen) was added, and the cells were shaken for 1 hour at 37°C at 250 rpm.

NEB 5-alpha Competent E. coli (High Efficiency)
Following transformation and outgrowth according to the manufacturer’s protocols, the E. coli were plated onto LB agar with appropriate antibiotic selection and grown at 37°C overnight.

NEB NiCo21(DE3)
Expression vectors for protein purification (Key Resources Table) were grown in the E. coli T7 expression strain NiCo21(DE3) (New England Biolabs). 1 mL of overnight culture was inoculated into 1 liter of Luria-Bertani broth growth media (10 g/L tryptone, 5 g/L yeast extract, 5 g/L NaCl, Sigma) supplemented with 50 µg/mL Kanamycin. Cells were grown at 37°C to a cell density of 0.5–0.8 OD600. Protein expression was then induced by supplementing with IPTG to a final concentration of 0.2 mM, and the culture was grown for a further 14–18 hours at 20°C.

Method Details

Pipeline for Class 2 CRISPR-Cas loci identification
Genome and metagenome sequences were downloaded from NCBI (Benson et al., 2013; Pruitt et al., 2012), NCBI whole genome sequencing (WGS), and DOE JGI Integrated Microbial Genomes (Markowitz et al., 2012).
Proteins were predicted on all contigs at least 5 kb in length using Meta-GeneMark (Zhu et al., 2010) with the standard model MetaGeneMark_v1.mod and Prodigal (Hyatt et al., 2010) in anon mode, and were de-duplicated in favor of pre-existing annotations to construct a complete protein database. CRISPR arrays were identified, and protein sequences for ORFs located within +/− 10 kb of CRISPR arrays were grouped into CRISPR-proximal protein clusters. Clusters of fewer than 4 proteins, or comprising proteins from fewer than 3 contigs, were discarded. Each of the remaining protein clusters was considered to be a putative effector of a CRISPR-Cas system. In addition to the CRISPR array and putative effector protein, many CRISPR-Cas systems also include additional proteins which enable adaptation, crRNA processing, and defense. Potential additional CRISPR-Cas system components associated with each of the predicted effectors were identified as clusters of protein-coding genes with high effector co-occurrence, and CRISPR enrichment or CRISPR representation of at least 15%.

Effector co-occurrence was calculated as the percentage of loci containing the effector that also contain the potential co-occurring protein. The high co-occurrence threshold was a function of the cohesiveness of the effector cluster (more homogeneous clusters requiring a higher threshold). The CRISPR enrichment was calculated as follows: 1) Up to 20 unique proteins were sampled from each protein cluster, and UBLAST (Edgar, 2010) was used to generate a rank-ordered list of proteins by E-value from the complete protein database. 2) An E-value threshold was imposed to recover at least 50% of the members of the cluster. 3) CRISPR enrichment was calculated by dividing the number of CRISPR-proximal proteins below the E-value threshold by the total number of proteins below the threshold. CRISPR representation was calculated as the percentage of effector-proximal proteins in a CRISPR-proximal protein cluster.
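The effector co-occurrence and CRISPR enrichment calculations described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names, the representation of a locus as a set of protein-cluster labels, the use of the best E-value per protein, and the choice to treat "below the threshold" as "at or below" are all assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Sequence, Set, Tuple


def effector_cooccurrence(loci: Iterable[Set[str]], effector: str, candidate: str) -> float:
    """Percentage of loci containing `effector` that also contain `candidate`."""
    with_effector = [locus for locus in loci if effector in locus]
    if not with_effector:
        return 0.0
    both = sum(1 for locus in with_effector if candidate in locus)
    return 100.0 * both / len(with_effector)


def crispr_enrichment(
    hits: Sequence[Tuple[str, float]],
    cluster_members: Set[str],
    crispr_proximal: Set[str],
    min_recovery: float = 0.5,
) -> float:
    """Fraction of hits passing the E-value threshold that are CRISPR-proximal.

    `hits` holds (protein_id, e_value) pairs from a similarity search of sampled
    cluster proteins against the complete protein database (hypothetical input).
    """
    if not cluster_members:
        raise ValueError("cluster_members must not be empty")
    # Assumption: a protein hit by several queries keeps its lowest E-value.
    best: Dict[str, float] = {}
    for protein_id, e_value in hits:
        best[protein_id] = min(e_value, best.get(protein_id, math.inf))
    ranked = sorted(best.items(), key=lambda item: item[1])

    # Walk down the ranked list until at least `min_recovery` of the cluster is recovered.
    needed = math.ceil(min_recovery * len(cluster_members))
    recovered = 0
    threshold = None
    for protein_id, e_value in ranked:
        if protein_id in cluster_members:
            recovered += 1
            if recovered >= needed:
                threshold = e_value
                break
    if threshold is None:
        raise ValueError("hits do not recover enough cluster members")

    passing = [pid for pid, e in ranked if e <= threshold]
    proximal = sum(1 for pid in passing if pid in crispr_proximal)
    return proximal / len(passing)
```

Under these assumptions, a cluster would pass the 15% CRISPR enrichment criterion when `crispr_enrichment(...)` returns at least 0.15.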
All clustering operations were performed using mmseqs2 (Steinegger and Söding, 2017).

This information was incorporated into a database of predicted CRISPR-Cas systems, each composed of: 1) a CRISPR array, 2) a putative effector, and optionally, 3) clusters of potential co-acting proteins. For functional characterization of this database of candidate CRISPR-Cas systems, we constructed a multiple sequence alignment for each family of putative effectors using MAFFT (Katoh and Standley, 2013) and conducted an HMM search using HMMer (Eddy, 2011) against the protein family databases Pfam (Finn et al., 2014) and Uniprot (Bateman et al., 2017), as well as a BLASTN search of CRISPR spacer sequences against a reference set of phages. This analysis led to the detection of protein families corresponding to all previously identified class 2 CRISPR-Cas systems, indicating a minimal false negative rate. To identify novel class 2 CRISPR-Cas systems, the features described above for predicting the functions of putative CRISPR-Cas systems were used to rank candidate families for follow-up functional evaluation.

Phylogenetic analysis
Maximum likelihood trees were constructed using FastTree (Price et al., 2010). For the phylogenetic analysis of Cas1, all Cas1 sequences that were assigned to Type II and Type VI-A in the course of previous work (Shmakov et al., 2017a) were used, and all Cas1 sequences associated with Cas13d were added. Altogether, a set of 817 Cas1 sequences was employed for phylogenetic analysis (Data File S1).

For the WYL family analysis, in addition to automatically identified WYL proteins, we used PSI-BLAST (Altschul et al., 1997) to search over a local set of NCBI-sourced proteins using RspWYL1 as a query. The results with E-value 0.01 or lower were added to the set of WYL proteins. Proteins smaller than 150 aa were discarded from the data set, and UCLUST (Edgar, 2010) with identity threshold 0.90 was used to obtain a non-redundant set.
We then added all WYL proteins identified in the vicinity of Cas13d genes to form a set of 3908 WYL sequences for phylogenetic analysis.

Multiple alignment and phylogeny of protein sequences were constructed as described previously (Peters et al., 2017). Briefly, the sequences were clustered by similarity, and for each cluster, a multiple alignment was built using MUSCLE (Edgar, 2004). Alignments were combined into larger aligned clusters by HHalign (Yu et al., 2015) if the resulting score between the two alignments was higher than the threshold; otherwise, the scores were recorded in a similarity matrix. The matrix was used to reconstruct a UPGMA tree. For each cluster, the alignment was filtered as follows: the alignment positions with the gap character fraction values of 0.5 and homogeneity values of 0.1 or less were removed. The remaining positions were used for tree reconstruction using FastTree with the WAG evolutionary model and the discrete gamma model with 20 rate categories. The same program was used to compute SH (Shimodaira-Hasegawa)-like node support values (Data Files S2, S3).

Spacer Analysis
Spacer sequences from CRISPR arrays within 3 kb of Cas13d effectors were extracted. In the case of multiple contigs containing the same Cas13d sequence (e.g., a duplicated locus), only the contig containing the longest CRISPR array was used. Subsequent spacer analysis closely follows the method described previously (Shmakov et al., 2017b). Briefly, the resulting 198 spacers were de-duplicated by comparison of direct and reverse complement sequences, to produce a set of 182 unique spacers. A BLASTN search with the command line parameters -word_size 7 -gapopen 5 -gapextend 2 -reward 1 -penalty -3 was performed for the unique spacer set against a database comprising the virus and prokaryotic sequences in NCBI.
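The spacer de-duplication step (comparison of direct and reverse complement sequences) can be sketched in Python as follows. This is an illustrative sketch only: it assumes exact sequence identity after upper-casing, which the text does not specify, and the function names are hypothetical.

```python
from typing import Iterable, List

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_form(seq: str) -> str:
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))


def deduplicate_spacers(spacers: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each spacer, treating both strands as equivalent."""
    seen = set()
    unique = []
    for spacer in spacers:
        key = canonical_form(spacer)
        if key not in seen:
            seen.add(key)
            unique.append(spacer)
    return unique
```

Using a canonical strand-independent key avoids an all-against-all comparison, so the de-duplication runs in time linear in the number of spacers.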
To identify prophage regions, (i) all ORFs within 3 kb of prokaryotic matches were collected; (ii) a PSI-BLAST search was conducted against the proteins extracted from the virus part of NCBI, using the command line parameters -seg no -evalue 0.000001 -dbsize 20000000; (iii) a spacer hit was classified as prophage if it overlapped with an ORF with a viral match, or if two or more ORFs with viral matches were identified within the neighborhood of the spacer hit.

DNA synthesis & effector library cloning
The E. coli codon-optimized genes representing the minimal CRISPR effectors and accessory proteins were synthesized (Genscript) into a custom expression system derived from pET-28a(+) (EMD-Millipore). Briefly, the Ruminococcus sp. synthesis product included Cas13d and WYL1 codon optimized for E. coli expression under the control of a Lac promoter and separated by an E. coli ribosome binding sequence. Following the open reading frames for Cas13d and WYL1, we included an acceptor site for a CRISPR array library driven by a J23119 promoter (Registry of Standard Biological Parts: http://parts.igem.org/Part:BBa_J23119). Our Eubacterium siraeum system was similarly constructed, but with only the effector protein.

In tandem with the effector gene synthesis, we first computationally designed an oligonucleotide library synthesis (OLS) pool containing “repeat-spacer-repeat” sequences, where “repeat” represents the consensus direct repeat sequence found in the CRISPR array associated with the effector, and “spacer” represents sequences tiling the pACYC184 plasmid. The spacer length was determined by the mode of the spacer lengths found in the endogenous CRISPR array. The repeat-spacer-repeat sequence was appended with restriction sites enabling the bi-directional cloning of the fragment into the aforementioned CRISPR array library acceptor site, as well as unique PCR priming sites to enable specific amplification of a specific repeat-spacer-repeat library from a larger pool.
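As a rough illustration of the repeat-spacer-repeat library design described above, the following Python sketch picks the modal spacer length, tiles a target sequence, and assembles oligos. Several details are assumptions rather than the authors' design: the tiling step size, tiling of a single strand of a linear sequence (pACYC184 is circular, and wrap-around is ignored here), tie-breaking toward the shorter length, and the placeholder flanks standing in for the restriction and PCR priming sites.

```python
from collections import Counter
from typing import List, Sequence


def modal_spacer_length(endogenous_spacers: Sequence[str]) -> int:
    """Most common spacer length; ties go to the shorter length (assumption)."""
    if not endogenous_spacers:
        raise ValueError("at least one endogenous spacer is required")
    counts = Counter(len(spacer) for spacer in endogenous_spacers)
    return min(counts, key=lambda length: (-counts[length], length))


def tile_target(target: str, spacer_len: int, step: int = 1) -> List[str]:
    """Windows of `spacer_len` along `target`, advancing by `step` (assumed value)."""
    if spacer_len <= 0 or step <= 0:
        raise ValueError("spacer_len and step must be positive")
    last_start = len(target) - spacer_len
    return [target[i:i + spacer_len] for i in range(0, last_start + 1, step)]


def build_oligos(
    repeat: str,
    spacers: Sequence[str],
    left_flank: str = "",
    right_flank: str = "",
) -> List[str]:
    """Assemble flank + repeat + spacer + repeat + flank for each spacer.

    The flanks are hypothetical placeholders for cloning and priming sites.
    """
    return [f"{left_flank}{repeat}{spacer}{repeat}{right_flank}" for spacer in spacers]
```

A call such as `build_oligos(repeat, tile_target(plasmid_seq, modal_spacer_length(native_spacers)))` would then give one oligo per tiled position, where `plasmid_seq` and `native_spacers` are inputs supplied by the user.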
The library synthesis was performed by Agilent Genomics.

We next cloned the repeat-spacer-repeat library into the plasmid containing the minimal engineered locus using the Golden Gate assembly method. In brief, we first amplified each repeat-spacer-repeat from the OLS pool (Agilent Genomics) using unique PCR primers, and pre-linearized the plasmid backbone using BsaI to reduce potential background. Both DNA fragments were purified with Ampure XP (Beckman Coulter) prior to addition to Golden Gate Assembly Master Mix (New England Biolabs) and incubated as per the manufacturer’s instructions. We further purified and concentrated the Golden Gate reaction to enable maximum transformation efficiency in the subsequent steps of the bacterial screen.

Bacterial screening for effector activity
The plasmid library containing the distinct repeat-spacer-repeat elements and Cas proteins was electroporated into Endura electrocompetent E. coli (Lucigen) using a Gene Pulser Xcell (Bio-Rad) following the protocol recommended by Lucigen. The library was either co-transformed with purified pACYC184 plasmid, or directly transformed into pACYC184-containing Endura electrocompetent E. coli (Lucigen), plated onto agar containing Chloramphenicol (Fisher), Tetracycline (Alfa Aesar), and Kanamycin (Alfa Aesar) in BioAssay dishes (Thermo Fisher), and incubated for 10–12 h. After estimation of approximate colony count to ensure sufficient library representation on the bacterial plate, the bacteria were harvested and plasmid DNA was extracted using a QIAprep Spin Miniprep Kit (Qiagen) to create the ‘output library’. By performing a PCR using custom primers containing barcodes and sites compatible with Illumina sequencing chemistry, we generated a barcoded next generation sequencing library from both the pre-transformation ‘input library’ and the post-harvest ‘output library’, which were then pooled and loaded onto a NextSeq 550 (Illumina) to evaluate the effectors.
At least two independent biological replicates were performed for each screen to ensure consistency.

Bacterial screen sequencing analysis
Next generation sequencing data for screen input and output libraries were demultiplexed using Illumina bcl2fastq. Reads in the resulting fastq files for each sample contained the CRISPR array elements for the screening plasmid library. The direct repeat sequence of the CRISPR array was used to determine the array orientation, and the spacer sequence was mapped to the source plasmid pACYC184 or negative control sequence (GFP) to determine the corresponding target. For each sample, the total number of reads for each unique array element ($r_a$) in a given plasmid library was counted and normalized as follows: $(r_a + 1)$ / (total reads for all library array elements). The depletion score was calculated by dividing normalized output reads for a given array element by normalized input reads.

PFS and sequence constraint determination
We want to determine whether a subset of nucleotide positions in the region of the targeting area can explain strongly depleted targets. To this end, we define a targeting requirement to comprise a set of locations relative to a target sequence (Figure S4a) and the corresponding nucleotide sequences at those locations. For a given targeting requirement, we define the hit ratio (hr) as the ratio of the number of strongly depleted CRISPR arrays to the total number of library targets satisfying the requirement. When searching for a PAM or PFS of length k, we consider $\binom{n}{k}$ potential targeting requirement locations, where n = spacer length + 2 · flank length. The bit score for a potential targeting requirement is calculated as $\mathrm{bitscore} = \sum -\mathrm{hr} \log(\mathrm{hr})$, summed over all nucleotide sequences at the specified targeting requirement locations.

Effector and accessory protein purification
The effector or accessory protein expression construct was transformed into an E. coli T7 expression strain, NiCo21(DE3) (New England Biolabs), and grown as described in the Experimental Model and Subject Details section of the STAR Methods. The cells were harvested by centrifugation, and the cell paste was resuspended in 80 mL of freshly prepared Lysis Buffer (50 mM Hepes pH 7.6, 0.5 M NaCl, 10 mM imidazole, 14 mM 2-mercaptoethanol and 5% glycerol) supplemented with protease inhibitors (cOmplete, EDTA-free, Roche Diagnostics Corporation). The resuspended cells were broken by passing through a cell disruptor (Constant System Limited). Lysate was cleared by centrifugation twice at 28,000g for 30 min each. The clarified lysate was applied to a 5 mL HisTrap FF chromatography column (GE Life Sciences). Protein purification was performed via FPLC (AKTA Pure, GE Healthcare Life Sciences). After washing with Lysis Buffer, protein was eluted with a gradient of 10 mM to 250 mM of imidazole. Fractions containing protein of the expected size were pooled, concentrated in a Vivaspin 20 ultrafiltration unit (Sartorius) and either used directly for biochemical assays or frozen at −80°C for storage. Protein purity was determined by SDS-PAGE analysis, and protein concentration was determined by Qubit protein assay kit (Thermo Fisher).

crRNA and substrate RNA preparation
DNA oligo templates for crRNA and substrate RNA in vitro transcription were ordered from IDT (Table S1b). Templates for crRNAs were annealed to a short T7 primer (final concentrations 4 µM) and incubated with T7 RNA polymerase overnight at 37°C using the HiScribe T7 Quick High Yield RNA Synthesis kit (New England Biolabs). Annealing was performed by incubating T7 primer with templates for 2 minutes at 95°C followed by a −5°C/s ramp down to 23°C. Templates for substrate RNA were PCR amplified to yield dsDNA and then incubated with T7 RNA polymerase at 37°C overnight using the same T7 Quick High Yield RNA Synthesis kit.
After in vitro transcription, samples were treated with DNase I (Zymo Research) and then purified using RNA Clean & Concentrator kit (Zymo Research).

5’ end labeling was accomplished using the 5’ end labeling kit (VectorLabs) with an IR800 dye-maleimide probe (LI-COR Biosciences). Body labeling of RNA was performed during in vitro transcription using the HiScribe T7 Quick High Yield RNA Synthesis kit (New England Biolabs). The in vitro transcription reactions contained 2.5 mM Fluorescein-12-UTP (Sigma Aldrich). Labeled RNA was purified to remove excess dyes using RNA Clean & Concentrator kit (Zymo Research). The RNA concentration was measured on a Nanodrop 2000 (Thermo Fisher).

Pre-crRNA processing assays
Pre-crRNA cleavage assays were performed at 37°C in processing buffer (20 mM Tris pH 8.0, 50 mM KCl, 1 mM EDTA, 10 mM MgCl2, and 100 µg/mL BSA) unless otherwise indicated, with a final reaction concentration of 200 nM of pre-crRNA and varying enzyme concentrations and EDTA as indicated. Reactions were incubated for 30 minutes and quenched by adding 1 µg/µL of proteinase K (Ambion) and incubating for 10 minutes at 37°C. Afterwards, 50 mM of EDTA was added to the reaction, which was then mixed with equal parts 2× TBE-Urea Sample Buffer (Invitrogen) prior to denaturing at 65°C for 3 minutes. Samples were analyzed by denaturing gel electrophoresis on 15% TBE-Urea gels (Invitrogen) and stained using SYBR Gold nucleic acid stain (Invitrogen) for 10–20 minutes prior to imaging on a Gel Doc EZ (Bio-Rad).

RNA sequencing
Sequencing of in vitro cleaved pre-crRNA began with performing and quenching the cleavage assays as described above. The reactions were then column purified using an RNA Clean and Concentrator-5 kit (Zymo Research). The RNA samples were then PNK treated for 3 hours without ATP to enrich for 3’-P ends, after which ATP was added and the reaction incubated for another hour to enrich for 5’-OH ends.
The samples were then column purified, incubated with RNA 5’ polyphosphatase (Lucigen) and column purified again prior to preparation for next-generation sequencing using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (New England Biolabs). The library was paired-end sequenced on a NextSeq 550 (Illumina), and the resulting paired-end alignments were analyzed using Geneious 11.0.2 (Biomatters).

Sequencing the small RNA from the in vivo bacterial screen began by extracting total RNA from harvested screen bacteria using the Direct-zol RNA MiniPrep Plus w/TRI Reagent (Zymo Research). Ribosomal RNA was removed using a Ribo-Zero rRNA Removal Kit for Bacteria, followed by cleanup using an RNA Clean and Concentrator-5 kit. The resultant ribosomal RNA depleted total RNA was treated with T4 PNK, RNA 5’ polyphosphatase, prepared for sequencing using the NEBNext Small RNA Library Prep Set, and analyzed as described above.

Target cleavage assays
Target cleavage assays were performed at 37°C in cleavage buffer (20 mM HEPES pH 7.1, 50 mM KCl, 5 mM MgCl2 and 5% glycerol). Cas13-crRNA complex formation was performed in cleavage buffer by incubating a 2:1 molar ratio of protein to crRNA at 37°C for 5 minutes, and RspWYL1 was added to the Cas13-crRNA pre-incubation according to the experimental conditions. For the cleavage reactions at different Cas13 concentrations, the pre-formed Cas13-crRNA complexes were diluted on ice, keeping the Cas13-crRNA ratio constant at 2:1. The 5’ IR800 labeled target ssRNA and/or additional unlabeled and fluorescent body-labeled ssRNAs were then added to the pre-formed complex and incubated at 37°C for 30 minutes. The final concentration of short substrate RNAs was 100 nM and the fluorescent body-labeled ssRNA for collateral effect visualization was 50 nM, unless otherwise indicated. Reactions were quenched by adding 1 µg/µL of proteinase K (Ambion) and incubating for 10 minutes at 37°C.
Afterwards, 50 mM of EDTA was added to the reaction, which was then mixed with equal parts 2× TBE-Urea Sample Buffer (Invitrogen) prior to denaturing at 65°C for 3 minutes. Samples were analyzed by denaturing gel electrophoresis on 6% or 15% TBE-Urea gels (Invitrogen). Fluorescence images were obtained using a Gel Doc EZ (Bio-Rad), and near-infrared images were obtained using an Odyssey CLx scanner (LI-COR Biosciences). Afterwards, the gels were stained for 10–20 minutes using SYBR Gold nucleic acid stain (Invitrogen) and imaged on the Gel Doc EZ to verify the results from the fluorescence and IR images.

QUANTIFICATION AND STATISTICAL ANALYSIS

Bacterial screen sequencing analysis
To analyze primary screening results (Figures 2, 6, S3), we calculated an inverted depletion score as normalized input reads/normalized output reads for each repeat-spacer-repeat. Note that in this formulation, a score of 1 represents no change in relative representation of a repeat-spacer-repeat element, and 10 represents a normalized 10-fold decrease in representation. For the effector deletion condition, the primary screening experiment was performed using pET28a(+) vectors that contain a repeat-spacer-repeat cloned from a library of N elements, with N = 10,002 for EsCas13d, and N = 8,844 for RspCas13d. Means and standard deviations of inverted depletion scores were calculated as μ = 1.01 and 1.04; σ = 0.44 and 0.46 for two biological replicates of EsCas13d, and μ = 1.03 and 1.04; σ = 0.58 and 0.69 for two biological replicates of RspCas13d. Setting a minimum inverted depletion score threshold of 10 (maximum depletion score of 0.1 as defined in the main text) represents a deviation greater than 10 standard deviations from negative control conditions.

DATA AND SOFTWARE AVAILABILITY

Data have been deposited in the following resources: Next-Generation Sequencing for bacterial DNA-sequencing and RNA-sequencing of E. coli primary screens, and RNA-sequencing of in vitro pre-crRNA processing: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA434567
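The read normalization, depletion score, and inverted depletion score threshold described in the bacterial screen sequencing analysis sections can be worked through in a short Python sketch. The function names and the dictionary-based count representation are assumptions made for illustration, and pairing each reported mean with the standard deviation listed in the same position is likewise an assumption about how the replicates were reported.

```python
from typing import Dict


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """(r_a + 1) / total reads for all library array elements."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {element: (reads + 1) / total for element, reads in counts.items()}


def depletion_scores(input_counts: Dict[str, int], output_counts: Dict[str, int]) -> Dict[str, float]:
    """Normalized output reads divided by normalized input reads, per array element."""
    norm_in = normalize(input_counts)
    # Elements missing from the output library are counted as zero reads.
    norm_out = normalize({element: output_counts.get(element, 0) for element in input_counts})
    return {element: norm_out[element] / norm_in[element] for element in input_counts}


def inverted_depletion_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalized input reads / normalized output reads (reciprocal of the depletion score)."""
    return {element: 1.0 / score for element, score in scores.items()}


def deviations_from_control(threshold: float, mean: float, sd: float) -> float:
    """Number of standard deviations separating `threshold` from the control mean."""
    return (threshold - mean) / sd


# Negative-control statistics (mean, sd) as reported in the text; pairing is assumed.
NEGATIVE_CONTROLS = {
    "EsCas13d replicate 1": (1.01, 0.44),
    "EsCas13d replicate 2": (1.04, 0.46),
    "RspCas13d replicate 1": (1.03, 0.58),
    "RspCas13d replicate 2": (1.04, 0.69),
}

for label, (mean, sd) in NEGATIVE_CONTROLS.items():
    print(f"{label}: {deviations_from_control(10.0, mean, sd):.1f} SD")
```

Even the least favorable combination of the reported values (mean 1.04, standard deviation 0.69) places the threshold of 10 roughly 13 standard deviations above the negative-control mean, consistent with the statement that it exceeds 10 standard deviations.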

Published: 2018 Mar

Abstract

Bacterial class 2 CRISPR-Cas systems utilize a single RNA-guided protein effector to mitigate viral infection. We aggregated genomic data from multiple sources and constructed an expanded database of predicted class 2 CRISPR-Cas systems. A search for novel RNA-targeting systems identified subtype VI-D, encoding dual HEPN-domain containing Cas13d effectors and putative WYL-domain containing accessory proteins (WYL1 and WYL-b1–5). The median size of Cas13d proteins is 190 to 300 amino acids smaller than that of Cas13a-c. Despite their small size, Cas13d orthologs from Eubacterium siraeum (Es) and Ruminococcus sp. (Rsp) are active in both CRISPR RNA processing and target as well as collateral RNA cleavage, with no target-flanking sequence requirements. The RspWYL1 protein stimulates RNA cleavage by both EsCas13d and RspCas13d, demonstrating a common regulatory mechanism for divergent Cas13d orthologs. The small size, minimal targeting constraints, and modular regulation of Cas13d effectors further expand the CRISPR toolkit for RNA manipulation and detection.

