Spacer2PAM: A computational framework for identification of functional PAM sequences for endogenous CRISPR systems

Prediction of PAM Sequences

All CRISPR arrays were retrieved from CRISPRCasdb, part of CRISPR-Cas++, which can be found at (28). CRISPR spacers were aligned to genomes via the NCBI BLAST web interface(29) using the BLASTn algorithm, excluding Eukaryotes (taxid:2759) as well as the organism that encodes the CRISPR system. All other manipulations of sequence information and prediction of PAM sequences were completed using Spacer2PAM, which is available at INSERT URL TO GITHUB ONCE PUBLIC. Spacer2PAM requires the following dependencies: dplyr, ggplot2, ggseqlogo(30), taxonomizr, HelpersMG, httr, jsonlite, spatstat.utils, and seqinr. Prophage prediction uses the Phaster API(31). More information about Spacer2PAM can be found in the program documentation.

Plasmid Construction

All individual plasmids and libraries in this work were generated by two-piece Gibson assembly using the GeneArt Seamless Plus kit. Linear backbone was generated by PCR of pMTL82254 using Kapa DNA polymerase Master Mix and purified by gel electrophoresis followed by extraction with the Zymoclean Gel DNA Recovery Kit. Linear dsDNA gBlocks ordered from IDT containing the PAM sequence upstream of C. autoethanogenum CRISPR array 1 spacer 19 were used as inserts. Gibson assembly products were transformed into chemically competent One Shot™ MAX Efficiency™ DH10B T1 Phage-Resistant Cells using standard procedures. DNA sequence was confirmed by Illumina MiSeq sequencing using V2 and V3 chemistry.

Spacer2PAM-informed PAM Prediction Screening

Spacer2PAM was applied to the type I-B CRISPR system of C. autoethanogenum using the comprehensive method. The top 25% of high-scoring PAM predictions were used to determine a set of 16 four-nucleotide PAM sequences that are likely to be functional. The Spacer2PAM-informed, unpooled PAM library constructs were transformed into E. coli HB101 carrying R702(32) (CA434(33)) in parallel. Conjugation of library members into C. autoethanogenum DSM 19630, a derivative of type strain DSM 10061, was performed as described earlier(33, 34) using erythromycin (250 µg/mL) and clarithromycin (5 µg/mL) for plasmid selection in E. coli and C. autoethanogenum, respectively, and trimethoprim (10 µg/mL) as counter selection against E. coli CA434. Optical density of donor E. coli cultures was measured prior to addition to C. autoethanogenum cells. Transconjugant colonies were counted following 4 days of incubation at 37°C under 1.7 × 10⁵ Pa gas (55% CO, 10% N2, 30% CO2, and 5% H2) in gas-tight jars. This was performed in biological triplicate, with 3 separate cultures of donor E. coli conjugated to aliquots of a single C. autoethanogenum culture.
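The top-quartile selection step can be illustrated with a minimal Python sketch. It assumes the predictions are available as (PAM, score) pairs; the example PAMs and scores are invented for illustration, `top_fraction` is a hypothetical helper and not part of Spacer2PAM, and the text does not describe how the selected predictions were then reduced to the final 16 sequences.

```python
import math


def top_fraction(predictions, fraction=0.25):
    """Return the highest-scoring `fraction` of (pam, score) pairs."""
    if not predictions:
        return []
    # Rank from highest to lowest score; ties keep their input order.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Invented example data, not taken from the study.
example = [
    ("NTCA", 9.1), ("NNCA", 7.4), ("NTCN", 6.8), ("NNNA", 3.2),
    ("NANN", 2.9), ("NNGN", 2.1), ("NCNN", 1.5), ("NNNT", 0.7),
]
selected = top_fraction(example)  # keeps the top 2 of 8 predictions
```

Rounding the cutoff up with `math.ceil` is a choice made here so that small prediction sets still yield at least one candidate.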

Randomized PAM Library Screening

The randomized, pooled PAM library was transformed into NEBExpress® E. coli and then purified by QIAprep Spin Miniprep Kit. An aliquot of this DNA was saved to determine PAM frequencies before exposure to the CRISPR system. Electroporation into C. autoethanogenum was performed as described previously(35, 36). Following recovery, cells were pelleted by centrifugation at 4000 × g for 10 minutes, 9.5 mL of supernatant was discarded, and cells were resuspended in 500 µL YTF. Resuspensions were split by volume and spread on YTF 1.5% agar supplemented with 5 µg/mL clarithromycin, allowed to dry for ∼30 minutes, and incubated at 37°C for 4 days under 1.7 × 10⁵ Pa gas (55% CO, 10% N2, 30% CO2, and 5% H2) in gas-tight jars. 2.5 mL of Luria broth was added to each plate and plates were scraped. Total DNA from the cell suspension was purified using the MasterPure™ Gram Positive DNA Purification Kit. PCR across the PAM and spacer was performed using Kapa DNA polymerase Master Mix followed by purification by gel electrophoresis (1.5% agarose) and extraction with the Zymoclean Gel DNA Recovery Kit. Extracts were quantified by Quant-iT (Thermo Fisher Scientific), diluted to 1 ng/µL, and prepared for sequencing following the Illumina 16S amplicon protocol starting at the Index PCR step. AMPure XP-purified libraries were quantified by Quant-iT and sequenced using MiSeq Reagent Kit V3. Frequency of each PAM was determined by counting the occurrence of each PAM next to a correct protospacer sequence within the read. Briefly, all sequence reads are searched for the presence of the C. autoethanogenum Array 1 spacer 19 sequence and are binned as forward reads, reverse reads, or reads that do not contain the spacer. For all reads in the forward and reverse bins, the 4 nucleotides immediately upstream or downstream of the spacer, respectively, are extracted. The sequences extracted from reverse reads are converted to their reverse complement to be compatible with the sequences extracted from forward reads, and the two sets of sequences are combined. The frequency of each 4-nucleotide sequence in the combined list is then counted and recorded. The frequency of each PAM was converted to a relative frequency within the total library, and the log2 fold change in relative frequency upon exposure to the CRISPR system was calculated.
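The read-binning and counting procedure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' script: the spacer is a hypothetical 10-nt stand-in because the Array 1 spacer 19 sequence is not given here, a reverse read is assumed to be one that contains the reverse complement of the spacer, and all function names are invented. PAMs that disappear entirely after selection are reported as negative infinity rather than adjusted with a pseudocount, since the text does not mention one.

```python
import math
from collections import Counter

# Hypothetical 10-nt stand-in; the real Array 1 spacer 19 sequence is not
# given in this section.
SPACER = "ACGTTGCAAG"
PAM_LEN = 4
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_pam(read, spacer=SPACER, pam_len=PAM_LEN):
    """Return the PAM in forward orientation, or None if no spacer is found."""
    # Forward read: the PAM is the 4 nt immediately upstream of the spacer.
    idx = read.find(spacer)
    if idx >= pam_len:
        return read[idx - pam_len:idx]
    # Reverse read (assumed to carry the spacer's reverse complement): the
    # 4 nt immediately downstream are extracted and reverse-complemented.
    rc_spacer = reverse_complement(spacer)
    idx = read.find(rc_spacer)
    if idx != -1:
        end = idx + len(rc_spacer)
        flank = read[end:end + pam_len]
        if len(flank) == pam_len:
            return reverse_complement(flank)
    return None


def count_pams(reads, spacer=SPACER):
    """Count each PAM across forward and reverse reads combined."""
    counts = Counter()
    for read in reads:
        pam = extract_pam(read, spacer)
        if pam is not None:
            counts[pam] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts to fractions of the total library."""
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()} if total else {}


def log2_fold_change(before, after):
    """log2(relative frequency after / before) for each PAM seen before."""
    rel_before = relative_frequencies(before)
    rel_after = relative_frequencies(after)
    changes = {}
    for pam, freq_before in rel_before.items():
        freq_after = rel_after.get(pam, 0.0)
        # A PAM absent after selection is fully depleted: log2(0) -> -inf.
        changes[pam] = (math.log2(freq_after / freq_before)
                        if freq_after > 0 else float("-inf"))
    return changes
```

In use, `count_pams` would be run once on reads from the saved pre-exposure aliquot and once on reads from the recovered library, and the two counters passed to `log2_fold_change`. Strongly negative values would then mark PAMs depleted by CRISPR interference.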


RNA-guided nucleases from clustered regularly interspaced short palindromic repeats (CRISPR) systems expand opportunities for precise, targeted genome modification. Endogenous CRISPR systems in many bacteria and archaea are particularly attractive to circumvent expression, functionality, and unintended activity hurdles posed by heterologous CRISPR effectors. However, each CRISPR system recognizes a unique set of PAM sequences, and identifying them requires extensive screening of randomized DNA libraries. This challenge makes it difficult to develop endogenous CRISPR systems, especially in organisms that are slow-growing or have transformation idiosyncrasies. To address this limitation, we present Spacer2PAM, an easy-to-use, easy-to-interpret R package built to identify potential PAM sequences for any CRISPR system given its corresponding CRISPR array as input. Spacer2PAM can be used in “Quick” mode to generate a single PAM prediction that is likely to be functional or in “Comprehensive” mode to inform targeted, unpooled PAM libraries small enough to screen in difficult-to-transform organisms. We demonstrate Spacer2PAM by predicting PAM sequences for industrially relevant organisms and experimentally identifying seven PAM sequences that mediate interference from the Spacer2PAM-predicted PAM library for the type I-B CRISPR system from Clostridium autoethanogenum. We anticipate that Spacer2PAM will facilitate the use of endogenous CRISPR systems for industrial biotechnology and synthetic biology.
