Materials and Methods

Spacer2PAM: A computational framework to guide experimental determination of functional CRISPR-Cas system PAM sequences

Prediction of PAM sequences

All CRISPR arrays were retrieved from CRISPRCasdb, part of CRISPR-Cas++ (28). Spacers were aligned to genomes with BLAST (29), either programmatically using Spacer2PAM or manually through the web interface. The BLASTn algorithm was used, and Eukaryotes (taxid:2759) were excluded from the search database. All other manipulation of sequence information and prediction of PAM sequences were performed with Spacer2PAM (available at …). Spacer2PAM depends on the following R packages: dplyr, ggplot2, ggseqlogo (30), taxonomizr, HelpersMG, httr, jsonlite, spatstat.utils, readr and seqinr. Prophage prediction uses the PHASTER API (31).

Briefly, a Spacer2PAM analysis begins by passing the name of the CRISPR-Cas system's host organism and a user-defined identifier to setCRISPRInfo, which sets the name of the CRISPR-Cas system and defines the names of output files. The user then chooses one of two options to input the CRISPR array spacer sequence data. If starting with a FASTA file containing each spacer as an individual sequence, the user may call FASTA2DF to arrange the spacer sequences, together with user-supplied information about the CRISPR spacers (array number, length of each array, direction of each array, and the consensus repeat sequence of the array), into a dataframe suitable for downstream analysis with Spacer2PAM. We recommend that the user then call DF2FASTA to generate a new FASTA file containing all the spacer sequences. Although the input FASTA file already contains the same sequences, regenerating it ensures that the title of each sequence is compatible with downstream Spacer2PAM functions. Alternatively, a user may start with a formatted dataframe containing the headers ‘Strain’, ‘Spacers’, ‘Array.Orientation’, ‘Repeat’, ‘Array’, and ‘Spacer’ and pass it to DF2FASTA to generate a FASTA file of the spacer sequences with the appropriate labels.
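The dataframe-to-FASTA step can be illustrated with a minimal Python sketch. Spacer2PAM itself is an R package, and this is not its DF2FASTA code. The sketch assumes, hypothetically, that the ‘Spacers’ column holds the spacer sequence and that ‘Array’ and ‘Spacer’ hold the array and spacer indices; the record title format `Strain_Array<N>_Spacer<M>` is an illustrative choice, not necessarily the one DF2FASTA writes. The example row uses toy values.

```python
# Headers the text lists for a formatted Spacer2PAM spacer dataframe.
REQUIRED = ["Strain", "Spacers", "Array.Orientation", "Repeat", "Array", "Spacer"]


def rows_to_fasta(rows):
    """Return FASTA text with one record per spacer row.

    Assumption: 'Spacers' holds the sequence; 'Array' and 'Spacer' hold indices.
    The title format below is hypothetical and only keeps records traceable.
    """
    lines = []
    for row in rows:
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        lines.append(f">{row['Strain']}_Array{row['Array']}_Spacer{row['Spacer']}")
        lines.append(row["Spacers"].strip().upper())
    return "\n".join(lines) + "\n"


# Toy row (sequence and repeat are made up for illustration).
example = [{"Strain": "C_autoethanogenum", "Spacers": "gattacaggc",
            "Array.Orientation": "Forward", "Repeat": "GTTTTA",
            "Array": 1, "Spacer": 19}]
print(rows_to_fasta(example), end="")
```

Checking for all required headers up front mirrors the requirement that the dataframe carry every listed column before it is converted.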

Next, the user submits the sequences in this FASTA file to BLAST for alignment, either programmatically or manually. To send a query to the BLAST server and retrieve the result programmatically, call FASTA2Alignment with the location of the FASTA file generated by DF2FASTA. Although we recommend this method, some CRISPR-Cas systems contain so many spacers that the query exceeds the length limit of the NCBI BLAST API. In this case, FASTA2Alignment returns an error message directing the user to the BLAST web interface. When using the web interface, select the BLASTn algorithm and exclude Eukaryotes (taxid:2759). This restricts the alignment to relevant organisms and decreases the computational time of both BLAST and Spacer2PAM. Once the alignment is complete, download the resulting hit table in .CSV format and pass the file to alignmentCSV2DF to convert it to a dataframe. Performing the alignment programmatically with FASTA2Alignment generates this dataframe automatically, without the need for alignmentCSV2DF.
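For the manual route, the downloaded hit table has to be read into a table of alignments. The Python sketch below illustrates that conversion; it is not alignmentCSV2DF. It assumes the standard 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) with no header row, and borrows the BLAST+ tabular column names for the keys. Check a downloaded file against this layout before relying on it.

```python
import csv
import io

# Assumed column order of a 12-column BLAST tabular hit table.
HIT_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLS = {"pident", "evalue", "bitscore"}


def parse_hit_table(text):
    """Convert hit-table CSV text into a list of typed dicts, one per alignment."""
    hits = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(fields) != len(HIT_COLUMNS):
            raise ValueError(f"expected 12 fields, got {len(fields)}")
        hit = {}
        for name, value in zip(HIT_COLUMNS, fields):
            value = value.strip()
            if name in INT_COLS:
                hit[name] = int(value)
            elif name in FLOAT_COLS:
                hit[name] = float(value)
            else:
                hit[name] = value
        # BLAST reports subject coordinates high-to-low for minus-strand hits.
        hit["strand"] = "plus" if hit["sstart"] <= hit["send"] else "minus"
        hits.append(hit)
    return hits


# Toy row; the accession is hypothetical.
row = "Cauto_Array1_Spacer19,NZ_EXAMPLE.1,100.000,10,0,0,1,10,5031,5022,1e-05,20.3"
hits = parse_hit_table(row + "\n")
print(hits[0]["strand"], hits[0]["qend"])  # minus 10
```

Recording the strand at parse time matters later, because flanking sequence on a minus-strand hit must be reverse-complemented before PAMs from different hits can be compared.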

The resulting dataframe should then be passed to joinSpacerDFandAlignmentDF. This function joins the spacer dataframe with the alignment dataframe, assigning spacer information to each alignment in the hit table. It also uses the taxonomizr package to convert the accession number of each alignment to the genus and species name of the organism that encodes the aligned sequence. Because taxonomizr requires a local SQL database to be downloaded and set up, the user should be prepared to store this database (65 GB at the time of writing) in a location that remains accessible while joinSpacerDFandAlignmentDF runs. The resulting joined dataframe is sufficient for PAM prediction by join2PAM, but we recommend calling Submit2Phaster if the user plans to select the prophage prediction option in join2PAM. Submit2Phaster submits a nonredundant list of accession numbers from the joined dataframe to the PHASTER prophage prediction web server. Depending on traffic on the PHASTER server, prediction can take anywhere from minutes to weeks to complete.
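The join and the nonredundant accession list can be sketched in Python as follows. This is an illustration under stated assumptions, not joinSpacerDFandAlignmentDF or Submit2Phaster: alignments are dicts keyed as in a BLAST tabular hit table (`qseqid`, `sseqid`), spacer metadata is a dict keyed by FASTA record title, and the taxonomizr name lookup and the PHASTER submission are not reproduced.

```python
def join_spacers_and_hits(spacers, hits):
    """Attach spacer metadata to each alignment whose query title matches a spacer.

    `spacers` maps FASTA record titles to metadata dicts; alignments whose query
    has no entry are dropped (a choice made for this sketch).
    """
    joined = []
    for hit in hits:
        info = spacers.get(hit["qseqid"])
        if info is not None:
            joined.append({**hit, **info})
    return joined


def nonredundant_accessions(joined):
    """Unique subject accessions in first-seen order, e.g. for a prophage query."""
    return list(dict.fromkeys(row["sseqid"] for row in joined))


# Toy data; accessions are hypothetical placeholders.
spacers = {"Cauto_Array1_Spacer19": {"Array": 1, "Spacer": 19}}
hits = [{"qseqid": "Cauto_Array1_Spacer19", "sseqid": "ACC_A"},
        {"qseqid": "Cauto_Array1_Spacer19", "sseqid": "ACC_B"},
        {"qseqid": "unknown", "sseqid": "ACC_C"},
        {"qseqid": "Cauto_Array1_Spacer19", "sseqid": "ACC_A"}]
joined = join_spacers_and_hits(spacers, hits)
print(len(joined), nonredundant_accessions(joined))  # 3 ['ACC_A', 'ACC_B']
```

Deduplicating accessions before submission keeps each genome from being queued for prophage prediction more than once, which matters given how long PHASTER jobs can take.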

Lastly, the joined dataframe is passed to join2PAM, the core function of Spacer2PAM, which predicts a PAM sequence from the alignments generated by BLAST. Multiple combinations of filter sets can be run sequentially in a single call of join2PAM to enrich for alignments to likely protospacers. The filtered alignments are then used to identify the genomes encoding each putative protospacer and the locations of potential PAMs. The algorithm harvests these potential PAM sequences by taking the sequence upstream and downstream of each alignment, based on the position of the alignment relative to the spacer and the user-defined flank length. This harvesting procedure accounts for alignments that do not include the ends of the spacer and adjusts the harvested sequence accordingly so that PAMs are not shifted. The harvested sequences are then used to identify significant nucleotide positions and the frequent nucleotide identities at those positions, generating a PAM prediction. The output of join2PAM is the dataframe ‘collectionFrame’, which summarizes the filtering process and records the predicted upstream and downstream PAMs together with their associated PAM scores.
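The flank-harvesting step, including the correction for alignments that miss the spacer ends, can be sketched as follows. This Python sketch is not the join2PAM implementation. It assumes 1-based inclusive BLAST coordinates and a subject sequence already retrieved as a string, and it reports flanks on the strand whose sequence matches the spacer. The per-position tally at the end only counts nucleotides; the significance test and PAM score used by join2PAM are not reproduced.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def harvest_flanks(subject, hit, spacer_len, flank):
    """Return (upstream, downstream) flanks of the full-length protospacer, or None.

    Unaligned spacer ends (qstart > 1 or qend < spacer_len) are added back so
    flanks start at the true protospacer boundaries, not the alignment's.
    """
    missing_5 = hit["qstart"] - 1          # spacer bases before the alignment
    missing_3 = spacer_len - hit["qend"]   # spacer bases after the alignment
    if hit["sstart"] <= hit["send"]:       # plus-strand hit
        lo, hi = hit["sstart"] - missing_5, hit["send"] + missing_3
    else:                                  # minus-strand hit
        lo, hi = hit["send"] - missing_3, hit["sstart"] + missing_5
    left, right = lo - 1 - flank, hi + flank   # 0-based slice bounds
    if left < 0 or right > len(subject):
        return None                        # flank would run off the sequence
    window = subject[left:right].upper()
    if hit["sstart"] > hit["send"]:
        window = revcomp(window)           # orient to the spacer-matching strand
    return window[:flank], window[-flank:]


def position_counts(flanks):
    """Nucleotide counts at each position across equal-length harvested flanks."""
    return [Counter(column) for column in zip(*flanks)]


# Toy subject: spacer "GATTACAGGC" at positions 9-18, aligned only over spacer bases 3-8.
subject = "GGGGTTTT" + "GATTACAGGC" + "CCCCAAAA"
partial = {"qstart": 3, "qend": 8, "sstart": 11, "send": 16}
print(harvest_flanks(subject, partial, spacer_len=10, flank=4))  # ('TTTT', 'CCCC')
```

Extending the alignment to the full spacer before slicing is what keeps the PAM from being shifted: without it, the partial hit above would return part of the spacer as its "upstream" flank.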

A template R script is provided in Supplementary File 1 to guide users on how to assemble a PAM prediction workflow using Spacer2PAM.

Plasmid construction

All individual plasmids and libraries in this work were generated by two-piece Gibson assembly using the GeneArt Seamless Plus kit. The linear backbone was generated by PCR amplification of pMTL82254 with Kapa DNA polymerase Master Mix, followed by gel electrophoresis and extraction with the Zymoclean Gel DNA Recovery Kit. Linear dsDNA gBlocks ordered from IDT, containing the PAM sequence upstream of C. autoethanogenum CRISPR array 1 spacer 19, were used as inserts. Gibson assembly products were transformed into chemically competent One Shot™ MAX Efficiency™ DH10B T1 Phage-Resistant Cells using standard procedures. DNA sequences were confirmed by Illumina MiSeq sequencing with V2 and V3 chemistry. All oligonucleotides and plasmids used in this study are listed in Supplementary Table S1.

Spacer2PAM-informed PAM prediction screening

Spacer2PAM was applied to the type I-B CRISPR-Cas system of C. autoethanogenum using the ‘Comprehensive’ method. The top 25% highest-scoring PAM predictions were used to determine a set of 16 four-nucleotide PAM sequences likely to be functional (Supplementary Table S3). The unpooled, Spacer2PAM-informed PAM library constructs were transformed in parallel into E. coli HB101 carrying R702 (32) (CA434 (33)). Conjugation of library members into C. autoethanogenum DSM 19630, a derivative of type strain DSM 10061, was performed as described previously (33,34), using erythromycin (250 μg/mL) and clarithromycin (5 μg/mL) for plasmid selection in E. coli and C. autoethanogenum, respectively, and trimethoprim (10 μg/mL) for counterselection against E. coli CA434. The optical density of donor E. coli cultures was measured before addition to C. autoethanogenum cells. Transconjugant colonies were counted after 4 days of incubation at 37°C under 1.7 × 10⁵ Pa gas (55% CO, 10% N₂, 30% CO₂, and 5% H₂) in gas-tight jars. This was performed in biological triplicate, with 3 separate cultures of donor E. coli conjugated to aliquots of a single C. autoethanogenum culture.
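How the top-scoring predictions were converted into the 16 library members is not detailed here. As a purely illustrative Python sketch, assuming the predictions are available as (PAM, score) pairs written with IUPAC degenerate codes (an assumption, not something the text states), the top 25% can be selected by rank and each degenerate PAM expanded into the concrete sequences it matches. Both function names are hypothetical.

```python
import math
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one matches.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def top_fraction(predictions, fraction=0.25):
    """Highest-scoring (pam, score) pairs; rounds the kept count up."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


def expand_iupac(pam):
    """All concrete sequences matched by a degenerate IUPAC PAM string."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in pam.upper()))]


# Toy predictions with made-up scores.
preds = [("YCNA", 1.0), ("TCNA", 4.0), ("NCCA", 3.0), ("ACGT", 2.0)]
print(top_fraction(preds))  # [('TCNA', 4.0)]
print(expand_iupac("YCN"))
```

Rounding the kept count up guarantees at least one prediction survives when the list is short; a percentile threshold on the score is an equally reasonable reading of "top 25%".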

Randomized PAM library screening

The randomized, pooled PAM library was transformed into NEBExpress® E. coli and then purified with the QIAprep Spin Miniprep Kit. An aliquot of this DNA was saved to determine PAM frequencies before exposure to the CRISPR-Cas system. Electroporation into C. autoethanogenum was performed as described previously (35,36). Following recovery, cells were pelleted by centrifugation at 4,000 × g for 10 minutes, 9.5 mL of supernatant was discarded, and the cells were resuspended in 500 μL YTF. Resuspensions were split by volume and spread on YTF 1.5% agar supplemented with 5 μg/mL clarithromycin, allowed to dry for ∼30 minutes, and incubated at 37°C for 4 days under 1.7 × 10⁵ Pa gas (55% CO, 10% N₂, 30% CO₂, and 5% H₂) in gas-tight jars. Luria broth (2.5 mL) was added to each plate, and the plates were scraped. Total DNA from the cell suspension was purified using the MasterPure™ Gram Positive DNA Purification Kit. The region spanning the PAM and spacer was PCR-amplified using Kapa DNA polymerase Master Mix, then purified by gel electrophoresis (1.5% agarose) and extracted with the Zymoclean Gel DNA Recovery Kit. Extracts were quantified by Quant-iT (Thermo Fisher Scientific), diluted to 1 ng/μL, and prepared for sequencing following the Illumina 16S amplicon protocol, starting at the Index PCR step. AMPure XP-purified libraries were quantified by Quant-iT and sequenced using the MiSeq Reagent Kit V3. The frequency of each PAM was determined by counting its occurrences next to a correct protospacer sequence within the reads. Briefly, all sequence reads are searched for the C. autoethanogenum array 1 spacer 19 sequence and binned as forward reads, reverse reads, or reads that do not contain the spacer. For reads in the forward and reverse bins, the 4 nucleotides immediately upstream or downstream of the spacer, respectively, are extracted. The sequences extracted from reverse reads are converted to their reverse complement to match the orientation of those from forward reads, and the two sets are combined. The frequency of each 4-nucleotide sequence in the combined list is then counted and recorded. Each PAM frequency was converted to a relative frequency within the total library, and the log2-fold change in relative frequency upon exposure to the CRISPR-Cas system was calculated.
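The read-processing and fold-change steps described above can be sketched in Python as follows. This is a minimal sketch, not the authors' analysis code: it assumes reads are plain strings (for example, parsed from FASTQ beforehand), uses exact string matching for the spacer, and adds a pseudocount (an assumption not stated in the protocol) so that a PAM absent from one library does not produce log2(0). The spacer below is a toy sequence, not the real array 1 spacer 19.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_pams(reads, spacer, pam_len=4):
    """Count PAMs next to the spacer, oriented as in forward reads."""
    spacer = spacer.upper()
    spacer_rc = revcomp(spacer)
    pams = []
    for read in reads:
        read = read.upper()
        i = read.find(spacer)
        if i != -1:                        # forward bin: PAM immediately upstream
            if i >= pam_len:
                pams.append(read[i - pam_len:i])
            continue
        j = read.find(spacer_rc)
        if j != -1:                        # reverse bin: PAM immediately downstream
            end = j + len(spacer_rc)
            if end + pam_len <= len(read):
                pams.append(revcomp(read[end:end + pam_len]))
        # reads containing neither orientation are ignored
    return Counter(pams)


def log2_fold_change(before, after, pseudocount=1):
    """log2(relative frequency after / before) for PAMs in either Counter."""
    pams = set(before) | set(after)
    total_b = sum(before.values()) + pseudocount * len(pams)
    total_a = sum(after.values()) + pseudocount * len(pams)
    return {pam: math.log2(((after[pam] + pseudocount) / total_a)
                           / ((before[pam] + pseudocount) / total_b))
            for pam in pams}


spacer = "GATTACAGGC"  # toy sequence
reads = ["TTTTCCGA" + spacer + "AA", "GCCTGTAATC" + "TCGG" + "AAA", "ACGTACGT"]
print(extract_pams(reads, spacer))  # Counter({'CCGA': 2})
```

PAMs that enable interference are depleted after exposure to the CRISPR-Cas system, so they show up with negative log2-fold changes relative to the saved pre-exposure aliquot.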

RNA-guided nucleases from CRISPR-Cas systems expand opportunities for precise, targeted genome modification. Endogenous CRISPR-Cas systems in many prokaryotes are attractive because they circumvent the expression, functionality, and unintended-activity hurdles posed by heterologous CRISPR-Cas effectors. However, each CRISPR-Cas system recognizes a unique set of protospacer adjacent motifs (PAMs), which must be identified by extensive screening of randomized DNA libraries. This challenge hinders the development of endogenous CRISPR-Cas systems, especially those based on multi-protein effectors and in organisms that are slow-growing or have transformation idiosyncrasies. To address this challenge, we present Spacer2PAM, an easy-to-use, easy-to-interpret R package built to predict and guide experimental determination of functional PAM sequences for any CRISPR-Cas system, given its corresponding CRISPR array as input. Spacer2PAM can be used in a ‘Quick’ method to generate a single PAM prediction or in a ‘Comprehensive’ method to inform targeted PAM libraries small enough to screen in difficult-to-transform organisms. We demonstrate Spacer2PAM by predicting PAM sequences for industrially relevant organisms and by experimentally identifying, from the Spacer2PAM-informed PAM library, seven PAM sequences that mediate interference by the type I-B CRISPR-Cas system of Clostridium autoethanogenum. We anticipate that Spacer2PAM will facilitate the use of endogenous CRISPR-Cas systems for industrial biotechnology and synthetic biology.
