Cloning, expression and purification
The cas1 (PF1117) and cas2 (PF1118) genes were amplified from Pyrococcus furiosus COM1 genomic DNA, individually cloned into pET21d expression vectors, and transformed into the Escherichia coli BL21 RIPL strain. Cultures were grown at 37°C in 200 ml of Luria broth to an OD600 of 0.4–0.6, and expression of the C-terminal 6× histidine-tagged proteins was induced with 1 mM IPTG during overnight growth at room temperature. Harvested cells were lysed in 40 mM Tris, 500 mM KCl, 10% glycerol, 5 mM imidazole, pH 7.5. Following thermal precipitation at 70°C for 30 min, the cell lysate was centrifuged at 14 000 rpm at 4°C for 30 min and the soluble fraction was collected and filtered (Millex filter unit, 0.8 μm pore size, Millipore). His-tagged proteins were purified by Ni2+ affinity column chromatography, using a stepwise increase of imidazole (10, 20, 50, 100, 250 and 500 mM) in 40 mM Tris, 500 mM KCl, 10% glycerol, pH 7.5. Peak elution fractions were dialyzed using Slide-A-Lyzer mini dialysis cassettes (Thermo Fisher) into 40 mM Tris, 200 mM KCl, 10% glycerol, pH 7.5, and stored at 4°C prior to use in functional assays. The C-terminal 6× histidine-tagged proteins were confirmed to be functional for adaptation in vivo (data not shown), ruling out potential detrimental effects of the tag.
DNA substrate preparation
DNA oligonucleotides were obtained from Eurofins Genomics (for minimal CRISPR substrates and PCR primers) and Integrated DNA Technologies (for pre-spacer DNA and hairpin CRISPR substrates). Oligonucleotides used to make pre-spacers and CRISPR substrates were separated by 15% denaturing polyacrylamide gel electrophoresis in 1× TBE, detected by ethidium bromide staining, and the band of the expected oligonucleotide size was excised. Oligonucleotides were eluted from the gel slices overnight at 4°C in 500 μl of elution buffer (0.5 M ammonium acetate, 1 mM EDTA (pH 8.0), and 0.1% SDS), extracted with phenol/chloroform/isoamyl alcohol (pH 8.0), ethanol precipitated, and resuspended in 20 mM Tris, 100 mM KCl, 5% glycerol, pH 7.5. Complementary oligonucleotides were annealed by incubating at 95°C for 5 min followed by slow cooling to 23°C. Annealing was confirmed by 10% non-denaturing polyacrylamide gel electrophoresis in 1× TBE. Pre-spacers and half-site CRISPR substrates were 5′ radiolabeled with T4 polynucleotide kinase (PNK) and γ-32P ATP. In the case of the half-site substrates, a second gel extraction and precipitation were performed after annealing. Oligonucleotide sequences can be found in Supplementary Table S1.
Spacer integration assay
Unless stated otherwise, Cas1 and Cas2 (1 μM each), 1 mM DTT and 10 mM MgCl2 (final concentrations) were incubated in reaction buffer (20 mM Tris, 100 mM KCl, 5% glycerol, pH 7.5) at 4°C for 1 h. Radiolabeled pre-spacer (20 nM) or unlabeled pre-spacer (100 nM) was then added and the reactions were incubated at 4°C for an additional 15 min. Finally, plasmid or linear DNA was added to a final concentration of 5 nM for pCR7, or 100 nM for minimal CRISPR substrate, and the reaction was incubated at 70°C for 1 h. Reactions with pCR7 were quenched with EDTA and proteinase K (Life Technologies). Products were mixed with gel loading dye (purple, NEB) and separated on 1% agarose gels in 1× TAE. Reactions with minimal CRISPR substrates were quenched with an equal volume of Gel Loading Buffer II (Thermo Fisher) and 25 mM EDTA. Samples were boiled for 5 min before separation by 12% denaturing 7 M urea-containing polyacrylamide gel electrophoresis in 1× TBE. Gels were dried and radioactivity was detected by phosphorimaging (Storm 840 Scanner, GE Healthcare).
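As an illustration of how the final concentrations above translate into pipetting volumes, the following is a minimal sketch of the standard dilution relation C1·V1 = C2·V2. The 20 μl reaction volume and all stock concentrations are hypothetical placeholders and are not taken from this study.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Volume of stock (in ul) needed to reach final_conc in the reaction.

    Uses C1 * V1 = C2 * V2; final_conc and stock_conc must share a unit.
    """
    if stock_conc <= 0 or final_conc < 0:
        raise ValueError("concentrations must be positive")
    if final_conc > stock_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * reaction_volume_ul / stock_conc


# Hypothetical example: a 20 ul reaction assembled from assumed stocks.
REACTION_UL = 20.0  # assumption, not stated in the text
components = {
    # name: (final concentration, assumed stock concentration), same unit per pair
    "Cas1 (uM)": (1.0, 10.0),
    "Cas2 (uM)": (1.0, 10.0),
    "DTT (mM)": (1.0, 20.0),
    "MgCl2 (mM)": (10.0, 100.0),
}
volumes = {
    name: stock_volume_ul(final, stock, REACTION_UL)
    for name, (final, stock) in components.items()
}
```

The remaining volume would be made up with reaction buffer, pre-spacer and target DNA; those volumes depend on stock concentrations that are likewise not given here.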
Analysis of integration by high-throughput sequencing
To analyze integrations by high-throughput sequencing, the spacer integration assay was performed as described above using unlabeled pre-spacer. Following incubation, DNA was isolated using the DNA Clean and Concentrator Kit (Zymo Research, Irvine, CA, USA). For the plasmid integration samples, excess un-integrated pre-spacer was removed using Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN, USA). Next, Illumina adapter sequence with an N10 random primer was annealed to the plasmid DNA and extended (thermocycler conditions: 98°C for 30 s, 25°C for 30 s, 35°C for 30 s, 45°C for 30 s, and 72°C for 5 min). Following extension, excess adapter was removed using AMPure beads, and PCR was done to amplify plasmid DNA that contained integrated pre-spacer: forward primers were specific for the pre-spacer, while reverse primers targeted the Illumina adapter introduced with the random anneal and extension step. This amplified both full-site and half-site integration events with no discrimination. Illumina barcodes and additional adapter sequences were added with a final PCR and the resulting library was separated on a 1% agarose gel to select for DNA in a 400–700 bp size range. DNA was isolated using the Zymo Gel DNA Recovery Kit (Zymo Research, Irvine, CA, USA) and sequenced on an Illumina MiSeq in a 100 by 50 cycle run. Only the 100 bp Read 1 data was used in this analysis.
For the minimal linear CRISPR substrate products, 1 μl of eluted DNA was used as a PCR template. Primers to add Illumina adaptor sequences were annealed to the newly integrated spacer and the 3′ end of either the plus or minus strand of the CRISPR substrate. DNA Clean and Concentrator Kit (Zymo Research, Irvine, CA) was used to isolate the PCR product, and 1 μl of this product was used as the template for a second PCR using primers to add Illumina barcodes. These products were purified on a 1% agarose gel and extracted with a Gel Purification Kit (Zymo Research, Irvine, CA).
Mapping integration events
After sequencing, samples were de-multiplexed by barcode and analyzed to determine sites of integration. For plasmid data, the complete pre-spacer sequence was located in each read and 50 bp of sequence immediately downstream from the end of the pre-spacer was extracted. These 50 bp sequences were aligned to the appropriate plasmid reference using Bowtie (47). To visualize the distribution of integration events, alignment output files were converted into coverage files using bedtools (48) and displayed on a custom UCSC genome browser track hub (https://www.genome.ucsc.edu). An initial inspection of the integration tracks revealed that large peaks occurred outside of the CRISPR arrays and data suggested that these peaks were due to particular sequence and spatial features. These trends were both analyzed in an unbiased plasmid-wide manner. To determine sequence preferences at the sites of integration, the base at the integration point, along with upstream and downstream context sequence, was extracted from the reference sequence (bedtools) and used to make sequence logos (49). For spacing trends, we took the browser track files and assessed the distances between two large peaks occurring on the same strand or on opposite strands. To do this, 500 random 50 bp intervals (‘windows’) were selected for each plasmid. Within each of these windows, the two highest peaks on the plus strand and the minus strand were identified and the bp distance between these peaks was determined (highest to second highest on the plus strand, highest to second highest on the minus strand, highest plus strand to highest minus strand). Distance values from all 500 windows were then binned and counted. For the minimal linear CRISPR integration data, the spacer-target junction was determined from each read and counts for each potential integration point were totaled.
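The two computational steps described above, extraction of the 50 bp flank from each read and the window-based peak spacing analysis, can be sketched as follows. This is an illustrative reconstruction and not the scripts used in the study. It assumes that per-base integration counts for each strand are available as plain Python lists, that distances are reported as absolute values, that ties between equal peaks are broken by position, and that the plasmid is treated as linear. The function names and the fixed random seed are hypothetical.

```python
import random
from collections import Counter


def downstream_of_prespacer(read, prespacer, flank=50):
    """Return the `flank` bases immediately after the pre-spacer.

    Returns None if the complete pre-spacer is absent or if the read is
    too short to supply the full flank.
    """
    start = read.find(prespacer)
    if start == -1:
        return None
    end = start + len(prespacer)
    flank_seq = read[end:end + flank]
    return flank_seq if len(flank_seq) == flank else None


def top_two(coverage, start, width):
    """Positions of the highest and second-highest values in one window."""
    positions = range(start, start + width)
    ranked = sorted(positions, key=lambda p: coverage[p], reverse=True)
    return ranked[0], ranked[1]


def peak_distances(plus, minus, n_windows=500, width=50, seed=0):
    """Bin peak-to-peak distances over randomly placed windows.

    `plus` and `minus` are per-base integration counts for each strand.
    For each window, three distances are recorded: highest to second
    highest on the plus strand, the same on the minus strand, and
    highest plus-strand peak to highest minus-strand peak.
    """
    rng = random.Random(seed)
    counts = {
        "plus_plus": Counter(),
        "minus_minus": Counter(),
        "plus_minus": Counter(),
    }
    for _ in range(n_windows):
        start = rng.randrange(len(plus) - width + 1)
        p1, p2 = top_two(plus, start, width)
        m1, m2 = top_two(minus, start, width)
        counts["plus_plus"][abs(p1 - p2)] += 1
        counts["minus_minus"][abs(m1 - m2)] += 1
        counts["plus_minus"][abs(p1 - m1)] += 1
    return counts
```

Windows that contain no integration events are still counted in this sketch, because the text does not specify how such windows were handled; a real analysis would probably filter them or apply a minimum peak height.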
Article Title: CRISPR repeat sequences and relative spacing specify DNA integration by Pyrococcus furiosus Cas1 and Cas2
Acquiring foreign spacer DNA into the CRISPR locus is an essential primary step of the CRISPR–Cas pathway in prokaryotes for developing host immunity to mobile genetic elements. Here, we investigate spacer integration in vitro using proteins from Pyrococcus furiosus and demonstrate that Cas1 and Cas2 are sufficient to accurately integrate spacers into a minimal CRISPR locus. Using high-throughput sequencing, we identified high frequency spacer integration occurring at the same CRISPR repeat border sites utilized in vivo, as well as at several non-CRISPR plasmid sequences which share features with repeats. Analysis of non-CRISPR integration sites revealed that Cas1 and Cas2 are directed to catalyze full-site spacer integration at specific DNA stretches where guanines and/or cytosines are 30 base pairs apart and the intervening sequence harbors several positionally conserved bases. Moreover, assaying a series of CRISPR repeat mutations, followed by sequencing of the integration products, revealed that the specificity of integration is primarily directed by sequences at the leader-repeat junction as well as an adenine-rich sequence block in the mid-repeat. Together, our results indicate that P. furiosus Cas1 and Cas2 recognize multiple sequence features distributed over a 30 base pair DNA region for accurate spacer integration at the CRISPR repeat.