Materials and Methods

CRISPR-SE: a brute force search engine for CRISPR design

To evaluate the accuracy and performance of existing search engines, we constructed four gRNA clusters derived from the hg38 and mm10 reference genomes (Supplementary Table S1). The gRNAs were clustered by their minimum number of mismatches to all other gRNAs in the reference genome. A gRNA is called an N-mm gRNA when N is that minimum, that is, when it has exactly N mismatches with at least one other gRNA and no fewer with any other. To identify the N-mm gRNA clusters (N = 1–4), we first used CRISPR-SE to search for gRNAs with a minimum of N or more mismatches (N = 1–5) to any other gRNA. The set of gRNAs found with one or more mismatches is named the 1+mm dataset; the 2+mm to 5+mm datasets were constructed in the same way. We then constructed the 1-mm cluster by excluding all 2+mm gRNAs from the 1+mm dataset; the gRNAs in the 1-mm cluster therefore have exactly one mismatch with at least one other gRNA. Similarly, we derived the 2-mm to 4-mm clusters for the benchmark of gRNA search engines by excluding the (N+1)+mm dataset from the N+mm dataset. For example, the 4-mm gRNA TGGTGTACGATCTACTCTCG is located at chr1:858163-858182 on hg38; it has four mismatches with the gRNA TGGTGTACAATCTAGTCACA at chr18:63700374-63700393, and four or more mismatches with every other gRNA found in the hg38 reference genome. Repeated gRNAs with exact matches form the 0-mm cluster, since at least two such gRNAs share the same sequence. In the benchmark, we also validated the gRNA clusters against the off-target search results of each search engine to ensure that the clusters were constructed correctly.

Benchmark

We performed the off-target search using five common K-mer based alignment methods: BLAST, BLAT, Bowtie, Bowtie 2 and BWA; for the brute force approaches, we included FlashFry, Crisflash and CRISPR-SE. GuideScan (12) computes gRNAs genome-wide, and its estimated processing time for genome-wide gRNA design is at least three months.
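The cluster construction described under Materials and Methods amounts to a mismatch count plus a set difference. The following is a minimal sketch, not the CRISPR-SE implementation: the function names `hamming` and `derive_exact_clusters` and the toy labels `g1` to `g4` are assumptions made for illustration, and only the two 20-mer sequences come from the text.

```python
from typing import Dict, Set


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length gRNA sequences."""
    if len(a) != len(b):
        raise ValueError("gRNAs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def derive_exact_clusters(plus_sets: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Derive each N-mm cluster as the N+mm dataset minus the (N+1)+mm dataset.

    plus_sets maps N to the set of gRNAs whose minimum mismatch count to
    any other gRNA is N or more. A cluster is produced only for those N
    where the (N+1)+mm dataset is also present.
    """
    clusters: Dict[int, Set[str]] = {}
    for n in sorted(plus_sets):
        if n + 1 in plus_sets:
            clusters[n] = plus_sets[n] - plus_sets[n + 1]
    return clusters


# The 4-mm example pair quoted in the text.
example_mismatches = hamming("TGGTGTACGATCTACTCTCG", "TGGTGTACAATCTAGTCACA")

# Toy N+mm datasets with placeholder labels (hypothetical, not real gRNAs).
toy_plus = {
    1: {"g1", "g2", "g3", "g4"},
    2: {"g2", "g3", "g4"},
    3: {"g3", "g4"},
    4: {"g4"},
    5: set(),
}
toy_clusters = derive_exact_clusters(toy_plus)
```

With five N+mm datasets (N = 1–5), the set difference yields exactly the four clusters used in the benchmark (1-mm to 4-mm).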
We also excluded Cas-OffFinder because the software requires GPU hardware, and CRISPRitz because it has been reported to be slower than FlashFry. Crackling was excluded as well because it focuses on the scoring function and does not report alignment information. For each search engine, we performed the off-target search using the 1-mm to 4-mm gRNA datasets. Because of time limits, only the first 10 000 gRNAs from each cluster were used. All programs were provided with the same computational resources (8 × 2300 MHz AMD Opteron 6276 processors, up to 384 GB memory).

A gRNA has a fixed set of off-targets in a given reference genome. We therefore evaluated the accuracy of a search engine by checking whether it reports an off-target with the minimum number of mismatches, that is, whether it classifies an N-mm gRNA correctly. An incorrect classification is sufficient to show that the off-target search results are incomplete. For each of the five K-mer based alignment methods, we evaluated both accuracy and speed. For the brute force approaches, we performed only the speed test, because these methods report the same results under the same parameters as long as they are implemented correctly.

Parameters

We used the search parameters found in the publications as well as in the source code (Supplementary Table S2). For the K-mer based alignment methods, the most important parameters are ‘-a’ for Bowtie, which reports all alignments (22–28); ‘-k 100’ for Bowtie 2, which reports up to 100 alignments (29,30); ‘-N’ for BWA, to search all hits (31–34); ‘-task blastn’ for BLAST, for short sequences (35–37); and ‘-oneOff=1’ for BLAT, to trigger all alignments (38). These parameters are required for the alignment methods to perform an off-target search; they put the alignment tools in a ‘slow mode’ that forces them to report more alignments.
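The accuracy criterion described in the Benchmark section (an N-mm gRNA is classified correctly when the engine reports an off-target with the minimum number of mismatches) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the toy results are hypothetical, and the mismatch counts are assumed to exclude the gRNA's own on-target site.

```python
from typing import Dict, List


def classifies_correctly(expected_n: int, reported: List[int]) -> bool:
    """True when the smallest reported mismatch count equals the cluster's N.

    reported holds the mismatch count of every off-target the engine
    returned for one gRNA (assumed to exclude the on-target site itself).
    An empty list, or a minimum different from N, counts as incorrect.
    """
    return len(reported) > 0 and min(reported) == expected_n


def accuracy(expected_n: int, results: Dict[str, List[int]]) -> float:
    """Fraction of gRNAs in an N-mm cluster that the engine classifies correctly."""
    if not results:
        raise ValueError("no gRNAs to evaluate")
    correct = sum(classifies_correctly(expected_n, m) for m in results.values())
    return correct / len(results)


# Hypothetical engine output for three gRNAs from a 2-mm cluster.
toy_results = {
    "gA": [2, 3, 4],  # closest off-target found: correct
    "gB": [3, 4],     # the 2-mismatch off-target was missed: incorrect
    "gC": [],         # nothing reported: incorrect
}
toy_accuracy = accuracy(2, toy_results)
```

In this toy case one of three gRNAs is classified correctly; the two failures illustrate why a single misclassification already shows that the reported off-target list is incomplete.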
We also extended the query gRNAs from 20-mers to 23-mers by appending each of the four nucleotides followed by GG. The extension avoids the ‘query sequence too short’ error, and all four nucleotides were enumerated because not all search engines (Bowtie, for example) accept the wildcard nucleotide ‘N’ in the input. For the brute force approaches, we used the default parameters of each method.
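The 20-mer to 23-mer extension can be illustrated with a short sketch. The function name `extend_with_pam` is an assumption made for this example; the query sequence is the 4-mm gRNA quoted earlier in this section.

```python
from typing import List


def extend_with_pam(grna: str) -> List[str]:
    """Expand a 20-mer gRNA into four 23-mer queries: gRNA + {A, C, G, T} + GG.

    Enumerating the four nucleotides stands in for the wildcard 'N',
    which some aligners do not accept in the query.
    """
    if len(grna) != 20:
        raise ValueError("expected a 20-mer gRNA")
    return [grna + nucleotide + "GG" for nucleotide in "ACGT"]


queries = extend_with_pam("TGGTGTACGATCTACTCTCG")
for query in queries:
    print(query)
```

Each 20-mer therefore contributes four queries to an aligner run, so the 10 000 gRNAs taken from a cluster correspond to 40 000 query sequences under this scheme.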

Abstract

We built a web interface with pre-computed gRNAs for the human and mouse genomes. All scripts and results are available online at http://renlab.sdsc.edu/CRISPR-SE/. The source code is available at https://github.com/bil022/CRISPR-SE.

