How to correctly set "KEY" in count_spacers.py

I am new to this field. To practice before running a real experiment, I read the Nature Protocols paper (DOI: 10.1038/nprot.2017.016) and downloaded its Python code. For test data, I downloaded the FASTQ file (https://www.ncbi.nlm.nih.gov/sra/SRX4589873) from a published paper (DOI: 10.1016/j.stem.2018.09.003).

I ran the following command: "python ./count_spacers.py -f NGS.fastq -i library_sequences.csv". However, no guides were counted. This is the result in statistics.txt:

Number of perfect guide matches: 0

Number of nonperfect guide matches: 1148

Number of reads where key was not found: 20908880

Number of reads processed: 20910028

Percentage of guides that matched perfectly: 0.0

Percentage of undetected guides: 100.0

Skew ratio of top 10% to bottom 10%: Not enough perfect matches to determine skew ratio

The count output (library_count.csv) is zero for every guide:

Pbx2_1 0

Pbx2_2 0

Pbx2_3 0

Pbx2_4 0

Pbx3_1 0

Pbx3_2 0

I tried setting "KEY_REGION_START = 0" and "KEY_REGION_END = 300". After some tests, I suspect the problem is caused by KEY = "TCCAGTCAC", but I do not know how to choose the correct sequence for KEY. Could you please help me run this analysis with count_spacers.py?
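To make the role of the three parameters concrete, here is a minimal sketch of a key-based lookup. It assumes, judging only from the parameter names and the statistics output, that the script searches for KEY inside read[KEY_REGION_START:KEY_REGION_END] and treats the 20 nt that follow as the guide. This has not been checked against the script itself. The function name extract_guide and the example read are hypothetical. The flank "CACCTTGTTG" is the sequence immediately before the n-run in the pSLQ1373 forward primer from Table S2.

```python
def extract_guide(read, key, region_start, region_end, guide_len=20):
    """Return the guide_len bases that follow `key`, or None if no guide is found.

    Assumption: the key is searched only inside read[region_start:region_end].
    """
    read = read.upper()
    window = read[region_start:region_end]
    pos = window.find(key.upper())
    if pos < 0:
        return None  # counted as "key was not found"
    start = region_start + pos + len(key)
    guide = read[start:start + guide_len]
    return guide if len(guide) == guide_len else None


# Hypothetical read: filler + upstream flank + a library guide + downstream flank.
read = "ACGT" * 5 + "CACCTTGTTG" + "GAAACAAGGCTACCAGACCC" + "GTTTAAGAGC"

print(extract_guide(read, "CACCTTGTTG", 0, 300))  # key inside the window
print(extract_guide(read, "CACCTTGTTG", 25, 300))  # window starts after the key
print(extract_guide(read, "TCCAGTCAC", 0, 300))   # key absent from the read
```

Under this assumption, a usable KEY has to sit directly in front of the spacer in the read. A key that is found in the read but not adjacent to the spacer would produce a "nonperfect" match at best.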

Here is the description of the primers in this paper (DOI: 10.1016/j.stem.2018.09.003).

Table S2. Primers used to construct sgRNAs, related to STAR Methods.

Primers sgRNA sequence

pSLQ1373-Forward primer gtatcccttggagaaccaccttgttgnnnnnnnnnnnnnnnnnnnngtttaagagctaagctggaaacagca

pSLQ1373-Reverse primer gatcctagtactcgagaaaaaaagcaccgactcggtgccac

sgAscl1 gccagggggtggctctagaaa

sgNgn2 gccaatcacaatagacagcgg

Table S4. Primers used to sequence sgRNAs on Illumina HiSeq 4000 and MiSeq, related to STAR Methods.

HiSeq forward primer aatgatacggcgaccaccgagatctacacagatcggaagagcacacgtctgaactccagtcacnnnnnngcacaaaaggaaactcaccct

HiSeq reverse primer caagcagaagacggcatacgagatcgactcggtgccactttttc
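One way to get KEY candidates from these tables is to read off the bases that flank the run of n in the pSLQ1373 forward primer (Table S2), since the n-run marks where the spacer sits. The sketch below does only that. The helper names are hypothetical, and whether the reads carry the flank in this orientation or as its reverse complement is an assumption that has to be checked against the FASTQ file.

```python
import re

# Copied from Table S2 (pSLQ1373 forward primer).
FORWARD_PRIMER = (
    "gtatcccttggagaaccaccttgttgnnnnnnnnnnnnnnnnnnnngtttaagagctaagctggaaacagca"
)


def flanks_of_spacer(primer, flank_len=10):
    """Return (upstream flank, downstream flank) around the first run of n."""
    match = re.search(r"n+", primer.lower())
    if match is None:
        raise ValueError("primer has no run of n placeholders")
    upstream = primer[max(0, match.start() - flank_len):match.start()].upper()
    downstream = primer[match.end():match.end() + flank_len].upper()
    return upstream, downstream


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


upstream, downstream = flanks_of_spacer(FORWARD_PRIMER)
print("candidate KEY (same strand):", upstream)
print("sequence after the spacer:  ", downstream)
print("downstream flank, reverse complement:", reverse_complement(downstream))
```

If the reads turn out to be in the opposite orientation, the spacer would be preceded by the reverse complement of the downstream flank, and the extracted 20 nt would need to be reverse complemented before being compared with the library.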

Appendix files:

Csv file of sgRNA library:

Maff_16 GAAACAAGGCTACCAGACCC

En1_2 GAAACAAACGACACCTCTGA

Bach2_3 GAAACAAAGTCTGCTCTCTG

Elf2_5 GAAACAAATACATTCCGCGG

Mir15a_6 GAAACAAATAGAGTTGAAGG

Tcf12_7 GAAACAAATCAACCCACTTT

Gata4_8 GAAACAAATGTCTCTGTCCC
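A perfect match can only be counted if the lookup table is keyed by the spacer sequence itself. The sketch below loads a name/sequence file like the excerpt above into such a table and rejects rows whose last field is not plain DNA. The function name load_library is hypothetical, and the delimiter (comma or whitespace) is an assumption. Which column layout count_spacers.py expects for its -i file is not confirmed here and should be checked in the script.

```python
import re


def load_library(lines):
    """Map spacer sequence -> guide name from 'name<sep>sequence' lines.

    Assumption: fields are separated by commas and/or whitespace, and the
    sequence is the last field on each line.
    """
    library = {}
    for lineno, line in enumerate(lines, start=1):
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        name, seq = fields[0], fields[-1].upper()
        if not re.fullmatch(r"[ACGT]+", seq):
            raise ValueError(f"line {lineno}: {seq!r} is not a DNA sequence")
        library[seq] = name
    return library


# Rows copied from the excerpt above.
rows = [
    "Maff_16 GAAACAAGGCTACCAGACCC",
    "En1_2 GAAACAAACGACACCTCTGA",
    "Bach2_3 GAAACAAAGTCTGCTCTCTG",
]
library = load_library(rows)
print(len(library), "guides, lengths:", sorted({len(s) for s in library}))
```

With a real file, the same function can be called as load_library(open("library_sequences.csv")). A header row would raise the ValueError and would have to be removed first.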

The FASTQ file looks like this:

@SRR7733488.1 1/1

NTTGGTCCTGTTTCCTCCTGTGGCCCGGAGTTTGATGTGCAGGGCAGTGATGCCCAACTCCTTGCACCTCTGGGCCACATTCTGGGCAGCCAACATCGCTGCATATGGAGAGGACTCATCTCGGTCAGCCTTCACCTTCATCCCACCAGT

+

#AAAFAJ<J<JJJJFJJJ7AJJJFJJFJF<FJJJ<FJJ-<FF<FJJJJJ--77<AJJJJFF7AF<-FJ7FJ-7<JF7FA--7F-7AJA-AJJAFF--7<7FF<AJFJ7-<JJJ<F7AFAJFJJA<J-AJAJFJAJFFJJJJJJJAJJ7AF

@SRR7733488.2 2/1

NTTAGCTCTGAGAGGTGAGCTTGTAGTAGACAAAACAAAAAGGAAAAAACGAAGAGAGCTCTCTGAAGAACAGAAGCAAGAAATTAAAGATGCTTTTGAGCTGTTTGATACCGACAAAGACCAAGCCATAGATTATCATGAATTAAAGGT
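Before tuning KEY, it may help to check whether the reads contain library spacers at all, and if so, what sequence sits directly in front of them. The sketch below slides a 20-nt window over each read in both orientations and tallies the 10 nt upstream of every hit. All function names are hypothetical. It assumes 4-line FASTQ records and a library in which every spacer is 20 nt long. The demo reads are synthetic, not taken from the SRA file.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record, skipping blank lines."""
    nonblank = (ln.strip() for ln in lines if ln.strip())
    for i, ln in enumerate(nonblank):
        if i % 4 == 1:
            yield ln.upper()


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def upstream_flanks(lines, guides, guide_len=20, flank_len=10, max_reads=100000):
    """Count (orientation, upstream flank) for every library guide found in a read."""
    flanks = Counter()
    for seq in islice(read_sequences(lines), max_reads):
        for orientation, s in (("forward", seq), ("revcomp", reverse_complement(seq))):
            for i in range(flank_len, len(s) - guide_len + 1):
                if s[i:i + guide_len] in guides:
                    flanks[(orientation, s[i - flank_len:i])] += 1
    return flanks


# Synthetic demo: one read with a guide, one read that is its reverse complement.
demo_read = "ACGTACGTAC" + "CACCTTGTTG" + "GAAACAAGGCTACCAGACCC" + "GTTTAAGAGC"
demo_fastq = [
    "@demo.1", demo_read, "+", "I" * len(demo_read),
    "@demo.2", reverse_complement(demo_read), "+", "I" * len(demo_read),
]
result = upstream_flanks(demo_fastq, {"GAAACAAGGCTACCAGACCC"})
for (orientation, flank), count in result.most_common():
    print(orientation, flank, count)
```

On a real file this would be called as upstream_flanks(open("NGS.fastq"), set_of_guides). If the tally stays empty, the reads do not contain the library spacers in either orientation, and no choice of KEY can produce perfect matches. If one flank dominates, that flank and its orientation are the natural KEY candidate.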


Peach Hippogriff, over 1 year ago

I tried another FASTQ file (SRR7733484) and got a similar result.

I tried different KEY sequences, including "CAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC", "TCCAGTCAC", "G", and "CGAAACACC" (the original KEY in the Nature Protocols paper, DOI: 10.1038/nprot.2017.016). The statistics for each KEY are listed below.

Could anyone please help me?

CAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
#Number of perfect guide matches: 0
#Number of nonperfect guide matches: 129
#Number of reads where key was not found: 21501967
#Number of reads processed: 21502096

TCCAGTCAC
#Number of perfect guide matches: 0
#Number of nonperfect guide matches: 1971
#Number of reads where key was not found: 21500125
#Number of reads processed: 21502096

G
#Number of perfect guide matches: 0
#Number of nonperfect guide matches: 21432740
#Number of reads where key was not found: 69356
#Number of reads processed: 21502096

CGAAACACC
#Number of perfect guide matches: 0
#Number of nonperfect guide matches: 423
#Number of reads where key was not found: 21501673
#Number of reads processed: 21502096

Peach Hippogriff, over 1 year ago

New update:

I tried a primer sequence from the published paper (DOI: 10.1016/j.stem.2018.09.003) that provided the FASTQ file.

First, I used grep to confirm that many reads contain the KEY I assumed ("TCCAGTCAC", taken from the end of the primer the authors used, "CAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"). Then I took the 20 nt that follow the KEY in one of those reads and added them to the sgRNA_library.csv file as a fake sgRNA ("Fake_sgRNQ", "CCAGTTCAATCTCGTATGCC").

grep "TCCAGTCAC" NGS.fastq

AACAATGCGGTTTTCTAAAGGCTTCCCCAGTGATCTCCCATCTGCCCCACCTGGACAATTACTGGACTCCCCAGATCGGAAGAGCACACGTCTGAAC TCCAGTCAC CCAGTTCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAGAAAT
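grep shows that the KEY occurs, but not where it occurs in the read, which is what KEY_REGION_START and KEY_REGION_END depend on. The sketch below tallies the first offset of a key in each read. The function name key_positions is hypothetical. The demo uses the single read from the grep output above with the spaces removed. For a real run, the sequences would come from the FASTQ file.

```python
from collections import Counter


def key_positions(sequences, key):
    """Tally the 0-based offset of the first occurrence of `key` in each read.

    Reads without the key are tallied under -1.
    """
    positions = Counter()
    key = key.upper()
    for seq in sequences:
        positions[seq.upper().find(key)] += 1
    return positions


KEY = "TCCAGTCAC"
# Read from the grep output above, rebuilt without the highlighting spaces.
grep_read = (
    "AACAATGCGGTTTTCTAAAGGCTTCCCCAGTGATCTCCCATCTGCCCCACCTGGACAATT"
    "ACTGGACTCCCCAGATCGGAAGAGCACACGTCTGAAC"
    + KEY
    + "CCAGTTCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAGAAAT"
)

for offset, count in sorted(key_positions([grep_read, "ACGTACGT"], KEY).items()):
    print("offset", offset, "reads", count)
```

If the offsets are scattered across many positions rather than concentrated at one, the key is probably not at a fixed distance from the start of the read, and a narrow key region would miss most of its occurrences.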

However, I still get an empty count result after running "python ./Scount_spacers.py -f NGS.fastq -i sgRNA_library_fake.csv".

statistics.txt

Pink Siren, about 1 year ago

I have the same issue.

Did you solve it?
