Genomewide novice - problems

Hi all

I am trying to analyse a genome-wide phenotypic screen using the GeckoHuman A & B libraries. Everything seemed to go well: the library screening passed all QC, and the radiation exposure used for the phenotypic selection also seemed to work.

Having sequenced the library, I've found few, if any, reads that map to an sgRNA. We're sequencing on a NextSeq, and typical FASTQ output looks like this:

`
@NB501034:182:HCCK3AFXY:4:21501:21741:9233 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGGCGTCAATGTATTGTCGCTGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEE<EEEAEEEEEEE<EEEEAAEEEAA/
@NB501034:182:HCCK3AFXY:1:21112:8588:5801 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGTTCTCCCCGGCCCCCCCTCGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEE/EEEAEAAEEEEEEE<<AEAEEEEEEAE/EAAEE/E/E/
@NB501034:182:HCCK3AFXY:4:11409:7798:14597 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGAGAGAGACAAAATTAAGAAGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEE
@NB501034:182:HCCK3AFXY:2:21305:10374:5874 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGGGGGGGGGGGGCGGGGGGGGGTTAGCGGTAGAAATAG
+
/AAAAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEAE/EEEEEEEEEEEEEEEE/EEEEEEEEAAA////////EE////
@NB501034:182:HCCK3AFXY:2:21208:17187:19017 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGCTGAAGGAGGGGGGTTTGCGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEAEEEEEAEEEEEEEEEEE
@NB501034:182:HCCK3AFXY:1:11308:21483:17461 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGTGTGTTCGTTAGGGCAAAAGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB501034:182:HCCK3AFXY:4:11604:15592:12169 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGTTTATCTCTTGTGTTTGTTGGGTTAGAGCTAGAAATA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB501034:182:HCCK3AFXY:4:21502:2766:9061 1:N:0:CAAGGCGA
ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCGCCCACAGCCCTCCCCACATGGGTTAGAGCTAGAAATA
`

I can identify the 5' adapter sequence but little else, and BLASTing the reads seems to return the vector sequence! I am a bioinformatician at heart, so I struggle with this, but I would be grateful if someone could point out where I might have gone wrong.


Fuchsia Kelpie, almost 4 years ago

I actually see that most of your reads map to sgRNAs that are likely from the GeCKO libraries, and the sequencing quality is pretty good. Your NGS reads should have part of the adapter sequence, followed by the primer sequence that binds to the U6 promoter in the plasmid, the variable 20-bp sgRNA target sequence, and part of the sgRNA scaffold.

Could you try using the Python script provided in Joung et al., Nature Protocols 2017, to map your NGS data? If you did not use the primers described in the paper, the sgRNA will sit at a different offset in your reads, so you may need to modify the "key region start" and "key region end" parameters.
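To illustrate the idea, here is a minimal sketch of key-based guide extraction. It is not the script from the paper, just a rough equivalent of the general approach: look for a fixed "key" sequence from the end of the U6 promoter within a window of the read, then take the next 20 bases as the candidate sgRNA. The key `CGAAACACCG` is an assumption read off the sequences posted above (every read contains `GGACGAAACACCG`), and whether the final `G` belongs to the key or to the guide depends on how your library reference lists its sequences. The function names and the `region_start`/`region_end` arguments are hypothetical.

```python
from collections import Counter

# Assumption: fixed sequence immediately upstream of the guide, taken from
# the reads in the question. Adjust it to match your own vector and primers.
KEY = "CGAAACACCG"
GUIDE_LENGTH = 20


def extract_guide(read, key=KEY, guide_length=GUIDE_LENGTH,
                  region_start=0, region_end=None):
    """Return the bases that follow `key`, or None if no full guide is found.

    `key` is only searched for within read[region_start:region_end], which
    plays the role of a "key region" window.
    """
    region = read[region_start:region_end]
    idx = region.find(key)
    if idx < 0:
        return None
    start = region_start + idx + len(key)
    guide = read[start:start + guide_length]
    # Reject reads that end before a full-length guide could be read.
    return guide if len(guide) == guide_length else None


def count_guides(reads, library=None, **kwargs):
    """Count extracted guides; if `library` is given, keep perfect matches only."""
    counts = Counter()
    for read in reads:
        guide = extract_guide(read, **kwargs)
        if guide is None:
            continue
        if library is not None and guide not in library:
            continue
        counts[guide] += 1
    return counts


# First read from the question.
read = ("ATCATGCTTAGCTTTATATATCTTGTGGAAAGGACGAAACACCG"
        "GCGTCAATGTATTGTCGCTGGGTTAGAGCTAGAAATA")
print(extract_guide(read, region_start=30, region_end=55))
```

If the extracted 20-mers then match entries in your library reference, the reads are fine and only the offsets in the mapping step need adjusting.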
