Design of the sgRNA library
We retrieved 92 sentinel genetic variants associated with coronary artery disease (CAD) at genome-wide significant levels (P-value ≤5×10−8) from four GWAS meta-analyses available at the time of the design of this experiment5,6,18,19. For the design of the sgRNA library, we included all sentinel variants as well as variants in strong LD (r_2 >0.8 in the 1000 Genomes Project European-ancestry populations). For each variant - sentinel and LD proxy - we identified all possible sgRNA in a 100-bp window centered on the variant itself. We prioritized sgRNA with the highest predicted quality using the CRISPR OffTarget Tool (version 2.0.3)43 with a Targeting_guide_score ≥ 20 and the “matches with 0 mismatches” = 1 and “matches with 1 mismatch” = 0 settings. We discarded sgRNA that overlapped heterozygous variants, indels and/or multi-allelic variants in the teloHAEC genome (build hg19). We selected sgRNA targeting essential genes from a previously published study44. For potential positive control genes (_SELE, SELP, ICAM1, VCAM1, PECAM1, NOS3, VWF, SOD2, SOD3, GPX3, CAT, ITPR1, ITPR2, ITPR3, ATP2A2, ATP2A3, PLN, CAV1, and TRPV4), we selected sgRNA from the Human GeCKOv2 CRISPR knockout pooled library45. We also selected sgRNA that targeted the promoter (300-bp window before the transcriptional start site) of positive control genes for the CRISPRa (dCas9-VP64) experiments. For all selected loci (variants, coding sequences, gene promoters), we retained the five top scoring sgRNA for the library design. Finally, we added two sgRNA for the SELE locus (SELE_g1, SELE_g2) that we frequently use to validate TNFα stimulation. This resulted in a final library of 8051 sgRNA (Supplementary Table 2).
The sgRNA were synthesized in duplicates by Agilent Technologies (Cat-#: G7555B) to accommodate the specific requirements of the Cas9/dCas9-KRAB and dCas9-VP64 (specific MS2 tracrRNA) experiments. We amplified each specific pool of oligonucleotides as previously described16, with the following small modifications: we performed two PCR using NebNext High fidelity Master mix (Cat-#: M0541L). The first PCR was used to amplify each pool separately using 2.5ng of pooled oligonucleotides and 500nM of each primer (for the Cas9/dCas9-KRAB library, we used U6_subpool_fwd and Guide_CM_barcode1_rev; for the dCas9-VP64 library, we used U6_subpool_fwd and Guide_MS2_Barcode2_rev). Cycling conditions for PCR1 were 98°C for 30 sec, then 15 cycles of 98°C for 10 sec; 55°C for 10 sec; 72°C for 15 sec and a final step of 72°C for 2 min and a 10°C hold. We performed the second PCR to add homologous sequences, using the U6_screen_fwd and Tracr_rev oligonucleotides for the Cas9/dCas9-KRAB library, and the U6_screen_fwd and Tracr_MS2_rev oligonucleotides for the dCas9-VP64 library, in both cases using ⅕ of PCR1 as template. Cycling conditions for PCR2 were 98°C for 30 sec, then 10 cycles of 98°C for 10 sec; 55C for 10 sec; 72C for 15 sec and a final cycle of 72C for 2 min and 10C hold. See table Supplementary Table 9 for primer details.
After gel extraction and PCR purification, we performed Gibson assembly in both respective vectors (pHKO9-Neo and lentisgRNA(MS2)-zeo backbone addgene 61427). For pHKO9-Neo, we replaced the Crimson fluorescent gene in the pHKO9-Crimson-CM vector (gift from Dan Bauer’s lab) by a neomycin resistance (NeoR) sequence. Briefly, we amplified the NeoR gene from our pCas9-Neo vector46 using BsiWI-Neo-Fwd and MluI-Neo_rev primer (Supplementary Table 9). After digestion by BsiWI and MluI, we cloned the segment in pHKO9-Crimson_CM, which had been digested with BsiWI and MluI. We amplified each library using ten independent maxi-preparations (Macherey-Nagel cat# 740424). To control the quality of both libraries, we sequenced them on an Illumina HiSeq4000 instrument and calculated the Gini index, which summarizes read distribution across sgRNA in a given pool. For a good-quality sgRNA library, the expected Gini index is ≤0.2, and we obtained Gini indexes of 0.050 and 0.052 for the Cas9/dCas9-KRAB and dCas9-VP64 library, respectively.
Engineering of teloHAEC cell lines to stably express Cas9 variants
TeloHAEC are immortalized human aortic endothelial cells obtained by over-expressing telomerase (ATCC CRL-4052). These cells have a normal female karyotype (46;XX) and exhibit many of the properties and functions of human vascular endothelial cells9. We generated our teloHAEC cells models expressing either Cas9, dCas9-KRAB or dCas9-VP64 + MPHv2 using Addgene vectors #52962, #46911, #61425 and #89308, and lentiviral infection as previously described46.
Pooled CRISPR screen experiments
We produced four batches of lentiviruses for each sgRNA library pool (Cas9/dCas9-KRAB, dCas9-VP64). We infected each teloHAEC cell line (Cas9, dCas9-KRAB, dCas9-VP64) at a multiplicity of infection of 0.3 using each batch of viruses separately. Following viral infection, we selected cells using zeocin (teloHAEC-dCas9-VP64) or G418 (teloHAEC-Cas9/-dCas9-KRAB) for five (teloHAEC-dCas9-VP64) or seven days (teloHAEC-Cas9/-dCas9-KRAB) in vascular cell basal medium (ATCC PCS-100-030) to remove any cells that did not incorporate a vector. After selection, we stimulated cells expressing Cas9 or dCas9-KRAB using TNFα (10ng/μl) for four hours to induce a pro-inflammatory response; we did not stimulate cells expressing dCas9-VP64, reasoning that the VP64 transcriptional domain should activate gene expression. Following TNFα stimulation, we immunostained cells (around 50M cells) with antibodies linked to phycoerythrin for adhesion molecules (E-selectin (BD BIOSCIENCES Cat-#: 551145), VCAM-1 (Cat-#: 12-1069-42), ICAM-1 (Cat-#: 12-0549-42)) or we incubated with fluorescent dye-based reagents for endothelial signaling markers: (nitric oxide (NO) (DAF-FM Diacetate, Cat-#: D23844)), reactive oxygen species (ROS) (CM-H2DCFDA, Cat-#: C6827), calcium signaling (Fura Red, Cat-#: F3021)). We calibrated the FACS assays with positive control treatments to make sure that we could robustly detect changes in the measured phenotypes. Antibodies and fluorescent dye-based reagents were titrated to use optimal concentrations. We also quantified how teloHAEC were responding to ionomycin for calcium signaling, to sodium nitroprusside for NO and to TNFα for reactive oxygen species. For adhesion molecules, we utilized sgRNA targeting coding exons and promoter regions of SELE, ICAM1 and VCAM1 as positive controls. Unless otherwise stated, we purchased all antibodies and dyes from ThermoFisher Scientific. Subsequently, we sorted stained cells by flow cytometry on a BD FACSARIA FUSION flow cytometer to collect the top and bottom 10% of fluorescently labeled cells. FACS traces were generated with FlowJo (BD Biosciences). We extracted genomic DNA from both top and bottom 10% cell fractions separately (around 5M cells in each fraction) using the QIAGEN DNeasy Blood and Tissue kit (Cat No. 69504) according to manufacturer’s instructions.
Amplification and sequencing of pooled CRISPR experiments
We amplified sgRNA sequences from genomic DNA via PCR, followed by a cleanup step using the QIAGEN QIAquick PCR purification kit (Cat-#: 28104) according to the manufacturer’s instructions. We used the primer sequences and PCR settings as previously described in ref. 16. We created sequencing libraries using Illumina TruSeq adapters according to the manufacturer’s protocols. We sequenced the libraries on an Illumina Hiseq4000 instrument at the McGill University and Genome Quebec Innovation Centre (MUGQIC). Generally 6 samples were multiplexed per sequencing lane for a target read coverage of ~500 reads per sgRNA per sample (Supplementary Figure 4B).
Computational analysis of pooled CRISPR screen data
We processed raw sequencing data from the BCL to the FASTQ format using bcl2fastq at MUGQIC. Raw FASTQ reads were quality-controlled using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC47. We performed downstream analysis of sgRNA sequencing data using MAGECK (v.0.5.9)48. We quantified sgRNA sequences using MAGECK count against the list of sgRNA sequences in the library (Supplementary Table 3), allowing for no mismatches in the sgRNA sequence. We then tested the difference in sgRNA counts between cell fractions or conditions using MAGECK maximum likelihood estimation (mle) method with median normalization49.
UMAP representation of sample-level count data
We normalized raw sgRNA counts using variance stabilizing transformation (vst) in DESeq250. To account for baseline differences between plasmid preparations, we further normalized samples to their respective vector library by dividing the vst normalized sgRNA count by the vst normalized count of the Cas9/dCas9-KRAB or dCas9/VP64 library, respectively. We calculated principal components using the top 10% most variable sgRNA (805 sgRNA) across all cell sorted samples based on normalized counts. Next, we used the loadings from the first three principal components in Uniform Manifold Approximation and Projection (UMAP)51 to create a two-dimensional embedding of the normalized sgRNA count data. Each dot in the UMAP plot represents one sequenced sample (top or bottom 10% of stained cells).
Analysis of sgRNAs targeting essential genes
To test for potential effects of sgRNA on endothelial cell death and proliferation, we compared sgRNA counts of all samples across the same cellular model (Cas9, dCas9-KRAB, dCas9-VP64) against the respective baseline vector library sgRNA count using MAGECK mle. We used sgRNAs targeting essential genes in the teloHAEC Cas9 cellular model as positive controls.
Single sgRNA validation
We individually cloned each sgRNA for validation as previously described52. We produced lentiviruses, infected cells, performed antibiotic selection, and stained cells as for the pooled CRISPR screen. We analyzed cells using flow cytometry (BD FACSCelesta (BD Biosciences, San Jose, CA, USA) equipped with a 20 mW blue laser (488 nm), a 40 mW red laser (640 nm), and a 50 mW violet laser (405 nm). For each experiment, we measured the mean fluorescent intensity (MFI) obtained for sgRNA of interest and compared it with the MFI for control sgRNA (safe-harbor and/or scrambled sgRNA). We performed each experiment at least three times. For statistical analyses, we used Student’s t-test and determined that a sgRNA had a significant effect on the measured phenotype when a one-tailed P-value ≤0.05.
For DHX38 Crimson experiments, we individually cloned each sgRNA in pHKO9-Crimson-CM vector (gift from Dan’s Bauer lab). We produced lentiviruses, infected cells and performed flow cytometry on a BD FACSARIA FUSION flow cytometer at day 2, day 4 and day 7 post-infection. We analyzed the percentage of Crimson positive cells and we sorted Crimson positive and negative cells to extract RNA in each fraction. We extracted total RNA using RNeasy Plus Mini Kit (Qiagen cat #: 74136). We measured RNA integrity and concentration using Agilent RNA 6000 Nano II assays (Agilent Technologies) on an Agilent 2100 Bioanalyzer and Take3 on Cytation V (Biotek). We reverse transcribed 750ng of total RNA using random primers and 1 U of the MultiScribe Reverse Transcriptase (Applied Biosystems) in a 20 μL reaction volume at 100 mM dNTPS and 20 U of RNase inhibitor with these three steps: 10 min at 25 °C, 120 min at 37 °C and 5 min at 85 °C. We followed the MIQE guidelines to assess quality and reproducibility of our qPCR results 53. We performed qPCR in triplicates for all samples using: 1.25 μL of cDNA (1/50 dilution), 5 μL of Platinum SYBR Green qPCR SuperMix-UDG (Life Technologies) and 3.75 μL of primer pair mix at 0.8 μM on a CFX384 from Biorad. We used the following thermal profile: 10 min at 95 °C, and 40 cycles of 30 s at 95 °C, 30 s at 55 °C and 45 s at 72 °C. We carried out melting curve analyses after the amplification process to ensure the specificity of the amplified products. We also simultaneously performed qPCR reactions with no template controls for each gene to test the absence of non-specific products. Cq values were determined with the CFX Manager 3.1 (Bio-Rad) software and expression levels were normalized on the expression levels of the house-keeping genes TATA-box binding protein (TBP), hypoxanthine-guanine phosphoribosyltransferase (HPRT), and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) using the ΔΔCt method. The primer sequences are in Supplementary Table 8.
PCR for determination of CRISPR-Cas9-induced indels
We isolated gDNA using QuickExtract DNA Extraction Solution (Epicentre, QE0905) from 1×105 cells. We used 100 or 200 ng of gDNA as a template for PCR reaction with the corresponding primers (see Supplementary Table 9). gDNA from parental teloHAEC cells was used as control. Obtained PCR products were analysed by electrophoresis on a 1% agarose gel prior to Sanger sequencing. We used TIDE (tracking of indels deconvolution) software for analysis54.
Assays for assessment of cell senescence
Using the same experimental design (DHX38-Crimson), we performed beta-galactosidase staining using the CellEvent Senescence Green Flow Cytometry Assay Kit from Invitrogen at day 2, day 4 and day 7 following the manufacturer’s protocol. Briefly, we trypsinized and we fixed the cells with 2% paraformaldehyde solution during 10 minutes at room temperature, washed them in 1%BSA/PBS and incubated for 1h30 in 1/500 working solution. After incubation, we washed the cells with 1%BSA/PBS and analyzed them by flow cytometry. We measured β-galactosidase fluorescence signal in positive and negative Crimson cells independently. As positive control, non-infected cells were treated with 20 μM of Etoposide (Sigma, E1383-25) for 2, 4 and 7 days.
Transcriptome data analysis
For RNA-seq analysis, we extracted RNA using RNeasy plus mini kit from Qiagen (cat #: 74136). RNA-seq experiments were carried out by the Centre d’Expertise et de Services Genome Quebec using rRNA-depleted TruSeq stranded (HMR) libraries (Illumina) on an Illumina Hiseq 4000 instrument (paired-ends, 100-bp reads) and by The Center for applied Genomics (Toronto) using rRNA-deletion library prep on an Illumina NovaSeq-SP flow cell. We quality-controlled raw fastq files with FastQC and multiQC47. We used kallisto (v. 0.46.0) to quantify transcript abundances55 against ENSEMBL reference transcripts (release 94) followed by tximport to calculate gene-level counts56. We utilized regularized log-transformation (rlog) in DESeq250 as input for principal component analysis (PCA). DESeq250 was further used to identify differentially-expressed genes between teloHAEC cell models (Cas9, dCas9-VP64) infected by lentiviruses with safe-harbor sgRNA or sgRNA identified in the pooled CRISPR screens. We excluded genes expressed with less than 10 reads across all samples from the analysis. We performed shrinkage for effect size estimates using apeglm using the lfcShrink method57. Genes differentially expressed with a Benjamini-Hochberg adjusted p-value ≤ 0.05 were considered significant. Gene set enrichment analysis was performed using the R package fgsea using 100,000 permutations against the Hallmark gene sets from msigdbr (https://igordot.github.io/msigdbr/)58,59. We quantified short indels in the RNA-seq data of DHX38 (sgRNA10966) and _MAT2A (sgRNA_02249) using the tools transIndel and Genesis-Indel, which are specifically designed to identify indels in the unmapped read fraction of samples60,61.
Analysis of scRNA-seq data from human coronary arteries
Single-cell gene expression matrix from human right atherosclerotic coronary arteries (three male and one female donors), was downloaded from NCBI GEO (GSE131780, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131780). The data was re-analyzed using the Seurat package in R with a standard single-cell clustering pipeline. Gene expression data was normalized using the SCTransform function from Seurat (v.3.2.3), regressing out the percentage of mitochondrial gene expression. Principal components analysis was performed, followed by dimensional reduction with Uniform Manifold Approximation and Projection (UMAP) using the first 20 principal components as input. Gene expression was visualized on the first two UMAP dimensions using the kernel density function (plot_density) from the Nebulosa package (v.0.99.92)62 for endothelial cell marker and candidate genes.
Statistics and data analysis
Unless noted otherwise, we performed all data and statistical analyses in R (v.3.6.0) using Rstudio. We ran our analyses on a high performance computing cluster (Beluga) from Calcul Quebec/Compute Canada. For MAGECK variant-level analyses, permutation-based FDR of ≤10% were considered significant. For RNA-seq analysis, genes with a Benjamini-Hochberg adjusted P-value in DESeq2 ≤0.05 were considered significant50.
Article TitleCRISPR perturbations at many coronary artery disease loci impair vascular endothelial cell functions
Genome-wide association studies have identified 161 genetic variants associated with coronary artery disease (CAD), but the causal genes and biological pathways remain unknown at most loci. Here, we used CRISPR knockout, inhibition and activation to target 1998 variants at 83 CAD loci to assess their effect on six vascular endothelial cell phenotypes (E-selectin, ICAM1, VCAM1, nitric oxide, reactive oxygen species, calcium signalling). We identified 42 significant variants located within 26 CAD loci. Detailed characterization of the RNA helicase DHX38 and CRISPR activation at the FURIN/FES, CCDC92/ZNF664 and CNNM2 loci revealed a strong effect on vascular endothelial cell senescence.