knitr::opts_chunk$set(tidy=FALSE, cache=TRUE,
                      dev="png", message=FALSE, error=FALSE, warning=TRUE)

Note: if you use MAGeCKFlute in published research, please cite: Binbin Wang, Mei Wang, Wubing Zhang. “Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute.” Nature Protocols (2019), doi: 10.1038/s41596-018-0113-7.

Install and load the required packages

if(!"MAGeCKFlute" %in% installed.packages()) BiocManager::install("MAGeCKFlute")
if(!"clusterProfiler" %in% installed.packages()) BiocManager::install("clusterProfiler")
if(!"ggplot2" %in% installed.packages()) BiocManager::install("ggplot2")

library(MAGeCKFlute)
library(clusterProfiler)
library(ggplot2)

Quick start

Downstream analysis of MAGeCK RRA

The MAGeCK (mageck test) uses Robust Rank Aggregation (RRA) for robust identification of CRISPR-screen hits, and outputs the summary results at both sgRNA and gene level. Before performing the downstream analysis, please make sure you have got the gene summary and sgRNA summary results from mageck test. MAGeCKFlute incorporates an example datasets (Deng Pan 2018), which study the gene functions in T cell mediate tumor killing by performing CRISPR screens in B16F10 cell lines co-cultured with Pmel-1 T cells. There are three samples in the data, including Pmel1_Input (B16F10 cells without T cell co-culture), Pmel1_Ctrl (B16F10 cells co-cultured with control T cells), and Pmel1 (B16F10 cells co-cultured with antigen specific T cells). We compared Pmel1 with Pmel1_Ctrl using mageck test, which identifies genes associated with T cell mediated tumor killing.

Run the FluteRRA pipeline

MAGeCKFlute processes MAGeCK RRA results (“gene_summary” and “sgrna_summary”) with the function FluteRRA, which identifies positively and negatively selected genes and performs functional analysis.

## path to the gene summary file (required)
file1 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/rra.gene_summary.txt")
## path to the sgRNA summary file (optional)
file2 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/rra.sgrna_summary.txt")
# Run FluteRRA with only gene summary file
FluteRRA(file1, proj="Pmel1", organism="mmu", outdir = "./")

# Run FluteRRA with both gene summary file and sgRNA summary file
FluteRRA(file1, file2, proj="Pmel1", organism="mmu", outdir = "./")

Important parameters

incorporateDepmap will allow users to compare the selected genes in the input data and Depmap data, which could clean the selected genes by eliminating lethal genes (false positive hits in some CRISPR screens) and then perform downstream enrichment analysis. You can set the incorporateDepmap to be TRUE if you want to include this analysis. Users could specify Depmap data of specific cell lines by setting the cell_lines and lineages.

FluteRRA(file1, proj="Pmel1", organism="mmu", incorporateDepmap = TRUE, outdir = "./")

omitEssential allows users to eliminate all common essential genes selected in the Depmap from all of the downstream analysis.

FluteRRA(gdata, proj="Pmel1", organism="mmu", omitEssential = TRUE, outdir = "./")

Hints: all figures and intermediate data are saved into local directory “./MAGeCKFlute_Test/”, and all figures are integrated into file “FluteRRA_Test.pdf”.

For more available parameters in FluteRRA, please read the documentation using the command ?FluteRRA.

Downstream analysis of MAGeCK MLE

The MAGeCK-VISPR (mageck mle) utilizes a maximum likelihood estimation (MLE) for robust identification of CRISPR screen hits. It outputs beta scores and the associated statistics in multiple conditions. The beta score describes how a gene is selected: a positive beta score indicates positive selection, and a negative beta score indicates negative selection. Before using FluteMLE, you should first get gene summary result from MAGeCK-VISPR (mageck mle). MAGeCKFlute uses the same dataset as before for demonstration. Using mageck mle, we removed the baseline effect (plasmid sample) from all the three samples, including Pmel1_Input (B16F10 cells without T cell co-culture), Pmel1_Ctrl (B16F10 cells co-cultured with control T cells), and Pmel1 (B16F10 cells co-cultured with antigen specific T cells).

Run the FluteMLE pipeline

MAGeCKFlute processes MAGeCK MLE results (“gene_summary”) with the function FluteMLE, which performs QC and data normalization based on the beta score, identifies essential genes by comparing control and treatment samples, and finally performs functional enrichment analysis.

file3 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/mle.gene_summary.txt")
FluteMLE(file3, treatname="Pmel1", ctrlname="Pmel1_Ctrl", proj="Pmel1", organism="mmu")

Important parameters

incorporateDepmap will allow users to compare the selected genes in the input data and Depmap data, which could clean the selected genes by eliminating lethal genes (false positive hits in some CRISPR screens) and then perform downstream enrichment analysis. You can set the incorporateDepmap to be TRUE if you want to include this analysis. Users could specify Depmap data of specific cell lines by setting the cell_lines and lineages.

## Take Depmap screen as control
FluteMLE(gdata, treatname="Pmel1_Ctrl", ctrlname="Depmap", proj="Pmel1", organism="mmu", incorporateDepmap = TRUE)

omitEssential allows users to eliminate all common essential genes selected in the Depmap from all of the downstream analysis.

FluteMLE(gdata, treatname="Pmel1", ctrlname="Pmel1_Ctrl", proj="Pmel1", organism="mmu", omitEssential = TRUE)

Hint: All pipeline results are written into local directory “./MAGeCKFlute_Test/”, and all figures are integrated into file “FluteMLE_Test.pdf”.

For more available parameters in FluteMLE, please read the documentation using the command ?FluteMLE.

Step by step analysis

Section I: Quality control

Input data

MAGeCK/MAGeCK-VISPR outputs a count summary file, which summarizes some basic QC scores at raw count level, including map ratio, Gini index, and NegSelQC. MAGeCKFlute incorporates an example datasets (Ophir Shalem* 2014) for demonstration.

file4 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/countsummary.txt")
countsummary = read.delim(file4, check.names = FALSE)
head(countsummary)

##                                   File    Label    Reads   Mapped Percentage
## 1 ../data/GSC_0131_Day23_Rep1.fastq.gz day23_r1 62818064 39992777     0.6366
## 2  ../data/GSC_0131_Day0_Rep2.fastq.gz  day0_r2 47289074 31709075     0.6705
## 3  ../data/GSC_0131_Day0_Rep1.fastq.gz  day0_r1 51190401 34729858     0.6784
## 4 ../data/GSC_0131_Day23_Rep2.fastq.gz day23_r2 58686580 37836392     0.6447
##   TotalsgRNAs Zerocounts GiniIndex NegSelQC NegSelQCPval
## 1       64076         57   0.08510        0            1
## 2       64076         17   0.07496        0            1
## 3       64076         14   0.07335        0            1
## 4       64076         51   0.08587        0            1
##   NegSelQCPvalPermutation NegSelQCPvalPermutationFDR NegSelQCGene
## 1                       1                          1            0
## 2                       1                          1            0
## 3                       1                          1            0
## 4                       1                          1            0

Visualize the QC results

# Gini index
BarView(countsummary, x = "Label", y = "GiniIndex",
        ylab = "Gini index", main = "Evenness of sgRNA reads")

# Missed sgRNAs
countsummary$Missed = log10(countsummary$Zerocounts)
BarView(countsummary, x = "Label", y = "Missed", fill = "#394E80",
        ylab = "Log10 missed gRNAs", main = "Missed sgRNAs")

# Read mapping
MapRatesView(countsummary)

# Or
countsummary$Unmapped = countsummary$Reads - countsummary$Mapped
gg = reshape2::melt(countsummary[, c("Label", "Mapped", "Unmapped")], id.vars = "Label")
gg$variable = factor(gg$variable, levels = c("Unmapped", "Mapped"))
gg = gg[order(gg$Label, gg$variable), ]
p = BarView(gg, x = "Label", y = "value", fill = "variable", 
            position = "stack", xlab = NULL, ylab = "Reads", main = "Map ratio")
p + scale_fill_manual(values = c("#9BC7E9", "#1C6DAB"))

Section II: Downstream analysis of MAGeCK RRA

For CRISPR/Cas9 screens with two experimental conditions, MAGeCK-RRA is available for identification of essential genes. In MAGeCK-RRA results, the sgRNA summary and gene summary file summarizes the statistical significance of positive selections and negative selections at sgRNA level and gene level.

Read the required data

file1 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/rra.gene_summary.txt")
gdata = ReadRRA(file1)
head(gdata)

##       id   Score      FDR
## 1  Cd274 -3.4698 0.002475
## 2  Psmb8 -3.7260 0.002475
## 3   Rela -3.2413 0.008251
## 4 Otulin -3.6018 0.008663
## 5  Ikbkb -3.4256 0.010891
## 6  Tcea1 -3.4236 0.042079

file2 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/rra.sgrna_summary.txt")
sdata = ReadsgRRA(file2)
head(sdata)

##                  sgrna  Gene     LFC FDR
## 1 AAACATATAGTGTACCTCTA  Jak1 10.8910   0
## 2 TCCGAACCGAATCATCACTG  Jak1 10.9170   0
## 3 TGAATAAATCCATCAGACAG  Jak1 10.5970   0
## 4 GGATAGACGCCCAGCCACTG Stat1  9.9921   0
## 5 TTAATGACGAGCTCGTGGAG Stat1  9.2728   0
## 6 GAAAAGCAAGCGTAATCTCC Stat1  8.7931   0

To incorporate depmap data that are profiled in human cell lines, we will convert mouse gene names to homologous human genes for this dataset.

gdata$HumanGene = TransGeneID(gdata$id, fromType = "symbol", toType = "symbol",
                              fromOrg = "mmu", toOrg = "hsa")
sdata$HumanGene = TransGeneID(sdata$Gene, fromType = "symbol", toType = "symbol",
                              fromOrg = "mmu", toOrg = "hsa")

Compute the similarity between the CRISPR screen with Depmap screens

## Remove missing or duplicate human genes
idx = duplicated(gdata$HumanGene)|is.na(gdata$HumanGene)
gdata = gdata[!idx, ]
depmap_similarity = ResembleDepmap(gdata, symbol = "HumanGene", score = "Score")
head(depmap_similarity)

Omit common essential genes from the data

gdata = OmitCommonEssential(gdata, symbol = "HumanGene")
sdata = OmitCommonEssential(sdata, symbol = "HumanGene")

Visualization of negative selections and positive selections

Volcano plot

gdata$LogFDR = -log10(gdata$FDR)
p1 = ScatterView(gdata, x = "Score", y = "LogFDR", label = "id", 
                 model = "volcano", top = 5)
print(p1)

# Or
p2 = VolcanoView(gdata, x = "Score", y = "FDR", Label = "id")
print(p2)

Rank plot

Rank all the genes based on their scores and label genes in the rank plot.

gdata$Rank = rank(gdata$Score)
p1 = ScatterView(gdata, x = "Rank", y = "Score", label = "id", 
                 top = 5, auto_cut_y = TRUE, ylab = "Log2FC", 
                 groups = c("top", "bottom"))
print(p1)

## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Label interested hits using parameter toplabels (in ScatterView) and genelist (in RankView).

ScatterView(gdata, x = "Rank", y = "Score", label = "id", top = 5, 
            auto_cut_y = TRUE, groups = c("top", "bottom"), 
            ylab = "Log2FC", toplabels = c("Pbrm1", "Arid2", "Brd7"))

## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

You can also use the function RankView to draw the figure.

geneList= gdata$Score
names(geneList) = gdata$id
p2 = RankView(geneList, top = 5, bottom = 10)
print(p2)

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's fill values.

RankView(geneList, top = 0, bottom = 0, genelist = c("Pbrm1", "Arid2", "Brd7"))

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's fill values.

Dot plot

Visualize negative and positive selected genes separately.

gdata$RandomIndex = sample(1:nrow(gdata), nrow(gdata))
gdata = gdata[order(-gdata$Score), ]
gg = gdata[gdata$Score>0, ]
p1 = ScatterView(gg, x = "RandomIndex", y = "Score", label = "id",
                 y_cut = CutoffCalling(gdata$Score,2), 
                 groups = "top", top = 5, ylab = "Log2FC")
p1

gg = gdata[gdata$Score<0, ]
p2 = ScatterView(gg, x = "RandomIndex", y = "Score", label = "id",
                 y_cut = CutoffCalling(gdata$Score,2), 
                 groups = "bottom", top = 5, ylab = "Log2FC")
p2

sgRankView - visualize the rank of sgRNAs targeting top selected genes.

p2 = sgRankView(sdata, top = 4, bottom = 4)
print(p2)

Enrichment analysis

For more information about functional enrichment analysis in MAGeCKFlute, please read the MAGeCKFlute_enrichment document, in which we introduce all the available options and methods.

geneList= gdata$Score
names(geneList) = gdata$id
enrich_pos = EnrichAnalyzer(geneList = geneList[geneList>0.5], 
                            method = "HGT", type = "KEGG")
enrich_neg = EnrichAnalyzer(geneList = geneList[geneList< -0.5], 
                            method = "HGT", type = "KEGG")

Visualization of enrichment results

EnrichedView(enrich_pos, mode = 1, top = 5, bottom = 0)

EnrichedView(enrich_pos, mode = 2, top = 5, bottom = 0)

EnrichedView(enrich_neg, mode = 2, top = 0, bottom = 5)

Section III: Downstream analysis of MAGeCK MLE

The MAGeCK-VISPR (mageck mle) computes beta scores and the associated statistics for all genes in multiple conditions. The beta score describes how the gene is selected: a positive beta score indicates a positive selection, and a negative beta score indicates a negative selection. Before using FluteMLE, you should first get gene summary result from MAGeCK-VISPR (mageck mle). MAGeCKFlute incorporates an example datasets for demonstration.

Batch effect removal (not recommended)

Is there batch effects? This is a commonly asked question before perform later analysis. In our package, we provide HeatmapView to ensure whether the batch effect exists in data and use BatchRemove to remove easily if same batch samples cluster together.

##Before batch removal
edata = matrix(c(rnorm(2000, 5), rnorm(2000, 8)), 1000)
colnames(edata) = paste0("s", 1:4)
HeatmapView(cor(edata))

## After batch removal
batchMat = data.frame(sample = colnames(edata), batch = rep(1:2, each = 2))
edata1 = BatchRemove(edata, batchMat)
head(edata1$data)

##            s1       s2       s3       s4
## [1,] 7.082637 5.819096 6.872381 5.732671
## [2,] 5.411736 6.958999 6.212868 5.979145
## [3,] 6.356394 7.121785 6.242167 7.024653
## [4,] 6.767546 7.153993 7.137302 6.922231
## [5,] 5.136134 7.681344 6.427540 5.876513
## [6,] 6.947032 6.196381 6.419730 6.761133

print(edata1$p)

read gene summary file

file3 = file.path(system.file("extdata", package = "MAGeCKFlute"),
                  "testdata/mle.gene_summary.txt")
# Read and visualize the file format
gdata = ReadBeta(file3)
head(gdata)

##      Gene Pmel1_Input Pmel1_Ctrl     Pmel1
## 1  Defb34    0.087884  -0.010034 -0.068015
## 2   Mndal    0.282860   0.033850 -0.161790
## 3   Cox8c    0.212140  -0.031618 -0.491360
## 4 Poldip3    0.125260  -0.123200 -0.450270
## 5   Bcas2   -0.196670   0.031254 -0.457530
## 6  Klk1b9   -0.220270  -0.019697 -0.443300

Normalization of beta scores

It is difficult to control all samples with a consistent cell cycle in a CRISPR screen experiment with multi conditions. Besides, beta score among different states with an inconsistent cell cycle is incomparable. So it is necessary to do the normalization when comparing the beta scores in different conditions. Essential genes are those genes that are indispensable for its survival. The effect generated by knocking out these genes in different cell types is consistent. Based on this, we developed the cell cycle normalization method to shorten the gap of the cell cycle in different conditions.

ctrlname = "Pmel1_Ctrl"
treatname = "Pmel1"
gdata$HumanGene = TransGeneID(gdata$Gene, fromType = "symbol", toType = "symbol",
                              fromOrg = "mmu", toOrg = "hsa")

gdata_cc = NormalizeBeta(gdata, id = "HumanGene", samples=c(ctrlname, treatname), 
                         method="cell_cycle")
head(gdata_cc)

##      Gene Pmel1_Input  Pmel1_Ctrl      Pmel1 HumanGene
## 1  Defb34    0.087884 -0.06263323 -0.2425685  DEFB106A
## 2   Mndal    0.282860  0.21129508 -0.5770074     IFI16
## 3   Cox8c    0.212140 -0.19736271 -1.7523850     COX8C
## 4 Poldip3    0.125260 -0.76902670 -1.6058418   POLDIP3
## 5   Bcas2   -0.196670  0.19509059 -1.6317338     BCAS2
## 6  Klk1b9   -0.220270 -0.12295064 -1.5809840      KLK3

Distribution of all gene beta scores

After normalization, the distribution of beta scores in different conditions should be similar. We can evaluate the distribution of beta scores using the function ‘DensityView’, and ‘ConsistencyView’.

DensityView(gdata_cc, samples=c(ctrlname, treatname))

ConsistencyView(gdata_cc, ctrlname, treatname)

# Another option MAView
MAView(gdata_cc, ctrlname, treatname)

Positive selection and negative selection

gdata_cc$Control = rowMeans(gdata_cc[,ctrlname, drop = FALSE])
gdata_cc$Treatment = rowMeans(gdata_cc[,treatname, drop = FALSE])

p1 = ScatterView(gdata_cc, "Control", "Treatment", label = "Gene", 
                 auto_cut_diag = TRUE, display_cut = TRUE, 
                 groups = c("top", "bottom"),
                 toplabels = c("Pbrm1", "Brd7", "Arid2", "Jak1", "Stat1", "B2m"))
print(p1)

## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Rank plot - label top hits

gdata_cc$Diff = gdata_cc$Treatment - gdata_cc$Control
gdata_cc$Rank = rank(gdata_cc$Diff)
p1 = ScatterView(gdata_cc, x = "Rank", y =  "Diff", label = "Gene", 
                 top = 5, auto_cut_y = TRUE, groups = c("top", "bottom"))
print(p1)

## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# Or
rankdata = gdata_cc$Treatment - gdata_cc$Control
names(rankdata) = gdata_cc$Gene
RankView(rankdata)

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's fill values.

Nine-square scatter plot - identify treatment-associated genes

p1 = ScatterView(gdata_cc, x="Pmel1_Ctrl", y="Pmel1", label = "Gene", 
                 model = "ninesquare", top = 5, display_cut = TRUE, force = 2)
print(p1)

## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Customize the cutoff

p1 = ScatterView(gdata_cc, x="Pmel1_Ctrl", y="Pmel1", label = "Gene", 
                 model = "ninesquare", top = 5, display_cut = TRUE, 
                 x_cut = c(-1,1), y_cut = c(-1,1))
print(p1)

## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

p2 = SquareView(gdata_cc, label = "Gene", 
                x_cutoff = CutoffCalling(gdata_cc$Control, 2), 
                y_cutoff = CutoffCalling(gdata_cc$Treatment, 2))
print(p2)

Functional analysis for treatment-associated genes

# 9-square groups
Square9 = p1$data
idx=Square9$group=="topcenter"
geneList = Square9$Diff[idx]
names(geneList) = Square9$Gene[idx]
universe = Square9$Gene

# Enrichment analysis
kegg1 = EnrichAnalyzer(geneList = geneList, universe = universe)
EnrichedView(kegg1, top = 6, bottom = 0)

Also, pathway visualization can be done using function KeggPathwayView (Luo et al. 2013).

gdata_cc$Entrez = TransGeneID(gdata_cc$Gene, "symbol", "entrez", organism = "mmu")
genedata = gdata_cc[, c("Entrez", "Control","Treatment")]
arrangePathview(genedata, pathways = "mmu04630", organism = "mmu", sub = NULL)

## Warning in data(bods): data set 'bods' not found

Session info

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.5.1          clusterProfiler_4.13.3 MAGeCKFlute_2.9.0     
## [4] BiocStyle_2.33.1      
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      sys_3.4.2               jsonlite_1.8.8         
##   [4] magrittr_2.0.3          farver_2.1.2            rmarkdown_2.28         
##   [7] fs_1.6.4                zlibbioc_1.51.1         vctrs_0.6.5            
##  [10] memoise_2.0.1           RCurl_1.98-1.16         ggtree_3.13.1          
##  [13] htmltools_0.5.8.1       AnnotationHub_3.13.3    curl_5.2.2             
##  [16] gridGraphics_0.5-1      depmap_1.19.1           sass_0.4.9             
##  [19] bslib_0.8.0             plyr_1.8.9              httr2_1.0.4            
##  [22] cachem_1.1.0            buildtools_1.0.0        igraph_2.0.3           
##  [25] lifecycle_1.0.4         pkgconfig_2.0.3         gson_0.1.0             
##  [28] Matrix_1.7-0            R6_2.5.1                fastmap_1.2.0          
##  [31] MatrixGenerics_1.17.0   GenomeInfoDbData_1.2.12 digest_0.6.37          
##  [34] aplot_0.2.3             enrichplot_1.25.2       colorspace_2.1-1       
##  [37] patchwork_1.2.0         AnnotationDbi_1.67.0    S4Vectors_0.43.2       
##  [40] pathview_1.45.0         ExperimentHub_2.13.1    RSQLite_2.3.7          
##  [43] org.Hs.eg.db_3.19.1     labeling_0.4.3          filelock_1.0.3         
##  [46] fansi_1.0.6             mgcv_1.9-1              httr_1.4.7             
##  [49] polyclip_1.10-7         compiler_4.4.1          bit64_4.0.5            
##  [52] withr_3.0.1             BiocParallel_1.39.0     viridis_0.6.5          
##  [55] DBI_1.2.3               highr_0.11              ggforce_0.4.2          
##  [58] R.utils_2.12.3          MASS_7.3-61             rappdirs_0.3.3         
##  [61] tools_4.4.1             scatterpie_0.2.4        ape_5.8                
##  [64] msigdbr_7.5.1           R.oo_1.26.0             glue_1.7.0             
##  [67] nlme_3.1-166            GOSemSim_2.31.2         shadowtext_0.1.4       
##  [70] grid_4.4.1              reshape2_1.4.4          sva_3.53.0             
##  [73] fgsea_1.31.0            generics_0.1.3          gtable_0.3.5           
##  [76] R.methodsS3_1.8.2       tidyr_1.3.1             data.table_1.16.0      
##  [79] tidygraph_1.3.1         utf8_1.2.4              XVector_0.45.0         
##  [82] BiocGenerics_0.51.1     ggrepel_0.9.6           BiocVersion_3.20.0     
##  [85] pillar_1.9.0            stringr_1.5.1           babelgene_22.9         
##  [88] limma_3.61.9            yulab.utils_0.1.7       genefilter_1.87.0      
##  [91] splines_4.4.1           dplyr_1.1.4             tweenr_2.0.3           
##  [94] treeio_1.29.1           BiocFileCache_2.13.0    lattice_0.22-6         
##  [97] survival_3.7-0          bit_4.0.5               annotate_1.83.0        
## [100] tidyselect_1.2.1        locfit_1.5-9.10         GO.db_3.19.1           
## [103] maketools_1.3.0         Biostrings_2.73.1       knitr_1.48             
## [106] gridExtra_2.3           IRanges_2.39.2          edgeR_4.3.14           
## [109] stats4_4.4.1            xfun_0.47               graphlayouts_1.1.1     
## [112] Biobase_2.65.1          statmod_1.5.0           matrixStats_1.4.1      
## [115] pheatmap_1.0.12         KEGGgraph_1.65.0        stringi_1.8.4          
## [118] UCSC.utils_1.1.0        lazyeval_0.2.2          ggfun_0.1.6            
## [121] yaml_2.3.10             evaluate_0.24.0         codetools_0.2-20       
## [124] ggraph_2.2.1            tibble_3.2.1            qvalue_2.37.0          
## [127] Rgraphviz_2.49.0        BiocManager_1.30.25     graph_1.83.0           
## [130] ggplotify_0.1.2         cli_3.6.3               xtable_1.8-4           
## [133] munsell_0.5.1           jquerylib_0.1.4         Rcpp_1.0.13            
## [136] GenomeInfoDb_1.41.1     dbplyr_2.5.0            png_0.1-8              
## [139] XML_3.99-0.17           parallel_4.4.1          blob_1.2.4             
## [142] DOSE_3.99.1             bitops_1.0-8            viridisLite_0.4.2      
## [145] tidytree_0.4.6          scales_1.3.0            purrr_1.0.2            
## [148] crayon_1.5.3            rlang_1.1.4             cowplot_1.1.3          
## [151] fastmatch_1.1-4         KEGGREST_1.45.1

References

Deng Pan, Peng Jiang, Aya Kobayashi. 2018. “A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing.” https://www.science.org/lookup/doi/10.1126/science.aao1710.

Luo, Weijun, Brouwer, and Cory. 2013. “Pathview: An r/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (14): 1830–31. https://doi.org/10.1093/bioinformatics/btt285.

Ophir Shalem*, Ella Hartenian, Neville E. Sanjana*. 2014. “Genome-scale CRISPR-Cas9 knockout screening in human cells.” http://science.sciencemag.org/content/343/6166/84.long.

Wei Li, Han Xu, Johannes Köster, and X. Shirley Liu. 2015. “Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR.” https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0843-6.

Wei Li, Tengfei Xiao, Han Xu, and X Shirley Liu. 2014. “MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens.” https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4.

MAGeCKFlute - Integrative analysis pipeline for pooled CRISPR functional genetic screens

Install and load the required packages

Quick start

Downstream analysis of MAGeCK RRA

Run the FluteRRA pipeline

Important parameters

Downstream analysis of MAGeCK MLE

Run the FluteMLE pipeline

Important parameters

Step by step analysis

Section I: Quality control

Input data

Visualize the QC results

Section II: Downstream analysis of MAGeCK RRA

Read the required data

To incorporate depmap data that are profiled in human cell lines, we will convert mouse gene names to homologous human genes for this dataset.

Compute the similarity between the CRISPR screen with Depmap screens

Omit common essential genes from the data

Visualization of negative selections and positive selections

Volcano plot

Rank plot

Dot plot

sgRankView - visualize the rank of sgRNAs targeting top selected genes.

Enrichment analysis

Visualization of enrichment results

Section III: Downstream analysis of MAGeCK MLE

Batch effect removal (not recommended)

read gene summary file

Normalization of beta scores

Distribution of all gene beta scores

Positive selection and negative selection

Rank plot - label top hits

Nine-square scatter plot - identify treatment-associated genes

Functional analysis for treatment-associated genes

Session info

References