Title: | Suite of Functions for Pooled Crispr Screen QC and Analysis |
---|---|
Description: | Set of tools for evaluating pooled high-throughput screening experiments, typically employing CRISPR/Cas9 or shRNA expression cassettes. Contains methods for interrogating library and cassette behavior within an experiment, identifying differentially abundant cassettes, aggregating signals to identify candidate targets for empirical validation, hypothesis testing, and comprehensive reporting. Version 2.0 extends these applications to include a variety of tools for contextualizing and integrating signals across many experiments, incorporates extended signal enrichment methodologies via the "sparrow" package, and streamlines many formal requirements to aid in interpretablity. |
Authors: | Russell Bainer, Dariusz Ratman, Steve Lianoglou, Peter Haverty |
Maintainer: | Russell Bainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.13.0 |
Built: | 2024-10-30 07:58:43 UTC |
Source: | https://github.com/bioc/gCrisprTools |
Example alignment matrix file for the provided example Crispr screen. All sample, gRNA, and Gene information has been anonymized and randomized.
Genentech, Inc.
Please see ‘vignettes/Crispr_example_workflow.R’ for details.
data('aln') head(aln)
data('aln') head(aln)
Example annotation file for the screen data provided in es
.
All sample, gRNA, and Gene information has been anonymized and randomized.
Genentech, Inc.
Please see ‘vignettes/Crispr_example_workflow.R’ for details.
data('ann') head(ann)
data('ann') head(ann)
This function displays the alignemnt statistics for a pooled Crispr screen, reported directly from an alignment statistic matrix.
ct.alignmentChart(aln, sampleKey = NULL)
ct.alignmentChart(aln, sampleKey = NULL)
aln |
A numeric matrix of alignment statistics for a Crispr experiment. Corresponds to a 4xN matrix of read counts, with columns indicating
samples and rows indicating the number of 'targets', 'nomatch', 'rejections', and 'double_match' reads. Details about these classes may be found
in the best practices vignette or as part of the report generated with |
sampleKey |
An optional ordered factor linking the samples to experimental variables. The |
A grouped barplot displaying the alignment statistics for each sample included in the alignment matrix, which usually corresponds to all of the samples in the experiment.
Russell Bainer
data('aln') ct.alignmentChart(aln)
data('aln') ct.alignmentChart(aln)
The 'alpha' part of RRAalpha is used to consider only the top guide-level scores for gene-level statistics. Practically, all guides failing the cutoff get a pvalue of 1. There are three ways of determining which guides fail. See 'scoring' below.
ct.applyAlpha( stats, RRAalphaCutoff = 0.1, scoring = c("combined", "pvalue", "fc") )
ct.applyAlpha( stats, RRAalphaCutoff = 0.1, scoring = c("combined", "pvalue", "fc") )
stats |
three-column numeric matrix with pvalues for down and up one-sided test with guide-level fold changes (coefficients from the relevant contrast). |
RRAalphaCutoff |
A cutoff to use when defining gRNAs with significantly altered abundance during the RRAa aggregation step, which may be specified
as a single numeric value on the unit interval or as a logical vector. When supplied as a logical vector (of length equal to |
scoring |
The gRNA ranking method to use in RRAa aggregation. May take one of three values: |
data.frame with guide-level pvals, fold change, and scores.deplete and scores.enrich which are the input the RRAalpha
Russell Bainer
fakestats <- matrix(runif(300), ncol = 3) colnames(fakestats) = c('Depletion.P', 'Enrichment.P', 'lfc') ct.applyAlpha(fakestats)
fakestats <- matrix(runif(300), ncol = 3) colnames(fakestats) = c('Depletion.P', 'Enrichment.P', 'lfc') ct.applyAlpha(fakestats)
Convenience function to package major components of a screen into a 'SummarizedExperiment' container for downstream visualization and analysis. All arguments are optional except for 'es'.
ct.buildSE( es, sampleKey = NULL, ann = NULL, vm = NULL, fit = NULL, summaryList = NULL )
ct.buildSE( es, sampleKey = NULL, ann = NULL, vm = NULL, fit = NULL, summaryList = NULL )
es |
An 'ExpressionSet' of screen data. Required. |
sampleKey |
a gCrisprTools 'sampleKey' object, to be added to the 'colData'. |
ann |
Annotation object to be packaged into the 'rowData' |
vm |
A 'voom'-derived normalized object |
fit |
a 'MArrayLM' object containing the contrast information and model results |
summaryList |
A named list of |
A 'SummarizedExperiment' object.
Russell Bainer
data('ann', 'es', 'fit', 'resultsDF') ct.buildSE(es, ann = ann, fit = 'fit', summaryList = list('resA' = resultsDF, 'resB' = resultsDF))
data('ann', 'es', 'fit', 'resultsDF') ct.buildSE(es, ann = ann, fit = 'fit', summaryList = list('resA' = resultsDF, 'resB' = resultsDF))
This is a function for comparing the results of two screening experiments. Given two summaryDF
,
the function places them in register with one another, generates a Concordance At The Top (CAT) plot, and returns an
invisible dataframe containing the relevant gene-level signals. Signals are aggregated by P-value threshold, such
that the concordance is represented as the portion of shared values meeting or exceeding that significance threshold.
This function is conceptually similar to the 'ct.ROC' and 'ct.PRC()' functions, but is appropriate when considering consistency of ranked values rather than an interchangeable set; the most common use case is for comparing primary and replication screens, where the underlying technology and selection criteria are expected to be highly similar. CAT plots are fundamentally about comparing rankings, and so only targets in common between the two provided screens are considered. If the totality of list overlap is important, consider using 'ct.PRC()' or 'ct.ROC()'.
ct.CAT( dflist, targets = c("geneSymbol", "geneID"), switch.dir = FALSE, plot.it = TRUE )
ct.CAT( dflist, targets = c("geneSymbol", "geneID"), switch.dir = FALSE, plot.it = TRUE )
dflist |
A list of results dataframes. Names will be preserved, and the enrichment calculation is conditioned on the first element of the list. |
targets |
Column of the provided |
switch.dir |
Logical indicating whether to test overlap of signals in the same direction, or whether the directionality is expected to reverse. 'same.dir = FALSE' looks at the consistency between depleted signals in 'df1' and enriched signals in 'df2'. |
plot.it |
Logical indicating whether to compose the plots on the default device. Two CAT plots summarizing overlap in both enrichment directions are drawn. |
Invisibly, a data.frame containing the relevant summary stats for each target in both screens.
Russell Bainer
data('resultsDF') cat <- ct.CAT(list('first' = resultsDF, 'second' = resultsDF[1:2000,])) head(cat)
data('resultsDF') cat <- ct.CAT(list('first' = resultsDF, 'second' = resultsDF[1:2000,])) head(cat)
This function identifies signals that are present in one or more screening experiment contrasts using a conditional strategy. Specifically, this function identifies all significant signals (according to user definitions) in a set of provided results DF and returns a 'simplifiedResult' dataframe derived from the first provided contrast with an appended logical column indicating whether there is evidence for signal replication in the other provided resultsDFs.
Signals are considered replicated if they cross the specified stringent threshold (default: Q = 0.1) in one or more of the provided contrasts, and are similarly enriched or depleted at the relaxed threshold (default: P = 0.1) in all of the remaining contrasts. If a single contrast is provided, all signals crossing the stringent threshold are considered replicated.
Signals are compared across screens on the basis of ct.regularizeContrasts
, so users must provide an identifier
with which to standardize targets ('geneID' by default).
ct.compareContrasts( dflist, statistics = c("best.q", "best.p"), cutoffs = c(0.1, 0.1), same.dir = rep(TRUE, length(dflist)), return.stats = FALSE, nperm = 10000, ... )
ct.compareContrasts( dflist, statistics = c("best.q", "best.p"), cutoffs = c(0.1, 0.1), same.dir = rep(TRUE, length(dflist)), return.stats = FALSE, nperm = 10000, ... )
dflist |
A list of (possibly simplified) results data.frames produced by |
statistics |
Statistics to use to define congruence; may be a single value, but internally coerced to a vector of length 2 where the first value corresponds to the stringent cutoff annd the second value is used for the relaxed cutoff. Must be 'best.p' or 'best.q'. |
cutoffs |
Numeric value(s) corresponding to the significance cutoff(s) used to define stringent and relaxed values of 'statistics'. Internally coerced to a vector of length 2. |
same.dir |
Logical vector of the same length as 'dflist' indicating whether replicating signals are expected to go in the same direction (e.g., enrich/deplete in their respective screens). For example, a 'dflist' of length 3 could be specified as c(TRUE, TRUE, FALSE), indicating that replicating signals should be enriched in both of the first two contrasts and depleted in the third to be considered replicated (or vise-versa). Default is 'rep(TRUE, length(dflist))'. |
return.stats |
When TRUE, return the significance of overlap instead of the logical vector (by permutation). |
nperm |
numeric indicating number of permutations when 'return.stats' is true (default 10000). |
... |
Other arguments to 'ct.simpleResult()', especially 'collapse'. |
If 'return.stats' is 'FALSE', returns the first contrast as a 'simplifiedResult' data.frame, with a 'replicated' logical column indicating whether each signal replicates in all of the provided screens according to the specified logic.
If 'return.stats' is 'TRUE', returns a dataframe indicating the permutation-based test statistics summarizing the evidence for significantly enriched signal replication across the provided contrasts (enrich, deplete, and all together).
Russell Bainer
data('resultsDF') summary(ct.compareContrasts(list(resultsDF, resultsDF[1:5000,]))$replicated) ct.compareContrasts(list(resultsDF, resultsDF[1:5000,]), return.stats = TRUE)
data('resultsDF') summary(ct.compareContrasts(list(resultsDF, resultsDF[1:5000,]))$replicated) ct.compareContrasts(list(resultsDF, resultsDF[1:5000,]), return.stats = TRUE)
Given a list of provided results 'data.frame's summarizing a series of contrasts from one or more pooled screens, this function visualizes their respective signals as a series of stacked barcharts. Enriched signals are represented in the positive direction, and depleted signals are represented in the negative direction. Note that the provided contrast results are not regularized by this function.
This function may be used to compare signals across different screen contrasts, or to compare signals within interesting subsets of targets ascertained within a single experiment.
ct.contrastBarchart( dflist, background = TRUE, statistic = c("best.q", "best.p"), ... )
ct.contrastBarchart( dflist, background = TRUE, statistic = c("best.q", "best.p"), ... )
dflist |
A named list of 'data.frame's summarizing the results of one or more screen contrasts, returned by the function
|
background |
Logical indicating whether to represent the nonsignificant hits in the barchart. |
statistic |
Should cutoffs be calculated based on FDR ('best.q') or P-value ('best.p')? |
... |
Other parameters to lower functions, especially 'ct.simpleResult()' |
A summary plot on the current device. Invisibly, the data.frame tallying signals at various thresholds.
Russell Bainer
data('resultsDF') ct.contrastBarchart(list('FirstResult' = resultsDF, 'SecondResult' = resultsDF)) ct.contrastBarchart(list('FirstResult' = resultsDF, 'SecondResult' = resultsDF), background = FALSE) ct.contrastBarchart(list('FirstResult' = resultsDF[1:1000,], 'SecondResult' = resultsDF))
data('resultsDF') ct.contrastBarchart(list('FirstResult' = resultsDF, 'SecondResult' = resultsDF)) ct.contrastBarchart(list('FirstResult' = resultsDF, 'SecondResult' = resultsDF), background = FALSE) ct.contrastBarchart(list('FirstResult' = resultsDF[1:1000,], 'SecondResult' = resultsDF))
This function produces two sets of one-sided P-values derived from the moderated t-statistics produced by eBayes.
ct.DirectionalTests(fit, contrast.term = NULL)
ct.DirectionalTests(fit, contrast.term = NULL)
fit |
An object of class MArrayLM containing, at minimum, a |
contrast.term |
If a fit object with multiple coefficients is passed in, a string indiating the coefficient of interest. |
A matrix object with two numeric columns, indicating the p-values quantifying the evidence for enrichment and depletion of each feature in the relevant model contrast.
Russell Bainer
data('fit') ct.DirectionalTests(fit)
data('fit') ct.DirectionalTests(fit)
This function takes a gCrisprTools annotation object and expands it to allow 1:many mappings of reagents. This mostly is used for internal processing, and users should interact with the wrapper functions that call it (e.g., 'ct.generateResults').
Libraries targeting ambiguous biological elements (e.g., alternative promoters to a gene where the boundaries between elements is contested) may contain reagents that are plausibly annotated to a finite set of possible targets. To accomodate this, users may supply an alternative reagent annotation in the form of a named list of vectors, where the names correspond to reagent 'ID's in the annotation object and each list element corresponds something coercible to a to a character vector of associated targets that will ultimately be assembled into the 'geneSymbol' column of the annotation object. It is assumed that the 'geneID' values are assigned unambiguously to the reagents, and are passed through directly.
ct.expandAnnotation(ann, alt.annotation)
ct.expandAnnotation(ann, alt.annotation)
ann |
A |
alt.annotation |
A named list of character vectors, which should be named identically to a value in the 'ID' column of the supplied annotation object. The values in the character vectors will eventually form the 'geneSymbol' column of the annotation file. |
A new annotation data frame, expanded as described above.
Russell Bainer
data('ann') alt.annotation <- list('Target2089_b' = c('foo', 'bar'), 'Target2089_c' = 'foo', 'Target2089_a' = 'bar') ct.expandAnnotation(ann, alt.annotation)
data('ann') alt.annotation <- list('Target2089_b' = c('foo', 'bar'), 'Target2089_c' = 'foo', 'Target2089_a' = 'bar') ct.expandAnnotation(ann, alt.annotation)
This function removes gRNAs only present in very low abundance across all samples of a pooled Crispr
screening experiment. In most cases very low-abundance guides are the
result of low-level contamination from other libraries, and often distort standard normalization approaches. This
function trims gRNAs in a largely heuristic way, assuming that the majority of 'real' gRNAs within the library are
comparably abundant in at least some of the samples (such as unexpanded controls), and that contaminants are
present at negligible levels. Specifically, the function trims the trim
most abundant guides from the upper tail of each log-transformed sample distribution, and then omits gRNAs whose
abundances are always less than 1/(2^log2.ratio
) of this value.
ct.filterReads( eset, trim = 1000, log2.ratio = 4, sampleKey = NULL, plot.it = TRUE, read.floor = NULL )
ct.filterReads( eset, trim = 1000, log2.ratio = 4, sampleKey = NULL, plot.it = TRUE, read.floor = NULL )
eset |
An unnormalized |
trim |
The number of gRNAs to be trimmed from the top of the distribution before estimating the abundance range. Empirically, this usually should be equal to about 2 to 5 percent of the guides in the library. |
log2.ratio |
Maximum abundance of contaminant gRNAs, expressed on the log2 scale from the top of the trimmed range
of each sample. That is, |
sampleKey |
An (optional) sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
plot.it |
Logical value indicating whether to plot the adjusted gRNA densities on the default device. |
read.floor |
Optionally, the minimum number of reads required for each gRNA. |
An ExpressionSet
object, with trace-abundance gRNAs omitted.
Russell Bainer
data('es') ct.filterReads(es)
data('es') ct.filterReads(es)
This function visualizes relationships between gRNA GC content and their measured abundance or various differential expression model estimates.
ct.GCbias(data.obj, ann, sampleKey = NULL, lib.size = NULL)
ct.GCbias(data.obj, ann, sampleKey = NULL, lib.size = NULL)
data.obj |
An |
ann |
An annotation |
sampleKey |
An optional sample key, supplied as a factor linking the samples to experimental
variables. The |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
An image relating GC content to experimental observations on the default device. If the
provided data.obj
is an ExpressionSet
, this takes the form of a scatter plot where the
GC
with a smoothed trendline within each sample. If data.obj
is a fit object describing a linear model
contrast, then four panels are returned describing the GC content distribution and its relationship
to guide-level fold change, standard deviation, and P-value estimates.
Russell Bainer
data('es') data('ann') data('fit') ct.GCbias(es, ann) ct.GCbias(fit, ann)
data('es') data('ann') data('fit') ct.GCbias(es, ann) ct.GCbias(fit, ann)
This is a wrapper function that enables direct generation of target-level p-values from a crispr screen.
ct.generateResults( fit, annotation, RRAalphaCutoff = 0.1, permutations = 1000, contrast.term = NULL, scoring = c("combined", "pvalue", "fc"), alt.annotation = NULL, permutation.seed = NULL )
ct.generateResults( fit, annotation, RRAalphaCutoff = 0.1, permutations = 1000, contrast.term = NULL, scoring = c("combined", "pvalue", "fc"), alt.annotation = NULL, permutation.seed = NULL )
fit |
An object of class |
annotation |
An annotation file for the experiment. gRNAs are annotated by
row, and must minimally contain columns |
RRAalphaCutoff |
A cutoff to use when defining gRNAs with significantly altered abundance during the RRAa aggregation step, which may be specified
as a single numeric value on the unit interval or as a logical vector. When supplied as a logical vector (of length equal to Note that this function uses directional tests to identify enriched or depleted targets, and when RRAalphaCutoff is provided as a logical vector the interpretation of the various aggregation statistics is going to be dependent on the specific criteria used to select reagents for inclusion. |
permutations |
The number of permutations to use during the RRAa aggregation step. |
contrast.term |
If a fit object with multiple coefficients is passed in, a string indiating the coefficient of interest. |
scoring |
The gRNA ranking method to use in RRAa aggregation. May take one of three values: |
alt.annotation |
Libraries targeting ambiguous biological elements (e.g., alternative promoters to a gene where the boundaries between elelments is contested) may contain reagents that are plausibly annotated to a finite set of possible targets. To accomodate this, users may supply an alternative reagent annotation in the form of a named list of vectors, where each list element corresponds something coercible to a to a character vector of associated targets that will ultimately be assembled into the 'geneSymbol' column of the 'resultsDF' object. Each of these character vectors should be named identically to a row of the supplied fit object (e.g., the 'row.names'). It is assumed that the 'geneID' values are assigned unambiguously to the reagents, and are passed through directly. |
permutation.seed |
numeric seed for permutation reproducibility.
Default: |
A dataframe containing gRNA-level and target-level statistics. In addition to the information present in the supplied annotation object, the returned object indicates P-values and Q-values for the depletion and enrichment of each gRNA and associated target, the median log2 fold change estimate among all gRNAs associated with the target, and Rho statistics that are calculated internally by the RRAa algorithm that may be useful in ranking targets that are considered significant at a given alpha or false discovery threshold.
A 'resultsDF' formatted dataframe containing gene-level statistics.
Russell Bainer
data('fit') data('ann') output <- ct.generateResults(fit, ann, permutations = 10) head(output) p = seq(0, 1, length.out=20) fc = seq(-3, 3, length.out=20) fc[2] = NA fc[3] = -20 stats = data.frame( Depletion.P=p, Enrichment.P=rev(p), fc=fc ) ct.applyAlpha(stats,scoring='combined')
data('fit') data('ann') output <- ct.generateResults(fit, ann, permutations = 10) head(output) p = seq(0, 1, length.out=20) fc = seq(-3, 3, length.out=20) fc[2] = NA fc[3] = -20 stats = data.frame( Depletion.P=p, Enrichment.P=rev(p), fc=fc ) ct.applyAlpha(stats,scoring='combined')
Update a gene-centric 'GeneSetDb' object for GREAT-style enrichment analysis using a specified annotation.
Often, pooled screening libraries are constructed such that the gene targets of interest are associated with variable numbers of semi-independent screen signals (associated with, e.g., sets of alternative promoters or cis regulatory units). Such an arrangement is often unavoidable but produces to complications when performing gene set enrichent analyses. This function conforms a standard 'GeneSetDb' object to appropriately consider this form of ultiple testing during ontological enrichment analyses according to the GREAT strategy outlined by [McLean et al. (2009)](https://doi.org/10.1038/nbt.1630).
Operationally, this means that genewise sets in the provided object will be translated to the corresponding 'geneSymbol' sets provided in the annotation file.
ct.GREATdb( annotation, gsdb = sparrow::getMSigGeneSetDb(collection = c("h", "c2"), species = "human", id.type = "ensembl"), minsize = 10, ... )
ct.GREATdb( annotation, gsdb = sparrow::getMSigGeneSetDb(collection = c("h", "c2"), species = "human", id.type = "ensembl"), minsize = 10, ... )
annotation |
an annotation object returned by |
gsdb |
A gene-centric |
minsize |
Minimum number of targets required to consider a geneset valid for analysis. |
... |
Additional arguments to be passed to 'ct.prepareAnnotation()'. |
A new GeneSetDb
object with the features annotated genewise to pathways.
data(resultsDF) data(ann) gsdb <- ct.GREATdb(ann, gsdb = sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez')) show(sparrow::featureIds(gsdb))
data(resultsDF) data(ann) gsdb <- ct.GREATdb(ann, gsdb = sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez')) show(sparrow::featureIds(gsdb))
This function median scales and log2 transforms the raw gRNA count data contained in an ExpressionSet, and then plots the ordered expression values within each replicate. The curve colors are assigned based on a user- specified column of the pData contained in the ExpressionSet. Optionally, this function can plot the location of Nontargeting control guides (or any guides, really) within the distribution.
ct.gRNARankByReplicate( eset, sampleKey, annotation = NULL, geneSymb = NULL, lib.size = NULL )
ct.gRNARankByReplicate( eset, sampleKey, annotation = NULL, geneSymb = NULL, lib.size = NULL )
eset |
An ExpressionSet object containing, at minimum, count data accessible by exprs() and some phenoData. |
sampleKey |
A sample key, supplied as a (possibly ordered) factor linking the samples to experimental
variables. The |
annotation |
An annotation dataframe indicating the nontargeting controls in the geneID column. |
geneSymb |
The |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
A waterfall plot as specified, on the default device.
Russell Bainer
data('es') data('ann') #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ct.gRNARankByReplicate(es, sk, ann, 'Target1377')
data('es') data('ann') #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ct.gRNARankByReplicate(es, sk, ann, 'Target1377')
This function generates a plot relating the cumulative proportion of reads in each sample of a crispr screen to the abundance rank of the
underlying guides (or Targets). The purpose of this algorithm is to detect potential distortions in the library composition
that might not be properly controlled by sample normalization (see also: ct.stackedGuides()
).
ct.guideCDF(eset, sampleKey = NULL, plotType = "gRNA", annotation = NULL)
ct.guideCDF(eset, sampleKey = NULL, plotType = "gRNA", annotation = NULL)
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA abundances extractable with the exprs() function. |
sampleKey |
An optional sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
plotType |
A string indicating whether the individual guides should be displayed (' |
annotation |
An optional data.frame containing an annotation object to be used to aggregate the guides into targets. gRNAs are annotated by row,
and must minimally contain a column |
A CDF plot displaying the appropriate CDF curves on the default device.
Russell Bainer
data('es') ct.guideCDF(es)
data('es') ct.guideCDF(es)
For many gCrisprTools functions, a sample key must be provided that specifies sample mapping to experimental groups. The sample key should be provided as a single, named factor whose names exactly correspond to the 'colnames()' of the 'ExpressionSet' containing the count data to be processed (or coercible as such). By convention, the first level corresponds to the control sample group.
This function checks whether the specified sample key is of the proper format and has properties consistent with the specified object.
ct.inputCheck(sampleKey, object)
ct.inputCheck(sampleKey, object)
sampleKey |
A named factor, where the |
object |
An |
A logical indicating whether the objects are compatible.
Russell Bainer
For many gCrisprTools functions, a sample key must be provided that specifies sample mapping to experimental groups. The sample key should be provided as a single, named factor whose names exactly correspond to the 'colnames()' of the 'ExpressionSet' containing the count data to be processed (or coercible as such). By convention, the first level corresponds to the control sample group.
This function checks whether the specified sample key is of the proper format and has properties consistent with the specified object.
ct.keyCheck(sampleKey, object)
ct.keyCheck(sampleKey, object)
sampleKey |
A named factor, where the |
object |
An |
Invisibly, a properly formatted 'sampleKey'.
Russell Bainer
data('es') library(limma) library(Biobase) #Build the sample key sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ct.keyCheck(sk, es)
data('es') library(limma) library(Biobase) #Build the sample key sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ct.keyCheck(sk, es)
This is a function to generate an html Contrast report for a
CRISPR screen, focusing on contrast-level analyses collected from other
functions in gCrisprTools
. It is designed to be used 'as-is', and
analysts interested in using different functionalities of the various
functions should do that outside of this wrapper script.
ct.makeContrastReport( eset, fit, sampleKey, results, annotation, comparison.id, identifier, contrast.subset = colnames(eset), outdir = NULL )
ct.makeContrastReport( eset, fit, sampleKey, results, annotation, comparison.id, identifier, contrast.subset = colnames(eset), outdir = NULL )
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA
abundances extractable with the |
fit |
A fit object for the contrast of interest, usually generated with
|
sampleKey |
A sample key, supplied as an ordered factor linking the
samples to experimental variables. The |
results |
A data.frame summarizing the results of the screen, returned
by the function |
annotation |
An annotation object for the experiment. See the man page
for |
comparison.id |
character with a name of the comparison. |
identifier |
A character string to name the report and corresponding
subdirectories. If provided, the final report will be called
' |
contrast.subset |
character vector containing the sample
labels to be used in the analysis; all elements must be contained in the
|
outdir |
An optional character string indicating the directory in which to
generate the report. If |
The path to the generated html report.
Russell Bainer, Dariusz Ratman
data('es') data('fit') data('ann') data('resultsDF') ##' #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) path2report <- ct.makeContrastReport(es, fit, sk, resultsDF, ann, comparison.id = NULL, outdir = '.')
data('es') data('fit') data('ann') data('resultsDF') ##' #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) path2report <- ct.makeContrastReport(es, fit, sk, resultsDF, ann, comparison.id = NULL, outdir = '.')
This is a function to generate an html QC report for a CRISPR
screen, focusing on experiment-level and library-level analyses collected
from other functions in gCrisprTools
. It is designed to be used
'as-is', and analysts interested in using different functionalities of the
various functions should do that outside of this wrapper script.
ct.makeQCReport( eset, trim, log2.ratio, sampleKey, annotation, aln, identifier = NULL, lib.size, geneSymb = NULL, outdir = NULL )
ct.makeQCReport( eset, trim, log2.ratio, sampleKey, annotation, aln, identifier = NULL, lib.size, geneSymb = NULL, outdir = NULL )
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA
abundances extractable with the |
trim |
The number of gRNAs to be trimmed from the top of the distribution before estimating the abundance range. Empirically, this usually should be equal to about 2 to 5 percent of the guides in the library. |
log2.ratio |
Maximum abundance of contaminant gRNAs, expressed on the
log2 scale from the top of the trimmed range of each sample. That is,
|
sampleKey |
A sample key, supplied as an ordered factor linking the
samples to experimental variables. The |
annotation |
An annotation object for the experiment. See the man page
for |
aln |
A numeric alignment matrix, where rows correspond to 'targets', 'nomatch', 'rejections', and 'double_match', and where columns correspond to experimentasl samples. May be 'NULL', to suppress alignment evaluation. |
identifier |
A character string to name the report and corresponding
subdirectories. If provided, the final report will be called
' |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
geneSymb |
The |
outdir |
An optional character string indicating the directory in which
to generate the report. If |
The path to the generated html report.
Russell Bainer, Dariusz Ratman
data('es') data('ann') data('aln') ##' #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) path2report <- ct.makeQCReport(es, trim = 1000, log2.ratio = 0.0625, sk, ann, aln, identifier = NULL, lib.size = NULL, geneSymb = 'NoTarget', outdir = '.')
data('es') data('ann') data('aln') ##' #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) path2report <- ct.makeQCReport(es, trim = 1000, log2.ratio = 0.0625, sk, ann, aln, identifier = NULL, lib.size = NULL, geneSymb = 'NoTarget', outdir = '.')
This is a function to generate an html report for a CRISPR screen, incorporating information about a specified contrast.
The report contains a combination of experiment-level and contrast-specific analyses, largely collected from other functions in
gCrisprTools
. It is designed to be used 'as-is', and analysts interested in using different functionalities of the various
functions should do that outside of this wrapper script.
ct.makeReport( fit, eset, sampleKey, annotation, results, aln, outdir = NULL, contrast.term = NULL, identifier = NULL )
ct.makeReport( fit, eset, sampleKey, annotation, results, aln, outdir = NULL, contrast.term = NULL, identifier = NULL )
fit |
An object of class |
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA abundances extractable with the |
sampleKey |
A sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
annotation |
An annotation object for the experiment. See the man page for |
results |
A data.frame summarizing the results of the screen, returned by the function |
aln |
A numeric alignment matrix, where rows correspond to 'targets', 'nomatch', 'rejections', and 'double_match', and where columns correspond to experimentasl samples. Users may also pass 'NULL' to suppress evaluation of alignment. |
outdir |
A directory in which to generate the report; if |
contrast.term |
A parameter passed to |
identifier |
A character string to name the report and corresponding subdirectories. If provided, the final report will be called
' |
The path to the generated html report.
Russell Bainer
data('fit') data('es') ##' #Build the sample key library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) data('ann') data('resultsDF') data('aln') path2report <- ct.makeReport(fit, es, sk, ann, resultsDF, aln, outdir = '.')
data('fit') data('es') ##' #Build the sample key library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) data('ann') data('resultsDF') data('aln') path2report <- ct.makeReport(fit, es, sk, ann, resultsDF, aln, outdir = '.')
Makes random distribution of Rho value by taking nperm random samples of n rank stats, p.
ct.makeRhoNull(n, p, nperm)
ct.makeRhoNull(n, p, nperm)
n |
single integer, number of guides per gene |
p |
numeric vector of rank statistics |
nperm |
single integer, how many random samples to take. |
numeric vector of Rho values
ct.makeRhoNull(3, 1:9, 5)
ct.makeRhoNull(3, 1:9, 5)
This function normalizes Crispr gRNA abundance estimates by equalizing the slopes of the middle (logged) values of the distribution across samples. Specifically, the algorithm ranks the gRNA abundance estimates within each sample and determines a relationship between rank change and gRNA within a trimmed region of the distribution via a linear fit. It then adjusts each sample such that the center of the logged abundance distribution is strictly horizontal and returns these values as median-scaled counts in the appropriate slot of the input ExpressionObject.
ct.normalizeBySlope(ExpressionObject, trim = 0.25, lib.size = NULL, ...)
ct.normalizeBySlope(ExpressionObject, trim = 0.25, lib.size = NULL, ...)
ExpressionObject |
An ExpressionSet containing, at minimum, count data accessible by |
trim |
The proportion to be trimmed from each end of the distributionbefore performing the linear fit; algorithm defaults to 25 fit is performed on the interquartile range. |
lib.size |
An optional vector of size factor adjusted library size.
Default: |
... |
Other arguments to be passed to |
A renormalized object of the same type as the provided object.
Russell Bainer
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeBySlope(es, lib.size= ls) ct.gRNARankByReplicate(es, sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sk, lib.size= ls)
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeBySlope(es, lib.size= ls) ct.gRNARankByReplicate(es, sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sk, lib.size= ls)
This function applies quantile normalization to subsets of samples defined by a provided factor, correcting for library size. It does this by converting raw count values to log2 counts per million and optionally adjusting further in the usual way by dividing these values by user-specified library size factors; then this matrix is split into groups according to the provided factor that are quantile normalized, and then the groups are median scaled to each other before conversion back into raw counts. This method is best used in comparisons for long timecourse screens, where groupwise differences in growth rate cause uneven intrinsic dialation of construct distributions.
Note that this normalization strategy is not appropriate for experiments where significant distortion of the libraries is expected as a consequence of the screening strategy (e.g., strong selection screens).
ct.normalizeFQ(eset, sets, lib.size = NULL)
ct.normalizeFQ(eset, sets, lib.size = NULL)
eset |
An |
sets |
A character or factor object delineating which samples should be grouped together during the normalization step. Must be the same length as the number of columns in the provided eset, and cannot contain 'NA' or 'NULL' values. |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
A renormalized ExpressionSet object of the same type as the provided object.
Russell Bainer
data('es') #Build the sample key and library sizes for visualization library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeFQ(es, sets = gsub('(Death|Control)', '', pData(es)$TREATMENT_NAME), lib.size= ls) ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
data('es') #Build the sample key and library sizes for visualization library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeFQ(es, sets = gsub('(Death|Control)', '', pData(es)$TREATMENT_NAME), lib.size= ls) ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
This function normalizes Crispr gRNA abundance estimates contained in an ExpressionSet
object.
Currently four normalization methods are implemented: median scaling (via normalizeMedianValues
), slope-based
normalization (via ct.normalizeBySlope()
), scaling to the median of the nontargeting control values (via
ct.normalizeNTC()
), factored quantile normalization (via ct.normalizeFQ()
), and spline fitting to the distribution of
selected gRNAs (via ct.normalizeSpline()
). Because of the peculiarities of pooled Crispr screening data, these
implementations may be more stable than the endogenous methods used downstream by voom. See the respective
man pages for further details about specific normalization approaches.
ct.normalizeGuides( eset, method = c("scale", "FQ", "slope", "controlScale", "controlSpline"), annotation = NULL, sampleKey = NULL, lib.size = NULL, plot.it = FALSE, ... )
ct.normalizeGuides( eset, method = c("scale", "FQ", "slope", "controlScale", "controlSpline"), annotation = NULL, sampleKey = NULL, lib.size = NULL, plot.it = FALSE, ... )
eset |
An ExpressionSet object with integer count data extractable with |
method |
The normalization method to use. |
annotation |
The annotation object for the library, required for the methods employing nontargeting controls. |
sampleKey |
An (optional) sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
plot.it |
Logical indicating whether to plot the ranked log2 gRNA count distributions before and after normalization. |
... |
Other parameters to be passed to the individual normalization methods. |
A renormalized ExpressionSet. If specified, the sample level counts will be scaled so as to maintain the validity
of the specified lib.size
values.
Russell Bainer
ct.normalizeMedians
, ct.normalizeBySlope
, ct.normalizeNTC
, ct.normalizeSpline
data('es') data('ann') #Build the sample key as needed library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) es.norm <- ct.normalizeGuides(es, 'scale', annotation = ann, sampleKey = sk, plot.it = TRUE) es.norm <- ct.normalizeGuides(es, 'slope', annotation = ann, sampleKey = sk, plot.it = TRUE) es.norm <- ct.normalizeGuides(es, 'controlScale', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget') es.norm <- ct.normalizeGuides(es, 'controlSpline', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget')
data('es') data('ann') #Build the sample key as needed library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) es.norm <- ct.normalizeGuides(es, 'scale', annotation = ann, sampleKey = sk, plot.it = TRUE) es.norm <- ct.normalizeGuides(es, 'slope', annotation = ann, sampleKey = sk, plot.it = TRUE) es.norm <- ct.normalizeGuides(es, 'controlScale', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget') es.norm <- ct.normalizeGuides(es, 'controlSpline', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget')
This function normalizes Crispr gRNA abundance estimates by equalizing the median gRNA abundance values after
correcting for library size. It does this by converting raw count values to log2 counts per million and optionally adjusting further in
the usual way by dividing these values by user-specified library size factors. THis method should be more stable than the endogenous
scaling functions used in voom
in th especific case of Crispr screens or other cases where the median number of observed counts may be low.
ct.normalizeMedians(eset, lib.size = NULL)
ct.normalizeMedians(eset, lib.size = NULL)
eset |
An |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
A renormalized ExpressionSet object of the same type as the provided object.
Russell Bainer
data('es') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeMedians(es, lib.size= ls) ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
data('es') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeMedians(es, lib.size= ls) ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls) ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
This function normalizes Crispr gRNA abundance estimates by equalizing the median
abundances of the nontargeting gRNAs within each sample. The normalized values are returned as normalized counts in
the 'exprs
' slot of the input eset. Note that this method may be unstable if the screening library contains
relatively few nontargeting gRNAs.
ct.normalizeNTC(eset, annotation, lib.size = NULL, geneSymb = NULL)
ct.normalizeNTC(eset, annotation, lib.size = NULL, geneSymb = NULL)
eset |
An ExpressionSet object containing, at minimum, count data accessible by |
annotation |
An annotation dataframe indicating the nontargeting controls in the geneID column. |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
geneSymb |
The |
A normalized eset
.
Russell Bainer
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeNTC(es, ann, lib.size = ls, geneSymb = 'NoTarget') ct.gRNARankByReplicate(es, sk, lib.size = ls) ct.gRNARankByReplicate(es.norm, sk, lib.size = ls)
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeNTC(es, ann, lib.size = ls, geneSymb = 'NoTarget') ct.gRNARankByReplicate(es, sk, lib.size = ls) ct.gRNARankByReplicate(es.norm, sk, lib.size = ls)
This function normalizes Crispr gRNA abundance estimates by fiting a smoothed spline to a subset of the gRNAs within each sample
and then equalizing these curves across the experiment. Specifically, the algorithm ranks the gRNA abundance estimates within each sample and
uses a smoothed spline to determine a relationship between the ranks of the "anchor" guides and their abundance estimates. It then adjusts the
spline trends from each sample to the mean of all of the sample spline fits in a manner analogous to quantile normalization, interpolating the
gRNA abundance values between the anchor points; these values are returned as normalized counts in the 'exprs
' slot of the input eset.
ct.normalizeSpline(eset, annotation, geneSymb = NULL, lib.size = NULL)
ct.normalizeSpline(eset, annotation, geneSymb = NULL, lib.size = NULL)
eset |
An ExpressionSet object containing, at minimum, count data accessible by |
annotation |
An annotation dataframe indicating the nontargeting controls in the geneID column. |
geneSymb |
The |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
A normalized eset
.
Russell Bainer
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- (relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeSpline(es, ann, 'NoTarget', lib.size = ls) ct.gRNARankByReplicate(es, sk, lib.size = ls) ct.gRNARankByReplicate(es.norm, sk, lib.size = ls)
data('es') data('ann') #Build the sample key and library sizes for visualization library(Biobase) sk <- (relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ls <- colSums(exprs(es)) es.norm <- ct.normalizeSpline(es, ann, 'NoTarget', lib.size = ls) ct.gRNARankByReplicate(es, sk, lib.size = ls) ct.gRNARankByReplicate(es.norm, sk, lib.size = ls)
This is an accessory function to 'ct.expandAnnotation()' function, which enables users to expand annotation objects to accomodate reagent libraries where reagents are expected to impact a set of known targets. See documentation for that function for additional details.
Often, libraries that contain multiply-targeting reagents are annotated using a structured format that can be decomposed by regex matching. This function takes in an annotation object containing an 'ID' column indicating the reagent ID and a 'geneSymbol' column containing the target mappings, and parses the target mappings according to a known annotation format. Currently supported formats are 'cellecta' (e.g., 'TARGET_P1P2P3' indicating multiple promoters associated with a known target), and 'underscore', where different targets are concatenated using the underscore separator (e.g., 'TARGET1_TARGET2_TARGET3').
Returns an 'alt.annotation'-type list of character vectors encoding the target mappings for each reagent.
ct.parseGeneSymbol(ann, format = c("cellecta", "underscore"))
ct.parseGeneSymbol(ann, format = c("cellecta", "underscore"))
ann |
A |
format |
Format of the geneSymbol column strings. |
A named 'alt.annotation'-type list of character vectors encoding the target mappings for each reagent
Russell Bainer
fakeann <- data.frame('ID' = LETTERS[1:4], 'geneSymbol' = c('T1_P1', 'T1_P1P2', 'T1_P2P1', 'T1_P2')) ct.parseGeneSymbol(fakeann, 'cellecta') ct.parseGeneSymbol(fakeann, 'underscore')
fakeann <- data.frame('ID' = LETTERS[1:4], 'geneSymbol' = c('T1_P1', 'T1_P1P2', 'T1_P2P1', 'T1_P2')) ct.parseGeneSymbol(fakeann, 'cellecta') ct.parseGeneSymbol(fakeann, 'underscore')
Given a set of targets of interest, this function generates a Precision Recall curve from the results of a CRISPR screen. Specifically, it orders the target elements in the screen in the specified direction, and then plots the recall rate (proportion of true targets identified) against the precision (proportion of identified targets that are true targets).
Note that ranking statistics in CRISPR screens are (usually) permutation-based, and so some granularity in the rankings is expected. This function does a little extra work to ensure that hits are counted as soon as the requisite value of the ranking statistic is reached regardless of where the gene is located within the block of equally-significant genes. Functionally, this means that the drawn curve is somewhat anticonservative in cases where the gene ranks are not well differentiated.
ct.PRC( summaryDF, target.list, direction = c("enrich", "deplete"), plot.it = TRUE )
ct.PRC( summaryDF, target.list, direction = c("enrich", "deplete"), plot.it = TRUE )
summaryDF |
A dataframe summarizing the results of the screen, returned by the function |
target.list |
A character vector containing the names of the targets to be tested; by default these are assumed to be 'geneID's, but specifying 'collapse=geneSymbol' enables setting on 'geneSymbol' by passing that value through to 'ct.simpleResult'. |
direction |
Direction by which to order target signals ('enrich' or 'deplete'). |
plot.it |
Logical value indicating whether to plot the curves. |
A list containing the the x and y coordinates of the curve.
Russell Bainer
data('resultsDF') data('essential.genes') #Note that this is an artificial example. pr <- ct.PRC(resultsDF, essential.genes, 'enrich') str(pr)
data('resultsDF') data('essential.genes') #Note that this is an artificial example. pr <- ct.PRC(resultsDF, essential.genes, 'enrich') str(pr)
This function processes a supplied annotation object for use in a pooled screening experiment.
Originally this was processed into something special, but now it essentially returns
the original annotation object in which the geneSymbol column has been factorized. This is primarily used
internally during a call to the ct.generateResults()
function. Also performs some minor functionality checking,
and ensures that the reagent identifiers are present as an 'ID' column (if absent, the row.names are used).
Valid annotations contain both 'geneID' and 'geneSymbol' columns. This is because there is often a distinction between the official gene that is being targeted and a coherent set of gRNAs that make up a testing cohort. For example, multiple sets of guides may target distinct promoters, exons, or other entities that are expected to produce distinct biological phenomena related to the gene that should be interpreted separately. For this reason, the 'geneID' column encodes the official gene designation (typically an ensembl or entrez gene identifier) while the 'geneSymbol' column contains a human-readable descriptor of the gRNA target (such as a gene symbol or promoter name). This mapping can be further expanded to incorporate mapping ambiguity via the 'ct.expandAnnotation()' function.
ct.prepareAnnotation(ann, object = NULL, controls = TRUE, throw.error = TRUE)
ct.prepareAnnotation(ann, object = NULL, controls = TRUE, throw.error = TRUE)
ann |
A |
object |
If supplied, an object with |
controls |
The name of a value in the |
throw.error |
Logical indicating whether to throw an error when |
A new annotation data frame, usually with nontargeting controls and NA values reformatted to NoTarget
(and geneID
set to 'no_gid'
), and the 'geneSymbol' column of ann
factorized. If supplied with
an object
, the gRNAs not present in the object
will be omitted.
Russell Bainer
data('ann') data('es') es <- ct.filterReads(es) newann <- ct.prepareAnnotation(ann, es)
data('ann') data('es') es <- ct.filterReads(es) newann <- ct.prepareAnnotation(ann, es)
This function takes in a supplied results data.frame, optionally transforms it into a 'simplifiedResult', and returns the ranks of the target-level signals.
ct.rankSimple(df, top = c("enrich", "deplete"))
ct.rankSimple(df, top = c("enrich", "deplete"))
df |
A results data.frame, in either raw or simplified form. Will be converted to simplified form if necessary. |
top |
Determines the directionality of the ranking. 'enrich' defines ranks from the most enriched to the most depleted target; 'deplete' does the opposite |
A numeric vector of ranks, with length equal to the number of rows in the simplified data.frame.
Russell Bainer
data('resultsDF') df.simple <- ct.simpleResult(resultsDF) sr <- ct.rankSimple(resultsDF) all((df.simple$best.p[sr == 1] == 0), (df.simple$direction[sr == 1] == 'enrich')) sr <- ct.rankSimple(resultsDF, 'deplete') all((df.simple$best.p[sr == 1] == 0), (df.simple$direction[sr == 1] == 'deplete'))
data('resultsDF') df.simple <- ct.simpleResult(resultsDF) sr <- ct.rankSimple(resultsDF) all((df.simple$best.p[sr == 1] == 0), (df.simple$direction[sr == 1] == 'enrich')) sr <- ct.rankSimple(resultsDF, 'deplete') all((df.simple$best.p[sr == 1] == 0), (df.simple$direction[sr == 1] == 'deplete'))
This function plots the per-sample densities of raw gRNA read counts on the log10 scale. The curve colors are assigned based on a user- specified sampleKey. This function is primarily useful to determine whether libraries are undersequenced (low mean raw gRNA counts), contaminated (many low-abundance gRNAs present), or if PCR artifacts may be present (subset of extremely abundant guides, multiple gRNA distribution modes). In most well-executed experiments the majority of gRNAs will form a tight distribution around some reasonably high average read count (hundreds of reads), at least among the control samples. Excessively low raw count values can compromise normalization steps and subsequent estimation of gRNA levels, especially in screens in which most gRNAs have minimal effects on cell viability.
ct.rawCountDensities(eset, sampleKey = NULL, lib.size = NULL)
ct.rawCountDensities(eset, sampleKey = NULL, lib.size = NULL)
eset |
An ExpressionSet object containing, at minimum, count data accessible by exprs() and some phenoData. |
sampleKey |
A sample key, supplied as a (possibly ordered) factor linking the samples to experimental
variables. The |
lib.size |
Optional named vector of library sizes (total reads within the library) to enable normalization |
A density plot as specified on the default device.
Russell Bainer
data('es') #Build the sample key library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ct.rawCountDensities(es, sk)
data('es') #Build the sample key library(Biobase) sk <- relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference') names(sk) <- row.names(pData(es)) ct.rawCountDensities(es, sk)
This function prepares multiple 'gCrisprTools' results dataframes for comparison. Specifically, it checks that all provided data frames are valid result objects, converts each to the target-wise 'simpleResult' format, removes signals that are not shared by all objects, places their rows in identical order, and then returns the simplified dataframes as a list.
This function is largely meant to be used by other gCrisprtools functions, although there are occasions when an analyst may want to call it directly. Often, it is useful to pass the 'collapse' argument to 'ct.simpleresult()' in cases where libraries and technologies differ between screens.
ct.regularizeContrasts(dflist, collapse = c("geneSymbol", "geneID"))
ct.regularizeContrasts(dflist, collapse = c("geneSymbol", "geneID"))
dflist |
A list of results dataframes. Names will be preserved. |
collapse |
Column of the provided resultsDFs on which to collapse values; should be 'geneSymbol' or 'geneID'. |
A list of the in-register 'simpleResult' objects, with length and names identical to 'dflist'.
data('resultsDF') lapply(ct.regularizeContrasts(list('df1' = resultsDF[1:300,], 'df2' = resultsDF[200:400,])), nrow)
data('resultsDF') lapply(ct.regularizeContrasts(list('df1' = resultsDF[1:300,], 'df2' = resultsDF[200:400,])), nrow)
Many gCrisprTools functions operate on a data.frame
of results
generated by a CRISPR screen. This function takes in a supplied object and returns
a logical indicating whether the object can be treated as one of these data.frames
for the purposes of downstream analyses. This is largely used internally, but can
be useful if a user needs to build a result object for some reason.
ct.resultCheck(summaryDF)
ct.resultCheck(summaryDF)
summaryDF |
A |
A logical indicating whether the object is of the appropriate format.
Russell Bainer
data('resultsDF') ct.resultCheck(resultsDF)
data('resultsDF') ct.resultCheck(resultsDF)
Given a set of targets of interest, this function generates a ROC curve and associated statistics from the results of a CRISPR screen. Specifically, it orders the elements targeted in the screen in the specified direction, and then plots the cumulative proportion of positive hits on the y-axis. The corresponding vectors and Area Under the Curve (AUC) statistic are returned as a list.
Note that ranking statistics in CRISPR screens are (usually) permutation-based, and so some granularity is expected. This function does a little extra work to ensure that hits are counted as soon as the requisite value of the ranking statistic is reached regardless of where the gene is located within the block of equally-significant genes. Functionally, this means that the drawn curve is somewhat anticonservative in cases where the gene ranks are not well differentiated.
ct.ROC( summaryDF, target.list, direction = c("enrich", "deplete"), condense = TRUE, plot.it = TRUE, ... )
ct.ROC( summaryDF, target.list, direction = c("enrich", "deplete"), condense = TRUE, plot.it = TRUE, ... )
summaryDF |
A dataframe summarizing the results of the screen, returned by the function |
target.list |
A character vector containing the names of the targets to be tested. Only targets contained in the |
direction |
Direction by which to order target signals ('enrich' or 'deplete'). |
condense |
Logical indicating whether the returned x and y coordinates should be 'condensed', returning only the points at which
the detected proportion of |
plot.it |
Logical value indicating whether to plot the curves. |
... |
Additional parameters for 'ct.simpleResult()' |
A list containing the the x and y coordinates of the curve, and the AUC statistic (invisibly).
Russell Bainer
data('resultsDF') data('essential.genes') #Note that this is an artificial example. roc <- ct.ROC(resultsDF, essential.genes, direction = 'deplete') str(roc)
data('resultsDF') data('essential.genes') #Note that this is an artificial example. roc <- ct.ROC(resultsDF, essential.genes, direction = 'deplete') str(roc)
This is a function for comparing the results of two screening experiments. Given two summaryDF
,
the function places them in register with one another, generates a simplified scatter plot where enrichment or depletion
in each contrast is represented by the associated "signed" log10 (*P*/*Q*)-value (where enriched signals are represented
in the positive direction and depleted signals are shown in the negative direction), and returns an invisible 'data.frame'
containing the target X-axis and Y-axis coordinates and corresponding quadrant.
This is a target-level analysis, and some minor simplifications are introduced to screen signals for the sake of clarity. Principal among these is the decision to collapse gene signals to a single directional enrichment statistic. Target-level signals are typically aggregates of many guide-level signals, it is formally possible for targets to be both significantly enriched and significantly depleted within a single screen contrast as a result of substantially divergent reagent activity. This behavior is uncommon, however, and so targets are represented by selecting the direction of enrichment or depletion associated with the most significant (*P*/*Q*)-value. This directionality is then encoded into the X-axis and Y-axis position of the target as the sign of the signal as described above.
ct.scatter( dflist, targets = c("geneSymbol", "geneID"), statistic = c("best.p", "best.q"), cutoff = 0.05, plot.it = TRUE )
ct.scatter( dflist, targets = c("geneSymbol", "geneID"), statistic = c("best.p", "best.q"), cutoff = 0.05, plot.it = TRUE )
dflist |
A (named) list of results dataframes, of length 2. See |
targets |
Column of the provided |
statistic |
Statistic to plot on each axis (after -log10 transformation). Must be 'p', 'q', or 'rho'. |
cutoff |
significance cutoff used to define the significance quadrants (cannot be exactly zero). |
plot.it |
Logical indicating whether to compose the plot on the default device. |
Invisibly, a list of length 4 containing the genes passing significance for the respective quadrants.
Russell Bainer
data('resultsDF') scat <- ct.scatter(list('FirstResult' = resultsDF[100:2100,], 'SecondResult' = resultsDF[1:2000,])) head(scat)
data('resultsDF') scat <- ct.scatter(list('FirstResult' = resultsDF[100:2100,], 'SecondResult' = resultsDF[1:2000,])) head(scat)
This function is a wrapper for the 'sparrow::seas()' function, which identifies differentially enriched/depleted ontological categories within the hits identified by a pooled screening experiment, given a provided 'GenseSetDb()' object and a list of results objects created by 'ct.generateResults()'. By default testing is performed using 'fgsea' and a hypergeometric test ('sparrow::ora()'), and results are returned as a 'SparrowResult' object.
This function will attempt to coerce them into inputs appropriate for the above analyses via 'ct.seasPrep()', after checking the relevant parameters within the provided 'GeneSetDb'. This is generally easier than going through the individual steps yourself, especially when the user is minimally postprocessing the contrast results in question.
Sometimes, it can be useful to directly indicate the set of targets to be included in an enrichment analysis (e.g., if you wish to expand or contract the set of active targets based on signal validation or other secondary information about the experiment). To accomodate this use case, users may include a logical column in the provided result object(s) indicating which elements should be included among the positive signals exposed to the test.
Note that many pooled libraries specifically target biased sets of genes, often focusing on genes involved in a particular pathway or encoding proteins with a shared biological property. Consequently, the enrichment results returned by this function represent the disproportionate enrichment or depletion of targets annotated to pathways *within the context of the screen*, and may or may not be informative of the underlying biology in question. This means that pathways not targeted by a library will obviously never be enriched a positive target set regardless of their biological relevance, and pathways enriched within a focused library screen are similarly expected to partially reflect the composition of the library and other confounding issues (e.g., number of targets within a pathway). Analysts should therefore use this function with care. For example, it might be unsurprising to detect pathways related to histone modification within a screen employing a crispr library primarily targeting epigenetic regulators.
ct.seas(dflist, gdb, active = "replicated", ...)
ct.seas(dflist, gdb, active = "replicated", ...)
dflist |
A result object created by 'ct.generateResults()', or a named list containing many of them; will be passed as a list to 'ct.seasPrep()' with the associated '...' arguments. |
gdb |
A 'GenseSetDb' object containing annotations for the targets specified in 'result'. |
active |
Name of a column in the supplied result(s) that should be used to indicate active/selected targets. |
... |
Additional arguments to pass to 'ct.seasPrep()' or 'sparrow::seas()'. |
A named list of 'SparrowResults' objects.
Steve Lianoglou for seas; Russell Bainer for GeneSetDb processing and wrapping functions.
data('resultsDF') gdb <- sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez') ct.seas(list('longer' = resultsDF, 'shorter' = resultsDF[1:10000,]), gdb)
data('resultsDF') gdb <- sparrow::getMSigGeneSetDb(collection = 'h', species = 'human', id.type = 'entrez') ct.seas(list('longer' = resultsDF, 'shorter' = resultsDF[1:10000,]), gdb)
Take in a list of results objects and return an equivalently-named list of input 'data.frames' appropriate for 'sparrow::seas()'. By construction, the relevant target unit is extracted from the 'geneSymbol' column of the provided results objects, which may. Note that the genewise '@logFC' slot in the returned object will contain the appropriately-signed Z transformation of the P-value assigned to the target. In most applications this is arguably more interpretable than e.g., the median gRNA log2 fold change.
ct.seasPrep( dflist, collapse.on = c("geneID", "geneSymbol"), cutoff = 0.1, statistic = c("best.q", "best.p"), regularize = FALSE, gdb = NULL, active = "replicated" )
ct.seasPrep( dflist, collapse.on = c("geneID", "geneSymbol"), cutoff = 0.1, statistic = c("best.q", "best.p"), regularize = FALSE, gdb = NULL, active = "replicated" )
dflist |
A list of gCrisprTools results 'data.frames' to be formatted. |
collapse.on |
Should targets be annotated as 'geneSymbol's or 'geneID's (default)? |
cutoff |
Numeric maximum value of 'statistic' to define significance. |
statistic |
Should cutoffs be calculated based on FDR ('best.q') or P-value ('best.p')? |
regularize |
Logical indicating whether to regularize the result objects in 'dflist' (e.g., use intersection set of all targets), or keep as-is. |
gdb |
Optionally, a 'GeneSetDb' object to enable proper registration of the output. If provided, the collapsing features in the provided 'simpleDF's must be present in the 'gsd@db$feature_id' slot. Note that a GREAT-style 'GeneSetDb' that has been conformed via 'ct.GREATdb()' will use 'geneID's as the 'feature_id'. |
active |
Optionally, the name of a logical column present in the provided result that will be used to define significant signals. This is set to 'replicated' by default to If a valid column name is provided, this overrides the specification of 'cutoff' and 'statistic'. |
A list of 'data.frames' formatted for evaluation with 'sparrow::seas()'.
data(resultsDF) ct.seasPrep(list('longer' = resultsDF, 'shorter' = resultsDF[1:10000,]), collapse.on = 'geneSymbol')
data(resultsDF) ct.seasPrep(list('longer' = resultsDF, 'shorter' = resultsDF[1:10000,]), collapse.on = 'geneSymbol')
Given one or more targets of interest, this function generates a summary image contextualizing the corresponding signals within the provided contrast. This takes the form of an annotated ranking curve of target-level signals, supplemented with horizontal Q-value cutoffs and an inset volcano plot of gRNA behavior.
Limited annotation is provided for the specified targets using the following logic:
- If a character vector is provided, up to five targets are annotated; longer lists are highlighted without specifying individual elements. - If a list is provided, the 'names' element is used as the annotation. This is similarly constrained to a total of 5 annotated elements.
ct.signalSummary(summaryDF, targets, callout = FALSE, ...)
ct.signalSummary(summaryDF, targets, callout = FALSE, ...)
summaryDF |
A dataframe summarizing the results of the screen, returned by the function |
targets |
A list or character vector containing the names of the targets to be displayed. Only targets contained in the column specified by the 'collapse' parameter to 'ct.simpleResult()' will be displayed; default is 'geneSymbol'. Plotting priority (e.g., the points to plot last in the case of overlapping signals) is given to earlier elements in the list. |
callout |
Logical indicating whether lines should be plotted indicating individual gene sets to augment the point highlighting. |
... |
Additional optional arguments to 'ct.simpleResult()' |
A summary plot on the current device.
Russell Bainer
data('resultsDF') ct.signalSummary(resultsDF, list('CandidateA' = 'Target229', 'Pathway3' = resultsDF$geneSymbol[c(42,116,1138,5508)]))
data('resultsDF') ct.signalSummary(resultsDF, list('CandidateA' = 'Target229', 'Pathway3' = resultsDF$geneSymbol[c(42,116,1138,5508)]))
Convenience function to reduce a full results object to a gene-level object that retains minimal statistics (or alternatively, check that a provided simple result object is valid).
ct.simpleResult(summaryDF, collapse = c("geneSymbol", "geneID"))
ct.simpleResult(summaryDF, collapse = c("geneSymbol", "geneID"))
summaryDF |
A |
collapse |
Column of the provided resultsDF on which to collapse values; in most cases this should be 'geneSymbol' or 'geneID'. |
A gene-level 'data.frame', with guide-level information omitted
Russell Bainer
data('resultsDF') ct.simpleResult(resultsDF)
data('resultsDF') ct.simpleResult(resultsDF)
This function -log10 transforms empirical P-values by adding a pseudocount of 1/2 the minimum nonzero value.
ct.softLog(x)
ct.softLog(x)
x |
numeric vector. |
-log10-transformed version of X.
ct.softLog(runif(20))
ct.softLog(runif(20))
This function identifies the gRNAs or targets that change the most from sample to sample within an experiment as a percentage of
the entire library. It then plots the abundance of the top nguides
as a stacked barplot for all samples in the experiment. The purpose of this
algorithm is to detect potential distortions in the library composition that might not be properly controlled by sample normalization, and so
the most variable entites are defined by calculating the percent of aligned reads that they contribute to each sample, and then ranking each entity
by the range of these percentages across all samples. Consequently, gRNAs or Targets that are highly abundant in at least one condition will be
are more likely to be identified.
ct.stackGuides( eset, sampleKey = NULL, nguides = 20, plotType = "gRNA", annotation = NULL, ylimit = NULL, subset = NULL )
ct.stackGuides( eset, sampleKey = NULL, nguides = 20, plotType = "gRNA", annotation = NULL, ylimit = NULL, subset = NULL )
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA abundances extractable with the exprs() function, and a metadata
object containing a column named |
sampleKey |
An optional sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
nguides |
The number of guides (or targets) to display. |
plotType |
A string indicating whether the individual guides should be displayed (' |
annotation |
An optional data.frame containing an annotation object to be used to aggregate the guides into targets. gRNAs are annotated by row,
and must minimally contain a column |
ylimit |
An optional numeric vector of length 2 specifying the y limits for the plot, useful in comparin across studies. |
subset |
An optional character vector containing the sample labels to be used in the analysis; all elements must be contained in the |
A stacked barplot displaying the appropriate entities on the default device.
Russell Bainer
data('es') data('ann') ct.stackGuides(es, nguides = 20, plotType = 'Target', annotation = ann, ylimit = NULL, subset = NULL)
data('es') data('ann') ct.stackGuides(es, nguides = 20, plotType = 'Target', annotation = ann, ylimit = NULL, subset = NULL)
This is a function for displaying candidates from a crispr screen, using the information summarized
in the corresponding fit
and the output from ct.generateResults()
. The fold change and standard deviation
estimates for each gRNA associated with each target (extracted from the coefficients
and stdev.unscaled
slot
of fit
) are plotted on the y axis. Targets are selected on the basis of their gene-level enrichment or depletion
P-values; in the case of ties, they are ranked on the basis of their corresponding Rho statistics.
ct.topTargets( fit, summaryDF, annotation, targets = 10, enrich = TRUE, contrast.term = NULL )
ct.topTargets( fit, summaryDF, annotation, targets = 10, enrich = TRUE, contrast.term = NULL )
fit |
An object of class |
summaryDF |
A data.frame summarizing the results of the screen, returned by the function |
annotation |
An annotation object for the experiment. gRNAs are annotated by
row, and must minimally contain a column |
targets |
Either the number of top targets to display, or a list of |
enrich |
Logical indicating whether to display guides that are enriched (default) or depleted within the screen. If a vector of
|
contrast.term |
If a fit object with multiple coefficients is passed in, a string indiating the coefficient of interest. |
An image on the default device indicating each gRNA's log2 fold change and the unscaled standard deviation of the effect estimate,
derived from the MArrayLM
object.
Russell Bainer
data('fit') data('resultsDF') data('ann') ct.topTargets(fit, resultsDF, ann)
data('fit') data('resultsDF') data('ann') ct.topTargets(fit, resultsDF, ann)
This function takes in a named list of 'results' dataframes produced by 'ct.generateResults()' or similar, harmonizes them, and identifies overlaps between them using the logic implemented in 'ct.compareContrasts()'. It then uses the overlaps of these sets to compose an UpSet plot summarizing shared overlaps of the provided contrasts. These overlaps can be specified with some detail via arguments passed to the 'ct.compareContrasts()' function; see documentation for more details.
Note that the UpSet plot is constructed to respect signal directionality, and by default constructs overlaps conditionally, but in a *bidirectional* manner. That is, a signal is considered observed in two (or more) contrasts regardless of the contrast from which the stringent signal is observed, so a signal replicated in three contrasts is interpreted as a target for which the evidence crosses the stringent threshold in one or more of the contrasts and passes the lax contrast in the others.
Note that multiple important parameters are passed directly to 'ct.compareContrasts()' if not specified in the command. Users are advised to study the corresponding manual page to better understand their options regarding contrast thresholding, orientation, etc.
ct.upSet(dflist, add.stats = TRUE, nperm = 10000, ...)
ct.upSet(dflist, add.stats = TRUE, nperm = 10000, ...)
dflist |
a named list of (possibly simplified) 'resultsDf's. |
add.stats |
Logical indicating whether the significance of set overlaps should be included in the visualization. |
nperm |
Number of permutations for P-value generation. Ignored if 'add.stats' is 'FALSE'. |
... |
Other named arguments to 'ComplexHeatmap::UpSet()', 'ct.compareContrasts', or 'ct.simpleResult()'. |
An UpSet plot on the current device. Silently, a combination matrix appropriate for plotting that plot, containing useful information about the observed intersections.
Russell Bainer
data('resultsDF') sets <- ct.upSet(list('first' = resultsDF, 'second' = resultsDF[1:5000,]))
data('resultsDF') sets <- ct.upSet(list('first' = resultsDF, 'second' = resultsDF[1:5000,]))
This function tries to identify, and then plot the abundance of, the full set of non-targeting controls from an ExpressionSet object. Ideally, the user will supply a geneSymbol present in the appropriate annotation file that uniquely identifies the nontargeting gRNAs. Absent this, the the function will search for common identifier used by nontargeting controls (geneID 'no_gid', or geneSymbol NA).
ct.viewControls( eset, annotation, sampleKey, geneSymb = NULL, normalize = TRUE, lib.size = NULL )
ct.viewControls( eset, annotation, sampleKey, geneSymb = NULL, normalize = TRUE, lib.size = NULL )
eset |
An ExpressionSet object containing, at minimum, a matrix of gRNA abundances extractable with the |
annotation |
An annotation data.frame for the experiment. gRNAs are annotated by
row, and must minimally contain columns |
sampleKey |
A sample key, supplied as an ordered factor linking the samples to experimental
variables. The |
geneSymb |
The |
normalize |
Logical indicating whether to attempt to normalize the data in the |
lib.size |
An optional vector of voom-appropriate library size adjustment factors, usually calculated with |
An image of nontargeting control gRNA abundances on the default device.
Russell Bainer
data('es') data('ann') #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ct.viewControls(es, ann, sk, geneSymb = NULL, normalize = FALSE) ct.viewControls(es, ann, sk, geneSymb = NULL, normalize = TRUE)
data('es') data('ann') #Build the sample key library(Biobase) sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), 'ControlReference')) names(sk) <- row.names(pData(es)) ct.viewControls(es, ann, sk, geneSymb = NULL, normalize = FALSE) ct.viewControls(es, ann, sk, geneSymb = NULL, normalize = TRUE)
This function generates a visualization of the effect estimates from a MArrayLM model result for all of the individual guides targeting a particular element, specified somewhere in the library annotation file. The estimated effect size and variance is plotted relative to zero for the specified contrast, with the color of the dot indicating the relative scale of the of the guide intercept within the model framework, with warmer colors indicating lowly expressed guides. For comparison, the density of gRNA fold change estimates is privided in a pane on the right, with white lines indicating the exact levels of the individual guides.
ct.viewGuides( gene, fit, ann, type = "geneSymbol", contrast.term = NULL, ylims = NULL )
ct.viewGuides( gene, fit, ann, type = "geneSymbol", contrast.term = NULL, ylims = NULL )
gene |
the name of the target element of interest, contained within the 'type' column of the annotation file. |
fit |
An object of class MArrayLM containing, at minimum, an 'Amean' slot containing the guide level abundances, a 'coefficients' slot containing the effect estimates for each guide, and an 'stdev.unscaled' slot giving the coefficient standard Deviations. |
ann |
A data.frame object containing the gRNA annotations.
At mimimum, it should have a column with the name specified by the |
type |
A character string indicating the column in ann containing the target of interest. |
contrast.term |
If a fit object with multiple coefficients is passed in, a string indiating the coefficient of interest. |
ylims |
An optional numeric vector of length 2 indicating the extremes of the y-axis scale. |
An image summarizing gRNA behavior within the specifed gene on the default device.
Russell Bainer
data('fit') data('ann') ct.viewGuides('Target1633', fit, ann)
data('fit') data('ann') ct.viewGuides('Target1633', fit, ann)
Expressionset of raw counts from a screen in mouse cells performed at Genentech, Inc. All sample, gRNA, and Gene information has been anonymized and randomized.
Genentech, Inc.
Please see ‘vignettes/Crispr_example_workflow.R’ for details.
data('es') print(es)
data('es') print(es)
Example gene list, designed to demonstrate functions using gene lists. All sample, gRNA, and Gene information has been anonymized and randomized.
Russell Bainer
Please see ‘vignettes/Crispr_example_workflow.R’ for details.
data('essential.genes') essential.genes
data('essential.genes') essential.genes
A precalculated fit object (class MArrayLM
) comparing the death
and control expansion arms of a crispr screen performed at Genentech, Inc.
All sample, gRNA, and Gene information has been anonymized and randomized.
Genentech, Inc.
Please see ‘vignettes/Crispr_example_workflow.R’ for model details.
data('fit') show(fit)
data('fit') show(fit)
A precalculated summary Dataframe comparing the death and control expansion arms of the provided example Crispr screen (using 8 cores, seed = 2). All sample, gRNA, and Gene information has been anonymized and randomized.
Genentech, Inc.
Please see ‘vignettes/Crispr_example_workflow.R’ for model details.
data('resultsDF') head(resultsDF)
data('resultsDF') head(resultsDF)