Title: | Functions useful for those doing repetitive analyses with Affymetrix GeneChips |
---|---|
Description: | Various wrapper functions that have been written to streamline the more common analyses that a core Biostatistician might see. |
Authors: | James W. MacDonald |
Maintainer: | James W. MacDonald <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.79.0 |
Built: | 2024-12-19 03:30:00 UTC |
Source: | https://github.com/bioc/affycoretools |
The affycoretools package is primarily intended to make analyses of Affymetrix GeneChip data simpler and more straightforward. There are any number of packages designed for preprocessing or analyzing Affy data, but there are not so many that help streamline the analysis to help create useful output that can be given to collaborators.
The affycoretools package is primarily intended to be used as a way to do reproducible research, where the analysis and documentation are all held in a single file, that is then processed by R to create the output data, as well as a nicely formatted pdf that documents the analysis. The affycoretools package can be used with either Sweave or knitr documents, although these days knitr is really the way to go.
In addition, affycoretools can be used with either annaffy or ReportingTools to create useful output in HTML or text format to share with your collaborators. However, ReportingTools is being actively developed and maintained, whereas annaffy is not, so the intention is to slowly convert all the functions to primarily use ReportingTools.
James W. MacDonald [email protected]
This function is designed to automatically read in all cel files in a directory, make all pre-processing QC plots and compute expression measures.
affystart( ..., filenames = NULL, groups = NULL, groupnames = NULL, plot = TRUE, pca = TRUE, squarepca = FALSE, plottype = "pdf", express = c("rma", "mas5", "gcrma"), addname = NULL, output = "txt", annotate = FALSE, ann.vec = c("SYMBOL", "GENENAME", "ENTREZID", "UNIGENE", "REFSEQ") )
affystart( ..., filenames = NULL, groups = NULL, groupnames = NULL, plot = TRUE, pca = TRUE, squarepca = FALSE, plottype = "pdf", express = c("rma", "mas5", "gcrma"), addname = NULL, output = "txt", annotate = FALSE, ann.vec = c("SYMBOL", "GENENAME", "ENTREZID", "UNIGENE", "REFSEQ") )
... |
Requires that all variables be named. |
filenames |
If not all cel files in a directory will be used, pass a vector of filenames. |
groups |
An integer |
groupnames |
A character |
plot |
Should density and degradation plots be made? Defaults to
|
pca |
Should a PCA plot be made? Defaults to |
squarepca |
Should the y-axis of the PCA plot be made comparable to the
x-axis? This may aid in interpretation of the PCA plot. Defaults to
|
plottype |
What type of plot to save. Can be "pdf","postscript",
"png","jpeg", or "bmp". Defaults to "pdf". Note that "png" and "jpeg" may
not be available on a given computer. See the help page for
|
express |
One of either rma, mas5, gcrma. Defaults to rma. Partial matching OK. |
addname |
Used to append something to the name of the pca plot and the expression values output file (e.g., if function is run twice using different methods to compute expression values). |
output |
What format to use for the output of expression values. Currently only supports text format. |
annotate |
Boolean. Add annotation data to the output file? |
ann.vec |
A character |
Returns an ExpressionSet
.
James W. MacDonald <[email protected]>
plotHist
, plotDeg
, plotPCA
This function fills the featureData slot of the ExpressionSet automatically, which is then available to downstream methods to provide annotated output. Annotating results is tedious, and can be surprisingly difficult to get right. By annotating the data automatically, we remove the tedium and add an extra layer of security since the resulting ExpressionSet will be tested for validity automatically (e.g., annotation data match up correctly with the expression data). Current choices for the annotation data are a ChipDb object (e.g., hugene10sttranscriptcluster.db) or an AffyGenePDInfo object (e.g., pd.hugene.1.0.st.v1). In the latter case, we use the parsed Affymetrix annotation csv file to get data. This is only intended for those situations where the ChipDb package is not available, and in particular is only available for those packages that contain the parsed annotation csv files (generally, Gene ST arrays, Exon ST arrays and Clariom/HTA/MTA/RTA arrays).
annotateEset(object, x, ...) ## S4 method for signature 'ExpressionSet,ChipDb' annotateEset( object, x, columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"), multivals = "first" ) ## S4 method for signature 'ExpressionSet,AffyGenePDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyHTAPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyExonPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyExpressionPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,character' annotateEset(object, x, ...) ## S4 method for signature 'ExpressionSet,data.frame' annotateEset(object, x, probecol = NULL, annocols = NULL, ...)
annotateEset(object, x, ...) ## S4 method for signature 'ExpressionSet,ChipDb' annotateEset( object, x, columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"), multivals = "first" ) ## S4 method for signature 'ExpressionSet,AffyGenePDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyHTAPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyExonPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,AffyExpressionPDInfo' annotateEset(object, x, type = "core", ...) ## S4 method for signature 'ExpressionSet,character' annotateEset(object, x, ...) ## S4 method for signature 'ExpressionSet,data.frame' annotateEset(object, x, probecol = NULL, annocols = NULL, ...)
object |
An ExpressionSet to which we want to add annotation. |
x |
Either a ChipDb package (e.g., hugene10sttranscriptcluster.db), or a pdInfoPackage object (e.g., pd.hugene.1.0.st.v1). |
... |
Allow users to pass in arbitrary arguments. Particularly useful for passing in columns, multivals, and type arguments for methods. |
columns |
For ChipDb method; what annotation data to add. Use the |
multivals |
For ChipDb method; this is passed to |
type |
For pdInfoPackages; either 'core' or 'probeset', corresponding to the 'target' argument
used in the call to |
probecol |
Column of the data.frame that contains the probeset IDs. Can be either numeric (the column number) or character (the column header). |
annocols |
Column(x) of the data.frame to use for annotating. Can be a vector of numbers (which column numbers to use) or a character vector (vector of column names). |
An ExpressionSet that has annotation data added to the featureData slot.
object = ExpressionSet,x = ChipDb
: Annotate an ExpressionSet using a ChipDb package for annotation data.
object = ExpressionSet,x = AffyGenePDInfo
: Annotate an ExpressionSet using an AffyGenePDInfo package.
object = ExpressionSet,x = AffyHTAPDInfo
: Annotate an ExpressionSet using an AffyHTAPDInfo package.
object = ExpressionSet,x = AffyExonPDInfo
: Annotate an ExpressionSet using an AffyExonPDInfo package.
object = ExpressionSet,x = AffyExpressionPDInfo
: Annotate an ExpressionSet using an AffyExpressionPDInfo package.
object = ExpressionSet,x = character
: Method to capture character input.
object = ExpressionSet,x = data.frame
: Annotate an ExpressionSet using a user-supplied data.frame.
James W. MacDonald <[email protected]>
## Not run: dat <- read.celfiles(filenames = list.celfiles()) eset <- rma(dat) ## annotate using ChipDb eset <- annotateEset(eset, hgu10sttranscriptcluster.db) ## or AffyGenePDInfo eset <- annotateEset(eset, pd.hugene.1.0.st.v1) ## End(Not run)
## Not run: dat <- read.celfiles(filenames = list.celfiles()) eset <- rma(dat) ## annotate using ChipDb eset <- annotateEset(eset, hgu10sttranscriptcluster.db) ## or AffyGenePDInfo eset <- annotateEset(eset, pd.hugene.1.0.st.v1) ## End(Not run)
This is intended to be an internal function to runRomer
. It is
documented here only because it may be necessary to pass alternative
arguments to this function from runRomer
.
dataAndHeatmapPage( object, fit, ind, columns = NULL, fname, heatmap, title, key = TRUE, fitind = NULL, affy = TRUE, ... )
dataAndHeatmapPage( object, fit, ind, columns = NULL, fname, heatmap, title, key = TRUE, fitind = NULL, affy = TRUE, ... )
object |
An ExpressionSet, containing normalized, summarized gene expression data. |
fit |
An MArrayLM object containing the fitted data. |
ind |
Numeric vector indicating which rows of the data object to use. |
columns |
Numeric vector indicating which columns of the
data object to use. If |
fname |
The filename of the resulting output, without the 'html' file extension. |
heatmap |
Character. The filename of the heatmap to append to the bottom of the HTML page. |
title |
Title to be placed at the top of the resulting HTML page. |
key |
Character. The filename of the heatmap key to append to the bottom of the HTML page. |
fitind |
Numeric. Which column of the |
affy |
Boolean. Are these Affymetrix arrays? If |
... |
Included to allow arbitrary commands to be passed to lower level functions. |
This function creates an annotation table using probes2table
if an
annotation file is used, otherwise data will be output in a simple HTML
table. A heatmap showing the expression values for all the genes in the gene
set is then placed below this table, along with a key that indicates the
range of the expression values.
James W. MacDonald <[email protected]>
This function is designed to parse a design and contrasts matrix in order to generate Glimma's interactive MA-plots for all contrasts. The heading for the resulting plot is based on the colnames of the contrasts matrix, so it is important to include the colnames and make them explanatory.
doGlimma( tablst, datobj, dsgn, cont, grpvec, padj = "BH", sigfilt = 0.05, extraname = NULL, ID = NULL, sample.cols = rep("#1f77b4", ncol(datobj)), ... )
doGlimma( tablst, datobj, dsgn, cont, grpvec, padj = "BH", sigfilt = 0.05, extraname = NULL, ID = NULL, sample.cols = rep("#1f77b4", ncol(datobj)), ... )
tablst |
An MArrayLM object or list of DGEExact or DGELRT objects |
datobj |
A DGEList, ExpressionSet, EList or matrix |
dsgn |
A design matrix |
cont |
A contrast matrix |
grpvec |
A vector of groups, usually what was used to make the design matrix |
padj |
Method for multiplicity adjustment. BH by default |
sigfilt |
Significance cutoff for selecting genes |
extraname |
Used to add a sub-directory to the glimma-plots directory, mainly used to disambiguate contrasts with the same name (see below). |
ID |
Character, default is NULL. The column name of the genes list item (from the MArrayLM, DGEExact or DGELRT objects) that will be used for labeling the individual gene expression plot at the top right of the resulting plot. The default of NULL will search for a column labeled SYMBOL to use. If the ID value doesn't match any column, and there is no SYMBOL column, then the default for glMDPlot will be used. |
sample.cols |
A vector of numbers or hex strings, the same length as the grpvec. This will cause the samples in the upper right plot to have different colors. Useful primarily to check for batch differences. The default is the default from glMDPlot. |
... |
Allows end users to pass other arguments to the Glimma glMDPlot function |
In addition, if there are multiple contrasts with the same name, say if the same comparisons are being made for different tissue types, the extraname argument will cause the output to be placed in glimma-plots/<extraname>, to eliminate over-writing of existing files.
A character vector of the files generated, useful for using as links to the output.
James W. MacDonald [email protected]
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) grps <- factor(1:4, each=5) design <- model.matrix(~0 + grps) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) htmllinks <- doGlimma(fit2, mat, design, contrast, grps) ## End(Not run)
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) grps <- factor(1:4, each=5) design <- model.matrix(~0 + grps) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) htmllinks <- doGlimma(fit2, mat, design, contrast, grps) ## End(Not run)
These functions are intended to add links to the Affymetrix, Entrez Gene, Nuccore, and AmiGO databases when creating HTML tables using ReportingTools.
entrezLinks(df, ...)
entrezLinks(df, ...)
df |
A data.frame, usually created using the |
... |
Allows one to pass arbitrary arguments to lower level functions. Currently unsupported. |
These functions are not actually intended to be called directly. Instead,
they are used as targets for the .modifyDF argument of the publish
function of ReportingTools. See the example below for more detail.
A data.frame is returned, with links included.
James W. MacDonald [email protected]
## Not run: ## say we have an ExpressionSet from HuGene 1.0 ST array dat <- read.celfiles() eset <- rma(dat) ## annotate the ExpressionSet eset <- annotateEset(eset, hugene10sttranscriptcluster.db, columns = c("ENTREZID","ACCNUM","SYMBOL")) ## and fit a model using limma fit <- lmFit(eset, design) fit2 <- eBayes(fit) ## and create an HTML page with links to Affy and Entrez out <- topTable(fit2, coef=2) htab <- HTMLReport("The title","a_short_name") publish(out, htab, .modifyDF = list(affyLinks, entrezLinks, nuccoreLinks)) finish(htab) ## End(Not run)
## Not run: ## say we have an ExpressionSet from HuGene 1.0 ST array dat <- read.celfiles() eset <- rma(dat) ## annotate the ExpressionSet eset <- annotateEset(eset, hugene10sttranscriptcluster.db, columns = c("ENTREZID","ACCNUM","SYMBOL")) ## and fit a model using limma fit <- lmFit(eset, design) fit2 <- eBayes(fit) ## and create an HTML page with links to Affy and Entrez out <- topTable(fit2, coef=2) htab <- HTMLReport("The title","a_short_name") publish(out, htab, .modifyDF = list(affyLinks, entrezLinks, nuccoreLinks)) finish(htab) ## End(Not run)
Internal function used to automatically test for columns that can be converted to links
fixHeaderAndGo(df, affy = TRUE, probecol = "PROBEID")
fixHeaderAndGo(df, affy = TRUE, probecol = "PROBEID")
df |
A data.frame |
affy |
Boolean; does the data.frame contain Affymetrix probeset IDs? |
probecol |
Character. The column header containing Affymetrix probeset IDs. Defaults to "PROBEID". |
This is an internal function designed to test for the presence of Affymetrix Probeset IDs or Entrez Gene IDs, and if found, generate a list that can be passed to the ReportingTools publish function in order to generate hyperlinks. The underlying assumption is that the data will have been annotated using a Bioconductor annotation package, and thus Affy probeset IDs will have a column header "PROBEID", and Entrez Gene IDs will have a header "ENTREZID" (or any combination of upper and lowercase letters).
Returns a list of length two (with names mdf and df). The mdf object can be passed to the
publish
using the .modifyDF argument, and the df object is
input dat.frame with column names corrected to conform to affyLinks
and entrezLinks
, so links
will be generated correctly.
Jim MacDonald
This is intended to be an internal function to runRomer
, and is not
intended to be called by end users. However, the ... argument to
runRomer
allows one to pass arguments to lower level functions, so
the arguments are described here.
geneSetPage( rslts, genesets, object, fit, file, cutoff = 0.05, dir = ".", subdir = ".", columns = NULL, colnames = NULL, col = NULL, caption = NULL, fitind = NULL, bline = NULL, affy = TRUE, ... )
geneSetPage( rslts, genesets, object, fit, file, cutoff = 0.05, dir = ".", subdir = ".", columns = NULL, colnames = NULL, col = NULL, caption = NULL, fitind = NULL, bline = NULL, affy = TRUE, ... )
rslts |
The results from running |
genesets |
Character. A vector of gene symbols for one gene set. |
object |
An ExpressionSet, DGEList or EList containing normalized, summarized gene expression data. |
fit |
An MArrayLM or DGEGLM object, containing the fitted data. |
file |
Filename for the resulting HTML page. |
cutoff |
Numeric. The cutoff for significance for a given gene set. Defaults to 0.05. |
dir |
The directory to write the results. Defaults to the working directory. |
subdir |
The subdirectory to write the individual gene set results. Defaults to the working directory. |
columns |
Numeric. The columns of the |
colnames |
Character. Alternative column names for the resulting
heatmap. See |
col |
A vector of colors for the heatmap. Defaults to
|
caption |
Caption to put at the top of the HTML page. |
fitind |
Numeric. The columns of the |
bline |
Defaults to |
affy |
Boolean; are these Affymetrix arrays? If |
... |
Allows arguments to be passed to lower-level functions. See
|
This function creates a ‘midlevel’ HTML table that contains each gene
set that was significant, with a link to an HTML table that shows data for
each gene in that gene set (with annotation), as well as a heatmap showing
the expression levels. Normally this is not run by end users, but is called
as part of the runRomer
function.
Nothing is returned. Called only for the side effect of creating HTML tables.
James W. MacDonald <[email protected]>
This function is designed to remove all but the 'main' type of probesets from the Gene ST array types.
getMainProbes(input, level = "core")
getMainProbes(input, level = "core")
input |
Either a character string (e.g., "pd.hugene.1.0.st.v1") or a FeatureSet object. |
level |
The summarization level used when calling rma. |
If the argument is a character string, returns a data.frame containing probeset IDs along with the probeset type, that can be used to subset e.g., an ExpressionSet of Gene ST data, or an MArrayLM object created from Gene ST data. Note that the order of the probesets is not guaranteed to match the order in your ExpressionSet or MArrayLM object, so that should be checked first. If the argument is a FeatureSet object, it returns a FeatureSet object with only main probes remaining.
James W. MacDonald <[email protected]>
This is an internal function called by runRomer
and is not intended
to be used directly. It is documented here only because arguments may be
passed down via the dots
argument.
gsHeatmap( object, ind, filename, columns = NULL, colnames = NULL, col = NULL, annot = NULL, scale.row = FALSE, key = TRUE, bline = NULL )
gsHeatmap( object, ind, filename, columns = NULL, colnames = NULL, col = NULL, annot = NULL, scale.row = FALSE, key = TRUE, bline = NULL )
object |
An |
ind |
Numeric vector indicating which rows of the
|
filename |
The filename for the heatmap and associated key. |
columns |
Numeric vector indicating which columns of the
|
colnames |
Character. Substitute column names for the heatmap. If
|
col |
A vector of colors to use for the heatmap. If |
annot |
A matrix or data.frame containing gene symbols to annotate the heatmap. This will normally be extracted automatically from the 'fit' object passed to geneSetPage. If there is no annotation in the fit object, then the probe IDs will be used instead. |
scale.row |
Boolean. Should the data be scaled by row? Defaults to
|
key |
Boolean. Should a key be produced that shows the numeric range
for the colors of the heatmap? Defaults to |
bline |
A numeric vector, usually extracted from a contrast matrix, used to ‘sweep’ the mean baseline sample means from the heatmap data. The end result will be a heatmap in which the colors correspond to log fold changes from the baseline samples. |
As noted above, this is only intended to be called indirectly by
runRomer
. However, certain arguments such as scale.row, or col, etc,
can be passed down to this function via the dots
argument, allowing
the end user to have more control over the finished product.
Nothing is returned. Called only for the side effect of creating heatmaps in 'png' format.
James W. MacDonald <[email protected]>
A function to create an HTML table showing genes that gave rise to a significant GO term
makeGoGeneTable( fit.table, probe.sum.table, go.id, cont.name, base.dir = NULL, extraname = NULL, probecol = "PROBEID", affy = TRUE )
makeGoGeneTable( fit.table, probe.sum.table, go.id, cont.name, base.dir = NULL, extraname = NULL, probecol = "PROBEID", affy = TRUE )
fit.table |
The output from topTable |
probe.sum.table |
The output from running probeSetSummary on a GOHyperGResults object. |
go.id |
The GO ID of interest |
cont.name |
The contrast name. |
base.dir |
Character. Where should the HTML tables be generated? Defaults to NULL. |
extraname |
Character. An extra name that can be used if the contrast name isn't descriptive enough. |
probecol |
The column name in the topTable object that contains probe IDs. Defaults to PROBEID. |
affy |
Boolean. Are the arrays from Affymetrix? |
This is an internal function, not intended to be called by the end user. Documentation here for clarity. After running a GO analysis, it is advantageous to output a table listing those genes that gave rise to a significant GO term. This function creates the table, along with links to Netaffx (if the data are Affymetrix) and to the NCBI Gene database (if there are Entrez Gene IDs).
Returns an HTMLReportRef object.
Jim MacDonald
This function is used to create HTML tables to present the results from a Gene Ontology (GO) analysis.
makeGoTable( fit.table, go.summary, probe.summary, cont.name, base.dir = "GO_results", extraname = NULL, probecol = "PROBEID", affy = TRUE )
makeGoTable( fit.table, go.summary, probe.summary, cont.name, base.dir = "GO_results", extraname = NULL, probecol = "PROBEID", affy = TRUE )
fit.table |
The output from topTable |
go.summary |
The output from running |
probe.summary |
The output from running probeSetSummary on a GOHyperGResults object. |
cont.name |
The contrast name. |
base.dir |
Character. Where should the HTML tables be generated? Defaults to GO_results. |
extraname |
Character. An extra name that can be used if the contrast name isn't descriptive enough. |
probecol |
The column name in the topTable object that contains probe IDs. Defaults to PROBEID. |
affy |
Boolean. Are the arrays from Affymetrix? |
After running a GO analysis, it is often useful to first present a table showing the set of significant GO terms for a given comparison, and then have links to a sub-table for each GO term that shows the genes that were responsible for the significance of that term. The first table can be generated using the summary function, but it will not contain the links to the sub-table. The ReportingTools package has functionality to make these tables and sub-tables automatically, but the default is to include extra glyphs in the main table that are not that useful.
This function is intended to generate a more useful version of the table that one normally gets from ReportingTools.
Returns an HTMLReportRef object, which can be used when creating an index page to link to the results.
Jim MacDonald
This function is intended for use when both miRNA and mRNA data are available for the same samples. In this situation it may be advantageous to compute correlations between the two RNA types, in order to detect mRNA transcripts that are targeted by miRNA.
makeHmap( mRNAdat, miRNAdat, mRNAlst, mRNAvec = NULL, miRNAvec = NULL, chipPkg, header, plot = TRUE, out = TRUE, lhei = c(0.05, 0.95), margins = c(5, 8) )
makeHmap( mRNAdat, miRNAdat, mRNAlst, mRNAvec = NULL, miRNAvec = NULL, chipPkg, header, plot = TRUE, out = TRUE, lhei = c(0.05, 0.95), margins = c(5, 8) )
mRNAdat |
An |
miRNAdat |
An |
mRNAlst |
A |
mRNAvec |
A numeric vector used to subset or reorder the mRNA data, by
column. If |
miRNAvec |
A numeric vector used to subset or reorder the miRNA data,
by column. If |
chipPkg |
Character. The name of the chip-specific annotation package (e.g., "hgu133plus2.db"). |
header |
Character. The plot title if a heatmap is output. |
plot |
Boolean. Should a heatmap be generated? |
out |
Boolean. Should the matrix of correlation coefficients be output? |
lhei |
From |
margins |
From |
As noted above, this function is intended to generate output from simultaneous analyses of miRNA and mRNA data for the same samples, the goal being either a heatmap like plot of correlations, or the data (or both).
If creating a plot, note that if the number of significant mRNA probes is
large, the resulting heatmap will have many rows and will not plot correctly
on the usual graphics device within R. In order to visualize, it is almost
always better to output as a pdf. In addition, the dimensions of this pdf
will have to be adjusted so the row names for the heatmap will be legible.
As an example, a heatmap with 10 miRNA transcripts and 100 mRNA transcripts
will likely need a pdf with a width argument of 6 and a height argument of
25 or 30. It may require some experimentation to get the correct arguments
to the pdf
function.
Also please note that this function by necessity outputs rectangular data. However, there will be many instances in which a given miRNA isn't thought to target a particular mRNA. Whenever this occurs, the heatmap will have a white cell, and the output data for that combination will be NA.
This function will output a numeric matrix if the 'out' argument is
TRUE
.
James W. MacDonald
mirna2mrna
A function to add dotplot glyphs and links to HTML tables
makeImages( df, eset, grp.factor, design, contrast, colind, boxplot = FALSE, repdir = "./reports", extraname = NULL, weights = NULL, insert.after = 3, altnam = NULL, ... )
makeImages( df, eset, grp.factor, design, contrast, colind, boxplot = FALSE, repdir = "./reports", extraname = NULL, weights = NULL, insert.after = 3, altnam = NULL, ... )
df |
A data.frame from calling |
eset |
A matrix, data.frame, or |
grp.factor |
A factor that indicates which group ALL of the samples belong to. This will be subsetted internally, so do not subset yourself. |
design |
The design matrix used by limma or edgeR to fit the model. |
contrast |
The contrast matrix used by limma or edgeR to make comparisons. |
colind |
Which column of the contrast matrix are we using? In other words, for which comparison are we creating a table? |
boxplot |
Boolean. If |
repdir |
A directory in which to put the HTML tables. Defaults to a "reports" directory in the working directory. |
extraname |
By default, the tables will go in a "reports" subdirectory, and will be named based on the column name of the contrast that is specified by the colind argument (after replacing any spaces with an underscore). If this will result in name collisions (e.g., a previous file will be over-written because the resulting names are the same), then an extraname can be appended to ensure uniqueness. |
weights |
Array weights, generally from |
insert.after |
Which column should the image be inserted after? Defaults to 3. |
altnam |
Normally the output file directories are generated from the colnames of the contrast matrix. This argument can be used to over-ride the default, particularly in the case that one is computing an F-test using a set of columns from the contrast matrix. |
... |
Allows arbitrary arguments to be passed down to lower level functions. |
This function is intended to create little dotplot glyphs that can be added to an HTML table of results from e.g., a microarray or RNA-Seq experiment, showing graphically how much the different groups are changing. The glyphs have unlabeled axes to make them small enough to fit in an HTML table, and clicking on a glyph will result in a new page loading with a full sized dotplot, complete with axis labels.
This function is very similar to the stock functions in the ReportingTools package, but the standard glyphs for that package consist of a dotplot on top of a boxplot, which seems too busy to me. In addition, for most microarray analyses there are not enough replicates to make a boxplot useful.
A list, two items. The first item is the input data.frame with the glyphs included, ready to be used with ReportingTools to create an HTML table. The second item is a pdf of the most differentially expressed comparison. This is useful for those who are using e.g., knitr or Sweave and want to be able to automatically insert an example dotplot in the document to show clients what to expect.
James W. MacDonald [email protected]
This function is designed to output CSV and HTML tables based on an analysis using the limma or edgeR packages, with output generated using the ReportingTools package. Please note that a DGEGLM object from edgeR is simply converted to an MArrayLM object from limma and then used in the default MArrayLM method, so all arguments for the MArrayLM object pertain to the DGEGLM method as well.
makeVenn(object, ...) ## S4 method for signature 'MArrayLM' makeVenn( object, contrast, design, groups = NULL, collist = NULL, p.value = 0.05, lfc = 0, method = "both", adj.meth = "BH", titleadd = NULL, fileadd = NULL, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... ) ## S4 method for signature 'DGEGLM' makeVenn( object, contrast, design, comp.method = c("glmLRT", "glmQLFTest", "glmTreat"), lfc = 0, ... )
makeVenn(object, ...) ## S4 method for signature 'MArrayLM' makeVenn( object, contrast, design, groups = NULL, collist = NULL, p.value = 0.05, lfc = 0, method = "both", adj.meth = "BH", titleadd = NULL, fileadd = NULL, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... ) ## S4 method for signature 'DGEGLM' makeVenn( object, contrast, design, comp.method = c("glmLRT", "glmQLFTest", "glmTreat"), lfc = 0, ... )
object |
|
... |
Used to pass other arguments to lower level functions. |
contrast |
A contrasts matrix, produced either by hand, or by a call to
|
design |
A design matrix. |
groups |
This argument is used when creating a legend for the resulting HTML pages. If NULL, the groups will be generated using the column names of the design matrix. In general it is best to leave this NULL. |
collist |
A list containing numeric vectors indicating which columns of
the fit, contrast and design matrix to use. If |
p.value |
A p-value to filter the results by. |
lfc |
A log fold change to filter the results by. |
method |
One of "same", "both", "up", "down", "sameup", or "samedown". See details for more information. |
adj.meth |
Method to use for adjusting p-values. Default is 'BH', which
corresponds to 'fdr'. Ideally one would set this value to be the same as was
used for |
titleadd |
Additional text to add to the title of the HTML tables. Default is NULL, in which case the title of the table will be the same as the filename. |
fileadd |
Additional text to add to the name of the HTML and CSV tables. Default is NULL. |
baseUrl |
A character string giving the location of the page in terms of HTML locations. Defaults to "." |
reportDirectory |
A character string giving the location that the results will be written. Defaults to "./venns" |
affy |
Boolean. Are these Affymetrix data, and should hyperlinks to the affy website be generated in the HTML tables? |
probecol |
This argument is used in concert with the preceding argument. If these are Affymetrix data
, then specify the column header in the |
comp.method |
Character. For DGEGLM objects, the DGEGLM object must first be processed using one of |
The purpose of this function is to output HTML and text tables with lists of
genes that fulfill the criteria of a call to
decideTests
as well as the direction of differential
expression. This is a high-level function that calls vennSelect2
internally, and is intended to be used with vennPage
to create a set
of Venn diagrams (on an HTML page) that have clickable links in each cell of
the diagram. The links will then pass the end user to individual HTML pages
that contain the genes that are represented by the counts in a given cell of
the Venn diagram.
In general, the only thing that is needed to create a set of Venn diagrams is a list of numeric vectors that indicate the columns of the contrast matrix that are to be used for a given diagram. See the example below for a better explanation.
Some important things to note: First, the names of the HTML and text tables
are extracted from the colnames
of the TestResults
object,
which come from the contrasts matrix, so it is important to use something
descriptive. Second, the method argument is analogous to the include
argument from vennCounts
or
vennDiagram
. Choosing "both" will select genes
that are differentially expressed in one or more comparisons, regardless of
direction. Choosing "up" or "down" will select genes that are only
differentially expressed in one direction. Choosing "same" will select genes
that are differentially expressed in the same direction. Choosing "sameup"
or "samedown" will select genes that are differentially expressed in the
same direction as well as 'up' or 'down'.
Note that this is different than sequentially choosing "up" and then "down". For instance, a gene that is upregulated in one comparison and downregulated in another comparison will be listed in the intersection of those two comparisons if "both" is chosen, it will be listed in only one comparison for both the "up" and "down" methods, and it will be listed in the union (e.g., not selected) if "same" is chosen.
Unlike vennSelect
, this function automatically creates both HTML and
CSV output files.
Also please note that this function relys on annotation information contained in the "genes" slot of the "fit" object. If there are no annotation data, then just statistics will be output in the resulting HTML tables.
A list containing the output from calling vennSelect2
on the
columns specified by the collist argument. This is intended as input to
vennPage
, which will use those data to create the HTML page with Venn
diagrams with clickable links.
MArrayLM
: Make a Venn diagram using an MArrayLM object.
DGEGLM
: Make a Venn diagram using a DGEGLM object.
James W. MacDonald [email protected]
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) design <- model.matrix(~factor(1:4, each=5)) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) ## two Venn diagrams - a 3-way Venn with the first three contrasts ## and a 2-way Venn with the last two contrasts collist <- list(1:3,4:5) venn <- makeVenn(fit2, contrast, design, collist = collist) vennPage(venn, "index.html", "Venn diagrams") ## End(Not run)
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) design <- model.matrix(~factor(1:4, each=5)) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) ## two Venn diagrams - a 3-way Venn with the first three contrasts ## and a 2-way Venn with the last two contrasts collist <- list(1:3,4:5) venn <- makeVenn(fit2, contrast, design, collist = collist) vennPage(venn, "index.html", "Venn diagrams") ## End(Not run)
This function creates an MA plot for all arrays in either an ExpressionSet or a matrix. A 'baseline' array is created using the median expression for each gene, and each array is then compared to the baseline array.
maplot(object)
maplot(object)
object |
An ExpressionSet or matrix containing log-transformed array data. |
No output. Used only for the side effect of creating MA plots.
James W. MacDonald <[email protected]>
This function is intended use when there are miRNA and mRNA data for the same subjects, and the goal is to detect mRNAs that appear to be targeted by the miRNA.
mirna2mrna( miRNAids, miRNAannot, mRNAids, orgPkg, chipPkg, sanger = TRUE, miRNAcol = NULL, mRNAcol = NULL, transType = "ensembl" )
mirna2mrna( miRNAids, miRNAannot, mRNAids, orgPkg, chipPkg, sanger = TRUE, miRNAcol = NULL, mRNAcol = NULL, transType = "ensembl" )
miRNAids |
A character vector of miRNA IDs. Currently only supports Affymetrix platform. |
miRNAannot |
Character. The filename (including path if not in working directory) for the file containing miRNA to mRNA mappings. |
mRNAids |
A character vector of mRNA IDs. Currently only supports Affymetrix platform. |
orgPkg |
Character. The Bioconductor organism package (e.g., org.Hs.eg.db) to be used for mapping. |
chipPkg |
Character. The Bioconductor chip-specific package (e.g., hgu133plus2.db) to be used for mapping. |
sanger |
Boolean. Is the miRNAannot file a Sanger miRBase targets file? These can be downloaded from http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl |
miRNAcol |
Numeric. If using a Sanger miRBase targets file, leave
|
mRNAcol |
Numeric. If using Sanger miRBase targets file, leave
|
transType |
Character. Designates the type of transcript ID for mRNA supplied by the miRNAannot file. If using the Sanger miRBase files, this is ensembl. Other choices include refseq and accnum. |
This function is intended to take a vector of miRNA IDs that are significantly differentially expressed in a given experiment and then map those IDs to putative mRNA transcripts that the miRNAs are supposed to target. The mRNA transcript IDs are then mapped to chip-specific probeset IDs, which are then subsetted to only include those probesets that were also significantly differentially expressed.
The output from this function is intended as input for
makeHmap
.
A list with names that correspond to each significant miRNA, and the mRNA probeset IDs that are targeted by that miRNA.
James W. MacDonald
makeHmap
This function is actually intended to be a sub-function of runRomer
,
but can hypothetically run by itself if the romer
step has
already been done.
outputRomer(object, fit, ...) ## S4 method for signature 'ExpressionSet,MArrayLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", affy = TRUE, ... ) ## S4 method for signature 'DGEList,DGEGLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", ... ) ## S4 method for signature 'EList,MArrayLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", ... )
outputRomer(object, fit, ...) ## S4 method for signature 'ExpressionSet,MArrayLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", affy = TRUE, ... ) ## S4 method for signature 'DGEList,DGEGLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", ... ) ## S4 method for signature 'EList,MArrayLM' outputRomer( object, fit, rsltlst, genesetlst, design = NULL, contrast = NULL, changenames = TRUE, dir = "genesets", explanation = NULL, baseline.hmap = TRUE, file = "indexRomer.html", ... )
object |
An ExpressionSet, DGEList or EList object containing normalized, summarized gene expression data. |
fit |
An MArrayLM or DGEGLM object, containing the fitted data. |
... |
Arguments to be passed to lower-level functions. See
|
rsltlst |
A list of results, generated by the |
genesetlst |
A list of genesets, usually created by loading in the RData files that can be downloaded from http://bioinf.wehi.edu.au/software/MSigDB/. See details for more information. |
design |
A design matrix describing the model. |
contrast |
A contrast matrix describing the contrasts that were fit. This matrix should have colnames, which will be used to name subdirectories containing results. |
changenames |
Boolean. When creating heatmaps of the gene sets, should
the columns be appended with the colnames from the design matrix? If
|
dir |
Character. The subdirectory to use for the output data. Defaults to 'genesets'. |
explanation |
If |
baseline.hmap |
Boolean. If |
file |
Character. The filename to output. Defaults to indexRomer.html. |
affy |
Boolean. Are these Affymetrix arrays? if |
This function is intended to be an internal function for runRomer
.
However, it is possible that runRomer
errored out after saving the
results from running romer
on a set of contrasts, and all that
remains is to create the output HTML.
Please note that the first two arguments to this function have certain
expectations. The rsltlst should be the output from running
romer
. If using the saved output from runRomer
, one
should first load
the 'romer.Rdata' file, which will introduce a list
object with the name 'romerlst' into the working directory, so the first
argument should be rsltlst = romerlst.
Second, see the code for runRomer, specifically the line that creates the 'sets' object, which will show how to create the correct genesetlst object.
Nothing is returned. The function is run only for the side effect of creating HTML tables with output for each significant gene set.
object = ExpressionSet,fit = MArrayLM
: Output romer results using microarray data
object = DGEList,fit = DGEGLM
: Output romer results using RNA-Seq data processed using edgeR
object = EList,fit = MArrayLM
: Output romer results using RNA-Seq data processed using voom.
James W. MacDonald <[email protected]>
These functions make density and RNA degradation plots with automatic placement of legends.
plotDeg(dat, filenames = NULL)
plotDeg(dat, filenames = NULL)
dat |
An |
filenames |
Filenames that will be used in the legend of the resulting
plot. If |
These functions are called only for the side effect of making the plots. Nothing else is returned.
James W. MacDonald <[email protected]>
library("affydata") data(Dilution) plotDeg(Dilution) plotHist(Dilution)
library("affydata") data(Dilution) plotDeg(Dilution) plotHist(Dilution)
This function makes a PCA plot from an ExpressionSet or matrix
## S4 method for signature 'matrix' plotPCA( object, groups = NULL, groupnames = NULL, addtext = NULL, x.coord = NULL, y.coord = NULL, screeplot = FALSE, squarepca = FALSE, pch = NULL, col = NULL, pcs = c(1, 2), legend = TRUE, main = "Principal Components Plot", plot3d = FALSE, outside = FALSE, ... ) ## S4 method for signature 'ExpressionSet' plotPCA(object, ...)
## S4 method for signature 'matrix' plotPCA( object, groups = NULL, groupnames = NULL, addtext = NULL, x.coord = NULL, y.coord = NULL, screeplot = FALSE, squarepca = FALSE, pch = NULL, col = NULL, pcs = c(1, 2), legend = TRUE, main = "Principal Components Plot", plot3d = FALSE, outside = FALSE, ... ) ## S4 method for signature 'ExpressionSet' plotPCA(object, ...)
object |
An |
groups |
A numeric |
groupnames |
A character |
addtext |
A character |
x.coord |
Pass an x-coordinate if automatic legend placement fails |
y.coord |
Pass a y-coordinate if automatic legend placement fails. |
screeplot |
Boolean: Plot a |
squarepca |
Should the y-axis of the PCA plot be made comparable to the
x-axis? This may aid in interpretation of the PCA plot. Defaults to
|
pch |
A numeric |
col |
A numeric or character |
pcs |
A character |
legend |
Boolean. Should a legend be added to the plot? Defaults to
|
main |
A character |
plot3d |
Boolean. If |
outside |
Boolean. If |
... |
Further arguments to be passed to |
This function returns nothing. It is called only for the side effect of producing a PCA plot or screeplot.
plotPCA,ExpressionSet-method
:
James W. MacDonald <[email protected]>
library("affy") data(sample.ExpressionSet) plotPCA(sample.ExpressionSet, groups = as.numeric(pData(sample.ExpressionSet)[,2]), groupnames = levels(pData(sample.ExpressionSet)[,2]))
library("affy") data(sample.ExpressionSet) plotPCA(sample.ExpressionSet, groups = as.numeric(pData(sample.ExpressionSet)[,2]), groupnames = levels(pData(sample.ExpressionSet)[,2]))
This function automates both running romer
on a set of
contrasts as well as the creation of output HTML tables that can be used to
explore the results. The basic idea here is that one might have used limma
to fit a model and compute some contrasts, and then want to do a GSEA using
romer
.
runRomer(object, ...) ## S4 method for signature 'ExpressionSet' runRomer( object, fit, setloc, annot = NULL, design = NULL, contrast = NULL, wts = NULL, save = TRUE, baseline.hmap = TRUE, affy = TRUE, ... ) ## S4 method for signature 'DGEList' runRomer( object, fit, setloc, design = NULL, contrast = NULL, save = TRUE, baseline.hmap = TRUE, ... ) ## S4 method for signature 'EList' runRomer( object, fit, setloc, design = NULL, contrast = NULL, save = TRUE, baseline.hmap = TRUE, ... )
runRomer(object, ...) ## S4 method for signature 'ExpressionSet' runRomer( object, fit, setloc, annot = NULL, design = NULL, contrast = NULL, wts = NULL, save = TRUE, baseline.hmap = TRUE, affy = TRUE, ... ) ## S4 method for signature 'DGEList' runRomer( object, fit, setloc, design = NULL, contrast = NULL, save = TRUE, baseline.hmap = TRUE, ... ) ## S4 method for signature 'EList' runRomer( object, fit, setloc, design = NULL, contrast = NULL, save = TRUE, baseline.hmap = TRUE, ... )
object |
An ExpressionSet, DGEList, or EList object |
... |
Used to pass arguments to lower-level functions. See
|
fit |
A fitted model from either limma (e.g., MArrayLM) or edgeR (e.g., DGEGLM) |
setloc |
A character vector giving the path for gene set RData files (see description for more information), or a named list (or list of lists), where the top-level names consist of gene set grouping names (like KeGG or GO), the next level names consist of gene set names (like NAKAMURA_CANCER_MICROENVIRONMENT_UP), and the list items themselves are gene symbols, matching the expected capitalization for the species being used (e.g., for human, they are ALL CAPS. For most other species only the First Letter Is Capitalized). |
annot |
Character. The name of the array annotation package. If NULL, the annotation data will be extracted from the fData slot (for ExpressionSets) or the genes list (for DGEList or EList objects). |
design |
A design matrix describing the model fit to the data. Ideally this should be a cell-means model (e.g., no intercept term), as the design and contrast matrices are used to infer which data to include in the output heatmaps. There is no guarantee that this will work correctly with a treatment-contrasts parameterization (e.g., a model with an intercept). |
contrast |
A contrast matrix describing the contrasts that were computed from the data. This contrast should have colnames, which will be used to create parts of the resulting directory structure. |
wts |
Optional weights vector - if array weights were used to fit the model, they should be supplied here as well. |
save |
Boolean. If true, after running the |
baseline.hmap |
Boolean. If |
affy |
Boolean; are these Affymetrix arrays? If |
The romer
expects as input a list or lists of gene symbols
that represent individual gene sets. One example is the various gene sets
from the Broad Institute that are available at
http://bioinf.wehi.edu.au/software/MSigDB/, which are distributed as RData
files. The default assumption for this function is that the end user will
have downloaded these files, and the setloc argument simply tells
runRomer
where to find them.
Alternatively, user-based gene sets could be created (these should consist of lists of character vectors of gene symbols - see one of the Broad gene sets for an example).
This function will run romer
using all the gene sets in the
referenced directory, on all the contrasts supplied, and then output the
results in a (default) 'genesets' subdirectory. There will be an HTML file
in the working directory with a (default) filename of 'indexRomer.html' that
will point to individual HTML files in the genesets subdirectory, which will
point to individual files in subdirectories within the genesets subdirectory
(named after the colnames of the contrast matrix).
If save is TRUE, return a list that can be re-processed using outputRomer
.
this is useful in cases where you might need to re-run multiple times.
Nothing is returned. This function is called only for the side-effects of creating output HTML files in the working and sub-directories.
ExpressionSet
: Perform gene set analysis using microarray data.
DGEList
: Perform gene set analysis using RNA-Seq data processed using edgeR.
EList
: Perform gene set analysis using RNA-Seq data processed using voom.
James W. MacDonald <[email protected]>
James W. MacDonald <[email protected]>
A function to create a 4-way Venn diagram
venn4Way( fit, contrast, p.value, lfc, adj.meth, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... )
venn4Way( fit, contrast, p.value, lfc, adj.meth, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... )
fit |
An |
contrast |
A contrasts matrix, used by limma to generate the comparisons made. |
p.value |
A p-value cutoff for significance |
lfc |
A log fold change cutoff |
adj.meth |
The method used to adjust for multiple comparisons. |
baseUrl |
The base directory for the tables generated. Defaults to ".", meaning the current directory. |
reportDirectory |
The directory in which to put the results. Defaults to a "venns" subdirectory. |
affy |
Boolean. Set to |
probecol |
The column containing either the Affymetrix probeset IDs (if the affy argument is set to |
... |
Allows arbitrary arguments to be passed to lower level functions |
This function is an internal function and not really intended to be called by the end user. It is generally called by the vennPage
function. The goal is to create a 4-way Venn diagram in an HTML page with clickable links to tables of the genes found in a given cell.
In addition, the numbers in each cell are underlined with colored bars that help end users tell what contrasts are captured by that cell.
Returns a list. The first item is a (list of) HTMLReportRef objects that can be used by ReportingTools to create HTML links.
The second item is the output from the venn
function in gtools, and the third item is the name of the contrasts used to generate
the Venn diagram.
James W. MacDonald [email protected]
This function is designed to compute counts for a Venn diagram. It is
slightly different from vennCounts
in the
additional ability to compute counts for genes that are differentially
expressed in the same direction.
vennCounts2(x, method = "same", fit = NULL, foldFilt = NULL)
vennCounts2(x, method = "same", fit = NULL, foldFilt = NULL)
x |
A |
method |
One of "same", "both", "up", "down". See details for more information. |
fit |
An |
foldFilt |
A fold change to filter samples. This is primarily here for
consistency with the corresponding argument in |
The function vennCounts
will return identical
results except for the "same" method. This will only select those genes that
both pass the criteria of decideTests
as well as being
differentially expressed in the same direction. Note that this is different
from the "both" method, which simply requires that a given gene be
differentially expressed in e.g., two different comparisons without any
requirement that the direction be the same.
A VennCounts
object.
James W. MacDonald <[email protected]>
library("limma") tstat <- matrix(rt(300,df=10),100,3) tstat[1:33,] <- tstat[1:33,]+2 clas <- classifyTestsF(tstat,df=10,p.value=0.05) a <- vennCounts2(clas) print(a) vennDiagram(a)
library("limma") tstat <- matrix(rt(300,df=10),100,3) tstat[1:33,] <- tstat[1:33,]+2 clas <- classifyTestsF(tstat,df=10,p.value=0.05) a <- vennCounts2(clas) print(a) vennDiagram(a)
A function to generate Venn diagrams for use within Rmarkdown documents, particularly for those using the Bioconductor BiocStyle package for formatting.
vennInLine( vennlst, caplst, cex.venn = 1, shift.title = FALSE, reportDirectory = NULL, ... )
vennInLine( vennlst, caplst, cex.venn = 1, shift.title = FALSE, reportDirectory = NULL, ... )
vennlst |
The output from |
caplst |
A list of captions to accompany each Venn diagram. |
cex.venn |
Adjustment parameter for the numbers in the Venn diagram. The default is usually OK. |
shift.title |
Boolean. Should the titles for the Venn diagram be shifted to accommodate long contrast names? |
reportDirectory |
Directory containing the Venn diagram. This is usually set by |
... |
Allows users to pass arbitrary arguments to lower level functions. |
This function is intended for those who use Rmarkdown documents to present results and who would like to include Venn diagrams showing the overlap between two to four contrasts. The Venn diagrams that are generated include links for each cell of the diagram that will open HTML pages that contain results for the genes that are found within the cell of the Venn diagram.
Please note that this function is tailored specifically for use within Rmarkdown documents, particularly those that use the Bioconductor BiocStyle package. The function call should be present in a code block using the argument results = "asis", because we are directly generating HTML rather than placing a figure.
This function returns the required HTML text to generate the Venn diagram
James W. MacDonald [email protected]
vennPage
particularly for the example.
This function is designed to be used in conjunction with the makeVenn
function, to first create a set of HTML pages containing the genes that are
represented by the cells of a Venn diagram, and then create an HTML page
with the same Venn diagrams, with clickable links that will point the end
user to the HTML pages.
vennPage( vennlst, pagename, pagetitle, cex.venn = 1, shift.title = FALSE, baseUrl = ".", reportDirectory = NULL, ... )
vennPage( vennlst, pagename, pagetitle, cex.venn = 1, shift.title = FALSE, baseUrl = ".", reportDirectory = NULL, ... )
vennlst |
The output from |
pagename |
Character. The file name for the resulting HTML page. Something like 'venns' is reasonable. Note that the .html will automatically be appended. |
pagetitle |
Character. The heading for the HTML page. |
cex.venn |
Numeric. Adjusts the size of the font in the Venn diagram. Usually the default is OK. |
shift.title |
Boolean. Should the right contrast name of the Venn diagram be shifted down? Useful for long contrast names. If a two-way Venn diagram, this will shift the right name down so they don't overlap. If a three-way Venn diagram, this will shift the top right name down. |
baseUrl |
Character. The base URL for the resulting HTML page. The default of "." is usually optimal. |
reportDirectory |
If |
... |
To allow passing other arguments to lower level functions. Currently not used. |
This function is intended to be used as part of a pipeline, by first calling
makeVenn
and then using the output from that function as input to
this function to create the HTML page with clickable links.
An HTMLReport object. If used as input to the ReportingTools
publish
function, this will create a link on an index page to the
Venn diagram HTML page. See e.g., the microarray analysis vignette for
ReportingTools for more information.
James W. MacDonald [email protected]
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) design <- model.matrix(~factor(1:4, each=5)) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) ## two Venn diagrams - a 3-way Venn with the first three contrasts ## and a 2-way Venn with the last two contrasts collist <- list(1:3,4:5) venn <- makeVenn(fit2, contrast, design, eset, collist = collist) vennreport <- vennPage(venn, "index.html", "Venn diagrams") indexPage <- HTMLReport("index", "My results", reportDirectory = ".", baseUrl = ".") publish(vennreport) finish(indexPage) ## End(Not run)
## Not run: mat <- matrix(rnorm(1e6), ncol = 20) design <- model.matrix(~factor(1:4, each=5)) colnames(design) <- LETTERS[1:4] contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1,0,1,-1,0,0,1,0,-1), ncol = 5) colnames(contrast) <- paste(LETTERS[c(1,1,1,2,2)], LETTERS[c(2,3,4,3,4)], sep = " vs ") fit <- lmFit(mat, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) ## two Venn diagrams - a 3-way Venn with the first three contrasts ## and a 2-way Venn with the last two contrasts collist <- list(1:3,4:5) venn <- makeVenn(fit2, contrast, design, eset, collist = collist) vennreport <- vennPage(venn, "index.html", "Venn diagrams") indexPage <- HTMLReport("index", "My results", reportDirectory = ".", baseUrl = ".") publish(vennreport) finish(indexPage) ## End(Not run)
This function is designed to output text and/or HTML tables based on the
results of a call to decideTests
, using the
ReportingTools package.
vennSelect2( fit, contrast, design, groups = NULL, cols = NULL, p.value = 0.05, lfc = 0, method = "same", adj.meth = "BH", titleadd = NULL, fileadd = NULL, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... )
vennSelect2( fit, contrast, design, groups = NULL, cols = NULL, p.value = 0.05, lfc = 0, method = "same", adj.meth = "BH", titleadd = NULL, fileadd = NULL, baseUrl = ".", reportDirectory = "./venns", affy = TRUE, probecol = "PROBEID", ... )
fit |
|
contrast |
A contrasts matrix, produced either by hand, or by a call to
|
design |
A design matrix. |
groups |
This argument is used when creating a legend for the resulting HTML pages. If NULL, the groups will be generated using the column names of the design matrix. |
cols |
A numeric vector indicating which columns of the fit, contrast
and design matrix to use. If |
p.value |
A p-value to filter the results by. |
lfc |
A log fold change to filter the results by. |
method |
One of "same", "both", "up", "down", "sameup", or "samedown". See details for more information. |
adj.meth |
Method to use for adjusting p-values. Default is 'BH', which
corresponds to 'fdr'. Ideally one would set this value to be the same as was
used for |
titleadd |
Additional text to add to the title of the HTML tables. Default is NULL, in which case the title of the table will be the same as the filename. |
fileadd |
Additional text to add to the name of the HTML and CSV tables. Default is NULL. |
baseUrl |
A character string giving the location of the page in terms of HTML locations. Defaults to "." |
reportDirectory |
A character string giving the location that the results will be written. Defaults to "./venns" |
affy |
Boolean; are these Affymetrix arrays, and do you want hyperlinks for each probeset to the Affy website to be generated for the resulting HTML tables? |
probecol |
If the "affy" argument is |
... |
Used to pass arguments to lower level functions. |
The purpose of this function is to output HTML and text tables with lists of
genes that fulfill the criteria of a call to
decideTests
as well as the direction of differential
expression.
Some important things to note: First, the names of the HTML and text tables
are extracted from the colnames
of the TestResults
object,
which come from the contrasts matrix, so it is important to use something
descriptive. Second, the method argument is analogous to the include
argument from vennCounts
or
vennDiagram
. Choosing "both" will select genes
that are differentially expressed in one or more comparisons, regardless of
direction. Choosing "up" or "down" will select genes that are only
differentially expressed in one direction. Choosing "same" will select genes
that are differentially expressed in the same direction. Choosing "sameup"
or "samedown" will select genes that are differentially expressed in the
same direction as well as 'up' or 'down'.
Note that this is different than sequentially choosing "up" and then "down". For instance, a gene that is upregulated in one comparison and downregulated in another comparison will be listed in the intersection of those two comparisons if "both" is chosen, it will be listed in only one comparison for both the "up" and "down" methods, and it will be listed in the union (e.g., not selected) if "same" is chosen.
Unlike vennSelect
, this function automatically creates both HTML and
CSV output files.
A list with two items. First, a list of HTMLReport
objects
from the ReportingTools package, which can be used to create an index page
with links to the HTML pages created by this function. See the help page for
HTMLReport in ReportingTools as well as the vignettes for more information.
The second item is a vennCounts
object from limma, which can be used
to create a Venn diagram, e.g., in a report if this function is called
within a Sweave or knitR pipeline.
James W. MacDonald [email protected]
This function is designed to take an ExpressionSet
an annotation
package and an lmFit
object, and output an annotated text file
containing t-statistics, p-values, and fold change data for all contrasts.
writeFit( fit, annotation = NULL, eset, touse = c("symbol", "genename", "accnum", "entrezid", "unigene") )
writeFit( fit, annotation = NULL, eset, touse = c("symbol", "genename", "accnum", "entrezid", "unigene") )
fit |
A |
annotation |
An annotation package, specific for the chip used in the analysis. |
eset |
An |
touse |
Character vector of BiMaps from annotation package. As an example, if the annotation package is the hgu133plus2.db package, then 'symbol' refers to the hgu133plus2SYMBOL BiMap. |
This function is designed to output annotation data as well as statistics (p-values, fold change, t-statistics) for all probes on a chip.
A data.frame
is returned.
James W. MacDonald <[email protected]>