Title: | Significance-based Modules Integrating the Transcriptome and Epigenome |
---|---|
Description: | This package builds on the Epimods framework which facilitates finding weighted subnetworks ("modules") on Illumina Infinium 27k arrays using the SpinGlass algorithm, as implemented in the iGraph package. We have created a class of gene centric annotations associated with p-values and effect sizes and scores from any researchers prior statistical results to find functional modules. |
Authors: | Neil Ari Wijetunga, Andrew Damon Johnston, John Murray Greally |
Maintainer: | Neil Ari Wijetunga <[email protected]>, Andrew Damon Johnston <[email protected]> |
License: | GPL (>=2) |
Version: | 1.35.0 |
Built: | 2024-10-31 05:31:15 UTC |
Source: | https://github.com/bioc/SMITE |
SMITE provides a method of scoring and visualizing multi-level epigenomic data in order to prioritize genes within a genome-wide experiment. These scores can then be used to identify subnetworks within an interaction network called modules. Each module represents a collection of highly interacting genes that are implicated by the experiment.
Package: | SMITE |
Type: | Package |
Version: | 1.0.0 |
Date: | 2015-07-06 |
License: | GPL (>=2) |
Neil Ari Wijetunga, Andrew Damon Johnston
Maintainer: [email protected], [email protected]
FEM BioNet
## NOTE: commented out for example. See vignette for better explanation ## options(stringsAsFactors=FALSE) data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] methylation[,5] <- replace(methylation[,5],methylation[,5] == 0, min(subset(methylation[,5], methylation[,5]!=0), na.rm=TRUE)) data(curated_expressiondata) data(hg19_genes_bed) data(histone_h3k4me1) #test_annotation<-makePvalueAnnotation( data=hg19_genes, #other_data=list(h3k4me1=h3k4me1), gene_name_col=5, other_tss_distance=5000) ##fill in expression data #test_annotation<-annotateExpression(test_annotation, expression_curated) ##fill in methylation data #test_annotation<-annotateModification(test_annotation, methylation, #weight_by=c(promoter="distance", body="distance", h3k4me1="distance"), #verbose=TRUE, mod_corr=TRUE) ##create a pvalue object that will count the effect of the h3k4me1 as ##bidirectional #test_annotation<-makePvalueObject(test_annotation, #effect_directions=c(methylation_promoter="decrease", #methylation_body="decrease", #methylation_h3k4me1="bidirectional")) ##normalize the pvalues compared to colExp #test_annotation<-normalizePval(test_annotation,ref="expression_pvalue", #method="rescale") ##score with all four features contributing #test_annotation<-SMITEscorePval(test_annotation, #weights=c(methylation_promoter=.3,methylation_body=.1,expression=.3, #methylation_h3k4me1=.3)) ##load REACTOME #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ##run Spinglass using REACTOME network #test_annotation<-runSpinglass(test_annotation, REACTOME, maxsize=50, #num_iterations=10) ##run goseq on individual modules to determine bias #test_annotation <- runGOseq(test_annotation, #coverage=read.table(system.file("extdata", #"hg19_symbol_hpaii.sites.inbodyand2kbupstream.bed.gz", package="SMITE")), #type="kegg") ##search go seq output for keywords #searchGOseq(test_annotation, "Cell") ##Draw a network #plotModule(test_annotation, which_network=6, layout="fr") ##sample final file ## data(test_annotation_score_data)
## NOTE: commented out for example. See vignette for better explanation ## options(stringsAsFactors=FALSE) data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] methylation[,5] <- replace(methylation[,5],methylation[,5] == 0, min(subset(methylation[,5], methylation[,5]!=0), na.rm=TRUE)) data(curated_expressiondata) data(hg19_genes_bed) data(histone_h3k4me1) #test_annotation<-makePvalueAnnotation( data=hg19_genes, #other_data=list(h3k4me1=h3k4me1), gene_name_col=5, other_tss_distance=5000) ##fill in expression data #test_annotation<-annotateExpression(test_annotation, expression_curated) ##fill in methylation data #test_annotation<-annotateModification(test_annotation, methylation, #weight_by=c(promoter="distance", body="distance", h3k4me1="distance"), #verbose=TRUE, mod_corr=TRUE) ##create a pvalue object that will count the effect of the h3k4me1 as ##bidirectional #test_annotation<-makePvalueObject(test_annotation, #effect_directions=c(methylation_promoter="decrease", #methylation_body="decrease", #methylation_h3k4me1="bidirectional")) ##normalize the pvalues compared to colExp #test_annotation<-normalizePval(test_annotation,ref="expression_pvalue", #method="rescale") ##score with all four features contributing #test_annotation<-SMITEscorePval(test_annotation, #weights=c(methylation_promoter=.3,methylation_body=.1,expression=.3, #methylation_h3k4me1=.3)) ##load REACTOME #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ##run Spinglass using REACTOME network #test_annotation<-runSpinglass(test_annotation, REACTOME, maxsize=50, #num_iterations=10) ##run goseq on individual modules to determine bias #test_annotation <- runGOseq(test_annotation, #coverage=read.table(system.file("extdata", #"hg19_symbol_hpaii.sites.inbodyand2kbupstream.bed.gz", package="SMITE")), #type="kegg") ##search go seq output for keywords #searchGOseq(test_annotation, "Cell") ##Draw a network #plotModule(test_annotation, which_network=6, layout="fr") ##sample final file ## data(test_annotation_score_data)
This is a usefule function to help text stand out on busy backgrounds like gene networks
addShadowText(x, y = NULL, labels, col = "white", bg = "black", theta = seq(pi/4, 2 * pi, length.out = 8), r = 0.1, ...)
addShadowText(x, y = NULL, labels, col = "white", bg = "black", theta = seq(pi/4, 2 * pi, length.out = 8), r = 0.1, ...)
x |
A numeric vector of x coordinates |
y |
A numeric vector of y coordinates |
labels |
A character vector to be plotted at the specified coordinates |
col |
The text color |
bg |
The color of the outline |
theta |
The number of shadows to plot |
r |
The radius for the shadows |
... |
Additional plotting arguments |
The function creates its effect by plotting theta shadows at r radius around the text to create the illusion of a text shadow
Adds shadow text to plot
This function was adapted by N. Ari Wijetunga for SMITE.
Greg.Snow <at> imail.org
http://article.gmane.org/gmane.comp.lang.r.general/147787
text, mtext
plot.new() addShadowText(x = .5,y = .5,"TEST",col="white",bg="gray")
plot.new() addShadowText(x = .5,y = .5,"TEST",col="white",bg="gray")
This function is used to create and load an ExpressionSet into a PvalueAnnotation. Using specified effect and p-value column or named columns that the function will use to determine the effect and p-value columns, it loads the data it into the PvalueAnnotation.
annotateExpression(pvalue_annotation, expr_data, effect_col = NULL, pval_col = NULL)
annotateExpression(pvalue_annotation, expr_data, effect_col = NULL, pval_col = NULL)
pvalue_annotation |
An S4 object of class PvalueAnnotation |
expr_data |
An object of class data.frame or matrix with row names corresponding to genes and atleast two columns with an effect and p-value for expression. |
effect_col |
A numeric specifying the column with an effect direction. If not specified the function will grep for a single named column from: "effect", "odds", "coeff" or "B" |
pval_col |
A numeric specifying the column with p-values. If not specified the function will grep for a single named column from: "pval","p.val","p_val" or "sig" |
The function will load the entire given expression dataset as an ExpressionSet in the expression slot,while the effect and p-value data will also be stored as an "AnnotatedDataFrame" in the phenoData slot of the ExpressionSet.
A PvalueAnnotation, an S4 object with the slot "expression" filled in.
N. Ari Wijetunga
annotateModification makePvalueAnnotation createPvalueObject
data(curated_expressiondata) data(test_annotation_score_data) ## Load Expression data into PvalueAnnotation ## test_annotation <- annotateExpression(pvalue_annotation=test_annotation, expression_curated) ## Extract entire ExpressionSet with expression data ## #slot(test_annotation,"expression") ## Extract expression data summary ## #head(extractExpression(pvalue_annotation=test_annotation))
data(curated_expressiondata) data(test_annotation_score_data) ## Load Expression data into PvalueAnnotation ## test_annotation <- annotateExpression(pvalue_annotation=test_annotation, expression_curated) ## Extract entire ExpressionSet with expression data ## #slot(test_annotation,"expression") ## Extract expression data summary ## #head(extractExpression(pvalue_annotation=test_annotation))
This function is the main "workhorse" function for SMITE because given a specific epigenetic modification (e.g. DNA methylation) it will 1) assess an internal correlation structure and 2) aggregate the modification over all intervals associated with a gene in the "makePvalueAnnotation" function.
annotateModification(pvalue_annotation, mod_data, weight_by = NULL, weight_by_method = "Stouffer", mod_included = NULL, mod_corr = TRUE, mod_type = "methylation", verbose = FALSE)
annotateModification(pvalue_annotation, mod_data, weight_by = NULL, weight_by_method = "Stouffer", mod_included = NULL, mod_corr = TRUE, mod_type = "methylation", verbose = FALSE)
pvalue_annotation |
An S4 object of class PvalueAnnotation. |
mod_data |
A dataframe or matrix derived from a bed file with thhe the first three columns as (chromosome, start, end), column 4 is the effect, and column 5 is the p-value. |
weight_by |
A vector with named elements specifying how modifications should be weighted within an interval. Must be one of: "distance" Use the distance from the gene TSS to weight the p-values and the combined effect such that events closer to the TSS are weighted more. Log distances are used. "pval" "p.value" "pvalue" "p_val" (DEFAULT) Do not weight p-values but weight the combined effect such by the signficance of that effect. ELSE Do not weight p-values or the combined effect. |
weight_by_method |
A character specifying which method should be used to combine p-values. Must be one of: "Stouffer" (DEFAULT) Stouffer's method for combing pvalues involves first taking the inverse standard normal CDF transformation of a vector of p-values followed by a weighted sum creating a new Z score with a standard normal distribution "Fisher" "fisher" "chisq" "chi" Fisher's method involves summing the -2ln(p) for each of k p's which follows an approximate chi square distribution with 2k degrees of freedom "Sidak" "sidak" "minimum" Sidak's adjustment is essentially the minimum p-value, with an added transformation to account for multiple comparisons. "binomial" The binomial probability assesses the probability of observing the observed number of p-value below a threshold (alpha=0.05) given the total number of p values and the probability of a false positive. |
mod_included |
A vector of named elements specifying for which intervals in the annotation the function should find combined scores (e.g. promoters). If not specified the assumption is that all type of intervals associated with a gene should be included. |
mod_corr |
A logical (TRUE/FALSE) specifying whether a correlation matrix should be estimated. The DEFAULT is TRUE. |
mod_type |
A character naming the modification that is being loaded. The DEFAULT is "methylation" and any modType string can be used, but will be referred to in downstream analysis. A unique name must be used for each modification that is loaded. When picking a variable modType should also avoid using "_" as it is used to split column names containing modType. |
verbose |
A logical specifying if the user wants updates about the progress of the function. |
This function is the main "workhorse" function for SMITE because given a specific epigenetic modification (e.g. DNA methylation) it will 1) assess an internal correlation structure and 2) aggregate the modification over all intervals associated with a gene in the "makePvalueAnnotation" function.
A an S4 object of class PvalueAnnotation with the slot modification (a GrangesList) filled in for each additional modification.
N. Ari Wijetunga
Fisher R. Statistical methods for research workers. Oliver and Boyd; Edinburgh: 1932.
Stouffer S, DeVinney L, Suchmen E. The American soldier: Adjustment during army life. Vol. 1. Princeton University Press; Princeton, US: 1949.
Sidak, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions, Journal of the American Statistical Association 62, 626 633.
removeModification annotateExpression makePvalueAnnotation createPvalueObject
options(stringsAsFactors=FALSE) ## Commented out below See vignette for more detailed usage information ## ## Load genome bed file ## #data(hg19_genes_bed) ## Create a PvalueAnnotation with defaults for promoter size## #test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation bed file ## #data(methylationdata) #methylation<-methylation[-which(is.na(methylation[,5])),] #methylation[,5]<-replace(methylation[,5],methylation[, 5] == 0, #min(subset(methylation[,5], methylation[,5]!=0), na.rm=TRUE)) ## Load DNA methylation into PvalueAnnotation modCorr=F for example## ## NOTE: Commented out below. See vignette for better example ## #test_annotation <- annotateModification(pvalue_annotation=test_annotation, #mod_data=methylation, weight_by=c(promoter="distance", body="distance"), #verbose=FALSE, mod_corr=FALSE, mod_type="methylation")
options(stringsAsFactors=FALSE) ## Commented out below See vignette for more detailed usage information ## ## Load genome bed file ## #data(hg19_genes_bed) ## Create a PvalueAnnotation with defaults for promoter size## #test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation bed file ## #data(methylationdata) #methylation<-methylation[-which(is.na(methylation[,5])),] #methylation[,5]<-replace(methylation[,5],methylation[, 5] == 0, #min(subset(methylation[,5], methylation[,5]!=0), na.rm=TRUE)) ## Load DNA methylation into PvalueAnnotation modCorr=F for example## ## NOTE: Commented out below. See vignette for better example ## #test_annotation <- annotateModification(pvalue_annotation=test_annotation, #mod_data=methylation, weight_by=c(promoter="distance", body="distance"), #verbose=FALSE, mod_corr=FALSE, mod_type="methylation")
A convenient function used to convert between gene ids from different gene annotations.
convertGeneIds(gene_IDs, ID_type, ID_convert_to, delim = NULL, verbose = FALSE)
convertGeneIds(gene_IDs, ID_type, ID_convert_to, delim = NULL, verbose = FALSE)
gene_IDs |
A vector of gene names. |
ID_type |
A character specifying the type of given annotation. Currently one of "refseq", "ensembleprot","uniprot" or "ensemble". In the case of ID_convert_to="entrez", "symbol" |
ID_convert_to |
A character specifying the type of desired annotation. Currently one of "symbol" or in they case of ID_type="symbol", "entrez" |
delim |
An optional character that will be removed from the beginning of each gene name. It can be a long string. |
verbose |
TRUE/FALSE Should the function be verbose? DEFAULTS to FALSE. |
This is a very usefule function to efficently convert between gene ids. It currently relies on enumeration of each possible conversion, which has limited it's use to mainly converting to gene symbol.
A character vector formatted to ID_convert_to
The function has enumerated combinations using AnnotationDBI. We can provide additional functionality if needed.
N. Ari Wijetunga < [email protected] >
AnnotationDBI, Biomart
data(genes_for_conversiontest) genes[,1] <- convertGeneIds(gene_IDs=genes[,1], ID_type="refseq", ID_convert_to="symbol")
data(genes_for_conversiontest) genes[,1] <- convertGeneIds(gene_IDs=genes[,1], ID_type="refseq", ID_convert_to="symbol")
A toy dataset of pre-cleaned gene expression data from RNA-seq. The file is effect and p-value with gene names as rownames.
data("curated_expressiondata")
data("curated_expressiondata")
A data frame with 20819 observations on the following 2 variables.
a character vector specifying gene
an numeric vector specifying effect (log fold change)
a numeric vector with a two sided p-value from DESeq analysis
This gene expression dataset is a randomized version of the Toxoplasma dataset used to benchmark SMITE. It no longer has NAs or p-values=0. Gene names were converted to gene symbols.
A dataframe with rownames as genes in Refseq format and columns for effects and pvalues derived from negative binomial testing of DESeq normalized values from RNA-seq.
Manuscript in preparation. Please see https://github.com/GreallyLab for more details.
data(curated_expressiondata)
data(curated_expressiondata)
This function allows the user to see the effect and p-value data that was loaded into a PvalueAnnotation before performing downstream analysis.
extractExpression(pvalue_annotation)
extractExpression(pvalue_annotation)
pvalue_annotation |
An S4 object of class PvalueAnnotation for which expression data has already been loaded via annotateExpression |
A data.frame pulled from the phenoData of the expression slot within a load PvalueAnnotation. The phenoData specifically hold the effect and p-value information.
N. Ari Wijetunga
annotateExpression
data(test_annotation_score_data) data(curated_expressiondata) ## Load Expression data into PvalueAnnotation ## test_annotation<-annotateExpression(test_annotation, expression_curated) ## Extract entire ExpressionSet with expression data ## #slot(test_annotation,"expression") ## Extract expression data summary ## head(extractExpression(pvalue_annotation=test_annotation))
data(test_annotation_score_data) data(curated_expressiondata) ## Load Expression data into PvalueAnnotation ## test_annotation<-annotateExpression(test_annotation, expression_curated) ## Extract entire ExpressionSet with expression data ## #slot(test_annotation,"expression") ## Extract expression data summary ## head(extractExpression(pvalue_annotation=test_annotation))
Having defined at least one genomic module using runSpinglass or runBioNet, this function allows you to interrogate the enriched terms for a specific module or combination of modules.
extractGOseq(pvalue_annotation, which_network = NULL) searchGOseq(pvalue_annotation, search_string, wholeword = FALSE)
extractGOseq(pvalue_annotation, which_network = NULL) searchGOseq(pvalue_annotation, search_string, wholeword = FALSE)
pvalue_annotation |
An S4 object of class PvalueAnnotation, for which module-finding and GOseq analysis have already been performed |
which_network |
A numeric vector of a length of at least one, corresponding to a particular functional module specifically for the extract function. |
search_string |
A character specfying a search string specifically for the search function. |
wholeword |
A logical (TRUE/FALSE) detemring whether the search string must be matched for whole word specifically for the search function. |
Goseq analysis is useful since it allows you to assess term/pathway enrichment in a collection of genes, while adjusting for bias data. Potential bias can be from aspects like gene length or probe density that influence the likelihood of finding a particular gene. For more information please see the goseq reference.
##Extract## A list with eahc element matching the specified module. Has columns identfying the term id, the over represented p-value, underrepresented p-value the total number in the category found in the module, the total number in the category and a more descirptive pathway name.
##Search## A matrix with columns identfying the module name, module position/significance , the specific enriched term, the rank of that term within all enriched terms and the total number of enriched terms.
N. Ari Wijetunga
Young MD, Wakefield MJ, Smyth GK and Oshlack A (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology, 11, pp. R14.
runGOseq runSpinglass runBioNet extractModules plotModule
## Commented out below. See vignette for more details ## ##load sample data with only PvalueObject filled in## data(test_annotation_score_data) ## show goseq analysis for module 1 ## extractGOseq(test_annotation, 1) ## show goseq analysis for module 1 and 2 ## #extractGOseq(test_annotation, 1:2) ## search for term ## #searchGOseq(test_annotation, "Cell cycle")
## Commented out below. See vignette for more details ## ##load sample data with only PvalueObject filled in## data(test_annotation_score_data) ## show goseq analysis for module 1 ## extractGOseq(test_annotation, 1) ## show goseq analysis for module 1 and 2 ## #extractGOseq(test_annotation, 1:2) ## search for term ## #searchGOseq(test_annotation, "Cell cycle")
After having loaded modifications into a PvalueAnnotation, these functions can be used to display the GRanges with the modification of interest, or the data frame containg a summary of the combined effects.
extractModification(pvalue_annotation, mod_type = "methylation") extractModSummary(pvalue_annotation)
extractModification(pvalue_annotation, mod_type = "methylation") extractModSummary(pvalue_annotation)
pvalue_annotation |
An s4 object of calss PvalueAnnotation |
mod_type |
A string or character vector that must match one or more of the loaded modifications. If NULL (DEFAULT) then it will show all modifications. |
A GRanges object containing the modification(s) of interest or a data frame with a summary of the combined p-values and effects
N. Ari Wijetunga
extractExpression annotateModificaiton removeModification
##NOTE: Comment out in exmaple see vignette for more detailed usage ## ## Load genome bed file ## data(hg19_genes_bed) ## Load curated DNA methylation bed file ## #data(methylationdata) #methylation <- methylation[-which(is.na(methylation[,5])),] #methylation[, 5] <- replace(methylation[,5],methylation[,5] == 0, #min(subset(methylation[, 5], methylation[, 5] !=0 ), na.rm=TRUE)) ## Create a PvalueAnnotation with defaults for promoter size## test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation into PvalueAnnotation ## #test_annotation <- annotateModification(pvalue_annotation=test_annotation, #methylation, weight_by=c(promoter="distance", body="distance"), verbose=TRUE, #mod_corr=FALSE, mod_type="methylation") ## Extract GRanges with modification data ## #extractModification(pvalue_annotation=test_annotation)
##NOTE: Comment out in exmaple see vignette for more detailed usage ## ## Load genome bed file ## data(hg19_genes_bed) ## Load curated DNA methylation bed file ## #data(methylationdata) #methylation <- methylation[-which(is.na(methylation[,5])),] #methylation[, 5] <- replace(methylation[,5],methylation[,5] == 0, #min(subset(methylation[, 5], methylation[, 5] !=0 ), na.rm=TRUE)) ## Create a PvalueAnnotation with defaults for promoter size## test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation into PvalueAnnotation ## #test_annotation <- annotateModification(pvalue_annotation=test_annotation, #methylation, weight_by=c(promoter="distance", body="distance"), verbose=TRUE, #mod_corr=FALSE, mod_type="methylation") ## Extract GRanges with modification data ## #extractModification(pvalue_annotation=test_annotation)
Having identfied modules within a Pvalue annotation, this function allows the user to display 1 or more of the module genes.
extractModules(pvalue_annotation, which_module = NULL)
extractModules(pvalue_annotation, which_module = NULL)
pvalue_annotation |
An S4 object of class PvalueAnnotation for which Spinglass or BioNet has already been run. |
which_module |
A numeric vector specifying one or more module to display |
A list with each element containing the requested modules
N. Ari Wijetunga
plotModule runGOseq extractGOseq runSpinglass runBioNet
data(test_annotation_score_data) extractModules(pvalue_annotation=test_annotation, which_module=1)
data(test_annotation_score_data) extractModules(pvalue_annotation=test_annotation, which_module=1)
A function to obtain all gene scores
extractScores(pvalue_annotation)
extractScores(pvalue_annotation)
pvalue_annotation |
An S4 object of class PvalueAnnotation for which scores have already been calculated |
A named vector containing all gene scores
N. Ari Wijetunga
scorePval extractModules highScores
data(test_annotation_score_data) out <- extractScores(pvalue_annotation=test_annotation) head(out)
data(test_annotation_score_data) out <- extractScores(pvalue_annotation=test_annotation) head(out)
This toy dataset has 100 randomly selected RefSeq genes and can be used to test conversion functionality in SMITE
data("genes_for_conversiontest")
data("genes_for_conversiontest")
A data frame with 100 observations on the following 1 variables.
a character vector of RefSeqGene IDs
A dataframe with genes in Refseq format for conversion testing.
data("genes_for_conversiontest") genes[,1]<-convertGeneIds(gene_IDs=genes[,1], ID_type="refseq", ID_convert_to="symbol")
data("genes_for_conversiontest") genes[,1]<-convertGeneIds(gene_IDs=genes[,1], ID_type="refseq", ID_convert_to="symbol")
A gene anntation BED file containing columns for RefSeq name and Gene Symbol
data("hg19_genes_bed")
data("hg19_genes_bed")
A data frame with 41633 observations on the following 6 variables.
a character vector for chromosome
an integer vector for start position
an integer vector for end position
an character vector for RefSeq gene names
an character vector for Gene Symbol names
an character vector for strand
A BED files taken from the table browser.
A dataframe in BED format (chromosome, start, end) with additional columns for gene name as Refseq and gene symbol and strand.
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004 Jan 1;32 (Database issue):D493-6.
http://genome.ucsc.edu/
data(hg19_genes_bed)
data(hg19_genes_bed)
This function can be used to extract a subset of the highest scoring genes for other downstream analysis.
highScores(pvalue_annotation, alpha = 0.05)
highScores(pvalue_annotation, alpha = 0.05)
pvalue_annotation |
An S4 object of class PvalueAnnotation for which scoring has already been performed |
alpha |
A numeric specifying a threshold at signficant genes can be determined. DEFAULT is alpha=0.05. |
This function randomly samples the scores with replacement 100 times and for within each random sample for each score it determines the proportion of scores at or greater than the score. The average of these proportions over the 100 samples will be the new p-value/scores. All scores falling below the threshold will be returned.
A named vector of scores.
N. Ari Wijetunga
scorePval plotCompareScores runSpinglass runBioNet
data(test_annotation_score_data) ## Note: commented out for example. See vignette for better example ## #out <- highScores(pvalue_annotation=test_annotation, alpha=0.01)
data(test_annotation_score_data) ## Note: commented out for example. See vignette for better example ## #out <- highScores(pvalue_annotation=test_annotation, alpha=0.01)
A toy dataset of H3k4me1 peaks from liver ChIP-seq through the encode project. The file is a BED file.
data(histone_h3k4me1)
data(histone_h3k4me1)
A data frame with 75448 observations on the following 3 variables.
a character vector specifying chromsome
an integer vector specifying start
an integer vector specifying end
This is a BED file that specifices the consensus locations of three H3K4me1 ChIP-seq experiments performed on normal adult liver.
A dataframe in BED format (chromosome, start, end).
GSM669972, GSM621654, GSM537706
Roadmap Epigenomics Lister R, et al. Nature. 2009 Nov 19;462(7271):315-22 ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
data(histone_h3k4me1) head(h3k4me1)
data(histone_h3k4me1) head(h3k4me1)
This function initializes a PvalueAnnotation using a gene BED file and optional BED files corresponding to interval datasets. This is a necessary first step in order to establish for each gene the gene promoter, body and associated intervals.
makePvalueAnnotation(data, other_data = NULL, other_tss_distance = 10000, promoter_upstream_distance = 1000, promoter_downstream_distance = 1000, strand_col = NULL, gene_name_col = NULL)
makePvalueAnnotation(data, other_data = NULL, other_tss_distance = 10000, promoter_upstream_distance = 1000, promoter_downstream_distance = 1000, strand_col = NULL, gene_name_col = NULL)
data |
A required gene annotation BED file like that obtained from the UCSC Table Browser. At a minimum BED files must have the first three columns as (chromosome, start, end). Additional required columns should correspond to the strand and gene name.The gene name needs to match the gene format desired for the interaction network. Duplicated gene names and associated gene annotations are removed. |
other_data |
A list of BED files corresponding to each additional interval file to be associated with genes. The function will use the other_tss_distance variable and the gene transcription start site (TSS) to find for each gene all intervals within [tss-other_tss_distance, tss+other_tss_distance]. |
other_tss_distance |
A vector specifying for each element of otherdata a distance from the gene TSS to consider that interval as related to a gene. If the length of other_tss_distance does not match the length of the otherdata list, then the first value is used for all datasets in the otherdata list. DEFAULTS to 10,000 base pairs. |
promoter_upstream_distance |
A numeric specifying how far upstream from the gene TSS is considered part of the gene promoter. DEFAULTS to 1,000 base pairs. |
promoter_downstream_distance |
A numeric specifying how far downstream from the gene TSS is considered part of the gene promoter. Gene bodies subtract the promoter_downstream region. DEFAULTS to 1,000 base pairs. |
strand_col |
A numeric specifying the column of the gene BED file (data) corresponding to the gene strand. If this is not provided, the function will attempt to determine the strand. |
gene_name_col |
A numeric specifying the column of the gene BED file (data) corresponding to the gene name. |
The required only input file is the gene annotation BED file that should have (as all BED files) the chromosome, start and end in columns 1,2 and 3, respectively. Also, there should be a column for gene name and gene strand. The user needs to determine distance from the gene transcription start site that will define the gene promoter. The gene body will then be calculated as the non-promoter overlapping sequence. If optional BED files are given as otherdata (e.g. transcription factor binding sites, histone modification peaks), then the user will also decide a distance from the gene TSS to associate each BED interval with a gene. For a particular BED file, each genes may have more than one interval that falls within the desired range around a TSS. Unique gene names are required and the function will automatically remove duplicated genes. We recommend deciding on an interaction network first and then loading a gene annotation BED file with the same names. This will likely necessitate allowing the function to pick one annotation of a gene, or pre- processing using some criteria (e.g. longest transcript).
An S4 object of class PvalueAnnotation containing slots for an annotation (GRangesList), an expression set, modifications (GRangesList), and a PvalueObject.
N. Ari Wijetunga
SMITE vignette
## Note: Commented out below. See vignette for more detailed usage information## ## Load genome bed file ## data(hg19_genes_bed) ## Create a PvalueAnnotation with defaults for promoter size## test_annotation <- makePvalueAnnotation(data=hg19_genes, gene_name_col=5)
## Note: Commented out below. See vignette for more detailed usage information## ## Load genome bed file ## data(hg19_genes_bed) ## Create a PvalueAnnotation with defaults for promoter size## test_annotation <- makePvalueAnnotation(data=hg19_genes, gene_name_col=5)
Having annotated modifications and expression data this function will assemble a PvalueObject within the slot "score_data" of a PvalueAnnotation. This is a necessary step before being able to run downstrem functions.
makePvalueObject(pvalue_annotation, effect_directions = NULL)
makePvalueObject(pvalue_annotation, effect_directions = NULL)
pvalue_annotation |
An S4 object of class PvalueAnnotation |
effect_directions |
A character vector with optional names specifying "increase" Modification is expected to increase as expression increase "decrease" Modification is expected to descrease as expression decreases "bidirectional" No direction is assumed between modification and direction |
The specified relationship between the modification and expression will be stored and then used when scoring.
An S4 object of class PvalueAnnotation with a slot for score_data filled it
N.Ari Wijetunga
makePvalueAnnotation
#NOTE: Commented out in example, please see vignette for more details## options(stringsAsFactors=FALSE) data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] #methylation[, 5] <- replace(methylation[, 5],methylation[, 5] == 0, #min(subset(methylation[, 5], methylation[, 5] != 0), na.rm=TRUE)) #data(curated_expressiondata) #data(hg19_genes_bed) #data(histone_h3k4me1) #test_annotation<-makePvalueAnnotation(data=hg19_genes, #other_data=list(h3k4me1=h3k4me1), gene_name_col=5,other_tss_distance=5000) #fill in expression data #test_annotation<-annotateExpression(test_annotation, expression_curated) #fill in methylation data #this step takes ~10 minutes #test_annotation<-annotateModification(test_annotation, methylation, #weight_by=c(promoter="distance",body="distance",h3k4me1="distance"), #verbose=TRUE, mod_corr=FALSE) #create a pvalue object that will count the effect of the h3k4me1 as #bidirectional #test_annotation<-makePvalueObject(pvalue_annotation=test_annotation, #effect_directions=c(methylation_promoter="decrease", #methylation_body="decrease", methylation_h3k4me1="bidirectional"))
#NOTE: Commented out in example, please see vignette for more details## options(stringsAsFactors=FALSE) data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] #methylation[, 5] <- replace(methylation[, 5],methylation[, 5] == 0, #min(subset(methylation[, 5], methylation[, 5] != 0), na.rm=TRUE)) #data(curated_expressiondata) #data(hg19_genes_bed) #data(histone_h3k4me1) #test_annotation<-makePvalueAnnotation(data=hg19_genes, #other_data=list(h3k4me1=h3k4me1), gene_name_col=5,other_tss_distance=5000) #fill in expression data #test_annotation<-annotateExpression(test_annotation, expression_curated) #fill in methylation data #this step takes ~10 minutes #test_annotation<-annotateModification(test_annotation, methylation, #weight_by=c(promoter="distance",body="distance",h3k4me1="distance"), #verbose=TRUE, mod_corr=FALSE) #create a pvalue object that will count the effect of the h3k4me1 as #bidirectional #test_annotation<-makePvalueObject(pvalue_annotation=test_annotation, #effect_directions=c(methylation_promoter="decrease", #methylation_body="decrease", methylation_h3k4me1="bidirectional"))
A toy dataset of raw DNA methylation from HELP-tagging. The file is a BED file with columns added for effect and p-value.
data(methylationdata)
data(methylationdata)
A data frame with 40000 observations on the following 5 variables.
a character vector specifying chromsome
an integer vector specifying start
an integer vector specifying end
a numeric vector with an effect direction (here it is average difference between two groups)
a numeric vector with a two sided t-test p-value
This is a small subset of a DNA methylation dataset is a randomized version of the Toxoplasma dataset used to benchmark. We could not include the larger version do to package size requirements but larger versions are available. See Github source below. It still has NAs and p-values=0.
A dataframe in BED format (chromosome, start, end) with additional columns for and effect direction and p-value derived from T-tests of HELP-tagging DNA methylation data.
Manuscript in preparation. Please see https://github.com/GreallyLab/SMITE for more details.
data(methylationdata) any(is.na(methylation[, 4])) any(methylation[, 4] == 0)
data(methylationdata) any(is.na(methylation[, 4])) any(methylation[, 4] == 0)
This function is a used to rescale compenent scores when distributions have been altered. There are two methods available.
normalizePval(pvalue_annotation, trans, ref = "expression_pvalue", method = "rescale")
normalizePval(pvalue_annotation, trans, ref = "expression_pvalue", method = "rescale")
pvalue_annotation |
An S4 object of class p-value annotation |
trans |
A vector specifying a specific Box-cox power transformation to use. If not specified, the optimal transformation powers will be estimated. |
ref |
A string that will be grepped from the names of the loaded expression data or modification/context pairing. All scores will be rescaled to match this reference's range. The DEFAULT is expression. |
method |
A string of either "Rescale" DEFAULT "rescale" Performs a logit transform and rescales all probabilites to the reference's logit transformed scale, then back-transforms "Box-Cox" "box-cox" "boxcox" "Boxcox" For each probility vector does a logit transform and then iterates between 0.5 and 0.95 by 0.05 to determine the most similar transformation to the logit transformed referece by a Wilcoxon- test |
Normalization may not be necessary but should improve some p-values from driving the majority of downstream scores and modules solely because of the scale of their p-values. All transformations are monotonic and are controlled for by use of randomization prodecure downstream. procedures downstream should
An S4 object of class PvalueAnnotation with normalized p-values withing the pval_data slot of the PvalueObject "score_data" slot
Plots densities of p-values before and after transform
N. Ari Wijetunga
makePvalueObject scorePval plotDensityPval
data(test_annotation_score_data) #test_annotation<-normalizePval(pvalue_annotation=test_annotation)
data(test_annotation_score_data) #test_annotation<-normalizePval(pvalue_annotation=test_annotation)
This function creates a hexbin of the log transformed p-value/score for any two expression or modification-context pairing within a PvalueObject inside of a PvalueAnnotation
plotCompareScores(pvalue_annotation, x_name, y_name, ...)
plotCompareScores(pvalue_annotation, x_name, y_name, ...)
pvalue_annotation |
An S4 object of class PvalueAnnotation for which a PvalueObject has already been created |
x_name |
A string to be grepped from the columns within the slot "pval_data" that is within the PvalueAnnotation slot "score_data." This column will be plotted on the x-axis with a direction specified from the corresponding effect column. |
y_name |
A string to be grepped from the columns within the slot "pval_data" that is within the PvalueAnnotation slot "score_data." This column will be plotted on the y-axis with a direction specified from the corresponding effect column. |
... |
Other plotting parameters |
This plotting function creates a hexbin plot of any two p-value vectors stored within a p-value object. It can be used to define relationships between direction and significance in different genomic contexts after having combined p-values.
A hexbin plot
N. Ari Wijetunga
makePvalueObject plotDensityPval
data(test_annotation_score_data) plotCompareScores(pvalue_annotation=test_annotation, x_name="expression", y_name="methylation_promoter")
data(test_annotation_score_data) plotCompareScores(pvalue_annotation=test_annotation, x_name="expression", y_name="methylation_promoter")
This function in called by the normalizePval function, but can also be called by the user to visualize the relative densities of combined p-values (scores).
plotDensityPval(pvalue_annotation, ref = "expression_pvalue", ...)
plotDensityPval(pvalue_annotation, ref = "expression_pvalue", ...)
pvalue_annotation |
An S4 object of class PvalueAnnotation. |
ref |
A character specifying the name of the reference category. DEFAULT is "expression_pvalue" |
... |
Additional plotting arguments |
Plots a multidensity plot.
N. Ari Wijetunga
## Load test annotation with only score data ## data(test_annotation_score_data) plotDensityPval(pvalue_annotation=test_annotation)
## Load test annotation with only score data ## data(test_annotation_score_data) plotDensityPval(pvalue_annotation=test_annotation)
This function is an adapted version of renderModule available through Epimods. We have added optional functionality including plotting the actual raw data onto the node edges, adding goseq annotation to the plot, legends and plotting modes.
plotModule(pvalue_annotation, p_thresh = 0.05, which_network = 1, goseq = FALSE, layout = "fr", legend = TRUE, namestyle = "symbol", suppress_details = FALSE, meth_hi_col = "blue", meth_low_col = "yellow1", meth_mid_col = "gray90", exp_hi_col = "red1", exp_low_col = "chartreuse1", exp_mid_col = "gray90", label_scale = TRUE, label_shadow = FALSE, compare_plot=FALSE, pdf_out=NULL)
plotModule(pvalue_annotation, p_thresh = 0.05, which_network = 1, goseq = FALSE, layout = "fr", legend = TRUE, namestyle = "symbol", suppress_details = FALSE, meth_hi_col = "blue", meth_low_col = "yellow1", meth_mid_col = "gray90", exp_hi_col = "red1", exp_low_col = "chartreuse1", exp_mid_col = "gray90", label_scale = TRUE, label_shadow = FALSE, compare_plot=FALSE, pdf_out=NULL)
pvalue_annotation |
An S4 object of class PvalueAnnotation |
p_thresh |
A numeric specifying a threshold for plotting raw data on edges of nodes. DEFAULT is alpha=0.05. Items above this threshold will be classified as "mid" instead of "high" or "low" " |
which_network |
A numeric specifying which network to plot. DEFAULTS to 1, and will not plot another network until specified explicitly. |
goseq |
A logical indicating whether to plot goseq results for the module on the right hand side of the plot. |
layout |
A character string as either "fr" (DEFAULT) for fruchterman.reingold or "circle" for a circular plot. |
legend |
A logical (TRUE(DEFAULT)/FALSE) specifying whether a legend should be drawn. |
namestyle |
A character string as either "symbol" (DEFAULT) for gene symbols or "refseq" for RefSeq genes. If modules were performed on RefSeq genes, then the function will plot with gene symbols so that it is more useful. |
suppress_details |
A logical (TRUE(DEFAULT)/FALSE) indicating whether border raw data information should be plotted. |
meth_hi_col |
A color to be associated with signficant modification data with positive effects |
meth_low_col |
A color to be associated with signficant modification data with negative effects |
meth_mid_col |
A color to be associated with non-signficant modification data |
exp_hi_col |
A color to be associated with signficant expression data with postive effects |
exp_low_col |
A color to be associated with signficant expression data with negative effects |
exp_mid_col |
A color to be associated with non-signficant expression data |
compare_plot |
A logical (TRUE/FALSE(DEFAULT)) indicating whether two plots should be drawn side by side, one with raw data and one without |
label_scale |
A logical (TRUE(DEFAULT)/FALSE) indicating whether whether the node label should be scaled with the node score |
label_shadow |
A logical (TRUE/FALSE(DEFAULT)) indicating whether whether the node label should have a white text shadow for easier label reading |
pdf_out |
A string indicating a location to which the function should output a pdf. If NULL (DEFAULT) then no pdf is made. |
A plot of the module
N. Ari Wijetunga
extractModules
data(test_annotation_score_data) #plotModule(pvalue_annotation=test_annotation, which_network=2) #plotModule(pvalue_annotation=test_annotation, which_network=2, #suppressDetails=TRUE)
data(test_annotation_score_data) #plotModule(pvalue_annotation=test_annotation, which_network=2) #plotModule(pvalue_annotation=test_annotation, which_network=2, #suppressDetails=TRUE)
This is an Igraph network that was created using the REACTOME protein-protein interaction database.
data("Reactome.Symbol.Igraph")
data("Reactome.Symbol.Igraph")
An igraph object with 5770 nodes and 114288 edges
gene names as gene symbols
paired genes that interact
The provided igraph file was created using the igraph package and the interaction file provided from the reference.
An Igraph network based off of REACTOME interactions
REACTOME
http://www.reactome.org/pages/download-data/
data(Reactome.Symbol.Igraph) head(igraph::V(REACTOME))
data(Reactome.Symbol.Igraph) head(igraph::V(REACTOME))
After using the annotateModification function to load a modification into a PValue annotation, you may wish to remove a modification or reannotate one, which requires removing it first.
removeModification(pvalue_annotation, mod_type = "methylation")
removeModification(pvalue_annotation, mod_type = "methylation")
pvalue_annotation |
An S4 object of class PvalueAnnotation |
mod_type |
A character string that identifies a type of modification within a PvalueAnnotation. |
An S4 object of class PvalueAnnotation
N. Ari Wijetunga
annotateModification extractModification extractModSummary
##NOTE: Commented out in example. ## ## Please see vignette for more detailed usage information ## ## Load genome bed file ## #data(hg19_genes_bed) ## Load curated DNA methylation bed file ## data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] methylation[, 5] <- replace(methylation[, 5], methylation[, 5] == 0, min(subset(methylation[, 5], methylation[, 5] != 0), na.rm=TRUE)) #meth1<-methylation ## make second curated test methylation bed file ## #meth2<-methylation ## Create a PvalueAnnotation with defaults for promoter size## #test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation into PvalueAnnotation ## #test_annotation<-annotateModification(annotation=test_annotation, #mod_data=meth1, weight_by=c(promoter="distance",body="distance"),verbose=TRUE, #mod_corr=TRUE,mod_type="methylation") ## Extract GRanges with modification data ## #extractModification(test_annotation) ## Load second dataset bed file ## #test_annotation<-annotateModification(pvalue_annotation=test_annotation, #mod_data=meth2, weight_by=c(promoter="distance",body="distance"), #verbose=TRUE, mod_corr=TRUE,mod_type="hydroxy") ## Extract GRanges with both modification dataset loaded ## #head(extractModification(test_annotation,"hydroxy")) ## Unload DNA hydroxymethylation form PvalueAnnotation ## #test_annotation<-removeModification(pvalue_annotation=test_annotation, #mod_type="hydroxy") ## Extract GRanges to see only one modification dataset loaded ## #head(extractModification(pvalue_annotation=test_annotation))
##NOTE: Commented out in example. ## ## Please see vignette for more detailed usage information ## ## Load genome bed file ## #data(hg19_genes_bed) ## Load curated DNA methylation bed file ## data(methylationdata) methylation <- methylation[-which(is.na(methylation[, 5])), ] methylation[, 5] <- replace(methylation[, 5], methylation[, 5] == 0, min(subset(methylation[, 5], methylation[, 5] != 0), na.rm=TRUE)) #meth1<-methylation ## make second curated test methylation bed file ## #meth2<-methylation ## Create a PvalueAnnotation with defaults for promoter size## #test_annotation<-makePvalueAnnotation(data=hg19_genes, gene_name_col=5) ## Load DNA methylation into PvalueAnnotation ## #test_annotation<-annotateModification(annotation=test_annotation, #mod_data=meth1, weight_by=c(promoter="distance",body="distance"),verbose=TRUE, #mod_corr=TRUE,mod_type="methylation") ## Extract GRanges with modification data ## #extractModification(test_annotation) ## Load second dataset bed file ## #test_annotation<-annotateModification(pvalue_annotation=test_annotation, #mod_data=meth2, weight_by=c(promoter="distance",body="distance"), #verbose=TRUE, mod_corr=TRUE,mod_type="hydroxy") ## Extract GRanges with both modification dataset loaded ## #head(extractModification(test_annotation,"hydroxy")) ## Unload DNA hydroxymethylation form PvalueAnnotation ## #test_annotation<-removeModification(pvalue_annotation=test_annotation, #mod_type="hydroxy") ## Extract GRanges to see only one modification dataset loaded ## #head(extractModification(pvalue_annotation=test_annotation))
With BioNet, a researcher can find a single interconnected gene module using the highest scoring genes generated in a PvalueAnnotation. This function will load the module into the PvalueAnnotation for visualization and downstream analysis.
runBioNet(pvalue_annotation, network, alpha = 0.05)
runBioNet(pvalue_annotation, network, alpha = 0.05)
pvalue_annotation |
An S4 object of class PvalueAnnotation that has already had scores generated. |
network |
An interaction network of class graphNEL or igraph. |
alpha |
A numeric specfiying a cutoff for high scoring genes to be return with the highScores function. |
The input of p-values to BioNet discussed in the BioNet vignette involves first modeling p-values as a Beta-uniform mixture model to obtain the actual corresponding probability function values. Since our scoring method produces p-values/scores that are uniform in distribution, we input them directly into the BioNet algorithm. For more details on BioNet see the reference or runFastHeinz in the BioNet package.
A PvalueAnnotation with a loaded module.
This is a wrapper function to run BioNet. The actual BioNet code was created by Beisser et al.
N. Ari Wijetunga
Beisser et al. BioNet: an R-Package for the functional analysis of biological networks. Bioinformatics. 2010 Apr 15;26(8):1129-30. doi: 10.1093/bioinformatics/btq089. Epub 2010 Feb 25.
plotModule extractModule runGOseq
## load test data ## data(test_annotation_score_data) ## NOTE: commented out for example. See vignette for better explanation ## #load reactome network with gene symbols ## #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ## run BioNet ## #test_annotation<-runBioNet(pvalue_annotation=test_annotation, #network = REACTOME) ## view module ## #extractModules(pvalue_annotation=test_annotation, 1) ## plot module ## #plotModule(pvalue_annotation=test_annotation, which.network=1)
## load test data ## data(test_annotation_score_data) ## NOTE: commented out for example. See vignette for better explanation ## #load reactome network with gene symbols ## #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ## run BioNet ## #test_annotation<-runBioNet(pvalue_annotation=test_annotation, #network = REACTOME) ## view module ## #extractModules(pvalue_annotation=test_annotation, 1) ## plot module ## #plotModule(pvalue_annotation=test_annotation, which.network=1)
This function allows pathway annotation of identified modules.
runGOseq(pvalue_annotation, p_thresh = 0.05, supply_cov=TRUE, coverage=NULL, type = "reactome")
runGOseq(pvalue_annotation, p_thresh = 0.05, supply_cov=TRUE, coverage=NULL, type = "reactome")
pvalue_annotation |
An S4 object of class PvalueAnnotation, for which module-finding has already been performed |
p_thresh |
A numeric specifying a threshold for signficance of FDR (q-values). DEFAULT is alpha=0.05 |
supply_cov |
A logical specifying whether or not the user wants to supply their own coverage (TRUE), or would like SMITE to calculate the coverage based on methylation data already inputted. DEFAULT is TRUE. |
coverage |
A data.frame that is a bed file (chr start stop) folowed by a gene name and a numeric specifying the bias data (e.g. gene length or number of probes related to gene). DEFAULT is null |
type |
Either "kegg" to run KEGG analysis or "reactome" to run a REACTOME analysis |
Goseq analysis is useful since it allows you to assess term/pathway enrichment in a collection of genes, while adjusting for bias data. Potential bias can be from aspects like gene length or probe density that influence the likelihood of finding a particular gene. For more information please see the goseq reference.
The function will compare all of the genes within a module to known pathways and terms to find the terms that are most enriched within a module. In this way, this tool allows a reasearch to find a functional importance of a module.
We currently offer KEGG and REACTOME analysis, although additional pathway tools may be added in the near future.
A PvalueAnnotation with goseq annotated modules.
This is a wrapper function written by N. Ari Wijetunga for the package SMITE.
Matthew D. Young myoung at wehi.edu.au
Young MD, Wakefield MJ, Smyth GK and Oshlack A (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology, 11, pp. R14.
searchGOseq extractGOseq runSpinglass runBioNet extractModules plotModule
##load sample data with only PvalueObject filled in## data(test_annotation_score_data) ## NOTE commented out in example. Please see Vignette for better example ## #test_annotation<-runGOseq(pvalue_annotation=test_annotation, #coverage=read.table( #system.file("extdata", "hg19_symbol_hpaii.sites.inbodyand2kbupstream.bed.gz", #package="SMITE"),stringsAsFactors=FALSE),type="kegg") ## search for a term ## searchGOseq(test_annotation,"Cell cycle") ## show goseq analysis for module 1 ## #extractGOseq(test_annotation,1)
##load sample data with only PvalueObject filled in## data(test_annotation_score_data) ## NOTE commented out in example. Please see Vignette for better example ## #test_annotation<-runGOseq(pvalue_annotation=test_annotation, #coverage=read.table( #system.file("extdata", "hg19_symbol_hpaii.sites.inbodyand2kbupstream.bed.gz", #package="SMITE"),stringsAsFactors=FALSE),type="kegg") ## search for a term ## searchGOseq(test_annotation,"Cell cycle") ## show goseq analysis for module 1 ## #extractGOseq(test_annotation,1)
This function is a function to prepare the data for calling the Spinglass network algorithm.
runSpinglass(pvalue_annotation, network, random_alpha = 0.05, gam = 0.5, node_alpha = 0.05, maxsize = 500, minsize = 8, num_iterations = 1000, simplify = TRUE)
runSpinglass(pvalue_annotation, network, random_alpha = 0.05, gam = 0.5, node_alpha = 0.05, maxsize = 500, minsize = 8, num_iterations = 1000, simplify = TRUE)
pvalue_annotation |
An S4 object of class PvalueAnnotation |
network |
An graph object of class graphNEL or igraph |
random_alpha |
A numeric specifying a threshold with with to determine module signficance after randomization |
gam |
A parameter used by the Spinglass algorithm |
node_alpha |
The proportion of nodes to be used as seeds for the community detection |
maxsize |
The maximum module size |
minsize |
The minimum module size |
num_iterations |
The number of randomizations that will be computed to determine whether the module is significant by chance |
simplify |
A logical (TRUE(DEFAULT)/FALSE) that specifies whether network should be simplified by removing self loops and repeated edges |
In the provided Epimods reference, West et al outlined the advantages of using the spin-glass algorithm in the detection of modules. Please consult the reference for more detailed information on the spin-glass algorithm implemented in the package igraph.
Like Epimods, this function employs the spin-glass algorithm implemented in igraph and uses random permutations to assess the "modularity," the number and strength of connected nodes, of a module. However, SMITE scores are interpreted as Chi-square distributed statistics whenever possible, rather than the weighted-T-statistic in Epimods.
An S4 object of class PvalueAnnotation with modules loaded
This function was adapted from a function in the Epimods package that employs the spin-glass algorithm and uses random permutations to assess the "modularity" of a module . The original function was created by West et al.
N. Ari Wijetunga
James West, Stephan Beck, Xiangdong Wang & Andrew E. Teschendorff An integrative network algorithm identifies age-associated differential methylation interactome hotspots targeting stem-cell differentiation pathway. Scientific Reports 3, Article number: 1630 (2013)
https://code.google.com/p/epimods/
FEM runBioNet extractModules plotModule
data(test_annotation_score_data) #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ## NOTE: commented out for example. See vignette for better explanation ## #test_annotation <- runSpinglass(pvalue_annotation=test_annotation, #network=REACTOME, maxsize=50, num_iterations=10) plotModule(test_annotation, which_network=6, layout="fr")
data(test_annotation_score_data) #load(system.file("data","Reactome.Symbol.Igraph.rda", package="SMITE")) ## NOTE: commented out for example. See vignette for better explanation ## #test_annotation <- runSpinglass(pvalue_annotation=test_annotation, #network=REACTOME, maxsize=50, num_iterations=10) plotModule(test_annotation, which_network=6, layout="fr")
This function uses an a priori weighting scheme to combine scores for a given gene.
scorePval(pvalue_annotation, weights)
scorePval(pvalue_annotation, weights)
pvalue_annotation |
An S4 object of class PvalueAnnotation, for which makePvalueObject has already been run. |
weights |
A numeric vector of the relative importance of expression, modifications, and genomic contexts toward the final score. Names should be provided that match the "modfication_genomicfeature" format, except for expression. While the scores do not have to add up to 1, it is good practice to impose this restriction in order to track the relative importance. |
Because each weighting scheme generates scores from a distribution that will change depending on the analysis inputs, the function will randomly sample the final scores and compare each derived score to this simulated distribution.
If no names are given, then the function will assume the weights are in the order that it finds a particular "modification_genomicfeature" and it will print the weighting scheme so that you can verify it is correct. The total number of weights must match the total number of modifications*genomicfeatures+1 for expression.
After calculating a combined score (using a Stouffer's weighted statistic), a new p-value is derived using a non-parametric sampling approach.
An S4 object of class PvalueAnnotation.
N. Ari Wijetunga
options(stringsAsFactors=FALSE) data(test_annotation_score_data) ## NOTE: commented out for example. See vignette for better explanation ## #test_annotation<-scorePval(pvalue_annotation=test_annotation, #weights=c(methylation_promoter=.3,methylation_body=.1,expression=.3, #methylation_h3k4me1=.3))
options(stringsAsFactors=FALSE) data(test_annotation_score_data) ## NOTE: commented out for example. See vignette for better explanation ## #test_annotation<-scorePval(pvalue_annotation=test_annotation, #weights=c(methylation_promoter=.3,methylation_body=.1,expression=.3, #methylation_h3k4me1=.3))
This function performs a weighted Stouffer's mehtod of combining p-values.
stoufferTest(pvalues, weights)
stoufferTest(pvalues, weights)
pvalues |
A vector of p-values. |
weights |
Optional weights used when combining probabilites. If no weights are given then the p-values are equally weighted. |
For each p-value the inverse standard normal CDF is applied and Z scores are derived. Z-scores are then summed and a new Z score is transformed back to a p-value.
A numeric p-value that represents the standard normal CDF of the combined Z statistic.
This function was adapted from the function written on the Fisher's Method wikipedia page.
https://en.wikipedia.org/wiki/Fisher's_method
Stouffer S, DeVinney L, Suchmen E. The American soldier: Adjustment during army life. Vol. 1. Princeton University Press; Princeton, US: 1949.
## Generate test weights ## weights<-runif(10, 1,100) weights<-sort(weights) ## Generate test p-values## pvals<-runif(10,0,1) ## run stoufferTest ## stoufferTest(pvalues = pvals, weights=1/weights)
## Generate test weights ## weights<-runif(10, 1,100) weights<-sort(weights) ## Generate test p-values## pvals<-runif(10,0,1) ## run stoufferTest ## stoufferTest(pvalues = pvals, weights=1/weights)
This Pvalue annotation has only scoring data filled in to use in late pipeline "SMITE" functions. It can be used to skip the loading data phase of analysis and test latter functionality.
data("test_annotation_score_data")
data("test_annotation_score_data")
A PvalueAnnotation with the following slots
a PvalueObject with slots corresponding to pval_data, effect_data, genes, signs_index, scores, trans, scoring_vector, and module_otuput
This is a PvalueAnnotation which has had all of the pre-scoring data removed so that it is only usefule for using functions beginning with SMITE and SMITE plotting functions.
A PvalueAnnotation with the score_data slot containing toy scores
data(test_annotation_score_data) plotDensityPval(test_annotation) head(extractScores(test_annotation))
data(test_annotation_score_data) plotDensityPval(test_annotation) head(extractScores(test_annotation))