Title: | Differential Expression Meta-Analysis |
---|---|
Description: | performing all the steps of gene expression meta-analysis considering the possible existence of missing genes. It provides the necessary functions to be able to perform the different methods of gene expression meta-analysis. In addition, it contains functions to apply quality controls, download GEO datasets and show graphical representations of the results. |
Authors: | Juan Antonio Villatoro-García [aut, cre], Pedro Carmona-Sáez [aut] |
Maintainer: | Juan Antonio Villatoro-García <[email protected]> |
License: | GPL-2 |
Version: | 1.15.0 |
Built: | 2024-10-30 05:24:35 UTC |
Source: | https://github.com/bioc/DExMA |
Set all datasets in the same ID (Official Gene Symbol, Entrez or Ensembl)
allSameID(objectMA, finalID = "GeneSymbol", organism="Homo sapiens")
allSameID(objectMA, finalID = "GeneSymbol", organism="Homo sapiens")
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the different samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
finalID |
A character that indicates the final ID all the different studies will have. To know the available ids, you can write avaliableIDs. |
organism |
A character that indicates the organism of the data. To know the avaliable organisms write avaliableOrganism |
The same list with all the datasets in the same selected gene ID.
Juan Antonio Villatoro Garcia, [email protected]
data(DExMAExampleData) sameData <- allSameID(objectMA = maObjectDif, finalID = "GeneSymbol", organism = "Homo sapiens") sameData
data(DExMAExampleData) sameData <- allSameID(objectMA = maObjectDif, finalID = "GeneSymbol", organism = "Homo sapiens") sameData
It eliminates the effects of batch or bias of the covariates
batchRemove(expressionMatrix, pheno, formula, mainCov = NULL, nameGroup, ...)
batchRemove(expressionMatrix, pheno, formula, mainCov = NULL, nameGroup, ...)
expressionMatrix |
A matrix or data frame with genes in rows and samples in columns. An ExpressionSet object can be used too |
pheno |
A dataframe with samples in rows and covariates in colums. |
formula |
Formula of the covariates that are wanted to be corrected |
mainCov |
Name of the main covariate to be corrected |
nameGroup |
Name of the column of the Phenodata object in which the reference groups (cases and controls) are |
... |
other arguments are passed to lmFit fucntion of limma package |
The Expression Matrix with the bias or batch effect corrected. Moreover a plot of the visualization of the association between principal components and covariates is shown.
Juan Antonio Villatoro Garcia, [email protected]
Martin Lauss (2019). swamp: Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations. R package version 1.5.1. https://CRAN.R-project.org/package=swamp
data(DExMAExampleData) batchRemove(listMatrixEX$Study2, listPhenodatas$Study2, formula=~gender+race, nameGroup="condition")
data(DExMAExampleData) batchRemove(listMatrixEX$Study2, listPhenodatas$Study2, formula=~gender+race, nameGroup="condition")
This function uses the Hedges'g estimator to calulate the different Effects size and their variances for each genes and for each dataset.
calculateES(objectMA, missAllow = 0.3)
calculateES(objectMA, missAllow = 0.3)
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the diffenrent samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
missAllow |
a number that indicates the maximun proportion of missing values allowed in a sample. If the sample has more proportion of missing values the sample will be eliminated. In the other case the missing values will be imputed using the K-NN algorithm. |
A list formed by three elements:
First element (ES) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe represents the Effect Size.
Second element (Var) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe represents the variance of the Effect size.
Third element (logFC) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe represents the log Fold Changes.
Juan Antonio Villatoro Garcia, [email protected]
createObjectMA
, metaAnalysisDE
data(DExMAExampleData) resultsEffects <- calculateES(objectMA = maObject, missAllow = 0.3) resultsEffects
data(DExMAExampleData) resultsEffects <- calculateES(objectMA = maObject, missAllow = 0.3) resultsEffects
It allows the creation of an object to perform meta-analysis.
createObjectMA( listEX, listPheno = NULL, namePheno = c(rep(1, length(listEX))), expGroups = c(rep(1, length(listEX))), refGroups = c(rep(2, length(listEX))) )
createObjectMA( listEX, listPheno = NULL, namePheno = c(rep(1, length(listEX))), expGroups = c(rep(1, length(listEX))), refGroups = c(rep(2, length(listEX))) )
listEX |
A list of dataframes or matrix (genes in rows and sample in columns). A list of ExpressionSets can be used too |
listPheno |
A list of phenodatas (dataframes or matrix). If the object listEX is a list of ExpressionSets this element can be null. |
namePheno |
A list or vector of the different colunm names or positions from the phenodatas where the experimental and reference groups are identified. Each element of namePheno correspont to its equivalent element in the listPheno (default a vector of 1, all the first columns of each elements of listPheno are selected). |
expGroups |
A list of vectors or a vector containing the names or the positions with which we identify the elements of the experiment groups (cases) of the namePheno element (default a vector of 1, all the first groups are selected) |
refGroups |
A list of vectors or a vector containing the names or the positions with which we identify the elements of the reference groups (control) of the namePheno elements (default a vector of 1, all the first groups are selected) |
The object needed to perform meta-analysis. This object is list of nested lists. Each list contains two elements:
The first element is the expression matrix (genes in rows and sample in columns)
The second element is a vector of zeros and ones that represents the state of the diffenrent samples of the expression matrix. 0 represents reference group (controls) and 1 represents experimental group (cases).
Juan Antonio Villatoro Garcia, [email protected]
data(DExMAExampleData) phenoGroups = c("condition", "condition", "state", "state") phenoCases = list(Study1 = "Diseased", Study2 = c("Diseased", "ill"), Study3 = "Diseased", Study4 = "ill") phenoControls = list(Study1 = "Healthy", Study2 = c("Healthy", "control"), Study3 = "Healthy", Study4 = "control") newObjectMA <- createObjectMA(listEX=listMatrixEX, listPheno = listPhenodatas, namePheno=phenoGroups, expGroups=phenoCases, refGroups = phenoControls) newObjectMA
data(DExMAExampleData) phenoGroups = c("condition", "condition", "state", "state") phenoCases = list(Study1 = "Diseased", Study2 = c("Diseased", "ill"), Study3 = "Diseased", Study4 = "ill") phenoControls = list(Study1 = "Healthy", Study2 = c("Healthy", "control"), Study3 = "Healthy", Study4 = "control") newObjectMA <- createObjectMA(listEX=listMatrixEX, listPheno = listPhenodatas, namePheno=phenoGroups, expGroups=phenoCases, refGroups = phenoControls) newObjectMA
Auxiliary function to check if data are log transfromed and transformed if it are not log-transformed
dataLog(objectMA)
dataLog(objectMA)
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the different samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA should be used. |
The same object with log-transformed expression matrix
Juan Antonio Villatoro Garcia, [email protected]
data(DExMAExampleData) dataLog(maObject)
data(DExMAExampleData) dataLog(maObject)
Download different ExpressionSets objects from the GEO database
downloadGEOData(GEOobject, directory = getwd())
downloadGEOData(GEOobject, directory = getwd())
GEOobject |
a vector of character where each element represents a GEO object for downloading. |
directory |
The directory where the different downloaded GSE Series Matrix files from GEO will be stored. By default they are downloaded to the working directory |
This function internally uses getGEO function of GEOquery package. However, downloadGEO allows you to download multiple files at the same time.
A list of the different ExpressionSets
Juan Antonio Villatoro Garcia, [email protected]
Davis, S. and Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 2007, 14, 1846-1847
## Not run: GEOobjects<- c("GSE4588", "GSE10325") dataGEO<-downloadGEOData(GEOobjects) dataGEO ## End(Not run)
## Not run: GEOobjects<- c("GSE4588", "GSE10325") dataGEO<-downloadGEOData(GEOobjects) dataGEO ## End(Not run)
It allows the creation of a element of the object needed to perform meta-analysis
elementObjectMA( expressionMatrix, pheno = NULL, groupPheno, expGroup = 1, refGroup = 2 )
elementObjectMA( expressionMatrix, pheno = NULL, groupPheno, expGroup = 1, refGroup = 2 )
expressionMatrix |
A dataframe or matrix that contanining genes in rows and samples if columns. An ExpressionSet object can be used too. |
pheno |
A data frame or a matrix containing samples in rows and covariates in columns. If NULL (default), pheno is extracted from the ExpressionSet object |
groupPheno |
The column name or position from pheno where experimental group (cases) and reference group (control) are identified |
expGroup |
The group name or position from groupPheno variable used as experimental group (cases). By default the first group (character) is taken |
refGroup |
The group name or position from groupPheno variable used as reference group (control). By default the second group (character) is taken |
An element that can be included in meta-analysis object.
Juan Antonio Villatoro Garcia, [email protected]
data(DExMAExampleData) ExpressionSetStudy5 newElem <-elementObjectMA(expressionMatrix = ExpressionSetStudy5, groupPheno = "condition", expGroup = c("Diseased", "ill"), refGroup = c("Healthy", "control"))
data(DExMAExampleData) ExpressionSetStudy5 newElem <-elementObjectMA(expressionMatrix = ExpressionSetStudy5, groupPheno = "condition", expGroup = c("Diseased", "ill"), refGroup = c("Healthy", "control"))
Shows a QQ-plot of the Cochran's test and the quantiles of I^2 statistic values to mesuare heterogeneity
heterogeneityTest(objectMA, probs = c(0, 0.25, 0.5, 0.75))
heterogeneityTest(objectMA, probs = c(0, 0.25, 0.5, 0.75))
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the different samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
probs |
Numeric vector of probabilities with values between 0 and 1. It indicates the I^2 quantiles that will be returned |
If in the QQ-plot of the Cochran’s test most of the values are close to the central line (most of the Cochran’s test values are close to the expected distribution), it can be said that there is homogeneity. In the case that these values deviate greatly from the expected distribution, it must be assumed that there is heterogeneity. I^2 measures the percentage of variation across studies due to heterogeneity. To assume homogeneity in the gene expression meta-analysis, almost all I^2 values (quantiles) must be 0 or at least less than 0.25.
Quantiles of the I^2 values
Juan Antonio Villatoro Garcia, [email protected]
Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.
Higgins JPT, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60.
data(DExMAExampleData) heterogeneityTest(maObject)
data(DExMAExampleData) heterogeneityTest(maObject)
It allows to see how the different significant genes are expressed in the different samples. Missing genes appear in gray
makeHeatmap( objectMA, resMA, scaling=c("zscor","rscale","swr","none"), regulation=c("all", "up","down"), breaks=c(-2,2), fdrSig = 0.05, logFCSig = 1.5, numSig = "all", color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100), na_col = "darkgrey", legend = TRUE, cluster_cols = FALSE, cluster_rows = FALSE, show_rownames = TRUE, show_colnames = FALSE)
makeHeatmap( objectMA, resMA, scaling=c("zscor","rscale","swr","none"), regulation=c("all", "up","down"), breaks=c(-2,2), fdrSig = 0.05, logFCSig = 1.5, numSig = "all", color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100), na_col = "darkgrey", legend = TRUE, cluster_cols = FALSE, cluster_rows = FALSE, show_rownames = TRUE, show_colnames = FALSE)
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the different samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
resMA |
Output generated by the differents functions that performs meta-analysis (metaES, metaPvalue, metaRank or metaAnalysisDE) |
scaling |
Character variable to choose between different scaling approaches. See "Details" for more information. |
regulation |
Character variable that indicates whether we want the heatmap to show all significant genes ("all"), only the up-regulated genes ("up") or only the down-regulated genes("down") |
breaks |
Numeric vector of length 2 that contains the extreme values (minimum and maximum) of the range of values in which the heatmap color scale will be distributed. Default a vector By default a vector of -2 and 2 as extreme values. |
fdrSig |
Adjusted p-value from which a gene is considered significant. Default 0.05 |
logFCSig |
In absolute value. Log Fold Change threshold from which genes are considered in the heatmap. |
numSig |
The number of most significant genes to be represented. If numSig = "all", all significant genes that meet the selected parameters will be represented. |
color |
Vector of colors used in heatmap |
na_col |
Color of the NA cell in the heatmap |
legend |
Logical to determine if legend should be drawn or not |
cluster_cols |
boolean values determining if columns should be clustered. |
cluster_rows |
boolean values determining if rows should be clustered. |
show_rownames |
boolean specifying if row names are be shown. |
show_colnames |
boolean specifying if column names are be shown. |
Scaling approaches that can be used are:
"rscale": it applies rescale function of scales package. Values will be between -1 and 1)
"zscor": It calculates a z-score value for each gene, that is, the mean gene expression from each gene is subtracted from each gene expression value and then it is divided by the standard deviation
"swr": it applys scaling relative to reference dataset approach
"none": any scaling approach it is applied.
The matrix represented in the heatmap
Juan Antonio Villatoro Garcia, [email protected]
Hadley Wickham and Dana Seidel (2020). scales: Scale Functions for Visualization. R package version 1.1.1. https://CRAN.R-project.org/package=scales
Lazar, C, Meganck, S, Taminau, J, and et al. 2013. “Batch Effect Removal Methods for Microarray Gene Expression Data Integration: A Survey,” 469–90.
Raivo Kolde 2019. pheatmap: Pretty Heatmaps. R package version 1.0.12. https://CRAN.R-project.org/package=pheatmap
createObjectMA
, metaAnalysisDE
data(DExMAExampleData) resultsMA <- metaAnalysisDE(maObject, typeMethod="REM") makeHeatmap(objectMA=maObject, resMA=resultsMA, scaling = "zscor", regulation = "all",breaks=c(-2,2), fdrSig = 0.05, logFCSig = 1.5, numSig=40)
data(DExMAExampleData) resultsMA <- metaAnalysisDE(maObject, typeMethod="REM") makeHeatmap(objectMA=maObject, resMA=resultsMA, scaling = "zscor", regulation = "all",breaks=c(-2,2), fdrSig = 0.05, logFCSig = 1.5, numSig=40)
It performs meta-analysis using seven different methods.
metaAnalysisDE( objectMA = NULL, effectS = NULL, pvalues = NULL, weight = NULL, typeMethod = c("FEM", "REM", "maxP", "minP", "Fisher", "Stouffer", "ACAT"), missAllow = 0.3, proportionData = 0.5 )
metaAnalysisDE( objectMA = NULL, effectS = NULL, pvalues = NULL, weight = NULL, typeMethod = c("FEM", "REM", "maxP", "minP", "Fisher", "Stouffer", "ACAT"), missAllow = 0.3, proportionData = 0.5 )
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the diffenrent samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
effectS |
A list of two elements. The first element is a dataframe with genes in rows and studies in columns. Each component of the dataframe is the effect of a gene in a study. The second element of the list is also a dataframe with the same structure, but in this case each component of the dataframe represent the variance of the effect of a gene in a study. The third element of the list is also a dataframe with the same structure, but in this case each component of the dataframe represent the log fold change of a gene in a study. This argument should be only used in the case that objectMA argument is null. |
pvalues |
A list of two elements. The first element is a dataframe with genes in rows and studies in columns. Each component of the dataframe is the p-value of a gene in a study. The second element of the list is also a dataframe with the same structure, but in this case each component of the dataframe represent the log fold change of a gene in a study. This argument should be only used in the case that objectMA argument is null. |
weight |
A vector of the weights of each dataset. This argument should only be included in case objectMA is null and you want to use "Stouffer" or "ACAT" method. |
typeMethod |
A character that indicates the method to be peformed. See "Details"for more information |
missAllow |
A number that indicates the maximun proportion of missing values allowed in a sample. If the sample has more proportion of missing values the sample will be eliminated. In the other case the missing values will be imputed using the K-NN algorithm. In case the objectMA has been previously imputed, this element is not necessary. |
proportionData |
The minimum proportion of datasets in which a gene must be contained to be included. By default, the gene must be contained in at least half of the datasets. In case the objectMA has been previously imputed, this element is not necessary. |
The different meta-analysis methods that can be applied are:
Effects sizes methods:
"FEM": Fixed Effects model
"REM": Random Effects model
P-value combination mehods
"Fisher": Fisher's methods
"Stouffer": Stouffer's method
"maxP": maximum p-value method (Wilkinson's method)
"minP": minimum p-value method (Tippett's method)
"ACAT": Aggregated Cauchy Association Test method
A dataframe with the meta-analysis results. Depending on the applied method, a different dataframe is obtained. For more information see the package vignette.
Juan Antonio Villatoro Garcia, [email protected]
Daniel Toro-Domínguez, Juan Antonio Villatoro-García, Jordi Martorell-Marugán, Yolanda Román-Montoya, Marta E Alarcón-Riquelme, Pedro Carmona-Sáez, A survey of gene expression meta-analysis: methods and applications, Briefings in Bioinformatics, 2020;, bbaa019, https://doi.org/10.1093/bib/bbaa019
Michael Dewey (2020). metap: meta-analysis of significance values.
Liu, Y., Chen, S., Li, Z., Morrison, A. C., Boerwinkle, E., & Lin, X. (2019). ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. The American Journal of Human Genetics, 104(3), 410-421. https://doi.org/10.1016/j.ajhg.2019.01.002
data(DExMAExampleData) ResultsMA <- metaAnalysisDE(objectMA=maObject, typeMethod="REM", missAllow=0.3, proportionData=0.5) ResultsMA
data(DExMAExampleData) ResultsMA <- metaAnalysisDE(objectMA=maObject, typeMethod="REM", missAllow=0.3, proportionData=0.5) ResultsMA
missGenesImput uses k-nearest neighbors in the space of samples to impute the unmeasured genes of the different datasets.
missGenesImput(objectMA, k = 7)
missGenesImput(objectMA, k = 7)
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the different samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateobjectMA can be used too. |
k |
Number of neighbors to be used in the imputation (default=7). |
A list formed by two elements:
First element (objectMA) the same objectMA with missign genes imputed
Second element (imputIndicators) a list with 4 different objects:
imputValuesSample: Number of missing values imputed per sample
imputPercentageSample: Percentage of missing values imputed per sample
imputValuesGene: Number of missing values imputed per gene
imputPercentageGene: Percentage of missing values imputed per gene
Juan Antonio Villatoro Garcia, [email protected]
Christopher A Mancuso, Jacob L Canfield, Deepak Singla, Arjun Krishnan, A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes, Nucleic Acids Research, Volume 48, Issue 21, 2 December 2020, Page e125, https://doi.org/10.1093/nar/gkaa881
Alberto Franzin, Francesco Sambo, Barbara di Camillo. bnstruct: an R package for Bayesian Network structure learning in the presence of missing data. Bioinformatics, 2017; 33 (8): 1250-1252, Oxford University Press, https://doi.org/10.1093/bioinformatics/btw807
createObjectMA
, metaAnalysisDE
data(DExMAExampleData) missGenesImput(maObject)
data(DExMAExampleData) missGenesImput(maObject)
This function uses t-test based on limma package in other to obtain the individual p-values for each study and gene
pvalueIndAnalysis(objectMA, missAllow = 0.3)
pvalueIndAnalysis(objectMA, missAllow = 0.3)
objectMA |
A list of list. Each list contains two elements. The first element is the expression matrix (genes in rows and sample in columns) and the second element is a vector of zeros and ones that represents the state of the diffenrent samples of the expression matrix. 0 represents one group (controls) and 1 represents the other group (cases). The result of the CreateObjectMA can be used too. |
missAllow |
a number that indicates the maximun proportion of missing values allowed in a sample. If the sample has more proportion of missing values the sample will be eliminated. In the other case the missing values will be imputed using the K-NN algorithm. |
A list formed by two elements:
First element (p) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe represents the p-value.
Second element (logFC) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe is the logFC.
Third element (weights_z) is a dataframe were columns are each of the studies (datasets) and rows are the genes. Each element of the dataframe represents the necessary weights for Stouffer's method.
Juan Antonio Villatoro Garcia, [email protected]
createObjectMA
, metaAnalysisDE
data(DExMAExampleData) pvalues <- pvalueIndAnalysis(objectMA=maObject, missAllow=0.3) pvalues
data(DExMAExampleData) pvalues <- pvalueIndAnalysis(objectMA=maObject, missAllow=0.3) pvalues