Title: | Principal Variance Component Analysis (PVCA) |
---|---|
Description: | This package contains the function to assess the batch sources by fitting all "sources" as random effects, including two-way interaction terms, in a mixed model (depends on the lme4 package) to selected principal components, which are obtained from the original data correlation matrix. This package accompanies the book "Batch Effects and Noise in Microarray Experiments", chapter 12. |
Authors: | Pierre Bushel <[email protected]> |
Maintainer: | Jianying LI <[email protected]> |
License: | LGPL (>= 2.0) |
Version: | 1.47.0 |
Built: | 2024-11-18 04:24:14 UTC |
Source: | https://github.com/bioc/pvca |
This package contains the function to assess the batch sources by fitting all "sources" as random effects, including two-way interaction terms, in a mixed model (depends on the lme4 package) to selected principal components, which are obtained from the original data correlation matrix. This package accompanies the book "Batch Effects and Noise in Microarray Experiments", chapter 12.
Package: | pvca |
Type: | Package |
Version: | 1.0 |
Date: | 2012-09-11 |
License: | LGPL (>= 2.0) |
library(pvca)
library(golubEsets)
data(Golub_Merge)
# Minimum proportion of variability the selected principal components must explain
pct_threshold <- 0.6
batch.factors <- c("ALL.AML", "BM.PB", "Source")
pvcaObj <- pvcaBatchAssess(Golub_Merge, batch.factors, pct_threshold)
# Bar chart of the weighted average proportion of variance for each effect
bp <- barplot(pvcaObj$dat, xlab = "Effects",
              ylab = "Weighted average proportion variance",
              ylim = c(0, 1.1), col = c("blue"), las = 2,
              main = "PVCA estimation bar chart")
axis(1, at = bp, labels = pvcaObj$label, xlab = "Effects", cex.axis = 0.5, las = 2)
# Print the rounded proportion above each bar
values <- pvcaObj$dat
new_values <- round(values, 3)
text(bp, pvcaObj$dat, labels = new_values, pos = 3, cex = 0.8)
print(sessionInfo())
Pierre Bushel <[email protected]>
Maintainer: Jianying LI <[email protected]>
Batch Effects and Noise in Microarray Experiments: Sources and Solutions. 2009 John Wiley & Sons, Ltd.
This package contains the function to assess the batch sources by fitting all "sources" as random effects, including two-way interaction terms, in a mixed model (depends on the lme4 package) to selected principal components, which are obtained from the original data correlation matrix. This package accompanies the book "Batch Effects and Noise in Microarray Experiments", chapter 12.
pvcaBatchAssess(abatch, batch.factors, threshold)
abatch |
An instance of ExpressionSet, a class that can be imported from Biobase |
batch.factors |
A vector of factors on which the mixed linear model will be fit |
threshold |
The minimum proportion of the total variability that the selected principal components need to explain (for example, 0.6) |
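To illustrate what the threshold controls, here is a minimal base-R sketch of choosing the number of principal components from the eigenvalues of a sample correlation matrix. The toy matrix `expr` and the names `sample_cor`, `prop_var`, and `n_pc` are assumptions made for illustration; the package's own selection rule may differ in its details.

```r
set.seed(1)

# Toy expression matrix (an assumption): 200 features (rows) x 12 samples (columns).
expr <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# Sample-by-sample correlation matrix and its eigen decomposition.
sample_cor <- cor(expr)
eig <- eigen(sample_cor, symmetric = TRUE)

# Proportion of total variability carried by each principal component.
prop_var <- eig$values / sum(eig$values)

# Smallest number of leading components whose cumulative proportion
# reaches the threshold.
threshold <- 0.6
n_pc <- which(cumsum(prop_var) >= threshold)[1]
n_pc
```

A higher threshold keeps more components, so more mixed models are fit and more of the original variability enters the final estimate.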
Batch effects are often present in microarray data for any number of reasons, including poor experimental design or the combination of gene expression data from different studies with limited standardization. To estimate the variability of experimental effects, including batch, a novel hybrid approach known as principal variance component analysis (PVCA) has been developed. The approach leverages the strengths of two popular data analysis methods: first, principal component analysis (PCA) efficiently reduces the data dimension while retaining the majority of the variability in the data; second, variance components analysis (VCA) fits a mixed linear model, using the factors of interest as random effects, to estimate and partition the total variability. PVCA can be used as a screening tool to determine which sources of variability (biological, technical, or other) are most prominent in a given microarray data set. Using the eigenvalues associated with the corresponding eigenvectors as weights, the variations associated with all factors are standardized, and the magnitude of each source of variability (including each batch effect) is presented as a proportion of total variance. Although PVCA is a generic approach for quantifying the proportion of variation attributable to each effect, it is also a handy way to assess batch effects before and after batch normalization.
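The eigenvalue weighting described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the per-component variance proportions in `var_prop` and the values in `eigenvalues` are made-up toy numbers standing in for the mixed-model estimates and the PCA output, and the effect names are hypothetical.

```r
# Hypothetical variance proportions per selected principal component
# (rows: components, columns: effects). Each row sums to 1.
var_prop <- rbind(
  PC1 = c(Batch = 0.50, Treatment = 0.30, Residual = 0.20),
  PC2 = c(Batch = 0.10, Treatment = 0.60, Residual = 0.30),
  PC3 = c(Batch = 0.20, Treatment = 0.20, Residual = 0.60)
)

# Hypothetical eigenvalues of the selected components, turned into weights.
eigenvalues <- c(PC1 = 5, PC2 = 3, PC3 = 2)
weights <- eigenvalues / sum(eigenvalues)

# Weighted average across components: row i of var_prop is scaled by
# weights[i], then the columns are summed.
weighted_prop <- colSums(var_prop * weights)
weighted_prop
```

Because each row of `var_prop` sums to 1 and the weights sum to 1, the weighted proportions also sum to 1, which is what allows each effect to be read as a share of total variance.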
dat |
A numeric vector containing the proportion of variance attributed to each term (each source of batch effect) |
label |
A character vector containing the name for each term for plot label purpose |
Modified and maintained by Jianying Li
Pierre Bushel
library(pvca)
library(golubEsets)
data(Golub_Merge)
# Minimum proportion of variability the selected principal components must explain
pct_threshold <- 0.6
batch.factors <- c("ALL.AML", "BM.PB", "Source")
pvcaObj <- pvcaBatchAssess(Golub_Merge, batch.factors, pct_threshold)
# Bar chart of the weighted average proportion of variance for each effect
bp <- barplot(pvcaObj$dat, xlab = "Effects",
              ylab = "Weighted average proportion variance",
              ylim = c(0, 1.1), col = c("blue"), las = 2,
              main = "PVCA estimation bar chart")
axis(1, at = bp, labels = pvcaObj$label, xlab = "Effects", cex.axis = 0.5, las = 2)
# Print the rounded proportion above each bar
values <- pvcaObj$dat
new_values <- round(values, 3)
text(bp, pvcaObj$dat, labels = new_values, pos = 3, cex = 0.8)
print(sessionInfo())