Title: | Generate Quality Surrogate Variable Analysis for Degradation Correction |
---|---|
Description: | The qsvaR package contains functions for removing the effect of degradation in RNA-seq data from postmortem brain tissue. The package helps users generate principal components associated with degradation. These components can then be included in differential expression analysis to remove the effects of degradation. |
Authors: | Joshua Stolz [aut] , Hedia Tnani [ctb, cre] , Leonardo Collado-Torres [ctb] |
Maintainer: | Hedia Tnani <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.0 |
Built: | 2024-11-18 04:14:40 UTC |
Source: | https://github.com/bioc/qsvaR |
This function checks whether tx1 and tx2 contain GENCODE or ENSEMBL transcript IDs and reports an error if they do not. It returns a character vector of the transcripts in tx2 that are also in tx1.
check_tx_names(tx1, tx2, arg_name1, arg_name2)
tx1 | A |
tx2 | A |
arg_name1 | A |
arg_name2 | A |
A character() vector of transcripts in tx2 that are in tx1.
sig_transcripts = select_transcripts("cell_component")
check_tx_names(rownames(covComb_tx_deg), sig_transcripts, 'rownames(covComb_tx_deg)', 'sig_transcripts')
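The kind of check described above can be sketched in base R. This is an illustrative approximation, not the package's implementation: `looks_like_ensembl_tx()` and `match_transcripts()` are hypothetical helpers, the `^ENST` pattern and the stripping of version suffixes are assumptions about what counts as a matching GENCODE or ENSEMBL ID, and the second ID in `tx2` is made up.

```r
# Hypothetical helper: TRUE when every ID starts with the ENST prefix shared
# by ENSEMBL (ENST00000456328) and GENCODE (ENST00000456328.2) transcript IDs.
looks_like_ensembl_tx <- function(tx) {
    all(grepl("^ENST[0-9]+", tx))
}

# Hypothetical helper: return the elements of tx2 that are found in tx1,
# comparing IDs after dropping any ".version" suffix (an assumption).
match_transcripts <- function(tx1, tx2) {
    if (!looks_like_ensembl_tx(tx1) || !looks_like_ensembl_tx(tx2)) {
        stop("Transcript IDs should be GENCODE or ENSEMBL.", call. = FALSE)
    }
    strip_version <- function(tx) sub("\\.[0-9]+$", "", tx)
    tx2[strip_version(tx2) %in% strip_version(tx1)]
}

tx1 <- c("ENST00000456328.2", "ENST00000450305.2", "ENST00000488147.1")
tx2 <- c("ENST00000456328", "ENST00000999999") # second ID is made up
matched <- match_transcripts(tx1, tx2)
matched
```

Comparing IDs without their version suffix lets unversioned ENSEMBL IDs match versioned GENCODE IDs; whether that is desirable depends on how strictly annotation versions need to agree in a given analysis.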
These data were generated from an experiment using degraded RNA-seq samples from postmortem brain tissue. The transcripts included are the result of the qSVA expanded framework study and will be used to remove the effect of degradation in bulk RNA-seq data.
A RangedSummarizedExperiment-class
These t-statistics are derived from the same data that was used for
covComb_tx_deg. They are the results from the main model, where we determined
the relationship with degradation time while adjusting for brain region
(assuming parallel degradation effects across brain regions). They are used
for plotting in DEqual().
A data.frame() with the t-statistics for degradation time. The rownames() are the GENCODE transcript IDs.
A DEqual plot compares the effect of RNA degradation from an independent
degradation experiment on the y axis to the effect of the outcome of
interest. These plots were originally described by Jaffe et al, PNAS, 2017
https://doi.org/10.1073/pnas.1617384114. Other DEqual versions are
included in Collado-Torres et al, Neuron, 2019
https://doi.org/10.1016/j.neuron.2019.05.013. This function compares your
t-statistics of interest computed on transcripts against the
t-statistics for degradation time, adjusted for the six brain regions, from
the degradation experiment data used for determining covComb_tx_deg.
DEqual(DE)
DE | a |
A ggplot object plotting the DE t-statistics against the degradation t-statistics.
## Random differential expression t-statistics for the same transcripts
## we have degradation t-statistics for in `degradation_tstats`.
set.seed(101)
random_de <- data.frame(
    t = rt(nrow(degradation_tstats), 5),
    row.names = sample(
        rownames(degradation_tstats),
        nrow(degradation_tstats)
    )
)

## Create the DEqual plot
DEqual(random_de)
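The comparison behind a DEqual plot can be approximated in base R without the package. The sketch below is an assumption-laden illustration, not how DEqual() is implemented: the transcript IDs and both sets of t-statistics are simulated, and the two tables are simply aligned by row name before computing their correlation.

```r
set.seed(101)
tx_ids <- paste0("ENST", sprintf("%011d", 1:200)) # made-up transcript IDs

# Simulated stand-ins for degradation and differential expression t-statistics
deg_t <- data.frame(t = rt(200, 5), row.names = tx_ids)
de_t <- data.frame(t = rt(150, 5), row.names = sample(tx_ids, 150))

# Keep only the transcripts present in both tables, in a shared order
common <- intersect(rownames(de_t), rownames(deg_t))
both <- data.frame(
    de_t = de_t[common, "t"],
    degradation_t = deg_t[common, "t"],
    row.names = common
)

# Correlation between the outcome effect and the degradation effect
rho <- cor(both$de_t, both$degradation_t)
rho
```

With independent random statistics such as these, the correlation should be close to zero. In real data, a strong correlation between the two columns would suggest that the differential expression signal is confounded by degradation.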
Using the principal components (PCs) and the number k of components to be included, this function generates the qSV matrix.
get_qsvs(qsvPCs, k)
qsvPCs | prcomp object generated by taking the PCs of degraded transcripts |
k | number of qSVs to be included |
matrix with k principal components for each sample.
qsv <- getPCs(covComb_tx_deg, "tpm")
get_qsvs(qsv, 2)
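As a rough base R illustration of what "taking k principal components for each sample" means, the sketch below runs stats::prcomp() on a simulated expression matrix and keeps the first k columns of the rotated data. The simulated TPM values, the log2(x + 1) transform, and the qSV column names are assumptions made for this example rather than a description of the package internals.

```r
set.seed(1)

# Simulated TPM matrix: 50 transcripts (rows) by 8 samples (columns)
tpm <- matrix(
    rexp(50 * 8, rate = 0.1),
    nrow = 50,
    dimnames = list(paste0("tx", 1:50), paste0("sample", 1:8))
)

# PCA with samples as observations; log2(x + 1) is an assumed transform
pcs <- prcomp(t(log2(tpm + 1)))

# Keep the first k principal components as candidate qSVs
k <- 2
qsvs <- pcs$x[, seq_len(k), drop = FALSE]
colnames(qsvs) <- paste0("qSV", seq_len(k))
dim(qsvs)
```

Using `drop = FALSE` keeps the result a matrix even when k is 1, which matters if the result is later bound to a model matrix.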
This function is used to obtain a RangedSummarizedExperiment-class object of transcripts and their expression values. These transcripts are selected based on a prior study of RNA degradation in postmortem brain tissues. This object can later be used to obtain the principal components necessary to remove the effect of degradation in differential expression.
getDegTx(
    rse_tx,
    type = c("cell_component", "standard", "top1500"),
    sig_transcripts = select_transcripts(type),
    assayname = "tpm"
)
rse_tx | A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
type | A |
sig_transcripts | A list of transcripts determined to have degradation signal in the qSVA expanded paper. |
assayname | character string specifying the name of the assay desired in rse_tx |
A RangedSummarizedExperiment-class object.
getDegTx(covComb_tx_deg)
stopifnot(mean(rowMeans(assays(covComb_tx_deg)$tpm)) > 1)
This function returns the principal components (PCs) computed from the RangedSummarizedExperiment object of selected transcripts.
getPCs(rse_tx, assayname = "tpm")
rse_tx | RangedSummarizedExperiment with only the transcripts selected for qSVA |
assayname | character string specifying the name of the assay desired in rse_tx |
prcomp object generated by taking the pcs of degraded transcripts
getPCs(covComb_tx_deg, "tpm")
Applies the sva::num.sv() algorithm to determine the number of PCs to be included.
k_qsvs(rse_tx, mod, assayname)
rse_tx | A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
mod | Model matrix with the variables that you would model in differential expression |
assayname | character string specifying the name of the assay desired in rse_tx |
integer representing number of pcs to be included
## First we need to define a statistical model. We'll use the example
## covComb_tx_deg data. Note that the model you'll use in your own data
## might look different from this model.
mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN,
    data = colData(covComb_tx_deg)
)

## To ensure that the results are reproducible, you will need to set a
## random seed with the set.seed() function. Internally, we are using
## sva::num.sv() which needs a random seed to ensure reproducibility of the
## results.
set.seed(20230621)
k_qsvs(covComb_tx_deg, mod, "tpm")
A wrapper function used to perform qSVA in one step.
qSVA(
    rse_tx,
    type = c("cell_component", "standard", "top1500"),
    sig_transcripts = select_transcripts(type),
    mod,
    assayname
)
rse_tx | A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
type | a character string specifying which model you would like to use when selecting a degradation matrix. |
sig_transcripts | A list of transcripts that are associated with degradation signal. Use |
mod | Model matrix with the variables that you would model in differential expression |
assayname | character string specifying the name of the assay desired in rse_tx |
matrix with k principal components for each sample
## First we need to define a statistical model. We'll use the example
## covComb_tx_deg data. Note that the model you'll use in your own data
## might look different from this model.
mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN,
    data = colData(covComb_tx_deg)
)

## To ensure that the results are reproducible, you will need to set a
## random seed with the set.seed() function. Internally, we are using
## sva::num.sv() which needs a random seed to ensure reproducibility of the
## results.
set.seed(20230621)
qSVA(rse_tx = covComb_tx_deg, type = "cell_component", mod = mod, assayname = "tpm")
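Once a qSV matrix is available, a common next step is to append it to the model matrix used for differential expression so that degradation is adjusted for. The base R sketch below assumes a simulated qSV matrix in place of real qSVA() output, a hypothetical two-group `Dx` variable, and a single simulated transcript fitted with stats::lm.fit(); a real analysis would typically fit many transcripts with a dedicated differential expression package.

```r
set.seed(20230621)
n <- 12

# Hypothetical phenotype table with a two-level diagnosis variable
pheno <- data.frame(
    Dx = factor(
        rep(c("Control", "Case"), each = n / 2),
        levels = c("Control", "Case")
    )
)
mod <- model.matrix(~Dx, data = pheno)

# Simulated stand-in for a qSV matrix: n samples by 2 qSVs
qsvs <- matrix(
    rnorm(n * 2),
    nrow = n,
    dimnames = list(NULL, c("qSV1", "qSV2"))
)

# Append the qSVs to the model matrix as extra covariates
mod_qsva <- cbind(mod, qsvs)

# Fit one simulated transcript against the adjusted design
expr <- rnorm(n)
fit <- lm.fit(mod_qsva, expr)
fit$coefficients
```

The coefficient for `DxCase` is then the diagnosis effect after adjusting for the qSV columns.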
Helper function to select which experimental model will be used to generate the qSVs.
select_transcripts(type = c("cell_component", "top1500", "standard"))
type | A |
A character() with the transcript IDs.
## Default set of transcripts associated with degradation
sig_transcripts <- select_transcripts()
length(sig_transcripts)
head(sig_transcripts)

## Example where match.arg() auto-completes
select_transcripts("top")
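The auto-completion mentioned in the example comes from base R's match.arg(), which returns the first choice when the argument is left at its default and resolves unambiguous partial matches. The sketch below uses a hypothetical `pick_set()` function with the same choices to show that behavior in isolation.

```r
# Hypothetical stand-in for a function that picks a transcript set by name
pick_set <- function(type = c("cell_component", "top1500", "standard")) {
    match.arg(type)
}

default_choice <- pick_set() # no argument: the first choice is used
partial_choice <- pick_set("top") # unambiguous prefix of "top1500"

default_choice
partial_choice
```

A value that matches none of the choices makes match.arg() stop with an error, so typos in the model name are caught early.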
An object storing three lists of transcripts each corresponding to a model used in the degradation experiment.
These were determined by Joshua M. Stolz et al, 2022. Here the names "cell_component", "top1500", and "standard" refer to models that were determined to be effective in removing degradation effects.
The "standard" model involves taking the union of the top 1000 transcripts associated with degradation from the interaction model and the main effect model.
The "top1500" model is the same as the "standard" model except the union of the top 1500 genes associated with degradation is selected.
The most effective of our models, "cell_component", involved deconvolution of the degradation matrix to determine the proportion of cell types within our studied tissue.
These proportions were then added to our model.matrix(), and the union of the top 1000 transcripts in the interaction model, the main effect model, and the cell proportions model was used to generate this model of qSVs.
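The "union of the top transcripts" idea used by these models can be sketched in base R. Everything here is an assumption for illustration: the t-statistics are simulated, `top_n_tx()` is a hypothetical helper, and ranking by absolute t-statistic is only one plausible way to define "top"; the source does not state the exact ranking criterion.

```r
set.seed(7)
tx_ids <- paste0("tx", 1:20) # made-up transcript IDs

# Simulated degradation t-statistics from two models
main_t <- setNames(rnorm(20), tx_ids)
interaction_t <- setNames(rnorm(20), tx_ids)

# Hypothetical helper: names of the n transcripts with the largest |t|
top_n_tx <- function(tstats, n) {
    names(sort(abs(tstats), decreasing = TRUE))[seq_len(n)]
}

# Union of the top 5 transcripts from each model
selected <- union(top_n_tx(main_t, 5), top_n_tx(interaction_t, 5))
length(selected)
```

Because the two top lists can overlap, the union contains between n and 2n transcripts, which is why the selected sets need not have exactly 1000 or 1500 entries.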
transcripts
A list() with character strings containing the transcripts selected by each model.
Each string is a GENCODE transcript ID.