Title: | Generate Quality Surrogate Variable Analysis for Degradation Correction |
---|---|
Description: | The qsvaR package contains functions for removing the effect of degration in rna-seq data from postmortem brain tissue. The package is equipped to help users generate principal components associated with degradation. The components can be used in differential expression analysis to remove the effects of degradation. |
Authors: | Joshua Stolz [aut] , Hedia Tnani [ctb, cre] , Leonardo Collado-Torres [ctb] , Nicholas J. Eagles [aut] |
Maintainer: | Hedia Tnani <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.1 |
Built: | 2025-01-10 03:18:28 UTC |
Source: | https://github.com/bioc/qsvaR |
These t-statistics are derived from the degradation timepoints data
built into qsvaR. They are the results from multiple models where
we determined the association of transcripts with degradation time
adjusting for brain region (so parallel degradation effects across
brain regions). They are used for plotting in DEqual()
.
A data.frame()
with the t
statistics for degradation time. The
rownames()
are the GENCODE transcript IDs.
A DEqual plot compares the effect of RNA degradation from an independent
degradation experiment on the y axis to the effect of the outcome of
interest. They were orignally described by Jaffe et al, PNAS, 2017
https://doi.org/10.1073/pnas.1617384114. Other DEqual versions are
included in Collado-Torres et al, Neuron, 2019
https://doi.org/10.1016/j.neuron.2019.05.013. This function compares your
t-statistics of interest computed on transcripts against the
t-statistics from degradation time adjusting for the six brain regions from
degradation experiment data used for determining rse_tx
.
DEqual( DE, deg_tstats = qsvaR::degradation_tstats, show.legend = TRUE, show.cor = c("caption", "corner-top", "corner-bottom", "none"), font.size = 12, cor.size = font.size/2, cor.label = "cor: " )
DEqual( DE, deg_tstats = qsvaR::degradation_tstats, show.legend = TRUE, show.cor = c("caption", "corner-top", "corner-bottom", "none"), font.size = 12, cor.size = font.size/2, cor.label = "cor: " )
DE |
a |
deg_tstats |
an optional |
show.legend |
logical (default TRUE) to show legend in the plot |
show.cor |
specify where to show the correlation value. Can be one of "caption", "corner-top", "corner-bottom", or "none". |
font.size |
numeric value to set the base font size of the plot |
cor.size |
numeric (default font.size/2) to set the font size for the correlation text |
cor.label |
character (default "cor: ") to set the text preceding the correlation value |
a ggplot
object of the DE t-statistic vs
the DE statistic from degradation
## Random differential expression t-statistics for the same transcripts ## we have degradation t-statistics for in `degradation_tstats`. set.seed(101) random_de <- data.frame( t = rt(nrow(degradation_tstats), 5), row.names = sample( rownames(degradation_tstats), nrow(degradation_tstats) ) ) ## Create the DEqual plot DEqual(random_de)
## Random differential expression t-statistics for the same transcripts ## we have degradation t-statistics for in `degradation_tstats`. set.seed(101) random_de <- data.frame( t = rt(nrow(degradation_tstats), 5), row.names = sample( rownames(degradation_tstats), nrow(degradation_tstats) ) ) ## Create the DEqual plot DEqual(random_de)
Using the pcs and the k number of components be included, we generate the qsva matrix.
get_qsvs(qsvPCs, k)
get_qsvs(qsvPCs, k)
qsvPCs |
prcomp object generated by taking the pcs of degraded transcripts |
k |
number of qsvs to be included. |
matrix with k principal components for each sample.
qsv <- getPCs(rse_tx, "tpm") get_qsvs(qsv, 2)
qsv <- getPCs(rse_tx, "tpm") get_qsvs(qsv, 2)
This function is used to obtain a RangedSummarizedExperiment-class of transcripts and their expression values #' These transcripts are selected based on a prior study of RNA degradation in postmortem brain tissues. This object can later be used to obtain the principle components necessary to remove the effect of degradation in differential expression.
getDegTx( rse_tx, sig_transcripts = select_transcripts(), assayname = "tpm", verbose = TRUE )
getDegTx( rse_tx, sig_transcripts = select_transcripts(), assayname = "tpm", verbose = TRUE )
rse_tx |
A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
sig_transcripts |
A |
assayname |
character string specifying the name of the assay desired in rse_tx |
verbose |
specify if the function should report how many model transcripts were matched |
A RangedSummarizedExperiment-class object.
degTx <- getDegTx(rse_tx)
degTx <- getDegTx(rse_tx)
This function returns the pcs from the obtained RangedSummarizedExperiment object of selected transcripts
getPCs(rse_tx, assayname = "tpm")
getPCs(rse_tx, assayname = "tpm")
rse_tx |
Ranged Summarizeed Experiment with only trancsripts selected for qsva |
assayname |
character string specifying the name of the assay desired in rse_tx |
prcomp object generated by taking the pcs of degraded transcripts
getPCs(rse_tx, "tpm")
getPCs(rse_tx, "tpm")
Apply num.sv algorithm to determine the number of pcs to be included
k_qsvs(rse_tx, mod, assayname)
k_qsvs(rse_tx, mod, assayname)
rse_tx |
A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
mod |
Model Matrix with necessary variables the you would model for in differential expression |
assayname |
character string specifying the name of the assay desired in rse_tx |
integer representing number of pcs to be included
## First we need to define a statistical model. We'll use the example ## rse_tx data. Note that the model you'll use in your own data ## might look different from this model. mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN, data = colData(rse_tx) ) ## To ensure that the results are reproducible, you will need to set a ## random seed with the set.seed() function. Internally, we are using ## sva::num.sv() which needs a random seed to ensure reproducibility of the ## results. set.seed(20230621) k_qsvs(rse_tx, mod, "tpm")
## First we need to define a statistical model. We'll use the example ## rse_tx data. Note that the model you'll use in your own data ## might look different from this model. mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN, data = colData(rse_tx) ) ## To ensure that the results are reproducible, you will need to set a ## random seed with the set.seed() function. Internally, we are using ## sva::num.sv() which needs a random seed to ensure reproducibility of the ## results. set.seed(20230621) k_qsvs(rse_tx, mod, "tpm")
This function removes the Gencode/ENSEMBL version from the transcript ID, while protecting _PAR_Y suffixes if present
normalize_tx_names(txnames)
normalize_tx_names(txnames)
txnames |
A |
A
character()
vector of transcript names without versioning
ensIDs <- normalize_tx_names(rownames(rse_tx))
ensIDs <- normalize_tx_names(rownames(rse_tx))
A wrapper function used to perform qSVA in one step.
qSVA(rse_tx, sig_transcripts = select_transcripts(), mod, assayname)
qSVA(rse_tx, sig_transcripts = select_transcripts(), mod, assayname)
rse_tx |
A RangedSummarizedExperiment-class object containing the transcript data desired to be studied. |
sig_transcripts |
A |
mod |
Model Matrix with necessary variables the you would model for in differential expression. |
assayname |
character string specifying the name of the assay desired in rse_tx |
matrix with k principal components for each sample
## First we need to define a statistical model. We'll use the example ## rse_tx data. Note that the model you'll use in your own data ## might look different from this model. mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN, data = colData(rse_tx) ) ## To ensure that the results are reproducible, you will need to set a ## random seed with the set.seed() function. Internally, we are using ## sva::num.sv() which needs a random seed to ensure reproducibility of the ## results. set.seed(20230621) qSVA(rse_tx = rse_tx, mod = mod, assayname = "tpm")
## First we need to define a statistical model. We'll use the example ## rse_tx data. Note that the model you'll use in your own data ## might look different from this model. mod <- model.matrix(~ mitoRate + Region + rRNA_rate + totalAssignedGene + RIN, data = colData(rse_tx) ) ## To ensure that the results are reproducible, you will need to set a ## random seed with the set.seed() function. Internally, we are using ## sva::num.sv() which needs a random seed to ensure reproducibility of the ## results. set.seed(20230621) qSVA(rse_tx = rse_tx, mod = mod, assayname = "tpm")
This data is a RangedSummarizedExperiment-class with transcript quantification data stored in an "tpm" assay. It is used to demonstrate the use of qsvaR in bulk RNA-seq data.
A RangedSummarizedExperiment-class
Helper function to select which experimental model(s) will be used to
generate the qSVs. Degradation-associated transcripts are derived in four
different models (transcripts). The cell_component
parameter
controls whether the models with cell-type proportions are included. This
function extract the top top_n
transcripts found to be significant in
each considered model, then returns the union of transcripts across all
considered models. Up to 10,000 transcripts are available to select from
each model prior to taking the union.
select_transcripts(top_n = 1000, cell_component = FALSE)
select_transcripts(top_n = 1000, cell_component = FALSE)
top_n |
An |
cell_component |
A |
A character()
with the transcript IDs.
## Default set of transcripts associated with degradation sig_transcripts <- select_transcripts() length(sig_transcripts) head(sig_transcripts) ## Select more transcripts if desired length(select_transcripts(top_n = 5000))
## Default set of transcripts associated with degradation sig_transcripts <- select_transcripts() length(sig_transcripts) head(sig_transcripts) ## Select more transcripts if desired length(select_transcripts(top_n = 5000))
This object is a list of four tibble
s where each element corresponds to the
top 10,000 transcripts (by significance) and their adjusted p-values for a
given degradation model. The main_model
model is a linear model modelling
expression against a sample's degradation time, with brain region as a
covariate. The int_model
model is similar but includes an interaction term
with degradation time and brain region. The cell_main_model
and
cell_int_model
models are like the respective main_model
and int_model
models, but including cell-type fractions from deconvolution as linear terms.
transcripts
transcripts
A list()
of tibble()
s containing the transcripts and adjusted
p-values selected by each model.
Each string is a GENCODE transcript IDs.
This function is used to check if tx1 and tx2 are GENCODE or ENSEMBL transcript IDs and return an integer vector of tx1 transcript indexes that are in tx2.
which_tx_names(txnames, sig_tx)
which_tx_names(txnames, sig_tx)
txnames |
A |
sig_tx |
A |
A
integer()
vector of txnames
transcript indexes in sig_tx
.
sig_tx <- select_transcripts(cell_component = TRUE) whichTx <- which_tx_names(rownames(rse_tx), sig_tx)
sig_tx <- select_transcripts(cell_component = TRUE) whichTx <- which_tx_names(rownames(rse_tx), sig_tx)