Title: | Statistical analysis of RNA editing sites and hyper-editing regions |
---|---|
Description: | RNAeditr analyzes site-specific RNA editing events, as well as hyper-editing regions. The editing frequencies can be tested against binary, continuous or survival outcomes. Multiple covariate variables as well as interaction effects can also be incorporated in the statistical models. |
Authors: | Lanyu Zhang [aut, cre], Gabriel Odom [aut], Tiago Silva [aut], Lissette Gomez [aut], Lily Wang [aut] |
Maintainer: | Lanyu Zhang <[email protected]> |
License: | GPL-3 |
Version: | 1.17.0 |
Built: | 2024-11-13 06:17:06 UTC |
Source: | https://github.com/bioc/rnaEditr |
A wrapper function to extract clusters of RNA editing sites that are located closely in genomic regions.
AllCloseByRegions( regions_gr, rnaEditMatrix, maxGap = 50, minSites = 3, progressBar = "time" )
regions_gr | A GRanges object of input genomic regions. |
rnaEditMatrix | A matrix (or data frame) of RNA editing level values on individual sites, with row names as site IDs in the form of "chrAA:XXXXXXXX", and column names as sample IDs. Please make sure to follow the format of example dataset ( |
maxGap | An integer, genomic locations within |
minSites | An integer, minimum number of RNA editing sites within each resulting cluster. Defaults to 3. |
progressBar | Name of the progress bar to use. There are currently five types of progress bars: |
The algorithm of this function is based on the clusterMaker function in the bumphunter R package. Each cluster is essentially a group of site locations such that two consecutive locations in the cluster are separated by less than maxGap.
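The gap-based grouping described above can be sketched in base R. This is a minimal illustration, not the package's or bumphunter's implementation: the helper name cluster_by_gap and the example positions are hypothetical, and treating a gap exactly equal to maxGap as a cluster break is an assumption taken from the "less than maxGap" wording.

```r
# Hypothetical helper (not part of rnaEditr): assign a cluster ID to each
# sorted genomic position on one chromosome.
cluster_by_gap <- function(pos, maxGap = 50) {
  stopifnot(!is.unsorted(pos))
  # A new cluster starts at the first site and wherever the distance to the
  # previous site is not less than maxGap (boundary handling is an assumption).
  new_cluster <- c(TRUE, diff(pos) >= maxGap)
  cumsum(new_cluster)
}

# Made-up positions for illustration only
pos <- c(100, 120, 160, 400, 430, 445, 900)
cluster_id <- cluster_by_gap(pos, maxGap = 50)

# Mimic the minSites filter: keep clusters with at least 3 sites
sizes <- table(cluster_id)
big_clusters <- as.integer(names(sizes)[sizes >= 3])
kept_pos <- pos[cluster_id %in% big_clusters]
```

In this sketch the isolated site at position 900 forms its own cluster and is dropped by the minSites filter.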
A GRanges object containing genomic regions of RNA editing sites located closely within each input pre-defined genomic region.
TransformToGR, AllCoeditedRegions, CreateEditingTable, SummarizeAllRegions, TestAssociations, AnnotateResults
data(rnaedit_df)

exm_regions <- TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

AllCloseByRegions(
  regions_gr = exm_regions,
  rnaEditMatrix = rnaedit_df,
  maxGap = 50,
  minSites = 3,
  progressBar = "time"
)
A wrapper function to extract contiguous co-edited genomic regions from input genomic regions.
AllCoeditedRegions( regions_gr, rnaEditMatrix, output = c("GRanges", "dataframe"), rDropThresh_num = 0.4, minPairCorr = 0.1, minSites = 3, method = c("spearman", "pearson"), returnAllSites = FALSE, progressBar = "time", verbose = TRUE )
regions_gr | A GRanges object of input genomic regions. |
rnaEditMatrix | A matrix (or data frame) of RNA editing level values on individual sites, with row names as site IDs in the form of "chrAA:XXXXXXXX", and column names as sample IDs. Please make sure to follow the format of example dataset ( |
output | Type of output data. Defaults to |
rDropThresh_num | Threshold for minimum correlation between RNA editing levels of one site and the mean RNA editing levels of the rest of the sites. Please set a number between 0 and 1. Defaults to 0.4. |
minPairCorr | Threshold for minimum pairwise correlation of sites within a selected cluster. To use this filter, set a number between -1 and 1 (defaults to 0.1). To select all clusters (i.e. no filter), please set this argument to -1. |
minSites | Minimum number of sites to be considered as a region. Only regions with more than |
method | Method for computing correlation. Defaults to |
returnAllSites | When no contiguous co-edited regions are found in an input genomic region, |
progressBar | Name of the progress bar to use. There are currently five types of progress bars: |
verbose | Should messages and warnings be displayed? Defaults to FALSE, but is set to TRUE when called from within |
When output is set as "GRanges", a GRanges object with seqnames, ranges and strand of the contiguous co-edited regions will be returned. When output is set as "dataframe", a data frame with the following columns will be returned:
site: site ID.
chr: chromosome number.
pos: genomic position number.
r_drop: the correlation between RNA editing levels of one site and the mean RNA editing levels of the rest of the sites.
keep: indicator for co-edited sites; sites with keep = 1 belong to the contiguous and co-edited region.
keep_contiguous: contiguous co-edited region number.
regionMinPairwiseCor: the minimum pairwise correlation of a subregion.
keep_regionMinPairwiseCor: indicator for contiguous co-edited subregions; regions with keep_regionMinPairwiseCor = 1 passed the minimum correlation and will be returned as a contiguous co-edited subregion.
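To make the r_drop and regionMinPairwiseCor columns concrete, the following base R sketch computes both quantities on simulated data. It is only an illustration of the definitions given above: the simulated matrix edit_mat is hypothetical (and not restricted to the 0 to 1 range of real editing levels), and comparing against the thresholds with >= is an assumption rather than the package's documented behaviour.

```r
set.seed(1)
n_samples <- 30
shared <- rnorm(n_samples)

# Rows are sites and columns are samples, matching the rnaEditMatrix layout.
# Sites 1 to 3 share a common signal; site 4 is independent noise.
edit_mat <- rbind(
  site1 = shared + rnorm(n_samples, sd = 0.3),
  site2 = shared + rnorm(n_samples, sd = 0.3),
  site3 = shared + rnorm(n_samples, sd = 0.3),
  site4 = rnorm(n_samples)
)

# r_drop: correlation of each site with the mean of all remaining sites
r_drop <- vapply(seq_len(nrow(edit_mat)), function(i) {
  rest_mean <- colMeans(edit_mat[-i, , drop = FALSE])
  cor(edit_mat[i, ], rest_mean, method = "spearman")
}, numeric(1))
names(r_drop) <- rownames(edit_mat)

# Sites passing the rDropThresh_num default of 0.4 (">=" is an assumption)
keep <- as.integer(r_drop >= 0.4)

# Minimum pairwise correlation among the kept sites
kept <- edit_mat[keep == 1, , drop = FALSE]
pair_cor <- cor(t(kept), method = "spearman")
min_pair_cor <- min(pair_cor[upper.tri(pair_cor)])
```

Under this reading, a subregion would be retained when min_pair_cor is at least minPairCorr.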
TransformToGR, AllCloseByRegions, CreateEditingTable, SummarizeAllRegions, TestAssociations, AnnotateResults
data(rnaedit_df)

genes_gr <- TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

AllCoeditedRegions(
  regions_gr = genes_gr,
  rnaEditMatrix = rnaedit_df,
  output = "GRanges",
  method = "spearman"
)
Add annotations to site-specific or region-based analysis results from function TestAssociations.
AnnotateResults( results_df, closeByRegions_gr = NULL, inputRegions_gr = NULL, genome = c("hg38", "hg19"), analysis = c("region-based", "site-specific") )
results_df | An output data frame from function |
closeByRegions_gr | An output GRanges object from function |
inputRegions_gr | A GRanges object for input genomic regions, defaults to |
genome | Use |
analysis | Results type. Defaults to |
A data frame with locations of the genomic sites or regions (seqnames, start, end, width), annotations for locations (inputRegion, closeByRegion, symbol), test statistics (estimate, stdErr or coef, exp_coef, se_coef), pValue and false discovery rate (fdr).
TransformToGR, AllCloseByRegions, AllCoeditedRegions, CreateEditingTable, SummarizeAllRegions, TestAssociations
data(rnaedit_df)

# get GRanges for genes
genes_gr <- TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

# find close-by regions within the genes
closebyRegions_gr <- AllCloseByRegions(
  regions_gr = genes_gr,
  rnaEditMatrix = rnaedit_df
)

# identify co-edited regions within the close-by regions
coedited_gr <- AllCoeditedRegions(
  regions_gr = closebyRegions_gr,
  rnaEditMatrix = rnaedit_df,
  output = "GRanges",
  method = "spearman"
)

# summarize editing levels within each co-edited region by maximum
summarizedRegions_df <- SummarizeAllRegions(
  regions_gr = coedited_gr,
  rnaEditMatrix = rnaedit_df,
  selectMethod = MaxSites
)

exm_pheno <- readRDS(
  system.file(
    "extdata",
    "pheno_df.RDS",
    package = 'rnaEditr',
    mustWork = TRUE
  )
)

# test summarized editing levels against binary outcome
results_df <- TestAssociations(
  rnaEdit_df = summarizedRegions_df,
  pheno_df = exm_pheno,
  responses_char = "sample_type",
  covariates_char = NULL,
  respType = "binary"
)

AnnotateResults(
  results_df = results_df,
  closeByRegions_gr = closebyRegions_gr,
  inputRegions_gr = genes_gr,
  genome = "hg19"
)
Convert RNA editing matrix to a special data frame with class rnaEdit_df, which is then used to identify differentially co-edited regions with function TestAssociations.
CreateEditingTable(rnaEditMatrix)
rnaEditMatrix | A matrix of RNA editing level values on individual sites, with row names as site IDs in the form of "chrAA:XXXXXXXX", and column names as sample IDs. Please make sure to follow the format of example dataset ( |
A dataset of class rnaEdit_df, which includes variables seqnames, start, end, width and summarized RNA editing levels in each sample.
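The location variables in the returned table can be related to the "chrAA:XXXXXXXX" site IDs with a small base R sketch. This is not the package's code: the site IDs below are made up for illustration, and setting end equal to start (so that width is 1) for single sites is an assumption.

```r
# Illustrative site IDs in the "chr:position" form (not taken from rnaedit_df)
site_ids <- c("chr1:28661572", "chr3:46417312", "chr12:51324639")

# Split each ID into chromosome and position
parts <- strsplit(site_ids, ":", fixed = TRUE)

site_df <- data.frame(
  seqnames = vapply(parts, `[`, character(1), 1),
  start = as.integer(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

# A single site spans one base, so end equals start and width is 1 (assumption)
site_df$end <- site_df$start
site_df$width <- site_df$end - site_df$start + 1L
```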
TransformToGR, AllCloseByRegions, AllCoeditedRegions, SummarizeAllRegions, TestAssociations, AnnotateResults
data(rnaedit_df)

CreateEditingTable(rnaEditMatrix = rnaedit_df)[1:3, 1:5]
A subset of the TCGA breast cancer RNA editing dataset for 272 edited sites on genes PHACTR4, CCR5, METTL7A and a few randomly sampled sites for 221 subjects.
rnaedit_df
A data frame containing RNA editing levels for 272 sites (in the rows) for 221 subjects (in the columns). Row names are site IDs and column names are sample IDs.
Synapse database ID: syn2374375.
A wrapper function to summarize RNA editing levels from multiple sites in regions.
SummarizeAllRegions( regions_gr, rnaEditMatrix, selectMethod = MedianSites, progressBar = "time", ... )
regions_gr | A GRanges object of input genomic regions. |
rnaEditMatrix | A matrix (or data frame) of RNA editing level values for individual sites, with row names as site IDs in the form of "chrAA:XXXXXXXX", and column names as sample IDs. Please make sure to follow the format of example dataset ( |
selectMethod | Method for summarizing regions. Available options are |
progressBar | Name of the progress bar to use. There are currently five types of progress bars: |
... | Dots for additional internal arguments (currently unused). |
A data frame of the class rnaEdit_df, which includes variables seqnames, start, end, width and summarized RNA editing levels in each sample.
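As a rough illustration of what "summarized RNA editing levels in each sample" means, the base R sketch below collapses the sites of one region into a single value per sample. It assumes that the MedianSites and MaxSites methods correspond to a per-sample median and maximum across sites, which is an interpretation of their names rather than confirmed package behaviour; the matrix region_mat and its values are made up.

```r
# Made-up editing levels for three sites (rows) in one region, three samples (columns)
region_mat <- rbind(
  "chr1:100" = c(s1 = 0.10, s2 = 0.40, s3 = 0.20),
  "chr1:130" = c(s1 = 0.30, s2 = 0.50, s3 = 0.60),
  "chr1:170" = c(s1 = 0.20, s2 = 0.90, s3 = 0.10)
)

# One summary value per sample: median or maximum across the region's sites
median_summary <- apply(region_mat, 2, median)
max_summary <- apply(region_mat, 2, max)
```

Each summary is a vector with one entry per sample, which would form one row of the region-level table.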
TransformToGR, AllCloseByRegions, AllCoeditedRegions, CreateEditingTable, TestAssociations, AnnotateResults
data(rnaedit_df)

genes_gr <- TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

exm_regions <- AllCoeditedRegions(
  regions_gr = genes_gr,
  rnaEditMatrix = rnaedit_df,
  output = "GRanges",
  method = "spearman"
)

SummarizeAllRegions(
  regions_gr = exm_regions,
  rnaEditMatrix = rnaedit_df
)[1:3, 1:6]
A subset of the TCGA breast cancer RNA editing dataset for 20 randomly selected RNA editing sites and 50 randomly selected subjects from example dataset rnaedit_df. Please note that this is only a computational testing dataset for inner functions of this package. To test main functions, please use dataset rnaedit_df instead.
t_rnaedit_df
A data frame containing RNA editing levels for 50 subjects (in the rows) at 20 edited sites (in the columns). Row names are sample IDs and column names are site IDs.
Synapse database ID: syn2374375.
A wrapper function to test associations between phenotype and RNA editing levels in single-site analysis or summarized RNA editing levels in region-based analysis.
TestAssociations( rnaEdit_df, pheno_df, responses_char, covariates_char = NULL, respType = c("binary", "continuous", "survival"), progressBar = "time", orderByPval = TRUE )
rnaEdit_df | A data frame with class |
pheno_df | A data frame with phenotype and covariates, which should include all the samples in |
responses_char | A character vector of names of response variables in |
covariates_char | A character vector of names of covariate variables in |
respType | Type of outcome. Defaults to |
progressBar | Name of the progress bar to use. There are currently five types of progress bars: |
orderByPval | Sort co-edited regions by model p-value or not? Defaults to TRUE. |
A data frame with locations of the genomic regions or sites (seqnames, start, end, width), test statistics (estimate, stdErr or coef, exp_coef, se_coef), pValue and false discovery rate (fdr).
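The shape of this output for a binary outcome can be imitated with base R. The sketch below is only one plausible reading and may not match the package's internal model: it assumes a logistic regression of the phenotype on the editing level for each region, and an fdr column obtained with p.adjust(method = "fdr"). The helper fit_one and the simulated objects pheno and levels_mat are hypothetical.

```r
set.seed(42)
n <- 40

# Simulated binary phenotype and region-level editing values (rows are regions)
pheno <- data.frame(sample_type = rep(c(0, 1), each = n / 2))
levels_mat <- rbind(
  region1 = plogis(rnorm(n, mean = pheno$sample_type)),
  region2 = plogis(rnorm(n)),
  region3 = plogis(rnorm(n))
)

# Hypothetical helper: fit one logistic model and pull out the slope statistics
fit_one <- function(x, y) {
  fit <- glm(y ~ x, family = binomial())
  est <- summary(fit)$coefficients["x", ]
  data.frame(
    estimate = est[["Estimate"]],
    stdErr = est[["Std. Error"]],
    pValue = est[["Pr(>|z|)"]]
  )
}

results <- do.call(rbind, lapply(seq_len(nrow(levels_mat)), function(i) {
  fit_one(levels_mat[i, ], pheno$sample_type)
}))
rownames(results) <- rownames(levels_mat)

# Multiple-testing adjustment across regions, then sort as orderByPval = TRUE would
results$fdr <- p.adjust(results$pValue, method = "fdr")
results <- results[order(results$pValue), ]
```

The coef, exp_coef and se_coef columns listed above suggest a Cox-type model for survival outcomes, which this sketch does not cover.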
TransformToGR, AllCloseByRegions, AllCoeditedRegions, CreateEditingTable, SummarizeAllRegions, AnnotateResults
data(rnaedit_df)

genes_gr <- TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

exm_regions <- AllCoeditedRegions(
  regions_gr = genes_gr,
  rnaEditMatrix = rnaedit_df,
  output = "GRanges",
  method = "spearman"
)

sum_regions <- SummarizeAllRegions(
  regions_gr = exm_regions,
  rnaEditMatrix = rnaedit_df,
  selectMethod = MaxSites
)

exm_pheno <- readRDS(
  system.file(
    "extdata",
    "pheno_df.RDS",
    package = 'rnaEditr',
    mustWork = TRUE
  )
)

TestAssociations(
  rnaEdit_df = sum_regions,
  pheno_df = exm_pheno,
  responses_char = "sample_type",
  covariates_char = NULL,
  respType = "binary"
)
Transform a character vector of gene symbols or region ranges into a GRanges object.
TransformToGR( genes_char, type = c("symbol", "region"), genome = c("hg38", "hg19") )
genes_char | A character vector of gene symbols or region ranges. If you select |
type | What is the type of |
genome | Use |
TransformToGR() uses the hg19/hg38 genes to associate gene symbols with their genomic region ranges. The pre-processed dataset is saved in inst/extdata in this package.
Users who wish to add gene symbols to the GRanges created using function TransformToGR() can use function AddMetaData(). Please see AddMetaData for details.
A GRanges object with seqnames, ranges and strand.
AllCloseByRegions, AllCoeditedRegions, CreateEditingTable, SummarizeAllRegions, TestAssociations, AnnotateResults
TransformToGR(
  genes_char = c("PHACTR4", "CCR5", "METTL7A"),
  type = "symbol",
  genome = "hg19"
)

TransformToGR(
  genes_char = c("chr22:18555686-18573797", "chr22:36883233-36908148"),
  type = "region",
  genome = "hg19"
)