Title: | Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity |
---|---|
Description: | RgnTX allows the integration of transcriptome annotations so as to model the complex alternative splicing patterns. It supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. It involves functions that do permutaion test for evaluating association between features and transcriptome regions. |
Authors: | Yue Wang [aut, cre], Jia Meng [aut] |
Maintainer: | Yue Wang <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.9.0 |
Built: | 2024-11-21 06:09:52 UTC |
Source: | https://github.com/bioc/RgnTX |
The first step of calculating positional shift over transcriptome regions.
calculateShift(regions, disp, direction = "right", strand = "+")
calculateShift(regions, disp, direction = "right", strand = "+")
regions |
A feature set, which should be a GRangesList object. |
disp |
A data frame object. It should have three columns, which are |
direction |
Either to be character "left" or "right", which means the direction to which the starting position is shifting. The former means moving to the direction of 5' while the latter means moving to 3'. |
strand |
Either to be "+" or "-". |
A GRanges
object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # Download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift( regions = cds.p, disp = disp.p.l, direction = "left", strand = "+" )
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # Download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift( regions = cds.p, disp = disp.p.l, direction = "left", strand = "+" )
Evaluation function. This function calculates the mean of the distance from each region of set RS1 to the closest region in RS2.
distanceTx(A, B, beta = 0.2, ...)
distanceTx(A, B, beta = 0.2, ...)
A |
Region set 1. A Granges or GRangesList object. |
B |
Region set 2. A Granges or GRangesList object. |
beta |
It is a user-defined argument that can filter out the corresponding percent of largest distance values. Default value is 0.2. |
... |
Any additional parameters needed. |
A numeric
object.
overlapWidthTx
, overlapCountsTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) B <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) distanceTx(A, B, beta = 0.2)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) B <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) distanceTx(A, B, beta = 0.2)
This function receives three arguments: the scope region set, the target region set and the type of strand. It returns a subset of target region set, which is the intersection of the target region set and the scope region set.
extractRegions(regions_A, R, strand = "+")
extractRegions(regions_A, R, strand = "+")
regions_A |
The scope region set. A |
R |
The target region set. A |
strand |
The strand type of the transcripts. It has options "+" and "-". |
A GRangesList
object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift(regions = cds.p, disp = disp.p.l, direction = "left", strand = "+") R.cds.last200 <- extractRegions(regions_A = cds.p, R = R.p.l, strand = "+")
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift(regions = cds.p, disp = disp.p.l, direction = "left", strand = "+") R.cds.last200 <- extractRegions(regions_A = cds.p, R = R.p.l, strand = "+")
This function makes sure the two input region sets are in the correct format required by RgnTX evaluation functions.
getFormatCorrect(A, B)
getFormatCorrect(A, B)
A |
Region set 1. A Granges or GRangesList object. |
B |
Region set 2. A Granges or GRangesList object. |
A list
object.
This function returns a default permutation space for features with isoform ambiguity. The default permutation space of a feature is the aggregate of the multiple transcripts it may overlap with. It requires the input feature to be GRanges
format.
getPermSpaceByFeatures(features, txdb, type = "mature")
getPermSpaceByFeatures(features, txdb, type = "mature")
features |
A |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of regions over transcriptome. |
A list object, which contains two elements.
perm.space:
A GRangesList
object that includes all the transcripts input features may overlap with.
index:
It contains a series of numbers indicating which feature these transcripts are respectively associated with.
getPermSpaceByTxID
, getPermSpaceByType
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) permSpace <- getPermSpaceByFeatures(features = m6A_sites_data[1:100], txdb)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) permSpace <- getPermSpaceByFeatures(features = m6A_sites_data[1:100], txdb)
This function returns 5'UTR/CDS/3'UTR/mRNA/full part of transcriptome regions grouped by corresponding transcript ids.
getPermSpaceByTxID(trans_ids = "all", txdb, type = "mature")
getPermSpaceByTxID(trans_ids = "all", txdb, type = "mature")
trans_ids |
A character object. The transcript ids. Default is "all". If it takes the default value "all", the space that users get will be the whole transcriptome. |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
A GRangesList
object.
getPermSpaceByType
, getPermSpaceByFeatures
trans.ids <- c("170", "782", "974", "1364", "1387") library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permspace <- getPermSpaceByTxID(trans.ids, txdb)
trans.ids <- c("170", "782", "974", "1364", "1387") library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permspace <- getPermSpaceByTxID(trans.ids, txdb)
This function can return 5'UTR/CDS/3'UTR/mRNA/full part of transcriptome regions, following the format required by the main permutation test functions.
getPermSpaceByType(txdb, type = "mature")
getPermSpaceByType(txdb, type = "mature")
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
A GRangesList
object.
getPermSpaceByTxID
, getPermSpaceByFeatures
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permSpace <- getPermSpaceByType(txdb, type = "CDS")
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permSpace <- getPermSpaceByType(txdb, type = "CDS")
Calculate a p-value and z-score based on observed value and random evaluation values.
getPvalZscore(orig.ev, rand.ev, pval_z = FALSE)
getPvalZscore(orig.ev, rand.ev, pval_z = FALSE)
orig.ev |
Observed value. |
rand.ev |
Random evaluation values. |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
A p-value and a z-score.
Get stop codon regions for input transcripts. This is an example of customPick function.
getStopCodon(trans_ids, txdb, ...)
getStopCodon(trans_ids, txdb, ...)
trans_ids |
A character object containing transcript ids. |
txdb |
A TxDb object. |
... |
Any additional parameters needed. |
A numeric
object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS2 <- getStopCodon(trans.ids, txdb)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS2 <- getStopCodon(trans.ids, txdb)
Generate a data frame object that contains information about input genomic feature set and its mapping results over the transcriptome.
getTransInfo(A, txdb)
getTransInfo(A, txdb)
A |
Genomic feature set, which should be a |
txdb |
A TxDb object. |
A data.frame
object containing the following components:
index_trans:
The label of transcripts.
index_features:
The label of genomic features.
seqnames:
The chr name.
features_pos:
The starting coordinate of each genomic feature.
width_features:
The width of each genomic feature.
strand:
The strand type of each genomic feature.
trans_ID:
The ids of the transcripts that each feature can be mapped to.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene getTransInfo(A = m6A_sites_data[1:100], txdb)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene getTransInfo(A = m6A_sites_data[1:100], txdb)
Convert a GRanges
object to a GRangesList
object. The output region set follows the format required by the main permutation test functions.
GRanges2GRangesList(A = NULL)
GRanges2GRangesList(A = NULL)
A |
A |
If input GRanges
object has a metadata named as "group",
ranges having the same group number represent a region. If not, a range is a region.
A region in the input set will be outputted as a list element IN returned
GRangesList
object.
A GRangesList
object.
library(GenomicRanges) GRanges.object <- GRanges( Rle(c("chr2", "chr2", "chr1", "chr3")), IRanges(1:4, width = 5) ) # Assign the first and the second ranges to the same element. GRanges.object$group <- c(1, 1, 2, 3) GRangesList.object <- GRanges2GRangesList(GRanges.object)
library(GenomicRanges) GRanges.object <- GRanges( Rle(c("chr2", "chr2", "chr1", "chr3")), IRanges(1:4, width = 5) ) # Assign the first and the second ranges to the same element. GRanges.object$group <- c(1, 1, 2, 3) GRangesList.object <- GRanges2GRangesList(GRanges.object)
Convert a GRangesList
object to a GRanges
object. The output region set follows the format required by the RgnTX permutation test functions, which should have metadata columns 'group' and 'transcriptsHits'.
GRangesList2GRanges(A = NULL)
GRangesList2GRanges(A = NULL)
A |
A |
A GRanges
object. Its transcript ids (if available) should be contained in a metadata column named “transcriptsHits”, which are provided by the names of input GRangesList
object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS1 <- GRangesList2GRanges(RS1)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS1 <- GRangesList2GRanges(RS1)
This function receives two region sets and returns the number of their overlaps.
overlapCountsTx(A, B, count_once = TRUE, over_trans = TRUE, ...)
overlapCountsTx(A, B, count_once = TRUE, over_trans = TRUE, ...)
A |
Region set 1. A |
B |
Region set 2. A |
count_once |
Whether the overlap of multiple B regions with a single A region should be counted once or multiple times. |
over_trans |
Whether the overlapping is counted over the transcriptome or over the genome. |
... |
Any additional parameters needed. |
A numeric
object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") exons.tx0 <- exonsBy(txdb) regions.A <- exons.tx0[trans.ids] A <- randomizeTransByOrder(regions.A, random_length = 200) B <- randomizeTransByOrder(regions.A, random_length = 200) overlapCountsTx(A, B)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") exons.tx0 <- exonsBy(txdb) regions.A <- exons.tx0[trans.ids] A <- randomizeTransByOrder(regions.A, random_length = 200) B <- randomizeTransByOrder(regions.A, random_length = 200) overlapCountsTx(A, B)
Evaluation function. This function receives a feature set (with isoform ambiguity) and a transcriptome region set (without isoform ambiguity), and returns a weighted number of overlaps between them.
overlapCountsTxIA(A, B, ...)
overlapCountsTxIA(A, B, ...)
A |
A feature set, which should be |
B |
A region set, which should be |
... |
Any additional parameters needed. |
A numeric
object.
overlapWidthTx
, distanceTx
, overlapCountsTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] trans.info <- getTransInfo(RS1, txdb) trans.ids <- trans.info[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb = txdb) # Evaluation step. orig.ev <- overlapCountsTxIA(RS1, RS2)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] trans.info <- getTransInfo(RS1, txdb) trans.ids <- trans.info[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb = txdb) # Evaluation step. orig.ev <- overlapCountsTxIA(RS1, RS2)
Evaluation function. This function returns the sum of widths of each overlap between two region sets, i.e., the total number of overlapping nucleotides between two input region sets.
overlapWidthTx(A, B, ...)
overlapWidthTx(A, B, ...)
A |
Region set 1. A |
B |
Region set 2. A |
... |
Any additional parameters needed. |
A numeric
object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random.length = 100 ) B <- randomizeTx( txdb, trans.ids = trans.ids, random_num = 20, random_length = 100 ) overlapWidthTx(A, B)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random.length = 100 ) B <- randomizeTx( txdb, trans.ids = trans.ids, random_num = 20, random_length = 100 ) overlapWidthTx(A, B)
Perform permutation test for evaluating spatial association between a feature set and a region set.
permTestTx(RS1 = NULL, RS2 = NULL, txdb = NULL, type = "mature", ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
permTestTx(RS1 = NULL, RS2 = NULL, txdb = NULL, type = "mature", ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The region set to be randomized. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA
only needs users to input two region sets. It will automatically randomize the first region set into transcriptome.
A list object, which is defined to be permTestTx.results
class. It contains the following items:
RSL:
Randomized region sets of RS1
.
RS1:
The feature set to be randomized.
RS2:
The region set to be compared with the feature set.
orig.ev:
The value of overlapping counts between RS1
and RS2
.
rand.ev:
The values of overlapping counts between each element in RSL
and RS2
.
pval:
p-value of the test.
zscore:
Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) A <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) B <- c(randomizeTx(txdb, trans.ids, random_num = 75, random_length = 100), A[1:25]) permTestTx_results <- permTestTx(A, B, txdb, ntimes = 5)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) A <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) B <- c(randomizeTx(txdb, trans.ids, random_num = 75, random_length = 100), A[1:25]) permTestTx_results <- permTestTx(A, B, txdb, ntimes = 5)
Perform permutation test for evaluating spatial association between region sets. This permutation test function receives two region sets and a set of randomized region sets of one of them. It evaluates if there is an association between these two region sets.
permTestTx_customAll(RSL = NULL, RS1 = NULL, RS2 = NULL, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
permTestTx_customAll(RSL = NULL, RS1 = NULL, RS2 = NULL, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RSL |
Randomized region sets of |
RS1 |
The region set. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
ev_function_1 |
Evaluation function defines what statistic to be tested between RS1 and RS2. Default is |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in RSL and RS2. Default is |
pval_z |
Boolean. Default is FALSE. If FALSE, p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTx_customAll
will use evaluation function ev_function_1
to calculate the test statistic between RS1
and RS2
, and use ev_function_2
to evaluate the statistic between RSL
and RS2
. It will also return a p-value and a z-score.
A list object, which is defined to be permTestTx.results
class. It contains the following items:
RSL:
Randomized region sets of RS1
.
RS1:
The feature set to be randomized.
RS2:
The region set to be compared with the feature set.
orig.ev:
The value of overlapping counts between RS1
and RS2
.
rand.ev:
The values of overlapping counts between each element in RSL
and RS2
.
pval:
p-value of the test.
zscore:
Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids1<- c("170") RS1 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) RS2 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) trans.ids2 <- c("170", "782", "974", "1364", "1387") RSL <- randomizeTx(txdb = txdb, trans_ids = trans.ids2, random_num = 20, random_length = 100, N = 10) permTestTx_results <- permTestTx_customAll(RSL = RSL, RS1 = RS1, RS2 = RS2)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids1<- c("170") RS1 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) RS2 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) trans.ids2 <- c("170", "782", "974", "1364", "1387") RSL <- randomizeTx(txdb = txdb, trans_ids = trans.ids2, random_num = 20, random_length = 100, N = 10) permTestTx_results <- permTestTx_customAll(RSL = RSL, RS1 = RS1, RS2 = RS2)
Perform permutation test for evaluating spatial association between a feature set and the customPick regions. The latter is defined by the customPick_function
argument provided by users.
permTestTx_customPick(RS1 = NULL, txdb = NULL, type = "mature", customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
permTestTx_customPick(RS1 = NULL, txdb = NULL, type = "mature", customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The feature set to be randomized. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
customPick_function |
A custom function needs to be inputted by users. The custom function should have two arguments: a TxDb object and a character object of transcript ids. It returns a part of region of each transcript. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
Each feature in RS1
is only mapped with the customPick regions over its transcript (picked by the customPick_function
). The output orig.ev
is the number of features that have overlap with its customPick region.
The set of randomized region sets is outputted as RSL
. The overlapping counts between each set in RSL
with RS2
is outputted as rand.ev
.
A list object, which is defined to be permTestTx.results
class. It contains the following items:
RSL:
Randomized region sets of RS1
.
RS1:
The feature set to be randomized.
RS2:
The region set to be compared with the feature set.
orig.ev:
The value of overlapping counts between RS1
and RS2
.
rand.ev:
The values of overlapping counts between each element in RSL
and RS2
.
pval:
p-value of the test.
zscore:
Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 100) RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 200, type = 'CDS') getCDS = function(txdb, trans.id){ cds.tx0 <- cdsBy(txdb, use.names=FALSE) cds.names <- as.character(intersect(names(cds.tx0), trans.id)) cds = cds.tx0[cds.names] return(cds) } permTestTx_results <- permTestTx_customPick(RS1,txdb, customPick_function = getCDS, ntimes = 5)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 100) RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 200, type = 'CDS') getCDS = function(txdb, trans.id){ cds.tx0 <- cdsBy(txdb, use.names=FALSE) cds.names <- as.character(intersect(names(cds.tx0), trans.id)) cds = cds.tx0[cds.names] return(cds) } permTestTx_results <- permTestTx_customPick(RS1,txdb, customPick_function = getCDS, ntimes = 5)
Perform permutation test for evaluating spatial association between some features (with isoform ambiguity) and a region set. It randomizes the features and compares it with the region set to see if there is an association between the features and the region set. The difference between this function and permTestTx
is that it is for RNA-related genomic features that have isoform ambiguity, i.e., features that one does not know which transcript they comes from.
permTestTxIA(RS1 = NULL, RS2 = NULL, txdb = NULL, type = 'mature', ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
permTestTxIA(RS1 = NULL, RS2 = NULL, txdb = NULL, type = 'mature', ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The feature set to be randomized. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA
only needs users to input two region sets. It will automatically randomize the first region set into transcriptome.
A list object, which is defined to be permTestTx.results
class. It contains the following items:
RSL:
Randomized region sets of RS1
.
RS1:
The feature set to be randomized.
RS2:
The region set to be compared with the feature set.
orig.ev:
The value of overlapping counts between RS1
and RS2
.
rand.ev:
The values of overlapping counts between each element in RSL
and RS2
.
pval:
p-value of the test.
zscore:
Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] trans.ids <- getTransInfo(RS1, txdb)[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb) permTestTx_results <- permTestTxIA(RS1 = RS1, RS2 = RS2, txdb = txdb, ntimes = 5)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] trans.ids <- getTransInfo(RS1, txdb)[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb) permTestTx_results <- permTestTxIA(RS1 = RS1, RS2 = RS2, txdb = txdb, ntimes = 5)
Perform permutation test for evaluating spatial association between RNA features and a specified kind of regions. The latter is defined by the customPick_function
argument input by users. The difference between this function and permTestTx_customPick
is that it is for RNA-related genomic features that have isoform ambiguity, i.e., features that one does not know which transcript they comes from.
permTestTxIA_customPick(RS1 = NULL, txdb = NULL, type = 'mature', customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTxIA, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
permTestTxIA_customPick(RS1 = NULL, txdb = NULL, type = 'mature', customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTxIA, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The region set to be randomized. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
customPick_function |
A custom function needs to be inputted by users. The customPick function should have two arguments: a TxDb object and a character object of transcript ids. It returns a specified region over each transcript. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA_customPick
will assess the test statistic between RS1
and each region in RSL
, and the relation between RS1
and RS2
.
Each RNA feature is only mapped with a part of region on its transcript (picked by the customPick_function
). The output orig.ev
is the weighted counts between RS1
and RS2
. Each feature in RS1
related to n1
isoforms in RS2
and overlapped with n2
RS2
regions will contribute a value of n2/n1
to the total number of overlaps.
This test function also randomizes input features per transcript. The set of randomized results is outputted as RSL
. The overlapping counts between each set in RSL
with RS2
is outputted as rand.ev
.
A list object, which is defined to be permTestTx.results
class. It contains the following information:
RSL:
Randomized region sets of RS1
.
RS1:
The feature set to be randomized.
RS2:
The region set to be compared with the feature set.
orig.ev:
The value of overlapping counts between RS1
and RS2
.
rand.ev:
The values of overlapping counts between each element in RSL
and RS2
.
pval:
p-value of the test.
zscore:
Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, type = 'mature', customPick_function = getStopCodon, ntimes = 5)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, type = 'mature', customPick_function = getStopCodon, ntimes = 5)
Show a graphical representation of permutation test.
plotPermResults(permTestTx_results = NULL, breaks = 15, alpha = 0.05, test_type = "one-sided", binwidth = NULL)
plotPermResults(permTestTx_results = NULL, breaks = 15, alpha = 0.05, test_type = "one-sided", binwidth = NULL)
permTestTx_results |
A |
breaks |
Histogram breaks. Default is 15. |
alpha |
Significance level. |
test_type |
The type of the test. This argument only receives either two options "one-sided" or "two-sided". Default is "one-sided". |
binwidth |
Histogram binwidth. |
A plot object.
file <- system.file(package="RgnTX", "extdata", "permTestTx_results.rds") permTestTx_results <- readRDS(file) p_a <- plotPermResults(permTestTx_results, binwidth = 1) p_a
file <- system.file(package="RgnTX", "extdata", "permTestTx_results.rds") permTestTx_results <- readRDS(file) p_a <- plotPermResults(permTestTx_results, binwidth = 1) p_a
Plot shifted z scores for permutation test results.
plotShiftedZScoreTx(shitedZScoresTx_results)
plotShiftedZScoreTx(shitedZScoresTx_results)
shitedZScoresTx_results |
A |
A plot.
file <- system.file(package="RgnTX", "extdata", "shiftedZScoreTx_results1.rds") shiftedZScoreTx_results <- readRDS(file) p1 <- plotShiftedZScoreTx(shiftedZScoreTx_results) p1
file <- system.file(package="RgnTX", "extdata", "shiftedZScoreTx_results1.rds") shiftedZScoreTx_results <- readRDS(file) p1 <- plotShiftedZScoreTx(shiftedZScoreTx_results) p1
Randomize features into transcriptome.
randomizeFeaturesTx(RS, txdb, type = "mature", N = 1, ...)
randomizeFeaturesTx(RS, txdb, type = "mature", N = 1, ...)
RS |
The feature to be randomized. It should be a |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
The number of iterations. |
... |
Any additional parameters needed. |
A GRangesList
object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder
, randomizeFeaturesTxIA
, randomizeTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS <- randomizeFeaturesTx(RS1, txdb, N = 1)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS <- randomizeFeaturesTx(RS1, txdb, N = 1)
Randomize features into transcriptome, especially for the features that have isoform ambiguity.
randomizeFeaturesTxIA(RS, txdb, type = "mature", N = 1, ...)
randomizeFeaturesTxIA(RS, txdb, type = "mature", N = 1, ...)
RS |
The feature being randomized. It should be a |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
Randomization times. |
... |
Any additional parameters needed. |
A GRangesList
object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder
, randomizeFeaturesTx
, randomizeTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] RS <- randomizeFeaturesTxIA(RS1, txdb, N = 1)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] RS <- randomizeFeaturesTxIA(RS1, txdb, N = 1)
This function receives a GRangesList
object and picks a random region within each list element of this object. The length of the region to be picked is decided by the input random_length
argument.
randomizeTransByOrder(regions_A, random_length = 20)
randomizeTransByOrder(regions_A, random_length = 20)
regions_A |
A |
random_length |
A |
A GRangesList
object. The name of each list element should be the corresponding transcript id.
randomizeTx
, randomizeFeaturesTx
, randomizeFeaturesTxIA
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) regions.A <- exons.tx0[trans.ids] RS <- randomizeTransByOrder(regions.A, random_length = 20)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) regions.A <- exons.tx0[trans.ids] RS <- randomizeTransByOrder(regions.A, random_length = 20)
Pick random regions over specified transcripts.
randomizeTx(txdb, trans_ids = 'all', random_num = 100, random_length = 20, type = 'mature', N = 1, ...)
randomizeTx(txdb, trans_ids = 'all', random_num = 100, random_length = 20, type = 'mature', N = 1, ...)
txdb |
A TxDb object. |
trans_ids |
The ids of transcripts, which should be a character object. Random regions will be picked from these transcripts. If this argument takes the default value 'all', the scope of picking random regions will be the whole transcriptome. |
random_num |
The number of regions to be picked. |
random_length |
The length of regions to be picked. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
Randomization times. |
... |
Any additional parameters needed. |
A GRangesList
object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder
, randomizeFeaturesTx
, randomizeFeaturesTxIA
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100)
Calculate shifted z scores for permutation test results.
shiftedZScoreTx(permTestTx_results = NULL, txdb = NULL, window = 200, step = 20, ev_function_1 = overlapCountsTx, ...)
shiftedZScoreTx(permTestTx_results = NULL, txdb = NULL, window = 200, step = 20, ev_function_1 = overlapCountsTx, ...)
permTestTx_results |
A |
txdb |
A TxDb object. |
window |
The window of the whole shifting. |
step |
The step of each shifting. |
ev_function_1 |
Evaluation function. Default is |
... |
Any additional parameters needed. |
see examples in plotShiftedZScoreTx
A list object, which is defined to be shitedZScore.results
class. It contains the following items:
shifted.z.scores:
Standard z-scores after shifting.
window:
Window of the whole shifting.
step:
Step of each shifting.
original.z.score:
Original standard score.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, customPick_function = getStopCodon, ntimes = 5) shiftedZScoreTx_results <- shiftedZScoreTx(permTestTx_results, txdb = txdb, window = 2000, step = 200, ev_function_1 = overlapCountsTxIA)
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, customPick_function = getStopCodon, ntimes = 5) shiftedZScoreTx_results <- shiftedZScoreTx(permTestTx_results, txdb = txdb, window = 2000, step = 200, ev_function_1 = overlapCountsTxIA)
Calculate positional shifting over transcript regions. This function accepts a feature set and outputs a region set from it. Each output region is from each input feature.
shiftTx(regions, start, width, direction, strand)
shiftTx(regions, start, width, direction, strand)
regions |
A feature set following the format indicated in vignette section 3. Either to be |
start |
Starting positions. Each value represents a starting position in each input feature. |
width |
Widths. Each value represents a width of each region to be picked from each feature. |
direction |
Either to be character "left" or "right", which means the direction to which the starting position is shifting. The former means moving to the direction of 5' while the latter means moving to 3'. |
strand |
The strand type of the transcripts. It receives "+" or "-". |
A Granges
object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 start <- as.numeric(max(end(cds.p))) R.cds.last200 <- shiftTx(cds.p, start = start, width = width, direction = 'left', strand = "+")
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 start <- as.numeric(max(end(cds.p))) R.cds.last200 <- shiftTx(cds.p, start = start, width = width, direction = 'left', strand = "+")
Generate GRangesList
object from vectors. The output region set follows the format required by the main permutation test functions.
vector2GRangesList(RefSeqID, targetName, strand, blockSizes, targetStart)
vector2GRangesList(RefSeqID, targetName, strand, blockSizes, targetStart)
RefSeqID |
The name of each element. |
targetName |
The seqnames of each range. |
strand |
The strand of each range. |
blockSizes |
The width of each range. |
targetStart |
The stard coordinate of each range. |
A GRangesList
object.