Title: | Identifying differentially reactive regions from RNA structurome profiling data |
---|---|
Description: | dStruct identifies differentially reactive regions from RNA structurome profiling data. dStruct is compatible with a broad range of structurome profiling technologies, e.g., SHAPE-MaP, DMS-MaPseq, Structure-Seq, SHAPE-Seq, etc. See Choudhary et al., Genome Biology, 2019 for the underlying method. |
Authors: | Krishna Choudhary [aut, cre] , Sharon Aviran [aut] |
Maintainer: | Krishna Choudhary <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.13.0 |
Built: | 2024-11-29 05:40:13 UTC |
Source: | https://github.com/bioc/dStruct |
The d score of a nucleotide is a measure of the dissimilarity of its
normalized reactivity scores. Consider a transcript and its reactivity profiles
from a group of samples. Then, the d score of a nucleotide is 2/π
times the arc-tangent of the ratio of the sample standard deviation of its
reactivities to their mean.
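The definition above can be sketched in base R. This is an illustrative re-implementation, not the package's own code: the helper name `d_score` is hypothetical, the 2/π scaling and the absolute value of the mean are assumptions taken from the cited method, and a matrix is assumed to hold nucleotides in rows and samples in columns.

```r
# Hypothetical helper illustrating the d score; not part of dStruct.
d_score <- function(x) {
  # d = (2 / pi) * atan(sd / |mean|) for one nucleotide's reactivities.
  one_nucleotide <- function(v) (2 / pi) * atan(sd(v) / abs(mean(v)))
  # Assumption: for a matrix, each row is a nucleotide and each column a sample.
  if (is.matrix(x)) apply(x, 1, one_nucleotide) else one_nucleotide(x)
}

# Two nucleotides with the same mean but different spread across three samples.
reacs <- rbind(c(0.9, 1.0, 1.1),
               c(0.5, 1.0, 1.5))
d <- d_score(reacs)
```

Because the arc-tangent maps any non-negative ratio into [0, π/2), the score stays between 0 and 1, and the nucleotide with the larger spread receives the larger score.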
calcDis(x)
x | A numeric vector or matrix. |
If input is a numeric vector, a number is returned. For a matrix, a numeric vector is returned.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
Choudhary K, Shih NP, Deng F, Ledda M, Li B, Aviran S. Metrics for rapid quality control in RNA structure probing experiments. Bioinformatics. 2016; 32(23):3575–3583.
# A lower standard deviation of reactivities results in a lower d score.
calcDis(rnorm(10, 1, 0.2))
calcDis(rnorm(10, 1, 0.6))
Given the reactivity profiles for a transcript from multiple samples, and a list of sample identifiers, this function computes the dissimilarity of reactivity scores between the specified samples. These are returned as a sequence of nucleotide-wise d scores.
dCombs(rdf, combs)
rdf | Data frame of reactivities for each sample. |
combs | Data frame with each column containing groupings of samples. |
Nucleotide-wise d scores.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Example of a data frame with reactivities.
reacs <- data.frame(matrix(runif(30, 0, 10), 10, 3))

# The column names of the data frame must indicate sample grouping and ID.
colnames(reacs) <- c("A1", "A2", "B1")

# Get nucleotide-wise dissimilarity scores for a set of samples.
dCombs(rdf = reacs, combs = data.frame(c("A1", "B1")))
This function takes reactivity profiles for samples of two groups as input and identifies differentially reactive regions in three steps (see Choudhary et al., Genome Biology, 2019 for details). First, it regroups the samples into homogeneous and heterogeneous sub-groups, which are used to compute the within-group and between-group nucleotide-wise d scores. Second, smoothed between- and within-group d score profiles are compared to construct candidate differential regions. Finally, unsmoothed between- and within-group d scores are compared using the Wilcoxon signed-rank test. The resulting p-values quantify the significance of the difference in reactivity patterns between the two input groups.
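The final testing step can be illustrated with base R alone. The sketch below is not the package's implementation: `d_within` and `d_between` are hypothetical, simulated vectors of nucleotide-wise d scores for one candidate region, and the one-sided alternative is an assumption made for this illustration. The test itself is `stats::wilcox.test` with `paired = TRUE`.

```r
set.seed(1)

# Simulated d scores for a hypothetical 30-nt candidate region.
n_nt <- 30
d_within <- runif(n_nt, 0.1, 0.3)
# Between-group variation is simulated to exceed within-group variation.
d_between <- d_within + runif(n_nt, 0.05, 0.3)

# Paired, one-sided Wilcoxon signed-rank test:
# are between-group d scores larger than within-group d scores?
test <- wilcox.test(d_between, d_within, paired = TRUE, alternative = "greater")
p_value <- test$p.value

# Effect size reported alongside the p-value: median nucleotide-wise difference.
median_diff <- median(d_between - d_within)
```

Pairing the scores by nucleotide means each position acts as its own control, so the test asks whether variation increases at the same positions when samples from both groups are mixed.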
dStruct(
  rdf, reps_A, reps_B,
  batches = FALSE,
  min_length = 11,
  check_signal_strength = TRUE,
  check_nucs = TRUE,
  check_quality = TRUE,
  quality = "auto",
  evidence = 0,
  signal_strength = 0.1,
  within_combs = NULL,
  between_combs = NULL,
  ind_regions = TRUE,
  gap = 1,
  get_FDR = TRUE,
  proximity_assisted = FALSE,
  proximity = 10,
  proximity_defined_length = 30
)
rdf | Data frame of reactivities for each sample. |
reps_A | Number of replicates of group A. |
reps_B | Number of replicates of group B. |
batches | Logical indicating whether replicates of groups A and B were performed in batches and are labelled accordingly. If TRUE, a heterogeneous/homogeneous subset may not have multiple samples from the same batch. |
min_length | Minimum length of constructed regions. |
check_signal_strength | Logical. If TRUE, construction of regions must be based on nucleotides that have a minimum absolute value of reactivity. |
check_nucs | Logical. If TRUE, constructed regions must have a minimum number of nucleotides participating in the Wilcoxon signed-rank test. |
check_quality | Logical. If TRUE, check constructed regions for quality. |
quality | Worst allowed quality for a region to be tested. |
evidence | Minimum evidence of increase in variation from within-group comparisons to between-group comparisons for a region to be tested. |
signal_strength | Threshold for minimum signal strength. |
within_combs | Data frame with each column containing groupings of replicates of groups A or B, which will be used to assess within-group variation. |
between_combs | Data frame with each column containing groupings of replicates of groups A and B, which will be used to assess between-group variation. |
ind_regions | Logical. If TRUE, test each region found in the transcript separately. |
gap | Integer. Join regions if they are separated by this many nucleotides. |
get_FDR | Logical. If FALSE, FDR is not reported. |
proximity_assisted | Logical. If TRUE, proximally located regions are tested together. |
proximity | Maximum distance between constructed regions for them to be considered proximal. |
proximity_defined_length | If performing a "proximity-assisted" test, minimum end-to-end length of a region to be tested. |
Constructs regions and reports, for each region, the p-value, the median difference between the between-group and within-group d scores, and the FDR.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Load data from Lai et al., 2019.
data(lai2019)

# Run dStruct in de novo discovery mode for a transcript with ID YAL042W.
dStruct(rdf = lai2019[["YAL042W"]], reps_A = 3, reps_B = 2,
        batches = TRUE, min_length = 21,
        between_combs = data.frame(c("A3", "B1", "B2")),
        within_combs = data.frame(c("A1", "A2", "A3")),
        ind_regions = TRUE)
This function takes as input reactivity profiles for a transcript region from samples of two groups. First, it regroups the samples into homogeneous and heterogeneous sub-groups, which are used to compute the within-group and between-group nucleotide-wise d scores. If the region meets the quality criteria, the between- and within-group d scores are compared using the Wilcoxon signed-rank test. The resulting p-value quantifies the significance of the difference in reactivity patterns between the two input groups.
dStructGuided(
  rdf, reps_A, reps_B,
  batches = FALSE,
  within_combs = NULL,
  between_combs = NULL,
  check_quality = TRUE,
  quality = "auto",
  evidence = 0
)
rdf | Data frame of reactivities for each sample. Each column must be labelled as A1, A2, ..., B1, B2, ... |
reps_A | Number of replicates of group A. |
reps_B | Number of replicates of group B. |
batches | Logical indicating whether replicates of groups A and B were performed in batches and are labelled accordingly. If TRUE, a heterogeneous/homogeneous subset may not have multiple samples from the same batch. |
within_combs | Data frame with each column containing groupings of replicates of groups A or B, which will be used to assess within-group variation. |
between_combs | Data frame with each column containing groupings of replicates of groups A and B, which will be used to assess between-group variation. |
check_quality | Logical. If TRUE, check regions for quality. |
quality | Worst allowed quality for a region to be tested. |
evidence | Minimum evidence of increase in variation from within-group comparisons to between-group comparisons for a region to be tested. |
The p-value for the tested region (estimated using a one-sided Wilcoxon signed-rank test) and the median of the nucleotide-wise differences between the between-group and within-group d scores.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Load data from Wan et al., 2014.
data(wan2014)

# Run dStruct in guided mode on the first region in wan2014.
dStructGuided(wan2014[[1]], reps_A = 2, reps_B = 1)
This function provides a convenient way to call the dStruct or dStructGuided functions for multiple transcripts simultaneously. By default, the transcripts are processed using multiple parallel processes if available.
dStructome(
  rl, reps_A, reps_B,
  batches = FALSE,
  min_length = 11,
  check_signal_strength = TRUE,
  check_nucs = TRUE,
  check_quality = TRUE,
  quality = "auto",
  evidence = 0,
  signal_strength = 0.1,
  within_combs = NULL,
  between_combs = NULL,
  ind_regions = TRUE,
  gap = 1,
  processes = "auto",
  method = "denovo",
  proximity_assisted = FALSE,
  proximity = 10,
  proximity_defined_length = 30
)
rl | List of data frames of reactivities for each sample. |
reps_A | Number of replicates of group A. |
reps_B | Number of replicates of group B. |
batches | Logical indicating whether replicates of groups A and B were performed in batches and are labelled accordingly. If TRUE, a heterogeneous/homogeneous subset may not have multiple samples from the same batch. |
min_length | Minimum length of constructed regions. |
check_signal_strength | Logical. If TRUE, construction of regions must be based on nucleotides that have a minimum absolute value of reactivity. |
check_nucs | Logical. If TRUE, constructed regions must have a minimum number of nucleotides participating in the Wilcoxon signed-rank test. |
check_quality | Logical. If TRUE, check constructed regions for quality. |
quality | Worst allowed quality for a region to be tested. |
evidence | Minimum evidence of increase in variation from within-group comparisons to between-group comparisons for a region to be tested. |
signal_strength | Threshold for minimum signal strength. |
within_combs | Data frame with each column containing groupings of replicates of groups A or B, which will be used to assess within-group variation. |
between_combs | Data frame with each column containing groupings of replicates of groups A and B, which will be used to assess between-group variation. |
ind_regions | Logical. If TRUE, test each region found in the transcript separately. |
gap | Integer. Join regions if they are separated by this many nucleotides. |
processes | Number of parallel processes to use. |
method | Character specifying either the guided or the de novo discovery approach. |
proximity_assisted | Logical. If TRUE, proximally located regions are tested together. |
proximity | Maximum distance between constructed regions for them to be considered proximal. |
proximity_defined_length | If performing a "proximity-assisted" test, minimum end-to-end length of a region to be tested. |
Constructs regions and reports, for each region, the p-value, the median difference between the between-group and within-group d scores, and the FDR.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Load data from Lai et al., 2019.
data(lai2019)

# Run dStruct in de novo discovery mode for all the transcripts in this data in one step.
dStructome(lai2019, 3, 2, batches = TRUE, min_length = 21,
           between_combs = data.frame(c("A3", "B1", "B2")),
           within_combs = data.frame(c("A1", "A2", "A3")),
           ind_regions = TRUE, processes = 1)

# Load data from Wan et al., 2014.
data(wan2014)

# Run dStruct in guided discovery mode for all the transcript regions in this data in one step.
dStructome(wan2014, reps_A = 2, reps_B = 1, method = "guided", processes = 1)
Regroups all the samples of groups A and B into homogeneous and heterogeneous sub-groups. Each homogeneous sub-group contains replicates of either group A only or group B only. Each heterogeneous sub-group has a mix of samples from both groups A and B.
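The distinction between the two kinds of sub-groups can be sketched with `utils::combn`. This is only an illustration of the idea, not how getCombs builds its output: the helper `enumerate_subgroups` and its `size` argument are hypothetical, and batch constraints are ignored.

```r
# Hypothetical helper; not the package's getCombs().
enumerate_subgroups <- function(reps_A, reps_B, size = 2) {
  # Sample labels follow the A1, A2, ..., B1, B2, ... convention.
  labels <- c(paste0("A", seq_len(reps_A)), paste0("B", seq_len(reps_B)))
  # All subsets of the requested size, each as a character vector.
  subsets <- combn(labels, size, simplify = FALSE)
  # A subset is homogeneous when all of its labels start with the same letter.
  is_homog <- vapply(
    subsets,
    function(s) length(unique(substr(s, 1, 1))) == 1,
    logical(1)
  )
  list(within = subsets[is_homog], between = subsets[!is_homog])
}

# Two replicates of group A and one of group B.
groups <- enumerate_subgroups(2, 1)
```

With two A samples and one B sample, the only homogeneous pair is A1 with A2, while A1 with B1 and A2 with B1 are heterogeneous.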
getCombs(
  reps_A, reps_B,
  batches = FALSE,
  between_combs = NULL,
  within_combs = NULL
)
reps_A | Number of replicates of group A. |
reps_B | Number of replicates of group B. |
batches | Logical indicating whether replicates of groups A and B were performed in batches and are labelled accordingly. If TRUE, a heterogeneous/homogeneous subset may not have multiple samples from the same batch. |
between_combs | Data frame with each column containing groupings of replicates of groups A and B, which will be used to assess between-group variation. |
within_combs | Data frame with each column containing groupings of replicates of groups A or B, which will be used to assess within-group variation. |
List of two dataframes, containing groupings for within-group and between-group variation.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Get heterogeneous and homogeneous set combinations of samples when there are
# 2 samples of group A and 1 of group B.
getCombs(2, 1)
Given a sequence of nucleotide indices, this function returns integer ranges covered by the indices. There is an option to merge ranges if they are separated by less than a user-specified distance.
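The range-finding logic can be sketched in base R without IRanges. The helper `contig_ranges` below is hypothetical and returns a plain data frame rather than an IRanges object. It assumes that a gap is measured as the number of missing positions between two consecutive indices; the package may count gaps differently.

```r
# Hypothetical helper; returns a data frame instead of an IRanges object.
contig_ranges <- function(x, gap = 0) {
  x <- sort(unique(x))
  if (length(x) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # Assumption: the gap between neighbours is the count of missing positions.
  # A new range starts wherever that count exceeds the allowed gap.
  breaks <- which(diff(x) - 1 > gap)
  data.frame(
    start = x[c(1, breaks + 1)],
    end = x[c(breaks, length(x))]
  )
}

nucleotide_positions <- c(1, 3, 2, 8, 4:7, 11:20)
ranges_strict <- contig_ranges(nucleotide_positions)           # 1-8 and 11-20
ranges_merged <- contig_ranges(nucleotide_positions, gap = 3)  # 1-20
```

Sorting and de-duplicating first means the input order does not matter, which is why the shuffled positions 1, 3, 2 still collapse into one range.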
getContigRegions(x, gap = 0)
x | A vector of integers. |
gap | Include gaps in the ranges if they are shorter than or equal to this length. |
IRanges object storing start and end sites of contiguous regions.
Krishna Choudhary
# Convert an integer vector of nucleotide positions to an IRanges object
# containing the coordinates of contiguous regions.
nucleotide_positions <- c(1, 3, 2, 8, 4:7, 11:20)
getContigRegions(nucleotide_positions)

# Merge regions if their end points are within 3 nt of each other.
getContigRegions(nucleotide_positions, gap = 3)
This function takes between- and within-group d scores for a transcript as input and identifies regions where the former is generally larger. Regions that pass minimum quality and minimum signal criteria are returned.
getRegions(
  d_within, d_spec, rdf,
  min_length = 11,
  check_signal_strength = TRUE,
  check_nucs = TRUE,
  check_quality = TRUE,
  quality = 0.5,
  evidence = 0,
  signal_strength = 0.1
)
d_within | Nucleotide-wise d score for within-group variation. |
d_spec | Nucleotide-wise d score for between-group variation. |
rdf | Data frame of reactivities for each sample. |
min_length | Minimum length of constructed regions. |
check_signal_strength | Logical. If TRUE, construction of regions must be based on nucleotides that have a minimum absolute value of reactivity. |
check_nucs | Logical. If TRUE, constructed regions must have a minimum number of nucleotides participating in the Wilcoxon signed-rank test. |
check_quality | Logical. If TRUE, check constructed regions for quality. |
quality | Worst allowed quality for a region to be tested. |
evidence | Minimum evidence of increase in variation from within-group comparisons to between-group comparisons for a region to be tested. |
signal_strength | Threshold for minimum signal strength. |
Integer vector of nucleotides that constitute potential differentially reactive regions.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
Data from a Structure-seq assay of five samples of S. cerevisiae, three of which were wild-type samples and two mutant samples. The data was pre-processed to obtain DMS reactivities as described by Lai et al. (2019).
data("lai2019")
An object of class "list".
Raw data from Lai et al., 2019 in processed form.
Lai et al. (2019) Genetics, Vol. 212, 153–174.
data("lai2019")
Computes the normalization factor for raw reactivities using the 2-8% method. Given a reactivity profile, the 2% of nucleotides with the highest reactivities are first removed. The normalization factor is then the mean reactivity of the 8% of nucleotides with the next highest reactivities.
normalizer(raw.estimates)
raw.estimates | A vector of raw reactivities. |
The normalization factor.
Krishna Choudhary
Low JT, Weeks KM. SHAPE-directed RNA secondary structure prediction. Methods. 2010; 52(2):150–8.
Sloma MF, Mathews DH, Chen SJ, Burke-Aguero DH. Chapter four – improving RNA secondary structure prediction with structure mapping data. In: Methods in Enzymology, vol. 553. Cambridge: Academic Press: 2015. p. 91–114.
Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. Quant Biol. 2017; 5(1):3–24.
normalizer(c(NA, rnorm(20, 0.5, 0.3), NA, -999))
Given the table of results from dStruct or dStructGuided and the corresponding lists with reactivity scores for all transcripts, this function saves a PDF file with detailed visualizations of reactivities for all differential regions.
plotDStructurome(
  rl, diff_regions, outfile,
  fdr = 0.05,
  ylim = c(-0.05, 3),
  del_d_cutoff = 0.01
)
rl | List of data frames of reactivities for each sample. |
diff_regions | Output from dStruct or dStructGuided containing coordinates of regions with the significance of differential reactivity. |
outfile | The name of the PDF file that will be saved. |
fdr | FDR threshold for plotted regions. |
ylim | Y-axis limits for plots. |
del_d_cutoff | Minimum effect size for plotted regions, specified in terms of the median difference of the between-group and within-group d scores. |
Saves a PDF for all differentially reactive regions. Returns NULL.
Krishna Choudhary
Choudhary, K., Lai, Y. H., Tran, E. J., & Aviran, S. (2019). dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome biology, 20(1), 1-26.
# Load data from Lai et al., 2019.
data(lai2019)

# Run dStruct in de novo discovery mode for all the transcripts in this data in one step.
res <- dStructome(lai2019, 3, 2, batches = TRUE, min_length = 21,
                  between_combs = data.frame(c("A3", "B1", "B2")),
                  within_combs = data.frame(c("A1", "A2", "A3")),
                  ind_regions = TRUE, processes = 1)

# Plot the significant results and save to a PDF file.
plotDStructurome(rl = lai2019, diff_regions = res,
                 outfile = "significantly_differential_regions",
                 fdr = 0.05, ylim = c(-0.05, 3))
Given a reactivity profile, the 2% of nucleotides with the highest reactivities are first removed. The normalization factor is then the mean reactivity of the 8% of nucleotides with the next highest reactivities. The raw reactivities are divided by the normalization factor to obtain normalized reactivities. This is called 2-8% normalization and is a common way to normalize data from RNA structurome profiling technologies such as SHAPE-Seq and Structure-Seq (see Low and Weeks, 2010; Sloma et al., 2015; and Choudhary et al., 2017).
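The procedure can be sketched in a few lines of base R. The helpers `two_eight_factor` and `two_eight_normalize` are hypothetical names used only for this illustration and are not the package's functions. The sketch assumes that counts are rounded to the nearest integer and that only `NA` marks missing data; any other convention for invalid values is not handled here.

```r
# Hypothetical helpers illustrating 2-8% normalization; not part of dStruct.
two_eight_factor <- function(raw) {
  # Assumption: only NA marks missing values.
  vals <- sort(raw[!is.na(raw)], decreasing = TRUE)
  n_top <- round(0.02 * length(vals))            # top 2%, treated as outliers
  n_next <- max(1, round(0.08 * length(vals)))   # next 8%, averaged
  mean(vals[n_top + seq_len(n_next)])
}

two_eight_normalize <- function(raw) {
  # NA entries stay NA after division.
  raw / two_eight_factor(raw)
}

# With reactivities 1..100, the top two values (100, 99) are dropped and
# the factor is the mean of 98, 97, ..., 91.
factor_demo <- two_eight_factor(1:100)
normalized_demo <- two_eight_normalize(c(NA, 1:100))
```

Discarding the top 2% before averaging keeps a handful of extreme reactivities from inflating the scale, so profiles from different samples become more comparable.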
twoEightNormalize(raw.estimates)
raw.estimates | A vector of raw reactivities. |
A vector of normalized reactivities.
Krishna Choudhary
Low JT, Weeks KM. SHAPE-directed RNA secondary structure prediction. Methods. 2010; 52(2):150–8.
Sloma MF, Mathews DH, Chen SJ, Burke-Aguero DH. Chapter four – improving RNA secondary structure prediction with structure mapping data. In: Methods in Enzymology, vol. 553. Cambridge: Academic Press: 2015. p. 91–114.
Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. Quant Biol. 2017; 5(1):3–24.
twoEightNormalize(c(NA, rnorm(20, 0.5, 0.3), NA, -999))
Data from a PARS assay of a family trio of mother, father, and child. The data was pre-processed to obtain PARS scores as described in Choudhary et al. (2019).
data(wan2014)
An object of class "list".
Counts data from Wan et al., 2014 in processed form.
Wan et al., Nature, 505, 706–709 (2014).
data(wan2014)