Title: | 'rifiComparative' compares the output of rifi from two different conditions. |
---|---|
Description: | 'rifiComparative' is a continuation of the rifi package. It compares the rifi output of two conditions using half-life and mRNA at time 0 segments. As input for the segmentation, the difference between the half-lives of both conditions and the log2FC of the mRNA at time 0 are used. The package provides segmentation, statistics, a summary table, fragment visualization and some additional useful plots for further analysis. |
Authors: | Loubna Youssar [aut, cre], Jens Georg [aut] |
Maintainer: | Loubna Youssar <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.7.0 |
Built: | 2024-11-14 06:21:23 UTC |
Source: | https://github.com/bioc/rifiComparative |
'adjusting_HLToInt' merges HL and intensity segments, adapting their positions to each other and combining them with the genome annotation. To make HL and intensity segments comparable, log2FC(HL) is used to generate the data frame instead of distance. The fragments should have a significant p_value from the t-test in at least one segmentation, either HL or intensity.
adjusting_HLToInt(data, Strand = c("+", "-"), annotation)
data |
data frame: data frame combined data by column |
Strand |
string: either "+" or "-" |
annotation |
data frame: data frame from processed gff3 file. |
The functions used are:
p_value_function extracts and returns the p_values of HL and intensity segments respectively.
eliminate_outlier_hl eliminates outliers from HL fragments.
eliminate_outlier_int eliminates outliers from intensity fragments.
mean_length_int calculates the mean of the log2FC(intensity) fragments adapted to HL_fragments and their lengths.
mean_length_hl calculates the mean of log2FC(HL) fragments adapted to the intensity fragments and their lengths.
calculating_rate calculates decay rate and log2FC(intensity). Both are used to calculate synthesis rate.
The data frame with the corresponding columns:
Integer, position of the first fragment
String, region annotation covering the fragments
String, gene annotation covering the fragments
String, locus_tag annotation covering the fragments
Boolean. The bin/probe specific strand (+/-)
String, HL fragments
String, intensity fragments
Integer, position of the first fragment and the last position of the last fragment
Integer, mean of the HL of the fragments involved
Integer, mean of the intensity of the fragments involved
Integer, log2FC(decay(condition1)/ decay(condition2))
Integer, sum of log2FC(decay_rate) and log2FC(intensity)
Integer, log2FC(mean(intensity(condition1))/mean( intensity(condition2)))
Integer, sum of log2FC(decay_rate) and log2FC(intensity)
String, indicated by "*" means at least one fragment either HL fragment or intensity fragment has a significant p_value
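The relation between the decay, intensity and synthesis columns above can be sketched on toy numbers. This is a minimal illustration, not the package's implementation: the column names are hypothetical, the values are made up, and the first-order relation decay rate = ln(2) / half-life is an assumption about how the decay rate is derived.

```r
# Toy fragment-level means for two conditions (made-up numbers,
# hypothetical column names).
frag <- data.frame(
  mean_HL_cdt1  = c(2, 4),
  mean_HL_cdt2  = c(4, 4),
  mean_int_cdt1 = c(8, 3),
  mean_int_cdt2 = c(2, 3)
)

# Assumption: first-order decay, so decay rate = ln(2) / half-life.
decay_cdt1 <- log(2) / frag$mean_HL_cdt1
decay_cdt2 <- log(2) / frag$mean_HL_cdt2

# log2 fold changes between condition 1 and condition 2.
frag$log2FC_decay <- log2(decay_cdt1 / decay_cdt2)
frag$log2FC_int   <- log2(frag$mean_int_cdt1 / frag$mean_int_cdt2)

# Synthesis rate change as the sum of both log2 fold changes,
# following the column description above.
frag$log2FC_synthesis <- frag$log2FC_decay + frag$log2FC_int

print(frag)
```

In the first toy fragment the half-life doubles in condition 2, so the decay rate halves and log2FC(decay) is 1; together with a fourfold intensity change this gives a synthesis log2FC of 3.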
data(stats_df_comb_minimal)
data(annot_g)
df_mean_minimal <- adjusting_HLToInt(data = stats_df_comb_minimal, annotation = annot_g[[1]])
The result of gff3_preprocess for a gff3 file. A list containing all necessary information from a gff file for adjusting_HLToInt and visualization.
data(annot_g)
A list with 2 items:
a data frame with 5853 rows and 6 variables
the region from the gff file
the start of the annotation
the end of the annotation
the strand of the annotation
the annotated gene name
the annotated locus tag
a numeric vector containing the length of the genome
https://github.com/CyanolabFreiburg/rifiComparative
The result of joining_data_row for the inp_s and inp_f example data. A data frame containing the output of joining_data_row.
data(data_combined_minimal)
A data frame with 600 rows and 49 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The bin/probe flag for background level
The relative intensity at time point 0
An internal value to determine which fitting model is applied
Information on which fitting model is applied
The position based segment
The delay value of the bin/probe
The half-life of the bin/probe
The termination factor of the bin/probe
The delay fragment the bin belongs to
The velocity value of the respective delay fragment
The intercept of the fit through the respective delay fragment
The slope of the fit through the respective delay fragment
The half-life fragment the bin belongs to
The mean half-life value of the respective half-life fragment
The intensity fragment the bin belongs to
The mean intensity value of the respective intensity fragment
The overarching transcription unit
The TI fragment the bin belongs to
The mean termination factor of the respective TI fragment
The combined ID of the fragment
presence of pausing site indicated by +/-
presence of iTSS_I indicated by +/-
The fragments involved in pausing site or iTSS_I
p_value of pausing site or iTSS_I
p_value of the slope
the slope value of the respective delay fragment
Integer, ratio of velocity between 2 delay fragments
Integer, the duration between two delay fragments
Integer, the position middle between 2 fragments with an event
Integer, the fold change value of 2 intensity fragments
Integer, the fold change value of 2 HL fragments
p_value of the fold change of HL fragments
Integer, the fold change value of 2 intensity fragments
String, fragments involved in fold change between 2 intensity fragments
p_value of the fold change of intensity fragments
ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
fragments involved on ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
Integer, the fold change of half-life/ fold change of intensity, position of the half-life fragment is adapted to intensity fragment
Integer, the value corresponding to synthesis rate
String, the event assigned by synthesis rate either Termination or iTSS
p_value of the variance between two fold-changes, HL and intensity
p_value of TI fragment
p_value of 2 TI fragments
The condition assigned to the experiment here cdt2
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
https://github.com/CyanolabFreiburg/rifi
The result of joining_data_column for the data_combined_minimal example data. A data frame containing the output of joining_data_column.
data(df_comb_minimal)
A data frame with 300 rows and 18 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The relative intensity at time point 0 for condition 1
The position based segment
The half-life of the bin/probe condition 1
The termination factor of the bin/probe condition 1
The half-life fragment the bin belongs to condition 1
The intensity fragment the bin belongs to condition 1
The TI fragment the bin belongs to condition 1
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
The relative intensity at time point 0 condition 2
The half-life of the bin/probe condition 2
The termination factor of the bin/probe condition 2
The half-life fragment the bin belongs to condition 2
The intensity fragment the bin belongs to condition 2
The TI fragment the bin belongs to condition 2
https://github.com/CyanolabFreiburg/rifiComparative
The result of adjusting_HLToInt for the stats_df_comb_minimal and annotation example data. A data frame containing the output of adjusting_HLToInt.
data(df_mean_minimal)
A data frame with 52 rows and 15 variables:
The bin/probe specific position
the region from the gff file
the annotated gene name
the annotated locus tag
The strand specific
The half-life fragment the bin belongs to
The intensity fragment the bin belongs to
The position of the first fragment and the last position of the last fragment
The mean half-life value of the respective half-life fragments
The mean intensity value of the respective intensity fragments
log2FC(decay(condition1)/decay(condition2))
log2FC(decay_rate/intensity)
log2FC(decay_rate) + log2FC(intensity)
log2FC(mean(intensity(condition1))/mean( intensity(condition2)))
indicated by "*" means at least one fragment either HL fragment or intensity fragment has a significant p_value
https://github.com/CyanolabFreiburg/rifiComparative
An example data frame of differential probe expression from Synechocystis PCC 6803, obtained with the limma package; only the variables of interest were selected. The data frame was used entirely.
data(differential_expression)
A data frame of differential_expression with 55508 rows and 4 variables:
The bin/probe specific position
The strand specific
The bin/probe differential expression
The bin/probe p_value adjusted
https://github.com/CyanolabFreiburg/rifiComparative
'figures_fun' plots at once the density of HL, the HL categories as a histogram, the log2FC of decay and synthesis rates and their heatscatter, a scatter plot of HL and a volcano plot. The function uses the four outputs generated previously.
figures_fun( data.1, data.2, input.1, input.2, cdt1, cdt2, y = 30, x = 30, limits = c(0, 20) )
data.1 |
data frame output of statistic |
data.2 |
data frame joining two outputs from rifi_stats by row |
input.1 |
data frame joining two outputs from rifi_stats by column |
input.2 |
data frame of differential gene expression at time 0 |
cdt1 |
string for the first condition |
cdt2 |
string for the second condition |
y |
integer to break the scaling in scatter plot for y_axis |
x |
integer to break the scaling in scatter plot for x_axis |
limits |
vector to limit the scaling in scatter plot for both axis |
The functions used are:
plot_decay_synt: plots the changes in RNA decay rates versus the changes in RNA synthesis rates
plot_heatscatter: plots the changes in RNA decay rates versus the changes in RNA synthesis rates with density.
plot_volcano: plots statistical significance versus magnitude of change.
plot_histogram: plots a histogram of probe/bin half-life categories from 2 to 20 minutes in both conditions.
plot_density: plots the probe/bin half-life density in both conditions.
plot_scatter: plots the bin/probe half-life in condition 1 vs. condition 2.
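As an illustration of the half-life categories behind plot_histogram, the following base R sketch bins toy half-lives into 2-minute classes between 2 and 20 minutes and counts them per condition. The bin boundaries, the data and the use of barplot() are assumptions made for this example and are not taken from the package.

```r
# Toy half-lives (minutes) for two conditions; made-up values.
half_life <- data.frame(
  cdt1 = c(1.5, 3, 7, 12, 25),
  cdt2 = c(2.5, 5, 9, 19, 30)
)

# Assumed categories: 2-minute steps from 2 to 20 minutes, plus one
# open bin below 2 and one above 20.
breaks <- c(-Inf, seq(2, 20, by = 2), Inf)

# Count the probes per category in each condition
# (rows = categories, columns = conditions).
counts <- sapply(half_life, function(x) table(cut(x, breaks)))

# Side-by-side bars, one group per half-life category.
barplot(t(counts), beside = TRUE, legend.text = TRUE,
        xlab = "half-life category (min)", ylab = "number of probes/bins")
```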
extract the object of rifi_statistics from both conditions. The differential gene expression at time 0 needs to be run separately. The columns log2FC, p_value adjusted, position and strand are extracted and saved to a data frame. loading_fun_fig joins the differential gene expression table and the output from rifi statistics into one data frame.
several plots
data(data_combined_minimal)
data(df_comb_minimal)
data(differential_expression)
data(df_mean_minimal)
figures_fun(data.1 = df_mean_minimal, data.2 = data_combined_minimal, input.1 = df_comb_minimal, input.2 = differential_expression, cdt1 = "sc", cdt2 = "fe")
The result of fragmentation for the df_comb_minimal example data. A data frame containing the output of fragmentation.
data(fragment_int)
A data frame with 500 rows and 24 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The relative intensity at time point 0 for condition 1
The position based segment
The half-life of the bin/probe condition 1
The termination factor of the bin/probe condition 1
The half-life fragment the bin belongs to condition 1
The intensity fragment the bin belongs to condition 1
The TI fragment the bin belongs to condition 1
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
The relative intensity at time point 0 condition 2
The half-life of the bin/probe condition 2
The termination factor of the bin/probe condition 2
The half-life fragment the bin belongs to condition 2
The intensity fragment the bin belongs to condition 2
The TI fragment the bin belongs to condition 2
The bin/probe difference of half-life from both conditions
The bin/probe log2 fold change of intensity at time 0
The half-life fragment the bin belongs to both conditions
The half-life mean of the fragment the bin belongs to both conditions
The intensity fragment the bin belongs to both conditions
The intensity mean of the fragment the bin belongs to both conditions
https://github.com/CyanolabFreiburg/rifiComparative
fragmentation fragments the half-life and intensity into segments using the penalties output.
fragmentation(data, pen_HL, pen_int, cores = 2)
data |
data frame: data frame combined data by column |
pen_HL |
list: list of the penalties set optimal for the fragmentation for half-life |
pen_int |
list: list of the penalties set optimal for the fragmentation for intensity |
cores |
integer: the number of assigned cores for the task. It needs to be increased in case of big data. |
Two data frames with half-life and intensity fragments and the fragment-based mean of the respective coefficient.
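The fragment-based mean can be pictured with a small base R sketch. It only shows how a per-fragment mean is attached to every bin once fragment labels exist; the dynamic-programming segmentation itself is not reproduced, and the column names and fragment labels are hypothetical.

```r
# Toy bins with a half-life difference and an already assigned fragment
# label (hypothetical names and labels, made-up values).
df <- data.frame(
  distance_HL      = c(0.5, 0.7, 0.6, 3.0, 3.2),
  HL_comb_fragment = c("Dc_1", "Dc_1", "Dc_1", "Dc_2", "Dc_2")
)

# ave() returns, for each row, the mean of its fragment, so the result
# has the same length as the input and can be stored as a new column.
df$HL_mean_comb_fragment <- ave(df$distance_HL, df$HL_comb_fragment, FUN = mean)

print(df)
```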
data(penalties_df)
data(pen_HL)
data(pen_int)
df_comb_minimal <- fragmentation(data = penalties_df, pen_HL, pen_int)
gff3_preprocess processes the gff3 file extracting gene names and locus_tag from all coding regions (CDS). UTRs/ncRNA/asRNA if available, are also extracted. The resulting dataframe contains region, positions, strand, gene and locus_tag.
gff3_preprocess(path)
path |
path: path to the directory containing the gff3 file. |
A list with 2 items:
String, the region from the gff file
Integer, the start of the annotation
Integer, the end of the annotation
Boolean, the strand of the annotation
String, the annotated gene name
String, the annotated locus tag
a numeric vector containing the length of the genome
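To make the shape of the annotation data frame concrete, here is a minimal base R sketch that parses two made-up GFF3 records into the six columns listed above. It is not the code of gff3_preprocess: the helper get_attr, the toy records and the attribute keys gene and locus_tag are assumptions for illustration. The nine tab-separated GFF3 columns follow the GFF3 format.

```r
# Two made-up GFF3 records plus the version header.
gff_lines <- c(
  "##gff-version 3",
  "chr\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds1;gene=geneA;locus_tag=tag_0001",
  "chr\tRefSeq\tncRNA\t500\t650\t.\t-\t.\tID=rna1;gene=ncr1;locus_tag=tag_0002"
)

# GFF3 has nine tab-separated columns; lines starting with '#' are skipped.
gff <- read.delim(
  text = gff_lines, header = FALSE, comment.char = "#",
  stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Hypothetical helper: pull one key=value entry out of the attributes column.
get_attr <- function(attr, key) {
  found <- grepl(paste0(key, "="), attr, fixed = TRUE)
  value <- sub(paste0(".*", key, "=([^;]+).*"), "\\1", attr)
  ifelse(found, value, NA_character_)
}

annot <- data.frame(
  region    = gff$type,
  start     = gff$start,
  end       = gff$end,
  strand    = gff$strand,
  gene      = get_attr(gff$attributes, "gene"),
  locus_tag = get_attr(gff$attributes, "locus_tag"),
  stringsAsFactors = FALSE
)

print(annot)
```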
gff3_preprocess(
  path = gzfile(system.file("extdata", "gff_synechocystis_6803.gff.gz", package = "rifiComparative"))
)
The result of loading_fun for the stats_se_cdt2 example data. A data frame containing the output of loading_fun, returned as the second element of a list of two data frames.
data(inp_f)
A data frame with 500 rows and 49 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The bin/probe flag for background level
The relative intensity at time point 0
An internal value to determine which fitting model is applied
Information on which fitting model is applied
The position based segment
The delay value of the bin/probe
The half-life of the bin/probe
The termination factor of the bin/probe
The delay fragment the bin belongs to
The velocity value of the respective delay fragment
The intercept of the fit through the respective delay fragment
The slope of the fit through the respective delay fragment
The half-life fragment the bin belongs to
The mean half-life value of the respective half-life fragment
The intensity fragment the bin belongs to
The mean intensity value of the respective intensity fragment
The overarching transcription unit
The TI fragment the bin belongs to
The mean termination factor of the respective TI fragment
The combined ID of the fragment
presence of pausing site indicated by +/-
presence of iTSS_I indicated by +/-
The fragments involved in pausing site or iTSS_I
p_value of pausing site or iTSS_I
p_value of the slope
the slope value of the respective delay fragment
Integer, ratio of velocity between 2 delay fragments
Integer, the duration between two delay fragments
Integer, the position middle between 2 fragments with an event
Integer, the fold change value of 2 HL fragments
Integer, the fold change value of 2 intensity fragments
p_value of the fold change of HL fragments
Integer, the fold change value of 2 intensity fragments
String, fragments involved in fold change between 2 intensity fragments
p_value of the fold change of intensity fragments
ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
fragments involved on ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
Integer, the fold change of half-life/ fold change of intensity, position of the half-life fragment is adapted to intensity fragment
Integer, the value corresponding to synthesis rate
String, the event assigned by synthesis rate either Termination or iTSS
p_value of the variance between two fold-changes, HL and intensity
p_value of TI fragment
p_value of 2 TI fragments
The condition assigned to the experiment here cdt2
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
https://github.com/CyanolabFreiburg/rifiComparative
The result of loading_fun for the stats_se_cdt1 example data. A data frame containing the output of loading_fun, returned as the first element of a list of two data frames.
data(inp_s)
A data frame with 500 rows and 49 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The bin/probe flag for background level
The relative intensity at time point 0
An internal value to determine which fitting model is applied
Information on which fitting model is applied
The position based segment
The delay value of the bin/probe
The half-life of the bin/probe
The termination factor of the bin/probe
The delay fragment the bin belongs to
The velocity value of the respective delay fragment
The intercept of the fit through the respective delay fragment
The slope of the fit through the respective delay fragment
The half-life fragment the bin belongs to
The mean half-life value of the respective half-life fragment
The intensity fragment the bin belongs to
The mean intensity value of the respective intensity fragment
The overarching transcription unit
The TI fragment the bin belongs to
The mean termination factor of the respective TI fragment
The combined ID of the fragment
presence of pausing site indicated by +/-
presence of iTSS_I indicated by +/-
The fragments involved in pausing site or iTSS_I
p_value of pausing site or iTSS_I
p_value of the slope
the slope value of the respective delay fragment
Integer, ratio of velocity between 2 delay fragments
Integer, the duration between two delay fragments
Integer, the position middle between 2 fragments with an event
Integer, the fold change value of 2 HL fragments
Integer, the fold change value of 2 intensity fragments
p_value of the fold change of HL fragments
Integer, the fold change value of 2 intensity fragments
String, fragments involved in fold change between 2 intensity fragments
p_value of the fold change of intensity fragments
ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
fragments involved on ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
Integer, the fold change of half-life/ fold change of intensity, position of the half-life fragment is adapted to intensity fragment
Integer, the value corresponding to synthesis rate
String, the event assigned by synthesis rate either Termination or iTSS
p_value of the variance between two fold-changes, HL and intensity
p_value of TI fragment
p_value of 2 TI fragments
The condition assigned to the experiment here cdt1
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
https://github.com/CyanolabFreiburg/rifiComparative
'joining_data_column': joins two data frames from different conditions by column.
joining_data_column(data)
data |
data frame with joined columns from both conditions |
The data frame with joined columns from both conditions with the corresponding columns: strand, position, ID, intensity.cdt1, position_segment, half_life.cdt1, TI_termination_factor.cdt1, HL_fragment.cdt1, intensity_fragment.cdt1, TI_termination_fragment.cdt1, logFC_int, P.Value, intensity.cdt2, half_life.cdt2, TI_termination_factor.cdt2, HL_fragment.cdt2, intensity_fragment.cdt2, TI_termination_fragment.cdt2.
cdt1: first condition, cdt2: second condition.
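The .cdt1/.cdt2 column layout described above can be reproduced on toy data with base R. This is only a sketch of a by-column join under the assumption that bins are matched on ID, position and strand; the package may match and select columns differently.

```r
# Row-wise combined toy data: the same two bins measured in two conditions.
combined <- data.frame(
  ID        = c(1, 2, 1, 2),
  position  = c(100, 200, 100, 200),
  strand    = "+",
  half_life = c(2.0, 3.5, 4.0, 3.0),
  intensity = c(500, 800, 250, 900),
  cdt       = c("cdt1", "cdt1", "cdt2", "cdt2")
)

# Split by condition, dropping the condition column itself.
by_cdt <- split(combined[, names(combined) != "cdt"], combined$cdt)

# Assumed join keys: ID, position and strand. All remaining columns get a
# condition suffix, giving half_life.cdt1, half_life.cdt2 and so on.
wide <- merge(
  by_cdt$cdt1, by_cdt$cdt2,
  by = c("ID", "position", "strand"),
  suffixes = c(".cdt1", ".cdt2")
)

print(wide)
```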
data(data_combined_minimal)
df_comb_minimal <- joining_data_column(data = data_combined_minimal)
joining_data_row joins two data frames from different conditions by row.
joining_data_row(input1, input2)
input1 |
data frame from SummarizedExperiment output of rifi_stats from rifi package of the first condition. |
input2 |
data frame from SummarizedExperiment output of rifi_stats from rifi package of the second condition. |
The data frame with joined rows from both conditions with the corresponding columns: ID with position, strand, intensity, probe_TI, flag, position_segment, delay, half_life, TI_termination_factor, delay_fragment, velocity_fragment, intercept, slope, HL_fragment, HL_mean_fragment, intensity_fragment, intensity_mean_fragment, TU, TI_termination_fragment, TI_mean_termination_factor, seg_ID, pausing_site, iTSS_I, ps_ts_fragment, event_ps_itss_p_value_Ttest, p_value_slope, delay_frg_slope, velocity_ratio, event_duration, event_position, FC_HL, FC_fragment_HL, p_value_HL, FC_intensity, FC_fragment_intensity, p_value_intensity, FC_HL_intensity, FC_HL_intensity_fragment, FC_HL_adapted, synthesis_ratio, synthesis_ratio_event, p_value_Manova, p_value_TI, cdt (condition), logFC_int (log2FC(intensity)), P.Value of log2FC(intensity)
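Joining by row amounts to stacking two data frames that share the same columns while keeping track of the condition in the cdt column. A minimal sketch on toy data, assuming both inputs already carry identical column names:

```r
# Toy per-condition inputs with identical columns (made-up values).
inp_1 <- data.frame(ID = 1:2, position = c(100, 200),
                    half_life = c(2.0, 3.5), cdt = "cdt1")
inp_2 <- data.frame(ID = 1:2, position = c(100, 200),
                    half_life = c(4.0, 3.0), cdt = "cdt2")

# rbind() needs matching column names; check before stacking.
stopifnot(identical(names(inp_1), names(inp_2)))
long <- rbind(inp_1, inp_2)

# Each bin now appears once per condition.
print(table(long$cdt))
```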
data(inp_s)
data(inp_f)
data_combined_minimal <- joining_data_row(input1 = inp_s, input2 = inp_f)
loading_fun extracts the outputs of rifi_stats for all conditions and joins each of them to the differential expression table. The differential gene expression at time 0 needs to be run separately. The columns log2FC, p_value adjusted, position and strand are extracted and saved to a data frame. loading_fun joins the differential gene expression table and the output from rifi statistics into one data frame.
loading_fun(data1, data2, data3)
data1 |
data frame from rifi_stats of one condition |
data2 |
data frame from rifi_stats of other condition |
data3 |
data frame from differential gene expression at time 0 |
A list of two data frames with joined columns from differential expression and output of rifi_stats with the corresponding columns: ID with position, strand, intensity, probe_TI, flag, position_segment, delay, half_life, TI_termination_factor, delay_fragment, velocity_fragment, intercept, slope, HL_fragment, HL_mean_fragment, intensity_fragment, intensity_mean_fragment, TU, TI_termination_fragment, TI_mean_termination_factor, seg_ID, pausing_site, iTSS_I, ps_ts_fragment, event_ps_itss_p_value_Ttest, p_value_slope, delay_frg_slope, velocity_ratio, event_duration, event_position, FC_HL, FC_fragment_HL, p_value_HL, FC_intensity, FC_fragment_intensity, p_value_intensity, FC_HL_intensity, FC_HL_intensity_fragment, FC_HL_adapted, synthesis_ratio, synthesis_ratio_event, p_value_Manova, p_value_TI, cdt (condition), logFC_int (log2FC(intensity)), P.Value of log2FC(intensity).
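The join between a rifi_stats table and the differential expression table can be sketched in base R as follows. The toy values are made up, and both the join keys (position and strand) and the choice of a left join are assumptions for this example rather than a description of the package internals.

```r
# Toy rifi_stats-like table for one condition.
stats_cdt1 <- data.frame(
  position  = c(100, 200, 300),
  strand    = c("+", "+", "-"),
  half_life = c(2.0, 3.5, 5.0)
)

# Toy differential expression table at time 0.
diff_expr <- data.frame(
  position  = c(100, 200, 300, 400),
  strand    = c("+", "+", "-", "-"),
  logFC_int = c(1.2, -0.4, 0.1, 2.0),
  P.Value   = c(0.01, 0.30, 0.80, 0.04)
)

# Assumed left join on position and strand: every bin of the stats table
# is kept and gains its log2FC and adjusted p_value.
joined <- merge(stats_cdt1, diff_expr, by = c("position", "strand"), all.x = TRUE)
joined$cdt <- "cdt1"

print(joined)
```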
data(stats_se_cdt1)
data(stats_se_cdt2)
data(differential_expression)
inp_s <- loading_fun(stats_se_cdt1, stats_se_cdt2, differential_expression)[[1]]
inp_f <- loading_fun(stats_se_cdt1, stats_se_cdt2, differential_expression)[[2]]
make_pen calls one of the available penalty functions to automatically assign penalties for the dynamic programming. The two functions to be called are:
fragment_HL_pen
fragment_inty_pen
make_pen(
  probe,
  FUN,
  cores = 1,
  logs,
  dpt = 1,
  smpl_min = 10,
  smpl_max = 100,
  sta_pen = 0.5,
  end_pen = 4.5,
  rez_pen = 9,
  sta_pen_out = 0.5,
  end_pen_out = 3.5,
  rez_pen_out = 7
)
probe |
data frame: data frame combined data by column |
FUN |
function: one of the two bottom level functions (see details) |
cores |
integer: the number of assigned cores for the task |
logs |
numeric vector: the logbook vector. |
dpt |
integer: the number of times a full iteration cycle is repeated with a more narrow range based on the previous cycle. |
smpl_min |
integer: the smaller end of the sampling size. |
smpl_max |
integer: the larger end of the sampling size. |
sta_pen |
numeric: the lower starting penalty. |
end_pen |
numeric: the higher starting penalty. |
rez_pen |
numeric: the number of penalties iterated within the penalty range. |
sta_pen_out |
numeric: the lower starting outlier penalty. |
end_pen_out |
numeric: the higher starting outlier penalty. |
rez_pen_out |
numeric: the number of outlier penalties iterated within the outlier penalty range. |
The two functions called return the number of statistically correct and statistically wrong splits at a specific pair of penalties. 'make_pen' iterates over many penalty pairs and picks the most suitable pair based on the difference between wrong and correct splits. The sample size, penalty range and resolution as well as the number of cycles can be customized. The primary start parameters create a matrix with n = rez_pen rows and n = rez_pen_out columns with values between sta_pen/sta_pen_out and end_pen/end_pen_out. The best penalty pair is picked. If dpt is bigger than 1, the same process is repeated with a new matrix of the same size based on the result of the previous cycle. Only position segments with a length within the sample size range are considered for the penalties, which reduces run time.
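The penalty grid described above can be sketched in base R. The grid construction uses the default start parameters from the usage; the two scoring functions toy_correct and toy_wrong are hypothetical stand-ins for the split counts returned by the bottom level functions, and selecting the pair that maximizes correct minus wrong splits is an assumption about how the difference is used.

```r
# Default start parameters from the usage above.
sta_pen <- 0.5;     end_pen <- 4.5;     rez_pen <- 9
sta_pen_out <- 0.5; end_pen_out <- 3.5; rez_pen_out <- 7

# Candidate penalties (rows) and outlier penalties (columns).
pens     <- seq(sta_pen, end_pen, length.out = rez_pen)
pens_out <- seq(sta_pen_out, end_pen_out, length.out = rez_pen_out)

# Hypothetical stand-ins for the numbers of correct and wrong splits
# obtained at a given penalty pair; real values come from the segmentation.
toy_correct <- function(p, o) 10 - abs(p - 2) - abs(o - 1.5)
toy_wrong   <- function(p, o) abs(p - 2) + abs(o - 1.5)

# rez_pen x rez_pen_out matrices, one cell per penalty pair.
correct <- outer(pens, pens_out, toy_correct)
wrong   <- outer(pens, pens_out, toy_wrong)

# Assumed selection rule: largest difference between correct and wrong splits.
score <- correct - wrong
best  <- arrayInd(which.max(score), dim(score))
best_pair <- c(penalty = pens[best[1]], outlier_penalty = pens_out[best[2]])

print(best_pair)
```

With dpt larger than 1, the same grid search would be repeated on a narrower range around best_pair.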
A list with 4 items:
The logbook vector containing all penalty information
a vector with the respective penalty and outlier penalty
a matrix of the correct splits
a matrix of the incorrect splits
data(df_comb_minimal)
df_comb_minimal$distance_HL <- df_comb_minimal$half_life.cdt1 - df_comb_minimal$half_life.cdt2
df_comb_minimal$distance_int <- df_comb_minimal$logFC_int
pen_HL <- make_pen(
  probe = df_comb_minimal, FUN = rifiComparative:::fragment_HL_pen,
  cores = 2, logs = as.numeric(rep(NA, 8)), dpt = 1,
  smpl_min = 10, smpl_max = 50,
  sta_pen = 0.5, end_pen = 4.5, rez_pen = 9,
  sta_pen_out = 0.5, end_pen_out = 3.5, rez_pen_out = 7
)
pen_int <- make_pen(
  probe = df_comb_minimal, FUN = rifiComparative:::fragment_inty_pen,
  cores = 2, logs = as.numeric(rep(NA, 8)), dpt = 1,
  smpl_min = 10, smpl_max = 50,
  sta_pen = 0.5, end_pen = 4.5, rez_pen = 9,
  sta_pen_out = 0.5, end_pen_out = 3.5, rez_pen_out = 7
)
The result of penalties for df_comb_minimal example data. A list containing the output from penalties including the logbook and two penalty objects.
data(pen_HL)
A list with 5 items:
A list with 4 items:
The logbook vector containing half-life penalty information
a vector with the half-life penalty and half-life outlier penalty
a matrix of the correct splits
a matrix of the incorrect splits
https://github.com/CyanolabFreiburg/rifi
The result of penalties for df_comb_minimal example data. A list containing the output from penalties including the logbook and two penalty objects.
data(pen_int)
A list with 5 items:
A list with 4 items:
The logbook vector containing intensity penalty information
a vector with the intensity penalty and intensity outlier penalty
a matrix of the correct splits
a matrix of the incorrect splits
https://github.com/CyanolabFreiburg/rifi
penalties finds the best set of penalties for half-life and intensity fragmentation using dynamic programming. The segmentation of the HL uses the difference between 2 conditions and the segmentation of the intensity uses the log2FC.
penalties(data, cores = 2)
data |
data frame with the joined columns from differential expression and output of rifi_stats. |
cores |
integer: the number of assigned cores for the task. It needs to be increased in case of big data. |
The function uses 4 functions:
score_fun_ave.r
make_pen.r
fragment_HL_pen.r
fragment_inty_pen.r
A list containing a data frame and the penalties. The first element is the data frame with 2 more variables; the second and third elements are the HL and intensity penalties respectively.
data(df_comb_minimal)
penalties_df <- penalties(df_comb_minimal)[[1]]
pen_HL <- penalties(df_comb_minimal)[[2]]
pen_int <- penalties(df_comb_minimal)[[3]]
The result of penalties for the df_comb_minimal example data. A data frame containing the output of penalties.
data(penalties_df)
A data frame with 300 rows and 20 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The relative intensity at time point 0 for condition 1
The position based segment
The half-life of the bin/probe condition 1
The termination factor of the bin/probe condition 1
The half-life fragment the bin belongs to condition 1
The intensity fragment the bin belongs to condition 1
The TI fragment the bin belongs to condition 1
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
The relative intensity at time point 0 condition 2
The half-life of the bin/probe condition 2
The termination factor of the bin/probe condition 2
The half-life fragment the bin belongs to condition 2
The intensity fragment the bin belongs to condition 2
The TI fragment the bin belongs to condition 2
The bin/probe difference of half-life from both conditions
The bin/probe log2 fold change of intensity at time 0
https://github.com/CyanolabFreiburg/rifiComparative
rifi_visualization_comparison plots the genome annotation, the half-life difference (HL) fragments and the log2FC(intensity) fragments. It uses several functions to plot TUs and genes, including small RNAs. Additionally, it plots the result of the statistical t-test between neighboring fragments; significant p-values from the t-test are marked with '*'.
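The '*' marking between neighboring fragments can be illustrated with a plain two-sample t-test in base R. The values are made up, and the 0.05 cut-off as well as the use of Welch's t-test (the default of t.test) are assumptions for this sketch, not a statement about the test settings used inside the package.

```r
# Toy half-life differences for the bins of two neighboring fragments.
frag_1 <- c(1.0, 1.1, 0.9, 1.2, 0.8)
frag_2 <- c(3.0, 3.1, 2.9, 3.2, 2.8)

# Two-sample t-test between the neighbors (Welch's test by default).
p_value <- t.test(frag_1, frag_2)$p.value

# Assumed significance threshold of 0.05; significant pairs get a '*'.
label <- if (p_value < 0.05) "*" else ""

cat("p_value:", signif(p_value, 3), "label:", label, "\n")
```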
rifi_visualization_comparison( data, data_c, genomeLength = annot_g[[2]], annot = annot_g[[1]], condition = c("cdt1", "cdt2"), Strand = c("+", "-"), region = c("CDS", "asRNA", "5'UTR", "ncRNA", "3'UTR", "tRNA"), color_region = c("grey0", "red", "blue", "orange", "yellow", "green", "white", "darkseagreen1", "grey50", "black"), color_TU = c("cyan", "yellow", "orange"), scaling_TU = c(0, 3.4, 6.6), color_text.1 = "grey0", color_text.2 = "black", Alpha = 0.5, size_tu = 1.6, size_locusTag = 1.6, size_gene = 1.6, Limit = 10, shape = 22, face = "bold", tick_length = 0.3, arrow.color = "darkseagreen1", col_above20 = "#00FFFF", fontface = "plain", shape_above20 = 14, axis_text_y_size = 3, axis_title_y_size = 6, iTSS_threshold = 1.2, p_value_manova = 0.05, termination_threshold = 0.8 )
data |
dataframe: the probe based dataframe with joined columns. |
data_c |
dataframe: the probe based dataframe with joined rows. |
genomeLength |
integer: genome length output of gff3_preprocess function. |
annot |
dataframe: the annotation file. |
condition |
string: names of the two conditions, by default cdt1 (condition 1) and cdt2 (condition 2); they can be adapted to any name. |
Strand |
string: either ("+" or "-"). |
region |
string vector: gff3 features of the genome. |
color_region |
string vector: vector of colors. |
color_TU |
string vector: TU colors. |
scaling_TU |
vector: values used to adjust terminations and iTSSs to TUs. |
color_text.1 |
string: TU color text |
color_text.2 |
string: genes color text |
Alpha |
numeric: color transparency degree. |
size_tu |
integer: TU size |
size_locusTag |
integer: locus_tag size |
size_gene |
integer: font size for gene annotation. |
Limit |
integer: value for y-axis limit. |
shape |
integer: value for shape. |
face |
string: label font. |
tick_length |
integer: value for ticks. |
arrow.color |
string: arrows color. |
col_above20 |
string: color for probes/bin above value 20. |
fontface |
string: font type |
shape_above20 |
integer: shape for probes/bins above value 20. |
axis_text_y_size |
integer: text size for y-axis. |
axis_title_y_size |
integer: title size for y-axis. |
iTSS_threshold |
numeric: threshold for iTSS_II selected to plot, default 1.2. |
p_value_manova |
numeric: p_value of manova test fragments to plot, default 0.05. |
termination_threshold |
numeric: threshold for termination to plot, default 0.8. |
The functions used are:
strand_selection: plots HL, intensity fragments from both strands.
splitGenome_function: splits the genome into fragments.
annotation_plot_comp: plots the corresponding annotation.
indice_function: assigns a new column to the data to distinguish between fragments and outliers from delay, HL or intensity.
empty_data_positive: plots empty boxes in case no data is available for positive strand
empty_data_negative: plots empty boxes in case no data is available for negative strand
TU_annotation: designs the segments border for the genes and TUs annotation.
gene_annot_function: requires a gff3 file and returns a dataframe adjusting each fragment according to its annotation. It also allows genes and TUs that are split across two pages to be plotted.
secondaryAxis: sets the half-life or delay to 20 when the dataframe has a single row and the half-life or delay exceeds the limit; these values are plotted with a different shape and color.
my_arrow: creates an arrow for the annotation.
arrange_byGroup: selects the last row of each segment and adds 40 nucleotides in case of the negative strand for a nicer plot.
my_segment_T: plots terminals and pausing sites labels.
The plot.
data(data_combined_minimal)
data(stats_df_comb_minimal)
data(annot_g)
rifi_visualization_comparison(
  data = data_combined_minimal,
  data_c = stats_df_comb_minimal,
  genomeLength = annot_g[[2]],
  annot = annot_g[[1]]
)
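The '*' marking of significant t-test p-values mentioned in the description can be illustrated with a minimal base R sketch. This is not the package's plotting code: the vector `p_adjusted`, the name `threshold` and the treatment of missing p-values as non-significant are assumptions made for illustration only.

```r
# Hypothetical adjusted p-values for four neighboring fragment pairs
# (made-up numbers; NA stands for a pair that could not be tested).
p_adjusted <- c(0.001, 0.2, 0.049, NA)

# Assumed significance cutoff; the value 0.05 is an illustrative choice.
threshold <- 0.05

# Label significant comparisons with "*", everything else with "".
labels <- ifelse(!is.na(p_adjusted) & p_adjusted < threshold, "*", "")
print(labels)
```

Such a label vector could then be passed to a text layer of a plot, so that only significant comparisons are annotated.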
rifiComparative is a successor package of rifi. It compares 2 rifi outputs from 2 different conditions of the same organism.
rifiComparative was developed to compare 2 rifi outputs from 2 conditions. The rifi output may differ significantly between the 2 conditions. The complexity of the number, position and length of the segments, together with the events, makes a direct comparison between 2 conditions nearly impossible. rifiComparative uses a simple strategy to generate single segments for both conditions, extract the features and make them comparable.
Five major steps are described in rifiComparative:
Joining data
penalties
fragmentation
statistics
visualization
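The first step, joining data, can be pictured with a small base R sketch. The package description states that the difference between the half-lives of both conditions and the log2FC of the mRNA at time 0 are the inputs for the segmentation. The toy tables, the column names (`half_life`, `intensity`, `distance_HL`, `log2FC_int`) and the direction of the subtraction (condition 1 minus condition 2) are assumptions for illustration, not the package's own implementation.

```r
# Toy per-bin tables for two conditions (values are made up).
cdt1 <- data.frame(ID = 1:4, position = c(100, 200, 300, 400),
                   half_life = c(2.0, 2.5, 4.0, 1.0),
                   intensity = c(800, 400, 1600, 100))
cdt2 <- data.frame(ID = 1:4, position = c(100, 200, 300, 400),
                   half_life = c(3.0, 2.5, 2.0, 1.5),
                   intensity = c(400, 400, 3200, 200))

# Join the two conditions by column on the shared bin ID and position.
joined <- merge(cdt1, cdt2, by = c("ID", "position"),
                suffixes = c(".cdt1", ".cdt2"))

# Half-life difference and log2 fold change of intensity at time 0
# (assumed direction: condition 1 relative to condition 2).
joined$distance_HL <- joined$half_life.cdt1 - joined$half_life.cdt2
joined$log2FC_int <- log2(joined$intensity.cdt1 / joined$intensity.cdt2)

print(joined[, c("ID", "position", "distance_HL", "log2FC_int")])
```

Working on a difference and a log ratio gives one value per bin for both conditions, which is what allows a single segmentation to be shared between them.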
Loubna Youssar [email protected]
Jens Georg [email protected]
statistics uses a t-test to check the significance of HL and intensity segments. The function returns the data frame with p_value and adjusted p_value. The function used is t_test_function.
statistics(data)
data |
data frame: data frame output of fragmentation |
A list of two data frames. The first one contains all segments with p_value and adjusted p_value. The second one has the duplicated intensity segments removed and can be saved as an Excel file.
data(fragment_int)
stats_df_comb_minimal <- statistics(data = fragment_int)[[1]]
df_comb_uniq_minimal <- statistics(data = fragment_int)[[2]]
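The idea of a per-segment t-test followed by p-value adjustment can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of t_test_function: the toy data, the column names, the choice of comparing condition 1 against condition 2 within each fragment, the Welch variant (the default of `t.test`) and the "fdr" adjustment method are all assumed here.

```r
# Toy half-life values per bin for two fragments and two conditions
# (made-up numbers; fragment and column names are hypothetical).
toy <- data.frame(
  fragment = rep(c("frag_1", "frag_2"), each = 6),
  hl_cdt1 = c(5.1, 4.9, 5.2, 4.8, 5.0, 5.1,
              3.0, 3.2, 2.9, 3.1, 2.8, 3.0),
  hl_cdt2 = c(2.0, 2.1, 1.9, 2.2, 1.8, 2.0,
              3.1, 2.9, 3.0, 3.2, 2.8, 3.0)
)

# One two-sample t-test per fragment, comparing both conditions.
p_values <- sapply(split(toy, toy$fragment), function(frag) {
  t.test(frag$hl_cdt1, frag$hl_cdt2)$p.value
})

# Correct for multiple testing across fragments (assumed method: FDR).
p_adjusted <- p.adjust(p_values, method = "fdr")

result <- data.frame(fragment = names(p_values),
                     p_value = p_values,
                     p_adjusted = p_adjusted,
                     row.names = NULL)
print(result)
```

In this toy input the first fragment differs clearly between conditions while the second has identical means, so only the first one ends up with a small adjusted p-value.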
The result of statistics for the fragment_int example data: a data frame containing the output of statistics.
data(stats_df_comb_minimal)
A data frame with 500 rows and 26 variables:
The strand specific
The bin/probe specific position
The bin/probe specific ID
The relative intensity at time point 0 for condition 1
The position based segment
The half-life of the bin/probe condition 1
The termination factor of the bin/probe condition 1
The half-life fragment the bin belongs to condition 1
The intensity fragment the bin belongs to condition 1
The TI fragment the bin belongs to condition 1
The bin/probe log2 fold change of intensity at time 0
The bin/probe p_value adjusted
The relative intensity at time point 0 condition 2
The half-life of the bin/probe condition 2
The termination factor of the bin/probe condition 2
The half-life fragment the bin belongs to condition 2
The intensity fragment the bin belongs to condition 2
The TI fragment the bin belongs to condition 2
The bin/probe difference of half-life from both conditions
The bin/probe log2 fold change of intensity at time 0
The half-life fragment the bin belongs to both conditions
The half-life mean of the fragment the bin belongs to both conditions
The intensity fragment the bin belongs to both conditions
The intensity mean of the fragment the bin belongs to both conditions
The p_value adjusted of the half-life fragment the bin belongs to both conditions
The p_value adjusted of the intensity fragment the bin belongs to both conditions
https://github.com/CyanolabFreiburg/rifiComparative
An example SummarizedExperiment from Synechocystis PCC 6803, first condition, obtained from rifi_statistics and used as input for rifiComparative
data(stats_se_cdt1)
A rowRanges of SummarizedExperiment with 500 rows and 50 variables:
The sequence name chromosome
The bin/probe start position
The bin/probe end position
The bin/probe length
The strand specific
The bin/probe specific position
The bin/probe specific ID
The bin/probe flag for background level
The relative intensity at time point 0
An internal value to determine which fitting model is applied
Information on which fitting model is applied
The position based segment
The delay value of the bin/probe
The half-life of the bin/probe
The termination factor of the bin/probe
The delay fragment the bin belongs to
The velocity value of the respective delay fragment
The intercept of the fit through the respective delay fragment
The slope of the fit through the respective delay fragment
The half-life fragment the bin belongs to
The mean half-life value of the respective half-life fragment
The intensity fragment the bin belongs to
The mean intensity value of the respective intensity fragment
The overarching transcription unit
The TI fragment the bin belongs to
The mean termination factor of the respective TI fragment
The combined ID of the fragment
presence of pausing site indicated by +/-
presence of iTSS_I indicated by +/-
The fragments involved in pausing site or iTSS_I
p_value of pausing site or iTSS_I
the slope value of the respective delay fragment
p_value of the slope
Integer, ratio of velocity between 2 delay fragments
Integer, the duration between two delay fragments
Integer, the position middle between 2 fragments with an event
Integer, the fold change value of 2 HL fragments
Integer, the fold change value of 2 intensity fragments
p_value of the fold change of HL fragments
Integer, the fold change value of 2 intensity fragments
String, fragments involved in fold change between 2 intensity fragments
p_value of the fold change of intensity fragments
ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
fragments involved in the ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
Integer, the fold change of half-life/ fold change of intensity, position of the half-life fragment is adapted to intensity fragment
Integer, the value corresponding to synthesis rate
String, the event assigned by synthesis rate either Termination or iTSS
p_value of the variance between two fold-changes, HL and intensity
p_value of TI fragment
p_value of 2 TI fragments
https://github.com/CyanolabFreiburg/rifiComparative
An example SummarizedExperiment from Synechocystis PCC 6803, second condition, obtained from rifi_statistics and used as input for rifiComparative
data(stats_se_cdt2)
A rowRanges of SummarizedExperiment with 500 rows and 50 variables:
The sequence name chromosome
The bin/probe start position
The bin/probe end position
The bin/probe length
The strand specific
The bin/probe specific position
The bin/probe specific ID
The bin/probe flag for background level
The relative intensity at time point 0
An internal value to determine which fitting model is applied
Information on which fitting model is applied
The position based segment
The delay value of the bin/probe
The half-life of the bin/probe
The termination factor of the bin/probe
The delay fragment the bin belongs to
The velocity value of the respective delay fragment
The intercept of the fit through the respective delay fragment
The slope of the fit through the respective delay fragment
The half-life fragment the bin belongs to
The mean half-life value of the respective half-life fragment
The intensity fragment the bin belongs to
The mean intensity value of the respective intensity fragment
The overarching transcription unit
The TI fragment the bin belongs to
The mean termination factor of the respective TI fragment
The combined ID of the fragment
presence of pausing site indicated by +/-
presence of iTSS_I indicated by +/-
The fragments involved in pausing site or iTSS_I
p_value of pausing site or iTSS_I
p_value of the slope
the slope value of the respective delay fragment
Integer, ratio of velocity between 2 delay fragments
Integer, the duration between two delay fragments
Integer, the position middle between 2 fragments with an event
Integer, the fold change value of 2 HL fragments
Integer, the fold change value of 2 intensity fragments
p_value of the fold change of HL fragments
Integer, the fold change value of 2 intensity fragments
String, fragments involved in fold change between 2 intensity fragments
p_value of the fold change of intensity fragments
ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
fragments involved in the ratio of fold change between 2 half-life fragments and fold change between 2 intensity fragments
Integer, the fold change of half-life/ fold change of intensity, position of the half-life fragment is adapted to intensity fragment
Integer, the value corresponding to synthesis rate
String, the event assigned by synthesis rate either Termination or iTSS
p_value of the variance between two fold-changes, HL and intensity
p_value of TI fragment
p_value of 2 TI fragments
https://github.com/CyanolabFreiburg/rifiComparative