Title: | An R Interface for Ribo Files |
---|---|
Description: | The ribor package provides an R Interface for .ribo files. It provides functionality to read the .ribo file, which is of HDF5 format, and performs common analyses on its contents. |
Authors: | Michael Geng [cre, aut], Hakan Ozadam [aut], Can Cenik [aut] |
Maintainer: | Michael Geng <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-12-09 06:31:16 UTC |
Source: | https://github.com/bioc/ribor |
The function get_coverage generates a DataFrame of coverage data over the length of a given transcript.
get_coverage(
  ribo.object,
  name,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  length = TRUE,
  tidy = FALSE,
  alias = FALSE,
  compact = TRUE,
  experiment = experiments(ribo.object)
)
ribo.object |
A 'Ribo' object |
name |
Transcript Name |
range.lower |
Lower bound of the read length, inclusive |
range.upper |
Upper bound of the read length, inclusive |
length |
Logical value that denotes if the coverage should be summed across read lengths |
tidy |
Logical value denoting whether or not the user wants a tidy format |
alias |
Option to accept the transcript input as aliases/nicknames |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
experiment |
List of experiments to obtain coverage information on |
The function get_coverage first checks that the experiments given in the 'experiment' parameter are present in the .ribo file. It then checks these experiments for coverage data, which is an optional dataset. As a result, this function safeguards against experiments that do not have coverage data, even though the 'experiment' parameter includes every experiment in the file by default.
The function retrieves the coverage of one transcript at a time, at each read length from 'range.lower' to 'range.upper', inclusive. Setting the 'length' parameter to TRUE instead sums the coverage of the transcript across the range of read lengths given by 'range.lower' and 'range.upper'.
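The effect of the 'length' parameter can be pictured with a toy matrix. The sketch below does not call ribor; the counts, the matrix layout, and the object names are made up for illustration, and it only assumes that summing across read lengths means adding the per-length coverage at each position.

```r
# Toy per-read-length coverage for one transcript of 6 nucleotides.
# Rows are read lengths 2 to 5, columns are positions 1 to 6 (made-up counts).
toy.coverage <- matrix(
  c(0, 1, 2, 0, 0, 1,
    1, 0, 3, 1, 0, 0,
    0, 2, 1, 0, 1, 0,
    1, 1, 0, 0, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(2:5), as.character(1:6))
)

# Analogue of length = FALSE: one row of coverage per read length is kept.
per.length <- toy.coverage

# Analogue of length = TRUE: coverage is summed over the read lengths,
# leaving a single value per position.
summed <- colSums(toy.coverage)
print(summed)
```

The per-length form keeps four times as many values here, which is the trade-off the 'length' parameter controls.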
If the ribo.object is generated with aliases, the 'alias' parameter, if set to TRUE, allows the user to use the alias of the transcript as the 'name' parameter instead of the original transcript name.
An annotated DataFrame or data.frame (if the 'compact' parameter is set to FALSE) of the coverage information for the experiments provided in the 'experiment' parameter. The returned object will have a length column when the 'length' parameter is set to FALSE, indicating that the user does not want to sum the count information across the range of read lengths. The returned data frame can optionally be tidy: if the 'tidy' parameter is set to TRUE, a position column will be added. Finally, if the 'alias' parameter is set to TRUE, the alias transcript name must have been provided at the generation of the ribo object, and the function will accept this aliased name in the 'name' parameter.
Ribo
to generate the necessary ribo.object parameter
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#get the experiments of interest that also contain coverage data
experiments <- c("Hela_1", "Hela_2", "Hela_3", "WT_1")

#the ribo file contains a transcript named 'MYC'
coverage.data <- get_coverage(ribo.object = sample,
                              name = "MYC",
                              range.lower = 2,
                              range.upper = 5,
                              length = TRUE,
                              experiment = experiments)
The function get_experiments provides a list of experiment names in the .ribo file.
get_experiments(ribo.object)
ribo.object |
S4 object of class "Ribo" |
get_experiments returns a list of strings denoting the experiments. It obtains this by reading directly from the .ribo file through the path of the 'ribo.object' parameter. To generate the 'ribo.object' parameter, call the Ribo function and provide the path to the .ribo file of interest.
The user can then subset this list to any specific experiments of interest for later function calls. Many functions that have an 'experiment' parameter call get_experiments to generate a default list of all experiments in the .ribo file.
A list of the experiment names
Ribo
to generate the necessary ribo.object parameter
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#get a list of the experiments
get_experiments(sample)
The function get_info provides information on the attributes, metadata, and datasets of the ribo file.
get_info(ribo.object)
ribo.object |
ribo.object is an S4 object of class "Ribo" |
The function get_info first provides information on the format version, left_span, right_span, longest read length, shortest read length, metagene_radius, and reference model. The last element of the returned list contains information about the presence of coverage and RNA-seq data, which are optional datasets to include in a .ribo file.
Returns a list containing a nested list of file attributes, a logical value denoting whether the root file has additional metadata, and a data.frame of information on each experiment
Ribo
to generate the necessary ribo.object parameter
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#retrieve information
get_info(sample)
The function get_internal_region_coordinates retrieves the start and stop positions for the UTR5, UTR5 Junction, CDS, UTR3 Junction, and UTR3 regions of every transcript.
get_internal_region_coordinates(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
Note that because R uses 1-based indexing, the positions start at 1 rather than at 0 as in many other programming languages. The positions provided in the returned data.frame correspond to the positions in the output of get_coverage.
Additionally, there are edge cases within the transcripts. NA values in the returned data.frame mean that the region has no start and stop position and a length of zero after computing the boundaries of the UTR5 and UTR3 junctions.
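As a rough illustration of how 1-based inclusive coordinates and NA entries translate into region lengths, the sketch below uses a hand-built data frame. The column names (UTR5_start, UTR5_stop, CDS_start, CDS_stop), the transcript names, and the helper region_length are hypothetical and are not taken from the package output; treating the coordinates as inclusive on both ends is also an assumption.

```r
# Hypothetical coordinates for two transcripts (1-based, assumed inclusive).
# tx_B has no UTR5 region, so its UTR5 coordinates are NA.
coords <- data.frame(
  transcript = c("tx_A", "tx_B"),
  UTR5_start = c(1, NA),
  UTR5_stop  = c(40, NA),
  CDS_start  = c(91, 51),
  CDS_stop   = c(300, 200)
)

# With 1-based inclusive coordinates, length = stop - start + 1.
# NA coordinates denote an empty region, which is reported as length 0.
region_length <- function(start, stop) {
  len <- stop - start + 1
  len[is.na(len)] <- 0
  len
}

utr5.len <- region_length(coords$UTR5_start, coords$UTR5_stop)
cds.len  <- region_length(coords$CDS_start, coords$CDS_stop)
print(data.frame(transcript = coords$transcript, UTR5 = utr5.len, CDS = cds.len))
```

Handling NA explicitly avoids propagating missing values into downstream sums over region lengths.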
A data.frame of start and stop coordinates for every region
# generate a ribo object
file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
sample <- Ribo(file.path, rename = rename_default)

# get the region coordinates
coord <- get_internal_region_coordinates(sample, alias = TRUE)
The function get_internal_region_lengths retrieves the lengths for the UTR5, UTR5 Junction, CDS, UTR3 Junction, and UTR3 regions of every transcript.
get_internal_region_lengths(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
A data.frame of the region lengths
# generate a ribo object
file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
sample <- Ribo(file.path, rename = rename_default)

# get the region lengths
region_lengths <- get_internal_region_lengths(sample, alias = TRUE)
The function get_length_distribution retrieves the raw or normalized counts at each read length from 'range.lower' to 'range.upper'.
get_length_distribution(
  ribo.object,
  region,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  compact = TRUE,
  experiment = experiments(ribo.object)
)
ribo.object |
A 'Ribo' object |
region |
Specific region of interest |
range.lower |
Lower bound of the read length, inclusive |
range.upper |
Upper bound of the read length, inclusive |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
experiment |
List of experiment names |
This function is a wrapper function of get_region_counts, and the returned DataFrame is valid input for plot_length_distribution.
An annotated DataFrame or data.frame (if the compact parameter is set to FALSE) of the read-length specific region count information for a single region specified in the 'region' parameter. The returned data frame will have a length column, and it will not contain a transcript column.
plot_length_distribution
to plot the output of this function
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#specify the experiments of interest
experiments <- c("Hela_1", "Hela_2", "WT_1")

#gets the length distribution from read length 2 to 5
length.dist <- get_length_distribution(ribo.object = sample,
                                       region = "CDS",
                                       range.lower = 2,
                                       range.upper = 5,
                                       experiment = experiments)
get_metadata provides information on all of the user-inputted metadata of an experiment. If the experiment is not found, then the attributes of the root .ribo file are returned instead.
get_metadata(ribo.object, name = NULL, print = TRUE)
ribo.object |
Object of class 'Ribo' |
name |
The name of the experiment |
print |
Logical value indicating whether or not to neatly print the output |
If a valid experiment name is provided, a list of elements providing all of the metadata of the experiment is returned.
If the name is not provided and the root file has metadata, then a list of elements providing all of the metadata found in the root file is returned.
Ribo
to generate the necessary ribo.object parameter
#ribo object use case
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#the ribo file contains an experiment named 'Hela_1'
get_metadata(sample, "Hela_1")
The function get_metagene returns a data frame that provides the coverage at the positions surrounding the metagene start or stop site.
get_metagene(
  ribo.object,
  site,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  transcript = TRUE,
  length = TRUE,
  alias = FALSE,
  compact = TRUE,
  experiment = experiments(ribo.object)
)
ribo.object |
A 'Ribo' object |
site |
"start" or "stop" site coverage |
range.lower |
Lower bound of the read length, inclusive |
range.upper |
Upper bound of the read length, inclusive |
transcript |
Logical value that denotes if the metagene information should be summed across transcripts |
length |
Logical value that denotes if the metagene information should be summed across read lengths |
alias |
Option to report the transcripts as aliases/nicknames |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
experiment |
List of experiment names |
The dimensions of the returned data frame depend on the parameters range.lower, range.upper, length, and transcript.
The param 'length' condenses the read lengths together. When length is TRUE and transcript is FALSE, the data frame presents information for each transcript across all of the read lengths. That is, each transcript has a value that is the sum of all of the counts across every read length. As a result, information about the transcript at each specific read length is lost.
The param 'transcript' condenses the transcripts together. When transcript is TRUE and length is FALSE, the data frame presents information at each read length between range.lower and range.upper inclusive. That is, each separate read length denotes the sum of counts from every transcript. As a result, information about the counts of each individual transcript is lost.
If both 'length' and 'transcript' are TRUE, then the resulting data frame prints out one row for each experiment. This provides the metagene information across all transcripts and all read lengths in a given experiment.
If both 'length' and 'transcript' are FALSE, no calculations are done to the data, and all information is preserved for both the read length and the transcript. The data frame then presents the entire stored raw data from the read length 'range.lower' to the read length 'range.upper', which in most cases results in a slow run time and a massive returned DataFrame.
When 'transcript' is set to FALSE, the 'alias' parameter specifies whether or not the returned DataFrame should present each transcript as an alias instead of the original name. If 'alias' is set to TRUE, then the returned data frame will contain the aliases rather than the original reference names of the .ribo file.
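The way 'length' and 'transcript' control the level of aggregation can be mimicked on a small hand-made table. The sketch below does not call get_metagene; the experiment names, transcript names, and counts are invented, and the position columns of real metagene data are left out to keep the example short.

```r
# Toy counts for every combination of experiment, transcript and read length.
toy <- expand.grid(
  experiment = c("exp_1", "exp_2"),
  transcript = c("tx_A", "tx_B"),
  length     = 2:3,
  stringsAsFactors = FALSE
)
toy$count <- 1:8

# Analogue of length = FALSE, transcript = FALSE: nothing is summed.
raw <- toy

# Analogue of length = TRUE, transcript = FALSE: sum over read lengths,
# keeping one value per experiment and transcript.
by.transcript <- aggregate(count ~ experiment + transcript, data = toy, FUN = sum)

# Analogue of length = FALSE, transcript = TRUE: sum over transcripts,
# keeping one value per experiment and read length.
by.length <- aggregate(count ~ experiment + length, data = toy, FUN = sum)

# Analogue of length = TRUE, transcript = TRUE: one value per experiment.
by.experiment <- aggregate(count ~ experiment, data = toy, FUN = sum)
print(by.experiment)
```

Each level of aggregation preserves the grand total while discarding one dimension of detail, which is why the fully unaggregated form is the largest.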
An annotated DataFrame or data.frame (if the compact parameter is set to FALSE) of the metagene information for either the 'stop' or 'start' site provided in the 'site' parameter. The returned data frame will have a length column when the 'length' parameter is set to FALSE, indicating that the count information will not be summed across the provided range of read lengths. Similarly, the returned data frame will have a transcript column when the 'transcript' parameter is set to FALSE, indicating that the count information will not be summed across the transcripts. In the case that the 'transcript' parameter is FALSE, the returned data frame will present the transcripts according to the aliases specified at the creation of the ribo object if the 'alias' parameter is set to TRUE.
Ribo
to generate the necessary 'Ribo' class object,
plot_metagene
to visualize the metagene data,
get_tidy_metagene
to obtain tidy metagene data under certain conditions
#generate the ribo object by providing the file.path to the ribo file
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#extract the total metagene information for all experiments
#across the read lengths and transcripts of the start site
#from read length 2 to 5
metagene_info <- get_metagene(ribo.object = sample,
                              site = "start",
                              range.lower = 2,
                              range.upper = 5,
                              length = TRUE,
                              transcript = TRUE,
                              experiment = experiments(sample))

#Note that length, transcript, and experiments in this case are the
#default values and can be left out. The following generates the same output.
metagene_info <- get_metagene(ribo.object = sample,
                              site = "start",
                              range.lower = 2,
                              range.upper = 5)
The function get_original_region_coordinates retrieves the start and stop positions for the UTR5, UTR5 Junction, CDS, UTR3 Junction, and UTR3 regions of every transcript.
get_original_region_coordinates(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
Note that because R uses 1-based indexing, the positions start at 1 rather than at 0 as in many other programming languages. The positions provided in the returned data.frame correspond to the positions in the output of get_coverage.
Additionally, there are edge cases within the transcripts. NA values in the returned data.frame mean that the region has no start and stop position and a length of zero after computing the boundaries of the UTR5 and UTR3 junctions.
A data.frame of start and stop coordinates for every region
# generate a ribo object
file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
sample <- Ribo(file.path, rename = rename_default)

# get the region coordinates
coord <- get_original_region_coordinates(sample, alias = TRUE)
The function get_original_region_lengths retrieves the lengths for the UTR5, CDS, and UTR3 regions of every transcript.
get_original_region_lengths(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
A data.frame of the region lengths
# generate a ribo object
file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
sample <- Ribo(file.path, rename = rename_default)

# get the region lengths
region_lengths <- get_original_region_lengths(sample, alias = TRUE)
Gets a list of reference names by reading directly from the .ribo file
get_reference_names(ribo.object)
ribo.object |
A 'Ribo' object |
A list of the reference names
#generate a ribo object with transcript nicknames/aliases
file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
sample <- Ribo(file.path)

#get the reference names
names <- get_reference_names(sample)
The function get_region_coordinates retrieves the start and stop positions for the UTR5, UTR5 Junction, CDS, UTR3 Junction, and UTR3 regions of every transcript.
get_region_coordinates(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
Note that because R uses 1-based indexing, the positions start at 1 rather than at 0 as in many other programming languages. The positions provided in the returned data.frame correspond to the positions in the output of get_coverage.
Additionally, there are edge cases within the transcripts. NA values in the returned data.frame mean that the region has no start and stop position and a length of zero after computing the boundaries of the UTR5 and UTR3 junctions.
A data.frame of start and stop coordinates for every region
get_region_counts
will return the particular region counts
of any subset of regions for a given set of experiments.
get_region_counts(
  ribo.object,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  length = TRUE,
  transcript = TRUE,
  tidy = TRUE,
  alias = FALSE,
  normalize = FALSE,
  region = c("UTR5", "UTR5J", "CDS", "UTR3J", "UTR3"),
  compact = TRUE,
  experiment = experiments(ribo.object)
)
ribo.object |
A 'Ribo' object |
range.lower |
Lower bound of the read length, inclusive |
range.upper |
Upper bound of the read length, inclusive |
length |
Logical value that denotes if the region count information should be summed across read lengths |
transcript |
Logical value that denotes if the region count information should be summed across transcripts |
tidy |
Option to return the data frame in a tidy format |
alias |
Option to report the transcripts as aliases/nicknames |
normalize |
Option to normalize the counts as counts per million reads |
region |
Specific region of interest |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
experiment |
List of experiment names |
This function will return a data frame of the counts at each specified region for each specified experiment. The region options are "UTR5", "UTR5J", "CDS", "UTR3J", and "UTR3". The user can specify any subset of regions in the form of a vector, a list, or a single string if only one region is desired.
The dimensions of the returned DataFrame depend on the parameters range.lower, range.upper, length, and transcript.
The param 'length' condenses the read lengths together. When length is TRUE and transcript is FALSE, the data frame presents information for each transcript across all of the read lengths. That is, each transcript has a value that is the sum of all of the counts across every read length. As a result, information about the transcript at each specific read length is lost.
The param 'transcript' condenses the transcripts together. When transcript is TRUE and length is FALSE, the data frame presents information at each read length between range.lower and range.upper inclusive. That is, each separate read length denotes the sum of counts from every transcript. As a result, information about the counts of each individual transcript is lost.
When 'transcript' is set to FALSE, the 'alias' parameter specifies whether or not the returned DataFrame should present each transcript as an alias instead of the original name. If 'alias' is set to TRUE, then the column of the transcript names will contain the aliases rather than the original reference names of the .ribo file.
If both 'length' and 'transcript' are TRUE, then the resulting DataFrame prints out one row for each experiment. This provides the region count information across all transcripts and all read lengths in a given experiment.
If both 'length' and 'transcript' are FALSE, no calculations are done to the data, and all information is preserved for both the read length and the transcript. The DataFrame then presents the entire stored raw data from the read length 'range.lower' to the read length 'range.upper', which in most cases results in a slow run time and a massive returned DataFrame.
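The 'normalize' option is described as reporting counts per million reads. A minimal sketch of that arithmetic follows, assuming the denominator is the total number of reads of the experiment; the exact denominator used by the package is not stated here, and the counts, the total, and the helper name counts_per_million are invented for illustration.

```r
# Made-up raw region counts for one experiment.
raw.counts <- c(UTR5 = 50, CDS = 900, UTR3 = 50)

# Assumed total number of reads in the experiment (hypothetical value).
total.reads <- 1000

# Counts per million: scale each count by 1e6 / total reads.
counts_per_million <- function(counts, total) {
  counts * 1e6 / total
}

cpm <- counts_per_million(raw.counts, total.reads)
print(cpm)
```

Scaling by the total makes experiments with different sequencing depths comparable, at the cost of no longer showing raw read numbers.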
An annotated DataFrame or data.frame (if the compact parameter is set to FALSE) of the region count information for the regions specified in the 'region' parameter. The returned data frame will have a length column when the 'length' parameter is set to FALSE, indicating that the count information will not be summed across the provided range of read lengths. Similarly, the returned data frame will have a transcript column when the 'transcript' parameter is set to FALSE, indicating that the count information will not be summed across the transcripts. In the case that transcript parameter is 'FALSE', the returned data frame will present the transcripts according to the aliases specified at the creation of the ribo object if the 'alias' parameter is set to TRUE.
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#specify the regions and experiments of interest
regions <- c("UTR5", "UTR5J", "CDS", "UTR3J", "UTR3")
experiments <- c("Hela_1", "Hela_2", "WT_1")

#obtains the region counts at each individual read length, summed across every transcript
region.counts <- get_region_counts(ribo.object = sample,
                                   region = regions,
                                   range.lower = 2,
                                   range.upper = 5,
                                   length = FALSE,
                                   transcript = TRUE,
                                   tidy = FALSE,
                                   alias = FALSE,
                                   experiment = experiments)
The function get_region_lengths retrieves the lengths for the UTR5, UTR5 Junction, CDS, UTR3 Junction, and UTR3 regions of every transcript.
get_region_lengths(ribo.object, alias = FALSE)
ribo.object |
A 'Ribo' object |
alias |
Option to return the transcript names as aliases |
This function is deprecated, and we recommend get_internal_region_lengths.
A data.frame of the region lengths
get_rnaseq returns a data frame containing information on the transcript name, experiment, and sequence abundance.
get_rnaseq(
  ribo.object,
  tidy = TRUE,
  region = c("UTR5", "UTR5J", "CDS", "UTR3J", "UTR3"),
  experiment = experiments(ribo.object),
  compact = TRUE,
  alias = FALSE
)
ribo.object |
A 'Ribo' object |
tidy |
Option to return the data frame in a tidy format |
region |
Specific region(s) of interest |
experiment |
List of experiment names |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
alias |
Option to report the transcripts as aliases/nicknames |
By default, the 'experiment' parameter includes all of the experiments within a ribo file. RNA-Seq data is an optional dataset to include in a .ribo file. The experiments in the 'experiment' parameter are checked for existence in the ribo file and then checked for RNA-Seq data.
The returned DataFrame can either be in the tidy format for easier data cleaning or in a condensed non-tidy format. The data will present RNA-Seq counts for each transcript in each valid experiment in the 'experiment' parameter.
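The difference between a condensed and a tidy layout can be sketched with base R on a made-up table. The column names, the experiment and transcript names, and the counts below are assumptions for illustration and may not match the layout that get_rnaseq actually returns.

```r
# Condensed (non-tidy) layout: one column of counts per region.
wide <- data.frame(
  experiment = c("exp_1", "exp_1"),
  transcript = c("tx_A", "tx_B"),
  UTR5 = c(3, 0),
  CDS  = c(120, 45),
  UTR3 = c(7, 2)
)

regions <- c("UTR5", "CDS", "UTR3")

# Tidy layout: one row per experiment, transcript and region,
# with a single column holding the count.
tidy <- data.frame(
  experiment = rep(wide$experiment, times = length(regions)),
  transcript = rep(wide$transcript, times = length(regions)),
  region     = rep(regions, each = nrow(wide)),
  count      = unlist(wide[regions], use.names = FALSE)
)
print(tidy)
```

The tidy form is longer but lets the region be used directly as a grouping or plotting variable.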
The 'alias' parameter specifies whether or not the returned DataFrame should present each transcript as an alias instead of the original name. If 'alias' is set to TRUE, then the column of the transcript names will contain the aliases rather than the original reference names of the .ribo file.
An annotated data frame containing the RNA-Seq counts for the regions specified in the 'region' parameter, with the option of presenting the data in a tidy format. Additionally, the function returns a DataFrame with Rle and factor applied if the 'compact' parameter is set to TRUE, and a data.frame without any Rle or factor if the 'compact' parameter is set to FALSE.
Ribo
to generate the necessary ribo.object parameter
#generate the ribo object
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#list out the experiments of interest that have RNA-Seq data
experiments <- c("Hela_1", "Hela_2", "WT_1")
regions <- c("UTR5", "CDS", "UTR3")
rnaseq.data <- get_rnaseq(ribo.object = sample,
                          tidy = TRUE,
                          region = regions,
                          experiment = experiments)
The function get_tidy_metagene
provides the user with a tidy data format for easier
data cleaning and manipulation. In providing this functionality while reducing the returned data frame
size, the user must aggregate across the transcripts and is only provided the option to aggregate the
read lengths together.
get_tidy_metagene(
  ribo.object,
  site,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  length = TRUE,
  compact = TRUE,
  experiment = experiments(ribo.object)
)
ribo.object |
A 'Ribo' object |
site |
"start" or "stop" site coverage |
range.lower |
Lower bound of the read length, inclusive |
range.upper |
Upper bound of the read length, inclusive |
length |
Logical value that denotes if the metagene information should be summed across read lengths |
compact |
Option to return a DataFrame with Rle and factor as opposed to a raw data.frame |
experiment |
List of experiment names |
The dimensions of the returned data frame depend on the parameters range.lower, range.upper, and length.
The param 'length' condenses the read lengths together. When length is TRUE, then the resulting data frame prints out one row for each experiment. This provides a tidy format of the metagene information across all transcripts and all read lengths in a given experiment. Each row in the data frame represents the total metagene coverage count of a given experiment at a given position.
When the param 'length' is FALSE, then the resulting data frame prints out the metagene coverage count at each position of the metagene radius for each read length. This provides a tidy format of the metagene information across the transcripts, preserving the metagene coverage count at each read length.
An annotated, tidy DataFrame or data.frame (if the compact parameter is set to FALSE) of the metagene information for either the 'stop' or 'start' site provided in the 'site' parameter. The data frame, as a result of its tidy property, will have a position column. The returned data frame will have a length column when the 'length' parameter is set to FALSE, indicating that the count information will not be summed across the provided range of read lengths. The transcripts will be automatically aggregated to keep the memory footprint of this function reasonable.
Ribo to generate the necessary 'Ribo' class object,
plot_metagene to visualize the metagene data,
get_metagene to obtain tidy metagene data under certain conditions
    # generate the ribo object by loading a .ribo file with the Ribo function
    file.path <- system.file("extdata", "sample.ribo", package = "ribor")
    sample <- Ribo(file.path)

    # extract the total metagene information in a tidy format
    # for all experiments across the read lengths and transcripts
    # of the start site from read length 2 to 5
    metagene_info <- get_tidy_metagene(ribo.object = sample,
                                       site = "start",
                                       range.lower = 2,
                                       range.upper = 5,
                                       length = TRUE,
                                       experiment = experiments(sample))

    # Note that length and experiment in this case are the
    # default values and can be left out. The following generates the same output.
    metagene_info <- get_tidy_metagene(ribo.object = sample,
                                       site = "start",
                                       range.lower = 2,
                                       range.upper = 5)
The function plot_length_distribution can take either a DataFrame or a "Ribo" object to generate a line graph of the length distributions from range.lower to range.upper.
plot_length_distribution( x, region, experiment, range.lower, range.upper, fraction = FALSE, title = "Length Distribution" )
x |
A 'Ribo' object or a DataFrame generated from |
region |
the region of interest |
experiment |
a list of experiment names |
range.lower |
a lower bound for the read length range |
range.upper |
an upper bound for the read length range |
fraction |
logical value that, if TRUE, presents the count as a fraction of the total reads in the given ranges |
title |
a title for the generated plot |
When the param 'fraction' is TRUE, each read length is plotted as a fraction of the total count over the read length range given by 'range.lower' and 'range.upper'. When 'fraction' is FALSE, the total count of each read length is plotted.
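As a rough illustration of what the fraction means, the sketch below converts per-length counts into fractions with base R only. The data frame `counts`, its values, and its column names are made up, and it assumes the total is taken separately for each experiment, which the text above does not state explicitly.

```r
# Hypothetical per-length counts for two experiments over read lengths 2 to 5.
counts <- data.frame(
  experiment = rep(c("Hela_1", "WT_1"), each = 4),
  length     = rep(2:5, times = 2),
  count      = c(10, 20, 30, 40, 5, 5, 10, 30),
  stringsAsFactors = FALSE
)

# Analogue of fraction = TRUE: divide each count by the total of its
# experiment over the selected read length range (assumed per-experiment total).
totals <- ave(counts$count, counts$experiment, FUN = sum)
counts$fraction <- counts$count / totals

counts
```

Within each experiment the fractions sum to one, so experiments with very different sequencing depths can be compared on the same axis.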
When given a "Ribo" object, plot_length_distribution calls get_region_counts to retrieve the necessary information for plotting.
The user can instead provide a DataFrame with the same structure as the output of the get_region_counts function where the 'transcript' parameter is set to TRUE and the 'length' parameter is set to FALSE, so that counts are summed across transcripts but kept separate for each read length. This also means that many of the remaining parameters of the plot_length_distribution function are not necessary. The run time becomes substantially faster when plot_length_distribution is given the DataFrame directly. Note that this function does not manipulate the DataFrame. This responsibility is given to the user and allows for more control.
A 'ggplot' of the length distribution
get_region_counts to generate a DataFrame that can be provided as input,
Ribo to create a ribo.object that can be provided as input
    # ribo object use case
    # generate the ribo object
    file.path <- system.file("extdata", "sample.ribo", package = "ribor")
    sample <- Ribo(file.path)

    # specify experiments of interest
    experiments <- c("Hela_1", "Hela_2", "WT_1")

    plot_length_distribution(x = sample,
                             region = "CDS",
                             range.lower = 2,
                             range.upper = 5,
                             experiment = experiments,
                             fraction = TRUE)

    # DataFrame use case
    # obtains the region counts at each individual read length, summed across every transcript
    region.counts <- get_length_distribution(ribo.object = sample,
                                             region = "CDS",
                                             range.lower = 2,
                                             range.upper = 5,
                                             experiment = experiments)

    # the param 'length' must be set to FALSE and param 'transcript' must be set
    # to TRUE to use a DataFrame
    plot_length_distribution(region.counts)
The function plot_metagene plots the metagene site coverage, separating by experiment.
plot_metagene( x, site, experiment, range.lower, range.upper, normalize = FALSE, title = "Metagene Site Coverage", tick = 10 )
x |
A 'Ribo' object or a data frame generated from |
site |
"start" or "stop" site |
experiment |
list of experiments |
range.lower |
lower bound of the read length, inclusive |
range.upper |
upper bound of the read length, inclusive |
normalize |
When TRUE, normalizes the data by the total reads. |
title |
title of the generated plot |
tick |
x-axis labeling increment |
If a DataFrame is provided as param 'x', then the only additional parameter is the optional 'title' parameter for the generated plot. If a ribo.object is provided as param 'x', the rest of the parameters listed are necessary.
When given a ribo class object, the plot_metagene function generates a DataFrame by calling the get_tidy_metagene function, so the run time in this case consists mostly of a call to the get_metagene function.
This function uses ggplot in its underlying implementation.
A 'ggplot' of the metagene site coverage
    # a potential use case is to directly pass in the ribo object as param 'x'
    # generate the ribo object to directly use
    file.path <- system.file("extdata", "sample.ribo", package = "ribor")
    sample <- Ribo(file.path)

    # specify experiments of interest
    experiments <- c("Hela_1", "Hela_2", "WT_1")

    # plot the metagene start site coverage for the selected experiments in 'sample.ribo'
    # from read length 2 to 5
    plot_metagene(x = sample,
                  site = "start",
                  range.lower = 2,
                  range.upper = 5,
                  experiment = experiments)

    # Note that the site, range.lower, range.upper, and experiment parameters are only
    # necessary if a ribo object is being passed in as param 'x'. If a ribo
    # object is passed in, then the param 'experiment' will be set to all of
    # the experiments by default.

    # If a DataFrame is passed in, then the plot_metagene function
    # does not need any other information. All of the elements of the DataFrame
    # will be used, assuming that it contains the same column names and number of
    # columns as the output from get_tidy_metagene()

    # gets the metagene start site coverage from read length 2 to 5
    # note that the data must be summed across transcripts and read lengths
    # for the plot_metagene function
    data <- get_tidy_metagene(sample,
                              site = "start",
                              range.lower = 2,
                              range.upper = 5)

    # plot the metagene data
    plot_metagene(data)
The function plot_region_counts can take either a DataFrame or a "Ribo" object to generate a stacked bar plot of proportions that correspond to the "UTR5", "CDS", and "UTR3" regions.
plot_region_counts( x, experiment, range.lower, range.upper, title = "Region Counts" )
x |
A 'Ribo' object or a DataFrame generated from |
experiment |
a list of experiment names |
range.lower |
a lower bound for the read length range |
range.upper |
an upper bound for the read length range |
title |
a title for the generated plot |
When given a 'Ribo' object, plot_region_counts calls get_region_counts to retrieve the necessary information for plotting. This option is intended for cases where a DataFrame of the region count information is not otherwise needed.
The user can instead provide a DataFrame with the same structure as the output of the get_region_counts function where the 'transcript' and 'length' parameters are the default values of TRUE. This also means that the remaining parameters of the plot_region_counts function are not necessary. The run time becomes substantially faster when plot_region_counts is given the DataFrame directly. However, the DataFrame needs to follow the format and types in the output of the reading functions.
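To make the "stacked bar plot of proportions" concrete, here is a minimal base R sketch of how region counts turn into per-experiment proportions. It does not use ribor. The data frame `region_counts`, its values, and its column names are assumptions for illustration only, not the package's internal computation.

```r
# Hypothetical region counts, already summed across transcripts and read lengths.
region_counts <- data.frame(
  experiment = rep(c("Hela_1", "WT_1"), each = 3),
  region     = rep(c("UTR5", "CDS", "UTR3"), times = 2),
  count      = c(10, 80, 10, 20, 60, 20),
  stringsAsFactors = FALSE
)

# Cross-tabulate counts by experiment (rows) and region (columns),
# then scale each row so that the regions of one experiment sum to 1.
props <- prop.table(xtabs(count ~ experiment + region, data = region_counts),
                    margin = 1)

props
```

Each row of `props` corresponds to one stacked bar, and each column to one segment of that bar.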
A 'ggplot' of the region counts for each of the experiments
get_region_counts to generate a DataFrame that can be provided as input,
Ribo to create a ribo.object that can be provided as input
    # ribo object use case
    # generate the ribo object
    file.path <- system.file("extdata", "sample.ribo", package = "ribor")
    sample <- Ribo(file.path)

    # specify the regions and experiments of interest
    regions <- c("UTR5", "CDS", "UTR3")
    experiments <- c("Hela_1", "Hela_2", "WT_1")

    plot_region_counts(sample,
                       range.lower = 2,
                       range.upper = 5,
                       experiment = experiments)

    # DataFrame use case
    # obtains the region counts summed across read lengths and across every transcript
    region.counts <- get_region_counts(sample,
                                       region = regions,
                                       range.lower = 2,
                                       range.upper = 5,
                                       tidy = TRUE,
                                       length = TRUE,
                                       transcript = TRUE)

    # the params 'length' and 'transcript' must be set to TRUE to use a DataFrame
    plot_region_counts(region.counts)
The function rename_default is the default renaming function for the appris human transcriptome. It takes a single transcript name and returns a simplified alias.
rename_default(x)
x |
Character denoting original name of the transcript |
Character denoting simplified name of the object
    original <- paste("ENST00000613283.2|ENSG00000136997.17|",
                      "OTTHUMG00000128475.8|-|MYC-206|MYC|1365|protein_coding|",
                      sep = "")
    alias <- rename_default(original)
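A custom renaming function only needs the same shape: one character string in, one character string out. The following is a minimal sketch of such a function written in base R. The helper name `rename_by_field` and the choice of the fifth "|"-delimited field are assumptions for illustration, and it is not presented as the implementation of rename_default.

```r
# Hypothetical helper: return the n-th "|"-delimited field of a transcript name.
rename_by_field <- function(x, field = 5) {
  # fixed = TRUE treats "|" as a literal character rather than a regex operator
  parts <- strsplit(x, split = "|", fixed = TRUE)[[1]]
  parts[field]
}

original <- paste("ENST00000613283.2|ENSG00000136997.17|",
                  "OTTHUMG00000128475.8|-|MYC-206|MYC|1365|protein_coding|",
                  sep = "")

alias <- rename_by_field(original)
alias
```

For the name above, the fifth field is "MYC-206". Whichever field is chosen should be unique across transcripts, since each original name is mapped one-to-one to an alias.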
The function rename_transcripts strives to make the transcript names less cumbersome to write and easier to use.
rename_transcripts(ribo, rename)
ribo |
a path to the ribo file or a 'Ribo' object |
rename |
A function that renames the original transcript or an already generated character vector of aliases |
Transcript names found in a .ribo file can often be long and inconvenient to use. As a result, this function allows the user to rename the transcripts.
Often, a short function can be applied to the ribo file reference names to split them and extract a more convenient name, and a function with input and output similar to rename_default can be passed in.
However, if there is no simple function that takes the original name and renames it into a unique alias, then the user can provide a character vector of the same length as the number of transcripts in the ribo file. This character vector provides aliases that match the order of the original reference names returned by the get_reference_names function.
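When aliases are supplied as a character vector, it is worth checking the two requirements described above, matching length and uniqueness, before passing the vector on. The sketch below uses made-up reference names as a stand-in for the output of get_reference_names; the names, the fifth-field rule, and the checks are illustrative assumptions rather than part of ribor.

```r
# Hypothetical reference names standing in for the output of get_reference_names().
reference_names <- c(
  "TX0001.1|GENE0001.1|-|-|GENEA-201|GENEA|900|protein_coding|",
  "TX0002.1|GENE0001.1|-|-|GENEA-202|GENEA|750|protein_coding|",
  "TX0003.1|GENE0002.1|-|-|GENEB-201|GENEB|1200|protein_coding|"
)

# Build one alias per reference name, in the same order,
# by taking the fifth "|"-delimited field of each name.
fields  <- strsplit(reference_names, split = "|", fixed = TRUE)
aliases <- vapply(fields, function(parts) parts[5], character(1))

# The alias vector must have one entry per transcript and no duplicates.
stopifnot(length(aliases) == length(reference_names))
stopifnot(anyDuplicated(aliases) == 0)

aliases
```

Taking the gene name (the sixth field) instead would fail the uniqueness check here, because two of the toy transcripts belong to the same gene.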
A character vector denoting the renamed transcript aliases
rename_default to view expected input and output of a 'rename' function,
Ribo to generate a ribo object
    file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
    sample <- Ribo(file.path, rename = rename_default)
    aliases <- rename_transcripts(sample, rename = rename_default)
The Ribo object serves as the main utility vehicle for the ribor package. Specifically, it allows the user to interface with a .ribo file in the R environment. The functions in ribor rely on the Ribo object to read, visualize, and inspect the contents of the .ribo file. The information stored in this object includes the .ribo file path, the list of experiments, the format version, the reference model, the minimum read length, the maximum read length, the left span, the right span, and other transcript information.
    ## S4 method for signature 'Ribo'
    show(object)

    ## S4 method for signature 'Ribo'
    path(object)

    ## S4 method for signature 'Ribo'
    experiments(object)

    ## S4 method for signature 'Ribo'
    format_version(object)

    ## S4 method for signature 'Ribo'
    reference(object)

    ## S4 method for signature 'Ribo'
    length_min(object)

    ## S4 method for signature 'Ribo'
    length_max(object)

    ## S4 method for signature 'Ribo'
    left_span(object)

    ## S4 method for signature 'Ribo'
    right_span(object)

    ## S4 method for signature 'Ribo'
    metagene_radius(object)

    ## S4 method for signature 'Ribo'
    length_offset(object)

    ## S4 method for signature 'Ribo'
    has_metadata(object)

    ## S4 method for signature 'Ribo'
    experiment_info(object)

    ## S4 method for signature 'Ribo'
    transcript_info(object)

    ## S4 method for signature 'Ribo'
    alias_hash(object)

    ## S4 method for signature 'Ribo'
    original_hash(object)

    Ribo(path, rename = NULL)
object |
Ribo object |
path |
The path to the .ribo file |
rename |
A function that renames the original transcript or an already generated character vector of aliases |
Note that the path parameter takes in a file path and stores it. While using the package, be sure not to move or change the location of the .ribo file. The default names of the transcripts may be difficult to use depending on the settings used to generate the .ribo file. As a result, we have provided a rename parameter that integrates well with the Appris reference transcriptome. Users may also define a simple function that maps a given default transcript name to a custom alias in a one-to-one manner.
Returns an S4 object of class "Ribo" containing a path to the HDF5 file, various attributes in the root folder, and information about the transcripts such as names and lengths.
To add or update aliases on a ribo object that has already been generated, use the set_aliases function.
    file.path <- system.file("extdata", "sample.ribo", package = "ribor")
    sample <- Ribo(file.path)
    show(sample)

    # generate a ribo object with transcript nicknames/aliases
    file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
    sample <- Ribo(file.path, rename = rename_default)
The 'ribor' package offers a suite of reading functions for the datasets present in a .ribo file and also provides some rudimentary plotting functions.
To get started with the ribor package, please see the vignette page at https://ribosomeprofiling.github.io/ribor/ribor.html.
The paper associated with the Ribo ecosystem can be found at https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa028/5701654.
For more information on the preprocessing pipeline, please see the link to the source code at https://github.com/ribosomeprofiling/riboflow.
For more information on the .ribo file format, please see its documentation page at https://ribopy.readthedocs.io/en/latest/ribo_file_format.html.
For an alternative to ribor, please see a link to source code of ribopy, a python interface, at https://github.com/ribosomeprofiling/ribopy.
Ribo to get started,
get_length_distribution to get length distribution counts,
plot_length_distribution to plot the length distribution,
get_region_counts to get region counts,
plot_region_counts to plot the region counts,
get_metagene to get metagene site coverage,
get_tidy_metagene to get a tidy format of the metagene site coverage,
plot_metagene to plot the metagene site coverage
The function set_aliases allows the user to add aliases to a valid ribo object.
set_aliases(ribo.object, rename)
ribo.object |
A 'ribo' object |
rename |
A function that renames the original transcript name into an alias, or an already generated character vector of aliases |
If the naming convention differs from that of the default appris transcriptome, there may be no simple way to generate convenient aliases from the original reference names. In that case, the user can first generate the ribo object and get the reference names, use custom (and likely more intricate) functions to generate a list of aliases, and then pass in a character vector of these aliases. The character vector should match the order of, and correspond to, the list of reference names retrieved from get_reference_names.
A modified 'ribo' object that contains alias information
    # generate a ribo object with transcript nicknames/aliases
    file.path <- system.file("extdata", "HEK293_ingolia.ribo", package = "ribor")
    sample <- Ribo(file.path)
    sample <- set_aliases(ribo.object = sample, rename = rename_default)