Title: | R wrapper for Progenetix |
---|---|
Description: | The package is an R wrapper for the Progenetix REST API, which is built upon the Beacon v2 protocol. It provides a way to retrieve genomic data from the Progenetix database, an open resource dedicated to curated oncogenomic profiles, and to visualize that data. |
Authors: | Hangjia Zhao [aut, cre] |
Maintainer: | Hangjia Zhao <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.1.4 |
Built: | 2024-07-25 03:03:47 UTC |
Source: | https://github.com/bioc/pgxRpi |
A data frame containing cytoband annotation details extracted from the hg19 genome. It is used for CNV frequency visualization.
hg19_cytoband
An object of class data.frame
with 862 rows and 5 columns.
cytoband of hg19 genome
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz
A data frame containing cytoband annotation details extracted from the hg38 genome. It is used for CNV frequency visualization.
hg38_cytoband
An object of class data.frame
with 862 rows and 5 columns.
cytoband of hg38 genome
http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBand.txt.gz
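As a rough illustration of how a five-column cytoband table can support a genome-wide plot, the sketch below derives chromosome lengths and cumulative offsets from band end positions. The coordinates are made up, and the column names follow UCSC's cytoBand table; whether the packaged data frames use the same names is an assumption.

```r
# Toy rows in the five-column layout of UCSC's cytoBand table.
# Coordinates are invented; column names are an assumption about the packaged data.
cytoband <- data.frame(
  chrom      = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  chromStart = c(0, 2000000, 5000000, 0, 4000000),
  chromEnd   = c(2000000, 5000000, 7000000, 4000000, 9000000),
  name       = c("p3", "p2", "p1", "p2", "p1"),
  gieStain   = c("gneg", "gpos25", "gneg", "gneg", "gpos50")
)

# Chromosome length is the largest band end on that chromosome.
chrom_len <- tapply(cytoband$chromEnd, cytoband$chrom, max)

# Cumulative offsets place each chromosome on one shared genome axis.
offset <- c(0, cumsum(chrom_len)[-length(chrom_len)])
names(offset) <- names(chrom_len)
```

Note that `tapply` orders chromosomes alphabetically, so real data would need an explicit chromosome order before the offsets are computed.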
This function returns the number of samples for each specified filter in the Progenetix database.
pgxCount( filters = NULL, domain = "http://progenetix.org", dataset = "progenetix" )
filters |
A single identifier or a vector of identifiers, such as c("NCIT:C7376","icdom-98353") |
domain |
A string specifying the domain of the database. Default is "http://progenetix.org". |
dataset |
A string specifying the dataset to query. Default is "progenetix". The other available option is "cancercelllines". |
Count of samples in the given filter
pgxCount(filters = "NCIT:C3512")
This function retrieves available filters in the Progenetix database.
pgxFilter( prefix = NULL, return_all_prefix = FALSE, domain = "http://progenetix.org", dataset = "progenetix" )
prefix |
A string specifying the prefix of filters, such as 'NCIT' and 'PMID'. Default is NULL, which means that all available filters will be returned. When specified, it returns all filters with the specified prefix. |
return_all_prefix |
A logical value determining whether to return all valid prefixes of filters used in Progenetix.
If TRUE, the |
domain |
A string specifying the domain of the Progenetix database. Default is "http://progenetix.org". |
dataset |
A string specifying the dataset to query. Default is "progenetix". The other available option is "cancercelllines". |
Filter terms used in Progenetix.
pgxFilter(prefix = "NCIT")
This function plots the frequency of deletions and duplications.
pgxFreqplot( data, chrom = NULL, layout = c(1, 1), filters = NULL, circos = FALSE, highlight = NULL, assembly = "hg38" )
data |
The frequency object returned by |
chrom |
A vector of chromosomes to be plotted. If NULL, the frequencies are plotted across the whole genome. If specified, the frequencies are plotted with one panel for each chromosome. Default is NULL. |
layout |
Number of columns and rows in plot. Only used in plot by chromosome. Default is c(1,1). |
filters |
Index or string value to indicate which filter to be plotted, such as 1
(the first filters in |
circos |
A logical value indicating whether to return a circos plot. If TRUE, the circos plot can display multiple filters for comparison. Default is FALSE. |
highlight |
Indices of genomic bins to be highlighted in red. |
assembly |
A string specifying which genome assembly version should be applied to CNV frequency plotting. Allowed options are "hg19", "hg38". Default is "hg38" (genome version used in Progenetix). |
The binned CNV frequency plot
## load necessary data (this step can be skipped in real implementation)
data("hg38_cytoband")
## get frequency data
freq <- pgxLoader(type="frequency", output ='pgxfreq', filters="NCIT:C3512")
## visualize
pgxFreqplot(freq)
This function loads various data from the Progenetix database.
pgxLoader( type = NULL, output = NULL, filters = NULL, codematches = FALSE, filterLogic = "AND", limit = 0, skip = NULL, biosample_id = NULL, individual_id = NULL, save_file = FALSE, filename = NULL, num_cores = 1, domain = "http://progenetix.org", dataset = "progenetix" )
type |
A string specifying output data type. Available options are "biosample", "individual", "variant" or "frequency". The first two options return corresponding metadata, "variant" returns CNV variant data, and "frequency" returns precomputed CNV frequency based on data in Progenetix. |
output |
A string specifying output data format. When the parameter |
filters |
Identifiers for cancer type, literature, cohorts, and age such as c("NCIT:C7376", "pgx:icdom-98353", "PMID:22824167", "pgx:cohort-TCGAcancers", "age:>=P50Y"). |
codematches |
A logical value determining whether to exclude samples from child concepts of the specified filters that belong to a cancer type/tissue encoding system (NCIt, icdom/t, Uberon).
If TRUE, only samples encoded exactly by the specified filters are kept.
Do not use this parameter when |
filterLogic |
A string specifying logic for combining multiple filters when querying metadata (the parameter |
limit |
Integer to specify the number of returned biosample/individual/variant profiles for each filter. Default is 0 (return all). |
skip |
Integer specifying how many pages of biosample/individual/variant profiles, each of size limit, to skip for each filter. For example, with skip = 2 and limit = 500, the first 2 * 500 = 1000 profiles are skipped and the next 500 profiles are returned. Default is NULL (no skip). |
biosample_id |
Identifiers used in Progenetix database for identifying biosamples. |
individual_id |
Identifiers used in Progenetix database for identifying individuals. |
save_file |
A logical value determining whether to save the segment variant data as a file
instead of returning it directly. Only used when the parameter |
filename |
A string specifying the path and name of the file to be saved.
Only used if the parameter |
num_cores |
Integer to specify the number of cores used for the variant query. Only used when the parameter |
domain |
A string specifying the domain of the database. Default is "http://progenetix.org". |
dataset |
A string specifying the dataset to query. Default is "progenetix". The other available option is "cancercelllines". |
Data from the Progenetix database
## query metadata
biosamples <- pgxLoader(type="biosample", filters = "NCIT:C3512")
## query segment variants
seg <- pgxLoader(type="variant", output = "pgxseg", biosample_id = "pgxbs-kftvgx4y")
## query CNV frequency
freq <- pgxLoader(type="frequency", output ='pgxfreq', filters="NCIT:C3512")
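The interaction of `skip` and `limit` described for pgxLoader can be sketched offline in base R. The helper `page_positions` is hypothetical and not part of pgxRpi; it only reproduces the arithmetic from the argument description, and it assumes that `limit = 0` returns everything regardless of `skip`.

```r
# Hypothetical helper (not part of pgxRpi): 1-based positions of the profiles
# that a skip/limit pair would select under the page-based rule described above.
page_positions <- function(skip = NULL, limit = 0, total) {
  if (limit == 0) return(seq_len(total))   # assumption: limit = 0 returns all profiles
  if (is.null(skip)) skip <- 0             # NULL means no pages are skipped
  first <- skip * limit + 1                # skip counts whole pages of size `limit`
  if (first > total) return(integer(0))    # nothing left after skipping
  seq(first, min(first + limit - 1, total))
}

# skip = 2, limit = 500: profiles 1..1000 are skipped, 1001..1500 are returned
pos <- page_positions(skip = 2, limit = 500, total = 1800)
range(pos)
length(pos)
```

Looping `skip` over 0, 1, 2, ... with a fixed `limit` would then walk through a large result set one page at a time.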
This function draws a survival plot from individual metadata.
pgxMetaplot(data, group_id, condition, return_data = FALSE, ...)
data |
The metadata of individuals returned by |
group_id |
A string specifying which column is used for grouping in the Kaplan-Meier plot. |
condition |
Condition for splitting individuals into younger and older groups. Only used if |
return_data |
A logical value determining whether to return the metadata used for plotting. Default is FALSE. |
... |
Other parameters relevant to KM plot. These include |
The KM plot from input data
individuals <- pgxLoader(type="individual",filters="NCIT:C3512")
pgxMetaplot(individuals, group_id="age_iso", condition="P65Y")
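The `condition` value in the example above, "P65Y", is an ISO 8601 duration. The sketch below shows one way such ages could be converted to years and split into younger and older groups. The helper `iso_age_to_years` is hypothetical, handles only date-style durations (PnYnMnD), and the handling of an age exactly equal to the threshold is an assumption rather than documented pgxMetaplot behavior.

```r
# Hypothetical helper (not part of pgxRpi): convert ISO 8601 durations of the
# form PnYnMnD to approximate years. Missing components count as zero.
iso_age_to_years <- function(x) {
  part <- function(unit) {
    m <- regmatches(x, regexec(paste0("([0-9]+)", unit), x))
    vapply(m, function(hit) if (length(hit) == 2) as.numeric(hit[2]) else 0,
           numeric(1))
  }
  part("Y") + part("M") / 12 + part("D") / 365.25
}

ages <- c("P45Y3M", "P65Y", "P70Y6M")
threshold <- iso_age_to_years("P65Y")

# Assumption: individuals strictly above the threshold count as "older".
group <- ifelse(iso_age_to_years(ages) > threshold, "older", "younger")
```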
This function extracts segments, CNV frequency, and metadata from local "pgxseg" files, and supports survival data visualization.
pgxSegprocess( file, group_id = "group_id", show_KM_plot = FALSE, return_metadata = FALSE, return_seg = FALSE, return_frequency = FALSE, assembly = "hg38", bin_size = 1e+06, overlap = 1000, soft_expansion = 0.1, ... )
file |
A string specifying the path and name of the "pgxseg" file where the data is to be read. |
group_id |
A string specifying which id is used for grouping in KM plot or CNV frequency calculation. Default is "group_id". |
show_KM_plot |
A logical value determining whether to return the Kaplan-Meier plot based on metadata. Default is FALSE. |
return_metadata |
A logical value determining whether to return metadata. Default is FALSE. |
return_seg |
A logical value determining whether to return segment data. Default is FALSE. |
return_frequency |
A logical value determining whether to return CNV frequency data. The frequency calculation is based on segments in segment data and specified group id in metadata. Default is FALSE. |
assembly |
A string specifying which genome assembly version should be applied to CNV frequency calculation and plotting. Allowed options are "hg19", "hg38". Default is "hg38". |
bin_size |
Size of genomic bins used in CNV frequency calculation to split the genome, in base pairs (bp). Default is 1,000,000. |
overlap |
Numeric value defining the amount of overlap between bins and segments considered as bin-specific CNV, in base pairs (bp). Default is 1,000. |
soft_expansion |
Fraction of |
... |
Other parameters relevant to KM plot. These include |
Segment data, CNV frequency object, metadata, or KM plots from local "pgxseg" files
file_path <- system.file("extdata", "example.pgxseg",package = 'pgxRpi')
info <- pgxSegprocess(file=file_path,show_KM_plot = TRUE, return_seg = TRUE, return_metadata = TRUE)
This function calculates the frequency of deletions and duplications.
segtoFreq( data, cnv_column_idx = 6, cohort_name = "unspecified cohort", assembly = "hg38", bin_size = 1e+06, overlap = 1000, soft_expansion = 0.1 )
data |
Segment data with CNV states. The first four columns should specify sample ID, chromosome, start position, and end position, respectively. The column representing CNV states should contain either "DUP" for duplications or "DEL" for deletions. |
cnv_column_idx |
Index of the column specifying CNV state. Default is 6, following the "pgxseg" format used in Progenetix.
If the input segment data uses the general |
cohort_name |
A string specifying the cohort name. Default is "unspecified cohort". |
assembly |
A string specifying the genome assembly version for CNV frequency calculation. Allowed options are "hg19" or "hg38". Default is "hg38". |
bin_size |
Size of genomic bins used to split the genome, in base pairs (bp). Default is 1,000,000. |
overlap |
Numeric value defining the amount of overlap between bins and segments considered as bin-specific CNV, in base pairs (bp). Default is 1,000. |
soft_expansion |
Fraction of |
The binned CNV frequency stored in "pgxfreq" format
## load necessary data (this step can be skipped in real implementation)
data("hg38_cytoband")
## get pgxseg data
seg <- read.table(system.file("extdata", "example.pgxseg",package = 'pgxRpi'),header=TRUE)
## calculate frequency data
freq <- segtoFreq(seg)
## visualize
pgxFreqplot(freq)
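To make the roles of `bin_size` and `overlap` more concrete, the following base R sketch computes a binned CNV frequency on toy segments. It is a conceptual illustration only, not the pgxRpi implementation: the helper `bin_frequency`, the toy data, and the choice to report percentages are assumptions, and `soft_expansion` is not modeled.

```r
# Toy segments: sample ID, chromosome, start, end, and a CNV state column.
# All values are made up for illustration.
seg <- data.frame(
  sample = c("s1", "s1", "s2", "s3"),
  chrom  = c("1", "1", "1", "1"),
  start  = c(0, 2500000, 500000, 1999500),
  end    = c(1500000, 3000000, 2000000, 3000000),
  cnv    = c("DUP", "DEL", "DUP", "DEL")
)

# Hypothetical helper (not the pgxRpi implementation): percentage of samples
# with a CNV of the given state overlapping each bin by at least `overlap` bp.
bin_frequency <- function(seg, state, chrom_length, bin_size = 1e6, overlap = 1000) {
  bin_start <- seq(0, chrom_length - 1, by = bin_size)
  bin_end   <- pmin(bin_start + bin_size, chrom_length)
  n_samples <- length(unique(seg$sample))
  hits <- seg[seg$cnv == state, ]
  vapply(seq_along(bin_start), function(i) {
    # shared base pairs between each segment and bin i (negative if disjoint)
    shared <- pmin(hits$end, bin_end[i]) - pmax(hits$start, bin_start[i])
    100 * length(unique(hits$sample[shared >= overlap])) / n_samples
  }, numeric(1))
}

dup_freq <- bin_frequency(seg, "DUP", chrom_length = 3e6)
del_freq <- bin_frequency(seg, "DEL", chrom_length = 3e6)
```

In this toy data the deletion in sample s3 reaches only 500 bp into the second bin, which is below the 1,000 bp `overlap` threshold, so it does not count toward that bin.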