Title: | Position Related Data Analysis |
---|---|
Description: | Package for the position related analysis of quantitative functional genomics data. |
Authors: | Francesco Ferrari <[email protected]> |
Maintainer: | Francesco Ferrari <[email protected]> |
License: | GPL-2 |
Version: | 1.53.0 |
Built: | 2024-11-19 04:12:44 UTC |
Source: | https://github.com/bioc/PREDA |
Get the names of the analyses from StatisticsForPREDA objects, PREDAResults objects and objects from classes extending these classes.
analysesNames(.Object)
.Object |
an object of class StatisticsForPREDA, PREDAResults or any other class extending these classes |
Character vector of analysesNames
Francesco Ferrari
"StatisticsForPREDA", "PREDAResults"
require(PREDAsampledata)
data(SODEGIRGEanalysisResults)
analysesNames(SODEGIRGEanalysisResults)
Function to compute dataset signature for recurrent significant genomic regions
# computeDatasetSignature(.Object, genomicRegionsList=genomicRegionsList,
#     multTestCorrection="fdr", signature_qval_threshold=0.05,
#     returnRegions=TRUE, use.referencePositions=TRUE)
computeDatasetSignature(.Object, ...)
.Object |
Object of class GenomicAnnotationsForPREDA |
... |
See below
|
The function adopts a binomial test to identify significant recurrence of genomic regions across multiple dataset samples.
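As a rough illustration of a binomial test for recurrence, the base R sketch below counts, for each genomic position, in how many samples it falls inside a significant region and tests that count against a background probability. The matrix `region_calls`, the planted region and the choice of `p0` are hypothetical; this is not PREDA's internal code, and the null model the package actually uses may differ.

```r
# Hypothetical input: rows = genomic positions, columns = samples.
# TRUE means the position lies in a significant region in that sample.
set.seed(1)
region_calls <- matrix(runif(200 * 10) < 0.1, nrow = 200, ncol = 10)
region_calls[50:60, ] <- TRUE   # plant a region shared by all samples

n_samples <- ncol(region_calls)
hits <- rowSums(region_calls)

# Assumed null: background probability is the overall fraction of TRUE calls.
p0 <- mean(region_calls)

# One-sided binomial test per position, then FDR correction.
pvals <- vapply(
  hits,
  function(k) binom.test(k, n_samples, p = p0, alternative = "greater")$p.value,
  numeric(1)
)
qvals <- p.adjust(pvals, method = "fdr")

# Positions passing a q-value threshold of 0.05.
recurrent <- which(qvals <= 0.05)
```

The `"fdr"` adjustment and the 0.05 threshold mirror the default values shown in the usage line for `multTestCorrection` and `signature_qval_threshold`.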
A GenomicRegions object (if returnRegions = TRUE) or a PREDAResults object containing dataset signature statistics (if returnRegions = FALSE)
Francesco Ferrari
## Not run:
require(PREDAsampledata)
data(SODEGIRCNanalysisResults)
data(GEDataForPREDA)
SODEGIR_CN_GAIN <- PREDAResults2GenomicRegions(
    SODEGIRCNanalysisResults, qval.threshold=0.01,
    smoothStatistic.tail="upper", smoothStatistic.threshold=0.1)
CNgain_signature <- computeDatasetSignature(GEDataForPREDA,
    genomicRegionsList=SODEGIR_CN_GAIN)
## End(Not run)
This class is used to manage all of the data required as input for PREDA analysis: it is usually created by merging a GenomicAnnotationsForPREDA object and a StatisticsForPREDA object.
Objects can be created by calls of the form new("DataForPREDA", ids, chr, start, end, strand, chromosomesNumbers, chromosomesLabels, position, optionalAnnotations, optionalAnnotationsHeaders, statistic, analysesNames, testedTail).
position
:Object of class "integer"
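a numeric vector of reference genomic positions that will be associated and used for each genomic feature under investigation for smoothing data during PREDA analysis.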
ids
:Object of class "character"
a character vector of unique identifiers for the genomic features under investigation
chr
:Object of class "integer"
a numeric vector representing the chromosome to which each id is mapped.
Please note that chromosomes usually not represented by a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
User defined options will allow this conversion during GenomicAnnotations objects initialization.
start
:Object of class "integer"
a numeric vector of start genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
end
:Object of class "integer"
a numeric vector of end genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
strand
:Object of class "numeric"
a numeric vector of strand genomic position for each genomic feature under investigation: value 1 is used for "plus" (forward) strand and value -1 for "minus" (reverse) strand.
User defined options will allow the conversion to this format during GenomicAnnotations objects initialization.
chromosomesNumbers
:Object of class "numeric"
a numeric vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in increasing order. Please note that chromosomes usually not represented by a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
chromosomesLabels
:Object of class "character"
a character vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in the same order as reported in chromosomesNumbers slot.
This slot is used just to provide a label for each associated chromosome number, in case some non-numeric chromosome is used
(e.g. to preserve the correspondence between chr 23 and the actual chr X in Human)
optionalAnnotations
:Object of class "matrix"
optional annotations associated to the genomic features can be managed along with genomic positions annotations.
E.g. GeneSymbol or EntrezGene ids can be associated to gene related GenomicAnnotations objects.
These additional annotations are not mandatory (the default value for this slot is NULL).
The additional annotations must be provided as a matrix of character,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "optionalAnnotationsHeaders" slot.
optionalAnnotationsHeaders
:Object of class "character"
character vector containing the names associated to optional annotations. Please avoid using spaces in annotations names.
statistic
:Object of class "matrix"
a numeric matrix containing gene-centered statistics (or statistics on genomic data centered on other genomic features under investigation).
The statistics must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
analysesNames
:Object of class "character"
a character vector of unique names associated to each column of statistic matrix.
This is just a name that will be used to identify each analysis.
testedTail
:Object of class "character"
a character string describing which tail of the statistic distribution will be analyzed during PREDA analysis.
Possible values are "upper", "lower" or "both". We strongly recommend using PREDA analysis only
for statistics on genomic data with a symmetric distribution around zero.
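The encoding conventions described for the slots above (numeric chromosomes, strand as 1 or -1, and a statistic matrix whose dimensions match the ids and analysesNames slots) can be sketched in base R as follows. The input table and its column names are hypothetical, and the code only mimics the conventions; it does not build a DataForPREDA object.

```r
# Hypothetical annotation table; column names are illustrative only.
annot <- data.frame(
  id     = c("geneA", "geneB", "geneC"),
  chr    = c("1", "X", "Y"),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Labels and numbers are kept in the same order, as described for the
# chromosomesLabels and chromosomesNumbers slots.
chromosomesLabels  <- c(as.character(1:22), "X", "Y")
chromosomesNumbers <- 1:24

# Map each chromosome label to its number (X becomes 23, Y becomes 24).
chr_num <- chromosomesNumbers[match(annot$chr, chromosomesLabels)]

# Encode strand as 1 (plus) or -1 (minus).
strand_num <- ifelse(annot$strand == "+", 1, -1)

# One row per id and one column per analysis name.
analysesNames <- c("sample1", "sample2")
statistic <- matrix(
  c(0.5, -1.2, 0.1, 2.0, 0.3, -0.7),
  nrow = length(annot$id), ncol = length(analysesNames),
  dimnames = list(annot$id, analysesNames)
)
stopifnot(nrow(statistic) == length(annot$id),
          ncol(statistic) == length(analysesNames))
```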
Class "GenomicAnnotationsForPREDA", directly.
Class "StatisticsForPREDA", directly.
Class "GenomicAnnotations", by class "GenomicAnnotationsForPREDA", distance 2.
signature(.Object = "DataForPREDA")
: extract data and annotations as a dataframe with probeids as rownames
signature(.Object = "DataForPREDA")
: extract a GenomicAnnotationsForPREDA object from a DataForPREDA object
signature(.Object = "DataForPREDA")
: extract a StatisticsForPREDA object from a DataForPREDA object
signature(.Object = "DataForPREDA")
: filter annotations to remove selected chromosomes
signature(.Object = "DataForPREDA")
: filter annotations to keep selected chromosomes
signature(.Object = "DataForPREDA")
: sort annotations according to selected chromosomes and remove genes containing any NA annotation field
signature(.Object = "DataForPREDA")
: initialize method for DataForPREDA objects
signature(.Object = "DataForPREDA")
: filter statistics to remove selected analyses
signature(.Object = "DataForPREDA")
: filter statistics to keep selected analyses
This class is better described in the package vignette
Francesco Ferrari
"GenomicAnnotations", "GenomicAnnotationsForPREDA", "StatisticsForPREDA", DataForPREDA2dataframe, DataForPREDA2GenomicAnnotationsForPREDA, DataForPREDA2StatisticsForPREDA, GenomicAnnotationsFilter_neg, GenomicAnnotationsFilter_pos, GenomicAnnotationsSortAndCleanNA, StatisticsForPREDAFilterColumns_neg, StatisticsForPREDAFilterColumns_pos
showClass("DataForPREDA")
extract data and annotations as a dataframe with probeids as rownames
DataForPREDA2dataframe(.Object)
.Object |
An object of class DataForPREDA |
extract data and annotations as a dataframe with probeids as rownames
a dataframe with probeids as rownames
extract a GenomicAnnotationsForPREDA object from a DataForPREDA object
DataForPREDA2GenomicAnnotationsForPREDA(.Object)
.Object |
an object of class DataForPREDA |
extract a GenomicAnnotationsForPREDA object from a DataForPREDA object
a GenomicAnnotationsForPREDA object
extract a StatisticsForPREDA object from a DataForPREDA object
DataForPREDA2StatisticsForPREDA(.Object)
.Object |
a DataForPREDA object |
extract a StatisticsForPREDA object from a DataForPREDA object
a StatisticsForPREDA object
Function to scale median value of DataForPREDA statistics to zero
DataForPREDAMedianCenter(.Object, ...)
.Object |
a DataForPREDA object |
... |
Scale median value of DataForPREDA statistics to zero
a DataForPREDA object
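Median centering of a statistic matrix can be pictured with the base R sketch below. The matrix is hypothetical, and centering each column (analysis) on its own median is an assumption made for illustration; the package may instead center in a different way.

```r
# Hypothetical statistic matrix: rows = features, columns = analyses.
statistic <- matrix(
  c(1, 2, 3, 10, 20, 60), nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("a1", "a2"))
)

# Assumption: each column is shifted so that its own median becomes zero.
col_medians <- apply(statistic, 2, median, na.rm = TRUE)
centered <- sweep(statistic, 2, col_medians, FUN = "-")

# Check the medians after centering.
apply(centered, 2, median)
```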
Function building a GenomicAnnotations object from an ExpressionSet object
# eset2GenomicAnnotations(.Object, retain.chrs,
#     optionalAnnotations)
eset2GenomicAnnotations(.Object, ...)
.Object |
ExpressionSet object. The associated annotation library will be used to build a GenomicAnnotations object. |
... |
See below
|
An object of class "GenomicAnnotations"
Francesco Ferrari
## Not run:
require("PREDAsampledata")
data(ExpressionSetRCC)
GEGenomicAnnotations <- eset2GenomicAnnotations(ExpressionSetRCC, retain.chrs=1:22)
## End(Not run)
draw a genome plot with user defined genomic regions
# genomePlot(.Object, genomicRegions=NULL, draw.blocks=TRUE,
#     parallel.plot=TRUE, grouping=NULL, custom.labels=NULL,
#     scale.positions=NULL, qval.threshold=0.05,
#     use.referencePositions=FALSE, smoothStatistic.tail=NULL,
#     smoothStatistic.threshold=NULL, region.colors=NULL,
#     limitChrs=NULL)
genomePlot(.Object, ...)
.Object |
Object of class GenomicAnnotationsForPREDA, or any other class extending this one. |
... |
See below
|
See also the PREDA tutorial vignette for more details and sample usage
A plot of the genome with significant GenomicRegions
Francesco Ferrari
PREDAResults2GenomicRegions, PREDAResults, PREDADataAndResults, GenomicAnnotationsForPREDA
## See PREDA tutorial vignette for some examples
This class is used to manage information about genomic features under investigation (i.e. genes, SNPs or others), with particular focus on the genomic coordinates of each of them. Additional annotations associated to each element can be stored in a GenomicAnnotations object in the optionalAnnotations slot.
Objects can be created by calls of the form new("GenomicAnnotations", ids, chr, start, end, strand, chromosomesNumbers, chromosomesLabels, optionalAnnotations, optionalAnnotationsHeaders).
ids
:Object of class "character"
chr
:Object of class "integer"
start
:Object of class "integer"
end
:Object of class "integer"
strand
:Object of class "numeric"
chromosomesNumbers
:Object of class "numeric"
chromosomesLabels
:Object of class "character"
optionalAnnotations
:Object of class "matrix"
optionalAnnotationsHeaders
:Object of class "character"
signature(.Object = "GenomicAnnotations")
: extracts annotations as a dataframe with probeids as rownames
signature(.Object = "GenomicAnnotations")
: generate a new GenomicAnnotationsForPREDA object from a GenomicAnnotations object
signature(.Object = "GenomicAnnotations")
: extract from the GenomicAnnotations object a vector of reference positions
signature(.Object = "GenomicAnnotations")
: extract optional annotations for a specific region
signature(.Object = "GenomicAnnotations")
: filter annotations to remove selected chromosomes
signature(.Object = "GenomicAnnotations")
: filter annotations to keep selected chromosomes
signature(.Object = "GenomicAnnotations")
: sort annotations according to selected chromosomes and remove genes containing any NA annotation field
signature(.Object1 = "GenomicRegions", .Object2 = "GenomicAnnotations")
: extract annotations from a GenomicAnnotations object for a set of regions specified as a GenomicRegions object
signature(.Object = "GenomicAnnotations")
: initialize method for GenomicAnnotations objects
This class is better described in the package vignette
Francesco Ferrari
GenomicAnnotations2dataframe, GenomicAnnotations2GenomicAnnotationsForPREDA, GenomicAnnotations2reference_positions, GenomicAnnotationsExtract, GenomicAnnotationsFilter_neg, GenomicAnnotationsFilter_pos, GenomicAnnotationsSortAndCleanNA, GenomicRegionsAnnotate
showClass("GenomicAnnotations")
extracts annotations as a dataframe with probeids as rownames
GenomicAnnotations2dataframe(.Object)
.Object |
A GenomicAnnotations object |
extract annotations as a dataframe with probeids as rownames
a dataframe with probeids as rownames
generate a new GenomicAnnotationsForPREDA object from a GenomicAnnotations object
# GenomicAnnotations2GenomicAnnotationsForPREDA(.Object,
#     positions=NULL, reference_position_type=NULL)
GenomicAnnotations2GenomicAnnotationsForPREDA(.Object, ...)
.Object |
An object of class GenomicAnnotations |
... |
See below
|
A GenomicAnnotationsForPREDA object
Francesco Ferrari
## Not run:
GEGenomicAnnotations <- GenomicAnnotationsFromLibrary(annotLibrary = "org.Hs.eg.db", retain.chrs=1:22)
GEGenomicAnnotationsForPREDA <- GenomicAnnotations2GenomicAnnotationsForPREDA(
    GEGenomicAnnotations, reference_position_type="median")
## End(Not run)
extract from the GenomicAnnotations object a vector of reference positions
# GenomicAnnotations2reference_positions(.Object,
#     reference_position_type=c("start", "end", "median", "strand.start", "strand.end"),
#     withnames=TRUE)
GenomicAnnotations2reference_positions(.Object, ...)
.Object |
Object of class GenomicAnnotations |
... |
See below
|
A numeric vector with the selected reference positions.
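To make the `reference_position_type` options more concrete, the base R sketch below derives one position per feature from start, end and strand vectors. The helper `reference_positions` is hypothetical and not part of PREDA. In particular, reading "strand.start" as the start on the plus strand and the end on the minus strand is an assumption, as is leaving the median unrounded.

```r
# Hypothetical helper, written only to illustrate the five options.
reference_positions <- function(start, end, strand,
                                type = c("start", "end", "median",
                                         "strand.start", "strand.end")) {
  type <- match.arg(type)
  switch(type,
    start        = start,
    end          = end,
    median       = (start + end) / 2,
    # Assumption: strand-aware start is `start` on plus, `end` on minus.
    strand.start = ifelse(strand == 1, start, end),
    # Assumption: strand-aware end is the opposite coordinate.
    strand.end   = ifelse(strand == 1, end, start)
  )
}

# Two features: one on the plus strand, one on the minus strand.
start  <- c(100, 500)
end    <- c(200, 900)
strand <- c(1, -1)

reference_positions(start, end, strand, "median")
reference_positions(start, end, strand, "strand.start")
```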
extract optional annotations for a specific region
# GenomicAnnotationsExtract(.Object, chr, start, end,
#     AnnotationsHeader=NULL, sep.character="; ",
#     complete.inclusion=FALSE, skipSorting=FALSE,
#     annotationAsRange=FALSE, getJustFeaturesNumber=FALSE)
GenomicAnnotationsExtract(.Object, ...)
.Object |
An object of class GenomicAnnotations |
... |
See below
|
Extract annotations associated to a specific genomic region from a GenomicAnnotations object. Only annotations from the specified columns are returned.
A character vector is returned
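The idea of extracting annotations for a region can be sketched in base R as below. The table `annot` and the helper `extract_annotations` are hypothetical, and the interpretation of `complete.inclusion` (features fully contained in the region versus features merely overlapping it) is an assumption made for illustration.

```r
# Hypothetical annotation table with one optional annotation column.
annot <- data.frame(
  chr        = c(1, 1, 1, 2),
  start      = c(100, 400, 900, 150),
  end        = c(300, 600, 1200, 250),
  GeneSymbol = c("AAA", "BBB", "CCC", "DDD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the selected annotation column for all
# features on `chr` that fall in the interval [start, end].
extract_annotations <- function(annot, chr, start, end, header,
                                sep.character = "; ",
                                complete.inclusion = FALSE) {
  on_chr <- annot$chr == chr
  if (complete.inclusion) {
    # Assumed meaning: the feature lies entirely inside the region.
    in_region <- annot$start >= start & annot$end <= end
  } else {
    # Assumed meaning: any overlap with the region is enough.
    in_region <- annot$end >= start & annot$start <= end
  }
  paste(annot[[header]][on_chr & in_region], collapse = sep.character)
}

overlapping <- extract_annotations(annot, chr = 1, start = 250, end = 700,
                                   header = "GeneSymbol")
contained <- extract_annotations(annot, chr = 1, start = 250, end = 700,
                                 header = "GeneSymbol",
                                 complete.inclusion = TRUE)
```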
filter annotations to remove selected chromosomes
# GenomicAnnotationsFilter_neg(.Object, chrToRemove, chrAsLabels=FALSE)
GenomicAnnotationsFilter_neg(.Object, ...)
.Object |
An object of class GenomicAnnotations or classes inheriting from GenomicAnnotations |
... |
See below
|
filter annotations to keep selected chromosomes
# GenomicAnnotationsFilter_pos(.Object, chrToRetain, chrAsLabels=FALSE)
GenomicAnnotationsFilter_pos(.Object, ...)
.Object |
An object of class GenomicAnnotations or classes inheriting from GenomicAnnotations |
... |
See below
|
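The negative and positive chromosome filters can be pictured as complementary logical selections on the chromosome vector. The base R sketch below uses hypothetical slot contents and does not call PREDA. Treating `chrAsLabels=TRUE` as "translate labels to numbers before filtering" is an assumption.

```r
# Hypothetical slot contents; not taken from a real PREDA object.
ids <- c("g1", "g2", "g3", "g4", "g5")
chr <- c(1, 2, 23, 24, 2)
chromosomesNumbers <- 1:24
chromosomesLabels  <- c(as.character(1:22), "X", "Y")

# Negative filter: drop features on the selected chromosomes.
chrToRemove <- c(23, 24)
ids_neg <- ids[!(chr %in% chrToRemove)]

# Positive filter: keep only features on the selected chromosomes.
chrToRetain <- 2
ids_pos <- ids[chr %in% chrToRetain]

# Assumed meaning of chrAsLabels=TRUE: chromosomes are given as labels
# and translated to numbers before filtering.
labels_to_remove <- c("X", "Y")
numbers_to_remove <- chromosomesNumbers[match(labels_to_remove, chromosomesLabels)]
ids_neg_labels <- ids[!(chr %in% numbers_to_remove)]
```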
This class is equivalent to the GenomicAnnotations class but includes an additional slot, "position", specifying the reference position that will be used for PREDA smoothing of data. A unique reference position is required for PREDA analysis because this position is used for smoothing data along chromosomal coordinates. This reference position is usually the start, the end, or the median position of each considered genomic feature; nevertheless, other user defined positions could be used as well.
Objects can be created by calls of the form new("GenomicAnnotationsForPREDA", ids, chr, start, end, strand, chromosomesNumbers, chromosomesLabels, position, optionalAnnotations, optionalAnnotationsHeaders).
position
:Object of class "integer"
a numeric vector of reference genomic positions that will be associated and used for each genomic feature under investigation for smoothing data during PREDA analysis.
ids
:Object of class "character"
a character vector of unique identifiers for the genomic features under investigation
chr
:Object of class "integer"
a numeric vector representing the chromosome to which each id is mapped.
Please note that chromosomes usually not represented by a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
User defined options will allow this conversion during GenomicAnnotations objects initialization.
start
:Object of class "integer"
a numeric vector of start genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
end
:Object of class "integer"
a numeric vector of end genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
strand
:Object of class "numeric"
a numeric vector of strand genomic position for each genomic feature under investigation: value 1 is used for "plus" (forward) strand and value -1 for "minus" (reverse) strand.
User defined options will allow the conversion to this format during GenomicAnnotations objects initialization.
chromosomesNumbers
:Object of class "numeric"
a numeric vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in increasing order. Please note that chromosomes usually not represented by a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
chromosomesLabels
:Object of class "character"
a character vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in the same order as reported in chromosomesNumbers slot.
This slot is used just to provide a label for each associated chromosome number, in case some non-numeric chromosome is used
(e.g. to preserve the correspondence between chr 23 and the actual chr X in Human)
optionalAnnotations
:Object of class "matrix"
optional annotations associated to the genomic features can be managed along with genomic positions annotations.
E.g. GeneSymbol or EntrezGene ids can be associated to gene related GenomicAnnotations objects.
These additional annotations are not mandatory (the default value for this slot is NULL).
The additional annotations must be provided as a matrix of character,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "optionalAnnotationsHeaders" slot.
optionalAnnotationsHeaders
:Object of class "character"
character vector containing the names associated to optional annotations. Please avoid using spaces in annotations names.
Class "GenomicAnnotations", directly.
signature(.Object = "GenomicAnnotationsForPREDA")
: draw a genome plot
signature(.Object = "GenomicAnnotationsForPREDA")
: extract annotations as a dataframe with probeids as rownames
signature(.Object = "GenomicAnnotationsForPREDA")
: filter annotations to remove selected chromosomes
signature(.Object = "GenomicAnnotationsForPREDA")
: filter annotations to keep selected chromosomes
signature(.Object = "GenomicAnnotationsForPREDA")
: extract annotations as a dataframe with probeids as rownames
signature(.Object = "GenomicAnnotationsForPREDA")
: extract the GenomicAnnotations object from the GenomicAnnotationsForPREDA object
signature(.Object = "GenomicAnnotationsForPREDA")
: add PREDA results information to genomic annotations creating a PREDAResults object
signature(.Object = "GenomicAnnotationsForPREDA")
: sort annotations according to selected chromosomes and remove genes containing any NA annotation field
signature(.Object = "GenomicAnnotationsForPREDA")
: initialize method for GenomicAnnotationsForPREDA objects
This class is better described in the package vignette
Francesco Ferrari
"GenomicAnnotations", GenomicAnnotationsSortAndCleanNA, GenomicAnnotationsForPREDA2PREDAResults, GenomicAnnotationsForPREDA2GenomicAnnotations, GenomicAnnotationsForPREDA2dataframe, GenomicAnnotationsFilter_pos, GenomicAnnotationsFilter_neg, GenomicAnnotations2dataframe, genomePlot
showClass("GenomicAnnotationsForPREDA")
extract annotations as a dataframe with probeids as rownames
GenomicAnnotationsForPREDA2dataframe(.Object)
.Object |
an object of class GenomicAnnotationsForPREDA |
extract annotations from an object of class GenomicAnnotationsForPREDA as a dataframe with probeids as rownames
a dataframe with probeids as rownames
extract the GenomicAnnotations object from the GenomicAnnotationsForPREDA object
GenomicAnnotationsForPREDA2GenomicAnnotations(.Object)
.Object |
an object of class GenomicAnnotationsForPREDA |
add PREDA results information to genomic annotations creating a PREDAResults object
# GenomicAnnotationsForPREDA2PREDAResults(.Object, analysesNames,
#     testedTail, smoothStatistic, pvalue, qvalue)
GenomicAnnotationsForPREDA2PREDAResults(.Object, ...)
.Object |
An object of class GenomicAnnotationsForPREDA |
... |
See below
|
Function to create a GenomicAnnotationsForPREDA object from a txt file
GenomicAnnotationsForPREDAFromfile(file, ids_column, chr_column,
    start_column, end_column, strand_column,
    chromosomesNumbers = NULL, chromosomesLabels = NULL,
    chromosomesLabelsInput = NULL, MinusStrandString = "-",
    PlusStrandString = "+", optionalAnnotationsColumns = NULL,
    reference_position_type = "median", ...)
file |
Path to the input txt file containing genomic annotations |
ids_column |
Specify the column from the input txt file with gene (or other genomic features) ids. Can be specified using column index (numeric) or column name (character). |
chr_column |
Specify the column from the input txt file with chromosome annotations fields for each ids. Can be specified using column index (numeric) or column name (character). |
start_column |
Specify the column from the input txt file with genomic start position for each genomic element. Can be specified using column index (numeric) or column name (character). |
end_column |
Specify the column from the input txt file with genomic end position for each genomic element. Can be specified using column index (numeric) or column name (character). |
strand_column |
Specify the column from the input txt file with genomic strand mapping for each genomic element. Can be specified using column index (numeric) or column name (character). |
chromosomesNumbers |
Numeric vector to specify the list of numeric values to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabels |
Character vector to specify the list of character labels to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabelsInput |
Character vector to specify the list of character labels associated to each chromosome in the input file. Particularly useful when non-numeric character strings are associated to each chromosome in the input file: e.g. "chr3" for chromosome "3". |
MinusStrandString |
Character string used to identify minus strand in the input text file |
PlusStrandString |
Character string used to identify plus strand in the input text file |
optionalAnnotationsColumns |
Character vector of columns headers or numeric vector of columns indices to specify columns of the input file containing additional annotation fields |
reference_position_type |
Character string to specify which genomic coordinate must be
used as reference position for PREDA analysis. See also
|
... |
any other parameter for read.table function that could be useful for parsing the input file, such as "sep", "quote", "header", "na.strings" and other parameters. |
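As an illustration of how the extra `read.table` parameters and the column selectors interact, the base R sketch below parses a small temporary file and picks columns either by index or by name. The file content, the column names and the helper `pick_column` are hypothetical; the sketch does not call PREDA itself.

```r
# Write a small hypothetical annotation file to a temporary location.
annotation_file <- tempfile(fileext = ".csv")
writeLines(c(
  "ProbeID,Chromosome,Strand,Position,Cytoband",
  "snp1,1,+,1500,1p36",
  "snp2,2,-,---,2q11",
  "snp3,X,+,3200,"
), annotation_file)

# Parsing options of the kind that would be passed through "...".
raw <- read.table(annotation_file, header = TRUE, sep = ",", quote = "\"",
                  na.strings = c("NA", "", "---"), stringsAsFactors = FALSE)

# Hypothetical helper: a column can be addressed by index or by name.
pick_column <- function(df, column) df[[column]]

ids <- pick_column(raw, 1)             # by numeric index
chr <- pick_column(raw, "Chromosome")  # by column name

unlink(annotation_file)
```

Listing `"---"` and the empty string in `na.strings` turns those placeholder fields into `NA`, which matters when rows with missing annotation fields are later removed.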
An object of class "GenomicAnnotationsForPREDA"
Francesco Ferrari
## Not run:
data(PREDAsampledata)
CNdataPath <- system.file("sampledata", "CopyNumber", package = "PREDAsampledata")
CNannotationFile <- file.path(CNdataPath, "SNPAnnot100k.csv")
CNGenomicsAnnotations <- GenomicAnnotationsForPREDAFromfile(
    file=CNannotationFile, ids_column=1, chr_column="Chromosome",
    start_column=4, end_column=4, strand_column="Strand",
    chromosomesLabelsInput=1:22, MinusStrandString="-", PlusStrandString="+",
    optionalAnnotationsColumns=c("Cytoband", "Entrez_gene"),
    header=TRUE, sep=",", quote="\"", na.strings = c("NA", "", "---"))
## End(Not run)
Function to create a GenomicAnnotations object from a dataframe
GenomicAnnotationsFromdataframe(GenomicAnnotations_dataframe, ids_column,
    chr_column, start_column, end_column, strand_column,
    chromosomesNumbers = NULL, chromosomesLabels = NULL,
    chromosomesLabelsInput = NULL, MinusStrandString = "-",
    PlusStrandString = "+", optionalAnnotationsColumns = NULL)
GenomicAnnotations_dataframe |
Dataframe object containing genomic annotations. |
ids_column |
Specify the column from the input dataframe with gene (or other genomic features) ids. Can be specified using column index (numeric) or column name (character). |
chr_column |
Specify the column from the input dataframe with chromosome annotations fields for each id. Can be specified using column index (numeric) or column name (character). |
start_column |
Specify the column from the input dataframe with genomic start position for each genomic element. Can be specified using column index (numeric) or column name (character). |
end_column |
Specify the column from the input dataframe with genomic end position for each genomic element. Can be specified using column index (numeric) or column name (character). |
strand_column |
Specify the column from the input dataframe with genomic strand mapping for each genomic element. Can be specified using column index (numeric) or column name (character). |
chromosomesNumbers |
Numeric vector to specify the list of numeric values to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabels |
Character vector to specify the list of character labels to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabelsInput |
Character vector to specify the list of character labels associated to each chromosome in the input dataframe. Particularly useful when non-numeric character strings are associated to each chromosome in the input dataframe: e.g. "chr3" for chromosome "3". |
MinusStrandString |
Character string used to identify minus strand in the input dataframe |
PlusStrandString |
Character string used to identify plus strand in the input dataframe |
optionalAnnotationsColumns |
Character vector of columns headers or numeric vector of columns indices to specify columns of the input dataframe containing additional annotation fields |
An object of class "GenomicAnnotations"
Francesco Ferrari
Function to create a GenomicAnnotations object from a text file
GenomicAnnotationsFromfile(file, ids_column, chr_column,
    start_column, end_column, strand_column,
    chromosomesNumbers = NULL, chromosomesLabels = NULL,
    chromosomesLabelsInput = NULL, MinusStrandString = "-",
    PlusStrandString = "+", optionalAnnotationsColumns = NULL, ...)
file |
Path to the input txt file containing genomic annotations |
ids_column |
Specify the column from the input txt file with gene (or other genomic features) ids. Can be specified using column index (numeric) or column name (character). |
chr_column |
Specify the column from the input txt file with chromosome annotations fields for each ids. Can be specified using column index (numeric) or column name (character). |
start_column |
Specify the column from the input txt file with genomic start position for each genomic element. Can be specified using column index (numeric) or column name (character). |
end_column |
Specify the column from the input txt file with genomic end position for each genomic element. Can be specified using column index (numeric) or column name (character). |
strand_column |
Specify the column from the input txt file with genomic strand mapping for each genomic element. Can be specified using column index (numeric) or column name (character). |
chromosomesNumbers |
Numeric vector to specify the list of numeric values to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabels |
Character vector to specify the list of character labels to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabelsInput |
Character vector to specify the list of character labels associated to each chromosome in the input file. Particularly useful when non-numeric character strings are associated to each chromosome in the input file: e.g. "chr3" for chromosome "3". |
MinusStrandString |
Character string used to identify minus strand in the input text file |
PlusStrandString |
Character string used to identify plus strand in the input text file |
optionalAnnotationsColumns |
Character vector of columns headers or numeric vector of columns indices to specify columns of the input file containing additional annotation fields |
... |
any other parameter for read.table function that could be useful for parsing the input file, such as "sep", "quote", "header", "na.strings" and other parameters. |
An object of class "GenomicAnnotations"
Francesco Ferrari
## Not run:
data(PREDAsampledata)
CNdataPath <- system.file("sampledata", "CopyNumber", package = "PREDAsampledata")
CNannotationFile <- file.path(CNdataPath, "SNPAnnot100k.csv")
CNGenomicsAnnotations <- GenomicAnnotationsForPREDAFromfile(
  file = CNannotationFile, ids_column = 1, chr_column = "Chromosome",
  start_column = 4, end_column = 4, strand_column = "Strand",
  chromosomesLabelsInput = 1:22,
  MinusStrandString = "-", PlusStrandString = "+",
  optionalAnnotationsColumns = c("Cytoband", "Entrez_gene"),
  header = TRUE, sep = ",", quote = "\"", na.strings = c("NA", "", "---"))
## End(Not run)
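The chromosome and strand arguments described above amount to lookup tables that translate what is written in the input file into the numeric codes stored in the object. The following base R sketch illustrates that idea only; it does not call PREDA, and the `annot` table and its column names are hypothetical.

```r
# Hypothetical input table; the column names are illustrative only.
annot <- data.frame(
  id     = c("g1", "g2", "g3", "g4"),
  chr    = c("chr1", "chrX", "chr2", "chrY"),
  strand = c("+", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# Labels as they appear in the input, and the numbers/labels to store.
chromosomesLabelsInput <- c("chr1", "chr2", "chrX", "chrY")
chromosomesNumbers     <- c(1, 2, 23, 24)
chromosomesLabels      <- c("1", "2", "X", "Y")

# match() returns, for each row, the position of its label in the lookup vector.
idx <- match(annot$chr, chromosomesLabelsInput)
annot$chr_number <- chromosomesNumbers[idx]
annot$chr_label  <- chromosomesLabels[idx]

# Strand strings become 1 (plus) or -1 (minus); anything else becomes NA.
annot$strand_num <- ifelse(annot$strand == "+", 1,
                           ifelse(annot$strand == "-", -1, NA))

print(annot)
```

The three chromosome vectors are assumed here to be parallel, i.e. the i-th input label corresponds to the i-th number and the i-th output label.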
Function extracting a GenomicAnnotations object from a Bioconductor annotation library
GenomicAnnotationsFromLibrary(annotLibrary, probeIDs = NULL, retain.chrs = NULL, optionalAnnotations = NULL)
annotLibrary |
Character string containing the name of the annotations library to be used for building the GenomicAnnotations object |
probeIDs |
Optional: list of reference ids from the selected annotLibrary to be used for building the GenomicAnnotations object |
retain.chrs |
Numeric vector containing the list of chromosomes selected for the output GenomicAnnotations object. E.g. set retain.chrs=1:22 to limit the GenomicAnnotations object to chromosomes from 1 to 22. This might be useful to limit GenomicAnnotations objects to autosomal chromosomes. |
optionalAnnotations |
Character vector to select additional annotations fields to be included into the GenomicAnnotations object. |
An object of class "GenomicAnnotations"
Francesco Ferrari
## Not run:
GEGenomicAnnotations <- GenomicAnnotationsFromLibrary(
  annotLibrary = "org.Hs.eg.db", retain.chrs = 1:22)
# with optional annotations Genesymbols and EntrezGeneIDs
GEGenomicAnnotations <- GenomicAnnotationsFromLibrary(
  annotLibrary = "hgu133plus2.db", retain.chrs = 1:22,
  optionalAnnotations = c("SYMBOL", "ENTREZID"))
## End(Not run)
Sort annotations according to selected chromosomes and remove genes containing any NA annotation field
# GenomicAnnotationsSortAndCleanNA(.Object, sorting_position_column="start")
GenomicAnnotationsSortAndCleanNA(.Object, ...)
.Object |
An object of class GenomicAnnotations or any object inheriting from GenomicAnnotations |
... |
See below
|
This class is used to manage genomic regions information that can be derived from PREDA analysis results or from other sources: e.g. relevant genomic regions from literature reports can be imported into a GenomicRegions object and compared with PREDA analysis results.
Objects can be created by calls of the form new("GenomicRegions", chr, start, end, chromosomesNumbers, chromosomesLabels, optionalAnnotations, optionalAnnotationsHeaders, ids).
chr
:Object of class "integer"
a numeric vector representing the chromosome where each genomic region is located.
Please note that chromosomes usually not represented with a number will be converted to a number as well:
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
User defined options will allow this conversion during GenomicAnnotations objects initialization.
start
:Object of class "integer"
a numeric vector of start genomic position for each genomic region. This vector must have the same length of "chr" slot.
end
:Object of class "integer"
a numeric vector of end genomic position for each genomic region. This vector must have the same length of "chr" slot.
chromosomesNumbers
:Object of class "numeric"
a numeric vector containing the list of chromosomes associated to genomic regions in the GenomicRegions object.
Each chromosome is represented just once in increasing order. Please note that chromosomes usually not represented with a number will be converted to a number as well:
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
chromosomesLabels
:Object of class "character"
a character vector containing the list of chromosomes associated to genomic regions in the GenomicRegions object.
Each chromosome is represented just once in the same order as reported in chromosomesNumbers slot.
This slot is used just to provide a label for each associated chromosome number, in case some non-numeric chromosome is used
(e.g. to preserve the correspondence between chr 23 and the actual chr X in Human)
optionalAnnotations
:Object of class "matrix"
optional annotations associated to the genomic regions can be managed along with GenomicRegions objects.
E.g. the list of GeneSymbol or EntrezGene ids associated to each genomic region can be provided as optional annotation.
These additional annotations are not mandatory (the default value for this slot is NULL)
The additional annotations must be provided as a matrix of character,
with a number of rows equal to the length of "chr", "start" and "end" slots and a number of columns equal
to the length of "optionalAnnotationsHeaders" slot.
optionalAnnotationsHeaders
:Object of class "character"
the list of names associated to optional annotations. Please avoid using spaces in annotations names.
ids
:Object of class "character"
a character vector of unique identifiers associated to each genomic region. This is just an optional element of GenomicRegions objects: the default value is NULL.
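The length constraints stated for the slots above can be summarized as a small consistency check. The sketch below uses a plain list as a stand-in for a GenomicRegions object; the `regions` list and the `check_regions` helper are hypothetical and are not part of PREDA.

```r
# A plain list mirroring the slot names described above (not an S4 object).
regions <- list(
  chr   = c(1L, 1L, 23L),
  start = c(100L, 5000L, 200L),
  end   = c(900L, 7500L, 450L),
  chromosomesNumbers = c(1, 23),
  chromosomesLabels  = c("1", "X"),
  optionalAnnotations = matrix(c("GENE_A", "GENE_B", "GENE_C"), ncol = 1),
  optionalAnnotationsHeaders = "GeneSymbol",
  ids = NULL
)

# Hypothetical helper: TRUE when the slot lengths agree as documented.
check_regions <- function(x) {
  n <- length(x$chr)
  ok <- length(x$start) == n &&
    length(x$end) == n &&
    all(x$chr %in% x$chromosomesNumbers) &&
    length(x$chromosomesNumbers) == length(x$chromosomesLabels)
  if (!is.null(x$optionalAnnotations)) {
    # One row per region, one column per annotation header.
    ok <- ok &&
      nrow(x$optionalAnnotations) == n &&
      ncol(x$optionalAnnotations) == length(x$optionalAnnotationsHeaders)
  }
  ok
}

check_regions(regions)
```

Whether the actual class validity method performs exactly these checks is an assumption; the sketch only restates the constraints listed in the slot descriptions.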
GenomicRegions2dataframe
signature(.Object = "GenomicRegions"): extract genomic regions information as a dataframe object
GenomicRegionsAnnotate
signature(.Object1 = "GenomicRegions", .Object2 = "GenomicAnnotations"): extract annotations from a GenomicAnnotations object for a set of regions specified as a GenomicRegions object
GenomicRegionsChrNumber
signature(.Object = "GenomicRegions"): determine the number of chromosomes with genomic regions
GenomicRegionsComparison
signature(.Object1 = "GenomicRegions", .Object2 = "GenomicRegions"): compare GenomicRegions objects to identify overlaps
GenomicRegionsCreateRegionsIds
signature(.Object = "GenomicRegions"): generate unique ids for GenomicRegions objects
GenomicRegionsFilter_neg
signature(.Object = "GenomicRegions"): filter genomic regions to remove selected chromosomes
GenomicRegionsFilter_pos
signature(.Object = "GenomicRegions"): filter genomic regions to keep selected chromosomes
GenomicRegionsNumber
signature(.Object = "GenomicRegions"): determine the number of genomic regions
GenomicRegionsSpan
signature(.Object = "GenomicRegions"): determine the span of each genomic region
GenomicRegionsTotalSpan
signature(.Object = "GenomicRegions"): determine the total span of genomic regions
initialize
signature(.Object = "GenomicRegions"): initialize method for GenomicRegions objects
This class is better described in the package vignette
Francesco Ferrari
GenomicAnnotationsSortAndCleanNA, PREDADataAndResults2dataframe
showClass("GenomicRegions")
extract genomic regions information as a dataframe object
GenomicRegions2dataframe(GenomicRegionsObject)
GenomicRegionsObject |
Object of class GenomicRegions |
Extract genomic regions information as a dataframe object
A dataframe object
Francesco Ferrari
## Not run:
require(PREDAsampledata)
data(GEanalysisResults)
genomic_regions_UP <- PREDAResults2GenomicRegions(GEanalysisResults,
  qval.threshold = 0.05, smoothStatistic.tail = "upper",
  smoothStatistic.threshold = 0.5)
dataframe_UPregions <- GenomicRegions2dataframe(genomic_regions_UP[[1]])
## End(Not run)
extract annotations from a GenomicAnnotations object for a set of regions specified as a GenomicRegions object
# GenomicRegionsAnnotate(.Object1, .Object2,
#   AnnotationsHeaders=NULL, sep.character="; ",
#   complete.inclusion=FALSE, annotationAsRange=FALSE,
#   getJustFeaturesNumber=FALSE)
GenomicRegionsAnnotate(.Object1, .Object2, ...)
.Object1 |
An object of class GenomicRegions |
.Object2 |
An object of class GenomicAnnotations |
... |
See below
|
The annotation features overlapping the input genomic regions are used to add optional annotation fields to the GenomicRegions object.
If previous optional annotation fields are present, they are preserved in the output object.
A GenomicRegions object with optionalAnnotations
determine the number of chromosomes with genomic regions
GenomicRegionsChrNumber(.Object)
.Object |
An object of class GenomicRegions |
compare GenomicRegions objects to identify overlaps and differences
GenomicRegionsComparison(.Object1, .Object2)
.Object1 |
An object of Class GenomicRegions |
.Object2 |
An object of Class GenomicRegions |
Compare GenomicRegions objects to identify overlaps and differences
A list containing:
overlapping.regions |
GenomicRegions object describing the overlapping regions between input object1 and object2 |
difference.1.2 |
GenomicRegions object describing the regions from input object1 not overlapping regions from object2 |
difference.2.1 |
GenomicRegions object describing the regions from input object2 not overlapping regions from object1 |
GenomicRegions1.number |
Number of genomic regions in input object1 |
GenomicRegions2.number |
Number of genomic regions in input object2 |
overlapping.number |
Number of overlapping genomic regions between input object1 and object2 |
GenomicRegions1.totalspan |
Total span of genomic regions in input object1 |
GenomicRegions2.totalspan |
Total span of genomic regions in input object2 |
overlapping.totalspan |
Total span of overlapping genomic regions between input object1 and object2 |
overlap.VS.GenomicRegions1.ratio |
Ratio between overlapping regions and regions from input object1 |
overlap.VS.GenomicRegions2.ratio |
Ratio between overlapping regions and regions from input object2 |
Francesco Ferrari
GenomicRegionsFindOverlap, GenomicRegions
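The quantities returned by the comparison (overlapping regions, total spans and overlap ratios) can be illustrated on plain data frames. This is a minimal base R sketch, not PREDA's implementation: `intersect_regions` and `total_span` are hypothetical helpers, the span is assumed to be `end - start + 1`, and the ratio is assumed to be computed on total spans.

```r
# Hypothetical helper: pairwise intersection of two interval sets,
# restricted to intervals on the same chromosome.
intersect_regions <- function(a, b) {
  out <- list()
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      s <- max(a$start[i], b$start[j])
      e <- min(a$end[i], b$end[j])
      if (a$chr[i] == b$chr[j] && s <= e) {
        out[[length(out) + 1]] <- data.frame(chr = a$chr[i], start = s, end = e)
      }
    }
  }
  if (length(out) == 0) {
    return(data.frame(chr = numeric(0), start = numeric(0), end = numeric(0)))
  }
  do.call(rbind, out)
}

# Assumed span definition: both boundaries are included.
total_span <- function(x) sum(x$end - x$start + 1)

r1 <- data.frame(chr = c(1, 1, 2), start = c(100, 500, 100), end = c(300, 900, 400))
r2 <- data.frame(chr = c(1, 2, 3), start = c(250, 350, 1),   end = c(600, 800, 50))

ov <- intersect_regions(r1, r2)
comparison <- list(
  overlapping.number        = nrow(ov),
  GenomicRegions1.totalspan = total_span(r1),
  GenomicRegions2.totalspan = total_span(r2),
  overlapping.totalspan     = total_span(ov),
  overlap.VS.GenomicRegions1.ratio = total_span(ov) / total_span(r1),
  overlap.VS.GenomicRegions2.ratio = total_span(ov) / total_span(r2)
)
str(comparison)
```

The quadratic loop is kept for readability; for large region sets an interval-based approach would be preferable.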
generate unique ids for GenomicRegions objects
GenomicRegionsCreateRegionsIds(.Object, ...)
.Object |
An object of class GenomicRegions |
... |
filter genomic regions to remove selected chromosomes
# GenomicRegionsFilter_neg(.Object, chrToRemove, chrAsLabels=FALSE, quiet=FALSE)
GenomicRegionsFilter_neg(.Object, ...)
.Object |
An object of class GenomicRegions |
... |
See below
|
filter genomic regions to keep selected chromosomes
# GenomicRegionsFilter_pos(.Object, chrToRetain, chrAsLabels=FALSE, quiet=FALSE)
GenomicRegionsFilter_pos(.Object, ...)
.Object |
An object of class GenomicRegions |
... |
See below
|
Function to find overlap between GenomicRegions objects
GenomicRegionsFindOverlap(GenomicRegions1, GenomicRegions2 = NULL)
GenomicRegions1 |
Either a GenomicRegions object or a list of GenomicRegions objects |
GenomicRegions2 |
Optional, with default value NULL. Either a GenomicRegions object or a list of GenomicRegions objects. |
Input GenomicRegions objects are compared to select overlapping genomic regions, which are returned as GenomicRegions objects.
If two single GenomicRegions objects are provided, just one comparison is performed and a single GenomicRegions object is returned.
If a single list of GenomicRegions objects is provided as input, the included GenomicRegions objects are compared to select overlapping genomic regions across all of the elements.
If two lists of GenomicRegions objects are provided as input, they must have the same number of elements, because an element by element comparison will be performed to identify overlapping genomic regions across all of the elements.
Either a single GenomicRegions object or a list of GenomicRegions objects.
Francesco Ferrari
GenomicRegionsComparison, GenomicRegions
## Not run:
require(PREDAsampledata)
data(SODEGIRCNanalysisResults)
data(SODEGIRGEanalysisResults)
SODEGIR_GE_UP <- PREDAResults2GenomicRegions(SODEGIRGEanalysisResults,
  qval.threshold = 0.05, smoothStatistic.tail = "upper",
  smoothStatistic.threshold = 0.5)
SODEGIR_CN_GAIN <- PREDAResults2GenomicRegions(SODEGIRCNanalysisResults,
  qval.threshold = 0.01, smoothStatistic.tail = "upper",
  smoothStatistic.threshold = 0.1)
SODEGIR_AMPLIFIED <- GenomicRegionsFindOverlap(SODEGIR_GE_UP, SODEGIR_CN_GAIN)
## End(Not run)
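The three input modes described for GenomicRegionsFindOverlap (two single objects, one list, two lists of equal length) can be mimicked with a toy stand-in in which each "region set" is simply the vector of integer positions it covers. The `find_overlap` function below is hypothetical, and reading the two-list mode as a purely element-wise comparison is an assumption.

```r
# Toy stand-in: a region set is the integer vector of positions it covers.
find_overlap <- function(x, y = NULL) {
  if (is.null(y)) {
    # One list: keep positions shared by every element of the list.
    stopifnot(is.list(x))
    return(Reduce(intersect, x))
  }
  if (is.list(x) && is.list(y)) {
    # Two lists: element by element comparison, so the lengths must agree.
    stopifnot(length(x) == length(y))
    return(Map(intersect, x, y))
  }
  # Two single sets: one comparison, one result.
  intersect(x, y)
}

s1 <- 1:10
s2 <- 5:15
s3 <- 8:20

single <- find_overlap(s1, s2)                      # positions 5 to 10
across <- find_overlap(list(s1, s2, s3))            # positions 8 to 10
paired <- find_overlap(list(s1, s2), list(s2, s3))  # list of two results
```

With real GenomicRegions objects the intersection would operate on chromosome, start and end coordinates rather than on enumerated positions.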
Function to create a GenomicRegions object from a dataframe
GenomicRegionsFromdataframe(GenomicRegions_dataframe, ids_column=NULL, chr_column, start_column, end_column, chromosomesNumbers=NULL, chromosomesLabels=NULL, chromosomesLabelsInput=NULL)
GenomicRegions_dataframe |
Dataframe object containing the annotations for genomic regions |
ids_column |
Specify the column from the input dataframe with (optional) ids for genomic regions. Can be specified using column index (numeric) or column name (character). |
chr_column |
Specify the column from the input dataframe with chromosome annotations fields. Can be specified using column index (numeric) or column name (character). |
start_column |
Specify the column from the input dataframe with genomic start position for each genomic region. Can be specified using column index (numeric) or column name (character). |
end_column |
Specify the column from the input dataframe with genomic end position for each genomic region. Can be specified using column index (numeric) or column name (character). |
chromosomesNumbers |
Numeric vector to specify the list of numeric values to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabels |
Character vector to specify the list of character labels to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabelsInput |
Character vector to specify the list of character labels associated to each chromosome in the input. Particularly useful when non numeric character strings are associated to each chromosome in the input file: e.g. "chr3" for chromosome "3". |
An object of class "GenomicRegions"
Francesco Ferrari
Function to create a GenomicRegions object from a text file
GenomicRegionsFromfile(file, ids_column=NULL, chr_column, start_column, end_column, chromosomesNumbers=NULL, chromosomesLabels=NULL, chromosomesLabelsInput=NULL, ...)
file |
Path to the input txt file containing genomic regions annotations |
ids_column |
Specify the column from the input txt file with (optional) ids for genomic regions. Can be specified using column index (numeric) or column name (character). |
chr_column |
Specify the column from the input txt file with chromosome annotations fields. Can be specified using column index (numeric) or column name (character). |
start_column |
Specify the column from the input txt file with genomic start position for each genomic region. Can be specified using column index (numeric) or column name (character). |
end_column |
Specify the column from the input txt file with genomic end position for each genomic region. Can be specified using column index (numeric) or column name (character). |
chromosomesNumbers |
Numeric vector to specify the list of numeric values to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabels |
Character vector to specify the list of character labels to be associated to each chromosome (especially useful for chromosomes not associated to a number such as chr X or Y) |
chromosomesLabelsInput |
Character vector to specify the list of character labels associated to each chromosome in the input file. Particularly useful when non numeric character strings are associated to each chromosome in the input file: e.g. "chr3" for chromosome "3". |
... |
any other parameter for read.table function that could be useful for parsing the input file, such as "sep", "quote", "header", "na.strings" and other parameters. |
An object of class "GenomicRegions"
Francesco Ferrari
determine the number of genomic regions
GenomicRegionsNumber(.Object)
.Object |
An object of class GenomicRegions |
determine the span of each genomic region
GenomicRegionsSpan(.Object, ...)
.Object |
An object of class GenomicRegions |
... |
determine the total span of genomic regions
GenomicRegionsTotalSpan(.Object, ...)
.Object |
Object of Class GenomicRegions |
... |
extract data for individual analyses using the analysis name
# getStatisticByName(.Object, analysisName)
getStatisticByName(.Object, ...)
.Object |
An object of class StatisticsForPREDA |
... |
See below
|
This function merges a StatisticsForPREDA and a GenomicAnnotationsForPREDA object into a DataForPREDA object
MergeStatisticAnnotations2DataForPREDA(StatisticsForPREDAObject, GenomicAnnotationsForPREDAObject, sortAndCleanNA = FALSE, quiet = FALSE, MedianCenter = FALSE)
StatisticsForPREDAObject |
An object of class StatisticsForPREDA |
GenomicAnnotationsForPREDAObject |
An object of class GenomicAnnotationsForPREDA |
sortAndCleanNA |
Logical, if TRUE, genomic annotations are sorted by chromosome and genomic position, then ids with NA positional annotations are removed |
quiet |
Logical, if TRUE messages reporting the number of unmatched ids are suppressed. |
MedianCenter |
Logical, if TRUE data are normalized per median sample. |
An object of class DataForPREDA
Francesco Ferrari
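Conceptually, the merge matches statistics and annotations by id, optionally drops ids with missing positions and sorts by chromosome and position, and optionally median-centers the data. The base R sketch below illustrates these steps on toy objects; it does not call PREDA, all object names are hypothetical, and interpreting MedianCenter as subtracting each sample's median is an assumption.

```r
# Toy statistics matrix: rows are ids, columns are samples.
stat <- matrix(c(0.5, -1.2, 2.0, 0.1,
                 1.5,  0.3, 1.0, 0.7),
               nrow = 4,
               dimnames = list(c("g1", "g2", "g3", "g4"),
                               c("sampleA", "sampleB")))

# Toy annotations: "g5" has no statistics, "g2" has no chromosome.
annot <- data.frame(id       = c("g3", "g1", "g5", "g2"),
                    chr      = c(1, 1, 2, NA),
                    position = c(500, 100, 50, 300),
                    stringsAsFactors = FALSE)

# Keep only ids present in both objects, in matching order.
common  <- intersect(annot$id, rownames(stat))
annot_m <- annot[match(common, annot$id), ]
stat_m  <- stat[common, , drop = FALSE]

# Analogue of sortAndCleanNA = TRUE: drop NA positions, then sort.
keep    <- complete.cases(annot_m[, c("chr", "position")])
annot_m <- annot_m[keep, ]
stat_m  <- stat_m[keep, , drop = FALSE]
ord     <- order(annot_m$chr, annot_m$position)
annot_m <- annot_m[ord, ]
stat_m  <- stat_m[ord, , drop = FALSE]

# Assumed reading of MedianCenter = TRUE: subtract each sample's median.
stat_c <- sweep(stat_m, 2, apply(stat_m, 2, median))
print(stat_c)
```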
function performing the core of PREDA analysis
PREDA_main(inputDataForPREDA, outputGenomicAnnotationsForPREDA =NULL, nperms = 10000, verbose = TRUE, parallelComputations = FALSE, multTestCorrection = "fdr", permutePerChromosome = FALSE, blocksize = 10, permuteStatisticSign = FALSE, smoothMethod = "lokern_scaledBandwidth_repeated", force = FALSE, lokern_scaledBandwidthFactor = 2, limit.analysis = NULL)
inputDataForPREDA |
A DataForPREDA object |
outputGenomicAnnotationsForPREDA |
A GenomicAnnotationsForPREDA object. If NULL, genomic annotations for output data are obtained from inputDataForPREDA |
nperms |
Number of permutations performed in PREDA analysis. |
verbose |
Logical, if TRUE some messages are printed concerning the progress of the analysis. |
parallelComputations |
Logical, if TRUE Rmpi is used to spawn slave processes, thus using parallel computing to speedup the analysis. |
multTestCorrection |
Multiple testing correction that will be adopted to correct the statistic p-values. Possible values are "fdr", for Benjamini and Hochberg multiple testing correction, and "qvalue", for p-value correction performed with the qvalue package. |
permutePerChromosome |
Logical, if TRUE data permutations are performed separately for each chromosome. In most cases the default value (FALSE) is preferable to avoid biases related to extreme alterations of specific chromosomes. |
blocksize |
A parameter used to tune parallel computations if parallelComputations is TRUE. This is the number of permutations performed on each slave process before every communication with the master process. This is useful to reduce the number of network communications when communication among processes is slow. |
permuteStatisticSign |
Logical, if TRUE statistics signs are permuted instead of permuting data along chromosomal position. |
smoothMethod |
The default smoothing method used in the PREDA_main function is lokern smoothing with scaled bandwidth, using a scaling factor equal to 2. Possible values are "lokern", for standard lokern smoothing, "quantsmooth", "spline" and "runningmean.x", where x is a user defined value for the number of adjacent data points used for running mean smoothing. |
force |
Logical, if TRUE the quantsmooth check on the number of data points is skipped. Since quantsmooth is very slow with a large number of input data points, a check was introduced that stops the computation when one or more chromosomes contain more than 2000 data points. This parameter allows skipping this safety check. |
lokern_scaledBandwidthFactor |
Factor of scaling for lokern estimated bandwidths |
limit.analysis |
Vector (numeric or character representing analyses names) to limit the output of preda analysis to a subset of input analyses. |
See supplementary material about PREDA method
If outputGenomicAnnotationsForPREDA is NULL, a PREDADataAndResults object is returned. Otherwise a PREDAResults object is returned instead
Francesco Ferrari
Supplementary information about PREDA method
#See examples in PREDA tutorial
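The arguments above (nperms, smoothMethod, multTestCorrection) describe a smooth, permute and correct scheme. The following base R sketch illustrates that general scheme on simulated data for the upper tail only; it is not PREDA's algorithm, it swaps in a simple running mean in place of lokern smoothing, and every name in it is hypothetical.

```r
set.seed(1)

# Simulated position-ordered statistics with a shifted block in the middle.
n <- 200
stat <- rnorm(n)
stat[81:120] <- stat[81:120] + 2

# Running mean over k adjacent points (NA at the first and last 5 positions).
running_mean <- function(x, k = 11) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

obs <- running_mean(stat)

# Permute the statistics along positions, smooth again, and count how often
# the permuted smoothed value reaches the observed one (upper tail).
nperms <- 200
exceed <- numeric(n)
for (p in seq_len(nperms)) {
  perm <- running_mean(sample(stat))
  exceed <- exceed + (perm >= obs)
}

# Empirical p-values, then Benjamini and Hochberg correction.
pval <- (exceed + 1) / (nperms + 1)
qval <- p.adjust(pval, method = "BH")

summary(qval)
```

Positions at the edges remain NA because the running mean window is incomplete there; the number of permutations is kept small only to make the sketch fast.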
This class is used to manage the PREDA analysis output along with corresponding input data
Objects can be created by calls of the form new("PREDADataAndResults", ids, chr, start, end, strand, chromosomesNumbers, chromosomesLabels, position, optionalAnnotations, optionalAnnotationsHeaders, analysesNames, testedTail, smoothStatistic, pvalue, qvalue, statistic).
analysesNames
:Object of class "character"
a character vector of unique names associated to each column of smoothStatistic, pvalue and qvalue matrices.
This is just a name that is used to identify each analysis.
testedTail
:Object of class "character"
a character describing what tail of the statistic distribution will be analyzed during PREDA analysis.
Possible values are "upper", "lower" or "both". Anyway we strongly recommend using PREDA analysis only
smoothStatistic
:Object of class "matrix"
a numeric matrix containing smoothed observed statistics as obtained from PREDA analysis.
The smoothed statistics must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
pvalue
:Object of class "matrix"
a numeric matrix containing unadjusted gene-centered pvalues as obtained from PREDA analysis.
The pvalue matrix must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
qvalue
:Object of class "matrix"
a numeric matrix containing adjusted gene-centered pvalues as obtained from PREDA analysis:
i.e. usually FDR adjusted pvalues, but other multiple testing methods could be adopted as well
The qvalue matrix must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
position
:Object of class "integer"
a numeric vector of reference genomic positions that will be associated and used for each genomic feature under investigation for smoothing data during PREDA analysis.
ids
:Object of class "character"
a character vector of unique identifiers for the genomic features under investigation
chr
:Object of class "integer"
a numeric vector representing the chromosome where each ids is mapped.
Please note that chromosomes usually not represented with a number will be converted to a number as well:
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
User defined options will allow this conversion during GenomicAnnotations objects initialization.
start
:Object of class "integer"
a numeric vector of start genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
end
:Object of class "integer"
a numeric vector of end genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
strand
:Object of class "numeric"
a numeric vector of strand genomic position for each genomic feature under investigation: value 1 is used for "plus" (forward) strand and value -1 for "minus" (reverse) strand.
User defined options will allow the conversion to this format during GenomicAnnotations objects initialization.
chromosomesNumbers
:Object of class "numeric"
a numeric vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in increasing order. Please note that chromosomes usually not represented with a number will be converted to a number as well:
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
chromosomesLabels
:Object of class "character"
a character vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in the same order as reported in chromosomesNumbers slot.
This slot is used just to provide a label for each associated chromosome number, in case some non-numeric chromosome is used
(e.g. to preserve the correspondence between chr 23 and the actual chr X in Human)
optionalAnnotations
:Object of class "matrix"
optional annotations associated to the genomic features can be managed along with genomic positions annotations.
E.g. GeneSymbol or EntrezGene ids can be associated to gene related GenomicAnnotations objects.
These additional annotations are not mandatory (the default value for this slot is NULL)
The additional annotations must be provided as a matrix of character,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "optionalAnnotationsHeaders" slot.
optionalAnnotationsHeaders
:Object of class "character"
character vector containing the names associated to optional annotations. Please avoid using spaces in annotations names.
statistic
:Object of class "matrix"
a numeric matrix containing gene-centered statistics (or statistics on genomic data centered on other genomic features under investigation).
The statistics must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
Class "PREDAResults", directly.
Class "DataForPREDA", directly.
Class "GenomicAnnotationsForPREDA", by class "PREDAResults", distance 2.
Class "StatisticsForPREDA", by class "DataForPREDA", distance 2.
Class "GenomicAnnotations", by class "PREDAResults", distance 3.
GenomicAnnotationsSortAndCleanNA
signature(.Object = "PREDADataAndResults"): sort annotations according to selected chromosomes and remove genes containing any NA annotation field
initialize
signature(.Object = "PREDADataAndResults"): initialize method for PREDADataAndResults objects
PREDADataAndResults2dataframe
signature(.Object = "PREDADataAndResults"): extract data and annotations as a dataframe with probeids as rownames
This class is better described in the package vignette
Francesco Ferrari
"GenomicAnnotations", "GenomicAnnotationsForPREDA", "StatisticsForPREDA", "DataForPREDA", "PREDAResults", GenomicAnnotationsSortAndCleanNA, PREDADataAndResults2dataframe
showClass("PREDADataAndResults")
extract data and annotations as a dataframe with probeids as rownames
PREDADataAndResults2dataframe(.Object)
.Object |
An object of class PREDADataAndResults |
This class is used to manage the basic PREDA analysis output including smoothed statistic, pvalues and qvalues.
Objects can be created by calls of the form new("PREDAResults", ids, chr, start, end, strand, chromosomesNumbers, chromosomesLabels, position, optionalAnnotations, optionalAnnotationsHeaders, analysesNames, testedTail, smoothStatistic, pvalue, qvalue).
analysesNames
:Object of class "character"
a character vector of unique names associated to each column of smoothStatistic, pvalue and qvalue matrices.
This is just a name that is used to identify each analysis.
testedTail
:Object of class "character"
a character describing what tail of the statistic distribution will be analyzed during PREDA analysis.
Possible values are "upper", "lower" or "both". Anyway we strongly recommend using PREDA analysis only
smoothStatistic
:Object of class "matrix"
a numeric matrix containing smoothed observed statistics as obtained from PREDA analysis.
The smoothed statistics must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
pvalue
:Object of class "matrix"
a numeric matrix containing unadjusted gene-centered pvalues as obtained from PREDA analysis.
The pvalue matrix must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
qvalue
:Object of class "matrix"
a numeric matrix containing adjusted gene-centered pvalues as obtained from PREDA analysis,
i.e. usually FDR adjusted pvalues, although other multiple testing correction methods could be adopted as well.
The qvalue matrix must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
position
:Object of class "integer"
a numeric vector of reference genomic positions that will be associated and used for each genomic feature under investigation for smoothing data during PREDA analysis.
ids
:Object of class "character"
a character vector of unique identifiers for the genomic features under investigation
chr
:Object of class "integer"
a numeric vector representing the chromosome to which each id is mapped.
Please note that chromosomes not usually represented with a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
User defined options allow this conversion during GenomicAnnotations objects initialization.
start
:Object of class "integer"
a numeric vector of start genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
end
:Object of class "integer"
a numeric vector of end genomic position for each genomic feature under investigation (i.e. gene, transcript, SNP or other elements).
strand
:Object of class "numeric"
a numeric vector of strand genomic position for each genomic feature under investigation: value 1 is used for "plus" (forward) strand and value -1 for "minus" (reverse) strand.
User defined options will allow the conversion to this format during GenomicAnnotations objects initialization.
chromosomesNumbers
:Object of class "numeric"
a numeric vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once, in increasing order. Please note that chromosomes not usually represented with a number must be converted to a number as well,
e.g. for Human, chromosomes X and Y will be converted to chromosomes 23 and 24 respectively.
chromosomesLabels
:Object of class "character"
a character vector containing the list of chromosomes for which genomic annotations are provided in the GenomicAnnotations object.
Each chromosome is represented just once in the same order as reported in chromosomesNumbers slot.
This slot is only used to provide a label for each associated chromosome number, in case some non-numeric chromosome is used
(e.g. to preserve the correspondence between chr 23 and the actual chr X in Human).
optionalAnnotations
:Object of class "matrix"
optional annotations associated to the genomic features can be managed along with genomic positions annotations.
E.g. GeneSymbol or EntrezGene ids can be associated to gene related GenomicAnnotations objects.
These additional annotations are not mandatory (the default value for this slot is NULL).
The additional annotations must be provided as a character matrix,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "optionalAnnotationsHeaders" slot.
optionalAnnotationsHeaders
:Object of class "character"
character vector containing the names associated to optional annotations. Please avoid using spaces in annotations names.
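The slot constraints listed above can be checked on toy data. The following is a minimal base R sketch, not the package's own validity code: all values are made up, and the Benjamini-Hochberg call to p.adjust is only one possible way to obtain the FDR adjusted values mentioned for the qvalue slot.

```r
# Toy annotation values (hypothetical, for illustration only)
ids <- c("geneA", "geneB", "geneC", "geneD")
chr_labels_raw <- c("1", "X", "Y", "2")   # chromosome of each feature, as labels

# Map non-numeric human chromosomes to numbers: X -> 23, Y -> 24
label_to_number <- c(setNames(1:22, as.character(1:22)), X = 23L, Y = 24L)
chr <- unname(label_to_number[chr_labels_raw])

# Each chromosome listed once, in increasing order, with its matching label
chromosomesNumbers <- sort(unique(chr))
chromosomesLabels <- names(label_to_number)[match(chromosomesNumbers, label_to_number)]

# One column per analysis, one row per id
analysesNames <- c("sample1", "sample2")
pvalue <- matrix(c(0.001, 0.20, 0.03, 0.50,
                   0.04,  0.01, 0.60, 0.002),
                 nrow = length(ids), ncol = length(analysesNames),
                 dimnames = list(ids, analysesNames))

# FDR (Benjamini-Hochberg) adjustment applied column by column (assumed method)
qvalue <- apply(pvalue, 2, p.adjust, method = "BH")

# Shape constraint stated for the smoothStatistic, pvalue and qvalue slots
stopifnot(nrow(qvalue) == length(ids),
          ncol(qvalue) == length(analysesNames))
```

Keeping chromosomesLabels in the same order as chromosomesNumbers is what preserves the correspondence between, for example, chromosome 23 and the label "X".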
Class "GenomicAnnotationsForPREDA", directly.
Class "GenomicAnnotations", by class "GenomicAnnotationsForPREDA", distance 2.
GenomicAnnotationsSortAndCleanNA signature(.Object = "PREDAResults"): sort annotations according to selected chromosomes and remove genes containing any NA annotation field
initialize signature(.Object = "PREDAResults"): initialize method for PREDAResults objects
PREDAResults2dataframe signature(.Object = "PREDAResults"): extract PREDA results statistics as a data frame object
PREDAResults2GenomicRegions signature(.Object = "PREDAResults"): identify significant genomic regions from a PREDAResults object
PREDAResults2GenomicRegionsSingle signature(.Object = "PREDAResults"): identify significant genomic regions from a single analysis in a PREDAResults object
PREDAResults2PREDADataAndResults signature(.Object = "PREDAResults"): merge PREDAResults and input statistics to create a PREDADataAndResults object
PREDAResultsGetObservedFlags signature(.Object = "PREDAResults"): extract genomic positions with significant alterations as a matrix of flags from a PREDAResults object
This class is better described in the package vignette.
Francesco Ferrari
"GenomicAnnotations", "GenomicAnnotationsForPREDA", GenomicAnnotationsSortAndCleanNA, PREDAResults2dataframe, PREDAResults2GenomicRegions, PREDAResults2GenomicRegionsSingle, PREDAResults2PREDADataAndResults, PREDAResultsGetObservedFlags
showClass("PREDAResults")
extract PREDA results statistics as a data frame object
PREDAResults2dataframe(.Object)
.Object |
An object of class PREDAResults |
identify significant genomic regions from a PREDAResults object
# PREDAResults2GenomicRegions(.Object, qval.threshold=0.05,
#    use.referencePositions=TRUE, smoothStatistic.tail=NULL,
#    smoothStatistic.threshold=NULL)
PREDAResults2GenomicRegions(.Object, ...)
.Object |
Object of class PREDAResults or PREDADataAndResults |
... |
See below
|
A list of GenomicRegions objects is returned: one GenomicRegions object for each analysis in the input PREDAResults.
A NULL element is included in the output list whenever no significant regions are identified.
A list of genomic regions objects
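To make the role of qval.threshold, smoothStatistic.tail and smoothStatistic.threshold more concrete, here is a rough base R sketch of how thresholds of this kind could turn per-position values into regions. It is an illustration under stated assumptions, not the package's implementation: the toy values, the use of "less than or equal" for the q-value cutoff, and the merging of consecutive significant positions via rle are all assumptions.

```r
# Toy values for one analysis on one chromosome (hypothetical numbers)
position        <- c(100, 250, 400, 900, 1200, 1500, 2100)
qvalue          <- c(0.20, 0.01, 0.02, 0.03, 0.40, 0.04, 0.01)
smoothStatistic <- c(0.1,  0.8,  0.9,  0.7,  0.2,  0.6,  0.3)

qval.threshold            <- 0.05
smoothStatistic.threshold <- 0.5   # "upper" tail: keep statistic above threshold

# A position is flagged when it passes both filters
significant <- qvalue <= qval.threshold &
  smoothStatistic > smoothStatistic.threshold

# Merge runs of consecutive flagged positions into regions
runs      <- rle(significant)
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1
regions <- data.frame(start = position[run_start[runs$values]],
                      end   = position[run_end[runs$values]])
regions
```

With these toy numbers the last position has a low q-value but a smoothed statistic below the threshold, so it is not included in any region.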
Francesco Ferrari
## Not run:
require(PREDAsampledata)
data(GEanalysisResults)
genomic_regions_UP <- PREDAResults2GenomicRegions(GEanalysisResults,
    qval.threshold=0.05, smoothStatistic.tail="upper",
    smoothStatistic.threshold=0.5)
## End(Not run)
identify significant genomic regions from a single analysis in a PREDAResults object
# PREDAResults2GenomicRegionsSingle(.Object,
#    qval.threshold=0.05, analysisName=NULL,
#    use.referencePositions=TRUE, smoothStatistic.tail=NULL,
#    smoothStatistic.threshold=NULL)
PREDAResults2GenomicRegionsSingle(.Object, ...)
.Object |
Object of class PREDAResults or PREDADataAndResults |
... |
See below
|
merge PREDAResults and input statistics to create a PREDADataAndResults object
# PREDAResults2PREDADataAndResults(.Object, statistic)
PREDAResults2PREDADataAndResults(.Object, ...)
.Object |
An object of class PREDAResults |
... |
See below
|
extract genomic positions with significant alterations as a matrix of flags from a PREDAResults object
# PREDAResultsGetObservedFlags(.Object, qval.threshold=0.05,
#    smoothStatistic.tail=NULL, smoothStatistic.threshold=NULL,
#    null.value=0, significant.value=1)
PREDAResultsGetObservedFlags(.Object, ...)
.Object |
An object of class PREDAResults or PREDADataAndResults |
... |
See below
|
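As an illustration of what a "matrix of flags" could look like, the base R sketch below codes each position of each analysis with significant.value or null.value according to a q-value threshold. The toy matrix and the exact comparison used are assumptions; the package function may apply additional filters (for example on the smoothed statistic).

```r
# Toy q-value matrix: features in rows, analyses in columns (made-up values)
ids <- c("g1", "g2", "g3")
analysesNames <- c("s1", "s2")
qvalue <- matrix(c(0.01, 0.30, 0.04,
                   0.20, 0.02, 0.50),
                 nrow = length(ids),
                 dimnames = list(ids, analysesNames))

qval.threshold    <- 0.05
null.value        <- 0
significant.value <- 1

# ifelse() keeps the dimensions and dimnames of the tested matrix
flags <- ifelse(qvalue <= qval.threshold, significant.value, null.value)
flags

# Number of analyses in which each feature is flagged
rowSums(flags == significant.value)
```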
Wrapper function for gene expression data preprocessing for differential expression analysis with PREDA
preprocessingGE(SampleInfoFile = NULL, CELfiles_dir = NULL, AffyBatchInput = NULL, custom_cdfname, arrayNameColumn = NULL, sampleNameColumn = NULL, classColumn, referenceGroupLabel, statisticType, optionalAnnotations = NULL, retain.chrs = NULL, reference_position_type = "median", testedTail = "both")
SampleInfoFile |
Path to sample info file |
CELfiles_dir |
Path to directory containing raw CEL data files for Affymetrix arrays |
AffyBatchInput |
Alternatively input raw data can be provided as an AffyBatch object. In this case sample classes will be inferred from phenodata contained in AffyBatch object. In particular classColumn parameter will refer to the column in pData(AffyBatchInput) object. |
custom_cdfname |
Specify the cdf library to be used for data preprocessing |
arrayNameColumn |
Column of sampleinfo file containing the name of raw data (CEL) files |
sampleNameColumn |
Column of sampleinfo file containing the name to be used for samples labels |
classColumn |
Column of sampleinfo file containing the label of sample classes. If input raw data are provided as an AffyBatch object, this parameter refers instead to the column in pData(AffyBatchInput) object. |
referenceGroupLabel |
Specify which class label is used for the reference sample used in computing statistics for differential expression. |
statisticType |
Statistic for differential expression that is computed on input data. Possible values are "tstatistic", "FC" (fold change) or "FCmedian" (fold change computed on medians). |
optionalAnnotations |
Character vector to select additional annotations fields to be included into the GenomicAnnotations object. |
retain.chrs |
Numeric vector containing the list of chromosomes selected for the output GenomicAnnotations object. E.g. set retain.chrs=1:22 to limit the GenomicAnnotations object to chromosomes from 1 to 22. This might be useful to limit GenomicAnnotations objects to autosomal chromosomes. |
reference_position_type |
Specify which genomic coordinate must be used as reference position for PREDA analysis. Possible values are "start", "end", "median", "strand.start" or "strand.end". |
testedTail |
Specify what tail of the distribution will be tested for significantly extreme values in PREDA analysis. Possible values are "both", "upper" or "lower". |
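The reference_position_type options can be illustrated with a small helper in base R. This is a hypothetical sketch, not code from the package: reference_position is an invented name, "median" is assumed to mean the midpoint between start and end, and the strand specific options follow the description given in the SODEGIRpreprocessingGE documentation (strand specific start is the start on the positive strand and the end on the negative strand).

```r
# Toy gene coordinates; strand is 1 (plus) or -1 (minus)
start  <- c(1000, 5000, 9000)
end    <- c(2000, 5600, 9900)
strand <- c(1, -1, 1)

# Hypothetical helper returning one reference position per feature
reference_position <- function(start, end, strand,
                               type = c("median", "start", "end",
                                        "strand.start", "strand.end")) {
  type <- match.arg(type)
  switch(type,
         start        = start,
         end          = end,
         median       = round((start + end) / 2),      # assumed: midpoint
         strand.start = ifelse(strand == 1, start, end),
         strand.end   = ifelse(strand == 1, end, start))
}

reference_position(start, end, strand, "median")
reference_position(start, end, strand, "strand.start")
```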
Preprocess raw (CEL) files for Affymetrix gene expression arrays using user defined CDF libraries and RMA normalization. Then statistics for differential expression are computed. Then annotations are retrieved from the corresponding annotation library.
Please note this function is a user-friendly preprocessing function for Affy gene expression microarrays. Step by step preprocessing functions can be used with any other platform.
A DataForPREDA object is returned.
Francesco Ferrari
## Not run:
require("PREDAsampledata")
CELfilesPath <- system.file("sampledata", "GeneExpression", package = "PREDAsampledata")
infofile <- file.path(CELfilesPath, "sampleinfoGE_PREDA.txt")
sampleinfo <- read.table(infofile, sep="\t", header=TRUE)
GEDataForPREDA <- preprocessingGE(SampleInfoFile=infofile,
    CELfiles_dir=CELfilesPath,
    custom_cdfname="hgu133plus2",
    arrayNameColumn=1, sampleNameColumn=2, classColumn="Class",
    referenceGroupLabel="normal", statisticType="tstatistic",
    optionalAnnotations=c("SYMBOL", "ENTREZID"),
    retain.chrs=1:22)
## End(Not run)
Wrapper function for gene expression statistics preprocessing for SODEGIR analysis.
# SODEGIR_GEstatistics(.Object, pData_classColumn=NULL,
#    referenceGroupLabel=NULL,
#    statisticType=c("tstatistic", "FC", "FCmedian", "eBayes"),
#    singleSampleOutput=TRUE, varianceAll=FALSE)
SODEGIR_GEstatistics(.Object, ...)
.Object |
An object of class ExpressionSet containing gene expression input data |
... |
See below
|
Using an ExpressionSet object as input, statistics for differential expression are computed comparing each sample with the reference group.
The output is returned as a matrix.
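The idea of comparing each sample with the reference group can be sketched in base R with a simple log fold change. This only illustrates the shape of the computation and of the output matrix under assumptions: the toy data are made up, values are assumed to be on the log2 scale, and the function itself may compute a different statistic depending on statisticType.

```r
# Toy log2 expression matrix: genes in rows, samples in columns (made-up values)
expr <- matrix(c(8.0, 8.2, 7.9, 10.1, 8.1,
                 5.0, 5.1, 4.9, 5.0,  2.9),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"),
                               c("N1", "N2", "N3", "T1", "T2")))
classes <- c("normal", "normal", "normal", "tumor", "tumor")
referenceGroupLabel <- "normal"

# Mean expression of each gene in the reference group
is_ref   <- classes == referenceGroupLabel
ref_mean <- rowMeans(expr[, is_ref, drop = FALSE])

# One column per non-reference sample: log fold change versus the reference mean
logFC <- expr[, !is_ref, drop = FALSE] - ref_mean
logFC
```

The result has one row per gene and one column per non-reference sample, which is the layout that single sample analyses rely on.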
Francesco Ferrari
Silvio Bicciato, Roberta Spinelli, Mattia Zampieri, Eleonora Mangano, Francesco Ferrari, Luca Beltrame, Ingrid Cifola, Clelia Peano, Aldo Solari, and Cristina Battaglia. A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets. Nucleic Acids Res, 37(15):5057-70, August 2009.
preprocessingGE, SODEGIRpreprocessingGE, ExpressionSet
Wrapper function for gene expression data preprocessing for SODEGIR analysis
SODEGIRpreprocessingGE(SampleInfoFile = NULL, CELfiles_dir = NULL, AffyBatchInput = NULL, custom_cdfname, arrayNameColumn = NULL, sampleNameColumn = NULL, classColumn, referenceGroupLabel, statisticType, optionalAnnotations = NULL, retain.chrs = NULL, reference_position_type = "median", testedTail = "both", singleSampleOutput = TRUE, varianceAll=FALSE)
SampleInfoFile |
Path to sample info file |
CELfiles_dir |
Path to directory containing raw CEL data files for Affymetrix arrays |
AffyBatchInput |
Alternatively input raw data can be provided as an AffyBatch object. In this case sample classes will be inferred from phenodata contained in AffyBatch object. In particular classColumn parameter will refer to the column in pData(AffyBatchInput) object. |
custom_cdfname |
Specify the cdf library to be used for data preprocessing |
arrayNameColumn |
Column of sampleinfo file containing the name of raw data (CEL) files |
sampleNameColumn |
Column of sampleinfo file containing the name to be used for samples labels |
classColumn |
Column of sampleinfo file containing the label of sample classes. If input raw data are provided as an AffyBatch object, this parameter refers instead to the column in pData(AffyBatchInput) object. |
referenceGroupLabel |
Specify which class label is used for the reference sample used in computing statistics for differential expression. |
statisticType |
Statistic for differential expression that is computed on input data. Possible values are "tstatistic", "FC" (fold change) or "FCmedian" (fold change computed on medians). |
optionalAnnotations |
Character vector to select additional annotations fields to be included into the GenomicAnnotations object. |
retain.chrs |
Numeric vector containing the list of chromosomes selected for the output GenomicAnnotations object. E.g. set retain.chrs=1:22 to limit the GenomicAnnotations object to chromosomes from 1 to 22. This might be useful to limit GenomicAnnotations objects to autosomal chromosomes. |
reference_position_type |
Specify which genomic coordinate must be used as reference position for PREDA analysis. Possible values are "start", "end", "median", "strand.start" or "strand.end". "strand.start" is strand specific start: i.e. start on positive strand but end on negative strand. "strand.end" is strand specific end. |
testedTail |
Specify what tail of the distribution will be tested for significantly extreme values in PREDA analysis. Possible values are "both", "upper" or "lower". |
singleSampleOutput |
Logical, if TRUE a statistic comparing each sample with the reference group is computed. |
varianceAll |
Logical; this parameter affects the computation only when singleSampleOutput is TRUE. If TRUE, all pathological (e.g. tumor) samples and all normal (reference) samples are used to estimate variance in the comparison of individual pathological samples to the normal reference, as described in the original SODEGIR paper by Bicciato et al. (Nucleic Acids Res. 2009). The original SODEGIR statistic for gene expression was based on the SAM score. However, since July 2018 the original samr package is no longer available on CRAN. Therefore, in the current PREDA version, varianceAll=TRUE and singleSampleOutput=TRUE cannot be used with SAM. When singleSampleOutput is TRUE and a different statisticType is used, the variance is actually computed using only the normal (reference) samples. If FALSE (default value), the computation of statistics for single sample vs. reference comparisons only takes into account the variance in the reference group of samples. |
Preprocess raw (CEL) files for Affymetrix gene expression arrays using user defined CDF libraries and RMA normalization.
Then statistics for differential expression are computed comparing each sample with the reference group.
Then annotations are retrieved from the corresponding annotation library.
Please note this function is a user-friendly preprocessing function for Affy gene expression microarrays. Step by step preprocessing functions can be used with any other platform.
A DataForPREDA object is returned.
Francesco Ferrari
Silvio Bicciato, Roberta Spinelli, Mattia Zampieri, Eleonora Mangano, Francesco Ferrari, Luca Beltrame, Ingrid Cifola, Clelia Peano, Aldo Solari, and Cristina Battaglia. A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets. Nucleic Acids Res, 37(15):5057-70, August 2009.
## Not run:
require(PREDAsampledata)
CELfilesPath <- system.file("sampledata", "GeneExpression", package = "PREDAsampledata")
infofile <- file.path(CELfilesPath, "sampleinfoGE_PREDA.txt")
SODEGIRGEDataForPREDA <- SODEGIRpreprocessingGE(SampleInfoFile=infofile,
    CELfiles_dir=CELfilesPath,
    custom_cdfname="hgu133plus2",
    arrayNameColumn=1, sampleNameColumn=2, classColumn="Class",
    referenceGroupLabel="normal", statisticType="tstatistic",
    optionalAnnotations=c("SYMBOL", "ENTREZID"),
    retain.chrs=1:22)
## End(Not run)
This class is used to manage the datamatrix containing statistics for PREDA analyses: i.e. the gene (or other genomic feature) centered statistics accounting for differential expression (or for the other type of variation under investigation)
Objects can be created by calls of the form new("StatisticsForPREDA", ids, statistic, analysesNames, testedTail).
ids
:Object of class "character"
a character vector of unique identifiers for the genomic features under investigation
statistic
:Object of class "matrix"
a numeric matrix containing gene-centered statistics (or statistics on genomic data centered on other genomic features under investigation).
The statistics must be provided as a matrix of numeric values,
with a number of rows equal to the length of "ids" slot and a number of columns equal
to the length of "analysesNames" slot.
analysesNames
:Object of class "character"
a character vector of unique names associated to each column of statistic matrix.
This is just a name that will be used to identify each analysis.
testedTail
:Object of class "character"
a character describing what tail of the statistic distribution will be analyzed during PREDA analysis.
Possible values are "upper", "lower" or "both". We strongly recommend using PREDA analysis only
for statistics on genomic data with a symmetric distribution around zero.
analysesNames signature(.Object = "StatisticsForPREDA"): get the names of the analyses in the StatisticsForPREDA object
getStatisticByName signature(.Object = "StatisticsForPREDA"): extract data for individual analyses using the analysis name
initialize signature(.Object = "StatisticsForPREDA"): initialize method for StatisticsForPREDA objects
StatisticsForPREDA2dataframe signature(.Object = "StatisticsForPREDA"): extract data as a dataframe with probeids as rownames
StatisticsForPREDAFilterColumns_neg signature(.Object = "StatisticsForPREDA"): filter statistics to remove selected analyses
StatisticsForPREDAFilterColumns_pos signature(.Object = "StatisticsForPREDA"): filter statistics to keep selected analyses
This class is better described in the package vignette.
Francesco Ferrari
"DataForPREDA", analysesNames, getStatisticByName, StatisticsForPREDA2dataframe, StatisticsForPREDAFilterColumns_neg, StatisticsForPREDAFilterColumns_pos
showClass("StatisticsForPREDA")
extract data as a dataframe with probeids as rownames
StatisticsForPREDA2dataframe(.Object)
.Object |
An object of class StatisticsForPREDA |
filter statistics to remove selected analyses
# StatisticsForPREDAFilterColumns_neg(.Object, analysesToRemove,
#    analysesAsNames=FALSE)
StatisticsForPREDAFilterColumns_neg(.Object, ...)
.Object |
An object of class StatisticsForPREDA |
... |
See below
|
filter statistics to keep selected analyses
# StatisticsForPREDAFilterColumns_pos(.Object, analysesToRetain,
#    analysesAsNames=FALSE)
StatisticsForPREDAFilterColumns_pos(.Object, ...)
.Object |
An object of class StatisticsForPREDA |
... |
See below
|
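The two filtering functions can be pictured as column selection on the statistic matrix. The base R sketch below works on a plain matrix rather than on a StatisticsForPREDA object, and it assumes that analysesAsNames switches between selecting analyses by column index (FALSE) and by analysis name (TRUE); that reading of the argument is an assumption, not something stated in this manual.

```r
# Toy statistic matrix: features in rows, analyses in columns (made-up values)
statistic <- matrix(1:12, nrow = 3,
                    dimnames = list(c("g1", "g2", "g3"),
                                    c("s1", "s2", "s3", "s4")))

# "Positive" filter: keep selected analyses, here chosen by name
keep_by_name <- statistic[, c("s1", "s3"), drop = FALSE]

# "Negative" filter: remove selected analyses, here chosen by index
drop_by_index <- statistic[, -c(2, 4), drop = FALSE]

# Both routes leave the same two analyses
identical(keep_by_name, drop_by_index)
```

Using drop = FALSE keeps the result a matrix even when a single analysis is retained, so rows still line up with the ids.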
Function to create a StatisticsForPREDA object from a dataframe
StatisticsForPREDAFromdataframe(StatisticsForPREDA_dataframe, ids_column = NULL, statistic_columns = NULL, analysesNames = NULL, testedTail = c("upper", "lower", "both"))
StatisticsForPREDA_dataframe |
Input dataframe containing statistics on genomics data. |
ids_column |
Specify the column from the input dataframe with gene (or other genomic features) ids. Can be specified using column index (numeric) or column name (character). |
statistic_columns |
Specify the column (or columns) from the input dataframe with genomic data statistics that will be included in the statisticsForPREDA object. Can be specified using column index (numeric) or column name (character). If NULL (default), all columns excluding ids_column will be considered as input statistics. |
analysesNames |
Names (labels) to be associated to each input statistic. If NULL, the column names for statistic_columns will be used. |
testedTail |
Specify what tail of the distribution will be tested for significantly extreme values in PREDA analysis. Possible values are "both", "upper" or "lower". |
... |
any other parameter for read.table function that could be useful for parsing the input file, such as "sep", "quote", "header", "na.strings" and other parameters. |
A dataframe is parsed and a statisticsForPREDA object is built using contained data.
A statisticsForPREDA object
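The way the arguments map onto the object's content can be sketched in base R. The snippet below does not call the package; it only mimics, on a made-up dataframe, the documented defaults: when statistic_columns is NULL all columns except ids_column are used, and when analysesNames is NULL the column names are used as labels.

```r
# Toy input dataframe (hypothetical column names and values)
df <- data.frame(ProbeID = c("p1", "p2", "p3"),
                 tumor1  = c(0.5, -1.2, 2.0),
                 tumor2  = c(0.1,  0.3, -0.7),
                 stringsAsFactors = FALSE)

ids_column <- "ProbeID"

# Default behaviour described above: every column except the ids column
statistic_columns <- setdiff(colnames(df), ids_column)

# Pieces corresponding to the ids, statistic and analysesNames slots
ids       <- as.character(df[[ids_column]])
statistic <- as.matrix(df[, statistic_columns, drop = FALSE])
rownames(statistic) <- ids
analysesNames <- colnames(statistic)

str(statistic)
```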
Francesco Ferrari
## Not run:
require(PREDAsampledata)
CNdataPath <- system.file("sampledata", "CopyNumber", package = "PREDAsampledata")
CNdataFile <- file.path(CNdataPath, "CNAG_data_PREDA.txt")
CNannotationFile <- file.path(CNdataPath, "SNPAnnot100k.csv")
# read the input file into a dataframe first
CNdata <- read.table(CNdataFile, sep="\t", header=TRUE)
CNStatisticsForPREDA <- StatisticsForPREDAFromdataframe(
    StatisticsForPREDA_dataframe=CNdata,
    ids_column="AffymetrixSNPsID", testedTail="both")
## End(Not run)
function to compute a statisticsForPREDA object from an ExpressionSet object
# statisticsForPREDAfromEset(.Object, pData_classColumn=NULL,
#    statisticType=NULL, logged=TRUE, referenceGroupLabel=NULL,
#    classVector=NULL, testedTail="both")
statisticsForPREDAfromEset(.Object, ...)
.Object |
Object of class ExpressionSet |
... |
See below
|
An object of class ExpressionSet is used as input and gene centered statistics for differential expression are computed on the contained data. The computed statistics are used to build a StatisticsForPREDA object
An object of class StatisticsForPREDA
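A gene centered statistic of the "tstatistic" kind can be illustrated in base R on a plain expression matrix. This is a sketch under assumptions, not the function's internals: it uses Welch's t-test from stats::t.test on made-up log scale values, whereas the package may use a different t-statistic variant.

```r
# Toy log2 expression matrix: genes in rows, samples in columns (made-up values)
expr <- matrix(c(8.0, 8.2, 7.9, 10.1, 10.4, 9.8,
                 5.0, 5.1, 4.9,  5.1,  4.9, 5.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("gene1", "gene2"), paste0("S", 1:6)))
classVector <- rep(c("normal", "tumor"), each = 3)
referenceGroupLabel <- "normal"

is_ref <- classVector == referenceGroupLabel

# One t-statistic per gene: non-reference samples versus reference samples
tstat <- apply(expr, 1, function(x) {
  unname(t.test(x[!is_ref], x[is_ref])$statistic)
})
tstat
```

A positive value indicates higher expression in the non-reference group; gene2 has identical group means here, so its statistic is essentially zero.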
Francesco Ferrari
## Not run:
require(PREDAsampledata)
data(ExpressionSetRCC)
# sampleinfo: sample information dataframe, e.g. as read in the preprocessingGE example
GEstatisticsForPREDA <- statisticsForPREDAfromEset(ExpressionSetRCC,
    statisticType="tstatistic", referenceGroupLabel="normal",
    classVector=sampleinfo[,"Class"])
## End(Not run)
Function to create a StatisticsForPREDA object from a txt file
StatisticsForPREDAFromfile(file, ids_column = NULL, statistic_columns = NULL, analysesNames = NULL, testedTail = c("upper", "lower", "both"), ...)
file |
Path to the input txt file containing statistics on genomics data |
ids_column |
Specify the column from the input txt file with gene (or other genomic features) ids. Can be specified using column index (numeric) or column name (character). |
statistic_columns |
Specify the column (or columns) from the input txt file with genomic data statistics that will be included in the statisticsForPREDA object. Can be specified using column index (numeric) or column name (character). If NULL (default), all columns excluding ids_column will be considered as input statistics. |
analysesNames |
Names (labels) to be associated to each input statistic. If NULL, the column names for statistic_columns will be used. |
testedTail |
Specify what tail of the distribution will be tested for significantly extreme values in PREDA analysis. Possible values are "both", "upper" or "lower". |
... |
any other parameter for read.table function that could be useful for parsing the input file, such as "sep", "quote", "header", "na.strings" and other parameters. |
A txt file is parsed and a statisticsForPREDA object is built using contained data.
A statisticsForPREDA object
Francesco Ferrari
## Not run:
require(PREDAsampledata)
CNdataPath <- system.file("sampledata", "CopyNumber", package = "PREDAsampledata")
CNdataFile <- file.path(CNdataPath, "CNAG_data_PREDA.txt")
CNannotationFile <- file.path(CNdataPath, "SNPAnnot100k.csv")
CNStatisticsForPREDA <- StatisticsForPREDAFromfile(file=CNdataFile,
    ids_column="AffymetrixSNPsID", testedTail="both",
    sep="\t", header=TRUE)
## End(Not run)