Title: | ChIPanalyser: Predicting Transcription Factor Binding Sites |
---|---|
Description: | ChIPanalyser is a package to predict and understand TF binding by utilizing a statistical thermodynamic model. The model incorporates 4 main factors thought to drive TF binding: chromatin state, binding energy, number of bound molecules and a scaling factor modulating TF binding affinity. Taken together, these factors allow ChIPanalyser to produce ChIP-like profiles that closely mimic the patterns seen in real ChIP-seq data. |
Authors: | Patrick C.N. Martin & Nicolae Radu Zabet |
Maintainer: | Patrick C.N. Martin <[email protected]> |
License: | GPL-3 |
Version: | 1.29.0 |
Built: | 2024-12-29 05:04:55 UTC |
Source: | https://github.com/bioc/ChIPanalyser |
ChIPanalyser is a package to predict and understand TF binding by utilizing a statistical thermodynamic model. The model incorporates 4 main factors thought to drive TF binding: chromatin state, binding energy, number of bound molecules and a scaling factor modulating TF binding affinity. Taken together, these factors allow ChIPanalyser to produce ChIP-like profiles that closely mimic the patterns seen in real ChIP-seq data.
The DESCRIPTION file:
Package: | ChIPanalyser |
Type: | Package |
Title: | ChIPanalyser: Predicting Transcription Factor Binding Sites |
Authors@R: | c( person(c("Patrick", "CN"), "Martin", role=c("cre", "aut"), email="[email protected]"), person(c("Nicolea","Radu"), "Zabet", role="aut")) |
Version: | 1.29.0 |
Date: | 2017-09-01 |
Author: | Patrick C.N.Martin & Nicolae Radu Zabet |
Maintainer: | Patrick C.N. Martin <[email protected]> |
Citation: | Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. |
Description: | ChIPanalyser is a package to predict and understand TF binding by utilizing a statistical thermodynamic model. The model incorporates 4 main factors thought to drive TF binding: Chromatin State, Binding energy, Number of bound molecules and a scaling factor modulating TF binding affinity. Taken together, ChIPanalyser produces ChIP-like profiles that closely mimic the patterns seens in real ChIP-seq data. |
License: | GPL-3 |
Collate: | 2AllS4Class_ProfileParameters.R 3AllGenerics.R 4AllMethods.R AllInitialize.R AllShowMethods.R computeChIPProfile.R computeOccupancy.R computeOptimal.R computePWMScore.R computeGenomeWide.R parallelInternalFunctionsDev.R GenomicProfileGenericFunctions.R plotOccupancy.R plotOptimalHeatMapDev.R DataPreprocessingDev.R DataPreprocessingGenericFunctionsDev.R profileAccuracyEstimateDev.R GAAnalysis.R GAGeneric.R |
Depends: | R (>= 3.5.0),GenomicRanges, Biostrings, BSgenome, RcppRoll, parallel |
Imports: | methods, IRanges, S4Vectors,grDevices,graphics,stats,utils,rtracklayer,ROCR, BiocManager,GenomeInfoDb,RColorBrewer |
Suggests: | BSgenome.Dmelanogaster.UCSC.dm6,knitr, RUnit, BiocGenerics |
Encoding: | UTF-8 |
LazyData: | true |
biocViews: | Software, BiologicalQuestion, WorkflowStep, Transcription, Sequencing, ChipOnChip, Coverage, Alignment, ChIPSeq, SequenceMatching, DataImport ,PeakDetection |
VignetteBuilder: | knitr |
RoxygenNote: | 7.2.1 |
Config/pak/sysreqs: | make libxml2-dev libssl-dev |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/ChIPanalyser |
RemoteRef: | HEAD |
RemoteSha: | 0c3209595339e38bf8d122dfe6a8284931d9b4f4 |
Index of help topics:
BPFrequency | Accessor method for 'BPFrequency' slot in a 'genomicProfiles' object.
BPFrequency-methods | ~~ Methods for Function 'BPFrequency' ~~
BPFrequency<- | Setter method for 'BPFrequency' slot in a 'genomicProfiles' object.
BPFrequency<--methods | ~~ Methods for Function 'BPFrequency<-' ~~
ChIPScore-class | Class '"ChIPScore"'
ChIPanalyser | ChIPanalyser: Predicting Transcription Factor Binding Sites
ChIPanalyserData | ChIPanalyserData
DNASequenceLength | Accessor method for 'DNASequenceLength' slot in a 'genomicProfiles'
DNASequenceLength-methods | ~~ Methods for Function 'DNASequenceLength' ~~
GRList-class | Class '"GRList"'
PFMFormat | Accessor method for the 'PFMFormat' slot in a 'genomicProfiles' object
PFMFormat-methods | ~~ Methods for Function 'PFMFormat' ~~
PFMFormat<- | Setter method for the 'PFMFormat' slot in a 'genomicProfiles' object
PFMFormat<--methods | ~~ Methods for Function 'PFMFormat<-' ~~
PWMThreshold | Accessor method for the 'PWMThreshold' slot in a 'parameterOptions' object
PWMThreshold-methods | ~~ Methods for Function 'PWMThreshold' ~~
PWMThreshold<- | Setter method for the 'PWMThreshold' slot in a 'parameterOptions' object
PWMThreshold<--methods | ~~ Methods for Function 'PWMThreshold<-' ~~
PWMpseudocount | Accessor method for a 'PWMpseudocount' slot in a 'parameterOptions'
PWMpseudocount-methods | ~~ Methods for Function 'PWMpseudocount' ~~
PWMpseudocount<- | Setter method for the 'pseudocount' slot in a 'parameterOptions' object
PWMpseudocount<--methods | ~~ Methods for Function 'PWMpseudocount<-' ~~
PositionFrequencyMatrix | Accessor method for the 'PFM' slot in a 'genomicProfiles' object
PositionFrequencyMatrix-methods | ~~ Methods for Function 'PositionFrequencyMatrix' ~~
PositionFrequencyMatrix<- | Setter method for the 'PFM' slot in a 'genomicProfiles' object
PositionFrequencyMatrix<--methods | ~~ Methods for Function 'PositionFrequencyMatrix<-' ~~
PositionWeightMatrix | Accessor method for the 'PWM' slot in a 'genomicProfiles' object
PositionWeightMatrix-methods | ~~ Methods for Function 'PositionWeightMatrix' ~~
PositionWeightMatrix<- | Setter method for the 'PositionWeightMatrix' slot in a 'genomicProfiles' object
PositionWeightMatrix<--methods | ~~ Methods for Function 'PositionWeightMatrix<-' ~~
averageExpPWMScore | Accessor for 'averageExpPWMScore' slot in a 'genomicProfiles' object.
averageExpPWMScore-methods | ~~ Methods for Function 'averageExpPWMScore' ~~
backgroundSignal | Accessor method for the 'backgroundSignal' slot in a 'parameterOptions' object.
backgroundSignal-methods | ~~ Methods for Function 'backgroundSignal' ~~
backgroundSignal<- | Setter method for 'backgroundSignal' slot in a 'parameterOptions'
backgroundSignal<--methods | ~~ Methods for Function 'backgroundSignal<-' ~~
boundMolecules | Accessor methods for 'boundMolecules' slot in 'parameterOptions' object.
boundMolecules-methods | ~~ Methods for Function 'boundMolecules' ~~
boundMolecules<- | Setter method for the 'boundMolecules' slot in a 'parameterOptions' object.
boundMolecules<--methods | ~~ Methods for Function 'boundMolecules<-' ~~
chipMean | Accessor method for 'chipMean' slot in a 'parameterOptions' object.
chipMean-methods | ~~ Methods for Function 'chipMean' ~~
chipMean<- | Access methods for 'chipMean' slot in 'parameterOptions' object.
chipMean<--methods | ~~ Methods for Function 'chipMean<-' ~~
chipSd | Accessor method for 'chipSd' slot in a 'parameterOptions' object.
chipSd-methods | ~~ Methods for Function 'chipSd' ~~
chipSd<- | Setter methods for 'chipSd' slot in a 'parameterOptions' object.
chipSd<--methods | ~~ Methods for Function 'chipSd<-' ~~
chipSmooth | Accessor methods for 'chipSmooth' slot in a 'parameterOptions' object.
chipSmooth-methods | ~~ Methods for Function 'chipSmooth' ~~
chipSmooth<- | Setter method for 'chipSmooth' slot in 'parameterOptions' object.
chipSmooth<--methods | ~~ Methods for Function 'chipSmooth<-' ~~
computeChIPProfile | Computing ChIP-seq like profiles from Occupancy data.
computeGenomeWideScores | Computing Genome Wide scores
computeOccupancy | Compute Occupancy values from PWM Scores based on model.
computeOptimal | compute Optimal Parameters
computePWMScore | Compute PWM Scores of sites above threshold.
drop | Accessor method for the 'drop' slot in a 'genomicProfiles' object.
drop-methods | ~~ Methods for Function 'drop' ~~
evolve | Running the ChIPanalyser implementation of a Genetic algorithm.
generateStartingPopulation | Generate Starting population for ChIPanalyser Genetic algorithm
genomicProfiles | Genomic Profile object
genomicProfiles-class | Class '"genomicProfiles"'
genomicProfilesInternal-class | Class '"genomicProfilesInternal"'
getHighestFitnessSolutions | Get Highest Fitness Solutions
getTestingData | Extract testing data from ChIPscore object
getTrainingData | Extract training data from ChIPscore object
initialize-methods | ~~ Methods for Function 'initialize' ~~
lambdaPWM | Accessor method for the 'lambdaPWM' slot in a 'parameterOptions' object
lambdaPWM-methods | ~~ Methods for Function 'lambdaPWM' ~~
lambdaPWM<- | Setter method for the 'lambdaPWM' slot in a 'parameterOptions' object
lambdaPWM<--methods | ~~ Methods for Function 'lambdaPWM<-' ~~
loci | Accessor method for the 'loci' slot in a 'ChIPScore' object
loci-class | Class '"loci"'
loci-methods | ~~ Methods for Function 'loci' ~~
lociWidth | Accessor method for the 'lociWidth' slot in a 'parameterOptions' object
lociWidth-methods | ~~ Methods for Function 'lociWidth' ~~
lociWidth<- | Setter method for the 'lociWidth' slot in a 'parameterOptions' object
lociWidth<--methods | ~~ Methods for Function 'lociWidth<-' ~~
maxPWMScore | Accessor function for 'maxPWMScore' slot in a 'genomicProfiles' object.
maxPWMScore-methods | ~~ Methods for Function 'maxPWMScore' ~~
maxSignal | Accessor method for the 'maxSignal' slot in a 'parameterOptions' object.
maxSignal-methods | ~~ Methods for Function 'maxSignal' ~~
maxSignal<- | Setter method for 'maxSignal' slot in a 'parameterOptions' object.
maxSignal<--methods | ~~ Methods for Function 'maxSignal<-' ~~
minPWMScore | Accessor method for the 'minPWMScore' slot in a 'genomicProfiles' object
minPWMScore-methods | ~~ Methods for Function 'minPWMScore' ~~
naturalLog | Accessor method for the 'naturalLog' slot in a 'parameterOptions' object.
naturalLog-methods | ~~ Methods for Function 'naturalLog' ~~
naturalLog<- | Setter method for the 'naturalLog' slot in a 'parameterOptions' object.
naturalLog<--methods | ~~ Methods for Function 'naturalLog<-' ~~
noOfSites | Accessor method for the 'noOfSites' slot in a 'parameterOptions' object
noOfSites-methods | ~~ Methods for Function 'noOfSites' ~~
noOfSites<- | Setter method for the 'noOfSites' slot in a 'parameterOptions' object.
noOfSites<--methods | ~~ Methods for Function 'noOfSites<-' ~~
noiseFilter | Accessor method for the 'noiseFilter' slot in a 'parameterOptions' object
noiseFilter-methods | ~~ Methods for Function 'noiseFilter' ~~
noiseFilter<- | Setter method for the 'noiseFilter' slot in a 'parameterOptions' object
noiseFilter<--methods | ~~ Methods for Function 'noiseFilter<-' ~~
nos-class | Class '"nos"'
parameterOptions | parameter Options object
parameterOptions-class | Class '"parameterOptions"'
ploidy | Accessor method for the 'ploidy' slot in a 'parameterOptions' object
ploidy-methods | ~~ Methods for Function 'ploidy' ~~
ploidy<- | Setter method for the 'ploidy' slot in a 'parameterOptions' object
ploidy<--methods | ~~ Methods for Function 'ploidy<-' ~~
plotOccupancyProfile | Plot Occupancy Profiles
plotOptimalHeatMaps | Heat Map of optimal Parameters
processingChIP | Pre-processing ChIP-seq data
profileAccuracyEstimate | Estimating Accuracy of predicted Profiles
profiles | ~~ Methods for Function 'profiles' ~~
removeBackground | Accessor method for the 'removeBackground' slot in a 'parameterOptions' object
removeBackground-methods | ~~ Methods for Function 'removeBackground' ~~
removeBackground<- | Setter method for the 'removeBackground' slot in a 'parameterOptions' object
removeBackground<--methods | ~~ Methods for Function 'removeBackground<-' ~~
scores | Accessor method for the 'scores' slot in a 'ChIPScore' object
scores-methods | ~~ Methods for Function 'scores' ~~
searchSites | Searching function for Sites above threshold and predicted ChIP-seq Profiles
setChromatinStates | setChromatinStates
show-methods | ~~ Methods for Function 'show' ~~
singleRun | singleRun
splitData | Get Training and Testing data from ChIPscore objects
stepSize | Accessor method of the 'stepSize' slot in 'parameterOptions' object
stepSize-methods | ~~ Methods for Function 'stepSize' ~~
stepSize<- | Setter method for the 'stepSize' slot in a 'parameterOptions'
stepSize<--methods | ~~ Methods for Function 'stepSize<-' ~~
strandRule | Accessor method for the 'strandRule' slot in a 'parameterOptions' object
strandRule-methods | ~~ Methods for Function 'strandRule' ~~
strandRule<- | Setter method for the 'strandRule' slot in a 'parameterOptions' object.
strandRule<--methods | ~~ Methods for Function 'strandRule<-' ~~
whichstrand | Accessor method for the 'whichstrand' slot in a 'parameterOptions' object
whichstrand-methods | ~~ Methods for Function 'whichstrand' ~~
whichstrand<- | Setter method for the 'whichstrand' slot in a 'parameterOptions' object
whichstrand<--methods | ~~ Methods for Function 'whichstrand<-' ~~
Patrick C.N. Martin <[email protected]>
And
Nicolae Radu Zabet <[email protected]>
Maintainer: Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
library(ChIPanalyser)
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Parameter object used by the steps below (default values assumed)
OPP <- parameterOptions()
chip <- processingChIP(chip, top)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet,
    genomicsProfiles = GPP)

# Computing PWM scores
PWMScores <- computePWMScore(genomicsProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Computing occupancy
Occupancy <- computeOccupancy(genomicsProfiles = PWMScores,
    parameterOptions = OPP)

# Computing ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy,
    loci = top, parameterOptions = OPP)

# Estimating accuracy
AccuracyEstimate <- profileAccuracyEstimate(genomicProfiles = chipProfile,
    ChIPScore = chip, parameterOptions = OPP)
Accessor for the averageExpPWMScore slot in a genomicProfiles object.
Extract or access the averageExpPWMScore slot in a genomicProfiles object.
averageExpPWMScore(object)
object |
|
As a general rule, averageExpPWMScore is computed and updated internally by computeGenomeWideScores. Ideally, this slot should not be updated by the user.
The averageExpPWMScore is the sum of the exponentials of every PWM score for a given DNA sequence, divided by the length of that DNA sequence (DNASequenceLength). This can either be the full-length sequence or only the accessible sequence (see computeGenomeWideScores).
Returns the averageExpPWMScore of a genomicProfiles object, once it has been computed.
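The definition above can be illustrated with a few lines of base R. This is only a sketch of the arithmetic on made-up numbers: the vector `pwm_scores`, the mask `accessible` and the helper `average_exp_score` are hypothetical names, and the computation performed internally by computeGenomeWideScores may differ, for example in how scores are scaled before exponentiation.

```r
# Hypothetical PWM scores, one per position of a toy DNA sequence
pwm_scores <- c(-2.0, 0.5, 1.5, -0.5, 3.0)

# Sum of exp(score) over all positions, divided by the sequence length
average_exp_score <- function(scores, sequence_length = length(scores)) {
    sum(exp(scores)) / sequence_length
}

# Using the full-length sequence
full_length <- average_exp_score(pwm_scores)

# Using only the accessible positions changes both the sum and the length
accessible <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
accessible_only <- average_exp_score(pwm_scores[accessible])
```

In this toy case the accessible positions happen to carry the highest scores, so `accessible_only` is larger than `full_length`.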
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Accessing data
library(ChIPanalyser)
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Extracting averageExpPWMScore slot
averageExpPWMScore(GPP)
## Note: this slot is empty at this point as nothing has been computed yet
~~ Methods for function averageExpPWMScore ~~
signature(object = "genomicProfilesInternal")
Accessor method for the backgroundSignal slot in a parameterOptions object.
Extract or access the backgroundSignal slot in a parameterOptions object.
backgroundSignal(object)
object |
|
Default value: 0
When computing occupancy (computeOccupancy), a ChIP-seq background signal is used to scale occupancy, taking into account both a backgroundSignal and a maxSignal. The backgroundSignal is also used to normalise occupancies against maxOccupancy.
The backgroundSignal usually comes from experimental data and is provided by the user. As a general rule, if ChIP-seq data is available and will be used in computeChIPProfile, profileAccuracyEstimate or plotOccupancyProfile, it is advised to use the backgroundSignal from this data.
We strongly encourage setting values when building a parameterOptions object.
Returns the backgroundSignal of a parameterOptions object.
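One way to picture how a background signal and a maximum signal can scale occupancy is a linear rescaling, sketched below in base R. The formula is an assumption made for illustration and is not taken from the package source; `rescale_occupancy` and all numeric values are hypothetical.

```r
# Hypothetical raw occupancy values along a locus
occupancy <- c(0.00, 0.10, 0.40, 0.80, 0.20)

# Assumed linear rescaling: zero occupancy maps to background_signal and
# the largest occupancy maps to max_signal
rescale_occupancy <- function(occ, background_signal = 0, max_signal = 1) {
    background_signal + (occ / max(occ)) * (max_signal - background_signal)
}

scaled <- rescale_occupancy(occupancy,
                            background_signal = 0.02,
                            max_signal = 1.85)
range(scaled)
```

Taking both values from the same ChIP-seq data set keeps the predicted profile on the scale of the measured one, which is the motivation given above.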
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
library(ChIPanalyser)
# Building parameterOptions object
OPP <- parameterOptions()
# Viewing single value in object
backgroundSignal(OPP)
~~ Methods for function backgroundSignal ~~
signature(object = "parameterOptions")
Setter method for the backgroundSignal slot in a parameterOptions object.
Setter method for the backgroundSignal slot in a parameterOptions object.
backgroundSignal(object) <- value
object |
|
value |
|
Default value: 0.
When computing occupancy (computeOccupancy), a ChIP-seq background signal is used to scale occupancy, taking into account both a backgroundSignal and a maxSignal. The backgroundSignal is also used to normalise occupancies to maxOccupancy.
The backgroundSignal usually comes from experimental data and is provided by the user. As a general rule, if ChIP-seq data is available and will be used in computeChIPProfile, profileAccuracyEstimate or plotOccupancyProfile, it is advised to use the backgroundSignal from this data.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with a new value assigned to the backgroundSignal slot.
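The form `backgroundSignal(object) <- value` is an S4 replacement method: it receives the object and the new value and returns the modified object. The sketch below shows that mechanism on a toy class, using only the methods package. The class `toyOptions` and the generic `toySignal` are hypothetical stand-ins and are not part of ChIPanalyser.

```r
library(methods)

# Hypothetical stand-in for a parameter container; not the package class
setClass("toyOptions",
         slots = c(backgroundSignal = "numeric"),
         prototype = list(backgroundSignal = 0))

setGeneric("toySignal", function(object) standardGeneric("toySignal"))
setGeneric("toySignal<-",
           function(object, value) standardGeneric("toySignal<-"))

# Accessor: read the slot
setMethod("toySignal", "toyOptions",
          function(object) object@backgroundSignal)

# Setter: a replacement method must return the modified object
setReplaceMethod("toySignal", "toyOptions", function(object, value) {
    object@backgroundSignal <- value
    validObject(object)
    object
})

opts <- new("toyOptions")
before <- toySignal(opts)   # value from the prototype
toySignal(opts) <- 0.2      # R rewrites this as opts <- `toySignal<-`(opts, 0.2)
after <- toySignal(opts)
```

Because the setter returns a new object that is reassigned to the same name, other objects built earlier from the old value are not updated.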
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
library(ChIPanalyser)
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for backgroundSignal
backgroundSignal(OPP) <- 0.2
# Viewing whole object with the updated value
OPP
# Viewing single value in object
backgroundSignal(OPP)
~~ Methods for function backgroundSignal<- ~~
backgroundSignal(object)<-value
Accessor method for the boundMolecules slot in a parameterOptions object.
Extract or access the boundMolecules slot in a parameterOptions object.
boundMolecules(object)
object |
|
Default value: 1000
When computing occupancy (computeOccupancy), a value for the number of molecules bound to DNA is needed. This value can be updated and set in a parameterOptions object. If the number of molecules is unknown, it is possible to infer this value with computeOptimal.
We strongly encourage setting values when building a parameterOptions object.
Returns the boundMolecules slot of a parameterOptions object.
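To see why the number of bound molecules matters, the sketch below evaluates one form of a statistical thermodynamic occupancy formula in base R. The formula is our reading of the model in the cited Zabet and Adryan (2015) paper and should be treated as an assumption, not as the code run by computeOccupancy. The helper `toy_occupancy`, its arguments and all values are hypothetical, and DNA accessibility is left out (every site is treated as fully accessible).

```r
# Assumed form: N * exp(w / lambda) /
#               (N * exp(w / lambda) + L * ploidy * mean(exp(w / lambda)))
toy_occupancy <- function(pwm_scores, bound_molecules, lambda = 1,
                          ploidy = 2,
                          sequence_length = length(pwm_scores)) {
    weight <- exp(pwm_scores / lambda)
    # Competition from every other site on the sequence
    competition <- sequence_length * ploidy * mean(weight)
    (bound_molecules * weight) / (bound_molecules * weight + competition)
}

scores <- c(-2.0, 0.5, 1.5, -0.5, 3.0)
low  <- toy_occupancy(scores, bound_molecules = 10)
high <- toy_occupancy(scores, bound_molecules = 1000)
```

Under this form, raising the number of bound molecules increases occupancy at every site while values stay between 0 and 1, so weak sites gain visibility as strong sites saturate.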
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
library(ChIPanalyser)
# Building parameterOptions object
OPP <- parameterOptions()
# Checking single value by slot accessor
boundMolecules(OPP)
~~ Methods for function boundMolecules ~~
signature(object = "parameterOptions")
Setter method for the boundMolecules slot in a parameterOptions object.
Setter method for the boundMolecules slot in a parameterOptions object.
boundMolecules(object) <- value
object |
|
value |
|
Default value: 1000
When computing occupancy (computeOccupancy), a value for the number of molecules bound to DNA is needed. This value can be updated and set in a parameterOptions object. If the number of molecules is unknown, it is possible to infer this value with computeOptimal.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for boundMolecules.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
library(ChIPanalyser)
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new boundMolecules value in OPP
boundMolecules(OPP) <- 5000
# Checking value in whole object
OPP
# Checking single value by slot accessor
boundMolecules(OPP)
~~ Methods for function boundMolecules<- ~~
signature(object = "parameterOptions", value = "vector")
Accessor method for the BPFrequency slot in a genomicProfiles object.
Extract or access the BPFrequency slot in a genomicProfiles object.
BPFrequency(object)
object |
|
Default value is c(0.25,0.25,0.25,0.25)
When generating a Position Weight Matrix from a Position Frequency Matrix, the probability of occurrence of each base pair (Base Pair Frequency) is necessary (as originally described by Gary Stormo). It is possible to set custom values for BPFrequency with a vector of length 4 containing the probability of occurrence of each base pair (A,C,G,T) in order.
If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence. This sequence can be a BSgenome or a DNAStringSet. In order to decrease run time, it is advised to use a DNAStringSet.
Returns the BPFrequency slot of a genomicProfiles object.
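What "computing base pair frequency from a DNA sequence" amounts to can be sketched in base R on a plain character string. This is an illustration only: the package works on BSgenome or DNAStringSet objects, and the helper `base_pair_frequency` and the toy sequence are hypothetical.

```r
# Hypothetical toy sequence; a real analysis would use a genome
dna <- "ACGTTGCAATGCCGTAAT"

base_pair_frequency <- function(sequence) {
    bases <- strsplit(toupper(sequence), "")[[1]]
    # Count only A, C, G, T, in that order; other symbols (e.g. N) are dropped
    counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
    freq <- as.numeric(counts) / sum(counts)
    names(freq) <- c("A", "C", "G", "T")
    freq
}

bp_freq <- base_pair_frequency(dna)
```

The result is a named vector of length 4 that sums to 1, which is the shape expected for a custom BPFrequency value as described above.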
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
library(ChIPanalyser)
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Extracting BPFrequency slot
BPFrequency(GPP)
~~ Methods for function BPFrequency ~~
signature(object = "genomicProfilesInternal")
Setter method for the BPFrequency slot in a genomicProfiles object.
Setter method for the BPFrequency slot in a genomicProfiles object. If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence.
BPFrequency(object) <- value
object |
|
value |
A vector of length 4 containing the probability of occurrence of each base pair (A,C,G,T) in order. Default value is c(0.25,0.25,0.25,0.25). A A |
Default value is c(0.25,0.25,0.25,0.25)
When generating a Position Weight Matrix from a Position Frequency Matrix, the probability of occurrence of each base pair (Base Pair Frequency) is necessary (as originally described by Gary Stormo). It is possible to set custom values for BPFrequency with a vector of length 4 containing the probability of occurrence of each base pair (A,C,G,T) in order.
If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence when building a genomicProfiles object. This sequence can be a BSgenome object or a DNAStringSet. In order to decrease run time, it is advised to use a DNAStringSet.
Returns a genomicProfiles object with an updated value for BPFrequency.
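The role of the base pair frequency in turning a Position Frequency Matrix into a Position Weight Matrix can be sketched as a log-odds calculation in base R. The conversion below is a common textbook form and is an assumption here, not the package's exact code. The helper `pfm_to_pwm`, the pseudocount handling and the toy matrix are hypothetical.

```r
# Hypothetical 4 x 3 position frequency matrix (rows A, C, G, T)
pfm <- matrix(c(8, 1, 0,
                1, 7, 1,
                0, 1, 9,
                1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

pfm_to_pwm <- function(pfm, bp_frequency = rep(0.25, 4), pseudocount = 1) {
    # Spread the pseudocount over the bases in proportion to the background
    adjusted <- pfm + pseudocount * bp_frequency
    # Column-wise probabilities of each base at each motif position
    prob <- sweep(adjusted, 2, colSums(adjusted), "/")
    # Log-odds of each base against its background frequency
    log(prob / bp_frequency)
}

pwm_uniform <- pfm_to_pwm(pfm)
pwm_at_rich <- pfm_to_pwm(pfm, bp_frequency = c(0.3, 0.2, 0.2, 0.3))
```

With a higher background frequency for A, the same counts give a lower weight for A. This is why the frequency has to be in place when the matrix is built: changing it afterwards does not recompute weights that already exist.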
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
library(ChIPanalyser)
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Updating BPFrequency
## Note: BPFrequency is used to compute the PWM from the PFM.
## If it is updated after building GPP, it will not influence the PWM.
## It is advised to build the object with BPFrequency directly.
BPFrequency(GPP) <- DNASequenceSet
BPFrequency(GPP) <- c(0.25, 0.25, 0.25, 0.25)
~~ Methods for function BPFrequency<- ~~
signature(object = "genomicProfilesInternal", value = "DNAStringSet")
signature(object = "genomicProfilesInternal", value = "vector")
ChIPanalyserData is derived from real biological data. The source organism is Drosophila melanogaster. The data can be described as genomic data, as it contains DNA sequences, loci, genetic information, DNA accessibility data and ChIP-seq data.
data(ChIPanalyserData)
Access is a GRanges containing DNA accessibility data for the sequences described above.
cs is a GRanges containing chromatin state data for the sequences described above.
top is a GRanges containing a locus of interest, in this case the eve stripe locus on chromosome 2R in Drosophila melanogaster.
chip is a GRanges containing ChIP scores for the eve stripe locus in Drosophila melanogaster.
geneRef is a GRanges containing UCSC gene reference information.
Returns a set of Rdata objects as described above.
Transcription Factor PFM: Berkeley Drosophila Transcription Network Project (bdtnp.lbl.gov)
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
data(ChIPanalyserData)
Accessor method for the chipMean slot in a parameterOptions object.
Accessor method for the chipMean slot in a parameterOptions object.
chipMean(object)
object |
|
Default value: 150
When computing ChIP-seq like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). It is advised to use the average width of ChIP peaks from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns the chipMean slot of a parameterOptions object.
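The effect of the average peak width on a ChIP-seq like profile can be pictured with a simplified peak-shape model in base R. The model is an assumption chosen for illustration, not the computation performed by computeChIPProfile: a fragment that contains the binding site is taken to cover a position `d` bp away with probability `(L - d) / L`, and fragment lengths `L` are taken to be roughly normally distributed. The helper `peak_shape` and all values are hypothetical.

```r
# Assumed peak shape: probability that a fragment containing the binding
# site also covers a position `distance` bp away, averaged over lengths
peak_shape <- function(distance, chip_mean = 150, chip_sd = 50,
                       max_length = 1000) {
    lengths <- seq_len(max_length)
    # Discretised, renormalised normal distribution of fragment lengths
    weights <- dnorm(lengths, mean = chip_mean, sd = chip_sd)
    weights <- weights / sum(weights)
    vapply(abs(distance), function(d) {
        sum(weights * pmax(0, (lengths - d) / lengths))
    }, numeric(1))
}

offsets <- seq(-300, 300, by = 50)
narrow <- peak_shape(offsets, chip_mean = 150)
wide   <- peak_shape(offsets, chip_mean = 250)
```

Under this model a larger mean produces a broader peak around the same site, which is why matching the value to the peak widths seen in the actual ChIP-seq data matters.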
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T., Li X.-Y., Sabo P.J., Thomas S., Stamatoyannopoulos J.A., Biggin M.D., Eisen M.B. (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
library(ChIPanalyser)
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing chipMean slot in OPP
chipMean(OPP)
~~ Methods for function chipMean ~~
chipMean(object)
Setter method for the chipMean slot in a parameterOptions object.
Setter method for the chipMean slot in a parameterOptions object.
chipMean(object) <- value
object |
|
value |
|
Default value: 150
When computing ChIP-seq like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). It is advised to use the average width of ChIP peaks from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for the chipMean slot.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T., Li X.-Y., Sabo P.J., Thomas S., Stamatoyannopoulos J.A., Biggin M.D., Eisen M.B. (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
# Building parameterOptions object
OPP <- parameterOptions()

# Setting new value for chipMean slot
chipMean(OPP) <- 250
chipMean<-
Methods for function chipMean<-
chipMean(object)<-value
"ChIPScore"
ChIPScore
is the result of the processingChIP
function. This object contains the extracted ChIP scores from ChIP data, the loci of interest and optional parameters associated with ChIPanalyser. The loci of interest will either be user provided or the top n regions as defined by the reduce argument in processingChIP
. This object has the sole purpose of aiding the storage and parsing of data and parameters.
Objects of this class are created internally and will be passed to other objects as is.
scores
:Object of class "list"
List of extracted ChIP scores
loci
:Object of class "loci"
GRanges containing loci of interest
ploidy
:Object of class "numeric"
Ploidy level of the organism
boundMolecules
:Object of class "vector"
Number of Bound molecules to the DNA
backgroundSignal
:Object of class "numeric"
ChIP background signal (average ChIP score)
maxSignal
:Object of class "numeric"
max ChIP signal
lociWidth
:Object of class "numeric"
Width of loci if reduce is used and no loci are provided
chipMean
:Object of class "numeric"
Average ChIP peak width
chipSd
:Object of class "numeric"
Standard Deviation of ChIP peak width
chipSmooth
:Object of class "vector"
Smoothing window width for ChIP score
stepSize
:Object of class "numeric"
Defining resolution size of ChIP like profiles (10bp = signal will be only considered every 10bp)
removeBackground
:Object of class "numeric"
Signal Threshold to be removed. Default removes all negative scores
noiseFilter
:Object of class "character"
Type of noise filter to be used on ChIP data.
PWMThreshold
:Object of class "numeric"
Threshold of PWM scores that will be selected
strandRule
:Object of class "character"
Rule to compute strand score (max, mean or sum)
whichstrand
:Object of class "character"
Which strand should be used to compute PWM scores.
lambdaPWM
:Object of class "vector"
Lambda value - Scaling factor to the PWM
naturalLog
:Object of class "logical"
PFM to PWM conversion log transform ( natural log or log2)
noOfSites
:Object of class "nos"
Number of Sites in the PWM that should be used to compute PWM scores.
PWMpseudocount
:Object of class "numeric"
PWM pseudocount value for PFM to PWM conversion.
paramTag
:Object of class "character"
Internal Tag - Code progression
Class "parameterOptions"
, directly.
signature(object = "ChIPScore", value = "loci")
: ...
signature(object = "ChIPScore", value = "list")
: ...
signature(.Object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
showClass("ChIPScore")
chipSd
slot in a
parameterOptions
object.
Access or Extract chipSd
slot in a
parameterOptions
object.
chipSd(object)
object |
|
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan et al. (2011). The average peak size is subject to variation, which is accounted for with chipSd. It is advised to use the standard deviation of ChIP peak width from actual ChIP-seq data. We strongly encourage setting this value when building a parameterOptions object.
Returns the chipSd slot of a parameterOptions object.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
# Building parameterOptions object
OPP <- parameterOptions()

# Accessing chipSd slot
chipSd(OPP)
chipSd
Methods for function chipSd
chipSd(object)
chipSd
slot in a
parameterOptions
object.
Setter methods for chipSd
slot in a
parameterOptions
object.
chipSd(object)<-value
object |
|
value |
|
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan et al. (2011). The average peak size is subject to variation, which is accounted for with chipSd. It is advised to use the standard deviation of ChIP peak width from actual ChIP-seq data. We strongly encourage setting this value when building a parameterOptions object.
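As a minimal sketch of the advice above (and of the matching advice for chipMean), the mean and standard deviation of peak widths can be estimated from a table of called peaks. The `peaks` data frame and its columns are hypothetical stand-ins for your own peak calls; the commented assignments assume ChIPanalyser is loaded.

```r
# Hypothetical peak calls (1-based, inclusive coordinates); replace with your own.
peaks <- data.frame(
    chr   = c("chr2L", "chr2L", "chr3R", "chrX"),
    start = c(1000, 5200, 880, 15000),
    end   = c(1199, 5349, 1129, 15099),
    stringsAsFactors = FALSE
)

# Width of each peak in base pairs
peakWidth <- peaks$end - peaks$start + 1

meanWidth <- mean(peakWidth)  # candidate value for chipMean
sdWidth   <- sd(peakWidth)    # candidate value for chipSd

# With ChIPanalyser loaded, these could then be assigned:
# OPP <- parameterOptions()
# chipMean(OPP) <- meanWidth
# chipSd(OPP)   <- sdWidth
```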
Returns a parameterOptions
object with an updated
value for chipSd
.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
# Building parameterOptions object
OPP <- parameterOptions()

# Setting new value for chipSd slot
chipSd(OPP) <- 250
chipSd<-
Methods for function chipSd<-
chipSd(object)<-value
chipSmooth
slot in a
parameterOptions
object.
Access or Extract chipSmooth
slot in a
parameterOptions
object.
chipSmooth(object)
object |
|
When computing ChIP-seq-like profiles (computeChIPProfile) from occupancy data (see computeOccupancy), the profiles are smoothed using a window of a given size. The default value is 250 base pairs. If chipSmooth is set to 0, the profile is not smoothed. We strongly encourage setting this value when building a parameterOptions object.
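To illustrate what window smoothing does to a profile, here is a small base-R sketch. It uses a centred moving average, which is an assumption made for illustration and not necessarily the smoothing ChIPanalyser applies internally; `smoothSignal` and the toy signal are hypothetical.

```r
# Toy ChIP-like signal sampled every base pair (hypothetical values)
signal <- c(0, 0, 1, 4, 9, 4, 1, 0, 0, 0)

# Centred moving average; a window of 0 (or 1) returns the signal unchanged
smoothSignal <- function(x, window) {
    if (window <= 1) return(x)
    kernel <- rep(1 / window, window)
    as.numeric(stats::filter(x, kernel, sides = 2))
}

smoothed   <- smoothSignal(signal, window = 3)
unsmoothed <- smoothSignal(signal, window = 0)
```

Positions at the edges, where the window runs past the data, come back as NA in this sketch.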
Returns the chipSmooth
slot in a
parameterOptions
object.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()

# Accessing chipSmooth slot
chipSmooth(OPP)
chipSmooth
Methods for function chipSmooth
signature(object = "parameterOptions")
chipSmooth
slot in
parameterOptions
object.
Setter method for chipSmooth
slot in
parameterOptions
object.
chipSmooth(object) <- value
object |
|
value |
|
When computing ChIP-seq-like profiles (computeChIPProfile) from occupancy data (see computeOccupancy), the profiles are smoothed using a window of a given size. The default value is 250 base pairs. If chipSmooth is set to 0, the profile is not smoothed. We strongly encourage setting this value when building a parameterOptions object.
Returns a parameterOptions
object with an updated
value for chipSmooth
slot.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Building parameterOptions object
OPP <- parameterOptions()

# Setting new value for chipSmooth slot
chipSmooth(OPP) <- 250
chipSmooth<-
Methods for function chipSmooth<-
signature(object = "parameterOptions", value = "vector")
computeChIPProfile
computes ChIP-seq-like profiles from occupancy data.
Occupancy data is computed using computeOccupancy
.
computeChIPProfile(genomicProfiles, loci, parameterOptions = NULL, norm = TRUE, method = c("moving_kernel","truncated_kernel","exact"), peakSignificantThreshold= NULL,cores=1, verbose = TRUE)
genomicProfiles |
|
loci |
|
parameterOptions |
|
norm |
|
method |
|
peakSignificantThreshold |
|
cores |
|
verbose |
|
computeChIPProfile converts transcription factor occupancy into a profile resembling a ChIP-seq profile. Internally, a few parameters are required to build a ChIP-like profile. These parameters are defined and stored in a ChIPScore object (parameters are updated based on your ChIP data), a genomicProfiles object (user defined at the start of the analysis) or a parameterOptions object (if you want to update values as you go along).
Returns a genomicProfiles object containing the ChIP-seq-like profiles for every combination of lambdaPWM and boundMolecules provided by the user.
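The idea of turning per-site occupancy into a ChIP-like profile can be sketched in base R. This is not the package's implementation: a Gaussian-shaped peak is assumed purely for illustration, and `sitePos`, `occupancy` and `peakSd` are hypothetical names and values.

```r
# Hypothetical occupancy at three sites within a 1 kb locus
sitePos   <- c(200, 450, 470)
occupancy <- c(0.8, 0.3, 0.5)

positions <- seq(1, 1000, by = 10)   # evaluate every 10 bp (cf. stepSize)
peakSd    <- 150                     # assumed spread of a single peak, in bp

# Sum one Gaussian-shaped peak per site, weighted by the site's occupancy
profile <- vapply(positions, function(p) {
    sum(occupancy * dnorm(p, mean = sitePos, sd = peakSd))
}, numeric(1))

# Rescale so the highest point equals 1, loosely analogous to norm = TRUE
profileNorm <- profile / max(profile)
```

Nearby sites (here at 450 and 470) merge into a single broad peak, which is the behaviour that makes predicted profiles comparable to real ChIP-seq signal.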
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Extracting data
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)

# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)

# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
chipProfile
computeGenomeWideScores
computes the maximum and minimum PWM scores over the entire genome.
computeGenomeWideScores(genomicProfiles, DNASequenceSet, chromatinState = NULL, parameterOptions = NULL, cores = 1, verbose = TRUE)
genomicProfiles |
|
DNASequenceSet |
|
chromatinState |
|
parameterOptions |
|
cores |
|
verbose |
|
computeGenomeWideScores
function computes PWM scores over the entire genome (or over the accessible genome if chromatin states are provided). Genome-wide scores are used to determine the maximum and minimum PWM scores as well as the average exponential score. These scores are in turn used to determine which scores are above the PWM threshold. The average exponential score is an integral part of the equation used to compute occupancy. Using default settings, ChIPanalyser only computes occupancy for sites whose PWM score passes the default PWMThreshold of 0.7. This threshold can be changed; see PWMThreshold.
Returns a genomicProfiles
object with updated values for max score, min score and averageExpPWMScore.
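The three genome-wide quantities mentioned above can be illustrated on a toy sequence. The sketch below is a simplified assumption: it scores a single strand with a made-up 3 bp PWM and takes the average exponential score as the mean of exp(score / lambda); the package's own computation may differ in detail.

```r
# Hypothetical 3-bp PWM (rows: bases, columns: motif positions)
pwm <- matrix(c( 1.0, -0.5, -1.0,
                -1.0,  1.2, -0.5,
                -0.5, -1.0,  1.5,
                -1.0, -0.5, -1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

sequence <- strsplit("ACGTTACGGA", "")[[1]]
lambda   <- 1   # scaling factor (cf. lambdaPWM)

# Score every window of motif length along the sequence (forward strand only)
nWindows <- length(sequence) - ncol(pwm) + 1
scores <- vapply(seq_len(nWindows), function(i) {
    window <- sequence[i:(i + ncol(pwm) - 1)]
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(ncol(pwm)))])
}, numeric(1))

maxScore    <- max(scores)
minScore    <- min(scores)
avgExpScore <- mean(exp(scores / lambda))
```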
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)
computeOccupancy
will compute the occupancy from PWM scores. As described in detail in the vignette, ChIPanalyser uses PWM scores, DNA accessibility data, the number of bound molecules and a scaling factor of transcription factor specificity. This function computes occupancy using the values assigned to each variable.
computeOccupancy(genomicProfiles,parameterOptions = NULL, norm = TRUE, verbose = TRUE)
genomicProfiles |
|
parameterOptions |
|
norm |
|
verbose |
|
computeOccupancy
will compute the occupancy from PWM scores. As described in detail in the vignette, ChIPanalyser uses PWM scores, DNA accessibility data, the number of bound molecules and a scaling factor of transcription factor specificity. This function computes occupancy using the values assigned to each variable. Note that the parameterOptions object contains the set of parameters used to compute occupancy (among others). These parameters often depend on real ChIP-seq data and influence the goodness of fit between the predicted model and real ChIP-seq data. We strongly advise customising the value assigned to each parameter in order to increase the model's agreement with real-world biological data.
computeOccupancy
will return a genomicProfiles object. The main difference resides in the profiles slot. This slot is generally a list or GRangesList. These list-type structures contain GRanges holding the positions of sites above threshold, together with the PWM score and occupancy of each site. The number of GRanges depends on the number of loci tested, and the number of elements in the list depends on the combinations of lambdaPWM and boundMolecules.
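To show how the four ingredients combine, here is a base-R sketch of an occupancy calculation. The exact equation is given in the vignette and in Zabet and Adryan (2015); the form used below (a statistical weight N * a * exp(w / lambda) competing against a genome-wide background term) is an assumption written for illustration, and `occupancySketch` with all its inputs is hypothetical.

```r
# Assumed form of the occupancy model (see lead-in); all inputs are hypothetical.
occupancySketch <- function(pwmScore, access, boundMolecules, lambdaPWM,
                            DNALength, ploidy, avgExpScore) {
    weight <- boundMolecules * access * exp(pwmScore / lambdaPWM)
    weight / (weight + DNALength * ploidy * avgExpScore)
}

pwmScore <- c(2, 5, 8)   # PWM scores of three sites
access   <- c(1, 1, 0)   # 1 = accessible, 0 = inaccessible

occ <- occupancySketch(pwmScore, access, boundMolecules = 1000, lambdaPWM = 1,
                       DNALength = 1e6, ploidy = 2, avgExpScore = 1.5)
```

In this sketch a higher PWM score raises occupancy, and an inaccessible site has zero occupancy however strong its motif.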
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
OPP <- parameterOptions()

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)

# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores, parameterOptions = OPP)
Occupancy
ChIPanalyser contains a set of functions, some of which require two parameters known as lambdaPWM and boundMolecules. These two parameters are not always known. computeOptimal will compute these values by maximising the correlation and minimising the Mean Squared Error between a predicted ChIP-seq-like profile and a real ChIP-seq profile for the given loci.
computeOptimal(genomicProfiles,DNASequenceSet, ChIPScore,chromatinState = NULL, parameterOptions = NULL, optimalMethod = "all",rank=FALSE,returnAll=TRUE, peakMethod="moving_kernel",cores=1)
genomicProfiles |
|
DNASequenceSet |
|
ChIPScore |
|
chromatinState |
|
parameterOptions |
|
optimalMethod |
|
rank |
|
returnAll |
|
peakMethod |
|
cores |
|
To infer the values of lambdaPWM and boundMolecules, it is possible to use computeOptimal. Note that this function requires ChIP-seq data as input, supplied as a ChIPScore object. This should be the output of the processingChIP function.
computeOptimal returns a list containing, respectively, the optimal set of parameters (lambda, i.e. lambdaPWM, and boundMolecules), the optimal matrix (a matrix containing accuracy estimates dependent on the chosen optimalMethod), and finally the chosen optimalMethod. If the chosen optimalMethod was "all", then each element of this list will contain the optimal set of parameters and optimal matrices for all of the available methods (see optimalMethod).
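The search for an optimal parameter pair can be pictured as a grid search. In the sketch below, `predictProfile` is a toy stand-in and not ChIPanalyser's model, and the observed signal is simulated; only the idea of scoring every (lambda, boundMolecules) combination by correlation and MSE is illustrated.

```r
set.seed(1)
x <- seq(0, 3, length.out = 50)

# Hypothetical "observed" ChIP signal over 50 bins
observed <- sin(x) + rnorm(50, sd = 0.01)

# Toy stand-in for a model prediction; NOT the package's prediction
predictProfile <- function(lambda, boundMolecules) {
    scale <- boundMolecules / (boundMolecules + 1000)
    scale * sin(x)^(1 / lambda)
}

# Score every combination of the two parameters
grid <- expand.grid(lambda = c(0.5, 1, 2), boundMolecules = c(100, 1000, 1e5))
grid$pearson <- NA_real_
grid$MSE     <- NA_real_
for (k in seq_len(nrow(grid))) {
    pred <- predictProfile(grid$lambda[k], grid$boundMolecules[k])
    grid$pearson[k] <- cor(pred, observed)
    grid$MSE[k]     <- mean((pred - observed)^2)
}

bestByPearson <- grid[which.max(grid$pearson), ]
bestByMSE     <- grid[which.min(grid$MSE), ]
```

Correlation is insensitive to the overall scale of the prediction, so in this toy setting only MSE separates the boundMolecules values; this is one reason to inspect more than one metric.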
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Processing ChIP data over the loci of interest
chip <- processingChIP(chip, top)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
OPP <- parameterOptions()

# Computing optimal set of parameters
# (argument named optimalMethod, as in the usage above)
optimalParam <- computeOptimal(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet, ChIPScore = chip,
    chromatinState = Access, parameterOptions = OPP,
    optimalMethod = "all", peakMethod = "moving_kernel")
computePWMScore
will compute and extract all sites that exhibit a
PWM Score higher than a threshold.
This threshold (see PWMThreshold
) will determine the percentage
of total sites that should NOT be considered.
computePWMScore(genomicProfiles,DNASequenceSet, loci = NULL, chromatinState = NULL,parameterOptions=NULL,cores=1, verbose = TRUE)
DNASequenceSet |
|
genomicProfiles |
|
loci |
|
parameterOptions |
|
chromatinState |
|
cores |
|
verbose |
|
After determining genome-wide scores, it is possible to compute and extract only high-affinity sites (in the sense that they have a high PWM score). If PWMThreshold is not set by the user, the default value is 0.7. This means that 70% of sites will NOT be selected; only the top 30% will be computed and extracted. If one is interested in all PWM scores at a genome-wide scale (or over accessible DNA), this is possible by setting PWMThreshold to zero.
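Reading the description above literally, the threshold behaves like a quantile cut-off on the PWM scores. The following base-R sketch works under that assumption; how the package derives its cut-off internally is not shown here, and `selectSites` and the simulated scores are hypothetical.

```r
set.seed(42)
pwmScores <- rnorm(1000)   # hypothetical PWM scores for 1000 sites

# Keep sites whose score is at or above the requested quantile
selectSites <- function(scores, threshold) {
    cutoff <- quantile(scores, probs = threshold, names = FALSE)
    scores[scores >= cutoff]
}

selected <- selectSites(pwmScores, threshold = 0.7)  # top 30% of sites
allSites <- selectSites(pwmScores, threshold = 0)    # every site is kept
```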
computePWMScore
will return a
genomicProfiles
object.
The profiles
slot will have been updated.
This slot will now contain a GRangesList
with each element being a GRanges
.
This GRanges will contain the position of each site
(start, end and strand) and the PWM score associated with that site.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Processing ChIP data over the loci of interest
chip <- processingChIP(chip, top)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet,
    genomicProfiles = GPP)

# Compute PWM scores
PWMScores <- computePWMScore(DNASequenceSet = DNASequenceSet,
    genomicProfiles = GenomeWide, loci = chip, chromatinState = Access)
PWMScores
DNASequenceLength
slot in a
genomicProfiles
Accessor method for DNASequenceLength
slot in a
genomicProfiles
DNASequenceLength(object)
object |
|
The model on which ChIPanalyser is based requires the length of the DNA sequence used to compute scores. In this case, the DNA length is the total length of the DNA of the organism of interest or the length of accessible DNA at a genome-wide scale.
Returns DNASequenceLength
slot in a
genomicProfiles
object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)

# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet,
    genomicProfiles = GPP)
DNASequenceLength(GenomeWide)
DNASequenceLength
Methods for function DNASequenceLength
signature(object = "genomicProfilesInternal")
drop
slot in a
genomicProfiles
object.
Accessor Method for the drop
slot in a
genomicProfiles
object.
drop(object)
object |
|
During certain computations, it is possible that the loci of interest do not show any overlap with accessible DNA. If this is the case, a warning message will appear in the console and these inaccessible loci will be stored in this slot. For this reason, it is imperative that loci of interest are named (in this case, a named GRanges).
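The situation described above, loci with no accessible DNA, can be sketched with plain interval arithmetic. The data frames below are hypothetical stand-ins for a named GRanges of loci and a GRanges of accessible regions on a single chromosome; this is not how the package performs the check internally.

```r
# Hypothetical named loci and accessible regions on one chromosome
loci <- data.frame(name  = c("locusA", "locusB", "locusC"),
                   start = c(100, 5000, 9000),
                   end   = c(2000, 6000, 9500),
                   stringsAsFactors = FALSE)
access <- data.frame(start = c(1500, 9400), end = c(1800, 9450))

# A locus is kept if it overlaps at least one accessible region
hasAccess <- vapply(seq_len(nrow(loci)), function(i) {
    any(access$start <= loci$end[i] & access$end >= loci$start[i])
}, logical(1))

# Names of loci that would be reported as dropped
dropped <- loci$name[!hasAccess]
```

Without names on the loci there would be nothing meaningful to report, which is why named loci are required.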
Returns a character string with loci containing no accessible DNA.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)

# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")

# Loci with no access: a warning message will be issued
# if loci do not contain accessible DNA.
# Otherwise this slot will remain empty.
drop(GPP)
drop
Methods for function drop
signature(object = "genomicProfilesInternal")
evolve
pushes a starting population to evolve in a genetic algorithm.
evolve(population,DNASequenceSet,ChIPScore, genomicProfiles,parameters=NULL,generations=100,mutationProbability=0.3, offsprings=5,chromatinState=NULL, method="geometric", lambda=TRUE, checkpoint=TRUE, filename=NULL, cores=1)
population |
numeric value describing the number of individuals in the starting population.
Alternatively - a starting population list as returned by |
DNASequenceSet |
DNAStringSet object containing DNA sequences of interest (Extracted from BSgenome) |
ChIPScore |
ChIPScore object as returned by the |
genomicProfiles |
genomicProfiles object containing minimal information (such as the PWM) |
parameters |
vector or list containing each parameter that should be added to the chromosome.
See |
generations |
numeric describing the number of generations before the genetic algorithm should halt. |
mutationProbability |
numeric describing the rate of mutations for each surviving individual |
offsprings |
numeric describing the number of individuals surviving to the next generation |
chromatinState |
GRanges object containing chromatin state information. Each state should be labelled in a metadata column named "name". It is advised to use numeric values for each state name. |
method |
character string describing the scoring metric that should be used. ChIPanalyser offers twelve different metrics: correlation coefficients (Pearson, Spearman and Kendall), Mean Squared Error (MSE), Kolmogorov–Smirnov Distance, precision, recall, accuracy, F-score, Matthew’s correlation coefficient (MCC) and Area Under Curve Receiver Operator Characteristic (AUC ROC or just AUC) |
lambda |
logical describing if lambda value should be pre-computed. Setting to TRUE increases the speed of the algorithm. |
checkpoint |
logical describing if population parameters at each generation should be saved. |
filename |
character string that will serve as a prefix to the saved intermediate files. |
cores |
numeric describing the number of cores used to run the GA. |
ChIPanalyser offers a way of finding an optimal solution by using a genetic algorithm. Instead of running the standard analysis, TF binding affinities to chromatin states can be extracted via this more complex method. It should be noted that this method is better suited to the analysis of chromatin states. While the algorithm still works with simple DNA accessibility, it would potentially take more time for only minor gains in accuracy.
Returns a named list with three elements.
database saves the data frame containing all scores for each individual since generation 1
population saves the last population with chromosome values
fitest saves the fittest individual for a given generation
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
generateStartingPopulation
generates a starting population with random
traits for each individual
generateStartingPopulation(population,parameters,names=NULL)
population |
numeric value describing the number of individuals in the starting population. |
parameters |
vector or list containing each parameter that should be added to the chromosome. |
names |
character describing names that should be added to each individual. |
generateStartingPopulation
generates a starting population to be used
in the genetic algorithm implemented in ChIPanalyser. There are two main ways a
starting population can be generated:
by name
Using the names of each parameter that should be passed to each "chromosome". The possible parameters are N, lambda, PWMThreshold and CS (DNAAffinity or DNAAccessibility also work). CS names should also carry a numeric value identifying each chromatin state you wish to include, e.g. CS1 ... CS14. This will generate a value by sampling from a set of predefined values for each parameter.
by value range
Using a named list (one name per parameter). Each element of the list
should contain three numeric values: length of range, min value, max value.
(Internally, values are passed to runif.)
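The "by value range" option can be pictured with a short base R sketch. It assumes each triplet is handed to runif in the order (n, min, max); drawCandidates is a hypothetical helper written for this illustration and is not part of ChIPanalyser.

```r
# Hypothetical helper (not part of ChIPanalyser): one runif() draw per parameter.
# Each element of `ranges` is c(length of range, min value, max value).
drawCandidates <- function(ranges) {
  lapply(ranges, function(r) runif(n = r[1], min = r[2], max = r[3]))
}

set.seed(42)  # reproducible draws
paramValue <- list(N = c(10, 1, 1000),
                   lambda = c(10, 0, 5),
                   PWMThreshold = c(10, 0, 0.9))
candidates <- drawCandidates(paramValue)
lengths(candidates)  # number of candidate values drawn per parameter
```

How the package then assigns these candidate values to individuals is not shown here.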
Returns a list of individuals with random traits
Patrick C.N. Martin
## by name
param <- c("N","lambda","PWMThreshold","CS1","CS2","CS3")
pop <- generateStartingPopulation(20, param, names = NULL)
# by range
paramValue <- list(c(10,1,1000),c(10,0,5),c(10,0,0.9),c(10,0,1),c(10,0,1),c(10,0,1))
pop <- generateStartingPopulation(20, paramValue, names = param)
genomicProfiles
is an S4 object serving two purposes: (i) storing internally computed data and (ii) storing parameter options. This object is passed through the different steps of the pipeline
to facilitate passing and changing parameters.
genomicProfiles(..., parameterOptions = NULL, genomicProfiles = NULL, ChIPScore = NULL)
... |
Any of the user available slots in genomicProfiles. |
parameterOptions |
If some parameters were previously computed or stored in a parameterOptions object, passing this object will use those values instead of the default ones. |
genomicProfiles |
If some parameters were previously computed or stored in a genomicProfiles object, passing this object will use those values instead of the default ones. |
ChIPScore |
If some parameters were previously computed or stored in a ChIPScore object, passing this object will use those values instead of the default ones. |
The genomicProfiles
object serves the purpose of storing and passing parameters and computed data between the different steps of the pipeline. When creating a genomicProfiles
object it is possible to use previously computed values by simply passing the object to the constructor function.
Returns a genomicProfiles
object with updated slots for all parameters passed.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
genomicProfiles()
genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
"genomicProfiles"
genomicProfiles
is an S4 object serving two purposes: (i) storing internally computed data and (ii) storing parameter options. This object is passed through the different steps of the pipeline
to facilitate passing and changing parameters.
Objects can be created by calls of the form genomicProfiles(ploidy, boundMolecules, backgroundSignal, maxSignal, lociWidth, chipMean, chipSd, chipSmooth, stepSize, noiseFilter, removeBackground, lambdaPWM, PWMpseudocount, naturalLog, noOfSites, PWMThreshold, strandRule, whichstrand, PFM, PWM, PFMFormat, BPFrequency, minPWMScore, maxPWMScore, profiles, DNASequenceLength, averageExpPWMScore).
PWM: Object of class "matrix". A Position Weight Matrix (either supplied or internally computed if a PFM is provided).
PFM: Object of class "matrix". A Position Frequency Matrix (may also be a path to a file containing the PFM).
PFMFormat: Object of class "character". A character string, one of the following: raw, transfac, JASPAR or sequences.
BPFrequency: Object of class "vector". Base pair frequency in the genome (computed internally if a DNA sequence is provided as a DNAStringSet or BSgenome). Default: c(0.25,0.25,0.25,0.25)
minPWMScore: Object of class "vector". Lowest PWM score across the genome (computed and updated internally).
maxPWMScore: Object of class "vector". Highest PWM score across the genome (computed and updated internally).
profiles: Object of class "GRList". Contains GRanges with sites above threshold and associated metrics (PWMscore and Occupancy). Computed internally.
DNASequenceLength: Object of class "vector". Length of the genome (or accessible genome). Computed internally.
averageExpPWMScore: Object of class "vector". Average exponential PWM score across the genome (or accessible genome). Computed internally.
ZeroBackground: Object of class "vector". Internal background value (computed internally).
drop: Object of class "vector". Stores loci that do contain accessible DNA, if it were to be the case (computed and updated internally).
tags: Object of class "character". Internal tags.
ploidy: Object of class "numeric". A numeric value describing the ploidy of the organism. Default: 2
boundMolecules: Object of class "vector". A vector (or single value) containing the number of bound molecules (bound transcription factors). Default: 1000
backgroundSignal: Object of class "numeric". A numeric value describing the ChIP-seq background signal (average signal from real ChIP-seq data). Default: 0
maxSignal: Object of class "numeric". A numeric value describing the highest ChIP-seq signal (from real ChIP-seq data). Default: 1
lociWidth: Object of class "numeric".
chipMean: Object of class "numeric". A numeric value describing the mean width of a ChIP-seq peak. Default: 150
chipSd: Object of class "numeric". A numeric value describing the standard deviation of ChIP-seq peaks. Default: 150
chipSmooth: Object of class "vector". A numeric value describing the width of the window used to smooth occupancy profiles into ChIP profiles. Default: 250
stepSize: Object of class "numeric". A numeric value describing the step size (in base pairs) between each ChIP-seq score. Default: 10 (scored every 10 base pairs)
removeBackground: Object of class "numeric". A numeric value describing the value at which scores should be removed. Default: 0 (negative scores are removed)
noiseFilter: Object of class "character". Describes the noiseFilter method that will be applied to ChIP data (Zero, mean, median, sigmoid).
PWMThreshold: Object of class "numeric". Threshold at which PWM scores should be selected (only sites above the threshold will be selected; between 0 and 1).
strandRule: Object of class "character". "mean", "max" or "sum"; determines how strands should be handled when computing PWM scores. Default: "max"
whichstrand: Object of class "character". "+", "-" or "+-"; the strand on which PWM scores should be computed. Default: "+-"
lambdaPWM: Object of class "vector". A vector (or single value) containing values for lambdaPWM. Default: 1
naturalLog: Object of class "logical". A logical value describing if the natural log will be used to compute the PWM (if FALSE then log2 will be used). Default: TRUE
noOfSites: Object of class "nos". A positive integer describing the number of sites (in base pairs) that should be used from the PFM to compute the PWM. Default: 0 (the full width of the binding site will be used when set to 0)
PWMpseudocount: Object of class "numeric". A numeric value describing a pseudocount for PWM computation. Default: 1
paramTag: Object of class "character". Internal.
Class "genomicProfilesInternal", directly.
Class "parameterOptions", directly.
signature(.Object = "genomicProfiles")
: ...
signature(object = "genomicProfiles")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
genomicProfiles
parameterOptions
showClass("genomicProfiles")
"genomicProfilesInternal"
Non exported class. Represents the stripped down version of genomicProfiles.
Created Internally.
PWM: Object of class "matrix"
PFM: Object of class "matrix"
PFMFormat: Object of class "character"
BPFrequency: Object of class "vector"
minPWMScore: Object of class "vector"
maxPWMScore: Object of class "vector"
profiles: Object of class "GRList"
DNASequenceLength: Object of class "vector"
averageExpPWMScore: Object of class "vector"
ZeroBackground: Object of class "vector"
drop: Object of class "vector"
tags: Object of class "character"
signature(object = "genomicProfilesInternal", value = "numeric")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "GRList")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "DNAStringSet")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal", value = "matrix")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "matrix")
: ...
signature(object = "genomicProfilesInternal")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
showClass("genomicProfilesInternal")
getHighestFitnessSolutions
extracts the best solutions from a ChIPanalyser GA/evolve run.
getHighestFitnessSolutions(population,child=2,method="geometric")
population |
Population list as output by the |
child |
numeric describing the number of solutions to be extracted from the population list. |
method |
character string describing which scoring method should be used and selected from "geometric","ks","MSE","pearson","spearman","kendall", "recall","precesion","fscore","MCC","Accuracy" or "AUC". |
This function only serves as a way of extracting data from the population list. Ultimately, it is just a wrapper for some indexing.
Returns the index of the top "child" solutions.
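The kind of indexing involved can be sketched in base R. The fitness vector below is made up for illustration and the layout of the real population list is not assumed; the sketch also assumes a metric where higher is better, and an error-type metric such as MSE would presumably need the opposite ordering.

```r
# Hypothetical fitness scores for five individuals (higher is better here).
fitness <- c(ind1 = 0.42, ind2 = 0.87, ind3 = 0.15, ind4 = 0.91, ind5 = 0.66)
child <- 2

# Indices of the top `child` individuals, best first.
top <- order(fitness, decreasing = TRUE)[seq_len(child)]
names(fitness)[top]
```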
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
getTestingData
extracts selected regions from ChIPscore object to be used as testing set.
getTestingData(ChIPscore,loci = 1)
ChIPscore |
ChIPscore object as returned by |
loci |
numeric describing index of loci to be used as testing data. |
Returns ChIPscore object with the selected testing loci.
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
test <- processingChIP(chip,top)
test <- getTestingData(test, 1:2)
getTrainingData
extracts selected regions from ChIPScore object to be used as training set.
getTrainingData(ChIPscore,loci = 1)
ChIPscore |
ChIPscore object as returned by |
loci |
numeric describing index of loci to be used as training data. |
Returns ChIPscore object with the selected training loci.
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
test <- processingChIP(chip,top)
test <- getTrainingData(test, 1:2)
"GRList"
Virtual class to handle multiple data types for one slot (profiles)
A virtual Class: No objects may be created from it.
GRList-class
The purpose of this virtual class is to store data of two different formats in one slot: GRangesList and lists
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
showClass("GRList")
initialize
Methods for function initialize
signature(.Object = "ChIPScore")
Initialize ChIPScore
signature(.Object = "genomicProfiles")
Initialize genomicProfiles
signature(.Object = "parameterOptions")
Initialize parameterOptions
lambdaPWM slot in a parameterOptions object
Accessor Method for the lambdaPWM slot in a parameterOptions object
lambdaPWM(object)
object |
|
The model underlying ChIPanalyser internally infers two parameters: the number of bound molecules and lambda. Lambda represents a scaling factor for the Position Weight Matrix (PWM). It can be described as how well a TF discriminates between high affinity and very high affinity sites.
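The effect of lambda as a scaling factor can be pictured with a small base R sketch. It assumes a Boltzmann-style weight of the form exp(score / lambda), which is an illustrative assumption and not a copy of the package's internal computation; the scores and the siteWeight helper are hypothetical.

```r
# Hypothetical PWM scores for a weak, a strong and a very strong site.
pwmScore <- c(weak = 2, strong = 8, veryStrong = 10)

# Relative statistical weight of each site for a given lambda (illustrative only).
siteWeight <- function(score, lambda) {
  w <- exp(score / lambda)
  w / sum(w)
}

lowLambda  <- siteWeight(pwmScore, lambda = 1)
highLambda <- siteWeight(pwmScore, lambda = 4)

# Ratio between the very strong and the strong site under each lambda.
ratioLow  <- unname(lowLambda["veryStrong"] / lowLambda["strong"])
ratioHigh <- unname(highLambda["veryStrong"] / highLambda["strong"])
c(lambda1 = ratioLow, lambda4 = ratioHigh)
```

Under this assumption a smaller lambda separates the strong and very strong sites more sharply, while a larger lambda flattens the difference between them.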
Returns the value assigned to the lambdaPWM slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lambdaPWM=1)
# Accessing the value of lambdaPWM
lambdaPWM(GPP)
lambdaPWM
Methods for function lambdaPWM
lambdaPWM(object)
lambdaPWM slot in a parameterOptions object
Setter Method for the lambdaPWM slot in a parameterOptions object
lambdaPWM(object)<-value
object |
|
value |
|
The model underlying ChIPanalyser internally infers two parameters: the number of bound molecules and lambda. Lambda represents a scaling factor for the Position Weight Matrix (PWM). It can be described as how well a TF discriminates between high affinity and very high affinity sites.
Returns the value assigned to the lambdaPWM slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lambdaPWM=1)
# Setting new value for lambdaPWM
lambdaPWM(GPP) <- 2
lambdaPWM<-
Setter method for the lambdaPWM slot in parameterOptions
lambdaPWM(object)<-value
loci slot in a ChIPScore object
Accessor Method for the loci slot in a ChIPScore object
loci(object)
object |
|
When using the processingChIP function, a named GRanges with the loci of interest is stored in the resulting ChIPScore object. These loci either result from user input or are extracted from the ChIP profiles (see processingChIP and lociWidth). This function enables you to extract those loci from the ChIPScore object.
Returns the value assigned to the loci slot in a ChIPScore object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
chip <- processingChIP(chip,top)
loci(chip)
"loci"
Setter for loci of interest passed to or extracted from the ChIPScore object
A virtual Class: No objects may be created from it.
signature(object = "ChIPScore", value = "loci")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
showClass("loci")
loci
Accessor method for the loci slot in ChIPScore
loci(object)
Loci of interest passed to or extracted from the ChIPScore object
lociWidth slot in a parameterOptions object
Accessor Method for the lociWidth slot in a parameterOptions object
lociWidth(object)
object |
|
When using the processingChIP function, the provided ChIP
scores will be split into bins of a given size. lociWidth determines the size
of those bins. The default is 20 000 bp.
This means that, if no loci of interest are provided, the ChIP profiles will be
split into bins of 20 000 bp over the entire profile.
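The binning described above can be sketched in base R. tileRegion is a hypothetical helper written for this illustration and is not a ChIPanalyser function; in particular, truncating the last bin at the end of the region is an assumption about edge handling.

```r
# Hypothetical helper (not a ChIPanalyser function): tile [start, end] into bins
# of lociWidth base pairs, truncating the last bin at the end of the region.
tileRegion <- function(start, end, lociWidth = 20000) {
  binStart <- seq(start, end, by = lociWidth)
  binEnd <- pmin(binStart + lociWidth - 1, end)
  data.frame(start = binStart, end = binEnd)
}

bins <- tileRegion(1, 90000)
bins
```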
Returns the value assigned to the lociWidth slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lociWidth=20000)
# Accessing the value of lociWidth
lociWidth(GPP)
lociWidth
Accessor method for the lociWidth slot in parameterOptions
lociWidth(object)
Sets the width of regions when using the reduce argument and NOT providing your own loci to the processingChIP function.
lociWidth slot in a parameterOptions object
Setter Method for the lociWidth slot in a parameterOptions object
lociWidth(object)<-value
object |
|
value |
|
When using the processingChIP function, the provided ChIP
scores will be split into bins of a given size. lociWidth determines the size
of those bins. The default is 20 000 bp.
This means that, if no loci of interest are provided, the ChIP profiles will be
split into bins of 20 000 bp over the entire profile.
Returns the value assigned to the lociWidth slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lociWidth=20000)
# Setting new value for lociWidth
lociWidth(GPP) <- 30000
lociWidth<-
Setter method for the lociWidth slot in parameterOptions
lociWidth(object)<-value
maxPWMScore slot in a genomicProfiles object.
Accessor function for the maxPWMScore slot in a genomicProfiles object.
maxPWMScore(object)
object |
|
maxPWMScore is a numerical value that can be described as the highest PWM score computed at a genome-wide scale. This value is computed and updated in the genomicProfiles object after using the computeGenomeWideScores function.
Returns the value assigned to the maxPWMScore slot in a genomicProfiles object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if(!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet, genomicProfiles = GPP)
maxPWMScore(GenomeWide)
## If used before computeGenomeWideScores, will return NULL
maxPWMScore
Accessor method for maxPWMScore
maxPWMScore(object)
maxSignal slot in a parameterOptions object.
Accessor method for the maxSignal slot in a parameterOptions object.
maxSignal(object)
object |
|
In the context of ChIPanalyser, maxSignal represents the maximum normalised ChIP-seq signal of a given transcription factor (or DNA binding protein). Although a default value of 1 has been assigned to this slot, we strongly recommend tailoring this value to your data. We strongly encourage setting values when building a parameterOptions object.
Returns the value assigned to the maxSignal slot in a parameterOptions object.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing the value of maxSignal
maxSignal(OPP)
maxSignal
Accessor method for maxSignal
maxSignal(object)
Maximum ChIP signal extracted from ChIP data (see processingChIP)
maxSignal slot in a parameterOptions object.
Setter method for the maxSignal slot in a parameterOptions object.
maxSignal(object) <- value
object |
|
value |
|
In the context of ChIPanalyser, maxSignal represents the maximum normalised ChIP-seq signal of a given transcription factor (or DNA binding protein). Although a default value of 1 has been assigned to this slot, we strongly recommend tailoring this value to your data. We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for maxSignal.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for maxSignal
maxSignal(OPP) <- 1.8
maxSignal<-
Setter method for maxSignal
maxSignal(object)<-value
Maximum ChIP signal extracted from ChIP data (see processingChIP)
minPWMScore slot in a genomicProfiles object
Accessor method for the minPWMScore slot in a genomicProfiles object
minPWMScore(object)
object |
|
minPWMScore can be described as the lowest PWM score computed at a genome-wide scale. Although it is possible to assign a value to minPWMScore, we strongly advise using the value computed and assigned internally. This value is computed in the computeGenomeWideScores function.
Returns the value assigned to the minPWMScore slot in a genomicProfiles object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if(!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet, genomicProfiles = GPP)
minPWMScore(GenomeWide)
## If used before computeGenomeWideScores, will return NULL
minPWMScore
Accessor for minPWMScore
minPWMScore(object)
Minimum PWM score computed during the computeGenomeWideScores step.
naturalLog slot in a parameterOptions object.
Accessor method for the naturalLog slot in a parameterOptions object.
naturalLog(object)
object |
|
During the computation of a Position Weight Matrix, the
Position Probability Matrix (derived from a Position Frequency Matrix)
is log transformed. This parameter determines which log transform will be used.
If TRUE, the natural log (ln) will be used. If FALSE, log2 will be used.
We strongly encourage setting values when building a parameterOptions object.
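The choice of log transform can be illustrated with a toy PFM in base R. This is a minimal sketch of a generic log-odds PWM, assuming a pseudocount spread over a uniform background; the toPWM helper and the toy matrix are hypothetical, and the exact formula used inside ChIPanalyser may differ.

```r
# Toy Position Frequency Matrix: rows are A, C, G, T; columns are motif positions.
PFM <- matrix(c(8, 1, 0,
                1, 7, 1,
                0, 1, 9,
                1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
pseudocount <- 1

# Hypothetical helper: PFM -> Position Probability Matrix -> log-odds PWM.
toPWM <- function(pfm, bg, pseudo, naturalLog = TRUE) {
  ppm <- sweep(pfm + pseudo * bg, 2, colSums(pfm) + pseudo, "/")
  logFun <- if (naturalLog) log else log2
  logFun(ppm / bg)
}

pwmLn   <- toPWM(PFM, background, pseudocount, naturalLog = TRUE)
pwmLog2 <- toPWM(PFM, background, pseudocount, naturalLog = FALSE)
```

In this sketch the two matrices differ only by the constant factor log(2), so the choice changes the scale of the PWM scores rather than the ranking of sites.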
Returns the value assigned to the naturalLog slot in a parameterOptions object.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(naturalLog=TRUE)
# Accessing the value of naturalLog
naturalLog(GPP)
naturalLog
Accessor method for the naturalLog slot in a parameterOptions object.
naturalLog(object)
naturalLog slot in a parameterOptions object.
Setter method for the naturalLog slot in a parameterOptions object.
naturalLog(object)<- value
object |
|
value |
|
During the computation of a Position Weight Matrix, the
Position Probability Matrix (derived from a Position Frequency Matrix)
is log transformed. This parameter determines which log transform will be used.
If TRUE, the natural log (ln) will be used. If FALSE, log2 will be used.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for the naturalLog slot.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
OPP <- parameterOptions(naturalLog=TRUE)
# Setting new value for naturalLog
naturalLog(OPP) <- FALSE
naturalLog<-
Setter method for the naturalLog slot in a parameterOptions object.
naturalLog(object) <- value
noiseFilter slot in a parameterOptions object
Accessor method for the noiseFilter slot in a parameterOptions object
noiseFilter(object)
object |
|
Noise filtering method that should be used on ChIP-seq data. Four methods are available: Zero, Mean, Median and Sigmoid. Zero removes all ChIP-seq scores below zero, Mean removes scores below the mean score, Median removes scores below the median score, and Sigmoid assigns a weight to each score based on a logistic regression curve. The midpoint is set at the 95% quantile of ChIP-seq scores. Scores below the midpoint receive a weight between 0 and 1; scores above it receive a weight between 1 and 2.
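The four filters described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the function name noise_filter_sketch, the steepness argument, the choice to set removed scores to zero, and the exact logistic formula are all hypothetical.

```r
# Illustrative noise filter (assumed behaviour, not ChIPanalyser's own code)
noise_filter_sketch <- function(scores,
                                method = c("zero", "mean", "median", "sigmoid"),
                                steepness = 1) {
  method <- match.arg(method)
  if (method == "sigmoid") {
    # Midpoint at the 95% quantile; weights lie in (0, 2) and equal 1 at the midpoint
    midpoint <- quantile(scores, 0.95, names = FALSE)
    return(2 / (1 + exp(-steepness * (scores - midpoint))))
  }
  # Threshold filters: scores below the cutoff are set to zero (an assumption)
  cutoff <- switch(method,
                   zero = 0,
                   mean = mean(scores),
                   median = median(scores))
  scores[scores < cutoff] <- 0
  scores
}

scores <- c(-2, 0, 1, 3, 5, 40)
filtered_zero   <- noise_filter_sketch(scores, "zero")
filtered_mean   <- noise_filter_sketch(scores, "mean")
filtered_median <- noise_filter_sketch(scores, "median")
weights_sigmoid <- noise_filter_sketch(scores, "sigmoid")
```

For the sigmoid method the sketch returns the weights themselves; how those weights are then combined with the scores is left open here.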
Returns the value assigned to the noiseFilter slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noiseFilter="sigmoid")
# Accessing the value of noiseFilter
noiseFilter(GPP)
noiseFilter
Accessor method for noiseFilter
noiseFilter(object)
Noise filter that will be applied to ChIP scores
noiseFilter slot in a parameterOptions object
Setter method for the noiseFilter slot in a parameterOptions object
noiseFilter(object) <- value
object |
|
value |
|
Noise filtering method that should be used on ChIP-seq data. Four methods are available: Zero, Mean, Median and Sigmoid. Zero removes all ChIP-seq scores below zero, Mean removes scores below the mean score, Median removes scores below the median score, and Sigmoid assigns a weight to each score based on a logistic regression curve. The midpoint is set at the 95% quantile of ChIP-seq scores. Scores below the midpoint receive a weight between 0 and 1; scores above it receive a weight between 1 and 2.
Returns a parameterOptions object with an updated value for the noiseFilter slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noiseFilter="sigmoid")
# Setting new value for noiseFilter
noiseFilter(GPP) <- "zero"
noiseFilter<-
Setter method for noiseFilter
noiseFilter(object) <- value
Noise filter that will be applied to ChIP scores
noOfSites slot in a parameterOptions object
Accessor method for the noOfSites slot in a parameterOptions object
noOfSites(object)
object |
|
While computing Position Weight Matrices (PWM) from Position Frequency Matrices (PFM), it is possible to restrict the number of sites that will be used to compute the PWM. The default is "all", in which case all sites will be used to compute the PWM.
Returns the value assigned to the noOfSites slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noOfSites="all")
# Accessing the value of noOfSites
noOfSites(GPP)
noOfSites
Methods for function noOfSites
signature(object = "parameterOptions")
noOfSites slot in a parameterOptions object.
Setter method for the noOfSites slot in a parameterOptions object.
noOfSites(object) <- value
object |
|
value |
|
While computing Position Weight Matrices (PWM) from Position Frequency Matrices (PFM), it is possible to restrict the number of sites that will be used to compute the PWM. The default is "all", in which case all sites will be used to compute the PWM.
Returns a parameterOptions object with an updated value for the noOfSites slot.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noOfSites=0)
# Setting new value for noOfSites
noOfSites(GPP) <- 8
noOfSites<-
Setter method for noOfSites
noOfSites(object) <- "all"
noOfSites(object) <- value
"nos"
Virtual class to handle Number of Sites
A virtual Class: No objects may be created from it.
No methods defined with class "nos" in the signature.
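A virtual class of this kind is commonly built as an S4 class union, so that one slot can hold either a keyword such as "all" or a number. The sketch below shows the general pattern with hypothetical names (nosSketch, optsSketch, sitesSketch); it is an assumption about the mechanism and does not reproduce ChIPanalyser's own class definitions.

```r
library(methods)

# Hypothetical virtual class: a union of numeric and character values
setClassUnion("nosSketch", c("numeric", "character"))

# Hypothetical options class with a single slot typed by the union
setClass("optsSketch",
         representation(noOfSites = "nosSketch"),
         prototype(noOfSites = "all"))

# Accessor generic and method
setGeneric("sitesSketch", function(object) standardGeneric("sitesSketch"))
setMethod("sitesSketch", "optsSketch", function(object) object@noOfSites)

# Setter generic and replacement method
setGeneric("sitesSketch<-",
           function(object, value) standardGeneric("sitesSketch<-"))
setReplaceMethod("sitesSketch", "optsSketch", function(object, value) {
  object@noOfSites <- value
  validObject(object)  # checks that the new value fits the slot type
  object
})

opts <- new("optsSketch")
default_sites <- sitesSketch(opts)
sitesSketch(opts) <- 8
updated_sites <- sitesSketch(opts)
```

Because the union is virtual, calling new("nosSketch") fails, which matches the statement that no objects may be created from such a class.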
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
showClass("nos")
parameterOptions is an object used to store and parse the various parameters needed throughout this analysis pipeline.
parameterOptions(ploidy = 2, boundMolecules = 1000, backgroundSignal = 0, maxSignal = 1, lociWidth = 20000, chipMean = 200, chipSd = 200, chipSmooth = 250, stepSize = 10, removeBackground = 0, noiseFilter = "zero", naturalLog = TRUE, noOfSites = "all", PWMThreshold = 0.7, strandRule = "max", whichstrand = "+-", PWMpseudocount = 1, lambdaPWM = 1)
ploidy |
|
boundMolecules |
|
backgroundSignal |
|
maxSignal |
|
lociWidth |
|
chipMean |
|
chipSd |
|
chipSmooth |
|
stepSize |
|
removeBackground |
|
noiseFilter |
|
naturalLog |
|
noOfSites |
|
PWMThreshold |
|
strandRule |
|
whichstrand |
|
PWMpseudocount |
|
lambdaPWM |
A vector (or single value) containing values for the ScalingFactorPWM (also known as lambda). Default: 1 |
ChIPanalyser requires a lot of parameters. parameterOptions was created with the intent of storing and parsing these numerous arguments to the different functions. All parameters in this object are optional, although strongly recommended. Some parameters are extracted and updated by functions along the pipeline, e.g. maxSignal and backgroundSignal are extracted during the processingChIP step. These parameters will be automatically parsed. If you do not wish to use them (or any other parameter), simply parse a new parameterOptions object with your desired parameters.
Returns a parameterOptions object with updated values.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# parameterOptions(ploidy = 2, boundMolecules = 1000, backgroundSignal = 0, maxSignal = 1, lociWidth = 20000, chipMean = 200, chipSd = 200, chipSmooth = 250, stepSize = 10, removeBackground = 0, noiseFilter = "zero", naturalLog = TRUE, noOfSites = "all", PWMThreshold = 0.7, strandRule = "max", whichstrand = "+-", PWMpseudocount = 1, lambdaPWM = 1)
"parameterOptions"
parameterOptions is an object used to store and parse the various parameters needed throughout this analysis pipeline.
Objects can be created by calls of the form parameterOptions(ploidy, boundMolecules, backgroundSignal, maxSignal, lociWidth, chipMean, chipSd, chipSmooth, stepSize, noiseFilter, removeBackground, lambdaPWM, PWMpseudocount, naturalLog, noOfSites, PWMThreshold, strandRule, whichstrand).
ploidy: Object of class "numeric": A numeric value describing the ploidy of the organism. Default: 2
boundMolecules: Object of class "vector": A vector (or single value) containing the number of bound molecules (bound Transcription Factors). Default: 1000
backgroundSignal: Object of class "numeric": A numeric value describing the ChIP-seq background signal (average signal from real ChIP-seq data). Default: 0
maxSignal: Object of class "numeric": A numeric value describing the highest ChIP-seq signal (from real ChIP-seq data). Default: 1
lociWidth: Object of class "numeric": A numeric value describing the bin size used when splitting ChIP-seq scores. Default: 20000
chipMean: Object of class "numeric": A numeric value describing the mean width of a ChIP-seq peak. Default: 200
chipSd: Object of class "numeric": A numeric value describing the standard deviation of ChIP-seq peaks. Default: 200
chipSmooth: Object of class "vector": A numeric value describing the width of the window used to smooth occupancy profiles into ChIP profiles. Default: 250
stepSize: Object of class "numeric": A numeric value describing the step size (in base pairs) between each ChIP-seq score. Default: 10 (scored every 10 base pairs)
removeBackground: Object of class "numeric": A numeric value describing the value below which scores should be removed. Default: 0 (negative scores are removed)
noiseFilter: Object of class "character": Describes the noiseFilter method applied to ChIP scores
PWMThreshold: Object of class "numeric": Threshold at which PWM scores should be selected (only sites above the threshold will be selected; between 0 and 1)
strandRule: Object of class "character": "mean", "max" or "sum" will determine how strands should be handled when computing PWM scores. Default: "max"
whichstrand: Object of class "character": "+", "-" or "+-", the strand(s) on which PWM scores should be computed. Default: "+-"
lambdaPWM: Object of class "vector": A vector (or single value) containing values for lambdaPWM. Default: 1
naturalLog: Object of class "logical": A logical value describing whether the natural log will be used to compute the PWM (if FALSE then log2 will be used). Default: TRUE
noOfSites: Object of class "nos": A positive integer describing the number of sites (in base pairs) that should be used from the PFM to compute the PWM. Default: "all" (the full width of the binding site will be used; setting 0 has the same effect)
PWMpseudocount: Object of class "numeric": A numeric value describing a pseudocount for PWM computation. Default: 1
paramTag: Object of class "character": Internal
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "character")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "vector")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "vector")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "vector")
: ...
signature(.Object = "parameterOptions")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "vector")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "logical")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "character")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "character")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "vector")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "numeric")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "character")
: ...
signature(object = "parameterOptions")
: ...
signature(object = "parameterOptions", value = "character")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
showClass("parameterOptions")
PFMFormat slot in a genomicProfiles object
Accessor method for the PFMFormat slot in a genomicProfiles object
PFMFormat(object)
object |
|
If loading a PositionFrequencyMatrix from a file, the format of the file should be specified. Default is raw. Please keep in mind that this argument is used when parsing the PositionFrequencyMatrix file. If this argument is changed after building the genomicProfiles object with a PositionFrequencyMatrix file, this will not influence the parsing of the file.
PFMFormat can be one of the following: "raw", "transfac", "JASPAR" or "sequences"
Returns the value assigned to the PFMFormat slot in a genomicProfiles object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# Building data objects
#### THIS IS THE PREFERRED METHOD FOR SETTING PFMFormat
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Accessing the value of PFMFormat
PFMFormat(GPP)
PFMFormat
Accessor method for the PFMFormat slot in a genomicProfiles object
PFMFormat(object)
PFMFormat slot in a genomicProfiles object
Setter method for the PFMFormat slot in a genomicProfiles object
PFMFormat(object) <- value
object |
|
value |
|
If loading a PositionFrequencyMatrix from a file, the format of the file should be specified. Default is JASPAR. Please keep in mind that this argument is used when parsing the PositionFrequencyMatrix file. If this argument is changed after building the genomicProfiles object with a PositionFrequencyMatrix file, this will not influence the parsing of the file.
Returns a genomicProfiles object with an updated value for the PFMFormat slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# Building data objects
#### THIS IS THE PREFERRED METHOD FOR SETTING PFMFormat
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Setting new value for PFMFormat
PFMFormat(GPP) <- "JASPAR"
PFMFormat<-
Setter method for the PFMFormat slot in a genomicProfiles object
PFMFormat(object) <- value
ploidy slot in a parameterOptions object
Accessor method for the ploidy slot in a parameterOptions object
ploidy(object)
object |
|
The default value for ploidy is 2. ChIPanalyser is based on a model that considers the ploidy of the organism of interest; however, it only considers simple polyploidy (or haploidy). The model does not consider hybrids such as wheat.
Returns the value assigned to the ploidy slot in a parameterOptions object
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing the value of ploidy
ploidy(OPP)
ploidy
Accessor method for the ploidy slot in a parameterOptions object
ploidy(object)
ploidy slot in a parameterOptions object
Setter method for the ploidy slot in a parameterOptions object
ploidy(object) <- value
object |
|
value |
|
The default value for ploidy is 2. ChIPanalyser is based on a model that considers the ploidy of the organism; however, it only considers simple polyploidy (or haploidy). The model does not consider hybrids such as wheat.
Returns a parameterOptions object with an updated value for the ploidy slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for ploidy
ploidy(OPP) <- 2
ploidy<-
Setter method for the ploidy slot in a parameterOptions object
ploidy(object) <- value
plotOccupancyProfile plots the predicted profiles. If provided, this function will also plot ChIP-seq profiles, PWMScores (or Occupancy), chromatin states, Goodness of Fit estimates and gene information.
plotOccupancyProfile(predictedProfile, ChIPScore = NULL, chromatinState = NULL, occupancy = NULL, goodnessOfFit = NULL, PWM = FALSE, geneRef = NULL, addLegend = TRUE, ...)
predictedProfile |
|
ChIPScore |
|
chromatinState |
|
occupancy |
|
goodnessOfFit |
|
PWM |
|
geneRef |
|
addLegend |
|
... |
Any other graphical Parameter of the following : cex, cex.lab, cex.main, densityCS , densityGR , ylab, xlab, main, colPred, colChIP, colOccup, colCS, colGR, n_axis_ticks. See details. |
Once the predicted ChIP-seq-like profiles have been computed, it is possible to plot these profiles. This function allows control over graphical parameters. In short:
* col = color values - exact number of colors or colors that will be used in a colorRampPalette.
* cex = font sizes - for text, axis labels and main
* density = fill density for chromatin state and/or geneRef blocks
Pred = predictedProfile; ChIP = ChIP score (experimental ChIP data); CS = chromatin states; GR = gene reference; Occup = occupancy locations
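As a small illustration of the col convention, the base R function colorRampPalette (from grDevices) expands a few anchor colors into as many colors as needed. The variable names and the number of states below are hypothetical; how plotOccupancyProfile expands the colors internally is not shown here.

```r
# Two anchor colors, as could be passed to a col* argument such as colCS
anchor_cols <- c("red", "blue")

# Expand the anchors into five interpolated colors,
# e.g. for five hypothetical chromatin states
state_cols <- colorRampPalette(anchor_cols)(5)
print(state_cols)
```

Supplying the exact number of colors avoids interpolation and gives full control over which color each element receives.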
Returns a profile plot with "Occupancy" on the y axis and DNA position on the x axis.
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if(!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR", BPFrequency=DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet, genomicProfiles = GPP)
# Compute PWM scores
PWMScores <- computePWMScore(DNASequenceSet = DNASequenceSet, genomicProfiles = GenomeWide, loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Compute ChIP profiles
chipProfile <- computeChipProfile(loci = top, genomicProfiles = Occupancy)
# Plotting profile
plotOccupancyProfile(predictedProfile=chipProfile, ChIPScore = chip, chromatinState = Access, occupancy = Occupancy, geneRef = geneRef)
# Plotting profile with custom graphical parameters
plotOccupancyProfile(predictedProfile=chipProfile, ChIPScore = chip, chromatinState = Access, occupancy = Occupancy, geneRef = geneRef, colCS = c("red","blue"), densityGR = 60)
plotOptimalHeatMaps will plot heat maps of optimal parameters and highlight the optimal combination of lambdaPWM and boundMolecules
plotOptimalHeatMaps(optimalParam, contour=TRUE, col=NULL, main=NULL, layout=TRUE, overlay=FALSE)
optimalParam |
|
contour |
|
col |
|
main |
|
layout |
|
overlay |
|
Once the optimal set of parameters (lambdaPWM and boundMolecules) has been computed, it is possible to plot the results in the form of a heat map. Each heat map will be plotted on a separate page if layout = TRUE. If layout = FALSE, it is up to the user to define how they wish to lay out their heat maps.
Returns a heat map of optimal combinations of lambdaPWM and boundMolecules. The x axis represents the different values assigned to lambda (lambdaPWM) and the y axis represents the different values assigned to boundMolecules.
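The idea of locating the optimal combination on such a grid can be sketched in base R. The parameter values and goodness-of-fit scores below are invented for illustration, and the layout (rows for boundMolecules, columns for lambdaPWM) is an assumption; this is not the structure returned by computeOptimal.

```r
# Hypothetical parameter grid (values chosen for illustration only)
lambdaPWM <- c(0.5, 1, 2, 4)
boundMolecules <- c(10, 100, 1000)

# Hypothetical goodness-of-fit scores: rows = boundMolecules, columns = lambdaPWM
fit <- matrix(c(0.10, 0.30, 0.25, 0.05,
                0.20, 0.55, 0.40, 0.15,
                0.15, 0.45, 0.80, 0.35),
              nrow = length(boundMolecules), byrow = TRUE,
              dimnames = list(boundMolecules, lambdaPWM))

# Locate the best-scoring cell (first one if there are ties)
best <- which(fit == max(fit), arr.ind = TRUE)[1, ]
optimal <- c(lambdaPWM = lambdaPWM[best["col"]],
             boundMolecules = boundMolecules[best["row"]])

# Draw the heat map only in an interactive session:
# lambda on the x axis, boundMolecules on the y axis
if (interactive()) {
  image(x = seq_along(lambdaPWM), y = seq_along(boundMolecules), z = t(fit),
        xlab = "lambdaPWM", ylab = "boundMolecules", axes = FALSE)
  axis(1, at = seq_along(lambdaPWM), labels = lambdaPWM)
  axis(2, at = seq_along(boundMolecules), labels = boundMolecules)
  points(best["col"], best["row"], pch = 4)  # mark the optimum
}
```

Here a higher score is treated as better; for error-type metrics the call to max would be replaced by min.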
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if(!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR",BPFrequency=DNASequenceSet)
# Computing optimal set of parameters
optimalParam <- computeOptimal(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet, ChIPScore = chip, chromatinState = Access, parameterOptions = OPP, parameter = "all", peakMethod="moving_kernel")
plotOptimalHeatMaps(optimalParam)
PFM slot in a genomicProfiles object
Accessor method for the PFM slot in a genomicProfiles object
PositionFrequencyMatrix(object)
object |
|
After creating a genomicProfiles object, it is possible to access the Position Frequency Matrix slot. However, this slot will be empty if the genomicProfiles object was built directly from a Position Weight Matrix. See genomicProfiles.
Returns the Position Frequency Matrix (PFM slot) used to compute the PositionWeightMatrix in a genomicProfiles object
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Accessing slot
PositionFrequencyMatrix(GPP)
PositionFrequencyMatrix
Accessor method for the PFM slot in a genomicProfiles object
PositionFrequencyMatrix(object)
PFM slot in a genomicProfiles object
Setter method for the PFM slot in a genomicProfiles object
PositionFrequencyMatrix(object) <- value
object |
|
value |
|
The Position Frequency Matrix is one of the fundamental objects that needs to be supplied to a genomicProfiles object. If, after building a genomicProfiles object, only the Position Frequency Matrix needs to be modified, it is possible to manually update this matrix using the function above. There are two options for the type of data that may be supplied to the PFM slot: a matrix in the form of a Position Frequency Matrix (a matrix with four rows, one for each base (ACTG), and a number of columns equal to the number of sites in the binding site), or (recommended) a path to the file containing the Position Frequency Matrix. This Position Frequency Matrix file may come in multiple formats such as RAW, Transfac or JASPAR.
WARNING: if a genomicProfiles object has already been created and only the PFM is supplied/updated, then the Position Weight Matrix will automatically be updated as well.
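The matrix form described above can be sketched in base R. The counts are made up for illustration and the helper is_valid_pfm is hypothetical; it only checks the shape described in the text and is not a check performed by ChIPanalyser.

```r
# Hypothetical Position Frequency Matrix: four rows (A, C, T, G),
# one column per site in the binding site
pfm <- matrix(c(12,  0,  3,  1,
                 1, 14,  2,  0,
                 2,  1,  0, 15,
                 1,  1, 11,  0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "T", "G"), NULL))

# Illustrative shape check (assumed, not part of the package)
is_valid_pfm <- function(m) {
  is.matrix(m) && is.numeric(m) && nrow(m) == 4 &&
    setequal(rownames(m), c("A", "C", "T", "G")) && all(m >= 0)
}

pfm_ok <- is_valid_pfm(pfm)            # full matrix passes the check
pfm_bad <- is_valid_pfm(pfm[1:3, ])    # a matrix missing a base does not
```

A matrix of this shape could then be assigned with PositionFrequencyMatrix(object) <- pfm, although supplying a file path together with PFMFormat remains the recommended route.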
Returns a genomicProfiles object with an updated PFM slot (as described above this will lead to an updated PositionWeightMatrix).
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# Building genomicProfiles object
# NOT ADVISED!!!! PLEASE PARSE PFM AND PFMFormat together
GPP <- genomicProfiles(PFMFormat = "JASPAR")
# Setting PFM
PositionFrequencyMatrix(GPP) <- PFM
PositionFrequencyMatrix<-
Setter method for the PFM slot in a genomicProfiles object
PositionFrequencyMatrix(object) <- "path/to/file/"
PositionFrequencyMatrix(object) <- value
PWM slot in a genomicProfiles object
Accessor method for the PWM slot in a genomicProfiles object
PositionWeightMatrix(object)
object |
|
After creating a genomicProfiles object, it is possible to access the Position Weight Matrix stored in this slot. This slot should never be empty: it is either supplied by the user or computed directly from a Position Frequency Matrix when one is supplied.
Returns a matrix in the form of a Position Weight Matrix
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR")
# Accessing slot
PositionWeightMatrix(GPP)
PositionWeightMatrix
Accessor method for the PWM slot in a genomicProfiles object
PositionWeightMatrix(object)
Setter method for the PositionWeightMatrix slot in a genomicProfiles object
PositionWeightMatrix(object) <- value
object |
|
value |
|
If a Position Weight Matrix is readily available, it can be assigned directly to the PWM slot. This is only possible once a genomicProfiles object has been created, so create that object first. Note that when a Position Frequency Matrix is supplied, the Position Weight Matrix is computed from it automatically; if no Position Frequency Matrix is available, a Position Weight Matrix can be assigned directly to this slot.
Returns a genomicProfiles object with an updated value for the PWM slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building genomicProfiles object
GPP <- genomicProfiles()
# Setting PWM to PositionWeightMatrix slot
PWM <- matrix(runif(32, -10, 20), ncol = 8)
rownames(PWM) <- c("A", "C", "T", "G")
PositionWeightMatrix(GPP) <- PWM
PositionWeightMatrix<-
Setter method for the PositionWeightMatrix slot in a genomicProfiles object
PositionWeightMatrix(object) <- value
processingChIP will process and extract ChIP scores at a set of loci of interest.
processingChIP(profile, loci = NULL, reduce = NULL, peaks = NULL, chromatinState = NULL, parameterOptions = NULL, cores = 1)
profile |
|
loci |
|
reduce |
|
parameterOptions |
|
peaks |
|
chromatinState |
|
cores |
|
When using computeOptimal, real ChIP data must be supplied as a point of comparison. The correlation and MSE scores are computed from how well the model fits the biological data. processingChIP extracts this data from ChIP data at the loci of interest. When the reduce option is used, only the top regions, ranked by peak height or mean ChIP score, are selected. processingChIP also extracts maxSignal and backgroundSignal from the ChIP data and passes them to a parameterOptions object.
Returns a ChIPScore object containing the extracted (and normalised) ChIP scores, the loci of interest and the newly extracted parameters (e.g. maxSignal).
Patrick C.N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Data extraction
data(ChIPanalyserData)
# Extracting ChIP scores at loci of interest
ChIP <- processingChIP(profile = chip, loci = top)
profileAccuracyEstimate will compare the predicted ChIP-seq-like profile to real ChIP-seq data and return a set of metrics describing how accurate the predicted model is compared to real data.
profileAccuracyEstimate(genomicProfiles, ChIPScore, parameterOptions = NULL, method = "all", cores = 1)
genomicProfiles |
|
ChIPScore |
|
parameterOptions |
|
method |
|
cores |
|
To assess the quality of the model against experimental ChIP-seq data, ChIPanalyser offers a range of goodness-of-fit methods to choose from. These methods are also used when computing optimal parameters.
Returns a list of goodness-of-fit metrics for each locus and each parameter combination selected.
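As a rough illustration of what such goodness-of-fit metrics measure, the sketch below compares a predicted profile with an observed one in base R. The toy vectors and the helper name fitMetrics are hypothetical, and the code does not call profileAccuracyEstimate; it only computes an MSE and two correlation scores of the kind the package reports.

```r
# Hypothetical helper: a few goodness-of-fit metrics between two profiles.
fitMetrics <- function(predicted, observed) {
  stopifnot(length(predicted) == length(observed))
  c(
    MSE      = mean((predicted - observed)^2),
    pearson  = cor(predicted, observed, method = "pearson"),
    spearman = cor(predicted, observed, method = "spearman")
  )
}

# Toy data standing in for a predicted and a real ChIP-seq profile
observed  <- c(0.1, 0.4, 0.9, 0.5, 0.2)
predicted <- c(0.2, 0.5, 0.8, 0.4, 0.1)

metrics <- fitMetrics(predicted, observed)
print(metrics)
```

A lower MSE and a correlation closer to 1 both indicate a closer match between prediction and data, which is why the two families of metrics can disagree on which parameter combination is best.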
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Default parameter options (assumed here; adjust as needed)
OPP <- parameterOptions()
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScore(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet)
# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide, DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
# Estimating accuracy (argument name follows the usage line above)
AccuracyEstimate <- profileAccuracyEstimate(genomicProfiles = chipProfile, ChIPScore = chip, parameterOptions = OPP)
profiles
Accessor method for profiles in a genomicProfiles object
profiles(object)
Computed PWM scores, occupancy or ChIP-seq-like profiles for the loci and parameter combinations of interest.
Accessor method for the PWMpseudocount slot in a parameterOptions object
PWMpseudocount(object)
object |
|
In the context of Position Weight Matrices, the pseudocount is used to avoid zero probabilities when a Position Frequency Matrix is transformed into a Position Probability Matrix and finally into a Position Weight Matrix. It is essentially a correction for small sample sizes. A base pair that receives a pseudocount does not noticeably influence the model, and the correction prevents mathematical issues such as infinities or division by zero. The default is 1.
Returns the value assigned to the PWMpseudocount slot in a parameterOptions object.
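The effect of the pseudocount can be seen in a small base R sketch. The formula below is a common textbook convention (the pseudocount is spread over an assumed uniform background) and is not taken from the ChIPanalyser source; the toy matrix and the helper name pfmToPwm are hypothetical.

```r
# Toy Position Frequency Matrix: rows are bases, columns are motif positions.
PFM <- matrix(c(8, 0, 1,
                1, 9, 0,
                0, 0, 8,
                1, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed uniform

# Hypothetical helper (not the package's implementation)
pfmToPwm <- function(pfm, pseudocount, bg) {
  n <- colSums(pfm)
  # Position Probability Matrix, pseudocount spread over the background
  ppm <- sweep(pfm + pseudocount * bg, 2, n + pseudocount, "/")
  # Position Weight Matrix as a log-odds score against the background
  log(ppm / bg)
}

pwm0 <- pfmToPwm(PFM, pseudocount = 0, bg = background)
pwm1 <- pfmToPwm(PFM, pseudocount = 1, bg = background)

any(is.infinite(pwm0))  # zero counts give log(0) = -Inf
all(is.finite(pwm1))    # the pseudocount keeps every entry finite
```

With a pseudocount of 0, any base never observed at a position produces an infinite score, which is the situation the default of 1 avoids.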
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMpseudocount = 0)
# Accessing slot value
PWMpseudocount(GPP)
PWMpseudocount
Accessor method for the PWMpseudocount slot in a parameterOptions object
PWMpseudocount(object)
Setter method for the PWMpseudocount slot in a parameterOptions object
PWMpseudocount(object) <- value
object |
|
value |
|
In the context of Position Weight Matrices, the pseudocount is used to avoid zero probabilities when a Position Frequency Matrix is transformed into a Position Probability Matrix and finally into a Position Weight Matrix. It is essentially a correction for small sample sizes. A base pair that receives a pseudocount does not noticeably influence the model, and the correction prevents mathematical issues such as infinities or division by zero.
Returns a parameterOptions object with an updated value for the PWMpseudocount slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMpseudocount = 0)
# Setting new value for PWMpseudocount
PWMpseudocount(GPP) <- 1
PWMpseudocount<-
Setter method for the PWMpseudocount slot in a parameterOptions object
PWMpseudocount(object) <- value
Accessor method for the PWMThreshold slot in a parameterOptions object
PWMThreshold(object)
object |
|
The computePWMScore function requires a PWM threshold: the cut-off above which PWM scores are selected. PWMThreshold is a numeric value between 0 and 1. If set to 0, all sites are selected. If set to 0.7 (the default), 70% of PWM scores (and, by extension, binding sites) are IGNORED and only the top 30% are selected.
Returns the value assigned to the PWMThreshold slot in a parameterOptions object.
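One way to read this threshold is as a quantile of the score distribution. The base R sketch below works under that assumption, using simulated scores rather than real PWM scores; how computePWMScore applies the threshold internally is not shown here.

```r
set.seed(1)
pwmScores <- rnorm(1000)   # toy PWM scores, not real data
PWMThreshold <- 0.7

# Assumption: the threshold is interpreted as a quantile of the scores
cutoff <- quantile(pwmScores, probs = PWMThreshold, names = FALSE)
selected <- pwmScores[pwmScores >= cutoff]

# Fraction of sites kept after thresholding
length(selected) / length(pwmScores)

# A threshold of 0 corresponds to the minimum score, so every site is kept
allKept <- all(pwmScores >= quantile(pwmScores, probs = 0, names = FALSE))
```

Raising the threshold shrinks the set of candidate sites, which reduces computation at the cost of discarding weaker binding sites.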
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMThreshold = 0.7)
# Accessing value for PWMThreshold
PWMThreshold(GPP)
PWMThreshold
Accessor method for the PWMThreshold slot in a parameterOptions object
PWMThreshold(object)
Setter method for the PWMThreshold slot in a parameterOptions object
PWMThreshold(object) <- value
object |
|
value |
|
The computePWMScore function requires a PWM threshold: the cut-off above which PWM scores are selected. PWMThreshold is a numeric value between 0 and 1. If set to 0, all sites are selected. If set to 0.7 (the default), 70% of PWM scores (and, by extension, binding sites) are IGNORED and only the top 30% are selected.
Returns a parameterOptions object with an updated value for the PWMThreshold slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMThreshold = 0.7)
# Setting new value for PWMThreshold
PWMThreshold(GPP) <- 0.8
PWMThreshold<-
Setter method for the PWMThreshold slot in a parameterOptions object
PWMThreshold(object) <- value
Accessor method for the removeBackground slot in a parameterOptions object
removeBackground(object)
object |
|
A numeric value describing the threshold at which occupancy signals are removed (the default is 0). The removal takes place when occupancy is computed (see the computeOccupancy function).
Returns the value assigned to the removeBackground slot in a parameterOptions object.
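A minimal base R sketch of such a background filter is given below. It assumes that "removed" means occupancy values below the threshold are set to zero; the package may treat these values differently (for example by dropping them), and the toy vector is hypothetical.

```r
occupancy <- c(0.02, 0.30, 0.08, 0.75, 0.00)  # toy occupancy values
removeBackground <- 0.1                        # threshold

# Assumption: signals below the threshold are zeroed rather than dropped
filtered <- ifelse(occupancy < removeBackground, 0, occupancy)
print(filtered)
```

With the default threshold of 0 no value falls below the cut-off, so under this assumption the occupancy profile is left unchanged.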
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing value for removeBackground
removeBackground(OPP)
removeBackground
Accessor method for the removeBackground slot in a parameterOptions object
removeBackground(object)
Setter method for the removeBackground slot in a parameterOptions object
removeBackground(object) <- value
object |
|
value |
|
A numeric value describing the threshold at which occupancy signals are removed (the default is 0). The removal takes place when occupancy is computed (see the computeOccupancy function).
Returns a parameterOptions object with an updated value for the removeBackground slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for removeBackground
removeBackground(OPP) <- 0.1
removeBackground<-
Setter method for the removeBackground slot in a parameterOptions object
removeBackground(object) <- value
Accessor method for the scores slot in a ChIPScore object
scores(object)
object |
|
processingChIP returns a named list of normalised ChIP scores at the loci of interest. This function enables you to extract those scores from the ChIPScore object.
Returns the value assigned to the scores slot in a ChIPScore object.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Loading data
data(ChIPanalyserData)
chip <- processingChIP(chip, top)
str(scores(chip))
scores
Accessor method for the scores slot in a ChIPScore object.
scores(object)
Extracted and normalised ChIP scores at loci of interest.
searchSites is a function enabling quick extraction and search for parameter combinations and/or loci in any genomicProfiles object from computeOccupancy onwards.
searchSites(Sites, lambdaPWM = "all", BoundMolecules = "all", Locus = "all")
Sites |
|
lambdaPWM |
|
BoundMolecules |
|
Locus |
|
When testing numerous combinations of lambdaPWM and boundMolecules across many loci, it can become challenging to navigate the large data output. searchSites makes searching this output much easier. If all arguments are left at their default value of "all", all parameters are searched and the full list of sites above threshold is returned. If a value for lambdaPWM is provided, only that lambdaPWM is selected (along with all boundMolecules and all loci). searchSites also works on the result of computeOptimal.
Returns an object of the same type as the one passed to this function, containing only the selected parameters and/or loci.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Default parameter options (assumed here; adjust as needed)
OPP <- parameterOptions()
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScore(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet)
# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide, DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Argument names follow the usage line above (lambdaPWM, BoundMolecules, Locus)
searchSites(Occupancy, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
searchSites(chipProfile, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
# Compute optimal parameters
optimalParam <- computeOptimal(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet, ChIPScore = chip, chromatinState = Access, parameterOptions = OPP, parameter = "all", peakMethod = "moving_kernel")
searchSites(optimalParam, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
setChromatinStates sets chromatin state affinity values to a GRanges object.
setChromatinStates(population, chromatinStates)
population | Population list containing all individuals and associated parameters. Must contain chromatin state affinity values. See |
chromatinStates | GRanges object containing chromatin state locations. |
Chromatin states can be loaded into R as a GRanges object. Each range represents the extent of a given chromatin state, and the chromatin state type should be assigned to a metadata column called "name". The names of the affinity values should be set accordingly.
Returns a GRanges object with affinity scores for each chromatin state range. Affinity scores are placed in the DNAAffinity metadata column.
Patrick C.N. Martin
library(ChIPanalyser)
# Input data
data(ChIPanalyserData)
pop <- 10
params <- c("N", "lambda", "PWMThreshold", paste0("CS", seq(1:11)))
start_pop <- generateStartingPopulation(pop, params)
cs <- setChromatinStates(start_pop, cs)
show
Show methods for various objects
signature(object = "ChIPScore")
signature(object = "genomicProfiles")
signature(object = "parameterOptions")
singleRun runs ChIPanalyser after optimal parameters have been found by the evolve function.
singleRun(indiv, DNAAffinity, genomicProfiles, DNASequenceSet, ChIPScore, fitness = "all")
indiv | Population list containing the top scoring individual. Note that this should be a list of length 1 containing another list with all parameter values. |
DNAAffinity | GRanges object as output by the |
genomicProfiles | genomicProfiles object containing PWM scores and other desired metrics. Note that PWMThreshold, lambda and N will be overwritten using values from indiv. |
DNASequenceSet | DNAStringSet object containing the DNA sequence of interest. |
ChIPScore | ChIPScore object as output by the |
fitness | Character string describing which metric should be used to assess fitness. It should be one of the following: "geometric", "ks", "MSE", "pearson", "spearman", "kendall", "recall", "precesion", "fscore", "MCC", "Accuracy" or "AUC". |
Once the genetic algorithm has been optimised, the top individual may be run on its own to get predicted ChIP profiles. Using this function requires a few extra steps.
First, extract the index of the top individual (see getHighestFitnessSolutions).
Second, use this index to subset the top individual from the GA population. Note that this should be done with single-bracket "[]" notation, because the next steps require a list of length 1 containing another list with all parameter values. This may seem cumbersome, but the functions were designed for list structures.
Third, call setChromatinStates with the top individual list. This adds chromatin affinity values to your chromatinState GRanges. Use the result as your new chromatinState object.
Fourth, pass your indiv list object to singleRun.
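The difference between single- and double-bracket subsetting in the second step can be checked in base R. The population below is a hypothetical stand-in for a GA population, with parameter names borrowed from the setChromatinStates example; topIndex stands in for the index of the fittest individual.

```r
# Toy population: each individual is a list of parameter values (hypothetical)
population <- list(
  list(N = 1000, lambda = 1.5, PWMThreshold = 0.7),
  list(N = 5000, lambda = 2.0, PWMThreshold = 0.8)
)
topIndex <- 2  # stands in for the index of the fittest individual

# Single brackets keep the outer list: a list of length 1 holding the parameters
single <- population[topIndex]
# Double brackets unwrap it: the parameter list itself
double <- population[[topIndex]]

length(single)      # number of individuals kept
names(single[[1]])  # parameter names of that individual
length(double)      # number of parameters
```

Only the single-bracket form has the "list containing a list" shape described above for indiv.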
Returns a list with three elements. The first element contains a genomicProfiles object with occupancy scores. The second element contains a genomicProfiles object with ChIP profile scores. The third element contains the goodness-of-fit metrics.
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
splitData splits processed ChIP data into training and testing sets.
splitData(ChIPscore, dist = c(80, 20), as.proportion = TRUE)
ChIPscore |
ChIPscore object as returned by |
dist |
If |
as.proportion |
Logical describing if values provided to |
Returns a named list of ChIPScore objects:
* trainingSet = ChIPScore object containing the training set
* testingSet = ChIPScore object containing the testing set
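The shape of this return value can be sketched in base R with a plain named list. This is only an illustration under stated assumptions: loci are taken in their existing order and dist is read as percentages, whereas splitData itself may select loci differently. The object scoresByLocus and the locus names are hypothetical.

```r
set.seed(42)
# Toy stand-in for per-locus ChIP scores (names are hypothetical loci)
scoresByLocus <- setNames(lapply(1:10, function(i) runif(5)),
                          paste0("locus", 1:10))
dist <- c(80, 20)  # training / testing proportions

# Assumption: the first share of loci goes to training, the rest to testing
nTrain <- floor(length(scoresByLocus) * dist[1] / sum(dist))
parts <- list(
  trainingSet = scoresByLocus[seq_len(nTrain)],
  testingSet  = scoresByLocus[-seq_len(nTrain)]
)
lengths(parts)
```

Keeping the two sets disjoint matters because parameters optimised on the training loci are meant to be assessed on loci the model has not seen.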
Patrick C.N. Martin <[email protected]>
library(ChIPanalyser)
data(ChIPanalyserData)
# See GA vignette for usage
test <- processingChIP(chip, top)
usingDist <- splitData(test, dist = c(50, 50), as.proportion = TRUE)
usingIndex <- splitData(test, dist = c(1, 2, 3, 4), as.proportion = FALSE)
Accessor method for the stepSize slot in a parameterOptions object
stepSize(object)
object |
|
It is possible to restrict the size of the ChIP-seq-like profile produced by computeChIPProfile. Instead of returning a ChIP-seq-like score for each base pair, base pairs can be skipped so that the predicted enrichment score is returned only for every "n"th base pair (n is the value assigned to stepSize). This reduces the size of the output data and, unless the step size is very large, does not affect the accuracy of the model. The default is 10 base pairs.
Returns the value assigned to the stepSize slot in a parameterOptions object.
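The subsampling idea can be sketched in base R on a toy per-base-pair profile. The vector below is simulated and the indexing is only an illustration of keeping every n-th position; it is not how computeChIPProfile is implemented.

```r
profile <- sin(seq(0, pi, length.out = 200))  # toy per-base-pair profile
stepSize <- 10

# Keep one score every stepSize base pairs
keep <- seq(1, length(profile), by = stepSize)
reduced <- profile[keep]

length(profile)  # original number of positions
length(reduced)  # positions kept after subsampling
```

For a smooth profile like this one, the reduced vector preserves the overall shape while being an order of magnitude smaller, which is the trade-off the default of 10 base pairs aims for.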
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing value for stepSize
stepSize(OPP)
stepSize
Accessor method for the stepSize slot in a parameterOptions object
stepSize(object)
Setter method for the stepSize slot in a parameterOptions object
stepSize(object) <- value
object |
|
value |
|
It is possible to restrict the size of the ChIP-seq-like profile produced by computeChIPProfile. Instead of returning a ChIP-seq-like score for each base pair, base pairs can be skipped so that the predicted enrichment score is returned only for every "n"th base pair (n is the value assigned to stepSize). This reduces the size of the output data and, unless the step size is very large, does not affect the accuracy of the model. The default is 10 base pairs.
Returns a parameterOptions object with an updated value for the stepSize slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for stepSize
stepSize(OPP) <- 20
stepSize<-
Setter method for the stepSize slot in a parameterOptions object
stepSize(object) <- value
Accessor method for the strandRule slot in a parameterOptions object
strandRule(object)
object |
|
When computing PWM scores with whichstrand set to "+-", strandRule determines how the two strands are combined (one of three options: "mean", "max", "sum"). If set to "mean", the average PWM score of both strands is computed. If set to "max", the higher of the two PWM scores is selected. If set to "sum", both scores are added together. The default is "max".
Returns the value assigned to the strandRule slot (one of three options: "mean", "max", "sum") in a parameterOptions object.
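The three rules can be illustrated in base R on two toy score vectors, one per strand. The helper name combineStrands and the score values are hypothetical; the sketch only mirrors the behaviour described above and is not the package's code.

```r
plusStrand  <- c(2.0, -1.0, 4.5)  # toy PWM scores on the "+" strand
minusStrand <- c(1.0,  3.0, 0.5)  # toy PWM scores on the "-" strand

# Hypothetical helper combining per-position scores from both strands
combineStrands <- function(plus, minus, strandRule = c("max", "mean", "sum")) {
  strandRule <- match.arg(strandRule)
  switch(strandRule,
         max  = pmax(plus, minus),   # higher score at each position
         mean = (plus + minus) / 2,  # average of both strands
         sum  = plus + minus)        # both scores added together
}

combineStrands(plusStrand, minusStrand, "max")
combineStrands(plusStrand, minusStrand, "mean")
combineStrands(plusStrand, minusStrand, "sum")
```

The choice matters most for asymmetric motifs, where one strand can score much higher than the other at the same position.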
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(strandRule = "max")
# Accessing value for strandRule
strandRule(GPP)
strandRule
Accessor method for the strandRule slot in a parameterOptions object
strandRule(object)
Setter method for the strandRule slot in a parameterOptions object.
strandRule(object) <- value
object |
|
value |
|
When computing PWM scores with whichstrand set to "+-", strandRule determines how the two strands are combined (one of three options: "mean", "max", "sum"). If set to "mean", the average PWM score of both strands is computed. If set to "max", the higher of the two PWM scores is selected. If set to "sum", both scores are added together. The default is "max".
Returns a parameterOptions object with an updated value for the strandRule slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(strandRule = "max")
# Setting new value for strandRule
strandRule(GPP) <- "mean"
strandRule<-
Setter method for the strandRule slot in a parameterOptions object.
strandRule(object) <- value
Accessor method for the whichstrand slot in a parameterOptions object
whichstrand(object)
object |
|
PWM Score may be computed on either the positive strand ("+"), the negative strand ("-") or on both strands ("+-").
Returns the strand on which PWM scores are computed (the whichstrand slot of a parameterOptions object).
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
library(ChIPanalyser)
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(whichstrand="+-")
# Accessing value for whichstrand
whichstrand(GPP)
whichstrand
Accessor method for the whichstrand slot in a parameterOptions object.
whichstrand(object)
Setter method for the whichstrand slot in a parameterOptions object
Setter method for the whichstrand slot in a parameterOptions object.
whichstrand(object) <- value
object | A parameterOptions object |
value | The strand(s) to score: one of "+", "-" or "+-" |
PWM scores may be computed on the positive strand ("+"), the negative strand ("-"), or both strands ("+-").
Returns a parameterOptions object with an updated value for the whichstrand slot.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
library(ChIPanalyser)
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(whichstrand="+-")
# Setting new value for whichstrand
whichstrand(GPP) <- "+"
whichstrand<-
Setter method for the whichstrand slot in a parameterOptions object.
whichstrand(object)<-value