ChIPanalyser is a package for predicting and understanding transcription factor (TF) binding using a statistical thermodynamic model. The model incorporates four main factors thought to drive TF binding: chromatin state, binding energy, the number of bound molecules, and a scaling factor modulating TF binding affinity. Taken together, these factors let ChIPanalyser produce ChIP-like profiles that closely mimic the patterns seen in real ChIP-seq data.
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Parameter object with default values
OPP <- parameterOptions()
chip <- processingChIP(chip, top)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)

# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores, parameterOptions = OPP)

# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top,
    parameterOptions = OPP)

# Estimating accuracy
AccuracyEstimate <- profileAccuracyEstimate(genomicProfiles = chipProfile,
    ChIPScore = chip, parameterOptions = OPP)
Extract or access the averageExpPWMScore slot in a genomicProfiles object.
object | A genomicProfiles object.
As a general rule, averageExpPWMScore is computed and updated internally by computeGenomeWideScores. Ideally, this slot should not be updated by the user.
The averageExpPWMScore is the sum of the exponential of every PWM score for a given DNA sequence, divided by the length of that DNA sequence. This can be either the full-length sequence or only the accessible sequence (see computeGenomeWideScores).
Returns the averageExpPWMScore of a genomicProfiles object, once computed.
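The definition of averageExpPWMScore can be illustrated with a minimal base R sketch. The scores in pwm_scores are hypothetical, and the number of scored positions stands in for the sequence length; the package computes this value internally and may differ in detail.

```r
# Toy PWM scores, one per scored position of a short sequence (hypothetical values)
pwm_scores <- c(-2.0, 0.5, 1.5, -0.5)

# Assumption: the sequence length is taken as the number of scored positions
sequence_length <- length(pwm_scores)

# Sum of the exponential of every score, divided by the sequence length
average_exp_score <- sum(exp(pwm_scores)) / sequence_length
average_exp_score
```

Because of the exponential, a few high-scoring sites dominate this average, which is why it matters whether the full or only the accessible sequence is used.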
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Accessing data
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Extracting averageExpPWMScore slot
averageExpPWMScore(GPP)
## Note: this slot is empty at this point, as nothing has been computed yet
Methods for function averageExpPWMScore
signature(object = "genomicProfilesInternal")
Extract or access the backgroundSignal slot in a parameterOptions object.
object | A parameterOptions object.
Default value: 0
When computing occupancy (computeOccupancy), a ChIP-seq background signal is used to scale occupancy by considering both a backgroundSignal and a maxSignal. The backgroundSignal is also used to normalise occupancies against maxOccupancy.
The backgroundSignal usually comes from experimental data and is provided by the user. As a general rule, if ChIP-seq data is available and will be used in functions such as profileAccuracyEstimate or plotOccupancyProfile, it is advised to use the backgroundSignal from this data.
We strongly encourage setting values when building a parameterOptions object.
Returns the backgroundSignal of a parameterOptions object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Viewing single value in object
backgroundSignal(OPP)
Methods for function backgroundSignal
signature(object = "parameterOptions")
Setter method for the backgroundSignal slot in a parameterOptions object.
object | A parameterOptions object.
value | New value to assign to the backgroundSignal slot.
Default value: 0
When computing occupancy (computeOccupancy), a ChIP-seq background signal is used to scale occupancy by considering both a backgroundSignal and a maxSignal. The backgroundSignal is also used to normalise occupancies to maxOccupancy.
The backgroundSignal usually comes from experimental data and is provided by the user. As a general rule, if ChIP-seq data is available and will be used in functions such as profileAccuracyEstimate, it is advised to use the backgroundSignal from this data.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with a new value assigned to the backgroundSignal slot.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new value for backgroundSignal
backgroundSignal(OPP) <- 0.2
# Viewing whole object with new updated value
OPP
# Viewing single value in object
backgroundSignal(OPP)
Methods for function backgroundSignal<-
Extract or access the boundMolecules slot in a parameterOptions object.
object | A parameterOptions object.
Default value: 1000
When computing occupancy (computeOccupancy), a value for the number of molecules bound to DNA is needed. This value can be updated and set in a parameterOptions object. If the number of molecules is unknown, it is possible to infer this value with computeOptimal.
We strongly encourage setting values when building a parameterOptions object.
Returns the boundMolecules slot of a parameterOptions object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Building parameterOptions object
OPP <- parameterOptions()
# Checking single value by slot accessor
boundMolecules(OPP)
Methods for function boundMolecules
signature(object = "parameterOptions")
Setter method for the boundMolecules slot in a parameterOptions object.
object | A parameterOptions object.
value | New value to assign to the boundMolecules slot.
Default value: 1000
When computing occupancy (computeOccupancy), a value for the number of molecules bound to DNA is needed. This value can be updated and set in a parameterOptions object. If the number of molecules is unknown, it is possible to infer this value with computeOptimal.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for boundMolecules.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
# Building parameterOptions object
OPP <- parameterOptions()
# Setting new boundMolecules value in OPP
boundMolecules(OPP) <- 5000
# Checking value in whole object
OPP
# Checking single value by slot accessor
boundMolecules(OPP)
Methods for function boundMolecules<-
signature(object = "parameterOptions", value = "vector")
Extract or access the BPFrequency slot in a genomicProfiles object.
object | A genomicProfiles object.
Default value is c(0.25,0.25,0.25,0.25)
When generating a Position Weight Matrix from a Position Frequency Matrix, the probability of occurrence of each base pair (base pair frequency) is necessary (as originally described by Gary Stormo). It is possible to set custom values for BPFrequency with a vector of length 4 containing the probability of occurrence of each base pair (A, C, G, T) in order.
If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence. This sequence can be a BSgenome object or a DNAStringSet. To decrease run time, it is advised to use a DNAStringSet.
Returns the BPFrequency slot of a genomicProfiles object.
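The role of the base pair frequency in a PFM to PWM conversion can be sketched in base R. This is one common log-ratio convention with a pseudocount, shown under stated assumptions: the matrix pfm, the sequence dna and the pseudocount are hypothetical, and the exact conversion used by ChIPanalyser may differ.

```r
# Hypothetical position frequency matrix: rows A, C, G, T; columns are motif positions
pfm <- matrix(c(8, 1, 0,
                1, 7, 1,
                0, 1, 9,
                1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Base pair frequency estimated from a toy DNA sequence, in the order A, C, G, T
dna <- "ACGTTGCAAACCGGTTAATT"
bases <- factor(strsplit(dna, "")[[1]], levels = c("A", "C", "G", "T"))
bp_frequency <- as.numeric(table(bases)) / nchar(dna)

# Position probabilities with a pseudocount weighted by the background frequency
pseudocount <- 1
prob <- sweep(pfm + pseudocount * bp_frequency, 2, colSums(pfm) + pseudocount, "/")

# PWM as the natural log ratio of position probability to background frequency
pwm <- log(prob / bp_frequency)
pwm
```

With a uniform background of c(0.25, 0.25, 0.25, 0.25) every base is penalised equally; a genome-derived background lowers the score of bases that are common genome-wide.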
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Extracting BPFrequency slot
BPFrequency(GPP)
Methods for function BPFrequency
signature(object = "genomicProfilesInternal")
Setter method for the BPFrequency slot in a genomicProfiles object. If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence.
object | A genomicProfiles object.
value | A vector of length 4 containing the probability of occurrence of each base pair (A, C, G, T) in order, or a DNAStringSet from which these frequencies are computed. Default value is c(0.25,0.25,0.25,0.25).
When generating a Position Weight Matrix from a Position Frequency Matrix, the probability of occurrence of each base pair (base pair frequency) is necessary (as originally described by Gary Stormo). It is possible to set custom values for BPFrequency with a vector of length 4 containing the probability of occurrence of each base pair (A, C, G, T) in order.
If the base pair frequency is unknown, BPFrequency will compute it from a DNA sequence when building a genomicProfiles object. This sequence can be a BSgenome object or a DNAStringSet. To decrease run time, it is advised to use a DNAStringSet.
Returns a genomicProfiles object with an updated value for BPFrequency.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Updating BPFrequency
## Note: BPFrequency is used to compute the PWM from the PFM.
## If it is updated after building GPP, it will not influence the PWM.
## It is advised to build the object with BPFrequency directly.
BPFrequency(GPP) <- DNASequenceSet
BPFrequency(GPP) <- c(0.25, 0.25, 0.25, 0.25)
Methods for function BPFrequency<-
signature(object = "genomicProfilesInternal", value = "DNAStringSet")
signature(object = "genomicProfilesInternal", value = "vector")
is derived from real biological data.
The source organism is Drosophila melanogaster.
The data can be described as genomic data as it contains DNA sequences,
loci, genetic information, DNA accessibility data and ChIP-seq data.
is GRanges
containing DNA accessibility data for the sequences described above.
is GRanges
containing Chromatin State data for the sequences described above.
is GRanges
containing a locus of interest.
In this case, the eve stripe locus on
chromosome 2R in Drosophila melanogaster.
is a GRanges containing the ChIP score of the eve
stripe locus in Drosophila melanogaster.
is a GRanges
containing UCSC gene reference information
Returns a set of Rdata objects as described above.
Transcription Factor PFM: Berkeley Drosophila Transcription Network Project (bdtnp.lbl.gov)
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
Accessor method for the chipMean slot in a parameterOptions object.
object | A parameterOptions object.
Default value: 150
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). It is advised to use the average width of ChIP peaks from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns the chipMean slot of a parameterOptions object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
Methods for function chipMean
Setter method for the chipMean slot in a parameterOptions object.
object | A parameterOptions object.
value | New value to assign to the chipMean slot.
Default value: 150
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). It is advised to use the average width of ChIP peaks from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for chipMean.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
Methods for function chipMean<-
A ChIPScore object is the result of the processingChIP function. This object contains the ChIP scores extracted from ChIP data, the loci of interest and optional parameters associated with ChIPanalyser. The loci of interest will either be user provided or the top n regions as defined by the reduce argument in processingChIP. This object has the sole purpose of aiding the storage and parsing of data and parameters.
Objects of this class are created internally and will be passed to other objects as is.
:Object of class "list"
List of extracted ChIP scores
:Object of class "loci"
GRanges containing loci of interest
:Object of class "numeric"
Ploidy level of the organism
:Object of class "vector"
Number of Bound molecules to the DNA
:Object of class "numeric"
ChIP background signal (average ChIP score)
:Object of class "numeric"
max ChIP signal
:Object of class "numeric"
Width of loci if reduce is used and no loci are provided
:Object of class "numeric"
Average ChIP peak width
:Object of class "numeric"
Standard Deviation of ChIP peak width
:Object of class "vector"
Smoothing window width for ChIP score
:Object of class "numeric"
Defining resolution size of ChIP like profiles (10bp = signal will be only considered every 10bp)
:Object of class "numeric"
Signal Threshold to be removed. Default removes all negative scores
:Object of class "character"
Type of noise filter to be used on ChIP data.
:Object of class "numeric"
Threshold of PWM scores that will be selected
:Object of class "character"
Rule to compute strand score (max, mean or sum)
:Object of class "character"
Which strand should be used to compute PWM scores.
:Object of class "vector"
Lambda value - Scaling factor to the PWM
:Object of class "logical"
PFM to PWM conversion log transform ( natural log or log2)
:Object of class "nos"
Number of Sites in the PWM that should be used to compute PWM scores.
:Object of class "numeric"
PWM pseudocount value for PFM to PWM conversion.
:Object of class "character"
Internal Tag - Code progression
Class "parameterOptions", directly.
signature(object = "ChIPScore", value = "loci")
: ...
signature(object = "ChIPScore", value = "list")
: ...
signature(.Object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
signature(object = "ChIPScore")
: ...
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
Access or extract the chipSd slot in a parameterOptions object.
object | A parameterOptions object.
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). The average peak size is subject to variation, which is accounted for with chipSd. It is advised to use the standard deviation of ChIP peak width from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns the chipSd slot of a parameterOptions object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
Methods for function chipSd
Setter method for the chipSd slot in a parameterOptions object.
object | A parameterOptions object.
value | New value to assign to the chipSd slot.
When computing ChIP-seq-like profiles (computeChIPProfile), the occupancy values given by computeOccupancy are transformed into ChIP-seq-like profiles. The average size of a ChIP-seq peak was described by Kaplan (Kaplan et al., 2011). The average peak size is subject to variation, which is accounted for with chipSd. It is advised to use the standard deviation of ChIP peak width from actual ChIP-seq data.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for chipSd.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Kaplan T, Li X-Y, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB (2011) Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet., 7, e1001290.
Methods for function chipSd<-
Access or extract the chipSmooth slot in a parameterOptions object.
object | A parameterOptions object.
When computing ChIP-seq-like profiles (computeChIPProfile) from occupancy data (see computeOccupancy), the profiles are smoothed using a window of a given size. The default value is set at 250 base pairs. If chipSmooth is set to 0, the profile will not be smoothed.
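Window smoothing of a profile can be sketched with a simple centred moving average in base R. This is an illustration only: smooth_profile is a hypothetical helper, the window is counted in vector elements rather than base pairs, and the smoothing applied inside the package may use a different scheme.

```r
# Hypothetical helper: centred moving average over `window` elements
smooth_profile <- function(profile, window) {
  # A window of 0 (or 1) leaves the profile unsmoothed
  if (window <= 1) return(profile)
  kernel <- rep(1 / window, window)
  smoothed <- as.numeric(stats::filter(profile, kernel, sides = 2))
  # filter() yields NA where the window runs past the ends; keep raw values there
  edges <- is.na(smoothed)
  smoothed[edges] <- profile[edges]
  smoothed
}

# Toy profile with one sharp peak and one smaller peak
profile <- c(0, 0, 10, 0, 0, 0, 4, 0)
smooth_profile(profile, window = 3)
smooth_profile(profile, window = 0)
```

A wider window spreads each peak over more positions and lowers its height, so the window is best chosen to match the resolution of the ChIP-seq data being compared.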
We strongly encourage setting values when building a parameterOptions object.
Returns the chipSmooth slot of a parameterOptions object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
Methods for function chipSmooth
signature(object = "parameterOptions")
Setter method for the chipSmooth slot in a parameterOptions object.
chipSmooth(object) <- value
object | A parameterOptions object.
value | New value to assign to the chipSmooth slot.
When computing ChIP-seq-like profiles (computeChIPProfile) from occupancy data (see computeOccupancy), the profiles are smoothed using a window of a given size. The default value is set at 250 base pairs. If chipSmooth is set to 0, the profile will not be smoothed.
We strongly encourage setting values when building a parameterOptions object.
Returns a parameterOptions object with an updated value for chipSmooth.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Methods for function chipSmooth<-
signature(object = "parameterOptions", value = "vector")
computeChIPProfile computes ChIP-seq-like profiles from occupancy data.
Occupancy data is computed using computeOccupancy.
computeChIPProfile(genomicProfiles, loci, parameterOptions = NULL, norm = TRUE, method = c("moving_kernel","truncated_kernel","exact"), peakSignificantThreshold= NULL,cores=1, verbose = TRUE)
genomicProfiles |
loci |
parameterOptions |
norm |
method |
peakSignificantThreshold |
cores |
verbose |
computeChIPProfile converts transcription factor occupancy into a profile resembling a ChIP-seq profile. Internally, a few parameters are required to build a ChIP-like profile. These parameters are defined and stored in a ChIPScore object (parameters are updated based on your ChIP data), a genomicProfiles object (user defined at the start of the analysis) or a parameterOptions object (if you want to update values as you go along).
Returns a genomicProfiles object containing the ChIP-seq-like profiles for every combination of lambdaPWM and boundMolecules provided by the user.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
# Extracting data
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)

# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)

# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
chipProfile
computeGenomeWideScores computes the maximum and minimum PWM scores over the entire genome.
computeGenomeWideScores(genomicProfiles, DNASequenceSet, chromatinState = NULL, parameterOptions = NULL, cores = 1, verbose = TRUE)
genomicProfiles |
DNASequenceSet |
chromatinState |
parameterOptions |
cores |
verbose |
The computeGenomeWideScores function computes PWM scores over the entire genome (or the accessible genome if chromatin states are provided). Genome-wide scores are used to determine the maximum and minimum PWM scores as well as the average exponential score. These scores are in turn used to determine which scores are above the PWM threshold. The average exponential score is an integral part of the equation used to compute occupancy. Using default settings, ChIPanalyser will only compute occupancy on the top 30% of PWM scores (a PWMThreshold of 0.7). This threshold can be changed; see PWMThreshold.
Returns a genomicProfiles object with updated values for max score, min score and averageExpPWMScore.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Martin PCN, Zabet NR (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590–3605.
computeOccupancy computes occupancy from PWM scores. As described in detail in the vignette, computeOccupancy uses PWM scores, DNA accessibility data, the number of bound molecules and a scaling factor of transcription factor specificity. This function computes occupancy using the values assigned to each variable.
computeOccupancy(genomicProfiles,parameterOptions = NULL, norm = TRUE, verbose = TRUE)
genomicProfiles |
parameterOptions |
norm |
verbose |
computeOccupancy computes occupancy from PWM scores. As described in detail in the vignette, it uses PWM scores, DNA accessibility data, the number of bound molecules and a scaling factor of transcription factor specificity, and computes occupancy using the values assigned to each variable. It should also be noted that the object supplied as parameterOptions contains a set of parameters used to compute occupancy (though not restricted to this purpose). These parameters often depend on real ChIP-seq data and will influence the goodness of fit between the predicted model and real ChIP-seq data. We strongly advise customising the value assigned to each parameter in order to increase the model's agreement with real-world biological data.
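How these quantities could combine into an occupancy value can be sketched in base R. The formula below follows the statistical thermodynamic form as we recall it from Zabet and Adryan (2015) and should be treated as an assumption, not as the package's implementation; the function occupancy and all input values are hypothetical.

```r
# Assumed form: N * a_j * exp(w_j / lambda) /
#               (N * a_j * exp(w_j / lambda) + L * n * <a_i * exp(w_i / lambda)>)
# N = bound molecules, a = accessibility, w = PWM score, lambda = scaling factor,
# L = DNA length, n = ploidy, <.> = average over all sites
occupancy <- function(pwm_score, accessibility, bound_molecules, lambda,
                      dna_length, ploidy, avg_exp_score) {
  weight <- bound_molecules * accessibility * exp(pwm_score / lambda)
  weight / (weight + dna_length * ploidy * avg_exp_score)
}

# Toy inputs (hypothetical): four sites, the third one inaccessible
scores <- c(-3, 0, 2, 5)
access <- c(1, 1, 0, 1)
lambda <- 1
avg_exp <- mean(access * exp(scores / lambda))

occ <- occupancy(scores, access, bound_molecules = 1000, lambda = lambda,
                 dna_length = 1e4, ploidy = 2, avg_exp_score = avg_exp)
occ
```

In this sketch an inaccessible site gets zero occupancy whatever its score, and raising the number of bound molecules pushes every accessible site towards saturation.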
computeOccupancy returns a genomicProfiles object. The main difference resides in one of its slots, which is generally a list or GRangesList. Within these list-type structures are GRanges containing the positions of sites above threshold, PWM scores and occupancy for each site. The number of GRanges depends on the number of loci tested, and the number of elements in the list depends on the combinations of lambdaPWM and boundMolecules.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")

# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)

# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
OPP <- parameterOptions()

# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet)

# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide,
    DNASequenceSet = DNASequenceSet, loci = top, chromatinState = Access)

# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores, parameterOptions = OPP)
Occupancy
ChIPanalyser contains a set of functions, some of which require two parameters known as lambdaPWM and boundMolecules. These two parameters are not always known. computeOptimal will compute these values by maximising the correlation and minimising the Mean Squared Error between a predicted ChIP-seq-like profile and a real ChIP-seq profile for given loci.
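The idea of searching for the two parameters can be sketched as a small grid search in base R. Everything here is a stand-in: predict_profile is a hypothetical toy model, the "observed" profile is simulated rather than real ChIP-seq data, and picking the lowest MSE is only one of the possible selection rules.

```r
# Toy PWM scores for a locus (hypothetical values)
scores <- seq(-3, 3, length.out = 50)

# Stand-in occupancy model; NOT the package's implementation
predict_profile <- function(scores, lambda, bound_molecules) {
  weight <- bound_molecules * exp(scores / lambda)
  weight / (weight + 1e4 * mean(exp(scores / lambda)))
}

# "Observed" profile simulated from known parameters, in place of real ChIP-seq data
observed <- predict_profile(scores, lambda = 2, bound_molecules = 1000)

# Evaluate every combination of candidate values
grid <- expand.grid(lambda = c(1, 2, 3), bound_molecules = c(100, 1000, 10000))
predictions <- Map(function(l, n) predict_profile(scores, l, n),
                   grid$lambda, grid$bound_molecules)
grid$mse <- sapply(predictions, function(p) mean((p - observed)^2))
grid$correlation <- sapply(predictions, function(p) cor(p, observed))

# Here the best combination is taken as the one with the lowest MSE
best <- grid[which.min(grid$mse), ]
best
```

Correlation alone can be high for several combinations because it ignores the scale of the profile, which is one reason to look at an error metric such as MSE as well.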
computeOptimal(genomicProfiles,DNASequenceSet, ChIPScore,chromatinState = NULL, parameterOptions = NULL, optimalMethod = "all",rank=FALSE,returnAll=TRUE, peakMethod="moving_kernel",cores=1)
genomicProfiles |
DNASequenceSet |
ChIPScore |
chromatinState |
parameterOptions |
optimalMethod |
rank |
returnAll |
peakMethod |
cores |
To infer the values of lambdaPWM and boundMolecules, it is possible to use the computeOptimal function. It should be noted that this function requires ChIP-seq data as input. This should be the output of the processingChIP function.
computeOptimal returns a list containing, respectively, the optimal set of parameters (lambdaPWM and boundMolecules), the optimal matrix (a matrix containing accuracy estimates dependent on the parameter chosen), and finally the chosen parameter. If the parameter chosen was "all", then each element of this list will contain the optimal set of parameters and optimal matrices for all of the aforementioned parameters (see optimalMethod).
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
computePWMScore will compute and extract all sites that exhibit a PWM score higher than a threshold. This threshold (see PWMThreshold) determines the percentage of total sites that should NOT be considered.
computePWMScore(genomicProfiles,DNASequenceSet, loci = NULL, chromatinState = NULL,parameterOptions=NULL,cores=1, verbose = TRUE)
DNASequenceSet |
genomicProfiles |
loci |
parameterOptions |
chromatinState |
cores |
verbose |
After determining genome-wide scores, it is possible to compute and extract only high-affinity sites (in the sense that they have a high PWM score). If a PWMThreshold is not set by the user, the default value is 0.7. This means that 70% of sites will NOT be selected; only the top 30% will be computed and extracted. If one is interested in all PWM scores at a genome-wide scale (or over accessible DNA), this is possible by setting PWMThreshold to zero.
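The effect of the threshold can be sketched in base R by treating it as a quantile of the score distribution. This is an assumption made for illustration: the package may derive its cutoff differently (for instance relative to the minimum and maximum genome-wide scores), and the scores below are hypothetical.

```r
# Toy PWM scores for ten candidate sites (hypothetical values)
pwm_scores <- c(-4, -2, -1, 0, 1, 2, 3, 5, 6, 8)

# Assumption: the threshold is the fraction of sites to discard
pwm_threshold <- 0.7
cutoff <- quantile(pwm_scores, probs = pwm_threshold, names = FALSE)
selected <- pwm_scores[pwm_scores >= cutoff]
selected

# A threshold of zero keeps every site
all_sites <- pwm_scores[pwm_scores >= quantile(pwm_scores, probs = 0, names = FALSE)]
length(all_sites)
```

Lower thresholds keep more low-affinity sites, which increases run time and memory use in the later occupancy step.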
computePWMScore returns a genomicProfiles object in which the profiles slot has been updated. This slot now contains a GRangesList, with each element being a GRanges. Each GRanges contains the position of each site (start, end and strand) and the PWM score associated with that site.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Accessor method for the DNASequenceLength slot in a genomicProfiles object.
object | A genomicProfiles object.
The model on which ChIPanalyser is based requires the length of the DNA sequence used to compute scores. In this case, the DNA length is the total length of the DNA of the organism of interest or of the accessible DNA at a genome-wide scale.
Returns the DNASequenceLength slot of a genomicProfiles object.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Methods for function DNASequenceLength
signature(object = "genomicProfilesInternal")
Accessor method for the drop slot in a genomicProfiles object.
object | A genomicProfiles object.
During certain computations, it is possible that the loci of interest do not show any overlap with accessible DNA. In that case, a warning message will appear in the console and these inaccessible loci will be stored in this slot. For this reason, it is imperative that loci of interest be named (in this case, a named GRanges).
Returns a character string with the loci containing no accessible DNA.
Patrick C.N. Martin
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Methods for function drop
signature(object = "genomicProfilesInternal")
pushes a starting population to evolve in a genetic algorithm.
evolve(population,DNASequenceSet,ChIPScore, genomicProfiles,parameters=NULL,generations=100,mutationProbability=0.3, offsprings=5,chromatinState=NULL, method="geometric", lambda=TRUE, checkpoint=TRUE, filename=NULL, cores=1)
population |
numeric value describing the number of individuals in the starting population.
Alternatively - a starting population list as returned by |
DNASequenceSet |
DNAStringSet object containing DNA sequences of interest (Extracted from BSgenome) |
ChIPScore |
ChIPScore object as returned by the |
genomicProfiles |
genomicProfiles object containing minimal information (such as the PWM) |
parameters |
vector or list containing each parameter that should be added to the chromosome.
See |
generations |
numeric describing the number of generations before the genetic algorithm should halt. |
mutationProbability |
numeric describing the rate of mutations for each surviving individual |
offsprings |
numeric describing the number of individuals surviving to the next generation |
chromatinState |
GRanges object containing chromatin state information. Each state should be labelled in a metadata column named "name". It is advised to use numeric values for each state name. |
method |
character string describing the scoring metric that should be used. ChIPanalyser offers twelve different metrics: correlation coefficients (Pearson, Spearman and Kendall), Mean Squared Error (MSE), Kolmogorov–Smirnov distance, precision, recall, accuracy, F-score, Matthews correlation coefficient (MCC) and Area Under the Receiver Operating Characteristic Curve (AUC ROC or just AUC). |
lambda |
logical describing if lambda value should be pre-computed. Setting to TRUE increases the speed of the algorithm. |
checkpoint |
logical describing if population parameters at each generation should be saved. |
filename |
character string that will serve as a prefix to the saved intermediate files. |
cores |
numeric describing the number of cores used to run the GA. |
ChIPanalyser offers a way of finding an optimal solution by using a genetic algorithm. Instead of running the standard analysis, TF binding affinities to chromatin states can be extracted via this more complex method. It should be noted that this method is better suited to the analysis of chromatin states. While the algorithm still works with simple DNA accessibility, it would potentially take more time for minor gains in accuracy.
Returns a named list with three elements.
database saves the data frame containing all scores for each individual since generation 1
population saves the last population with chromosome values
fitest saves the fittest individual for a given generation
Patrick C.N. Martin <[email protected]>
generates a starting population with random
traits for each individual
population |
numeric value describing the number of individuals in the starting population. |
parameters |
vector or list containing each parameter that should be added to the chromosome. |
names |
character describing names that should be added to each individual. |
generates a starting population to be used
in the genetic algorithm implemented in ChIPanalyser. There are two main ways a
starting population can be generated:
by name
Using the names of each parameter that should be passed to each "chromosome". The possible parameters are N, lambda, PWMThreshold and CS (DNAAffinity or DNAAccessibility also work). CS values should also contain a numeric value associated with each chromatin state you wish to include, e.g. CS1 ... CS14. This will generate a value for each parameter by sampling from a set of predefined values.
by value range
Using a named list (one name for each parameter). Each element of the list
should contain three numeric values: length of range, min value, max value.
(Internally, values are passed to runif)
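The "by value range" approach can be sketched in base R as follows. This is a minimal illustration rather than the package's implementation: the helper name sketchPopulation is hypothetical, and it assumes that "length of range" means the number of candidate values drawn with runif between the minimum and the maximum.

```r
# Hypothetical helper, not the package's generateStartingPopulation().
# Each element of `ranges` is c(length of range, min value, max value).
sketchPopulation <- function(population, ranges) {
  lapply(seq_len(population), function(i) {
    vapply(ranges, function(r) {
      # Assumption: draw r[1] candidate values uniformly in [min, max] ...
      candidates <- runif(n = r[1], min = r[2], max = r[3])
      # ... then keep one of them for this individual
      candidates[sample.int(length(candidates), 1)]
    }, numeric(1))
  })
}

set.seed(42)
ranges <- list(N = c(10, 1, 1e6),
               lambda = c(10, 1, 5),
               PWMThreshold = c(10, 0.1, 0.9))
pop <- sketchPopulation(population = 5, ranges = ranges)
pop[[1]]   # one individual: a named numeric vector of traits
```

Each individual is a named vector, so the traits can be looked up by parameter name when building a "chromosome".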
Returns a list of individuals with random traits
Patrick C.N. Martin
is an S4 object serving two purposes: (i) storing internally computed data and (ii) storing parameter options. This object is passed through the different steps of the pipeline
to facilitate the passing and changing of parameters.
genomicProfiles(..., parameterOptions = NULL, genomicProfiles = NULL, ChIPScore = NULL)
... |
Any of the user available slots in genomicProfiles. |
parameterOptions |
If some parameters were already computed or stored in a parameterOptions object, passing this object will use those values instead of the default ones. |
genomicProfiles |
If some parameters were already computed or stored in a genomicProfiles object, passing this object will use those values instead of the default ones. |
ChIPScore |
If some parameters were already computed or stored in a ChIPScore object, passing this object will use those values instead of the default ones. |
The genomicProfiles
object serves the purpose of storing and passing parameters and computed data between the different steps of the pipeline. When creating a genomicProfiles
object it is possible to use previously computed values by simply passing the object to the constructor function.
Returns a genomicProfiles
object with updated slots for all parameters passed.
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
is an S4 object serving two purposes: (i) storing internally computed data and (ii) storing parameter options. This object is passed through the different steps of the pipeline
to facilitate the passing and changing of parameters.
Objects can be created by calls of the form genomicProfiles(ploidy, boundMolecules, backgroundSignal, maxSignal, lociWidth, chipMean, chipSd, chipSmooth, stepSize, noiseFilter, removeBackground, lambdaPWM, PWMpseudocount, naturalLog, noOfSites, PWMThreshold, strandRule, whichstrand, PFM, PWM, PFMFormat, BPFrequency, minPWMScore, maxPWMScore, profiles, DNASequenceLength, averageExpPWMScore)
:Object of class "matrix"
A Position Weight Matrix (either supplied or internally computed if
PFM is provided)
:Object of class "matrix"
A Position Frequency Matrix (may also be a path to file containing PFM)
:Object of class "character"
A character string of one of the following: raw, transfac,JASPAR or
:Object of class "vector"
Base Pair Frequency in the genome (if a DNA sequence is provided
(as a DNAStringSet
or BSgenome
will be automatically computed internally). Default:c(0.25,0.25,0.25,0.25)
:Object of class "vector"
Lowest PWM score across the genome (computed and updated internally)
:Object of class "vector"
Highest PWM score across the genome (computed and updated internally)
:Object of class "GRList"
Contains GRanges with sites above threshold and associated metrics
(PWMscore and Occupancy) - computed internally
:Object of class "vector"
Length of the genome (or accessible genome) - computed internally
:Object of class "vector"
Average exponential PWM score across the genome
(or accessible genome) - computed internally
:Object of class "vector"
Internal background value (computed internally)
:Object of class "vector"
Stores loci that do not contain accessible DNA, if there are any
(computed and updated internally)
:Object of class "character"
~Internal Tags~
:Object of class "numeric"
A numeric value describing the ploidy of the organism. Default: 2
:Object of class "vector"
A vector (or single value) containing the number of bound molecules
(bound transcription factors). Default: 1000
:Object of class "numeric"
A numeric value describing the ChIP-seq background signal
(average signal from real ChIP seq data). Default: 0
:Object of class "numeric"
A numeric value describing the highest ChIP-seq signal
(from real ChIP-seq data). Default: 1
:Object of class "numeric"
:Object of class "numeric"
A numeric value describing the mean width of a ChIP- seq peak.
:Object of class "numeric"
A numeric value describing the standard deviation of ChIP-seq peaks.
Default: 150
:Object of class "vector"
A numeric value describing the width of the window used to smooth
Occupancy profiles into ChIP profiles. Default:250
:Object of class "numeric"
A numeric value describing the step Size (in base pairs) between
each ChIP-seq score. Default:10 (Scored every 10 base pairs)
:Object of class "numeric"
A numeric value describing the value at which scores should be removed.
Default: 0 (negative scores are removed)
:Object of class "character"
Describes the
noiseFilter method that will be applied to ChIP data (zero, mean, median or sigmoid)
:Object of class "numeric"
Threshold at which PWM Score should be selected (only sites above
threshold will be selected - between 0 and 1)
:Object of class "character"
"mean", "max" or "sum" will determine how strands should be handled
for computing PWM Scores. Default : "max"
:Object of class "character"
"+","-" or "+-" on which strand should PWM Score be computed.
Default: "+-"
:Object of class "vector"
A vector (or single value) containing values for lambdaPWM. Default: 1
:Object of class "logical"
A logical value describing if natural Log will be used to compute
the PWM (if FALSE then log2 will be used). Default: TRUE
:Object of class "nos"
A positive integer describing the number of sites (in base pairs) that should
be used from the PFM to compute the PWM. Default: 0 (full width of
binding site will be used when set to 0)
:Object of class "numeric"
A numeric value describing a PWMpseudocount for PWM computation.
:Object of class "character"
Class "genomicProfilesInternal"
, directly.
Class "parameterOptions"
, directly.
signature(.Object = "genomicProfiles")
: ...
signature(object = "genomicProfiles")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
Non exported class. Represents the stripped down version of genomicProfiles.
Created Internally.
:Object of class "matrix"
:Object of class "matrix"
:Object of class "character"
:Object of class "vector"
:Object of class "vector"
:Object of class "vector"
:Object of class "GRList"
:Object of class "vector"
:Object of class "vector"
:Object of class "vector"
:Object of class "vector"
:Object of class "character"
signature(object = "genomicProfilesInternal", value = "numeric")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal", value = "GRList")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "DNAStringSet")
: ...
signature(object = "genomicProfilesInternal", value = "vector")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "character")
: ...
signature(object = "genomicProfilesInternal", value = "matrix")
: ...
signature(object = "genomicProfilesInternal")
: ...
signature(object = "genomicProfilesInternal", value = "matrix")
: ...
signature(object = "genomicProfilesInternal")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
extract best solution from a ChIPanalyser GA/evolve Run.
population |
Population list as output by the |
child |
numeric describing the number of solutions to be extracted from the population list. |
method |
character string describing which scoring method should be used and selected from "geometric","ks","MSE","pearson","spearman","kendall", "recall","precesion","fscore","MCC","Accuracy" or "AUC". |
This function only serves as a way of extracting data from the population list. Ultimately, it is just a wrapper for some indexing.
Returns the index of the top "child" solutions.
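The kind of indexing involved can be sketched in base R. The fitness values below are hypothetical, and the assumption that higher is better for some metrics while lower is better for error-type metrics such as MSE is ours; the package's own ranking rules may differ.

```r
# Hypothetical fitness scores for six individuals (higher is better here)
fitness <- c(ind1 = 0.42, ind2 = 0.91, ind3 = 0.15,
             ind4 = 0.77, ind5 = 0.63, ind6 = 0.88)
child <- 2

# Index of the top `child` solutions
topIndex <- order(fitness, decreasing = TRUE)[seq_len(child)]
topIndex   # 2 6

# For an error-type metric, lower is better, so the ordering is ascending
mse <- c(0.30, 0.05, 0.80, 0.12)
topMSE <- order(mse)[seq_len(child)]
topMSE     # 2 4
```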
Patrick C.N. Martin <[email protected]>
extracts selected regions from ChIPscore object to be used as testing set.
getTestingData(ChIPscore,loci = 1)
ChIPscore |
ChIPscore object as returned by |
loci |
numeric describing index of loci to be used as testing data. |
Returns ChIPscore object with the selected testing loci.
Patrick C.N. Martin <[email protected]>
extracts selected regions from ChIPScore object to be used as training set.
getTrainingData(ChIPscore,loci = 1)
ChIPscore |
ChIPscore object as returned by |
loci |
numeric describing index of loci to be used as training data. |
Returns ChIPscore object with the selected training loci.
Patrick C.N. Martin <[email protected]>
Virtual class to handle multiple data types for one slot
(profiles)
A virtual Class: No objects may be created from it.
The purpose of this virtual class is to store data of two different formats in one slot: GRangesList and list
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
~~~~ Methods for function initialize
signature(.Object = "ChIPScore")
Initialize ChIPScore
signature(.Object = "genomicProfiles")
Initialize genomicProfiles
signature(.Object = "parameterOptions")
Initialize parameterOptions
slot in a
Accessor Method for the lambdaPWM
slot in a
object |
The model underlying ChIPanalyser internally infers two parameters: the number of bound molecules and lambda. Lambda represents a scaling factor for the Position Weight Matrix (PWM). It can be described as how well a TF discriminates between high affinity and very high affinity sites.
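To build intuition for this scaling factor, the base R sketch below compares relative site weights for two values of lambda. It assumes a Boltzmann-like weight of the form exp(score / lambda), in the spirit of the statistical thermodynamic framework cited on this page; the scores and the helper relativeWeight are hypothetical, and the full model contains further terms that are omitted here.

```r
# Hypothetical PWM scores for three sites
pwmScores <- c(strong = 10, medium = 8, weak = 2)

# Assumed form: weight proportional to exp(score / lambda), normalised to sum to 1
relativeWeight <- function(scores, lambda) {
  w <- exp(scores / lambda)
  w / sum(w)
}

lowLambda  <- relativeWeight(pwmScores, lambda = 1)
highLambda <- relativeWeight(pwmScores, lambda = 5)
round(lowLambda, 3)
round(highLambda, 3)
```

Under this assumption, a small lambda concentrates the weight on the highest-scoring site, while a larger lambda flattens the differences between sites.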
Returns the value assigned to the lambdaPWM
slot in a
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
~~~~ Methods for function lambdaPWM
slot in a
Setter Method for the lambdaPWM
slot in a
object |
value |
The model underlying ChIPanalyser internally infers two parameters: the number of bound molecules and lambda. Lambda represents a scaling factor for the Position Weight Matrix (PWM). It can be described as how well a TF discriminates between high affinity and very high affinity sites.
Returns the value assigned to the lambdaPWM
slot in a
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
~~Setter method for the lambdaPWM
slot in the parameterOptions
slot in a
Accessor method for the loci
slot in a
object |
When using the processingChIP
function, the result will contain a
named GRanges with the loci of interest. These loci will either result from
user input or be extracted from the ChIP profiles (see processingChIP
and lociWidth
). This function enables
you to extract those loci from the ChIPScore object.
Returns the value assigned to the loci
slot in a
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.
Setter for Loci of interest parsed to or extracted from the ChIPScore object
A virtual Class: No objects may be created from it.
signature(object = "ChIPScore", value = "loci")
: ...
Patrick C. N. Martin <[email protected]>
Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94. Patrick C.N. Martin and Nicolae Radu Zabet (2020) Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. CSBJ, 18, 3590-3605.
~~Accessor method for the loci
slot in ChIPScore
Loci of interest parsed to or extracted from the ChIPScore object
slot in a
Accessor method for the lociWidth
slot in a
object |
When using the processingChIP
function, the provided ChIP
scores will be split into bins of a given size. lociWidth determines the Size
of that bin. Default is set at 20 000 bp.
This means that the ChIP profiles provided will be split into bins of 20 000 bp
over the entire profile if no loci of interest are provided.
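The binning described above can be sketched in base R. The helper binProfile and the chromosome length are hypothetical, and the handling of the last, shorter bin is an assumption; the package works on GRanges objects and may treat the final bin differently.

```r
# Hypothetical helper: split a region of length `chrLength` into bins of `lociWidth` bp
binProfile <- function(chrLength, lociWidth = 20000) {
  starts <- seq(1, chrLength, by = lociWidth)
  # Assumption: the final bin is truncated at the end of the region
  ends <- pmin(starts + lociWidth - 1, chrLength)
  data.frame(start = starts, end = ends)
}

bins <- binProfile(chrLength = 65000)
bins
```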
Returns the value assigned to the lociWidth
slot in a
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lociWidth = 20000)
# Accessing the value of lociWidth
lociWidth(GPP)
~~Accessor method for the loci
slot in ChIPScore
Setting width of regions when using the reduce argument and NOT providing your
own loci when using the processingChIP
slot in a
Setter Method for the lociWidth
slot in a
object |
value |
When using the processingChIP
function, the provided ChIP
scores will be split into bins of a given size. lociWidth determines the Size
of that bin. Default is set at 20 000 bp.
This means that the ChIP profiles provided will be split into bins of 20 000 bp
over the entire profile if no loci of interest are provided.
Returns the value assigned to the lociWidth
slot in a
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(lociWidth = 20000)
# Setting a new value for lociWidth
lociWidth(GPP) <- 30000
~~Setter method for the loci
slot in ChIPScore
slot in a
Accessor function for maxPWMScore
slot in a
object |
is a numerical value that can be described as the
highest PWM score computed at a genome wide scale.
This value is computed and updated in the
object after using the
Returns the value assigned to the maxPWMScore
slot in a
Patrick C. N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet,
    genomicProfiles = GPP)
maxPWMScore(GenomeWide)
## If used before computeGenomeWideScores, will return NULL
~~Accessor method for maxPWMScore
slot in a
Accessor method for the maxSignal
slot in a
object |
In the context of ChIPanalyser
, maxSignal
represents the
maximum normalised ChIP-seq signal of a given transcription factor
(or DNA binding protein). Although a default value of 1 has been assigned to
this slot, we strongly recommend tailoring this value to your data.
We strongly encourage to set values when building a
Returns the value assigned to the maxSignal
slot in a
Patrick C.N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing the value of maxSignal
maxSignal(OPP)
~~Accessor method for maxSignal
Maximum ChIP signal extracted from ChIP data (see processingChIP
slot in a
Setter method for maxSignal
slot in a
maxSignal(object) <- value
object |
value |
In the context of ChIPanalyser
, maxSignal
represents the
maximum normalised ChIP-seq signal of a given transcription factor
(or DNA binding protein). Although a default value of 1 has been assigned to
this slot, we strongly recommend tailoring this value to your data.
We strongly encourage to set values when building a
Returns a parameterOptions
with an updated
value for maxSignal
Patrick C.N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Setting a new value for maxSignal
maxSignal(OPP) <- 1.8
~~Setter method for maxSignal
Maximum ChIP signal extracted from ChIP data (see processingChIP
slot in a
Accessor method for the minPWMScore
slot in a
object |
can be described as the lowest PWM score computed at
a genome-wide scale. Although it is possible to assign a value
to minPWMScore
, we strongly advise to use the value
computed and assigned internally. This value is computed in the
Returns the value assigned to the minPWMScore
slot in a
Patrick C. N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet,
    genomicProfiles = GPP)
minPWMScore(GenomeWide)
## If used before computeGenomeWideScores, will return NULL
~~Accessor for minPWMScore
Minimum PWM score computed during the computeGenomeWideScores
slot in a
Accessor method for the naturalLog
slot in a
object |
During the computation of a Position Weight Matrix, the
Position Probability Matrix (derived from a Position Frequency Matrix)
is log transformed. This parameter determines which log transform will be used.
If TRUE, the natural log (ln) will be used. If FALSE, log2 will be used.
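The log transform step can be illustrated with a base R sketch. The helper toPWM, the toy count matrix and the way the pseudocount and background frequencies enter the calculation are assumptions made for illustration; they are not taken from the package's internal code.

```r
# Toy Position Frequency Matrix: rows are bases, columns are positions
pfm <- matrix(c(8, 1, 0, 1,
                0, 9, 1, 0,
                1, 0, 0, 9),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: PFM -> Position Probability Matrix -> log-odds PWM
toPWM <- function(pfm, pseudocount = 1, background = rep(0.25, 4),
                  naturalLog = TRUE) {
  # Assumption: the pseudocount is spread over bases according to the background
  ppm <- sweep(pfm + pseudocount * background, 2,
               colSums(pfm) + pseudocount, "/")
  logFun <- if (naturalLog) log else log2
  logFun(ppm / background)
}

pwmLn   <- toPWM(pfm, naturalLog = TRUE)
pwmLog2 <- toPWM(pfm, naturalLog = FALSE)
```

The two matrices differ only by a constant factor, since ln(x) = log2(x) * ln(2), so the choice changes the scale of the scores but not their ranking.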
We strongly encourage to set values when building a
Returns the value assigned to the naturalLog
slot in a
Patrick C.N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(naturalLog = TRUE)
# Accessing the value of naturalLog
naturalLog(GPP)
~~Accessor method for the naturalLog
slot in a
slot in a
Setter method for the naturalLog
slot in a
naturalLog(object)<- value
object |
value |
During the computation of a Position Weight Matrix, the
Position Probability Matrix (derived from a Position Frequency Matrix)
is log transformed. This parameter determines which log transform will be used.
If TRUE, the natural log (ln) will be used. If FALSE, log2 will be used.
We strongly encourage to set values when building a
Returns parameterOptions
object with an updated
value for the naturalLog
Patrick C.N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
OPP <- parameterOptions(naturalLog = TRUE)
# Setting a new value for naturalLog
naturalLog(OPP) <- FALSE
~~Setter method for the naturalLog
slot in a
slot in a
Accessor Method for the noiseFilter
slot in a
object |
Noise filtering method that should be used on ChIP-seq data. Four methods are available: zero, mean, median and sigmoid. Zero removes all ChIP-seq scores below zero, mean removes scores below the mean score, median removes scores below the median score, and sigmoid assigns a weight to each score based on a logistic regression curve. The midpoint is set at the 95th percentile of ChIP-seq scores. Scores below the midpoint receive a value between 0 and 1; scores above it receive a value between 1 and 2.
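The four filters can be approximated in base R as shown below. The helpers sigmoidWeights and sketchNoiseFilter are hypothetical: setting removed scores to zero, the steepness of the logistic curve and multiplying scores by their weights are all assumptions, so this is a sketch of the idea rather than the package's implementation.

```r
# Logistic weight in (0, 2), equal to 1 at the midpoint (0.95 quantile of the scores)
sigmoidWeights <- function(scores) {
  mid <- quantile(scores, probs = 0.95, names = FALSE)
  2 / (1 + exp(-(scores - mid)))
}

sketchNoiseFilter <- function(scores,
                              method = c("zero", "mean", "median", "sigmoid")) {
  method <- match.arg(method)
  if (method == "sigmoid") {
    # Assumption: the weight rescales each score
    return(scores * sigmoidWeights(scores))
  }
  cutoff <- switch(method,
                   zero = 0,
                   mean = mean(scores),
                   median = median(scores))
  # Assumption: "removed" scores are set to zero
  scores[scores < cutoff] <- 0
  scores
}

chipScores <- c(-1, 0.5, 2, 4)
sketchNoiseFilter(chipScores, "zero")
sketchNoiseFilter(chipScores, "mean")
sigmoidWeights(chipScores)
```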
Returns the value assigned to the noiseFilter
slot in a
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noiseFilter = "sigmoid")
# Accessing the value of noiseFilter
noiseFilter(GPP)
~~Accessor method for noiseFilter
Noise Filter that will be applied to ChIP scores
slot in a
Setter Method for the noiseFilter
slot in a
noiseFilter(object) <- value
object |
value |
Noise filtering method that should be used on ChIP-seq data. Four methods are available: zero, mean, median and sigmoid. Zero removes all ChIP-seq scores below zero, mean removes scores below the mean score, median removes scores below the median score, and sigmoid assigns a weight to each score based on a logistic regression curve. The midpoint is set at the 95th percentile of ChIP-seq scores. Scores below the midpoint receive a value between 0 and 1; scores above it receive a value between 1 and 2.
Returns the value assigned to the noiseFilter
slot in a
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noiseFilter = "sigmoid")
# Setting a new value for noiseFilter
noiseFilter(GPP) <- "zero"
~~Setter method for noiseFilter
Noise Filter that will be applied to ChIP scores
slot in a
Accessor Method for the noOfSites
slot in a
object |
While computing Position Weight Matrices (PWM) from Position Frequency Matrices (PFM), it is possible to restrict the number of sites that will be used to compute the PWM. The default is set at "all". In this case, all sites will be used to compute the PWM.
Returns the value assigned to the noOfSites
slot in a
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noOfSites = "all")
# Accessing the value of noOfSites
noOfSites(GPP)
~~~~ Methods for function noOfSites
signature(object = "parameterOptions")
slot in a
Setter Method for the noOfSites
slot in a
noOfSites(object) <- value
object |
value |
While computing Position Weight Matrices (PWM) from Position Frequency Matrices (PFM), it is possible to restrict the number of sites that will be used to compute the PWM. The default is set at "all". In this case, all sites will be used to compute the PWM.
Returns a parameterOptions
object with an updated
value for the noOfSites
Patrick C.N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(noOfSites = 0)
# Setting a new value for noOfSites
noOfSites(GPP) <- 8
~~Setter method for noOfSites
Virtual class to handle Number of Sites
A virtual Class: No objects may be created from it.
No methods defined with class "nos" in the signature.
Patrick C. N. Martin <[email protected]>
is an object used to store and parse the various parameters
needed throughout this analysis pipeline.
parameterOptions(ploidy = 2, boundMolecules = 1000, backgroundSignal = 0, maxSignal = 1, lociWidth = 20000, chipMean = 200, chipSd = 200, chipSmooth = 250, stepSize = 10, removeBackground = 0, noiseFilter = "zero", naturalLog = TRUE, noOfSites = "all", PWMThreshold = 0.7, strandRule = "max", whichstrand = "+-", PWMpseudocount = 1, lambdaPWM = 1)
ploidy |
boundMolecules |
backgroundSignal |
maxSignal |
lociWidth |
chipMean |
chipSd |
chipSmooth |
stepSize |
removeBackground |
noiseFilter |
naturalLog |
noOfSites |
PWMThreshold |
strandRule |
whichstrand |
PWMpseudocount |
lambdaPWM |
A vector (or single value) containing values for the ScalingFactorPWM (also known as lambda). Default: 1 |
ChIPanalyser requires a large number of parameters. parameterOptions
was created with the intent
of storing and passing these numerous arguments to the different functions. All parameters in this object are optional, although setting them is strongly recommended. Some parameters are extracted and updated by
functions along the pipeline, e.g. maxSignal and backgroundSignal are extracted during the processingChIP
step. These parameters will be passed along automatically. If you do not wish to use them (or any other parameter), simply pass a new parameterOptions object with your desired values.
Returns a parameterOptions
with updated values.
Patrick C. N. Martin <[email protected]>
# parameterOptions(ploidy = 2, boundMolecules = 1000, backgroundSignal = 0, maxSignal = 1, lociWidth = 20000, chipMean = 200, chipSd = 200, chipSmooth = 250, stepSize = 10, removeBackground = 0, noiseFilter = "zero", naturalLog = TRUE, noOfSites = "all", PWMThreshold = 0.7, strandRule = "max", whichstrand = "+-", PWMpseudocount = 1, lambdaPWM = 1)
is an object used to store and parse the various parameters
needed throughout this analysis pipeline.
Objects can be created by calls of the form parameterOptions(ploidy, boundMolecules, backgroundSignal, maxSignal, lociWidth, chipMean, chipSd, chipSmooth, stepSize, noiseFilter, removeBackground, lambdaPWM, PWMpseudocount, naturalLog, noOfSites, PWMThreshold, strandRule, whichstrand)
:Object of class "numeric"
A numeric value describing the ploidy of the organism. Default: 2
:Object of class "vector"
A vector (or single value) containing the number of bound molecules
(bound transcription factors). Default: 1000
:Object of class "numeric"
A numeric value describing the ChIP-seq background signal
(average signal from real ChIP seq data). Default: 0
:Object of class "numeric"
A numeric value describing the highest ChIP-seq signal
(from real ChIP-seq data). Default: 1
:Object of class "numeric"
A numeric value describing the bin size used when splitting ChIP-seq scores. Default: 20 000
:Object of class "numeric"
A numeric value describing the mean width of a ChIP- seq peak.
:Object of class "numeric"
A numeric value describing the standard deviation of ChIP-seq peaks.
Default: 150
:Object of class "vector"
A numeric value describing the width of the window used to smooth
Occupancy profiles into ChIP profiles. Default:250
:Object of class "numeric"
A numeric value describing the step Size (in base pairs) between
each ChIP-seq score. Default:10 (Scored every 10 base pairs)
:Object of class "numeric"
A numeric value describing the value at which scores should be removed.
Default: 0 (negative scores are removed)
:Object of class "character"
noiseFilter method applied to ChIP scores
:Object of class "numeric"
Threshold at which PWM Score should be selected (only sites above
threshold will be selected - between 0 and 1)
:Object of class "character"
"mean", "max" or "sum" will dertermine how strand should be handle
for computing PWM Scores. Default : "max"
:Object of class "character"
"+","-" or "+-" on which strand should PWM Score be computed.
Default: "+-"
:Object of class "vector"
A vector (or single value) contaning values for lambdaPWM Default:1
:Object of class "logical"
A logical value describing if natural Log will be used to compute
the PWM (if FALSE then log2 will be used). Default: TRUE
:Object of class "nos"
A Positive integer descibing number of sites (in base pair) should
be used from the PFM to compute PWM. Default =0 (Full width of
binding site will be used when set to 0)
:Object of class "numeric"
A numeric value describing a PWMpseudocount for PWM computation.
:Object of class "character"
Accessor and setter methods are defined for the slots listed above. Accessors have signature(object = "parameterOptions"); setters have signature(object = "parameterOptions", value = ...), where value is of class "character", "numeric", "logical" or "vector" depending on the slot being set. A method with signature(.Object = "parameterOptions") is also defined.
Patrick C. N. Martin <[email protected]>
Accessor method for the PFMFormat slot in a genomicProfiles object.
object |
If loading a Position Frequency Matrix from a file, the format
of the file should be specified. Default is raw. Please keep in mind that
this argument is used when parsing the file. If this argument is changed
after building the genomicProfiles object from a PositionFrequencyMatrix file,
this will not influence the parsing of the file.
PFMFormat can be one of the following:
"raw", "transfac", "JASPAR" or "sequences"
Returns the value assigned to the PFMFormat slot in a genomicProfiles object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building data objects
#### THIS IS THE PREFERRED METHOD FOR SETTING PFMFormat
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Accessing the value of PFMFormat
PFMFormat(GPP)
Setter method for the PFMFormat slot in a genomicProfiles object.
PFMFormat(object) <- value
object |
value |
If loading a Position Frequency Matrix from a file, the format
of the file should be specified. Default is JASPAR. Please keep in mind that
this argument is used when parsing the file. If this argument is changed
after building the genomicProfiles object from a PositionFrequencyMatrix file,
this will not influence the parsing of the file.
Returns a genomicProfiles object with an updated value for the PFMFormat slot.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building data objects
#### THIS IS THE PREFERRED METHOD FOR SETTING PFMFormat
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Setting a new value for PFMFormat
PFMFormat(GPP) <- "JASPAR"
Accessor method for the ploidy slot in a parameterOptions object.
object |
The default value for ploidy is 2. ChIPanalyser is based on a model
that considers the ploidy of the organism of interest; however, it only
considers simple polyploidy (or haploidy). The model does not consider
hybrids such as wheat.
Returns the value assigned to the ploidy slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing the value of ploidy
ploidy(OPP)
Setter method for the ploidy slot in a parameterOptions object.
ploidy(object) <- value
object |
value |
The default value for ploidy is 2. ChIPanalyser is based on a model
that considers the ploidy of the organism; however, it only considers
simple polyploidy (or haploidy). The model does not consider hybrids
such as wheat.
Returns a parameterOptions object with an updated value for the ploidy slot.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Setting a new value for ploidy
ploidy(OPP) <- 2
plotOccupancyProfile plots the predicted profiles.
If provided, this function will also plot ChIP-seq profiles,
PWMScores (or Occupancy), chromatin states, goodness-of-fit estimates and gene information.
plotOccupancyProfile(predictedProfile, ChIPScore = NULL, chromatinState = NULL, occupancy = NULL, goodnessOfFit = NULL, PWM = FALSE, geneRef = NULL, addLegend = TRUE, ...)
predictedProfile |
ChIPScore |
chromatinState |
occupancy |
goodnessOfFit |
geneRef |
addLegend |
... |
Any other graphical Parameter of the following : cex, cex.lab, cex.main, densityCS , densityGR , ylab, xlab, main, colPred, colChIP, colOccup, colCS, colGR, n_axis_ticks. See details. |
Once the predicted ChIP-seq-like profiles have been computed, it is possible to plot these profiles.
This function allows control over graphical parameters. In short:
* col = color values - exact number of colors or colors that will be used in a colorRampPalette.
* cex = font sizes - for text, axis labels and main
* density = fill density for chromatin state and/or geneRef blocks
The suffixes of these parameters refer to:
* Pred = predictedProfile
* ChIP = ChIP score (experimental ChIP data)
* CS = chromatin states
* GR = gene reference
* Occup = occupancy locations
Returns a profile plot with "Occupancy" on the y-axis and DNA position on the x-axis.
Patrick C.N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(DNASequenceSet = DNASequenceSet, genomicProfiles = GPP)
# Compute PWM scores
PWMScores <- computePWMScore(DNASequenceSet = DNASequenceSet, genomicProfiles = GenomeWide,
    loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Compute ChIP profiles (note the capitalisation: computeChIPProfile)
chipProfile <- computeChIPProfile(loci = top, genomicProfiles = Occupancy)
# Plotting profile
plotOccupancyProfile(predictedProfile = chipProfile, ChIPScore = chip,
    chromatinState = Access, occupancy = Occupancy, geneRef = geneRef)
# Plotting profile with custom graphical parameters
plotOccupancyProfile(predictedProfile = chipProfile, ChIPScore = chip,
    chromatinState = Access, occupancy = Occupancy, geneRef = geneRef,
    colCS = c("red", "blue"), densityGR = 60)
plotOptimalHeatMaps will plot heat maps of the optimal
parameters and highlight the optimal combination of lambdaPWM
and boundMolecules.
optimalParam |
contour |
col |
main |
layout |
overlay |
Once the optimal set of parameters (lambdaPWM and boundMolecules) has been
computed, it is possible to plot the results in the form of a heat map.
Each heat map will be plotted on a separate page if layout = TRUE.
If layout = FALSE, it is up to the user to define how they wish
to lay out their heat maps.
Returns a heat map of optimal combinations of lambdaPWM and boundMolecules.
The x-axis represents the different values assigned to lambda (lambdaPWM)
and the y-axis represents the different values assigned to boundMolecules.
Patrick C. N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building data objects
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing optimal set of parameters
optimalParam <- computeOptimal(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet,
    ChIPScore = chip, chromatinState = Access, parameterOptions = OPP,
    parameter = "all", peakMethod = "moving_kernel")
plotOptimalHeatMaps(optimalParam)
Accessor method for the PFM slot in a genomicProfiles object.
object |
After creating a genomicProfiles object,
it is possible to access the Position Frequency Matrix slot.
However, this slot will be empty if the genomicProfiles
object was built directly from a Position Weight Matrix.
See genomicProfiles.
Returns the Position Frequency Matrix (PFM slot) used to compute the
Position Weight Matrix in a genomicProfiles object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Accessing slot
PositionFrequencyMatrix(GPP)
Setter method for the PFM slot in a genomicProfiles object.
PositionFrequencyMatrix(object) <- value
object |
value |
The Position Frequency Matrix is one of the fundamental objects that needs
to be supplied to a genomicProfiles object.
If, after building a genomicProfiles object,
only the Position Frequency Matrix needs to be modified, it is
possible to manually update the value of this matrix using the function above.
There are two options for the type of data that may be supplied to the
PFM slot: a matrix in the form of a Position Frequency Matrix
(a matrix with four rows, one for each base (ACTG), and a number of
columns equal to the number of sites in the binding site), or
(recommended) a path to the file containing the
Position Frequency Matrix. This Position Frequency Matrix file may come
in multiple formats such as RAW, Transfac or JASPAR.
WARNING: if a genomicProfiles object has already been created
and only the PFM is supplied/updated,
then the Position Weight Matrix will automatically be updated as well.
Returns a genomicProfiles object with an updated PFM
slot (as described above, this will lead to an updated PositionWeightMatrix).
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
# NOT ADVISED!!!! PLEASE PARSE PFM AND PFMFormat together
GPP <- genomicProfiles(PFMFormat = "JASPAR")
# Setting PFM
PositionFrequencyMatrix(GPP) <- PFM
Accessor method for the PWM slot in a genomicProfiles object.
object |
After creating a genomicProfiles object,
it is possible to access the Position Weight Matrix stored in this slot.
This slot should always contain something. It is either supplied by the
user or computed directly from a Position Frequency Matrix when one is supplied.
Returns a matrix in the form of a Position Weight Matrix.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Loading PFM files
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR")
# Accessing slot
PositionWeightMatrix(GPP)
Setter method for the PositionWeightMatrix slot in a genomicProfiles object.
PositionWeightMatrix(object) <- value
object |
value |
If a Position Weight Matrix is readily available, it is possible to directly
assign this matrix to the PWM slot. However, this is only possible
if a genomicProfiles object has already been created.
In that case, we advise first creating a genomicProfiles object.
It should be noted that the Position Weight Matrix is normally computed
automatically from a Position Frequency Matrix. If no Position Frequency
Matrix is available, a Position Weight Matrix can be assigned directly to this slot.
Returns a genomicProfiles object with an updated value for the PWM slot.
Patrick C. N. Martin <[email protected]>
# Building genomicProfiles object
GPP <- genomicProfiles()
# Setting PWM to PositionWeightMatrix slot
PWM <- matrix(runif(32, -10, 20), ncol = 8)
rownames(PWM) <- c("A", "C", "T", "G")
PositionWeightMatrix(GPP) <- PWM
processingChIP will process and extract ChIP scores at a set of loci of interest.
processingChIP(profile, loci = NULL, reduce = NULL, peaks = NULL, chromatinState = NULL, parameterOptions = NULL, cores = 1)
profile |
loci |
reduce |
parameterOptions |
peaks |
chromatinState |
cores |
When using computeOptimal, it is required to supply real ChIP
data in order to have a point of comparison. The correlation and MSE scores are
computed based on how well the model fits biological data.
processingChIP will extract this data from ChIP data at loci
of interest. When using the reduce option, this function will only
select the top regions based on peak height or mean ChIP score.
processingChIP will also extract maxSignal and backgroundSignal from
ChIP data and pass them to a parameterOptions object.
Returns a ChIPScore object containing extracted (and normalised) ChIP scores, the loci of interest and newly extracted parameters (e.g. maxSignal).
Patrick C.N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
## Extracting ChIP scores at loci of interest
ChIP <- processingChIP(profile = chip, loci = top)
profileAccuracyEstimate will compare the predicted ChIP-seq-like
profile to real ChIP-seq data and return a set of metrics describing how
accurate the predicted model is compared to real data.
profileAccuracyEstimate(genomicProfiles, ChIPScore, parameterOptions = NULL, method = "all", cores = 1)
genomicProfiles |
ChIPScore |
parameterOptions |
method |
cores |
In order to assess the quality of the model against experimental ChIP-seq data, ChIPanalyser offers a wide range of methods to choose from. These methods are also used when computing optimal parameters.
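As a rough illustration of what such goodness-of-fit metrics measure, the base R sketch below compares a predicted profile with an observed one using the mean squared error and three correlation coefficients. The vectors predicted and observed are made-up values and the helper fitMetrics is hypothetical; this is not the internal code of profileAccuracyEstimate.

```r
# Hypothetical helper, not part of ChIPanalyser: a few common goodness-of-fit metrics
fitMetrics <- function(predicted, observed) {
    c(MSE      = mean((predicted - observed)^2),
      pearson  = cor(predicted, observed, method = "pearson"),
      spearman = cor(predicted, observed, method = "spearman"),
      kendall  = cor(predicted, observed, method = "kendall"))
}

# Made-up normalised scores at five positions
predicted <- c(0.1, 0.4, 0.9, 0.3, 0.0)
observed  <- c(0.2, 0.5, 1.0, 0.2, 0.1)

metrics <- fitMetrics(predicted, observed)
print(round(metrics, 3))
```

A low MSE together with high correlation values suggests that the predicted profile follows the observed one closely; which metric matters most depends on the question being asked.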
Returns a list of goodness-of-fit metrics for each locus and each parameter combination selected.
Patrick C. N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet)
# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide, DNASequenceSet = DNASequenceSet,
    loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
# Estimating accuracy (argument name follows the usage above: parameterOptions)
AccuracyEstimate <- profileAccuracyEstimate(genomicProfiles = chipProfile,
    ChIPScore = chip, parameterOptions = OPP)
Accessor method for profiles in a genomicProfiles object.
Computed PWM scores, Occupancy or ChIP-seq-like profiles for the loci and parameter combinations of interest.
Accessor method for the PWMpseudocount slot in a parameterOptions object.
object |
In the context of Position Weight Matrices, the pseudocount is used to avoid zero probabilities during the transformation of a Position Frequency Matrix to a Position Probability Matrix and finally to a Position Weight Matrix. It is essentially a sample correction that is added in the case of small sample size. A base pair to which a pseudocount was assigned will neither influence the model nor create mathematical issues such as infinities or division by zero. Default is set at 1.
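The effect of the pseudocount can be sketched in base R as follows. This is one common formulation (pseudocount spread over the bases according to a uniform background), written here as an assumption; the helper pfmToPwm, the background vector and the example matrix are hypothetical and are not taken from the ChIPanalyser source.

```r
# Hypothetical helper, not part of ChIPanalyser: PFM -> PPM -> PWM with a pseudocount
pfmToPwm <- function(pfm, pseudocount = 1, background = rep(0.25, 4),
                     naturalLog = TRUE) {
    # Add the pseudocount, spread over the four bases according to the background
    counts <- pfm + pseudocount * background
    # Position Probability Matrix: each column sums to 1
    ppm <- sweep(counts, 2, colSums(counts), "/")
    # Position Weight Matrix: log-ratio of observed to background probabilities
    logFun <- if (naturalLog) log else log2
    logFun(ppm / background)
}

# Made-up PFM (4 bases x 2 positions) with zero counts at position 1
pfm <- matrix(c(8, 2, 0, 0,
                1, 1, 7, 1),
              nrow = 4,
              dimnames = list(c("A", "C", "T", "G"), NULL))

pwmNoPseudo <- pfmToPwm(pfm, pseudocount = 0)  # zero counts give log(0) = -Inf
pwmPseudo   <- pfmToPwm(pfm, pseudocount = 1)  # all entries are finite
```

With pseudocount = 0 the zero counts propagate to infinite weights, which is the situation the pseudocount is meant to prevent.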
Returns the value assigned to the PWMpseudocount slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMpseudocount = 0)
# Accessing slot value
PWMpseudocount(GPP)
Setter method for the PWMpseudocount slot in a parameterOptions object.
PWMpseudocount(object) <- value
object |
value |
In the context of Position Weight Matrices, the pseudocount is used to avoid zero probabilities during the transformation of a Position Frequency Matrix to a Position Probability Matrix and finally to a Position Weight Matrix. It is essentially a sample correction that is added in the case of small sample size. A base pair to which a pseudocount was assigned will neither influence the model nor create mathematical issues such as infinities or division by zero.
Returns a parameterOptions object with an updated value for the PWMpseudocount slot.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMpseudocount = 0)
# Setting a new value for PWMpseudocount
PWMpseudocount(GPP) <- 1
Accessor method for the PWMThreshold slot in a parameterOptions object.
object |
The computePWMScore function requires a so-called PWM threshold.
This threshold determines which PWM Scores are selected.
The PWMThreshold is a positive numeric value (between 0 and 1).
If set at 0, all sites will be selected. If set at 0.7 (default value),
then 70 % of PWM Scores (and by extension binding sites) will be IGNORED.
The top 30 % will be selected.
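The selection described above can be pictured with a short base R sketch, assuming the threshold is read as a quantile of the score distribution, as the description suggests. The helper selectSites and the score values are hypothetical, and the package's internal computation may differ.

```r
# Hypothetical helper, not part of ChIPanalyser: keep scores at or above a quantile cutoff
selectSites <- function(scores, PWMThreshold = 0.7) {
    cutoff <- quantile(scores, probs = PWMThreshold, names = FALSE)
    scores[scores >= cutoff]
}

# Made-up PWM scores for ten candidate sites
scores <- c(-3.2, 0.5, 1.1, 4.8, -0.7, 2.9, 6.3, 0.0, 3.5, -1.4)

top30    <- selectSites(scores, PWMThreshold = 0.7)  # keeps the three highest scores
allSites <- selectSites(scores, PWMThreshold = 0)    # keeps every site
```

Raising the threshold reduces the number of sites carried into later steps, which lowers run time and memory use at the cost of discarding weaker sites.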
Returns the value assigned to the PWMThreshold slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMThreshold = 0.7)
# Accessing value for PWMThreshold
PWMThreshold(GPP)
Setter method for the PWMThreshold slot in a parameterOptions object.
PWMThreshold(object) <- value
object |
value |
The computePWMScore function requires a so-called PWM threshold.
This threshold determines which PWM Scores are selected.
The PWMThreshold is a positive numeric value (between 0 and 1).
If set at 0, all sites will be selected. If set at 0.7 (default value),
then 70 % of PWM Scores (and by extension binding sites) will be IGNORED.
The top 30 % will be selected.
Returns a parameterOptions object with an updated value
for the PWMThreshold slot.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(PWMThreshold = 0.7)
# Setting a new value for PWMThreshold
PWMThreshold(GPP) <- 0.8
Accessor method for the removeBackground slot in a parameterOptions object.
object |
A numeric value describing a threshold at which Occupancy signals must be
removed (default is set at 0). The removal of Occupancy signals will occur
when running computeOccupancy (see computeOccupancy).
Returns the value assigned to the removeBackground slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing value for removeBackground
removeBackground(OPP)
Setter method for the removeBackground slot in a parameterOptions object.
removeBackground(object) <- value
object |
value |
A numeric value describing a threshold at which Occupancy signals must be
removed (default is set at 0). The removal of Occupancy signals will occur
when running computeOccupancy (see computeOccupancy).
Returns a parameterOptions object with an updated
value for the removeBackground slot.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Setting a new value for removeBackground
removeBackground(OPP) <- 0.1
Accessor method for the scores slot in a ChIPScore object.
object |
The processingChIP function returns a named list of normalised ChIP scores
at loci of interest. This function enables
you to extract those scores from the ChIPScore object.
Returns the value assigned to the scores slot in a ChIPScore object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
chip <- processingChIP(chip, top)
str(scores(chip))
Extracted and normalised ChIP scores at loci of interest.
searchSites is a function enabling quick extraction and search for
parameter combinations and/or loci in any genomicProfiles object
from computeOccupancy onwards.
searchSites(Sites, lambdaPWM = "all", BoundMolecules = "all", Locus = "all")
Sites |
lambdaPWM |
BoundMolecules |
Locus |
When testing numerous combinations of lambdaPWM and boundMolecules on top of many loci, it can
become challenging to navigate the large data output.
searchSites will make searching in this slot a lot easier.
If all arguments are left at their default value of "all", then all parameters
will be searched, thus returning the full list of sites above
threshold. If a value for lambdaPWM is provided by the user, then only this lambdaPWM will be selected (all boundMolecules and loci will also be selected).
searchSites also works on the result of computeOptimal.
Returns an object of the same type as the one passed to this function, containing only the selected parameters and/or loci.
Patrick C. N. Martin <[email protected]>
# Data extraction
data(ChIPanalyserData)
# Path to Position Frequency Matrix
PFM <- file.path(system.file("extdata", package = "ChIPanalyser"), "BEAF-32.pfm")
# As an example of genome, this example will run on the Drosophila genome
if (!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
}
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
# Building genomicProfiles object
GPP <- genomicProfiles(PFM = PFM, PFMFormat = "JASPAR", BPFrequency = DNASequenceSet)
# Computing genome-wide scores
GenomeWide <- computeGenomeWideScores(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet)
# Compute PWM scores
PWMScores <- computePWMScore(genomicProfiles = GenomeWide, DNASequenceSet = DNASequenceSet,
    loci = top, chromatinState = Access)
# Compute occupancy
Occupancy <- computeOccupancy(genomicProfiles = PWMScores)
# Argument names follow the usage above (lambdaPWM, BoundMolecules, Locus)
searchSites(Occupancy, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
# Compute ChIP profiles
chipProfile <- computeChIPProfile(genomicProfiles = Occupancy, loci = top)
searchSites(chipProfile, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
# Searching the result of computeOptimal
optimalParam <- computeOptimal(genomicProfiles = GPP, DNASequenceSet = DNASequenceSet,
    ChIPScore = chip, chromatinState = Access, parameterOptions = OPP,
    parameter = "all", peakMethod = "moving_kernel")
searchSites(optimalParam, lambdaPWM = c(1, 4), BoundMolecules = c(1, 100), Locus = "eve")
setChromatinStates sets chromatin state affinity values to a GRanges object.
population |
Population list containing all individuals and associated parameter. Must
contain chromatin state affinity values. See |
chromatinStates |
GRanges object containing chromatin state locations. |
Chromatin states can be loaded into R as a GRanges object. Each range represents the extent of a certain chromatin state, and the chromatin state type should be assigned to a metadata column called "name". The names of the affinity values should be set accordingly.
Returns a GRanges object with affinity scores for each chromatin state range. Affinity scores are placed in the DNAAffinity metadata column.
Patrick C.N. Martin
Show methods for various objects
signature(object = "ChIPScore")
signature(object = "genomicProfiles")
signature(object = "parameterOptions")
singleRun runs ChIPanalyser after optimal parameters have been found by the
evolve function.
singleRun(indiv, DNAAffinity, genomicProfiles, DNASequenceSet, ChIPScore, fitness = "all")
indiv |
Population list containing the top scoring individual. Note that this should be a list of length 1 containing another list with all parameter values. |
DNAAffinity |
GRanges object as outputed by the |
genomicProfiles |
genomicProfiles object containing PWM scores and other desired metrics. Note that PWMThreshold, lambda and N will be overwritten using values from indiv. |
DNASequenceSet |
DNA string set object containing DNA sequence of interest. |
ChIPScore |
ChIPScore object as outputed by the |
fitness |
character string describing which metric should be used to assess fitness and should be one of the following:"geometric","ks","MSE","pearson","spearman","kendall", "recall","precesion","fscore","MCC","Accuracy" or "AUC". |
Once the genetic algorithm has been optimised, the top individual may be run on its own to get predicted ChIP profiles. The use of this function requires a few extra steps in order to predict ChIP profiles.
First, the index of the top individual should be extracted (see getHighestFitnessSolutions).
Second, using this index, subset the top individual from the GA population. Note that this
should be done using "[]" single bracket notation, as a list of length 1 containing
another list with all parameter values is required for the next steps. This
might seem cumbersome, but the functions were designed for list structures.
Third, run setChromatinStates using the top individual list. This will add chromatin
affinity values to your chromatinState GRanges. Use this new chromatinState object
as your chromatinState object from here on.
Fourth, pass your indiv list object to singleRun.
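The difference between single and double bracket subsetting mentioned in the second step can be seen on a made-up population list. The individual names and parameter values below are purely illustrative and are not output from evolve.

```r
# Made-up population of two individuals; names and values are illustrative only
population <- list(
    ind1 = list(N = 1000, lambda = 1.5, PWMThreshold = 0.7),
    ind2 = list(N = 5000, lambda = 2.0, PWMThreshold = 0.8)
)
topIndex <- 2  # assumed index of the highest fitness individual

# Single brackets keep the outer list: a list of length 1 holding the parameter list
indiv <- population[topIndex]

# Double brackets drop the outer list and return the parameter list itself
params <- population[[topIndex]]

str(indiv)
```

Only the single bracket form yields the nested structure (a list of length 1 containing the parameter list) that the steps above ask for.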
Returns a list with three elements. The first element contains a genomicProfiles object with occupancy scores. The second element contains a genomicProfiles object with ChIP profile scores. The third element contains the goodness-of-fit metrics.
Patrick C.N. Martin <[email protected]>
splitData splits processed ChIP data into training and testing sets.
splitData(ChIPscore, dist = c(80, 20), as.proportion = TRUE)
ChIPscore |
ChIPscore object as returned by |
dist |
If |
as.proportion |
Logical describing if values provided to |
Returns a named list of ChIPScore objects:
* trainingSet = ChIPScore object containing the training set
* testingSet = ChIPScore object containing the testing set
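The shape of the returned list can be sketched in base R with a plain named list standing in for the loci. The helper splitByProportion is hypothetical, and the assumption that loci are taken in their existing order is ours; splitData itself may select loci differently.

```r
# Hypothetical helper, not the package's splitData: split a list by proportion
splitByProportion <- function(x, dist = c(80, 20)) {
    # Number of elements going to the training set
    nTrain <- round(length(x) * dist[1] / sum(dist))
    list(trainingSet = x[seq_len(nTrain)],
         testingSet  = x[seq_len(length(x) - nTrain) + nTrain])
}

# Ten made-up loci
loci <- setNames(as.list(1:10), paste0("locus", 1:10))

sets <- splitByProportion(loci, dist = c(80, 20))
lengths(sets)
```

Keeping the testing set separate from the loci used for parameter estimation gives a less optimistic picture of how well the estimated parameters generalise.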
Patrick C.N. Martin <[email protected]>
Accessor method for the stepSize slot in a parameterOptions object.
object |
It is possible to restrict the size of the ChIP-seq-like profile produced
by computeChIPProfile. Instead of returning a ChIP-seq-like
score for each base pair, it is possible to skip base pairs and only
return the predicted enrichment score for every "n" base pairs
(n is the value assigned to stepSize). This will reduce the size of the
output data (unless the step size is very large, this will not affect
the accuracy of the model). Default is set at 10 base pairs.
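The size reduction can be illustrated in base R on a made-up per-base-pair profile. The profile values are invented and the subsetting shown is only a sketch of the idea, not the code used by computeChIPProfile.

```r
# Made-up per-base-pair profile over a 100 bp locus
profile <- sin(seq(0, pi, length.out = 100))
stepSize <- 10

# Keep one score every stepSize base pairs
positions <- seq(1, length(profile), by = stepSize)
reduced <- profile[positions]

length(profile)
length(reduced)
```

Here the reduced profile holds one tenth of the original scores while still tracing the same overall shape.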
Returns the value assigned to the stepSize slot in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Accessing the value of stepSize
stepSize(OPP)
Setter method for the stepSize slot in a parameterOptions object.
stepSize(object) <- value
object |
value |
It is possible to restrict the size of the ChIP-seq-like profile produced by
computeChIPProfile. Instead of returning a ChIP-seq-like score
for each base pair, it is possible to skip base pairs and only return the
predicted enrichment score for every "n" base pairs
(n is the value assigned to stepSize). This will reduce the size of the
output data (unless the step size is very large, this will not affect the
accuracy of the model). Default is set at 10 base pairs.
Returns a parameterOptions object with an updated value
for the stepSize slot.
Patrick C. N. Martin <[email protected]>
# Building parameterOptions object
OPP <- parameterOptions()
# Setting a new value for stepSize
stepSize(OPP) <- 20
Accessor method for the strandRule slot in a parameterOptions object.
object |
When computing the PWM Scores and if whichstrand is set to "+-", strandRule
will determine how to handle both strands
(one of three options: "mean", "max", "sum"). If set to "mean",
the average PWM Score of both strands will be computed. If set to "max",
the highest PWM Score between the two strands will be selected, and finally "sum"
will sum both scores together. Default is set at "max".
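The three rules can be written out in a few lines of base R. The helper combineStrands and the per-strand scores are hypothetical and only mirror the description above; they are not taken from the package.

```r
# Hypothetical helper, not part of ChIPanalyser: combine per-strand PWM scores
combineStrands <- function(plus, minus, strandRule = c("max", "mean", "sum")) {
    strandRule <- match.arg(strandRule)
    switch(strandRule,
           max  = pmax(plus, minus),    # highest score of the two strands
           mean = (plus + minus) / 2,   # average of the two strands
           sum  = plus + minus)         # sum of the two strands
}

# Made-up scores at three sites on each strand
plus  <- c(2.0, -1.0, 4.0)
minus <- c(1.0,  3.0, 4.0)

maxScores  <- combineStrands(plus, minus, "max")
meanScores <- combineStrands(plus, minus, "mean")
sumScores  <- combineStrands(plus, minus, "sum")
```

The choice only matters where the two strands disagree; at the third site, where both strands score 4, "max" and "mean" return the same value.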
Returns the value assigned to the strandRule slot (one of three options:
"mean", "max", "sum") in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(strandRule = "max")
# Accessing value for strandRule
strandRule(GPP)
Setter method for the strandRule slot in a parameterOptions object.
strandRule(object) <- value
object |
value |
When computing the PWM Scores and if whichstrand is set
to "+-", strandRule will determine how to handle both strands
(one of three options: "mean", "max", "sum"). If set to "mean",
the average PWM Score of both strands will be computed. If set to "max",
the highest PWM Score between the two strands will be selected, and finally "sum"
will sum both scores together.
Default is set at "max".
Returns a parameterOptions object with an updated
value for the strandRule slot.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(strandRule = "max")
# Setting a new value for strandRule
strandRule(GPP) <- "mean"
Accessor method for the whichstrand slot in a parameterOptions object.
object |
PWM Scores may be computed on either the positive strand ("+"), the negative strand ("-") or on both strands ("+-").
Returns the strand(s) on which PWM Scores should be computed
(whichstrand slot) in a parameterOptions object.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(whichstrand = "+-")
# Accessing the value of whichstrand
whichstrand(GPP)
Setter method for the whichstrand slot in a parameterOptions object.
whichstrand(object) <- value
object |
value |
PWM Scores may be computed on either the positive strand ("+"), the negative strand ("-") or on both strands ("+-").
Returns a parameterOptions object with an updated
value for the whichstrand slot.
Patrick C. N. Martin <[email protected]>
# Loading data
data(ChIPanalyserData)
# Building data objects
GPP <- parameterOptions(whichstrand = "+-")
# Setting a new value for whichstrand
whichstrand(GPP) <- "+"