Title: | DiffLogo: A comparative visualisation of biooligomer motifs |
---|---|
Description: | DiffLogo is an easy-to-use tool to visualize motif differences. |
Authors: | c( person("Martin", "Nettling", role = c("aut", "cre"), email = "[email protected]"), person("Hendrik", "Treutler", role = c("aut", "cre"), email = "[email protected]"), person("Jan", "Grau", role = c("aut", "ctb"), email = "[email protected]"), person("Andrey", "Lando", role = c("aut", "ctb"), email = "[email protected]"), person("Jens", "Keilwagen", role = c("aut", "ctb"), email = "[email protected]"), person("Stefan", "Posch", role = "aut", email = "[email protected]"), person("Ivo", "Grosse", role = "aut", email = "[email protected]")) |
Maintainer: | Hendrik Treutler<[email protected]> |
License: | GPL (>= 2) |
Version: | 2.31.0 |
Built: | 2024-11-29 05:25:37 UTC |
Source: | https://github.com/bioc/DiffLogo |
Align two sets of pwms
alignPwmSets(left_pwms_set, left_alignment, right_pwms_set, right_alignment, try_reverse_complement)
alignPwmSets(left_pwms_set, left_alignment, right_pwms_set, right_alignment, try_reverse_complement)
left_pwms_set |
list of pwms(matrixes) |
left_alignment |
alignment of left_pwms_set. |
right_pwms_set |
list of pwms; |
right_alignment |
alignment of right_pwms_set. |
try_reverse_complement |
if true(default), also try reverse complement. |
list - alignment of concatination of left_pwms_set and right_pwms_set
Lando Andrey
builts an object of class Alphabet from the given set of symbols and colors
Alphabet(chars, cols, supportReverseComplement)
Alphabet(chars, cols, supportReverseComplement)
chars |
set of symbols |
cols |
set of colors; one for each symbol |
supportReverseComplement |
boolean whether the alphabet supports reverse complementation (like DNA/RNA) or not (like ASN) |
the Alphabet object
Martin Nettling
DNA = Alphabet(c("A","C","G","T"),c("green4","blue","orange","red"),TRUE)
DNA = Alphabet(c("A","C","G","T"),c("green4","blue","orange","red"),TRUE)
the amino acid alphabet (20 symbols), i.e. A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
ASN
ASN
An object of class Alphabet
of length 4.
Martin Nettling
ASN
ASN
Generates a PWM consisting of only the uniform distribution or the given base_distribution (if defined).
baseDistributionPwm(pwm_length, alphabet_length, base_distribution = NULL)
baseDistributionPwm(pwm_length, alphabet_length, base_distribution = NULL)
pwm_length |
the number of positions |
alphabet_length |
the alphabet size |
base_distribution |
optional base distribution for each PWM position |
a PWM
Calculates the p-value for the null-hypothesis that two given probability vectors p1, p2 calculated from n1/n2 observations arise from the same distribution
calculatePvalue(p1, p2, n1, n2, stackHeight = shannonDivergence, numberOfPermutations = 100, plotGammaDistributionFit = FALSE)
calculatePvalue(p1, p2, n1, n2, stackHeight = shannonDivergence, numberOfPermutations = 100, plotGammaDistributionFit = FALSE)
p1 |
first probability vector with one probability for each symbol of the alphabet |
p2 |
second probability vector with one probability for each symbol of the alphabet |
n1 |
number of observations for the calculation of p1 |
n2 |
number of observations for the calculation of p2 |
stackHeight |
function for the calculation of a divergence measure for two probability vectors |
numberOfPermutations |
the number of permutations to perform for the calculation of stackHeights |
plotGammaDistributionFit |
if TRUE the fit of a gamma distribution to the sampled stackHeights is plotted |
a numeric p-value
Hendrik Treutler
p1 <- c(0.2, 0.3, 0.1, 0.4) p2 <- c(0.2, 0.1, 0.3, 0.4) n1 <- 100 n2 <- 200 numberOfPermutations = 100 plotGammaDistributionFit = TRUE pValue <- calculatePvalue(p1 = p1, p2 = p2, n1 = n1, n2 = n2, stackHeight = shannonDivergence, numberOfPermutations = numberOfPermutations, plotGammaDistributionFit = plotGammaDistributionFit)
p1 <- c(0.2, 0.3, 0.1, 0.4) p2 <- c(0.2, 0.1, 0.3, 0.4) n1 <- 100 n2 <- 200 numberOfPermutations = 100 plotGammaDistributionFit = TRUE pValue <- calculatePvalue(p1 = p1, p2 = p2, n1 = n1, n2 = n2, stackHeight = shannonDivergence, numberOfPermutations = numberOfPermutations, plotGammaDistributionFit = plotGammaDistributionFit)
Creates a DiffLogo object
createDiffLogoObject(pwm1, pwm2, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, alphabet = DNA, align_pwms = FALSE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE, unaligned_from_left = 0, unaligned_from_right = 0)
createDiffLogoObject(pwm1, pwm2, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, alphabet = DNA, align_pwms = FALSE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE, unaligned_from_left = 0, unaligned_from_right = 0)
pwm1 |
representation of the first position weight matrix (PWM) of type pwm, data.frame, or matrix |
pwm2 |
representation of the second position weight matrix (PWM) of type pwm, data.frame, or matrix |
stackHeight |
function for the height of a stack at position i |
baseDistribution |
function for the heights of the individual bases |
alphabet |
of type Alphabet |
align_pwms |
if True, will aling and extend pwms. |
unaligned_penalty |
is a function for localPwmAlignment. |
try_reverse_complement |
if True, alignment will try reverse complement pwms |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
length_normalization |
If true, will minimize the average divergence between PWMs. Otherwise will minimize the sum of divergences between positions. In both cases unalignes positions are compared to base_distribution and are counted when computing the alignment length. |
unaligned_from_left |
the number of unaligned positions on the left |
unaligned_from_right |
the number of unaligned positions on the right |
DiffLogo object
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogo(diffLogoObj)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogo(diffLogoObj)
information content differences normalized by the sum of absolute information content differences for the given pair of probability vectors
differenceOfICs(p1, p2)
differenceOfICs(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
a vector with one result for each symbol
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, baseDistribution = differenceOfICs)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, baseDistribution = differenceOfICs)
Draws the difference of two sequence logos.
diffLogo(diffLogoObj, ymin = 0, ymax = 0, sparse = FALSE, diffLogoConfiguration = list())
diffLogo(diffLogoObj, ymin = 0, ymax = 0, sparse = FALSE, diffLogoConfiguration = list())
diffLogoObj |
a DiffLogoObject created by the function createDiffLogoObject |
ymin |
minimum value on the y-axis |
ymax |
maximum value on the y-axis |
sparse |
if TRUE margins are reduced and tickmarks are removed from the logo |
diffLogoConfiguration |
list of configuration parameters (see function diffLogoTableConfiguration(...)) |
none (draws difference logo)
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogo(diffLogoObj)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogo(diffLogoObj)
Draws the difference of two sequence logos.
diffLogoFromPwm(pwm1, pwm2, ymin = 0, ymax = 0, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, sparse = FALSE, alphabet = DNA, align_pwms = FALSE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
diffLogoFromPwm(pwm1, pwm2, ymin = 0, ymax = 0, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, sparse = FALSE, alphabet = DNA, align_pwms = FALSE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
pwm1 |
representation of the first position weight matrix (PWM) of type pwm, data.frame, or matrix |
pwm2 |
representation of the second position weight matrix (PWM) of type pwm, data.frame, or matrix |
ymin |
minimum value on the y-axis |
ymax |
maximum value on the y-axis |
stackHeight |
function for the height of a stack at position i |
baseDistribution |
function for the heights of the individual bases |
sparse |
if TRUE margins are reduced and tickmarks are removed from the logo |
alphabet |
of type Alphabet |
align_pwms |
if true, DiffLogo will align pwms before plotting |
unaligned_penalty |
is a function for localPwmAlignment. |
try_reverse_complement |
if True, alignment will try reverse complement pwms |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
length_normalization |
If true, will minimize the average divergence between PWMs. Otherwise will minimize the sum of divergences between positions. In both cases unalignes positions are compared to base_distribution and are counted when computing the alignment length. |
none (draws difference logo)
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2)
Draws a table of DiffLogos.
diffLogoTable(PWMs, sampleSizes = NULL, alphabet = DNA, configuration = list(), ...)
diffLogoTable(PWMs, sampleSizes = NULL, alphabet = DNA, configuration = list(), ...)
PWMs |
a list/vector of position weight matrices (PWMs) each of type pwm, data.frame, or matrix |
sampleSizes |
the number of sequences behind each PWM |
alphabet |
the alphabet of the given PWMs |
configuration |
list of (probably part of) of configuration options. See diffLogoTableConfiguration. |
... |
set of parameters passed to the function 'axis' for plotting |
none (draws table of difference logos)
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } diffLogoTable(motifs)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } diffLogoTable(motifs)
Default configuration list for diffLogoTable
diffLogoTableConfiguration(alphabet, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, uniformYaxis = TRUE, sparse = TRUE, showSequenceLogosTop = TRUE, enableClustering = TRUE, treeHeight = 0.5, margin = 0.02, ratio = 1, align_pwms = FALSE, multiple_align_pwms = TRUE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, length_normalization = FALSE, numberOfPermutations = 100)
diffLogoTableConfiguration(alphabet, stackHeight = shannonDivergence, baseDistribution = normalizedDifferenceOfProbabilities, uniformYaxis = TRUE, sparse = TRUE, showSequenceLogosTop = TRUE, enableClustering = TRUE, treeHeight = 0.5, margin = 0.02, ratio = 1, align_pwms = FALSE, multiple_align_pwms = TRUE, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, length_normalization = FALSE, numberOfPermutations = 100)
alphabet |
used alphabet of type Alphabet |
stackHeight |
function for the height of a stack at position i |
baseDistribution |
function for the heights of the individual bases |
uniformYaxis |
if TRUE each DiffLogo is plotted with the same scaling of the y-axis |
sparse |
if TRUE margins are reduced and tickmarks are removed from the logo |
showSequenceLogosTop |
if TRUE the classical sequence logos are drawn above each column of the table |
enableClustering |
if TRUE the motifs are reordered, so that similar motifs have a small vertical and horizontal distance in the table |
treeHeight |
the height of the plotted cluster tree above the columns of the table; set equal to zero to omit the cluster tree |
margin |
the space reseverved for labels |
ratio |
the ratio of the plot; this is needed to determine the margin sizes correctly |
align_pwms |
if True, will align and extend pwms in each cell of diffLogoTable independently. |
multiple_align_pwms |
if True, will align and extend pwms in the diffLogoTable jointly. |
unaligned_penalty |
is a function for localPwmAlignment. |
try_reverse_complement |
if True, alignment will try reverse complement pwms |
length_normalization |
if True, divergence between pwms is divided by length of pwms. |
numberOfPermutations |
number of permutations for the permutation test for the calculation of p-values |
list of parameters
Lando Andrey
diffLogoTableConfiguration(DNA)
diffLogoTableConfiguration(DNA)
the DNA alphabet, i.e. A, C, G, T
DNA
DNA
An object of class Alphabet
of length 4.
Martin Nettling
DNA
DNA
Draws a table of DiffLogos.
drawDiffLogoTable(diffLogoTableObj, ...)
drawDiffLogoTable(diffLogoTableObj, ...)
diffLogoTableObj |
the diffLogoTable-Object created by function prepareDiffLogoTable(...) |
... |
optional parameters for functon axis |
none (draws difference logo)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } diffLogoTableObj = prepareDiffLogoTable(motifs) drawDiffLogoTable(diffLogoTableObj)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } diffLogoTableObj = prepareDiffLogoTable(motifs) drawDiffLogoTable(diffLogoTableObj)
Enriches a difflogo object with p-values which quantifies the probability that two PWM-positions are from the same distribution
enrichDiffLogoObjectWithPvalues(diffLogoObj, n1, n2, stackHeight = shannonDivergence, numberOfPermutations = 100)
enrichDiffLogoObjectWithPvalues(diffLogoObj, n1, n2, stackHeight = shannonDivergence, numberOfPermutations = 100)
diffLogoObj |
matrix of difflogo objects |
n1 |
the number of sequences behind the first pwm behind the given difflogo object |
n2 |
the number of sequences behind the second pwm behind the given difflogo object |
stackHeight |
function for the calculation of a divergence measure for two probability vectors |
numberOfPermutations |
the number of permutations to perform for the calculation of stackHeights |
enriched difflogo object
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] n1 <- 100 n2 <- 100 diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogoObj = enrichDiffLogoObjectWithPvalues(diffLogoObj, n1, n2)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] n1 <- 100 n2 <- 100 diffLogoObj = createDiffLogoObject(pwm1 = pwm1, pwm2 = pwm2) diffLogoObj = enrichDiffLogoObjectWithPvalues(diffLogoObj, n1, n2)
Enriches a matrix of difflogo objects with p-values which quantifies the probability that two PWM-positions are from the same distribution
enrichDiffLogoTableWithPvalues(diffLogoObjMatrix, sampleSizes, stackHeight = shannonDivergence, numberOfPermutations = 100)
enrichDiffLogoTableWithPvalues(diffLogoObjMatrix, sampleSizes, stackHeight = shannonDivergence, numberOfPermutations = 100)
diffLogoObjMatrix |
matrix of difflogo objects |
sampleSizes |
number of sequences behind the pwms behind the given difflogo objects |
stackHeight |
function for the calculation of a divergence measure for two probability vectors |
numberOfPermutations |
the number of permutations to perform for the calculation of stackHeights |
matrix of difflogo objects enriched with p-values
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } sampleSizes <- c(100, 150, 200, 250) names(sampleSizes) <- motif_names diffLogoTableObj = prepareDiffLogoTable(motifs); diffLogoTableObj$diffLogoObjMatrix = enrichDiffLogoTableWithPvalues(diffLogoTableObj$diffLogoObjMatrix, sampleSizes)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } sampleSizes <- c(100, 150, 200, 250) names(sampleSizes) <- motif_names diffLogoTableObj = prepareDiffLogoTable(motifs); diffLogoTableObj$diffLogoObjMatrix = enrichDiffLogoTableWithPvalues(diffLogoTableObj$diffLogoObjMatrix, sampleSizes)
Extends pwms by adding base_distribution to both sides, so that they keep aligned, but have equal length.
extendPwmsFromAlignmentVector(pwms, alignment_vector, base_distribution = NULL)
extendPwmsFromAlignmentVector(pwms, alignment_vector, base_distribution = NULL)
pwms |
is a list of matrixes |
alignment_vector |
is a list of shifts ($shift) and orientations ($direction) |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
extended pwms
Lando Andrey
file1 = system.file("extdata/homer/Max.motif", package = "DiffLogo") file2 = system.file("extdata/homer/c-Myc.motif", package = "DiffLogo") pwm1 = getPwmFromFile(file1) pwm2 = getPwmFromFile(file2) pwms <- list(pwm1, pwm2) multiple_pwms_alignment = multipleLocalPwmsAlignment(pwms) aligned_pwms = extendPwmsFromAlignmentVector(pwms, multiple_pwms_alignment$alignment$vector)
file1 = system.file("extdata/homer/Max.motif", package = "DiffLogo") file2 = system.file("extdata/homer/c-Myc.motif", package = "DiffLogo") pwm1 = getPwmFromFile(file1) pwm2 = getPwmFromFile(file2) pwms <- list(pwm1, pwm2) multiple_pwms_alignment = multipleLocalPwmsAlignment(pwms) aligned_pwms = extendPwmsFromAlignmentVector(pwms, multiple_pwms_alignment$alignment$vector)
the alphabet of all 26 characters
FULL_ALPHABET
FULL_ALPHABET
An object of class Alphabet
of length 4.
Hendrik Treutler
FULL_ALPHABET
FULL_ALPHABET
returns the alphabet which fits to the given characters
getAlphabetFromCharacters(characters)
getAlphabetFromCharacters(characters)
characters |
a character vector of characters |
an alphabet of type Alphabet
alphabet = getAlphabetFromSequences(c("A","A","C","C","G","G","T","T"))
alphabet = getAlphabetFromSequences(c("A","A","C","C","G","G","T","T"))
returns the alphabet which fits to the given sequences
getAlphabetFromSequences(sequences)
getAlphabetFromSequences(sequences)
sequences |
a character vector of sequences |
an alphabet of type Alphabet
alphabet = getAlphabetFromSequences("AACCGGTT")
alphabet = getAlphabetFromSequences("AACCGGTT")
Creates a matrix-representation of a PWM from a set of sequences
getPwmFromAlignment(alignment, alphabet = NULL, pseudoCount = 0)
getPwmFromAlignment(alignment, alphabet = NULL, pseudoCount = 0)
alignment |
a vector or list of sequences each with equal length |
alphabet |
of type Alphabet |
pseudoCount |
the number of pseudo-observations for each character in the alphabet |
PWM as matrix
Hendrik Treutler
motif_folder= "extdata/alignments" motif_name = "calamodulin_1" fileName = paste(motif_folder,"/",motif_name,".txt",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromAlignment(readLines(file), ASN, 1) seqLogo(pwm = motif, alphabet=ASN)
motif_folder= "extdata/alignments" motif_name = "calamodulin_1" fileName = paste(motif_folder,"/",motif_name,".txt",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromAlignment(readLines(file), ASN, 1) seqLogo(pwm = motif, alphabet=ASN)
generates a pwm from an alignment file
getPwmFromAlignmentFile(filename, alphabet = NULL)
getPwmFromAlignmentFile(filename, alphabet = NULL)
filename |
the alignment file |
alphabet |
the desired alphabet of type Alphabet |
a pwm
fileName = "extdata/alignments/calamodulin_1.txt" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromAlignmentFile(file)
fileName = "extdata/alignments/calamodulin_1.txt" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromAlignmentFile(file)
generates a pwm from a FASTA file
getPwmFromFastaFile(filename, alphabet = NULL)
getPwmFromFastaFile(filename, alphabet = NULL)
filename |
the FASTA file |
alphabet |
the desired alphabet of type Alphabet |
a pwm
fileName = "extdata/alignments/F-box_bacteria.seq.fa" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromFastaFile(file)
fileName = "extdata/alignments/F-box_bacteria.seq.fa" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromFastaFile(file)
Generates a pwm from a file of different formats. Supported formats are FASTA files (.fa, .fasta), alignment files (.txt, .text, .al, .alignment), PWM files (.pwm), JASPAR / Position Frequency Matrix files (.pfm), and homer files (.motif).
getPwmFromFile(filename)
getPwmFromFile(filename)
filename |
the file |
a pwm
fileName = "extdata/pwm/H1-hESC.pwm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromFile(file)
fileName = "extdata/pwm/H1-hESC.pwm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromFile(file)
generates a pwm from a homer file
getPwmFromHomerFile(filename)
getPwmFromHomerFile(filename)
filename |
the homer file |
a pwm
fileName = "extdata/homer/CTCF_Zf_CD4.motif" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromHomerFile(file)
fileName = "extdata/homer/CTCF_Zf_CD4.motif" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromHomerFile(file)
generates a pwm from a jaspar file
getPwmFromPfmOrJasparFile(filename)
getPwmFromPfmOrJasparFile(filename)
filename |
the jaspar file |
a pwm
fileName = "extdata/pfm/ctcf_jaspar.pfm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromPfmOrJasparFile(file)
fileName = "extdata/pfm/ctcf_jaspar.pfm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromPfmOrJasparFile(file)
generates a pwm from a pwm file
getPwmFromPwmFile(filename)
getPwmFromPwmFile(filename)
filename |
the pwm file |
a pwm
fileName = "extdata/pwm/H1-hESC.pwm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromPwmFile(file)
fileName = "extdata/pwm/H1-hESC.pwm" file = system.file(fileName, package = "DiffLogo") pwm = getPwmFromPwmFile(file)
extracts the sequences from an alignment file
getSequencesFromAlignmentFile(filename)
getSequencesFromAlignmentFile(filename)
filename |
the alignment file |
a vector of sequences
fileName = "extdata/alignments/calamodulin_1.txt" file = system.file(fileName, package = "DiffLogo") sequences = getSequencesFromAlignmentFile(file)
fileName = "extdata/alignments/calamodulin_1.txt" file = system.file(fileName, package = "DiffLogo") sequences = getSequencesFromAlignmentFile(file)
extracts the sequences from a FASTA file
getSequencesFromFastaFile(filename)
getSequencesFromFastaFile(filename)
filename |
the FASTA file |
a vector of sequences
fileName = "extdata/alignments/F-box_bacteria.seq.fa" file = system.file(fileName, package = "DiffLogo") sequences = getSequencesFromFastaFile(file)
fileName = "extdata/alignments/F-box_bacteria.seq.fa" file = system.file(fileName, package = "DiffLogo") sequences = getSequencesFromFastaFile(file)
the information content for the given probability vector
informationContent(p)
informationContent(p)
p |
probability vector representing the symbol distribution |
an object consisting of height a ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, stackHeight = informationContent)
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, stackHeight = informationContent)
Finds best local alignment for two PWMs.
localPwmAlignment(pwm_left, pwm_right, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
localPwmAlignment(pwm_left, pwm_right, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
pwm_left |
first PWM, a matrix of type matrix |
pwm_right |
first PWM, a matrix of type matrix |
divergence |
is a measure of difference between two pwm columns. Smaller is more similar. If you want to use non-uniform background distribution, provide your own function. |
unaligned_penalty |
distance for unaligned columns at edges of matrixes. See divergencePenaltyForUnaligned as an example for providing your own function |
try_reverse_complement |
If false the alignment will not be performed on reverse complements. If true, the input pwms should have column order of ACTG/ACGU. |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
length_normalization |
If true, will minimize the average divergence between PWMs. Otherwise will minimize the sum of divergences between positions. In both cases unalignes positions are compared to base_distribution and are counted when computing the alignment length. |
list of length two containing the alignment and the divergence
Lando Andrey
the change of information content for the given probability vectors
lossOfAbsICDifferences(p1, p2)
lossOfAbsICDifferences(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
an object consisting of height and ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = lossOfAbsICDifferences)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = lossOfAbsICDifferences)
Creates a multiple alignment of pwms
multipleLocalPwmsAlignment(pwms, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
multipleLocalPwmsAlignment(pwms, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
pwms |
list of pwms |
divergence |
Divergence measure. |
unaligned_penalty |
is a function for localPwmAlignment. |
try_reverse_complement |
if True, alignment will try reverse complement pwms |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
length_normalization |
If true, will minimize the average divergence between PWMs. Otherwise will minimize the sum of divergences between positions. In both cases unalignes positions are compared to base_distribution and are counted when computing the alignment length. |
list
Lando Andrey
file1 = system.file("extdata/homer/Max.motif", package = "DiffLogo") file2 = system.file("extdata/homer/c-Myc.motif", package = "DiffLogo") pwm1 = getPwmFromFile(file1) pwm2 = getPwmFromFile(file2) multiple_pwms_alignment = multipleLocalPwmsAlignment(list(pwm1, pwm2))
file1 = system.file("extdata/homer/Max.motif", package = "DiffLogo") file2 = system.file("extdata/homer/c-Myc.motif", package = "DiffLogo") pwm1 = getPwmFromFile(file1) pwm2 = getPwmFromFile(file2) multiple_pwms_alignment = multipleLocalPwmsAlignment(list(pwm1, pwm2))
probability differences normalized by the sum of absolute probability differences for the given pair of probability vectors
normalizedDifferenceOfProbabilities(p1, p2)
normalizedDifferenceOfProbabilities(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
a vector with one result for each symbol
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, baseDistribution = normalizedDifferenceOfProbabilities)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, baseDistribution = normalizedDifferenceOfProbabilities)
normalizes the given pwm to column-sums of 1.0
normalizePWM(pwm)
normalizePWM(pwm)
pwm |
a pwm |
a normalized pwm
pwm = matrix(1:40, nrow = 4, dimnames = list(c("A","C","G","T"), 1:10)) pwm = normalizePWM(pwm)
pwm = matrix(1:40, nrow = 4, dimnames = list(c("A","C","G","T"), 1:10)) pwm = normalizePWM(pwm)
Prepares a DiffLogoTable and generates an object that contains the hirarchical clustering and a matrix of prepared difference logos.
prepareDiffLogoTable(PWMs, alphabet = DNA, configuration = list())
prepareDiffLogoTable(PWMs, alphabet = DNA, configuration = list())
PWMs |
a list/vector of position weight matrices (PWMs) each of type pwm, data.frame, or matrix |
alphabet |
the alphabet of the given PWMs |
configuration |
list of (probably part of) of configuration options. See diffLogoTableConfiguration. |
matrix of difference logos
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } sampleSizes <- c(100, 150, 200, 250) diffLogoTableObj = prepareDiffLogoTable(motifs);
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } sampleSizes <- c(100, 150, 200, 250) diffLogoTableObj = prepareDiffLogoTable(motifs);
the given probabilities
probabilities(p)
probabilities(p)
p |
probability vector representing the symbol distribution |
the given vector
Martin Nettling
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, baseDistribution = probabilities)
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, baseDistribution = probabilities)
Counts PWM divergence as sum of divergencies of their columns.
pwmDivergence(pwm_left, pwm_right, divergence = shannonDivergence)
pwmDivergence(pwm_left, pwm_right, divergence = shannonDivergence)
pwm_left |
is a PWM representation in type of matrix |
pwm_right |
is a PWM representation in type of matrix. The result is symmetric on pwm_left and pwm_right |
divergence |
is a Divergence function on columns. |
float - sum of divergences
Creates a distance matrix for pwms
pwmsDistanceMatrix(pwms, diagonal_value = 0, bottom_default_value = NULL, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
pwmsDistanceMatrix(pwms, diagonal_value = 0, bottom_default_value = NULL, divergence = shannonDivergence, unaligned_penalty = divergencePenaltyForUnaligned, try_reverse_complement = TRUE, base_distribution = NULL, length_normalization = FALSE)
pwms |
list of pwms |
diagonal_value |
value to put on diagonal. |
bottom_default_value |
value to put on bottom triangle. Set to NULL to get symmetric distance matrix. |
divergence |
divergence measure. |
unaligned_penalty |
is a function for localPwmAlignment. |
try_reverse_complement |
if True, alignment will try reverse complement pwms |
base_distribution |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
length_normalization |
is a vector of length nrow(pwm) that is added to unaligned columns of pwms for comparing. If NULL, uniform distribution is used |
list
Lando Andrey
Returns alignment vector as if all pwm were reverted.
reverseAlignmentVector(alignment_vector, pwms)
reverseAlignmentVector(alignment_vector, pwms)
alignment_vector |
list of list which $shift and $orientation |
pwms |
list of matrixes. |
list - reversed alignment vector
Lando Andrey
the RNA alphabet, i.e. A, C, G, U
RNA
RNA
An object of class Alphabet
of length 4.
Martin Nettling
RNA
RNA
Draws the classic sequence logo.
seqLogo(pwm, sparse = FALSE, drawLines = 0.5, stackHeight = informationContent, baseDistribution = probabilities, alphabet = DNA, main = NULL)
seqLogo(pwm, sparse = FALSE, drawLines = 0.5, stackHeight = informationContent, baseDistribution = probabilities, alphabet = DNA, main = NULL)
pwm |
representation of a position weight matrix (PWM) of type pwm, data.frame, or matrix |
sparse |
if TRUE margins are reduced and tickmarks are removed from the logo |
drawLines |
distance between background lines |
stackHeight |
function for the height of a stack at position i |
baseDistribution |
function for the heights of the individual bases |
alphabet |
of type Alphabet |
main |
the main title for the plot |
none (draws sequence logo)
Martin Nettling
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif)
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif)
the shannon divergence for the given pair of probability vectors
shannonDivergence(p1, p2)
shannonDivergence(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
an object consisting of height and ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = shannonDivergence)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = shannonDivergence)
the sum of absolute information content differences for the given pair of probability vectors
sumOfAbsICDifferences(p1, p2)
sumOfAbsICDifferences(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
an object consisting of height and ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = sumOfAbsICDifferences)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = sumOfAbsICDifferences)
the sum of absolute probability differences for the given pair of probability vectors
sumOfAbsProbabilityDifferences(p1, p2)
sumOfAbsProbabilityDifferences(p1, p2)
p1 |
probability vector representing the first symbol distribution |
p2 |
probability vector representing the second symbol distribution |
an object consisting of height and ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = sumOfAbsProbabilityDifferences)
motif_folder= "extdata/pwm" motif_names = c("HepG2","MCF7","HUVEC","ProgFib") motifs = list() for (name in motif_names) { fileName = paste(motif_folder,"/",name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motifs[[name]] = getPwmFromPwmFile(file) } pwm1 = motifs[[motif_names[[1]]]] pwm2 = motifs[[motif_names[[2]]]] diffLogoFromPwm(pwm1 = pwm1, pwm2 = pwm2, stackHeight = sumOfAbsProbabilityDifferences)
the sum of probabilities for the given probability vector, i.e. 1.0
sumProbabilities(p)
sumProbabilities(p)
p |
probability vector representing the symbol distribution |
an object consisting of height and ylab
Martin Nettling
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, stackHeight = sumProbabilities)
motif_folder= "extdata/pwm" motif_name = "HepG2" fileName = paste(motif_folder,"/",motif_name,".pwm",sep="") file = system.file(fileName, package = "DiffLogo") motif = getPwmFromPwmFile(file) seqLogo(pwm = motif, stackHeight = sumProbabilities)
Switches between 'forward' and 'reverse'
switchDirection(direction)
switchDirection(direction)
direction |
either 'forward' or 'reverse' |
either 'reverse' or 'forward'
Computes average pwm divergence from alignment vector for two datasets. This equals to average divergence between all pairs where one pwm comes from left set, and other comes from right
twoSetsAveragePwmDivergenceFromAlignmentVector(left_pwms_list, left_pwms_alignment, right_pwms_list, right_pwms_alignment, divergence = shannonDivergence)
twoSetsAveragePwmDivergenceFromAlignmentVector(left_pwms_list, left_pwms_alignment, right_pwms_list, right_pwms_alignment, divergence = shannonDivergence)
left_pwms_list |
is a list of matrixes |
left_pwms_alignment |
is a list of shifts ($shift) and orientations ($direction) |
right_pwms_list |
is a list of matrixes |
right_pwms_alignment |
is a list of shifts ($shift) and orientations ($direction) |
divergence |
divergence measure. |
float
Lando Andrey