Title: | Conducting statistical inference on comparing the mutational exposures of mutational signatures by using hierarchical latent Dirichlet allocation |
---|---|
Description: | A package built under the Bayesian framework of applying hierarchical latent Dirichlet allocation. It statistically tests whether the mutational exposures of mutational signatures (Shiraishi-model signatures) are different between two groups. The package also provides inference and visualization. |
Authors: | Zhi Yang [aut, cre], Yuichi Shiraishi [ctb] |
Maintainer: | Zhi Yang <[email protected]> |
License: | GPL-3 |
Version: | 1.21.0 |
Built: | 2024-10-30 07:32:33 UTC |
Source: | https://github.com/bioc/HiLDA |
Check whether the parameter F is within the appropriate range
boundaryTurbo_F(turboF, fdim, signatureNum)
boundaryTurbo_F(turboF, fdim, signatureNum)
turboF |
F (converted for turboEM) |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
a logical value
Check whether the parameter Q is within the appropriate range
boundaryTurbo_Q(turboQ, signatureNum, sampleNum)
boundaryTurbo_Q(turboQ, signatureNum, sampleNum)
turboQ |
Q (converted for turboEM) |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
a logical value
A function for calculating the log-likelihood from the data and parameters
calcPMSLikelihood(p, y)
calcPMSLikelihood(p, y)
p |
this variable includes the parameters for mutation signatures and membership parameters |
y |
this variable includes the information on the mutation features, the number of mutation signatures specified and so on |
a value
Restore the converted parameter F for turboEM
convertFromTurbo_F(turboF, fdim, signatureNum, isBackground)
convertFromTurbo_F(turboF, fdim, signatureNum, isBackground)
turboF |
F (converted for turboEM) |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
isBackground |
the logical value showing whether a background mutaiton features is included or not |
a vector
Restore the converted parameter Q for turboEM
convertFromTurbo_Q(turboQ, signatureNum, sampleNum)
convertFromTurbo_Q(turboQ, signatureNum, sampleNum)
turboQ |
Q (converted for turboEM) |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
a vector
Convert the parameter F so that turboEM can treat
convertToTurbo_F(vF, fdim, signatureNum, isBackground)
convertToTurbo_F(vF, fdim, signatureNum, isBackground)
vF |
F (converted to a vector) |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
isBackground |
the logical value showing whether a background mutaiton features is included or not |
a vector
Convert the parameter Q so that turboEM can treat
convertToTurbo_Q(vQ, signatureNum, sampleNum)
convertToTurbo_Q(vQ, signatureNum, sampleNum)
vQ |
Q (converted to a vector) |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
a vector
An S4 class representing the estimated parameters
sampleList
a list of sample names observed in the input mutation data
signatureNum
the number of mutation signatures specified at the time of estimation
isBackGround
the flag showing whether the background signature data is used or not.
backGroundProb
the background signatures
signatureFeatureDistribution
estimated parameters for mutation signatures
sampleSignatureDistribution
estimated parameters for memberships of mutation signatures for each sample
loglikelihood
the log-likelihood value for the estimated parameters
Calculate the value of the log-likelihood for given parameters
getLogLikelihoodC( vPatternList, vSparseCount, vF, vQ, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground, vF0 )
getLogLikelihoodC( vPatternList, vSparseCount, vF, vQ, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground, vF0 )
vPatternList |
The list of possible mutation features (converted to a vector) |
vSparseCount |
The table showing (mutation feature, sample, the number of mutation) (converted to a vector) |
vF |
F (converted to a vector) |
vQ |
Q (converted to a vector) |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
patternNum |
the number of possible combinations of all the mutation features |
samplePatternNum |
the number of possible combination of samples and mutation patternns |
isBackground |
the logical value showing whether a background mutaiton features is included or not |
vF0 |
a background mutaiton features |
a value
Get mutation feature vector from context sequence data and reference and alternate allele information
getMutationFeatureVector( context, ref_base, alt_base, strandInfo = NULL, numBases, type )
getMutationFeatureVector( context, ref_base, alt_base, strandInfo = NULL, numBases, type )
context |
the context sequence data around the mutated position. This shoud be Biostrings::DNAStringSet class |
ref_base |
the reference bases at the mutated position. |
alt_base |
the alternate bases at the mutated position. |
strandInfo |
transcribed strand information at the mutated position. (this is optional) |
numBases |
the number of flanking bases around the mutated position. |
type |
the type of mutation feature vecotr (should be "independent" or "full"). |
a mutation featuer vector
Read the raw mutation data with the mutation feature vector format, estimate and plot both mutation signatures and their fractions
hildaBarplot( inputG, hildaResult, sigOrder = NULL, refGroup, sortSampleNum = TRUE, refName = "Control", altName = "Case", charSize = 3 )
hildaBarplot( inputG, hildaResult, sigOrder = NULL, refGroup, sortSampleNum = TRUE, refName = "Control", altName = "Case", charSize = 3 )
inputG |
a MutationFeatureData S4 class output by the pmsignature. |
hildaResult |
a rjags class output by HiLDA. |
sigOrder |
the order of signatures if needed (default: NULL). |
refGroup |
the samples in the reference group (default: NULL). |
sortSampleNum |
whether to sort plots by number of mutations (default: TRUE). |
refName |
the name of reference group (default: Control) |
altName |
the name of the other group (default: Case) |
charSize |
the size of the character on the signature plot (default: 3) |
a list of a signature plot and a barplot of mutational exposures
load(system.file("extdata/sample.rdata", package="HiLDA")) inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaBarplot(G, hildaLocal, refGroup=1:4)
load(system.file("extdata/sample.rdata", package="HiLDA")) inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaBarplot(G, hildaLocal, refGroup=1:4)
Read the raw mutation data with the mutation feature vector format, estimate and plot both mutation signatures and their fractions
hildaDiffPlot(inputG, hildaResult, sigOrder = NULL, charSize = 3)
hildaDiffPlot(inputG, hildaResult, sigOrder = NULL, charSize = 3)
inputG |
a MutationFeatureData S4 class output by the pmsignature. |
hildaResult |
a rjags class output by HiLDA. |
sigOrder |
the order of signatures if needed (default: NULL). |
charSize |
the size of the character on the signature plot (default: 3) |
a list of the signature plot and the mean difference plot.
load(system.file("extdata/sample.rdata", package="HiLDA")) inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaDiffPlot(G, hildaLocal)
load(system.file("extdata/sample.rdata", package="HiLDA")) inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaDiffPlot(G, hildaLocal)
Compute the Bayes factor
hildaGlobalResult(jagsOutput, pM1 = 0.5)
hildaGlobalResult(jagsOutput, pM1 = 0.5)
jagsOutput |
the output jags file generated by the jags function from the R2jags package. |
pM1 |
the probability of sampling the null (default: 0.5) |
a number for the Bayes factor
load(system.file("extdata/sample.rdata", package="HiLDA")) hildaGlobal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=TRUE) hildaGlobalResult(hildaGlobal)
load(system.file("extdata/sample.rdata", package="HiLDA")) hildaGlobal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=TRUE) hildaGlobalResult(hildaGlobal)
Extract the posterior distributions of the mean differences in muational exposures
hildaLocalResult(jagsOutput)
hildaLocalResult(jagsOutput)
jagsOutput |
the output jags file generated by the jags function from the R2jags package. |
a data frame that contains the posterior distributions of difference.
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaLocalResult(hildaLocal)
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaLocalResult(hildaLocal)
Plot mutation signatures from HiLDA output
hildaPlotSignature(hildaResult, sigOrder = NULL, colorList = NULL, ...)
hildaPlotSignature(hildaResult, sigOrder = NULL, colorList = NULL, ...)
hildaResult |
a rjags class output by HiLDA |
sigOrder |
the order of signatures if needed (default: NULL) |
colorList |
a vector of color for mutational exposures barplots |
... |
additional arguments passed on to visPMS |
a plot object containing all mutational signatures
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaPlotSignature(hildaLocal)
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaPlotSignature(hildaLocal)
The mutation position format is tab-delimited text file, where the 1st-5th columns shows sample names, chromosome names, coordinates, reference bases (A, C, G, or T) and the alternate bases (A, C, G, or T), respectively. An example is as follows;
—
sample1 chr1 100 A C
sample1 chr1 200 A T
sample1 chr2 100 G T
sample2 chr1 300 T C
sample3 chr3 400 T C
—
Also, this function usually can accept compressed files (e.g., by gzip, bzip2 and so on) when using recent version of R.
hildaReadMPFile( infile, numBases = 3, trDir = FALSE, bs_genome = NULL, txdb_transcript = NULL )
hildaReadMPFile( infile, numBases = 3, trDir = FALSE, bs_genome = NULL, txdb_transcript = NULL )
infile |
the path for the input file for the mutation data of Mutation Position Format. |
numBases |
the number of upstream and downstream flanking bases (including the mutated base) to take into account. |
trDir |
the index representing whether transcription direction is considered or not. The gene annotation information is given by UCSC knownGene (TxDb.Hsapiens.UCSC.hg19.knownGene object) When trDir is TRUE, the mutations located in intergenic region are excluded from the analysis. |
bs_genome |
this argument specifies the reference genome (e.g., B Sgenome.Mmusculus.UCSC.mm10 can be used for the mouse genome). See https://bioconductor.org/packages/release/bioc/html/BSgenome.html for the available genome list |
txdb_transcript |
this argument specified the transcript database (e.g., TxDb.Mmusculus.UCSC.mm10.knownGene can be used for the mouse genome). See https://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html for details. |
The output is an instance of MutationFeatureData S4 class (which stores summarized information on mutation data). This will be typically used as the initial values for the global test and the local test.
inputFile <- system.file("extdata/esophageal.mp.txt.gz", package="HiLDA") G <- hildaReadMPFile(inputFile, numBases=5, trDir=TRUE)
inputFile <- system.file("extdata/esophageal.mp.txt.gz", package="HiLDA") G <- hildaReadMPFile(inputFile, numBases=5, trDir=TRUE)
Output the maximum potential scale reduction statistic of all parameters estimated
hildaRhat(jagsOutput)
hildaRhat(jagsOutput)
jagsOutput |
the output jags file generated by the jags function from the R2jags package. |
a number for the Rhat statistic.
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaRhat(hildaLocal)
inputFile <- system.file("extdata/hildaLocal.rdata", package="HiLDA") hildaLocal <- readRDS(inputFile) hildaRhat(hildaLocal)
Apply HiLDA to statistically testing the global difference in burdens of mutation signatures between two groups
hildaTest( inputG, numSig, refGroup, useInits = NULL, sigOrder = NULL, nIter = 2000, nBurnin = 0, pM1 = 0.5, localTest = TRUE, ... )
hildaTest( inputG, numSig, refGroup, useInits = NULL, sigOrder = NULL, nIter = 2000, nBurnin = 0, pM1 = 0.5, localTest = TRUE, ... )
inputG |
a MutationFeatureData S4 class output by the pmsignature. |
numSig |
an integer number of the number of mutational signatures. |
refGroup |
the indice indicating the samples in the reference group. |
useInits |
a EstimatedParameters S4 class output by the pmsignature (default: NULL) |
sigOrder |
the order of the mutational signatures. |
nIter |
number of total iterations per chain (default: 2000). |
nBurnin |
length of burn (default: 0). |
pM1 |
the probability of sampling the null (default: 0.5) |
localTest |
a logical value (default: TRUE) |
... |
Other arguments passed on to methods. |
the output jags file
load(system.file("extdata/sample.rdata", package="HiLDA")) ## with initial values hildaLocal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=TRUE) hildaGlobal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=FALSE)
load(system.file("extdata/sample.rdata", package="HiLDA")) ## with initial values hildaLocal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=TRUE) hildaGlobal <- hildaTest(inputG=G, numSig=3, refGroup=1:4, nIter=1000, localTest=FALSE)
@slot type type of data format (independent, full, custom) @slot flankingBasesNum the number of flanking bases to consider (only applicable for independent and full types) @slot transcriptionDirection the flag representing whether transcription direction is considered or not @slot possibleFeatures a vector representing the numbers of possible values for each mutation feature
An S4 class representing the mutation data
featureVectorList
a list of feature vectors actually observed in the input mutation data
sampleList
a list of sample names observed in the input mutation data
countData
a matrix representing the number of mutations and samples. The (1st, 2nd, 3rd) columns are for (mutation pattern index, sample index, frequencies).
mutationPosition
a data frame containing position and mutations
A function for estimating parameters using Squared EM algorithm
mySquareEM(p, y, tol = 1e-04, maxIter = 10000)
mySquareEM(p, y, tol = 1e-04, maxIter = 10000)
p |
this variable includes the parameters for mutation signatures and membership parameters |
y |
this variable includes the information on the mutation features, the number of mutation signatures specified and so on |
tol |
tolerance for the estimation (when the difference of log-likelihoods become below this value, stop the estimation) |
maxIter |
the maximum number of iteration of estimation |
a list
Plot both mutation signatures and their mutational exposures from pmsignature output
pmBarplot( inputG, inputParam, sigOrder = NULL, refGroup = NULL, sortSampleNum = TRUE, refName = "Control", altName = "Case", charSize = 3 )
pmBarplot( inputG, inputParam, sigOrder = NULL, refGroup = NULL, sortSampleNum = TRUE, refName = "Control", altName = "Case", charSize = 3 )
inputG |
a MutationFeatureData S4 class output by the pmsignature. |
inputParam |
a estimatedParameters S4 class output by the pmsignature. |
sigOrder |
the order of signatures if needed (default: NULL). |
refGroup |
the samples in the reference group (default: NULL). |
sortSampleNum |
whether to sort by number of mutations (default: TRUE). |
refName |
the name of reference group (default: Control). |
altName |
the name of the other group (default: Case). |
charSize |
the size of the character on the signature plot (default: 3). |
a list of a signature plot and a barplot of mutational exposures
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlots <- pmBarplot(G, Param, refGroup=1:4) cowplot::plot_grid(pmPlots$sigPlot, pmPlots$propPlot, rel_widths = c(1,3))
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlots <- pmBarplot(G, Param, refGroup=1:4) cowplot::plot_grid(pmPlots$sigPlot, pmPlots$propPlot, rel_widths = c(1,3))
Obtain the parameters for mutation signatures and memberships
pmgetSignature( mutationFeatureData, K, numInit = 10, tol = 1e-04, maxIter = 10000 )
pmgetSignature( mutationFeatureData, K, numInit = 10, tol = 1e-04, maxIter = 10000 )
mutationFeatureData |
the mutation data (MutationFeatureData class
(S4 class)) by the |
K |
the number of mutation signatures |
numInit |
the number of performing calculations with different initial values |
tol |
tolerance for the estimation (when the difference of log-likelihoods become below this value, stop the estimation) |
maxIter |
the maximum number of iteration of estimation |
The output is an instance of EstimatedParameters S4 class, which stores estimated parameters and other meta-information, and will be used for saving parameter values and visualizing the mutation signatures and memberships
## After obtaining G (see e.g., hildaReadMPFile function) load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3)
## After obtaining G (see e.g., hildaReadMPFile function) load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3)
Plot both mutation signatures and their mutational exposures from pmsignature output for more than two groups
pmMultiBarplot( inputG, inputParam, sigOrder = NULL, groupIndices, sortSampleNum = TRUE, charSize = 3 )
pmMultiBarplot( inputG, inputParam, sigOrder = NULL, groupIndices, sortSampleNum = TRUE, charSize = 3 )
inputG |
a MutationFeatureData S4 class output by the pmsignature. |
inputParam |
a estimatedParameters S4 class output by the pmsignature. |
sigOrder |
the order of signatures if needed (default: NULL). |
groupIndices |
a vector of group indicators. |
sortSampleNum |
an indictor variable on whether samples are sorted by the number of mutations (default: TRUE). |
charSize |
the size of the character on the signature plot (default: 3) |
a list of the signature plot and the mean difference plot.
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlots <- pmMultiBarplot(G, Param, groupIndices=c(1, rep(2,3), rep(3,6))) cowplot::plot_grid(pmPlots$sigPlot, pmPlots$propPlot, rel_widths = c(1,3))
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlots <- pmMultiBarplot(G, Param, groupIndices=c(1, rep(2,3), rep(3,6))) cowplot::plot_grid(pmPlots$sigPlot, pmPlots$propPlot, rel_widths = c(1,3))
Plot mutation signatures from pmsignature output
pmPlotSignature(inputParam, sigOrder = NULL, colorList = NULL, ...)
pmPlotSignature(inputParam, sigOrder = NULL, colorList = NULL, ...)
inputParam |
a estimatedParameters S4 class output by the pmsignature. |
sigOrder |
the order of signatures if needed (default: NULL). |
colorList |
a list of color to highlight the signatures (default: NULL). |
... |
additional arguments passed on to visPMS. |
a plot object containing all mutational signatures
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlotSignature(Param)
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) pmPlotSignature(Param)
A functional for generating the function checking the parameter (p) is within the restricted conditions or not
PMSboundary(y)
PMSboundary(y)
y |
this variable includes the information on the mutation features, the number of mutation signatures specified and so on |
a functional
Update the parameter F and Q (M-step in the EM-algorithm)
updateMstepFQC( vPatternList, vSparseCount, nTheta, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground )
updateMstepFQC( vPatternList, vSparseCount, nTheta, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground )
vPatternList |
The list of possible mutation features (converted to a vector) |
vSparseCount |
The table showing (mutation feature, sample, the number of mutation) (converted to a vector) |
nTheta |
The parameters in the distribution |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
patternNum |
the number of possible combinations of all the mutation features |
samplePatternNum |
the number of possible combination of samples and mutation patternns |
isBackground |
the logical value showing whether a background mutaiton features is included or not |
a vector
A function for updating parameters using EM-algorithm
updatePMSParam(p, y)
updatePMSParam(p, y)
p |
this variable includes the parameters for mutation signatures and membership parameters |
y |
this variable includes the information on the mutation features, the number of mutation signatures specified and so on |
a value
Update the auxiliary parameters theta and normalize them so that the summation of each group sums to 1 (E-step), also calculate the current log-likelihood value
updateTheta_NormalizedC( vPatternList, vSparseCount, vF, vQ, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground, vF0 )
updateTheta_NormalizedC( vPatternList, vSparseCount, vF, vQ, fdim, signatureNum, sampleNum, patternNum, samplePatternNum, isBackground, vF0 )
vPatternList |
The list of possible mutation features (converted to a vector) |
vSparseCount |
The table showing (mutation feature, sample, the number of mutation) (converted to a vector) |
vF |
F (converted to a vector) |
vQ |
Q (converted to a vector) |
fdim |
a vector specifying the number of possible values for each mutation signature |
signatureNum |
the number of mutation signatures |
sampleNum |
the number of cancer genomes |
patternNum |
the number of possible combinations of all the mutation features |
samplePatternNum |
the number of possible combination of samples and mutation patternns |
isBackground |
the logical value showing whether a background mutaiton features is included or not |
vF0 |
a background mutaiton features |
a value for theta
Generate visualization of mutation signatures for the model with substitution patterns and flanking bases represented by the indepenent representation.
visPMS( vF, numBases, baseCol = NA, trDir = FALSE, charSize = 5, isScale = FALSE, alpha = 2, charLimit = 0.25 )
visPMS( vF, numBases, baseCol = NA, trDir = FALSE, charSize = 5, isScale = FALSE, alpha = 2, charLimit = 0.25 )
vF |
a matrix for mutation signature |
numBases |
the number of flanking bases |
baseCol |
the colour of the bases (A, C, G, T, plus/minus strand) |
trDir |
the index whether the strand direction is plotted or not |
charSize |
the size of the character |
isScale |
the index whether the height of the flanking base is changed or not |
alpha |
the parameter for the Renyi entropy (applicable only if the isScale is TRUE) |
charLimit |
the limit of char size |
a plot of the input mutational signature
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) sig <- slot(Param, "signatureFeatureDistribution")[1,,] visPMS(sig, numBases = 5, isScale = TRUE)
load(system.file("extdata/sample.rdata", package="HiLDA")) Param <- pmgetSignature(G, K = 3) sig <- slot(Param, "signatureFeatureDistribution")[1,,] visPMS(sig, numBases = 5, isScale = TRUE)