Title: | Assessment of evidence for LOH in spatial transcriptomics pre-processed data using Bayes factor calculations |
---|---|
Description: | tLOH, or transcriptomicsLOH, assesses evidence for loss of heterozygosity (LOH) in pre-processed spatial transcriptomics data. This tool requires spatial transcriptomics cluster and allele count information at likely heterozygous single-nucleotide polymorphism (SNP) positions in VCF format. A Bayes factor is calculated at each SNP to assess the evidence for a potential loss of heterozygosity event. Two plotting functions are included to visualize allele fraction and aggregated Bayes factor per chromosome. Data generated with the 10X Genomics Visium Spatial Gene Expression platform must be pre-processed to obtain an individual sample VCF with columns for each cluster. Required fields are allele depth (AD), with counts for the reference and alternative alleles, and read depth (DP). |
Authors: | Michelle Webb [cre, aut], David Craig [aut] |
Maintainer: | Michelle Webb <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-11-22 06:26:33 UTC |
Source: | https://github.com/bioc/tLOH |
Output is a plot of the sum of Log10(1/K) values (K is a Bayes factor) per chromosome for each cluster. The dotted line at y=3 represents the threshold for substantial evidence toward Model 2.
df | An input dataframe with merged cluster data output by tLOHCalc |
sample | Name of sample for plot title |
Output is a plot where the y axis is sum of Log10(1/K) values (K is a Bayes factor) per chromosome and the x axis is chromosome
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalc(humanGBMsampleAC)
aggregateCHRPlot(df,"Example")
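The aggregation shown in the plot can be sketched in base R. This is a minimal illustration only: the data frame and its column names (`cluster`, `chr`, `log10_inverse_K`) are hypothetical and are not the columns returned by tLOHCalc.

```r
# Hypothetical per-SNP values; column names are illustrative only.
snps <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 2),
  chr = c(1, 1, 2, 1, 1, 2),
  log10_inverse_K = c(1.5, 2.0, 0.4, 0.2, 0.3, 3.5)
)

# Sum Log10(1/K) within each cluster/chromosome pair.
per_chr <- aggregate(log10_inverse_K ~ cluster + chr, data = snps, FUN = sum)

# Flag sums at or above the y = 3 line drawn on the plot.
per_chr$above_threshold <- per_chr$log10_inverse_K >= 3
print(per_chr)
```

Each row of `per_chr` corresponds to one bar or point position in a per-cluster panel.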
Creates a plot with panels for each cluster. The x-axis is chromosome, y-axis is allele frequency. Point color is Log10(1/K) where K is a Bayes factor
df | An input dataframe with merged cluster data |
sample | Name of sample for plot title |
Output is a plot of allele frequency for each cluster. It can be assigned to an object and visualized individually. For each panel, the y axis has a minimum of 0 and a maximum of 1.
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalc(humanGBMsampleAC)
alleleFrequencyPlot(df,"Example")
Generates dataframes with the same length as the original data, containing NA where errors were found during the runHMM steps. This is important for the final overview of results.
documentErrorRegions(a,b)
a | List of dataframes from prepareHMMdataframes |
b | List of HMM state determination dataframes from the run_HMM series of functions |
Output is a list of dataframes to be used by the regionAnalysis function
Michelle Webb
estimatedStates <- data.frame(state = c(1,1,1), S1 = c(1,1,1), S2 = c(1,1,1))
sampleDataFrame <- data.frame(data = c(1,1,1))
list1 <- list(sampleDataFrame,sampleDataFrame)
list2 <- list(estimatedStates,estimatedStates)
documentErrorRegions(list1,list2)
A dataset of a human glioblastoma sample containing the allele count (AC) information for 9 spatial transcriptomics clusters
data("humanGBMsampleAC")
A data frame with 34601 rows and 7 variables:
dbSNP rs identifier
cluster number
total number of counts
counts for the reference allele
counts for the alternative allele
chromosome number
genomic position
Craig Lab data repository
data("humanGBMsampleAC")
A dataset of initial start probabilities to use with the HMM analysis. Users may create their own dataset using the same format
data("initialStartProbabilities")
A data frame with 22 rows and 2 variables:
Initial Probability 1
Initial Probability 2
Craig Lab data repository
data("initialStartProbabilities")
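A user-defined start probability table could be built as follows. This is a sketch under assumptions: the column names `initProb1` and `initProb2` are taken from the runHMM_1 argument description, and the 0.5/0.5 values are placeholders rather than recommended settings.

```r
# Placeholder values: equal start probability for both states
# on each of the 22 chromosomes (one row per chromosome, in order).
customStartProbabilities <- data.frame(
  initProb1 = rep(0.5, 22),
  initProb2 = rep(0.5, 22)
)

# Each row should describe a probability distribution over the two states.
row_totals <- rowSums(customStartProbabilities)
print(head(customStartProbabilities))
```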
Calculation of the marginal likelihood of Model 1, LOH
x | Dataframe output by tLOHDataImport |
a | Alpha value |
b | Beta value |
The reference and total counts should come from a .csv output by the spatial LOH pre-processing pipeline. The recommended value for both Alpha1 and Beta1 is 1.25.
The value returned from marginalLikelihoodM1 is numeric
Michelle Webb
test <- data.frame(REF=c(10,2,3,4,5,10),TOTAL=c(20,20,20,20,20,20))
apply(test, MARGIN = 1, FUN = marginalLikelihoodM1, a = 1.25, b = 1.25)
Calculation of the marginal likelihood of Model 2, HET
x | Dataframe output by tLOHDataImport |
a | Alpha value |
b | Beta value |
The reference and total counts should come from a .csv output by the spatial LOH pre-processing pipeline. The recommended value for both Alpha2 and Beta2 is 500.
The value returned from marginalLikelihoodM2 is numeric
Michelle Webb
test <- data.frame(REF=c(10,2,3,4,5,10),TOTAL=c(20,20,20,20,20,20))
apply(test, MARGIN = 1, FUN = marginalLikelihoodM1, a = 500, b = 500)
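To illustrate how two Beta priors can be compared at a single SNP, here is a sketch that assumes a beta-binomial marginal likelihood. That form, the helper name `beta_binomial_marginal`, and the direction of the ratio are assumptions made for illustration; the package's internal formula and its definition of K may differ.

```r
# Assumed form: binomial likelihood for REF counts with a Beta(a, b) prior
# on the reference allele fraction, integrated analytically.
beta_binomial_marginal <- function(ref, total, a, b) {
  exp(lchoose(total, ref) + lbeta(ref + a, total - ref + b) - lbeta(a, b))
}

counts <- data.frame(REF = c(10, 2), TOTAL = c(20, 20))

# Priors use the recommended values quoted above for each model.
m1 <- beta_binomial_marginal(counts$REF, counts$TOTAL, a = 1.25, b = 1.25)
m2 <- beta_binomial_marginal(counts$REF, counts$TOTAL, a = 500, b = 500)

# Ratio of the two marginals (direction chosen here for illustration).
ratio_m1_over_m2 <- m1 / m2
log10_inverse_ratio <- log10(1 / ratio_m1_over_m2)
print(data.frame(counts, m1, m2, log10_inverse_ratio))
```

With these priors, the balanced SNP (10 of 20 reference reads) is better explained by the tightly peaked Beta(500, 500) prior, while the skewed SNP (2 of 20) is better explained by the broad Beta(1.25, 1.25) prior.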
This function takes the number of counts for a reference allele as x, and the number of total allele counts as y.
x | Number of counts for the reference allele. |
y | Number of counts total at this SNP position. |
The reference and total counts should come from a .csv output by the spatial LOH pre-processing pipeline.
The value returned from marginalM1Calc is numeric
Michelle Webb
marginalM1Calc(10, 0.5)
Calculation of marginal M2 het
marginalM2CalcBHET(x, a, b)
x | Number of counts for the reference allele |
a | Alpha value |
b | Beta value |
The value returned from marginalM2CalcBHET is numeric
Michelle Webb
save <- data.frame(REF=c(10,2,3,4,5,10),TOTAL=c(20,20,20,20,20,20))
apply(save, MARGIN = 1, FUN = marginalM2CalcBHET, a = 10,b = 10)
Calculation of the marginal for Model 2
marginalM2CalcBLOH(x, a, b)
x | Counts for the reference allele |
a | Alpha value |
b | Beta value |
The value returned from marginalM2CalcBLOH is numeric
Michelle Webb
test <- data.frame(REF=c(10,2,3,4,5,10),TOTAL=c(20,20,20,20,20,20))
apply(test, MARGIN = 1, FUN = marginalM2CalcBLOH, a = 10,b = 10)
This function takes a set of numbers and outputs a mode peak value. To be used in a larger function that will be updated.
x | List of allele fraction values |
List of values should be the allele fractions of SNPs with the top 25 percent of counts in a region. If only one value is input, that value is returned.
The value returned is numeric
Michelle Webb
test <- c(1,2,3,4,5)
modePeakCalc(test)
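One common way to obtain a mode peak from a set of allele fractions is to take the location of the highest point of a kernel density estimate. The sketch below uses that approach as an assumption; `mode_peak_sketch` is a hypothetical helper and is not claimed to match the internals of modePeakCalc.

```r
# Hypothetical helper: location of the highest kernel density peak.
mode_peak_sketch <- function(x) {
  if (length(x) == 1) {
    return(x)  # a single value is returned unchanged
  }
  d <- stats::density(x)  # Gaussian kernel, default bandwidth
  d$x[which.max(d$y)]
}

# Illustrative allele fractions clustered near 0.5 with one outlier.
allele_fractions <- c(0.48, 0.50, 0.52, 0.50, 0.49, 0.51, 0.90)
peak <- mode_peak_sketch(allele_fractions)
print(peak)
```

Unlike the mean, the density peak is barely moved by the single outlying value at 0.90.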
Splits output from tLOHCalc or tLOHCalcUpdate into a list of dataframes separated by cluster and chromosome. Applies an ordered quantile normalization to the Bayes factor K values in each dataset.
prepareHMMdataframes(importedData)
importedData | Input dataframe generated from the tLOHCalc or tLOHCalcUpdate function |
Output is a list of dataframes separated by chromosome and cluster
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
output <- prepareHMMdataframes(df)
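The two steps described above, splitting by cluster and chromosome and then normalizing K within each group, can be sketched in base R. The column names and the rank-based transform `rank_to_normal` are assumptions for illustration; they approximate the idea of an ordered quantile normalization and are not the package's implementation.

```r
# Hypothetical input; column names are illustrative only.
values <- data.frame(
  cluster = rep(c(1, 2), each = 4),
  chr = rep(c(1, 1, 2, 2), times = 2),
  K = c(0.2, 5.0, 1.1, 0.7, 3.0, 0.1, 9.0, 2.5)
)

# One dataframe per cluster/chromosome combination.
groups <- split(values, list(values$cluster, values$chr), drop = TRUE)

# Rank-based mapping of values onto standard normal quantiles.
rank_to_normal <- function(x) qnorm((rank(x) - 0.5) / length(x))

groups <- lapply(groups, function(d) {
  d$K_normalized <- rank_to_normal(d$K)
  d
})
print(names(groups))
```

The transform keeps the ordering of K within each group while removing the heavy skew that raw Bayes factors usually show.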
Generates summary metrics for HMM regions
regionAnalysis(originalDF,dataframeList)
originalDF | Original imported dataframe from tLOHCalcUpdate |
dataframeList | List of HMM state determination dataframes from the run_HMM series of functions |
Output is a dataframe containing region metrics and data for each HMM segment
Michelle Webb
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
dataframeList3 <- runHMM_2(dataframeList2)
output <- runHMM_3(dataframeList3)
final <- regionAnalysis(dataframeList,output)
Final metrics and summary for regions
regionFinalize(finalList1)
finalList1 | List of dataframes output by the regionAnalysis function |
Output is a table containing all calculations from the Bayes factor and HMM analysis
Michelle Webb
## Not run:
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
dataframeList3 <- runHMM_2(dataframeList2)
output <- runHMM_3(dataframeList3)
intermediate <- regionAnalysis(dataframeList,output)
final <- regionFinalize(intermediate)
## End(Not run)
Takes rows with a total count greater than 2000 and sets them to NA
dataframe | input dataframe |
cols | which column |
rows | which row |
newValue | what to replace |
Dataframe returned
Michelle Webb
test <- data.frame(TOTAL=c(2000,20,20,20,20,20))
removeOutlierFromCalc(test,"TOTAL",test[test$TOTAL > 2000,],NA)
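The same masking idea can be shown in base R without calling removeOutlierFromCalc. This is a sketch with made-up counts; it also shows that a value of exactly 2000 is not strictly greater than the cutoff and is therefore left unchanged.

```r
# Illustrative counts: one value above the cutoff, one exactly at it.
counts <- data.frame(TOTAL = c(2500, 2000, 20))

# Replace totals strictly greater than 2000 with NA.
counts$TOTAL[counts$TOTAL > 2000] <- NA
print(counts)
```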
Applies the depmixS4 method depmix on normalized K values.
runHMM_1(dataframeList, initProbs, trProbs)
dataframeList | List of dataframes separated by cluster and chromosome from the prepareHMMdataframes function. |
initProbs | Dataframe containing 22 rows and two columns, initProb1 and initProb2. Each row represents a chromosome in sequential order, with initProb1 being the probability of state1 and initProb2 being the probability of state2. |
trProbs | Matrix of transition probabilities for the HMM |
Output is a list of depmixS4 depmix class objects for each input dataframe
Michelle Webb
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
output <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
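The `trProbs` matrix in the example is a symmetric two-state transition matrix. A small sketch of how such a matrix could be generated from a single "stay in the same state" probability is shown below; `make_transition_matrix` is a hypothetical helper and is not part of the package.

```r
# Hypothetical helper: symmetric 2 x 2 transition matrix.
# Diagonal = probability of remaining in the current state.
make_transition_matrix <- function(stay) {
  matrix(c(stay, 1 - stay,
           1 - stay, stay), nrow = 2, byrow = TRUE)
}

trProbsSketch <- make_transition_matrix(0.8999)

# Because the matrix is symmetric, rows and columns both sum to 1.
print(trProbsSketch)
print(rowSums(trProbsSketch))
```

A stay probability close to 1 favors long runs of the same state along a chromosome, so state changes are only called when the evidence shifts for several consecutive SNPs.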
Applies the depmixS4 method fit on the depmix objects generated by runHMM_1.
runHMM_2(dataframeList)
dataframeList | List of depmixS4 depmix class objects generated from the runHMM_1 step |
Output is a list of depmixS4 depmix.fitted class output for each input dataframe
Michelle Webb
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
output <- runHMM_2(dataframeList2)
Applies the depmixS4 method posterior on normalized K values.
runHMM_3(dataframeList)
dataframeList | List of depmixS4 depmix.fitted class output generated from the runHMM_2 step |
Output is a list of depmixS4 posterior state classifications for each input dataframe
Michelle Webb
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
dataframeList3 <- runHMM_2(dataframeList2)
output <- runHMM_3(dataframeList3)
Creates individual chromosome dataframes
splitByChromosome(listOfDataframes,numberOfDataframes)
listOfDataframes | Input dataframe generated from the tLOHDataImport function |
numberOfDataframes | Number of dataframes in list |
Output is a list of dataframes separated by chromosome
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
output <- splitByChromosome(list(df),1)
Function used by regionFinalize to group segments
summarizeRegions1(x)
x | A list of dataframes containing calculated values and HMM state determinations |
Output is a list of dataframes
Michelle Webb
## Not run:
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
dataframeList3 <- runHMM_2(dataframeList2)
output <- runHMM_3(dataframeList3)
intermediate <- regionAnalysis(dataframeList,output)
finalList1 <- purrr::map(intermediate, ~dplyr::mutate(.x, state = as.character(state)))
sampleValues <- as.data.frame(purrr::reduce(finalList1,dplyr::full_join))
sampleData <- summarizeRegions1(finalList1)
## End(Not run)
Function used by regionFinalize to identify segment start and end positions
summarizeRegions2(finalTable)
finalTable | A dataframe containing metrics from the Bayes Factor and HMM analysis |
Output is a dataframe
Michelle Webb
## Not run:
data('humanGBMsampleAC')
data('initialStartProbabilities')
df <- tLOHCalcUpdate(humanGBMsampleAC,1.25,1.25,500,500,4)
dataframeList <- prepareHMMdataframes(df)
trProbs <- cbind(c(0.8999,0.1001),c(0.1001,0.8999))
dataframeList2 <- runHMM_1(dataframeList, initialStartProbabilities, trProbs)
dataframeList3 <- runHMM_2(dataframeList2)
output <- runHMM_3(dataframeList3)
intermediate <- regionAnalysis(dataframeList,output)
finalList1 <- purrr::map(intermediate, ~dplyr::mutate(.x, state = as.character(state)))
sampleValues <- as.data.frame(purrr::reduce(finalList1,dplyr::full_join))
sampleData <- summarizeRegions1(finalList1)
sampleData$lengthOfInterval <- sampleData$intervalEnd - sampleData$intervalStart
sampleDF <- summarizeRegions2(sampleValues)
## End(Not run)
Calculates Bayes factors for allele fractions at each SNP position. Uses dataframe output by tLOHDataImport
tLOHCalc(forCalcDF)
forCalcDF | Input dataframe generated from the tLOHDataImport function |
Output is a dataframe with values that can be visualized with alleleFrequencyPlot() or aggregateCHRPlot()
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalc(humanGBMsampleAC)
head(df)
Calculates Bayes factors for allele fractions at each SNP position. Uses dataframe output by tLOHDataImport.
tLOHCalcUpdate(forCalcDF, alpha1, beta1, alpha2, beta2, countThreshold)
forCalcDF | Input dataframe generated from the tLOHDataImport function |
alpha1 | Model 1 alpha value |
beta1 | Model 1 beta value |
alpha2 | Model 2 alpha value |
beta2 | Model 2 beta value |
countThreshold | Threshold for minimum number of read counts |
Output is a dataframe with Bayes Factor values
Michelle Webb
data('humanGBMsampleAC')
df <- tLOHCalcUpdate(humanGBMsampleAC, 1.25,1.25,500,500,4)
head(df)
Import a VCF with per-cluster allele count information at heterozygous SNP positions for the tLOHCalc calculation function.
vcf | An input VCF file. Spatial transcriptomics clusters make up the sample columns. AD and DP fields are required. Each SNP should be annotated with dbSNP rsIDs. |
Output is a dataframe with required fields for tLOHCalc
Michelle Webb
## Not run:
R.utils::gunzip("inst/extdata/Example.vcf.gz","inst/extdata/Example.vcf")
exampleDF <- tLOHDataImport("inst/extdata/Example.vcf")
## End(Not run)
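For orientation, the AD field of a VCF sample column stores comma-separated read counts, reference allele first. The base-R sketch below shows how such strings could be turned into per-allele counts. The input values are made up, the output column names are illustrative, and this is not the parsing code used by tLOHDataImport.

```r
# Made-up AD strings for one cluster column: "ref,alt".
ad <- c("12,8", "0,15", "7,7")

# Split each string and pull out the reference and alternative counts.
parts <- strsplit(ad, ",", fixed = TRUE)
ref <- as.integer(vapply(parts, `[`, character(1), 1))
alt <- as.integer(vapply(parts, `[`, character(1), 2))

# TOTAL is computed here as REF + ALT; a VCF DP value can differ from this sum.
alleles <- data.frame(REF = ref, ALT = alt, TOTAL = ref + alt)
alleles$refFraction <- alleles$REF / alleles$TOTAL
print(alleles)
```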