Title: | Clonal Population Identification in Single-Cell RNA-Seq Data using Mitochondrial and Somatic Mutations |
---|---|
Description: | This package primarily identifies variants in mitochondrial genomes from BAM alignment files. It filters these variants to remove RNA editing events then estimates their evolutionary relationship (i.e. their phylogenetic tree) and groups single cells into clones. It also visualizes the mutations and providing additional genomic context. |
Authors: | Benjamin Story [aut, cre], Lars Velten [aut], Gregor Mönke [aut] |
Maintainer: | Benjamin Story <[email protected]> |
License: | GPL-3 |
Version: | 1.13.0 |
Built: | 2024-11-19 03:59:33 UTC |
Source: | https://github.com/bioc/mitoClone2 |
This function uses a C interface to read the nucleotide counts on each position of a .bam alignment. The counts are individually tabulated for each cell barcode as specified by the user. The counts of both strands are reported separately and nucleotides below a quality cutoff are masked.
bam2R_10x( file, sites = "MT:1-16569", q = 25, mq = 0, s = 2, head.clip = 0, max.depth = 1e+06, verbose = FALSE, mask = 0, keepflag = 0, max.mismatches = NULL, ncores = 1, ignore_nonstandard = FALSE )
bam2R_10x( file, sites = "MT:1-16569", q = 25, mq = 0, s = 2, head.clip = 0, max.depth = 1e+06, verbose = FALSE, mask = 0, keepflag = 0, max.mismatches = NULL, ncores = 1, ignore_nonstandard = FALSE )
file |
The file location of the BAM file as a string. |
sites |
The chromosome locations of interest in BED format as a string. Alternatively a single GRanges object will also work. |
q |
An optional cutoff for the nucleotide Phred quality. Default q = 25. Nucleotides with Q < q will be masked by 'N'. |
mq |
An optional cutoff for the read mapping quality. Default mq = 0 (no filter). reads with MQ < mq will be discarded. |
s |
Optional choice of the strand. Defaults to s = 2 (both). |
head.clip |
Should n nucleotides from the head of reads be clipped? Default 0. |
max.depth |
The maximal depth for the pileup command. Default 1,000,000. |
verbose |
Boolean. Set to TRUE if you want to get additional output. |
mask |
Integer indicating which flags to filter. Default 0 (no mask). Try 1796 (BAM_DEF_MASK). |
keepflag |
Integer indicating which flags to keep. Default 0 (no mask). Try 3 (PAIRED|PROPERLY_PAIRED). |
max.mismatches |
Integer indicating maximum MN value to allow in a read. Default NULL (no filter). |
ncores |
Integer indicating the number of threads to use for the parallel function call that summarize the results for each bam file. Default 1. |
ignore_nonstandard |
Boolean indicating whether or not gapped alignments, insertions, or deletions should be included in the final output. Default FALSE. If you have an inflation of spliced mitochondrial reads it is recommended to set this to TRUE. |
This code is an adaption of code that was originally written by Moritz Gerstung for the deepSNV package
A named list
of matrix
with
rows corresponding to genomic positions and columns for the
nucleotide counts (A, T, C, G, -), masked nucleotides (N),
(INS)ertions, (DEL)etions that count how often a read begins
and ends at the given position, respectively. Each member of
the list corresponds to an invididual cells or entity based on
the cell barcode of interest. The names of the elements of the
list correspond to the respective cell barcodes. For the
intents and purposes of the mitoClone2 package this object is
equivalent to the output from the
baseCountsFromBamList
function. The returned
list has a variable length depending on the ignore_nonstandard
parameter and each element contains a matrix has 8 columns and
(stop - start + 1) rows. The two strands have their counts
merged. If no counts are present in the provided sites
parameter nothing will be returned. IMPORTANT: The names of
the list will NOT reflect the source filename and will
exclusively be named based on the respective the barcodes
extracted from said file. If merging multiple datasets, it is
important to change the list's names once imported to avoid
naming collisions.
Benjamin Story (adapted from original code with permission from Moritz Gerstung)
bamCounts <- bam2R_10x(file = system.file("extdata", "mm10_10x.bam", package="mitoClone2"), sites="chrM:1-15000")
bamCounts <- bam2R_10x(file = system.file("extdata", "mm10_10x.bam", package="mitoClone2"), sites="chrM:1-15000")
Uses the deepSNV
package to count nucleotide frequencies at
every position in the mitochondrial genome for every cell.
baseCountsFromBamList( bamfiles, sites = "chrM:1-16569", ncores = 1, ignore_nonstandard = FALSE )
baseCountsFromBamList( bamfiles, sites = "chrM:1-16569", ncores = 1, ignore_nonstandard = FALSE )
bamfiles |
A character vector specifying the bam file paths |
sites |
String specifying genomic regions, defaults to the entire mitochondrial genome |
ncores |
Number of threads to use for the computation. Default 1 |
ignore_nonstandard |
Ignore basecalls that are not AGCTN |
A list of base count matrices which can serve as an input to
mutationCallsFromExclusionlist
or
mutationCallsFromCohort
bamCounts <- baseCountsFromBamList(bamfiles = list(system.file("extdata", "mm10_10x.bam", package="mitoClone2")),sites="chrM:1-15000", ncores=1)
bamCounts <- baseCountsFromBamList(bamfiles = list(system.file("extdata", "mm10_10x.bam", package="mitoClone2")),sites="chrM:1-15000", ncores=1)
PhISCS orders all mutations into a hierarchical mutational tree; in
many cases, the exact order of the acquisition of individual
mutations in not unanimously determined from the data. This
function computes the change in likelihood of the infered clonal
assignment if two mutations are merged into a clone. Hierarchical
clustering is then used to determine the clonal structure. The
result is visualized and should be fine-tuned using the
min.lik
parameter.
clusterMetaclones(mutcalls, min.lik = 1, plot = TRUE)
clusterMetaclones(mutcalls, min.lik = 1, plot = TRUE)
mutcalls |
mutcalls object of class |
min.lik |
specifies the minimum difference in likelihood required. This parameter is set arbitrarily, see the vignette "Computation of clonal hierarchies and clustering of mutations" for more information. |
plot |
whether dendrograms should be plotted. |
Returns the provided mutationCalls
class
object with an additional 'mainClone' metadata which allows for
further refinement of clonal population and association of
cells with a cluster of mutations (in this case clones).
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) P1 <- clusterMetaclones(P1) ## access via mainClone metadata
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) P1 <- clusterMetaclones(P1) ## access via mainClone metadata
List of variants that are likely not true somatic mutations and should thus be excluded
M: Mutant allele counts; N: Reference allele counts. P1: Patient 1; P2: Patient 2
exclusionlists M_P1 N_P1 M_P2 N_P2
exclusionlists M_P1 N_P1 M_P2 N_P2
A list with four entries: #'
three: Regions of the mitochondrial genome that are within 1 nt of a 3-mer homopolymer (e.g. AAA)
mutaseq: Mutations in the mitochondrial genome that were reoccuring across patients (present in more than one individual in the MutaSeq dataset)
masked: Regions of the mitochondrial genome that are soft-masked in the UCSC or Ensembl annotations
rnaEDIT: Regions of the mitochondrial genome that are thought to be subject to RNA-editing according to the REDIportal V2.0
a data frame of variable sites (columns) across single cells (rows)
An object of class data.frame
with 1430 rows and 16 columns.
An object of class data.frame
with 1430 rows and 16 columns.
An object of class data.frame
with 1066 rows and 22 columns.
An object of class data.frame
with 1066 rows and 22 columns.
Extracts the counts of allele for either the mutant or all the non-mutant alleles
getAlleleCount(mutcall, type = c("mutant", "nonmutant"))
getAlleleCount(mutcall, type = c("mutant", "nonmutant"))
mutcall |
object of class |
type |
character that is either 'mutant' or 'nonmutant' depending on which allele count the user wants to access |
Returns matrix of either mutant or non-mutant allele counts
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutantAllele_count <- getAlleleCount(LudwigFig7,type='mutant')
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutantAllele_count <- getAlleleCount(LudwigFig7,type='mutant')
Retrieves the full matrix of likelihoods associating single cells with clones
getCloneLikelihood(mutcall, mainClones = length(mutcall@mut2clone) > 0) getMainClone(mutcall, mainClones = length(mutcall@mut2clone) > 0) getConfidence(mutcall, mainClones = length(mutcall@mut2clone) > 0) getMut2Clone(mutcall)
getCloneLikelihood(mutcall, mainClones = length(mutcall@mut2clone) > 0) getMainClone(mutcall, mainClones = length(mutcall@mut2clone) > 0) getConfidence(mutcall, mainClones = length(mutcall@mut2clone) > 0) getMut2Clone(mutcall)
mutcall |
object of class |
mainClones |
Retrieve likelihoods associated with the main
Clones. Defaults to |
Return TRUE
if clusterMetaclones
has
been run otherwise returns the cell by clone matrix of
likelihood associating each cell to a given clone.
getMainClone
: Retrieve the most likely clone
associate with each cell.
getConfidence
: Retrieve the likelihood of the most
likely clone for each cell.
getMut2Clone
: Retrieve the assignment of mutations
to clones, once clusterMetaclones
has been run.
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) likelihood_matrix <- getCloneLikelihood(LudwigFig7)
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) likelihood_matrix <- getCloneLikelihood(LudwigFig7)
Extracts all the putative variants that we want to use for clustering
getVarsCandidate(mutcall)
getVarsCandidate(mutcall)
mutcall |
object of class |
Returns a character vector including all the variants to be used for clustering
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutations_to_cluster <- getVarsCandidate(LudwigFig7)
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutations_to_cluster <- getVarsCandidate(LudwigFig7)
Plot clone-specific variants in circular plots
mitoPlot( variants, patient = NULL, genome = "hg38", showLegend = TRUE, showLabel = TRUE )
mitoPlot( variants, patient = NULL, genome = "hg38", showLegend = TRUE, showLabel = TRUE )
variants |
Character vector of variants to plot in format 5643G>T or 5643 G>T. |
patient |
Characet vector identifying which variant belongs to what clone. The order should match that of the 'vars' parameter and shoul dbe of identical length. If none is provided, the function assumes all variants are from one single sample which will be named "Main Clone". Default: NULL. |
genome |
The mitochondrial genome of the sample being investigated. Please note that this is the UCSC standard chromosome sequence. Default: hg38. |
showLegend |
Boolean for whether or not the gene legend should be present in the final output plot. Default: TRUE. |
showLabel |
Boolean for whether or not the name of the variant should be shown as a label in the final output plot. Default: TRUE. |
A ggplot object illustrating the clone specific mutations.
known.variants <- c("9001 T>C","12345 G>A","1337 G>A") mitoPlot(known.variants)
known.variants <- c("9001 T>C","12345 G>A","1337 G>A") mitoPlot(known.variants)
Convert mutation string to GRanges
mut2GR(mut)
mut2GR(mut)
mut |
The mutation to convert to a GRanges in the format of "position reference>alternate". |
Returns a GRanges object containg the site of the variant along with reference/alternate allele data in the metacolumns
mutation.as.granges <- mut2GR('1434 G>A') mutation.as.granges.no.space <- mut2GR('1434G>A')
mutation.as.granges <- mut2GR('1434 G>A') mutation.as.granges.no.space <- mut2GR('1434G>A')
To create this class from a list of bam files (where each bam file corresponds
to a single cell), use mutationCallsFromCohort
or
mutationCallsFromExclusionlist
. To create this class if you
already have the matrices of mutation counts, use its contstructor, i.e.
mutationCallsFromMatrix(M = data1, N = data2)
.
M
A matrix of read counts mapping to the mutant allele. Columns are genomic sites and rows and single cells.
N
A matrix of read counts mapping to the nonmutant alleles. Columns are genomic sites and rows and single cells.
ternary
Discretized version describing the mutational status of each gene in each cell, where 1 signfiies mutant, 0 signifies reference, and ? signifies dropout
cluster
Boolean vector of length ncol(M)
specifying if the given
mutation should be included for clustering (TRUE
) or only used for
annotation.
metadata
Metadata frame for annotation of single cells (used for
plotting). Row names should be the same as in M
tree
Inferred mutation tree
cell2clone
Probability matrix of single cells and their assignment to clones.
mut2clone
Maps mutations to main clones
mainClone
Probability matrix of single cells and their assignment to main clones
treeLikelihoods
Likelihood matrix underlying the inference of main
clones, see clusterMetaclones
Identifies relevant mitochondrial somatic variants from raw counts
of nucleotide frequencies measured in single cells from several
individuals. Applies two sets of filters: In the first step,
filters on coverage to include potentially noisy variants; in the
second step, compares allele frequencies between patients to remove
variants that were observed in several individuals and that
therefore are unlikely to represent true somatic variants (e.g. RNA
editing events). The exclusionlist derived from the original Velten
et al. 2021 dataset is available internal and can be used on single
individuals using mutationCallsFromExclusionlist
mutationCallsFromCohort( BaseCounts, sites, patient, MINREADS = 5, MINCELL = 20, MINFRAC = 0.1, MINCELLS.PATIENT = 10, MINFRAC.PATIENT = 0.01, MINFRAC.OTHER = 0.1, USE.REFERENCE = TRUE, genome = "hg38" )
mutationCallsFromCohort( BaseCounts, sites, patient, MINREADS = 5, MINCELL = 20, MINFRAC = 0.1, MINCELLS.PATIENT = 10, MINFRAC.PATIENT = 0.01, MINFRAC.OTHER = 0.1, USE.REFERENCE = TRUE, genome = "hg38" )
BaseCounts |
A list of base call matrices (one matrix per cell)
as produced by |
sites |
Vector specifying genomic regions, defaults to the entire mitochondrial genome. Excepts a string but may be included as a GRanges object. |
patient |
A character vector associating each cell / entry in
the |
MINREADS |
Minimum number of reads on a site in a single cell to qualify the site as covered |
MINCELL |
Minimum number of cells across the whole data set to cover a site |
MINFRAC |
Fraction of reads on the mutant allele to provisionally classify a cell as mutant |
MINCELLS.PATIENT |
Minimum number of mutant cells per patient to classify the mutation as relevant in that patient, AND |
MINFRAC.PATIENT |
Minimum fraction of mutant cells per patient to classify the mutation as relevant in that patient |
MINFRAC.OTHER |
Minimum fraction of mutant cells identified in a second patient for the mutation to be excluded. Fraction relative to the fraction of of cells from the patient where a variant is enriched. |
USE.REFERENCE |
Boolean. The variant calls will be of the
format REF>ALT where REF is decided based on the selected
|
genome |
The mitochondrial genome of the sample being investigated. Please note that this is the UCSC standard chromosome sequence. Default: hg38. |
A list of mutationCalls
objects (one for each
patient
) and an entry named exclusionlist
containing a exclusionlist of sites with variants in several
individuals
sites.gr <- GenomicRanges::GRanges("chrM:1-15000") BaseCounts <- bam2R_10x(file = system.file("extdata", "mm10_10x.bam", package="mitoClone2"), sites=sites.gr) mutCalls <- mutationCallsFromCohort(BaseCounts, patient=c('sample2','sample1','sample2','sample2','sample1','sample2'), MINCELL=1, MINFRAC=0, MINCELLS.PATIENT=1, genome='mm10', sites=sites.gr)
sites.gr <- GenomicRanges::GRanges("chrM:1-15000") BaseCounts <- bam2R_10x(file = system.file("extdata", "mm10_10x.bam", package="mitoClone2"), sites=sites.gr) mutCalls <- mutationCallsFromCohort(BaseCounts, patient=c('sample2','sample1','sample2','sample2','sample1','sample2'), MINCELL=1, MINFRAC=0, MINCELLS.PATIENT=1, genome='mm10', sites=sites.gr)
Identifies relevant mitochondrial somatic variants from raw counts
of nucleotide frequencies. Applies two sets of filters: In the
first step, filters on coverage and minimum allele frequency to
exclude potentially noisy variants; in the second step, filters
against a exclusionlist of variants that were observed in several
individuals and that therefore are unlikely to represent true
somatic variants (e.g. RNA editing events). These exclusionlists
are created using mutationCallsFromCohort
mutationCallsFromExclusionlist( BaseCounts, lim.cov = 20, min.af = 0.2, min.num.samples = 0.01 * length(BaseCounts), min.af.universal = min.af, universal.var.cells = 0.95 * length(BaseCounts), exclusionlists.use = exclusionlists, max.var.na = 0.5, max.cell.na = 0.95, genome = "hg38", ncores = 1, ... )
mutationCallsFromExclusionlist( BaseCounts, lim.cov = 20, min.af = 0.2, min.num.samples = 0.01 * length(BaseCounts), min.af.universal = min.af, universal.var.cells = 0.95 * length(BaseCounts), exclusionlists.use = exclusionlists, max.var.na = 0.5, max.cell.na = 0.95, genome = "hg38", ncores = 1, ... )
BaseCounts |
A list of base call matrices (one matrix per cell)
as produced by |
lim.cov |
Minimal coverage required per cell for a cell to be classified as covered |
min.af |
Minimal allele frequency for a cell to be classified as mutant |
min.num.samples |
Minimal number of cells required to be
classified as covered and mutant according to the thresholds
set in |
min.af.universal |
Minimal allele frequency for a cell to be
classified as mutant, in the context of removing universal
variants. Defaults to |
universal.var.cells |
Maximum number of cells required to be
classified as mutant according to the threshold set in
|
exclusionlists.use |
List of sites to exclude for variants calling. The default exclusionlists object included with this package contains exclude or hardmask in GRanges format. The four exclusionlists included in this case are: "three" (hg38 sites that are part of homopolymer(e.g. AAA) of at least 3 bp in length), "mutaseq" (sites discovered to be overrepresented in AML SmartSeq2 data analysis from Velten et al 2021), "masked" (sites that are softmasked in either the UCSC or Refseq genome annotations), and "rnaEDIT" which are sites that are subjected to RNA-editing according to the REDIportal. These lists can also be input manually by a researcher and provided as either coordinates (as a string) or as a GRanges objects. |
max.var.na |
Final filtering step: Remove all mutations with no coverage in more than this fraction of cells |
max.cell.na |
Final filtering step: Remove all cells with no coverage in more than this fraction of mutations |
genome |
The mitochondrial genome of the sample being investigated. Please note that this is the UCSC standard chromosome sequence. Default: hg38. |
ncores |
number of cores to use for tabulating potential variants (defaults to 2) |
... |
Parameters passed to
|
An object of class mutationCalls
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) Example <- mutationCallsFromExclusionlist(example.counts, min.af=0.05, min.num.samples=5, universal.var.cells = 0.5 * length(example.counts), binarize = 0.1)
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) Example <- mutationCallsFromExclusionlist(example.counts, min.af=0.05, min.num.samples=5, universal.var.cells = 0.5 * length(example.counts), binarize = 0.1)
To be used when allele-specific count matrices are available.
mutationCallsFromMatrix( M, N, cluster = NULL, metadata = data.frame(row.names = rownames(M)), binarize = 0.05 )
mutationCallsFromMatrix( M, N, cluster = NULL, metadata = data.frame(row.names = rownames(M)), binarize = 0.05 )
M |
A matrix of read counts mapping to the mutant allele. Columns are genomic sites and rows and single cells. |
N |
A matrix of read counts mapping to the referece allele. Columns are genomic sites and rows and single cells. |
cluster |
If |
metadata |
A data.frame of metadata that will be transfered to
the final output where the |
binarize |
Allele frequency threshold to define a site as mutant (required for some clustering methods) |
An object of class mutationCalls
.
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) ## we have loaded the example.counts object known.variants <- c("8 T>C","4 G>A","11 G>A","7 A>G","5 G>A","15 G>A","14 G>A") known.subset <- pullcountsVars(example.counts, known.variants) known.subset <- mutationCallsFromMatrix(t(known.subset$M), t(known.subset$N), cluster = rep(TRUE, length(known.variants)))
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) ## we have loaded the example.counts object known.variants <- c("8 T>C","4 G>A","11 G>A","7 A>G","5 G>A","15 G>A","14 G>A") known.subset <- pullcountsVars(example.counts, known.variants) known.subset <- mutationCallsFromMatrix(t(known.subset$M), t(known.subset$N), cluster = rep(TRUE, length(known.variants)))
The function clusterMetaclones
provides an automated
way to group mutations into clones for subsequent analyses (such as
differential expression analyses). In practice, it may make sense
to overwrite these results manually. See the vignette 'Computation
of clonal hierarchies and clustering of mutations' for an example.
overwriteMetaclones(mutcalls, mutation2clones)
overwriteMetaclones(mutcalls, mutation2clones)
mutcalls |
mutcalls object of class |
mutation2clones |
Named integer vector that assigns mutations to clones. See the vignette 'Computation of clonal hierarchies and clustering of mutations' for an example. |
Returns the provided mutationCalls
class
object with the 'mainClone' metadata overwritten with the
manual values provided by the user.
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) new.n <- seq(17) names(new.n) <- names(getMut2Clone(P1)) P1.newid <- overwriteMetaclones(P1,new.n)
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) new.n <- seq(17) names(new.n) <- names(getMut2Clone(P1)) P1.newid <- overwriteMetaclones(P1,new.n)
Creates a heatmap of single cell mutation calls, clustered using PhISCS.
plotClones(mutcalls, what = c("alleleFreq", "ternary"), show = c(), ...)
plotClones(mutcalls, what = c("alleleFreq", "ternary"), show = c(), ...)
mutcalls |
object of class |
what |
One of the following: alleleFreq: The fraction of reads mapping to the mutant allele or ternary: Ternarized mutation status |
show |
boolean vector specifying for each mutation if it should
be plotted on top of the heatmap as metadata; defaults to
mutations not used for the clustering |
... |
any arguments passed to |
Returns TRUE only used for generating a PostScript tree image of the putative mutation tree
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) plotClones(P1)
P1 <- readRDS(system.file("extdata/sample_example1.RDS",package = "mitoClone2")) plotClones(P1)
Pull variant counts
pullcountsVars(BaseCounts, vars, cells = NULL)
pullcountsVars(BaseCounts, vars, cells = NULL)
BaseCounts |
A list of base call matrices (one matrix per cell)
as produced by |
vars |
Character vector of variants to pull, in format 5643G>T |
cells |
Character vector for cells to select, or NULL if all cells from the input are to be used |
A list with two entries, M (count table on the variant allele) and N (count table on the reference allele)
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) known.variants <- c("9 T>C","12 G>A","13 G>A") counts.known.vars <- pullcountsVars(example.counts, vars=known.variants)
load(system.file("extdata/example_counts.Rda",package = "mitoClone2")) known.variants <- c("9 T>C","12 G>A","13 G>A") counts.known.vars <- pullcountsVars(example.counts, vars=known.variants)
Performs a quick hierarchical clustering on a object of class
mutationCalls
. See varCluster
for an
alternative that infers mutational trees and uses sound models of
dropout.
quick_cluster(mutcalls, binarize = FALSE, drop_empty = TRUE, ...)
quick_cluster(mutcalls, binarize = FALSE, drop_empty = TRUE, ...)
mutcalls |
object of class |
binarize |
If |
drop_empty |
Remove all rows in the provided mutcalls object where no cells exhibit a mutation. |
... |
Parameters passed to |
The result of running pheatmap
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) quickCluster <- quick_cluster(LudwigFig7)
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) quickCluster <- quick_cluster(LudwigFig7)
Mutations co-occuring at the same genomic position may often be the result of sequencing artifacts or technical biases. In cases where the user which to drop these from a result this function may be used. ONLY WORKS FOR MITOCHONDRIAL MUTATIONS.
removeWindow(x, window = 1)
removeWindow(x, window = 1)
x |
A list of strings that comprise sites that will be filtered |
window |
Integer of how close mutations must be to one another (in bp) to be removed |
Returns the same list of mutations excluding those, if any, that fall within the same window =
P1.muts <- rep(TRUE,3) names(P1.muts) <- c("X2537GA","X3351TC","X3350TC") names(P1.muts) <- gsub("^X","", gsub("(\\d+)([AGCT])([AGCT])","\\1 \\2>\\3",names(P1.muts))) P1.muts <- P1.muts[removeWindow(names(P1.muts))]
P1.muts <- rep(TRUE,3) names(P1.muts) <- c("X2537GA","X3351TC","X3350TC") names(P1.muts) <- gsub("^X","", gsub("(\\d+)([AGCT])([AGCT])","\\1 \\2>\\3",names(P1.muts))) P1.muts <- P1.muts[removeWindow(names(P1.muts))]
Sets the putative variants that we want to use for clustering
setVarsCandidate(mutcall, varlist)
setVarsCandidate(mutcall, varlist)
mutcall |
object of class |
varlist |
vector of booleans with the names set to the variants to use for clustering |
Sets the cluster slot on a mutationCalls object
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutations_to_cluster <- getVarsCandidate(LudwigFig7) mutations_to_cluster[] <- rep(c(TRUE,FALSE),each=19) LudwigFig7 <- setVarsCandidate(LudwigFig7,mutations_to_cluster)
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) mutations_to_cluster <- getVarsCandidate(LudwigFig7) mutations_to_cluster[] <- rep(c(TRUE,FALSE),each=19) LudwigFig7 <- setVarsCandidate(LudwigFig7,mutations_to_cluster)
From data on the observed mutational status of single cells at a
number of genomic sites, computes a likely phylogenetic tree using
PhISCS (https://github.com/sfu-compbio/PhISCS) and associates
single cells with leaves of the tree. The function
clusterMetaclones
should be called on the output in
order to group mutations into clones using a likelihood-based
approach.
varCluster( mutcalls, fn = 0.1, fp = 0.02, cores = 1, time = 10000, tempfolder = tempdir(), python_env = "", force_recalc = FALSE, method = "SCITE" )
varCluster( mutcalls, fn = 0.1, fp = 0.02, cores = 1, time = 10000, tempfolder = tempdir(), python_env = "", force_recalc = FALSE, method = "SCITE" )
mutcalls |
object of class |
fn |
false negative rate, i.e. the probability of only observing the reference allele if there is a mutation. #add gene-wise |
fp |
false positive, i.e. the probability of observing the mutant allele if there is no mutation. |
cores |
number of cores to use for PhISCS (defaults to 1) |
time |
maximum time to be used for PhISCS optimization, in seconds (defaults to 10000) |
tempfolder |
temporary folder to use for PhISCS output |
python_env |
Any shell commands to execute in order to make the
gurobi python package available. The easiest solution is
running R from an environment where the gurobi python package
is avaiable. In some settings (e.g. RStudio Server), this
parameter can be used instead. |
force_recalc |
Rerun PhISCS even if the |
method |
A string variable of either PhISCS or SCITE depending on the tree-inferring software the user wants to use. Default: PhISCS |
an object of class mutationCalls
, with an
inferred tree structure and cell to clone assignment added.
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) LudwigFig7 <- varCluster(LudwigFig7, python_env = "",method='SCITE')
load(system.file("extdata/LudwigFig7.Rda",package = "mitoClone2")) LudwigFig7 <- varCluster(LudwigFig7, python_env = "",method='SCITE')