Quality control of cancer somatic mutations is of great significance in cancer genomics. It helps eliminate false-positive mutations arising during the sequencing process, thereby improving the efficiency and accuracy of downstream analysis. Here, we developed an R package, CaMutQC, for the quality control and selection of cancer somatic mutations. It offers both common and customized strategies for filtering cancer somatic mutations based on the MAF data frame, and it can also select key somatic mutations related to tumorigenesis. In addition, we believe that the union of CaMutQC-filtered mutations returned by multiple variant callers contains more true-positive somatic mutations than the output of a single variant caller or the intersection of multiple callers. The package, source code and documentation are freely available on GitHub (https://github.com/likelet/CaMutQC).
In the R console, enter citation("CaMutQC").
Install the latest and most stable version of CaMutQC with Bioconductor by typing the commands below in R console:
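A typical Bioconductor installation looks like the following. It assumes CaMutQC is available in the Bioconductor release that matches your R version.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install CaMutQC from Bioconductor
BiocManager::install("CaMutQC")
```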
For now, there are three main functional modules in CaMutQC. The first filters cancer somatic mutations through common strategies, and the second offers customized filtration criteria based on cancer types and published papers. CaMutQC is also capable of measuring TMB (Tumor Mutational Burden) through various assays. The required input of most functions in CaMutQC can be obtained by applying the vcfToMAF function to VCF files.
A MAF data frame with special labels from CaMutQC is returned after each filtration, and a filter report offering detailed and organized information is generated.
VCF is a widely used text file format in bioinformatics for storing gene sequence variations. All VCF files should be annotated with VEP before being analyzed with CaMutQC, because annotated VCF files contain more detailed, clinically significant information. Information about VEP and how to run it on a VCF file can be found in the Ensembl VEP documentation.
CaMutQC supports VEP-annotated multi-sample or multi-caller VCF files as input; these files should be located under the same file path. Supported callers: MuTect2, VarScan2, MuSE.
VCF and MAF are both important formats in oncology and bioinformatics, but additional tools are needed to convert between them. The vcfToMAF function in CaMutQC performs this conversion with a one-line command in a few seconds when the input VCF file is VEP-annotated. In addition, the filterGene parameter filters out variants without a Hugo Symbol when it is set to TRUE.
library(CaMutQC)
MAFdat <- vcfToMAF(system.file("extdata", "WES_EA_T_1_mutect2.vep.vcf", package="CaMutQC"))
MAFdat[1:5, 1:13]
## Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
## 1 AGRN 375790 . GRCh38 chr1 1049980
## 2 GATAD2B 57459 . GRCh38 chr1 153806614
## 3 CNOT11 55571 . GRCh38 chr2 101270077
## 4 ANO7 50636 . GRCh38 chr2 241190240
## 5 PDCD1 5133 . GRCh38 chr2 241850724
## End_Position Strand Variant_Classification Variant_Type Reference_Allele
## 1 1049980 + Missense_Mutation SNP G
## 2 153806615 + 3'UTR DEL TT
## 3 101270077 + 3'UTR DEL T
## 4 241190240 + Intron SNP C
## 5 241850726 + 3'UTR DEL GCA
## Tumor_Seq_Allele1 Tumor_Seq_Allele2
## 1 G C
## 2 TT -
## 3 T -
## 4 C T
## 5 GCA -
Load multi-caller data that consists of several VCF files by setting multiVCF to TRUE.
vcfPath <- system.file("extdata/Multi-caller", package="CaMutQC")
multiVCFs <- vcfToMAF(vcfPath, multiVCF=TRUE)
unique(multiVCFs$Tumor_Sample_Barcode)
## [1] "WES_EA_T_1" "TUMOR"
There are two Tumor_Sample_Barcode values after reading the two VCF files under the Multi-caller folder.
After reading a number of classical papers, we collected, sorted and summarized some widely used parameters and their thresholds for cancer somatic mutation filtration. These strategies are implemented through a number of sub-functions that cover widely used criteria such as sequencing quality, strand bias and database selection. In addition, sub-functions are integrated into larger functions. Each function takes a MAF data frame generated by the vcfToMAF function in CaMutQC as input, and returns a labeled MAF data frame as the result.
Sub-functions and their corresponding flags
Main function | Sub-function | Flag |
---|---|---|
mutFilterTech | mutFilterQual | Q |
mutFilterTech | mutFilterSB | S |
mutFilterTech | mutFilterAdj | A |
mutFilterTech | mutFilterNormalDP | N |
mutFilterTech | mutFilterPON | P |
mutFilterTech | FILTER | F |
mutSelection | mutFilterDB | D |
mutSelection | mutFilterType | T |
mutSelection | mutFilterReg | R |
Note: A variant labeled with a flag failed the corresponding filter function. All variants start with the tag '0'.
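To make the tag scheme concrete, the base R sketch below decodes a few tag strings into their flags. The tag values are made up for illustration, and the code only assumes the format described in the note (a leading '0' followed by one letter per failed filter); it is not part of CaMutQC.

```r
# Hypothetical CaTag values in the style described in the note above
ca_tag <- c("0", "0Q", "0QAT", "0QSDT", "0T")

# A variant passed every filter if nothing follows the initial '0'
passed <- ca_tag == "0"

# Remove the leading '0', then split the remainder into single-letter flags
flag_string <- sub("^0", "", ca_tag)
all_flags <- unlist(strsplit(flag_string, ""))
all_flags <- all_flags[nzchar(all_flags)]

# Number of variants carrying each flag
flag_counts <- table(all_flags)

print(passed)
print(flag_counts)
```

Counting flags this way shows at a glance which filter removes the most variants in a data set.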
Sequencing quality parameters such as allele depth (AD), total depth (DP) and variant allele frequency (VAF) are widely used to filter potential artifacts. For convenience and flexibility, the panel parameter in this function applies a set of filtration strategies related to sequencing quality: users can choose between panels such as WES and MSKCC, and can still set individual parameters freely under any panel.
Parameters for Customized, WES and MSKCC panel
Parameter | Customized panel (default) | WES panel | MSKCC panel |
---|---|---|---|
normalDP | 10 | 10 | 10 |
normalAD | Inf* | 1 | 1 |
tumorDP | 20 | 20 | 20 |
tumorAD | 5 | 5 | 10 |
VAF | 0.05 | 0.05 | 0.05 |
VAFratio | 0 | 0 | 5 |
*: Inf here means normalAD is not a filtration criterion in this panel
##
## 0 0Q
## 30 57
Here we can see that 57 mutations get an extra Q flag, which means they fail to pass the filtration on sequencing quality with VAF < 0.01 or VAFratio < 4, or both.
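As a rough illustration of depth and VAF thresholds, the base R sketch below applies the Customized-panel values from the parameter table to a few made-up variants. The column names, the example counts and the rule that a value below its threshold fails are assumptions for illustration, not CaMutQC's internal code.

```r
# Hypothetical read counts for four variants (not taken from the example data)
variants <- data.frame(
    tumorAD  = c(12, 3, 8, 1),     # tumor reads supporting the alternate allele
    tumorDP  = c(60, 40, 15, 50),  # total depth in the tumor sample
    normalDP = c(30, 25, 12, 9)    # total depth in the matched normal sample
)

# VAF is the fraction of tumor reads that support the alternate allele
variants$VAF <- variants$tumorAD / variants$tumorDP

# Thresholds copied from the "Customized panel" column of the parameter table
min_tumorDP  <- 20
min_tumorAD  <- 5
min_normalDP <- 10
min_VAF      <- 0.05

# Assumption: a variant fails if any value falls below its threshold
fails <- variants$tumorDP  < min_tumorDP |
         variants$tumorAD  < min_tumorAD |
         variants$normalDP < min_normalDP |
         variants$VAF      < min_VAF

variants$CaTag <- ifelse(fails, "0Q", "0")
print(variants)
```

Only the first variant keeps the tag '0'; the others fail on allele depth, tumor depth, or several criteria at once.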
Strand bias occurs when the genotypes inferred from the forward strand and the reverse strand disagree. A study showed that post-analysis procedures can cause strand bias, which introduces more SNPs with higher strand bias and in turn results in more false-positive SNPs [1]. Therefore, it is necessary to detect and minimize the effect of strand bias.
At present, there are four widely used methods for strand bias detection. One approach was described in a mitochondrial heteroplasmy study [2]. GATK calculates a strand bias score for each SNP identified, while Samtools puts forward another strand bias score based on Fisher's exact test. Additionally, GATK introduced an updated form of the Fisher Strand Test, the StrandOddsRatio (SOR) annotation, which is believed to be better at measuring strand bias in high-coverage data.
In CaMutQC, either the Fisher Strand Test or the SOR algorithm can be used to evaluate strand bias and filter variants based on the results. By default, strand bias is detected with the SOR algorithm and the cutoff for the strand bias score is set to 3.
##
## 0 0S
## 68 19
In our case, 19 mutations are labeled with the S flag because CaMutQC considers them strand-biased when the cutoff is set to 2.
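For readers who want to see how such scores behave, here is a minimal base R sketch of both measures for a single variant. The SOR function follows the formula given in the GATK StrandOddsRatio documentation as we understand it (pseudocount of 1 per cell), and the Fisher score is the Phred-scaled p-value from `fisher.test`. The read counts are made up, and this is not CaMutQC's own implementation.

```r
# Strand odds ratio for one variant, given read counts per allele and strand.
# A pseudocount of 1 is added to every cell so that empty cells do not
# cause a division by zero.
strand_odds_ratio <- function(ref_fwd, ref_rev, alt_fwd, alt_rev) {
    rf <- ref_fwd + 1
    rr <- ref_rev + 1
    af <- alt_fwd + 1
    ar <- alt_rev + 1

    # Symmetrical ratio R + 1/R, with R = (rf * ar) / (rr * af)
    r <- (rf * ar) / (rr * af)
    sym_ratio <- r + 1 / r

    # Within-allele strand balance (1 means perfectly balanced)
    ref_ratio <- min(rf, rr) / max(rf, rr)
    alt_ratio <- min(af, ar) / max(af, ar)

    log(sym_ratio) + log(ref_ratio) - log(alt_ratio)
}

# Balanced support on both strands: SOR equals log(2), about 0.69
balanced <- strand_odds_ratio(10, 10, 10, 10)

# Alternate allele seen on the forward strand only: SOR is about 5.55
biased <- strand_odds_ratio(20, 20, 15, 0)

# Fisher's exact test on the same 2x2 table, as a Phred-scaled p-value
counts <- matrix(c(20, 20, 15, 0), nrow = 2, byrow = TRUE,
                 dimnames = list(c("ref", "alt"), c("fwd", "rev")))
fs <- -10 * log10(fisher.test(counts)$p.value)

print(c(balanced = balanced, biased = biased, FS = fs))
```

With a cutoff of 3, the balanced variant would pass and the one-sided variant would be labeled.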
The Adjacent Indel tag is used when a somatic SNP/DNP/TNP was possibly caused by misalignment around a germline or somatic insertion/deletion (indel). By default, CaMutQC filters any SNV within 10 bp of an indel with length <= 50 bp found in the tumor sample.
##
## 0 0A
## 85 2
There are 2 point mutations labeled by flag A in the above example, because they are within 15 bps of an indel with length <= 40.
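The idea behind this filter can be sketched in base R as follows. The positions are made up, the distance is measured to the nearest edge of the indel interval, and indel length is taken from the coordinates (for insertions the inserted sequence length would be needed instead); all of these are simplifying assumptions rather than CaMutQC's exact rules.

```r
# Hypothetical variants on one chromosome (positions are made up)
maf <- data.frame(
    Chromosome     = "chr1",
    Start_Position = c(1000, 1008, 1500, 2000),
    End_Position   = c(1000, 1010, 1500, 2000),
    Variant_Type   = c("SNP", "DEL", "SNP", "SNP"),
    stringsAsFactors = FALSE
)

min_interval  <- 10   # window around an indel, in bp
max_indel_len <- 50   # only indels up to this length are considered

is_indel  <- maf$Variant_Type %in% c("INS", "DEL")
indel_len <- maf$End_Position - maf$Start_Position + 1
indels    <- maf[is_indel & indel_len <= max_indel_len, ]

# TRUE if a position lies within min_interval bp of any qualifying indel
# on the same chromosome (distance 0 if it falls inside the indel)
near_indel <- function(chrom, pos) {
    same_chr <- indels[indels$Chromosome == chrom, ]
    if (nrow(same_chr) == 0) return(FALSE)
    gap <- pmax(same_chr$Start_Position - pos, pos - same_chr$End_Position, 0)
    any(gap <= min_interval)
}

close_to_indel <- mapply(near_indel, maf$Chromosome, maf$Start_Position,
                         USE.NAMES = FALSE)

# Only point mutations are labeled; the indel itself is left untouched
flag_a <- !is_indel & close_to_indel
maf$CaTag <- ifelse(flag_a, "0A", "0")
print(maf)
```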
To avoid miscalling germline variants and to improve variant quality [3], CaMutQC supports filtration on normal depth for both dbSNP and non-dbSNP variants, with cutoffs of 19 and 8, respectively.
##
## 0
## 87
Based on the results, all mutations pass this filtration under default settings.
Panel of Normals (PON) is a type of resource used in somatic variant analysis. Basically, if a variant is found in a panel of normals, or in more than two normal samples, it is unlikely to be a driver variant in tumorigenesis or tumor development. PON filtration has been widely used in many studies and projects to discard non-driver variants [4, 5, 6].
Users can generate a PON data set by sequencing a number of normal samples that are as technically similar as possible to the tumor (same exome or genome preparation methods, sequencing technology and so on). Alternatively, a PON data set can be obtained directly from GATK; PON filtration is viewed as one of the most effective filters for false-positive, contamination, and germline variants [3].
Due to potential copyright issues, PON files are NOT contained in CaMutQC package. But we recommend public GATK panels of normals data as PON files, and they can be easily accessed from GATK resource bundle:
GRCh38: gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz
GRCh37: gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf
MAF_pon <- mutFilterPON(MAFdat,
PONfile=system.file("extdata", "PON_test.txt",
package="CaMutQC"), PONformat="txt")
table(MAF_pon$CaTag)
##
## 0 0P
## 86 1
Here, we use a random PON file as an example to display how this function works, and 1 mutation is found in the PON file, and thus labeled by P flag.
Some databases publish germline variants and recurrent artifacts observed in distinct populations. In CaMutQC, based on the parameters we collected [3, 4, 7], potential germline variants are removed based on annotation from those databases (if available), unless the allele frequency of a mutation recorded in those databases is lower than the VAF threshold (0.01) or ClinVar/OMIM/HGMD flags it as pathogenic.
COSMIC (the Catalogue of Somatic Mutations In Cancer) is the most comprehensive resource for exploring the impact of somatic mutations in oncology. The team has assembled a list of genes that are somatically mutated and causally implicated in human cancer [8], which is called the Cancer Gene Census and is updated periodically with new genes. In a VCF file annotated by VEP, the Existing_variation column indicates that a variant is recorded in COSMIC if it contains an annotation ID that starts with COSV, COSM or COSN.
By default, CaMutQC filters variants recorded in ExAC, Genomesprojects1000, ESP6500 and gnomAD, and always keeps variants in COSMIC regardless of whether they are present in any germline database.
##
## 0 0D
## 46 41
We can see from the results that 41 mutations are labeled with the D flag when we set the database VAF cutoff to 0.01 and filter mutations in the dbSNP database, many more than in previous steps. Since this function is part of the candidate variant selection process, more mutations might be labeled because of its strict conditions and thresholds.
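The decision logic described in this section can be sketched in base R. The data frame, its column names (`pop_AF`, `pathogenic`) and the ID separators are hypothetical; the sketch only mirrors the rules stated in the text (population frequency at or above the cutoff, with COSMIC and pathogenic variants always kept) and is not CaMutQC's implementation.

```r
# Hypothetical annotations for four variants
variants <- data.frame(
    Existing_variation = c("rs123", "rs456,COSV5001", "", "rs789"),
    pop_AF             = c(0.20, 0.15, NA, 0.001),  # population allele frequency
    pathogenic         = c(FALSE, FALSE, FALSE, FALSE),
    stringsAsFactors   = FALSE
)
db_vaf <- 0.01  # database VAF cutoff

# A variant counts as COSMIC-recorded if any ID starts with COSV, COSM or COSN.
# Assumption: multiple IDs are separated by ',' or '&'.
in_cosmic <- grepl("(^|[,&])COS[VMN]", variants$Existing_variation)

# Common in a population database (missing frequencies are treated as absent)
common <- !is.na(variants$pop_AF) & variants$pop_AF >= db_vaf

# Label common variants unless COSMIC or a pathogenic annotation rescues them
flag_d <- common & !in_cosmic & !variants$pathogenic
variants$CaTag <- ifelse(flag_d, "0D", "0")
print(variants)
```

The second variant is common in the population database but is kept because it carries a COSMIC identifier.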
Most studies related to cancer somatic mutations keep only certain types of variants in order to better target candidate variants, among which exonic and nonsynonymous are two of the most widely used categories for filtration [3, 9, 10]. In CaMutQC, either category can be chosen in this step; exonic is the default option, while nonsynonymous keeps only non-synonymous variants. More details can be found at Ensembl Variation.
Variant classifications filtered when set as exonic: RNA, Intron, IGR, 5'Flank, 3'Flank, 5'UTR, 3'UTR
Variant classifications filtered when set as nonsynonymous: 3'UTR, 5'UTR, 3'Flank, Targeted_Region, Silent, Intron, RNA, IGR, Splice_Region, 5'Flank, lincRNA, De_novo_Start_InFrame, De_novo_Start_OutOfFrame, Start_Codon_Ins, Start_Codon_SNP, Stop_Codon_Del
##
## 0 0T
## 12 75
##
## In_Frame_Del Missense_Mutation
## 1 11
75 mutations outside the nonsynonymous category are labeled in this step, and the remaining 12 nonsynonymous mutations are more likely to be related to cancer development and progression.
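Conceptually, this step is a membership test on the Variant_Classification column. The base R sketch below uses the exonic list given in this section on a handful of made-up classifications; it is an illustration of the idea, not the package's code.

```r
# Classifications labeled when only exonic variants are kept
# (list taken from this section)
non_exonic <- c("RNA", "Intron", "IGR", "5'Flank", "3'Flank", "5'UTR", "3'UTR")

# Hypothetical Variant_Classification values for five variants
classes <- c("Missense_Mutation", "3'UTR", "Intron", "In_Frame_Del", "Silent")

# Variants whose classification is in the list receive the T flag
ca_tag <- ifelse(classes %in% non_exonic, "0T", "0")
print(data.frame(Variant_Classification = classes, CaTag = ca_tag))
```

Note that a Silent variant is kept under the exonic option, whereas the nonsynonymous option would label it.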
In this step, users can further select variants related to cancer development by providing an additional BED file (or an .rds file with a bed variable in it); variants are then searched only in the target regions covered by the BED file. In addition, the bedFilter parameter can be set to TRUE to clean the BED file (keeping only segments in Chr1-Chr22, ChrX and ChrY).
MAF_reg <- mutFilterReg(MAFdat,
bedFile=system.file("extdata/bed/panel_hg19",
"FlCDx-hg19.rds", package="CaMutQC"))
table(MAF_reg$CaTag)
##
## 0R
## 87
No mutation is within the target region provided in this case, so all mutations get an R flag.
The sub-functions mentioned above are divided into two groups according to their definitions and the categories they belong to, which can be reached through the advanced functions mutFilterTech and mutSelection, respectively. Each advanced function is composed of multiple sub-functions that filter variants from different aspects within the same category. After passing through an advanced filter function, each variant may be labeled with more than one flag showing the filtration results.
In addition, the mutFilterCom function is a higher-level function that combines mutFilterTech and mutSelection, so any parameter of the sub-functions can be set in mutFilterCom.
The mutFilterTech function combines filtration strategies for removing potential artifacts, including sequencing quality, strand bias, normal DP, PON and adjacent indel filtration.
Some variant callers add a tag if a variant passes post-filtration after calling. With CaMutQC, users can set a standard tag found in the FILTER column of the VCF file to keep variants. PASS is set as the default tag.
MAF_tech <- mutFilterTech(MAFdat, panel="Customized", tumorDP=8, minInterval=9,
tagFILTER=NULL, progressbar=FALSE,
PONfile=system.file("extdata", "PON_test.txt",
package="CaMutQC"), PONformat="txt")
table(MAF_tech$CaTag)
##
## 0 0A 0P 0Q 0QA 0S
## 46 1 1 32 1 6
There are 41 mutations labeled by mutFilterTech in the above example, and 1 mutation has 2 flags, suggesting it is more likely to be a false positive under the current settings.
In most cases, basic filtration that removes potential artifacts is not enough to select candidate variants that participate in the formation and development of tumors, because a number of germline variants or variants that do not influence phenotype still remain in the data set. Therefore, candidate variant selection is a necessary step for downstream analyses.
The whole selection process in CaMutQC is composed of database filtration, variant type filtration and region selection, all incorporated in the mutSelection function.
MAF_selec <- mutSelection(MAFdat, dbVAF=0.02, keepType='nonsynonymous', progressbar=FALSE)
table(MAF_selec$CaTag)
##
## 0 0DT 0T
## 12 7 68
12 mutations are selected as candidates by mutSelection after filtering synonymous mutations and mutations with VAF >= 0.02 in databases.
mutFilterCom
A main function of CaMutQC is mutFilterCom, which integrates all sub-functions into one function. It also includes other features that make CaMutQC an interactive and powerful tool. For example, you can export the code, along with the parameters you set, by turning on the codelog setting and specifying codelogFile.
MAFCom <- mutFilterCom(MAFdat, panel="WES", report=FALSE, TMB=FALSE, progressbar=FALSE,
PONfile=system.file("extdata", "PON_test.txt",
package="CaMutQC"), PONformat="txt")
table(MAFCom$CaTag)
##
## 0 0Q 0QAT 0QDT 0QPT 0QSDT 0QST 0QT 0T
## 10 12 2 6 1 1 5 33 17
The mutFilterCom function is the combination of mutFilterTech and mutSelection, which labels 77 mutations in our case. The results above clearly show the status of each mutation, offering users much information for further filtration and analyses.
By default, a detailed filter report is saved automatically each time mutFilterCom is run. An example filter report can be found here.
mutFilterCom also supports the calculation of TMB. Details about TMB can be found in the Mutational analysis section.
MAFCom_tmb <- mutFilterCom(MAFdat, panel="WES", assay="Customized", report=FALSE, TMB=TRUE,
bedFile=system.file("extdata/bed/panel_hg38",
"Pan-cancer-hg38.rds", package="CaMutQC"),
PONfile=system.file("extdata", "PON_test.txt", package="CaMutQC"),
PONformat="txt", progressbar=FALSE, verbose=FALSE)
## Warning in calTMB(maf, bedFile = bedFile, assay = assay, genelist = genelist, : Bed files in CaMutQC are not accurate. The result serves only as a reference.
## Method used to calculate TMB: Customized.
## Estimated TMB is: 0.847.
When running mutFilterCom, mutFilterTech or mutSelection, a progress bar and some messages are displayed by default to show users how the task is progressing, as well as some potential issues. Users can turn off the messages by setting verbose=FALSE. When TMB=TRUE, the TMB is calculated using a specific assay and printed on the screen. TMB is 0.847 in this case.
With CaMutQC, users can filter and select cancer somatic mutations according to cancer type, where the thresholds for all parameters come from classical studies. The mutFilterCan function integrates 10 cancer types so far, with different parameters for each cancer type, for more precise and customized filtration.
Cancer types supported in CaMutQC: COADREAD, BRCA, LIHC, LAML, LCML, UCEC, UCS, BLCA, KIRC, KIRP.
MAFCan <- mutFilterCan(MAFdat, cancerType='LAML', report=FALSE, TMB=FALSE,
progressbar=FALSE,
PONfile=system.file("extdata", "PON_test.txt",
package="CaMutQC"), PONformat="txt")
table(MAFCan$CaTag)
##
## 0 0D 0PD 0Q
## 33 48 1 5
After applying the filtering strategies of Acute myeloid leukemia (LAML), 33 out of 87 mutations are kept.
Sometimes, we may want to apply the same set of strategies as another study so that results are comparable. So far, the filtering strategies used in five studies are provided in CaMutQC. By passing one of the references in the correct format to the mutFilterRef function, all filtering strategies in that study are applied automatically to your data.
MAFRef <- mutFilterRef(MAFdat, reference="Zhu_et_al-Nat_Commun-2020-KIRP",
report=FALSE, TMB=FALSE, progressbar=FALSE,
PONfile=system.file("extdata", "PON_test.txt",
package="CaMutQC"), PONformat="txt")
table(MAFRef$CaTag)
##
## 0 0D 0PD 0Q 0QD
## 34 30 1 12 10
After applying the same strategies used in Zhu_et_al-Nat_Commun-2020-KIRP, 34 mutations are left without any flag.
Tumor Mutational Burden (TMB) refers to the number of somatic non-synonymous mutations per megabase pair (Mb) in a specific genomic region. In 2015, tumor non-synonymous mutation burden was first confirmed to be related to PD1/PD-L1 cancer immunotherapy [11]. By analyzing the mutation burden of patients with non-small cell lung cancer together with clinical response, survival rate and other indicators, researchers confirmed that the higher the TMB of a cancer patient, the better the effect of tumor immunotherapy. This conclusion was subsequently verified in other cancer types such as malignant melanoma [12] and small cell lung cancer [13]. Therefore, TMB has become one of the predictive biomarkers for immune checkpoint inhibitor immunotherapy in cancer treatment [14].
There are many assays for TMB measurement, including WGS, WES, targeted sequencing using gene panels, and sequencing of circulating tumor DNA in tumor samples or blood [15]. Unlike in scientific research, the conventional method of calculating TMB in clinical practice is targeted sequencing of tumor samples, which hybridizes and captures the exon and intron regions of a certain number of cancer-related genes, without the need for WES. Currently, the most widely used panels are FoundationOneCDx (F1CDx) and MSK-IMPACT [9]. The former only needs tumor samples to be sequenced, while the latter requires both the tumor sample and its matched normal sample to be sequenced. Both have certification from the US Food and Drug Administration (FDA).
CaMutQC supports four assays for TMB calculation: FoundationOne, MSK-IMPACT (3 versions of the gene list), the Pan-cancer panel [16] and WES. By default, TMB is calculated using the MSK-IMPACT method (gene panel version 3, 468 genes). Users are also free to apply their own methods by setting the assay parameter to Customized.
Note: the bed region files mentioned above are generated only from CDS regions, NOT the exact bed region, so the TMB results are only for reference.
tmb_value <- calTMB(MAFdat, assay='Customized',
bedFile=system.file("extdata/bed/panel_hg38","Pan-cancer-hg38.rds",
package="CaMutQC"))
## Warning in calTMB(MAFdat, assay = "Customized", bedFile = system.file("extdata/bed/panel_hg38", : Bed files in CaMutQC are not accurate. The result serves only as a reference.
## [1] 0.847
The TMB value estimated by CaMutQC for this example MAF is 0.847. This is only an example, so the value has no clinical meaning to interpret, but yours may.
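The arithmetic behind a TMB estimate, non-synonymous mutations divided by the size of the covered region in Mb, can be sketched in base R. The regions, mutations and the set of classifications counted as non-synonymous are all made up for illustration; this is not how calTMB is implemented, and the assay-specific rules it applies are not reproduced here.

```r
# Hypothetical target regions in BED style (0-based start, end-exclusive)
bed <- data.frame(
    chrom = c("chr1", "chr2"),
    start = c(0, 0),
    end   = c(1000000, 500000),
    stringsAsFactors = FALSE
)
region_mb <- sum(bed$end - bed$start) / 1e6   # 1.5 Mb in total

# Hypothetical mutations that passed filtering (1-based positions)
muts <- data.frame(
    chrom = c("chr1", "chr1", "chr2", "chr2", "chr3", "chr1"),
    pos   = c(1500, 2000000, 2500, 400000, 100, 999999),
    class = c("Missense_Mutation", "Missense_Mutation", "Silent",
              "Nonsense_Mutation", "Missense_Mutation", "Frame_Shift_Del"),
    stringsAsFactors = FALSE
)

# Assumption: these classifications are counted as non-synonymous
nonsyn_classes <- c("Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
                    "Frame_Shift_Del", "Frame_Shift_Ins",
                    "In_Frame_Del", "In_Frame_Ins")

# TRUE if a 1-based position falls inside any BED interval on that chromosome
in_region <- function(chrom, pos) {
    any(bed$chrom == chrom & pos > bed$start & pos <= bed$end)
}
covered <- mapply(in_region, muts$chrom, muts$pos, USE.NAMES = FALSE)

# Count covered non-synonymous mutations and normalize by region size
counted <- covered & muts$class %in% nonsyn_classes
tmb <- sum(counted) / region_mb
print(tmb)
```

Here three mutations are counted over 1.5 Mb, giving a TMB of 2 mutations per Mb.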
After verification on published data sets, we believe that combining CaMutQC-filtered mutations from multiple variant callers is a good way to reduce the bias of a single mutation caller while rescuing potential false-negative mutations. In this pipeline, the same data set processed by three variant callers (MuSE, MuTect2 and VarScan2) first goes through CaMutQC filtration separately, and labeled mutations are removed. Then the processMut function takes the three MAF data frames and returns the union of mutations. processMut can also take the intersection of the MAFs when asked.
maf_MuSE <- vcfToMAF(system.file("extdata/Multi-caller",
"WES_EA_T_1.MuSE.vep.vcf", package="CaMutQC"))
maf_MuSE_f <- mutFilterCom(maf_MuSE, report=FALSE, TMB=FALSE,
PONfile=system.file("extdata",
"PON_test.txt", package="CaMutQC"),
PONformat = "txt", progressbar=FALSE)
maf_VarScan2 <- vcfToMAF(system.file("extdata/Multi-caller",
"WES_EA_T_1_varscan_filter_snp.vep.vcf", package="CaMutQC"))
maf_VarScan2_f <- mutFilterCom(maf_VarScan2, report=FALSE, TMB=FALSE,
PONfile=system.file("extdata",
"PON_test.txt", package="CaMutQC"),
PONformat="txt", progressbar=FALSE)
MAFdat_f <- mutFilterCom(MAFdat, report=FALSE, TMB=FALSE,
PONfile=system.file("extdata", "PON_test.txt", package= "CaMutQC"),
PONformat="txt", progressbar=FALSE)
mafs <- list(maf_MuSE_f, maf_VarScan2_f, MAFdat_f)
maf_union <- processMut(mafs, processMethod = "union")
maf_union
## Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position
## 1 C1QTNF2 GRCh38 chr5 160354994 160354994
## 2 PTPN13 GRCh38 chr4 86769842 86769842
## 3 SLC6A7 GRCh38 chr5 150204585 150204585
## 4 BAZ1B GRCh38 chr7 73469636 73469636
## 5 DNAJA3 GRCh38 chr16 4454855 4454855
## 6 MYBBP1A GRCh38 chr17 4545725 4545725
## 7 KIF1C GRCh38 chr17 5007047 5007047
## 8 SH2D3A GRCh38 chr19 6755018 6755018
## 9 AGRN GRCh38 chr1 1049980 1049980
## 10 MUC20 GRCh38 chr3 195725999 195725999
## 11 WDR17 GRCh38 chr4 176120051 176120052
## 12 COL22A1 GRCh38 chr8 138737577 138737577
## 13 CCDC7 GRCh38 chr10 32567833 32567833
## 14 FMNL3 GRCh38 chr12 49649096 49649096
## 15 NEMF GRCh38 chr14 49840710 49840710
## 16 L2HGDH GRCh38 chr14 50246124 50246125
## 17 CCDC33 GRCh38 chr15 74280669 74280669
## 18 MPG GRCh38 chr16 79675 79675
## 19 RNF157 GRCh38 chr17 76212499 76212500
## 20 SLC2A11 GRCh38 chr22 23875150 23875150
## Variant_Classification Variant_Type Reference_Allele
## 1 Silent SNP G
## 2 Silent SNP T
## 3 Silent SNP G
## 4 Nonsense_Mutation SNP G
## 5 Nonsense_Mutation SNP G
## 6 Missense_Mutation SNP T
## 7 Missense_Mutation SNP C
## 8 Missense_Mutation SNP T
## 9 Missense_Mutation SNP G
## 10 Missense_Mutation SNP G
## 11 Missense_Mutation DNP AG
## 12 Missense_Mutation SNP C
## 13 Missense_Mutation SNP C
## 14 Silent SNP C
## 15 Splice_Region SNP C
## 16 Splice_Region INS -
## 17 Missense_Mutation SNP G
## 18 Missense_Mutation SNP C
## 19 Targeted_Region INS -
## 20 Silent SNP G
## Tumor_Seq_Allele2
## 1 A
## 2 C
## 3 C
## 4 C
## 5 T
## 6 C
## 7 T
## 8 C
## 9 C
## 10 A
## 11 GA
## 12 A
## 13 T
## 14 T
## 15 G
## 16 A
## 17 T
## 18 A
## 19 TCCTGACCTCAGGTGATCCATCCGCCTCGGCCTCCCAAAGTGCTGGG
## 20 A
Here, three datasets are first converted from VCF to MAF, then filtered by mutFilterCom, and finally combined by taking their union. Because even the same mutation has different depths, VAFs, etc. in different datasets, only the 9 columns displayed above are kept after taking the union.
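The difference between union and intersection can be illustrated with a small base R sketch that identifies each mutation by a key built from chromosome, position and alleles. The assignment of mutations to callers below is hypothetical, and the key definition is an assumption; processMut may match mutations differently.

```r
# Build a per-mutation key from position and alleles
make_key <- function(maf) {
    paste(maf$Chromosome, maf$Start_Position,
          maf$Reference_Allele, maf$Tumor_Seq_Allele2, sep = ":")
}

# Hypothetical filtered call sets from three callers
caller_a <- data.frame(Chromosome = c("chr1", "chr5"),
                       Start_Position = c(1049980L, 160354994L),
                       Reference_Allele = c("G", "G"),
                       Tumor_Seq_Allele2 = c("C", "A"),
                       stringsAsFactors = FALSE)
caller_b <- data.frame(Chromosome = c("chr1", "chr4"),
                       Start_Position = c(1049980L, 86769842L),
                       Reference_Allele = c("G", "T"),
                       Tumor_Seq_Allele2 = c("C", "C"),
                       stringsAsFactors = FALSE)
caller_c <- data.frame(Chromosome = "chr1",
                       Start_Position = 1049980L,
                       Reference_Allele = "G",
                       Tumor_Seq_Allele2 = "C",
                       stringsAsFactors = FALSE)

keys <- lapply(list(caller_a, caller_b, caller_c), make_key)

# Union keeps a mutation reported by any caller;
# intersection keeps only mutations reported by every caller
union_keys <- Reduce(union, keys)
inter_keys <- Reduce(intersect, keys)

print(union_keys)
print(inter_keys)
```

The union retains all three distinct mutations, while the intersection retains only the one that every caller reports.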
Tired of finding or memorizing the best parameters? You can share your own filtration strategies or parameter sets with the CaMutQC community by opening a new issue with a parameter set label. Every six months, CaMutQC will be updated to include top-rated parameter sets in the mutFilterRef function, named after the author's GitHub username. Start using, sharing and contributing now!
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] CaMutQC_1.3.0 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 shape_1.4.6.1 sys_3.4.3
## [4] jsonlite_1.8.9 magrittr_2.0.3 ggtangle_0.0.3
## [7] farver_2.1.2 rmarkdown_2.28 GlobalOptions_0.1.2
## [10] fs_1.6.4 zlibbioc_1.51.2 vctrs_0.6.5
## [13] memoise_2.0.1 ggtree_3.13.2 htmltools_0.5.8.1
## [16] gridGraphics_0.5-1 pracma_2.4.4 sass_0.4.9
## [19] bslib_0.8.0 htmlwidgets_1.6.4 plyr_1.8.9
## [22] cachem_1.1.0 buildtools_1.0.0 igraph_2.1.1
## [25] lifecycle_1.0.4 iterators_1.0.14 pkgconfig_2.0.3
## [28] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0
## [31] gson_0.1.0 clue_0.3-65 GenomeInfoDbData_1.2.13
## [34] digest_0.6.37 aplot_0.2.3 enrichplot_1.25.5
## [37] colorspace_2.1-1 maftools_2.21.3 patchwork_1.3.0
## [40] AnnotationDbi_1.69.0 S4Vectors_0.43.2 RSQLite_2.3.7
## [43] org.Hs.eg.db_3.20.0 vegan_2.6-8 fansi_1.0.6
## [46] httr_1.4.7 mgcv_1.9-1 compiler_4.4.1
## [49] withr_3.0.2 bit64_4.5.2 doParallel_1.0.17
## [52] BiocParallel_1.39.0 DBI_1.2.3 R.utils_2.12.3
## [55] MASS_7.3-61 MesKit_1.15.0 rjson_0.2.23
## [58] DNAcopy_1.79.0 permute_0.9-7 tools_4.4.1
## [61] ape_5.8 quadprog_1.5-8 R.oo_1.26.0
## [64] glue_1.8.0 nlme_3.1-166 GOSemSim_2.31.2
## [67] grid_4.4.1 cluster_2.1.6 reshape2_1.4.4
## [70] memuse_4.2-3 fgsea_1.31.6 generics_0.1.3
## [73] gtable_0.3.6 R.methodsS3_1.8.2 tidyr_1.3.1
## [76] pinfsc50_1.3.0 data.table_1.16.2 utf8_1.2.4
## [79] XVector_0.45.0 BiocGenerics_0.51.3 ggrepel_0.9.6
## [82] foreach_1.5.2 pillar_1.9.0 stringr_1.5.1
## [85] yulab.utils_0.1.7 circlize_0.4.16 splines_4.4.1
## [88] dplyr_1.1.4 treeio_1.29.2 lattice_0.22-6
## [91] survival_3.7-0 bit_4.5.0 tidyselect_1.2.1
## [94] GO.db_3.20.0 ComplexHeatmap_2.21.1 maketools_1.3.1
## [97] Biostrings_2.73.2 knitr_1.48 IRanges_2.39.2
## [100] stats4_4.4.1 xfun_0.48 Biobase_2.65.1
## [103] matrixStats_1.4.1 DT_0.33 stringi_1.8.4
## [106] UCSC.utils_1.1.0 lazyeval_0.2.2 ggfun_0.1.7
## [109] yaml_2.3.10 evaluate_1.0.1 codetools_0.2-20
## [112] tibble_3.2.1 qvalue_2.37.0 BiocManager_1.30.25
## [115] ggplotify_0.1.2 cli_3.6.3 munsell_0.5.1
## [118] jquerylib_0.1.4 Rcpp_1.0.13 GenomeInfoDb_1.41.2
## [121] vcfR_1.15.0 png_0.1-8 parallel_4.4.1
## [124] ggplot2_3.5.1 blob_1.2.4 mclust_6.1.1
## [127] clusterProfiler_4.13.4 DOSE_3.99.1 phangorn_2.12.1
## [130] viridisLite_0.4.2 tidytree_0.4.6 ggridges_0.5.6
## [133] scales_1.3.0 purrr_1.0.2 crayon_1.5.3
## [136] GetoptLong_1.0.5 rlang_1.1.4 cowplot_1.1.3
## [139] fastmatch_1.1-4 KEGGREST_1.45.1
1. Guo Y, Li J, Li CI, Long J, Samuels DC, Shyr Y. The effect of strand bias in Illumina short-read sequencing data. BMC Genomics. 2012;13:666. Published 2012 Nov 24. doi:10.1186/1471-2164-13-666
2. Guo Y, Cai Q, Samuels DC, et al. The use of next generation sequencing technology to study the effect of radiation therapy on mitochondrial DNA mutation. Mutat Res. 2012;744(2):154-160. doi:10.1016/j.mrgentox.2012.02.006
3. Ellrott K, Bailey MH, Saksena G, et al. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Cell Syst. 2018;6(3):271-281.e7. doi:10.1016/j.cels.2018.03.002
4. Pereira B, Chin SF, Rueda OM, et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat Commun. 2016;7:11479. Published 2016 May 10. doi:10.1038/ncomms11479
5. Brastianos PK, Carter SL, Santagata S, et al. Genomic Characterization of Brain Metastases Reveals Branched Evolution and Potential Therapeutic Targets. Cancer Discov. 2015;5(11):1164-1177. doi:10.1158/2159-8290.CD-15-0369
6. Sethi NS, Kikuchi O, Duronio GN, et al. Early TP53 alterations engage environmental exposures to promote gastric premalignancy in an integrative mouse model. Nat Genet. 2020;52(2):219-230. doi:10.1038/s41588-019-0574-9
7. Xue R, Chen L, Zhang C, et al. Genomic and Transcriptomic Profiling of Combined Hepatocellular and Intrahepatic Cholangiocarcinoma Reveals Distinct Molecular Subtypes. Cancer Cell. 2019;35(6):932-947.e8. doi:10.1016/j.ccell.2019.04.007
8. Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4(3):177-183. doi:10.1038/nrc1299
9. Cheng DT, Mitchell TN, Zehir A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J Mol Diagn. 2015;17(3):251-264. doi:10.1016/j.jmoldx.2014.12.006
10. Sakamoto H, Attiyeh MA, Gerold JM, et al. The Evolutionary Origins of Recurrent Pancreatic Cancer. Cancer Discov. 2020;10(6):792-805. doi:10.1158/2159-8290.CD-19-1508
11. Rizvi NA, Hellmann MD, Snyder A, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science. 2015;348(6230):124-128. doi:10.1126/science.aaa1348
12. Snyder A, Makarov V, Merghoub T, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma [published correction appears in N Engl J Med. 2018 Nov 29;379(22):2185]. N Engl J Med. 2014;371(23):2189-2199. doi:10.1056/NEJMoa1406498
13. Hellmann MD, Callahan MK, Awad MM, et al. Tumor Mutational Burden and Efficacy of Nivolumab Monotherapy and in Combination with Ipilimumab in Small-Cell Lung Cancer [published correction appears in Cancer Cell. 2019 Feb 11;35(2):329]. Cancer Cell. 2018;33(5):853-861.e4. doi:10.1016/j.ccell.2018.04.001
14. Lee M, Samstein RM, Valero C, Chan TA, Morris LGT. Tumor mutational burden as a predictive biomarker for checkpoint inhibitor immunotherapy. Hum Vaccin Immunother. 2020;16(1):112-115. doi:10.1080/21645515.2019.1631136
15. Stenzinger A, Allen JD, Maas J, et al. Tumor mutational burden standardization initiatives: Recommendations for consistent tumor mutational burden assessment in clinical samples to guide immunotherapy treatment decisions. Genes Chromosomes Cancer. 2019;58(8):578-588. doi:10.1002/gcc.22733
16. Xu Z, Dai J, Wang D, et al. Assessment of tumor mutation burden calculation from gene panel sequencing data. Onco Targets Ther. 2019;12:3401-3409. Published 2019 May 6. doi:10.2147/OTT.S196638