TCGAbiolinks: Searching, downloading and visualizing mutation files

Search and Download

TCGAbiolinks provides a few functions to download mutation data from GDC. There are three options to download the data:

  1. Use GDCquery, GDCdownload and GDCprepare to download MAF aligned against hg38
  2. Use GDCquery, GDCdownload and GDCprepare to download MAF aligned against hg19
  3. Use getMC3MAF(), to download MC3 MAF from https://gdc.cancer.gov/about-data/publications/mc3-2017

Mutation data (hg38)

This example will download Aggregate GDC MAFs. For more information, please see https://github.com/NCI-GDC/gdc-maf-tool and the GDC documentation.

library(TCGAbiolinks)
library(DT)

# Query the open-access masked somatic mutations (hg38) for TCGA-CHOL
query <- GDCquery(
    project = "TCGA-CHOL", 
    data.category = "Simple Nucleotide Variation", 
    access = "open",
    data.type = "Masked Somatic Mutation", 
    workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
# Only the first 20 rows, to make rendering faster
datatable(maf[1:20,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
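Before plotting, a quick first look at the prepared table can be done in base R. The sketch below counts how many distinct samples carry at least one mutation in each gene, using the standard MAF columns `Hugo_Symbol` and `Tumor_Sample_Barcode`. The function name `mutated_samples_per_gene` and the toy rows are hypothetical and for illustration only; with real data you would pass the `maf` data frame returned by `GDCprepare()`.

```r
# Hypothetical helper: count each gene once per sample, then count samples per gene
mutated_samples_per_gene <- function(maf_df) {
  pairs <- unique(maf_df[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])
  sort(table(pairs$Hugo_Symbol), decreasing = TRUE)
}

# Hypothetical toy rows; TP53 appears twice in S1 but counts once for that sample
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "IDH1", "TP53", "KRAS"),
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)

gene_counts <- mutated_samples_per_gene(toy_maf)
print(gene_counts)
```

On the real data, `head(mutated_samples_per_gene(maf), 10)` gives a rough list of the most frequently mutated genes in TCGA-CHOL.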

Mutation data MC3 file

This will download the MC3 MAF file from https://gdc.cancer.gov/about-data/publications/mc3-2017 and add the project each sample belongs to.

maf <- getMC3MAF()

Visualize the data

To visualize the data you can use the Bioconductor package maftools. For more information, please check its vignette.

library(TCGAbiolinks)
library(maftools)
library(dplyr)
library(DT)
query <- GDCquery(
    project = "TCGA-CHOL", 
    data.category = "Simple Nucleotide Variation", 
    access = "open",
    data.type = "Masked Somatic Mutation", 
    workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
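When the data frame is converted with `read.maf`, maftools separates non-synonymous variants from the rest based on `Variant_Classification`. The sketch below reproduces that split in base R so you can inspect which rows fall on each side. The class list is our reading of maftools' default `vc_nonSyn` argument and is an assumption; check `?maftools::read.maf` for your installed version. The helper `split_by_effect` and the toy rows are hypothetical.

```r
# Assumed to match the default `vc_nonSyn` of maftools::read.maf()
non_syn_classes <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# Hypothetical helper: split rows into non-synonymous and everything else
split_by_effect <- function(maf_df, classes = non_syn_classes) {
  is_non_syn <- maf_df$Variant_Classification %in% classes
  list(non_synonymous = maf_df[is_non_syn, , drop = FALSE],
       other = maf_df[!is_non_syn, , drop = FALSE])
}

# Hypothetical toy rows for illustration only
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "KRAS", "TTN", "IDH1"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Silent", "Intron"),
  stringsAsFactors = FALSE
)

parts <- split_by_effect(toy_maf)
sapply(parts, nrow)
```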

# Convert the data frame into a maftools MAF object
maf <- maf %>% maftools::read.maf
datatable(getSampleSummary(maf),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
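With `addStat = 'median'`, the summary plot annotates the median number of variants per sample. The base R sketch below computes that statistic from a MAF-like data frame. As far as we understand maftools, its per-sample counts use the non-synonymous variants kept by `read.maf`, so apply this to that subset if you want comparable numbers. The helper `variants_per_sample` and the toy rows are hypothetical.

```r
# Hypothetical helper: number of variant rows per tumor sample
variants_per_sample <- function(maf_df) {
  counts <- table(maf_df$Tumor_Sample_Barcode)
  setNames(as.integer(counts), names(counts))
}

# Hypothetical toy rows: S1 has 3 variants, S2 has 1, S3 has 2
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)

per_sample <- variants_per_sample(toy_maf)
median(per_sample)  # median of 3, 1 and 2 is 2
```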

oncoplot(maf = maf, top = 10, removeNonMutated = TRUE)
# Classify SNVs into transitions and transversions (synonymous ones included)
titv_res <- titv(maf = maf, plot = FALSE, useSyn = TRUE)
# Plot the Ti/Tv summary
plotTiTv(res = titv_res)
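The Ti/Tv summary rests on a simple rule: a single-base substitution is a transition when it stays within purines (A, G) or within pyrimidines (C, T), and a transversion otherwise. The sketch below applies that rule in base R, assuming the alternate allele is in `Tumor_Seq_Allele2` as in GDC MAFs. It is not maftools' implementation (which additionally collapses changes into six pyrimidine-referenced classes), and the helper `classify_titv` and the toy rows are hypothetical.

```r
# Hypothetical helper: label each SNV as "Ti" or "Tv"; non-SNVs become NA
classify_titv <- function(ref, alt) {
  purines <- c("A", "G")
  pyrimidines <- c("C", "T")
  bases <- c(purines, pyrimidines)
  # Single-base change between two different nucleotides
  is_snv <- nchar(ref) == 1 & nchar(alt) == 1 &
    ref %in% bases & alt %in% bases & ref != alt
  # Transition: purine <-> purine or pyrimidine <-> pyrimidine
  transition <- (ref %in% purines & alt %in% purines) |
    (ref %in% pyrimidines & alt %in% pyrimidines)
  out <- ifelse(transition, "Ti", "Tv")
  out[!is_snv] <- NA_character_
  out
}

# Hypothetical toy rows: two transitions, two transversions, an insertion, a non-change
toy <- data.frame(
  Reference_Allele = c("C", "A", "G", "T", "-", "C"),
  Tumor_Seq_Allele2 = c("T", "G", "T", "A", "A", "C"),
  stringsAsFactors = FALSE
)
toy$titv_class <- classify_titv(toy$Reference_Allele, toy$Tumor_Seq_Allele2)
table(toy$titv_class, useNA = "ifany")
```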