Title: | Statistical Analysis of Regions with CNVs |
---|---|
Description: | Imports a cov/coverage file (normalised read coverages from BAM files) and a cnv file (list of CNVs - similiar to a BED file) from WES/ WGS CNV (copy number variation) detection pipelines and utilises several metrics to weigh the likelihood of a sample containing a detected CNV being a true CNV or a false positive. Highly useful for diagnostic testing to filter out false positives to provide clinicians with fewer variants to interpret. SARC uniquely only used cov and csv (similiar to BED file) files which are the common CNV pipeline calling filetypes, and can be used as to supplement the Interactive Genome Browser (IGV) to generate many figures automatedly, which can be especially helpful in large cohorts with 100s-1000s of patients. |
Authors: | Krutik Patel [aut, cre] |
Maintainer: | Krutik Patel <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-11-19 04:16:56 UTC |
Source: | https://github.com/bioc/SARC |
For the length of the CNV (+padding, if this was done), exons and gene symbol information will be attached in each Grange object.
addExonsGenes(RE, covgranges, txdb, txgene)
addExonsGenes(RE, covgranges, txdb, txgene)
RE |
RaggedExperiment object used to store all information. |
covgranges |
List of grange objects created by the SARC::regionGrangeMake function. This should be found in the metadata of the RE object. |
txdb |
Loaded txDB object e.g. TxDb.Hsapiens.UCSC.hg38.knownGene. Standard usage from the GenomicFeatures tutorial https://kasperdanielhansen.github.io/genbioconductor/html/GenomicFeatures.html. |
txgene |
List of genes and exons associated with each genomic region of interest. Requires specific species package to make e.g. Homo.sapiens for human samples. |
A list of grange objects with exons and genes added. One grange object for each detected CNV. Will be stored in the RE object provided as metadata.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[5]], txdb = txdb, txgene = txgene)
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[5]], txdb = txdb, txgene = txgene)
Apply anova on the CNV data. For each CNV listed in the input cnv file, the read-depths will be used to calculate if this region of the DNA, in this sample, is significantly different from other samples in the cohort of samples, at this same region.
anovaOnCNV(RE, cnv, anovacov, nameofnewdf = "CNVanova")
anovaOnCNV(RE, cnv, anovacov, nameofnewdf = "CNVanova")
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. |
anovacov |
List of dataframes which contain the required information to perform anova for each region with a detected CNV. Generated from prepAnova, and stored as metadata. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVanova. |
A new cnv dataframe with an additional column for the p-value from anova. Will be stored as an experiment in the RE object.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], start = metadata(SARC)[[2]], end = metadata(SARC)[[3]]) SARC <- anovaOnCNV(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], anovacov = metadata(SARC)[[4]])
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], start = metadata(SARC)[[2]], end = metadata(SARC)[[3]]) SARC <- anovaOnCNV(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], anovacov = metadata(SARC)[[4]])
Flags the confidence of the CNV being a true CNV. The flags range from Very Unconfident - Very Confident. Our clinicians preferred variants to be flagged rather than filtered out - so we simply do this. Confidence scores will be added to the cnv file given. The cnv file should be one which contains all the statistical results from other functions of the SARC package.
cnvConfidence( RE, cnv, ph = FALSE, m1 = 0.2, m2 = 0.8, m3 = 1.2, m4 = 1.8, nameofnewdf = "CNVrank" )
cnvConfidence( RE, cnv, ph = FALSE, m1 = 0.2, m2 = 0.8, m3 = 1.2, m4 = 1.8, nameofnewdf = "CNVrank" )
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. Must use one which contains at least MeanScore, Qlow, Qhigh and anova p-values. If Post-hoc tests have also been performed, these results can also be used by this function. |
ph |
If Dunnet tests was performed, the p-value from the tests can be taken into account. Default is FALSE. |
m1 |
Value used to cut-off the highest mean score allowed for HOMOZYGOUS DELECTIONS, and lowest cut-off for HETEROZYGOUS DELETIONS. Default is 0.2. |
m2 |
Value used to cut-off the highest mean score allowed for HETEROZYGOUS DELETINS. Default is 0.8. |
m3 |
Value used to cut-off the lowest mean score allowed for HETEROZYGOUS DUPLICATIONS. Default is 1.2. |
m4 |
Value used to cut-off the lowest mean score allowed for HETEROZYGOUS DUPLICATIONS. Default is 1.8. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVrank. |
A new cnv file with additional columns which describe our level of confidence of the detected CNV being a true CNV.
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[3]], q1 =.1, q2 = .9) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], start = metadata(SARC)[[2]], end=metadata(SARC)[[3]]) SARC <- anovaOnCNV(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], anovacov = metadata(SARC)[[8]]) SARC <- cnvConfidence(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[4]])
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[3]], q1 =.1, q2 = .9) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], start = metadata(SARC)[[2]], end=metadata(SARC)[[3]]) SARC <- anovaOnCNV(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], anovacov = metadata(SARC)[[8]]) SARC <- cnvConfidence(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[4]])
Add Genes and Exons to the cnv file in order to show the range the detected CNV extends. This is useful information for clinicians/ diagnostic scientists when interpreting the variants. Note - if padding was performed during plotCovPrep step, the CNV range would be extended by the padding so extra exons may be included in these cases.
pasteExonsGenes(RE, setup, cnv, nameofnewdf = "CNVexonsgenes")
pasteExonsGenes(RE, setup, cnv, nameofnewdf = "CNVexonsgenes")
RE |
RaggedExperiment object used to store all information. |
setup |
List of dataframes which have been processed for plotting. This will be stored as metadata in the RE object after SARC::setupCNVPlot. |
cnv |
cnv file containing CNVs which the user wishes to generate plots for. It is recommended that the most recently created cnv file is used. Check print(RE) to see more cnv files created by SARC. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVexonsgenes. |
A new cnv file with an additional column which lists the genes and exons which the detected variant ranges.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]]) SARC <- pasteExonsGenes(RE = SARC, setup = metadata(SARC)[[8]], cnv = metadata(SARC)[['CNVlist']][[1]])
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]]) SARC <- pasteExonsGenes(RE = SARC, setup = metadata(SARC)[[8]], cnv = metadata(SARC)[['CNVlist']][[1]])
Applies post-hoc test Dunnet test on each CNV in the cnv file. Performs a pair-wise test between the sample where the CNV was detected and each other sample at the same genomic region. WARNING - very slow, only use if a small number of CNVs / samples are available. Other tests e.g. Quantile Distribution are more effective with larger cohorts.
phDunnetonCNV(RE, cnv, anovacov, nameofnewdf = "CNVdunnet")
phDunnetonCNV(RE, cnv, anovacov, nameofnewdf = "CNVdunnet")
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended to use the most recently created cnv file. Use print(RE) to check. |
anovacov |
List of dataframes which contain the required information to perform anova for each region with a detected CNV. Generated from prepAnova. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVdunnet. |
A new cnv dataframe with an additional column of the sumlog p-value from each dunnet test performed for the CNV entry.
Plots of the region of the DNA from several samples where a CNV is detected. The sample with the detected CNV, from the cnv file, will be highlighted in purple. Additional specificity such as gene of interest, or batch of WES/ WGS can be specified. This function is made to be easily looped for multiple CNVs. It is also made as a quick and more visually preferable alternative to looking for true CNVs visually.
plotCNV( cnv, setup, FilteredCNV = 1, batch = NULL, nSamples = NULL, gene = NULL, log = NULL, logbase = 2 )
plotCNV( cnv, setup, FilteredCNV = 1, batch = NULL, nSamples = NULL, gene = NULL, log = NULL, logbase = 2 )
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended to use the most recently generated cnv file. Check print(RE) to see if cnv files from other analyses have been generated. |
setup |
ist of dataframes which have been processed for plotting. This will be stored as metadata in the RE object after SARC::setupCNVPlot. |
FilteredCNV |
Which CNV from the list of dataframes (setup) should be plotted. Can be automated within a for-loop. Default is 1. |
batch |
If WES/ WGS was performed in batches, users may wish to plot samples from different batches separately. In this case, a column called 'BATCH' should be found in the cnv file. Different batches should be labelled with differentiating strings, and this string should be added to the batch parameter. Default is NULL. |
nSamples |
Numbers of samples to be printed in the plots. Default is 10. Add an integer to represent how many subplots should be made. The higher the number, the more difficult the plot will be for viewing. |
gene |
String of a gene of interest. Only this gene will be printed. It must match standard gene symbols e.g. FCGR3A, P53. Only one string should be inputted at a time. The default is null. |
log |
Applies log normalisation to reads. This can be helpful in WES data, where reads can be quite disperse across a short region of DNA. Recommended is 2, and default is NULL. |
logbase |
Base to log read depth by - default is 2. |
A grid plot showing the read-depths for a specific regions of the the genome. This region will contain one samples which will have had a CNV detected. Heat-plot like colouring will be used to visually show if the sample has a significant change at this region, in contrast to several other samples.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]]) p <- plotCNV(cnv = metadata(SARC)[['CNVlist']][[1]], setup = metadata(SARC)[[8]], FilteredCNV=1)
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]]) p <- plotCNV(cnv = metadata(SARC)[['CNVlist']][[1]], setup = metadata(SARC)[[8]], FilteredCNV=1)
Prepares plotting of coverage at specific regions of the DNA - where CNVs have been detected.
plotCovPrep(RE, cnv, n1 = 0, n2 = 0, startlist, endlist)
plotCovPrep(RE, cnv, n1 = 0, n2 = 0, startlist, endlist)
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. Must use one which contains at least MeanScore, Qlow, Qhigh and anova p-values. If Post-hoc tests have also been performed, these results can also be used by this function. |
n1 |
How many rows to pad the start site of each CNV - default is 0. |
n2 |
How many rows to pad the end site of each CNV - default is 0. |
startlist |
List of start sites created from SARC::regionSet and stored as metadata. |
endlist |
List of end sites created from SARC::regionSet and stored as metadata. |
A new list of dataframes, each df is a region where a CNV was detected.
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
Function for annovaOnCNV.This function sets up samples for anova analysis. Can also be used as input for Dunnet analysis.
prepAnova(RE, cnv, startlist, endlist)
prepAnova(RE, cnv, startlist, endlist)
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended that the most recently created cnv file is used. Check print(RE) to see more cnv files created by SARC. |
startlist |
List of start sites created from SARC::regionSet and stored as metadata. |
endlist |
List of end sites created from SARC::regionSet and stored as metadata. |
Additional list of dataframes which will be needed to calculate anova for each region with a detected CNV.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[5]], q1=.1, q2=.9) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[5]], q1=.1, q2=.9) SARC <- prepAnova(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[3]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
Makes Grange objects for each region with a detected CNV.
regionGrangeMake(RE, covprepped, range = 10, gap = 10)
regionGrangeMake(RE, covprepped, range = 10, gap = 10)
RE |
RaggedExperiment object used to store all information. |
covprepped |
List of dataframes, each dataframe is small cov file which ranges the detected CNV. This should be the padded cov file created with SARC::plotCovPrep and will be in the matadata. |
range |
Number of rows (genomic ranges) to group together as one entry for grange. As some WES platforms will have very short genomic start-end coordinates, this is useful to extract the exon/ gene data. Default is 10. |
gap |
The gap between the start of one grange entry to the next entry. Default is 10. |
List of Grange Objects, one for each CNV detected.
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]])
data("test_cnv") test_cnv <- test_cnv[c(1:3),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]])
Function to calculate MeanScores for each region where a CNV has been detected. Dependent of sequencing quality and WES/WGS input, the estimated MeanScores for CNV types are generally: HOMOZYGOUS DELETION 0-0.3, HETEROZYGOUS DELETION 0.3-0.7, HETEROZYGOUS DUPLICATION 1.3-1.8 HOMOZYGOUS DUPLICATION >1.8.
regionMean(RE, cnv, splitcov, nameofnewdf = "CNVmeans")
regionMean(RE, cnv, splitcov, nameofnewdf = "CNVmeans")
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recomended to use the most recently generated cnv file. Check print(RE) to see if cnv files from other anlyses have been generated. |
splitcov |
List of small .cov dataframes. Would be stored as metadata after SARC::regionSplit. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVmeans. |
A new cnv file with an additional column representing the meanScores. A new list of dataframes which contains the meanscores.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]])
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]])
Calculates quantile distribution from each sample, within a region with a suspected CNV. We assume true deletions would be in the lower end of the quantile distribution and true duplications to be in the upper end of the quantile distribution. Many CNVs captured by detection pipelines will be false positives, this this is a good metric to remove obvious false positives. Lower quantile (q1) and upper quantile (q2) should be adjusted based on the number of patient samples available.
regionQuantiles( RE, cnv, meancov, q1 = 0.05, q2 = 0.95, nameofnewdf = "CNVquantiles" )
regionQuantiles( RE, cnv, meancov, q1 = 0.05, q2 = 0.95, nameofnewdf = "CNVquantiles" )
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended to use the most recently generated cnv file. Check print(RE) to see if cnv files from other analyses have been generated. |
meancov |
Small cov files with meanscores included. This will be stored as metadata after being generated by SARC::regionMean. |
q1 |
Lower quantile cut-off. Default is .05. This should be adjusted based on the number of samples, i.e. less samples > higher the lower threshold. |
q2 |
Upper quantile cut-off. Default is .95. This should be adjusted based on the number of samples, i.e. less samples > lower the higher threshold. |
nameofnewdf |
Name of new dataframe to be saved in metadata(RE)[['CNVlist']]. Default is CNVquantiles. |
A new cnv file with two new columns. For deletions the Qlow columns will be 0 or 1, and for duplications the Qhigh columns will be 0 or 1.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[5]], q1=.1, q2=.9)
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], splitcov = metadata(SARC)[[4]]) SARC <- regionQuantiles(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[2]], meancov = metadata(SARC)[[5]], q1=.1, q2=.9)
Stores .cov and .cnv file (BED inspired file) into a RaggedExperiment object. Will use genomic locations from the .cnv file to find the start and end points of the CNVs from the .cov file and store these as two lists. The assay of the RE object will store the cov file and a list of nested dataframes called "CNVlist" within the metadata will store all the cnv dataframes.
regionSet(cnv, cov, col = "experiment")
regionSet(cnv, cov, col = "experiment")
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. Additional columns such as BATCH and GENE can also be used for better plotting. |
cov |
cov file from WES platform/ sequencer kit or if WGS regular intervals. Stored as a dataframe - genomic locations as rows and samples as columns. It is recommended to normalize Read Depth by library size for fairer comparisons. |
col |
colData - A string which is used in RaggedExperiment creation. Default is "experiment". |
RaggedExperiment object with two lists: Start sites and End sites.The cov file will be stored as the main assay of the RE object, and a new list will appear in the metadata to contain all cnv dataframes.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov)
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov)
Splits the large cov file into multiple smaller cov files. Each one is specific for each CNV entry in the bam file.
regionSplit(RE, cnv, startlist, endlist)
regionSplit(RE, cnv, startlist, endlist)
RE |
RaggedExperiment object used to store all information. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended that the most recently created cnv file is used. Check print(RE) to see more cnv files created by SARC. |
startlist |
List of start sites created from SARC::regionSet and stored as metadata. |
endlist |
List of end sites created from SARC::regionSet and stored as metadata. |
A list of dataframes - each one a smaller cov file.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]])
Displays the distribution of mean scores from read depths from all the samples. This is a quick method of checking why some detected CNVs might be false positives. We expect true duplications to have very high read-depths and true deletions to have very low read-depths.
seeDist(meanList, cnv, sample, plotly = FALSE, colourCNV = "red", size = 2)
seeDist(meanList, cnv, sample, plotly = FALSE, colourCNV = "red", size = 2)
meanList |
List of dataframes which show the ranked mean scores for each CNV. Stored in metadata after SARC::setDPlot. |
cnv |
List of CNVs in a dataframe containing CNVs from detection algorithms/ pipelines. It is recommended that the most recently created cnv file is used. Check print(RE) to see more cnv files created by SARC. |
sample |
Which CNV/ row from the cnv file should be checked. Default is 1. |
plotly |
Should plotly be used - this could be useful when interested in seeing the samples at each point. |
colourCNV |
Colour of the sample which had the CNV detected. Default is red. |
size |
Size if the dots. Default is 2. |
A scatter graph which shows the mean score of read-depths for a particular region of DNA where a CNV was detected. The sample which had the CNV detected is coloured as red/ used selected colour.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[[1]][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[[1]][[1]], splitcov = metadata(SARC)[[4]]) SARC <- setQDplot(RE = SARC, meancov = metadata(SARC)[[5]]) p <- seeDist(meanList = metadata(SARC)[[6]], cnv = metadata(SARC)[[1]][[2]], plotly=FALSE, sample=1)
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[[1]][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[[1]][[1]], splitcov = metadata(SARC)[[4]]) SARC <- setQDplot(RE = SARC, meancov = metadata(SARC)[[5]]) p <- seeDist(meanList = metadata(SARC)[[6]], cnv = metadata(SARC)[[1]][[2]], plotly=FALSE, sample=1)
Quality checking plot to see the Distribution of mean scores at a given region of the genome. The sample with the suspected CNV is highlighted. Useful way of checking why a certain CNV may be a false positive.
setQDplot(RE, meancov)
setQDplot(RE, meancov)
RE |
RaggedExperiment object used to store all information. |
meancov |
Small cov files with mean scores included. This will be stored as metadata after being generated by SARC::regionMean. |
A list of dataframes. Each dataframe contains ranked meanscores for each sample. One dataframe is made for each CNV in the cnv file.
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[[1]][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[[1]][[1]], splitcov = metadata(SARC)[[4]]) SARC <- setQDplot(RE = SARC, meancov = metadata(SARC)[[5]])
data("test_cnv") data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[[1]][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionMean(RE = SARC, cnv = metadata(SARC)[[1]][[1]], splitcov = metadata(SARC)[[4]]) SARC <- setQDplot(RE = SARC, meancov = metadata(SARC)[[5]])
Sets up dataframes for plotting the CNVs. If there are grange objects with no genes/ exons, it may be the CNVs are too small. These CNVs should be removed to not lead to errors.
setupCNVplot(RE, namedgranges, covprepped)
setupCNVplot(RE, namedgranges, covprepped)
RE |
RaggedExperiment object used to store all information. |
namedgranges |
List of grange objects, one for each CNV. These objects should have exons and genes, and will be in metadata after being created by SARC::addExonsGenes. |
covprepped |
List of dataframes, one for each CNV. This list should be in metadata after bring created by SARC::plotCovPrep. |
List of dataframes, each dataframe can be plotted using plotting functions from this package.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]])
if (requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) { require("TxDb.Hsapiens.UCSC.hg38.knownGene") } else {} if (requireNamespace("Homo.sapiens", quietly = TRUE)) { require("Homo.sapiens") } else {} data("test_cnv") test_cnv <- test_cnv[c(1),] data("test_cov") SARC <- regionSet(cnv = test_cnv, cov = test_cov) SARC <- regionSplit(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- plotCovPrep(RE = SARC, cnv = metadata(SARC)[['CNVlist']][[1]], startlist = metadata(SARC)[[2]], endlist = metadata(SARC)[[3]]) SARC <- regionGrangeMake(RE = SARC, covprepped = metadata(SARC)[[4]]) TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene tx <- transcriptsBy(Homo.sapiens, columns = "SYMBOL") txgene <- tx@unlistData SARC <- addExonsGenes(RE = SARC, covgranges = metadata(SARC)[[6]], txdb = txdb, txgene = txgene) SARC <- setupCNVplot(RE = SARC, namedgranges = metadata(SARC)[[7]], covprepped = metadata(SARC)[[4]])
This is not a traditional bed file, but shares the purpose of a bed file during CNV analysis - which is to store a list of CNVs detected from CNV detection algorithms/ pipelines. This files shares the first three columns of a traditional bed file (chrom, chromstart, chromend), and in addition has several more columns e.g. sample, type,value/ score and tool. The column names must match: SAMPLE, CHROM, START, END, TYPE, VALUE - but only the order of the first three columns must remain the same.
data("test_cnv")
data("test_cnv")
A data frame with 15 observations on the following 7 variables.
SAMPLE
a character vector
CHROM
a factor with levels chr1
chr10
chr11
chr12
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY
START
a numeric vector
END
a numeric vector
TYPE
a factor with levels DEL
DUP
VALUE
a numeric vector
TOOL
a factor with levels clinCNV
cn.mops
exomeDepth
Several annonymised whole exome sequencing samples which have been put through a Copy Number Variaiton pipleine and had their CNVs recorded in a way which resembles a bed file. This file has additional columns too.
Dataframe listing CNVs from a WES CNV detection pipeline.
This is annonmysed patient data from the Newcastle Mitochondrial Research group, from the Wellcome Center for Mitochondrial research, Newcastle Universiry, UK.
Taylor, Robert W., et al. "Use of whole-exome sequencing to determine the genetic basis of multiple mitochondrial respiratory chain complex deficiencies." Jama 312.1 (2014): 68-77.
data(test_cnv) ## maybe str(test_cnv) ; plot(test_cnv) ...
data(test_cnv) ## maybe str(test_cnv) ; plot(test_cnv) ...
This is not a traditional bed file, but shares the purpose of a bed file during CNV analysis - which is to store a list of CNVs detected from CNV detection algorithms/ pipelines. This files shares the first three columns of a traditional bed file (chrom, chromstart, chromend), and in addition has several more columns e.g. sample, type,value/ score and tool. The column names must match: SAMPLE, CHROM, START, END, TYPE, VALUE - but only the order of the first three columns must remain the same. In addition, columns for the batch and gene are included to improve visualisation.
data("test_cnv2")
data("test_cnv2")
A data frame with 15 observations on the following 10 variables.
SAMPLE
a character vector
CHROM
a factor with levels chr1
chr10
chr11
chr12
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY
START
a numeric vector
END
a numeric vector
TYPE
a factor with levels DEL
DUP
VALUE
a numeric vector
TOOL
a factor with levels clinCNV
cn.mops
exomeDepth
BATCH
a character vector
GENE
a character vector
Several annonymised whole exome sequencing samples which have been put through a Copy Number Variaiton pipleine and had their CNVs recorded in a way which resembles a bed file. This file has additional columns too.
Dataframe listing CNVs from a WES CNV detection pipeline. Contains extra columns for plot specificity.
This is annonmysed patient data from the Newcastle Mitochondrial Research group, from the Wellcome Center for Mitochondrial research, Newcastle Universiry, UK.
Taylor, Robert W., et al. "Use of whole-exome sequencing to determine the genetic basis of multiple mitochondrial respiratory chain complex deficiencies." Jama 312.1 (2014): 68-77.
data(test_cnv2) ## maybe str(test_cnv2) ; plot(test_cnv2) ...
data(test_cnv2) ## maybe str(test_cnv2) ; plot(test_cnv2) ...
This file is a large matrix of raw read depth from whole exome sequencing BAM files. The first four columns are: ID, chromosome, start position and end position.Following on, each sample has its own column which gives an integer for each position of DNA found in the whole exome experiment. Samples are annoymised. Read depth has been normalised by total library sizes via RPM (reads per million) calculations.
data("test_cov")
data("test_cov")
A data frame with 34987 observations on the following 24 variables.
ID
a numeric vector
CHROM
a factor with levels chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY
START
a numeric vector
END
a numeric vector
SampleA
a numeric vector
SampleB
a numeric vector
SampleC
a numeric vector
SampleD
a numeric vector
SampleE
a numeric vector
SampleF
a numeric vector
SampleG
a numeric vector
SampleH
a numeric vector
SampleI
a numeric vector
SampleJ
a numeric vector
SampleK
a numeric vector
SampleL
a numeric vector
SampleM
a numeric vector
SampleN
a numeric vector
SampleO
a numeric vector
SampleP
a numeric vector
SampleQ
a numeric vector
SampleR
a numeric vector
SampleS
a numeric vector
SampleT
a numeric vector
Several annonymised whole exome sequencing samples had been converted to a bam file and had a read counting tool applied.Only values from chromosome 1 are present to save space and time in examples.
Dataframe of normalised coverage from WES samples.
This is annonmysed patient data from the Newcastle Mitochondrial Research group, from the Wellcome Center for Mitochondrial research, Newcastle Universiry, UK.
Taylor, Robert W., et al. "Use of whole-exome sequencing to determine the genetic basis of multiple mitochondrial respiratory chain complex deficiencies." Jama 312.1 (2014): 68-77.
data(test_cov) ## maybe str(test_cov) ; plot(test_cov) ...
data(test_cov) ## maybe str(test_cov) ; plot(test_cov) ...