Title: | Reliable CNV detection in targeted sequencing applications |
---|---|
Description: | A method that allows for the use of a collection of non-matched normal tissue samples. Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. As inspired by random forest, this is combined with a procedure that subsamples the amplicons associated with each of the targeted genes. The obtained information allows us to reliably classify the copy number aberrations on the gene level. |
Authors: | Cristiano Oliveira [aut], Thomas Wolf [aut, cre], Albrecht Stenzinger [ctb], Volker Endris [ctb], Nicole Pfarr [ctb], Benedikt Brors [ths], Wilko Weichert [ths] |
Maintainer: | Thomas Wolf <[email protected]> |
License: | GPL-3 |
Version: | 1.39.0 |
Built: | 2025-01-13 04:05:10 UTC |
Source: | https://github.com/bioc/CNVPanelizer |
This package implements an algorithm that uses a collection of non-matched normal tissue samples as a reference set to detect CNV aberrations in data generated from amplicon-based targeted sequencing.
Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. Inspired by random forest, this is combined at each iteration with a procedure that subsamples the amplicons associated with each of the targeted genes. To estimate the background noise when sequencing genes with a low number of amplicons, a second subsampling step is performed. Both steps are combined to decide on the CNV status, thus classifying the copy number aberrations on the gene level.
For a complete list of functions, use library(help = "CNVPanelizer").
Package: | CNVPanelizer |
Type: | Package |
License: | GPL-3 |
Thomas Wolf <[email protected]>
Cristiano Oliveira <[email protected]>
Makes use of a subsampling approach to estimate the background noise when sequencing a gene with a specific number of amplicons. The 95 percent confidence interval is returned for each unique number of amplicons in the experiment.
Background(geneNames, samplesNormalizedReadCounts, referenceNormalizedReadCounts, bootList, replicates = 1000, significanceLevel = 0.05, robust = FALSE)
geneNames |
A vector of gene names, with one entry for each sequenced amplicon. |
samplesNormalizedReadCounts |
A matrix with the normalized read counts of the samples of interest |
referenceNormalizedReadCounts |
A matrix with the normalized reference read counts |
bootList |
A list as returned by BootList |
replicates |
Integer number of replicates to perform |
significanceLevel |
The significance level for the calculated confidence interval |
robust |
If set to TRUE, the confidence interval is calculated using the median instead of the mean and the MAD instead of the standard deviation. |
Returns a list of data frames. One data frame for each sample of interest. The data frames report the 95 percent confidence interval of the background noise for each number of amplicons and sample combination.
Thomas Wolf, Cristiano Oliveira
data(sampleReadCounts)
data(referenceReadCounts)

## The gene names vector needs one entry per row (amplicon) of the read count matrices
geneNames <- row.names(referenceReadCounts)

ampliconNames <- NULL

normalizedReadCounts <- CombinedNormalizedCounts(sampleReadCounts,
                                                 referenceReadCounts,
                                                 ampliconNames = ampliconNames)

# After normalization the data sets need to be split again to perform the bootstrap
samplesNormalizedReadCounts <- normalizedReadCounts[["samples"]]
referenceNormalizedReadCounts <- normalizedReadCounts[["reference"]]

# Values above 10000 should be used
replicates <- 10

# Perform the bootstrap based analysis
bootList <- BootList(geneNames,
                     samplesNormalizedReadCounts,
                     referenceNormalizedReadCounts,
                     replicates = replicates)

background <- Background(geneNames,
                         samplesNormalizedReadCounts,
                         referenceNormalizedReadCounts,
                         bootList,
                         replicates = replicates,
                         significanceLevel = 0.1)
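The effect of the robust argument on a confidence interval can be pictured with a small base R sketch. It is not the package's implementation: noiseInterval is a hypothetical helper, and the normal-approximation interval (center plus or minus a quantile times the spread) is an assumption made only for illustration.

```r
# Hypothetical helper, not part of CNVPanelizer.
# Assumes a normal-approximation interval: center +/- z * spread.
noiseInterval <- function(x, significanceLevel = 0.05, robust = FALSE) {
  center <- if (robust) median(x) else mean(x)
  spread <- if (robust) mad(x) else sd(x)
  z <- qnorm(1 - significanceLevel / 2)
  c(lower = center - z * spread, upper = center + z * spread)
}

# Toy ratios with one extreme value
ratios <- c(1, 2, 3, 4, 100)
plainInterval  <- noiseInterval(ratios, robust = FALSE)
robustInterval <- noiseInterval(ratios, robust = TRUE)
```

With a single extreme value in the toy data, the median/MAD interval is much narrower than the mean/sd interval, which is the usual motivation for a robust option.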
Generates a GenomicRanges object from a bed file. The correct number of the gene name column needs to be passed. If the strings contain more information than just the gene name, a splitting character (split) has to be defined, e.g. GeneName1;Amplicon2
BedToGenomicRanges(panelBedFilepath, ampliconColumn, split, doReduce, rangeExtend, dropChromossomes, skip)
panelBedFilepath |
Filepath of the bed file. |
ampliconColumn |
Number of the column that identifies the gene name in the bed file. |
split |
The character used as separator in the strings of the gene name column, e.g. ";" in GeneName1;Amplicon2. |
doReduce |
Should overlapping ranges be merged. |
rangeExtend |
Should the defined ranges be extended left and right by the given value. Affects the merging of overlapping regions and also read counting. |
dropChromossomes |
Drop chromosomes. |
skip |
How many lines should be skipped from the top of the bed file. The function assumes a bed file with column names. Thus default is skip = 1. |
A GenomicRanges object containing information about the amplicons described in the bed file.
Thomas Wolf, Cristiano Oliveira
bedFilepath <- file.path("someFile.bed")
ampliconColumn <- 4
genomicRangesFromBed <- BedToGenomicRanges(bedFilepath, ampliconColumn)
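What the split character does to the gene name column can be shown with base R alone. The identifiers below are made up for illustration, and the sketch does not call BedToGenomicRanges; it only mirrors the GeneName1;Amplicon2 convention described above.

```r
# Hypothetical entries of the gene name column; ";" plays the role of `split`
ampliconIds <- c("GeneName1;Amplicon1", "GeneName1;Amplicon2", "GeneName2;Amplicon1")

# Keep only the part before the separator, i.e. the gene name
geneNames <- vapply(strsplit(ampliconIds, ";", fixed = TRUE), `[`, character(1), 1)
geneNames
```

The result has one gene name per amplicon, which is the shape the geneNames arguments elsewhere in this manual expect.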
Performs a hybrid bootstrapping subsampling procedure similar to random forest. It bootstraps the reference samples and subsamples the amplicons associated with each gene. Returns a distribution of sample/reference ratios for each gene and sample of interest combination.
BootList(geneNames, sampleMatrix, refmat, replicates)
geneNames |
A vector of gene names, with one entry for each sequenced amplicon. |
sampleMatrix |
A vector or matrix of the read counts from the sample of interest. In the case of a matrix columns represent samples and rows amplicons. |
refmat |
A matrix of the read counts obtained from the reference samples. Columns represent reference samples and rows amplicons. |
replicates |
How many bootstrap replicates should be performed. |
Returns a list of numeric matrices. In each matrix, a row represents a gene and a column represents a bootstrapping/subsampling iteration.
Thomas Wolf, Cristiano Oliveira
data(sampleReadCounts)
data(referenceReadCounts)

## The gene names vector needs one entry per row (amplicon) of the read count matrices
geneNames <- row.names(referenceReadCounts)

ampliconNames <- NULL

normalizedReadCounts <- CombinedNormalizedCounts(sampleReadCounts,
                                                 referenceReadCounts,
                                                 ampliconNames = ampliconNames)

# After normalization the data sets need to be split again to perform the bootstrap
samplesNormalizedReadCounts <- normalizedReadCounts[["samples"]]
referenceNormalizedReadCounts <- normalizedReadCounts[["reference"]]

# Values above 10000 should be used
replicates <- 10

# Perform the bootstrap based analysis
bootList <- BootList(geneNames,
                     samplesNormalizedReadCounts,
                     referenceNormalizedReadCounts,
                     replicates = replicates)
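The hybrid resampling idea (bootstrap the reference samples, subsample the amplicons of each gene, record a sample/reference ratio) can be sketched in base R for a single sample. This is a simplified illustration, not the code behind BootList: bootstrapGeneRatios is a hypothetical helper, and drawing amplicons with replacement and using a ratio of means are assumptions.

```r
# Hypothetical helper, not part of CNVPanelizer; illustrates the resampling idea only.
set.seed(1)

bootstrapGeneRatios <- function(geneNames, sampleCounts, referenceCounts,
                                replicates = 100) {
  genes <- unique(geneNames)
  ratios <- matrix(NA_real_, nrow = length(genes), ncol = replicates,
                   dimnames = list(genes, NULL))
  nRef <- ncol(referenceCounts)
  for (b in seq_len(replicates)) {
    # Bootstrap step: draw reference samples with replacement
    refIdx <- sample.int(nRef, nRef, replace = TRUE)
    refMean <- rowMeans(referenceCounts[, refIdx, drop = FALSE])
    for (g in genes) {
      ampIdx <- which(geneNames == g)
      # Subsampling step (assumed here to be with replacement)
      pick <- ampIdx[sample.int(length(ampIdx), length(ampIdx), replace = TRUE)]
      ratios[g, b] <- mean(sampleCounts[pick]) / mean(refMean[pick])
    }
  }
  ratios
}

# Toy data: gene A has twice the reference coverage, gene B matches it
geneNames <- c("A", "A", "A", "B", "B", "B")
referenceCounts <- matrix(100, nrow = 6, ncol = 4)
sampleCounts <- c(200, 200, 200, 100, 100, 100)
ratios <- bootstrapGeneRatios(geneNames, sampleCounts, referenceCounts, replicates = 50)
dim(ratios)
```

The returned matrix has one row per gene and one column per iteration, matching the layout described for the matrices in the list returned by BootList.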
Performs the CNVPanelizer analysis workflow starting from the read counts, splitting the analyzed samples into batches.
CNVPanelizerFromReadCounts(sampleReadCounts, referenceReadCounts, genomicRangesFromBed, numberOfBootstrapReplicates = 10000, normalizationMethod = "tmm", robust = TRUE, backgroundSignificanceLevel = 0.05, outputDir = file.path(getwd(), "CNVPanelizer"))
sampleReadCounts |
samples read counts matrix |
referenceReadCounts |
reference read counts matrix |
genomicRangesFromBed |
genomic ranges from bed |
numberOfBootstrapReplicates |
number of bootstrap replicates |
normalizationMethod |
Normalization method ("tmm" or "tss") |
robust |
if TRUE, the median is used instead of mean |
backgroundSignificanceLevel |
The background Significance Level |
outputDir |
Output directory |
Returns a list with the results of each sample analyzed
Cristiano Oliveira
CNVPanelizerFromReadCounts(sampleReadCounts, referenceReadCounts, genomicRangesFromBed, numberOfBootstrapReplicates = 10000, normalizationMethod = "tmm", robust = TRUE, backgroundSignificanceLevel = 0.05, outputDir = file.path(getwd(), "CNVPanelizer"))
Helper that performs the CNVPanelizer analysis workflow starting from the read counts, splitting the analyzed samples into batches.
CNVPanelizerFromReadCountsHELPER(sampleReadCounts, referenceReadCounts, genomicRangesFromBed, numberOfBootstrapReplicates = 10000, normalizationMethod = "tmm", robust = TRUE, backgroundSignificanceLevel = 0.05, outputDir = file.path(getwd(), "CNVPanelizer"), splitSize = 5)
sampleReadCounts |
samples read counts matrix |
referenceReadCounts |
reference read counts matrix |
genomicRangesFromBed |
genomic ranges from bed |
numberOfBootstrapReplicates |
number of bootstrap replicates |
normalizationMethod |
Normalization method ("tmm" or "tss") |
robust |
if TRUE, the median is used instead of mean |
backgroundSignificanceLevel |
The background Significance Level |
outputDir |
Output directory |
splitSize |
Split size of the batches analyzed |
Returns a list with the results of each sample analyzed
Cristiano Oliveira
CNVPanelizerFromReadCountsHELPER(sampleReadCounts, referenceReadCounts, genomicRangesFromBed, numberOfBootstrapReplicates = 10000, normalizationMethod = "tmm", robust = TRUE, backgroundSignificanceLevel = 0.05, outputDir = file.path(getwd(), "CNVPanelizer"), splitSize = 5)
Collects a single column from all report tables in the list
CollectColumnFromAllReportTables(reportTables, columnName)
reportTables |
A list of report tables |
columnName |
The column name |
Returns a data frame with the columns collected from the entire list of report tables
Cristiano Oliveira
CollectColumnFromAllReportTables(reportTables, columnName)
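The kind of reshaping described here can be sketched with base R on toy data. The helper collectColumn, the column name Status and the gene and sample names are all hypothetical; the sketch only shows the idea of taking one column from every table in a list and binding the results into one data frame.

```r
# Toy report tables, one per sample; the column name "Status" is an assumption
reportTables <- list(
  sample1 = data.frame(Status = c("Normal", "Amplification"),
                       row.names = c("GeneA", "GeneB")),
  sample2 = data.frame(Status = c("Deletion", "Normal"),
                       row.names = c("GeneA", "GeneB"))
)

# Hypothetical helper, not part of CNVPanelizer
collectColumn <- function(tables, columnName) {
  collected <- lapply(tables, function(tbl) tbl[[columnName]])
  as.data.frame(collected,
                row.names = rownames(tables[[1]]),
                stringsAsFactors = FALSE)
}

statusByGene <- collectColumn(reportTables, "Status")
statusByGene
```

The result has one row per gene and one column per sample, a layout that lends itself to an overview across all analyzed samples.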
This function uses total sum scaling or NOISeq::tmm to normalize the read counts of all samples and references to the same median read count.
CombinedNormalizedCounts(sampleCounts, referenceCounts, method, ampliconNames = NULL)
sampleCounts |
Matrix or vector with sample read counts (rows: amplicons, columns: samples) |
referenceCounts |
Matrix with reference read counts (rows: amplicons, columns: samples) |
method |
either "tmm" (trimmed mean of m values) or "tss"(total sum scaling) |
ampliconNames |
A vector with amplicon defining names for the reference and sample matrices |
A list object with two matrices
samples |
The samples matrix normalized |
reference |
The reference matrix normalized |
Cristiano Oliveira, Thomas Wolf
data(sampleReadCounts)
data(referenceReadCounts)
normalizedReadCounts <- CombinedNormalizedCounts(sampleReadCounts, referenceReadCounts)
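Total sum scaling on a combined sample/reference matrix can be illustrated with a few lines of base R. This is a sketch under assumptions, not the package's code: tssNormalize is a hypothetical helper, and scaling every column to the median column total is one reading of "the same median read count".

```r
# Hypothetical helper, not part of CNVPanelizer.
# Assumes every column is rescaled to the median of the column totals.
tssNormalize <- function(counts) {
  counts <- as.matrix(counts)
  totals <- colSums(counts)
  sweep(counts, 2, totals, "/") * median(totals)
}

# Toy read counts (rows: amplicons, columns: samples)
samples <- matrix(c(10, 30, 20, 60), nrow = 2,
                  dimnames = list(c("amp1", "amp2"), c("s1", "s2")))
reference <- matrix(c(50, 50, 5, 15), nrow = 2,
                    dimnames = list(c("amp1", "amp2"), c("r1", "r2")))

# Normalize samples and references together, then split them again
combined <- tssNormalize(cbind(samples, reference))
normalized <- list(samples = combined[, colnames(samples), drop = FALSE],
                   reference = combined[, colnames(reference), drop = FALSE])
colSums(combined)
```

Normalizing both matrices together puts samples and references on one common scale, which is why the result is split back into a samples and a reference matrix afterwards.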
Indexes the bam files in a list for which no index exists yet.
IndexMultipleBams(bams, index_type = ".bam.bai")
bams |
A character vector of bam files to be indexed |
index_type |
The index file type extension |
Does not return a value
Thomas Wolf, Cristiano Oliveira
files <- c("file1.bam", "file2.bam", "file3.bam")
IndexMultipleBams(bams = files)
This function uses total sum scaling or NOISeq::tmm to normalize the read counts.
NormalizeCounts(allCounts, method)
allCounts |
Matrix or vector with sample read counts (rows: amplicons, columns: samples) |
method |
either "tmm" (trimmed mean of m values) or "tss"(total sum scaling) |
A matrix
samples |
The samples matrix normalized |
Cristiano Oliveira, Thomas Wolf
data(sampleReadCounts)
normalizedReadCounts <- NormalizeCounts(sampleReadCounts)
Plots the generated bootstrap distribution as violin plots. Genes showing significant values are marked in a different color.
PlotBootstrapDistributions(bootList, reportTables, outputFolder = getwd(), sampleNames = NULL, save = FALSE, scale = 10)
bootList |
List of bootstrapped read counts for each sample data |
reportTables |
List of report tables for each sample data |
outputFolder |
Path to the folder where the data plots will be created |
sampleNames |
List with sample names |
save |
Boolean to save the plots to the output folder |
scale |
Numeric scale factor |
A list with ggplot2 objects.
Thomas Wolf, Cristiano Oliveira
data(sampleReadCounts)
data(referenceReadCounts)

## The gene names vector needs one entry per row (amplicon) of the read count matrices
geneNames <- row.names(referenceReadCounts)

ampliconNames <- NULL

normalizedReadCounts <- CombinedNormalizedCounts(sampleReadCounts,
                                                 referenceReadCounts,
                                                 ampliconNames = ampliconNames)

# After normalization the data sets need to be split again to perform the bootstrap
samplesNormalizedReadCounts <- normalizedReadCounts[["samples"]]
referenceNormalizedReadCounts <- normalizedReadCounts[["reference"]]

# Values above 10000 should be used
replicates <- 10

# Perform the bootstrap based analysis
bootList <- BootList(geneNames,
                     samplesNormalizedReadCounts,
                     referenceNormalizedReadCounts,
                     replicates = replicates)

backgroundNoise <- Background(geneNames,
                              samplesNormalizedReadCounts,
                              referenceNormalizedReadCounts,
                              bootList,
                              replicates = replicates)

reportTables <- ReportTables(geneNames,
                             samplesNormalizedReadCounts,
                             referenceNormalizedReadCounts,
                             bootList,
                             backgroundNoise)

PlotBootstrapDistributions(bootList, reportTables, save = FALSE)
Returns a matrix with the read counts from a set of bam files.
ReadCountsFromBam(bamFilenames, sampleNames, gr, ampliconNames, minimumMappingQuality, removeDup = FALSE)
bamFilenames |
Vector of bamfile filepaths |
sampleNames |
Vector of sample names to be used as column names instead of bam filepaths |
gr |
GenomicRanges object, e.g. as created by BedToGenomicRanges |
ampliconNames |
List of amplicon defining names |
minimumMappingQuality |
Minimum mapping quality |
removeDup |
Boolean value to remove duplicates. For reads with the same start site, end site and orientation, only one is kept. For IonTorrent data this can be used as an additional quality control. For Illumina data it removes too many reads. |
A matrix with read counts where the rows represent the amplicons and the columns represent the samples.
Thomas Wolf, Cristiano Oliveira
ReadCountsFromBam(bamFilenames, sampleNames, gr, ampliconNames, removeDup = removeDup)
Reads a list of read count matrices from an xlsx file as generated by WriteReadCountsToXLSX
ReadXLSXToList(filepath, rowNames = TRUE, colNames = TRUE)
filepath |
filepath |
rowNames |
if row names should be included |
colNames |
if col names should be included |
A list of read count matrices
Thomas Wolf, Cristiano Oliveira
ReadXLSXToList(filepath)
Synthetic reference data set of simulated read counts. Only to be used for code examples.
referenceSamples
A matrix with columns identifying the sample names and rows the gene names
Artificially generated data
This function generates the final report of the CNV detection procedure. One data frame is generated for each sample of interest.
ReportTables(geneNames, samplesNormalizedReadCounts, referenceNormalizedReadCounts, bootList, backgroundNoise)
geneNames |
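A vector of gene names, with one entry for each sequenced amplicon. |
samplesNormalizedReadCounts |
A matrix with the normalized read counts of the samples of interest |
referenceNormalizedReadCounts |
A matrix with the normalized reference read counts |
bootList |
A list as returned by BootList |
backgroundNoise |
A list of background noise as returned by Background |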
Returns a list of tables, one for each sample of interest. Each of these tables contains numerical information of the aberration status of each gene. For a detailed description see the Vignette.
Thomas Wolf, Cristiano Oliveira
data(sampleReadCounts)
data(referenceReadCounts)

## The gene names vector needs one entry per row (amplicon) of the read count matrices
geneNames <- row.names(referenceReadCounts)

ampliconNames <- NULL

normalizedReadCounts <- CombinedNormalizedCounts(sampleReadCounts,
                                                 referenceReadCounts,
                                                 ampliconNames = ampliconNames)

# After normalization the data sets need to be split again to perform the bootstrap
samplesNormalizedReadCounts <- normalizedReadCounts[["samples"]]
referenceNormalizedReadCounts <- normalizedReadCounts[["reference"]]

# Values above 10000 should be used
replicates <- 10

# Perform the bootstrap based analysis
bootList <- BootList(geneNames,
                     samplesNormalizedReadCounts,
                     referenceNormalizedReadCounts,
                     replicates = replicates)

backgroundNoise <- Background(geneNames,
                              samplesNormalizedReadCounts,
                              referenceNormalizedReadCounts,
                              bootList,
                              replicates = replicates)

reportTables <- ReportTables(geneNames,
                             samplesNormalizedReadCounts,
                             referenceNormalizedReadCounts,
                             bootList,
                             backgroundNoise)
Run CNVPanelizer as a shiny app
RunCNVPanelizerShiny(port = 8100)
port |
Port where the app will be listening |
Does not return a value
Thomas Wolf, Cristiano Oliveira
RunCNVPanelizerShiny(port=8080)
Synthetic data set of simulated read counts. Only to be used for running the code examples.
testSamples
A matrix with columns identifying the sample names and rows the gene names
Artificially generated data
Select a reference set using a factor of the Interquartile Range
SelectReferenceSetByInterquartileRange(allSamplesReadCounts, normalizationMethod = "tmm", iqrFactor = 1)
allSamplesReadCounts |
All samples read counts matrix |
normalizationMethod |
tmm (trimmed mean of m values) or tss (total sum scaling) |
iqrFactor |
Interquartile range factor |
Returns a list of sample identifiers to be used as reference
Cristiano Oliveira
SelectReferenceSetByInterquartileRange(allSamplesReadCounts, normalizationMethod = "tmm", iqrFactor = 1)
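How a factor of the interquartile range can act as a filter is sketched below in base R. The source does not state which per-sample statistic the package compares against the range, so the sketch assumes a single summary value per sample (here a made-up total read count); selectByIqr is a hypothetical helper, not the package's function.

```r
# Hypothetical helper, not part of CNVPanelizer.
# Keeps samples whose value lies within [Q1 - f * IQR, Q3 + f * IQR].
selectByIqr <- function(values, iqrFactor = 1) {
  q <- quantile(values, c(0.25, 0.75), names = FALSE)
  margin <- iqrFactor * (q[2] - q[1])
  names(values)[values >= q[1] - margin & values <= q[2] + margin]
}

# Toy per-sample summary values; s5 is an obvious outlier
totals <- c(s1 = 100, s2 = 110, s3 = 105, s4 = 95, s5 = 500)
selectedIqr <- selectByIqr(totals, iqrFactor = 1)
selectedIqr
```

A larger iqrFactor widens the accepted range and therefore keeps more samples in the reference set.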
Select a reference set using Kmeans
SelectReferenceSetByKmeans(allSamplesReadCounts, normalizationMethod = "tmm", referenceNumberOfElements)
allSamplesReadCounts |
All samples read counts matrix |
normalizationMethod |
tmm (trimmed mean of m values) or tss (total sum scaling) |
referenceNumberOfElements |
Number of elements to select for the reference set |
Returns a list of sample identifiers to be used as reference
Cristiano Oliveira
SelectReferenceSetByKmeans(allSamplesReadCounts, normalizationMethod = "tmm", referenceNumberOfElements)
Select a reference set using percentiles
SelectReferenceSetByPercentil(allSamplesReadCounts, normalizationMethod = "tmm", lowerBoundPercentage = 1, upperBoundPercentage = 99)
allSamplesReadCounts |
All samples read counts matrix |
normalizationMethod |
tmm (trimmed mean of m values) or tss (total sum scaling) |
lowerBoundPercentage |
Lower bound percentage |
upperBoundPercentage |
Upper bound percentage |
Returns a list of sample identifiers to be used as reference
Cristiano Oliveira
SelectReferenceSetByPercentil(allSamplesReadCounts, normalizationMethod = "tmm", lowerBoundPercentage = 1, upperBoundPercentage = 99)
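Percentile bounds can be applied in the same spirit. The sketch below is an illustration only: the statistic compared against the bounds is not documented here, so a made-up per-sample total is assumed, and selectByPercentile is a hypothetical helper rather than the package's function.

```r
# Hypothetical helper, not part of CNVPanelizer.
# Keeps samples whose value lies between the two percentile bounds.
selectByPercentile <- function(values,
                               lowerBoundPercentage = 1,
                               upperBoundPercentage = 99) {
  bounds <- quantile(values,
                     c(lowerBoundPercentage, upperBoundPercentage) / 100,
                     names = FALSE)
  names(values)[values >= bounds[1] & values <= bounds[2]]
}

# Toy per-sample summary values
sampleTotals <- c(s1 = 100, s2 = 110, s3 = 105, s4 = 95, s5 = 500)
selectedPercentile <- selectByPercentile(sampleTotals, 10, 90)
selectedPercentile
```

With only a handful of samples the bounds are interpolated between observed values, so the smallest and largest samples fall outside the 10th and 90th percentiles.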
Select a reference set from read counts
SelectReferenceSetFromReadCounts(allSamplesReadCounts, normalizationMethod = "tmm", referenceMaximumNumberOfElements = 30, referenceSelectionMethod = "kmeans", lowerBoundPercentage = 1, upperBoundPercentage = 99)
allSamplesReadCounts |
All samples read counts matrix |
normalizationMethod |
tmm (trimmed mean of m values) or tss (total sum scaling) |
referenceMaximumNumberOfElements |
Maximum number of elements to consider as reference (only used with the interquantile reference selection method) |
referenceSelectionMethod |
Reference selection method ("kmeans", ...) |
lowerBoundPercentage |
Lower bound percentage (only used with the interquantile reference selection method) |
upperBoundPercentage |
Upper bound percentage (only used with the interquantile reference selection method) |
Returns a list of sample identifiers to be used as reference
Cristiano Oliveira
SelectReferenceSetFromReadCounts(allSamplesReadCounts, normalizationMethod = "tmm", referenceMaximumNumberOfElements = 30, referenceSelectionMethod = "kmeans")
Generates a status heatmap for all samples analyzed
StatusHeatmap(dfData, statusColors = c("Deletion" = "blue", "Normal" = "green", "Amplification" = "red"), header = "Status Heatmap", filepath = "CNVPanelizerHeatMap.png")
dfData |
data frame with the "Amplification", "Deletion" and "Normal" status |
statusColors |
A named vector with the colors associated with each level |
header |
Header text of the plot |
filepath |
Filepath where the generated heatmap is saved |
Returns the filepath of the saved Heatmap
Cristiano Oliveira
StatusHeatmap(dfData, statusColors = c("Deletion" = "blue", "Normal" = "green", "Amplification" = "red"), header = "Status Heatmap", filepath = "CNVPanelizerHeatMap.png")
Writes list of data frames to an xlsx file
WriteListToXLSX(listOfDataFrames, multipleFiles = FALSE, outputFolder = file.path(getwd(), "xlsx"), filepath = "list.xlsx")
listOfDataFrames |
list of dataframes |
multipleFiles |
Whether to generate a single file with all results or multiple files |
outputFolder |
Output folder |
filepath |
filepath |
Does not return a value
Thomas Wolf, Cristiano Oliveira
WriteListToXLSX(listOfDataFrames = exampleList, filepath = "list.xlsx")