Title: | Clustering algorithm for ddPCR data |
---|---|
Description: | The ddPCRclust algorithm can automatically quantify the CPDs of non-orthogonal ddPCR reactions with up to four targets. In order to determine the correct droplet count for each target, it is crucial to both identify all clusters and label them correctly based on their position. For more information on what data can be analyzed and how a template needs to be formatted, please check the vignette. |
Authors: | Benedikt G. Brink [aut, cre], Justin Meskas [ctb], Ryan R. Brinkman [ctb] |
Maintainer: | Benedikt G. Brink <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.27.0 |
Built: | 2024-12-14 03:52:04 UTC |
Source: | https://github.com/bioc/ddPCRclust |
This function takes the results of the clustering and calculates the actual counts per target, as well as the copies per droplet (CPD) for each marker.
calculateCPDs(results, template = NULL, constantControl = NULL)
results |
The result of the ddPCRclust algorithm. |
template |
The parsed dataframe containing the template. |
constantControl |
The constant reference control, which should be present in each reaction. It is used to normalize the data. |
A list of lists, containing the counts for empty droplets, each marker with both total droplet count and CPD, and total number of droplets, for each element of the input list respectively.
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[3])
# To read all example files uncomment the following line
# files <- readFiles(exampleFiles[1:8])

# Read template
template <- readTemplate(exampleFiles[9])

# Run ddPCRclust
result <- ddPCRclust(files, template)

# Calculate the CPDs
markerCPDs <- calculateCPDs(result, template$template)
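The relation between droplet counts and CPD can be illustrated with a small base R sketch. It is not the package's implementation: it assumes the Poisson correction commonly used for ddPCR, CPD = -ln(1 - positive / total), and both the helper name `cpd_from_counts` and the counts are made up for illustration.

```r
# Hypothetical helper (not part of ddPCRclust): Poisson-corrected copies per
# droplet from the number of positive droplets and the total droplet count.
cpd_from_counts <- function(positive, total) {
  stopifnot(all(positive >= 0), all(total > 0), all(positive < total))
  -log(1 - positive / total)
}

# Made-up counts for two markers in a well with 15000 accepted droplets
counts <- c(markerA = 1200, markerB = 300)
cpd <- cpd_from_counts(counts, total = 15000)
print(round(cpd, 4))
```

Because the correction is non-linear, the CPD is slightly higher than the plain fraction of positive droplets, and the gap widens as more droplets are positive.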
This function takes the three (or fewer) clustering approaches of the ddPCRclust package and combines them into one cluster ensemble. See cl_medoid for more information.
createEnsemble(dens = NULL, sam = NULL, peaks = NULL, file)
dens |
The result of the flowDensity algorithm as a CLUE partition. |
sam |
The result of the samSPECTRAL algorithm as a CLUE partition. |
peaks |
The result of the flowPeaks algorithm as a CLUE partition. |
file |
The input data. More specifically, a data frame with two dimensions, each dimension representing the intensity for one color. |
data |
The original input data minus the removed events (for plotting) |
confidence |
The agreement between the different clustering results in percent. If all algorithms calculated the same result, the clustering is likely to be correct, thus the confidence is high. |
counts |
The droplet count for each cluster. |
library(ddPCRclust)

exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
file <- read.csv(exampleFiles[3])

# Run the three clustering approaches separately
densResult <- runDensity(file = file, numOfMarkers = 4)
samResult <- runSam(file = file, numOfMarkers = 4)
peaksResult <- runPeaks(file = file, numOfMarkers = 4)

# Combine them into one cluster ensemble
superResult <- createEnsemble(densResult, samResult, peaksResult, file)
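The confidence value expresses how far the individual clusterings agree. As a rough illustration only (the package works with CLUE partitions and cl_medoid, which this sketch does not reproduce), agreement between label vectors can be measured with the Rand index. The helper `rand_index` and the labels below are hypothetical.

```r
# Hypothetical helper: Rand index of two label vectors, i.e. the share of
# droplet pairs on which both clusterings agree (same cluster in both, or
# different clusters in both). Cluster numbering does not need to match.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 1)
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)
  mean(same_a[pairs] == same_b[pairs])
}

# Made-up labels for six droplets from three clustering approaches
dens  <- c(1, 1, 2, 2, 3, 3)
sam   <- c(2, 2, 1, 1, 3, 3)  # same grouping, different numbering
peaks <- c(1, 1, 2, 3, 3, 3)  # one droplet assigned differently

agreement <- c(
  dens_sam   = rand_index(dens, sam),
  dens_peaks = rand_index(dens, peaks),
  sam_peaks  = rand_index(sam, peaks)
)
print(round(100 * mean(agreement), 1))  # mean pairwise agreement in percent
```

A pair-based measure is used here because the three approaches number their clusters independently, so comparing raw labels position by position would understate the agreement.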
The ddPCRclust algorithm can automatically quantify the events of ddPCR reactions with up to four markers. In order to determine the correct droplet count for each marker, it is crucial to both identify all clusters and label them correctly based on their position. For more information on what data can be analyzed and how a template needs to be formatted, please check the project repository on GitHub.
This is the main function of this package. It automatically runs the ddPCRclust algorithm on one or multiple csv files containing the raw data from a ddPCR run with up to 4 markers.
ddPCRclust(files, template, numOfMarkers = 4, sensitivity = 1, similarityParam = 0.95, distanceParam = 0.2, fast = FALSE, multithread = FALSE)
files |
The input data obtained from the csv files. For more information, please see |
template |
A data frame containing information about the individual ddPCR runs.
An example template is provided with this package. For more information, please see |
numOfMarkers |
The number of primary clusters that are expected according to the experimental setup.
Can be ignored if a template is provided. Else, a vector with length equal to |
sensitivity |
A number between 0.1 and 2 determining the sensitivity of the initial clustering, i.e. the number of clusters. A higher value means the data is divided into more clusters, a lower value means more clusters are merged. This allows fine-tuning of the algorithm for exceptionally low or high CPDs. |
similarityParam |
If a droplet's distances to two or more clusters are very similar, it will not be counted for any of them. The default is 0.95, i.e. at least 95% similarity. A sensible value lies between 0 and 1, where 0 means none of the 'rain' droplets will be counted and 1 means all droplets will be counted. |
distanceParam |
When assigning rain between two clusters, typically the bottom 20% are assigned to the lower cluster and the remaining 80% to the higher cluster. This parameter changes the ratio, e.g. a value of 0.1 would assign only 10% to the lower cluster. |
fast |
Run a simpler version of the algorithm that is about 10x faster. For clean data, this might already deliver very good results. However, it is mostly intended to give a quick overview of the data. |
multithread |
Distribute the algorithm amongst all CPU cores to speed up the computation. |
results |
The results of the ddPCRclust algorithm. It contains three fields: |
The main function of the package is ddPCRclust. This function runs the algorithm with one or multiple files, automatically distributing them amongst all CPU cores using the parallel package (parallelization does not work on Windows). Afterwards, the results can be exported in different ways, using exportPlots, exportToExcel and exportToCSV.
Once the clustering is finished, copies per droplet (CPD) for each marker can be calculated using calculateCPDs.
These functions provide access to all functionalities of the ddPCRclust package.
However, expert users can directly call some internal functions of the algorithm, if they find it necessary.
Here is a list of all available supplemental functions: runDensity, runSam, runPeaks, createEnsemble
Maintainer: Benedikt G. Brink [email protected]
Other contributors:
Justin Meskas [email protected] [contributor]
Ryan R. Brinkman [email protected] [contributor]
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[3])
# To read all example files uncomment the following line
# files <- readFiles(exampleFiles[1:8])

# Read template
template <- readTemplate(exampleFiles[9])

# Run ddPCRclust
result <- ddPCRclust(files, template)

# Plot the results
library(ggplot2)
p <- ggplot(data = result$B01$data, mapping = aes(x = Ch2.Amplitude, y = Ch1.Amplitude))
p <- p + geom_point(aes(color = factor(Cluster)), size = .5, na.rm = TRUE) +
  ggtitle('B01 example') + theme_bw() + theme(legend.position = 'none')
p
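The effect of similarityParam can be pictured with a small base R sketch. This is only one possible reading of the parameter description, not the package's code: it assumes that similarity is the ratio of a droplet's two smallest distances to the cluster centres, and the helper `is_ambiguous` and all coordinates are hypothetical.

```r
# Hypothetical helper: flag droplets whose two smallest distances to the
# cluster centres are too similar to allow a confident assignment.
is_ambiguous <- function(points, centres, similarityParam = 0.95) {
  # Distance from every droplet (rows) to every centre (columns)
  d <- apply(centres, 1, function(ctr) sqrt(rowSums(sweep(points, 2, ctr)^2)))
  d <- matrix(d, nrow = nrow(points))
  apply(d, 1, function(x) {
    nearest <- sort(x)[1:2]
    if (nearest[2] == 0) return(TRUE)  # droplet sits on two identical centres
    nearest[1] / nearest[2] >= similarityParam
  })
}

# Two made-up cluster centres and three droplets between them
centres <- rbind(c(1000, 1000), c(5000, 1000))
points  <- rbind(c(1200, 1000), c(3000, 1000), c(3100, 1000))

print(is_ambiguous(points, centres))                        # default 0.95
print(is_ambiguous(points, centres, similarityParam = 0.9)) # stricter
```

Under this reading, a value of 0 flags every droplet as ambiguous, while a value of 1 flags only droplets that are exactly equidistant from two centres, which matches the behaviour described for the two extremes.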
A convenience function that takes the results of the ddPCRclust algorithm, plots them using the ggplot2 library and a custom colour palette, and saves the plots to a folder.
exportPlots(data, directory, annotations, format = "png", invert = FALSE)
data |
The result of the ddPCRclust algorithm |
directory |
The parent directory where the files should be saved. A new folder with the experiment name will be created (see below). |
annotations |
Some basic metadata about the ddPCR reaction.
If you provided |
format |
Which file format to use. Can either be a device function (e.g. png),
or one of 'eps', 'ps', 'tex' (pictex), 'pdf', 'jpeg', 'tiff', 'png', 'bmp', 'svg' or 'wmf' (windows only).
See also |
invert |
Invert the axes, e.g. x = Ch2.Amplitude, y = Ch1.Amplitude |
None
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[3])
# To read all example files uncomment the following line
# files <- readFiles(exampleFiles[1:8])

# Read template
template <- readTemplate(exampleFiles[9])

# Run ddPCRclust
result <- ddPCRclust(files, template)

# Export the plots
dir.create('./Results')
exportPlots(data = result, directory = './Results/', annotations = result$annotations)
A convenience function that takes the results of the ddPCRclust algorithm and exports them to a csv file.
exportToCSV(data, directory, annotations, raw = FALSE)
data |
The result of the ddPCRclust algorithm |
directory |
The parent directory where the files should be saved. A new folder with the experiment name will be created (see below). |
annotations |
Some basic metadata about the ddPCR reaction.
If you provided |
raw |
Boolean that determines whether the annotated raw data should be exported along with the final counts. A third column is added to the original data, containing the number of the cluster to which each point was assigned. This is useful, for example, to visualize the clustering later on. (Warning: this can take a while!) |
None
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[3])
# To read all example files uncomment the following line
# files <- readFiles(exampleFiles[1:8])

# Read template
template <- readTemplate(exampleFiles[9])

# Run ddPCRclust
result <- ddPCRclust(files, template)

# Export the results
dir.create('./Results')
exportToCSV(data = result, directory = './Results/', annotations = result$annotations)
A convenience function that takes the results of the ddPCRclust algorithm and exports them to an Excel file.
exportToExcel(data, directory, annotations, raw = FALSE)
data |
The result of the ddPCRclust algorithm |
directory |
The parent directory where the files should be saved. A new folder with the experiment name will be created (see below). |
annotations |
Some basic metadata about the ddPCR reaction.
If you provided |
raw |
Boolean that determines whether the annotated raw data should be exported along with the final counts. A third column is added to the original data, containing the number of the cluster to which each point was assigned. This is useful, for example, to visualize the clustering later on. (Warning: this can take a while!) |
None
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[3])
# To read all example files uncomment the following line
# files <- readFiles(exampleFiles[1:8])

# Read template
template <- readTemplate(exampleFiles[9])

# Run ddPCRclust
result <- ddPCRclust(files, template)

# Export the results
dir.create('./Results')
exportToExcel(data = result, directory = './Results/', annotations = result$annotations)
This function reads the raw csv files for ddPCRclust from disk and returns the experiment data. Please refer to the vignette for more information on how these files need to be formatted.
readFiles(files)
files |
The input file(s), specifically csv files. Each file represents a two-dimensional data frame. Each row within the data frame represents a single droplet, each column the respective intensities per colour channel. |
files |
A data frame composed of the experiment data |
ids |
The file ids, e.g. A01, A02, etc. |
library(ddPCRclust)

# Read files
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
files <- readFiles(exampleFiles[1:8])
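To make the expected input shape concrete, the following base R sketch writes and reads a tiny droplet file with one row per droplet and one column per colour channel. The column names are taken from the plotting examples in this manual; the file name pattern and the way the well ID is extracted are assumptions for illustration and do not describe how readFiles works internally.

```r
# Write a tiny made-up droplet file: one row per droplet, one column per channel
path <- file.path(tempdir(), "experiment_A01_Amplitude.csv")  # hypothetical name
droplets <- data.frame(
  Ch1.Amplitude = c(1020.5, 5980.1, 1011.7),
  Ch2.Amplitude = c(980.3, 1002.9, 4510.6)
)
write.csv(droplets, path, row.names = FALSE)

# Read it back and pull a well ID such as "A01" out of the file name
well <- read.csv(path)
id <- regmatches(basename(path), regexpr("[A-H][0-9]{2}", basename(path)))
str(well)
print(id)
```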
This function reads a template file for ddPCRclust from disk and returns a run template and annotations. Please refer to the vignette for information on how this file needs to be formatted.
readTemplate(template)
template |
A csv file containing information about the individual ddPCR runs. An example template is provided with this package. For more information, please check the vignette or the repository on GitHub. |
annotations |
The metadata provided in the header of the template. It contains four fields: |
template |
A parsed dataframe containing the template. |
library(ddPCRclust)

# Read template
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
template <- readTemplate(exampleFiles[9])
Use the local density function of the flowDensity package to find the cluster centres of the ddPCR reaction. Clusters are then labelled based on their rotated position and lastly the rain is assigned.
runDensity(file, sensitivity = 1, numOfMarkers, missingClusters = NULL, similarityParam = 0.95, distanceParam = 0.2)
file |
The input data. More specifically, a data frame with two dimensions, each dimension representing the intensity for one color channel. |
sensitivity |
A number between 0.1 and 2 determining the sensitivity of the initial clustering, i.e. the number of clusters. A higher value means more clusters are found. The default is 1. |
numOfMarkers |
The number of primary clusters that are expected according to the experimental setup. |
missingClusters |
A vector containing the number of primary clusters, which are missing in this dataset according to the template. |
similarityParam |
If a droplet's distances to two or more clusters are very similar, it will not be counted for any of them. The default is 0.95, i.e. at least 95% similarity. A sensible value lies between 0 and 1, where 0 means none of the 'rain' droplets will be counted and 1 means all droplets will be counted. |
distanceParam |
When assigning rain between two clusters, typically the bottom 20% are assigned to the lower cluster and the remaining 80% to the higher cluster. This parameter changes the ratio, e.g. a value of 0.1 would assign only 10% to the lower cluster. |
data |
The original input data minus the removed events (for plotting) |
counts |
The droplet count for each cluster. |
firstClusters |
The position of the primary clusters. |
partition |
The cluster numbers as a CLUE partition (see clue package for more information). |
library(ddPCRclust)

# Run the flowDensity based approach
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
file <- read.csv(exampleFiles[3])
densResult <- runDensity(file = file, numOfMarkers = 4)

# Plot the results
library(ggplot2)
p <- ggplot(data = densResult$data, mapping = aes(x = Ch2.Amplitude, y = Ch1.Amplitude))
p <- p + geom_point(aes(color = factor(Cluster)), size = .5, na.rm = TRUE) +
  ggtitle('flowDensity example') + theme_bw() + theme(legend.position = 'none')
p
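The rain assignment controlled by distanceParam can be sketched in base R as follows. This is an illustrative interpretation rather than the package's implementation: it assumes each rain droplet is projected onto the line connecting two cluster centres and assigned by its relative position along that line. The helper `assign_rain` and all coordinates are hypothetical.

```r
# Hypothetical helper: assign rain droplets lying between two cluster centres.
# Each droplet is projected onto the line from `lower` to `higher`; droplets in
# the first `distanceParam` share of that line go to the lower cluster.
assign_rain <- function(points, lower, higher, distanceParam = 0.2) {
  direction <- higher - lower
  # Relative position along the connecting line: 0 = lower, 1 = higher centre
  pos <- as.vector(sweep(points, 2, lower) %*% direction) / sum(direction^2)
  ifelse(pos < distanceParam, "lower", "higher")
}

lower  <- c(1000, 1000)  # made-up centre of the lower cluster
higher <- c(1000, 6000)  # made-up centre of the higher cluster
rain   <- rbind(c(1050, 1500), c(980, 2500), c(1010, 5000))

print(assign_rain(rain, lower, higher))
```

With the default of 0.2, only droplets in the bottom fifth of the connecting line are attributed to the lower cluster; lowering the value moves more of the rain to the higher cluster.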
Find the rain and assign it based on the distance to vector lines connecting the cluster centres.
runPeaks(file, sensitivity = 1, numOfMarkers, missingClusters = NULL, similarityParam = 0.95, distanceParam = 0.2)
file |
The input data. More specifically, a data frame with two dimensions, each dimension representing the intensity for one color channel. |
sensitivity |
A number between 0.1 and 2 determining the sensitivity of the initial clustering, i.e. the number of clusters. A higher value means more clusters are found. The default is 1. |
numOfMarkers |
The number of primary clusters that are expected according to the experimental setup. |
missingClusters |
A vector containing the number of primary clusters, which are missing in this dataset according to the template. |
similarityParam |
If a droplet's distances to two or more clusters are very similar, it will not be counted for any of them. The default is 0.95, i.e. at least 95% similarity. A sensible value lies between 0 and 1, where 0 means none of the 'rain' droplets will be counted and 1 means all droplets will be counted. |
distanceParam |
When assigning rain between two clusters, typically the bottom 20% are assigned to the lower cluster and the remaining 80% to the higher cluster. This parameter changes the ratio, e.g. a value of 0.1 would assign only 10% to the lower cluster. |
data |
The original input data minus the removed events (for plotting) |
counts |
The droplet count for each cluster. |
firstClusters |
The position of the primary clusters. |
partition |
The cluster numbers as a CLUE partition (see clue package for more information). |
library(ddPCRclust)

# Run the flowPeaks based approach
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
file <- read.csv(exampleFiles[3])
peaksResult <- runPeaks(file = file, numOfMarkers = 4)

# Plot the results
library(ggplot2)
p <- ggplot(data = peaksResult$data, mapping = aes(x = Ch2.Amplitude, y = Ch1.Amplitude))
p <- p + geom_point(aes(color = factor(Cluster)), size = .5, na.rm = TRUE) +
  ggtitle('flowPeaks example') + theme_bw() + theme(legend.position = 'none')
p
Find the rain and assign it based on the distance to vector lines connecting the cluster centres.
runSam(file, sensitivity = 1, numOfMarkers, missingClusters = NULL, similarityParam = 0.95, distanceParam = 0.2)
file |
The input data. More specifically, a data frame with two dimensions, each dimension representing the intensity for one color channel. |
sensitivity |
A number between 0.1 and 2 determining the sensitivity of the initial clustering, i.e. the number of clusters. A higher value means more clusters are found. The default is 1. |
numOfMarkers |
The number of primary clusters that are expected according to the experimental setup. |
missingClusters |
A vector containing the number of primary clusters, which are missing in this dataset according to the template. |
similarityParam |
If a droplet's distances to two or more clusters are very similar, it will not be counted for any of them. The default is 0.95, i.e. at least 95% similarity. A sensible value lies between 0 and 1, where 0 means none of the 'rain' droplets will be counted and 1 means all droplets will be counted. |
distanceParam |
When assigning rain between two clusters, typically the bottom 20% are assigned to the lower cluster and the remaining 80% to the higher cluster. This parameter changes the ratio, e.g. a value of 0.1 would assign only 10% to the lower cluster. |
data |
The original input data minus the removed events (for plotting) |
counts |
The droplet count for each cluster. |
firstClusters |
The position of the primary clusters. |
partition |
The cluster numbers as a CLUE partition (see clue package for more information). |
library(ddPCRclust)

# Run the SamSPECTRAL based approach
exampleFiles <- list.files(paste0(find.package('ddPCRclust'), '/extdata'), full.names = TRUE)
file <- read.csv(exampleFiles[3])
samResult <- runSam(file = file, numOfMarkers = 4)

# Plot the results
library(ggplot2)
p <- ggplot(data = samResult$data, mapping = aes(x = Ch2.Amplitude, y = Ch1.Amplitude))
p <- p + geom_point(aes(color = factor(Cluster)), size = .5, na.rm = TRUE) +
  ggtitle('SamSPECTRAL example') + theme_bw() + theme(legend.position = 'none')
p
Longer DNA templates produce a lower droplet count due to DNA shearing. This function normalizes the ddPCRclust result based on a stable marker of different lengths to negate the effect of differences in the lengths of the actual markers of interest. (Work in progress)
shearCorrection(counts, lengthControl, stableControl)
counts |
The counts per marker as provided by calculateCPDs. |
lengthControl |
The name of the length control. If the template name is, for example, CPT2, the name in the template should be CPT2-125, where 125 represents the number of base pairs. |
stableControl |
The name of the stable control used as a reference for this experiment. |
A linear regression model fitting the length vs ln(ratio) (see lm for details on linear regression).
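The kind of model returned can be illustrated with base R alone. The sketch below fits ln(ratio) against amplicon length using lm; the data frame, its column names and all numbers are made up, and how the package derives the ratios from the counts is not shown here.

```r
# Made-up ratios of a length control to the stable control at four amplicon
# lengths (base pairs); longer templates yield lower ratios.
shear <- data.frame(
  length = c(125, 250, 500, 1000),
  ratio  = c(0.95, 0.88, 0.76, 0.57)
)

# Linear model of ln(ratio) against amplicon length
fit <- lm(log(ratio) ~ length, data = shear)
print(coef(fit))

# Predicted ratio for a hypothetical 300 bp marker; dividing an observed
# count by this factor would scale it up to compensate for shearing.
correction <- exp(predict(fit, newdata = data.frame(length = 300)))
print(correction)
```

Fitting on the log scale treats shearing as a loss that compounds with every additional base pair, so the slope of the model is expected to be negative.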