| Title: | Analysis of scDNA-seq data |
|---|---|
| Description: | Scafari is a Shiny application designed for the analysis of single-cell DNA sequencing (scDNA-seq) data provided in .h5 file format. The analysis process is structured into the four key steps "Sequencing", "Panel", "Variants", and "Explore Variants". It supports various analyses and visualizations. |
| Authors: | Sophie Wind [aut, cre] (ORCID: <https://orcid.org/0009-0002-1174-8201>) |
| Maintainer: | Sophie Wind <[email protected]> |
| License: | LGPL-3 |
| Version: | 1.3.0 |
| Built: | 2026-05-30 09:57:33 UTC |
| Source: | https://git.bioconductor.org/packages/scafari |
Function: annotateAmplicons

This function takes a SingleCellExperiment object as input and annotates the stored amplicons.

    annotateAmplicons(sce, known.canon, shiny = FALSE)

| Argument | Description |
|---|---|
| sce | SingleCellExperiment object containing the single-cell data. |
| known.canon | Path to known canonicals (see vignette). |
| shiny | If TRUE, messages are shown. |

A data frame containing annotated amplicons.

    # Assume `sce` is a SingleCellExperiment object with a 'counts' assay
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    annotated <- annotateAmplicons(
        sce_filtered,
        system.file("extdata", "UCSC_hg19_knownCanonical_mock.txt",
            package = "scafari"
        )
    )
Function: annotateVariants

This function takes a SingleCellExperiment object as input and performs variant annotation.

    annotateVariants(sce, shiny = FALSE, max.var = 50)

| Argument | Description |
|---|---|
| sce | SingleCellExperiment object containing the single-cell data to be annotated. |
| shiny | A logical flag indicating whether the function is being run in a Shiny application context. Default is FALSE. |
| max.var | Maximum number of variants to annotate. The default is 50 to avoid long runtimes. |

The function returns an annotated SingleCellExperiment object.

https://missionbio.github.io/mosaic/, https://github.com/rachelgriffard/optima

    # Assume `sce` is a SingleCellExperiment object with variants in altExp()
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    sce <- annotateVariants(sce_filtered, shiny = FALSE)
Function: clusterVariantSelection

This function takes selected variants and performs clustering on them.

    clusterVariantSelection(
        sce,
        variants.of.interest,
        n.clust,
        method = "k-means",
        eps.value = 0.2,
        resolution = NULL,
        min.pts = NULL
    )

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the single-cell data on which clustering will be performed. |
| variants.of.interest | A vector or list specifying the variants of interest to be selected for clustering. |
| n.clust | An integer specifying the number of clusters. |
| method | Clustering method. Either k-means, dbscan or leiden. |
| eps.value | Size (radius) of the epsilon neighborhood. Can be omitted if x is a frNN object. |
| resolution | The resolution parameter to use. Higher resolutions lead to more, smaller communities, while lower resolutions lead to fewer, larger communities. |
| min.pts | Minimum number of points required in the eps neighborhood for core points (including the point itself). By default, cell number/100. |

A list with clustering results and a ggplot object.

https://cran.r-project.org/web/packages/dbscan/readme/README.html#ref-hahsler2019dbscan

    # Assume `sce` is a SingleCellExperiment object with variants in altExp()
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    clusterplot <- clusterVariantSelection(
        sce = sce_filtered,
        variants.of.interest = c(
            "FLT3:chr13:28610183:A/G",
            "KIT:chr4:55599436:T/C",
            "TP53:chr17:7577427:G/A",
            "TET2:chr4:106158216:G/A"
        ),
        n.clust = 4
    )
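To give a feel for what clustering cells on a handful of selected variants means, the following base R sketch runs `stats::kmeans()` on a made-up cells-by-variants VAF matrix. The toy data, the variable names, and the `nstart` value are assumptions chosen for illustration; this is not scafari's internal code, and the dbscan and leiden methods are not covered.

```r
# Hypothetical toy data: 60 cells x 2 variants, VAF in percent.
set.seed(1)
vaf <- rbind(
    matrix(rnorm(40, mean = 2, sd = 1), ncol = 2),   # 20 mostly wild-type cells
    matrix(rnorm(40, mean = 50, sd = 3), ncol = 2),  # 20 mostly heterozygous cells
    matrix(rnorm(40, mean = 98, sd = 1), ncol = 2)   # 20 mostly homozygous cells
)
colnames(vaf) <- c("variantA", "variantB")

# k-means with three centers. nstart reruns the algorithm from several
# random initializations and keeps the solution with the lowest
# within-cluster sum of squares.
km <- kmeans(vaf, centers = 3, nstart = 50)

# One cluster label per cell, plus the cluster sizes.
cluster <- km$cluster
print(table(cluster))
```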
Function: filterVariants

This function takes a SingleCellExperiment object as input and performs variant filtering.

    filterVariants(
        depth.threshold = numeric(),
        genotype.quality.threshold = numeric(),
        vaf.ref = numeric(),
        vaf.het = numeric(),
        vaf.hom = numeric(),
        min.cell = numeric(),
        min.mut.cell = numeric(),
        se.var,
        sce,
        shiny = FALSE
    )

| Argument | Description |
|---|---|
| depth.threshold | A numeric value specifying the minimum read depth required. |
| genotype.quality.threshold | A numeric value specifying the minimum genotype quality score. |
| vaf.ref | A numeric value specifying the variant allele frequency threshold for wild-type alleles. |
| vaf.het | A numeric value specifying the variant allele frequency threshold for heterozygous variants. |
| vaf.hom | A numeric value specifying the variant allele frequency threshold for homozygous variants. |
| min.cell | A numeric value indicating the minimum number of cells with a genotype other than "missing". |
| min.mut.cell | A numeric value indicating the minimum number of mutated (genotype either "homozygous" or "heterozygous") cells. |
| se.var | The SummarizedExperiment object containing the variant data to be filtered. |
| sce | SingleCellExperiment object containing the single-cell data. |
| shiny | A logical flag indicating whether the function is being run in a Shiny application context. Default is FALSE. |

A list containing the following elements:

- vaf.matrix.filtered: Variant allele frequencies after filtering.
- genotype.matrix.filtered: Genotype information after filtering.
- Normalized read counts for variants retained after filtering.
- variant.ids.filtered: A vector of the variant IDs that were retained after filtering.
- genoqual.matrix.filtered: Genotype qualities for variants retained after filtering.
- cells.keep: A vector of cell identifiers for those cells retained after filtering.

https://missionbio.github.io/mosaic/, https://github.com/rachelgriffard/optima

    library(SummarizedExperiment)
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    se.var <- h5$se_var
    sce <- normalizeReadCounts(sce = sce)
    filteres <- filterVariants(
        depth.threshold = 10,
        genotype.quality.threshold = 30,
        vaf.ref = 5,
        vaf.het = 35,
        vaf.hom = 95,
        min.cell = 50,
        min.mut.cell = 1,
        se.var = se.var,
        sce = sce,
        shiny = FALSE
    )
    se.f <- SummarizedExperiment(
        assays = list(
            VAF = t(filteres$vaf.matrix.filtered),
            Genotype = t(filteres$genotype.matrix.filtered),
            Genoqual = t(filteres$genoqual.matrix.filtered)
        ),
        rowData = filteres$variant.ids.filtered,
        colData = filteres$cells.keep
    )
    # Filter out cells in sce object
    # Find the indices of the columns to keep
    indices_to_keep <- match(filteres$cells.keep,
        SummarizedExperiment::colData(sce)[[1]],
        nomatch = 0
    )
    # Subset the SCE using these indices
    sce_filtered <- sce[, indices_to_keep]
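The thresholds above can be read as per-call quality checks followed by per-variant counts. The base R sketch below shows one plausible interpretation on made-up matrices: a genotype call is set to "missing" when depth or genotype quality is too low or when its VAF contradicts the call, and a variant is kept only if enough cells remain. The matrices, the genotype labels, the direction of each VAF comparison, and the treatment of `min.cell` and `min.mut.cell` as absolute counts are all assumptions for illustration, not a description of how `filterVariants()` is implemented.

```r
# Hypothetical toy data (4 cells x 2 variants); all values are made up.
dn <- list(paste0("cell", 1:4), c("var1", "var2"))
genotype <- matrix(c("WT", "HET", "HOM", "HET",
                     "WT", "HET", "WT", "HOM"), nrow = 4, dimnames = dn)
vaf   <- matrix(c(1, 48, 99, 20,  0, 52, 12, 97), nrow = 4, dimnames = dn)
depth <- matrix(c(25, 30, 8, 40, 15, 22, 30, 12), nrow = 4, dimnames = dn)
gq    <- matrix(c(60, 45, 80, 50, 25, 20, 70, 90), nrow = 4, dimnames = dn)

# Thresholds mirroring the argument names (values are illustrative).
depth.threshold <- 10
genotype.quality.threshold <- 30
vaf.ref <- 5
vaf.het <- 35
vaf.hom <- 95
min.cell <- 2      # assumed here to be an absolute number of cells
min.mut.cell <- 1  # assumed here to be an absolute number of cells

# A call fails if it is poorly supported or if its VAF contradicts it.
fail <- depth < depth.threshold |
    gq < genotype.quality.threshold |
    (genotype == "WT" & vaf > vaf.ref) |
    (genotype == "HET" & vaf < vaf.het) |
    (genotype == "HOM" & vaf < vaf.hom)
genotype[fail] <- "missing"

# Per-variant counts of called and mutated cells.
n.called <- colSums(genotype != "missing")
n.mut <- colSums(genotype == "HET" | genotype == "HOM")
keep.variant <- n.called >= min.cell & n.mut >= min.mut.cell
print(keep.variant)
```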
Function: h5ToSce

This function takes the path of an h5 file and reads it into an object of the SingleCellExperiment class.

    h5ToSce(h5_file)

| Argument | Description |
|---|---|
| h5_file | Path of an h5 file. |

A list with the SingleCellExperiment and the SummarizedExperiment object representing the amplicon and the variant analysis:

- sce_amp: SingleCellExperiment object with the amplicon experiment, containing read count information.
- se_var: SummarizedExperiment object with the variant experiment, containing variant information.

    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    # Read the h5 file using h5ToSce
    result <- h5ToSce(h5_file_path)
    # Display the result
    result
Launches the Scafari Shiny application for data visualization.

    launchScafariShiny()

A Shiny app.

    if (interactive()) {
        launchScafariShiny()
    }
This function generates a log-log plot of read counts using the data from a 'SingleCellExperiment' object.

    logLogPlot(sce)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object that contains read count data to be plotted. The read counts are extracted from an associated h5 file or similar data structure within the object. |

A ggplot object representing the log-log plot of read counts.

    # Assume `sce` is a SingleCellExperiment object with a 'counts' assay
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    plot <- logLogPlot(sce)
    print(plot)
This function normalizes the read counts contained within a 'SingleCellExperiment' object.

    normalizeReadCounts(sce)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object that includes the assay data with read counts to be normalized. The metadata within the object may also be utilized for normalization purposes. |

SingleCellExperiment object with normalized read counts.

https://missionbio.github.io/mosaic/, https://github.com/rachelgriffard/optima

    # Assume `sce` is a SingleCellExperiment object with 'counts' assay.
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    sce <- normalizeReadCounts(sce)
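The normalization scheme itself is not spelled out here. As a generic illustration of what read count normalization can look like, the base R sketch below rescales each cell of a made-up amplicons-by-cells count matrix to the mean sequencing depth, so that cells become comparable. The matrix, the names, and the choice of a simple library-size scaling are assumptions; `normalizeReadCounts()` may well use a different scheme.

```r
# Hypothetical toy counts: 3 amplicons (rows) x 4 cells (columns).
counts <- matrix(
    c(10, 20, 30, 40,
      5, 5, 10, 20,
      100, 50, 25, 25),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("amp", 1:3), paste0("cell", 1:4))
)

# Total reads per cell (library size) and the average across cells.
lib.size <- colSums(counts)
target <- mean(lib.size)

# Divide each column by its library size relative to the average, so
# that every cell ends up with the same total number of reads.
norm.counts <- sweep(counts, 2, lib.size / target, "/")
print(round(norm.counts, 2))
```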
This function generates a plot to visualize the distribution of amplicons within a 'SingleCellExperiment' object.

    plotAmpliconDistribution(sce)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object that contains the assay data, including amplicon information to be plotted. |

A ggplot object representing the distribution of amplicons.

    # Assume `sce` is a SingleCellExperiment object with 'counts' assay.
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    plotAmpliconDistribution(sce)
This function generates a plot to visualize genotypes in clusters based on selected variants of interest.

    plotClusterGenotype(sce, variants.of.interest, gg.clust)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the relevant data. |
| variants.of.interest | A vector specifying the variants of interest. |
| gg.clust | An object containing clustering information. |

A ggplot object that visually represents the clustering of genotypes based on the specified variants and clustering information.

    # Assume `sce` is a SingleCellExperiment object with variants in altExp()
    # and clusterplot is the output of clusterVariantSelection().
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    clusterplot <- readRDS(system.file("extdata", "clusterplot.rds",
        package = "scafari"
    ))
    plotClusterGenotype(
        sce = sce_filtered,
        variants.of.interest = c(
            "FLT3:chr13:28610183:A/G",
            "KIT:chr4:55599436:T/C",
            "TP53:chr17:7577427:G/A",
            "TET2:chr4:106158216:G/A"
        ),
        gg.clust = clusterplot$clusterplot
    )
plot Cluster VAF

This function generates a plot to visualize variant allele frequency (VAF) in clusters based on selected variants of interest.

    plotClusterVAF(sce, variants.of.interest, gg.clust)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the relevant data. |
| variants.of.interest | A vector specifying the variants of interest. |
| gg.clust | An object containing clustering information. |

A ggplot object that visually represents the VAF in the clusters.

    # Assume `sce` is a SingleCellExperiment object with variants in altExp()
    # and clusterplot is the output of clusterVariantSelection().
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    clusterplot <- readRDS(system.file("extdata", "clusterplot.rds",
        package = "scafari"
    ))
    plotClusterVAF(
        sce = sce_filtered,
        variants.of.interest = c(
            "FLT3:chr13:28610183:A/G",
            "KIT:chr4:55599436:T/C",
            "TP53:chr17:7577427:G/A",
            "TET2:chr4:106158216:G/A"
        ),
        gg.clust = clusterplot$clusterplot
    )
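The kind of per-cluster summary such a plot conveys can be approximated numerically. The base R sketch below computes the mean VAF of each variant within each cluster for a made-up VAF matrix and made-up cluster labels. Both inputs and all names are hypothetical, and the summary statistic is an assumption; `plotClusterVAF()` may display the data differently.

```r
# Hypothetical toy data: 6 cells x 2 variants, VAF in percent.
vaf <- matrix(
    c(2, 1, 50, 48, 97, 99,
      0, 3, 51, 47, 2, 1),
    ncol = 2,
    dimnames = list(paste0("cell", 1:6), c("variantA", "variantB"))
)

# Hypothetical cluster label for each cell.
cluster <- factor(c(1, 1, 2, 2, 3, 3))

# Mean VAF per cluster (rows) and variant (columns).
mean.vaf <- apply(vaf, 2, function(v) tapply(v, cluster, mean))
print(mean.vaf)
```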
plot Cluster VAF Map

This function generates a plot to visualize variant allele frequency (VAF) in clusters based on selected variants of interest with clusters in the background.

    plotClusterVAFMap(sce, variants.of.interest, gg.clust)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the relevant data. |
| variants.of.interest | A vector specifying the variants of interest. |
| gg.clust | An object containing clustering information. |

A ggplot object that visually represents the VAF in the clusters with clusters in the background.

    # Assume `sce` is a SingleCellExperiment object with variants in altExp()
    # and clusterplot is the output of clusterVariantSelection().
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    clusterplot <- readRDS(system.file("extdata", "clusterplot.rds",
        package = "scafari"
    ))
    plotClusterVAFMap(
        sce = sce_filtered,
        variants.of.interest = c(
            "FLT3:chr13:28610183:A/G",
            "KIT:chr4:55599436:T/C",
            "TP53:chr17:7577427:G/A",
            "TET2:chr4:106158216:G/A"
        ),
        gg.clust = clusterplot$clusterplot
    )
This function takes a SingleCellExperiment object and variants of interest as input and draws an elbow plot, which helps to choose the number of clusters for a subsequent k-means clustering.

    plotElbow(sce, variants.of.interest)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the relevant data. |
| variants.of.interest | A vector specifying the variants of interest. |

A ggplot object with the elbow plot.

    # Assume `sce` is a SingleCellExperiment object with variants in altExp().
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    plotElbow(
        sce = sce_filtered,
        variants.of.interest = c(
            "FLT3:chr13:28610183:A/G",
            "KIT:chr4:55599436:T/C",
            "TP53:chr17:7577427:G/A",
            "TET2:chr4:106158216:G/A"
        )
    )
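For readers unfamiliar with the elbow heuristic, the base R sketch below computes the total within-cluster sum of squares for several values of k on a made-up VAF matrix and plots it. The number of clusters is usually picked where the curve stops dropping sharply. The toy data, the range of k, and the use of `stats::kmeans()` with base graphics are assumptions for illustration and do not reproduce the internals of `plotElbow()`.

```r
# Hypothetical toy data: 60 cells x 2 variants with three clear groups.
set.seed(42)
vaf <- rbind(
    matrix(rnorm(40, mean = 2, sd = 1), ncol = 2),
    matrix(rnorm(40, mean = 50, sd = 3), ncol = 2),
    matrix(rnorm(40, mean = 98, sd = 1), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..6.
ks <- 1:6
wss <- vapply(
    ks,
    function(k) kmeans(vaf, centers = k, nstart = 50)$tot.withinss,
    numeric(1)
)

# The "elbow" is where adding clusters stops reducing wss substantially.
plot(ks, wss,
    type = "b",
    xlab = "Number of clusters k",
    ylab = "Total within-cluster sum of squares"
)
```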
This function generates a plot to assess the quality of genotypes within a 'SingleCellExperiment' object. It uses the 'genotype_quality' assay or any relevant assay to visualize the distribution of genotype quality metrics across different genotypes.

    plotGenotypequalityPerGenotype(sce)

| Argument | Description |
|---|---|
| sce | A 'SingleCellExperiment' object containing the single-cell data. This object should include a 'genotype_quality' assay or similar data which provides quality metrics for each genotype. |

A 'ggplot' object visualizing the distribution of genotype quality across different genotypes.

    # Assume `sce` is a SingleCellExperiment object with 'genotype_quality'
    # assay.
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    genotype_quality_plot <- plotGenotypequalityPerGenotype(sce_filtered)
    print(genotype_quality_plot)
This function plots the normalized read counts in a bar chart.

    plotNormalizedReadCounts(sce)

| Argument | Description |
|---|---|
| sce | A SingleCellExperiment object containing the relevant data. |

A ggplot object visualizing the normalized read counts in a bar chart.

    # Assume `sce` is a SingleCellExperiment object with 'counts' assay.
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    sce <- normalizeReadCounts(sce = sce)
    plotNormalizedReadCounts(sce)
This function generates a plot to assess the uniformity of panel reads in a 'SingleCellExperiment' object. It uses read counts stored in the 'counts' assay to visualize the distribution and variability of reads.

    plotPanelUniformity(sce, interactive = FALSE)

| Argument | Description |
|---|---|
| sce | A 'SingleCellExperiment' object that contains single-cell data with a 'counts' assay. This object should be pre-processed to ensure the data is appropriate for uniformity analysis. |
| interactive | If TRUE, a plotly object is returned. |

A 'ggplot' object visualizing the uniformity of panel reads.

    # Assume `sce` is a SingleCellExperiment object with 'counts' assay.
    h5_file_path <- system.file("extdata", "demo.h5", package = "scafari")
    h5 <- h5ToSce(h5_file_path)
    sce <- h5$sce_amp
    sce <- normalizeReadCounts(sce = sce)
    uniformity_plot <- plotPanelUniformity(sce)
    print(uniformity_plot)
This function generates a heatmap to visualize variant data within a 'SingleCellExperiment' object. The heatmap provides insights into the distribution and frequency of variants across different samples or cells.

    plotVariantHeatmap(sce)

| Argument | Description |
|---|---|
| sce | A 'SingleCellExperiment' object containing single-cell variant data. The object should include an assay that holds variant information suitable for visualization in a heatmap. |

A 'ggplot' object representing the heatmap of variant frequencies or presence across cells or groups.

    # Assume `sce` is a SingleCellExperiment object with an appropriate variant
    # assay.
    sce_filtered <- readRDS(system.file("extdata", "sce_filtered_demo.rds",
        package = "scafari"
    ))
    variant_heatmap <- plotVariantHeatmap(sce_filtered)
    print(variant_heatmap)
scafari is an R Bioconductor package for single-cell DNA-seq (scDNA-seq) analysis.
scafari works on .h5 files, the standard output of the Tapestri pipeline. It offers easy-to-use data quality control as well as explorative variant analyses and visualization for scDNA-seq data.

- h5ToSce(): Reads in .h5 files and writes them into a SingleCellExperiment object.
- normalizeReadCounts(): Normalizes the read counts.
- annotateAmplicons(): Annotates the amplicons present in the .h5 file.
- filterVariants(): Filters the variants present in the .h5 file.
- annotateVariants(): Annotates the filtered variants.
- clusterVariantSelection(): Clusters variants of special interest.

To install this package, use:
BiocManager::install("scafari")