Title: | Analysis of single-cell epigenomics datasets with a Shiny App |
---|---|
Description: | ChromSCape - Chromatin landscape profiling for Single Cells - is a ready-to-launch, user-friendly Shiny Application for the analysis of single-cell epigenomics datasets (scChIP-seq, scATAC-seq, scCUT&Tag, ...) from aligned data to differential analysis & gene set enrichment analysis. It is highly interactive, enables users to save their analysis and covers a wide range of analytical steps: QC, preprocessing, filtering, batch correction, dimensionality reduction, visualization, clustering, differential analysis and gene set analysis. |
Authors: | Pacome Prompsy [aut, cre] , Celine Vallot [aut] |
Maintainer: | Pacome Prompsy <[email protected]> |
License: | GPL-3 |
Version: | 1.17.0 |
Built: | 2024-11-18 03:27:36 UTC |
Source: | https://github.com/bioc/ChromSCape |
Find nearest peaks of each gene and return refined annotation
annotation_from_merged_peaks(scExp, odir, merged_peaks, geneTSS_annotation)
scExp |
A SingleCellExperiment object |
odir |
An output directory where to write the merged peaks BED file |
merged_peaks |
A list of GRanges objects containing the merged peaks |
geneTSS_annotation |
A GRanges object with reference genes |
A data.frame with refined annotation
annotToCol2
annotToCol2( annotS = NULL, annotT = NULL, missing = c("", NA), anotype = NULL, maxnumcateg = 2, categCol = NULL, quantitCol = NULL, plotLegend = TRUE, plotLegendFile = NULL )
annotS |
A color matrix |
annotT |
A color matrix |
missing |
Convert missing to NA |
anotype |
Annotation type |
maxnumcateg |
Maximum number of categories |
categCol |
Categorical columns |
quantitCol |
Quantitative columns |
plotLegend |
Plot legend ? |
plotLegendFile |
Which file to plot legend ? |
A matrix of continuous or discrete colors
data("scExp")
annotToCol2(SingleCellExperiment::colData(scExp), plotLegend = FALSE)
Helper binary column for anocol function
anocol_binary(anocol, anotype, plotLegend, annotS)
anocol |
The color feature matrix |
anotype |
The feature types |
plotLegend |
Plot legend ? |
annotS |
A color matrix |
A color matrix similar to anocol with binary columns colored
Helper categorical column for anocol function
anocol_categorical(anocol, categCol, anotype, plotLegend, annotS)
anocol |
The color feature matrix |
categCol |
Colors for categorical features |
anotype |
The feature types |
plotLegend |
Plot legend ? |
annotS |
A color matrix |
A color matrix similar to anocol with categorical columns colored
Count bam files on interval to create count indexes
bams_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())
dir |
A directory containing single cell BAM files and BAI files |
which |
Genomic Range on which to count |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list containing a "feature index" data.frame and a count vector for non 0 entries, both used to form the sparse matrix
Count bed files on interval to create count indexes
beds_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())
dir |
A directory containing the single cell BED files |
which |
Genomic Range on which to count |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list containing a "feature index" data.frame and a vector of cell names, both used to form the sparse matrix
Cytobands are considered large enough that a variation at the cytoband level is interpreted not as an epigenetic event but as a genetic event, e.g. a Copy Number Alteration. The function successively:
Calculates the fraction of reads in each cytoband (FrCyto). See calculate_cyto_mat
Calculates the log2-ratio of the FrCyto of each cell to the average FrCyto in normal cells. See calculate_logRatio_CNA
Estimates whether there was a gain or a loss of copy in each cytoband. See calculate_gain_or_loss
The corresponding matrices are accessible in the reducedDim slots "cytoBand", "logRatio_cytoBand" and "gainOrLoss_cytoBand" respectively.
calculate_CNA( scExp, control_samples = unique(scExp$sample_id)[1], ref_genome = c("hg38", "mm10")[1], quantiles_to_define_gol = c(0.05, 0.95) )
scExp |
A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. |
control_samples |
Sample IDs of the normal sample to take as reference. |
ref_genome |
Reference genome ('hg38' or 'mm10') |
quantiles_to_define_gol |
Quantiles of normal log2-ratio distribution below/above which cytoband is considered to be a loss/gain. (c(0.05,0.95)) |
The SCE with the fraction of reads, log2-ratio and gain or loss in each cytoband in each cell (of dimension cell x cytoband) in the reducedDim slots.
data("scExp")
scExp = calculate_CNA(scExp,
    control_samples = unique(scExp$sample_id)[1],
    ref_genome = "hg38",
    quantiles_to_define_gol = c(0.05, 0.95))
SingleCellExperiment::reducedDim(scExp, "cytoBand")
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")
Re-counts binned reads onto cytobands and calculates, for each cell, the fraction of reads falling in each cytoband. Cytobands are considered large enough that a variation at the cytoband level is interpreted not as an epigenetic event but as a genetic event, e.g. a Copy Number Alteration.
calculate_cyto_mat(scExp, ref_genome = c("hg38", "mm10")[1])
scExp |
A SingleCellExperiment with genomic coordinate as features (peaks or bins) |
ref_genome |
Reference genome ('hg38' or 'mm10') |
The SCE with the fraction of reads in each cytoband in each cell (of dimension cell x cytoband) in the reducedDim slot "cytoBand".
data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome = "hg38")
SingleCellExperiment::reducedDim(scExp, "cytoBand")
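The fraction-of-reads computation described above can be illustrated with a minimal base R sketch. It does not use ChromSCape itself: the toy count matrix and the `bin_cytoband` assignment vector are hypothetical (the package derives the bin-to-cytoband mapping from genomic coordinates), and the code is only meant to show the arithmetic, not the package's implementation.

```r
# Toy count matrix: 4 genomic bins (rows) x 3 cells (columns).
counts <- matrix(c(2, 0, 1,
                   1, 3, 0,
                   0, 1, 4,
                   1, 0, 3),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("bin", 1:4), paste0("cell", 1:3)))

# Hypothetical assignment of each bin to a cytoband (assumption for illustration).
bin_cytoband <- c("p11", "p11", "q12", "q12")

# Sum the bin counts within each cytoband: result is cytoband x cell.
cyto_counts <- rowsum(counts, group = bin_cytoband)

# Divide by the total reads of each cell, then transpose to cell x cytoband.
fr_cyto <- t(sweep(cyto_counts, 2, colSums(counts), "/"))
fr_cyto
```

Each row of `fr_cyto` sums to 1, since every read of a cell is assigned to exactly one cytoband in this toy setting.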
Given a SingleCellExperiment object with the slot "logRatio_cytoBand" containing the log2-ratio of the fraction of reads in each cytoband, estimates in a non-quantitative way whether each cytoband was lost or gained. To do so, the quantiles of the log2-ratio distribution in normal cells are calculated, and any cytoband below the lower quantile or above the upper quantile is considered a loss or a gain, respectively. The False Discovery Rate is directly proportional to the quantiles.
calculate_gain_or_loss(scExp, controls, quantiles = c(0.05, 0.95))
scExp |
A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. |
controls |
Sample IDs or Cell IDs of the normal sample to take as reference. |
quantiles |
Quantiles of normal log2-ratio distribution below/above which cytoband is considered to be a loss/gain. (c(0.05,0.95)) |
The SCE with the gain or loss in each cytoband in each cell (of dimension cell x cytoband) in the reducedDim slot "gainOrLoss_cytoBand".
data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome = "hg38")
scExp = calculate_logRatio_CNA(scExp, controls = unique(scExp$sample_id)[1])
scExp = calculate_gain_or_loss(scExp, controls = unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")
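The quantile-based call can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the function name `gain_or_loss_sketch` is hypothetical, thresholds are assumed to be computed separately for each cytoband from the control cells, and the -1 / 0 / 1 encoding for loss / neutral / gain is chosen here for convenience.

```r
gain_or_loss_sketch <- function(log_ratio, is_control, quantiles = c(0.05, 0.95)) {
  # Lower and upper thresholds per cytoband, from control cells only
  # (2 x n_cytobands matrix). Per-cytoband thresholds are an assumption.
  thr <- apply(log_ratio[is_control, , drop = FALSE], 2, quantile, probs = quantiles)
  out <- matrix(0L, nrow(log_ratio), ncol(log_ratio), dimnames = dimnames(log_ratio))
  out[sweep(log_ratio, 2, thr[1, ], "<")] <- -1L  # below lower quantile: loss
  out[sweep(log_ratio, 2, thr[2, ], ">")] <- 1L   # above upper quantile: gain
  out
}

# Toy log2-ratio matrix: 10 cells x 4 cytobands, first 5 cells are controls.
set.seed(1)
log_ratio <- matrix(rnorm(40, sd = 0.1), nrow = 10,
                    dimnames = list(paste0("cell", 1:10), c("p11", "p12", "q11", "q12")))
is_control <- rep(c(TRUE, FALSE), each = 5)
log_ratio["cell10", "q12"] <- 2   # simulated gain in a non-control cell
log_ratio["cell9", "p11"] <- -2   # simulated loss in a non-control cell

gol <- gain_or_loss_sketch(log_ratio, is_control)
gol
```

Widening the quantiles (e.g. `c(0.01, 0.99)`) makes the call more conservative, which is the trade-off the description refers to when it relates the quantiles to the False Discovery Rate.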
Given a SingleCellExperiment object with the slot "cytoBand" containing the fraction of reads in each cytoband, calculates the log2-ratio of tumor vs normal fraction of reads in cytobands, cell by cell. If the average signal in normal sample in a cytoband is 0, set this value to 1 so that the ratio won't affect the fraction of read value.
calculate_logRatio_CNA(scExp, controls)
scExp |
A SingleCellExperiment with "cytoBand" reducedDim slot filled. |
controls |
Sample IDs or Cell IDs of the normal sample to take as reference. |
The SCE with the log2-ratio of fraction of reads in each cytoband in each cell (of dimension cell x cytoband) in the reducedDim slot "logRatio_cytoBand".
data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome = "hg38")
scExp = calculate_logRatio_CNA(scExp, controls = unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")
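As a rough illustration of the log2-ratio step, the base R sketch below divides each cell's fraction of reads by the average fraction in control cells, replacing a zero reference by 1 as described above. The function name `log_ratio_sketch` and the toy matrix are hypothetical; how the package handles cells with a zero fraction (which give `-Inf` here) is not covered by this sketch.

```r
log_ratio_sketch <- function(fr_cyto, is_control) {
  # Average fraction of reads per cytoband in the control cells.
  ref <- colMeans(fr_cyto[is_control, , drop = FALSE])
  # A zero reference is set to 1 so the ratio leaves the fraction unchanged.
  ref[ref == 0] <- 1
  log2(sweep(fr_cyto, 2, ref, "/"))
}

# Toy fraction-of-reads matrix (cell x cytoband): two normal cells, one tumor cell.
fr_cyto <- rbind(normal1 = c(0.5, 0.5),
                 normal2 = c(0.5, 0.5),
                 tumor1  = c(0.25, 0.75))
colnames(fr_cyto) <- c("p11", "q12")

lr <- log_ratio_sketch(fr_cyto, is_control = c(TRUE, TRUE, FALSE))
lr
```

In this toy case the tumor cell has half the normal fraction of reads in `p11`, so its log2-ratio there is -1, while the normal cells sit at 0.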
Calling MACS2 peak caller and merging resulting peaks
call_macs2_merge_peaks( affectation, odir, p.value, format = c("scBED", "BAM")[1], ref, peak_distance_to_merge )
affectation |
Annotation data.frame with cell cluster and cell id information |
odir |
Output directory to write MACS2 output |
p.value |
P value to detect peaks, passed to MACS2 |
format |
File format, either "BAM" or "scBED" |
ref |
Reference genome to get chromosome information from. |
peak_distance_to_merge |
Distance to merge peaks |
A list of merged GRanges peaks
changeRange
changeRange(v, newmin = 1, newmax = 10)
v |
A numeric vector |
newmin |
New min |
newmax |
New max |
A matrix with values scaled between newmin and newmax
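A linear rescaling of this kind can be sketched in base R as below. The function name `change_range_sketch` is hypothetical and the formula is the usual min-max rescaling, assumed here rather than taken from the package source.

```r
change_range_sketch <- function(v, newmin = 1, newmax = 10) {
  oldmin <- min(v, na.rm = TRUE)
  oldmax <- max(v, na.rm = TRUE)
  # Map [oldmin, oldmax] linearly onto [newmin, newmax].
  newmin + (v - oldmin) * (newmax - newmin) / (oldmax - oldmin)
}

rescaled <- change_range_sketch(c(0, 5, 10))
rescaled
```

Note that a constant input (where the minimum equals the maximum) would divide by zero in this sketch and needs separate handling.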
This data.frame was obtained by downloading datasets from ChEA3 database (https://maayanlab.cloud/chea3/) and merging targets for :
ARCHS4_Coexpression
ENCODE_ChIP-seq
Enrichr_Queries
GTEx_Coexpression
Literature_ChIP-seq
ReMap_ChIP-seq
data("CheA3_TF_nTargets")
CheA3_TF_nTargets - a data.frame with 1632 rows (unique TFs) and 2 columns
Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz M, Utti V, Jagodnik K, Kropiwnicki E, Wang Z, Ma'ayan A (2019) ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Research. doi: 10.1093/nar/gkz446
The data.frame is composed of two columns:
TF column containing the TF gene names (human)
nTargets_TF containing the number of targets for this TF in the combined database.
data("CheA3_TF_nTargets")
head(CheA3_TF_nTargets)
Throws warnings / error if matrix is in the wrong format
check_correct_datamatrix(datamatrix_single, sample_name = "")
datamatrix_single |
A sparse matrix |
sample_name |
Matrix sample name for warnings |
A sparseMatrix in the right rownames format
This function takes as input a SingleCellExperiment object and a number of clusters to select. It outputs a SingleCellExperiment object with each cell assigned to a correlation cluster in colData. It also calculates a hierarchical clustering of the consensus associations calculated by ConsensusClusterPlus.
choose_cluster_scExp( scExp, nclust = 3, consensus = FALSE, hc_linkage = "ward.D" )
scExp |
A SingleCellExperiment object containing consclust in metadata. |
nclust |
Number of clusters to pick (3) |
consensus |
Use consensus clustering results instead of simple hierarchical clustering ? (FALSE) |
hc_linkage |
A linkage method for hierarchical clustering. See hclust. ('ward.D') |
Returns a SingleCellExperiment object with each cell assigned to a correlation cluster in colData.
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf, nclust = 3, consensus = FALSE)
table(scExp_cf$cell_cluster)
scExp_cf = consensus_clustering_scExp(scExp)
scExp_cf_consensus = choose_cluster_scExp(scExp_cf, nclust = 3, consensus = TRUE)
table(scExp_cf_consensus$cell_cluster)
Choose perplexity depending on number of cells for t-SNE
choose_perplexity(dataset)
dataset |
A matrix of features x cells (rows x columns) |
A number between 5 and 30 to use in Rtsne function
Transform character color to hexadecimal color code.
col2hex(cname)
cname |
Color name |
The HEX color code of a particular color
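A conversion of this kind can be done with base R's grDevices functions, as in the sketch below. The name `col2hex_sketch` is hypothetical and this is one plausible way to obtain the result, not necessarily how the package does it.

```r
col2hex_sketch <- function(cname) {
  # col2rgb returns a 3 x n matrix of red/green/blue values in 0-255.
  rgb_values <- grDevices::col2rgb(cname)
  # rgb accepts a 3-column matrix, hence the transpose.
  grDevices::rgb(t(rgb_values), maxColorValue = 255)
}

col2hex_sketch(c("red", "white", "black"))
```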
Adding colors to cells & features
colors_scExp( scExp, annotCol = "sample_id", color_by = "sample_id", color_df = NULL )
scExp |
A SingleCellExperiment Object |
annotCol |
Column names to color |
color_by |
If specifying color_df, column names to color |
color_df |
Color data.frame to specify which color for which condition |
A SingleCellExperiment with additional "color" columns in colData
data("scExp")
scExp = colors_scExp(scExp, annotCol = c("sample_id", "total_counts"),
    color_by = c("sample_id", "total_counts"))
# Specific colors using a manually created data.frame:
color_df = data.frame(sample_id = unique(scExp$sample_id),
    sample_id_color = c("red", "blue", "green", "yellow"))
scExp = colors_scExp(scExp, annotCol = "sample_id",
    color_by = "sample_id", color_df = color_df)
Combine two matrices and emit warning if no regions are in common
combine_datamatrix(datamatrix, datamatrix_single, file_names, i)
datamatrix |
A sparse matrix or NULL if empty |
datamatrix_single |
Another sparse matrix |
file_names |
File name corresponding to the matrix for warnings |
i |
file number |
A combined sparse matrix
Run enrichment tests and combine into list
combine_enrichmentTests( diff, enrichment_qval, qval.th, logFC.th, min.percent, annotFeat_long, peak_distance, refined_annotation, GeneSets, GeneSetsDf, GenePool, progress = NULL )
diff |
Differential list |
enrichment_qval |
Adjusted p-value threshold below which a pathway is considered significant |
qval.th |
Differential analysis adjusted p.value threshold |
logFC.th |
Differential analysis log-fold change threshold |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
annotFeat_long |
Long annotation |
peak_distance |
Maximum gene to peak distance |
refined_annotation |
Refined annotation data.frame if peak calling is done |
GeneSets |
List of pathways |
GeneSetsDf |
Data.frame of pathways |
GenePool |
Pool of possible genes for testing |
progress |
A shiny Progress instance to display progress bar. |
A list of list of pathway enrichment data.frames for Both / Over / Under and for each cluster
Find comparable variable scExp
comparable_variables(scExp, allExp = TRUE)
scExp |
A SingleCellExperiment |
allExp |
A logical indicating whether comparable variables of alternative experiments should also be fetched. |
A character vector with the comparable variable names
Creates a summary table with the number of genes under- or overexpressed in each group and outputs several graphical representations
CompareedgeRGLM( dataMat = NULL, annot = NULL, ref_group = NULL, groups = NULL, featureTab = NULL, norm_method = "TMMwsp" )
dataMat |
reads matrix |
annot |
selected annotation of interest |
ref_group |
List containing one or more vectors of reference samples. Name of the vectors will be used in the results table. The length of this list should be 1 or the same length as the groups list |
groups |
List containing the IDs of groups to be compared with the reference samples. Names of the vectors will be used in the results table |
featureTab |
Feature annotations to be added to the results table |
norm_method |
Which method to use for normalizing ('TMMwsp') |
A dataframe containing the foldchange and p.value of each feature
Eric Letouze & Celine Vallot
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf, nclust = 2, consensus = FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1" = scExp_cf$cell_id[which(scExp_cf$cell_cluster == "C1")])
groups = list("C2" = scExp_cf$cell_id[which(scExp_cf$cell_cluster == "C2")])
myres = CompareedgeRGLM(as.matrix(SingleCellExperiment::counts(scExp_cf)),
    annot = as.data.frame(SingleCellExperiment::colData(scExp_cf)),
    ref_group = ref_group, groups = groups, featureTab = featureTab)
CompareWilcox
CompareWilcox( dataMat = NULL, annot = NULL, ref_group = NULL, groups = NULL, featureTab = NULL, block = NULL, BPPARAM = BiocParallel::bpparam() )
dataMat |
A raw count matrix |
annot |
A cell annotation data.frame |
ref_group |
List with cells in reference group(s) |
groups |
List with cells in group(s) to test |
featureTab |
data.frame with feature annotation |
block |
Use a blocking factor to counteract batch effect ? |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A dataframe containing the foldchange and p.value of each feature
Eric Letouze & Celine Vallot & Pacome Prompsy
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf, nclust = 2, consensus = FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1" = scExp_cf$cell_id[which(scExp_cf$cell_cluster == "C1")])
groups = list("C2" = scExp_cf$cell_id[which(scExp_cf$cell_cluster == "C2")])
myres = CompareWilcox(as.matrix(SingleCellExperiment::normcounts(scExp_cf)),
    annot = as.data.frame(SingleCellExperiment::colData(scExp_cf)),
    ref_group = ref_group, groups = groups, featureTab = featureTab)
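To give an idea of what a per-feature Wilcoxon comparison returns, here is a minimal base R sketch on a toy matrix. It is not the package's implementation: the function name `wilcox_sketch`, the output column names, the pseudo-count of 1 in the fold change and the Benjamini-Hochberg adjustment are all assumptions made for illustration, and blocking and parallelization are left out.

```r
wilcox_sketch <- function(mat, ref_cells, group_cells) {
  # One two-sided rank-sum test per feature (row); normal approximation is used
  # so that ties do not trigger warnings.
  p <- apply(mat, 1, function(x) {
    wilcox.test(x[group_cells], x[ref_cells], exact = FALSE)$p.value
  })
  data.frame(
    feature = rownames(mat),
    # log2 fold change of group vs reference means, with a pseudo-count of 1
    log2FC = log2((rowMeans(mat[, group_cells, drop = FALSE]) + 1) /
                  (rowMeans(mat[, ref_cells, drop = FALSE]) + 1)),
    p.val = p,
    q.val = p.adjust(p, method = "BH")
  )
}

# Toy data: 2 features x 12 cells; peak1 differs between groups, peak2 does not.
cells <- paste0("cell", 1:12)
mat <- rbind(peak1 = c(1:6, 11:16),
             peak2 = c(1:6, c(2, 1, 4, 3, 6, 5)))
colnames(mat) <- cells
ref_cells <- cells[1:6]
group_cells <- cells[7:12]

res <- wilcox_sketch(mat, ref_cells, group_cells)
res
```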
Concatenate single-cell BED into clusters
concatenate_scBed_into_clusters(affectation, files_list, odir)
affectation |
Annotation data.frame containing cluster information |
files_list |
Named list of scBED file paths to concatenate. List Names must match affectation$sample_id and basenames must match affectation$barcode. |
odir |
Output directory to write concatenated pseudo-bulk BEDs. |
Merge single-cell BED files into cluster BED files. Ungzip file if BED is gzipped.
Runs consensus hierarchical clustering on PCA feature space of scExp object. Plot consensus scores for each number of clusters. See ConsensusClusterPlus - Wilkerson, M.D., Hayes, D.N. (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics, 2010 Jun 15;26(12):1572-3.
consensus_clustering_scExp( scExp, prefix = NULL, maxK = 10, reps = 100, pItem = 0.8, pFeature = 1, distance = "pearson", clusterAlg = "hc", innerLinkage = "ward.D", finalLinkage = "ward.D", plot_consclust = "pdf", plot_icl = "png" )
scExp |
A SingleCellExperiment object containing 'PCA' in reducedDims. |
prefix |
character value for output directory. Directory is created only if plot_consclust is not NULL. This can be an absolute or relative path. |
maxK |
integer value. maximum cluster number to evaluate. (10) |
reps |
integer value. number of subsamples. (100) |
pItem |
numerical value. proportion of items to sample. (0.8) |
pFeature |
numerical value. proportion of features to sample. (1) |
distance |
character value. 'pearson': (1 - Pearson correlation), 'spearman' (1 - Spearman correlation), 'euclidean', 'binary', 'maximum', 'canberra', 'minkowski' or custom distance function. ('pearson') |
clusterAlg |
character value. cluster algorithm. 'hc' for hierarchical (hclust), 'pam' for partitioning around medoids, 'km' for k-means upon data matrix, 'kmdist' for k-means upon distance matrices (former km option), or a function that returns a clustering. ('hc') |
innerLinkage |
hierarchical linkage method for subsampling. ('ward.D') |
finalLinkage |
hierarchical linkage method for consensus matrix. ('ward.D') |
plot_consclust |
character value. NULL - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. ('pdf') |
plot_icl |
same as above for item consensus plot. ('png') |
This function takes as input a SingleCellExperiment object that must have 'PCA' in reducedDims and outputs a SingleCellExperiment object containing the consclust list, calculated cluster consensus and item consensus scores in metadata.
Returns a SingleCellExperiment object containing consclust list, calculated cluster consensus and item consensus scores in metadata.
ConsensusClusterPlus package by Wilkerson, M.D., Hayes, D.N. (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics, 2010 Jun 15;26(12):1572-3.
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = consensus_clustering_scExp(scExp)
Calculates cell to cell correlation matrix based on the PCA feature space and runs hierarchical clustering taking 1 - correlation scores as distance.
correlation_and_hierarchical_clust_scExp(scExp, hc_linkage = "ward.D")
scExp |
A SingleCellExperiment object, containing 'PCA' in reducedDims. |
hc_linkage |
A linkage method for hierarchical clustering. See hclust. ('ward.D') |
This function takes as input a SingleCellExperiment object that must have PCA calculated and outputs a SingleCellExperiment object with correlation matrix and hierarchical clustering.
Returns a SingleCellExperiment object with correlation matrix & hierarchical clustering.
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
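The principle (cell-to-cell correlation on the PCA space, then hierarchical clustering with 1 - correlation as distance) can be sketched with base R's stats functions. The toy `pca` matrix below is simulated and stands in for the 'PCA' reducedDim; the final `cutree` call illustrates how a number of clusters could then be picked. This is a sketch of the idea, not the package's code.

```r
set.seed(42)
# Toy PCA-like matrix: 20 cells x 5 components. The two groups of 10 cells
# differ in the sign of their first component.
signal <- rbind(matrix(rep(c(3, 0, 0, 0, 0), 10), nrow = 10, byrow = TRUE),
                matrix(rep(c(-3, 0, 0, 0, 0), 10), nrow = 10, byrow = TRUE))
pca <- signal + matrix(rnorm(100, sd = 0.3), nrow = 20)
rownames(pca) <- paste0("cell", 1:20)

# Cell-to-cell Pearson correlation (cor works on columns, hence the transpose).
cor_mat <- cor(t(pca))

# Hierarchical clustering using 1 - correlation as distance.
hc <- hclust(as.dist(1 - cor_mat), method = "ward.D")

# Cut the tree into 2 clusters and label them C1, C2.
clusters <- paste0("C", cutree(hc, k = 2))
table(clusters)
```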
Normalization is CPM, smoothing is done by averaging on n_smoothBin regions left and right of any given region.
count_coverage( input, format = "BAM", bins, canonical_chr, norm_factor, n_smoothBin = 5, ref = "hg38", read_size = 101, original_bins = NULL )
input |
Either a named list of character vectors of paths to single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBED files, the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED file names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not. |
format |
File format, either "BAM" or "BED" |
bins |
A GenomicRanges object of binned genome |
canonical_chr |
GenomicRanges of the chromosomes to read the BAM file. |
norm_factor |
The number of cells or total number of reads in the given sample, for normalization. |
n_smoothBin |
Number of bins left and right to smooth the signal. |
ref |
Genomic reference |
read_size |
Length of the reads |
original_bins |
Original bins GenomicRanges in case the format is raw matrix. |
A binned GenomicRanges that can be readily exported into bigwig file.
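The normalization and smoothing described for this function can be illustrated on a plain numeric vector of bin counts. In the base R sketch below, the function name `smooth_cpm_sketch` is hypothetical, CPM is assumed to mean counts divided by `norm_factor` and multiplied by one million, and the averaging window is simply truncated at the ends of the vector; the package's handling of chromosome boundaries is not reproduced.

```r
smooth_cpm_sketch <- function(counts, norm_factor, n_smoothBin = 5) {
  # Counts per million, assuming norm_factor is the total used for scaling.
  cpm <- counts / norm_factor * 1e6
  n <- length(cpm)
  # Average each bin with up to n_smoothBin neighbours on each side.
  vapply(seq_len(n), function(i) {
    window <- max(1, i - n_smoothBin):min(n, i + n_smoothBin)
    mean(cpm[window])
  }, numeric(1))
}

smoothed <- smooth_cpm_sketch(c(0, 0, 10, 0, 0), norm_factor = 1e6, n_smoothBin = 1)
smoothed
```

With a window of one bin on each side, the single peak of 10 is spread over its two neighbours, which is the intended effect of the smoothing.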
Creates a project folder that will be recognizable by ChromSCape Shiny application.
create_project_folder( output_directory, analysis_name = "Analysis_1", ref_genome = c("hg38", "mm10")[1] )
output_directory |
Path towards the directory to create the 'ChromSCape_Analyses' folder and the analysis subfolder. If this path already contains the 'ChromSCape_Analyses' folder, will only create the analysis subfolder. |
analysis_name |
Name of the analysis. Must only contain alphanumerical characters or '_'. |
ref_genome |
Reference genome, either 'hg38' or 'mm10'. |
Creates the project folder and returns the root of the project.
dir = tempdir()
create_project_folder(output_directory = dir, analysis_name = "Analysis_1")
list.dirs(file.path(dir))
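The naming rule and folder layout documented above can be mimicked in a few lines of base R. The sketch below uses a hypothetical function name, `create_project_folder_sketch`, and only covers what the argument descriptions state (the 'ChromSCape_Analyses' folder, the analysis subfolder and the name check); any further subfolders the package may create are not reproduced.

```r
create_project_folder_sketch <- function(output_directory, analysis_name = "Analysis_1") {
  # Reject names containing anything other than letters, digits or underscores.
  if (!grepl("^[A-Za-z0-9_]+$", analysis_name)) {
    stop("analysis_name must only contain alphanumerical characters or '_'")
  }
  root <- file.path(output_directory, "ChromSCape_Analyses", analysis_name)
  # recursive = TRUE also creates 'ChromSCape_Analyses' if it does not exist yet.
  dir.create(root, recursive = TRUE, showWarnings = FALSE)
  root
}

root <- create_project_folder_sketch(tempdir(), "Analysis_1")
dir.exists(root)
```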
Create a sample name matrix
create_sample_name_mat(nb_samples, samples_names)
nb_samples |
Number of samples |
samples_names |
Character vector of sample names |
A matrix
Create a simulated single cell datamatrix & cell annotation
create_scDataset_raw( cells = 300, features = 600, featureType = c("window", "peak", "gene"), sparse = TRUE, nsamp = 4, ref = "hg38", batch_id = factor(rep(1, nsamp)) )
cells |
Number of cells (300) |
features |
Number of features (600) |
featureType |
Type of feature (window) |
sparse |
Is matrix sparse ? (TRUE) |
nsamp |
Number of samples (4) |
ref |
Reference genome ('hg38') |
batch_id |
Batch origin (factor(rep(1, nsamp))) |
A list composed of
* mat : a sparse matrix following an approximation of the negative binomial law (adapted to scChIPseq)
* annot : a data.frame of cell annotation
* batches : an integer vector with the batch number for each cell
# Creating a basic sparse 600 genomic bins x 300 cells matrix and annotation
l = create_scDataset_raw()
head(l$mat)
head(l$annot)
head(l$batches)
# Specifying number of cells, features and samples
l2 = create_scDataset_raw(cells = 500, features = 500, nsamp = 2)
# Specifying species
mouse_l = create_scDataset_raw(ref = "mm10")
# Specifying batches
batch_l = create_scDataset_raw(nsamp = 4, batch_id = factor(c(1, 1, 2, 2)))
# Peaks of different size as features
peak_l = create_scDataset_raw(featureType = "peak")
head(peak_l$mat)
# Genes as features
gene_l = create_scDataset_raw(featureType = "gene")
head(gene_l$mat)
Creates the single cell experiment from a (sparse) datamatrix and a feature dataframe containing feature names and location. Also optionally removes zero-count features, zero-count cells, non-canonical chromosomes, and chromosome M. Calculates QC Metrics (scran).
create_scExp( datamatrix, annot, remove_zero_cells = TRUE, remove_zero_features = TRUE, remove_non_canonical = TRUE, remove_chr_M = TRUE, mainExpName = "main", verbose = TRUE )
datamatrix |
A matrix or sparseMatrix of raw counts. Features x Cells (rows x columns). |
annot |
A data.frame containing information on cells. Should have the same number of rows as the number of columns in datamatrix. |
remove_zero_cells |
remove cells with zero counts ? (TRUE) |
remove_zero_features |
remove features with zero counts ? (TRUE) |
remove_non_canonical |
remove non canonical chromosomes ?(TRUE) |
remove_chr_M |
remove chromosomes M ? (TRUE) |
mainExpName |
Name of the mainExpName e.g. 'bins', 'peaks'... ("main") |
verbose |
(TRUE) |
Returns a SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp
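The optional filters listed in the description can be pictured on a plain count matrix, as in the base R sketch below. Several things here are assumptions made for illustration: the function name `filter_raw_matrix_sketch`, the "chr:start-end" rowname format, the regular expression defining canonical chromosomes, and the order in which the filters are applied. The sketch does not build a SingleCellExperiment or compute QC metrics.

```r
filter_raw_matrix_sketch <- function(mat,
                                     remove_zero_cells = TRUE,
                                     remove_zero_features = TRUE,
                                     remove_non_canonical = TRUE,
                                     remove_chr_M = TRUE) {
  # Chromosome name = everything before the first ':' (assumed rowname format).
  chr <- sub(":.*$", "", rownames(mat))
  keep <- rep(TRUE, nrow(mat))
  if (remove_non_canonical) keep <- keep & grepl("^chr([0-9]+|X|Y|M)$", chr)
  if (remove_chr_M) keep <- keep & chr != "chrM"
  mat <- mat[keep, , drop = FALSE]
  if (remove_zero_features) mat <- mat[rowSums(mat) > 0, , drop = FALSE]
  if (remove_zero_cells) mat <- mat[, colSums(mat) > 0, drop = FALSE]
  mat
}

# Toy matrix: 4 features x 3 cells; cell2 has no reads at all.
toy <- rbind("chr1:1-100"             = c(1, 0, 2),
             "chr2:1-100"             = c(0, 0, 0),
             "chrM:1-100"             = c(5, 0, 1),
             "chrUn_GL000195v1:1-100" = c(1, 0, 0))
colnames(toy) <- paste0("cell", 1:3)

filtered <- filter_raw_matrix_sketch(toy)
filtered
```

Only the chr1 feature and the two cells with reads remain in this toy case; switching any flag to FALSE keeps the corresponding rows or columns.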
Differential Analysis Custom in 'One vs One' mode
DA_custom( affectation, by, counts, method, feature, block, ref, group, progress = NULL, BPPARAM = BiocParallel::bpparam() )
affectation |
An annotation data.frame with cell_id and |
by |
A character string specifying the column of the object containing the groups of cells to compare. |
counts |
Count matrix |
method |
DA method: Wilcoxon or edgeR |
feature |
Feature tables |
block |
Blocking feature |
ref |
If de_type is custom, the reference to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows |
group |
If de_type is custom, the group to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows |
progress |
A shiny Progress instance to display progress bar. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list of results, groups compared and references
Differential Analysis in 'One vs Rest' mode
DA_one_vs_rest( affectation, by, counts, method, feature, block, progress = NULL, BPPARAM = BiocParallel::bpparam() )
affectation |
An annotation data.frame with cell_id and cell_cluster columns |
by |
A character specifying the column of the object containing the groups of cells to compare. |
counts |
Count matrix |
method |
DA method: Wilcoxon or edgeR |
feature |
Feature tables |
block |
Blocking feature |
progress |
A shiny Progress instance to display progress bar. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list of results, groups compared and references
Run differential analysis in Pairwise mode
DA_pairwise( affectation, by, counts, method, feature, block, progress = NULL, BPPARAM = BiocParallel::bpparam() )
affectation |
An annotation data.frame with cell_cluster and cell_id columns |
by |
A character specifying the column of the object containing the groups of cells to compare. |
counts |
Count matrix |
method |
DA method, Wilcoxon or edgeR |
feature |
Feature data.frame |
block |
Blocking feature |
progress |
A shiny Progress instance to display progress bar. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list of results, groups compared and references
Define the features on which reads will be counted
define_feature(ref = c("hg38","mm10")[1], peak_file = NULL, bin_width = NULL, genebody = FALSE, extendPromoter = 2500)
ref |
Reference genome |
peak_file |
A bed file if counting on peaks |
bin_width |
Width of the bins (in bp) if dividing the genome into fixed-width bins |
genebody |
A logical indicating if features should be counted in gene bodies and promoters. |
extendPromoter |
Extension length before TSS (2500). |
A GRanges object
gr_bins = define_feature("hg38", bin_width = 50000)
gr_genes = define_feature("hg38", genebody = TRUE, extendPromoter = 5000)
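To illustrate what dividing a genome into fixed-width bins means, here is a small base R sketch. It is only an illustration under assumptions: the chromosome names and lengths are hypothetical, the helper `make_bins` is not part of the package, and the result is a plain data.frame rather than the GRanges object returned by define_feature.

```r
# Hypothetical chromosome lengths in bp (real lengths come from the reference genome)
chr_len <- c(chrA = 120000, chrB = 50000)
bin_width <- 50000

# Hypothetical helper: 1-based, inclusive bins; the last bin is truncated
make_bins <- function(len, width) {
  start <- seq(1, len, by = width)
  end <- pmin(start + width - 1, len)
  data.frame(start = start, end = end)
}

bins <- do.call(rbind, lapply(names(chr_len), function(chr) {
  cbind(chr = chr, make_bins(chr_len[[chr]], bin_width))
}))
bins
```

Here chrA yields three bins (the last one shorter than bin_width) and chrB yields one.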
Identify a fixed number of common strings (samples) in a set of varying strings (cells). E.g. in the set "Sample1_cell1","Sample1_cell2","Sample2_cell1","Sample2_cell2" and with nb_samples=2, the function returns "Sample1","Sample1","Sample2","Sample2".
detect_samples(barcodes, nb_samples = 1)
barcodes |
Vector of cell barcode names (e.g. Sample1_cell1, Sample1_cell2...) |
nb_samples |
Number of samples to find |
character vector of sample names the same length as cell labels
barcodes = c(paste0("HBCx22_BC_",seq_len(100)), paste0("mouse_sample_XX",208:397))
samples = detect_samples(barcodes, nb_samples=2)
Based on the observation that single-cell epigenomic datasets are very sparse, especially when analysing small bins or peaks, we can define each feature as being 'active' or not simply by the presence or absence of reads in this feature. This is equivalent to binarizing the data. When trying to find differences in signal for a feature between multiple cell groups, this function simply compares the percentage of cells 'activating' the feature in each of the groups. The p-values are then calculated using Pearson's Chi-squared Test for Count Data (comparing the number of active cells in one group vs the other) and corrected for multiple testing using the Benjamini-Hochberg correction.
differential_activation( scExp, by = c("cell_cluster", "sample_id")[1], verbose = TRUE, progress = NULL )
scExp |
A SingleCellExperiment object containing consclust with selected number of cluster. |
by |
Which grouping to run the marker enrichment on? |
verbose |
Print messages? |
progress |
A shiny Progress instance to display progress bar. |
To calculate the logFC, the percentages of activation of the features are corrected for the total number of reads to account for library size bias. For each cluster ('group'), the function considers the rest of the cells as the reference.
Returns a dataframe of differential activation results that contains the rowData of the SingleCellExperiment with additional logFC, q.value, group activation (fraction of cells active for each feature in the group cells), reference activation (fraction of cells active for each feature in the reference cells).
For Pearson's Chi-squared Test for Count Data, see chisq.test. For other differential analyses, see differential_analysis_scExp.
data("scExp")
res = differential_activation(scExp, by = "cell_cluster")
res = differential_activation(scExp, by = "sample_id")
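The idea described above (binarize, compare the fraction of active cells, test with a chi-squared test, correct with Benjamini-Hochberg) can be sketched in base R as follows. This is an illustration on simulated data, not the package's code: the matrix, the group labels and the variable names are assumptions, and the library-size correction of the logFC is left out.

```r
set.seed(1)
n_cells <- 200
group <- rep(c("C1", "C2"), each = n_cells / 2)

# Simulated sparse counts: 3 features x 200 cells
counts <- matrix(rpois(3 * n_cells, lambda = 0.3), nrow = 3,
                 dimnames = list(paste0("feature", 1:3), NULL))
# Make feature1 clearly more covered in group C1
counts[1, group == "C1"] <- rpois(n_cells / 2, lambda = 2)

# Binarize: a feature is 'active' in a cell if it has at least one read
active <- counts > 0
in_group <- group == "C1"

res <- do.call(rbind, lapply(rownames(active), function(f) {
  # 2 x 2 contingency table: group membership vs activation
  tab <- table(factor(in_group, levels = c(TRUE, FALSE)),
               factor(active[f, ], levels = c(TRUE, FALSE)))
  data.frame(
    feature = f,
    group_activation = mean(active[f, in_group]),
    reference_activation = mean(active[f, !in_group]),
    p.value = chisq.test(tab)$p.value
  )
}))

# Benjamini-Hochberg correction across features
res$q.value <- p.adjust(res$p.value, method = "BH")
res
```

Only feature1 was simulated with a group difference, so it should be the one standing out in the q.value column.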
Based on clusters of cells defined previously, runs the non-parametric Wilcoxon Rank Sum test to find significantly depleted or enriched features, in 'one_vs_rest' mode or 'pairwise' mode. In pairwise mode, each cluster is compared to every other cluster individually, and the pairwise comparisons between clusters are then combined to find overall differential features using the combineMarkers function from scran.
differential_analysis_scExp( scExp, de_type = c("one_vs_rest_fast", "one_vs_rest", "pairwise", "custom")[1], by = "cell_cluster", method = "wilcox", block = NULL, group = NULL, ref = NULL, prioritize_genes = nrow(scExp) > 20000, max_distanceToTSS = 1000, progress = NULL, BPPARAM = BiocParallel::bpparam() )
scExp |
A SingleCellExperiment object containing consclust with selected number of cluster. |
de_type |
Type of comparisons. Either 'one_vs_rest', to compare each cluster against all others, or 'pairwise' to make 1 to 1 comparisons. The usage also lists 'one_vs_rest_fast' (the default) and 'custom'. |
by |
A character specifying the column of the object containing the groups of cells to compare. Exclusive with de_type == 'custom' |
method |
Differential testing method, either 'wilcox' for Wilcoxon non-parametric testing or 'neg.binomial' for edgerGLM based testing. ('wilcox') |
block |
Use batches as blocking factors ? If TRUE, block will be taken as the column "batch_id" from the SCE. Cells will be compared only within samples belonging to the same batch. |
group |
If de_type = "custom", the sample / cluster of interest as a one-column data.frame. The name of the column is the group name and the values are character, either cluster ("C1", "C2", ...) or sample_id. |
ref |
If de_type = "custom", the sample / cluster of reference as a one-column data.frame. The name of the column is the group name and the values are character, either cluster ("C1", "C2", ...) or sample_id. |
prioritize_genes |
First filter by loci being close to genes ? E.g. for differential analysis, it is more relevant to keep features close to genes |
max_distanceToTSS |
If prioritize_genes is TRUE, the maximum distance to consider a feature close to a gene. |
progress |
A shiny Progress instance to display progress bar. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
This function takes as input a SingleCellExperiment object with consclust, the type of comparison, either 'one_vs_rest' or 'pairwise', the adjusted p-value threshold (qval.th) and the fold-change threshold (logFC.th). It outputs a SingleCellExperiment object containing a differential list.
Returns a SingleCellExperiment object containing a differential list.
data("scExp")
scExp_cf = differential_analysis_scExp(scExp)
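As a rough picture of the 'one_vs_rest' mode with the Wilcoxon test, the base R sketch below compares one cluster against all remaining cells, feature by feature, and adjusts the p-values. It is not the package's implementation: the simulated matrix, the cluster labels and the helper `one_vs_rest` are assumptions for illustration only.

```r
set.seed(42)
clusters <- rep(c("C1", "C2", "C3"), each = 30)

# Simulated normalized signal: 2 features x 90 cells
norm_counts <- matrix(rnorm(2 * 90), nrow = 2,
                      dimnames = list(c("featA", "featB"), NULL))
# featA is shifted upwards in cluster C1 only
norm_counts["featA", clusters == "C1"] <-
  norm_counts["featA", clusters == "C1"] + 3

# Hypothetical helper: Wilcoxon rank sum test of one cluster vs the rest
one_vs_rest <- function(x, clusters, group) {
  in_group <- clusters == group
  wilcox.test(x[in_group], x[!in_group])$p.value
}

pvals <- apply(norm_counts, 1, one_vs_rest, clusters = clusters, group = "C1")
qvals <- p.adjust(pvals, method = "BH")
qvals
```

Repeating the call for each cluster label would give the full one-vs-rest table.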
distPearson
distPearson(m)
m |
A matrix |
A dist object
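A Pearson-based distance is commonly defined as one minus the Pearson correlation. The sketch below shows that definition in base R; whether distPearson correlates rows or columns, and whether it uses exactly this formula, is an assumption here, and `pearson_dist` is a hypothetical name.

```r
# Hypothetical helper: distance between rows as 1 - Pearson correlation
pearson_dist <- function(m) as.dist(1 - cor(t(m)))

m <- rbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 8),   # perfectly correlated with a
           c = c(4, 3, 2, 1))   # perfectly anti-correlated with a
d <- pearson_dist(m)
d
```

With this definition, perfectly correlated rows are at distance 0 and perfectly anti-correlated rows at distance 2.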
Find the TF that are enriched in the differential genes using ChEA3 API
enrich_TF_ChEA3_genes(genes)
genes |
A character vector with the name of genes to enrich for TF. |
Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).
Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz M, Utti V, Jagodnik K, Kropiwnicki E, Wang Z, Ma'ayan A (2019) ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Research. doi: 10.1093/nar/gkz446
data(scExp)
enrich_TF_ChEA3_genes(head(unlist(strsplit(SummarizedExperiment::rowData(scExp)$Gene, split = ",", fixed = TRUE)), 15))
Find the TF that are enriched in the differential genes using ChEA3 database
enrich_TF_ChEA3_scExp( scExp, ref = "hg38", qval.th = 0.01, logFC.th = 1, min.percent = 0.01, peak_distance = 1000, use_peaks = FALSE, progress = NULL, verbose = TRUE )
scExp |
A SingleCellExperiment object containing list of differential features. |
ref |
A reference annotation, either 'hg38' or 'mm10'. ('hg38') |
qval.th |
Adjusted p-value threshold to define differential features. (0.01) |
logFC.th |
Fold change threshold to define differential features. (1) |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
peak_distance |
Maximum distance between a feature and a gene TSS for the two to be considered associated, in bp. (1000) |
use_peaks |
Use peak calling method (must be calculated beforehand). (FALSE) |
progress |
A shiny Progress instance to display progress bar. |
verbose |
A logical to print message or not. (TRUE) |
Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).
data("scExp")
scExp = enrich_TF_ChEA3_scExp(
  scExp, ref = "hg38", qval.th = 0.01, logFC.th = 1, min.percent = 0.01)
enrichmentTest
enrichmentTest(gene.sets, mylist, possibleIds, sep = ";", silent = FALSE)
gene.sets |
A list of reference gene sets |
mylist |
A list of genes to test |
possibleIds |
All existing genes |
sep |
Separator used to collapse genes |
silent |
Silent mode ? |
A dataframe with the gene sets and their enrichment p.value
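Gene set enrichment p-values of this kind are typically obtained from a hypergeometric test. The following base R sketch shows one such test for a single gene set; it is an illustration under assumptions (made-up gene names, a single set, hypothetical variable names) and not necessarily how enrichmentTest computes its p-values.

```r
# All existing genes (the universe), one reference gene set, one list to test
possible_ids <- paste0("gene", 1:100)
gene_set <- paste0("gene", 1:10)
my_list <- paste0("gene", c(1:5, 50:54))   # 10 genes, 5 of them in the set

overlap <- length(intersect(my_list, gene_set))

# P(X >= overlap) when drawing length(my_list) genes from the universe
p_value <- phyper(
  overlap - 1,
  m = length(gene_set),                          # genes in the set
  n = length(possible_ids) - length(gene_set),   # genes outside the set
  k = length(my_list),                           # genes drawn
  lower.tail = FALSE
)
p_value
```

Running this over every set in a list of gene sets and adjusting the p-values would give a table of enrichment results.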
Remove specific features (CNA, repeats)
exclude_features_scExp( scExp, features_to_exclude, by = "region", verbose = TRUE )
scExp |
A SingleCellExperiment object. |
features_to_exclude |
A GenomicRanges object or data.frame containing genomic regions or features to exclude or path towards a BED file containing the features to exclude. |
by |
Type of features. Either 'region' or 'feature_name'. If 'region', will look for genomic coordinates in columns 1-3 (chr,start,stop). If 'feature_name', will look for gene names in the first column. ('region') |
verbose |
Print messages? (TRUE) |
A SingleCellExperiment object without features to exclude.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
features_to_exclude = data.frame(chr=c("chr4","chr7","chr17"), start=c(50000,8000000,2000000), end=c(100000,16000000,2500000))
features_to_exclude = as(features_to_exclude,"GRanges")
scExp = exclude_features_scExp(scExp,features_to_exclude)
scExp
Add gene annotations to features
feature_annotation_scExp(scExp, ref = "hg38", reference_annotation = NULL)
scExp |
A SingleCellExperiment object. |
ref |
Reference genome. Either 'hg38' or 'mm10'. ('hg38') |
reference_annotation |
A data.frame containing gene (or else) annotation with genomic coordinates. |
A SingleCellExperiment object with annotated rowData.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp)
head(SummarizedExperiment::rowRanges(scExp))
# Mouse
raw = create_scDataset_raw(ref = "mm10")
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp,ref="mm10")
head(SummarizedExperiment::rowRanges(scExp))
Filter genes based on peak calling refined annotation
filter_genes_with_refined_peak_annotation( refined_annotation, peak_distance, signific, over, under )
refined_annotation |
A data.frame containing each gene distance to real peak |
peak_distance |
Minimum distance to an existing peak to accept a given gene |
signific |
Indexes of all significantly differential genes |
over |
Indexes of all significantly overexpressed genes |
under |
Indexes of all significantly underexpressed genes |
List of significantly differential, overexpressed and underexpressed genes close enough to existing peaks
Function to filter out cells & features from SingleCellExperiment based on total count per cell, number of cells 'ON' in features and top covered cells that might be doublets.
filter_scExp( scExp, min_cov_cell = 1600, quant_removal = 95, min_count_per_feature = 10, verbose = TRUE )
scExp |
A SingleCellExperiment object. |
min_cov_cell |
Minimum counts for each cell. (1600) |
quant_removal |
Centile of cell counts above which cells are removed. (95) |
min_count_per_feature |
Minimum number of reads per feature (10). |
verbose |
Print messages? (TRUE) |
Returns a filtered SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp. = filter_scExp(scExp)
# Stricter feature filtering (at least 30 reads per feature)
scExp. = filter_scExp(scExp,min_count_per_feature=30)
# No cell filtering (all cells are valuable)
scExp. = filter_scExp(scExp,min_cov_cell=0,quant_removal=100)
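The three thresholds can be pictured on a plain count matrix. The base R sketch below is an illustration only, with simulated counts and toy threshold values; the inclusive comparisons and the reading of min_count_per_feature as a minimum number of reads per feature are assumptions, not a description of the package's exact rules.

```r
set.seed(7)
# Simulated counts: 50 features x 40 cells
counts <- matrix(rpois(50 * 40, lambda = 2), nrow = 50)

min_cov_cell <- 80            # toy value, much lower than the package default
quant_removal <- 95
min_count_per_feature <- 10

# Cell filtering: enough reads, but not among the most covered cells
cov <- colSums(counts)
upper <- quantile(cov, probs = quant_removal / 100)
keep_cells <- cov >= min_cov_cell & cov <= upper
filtered <- counts[, keep_cells, drop = FALSE]

# Feature filtering: enough reads across the remaining cells
keep_features <- rowSums(filtered) >= min_count_per_feature
filtered <- filtered[keep_features, , drop = FALSE]
dim(filtered)
```

Setting min_cov_cell to 0 and quant_removal to 100 in this sketch keeps every cell, which mirrors the "no cell filtering" call in the examples.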
Build SNN graph and find clusters using the Louvain algorithm
find_clusters_louvain_scExp( scExp, k = 10, resolution = 1, use.dimred = "PCA", type = c("rank", "number", "jaccard")[3], BPPARAM = BiocParallel::bpparam() )
scExp |
A SingleCellExperiment with PCA calculated |
k |
An integer scalar specifying the number of nearest neighbors to consider during graph construction. |
resolution |
A numeric specifying the resolution of clustering to pass to igraph::cluster_louvain function. |
use.dimred |
A string specifying the dimensionality reduction to use. |
type |
A string specifying the type of weighting scheme to use for shared neighbors. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A SingleCellExperiment containing the vector of clusters (named C1, C2 ....)
data('scExp')
scExp = find_clusters_louvain_scExp(scExp, k = 10)
Find the top most covered features that will be used for dimensionality reduction. Optionally remove non-top features.
find_top_features( scExp, n = 20000, keep_others = FALSE, prioritize_genes = FALSE, max_distanceToTSS = 10000, verbose = TRUE )
scExp |
A SingleCellExperiment. |
n |
Either an integer indicating the number of top covered regions to find or a character vector of the top percentile of features to keep (e.g. 'q20' to keep top 20% features). |
keep_others |
Logical indicating if non-top regions are to be removed from the SCE or not (FALSE). |
prioritize_genes |
First filter by loci being close to genes ? E.g. for differential analysis, it is more relevant to keep features close to genes |
max_distanceToTSS |
If prioritize_genes is TRUE, the maximum distance to consider a feature close to a gene. |
verbose |
Print messages? |
A SCE with top features
data(scExp)
scExp_top = find_top_features(scExp, n = 4000, keep_others = FALSE)
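Selecting the most covered features amounts to ranking features by their total counts. The base R sketch below illustrates both forms of `n` (an integer, or a string such as 'q20'); the helper `top_features`, the toy matrix and the rounding used for the percentile form are assumptions, not the package's code.

```r
# Hypothetical helper: names of the n most covered features
top_features <- function(counts, n) {
  cov <- rowSums(counts)
  if (is.character(n)) {
    # 'q20' is read here as "keep the top 20% of features"
    frac <- as.numeric(sub("^q", "", n)) / 100
    n <- max(1, round(frac * nrow(counts)))
  }
  n <- min(n, nrow(counts))
  names(sort(cov, decreasing = TRUE))[seq_len(n)]
}

# Toy counts: 5 features x 3 cells
counts <- matrix(c(5, 0, 1,
                   9, 9, 9,
                   0, 0, 1,
                   2, 2, 1,
                   0, 0, 0),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("f", 1:5), NULL))

top_features(counts, 2)
top_features(counts, "q20")
```

Subsetting the matrix with the returned names corresponds to dropping the non-top features.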
This function takes previously calculated differential features and runs a hypergeometric test to look for enriched gene sets in the genes associated with differential features, for each cell cluster. This function takes as input a SingleCellExperiment object with consclust, the type of comparison, either 'one_vs_rest' or 'pairwise', the adjusted p-value threshold (qval.th) and the fold-change threshold (logFC.th). It outputs a SingleCellExperiment object containing a differential list.
gene_set_enrichment_analysis_scExp( scExp, enrichment_qval = 0.1, ref = "hg38", GeneSets = NULL, GeneSetsDf = NULL, GenePool = NULL, qval.th = 0.01, logFC.th = 1, min.percent = 0.01, peak_distance = 1000, use_peaks = FALSE, GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational", "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"), progress = NULL )
scExp |
A SingleCellExperiment object containing list of differential features. |
enrichment_qval |
Adjusted p-value threshold for gene set enrichment. (0.1) |
ref |
A reference annotation, either 'hg38' or 'mm10'. ('hg38') |
GeneSets |
A named list of gene sets. If NULL will automatically load MSigDB list of gene sets for specified reference genome. (NULL) |
GeneSetsDf |
A dataframe containing gene sets & class of gene sets. If NULL will automatically load MSigDB dataframe of gene sets for specified reference genome. (NULL) |
GenePool |
The pool of genes to run enrichment in. If NULL will automatically load the Gencode list of genes for the specified reference genome. (NULL) |
qval.th |
Adjusted p-value threshold to define differential features. (0.01) |
logFC.th |
Fold change threshold to define differential features. (1) |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
peak_distance |
Maximum distance between a feature and a gene TSS for the two to be considered associated, in bp. (1000) |
use_peaks |
Use peak calling method (must be calculated beforehand). (FALSE) |
GeneSetClasses |
Which classes of MSigDB to look for. |
progress |
A shiny Progress instance to display progress bar. |
Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).
data("scExp")
# Usually recommending qval.th = 0.01 & logFC.th = 1 or 2
## Not run:
scExp_cf = gene_set_enrichment_analysis_scExp(scExp, qval.th = 0.4, logFC.th = 0.3)
## End(Not run)
Generate a complete ChromSCape analysis
generate_analysis(input_data_folder, analysis_name = "Analysis_1", output_directory = "./", input_data_type = c("scBED", "DenseMatrix", "SparseMatrix", "scBAM")[1], rebin_sparse_matrix = FALSE, feature_count_on = c("bins","genebody","peaks")[1], feature_count_parameter = 50000, ref_genome = c("hg38","mm10")[1], run = c("filter", "CNA","cluster", "consensus","peak_call", "coverage", "DA", "GSA", "report")[c(1,3,6,7,8,9)], min_reads_per_cell = 1000, max_quantile_read_per_cell = 99, n_top_features = 40000, norm_type = "CPM", subsample_n = NULL, exclude_regions = NULL, n_clust = NULL, corr_threshold = 99, percent_correlation = 1, maxK = 10, qval.th = 0.1, logFC.th = 1, enrichment_qval = 0.1, doBatchCorr = FALSE, batch_sels = NULL, control_samples_CNA = NULL, genes_to_plot = c("Krt8","Krt5","Tgfb1", "Foxq1", "Cdkn2b", "Cdkn2a", "chr7:15000000-20000000") )
input_data_folder |
Directory containing the input data. |
analysis_name |
Name given to the analysis. |
output_directory |
Directory where to create the analysis and the HTML report. |
input_data_type |
The type of input data. |
feature_count_on |
For raw data type, on which features to count the cells. |
feature_count_parameter |
Additional parameter corresponding to the 'feature_count_on' parameter. E.g. for 'bins' must be a numeric, e.g. 50000, for 'peaks' must be a character containing path towards a BED peak file. |
rebin_sparse_matrix |
A boolean specifying if the SparseMatrix should be rebinned on features (see feature_count_on and feature_count_parameter). |
ref_genome |
The genome of reference. |
run |
What steps to run. By default runs "filter", "cluster", "coverage", "DA", "GSA" and "report". Some steps are required in order to run downstream steps. |
min_reads_per_cell |
Minimum number of reads per cell. |
max_quantile_read_per_cell |
Upper quantile above which to consider cells doublets. |
n_top_features |
Number of features to keep in the analysis. |
norm_type |
Normalization type. |
subsample_n |
Number of cells per condition to downsample to, for performance principally. |
exclude_regions |
Path towards a BED file containing CNA to exclude from the analysis (optional). |
n_clust |
Number of clusters to force choice of clusters. |
corr_threshold |
Quantile of correlation above which two cells are considered as correlated. |
percent_correlation |
Percentage of the total cells that a cell must be correlated with in order to be kept in the analysis. |
maxK |
Upper cluster number to test for ConsensusClusterPlus. |
qval.th |
Adjusted p-value below which to consider features differential. |
logFC.th |
Log2-fold-change above/below which to consider a feature enriched/depleted. |
enrichment_qval |
Adjusted p-value below which to consider a gene set as significantly enriched in differential features. |
doBatchCorr |
Logical indicating if batch correction using fastMNN should be run. |
batch_sels |
If doBatchCorr is TRUE, a named list containing the samples in each batch. |
control_samples_CNA |
If running CopyNumber Analysis, a character vector of the sample names that are 'normal'. |
genes_to_plot |
A character vector containing genes of interest of which to plot the coverage. |
Creates a ChromSCape-readable directory and saved objects, as well as a multi-tabbed HTML report summarizing the analysis.
## Not run:
generate_analysis("/path/to/data/", "Analysis_1")
## End(Not run)
Generate count matrix
generate_count_matrix(cells, features, sparse, cell_names, feature_names)
cells |
Number of cells |
features |
Number of features |
sparse |
Is matrix sparse ? |
cell_names |
Cell names |
feature_names |
Feature names |
A matrix or a sparse matrix
Generate cell cluster pseudo-bulk coverage tracks. First, scBED files are concatenated into the cell clusters contained in the 'by' column of your SingleCellExperiment object. To do so, for each sample in the given list, the barcodes of each cluster are grepped and the BED files are merged into pseudo-bulks of clusters (C1, C2, ...). Two cells from different samples can have the same barcode ID, as cell assignment is done sample by sample. Then the coverage of the pseudo-bulk BED files is calculated by averaging and smoothing reads over small genomic windows (150 bp by default). The pseudo-bulk BED and BigWig coverage tracks are written to the output directory. This functionality is not available on Windows as it uses the 'cat' and 'gzip' utilities from Unix OS.
generate_coverage_tracks( scExp_cf, input, odir, format = "scBED", ref_genome = c("hg38", "mm10")[1], bin_width = 150, n_smoothBin = 5, read_size = 101, quantile_for_peak_calling = 0.85, by = "cell_cluster", progress = NULL )
scExp_cf |
A SingleCellExperiment with cluster selected.
(see |
input |
Either a named list of character vector of path towards single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBED the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED files names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not. |
odir |
The output directory to write the cumulative BED and BigWig files. |
format |
File format, either "raw_mat", "BED" or "BAM" |
ref_genome |
The genome of reference, used to constrain to canonical chromosomes. Either 'hg38' or 'mm10'. Defaults to 'hg38'. |
bin_width |
The width of the bins used to create the coverage track. The smaller the bins, the greater the resolution and the runtime. Defaults to 150. |
n_smoothBin |
Number of bins left & right to average ('smooth') the signal on. Defaults to 5. |
read_size |
The estimated size of reads. Defaults to 101. |
quantile_for_peak_calling |
The quantile to define the threshold above which signal is considered as a peak. |
by |
A character specifying a categorical column of scExp_cf metadata by which to group cells and generate coverage tracks and peaks. |
progress |
A Progress object for Shiny. Defaults to NULL. |
Generate coverage tracks (.bigwig) for each group in the SingleCellExperiment "by" column.
## Not run:
data(scExp)
input_files_coverage = list(
  "scChIP_Jurkat_K4me3" = paste0("/path/to/",scExp$barcode[1:51],".bed"),
  "scChIP_Ramos_K4me3" = paste0("/path/to/",scExp$barcode[52:106],".bed")
)
generate_coverage_tracks(scExp, input_files_coverage, "/path/to/output", ref_genome = "hg38")
## End(Not run)
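The smoothing step, averaging the signal over n_smoothBin bins to the left and right, can be sketched as a centered moving average. This base R example is only an illustration: `smooth_coverage` is a hypothetical helper, the coverage vector is made up, and the NA values at the edges are a property of stats::filter, not necessarily of the package.

```r
# Hypothetical helper: centered moving average over 2 * n_smoothBin + 1 bins
smooth_coverage <- function(cov, n_smoothBin) {
  k <- 2 * n_smoothBin + 1
  as.numeric(stats::filter(cov, rep(1 / k, k), sides = 2))
}

# Toy per-bin coverage of one pseudo-bulk track
cov <- c(0, 0, 3, 0, 0, 6, 0)
smoothed <- smooth_coverage(cov, n_smoothBin = 1)
smoothed
```

Isolated spikes are spread over neighbouring bins, which is what makes the resulting track look smoother at small bin widths.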
Generate feature names
generate_feature_names(featureType, ref, features)
featureType |
Type of feature |
ref |
Reference genome |
features |
Number of features to generate |
A character vector of feature names
From a ChromSCape analysis directory, generate an HTML report.
generate_report( ChromSCape_directory, prefix = NULL, run = c("filter", "CNA", "cluster", "consensus", "peak_call", "coverage", "DA", "GSA", "report")[c(1, 3, 6, 7, 8, 9)], genes_to_plot = c("Krt8", "Krt5", "Tgfb1", "Foxq1", "Cdkn2b", "Cdkn2a", "chr7:15000000-20000000"), control_samples_CNA = NULL )
ChromSCape_directory |
Path towards the ChromSCape directory of which you want to create the report. The report will be created at the root of this directory. |
prefix |
Name of the analysis with the filtering parameters (e.g. Analysis_3000_100000_99_uncorrected). You will find the prefix in the Filtering_Normalize_Reduce subfolder. |
run |
Which steps to report ("filter", "CNA", "cluster", "consensus", "peak_call", "coverage", "DA", "GSA", "report"). Only indicate steps that were done in the analysis. By default, does not report CNA, consensus and peak calling. |
genes_to_plot |
For the UMAP, which genes do you want to see in the report. |
control_samples_CNA |
If running the Copy Number Alteration (CNA) part, which samples are the controls |
Generate an HTML report at the root of the analysis directory.
## Not run:
generate_analysis("/path/to/data/", "Analysis_1")
## End(Not run)
Get color dataframe from shiny::colorInput
get_color_dataframe_from_input( input, levels_selected, color_by = c("sample_id", "total_counts"), input_id_prefix = "color_" )
input |
Shiny input object |
levels_selected |
Names of the features |
color_by |
Which feature color to retrieve |
input_id_prefix |
Prefix in front of the feature names |
A data.frame with the feature levels and the colors of each level of this feature.
Map the features of a SingleCellExperiment onto the cytobands of a given genome. Some features might not map to any cytoband (e.g. if they are not on the canonical chromosomes); these are removed from the returned object.
get_cyto_features(scExp, ref_genome = c("hg38", "mm10")[1])
scExp |
A SingleCellExperiment with genomic coordinate as features (peaks or bins) |
ref_genome |
Reference genome ('hg38' or 'mm10') |
The cytobands are an arbitrary cutting of the genome that dates back to staining metaphase chromosomes with Giemsa.
A data.frame of the SCE features with their corresponding cytoband name
data("scExp")
matching_cyto = get_cyto_features(scExp, ref_genome="hg38")
Get SingleCellExperiment's genomic coordinates
get_genomic_coordinates(scExp)
scExp |
A SingleCellExperiment object. |
A GRanges object of genomic coordinates.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
feature_GRanges = get_genomic_coordinates(scExp)
Given a SingleCellExperiment object with the slot "cytoBand" containing the fraction of reads in each cytoband, calculates the variance of each cytoband and returns a data.frame with the most variable cytobands. Most cytobands are expected to be unchanged between normal and tumor samples, so focusing on the most variable cytobands highlights the most interesting regions.
get_most_variable_cyto(scExp, top = 50)
scExp |
A SingleCellExperiment with "cytoBand" reducedDim slot filled. |
top |
Number of cytobands to return (50). |
A data.frame of the top variable cytoBands and their variance
data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
get_most_variable_cyto(scExp, top=50)
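The ranking described above (per-cytoband variance, then keeping the top ones) can be sketched in base R. This is only an illustration on a made-up cells x cytobands matrix; the band names, the values and the variable names are assumptions and are not taken from the package.

```r
# Toy cells x cytobands matrix of read fractions (made-up values and names).
set.seed(1)
cyto_frac <- matrix(runif(20 * 6), nrow = 20,
                    dimnames = list(paste0("cell", 1:20),
                                    paste0("band_", 1:6)))
# Make one band clearly more variable than the others.
cyto_frac[, "band_5"] <- cyto_frac[, "band_5"] * 10

# Variance of each cytoband across cells.
band_var <- apply(cyto_frac, 2, var)

# Keep the `top` most variable cytobands, from most to least variable.
top <- 3
ord <- order(band_var, decreasing = TRUE)[seq_len(top)]
most_variable <- data.frame(cytoBand = names(band_var)[ord],
                            variance = band_var[ord],
                            row.names = NULL)
print(most_variable)
```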
Get pathway matrix
get_pathway_mat_scExp( scExp, pathways, max_distanceToTSS = 1000, ref = "hg38", GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational", "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"), progress = NULL )
scExp |
A SingleCellExperiment |
pathways |
A character vector specifying the pathways to retrieve the cell count for. |
max_distanceToTSS |
Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000) |
ref |
Reference genome, either mm10 or hg38 |
GeneSetClasses |
Which classes of MSIGdb to load |
progress |
A shiny Progress instance to display progress bar. |
A matrix of cell to pathway
data(scExp)
mat = get_pathway_mat_scExp(scExp, pathways = "KEGG_T_CELL_RECEPTOR_SIGNALING_PATHWAY")
Get experiment names from a SingleCellExperiment
getExperimentNames(scExp)
scExp |
A SingleCellExperiment with named mainExp and altExps. |
Character vector of unique experiment names
data(scExp)
getExperimentNames(scExp)
Get Main experiment of a SingleCellExperiment
getMainExperiment(scExp)
scExp |
A SingleCellExperiment with named mainExp and altExps. |
The swapped SingleCellExperiment towards "main" experiment
data(scExp)
getMainExperiment(scExp)
gg_fill_hue
gg_fill_hue(n)
n |
num hues |
A color in HEX format
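For readers unfamiliar with hue palettes, a common way to generate n evenly spaced hues as HEX colors is sketched below. The function name `hue_palette_sketch` is hypothetical, and the luminance and chroma values (65 and 100) are assumptions borrowed from the widely used ggplot2-like recipe; whether gg_fill_hue uses exactly these settings is not stated here.

```r
# Hypothetical helper: n evenly spaced hues on the HCL colour wheel,
# returned as HEX strings.
hue_palette_sketch <- function(n) {
  # n + 1 points on the wheel; the last one would duplicate the first hue.
  hues <- seq(15, 375, length.out = n + 1)
  grDevices::hcl(h = hues, l = 65, c = 100)[seq_len(n)]
}

palette3 <- hue_palette_sketch(3)
print(palette3)
```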
groupMat
groupMat(mat = NA, margin = 1, groups = NA, method = "mean")
mat |
A matrix |
margin |
By row or columns ? |
groups |
Groups |
method |
Method to group |
A grouped matrix
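The arguments of groupMat are only briefly described, so the following is just an illustration of the general idea of summarising the rows of a matrix by group with the mean. The data and variable names are made up, and the assumption that `groups` is one label per row may not match the actual function.

```r
# Toy matrix: 4 features (rows) x 3 samples (columns).
mat <- matrix(1:12, nrow = 4,
              dimnames = list(paste0("f", 1:4), paste0("c", 1:3)))
# Assumed format: one group label per row.
groups <- c("A", "A", "B", "B")

# rowsum() adds up the rows of each group (groups come out sorted);
# dividing by the group sizes turns the sums into means.
group_sizes <- as.vector(table(groups))
grouped <- rowsum(mat, groups) / group_sizes
print(grouped)
```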
H1proportion
H1proportion(pv = NA, lambda = 0.5)
pv |
P.value vector |
lambda |
Lambda value |
H1 proportion value
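One well-known estimator that takes a p-value vector and a lambda value is Storey's: the proportion of true null hypotheses is estimated from the p-values above lambda, and the H1 proportion is its complement. Whether H1proportion implements exactly this is an assumption; the sketch below, with the hypothetical name `h1_proportion_sketch`, only shows that estimator.

```r
# Hypothetical re-implementation of a Storey-type estimate.
h1_proportion_sketch <- function(pv, lambda = 0.5) {
  pv <- pv[!is.na(pv)]
  # Under H0 p-values are uniform, so a fraction (1 - lambda) of the
  # null p-values is expected above lambda.
  pi0 <- mean(pv > lambda) / (1 - lambda)
  1 - min(pi0, 1)
}

# Made-up p-values: 40 large ones and 60 very small ones.
pv <- c(rep(0.9, 40), rep(0.001, 60))
h1 <- h1_proportion_sketch(pv, lambda = 0.5)
print(h1)
```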
Does the SingleCellExperiment have genomic coordinates in its features?
has_genomic_coordinates(scExp)
scExp |
A SingleCellExperiment object |
TRUE or FALSE
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
has_genomic_coordinates(scExp)
raw_genes = create_scDataset_raw(featureType="gene")
scExp_gene = create_scExp(raw_genes$mat, raw_genes$annot)
has_genomic_coordinates(scExp_gene)
hclustAnnotHeatmapPlot
hclustAnnotHeatmapPlot( x = NULL, hc = NULL, hmColors = NULL, anocol = NULL, xpos = c(0.1, 0.9, 0.114, 0.885), ypos = c(0.1, 0.5, 0.5, 0.6, 0.62, 0.95), dendro.cex = 1, xlab.cex = 0.8, hmRowNames = FALSE, hmRowNames.cex = 0.5 )
x |
A correlation matrix |
hc |
An hclust object |
hmColors |
A color palette |
anocol |
A matrix of colors |
xpos |
Xpos |
ypos |
Ypos |
dendro.cex |
Size of dendrogram names |
xlab.cex |
Size of x label |
hmRowNames |
Write rownames ? |
hmRowNames.cex |
Size of rownames ? |
A heatmap
This data frame provides the length of each "canonical" chromosome of the Homo sapiens genome build hg38.
data("hg38.chromosomes")
hg38.chromosomes - a data frame with 24 rows and 3 variables:
Chromosome - character
Start of the chromosome (bp) - integer
End of the chromosome (bp) - integer
This data frame provides the location of each cytoband of the Homo sapiens genome build hg38.
data("hg38.cytoBand")
hg38.cytoBand - a data frame with 862 rows and 4 variables:
Chromosome - character
Start of the cytoBand (bp) - integer
End of the cytoBand (bp) - integer
Name of the cytoBand - character
This data frame was extracted from Gencode v25 and reports the Transcription Start Site of each gene in the Homo sapiens genome build hg38.
data("hg38.GeneTSS")
hg38.GeneTSS - a data frame with 24 rows and 4 variables:
Chromosome - character
Start of the gene (TSS) - integer
End of the gene - integer
Gene symbol - character
imageCol
imageCol( matcol = NULL, strat = NULL, xlab.cex = 0.5, ylab.cex = 0.5, drawLines = c("none", "h", "v", "b")[1], ... )
matcol |
A matrix of colors |
strat |
Strat |
xlab.cex |
X label size |
ylab.cex |
Y label size |
drawLines |
Draw lines ? |
... |
Additional parameters |
A rectangular image
Import and count input files depending on their format
import_count_input_files( files_dir_list, file_type, which, ref, verbose, progress, BPPARAM = BiocParallel::bpparam() )
files_dir_list |
A named list of directories containing the input files. |
file_type |
Input file type. |
which |
A GRanges object of features. |
ref |
Reference genome. |
verbose |
Print ? |
progress |
A progress object for Shiny. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Defaults to the BPPARAM set in your R session. |
A list with the feature index data.frame containing the non-zero entries of the count matrix, and the cell names
Combine one or multiple matrices together to create a sparse matrix and cell annotation data.frame.
import_scExp(file_paths, remove_pattern = "", temp_path = NULL)
file_paths |
A character vector of file names towards single cell epigenomic matrices (features x cells) (must be .txt / .tsv) |
remove_pattern |
A string pattern to remove from the sample names. Can be a regexp. |
temp_path |
In case matrices are stored in temporary folder, a character vector of path towards temporary files. (NULL) |
A list containing:
datamatrix: a sparseMatrix of features x cells
annot_raw: an annotation of cells as data.frame
mat1 = mat2 = create_scDataset_raw()$mat
tmp1 = tempfile(fileext = ".tsv")
tmp2 = tempfile(fileext = ".tsv")
write.table(as.matrix(mat1),file=tmp1,sep = "\t", row.names = TRUE,col.names = TRUE,quote = FALSE)
write.table(as.matrix(mat2),file=tmp2, sep = "\t", row.names = TRUE,col.names = TRUE,quote = FALSE)
file_paths = c(tmp1,tmp2)
out = import_scExp(file_paths)
Read index-peaks-barcodes trio files on interval to create count indexes
index_peaks_barcodes_to_matrix_indexes( feature_file, matrix_file, barcode_file, binarize = FALSE )
feature_file |
A file containing the features genomic locations |
matrix_file |
A file containing the indexes of the non-zero values and their values (respectively i, j, x; see sparseMatrix) |
barcode_file |
A file containing the barcode ids |
binarize |
Binarize matrix ? |
A list containing a "feature index" data.frame, name_cells, and a region GenomicRange object used to form the sparse matrix
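To make the i, j, x convention concrete, the sketch below builds a small sparse count matrix from a triplet table with Matrix::sparseMatrix and then binarizes it. The feature names, barcodes and counts are made up, and this is not the function's own code; it only shows how such triplets map to a features x cells matrix.

```r
library(Matrix)

# Made-up inputs standing in for the three files.
features <- c("chr1:1-1000", "chr1:1001-2000", "chr1:2001-3000")
barcodes <- c("AAAC", "AAAG", "AAAT")
# One row per non-zero entry: feature index i, cell index j, count x.
triplets <- data.frame(i = c(1, 3, 2, 3),
                       j = c(1, 1, 2, 3),
                       x = c(2, 5, 1, 4))

# Features x cells sparse count matrix.
counts <- sparseMatrix(i = triplets$i, j = triplets$j, x = triplets$x,
                       dims = c(length(features), length(barcodes)),
                       dimnames = list(features, barcodes))

# Binarized version: every non-zero count becomes 1.
binarized <- counts
binarized@x <- rep(1, length(binarized@x))
print(counts)
print(binarized)
```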
Calculate inter-correlation between clusters or samples
inter_correlation_scExp( scExp_cf, by = c("sample_id", "cell_cluster")[1], reference_group = unique(scExp_cf[[by]])[1], other_groups = unique(scExp_cf[[by]]), fullCor = TRUE )
scExp_cf |
A SingleCellExperiment |
by |
On which feature to calculate correlation ("sample_id" or "cell_cluster") |
reference_group |
Reference group to calculate correlation with. Must be in accordance with "by". |
other_groups |
Groups on which to calculate correlation (can contain multiple groups, and also reference_group). Must be in accordance with "by". |
fullCor |
A logical specifying if the correlation matrix was calculated on the entire set of cells (TRUE). |
A data.frame of average inter-correlation of cells in other_groups with cells in reference_group
data(scExp)
inter_correlation_scExp(scExp)
Calculate intra-correlation within clusters or samples
intra_correlation_scExp( scExp_cf, by = c("sample_id", "cell_cluster")[1], fullCor = TRUE )
scExp_cf |
A SingleCellExperiment |
by |
On which feature to calculate correlation ("sample_id" or "cell_cluster") |
fullCor |
Logical specifying if the correlation matrix was run on the entire number of cells or on a subset. |
A data.frame of cell average intra-correlation
data(scExp)
intra_correlation_scExp(scExp, by = "sample_id")
intra_correlation_scExp(scExp, by = "cell_cluster")
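The notion of average intra-correlation can be illustrated in base R: for each cell, average its correlation with the other cells of the same group. The data below are random, the variable names are made up, and excluding the cell's correlation with itself is an assumption that may differ from the package.

```r
# Toy features x cells matrix and a group label per cell (made-up data).
set.seed(2)
mat <- matrix(rnorm(30 * 6), nrow = 30,
              dimnames = list(NULL, paste0("cell", 1:6)))
sample_id <- c("s1", "s1", "s1", "s2", "s2", "s2")

# Cell x cell Pearson correlation matrix.
cor_mat <- cor(mat)

# For each cell, mean correlation with the other cells of its own group.
intra <- vapply(seq_along(sample_id), function(i) {
  same_group <- setdiff(which(sample_id == sample_id[i]), i)
  mean(cor_mat[i, same_group])
}, numeric(1))

intra_df <- data.frame(cell = colnames(mat), sample_id = sample_id,
                       intra_corr = intra)
print(intra_df)
```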
Main function to launch ChromSCape in your favorite browser. You can pass additional parameters that you would pass to shiny::runApp (runApp).
launchApp(launch.browser = TRUE, ...)
launch.browser |
Whether to launch the browser or not |
... |
Additional parameters passed to shiny::runApp |
Launches the shiny application
## Not run: 
launchApp()
## End(Not run)
Load and format MSIGdb pathways using msigdbr package
load_MSIGdb( ref, GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational", "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark") )
ref |
Reference genome, either mm10 or hg38 |
GeneSetClasses |
Which classes of MSIGdb to load |
A list containing the GeneSet (list), GeneSetDf (data.frame) and GenePool character vector of all possible genes
Merge peak files from MACS2 peak caller
merge_MACS2_peaks(peak_file, peak_distance_to_merge, min_peak_size = 200, ref)
peak_file |
A character specifying the path towards the peak file (BED or bedGraph format) |
peak_distance_to_merge |
Maximum distance to merge two peaks |
min_peak_size |
An integer specifying the minimum size of peaks |
ref |
Reference genome |
Peaks as GRanges
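The two parameters peak_distance_to_merge and min_peak_size can be illustrated with a base R sketch that works on plain start/end vectors for a single chromosome. The function name `merge_peaks_sketch` is hypothetical, and applying the size filter before merging is an assumption; the package works on GRanges and may order these steps differently.

```r
# Hypothetical helper: drop small peaks, then merge peaks closer than max_gap.
merge_peaks_sketch <- function(start, end, max_gap, min_size = 200) {
  keep <- (end - start) >= min_size
  start <- start[keep]
  end <- end[keep]
  if (length(start) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  # Furthest end seen so far, so that nested peaks are handled correctly.
  run_end <- cummax(end)
  # A new merged peak begins when the gap to everything before exceeds max_gap.
  new_group <- c(TRUE, start[-1] - run_end[-length(run_end)] > max_gap)
  grp <- cumsum(new_group)
  data.frame(start = as.vector(tapply(start, grp, min)),
             end = as.vector(tapply(end, grp, max)))
}

# Made-up peaks on one chromosome.
merged <- merge_peaks_sketch(start = c(100, 900, 1500, 5000, 5100),
                             end   = c(800, 1400, 2000, 5050, 6000),
                             max_gap = 200, min_size = 200)
print(merged)
```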
This data frame provides the length of each "canonical" chromosome of the Mus musculus (mouse) genome build mm10.
data("mm10.chromosomes")
mm10.chromosomes - a data frame with 24 rows and 3 variables:
Chromosome - character
Start of the chromosome (bp) - integer
End of the chromosome (bp) - integer
This data frame provides the location of each cytoband of the Mus musculus (mouse) genome build mm10.
data("mm10.cytoBand")
mm10.cytoBand - a data frame with 862 rows and 4 variables:
Chromosome - character
Start of the cytoBand (bp) - integer
End of the cytoBand (bp) - integer
Name of the cytoBand - character
This data frame was extracted from Gencode v25 and reports the Transcription Start Site of each gene in the Mus musculus (mouse) genome build mm10.
data("mm10.GeneTSS")
mm10.GeneTSS - a data frame with 24 rows and 4 variables:
Chromosome name - character
Start of the gene (TSS) - integer
End of the gene - integer
Gene symbol - character
Normalize counts
normalize_scExp( scExp, type = c("CPM", "TFIDF", "RPKM", "TPM", "feature_size_only") )
scExp |
A SingleCellExperiment object. |
type |
Which normalization to apply. Either 'CPM', 'TFIDF','RPKM', 'TPM' or 'feature_size_only'. Note that for all normalization by size (RPKM, TPM, feature_size_only), the features must have defined genomic coordinates. |
A SingleCellExperiment object containing normalized counts. (See ?normcounts())
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = normalize_scExp(scExp)
head(SingleCellExperiment::normcounts(scExp))
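As an illustration of the 'CPM' option, counts-per-million scaling can be sketched in base R. This is a minimal sketch on a made-up dense matrix and assumes the usual definition (each cell's counts divided by that cell's total, times one million); the package's exact implementation may differ.

```r
# Toy features x cells count matrix (made-up values).
counts <- matrix(c(1, 0, 3,
                   2, 2, 0),
                 nrow = 3,
                 dimnames = list(paste0("region_", 1:3), c("cell1", "cell2")))

# Divide each column (cell) by its total count, then scale to one million.
cpm <- t(t(counts) / colSums(counts)) * 1e6
print(cpm)
```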
Number of cells before & after correlation filtering
num_cell_after_cor_filt_scExp(scExp, scExp_cf)
scExp |
SingleCellExperiment object before correlation filtering. |
scExp_cf |
SingleCellExperiment object after correlation filtering. |
A colored kable with the number of cells per sample before and after filtering for display
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = filter_correlated_cell_scExp(scExp_cf, corr_threshold = 99, percent_correlation = 1)
## Not run: 
num_cell_after_cor_filt_scExp(scExp,scExp_cf)
## End(Not run)
Table of cells before / after QC
num_cell_after_QC_filt_scExp(scExp, annot, datamatrix)
scExp |
A SingleCellExperiment object. |
annot |
A raw annotation data.frame of cells before filtering. |
datamatrix |
A matrix of cells per regions before filtering. |
A formatted kable in HTML.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_filtered = filter_scExp(scExp)
## Not run: 
num_cell_after_QC_filt_scExp( scExp_filtered,SingleCellExperiment::colData(scExp))
## End(Not run)
Table of number of cells before correlation filtering
num_cell_before_cor_filt_scExp(scExp)
scExp |
A SingleCellExperiment Object |
A colored kable with the number of cells per sample for display
data("scExp")
## Not run: 
num_cell_before_cor_filt_scExp(scExp)
## End(Not run)
Number of cells in each cluster
num_cell_in_cluster_scExp(scExp)
scExp |
A SingleCellExperiment object containing chromatin groups. |
A formatted kable of cell assignment to each cluster.
data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=3,consensus=FALSE)
## Not run: 
num_cell_in_cluster_scExp(scExp_cf)
## End(Not run)
Table of cells
num_cell_scExp(annot, datamatrix)
annot |
An annotation of cells. Can be obtained through 'colData(scExp)'. |
datamatrix |
A matrix of cells per regions before filtering. |
A formatted kable in HTML.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
## Not run: 
num_cell_scExp(SingleCellExperiment::colData(scExp))
## End(Not run)
This function runs a PCA using the IRLBA Singular Value Decomposition in a fast and memory-efficient way. The implicitly restarted Lanczos bidiagonalization algorithm keeps the matrix sparse because the "loci" centering is implicit. The function then multiplies by the approximate singular values (svd$d) to give more weight to the first PCs, proportionally to their singular values. This step is crucial for downstream approaches, e.g. UMAP or t-SNE.
pca_irlba_for_sparseMatrix(x, n_comp, work = 3 * n_comp)
x |
A sparse normalized matrix (features x cells) |
n_comp |
The number of principal components to keep |
work |
Working subspace dimension, larger values can speed convergence at the cost of more memory use. |
The rotated data, e.g. the cells x PCs matrix in the case of single-cell data.
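The idea of centering each locus, taking a truncated SVD and scaling the left singular vectors by the singular values can be sketched with base R. This sketch swaps in the dense base::svd for IRLBA and centers explicitly, so it does not preserve sparsity; it only illustrates how the cells x PCs matrix is obtained, on made-up data.

```r
# Toy dense features x cells matrix (made-up data).
set.seed(1)
x <- matrix(rnorm(50 * 8), nrow = 50)
n_comp <- 3

# Center each feature ("locus") across cells, then put cells in rows.
centered <- t(x - rowMeans(x))

# Truncated SVD (base::svd used here as a stand-in for irlba).
s <- svd(centered, nu = n_comp, nv = n_comp)

# Cells x PCs: left singular vectors scaled by their singular values.
pca <- sweep(s$u, 2, s$d[seq_len(n_comp)], "*")
print(dim(pca))
```

Up to the sign of each component, this matches the scores returned by stats::prcomp on the transposed matrix.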
Plot cluster consensus score for each k as a bargraph.
plot_cluster_consensus_scExp(scExp)
scExp |
A SingleCellExperiment |
The consensus score for each cluster for each k as a barplot
data("scExp")
plot_cluster_consensus_scExp(scExp)
Plotting correlation of PCs with a variable of interest
plot_correlation_PCA_scExp( scExp, correlation_var = "total_counts", color_by = NULL, topPC = 10 )
scExp |
A SingleCellExperiment Object |
correlation_var |
A string specifying with which numeric variable from colData of scExp to calculate and plot the correlation of each PC with. ('total_counts') |
color_by |
A string specifying with which categorical variable to color the plot. ('NULL') |
topPC |
An integer specifying the number of PCs to plot correlation with. (10) |
A ggplot representing the correlation of each of the top PCs with the variable of interest
data("scExp")
plot_correlation_PCA_scExp(scExp, topPC = 25)
plot_correlation_PCA_scExp(scExp, color_by = "cell_cluster")
plot_correlation_PCA_scExp(scExp, color_by = "sample_id")
Coverage plot
plot_coverage_BigWig( coverages, label_color_list, peaks = NULL, chrom, start, end, ref = "hg38" )
coverages |
A list containing sample coverage as GenomicRanges |
label_color_list |
List of colors, list names are labels |
peaks |
A GRanges object containing peaks location to plot (optional) |
chrom |
Chromosome |
start |
Start |
end |
End |
ref |
Genomic Reference |
A coverage plot annotated with genes
data(scExp)
Differential summary barplot
plot_differential_summary_scExp( scExp_cf, qval.th = 0.01, logFC.th = 1, min.percent = 0.01 )
scExp_cf |
A SingleCellExperiment object |
qval.th |
Adjusted p-value threshold. (0.01) |
logFC.th |
Fold change threshold. (1) |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
A barplot summary of differential analysis
data("scExp")
plot_differential_summary_scExp(scExp)
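How the three thresholds qval.th, logFC.th and min.percent combine can be sketched on a small table. The data.frame, its column names and the status labels are made up for illustration, and the use of strict inequalities is an assumption rather than the package's documented behaviour.

```r
# Made-up differential results for five regions.
diff_table <- data.frame(
  feature = paste0("region_", 1:5),
  logFC = c(2.1, -1.5, 0.4, 1.2, -3.0),
  qval = c(0.001, 0.005, 0.0001, 0.2, 0.009),
  percent_active = c(0.10, 0.02, 0.30, 0.15, 0.005)
)

qval.th <- 0.01
logFC.th <- 1
min.percent <- 0.01

# A feature is called differential only if it passes all three thresholds.
signif <- diff_table$qval < qval.th &
  abs(diff_table$logFC) > logFC.th &
  diff_table$percent_active > min.percent

diff_table$status <- ifelse(!signif, "not significant",
                            ifelse(diff_table$logFC > 0, "enriched", "depleted"))
print(table(diff_table$status))
```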
Volcano plot of differential features
plot_differential_volcano_scExp( scExp_cf, group = "C1", logFC.th = 1, qval.th = 0.01, min.percent = 0.01 )
scExp_cf |
A SingleCellExperiment object |
group |
A character indicating the group for which to plot the differential volcano plot. ("C1") |
logFC.th |
Fold change threshold. (1) |
qval.th |
Adjusted p-value threshold. (0.01) |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
A volcano plot of differential analysis of a specific cluster
data("scExp")
plot_differential_volcano_scExp(scExp,"C1")
Plotting distribution of signal
plot_distribution_scExp( scExp, raw = TRUE, log10 = FALSE, pseudo_counts = 1, bins = 150 )
scExp |
A SingleCellExperiment Object |
raw |
Use raw counts ? |
log10 |
Transform using log10 ? |
pseudo_counts |
Pseudo-count to add if using log10 |
bins |
Number of bins in the histogram |
A ggplot histogram representing the distribution of count per cell
data("scExp")
plot_distribution_scExp(scExp)
Plot Gain or Loss of cytobands of the most variable cytobands
plot_gain_or_loss_barplots(scExp, cells = NULL, top = 20)
scExp |
A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. |
cells |
Cell IDs of the tumor samples to |
top |
Number of most variable cytobands to plot |
Plot the gains/losses in the selected cells of interest as multiple barplots
data("scExp")
scExp = calculate_CNA(scExp, control_samples = unique(scExp$sample_id)[1], ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_gain_or_loss_barplots(scExp, cells = scExp$cell_id[which( scExp$sample_id %in% unique(scExp$sample_id)[2])])
Plot cell correlation heatmap with annotations
plot_heatmap_scExp( scExp, name_hc = "hc_cor", corColors = (grDevices::colorRampPalette(c("royalblue", "white", "indianred1")))(256), color_by = NULL, downsample = 1000, hc_linkage = "ward.D" )
scExp |
A SingleCellExperiment Object |
name_hc |
Name of the hclust contained in the SingleCellExperiment object |
corColors |
A palette of colors for the heatmap |
color_by |
Which features to add as additional bands on top of plot |
downsample |
Number of cells to downsample |
hc_linkage |
A linkage method for hierarchical clustering. See hclust. ('ward.D') |
A heatmap of cell to cell correlation, grouping cells by hierarchical clustering.
data("scExp")
plot_heatmap_scExp(scExp)
Violin plot of inter-correlation distribution between one or multiple groups and one reference group
plot_inter_correlation_scExp( scExp_cf, by = c("sample_id", "cell_cluster")[1], jitter_by = NULL, reference_group = unique(scExp_cf[[by]])[1], other_groups = unique(scExp_cf[[by]]), downsample = 5000 )
scExp_cf |
A SingleCellExperiment |
by |
Color by sample_id or cell_cluster |
jitter_by |
Add jitter points of another layer (cell_cluster or sample_id) |
reference_group |
Character containing the reference group name to calculate correlation from. |
other_groups |
Character vector of the other groups for which to calculate correlation with the reference group. |
downsample |
Downsample for plotting |
A violin plot of inter-correlation
data(scExp)
plot_inter_correlation_scExp(scExp)
Violin plot of intra-correlation distribution
plot_intra_correlation_scExp( scExp_cf, by = c("sample_id", "cell_cluster")[1], jitter_by = NULL, downsample = 5000 )
scExp_cf |
A SingleCellExperiment |
by |
Color by sample_id or cell_cluster |
jitter_by |
Add jitter points of another layer (cell_cluster or sample_id) |
downsample |
Downsample for plotting |
A violin plot of intra-correlation
data(scExp)
plot_intra_correlation_scExp(scExp)
Plot Top/Bottom most contributing features to PCA
plot_most_contributing_features( scExp, component = "Component_1", n_top_bot = 10 )
scExp |
A SingleCellExperiment containing "PCA" in reducedDims and gene annotation in rowRanges |
component |
The name of the component of interest |
n_top_bot |
An integer number of top and bottom regions to plot |
If a gene TSS is within 10,000 bp of the region, the name of the gene(s) is displayed instead of the region.
A barplot of top and bottom features with the largest absolute value in the component of interest
data(scExp)
plot_most_contributing_features(scExp, component = "Component_1")
Barplot of the % of active cells for a given feature
plot_percent_active_feature_scExp( scExp, gene, by = c("cell_cluster", "sample_id")[1], highlight = NULL, downsample = 5000, max_distanceToTSS = 1000 )
scExp |
A SingleCellExperiment |
gene |
A character specifying the gene to plot |
by |
Color by cell_cluster or sample_id ("cell_cluster") |
highlight |
A specific group to highlight in a one vs all fashion |
downsample |
Downsample for plotting (5000) |
max_distanceToTSS |
Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000) |
A barplot of the percentage of active cells for the given feature
data(scExp)
plot_percent_active_feature_scExp(scExp, "UBXN10")
Pie chart of top contribution of chromosomes in the 100 most contributing features to PCA
plot_pie_most_contributing_chr( scExp, component = "Component_1", n_top_bot = 100 )
scExp |
A SingleCellExperiment containing "PCA" in reducedDims and gene annotation in rowRanges |
component |
The name of the component of interest |
n_top_bot |
An integer number of top and bottom regions to plot (100) |
A pie chart showing the distribution of chromosomes in the top features with the largest absolute value in the component of interest
data(scExp)
plot_pie_most_contributing_chr(scExp, component = "Component_1")
Plot reduced dimensions (PCA, TSNE, UMAP)
plot_reduced_dim_scExp( scExp, color_by = "sample_id", reduced_dim = c("PCA", "TSNE", "UMAP"), select_x = NULL, select_y = NULL, downsample = 5000, transparency = 0.6, size = 1, max_distanceToTSS = 1000, annotate_clusters = "cell_cluster" %in% colnames(colData(scExp)), min_quantile = 0.01, max_quantile = 0.99 )
scExp |
A SingleCellExperiment Object |
color_by |
Character specifying the feature used for coloring. Can be cell metadata ('total_counts', 'sample_id', ...) or a gene name. |
reduced_dim |
Reduced Dimension used for plotting |
select_x |
Which variable to select for x axis |
select_y |
Which variable to select for y axis |
downsample |
Number of cells to downsample |
transparency |
Alpha parameter, between 0 and 1 |
size |
Size of the points. |
max_distanceToTSS |
The maximum distance to TSS to consider a gene linked to a region. Used only if "color_by" is a gene name. |
annotate_clusters |
A logical indicating if clusters should be labelled. The 'cell_cluster' column should be present in metadata. |
min_quantile |
The lower threshold to remove outlier cells, as quantile of cell embeddings (between 0 and 0.5). |
max_quantile |
The upper threshold to remove outlier cells, as quantile of cell embeddings (between 0.5 and 1). |
A ggplot geom_point plot of the 2D reduced dimension representation
data("scExp")
plot_reduced_dim_scExp(scExp, color_by = "sample_id")
plot_reduced_dim_scExp(scExp, color_by = "total_counts")
plot_reduced_dim_scExp(scExp, reduced_dim = "UMAP")
plot_reduced_dim_scExp(scExp, color_by = "CD52", reduced_dim = "UMAP")
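The effect of min_quantile and max_quantile can be illustrated on a made-up 2D embedding: cells whose coordinates fall outside the chosen quantiles are dropped before plotting. Applying the thresholds to each axis separately, and the helper name `in_range`, are assumptions made for this sketch.

```r
# Made-up 2D embedding with two extreme outliers on the x axis.
set.seed(3)
emb <- data.frame(x = c(rnorm(198), 50, -50), y = rnorm(200))

min_quantile <- 0.01
max_quantile <- 0.99

# Hypothetical helper: TRUE for values between the two quantiles.
in_range <- function(v) {
  v >= quantile(v, min_quantile) & v <= quantile(v, max_quantile)
}

# Keep only cells that are within range on both axes.
keep <- in_range(emb$x) & in_range(emb$y)
emb_filtered <- emb[keep, ]
print(range(emb_filtered$x))
```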
Plot UMAP colored by Gain or Loss of cytobands
plot_reduced_dim_scExp_CNA(scExp, cytoBand)
scExp |
A SingleCellExperiment with the "gainOrLoss_cytoBand" reducedDim slot filled. See |
cytoBand |
Which cytoBand to color cells by |
A plot of the gains/losses of the cytoband overlaid on the epigenetic UMAP.
data("scExp")
scExp = calculate_CNA(scExp, control_samples = unique(scExp$sample_id)[1],
                      ref_genome = "hg38", quantiles_to_define_gol = c(0.05, 0.95))
plot_reduced_dim_scExp_CNA(scExp, get_most_variable_cyto(scExp)$cytoBand[1])
Barplot of top TFs from ChEA3 TF enrichment analysis
plot_top_TF_scExp( scExp, group = unique(scExp$cell_cluster)[1], set = c("Differential", "Enriched", "Depleted")[1], type = c("Score", "nTargets", "nTargets_over_TF", "nTargets_over_genes")[1], n_top = 25 )
scExp |
A SingleCellExperiment |
group |
A character string specifying the differential group to display the top TFs |
set |
A character string specifying the set of genes in which the TF were enriched, either 'Differential', 'Enriched' or 'Depleted'. |
type |
A character string specifying the Y axis of the plot, either the number of differential targets or the ChEA3 integrated mean score. E.g. either "Score", "nTargets", "nTargets_over_TF" for the number of target genes over the total number of genes targeted by the TF or "nTargets_over_genes" for the number of target genes over the number of genes in the gene set. |
n_top |
An integer specifying the number of top TF to display |
A bar plot of top TFs from ChEA3 TF enrichment analysis
data("scExp")
plot_top_TF_scExp(scExp, group = "C1", set = "Differential", type = "Score", n_top = 10)
plot_top_TF_scExp(scExp, group = "C1", set = "Enriched", type = "nTargets_over_genes", n_top = 20)
Violin plot of features
plot_violin_feature_scExp( scExp, gene, by = c("cell_cluster", "sample_id")[1], downsample = 5000, max_distanceToTSS = 1000 )
scExp |
A SingleCellExperiment |
gene |
A character specifying the gene to plot |
by |
Color violin by cell_cluster or sample_id ("cell_cluster") |
downsample |
Downsample for plotting (5000) |
max_distanceToTSS |
Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000) |
A violin plot of the selected feature
data(scExp)
plot_violin_feature_scExp(scExp, "UBXN10")
Preprocess scExp - Counts Per Million (CPM)
preprocess_CPM(scExp)
scExp |
A SingleCellExperiment Object |
A SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_CPM(scExp)
head(SingleCellExperiment::normcounts(scExp))
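As a minimal sketch of what Counts Per Million normalization means, the base R snippet below scales each cell (column) of a small hypothetical count matrix so that its counts sum to one million. The matrix and variable names are illustrative, and the package's own implementation may differ in detail.

```r
# Hypothetical count matrix: 3 features (rows) x 3 cells (columns)
counts <- matrix(c(10,  0, 30,
                    5,  5, 10,
                    0, 20, 60),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("bin", 1:3), paste0("cell", 1:3)))

# Library size = total counts per cell
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million
cpm <- t(t(counts) / lib_size) * 1e6
colSums(cpm)
```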
Preprocess scExp - size only
preprocess_feature_size_only(scExp)
scExp |
A SingleCellExperiment Object |
A SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_feature_size_only(scExp)
head(SingleCellExperiment::normcounts(scExp))
Preprocess scExp - Reads Per Kilobase per Million (RPKM)
preprocess_RPKM(scExp)
scExp |
A SingleCellExperiment Object |
A SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_RPKM(scExp)
head(SingleCellExperiment::normcounts(scExp))
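The standard RPKM definition divides counts by the feature length in kilobases and by the library size in millions. The base R sketch below applies that textbook formula to a hypothetical matrix with assumed feature widths (`width_bp` is an illustrative name); it is not taken from the package source.

```r
# Hypothetical counts: 3 features x 2 cells, with feature widths in base pairs
counts <- matrix(c(10, 50, 20,
                    0, 100, 40),
                 nrow = 3,
                 dimnames = list(paste0("bin", 1:3), c("cell1", "cell2")))
width_bp <- c(1000, 5000, 2000)

# Reads per kilobase: row i is divided by the width of feature i (in kb)
rpk <- counts / (width_bp / 1000)

# Per million: each column is divided by its library size in millions
rpkm <- t(t(rpk) / (colSums(counts) / 1e6))
rpkm
```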
Preprocess scExp - TF-IDF
preprocess_TFIDF(scExp, scale = 10000, log = TRUE)
scExp |
A SingleCellExperiment Object |
scale |
A numeric by which to multiply the matrix in order to have human-readable numbers. Has no impact on the downstream analysis |
log |
Whether to apply the natural logarithm to the TF-IDF normalized data or not. |
A SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TFIDF(scExp)
head(SingleCellExperiment::normcounts(scExp))
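To make the roles of scale and log concrete, here is a base R sketch of one common TF-IDF formulation for count matrices. The exact term-frequency and inverse-document-frequency definitions, and the use of `log1p`, are assumptions made for illustration and may not match what preprocess_TFIDF computes internally.

```r
# Hypothetical count matrix: 4 features x 3 cells
counts <- matrix(c(1, 0, 2,
                   0, 3, 1,
                   4, 0, 0,
                   1, 1, 1),
                 nrow = 4, byrow = TRUE)

# Term frequency: counts divided by the total counts of each cell (column)
tf <- t(t(counts) / colSums(counts))

# Inverse document frequency (assumed form): rarer features get larger weights
idf <- 1 + ncol(counts) / rowSums(counts)

# Row-wise weighting: idf has one value per feature and recycles down columns
tfidf <- tf * idf

# Scale to readable numbers, then apply a log transform (log1p keeps zeros at 0)
scale <- 10000
normcounts <- log1p(tfidf * scale)
round(normcounts, 2)
```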
Preprocess scExp - Transcripts per Million (TPM)
preprocess_TPM(scExp)
scExp |
A SingleCellExperiment Object |
A SingleCellExperiment object.
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TPM(scExp)
head(SingleCellExperiment::normcounts(scExp))
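TPM differs from RPKM in the order of operations: counts are first divided by feature length, and each cell is then rescaled so that its length-normalized values sum to one million. The following base R sketch shows the textbook formula on hypothetical data; names such as `width_bp` are illustrative.

```r
# Hypothetical counts: 3 features x 2 cells, with feature widths in base pairs
counts <- matrix(c(10, 50, 20,
                    0, 100, 40),
                 nrow = 3,
                 dimnames = list(paste0("bin", 1:3), c("cell1", "cell2")))
width_bp <- c(1000, 5000, 2000)

# Step 1: reads per kilobase of feature
rpk <- counts / (width_bp / 1000)

# Step 2: rescale each cell so that its RPK values sum to one million
tpm <- t(t(rpk) / colSums(rpk)) * 1e6
colSums(tpm)
```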
Preprocess and filter a count matrix and its annotation into a SingleCellExperiment (SCE)
preprocessing_filtering_and_reduction( datamatrix, annot_raw, min_reads_per_cell = 1600, max_quantile_read_per_cell = 95, n_top_features = 40000, norm_type = "CPM", n_dims = 10, remove_PC = NULL, subsample_n = NULL, ref_genome = "hg38", exclude_regions = NULL, doBatchCorr = FALSE, batch_sels = NULL )
datamatrix |
A sparse count matrix of features x cells. |
annot_raw |
A data.frame with barcode, cell_id, sample_id, batch_id, total_counts |
min_reads_per_cell |
Minimum read per cell to keep the cell |
max_quantile_read_per_cell |
Upper count quantile threshold above which cells are removed |
n_top_features |
Number of features to keep |
norm_type |
Normalization type c("CPM", "TFIDF", "RPKM", "TPM", "feature_size_only") |
n_dims |
An integer specifying the number of dimensions to keep for PCA |
remove_PC |
A vector of strings indicating which principal components to remove before downstream analysis, as they are probably correlated with library size. Should be in the form: 'Component_1', 'Component_2', ... Recommended when using the 'TFIDF' normalization method. (NULL) |
subsample_n |
Number of cells to subsample. |
ref_genome |
Reference genome ("hg38" or "mm10"). |
exclude_regions |
GenomicRanges with regions to remove from the object. |
doBatchCorr |
Run batch correction ? TRUE or FALSE |
batch_sels |
If doBatchCorr is TRUE, List of characters. Names are batch names, characters are sample names. |
A SingleCellExperiment object containing feature spaces.
raw <- create_scDataset_raw()
scExp = preprocessing_filtering_and_reduction(raw$mat, raw$annot)
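The filtering arguments above (min_reads_per_cell, max_quantile_read_per_cell, n_top_features) can be pictured with the following base R sketch on simulated counts. The comparison operators and the choice of ranking features by total coverage are assumptions made for illustration, and the thresholds are toy values rather than the function defaults.

```r
# Simulated count matrix: 50 features x 200 cells
set.seed(42)
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50)
total <- colSums(counts)

min_reads_per_cell <- 90            # toy value, not the function default
max_quantile_read_per_cell <- 95    # expressed as a percentage
n_top_features <- 20

# Drop cells with too few reads or above the upper count quantile
upper <- quantile(total, probs = max_quantile_read_per_cell / 100)
keep_cells <- total >= min_reads_per_cell & total <= upper
filtered <- counts[, keep_cells, drop = FALSE]

# Keep the most covered features (assumed ranking criterion)
n_keep <- min(n_top_features, nrow(filtered))
top <- order(rowSums(filtered), decreasing = TRUE)[seq_len(n_keep)]
filtered <- filtered[top, , drop = FALSE]
dim(filtered)
```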
This function takes three different types of single-cell input:
- Single-cell BAM files (sorted)
- Single-cell BED files (gzipped)
- A combination of an index file, a peak file and a cell barcode file (the index file is composed of three columns: index i, index j and value x for the non-zero entries in the sparse matrix).
raw_counts_to_sparse_matrix( files_dir_list, file_type = c("scBED", "scBAM", "FragmentFile"), use_Signac = TRUE, peak_file = NULL, n_bins = NULL, bin_width = NULL, genebody = NULL, extendPromoter = 2500, verbose = TRUE, ref = c("hg38", "mm10")[1], progress = NULL, BPPARAM = BiocParallel::bpparam() )
files_dir_list |
A named character vector of directories containing the files. The names correspond to sample names. |
file_type |
Input file(s) type(s) ('scBED','scBAM','FragmentFile') |
use_Signac |
Use Signac wrapper function 'FeatureMatrix' if the Signac package is installed (TRUE). |
peak_file |
A file containing genomic location of peaks (NULL) |
n_bins |
The number of bins to tile the genome (NULL) |
bin_width |
The size of bins to tile the genome (NULL) |
genebody |
Count on genes (body + promoter) ? (NULL) |
extendPromoter |
If counting on genes, number of base pairs to extend up or downstream of TSS (2500). |
verbose |
Verbose (TRUE) |
ref |
reference genome to use (hg38) |
progress |
Progress object for Shiny |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
This function re-counts signal on either fixed genomic bins, a set of user-defined peaks or around the TSS of genes.
A sparse matrix of features x cells
Stuart et al., Multimodal single-cell chromatin analysis with Signac, bioRxiv, https://doi.org/10.1101/2020.11.09.373613
rawfile_ToBigWig : reads in BAM file and write out BigWig coverage file, normalized and smoothed
rawfile_ToBigWig( input, BigWig_filename, format = "BAM", bin_width = 150, norm_factor, n_smoothBin = 5, ref = "hg38", read_size = 101, original_bins = NULL, quantile_for_peak_calling = 0.85 )
input |
Either a named list of character vectors of paths to single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBED files, the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED file names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not. |
BigWig_filename |
Path to write the output BigWig file |
format |
File format, either "BAM" or "BED" |
bin_width |
Bin size for coverage |
norm_factor |
The number of cells or total number of reads in the given sample, for normalization. |
n_smoothBin |
Number of bins for smoothing values |
ref |
Reference genome. |
read_size |
Length of the reads. |
original_bins |
Original bins GenomicRanges in case the format is raw matrix. |
quantile_for_peak_calling |
The quantile to define the threshold above which signal is considered as a peak. |
Writes in the output directory a bigwig file displaying the cumulative coverage of cells and a basic set of peaks called by taking all peaks above a given threshold
Writes a BigWig file as output
Read a count matrix with three first columns (chr,start,end)
read_count_mat_with_separated_chr_start_end( path_to_matrix, format_test, separator )
path_to_matrix |
Path to the count matrix |
format_test |
Sample of the read.table |
separator |
Separator character |
A sparseMatrix with rownames in the form "chr1:1222-55555"
Given one or multiple directories, look in each directory for a combination of the following files :
A 'features' file containing unique feature genomic locations in tab-separated format (*_features.bed / .txt / .tsv / .gz), e.g. chr, start and end
A 'barcodes' file containing unique barcode names (*_barcode.txt / .tsv / .gz)
A 'matrix' file containing indexes of non-zero entries (*_matrix.mtx / .gz)
read_sparse_matrix(files_dir_list, ref = c("hg38", "mm10")[1], verbose = TRUE)
files_dir_list |
A named character vector containing the full path towards folders. Each folder should contain only the Feature file, the Barcode file and the Matrix file (see description). |
ref |
Reference genome (used to filter non-canonical chromosomes). |
verbose |
Print ? |
Returns a list containing a datamatrix and cell annotation
## Not run:
sample_dirs = c("/path/to/folder1/", "/path/to/folder2/")
names(sample_dirs) = c("sample_1", "sample_2")
out <- read_sparse_matrix(sample_dirs, ref = "hg38")
head(out$datamatrix)
head(out$annot_raw)
## End(Not run)
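The three files described above encode a matrix as (row index, column index, value) triplets plus row and column names. The base R sketch below rebuilds a small dense matrix from hypothetical in-memory triplets to show how the pieces fit together; it does not read any file and is not the package's parser.

```r
# Hypothetical contents of the three files
features <- c("chr1:1-5000", "chr1:5001-10000", "chr2:1-5000")  # features file
barcodes <- c("cellA", "cellB")                                 # barcodes file
triplets <- data.frame(i = c(1, 3, 2, 3),   # row (feature) index
                       j = c(1, 1, 2, 2),   # column (barcode) index
                       x = c(4, 1, 7, 2))   # non-zero value

# Start from zeros, then fill only the listed entries
mat <- matrix(0, nrow = length(features), ncol = length(barcodes),
              dimnames = list(features, barcodes))
mat[cbind(triplets$i, triplets$j)] <- triplets$x
mat
```

For real datasets a sparse representation is preferable; `Matrix::sparseMatrix(i = , j = , x = , dims = )` builds the equivalent sparse object from the same triplets.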
Rebin Helper for rebin_matrix function
rebin_helper(mat_df)
mat_df |
A data.frame corresponding to sparse matrix indexes & values. |
a data.frame grouped mean-summarised by col and new_row
This function is best used to re-count a large number of small bins or peaks (e.g. <= 5000bp) into equal or larger sized bins. The genome is either cut into fixed bins (e.g. 50,000bp) or into a user-defined number of bins. Bins are calculated based on the canonical chromosomes. Note that if peaks are larger than bins, or if peaks overlap multiple bins, the signal is added to each bin. Users can increase the minimum overlap required to consider a peak as overlapping a bin (by default 150bp, the size of a nucleosome) to diminish the number of peaks overlapping multiple regions. Any peak smaller than the minimum overlap threshold will be dismissed. Therefore, library size might be slightly different from peaks to bins if signal was duplicated into multiple bins or omitted due to peaks smaller than the minimum overlap.
rebin_matrix( mat, bin_width = 50000, custom_annotation = NULL, minoverlap = 500, verbose = TRUE, ref = "hg38", nthreads = 1, rebin_function = rebin_helper )
mat |
A matrix of peaks x cells |
bin_width |
Width of bins to produce in base pairs (minimum 500) (50000) |
custom_annotation |
A GenomicRanges object specifying the new features to count the matrix on instead of recounting on genomic bins. If not NULL, takes precedence over bin_width. |
minoverlap |
Minimum overlap between the original bins and the new features to consider the peak as overlapping the bin. We recommend setting this number to exactly half of the original bin size (e.g. 500bp for an original bin size of 1000bp) so that no original bin is counted twice. (500) |
verbose |
Verbose |
ref |
Reference genome to use (hg38) |
nthreads |
Number of threads to use for parallel processing |
A sparse matrix of larger bins or peaks.
mat = create_scDataset_raw()$mat
binned_mat = rebin_matrix(mat, bin_width = 10e6)
dim(binned_mat)
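The core idea of re-binning, aggregating small bins into the larger bin that contains them, can be sketched in base R as follows. This simplified example assumes non-overlapping 1 kb bins on a single hypothetical chromosome that nest exactly inside 5 kb bins, and it sums the signal; the overlap handling controlled by minoverlap is deliberately left out.

```r
# Ten hypothetical 1 kb bins on chr1, for 2 cells
starts <- seq.int(0L, 9000L, by = 1000L)
counts <- matrix(1:20, nrow = 10,
                 dimnames = list(paste0("chr1:", starts + 1L, "-", starts + 1000L),
                                 c("cell1", "cell2")))

# Assign each small bin to the larger bin containing its start position
bin_width <- 5000L
new_bin <- starts %/% bin_width

# Sum the signal of all small bins that fall in the same large bin
rebinned <- rowsum(counts, group = new_bin)
bin_ids <- sort(unique(new_bin))
rownames(rebinned) <- paste0("chr1:", bin_ids * bin_width + 1L, "-",
                             (bin_ids + 1L) * bin_width)
rebinned
```

In this nested case the total signal per cell is preserved, which is not guaranteed once peaks overlap several bins or fall below the minimum overlap.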
Reduce dimension with batch corrections
reduce_dim_batch_correction(scExp, mat, batch_list, n)
scExp |
SingleCellExperiment |
mat |
The normalized count matrix |
batch_list |
List of batches |
n |
Number of PCs to keep |
A list containing the SingleCellExperiment with batch info and the corrected pca
Reduce dimensions (PCA, TSNE, UMAP)
reduce_dims_scExp( scExp, dimension_reductions = c("PCA", "UMAP"), n = 10, batch_correction = FALSE, batch_list = NULL, remove_PC = NULL, verbose = TRUE )
scExp |
A SingleCellExperiment object. |
dimension_reductions |
A character vector of methods to apply. (c('PCA','TSNE','UMAP')) |
n |
Numbers of dimensions to keep for PCA. (50) |
batch_correction |
Do batch correction ? (FALSE) |
batch_list |
List of characters. Names are batch names, characters are sample names. |
remove_PC |
A vector of strings indicating which principal components to remove before downstream analysis, as they are probably correlated with library size. Should be in the form: 'Component_1', 'Component_2', ... Recommended when using the 'TFIDF' normalization method. (NULL) |
verbose |
Print messages ?(TRUE) |
A SingleCellExperiment object containing feature spaces. See ?reduceDims().
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = normalize_scExp(scExp, "CPM")
scExp = reduce_dims_scExp(scExp, dimension_reductions = c("PCA", "UMAP"))
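To illustrate the n and remove_PC arguments, the following base R sketch runs a PCA with `stats::prcomp` on a simulated normalized matrix, keeps the first n components, and drops one by name. The 'Component_' naming follows the argument description above; the use of `prcomp` and the simulated data are assumptions for illustration only.

```r
# Simulated normalized matrix: 100 features x 30 cells
set.seed(1)
norm_mat <- matrix(rnorm(100 * 30), nrow = 100)

# PCA on cells: prcomp expects observations (cells) in rows
pca <- prcomp(t(norm_mat), center = TRUE, scale. = FALSE)

# Keep the first n components and name them as in the documentation
n <- 10
emb <- pca$x[, seq_len(n), drop = FALSE]
colnames(emb) <- paste0("Component_", seq_len(n))

# Drop components suspected to track library size
remove_PC <- "Component_1"
emb_kept <- emb[, !colnames(emb) %in% remove_PC, drop = FALSE]
dim(emb_kept)
```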
Remove chromosome M from scExp rownames
remove_chr_M_fun(scExp, verbose)
scExp |
A SingleCellExperiment |
verbose |
Print ? |
A SingleCellExperiment without chromosome M (mitochondrial chr)
Remove non canonical chromosomes from scExp
remove_non_canonical_fun(scExp, verbose)
scExp |
A SingleCellExperiment |
verbose |
Print ? |
A SingleCellExperiment without non-canonical chromosomes (random, unknown, contigs, etc.)
Run hypergeometric enrichment test and combine significant pathways into a data.frame
results_enrichmentTest( differentialGenes, enrichment_qval, GeneSets, GeneSetsDf, GenePool )
differentialGenes |
Genes significantly over / under expressed |
enrichment_qval |
Adjusted p-value threshold below which a pathway is considered significant |
GeneSets |
List of pathways |
GeneSetsDf |
Data.frame of pathways |
GenePool |
Pool of possible genes for testing |
A data.frame with pathways passing q.value threshold
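A hypergeometric enrichment test asks whether a gene set contains more differential genes than expected when drawing at random from the gene pool. The base R sketch below uses `phyper` and `p.adjust` on hypothetical gene names and sets; the Benjamini-Hochberg correction and the column names of the output are assumptions, not necessarily what results_enrichmentTest uses.

```r
# Hypothetical gene pool, gene sets and differential genes
gene_pool <- paste0("gene", 1:1000)
gene_sets <- list(pathwayA = gene_pool[1:50],
                  pathwayB = gene_pool[500:560])
differential_genes <- c(gene_pool[1:20], gene_pool[900:929])

# P(overlap >= k) under the hypergeometric distribution, for each gene set
p_values <- vapply(gene_sets, function(set) {
  k <- length(intersect(differential_genes, set))  # observed overlap
  m <- length(set)                                 # genes in the set
  n <- length(gene_pool) - m                       # genes outside the set
  phyper(k - 1, m, n, length(differential_genes), lower.tail = FALSE)
}, numeric(1))

# Multiple-testing correction (assumed: Benjamini-Hochberg)
q_values <- p.adjust(p_values, method = "BH")

# Keep the gene sets passing the adjusted p-value threshold
enrichment_qval <- 0.1
res <- data.frame(gene_set = names(q_values),
                  p.value = p_values, q.value = q_values)
significant <- res[res$q.value < enrichment_qval, ]
significant
```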
Retrieve Top and Bot most contributing features of PCA
retrieve_top_bot_features_pca( pca, counts, component, n_top_bot, absolute = FALSE )
pca |
A matrix/data.frame of rotated data |
counts |
the normalized counts used for PCA |
component |
the component of interest |
n_top_bot |
the number of top & bot features to take |
absolute |
If TRUE, return the top features in absolute values instead. |
a data.frame of top bot contributing features in PCA
Run pairwise tests
run_pairwise_tests( affectation, by, counts, feature, method, progress = NULL, BPPARAM = BiocParallel::bpparam() )
affectation |
An annotation data.frame with cell_cluster and cell_id columns |
by |
A character specifying the column of the object containing the groups of cells to compare. |
counts |
Count matrix |
feature |
Feature data.frame |
method |
DA method, Wilcoxon or edgeR |
progress |
A shiny Progress instance to display progress bar. |
BPPARAM |
BPPARAM object for multiprocessing. See bpparam for more information. Will take the default BPPARAM set in your R session. |
A list containing objects for DA function
Run tsne on single cell experiment
run_tsne_scExp(scExp, verbose = FALSE)
scExp |
A SingleCellExperiment Object |
verbose |
Print ? |
A colored kable with the number of cells per sample for display
Data from a single-cell ChIP-seq experiment against the H3K4me3 active mark from two cell lines, Jurkat T cells and Ramos B cells, from Grosselin et al., 2019. The count matrices, on 5kbp bins, were given to ChromSCape and the filtering parameter was set to 3% of cells active in regions and subsampled down to 150 cells per sample. After correlation filtering, the experiment is composed of respectively 51 and 55 cells from Jurkat & Ramos and 5499 5kbp-genomic bins where signal is located.
data("scExp")
scExp - a SingleCellExperiment with 106 cells and 5499 features (genomic bins) in hg38:
A SingleCellExperiment
The scExp is composed of :
counts and normcounts assays, PCA, UMAP, and Correlation matrix in reducedDims(scExp)
Assignation of genes to genomic bins in rowRanges(scExp)
Cluster information in colData(scExp) correlation
Hierarchical clustering dendrogram in metadata$hc_cor
Consensus clustering raw data in metadata$consclust
Consensus clustering cluster-consensus and item consensus dataframes in metadata$icl
Differential analysis in metadata$diff
Gene Set Analysis in metadata$enr
data("scExp")
plot_reduced_dim_scExp(scExp)
plot_reduced_dim_scExp(scExp, color_by = "cell_cluster")
plot_heatmap_scExp(scExp)
plot_differential_volcano_scExp(scExp, "C1")
plot_differential_summary_scExp(scExp)
Separate BAM files into cell cluster BAM files
separate_BAM_into_clusters(affectation, odir, merged_bam)
affectation |
An annotation data.frame containing cell_id and cell_cluster columns |
odir |
A valid output directory path |
merged_bam |
A list of merged BAM file paths |
Create one BAM per cluster from one BAM per condition
Determine Count matrix separator ("tab" or ",")
separator_count_mat(path_to_matrix)
path_to_matrix |
A path towards the count matrix to check |
A character separator
Smooth a vector of values with nb_bins left and right values
smoothBin(bin_score, nb_bins = 10)
bin_score |
A numeric vector of values to be smoothed |
nb_bins |
Number of values to take left and right |
A smooth vector of the same size
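Smoothing with nb_bins values on each side can be read as a centered moving window. The sketch below implements a simple moving average in base R under that reading; `smooth_bins` is a hypothetical helper, and the use of a mean and the truncation of the window at the vector edges are assumptions rather than a description of smoothBin's internals.

```r
# Hypothetical helper: centered moving average over nb_bins values
# to the left and to the right of each position
smooth_bins <- function(bin_score, nb_bins = 10) {
  n <- length(bin_score)
  vapply(seq_len(n), function(i) {
    # The window is truncated at the edges of the vector
    window <- max(1, i - nb_bins):min(n, i + nb_bins)
    mean(bin_score[window])
  }, numeric(1))
}

scores <- c(0, 0, 9, 0, 0)
smoothed <- smooth_bins(scores, nb_bins = 1)
smoothed
```

The output has the same length as the input, and the isolated spike is spread over its immediate neighbours.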
Randomly sample n cells from each sample in a SingleCellExperiment to return a subsampled SingleCellExperiment with all samples having at most n cells. If n is higher than the number of cells in a sample, this sample will not be subsampled.
subsample_scExp(scExp, n_cell_per_sample = 500, n_cell_total = NULL)
scExp |
A SingleCellExperiment |
n_cell_per_sample |
An integer number of cells to subsample for each sample. Exclusive with n_cell_total. (500) |
n_cell_total |
An integer number of cells to subsample in total. Exclusive with n_cell_per_sample (NULL). |
A subsampled SingleCellExperiment
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_sub = subsample_scExp(scExp, 50)
## Not run: num_cell_scExp(scExp_sub)
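The per-sample behaviour described above (samples with fewer than n cells are left untouched) can be sketched in base R on a hypothetical annotation table. The column names mirror the cell annotation used elsewhere in this manual, but the code is only an illustration of the logic.

```r
# Hypothetical annotation: 100 cells in sample s1, 30 cells in sample s2
set.seed(123)
annot <- data.frame(cell_id = paste0("cell", 1:130),
                    sample_id = rep(c("s1", "s2"), times = c(100, 30)))
n_cell_per_sample <- 50

# For each sample, draw n cells at random unless the sample is already small enough
keep <- unlist(lapply(split(seq_len(nrow(annot)), annot$sample_id), function(idx) {
  if (length(idx) > n_cell_per_sample) sample(idx, n_cell_per_sample) else idx
}))

annot_sub <- annot[sort(keep), ]
table(annot_sub$sample_id)
```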
This function performs peak calling on each cell population in order to refine gene annotation for large bins. For instance, a 50000bp bin might contain the TSS of several genes, while in reality only one or two of these genes overlap the signal (peak). To do so, in-silico cell sorting is first applied based on previously defined clusters contained in the SingleCellExperiment. Taking BAM files of each sample as input, samtools pools then splits reads from each cell barcode into 1 BAM file per cell cluster (pseudo-bulk). Then MACS2 calls peaks on each cluster. The peaks are aggregated and merged if closer than a certain distance defined by the user (10000bp). Then,
This function takes as input a SingleCellExperiment, which must contain a 'cell_cluster' column in its colData, an output directory where to store temporary files, the list of BAM files corresponding to each sample and containing the cell barcode information as a tag (for instance tag CB:Z:xxx, XB:Z:xxx or else...) or single-cell BED files containing the raw reads and corresponding to the 'barcode' column metadata, the p.value used by MACS2 to distinguish significant peaks, the reference genome (either hg38 or mm10), the maximal merging distance in bp and a data.frame containing gene TSS genomic coordinates of the corresponding genome (if set to NULL, will automatically load geneTSS). The output is a SingleCellExperiment with a GRanges object containing the ranges of each merged peak that falls within genomic bins of the SingleCellExperiment, saving the bin range as additional columns (window_chr, window_start, window_end), as well as the closest genes and their distance relative to the peak. The peaks may be present in several rows if multiple genes are close to or overlap the peaks.
Note that the user must have MACS2 installed and available in the PATH. Users can open command terminal and type 'which macs2' to verify the availability of these programs. Will only work on unix operating system. Check operating system with 'print(.Platform)'.
subset_bam_call_peaks( scExp, odir, input, format = "BAM", p.value = 0.05, ref = "hg38", peak_distance_to_merge = 10000, geneTSS_annotation = NULL, run_coverage = FALSE, progress = NULL )
scExp |
A SingleCellExperiment object |
odir |
Output directory where to write temporary files and each cluster's BAM file |
input |
A character vector of file paths to each sample's BAM file, containing cell barcode information as tags. BAM files can be paired-end or single-end. |
format |
Format of the input data, either "BAM" or "scBED". |
p.value |
a p-value to use for MACS2 to determine significant peaks. (0.05) |
ref |
A reference genome, either hg38 or mm10. ('hg38') |
peak_distance_to_merge |
Maximal distance to merge peaks together after peak calling , in bp. (10000) |
geneTSS_annotation |
A data.frame annotation of gene TSSs. If NULL, will automatically load the Gencode list of genes for the specified reference genome. |
run_coverage |
Create coverage tracks (.bw) for each cluster ? |
progress |
A shiny Progress instance to display progress bar. |
The BED files of the peaks called for each clusters, as well as the merged peaks are written in the output directory.
A SingleCellExperiment with refined annotation
## Not run:
data("scExp")
subset_bam_call_peaks(scExp, "path/to/out/",
                      list("sample1" = "path/to/BAM/sample1.bam",
                           "sample2" = "path/to/BAM/sample2.bam"),
                      p.value = 0.05, ref = "hg38",
                      peak_distance_to_merge = 10000,
                      geneTSS_annotation = NULL)
## End(Not run)
Summary of the differential analysis
summary_DA(scExp, qval.th = 0.01, logFC.th = 1, min.percent = 0.01)
scExp |
A SingleCellExperiment object containing consclust with selected number of cluster. |
qval.th |
Adjusted p-value threshold. (0.01) |
logFC.th |
Fold change threshold. (1) |
min.percent |
Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01) |
A table summary of the differential analysis
data('scExp')
summary_DA(scExp)
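How the three thresholds (qval.th, logFC.th, min.percent) combine can be sketched with a hypothetical results table in base R. The column names `logFC`, `qval` and `pct_active` are invented for this illustration, and the exact comparison operators used by summary_DA are an assumption.

```r
# Hypothetical differential analysis results for one cluster
res <- data.frame(feature    = paste0("bin", 1:6),
                  logFC      = c(2.5, -1.8, 0.3, 1.2, -3.0, 1.5),
                  qval       = c(0.001, 0.005, 0.0001, 0.2, 0.009, 0.004),
                  pct_active = c(0.30, 0.10, 0.50, 0.40, 0.005, 0.02))

qval.th <- 0.01
logFC.th <- 1
min.percent <- 0.01

# A feature must pass both the q-value and the activity filters
signif <- res$qval <= qval.th & res$pct_active >= min.percent

# Then it is classified by the sign and size of its fold change
over  <- sum(signif & res$logFC >  logFC.th)
under <- sum(signif & res$logFC < -logFC.th)
c(over = over, under = under, not_significant = nrow(res) - over - under)
```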
Swap main & alternative Experiments, with fixed colData
swapAltExp_sameColData(scExp, alt)
scExp |
A SingleCellExperiment |
alt |
Name of the alternative experiment |
A swapped SingleCellExperiment with the exact same colData.
data(scExp)
swapAltExp_sameColData(scExp, "peaks")
Creates a table of enriched gene sets
table_enriched_genes_scExp( scExp, set = "Both", group = "C1", enr_class_sel = c("c1_positional", "c2_curated", "c3_motif", "c4_computational", "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark") )
scExp |
A SingleCellExperiment object containing list of enriched gene sets. |
set |
A character vector, either 'Both', 'Overexpressed' or 'Underexpressed'. ('Both') |
group |
The "group" name from differential analysis. Can be the cluster name or the custom name in case of a custom differential analysis. |
enr_class_sel |
Which classes of gene sets to show. (c('c1_positional', 'c2_curated', ...)) |
A DT::data.table of enriched gene sets.
data("scExp")
## Not run: table_enriched_genes_scExp(scExp)
Warning for differential_analysis_scExp
warning_DA(scExp, by, de_type, method, block, group, ref)
scExp |
A SingleCellExperiment object containing consclust with selected number of cluster. |
by |
A character specifying the column of the object containing the groups of cells to compare. Exclusive with de_type == custom |
de_type |
Type of comparisons. Either 'one_vs_rest', to compare each cluster against all others, or 'pairwise' to make 1 to 1 comparisons. ('one_vs_rest') |
method |
Wilcoxon or edgerGLM |
block |
Use batches as blocking factors ? |
group |
If de_type is custom, the group to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows |
ref |
If de_type is custom, the reference to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows |
Warnings or errors if the inputs are not correct
A warning helper for plot_reduced_dim_scExp
warning_plot_reduced_dim_scExp( scExp, color_by, reduced_dim, downsample, transparency, size, max_distanceToTSS, annotate_clusters, min_quantile, max_quantile )
scExp |
A SingleCellExperiment Object |
color_by |
Feature used for coloration |
reduced_dim |
Reduced Dimension used for plotting |
downsample |
Number of cells to downsample |
transparency |
Alpha parameter, between 0 and 1 |
size |
Size of the points. |
max_distanceToTSS |
Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. |
annotate_clusters |
A logical indicating if clusters should be labelled. The 'cell_cluster' column should be present in metadata. |
min_quantile |
The lower threshold to remove outlier cells, as quantile of cell embeddings (between 0 and 0.5). |
max_quantile |
The upper threshold to remove outlier cells, as quantile of cell embeddings (between 0.5 and 1). |
Warning or errors if the inputs are not correct
Warning for raw_counts_to_sparse_matrix
warning_raw_counts_to_sparse_matrix( files_dir_list, file_type = c("scBAM", "scBED", "SparseMatrix"), peak_file = NULL, n_bins = NULL, bin_width = NULL, genebody = NULL, extendPromoter = 2500, verbose = TRUE, ref = "hg38" )
files_dir_list |
A named character vector of directory containing the raw files |
file_type |
Input file(s) type(s) ('scBED','scBAM','SparseMatrix') |
peak_file |
A file containing genomic location of peaks (NULL) |
n_bins |
The number of bins to tile the genome (NULL) |
bin_width |
The size of bins to tile the genome (NULL) |
genebody |
Count on genes (body + promoter) ? (NULL) |
extendPromoter |
If counting on genes, number of base pairs to extend up or downstream of TSS (2500). |
verbose |
Verbose (TRUE) |
ref |
Reference genome to use (hg38) |
Errors or warnings if the inputs are not correct
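The vector default of file_type follows the shape commonly resolved with base R's match.arg. Whether ChromSCape resolves it this way is an assumption. The sketch below uses a hypothetical function, resolve_file_type, to show how that idiom behaves.

```r
# Hypothetical wrapper illustrating the match.arg idiom on the documented
# choices. This is an assumption about usage, not the package's code.
resolve_file_type <- function(file_type = c("scBAM", "scBED", "SparseMatrix")) {
  # With the default vector, match.arg returns the first choice;
  # with a single value, it checks that value against the choices.
  match.arg(file_type)
}

resolve_file_type()         # falls back to the first choice, "scBAM"
resolve_file_type("scBED")  # an explicit valid choice is returned as is
```

An unsupported value such as "bam" raises an error listing the accepted choices.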
Wrapper around 'FeatureMatrix' function from Signac Package
wrapper_Signac_FeatureMatrix( files_dir_list, which, ref = "hg38", process_n = 2000, set_future_plan = TRUE, verbose = TRUE, progress = NULL )
files_dir_list |
A named character vector of directories containing the files. The names correspond to sample names. |
which |
A GenomicRanges containing the features to count on. |
ref |
Reference genome to use (hg38). Chromosomes that are not among the canonical chromosomes of the given reference genome will be excluded from the matrix. |
process_n |
Number of regions to load into memory at a time, per thread. Processing more regions at once can be faster but uses more memory. (2000) |
set_future_plan |
Set 'multisession' plan within the function (TRUE). If TRUE, the previous plan (e.g. future::plan()) will be set back on exit. |
verbose |
Verbose (TRUE). |
progress |
Progress object for Shiny. |
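Since files_dir_list is described as a named character vector whose names are the sample names, it can be built as sketched below. The sample names and paths are hypothetical placeholders.

```r
# Hypothetical sample names and directories, for illustration only.
files_dir_list <- c(
  sample_A = "/path/to/sample_A_fragments",
  sample_B = "/path/to/sample_B_fragments"
)

# The names carry the sample labels.
names(files_dir_list)
```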
Signac and future are not required packages for ChromSCape, as they are only needed for the fragment matrix calculations. To use this function, install the Signac package first (future will be installed as a dependency). For simplicity and performance, the function by default sets future::plan("multisession") with workers = future::availableCores(omit = 1) to allow parallel processing with Signac. On exit, the plan is reset to the previously set future plan. Note that future multisession may have trouble running when a VPN is on: if you encounter long runtimes, deactivate your VPN before running in parallel.
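The set-then-restore behaviour described above corresponds to a common idiom with the future package. The following is a sketch under assumptions rather than the function's actual body: with_multisession and its task argument are hypothetical names, and the guard simply skips the plan change when future is not installed.

```r
# Hypothetical helper showing the "set plan, restore on exit" idiom.
with_multisession <- function(task, set_future_plan = TRUE) {
  if (set_future_plan && requireNamespace("future", quietly = TRUE)) {
    # plan() returns the previously active plan, which is kept for restoring
    old_plan <- future::plan(
      "multisession",
      workers = future::availableCores(omit = 1)
    )
    # Restore the previous plan when the function exits, even on error
    on.exit(future::plan(old_plan), add = TRUE)
  }
  task()
}

# Runs the task under the temporary plan and returns its value.
with_multisession(function() sum(1:10))
```

Registering the restore step with on.exit means the caller's plan is put back even if the task fails midway.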
A sparse matrix of features x cells
Stuart et al., Multimodal single-cell chromatin analysis with Signac, bioRxiv, https://doi.org/10.1101/2020.11.09.373613
## Not run:
gr_bins = define_feature("hg38", bin_width = 50000)
wrapper_Signac_FeatureMatrix("/path/to/dir_containing_fragment_files", gr_bins, ref = "hg38")
## End(Not run)