Package 'ChromSCape' reference manual

Title:	Analysis of single-cell epigenomics datasets with a Shiny App
Description:	ChromSCape - Chromatin landscape profiling for Single Cells - is a ready-to-launch user-friendly Shiny Application for the analysis of single-cell epigenomics datasets (scChIP-seq, scATAC-seq, scCUT&Tag, ...) from aligned data to differential analysis & gene set enrichment analysis. It is highly interactive, enables users to save their analysis and covers a wide range of analytical steps: QC, preprocessing, filtering, batch correction, dimensionality reduction, vizualisation, clustering, differential analysis and gene set analysis.
Authors:	Pacome Prompsy [aut, cre] , Celine Vallot [aut]
Maintainer:	Pacome Prompsy <[email protected]>
License:	GPL-3
Version:	1.17.0
Built:	2025-03-18 04:07:37 UTC
Source:	https://github.com/bioc/ChromSCape

Find nearest peaks of each gene and return refined annotation

Description

Find nearest peaks of each gene and return refined annotation

Usage

annotation_from_merged_peaks(scExp, odir, merged_peaks, geneTSS_annotation)
annotation_from_merged_peaks(scExp, odir, merged_peaks, geneTSS_annotation)

Arguments

`scExp`	A SingleCellExperiment object
`odir`	An output directory where to write the mergedpeaks BED file
`merged_peaks`	A list of GRanges object containing the merged peaks
`geneTSS_annotation`	A GRanges object with reference genes

Value

A data.frame with refined annotation

annotToCol2

Description

annotToCol2

Usage

annotToCol2(
  annotS = NULL,
  annotT = NULL,
  missing = c("", NA),
  anotype = NULL,
  maxnumcateg = 2,
  categCol = NULL,
  quantitCol = NULL,
  plotLegend = TRUE,
  plotLegendFile = NULL
)
annotToCol2(
  annotS = NULL,
  annotT = NULL,
  missing = c("", NA),
  anotype = NULL,
  maxnumcateg = 2,
  categCol = NULL,
  quantitCol = NULL,
  plotLegend = TRUE,
  plotLegendFile = NULL
)

Arguments

`annotS`	A color matrix
`annotT`	A color matrix
`missing`	Convert missing to NA
`anotype`	Annotation type
`maxnumcateg`	Maximum number of categories
`categCol`	Categorical columns
`quantitCol`	Quantitative columns
`plotLegend`	Plot legend ?
`plotLegendFile`	Which file to plot legend ?

Value

A matrix of continuous or discrete colors

Examples

data("scExp")
annotToCol2(SingleCellExperiment::colData(scExp), plotLegend = FALSE)

data("scExp")
annotToCol2(SingleCellExperiment::colData(scExp), plotLegend = FALSE)

Helper binary column for anocol function

Description

Helper binary column for anocol function

Usage

anocol_binary(anocol, anotype, plotLegend, annotS)
anocol_binary(anocol, anotype, plotLegend, annotS)

Arguments

`anocol`	The color feature matrix
`anotype`	The feature types
`plotLegend`	Plot legend ?
`annotS`	A color matrix

Value

A color matrix similar to anocol with binrary columns colored

Helper binary column for anocol function

Description

Helper binary column for anocol function

Usage

anocol_categorical(anocol, categCol, anotype, plotLegend, annotS)
anocol_categorical(anocol, categCol, anotype, plotLegend, annotS)

Arguments

`anocol`	The color feature matrix
`categCol`	Colors for categorical features
`anotype`	The feature types
`plotLegend`	Plot legend ?
`annotS`	A color matrix

Value

A color matrix similar to anocol with binrary columns colored

Count bam files on interval to create count indexes

Description

Count bam files on interval to create count indexes

Usage

bams_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())
bams_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())

Arguments

`dir`	A directory containing single cell BAM files and BAI files
`which`	Genomic Range on which to count
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list containing a "feature index" data.frame and a count vector for non 0 entries, both used to form the sparse matrix

Count bed files on interval to create count indexes

Description

Count bed files on interval to create count indexes

Usage

beds_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())
beds_to_matrix_indexes(dir, which, BPPARAM = BiocParallel::bpparam())

Arguments

`dir`	A directory containing the single cell BED files
`which`	Genomic Range on which to count
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list containing a "feature index" data.frame and a names of cells as vector both used to form the sparse matrix

Estimate copy number alterations in cytobands

Description

Cytobands are considered large enough in order that a variation at the cytoband level is not considered as an epigenetic event but as a genetic event, e.g. Copy Number Alterations. The function successively :

Calculates the fraction of reads in each cytoband (FrCyto). See calculate_cyto_mat
Calculates the log2-ratio FrCyto of each cell by the average FrCyto in normal cells. See calculate_logRatio_CNA
Estimates if there was a gain or a loss of copy in each cyto band. See calculate_gain_or_loss

The corresponding matrices are accessibles in the reducedDim slots "cytoBands", "logRatio_cytoBands" and "gainOrLoss_cytoBands" respectively.

Usage

calculate_CNA(
  scExp,
  control_samples = unique(scExp$sample_id)[1],
  ref_genome = c("hg38", "mm10")[1],
  quantiles_to_define_gol = c(0.05, 0.95)
)
calculate_CNA(
  scExp,
  control_samples = unique(scExp$sample_id)[1],
  ref_genome = c("hg38", "mm10")[1],
  quantiles_to_define_gol = c(0.05, 0.95)
)

Arguments

`scExp`	A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. See `calculate_logRatio_CNA`
`control_samples`	Sample IDs of the normal sample to take as reference.
`ref_genome`	Reference genome ('hg38' or 'mm10')
`quantiles_to_define_gol`	Quantiles of normal log2-ratio distribution below/above which cytoband is considered to be a loss/gain. (c(0.05,0.95)). See `calculate_gain_or_loss`

Value

The SCE with the fraction of reads, log2-ratio and gain or loss in each cytobands in each cells (of dimension cell x cytoband) in the reducedDim slots.

Examples


data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
SingleCellExperiment::reducedDim(scExp, "cytoBand")
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")

data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
SingleCellExperiment::reducedDim(scExp, "cytoBand")
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")

Calculate Fraction of reads in each cytobands

Description

Re-Count binned reads onto cytobands and calculate the fraction of reads in each of the cytoband in each cell. For each cell, the fraction of reads in any given cytoband is calculated. Cytobands are considered large enough in order that a variation at the cytoband level is not considered as an epigenetic event but as a genetic event, e.g. Copy Number Alterations.

Usage

calculate_cyto_mat(scExp, ref_genome = c("hg38", "mm10")[1])
calculate_cyto_mat(scExp, ref_genome = c("hg38", "mm10")[1])

Arguments

`scExp`	A SingleCellExperiment with genomic coordinate as features (peaks or bins)
`ref_genome`	Reference genome ('hg38' or 'mm10')

Value

The SCE with the fraction of reads in each cytobands in each cells (of dimension cell x cytoband ) in the reducedDim slot "cytoBand".

Examples


data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
SingleCellExperiment::reducedDim(scExp, "cytoBand") 

data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
SingleCellExperiment::reducedDim(scExp, "cytoBand")

Estimate the copy gains/loss of tumor vs normal based on log2-ratio of fraction of reads

Description

Given a SingleCellExperiment object with the slot "logRatio_cytoBand" containing the log2-ratio of the fraction of reads in each cytoband, estimate if the cytoband was lost or acquired a gain in a non-quantitative way. To do so, the quantiles distribution of the normal cells are calculated, and any cytoband below or above will be considered as a loss/gain. The False Discovery Rate is directly proportional to the quantiles.

Usage

calculate_gain_or_loss(scExp, controls, quantiles = c(0.05, 0.95))
calculate_gain_or_loss(scExp, controls, quantiles = c(0.05, 0.95))

Arguments

`scExp`	A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. See `calculate_logRatio_CNA`
`controls`	Sample IDs or Cell IDs of the normal sample to take as reference.
`quantiles`	Quantiles of normal log2-ratio distribution below/above which cytoband is considered to be a loss/gain. (c(0.05,0.95))

Value

The SCE with the gain or loss in each cytobands in each cells (of dimension cell x cytoband ) in the reducedDim slot "gainOrLoss_cytoBand".

Examples


data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
scExp = calculate_logRatio_CNA(scExp, controls=unique(scExp$sample_id)[1])
scExp = calculate_gain_or_loss(scExp, controls=unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")

data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
scExp = calculate_logRatio_CNA(scExp, controls=unique(scExp$sample_id)[1])
scExp = calculate_gain_or_loss(scExp, controls=unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "gainOrLoss_cytoBand")

Calculate the log2-ratio of tumor vs normal fraction of reads in cytobands

Description

Given a SingleCellExperiment object with the slot "cytoBand" containing the fraction of reads in each cytoband, calculates the log2-ratio of tumor vs normal fraction of reads in cytobands, cell by cell. If the average signal in normal sample in a cytoband is 0, set this value to 1 so that the ratio won't affect the fraction of read value.

Usage

calculate_logRatio_CNA(scExp, controls)
calculate_logRatio_CNA(scExp, controls)

Arguments

scExp

A SingleCellExperiment with "cytoBand" reducedDim slot filled.

see calculate_cyto_mat

controls

Sample IDs or Cell IDs of the normal sample to take as reference.

Value

The SCE with the log2-ratio of fraction of reads in each cytobands in each cells (of dimension cell x cytoband ) in the reducedDim slot "logRatio_cytoBand".

Examples


data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
scExp = calculate_logRatio_CNA(scExp, controls=unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")

data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
scExp = calculate_logRatio_CNA(scExp, controls=unique(scExp$sample_id)[1])
SingleCellExperiment::reducedDim(scExp, "logRatio_cytoBand")

Calling MACS2 peak caller and merging resulting peaks

Description

Calling MACS2 peak caller and merging resulting peaks

Usage

call_macs2_merge_peaks(
  affectation,
  odir,
  p.value,
  format = c("scBED", "BAM")[1],
  ref,
  peak_distance_to_merge
)
call_macs2_merge_peaks(
  affectation,
  odir,
  p.value,
  format = c("scBED", "BAM")[1],
  ref,
  peak_distance_to_merge
)

Arguments

`affectation`	Annotation data.frame with cell cluster and cell id information
`odir`	Output directory to write MACS2 output
`p.value`	P value to detect peaks, passed to MACS2
`format`	File format, either "BAM" or "scBED"
`ref`	Reference genome to get chromosome information from.
`peak_distance_to_merge`	Distance to merge peaks

Value

A list of merged GRanges peaks

changeRange

Description

changeRange

Usage

changeRange(v, newmin = 1, newmax = 10)
changeRange(v, newmin = 1, newmax = 10)

Arguments

`v`	A numeric vector
`newmin`	New min
`newmax`	New max

Value

A matrix with values scaled between newmin and newmax

A data.frame with the number of targets of each TF in ChEA3

Description

This data.frame was obtained by downloading datasets from ChEA3 database (https://maayanlab.cloud/chea3/) and merging targets for :

ARCHS4_Coexpression
ENCODE_ChIP-seq
Enrichr_Queries
GTEx_Coexpression
Literature_ChIP-seq
ReMap_ChIP-seq

Usage

data("CheA3_TF_nTargets")
data("CheA3_TF_nTargets")

Format

CheA3_TF_nTargets - a data.frame with 1632 rows (unique TFs) and 2 columns

References

Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz M, Utti V, Jagodnik K, Kropiwnicki E, Wang Z, Ma'ayan A (2019) ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Research. doi: 10.1093/nar/gkz446

The data.frame is composed of two columns:

TF column containing the TF gene names (human)
nTargets_TF containing the number of targets for this TF in the combined database.

Examples

data("CheA3_TF_nTargets")
head(CheA3_TF_nTargets)

data("CheA3_TF_nTargets")
head(CheA3_TF_nTargets)

Check if matrix rownames are well formated and correct if needed

Description

Throws warnings / error if matrix is in the wrong format

Usage

check_correct_datamatrix(datamatrix_single, sample_name = "")
check_correct_datamatrix(datamatrix_single, sample_name = "")

Arguments

`datamatrix_single`	A sparse matrix
`sample_name`	Matrix sample name for warnings

Value

A sparseMatrix in the right rownames format

Choose a number of clusters

Description

This functions takes as input a SingleCellExperiment object and a number of cluster to select. It outputs a SingleCellExperiment object with each cell assigned to a correlation cluster in colData. Also calculates a hierarchical clustering of the consensus associations calculated by ConsensusClusterPlus.

Usage

choose_cluster_scExp(
  scExp,
  nclust = 3,
  consensus = FALSE,
  hc_linkage = "ward.D"
)
choose_cluster_scExp(
  scExp,
  nclust = 3,
  consensus = FALSE,
  hc_linkage = "ward.D"
)

Arguments

`scExp`	A SingleCellExperiment object containing consclust in metadata.
`nclust`	Number of cluster to pick (3)
`consensus`	Use consensus clustering results instead of simple hierarchical clustering ? (FALSE)
`hc_linkage`	A linkage method for hierarchical clustering. See cor. ('ward.D')

Value

Returns a SingleCellExperiment object with each cell assigned to a correlation cluster in colData.

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=3,consensus=FALSE)
table(scExp_cf$cell_cluster)

scExp_cf = consensus_clustering_scExp(scExp)
scExp_cf_consensus = choose_cluster_scExp(scExp_cf,nclust=3,consensus=TRUE)
table(scExp_cf_consensus$cell_cluster)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=3,consensus=FALSE)
table(scExp_cf$cell_cluster)

scExp_cf = consensus_clustering_scExp(scExp)
scExp_cf_consensus = choose_cluster_scExp(scExp_cf,nclust=3,consensus=TRUE)
table(scExp_cf_consensus$cell_cluster)

Choose perplexity depending on number of cells for Tsne

Description

Choose perplexity depending on number of cells for Tsne

Usage

choose_perplexity(dataset)
choose_perplexity(dataset)

Arguments

dataset

A matrix of features x cells (rows x columns)

Value

A number between 5 and 30 to use in Rtsne function

Col2Hex

Description

Transform character color to hexadecimal color code.

Usage

col2hex(cname)
col2hex(cname)

Arguments

cname

Color name

Value

The HEX color code of a particular color

Adding colors to cells & features

Description

Adding colors to cells & features

Usage

colors_scExp(
  scExp,
  annotCol = "sample_id",
  color_by = "sample_id",
  color_df = NULL
)
colors_scExp(
  scExp,
  annotCol = "sample_id",
  color_by = "sample_id",
  color_df = NULL
)

Arguments

`scExp`	A SingleCellExperiment Object
`annotCol`	Column names to color
`color_by`	If specifying color_df, column names to color
`color_df`	Color data.frame to specify which color for which condition

Value

A SingleCellExperiment with additionnal "color" columns in colData

Examples

data("scExp")
scExp = colors_scExp(scExp,annotCol = c("sample_id",
"total_counts"),
 color_by =  c("sample_id","total_counts"))

#Specific colors using a manually created data.frame :
color_df = data.frame(sample_id=unique(scExp$sample_id),
 sample_id_color=c("red","blue","green","yellow"))
scExp = colors_scExp(scExp,annotCol="sample_id",
color_by="sample_id",color_df=color_df)

data("scExp")
scExp = colors_scExp(scExp,annotCol = c("sample_id",
"total_counts"),
 color_by =  c("sample_id","total_counts"))

#Specific colors using a manually created data.frame :
color_df = data.frame(sample_id=unique(scExp$sample_id),
 sample_id_color=c("red","blue","green","yellow"))
scExp = colors_scExp(scExp,annotCol="sample_id",
color_by="sample_id",color_df=color_df)

Combine two matrices and emit warning if no regions are in common

Description

Combine two matrices and emit warning if no regions are in common

Usage

combine_datamatrix(datamatrix, datamatrix_single, file_names, i)
combine_datamatrix(datamatrix, datamatrix_single, file_names, i)

Arguments

`datamatrix`	A sparse matrix or NULL if empty
`datamatrix_single`	Another sparse matrix
`file_names`	File name corresponding to the matrix for warnings
`i`	file number

Value

A combined sparse matrix

Run enrichment tests and combine into list

Description

Run enrichment tests and combine into list

Usage

combine_enrichmentTests(
  diff,
  enrichment_qval,
  qval.th,
  logFC.th,
  min.percent,
  annotFeat_long,
  peak_distance,
  refined_annotation,
  GeneSets,
  GeneSetsDf,
  GenePool,
  progress = NULL
)
combine_enrichmentTests(
  diff,
  enrichment_qval,
  qval.th,
  logFC.th,
  min.percent,
  annotFeat_long,
  peak_distance,
  refined_annotation,
  GeneSets,
  GeneSetsDf,
  GenePool,
  progress = NULL
)

Arguments

`diff`	Differential list
`enrichment_qval`	Adusted p-value threshold above which a pathway is considered significative list
`qval.th`	Differential analysis adjusted p.value threshold
`logFC.th`	Differential analysis log-fold change threshold
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)
`annotFeat_long`	Long annotation
`peak_distance`	Maximum gene to peak distance
`refined_annotation`	Refined annotation data.frame if peak calling is done
`GeneSets`	List of pathways
`GeneSetsDf`	Data.frame of pathways
`GenePool`	Pool of possible genes for testing
`progress`	A shiny Progress instance to display progress bar.

Value

A list of list of pathway enrichment data.frames for Both / Over / Under and for each cluster

Find comparable variable scExp

Description

Find comparable variable scExp

Usage

comparable_variables(scExp, allExp = TRUE)
comparable_variables(scExp, allExp = TRUE)

Arguments

`scExp`	A SingleCellExperiment
`allExp`	A logical indicating wether alternative experiments comparable variables should also be fetch.

Value

A character vector with the comparable variable names

Creates a summary table with the number of genes under- or overexpressed in each group and outputs several graphical representations

Description

Creates a summary table with the number of genes under- or overexpressed in each group and outputs several graphical representations

Usage

CompareedgeRGLM(
  dataMat = NULL,
  annot = NULL,
  ref_group = NULL,
  groups = NULL,
  featureTab = NULL,
  norm_method = "TMMwsp"
)
CompareedgeRGLM(
  dataMat = NULL,
  annot = NULL,
  ref_group = NULL,
  groups = NULL,
  featureTab = NULL,
  norm_method = "TMMwsp"
)

Arguments

`dataMat`	reads matrix
`annot`	selected annotation of interest
`ref_group`	List containing one or more vectors of reference samples. Name of the vectors will be used in the results table. The length of this list should be 1 or the same length as the groups list
`groups`	List containing the IDs of groups to be compared with the reference samples. Names of the vectors will be used in the results table
`featureTab`	Feature annotations to be added to the results table
`norm_method`	Which method to use for normalizing ('upperquantile')

Value

A dataframe containing the foldchange and p.value of each feature

Author(s)

Eric Letouze & Celine Vallot

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=2,consensus=FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C1")])
groups = list("C2"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C2")])
myres = CompareedgeRGLM(as.matrix(SingleCellExperiment::counts(scExp_cf)),
annot=as.data.frame(SingleCellExperiment::colData(scExp_cf)),
   ref_group=ref_group,groups=groups, featureTab=featureTab)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=2,consensus=FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C1")])
groups = list("C2"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C2")])
myres = CompareedgeRGLM(as.matrix(SingleCellExperiment::counts(scExp_cf)),
annot=as.data.frame(SingleCellExperiment::colData(scExp_cf)),
   ref_group=ref_group,groups=groups, featureTab=featureTab)

CompareWilcox

Description

CompareWilcox

Usage

CompareWilcox(
  dataMat = NULL,
  annot = NULL,
  ref_group = NULL,
  groups = NULL,
  featureTab = NULL,
  block = NULL,
  BPPARAM = BiocParallel::bpparam()
)
CompareWilcox(
  dataMat = NULL,
  annot = NULL,
  ref_group = NULL,
  groups = NULL,
  featureTab = NULL,
  block = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`dataMat`	A raw count matrix
`annot`	A cell annotation data.frame
`ref_group`	List with cells in reference group(s)
`groups`	List with cells in group(s) to test
`featureTab`	data.frame with feature annotation
`block`	Use a blocking factor to conteract batch effect ?
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A dataframe containing the foldchange and p.value of each feature

Author(s)

Eric Letouze & Celine Vallot & Pacome Prompsy

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=2,consensus=FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C1")])
groups = list("C2"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C2")])
myres = CompareWilcox(as.matrix(SingleCellExperiment::normcounts(scExp_cf)),
annot=as.data.frame(SingleCellExperiment::colData(scExp_cf)),
   ref_group=ref_group,groups=groups, featureTab=featureTab)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=2,consensus=FALSE)
featureTab = as.data.frame(SummarizedExperiment::rowRanges(scExp_cf))
rownames(featureTab) = featureTab$ID
ref_group = list("C1"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C1")])
groups = list("C2"=scExp_cf$cell_id[which(scExp_cf$cell_cluster=="C2")])
myres = CompareWilcox(as.matrix(SingleCellExperiment::normcounts(scExp_cf)),
annot=as.data.frame(SingleCellExperiment::colData(scExp_cf)),
   ref_group=ref_group,groups=groups, featureTab=featureTab)

Concatenate single-cell BED into clusters

Description

Concatenate single-cell BED into clusters

Usage

concatenate_scBed_into_clusters(affectation, files_list, odir)
concatenate_scBed_into_clusters(affectation, files_list, odir)

Arguments

`affectation`	Annotation data.frame containing cluster information
`files_list`	Named list of scBED file paths to concatenate. List Names must match affectation$sample_id and basenames must match affectation$barcode.
`odir`	Output directory to write concatenate pseudo-bulk BEDs.

Value

Merge single-cell BED files into cluster BED files. Ungzip file if BED is gzipped.

Wrapper to apply ConsensusClusterPlus to scExp object

Description

Runs consensus hierarchical clustering on PCA feature space of scExp object. Plot consensus scores for each number of clusters. See ConsensusClusterPlus - Wilkerson, M.D., Hayes, D.N. (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics, 2010 Jun 15;26(12):1572-3.

Usage

consensus_clustering_scExp(
  scExp,
  prefix = NULL,
  maxK = 10,
  reps = 100,
  pItem = 0.8,
  pFeature = 1,
  distance = "pearson",
  clusterAlg = "hc",
  innerLinkage = "ward.D",
  finalLinkage = "ward.D",
  plot_consclust = "pdf",
  plot_icl = "png"
)
consensus_clustering_scExp(
  scExp,
  prefix = NULL,
  maxK = 10,
  reps = 100,
  pItem = 0.8,
  pFeature = 1,
  distance = "pearson",
  clusterAlg = "hc",
  innerLinkage = "ward.D",
  finalLinkage = "ward.D",
  plot_consclust = "pdf",
  plot_icl = "png"
)

Arguments

`scExp`	A SingleCellExperiment object containing 'PCA' in reducedDims.
`prefix`	character value for output directory. Directory is created only if plot_consclust is not NULL. This title can be an abosulte or relative path.
`maxK`	integer value. maximum cluster number to evaluate. (10)
`reps`	integer value. number of subsamples. (100)
`pItem`	numerical value. proportion of items to sample. (0.8)
`pFeature`	numerical value. proportion of features to sample. (1)
`distance`	character value. 'pearson': (1 - Pearson correlation), 'spearman' (1 - Spearman correlation), 'euclidean', 'binary', 'maximum', 'canberra', 'minkowski' or custom distance function. ('pearson')
`clusterAlg`	character value. cluster algorithm. 'hc' heirarchical (hclust), 'pam' for paritioning around medoids, 'km' for k-means upon data matrix, 'kmdist' ('hc') for k-means upon distance matrices (former km option), or a function that returns a clustering. ('hc')
`innerLinkage`	hierarchical linkage method for subsampling. ('ward.D')
`finalLinkage`	hierarchical linkage method for consensus matrix. ('ward.D')
`plot_consclust`	character value. NULL - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. ('pdf')
`plot_icl`	same as above for item consensus plot. ('png')

Details

This functions takes as input a SingleCellExperiment object that must have 'PCA' in reducedDims and outputs a SingleCellExperiment object containing consclust list calculated cluster consensus and item consensus scores in metadata.

Value

Returns a SingleCellExperiment object containing consclust list, calculated cluster consensus and item consensus scores in metadata.

References

ConsensusClusterPlus package by Wilkerson, M.D., Hayes, D.N. (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics, 2010 Jun 15;26(12):1572-3.

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = consensus_clustering_scExp(scExp)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = consensus_clustering_scExp(scExp)

Correlation and hierarchical clustering

Description

Calculates cell to cell correlation matrix based on the PCA feature space and runs hierarchical clustering taking 1 - correlation scores as distance.

Usage

correlation_and_hierarchical_clust_scExp(scExp, hc_linkage = "ward.D")
correlation_and_hierarchical_clust_scExp(scExp, hc_linkage = "ward.D")

Arguments

`scExp`	A SingleCellExperiment object, containing 'PCA' in reducedDims.
`hc_linkage`	A linkage method for hierarchical clustering. See cor. ('ward.D')

Details

This functions takes as input a SingleCellExperiment object that must have PCA calculated and outputs a SingleCellExperiment object with correlation matrix and hierarchical clustering.

Value

Return a SingleCellExperiment object with correlation matrix & hiearchical clustering.

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)

Create a smoothed and normalized coverage track from a BAM file and given a bin GenomicRanges object (same as deepTools bamCoverage)

Description

Normalization is CPM, smoothing is done by averaging on n_smoothBin regions left and right of any given region.

Usage

count_coverage(
  input,
  format = "BAM",
  bins,
  canonical_chr,
  norm_factor,
  n_smoothBin = 5,
  ref = "hg38",
  read_size = 101,
  original_bins = NULL
)
count_coverage(
  input,
  format = "BAM",
  bins,
  canonical_chr,
  norm_factor,
  n_smoothBin = 5,
  ref = "hg38",
  read_size = 101,
  original_bins = NULL
)

Arguments

`input`	Either a named list of character vector of path towards single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBEDn the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED files names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not.
`format`	File format, either "BAM" or "BED"
`bins`	A GenomicRanges object of binned genome
`canonical_chr`	GenomicRanges of the chromosomes to read the BAM file.
`norm_factor`	Then number of cells or total number of reads in the given sample, for normalization.
`n_smoothBin`	Number of bins left and right to smooth the signal.
`ref`	Genomic reference
`read_size`	Length of the reads
`original_bins`	Original bins GenomicRanges in case the format is raw matrix.

Value

A binned GenomicRanges that can be readily exported into bigwig file.

Create ChromSCape project folder

Description

Creates a project folder that will be recognizable by ChromSCape Shiny application.

Usage

create_project_folder(
  output_directory,
  analysis_name = "Analysis_1",
  ref_genome = c("hg38", "mm10")[1]
)
create_project_folder(
  output_directory,
  analysis_name = "Analysis_1",
  ref_genome = c("hg38", "mm10")[1]
)

Arguments

`output_directory`	Path towards the directory to create the 'ChromSCape_Analyses' folder and the analysis subfolder. If this path already contains the 'ChromSCape_Analyses' folder, will only create the analysis subfolder.
`analysis_name`	Name of the analysis. Must only contain alphanumerical characters or '_'.
`ref_genome`	Reference genome, either 'hg38' or 'mm10'.

Value

Creates the project folder and returns the root of the project.

Examples

dir = tempdir()  
create_project_folder(output_directory = dir,
 analysis_name = "Analysis_1")
list.dirs(file.path(dir))
dir = tempdir()  
create_project_folder(output_directory = dir,
 analysis_name = "Analysis_1")
list.dirs(file.path(dir))

Create a sample name matrix

Description

Create a sample name matrix

Usage

create_sample_name_mat(nb_samples, samples_names)
create_sample_name_mat(nb_samples, samples_names)

Arguments

`nb_samples`	Number of samples
`samples_names`	Character vector of sample names

Value

A matrix

Create a simulated single cell datamatrix & cell annotation

Description

Create a simulated single cell datamatrix & cell annotation

Usage

create_scDataset_raw(
  cells = 300,
  features = 600,
  featureType = c("window", "peak", "gene"),
  sparse = TRUE,
  nsamp = 4,
  ref = "hg38",
  batch_id = factor(rep(1, nsamp))
)
create_scDataset_raw(
  cells = 300,
  features = 600,
  featureType = c("window", "peak", "gene"),
  sparse = TRUE,
  nsamp = 4,
  ref = "hg38",
  batch_id = factor(rep(1, nsamp))
)

Arguments

`cells`	Number of cells (300)
`features`	Number of features (600)
`featureType`	Type of feature (window)
`sparse`	Is matrix sparse ? (TRUE)
`nsamp`	Number of samples (4)
`ref`	Reference genome ('hg38')
`batch_id`	Batch origin (factor((1,1,1,1))

Value

A list composed of * mat : a sparse matrix following an approximation of the negative binomial law (adapted to scChIPseq) * annot : a data.frame of cell annotation * batches : an integer vector with the batch number for each cell

Examples

# Creating a basic sparse 600 genomic bins x 300 cells matrix and annotation
l = create_scDataset_raw()
head(l$mat)
head(l$annot)
head(l$batches)

# Specifying number of cells, features and samples
l2 = create_scDataset_raw(cells = 500, features = 500, nsamp=2)

# Specifying species
mouse_l = create_scDataset_raw(ref="mm10")

# Specifying batches
batch_l = create_scDataset_raw(nsamp=4, batch_id = factor(c(1,1,2,2)))

# Peaks of different size as features
peak_l = create_scDataset_raw(featureType="peak")
head(peak_l$mat)

# Genes as features
gene_l = create_scDataset_raw(featureType="gene")
head(gene_l$mat)
# Creating a basic sparse 600 genomic bins x 300 cells matrix and annotation
l = create_scDataset_raw()
head(l$mat)
head(l$annot)
head(l$batches)

# Specifying number of cells, features and samples
l2 = create_scDataset_raw(cells = 500, features = 500, nsamp=2)

# Specifying species
mouse_l = create_scDataset_raw(ref="mm10")

# Specifying batches
batch_l = create_scDataset_raw(nsamp=4, batch_id = factor(c(1,1,2,2)))

# Peaks of different size as features
peak_l = create_scDataset_raw(featureType="peak")
head(peak_l$mat)

# Genes as features
gene_l = create_scDataset_raw(featureType="gene")
head(gene_l$mat)

Wrapper to create the single cell experiment from count matrix and feature dataframe

Description

Create the single cell experiment from (sparse) datamatrix and feature dataframe containing feature names and location. Also optionally removes zero count Features, zero count Cells, non canconical chromosomes, and chromosome M. Calculates QC Metrics (scran).

Usage

create_scExp(
  datamatrix,
  annot,
  remove_zero_cells = TRUE,
  remove_zero_features = TRUE,
  remove_non_canonical = TRUE,
  remove_chr_M = TRUE,
  mainExpName = "main",
  verbose = TRUE
)
create_scExp(
  datamatrix,
  annot,
  remove_zero_cells = TRUE,
  remove_zero_features = TRUE,
  remove_non_canonical = TRUE,
  remove_chr_M = TRUE,
  mainExpName = "main",
  verbose = TRUE
)

Arguments

`datamatrix`	A matrix or sparseMatrix of raw counts. Features x Cells (rows x columns).
`annot`	A data.frame containing informations on cells. Should have the same number of rows as the number of columns in datamatrix.
`remove_zero_cells`	remove cells with zero counts ? (TRUE)
`remove_zero_features`	remove cells with zero counts ? (TRUE)
`remove_non_canonical`	remove non canonical chromosomes ?(TRUE)
`remove_chr_M`	remove chromosomes M ? (TRUE)
`mainExpName`	Name of the mainExpName e.g. 'bins', 'peaks'... ("default")
`verbose`	(TRUE)

Value

Returns a SingleCellExperiment object.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp

Differential Analysis Custom in 'One vs One' mode

Description

Differential Analysis Custom in 'One vs One' mode

Usage

DA_custom(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  ref,
  group,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
DA_custom(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  ref,
  group,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`affectation`	An annotation data.frame with cell_id and
`by`	= A character specifying the column of the object containing the groups of cells to compare.
`counts`	Count matrix
`method`	DA method : Wilcoxon or EdgeR
`feature`	Feature tables
`block`	Blocking feature
`ref`	If de_type is custom, the reference to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows
`group`	If de_type is custom, the group to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows
`progress`	A shiny Progress instance to display progress bar.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list of results, groups compared and references

Differential Analysis in 'One vs Rest' mode

Description

Differential Analysis in 'One vs Rest' mode

Usage

DA_one_vs_rest(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
DA_one_vs_rest(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`affectation`	An annotation data.frame with cell_id and cell_cluster columns
`by`	= A character specifying the column of the object containing the groups of cells to compare.
`counts`	Count matrix
`method`	DA method : Wilcoxon or EdgeR
`feature`	Feature tables
`block`	Blocking feature
`progress`	A shiny Progress instance to display progress bar.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list of results, groups compared and references

Run differential analysis in Pairwise mode

Description

Run differential analysis in Pairwise mode

Usage

DA_pairwise(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
DA_pairwise(
  affectation,
  by,
  counts,
  method,
  feature,
  block,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`affectation`	An annotation data.frame with cell_cluster and cell_id columns
`by`	= A character specifying the column of the object containing the groups of cells to compare.
`counts`	Count matrix
`method`	DA method, Wilcoxon or edgeR
`feature`	Feature data.frame
`block`	Blocking feature
`progress`	A shiny Progress instance to display progress bar.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list of results, groups compared and references

Define the features on which reads will be counted

Description

Define the features on which reads will be counted

Usage

define_feature(ref = c("hg38","mm10")[1],
 peak_file = NULL,
 bin_width  = NULL,
 genebody = FALSE,
 extendPromoter = 2500)
define_feature(ref = c("hg38","mm10")[1],
 peak_file = NULL,
 bin_width  = NULL,
 genebody = FALSE,
 extendPromoter = 2500)

Arguments

`ref`	Reference genome
`peak_file`	A bed file if counting on peaks
`bin_width`	A number of bins if divinding genome into fixed width bins
`genebody`	A logical indicating if feature should be counted in genebodies and promoter.
`extendPromoter`	Extension length before TSS (2500).

Value

A GRanges object

Examples

gr_bins = define_feature("hg38", bin_width = 50000)
gr_genes = define_feature("hg38", genebody = TRUE, extendPromoter = 5000)

gr_bins = define_feature("hg38", bin_width = 50000)
gr_genes = define_feature("hg38", genebody = TRUE, extendPromoter = 5000)

Heuristic discovery of samples based on cell labels

Description

Identify a fixed number of common string (samples) in a set of varying strings (cells). E.g. in the set "Sample1_cell1","Sample1_cell2","Sample2_cell1","Sample2_cell2" and with nb_samples=2, the function returns "Sample1","Sample1","Sample2","Sample2".

Usage

detect_samples(barcodes, nb_samples = 1)
detect_samples(barcodes, nb_samples = 1)

Arguments

`barcodes`	Vector of cell barcode names (e.g. Sample1_cell1, Sample1_cell2...)
`nb_samples`	Number of samples to find

Value

character vector of sample names the same length as cell labels

Examples


barcodes = c(paste0("HBCx22_BC_",seq_len(100)),
paste0("mouse_sample_XX",208:397))
samples = detect_samples(barcodes, nb_samples=2)

barcodes = c(paste0("HBCx22_BC_",seq_len(100)),
paste0("mouse_sample_XX",208:397))
samples = detect_samples(barcodes, nb_samples=2)

Find Differentialy Activated Features (One vs All)

Description

Based on the statement that single-cell epigenomic dataset are very sparse, specifically when analysis small bins or peaks, we can define each feature as being 'active' or not simply by the presence or the absence of reads in this feature. This is the equivalent of binarize the data. When trying to find differences in signal for a feature between multiple cell groups, this function simply compare the percentage of cells 'activating' the feature in each of the group. The p.values are then calculated using a Pearson's Chi-squared Test for Count Data (comparing the number of active cells in one group vs the other) and corrected using Benjamini-Hochberg correction for multiple testing.

Usage

differential_activation(
  scExp,
  by = c("cell_cluster", "sample_id")[1],
  verbose = TRUE,
  progress = NULL
)
differential_activation(
  scExp,
  by = c("cell_cluster", "sample_id")[1],
  verbose = TRUE,
  progress = NULL
)

Arguments

`scExp`	A SingleCellExperiment object containing consclust with selected number of cluster.
`by`	Which grouping to run the marker enrichment ?
`verbose`	Print ?
`progress`	A shiny Progress instance to display progress bar.

Details

To calculate the logFC, the percentage of activation of the features are corrected for total number of reads to correct for library size bias. For each cluster ('group') the function consider the rest of the cells as the reference.

Value

Returns a dataframe of differential activation results that contains the rowData of the SingleCellExperiment with additional logFC, q.value, group activation (fraction of cells active for each feature in the group cells), reference activation (fraction of cells active for each feature in the reference cells).

Examples

data("scExp")
res = differential_activation(scExp, by = "cell_cluster")
res = differential_activation(scExp, by = "sample_id")
data("scExp")
res = differential_activation(scExp, by = "cell_cluster")
res = differential_activation(scExp, by = "sample_id")

Runs differential analysis between cell clusters

Description

Based on clusters of cell defined previously, runs non-parametric Wilcoxon Rank Sum test to find significantly depleted or enriched features, in 'one_vs_rest' mode or 'pairwise' mode. In pairwise mode, each cluster is compared to all other cluster individually, and then pairwise comparisons between clusters are combined to find overall differential features using combineMarkers function from scran.

Usage

differential_analysis_scExp(
  scExp,
  de_type = c("one_vs_rest_fast", "one_vs_rest", "pairwise", "custom")[1],
  by = "cell_cluster",
  method = "wilcox",
  block = NULL,
  group = NULL,
  ref = NULL,
  prioritize_genes = nrow(scExp) > 20000,
  max_distanceToTSS = 1000,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
differential_analysis_scExp(
  scExp,
  de_type = c("one_vs_rest_fast", "one_vs_rest", "pairwise", "custom")[1],
  by = "cell_cluster",
  method = "wilcox",
  block = NULL,
  group = NULL,
  ref = NULL,
  prioritize_genes = nrow(scExp) > 20000,
  max_distanceToTSS = 1000,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`scExp`	A SingleCellExperiment object containing consclust with selected number of cluster.
`de_type`	Type of comparisons. Either 'one_vs_rest', to compare each cluster against all others, or 'pairwise' to make 1 to 1 comparisons. ('one_vs_rest')
`by`	= A character specifying the column of the object containing the groups of cells to compare. Exclusive with de_type == custom
`method`	Differential testing method, either 'wilcox' for Wilcoxon non- parametric testing or 'neg.binomial' for edgerGLM based testing. ('wilcox')
`block`	Use batches as blocking factors ? If TRUE, block will be taken as the column "batch_id" from the SCE. Cells will be compared only within samples belonging to the same batch.
`group`	If de_type = "custom", the sample / cluster of interest as a one- column data.frame. The name of the column is the group name and the values are character either cluster ("C1", "C2", ...) or sample_id.
`ref`	If de_type = "custom", the sample / cluster of reference as a one- column data.frame. The name of the column is the group name and the values are character either cluster ("C1", "C2", ...) or sample_id.
`prioritize_genes`	First filter by loci being close to genes ? E.g. for differential analysis, it is more relevant to keep features close to genes
`max_distanceToTSS`	If prioritize_genes is TRUE, the maximum distance to consider a feature close to a gene.
`progress`	A shiny Progress instance to display progress bar.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Details

This functions takes as input a SingleCellExperiment object with consclust, the type of comparison, either 'one_vs_rest' or 'pairwise', the adjusted p-value threshold (qval.th) and the fold-change threshold (logFC.th). It outputs a SingleCellExperiment object containing a differential list.

Value

Returns a SingleCellExperiment object containing a differential list.

Examples

data("scExp")
scExp_cf = differential_analysis_scExp(scExp)

data("scExp")
scExp_cf = differential_analysis_scExp(scExp)

distPearson

Description

distPearson

Usage

distPearson(m)
distPearson(m)

Arguments

m

A matrix

Value

A dist object

Find the TF that are enriched in the differential genes using ChEA3 API

Description

Find the TF that are enriched in the differential genes using ChEA3 API

Usage

enrich_TF_ChEA3_genes(genes)
enrich_TF_ChEA3_genes(genes)

Arguments

genes

A character vector with the name of genes to enrich for TF.

Value

Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).

References

Examples

data(scExp)
enrich_TF_ChEA3_genes(head(unlist(strsplit(SummarizedExperiment::rowData(scExp)$Gene, split = ",", fixed = TRUE)), 15))

data(scExp)
enrich_TF_ChEA3_genes(head(unlist(strsplit(SummarizedExperiment::rowData(scExp)$Gene, split = ",", fixed = TRUE)), 15))

Find the TF that are enriched in the differential genes using ChEA3 database

Description

Find the TF that are enriched in the differential genes using ChEA3 database

Usage

enrich_TF_ChEA3_scExp(
  scExp,
  ref = "hg38",
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01,
  peak_distance = 1000,
  use_peaks = FALSE,
  progress = NULL,
  verbose = TRUE
)
enrich_TF_ChEA3_scExp(
  scExp,
  ref = "hg38",
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01,
  peak_distance = 1000,
  use_peaks = FALSE,
  progress = NULL,
  verbose = TRUE
)

Arguments

`scExp`	A SingleCellExperiment object containing list of differential features.
`ref`	A reference annotation, either 'hg38' or 'mm10'. ('hg38')
`qval.th`	Adjusted p-value threshold to define differential features. (0.01)
`logFC.th`	Fold change threshold to define differential features. (1)
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)
`peak_distance`	Maximum distanceToTSS of feature to gene TSS to consider associated, in bp. (1000)
`use_peaks`	Use peak calling method (must be calculated beforehand). (FALSE)
`progress`	A shiny Progress instance to display progress bar.
`verbose`	A logical to print message or not. (TRUE)

Value

Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).

Examples

data("scExp")

scExp = enrich_TF_ChEA3_scExp(
 scExp,
 ref = "hg38",
 qval.th = 0.01,
 logFC.th = 1,
 min.percent = 0.01)


data("scExp")

scExp = enrich_TF_ChEA3_scExp(
 scExp,
 ref = "hg38",
 qval.th = 0.01,
 logFC.th = 1,
 min.percent = 0.01)

enrichmentTest

Description

enrichmentTest

Usage

enrichmentTest(gene.sets, mylist, possibleIds, sep = ";", silent = FALSE)
enrichmentTest(gene.sets, mylist, possibleIds, sep = ";", silent = FALSE)

Arguments

`gene.sets`	A list of reference gene sets
`mylist`	A list of genes to test
`possibleIds`	All existing genes
`sep`	Separator used to collapse genes
`silent`	Silent mode ?

Value

A dataframe with the gene sets and their enrichment p.value

Remove specific features (CNA, repeats)

Description

Remove specific features (CNA, repeats)

Usage

exclude_features_scExp(
  scExp,
  features_to_exclude,
  by = "region",
  verbose = TRUE
)
exclude_features_scExp(
  scExp,
  features_to_exclude,
  by = "region",
  verbose = TRUE
)

Arguments

`scExp`	A SingleCellExperiment object.
`features_to_exclude`	A GenomicRanges object or data.frame containing genomic regions or features to exclude or path towards a BED file containing the features to exclude.
`by`	Type of features. Either 'region' or 'feature_name'. If 'region', will look for genomic coordinates in columns 1-3 (chr,start,stop). If 'feature_name', will look for a genes in first column. ('region')
`verbose`	(TRUE)

Value

A SingleCellExperiment object without features to exclude.

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
features_to_exclude = data.frame(chr=c("chr4","chr7","chr17"),
start=c(50000,8000000,2000000),
end=c(100000,16000000,2500000))
features_to_exclude = as(features_to_exclude,"GRanges")
scExp = exclude_features_scExp(scExp,features_to_exclude)
scExp

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
features_to_exclude = data.frame(chr=c("chr4","chr7","chr17"),
start=c(50000,8000000,2000000),
end=c(100000,16000000,2500000))
features_to_exclude = as(features_to_exclude,"GRanges")
scExp = exclude_features_scExp(scExp,features_to_exclude)
scExp

Add gene annotations to features

Description

Add gene annotations to features

Usage

feature_annotation_scExp(scExp, ref = "hg38", reference_annotation = NULL)
feature_annotation_scExp(scExp, ref = "hg38", reference_annotation = NULL)

Arguments

`scExp`	A SingleCellExperiment object.
`ref`	Reference genome. Either 'hg38' or 'mm10'. ('hg38')
`reference_annotation`	A data.frame containing gene (or else) annotation with genomic coordinates.

Value

A SingleCellExperiment object with annotated rowData.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp)
head(SummarizedExperiment::rowRanges(scExp))

# Mouse
raw = create_scDataset_raw(ref = "mm10")
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp,ref="mm10")
head(SummarizedExperiment::rowRanges(scExp))
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp)
head(SummarizedExperiment::rowRanges(scExp))

# Mouse
raw = create_scDataset_raw(ref = "mm10")
scExp = create_scExp(raw$mat, raw$annot)
scExp = feature_annotation_scExp(scExp,ref="mm10")
head(SummarizedExperiment::rowRanges(scExp))

Filter lowly correlated cells

Description

Remove cells that have a correlation score lower than what would be expected by chance with other cells.

Usage

filter_correlated_cell_scExp(scExp, random_iter = 5,
corr_threshold = 99, percent_correlation = 1,
downsample = 2500, verbose = TRUE, n_process = 250,
BPPARAM = BiocParallel::bpparam())
filter_correlated_cell_scExp(scExp, random_iter = 5,
corr_threshold = 99, percent_correlation = 1,
downsample = 2500, verbose = TRUE, n_process = 250,
BPPARAM = BiocParallel::bpparam())

Arguments

`scExp`	A SingleCellExperiment object containing 'Cor', a correlation matrix, in reducedDims.
`random_iter`	Number of random matrices to create to calculate random correlation scores. (50)
`corr_threshold`	Quantile of random correlation score above which a cell is considered to be 'correlated' with another cell. (99)
`percent_correlation`	Percentage of the cells that any cell must be 'correlated' to in order to not be filtered. (1)
`downsample`	Number of cells to calculate correlation filtering threshold ? (2500)
`verbose`	Print messages ? (TRUE)
`n_process`	Number of cell to proceed at a time. Increase this number to increase speed at memory cost
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Details

This functions takes as input a SingleCellExperiment object that must have correlation matrix calculated and outputs a SingleCellExperiment object without lowly correlated cells. TSNE is recalculated.

Value

Returns a SingleCellExperiment object without lowly correlated cells. The calculated correlation score limit threshold is saved in metadata.

Examples

data("scExp")
dim(scExp)
scExp_cf = filter_correlated_cell_scExp(scExp,
corr_threshold = 99, percent_correlation = 1)
dim(scExp_cf)
data("scExp")
dim(scExp)
scExp_cf = filter_correlated_cell_scExp(scExp,
corr_threshold = 99, percent_correlation = 1)
dim(scExp_cf)

Filter genes based on peak calling refined annotation

Description

Filter genes based on peak calling refined annotation

Usage

filter_genes_with_refined_peak_annotation(
  refined_annotation,
  peak_distance,
  signific,
  over,
  under
)
filter_genes_with_refined_peak_annotation(
  refined_annotation,
  peak_distance,
  signific,
  over,
  under
)

Arguments

`refined_annotation`	A data.frame containing each gene distance to real peak
`peak_distance`	Minimum distance to an existing peak to accept a given gene
`signific`	Indexes of all significantly differential genes
`over`	Indexes of all significantly overexpressed genes
`under`	Indexes of all significantly underexpressed genes

Value

List of significantly differential, overexpressed and underexpressed genes close enough to existing peaks

Filter cells and features

Description

Function to filter out cells & features from SingleCellExperiment based on total count per cell, number of cells 'ON' in features and top covered cells that might be doublets.

Usage

filter_scExp(
  scExp,
  min_cov_cell = 1600,
  quant_removal = 95,
  min_count_per_feature = 10,
  verbose = TRUE
)
filter_scExp(
  scExp,
  min_cov_cell = 1600,
  quant_removal = 95,
  min_count_per_feature = 10,
  verbose = TRUE
)

Arguments

`scExp`	A SingleCellExperiment object.
`min_cov_cell`	Minimum counts for each cell. (1600)
`quant_removal`	Centile of cell counts above which cells are removed. (95)
`min_count_per_feature`	Minimum number of reads per feature (10).
`verbose`	(TRUE)

Value

Returns a filtered SingleCellExperiment object.

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp. = filter_scExp(scExp)

# No feature filtering (all features are valuable)
scExp. = filter_scExp(scExp,min_count_per_feature=30)

# No cell filtering (all features are valuable)
scExp. = filter_scExp(scExp,min_cov_cell=0,quant_removal=100)

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp. = filter_scExp(scExp)

# No feature filtering (all features are valuable)
scExp. = filter_scExp(scExp,min_count_per_feature=30)

# No cell filtering (all features are valuable)
scExp. = filter_scExp(scExp,min_cov_cell=0,quant_removal=100)

Build SNN graph and find cluster using Louvain Algorithm

Description

Build SNN graph and find cluster using Louvain Algorithm

Usage

find_clusters_louvain_scExp(
  scExp,
  k = 10,
  resolution = 1,
  use.dimred = "PCA",
  type = c("rank", "number", "jaccard")[3],
  BPPARAM = BiocParallel::bpparam()
)
find_clusters_louvain_scExp(
  scExp,
  k = 10,
  resolution = 1,
  use.dimred = "PCA",
  type = c("rank", "number", "jaccard")[3],
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`scExp`	A SingleCellExperiment with PCA calculated
`k`	An integer scalar specifying the number of nearest neighbors to consider during graph construction.
`resolution`	A numeric specifying the resolution of clustering to pass to igraph::cluster_louvain function.
`use.dimred`	A string specifying the dimensionality reduction to use.
`type`	A string specifying the type of weighting scheme to use for shared neighbors.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A SingleCellExperiment containing the vector of clusters (named C1, C2 ....)

Examples

data('scExp')

scExp = find_clusters_louvain_scExp(scExp, k = 10)
data('scExp')

scExp = find_clusters_louvain_scExp(scExp, k = 10)

Find most covered features

Description

Find the top most covered features that will be used for dimensionality reduction. Optionally remove non-top features.

Usage

find_top_features(
  scExp,
  n = 20000,
  keep_others = FALSE,
  prioritize_genes = FALSE,
  max_distanceToTSS = 10000,
  verbose = TRUE
)
find_top_features(
  scExp,
  n = 20000,
  keep_others = FALSE,
  prioritize_genes = FALSE,
  max_distanceToTSS = 10000,
  verbose = TRUE
)

Arguments

`scExp`	A SingleCellExperiment.
`n`	Either an integer indicating the number of top covered regions to find or a character vector of the top percentile of features to keep (e.g. 'q20' to keep top 20% features).
`keep_others`	Logical indicating if non-top regions are to be removed from the SCE or not (FALSE).
`prioritize_genes`	First filter by loci being close to genes ? E.g. for differential analysis, it is more relevant to keep features close to genes
`max_distanceToTSS`	If prioritize_genes is TRUE, the maximum distance to consider a feature close to a gene.
`verbose`	Print ?

Value

A SCE with top features

Examples

data(scExp)
scExp_top = find_top_features(scExp, n = 4000, keep_others = FALSE)

data(scExp)
scExp_top = find_top_features(scExp, n = 4000, keep_others = FALSE)

Runs Gene Set Enrichment Analysis on genes associated with differential features

Description

This function takes previously calculated differential features and runs hypergeometric test to look for enriched gene sets in the genes associated with differential features, for each cell cluster. This functions takes as input a SingleCellExperiment object with consclust, the type of comparison, either 'one_vs_rest' or 'pairwise', the adjusted p-value threshold (qval.th) and the fold-change threshold (logFC.th). It outputs a SingleCellExperiment object containing a differential list.

Usage

gene_set_enrichment_analysis_scExp(
  scExp,
  enrichment_qval = 0.1,
  ref = "hg38",
  GeneSets = NULL,
  GeneSetsDf = NULL,
  GenePool = NULL,
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01,
  peak_distance = 1000,
  use_peaks = FALSE,
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"),
  progress = NULL
)
gene_set_enrichment_analysis_scExp(
  scExp,
  enrichment_qval = 0.1,
  ref = "hg38",
  GeneSets = NULL,
  GeneSetsDf = NULL,
  GenePool = NULL,
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01,
  peak_distance = 1000,
  use_peaks = FALSE,
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"),
  progress = NULL
)

Arguments

`scExp`	A SingleCellExperiment object containing list of differential features.
`enrichment_qval`	Adjusted p-value threshold for gene set enrichment. (0.1)
`ref`	A reference annotation, either 'hg38' or 'mm10'. ('hg38')
`GeneSets`	A named list of gene sets. If NULL will automatically load MSigDB list of gene sets for specified reference genome. (NULL)
`GeneSetsDf`	A dataframe containing gene sets & class of gene sets. If NULL will automatically load MSigDB dataframe of gene sets for specified reference genome. (NULL)
`GenePool`	The pool of genes to run enrichment in. If NULL will automatically load Gencode list of genes fro specified reference genome. (NULL)
`qval.th`	Adjusted p-value threshold to define differential features. (0.01)
`logFC.th`	Fold change threshold to define differential features. (1)
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)
`peak_distance`	Maximum distanceToTSS of feature to gene TSS to consider associated, in bp. (1000)
`use_peaks`	Use peak calling method (must be calculated beforehand). (FALSE)
`GeneSetClasses`	Which classes of MSIGdb to look for.
`progress`	A shiny Progress instance to display progress bar.

Value

Returns a SingleCellExperiment object containing list of enriched Gene Sets for each cluster, either in depleted features, enriched features or simply differential features (both).

Examples

data("scExp")

#Usually recommanding qval.th = 0.01 & logFC.th = 1 or 2
## Not run: scExp_cf = gene_set_enrichment_analysis_scExp(scExp,
 qval.th = 0.4, logFC.th = 0.3)
## End(Not run)

data("scExp")

#Usually recommanding qval.th = 0.01 & logFC.th = 1 or 2
## Not run: scExp_cf = gene_set_enrichment_analysis_scExp(scExp,
 qval.th = 0.4, logFC.th = 0.3)
## End(Not run)

Generate a complete ChromSCape analysis

Description

Generate a complete ChromSCape analysis

Usage

generate_analysis(input_data_folder,
analysis_name = "Analysis_1",
output_directory = "./",
input_data_type = c("scBED", "DenseMatrix", "SparseMatrix", "scBAM")[1],
rebin_sparse_matrix = FALSE,
feature_count_on = c("bins","genebody","peaks")[1],
feature_count_parameter = 50000,
ref_genome = c("hg38","mm10")[1],
run = c("filter", "CNA","cluster", "consensus","peak_call", "coverage", 
       "DA", "GSA", "report")[c(1,3,6,7,8,9)],
min_reads_per_cell = 1000,
max_quantile_read_per_cell = 99,
n_top_features = 40000,
norm_type = "CPM",
subsample_n = NULL,
exclude_regions = NULL,
n_clust = NULL,
corr_threshold = 99,
percent_correlation = 1,
maxK = 10,
qval.th = 0.1,
logFC.th = 1,
enrichment_qval = 0.1,
doBatchCorr  = FALSE,
batch_sels  = NULL,
control_samples_CNA = NULL,
genes_to_plot = c("Krt8","Krt5","Tgfb1", "Foxq1", "Cdkn2b",
                 "Cdkn2a", "chr7:15000000-20000000")
)
generate_analysis(input_data_folder,
analysis_name = "Analysis_1",
output_directory = "./",
input_data_type = c("scBED", "DenseMatrix", "SparseMatrix", "scBAM")[1],
rebin_sparse_matrix = FALSE,
feature_count_on = c("bins","genebody","peaks")[1],
feature_count_parameter = 50000,
ref_genome = c("hg38","mm10")[1],
run = c("filter", "CNA","cluster", "consensus","peak_call", "coverage", 
       "DA", "GSA", "report")[c(1,3,6,7,8,9)],
min_reads_per_cell = 1000,
max_quantile_read_per_cell = 99,
n_top_features = 40000,
norm_type = "CPM",
subsample_n = NULL,
exclude_regions = NULL,
n_clust = NULL,
corr_threshold = 99,
percent_correlation = 1,
maxK = 10,
qval.th = 0.1,
logFC.th = 1,
enrichment_qval = 0.1,
doBatchCorr  = FALSE,
batch_sels  = NULL,
control_samples_CNA = NULL,
genes_to_plot = c("Krt8","Krt5","Tgfb1", "Foxq1", "Cdkn2b",
                 "Cdkn2a", "chr7:15000000-20000000")
)

Arguments

`input_data_folder`	Directory containing the input data.
`analysis_name`	Name given to the analysis.
`output_directory`	Directory where to create the analysis and the HTML report.
`input_data_type`	The type of input data.
`feature_count_on`	For raw data type, on which features to count the cells.
`feature_count_parameter`	Additional parameter corresponding to the 'feature_count_on' parameter. E.g. for 'bins' must be a numeric, e.g. 50000, for 'peaks' must be a character containing path towards a BED peak file.
`rebin_sparse_matrix`	A boolean specifying if the SparseMatrix should be rebinned on features (see feature_count_on and feature_count_parameter).
`ref_genome`	The genome of reference.
`run`	What steps to run. By default runs everything. Some steps are required in order to run downstream steps.
`min_reads_per_cell`	Minimum number of reads per cell.
`max_quantile_read_per_cell`	Upper quantile above which to consider cells doublets.
`n_top_features`	Number of features to keep in the analysis.
`norm_type`	Normalization type.
`subsample_n`	Number of cells per condition to downsample to, for performance principally.
`exclude_regions`	Path towards a BED file containing CNA to exclude from the analysis (optional).
`n_clust`	Number of clusters to force choice of clusters.
`corr_threshold`	Quantile of correlation above which two cells are considered as correlated.
`percent_correlation`	Percentage of the total cells that a cell must be correlated with in order to be kept in the analysis.
`maxK`	Upper cluster number to rest for ConsensusClusterPlus.
`qval.th`	Adjusted p-value below which to consider features differential.
`logFC.th`	Log2-fold-change above/below which to consider a feature depleted/enriched.
`enrichment_qval`	Adjusted p-value below which to consider a gene set as significantly enriched in differential features.
`doBatchCorr`	Logical indicating if batch correction using fastMNN should be run.
`batch_sels`	If doBatchCorr is TRUE, a named list containing the samples in each batch.
`control_samples_CNA`	If running CopyNumber Analysis, a character vector of the sample names that are 'normal'.
`genes_to_plot`	A character vector containing genes of interest of which to plot the coverage.

Value

Creates a ChromSCape-readable directory and saved objects, as well as a multi-tabbed HTML report resuming the analysis.

Examples

## Not run: 
generate_analysis("/path/to/data/", "Analysis_1")

## End(Not run)
## Not run: 
generate_analysis("/path/to/data/", "Analysis_1")

## End(Not run)

Generate count matrix

Description

Generate count matrix

Usage

generate_count_matrix(cells, features, sparse, cell_names, feature_names)
generate_count_matrix(cells, features, sparse, cell_names, feature_names)

Arguments

`cells`	Number of cells
`features`	Number of features
`sparse`	Is matrix sparse ?
`cell_names`	Cell names
`feature_names`	Feature names

Value

A matrix or a sparse matrix

Generate cell cluster pseudo-bulk coverage tracks

Description

Generate cell cluster pseudo-bulk coverage tracks. First, scBED files are concatenated into cell clusters contained in the 'by' column of your SingleCellExperiment object. To do so, for each sample in the given list, the barcodes of each cluster are grepped and BED files are merged into pseudo-bulk of clusters (C1,C2...). Two cells from different can have the same barcode ID as cell affectation is done sample by sample. Then coverage of pseudo-bulk BED files is calculated by averaging & smoothing reads on small genomic window (150bp per default). The pseudo bulk BED and BigWigs coverage tracks are writtend to the output directory. This functionality is not available on Windows as it uses the 'cat' and 'gzip' utilities from Unix OS.

Usage

generate_coverage_tracks(
  scExp_cf,
  input,
  odir,
  format = "scBED",
  ref_genome = c("hg38", "mm10")[1],
  bin_width = 150,
  n_smoothBin = 5,
  read_size = 101,
  quantile_for_peak_calling = 0.85,
  by = "cell_cluster",
  progress = NULL
)
generate_coverage_tracks(
  scExp_cf,
  input,
  odir,
  format = "scBED",
  ref_genome = c("hg38", "mm10")[1],
  bin_width = 150,
  n_smoothBin = 5,
  read_size = 101,
  quantile_for_peak_calling = 0.85,
  by = "cell_cluster",
  progress = NULL
)

Arguments

`scExp_cf`	A SingleCellExperiment with cluster selected. (see `choose_cluster_scExp`). It is recommended having a minimum of ~100 cells per cluster in order to obtain smooth tracks.
`input`	Either a named list of character vector of path towards single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBED the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED files names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not.
`odir`	The output directory to write the cumulative BED and BigWig files.
`format`	File format, either "raw_mat", "BED" or "BAM"
`ref_genome`	The genome of reference, used to constrain to canonical chromosomes. Either 'hg38' or 'mm10'. 'hg38' per default.
`bin_width`	The width of the bin to create the coverage track. The smaller the greater the resolution & runtime. Default to 150.
`n_smoothBin`	Number of bins left & right to average ('smooth') the signal on. Default to 5.
`read_size`	The estimated size of reads. Default to 101.
`quantile_for_peak_calling`	The quantile to define the threshold above which signal is considered as a peak.
`by`	A character specifying a categoricla column of scExp_cf metadata by which to group cells and generate coverage tracks and peaks.
`progress`	A Progress object for Shiny. Default to NULL.

Value

Generate coverage tracks (.bigwig) for each group in the SingleCellExperiment "by" column.

Examples

## Not run: 
data(scExp)
input_files_coverage = list(
  "scChIP_Jurkat_K4me3" = paste0("/path/to/",scExp$barcode[1:51],".bed"),
  "scChIP_Ramos_K4me3" = paste0("/path/to/",scExp$barcode[52:106],".bed")
)
generate_coverage_tracks(scExp, input_files_coverage, "/path/to/output",
ref_genome = "hg38")

## End(Not run)
## Not run: 
data(scExp)
input_files_coverage = list(
  "scChIP_Jurkat_K4me3" = paste0("/path/to/",scExp$barcode[1:51],".bed"),
  "scChIP_Ramos_K4me3" = paste0("/path/to/",scExp$barcode[52:106],".bed")
)
generate_coverage_tracks(scExp, input_files_coverage, "/path/to/output",
ref_genome = "hg38")

## End(Not run)

Generate feature names

Description

Generate feature names

Usage

generate_feature_names(featureType, ref, features)
generate_feature_names(featureType, ref, features)

Arguments

`featureType`	Type of feature
`ref`	Reference genome
`features`	Number of features to generate

Value

A character vector of feature names

From a ChromSCape analysis directory, generate an HTML report.

Description

From a ChromSCape analysis directory, generate an HTML report.

Usage

generate_report(
  ChromSCape_directory,
  prefix = NULL,
  run = c("filter", "CNA", "cluster", "consensus", "peak_call", "coverage", "DA",
    "GSA", "report")[c(1, 3, 6, 7, 8, 9)],
  genes_to_plot = c("Krt8", "Krt5", "Tgfb1", "Foxq1", "Cdkn2b", "Cdkn2a",
    "chr7:15000000-20000000"),
  control_samples_CNA = NULL
)
generate_report(
  ChromSCape_directory,
  prefix = NULL,
  run = c("filter", "CNA", "cluster", "consensus", "peak_call", "coverage", "DA",
    "GSA", "report")[c(1, 3, 6, 7, 8, 9)],
  genes_to_plot = c("Krt8", "Krt5", "Tgfb1", "Foxq1", "Cdkn2b", "Cdkn2a",
    "chr7:15000000-20000000"),
  control_samples_CNA = NULL
)

Arguments

`ChromSCape_directory`	Path towards the ChromSCape directory of which you want to create the report. The report will be created at the root of this directory.
`prefix`	Name of the analysis with the filtering parameters (e.g. Analysis_3000_100000_99_uncorrected). You will find the prefix in the Filtering_Normalize_Reduce subfolder.
`run`	Which steps to report ("filter", "CNA","cluster", "consensus", "peak_call", "coverage", "DA", "GSA", "report"). Only indicate steps that were done in the analysis. By default do not report CNA, consensus and peak calling.
`genes_to_plot`	For the UMAP, which genes do you want to see in the report.
`control_samples_CNA`	If running the Copy Number Alteration (CNA) part, which samples are the controls

Value

Generate an HTML report at the root of the analysis directory.

Examples


## Not run: 
generate_analysis("/path/to/data/", "Analysis_1")

## End(Not run)
## Not run: 
generate_analysis("/path/to/data/", "Analysis_1")

## End(Not run)

Get color dataframe from shiny::colorInput

Description

Get color dataframe from shiny::colorInput

Usage

get_color_dataframe_from_input(
  input,
  levels_selected,
  color_by = c("sample_id", "total_counts"),
  input_id_prefix = "color_"
)
get_color_dataframe_from_input(
  input,
  levels_selected,
  color_by = c("sample_id", "total_counts"),
  input_id_prefix = "color_"
)

Arguments

`input`	Shiny input object
`levels_selected`	Names of the features
`color_by`	Which feature color to retrieve
`input_id_prefix`	Prefix in front of the feature names

Value

A data.frame with the feature levels and the colors of each level of this feature.

Map features onto cytobands

Description

Map the features of a SingleCellExperiment onto the cytobands of a given genome. Some features might not be mapped to any cytobands (e.g. if they are not in the canconical chromosomes), and are removed from the returned object.

Usage

get_cyto_features(scExp, ref_genome = c("hg38", "mm10")[1])
get_cyto_features(scExp, ref_genome = c("hg38", "mm10")[1])

Arguments

`scExp`	A SingleCellExperiment with genomic coordinate as features (peaks or bins)
`ref_genome`	Reference genome ('hg38' or 'mm10')

Details

The cytobands are an arbitrary cutting of the genome that dates back to staining metaphase chromosomes with Giemsa.

Value

A data.frame of the SCE features with their corresponding cytoband name

Examples

data("scExp")
matching_cyto = get_cyto_features(scExp, ref_genome="hg38")
 
data("scExp")
matching_cyto = get_cyto_features(scExp, ref_genome="hg38")

Get SingleCellExperiment's genomic coordinates

Description

Get SingleCellExperiment's genomic coordinates

Usage

get_genomic_coordinates(scExp)
get_genomic_coordinates(scExp)

Arguments

scExp

A SingleCellExperiment object.

Value

A GRanges object of genomic coordinates.

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
feature_GRanges = get_genomic_coordinates(scExp)

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
feature_GRanges = get_genomic_coordinates(scExp)

Retrieve the cytobands with the most variable fraction of reads

Description

Given a SingleCellExperiment object with the slot "cytoBand" containing the fraction of reads in each cytoband, calculates the variance of each cytoband and returns a data.frame with the top variables cytobands. Most cytobands are expected to be unchanged between normal and tumor samples, therefore focusing on the top variable cytobands enable to focus on the most interseting regions.

Usage

get_most_variable_cyto(scExp, top = 50)
get_most_variable_cyto(scExp, top = 50)

Arguments

`scExp`	A SingleCellExperiment with "cytoBand" reducedDim slot filled.
`top`	Number of cytobands to return (50).

Value

A data.frame of the top variable cytoBands and their variance

Examples


data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
get_most_variable_cyto(scExp, top=50)
 
data("scExp")
scExp = calculate_cyto_mat(scExp, ref_genome="hg38")
get_most_variable_cyto(scExp, top=50)

Get pathway matrix

Description

Get pathway matrix

Usage

get_pathway_mat_scExp(
  scExp,
  pathways,
  max_distanceToTSS = 1000,
  ref = "hg38",
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"),
  progress = NULL
)
get_pathway_mat_scExp(
  scExp,
  pathways,
  max_distanceToTSS = 1000,
  ref = "hg38",
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark"),
  progress = NULL
)

Arguments

`scExp`	A SingleCellExperiment
`pathways`	A character vector specifying the pathways to retrieve the cell count for.
`max_distanceToTSS`	Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000)#' @param ref
`ref`	Reference genome, either mm10 or hg38
`GeneSetClasses`	Which classes of MSIGdb to load
`progress`	A shiny Progress instance to display progress bar.

Value

A matrix of cell to pathway

Examples

data(scExp)
mat = get_pathway_mat_scExp(scExp, pathways = "KEGG_T_CELL_RECEPTOR_SIGNALING_PATHWAY")

data(scExp)
mat = get_pathway_mat_scExp(scExp, pathways = "KEGG_T_CELL_RECEPTOR_SIGNALING_PATHWAY")

Get experiment names from a SingleCellExperiment

Description

Get experiment names from a SingleCellExperiment

Usage

getExperimentNames(scExp)
getExperimentNames(scExp)

Arguments

scExp

A SingleCellExperiment with named mainExp and altExps.

Value

Character vector of unique experiment names

Examples

data(scExp)
getExperimentNames(scExp)

data(scExp)
getExperimentNames(scExp)

Get Main experiment of a SingleCellExperiment

Description

Get Main experiment of a SingleCellExperiment

Usage

getMainExperiment(scExp)
getMainExperiment(scExp)

Arguments

scExp

A SingleCellExperiment with named mainExp and altExps.

Value

The swapped SingleCellExperiment towards "main" experiment

Examples

data(scExp)
getMainExperiment(scExp)

data(scExp)
getMainExperiment(scExp)

gg_fill_hue

Description

gg_fill_hue

Usage

gg_fill_hue(n)
gg_fill_hue(n)

Arguments

n

num hues

Value

A color in HEX format

groupMat

Description

groupMat

Usage

groupMat(mat = NA, margin = 1, groups = NA, method = "mean")
groupMat(mat = NA, margin = 1, groups = NA, method = "mean")

Arguments

`mat`	A matrix
`margin`	By row or columns ?
`groups`	Groups
`method`	Method to group

Value

A grouped matrix

H1proportion

Description

H1proportion

Usage

H1proportion(pv = NA, lambda = 0.5)
H1proportion(pv = NA, lambda = 0.5)

Arguments

`pv`	P.value vector
`lambda`	Lambda value

Value

H1 proportion value

Does SingleCellExperiment has genomic coordinates in features ?

Description

Does SingleCellExperiment has genomic coordinates in features ?

Usage

has_genomic_coordinates(scExp)
has_genomic_coordinates(scExp)

Arguments

scExp

A SingleCellExperiment object

Value

TRUE or FALSE

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
has_genomic_coordinates(scExp)
raw_genes = create_scDataset_raw(featureType="gene")
scExp_gene = create_scExp(raw_genes$mat, raw_genes$annot)
has_genomic_coordinates(scExp_gene)

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
has_genomic_coordinates(scExp)
raw_genes = create_scDataset_raw(featureType="gene")
scExp_gene = create_scExp(raw_genes$mat, raw_genes$annot)
has_genomic_coordinates(scExp_gene)

hclustAnnotHeatmapPlot

Description

hclustAnnotHeatmapPlot

Usage

hclustAnnotHeatmapPlot(
  x = NULL,
  hc = NULL,
  hmColors = NULL,
  anocol = NULL,
  xpos = c(0.1, 0.9, 0.114, 0.885),
  ypos = c(0.1, 0.5, 0.5, 0.6, 0.62, 0.95),
  dendro.cex = 1,
  xlab.cex = 0.8,
  hmRowNames = FALSE,
  hmRowNames.cex = 0.5
)
hclustAnnotHeatmapPlot(
  x = NULL,
  hc = NULL,
  hmColors = NULL,
  anocol = NULL,
  xpos = c(0.1, 0.9, 0.114, 0.885),
  ypos = c(0.1, 0.5, 0.5, 0.6, 0.62, 0.95),
  dendro.cex = 1,
  xlab.cex = 0.8,
  hmRowNames = FALSE,
  hmRowNames.cex = 0.5
)

Arguments

`x`	A correlation matrix
`hc`	An hclust object
`hmColors`	A color palette
`anocol`	A matrix of colors
`xpos`	Xpos
`ypos`	Ypos
`dendro.cex`	Size of denro names
`xlab.cex`	Size of x label
`hmRowNames`	Write rownames ?
`hmRowNames.cex`	Size of rownames ?

Value

A heatmap

Data.frame of chromosome length - hg38

Description

This data frame provides the length of each "canonical" chromosomes of Homo Sapiens genome build hg38.

Usage

data("hg38.chromosomes")
data("hg38.chromosomes")

Format

hg38.chromosomes - a data frame with 24 rows and 3 variables:

chr: Chromosome - character
start: Start of the chromosome (bp) - integer
end: End of the chromosome (bp) - integer

Data.frame of cytoBandlocation - hg38

Description

This data frame provides the location of each cytoBands of Homo Sapiens genome build hg38.

Usage

data("hg38.cytoBand")
data("hg38.cytoBand")

Format

hg38.cytoBand - a data frame with 862 rows and 4 variables:

chr: Chromosome - character
start: Start of the chromosome (bp) - integer
end: End of the chromosome (bp) - integer
cytoBand: Name of the cytoBand - character

Data.frame of gene TSS - hg38

Description

This dataframe was extracted from Gencode v25 and report the Transcription Start Site of each gene in the Homo Sapiens genome build hg38.

Usage

data("hg38.GeneTSS")
data("hg38.GeneTSS")

Format

hg38.GeneTSS - a data frame with 24 rows and 3 variables:

chr: Chromosome - character
start: Start of the gene (TSS) - integer
end: End of the gene - integer
gene: Gene symbol - character

imageCol

Description

imageCol

Usage

imageCol(
  matcol = NULL,
  strat = NULL,
  xlab.cex = 0.5,
  ylab.cex = 0.5,
  drawLines = c("none", "h", "v", "b")[1],
  ...
)
imageCol(
  matcol = NULL,
  strat = NULL,
  xlab.cex = 0.5,
  ylab.cex = 0.5,
  drawLines = c("none", "h", "v", "b")[1],
  ...
)

Arguments

`matcol`	A matrix of colors
`strat`	Strat
`xlab.cex`	X label size
`ylab.cex`	Y label size
`drawLines`	Draw lines ?
`...`	Additional parameters

Value

A rectangular image

Import and count input files depending on their format

Description

Import and count input files depending on their format

Usage

import_count_input_files(
  files_dir_list,
  file_type,
  which,
  ref,
  verbose,
  progress,
  BPPARAM = BiocParallel::bpparam()
)
import_count_input_files(
  files_dir_list,
  file_type,
  which,
  ref,
  verbose,
  progress,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`files_dir_list`	A named list of directories containing the input files.
`file_type`	Input file type.
`which`	A GRanges object of features.
`ref`	Reference genome.
`verbose`	Print ?
`progress`	A progress object for Shiny.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list with the feature indexes data.frame containing non-zeroes entries in the count matrix and the cell names

Read single-cell matrix(ces) into scExp

Description

Combine one or multiple matrices together to create a sparse matrix and cell annotation data.frame.

Usage

import_scExp(file_paths, remove_pattern = "", temp_path = NULL)
import_scExp(file_paths, remove_pattern = "", temp_path = NULL)

Arguments

`file_paths`	A character vector of file names towards single cell epigenomic matrices (features x cells) (must be .txt / .tsv)
`remove_pattern`	A string pattern to remove from the sample names. Can be a regexp.
`temp_path`	In case matrices are stored in temporary folder, a character vector of path towards temporary files. (NULL)

Value

A list containing:

datamatrix: a sparseMatrix of features x cells
annot_raw: an annotation of cells as data.frame

Examples

mat1 = mat2 = create_scDataset_raw()$mat
tmp1 = tempfile(fileext = ".tsv")
tmp2 = tempfile(fileext = ".tsv")
write.table(as.matrix(mat1),file=tmp1,sep = "\t",
row.names = TRUE,col.names = TRUE,quote = FALSE)
write.table(as.matrix(mat2),file=tmp2, sep = "\t",
row.names = TRUE,col.names = TRUE,quote = FALSE)
file_paths = c(tmp1,tmp2)
out = import_scExp(file_paths)

mat1 = mat2 = create_scDataset_raw()$mat
tmp1 = tempfile(fileext = ".tsv")
tmp2 = tempfile(fileext = ".tsv")
write.table(as.matrix(mat1),file=tmp1,sep = "\t",
row.names = TRUE,col.names = TRUE,quote = FALSE)
write.table(as.matrix(mat2),file=tmp2, sep = "\t",
row.names = TRUE,col.names = TRUE,quote = FALSE)
file_paths = c(tmp1,tmp2)
out = import_scExp(file_paths)

Read index-peaks-barcodes trio files on interval to create count indexes

Description

Read index-peaks-barcodes trio files on interval to create count indexes

Usage

index_peaks_barcodes_to_matrix_indexes(
  feature_file,
  matrix_file,
  barcode_file,
  binarize = FALSE
)
index_peaks_barcodes_to_matrix_indexes(
  feature_file,
  matrix_file,
  barcode_file,
  binarize = FALSE
)

Arguments

`feature_file`	A file containing the features genomic locations
`matrix_file`	A file containing the indexes of non-zeroes values and their value (respectively i,j,x,see sparseMatrix)
`barcode_file`	A file containing the barcode ids
`binarize`	Binarize matrix ?

Value

A list containing a "feature index" data.frame, name_cells, and a region GenomicRange object used to form the sparse matrix

Calculate inter correlation between cluster or samples

Description

Calculate inter correlation between cluster or samples

Usage

inter_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  reference_group = unique(scExp_cf[[by]])[1],
  other_groups = unique(scExp_cf[[by]]),
  fullCor = TRUE
)
inter_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  reference_group = unique(scExp_cf[[by]])[1],
  other_groups = unique(scExp_cf[[by]]),
  fullCor = TRUE
)

Arguments

`scExp_cf`	A SingleCellExperiment
`by`	On which feature to calculate correlation ("sample_id" or "cell_cluster")
`reference_group`	Reference group to calculate correlation with. Must be in accordance with "by".
`other_groups`	Groups on which to calculate correlation (can contain multiple groups, and also reference_group). Must be in accordance with "by".
`fullCor`	A logical specifying if the correlation matrix was calculated on the entire set of cells (TRUE).

Value

A data.frame of average inter-correlation of cells in other_groups with cells in reference_group

Examples

data(scExp)
inter_correlation_scExp(scExp)
data(scExp)
inter_correlation_scExp(scExp)

Calculate intra correlation between cluster or samples

Description

Calculate intra correlation between cluster or samples

Usage

intra_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  fullCor = TRUE
)
intra_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  fullCor = TRUE
)

Arguments

`scExp_cf`	A SingleCellExperiment
`by`	On which feature to calculate correlation ("sample_id" or "cell_cluster")
`fullCor`	Logical specifying if the correlation matrix was run on the entire number of cells or on a subset.

Value

A data.frame of cell average intra-correlation

Examples

data(scExp)
intra_correlation_scExp(scExp, by = "sample_id")
intra_correlation_scExp(scExp, by = "cell_cluster")
data(scExp)
intra_correlation_scExp(scExp, by = "sample_id")
intra_correlation_scExp(scExp, by = "cell_cluster")

Launch ChromSCape

Description

Main function to launch ChromSCape in your favorite browser. You can pass additional parameters that you would pass to shiny::runApp (runApp)

Usage

launchApp(launch.browser = TRUE, ...)
launchApp(launch.browser = TRUE, ...)

Arguments

`launch.browser`	Wether to launch browser or not
`...`	Additional parameters passed to `runApp`

Value

Launches the shiny application

Examples

## Not run: 
launchApp()

## End(Not run)
## Not run: 
launchApp()

## End(Not run)

Load and format MSIGdb pathways using msigdbr package

Description

Load and format MSIGdb pathways using msigdbr package

Usage

load_MSIGdb(
  ref,
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark")
)
load_MSIGdb(
  ref,
  GeneSetClasses = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark")
)

Arguments

`ref`	Reference genome, either mm10 or hg38
`GeneSetClasses`	Which classes of MSIGdb to load

Value

A list containing the GeneSet (list), GeneSetDf (data.frame) and GenePool character vector of all possible genes

Merge peak files from MACS2 peak caller

Description

Merge peak files from MACS2 peak caller

Usage

merge_MACS2_peaks(peak_file, peak_distance_to_merge, min_peak_size = 200, ref)
merge_MACS2_peaks(peak_file, peak_distance_to_merge, min_peak_size = 200, ref)

Arguments

`peak_file`	A character specifying the path towards the peak file (BED or bedGraph format)
`peak_distance_to_merge`	Maximum distance to merge two peaks
`min_peak_size`	An integer specifying the minimum size of peaks
`ref`	Reference genome

Value

Peaks as GRanges

Data.frame of chromosome length - mm10

Description

This data frame provides the length of each "canonical" chromosomes of Mus Musculus (Mouse) genome build mm10.

Usage

data("mm10.chromosomes")
data("mm10.chromosomes")

Format

mm10.chromosomes - a data frame with 24 rows and 3 variables:

chr: Chromosome - character
start: Start of the chromosome (bp) - integer
end: End of the chromosome (bp) - integer

Data.frame of cytoBandlocation - mm10

Description

This data frame provides the location of each cytoBands of Homo Sapiens genome build mm10.

Usage

data("mm10.cytoBand")
data("mm10.cytoBand")

Format

mm10.cytoBand - a data frame with 862 rows and 4 variables:

chr: Chromosome - character
start: Start of the chromosome (bp) - integer
end: End of the chromosome (bp) - integer
cytoBand: Name of the cytoBand - character

Data.frame of gene TSS - mm10

Description

This dataframe was extracted from Gencode v25 and report the Transcription Start Site of each gene in the Mus Musculus genome build mm10 (Mouse).

Usage

data("mm10.GeneTSS")
data("mm10.GeneTSS")

Format

mm10.GeneTSS - a data frame with 24 rows and 3 variables:

chr: Chromosome name - character
start: Start of the gene (TSS) - integer
end: End of the gene - integer
gene: Gene symbol - character

Normalize counts

Description

Normalize counts

Usage

normalize_scExp(
  scExp,
  type = c("CPM", "TFIDF", "RPKM", "TPM", "feature_size_only")
)
normalize_scExp(
  scExp,
  type = c("CPM", "TFIDF", "RPKM", "TPM", "feature_size_only")
)

Arguments

`scExp`	A SingleCellExperiment object.
`type`	Which normalization to apply. Either 'CPM', 'TFIDF','RPKM', 'TPM' or 'feature_size_only'. Note that for all normalization by size (RPKM, TPM, feature_size_only), the features must have defined genomic coordinates.

Value

A SingleCellExperiment object containing normalized counts. (See ?normcounts())

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = normalize_scExp(scExp)
head(SingleCellExperiment::normcounts(scExp))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = normalize_scExp(scExp)
head(SingleCellExperiment::normcounts(scExp))

Number of cells before & after correlation filtering

Description

Number of cells before & after correlation filtering

Usage

num_cell_after_cor_filt_scExp(scExp, scExp_cf)
num_cell_after_cor_filt_scExp(scExp, scExp_cf)

Arguments

`scExp`	SingleCellExperiment object before correlation filtering.
`scExp_cf`	SingleCellExperiment object atfer correlation filtering.

Value

A colored kable with the number of cells per sample before and after filtering for display

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = filter_correlated_cell_scExp(scExp_cf,
corr_threshold = 99, percent_correlation = 1)
## Not run: num_cell_after_cor_filt_scExp(scExp,scExp_cf)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = filter_correlated_cell_scExp(scExp_cf,
corr_threshold = 99, percent_correlation = 1)
## Not run: num_cell_after_cor_filt_scExp(scExp,scExp_cf)

Table of cells before / after QC

Description

Table of cells before / after QC

Usage

num_cell_after_QC_filt_scExp(scExp, annot, datamatrix)
num_cell_after_QC_filt_scExp(scExp, annot, datamatrix)

Arguments

`scExp`	A SingleCellExperiment object.
`annot`	A raw annotation data.frame of cells before filtering.
`datamatrix`	A matrix of cells per regions before filtering.

Value

A formatted kable in HTML.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_filtered = filter_scExp(scExp)
## Not run:  num_cell_after_QC_filt_scExp(
scExp_filtered,SingleCellExperiment::colData(scExp))
## End(Not run)

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_filtered = filter_scExp(scExp)
## Not run:  num_cell_after_QC_filt_scExp(
scExp_filtered,SingleCellExperiment::colData(scExp))
## End(Not run)

Table of number of cells before correlation filtering

Description

Table of number of cells before correlation filtering

Usage

num_cell_before_cor_filt_scExp(scExp)
num_cell_before_cor_filt_scExp(scExp)

Arguments

scExp

A SingleCellExperiment Object

Value

A colored kable with the number of cells per sample for display

Examples

data("scExp")
## Not run: num_cell_before_cor_filt_scExp(scExp)

data("scExp")
## Not run: num_cell_before_cor_filt_scExp(scExp)

Number of cells in each cluster

Description

Number of cells in each cluster

Usage

num_cell_in_cluster_scExp(scExp)
num_cell_in_cluster_scExp(scExp)

Arguments

scExp

A SingleCellExperiment object containing chromatin groups.

Value

A formatted kable of cell assignation to each cluster.

Examples

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=3,consensus=FALSE)
## Not run: num_cell_in_cluster_scExp(scExp_cf)

data("scExp")
scExp_cf = correlation_and_hierarchical_clust_scExp(scExp)
scExp_cf = choose_cluster_scExp(scExp_cf,nclust=3,consensus=FALSE)
## Not run: num_cell_in_cluster_scExp(scExp_cf)

Table of cells

Description

Table of cells

Usage

num_cell_scExp(annot, datamatrix)
num_cell_scExp(annot, datamatrix)

Arguments

`annot`	An annotation of cells. Can be obtain through 'colData(scExp)'.
`datamatrix`	A matrix of cells per regions before filtering.

Value

A formatted kable in HTML.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
## Not run: num_cell_scExp(SingleCellExperiment::colData(scExp))
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
## Not run: num_cell_scExp(SingleCellExperiment::colData(scExp))

Run sparse PCA using irlba SVD

Description

This function allows to run a PCA using IRLBA Singular Value Decomposition in a fast & memory efficient way. The increamental Lanczos bidiagonalisation algorithm allows to keep the matrix sparse as the "loci" centering is implicit. The function then multiplies by the approximate singular values (svd$d) in order to get more importance to the first PCs proportionnally to their singular values. This step is crucial for downstream approaches, e.g. UMAP or T-SNE.

Usage

pca_irlba_for_sparseMatrix(x, n_comp, work = 3 * n_comp)
pca_irlba_for_sparseMatrix(x, n_comp, work = 3 * n_comp)

Arguments

`x`	A sparse normalized matrix (features x cells)
`n_comp`	The number of principal components to keep
`work`	Working subspace dimension, larger values can speed convergence at the cost of more memory use.

Value

The rotated data, e.g. the cells x PC column in case of sc data.

Plot cluster consensus

Description

Plot cluster consensus score for each k as a bargraph.

Usage

plot_cluster_consensus_scExp(scExp)
plot_cluster_consensus_scExp(scExp)

Arguments

scExp

A SingleCellExperiment

Value

The consensus score for each cluster for each k as a barplot

Examples

data("scExp")

plot_cluster_consensus_scExp(scExp)

data("scExp")

plot_cluster_consensus_scExp(scExp)

Plotting correlation of PCs with a variable of interest

Description

Plotting correlation of PCs with a variable of interest

Usage

plot_correlation_PCA_scExp(
  scExp,
  correlation_var = "total_counts",
  color_by = NULL,
  topPC = 10
)
plot_correlation_PCA_scExp(
  scExp,
  correlation_var = "total_counts",
  color_by = NULL,
  topPC = 10
)

Arguments

`scExp`	A SingleCellExperiment Object
`correlation_var`	A string specifying with which numeric variable from colData of scExp to calculate and plot the correlation of each PC with. ('total_counts')
`color_by`	A string specifying with which categorical variable to color the plot. ('NULL')
`topPC`	An integer specifying the number of PCs to plot correlation with 10

Value

A ggplot histogram representing the distribution of count per cell

Examples

data("scExp")
plot_correlation_PCA_scExp(scExp, topPC = 25)
plot_correlation_PCA_scExp(scExp, color_by = "cell_cluster")
plot_correlation_PCA_scExp(scExp, color_by = "sample_id")
data("scExp")
plot_correlation_PCA_scExp(scExp, topPC = 25)
plot_correlation_PCA_scExp(scExp, color_by = "cell_cluster")
plot_correlation_PCA_scExp(scExp, color_by = "sample_id")

Coverage plot

Description

Coverage plot

Usage

plot_coverage_BigWig(
  coverages,
  label_color_list,
  peaks = NULL,
  chrom,
  start,
  end,
  ref = "hg38"
)
plot_coverage_BigWig(
  coverages,
  label_color_list,
  peaks = NULL,
  chrom,
  start,
  end,
  ref = "hg38"
)

Arguments

`coverages`	A list containing sample coverage as GenomicRanges
`label_color_list`	List of colors, list names are labels
`peaks`	A GRanges object containing peaks location to plot (optional)
`chrom`	Chromosome
`start`	Start
`end`	End
`ref`	Genomic Reference

Value

A coverage plot annotated with genes

Examples

data(scExp)

data(scExp)

Differential summary barplot

Description

Differential summary barplot

Usage

plot_differential_summary_scExp(
  scExp_cf,
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01
)
plot_differential_summary_scExp(
  scExp_cf,
  qval.th = 0.01,
  logFC.th = 1,
  min.percent = 0.01
)

Arguments

`scExp_cf`	A SingleCellExperiment object
`qval.th`	Adjusted p-value threshold. (0.01)
`logFC.th`	Fold change threshold. (1)
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)

Value

A barplot summary of differential analysis

Examples

data("scExp")
plot_differential_summary_scExp(scExp)
data("scExp")
plot_differential_summary_scExp(scExp)

Volcano plot of differential features

Description

Volcano plot of differential features

Usage

plot_differential_volcano_scExp(
  scExp_cf,
  group = "C1",
  logFC.th = 1,
  qval.th = 0.01,
  min.percent = 0.01
)
plot_differential_volcano_scExp(
  scExp_cf,
  group = "C1",
  logFC.th = 1,
  qval.th = 0.01,
  min.percent = 0.01
)

Arguments

`scExp_cf`	A SingleCellExperiment object
`group`	A character indicating the group for which to plot the differential volcano plot. ("C1")
`logFC.th`	Fold change threshold. (1)
`qval.th`	Adjusted p-value threshold. (0.01)
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)

Value

A volcano plot of differential analysis of a specific cluster

Examples

data("scExp")
plot_differential_volcano_scExp(scExp,"C1")
data("scExp")
plot_differential_volcano_scExp(scExp,"C1")

Plotting distribution of signal

Description

Plotting distribution of signal

Usage

plot_distribution_scExp(
  scExp,
  raw = TRUE,
  log10 = FALSE,
  pseudo_counts = 1,
  bins = 150
)
plot_distribution_scExp(
  scExp,
  raw = TRUE,
  log10 = FALSE,
  pseudo_counts = 1,
  bins = 150
)

Arguments

`scExp`	A SingleCellExperiment Object
`raw`	Use raw counts ?
`log10`	Transform using log10 ?
`pseudo_counts`	Pseudo-count to add if using log10
`bins`	Number of bins in the histogram

Value

A ggplot histogram representing the distribution of count per cell

Examples

data("scExp")
plot_distribution_scExp(scExp)
data("scExp")
plot_distribution_scExp(scExp)

Plot Gain or Loss of cytobands of the most variables cytobands

Description

Plot Gain or Loss of cytobands of the most variables cytobands

Usage

plot_gain_or_loss_barplots(scExp, cells = NULL, top = 20)

plot_gain_or_loss_barplots(scExp, cells = NULL, top = 20)
plot_gain_or_loss_barplots(scExp, cells = NULL, top = 20)

plot_gain_or_loss_barplots(scExp, cells = NULL, top = 20)

Arguments

`scExp`	A SingleCellExperiment with "logRatio_cytoBand" reducedDim slot filled. See `calculate_logRatio_CNA`
`cells`	Cell IDs of the tumor samples to
`top`	Number of most variables cytobands to plot

Value

Plot the gains/lost in the selected cells of interest as multiple barplots

Examples


data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_gain_or_loss_barplots(scExp, cells = scExp$cell_id[which(
scExp$sample_id %in% unique(scExp$sample_id)[2])])


data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_gain_or_loss_barplots(scExp, cells = scExp$cell_id[which(
scExp$sample_id %in% unique(scExp$sample_id)[2])])

data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_gain_or_loss_barplots(scExp, cells = scExp$cell_id[which(
scExp$sample_id %in% unique(scExp$sample_id)[2])])


data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_gain_or_loss_barplots(scExp, cells = scExp$cell_id[which(
scExp$sample_id %in% unique(scExp$sample_id)[2])])

Plot cell correlation heatmap with annotations

Description

Plot cell correlation heatmap with annotations

Usage

plot_heatmap_scExp(
  scExp,
  name_hc = "hc_cor",
  corColors = (grDevices::colorRampPalette(c("royalblue", "white", "indianred1")))(256),
  color_by = NULL,
  downsample = 1000,
  hc_linkage = "ward.D"
)
plot_heatmap_scExp(
  scExp,
  name_hc = "hc_cor",
  corColors = (grDevices::colorRampPalette(c("royalblue", "white", "indianred1")))(256),
  color_by = NULL,
  downsample = 1000,
  hc_linkage = "ward.D"
)

Arguments

`scExp`	A SingleCellExperiment Object
`name_hc`	Name of the hclust contained in the SingleCellExperiment object
`corColors`	A palette of colors for the heatmap
`color_by`	Which features to add as additional bands on top of plot
`downsample`	Number of cells to downsample
`hc_linkage`	A linkage method for hierarchical clustering. See cor. ('ward.D')

Value

A heatmap of cell to cell correlation, grouping cells by hierarchical clustering.

Examples

data("scExp")
plot_heatmap_scExp(scExp)

data("scExp")
plot_heatmap_scExp(scExp)

Violin plot of inter-correlation distribution between one or multiple groups and one reference group

Description

Violin plot of inter-correlation distribution between one or multiple groups and one reference group

Usage

plot_inter_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  jitter_by = NULL,
  reference_group = unique(scExp_cf[[by]])[1],
  other_groups = unique(scExp_cf[[by]]),
  downsample = 5000
)
plot_inter_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  jitter_by = NULL,
  reference_group = unique(scExp_cf[[by]])[1],
  other_groups = unique(scExp_cf[[by]]),
  downsample = 5000
)

Arguments

`scExp_cf`	A SingleCellExperiment
`by`	Color by sample_id or cell_cluster
`jitter_by`	Add jitter points of another layer (cell_cluster or sample_id)
`reference_group`	Character containing the reference group name to calculate correlation from.
`other_groups`	Character vector of the other groups for which to calculate correlation with the reference group.
`downsample`	Downsample for plotting

Value

A violin plot of inter-correlation

Examples

data(scExp)
plot_intra_correlation_scExp(scExp)
data(scExp)
plot_intra_correlation_scExp(scExp)

Violin plot of intra-correlation distribution

Description

Violin plot of intra-correlation distribution

Usage

plot_intra_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  jitter_by = NULL,
  downsample = 5000
)
plot_intra_correlation_scExp(
  scExp_cf,
  by = c("sample_id", "cell_cluster")[1],
  jitter_by = NULL,
  downsample = 5000
)

Arguments

`scExp_cf`	A SingleCellExperiment
`by`	Color by sample_id or cell_cluster
`jitter_by`	Add jitter points of another layer (cell_cluster or sample_id)
`downsample`	Downsample for plotting

Value

A violin plot of intra-correlation

Examples

data(scExp)
plot_intra_correlation_scExp(scExp)
data(scExp)
plot_intra_correlation_scExp(scExp)

Plot Top/Bottom most contributing features to PCA

Description

Plot Top/Bottom most contributing features to PCA

Usage

plot_most_contributing_features(
  scExp,
  component = "Component_1",
  n_top_bot = 10
)
plot_most_contributing_features(
  scExp,
  component = "Component_1",
  n_top_bot = 10
)

Arguments

`scExp`	A SingleCellExperiment containing "PCA" in reducedDims and gene annotation in rowRanges
`component`	The name of the component of interest
`n_top_bot`	An integer number of top and bot regions to plot

Details

If a gene TSS is within 10,000bp of the region, the name of the gene(s) will be displayed instead of the region

Value

A barplot of top and bottom features with the largest absolute value in the component of interest

Examples

data(scExp)
plot_most_contributing_features(scExp, component = "Component_1")
data(scExp)
plot_most_contributing_features(scExp, component = "Component_1")

Barplot of the % of active cells for a given features

Description

Barplot of the % of active cells for a given features

Usage

plot_percent_active_feature_scExp(
  scExp,
  gene,
  by = c("cell_cluster", "sample_id")[1],
  highlight = NULL,
  downsample = 5000,
  max_distanceToTSS = 1000
)
plot_percent_active_feature_scExp(
  scExp,
  gene,
  by = c("cell_cluster", "sample_id")[1],
  highlight = NULL,
  downsample = 5000,
  max_distanceToTSS = 1000
)

Arguments

`scExp`	A SingleCellExperiment
`gene`	A character specifying the gene to plot
`by`	Color violin by cell_cluster or sample_id ("cell_cluster")
`highlight`	A specific group to highlight in a one vs all fashion
`downsample`	Downsample for plotting (5000)
`max_distanceToTSS`	Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000)

Value

A violin plot of intra-correlation

Examples

data(scExp)
plot_percent_active_feature_scExp(scExp, "UBXN10")

data(scExp)
plot_percent_active_feature_scExp(scExp, "UBXN10")

Pie chart of top contribution of chromosomes in the 100 most contributing features to PCA #'

Description

Pie chart of top contribution of chromosomes in the 100 most contributing features to PCA #'

Usage

plot_pie_most_contributing_chr(
  scExp,
  component = "Component_1",
  n_top_bot = 100
)
plot_pie_most_contributing_chr(
  scExp,
  component = "Component_1",
  n_top_bot = 100
)

Arguments

`scExp`	A SingleCellExperiment containing "PCA" in reducedDims and gene annotation in rowRanges
`component`	The name of the component of interest
`n_top_bot`	An integer number of top and bot regions to plot (100)

Value

A pie chart showing the distribution of chromosomes in the top features with the largest absolute value in the component of interest

Examples

data(scExp)
 plot_pie_most_contributing_chr(scExp, component = "Component_1")
data(scExp)
 plot_pie_most_contributing_chr(scExp, component = "Component_1")

Plot reduced dimensions (PCA, TSNE, UMAP)

Description

Plot reduced dimensions (PCA, TSNE, UMAP)

Usage

plot_reduced_dim_scExp(
  scExp,
  color_by = "sample_id",
  reduced_dim = c("PCA", "TSNE", "UMAP"),
  select_x = NULL,
  select_y = NULL,
  downsample = 5000,
  transparency = 0.6,
  size = 1,
  max_distanceToTSS = 1000,
  annotate_clusters = "cell_cluster" %in% colnames(colData(scExp)),
  min_quantile = 0.01,
  max_quantile = 0.99
)
plot_reduced_dim_scExp(
  scExp,
  color_by = "sample_id",
  reduced_dim = c("PCA", "TSNE", "UMAP"),
  select_x = NULL,
  select_y = NULL,
  downsample = 5000,
  transparency = 0.6,
  size = 1,
  max_distanceToTSS = 1000,
  annotate_clusters = "cell_cluster" %in% colnames(colData(scExp)),
  min_quantile = 0.01,
  max_quantile = 0.99
)

Arguments

`scExp`	A SingleCellExperiment Object
`color_by`	Character of eature used for coloration. Can be cell metadata ('total_counts', 'sample_id', ...) or a gene name.
`reduced_dim`	Reduced Dimension used for plotting
`select_x`	Which variable to select for x axis
`select_y`	Which variable to select for y axis
`downsample`	Number of cells to downsample
`transparency`	Alpha parameter, between 0 and 1
`size`	Size of the points.
`max_distanceToTSS`	The maximum distance to TSS to consider a gene linked to a region. Used only if "color_by" is a gene name.
`annotate_clusters`	A logical indicating if clusters should be labelled. The 'cell_cluster' column should be present in metadata.
`min_quantile`	The lower threshold to remove outlier cells, as quantile of cell embeddings (between 0 and 0.5).
`max_quantile`	The upper threshold to remove outlier cells, as quantile of cell embeddings (between 0.5 and 1).

Value

A ggplot geom_point plot of reduced dimension 2D reprensentation

Examples

data("scExp")
plot_reduced_dim_scExp(scExp, color_by = "sample_id")
plot_reduced_dim_scExp(scExp, color_by = "total_counts")
plot_reduced_dim_scExp(scExp, reduced_dim = "UMAP")
plot_reduced_dim_scExp(scExp, color_by = "CD52",  reduced_dim = "UMAP")

data("scExp")
plot_reduced_dim_scExp(scExp, color_by = "sample_id")
plot_reduced_dim_scExp(scExp, color_by = "total_counts")
plot_reduced_dim_scExp(scExp, reduced_dim = "UMAP")
plot_reduced_dim_scExp(scExp, color_by = "CD52",  reduced_dim = "UMAP")

Plot UMAP colored by Gain or Loss of cytobands

Description

Plot UMAP colored by Gain or Loss of cytobands

Usage

plot_reduced_dim_scExp_CNA(scExp, cytoBand)
plot_reduced_dim_scExp_CNA(scExp, cytoBand)

Arguments

`scExp`	A SingleCellExperiment with "gainOrLoss_cytoBand" reducedDim slot filled. See `calculate_gain_or_loss`
`cytoBand`	Which cytoBand to color cells by

Value

Plot the gains/lost of the cytoband overlayed on the epigenetic UMAP.

Examples


data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_reduced_dim_scExp_CNA(scExp, get_most_variable_cyto(scExp)$cytoBand[1])

data("scExp")
scExp = calculate_CNA(scExp,  control_samples = unique(scExp$sample_id)[1],
ref_genome="hg38", quantiles_to_define_gol = c(0.05,0.95))
plot_reduced_dim_scExp_CNA(scExp, get_most_variable_cyto(scExp)$cytoBand[1])

Barplot of top TFs from ChEA3 TF enrichment analysis

Description

Barplot of top TFs from ChEA3 TF enrichment analysis

Usage

plot_top_TF_scExp(
  scExp,
  group = unique(scExp$cell_cluster)[1],
  set = c("Differential", "Enriched", "Depleted")[1],
  type = c("Score", "nTargets", "nTargets_over_TF", "nTargets_over_genes")[1],
  n_top = 25
)
plot_top_TF_scExp(
  scExp,
  group = unique(scExp$cell_cluster)[1],
  set = c("Differential", "Enriched", "Depleted")[1],
  type = c("Score", "nTargets", "nTargets_over_TF", "nTargets_over_genes")[1],
  n_top = 25
)

Arguments

`scExp`	A SingleCellExperiment
`group`	A character string specifying the differential group to display the top TFs
`set`	A character string specifying the set of genes in which the TF were enriched, either 'Differential', 'Enriched' or 'Depleted'.
`type`	A character string specifying the Y axis of the plot, either the number of differential targets or the ChEA3 integrated mean score. E.g. either "Score", "nTargets", "nTargets_over_TF" for the number of target genes over the total number of genes targeted by the TF or "nTargets_over_genes" for the number of target genes over the number of genes in the gene set.
`n_top`	An integer specifying the number of top TF to display

Value

A bar plot of top TFs from ChEA3 TF enrichment analysis

Examples

data("scExp")

plot_top_TF_scExp(
 scExp,
 group = "C1",
  set = "Differential",
   type = "Score",
    n_top = 10)
    
plot_top_TF_scExp(
 scExp,
 group = "C1",
  set = "Enriched",
   type = "nTargets_over_genes",
    n_top = 20)
    
data("scExp")

plot_top_TF_scExp(
 scExp,
 group = "C1",
  set = "Differential",
   type = "Score",
    n_top = 10)
    
plot_top_TF_scExp(
 scExp,
 group = "C1",
  set = "Enriched",
   type = "nTargets_over_genes",
    n_top = 20)

Violin plot of features

Description

Violin plot of features

Usage

plot_violin_feature_scExp(
  scExp,
  gene,
  by = c("cell_cluster", "sample_id")[1],
  downsample = 5000,
  max_distanceToTSS = 1000
)
plot_violin_feature_scExp(
  scExp,
  gene,
  by = c("cell_cluster", "sample_id")[1],
  downsample = 5000,
  max_distanceToTSS = 1000
)

Arguments

`scExp`	A SingleCellExperiment
`gene`	A character specifying the gene to plot
`by`	Color violin by cell_cluster or sample_id ("cell_cluster")
`downsample`	Downsample for plotting (5000)
`max_distanceToTSS`	Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene. (1000)

Value

A violin plot of intra-correlation

Examples

data(scExp)
plot_violin_feature_scExp(scExp, "UBXN10")

data(scExp)
plot_violin_feature_scExp(scExp, "UBXN10")

Preprocess scExp - Counts Per Million (CPM)

Description

Preprocess scExp - Counts Per Million (CPM)

Usage

preprocess_CPM(scExp)
preprocess_CPM(scExp)

Arguments

scExp

A SingleCellExperiment Object

Value

A SingleCellExperiment object.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_CPM(scExp)
head(SingleCellExperiment::normcounts(scExp))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_CPM(scExp)
head(SingleCellExperiment::normcounts(scExp))

Preprocess scExp - size only

Description

Preprocess scExp - size only

Usage

preprocess_feature_size_only(scExp)
preprocess_feature_size_only(scExp)

Arguments

scExp

A SingleCellExperiment Object

Value

A SingleCellExperiment object.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_feature_size_only(scExp)
head(SingleCellExperiment::normcounts(scExp))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_feature_size_only(scExp)
head(SingleCellExperiment::normcounts(scExp))

Preprocess scExp - Read per Kilobase Per Million (RPKM)

Description

Preprocess scExp - Read per Kilobase Per Million (RPKM)

Usage

preprocess_RPKM(scExp)
preprocess_RPKM(scExp)

Arguments

scExp

A SingleCellExperiment Object

Value

A SingleCellExperiment object.

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_RPKM(scExp)
head(SingleCellExperiment::normcounts(scExp))
raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_RPKM(scExp)
head(SingleCellExperiment::normcounts(scExp))

Preprocess scExp - TF-IDF

Description

Preprocess scExp - TF-IDF

Usage

preprocess_TFIDF(scExp, scale = 10000, log = TRUE)
preprocess_TFIDF(scExp, scale = 10000, log = TRUE)

Arguments

`scExp`	A SingleCellExperiment Object
`scale`	A numeric to multiply the matrix in order to have human readeable numbers. Has no impact on the downstream analysis
`log`	Wether to use neperian log on the TF-IDF normalized data or not.

Value

A SingleCellExperiment object.

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TFIDF(scExp)
head(SingleCellExperiment::normcounts(scExp))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TFIDF(scExp)
head(SingleCellExperiment::normcounts(scExp))

Preprocess scExp - Transcripts per Million (TPM)

Description

Preprocess scExp - Transcripts per Million (TPM)

Usage

preprocess_TPM(scExp)
preprocess_TPM(scExp)

Arguments

scExp

A SingleCellExperiment Object

Value

A SingleCellExperiment object.

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TPM(scExp)
head(SingleCellExperiment::normcounts(scExp))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp = preprocess_TPM(scExp)
head(SingleCellExperiment::normcounts(scExp))

Preprocess and filter matrix annotation data project folder to SCE

Description

Preprocess and filter matrix annotation data project folder to SCE

Usage

preprocessing_filtering_and_reduction(
  datamatrix,
  annot_raw,
  min_reads_per_cell = 1600,
  max_quantile_read_per_cell = 95,
  n_top_features = 40000,
  norm_type = "CPM",
  n_dims = 10,
  remove_PC = NULL,
  subsample_n = NULL,
  ref_genome = "hg38",
  exclude_regions = NULL,
  doBatchCorr = FALSE,
  batch_sels = NULL
)
preprocessing_filtering_and_reduction(
  datamatrix,
  annot_raw,
  min_reads_per_cell = 1600,
  max_quantile_read_per_cell = 95,
  n_top_features = 40000,
  norm_type = "CPM",
  n_dims = 10,
  remove_PC = NULL,
  subsample_n = NULL,
  ref_genome = "hg38",
  exclude_regions = NULL,
  doBatchCorr = FALSE,
  batch_sels = NULL
)

Arguments

`datamatrix`	A sparse count matrix of features x cells.
`annot_raw`	A data.frame with barcode, cell_id, sample_id, batch_id, total_counts
`min_reads_per_cell`	Minimum read per cell to keep the cell
`max_quantile_read_per_cell`	Upper count quantile threshold above which cells are removed
`n_top_features`	Number of features to keep
`norm_type`	Normalization type c("CPM", "TFIDF", "RPKM", "TPM", "feature_size_only")
`n_dims`	An integer specifying the number of dimensions to keep for PCA
`remove_PC`	A vector of string indicating which principal components to remove before downstream analysis as probably correlated to library size. Should be under the form : 'Component_1', 'Component_2', ... Recommended when using 'TFIDF' normalization method. (NULL)
`subsample_n`	Number of cells to subsample.
`ref_genome`	Reference genome ("hg38" or "mm10").
`exclude_regions`	GenomicRanges with regions to remove from the object.
`doBatchCorr`	Run batch correction ? TRUE or FALSE
`batch_sels`	If doBatchCorr is TRUE, List of characters. Names are batch names, characters are sample names.

Value

A SingleCellExperiment object containing feature spaces.

Examples

raw <- create_scDataset_raw()
scExp = preprocessing_filtering_and_reduction(raw$mat, raw$annot)
raw <- create_scDataset_raw()
scExp = preprocessing_filtering_and_reduction(raw$mat, raw$annot)

Create a sparse count matrix from various format of input data.

Description

This function takes three different type of single-cell input: - Single cell BAM files (sorted) - Single cell BED files (gzipped) - A combination of an index file, a peak file and cell barcode file (The index file is composed of three column: index i, index j and value x for the non zeroes entries in the sparse matrix.)

Usage

raw_counts_to_sparse_matrix(
  files_dir_list,
  file_type = c("scBED", "scBAM", "FragmentFile"),
  use_Signac = TRUE,
  peak_file = NULL,
  n_bins = NULL,
  bin_width = NULL,
  genebody = NULL,
  extendPromoter = 2500,
  verbose = TRUE,
  ref = c("hg38", "mm10")[1],
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
raw_counts_to_sparse_matrix(
  files_dir_list,
  file_type = c("scBED", "scBAM", "FragmentFile"),
  use_Signac = TRUE,
  peak_file = NULL,
  n_bins = NULL,
  bin_width = NULL,
  genebody = NULL,
  extendPromoter = 2500,
  verbose = TRUE,
  ref = c("hg38", "mm10")[1],
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`files_dir_list`	A named character vector of directories containing the files. The names correspond to sample names.
`file_type`	Input file(s) type(s) ('scBED','scBAM','FragmentFile')
`use_Signac`	Use Signac wrapper function 'FeatureMatrix' if the Signac package is installed (TRUE).
`peak_file`	A file containing genomic location of peaks (NULL)
`n_bins`	The number of bins to tile the genome (NULL)
`bin_width`	The size of bins to tile the genome (NULL)
`genebody`	Count on genes (body + promoter) ? (NULL)
`extendPromoter`	If counting on genes, number of base pairs to extend up or downstream of TSS (2500).
`verbose`	Verbose (TRUE)
`ref`	reference genome to use (hg38)
`progress`	Progress object for Shiny
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Details

This functions re-counts signal on either fixed genomic bins, a set of user-defined peaks or around the TSS of genes.

Value

A sparse matrix of features x cells

References

Stuart el al., Multimodal single-cell chromatin analysis with Signac bioRxiv https://doi.org/10.1101/2020.11.09.373613

rawfile_ToBigWig : reads in BAM file and write out BigWig coverage file, normalized and smoothed

Description

rawfile_ToBigWig : reads in BAM file and write out BigWig coverage file, normalized and smoothed

Usage

rawfile_ToBigWig(
  input,
  BigWig_filename,
  format = "BAM",
  bin_width = 150,
  norm_factor,
  n_smoothBin = 5,
  ref = "hg38",
  read_size = 101,
  original_bins = NULL,
  quantile_for_peak_calling = 0.85
)
rawfile_ToBigWig(
  input,
  BigWig_filename,
  format = "BAM",
  bin_width = 150,
  norm_factor,
  n_smoothBin = 5,
  ref = "hg38",
  read_size = 101,
  original_bins = NULL,
  quantile_for_peak_calling = 0.85
)

Arguments

`input`	Either a named list of character vector of path towards single-cell BED files or a sparse raw matrix of small bins (<<500bp). If a named list specifying scBEDn the names MUST correspond to the 'sample_id' column in your SingleCellExperiment object. The single-cell BED files names MUST match the barcode names in your SingleCellExperiment (column 'barcode'). The scBED files can be gzipped or not.
`BigWig_filename`	Path to write the output BigWig file
`format`	File format, either "BAM" or "BED"
`bin_width`	Bin size for coverage
`norm_factor`	Then number of cells or total number of reads in the given sample, for normalization.
`n_smoothBin`	Number of bins for smoothing values
`ref`	Reference genome.
`read_size`	Length of the reads.
`original_bins`	Original bins GenomicRanges in case the format is raw matrix.
`quantile_for_peak_calling`	The quantile to define the threshold above which signal is considered as a peak.

Value

Writes in the output directory a bigwig file displaying the cumulative coverage of cells and a basic set of peaks called by taking all peaks above a given threshold

Writes a BigWig file as output

Read a count matrix with three first columns (chr,start,end)

Description

Read a count matrix with three first columns (chr,start,end)

Usage

read_count_mat_with_separated_chr_start_end(
  path_to_matrix,
  format_test,
  separator
)
read_count_mat_with_separated_chr_start_end(
  path_to_matrix,
  format_test,
  separator
)

Arguments

`path_to_matrix`	Path to the count matrix
`format_test`	Sample of the read.table
`separator`	Separator character

Value

A sparseMatrix with rownames in the form "chr1:1222-55555"

Read in one or multiple sparse matrices (10X format)

Description

Given one or multiple directories, look in each directory for a combination of the following files :

A 'features' file containing unique feature genomic locations -in tab separated format ( *_features.bed / .txt / .tsv / .gz), e.g. chr, start and end
A 'barcodes' file containing unique barcode names ( _barcode.txt / .tsv / .gz)
A 'matrix' A file containing indexes of non zero entries (_matrix.mtx / .gz)

Usage

read_sparse_matrix(files_dir_list, ref = c("hg38", "mm10")[1], verbose = TRUE)
read_sparse_matrix(files_dir_list, ref = c("hg38", "mm10")[1], verbose = TRUE)

Arguments

`files_dir_list`	A named character vector containing the full path towards folders. Each folder should contain only the Feature file, the Barcode file and the Matrix file (see description).
`ref`	Reference genome (used to filter non-canonical chromosomes).
`verbose`	Print ?

Value

Returns a list containing a datamatrix and cell annotation

Examples

## Not run: 
sample_dirs = c("/path/to/folder1/", "/path/to/folder2/")
names(sample_dirs) = c("sample_1", "sample_2")
out <- read_sparse_matrix(sample_dirs, ref = "hg38")
head(out$datamatrix)
head(out$annot_raw)

## End(Not run)

## Not run: 
sample_dirs = c("/path/to/folder1/", "/path/to/folder2/")
names(sample_dirs) = c("sample_1", "sample_2")
out <- read_sparse_matrix(sample_dirs, ref = "hg38")
head(out$datamatrix)
head(out$annot_raw)

## End(Not run)

Rebin Helper for rebin_matrix function

Description

Rebin Helper for rebin_matrix function

Usage

rebin_helper(mat_df)
rebin_helper(mat_df)

Arguments

mat_df

A data.frame corresponding to sparse matrix indexes & values.

Value

a data.frame grouped mean-summarised by col and new_row

Transforms a bins x cells count matrix into a larger bins x cells count matrix.

Description

This functions is best used to re-count large number of small bins or peaks (e.g. <= 5000bp) into equal or larger sized bins. The genome is either cut in fixed bins (e.g. 50,000bp) or into an user defined number of bins. Bins are calculated based on the canconical chromosomes. Note that if peaks are larger than bins, or if peaks are overlapping multiple bins, the signal is added to each bin. Users can increase the minimum overlap to consider peaks overlapping bins (by default 150bp, size of a nucleosome) to disminish the number of peaks overlapping multiple region. Any peak smaller than the minimum overlapp threshold will be dismissed. Therefore, library size might be slightly different from peaks to bins if signal was duplicated into multiple bins or ommitted due to peaks smaller than minimum overlap.

Usage

rebin_matrix(
  mat,
  bin_width = 50000,
  custom_annotation = NULL,
  minoverlap = 500,
  verbose = TRUE,
  ref = "hg38",
  nthreads = 1,
  rebin_function = rebin_helper
)
rebin_matrix(
  mat,
  bin_width = 50000,
  custom_annotation = NULL,
  minoverlap = 500,
  verbose = TRUE,
  ref = "hg38",
  nthreads = 1,
  rebin_function = rebin_helper
)

Arguments

`mat`	A matrix of peaks x cells
`bin_width`	Width of bins to produce in base pairs (minimum 500) (50000)
`custom_annotation`	A GenomicRanges object specifying the new features to count the matrix on instead of recounting on genomic bins. If not NULL, takes predecency over bin_width.
`minoverlap`	Minimum overlap between the original bins and the new features to consider the peak as overlapping the bin . We recommand to put this number at exactly half of the original bin size (e.g. 500bp for original bin size of 1000bp) so that no original bins are counted twice. (500)
`verbose`	Verbose
`ref`	Reference genome to use (hg38)
`nthreads`	Number of threads to use for paralell processing

Value

A sparse matrix of larger bins or peaks.

Examples

mat = create_scDataset_raw()$mat
binned_mat = rebin_matrix(mat,bin_width = 10e6)
dim(binned_mat)

mat = create_scDataset_raw()$mat
binned_mat = rebin_matrix(mat,bin_width = 10e6)
dim(binned_mat)

Reduce dimension with batch corrections

Description

Reduce dimension with batch corrections

Usage

reduce_dim_batch_correction(scExp, mat, batch_list, n)
reduce_dim_batch_correction(scExp, mat, batch_list, n)

Arguments

`scExp`	SingleCellExperiment
`mat`	The normalized count matrix
`batch_list`	List of batches
`n`	Number of PCs to keep

Value

A list containing the SingleCellExperiment with batch info and the corrected pca

Reduce dimensions (PCA, TSNE, UMAP)

Description

Reduce dimensions (PCA, TSNE, UMAP)

Usage

reduce_dims_scExp(
  scExp,
  dimension_reductions = c("PCA", "UMAP"),
  n = 10,
  batch_correction = FALSE,
  batch_list = NULL,
  remove_PC = NULL,
  verbose = TRUE
)
reduce_dims_scExp(
  scExp,
  dimension_reductions = c("PCA", "UMAP"),
  n = 10,
  batch_correction = FALSE,
  batch_list = NULL,
  remove_PC = NULL,
  verbose = TRUE
)

Arguments

`scExp`	A SingleCellExperiment object.
`dimension_reductions`	A character vector of methods to apply. (c('PCA','TSNE','UMAP'))
`n`	Numbers of dimensions to keep for PCA. (50)
`batch_correction`	Do batch correction ? (FALSE)
`batch_list`	List of characters. Names are batch names, characters are sample names.
`remove_PC`	A vector of string indicating which principal components to remove before downstream analysis as probably correlated to library size. Should be under the form : 'Component_1', 'Component_2', ... Recommended when using 'TFIDF' normalization method. (NULL)
`verbose`	Print messages ?(TRUE)

Value

A SingleCellExperiment object containing feature spaces. See ?reduceDims().

Examples


raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp =  normalize_scExp(scExp, "CPM")
scExp = reduce_dims_scExp(scExp,dimension_reductions=c("PCA","UMAP"))

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp =  normalize_scExp(scExp, "CPM")
scExp = reduce_dims_scExp(scExp,dimension_reductions=c("PCA","UMAP"))

Remove chromosome M from scExprownames

Description

Remove chromosome M from scExprownames

Usage

remove_chr_M_fun(scExp, verbose)
remove_chr_M_fun(scExp, verbose)

Arguments

`scExp`	A SingleCellExperiment
`verbose`	Print ?

Value

A SingleCellExperiment without chromosome M (mitochondrial chr)

Remove non canonical chromosomes from scExp

Description

Remove non canonical chromosomes from scExp

Usage

remove_non_canonical_fun(scExp, verbose)
remove_non_canonical_fun(scExp, verbose)

Arguments

`scExp`	A SingleCellExperiment
`verbose`	Print ?

Value

A SingleCellExperiment without non canonical chromosomes (random,unknown, contigs etc...)

Resutls of hypergeometric gene set enrichment test

Description

Run hypergeometric enrichment test and combine significant pathways into a data.frame

Usage

results_enrichmentTest(
  differentialGenes,
  enrichment_qval,
  GeneSets,
  GeneSetsDf,
  GenePool
)
results_enrichmentTest(
  differentialGenes,
  enrichment_qval,
  GeneSets,
  GeneSetsDf,
  GenePool
)

Arguments

`differentialGenes`	Genes significantly over / under expressed
`enrichment_qval`	Adusted p-value threshold above which a pathway is considered significative
`GeneSets`	List of pathways
`GeneSetsDf`	Data.frame of pathways
`GenePool`	Pool of possible genes for testing

Value

A data.frame with pathways passing q.value threshold

Retrieve Top and Bot most contributing features of PCA

Description

Retrieve Top and Bot most contributing features of PCA

Usage

retrieve_top_bot_features_pca(
  pca,
  counts,
  component,
  n_top_bot,
  absolute = FALSE
)
retrieve_top_bot_features_pca(
  pca,
  counts,
  component,
  n_top_bot,
  absolute = FALSE
)

Arguments

`pca`	A matrix/data.frame of rotated data
`counts`	the normalized counts used for PCA
`component`	the componenent of interest
`n_top_bot`	the number of top & bot features to take
`absolute`	If TRUE, return the top features in absolute values instead.

Value

a data.frame of top bot contributing features in PCA

Run pairwise tests

Description

Run pairwise tests

Usage

run_pairwise_tests(
  affectation,
  by,
  counts,
  feature,
  method,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)
run_pairwise_tests(
  affectation,
  by,
  counts,
  feature,
  method,
  progress = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

`affectation`	An annotation data.frame with cell_cluster and cell_id columns
`by`	= A character specifying the column of the object containing the groups of cells to compare.
`counts`	Count matrix
`feature`	Feature data.frame
`method`	DA method, Wilcoxon or edgeR
`progress`	A shiny Progress instance to display progress bar.
`BPPARAM`	BPPARAM object for multiprocessing. See bpparam for more informations. Will take the default BPPARAM set in your R session.

Value

A list containing objects for DA function

Run tsne on single cell experiment

Description

Run tsne on single cell experiment

Usage

run_tsne_scExp(scExp, verbose = FALSE)
run_tsne_scExp(scExp, verbose = FALSE)

Arguments

`scExp`	A SingleCellExperiment Object
`verbose`	Print ?

Value

A colored kable with the number of cells per sample for display

A SingleCellExperiment outputed by ChromSCape

Description

Data from a single-cell ChIP-seq experiment against H3K4me3 active mark from two cell lines, Jurkat B cells and Ramos T cells from Grosselin et al., 2019. The count matrices, on 5kbp bins, were given to ChromSCape and the filtering parameter was set to 3% of cells active in regions and subsampled down to 150 cells per sample. After correlation filtering, the experiment is composed of respectively 51 and 55 cells from Jurkat & Ramos and 5499 5kbp-genomic bins where signal is located.

Usage

data("scExp")
data("scExp")

Format

scExp - a SingleCellExperiment with 106 cells and 5499 features (genomic bins) in hg38:

chr: A SingleCellExperiment

Details

The scExp is composed of :

counts and normcounts assays, PCA, UMAP, and Correlation matrix in reducedDims(scExp)
Assignation of genes to genomic bins in rowRanges(scExp)
Cluster information in colData(scExp) correlation
Hierarchical clustering dengogram in metadata$hc_cor
Consensus clustering raw data in metadata$consclust
Consensus clustering cluster-consensus and item consensus dataframes in metadata$icl
Differential analysis in metadata$diff
Gene Set Analysis in metadata$enr

Examples

data("scExp")
plot_reduced_dim_scExp(scExp)
plot_reduced_dim_scExp(scExp,color_by = "cell_cluster")
plot_heatmap_scExp(scExp)
plot_differential_volcano_scExp(scExp,  "C1")
plot_differential_summary_scExp(scExp)
data("scExp")
plot_reduced_dim_scExp(scExp)
plot_reduced_dim_scExp(scExp,color_by = "cell_cluster")
plot_heatmap_scExp(scExp)
plot_differential_volcano_scExp(scExp,  "C1")
plot_differential_summary_scExp(scExp)

Separate BAM files into cell cluster BAM files

Description

Separate BAM files into cell cluster BAM files

Usage

separate_BAM_into_clusters(affectation, odir, merged_bam)
separate_BAM_into_clusters(affectation, odir, merged_bam)

Arguments

affectation

An annotation data.frame containing cell_id and cell_cluster columns

odir

A valid output directory path

merged_bam

A list of merged bam file paths

@importFrom Rsamtools filterBam ScanBamParam

Value

Create one BAM per cluster from one BAM per condition

Determine Count matrix separator ("tab" or ",")

Description

Determine Count matrix separator ("tab" or ",")

Usage

separator_count_mat(path_to_matrix)
separator_count_mat(path_to_matrix)

Arguments

path_to_matrix

A path towards the count matrix to check

Value

A character separator

Smooth a vector of values with nb_bins left and righ values

Description

Smooth a vector of values with nb_bins left and righ values

Usage

smoothBin(bin_score, nb_bins = 10)
smoothBin(bin_score, nb_bins = 10)

Arguments

bin_score

A numeric vector of values to be smoothed

nb_bins

Number of values to take left and right

@importFrom BiocParallel bpvec

Value

A smooth vector of the same size

Subsample scExp

Description

Randomly sample x cells from each sample in a SingleCellExperiment to return a subsampled SingleCellExperiment with all samples having maximum n cells. If n is higher than the number of cell in a sample, this sample will not be subsampled.

Usage

subsample_scExp(scExp, n_cell_per_sample = 500, n_cell_total = NULL)
subsample_scExp(scExp, n_cell_per_sample = 500, n_cell_total = NULL)

Arguments

`scExp`	A SingleCellExperiment
`n_cell_per_sample`	An integer number of cells to subsample for each sample. Exclusive with n_cells_total. (500)
`n_cell_total`	An integer number of cells to subsample in total. Exclusive with n_cell_per_sample (NULL).

Value

A subsampled SingleCellExperiment

Examples

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_sub = subsample_scExp(scExp,50)
## Not run: num_cell_scExp(scExp_sub)

raw <- create_scDataset_raw()
scExp = create_scExp(raw$mat, raw$annot)
scExp_sub = subsample_scExp(scExp,50)
## Not run: num_cell_scExp(scExp_sub)

Peak calling on cell clusters

Description

This functions does peak calling on each cell population in order to refine gene annotation for large bins. For instance, a 50000bp bins might contain the TSS of several genes, while in reality only one or two of these genes are overlapping the signal (peak). To do so, first in-silico cell sorting is applied based on previously defined clusters contained in the SingleCellExperiment. Taking BAM files of each sample as input, samtools pools then splits reads from each cell barcode into 1 BAM file per cell cluster (pseudo-bulk). Then MACS2 calls peaks on each cluster. The peaks are aggregated and merged if closer to a certain distance defined by user (10000bp). Then,

This function takes as input a SingleCellExperiment, that must contain a 'cell_cluster' column in it's colData, an output directory where to store temporary files, the list of BAM files corresponding to each sample and containing the cell barcode information as a tag (for instance tag CB:Z:xxx, XB:Z:xxx or else...) or single-cell BED files containing the raw reads and corresponding to the 'barcode' column metadata, the p.value used by MACS2 to distinguish significant peaks, the reference genome (either hg38 or mm10), the maximal merging distance in bp and a data.frame containing gene TSS genomic cooridnates of corresponding genome (if set to NULL, will automatically load geneTSS). The output is a SingleCellExperiment with GRanges object containing ranges of each merged peaks that falls within genomic bins of the SingleCellExperiment, saving the bin range as additional column (window_chr, window_start, window_end), as well as the closests genes and their distance relative to the peak. The peaks may be present in several rows if multiple genes are close / overlap to the peaks.

Note that the user must have MACS2 installed and available in the PATH. Users can open command terminal and type 'which macs2' to verify the availability of these programs. Will only work on unix operating system. Check operating system with 'print(.Platform)'.

Usage

subset_bam_call_peaks(
  scExp,
  odir,
  input,
  format = "BAM",
  p.value = 0.05,
  ref = "hg38",
  peak_distance_to_merge = 10000,
  geneTSS_annotation = NULL,
  run_coverage = FALSE,
  progress = NULL
)
subset_bam_call_peaks(
  scExp,
  odir,
  input,
  format = "BAM",
  p.value = 0.05,
  ref = "hg38",
  peak_distance_to_merge = 10000,
  geneTSS_annotation = NULL,
  run_coverage = FALSE,
  progress = NULL
)

Arguments

`scExp`	A SingleCellExperiment object
`odir`	Output directory where to write temporary files and each cluster's BAM file
`input`	A character vector of file paths to each sample's BAM file, containing cell barcode information as tags. BAM files can be paired-end or single-end.
`format`	Format of the input data, either "BAM" or "scBED".
`p.value`	a p-value to use for MACS2 to determine significant peaks. (0.05)
`ref`	A reference genome, either hg38 or mm10. ('hg38')
`peak_distance_to_merge`	Maximal distance to merge peaks together after peak calling , in bp. (10000)
`geneTSS_annotation`	A data.frame annotation of genes TSS. If NULL will automatically load Gencode list of genes fro specified reference genome.
`run_coverage`	Create coverage tracks (.bw) for each cluster ?
`progress`	A shiny Progress instance to display progress bar.

Details

The BED files of the peaks called for each clusters, as well as the merged peaks are written in the output directory.

Value

A SingleCellExperiment with refinded annotation

Examples

## Not run: 
data("scExp")
subset_bam_call_peaks(scExp, "path/to/out/", list("sample1" = 
   "path/to/BAM/sample1.bam", "sample2" = "path/to/BAM/sample2.bam"),
   p.value = 0.05, ref = "hg38", peak_distance_to_merge = 10000, 
   geneTSS_annotation = NULL)

## End(Not run)
## Not run: 
data("scExp")
subset_bam_call_peaks(scExp, "path/to/out/", list("sample1" = 
   "path/to/BAM/sample1.bam", "sample2" = "path/to/BAM/sample2.bam"),
   p.value = 0.05, ref = "hg38", peak_distance_to_merge = 10000, 
   geneTSS_annotation = NULL)

## End(Not run)

Summary of the differential analysis

Description

Summary of the differential analysis

Usage

summary_DA(scExp, qval.th = 0.01, logFC.th = 1, min.percent = 0.01)
summary_DA(scExp, qval.th = 0.01, logFC.th = 1, min.percent = 0.01)

Arguments

`scExp`	A SingleCellExperiment object containing consclust with selected number of cluster.
`qval.th`	Adjusted p-value threshold. (0.01)
`logFC.th`	Fold change threshold. (1)
`min.percent`	Minimum fraction of cells having the feature active to consider it as significantly differential. (0.01)

Value

A table summary of the differential analysis

Examples

data('scExp')
summary_DA(scExp)
data('scExp')
summary_DA(scExp)

Swap main & alternative Experiments, with fixed colData

Description

Swap main & alternative Experiments, with fixed colData

Usage

swapAltExp_sameColData(scExp, alt)
swapAltExp_sameColData(scExp, alt)

Arguments

`scExp`	A SingleCellExperiment
`alt`	Name of the alternative experiment

Value

A swapped SingleCellExperiment with the exact same colData.

Examples

data(scExp) 
swapAltExp_sameColData(scExp, "peaks")

data(scExp) 
swapAltExp_sameColData(scExp, "peaks")

Creates table of enriched genes sets

Description

Creates table of enriched genes sets

Usage

table_enriched_genes_scExp(
  scExp,
  set = "Both",
  group = "C1",
  enr_class_sel = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark")
)
table_enriched_genes_scExp(
  scExp,
  set = "Both",
  group = "C1",
  enr_class_sel = c("c1_positional", "c2_curated", "c3_motif", "c4_computational",
    "c5_GO", "c6_oncogenic", "c7_immunologic", "hallmark")
)

Arguments

`scExp`	A SingleCellExperiment object containing list of enriched gene sets.
`set`	A character vector, either 'Both', 'Overexpressed' or 'Underexpressed'. ('Both')
`group`	The "group" name from differential analysis. Can be the cluster name or the custom name in case of a custom differential analysis.
`enr_class_sel`	Which classes of gene sets to show. (c('c1_positional', 'c2_curated', ...))

Value

A DT::data.table of enriched gene sets.

Examples

data("scExp")
## Not run: table_enriched_genes_scExp(scExp)

data("scExp")
## Not run: table_enriched_genes_scExp(scExp)

Warning for differential_analysis_scExp

Description

Warning for differential_analysis_scExp

Usage

warning_DA(scExp, by, de_type, method, block, group, ref)
warning_DA(scExp, by, de_type, method, block, group, ref)

Arguments

`scExp`	A SingleCellExperiment object containing consclust with selected number of cluster.
`by`	= A character specifying the column of the object containing the groups of cells to compare. Exclusive with de_type == custom
`de_type`	Type of comparisons. Either 'one_vs_rest', to compare each cluster against all others, or 'pairwise' to make 1 to 1 comparisons. ('one_vs_rest')
`method`	Wilcoxon or edgerGLM
`block`	Use batches as blocking factors ?
`group`	If de_type is custom, the group to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows
`ref`	If de_type is custom, the reference to compare (data.frame), must be a one-column data.frame with cell_clusters or sample_id as character in rows

Value

Warnings or Errors if the input are not correct

warning_filter_correlated_cell_scExp

Description

warning_filter_correlated_cell_scExp

Usage

warning_filter_correlated_cell_scExp(
  scExp,
  random_iter,
  corr_threshold,
  percent_correlation,
  run_tsne,
  downsample,
  verbose
)
warning_filter_correlated_cell_scExp(
  scExp,
  random_iter,
  corr_threshold,
  percent_correlation,
  run_tsne,
  downsample,
  verbose
)

Arguments

`scExp`	A SingleCellExperiment object containing 'Cor', a correlation matrix, in reducedDims.
`random_iter`	Number of random matrices to create to calculate random correlation scores. (50)
`corr_threshold`	Quantile of random correlation score above which a cell is considered to be 'correlated' with another cell. (99)
`percent_correlation`	Percentage of the cells that any cell must be 'correlated' to in order to not be filtered. (1)
`run_tsne`	Re-run tsne ? (FALSE)
`downsample`	Number of cells to calculate correlation filtering threshold ? (2500)
`verbose`	(TRUE)

Value

Warnings or Errors if the input are not correct

A warning helper for plot_reduced_dim_scExp

Description

A warning helper for plot_reduced_dim_scExp

Usage

warning_plot_reduced_dim_scExp(
  scExp,
  color_by,
  reduced_dim,
  downsample,
  transparency,
  size,
  max_distanceToTSS,
  annotate_clusters,
  min_quantile,
  max_quantile
)
warning_plot_reduced_dim_scExp(
  scExp,
  color_by,
  reduced_dim,
  downsample,
  transparency,
  size,
  max_distanceToTSS,
  annotate_clusters,
  min_quantile,
  max_quantile
)

Arguments

`scExp`	A SingleCellExperiment Object
`color_by`	Feature used for coloration
`reduced_dim`	Reduced Dimension used for plotting
`downsample`	Number of cells to downsample
`transparency`	Alpha parameter, between 0 and 1
`size`	Size of the points.
`max_distanceToTSS`	Numeric. Maximum distance to a gene's TSS to consider a region linked to a gene.
`annotate_clusters`	A logical indicating if clusters should be labelled. The 'cell_cluster' column should be present in metadata.
`min_quantile`	The lower threshold to remove outlier cells, as quantile of cell embeddings (between 0 and 0.5).
`max_quantile`	The upper threshold to remove outlier cells, as quantile of cell embeddings (between 0.5 and 1).

Value

Warning or errors if the inputs are not correct

Warning for raw_counts_to_sparse_matrix

Description

Warning for raw_counts_to_sparse_matrix

Usage

warning_raw_counts_to_sparse_matrix(
  files_dir_list,
  file_type = c("scBAM", "scBED", "SparseMatrix"),
  peak_file = NULL,
  n_bins = NULL,
  bin_width = NULL,
  genebody = NULL,
  extendPromoter = 2500,
  verbose = TRUE,
  ref = "hg38"
)
warning_raw_counts_to_sparse_matrix(
  files_dir_list,
  file_type = c("scBAM", "scBED", "SparseMatrix"),
  peak_file = NULL,
  n_bins = NULL,
  bin_width = NULL,
  genebody = NULL,
  extendPromoter = 2500,
  verbose = TRUE,
  ref = "hg38"
)

Arguments

`files_dir_list`	A named character vector of directory containing the raw files
`file_type`	Input file(s) type(s) ('scBED','scBAM','SparseMatrix')
`peak_file`	A file containing genomic location of peaks (NULL)
`n_bins`	The number of bins to tile the genome (NULL)
`bin_width`	The size of bins to tile the genome (NULL)
`genebody`	Count on genes (body + promoter) ? (NULL)
`extendPromoter`	If counting on genes, number of base pairs to extend up or downstream of TSS (2500).
`verbose`	Verbose (TRUE)
`ref`	reference genome to use (hg38)

Value

Error or warnings if the input are not correct

Wrapper around 'FeatureMatrix' function from Signac Package

Description

Wrapper around 'FeatureMatrix' function from Signac Package

Usage

wrapper_Signac_FeatureMatrix(
  files_dir_list,
  which,
  ref = "hg38",
  process_n = 2000,
  set_future_plan = TRUE,
  verbose = TRUE,
  progress = NULL
)
wrapper_Signac_FeatureMatrix(
  files_dir_list,
  which,
  ref = "hg38",
  process_n = 2000,
  set_future_plan = TRUE,
  verbose = TRUE,
  progress = NULL
)

Arguments

`files_dir_list`	A named character vector of directories containing the files. The names correspond to sample names.
`which`	A GenomicRanges containing the features to count on.
`ref`	Reference genome to use (hg38).Chromosomes that are not present in the canonical chromosomes of the given reference genome will be excluded from the matrix.
`process_n`	Number of regions to load into memory at a time, per thread. Processing more regions at once can be faster but uses more memory. (2000)
`set_future_plan`	Set 'multisession' plan within the function (TRUE). If TRUE, the previous plan (e.g. future::plan()) will be set back on exit.
`verbose`	Verbose (TRUE).
`progress`	Progress object for Shiny.

Details

Signac & future are not required packages for ChromSCape as they are required only for the fragment matrix calculations. To use this function, install Signac package first (future will be installed as a dependency). For the simplicity of the application & optimization, the function by defaults sets future::plan("multisession") with workers = future::availableCores(omit = 1) in order to allow parallel processing with Signac. On exit the plan is re-set to the previously set future plan. Note that future multisession may have trouble running when VPN is on. To run in parallel, first deactivate your VPN if you encounter long runtimes.

Value

A sparse matrix of features x cells

References

Stuart el al., Multimodal single-cell chromatin analysis with Signac bioRxiv https://doi.org/10.1101/2020.11.09.373613

Examples

## Not run: 
gr_bins = define_feature("hg38", bin_width = 50000)
wrapper_Signac_FeatureMatrix("/path/to/dir_containing_fragment_files",
 gr_bins, ref = "hg38")

## End(Not run)
## Not run: 
gr_bins = define_feature("hg38", bin_width = 50000)
wrapper_Signac_FeatureMatrix("/path/to/dir_containing_fragment_files",
 gr_bins, ref = "hg38")

## End(Not run)

Package 'ChromSCape'

Help Index

Find nearest peaks of each gene and return refined annotation

Description

Usage

Arguments

Value

annotToCol2

Description

Usage

Arguments

Value

Examples

Helper binary column for anocol function

Description

Usage

Arguments

Value

Helper binary column for anocol function

Description

Usage

Arguments

Value

Count bam files on interval to create count indexes

Description

Usage

Arguments

Value

Count bed files on interval to create count indexes

Description

Usage

Arguments

Value

Estimate copy number alterations in cytobands

Description

Usage

Arguments

Value

Examples

Calculate Fraction of reads in each cytobands

Description

Usage

Arguments

Value

Examples

Estimate the copy gains/loss of tumor vs normal based on log2-ratio of fraction of reads

Description

Usage

Arguments

Value

Examples

Calculate the log2-ratio of tumor vs normal fraction of reads in cytobands

Description

Usage

Arguments

Value

Examples

Calling MACS2 peak caller and merging resulting peaks

Description

Usage

Arguments

Value

changeRange

Description

Usage

Arguments

Value

A data.frame with the number of targets of each TF in ChEA3

Description

Usage

Format

References

Examples

Check if matrix rownames are well formated and correct if needed

Description

Usage

Arguments

Value

Choose a number of clusters

Description