Package 'SCOPE' reference manual

Title:	A normalization and copy number estimation method for single-cell DNA sequencing
Description:	Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.
Authors:	Rujin Wang, Danyu Lin, Yuchao Jiang
Maintainer:	Rujin Wang <[email protected]>
License:	GPL-2
Version:	1.19.0
Built:	2025-03-27 05:44:42 UTC
Source:	https://github.com/bioc/SCOPE

Pre-stored coverageObj.scope data for demonstration purposes

Description

Pre-stored coverageObj.scope data for demonstration purposes

Usage

coverageObj.scopeDemo
coverageObj.scopeDemo

Format

Pre-computed using whole genome sequencing data of three single cells from 10X Genomics Single-Cell CNV solution

Get bam file directories, sample names, and whole genomic bins

Description

Get bam file directories, sample names, and whole genomic bins from .bed file

Usage

get_bam_bed(bamdir, sampname, hgref = "hg19", resolution = 500, 
            sex = FALSE)
get_bam_bed(bamdir, sampname, hgref = "hg19", resolution = 500, 
            sex = FALSE)

Arguments

`bamdir`	vector of the directory of a bam file. Should be in the same order as sample names in `sampname`.
`sampname`	vector of sample names. Should be in the same order as bam directories in `bamdir`.
`hgref`	reference genome. This should be 'hg19', 'hg38' or 'mm10'. Default is human genome `hg19`.
`resolution`	numeric value of fixed bin-length. Default is `500`. Unit is "kb".
`sex`	logical, whether to include sex chromosomes. Default is `FALSE`.

Value

A list with components

`bamdir`	A vector of bam directories
`sampname`	A vector of sample names
`ref`	A GRanges object specifying whole genomic bin positions

Author(s)

Rujin Wang [email protected]

Examples

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir, sampname = sampname_raw, 
                        hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir, sampname = sampname_raw, 
                        hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

Get read coverage from single-cell DNA sequencing

Description

Get read coverage for each genomic bin across all single cells from scDNA-seq. Blacklist regions, such as segmental duplication regions and gaps near telomeres/centromeres will be masked prior to getting coverage.

Usage

get_coverage_scDNA(bambedObj, mapqthres, seq, hgref = "hg19")
get_coverage_scDNA(bambedObj, mapqthres, seq, hgref = "hg19")

Arguments

`bambedObj`	object returned from `get_bam_bed`
`mapqthres`	mapping quality threshold of reads
`seq`	the sequencing method to be used. This should be either 'paired-end' or 'single-end'
`hgref`	reference genome. This should be 'hg19', 'hg38' or 'mm10'. Default is human genome `hg19`.

Value

`Y`	Read depth matrix

Author(s)

Rujin Wang [email protected]

Examples

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")

# Getting raw read depth
coverageObj <- get_coverage_scDNA(bambedObj,
                                mapqthres = 40,
                                seq = 'paired-end', 
                                hgref = "hg38")
Y_raw <- coverageObj$Y

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")

# Getting raw read depth
coverageObj <- get_coverage_scDNA(bambedObj,
                                mapqthres = 40,
                                seq = 'paired-end', 
                                hgref = "hg38")
Y_raw <- coverageObj$Y

Compute GC content

Description

Compute GC content for each bin

Usage

get_gc(ref, hgref = "hg19")
get_gc(ref, hgref = "hg19")

Arguments

`ref`	GRanges object returned from `get_bam_bed`
`hgref`	reference genome. This should be 'hg19', 'hg38' or 'mm10'. Default is human genome `hg19`.

Value

`gc`	Vector of GC content for each bin/target

Author(s)

Rujin Wang [email protected]

Examples

## Not run: 
library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

gc <- get_gc(ref_raw, hgref = "hg38")

## End(Not run)

## Not run: 
library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

gc <- get_gc(ref_raw, hgref = "hg38")

## End(Not run)

Compute Gini coefficients for single cells

Description

Gini index is defined as two times the area between the Lorenz curve and the diagonal.

Usage

get_gini(Y)
get_gini(Y)

Arguments

`Y`	raw read depth matrix after quality control procedure

Value

Gini

Vector of Gini coefficients for single cells from scDNA-seq

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

Gini <- get_gini(Y_sim)

Compute mappability

Description

Compute mappability for each bin. Note that scDNA sequencing is whole-genome amplification and the mappability score is essential to determine variable binning method. Mappability track for 100-mers on the GRCh37/hg19 human reference genome from ENCODE is pre-saved. Compute the mean of mappability scores that overlapped reads map to bins, weighted by the width of mappability tracks on the genome reference. Use liftOver utility to calculate mappability for hg38, which is pre-saved as well. For mm10, there are two workarounds: 1) set all mappability to 1 to avoid extensive computation; 2) adopt QC procedures based on annotation results, e.g., filter out bins within black list regions, which generally have low mappability.

Usage

get_mapp(ref, hgref = "hg19")
get_mapp(ref, hgref = "hg19")

Arguments

`ref`	GRanges object returned from `get_bam_bed`
`hgref`	reference genome. This should be 'hg19', 'hg38' or 'mm10'. Default is human genome `hg19`.

Value

mapp

Vector of mappability for each bin/target

Author(s)

Rujin Wang [email protected]

Examples

## Not run: 
library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

mapp <- get_mapp(ref_raw, hgref = "hg38")

## End(Not run)

## Not run: 
library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
bamdir <- bambedObj$bamdir
sampname_raw <- bambedObj$sampname
ref_raw <- bambedObj$ref

mapp <- get_mapp(ref_raw, hgref = "hg38")

## End(Not run)

Get QC metrics for single cells

Description

Perform QC step on single cells.

Usage

get_samp_QC(bambedObj)
get_samp_QC(bambedObj)

Arguments

bambedObj

object returned from get_bam_bed

Value

QCmetric

A matrix containing total number/proportion of reads, total number/proportion of mapped reads, total number/proportion of mapped non-duplicate reads, and number/proportion of reads with mapping quality greater than 20

Author(s)

Rujin Wang [email protected]

Examples

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
QCmetric_raw = get_samp_QC(bambedObj)

library(WGSmapp)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfolder <- system.file('extdata', package = 'WGSmapp')
bamFile <- list.files(bamfolder, pattern = '*.dedup.bam$')
bamdir <- file.path(bamfolder, bamFile)
sampname_raw <- sapply(strsplit(bamFile, '.', fixed = TRUE), '[', 1)
bambedObj <- get_bam_bed(bamdir = bamdir,
                            sampname = sampname_raw, 
                            hgref = "hg38")
QCmetric_raw = get_samp_QC(bambedObj)

A post cross-sample segmentation integer copy number matrix returned by SCOPE in the demo

Description

A post cross-sample segmentation integer copy number matrix returned by SCOPE in the demo

Usage

iCN_sim
iCN_sim

Format

A post cross-sample segmentation integer copy number matrix of five toy cells returned by SCOPE

Ploidy pre-initialization

Description

Pre-estimate ploidies across all cells

Usage

initialize_ploidy(Y, Yhat, ref, maxPloidy = 6, minPloidy = 1.5,
                minBinWidth = 5, SoS.plot = FALSE)
initialize_ploidy(Y, Yhat, ref, maxPloidy = 6, minPloidy = 1.5,
                minBinWidth = 5, SoS.plot = FALSE)

Arguments

`Y`	raw read depth matrix after quality control procedure
`Yhat`	normalized read depth matrix
`ref`	GRanges object after quality control procedure
`maxPloidy`	maximum ploidy candidate. Defalut is `6`
`minPloidy`	minimum ploidy candidate. Defalut is `1.5`
`minBinWidth`	the minimum number of bins for a changed segment. Defalut is `5`
`SoS.plot`	logical, whether to generate ploidy pre-estimation plots. Default is `FALSE`.

Value

ploidy.SoS

Vector of pre-estimated ploidies for each cell

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

Group-wise ploidy pre-initialization

Description

Pre-estimate ploidies across cells with shared clonal memberships

Usage

initialize_ploidy_group(Y, Yhat, ref, groups, 
                        maxPloidy = 6, minPloidy = 1.5,
                        minBinWidth = 5, SoS.plot = FALSE)
initialize_ploidy_group(Y, Yhat, ref, groups, 
                        maxPloidy = 6, minPloidy = 1.5,
                        minBinWidth = 5, SoS.plot = FALSE)

Arguments

`Y`	raw read depth matrix after quality control procedure
`Yhat`	normalized read depth matrix
`ref`	GRanges object after quality control procedure
`groups`	clonal membership labels for each cell
`maxPloidy`	maximum ploidy candidate. Defalut is `6`
`minPloidy`	minimum ploidy candidate. Defalut is `1.5`
`minBinWidth`	the minimum number of bins for a changed segment. Defalut is `5`
`SoS.plot`	logical, whether to generate ploidy pre-estimation plots. Default is `FALSE`.

Value

ploidy.SoS

Vector of group-wise pre-estimated ploidies for each cell

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Group-wise ploidy initialization
clones <- c("normal", "tumor1", "normal", "tumor1", "tumor1")
ploidy.sim.group <- initialize_ploidy_group(Y = Y_sim, Yhat = Yhat.noK.sim, 
                                ref = ref_sim, groups = clones)
ploidy.sim.group

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Group-wise ploidy initialization
clones <- c("normal", "tumor1", "normal", "tumor1", "tumor1")
ploidy.sim.group <- initialize_ploidy_group(Y = Y_sim, Yhat = Yhat.noK.sim, 
                                ref = ref_sim, groups = clones)
ploidy.sim.group

Normalization of read depth without latent factors under the case-control setting

Description

Assuming that all reads are from diploid regions, fit a Poisson generalized linear model to normalize the raw read depth data from single-cell DNA sequencing, without latent factors under the case-control setting.

Usage

normalize_codex2_ns_noK(Y_qc, gc_qc, norm_index)
normalize_codex2_ns_noK(Y_qc, gc_qc, norm_index)

Arguments

`Y_qc`	read depth matrix after quality control
`gc_qc`	vector of GC content for each bin after quality control
`norm_index`	indices of normal/diploid cells

Value

A list with components

`Yhat`	A list of normalized read depth matrix
`fGC.hat`	A list of estimated GC content bias matrix
`beta.hat`	A list of estimated bin-specific bias vector
`N`	A vector of cell-specific library size factor, which is computed from the genome-wide read depth data

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)
# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))

Gini <- get_gini(Y_sim)
# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))

Normalization of read depth with latent factors using Expectation-Maximization algorithm under the case-control setting

Description

Fit a Poisson generalized linear model to normalize the raw read depth data from single-cell DNA sequencing, with latent factors under the case-control setting. Model GC content bias using an expectation-maximization algorithm, which accounts for the different copy number states.

Usage

normalize_scope(Y_qc, gc_qc, K, norm_index, T, ploidyInt,
                beta0, minCountQC = 20)
normalize_scope(Y_qc, gc_qc, K, norm_index, T, ploidyInt,
                beta0, minCountQC = 20)

Arguments

`Y_qc`	read depth matrix after quality control
`gc_qc`	vector of GC content for each bin after quality control
`K`	Number of latent Poisson factors
`norm_index`	indices of normal/diploid cells
`T`	a vector of integers indicating number of CNV groups. Use BIC to select optimal number of CNV groups. If `T = 1`, assume all reads are from normal regions so that EM algorithm is not implemented. Otherwise, we assume there is always a CNV group of heterozygous deletion and a group of null region. The rest groups are representative of different duplication states.
`ploidyInt`	a vector of initialized ploidy return from `initialize_ploidy`. Users are also allowed to provide prior-knowledge ploidies as the input and to manually tune a few cells that have poor fitting
`beta0`	a vector of initialized bin-specific biases returned from CODEX2 without latent factors
`minCountQC`	the minimum read coverage required for normalization and EM fitting. Defalut is `20`

Value

A list with components

`Yhat`	A list of normalized read depth matrix with EM
`alpha.hat`	A list of absolute copy number matrix
`fGC.hat`	A list of EM estimated GC content bias matrix
`beta.hat`	A list of EM estimated bin-specific bias vector
`g.hat`	A list of estimated Poisson latent factor
`h.hat`	A list of estimated Poisson latent factor
`AIC`	AIC for model selection
`BIC`	BIC for model selection
`RSS`	RSS for model selection
`K`	Number of latent Poisson factors

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

normObj.scope.sim <- normalize_scope(Y_qc = Y_sim, gc_qc = ref_sim$gc,
                                    K = 1, ploidyInt = ploidy.sim,
                                    norm_index = which(Gini<=0.12), T = 1:5,
                                    beta0 = beta.hat.noK.sim)
Yhat.sim <- normObj.scope.sim$Yhat[[which.max(normObj.scope.sim$BIC)]]
fGC.hat.sim <- normObj.scope.sim$fGC.hat[[which.max(normObj.scope.sim$BIC)]]

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

normObj.scope.sim <- normalize_scope(Y_qc = Y_sim, gc_qc = ref_sim$gc,
                                    K = 1, ploidyInt = ploidy.sim,
                                    norm_index = which(Gini<=0.12), T = 1:5,
                                    beta0 = beta.hat.noK.sim)
Yhat.sim <- normObj.scope.sim$Yhat[[which.max(normObj.scope.sim$BIC)]]
fGC.hat.sim <- normObj.scope.sim$fGC.hat[[which.max(normObj.scope.sim$BIC)]]

Normalization of read depth with latent factors using Expectation-Maximization algorithm under the case-control setting in parallel

Description

Usage

normalize_scope_foreach(Y_qc, gc_qc, K, norm_index, T,
    ploidyInt, beta0, minCountQC = 20, nCores = NULL)
normalize_scope_foreach(Y_qc, gc_qc, K, norm_index, T,
    ploidyInt, beta0, minCountQC = 20, nCores = NULL)

Arguments

`Y_qc`	read depth matrix after quality control
`gc_qc`	vector of GC content for each bin after quality control
`K`	Number of latent Poisson factors
`norm_index`	indices of normal/diploid cells
`T`	a vector of integers indicating number of CNV groups. Use BIC to select optimal number of CNV groups. If `T = 1`, assume all reads are from normal regions so that EM algorithm is not implemented. Otherwise, we assume there is always a CNV group of heterozygous deletion and a group of null region. The rest groups are representative of different duplication states.
`ploidyInt`	a vector of initialized ploidy return from `initialize_ploidy`. Users are also allowed to provide prior-knowledge ploidies as the input and to manually tune a few cells that have poor fitting
`beta0`	a vector of initialized bin-specific biases returned from CODEX2 without latent factors
`minCountQC`	the minimum read coverage required for normalization and EM fitting. Defalut is `20`
`nCores`	number of cores to use. If `NULL`, number of cores is detected. Default is `NULL`.

Value

A list with components

`Yhat`	A list of normalized read depth matrix with EM
`alpha.hat`	A list of absolute copy number matrix
`fGC.hat`	A list of EM estimated GC content bias matrix
`beta.hat`	A list of EM estimated bin-specific bias vector
`g.hat`	A list of estimated Poisson latent factor
`h.hat`	A list of estimated Poisson latent factor
`AIC`	AIC for model selection
`BIC`	BIC for model selection
`RSS`	RSS for model selection
`K`	Number of latent Poisson factors

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim,
                            Yhat = Yhat.noK.sim,
                            ref = ref_sim)
ploidy.sim

# Specify nCores = 2 only for checking examples
normObj.scope.sim <- normalize_scope_foreach(Y_qc = Y_sim, 
                        gc_qc = ref_sim$gc,
                        K = 1, ploidyInt = ploidy.sim,
                        norm_index = which(Gini<=0.12), T = 1:5,
                        beta0 = beta.hat.noK.sim, nCores = 2)
Yhat.sim <- normObj.scope.sim$Yhat[[which.max(normObj.scope.sim$BIC)]]
fGC.hat.sim <- normObj.scope.sim$fGC.hat[[which.max(normObj.scope.sim$BIC)]]

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim,
                            Yhat = Yhat.noK.sim,
                            ref = ref_sim)
ploidy.sim

# Specify nCores = 2 only for checking examples
normObj.scope.sim <- normalize_scope_foreach(Y_qc = Y_sim, 
                        gc_qc = ref_sim$gc,
                        K = 1, ploidyInt = ploidy.sim,
                        norm_index = which(Gini<=0.12), T = 1:5,
                        beta0 = beta.hat.noK.sim, nCores = 2)
Yhat.sim <- normObj.scope.sim$Yhat[[which.max(normObj.scope.sim$BIC)]]
fGC.hat.sim <- normObj.scope.sim$fGC.hat[[which.max(normObj.scope.sim$BIC)]]

Group-wise normalization of read depth with latent factors using Expectation-Maximization algorithm and shared clonal memberships

Description

Fit a Poisson generalized linear model to normalize the raw read depth data from single-cell DNA sequencing, with latent factors and shared clonal memberships. Model GC content bias using an expectation-maximization algorithm, which accounts for clonal specific copy number states.

Usage

normalize_scope_group(Y_qc, gc_qc, K, norm_index, groups, T, 
                        ploidyInt, beta0, minCountQC = 20)
normalize_scope_group(Y_qc, gc_qc, K, norm_index, groups, T, 
                        ploidyInt, beta0, minCountQC = 20)

Arguments

`Y_qc`	read depth matrix after quality control
`gc_qc`	vector of GC content for each bin after quality control
`K`	Number of latent Poisson factors
`norm_index`	indices of normal/diploid cells using group/clone labels
`groups`	clonal membership labels for each cell
`T`	a vector of integers indicating number of CNV groups. Use BIC to select optimal number of CNV groups. If `T = 1`, assume all reads are from normal regions so that EM algorithm is not implemented. Otherwise, we assume there is always a CNV group of heterozygous deletion and a group of null region. The rest groups are representative of different duplication states.
`ploidyInt`	a vector of group-wise initialized ploidy return from `initialize_ploidy_group`. Users are also allowed to provide prior-knowledge ploidies as the input and to manually tune a few cells/clones that have poor fitting
`beta0`	a vector of initialized bin-specific biases returned from CODEX2 without latent factors
`minCountQC`	the minimum read coverage required for normalization and EM fitting. Defalut is `20`

Value

A list with components

`Yhat`	A list of normalized read depth matrix with EM
`alpha.hat`	A list of absolute copy number matrix
`fGC.hat`	A list of EM estimated GC content bias matrix
`beta.hat`	A list of EM estimated bin-specific bias vector
`g.hat`	A list of estimated Poisson latent factor
`h.hat`	A list of estimated Poisson latent factor
`AIC`	AIC for model selection
`BIC`	BIC for model selection
`RSS`	RSS for model selection
`K`	Number of latent Poisson factors

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Group-wise ploidy initialization
clones <- c("normal", "tumor1", "normal", "tumor1", "tumor1")
ploidy.sim.group <- initialize_ploidy_group(Y = Y_sim, Yhat = Yhat.noK.sim, 
                                ref = ref_sim, groups = clones)
ploidy.sim.group

normObj.scope.sim.group <- normalize_scope_group(Y_qc = Y_sim, 
                                    gc_qc = ref_sim$gc,
                                    K = 1, ploidyInt = ploidy.sim.group,
                                    norm_index = which(clones=="normal"), 
                                    groups = clones, 
                                    T = 1:5,
                                    beta0 = beta.hat.noK.sim)
Yhat.sim.group <- normObj.scope.sim.group$Yhat[[which.max(
                                    normObj.scope.sim.group$BIC)]]
fGC.hat.sim.group <- normObj.scope.sim.group$fGC.hat[[which.max(
                                    normObj.scope.sim.group$BIC)]]

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Group-wise ploidy initialization
clones <- c("normal", "tumor1", "normal", "tumor1", "tumor1")
ploidy.sim.group <- initialize_ploidy_group(Y = Y_sim, Yhat = Yhat.noK.sim, 
                                ref = ref_sim, groups = clones)
ploidy.sim.group

normObj.scope.sim.group <- normalize_scope_group(Y_qc = Y_sim, 
                                    gc_qc = ref_sim$gc,
                                    K = 1, ploidyInt = ploidy.sim.group,
                                    norm_index = which(clones=="normal"), 
                                    groups = clones, 
                                    T = 1:5,
                                    beta0 = beta.hat.noK.sim)
Yhat.sim.group <- normObj.scope.sim.group$Yhat[[which.max(
                                    normObj.scope.sim.group$BIC)]]
fGC.hat.sim.group <- normObj.scope.sim.group$fGC.hat[[which.max(
                                    normObj.scope.sim.group$BIC)]]

Pre-stored normObj.scope data for demonstration purposes

Description

Pre-stored normObj.scope data for demonstration purposes

Usage

normObj.scopeDemo
normObj.scopeDemo

Format

Pre-computed by SCOPE using pre-stored data Y_sim

Quality control for cells and bins

Description

Perform QC step on single cells and bins.

Usage

perform_qc(Y_raw, sampname_raw, ref_raw, QCmetric_raw,
        cov_thresh = 0, minCountQC = 20, 
        mapq20_thresh = 0.3, mapp_thresh = 0.9,
        gc_thresh = c(20, 80), nMAD = 3)
perform_qc(Y_raw, sampname_raw, ref_raw, QCmetric_raw,
        cov_thresh = 0, minCountQC = 20, 
        mapq20_thresh = 0.3, mapp_thresh = 0.9,
        gc_thresh = c(20, 80), nMAD = 3)

Arguments

`Y_raw`	raw read count matrix returned from `get_coverage_scDNA`
`sampname_raw`	sample names for quality control returned from `get_bam_bed`
`ref_raw`	raw GRanges object with corresponding GC content and mappability for quality control returned from `get_bam_bed`
`QCmetric_raw`	a QC metric for single cells returned from `get_samp_QC`
`cov_thresh`	scalar variable specifying the lower bound of read count summation of each cell. Default is `0`
`minCountQC`	the minimum read coverage required for normalization and EM fitting. Defalut is `20`
`mapq20_thresh`	scalar variable specifying the lower threshold of proportion of reads with mapping quality greater than 20. Default is `0.3`
`mapp_thresh`	scalar variable specifying mappability of each genomic bin. Default is `0.9`
`gc_thresh`	vector specifying the lower and upper bound of GC content threshold for quality control. Default is `20-80`
`nMAD`	scalar variable specifying the number of MAD from the median of total read counts adjusted by library size for each cell. Default is `3`

Value

A list with components

`Y`	read depth matrix after quality control
`sampname`	sample names after quality control
`ref`	A GRanges object specifying whole genomic bin positions after quality control
`QCmetric`	A data frame of QC metric for single cells after quality control

Author(s)

Rujin Wang [email protected]

Examples

Y_raw <- coverageObj.scopeDemo$Y
sampname_raw <- rownames(QCmetric.scopeDemo)
ref_raw <- ref.scopeDemo
QCmetric_raw <- QCmetric.scopeDemo
qcObj <- perform_qc(Y_raw = Y_raw, sampname_raw = sampname_raw,
                ref_raw = ref_raw, QCmetric_raw = QCmetric_raw)

Y_raw <- coverageObj.scopeDemo$Y
sampname_raw <- rownames(QCmetric.scopeDemo)
ref_raw <- ref.scopeDemo
QCmetric_raw <- QCmetric.scopeDemo
qcObj <- perform_qc(Y_raw = Y_raw, sampname_raw = sampname_raw,
                ref_raw = ref_raw, QCmetric_raw = QCmetric_raw)

Visualize EM fitting for each cell.

Description

A pdf file containing EM fitting results and plots is generated.

Usage

plot_EM_fit(Y_qc, gc_qc, norm_index, T, ploidyInt, beta0,
                minCountQC = 20, filename)
plot_EM_fit(Y_qc, gc_qc, norm_index, T, ploidyInt, beta0,
                minCountQC = 20, filename)

Arguments

`Y_qc`	read depth matrix across all cells after quality control
`gc_qc`	vector of GC content for each bin after quality control
`norm_index`	indices of normal/diploid cells
`T`	a vector of integers indicating number of CNV groups. Use BIC to select optimal number of CNV groups. If `T = 1`, assume all reads are from normal regions so that EM algorithm is not implemented. Otherwise, we assume there is always a CNV group of heterozygous deletion and a group of null region. The rest groups are representative of different duplication states.
`ploidyInt`	a vector of initialized ploidy return from `initialize_ploidy`
`beta0`	a vector of initialized bin-specific biases returned from CODEX2 without latent factors
`minCountQC`	the minimum read coverage required for EM fitting. Defalut is `20`
`filename`	the name of output pdf file

Value

pdf file with EM fitting results and two plots: log likelihood, and BIC versus the number of CNV groups.

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)
# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

plot_EM_fit(Y_qc = Y_sim, gc_qc = ref_sim$gc,
        norm_index = which(Gini<=0.12), T = 1:7,
        ploidyInt = ploidy.sim,
        beta0 = beta.hat.noK.sim,
        filename = 'plot_EM_fit_demo.pdf')

Gini <- get_gini(Y_sim)
# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

plot_EM_fit(Y_qc = Y_sim, gc_qc = ref_sim$gc,
        norm_index = which(Gini<=0.12), T = 1:7,
        ploidyInt = ploidy.sim,
        beta0 = beta.hat.noK.sim,
        filename = 'plot_EM_fit_demo.pdf')

Plot post-segmentation copy number profiles of integer values

Description

Show heatmap of inferred integer copy-number profiles by SCOPE with cells clustered by hierarchical clustering

Usage

plot_iCN(iCNmat, ref, Gini, annotation = NULL,
        plot.dendrogram = TRUE, show.names = FALSE, filename)
plot_iCN(iCNmat, ref, Gini, annotation = NULL,
        plot.dendrogram = TRUE, show.names = FALSE, filename)

Arguments

`iCNmat`	inferred integer copy-number matrix by SCOPE, with each column being a cell and each row being a genomic bin
`ref`	GRanges object after quality control procedure
`Gini`	vector of Gini coefficients for each cell, with the same order as that of cells in columns of `iCNmat`
`annotation`	vector of annotation for each cell, with the same order as that of cells in columns of `iCNmat`. Default is `NULL`.
`plot.dendrogram`	logical, whether to plot the dendrogram. Default is `TRUE`.
`show.names`	logical, whether to show cell names by y axis. Default is `FALSE`.
`filename`	name of the output png file

Value

png file with integer copy-number profiles across single cells with specified annotations

Author(s)

Rujin Wang [email protected]

Examples

Gini <- get_gini(Y_sim)
plot_iCN(iCNmat = iCN_sim,
        ref = ref_sim,
        Gini = Gini,
        filename = 'plot_iCN_demo')

Gini <- get_gini(Y_sim)
plot_iCN(iCNmat = iCN_sim,
        ref = ref_sim,
        Gini = Gini,
        filename = 'plot_iCN_demo')

Pre-stored QCmetric data for demonstration purposes

Description

Pre-stored QCmetric data for demonstration purposes

Usage

QCmetric.scopeDemo
QCmetric.scopeDemo

Format

Pre-computed using whole genome sequencing data of three single cells from 10X Genomics Single-Cell CNV solution

A reference genome in the toy dataset

Description

A reference genome in the toy dataset

Usage

ref_sim
ref_sim

Format

A GRanges object with 1544 bins and 1 metadata column of GC content

Pre-stored 500kb-size reference genome for demonstration purposes

Description

Pre-stored 500kb-size reference genome for demonstration purposes

Usage

ref.scopeDemo
ref.scopeDemo

Format

Pre-computed using whole genome sequencing data with GC content and mappability scores

Cross-sample segmentation

Description

SCOPE offers a cross-sample Poisson likelihood-based recursive segmentation, enabling shared breakpoints across cells from the same genetic background.

Usage

segment_CBScs(Y, Yhat, sampname, ref, chr, 
                    mode = "integer", max.ns)
segment_CBScs(Y, Yhat, sampname, ref, chr, 
                    mode = "integer", max.ns)

Arguments

`Y`	raw read depth matrix after quality control procedure
`Yhat`	normalized read depth matrix
`sampname`	vector of sample names
`ref`	GRanges object after quality control procedure
`chr`	chromosome name. Make sure it is consistent with the reference genome.
`mode`	format of returned copy numbers. Only integer mode is supported for scDNA-seq data.
`max.ns`	a number specifying how many rounds of nested structure searching would be performed. Defalut is `0`.

Value

A list with components

`poolcall`	Cross-sample CNV callings indicating shared breakpoints
`finalcall`	Final cross-sample segmented callset of CNVs with genotyping results
`image.orig`	A matrix giving logarithm of normalized z-scores
`image.seg`	A matrix of logarithm of estimated copy number over 2
`iCN`	A matrix of inferred integer copy number profiles

Author(s)

Rujin Wang [email protected]

Examples

Yhat.sim <- normObj.scopeDemo$Yhat[[which.max(normObj.scopeDemo$BIC)]]
segment_cs_chr1 <- segment_CBScs(Y = Y_sim, Yhat = Yhat.sim,
                        sampname = colnames(Y_sim),
                        ref = ref_sim, chr = 'chr1', max.ns = 1)

Yhat.sim <- normObj.scopeDemo$Yhat[[which.max(normObj.scopeDemo$BIC)]]
segment_cs_chr1 <- segment_CBScs(Y = Y_sim, Yhat = Yhat.sim,
                        sampname = colnames(Y_sim),
                        ref = ref_sim, chr = 'chr1', max.ns = 1)

A read count matrix in the toy dataset

Description

A read count matrix in the toy dataset

Usage

Y_sim
Y_sim

Format

A read count matrix with 1544 bins and 39 cells

Package 'SCOPE'

Help Index

Pre-stored coverageObj.scope data for demonstration purposes

Description

Usage

Format

Get bam file directories, sample names, and whole genomic bins

Description

Usage

Arguments

Value

Author(s)

Examples

Get read coverage from single-cell DNA sequencing

Description

Usage

Arguments

Value

Author(s)

Examples

Compute GC content

Description

Usage

Arguments

Value

Author(s)

Examples

Compute Gini coefficients for single cells

Description

Usage

Arguments

Value

Author(s)

Examples

Compute mappability

Description

Usage

Arguments

Value

Author(s)

Examples

Get QC metrics for single cells

Description

Usage

Arguments

Value

Author(s)

Examples

A post cross-sample segmentation integer copy number matrix returned by SCOPE in the demo

Description

Usage

Format

Ploidy pre-initialization

Description

Usage

Arguments

Value

Author(s)

Examples

Group-wise ploidy pre-initialization

Description

Usage

Arguments

Value

Author(s)

Examples

Normalization of read depth without latent factors under the case-control setting

Description

Usage

Arguments

Value

Author(s)

Examples

Normalization of read depth with latent factors using Expectation-Maximization algorithm under the case-control setting

Description

Usage

Arguments

Value

Author(s)

Examples