Title: | Infer locally aneuploid cells using single cell RNA-seq data |
---|---|
Description: | This package uses a statistical framework for rapid and accurate detection of aneuploid cells with local copy number deletion or amplification. Our method uses an EM algorithm with mixtures of Poisson distributions while incorporating cytogenetics information (e.g., regional deletion or amplification) to guide the classification (partCNV). When applicable, we further improve the accuracy by integrating a Hidden Markov Model for feature selection (partCNVH). |
Authors: | Ziyi Li [aut, cre, ctb], Ruoxing Li [ctb] |
Maintainer: | Ziyi Li <[email protected]> |
License: | GPL-2 |
Version: | 1.5.0 |
Built: | 2024-10-30 09:24:14 UTC |
Source: | https://github.com/bioc/partCNV |
This function helps you identify the location of the cytogenetics feature. For example, if the region of interest is chr20(q11.1-q13.1), this function greps the start and end location of this region. Additionally, you can just put in "chr20", and it provides you all the available cytogenetics locations on chromosome 20. It also report the number of genes within the region. If the number of genes is too few, we recommend to include neighboring regions to provide more stable results.
GetCytoLocation(cyto_feature = NULL, chr = NULL, start = NULL, end = NULL)
GetCytoLocation(cyto_feature = NULL, chr = NULL, start = NULL, end = NULL)
cyto_feature |
the cytogenetics location you are interested. It can be of two format: chr20(q11.1-q13.1) or chr20. For the first format, the start and end regions need to be separated by "-". If you are interested in one region for example, chr20(q11.1), put it as chr20(q11.1-11.1). For the second format, all the available regions will be printed for selection. |
chr |
chromosome location of the interested region. This is only used when cyto_feature is null. |
start |
starting location of the interested region. This is only used when cyto_feature is null. |
end |
ending location of the interested region. This is only used when cyto_feature is null. |
If the first format of cyto_feature is provided, the starting and ending location as well as the number of genes overlapped with be provided. If the second format of cyto_feature is provided, all the cytogenetics locations will be displayed for review. If the region location (chr, start, end) is provided, the number of genes overlapped will be the output.
### example 1 GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") ### example 2 GetCytoLocation(cyto_feature = "chr20") ### example 3 GetCytoLocation(chr = "chr20", start = 25600000, end = 49800000)
### example 1 GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") ### example 2 GetCytoLocation(cyto_feature = "chr20") ### example 3 GetCytoLocation(chr = "chr20", start = 25600000, end = 49800000)
This function helps normalize the gene expression count matrix if needed and select the genes that are located in the interested region. This procedure happens after applying GetCytoLocation().
GetExprCountCyto( cytoloc_output, Counts = NULL, normalization = TRUE, qt_cutoff = 0.99 )
GetExprCountCyto( cytoloc_output, Counts = NULL, normalization = TRUE, qt_cutoff = 0.99 )
cytoloc_output |
The output from the function GetCytoLocation(). The function needs to be run with a complete cytogenetics feature input, e.g., chr20(q11.1-11.1), or providing the chr/start/end loction of the interested region. |
Counts |
The single cell expression matrix for the whole genome of the sample. Rows are genes and columns are cell IDs. |
normalization |
Specify whether the data need to be normalized. Default is TRUE. |
qt_cutoff |
A quantile cut-off to remove genes that are almost all zeros. If the cut-off is 0.99, then all the genes expressed in less than 0.01 percent of cells will be eliminated for further analysis. |
A list with normalized and ordered gene expression for the interested cytogenetics region.
res <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) GetExprCountCyto(cytoloc_output = res, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99)
res <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) GetExprCountCyto(cytoloc_output = res, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99)
Gene location data for Hg38 genome
data(Hg38_gtf)
data(Hg38_gtf)
An object of class "data frame"
Gene location data for Hg38 genome
Frankish, Adam, et al. Nucleic acids research 47.D1 (2019): D766-D773. (PubMed)
data(Hg38_gtf) head(Hg38_gtf)
data(Hg38_gtf) head(Hg38_gtf)
This function helps normalize the gene expression count matrix for a SingleCellExperiment object.
NormalizeCounts(obj, scale_factor = 10000)
NormalizeCounts(obj, scale_factor = 10000)
obj |
The SingleCellExperiment object. |
scale_factor |
Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor, and then natural-log transformed using log1p. |
A normalized gene expression counts matrix.
data(SimDataSce) counts_mat <- NormalizeCounts(SimDataSce)
data(SimDataSce) counts_mat <- NormalizeCounts(SimDataSce)
This function uses EM algorithm to cluster the cells with a Poisson Mixture model. Cells will be grouped into two groups, locally aneuploid (status = 1) and diploid (status = 0).
partCNV(int_counts, cyto_type, cyto_p, tau = 0.1, maxniter = 1000)
partCNV(int_counts, cyto_type, cyto_p, tau = 0.1, maxniter = 1000)
int_counts |
Normalized gene expression counts for the genes in the interested region, e.g., the ProcessedCount variable from the output of GetExprCountCyto(). |
cyto_type |
The type of the cytogenetics alteration. It can only be "del" or "amp" |
cyto_p |
The percentage of cells with the cytogenetics alteration, e.g., 0.2. |
tau |
The variance of the prior information. Default is 0.1. If you have less confidence, specify a larger tau, e.g., 10. |
maxniter |
The maximum number of iterations of the EM algorithm. |
A vector with the cell status inferred by the method, 1 is aneuploid and 0 is diploid.
### example 1 cytoloc <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) exprout <- GetExprCountCyto(cytoloc_output = cytoloc, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99) status <- partCNV(int_counts = exprout$ProcessedCount, cyto_type = "del", cyto_p = 0.2)
### example 1 cytoloc <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) exprout <- GetExprCountCyto(cytoloc_output = cytoloc, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99) status <- partCNV(int_counts = exprout$ProcessedCount, cyto_type = "del", cyto_p = 0.2)
This function uses EM algorithm to cluster the cells with a Poisson Mixture model in the first step. With the results, it applies hidden markov model to improve feature selection. After that, another round of EM algorithm is applied to obtain the final cell status. Cells will be grouped into two groups, locally aneuploid (status = 1) and diploid (status = 0).
partCNVH(int_counts, cyto_type, cyto_p, tau = 0.1, maxniter = 1000, navg = 50)
partCNVH(int_counts, cyto_type, cyto_p, tau = 0.1, maxniter = 1000, navg = 50)
int_counts |
Normalized gene expression counts for the genes in the interested region, e.g., the ProcessedCount variable from the output of GetExprCountCyto(). |
cyto_type |
The type of the cytogenetics alteration. It can only be "del" or "amp" |
cyto_p |
The percentage of cells with the cytogenetics alteration, e.g., 0.2. |
tau |
The variance of the prior information. Default is 0.1. If you have less confidence, specify a larger tau, e.g., 10. |
maxniter |
The maximum number of iterations of the EM algorithm. |
navg |
Number of genes used for rolling average. |
A vector with the cell status inferred by the method, 1 is aneuploid and 0 is diploid.
cytoloc <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) exprout <- GetExprCountCyto(cytoloc_output = cytoloc, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99) status <- partCNVH(int_counts = exprout$ProcessedCount, cyto_type = "del", cyto_p = 0.2, navg = 50)
cytoloc <- GetCytoLocation(cyto_feature = "chr20(q11.1-q13.1)") data(SimData) exprout <- GetExprCountCyto(cytoloc_output = cytoloc, Counts = as.matrix(SimData), normalization = TRUE, qt_cutoff = 0.99) status <- partCNVH(int_counts = exprout$ProcessedCount, cyto_type = "del", cyto_p = 0.2, navg = 50)
Simulation data to examplify the usage of the method
data(SimData)
data(SimData)
An object of class "data frame"
Simulation data to examplify the usage of the method
data(SimData) dim(SimData)
data(SimData) dim(SimData)
Simulation SingleCellExperiment object to examplify the usage of the method
data(SimDataSce)
data(SimDataSce)
A SingleCellExperiment object
Simulation SingleCellExperiment object to examplify the usage of the method
data(SimDataSce)
data(SimDataSce)