Title: | Cell type annotation for unannotated single-cell RNA-Seq data |
---|---|
Description: | scTGIF connects the cells and the related gene functions without cell type label. |
Authors: | Koki Tsuyuzaki [aut, cre] |
Maintainer: | Koki Tsuyuzaki <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.21.0 |
Built: | 2024-10-31 05:20:27 UTC |
Source: | https://github.com/bioc/scTGIF |
scTGIF connects the cells and the related gene functions without cell type label.
The DESCRIPTION file:
Package: | scTGIF |
Type: | Package |
Title: | Cell type annotation for unannotated single-cell RNA-Seq data |
Version: | 1.21.0 |
Authors@R: | person("Koki", "Tsuyuzaki", role = c("aut", "cre"), email = "[email protected]") |
Depends: | R (>= 3.6.0) |
Imports: | GSEABase, Biobase, SingleCellExperiment, BiocStyle, plotly, tagcloud, rmarkdown, Rcpp, grDevices, graphics, utils, knitr, S4Vectors, SummarizedExperiment, RColorBrewer, nnTensor, methods, scales, msigdbr, schex, tibble, ggplot2, igraph |
Suggests: | testthat |
Description: | scTGIF connects the cells and the related gene functions without cell type label. |
License: | Artistic-2.0 |
biocViews: | DimensionReduction, QualityControl, SingleCell, Software, GeneExpression |
VignetteBuilder: | knitr |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/scTGIF |
RemoteRef: | HEAD |
RemoteSha: | 90a5f4e22d6a2e98a6d8cc196fea49eea3afb63b |
Author: | Koki Tsuyuzaki [aut, cre] |
Maintainer: | Koki Tsuyuzaki <[email protected]> |
Index of help topics:
DistalLungEpithelium Gene expression matrix of DistalLungEpithelium dataset containing five cluster. calcTGIF Function for connecting cellular patterns and functional patterns using jNMF cellMarkerToGmt A function to convert the CellMarker data to GMT files. convertRowID A function to change the row names of a matrix. label.DistalLungEpithelium Cellular label of DistalLungEpithelium dataset containing five cluster. pca.DistalLungEpithelium The result of PCA of the DistalLungEpithelium dataset. reportTGIF Function for reporting the result of 'calcTGIF' function scTGIF-package Cell type annotation for unannotated single-cell RNA-Seq data settingTGIF Paramter setting for scTGIF
calcTGIF
function calculates what kind of
cellular patterns and functional patterns are contained
in single-cell RNA-seq data and reportTGIF
function generates report of analytic result.
The algorithm is based on joint NMF, which is implemented in nnTensor package.
Koki Tsuyuzaki [aut, cre]
Maintainer: Koki Tsuyuzaki <[email protected]>
Dominic Grun, Anna Lyubimova, Lennart Kester, Kay Wiebrands, Onur Basak, Nobuo Sasaki, Hans Clevers, Alexander van Oudenaarden (2015) Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature, 525: 251-255
calcTGIF
function calculates what kind of
cellular patterns and functional patterns are contained in
single-cell RNA-seq data and reportTGIF
function
generates report of analytic result.
calcTGIF(sce, ndim, verbose=FALSE, droplet=TRUE)
calcTGIF(sce, ndim, verbose=FALSE, droplet=TRUE)
sce |
A object generated by instantization of SingleCellExperiment-class. |
ndim |
The number of low-dimension of joint NMF algorithm. |
verbose |
The verbose parameter for nnTensor::jNMF (Default: FALSE). |
droplet |
Whether Droplet-based single-cell RNA-Seq or not (Default: TRUE). |
The result is saved to metadata slot of SingleCellExperiment object.
Koki Tsuyuzaki [aut, cre]
showMethods("calcTGIF")
showMethods("calcTGIF")
The GMT (Gene Matrix Transposed file format : *.gmt) file is formatted by the Broad Institute (https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29). The data can be downloaded from the website of CellMarker (http://biocc.hrbmu.edu.cn/CellMarker).
cellMarkerToGmt(infile, outfile, uniq.column=c("tissueType", "cellName"), geneid.type=c("geneID", "geneSymbol"))
cellMarkerToGmt(infile, outfile, uniq.column=c("tissueType", "cellName"), geneid.type=c("geneID", "geneSymbol"))
infile |
The input file downloaded from CellMarker website |
outfile |
The output GMT file converted from the CellMarker data |
uniq.column |
The duplicated terms in the specified column are aggrgated as a row of GMT file (Default: geneID) |
geneid.type |
Output gene identifier. (Default: geneID) |
output |
A GMT file is generated. |
Koki Tsuyuzaki [aut, cre]
library("GSEABase") tmp <- tempdir() infile1 = paste0(tmp, "/Human_cell_markers.txt") outfile1_1 = paste0(tmp, "/Human_cell_markers_1.gmt") outfile1_2 = paste0(tmp, "/Human_cell_markers_2.gmt") outfile1_3 = paste0(tmp, "/Human_cell_markers_3.gmt") outfile1_4 = paste0(tmp, "/Human_cell_markers_4.gmt") sink(infile1) cat("speciesType\ttissueType\tUberonOntologyID\tcancerType\tcellType\tcellName\tCellOntologyID\tcellMarker\tgeneSymbol\tgeneID\tproteinName\tproteinID\tmarkerResource\tPMID\tCompany\n") cat("Human\tKidney\tUBERON_0002113\tNormal\tNormal cell\tProximal tubular cell\tNA\tIntestinal Alkaline Phosphatase\tALPI\t248\tPPBI\tP09923\tExperiment\t9263997\tNA\n") cat("Human\tLiver\tUBERON_0002107\tNormal\tNormal cell\tIto cell (hepatic stellate cell)\tCL_0000632\tSynaptophysin\tSYP\t6855\tSYPH\tP08247\tExperiment\t10595912\tNA\n") cat("Human\tEndometrium\tUBERON_0001295\tNormal\tNormal cell\tTrophoblast cell\tCL_0000351\tCEACAM1\tCEACAM1\t634\tCEAM1\tP13688\tExperiment\t10751340\tNA\n") cat("Human\tGerm\tUBERON_0000923\tNormal\tNormal cell\tPrimordial germ cell\tCL_0000670\tVASA\tDDX4\t54514\tDDX4\tQ9NQI0\tExperiment\t10920202\tNA\n") cat("Human\tCorneal epithelium\tUBERON_0001772\tNormal\tNormal cell\tEpithelial cell\tCL_0000066\tKLF6\tKLF6\t1316\tKLF6\tQ99612\tExperiment\t12407152\tNA\n") cat("Human\tPlacenta\tUBERON_0001987\tNormal\tNormal cell\tCytotrophoblast\tCL_0000351\tFGF10\tFGF10\t2255\tFGF10\tO15520\tExperiment\t15950061\tNA\n") cat("Human\tPeriosteum\tUBERON_0002515\tNormal\tNormal cell\tPeriosteum-derived progenitor cell\tNA\tCD166, CD45, CD9, CD90\tALCAM, PTPRC, CD9, THY1\t214, 5788, 928, 7070\tCD166, PTPRC, CD9, THY1\tQ13740, P08575, P21926, P04216\tExperiment\t15977065\tNA\n") cat("Human\tAmniotic membrane\tUBERON_0009742\tNormal\tNormal cell\tAmnion epithelial cell\tCL_0002536\tNANOG, OCT3/4\tNANOG, POU5F1\t79923, 5460\tNANOG, PO5F1\tQ9H9S0, Q01860\tExperiment\t16081662\tNA\n") cat("Human\tPrimitive streak\tUBERON_0004341\tNormal\tNormal cell\tPrimitive streak cell\tNA\tLHX1, MIXL1\tLHX1, MIXL1\t3975, 83881\tLHX1, MIXL1\tP48742, Q9H2W2\tExperiment\t16258519\tNA\n") sink() cellMarkerToGmt(infile1, outfile1_1, uniq.column=c("tissueType"), geneid.type=c("geneID")) cellMarkerToGmt(infile1, outfile1_2, uniq.column=c("tissueType"), geneid.type=c("geneSymbol")) cellMarkerToGmt(infile1, outfile1_3, uniq.column=c("cellName"), geneid.type=c("geneID")) cellMarkerToGmt(infile1, outfile1_4, uniq.column=c("cellName"), geneid.type=c("geneSymbol")) gmt1_1 <- getGmt(outfile1_1) gmt1_2 <- getGmt(outfile1_2) gmt1_3 <- getGmt(outfile1_3) gmt1_4 <- getGmt(outfile1_4)
library("GSEABase") tmp <- tempdir() infile1 = paste0(tmp, "/Human_cell_markers.txt") outfile1_1 = paste0(tmp, "/Human_cell_markers_1.gmt") outfile1_2 = paste0(tmp, "/Human_cell_markers_2.gmt") outfile1_3 = paste0(tmp, "/Human_cell_markers_3.gmt") outfile1_4 = paste0(tmp, "/Human_cell_markers_4.gmt") sink(infile1) cat("speciesType\ttissueType\tUberonOntologyID\tcancerType\tcellType\tcellName\tCellOntologyID\tcellMarker\tgeneSymbol\tgeneID\tproteinName\tproteinID\tmarkerResource\tPMID\tCompany\n") cat("Human\tKidney\tUBERON_0002113\tNormal\tNormal cell\tProximal tubular cell\tNA\tIntestinal Alkaline Phosphatase\tALPI\t248\tPPBI\tP09923\tExperiment\t9263997\tNA\n") cat("Human\tLiver\tUBERON_0002107\tNormal\tNormal cell\tIto cell (hepatic stellate cell)\tCL_0000632\tSynaptophysin\tSYP\t6855\tSYPH\tP08247\tExperiment\t10595912\tNA\n") cat("Human\tEndometrium\tUBERON_0001295\tNormal\tNormal cell\tTrophoblast cell\tCL_0000351\tCEACAM1\tCEACAM1\t634\tCEAM1\tP13688\tExperiment\t10751340\tNA\n") cat("Human\tGerm\tUBERON_0000923\tNormal\tNormal cell\tPrimordial germ cell\tCL_0000670\tVASA\tDDX4\t54514\tDDX4\tQ9NQI0\tExperiment\t10920202\tNA\n") cat("Human\tCorneal epithelium\tUBERON_0001772\tNormal\tNormal cell\tEpithelial cell\tCL_0000066\tKLF6\tKLF6\t1316\tKLF6\tQ99612\tExperiment\t12407152\tNA\n") cat("Human\tPlacenta\tUBERON_0001987\tNormal\tNormal cell\tCytotrophoblast\tCL_0000351\tFGF10\tFGF10\t2255\tFGF10\tO15520\tExperiment\t15950061\tNA\n") cat("Human\tPeriosteum\tUBERON_0002515\tNormal\tNormal cell\tPeriosteum-derived progenitor cell\tNA\tCD166, CD45, CD9, CD90\tALCAM, PTPRC, CD9, THY1\t214, 5788, 928, 7070\tCD166, PTPRC, CD9, THY1\tQ13740, P08575, P21926, P04216\tExperiment\t15977065\tNA\n") cat("Human\tAmniotic membrane\tUBERON_0009742\tNormal\tNormal cell\tAmnion epithelial cell\tCL_0002536\tNANOG, OCT3/4\tNANOG, POU5F1\t79923, 5460\tNANOG, PO5F1\tQ9H9S0, Q01860\tExperiment\t16081662\tNA\n") cat("Human\tPrimitive streak\tUBERON_0004341\tNormal\tNormal cell\tPrimitive streak cell\tNA\tLHX1, MIXL1\tLHX1, MIXL1\t3975, 83881\tLHX1, MIXL1\tP48742, Q9H2W2\tExperiment\t16258519\tNA\n") sink() cellMarkerToGmt(infile1, outfile1_1, uniq.column=c("tissueType"), geneid.type=c("geneID")) cellMarkerToGmt(infile1, outfile1_2, uniq.column=c("tissueType"), geneid.type=c("geneSymbol")) cellMarkerToGmt(infile1, outfile1_3, uniq.column=c("cellName"), geneid.type=c("geneID")) cellMarkerToGmt(infile1, outfile1_4, uniq.column=c("cellName"), geneid.type=c("geneSymbol")) gmt1_1 <- getGmt(outfile1_1) gmt1_2 <- getGmt(outfile1_2) gmt1_3 <- getGmt(outfile1_3) gmt1_4 <- getGmt(outfile1_4)
To avoid to specify the duplicated row names against matrix, multiple aggregation rules are implemented.
convertRowID(input, rowID, LtoR, aggr.rule=c("sum", "mean", "large.mean", "large.var", "large.cv2"))
convertRowID(input, rowID, LtoR, aggr.rule=c("sum", "mean", "large.mean", "large.var", "large.cv2"))
input |
A matrix filled with number (n * m). |
rowID |
A vector to specify the row names of input (length: n). |
LtoR |
A corresponding table to covert the row names of input as different type of IDs. (Left: current row names -> Right: new row names) |
aggr.rule |
The aggregation rule to change the row names of input and collapse/select the values, if the row names changed by LtoR are duplicated. sum: Collapses multiple row vectors by summation. mean: Collapses multiple row vectors by mean. large.mean: Select a vector having the largest mean in the duplicated vectors. large.var: Select a vector having the largest variance in the duplicated vectors. large.cv2: Select a vector having the largest CV2 in the duplicated vectors. |
output |
A matrix, in which the row names are changed, according to the aggregation rule user specified. |
ctable |
The corresponding table explaining the relationship between previous row names and changed row names of input. |
Koki Tsuyuzaki [aut, cre]
input <- matrix(1:20, nrow=4, ncol=5) rowID <- c("A", "B", "C", "D") LtoR <- rbind( c("A", "3"), c("B", "2"), c("C", "4"), c("D", "7")) (input2 <- convertRowID(input, rowID, LtoR, "sum")) (input3 <- convertRowID(input, rowID, LtoR, "mean")) (input4 <- convertRowID(input, rowID, LtoR, "large.mean")) (input5 <- convertRowID(input, rowID, LtoR, "large.var")) (input6 <- convertRowID(input, rowID, LtoR, "large.cv2"))
input <- matrix(1:20, nrow=4, ncol=5) rowID <- c("A", "B", "C", "D") LtoR <- rbind( c("A", "3"), c("B", "2"), c("C", "4"), c("D", "7")) (input2 <- convertRowID(input, rowID, LtoR, "sum")) (input3 <- convertRowID(input, rowID, LtoR, "mean")) (input4 <- convertRowID(input, rowID, LtoR, "large.mean")) (input5 <- convertRowID(input, rowID, LtoR, "large.var")) (input6 <- convertRowID(input, rowID, LtoR, "large.cv2"))
A data frame with 3397 rows (genes) with following 80 columns (cells).
The data is downloaded as supplementary information of the distal lung epithelium paper (https://www.nature.com/articles/nature13173).
Low-expressed genes are filted.
All Gene ID is converted to Human Entrez Gene ID for applying the data to scTGIF.
data("DistalLungEpithelium")
data("DistalLungEpithelium")
http://www.nature.com/nbt/journal/v33/n2/full/nbt.3102.html
Treutlein, B. et al. (2014) Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371-375
data("DistalLungEpithelium")
data("DistalLungEpithelium")
A vector containing 80 elements (cells).
data("label.DistalLungEpithelium")
data("label.DistalLungEpithelium")
Treutlein, B. et al. (2014) Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371-375
data("label.DistalLungEpithelium")
data("label.DistalLungEpithelium")
A matrix having 80 (cells) * 2 (PCs) elements.
data("pca.DistalLungEpithelium")
data("pca.DistalLungEpithelium")
Treutlein, B. et al. (2014) Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371-375
data("pca.DistalLungEpithelium")
data("pca.DistalLungEpithelium")
calcTGIF
function
calcTGIF
function calculates what kind of cellular patterns
and functional patterns are contained in single-cell RNA-seq data and
reportTGIF
function generates report of analytic result.
reportTGIF(sce, out.dir=tempdir(), html.open=FALSE, title="The result of scTGIF", author="The person who runs this script", assayNames="counts")
reportTGIF(sce, out.dir=tempdir(), html.open=FALSE, title="The result of scTGIF", author="The person who runs this script", assayNames="counts")
sce |
A object generated by instantization of SingleCellExperiment-class. |
out.dir |
Output directory user want to save the report (Default: tempdir()). |
html.open |
Whether html is opened when |
title |
Title of report (Default: "The result of scTGIF") |
author |
The name of user name (Default: "The person who runs this script") |
assayNames |
The unit of gene expression for using scTGIF (e.g. normcounts, cpm...etc) (Default: "counts"). |
Some file is generated to output directory user specified.
Koki Tsuyuzaki [aut, cre]
if(interactive()){ # Package loading library("SingleCellExperiment") library("GSEABase") library("msigdbr") # Test data data("DistalLungEpithelium") data("pca.DistalLungEpithelium") data("label.DistalLungEpithelium") # Test data par(ask=FALSE) plot(pca.DistalLungEpithelium, col=label.DistalLungEpithelium, pch=16, main="Distal lung epithelium dataset", xlab="PCA1", ylab="PCA2", bty="n") text(0.1, 0.05, "AT1", col="#FF7F00", cex=2) text(0.07, -0.15, "AT2", col="#E41A1C", cex=2) text(0.13, -0.04, "BP", col="#A65628", cex=2) text(0.125, -0.15, "Clara", col="#377EB8", cex=2) text(0.09, -0.2, "Cilliated", col="#4DAF4A", cex=2) # Load the gmt file from MSigDB # Only "Entrez Gene ID" can be used in scTGIF # e.g. gmt <- GSEABase::getGmt( # "/PATH/YOU/SAVED/THE/GMTFILES/h.all.v6.0.entrez.gmt") # Here we use msigdbr to retrieve mouse gene sets # Mouse gene set (NCBI Gene ID) m_df <- msigdbr(species = "Mus musculus", category = "H")[, c("gs_name", "entrez_gene")] # Convert to GeneSetCollection hallmark = unique(m_df$gs_name) gsc <- lapply(hallmark, function(h){ target = which(m_df$gs_name == h) geneIds = unique(as.character(m_df$entrez_gene[target])) GeneSet(setName=h, geneIds) }) gmt <- GeneSetCollection(gsc) # SingleCellExperiment-class sce <- SingleCellExperiment( assays = list(counts = DistalLungEpithelium)) reducedDims(sce) <- SimpleList(PCA=pca.DistalLungEpithelium) # User's Original Normalization Function CPMED <- function(input){ libsize <- colSums(input) median(libsize) * t(t(input) / libsize) } # Normalization normcounts(sce) <- log10(CPMED(counts(sce)) + 1) # Registration of required information into metadata(sce) settingTGIF(sce, gmt, reducedDimNames="PCA", assayNames="normcounts") # Functional Annotation based on jNMF calcTGIF(sce, ndim=7) # HTML Reprt reportTGIF(sce, html.open=TRUE, title="scTGIF Report for DistalLungEpithelium dataset", author="Koki Tsuyuzaki") }
if(interactive()){ # Package loading library("SingleCellExperiment") library("GSEABase") library("msigdbr") # Test data data("DistalLungEpithelium") data("pca.DistalLungEpithelium") data("label.DistalLungEpithelium") # Test data par(ask=FALSE) plot(pca.DistalLungEpithelium, col=label.DistalLungEpithelium, pch=16, main="Distal lung epithelium dataset", xlab="PCA1", ylab="PCA2", bty="n") text(0.1, 0.05, "AT1", col="#FF7F00", cex=2) text(0.07, -0.15, "AT2", col="#E41A1C", cex=2) text(0.13, -0.04, "BP", col="#A65628", cex=2) text(0.125, -0.15, "Clara", col="#377EB8", cex=2) text(0.09, -0.2, "Cilliated", col="#4DAF4A", cex=2) # Load the gmt file from MSigDB # Only "Entrez Gene ID" can be used in scTGIF # e.g. gmt <- GSEABase::getGmt( # "/PATH/YOU/SAVED/THE/GMTFILES/h.all.v6.0.entrez.gmt") # Here we use msigdbr to retrieve mouse gene sets # Mouse gene set (NCBI Gene ID) m_df <- msigdbr(species = "Mus musculus", category = "H")[, c("gs_name", "entrez_gene")] # Convert to GeneSetCollection hallmark = unique(m_df$gs_name) gsc <- lapply(hallmark, function(h){ target = which(m_df$gs_name == h) geneIds = unique(as.character(m_df$entrez_gene[target])) GeneSet(setName=h, geneIds) }) gmt <- GeneSetCollection(gsc) # SingleCellExperiment-class sce <- SingleCellExperiment( assays = list(counts = DistalLungEpithelium)) reducedDims(sce) <- SimpleList(PCA=pca.DistalLungEpithelium) # User's Original Normalization Function CPMED <- function(input){ libsize <- colSums(input) median(libsize) * t(t(input) / libsize) } # Normalization normcounts(sce) <- log10(CPMED(counts(sce)) + 1) # Registration of required information into metadata(sce) settingTGIF(sce, gmt, reducedDimNames="PCA", assayNames="normcounts") # Functional Annotation based on jNMF calcTGIF(sce, ndim=7) # HTML Reprt reportTGIF(sce, html.open=TRUE, title="scTGIF Report for DistalLungEpithelium dataset", author="Koki Tsuyuzaki") }
All parameters is saved to metadata slot of SingleCellExperiment object.
settingTGIF(sce, gmt, reducedDimNames, assayNames="counts", nbins=40)
settingTGIF(sce, gmt, reducedDimNames, assayNames="counts", nbins=40)
sce |
A object generated by instantization of SingleCellExperiment-class. |
gmt |
Object generated from GSEABase::getGmt function. GMT file can be downloaded from MSigDB web (site http://software.broadinstitute.org/gsea/login.jsp#msigdb). Please confirm that the gmt file contains Human Entrez Gene ID, not gene symbol. Also confirm that the DataMatrix has Human Entrez Gene ID. |
reducedDimNames |
The names of reducedDim(sce) that user want use in scTGIF. |
assayNames |
The unit of gene expression for using scTGIF (e.g. normcounts, cpm...etc) (Default: "counts"). |
nbins |
The number of bins of schex (Default: 40). |
The result is saved to metadata slot of SingleCellExperiment object.
Koki Tsuyuzaki [aut, cre]
showMethods("settingTGIF")
showMethods("settingTGIF")