Title: | Marker Enrichment Modeling (MEM) |
---|---|
Description: | MEM, Marker Enrichment Modeling, automatically generates and displays quantitative labels for cell populations that have been identified from single-cell data. The input for MEM is a dataset that has pre-clustered or pre-gated populations with cells in rows and features in columns. Labels convey a list of measured features and the features' levels of relative enrichment on each population. MEM can be applied to a wide variety of data types and can compare between MEM labels from flow cytometry, mass cytometry, single cell RNA-seq, and spectral flow cytometry using RMSD. |
Authors: | Sierra Lima [aut] , Kirsten Diggins [aut] , Jonathan Irish [aut, cre] |
Maintainer: | Jonathan Irish <[email protected]> |
License: | GPL-3 |
Version: | 1.11.0 |
Built: | 2024-10-30 05:37:40 UTC |
Source: | https://github.com/bioc/cytoMEM |
Takes matrix list generated from MEM as input and outputs MEM labels, heatmap of population median values, and heatmap of MEM scores.
build_heatmaps( MEM_values, cluster.MEM = "both", cluster.medians = "none", cluster.IQRs = "none", display.thresh = 1, output.files = FALSE, labels = FALSE, only.MEMheatmap = FALSE)
MEM_values |
List of matrices generated as output from |
cluster.MEM |
|
cluster.medians |
|
cluster.IQRs |
|
display.thresh |
Numeric; 0-10. Markers with enrichment scores that are equal to or greater than |
output.files |
|
labels |
|
only.MEMheatmap |
|
Heatmaps are clustered using the default complete linkage hierarchical clustering in the hclust function. See heatmap.2 and hclust for more information.
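As a minimal sketch of what complete-linkage clustering does to the rows of a heatmap, the following uses base R `dist` and `hclust` on a small made-up matrix. The object names (`toy_scores`, `row_tree`, `groups`) and all values are hypothetical, and this is an illustration of the clustering method only, not the internal code of build_heatmaps.

```r
# Toy matrix of scores: 4 populations (rows) x 3 markers (columns).
# Values are invented for illustration only.
toy_scores <- matrix(
  c(5, 0, 0,
    5, 1, 0,
    0, 4, 3,
    0, 5, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pop", 1:4), c("CD3", "CD19", "IgM"))
)

# Euclidean distances between rows, then complete-linkage clustering
# (method = "complete" is also the hclust default).
row_tree <- hclust(dist(toy_scores), method = "complete")

# Cut the tree into two groups to see which populations cluster together.
groups <- cutree(row_tree, k = 2)
print(groups)
```

In this toy case pop1 and pop2 fall into one group and pop3 and pop4 into the other, which is the ordering a clustered heatmap would place side by side.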
Heatmaps of median, IQR, and MEM values on each population; optionally written to file.
Kirsten Diggins, Sierra Lima, Jonathan Irish
Diggins et al., Nature Methods, 2017
# Use output from the MEM function, or use the example data:
data(MEM_values)
build_heatmaps(
  MEM_values,
  cluster.MEM = "both",
  cluster.medians = "none",
  cluster.IQRs = "none",
  display.thresh = 1,
  output.files = TRUE,
  labels = FALSE,
  only.MEMheatmap = FALSE)
The MEM function takes pre-clustered, single-cell data as input and calculates relative enrichment scores for each marker on each population.
MEM(exp_data, transform=FALSE, cofactor=1, choose.markers=FALSE, markers="all", choose.ref=FALSE, zero.ref=FALSE, rename.markers=FALSE, new.marker.names="none", file.is.clust=FALSE, add.fileID=FALSE, IQR.thresh=NULL, output.prescaled.MEM=FALSE, scale.matrix = "linear", scale.factor = 0)
exp_data |
list of file names or a |
transform |
|
cofactor |
numeric; if |
choose.markers |
|
markers |
|
choose.ref |
|
zero.ref |
|
rename.markers |
|
new.marker.names |
|
file.is.clust |
|
add.fileID |
|
IQR.thresh |
Default |
output.prescaled.MEM |
Default |
scale.matrix |
Default |
scale.factor |
Default |
For each population and its reference, MEM first calculates median marker levels and marker interquartile ranges (IQR), and then calculates MEM scores according to the equation
MEM = |Median_Pop - Median_Ref| + IQR_Ref/IQR_Pop - 1; if Median_Pop - Median_Ref < 0, the score is negated (-MEM)
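The equation can be worked through on invented numbers. The sketch below is a direct transcription of the formula as written in the text; the helper name `mem_score` and its argument names are hypothetical, and it covers only the unscaled score, not any later scaling or thresholding the package may apply.

```r
# Hypothetical helper implementing the equation as stated in the text.
mem_score <- function(median_pop, median_ref, iqr_pop, iqr_ref) {
  magnitude <- abs(median_pop - median_ref) + iqr_ref / iqr_pop - 1
  # Negate the score when the population median is below the reference median.
  ifelse(median_pop - median_ref < 0, -magnitude, magnitude)
}

# Marker enriched on the population: higher median, tighter spread.
enriched <- mem_score(median_pop = 4.0, median_ref = 1.0,
                      iqr_pop = 0.5, iqr_ref = 1.0)

# Marker depleted on the population: lower median than the reference.
depleted <- mem_score(median_pop = 0.5, median_ref = 3.0,
                      iqr_pop = 0.5, iqr_ref = 1.0)

print(c(enriched = enriched, depleted = depleted))
```

With these inputs the enriched marker scores 3 + 2 - 1 = 4 and the depleted marker scores -(2.5 + 2 - 1) = -3.5.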
A dataset is provided as an example to be used with MEM and build_heatmaps. Please see dataset PBMC for more details.
Input data can be of file type .txt, .fcs, or .csv. A matrix or data.frame object in which the last column contains the cluster identity of each cell is also accepted. In all cases, the expected data structure is cells (datapoints) in rows and measured markers (i.e. features, parameters) in columns of the input data.
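A small object with the expected layout can be sketched as follows. The marker names, values, and the object name `toy_cells` are invented; the column name `cluster` follows the PBMC example dataset and is an assumption for other inputs.

```r
set.seed(1)  # reproducible toy values

# Six hypothetical cells measured on two markers, with the cluster
# assignment stored in the last column as the text describes.
toy_cells <- data.frame(
  CD3     = round(runif(6, 0, 100), 1),
  CD19    = round(runif(6, 0, 100), 1),
  cluster = c(1, 1, 1, 2, 2, 2)
)

# Basic structural checks before handing the object to an analysis function.
stopifnot(is.numeric(toy_cells$cluster))
stopifnot(names(toy_cells)[ncol(toy_cells)] == "cluster")

# Cells per population.
cells_per_cluster <- table(toy_cells$cluster)
print(cells_per_cluster)
```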
IQR threshold: The MEM equation takes the ratio of reference and population IQRs and adds this value to the difference in medians. IQR values below 1, such as those resulting from background noise level measurements, can therefore artificially inflate the overall MEM score. To correct for this, a threshold of 0.5 is automatically applied.
However, the function can calculate an IQR threshold using the input data. If IQR.thresh is set to "auto", the threshold will be calculated as the IQR associated with the 2nd quartile median value across all populations and corresponding reference populations. This should be used if the user anticipates that 0.5 will not be an adequate threshold for the particular dataset.
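The effect of a threshold on the IQR ratio can be illustrated with invented numbers. This sketch assumes the threshold acts as a floor (values below 0.5 are raised to 0.5), which is consistent with the IQR values shown in the MEM_values example data but is not confirmed by the text; all object names are hypothetical.

```r
# Hypothetical IQR values for four markers on one population.
iqr_pop <- c(CD3 = 0.05, CD4 = 0.68, CD19 = 0.20, CD45 = 1.40)

# Assumption: the threshold acts as a floor, so lower values are raised to it.
iqr_threshold <- 0.5
iqr_floored <- pmax(iqr_pop, iqr_threshold)

# Effect on the IQR ratio term of the MEM equation (reference IQR of 0.5).
iqr_ref <- 0.5
ratio_raw     <- iqr_ref / iqr_pop
ratio_floored <- iqr_ref / iqr_floored
print(rbind(ratio_raw, ratio_floored))
```

Without the floor, the near-zero IQR of CD3 gives a ratio of 10 from noise alone; with it, the ratio is 1 and contributes nothing after the -1 term.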
MAGpop |
Matrix; Median expression level of markers on each population |
MAGref |
Matrix; Median expression on each population's corresponding reference population |
IQRpop |
Matrix; IQR of markers on each population |
IQRref |
Matrix; IQR on each population's corresponding reference population |
The object generated from MEM is meant to be passed to build_heatmaps, which will generate MEM labels and heatmaps.
Kirsten Diggins, Sierra Lima, and Jonathan Irish
Diggins et al., Nature Methods, 2017
## For multiple file input, set working directory to folder containing files, then
## infiles <- dir()
## For single file or object input (e.g. PBMC), input data directly into MEM function
## User inputs
data(PBMC)
MEM_values <- MEM(
  PBMC,
  transform = TRUE,
  cofactor = 15,
  choose.markers = FALSE,
  markers = "all",
  choose.ref = FALSE,
  zero.ref = FALSE,
  rename.markers = FALSE,
  new.marker.names = "none",
  IQR.thresh = NULL,
  output.prescaled.MEM = FALSE,
  scale.matrix = "linear",
  scale.factor = 0)
This matrix is the output generated from MEM analysis of the PBMC dataset. It is meant to be used as input for the MEM_RMSD function to generate RMSD scores of similarity.
data(MEM_matrix)
The format is the 7 populations in rows and the MEM scores for all 25 measured markers in columns. See PBMC dataset for more details.
data(MEM_matrix)
MEM_RMSD calculates a normalized average RMSD score pairwise between populations given their MEM scores as input. This is meant to serve as a metric of similarity between populations.
The function calculates the sum of squares for all shared markers between two populations, then takes the square root of the average.
For markers "a" through "n", the sum of squares is calculated as: sum of squares = (a2-a1)^2 + (b2-b1)^2 + ... + (n2-n1)^2
Root-mean-square deviation (RMSD) is calculated as: RMSD = sqrt(sum of squares/number of markers)
The RMSD values are then converted to percentages relative to the maximum RMSD in the matrix and subtracted from 100, so that the final score is 100 for identical populations and 0 for the most dissimilar pair:
Percent_max_RMSD = 100-RMSD/max_RMSD*100
The function then outputs a clustered heatmap of Percent_max_RMSD values and the matrix of numerical values used to build the heatmap.
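The calculation described above can be sketched in base R on an invented matrix. This follows the formulas in the text step by step; it is not the source of MEM_RMSD, the object names are hypothetical, and the heatmap step is omitted.

```r
# Toy MEM scores: 3 populations (rows) x 3 markers (columns); values invented.
toy_mem <- matrix(
  c( 4,  0, -2,
     4,  0, -2,
    -3,  5,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("popA", "popB", "popC"), c("CD3", "CD19", "CD56"))
)

n_pops <- nrow(toy_mem)
rmsd <- matrix(0, n_pops, n_pops,
               dimnames = list(rownames(toy_mem), rownames(toy_mem)))

# Pairwise RMSD: square root of the mean squared difference across markers.
for (i in seq_len(n_pops)) {
  for (j in seq_len(n_pops)) {
    squared_diff <- (toy_mem[j, ] - toy_mem[i, ])^2
    rmsd[i, j] <- sqrt(sum(squared_diff) / ncol(toy_mem))
  }
}

# Convert to percent of maximum RMSD, flipped so that 100 means identical.
percent_max_rmsd <- 100 - rmsd / max(rmsd) * 100
print(round(percent_max_rmsd, 1))
```

Here popA and popB have identical scores and so reach 100, while popA and popC form the most distant pair and reach 0.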
MEM_RMSD( MEM_matrix, format=NULL, output.matrix=FALSE)
MEM_matrix |
The input to MEM_RMSD can be either 1) a matrix of values, where populations are in rows and their MEM scores are in columns, 2) the list of matrices output by |
format |
Default is NULL. When |
output.matrix |
If |
If you are calculating MEM_RMSD on population files, populations do not have to include all of the same markers. The function will determine which markers each pair of populations has in common and will use those common markers to calculate RMSD. If the populations have no markers in common, the function will terminate with an error. Note that population names must match exactly between files in order for them to be considered the same.
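The shared-marker behavior can be illustrated with two invented populations measured on partially overlapping panels. This is a sketch of the idea using base R `intersect`, with hypothetical names and values and a hypothetical error message; it is not the package's own implementation.

```r
# Two hypothetical populations measured with partially overlapping panels.
pop1 <- c(CD3 = 5, CD4 = 4, CD19 = -1)
pop2 <- c(CD3 = 2, CD19 = 3, CD56 = 6)

# Restrict the comparison to markers present in both populations.
shared <- intersect(names(pop1), names(pop2))
if (length(shared) == 0) {
  stop("populations have no markers in common")
}

# RMSD over the shared markers only.
rmsd_shared <- sqrt(mean((pop2[shared] - pop1[shared])^2))
print(shared)
print(rmsd_shared)
```

Only CD3 and CD19 enter the calculation here; CD4 and CD56 are ignored because each was measured on just one of the two populations.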
RMSD_vals |
Matrix of the calculated pairwise percent max RMSD scores |
RMSD heatmap |
Hierarchically clustered heatmap of RMSD_vals |
Kirsten Diggins, Sierra Lima, Jonathan Irish
Diggins et al., Nature Methods, 2017
## For single matrix, input data directly into RMSD function
## User inputs
data(MEM_matrix)
MEM_RMSD(
  MEM_matrix,
  format = NULL,
  output.matrix = FALSE)
This list of 5 matrices is the output generated from MEM analysis of the PBMC dataset. It is meant to be used as input for the build_heatmaps function to generate population-specific MEM labels and clustered median and MEM score heatmaps.
data(MEM_values)
The format is:
List of 6
 $ MAGpop    :List of 1
  ..$ : num [1:7, 1:25] 0.0254 0.0189 0.0207 2.5075 2.4995 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:25] "CD19" "CD117" "CD11b" "CD4" ...
 $ MAGref    :List of 1
  ..$ : num [1:7, 1:25] 0.0146 0.0209 0.0206 0.0213 0.0235 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:25] "CD19" "CD117" "CD11b" "CD4" ...
 $ IQRpop    :List of 1
  ..$ : num [1:7, 1:25] 0.5 0.5 0.5 0.68 0.655 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:25] "CD19" "CD117" "CD11b" "CD4" ...
 $ IQRref    :List of 1
  ..$ : num [1:7, 1:25] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:25] "CD19" "CD117" "CD11b" "CD4" ...
 $ MEM_matrix:List of 1
  ..$ : num [1:7, 1:25] 0.014421 -0.002766 0.000195 3.309071 3.295596 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:25] "CD19" "CD117" "CD11b" "CD4" ...
 $ File Order:List of 1
  ..$ : num 0
data(MEM_values)
This dataset contains 49651 events from a 25-marker panel CyTOF analysis of normal human PBMCs. Expression values are the raw (pre-transformation) median intensity (MI) values.
Data has been pre-gated to include only DNA-intercalator (Iridium) positive and CD45-high events. Expert biaxial gating was used to separate these events into 7 major blood cell populations: CD4+ T cells (cluster 1), CD8+ T cells (cluster 2), IgM+ B cells (cluster 5), IgM- B cells (cluster 4), dendritic cells (DCs) (cluster 3), natural killer (NK) cells (cluster 7), and monocytes (cluster 6). Per-cell population identity is specified in the cluster channel (variable).
This dataset is meant to be used as an example with the MEM package.
See refs for experimental protocol and further details.
data("PBMC")
A data frame with 49651 observations on the following 26 variables.
The following 25 surface markers were measured by CyTOF.
CD19
B cell receptor (BCR)
CD117
c-Kit; RTK expressed by stem and progenitor cells
CD11b
ITGAM, macrophage-1 antigen; complement receptor 3
CD4
T cell receptor (TCR) co-receptor; binds antigens presented by MHC II
CD8
T cell receptor (TCR) co-receptor; binds antigens presented by MHC I
CD20
B cell surface protein
CD34
surface protein expressed by hematopoietic stem cells and lost over the course of differentiation
CD61
surface marker expressed on platelets
CD123
Interleukin-3 receptor; expressed by progenitor cells
CD45RA
CD45 isoform expressed by Naive T lymphocytes
CD45
protein tyrosine phosphatase; expressed by all mature leukocytes
CD10
membrane metallo-endopeptidase expressed by common lymphoid progenitors
CD33
Siglec-3; expressed by myeloid cells
CD11c
complement receptor; highly expressed on dendritic cells and myeloid cells
CD14
pattern recognition receptor expressed by innate lymphoid cells
CD69
involved in signaling and proliferation of activated T lymphocytes and natural killer cells
CD15
plays role in phagocytosis and chemotaxis; expressed in multiple blood cell malignancies
CD16
low affinity Fc receptor for IgG; expressed by natural killer cells, neutrophils, and myeloid cells
CD44
cell adhesion molecule and hyaluronic acid receptor
CD38
highly expressed on germinal center B cells and plasma cells
CD25
IL-2 receptor; expressed by activated T cells
CD3
T cell receptor (TCR)
IgM
heavy chain isoform of BCR
HLADR
MHC class II receptor
CD56
NCAM; expressed by natural killer cells
cluster
1: CD4+ T cells
2: CD8+ T cells
3: Dendritic cells (DCs)
4: IgM- B cells
5: IgM+ B cells
6: Monocytes
7: Natural killer (NK) cells
The dataset should be arcsinh transformed with a cofactor of 15. See MEM for more details.
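For orientation, the transform can be sketched in base R. This assumes the conventional cytometry form asinh(x / cofactor), which the text does not spell out; the intensity values and object names are invented.

```r
# Hypothetical raw intensities for one marker.
raw_intensity <- c(0, 15, 150, 1500)

# Arcsinh transform with a cofactor of 15 (assumed form: asinh(x / cofactor)).
cofactor <- 15
transformed <- asinh(raw_intensity / cofactor)
print(round(transformed, 3))

# The transform is invertible with sinh.
recovered <- sinh(transformed) * cofactor
```

The transform is close to linear near zero and close to logarithmic for large values, which compresses the wide dynamic range of raw intensities.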
Leelatian et al., Methods Mol Biol, 2015.
Diggins et al., Methods, 2016. Diggins et al., Nature Methods, 2017
data(PBMC)