Install the package SOMNiBUS from Bioconductor.
SOMNiBUS aims to analyze count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits surch as cell types.
Major advantages of SOMNiBUS
For a more comprehensive introduction of the SOMNiBUS approach, please read our SOMNiBUS paper (Kaiqiong Zhao 2020).
If you use this package, please cite our SOMNiBUS paper (Kaiqiong Zhao 2020).
Throughout this vignette, we illustrate the SOMNiBUS
approach with analysis of a targeted region from a rheumatoid arthritis
(RA) study. See help(RAdat)
for further details. In this
example, the phenotype of major interest is the RA status (coded as
RA
) and the adjusting variable is the cell type status
(coded as T_cell
) which is binary because the experiment
used cell-type-separated blood samples, and methylation profiles were
characterized for both T-cells and Monocytes. We will refer to both
RA
and T_cell
as covariates. We are
going to use the package SOMNiBUS
to investigate the
methylation patterns in this region and study association with RA status
and cell type.
Currently, we require a matrix-type input of the methylated reads
(Meth_Counts
) and the read depth
(Total_Counts
) for CpG sites of each sample. Inputs in
another format, such as Bismark or a BSeq
object
from the bsseq package are now supported using the formatting
functions formatFromBSseq()
,
formatFromBismark()
and formatFromBSmooth()
(more details below).
Before using the package, the input data matrix (or data frame) should be formatted such that:
Meth_Counts
(methylated counts), Total_Counts
(read depths), Position
(Genomic position for the CpG site)
and ID
(sample ID)An example of the input data:
data("RAdat")
head(RAdat)
#> Meth_Counts Total_Counts Position ID T_cell RA
#> 1 0 8 102711629 1 0 1
#> 2 0 2 102711630 1 0 1
#> 3 0 12 102711702 1 0 1
#> 4 0 4 102711703 1 0 1
#> 5 0 15 102711747 1 0 1
#> 6 0 6 102711748 1 0 1
We implemented 3 functions dedicated to the conversion of outputs generated by standard whole-genome shotgun bisulfite sequencing (WGBS) tools such as BSseq R package (Hansen, Langmead, and Irizarry 2012), Bismark (Krueger, and Andrews 2011) and BSmooth (Hansen, Langmead, and Irizarry 2012) alignment suites.
The function formatFromBSseq
reads and converts a
BSseq
object (bsseq_dat
) into a list of
data.frame
s (one per chromosome) to a format compatible
with runSOMNiBUS
and binomRegMethModel
. Each
data.frame
contains rows as individual CpGs appearing in
all samples. The first 4 columns contain the information of
Meth_Counts
(methylated counts), Total_Counts
(read depths), Position
(Genomic position for the CpG site)
and ID
(sample ID).
The additional information (such as disease status, sex, age) extracted from the BSseq object are listed in column 5 and onwards and will be considered as covariate information by SOMNiBUS algorithms.
The functions formatFromBismark
and
formatFromBSmooth
utilize pre-existing methods implemented
in the bsseq R package, read.bismark
and
read.bsmooth
to convert, respectively, Bismark and
BSmooth outputs into BSseq
objects.
Once this conversion is applied, we call formatFromBSseq
to generate the final output (described above).
formatFromBismark <- function(..., verbose = TRUE)
formatFromBSmooth <- function(..., verbose = TRUE)
...
refers to the parameters from
bsseq::read.bismark()
or bsseq::read.bsmooth()
functions. Use ?bsseq::read.bismark()
or
?bsseq::read.bsmooth()
for more information.
To better use the information in the methylation dataset, on one hand, SOMNiBUS uses a smoothing technique (regression splines) to borrow information from the nearby CpG sites; on the other hand, our approach uses regression-based modelling to take advantage of information contained across samples. Therefore, this algorithm does not require filtering out the CpG sites that have methylation levels measured only in a small part of the samples, or the samples that have overall poor read-depths and many missing values. Our analysis of differentially methylated regions (DMRs) requires filtering only on the following two conditions:
T_cell
or RA
in the data set RAdat
)The smooth covariate estimation and the region-wise test steps are
wrapped into a function binomRegMethModel
. See
help(binomRegMethModel)
for more details. We can use the
following code to run the analysis with both covariates
T_cell
and RA
.
If there is a single region to analyze, we can directly call the function binomRegMethModel. We can use the following code to run the analysis with both covariates T_cell and RA.
out <- binomRegMethModel(data = RAdat.f, n.k = rep(5, 3), p0 = 0.003,
p1 = 0.9, Quasi = FALSE, RanEff = FALSE,
verbose = FALSE)
Or, we can use the argument covs
to specify that we only
want the covariate T_cell
in the model.
out.ctype <- binomRegMethModel(data = RAdat.f, n.k = rep(5, 2), p0 = 0.003,
p1 = 0.9, covs = "T_cell", verbose = FALSE)
If the analysis encompasses multiple regions, we use a wrapper
function runSOMNiBUS
(See help(runSOMNiBUS)
for more details) which encapsulates the function
binomRegMethModel
. This function splits the methylation
data into regions (according to different approaches) and, for each
region, calls the function binomRegMethModel
to fit a
(dispersion-adjusted) binomial regression model to regional methylation
data. It returns a list (one element by independent region) of results
generated by the function binomRegMethModel
. Each result
reports the estimated smooth covariate effects and regional p-values for
the test of DMRs. Over or under dispersion across loci is accounted for
in the model by the combination of a multiplicative dispersion parameter
(or scale parameter) and a sample-specific random effect.
The four main approaches are:
help(splitDataByRegion)
for more details.help(splitDataByDensity)
for more details.help(splitDataByGene)
for more details.help(splitDataByChromatin)
for more details.Two generic approaches have also been implemented to enable users to use their own annotations for partitioning purposes:
help(splitDataByBed)
for more details.GenomicRanges
object. See
help(splitDataByGRanges)
for more details.Specifically, the granges approach is used internally to align and partition annotation data coming from bed, gene and chromatin approaches.
Each partitioning function requires a data frame (dat
)
with rows as individual CpGs appearing in all the samples. The first 4
columns contain the information of Meth_Counts
(methylated
counts), Total_Counts
(read depths), Position
(Genomic position for the CpG site) and ID
(sample ID). The
covariate information, such as disease status or cell type composition,
are listed in column 5 and onwards. These partitioning functions return
a named list of data.frame containing the data of each independent
region. By default, the partitioning approach (region
)
splits the methylation data into regions based on the spacing between
CpGs.
The following command line enables to split my the input data based on the spacing between CpG (CpG islands) and analyze each region:
outs <- runSOMNiBUS(dat = RAdat.f,
split = list(approach = "region", gap = 100),
n.k = rep(10,3), p0 = 0.003, p1 = 0.9,
min.cpgs = 10, max.cpgs = 2000, verbose = TRUE)
#> [2024-10-13 05:23:23][INFO] Running runSOMNiBUS(): Processing methylation data.
#> [2024-10-13 05:23:23][INFO] Running splitDataByRegion(): Partitioning the regions based on the spacing of CpGs
#> [2024-10-13 05:23:23][INFO] Running splitDataByRegion(): Dropped region [102712831-102712832] containing 2 measurement points: below minimal region size accepted for analysis (10).
#> [2024-10-13 05:23:23][INFO] Finished splitDataByRegion(): Process completed in 0.029 secs
#> [2024-10-13 05:23:23][INFO] Input partitioned into 1 independent region(s)
#> [2024-10-13 05:23:23][INFO] Starting analysis for region_102711629_102712708 (119 unique CpGs)
#> [2024-10-13 05:23:23][INFO] Running binomRegMethModelInit(): Initialize data for binomRegMethModel
#> [2024-10-13 05:23:23][INFO] Running binomRegMethModelChecks(): Check inputs consistency
#> [2024-10-13 05:23:23][INFO] Running binomRegMethModelChecks(): We recommend basis dimentions n.k approximately equal to the number of unique CpGs in the region divided by 20
#> [2024-10-13 05:23:23][INFO] Finished binomRegMethModelInit(): Process completed in 0.0025 secs
#> [2024-10-13 05:23:23][INFO] Running fitGam(): Fit gam for the initial value
#> [2024-10-13 05:23:25][INFO] Finished fitGam(): Process completed in 1.4 secs
#> [2024-10-13 05:23:25][INFO] Running phiFletcher(): Calculating the Fletcher-based dispersion estimate from the final gam output
#> [2024-10-13 05:23:25][INFO] Finished phiFletcher(): Process completed in 0.00099 secs
#> [2024-10-13 05:23:25][INFO] Running fitEM(): Running EM algorithm to obtain the estimate of alpha
#> [2024-10-13 05:23:25][INFO] Running binomRegMethModelUpdate(): One-step update inside the EM algorithm for fitting the functional parameters theta.0 and beta.
#> [2024-10-13 05:23:26][INFO] Finished binomRegMethModelUpdate(): Process completed in 1.5 secs
#> [2024-10-13 05:23:26][INFO] iteration 1
#> [2024-10-13 05:23:27][INFO] iteration 2
#> [2024-10-13 05:23:28][INFO] iteration 3
#> [2024-10-13 05:23:30][INFO] iteration 4
#> [2024-10-13 05:23:31][INFO] iteration 5
#> [2024-10-13 05:23:32][INFO] iteration 6
#> [2024-10-13 05:23:33][INFO] iteration 7
#> [2024-10-13 05:23:34][INFO] iteration 8
#> [2024-10-13 05:23:36][INFO] iteration 9
#> [2024-10-13 05:23:37][INFO] iteration 10
#> [2024-10-13 05:23:39][INFO] iteration 11
#> [2024-10-13 05:23:40][INFO] iteration 12
#> [2024-10-13 05:23:42][INFO] iteration 13
#> [2024-10-13 05:23:44][INFO] iteration 14
#> [2024-10-13 05:23:45][INFO] iteration 15
#> [2024-10-13 05:23:46][INFO] iteration 16
#> [2024-10-13 05:23:47][INFO] iteration 17
#> [2024-10-13 05:23:49][INFO] iteration 18
#> [2024-10-13 05:23:50][INFO] iteration 19
#> [2024-10-13 05:23:52][INFO] iteration 20
#> [2024-10-13 05:23:53][INFO] iteration 21
#> [2024-10-13 05:23:55][INFO] Finished fitEM(): Process completed in 30 secs
#> [2024-10-13 05:23:55][INFO] Running estimateBZ(): Get the basis matrix for beta0(t) and beta(t)
#> [2024-10-13 05:23:55][INFO] Finished estimateBZ(): Process completed in 0.0029 secs
#> [2024-10-13 05:23:55][INFO] Running estimateBeta(): Given a final GAM output, extract the estimates of beta(t) = BZ * alpha
#> [2024-10-13 05:23:55][INFO] Finished estimateBeta(): Process completed in 0.00077 secs
#> [2024-10-13 05:23:55][INFO] Running estimateVar(): Estimate the variance covariance matrix
#> [2024-10-13 05:23:55][INFO] Running hessianComp(): Compute the Hessian matrix for an EM estimate
#> [2024-10-13 05:23:56][INFO] Finished hessianComp(): Process completed in 1.2 secs
#> [2024-10-13 05:23:56][INFO] Finished estimateVar(): Process completed in 1.2 secs
#> [2024-10-13 05:23:56][INFO] Running estimateSE(): Estimate SE of beta(t) for each covariates
#> [2024-10-13 05:23:56][INFO] Finished estimateSE(): Process completed in 0.00065 secs
#> [2024-10-13 05:23:56][INFO] Running extractDesignMatrix(): Extract design matrix
#> [2024-10-13 05:23:56][INFO] Finished extractDesignMatrix(): Process completed in 0.00047 secs
#> [2024-10-13 05:23:56][INFO] Running binomRegMethModelSummary(): Extract the regional testing results from a GamObj
#> [2024-10-13 05:23:56][INFO] Running binomRegMethModelSummary(): Get the regional summary results for an individual covariate effect
#> [2024-10-13 05:23:56][INFO] Finished binomRegMethModelSummary(): Process completed in 0.0049 secs
#> [2024-10-13 05:23:56][INFO] Running binomRegMethModelSummary(): Extract the regional testing results from a GamObj
#> [2024-10-13 05:23:56][INFO] Running binomRegMethModelSummary(): Get the regional summary results for an individual covariate effect
#> [2024-10-13 05:23:56][INFO] Finished binomRegMethModelSummary(): Process completed in 0.0041 secs
#> [2024-10-13 05:23:56][INFO] Finished binomRegMethModel(): Process completed in 33 secs
#> [2024-10-13 05:23:56][INFO] Finished runSOMNiBUS(): Process completed in 33 secs
In the example data set, we have cell type separated samples. The
error rates for individual samples can be estimated by a E-M algorithm
(Lakhal-Chaieb et al. 2017) using the
package SmoothMSC
. The error rate default values, p0 = 0.003 and p1 = 0.9, were estimated
as the average incomplete (p0) or over- conversion
(1 − p1) of the
metabisulfite. These two estimated values coincide roughly with the
incomplete and over conversion rates related to bisulfite sequencing
experiment reported in Prochenka et al.
(2015). Both parameters, p0 and p1, correspond to the false
positive rate and the true positive rate respectively, where 1-p1 being
the false negative rate.
For experiments with samples from a tissue containing a mixture of cell types, the user could consider the following ways to specify the error rates p0 and 1-p1.
p0
and p1
Argument n.k
in the binomRegMethModel
is
the dimension of the basis expansion for smooth covariate effects. The
exact number n.k
used for each functional parameter is not
crucial, because it only sets an upper bound. We recommend choosing a
basis dimension approximately equal to the number of unique CpGs in the
region divided by 20. Please notice that, this parameter is computed
automatically (overwriting the value provided by the user if any), when
several regions are generated by the partitioning function within the
wrapper function runSOMNiBUS
.
Under the null hypothesis, we are expecting no effects of the covariates over the region-wide methylation status.
binomRegMethModelPlot(BEM.obj = out)
#> [2024-10-13 05:23:56][INFO] Running binomRegMethModelPlot(): Plot the smooth covariate effect
#> Warning in geom_line(): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Lower"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Upper"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
We can also force the covariate effect plots to have the same
vertical range, for all covariates, by specifying
same.range = TRUE
.
binomRegMethModelPlot(out, same.range = TRUE)
#> [2024-10-13 05:23:57][INFO] Running binomRegMethModelPlot(): Plot the smooth covariate effect
#> Warning in geom_line(): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Lower"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Upper"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
The user can select a subset of covariates of interest by indicating
the name of those covariates with the covs
arguments.
# creating plot
binomRegMethModelPlot(BEM.obj = out, same.range = FALSE, verbose = FALSE,
covs = c("RA", "T_cell"))
#> Warning in geom_line(): All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Lower"), linetype = "dashed"): All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Upper"), linetype = "dashed"): All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
The mfrow
parameter allows you to create a matrix of
plots in one plotting space. It takes a vector of length two as an
argument, corresponding to the number of rows and columns in the
resulting plotting matrix.
# creating a 2x2 matrix
binomRegMethModelPlot(BEM.obj = out, same.range = FALSE, verbose = FALSE,
mfrow = c(2,2))
#> Warning in geom_line(): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Lower"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Upper"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
# creating a 3x1 matrix
binomRegMethModelPlot(BEM.obj = out, same.range = FALSE, verbose = FALSE,
mfrow = c(3,1))
#> Warning in geom_line(): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Lower"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_line(aes(y = "Upper"), linetype = "dashed"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 363 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
First, construct a new data set for prediction. Make sure that the
Position in the new data set is the same as the original input
data
in runSOMNiBUS
(or
binomRegMethModel
).
# simulating new data
pos <- out$uni.pos
my.p <- length(pos)
newdata <- expand.grid(pos, c(0, 1), c(0, 1))
colnames(newdata) <- c("Position", "T_cell", "RA")
The predicted methylation levels can be calculated from function
binomRegMethModelPred
using both prediction types
(link.scale and proportion). See
help(binomRegMethModelPred)
for more details.
# prediction of methylation levels for the new data (logit scale)
my.pred.log <- binomRegMethModelPred(out, newdata, type = "link.scale",
verbose = FALSE)
# prediction of methylation levels for the new data (proportion)
my.pred.prop <- binomRegMethModelPred(out, newdata, type = "proportion",
verbose = FALSE)
We can visualize the prediction results using the function
binomRegMethPredPlot
(see
help(binomRegMethPredPlot)
for more details). We defined
some experimental design in order to identify the different expression
patterns based on the different disease and cell type status. In the
following example, we created 4 groups of samples:
controls (RA == 0
) t-cells
(T_cell == 1
),
controls (RA == 0
) monocytes
(T_cell == 0
),
cases (RA == 1
) t-cells
(T_cell == 1
),
cases (RA == 1
) monocytes
(T_cell == 0
).
And add the design to the input data and prediction results using the following code:
# creating the experimental design
newdata$group <- ""
newdata[(newdata$RA == 0 & newdata$T_cell == 0),]$group <- "CTRL MONO"
newdata[(newdata$RA == 0 & newdata$T_cell == 1),]$group <- "CTRL TCELL"
newdata[(newdata$RA == 1 & newdata$T_cell == 0),]$group <- "RA MONO"
newdata[(newdata$RA == 1 & newdata$T_cell == 1),]$group <- "RA TCELL"
# merge input data and prediction results
pred <- cbind(newdata, Logit = my.pred.log, Prop = my.pred.prop)
head(pred)
#> Position T_cell RA group Logit Prop
#> 1 102711629 0 0 CTRL MONO -6.062564 0.0023230139
#> 2 102711630 0 0 CTRL MONO -6.083959 0.0022739534
#> 3 102711702 0 0 CTRL MONO -7.585820 0.0005073406
#> 4 102711703 0 0 CTRL MONO -7.605609 0.0004974049
#> 5 102711747 0 0 CTRL MONO -8.424272 0.0002194269
#> 6 102711748 0 0 CTRL MONO -8.441490 0.0002156818
Once the data are ready, we create a style
describing
the way each group should be displayed in the plot. If
style = NULL
, the default color and line type are
picked.
# creating the custom style for each experimental group
style <- list(
"CTRL MONO" = list(color = "blue", type = "solid"),
"CTRL TCELL" = list(color = "green", type = "solid"),
"RA MONO" = list(color = "red", type = "solid"),
"RA TCELL" = list(color = "black", type = "solid")
)
The following code enables to visualize the two types of prediction results.
# creating plot (logit scale)
binomRegMethPredPlot(pred = pred, pred.type = "link.scale",
pred.col = "Logit", group.col = "group",
title = "Logit scale",
style = style, verbose = TRUE)
#> [2024-10-13 05:24:00][INFO] Running binomRegMethPredPlot(): Plot the predicted methylation levels
#> Warning in geom_line(mapping = aes(y = pred.col, group = group.col, color = group.col, : All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
# creating plot (proportion)
binomRegMethPredPlot(pred = pred, pred.type = "proportion",
pred.col = "Prop", group.col = "group",
title = "Proportion scale",
style = style, verbose = FALSE)
#> Warning in geom_line(mapping = aes(y = pred.col, group = group.col, color = group.col, : All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
By default, no experimental design is required
(group.col = NULL
). In this case, the prediction results
are displayed as a scatter plot.
binomRegMethPredPlot(pred = pred, pred.type = "link.scale", pred.col = "Logit",
group.col = NULL, title = "Logit scale", verbose = FALSE)
#> Warning in geom_point(mapping = aes(y = pred.col)): All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 484 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
By putting the group value to NA
or ""
(empty character), we can specifically exclude some experimental groups
from the plot. In the following example, we excluded monocytes.
# exclusion of the MONO cells (not T_cell)
pred[(pred$RA == 0 & pred$T_cell == 0),]$group <- NA
pred[(pred$RA == 1 & pred$T_cell == 0),]$group <- ""
# creating plot (logit scale)
binomRegMethPredPlot(pred = pred, pred.type = "link.scale",
pred.col = "Logit", group.col = "group",
title = "Logit scale",
style = style, verbose = FALSE)
#> Warning in geom_line(mapping = aes(y = pred.col, group = group.col, color = group.col, : All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning in geom_rug(sides = "b", color = "black"): All aesthetics have length 1, but the data has 242 rows.
#> ℹ Please consider using `annotate()` or provide this layer with data containing
#> a single row.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's linetype values.
Here is the output of sessionInfo()
on the system on
which this document was compiled running pandoc 3.2.1:
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] SOMNiBUS_1.13.0 BiocStyle_2.33.1
#>
#> loaded via a namespace (and not attached):
#> [1] sys_3.4.3 jsonlite_1.8.9
#> [3] magrittr_2.0.3 GenomicFeatures_1.57.1
#> [5] farver_2.1.2 rmarkdown_2.28
#> [7] BiocIO_1.15.2 zlibbioc_1.51.1
#> [9] vctrs_0.6.5 memoise_2.0.1
#> [11] Rsamtools_2.21.2 DelayedMatrixStats_1.27.3
#> [13] RCurl_1.98-1.16 htmltools_0.5.8.1
#> [15] S4Arrays_1.5.10 AnnotationHub_3.13.3
#> [17] curl_5.2.3 Rhdf5lib_1.27.0
#> [19] SparseArray_1.5.44 rhdf5_2.49.0
#> [21] sass_0.4.9 bslib_0.8.0
#> [23] bsseq_1.41.0 plyr_1.8.9
#> [25] cachem_1.1.0 buildtools_1.0.0
#> [27] GenomicAlignments_1.41.0 lifecycle_1.0.4
#> [29] pkgconfig_2.0.3 Matrix_1.7-0
#> [31] R6_2.5.1 fastmap_1.2.0
#> [33] GenomeInfoDbData_1.2.13 MatrixGenerics_1.17.0
#> [35] digest_0.6.37 colorspace_2.1-1
#> [37] AnnotationDbi_1.67.0 S4Vectors_0.43.2
#> [39] regioneR_1.37.0 GenomicRanges_1.57.2
#> [41] RSQLite_2.3.7 filelock_1.0.3
#> [43] fansi_1.0.6 httr_1.4.7
#> [45] abind_1.4-8 mgcv_1.9-1
#> [47] compiler_4.4.1 bit64_4.5.2
#> [49] withr_3.0.1 BiocParallel_1.39.0
#> [51] DBI_1.2.3 highr_0.11
#> [53] HDF5Array_1.33.8 R.utils_2.12.3
#> [55] rappdirs_0.3.3 DelayedArray_0.31.14
#> [57] rjson_0.2.23 gtools_3.9.5
#> [59] permute_0.9-7 tools_4.4.1
#> [61] R.oo_1.26.0 glue_1.8.0
#> [63] restfulr_0.0.15 nlme_3.1-166
#> [65] rhdf5filters_1.17.0 grid_4.4.1
#> [67] reshape2_1.4.4 generics_0.1.3
#> [69] gtable_0.3.5 BSgenome_1.73.1
#> [71] tzdb_0.4.0 R.methodsS3_1.8.2
#> [73] tidyr_1.3.1 data.table_1.16.2
#> [75] hms_1.1.3 utf8_1.2.4
#> [77] XVector_0.45.0 BiocGenerics_0.51.3
#> [79] BiocVersion_3.20.0 pillar_1.9.0
#> [81] stringr_1.5.1 limma_3.61.12
#> [83] splines_4.4.1 dplyr_1.1.4
#> [85] BiocFileCache_2.13.2 lattice_0.22-6
#> [87] rtracklayer_1.65.0 bit_4.5.0
#> [89] tidyselect_1.2.1 locfit_1.5-9.10
#> [91] maketools_1.3.1 Biostrings_2.73.2
#> [93] knitr_1.48 IRanges_2.39.2
#> [95] SummarizedExperiment_1.35.4 stats4_4.4.1
#> [97] xfun_0.48 Biobase_2.65.1
#> [99] statmod_1.5.0 annotatr_1.31.0
#> [101] matrixStats_1.4.1 stringi_1.8.4
#> [103] VGAM_1.1-12 UCSC.utils_1.1.0
#> [105] yaml_2.3.10 evaluate_1.0.1
#> [107] codetools_0.2-20 tibble_3.2.1
#> [109] BiocManager_1.30.25 cli_3.6.3
#> [111] munsell_0.5.1 jquerylib_0.1.4
#> [113] Rcpp_1.0.13 GenomeInfoDb_1.41.2
#> [115] dbplyr_2.5.0 png_0.1-8
#> [117] XML_3.99-0.17 parallel_4.4.1
#> [119] ggplot2_3.5.1 readr_2.1.5
#> [121] blob_1.2.4 sparseMatrixStats_1.17.2
#> [123] bitops_1.0-9 scales_1.3.0
#> [125] purrr_1.0.2 crayon_1.5.3
#> [127] rlang_1.1.4 KEGGREST_1.45.1