Switch to scrapper for DE detection when de.method="t" or de.method="wilcox" in trainSingleR().
This should give similar results to the previous scran functions, but faster.
Switch to scrapper for variance calculation, PCA and clustering in aggregateReference().
This should be faster than the previous BiocSingular and stats::kmeans functions, and avoids the need to consider the seed.
Added a configureMarkerHeatmap() function to perform all the calculations used by plotMarkerHeatmap().
This allows users to re-use the calculations with a custom visualization for the expression values.
Automatically remove duplicated gene names in trainSingleR().
This avoids matching to the wrong gene after identifying markers from the reference dataset.
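As a rough illustration of what removing duplicated gene names looks like, here is a minimal base R sketch. It assumes that the first occurrence of each name is the one retained, which is an assumption of this example rather than a statement about the internals of trainSingleR(); the matrix `ref` and its contents are made up.

```r
# Hypothetical reference matrix in which "GeneA" appears twice.
ref <- matrix(
    1:8, nrow = 4,
    dimnames = list(c("GeneA", "GeneB", "GeneA", "GeneC"), c("s1", "s2"))
)

# Keep only the first row for each gene name (assumption for this sketch).
keep <- !duplicated(rownames(ref))
ref_unique <- ref[keep, , drop = FALSE]

rownames(ref_unique)
```

With unique row names, a marker name identified from the reference can only ever match one row.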
Report scores as a DataFrame of nested DataFrames in combineRecomputedResults().
Each inner DataFrame corresponds to a reference and contains the identity of the best label and the recomputed score in that reference.
This is simpler and more efficient than the previous "expanded with NA" format.
Report the deltas (i.e., the difference between the best and next-best scores) in combineRecomputedResults().
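The delta can be sketched in base R as follows. The score matrix and the helper `compute_delta()` are hypothetical and only illustrate the definition given above; they are not the package's implementation or output format.

```r
# Hypothetical recomputed scores: one row per cell, one column per reference.
scores <- matrix(
    c(0.80, 0.65, 0.40,
      0.55, 0.54, 0.10),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("cell1", "cell2"), c("refA", "refB", "refC"))
)

# Delta = best score minus next-best score for a single cell.
compute_delta <- function(x) {
    s <- sort(x, decreasing = TRUE)
    s[[1]] - s[[2]]
}

# Apply the helper to every row (cell) of the score matrix.
deltas <- apply(scores, 1, compute_delta)
```

A small delta (as for cell2 here) indicates that the best and next-best candidates are hard to distinguish.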
Separate the missingness check arguments in SingleR() with the new check.missing.test= and check.missing.ref= options.
The former is disabled by default, to avoid an unnecessary missingness check in the vast majority of test cases.
Removed the deprecated combineCommonResults() function.
Added the test.genes= argument to trainSingleR(), to restrict marker detection to only those genes in the test dataset.
This is also checked against rownames(test) in classifySingleR() to ensure that the test's feature space is consistent with the space used during training.
The introduction of test.genes= means that we no longer need to explicitly subset the rows of the reference dataset (to match the test features) in SingleR().
This saves memory by avoiding an unnecessary copy of the reference dataset, but may also slightly alter the marker selection as ties are broken in a different way.
Namely, if the top X genes are used as markers, and the X-th and (X+1)-th genes have the same log-fold change, tie breaking will be based on the ordering of the rows in the reference matrix, which is no longer the same as in the previous version of SingleR.
This results in some slight differences in the markers that propagate down to the classification results.
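The effect of row order on tied markers can be shown with a small base R sketch. The log-fold changes and the helper `top_markers()` are hypothetical; the sketch relies only on the fact that order() leaves ties in their input order, and it is not the marker selection code used by the package.

```r
# Hypothetical log-fold changes; geneB and geneC are tied.
lfc <- c(geneA = 2.0, geneB = 1.5, geneC = 1.5, geneD = 0.3)

# Take the top `x` genes by decreasing log-fold change.
# order() keeps tied values in their original order.
top_markers <- function(lfc, x) {
    names(lfc)[order(-lfc)][seq_len(x)]
}

original <- top_markers(lfc, 2)                 # rows in the original order
reordered <- top_markers(lfc[c(1, 3, 2, 4)], 2) # geneB and geneC swapped
```

Here `original` contains geneA and geneB while `reordered` contains geneA and geneC, even though the underlying values are identical.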
Restored the BNPARAM= argument in trainSingleR(), to enable more fine-grained specification of neighbor search algorithms.
The approximate= argument is deprecated.
Soft-deprecated check.missing= in classifySingleR() and combineRecomputedResults().
This is because any filtering will cause a mismatch between the row names of tests and the test.genes in trained.
Rather, filtering should be done prior to trainSingleR(), as is done in the main SingleR() function.
combineRecomputedResults() now supports fine-tuning to resolve closely-related labels from different references.
This is similar to the fine-tuning in classifySingleR(), where the feature space is iteratively redefined as the union of markers of labels with near-highest scores.
Added the plotMarkerHeatmap() function to plot a diagnostic heatmap of the most interesting markers for each label.
The format of the output of trainSingleR() has changed and is no longer backwards-compatible.
recompute=FALSE in trainSingleR() does nothing; all integrated analyses are now done with recompute=TRUE.
To that end, combineCommonResults() is also deprecated.
genes = "sd" and its associated options in trainSingleR() are no longer supported.
first.labels is no longer reported in classifySingleR().
Added another parallelization mechanism via num.threads= and C++11 threads.
This should be much more memory-efficient than using BiocParallel.
combineRecomputedResults() will automatically handle mismatches in the input references by default.
Relaxed the requirements for consistent row names in combineRecomputedResults().
Support sparse DelayedArray inputs in classifySingleR().
Parallelize over labels instead of rows in aggregateReference(), with minor changes in the setting of the seed.
Restrict the PCA to the top 1000 most highly variable genes, for speed.
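The idea of restricting the PCA to the most variable genes can be sketched in base R as below. The simulated matrix, the use of plain per-gene variance as the variability measure, and the choice of 5 components are all assumptions of this example; aggregateReference() may model variance differently.

```r
set.seed(42)  # only so that the simulated toy data is reproducible

# Hypothetical expression matrix: 5000 genes (rows) by 20 samples (columns).
expr <- matrix(
    rnorm(5000 * 20), nrow = 5000,
    dimnames = list(paste0("gene", seq_len(5000)), NULL)
)

# Rank genes by variance and keep the 1000 most variable ones.
gene_var <- apply(expr, 1, var)
top <- head(order(gene_var, decreasing = TRUE), 1000)

# Run the PCA on the reduced matrix (samples as rows), keeping 5 components.
pcs <- prcomp(t(expr[top, , drop = FALSE]), rank. = 5)
```

The PCA now operates on a 20 x 1000 matrix rather than 20 x 5000, which is where the speed-up comes from.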
Migrated all of the dataset getter functions to the celldex package.
Streamlined the vignette to point to the book at https://bioconductor.org/books/devel/SingleRBook/.
Added a restrict= argument to trainSingleR() and SingleR() to easily restrict to a subset of features.
Deprecated the method= argument in SingleR().
Protect against accidental data.frames in ref= or test= in all functions.
Added support for consolidating labels from multiple references via combineResults().
Added mappings to standardized Cell Ontology terms in all *Data() functions.
Changed the name of the labels input of plotScoreDistribution() to labels.use for consistency across functions.
Fixed a label from adipocytes to astrocytes in BlueprintEncodeData().
Removed umlauts from labels (e.g., naive) in NovershternHematopoieticData() to avoid problems with Windows.
Perform PCA before clustering in aggregateReference() for speed and memory efficiency.
Modified genes="all" behavior in trainSingleR() to report DE-based markers for fine-tuning only.
New package SingleR for cell type annotation.