Use a more stable algorithm for identifying the knee point in barcodeRanks().
The new algorithm is based on maximizing the distance from a line between the plateau and the inflection point.
Previously, we tried to minimize the signed curvature but this was susceptible to many local minima due to the instability of the empirical second derivative, even after smoothing.
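The distance-based idea above can be sketched in a few lines. This is an illustration in Python, not the package's R implementation: the function name, the use of log-transformed ranks and totals, and the choice of endpoints are all assumptions made for the example.

```python
import math

def knee_by_max_distance(log_rank, log_total, start, end):
    """Return the index in [start, end] lying furthest above the chord.

    The chord joins the point at `start` (assumed to sit on the plateau)
    to the point at `end` (assumed to be the inflection point).
    """
    x0, y0 = log_rank[start], log_total[start]
    x1, y1 = log_rank[end], log_total[end]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("start and end points must differ")

    best_index, best_dist = start, 0.0
    for i in range(start, end + 1):
        # Signed perpendicular distance; positive means above the chord
        # when the chord runs from left to right.
        dist = (dx * (log_total[i] - y0) - dy * (log_rank[i] - x0)) / norm
        if dist > best_dist:
            best_index, best_dist = i, dist
    return best_index

# Toy curve: a flat plateau followed by a sharp drop.
toy_log_rank = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_log_total = [4.0, 4.0, 3.9, 2.0, 1.0]
knee = knee_by_max_distance(toy_log_rank, toy_log_total, 0, 4)
print(knee)
```

Unlike a curvature-based criterion, this needs no second derivative, so noise in the curve shifts the answer only as far as it shifts the distances themselves.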
Set alpha=Inf as the default for testEmptyDrops().
This is motivated by the realization that an underestimated alpha can still yield anticonservative p-values and is not universally safer than alpha=Inf.
Defaulting to alpha=Inf is preferable as it is at least correct in the expected case of multinomial sampling.
Added an intersect.genes= option to read10xCounts() for samples with inconsistent gene information.
Automatically fix empty chromosome names for mitochondrial genes in certain Cellranger outputs.
Added BPPARAM= to read10xCounts() for parallelized reading of multiple samples.
Gave all the *Ambience() functions better names, and soft-deprecated the current versions.
Added ambientContribSparse() to estimate the ambient contribution under sparsity assumptions.
Added cleanTagCounts() to remove undesirable barcodes from tag count matrices.
Converted all matrix-accepting functions to S4 generics to support SummarizedExperiment inputs.
emptyDrops() will now coerce all DelayedArray inputs into wrapped SparseArraySeeds.
Setting test.ambient=TRUE in emptyDrops() will no longer alter the FDRs compared to test.ambient=FALSE.
Added test.ambient=NA to retain back-compatible behavior.
Bugfix for correct use of redefined lower when by.rank= is set in emptyDrops().
Added a constant.ambient=TRUE option to hashedDrops() to better support experiments with very few HTOs.
Migrated downsampleMatrix() to scuttle with a re-export.
Added features= to downsampleReads() for per-feature-set downsampling.
Added matrix support for y= and ambient= in maximumAmbience().
Added controlAmbience() for easy estimation of ambient contamination with control features.
Added removeAmbience() function to remove the ambient solution from a count matrix, mostly for aesthetics.
Report library index and feature type in output of read10xMolInfo().
Support subsetting by library index/type in functions that use the molecule information file, such as swappedDrops() and chimericDrops().
Added by.rank= option to estimateAmbience() and emptyDrops(), for estimation of the ambient profile by excluding barcodes with the largest totals.
Added exclude.from= option to barcodeRanks(), to avoid problems with instability at low ranks for knee/inflection calculations (contributed by Stefano Mangiola).
Minor bugfix in barcodeRanks() calculation of the knee point.
Note that this affects the default choice of retain= in emptyDrops().
Split off HTO ambience inferences into a separate inferAmbience() function.
Added support for combinatorial barcodes in hashedDrops().
Added the downsampleBatches() function for convenient downsampling of batches.
Preliminary support for using the output of write10xCounts() back in Cellranger.
Support reading in 10X output files via prefixes in read10xCounts(). Automatic detection of whether a file is Gzipped or not.
Added chimericDrops() to remove chimeric molecules due to within-sample re-priming.
Added hashedDrops() to demultiplex cell hashing experiments.
Added maximumAmbience() to estimate the maximum contribution of the ambient profile.
Switched emptyDrops() to use Boost's discrete_distribution for weighted sampling. This results in some minor stochastic changes to the Monte Carlo p-values. Automatically round non-integer count matrices.
Removed read10xMatrix().
Supported CellRanger v3 output files in read10xMolInfo(), read10xCounts(), write10xCounts().
Modified barcodeRanks() to return a DataFrame with knee/inflection estimates in metadata.
Slight change to random number generation in emptyDrops() to be agnostic to number of cores.
Added removeSwappedDrops() for removing swapping in other types of droplet-based data.
Added alpha= argument to testEmptyDrops() to support overdispersion during sampling. Returned arguments and estimates in metadata of testEmptyDrops(), emptyDrops().
Added encodeSequences() for convenient 2-bit encoding of sequences.
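For readers unfamiliar with 2-bit encoding, the idea is to pack each nucleotide into two bits of an integer. The Python sketch below is illustrative only: the function names are hypothetical, and the A=0, C=1, G=2, T=3 mapping with the first base in the most significant position is an assumption that should be checked against the encodeSequences() documentation.

```python
# Assumed mapping of bases to 2-bit codes (00, 01, 10, 11).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode_2bit(seq):
    """Pack a nucleotide string into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        # Shift the existing bits left and append the new base's code.
        value = (value << 2) | CODES[base]
    return value

def decode_2bit(value, length):
    """Recover a sequence of known length from its 2-bit encoding."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))

encoded = encode_2bit("ACGT")
print(encoded, decode_2bit(encoded, 4))
```

The length must be stored separately, because leading A bases encode as zero bits and are otherwise indistinguishable from a shorter sequence.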
Added get10xMolInfoStats() function to compute per-cell statistics from a molecule info file.
Deprecated read10xMatrix(), as it does not add much practical value over Matrix::readMM().
Support the 10X sparse HDF5 format in read10xCounts().
Support the 10X sparse HDF5 format in write10xCounts().
New package DropletUtils, for handling droplet-based single-cell RNA sequencing data.