The flowAI package performs quality control on flow cytometry data so
that both manual and automated downstream analyses rest on reliable
data. The package is built on two functions: 1) flow_auto_qc, for the
automatic analysis, and 2) flow_iQC, for the interactive analysis. The
full pipeline of our quality control procedure removes events with
anomalous values in three aspects of a flow cytometry analysis: flow
rate, signal acquisition and dynamic range.
The evaluation of these aspects makes it possible to remove the technical variability derived from surges or deviations in the flow rate, defective laser-detection system, data range limitations and other technical issues.
For this documentation and for other testing purposes we use a small built-in dataset. The dataset was created manually by extracting a subsample of cells and channels from three FCS files that were part of an aging study of a Singaporean cohort. The dataset is stored as a flowSet object.
data(Bcells)
Bcells
## A flowSet with 3 experiments.
##
## column names(13): FSC-A FSC-H ... PE-Cy7-A Time
To select FCS files from your working directory, you can create a
character vector of the files you want to analyze by calling the
function dir with "*fcs$" as the regular expression for the pattern
argument.
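As a minimal sketch of this step, the following uses a temporary folder with empty placeholder files in place of a real working directory. The folder and file names are invented for the example, and the stricter pattern "\\.fcs$" is our own choice here so that only names ending in ".fcs" are matched.

```r
# Placeholder folder and file names, used only for illustration
fcs_dir <- file.path(tempdir(), "fcs_example")
dir.create(fcs_dir, showWarnings = FALSE)
file.create(file.path(fcs_dir, c("sample1.fcs", "sample2.fcs", "notes.txt")))

# Character vector of the FCS files found in the folder
fcsfiles <- dir(fcs_dir, pattern = "\\.fcs$", full.names = TRUE)
basename(fcsfiles)
```

Setting full.names = TRUE keeps the folder in each path, which is convenient when the files are not in the current working directory.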
The automatic method is implemented in the function flow_auto_qc. The
following calls show how to perform the quality control with default
settings on the FCS files in your folder and on the toy dataset that
comes with the flowAI package. The flowAI package depends on the
flowCore package for handling FCS files in the R environment. The
flowCore package provides two main classes, flowFrame and flowSet. The
Bcells object is an instance of the flowSet class and contains a set of
three FCS files that, taken individually, are instances of the flowFrame
class. The flow_auto_qc function can be called on either of the flowCore
objects, flowSet and flowFrame, and on a character vector of FCS file
names:
resQC <- flow_auto_qc(Bcells) # using a flowSet
resQC <- flow_auto_qc(Bcells[[1]]) # using a flowFrame
resQC <- flow_auto_qc(fcsfiles) # using a character vector
When a character vector is used to call the flow_auto_qc function, a
flowSet object is automatically generated, since the creation of the
histogram for the cell number comparison depends on it. Therefore, to
avoid memory saturation, we suggest splitting large datasets into
batches that are compatible with the hardware specifications of your
computer system. For example, if you want batches of at most 2
gigabytes you can use:
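GbLimit <- 2 # maximum size in gigabytes for each batch of FCS files
size_fcs <- file.size(fcsfiles)/1024/1024/1024 # size in gigabytes of each FCS file
groups <- ceiling(sum(size_fcs)/GbLimit) # number of batches needed
cums <- cumsum(size_fcs) # cumulative size across the files
# cut() needs at least two intervals, so a single batch is handled separately
batches <- if (groups > 1) cut(cums, groups) else factor(rep(1, length(cums)))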
Then you can run your analysis on the batches using a for-loop:
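The loop might be sketched as follows. The helper run_in_batches and its qc_fun argument are hypothetical names introduced here so that the grouping logic can be tried without any FCS files; the file names and batch labels in the demonstration are made up.

```r
# Hypothetical helper: apply a QC function to each batch of files in turn.
# `files` and `batches` play the role of fcsfiles and batches in the text.
run_in_batches <- function(files, batches, qc_fun) {
  # Group file names by batch label, dropping batches with no files
  file_groups <- split(files, droplevels(as.factor(batches)))
  for (i in seq_along(file_groups)) {
    qc_fun(file_groups[[i]])
  }
  invisible(file_groups)
}

# Toy demonstration with made-up file names and batch labels
demo_files <- c("a.fcs", "b.fcs", "c.fcs", "d.fcs")
demo_batches <- factor(c(1, 1, 2, 2))
processed <- run_in_batches(demo_files, demo_batches,
                            function(f) message("QC on: ", paste(f, collapse = ", ")))
```

With flowAI loaded, the call would be run_in_batches(fcsfiles, batches, function(f) flow_auto_qc(f, output = 0)), so that each batch is processed and written to disk without keeping the results in memory.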
When the output argument is set to 0 (or to any value other than 1 and 2), no R objects are returned.
After the quality control, the automatic method generates by default
a new FCS file containing an additional parameter in which the low
quality events have a value higher than 10,000, similar to the
flowClean flagging method. Alternatively, a new FCS file containing only
the high quality events can be generated. Moreover, flowAI can be
integrated into automated analysis pipelines through the returned
objects of the flowFrame or flowSet class.
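To illustrate the flagging convention, the sketch below builds a small made-up event matrix with an extra quality control column and keeps only the events whose value in that column does not exceed 10,000. The column name QC and all the values are placeholders, not output of flowAI.

```r
# Made-up event matrix: two channels plus a hypothetical QC flag column
events <- cbind(
  "FSC-A" = c(52000, 61000, 48000, 75000, 50500),
  "SSC-A" = c(31000, 29500, 33000, 41000, 30200),
  "QC"    = c(120, 10450, 9800, 15200, 430)  # values above 10,000 mark low quality events
)

# Keep only the high quality events
high_quality <- events[events[, "QC"] <= 10000, , drop = FALSE]
nrow(high_quality)
```

Keeping the flag as an extra parameter rather than deleting events leaves the final filtering decision to the downstream analysis.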
Remember that there are several arguments that you can set to improve
the quality control results obtained on your dataset. Moreover, with the
argument remove_from it is possible to perform a partial quality control
on only one or two of the above-mentioned properties (flow rate, signal
acquisition and dynamic range).
The function flow_auto_qc generates a report for each FCS file, in both
graphic and tabular format, to evaluate the performance of the
algorithms in detecting the anomalies.
We suggest running the automatic method first with default settings. If
the results are not satisfactory you can either modify the settings or
use the interactive method flow_iQC.
The interactive method is implemented as a Shiny app and is launched
with the flow_iQC() command in the R environment. For reasons of
performance and clarity, it analyzes only one file at a time. Once the
Shiny app is open in your web browser, you can upload the FCS file from
the top part of the left-hand side panel.
Here, we give an example of the results obtained after performing the quality control on the first FCS file of the Bcells dataset.
The summary information of the FCS file analyzed is reported in the
first section of the automatically generated report or on the left-hand
side panel of the flow_iQC Shiny app. The summary information contains
the name of the file, the number of events and the total percentage of
anomalies detected and removed.
The following information was obtained from the automatically generated report of our example:
Input File Name: Bcells1
Number of Events: 64562
The anomalies were removed from: Flow Rate, Flow Signal and Flow Margin
Anomalies Detected in total: 23%
Number of high quality events: 49535
If the dataset has more than three FCS files, the automatic method will produce a histogram containing the number of events for each file. The blue bar corresponds to the FCS file whose quality control analysis is described in the remaining part of the report.
The flow rate is reconstructed using the keyword $TIMESTEP contained
in FCS files of version 3 or later. By default the analysis is
performed using a time step of 1/10 of a second.
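The reconstruction can be sketched in base R as follows. The Time values and the $TIMESTEP of 0.01 seconds are invented for the example; the code converts the Time values to seconds and counts the events acquired in each interval of 1/10 of a second. This is only an illustration of the idea, not the code used inside flowAI.

```r
timestep <- 0.01   # assumed $TIMESTEP keyword value, in seconds per tick
time_ticks <- c(0, 2, 5, 9, 12, 14, 15, 18, 21, 27, 29)  # made-up Time channel values
bin_width <- 0.1   # 1/10 of a second, as in the default analysis

seconds <- time_ticks * timestep
# Index of the 1/10 second interval in which each event was acquired
# (the small offset guards against floating point rounding at interval edges)
bin_id <- floor(seconds / bin_width + 1e-9)
# Number of events per interval, i.e. the flow rate in events per 1/10 second
flow_rate <- table(factor(bin_id, levels = 0:max(bin_id)))
flow_rate
```

Intervals without events are kept as zero counts through the explicit factor levels, so gaps in the acquisition remain visible in the reconstructed rate.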
flow_auto_qc uses an anomaly detection algorithm to detect and remove
the data acquired during flow rate surges and shifts from the median
value. The algorithm is based on the Generalized ESD outlier detection
method, optimized to work on time series data. The anomalies detected
automatically are circled in green.
flow_iQC lets you manually select the most stable region of the flow
rate.
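For readers unfamiliar with the Generalized ESD test, the following is a minimal base R sketch of the plain test (Rosner's procedure) applied to a made-up vector of flow rate counts. It omits the time series adaptations and is not the implementation used by flow_auto_qc; the function name gesd_outliers and its arguments are hypothetical.

```r
gesd_outliers <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)      # original positions of the values still under test
  removed <- integer(0)  # positions removed so far, in order of removal
  n_found <- 0           # last step at which the statistic exceeded its critical value
  for (i in seq_len(max_outliers)) {
    dev <- abs(x[idx] - mean(x[idx]))
    worst <- which.max(dev)
    R_i <- dev[worst] / sd(x[idx])
    # Critical value for step i, based on the t distribution
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    removed <- c(removed, idx[worst])
    idx <- idx[-worst]
    if (R_i > lambda_i) n_found <- i
  }
  # Positions of the values declared outliers
  removed[seq_len(n_found)]
}

# Made-up flow rate counts with a single surge in the last position
rate_counts <- c(rep(c(10, 11, 9, 10, 12, 8), 5), 50)
gesd_outliers(rate_counts, max_outliers = 3)
```

In this toy series only position 31, the surge to 50, is reported. The sketch assumes the values are not all identical and that max_outliers is well below the length of the series.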
For each channel, the median of the signal of equally sized bins of
events is reported as a Levey-Jennings-type graph. The mean and standard
deviation of the median should remain constant over the course of the
analysis. flow_auto_qc uses a changepoint detection method to verify the
stability of the signal. Specifically, a shift in the median or the
variance is detected by the Binary Segmentation algorithm of the
changepoint package. In the resulting plot, the region that passed the
quality control is highlighted in yellow.
As with the flow rate check, flow_iQC lets you manually choose the most
stable region.
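The binning step of the signal acquisition check can be illustrated with a few lines of base R. The signal values and the bin size of 4 events are made up for the example; the changepoint detection that flow_auto_qc applies to the resulting medians is not reproduced here.

```r
# Made-up signal for one channel, ordered by acquisition time
signal <- c(100, 102, 98, 101,  99, 103, 100, 97,  140, 138, 142, 141)
bin_size <- 4  # assumed number of events per bin

# Assign consecutive events to bins of equal size and take the median of each bin
bin_id <- ceiling(seq_along(signal) / bin_size)
bin_medians <- as.numeric(tapply(signal, bin_id, median))
bin_medians
```

The jump in the third median is the kind of shift that a changepoint method would be expected to flag.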
Events at the upper and lower limits of the dynamic range are checked
in the last step. For the upper limit, events at the maximum value of
the dynamic range are removed, since the instrument is unable to record
values exceeding a maximum preset by the manufacturer. For the lower
limit, the quality control removes all values below zero for the scatter
channels and all outliers in the negative range for the
immunofluorescence channels. The plot shows the frequency of events
removed over the course of the analysis; the scaling of the x-axis is
complementary to that of the signal acquisition check. For this step,
both flow_auto_qc and flow_iQC use the same detection principle to
identify anomalies. When using the automatic method to refine the lower
limit of the dynamic range, you can use the neg_valueFM argument to
truncate the negative values to the cut-off suggested in the FCS file
instead of removing the negative outliers.
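A simplified sketch of the two checks for a scatter channel is given below. The maximum of 262,143 and the event values are assumptions chosen for the example, and the outlier detection used for the negative range of the immunofluorescence channels is not shown.

```r
upper_limit <- 262143  # assumed instrument maximum for this channel
fsc <- c(51200, 262143, -35, 88400, 262143, 120300, -2, 97000)  # made-up scatter values

# Events recorded at the upper limit of the dynamic range
at_upper <- fsc >= upper_limit
# Events below zero, which are removed for scatter channels
below_zero <- fsc < 0

# Positions of the margin events and the values that remain after removal
margin_events <- which(at_upper | below_zero)
kept <- fsc[!(at_upper | below_zero)]
margin_events
```

Plotting the positions in margin_events against acquisition order would give a rough analogue of the frequency plot described above.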