| Title: | Automatic scoring and classification of cell-based assay images |
|---|---|
| Description: | This software is meant to be used for classification of images of cell-based assays for neuronal surface autoantibody detection or similar techniques. It takes imaging files as input and creates a composite score from these, that for example can be used to classify samples as negative or positive for a certain antibody-specificity. The reason for its name is that I during its creation have thought about the individual picture as an archielago where we with different filters control the water level as well as ground characteristica, thereby finding islands of interest. |
| Authors: | Jakob Theorell [aut, cre, fnd] (ORCID: <https://orcid.org/0000-0001-8752-3151>, Funding provided by the Swedish Wenner-Gren Foundations) |
| Maintainer: | Jakob Theorell <[email protected]> |
| License: | GPL-3 |
| Version: | 1.5.0 |
| Built: | 2026-05-30 07:53:51 UTC |
| Source: | https://github.com/bioc/islify |
This software is meant to be used for classification of images of cell-based assays for neuronal surface autoantibody detection or similar techniques. It takes imaging files as input and creates a composite score from these, that for example can be used to classify samples as negative or positive for a certain antibody-specificity. The reason for its name is that I during its creation have envisioned the individual picture as an archielago where we with different filters control the water level, and thereby define islands with differing sizes and characteristics.
Maintainer: Jakob Theorell <[email protected]>
Useful links:
With this function, the otherwise integrated intensity cutoff is externalised. This is only useful in situations where the original values from the imaging have not been perturbed, so that the individual frames have different max values, and are not all compressed to a range from 0 to 1. NB! This requirement is not fulfilled if the files are exported as RGB compliant, as the RGB format always ranges from 0 to 1.
getIntensityCutoff( imgDirs, frameNum, numOfImgs = "All", numPix = "All", threshold_method = "Triangle", ignore_white = FALSE )getIntensityCutoff( imgDirs, frameNum, numOfImgs = "All", numPix = "All", threshold_method = "Triangle", ignore_white = FALSE )
imgDirs |
A vector or list of pathways, including filenames, to the images to be analysed, e.g. "Raw_images/Positive_ctrl.nd2". Formats that are currently supported are nd2, czi, tiff, png or lists of images, in the form of three-dimensional arrays, where each layer in the third dimension represents a color. nd2, czi or non-normalised, integer TIFF are clearly preferable for memory and resolution purposes. |
frameNum |
This identifies which of the frames in the file that contains the information about the autoantibody binding. |
numOfImgs |
If the provided files are nd2 format, they can contain multiple files. In this case, this flag can be used to restrict the number of used images. |
numPix |
If the frames are very large, this can be used to reduce the computational burden. For reproducibility reasons, it might be clever to run this multiple times, in this case, to bootstrap, or alternatively to set a seed before starting. |
threshold_method |
The method used for thresholding. Available
alternatives are the same as for the
|
ignore_white |
This is passed on to the
|
An intensity cutoff value for the chosen color.
# Load example data and run the function: data(negImage) data(posImage) getIntensityCutoff(imgDirs = list(negImage, posImage), frameNum = 1)# Load example data and run the function: data(negImage) data(posImage) getIntensityCutoff(imgDirs = list(negImage, posImage), frameNum = 1)
With this function, the user can get an overview of how the values for the intensities in a file or set of files are distributed. This is useful mainly to set reasonable cutoff values for the saveImage function, but might also be good for sanity checking the getIntensityCutoff results.
getQuantileIntensities(imgDirs, quantiles = c(0.99))getQuantileIntensities(imgDirs, quantiles = c(0.99))
imgDirs |
A vector or list of pathways, including filenames, to the images to be analysed, e.g. "Raw_images/Positive_ctrl.nd2". Formats that are currently supported are nd2, czi, tiff, png or lists of images, in the form of three-dimensional arrays, where each layer in the third dimension represents a color. nd2, czi or non-normalised, integer TIFF are clearly preferable for memory and resolution purposes. |
quantiles |
Which quantiles should be returned? Values between and including 0 and 1 are accepted. |
A list of matrices with one row per color in the data and with thirteen columns of intensities from percentile 0 through to percentile 100.
# Load example data and run the function: data(negImage) getQuantileIntensities(list(negImage))# Load example data and run the function: data(negImage) getQuantileIntensities(list(negImage))
This function is needed in situations where the cell size is not known. What it does is that it identifies the median cell nucleus size in a provided file. In this case, therefore, it is important that the frame number identifies a frame with nuclear staining, such as DAPI or similar.
getSizeCutoff( imgDirs, frameNum, noiseThreshold = 10, numPix = "All", numOfImgs = "All" )getSizeCutoff( imgDirs, frameNum, noiseThreshold = 10, numPix = "All", numOfImgs = "All" )
imgDirs |
A vector or list of pathways, including filenames, to the images to be analysed, e.g. "Raw_images/Positive_ctrl.nd2". Formats that are currently supported are nd2, czi, tiff, png or lists of images, in the form of three-dimensional arrays, where each layer in the third dimension represents a color. nd2, czi or non-normalised, integer TIFF are clearly preferable for memory and resolution purposes. |
frameNum |
This identifies which of the frames in the file that contains the information about the nuclear staining. In RGB-compliant files, this is often number 3. |
noiseThreshold |
This decides under what size in pixels that an islet is considered noise. Defaults to 10 pixels. |
numPix |
If the frames are very large, this can be used to reduce the computational burden. For reproducibility reasons, it might be clever to run this multiple times, in this case, to bootstrap, or alternatively to set a seed before starting. |
numOfImgs |
If the provided files are nd2 format, they can contain multiple files. In this case, this flag can be used to restrict the number of used images. |
A size cutoff value in pixels, which corresponds to the median of the island diameters above the noise threshold.
# Load example data and run the function: data(negImage) getSizeCutoff(imgDirs = list(negImage), frameNum = 3)# Load example data and run the function: data(negImage) getSizeCutoff(imgDirs = list(negImage), frameNum = 3)
This function is the central umbrella function that takes a list of image files or image file directories and generates a list of statistics for each of the files.
islify( imgDirs, imgNames, frameNumFocus, frameNumReference = FALSE, frameNumNuclei = FALSE, sizeCutoff, intensityCutoffFocus = TRUE, intensityCutoffReference = TRUE, intensityCutoffNuclei = TRUE, problematicUnevenDistribution = FALSE, threshold_method = "Triangle", ignore_white = FALSE, ringFrac = 0, flatFrac = 0, truncLim = "max", reportIntensity = FALSE, diagnoImgs = TRUE, outDir = ".", highNoise = FALSE, numPix = "All", numOfImgs = "All" )islify( imgDirs, imgNames, frameNumFocus, frameNumReference = FALSE, frameNumNuclei = FALSE, sizeCutoff, intensityCutoffFocus = TRUE, intensityCutoffReference = TRUE, intensityCutoffNuclei = TRUE, problematicUnevenDistribution = FALSE, threshold_method = "Triangle", ignore_white = FALSE, ringFrac = 0, flatFrac = 0, truncLim = "max", reportIntensity = FALSE, diagnoImgs = TRUE, outDir = ".", highNoise = FALSE, numPix = "All", numOfImgs = "All" )
imgDirs |
A vector or list of pathways, including filenames, to the images to be analysed, e.g. "Raw_images/Positive_ctrl.nd2". Formats that are currently supported are nd2, czi, tiff, png or lists of images, in the form of three-dimensional arrays, where each layer in the third dimension represents a color. nd2, czi or non-normalised, integer TIFF are clearly preferable for memory and resolution purposes. |
imgNames |
The names of the images. This argument is used both for naming of the rows in the output matrix, but also for naming of the images if such are generated using the diagnoImgs flag. |
frameNumFocus |
This identifies which of the frames in the files that contains the information that should be measured primarily, such as IgG binding in the case of autoantibody screening. |
frameNumReference |
Optionally, this identifies a frame in the files that can be used to focus the analysis to certain regions. It could be that only areas with GFP expression can contain meaningful expression of the marker of primary interest, and then only these areas will be considered. |
frameNumNuclei |
If no reference is present, but a nuclear staining is, this parameter can be used to identify which approximate surface area for the individual picture that could under optimal circumstances be covered with antibodies. As the nuclei are smaller than the cells, it is theoretically possible to get a report back of a fraction exceeding 1, but this is an unlikely scenario, as 100 percent transfection efficiency is seldomly obtained. |
sizeCutoff |
The size of a typical cell nucleus. This is used to
identify structures of a size equal to or larger than this cell threshold.
Smaller objects that are ring-shaped will be excluded with this cutoff,
which reduces noise considerably, but the threshold should not be set too
high as it then risks leading to the exclusion of true positive cells. This
value can be generated using the |
intensityCutoffFocus |
If a series of images have been generated with
the same settings and the original dynamic range of these images has
been retained, it is preferred to calculate the intensity cutoff before
and provide it here, but the value can also be calculated within
the function. This parameter therefore takes either a value or TRUE,
in which each individual frame gets a separate intensity cutoff. See
|
intensityCutoffReference |
Same as above but for the reference frame, if present. |
intensityCutoffNuclei |
Same as above but for the nuclei frame, if present. |
problematicUnevenDistribution |
In some cases, such as for NMDA-R, a recurrent problem is that regions on the borders of the well, or also more centrally, clumps of cells can form that artificially stain very positively. To get rid of this effect, this flag can be set to true, in which case the image (if it is wider than 2000 pixels in both directions) is split into regions of at least 500 pixels in width and length. The number of pixels identified as part of islands is thereafter calculated. Regions with more than the average plus two standard deviations of island presence are thereafter excluded and any islands with presence therein are excluded. This might unintendedly have the consequence of excluding true, large islands in settings with general high intensity, as such a large island might have a central region with increased intensity, so this flag should only be used in cases of very noisy data. |
threshold_method |
The method used for thresholding. Available
alternatives are the same as for the
|
ignore_white |
This is passed on to the
|
ringFrac |
The positive selection filter. After a crude minimal diameter criterion, checks if the island retains its width even if pixels below this fraction of the 99th to 1th percentile range of the intensity of the island are removed. The point is to actively seek truly poiitive islands as surface-stained cells have a heterogeneous, ring-shaped structure. This is not used in the default setting. |
flatFrac |
The negative selection filter, meant to get rid of dead cells that generally have a very homogenous staining pattern. If the median absolute deviation is lower than the flatFrac of the range from the 99th to the 1st percentile of the island, then it is considered flat and excluded. Low values, in the range of 0.001 to 0.1 are recommended. |
truncLim |
In extreme cases, such as with the NMDA-R assay, it might
be helpful to truncated the most highly stained cells, as they
counterintuirively tend to be part of the background. The level for this
truncation limit is set here. THis also affects the diagnoImgs if these
are present. It can be helpful to consult the
|
reportIntensity |
Should the sum of the intensities for all the surviving islands be returned? Default is FALSE. |
diagnoImgs |
Should a images delimiting the islands that have been selected be returned? |
outDir |
The directory that the diagnoImages should be saved in. Only used together with diagnoImgs = TRUE. |
highNoise |
Is the assay especially noisy, with very high and individual background? If so, a second run of noise reduction can be used to increase signal-to-noise ratio. This is often but not always needed for fixed cell-based assays, e.g. |
numPix |
If the frames are very large, this can be used to reduce the computational burden. NB! If this command is used, then the images will not be identical from round to round, and will not be comparable to the output of other functions, as a random subset of the pictures are used. |
numOfImgs |
If the provided files are nd2 format, they can contain multiple files. In this case, this flag can be used to restrict the number of used images. |
A data frame with statistics for the individual images: #'
The intensity cutoff for the individual file. If the reference is included, it will have a separate value.
The fraction of the total number of pixels that pass the filters. If the reference is included, it will have a separate value.
If a reference is included, then fractionOfAll_focus/fractionOfAll_ref is also saved.
If highNoise is TRUE, then the second noise filter is included here. If the reference is included, it will have a separate value.
Optionally, if diagnoImgs = TRUE, a directory containing images identifying the regions that have been identified as cells of interest is exported.
getIntensityCutoff, getSizeCutoff
auto_thresh
# Retrieve the example data data(posImage) data(negImage) # First, establish the average nuclear size. Here, tje nuclei from one image # is generally enough. sizeCutoff <- getSizeCutoff(imgDirs = list(negImage), frameNum = 3) # We can run this algorithm in two ways: either only using the blue for size # and red for the rest, or we can also incorporate the green color, saying # e.g.that we only are interested in IgG expression that colocalizes with # GFP expression. We start with the simpler case. intensityCutoffRed <- getIntensityCutoff( imgDirs = list(negImage, posImage), frameNum = 1 ) result <- islify( imgDirs = list(negImage, posImage), imgNames = c("Neg", "Pos"), frameNumFocus = 1, sizeCutoff = sizeCutoff, intensityCutoffFocus = intensityCutoffRed, diagnoImgs = FALSE ) # As can be noted above, diagnoImgs are set to FALSE, which means that the # output will be restricted to statistics only. If set to TRUE, then an # image showing the selected areas in red on a black-and-white background # are generated. In that case, a directory, outDir, also needs to be # specified. # Now comes the more complex useage, where the reference color is also # integrated. Here, to save time, only the negative image is used for # reference as both are expected to have similar GFP intensities. intensityCutoffGreen <- getIntensityCutoff( imgDirs = list(negImage), frameNum = 2 ) resultWithGreen <- islify( imgDirs = list(negImage, posImage), imgNames = c("Neg", "Pos"), frameNumFocus = 1, frameNumReference = 2, sizeCutoff = sizeCutoff, intensityCutoffFocus = intensityCutoffRed, intensityCutoffReference = intensityCutoffGreen, diagnoImgs = FALSE )# Retrieve the example data data(posImage) data(negImage) # First, establish the average nuclear size. Here, tje nuclei from one image # is generally enough. sizeCutoff <- getSizeCutoff(imgDirs = list(negImage), frameNum = 3) # We can run this algorithm in two ways: either only using the blue for size # and red for the rest, or we can also incorporate the green color, saying # e.g.that we only are interested in IgG expression that colocalizes with # GFP expression. We start with the simpler case. intensityCutoffRed <- getIntensityCutoff( imgDirs = list(negImage, posImage), frameNum = 1 ) result <- islify( imgDirs = list(negImage, posImage), imgNames = c("Neg", "Pos"), frameNumFocus = 1, sizeCutoff = sizeCutoff, intensityCutoffFocus = intensityCutoffRed, diagnoImgs = FALSE ) # As can be noted above, diagnoImgs are set to FALSE, which means that the # output will be restricted to statistics only. If set to TRUE, then an # image showing the selected areas in red on a black-and-white background # are generated. In that case, a directory, outDir, also needs to be # specified. # Now comes the more complex useage, where the reference color is also # integrated. Here, to save time, only the negative image is used for # reference as both are expected to have similar GFP intensities. intensityCutoffGreen <- getIntensityCutoff( imgDirs = list(negImage), frameNum = 2 ) resultWithGreen <- islify( imgDirs = list(negImage, posImage), imgNames = c("Neg", "Pos"), frameNumFocus = 1, frameNumReference = 2, sizeCutoff = sizeCutoff, intensityCutoffFocus = intensityCutoffRed, intensityCutoffReference = intensityCutoffGreen, diagnoImgs = FALSE )
This is a negative control image, used for help functions and vignettes. It is a real-life image showing CASPR2 expressing HEK293-MSC cells co-incubated with CASPR2-IgG. It was generated by Dr Amanda Freitas Huhtamäki during her postdoctoral period in the Jakob Theorell team, Center for Infectious Medicine, Department of Medicine Huddinge, in the Live Cell Imaging core facility, Huddinge, both Karolinska Institutet, Sweden.
data("negImage")data("negImage")
An a x b x c array representing red (IgG), green (GFP) and blue (DAPI) colors.
This is a positive control image, used for help functions and vignettes. It is a real-life image showing CASPR2 expressing HEK293-MSC cells co-incubated with CASPR2-IgG. It was generated by Dr Amanda Freitas Huhtamäki during her postdoctoral period in the Jakob Theorell team, Center for Infectious Medicine, Department of Medicine Huddinge, in the Live Cell Imaging core facility, Huddinge, both Karolinska Institutet, Sweden.
data("posImage")data("posImage")
An a x b x c array representing red (IgG), green (GFP) and blue (DAPI) colors.
This function simply generates pngs from raw image files, such as .nd2.
saveImage( imgDirs, imgNames, frameNums = "All", frameCols, truncTo = "max", outDir = ".", numOfImgs = "All", numPix = "All", saveImages = TRUE )saveImage( imgDirs, imgNames, frameNums = "All", frameCols, truncTo = "max", outDir = ".", numOfImgs = "All", numPix = "All", saveImages = TRUE )
imgDirs |
The directory containing the image files. nd2, czi, tiff and png formats or a list of three-dimensional arrays, where each layer in the third dimension represents a color are currently supported. nd2, czi or non-normalised, integer TIFF are clearly preferable. |
imgNames |
The names of the images. This argument is used both for naming of the rows in the output matrix, but also for naming of the images if such are generated using the diagnoImgs flag. |
frameNums |
This identifies which of the frames in the files that should be plotted. Defaults to "All", which then also generates a merge. |
frameCols |
Which color does the frame/frames in question have? Strangely, in some file formats such as nd2, it varies - not all are "R-G-B", but "G-R-B" also occurs. Can take the values "R"/"Red", "G"/"Green", and "B"/"Blue". Should be a vector the same length as frameNums. If nothing is provided, it defaults to only saving the merge. |
truncTo |
Above which value should the data be truncated? This argument
has two possible inputs: a numeric value or "max". Vectors of
individual values, the same length as frameCols are also accepted. The value
option is often the most suitable, but requires some knowledge of the range
of values expected in the file. For this, the
|
outDir |
The directory that the images should be saved in. Subfolders are created within this directory for each color. |
numOfImgs |
If the provided files are nd2 format, they can contain multiple files. In this case, this flag can be used to restrict the number of used images. |
numPix |
If the frames are very large, this can be used to reduce the computational burden. For reproducibility reasons, it might be clever to run this multiple times, in this case, to bootstrap, or alternatively to set a seed before starting. NB! If this command is used, then the images will not be identical from round to round, and will not be comparable to the output of other functions, as a random subset of the pictures are used. |
saveImages |
Should images be saved? This of course seems like a rather pointless command for a function meant to save pictures, but it is there for development reasons. Don't use it! |
Named png images that are scaled to the max allowed value in the raw image file and coloured according to frameCol.
# Retrieve the example data data(posImage) data(negImage) # To use this function, the getQuantileIntensities function is very helpful, # as it can be used to define reasonable truncTo values. It is of course # important to use a strongly stained sample in this case, to make sure that # the positive samples in the series are not overly truncated, reducing the # visual signal-to-noise ratio. posQuants <- getQuantileIntensities(list(posImage), quantiles = 0.99) # We will use the 99th percentile posQuants # Percent_99 # 1 0.3681333 # 2 0.2476340 # 3 0.3794010 # And now to the function. NB! In this case, we are using the last # "saveImages = FALSE" flag, which means that nothing is produced! Remove it # to use the function. saveImage(list(posImage, negImage), c("pos", "neg"), frameNums = "All", frameCols = c("R", "G", "B"), truncTo = posQuants[[1]][, "Percent_99"], outDir = ".", saveImages = FALSE )# Retrieve the example data data(posImage) data(negImage) # To use this function, the getQuantileIntensities function is very helpful, # as it can be used to define reasonable truncTo values. It is of course # important to use a strongly stained sample in this case, to make sure that # the positive samples in the series are not overly truncated, reducing the # visual signal-to-noise ratio. posQuants <- getQuantileIntensities(list(posImage), quantiles = 0.99) # We will use the 99th percentile posQuants # Percent_99 # 1 0.3681333 # 2 0.2476340 # 3 0.3794010 # And now to the function. NB! In this case, we are using the last # "saveImages = FALSE" flag, which means that nothing is produced! Remove it # to use the function. saveImage(list(posImage, negImage), c("pos", "neg"), frameNums = "All", frameCols = c("R", "G", "B"), truncTo = posQuants[[1]][, "Percent_99"], outDir = ".", saveImages = FALSE )
This function is derived in its entirety from the deprecated imageHTS package maintained by Gregoire Pau <gregoire.pau at embl.de>. All the documentation below as well as the code is identical to the original text. Only the simplified example is new. So kudos to the original authors! This re-publication is just there as no alternative has seemingly been available in R for years and it is needed for the downstream analyses with islify data.
zprime(a, b, method = c("mahalanobis", "robust", "fixsd", "original"))zprime(a, b, method = c("mahalanobis", "robust", "fixsd", "original"))
a |
A vector or matrix of control features. For example, positive controls. |
b |
A vector or matrix of different control values. For example negative controls. |
method |
A character vector, indicating which method should be used to compute the Z'-factor. Default is mahalanobis. See details. |
The Z'-factor is a popular metric measuring the separation of control features in high-throughput screens. The original paper describing the Z'-factor is Zhang, 1999, J Biomol Screen.
#' Several univariate Z'-factor scores exist. The original Z'-factor from Zhang, 1999 is computed by Z' = 1 - 3*(sd(a)+sd(b))/abs(mean(a)-mean(b)). A more rigorous definition of the score, implemented by the method fixsd is given by Z' = 1 - 3*sqrt(var(a)+var(b))/abs(mean(a)-mean(b)), where the pooled standard deviation is computed by the square root of the sum of the control variances. A robust method, less sensitive to outliers, is computed by the relation Z' = 1-3*(mad(a)+mad(b))/abs(median(a)-median(b)) where the control dispersions are computed with the mad and the control locations with the median.
A multivariate extension of the Z'-factor score can be designed by linearly transforming the multivariate data to one dimension and computing the standard (here, fixsd) Z'-factor. It can be shown that the linear transform that maximizes the score is the LDA. Moreover, one can demonstrate that the resulting Z'-factor score is equivalent of computing Z' = 1 - 3/dMaha(mu_a, mu_b, Sigma_a + Sigma_b) where dMaha is the Mahalanobis distance.
The z-prime value for the data in question.
J. H. Zhang, T. D. Chung, K. R. Oldenburg. A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. J Biomol Screening, 1999.
#Generate some data a <- rnorm(100, mean = 0, sd = 1) b <- rnorm(100, mean = 7, sd = 1) zprime(a, b)#Generate some data a <- rnorm(100, mean = 0, sd = 1) b <- rnorm(100, mean = 7, sd = 1) zprime(a, b)