synapsis
synapsis
is an R package which is designed for the
objective and reproducible analysis of meiotic immunofluorescence
data.
Meiosis is a special type of cell division where the chromosome
number is halved prior to fertilisation in sexually reproducing species.
The accurate halving of chromosomes is facilitated through large scale
chromosome exchanges that are referred to as meiotic crossovers. The
necessary precursors to crossovers are meiotic DNA double-strand breaks.
There is very tight control of DNA double-strand break numbers, timing
and positioning. Similarly the repair of these breaks and the
intermediate steps to crossover formation are under strict control. All
of these events take place on a meiosis-specific chromosomal structure,
which is referred to as the synaptonemal complex. These different
meiosis-specific molecular events and structures can be visualised with
antibody labelling and these signals are what synapsis
analyses.
There are three common types of labelling patterns that are considered in meiotic immunofluorescence data sets:
The synaptonemal complex - These linear and thread-like structures are specific to cells which are in meiotic prophase I. A good description of the mouse synaptonemal complex can be found here (Bisig et al. 2012).
Foci - A number of common proteins that are involved in meiotic double-strand break repair form foci when visualised with immunofluoresence. Some common examples of include the visualisation of the dynamic loading and unloading of RAD51, DMC1, MSH4, MLH1 (Cole et al. 2012). Foci can also be seen when visualising relatively fixed chromosomal structures such as centromeres and telomeres.
DNA - DNA is commonly visualised with a stain referred to as DAPI (4′,6-diamidino-2-phenylindole). As nearly all cells contain DNA, DAPI-stained cells are not necessarily in meiosis, but DAPI is still an informative cytological marker.
One of the more common uses of meiotic immunofluorescence data will
be a quantitative analysis of some aspect of double-strand break repair.
The challenge for manual quantification of foci relates to subjective
and undocumented choices: What criteria are required to consider the
data from one cell? E.g. how is meiotic sub-stage determined? How is a
good or bad quality image determined?
How is a foci defined? E.g. size, intensity, signal to background
ratio.
Synapsis also lays the foundation for more complex analyses. For
example synapsis
generates output that is compatible with
machine learning approaches. In future versions of synapsis
it will be possible to measure the relative positioning of multiple foci
on the same synaptonemal complex, which is another common task that is
performed manually by researchers.
A number of good open-source packages exist for the analysis of
immunofluoresence data from somatic cells such as the R packages EBImage
, colocr
,
ColocalizR
,
and the code-free tool CellProfiler
. We
recognise the value of these software and synapsis
makes
significant use of EBImage in particular. Objective and reproducible
analyses are essential in research.
To run this tutorial, you will need the following packages:
##
## Attaching package: 'rmarkdown'
## The following objects are masked from 'package:BiocStyle':
##
## html_document, md_document, pdf_document
synapsis
through bioconductorMake sure you have the BiocManager package installed.
Then type the following into the Rstudio console:
synapsis
from GitHubTo install synapsis
from GitHub, you will need:
You can install the development version of synapsis
directly from the github repository by pasting the following into the
Rstudio console:
You can type
to explore the first function we will use in the “help” window in Rstudio.
synapsis
functions on sample imageHere we will demonstrate the functionality of synapsis. Given an
image whose foci channel (MLH3) is shown in
MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-MLH3.tif
,
and synaptonemal complex channel (SYCP3) shown in
MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-SYCP3.tif
,
we can:
which are currently the three main features of
synapsis
.
Some notable features mentioned in the tutorial are:
auto_crop_fast(path, annotation = "on", max_cell_area = 30000, min_cell_area = 7000, file_ext = "tif")
## [1] "couldn't crop it since cell is on the edge. Neglected the following mask of a cell candidate:"
## [1] "I cropped this cell:"
## [1] "using this mask"
## [1] "whose cell number is"
## [1] 2
## out of 1 images, we got 1 viable cells
Here we call path, plus other optional parameters (that would otherwise take on default values). But only path is essential. This is because auto_crop_fast has built-in default values which are assumed when the user doesn’t specify.
A crops folder with three channels per “viable cell” should have been generated inside the folder where these images are kept i.e. in path.
We will redo this with different settings, so let’s delete the crops folder.
unlink(paste0(path,"/crops-RGB/"), recursive = TRUE)
unlink(paste0(path,"/crops/"), recursive = TRUE)
Now we will see one of the useful features of synapsis. It can separate cells by making use of EBImage’s watershed and distmap functions. If we set crowded_cells = TRUE, then
auto_crop_fast(path, annotation = "on", max_cell_area = 30000, min_cell_area = 7000, file_ext = "tif",crowded_cells = TRUE)
## [1] "from the file:"
## [1] "MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006"
## [1] "I cropped this cell:"
## [1] "using this mask"
## [1] "whose cell number is"
## [1] 1
## [1] "I cropped this cell:"
## [1] "using this mask"
## [1] "whose cell number is"
## [1] 2
## [1] "I cropped this cell:"
## [1] "using this mask"
## [1] "whose cell number is"
## [1] 3
## out of 1 images, we got 3 viable cells
Instead of 1 cell (before we said that crowded_cells = TRUE), 3 cells were identified.
SYCP3_stats <- get_pachytene(path,ecc_thresh = 0.8, area_thresh = 0.04, annotation = "on",file_ext = "tif")
## [1] "decided the following is pachytene"
## [1] "decided the following is pachytene"
## number of cells kept 2
SYCP3_stats is a data frame summarizing some features of the cells classified as pachytene.
The data frame is populated with information such as the image file name where the data comes from. This will be more useful when you have many images in a folder. Then there are columns of output from the cells that were identified. You will notice a column for genotype. The default settings assume that “++” or “–” are present in the file name. However these can be modified with the parameters “WT_str” and “KO_str”.
Now that we have the cropped images, and those which have been selected as our stage of interest (pachytene), we can count foci per cell.
Note the option crowded_foci = FALSE, because for our antibodies we only typically expect a couple dozen foci per cell. If you are dealing with an antibody that typically has in excess of ~100 foci per cell, e.g. DMC1, it’s recommended to use crowded_foci = TRUE. .
foci_counts <- count_foci(path,offset_factor = 8, brush_size = 3, offset_px = 0.6, brush_sigma = 3, annotation = "on", stage = "pachytene",file_ext = "tif", watershed_stop= "on", crowded_foci = FALSE, C1 = 0.03, disc_size_foci = 9)
##
## at file /tmp/RtmppQsAS9/Rinst15a640c8b97d/synapsis/extdata/crops/pachytene/MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-crop-2-SYCP3.tif
## cell counter is 1
## original images
##
## displaying resulting foci count plots. Overlay two channels:
##
## displaying resulting masks. Overlay two masks:
##
## coincident foci:
##
## two channels, only coincident foci
##
## which counts this many foci: 23
## number of alone foci 7
## at file /tmp/RtmppQsAS9/Rinst15a640c8b97d/synapsis/extdata/crops/pachytene/MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-crop-3-SYCP3.tif
## cell counter is 2
## original images
##
## displaying resulting foci count plots. Overlay two channels:
##
## displaying resulting masks. Overlay two masks:
##
## coincident foci:
##
## two channels, only coincident foci
##
## which counts this many foci: 28
## number of alone foci 9
Let’s look at the results:
## filename
## 1 /tmp/RtmppQsAS9/Rinst15a640c8b97d/synapsis/extdata/crops/pachytene/MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-crop-2-SYCP3.tif
## 2 /tmp/RtmppQsAS9/Rinst15a640c8b97d/synapsis/extdata/crops/pachytene/MLH3rabbit488_SYCP3mouse594_fancm_fvb_x_fancm_bl6_724++_slide01_006-crop-3-SYCP3.tif
## cell_no genotype stage foci_count sd_foci mean_foci
## 1 1 +/+ pachytene 23 3.58621242263058 6.96666666666667
## 2 2 +/+ pachytene 28 4.53763044882349 10.5135135135135
## median_foci mean_px median_px percent_on
## 1 8 0.0192701445994088 0.0117647058823529 0.880382775119617
## 2 11 0.0289901843781464 0.0196078431372549 0.794344473007712
## sd_px lone_foci comments verdict
## 1 0.0241255328238543 7 meets crisp-ness criteria keep
## 2 0.048167318981632 9 meets crisp-ness criteria keep
## C1
## 1 0.0214486388913312
## 2 0.0148827521147477
foci_counts is a data frame summarizing some features (including foci counts) of the cells classified as pachytene.
The “verdict” column recommends to keep or discard an image based on how it compares to the “crispness criteria” C1 (a measurement of how uniformly sized the foci identified are… empirically indicative of a “clear” image with concident foci and low background).
This is currently how we can identify (to ignore) cells with low signal / high background etc in the foci channel which passed through the automated pipeline.
Let’s clean up the crops folders we just generated..
In this tutorial, we have demonstrated the main functions of the synapsis package. These were:
The output is stored in a folder called “crops” with square images (from all channels) of the cell candidates. Note that a crops-RGB folder is also generated for you to look at the overlapping channels in colour, for your own troubleshooting purposes.
If you use this, a folder called “pachytene” inside the folder
“crops” is generated with the pachytene cell candidates. Make sure you
specify this for the counting function (3) with stage = pachytene, so
that synapsis
knows to look in this folder.
The output is a data frame, which includes a measure of the “crispness” of the foci channel image. Related to this is a recommendation to keep or discard the cell candidate.
The process outlined in (1)-(3) may be second nature to biologists who work with microscopy, and images that pass through this automated pipeline would be discarded by a real person much earlier. However, with an automated package, these decisions are recorded along with a quantification of real signal vs background / noise. The user can also change this criteria to be e.g. less strict and allow more noisy images (in the foci channel) to be kept.
Before trying to repeat the same process on your own images, let’s note some things. It is recommended to use high-resolution images (e.g. more than 4MB and 1000x1000 pixels) so that underlying smoothing operations do not erase small foci that you may aim to detect. This may require some testing on a small number of images to determine the appropriate balance between computational efficiency (if images are excessively large) and throughput. To run a function with a small number of images you can set the parameter test_amount within the function auto_crop_fast.
Next, make sure you are aware of your file extensions, and when you call functions on images that are not .jpeg, to specify e.g. file_ext = tif
Finally, there are many optional inputs to the functions. If the default settings do not generate the desired results, progressively try and alter different parameters by 10’s of percent at a time. The annotation = “on” option is designed for you to knit a notebook which steps you through all of the decisions being made, so that you can troubleshoot parameter values.
Comment on resolution and size
Many of the input parameters in synapsis are in pixel units. It’s good to get a feel of this before you start analysing them. Let’s
or size
The image is 750 x 750 pixels and 2.25mb. If we recall the SYCP3 channel image, keeping in mind that it is 750 pixels wide, a single cell looks like it takes up roughly 1/8th of the width, i.e. 125 pixels. Keeping this approximate width of a cell in pixels will come in handy when determining input parameters for your own images.