Title: | UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data |
---|---|
Description: | UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparsion of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay. |
Authors: | Mireia Ramos-Rodriguez [aut, cre] , Marc Subirana-Granes [aut], Lorenzo Pasquali [aut] |
Maintainer: | Mireia Ramos-Rodriguez <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.17.0 |
Built: | 2024-10-31 06:35:55 UTC |
Source: | https://github.com/bioc/UMI4Cats |
Get BiocFileCache object
.getCache()
.getCache()
Returns BFC object with the cache for the UMI4Cats package
Get summary of interesting bam statistics
.getSummaryBam(bam_file, mapped = TRUE, secondary = FALSE)
.getSummaryBam(bam_file, mapped = TRUE, secondary = FALSE)
bam_file |
Path for the bam file. |
mapped |
Logical indicating whether to extract mapped reads. |
secondary |
Logical indicating whether to extract secondary aligned reads. |
Returns a numeric containing the number of reads in bam_file
that has the specified mapped
and secondary
status.
Align split fastq file
.singleAlignmentUMI4C( split_file, align_dir, threads = 1, bowtie_index, pos_viewpoint, filter_bp = 1e+07 )
.singleAlignmentUMI4C( split_file, align_dir, threads = 1, bowtie_index, pos_viewpoint, filter_bp = 1e+07 )
split_file |
Split fastq file to align. |
align_dir |
Directory where to save aligned files. |
threads |
Number of threads to use in the analysis. Default=1. |
bowtie_index |
Path and prefix of the bowtie index to use for the alignment. |
pos_viewpoint |
GRanges object containing the genomic position of the viewpoint. |
filter_bp |
Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6 |
Creates a BAM file in wk_dir/align
named
"basename(fastq))_filtered.bam
", containing the aligned filtered
reads. A data.frame object with the statisitics is also returned.
Count UMIs for a given bam file.
.singleCounterUMI4C( filtered_bam_R1, filtered_bam_R2, digested_genome_gr, pos_viewpoint, res_enz, count_dir, filter_bp = 1e+07 )
.singleCounterUMI4C( filtered_bam_R1, filtered_bam_R2, digested_genome_gr, pos_viewpoint, res_enz, count_dir, filter_bp = 1e+07 )
filtered_bam_R1 |
R1 bam file. |
filtered_bam_R2 |
R2 bam file. |
digested_genome_gr |
GRanges object containing the coordinates for the digested genome. |
pos_viewpoint |
Vector consist of chromosome, start and end position of the viewpoint. |
res_enz |
Character containing the restriction enzyme sequence. |
count_dir |
Counter directory. |
filter_bp |
Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6 |
Creates a tab-delimited file in wk_dir/count
named
"basename(fastq) _counts.tsv
", containing the
coordinates for the viewpoint fragment, contact fragment and the number of
UMIs detected in the ligation.
Prepar fastq files at a given barcode.
.singlePrepUMI4C( fq_R1, fq_R2, bait_seq, bait_pad, res_enz, prep_dir, numb_reads = 1e+09 )
.singlePrepUMI4C( fq_R1, fq_R2, bait_seq, bait_pad, res_enz, prep_dir, numb_reads = 1e+09 )
fq_R1 |
Fastq file R1. |
fq_R2 |
Fastq file R2. |
bait_seq |
Character containing the bait primer sequence. |
bait_pad |
Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz |
Character containing the restriction enzyme sequence. |
prep_dir |
Prep directory. |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
Creates a compressed FASTQ file in wk_dir/prep
named
basename(Fastq)).fq.gz
, containing the filtered reads with the UMI
sequence in the header. A data.frame object with the statisitics is also
returned.
Split fastq files at a given restriction site.
.singleSplitUMI4C( fastq_file, res_enz, cut_pos, split_dir, min_flen = 20, numb_reads = 1e+09 )
.singleSplitUMI4C( fastq_file, res_enz, cut_pos, split_dir, min_flen = 20, numb_reads = 1e+09 )
fastq_file |
Fastq file path. |
res_enz |
Character containing the restriction enzyme sequence. |
cut_pos |
Numeric indicating the nucleotide position where restriction enzyme cuts (zero-based) (for example, for DpnII is 0). |
split_dir |
Directory where to save split files. |
min_flen |
Minimal fragment length to use for selecting the fragments. Default=20 |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
Creates a compressed FASTQ file in wk_dir/split
named
basename(fastq)).fq.gz
, containing the split reads based on the
restriction enzyme used.
Takes the variance stabilized count values and calculates a symmetric monotone fit for the distance dependency. The signal trend is fitted using the fda package.
.smoothMonotone(trafo_counts, alpha = 20, penalty = 0.1, frag_data)
.smoothMonotone(trafo_counts, alpha = 20, penalty = 0.1, frag_data)
trafo_counts |
Variance stabilized count values assay from DDS object. |
alpha |
Approximate number of fragments desired for every basis function
of the B-spline basis. |
penalty |
Amount of smoothing to be applied to the estimated functional parameter. Default: 0.1. |
frag_data |
Data frame with all the information on restriction fragments and the interval around the viewpoint. |
This function computes the smoothing function for the VST values, based on fda package, and calculates a symmetric monotone fit counts for the distance dependency
A dataframe with monotone smoothed fit counts.
This function can be used to add specific groupings to UMI4C
objects.
addGrouping( umi4c, grouping = "sampleID", scales = 5:150, normalized = TRUE, sd = 2 )
addGrouping( umi4c, grouping = "sampleID", scales = 5:150, normalized = TRUE, sd = 2 )
umi4c |
|
grouping |
Name of the column in colData used to merge the samples or replicates. Set to NULL for skipping grouping. Default: "condition". |
scales |
Numeric vector containing the scales for calculating the domainogram. |
normalized |
Logical indicating whether UMI-4C profiles should be
normalized to the |
sd |
Stantard deviation for adaptative trend. |
Adds a new UMI4C
object into the groupsUMI4C
slot with
samples grouped according to grouping
variable.
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition")
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition")
Given a GRanges dataset representing genes, will add an arbitrary value for them to be plotted in the Y axis without overlapping each other.
addStepping(genesDat, coordinates, mcol.name)
addStepping(genesDat, coordinates, mcol.name)
genesDat |
GRanges object containing gene information. |
coordinates |
GRanges object with coordinates you want to plot. |
mcol.name |
Integer containing the column number that contains the gene name. |
Calculates the stepping position to avoid overlap between genes.
Align split UMI-4C reads to a reference genome using Bowtie2.
alignmentUMI4C( wk_dir, pos_viewpoint, bowtie_index, threads = 1, filter_bp = 1e+07 )
alignmentUMI4C( wk_dir, pos_viewpoint, bowtie_index, threads = 1, filter_bp = 1e+07 )
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
pos_viewpoint |
GRanges object containing the genomic position of the
viewpoint. It can be generated by |
bowtie_index |
Path and prefix of the bowtie index to use for the alignment. |
threads |
Number of threads to use in the analysis. Default=1. |
filter_bp |
Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6 |
Creates a BAM file in wk_dir/align
named
"basename(fastq))_filtered.bam
", containing the
aligned filtered reads. The alignment log is also generated in
wk_dir/logs
named "umi4c_alignment_stats.txt
".
if (interactive()){ path <- downloadUMI4CexampleData(reduced = TRUE) alignmentUMI4C( wk_dir = file.path(path, "CIITA"), pos_viewpoint = GenomicRanges::GRanges("chr16:10972515-10972548"), bowtie_index = file.path(path, "ref_genome", "ucsc.hg19.chr16") ) }
if (interactive()){ path <- downloadUMI4CexampleData(reduced = TRUE) alignmentUMI4C( wk_dir = file.path(path, "CIITA"), pos_viewpoint = GenomicRanges::GRanges("chr16:10972515-10972548"), bowtie_index = file.path(path, "ref_genome", "ucsc.hg19.chr16") ) }
Will perform adaptative smoothing will scaling one profile to the reference UMI-4C profile.
calculateAdaptativeTrend(umi4c, sd = 2, normalized = TRUE)
calculateAdaptativeTrend(umi4c, sd = 2, normalized = TRUE)
umi4c |
|
sd |
Stantard deviation for adaptative trend. |
normalized |
Logical indicating whether UMI-4C profiles should be
normalized to the |
Calculates the adaptative trend considering the minimum number of molecules to use for merging different restriction fragments. It also calculates the geometric mean of the coordinates of the merged restriction fragments.
Using as input the raw UMIs, this function creates a domainogram for the supplied scales.
calculateDomainogram(umi4c, scales = 5:150, normalized = TRUE)
calculateDomainogram(umi4c, scales = 5:150, normalized = TRUE)
umi4c |
|
scales |
Integer vector indicating the number of scales to use for the domainogram creation. Default: 5:150. |
normalized |
Logical whether the the resulting domainograms should be normalized or not. Default: TRUE. |
A matrix where the first column represents the fragment end coordinates (start) and the rest represent the number of UMIs found when using a specific scale.
Test a set of query_regions
for significant interactions with the
viewpoint.
callInteractions( umi4c, design = ~condition, query_regions, padj_method = "fdr", zscore_threshold = 2, padj_threshold = 0.1, alpha = 20, penalty = 0.1 )
callInteractions( umi4c, design = ~condition, query_regions, padj_method = "fdr", zscore_threshold = 2, padj_threshold = 0.1, alpha = 20, penalty = 0.1 )
umi4c |
UMI4C object as generated by |
design |
A |
query_regions |
|
padj_method |
The method to use for adjusting p-values, see
|
zscore_threshold |
Numeric indicating the z-score threshold to use to define significant differential contacts. Default: 2. |
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. Default: 0.1. |
alpha |
Approximate number of fragments desired for every basis function
of the B-spline basis. |
penalty |
Amount of smoothing to be applied to the estimated functional parameter. Default: 0.1. |
GRangesList
where each element is a
UMI4C sample with the queried regions and their adjusted p-values and Z-scores.
data("ex_ciita_umi4c") umi <- ex_ciita_umi4c win_frags <- makeWindowFragments(umi, n_frags=8, sliding=1) gr <- callInteractions(umi, ~condition, win_frags, padj_threshold = 0.01, zscore_threshold=2) inter <- getSignInteractions(gr)
data("ex_ciita_umi4c") umi <- ex_ciita_umi4c win_frags <- makeWindowFragments(umi, n_frags=8, sliding=1) gr <- callInteractions(umi, ~condition, win_frags, padj_threshold = 0.01, zscore_threshold=2) inter <- getSignInteractions(gr)
Combine the UMI4C fragments that overlap a given set of query_regions
.
combineUMI4C(umi4c, query_regions)
combineUMI4C(umi4c, query_regions)
umi4c |
UMI4C object as generated by |
query_regions |
|
UMI4C
object with rowRanges corresponding to query_regions and
assay containing the sum of raw UMI counts at each specified query_region
.
data("ex_ciita_umi4c") wins <- makeWindowFragments(ex_ciita_umi4c) umi_comb <- combineUMI4C(ex_ciita_umi4c, wins)
data("ex_ciita_umi4c") wins <- makeWindowFragments(ex_ciita_umi4c) umi_comb <- combineUMI4C(ex_ciita_umi4c, wins)
Using demultiplexed FastQ files as input, performs all necessary steps to end up with a tsv file summarizing the restriction enzyme fragments and the number of UMIs supporting that specific contact with the viewpoint (bait) of interest.
contactsUMI4C( fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, cut_pos, digested_genome, bowtie_index, threads = 1, numb_reads = 1e+09, rm_tmp = TRUE, min_flen = 20, filter_bp = 1e+07, ref_gen, sel_seqname = NULL )
contactsUMI4C( fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, cut_pos, digested_genome, bowtie_index, threads = 1, numb_reads = 1e+09, rm_tmp = TRUE, min_flen = 20, filter_bp = 1e+07, ref_gen, sel_seqname = NULL )
fastq_dir |
Path of the directory containing the FastQ files (compressed or uncompressed). |
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
file_pattern |
Character that can be used to filter the files you want
to analyze in the |
bait_seq |
Character containing the bait primer sequence. |
bait_pad |
Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz |
Character containing the restriction enzyme sequence. |
cut_pos |
Numeric indicating the nucleotide position where restriction enzyme cuts (zero-based) (for example, for DpnII is 0). |
digested_genome |
Path for the digested genome file generated using the
|
bowtie_index |
Path and prefix of the bowtie index to use for the alignment. |
threads |
Number of threads to use in the analysis. Default=1. |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
rm_tmp |
Logical indicating whether to remove temporary files (sam and intermediate bams). TRUE or FALSE. Default=TRUE. |
min_flen |
Minimal fragment length to use for selecting the fragments. Default=20 |
filter_bp |
Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6 |
ref_gen |
A BSgenome object of the reference genome. |
sel_seqname |
A character with the chromosome name to focus the search for the viewpoint sequence. |
This function is a combination of calls to other functions that perform the necessary steps for processing UMI-4C data.
if (interactive()) { path <- downloadUMI4CexampleData() hg19_dpnii <- digestGenome( cut_pos = 0, res_enz = "GATC", name_RE = "DpnII", ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, out_path = file.path(path, "digested_genome") ) raw_dir <- file.path(path, "CIITA", "fastq") contactsUMI4C( fastq_dir = raw_dir, wk_dir = file.path(path, "CIITA"), bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC", cut_pos = 0, digested_genome = hg19_dpnii, bowtie_index = file.path(path, "ref_genome", "ucsc.hg19.chr16"), threads = 1, numb_reads = 1e9, ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, sel_seqname = "chr16" ) unlink(path, recursive=TRUE) }
if (interactive()) { path <- downloadUMI4CexampleData() hg19_dpnii <- digestGenome( cut_pos = 0, res_enz = "GATC", name_RE = "DpnII", ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, out_path = file.path(path, "digested_genome") ) raw_dir <- file.path(path, "CIITA", "fastq") contactsUMI4C( fastq_dir = raw_dir, wk_dir = file.path(path, "CIITA"), bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC", cut_pos = 0, digested_genome = hg19_dpnii, bowtie_index = file.path(path, "ref_genome", "ucsc.hg19.chr16"), threads = 1, numb_reads = 1e9, ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, sel_seqname = "chr16" ) unlink(path, recursive=TRUE) }
Algorithm for counting and collapsing the number of UMIs supporting a specific ligation.
counterUMI4C( wk_dir, pos_viewpoint, res_enz, digested_genome, filter_bp = 1e+07 )
counterUMI4C( wk_dir, pos_viewpoint, res_enz, digested_genome, filter_bp = 1e+07 )
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
pos_viewpoint |
GRanges object containing the genomic position of the viewpoint. |
res_enz |
Character containing the restriction enzyme sequence. |
digested_genome |
Path for the digested genome file generated using the
|
filter_bp |
Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6. |
For collapsing different molecules into the same UMI, takes into account the ligation position and the number of UMI sequence mismatches.
Creates a compressed tab-delimited file in wk_dir/count
named
"basename(fastq) _counts.tsv.gz
", containing the
coordinates for the viewpoint fragment, contact fragment and the number of
UMIs detected in the ligation.
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) hg19_dpnii <- digestGenome( cut_pos = 0, res_enz = "GATC", name_RE = "DpnII", sel_chr = "chr16", # digest only chr16 to make example faster ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, out_path = file.path(path, "digested_genome") ) viewpoint <- GenomicRanges::GRanges("chr16:10972515-10972548") counterUMI4C( wk_dir = file.path(path, "CIITA"), pos_viewpoint = viewpoint, res_enz = "GATC", digested_genome = hg19_dpnii ) }
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) hg19_dpnii <- digestGenome( cut_pos = 0, res_enz = "GATC", name_RE = "DpnII", sel_chr = "chr16", # digest only chr16 to make example faster ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, out_path = file.path(path, "digested_genome") ) viewpoint <- GenomicRanges::GRanges("chr16:10972515-10972548") counterUMI4C( wk_dir = file.path(path, "CIITA"), pos_viewpoint = viewpoint, res_enz = "GATC", digested_genome = hg19_dpnii ) }
Create gene annotation object
createGeneAnnotation( window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE )
createGeneAnnotation( window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE )
window |
GRanges object with coordinates to use for selecting the genes to plot. |
TxDb |
TxDb object to use for drawing the genomic annotation. |
longest |
Logical indicating whether to plot only the longest transcripts for each gene in the gene annotation plot. |
GRanges object with the gene annotation in the window.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) window <- GRanges("chr16:11298262-11400036") gene_anno <- createGeneAnnotation( window = window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene )
library(TxDb.Hsapiens.UCSC.hg19.knownGene) window <- GRanges("chr16:11298262-11400036") gene_anno <- createGeneAnnotation( window = window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene )
Create a statistical summary of the UMI-4C experiments analyzed with
contactsUMI4C
.
createStatsTable(wk_dir)
createStatsTable(wk_dir)
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
Returns a data.frame summarizing all the different statistics
for each sample analyzed in wk_dir
.
Darken colors
darken(color, factor = 1.4)
darken(color, factor = 1.4)
color |
Character containing the name or hex value of a color. |
factor |
Numeric representing a factor by which darken the specified color. |
Darkens the provided color by the provided factor.
darken("blue", factor = 1.4)
darken("blue", factor = 1.4)
Transforms an DDS object to a UMI4C object after applying
nbinomWaldTestUMI4C
.
dds2UMI4C( umi4c, dds, normalized = TRUE, padj_method = "fdr", padj_threshold = 0.05 )
dds2UMI4C( umi4c, dds, normalized = TRUE, padj_method = "fdr", padj_threshold = 0.05 )
umi4c |
UMI4C object as generated by |
dds |
DDS object as generated by |
normalized |
Logical indicating if the function should return normalized or raw UMI counts. Default: TRUE. |
padj_method |
The method to use for adjusting p-values, see
|
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. Default: 0.05. |
UMI4C object with the DESeq2 Wald Test results.
Demultiplex FASTQ files containng different bait information
demultiplexFastq(barcodes, fastq, out_path = "raw_fastq", numb_reads = 1e+11)
demultiplexFastq(barcodes, fastq, out_path = "raw_fastq", numb_reads = 1e+11)
barcodes |
Dataframe with "name of sample" and "barcode" for every sample to demultiplex. |
fastq |
Fastq to demultiplex containing mate 1s. Different pairs should be named as "_R1" or "_R2". Allowed formats: _R1.fastq.gz, _R1.fq.gz, _R1.fastq or _R1.fq. |
out_path |
Path where to save the demultiplex output. Defaults to a path
named |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=10e10. |
Paired-end FastQ files demultiplexed in a compressed format. A log file with the statistics
is also generated in out_path
named barcode
_umi4cats_demultiplexFastq_stats.txt.
## Not run: path <- downloadUMI4CexampleData(use_sample = TRUE) fastq <- file.path(path, "CIITA", "fastq", "sub_ctrl_hi19_CIITA_R1.fastq.gz") barcodes <- data.frame( sample = c("CIITA"), barcode = c("GGACAAGCTCCCTGCAACTCA") ) demultiplexFastq( barcodes = barcodes, fastq = fastq, out_path = path ) ## End(Not run)
## Not run: path <- downloadUMI4CexampleData(use_sample = TRUE) fastq <- file.path(path, "CIITA", "fastq", "sub_ctrl_hi19_CIITA_R1.fastq.gz") barcodes <- data.frame( sample = c("CIITA"), barcode = c("GGACAAGCTCCCTGCAACTCA") ) demultiplexFastq( barcodes = barcodes, fastq = fastq, out_path = path ) ## End(Not run)
This page contains a summary of the different methods used to
access the information contained inside the UMI4C-class
object. See
the details section for more information on the different accessors.
dgram(object) dgram(object) <- value groupsUMI4C(object, value) groupsUMI4C(object) <- value bait(object) trend(object) resultsUMI4C(object, format = "GRanges", counts = TRUE, ordered = FALSE) ## S4 method for signature 'UMI4C' dgram(object) ## S4 replacement method for signature 'UMI4C' dgram(object) <- value ## S4 method for signature 'UMI4C' groupsUMI4C(object) ## S4 replacement method for signature 'UMI4C' groupsUMI4C(object) <- value ## S4 method for signature 'UMI4C' bait(object) ## S4 method for signature 'UMI4C' trend(object) ## S4 method for signature 'UMI4C' resultsUMI4C(object, format = "GRanges", counts = FALSE, ordered = FALSE)
dgram(object) dgram(object) <- value groupsUMI4C(object, value) groupsUMI4C(object) <- value bait(object) trend(object) resultsUMI4C(object, format = "GRanges", counts = TRUE, ordered = FALSE) ## S4 method for signature 'UMI4C' dgram(object) ## S4 replacement method for signature 'UMI4C' dgram(object) <- value ## S4 method for signature 'UMI4C' groupsUMI4C(object) ## S4 replacement method for signature 'UMI4C' groupsUMI4C(object) <- value ## S4 method for signature 'UMI4C' bait(object) ## S4 method for signature 'UMI4C' trend(object) ## S4 method for signature 'UMI4C' resultsUMI4C(object, format = "GRanges", counts = FALSE, ordered = FALSE)
object |
a |
value |
Alternative list of dgrams to replace the current slot. |
format |
Either "GRanges" (default) or "data.frame", indicating the format output of the results. |
counts |
Logical indicating whether counts for the different region should be provided. Default: FALSE. |
ordered |
Logical indicating whether to sort output by significance (adjusted p-value). Default: FALSE. |
There are several accessors to easily retrive information from a
UMI4C-class
object:
dgram
: Returns a named list with the output domainograms for
each sample.
bait
: Returns a GRanges object with the position
of the bait.
trend
: Returns a data.frame in long format with the values of
the adapative smoothen trend.
resultsUMI4C
: Returns a GRanges or data.frame with
the results of the differential analysis.
UMI4C, UMI4C-class
# Access the different information inside the UMI4C object data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") dgram(ex_ciita_umi4c) bait(ex_ciita_umi4c) head(trend(ex_ciita_umi4c)) # Perform differential test enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) resultsUMI4C(umi_dif)
# Access the different information inside the UMI4C object data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") dgram(ex_ciita_umi4c) bait(ex_ciita_umi4c) head(trend(ex_ciita_umi4c)) # Perform differential test enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) resultsUMI4C(umi_dif)
Using a UMI4C
object, infers the differences between conditions
specified in design
of the smooth monotone fitted values using a
Wald Test from DESeq2
package.
differentialNbinomWaldTestUMI4C( umi4c, design = ~condition, normalized = TRUE, padj_method = "fdr", query_regions = NULL, padj_threshold = 0.05, penalty = 0.1, alpha = 20 )
differentialNbinomWaldTestUMI4C( umi4c, design = ~condition, normalized = TRUE, padj_method = "fdr", query_regions = NULL, padj_threshold = 0.05, penalty = 0.1, alpha = 20 )
umi4c |
UMI4C object as generated by |
design |
A |
normalized |
Logical indicating if the function should return normalized or raw UMI counts. Default: TRUE. |
padj_method |
The method to use for adjusting p-values, see
|
query_regions |
|
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. Default: 0.05. |
penalty |
Amount of smoothing to be applied to the estimated functional parameter. Default: 0.1. |
alpha |
Approximate number of fragments desired for every basis function
of the B-spline basis. |
This function fits the signal trend of a variance stabilized count values using a symmetric monotone fit for the distance dependency. Then scales the raw counts across the samples to obtain normalized factors. Finally, it detects differences between conditions applying the DESeq2 Wald Test.
UMI4C
object with the DESeq2 Wald Test results.
## Not run: files <- list.files(system.file("extdata", "CIITA", "count", package="UMI4Cats"), pattern = "*_counts.tsv.gz", full.names = TRUE ) # Create colData including all relevant information colData <- data.frame( sampleID = gsub("_counts.tsv.gz", "", basename(files)), file = files, stringsAsFactors = FALSE ) library(tidyr) colData <- colData %>% separate(sampleID, into = c("condition", "replicate", "viewpoint"), remove = FALSE ) # Make UMI-4C object including grouping by condition umi <- makeUMI4C( colData = colData, viewpoint_name = "CIITA", grouping = NULL, bait_expansion = 2e6 ) umi_wald <- differentialNbinomWaldTestUMI4C(umi4c=umi, design=~condition, alpha = 100) ## End(Not run)
## Not run: files <- list.files(system.file("extdata", "CIITA", "count", package="UMI4Cats"), pattern = "*_counts.tsv.gz", full.names = TRUE ) # Create colData including all relevant information colData <- data.frame( sampleID = gsub("_counts.tsv.gz", "", basename(files)), file = files, stringsAsFactors = FALSE ) library(tidyr) colData <- colData %>% separate(sampleID, into = c("condition", "replicate", "viewpoint"), remove = FALSE ) # Make UMI-4C object including grouping by condition umi <- makeUMI4C( colData = colData, viewpoint_name = "CIITA", grouping = NULL, bait_expansion = 2e6 ) umi_wald <- differentialNbinomWaldTestUMI4C(umi4c=umi, design=~condition, alpha = 100) ## End(Not run)
Performs an in silico digestion of a given reference genome using a given restriction enzyme sequence.
digestGenome( res_enz, cut_pos, name_RE, ref_gen, sel_chr = paste0("chr", c(seq_len(22), "X", "Y")), out_path = "digested_genome/" )
digestGenome( res_enz, cut_pos, name_RE, ref_gen, sel_chr = paste0("chr", c(seq_len(22), "X", "Y")), out_path = "digested_genome/" )
res_enz |
Character containing the restriction enzyme sequence. |
cut_pos |
Numeric indicating the nucleotide position where restriction enzyme cuts (zero-based) (for example, for DpnII is 0). |
name_RE |
Restriction enzyme name. |
ref_gen |
A BSgenome object of the reference genome. |
sel_chr |
Character vector indicating which chromosomes to select for the digestion. Default: chr1-22, chrX, chrY. |
out_path |
Output path where to save the genomic track. The default is a
directory named |
Creates a rda file for every chromosome defined in sel_chr
.
library(BSgenome.Hsapiens.UCSC.hg19) ref_gen <- BSgenome.Hsapiens.UCSC.hg19 hg19_dpnii <- digestGenome( res_enz = "GATC", cut_pos = 0, name_RE = "dpnII", sel_chr = "chr16", # Only in chr16 to reduce example running time ref_gen = ref_gen, out_path = file.path(tempdir(), "digested_genome/") )
library(BSgenome.Hsapiens.UCSC.hg19) ref_gen <- BSgenome.Hsapiens.UCSC.hg19 hg19_dpnii <- digestGenome( res_enz = "GATC", cut_pos = 0, name_RE = "dpnII", sel_chr = "chr16", # Only in chr16 to reduce example running time ref_gen = ref_gen, out_path = file.path(tempdir(), "digested_genome/") )
Downloads the required UMI4Cats example datasets.
downloadUMI4CexampleData(out_dir = tempdir(), verbose = TRUE, reduced = FALSE)
downloadUMI4CexampleData(out_dir = tempdir(), verbose = TRUE, reduced = FALSE)
out_dir |
Output directory for the datasets, defaults to tempdir(). |
verbose |
Whether to print verbose messages or not. Default: TRUE. |
reduced |
Whether to use a reduced dataset to make test functions run faster. |
It creates the output_dir
with the example UMI-4C files used
by the vignette and examples. Takes advantage of the BiocFileCache package to
make sure that the file has not been previously downloaded by the user.
if (interactive()) { # Using reduced data data to make example faster. # Remove reduced=TRUE or set to FALSE to # download the full dataset. path <- downloadUMI4CexampleData(reduced = TRUE) }
if (interactive()) { # Using reduced data data to make example faster. # Remove reduced=TRUE or set to FALSE to # download the full dataset. path <- downloadUMI4CexampleData(reduced = TRUE) }
An example UMI4C object showing the contacts with a viewpoint located at the CIITA gene promoter.
ex_ciita_umi4c
ex_ciita_umi4c
A UMI4C object from this package.
See inst/script/CIITA_process_example.R to see the code use for generating the UMI4C object.
Using the UMIs inside query_regions
performs Fisher's Exact test to
calculate significant differences between contact intensities.
fisherUMI4C( umi4c, grouping = "condition", query_regions, resize = NULL, window_size = 5000, filter_low = 50, padj_method = "fdr", padj_threshold = 0.05 )
fisherUMI4C( umi4c, grouping = "condition", query_regions, resize = NULL, window_size = 5000, filter_low = 50, padj_method = "fdr", padj_threshold = 0.05 )
umi4c |
UMI4C object as generated by |
grouping |
Name of the column in colData used to merge the samples or
replicates. If none available or want to add new groupings, run
|
query_regions |
|
resize |
Width in base pairs for resizing the |
window_size |
If |
filter_low |
Either the minimum median UMIs requiered to perform
Fisher's Exact test or |
padj_method |
Method for adjusting p-values. See
|
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. |
This function calculates the
overlap of fragment ends with either the provided query_regions
or
the binned region using window_size
. The resulting number of UMIs in
each query_region
will be the sum of UMIs in all overlapping
fragment ends. As a default, will filter out those regions whose median
UMIs are lower than filter_low
.
Finally, a contingency table for each query_reegions
or window
that passed the filter_low
filter is created as follows:
query_region | region | |
Reference | n1 | N1-n1 |
Condition | n2 | N2-n2 |
and the Fisher's Exact test is performed. Obtained p-values are then
converted to adjusted p-values using padj_method
and the results list
is added to the UMI4C
object.
Calculates statistical differences between UMI-4C experiments.
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") # Perform differential test enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) resultsUMI4C(umi_dif)
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") # Perform differential test enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) resultsUMI4C(umi_dif)
Format plots for UMI4C
formatPlotsUMI4C(plot_list, font_size)
formatPlotsUMI4C(plot_list, font_size)
plot_list |
List of plots generated by |
font_size |
Base font size to use for the UMI4C plot. Default: 14. |
Given a named plot_list and considering the number and type of included plots, formats their axes accordingly to show the final UMI4C plot.
Get geometric mean of given coordinates
geoMeanCoordinates(coords, scale, bait_start)
geoMeanCoordinates(coords, scale, bait_start)
coords |
Vector of integers representing the coordinates from which to obtain the geometric mean. |
scale |
Vector of scales indicating how many fragment where merged. |
bait_start |
Integer indicating the coordinates for the bait start. |
Calculates geometric mean of the provided coordinates, taking into account the distance to the viewpoint and how many restriction fragments are being merged.
Get default colors
getColors(factors)
getColors(factors)
factors |
Name of the factors that will be used for grouping variables. |
Depending on the number of factors it creates different color palettes.
Get factors fro plotting
getFactors(umi4c, grouping = NULL)
getFactors(umi4c, grouping = NULL)
umi4c |
UMI4C object |
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
Factor vector where the first element is the reference factor.
Will return a normalization matrix.
getNormalizationMatrix( umi4c, norm_bins = 10^(3:6), post_smooth_win = 50, r_expand = 1.2 )
getNormalizationMatrix( umi4c, norm_bins = 10^(3:6), post_smooth_win = 50, r_expand = 1.2 )
umi4c |
|
norm_bins |
Numeric vector with the genomic bins to use for normalization. Default: 1K, 10K, 100K, 1Mb. |
post_smooth_win |
Numeric indicating the smoothing window to use. Default: 50. |
r_expand |
Numeric indicanting the expansion value for normalization. Default: 1.2. |
Creates a matrix of normalization factors using as a reference the profile specified in the UMI4C object.
Retrieves all significant interactions from the output of
callInteractions
.
getSignInteractions(gr_interactions)
getSignInteractions(gr_interactions)
gr_interactions |
GRangesList outputed by |
GRanges
object with a list of significantly interacting regions.
data("ex_ciita_umi4c") umi <- ex_ciita_umi4c win_frags <- makeWindowFragments(umi, n_frags=8, sliding=1) gr <- callInteractions(umi, ~condition, win_frags, padj_threshold = 0.01, zscore_threshold=2) inter <- getSignInteractions(gr)
data("ex_ciita_umi4c") umi <- ex_ciita_umi4c win_frags <- makeWindowFragments(umi, n_frags=8, sliding=1) gr <- callInteractions(umi, ~condition, win_frags, padj_threshold = 0.01, zscore_threshold=2) inter <- getSignInteractions(gr)
Finds the viewpoint coordinates for a given reference genome and sequence.
getViewpointCoordinates( bait_seq, bait_pad, res_enz, ref_gen, sel_seqname = NULL )
getViewpointCoordinates( bait_seq, bait_pad, res_enz, ref_gen, sel_seqname = NULL )
bait_seq |
Character containing the bait primer sequence. |
bait_pad |
Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz |
Character containing the restriction enzyme sequence. |
ref_gen |
A BSgenome object of the reference genome. |
sel_seqname |
A character with the chromosome name to focus the search for the viewpoint sequence. |
Creates a GRanges object containing the genomic position of the viewpoint.
getViewpointCoordinates( bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC", ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, sel_seqname = "chr16" # Look only in chr16 )
getViewpointCoordinates( bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC", ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, sel_seqname = "chr16" # Look only in chr16 )
Combines UMI4C samples by adding UMIs from assay(umi4c)
to represent
the levels in grouping
.
groupSamplesUMI4C(umi4c, grouping = "condition")
groupSamplesUMI4C(umi4c, grouping = "condition")
umi4c |
|
grouping |
Name of the column in colData used to merge the samples or replicates. Set to NULL for skipping grouping. Default: "condition". |
A grouped UMI4C
object.
Use a set of continuous restriction fragments to generate windows containing a fixed number of fragments (n_frags).
makeWindowFragments(input, n_frags = 8, sliding = 1)
makeWindowFragments(input, n_frags = 8, sliding = 1)
input |
Input object containing the restriction fragments. Should be class UMI4C (rowRanges will be extracted) or class GRanges. |
n_frags |
Number of fragments to use for generating the windows. This should include restriction fragments with 0 counts (Default: 8). |
sliding |
Numeric indicating the factor for generating sliding windows. If set to 1 (default) will use fixed windows. If set to > 0 and < 1 will use n_frags * sliding fragments to generate sliding windows. |
A GRanges object containing the windows of merged restriction fragments.
data("ex_ciita_umi4c") # Without sliding windows win_frags <- makeWindowFragments(ex_ciita_umi4c, n_frags=30, sliding=1) win_frags # With sliding windows (n_frags*sliding) win_frags <- makeWindowFragments(ex_ciita_umi4c, n_frags=30, sliding=0.5) win_frags
data("ex_ciita_umi4c") # Without sliding windows win_frags <- makeWindowFragments(ex_ciita_umi4c, n_frags=30, sliding=1) win_frags # With sliding windows (n_frags*sliding) win_frags <- makeWindowFragments(ex_ciita_umi4c, n_frags=30, sliding=0.5) win_frags
Takes the smooth monotone fit count values and infers the differential UMI4C
contacts using DESeq2 Wald Test from
DESeq2
package.
nbinomWaldTestUMI4C(dds, query_regions = NULL)
nbinomWaldTestUMI4C(dds, query_regions = NULL)
dds |
DDS object as generated by |
query_regions |
|
This function back-transform fitted values to the
scale of raw counts and scale them across the samples to
obtain normalized factors using normalizationFactors
from
DESeq2
package. To detect differences between conditions, the DESeq2
DDS object with the DESeq2 Wald Test results,
with results columns accessible with the results
function.
Plot differential contacts
plotDifferential(umi4c, grouping = NULL, colors = NULL, xlim = NULL)
plotDifferential(umi4c, grouping = NULL, colors = NULL, xlim = NULL)
umi4c |
|
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
colors |
Named vector of colors to use for plotting for each group. |
xlim |
Limits for the plot x axis (genomic coordinates). |
Produces a plot of the fold changes at the differential regions analyzed ghat are contained in the UMI4C object.
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) plotDifferential(umi_dif)
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") enh <- GRanges(c("chr16:10925006-10928900", "chr16:11102721-11103700")) umi_dif <- fisherUMI4C(ex_ciita_umi4c, query_regions = enh, filter_low = 20, resize = 5e3) plotDifferential(umi_dif)
Plot domainogram
plotDomainogram( umi4c, dgram_function = "quotient", grouping = NULL, colors = NULL, xlim = NULL )
plotDomainogram( umi4c, dgram_function = "quotient", grouping = NULL, colors = NULL, xlim = NULL )
umi4c |
|
dgram_function |
Function used for calculating the fold-change in the domainogram plot, either "difference" or "quotient". Default: "quotient". |
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
colors |
Named vector of colors to use for plotting for each group. |
xlim |
Limits for the plot x axis (genomic coordinates). |
Produces the domainogram plot, summarizing the merged number of UMIs at the different scales analyzed (y axis).
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") plotDomainogram(ex_ciita_umi4c, grouping = "condition")
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") plotDomainogram(ex_ciita_umi4c, grouping = "condition")
Plot genes in a window of interest.
plotGenes( window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, xlim = NULL, font_size = 14 )
plotGenes( window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, xlim = NULL, font_size = 14 )
window |
GRanges object with coordinates to use for selecting the genes to plot. |
TxDb |
TxDb object to use for drawing the genomic annotation. |
longest |
Logical indicating whether to plot only the longest transcripts for each gene in the gene annotation plot. |
xlim |
Limits for the plot x axis (genomic coordinates). |
font_size |
Base font size to use for the UMI4C plot. Default: 14. |
Produces a plot with the genes found in the provided window
.
window <- GRanges("chr16:11348649-11349648") plotGenes( window = window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene )
window <- GRanges("chr16:11348649-11349648") plotGenes( window = window, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene )
Plot the results of callInteractions
.
plotInteractions(gr_interactions, xlim = NULL, significant = TRUE)
plotInteractions(gr_interactions, xlim = NULL, significant = TRUE)
gr_interactions |
GRangesList outputed by |
xlim |
Limits for the plot x axis (genomic coordinates). |
significant |
Logical indicating whether to plot only significant interactions (default: TRUE). |
Produces a ggplot2 plot showing the queried interactions at each analysed sample.
Plot the results of callInteractions
together with the gene
annotation and the trend.
plotInteractionsUMI4C( umi4c, gr_interactions, grouping = "condition", significant = TRUE, colors = NULL, xlim = NULL, ylim = NULL, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, rel_heights = c(0.25, 0.5, 0.25), font_size = 14 )
plotInteractionsUMI4C( umi4c, gr_interactions, grouping = "condition", significant = TRUE, colors = NULL, xlim = NULL, ylim = NULL, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, rel_heights = c(0.25, 0.5, 0.25), font_size = 14 )
umi4c |
|
gr_interactions |
GRangesList outputed by |
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
significant |
Logical indicating whether to plot only significant interactions (default: TRUE). |
colors |
Named vector of colors to use for plotting for each group. |
xlim |
Limits for the plot x axis (genomic coordinates). |
ylim |
Limits of the trend y axis. |
TxDb |
TxDb object to use for drawing the genomic annotation. |
longest |
Logical indicating whether to plot only the longest transcripts for each gene in the gene annotation plot. |
rel_heights |
Numeric vector of length 3 or 4 (if differential plot) indicating the relative heights of each part of the UMI4C plot. |
font_size |
Base font size to use for the UMI4C plot. Default: 14. |
Combined plot with gene annotation, trend and interaction plot.
Plot adaptative smoothen trend
plotTrend(umi4c, grouping = NULL, colors = NULL, xlim = NULL, ylim = NULL)
plotTrend(umi4c, grouping = NULL, colors = NULL, xlim = NULL, ylim = NULL)
umi4c |
|
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
colors |
Named vector of colors to use for plotting for each group. |
xlim |
Limits for the plot x axis (genomic coordinates). |
ylim |
Limits of the trend y axis. |
Produces the adaptative trend plot, showing average UMIs at each position taking into account the minimum number of molecules used to merge restriction fragments.
data("ex_ciita_umi4c") plotTrend(ex_ciita_umi4c)
data("ex_ciita_umi4c") plotTrend(ex_ciita_umi4c)
Produce a UMI-4C data plot containing the genes in the region, the adaptative smoothen trend and the domainogram.
plotUMI4C( umi4c, grouping = "condition", dgram_function = "quotient", dgram_plot = TRUE, colors = NULL, xlim = NULL, ylim = NULL, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, rel_heights = c(0.25, 0.4, 0.12, 0.23), font_size = 14 )
plotUMI4C( umi4c, grouping = "condition", dgram_function = "quotient", dgram_plot = TRUE, colors = NULL, xlim = NULL, ylim = NULL, TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene, longest = TRUE, rel_heights = c(0.25, 0.4, 0.12, 0.23), font_size = 14 )
umi4c |
|
grouping |
Grouping used for the different samples. If none available or
want to add new groupings, run |
dgram_function |
Function used for calculating the fold-change in the domainogram plot, either "difference" or "quotient". Default: "quotient". |
dgram_plot |
Logical indicating whether to plot the domainogram. If the UMI4C object only contains one sample will be automatically set to FALSE. Default: TRUE. |
colors |
Named vector of colors to use for plotting for each group. |
xlim |
Limits for the plot x axis (genomic coordinates). |
ylim |
Limits of the trend y axis. |
TxDb |
TxDb object to use for drawing the genomic annotation. |
longest |
Logical indicating whether to plot only the longest transcripts for each gene in the gene annotation plot. |
rel_heights |
Numeric vector of length 3 or 4 (if differential plot) indicating the relative heights of each part of the UMI4C plot. |
font_size |
Base font size to use for the UMI4C plot. Default: 14. |
Produces a summary plot with all the information contained in the UMI4C opject.
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") plotUMI4C(ex_ciita_umi4c, dgram_plot = FALSE )
data("ex_ciita_umi4c") ex_ciita_umi4c <- addGrouping(ex_ciita_umi4c, grouping="condition") plotUMI4C(ex_ciita_umi4c, dgram_plot = FALSE )
Prepare the FastQ files for the further analysis by selecting reads with bait and adding the respective UMI identifier for each read in its header.
prepUMI4C( fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, numb_reads = 1e+09 )
prepUMI4C( fastq_dir, wk_dir, file_pattern = NULL, bait_seq, bait_pad, res_enz, numb_reads = 1e+09 )
fastq_dir |
Path of the directory containing the FastQ files (compressed or uncompressed). |
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
file_pattern |
Character that can be used to filter the files you want
to analyze in the |
bait_seq |
Character containing the bait primer sequence. |
bait_pad |
Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence). |
res_enz |
Character containing the restriction enzyme sequence. |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
Creates a compressed FASTQ file in wk_dir/prep
named
basename(fastq)).fq.gz
, containing the filtered reads with the UMI
sequence in the header. A log file with the statistics is also generated
in wk_dir/logs
named umi4c_stats.txt
.
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) raw_dir <- file.path(path, "CIITA", "fastq") prepUMI4C( fastq_dir = raw_dir, wk_dir = file.path(path, "CIITA"), bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC" ) }
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) raw_dir <- file.path(path, "CIITA", "fastq") prepUMI4C( fastq_dir = raw_dir, wk_dir = file.path(path, "CIITA"), bait_seq = "GGACAAGCTCCCTGCAACTCA", bait_pad = "GGACTTGCA", res_enz = "GATC" ) }
Takes the variance stabilized count values and calculates a symmetric
monotone fit for the distance dependency. The signal trend is fitted using
the fda package. The position
information about the viewpoint have to be stored in the metadata as
metadata(dds)[['bait']]
.
smoothMonotoneUMI4C(dds, alpha = 20, penalty = 0.1)
smoothMonotoneUMI4C(dds, alpha = 20, penalty = 0.1)
dds |
DDS object as generated by |
alpha |
Approximate number of fragments desired for every basis function
of the B-spline basis. |
penalty |
Amount of smoothing to be applied to the estimated functional parameter. Default: 0.1. |
This function computes the smoothing function for the VST values, based on fda package, and calculates a symmetric monotone fit counts for the distance dependency
DDS object with monotone smoothed fit counts.
Split the prepared reads using the restrition enzyme information.
splitUMI4C(wk_dir, res_enz, cut_pos, numb_reads = 1e+09, min_flen = 20)
splitUMI4C(wk_dir, res_enz, cut_pos, numb_reads = 1e+09, min_flen = 20)
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
res_enz |
Character containing the restriction enzyme sequence. |
cut_pos |
Numeric indicating the nucleotide position where restriction enzyme cuts (zero-based) (for example, for DpnII is 0). |
numb_reads |
Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=1e9. |
min_flen |
Minimal fragment length to use for selecting the fragments. Default=20 |
Creates a compressed FASTQ file in wk_dir/split
named
basename(fastq)).fq.gz
, containing the
split reads based on the restriction enzyme used.
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) splitUMI4C( wk_dir = file.path(path, "CIITA"), res_enz = "GATC", cut_pos = 0 ) }
if (interactive()) { path <- downloadUMI4CexampleData(reduced = TRUE) splitUMI4C( wk_dir = file.path(path, "CIITA"), res_enz = "GATC", cut_pos = 0 ) }
Creates a stats summary file and generates a summary plot describing statistics for processed UMI-4C samples.
statsUMI4C(wk_dir)
statsUMI4C(wk_dir)
wk_dir |
Working directory where to save the outputs generated by the UMI-4c analysis. |
Returns a plot summarizing the main statistics of the processed
UMI-4C experiments found in wk_dir
. Also, writes a file named
stats_summary.txt
in wk_dir/logs
that summarizes all the
values represented in the plot.
statsUMI4C(wk_dir = system.file("extdata", "CIITA", package = "UMI4Cats" )) stats <- read.delim(system.file("extdata", "CIITA", "logs", "stats_summary.txt", package = "UMI4Cats" )) head(stats)
statsUMI4C(wk_dir = system.file("extdata", "CIITA", package = "UMI4Cats" )) stats <- read.delim(system.file("extdata", "CIITA", "logs", "stats_summary.txt", package = "UMI4Cats" )) head(stats)
Theme
theme(...)
theme(...)
... |
Additional arguments to pass to the theme call from ggplot2. |
ggplot2 theme.
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + theme()
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + theme()
Theme X blank
themeXblank(...)
themeXblank(...)
... |
Additional arguments to pass to the theme call from ggplot2. |
ggplot2 theme with a blank X axis.
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeXblank()
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeXblank()
Theme Y blank
themeXYblank(...)
themeXYblank(...)
... |
Additional arguments to pass to the theme call from ggplot2. |
ggplot2 theme with a blank X and Y axis.
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeXYblank()
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeXYblank()
Theme Y blank
themeYblank(...)
themeYblank(...)
... |
Additional arguments to pass to the theme call from ggplot2. |
ggplot2 theme with a blank Y axis.
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeYblank()
library(ggplot2) ggplot( iris, aes(Sepal.Length, Sepal.Width) ) + geom_point() + themeYblank()
The UMI4C constructor is the function makeUMI4C
. By using
the arguments listed below, performs the necessary steps to analyze UMI-4C
data and summarize it in an object of class UMI4C.
makeUMI4C( colData, viewpoint_name = "Unknown", grouping = "condition", normalized = TRUE, ref_umi4c = NULL, bait_exclusion = 3000, bait_expansion = 1e+06, scales = 5:150, min_win_factor = 0.02, sd = 2 )
makeUMI4C( colData, viewpoint_name = "Unknown", grouping = "condition", normalized = TRUE, ref_umi4c = NULL, bait_exclusion = 3000, bait_expansion = 1e+06, scales = 5:150, min_win_factor = 0.02, sd = 2 )
colData |
Data.frame containing the information for constructing the UMI4C experiment object. Needs to contain the following columns:
|
viewpoint_name |
Character indicating the name for the used viewpoint. |
grouping |
Name of the column in colData used to merge the samples or replicates. Set to NULL for skipping grouping. Default: "condition". |
normalized |
Logical indicating whether UMI-4C profiles should be
normalized to the |
ref_umi4c |
Name of the sample or group to use as reference for
normalization. By default is NULL, which means it will use the sample with
less UMIs in the analyzed region. It should be a named vector, where the name
corresponds to the grouping column from |
bait_exclusion |
Region around the bait (in bp) to be excluded from the analysis. Default: 3000bp. |
bait_expansion |
Number of bp upstream and downstream of the bait to use for the analysis (region centered in bait). Default: 1Mb. |
scales |
Numeric vector containing the scales for calculating the domainogram. |
min_win_factor |
Proportion of UMIs that need to be found in a specific window for adaptative trend calculation |
sd |
Stantard deviation for adaptative trend. |
It returns an object of the class UMI4C.
colData
Data.frame containing the information for constructing the UMI4C experiment object. Needs to contain the following columns:
sampleID
: Unique identifier for the sample.
condition
: Condition for performing differential analysis.
Can be control and treatment, two different cell types, etc.
replicate
: Number or ID for identifying different replicates.
file
: Path to the files outputed by contactsUMI4C
.
rowRanges
GRanges object with the coordinates for the restriction fragment ends, their IDs and other additional annotation columns.
metadata
List containing the following elements:
bait
: GRanges object representing the position
of the bait used for the analysis.
scales
: Numeric vector containing the scales used for
calculating the domainogram.
min_win_factor
: Factor for calculating the minimum molecules
requiered in a window for not merging it with the next one when
calculating the adaptative smoothing trend.
grouping
: Columns in colData
used for the different
sample groupings, accessible through groupsUMI4C
.
normalized
: Logical indicating whether samples/groups are
normalized or not.
region
: GRanges with the coordinates of
the genomic window used for analyzing UMI4C data.
ref_umi4c
: Name of the sample or group used as reference for
normalization.
assays
Matrix where each row represents a restriction fragment site
and columns represent each sample or group defined in grouping
.
After running the makeUMI4C
function, it will contain the
following data:
umis
: Raw number of UMIs detected by contactsUMI4C
.
norm_mat
: Normalization factors for each sample/group and fragment end.
trend
: Adaptative smoothing trend of UMIs.
geo_coords
: Geometric coordinates obtained when performing
the adaptative smoothing.
scale
: Scale selected for the adaptative smoothing.
sd
: Stantard deviation for the adaptative smoothing trend.
dgram
List containing the domainograms for each sample. A domainogram is matrix where columns are different scales selected for merging UMI counts and rows are the restriction fragments.
groupsUMI4C
List of UMI4C
objects with the specific groupings.
results
List containing the results for the differential analysis ran
using fisherUMI4C
.
The UMI4C
class extends the SummarizedExperiment class.
UMI4C-methods
# Load sample processed file paths files <- list.files(system.file("extdata", "CIITA", "count", package = "UMI4Cats" ), pattern = "*_counts.tsv", full.names = TRUE ) # Create colData including all relevant information colData <- data.frame( sampleID = gsub("_counts.tsv.gz", "", basename(files)), file = files, stringsAsFactors = FALSE ) library(tidyr) colData <- colData %>% separate(sampleID, into = c("condition", "replicate", "viewpoint"), remove = FALSE ) # Load UMI-4C data and generate UMI4C object umi <- makeUMI4C( colData = colData, viewpoint_name = "CIITA", grouping = "condition" )
# Load sample processed file paths files <- list.files(system.file("extdata", "CIITA", "count", package = "UMI4Cats" ), pattern = "*_counts.tsv", full.names = TRUE ) # Create colData including all relevant information colData <- data.frame( sampleID = gsub("_counts.tsv.gz", "", basename(files)), file = files, stringsAsFactors = FALSE ) library(tidyr) colData <- colData %>% separate(sampleID, into = c("condition", "replicate", "viewpoint"), remove = FALSE ) # Load UMI-4C data and generate UMI4C object umi <- makeUMI4C( colData = colData, viewpoint_name = "CIITA", grouping = "condition" )
Transforms an UMI4C object to a DDS object
UMI4C2dds(umi4c, design = ~condition)
UMI4C2dds(umi4c, design = ~condition)
umi4c |
UMI4C object as generated by |
design |
A |
DDS object.
The UMI4Cats package provides functions for the pre-processing, analysis and visualization of UMI-4C chromatin contact data.
There are two different functions that can be used to prepare the files for analyzing them with UMI4Cats:
demultiplexFastq
. Demultiplex reads belonging to
different viewpoints from a paired-end FastQ file.
digestGenome
. Digest the reference genome of choice
using a given restriction sequence.
The pre-processing functions are wrapped in the contactsUMI4C
main function. This function will sequentially run the following steps:
prepUMI4C
. Filter specific and high quality reads.
splitUMI4C
. Split reads by the restriction sequence.
alignmentUMI4C
. Align reads to the reference genome.
counterUMI4C
. Apply UMI counting algorithm to quantify
the interactions with the viewpoint.
The statistics from the samples analyzed with the contactsUMI4C
function can be extracted and visualized with the function
statsUMI4C
.
The analysis of UMI-4C data is wrapped in the construction of an object of
UMI4C class by the creator function makeUMI4C
.
This function will group your samples according to the variable you provided
in the grouping
argument (default: "condition") and then normalize it
to ref_umi4c
.
Significant interactions with the viewpoint can be called when several
replicates are available, using the callInteractions
function.
A set of query_regions
, such as enhancers or open chromatin regions
needs to be provided. When no candidate regions are available, one can use
the function makeWindowFragments
to join a fixed number of
restriction fragments into windows. The results of this analysis can be
visualized using plotInteractionsUMI4C
and the list of
significant interactions can be retrieved using
getSignInteractions
.
The differential analysis can be performed with
fisherUMI4C
or waldUMI4C
functions and can be
focused in a set of regions of interest by providing the query_regions
argument. Both will return a UMI4C object containing the
results of the differential test. You can access these results with the
method resultsUMI4C
.
An integrative plot showing the results stored inside the UMI4C
object can be generated with the function plotUMI4C
.
Using a DDS object performs a variance stabilizing transformation from DESeq2 package to the UMI4C counts
vstUMI4C(dds)
vstUMI4C(dds)
dds |
DDS object generated by |
This function estimate the size factors and dispersions of the counts
base on DESeq
for infering the VST distribution and
transform raw UMI4C counts.
DDS object with variance stabilizing transformation counts
Using a UMI4C
object, infers the differences between conditions
specified in design
using a Wald Test from DESeq2
package.
waldUMI4C( umi4c, query_regions = NULL, subset = "sum", design = ~condition, normalized = TRUE, padj_method = "fdr", padj_threshold = 0.05 )
waldUMI4C( umi4c, query_regions = NULL, subset = "sum", design = ~condition, normalized = TRUE, padj_method = "fdr", padj_threshold = 0.05 )
umi4c |
UMI4C object as generated by |
query_regions |
|
subset |
If |
design |
A |
normalized |
Logical indicating if the function should return normalized or raw UMI counts. Default: TRUE. |
padj_method |
The method to use for adjusting p-values, see
|
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. Default: 0.05. |
UMI4C
object with the DESeq2 Wald Test results, which can be
accessed using resultsUMI4C
.
data("ex_ciita_umi4c") umi_dif <- waldUMI4C(ex_ciita_umi4c)
data("ex_ciita_umi4c") umi_dif <- waldUMI4C(ex_ciita_umi4c)
Calculates the z-score and then they are converted into one-sided P-values and adjusted for multiple testing using the method of Benjamini and Hochberg
zscoreUMI4C( dds, padj_method = "fdr", zscore_threshold = 2, padj_threshold = 0.1 )
zscoreUMI4C( dds, padj_method = "fdr", zscore_threshold = 2, padj_threshold = 0.1 )
dds |
DDS object as generated by |
padj_method |
The method to use for adjusting p-values, see
|
zscore_threshold |
Numeric indicating the z-score threshold to use to define significant differential contacts. Default: 2. |
padj_threshold |
Numeric indicating the adjusted p-value threshold to use to define significant differential contacts. Default: 0.1. |
This function calculates the z-score for each fragment over all samples from the residuals of the symmetric monotone fit and the median absolute deviation (mad). Z-scores are then converted into one-sided P-values using the standard Normal cumulative distribution function, and these are adjusted for multiple testing using the method of Benjamini and Hochberg
DDS object with zscore,pvalue and padjusted assays