Title: | SNPhood: Investigate, quantify and visualise the epigenomic neighbourhood of SNPs using NGS data |
---|---|
Description: | To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly *Bioconductor* R package to investigate and visualize the local neighborhood of a set of SNPs of interest for NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq or RNA- Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and to cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest. |
Authors: | Christian Arnold [aut, cre], Pooja Bhat [aut], Judith Zaugg [aut] |
Maintainer: | Christian Arnold <[email protected]> |
License: | LGPL (>= 3) |
Version: | 1.37.0 |
Built: | 2024-10-31 05:31:12 UTC |
Source: | https://github.com/bioc/SNPhood |
analyzeSNPhood
is the main function of the SNPhood
package.
All results, parameters and metadata are stored in an object of class SNPhood
.
analyzeSNPhood(par.l, files.df, onlyPrepareForDatasetCorrelation = FALSE, verbose = TRUE)
analyzeSNPhood(par.l, files.df, onlyPrepareForDatasetCorrelation = FALSE, verbose = TRUE)
par.l |
Named list. Named list with all required parameter names and their respective values, which
should be generated via the helper function |
files.df |
Data frame with at least the column "signal" specifying the absolute paths to the BAM files that will be processed.
Optionally, further columns can be added.
Supported are "input", "individual" and "genotype". See the Vignette for further details.
The data frame can either be created manually or via the helper function |
onlyPrepareForDatasetCorrelation |
Logical(1). Default FALSE. If set to TRUE, only steps necessary to analyze
the correlation among datasets with respect to their read counts are calculated, which is less thsan time-consuming than running the full pipeline.
This is a quality control step to identify outlier datasets
that show artefacts and that should therefore be removed from the analysis. If set to FALSE (the default), the full pipeline is
executed. In both cases, the function |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
If you already have BAM files in objects of class BamFile
or BamFileList
,
see the function collectFiles
for how to seemlessly integrate them into the SNPhood
framework.
In addition, see the vignettes for more details.
Object of class SNPhood
. See the class description (?"SNPhood-class", or click the link) for details.
## For the following example, see also the workflow vignette! library(SNPhoodData) # get a list of files to process dataDir = system.file("extdata", package = "SNPhoodData") files.df = collectFiles(patternFiles = paste0(dataDir,"/*.bam")) files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890") fileUserRegions = list.files(pattern = "*.txt",dataDir, full.names = TRUE) par.l = getDefaultParameterList(path_userRegions = fileUserRegions) par.l$poolDatasets = TRUE # Run the main function with the full pipeline SNPhood.o = analyzeSNPhood (par.l, files.df)
## For the following example, see also the workflow vignette! library(SNPhoodData) # get a list of files to process dataDir = system.file("extdata", package = "SNPhoodData") files.df = collectFiles(patternFiles = paste0(dataDir,"/*.bam")) files.df$individual = c("GM10847", "GM10847", "GM12890", "GM12890") fileUserRegions = list.files(pattern = "*.txt",dataDir, full.names = TRUE) par.l = getDefaultParameterList(path_userRegions = fileUserRegions) par.l$poolDatasets = TRUE # Run the main function with the full pipeline SNPhood.o = analyzeSNPhood (par.l, files.df)
SNPhood
object.Specific elements within the annotation slot may also be extracted by using the elements
parameter.
## S4 method for signature 'SNPhood' annotation(object, elements = NULL, ...)
## S4 method for signature 'SNPhood' annotation(object, elements = NULL, ...)
object |
Object of class |
elements |
Character. The name of the element(s) in the annotation slot to be extracted.
If set to |
... |
not supported |
If only a single value for elements
is provided, the element is returned directly.
If multiple values are provided, a named list with the requested elements is returned.
data(SNPhood.o, package="SNPhood") annotation(SNPhood.o) annotation(SNPhood.o, elements = "regions") annotation(SNPhood.o, elements = c("regions", "bins"))
data(SNPhood.o, package="SNPhood") annotation(SNPhood.o) annotation(SNPhood.o, elements = "regions") annotation(SNPhood.o, elements = c("regions", "bins"))
Return the names of the Bins that are defined in the SNPhood
object.
annotationBins(SNPhood.o, verbose = FALSE)
annotationBins(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Character vector. Names of the bins that are defined in the SNPhood
object.
data(SNPhood.o, package="SNPhood") annotationReadGroups(SNPhood.o)
data(SNPhood.o, package="SNPhood") annotationReadGroups(SNPhood.o)
annotationBins2
is a helper function that returns annotation of the bins that are defined in the SNPhood
object.
annotationBins2(SNPhood.o, regions = NULL, fullAnnotation = FALSE, verbose = TRUE)
annotationBins2(SNPhood.o, regions = NULL, fullAnnotation = FALSE, verbose = TRUE)
SNPhood.o |
Object of class |
regions |
Integer or character. Default NULL. A subset of the SNP regions for which annotation is needed. Either the row numbers or the rownames(IDs) of the SNP regions are supported. |
fullAnnotation |
Logical(1). Should the full annotation(as a data.frame) be returned or only the annotation of the bins(as a character vector)? |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
If fullAnnotation
is set to TRUE
, a data.frame with the full annotation of the bins for the(subset of) SNP regions is returned. Otherwise, a character vector with only the annotation of the bins is returned.
The number of returned bins can easily be very large, in the order of millions. Be careful because the memory consumption due the resulting object may increase considerably. Reduce memory requirements by returning only a subset of SNP regions
data(SNPhood.o, package="SNPhood") annotation.df = annotationBins2(SNPhood.o, regions = 1:10, fullAnnotation = TRUE) annotation.vec = annotationBins2(SNPhood.o, regions = 1:10, fullAnnotation = FALSE)
data(SNPhood.o, package="SNPhood") annotation.df = annotationBins2(SNPhood.o, regions = 1:10, fullAnnotation = TRUE) annotation.vec = annotationBins2(SNPhood.o, regions = 1:10, fullAnnotation = FALSE)
Return the names of the datasets/individuals that are defined in the SNPhood
object.
annotationDatasets(SNPhood.o, verbose = FALSE)
annotationDatasets(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Character vector. Names of the datasets/individuals that are defined in the SNPhood
object.
data(SNPhood.o, package="SNPhood") annotationDatasets(SNPhood.o)
data(SNPhood.o, package="SNPhood") annotationDatasets(SNPhood.o)
Return the names of the read groups that are defined in the SNPhood
object.
annotationReadGroups(SNPhood.o, verbose = FALSE)
annotationReadGroups(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Character vector. Names of the read groups that are defined in the SNPhood
object.
data(SNPhood.o, package="SNPhood") annotationReadGroups(SNPhood.o)
data(SNPhood.o, package="SNPhood") annotationReadGroups(SNPhood.o)
Return the annotation of the SNP regions that are defined in the SNPhood
object.
annotationRegions(SNPhood.o, asGRangesObj = FALSE, verbose = FALSE)
annotationRegions(SNPhood.o, asGRangesObj = FALSE, verbose = FALSE)
SNPhood.o |
Object of class |
asGRangesObj |
Logical(1). Default FALSE. Should the full annotation be returned (as |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
If asGRangesObj
is set to TRUE
, a GRanges
object is returned. Otherwise, a character vector with the currently stored SNP annotation is returned.
data(SNPhood.o, package="SNPhood") IDs.vec = annotationRegions(SNPhood.o, asGRangesObj = FALSE) IDs.gr = annotationRegions(SNPhood.o, asGRangesObj = TRUE)
data(SNPhood.o, package="SNPhood") IDs.vec = annotationRegions(SNPhood.o, asGRangesObj = FALSE) IDs.gr = annotationRegions(SNPhood.o, asGRangesObj = TRUE)
The function associateGenotypes
associates genotypes with SNP regions as defined in a SNPhood
object. It is possible to assign
genotypes only for a subset of datasets as defined in a SNPhood
object.
To avoid any ambiguities, a 1:1 for genotype and dataset mapping must be given (ses below).
associateGenotypes(SNPhood.o, genotypeMapping, verbose = TRUE)
associateGenotypes(SNPhood.o, genotypeMapping, verbose = TRUE)
SNPhood.o |
Object of class |
genotypeMapping |
Data frame. A data frame that establishes the mapping between datasets in the object and the corresponding genotype file and column names. See the examples. must be provided. See the Vignette for a more detailed description of the supported file format. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Object of class SNPhood
with the genotype information added to the slot annotation
, element genotype.
You may retrieve it via the accessor function annotation
.
data(SNPhood.o, package="SNPhood") fileGenotypes = list.files(pattern = "*genotypes*",system.file("extdata", package = "SNPhoodData"), full.names = TRUE) mapping = data.frame(samples = annotationDatasets(SNPhood.o), genotypeFile = rep(fileGenotypes, 2), sampleName = c("NA10847", "NA12890")) SNPhood.o = associateGenotypes(SNPhood.o, mapping)
data(SNPhood.o, package="SNPhood") fileGenotypes = list.files(pattern = "*genotypes*",system.file("extdata", package = "SNPhoodData"), full.names = TRUE) mapping = data.frame(samples = annotationDatasets(SNPhood.o), genotypeFile = rep(fileGenotypes, 2), sampleName = c("NA10847", "NA12890")) SNPhood.o = associateGenotypes(SNPhood.o, mapping)
The function changeObjectIntegrityChecking
disables object integrity checking for a SNPhood object.
This might be desired for large objects when the integrity test takes too much time.
Note, however, that disabling these checks is not recommended.
changeObjectIntegrityChecking(SNPhood.o, disable = FALSE, verbose = TRUE)
changeObjectIntegrityChecking(SNPhood.o, disable = FALSE, verbose = TRUE)
SNPhood.o |
Object of class |
disable |
Logical(1). Default FALSE. Disable the object integrity checking? |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Object of class SNPhood
with object integrity checking disabled.
data(SNPhood.o, package="SNPhood") SNPhood.o = changeObjectIntegrityChecking(SNPhood.o, disable = TRUE)
data(SNPhood.o, package="SNPhood") SNPhood.o = changeObjectIntegrityChecking(SNPhood.o, disable = TRUE)
collectFiles
creates a data frame that can be used as input for the function analyzeSNPhood
.
The resulting data frame contains information about files that will be processed (column signal) and, optionally,
corresponding input files for normalization (column input) and labels to combine datasets to meta-datasets (column individual).
collectFiles(patternFiles, recursive = FALSE, ignoreCase = TRUE, inputFiles = NA, individualID = NA, genotypeMapping = NA, verbose = TRUE)
collectFiles(patternFiles, recursive = FALSE, ignoreCase = TRUE, inputFiles = NA, individualID = NA, genotypeMapping = NA, verbose = TRUE)
patternFiles |
Character. If vector of length 1, absolute path to one or multiple BAM files that should be processed. Wildcards ("*") are allowed
(examples are *CTCF* or *.bam, see also examples). If vector of length > 1, each element must specify the absolute path to a BAM file,
with no wildcards being allowed. See also the note above concerning the integration of
|
recursive |
Logical(1). Default FALSE. Should the search for BAM files within the directory be performed recursively? If set to TRUE, all files matching the pattern within the specified directory and all of its subdirectories will be added. If set to FALSE, only files within the specified directory but not any subdirectories will be used. |
ignoreCase |
Logical(1). Default TRUE. Should the specified pattern be case sensitive? |
inputFiles |
Character. Default NULL. Input files that should be used as a control for normalization. Supported values are NA (no input normalization), a single character specifying one or multiple input files (comma-separated, see examples) that should be used for all processed files, or a character vector of the same length as the number of files that will be processed. Set to NULL if you want to add the files later manually in the data frame (see vignette). |
individualID |
Character. Default NULL. Name of the individual IDs. Only relevant if datasets should be pooled. |
genotypeMapping |
Character. Default NULL. Path to the corresponding genotype file in VCF format, followed by a colon and the name of the column in the VCF file.
Genotypes can also be integrated later using the function |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Note that if you already have an object of class BamFile
or BamFileList
, this can easily be
integrated into the SNPhood
framework by using the path
function to specify the
value of the parameter patternFiles
, see the examples below.
a data frame with the three columns signal
, input
and individual
that can be used as input for the function analyzeSNPhood
.
## For brevity, only exemplary filenames are given in the following. ## Note that in reality, absolute paths should be provided. ## First some examples using specific files rather than files that ## match a pattern in a particular directory ## Load SNPhoodData library library(SNPhoodData) files.df = collectFiles(patternFiles = paste0(system.file("extdata", package = "SNPhoodData"),"/*.bam")) ## If you already have BAM files in objects of class \code{\linkS4class{BamFile}} or \code{\linkS4class{BamFileList}}, ## you may use the following code snippet: files = list.files(pattern = "*.bam$",system.file("extdata", package = "SNPhoodData"),full.names = TRUE) BamFile.o = BamFile(files[1]) BamFiles.o = BamFileList(files) files.df = collectFiles(patternFiles = path(BamFile.o)) files.df = collectFiles(patternFiles = path(BamFiles.o))
## For brevity, only exemplary filenames are given in the following. ## Note that in reality, absolute paths should be provided. ## First some examples using specific files rather than files that ## match a pattern in a particular directory ## Load SNPhoodData library library(SNPhoodData) files.df = collectFiles(patternFiles = paste0(system.file("extdata", package = "SNPhoodData"),"/*.bam")) ## If you already have BAM files in objects of class \code{\linkS4class{BamFile}} or \code{\linkS4class{BamFileList}}, ## you may use the following code snippet: files = list.files(pattern = "*.bam$",system.file("extdata", package = "SNPhoodData"),full.names = TRUE) BamFile.o = BamFile(files[1]) BamFiles.o = BamFileList(files) files.df = collectFiles(patternFiles = path(BamFile.o)) files.df = collectFiles(patternFiles = path(BamFiles.o))
convertToAllelicFractions
convert read counts across read groups to their relative fractions among all
read groups (all read counts will be between 0 and 1, with 1 for a particular read group depicting that all reads
from this particular position originate from the one read group)
Affected slots are readCountsUnbinned
and readCountsBinned
.
It is recommended to save the resulting SNPhood
object with a new name because it is not possible to go back from
fractions to read counts at a later point.
convertToAllelicFractions(SNPhood.o, roundDigits = 2, setNaNToZero = FALSE, verbose = TRUE)
convertToAllelicFractions(SNPhood.o, roundDigits = 2, setNaNToZero = FALSE, verbose = TRUE)
SNPhood.o |
Object of class |
roundDigits |
Numeric(1). Default 2. Number of digits after the decimal place when converting read counts to fractions |
setNaNToZero |
Logical(1). Default FALSE. Should NaN (not a number) be converted to 0? NaN may result from individual regions or bins with no reads across all read groups due to a division by zero. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with read counts across read groups (both for the slots readCountsUnbinned and
readCountsBinned) replaced by their respective relative fractions. Otherwise identical to the input SNPhood
object.
data(SNPhood.o, package="SNPhood") SNPhood_allelicFractions.o = convertToAllelicFractions(SNPhood.o) # Convert all NaN to 0 for subsequent analyses SNPhood_allelicFractions.o = convertToAllelicFractions(SNPhood.o, setNaNToZero = TRUE)
data(SNPhood.o, package="SNPhood") SNPhood_allelicFractions.o = convertToAllelicFractions(SNPhood.o) # Convert all NaN to 0 for subsequent analyses SNPhood_allelicFractions.o = convertToAllelicFractions(SNPhood.o, setNaNToZero = TRUE)
SNPhood
object.counts
extracts count data from a SNPhood
object. The full count data or only a subset can be extracted
by settings the parameters type
, readGroup
and dataset
accordingly. Either the count data
for the unbinned or binned SNP regions can be extracted.
## S4 method for signature 'SNPhood' counts(object, type = "binned", readGroup = NULL, dataset = NULL, ...)
## S4 method for signature 'SNPhood' counts(object, type = "binned", readGroup = NULL, dataset = NULL, ...)
object |
Object of class |
type |
Character(1). Default "binned". Either "binned" or "unbinned" to extract counts after or before binning the SNP regions, respectively. |
readGroup |
Character(1). Default NULL. Read group that should be plotted, specified by its name as obtained by the function |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
... |
not used |
A named nested list with the requested count data, organized after read group and dataset.
data(SNPhood.o, package="SNPhood") str(counts(SNPhood.o)) str(counts(SNPhood.o, readGroup = "paternal", dataset = 1)) str(counts(SNPhood.o, readGroup = c("maternal", "paternal"), dataset = 1))
data(SNPhood.o, package="SNPhood") str(counts(SNPhood.o)) str(counts(SNPhood.o, readGroup = "paternal", dataset = 1)) str(counts(SNPhood.o, readGroup = c("maternal", "paternal"), dataset = 1))
deleteDatasets
deletes a particular set of datasets from a SNPhood
object.
Removal is irreversible. It is therefore recommended to save the resulting SNPhood
object with a new name because the deleted datasets cannot be recovered.
deleteDatasets(SNPhood.o, datasets = NULL, verbose = TRUE)
deleteDatasets(SNPhood.o, datasets = NULL, verbose = TRUE)
SNPhood.o |
Object of class |
datasets |
Numeric or Character or NULL. Default NULL. Datasets that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or their annotation (name must appear in the dataset names as obtained via the function |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with the requested datasets removed from all slots.
deleteRegions
, deleteReadGroups
data(SNPhood.o, package="SNPhood") SNPhood_mod.o = deleteDatasets(SNPhood.o, c(1,2))
data(SNPhood.o, package="SNPhood") SNPhood_mod.o = deleteDatasets(SNPhood.o, c(1,2))
deleteReadGroups
deletes a particular set of read groups from a SNPhood object.
Removal is irreversible. It is therefore recommended to save the resulting SNPhood
object with a new name.
deleteReadGroups(SNPhood.o, readGroups = NULL, verbose = TRUE)
deleteReadGroups(SNPhood.o, readGroups = NULL, verbose = TRUE)
SNPhood.o |
Object of class |
readGroups |
Character or NULL. Default NULL. Read groups that should be plotted, specified by their name as obtained by the function |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with read counts across read groups (both for the slots readCountsUnbinned and readCountsBinned) replaced by their respective relative fractions. Otherwise identical to the input SNPhood
object.
data(SNPhood.o, package="SNPhood") SNPhood_allelicFractions.o = deleteReadGroups(SNPhood.o, "ambiguous")
data(SNPhood.o, package="SNPhood") SNPhood_allelicFractions.o = deleteReadGroups(SNPhood.o, "ambiguous")
deleteRegions
deletes a set of user regions.
Removal is irreversible. It is therefore recommended to save the resulting SNPhood
object with a new name.
deleteRegions(SNPhood.o, regions, verbose = TRUE)
deleteRegions(SNPhood.o, regions, verbose = TRUE)
SNPhood.o |
Object of class |
regions |
Numeric or Character or NULL. Default NULL. Regions that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of regions as defined in the object) or their annotation (name must appear in the region names as obtained via the function |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with the requested regions being deleted.
Execution of this function resets the slot additionalResults and all of its results (e.g, allelic bias analysis). The reason for this is that all results stored in this slot are affected by the deletion of regions
deleteDatasets
, deleteReadGroups
data(SNPhood.o, package="SNPhood") # Delete the first 10 regions SNPhood_mod.o = deleteRegions(SNPhood.o, c(1:10)) # Delete regions by their annotation SNPhood_mod.o = deleteRegions(SNPhood.o, c("rs2822405", "rs467140"))
data(SNPhood.o, package="SNPhood") # Delete the first 10 regions SNPhood_mod.o = deleteRegions(SNPhood.o, c(1:10)) # Delete regions by their annotation SNPhood_mod.o = deleteRegions(SNPhood.o, c("rs2822405", "rs467140"))
Extract enrichment data from an object.
enrichment
extracts enrichment data from a SNPhood
object. The full count data or only a subset can be extracted
by settings the parameters type
, readGroup
and dataset
accordingly. Principally, either the count data
for the unbinned or binned SNP regions can be extracted.
enrichment(object, ...) ## S4 method for signature 'SNPhood' enrichment(object, readGroup = NULL, dataset = NULL, ...)
enrichment(object, ...) ## S4 method for signature 'SNPhood' enrichment(object, readGroup = NULL, dataset = NULL, ...)
object |
An object containing enrichment information. |
... |
Additional arguments, for use in specific methods. |
readGroup |
Character(1). Default NULL. Read group that should be plotted, specified by its name as obtained by the function |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
Enrichment of the object or the objects components.
Named list with the requested enrichment matrices from the SNPhood
object, organized by read group and dataset
data(SNPhood.o, package="SNPhood") str(enrichment(SNPhood.o), list.len=5)
data(SNPhood.o, package="SNPhood") str(enrichment(SNPhood.o), list.len=5)
getDefaultParameterList
generates a default parameter list that can be used as input for the function analyzeSNPhood
.
The path to the user regions file can optionally be provided as an argument to the function. See the examples for further details.
Before running the function analyzeSNPhood
, carefully check that the default parameters are suitable for the analysis.
getDefaultParameterList(path_userRegions = NULL, isPairedEndData = TRUE)
getDefaultParameterList(path_userRegions = NULL, isPairedEndData = TRUE)
path_userRegions |
Character(1). Specify the value of the parameter |
isPairedEndData |
Logical(1). Default TRUE. Are the data paired-end (TRUE) or single-end (FALSE)? |
a named list with default values for the currently supported parameters that can be used as
input for the function analyzeSNPhood
:
readFlag_isPaired: Logical(1), TRUE for paired-end data, NA for single-end
readFlag_isProperPair: Logical(1), TRUE
readFlag_isUnmappedQuery: Logical(1), FALSE
readFlag_hasUnmappedMate: Logical(1), FALSE
readFlag_isMinusStrand: Logical(1), NA
readFlag_isMateMinusStrand: Logical(1), NA
readFlag_isFirstMateRead: Logical(1), NA
readFlag_isSecondMateRead: Logical(1), NA
readFlag_isNotPrimaryRead: Logical(1), FALSE
readFlag_isNotPassingQualityControls: Logical(1), FALSE
readFlag_isDuplicate: Logical(1), FALSE
readFlag_reverseComplement: Logical(1), FALSE
readFlag_simpleCigar: Logical(1), TRUE
path_userRegions: Character(1), as given by the function argument path_userRegions
zeroBasedCoordinates: Logical(1), FALSE
regionSize: Integer(1), 500
binSize: Integer(1), 50
readGroupSpecific: Logical(1), TRUE
strand: Character(1), "both"
startOpen: Logical(1), FALSE
endOpen: Logical(1), FALSE
headerLine: Logical(1), FALSE
linesToParse: Integer(1), -1
lastBinTreatment: Character(1), "delete"
assemblyVersion: Character(1), "hg19"
nCores: Integer(1), 1
keepAllReadCounts: Logical(1), FALSE
normByInput: Logical(1), FALSE
normAmongEachOther: Logical(1), TRUE
poolDatasets: Logical(1), FALSE
For reasons of reduced redundancy, a detailed description of the parameters can be found at the end of the
main vignette in SNPhood
(browseVignettes("SNPhood")
).
## Only one parameter can, optionally, be specified when calling the function par.l = getDefaultParameterList(path_userRegions = "path/to/regions", isPairedEndData = TRUE) ## If the file is not specified, you need to change it ## before you can execute the function \code{\link{analyzeSNPhood}} par.l = getDefaultParameterList(isPairedEndData = TRUE) par.l$path_userRegions = "path/to/regions"
## Only one parameter can, optionally, be specified when calling the function par.l = getDefaultParameterList(path_userRegions = "path/to/regions", isPairedEndData = TRUE) ## If the file is not specified, you need to change it ## before you can execute the function \code{\link{analyzeSNPhood}} par.l = getDefaultParameterList(isPairedEndData = TRUE) par.l$path_userRegions = "path/to/regions"
mergeReadGroups
merges the counts of all read groups for a SNPhood
object. This function can only be executed if more than one read group is defined in the object and
if read counts have not been converted into allelic fractions. Also carefully note the warning below.
mergeReadGroups(SNPhood.o, summaryFunction = "sum", verbose = TRUE)
mergeReadGroups(SNPhood.o, summaryFunction = "sum", verbose = TRUE)
SNPhood.o |
Object of class |
summaryFunction |
Character(1). Default "sum". Either "sum" or "mean". How should the read counts from different read groups be summarized. If set to "sum", all counts are summed up, which yields values that are identical as running the main analysis non-allele-specifically. If set to "mean", the mean value across all read groups is calculated. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
A modified SNPhood
object with only one read group "allReadGroups", with all occurences of
the original read groups replaced by "allReadGroups". For object consistency, as mentioned in the warning below,
some results from analyses depending on read groups are removed completely.
Merging read groups is irreversible. This transformation cannot be undone. It might therefore be advisable to save the resulting object in a new variable as shown in the examples.
Results from the allelic bias test and clustering results will also be removed to keep the object consistent.
data(SNPhood.o, package="SNPhood") nReadGroups(SNPhood.o) SNPhood_merged.o = mergeReadGroups(SNPhood.o) nReadGroups(SNPhood.o)
data(SNPhood.o, package="SNPhood") nReadGroups(SNPhood.o) SNPhood_merged.o = mergeReadGroups(SNPhood.o) nReadGroups(SNPhood.o)
Return the number of bins that are defined in the SNPhood
object.
nBins(SNPhood.o, verbose = FALSE)
nBins(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Integer. Number of bins that are defined in the SNPhood
object
data(SNPhood.o, package="SNPhood") nBins(SNPhood.o)
data(SNPhood.o, package="SNPhood") nBins(SNPhood.o)
Return the number of datasets that are defined in the SNPhood
object.
nDatasets(SNPhood.o, verbose = FALSE)
nDatasets(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Integer. Number of datasets that are defined in the SNPhood
object
data(SNPhood.o, package="SNPhood") nDatasets(SNPhood.o)
data(SNPhood.o, package="SNPhood") nDatasets(SNPhood.o)
Return the number of read groups that are defined in the SNPhood
object.
nReadGroups(SNPhood.o, verbose = FALSE)
nReadGroups(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Integer. Number of read groups that are defined in the SNPhood
object
data(SNPhood.o, package="SNPhood") nReadGroups(SNPhood.o)
data(SNPhood.o, package="SNPhood") nReadGroups(SNPhood.o)
nRegions
is a helper function that returns the number of SNP regions that are defined in the SNPhood
object.
nRegions(SNPhood.o, verbose = FALSE)
nRegions(SNPhood.o, verbose = FALSE)
SNPhood.o |
Object of class |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Integer. Number of SNP regions that are defined in the SNPhood
object
data(SNPhood.o, package="SNPhood") nRegions(SNPhood.o)
data(SNPhood.o, package="SNPhood") nRegions(SNPhood.o)
Retrieve the parameters of an object.
Retrieve the parameters of a SNPhood
object.
parameters(object, ...) ## S4 method for signature 'SNPhood' parameters(object, ...)
parameters(object, ...) ## S4 method for signature 'SNPhood' parameters(object, ...)
object |
An object containing parameters with which it was created. |
... |
Additional arguments, for use in specific methods. |
A named list with all parameters and its current values of the SNPhood
object.
data(SNPhood.o, package="SNPhood") parameters(SNPhood.o)
data(SNPhood.o, package="SNPhood") parameters(SNPhood.o)
plotAllelicBiasResults
graphically summarizes the results of the allelic bias analysis for a specific dataset and region.
Three plots are generated, each of which focuses on a different aspect of the allelic bias analysis across the selected user region.
plotAllelicBiasResults(SNPhood.o, dataset = 1, region = 1, signThreshold = 0.05, readGroupColors = NULL, fileToPlot = NULL, verbose = FALSE)
plotAllelicBiasResults(SNPhood.o, dataset = 1, region = 1, signThreshold = 0.05, readGroupColors = NULL, fileToPlot = NULL, verbose = FALSE)
SNPhood.o |
Object of class |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
region |
Numeric(1) or Character(1). Single region that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of region as defined in the object) or its annotation (name must appear in the region names as obtained via the function |
signThreshold |
Numeric(1). Default 0.05. The significance threshold (such as p-value or FDR threshold). Must be between 0 and 1. If the parameter belongs to a plotting function, a horizontal line is drawn at the chosen value. For the allelic bias summary plots, p-values below this threshold and the corresponding allelic fractions are highlighted. |
readGroupColors |
Character or NULL. Default NULL. Colors of the read groups that appear in the plot.
If set to NULL, colors will be set automatically. The length of the vector must equal the total number of read groups
that are defined in the |
fileToPlot |
Character(1) or |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
The first plot shows the estimates of the allelic fraction, along with confidence intervals for the estimate
according to the parameters chosen when the function testForAllelicBias
was called. Fraction estimates for which the corresponding p-values
are deemed significant according to the value of the parameter signThreshold
are highlighted (see also the legend). At 0.5, the estimated
allelic fraction if there was no allelic bias, a horizontal line is drawn.
The second plot shows the p-values (-log 10 transformed, so that smaller p-values have higher transformed values).
In analogy to the estimates of the allelic fraction, significant p-values are highlighted. The -log10 transformed significance threshold
(according to the parameter signThreshold
) appears as a horizontal line.
Finally, the third plot shows the distribution of the read counts across all read groups. In addition, the genotype distribution for each read group is given (see the Vignette for details). This helps to identify allelic biases based on genotype differences among read groups.
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") SNPhood.o = testForAllelicBiases(SNPhood.o, readGroups = c("maternal", "paternal")) # Leave all parameters with their standard values plots = plotAllelicBiasResults(SNPhood.o) # Change the colors plots = plotAllelicBiasResults(SNPhood.o, readGroupColors = c("blue", "red", "gray")) # Alter the significance threshold plots = plotAllelicBiasResults(SNPhood.o, signThreshold = 0.01)
data(SNPhood.o, package="SNPhood") SNPhood.o = testForAllelicBiases(SNPhood.o, readGroups = c("maternal", "paternal")) # Leave all parameters with their standard values plots = plotAllelicBiasResults(SNPhood.o) # Change the colors plots = plotAllelicBiasResults(SNPhood.o, readGroupColors = c("blue", "red", "gray")) # Alter the significance threshold plots = plotAllelicBiasResults(SNPhood.o, signThreshold = 0.01)
plotBinCounts
visualizes the results of the allelic bias analysis across regions or a user-defined genomic range.
Note that only the results of a particular chromosome can be visualized. It is therefore only possible if the regions to be visualized
are located on one particular chromosome; otherwise, an error is thrown.
plotAllelicBiasResultsOverview(SNPhood.o, regions = 1, datasets = NULL, plotChr = NULL, plotStartPos = NULL, plotEndPos = NULL, ylim = NULL, plotRegionBoundaries = FALSE, plotRegionLabels = FALSE, signThreshold = 0.05, pValueSummary = "min", maxWidthLabels = NULL, colorPalette = "Set1", sizePoints = 4, printPlot = TRUE, fileToPlot = NULL, verbose = FALSE)
plotAllelicBiasResultsOverview(SNPhood.o, regions = 1, datasets = NULL, plotChr = NULL, plotStartPos = NULL, plotEndPos = NULL, ylim = NULL, plotRegionBoundaries = FALSE, plotRegionLabels = FALSE, signThreshold = 0.05, pValueSummary = "min", maxWidthLabels = NULL, colorPalette = "Set1", sizePoints = 4, printPlot = TRUE, fileToPlot = NULL, verbose = FALSE)
SNPhood.o |
Object of class |
regions |
Numeric or Character or NULL. Default NULL. Regions that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of regions as defined in the object) or their annotation (name must appear in the region names as obtained via the function |
datasets |
Numeric or Character or NULL. Default NULL. Datasets that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or their annotation (name must appear in the dataset names as obtained via the function |
plotChr |
Character(1) or NULL. Default NULL. The name of the chromosome for which the visualization should be done. Must be a valid chromosome name.
If set to NULL, other parameters (such as |
plotStartPos |
Character(1) or NULL. Default NULL. The start coordinates for which the visualization should be done. Must be a valid number with respect to the chromosome it refers to. If set to NULL and the parameter |
plotEndPos |
Character(1) or NULL. Default NULL. The end coordinates for which the visualization should be done. Must be a valid number with respect to the chromosome it refers to. If set to NULL and the parameter |
ylim |
Numeric(2). Default NULL. Range of the y-axis, as specified by a minimum and a maximum value. See ?ylim for details. |
plotRegionBoundaries |
Logical(1). Default FALSE. Should the region boundaries be drawn in the plot? If set to TRUE, two vertical lines will be drawn for each region, corresponding to the region boundaries upstream and downstream of the SNP. This visual aid may help to judge the size of the regions and overlaps among regions. |
plotRegionLabels |
Logical(1). Should the annotation of the regions be drawn vertically below the x axis? If many regions are plotted, labels may overlap; however, for a few regions, this is usually not a problem. |
signThreshold |
Numeric(1). Default 0.05. The significance threshold (such as p-value or FDR threshold). Must be between 0 and 1. If the parameter belongs to a plotting function, a horizontal line is drawn at the chosen value. For the allelic bias summary plots, p-values below this threshold and the corresponding allelic fractions are highlighted. |
pValueSummary |
Character(1). Default "min". Either "min" or "median". If set to "min", for each region, the minimum p-value across all bins is displayed as a representative result for the region. This is in analogy to how the background caclulation for the FDR calculation works, see the vignette for details. If set to "median", the median p-value is calculated for each region and plotted. This may facilitate to identify regions for which a lot of bins have low p-values. |
maxWidthLabels |
Numeric(1). Default NULL. Maximum width of the legend labels in number of characters. If the width of the legend labels are longer, they are shortened. Set to NULL to not shorten labels. |
colorPalette |
Character(1). Default "Set1". Name of the palette from the |
sizePoints |
Numeric(1). Default 4. Size of the points that are drawn in the plot (if type is set to the default value of "p"). This parameter has no effect if type is set to "l". |
printPlot |
Logical(1). Default TRUE. Should the plots be printed? Only relevant if |
fileToPlot |
Character(1) or |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") # Plot the allelic bias results for the first region using default values for all parameters plots = plotAllelicBiasResultsOverview(SNPhood.o) # Plot the allelic bias results for the full chr21 plots = plotAllelicBiasResultsOverview(SNPhood.o, regions = NULL, plotChr = "chr21")
data(SNPhood.o, package="SNPhood") # Plot the allelic bias results for the first region using default values for all parameters plots = plotAllelicBiasResultsOverview(SNPhood.o) # Plot the allelic bias results for the full chr21 plots = plotAllelicBiasResultsOverview(SNPhood.o, regions = NULL, plotChr = "chr21")
plotAndCalculateCorrelationDatasets
calculates and plots the pairwise correlation of all pairs of input files with among each other.
The main purpose is to identify artefacts with particular files that should subsequently be excluded.
The correlation is based on the raw region read counts(i.e., before binning).
The results of the correlation analysis are stored in the SNPhood
object.
If the corrplot
package is available, it will be used to produce a nice visualization of the correlation matrix.
plotAndCalculateCorrelationDatasets(SNPhood.o, fileToPlot = NULL, corMeasure = "pearson", verbose = FALSE, ...)
plotAndCalculateCorrelationDatasets(SNPhood.o, fileToPlot = NULL, corMeasure = "pearson", verbose = FALSE, ...)
SNPhood.o |
Object of class |
fileToPlot |
Character(1) or |
corMeasure |
Character(1). Default "pearson". The correlation measure that should be used to compare between pairs of samples.
Either |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
... |
Additional arguments for the |
An object of type SNPhood
, with the results of the correlation analysis stored in the slot "additionalResults".
They can be retrieved via the helper function results
for further investigation.
The results consist of a named list with two elements: A correlation matrix of the region read counts across all input files and a
translation table to correlate the input files with the abbreviations from the correlation plot.
data(SNPhood.o, package="SNPhood") # Plot directly, using Pearson correlation SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o) # Plot to a PDF file SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o, fileToPlot = "res.pdf") # Using Spearman correlation instead of Pearson SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o, corMeasure = "spearman")
data(SNPhood.o, package="SNPhood") # Plot directly, using Pearson correlation SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o) # Plot to a PDF file SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o, fileToPlot = "res.pdf") # Using Spearman correlation instead of Pearson SNPhood.o = plotAndCalculateCorrelationDatasets(SNPhood.o, corMeasure = "spearman")
The function plotAndCalculateWeakAndStrongGenotype
finds the strongest and weakest genotypes based on reads extracted around each region. Strong and weak genotypes are found using the reads extracted from SNPhood and their corresponding genotypes as found by the function associateGenotypes
Note the reads have to be merged using the function mergeReadGroups
before running this function.
plotAndCalculateWeakAndStrongGenotype(SNPhood.o, normalize = TRUE, nClustersVec = 3, fileToPlot = NULL, verbose = FALSE)
plotAndCalculateWeakAndStrongGenotype(SNPhood.o, normalize = TRUE, nClustersVec = 3, fileToPlot = NULL, verbose = FALSE)
SNPhood.o |
Object of class |
normalize |
Logical(1). Default TRUE. Should a normalization be done on the counts/enrichments values before clustering? If set to TRUE, a normalization procedure based on subtracting the mean dividing by standard deviation for each region is performed. For more details, see the vignette. |
nClustersVec |
Numeric. Default 2. The number of clusters the data should be divided into. This can either be a vector or a single value. if multiple clusters are specified, multiple clustering analyses will be performed and for each of them, a plot is produced. make sure to specify the parameter fileToPlot in that case; otherwise, only the last plot may be visible. |
fileToPlot |
Character(1) or |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Modified SNPhood
object with the results of the analysis stored in the object.
Specifically, a matrix for average reads per SNP for datasets which have strong and weak genotypes, respectively, are stored in the
slot additionalResults$genotype.
The SNPs which have invariant genotypes across all the samples being analyzed are also saved.
In addition, clustering on the strong and weak genotype read mateices are reportd as in the function plotAndClusterMatrix
.
data(SNPhood.o, package="SNPhood") SNPhood_merged.o = mergeReadGroups(SNPhood.o) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o, nClustersVec = 6) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o, nClustersVec = 2:6, verbose = FALSE)
data(SNPhood.o, package="SNPhood") SNPhood_merged.o = mergeReadGroups(SNPhood.o) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o, nClustersVec = 6) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o, nClustersVec = 2:6, verbose = FALSE)
plotAndClusterMatrix
can be used to cluster regions such as SNPs based on their local neighbourhood.
The underlying clustering is done using partitioning around medoids (PAM). For more details, see the vignette.
plotAndClusterMatrix(SNPhood.o, readGroup, dataset, nClustersVec = 3, normalize = TRUE, clustersToPlot = NULL, fileToPlot = NULL, verbose = FALSE, ...)
plotAndClusterMatrix(SNPhood.o, readGroup, dataset, nClustersVec = 3, normalize = TRUE, clustersToPlot = NULL, fileToPlot = NULL, verbose = FALSE, ...)
SNPhood.o |
Object of class |
readGroup |
Character(1). Default NULL. Read group that should be plotted, specified by its name as obtained by the function |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
nClustersVec |
Numeric. Default 2. The number of clusters the data should be divided into. This can either be a vector or a single value. if multiple clusters are specified, multiple clustering analyses will be performed and for each of them, a plot is produced. make sure to specify the parameter fileToPlot in that case; otherwise, only the last plot may be visible. |
normalize |
Logical(1). Default TRUE. Should a normalization be done on the counts/enrichments values before clustering? If set to TRUE, a normalization procedure based on subtracting the mean dividing by standard deviation for each region is performed. For more details, see the vignette. |
clustersToPlot |
Integer. Default NULL. Vector of clusters that should be plotted. If set to NULL, all clusters from the clustering result will be plotted. Otherwise, only the clusters as specified by the user are plotted, omitting regions belonging to other clusters. This is useful to, for example, only display regions that show a bin-dependent pattern and are not invariant across the whole user region. |
fileToPlot |
Character(1) or |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
... |
Additional graphical parameters that can be used to modify the output of the function levelplot (panel.levelplot). See ?levelplot for details. |
The clustering reports the cluster in which each SNP falls, the average silhouette for pam clustering, plots for the clustered reads and a summary plot of average reads per cluster across the region being analyzed.
data(SNPhood.o, package="SNPhood") SNPhood.o = plotAndClusterMatrix(SNPhood.o, readGroup = "paternal", dataset = 1, nClustersVec = c(3:6)) SNPhood.o = plotAndClusterMatrix(SNPhood.o, readGroup = "paternal", dataset = 1, normalize = FALSE)
data(SNPhood.o, package="SNPhood") SNPhood.o = plotAndClusterMatrix(SNPhood.o, readGroup = "paternal", dataset = 1, nClustersVec = c(3:6)) SNPhood.o = plotAndClusterMatrix(SNPhood.o, readGroup = "paternal", dataset = 1, normalize = FALSE)
plotAndSummarizeAllelicBiasTest
summarizes the allelic bias test across SNP regions and bins by calculating various summary statistics.
See the Vignette for more details. TODO
plotAndSummarizeAllelicBiasTest(SNPhood.o, signThreshold = 0.05, fileToPlot = NULL)
plotAndSummarizeAllelicBiasTest(SNPhood.o, signThreshold = 0.05, fileToPlot = NULL)
SNPhood.o |
Object of class |
signThreshold |
Numeric(1). Default 0.05. The significance threshold (such as p-value or FDR threshold). Must be between 0 and 1. If the parameter belongs to a plotting function, a horizontal line is drawn at the chosen value. For the allelic bias summary plots, p-values below this threshold and the corresponding allelic fractions are highlighted. |
fileToPlot |
Character(1) or |
A named list with various elements, each of which summarizes the allelic bias tests with a different focus. TODO
data(SNPhood, package="SNPhood") SNPhood.o = testForAllelicBiases (SNPhood.o, readGroups = c("maternal", "paternal")) SNPhood.o = plotAndSummarizeAllelicBiasTest(SNPhood.o)
data(SNPhood, package="SNPhood") SNPhood.o = testForAllelicBiases (SNPhood.o, readGroups = c("maternal", "paternal")) SNPhood.o = plotAndSummarizeAllelicBiasTest(SNPhood.o)
plotBinCounts
visualizes counts or enrichment for a particular region across bins, datasets, and read groups.
Many graphical parameters can be adjusted to suit the needs of the user, see below.
plotBinCounts(SNPhood.o, regions = 1, readGroups = NULL, datasets = NULL, readGroupColors = NULL, ylim = NULL, addGenotype = TRUE, plotGenotypeRatio = FALSE, addTitle = TRUE, colorPalette = "Set1", printPlot = TRUE, fileToPlot = NULL, maxWidthLabels = NULL, verbose = FALSE)
plotBinCounts(SNPhood.o, regions = 1, readGroups = NULL, datasets = NULL, readGroupColors = NULL, ylim = NULL, addGenotype = TRUE, plotGenotypeRatio = FALSE, addTitle = TRUE, colorPalette = "Set1", printPlot = TRUE, fileToPlot = NULL, maxWidthLabels = NULL, verbose = FALSE)
SNPhood.o |
Object of class |
regions |
Numeric or Character or NULL. Default NULL. Regions that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of regions as defined in the object) or their annotation (name must appear in the region names as obtained via the function |
readGroups |
Character or NULL. Default NULL. Read groups that should be plotted, specified by their name as obtained by the function |
datasets |
Numeric or Character or NULL. Default NULL. Datasets that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or their annotation (name must appear in the dataset names as obtained via the function |
readGroupColors |
Character or NULL. Default NULL. Colors of the read groups that appear in the plot.
If set to NULL, colors will be set automatically. The length of the vector must equal the total number of read groups
that are defined in the |
ylim |
Numeric(2). Default NULL. Range of the y-axis, as specified by a minimum and a maximum value. See ?ylim for details. |
addGenotype |
Logical(1). Default TRUE. Should the genotype distribution for each read group at the original user position be displayed in the legend in addition? See the Vignette for more details how this distribution is determined. |
plotGenotypeRatio |
Logical(1). Default FALSE. Should the ratio of the genotypes be plotted instead of the count or enrichment values? Only applicable if the number of read groups to be plotted is 2 and if one region is plotted. Setting this parameter to TRUE may result in ratios across bins that are interrupted due to zero counts (and a resulting division by zero, which can therefore not be displayed). Also, ratios cannot be plotted if the genotype for the selected regions could not be determined due to the lack of reads overlapping with the particular region (see the Vignette for details). |
addTitle |
Logical(1). Default TRUE. Should the plot contain a title that summarizes the genomic region that is visualized? |
colorPalette |
Character(1). Default "Set1". Name of the palette from the |
printPlot |
Logical(1). Default TRUE. Should the plots be printed? Only relevant if |
fileToPlot |
Character(1) or |
maxWidthLabels |
Numeric(1). Default NULL. Maximum width of the legend labels in number of characters. If the width of the legend labels are longer, they are shortened. Set to NULL to not shorten labels. |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") # Plot the first region, all parameters with their default values plot = plotBinCounts(SNPhood.o) # Plot the second region for the first dataset, using specific colors for the read groups. plot = plotBinCounts(SNPhood.o, regions = 2, dataset = 1, readGroupColors = c("red","blue","gray")) # Plot the first region for the first dataset and the genotype ratio. Save the plot in a variable plot = plotBinCounts(SNPhood.o, regions = 1, readGroups = c("maternal", "paternal"), dataset = 1, plotGenotypeRatio = TRUE) #' # Plot all regions for the first dataset and aggregate. Save the plot in a variable plot = plotBinCounts(SNPhood.o, regions = NULL, readGroups = c("maternal", "paternal"), dataset = 1)
data(SNPhood.o, package="SNPhood") # Plot the first region, all parameters with their default values plot = plotBinCounts(SNPhood.o) # Plot the second region for the first dataset, using specific colors for the read groups. plot = plotBinCounts(SNPhood.o, regions = 2, dataset = 1, readGroupColors = c("red","blue","gray")) # Plot the first region for the first dataset and the genotype ratio. Save the plot in a variable plot = plotBinCounts(SNPhood.o, regions = 1, readGroups = c("maternal", "paternal"), dataset = 1, plotGenotypeRatio = TRUE) #' # Plot all regions for the first dataset and aggregate. Save the plot in a variable plot = plotBinCounts(SNPhood.o, regions = NULL, readGroups = c("maternal", "paternal"), dataset = 1)
plotClusterAverage
visualizes the average reads per cluster. Note that the function plotAndClusterMatrix
has to be executed
before plotClusterAverage
is called for the same read group and dataset
plotClusterAverage(SNPhood.o, readGroup, dataset, fileToPlot = NULL, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
plotClusterAverage(SNPhood.o, readGroup, dataset, fileToPlot = NULL, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
SNPhood.o |
Object of class |
readGroup |
Character(1). Default NULL. Read group that should be plotted, specified by its name as obtained by the function |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
fileToPlot |
Character(1) or |
returnOnlyPlotNotObject |
Logical(1). Default FALSE. If set to TRUE, only the plots are returned but not the actual object.
Otherwise, for consistancy among the various visualization functions, the |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
plotAndClusterMatrix
data(SNPhood.o, package="SNPhood") plot = plotClusterAverage(SNPhood.o, readGroup = "paternal", dataset = 1)
data(SNPhood.o, package="SNPhood") plot = plotClusterAverage(SNPhood.o, readGroup = "paternal", dataset = 1)
plotAllelicBiasResults
graphically summarizes the results of the allelic bias analysis for a specific dataset and region.
plotFDRResults(SNPhood.o, dataset, FDRThreshold = NULL, fileToPlot = NULL, printPlot = TRUE, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
plotFDRResults(SNPhood.o, dataset, FDRThreshold = NULL, fileToPlot = NULL, printPlot = TRUE, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
SNPhood.o |
Object of class |
dataset |
Numeric(1) or Character(1). Single dataset that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or its annotation (name must appear in the dataset names as obtained via the function |
FDRThreshold |
Numeric(1) or NULL. Default NULL. If set to a value between 0 and 1, a horizontal line will be drawn in the FDR summary plot to indicate at which p-value threshold the FDR reaches the user-defined value. Additionally, the maximum p-value threshold from the FDR summary data will be printed for which thr FDR is below the specified threshold. |
fileToPlot |
Character(1) or |
printPlot |
Logical(1). Default TRUE. Should the plots be printed? Only relevant if |
returnOnlyPlotNotObject |
Logical(1). Default FALSE. If set to TRUE, only the plots are returned but not the actual object.
Otherwise, for consistancy among the various visualization functions, the |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") SNPhood.o = testForAllelicBiases(SNPhood.o, readGroups = c("maternal", "paternal")) # Plot FDR results for first dataset plotFDRResults(SNPhood.o, annotationDatasets(SNPhood.o)[1], FDRThreshold = NULL, fileToPlot = NULL) # Plot FDR results for second dataset, save in file and also detemrine p-value treshold for which the FDR is below 10% plotFDRResults(SNPhood.o, annotationDatasets(SNPhood.o)[1], FDRThreshold = 0.1, fileToPlot = "FDR_summary.pdf")
data(SNPhood.o, package="SNPhood") SNPhood.o = testForAllelicBiases(SNPhood.o, readGroups = c("maternal", "paternal")) # Plot FDR results for first dataset plotFDRResults(SNPhood.o, annotationDatasets(SNPhood.o)[1], FDRThreshold = NULL, fileToPlot = NULL) # Plot FDR results for second dataset, save in file and also detemrine p-value treshold for which the FDR is below 10% plotFDRResults(SNPhood.o, annotationDatasets(SNPhood.o)[1], FDRThreshold = 0.1, fileToPlot = "FDR_summary.pdf")
The function plotGenotypesPerCluster
plots average clusters per genotype based on the clustering results of
the strong an weak genotype analysis (see plotAndCalculateWeakAndStrongGenotype
), which has to be executed before.
plotGenotypesPerCluster(SNPhood.o, printBinLabels = TRUE, fileToPlot = NULL, printPlot = TRUE, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
plotGenotypesPerCluster(SNPhood.o, printBinLabels = TRUE, fileToPlot = NULL, printPlot = TRUE, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
SNPhood.o |
Object of class |
printBinLabels |
Logical(1). Default TRUE. Should the bin labels be printed?
If multiple clusters are plotted simultaenously, bin labels might overlap, in which case |
fileToPlot |
Character(1) or |
printPlot |
Logical(1). Default TRUE. Should the plots be printed? Only relevant if |
returnOnlyPlotNotObject |
Logical(1). Default FALSE. If set to TRUE, only the plots are returned but not the actual object.
Otherwise, for consistancy among the various visualization functions, the |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
plotAndCalculateWeakAndStrongGenotype
data(SNPhood.o, package="SNPhood") SNPhood_merged.o = mergeReadGroups(SNPhood.o) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o) plot = plotGenotypesPerCluster(SNPhood_merged.o, printPlot = FALSE)
data(SNPhood.o, package="SNPhood") SNPhood_merged.o = mergeReadGroups(SNPhood.o) SNPhood_merged.o = plotAndCalculateWeakAndStrongGenotype(SNPhood_merged.o) plot = plotGenotypesPerCluster(SNPhood_merged.o, printPlot = FALSE)
Creates bar plots for the distribution of genotype frequencies of regions across individuals.
plotGenotypesPerSNP(SNPhood.o, regions = NULL, fileToPlot = NULL, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
plotGenotypesPerSNP(SNPhood.o, regions = NULL, fileToPlot = NULL, returnOnlyPlotNotObject = FALSE, verbose = FALSE)
SNPhood.o |
Object of class |
regions |
Numeric or Character or NULL. Default NULL. Regions that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of regions as defined in the object) or their annotation (name must appear in the region names as obtained via the function |
fileToPlot |
Character(1) or |
returnOnlyPlotNotObject |
Logical(1). Default FALSE. If set to TRUE, only the plots are returned but not the actual object.
Otherwise, for consistancy among the various visualization functions, the |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") plot = plotGenotypesPerSNP(SNPhood.o, regions=1:20)
data(SNPhood.o, package="SNPhood") plot = plotGenotypesPerSNP(SNPhood.o, regions=1:20)
plotBinCounts
visualizes the raw read counts (i.e., before binning user regions) across regions or a user-defined genomic range.
Note that only the results of a particular chromosome can be visualized. It is therefore only possible if the regions to be visualized
are located on one particular chromosome; otherwise, an error is thrown.
plotRegionCounts(SNPhood.o, regions = NULL, datasets = NULL, readGroups = NULL, mergeReadGroupCounts = FALSE, plotChr = NULL, plotStartPos = NULL, plotEndPos = NULL, ylim = NULL, plotRegionBoundaries = FALSE, plotRegionLabels = FALSE, maxWidthLabels = NULL, colorPalette = "Set1", sizePoints = 4, type = "p", printPlot = TRUE, fileToPlot = NULL, verbose = FALSE)
plotRegionCounts(SNPhood.o, regions = NULL, datasets = NULL, readGroups = NULL, mergeReadGroupCounts = FALSE, plotChr = NULL, plotStartPos = NULL, plotEndPos = NULL, ylim = NULL, plotRegionBoundaries = FALSE, plotRegionLabels = FALSE, maxWidthLabels = NULL, colorPalette = "Set1", sizePoints = 4, type = "p", printPlot = TRUE, fileToPlot = NULL, verbose = FALSE)
SNPhood.o |
Object of class |
regions |
Numeric or Character or NULL. Default NULL. Regions that should be plotted, either specified as integer (such as 1, value must be
between 1 and the total number of regions as defined in the object) or their annotation (name must appear in the region names as obtained via the function |
datasets |
Numeric or Character or NULL. Default NULL. Datasets that should be used for plotting, either specified as integer (such as 1, value must be
between 1 and the total number of datasets as defined in the object) or their annotation (name must appear in the dataset names as obtained via the function |
readGroups |
Character or NULL. Default NULL. Read groups that should be plotted, specified by their name as obtained by the function |
mergeReadGroupCounts |
Logical(1). Default FALSE. Should the read groups be merged for visualization purposes? |
plotChr |
Character(1) or NULL. Default NULL. The name of the chromosome for which the visualization should be done. Must be a valid chromosome name.
If set to NULL, other parameters (such as |
plotStartPos |
Character(1) or NULL. Default NULL. The start coordinates for which the visualization should be done. Must be a valid number with respect to the chromosome it refers to. If set to NULL and the parameter |
plotEndPos |
Character(1) or NULL. Default NULL. The end coordinates for which the visualization should be done. Must be a valid number with respect to the chromosome it refers to. If set to NULL and the parameter |
ylim |
Numeric(2). Default NULL. Range of the y-axis, as specified by a minimum and a maximum value. See ?ylim for details. |
plotRegionBoundaries |
Logical(1). Default FALSE. Should the region boundaries be drawn in the plot? If set to TRUE, two vertical lines will be drawn for each region, corresponding to the region boundaries upstream and downstream of the SNP. This visual aid may help to judge the size of the regions and overlaps among regions. |
plotRegionLabels |
Logical(1). Should the annotation of the regions be drawn vertically below the x axis? If many regions are plotted, labels may overlap; however, for a few regions, this is usually not a problem. |
maxWidthLabels |
Numeric(1). Default NULL. Maximum width of the legend labels in number of characters. If the width of the legend labels are longer, they are shortened. Set to NULL to not shorten labels. |
colorPalette |
Character(1). Default "Set1". Name of the palette from the |
sizePoints |
Numeric(1). Default 4. Size of the points that are drawn in the plot (if type is set to the default value of "p"). This parameter has no effect if type is set to "l". |
type |
Character(1). "p" or "l". Default "p". What type of plot should be drawn, points ("p") or lines ("l")? |
printPlot |
Logical(1). Default TRUE. Should the plots be printed? Only relevant if |
fileToPlot |
Character(1) or |
verbose |
Logical(1). Default FALSE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
the generated ggplot2 plot(s) as list for further processing. May contain multiple plots, depending on the function. The plot(s) can then be plotted individually or modified arbitrarily as the user wants. For example, if multiple plots are returned and the plots have been saved in a variable called plots.l, simply type plots.l[[1]] to view the first plot.
data(SNPhood.o, package="SNPhood") # Plot the read counts for the first ten regions plot = plotRegionCounts(SNPhood.o, regions = 1:10) # Plot the read counts for the full chr21 plot = plotRegionCounts(SNPhood.o, plotChr = "chr21") # Plot the read counts for the full chr21, merge read group counts and decrease the point size plot = plotRegionCounts(SNPhood.o, plotChr = "chr21", sizePoints = 2, mergeReadGroupCounts = TRUE)
data(SNPhood.o, package="SNPhood") # Plot the read counts for the first ten regions plot = plotRegionCounts(SNPhood.o, regions = 1:10) # Plot the read counts for the full chr21 plot = plotRegionCounts(SNPhood.o, plotChr = "chr21") # Plot the read counts for the full chr21, merge read group counts and decrease the point size plot = plotRegionCounts(SNPhood.o, plotChr = "chr21", sizePoints = 2, mergeReadGroupCounts = TRUE)
renameBins
renames bins from a SNPhood object.
renameBins(SNPhood.o, newBinsMapping, verbose = TRUE)
renameBins(SNPhood.o, newBinsMapping, verbose = TRUE)
SNPhood.o |
Object of class |
newBinsMapping |
Named list. For clarity of mapping, the names of the list must be the currently defined bin names, and the values of each element the corresponding new ones. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with the requested bins being renamed.
data(SNPhood.o, package="SNPhood") mapping = list("Bin1_NEW") names(mapping) = annotationBins(SNPhood.o)[1] SNPhood_mod.o = renameBins(SNPhood.o, mapping)
data(SNPhood.o, package="SNPhood") mapping = list("Bin1_NEW") names(mapping) = annotationBins(SNPhood.o)[1] SNPhood_mod.o = renameBins(SNPhood.o, mapping)
renameDatasets
renames datasets from a SNPhood object.
renameDatasets(SNPhood.o, newDatasetsMapping, verbose = TRUE)
renameDatasets(SNPhood.o, newDatasetsMapping, verbose = TRUE)
SNPhood.o |
Object of class |
newDatasetsMapping |
Named list. Named list. For clarity of mapping, the names of the list must be the currently defined dataset names, and the values of each element the corresponding new ones. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with the requested datasets being renamed.
renameBins
, renameReadGroups
, renameRegions
data(SNPhood.o, package="SNPhood") mapping = list("Individual1", "Individual2") names(mapping) = annotationDatasets(SNPhood.o) SNPhood_mod.o = renameDatasets(SNPhood.o, mapping)
data(SNPhood.o, package="SNPhood") mapping = list("Individual1", "Individual2") names(mapping) = annotationDatasets(SNPhood.o) SNPhood_mod.o = renameDatasets(SNPhood.o, mapping)
renameReadGroups
renames a set of read groups from a SNPhood object.
renameReadGroups(SNPhood.o, newReadGroupsMapping, verbose = TRUE)
renameReadGroups(SNPhood.o, newReadGroupsMapping, verbose = TRUE)
SNPhood.o |
Object of class |
newReadGroupsMapping |
Named list. Named list. For clarity of mapping, the names of the list must be the currently defined read group names, and the values of each element the corresponding new ones. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
an object of class SNPhood
with the requested read groups being renamed.
renameBins
, renameDatasets
, renameRegions
data(SNPhood.o, package="SNPhood") mapping = list("a", "b", "c") names(mapping) = annotationReadGroups(SNPhood.o) SNPhood_mod.o = renameReadGroups (SNPhood.o, mapping)
data(SNPhood.o, package="SNPhood") mapping = list("a", "b", "c") names(mapping) = annotationReadGroups(SNPhood.o) SNPhood_mod.o = renameReadGroups (SNPhood.o, mapping)
renameRegions
renames regions from a SNPhood object.
renameRegions(SNPhood.o, newRegionsMapping, verbose = TRUE)
renameRegions(SNPhood.o, newRegionsMapping, verbose = TRUE)
SNPhood.o |
Object of class |
newRegionsMapping |
Named list. For clarity of mapping, the names of the list must be the currently defined region names, and the values of each element the corresponding new ones. |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
An object of class SNPhood
with the requested regions being renamed
renameBins
, renameDatasets
, renameReadGroups
data(SNPhood.o, package="SNPhood") mapping = as.list(paste0(annotationRegions(SNPhood.o),".newName")) names(mapping) = annotationRegions(SNPhood.o) SNPhood_mod.o = renameRegions(SNPhood.o, mapping)
data(SNPhood.o, package="SNPhood") mapping = as.list(paste0(annotationRegions(SNPhood.o),".newName")) names(mapping) = annotationRegions(SNPhood.o) SNPhood_mod.o = renameRegions(SNPhood.o, mapping)
SNPhood
object.Return the results of a particular analyis that is stored in the SNPhood
object.
results(SNPhood.o, type, elements = NULL)
results(SNPhood.o, type, elements = NULL)
SNPhood.o |
Object of class |
type |
Character(1). Name of analyses one wants to retrieve the results for. Currently supported are "allelicBias", "clustering", "genotype" and "samplesCorrelation". |
elements |
Character. Default NULL. Which elements of the resulting list structure should be returned? If set to NULL, all elements will be returned. Otherwise, if names are provided, only the requested subset elements will be returned. If type equals "allelicBias", valid values are "pValue", "confIntervalMin", "confIntervalMax", "fractionEstimate", "background", "FDR_results", and "parameters". If type equals "clustering", valid values are the defined read groups in the object. If type equals "genotype", valid values are "strongGenotypes", "weakGenotypes", and "invariantGenotypes". If type equals "samplesCorrelation", valid values are "corTable", and "transl". |
A list with the results of the requested analysis and elements within.
data(SNPhood.o, package="SNPhood") head(results(SNPhood.o, type="allelicBias", elements = "parameters")) head(results(SNPhood.o, type="allelicBias"))
data(SNPhood.o, package="SNPhood") head(results(SNPhood.o, type="allelicBias", elements = "parameters")) head(results(SNPhood.o, type="allelicBias"))
For more information and an introduction to the package, see the two vignettes.
Summary analyses and visualizations for the selected genomic regions with respect to, for example, their read counts, genotype, and allelic origin
SNPhood
functionsanalyzeSNPhood
annotation
annotationBins
annotationBins2
annotationDatasets
annotationReadGroups
annotationRegions
associateGenotypes
collectFiles
convertToAllelicFractions
counts
deleteDatasets
deleteReadGroups
deleteRegions
enrichment
getDefaultParameterList
mergeReadGroups
nBins
nDatasets
nReadGroups
nRegions
parameters
plotAllelicBiasResults
plotAllelicBiasResultsOverview
plotAndCalculateCorrelationDatasets
plotAndCalculateWeakAndStrongGenotype
plotAndClusterMatrix
plotBinCounts
plotClusterAverage
plotGenotypesPerCluster
plotGenotypesPerSNP
plotRegionCounts
renameBins
renameDatasets
renameReadGroups
renameRegions
results
testForAllelicBiases
We value all the feedback that we receive and will try to reply in a timely manner. Please report any bug that you encounter as well as any feature request that you may have to [email protected].
The class SNPhood
stores read count-derived information from NGS files for a set of genomic regions of interest as well as associated metadata.
It may additionally contain results of various subsequent analyses and statistical tests.
See the description below or the Vignette for more details.
annotation
Named list. Contains various annotation and metadata such as:
regions
: An object of class GenomicRanges
that contains the user regions, including annotation and the position
of the original user-provided position before creating regions and bins.
genotype
: A list one or two elements, both of which contain genotype-related information, either directly from the sequencing reads
or externally derived from a VCF file using the function associateGenotypes
.
readGroups
: The names of the read groups that are currently defined.
files
: Contains a named list with additional information about each processed file, such as type(signal
or input
), files
(a vector of one or multiple filenames), and composite
(TRUE
or FALSE
, indicating if this is a composite file from multiple individual files)
Elements from this slot can be retrieved with the accessor function annotation
.
config
Named list. Named list with the parameters as specified in the parameter list and additionally the specific parameters
the function analyzeSNPhood
was called with (such as onlyPrepareForDatasetCorrelation
and input
). Elements from
this slot can be retrieved with the accessor function parameters
.
readCountsUnbinned
Named list (nested). Contains vectors of raw reads counts for each user region (before binning).
The names of the list are the read groups and the filenames of the annotated datasets. Elements from
this slot can be retrieved with the accessor function counts
using type
= "unbinned".
readCountsBinned
Named list (nested). Each element contains a matrix of raw reads counts per user region and bin (i.e., after binning).
The names of the list are the read groups and the filenames of the annotated datasets.
Contains the raw read counts if normalization among all datasets ha sbeen performed (parameter normAmongEachOther
is set to FALSE)
and normalized read counts otherwise.
If read counts are recorded allele-specifically (in the following snippet paternal, maternal and ambiguous) for each group,
the structure therefore may look like this:
paternal
:
dataset ID 1
: Matrix of read counts for each user region across bins
dataset ID 2
: Matrix of read counts for each user region across bins
...
maternal
: See read group paternal, identical structure
ambiguous
: See read group paternal, identical structure
enrichmentBinned
Named list. See the description for the slot readCountsBinned
,
with the only difference that this slot contains the enrichment after normalizing with an input rather than the read counts.
If input normalization is turned off, this slot is empty.
additionalResults
Named list. Contains additional information from subsequent analyses such as allelic bias tests or results
of the genotype analysis. Initially empty. Different functions write the results in this slot.
Elements from this slot can be retrieved with the accessor function results
.
Currently, a SNPhood
object can only be constructed by executing the main function of the package, analyzeSNPhood
.
In the following code snippets, SNPhood.o
is a SNPhood
object and
readGroupCur
and datasetCur
a particular read group and dataset as defined in SNPhood.o
, respectively.
# Get general annotation of a SNPhood object
annotation(SNPhood.o)
: Get the annotation information, a nested list with multiple components (see names(annotation(SNPhood.o))).
# Get more specific annotation such as number and annotation of regions, datasets, bins, and read groups
nRegions(SNPhood.o)
: Get the number of user regions.
nDatasets(SNPhood.o)
: Get the number of datasets.
nBins(SNPhood.o)
: Get the number of bins.
nReadGroups(SNPhood.o)
: Get the number of read groups.
annotationRegions(SNPhood.o)
: Get the annotation of user regions.
annotationDatasets(SNPhood.o)
: Get the annotation of datasets.
annotationBins(SNPhood.o)
: Get the annotation of bins.
annotationReadGroups(SNPhood.o)
: Get the annotation of read groups.
# Get the parameters that were used for the analysis
parameters(SNPhood.o)
: Get the parameter information, a nested list with multiple components (see names(parameters(SNPhood.o))).
# Get counts before binning
counts(SNPhood.o, type = "unbinned", readGroup = readGroupCur, dataset = datasetCur)
: Get the counts for each user region before binning.
See ?counts for more details.
# If applicable, get counts after binning
counts(SNPhood.o, type = "binned", readGroup = readGroupCur, dataset = datasetCur)
: Get the counts for each user region after binning.
See ?counts for more details.
# If applicable, get enrichment after binning
enrichment(SNPhood.o, type = "binned", readGroup = readGroupCur, dataset = datasetCur)
: Get the enrichment for each user region after binning.
See ?enrichment for more details.
In addition, see the workflow vignette (browseVignettes(\"SNPhood\") for a full workflow that uses all accessors.
This dataset is an example dataset that can be used for exploring the SNPhood
package.
For more information, see the workflow vignette of the SNPhood
and SNPhoodData
package, respectively.
an example SNPhood
object from the SNPhoodData
package with read counts for
174 genomic regions across 2 datasets, three read groups and 100 bins
testForAllelicBiases
performs tests for allelic biases for each binned user region using binomial tests.
For the parameter readGroups
, the name of exactly two read groups must be provided for which allelic ratio tests should be performed.
See the Vignette for more details.
testForAllelicBiases(SNPhood.o, readGroups, confLevel = 0.95, nullHypothesisFraction = 0.5, calcBackgroundDistr = TRUE, nRepetitions = 100, pValuesToTestBackground = c(1e-04, 5e-04, 0.001, 0.005, seq(0.01, 1, 0.01)), verbose = TRUE)
testForAllelicBiases(SNPhood.o, readGroups, confLevel = 0.95, nullHypothesisFraction = 0.5, calcBackgroundDistr = TRUE, nRepetitions = 100, pValuesToTestBackground = c(1e-04, 5e-04, 0.001, 0.005, seq(0.01, 1, 0.01)), verbose = TRUE)
SNPhood.o |
Object of class |
readGroups |
Character or NULL. Default NULL. Read groups that should be plotted, specified by their name as obtained by the function |
confLevel |
Numeric(1). Default 0.95. The confidence level for estimating the confidence intervals. Must be between 0 and 1. |
nullHypothesisFraction |
Numeric(1). Default 0.5. The expected probability under the null hypothesis of not having any bias. Must be between 0 and 1. |
calcBackgroundDistr |
Logical(1). Default |
nRepetitions |
Integer(1). Default 10. Number of repetitions for calculating the background distribution.
Only relevant if |
pValuesToTestBackground |
Numeric. Default c(0.0001, 0.0005, 0.001, 0.005, seq(0.01,1,0.01)). Set of p-values for which corresponding FDR values will be computed |
verbose |
Logical(1). Default TRUE. Should the verbose mode (i.e., diagnostic messages during execution of the script) be enabled? |
Object of class SNPhood
with all the data from the allelic bias test stored in the slot additionalResults
,
which can be easily retrieved via the accessor function results
.
See the help pages of the result function (?results) or the vignette for details.
data(SNPhood.o, package="SNPhood") ## Perform the test without calculating the background distribution SNPhood.o = testForAllelicBiases (SNPhood.o, readGroups = c("paternal","maternal")) str(results(SNPhood.o, type="allelicBias"), list.len = 8) ## Check the parameters results(SNPhood.o, type="allelicBias", elements = "parameters")
data(SNPhood.o, package="SNPhood") ## Perform the test without calculating the background distribution SNPhood.o = testForAllelicBiases (SNPhood.o, readGroups = c("paternal","maternal")) str(results(SNPhood.o, type="allelicBias"), list.len = 8) ## Check the parameters results(SNPhood.o, type="allelicBias", elements = "parameters")