Package 'Basic4Cseq' reference manual

Title:	Basic4Cseq: an R/Bioconductor package for analyzing 4C-seq data
Description:	Basic4Cseq is an R/Bioconductor package for basic filtering, analysis and subsequent visualization of 4C-seq data. Virtual fragment libraries can be created for any BSGenome package, and filter functions for both reads and fragments and basic quality controls are included. Fragment data in the vicinity of the experiment's viewpoint can be visualized as a coverage plot based on a running median approach and a multi-scale contact profile.
Authors:	Carolin Walter
Maintainer:	Carolin Walter <[email protected]>
License:	LGPL-3
Version:	1.43.0
Built:	2025-03-29 05:19:01 UTC
Source:	https://github.com/bioc/Basic4Cseq

Remove invalid 4C-seq reads from a SAM file

Description

Basic4Cseq offers filter functions for invalid 4C-seq reads. This function removes 4C-seq reads from a provided Sequence Alignment/Map (SAM) file that show mismatches in the restriction enzyme sequence.

Usage

checkRestrictionEnzymeSequence(firstCutter, inputFileName, outputFileName = "output.sam", keepOnlyUniqueReads = TRUE, writeStatistics = TRUE)

Arguments

`firstCutter`	First restriction enzyme sequence of the 4C-seq experiment
`inputFileName`	Name of the input SAM file that contains aligned reads for the 4C-seq experiment
`outputFileName`	Name of the output SAM file that is created to store the filtered 4C-seq reads
`keepOnlyUniqueReads`	If TRUE, delete non-unique reads. Information in the SAM flag field is used to determine whether a read is unique or not.
`writeStatistics`	If TRUE, write statistics (e.g. the number of unique reads) to a text file

Details

Valid 4C-seq reads start at a primary restriction site and continue with its downstream sequence, so any mismatch in the restriction enzyme sequence of a read indicates a potential misalignment. The mapping information of the restriction enzyme sequence bases of a read (if present) can be used for filtering purposes. checkRestrictionEnzymeSequence tests the first bases of a read (4 or 6 bp, depending on the length of the first restriction enzyme sequence) for mismatches. Reads with mismatches in the restriction enzyme sequence are deleted, and the filtered data is written to a new SAM file. The function does not yet differentiate between blind and nonblind fragments, but removes potential misalignments that may overlap with valid fragment ends and distort the true 4C-seq signal.
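
The filter criterion can be illustrated with a minimal base R sketch. It only compares the first bases of each read sequence with the cutter sequence and ignores the SAM mapping information that checkRestrictionEnzymeSequence relies on; the helper name `startsWithCutter` and the example reads are hypothetical and not part of Basic4Cseq.

```r
# Hypothetical helper, not part of Basic4Cseq: TRUE for reads whose first
# bases match the first restriction enzyme sequence exactly.
startsWithCutter <- function(readSequences, firstCutter) {
    cutter <- toupper(firstCutter)
    prefix <- substr(toupper(readSequences), 1, nchar(cutter))
    prefix == cutter
}

# Made-up read sequences for illustration
reads <- c("AAGCTTGGATCCA", "AAGCTAGGATCCA", "aagcttCCGTA")
valid <- startsWithCutter(reads, "aagctt")

# Keep only the reads that start with the intact cutter sequence
filteredReads <- reads[valid]
```

A 6 bp cutter such as "aagctt" leads to a check of the first six bases, a 4 bp cutter to a check of the first four.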

Value

A SAM file containing the filtered valid 4C-seq reads

Note

The use of the function is only possible if the restriction enzyme sequence is not trimmed or otherwise absent.

Author(s)

Carolin Walter

Examples

    if(interactive()) {
        file <- system.file("extdata", "fetalLiverCutter.sam", package="Basic4Cseq")
        checkRestrictionEnzymeSequence("aagctt", file)
    }

Choose fragments in a provided region around the viewpoint

Description

This function extracts fragment data from a Data4Cseq object's rawFragments slot for visualization with the functions visualizeViewpoint and drawHeatmap. Relevant fragments are located within the chosen visualization range; the viewpoint itself can be excluded or included.

Usage

chooseNearCisFragments(expData, regionCoordinates, deleteViewpoint = TRUE)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment, including fragment data for the viewpoint chromosome
`regionCoordinates`	Interval on the viewpoint chromosome for the intended visualization
`deleteViewpoint`	If TRUE, delete all fragments that intersect with the experiment's viewpoint interval

Value

A data frame containing the chosen near-cis fragments

Note

Viewpoint fragments are removed per default to prevent bias through overrepresented sequences caused by self-ligation. These fragments can be included, but should be interpreted with caution.
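
The selection logic can be sketched in base R on a plain data frame. This is a simplified illustration under stated assumptions, not the package implementation: the helper `selectNearCis`, the column names `start` and `end`, and the rule that a fragment must lie completely inside the region are all hypothetical.

```r
# Hypothetical helper: keep fragments inside the region, optionally
# dropping those that intersect the viewpoint interval.
selectNearCis <- function(fragments, regionCoordinates, viewpointInterval,
                          deleteViewpoint = TRUE) {
    # Assumption: a fragment counts as relevant if it lies fully in the region
    inRegion <- fragments$start >= regionCoordinates[1] &
        fragments$end <= regionCoordinates[2]
    # Two intervals intersect if each one starts before the other ends
    atViewpoint <- fragments$start <= viewpointInterval[2] &
        fragments$end >= viewpointInterval[1]
    keep <- if (deleteViewpoint) inRegion & !atViewpoint else inRegion
    fragments[keep, , drop = FALSE]
}

# Made-up fragment coordinates around the example viewpoint
fragments <- data.frame(
    start = c(20790000, 20850000, 20879000, 20900000, 21010000),
    end   = c(20795000, 20852000, 20881000, 20903000, 21012000)
)
region <- c(20800000, 21000000)
viewpoint <- c(20879870, 20882209)

nearCis <- selectNearCis(fragments, region, viewpoint)
withViewpoint <- selectNearCis(fragments, region, viewpoint,
                               deleteViewpoint = FALSE)
```

With the viewpoint removed, two of the five fragments remain; keeping the viewpoint adds the fragment that overlaps it.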

Author(s)

Carolin Walter

Examples

    # read example data
    data(liverData)
    fragments<-chooseNearCisFragments(liverData, regionCoordinates = c(20800000, 21000000))
    head(fragments)

Create a virtual fragment library from a provided genome and two restriction enzymes

Description

Basic4Cseq can create virtual fragment libraries from any BSgenome package or DNAString object. Two restriction enzymes have to be specified to cut the DNA; the read length is needed to check fragment ends of the corresponding length for uniqueness. Filter options (minimum and maximum size) are provided at fragment level and at fragment end level.
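
The idea of an in silico digestion can be sketched in base R on a plain character string. The helper `digestSketch` is hypothetical; it assumes that a fragment spans both flanking sites of the first cutter and that a fragment is non-blind if it contains the second cutter, which may differ in detail from what createVirtualFragmentLibrary does.

```r
# Hypothetical helper, not part of Basic4Cseq: cut a sequence at every
# occurrence of the first cutter and flag fragments with a second cutter site.
digestSketch <- function(genome, firstCutter, secondCutter) {
    genome <- toupper(genome)
    first <- toupper(firstCutter)
    sites <- as.integer(gregexpr(first, genome, fixed = TRUE)[[1]])
    if (length(sites) < 2) {
        # fewer than two sites (no match is reported as -1): no fragment
        return(data.frame(start = integer(0), end = integer(0),
                          length = integer(0), nonBlind = logical(0)))
    }
    starts <- sites[-length(sites)]
    # Assumption: the fragment includes the downstream restriction site
    ends <- sites[-1] + nchar(first) - 1L
    fragmentSequences <- substring(genome, starts, ends)
    data.frame(start = starts, end = ends, length = ends - starts + 1L,
               nonBlind = grepl(toupper(secondCutter), fragmentSequences,
                                fixed = TRUE))
}

shortTestGenome <- "ATCCATGTAGGCTAAGTACACATGTTAAGGTACAGTACAATTGCACGATCAT"
fragments <- digestSketch(shortTestGenome, "catg", "gtac")
fragments
```

The short test sequence contains two "CATG" sites, so the sketch yields a single fragment, which is non-blind because it contains "GTAC".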

Usage

createVirtualFragmentLibrary(chosenGenome, firstCutter, secondCutter, readLength, onlyNonBlind = TRUE, useOnlyIndex = FALSE, minSize = 0, maxSize = -1, minFragEndSize = 0, maxFragEndSize = 10000000, useAllData = TRUE, chromosomeName = "chr1", libraryName = "default")

Arguments

`chosenGenome`	The genome that is to be digested in silico with the provided enzymes; can be an instance of BSgenome or DNAString
`firstCutter`	First of two restriction enzymes
`secondCutter`	Second of two restriction enzymes
`readLength`	Read length for the experiment
`onlyNonBlind`	Variable that is TRUE (default) if only non-blind fragments are considered (i.e. all blind fragments are removed)
`useOnlyIndex`	Convenience option to adapt the annotation style of the chromosomes ("chr1", ... "chrY" or "1", ..., "Y"); the parameter has to be set to match the BAM file in question
`minSize`	Filter option: fragments below this size (in bp) are deleted
`maxSize`	Filter option: fragments above this size (in bp) are deleted
`minFragEndSize`	Filter option: fragment ends below this size (in bp) are deleted
`maxFragEndSize`	Filter option: fragment ends above this size (in bp) are deleted
`useAllData`	Variable that indicates if all data of a BSgenome package is to be used. If FALSE, chromosome names including a "_" are removed, reducing the set of chromosomes to (1 ... 19, X, Y, MT) for the mouse genome or (1 ... 22, X, Y, MT) for the human genome
`chromosomeName`	Chromosome name for the virtual fragment library if a `DNAString` object is used instead of a `BSgenome` object.
`libraryName`	Name of the file the created virtual fragment library is written to. Per default the file is called "fragments_firstCutter_secondCutter.csv". The fragment data is returned as a data frame if and only if an empty character string is chosen as `libraryName`.

Details

readLength is relevant for the creation of the virtual fragment library to differentiate between unique and non-unique fragment ends. While two fragments can be unique, their respective ends may be repetitive if only the first few bases are considered. For 4C-seq data, reads can only map to the start (or end, respectively) of a 4C-seq fragment; the remaining part of the fragment is not covered. The length of a fragment end that has to be checked for uniqueness therefore depends on the read length of the experiment.
useAllData uses the lengths of the chromosomes to identify relevant ones, based on the current BSgenome packages for mm10 or hg19, and may therefore provide undesirable results for smaller genomes with different lengths (i.e. discard all chromosomes).
The length of a fragment influences the expected read count of a 4C-seq fragment. Per default, Basic4Cseq uses the experiment's read length as minimum fragment end size and places virtually no limit on the maximum fragment end size.
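
Why the read length matters for uniqueness can be shown with a small base R sketch. The helper `fragmentEndIsUnique` and the example sequences are hypothetical; strand orientation is ignored for simplicity, so this is only an illustration of the idea, not the check performed by the package.

```r
# Hypothetical helper: a fragment end is flagged as non-unique if its first
# (or last) readLength bases occur more than once among all fragment ends.
fragmentEndIsUnique <- function(fragmentSequences, readLength) {
    n <- nchar(fragmentSequences)
    leftEnds <- substr(fragmentSequences, 1, readLength)
    rightEnds <- substr(fragmentSequences, pmax(n - readLength + 1, 1), n)
    allEnds <- c(leftEnds, rightEnds)
    repeated <- allEnds %in% allEnds[duplicated(allEnds)]
    idx <- seq_along(fragmentSequences)
    data.frame(leftUnique = !repeated[idx],
               rightUnique = !repeated[length(idx) + idx])
}

# Three made-up fragments that are unique as a whole
fragmentSequences <- c("CATGAAAATTTTCATG",
                       "CATGAAAACCCCCATG",
                       "CATGGGGGTTTTCATG")
uniqueness <- fragmentEndIsUnique(fragmentSequences, readLength = 8)
uniqueness
```

All three example fragments differ from each other, yet with a read length of 8 only the left end of the third fragment and the right end of the second fragment are unique.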

Value

A tab-separated file with the specified virtual fragment library (containing fragment position, length, presence of second restriction enzyme and uniqueness of the fragment ends)

Note

It is strongly recommended to preprocess and store the virtual fragment library if a number of experiments with the same restriction enzyme combination, read length and underlying genome have to be analyzed.
Processing one of the larger BSgenome packages takes some time and computer data storage.
If no library name for the virtual fragment library is specified, the fragment data is returned as a data frame. If the library name "default" is chosen, the tab-separated file is named "fragments_firstCutter_secondCutter" (with variable cutter sequences).

Author(s)

Carolin Walter

Examples

    if(interactive()) {
        library(BSgenome.Ecoli.NCBI.20080805)
        fragmentData = createVirtualFragmentLibrary(chosenGenome = Ecoli$NC_002655, firstCutter = "catg", secondCutter = "gtac", readLength = 30,  onlyNonBlind = TRUE, chromosomeName = "NC_002655", libraryName = "fragments_Ecoli.csv")
    }

Creating a Data4Cseq object

Description

This function creates a Data4Cseq object. Data on the 4C-seq experiment, e.g. the chromosome of the viewpoint, is stored and checked for consistency.

Usage

Data4Cseq(viewpointChromosome, viewpointInterval, readLength, pointsOfInterest, rawReads)

Arguments

`viewpointChromosome`	The experiment's viewpoint chromosome
`viewpointInterval`	The interval of the experiment's viewpoint, consisting of a start and end coordinate
`readLength`	The experiment's read length (in base pairs)
`pointsOfInterest`	Points of interest to be marked in a near-cis visualization
`rawReads`	Reads of the 4C-seq experiment, aligned and stored as a `GAlignments` object

Details

A Data4Cseq object contains basic information for the corresponding 4C-seq experiment, including the viewpoint chromosome, the viewpoint region and reads from the experiment. See Data4Cseq-class for more details. The constructor collects the basic data; fragment data or normalized read counts are added later.

Fragments at the experiment's viewpoint are usually vastly overrepresented due to self-ligation; chooseNearCisFragments offers the option to discard all fragments in the specified viewpoint region. The specified viewpoint interval of a Data4Cseq object is supposed to correspond to the positions of the biological primers on the genome, but can also be increased in size if more fragments around the viewpoint should be removed.

Value

An instance of the Data4Cseq class.

Author(s)

Carolin Walter

Examples

    # create a Data4Cseq object with a minimum of data
    liverData = Data4Cseq(viewpointChromosome = "10", viewpointInterval = c(20879870, 20882209), readLength = 54)
    liverData

    # create a Data4Cseq object, including possible points of interest and raw reads 
    bamFile <- system.file("extdata", "fetalLiverShort.bam", package="Basic4Cseq")
    liverReads <- readGAlignments(bamFile)
    pointsOfInterestFile <- system.file("extdata", "fetalLiverVP.bed", package="Basic4Cseq")
    liverData = Data4Cseq(viewpointChromosome = "10", viewpointInterval = c(20879870, 20882209), readLength = 54, pointsOfInterest = readPointsOfInterestFile(pointsOfInterestFile), rawReads = liverReads)
    liverData

Class `"Data4Cseq"`

Description

This class is a container for information on a specific 4C-seq experiment. Stored information includes raw reads, fragment data, and the experiment's viewpoint location.

Objects from the Class

Objects can be created by calls of the form new("Data4Cseq", ...).

Slots

viewpointChromosome: Object of class "character" representing the viewpoint chromosome's name
viewpointInterval: Object of class "numeric" representing the viewpoint interval's location
readLength: Object of class "numeric" representing the experiment's read length
pointsOfInterest: Object of class "data.frame" representing any points of interest to be marked in the near-cis visualizations
rawReads: Object of class "GAlignments" representing the raw 4C-seq reads of the experiment
rawFragments: Object of class "data.frame" representing the experiment's corresponding virtual fragment library
nearCisFragments: Object of class "data.frame" representing near-cis data in fragment form

Methods

viewpointChromosome<-: signature(object = "Data4Cseq", value = "character"): Setter-method for the viewpointChromosome slot.
viewpointChromosome: signature(object = "Data4Cseq"): Getter-method for the viewpointChromosome slot.
viewpointInterval<-: signature(object = "Data4Cseq", value = "numeric"): Setter-method for the viewpointInterval slot.
viewpointInterval: signature(object = "Data4Cseq"): Getter-method for the viewpointInterval slot.
readLength<-: signature(object = "Data4Cseq", value = "numeric"): Setter-method for the readLength slot.
readLength: signature(object = "Data4Cseq"): Getter-method for the readLength slot.
pointsOfInterest<-: signature(object = "Data4Cseq", value = "data.frame"): Setter-method for the pointsOfInterest slot.
pointsOfInterest: signature(object = "Data4Cseq"): Getter-method for the pointsOfInterest slot.
rawReads<-: signature(object = "Data4Cseq", value = "GAlignments"): Setter-method for the rawReads slot.
rawReads: signature(object = "Data4Cseq"): Getter-method for the rawReads slot.
rawFragments<-: signature(object = "Data4Cseq", value = "data.frame"): Setter-method for the rawFragments slot.
rawFragments: signature(object = "Data4Cseq"): Getter-method for the rawFragments slot.
nearCisFragments<-: signature(object = "Data4Cseq", value = "data.frame"): Setter-method for the nearCisFragments slot.
nearCisFragments: signature(object = "Data4Cseq"): Getter-method for the nearCisFragments slot.
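
The getter and setter pattern listed above is the usual S4 idiom. The following sketch defines a hypothetical class `MiniData4Cseq` with two of the slots to show how such accessors are typically written; it is not the definition used by Basic4Cseq, and the validity rule is an assumption made for illustration.

```r
library(methods)

# Hypothetical class, loosely modelled on two of the slots listed above
setClass("MiniData4Cseq",
    representation(viewpointChromosome = "character",
                   viewpointInterval = "numeric"),
    validity = function(object) {
        vi <- object@viewpointInterval
        if (length(vi) != 2 || vi[1] > vi[2])
            return("viewpointInterval must be c(start, end) with start <= end")
        TRUE
    })

# Getter: generic plus method for the hypothetical class
setGeneric("viewpointInterval",
    function(object) standardGeneric("viewpointInterval"))
setMethod("viewpointInterval", "MiniData4Cseq",
    function(object) object@viewpointInterval)

# Setter: replacement generic plus method, re-checking validity
setGeneric("viewpointInterval<-",
    function(object, value) standardGeneric("viewpointInterval<-"))
setReplaceMethod("viewpointInterval", "MiniData4Cseq",
    function(object, value) {
        object@viewpointInterval <- value
        validObject(object)
        object
    })

mini <- new("MiniData4Cseq", viewpointChromosome = "10",
            viewpointInterval = c(20879870, 20882209))
viewpointInterval(mini) <- c(20879000, 20883000)
viewpointInterval(mini)
```

Calling `validObject` in the setter means that an inconsistent interval is rejected at assignment time rather than later during analysis.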

Author(s)

Carolin Walter

Examples

showClass("Data4Cseq")

Visualize digestion fragments with a histogram

Description

This function is a small convenience function to plot the results of simulateDigestion as a histogram. Minimum and maximum fragment lengths can be specified to visualize a specified interval of the fragment data.

Usage

    drawDigestionFragmentHistogram(fragments, minLength = 0, maxLength = 10000)

Arguments

`fragments`	Fragment data to visualize (data frame with lengths and corresponding frequencies)
`minLength`	Minimum fragment length to visualize
`maxLength`	Maximum fragment length to visualize

Value

Histogram plot of the fragment data

Author(s)

Carolin Walter

Examples

    shortTestGenome = "ATCCATGTAGGCTAAGTACACATGTTAAGGTACAGTACAATTGCACGATCAT"
    fragments = simulateDigestion("catg", "gtac", shortTestGenome)
    drawDigestionFragmentHistogram(fragments)

Draw a heatmap-like multi-scale contact profile

Description

This method draws a fragment-based heatmap-like plot for 4C-seq data around a given viewpoint. For a given number of bands, color-coded running medians or running means of signal intensity (normalized and log-scaled) in different fragments are displayed; the window size of the running medians or running means increases from top to bottom. A corresponding colour legend is added in an extra plot.

Usage

    ## S4 method for signature 'Data4Cseq'
    drawHeatmap(expData, plotFileName = "", smoothingType = "median", picDim = c(9, 2.2), bands = 5, cutoffLog = -7.0, xAxisIntervalLength = 50000, legendLabels = expression(2^-7, 2^0), useFragEnds = TRUE)
    ## S4 method for signature 'data.frame'
    drawHeatmap(expData, plotFileName = "", smoothingType = "median", picDim = c(9, 2.2), bands = 5, cutoffLog = -7.0, xAxisIntervalLength = 50000, legendLabels = expression(2^-7, 2^0), useFragEnds = TRUE)

Arguments

`expData`	Experiment data from a given 4C-seq experiment for visualization; can be a `Data4Cseq` object or a data frame
`plotFileName`	Name for the heatmap plot
`smoothingType`	Type of interpolation (running mean or running median). Default value is "median" (i.e. running median)
`picDim`	Dimensions of the plot. Default value is c(9, 2.2), to fit a small heatmap plot below the main 4C-seq plot that is created by `visualizeViewpoint`
`bands`	Number of coloured "bands" (rows) to visualize. The first band contains the raw data (running median or running mean with window size 1), the following bands increase in window size (+2 per band)
`cutoffLog`	Cut off value for the logarithmic scale
`xAxisIntervalLength`	Length of the x axis intervals in the plot
`legendLabels`	Labels for a heat colour legend plot; labels should correspond to the logarithmic cut offs
`useFragEnds`	Indicates whether fragment end data is used directly or interpolated on fragment level
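
The band structure described for `bands` and `cutoffLog` can be sketched with `stats::runmed` in base R. The helper `multiScaleProfile` is hypothetical, and the normalization to the maximum signal is an assumption chosen so that the values fall between the cutoff and 2^0; the package may normalize differently.

```r
# Hypothetical helper: one row per band, with window sizes 1, 3, 5, ...
multiScaleProfile <- function(signal, bands = 5, cutoffLog = -7) {
    # Assumption: normalize to the maximum, log2-scale, clip at the cutoff
    logSignal <- pmax(log2(signal / max(signal)), cutoffLog)
    windows <- seq(1, by = 2, length.out = bands)
    profile <- vapply(windows, function(k) {
        # window size 1 is the unsmoothed signal
        if (k == 1) logSignal
        else as.numeric(stats::runmed(logSignal, k, endrule = "keep"))
    }, numeric(length(logSignal)))
    profile <- t(profile)
    rownames(profile) <- paste0("window", windows)
    profile
}

# Made-up read counts per fragment
signal <- c(4, 0, 16, 8, 64, 2, 32, 1, 128, 16, 8)
profile <- multiScaleProfile(signal, bands = 3)
dim(profile)
```

Each row of the resulting matrix corresponds to one band of the heatmap; a call such as `image(t(profile))` would give a rough visual impression of the multi-scale profile.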

Value

A multi-scale intensity contact profile plot and a corresponding colour legend

Note

PDF export and output as TIFF format are supported. The export format is chosen depending on the plot file name's ending. If no plot file name is provided, the result is plotted on screen.

Author(s)

Carolin Walter

Examples

    if(interactive()) {
        data(liverData)
        drawHeatmap(liverData)
    }

Export near-cis fragment data of a `Data4Cseq` object

Description

This function is a simple helper function that writes the near-cis data of a Data4Cseq object to disk as a tab-separated file.

Usage

exportVisualizationFragmentData(expData, fileName, fullData = FALSE)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment, including visualization data
`fileName`	Name for the tab-separated file
`fullData`	If TRUE, the function exports the full fragment data (including fragment end length etc). If FALSE, only the minimum fragment information is exported, i.e. chromosome, start, end and (normalized) read count.

Value

A tab-separated file containing near-cis fragment data of a Data4Cseq object
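
The difference between the minimal and the full export can be sketched with `write.table` in base R. The helper `exportSketch` and all column names are hypothetical; the actual column layout of the exported file may differ.

```r
# Hypothetical helper: write either all columns or only the minimal set
exportSketch <- function(fragments, fileName, fullData = FALSE) {
    minimalColumns <- c("chromosome", "start", "end", "readCount")
    out <- if (fullData) fragments
           else fragments[, minimalColumns, drop = FALSE]
    write.table(out, file = fileName, sep = "\t", quote = FALSE,
                row.names = FALSE)
    invisible(out)
}

# Made-up near-cis data with one additional column
nearCis <- data.frame(chromosome = "10",
                      start = c(20850000, 20900000, 20950000),
                      end = c(20852000, 20903000, 20951500),
                      fragEndLength = c(300, 450, 120),
                      readCount = c(12.5, 3, 0))

outFile <- tempfile(fileext = ".csv")
exportSketch(nearCis, outFile)

# Reading the file back recovers the minimal columns
reimported <- read.table(outFile, header = TRUE, sep = "\t")
```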

Author(s)

Carolin Walter

Examples

    if(interactive()) {
        data(liverData)
        exportVisualizationFragmentData(liverData, "fetalLiverData.csv")
    }

Calculate the read distribution for a 4C-seq experiment

Description

This function provides some 4C-seq quality statistics based on the experiment's read distribution throughout the genome. getReadDistribution calculates the number of total reads, cis to overall ratio of reads, and the percentage of covered fragment ends within a certain distance around the experiment's viewpoint. Reference values for high-quality experiments, as provided by van de Werken et al, 2012, are more than one million reads total, a cis to overall ratio of more than 40% and a large fraction of covered fragment ends in the viewpoint's vicinity.
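
The three statistics can be sketched in base R on toy data. The helper `readDistributionSketch`, its arguments and the column names `position` and `readCount` are hypothetical; getReadDistribution itself works on a Data4Cseq object and may define the distance window differently.

```r
# Hypothetical helper: total reads, cis ratio and fraction of covered
# fragment ends within a distance around the viewpoint.
readDistributionSketch <- function(readChromosomes, viewpointChromosome,
                                   fragmentEnds, viewpointInterval,
                                   distanceFromVP = 100000) {
    totalReads <- length(readChromosomes)
    cisRatio <- mean(readChromosomes == viewpointChromosome)
    # Assumption: the window extends distanceFromVP beyond both interval ends
    nearVP <- fragmentEnds$position >= viewpointInterval[1] - distanceFromVP &
        fragmentEnds$position <= viewpointInterval[2] + distanceFromVP
    coveredFraction <- mean(fragmentEnds$readCount[nearVP] > 0)
    list(totalReads = totalReads, cisRatio = cisRatio,
         coveredFraction = coveredFraction)
}

# Made-up data: chromosome of each read, and fragment ends on chromosome 10
readChromosomes <- c(rep("10", 6), rep("3", 2), rep("X", 2))
fragmentEnds <- data.frame(
    position = c(20700000, 20800000, 20850000, 20900000, 20950000, 21100000),
    readCount = c(5, 0, 3, 2, 0, 1)
)
stats <- readDistributionSketch(readChromosomes, "10", fragmentEnds,
                                c(20879870, 20882209))
stats
```

In this toy example, 60% of the reads are cis reads, and two of the four fragment ends within 100 kb of the viewpoint are covered.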

Usage

getReadDistribution(expData, distanceFromVP = 100000, useFragEnds = TRUE, outputName = "")

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment
`distanceFromVP`	Distance from the viewpoint that is checked for covered fragments
`useFragEnds`	If TRUE, the function uses fragment end data; if FALSE, an average value for whole fragments is used.
`outputName`	An optional name for an output text file containing the statistics data

Value

Text with statistics data on the 4C-seq experiment

Note

Text export is supported; if no file name is provided, the results are printed on screen.

Author(s)

Carolin Walter

References

van de Werken, H., de Vree, P., Splinter, E., et al. (2012): 4C technology: protocols and data analysis, Methods in Enzymology, 513, 89-112

Examples

    data(liverData)
    getReadDistribution(liverData)

Provide the corresponding enzyme sequence for an enzyme name

Description

This function is a small convenience function that reads in a prepared file with restriction enzyme names and sequences. giveEnzymeSequence then provides restriction enzyme sequences for the example enzymes listed in van de Werken et al's 4Cseqpipe database.

Usage

giveEnzymeSequence(fileNameDatabase, enzymeName)

Arguments

`fileNameDatabase`	File name of the prepared enzyme database
`enzymeName`	Name of the enzyme for which the sequence is to be returned

Value

Character string with the restriction enzyme sequence

Note

For any custom-made enzyme list it is assumed that there are no duplicate enzyme names in the database.
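
The lookup and the assumption about unique names can be sketched in base R. The helper `lookupEnzymeSequence`, the in-memory table and its column names are hypothetical; the format of the prepared database file is not shown here.

```r
# Hypothetical in-memory enzyme table (names and recognition sequences)
enzymeTable <- data.frame(name = c("HindIII", "NlaIII", "DpnII"),
                          sequence = c("AAGCTT", "CATG", "GATC"),
                          stringsAsFactors = FALSE)

# Hypothetical helper: return the sequence for exactly one matching name
lookupEnzymeSequence <- function(enzymeTable, enzymeName) {
    hits <- enzymeTable$sequence[enzymeTable$name == enzymeName]
    if (length(hits) != 1)
        stop("expected exactly one entry for enzyme '", enzymeName,
             "', found ", length(hits))
    hits
}

nlaSequence <- lookupEnzymeSequence(enzymeTable, "NlaIII")
nlaSequence
```

An explicit check for a single hit makes duplicate or missing names in a custom enzyme list visible instead of silently returning several sequences or none.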

Author(s)

Carolin Walter

References

van de Werken, H., Landan, G., Holwerda, S., et al. (2012): Robust 4C-seq data analysis to screen for regulatory DNA interactions, Nature Methods, 9, 969-971.

Examples

    file <- system.file("extdata", "enzymeData.csv", package="Basic4Cseq")  
    giveEnzymeSequence(file, "NlaIII")

Import visualization data from a file

Description

This function is a simple helper function that can import near-cis data which was previously exported and stored as tab-separated file.

Usage

importVisualizationFragmentData(fileName)

Arguments

`fileName`	Name of the tab-separated file with near-cis fragment data

Value

Data frame containing the near-cis fragment data

Author(s)

Carolin Walter

Examples

    file <- system.file("extdata", "fetalLiver_finalFragments.csv", package="Basic4Cseq")
    fragments <- importVisualizationFragmentData(file)
    head(fragments)

Example 4C-seq data set of fetal liver data

Description

This data set contains an instance of a Data4Cseq object; 2185 reads on 453 fragments are included.

The 4C-seq data was taken from Stadhouders et al's fetal liver data set.

Usage

    data("liverData")

Format

Formal class 'Data4Cseq'

Source

Shortened version of Stadhouders et al's fetal liver data:

Stadhouders, R., Thongjuea, S., et al. (2012): Dynamic long-range chromatin interactions control Myb proto-oncogene transcription during erythroid development. The EMBO Journal, 31, 986-999.

Examples

    data("liverData")
    liverData

Example 4C-seq data set of fetal liver data

Description

This data set contains an instance of a Data4Cseq object; 2185 reads on 453 fragments are included. Raw reads are mapped to fragments, but the read count has not yet been normalized.

The 4C-seq data was taken from Stadhouders et al's fetal liver data set.

Usage

    data("liverDataRaw")

Format

Formal class 'Data4Cseq'

Source

Shortened version of Stadhouders et al's fetal liver data:

Stadhouders, R., Thongjuea, S., et al. (2012): Dynamic long-range chromatin interactions control Myb proto-oncogene transcription during erythroid development. The EMBO Journal, 31, 986-999.

Examples

    data("liverDataRaw")
    liverDataRaw

Normalize near-cis fragment data read count

Description

This function provides a simple RPM (reads per million) normalization for near-cis fragment data read counts of a Data4Cseq object. A form of normalization is especially important for the comparison of samples with a different read count.

Usage

normalizeFragmentData(expData)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment

Value

Data frame with RPM-normalized data
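
RPM normalization itself is a one-line computation. The sketch below uses a hypothetical helper `rpmNormalize` and assumes that the total is the sum of the supplied read counts; which reads the package counts towards the total is not specified here.

```r
# Hypothetical helper: scale read counts to reads per million
rpmNormalize <- function(readCounts) {
    readCounts / sum(readCounts) * 1e6
}

# Two made-up samples with different sequencing depth
sampleA <- c(10, 40, 0, 50)
sampleB <- c(100, 400, 0, 500)

rpmA <- rpmNormalize(sampleA)
rpmB <- rpmNormalize(sampleB)
```

Although sample B has ten times as many reads as sample A, both yield the same normalized profile, which is what makes samples with different read counts comparable.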

Author(s)

Carolin Walter

Examples

    data(liverDataRaw)
    normalizedFragments<-normalizeFragmentData(liverDataRaw)
    head(normalizedFragments)

Visualize trans interaction intervals

Description

This function visualizes trans interaction intervals of a 4C-seq experiment with the help of the RCircos package. Significant interactions can be obtained by use of Splinter et al's significant_interactions code or similar algorithms.

Usage

plotTransInteractions(interactionFile, chromosomeViewpoint, coordViewpoint, ideogramData, PlotColor = "default", expandBands = FALSE, expansionValue = 0, plotFileName = "", picDim = c(8, 8))

Arguments

`interactionFile`	Interaction interval data; either a file name or a data frame
`chromosomeViewpoint`	Viewpoint chromosome of the 4C-seq experiment
`coordViewpoint`	Viewpoint coordinates of the 4C-seq experiment
`ideogramData`	Ideogram data to be visualized in the RCircos plot; either a file name or a data frame
`PlotColor`	Plot colours for the visualized interactions
`expandBands`	If TRUE, add a specified value to the size of the interaction intervals to increase the visibility of very small interactions
`expansionValue`	Value that is added to each interaction interval end
`plotFileName`	Optional name for an output file
`picDim`	Dimensions of the plot

Details

The code of Splinter et al to determine significant interactions provides chromosome, start and end of interaction intervals and a fourth column with information on far-cis or trans data. This column is ignored by plotTransInteractions; it is assumed that all interactions for trans visualization are indeed trans interactions. Otherwise, far-cis interactions are visualized as well. While not a mistake per se, the (usually more numerous) far-cis interactions are easier to interpret if visualized with Splinter et al's spider-plot functions.

Value

An RCircos-plot of trans interaction intervals

Note

PDF export and output as TIFF format are supported. The export format is chosen depending on the plot file name's ending. If no plot file name is provided, the result is plotted on screen.

Author(s)

Carolin Walter

References

Zhang, H., Meltzer, P. and Davis, S. (2013) RCircos: an R package for Circos 2D track plots, BMC Bioinformatics, 14, 244

Splinter, E., de Wit, E., van de Werken, H., et al. (2012) Determining long-range chromatin interactions for selected genomic sites using 4C-seq technology: From fixation to computation, Methods, 58, 221-230.

Examples

    if(interactive()) {
        library(RCircos)
        interactions <- system.file("extdata", "transInteractionData.txt", package="Basic4Cseq")
        ideograms <- system.file("extdata", "RCircos_GRCm38_ideogram.csv", package="Basic4Cseq")
        plotTransInteractions(interactions, "10", c(20000042, 20001000), ideograms, PlotColor = "blue", expandBands = TRUE, expansionValue = 1000000, plotFileName = "")
    }

Alignment and filtering of raw 4C-seq data

Description

This function is an optional wrapper for the alignment and preliminary filtering of 4C-seq data. prepare4CseqData reads a provided 4C-seq fastq file from hard disk. Alignment of the reads is done with BWA; the function checkRestrictionEnzymeSequence is used for optional filtering. Samtools and bedtools provide the necessary functionality for intersecting the filtered reads with a given 4C-seq fragment library for visualization purposes (e.g. with the Integrative Genomics Viewer, IGV).

Usage

prepare4CseqData(fastqFileName, firstCutter, fragmentLibrary, referenceGenome, pathToBWA = "", pathToSam = "", pathToBED = "", controlCutterSequence = FALSE, bwaThreads = 1, minFragEndLength = 0)

Arguments

`fastqFileName`	The name of the fastq file that contains the 4C-seq reads
`firstCutter`	First cutting enzyme sequence for the 4C-seq experiment, e.g. "AAGCTT"
`fragmentLibrary`	Name of the fragment library to use for the current 4C-seq experiment; has to correspond to the chosen cutters and chosen genome
`referenceGenome`	Name (plus path) of the reference genome to use
`pathToBWA`	Path to BWA
`pathToSam`	Path to samtools
`pathToBED`	Path to bedtools
`controlCutterSequence`	If TRUE, the function `checkRestrictionEnzymeSequence` is used to filter non-valid 4C-seq reads
`bwaThreads`	Number of BWA threads
`minFragEndLength`	Minimum fragment end length to use for BED export

Value

A sorted BAM file for the data is computed and written to disk, provided that BWA, samtools and bedtools are available

Author(s)

Carolin Walter

References

Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform, Bioinformatics, 25, 1754-60.

Thorvaldsdottir, H., Robinson, J. T. and Mesirov, J. P. (2012) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Briefings in Bioinformatics.

Examples

    if(interactive()) {
        # BWA, samtools and bedtools must be installed
        # It is assumed that the example data files (from the package) are in the active directory
        prepare4CseqData("veryShortExample.fastq", "CATG", "veryShortLib.csv", referenceGenome = "veryShortReference.fasta")
    }

Print a BED-file fragment library

Description

This function extracts the first columns of a virtual fragment library file and exports them as a BED file for use with other tools, e.g. for visualization in the Integrative Genomics Viewer (IGV).

Usage

printBEDFragmentLibrary(fragmentLibrary, BEDLibraryName, minFragEndLength = 0, zeroBased = FALSE)

Arguments

`fragmentLibrary`	Virtual fragment library file name
`BEDLibraryName`	File name for the exported BED file
`minFragEndLength`	Minimum fragment end length to be considered
`zeroBased`	If TRUE, adapt the start positions of the BED-file fragments to zero-based coordinates

Value

Writes a BED file containing the virtual fragment library position data
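
The effect of `zeroBased` can be pictured with a small base R sketch. It assumes that the option shifts only the start coordinate by one, as the BED convention of zero-based starts suggests. The column names and coordinates are invented for illustration and do not reflect the actual layout of a Basic4Cseq fragment library file.

```r
# Toy fragment table; column names and coordinates are illustrative
# assumptions, not the layout of a real fragment library file.
fragments <- data.frame(
    chromosomeName = c("chr10", "chr10", "chr10"),
    fragmentStart  = c(1001L, 1501L, 2301L),
    fragmentEnd    = c(1500L, 2300L, 2350L),
    stringsAsFactors = FALSE
)

zeroBased <- TRUE
bed <- fragments
if (zeroBased) {
    # BED intervals use a 0-based start and an exclusive end,
    # so only the start coordinate is shifted.
    bed$fragmentStart <- bed$fragmentStart - 1L
}

# Write a tab-separated file without header or quotes
bedFile <- tempfile(fileext = ".bed")
write.table(bed, file = bedFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
bedLines <- readLines(bedFile)
print(bedLines)
```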

Author(s)

Carolin Walter

Examples

    if(interactive()) {
        file <- system.file("extdata", "vfl_aagctt_catg_mm9_54_vp.csv", package="Basic4Cseq")
        printBEDFragmentLibrary(file, "BEDLibrary_FL_vp.bed")
    }

Print a wig file from 4C-seq read data

Description

This function provides wig files from filtered fragment data. Only reads on unique frag-ends are considered for the export. Export of wig files with a fixed span length requires a uniform read length throughout the data.

While some tools (e.g. the Integrative Genomics Viewer, IGV) accept 'raw' wig data, the UCSC browser needs a header line for correct visualization. A basic header line has the form 'track type=wiggle_0', but may also contain the track's name and a short description. Since the header line may complicate downstream analysis of the wig files, no header is included by default.
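
As a minimal sketch of such a header line, a UCSC track line could be assembled in base R as follows. The track name and description are example values, and the call that passes the string through the `headerUCSC` argument is shown only as a comment.

```r
# Build a UCSC track line; name and description are example values.
trackName <- "fetalLiver"
trackDescription <- "4C-seq reads, fetal liver"
headerUCSC <- sprintf('track type=wiggle_0 name="%s" description="%s"',
                      trackName, trackDescription)
print(headerUCSC)

# The string could then be handed to printWigFile(), for example:
# printWigFile(liverData, wigFileName = "fetalLiver.wig", headerUCSC = headerUCSC)
```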

Usage

printWigFile(expData, wigFileName = "output.wig", fixedSpan = TRUE, headerUCSC = "", useOnlyIndex = FALSE)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment
`wigFileName`	Name of the wig file that is written to hard disk
`fixedSpan`	If TRUE, use a fixed span for the wig file
`headerUCSC`	A header line for the UCSC browser
`useOnlyIndex`	If TRUE, use only '1,2,...Y' as chromosome names; if FALSE, use 'chr1,chr2...chrY'.

Value

A wig file containing the experiment's reads

Author(s)

Carolin Walter

References

UCSC Genome Browser: Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006.

http://genome.ucsc.edu/

Examples

    if(interactive()) {
        data(liverData)
        printWigFile(liverData, wigFileName = "fetalLiver.wig")
    }

Read a file with coordinates of marker points

Description

This small helper function reads a tab-separated file with points of interest information stored in a BED-like format. The file has to provide the columns "chromosome", "start", "end", "name" and "colour" of the regions. The data can then be used for marking the points in near-cis visualization plots, as provided by visualizeViewpoint and drawHeatmap.
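
A file with the five required columns could be produced and inspected with base R as sketched below. The coordinates, names and colours are made up, and the sketch assumes a file without a header line; whether `readPointsOfInterestFile` expects one is not stated here.

```r
# Example points of interest; all values are invented for illustration.
poi <- data.frame(
    chromosome = c("chr10", "chr10"),
    start      = c(20880000L, 20920000L),
    end        = c(20881000L, 20921000L),
    name       = c("regionA", "regionB"),
    colour     = c("red", "black"),
    stringsAsFactors = FALSE
)

# Write the table tab-separated, without header, quotes or row names
poiFile <- tempfile(fileext = ".bed")
write.table(poi, file = poiFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back with base R to check the layout (assumes no header line)
check <- read.table(poiFile, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE,
                    col.names = c("chromosome", "start", "end", "name", "colour"))
print(check)
```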

Usage

readPointsOfInterestFile(poiFile)

Arguments

`poiFile`	Name of the input file (tab-separated)

Value

Data frame with information on points of interest for the near-cis visualizations

Author(s)

Carolin Walter

Examples

    file <- system.file("extdata", "fetalLiverVP.bed", package="Basic4Cseq")
    pointsOfInterests = readPointsOfInterestFile(file)
    pointsOfInterests

Determine fragment coverage of a 4C-seq fragment library

Description

This function maps aligned reads to fragment ends of the virtual fragment library to calculate the coverage of the fragments. The number of reads at the start and end of a fragment is provided, as well as the average of both fragment ends.
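
The counting idea can be sketched with a toy example in base R. This is not the package's implementation: each read is reduced to the coordinate of the fragment end it touches, and the column names are assumptions chosen for readability.

```r
# Toy fragment library: one row per fragment (coordinates are invented)
fragments <- data.frame(
    fragmentStart = c(101L, 501L, 901L),
    fragmentEnd   = c(500L, 900L, 1400L)
)

# Toy reads, each represented by the fragment-end coordinate it maps to
readPositions <- c(101L, 101L, 500L, 501L, 900L, 900L, 900L, 1400L)

# Count the reads that fall on each site
countAt <- function(sites, positions) {
    vapply(sites, function(s) sum(positions == s), integer(1))
}

fragments$leftFragEndReads  <- countAt(fragments$fragmentStart, readPositions)
fragments$rightFragEndReads <- countAt(fragments$fragmentEnd, readPositions)

# Average coverage of both fragment ends
fragments$fragEndReadsAverage <-
    (fragments$leftFragEndReads + fragments$rightFragEndReads) / 2
print(fragments)
```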

Usage

readsToFragments(expData, fragmentLib)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment, including raw 4C-seq read data
`fragmentLib`	Fragment library for the given genome and cutting enzyme combination

Value

Data frame containing fragment-based data, i.e. a fragment's position and read coverage

Author(s)

Carolin Walter

Examples

    data(liverData)
    file <- system.file("extdata", "vfl_aagctt_catg_mm9_54_vp.csv", package="Basic4Cseq")
    rawFragments(liverData) = readsToFragments(liverData, file)
    head(rawFragments(liverData))

Simulate the digestion of a genome

Description

This function simulates the digestion process with two restriction enzymes for a DNA sequence or a BSgenome package. The information can then be used for quality controls of the biological fragment library.

Usage

simulateDigestion(firstCutter, secondCutter, dnaSequence)

Arguments

`firstCutter`	First restriction enzyme sequence for the digestion process
`secondCutter`	Second restriction enzyme sequence for the digestion process
`dnaSequence`	DNA sequence that is digested

Details

The resulting virtual library of fragment parts does not provide information on blind or non-blind fragments, but provides information on the fragment length distribution of the real (i.e. biological) 4C-seq library. In contrast to the regular virtual fragment library for 4C-seq data, fragments between two adjacent secondary restriction enzyme sites are counted as well.
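
The idea of counting fragments between any two adjacent restriction sites can be illustrated with a base R toy. The function `digestToy` is hypothetical and simplified: it treats the first base of each recognition site as the cut position and ignores the sequence ends, so it is not expected to reproduce the exact output of `simulateDigestion`.

```r
# Hypothetical, simplified digestion (not the Basic4Cseq implementation)
digestToy <- function(firstCutter, secondCutter, dnaSequence) {
    dnaSequence <- toupper(dnaSequence)
    findSites <- function(cutter) {
        hits <- gregexpr(toupper(cutter), dnaSequence, fixed = TRUE)[[1]]
        hits[hits > 0]                      # gregexpr returns -1 if no match
    }
    # Pool the sites of both enzymes; adjacent sites delimit a fragment,
    # regardless of which enzyme produced them
    cutSites <- sort(unique(c(findSites(firstCutter), findSites(secondCutter))))
    fragmentLengths <- diff(cutSites)
    counts <- table(fragmentLengths)
    data.frame(length = as.integer(names(counts)),
               frequency = as.integer(counts))
}

shortTestGenome <- "ATCCATGTAGGCTAAGTACACATGTTAAGGTACAGTACAATTGCACGATCAT"
toyFragments <- digestToy("catg", "gtac", shortTestGenome)
print(toyFragments)
```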

Value

Data frame with lengths and corresponding frequencies of fragments

Note

The resulting fragment lengths and corresponding frequencies can easily be visualized with R's plot function or the small convenience function drawDigestionFragmentHistogram.
The resulting table of fragment frequencies can easily be exported with R's write.table function.

Author(s)

Carolin Walter

Examples

    shortTestGenome = "ATCCATGTAGGCTAAGTACACATGTTAAGGTACAGTACAATTGCACGATCAT"
    fragments = simulateDigestion("catg", "gtac", shortTestGenome)
    head(fragments)

Draw a near-cis coverage plot for 4C-seq data

Description

This method creates a plot of near-cis 4C-seq fragment data around the experiment's viewpoint. Fragment-based raw data is visualized as grey dots, interpolated data (running median / running mean) as coloured dots. Trend line and quantiles are loess-smoothed; the trend line is shown as a coloured line, whereas the quantiles are depicted as light-grey bands. A corresponding quantile legend is added in an extra plot.
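
The difference between the two interpolation types can be sketched with base R. The example uses `runmed` and `stats::filter` as stand-ins and invented counts; it is not taken from the package's code. It shows why a running median is less sensitive to single outlier fragments than a running mean.

```r
# Invented fragment counts with two outliers (300 and 400)
rawCounts <- c(10, 12, 300, 14, 16, 18, 20, 400, 22, 24, 26)
windowLength <- 5

# Running median over a window of 5 values: the outliers are suppressed
runningMedian <- runmed(rawCounts, k = windowLength)

# Centred running mean over the same window: the outliers leak into
# neighbouring values; the first and last two positions are NA
runningMean <- stats::filter(rawCounts,
                             rep(1 / windowLength, windowLength),
                             sides = 2)

print(data.frame(raw = rawCounts,
                 median = as.numeric(runningMedian),
                 mean = as.numeric(runningMean)))
```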

Usage

visualizeViewpoint(expData, poi = data.frame(chr = character(), start = character(), end = character(), name = character(), colour = character()), plotFileName = "", windowLength = 5, interpolationType = "median", picDim = c(9, 5), maxY = -1, minQuantile = 0.2, maxQuantile = 0.8, mainColour = "blue", plotTitle = "4C-seq plot", loessSpan = 0.1, xAxisIntervalLength = 50000, yAxisIntervalLength = 500, useFragEnds = TRUE)

Arguments

`expData`	Experiment data of class `Data4Cseq` with information on the 4C-seq experiment, including normalized near-cis fragment data for visualization
`poi`	Points of interest that will be marked in the plot
`plotFileName`	Name for the 4C-seq plot file
`windowLength`	Length of the window for running median / running mean that is used to smooth the trend line
`interpolationType`	Type of interpolation, either running median or running mean
`picDim`	Dimensions of the plot
`maxY`	Maximum y-value to plot. If no maximum is given, the maximum running median / mean value is used
`minQuantile`	Minimum quantile to draw
`maxQuantile`	Maximum quantile to draw
`mainColour`	Main colour of the plot
`plotTitle`	Title of the 4C-seq plot, depicted above the main plot
`loessSpan`	Span value for the loess curve; smaller values mean a tighter fit to the data points, but a value that is too small may produce errors
`xAxisIntervalLength`	Length of the x axis intervals in the plot
`yAxisIntervalLength`	Length of the y axis intervals in the plot
`useFragEnds`	Indicates whether fragment end data is used directly or interpolated on fragment level

Value

A near-cis coverage plot and a corresponding quantile legend

Note

Export to PDF and TIFF is supported. The export format is chosen according to the extension of the plot file name. If no plot file name is provided, the result is plotted on screen.
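
The selection rule described in this note could look roughly like the following base R sketch. The helper `chooseOutputFormat` is hypothetical, and the accepted spellings of the extension (`pdf`, `tif`, `tiff`, case-insensitive) are assumptions rather than documented behaviour.

```r
# Hypothetical helper (not part of Basic4Cseq): map a plot file name
# to an output format based on its extension.
chooseOutputFormat <- function(plotFileName) {
    if (!nzchar(plotFileName)) {
        return("screen")                    # no file name: plot on screen
    }
    extension <- tolower(tools::file_ext(plotFileName))
    switch(extension,
           pdf  = "pdf",
           tif  = ,                         # falls through to "tiff"
           tiff = "tiff",
           stop("Unsupported plot file extension: ", extension))
}

print(chooseOutputFormat(""))
print(chooseOutputFormat("nearCis.pdf"))
print(chooseOutputFormat("nearCis.tiff"))
```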

Author(s)

Carolin Walter

Examples

    data(liverData)
    file <- system.file("extdata", "fetalLiverVP.bed", package="Basic4Cseq")
    visualizeViewpoint(liverData, readPointsOfInterestFile(file), plotFileName = "", mainColour = "red", plotTitle = "Fetal Liver Near-Cis Plot", loessSpan = 0.1, maxY = 6000, xAxisIntervalLength = 50000, yAxisIntervalLength = 1000)
