Package 'Motif2Site' reference manual

Title:	Detect binding sites from motifs and ChIP-seq experiments, and compare binding sites across conditions
Description:	Detect binding sites using motifs IUPAC sequence or bed coordinates and ChIP-seq experiments in bed or bam format. Combine/compare binding sites across experiments, tissues, or conditions. All normalization and differential steps are done using TMM-GLM method. Signal decomposition is done by setting motifs as the centers of the mixture of normal distribution curves.
Authors:	Peyman Zarrineh [cre, aut]
Maintainer:	Peyman Zarrineh <[email protected]>
License:	GPL-2
Version:	1.11.0
Built:	2025-03-27 04:25:09 UTC
Source:	https://github.com/bioc/Motif2Site

Read a bed file as Genomic Ranges

Description

Read a bed file as Genomic Ranges.

Usage

Bed2Granges(fileName)
Bed2Granges(fileName)

Arguments

fileName

A table delimeted file in bed format

Value

granges format of given coordinates

Examples


yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
     package="Motif2Site")
ex <- Bed2Granges(yeastExampleFile)
ex

yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
     package="Motif2Site")
ex <- Bed2Granges(yeastExampleFile)
ex

Combine all IP and Input count table files

Description

Open raw counts IP and Inut files and with given total counts calculate Fold Enrichment values, and combine them into one file

Usage

combine2Table(outputName, replicateNumber, currentDir)
combine2Table(outputName, replicateNumber, currentDir)

Arguments

`outputName`	Name of the output table
`replicateNumber`	Number of the replicates
`currentDir`	Directory for I/O operations

Value

No return value

Combine motif bed files into a combined ranges

Description

Get motif file names and combine them into a matrix, and keep the indices of original motifs in the combined file.

If the motif type is string the bed files are deleted after being combined to one matrix.

Usage

combineMotifFiles(motifFileNames, motifType = "BioString")
combineMotifFiles(motifFileNames, motifType = "BioString")

Arguments

`motifFileNames`	a vector motif file names
`motifType`	Type of motif string or give bed

Value

No return value

Combine count Table and statistics table

Description

Combine count table and pvalue FE statistics into one file for motifs and regions seperately.

Usage

combineTestResults(
  motifFile,
  acceptedMotifsOutputFile,
  acceptedRegionsOutputFile,
  countTableFile,
  testTableFile,
  fdrCutoff,
  windowSize
)
combineTestResults(
  motifFile,
  acceptedMotifsOutputFile,
  acceptedRegionsOutputFile,
  countTableFile,
  testTableFile,
  fdrCutoff,
  windowSize
)

Arguments

`motifFile`	File contains motifs
`acceptedMotifsOutputFile`	File name of accepted motif table inforation
`acceptedRegionsOutputFile`	File name of accepteted region information
`countTableFile`	Table of count values file name
`testTableFile`	negative binomial test table file name
`fdrCutoff`	Pvalue cut-off related to the used FDR
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1

Value

The average binding intensity for each ChIP-seq

Compare a set of bed files to a user provided regions set

Description

This function gets user provided bedfiles and compare them with a user provided region.

It returns this comparison to given user binding regions in terms of precision/recall.

Usage

compareBedFiless2UserProvidedRegions(bedfiles, motifnames, givenRegion)
compareBedFiless2UserProvidedRegions(bedfiles, motifnames, givenRegion)

Arguments

`bedfiles`	a vector of bed files
`motifnames`	a vector of the names related to bed files
`givenRegion`	granges of user provided binding regions

Value

A dataframe which includes precision recall values for each bed file

Examples


yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                              package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
bed1 <- system.file("extdata", "YeastBedFile1.bed", package="Motif2Site")
bed2 <- system.file("extdata", "YeastBedFile2.bed", package="Motif2Site")
BedFilesVector <- c(bed1, bed2)
SequenceComparison <- compareBedFiless2UserProvidedRegions(
     givenRegion=YeastRegionsChIPseq,
     bedfiles=BedFilesVector,
     motifnames=c("YeastBed1", "YeastBed2")
     )
SequenceComparison

yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                              package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
bed1 <- system.file("extdata", "YeastBedFile1.bed", package="Motif2Site")
bed2 <- system.file("extdata", "YeastBedFile2.bed", package="Motif2Site")
BedFilesVector <- c(bed1, bed2)
SequenceComparison <- compareBedFiless2UserProvidedRegions(
     givenRegion=YeastRegionsChIPseq,
     bedfiles=BedFilesVector,
     motifnames=c("YeastBed1", "YeastBed2")
     )
SequenceComparison

Compare a set of bed files to a provided regions set

Description

Get combined ranges of bed files and compare them to given

binding regions in terms of precision/recall.

Usage

CompareBeds2GivenRegions(motifName, bindingRegions)
CompareBeds2GivenRegions(motifName, bindingRegions)

Arguments

`motifName`	a vector of motif names
`bindingRegions`	granges of provided binding regions

Value

A dataframe which includes precision recall values for each motif

Comparison motifs locations to a given regions set

Description

Comparison of motifs locations to user provided binding regions.

It returns this comparison to given user binding regions in terms of precision/recall.

Usage

CompareMotifs2GivenRegions(motifs, mismatchNumbers, bindingRegions)
CompareMotifs2GivenRegions(motifs, mismatchNumbers, bindingRegions)

Arguments

`motifs`	a vector of motif characters in nucleotide IUPAC format
`mismatchNumbers`	a vector Number of mismatches allowed to match with motifs
`bindingRegions`	granges of user provided binding regions

Value

A dataframe which includes precision recall values for each motif

Compare a set of motifs to a user provided regions set

Description

This function gets user provided motifs and related mismatch numbers, it detects motifs and compare them with a user provided region.

It returns this comparison to given user binding regions in terms of precision/recall.

The genome and build information should be provided and relevant BS genomes packages such as BSgenome.Mmusculus.UCSC.mm10 or BSgenome.Hsapiens.UCSC.hg38 must be installed for the used genome and builds.

Usage

compareMotifs2UserProvidedRegions(
  motifs,
  mismatchNumbers,
  genome,
  genomeBuild,
  DB = "UCSC",
  givenRegion,
  mainCHRs = TRUE
)
compareMotifs2UserProvidedRegions(
  motifs,
  mismatchNumbers,
  genome,
  genomeBuild,
  DB = "UCSC",
  givenRegion,
  mainCHRs = TRUE
)

Arguments

`motifs`	a vector of motif characters in nucleotide IUPAC format
`mismatchNumbers`	a vector Number of mismatches allowed to match with motifs
`genome`	The genome name such as "Hsapiens", "Mmusculus", "Dmelanogaster"
`genomeBuild`	The genome build such as "hg38", "hg19", "mm10", "dm3"
`DB`	The database of genome build. default: "UCSC"
`givenRegion`	granges of user provided binding regions
`mainCHRs`	If true only the major chromosome are considered, if FALSE Random, Uncharacterised, and Mithocondrial chromosomes are also considered

Value

A dataframe which includes precision recall values for each motif

Examples


# Artificial example in Yeast
# install BSgenome.Scerevisiae.UCSC.sacCer3 prior to run this code
 yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                                package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
SequenceComparison <- compareMotifs2UserProvidedRegions(
   givenRegion=YeastRegionsChIPseq,
   motifs=c("TGATTSCAGGANT", "TGATTCCAGGANT", "TGATWSCAGGANT"),
   mismatchNumbers=c(1,0,2),
   genome="Scerevisiae",
   genomeBuild="sacCer3"
   )
SequenceComparison


# Artificial example in Yeast
# install BSgenome.Scerevisiae.UCSC.sacCer3 prior to run this code
 yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                                package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
SequenceComparison <- compareMotifs2UserProvidedRegions(
   givenRegion=YeastRegionsChIPseq,
   motifs=c("TGATTSCAGGANT", "TGATTCCAGGANT", "TGATWSCAGGANT"),
   mismatchNumbers=c(1,0,2),
   genome="Scerevisiae",
   genomeBuild="sacCer3"
   )
SequenceComparison

compute fold enrichment values for an experiment

Description

Open raw counts IP and Inut files and with given total counts calculate Fold Enrichment values for the motifs

Usage

computeFoldEnrichment(
  ipCountFile,
  inputCountFile,
  ipTotalCount,
  inputTotalCount,
  outputName
)
computeFoldEnrichment(
  ipCountFile,
  inputCountFile,
  ipTotalCount,
  inputTotalCount,
  outputName
)

Arguments

`ipCountFile`	File contains motifs count values for IP experiment
`inputCountFile`	File contains motifs count values for Input experiment
`ipTotalCount`	Total short reads number in IP experiment
`inputTotalCount`	Total short reads number in Input experiment
`outputName`	Name of the output table

Value

No return value

Synthetic datasets used in the package

Description

Comparison Yeast synthetic motifs and binding sites: Two synthetic motif files in bed fromat are created to compare them with a synthetic binding site set in terms of precision and recall.

Fur binding sites detection in E. coli build NC_000913: Synthetic Fur ChIp-seq in E. coli was generated using real peaks published in Seo et al 2014. The ChIP-seq data are provided in bed format in fe and dpd condition and both contains two replicates. Synthetic Input ChIP-seq datasets were generated by randomly distributing short reads in E. coli genome. User provided candidate binding sites in bed format was generated by combining instances of "GWWTGANAA" motif with 1-mistmatch and "GWWTGAGAAT" with 2-mismatches in E. coli genome.

Format

Three bed files to compare user provided motifs and binding sites in Yeast. Seven bed files to compare Fur ChIP-seq binding sites in E.coli.

YeastBedFile1.bed: The first synthetic motif set
YeastBedFile2.bed: The second synthetic motif set
YeastSampleMotif.bed: The synthetic binding region
FurMotifs.bed: User provided Fur motif set in E. coli
FUR_fe1.bed: Synthetic Fur ChIP-seq short reads in fe condition rep1
FUR_fe2.bed: Synthetic Fur ChIP-seq short reads in fe condition rep2
FUR_dpd1.bed: Synthetic Fur ChIP-seq short reads in dpd condition rep1
FUR_dpd2.bed: Synthetic Fur ChIP-seq short reads in dpd condition rep2
Input1.bed: The synthetic background Input ChIP-seq rep1
Input2.bed: The synthetic background Input ChIP-seq rep2

Examples


## Data for examplex to compare user provided motifs and binding sites in Yeast 

yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                              package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
bed1 <- system.file("extdata", "YeastBedFile1.bed", package="Motif2Site")
bed2 <- system.file("extdata", "YeastBedFile2.bed", package="Motif2Site")


## Data for examples of binding site detection in E. coli

# FUR candidate motifs in NC_000913 E. coli
FurMotifs = system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq FUR fe datasets binding sites from user provided bed file 
# ChIP-seq datasets in bed single end format

IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
          system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))


# ChIP-seq FUR dpd datasets binding sites from user provided bed file 
# ChIP-seq datasets in bed single end format

IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
           system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))


# ChIP-seq background

Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))


## Data for examplex to compare user provided motifs and binding sites in Yeast 

yeastExampleFile=system.file("extdata", "YeastSampleMotif.bed",
                              package="Motif2Site")
YeastRegionsChIPseq <- Bed2Granges(yeastExampleFile)
bed1 <- system.file("extdata", "YeastBedFile1.bed", package="Motif2Site")
bed2 <- system.file("extdata", "YeastBedFile2.bed", package="Motif2Site")


## Data for examples of binding site detection in E. coli

# FUR candidate motifs in NC_000913 E. coli
FurMotifs = system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq FUR fe datasets binding sites from user provided bed file 
# ChIP-seq datasets in bed single end format

IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
          system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))


# ChIP-seq FUR dpd datasets binding sites from user provided bed file 
# ChIP-seq datasets in bed single end format

IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
           system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))


# ChIP-seq background

Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))

Decompose binding signal among accepted motifs

Description

Gets motif locations and related short reads and select the motifs which are non-skewed: abs(skewness) < 0.3 and more short reads binds closer to site, and show strong binding after decomposition.

Decomposition is performed by using mixtools normalmixEM command fixing mu as motif locations.

Usage

decomposeBindingSignal(
  windowSize,
  replicateNumber,
  acceptedRegionsOutputFile = "BindingRegions",
  acceptedMotifsOutputFile = "BindingMotifsTable",
  currentDir
)
decomposeBindingSignal(
  windowSize,
  replicateNumber,
  acceptedRegionsOutputFile = "BindingRegions",
  acceptedMotifsOutputFile = "BindingMotifsTable",
  currentDir
)

Arguments

`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`replicateNumber`	experiment replicate number
`acceptedRegionsOutputFile`	File name contains binding regions coordinates and related motifs
`acceptedMotifsOutputFile`	File name contains motifs coordinates and related information, Pvalue, FE, etc
`currentDir`	Directory for I/O operations

Value

motifStatistics Ratio of accepted motifs, rejected motifs due to skewnewss, and rejected motifs after decomposition

Delete a vector of files

Description

Delete multiple give files as a vector of characters

Usage

DeleteMultipleFiles(files)
DeleteMultipleFiles(files)

Arguments

files

a vector of files

Value

No return value

build heurisitc distribution around the binding sites

Description

This function generates heuristic distribution of short reads around binding sites which do not need to deconvolve, total numer of short reads and window size as number of neucleotid around binding sites.

It fits a kernel to the distribution and return the distribution as output. The total sum of returned values is equal to one. It plots this kernel.

Also it calculates FRiPs (Fraction of Reads in Peaks) for each

ChIP-seq and returns it. FRiPs and kernel distributions are measures of goodness of ChIP-seq experiments and selected motifs.

Usage

deriveHeuristicBindingDistribution(
  chipSeq,
  averageBindings,
  windowSize,
  acceptedRegionsOutputFile = "BindingRegions",
  currentDir
)
deriveHeuristicBindingDistribution(
  chipSeq,
  averageBindings,
  windowSize,
  acceptedRegionsOutputFile = "BindingRegions",
  currentDir
)

Arguments

`chipSeq`	ChIP-seq aligned 1nt short reads
`averageBindings`	expected short reads number aligned to a random location of genes of given size
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`acceptedRegionsOutputFile`	Accepted binding regions
`currentDir`	Directory for I/O operations

Value

FRiPs Fraction of Reads in Peaks

Detect binding sites from motif

Description

DETECT Binding sites with given motif and mismatch number as well genome/build, False Discovery Rate for a given experiment name.

This function is called by both DetectBindingSitesBed and DetectBindingSitesMotif with different input.

Usage

DetectBindingSites(
  From,
  BedFile,
  motif,
  mismatchNumber,
  chipSeq,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  windowSize = 100,
  GivenRegion = NA,
  currentDir
)
DetectBindingSites(
  From,
  BedFile,
  motif,
  mismatchNumber,
  chipSeq,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  windowSize = 100,
  GivenRegion = NA,
  currentDir
)

Arguments

`From`	Type of motif dataset either "Motif" or "Bed"
`BedFile`	Motif locations in bed format file
`motif`	motif characters in nucleotide IUPAC format
`mismatchNumber`	Number of mismatches allowed to match with motifs
`chipSeq`	ChIP-seq alignment both IP and background in 1nt bed format files
`genome`	The genome name such as "Hsapiens", "Mmusculus", "Dmelanogaster"
`genomeBuild`	The genome build such as "hg38", "hg19", "mm10", "dm3"
`DB`	The database of genome build. default: "UCSC"
`fdrValue`	FDR value cut-off
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`GivenRegion`	granges of user provided binding regions
`currentDir`	Directory for I/O operations

Value

A list of FRiPs, sequence statistics, and Motif statistics

Detect binding sites from bed motif input

Description

Takes user provied bed regions, and check for validity of them. Read bam or bed alignment files and convert to 1 nt bed and call detect binding site from 1nt bed.

Usage

DetectBindingSitesBed(
  BedFile,
  IPfiles,
  BackgroundFiles,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  expName = "Motif_Centric_Peaks",
  windowSize = 100,
  format = ""
)
DetectBindingSitesBed(
  BedFile,
  IPfiles,
  BackgroundFiles,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  expName = "Motif_Centric_Peaks",
  windowSize = 100,
  format = ""
)

Arguments

`BedFile`	Motif locations in bed format file
`IPfiles`	IP ChIP-seq alignment files
`BackgroundFiles`	Background ChIP-seq alignment files. Can be Input experimetn, DNA whole exctract, etc.
`genome`	The genome name such as "Hsapiens", "Mmusculus", "Dmelanogaster"
`genomeBuild`	The genome build such as "hg38", "hg19", "mm10", "dm3"
`DB`	The database of genome build. default: "UCSC"
`fdrValue`	FDR value cut-off
`expName`	The name of the output table
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`format`	alignment format and should be one of these: "BAMPE", "BAMSE", "BEDPE", "BEDSE"

Value

peakCallingStatistics A list FRiPs, sequence statistics, and Motif statistics

Examples


# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPFe, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Fe_BedInput",
                        format="BEDSE"
                       )


# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPFe, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Fe_BedInput",
                        format="BEDSE"
                       )

Detect binding sites from sequence motif sequence and mismatchNumber

Description

DETECT Binding sites with given motif and mismatch number as well genome/build, False Discovery Rate for a given experiment name. Read bam or bed alignment files and convert to 1 nt bed and detect binding site among motifs from 1nt bed alignment.

Usage

DetectBindingSitesMotif(
  motif,
  mismatchNumber,
  IPfiles,
  BackgroundFiles,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  expName = "Motif_Centric_Peaks",
  windowSize = 100,
  format = "",
  GivenRegion = NA
)
DetectBindingSitesMotif(
  motif,
  mismatchNumber,
  IPfiles,
  BackgroundFiles,
  genome,
  genomeBuild,
  DB = "UCSC",
  fdrValue = 0.05,
  expName = "Motif_Centric_Peaks",
  windowSize = 100,
  format = "",
  GivenRegion = NA
)

Arguments

`motif`	motif characters in nucleotide IUPAC format
`mismatchNumber`	Number of mismatches allowed to match with motifs
`IPfiles`	IP ChIP-seq alignment files
`BackgroundFiles`	Background ChIP-seq alignment files. Can be Input experimetn, DNA whole exctract, etc.
`genome`	The genome name such as "Hsapiens", "Mmusculus", "Dmelanogaster"
`genomeBuild`	The genome build such as "hg38", "hg19", "mm10", "dm3"
`DB`	The database of genome build. default: "UCSC"
`fdrValue`	FDR value cut-off
`expName`	The name of the output table
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`format`	alignment format and should be one of these: "BAMPE", "BAMSE", "BEDPE", "BEDSE"
`GivenRegion`	granges of user provided binding regions

Value

A list FRiPs, sequence statistics, and Motif statistics

Examples


# ChIP-seq datasets in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))

 # Granages region for motif search           
   NC_000913_Coordiante <-
     GenomicRanges::GRanges(seqnames=S4Vectors::Rle("NC_000913"),
                            ranges=IRanges::IRanges(1, 4639675))           
            
FURfeStringInputStats <- 
  DetectBindingSitesMotif(motif="GWWTGAGAA",
   mismatchNumber=1,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_StringInput",
   format="BEDSE",
   GivenRegion=NC_000913_Coordiante 
   )

                                      
# ChIP-seq datasets in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))

 # Granages region for motif search           
   NC_000913_Coordiante <-
     GenomicRanges::GRanges(seqnames=S4Vectors::Rle("NC_000913"),
                            ranges=IRanges::IRanges(1, 4639675))           
            
FURfeStringInputStats <- 
  DetectBindingSitesMotif(motif="GWWTGAGAA",
   mismatchNumber=1,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_StringInput",
   format="BEDSE",
   GivenRegion=NC_000913_Coordiante 
   )

FDR cut-off detection Benjamini Hochberg method

Description

Return FDR cut-off for a user provided fdrvalue using Benjamini Hochberg on main motif test data

Usage

DetectFdrCutoffBH(TestTableFile = "TestResults", fdrValue = 0.05)
DetectFdrCutoffBH(TestTableFile = "TestResults", fdrValue = 0.05)

Arguments

`TestTableFile`	test table which contains pvalues
`fdrValue`	FDR cut-off

Value

pvalue cut-off

Find motif instances with a certain mismatch number

Description

Find motif instances in a given genome. It gets motif strings and related allowed mismatchnumbers and returns genomewide motif instances.

Usage

findMotifs(
  motif,
  mismatchNumber,
  genome,
  genomeBuild,
  DB = "UCSC",
  mainCHRs = TRUE,
  firstCHR = FALSE,
  MotifLocationName = "Motif_Locations",
  limitedRegion = NA
)
findMotifs(
  motif,
  mismatchNumber,
  genome,
  genomeBuild,
  DB = "UCSC",
  mainCHRs = TRUE,
  firstCHR = FALSE,
  MotifLocationName = "Motif_Locations",
  limitedRegion = NA
)

Arguments

`motif`	motif characters in nucleotide IUPAC format
`mismatchNumber`	Number of mismatch allowed to match with motif
`genome`	The genome name such as "Hsapiens", "Mmusculus", "Dmelanogaster"
`genomeBuild`	The genome build such as "hg38", "hg19", "mm10", "dm3"
`DB`	The database of genome build. default: "UCSC"
`mainCHRs`	If true only the major chromosome are considered, if FALSE Random, Uncharacterised, and Mithocondrial chromosomes are also considered
`firstCHR`	If true only Chr1 is used to find motifs. Default is FALSE
`MotifLocationName`	The name of the file of the motif locations
`limitedRegion`	If specified the motifs are detected in the provided granges

Value

No return value

Fit a kernel density distribution to the obersever heuristic distribution

Description

This function gets heuristic distribution of short reads around binding sites, total numer of short reads and window size as number of neucleotid around binding sites.

It fits a kernel to the distribution and return the distribution as output. The total sum of returned values is equal to one.

Usage

fitKernelDensity(heuristicDistribution, totalShortReads, windowSize)
fitKernelDensity(heuristicDistribution, totalShortReads, windowSize)

Arguments

`heuristicDistribution`	Original short distribution
`totalShortReads`	Total number of short reads
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1

Value

kernel returns fitted kernel distribution of short reads around binding sites

Convert bam and bed files to 1 nucleotide bed

Description

Take alignment files in bam or bed fomat and convert them to 1 nucleotide bed file

Usage

generate1ntBedAlignment(InputFile, bedFile, format = "")
generate1ntBedAlignment(InputFile, bedFile, format = "")

Arguments

`InputFile`	Original alignment file name
`bedFile`	Name of output 1nt bed file
`format`	alignment format and should be one of these: "BAMPE", "BAMSE", "BEDPE", "BEDSE"

Value

No return value

Detect and Recenter binding sites from ChIP-seq experiments

Description

Take ChIP-seq and motifs and detect Binding sites. It also combines/compares binding sites across experiments. Here is a synthetic example of differential Fur binding sites in E.coli:

Examples


# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPFe, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Fe_BedInput",
                        format="BEDSE"
                       )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPDpd, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Dpd_BedInput",
                        format="BEDSE"
                       )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
  expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
  experimentNames=c("FUR_Fe","FUR_Dpd"),
  expName="combinedFUR",
  )
corMAT

# Differential binding sites across FUR conditions fe vs dpd
diffFUR <- pairwisDifferential(tableOfCountsDir="combinedFUR",
                               exp1="FUR_Fe",
                               exp2="FUR_Dpd",
                               FDRcutoff=0.05,
                               logFCcuttoff=1
                               )

FeUp <- diffFUR[[1]]
DpdUp <- diffFUR[[2]]
TotalComparison <- diffFUR[[3]]
head(TotalComparison)

# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPFe, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Fe_BedInput",
                        format="BEDSE"
                       )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
                        IPfiles=IPDpd, 
                        BackgroundFiles=Inputs, 
                        genome="Ecoli",
                        genomeBuild="20080805",
                        DB="NCBI",
                        expName="FUR_Dpd_BedInput",
                        format="BEDSE"
                       )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
  expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
  experimentNames=c("FUR_Fe","FUR_Dpd"),
  expName="combinedFUR",
  )
corMAT

# Differential binding sites across FUR conditions fe vs dpd
diffFUR <- pairwisDifferential(tableOfCountsDir="combinedFUR",
                               exp1="FUR_Fe",
                               exp2="FUR_Dpd",
                               FDRcutoff=0.05,
                               logFCcuttoff=1
                               )

FeUp <- diffFUR[[1]]
DpdUp <- diffFUR[[2]]
TotalComparison <- diffFUR[[3]]
head(TotalComparison)

Model IP and Input count values with negative Binomal

Description

Using edgeR TMM normalization and estimating dispersion as well as Adapting exact test function from edgeR to model IP vs Input counts.

To make this function memory effcient motifs into smaller sets and compute them seperately and combine them at the end.

Usage

motifBindingNegativeBinomialCount(
  countTableFile,
  replicateNumber,
  outputFile,
  currentDir
)
motifBindingNegativeBinomialCount(
  countTableFile,
  replicateNumber,
  outputFile,
  currentDir
)

Arguments

`countTableFile`	Table of counts which contains all IP and Input value raw counts
`replicateNumber`	experiment replicate number
`outputFile`	The name of the output file generated by this function
`currentDir`	Directory for I/O operations

Value

A dataframe includes fold enrichment, pvalue, and normalized count values

count short reads related to each motif for a given ChIPseq file

Description

count 1nt short reads related to each motif for a given ChIPseq file.

Usage

motifChipCount(motifFile, chipFile, windowSize, outputName)
motifChipCount(motifFile, chipFile, windowSize, outputName)

Arguments

`motifFile`	File contains motifs
`chipFile`	ChIP-seq 1nt alignment locations in bed format
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`outputName`	Name of the output table

Value

Total number of short reads in motif reagions

count short reads around motifs for all ChIP-seq experiments

Description

count short reads related to each motif for all ChIPseq files both IP and Input.

Usage

motifCount(motifFile, chipSeq, windowSize, outputName, currentDir)
motifCount(motifFile, chipSeq, windowSize, outputName, currentDir)

Arguments

`motifFile`	File contains motifs
`chipSeq`	dataframe of ChIP-seq 1nt alignment location
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1
`outputName`	Name of the output table
`currentDir`	Directory for I/O operations

Value

No return value

Process count data and perform negative binomial test

Description

Remove unmmaped regions, low and high binding regions and regions without fold change, and call negative binomial or nb test for the remaining regions.

Usage

motifTablePreProcess(countTableFile, outFile, currentDir)
motifTablePreProcess(countTableFile, outFile, currentDir)

Arguments

`countTableFile`	Tabl of count values around motifs for all ChIP-seq experiments
`outFile`	The name of the output file
`currentDir`	Directory for I/O operations

Value

sequencingStatitics A dataframe consists of the ratio of non-sequenced, low-sequenced, ang high-sequenced regions.

Negative binomial test of binding using all replicates

Description

Adapted exact test function from edgeR to compare IP vs Input with replicates. Input is a DGELIST with common and tag-wise dispression has been already caluclated by edgeR commands.

It calculates abundaces with mglmOneGroup identical to edgeR. logFE was calculated identiacl to edgeR. For the pvalue test negative binomial test is performed on the calculated abundance.

Usage

NegativeBinomialTestWithReplicate(object, prior.count = 0.125)
NegativeBinomialTestWithReplicate(object, prior.count = 0.125)

Arguments

`object`	Table of counts which contains all IP and Input value counts, TMM normalized and contains dispersion values
`prior.count`	edgeR prior value

Value

log fold enrichment, pvalue, and normalized count values

Detect differential motifs

Description

Take combined matrix of motif counts generated by recenterBindingSitesAcrossExperiments, and experiment names. It detect differential motifs using edgeR TMM nomralizaiton with Generalized linear model

Usage

pairwisDifferential(
  tableOfCountsDir = "",
  exp1,
  exp2,
  FDRcutoff = 0.05,
  logFCcuttoff = 1
)
pairwisDifferential(
  tableOfCountsDir = "",
  exp1,
  exp2,
  FDRcutoff = 0.05,
  logFCcuttoff = 1
)

Arguments

`tableOfCountsDir`	Directory which conatins the combined motifs and ChIP-seq count file
`exp1`	Experiment name which will be compared in pairwise comparison
`exp2`	Experiment name which will be compared in pairwise comparison
`FDRcutoff`	FDR cutoff applies on pvalue distribution
`logFCcuttoff`	log fold change cutoff

Value

A list of differential motifs, motif1 and motif2 as well as a table of total motifs and log fold changes

Examples


# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_BedInput",
   format="BEDSE"
   )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPDpd, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Dpd_BedInput",
   format="BEDSE"
   )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
  expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
  experimentNames=c("FUR_Fe","FUR_Dpd"),
  expName="combinedFUR",
  )

# Differential binding sites across FUR conditions fe vs dpd
diffFUR <- pairwisDifferential(tableOfCountsDir="combinedFUR",
   exp1="FUR_Fe",
   exp2="FUR_Dpd",
   FDRcutoff=0.05,
   logFCcuttoff=1
   )

FeUp <- diffFUR[[1]]
DpdUp <- diffFUR[[2]]
TotalComparison <- diffFUR[[3]]
head(TotalComparison)

# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_BedInput",
   format="BEDSE"
   )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPDpd, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Dpd_BedInput",
   format="BEDSE"
   )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
  expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
  experimentNames=c("FUR_Fe","FUR_Dpd"),
  expName="combinedFUR",
  )

# Differential binding sites across FUR conditions fe vs dpd
diffFUR <- pairwisDifferential(tableOfCountsDir="combinedFUR",
   exp1="FUR_Fe",
   exp2="FUR_Dpd",
   FDRcutoff=0.05,
   logFCcuttoff=1
   )

FeUp <- diffFUR[[1]]
DpdUp <- diffFUR[[2]]
TotalComparison <- diffFUR[[3]]
head(TotalComparison)

Suppress messages generated by in external package

Description

mixtools and MASS::fitdistr generates warning by cat which is suppressed by this funcitons

Usage

quiet(func)
quiet(func)

Arguments

func

functional input call for which cat messages should be supressed

Value

No return value

Combine binding sites across experiments

Description

Take experiment folder locations and experiment names and combine them into a combined matrix of motifs and ChIP-seq counts

Experiment folders must be generated either by DetectBindingSitesBed or DetectBindingSitesMotif.

Usage

recenterBindingSitesAcrossExperiments(
  expLocations,
  experimentNames,
  expName = "combinedData",
  fdrValue = 0.05,
  fdrCrossExp = 0.001
)
recenterBindingSitesAcrossExperiments(
  expLocations,
  experimentNames,
  expName = "combinedData",
  fdrValue = 0.05,
  fdrCrossExp = 0.001
)

Arguments

`expLocations`	The path to the experiment folders
`experimentNames`	Name of the experiment to be used in combined ChIP-seq
`expName`	Name of the combined matrix
`fdrValue`	FDR cut-off to accept binding in each ChIP-seq experiments
`fdrCrossExp`	If no experiment fullfill this cutoff, the motif is not considered

Value

A pariwise Pearson correlation matrix across experiments

Examples


# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_BedInput",
   format="BEDSE"
   )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPDpd, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Dpd_BedInput",
   format="BEDSE"
   )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
    expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
    experimentNames=c("FUR_Fe","FUR_Dpd"),
    expName="combinedFUR",
    )
corMAT

# FUR candidate motifs in NC_000913 E. coli
FurMotifs=system.file("extdata", "FurMotifs.bed", package="Motif2Site")

# ChIP-seq datasets fe in bed single end format
IPFe <- c(system.file("extdata", "FUR_fe1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_fe2.bed", package="Motif2Site"))
Inputs <- c(system.file("extdata", "Input1.bed", package="Motif2Site"),
            system.file("extdata", "Input2.bed", package="Motif2Site"))
FURfeBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPFe, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Fe_BedInput",
   format="BEDSE"
   )

# ChIP-seq datasets dpd in bed single end format
IPDpd <- c(system.file("extdata", "FUR_dpd1.bed", package="Motif2Site"),
        system.file("extdata", "FUR_dpd2.bed", package="Motif2Site"))
FURdpdBedInputStats <- 
  DetectBindingSitesBed(BedFile=FurMotifs,
   IPfiles=IPDpd, 
   BackgroundFiles=Inputs, 
   genome="Ecoli",
   genomeBuild="20080805",
   DB="NCBI",
   expName="FUR_Dpd_BedInput",
   format="BEDSE"
   )
                       

# Combine all FUR binding sites into one table
corMAT <- recenterBindingSitesAcrossExperiments(
    expLocations=c("FUR_Fe_BedInput","FUR_Dpd_BedInput"),
    experimentNames=c("FUR_Fe","FUR_Dpd"),
    expName="combinedFUR",
    )
corMAT

Remove non-bell shpape motifs prior to binding signal decomposition

Description

Gets motif locations and related short reads and returns the motifs which are non-skewed abs(skewness) < 0.3 and more short reads binds closer to site.

It counts around motif with interval windowSize and windowSize/2, if the smaller window is less than half of the larger one then motif is not considered as Bell-shape

Usage

removeNonBellShapedMotifs(motifLocations, readLocations, windowSize)
removeNonBellShapedMotifs(motifLocations, readLocations, windowSize)

Arguments

`motifLocations`	A vector of motif locations
`readLocations`	A vector of 1nt short reads
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1

Value

The coordinates of accepted motifs

Returns the motif with the highest count

Description

Gets motif locations and related short reads and returns the motif which include the highest number of short reads around it.

Usage

strongestMotif(motifLocations, readLocations, windowSize)
strongestMotif(motifLocations, readLocations, windowSize)

Arguments

`motifLocations`	A vector of motif locations
`readLocations`	A vector of 1nt short reads
`windowSize`	Window size around binding site. The total region would be 2*windowSize+1

Value

The strongest motif

Package 'Motif2Site'

Help Index

Read a bed file as Genomic Ranges

Description

Usage

Arguments

Value

Examples

Combine all IP and Input count table files

Description

Usage

Arguments

Value

Combine motif bed files into a combined ranges

Description

Usage

Arguments

Value

Combine count Table and statistics table

Description

Usage

Arguments

Value

Compare a set of bed files to a user provided regions set

Description

Usage

Arguments

Value

See Also

Examples

Compare a set of bed files to a provided regions set

Description

Usage

Arguments

Value

Comparison motifs locations to a given regions set

Description

Usage

Arguments

Value

Compare a set of motifs to a user provided regions set

Description

Usage

Arguments

Value

See Also

Examples

compute fold enrichment values for an experiment

Description

Usage

Arguments

Value

Synthetic datasets used in the package

Description

Format

Examples

Decompose binding signal among accepted motifs

Description

Usage

Arguments

Value

Delete a vector of files

Description

Usage

Arguments

Value

build heurisitc distribution around the binding sites

Description

Usage

Arguments

Value

Detect binding sites from motif

Description

Usage

Arguments

Value

Detect binding sites from bed motif input

Description

Usage

Arguments