Title: | An R package for nucleosome positioning prediction |
---|---|
Description: | NuPoP is an R package for Nucleosome Positioning Prediction. This package is built upon a duration hidden Markov model proposed in Xi et al., 2010 and Wang et al., 2008. The core of the package was written in Fortran. In addition to the R package, a stand-alone Fortran software tool is also available at https://github.com/jipingw. The Fortran code has the same functionality as the R package. Note: NuPoP has two separate functions for prediction of nucleosome positioning, one for MNase-map trained models and the other for chemical map-trained models. The latter was implemented for four species (yeast, S. pombe, mouse and human), trained based on our recent publications. There is another package, nuCpos, by another group for prediction of nucleosome positioning trained with chemical maps. A report comparing recent versions of NuPoP with nuCpos can be found at https://github.com/jiping/NuPoP_doc. More information can be found, and will be posted, at https://github.com/jipingw/NuPoP. |
Authors: | Ji-Ping Wang <[email protected]>; Liqun Xi <[email protected]>; Oscar Zarate <[email protected]> |
Maintainer: | Ji-Ping Wang<[email protected]> |
License: | GPL-2 |
Version: | 2.15.0 |
Built: | 2024-12-18 03:07:46 UTC |
Source: | https://github.com/bioc/NuPoP |
NuPoP is an R package for Nucleosome Positioning Prediction. This package is built upon a duration hidden Markov model proposed in Xi et al. 2010 and Wang et al. 2008. The core of the package was written in Fortran. NuPoP has prediction functions trained on both MNase maps and chemical maps of nucleosome positioning data, all generated from Ji-Ping Wang's lab. Detailed information about this package can be found at Ji-Ping Wang's GitHub: https://github.com/jipingw/NuPoP, and its performance relative to another derived package, nuCpos, can be found at https://github.com/jipingw/NuPoP_doc. NuPoP also provides additional functions for visualization of nucleosome predictions.
Four functions, predNuPoP, predNuPoP_chem, readNuPoP and plotNuPoP, are provided for nucleosome positioning prediction based on MNase maps, prediction based on chemical maps, reading in prediction results, and visualizing prediction results, respectively.
The input DNA sequence can be of any length.
Package: | NuPoP |
Type: | Package |
Version: | 1.0 |
Date: | 2010-06-24 |
License: | GPL-2 |
predNuPoP: R function invoking Fortran codes to predict nucleosome positioning, nucleosome occupancy and binding affinity.
readNuPoP: R function to read in the prediction results by predNuPoP.
plotNuPoP: R function to visualize predictions.
predNuPoP_chem: R function invoking Fortran codes to predict nucleosome positioning, nucleosome occupancy and binding affinity using profiles trained based on chemically mapped nucleosomes in yeast, S. pombe, mouse and human.
Ji-Ping Wang, Liqun Xi
Maintainer: Ji-Ping Wang<[email protected]>
Xi, L., Fondufe-Mittendorf, Y., Xia, L., Flatow, J., Widom, J. and Wang, J.-P. (2010), Predicting nucleosome positioning using a duration Hidden Markov Model, BMC Bioinformatics, doi:10.1186/1471-2105-11-346
Wang, J.-P., Fondufe-Mittendorf, Y., Xi, L., Tsai, G., Segal, E. and Widom, J. (2008), Preferentially quantized linker DNA lengths in Saccharomyces cerevisiae, PLoS Computational Biology, 4(9) e1000175
## Not run:
predNuPoP(system.file("extdata", "test.seq", package="NuPoP"),species=7,model=4)
## the prediction results are stored in the current working directory
## the user should replace "system.file("extdata","test.seq_Prediction4.txt",package="NuPoP")"
## by the actual path and file name generated from prediction.
temp=readNuPoP(system.file("extdata","test.seq_Prediction4.txt",package="NuPoP"),startPos=1,endPos=5000)
plotNuPoP(temp)
## End(Not run)
This function produces two plots for a specified region based on the prediction results from function predNuPoP. The first plot is the nucleosome occupancy (grey color). In the second plot, in addition to the occupancy, the Viterbi prediction (red rectangle) and the posterior probability for a position to be the start of a nucleosome (blue color) are superimposed.
plotNuPoP(predNuPoPResults)
predNuPoPResults |
NuPoP prediction results from predNuPoP function. It must be a data frame read in by readNuPoP function. |
plotNuPoP outputs two plots: the nucleosome occupancy score map, and the Viterbi optimal nucleosome positioning map together with the posterior probability for a position to be the start of a nucleosome.
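The two-panel layout described above can be approximated with base R graphics. This is only a rough sketch on made-up data, not the code plotNuPoP actually runs: the helper name draw_two_panels, the toy vectors and the choice of vertical bars for the occupancy and start probability are all assumptions for illustration.

```r
# Toy per-position values (made up for illustration, not NuPoP output).
pos <- 1:300
nl <- as.integer((pos >= 40 & pos <= 186) | (pos >= 210 & pos <= 300))
occup <- 0.1 + 0.8 * nl
p_start <- numeric(length(pos))
p_start[c(40, 210)] <- c(0.9, 0.7)

# Hypothetical helper (not part of NuPoP): draw occupancy alone, then occupancy
# with Viterbi calls (red rectangles) and start probability (blue) on top.
draw_two_panels <- function(pos, occup, nl, p_start) {
  # Collapse runs of 1s in the nucleosome/linker vector into intervals.
  runs <- rle(nl)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  keep <- runs$values == 1

  old_par <- par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
  on.exit(par(old_par))

  # Panel 1: occupancy only, in grey.
  plot(pos, occup, type = "h", col = "grey", ylim = c(0, 1),
       xlab = "Position", ylab = "Occupancy")

  # Panel 2: occupancy plus Viterbi calls and start probability.
  plot(pos, occup, type = "h", col = "grey", ylim = c(0, 1),
       xlab = "Position", ylab = "Occupancy / probability")
  if (any(keep)) {
    rect(pos[run_start[keep]], 0, pos[run_end[keep]], 1, border = "red")
  }
  lines(pos, p_start, type = "h", col = "blue")

  invisible(sum(keep))  # number of nucleosome intervals drawn
}

pdf(NULL)  # null device, so the sketch also runs without a display
n_calls <- draw_two_panels(pos, occup, nl, p_start)
invisible(dev.off())
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so the panels appear on the default device.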
## the prediction results are stored in the current working directory
## the user should replace "system.file("extdata","test.seq_Prediction4.txt",package="NuPoP")"
## by the actual path and file name generated from prediction.
temp=readNuPoP(system.file("extdata","test.seq_Prediction4.txt",package="NuPoP"),startPos=1,endPos=5000)
plotNuPoP(temp)
This function invokes Fortran code to compute the Viterbi prediction of nucleosome positioning, the nucleosome occupancy score and the nucleosome binding affinity score. A pre-trained linker DNA length distribution for the current species is used in a duration hidden Markov model.
predNuPoP(file,species=7,model=4)
file |
a string giving the path and name of a DNA sequence file in FASTA format. This sequence file can be located in any directory. It must contain only one sequence, which can be of any length. By FASTA format, we require each line to be of the same length (the last line can be shorter; the first line should be '>sequenceName'). Each line should be no longer than 10 million bp. |
species |
an integer from 0 to 11 as the label for a species, indexed as follows: 1 = Human; 2 = Mouse; 3 = Rat; 4 = Zebrafish; 5 = D. melanogaster; 6 = C. elegans; 7 = S. cerevisiae; 8 = C. albicans; 9 = S. pombe; 10 = A. thaliana; 11 = Maize; 0 = Other. The default is 7 = S. cerevisiae. If |
model |
an integer, 4 or 1. NuPoP has two models integrated. One is the first-order Markov chain for both nucleosome and linker DNA states. The other is 4th order (default). The latter distinguishes nucleosome/linker in up to 5-mer usage, and thus is slightly more effective in prediction, but runs slower: the 4th order model takes about 2.5 times as long as the 1st order model. |
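The FASTA layout required by the file argument (a '>sequenceName' header followed by sequence lines of equal width, with only the last line allowed to be shorter) can be produced with base R. The following is a minimal sketch: write_fixed_width_fasta, the 60-character width and the random toy sequence are assumptions for illustration and are not part of NuPoP.

```r
# Hypothetical helper (not part of NuPoP): write one sequence as FASTA so that
# every sequence line has the same width, except possibly the last one.
write_fixed_width_fasta <- function(sequence, name, path, width = 60L) {
  stopifnot(is.character(sequence), length(sequence) == 1L,
            nchar(sequence) > 0L, width >= 1L)
  n <- nchar(sequence)
  starts <- seq(1L, n, by = width)
  ends <- pmin(starts + width - 1L, n)
  # substring() is vectorised over the start and end positions.
  seq_lines <- substring(sequence, starts, ends)
  writeLines(c(paste0(">", name), seq_lines), path)
  invisible(path)
}

# Toy 150-bp sequence, for illustration only.
set.seed(1)
toy_seq <- paste(sample(c("A", "C", "G", "T"), 150, replace = TRUE),
                 collapse = "")
fasta_path <- file.path(tempdir(), "toy.seq")
write_fixed_width_fasta(toy_seq, "toySequence", fasta_path, width = 60L)

# Read the file back to inspect the layout: one header, then 60/60/30 bp lines.
fasta_lines <- readLines(fasta_path)
```

A file written this way could then be passed as the file argument, for example predNuPoP(fasta_path, species=7, model=4), provided the package is installed.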
predNuPoP outputs the prediction results into the current working directory. The output file is named after the input file with an added extension _Prediction1.txt or _Prediction4.txt, where 1 or 4 stands for the order of the Markov chain model specified. The output file has five columns, Position, P-start, Occup, N/L and Affinity:
Position |
position in the input DNA sequence |
P-start |
probability that the current position is the start of a nucleosome |
Occup |
nucleosome occupancy score |
N/L |
nucleosome (1) or linker (0) for each position based on Viterbi prediction |
Affinity |
nucleosome binding affinity score |
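Because the N/L column marks every position as nucleosome (1) or linker (0), runs of consecutive 1s can be collapsed into start and end coordinates. The sketch below does this with base R on a made-up table; nucleosome_intervals is a hypothetical helper, and the column names follow the list above, which may differ from the names of the data frame returned by readNuPoP.

```r
# Toy prediction table with the five documented columns (values are made up).
pred <- data.frame(
  Position  = 1:12,
  `P-start` = c(0, 0, 0.9, 0, 0, 0, 0, 0, 0.8, 0, 0, 0),
  Occup     = c(0.1, 0.2, 0.9, 0.9, 0.9, 0.3, 0.1, 0.2, 0.8, 0.8, 0.8, 0.8),
  `N/L`     = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1),
  Affinity  = rep(0, 12),
  check.names = FALSE  # keep "P-start" and "N/L" as column names
)

# Hypothetical helper (not part of NuPoP): collapse runs of 1s in the
# nucleosome/linker vector into one row per predicted nucleosome.
nucleosome_intervals <- function(position, state) {
  runs <- rle(state)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  keep <- runs$values == 1
  data.frame(start = position[run_start[keep]],
             end   = position[run_end[keep]])
}

calls <- nucleosome_intervals(pred$Position, pred[["N/L"]])
# calls has two rows: positions 3 to 5 and positions 9 to 12.
```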
predNuPoP(system.file("extdata", "test.seq", package="NuPoP"),species=7,model=4)
This function invokes Fortran code to compute the Viterbi prediction of nucleosome positioning, the nucleosome occupancy score and the nucleosome binding affinity score. A pre-trained linker DNA length distribution for the current species is used in a duration hidden Markov model. The nucleosome profile is trained based on chemical maps.
predNuPoP_chem(file,species=7,model=4)
file |
a string giving the path and name of a DNA sequence file in FASTA format. This sequence file can be located in any directory. It must contain only one sequence, which can be of any length. By FASTA format, we require each line to be of the same length (the last line can be shorter; the first line should be '>sequenceName'). Each line should be no longer than 10 million bp. |
species |
an integer from 0 to 11 as the label for a species, indexed as follows: 1 = Human; 2 = Mouse; 3 = Rat; 4 = Zebrafish; 5 = D. melanogaster; 6 = C. elegans; 7 = S. cerevisiae; 8 = C. albicans; 9 = S. pombe; 10 = A. thaliana; 11 = Maize; 0 = Other. The default is 7 = S. cerevisiae. If |
model |
an integer, 4 or 1. NuPoP has two models integrated. One is the first-order Markov chain for both nucleosome and linker DNA states. The other is 4th order (default). The latter distinguishes nucleosome/linker in up to 5-mer usage, and thus is slightly more effective in prediction, but runs slower: the 4th order model takes about 2.5 times as long as the 1st order model. |
predNuPoP outputs the prediction results into the current working directory. The output file is named after the input file with an added extension _Prediction1.txt or _Prediction4.txt, where 1 or 4 stands for the order of the Markov chain model specified. The output file has five columns, Position, P-start, Occup, N/L and Affinity:
Position |
position in the input DNA sequence |
P-start |
probability that the current position is the start of a nucleosome |
Occup |
nucleosome occupancy score |
N/L |
nucleosome (1) or linker (0) for each position based on Viterbi prediction |
Affinity |
nucleosome binding affinity score |
library(NuPoP)
predNuPoP_chem(system.file("extdata", "test.seq", package="NuPoP"),species=7,model=4)
This function reads in the prediction results generated by predNuPoP for a specified region.
readNuPoP(file,startPos,endPos)
file |
the prediction output file name from |
startPos |
the start position in the DNA sequence for prediction results plotting. |
endPos |
the end position in the DNA sequence for prediction results plotting. |
A data frame that contains the results from predNuPoP. The five columns are position: position in the input sequence; P-start: probability as the start of a nucleosome; Occu: nucleosome occupancy; N/L: nucleosome (1) or linker (0) based on Viterbi prediction; and Affinity: nucleosome binding affinity score.
predNuPoP(system.file("extdata", "test.seq", package="NuPoP"),species=7,model=4)
## the prediction results are stored in the current working directory
## the user should replace "system.file("extdata","test.seq_Prediction4.txt",package="NuPoP")"
## by the actual path and file name generated from prediction.
temp=readNuPoP(system.file("extdata","test.seq_Prediction4.txt",package="NuPoP"),startPos=1,endPos=5000)
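The comments in the example ask the user to substitute the actual path of the file generated by prediction. Going by the naming rule documented for predNuPoP (input file name plus _Prediction1.txt or _Prediction4.txt, written to the current working directory), that path can be assembled as in the sketch below. expected_prediction_file is a hypothetical helper, the example paths are made up, and the use of only the base name of the input file is an assumption inferred from that description.

```r
# Hypothetical helper (not part of NuPoP): build the expected output path from
# the documented naming rule. Assumes only the base name of the input is used.
expected_prediction_file <- function(seq_file, model = 4, out_dir = getwd()) {
  stopifnot(model %in% c(1, 4))
  file.path(out_dir,
            paste0(basename(seq_file), "_Prediction", model, ".txt"))
}

# Made-up input path and output directory, for illustration only.
out_file <- expected_prediction_file("/data/genome/test.seq",
                                     model = 4, out_dir = "results")
# out_file is "results/test.seq_Prediction4.txt"

# With the package installed and a prediction already run, the result could
# then be read with the documented call:
# temp <- readNuPoP(out_file, startPos = 1, endPos = 5000)
```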