Title: | Predict Acetylated Lysine Sites |
---|---|
Description: | ASEB is an R package to predict lysine sites that can be acetylated by a specific KAT-family. |
Authors: | Likun Wang <[email protected]> and Tingting Li <[email protected]>. |
Maintainer: | Likun Wang <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.51.0 |
Built: | 2024-10-30 03:30:13 UTC |
Source: | https://github.com/bioc/ASEB |
This function is used to predict all lysine sites on a specific protein that can be acetylated by a specific KAT-family.
asebProteins(backgroundSites, prodefinedSites, testProteins,
             outputFile=NULL, permutationTimes=10000)

## S4 method for signature 'character,character,character'
asebProteins(backgroundSites, prodefinedSites, testProteins,
             outputFile=NULL, permutationTimes=10000)

## S4 method for signature 'SequenceInfo,SequenceInfo,SequenceInfo'
asebProteins(backgroundSites, prodefinedSites, testProteins,
             outputFile=NULL, permutationTimes=10000)
backgroundSites | SequenceInfo object or file name (character(1)) for the background peptide set. |
prodefinedSites | SequenceInfo object or file name (character(1)) for the KAT-specific peptide set. |
testProteins | SequenceInfo object or file name (character(1)) for the query protein set. |
outputFile | file name for output (character(1)). |
permutationTimes | number of permutations (integer(1)); default and recommended value: 10000. |
This function is used to predict lysine sites that can be acetylated by a specific KAT-family.
The whole process is similar to the GSEA method (permuting gene sets). Please see the references for details.
The first three arguments of asebProteins can be SequenceInfo objects or file names. If they are SequenceInfo objects, the method returns a list to the user in addition to the output file. Otherwise, the method processes the FASTA-format files directly and writes all results to a file. In that case it can process a very large number of sites in one run without loading any sequences into the R workspace.
The output file contains enrichment scores and P-values for each query site.
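To make the GSEA analogy concrete, the following is a minimal base-R sketch of a running-sum enrichment score with a label-permutation P-value. It is an illustration only, not the code used inside ASEB: the helper names (`running_sum`, `enrichment_score`, `permutation_p_value`) and the toy ranking are hypothetical, and the package's actual ranking and scoring scheme is described in the references.

```r
# Running sum over a ranked list (illustrative sketch, not ASEB's implementation).
# in_set: logical vector in rank order, TRUE where the item belongs to the
# set of interest (for example, the KAT-specific peptides).
running_sum <- function(in_set) {
  n_hit  <- sum(in_set)
  n_miss <- length(in_set) - n_hit
  # Step up on a hit, step down on a miss; the curve ends at zero.
  cumsum(ifelse(in_set, 1 / n_hit, -1 / n_miss))
}

# Enrichment score: the maximum of the running sum.
enrichment_score <- function(in_set) max(running_sum(in_set))

# Permutation P-value: how often a shuffled ranking scores at least as high.
permutation_p_value <- function(in_set, times = 1000) {
  observed <- enrichment_score(in_set)
  permuted <- replicate(times, enrichment_score(sample(in_set)))
  (sum(permuted >= observed) + 1) / (times + 1)
}

set.seed(1)
# Hypothetical ranking: 5 set members concentrated near the top of 20 items.
in_set <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, rep(FALSE, 14))
es <- enrichment_score(in_set)
p  <- permutation_p_value(in_set, times = 1000)
cat("enrichment score:", es, " permutation P-value:", p, "\n")
```

More permutations give a finer-grained P-value at the cost of run time, which is why the package examples use a small permutationTimes while the documentation recommends 10000.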
The asebProteins,SequenceInfo,SequenceInfo,SequenceInfo-method also returns a list containing two data.frame objects: results and curveInfo.
results | contains enrichment scores and P-values for each query site. |
curveInfo | contains information for enrichment score curves. |
Each acetylated lysine site together with its surrounding amino acids (8 on each side) is treated as an acetylated peptide.
Example peptide sequence: "KEHDDIFDKLKEAVKEE".
All input files must be in FASTA format.
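The peptide convention above (a lysine with 8 residues on each side, 17 residues in total) can be sketched in base R as follows. The function name `extract_lysine_peptides` is hypothetical, and padding with "-" near the protein termini is an assumption made for this sketch; the source does not state how ASEB treats lysines close to either end of a protein.

```r
# Extract a 17-residue window around every lysine (K) in a protein sequence.
# Assumption: positions beyond the protein ends are padded with `pad`.
extract_lysine_peptides <- function(protein, flank = 8, pad = "-") {
  positions <- as.integer(gregexpr("K", protein, fixed = TRUE)[[1]])
  if (positions[1] == -1) return(character(0))  # no lysine found
  padded <- paste0(strrep(pad, flank), protein, strrep(pad, flank))
  # In the padded string, residue i of the protein sits at index i + flank,
  # so the window centred on it spans indices i .. i + 2 * flank.
  peptides <- substring(padded, positions, positions + 2 * flank)
  names(peptides) <- paste0("K", positions)
  peptides
}

peptides <- extract_lysine_peptides("KEHDDIFDKLKEAVKEE")
print(peptides)
```

Applied to the example sequence, the window centred on the lysine at position 9 reproduces the full 17-residue peptide.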
Subramanian, A. et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 102, 15545-15550.
Mootha, V.K. et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet, 34, 267-273.
Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 458, 223-227.
Li, T.T. et al. Characterization and prediction of lysine (K)-acetyl-transferase (KAT) specific acetylation sites. Mol Cell Proteomics, in press.
SequenceInfo, readSequence, asebSites, drawStat, drawEScurve.
# Load the example peptide and protein sets shipped with the package
backgroundSites <- readSequence(system.file("extdata", "background_sites.fa", package="ASEB"))
prodefinedSites <- readSequence(system.file("extdata", "predefined_sites.fa", package="ASEB"))
testProteins <- readSequence(system.file("extdata", "proteins_to_test.fa", package="ASEB"))
# Use few permutations here to keep the example fast
resultList <- asebProteins(backgroundSites, prodefinedSites, testProteins, permutationTimes=100)
resultList$results[1:2,]
This function is used to predict lysine sites that can be acetylated by a specific KAT-family.
asebSites(backgroundSites, prodefinedSites, testSites,
          outputFile=NULL, permutationTimes=10000)

## S4 method for signature 'character,character,character'
asebSites(backgroundSites, prodefinedSites, testSites,
          outputFile=NULL, permutationTimes=10000)

## S4 method for signature 'SequenceInfo,SequenceInfo,SequenceInfo'
asebSites(backgroundSites, prodefinedSites, testSites,
          outputFile=NULL, permutationTimes=10000)
backgroundSites | SequenceInfo object or file name (character(1)) for the background peptide set. |
prodefinedSites | SequenceInfo object or file name (character(1)) for the KAT-specific peptide set. |
testSites | SequenceInfo object or file name (character(1)) for the query peptide set. |
outputFile | file name for output (character(1)). |
permutationTimes | number of permutations (integer(1)); default and recommended value: 10000. |
This function is used to predict lysine sites that can be acetylated by a specific KAT-family.
It gives an enrichment score and a P-value for each query lysine site.
The whole process is similar to the GSEA method (permuting gene sets). Please see the references for details.
The first three arguments of asebSites can be SequenceInfo objects or file names. If they are SequenceInfo objects, the method returns a list to the user in addition to the output file. Otherwise, the method processes the FASTA-format files directly and writes all results to a file. In that case it can process a very large number of sites in one run without loading any sequences into the R workspace.
The output file contains enrichment scores and P-values for each query site.
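The remark about working from files without loading the sequences into the workspace can be illustrated by reading a FASTA file in fixed-size chunks through a connection. This is a generic base-R sketch, not a description of how ASEB reads its inputs; `count_fasta_records` and its `chunk_size` argument are hypothetical names.

```r
# Count FASTA records while holding at most `chunk_size` lines in memory.
count_fasta_records <- function(file, chunk_size = 1000L) {
  con <- file(file, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break           # end of file reached
    n <- n + sum(startsWith(lines, ">"))     # each header starts one record
  }
  n
}

# Small temporary file standing in for a large input file.
fasta_file <- tempfile(fileext = ".fa")
writeLines(c(">site1", "KEHDDIFDKLKEAVKEE",
             ">site2", "AAAAAAAAKAAAAAAAA",
             ">site3", "CCCCCCCCKCCCCCCCC"), fasta_file)
n_records <- count_fasta_records(fasta_file, chunk_size = 2L)
cat("records:", n_records, "\n")
```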
The asebSites,SequenceInfo,SequenceInfo,SequenceInfo-method also returns a list containing two data.frame objects: results and curveInfo.
results | contains enrichment scores and P-values for each query site. |
curveInfo | contains information for enrichment score curves. |
Each acetylated lysine site together with its surrounding amino acids (8 on each side) is treated as an acetylated peptide.
Example peptide sequence: "KEHDDIFDKLKEAVKEE".
All input files must be in FASTA format.
Subramanian, A. et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 102, 15545-15550.
Mootha, V.K. et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet, 34, 267-273.
Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 458, 223-227.
Li, T.T. et al. Characterization and prediction of lysine (K)-acetyl-transferase (KAT) specific acetylation sites. Mol Cell Proteomics, in press.
SequenceInfo, readSequence, asebProteins, drawStat, drawEScurve.
# Load the example peptide sets shipped with the package
backgroundSites <- readSequence(system.file("extdata", "background_sites.fa", package="ASEB"))
prodefinedSites <- readSequence(system.file("extdata", "predefined_sites.fa", package="ASEB"))
testSites <- readSequence(system.file("extdata", "sites_to_test.fa", package="ASEB"))
# Use few permutations here to keep the example fast
resultList <- asebSites(backgroundSites, prodefinedSites, testSites, permutationTimes=100)
resultList$results[1:2,]
This function is used to draw enrichment score curves for specific sites.
drawEScurve(curveInfoDataFrame, sites=NULL, max_p_value=0.1, min_es=0.2,
            outputDir=NULL, figKind=c("pdf","jpeg"))

## S4 method for signature 'data.frame'
drawEScurve(curveInfoDataFrame, sites=NULL, max_p_value=0.1, min_es=0.2,
            outputDir=NULL, figKind=c("pdf","jpeg"))
curveInfoDataFrame | data.frame object containing curve information for sites; see the example. |
sites | character vector; only draw curves for sites whose ids appear in this vector. |
max_p_value | numeric(1); only draw curves for sites with a P-value less than this value. |
min_es | numeric(1); only draw curves for sites with an enrichment score greater than this value. |
outputDir | character(1); output directory for all the figures. |
figKind | character(1); figure format, "pdf" or "jpeg" (the two values listed in the usage). |
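The three selection arguments (sites, max_p_value, min_es) act as filters on which sites are drawn. The sketch below mimics that selection on a toy data.frame. The column names `site_id`, `es`, and `p_value` and the helper `select_sites` are hypothetical and chosen for illustration; they are not the documented layout of the curveInfo or results data.frame.

```r
# Keep only the sites that pass the P-value and enrichment-score thresholds,
# optionally restricted to a set of site ids (illustrative sketch).
select_sites <- function(results, sites = NULL, max_p_value = 0.1, min_es = 0.2) {
  keep <- results$p_value < max_p_value & results$es > min_es
  if (!is.null(sites)) keep <- keep & results$site_id %in% sites
  results[keep, , drop = FALSE]
}

# Toy values, invented for this example.
demo <- data.frame(
  site_id = c("s1", "s2", "s3", "s4"),
  es      = c(0.90, 0.15, 0.50, 0.70),
  p_value = c(0.01, 0.02, 0.30, 0.05),
  stringsAsFactors = FALSE
)
selected      <- select_sites(demo)                 # thresholds only
selected_by_id <- select_sites(demo, sites = "s4")  # thresholds and id filter
print(selected)
```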
This function is used to draw enrichment score curves for specific sites.
These curves show the running-sum process used to calculate the enrichment score.
The data.frame object containing the curve information is returned by asebSites or asebProteins.
SequenceInfo, readSequence, asebSites, asebProteins, drawStat.
# Load the example peptide sets shipped with the package
backgroundSites <- readSequence(system.file("extdata", "background_sites.fa", package="ASEB"))
prodefinedSites <- readSequence(system.file("extdata", "predefined_sites.fa", package="ASEB"))
testSites <- readSequence(system.file("extdata", "sites_to_test.fa", package="ASEB"))
resultList <- asebSites(backgroundSites, prodefinedSites, testSites, permutationTimes=100)
# Draw curves for sites passing both thresholds and write them as JPEG files
drawEScurve(resultList$curveInfo, max_p_value=0.1, min_es=0.1, outputDir=tempdir(), figKind="jpeg")
cat("see figures in output dir:", tempdir(),"\n")
This function is used to show P-values and enrichment scores for all lysine sites on a specific protein.
drawStat(curveInfoDataFrame, proteinIds=NULL, outputDir=NULL,
         figKind=c("pdf","jpeg"))

## S4 method for signature 'data.frame'
drawStat(curveInfoDataFrame, proteinIds=NULL, outputDir=NULL,
         figKind=c("pdf","jpeg"))
curveInfoDataFrame | data.frame object containing curve information for all proteins; see the example. |
proteinIds | character vector; only draw figures for proteins whose ids appear in this vector. |
outputDir | character(1); output directory for all the figures. |
figKind | character(1); figure format, "pdf" or "jpeg" (the two values listed in the usage). |
This function is used to draw P-values and enrichment scores for all lysine sites on a specific protein.
The X-axis shows the positions of all lysine sites on the protein, and the Y-axis shows the enrichment score (0~1) and P-value (0~1) for each lysine site. The data.frame object containing the curve information is returned by asebProteins.
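The axis layout described above can be reproduced with base graphics. The sketch below draws enrichment scores as vertical bars and P-values as points over lysine positions and writes the figure to a PDF in the session's temporary directory. The data values and the column names `position`, `es`, and `p_value` are invented for illustration, and the styling is not meant to match the figures drawStat produces.

```r
# Toy per-lysine statistics for one hypothetical protein.
stat <- data.frame(
  position = c(12, 45, 78, 120),
  es       = c(0.82, 0.35, 0.61, 0.15),
  p_value  = c(0.01, 0.40, 0.08, 0.90)
)

out_file <- file.path(tempdir(), "protein_stat_sketch.pdf")
pdf(out_file, width = 7, height = 4)
# Enrichment scores as vertical bars on a 0-1 axis.
plot(stat$position, stat$es, type = "h", ylim = c(0, 1), lwd = 3,
     col = "steelblue", xlab = "Lysine position", ylab = "Score (0 to 1)")
# P-values on the same 0-1 axis as points.
points(stat$position, stat$p_value, pch = 19, col = "firebrick")
legend("topright", legend = c("Enrichment score", "P-value"),
       col = c("steelblue", "firebrick"), lty = c(1, NA), pch = c(NA, 19))
invisible(dev.off())
cat("figure written to:", out_file, "\n")
```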
SequenceInfo, readSequence, asebSites, asebProteins, drawEScurve.
# Load the example peptide and protein sets shipped with the package
backgroundSites <- readSequence(system.file("extdata", "background_sites.fa", package="ASEB"))
prodefinedSites <- readSequence(system.file("extdata", "predefined_sites.fa", package="ASEB"))
testProteins <- readSequence(system.file("extdata", "proteins_to_test.fa", package="ASEB"))
resultList <- asebProteins(backgroundSites, prodefinedSites, testProteins, permutationTimes=100)
#drawEScurve(resultList$curveInfo, max_p_value=0.5, min_es=0, outputDir=tempdir(), figKind="jpeg")
# Draw one summary figure per protein and write it as a JPEG file
drawStat(resultList$curveInfo, outputDir=tempdir(), figKind="jpeg")
cat("see figures in output dir:", tempdir(),"\n")
This function is used to read sequences from a FASTA-format file.
readSequence(file)
file | character(1), file name for input. |
This function returns an object of class SequenceInfo that contains the sequences and identifiers from the FASTA-format input file.
A SequenceInfo object containing sequences and identifiers from the input file.
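To show what "sequences and identifiers" means for a FASTA file, here is a small base-R parser that returns one row per record. It is a sketch for well-formed input only and is not the parser used by readSequence; `read_fasta_simple` is a hypothetical helper, and it returns a plain data.frame rather than a SequenceInfo object.

```r
# Parse a FASTA file into identifiers and sequences (illustrative sketch).
read_fasta_simple <- function(file) {
  lines <- readLines(file)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index for every line
  ids <- sub("^>", "", lines[is_header])
  # Join the sequence lines that belong to the same record.
  groups <- split(lines[!is_header],
                  factor(record[!is_header], levels = seq_along(ids)))
  sequences <- vapply(groups, paste, character(1), collapse = "")
  data.frame(id = ids, sequence = unname(sequences), stringsAsFactors = FALSE)
}

# Temporary file with one peptide and one sequence wrapped over two lines.
fasta_file <- tempfile(fileext = ".fa")
writeLines(c(">site1", "KEHDDIFDKLKEAVKEE", ">protein1", "MKT", "AYK"), fasta_file)
parsed <- read_fasta_simple(fasta_file)
print(parsed)
```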
SequenceInfo, asebSites, asebProteins, drawStat, drawEScurve.
ff <- system.file("extdata", "background_sites.fa", package="ASEB")
readSequence(ff)
This class is used to store sequences and identifiers for lysine sites or proteins.
Objects from this class are created by the constructor SequenceInfo, as outlined below.
sequences: "character" containing sequences.
id: "character" containing identifiers.
Constructor:
signature(sequences = "character", id = "character"): Create a SequenceInfo object from sequences and their identifiers. The length of id must match that of sequences.
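The two slots and the length requirement can be mirrored by a small S4 class with a validity function. The class below is deliberately named SequenceInfoSketch to keep it separate from the package's own SequenceInfo class. Enforcing the length rule through a validity function is an assumption of this sketch, not a statement about how ASEB implements the check.

```r
library(methods)

# Hypothetical stand-in class with the same two slots as described above.
setClass(
  "SequenceInfoSketch",
  representation(sequences = "character", id = "character"),
  validity = function(object) {
    if (length(object@id) != length(object@sequences)) {
      "the length of 'id' must match that of 'sequences'"
    } else {
      TRUE
    }
  }
)

# Constructor function wrapping new(); validity is checked on creation.
SequenceInfoSketch <- function(sequences, id) {
  new("SequenceInfoSketch", sequences = sequences, id = id)
}

ok <- SequenceInfoSketch(sequences = "KEHDDIFDKLKEAVKEE", id = "site1")

# Mismatched lengths are rejected; capture the error message for inspection.
bad <- tryCatch(
  SequenceInfoSketch(sequences = c("AAA", "CCC"), id = "only_one"),
  error = function(e) conditionMessage(e)
)
cat(bad, "\n")
```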
readSequence, asebSites, asebProteins, drawStat, drawEScurve.
showClass("SequenceInfo")