Title: | Assign rfPred functional prediction scores to a missense variants list |
---|---|
Description: | Based on numerous external data files in which rfPred scores are pre-calculated for all genomic positions of the human exome, the package assigns rfPred scores to missense variants identified by the chromosome, the position (hg19 version), the reference and alternative nucleotides and the Uniprot identifier of the protein. Note that to use the package, the user has to download the TabixFile and its index (approximately 3.3 GB). |
Authors: | Fabienne Jabot-Hanin, Hugo Varet and Jean-Philippe Jais |
Maintainer: | Hugo Varet <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.45.0 |
Built: | 2024-10-31 04:53:03 UTC |
Source: | https://github.com/bioc/rfPred |
The package provides a function which returns the rfPred score for a list of missense variants. All the rfPred scores are pre-calculated and stored in a TabixFile available on a server, which can also be downloaded to use the package offline. The package does not work without access to the TabixFile. However, a toy example on chromosome Y is included in the package to test the rfPred_scores function.
Fabienne Jabot-Hanin, Hugo Varet and Jean-Philippe Jais
dbNSFP database: Liu X, Jian X and Boerwinkle E. 2011. dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions. Human Mutation. 32:894-899.
rfPred method: Jabot-Hanin F, Varet H, Tores F and Jais J-P. 2013. rfPred: a new meta-score for functional prediction of missense variants in human exome (submitted).
Toy example of GRanges object
A GRanges object with 11 rows and several columns:
seqnames
Chromosome number (only Y in this example)
ranges
IRanges object for which start = end: position on the chromosome
reference
Reference nucleotide (A, C, G or T)
alteration
Alternative nucleotide (A, C, G or T)
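An object with the same layout as this toy example could be built along the following lines. This is a minimal sketch: the object name `my_granges`, the positions and the alleles are placeholders, and it is only assumed (from the toy example's columns) that `reference` and `alteration` are the metadata column names to use.

```r
library(GenomicRanges)  # also attaches IRanges

# Placeholder positions and alleles, for illustration only.
pos <- c(1000001L, 1000002L, 1000003L)

my_granges <- GRanges(
  seqnames   = rep("Y", length(pos)),
  ranges     = IRanges(start = pos, end = pos),  # start = end: single positions
  reference  = c("A", "G", "C"),                 # metadata column
  alteration = c("C", "T", "T")                  # metadata column
)
my_granges
```

Extra named arguments to `GRanges()` become metadata columns, which is how the `reference` and `alteration` columns are attached here.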
rfPred is a statistical method which combines the predictions of 5 algorithms in a random forest model: SIFT, Polyphen2, LRT, PhyloP and MutationTaster. These scores are available in the dbNSFP database for all possible missense variants in the hg19 version, and the rfPred package gives a composite score that is more reliable than each of the individual algorithms.
rfPred_scores(
  variant_list,
  data = system.file("extdata/chrY_rfPred.txtz", package = "rfPred"),
  index = system.file("extdata/chrY_rfPred.txtz.tbi", package = "rfPred"),
  all.col = FALSE,
  file.export = NULL,
  n.cores = 1
)

## S4 method for signature 'character'
rfPred_scores(
  variant_list,
  data = system.file("extdata/chrY_rfPred.txtz", package = "rfPred"),
  index = system.file("extdata/chrY_rfPred.txtz.tbi", package = "rfPred"),
  all.col = FALSE,
  file.export = NULL,
  n.cores = 1
)

## S4 method for signature 'GRanges'
rfPred_scores(
  variant_list,
  data = system.file("extdata/chrY_rfPred.txtz", package = "rfPred"),
  index = system.file("extdata/chrY_rfPred.txtz.tbi", package = "rfPred"),
  all.col = FALSE,
  file.export = NULL,
  n.cores = 1
)
variant_list |
A variants list in a data.frame, a GRanges object, or the path to a VCF file |
data |
Path to the compressed TabixFile, either on the server (default) or on the user's computer |
index |
Path to the index of the TabixFile, either on the server (default) or on the user's computer |
all.col |
TRUE to also return the additional columns listed for the returned data frame (default is FALSE) |
file.export |
Optional, name of the CSV file to which the results are exported (default is NULL) |
n.cores |
Number of cores to use when scanning the TabixFile; can be efficient for large requests (default is 1) |
The variants list with the assigned rfPred scores, as well as the scores used to build the rfPred meta-score: SIFT, phyloP, MutationTaster, LRT (transformed) and Polyphen2 (corresponding to Polyphen2_HVAR_score). The returned data frame contains these columns:
chromosome |
chromosome number |
position_hg19 |
physical position on the chromosome according to hg19 (1-based coordinate) |
reference |
reference nucleotide allele (as on the + strand) |
alteration |
alternative nucleotide allele (as on the + strand) |
proteine |
Uniprot accession number |
aaref |
reference amino acid |
aaalt |
alternative amino acid |
aapos |
amino acid position within the protein |
rfPred_score |
rfPred score between 0 and 1 (the higher the score, the higher the probability of pathogenicity) |
SIFT_score |
SIFT score between 0 and 1, equal to 1 - original SIFT score (so, contrary to the original SIFT score, the higher the score, the higher the probability of pathogenicity) |
Polyphen2_score |
Polyphen2 (HVAR) score between 0 and 1, used to calculate rfPred (the higher the score, the higher the probability of pathogenicity) |
MutationTaster_score |
MutationTaster score between 0 and 1 (the higher the score, the higher the probability of pathogenicity) |
PhyloP_score |
PhyloP score between 0 and 1 (the higher the score, the higher the probability of pathogenicity): PhyloP_score=1-0.5x10^phyloP if phyloP>0 or PhyloP_score=0.5x10^-phyloP if phyloP<0 |
LRT_score |
LRT score between 0 and 1 (the higher the score, the higher the probability of pathogenicity): LRT_score=1-LRToriginalx0.5 if LRT_Omega<1 or LRT_score=LRToriginalx0.5 if LRT_Omega>=1 |
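The SIFT and LRT rescalings listed in the table above can be sketched in base R as follows. This only illustrates the stated formulas and is not the package's internal code; the helper names `rescale_sift` and `rescale_lrt` and the input values are hypothetical.

```r
# Hypothetical helpers illustrating the rescalings listed above.

# SIFT: the rescaled score is 1 - original SIFT score.
rescale_sift <- function(sift_original) {
  1 - sift_original
}

# LRT: lrt_original is the original LRT score, omega is LRT_Omega.
# LRT_score = 1 - LRToriginal x 0.5 if LRT_Omega < 1, else LRToriginal x 0.5.
rescale_lrt <- function(lrt_original, omega) {
  ifelse(omega < 1, 1 - lrt_original * 0.5, lrt_original * 0.5)
}

# Made-up input values, for illustration only.
sift_new <- rescale_sift(c(0.00, 0.05, 1.00))
lrt_new  <- rescale_lrt(c(0.001, 0.2, 0.2), omega = c(0.1, 0.5, 1.3))
```

Both helpers are vectorised, so they can be applied directly to whole columns of a data frame.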
The following columns are also returned if all.col is TRUE:
Uniprot_id |
Uniprot ID number |
genename |
gene name |
position_hg18 |
physical position on the chromosome according to hg18 (1-based coordinate) |
Polyphen2_HDIV_score |
Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1: the corresponding prediction is "probably damaging" if it is in [0.957,1]; "possibly damaging" if it is in [0.453,0.956]; "benign" if it is in [0,0.452]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";" |
Polyphen2_HDIV_pred |
Polyphen2 prediction based on HumDiv: |
Polyphen2_HVAR_score |
Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1, and the corresponding prediction is "probably damaging" if it is in [0.909,1]; "possibly damaging" if it is in [0.447,0.908]; "benign" if it is in [0,0.446]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";" |
Polyphen2_HVAR_pred |
Polyphen2 prediction based on HumVar: |
MutationTaster_pred |
MutationTaster prediction: |
phyloP |
original phyloP score |
LRT_Omega |
estimated nonsynonymous-to-synonymous-rate ratio |
LRT_pred |
LRT prediction, |
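The Polyphen2 score intervals quoted above for HumDiv and HumVar can be turned into prediction classes with a small base R helper. This is a sketch under stated assumptions: the function name `classify_polyphen2` is hypothetical, and each class is assumed to extend up to the lower bound of the next one (the documented intervals are given to three decimals).

```r
# Hypothetical helper: map Polyphen2 scores to the classes described above.
# Defaults are the HumDiv lower bounds; pass 0.447 and 0.909 for HumVar.
classify_polyphen2 <- function(score, possibly = 0.453, probably = 0.957) {
  cut(score,
      breaks = c(0, possibly, probably, 1),
      labels = c("benign", "possibly damaging", "probably damaging"),
      right = FALSE,          # intervals are closed on the left
      include.lowest = TRUE)  # keeps a score of exactly 1 in the last class
}

# Made-up scores, for illustration only.
hdiv_class <- classify_polyphen2(c(0.1, 0.453, 0.956, 0.957, 1))
hvar_class <- classify_polyphen2(c(0.446, 0.447, 0.909),
                                 possibly = 0.447, probably = 0.909)
```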
Fabienne Jabot-Hanin, Hugo Varet and Jean-Philippe Jais
Jabot-Hanin F, Varet H, Tores F and Jais J-P. 2013. rfPred: a new meta-score for functional prediction of missense variants in human exome (submitted).
# from a data.frame without uniprot protein identifier
data(variant_list_Y)
res <- rfPred_scores(
  variant_list = variant_list_Y[, 1:4],
  data = system.file("extdata", "chrY_rfPred.txtz", package = "rfPred", mustWork = TRUE),
  index = system.file("extdata", "chrY_rfPred.txtz.tbi", package = "rfPred", mustWork = TRUE)
)

# from a data.frame with uniprot protein identifier
res2 <- rfPred_scores(
  variant_list = variant_list_Y,
  data = system.file("extdata", "chrY_rfPred.txtz", package = "rfPred", mustWork = TRUE),
  index = system.file("extdata", "chrY_rfPred.txtz.tbi", package = "rfPred", mustWork = TRUE)
)

# from a VCF file
res3 <- rfPred_scores(
  variant_list = system.file("extdata", "example.vcf", package = "rfPred", mustWork = TRUE),
  data = system.file("extdata", "chrY_rfPred.txtz", package = "rfPred", mustWork = TRUE),
  index = system.file("extdata", "chrY_rfPred.txtz.tbi", package = "rfPred", mustWork = TRUE)
)

# from a GRanges object
data(example_GRanges)
res4 <- rfPred_scores(
  variant_list = example_GRanges,
  data = system.file("extdata", "chrY_rfPred.txtz", package = "rfPred", mustWork = TRUE),
  index = system.file("extdata", "chrY_rfPred.txtz.tbi", package = "rfPred", mustWork = TRUE)
)
rfPred_scores
Motor of rfPred_scores
rfPred_scores_motor(variant_list, data, index, all.col, file.export, n.cores)
variant_list |
Variants list in a data.frame, a GRanges object, or the path to a VCF file |
data |
Path to the compressed TabixFile, either on the server (default) or on the user's computer |
index |
Path to the index of the TabixFile, either on the server (default) or on the user's computer |
all.col |
TRUE to also return the additional columns described for the rfPred_scores function |
file.export |
Optional, name of the CSV file to which the results are exported (default is NULL) |
n.cores |
Number of cores to use when scanning the TabixFile; can be efficient for large requests (default is 1) |
See the rfPred_scores function.
This function is called by the rfPred_scores S4 method.
Toy example of data.frame
A data frame with 5 observations on the following 5 variables:
chr
Chromosome number (only Y in this example)
pos
Position on the chromosome (numeric)
ref
Reference nucleotide (A, C, G or T)
alt
Alternative nucleotide (A, C, G or T)
uniprot
Uniprot protein identifier (factor)
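A data frame with the same five variables could be assembled as sketched below. The object name `my_variants` and all values (positions, alleles, protein identifiers) are placeholders rather than real annotated variants, and it is only assumed that user data should mirror the column layout of this toy data set.

```r
# Placeholder variants mirroring the column layout of the toy data set.
my_variants <- data.frame(
  chr     = c("Y", "Y"),
  pos     = c(1000001, 1000002),                         # numeric positions
  ref     = c("A", "G"),
  alt     = c("C", "T"),
  uniprot = factor(c("PLACEHOLDER1", "PLACEHOLDER2")),   # not real accessions
  stringsAsFactors = FALSE
)
str(my_variants)

# The first four columns alone describe a variant when no protein
# identifier is available.
my_variants_no_uniprot <- my_variants[, 1:4]
```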