Title: | VarCon: an R package for retrieving neighboring nucleotides of an SNV |
---|---|
Description: | VarCon is an R package that interprets the positional information in the annotation of a single nucleotide variation (SNV), which may refer either to the coding sequence or to the reference genomic sequence, and retrieves the genomic reference sequence around the position of the SNV. To assess whether the SNV could influence the binding of splicing regulatory proteins, VarCon calculates the HEXplorer score as an estimate. VarCon also reports the strengths of splice sites within the retrieved genomic sequence and any changes due to the SNV. |
Authors: | Johannes Ptok [aut, cre] |
Maintainer: | Johannes Ptok <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-10-31 06:29:12 UTC |
Source: | https://github.com/bioc/VarCon |
This function generates a table with HZEI scores per index nucleotide.
calculateHZEIperNT(seq)
seq |
Nucleotide sequence longer than 11nt and only containing bases "A", "G", "C", "T". |
Dataframe with HZEI value per index position.
calculateHZEIperNT("TTCCAAACGAACTTTTGTAGGGA")
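As a rough illustration of what a per-index-nucleotide score involves: in the HEXplorer approach, the score of an index nucleotide is commonly described as the average ZEI score of the six hexamers overlapping it, which would explain why the input has to be longer than 11 nt. The base R sketch below only enumerates those hexamers. `overlappingHexamers` is a hypothetical helper, not part of VarCon, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of VarCon): list the six hexamers that
# overlap the index nucleotide at position i of ntSeq.
overlappingHexamers <- function(ntSeq, i) {
    n <- nchar(ntSeq)
    # A full set of six hexamers needs 5 nt on either side of position i
    stopifnot(i >= 6, i <= n - 5)
    starts <- (i - 5):i
    substring(ntSeq, starts, starts + 5)
}

exampleSeq <- "TTCCAAACGAACTTTTGTAGGGA"

# Hexamers overlapping the first index nucleotide with a complete window
overlappingHexamers(exampleSeq, 6)

# Number of index positions that have a complete 11-nt window
nchar(exampleSeq) - 10
```

Under this assumption, averaging the ZEI values of the returned hexamers (for instance looked up in the `hex` data set) would give one score per index position.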
This function calculates the MaxEntScan score of either splice donor or acceptor sequences.
calculateMaxEntScanScore(seqVector, ssType)
seqVector |
Character vector of splice site nucleotide sequences. Splice acceptor (SA) sequences should be 23nt long (20 intronic, 3 exonic) and splice donor sequences should be 9nt long (3 exonic, 6 intronic). Sequences should only contain bases "A", "G", "C", "T". |
ssType |
Numeric indicator of whether the entered sequence is a splice donor (5) or acceptor (3). |
Character vector of the MaxEntScan scores generated from the entered seqVector.
calculateMaxEntScanScore("TTCCAAACGAACTTTTGTAGGGA",3)
calculateMaxEntScanScore("GAGGTAAGT",5)
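Because the expected sequence length depends on `ssType`, it can help to check inputs before scoring them. The following base R sketch applies the length and alphabet rules stated in the arguments above. `checkSpliceSiteInput` is a hypothetical helper for illustration and is not part of VarCon.

```r
# Hypothetical helper (not part of VarCon): check that splice site sequences
# have the length and alphabet described for calculateMaxEntScanScore().
checkSpliceSiteInput <- function(seqVector, ssType) {
    stopifnot(ssType %in% c(3, 5))
    # 23 nt for acceptors (ssType 3), 9 nt for donors (ssType 5)
    expectedLength <- if (ssType == 3) 23 else 9
    nchar(seqVector) == expectedLength & grepl("^[ACGT]+$", seqVector)
}

# One acceptor sequence of the expected length
checkSpliceSiteInput("TTCCAAACGAACTTTTGTAGGGA", 3)

# Three donor candidates: valid, invalid base "N", too short
checkSpliceSiteInput(c("GAGGTAAGT", "GAGGTNAGT", "GAGGT"), 5)
```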
Small data frame assigning a transcript to certain genes, so that the gene name can be used synonymously with the transcript ID.
gene2transcript
data frame
HGNC gene name
Ensembl gene ID
Ensembl transcript ID
gene2transcript
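To illustrate how such a conversion table could be used, the base R sketch below resolves a gene name to its transcript ID by column position. The table `toyTable` and the helper `resolveTranscript` are hypothetical and only mirror the documented column order (gene name, gene ID, transcript ID). How VarCon performs the lookup internally is an assumption here.

```r
# Toy conversion table in the documented column order:
# gene name, gene ID, transcript ID
toyTable <- data.frame(
    gene_name = "Example_gene",
    gene_ID = "pseudo_ENSG00000147099",
    transcript_ID = "pseudo_ENST00000650636")

# Hypothetical helper (not part of VarCon): return the transcript ID for a
# gene name found in column 1, otherwise return the input unchanged.
resolveTranscript <- function(id, conversionTable) {
    hit <- match(id, conversionTable[[1]])
    if (is.na(hit)) id else as.character(conversionTable[[3]][hit])
}

resolveTranscript("Example_gene", toyTable)
resolveTranscript("pseudo_ENST00000650636", toyTable)
```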
This function generates a plot depicting the changes in the HZEI score and in the HBS or MaxEntScan score caused by a sequence variation.
generateHEXplorerPlot(variationInfoList, ntWindow)
variationInfoList |
Output from the getSeqInfoFromVariation function. |
ntWindow |
Numeric value defining the size, in nucleotides, of the surrounding sequence of interest. |
Plot showing the HZEI values per nt and the splice site strengths with and without the SNV.
# Defining exemplary input data
transcriptTable <- transCoord  # Using pseudo transcript table
transcriptID <- "pseudo_ENST00000650636"  # Using pseudo transcript
variation <- "c.412C>G/p.(T89M)"
ntWindow <- 20
gene2transcript <- data.frame(gene_name = "Example_gene",
    gene_ID = "pseudo_ENSG00000147099",
    transcript_ID = "pseudo_ENST00000650636")
results <- getSeqInfoFromVariation(referenceDnaStringSet, transcriptID,
    variation, ntWindow=ntWindow, transcriptTable, gene2transcript)
generateHEXplorerPlot(results)
This function generates a table with MaxEntScan scores per potential SA position.
getMaxEntInfo(seq)
seq |
Nucleotide sequence longer than 22nt and only containing bases "A", "G", "C", "T". |
Dataframe of potential acceptor index positions and corresponding MaxEntScan scores.
getMaxEntInfo("TTCCAAACGAACTTTTGTAGGGA")
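Scoring every potential SA position implies sliding a 23-nt window along the sequence, which is consistent with the requirement of more than 22 nt. The base R sketch below enumerates such windows and flags those with "AG" at window positions 19 and 20, that is, at the end of the 20 intronic nucleotides. `candidateAcceptorWindows` is a hypothetical helper, and whether VarCon restricts scoring to AG-containing windows is an assumption, not something stated in this manual.

```r
# Hypothetical helper (not part of VarCon): enumerate every 23-nt window of
# a sequence and flag windows with "AG" at positions 19-20, i.e. directly
# upstream of the 3 exonic nucleotides of a candidate acceptor.
candidateAcceptorWindows <- function(ntSeq) {
    n <- nchar(ntSeq)
    stopifnot(n > 22)
    starts <- seq_len(n - 22)
    windows <- substring(ntSeq, starts, starts + 22)
    data.frame(start = starts,
               window = windows,
               hasAG = substring(windows, 19, 20) == "AG",
               stringsAsFactors = FALSE)
}

# A 23-nt input yields exactly one window
candidateAcceptorWindows("TTCCAAACGAACTTTTGTAGGGA")

# Two extra leading nucleotides yield three windows
candidateAcceptorWindows("AATTCCAAACGAACTTTTGTAGGGA")
```

Each flagged window could then be passed to `calculateMaxEntScanScore()` with `ssType` 3.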
This function collects information about genomic context of sequence variants.
getSeqInfoFromVariation(referenceDnaStringSet, transcriptID, variation, ntWindow=20, transcriptTable,gene2transcript=gene2transcript)
referenceDnaStringSet |
DNAStringSet from the reference genome FASTA file. |
transcriptID |
Ensembl ID of the transcript of interest. |
variation |
A sequence variation referring either to the coding sequence or to the genomic sequence (c.12A>T or g.182284A>T). |
ntWindow |
Numeric value defining the size, in nucleotides, of the surrounding sequence of interest. |
transcriptTable |
Table of transcripts with their exon coordinates and CDS coordinates. |
gene2transcript |
Gene to transcript conversion table with the gene name in the first column and the gene ID in the second and the transcript ID in the third column. |
List of information about the entered variation.
# Defining exemplary input data
transcriptTable <- transCoord
transcriptID <- "pseudo_ENST00000650636"
variation <- "c.412C>G/p.(T89M)"
gene2transcript <- data.frame(gene_name = "Example_gene",
    gene_ID = "pseudo_ENSG00000147099",
    transcriptID = "pseudo_ENST00000650636")
results <- getSeqInfoFromVariation(referenceDnaStringSet, transcriptID,
    variation, ntWindow=20, transcriptTable, gene2transcript=gene2transcript)

# Using a predefined gene to transcript conversion
transcriptID <- "Example_gene"
results <- getSeqInfoFromVariation(referenceDnaStringSet, transcriptID,
    variation, ntWindow=20, transcriptTable, gene2transcript=gene2transcript)
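The `variation` argument encodes the reference type, the position, and the nucleotide exchange in a single string. As a minimal sketch of how such a string can be taken apart, the base R code below uses a regular expression. `parseVariation` is a hypothetical helper and not part of VarCon, and the parsing done inside `getSeqInfoFromVariation` may differ.

```r
# Hypothetical helper (not part of VarCon): split a variation string such as
# "c.12A>T" or "g.182284A>T" into its parts. Anything after the substitution
# (for example a "/p.(...)" protein annotation) is ignored.
parseVariation <- function(variation) {
    pattern <- "^([cg])\\.([0-9]+)([ACGT])>([ACGT])"
    parts <- regmatches(variation, regexec(pattern, variation))[[1]]
    if (length(parts) == 0) stop("Unrecognised variation: ", variation)
    list(reference = if (parts[2] == "c") "coding" else "genomic",
         position = as.integer(parts[3]),
         ref = parts[4],
         alt = parts[5])
}

# Coding-sequence annotation with a trailing protein annotation
parseVariation("c.412C>G/p.(T89M)")

# Genomic annotation
parseVariation("g.182284A>T")
```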
Donor sequences and their HBS
hbg
A data frame with columns:
11nt long donor sequence
HBS of the donor sequence
hbg
Hexamers and Z scores
hex
A data frame with columns:
Sequence of the hexamer.
ZEI-score of the hexamer from HEXplorer.
First codon within the hexamer.
Second codon within the hexamer.
First encoded amino acid within the hexamer (three-letter code).
Second encoded amino acid within the hexamer (three-letter code).
Both encoded amino acids within the hexamer.
hex
This function imports the FASTA file of the reference genome into the R environment as a DNAStringSet.
prepareReferenceFasta(filepath)
filepath |
R-conformant file path to the FASTA file of the reference genome to use. |
Creates a new DNAStringSet from the file at the entered filepath.
## Loading exemplary DNAStringSet
filepath <- system.file("extdata", "fastaEx.fa", package="Biostrings")
referenceDnaStringSet <- prepareReferenceFasta(filepath)
Small DNAStringSet as exemplary reference genome sequence
referenceDnaStringSet
DNAStringSet
Length of feature sequence
Sequence of the feature
Name of the feature
referenceDnaStringSet
Start graphical user interface for the VarCon application.
startVarConApp()
Shiny app
## Not run:
startVarConApp()
## End(Not run)
Small table as exemplary transcript table with exon coordinates.
transCoord
data frame
Ensembl gene ID
Ensembl Transcript ID
Strand of the feature
Smallest coordinate of the exon end coordinates of a specific exon
Largest coordinate of the exon end coordinates of a specific exon
Start of the coding sequence
End of the coding sequence
Covered coding nucleotides start
Covered coding nucleotides end
Rank of the exon within the respective transcript
Ensembl exon ID
Name of the chromosome
transCoord