Package 'DominoEffect' reference manual

Title:	Identification and Annotation of Protein Hotspot Residues
Description:	The functions support identification and annotation of hotspot residues in proteins. These are individual amino acids that accumulate mutations at a much higher rate than their surrounding regions.
Authors:	Marija Buljan and Peter Blattmann
Maintainer:	Marija Buljan <[email protected]>, Peter Blattmann <[email protected]>
License:	GPL (>= 3)
Version:	1.27.0
Built:	2025-03-29 05:34:26 UTC
Source:	https://github.com/bioc/DominoEffect

Align protein segment around the hotspot to the UniProt/Swiss-Prot KB sequence.

Description

This function aligns the Ensembl protein region containing a hotspot to the UniProt sequence. The Ensembl region spans 15 amino acids with the hotspot in the middle. If the hotspot is close to the protein start or end, the region is still 15 amino acids long, but the hotspot is no longer centered within it.

Usage

align_to_unip(ens.seq, uni.seq, ensembl_mut_position)

Arguments

`ens.seq`	AAString object with the Ensembl protein sequence corresponding to the representative transcript.
`uni.seq`	AAString with the UniProt sequence for the identifier matching the Ensembl gene name.
`ensembl_mut_position`	Numeric vector defining the hotspot position in the Ensembl sequence, i.e. in `ens.seq`.

Value

Returns a list where the first element is a character vector defining whether there was a significant alignment and the second element provides the hotspot position in the UniProt sequence.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples

library(Biostrings)

ens.seq <- AAString("MDLSALREEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLK")
uni.seq <- AAString("MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLA")
ensembl_mut_position <- 25

align_to_unip(ens.seq, uni.seq, ensembl_mut_position)
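
To make the mapping concrete, the sketch below locates the 15-residue Ensembl window in the UniProt sequence by exact substring matching and shifts the hotspot position accordingly. It is an illustration only: `map_hotspot_exact()` is a hypothetical helper that is not part of DominoEffect, it works on plain character strings rather than AAString objects, and exact matching is a simplification standing in for the alignment that align_to_unip() performs.

```r
# Hypothetical helper (not part of DominoEffect): find the 15-residue
# Ensembl window in the UniProt sequence by exact substring match.
map_hotspot_exact <- function(ens, uni, pos, width = 15L) {
  half <- (width - 1L) %/% 2L
  # Centre the window on the hotspot; shift it if it would run past either end.
  start <- max(1, min(pos - half, nchar(ens) - width + 1))
  segment <- substr(ens, start, start + width - 1)
  hit <- regexpr(segment, uni, fixed = TRUE)
  if (hit == -1) return(NA_integer_)  # window not found in the UniProt sequence
  # Offset of the hotspot inside the window is preserved in the UniProt sequence.
  as.integer(hit + (pos - start))
}

ens <- "MDLSALREEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLK"
uni <- "MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLA"
map_hotspot_exact(ens, uni, 25)  # 26
```

With the example sequences, the extra valine at UniProt position 8 moves the hotspot from Ensembl position 25 to UniProt position 26. An exact match fails as soon as the window contains a substitution, which is why a real alignment is the more robust choice.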

calculate_boundary

Description

The function calculates the boundaries of sequence windows around the mutation. Up to two window lengths can be defined. If the mutation is close to the start or end of the protein, the window is extended in the other direction so that it keeps the indicated size.

Usage

calculate_boundary(mut_pos_numeric, length_aa, flanking_region)

Arguments

`mut_pos_numeric`	Amino acid position of mutation
`length_aa`	Length of transcript in amino acids
`flanking_region`	Vector containing two flanking regions

Value

Returns a list with the boundaries for the two regions.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples

calculate_boundary(250, 500, c(200, 300))
calculate_boundary(250, 500, 300)
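
The shifting behaviour described above can be sketched in base R. This is not the package's implementation: `window_bounds()` is a hypothetical helper, it assumes that each value in `flanking_region` is the total window length, and its return structure is not guaranteed to match that of calculate_boundary().

```r
# Hypothetical helper (not part of DominoEffect).
# Assumption: each element of flanking_region is a total window length.
window_bounds <- function(mut_pos, length_aa, flanking_region) {
  lapply(flanking_region, function(size) {
    size <- min(size, length_aa)            # a window cannot exceed the protein
    start <- mut_pos - (size %/% 2)         # centre the window on the mutation
    # Near either end, slide the window inwards so that it keeps its size.
    start <- max(1, min(start, length_aa - size + 1))
    c(start = start, end = start + size - 1)
  })
}

window_bounds(250, 500, c(200, 300))  # mutation in the middle of the protein
window_bounds(10, 500, 200)           # mutation near the start: window is 1..200
```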

Sample data

Description

Sample Data

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Identification of significant mutation hotspot residues.

Description

The function identifies individual amino acid residues that accumulate a high fraction of the overall mutation load within a protein. After detecting mutation hotspots, it retrieves UniProt/Swiss-Prot KB functional and structural annotations for the affected protein regions and checks that the Ensembl and UniProt sequences agree.

Usage

DominoEffect(mutation_dataset, gene_data, snp_data,
min_n_muts = 5, MAF_thresh = 0.01,
flanking_region = c(200, 300),
poisson.thr = 0.01, percentage.thr = 0.15,
ratio.thr = 45, approach = "percentage", write_to_file = "NO",
ens_release = "https://feb2023.archive.ensembl.org")

Arguments

`mutation_dataset`	Object containing a table with the mutation data (e.g. TCGA point mutations mapped to protein level).
`gene_data`	DominoData object containing information about Ensembl gene annotations: gene identifiers and representative transcript cDNA length.
`snp_data`	Object containing a table with information on population SNPs.
`min_n_muts`	Numeric vector defining a minimum number of mutations that need to occur at the same residue. Default: 5
`MAF_thresh`	Numeric vector that defines the minor allele frequency (MAF) threshold for considering reported mutations as population SNPs. Default: 0.01
`flanking_region`	Numeric vector that defines the size of the window around the mutation that will be considered. Up to two window sizes are allowed. Default: c(200, 300)
`poisson.thr`	Numeric vector that defines a threshold for the adjusted p-value. Residues with an associated p-value that is lower than the defined value are reported. Default: 0.01
`percentage.thr`	Number defining the fraction of mutations within the window that need to fall on a single residue in order for it to be classified as a hotspot. Default: 0.15
`ratio.thr`	Number defining a requirement that a number of mutations on a single residue should exceed what would be expected by chance given a background mutation rate in the window (i.e. the surrounding region). Default: 45
`approach`	Selection criterion for finding single-residue mutation clusters: use percentage.thr, ratio.thr, or both. Options: "both", "percentage" or "ratio". Default: "percentage"
`write_to_file`	Whether the identified and annotated hotspots should be written to a file ("YES" or "NO"). Default: "NO"
`ens_release`	Ensembl release to be used. Default: https://feb2023.archive.ensembl.org

Value

Results table

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples

data("SnpData", package = "DominoEffect")
data("TestData", package = "DominoEffect")
data("DominoData", package = "DominoEffect")

hotspot_mutations <- DominoEffect(mutation_dataset = TestData, 
gene_data = DominoData, snp_data = SnpData)
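
The roles of `percentage.thr`, `ratio.thr` and `poisson.thr` can be illustrated on toy data. The sketch below is not the package's code: `hotspot_stats()` is a hypothetical helper, the background rate and the Poisson model are assumptions about how such statistics could be computed, and the multiple-testing adjustment of the p-value is omitted.

```r
# Toy data: mutation counts at each residue of a 200-residue window.
counts <- rep(0, 200)
counts[c(20, 75, 130)] <- c(1, 2, 1)
counts[100] <- 12  # candidate hotspot residue

# Hypothetical helper (not part of DominoEffect).
hotspot_stats <- function(counts, idx) {
  n_hot <- counts[idx]
  n_total <- sum(counts)
  # Assumed background: mean mutation count per residue elsewhere in the window.
  background <- (n_total - n_hot) / (length(counts) - 1)
  list(
    fraction = n_hot / n_total,   # compared against percentage.thr
    ratio    = n_hot / background, # compared against ratio.thr
    # P(X >= n_hot) if mutations were spread evenly over the window.
    p_value  = ppois(n_hot - 1, lambda = n_total / length(counts),
                     lower.tail = FALSE)
  )
}

res <- hotspot_stats(counts, 100)

# Apply the default thresholds in the style of approach = "both".
percentage.thr <- 0.15
ratio.thr <- 45
poisson.thr <- 0.01
is_hotspot <- res$p_value < poisson.thr &&
  res$fraction >= percentage.thr &&
  res$ratio >= ratio.thr
```

Here 12 of the 16 mutations in the window fall on residue 100, so the residue passes both the fraction and the ratio criterion.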

Converts hotspot mutation table into a GPo object

Description

This function converts the genomic information on hotspot mutations into a GPo object.

Usage

GPo_of_hotspots(hotspot_mutations)

Arguments

`hotspot_mutations`	Data frame with information on hotspot mutations generated by the DominoEffect package.

Value

GPo object that contains the genomic information on hotspot mutations.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples


data("SnpData", package = "DominoEffect")
data("TestData", package = "DominoEffect")
data("DominoData", package = "DominoEffect")

hotspot_mutations <- DominoEffect(mutation_dataset = TestData, 
gene_data = DominoData, snp_data = SnpData)
GPo_of_hotspots(hotspot_mutations)

Identify hotspots

Description

The function identifies protein hotspot mutation residues.

Usage

identify_hotspots(mutation_dataset, gene_data,
  snp_data, min_n_muts = 5, MAF_thresh = 0.01, flanking_region = c(200, 300), 
  poisson.thr = 0.01, percentage.thr = 0.15, ratio.thr = 45, approach = "percentage")

Arguments

`mutation_dataset`	Object containing a table with the mutation data (e.g. TCGA point mutations mapped to protein level).
`gene_data`	Data frame or TxDb object containing information about Ensembl gene annotations: gene identifiers and representative transcript cDNA length.
`snp_data`	Object containing a table or VCF object with information on population SNPs.
`min_n_muts`	Numeric vector defining a minimum number of mutations that need to occur at the same residue. Default: 5
`MAF_thresh`	Numeric vector that defines the minor allele frequency (MAF) threshold for considering reported mutations as population SNPs. Default: 0.01
`flanking_region`	Numeric vector that defines the size of the window around the mutation that will be considered. Up to two window sizes are allowed. Default: c(200, 300)
`poisson.thr`	Numeric vector that defines a threshold for the adjusted p-value. Residues with an associated p-value that is lower than the defined value are reported. Default: 0.01
`percentage.thr`	Number defining the fraction of mutations within the window that need to fall on a single residue in order for it to be classified as a hotspot. Default: 0.15
`ratio.thr`	Number defining a requirement that a number of mutations on a single residue should exceed what would be expected by chance given a background mutation rate in the window (i.e. the surrounding region). Default: 45
`approach`	Selection criterion for finding single-residue mutation clusters: use percentage.thr, ratio.thr, or both. Options: "both", "percentage" or "ratio". Default: "percentage"

Value

An object containing information on the significant hotspots: associated gene and protein identifiers, number of mutations, percentage of mutations within the defined windows that map to the same residue, and associated p-values.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples


data("SnpData", package = "DominoEffect")
data("TestData", package = "DominoEffect")
data("DominoData", package = "DominoEffect")
hotspot_mutations <- identify_hotspots(mutation_dataset = TestData, 
   gene_data = DominoData, snp_data = SnpData)
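
As an illustration of how `MAF_thresh` could separate population SNPs from candidate mutations, the sketch below filters a toy mutation table in base R. The column names (`gene`, `protein_pos`, `maf`), the strict `>` comparison and the removal of matching mutations are all assumptions made for this example; they do not describe the layout of TestData or SnpData or the package's internal logic.

```r
# Toy tables with hypothetical column names.
mutations <- data.frame(
  gene = c("G1", "G1", "G2"),
  protein_pos = c(12, 40, 7),
  stringsAsFactors = FALSE
)
snps <- data.frame(
  gene = c("G1", "G2"),
  protein_pos = c(40, 7),
  maf = c(0.20, 0.001),
  stringsAsFactors = FALSE
)

MAF_thresh <- 0.01

# Variants above the threshold are treated as population SNPs.
common <- snps[snps$maf > MAF_thresh, ]

# Drop mutations that coincide with a population SNP (same gene and position).
key <- function(d) paste(d$gene, d$protein_pos, sep = ":")
filtered <- mutations[!key(mutations) %in% key(common), ]
filtered
```

Only the mutation at G1 position 40 is removed; the variant at G2 position 7 stays because its frequency is below the threshold.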

Imports TxDb data and converts it into the format required by the DominoEffect package

Description

This function imports TxDb data and converts it into the data frame used by the DominoEffect package, extracting only the required information from the TxDb object.

Usage

import_txdb(txdb_object)

Arguments

`txdb_object`	TxDb object, e.g. from makeTxDbFromEnsembl.

Value

Data frame that can be used in DominoEffect package.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples


#EnsTxDB <- makeTxDbFromEnsembl(organism="Homo sapiens", release=73, 
#                               server="ensembldb.ensembl.org")
#DominoData <- import_txdb(EnsTxDB)
#head(DominoData)

Functional annotation of significant hotspot residues.

Description

This function retrieves UniProt annotations for the functional elements in the proteins with significant hotspots and maps the hotspot residues to the elements they overlap.

Usage

map_to_func_elem(hotspot_results, write_to_file = "NO", ens_release = "109")

Arguments

`hotspot_results`	Object containing information about the hotspot residues identified using the function identify_hotspots().
`write_to_file`	A character vector defining whether the resulting annotated hotspots should be saved in a file ("YES" or "NO"). Default: "NO"
`ens_release`	A character vector specifying which Ensembl release the gene annotations correspond to. The default, "109", uses the default gene annotations, i.e. Ensembl release 109. If gene_data correspond to the current Ensembl version, set ens_release="www.ensembl.org"; for an archived version, set it to the corresponding archive website.

Value

Updated results table containing additional columns with information on the annotated functional and structural regions to which the mutation maps.

Author(s)

Marija Buljan <[email protected]> Peter Blattmann <[email protected]>

Examples

data("TestData", package = "DominoEffect")
data("DominoData", package = "DominoEffect")
data("SnpData", package = "DominoEffect")

hotspot_mutations <- identify_hotspots(TestData, DominoData, SnpData)
hotspot_mutations <- map_to_func_elem(hotspot_mutations, 
write_to_file = "NO", ens_release = "109")

head(hotspot_mutations)
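
The overlap step can be pictured with a toy feature table. In the sketch below, `overlapping_features()` and the columns `type`, `start` and `end` are hypothetical; the example assumes that each annotated element is described by a start and end position on the protein and does not reflect the format of the UniProt annotations that map_to_func_elem() retrieves.

```r
# Toy annotation table with hypothetical columns.
features <- data.frame(
  type  = c("DOMAIN", "BINDING", "REGION"),
  start = c(10, 58, 120),
  end   = c(95, 58, 180),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of DominoEffect): keep the features whose
# start..end interval contains the hotspot position.
overlapping_features <- function(pos, features) {
  features[features$start <= pos & pos <= features$end, , drop = FALSE]
}

overlapping_features(58, features)$type   # "DOMAIN" "BINDING"
nrow(overlapping_features(100, features)) # 0: no annotated element at this position
```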
