Package 'KinSwingR' reference manual

Title:	KinSwingR: network-based kinase activity prediction
Description:	KinSwingR integrates phosphosite data derived from mass spectrometry with kinase-substrate predictions to predict kinase activity. Several functions allow the user to build PWM models of kinase-substrates, statistically infer PWM:substrate matches, and integrate these data to infer kinase activity.
Authors:	Ashley J. Waardenberg [aut, cre]
Maintainer:	Ashley J. Waardenberg <[email protected]>
License:	GPL-3
Version:	1.25.0
Built:	2025-03-29 04:32:13 UTC
Source:	https://github.com/bioc/KinSwingR

Generate Position Weight Matrices (PWMs)

Description

Generate Position Weight Matrices (PWMs) from a table containing centered substrate peptide sequences for a list of kinases. The output of this function is used to score PWM matches to peptides via scoreSequences().

Usage

buildPWM(kinase_table = NULL, wild_card = "_", substrate_length = 15,
  substrates_n = 10, pseudo = 0.01, remove_center = FALSE,
  verbose = FALSE)

Arguments

`kinase_table`	A data.frame of substrate sequences and kinase names. The data must be formatted as follows: column 1 - kinase/kinase family name/GeneID, column 2 - centered peptide sequence.
`wild_card`	Letter used to represent positions that fall outside of the protein after centering on the phosphosite (e.g. ___MERSTRELCLNF). Default: "_".
`substrate_length`	Full length of substrate sequence (default is 15). Sequences in kinase_table are trimmed automatically to this length; an error is reported if they are not long enough.
`substrates_n`	Minimum number of substrate sequences required to build a PWM model for a kinase. Low sequence counts will produce poorly representative PWM models. Default: 10
`pseudo`	Small number to add to values for PWM log transformation to prevent log transformation of zero. Default = 0.01
`remove_center`	Remove all peptide sequences with the central amino acid matching a character (e.g. "y"). Default = FALSE
`verbose`	Print progress to screen. Default=FALSE
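
The arguments above can be illustrated with a minimal base R sketch of how a log-transformed position matrix might be derived from centered sequences. This is not the package's implementation: the toy kinase name, the 5-residue peptides and the frequency normalisation are assumptions made for illustration, and buildPWM() may compute its models differently.

```r
## Toy kinase_table: column 1 = kinase name, column 2 = centered peptide
## (hypothetical kinase and 5-residue sequences, for illustration only)
toy_table <- data.frame(
  kinase    = c("KIN1", "KIN1", "KIN1"),
  substrate = c("RRSLA", "RKSLG", "ARSPA"),
  stringsAsFactors = FALSE
)

## Split each peptide into single characters:
## one row per peptide, one column per position
residues <- do.call(rbind, strsplit(toy_table$substrate, ""))

## Count each amino acid at each position
symbols <- sort(unique(as.vector(residues)))
counts <- sapply(seq_len(ncol(residues)), function(pos) {
  table(factor(residues[, pos], levels = symbols))
})

## Convert counts to per-position frequencies, then add the pseudo value
## so that zero frequencies can be log-transformed
pseudo <- 0.01
freqs <- sweep(counts, 2, colSums(counts), "/")
log_pwm <- log2(freqs + pseudo)
```

Rows of log_pwm correspond to amino acids and columns to positions. Without the pseudo value, every amino acid that is absent at a position would yield log2(0) = -Inf.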

Value

A list containing two tables, "pwm" and "kinase". For an output stored as pwms, access the PWMs with pwms$pwm and the table of kinases and sequence counts with pwms$kinase.

Examples

## Build PWM models from phosphositeplus data with default of minimum
## of 10 substrate sequences for building a PWM model.

data(phosphositeplus_human)

## Randomly sample 1000 substrates for demonstration.
set.seed(1)
sample_pwm <- phosphositeplus_human[sample(nrow(phosphositeplus_human),
                                           1000), ]
pwms <- buildPWM(sample_pwm)

## Data frame of models built and number of sequences used to build each
## PWM model:
head(pwms$kinase)

Function for extracting peptide sequences from multimapped or complex annotated data

Description

This function extracts unique peptide:annotation combinations from complex annotated data and formats them for further analysis using KinSwingR. For instance, an input annotation may be: "A0A096MIX2|Ddx17|494|RSRYRTTSSANNPN". This function will extract the peptide sequence into a second column and associate it with all annotations. See vignette for more details.

Usage

cleanAnnotation(input_data = NULL, annotation_delimiter = "|",
  multi_protein_delimiter = ":", multi_site_delimiter = ";",
  seq_number = 4, replace = FALSE, replace_search = "X",
  replace_with = "_", verbose = FALSE)

Arguments

`input_data`	A data.frame of phosphopeptide data. Must contain 4 columns in the following format: Column 1 - Annotation, Column 2 - centered peptide sequence, Column 3 - Fold Change [-ve to +ve], Column 4 - p-value [0-1]. The function extracts the peptide sequences from Column 1 and replaces all values in Column 2 with them, for use in scoreSequences(). Where peptide sequences have not yet been extracted from the annotation, leave Column 2 as NA.
`annotation_delimiter`	The character used to delimit annotations. Default="|"
`multi_protein_delimiter`	The character used to delimit multi-protein assignments. Default=":". E.g. Ddx17:Ddx2
`multi_site_delimiter`	The character used to delimit multi-site assignments. Default=";". E.g. 494;492
`seq_number`	The annotation frame that contains the sequence after delimitation. E.g. the sequence "RSRYRTTSSANNPN" is contained in the 4th annotation frame of the annotation "A0A096MIX2|Ddx17|494|RSRYRTTSSANNPN", which would therefore require seq_number=4. Default=4
`replace`	Replace a letter that describes sequences outside of the protein after centering on the phosphosite (e.g. X in XXXMERSTRELCLNF). Use in combination with replace_search and replace_with to replace amino acids. Options are TRUE or FALSE. Default=FALSE
`replace_search`	Amino Acid to search for when replacing sequences. Default="X"
`replace_with`	Amino Acid to replace with when replacing sequences. Default="_"
`verbose`	Print progress to screen. Default=FALSE
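
How the delimiter and seq_number arguments relate to an annotation string can be sketched in base R. This only illustrates the idea with strsplit() and gsub(); it is not how cleanAnnotation() is implemented, and the variable names are chosen for this example.

```r
## Example annotation string from the Description above
annotation <- "A0A096MIX2|Ddx17|494|RSRYRTTSSANNPN"

## Split on the literal "|" delimiter; fixed = TRUE is needed because "|"
## is a special character in regular expressions
frames <- strsplit(annotation, "|", fixed = TRUE)[[1]]

## The peptide sits in the 4th annotation frame (seq_number = 4)
seq_number <- 4
peptide <- frames[seq_number]

## Multi-site assignments are separated by ";" within a frame
sites <- strsplit("494;492", ";", fixed = TRUE)[[1]]

## Replacing a placeholder letter with the wild card, as described for
## replace = TRUE, replace_search = "X" and replace_with = "_"
padded  <- "XXXMERSTRELCLNF"
cleaned <- gsub("X", "_", padded, fixed = TRUE)
```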

Value

A data.table with the peptides extracted from the annotation column

Examples

## Extract peptide sequences from annotation data:
data(example_phosphoproteome)

## A0A096MJ61|NA|89|PRRVRNLSAVLAART
## The following will extract all the uniquely annotated peptide
## sequences from the "annotation" column and place these in the
## "peptide" column. Where multi-mapped peptide sequences are input,
## these are placed on a new line.
##
## Here, any "X" in a sequence is also replaced with "_". This is to ensure
## that PWMs are built correctly.

## Sample data for demonstration:
sample_data <- head(example_phosphoproteome)
annotated_data <- cleanAnnotation(input_data = sample_data,
                                   annotation_delimiter = "|",
                                   multi_protein_delimiter = ":",
                                   multi_site_delimiter = ";",
                                   seq_number = 4,
                                   replace = TRUE,
                                   replace_search = "X",
                                   replace_with = "_")

## Return the annotated data with extracted peptides:
head(annotated_data)

Example phosphoproteome.

Description

A dataset containing annotated substrate sequences derived from XXX. See original publication for more details: Engholm-Keller & Waardenberg AJ et al.

Usage

example_phosphoproteome

Format

A data frame with 6215 rows and 4 variables:

annotation: Annotation of phosphorylated peptides
peptide: blank - peptides need to be extracted from annotation
fc: Fold Change (log2)
pval: P-value for fold-change.

KinSwingR: A package for predicting kinase activity

Description

This package provides functionality for kinase-substrate prediction, and integration with phosphopeptide fold change and significance to assess the local connectivity (swing) of kinase-substrate networks. The final output of KinSwingR is a score that is normalised and weighted for prediction of kinase activity.

Details

Contact [email protected] for questions relating to functionality.

buildPWM function

Builds PWMs for kinases from a table of kinases and known substrate sequences.

scoreSequences function

Scores kinase PWM matches against a set of peptide sequences.

swing function

Integrates kinase PWM matches against peptide sequences with the directionality and significance of peptides to predict kinase activity.

cleanAnnotation function

Extracts peptides from multimapped data.

Human kinase-substrates derived from PhosphositePlus.

Description

A dataset containing human kinases and substrate sequences. See original publication for more details: Hornbeck et al. Nucleic Acids Res. 40:D261-70, 2012

Usage

phosphositeplus_human

Format

A data frame with 11985 rows and 2 variables:

kinase: human kinase gene symbol
substrate: centered substrate sequence for kinase

Source

https://www.phosphosite.org/
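
For a table in this two-column layout, the number of substrate sequences available per kinase can be checked before building models. The sketch below uses a small hypothetical table rather than the real dataset; the kinase names, peptides and the min_n threshold are illustrative only.

```r
## Toy table in the same two-column layout (hypothetical kinases and peptides)
toy_kinases <- data.frame(
  kinase    = c("KIN1", "KIN1", "KIN1", "KIN2"),
  substrate = c("RRSLA", "RKSLG", "ARSPA", "EESDD"),
  stringsAsFactors = FALSE
)

## Number of substrate sequences available per kinase
n_per_kinase <- table(toy_kinases$kinase)

## Kinases with at least min_n sequences (compare with the substrates_n
## argument of buildPWM, which defaults to 10)
min_n <- 3
enough <- names(n_per_kinase)[as.vector(n_per_kinase) >= min_n]
```

Applying the same table() call to phosphositeplus_human$kinase gives the counts for the real dataset.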

Score substrate sequences for matches to kinase Position Weight Matrices (PWMs)

Description

Scores each input sequence for a match against all PWMs provided from buildPWM() and generates p-values for scores. The output of this function is to be used for building the swing metric, the predicted activity of kinases.

Usage

scoreSequences(input_data = NULL, pwm_in = NULL,
  background = "random", n = 1000, force_trim = FALSE,
  verbose = FALSE)

Arguments

`input_data`	A data.frame of phosphopeptide data. Must contain 4 columns in the following format: Column 1 - Annotation, Column 2 - centered peptide sequence, Column 3 - Fold Change [-ve to +ve], Column 4 - p-value [0-1]
`pwm_in`	List of PWMs created using buildPWM()
`background`	Either a data.frame of peptides to use as background, or "random". A background table must contain two columns: Column 1 - Annotation, Column 2 - centered peptide sequence. The sequences must be centered. With background = "random", a random background for PWM scoring is generated from the input list. Default: "random"
`n`	Number of permutations to perform for generating background. Default: 1000
`force_trim`	This function will detect if a peptide sequence is of different length to the PWM models generated (provided in pwm_in) and trim the input sequences to the same length as the PWM models. If a background is provided, this will also be trimmed to the same width as the PWM models. Options are TRUE or FALSE. Default = FALSE
`verbose`	Turn verbosity on/off. To turn on, verbose=TRUE. Options are TRUE or FALSE. Default = FALSE
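
The role of the background in producing p-values can be sketched as an empirical p-value calculation. This is a generic approach and an assumption about the principle only: the simulated scores, the observed score and the add-one correction below are illustrative, and scoreSequences() may calculate its p-values differently.

```r
set.seed(1234)

## Hypothetical background: scores of n randomly drawn peptides against
## one PWM (simulated here with rnorm, purely for illustration)
n <- 1000
background_scores <- rnorm(n)

## Hypothetical score of one input peptide against the same PWM
observed_score <- 2.5

## Empirical p-value: share of background scores at least as large as the
## observed score; adding 1 to numerator and denominator avoids p = 0
p_value <- (sum(background_scores >= observed_score) + 1) / (n + 1)
```

With this estimator the smallest attainable p-value is 1 / (n + 1), so a small n such as the n = 10 used in the examples only gives a coarse estimate.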

Value

A list with the following elements: 1) PWM-substrate scores: substrate_scores$peptide_scores, 2) PWM-substrate p-values: substrate_scores$peptide_p, 3) the background used, for reproducibility: substrate_scores$background, 4) input_data, returned in case it was trimmed.

Examples

## import data
data(example_phosphoproteome)
data(phosphositeplus_human)

## clean up the annotations
## sample 100 data points for demonstration
sample_data <- head(example_phosphoproteome, 100)
annotated_data <- cleanAnnotation(input_data = sample_data)

## build the PWM models:
set.seed(1234)
sample_pwm <- phosphositeplus_human[sample(nrow(phosphositeplus_human),
                                           1000), ]
pwms <- buildPWM(sample_pwm)

## score the PWM - substrate matches
## Using a "random" background, to calculate the p-value of the matches
## Using n=10 for demonstration
## set.seed for reproducibility
set.seed(1234)
substrate_scores <- scoreSequences(input_data = annotated_data,
                                   pwm_in = pwms,
                                   background = "random",
                                   n = 10)

Swing statistic

Description

This function integrates the kinase-substrate predictions, directionality of phosphopeptide fold change and significance to assess local connectivity (swing) of kinase-substrate networks. The final score is a normalised and weighted score of predicted kinase activity. If permutations are selected, network node:edges are permuted. P-values will be calculated for both ends of the distribution of swing scores (positive and negative swing scores).

Usage

swing(input_data = NULL, pwm_in = NULL, pwm_scores = NULL,
  pseudo_count = 1, p_cut_pwm = 0.05, p_cut_fc = 0.05,
  permutations = 1000, return_network = FALSE, verbose = FALSE)

Arguments

`input_data`	A data.frame of phosphopeptide data. Must contain 4 columns in the following format: Column 1 - Annotation, Column 2 - centered peptide sequence, Column 3 - Fold Change [-ve to +ve], Column 4 - p-value [0-1]. This must be the same data.frame used in scoreSequences()
`pwm_in`	List of PWMs created using buildPWM()
`pwm_scores`	List of PWM-substrate scores created using scoreSequences()
`pseudo_count`	The pseudo-count acts at two levels: 1) it adds a small number to the counts to avoid division by zero, which also 2) avoids log-zero transformations. Note that the pos, neg and all values in the output table therefore include the pseudo-count. Default: 1
`p_cut_pwm`	Significance level for determining a significant kinase-substrate enrichment. Default: 0.05
`p_cut_fc`	Significance level for determining a significant level of fold change in the phosphoproteomics data. Default: 0.05
`permutations`	Number of permutations to perform. This will shuffle the kinase-substrate edges of the network n times. To not perform permutations and only generate the scores, set permutations=1 or permutations=FALSE. Default: 1000
`return_network`	Option to return an interaction network for visualising in Cytoscape. Default = FALSE
`verbose`	Turn verbosity on/off. To turn on, verbose=TRUE. Options are TRUE or FALSE. Default=FALSE
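
How the two significance cut-offs and the pseudo-count interact can be sketched for a single kinase in base R. The values, the column names and the use of strict "<" comparisons are assumptions made for this example; the sketch covers only the counting step and does not reproduce the normalised and weighted swing score.

```r
## Toy network for one kinase: PWM-match p-value, fold change and
## fold-change p-value for each peptide (hypothetical values)
edges <- data.frame(
  pwm_p = c(0.01, 0.03, 0.20, 0.04, 0.02),
  fc    = c(1.5, -2.0, 3.0, 0.8, -1.1),
  fc_p  = c(0.001, 0.01, 0.02, 0.30, 0.04)
)
p_cut_pwm    <- 0.05
p_cut_fc     <- 0.05
pseudo_count <- 1

## Keep peptides that are both predicted substrates of the kinase and
## significantly changed (strict "<" is an assumption of this sketch)
keep <- edges$pwm_p < p_cut_pwm & edges$fc_p < p_cut_fc

## Count positive and negative edges, adding the pseudo-count to each so
## that later ratios and log transformations never meet a zero
pos <- sum(keep & edges$fc > 0) + pseudo_count
neg <- sum(keep & edges$fc < 0) + pseudo_count
```

Here three peptides pass both cut-offs, one with a positive and two with a negative fold change, so pos is 2 and neg is 3 once the pseudo-count is included.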

Value

A data.table of swing scores

Examples

## import data
data(example_phosphoproteome)
data(phosphositeplus_human)

## clean up the annotations
## sample 100 data points for demonstration
sample_data <- head(example_phosphoproteome, 100)
annotated_data <- cleanAnnotation(input_data = sample_data)

## build the PWM models:
set.seed(1234)
sample_pwm <- phosphositeplus_human[sample(nrow(phosphositeplus_human),
                                           1000), ]
pwms <- buildPWM(sample_pwm)

## score the PWM - substrate matches
## Using a "random" background, to calculate the p-value of the matches
## Using n = 100 for demonstration
## set.seed for reproducibility
set.seed(1234)
substrate_scores <- scoreSequences(input_data = annotated_data,
                                   pwm_in = pwms,
                                   background = "random",
                                   n = 100)

## Use substrate_scores and annotated_data data to predict kinase activity.
## This will permute the network node and edges 10 times for demonstration.
## set.seed for reproducibility
set.seed(1234)
swing_output <- swing(input_data = annotated_data,
                      pwm_in = pwms,
                      pwm_scores = substrate_scores,
                      permutations = 10)

View motif

Description

View information content for each position of the PWM. Information content is modelled using Shannon's Entropy Model. The maximum information content is therefore log2(n), where n is the number of amino acids. Amino acids are colored according to the scheme selected with color_scheme ("lesk" or "shapely").
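
The log2(n) maximum can be checked with a short base R sketch of Shannon information content for a single PWM position. The helper position_bits() is defined here for illustration and is not a KinSwingR function; the small-sample correction controlled by correction_factor is not included.

```r
## Maximum information content for an alphabet of 20 amino acids
n_aa <- 20
max_bits <- log2(n_aa)

## Information content of one position, given a vector of probabilities:
## log2(n) minus the Shannon entropy (zero probabilities contribute nothing)
position_bits <- function(p, n = 20) {
  p <- p[p > 0]
  log2(n) + sum(p * log2(p))
}

uniform   <- rep(1 / n_aa, n_aa)     # no amino acid preference
conserved <- c(1, rep(0, n_aa - 1))  # a single amino acid only

bits_uniform   <- position_bits(uniform)    # approximately 0 bits
bits_conserved <- position_bits(conserved)  # equals log2(20) bits
```

A position with no amino acid preference carries no information, while a fully conserved position reaches the maximum of log2(20), roughly 4.32 bits.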

Usage

viewPWM(pwm_in = NULL, which_pwm = NULL, fontsize = 10,
  view_pwm = FALSE, pseudo = 0.01, convert_PWM = FALSE,
  color_scheme = "shapely", correction_factor = NULL)

Arguments

`pwm_in`	PWM to view, as provided by buildPWM(). Default = NULL
`which_pwm`	If the output of buildPWM() is supplied, the name of the kinase to view. It must match a name in pwms$kinase$kinase. Default = NULL
`fontsize`	Font size to use on x and y axis. Default = 10
`view_pwm`	View the PWM. Default = FALSE
`pseudo`	Small amount added to the PWM model, where zeros exist, to avoid log zero. Default = 0.01
`convert_PWM`	Set to TRUE if pwm_in is a matrix of counts at each position; the matrix will then be converted to a PWM. Default = FALSE
`color_scheme`	Which color scheme to use for Amino Acid Groups. Options are "lesk" or "shapely". Default = "shapely"
`correction_factor`	Number of sequences used to infer the PWM. This can be used where a small number of sequences were used to build the model and included as E_n in the Shannon's Entropy Model. Default = NULL

Value

A visualisation of the motif, scaled in bits, and two tables: 1) pwm: the PWM from pwm_in and 2) pwm_bits: the PWM converted to bits.

Examples

## Build PWM models from phosphositeplus data with default of minimum
## of 10 substrate sequences for building a PWM model.
data(phosphositeplus_human)
## Randomly sample 1000 substrates for demonstration.
set.seed(1)
sample_pwm <- phosphositeplus_human[sample(nrow(phosphositeplus_human),
                                           1000), ]
pwms <- buildPWM(sample_pwm)

## Data frame of models built and number of sequences used to build each
## PWM model:
head(pwms$kinase)
## Will not visualise the motif
CAMK2A_motif <- viewPWM(pwm_in = pwms, 
                        which_pwm = "CAMK2A",
                        view_pwm = FALSE)
# Use view_pwm = TRUE to view the motif
