Title: | Package to Draw Protein Schematics from Uniprot API output |
---|---|
Description: | This package draws protein schematics from Uniprot API output. From the JSON returned by the GET command, it creates a dataframe from the Uniprot Features API. This dataframe can then be used by geoms based on ggplot2 and base R to draw protein schematics. |
Authors: | Paul Brennan [aut, cre] |
Maintainer: | Paul Brennan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.27.0 |
Built: | 2024-11-30 05:49:49 UTC |
Source: | https://github.com/bioc/drawProteins |
draw_canvas uses the dataframe containing the protein features to create the basic plot element by determining the length of the longest protein and the number of proteins to plot.
draw_canvas(data)
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
A ggplot2 object either in the plot window or as an object.
# draws a blank canvas of the correct size
data("five_rel_data")
draw_canvas(five_rel_data)
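As a rough illustration of the sizing logic described for draw_canvas, the following base R sketch computes the two quantities that determine the canvas: the longest protein and the number of proteins. The toy data frame and the helper name canvas_dims are hypothetical and not part of drawProteins; how draw_canvas itself derives its plot limits may differ.

```r
# Toy feature table with the columns draw_canvas expects (values are made up).
toy <- data.frame(
  type        = c("CHAIN", "DOMAIN", "CHAIN"),
  description = c("Protein A", "Domain 1", "Protein B"),
  begin       = c(1, 20, 1),
  end         = c(551, 300, 900),
  length      = c(550, 280, 899),
  accession   = c("A0001", "A0001", "B0002"),
  entryName   = c("PROTA_HUMAN", "PROTA_HUMAN", "PROTB_HUMAN"),
  taxid       = c(9606, 9606, 9606),
  order       = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the x-range comes from the longest protein and the
# y-range from the number of proteins (taken here as the largest 'order').
canvas_dims <- function(data) {
  if (!any(data$type == "CHAIN")) {
    stop("data must contain at least one CHAIN row")
  }
  list(
    longest_protein = max(data$end, na.rm = TRUE),
    n_proteins      = max(data$order, na.rm = TRUE)
  )
}

dims <- canvas_dims(toy)
```

The check for a "CHAIN" row mirrors the requirement stated for the data argument.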
draw_chains uses the dataframe containing the protein features to plot the chains, the full-length proteins. It creates the basic plot element by determining the length of the longest protein. The ggplot2 function geom_rect is then used to draw each of the protein chains proportional to its number of amino acids (length).
draw_chains(p, data = data, outline = "black", fill = "grey", label_chains = TRUE, labels = data[data$type == "CHAIN",]$entryName, size = 0.5, label_size = 4)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
outline |
Colour of the outline of each chain. |
fill |
Colour of the fill of each chain. |
label_chains |
Option to label chains or not. |
labels |
Vector with source of names for the chains. EntryName used as default but can be changed. |
size |
Size of the outline of the chains. |
label_size |
Size of the text used for labels. |
A ggplot2 object either in the plot window or as an object.
# combines with draw_canvas to plot and label chains.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
draw_chains(p, five_rel_data)
# draws five chains with different colours to default
data("five_rel_data")
p <- draw_canvas(five_rel_data)
draw_chains(p, five_rel_data, label_chains = FALSE, fill = "red", outline = "grey")
draw_domains adds domains to the ggplot2 object created by draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw each of the domains proportional to its number of amino acids (length).
draw_domains(p, data = data, label_domains = TRUE, label_size = 4, show.legend = TRUE, type = "DOMAIN")
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
label_domains |
Option to label domains or not. |
label_size |
Size of the text used for labels. |
show.legend |
Option to include legend in this layer |
type |
Can change to show other protein features |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to plot chains and domains.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_domains(p, five_rel_data)
draw_folding adds alpha-helices, beta-strands and turns to the ggplot2 object created by draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw the parts of the protein chain that form alpha-helices, beta-strands and turns, proportional to their number of amino acids (length).
draw_folding(p, data = data, show.legend = TRUE, show_strand = TRUE, show_helix = TRUE, show_turn = TRUE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Uses STRAND, HELIX and TURN type to indicate these parts of the proteins. |
show.legend |
Option to include legend in this layer |
show_strand |
Option to show STRAND in this layer |
show_helix |
Option to show HELIX in this layer |
show_turn |
Option to show TURN in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to colour chain with helices, strands and turns.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_folding(p, five_rel_data)
draw_motif adds protein motifs from Uniprot to the ggplot2 object created by draw_canvas and draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw each of the motifs proportional to its number of amino acids (length).
draw_motif(p, data = data, show.legend = TRUE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
show.legend |
Option to include legend in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to plot chains and motifs
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_motif(p, five_rel_data)
draw_phospho adds phosphorylation sites to the ggplot2 object created by draw_canvas and draw_chains. It uses the data object. The ggplot2 function geom_point is used to draw each of the phosphorylation sites at its location as determined by the data object.
draw_phospho(p, data = data, size = 2, fill = "yellow", show.legend = FALSE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
size |
Size of the circle |
fill |
Colour of the circle. |
show.legend |
Option to include legend in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_point layer.
# combines well with draw_domains to plot chains and phosphorylation sites.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_phospho(p, five_rel_data)
draw_recept_dom adds receptor domains to the ggplot2 object created by draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw each of the receptor domains proportional to its number of amino acids (length).
draw_recept_dom(p, data = data, label_domains = FALSE, label_size = 4, show.legend = TRUE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Uses TOPO_DOM and TRANSMEM type to plot these parts of receptors |
label_domains |
Option to label receptor domains or not. |
label_size |
Size of the text used for labels. |
show.legend |
Option to include legend in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to plot chains and domains.
# we like to draw receptors vertically so flip using ggplot2 functions
# scale_x_reverse and coord_flip
data("tnfs_data")
p <- draw_canvas(tnfs_data)
p <- draw_chains(p, tnfs_data, label_size = 1.25)
draw_recept_dom(p, tnfs_data) + ggplot2::scale_x_reverse() + ggplot2::coord_flip()
draw_regions adds protein regions from Uniprot to the ggplot2 object created by draw_canvas and draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw each of the regions proportional to its number of amino acids (length).
draw_regions(p, data = data, show.legend = TRUE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
show.legend |
Option to include legend in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to plot chains and regions.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_regions(p, five_rel_data)
draw_repeat adds protein repeats from Uniprot to the ggplot2 object created by draw_canvas and draw_chains. It uses the data object. The ggplot2 function geom_rect is used to draw each of the repeats proportional to its number of amino acids (length).
draw_repeat(p, data = data, label_size = 2, outline = "dimgrey", fill = "dimgrey", label_repeats = TRUE, show.legend = TRUE)
p |
ggplot2 object ideally created with |
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
label_size |
Size of text used for labels of protein repeats. |
outline |
Colour of the outline of each repeat. |
fill |
Colour of the fill of each repeat. |
label_repeats |
Option to label repeats or not. |
show.legend |
Option to include legend in this layer |
A ggplot2 object either in the plot window or as an object with an additional geom_rect layer.
# combines with draw_chains to plot chains and repeats.
data("five_rel_data")
p <- draw_canvas(five_rel_data)
p <- draw_chains(p, five_rel_data, label_size = 1.25)
draw_repeat(p, five_rel_data)
This package has been created to allow the visualisation of protein schematics based on the data obtained from the [Uniprot Protein Database](http://www.uniprot.org/).
Converts the list of 6 (a JSON object) created by getting the features from UniProt into a dataframe. Used in feature_to_dataframe(). Does not assign an order. Does not operate on a list of lists, only on a single list of 6.
extract_feat_acc(features_list)
features_list |
A JSON object - list of 6 with features inside. Created as one of the lists in the list of lists by the get_features() function. |
A dataframe with the feature columns "type", "description", "begin" and "end", plus accession, entryName and taxid for each row.
data("five_rel_list")
one_protein_features <- extract_feat_acc(five_rel_list[[1]])
head(one_protein_features)
Extracts protein names from JSON object produced by a search of Uniprot with a single protein asking for all the information. The search produces a Large list that contains all the Uniprot information about a protein.
extract_names(protein_json)
protein_json |
A JSON object from a search with 14 primary parts |
A list of 6 with "accession", "name", "protein.recommendedName.fullName", "gene.name.primary", "gene.name.synonym" and "organism.name.scientific".
# using internal data
data("protein_json")
prot_names <- extract_names(protein_json)
# generates a list of 6
## Not run:
# access the Uniprot Protein API
uniprot_acc <- c("Q04206") # change this for your favourite protein
# Get UniProt entry by accession
acc_uniprot_url <- c("https://www.ebi.ac.uk/proteins/api/proteins?accession=")
comb_acc_api <- paste0(acc_uniprot_url, uniprot_acc)
# basic function is GET(), which accesses the API
# requires internet access
protein <- httr::GET(comb_acc_api, httr::accept_json())
httr::status_code(protein) # a return value of 200 means it worked
# use content() function from httr to give us a list
protein_json <- httr::content(protein)
# gives a Large list with 14 primary parts and lots of bits inside
# function from this package to extract names of protein
names <- extract_names(protein_json)
## End(Not run)
This function works on the object returned by the get_features() function. It creates a data.frame of features and includes the accession number and an order number. It uses the extract_feat_acc function.
extract_transcripts(data)
data |
Dataframe of one or more rows with the following column names: 'type', 'description', 'begin', 'end', 'length', 'accession', 'entryName', 'taxid', 'order'. Must contain a minimum of one "CHAIN" as data$type. |
A dataframe with extra rows if multiple transcripts were present. Extra transcripts are given an order at the end of the object. Each new row should have 9 variables: type, description, begin, end, length, accession, entryName, taxid and order for plotting.
data(five_rel_data)
new_data <- extract_transcripts(five_rel_data)
# because there are two entries with two transcripts
max(new_data$order) # should now be 7...
This function works on the object returned by the get_features() function. It creates a data.frame of features and includes the accession number AND an order number. It uses the extract_feat_acc function below.
feature_to_dataframe(features_in_lists_of_six)
features_in_lists_of_six |
A list of lists returned by get_features(). The number of lists corresponds to the number of accession numbers queried using get_features(). Each list of 6 contains protein names and features. |
A dataframe with 9 variables including type, description, begin, end, length, accession, entryName, taxid and order for plotting.
data("rel_json")
rel_data <- feature_to_dataframe(rel_json)
head(rel_data)
data("five_rel_list")
prot_data <- feature_to_dataframe(five_rel_list)
head(prot_data)
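The flattening step described for feature_to_dataframe can be sketched in base R on a toy input. This is only an illustration under stated assumptions: the nested list below is a simplified stand-in for the output of get_features() (real entries hold more fields), the helper names one_protein_to_df and toy_features_to_df are hypothetical, and length is taken here as end minus begin, which may not match the package exactly.

```r
# Simplified stand-in for the list of lists returned by get_features().
toy_list <- list(
  list(accession = "A0001", entryName = "PROTA_HUMAN", taxid = 9606,
       features = list(
         list(type = "CHAIN",  description = "Protein A", begin = "1",  end = "551"),
         list(type = "DOMAIN", description = "Domain 1",  begin = "20", end = "300")
       )),
  list(accession = "B0002", entryName = "PROTB_HUMAN", taxid = 9606,
       features = list(
         list(type = "CHAIN", description = "Protein B", begin = "1", end = "900")
       ))
)

# Hypothetical helper: flatten the features of one protein into rows and
# attach the protein-level identifiers to every row.
one_protein_to_df <- function(protein) {
  rows <- lapply(protein$features, function(f) {
    data.frame(type = f$type, description = f$description,
               begin = as.numeric(f$begin), end = as.numeric(f$end),
               stringsAsFactors = FALSE)
  })
  df <- do.call(rbind, rows)
  df$length    <- df$end - df$begin  # assumed definition of length
  df$accession <- protein$accession
  df$entryName <- protein$entryName
  df$taxid     <- protein$taxid
  df
}

# Hypothetical helper: bind all proteins and number them in input order.
toy_features_to_df <- function(proteins) {
  dfs <- lapply(seq_along(proteins), function(i) {
    df <- one_protein_to_df(proteins[[i]])
    df$order <- i
    df
  })
  do.call(rbind, dfs)
}

toy_df <- toy_features_to_df(toy_list)
```

The result has one row per feature and the 9 columns listed in the return value, with order recording the position of each protein in the input list.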
Dataframe of features of 5 human NFkappaB proteins from Uniprot on 1 Nov 2017
five_rel_data
A data frame with 320 rows and 9 variables:
type | type of features, e.g. chain |
description | long name for the protein |
begin | starting position (amino acid number) of feature |
end | ending position (amino acid number) of feature |
length | length of feature, in number of amino acids |
accession | protein Uniprot accession number |
entryName | protein Uniprot entry name |
taxid | taxonomic identification (species) |
order | plotting order from the bottom of the graph |
Uniprot http://www.uniprot.org Accession numbers Q04206 Q01201 Q04864 P19838 Q00653
List of features from five human NFkappaB proteins downloaded from Uniprot on 15 August 2017
five_rel_list
Large List of 5 elements - one element for each protein
Uniprot http://www.uniprot.org Accession numbers Q04206 Q01201 Q04864 P19838 Q00653
This function creates the URL required to query the UniProt API and returns the features of the protein or proteins in JSON format. It uses the GET() function from the httr package.
get_features(proteins_acc)
proteins_acc |
A vector of length 1 with one or more UniProt accession numbers separated by spaces. |
If there is internet access and the UniProt accession numbers are good, the function will return a list of lists. The list will be of length equivalent to the number of Uniprot accession numbers supplied. The lists inside will be of length 6 and will contain information about the proteins and the features.
# Requires internet access
prot_data <- get_features("Q04206 Q01201 Q04864 P19838 Q00653")
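The URL-building step described for get_features can be sketched without touching the network. The endpoint, its query parameters and the use of "%2C" (an encoded comma) to join accessions are assumptions about the EBI Proteins API, and build_features_url is a hypothetical helper, not a drawProteins function.

```r
# Hypothetical helper: turn a space-separated string of accession numbers
# into a query URL. The endpoint and parameters are assumed, not confirmed.
build_features_url <- function(proteins_acc) {
  base_url <- "https://www.ebi.ac.uk/proteins/api/features?offset=0&size=100&accession="
  # split on whitespace, then join the accessions with an encoded comma
  acc <- strsplit(trimws(proteins_acc), "\\s+")[[1]]
  paste0(base_url, paste(acc, collapse = "%2C"))
}

url <- build_features_url("Q04206 Q01201")
# A real request could then be made with, for example,
# httr::GET(url, httr::accept_json()), which requires internet access.
```

Keeping URL construction separate from the request makes the string easy to inspect before any call is made.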
parse_gff loads protein information from a file, or downloads it from a URL if one is provided, and then converts it to work with draw_canvas and the other draw functions in drawProteins.
parse_gff(file_or_link)
file_or_link |
link in gff format or a file in gff format that can be
read by |
Dataframe of one or more rows with the following column names: 'accession', 'source', 'type', 'begin', 'end', 'order', 'entryName', 'description'. Must contain a minimum of one "CHAIN" as data$type to allow plotting.
data <- parse_gff("https://www.uniprot.org/uniprot/Q04206.gff")
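To make the conversion concrete, here is a minimal base R sketch that reads GFF-style text into a data frame with some of the columns named in the return value. The GFF text is made up, the column layout follows the generic nine-column GFF3 convention, and upper-casing the type column is an assumption; none of this is claimed to be how parse_gff is implemented, and the entryName and description columns are left out for brevity.

```r
# Made-up GFF3 text: two header lines, then tab-separated feature rows.
gff_text <- paste(
  "##gff-version 3",
  "##sequence-region A0001 1 551",
  "A0001\tUniProtKB\tChain\t1\t551\t.\t.\t.\tID=PRO_1;Note=Protein A",
  "A0001\tUniProtKB\tDomain\t20\t300\t.\t.\t.\tNote=Domain 1",
  sep = "\n"
)

# Header lines start with '#', so they are skipped as comments.
gff <- read.delim(text = gff_text, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE,
                  col.names = c("accession", "source", "type", "begin", "end",
                                "score", "strand", "phase", "attributes"))

# Keep the columns needed for plotting; upper-case 'type' so that a
# "CHAIN" row is present (an assumption about the expected convention).
toy_gff <- data.frame(
  accession = gff$accession,
  source    = gff$source,
  type      = toupper(gff$type),
  begin     = gff$begin,
  end       = gff$end,
  order     = 1,
  stringsAsFactors = FALSE
)
```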
Reduces data.frame of features to just phosphorylation sites. Uses a subsetting step and a grep with the pattern "Phospho".
phospho_site_info(features)
features |
A dataframe of protein features, for example created by the feature_to_dataframe() function. |
A dataframe that only contains protein phosphorylation sites from Uniprot
data("five_rel_data")
sites <- phospho_site_info(five_rel_data)
head(sites)
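The two steps named in the description, a subsetting step followed by a grep with the pattern "Phospho", can be sketched on a toy data frame. The feature type "MOD_RES" used for the subsetting step is an assumption, the data are made up, and toy_phospho_sites is a hypothetical re-implementation rather than the package function.

```r
# Made-up feature rows; only the columns used by the filter are included.
toy_features <- data.frame(
  type        = c("CHAIN", "MOD_RES", "MOD_RES", "DOMAIN"),
  description = c("Protein A", "Phosphoserine", "N6-acetyllysine", "Domain 1"),
  begin       = c(1, 276, 310, 20),
  end         = c(551, 276, 310, 300),
  stringsAsFactors = FALSE
)

# Hypothetical re-implementation: subset to modified residues (assumed to be
# type "MOD_RES"), then keep rows whose description matches "Phospho".
toy_phospho_sites <- function(features) {
  mod_res <- features[features$type == "MOD_RES", ]
  mod_res[grep("Phospho", mod_res$description), ]
}

sites <- toy_phospho_sites(toy_features)
```

Only the phosphoserine row survives; the acetylated lysine is a modified residue but does not match the pattern.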
Large list (968.8 Kb) of information about human Rel A downloaded from Uniprot on 1 November 2017
protein_json
List of 1 with List of 14 inside
List of 6 - information necessary to draw Rel A/p65
http://www.uniprot.org/uniprot/Q04206
List of features from human Rel A downloaded from Uniprot on 15 August 2017
rel_A_features
List of 6 - information necessary to draw Rel A/p65
http://www.uniprot.org/uniprot/Q04206
List of 1 with List of 6 inside downloaded from Uniprot on 1 November 2017
rel_json
List of 1 with List of 6 - information necessary to draw Rel A/p65
http://www.uniprot.org/uniprot/Q04206
Dataframe of features of 2 human TNF receptors from Uniprot on 3 Jan 2018
tnfs_data
A data frame with 127 rows and 9 variables:
type | type of features, e.g. chain |
description | long name for the protein |
begin | starting position (amino acid number) of feature |
end | ending position (amino acid number) of feature |
length | length of feature, in number of amino acids |
accession | protein Uniprot accession number |
entryName | protein Uniprot entry name |
taxid | taxonomic identification (species) |
order | plotting order from the bottom of the graph |
Uniprot http://www.uniprot.org Accession numbers P19438 P25942