Title: | Plot Multiple Sequence Alignment using 'ggplot2' |
---|---|
Description: | A visual exploration tool for multiple sequence alignment and associated data. Supports MSA of DNA, RNA, and protein sequences using 'ggplot2'. Multiple sequence alignments can easily be combined with other 'ggplot2' plots, such as a phylogenetic tree visualized by 'ggtree', a boxplot, or a genome map. Further features include visualization of sequence logos, sequence bundles, and RNA secondary structures, and detection of sequence recombinations. |
Authors: | Lang Zhou [aut, cre], Guangchuang Yu [aut, ths], Shuangbin Xu [ctb], Huina Huang [ctb] |
Maintainer: | Lang Zhou <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.13.0 |
Built: | 2024-10-30 08:05:07 UTC |
Source: | https://github.com/bioc/ggmsa |
adjust the tree branch position after assigning ancestor node
adjust_ally(tree, node, sub = FALSE, seq_colname = "mol_seq")
tree |
ggtree object |
node |
internal node in tree |
sub |
logical value. |
seq_colname |
the colname of MSA on tree$data |
tree
Lang Zhou
assign dms value to alignments.
assign_dms(x, dms)
x |
data frame from tidy_msa() |
dms |
dms data frame |
tree
Lang Zhou
This function lists color schemes currently available that can be used by 'ggmsa'
available_colors()
A character vector of available color schemes
Lang Zhou
available_colors()
This function lists font families currently available that can be used by 'ggmsa'
available_fonts()
A character vector of available font family names
Lang Zhou
available_fonts()
This function lists MSA objects currently available that can be used by 'ggmsa'
available_msa()
A character vector of available objects
Lang Zhou
available_msa()
extract ancestor sequence from tree data
extract_seq(tree_adjust, seq_colname = "mol_seq")
tree_adjust |
ggtree object |
seq_colname |
the colname of MSA on tree$data |
character
Lang Zhou
The MSA is plotted in fields (panels) of the size you set.
facet_msa(field)
field |
a numeric vector of the field size. |
ggplot layers
Lang Zhou
    library(ggplot2)
    f <- system.file("extdata/sample.fasta", package = "ggmsa")
    # 2 fields
    ggmsa(f, end = 120, font = NULL, color = "Chemistry_AA") + facet_msa(field = 60)
    # 3 fields
    ggmsa(f, end = 120, font = NULL, color = "Chemistry_AA") + facet_msa(field = 40)
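The relationship between `field` and the number of panels in the examples above (120 positions give 2 fields with `field = 60` and 3 fields with `field = 40`) can be sketched in base R. This only illustrates the arithmetic, assuming fields are consecutive blocks of `field` positions; `field_index` is a hypothetical helper, not part of ggmsa.

```r
# Hypothetical helper (not a ggmsa function): map alignment positions to a
# field (panel) index, assuming fields are consecutive blocks of `field` sites.
field_index <- function(position, field) {
  (position - min(position)) %/% field + 1L
}

positions <- 1:120

# Number of distinct fields for the two settings used in the examples above.
n_fields_60 <- length(unique(field_index(positions, field = 60)))
n_fields_40 <- length(unique(field_index(positions, field = 40)))

print(c(field_60 = n_fields_60, field_40 = n_fields_40))
```

Under this assumption, a smaller `field` value produces more, narrower panels for the same alignment range.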
Multiple sequence alignment layer for ggplot2. It plots points of GC content.
geom_GC(show.legend = FALSE)
show.legend |
logical. Should this layer be included in the legends? |
a ggplot layer
Lang Zhou
    # plot GC content
    f <- system.file("extdata/LeaderRepeat_All.fa", package = "ggmsa")
    ggmsa(f, font = NULL, color = "Chemistry_NT") + geom_GC()
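To make the plotted quantity concrete, the following base R sketch computes a GC fraction for each sequence of a toy alignment. It assumes GC content is calculated per sequence with gaps ignored, which this manual does not state; the sequences and the `gc_content` helper are made up for illustration.

```r
# Toy aligned DNA sequences (made up for illustration).
seqs <- c(s1 = "ATGC-GC", s2 = "AT--TTA", s3 = "GGGCCC-")

# Hypothetical helper: fraction of G/C among non-gap characters of one sequence.
gc_content <- function(x) {
  chars <- strsplit(toupper(x), "")[[1]]
  chars <- chars[chars != "-"]          # drop alignment gaps
  sum(chars %in% c("G", "C")) / length(chars)
}

# One GC value per sequence, named by sequence.
gc <- vapply(seqs, gc_content, numeric(1))
print(round(gc, 3))
```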
The layer of helix plot
geom_helix(helix_data, color_by = "length", overlap = FALSE, ...)
helix_data |
a data frame of nucleotide secondary structure, read from a structure file by readSSfile(). |
color_by |
rule used to generate colors for helices, based on integer counts or value ranges. One of "length" and "value". |
overlap |
a logical value. If TRUE, two structure data sets named known and predicted must be given (e.g. helix_data = list(known = data1, predicted = data2)). Predicted helices that are known are plotted on top, predicted helices that are not known are plotted on the bottom, and finally unpredicted helices are plotted on top in black. |
... |
additional parameter |
ggplot2 layers
Lang Zhou
    RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package = "ggmsa")
    RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package = "ggmsa")
    SS <- readSSfile(RF03120, type = "Vienna")
    ggmsa(RF03120_fas, font = NULL, border = NA, color = "Chemistry_NT", seq_name = FALSE) +
      geom_helix(SS)
Multiple sequence alignment layer for ggplot2. It creates background tiles with/without sequence characters.
geom_msa( data, font = "helvetical", mapping = NULL, color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, none_bg = FALSE, by_conservation = FALSE, position_highlight = NULL, seq_name = NULL, border = NULL, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL, position = "identity", show.legend = FALSE, dms = FALSE, position_color = FALSE, ... )
data |
sequence alignment with data frame, generated by tidy_msa(). |
font |
font families; possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'helvetical'. If font = NULL, only the background tiles are plotted. |
mapping |
aes mapping. |
color |
A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'. |
custom_color |
A data frame with two columns named "names" and "color", used to customize the color scheme. |
char_width |
a numeric vector specifying the character width, in the range 0 to 1. Default is 0.9. |
none_bg |
a logical value. If TRUE, the background tiles are not drawn and only the characters are shown. Default is FALSE. |
by_conservation |
a logical value. The most conserved regions have the brightest colors. |
position_highlight |
A numeric vector of the positions to be highlighted. |
seq_name |
a logical value indicating whether sequence names should be displayed. The default, NULL, displays the sequence names when 'font = NULL' and hides them when a font is set. 'seq_name = TRUE' always displays the sequence names; 'seq_name = FALSE' never displays them. |
border |
a character string. The border color. |
consensus_views |
a logical value. If TRUE, consensus views are enabled. |
use_dot |
a logical value. Displays characters as dots instead of fading their color in the consensus view. |
disagreement |
a logical value. Displays characters that disagree with the consensus (excluding ambiguous disagreements). |
ignore_gaps |
a logical value. If TRUE, gaps in a column are treated as if that row did not exist. |
ref |
a character string specifying the reference sequence, which should be one of the input sequences, when 'consensus_views' is TRUE. |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function, default is 'identity' meaning 'position_identity()'. |
show.legend |
logical. Should this layer be included in the legends? |
dms |
logical. |
position_color |
logical. |
... |
additional parameter |
A list
Guangchuang Yu, Lang Zhou
    library(ggplot2)
    aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
    tidy_aln <- tidy_msa(aln, start = 150, end = 170)
    ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
Multiple sequence alignment layer for ggplot2. It plots a sequence conservation bar.
geom_msaBar()
A list
Lang Zhou
    # plot multiple sequence alignment and conservation bar.
    f <- system.file("extdata/sample.fasta", package = "ggmsa")
    ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
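As a rough illustration of what a per-column conservation score can look like, the base R sketch below takes the frequency of the most common character in each alignment column. This measure is an assumption chosen for simplicity; the manual does not say which conservation score geom_msaBar() uses, and the toy alignment is made up.

```r
# Toy protein alignment (made up): three sequences, five columns.
aln <- c("MKV-L", "MKIAL", "MRVAL")

# Character matrix with one row per sequence and one column per site.
mat <- do.call(rbind, strsplit(aln, ""))

# Assumed conservation measure: share of the most frequent character per column
# (gaps are counted like any other character in this sketch).
conservation <- apply(mat, 2, function(col) max(table(col)) / length(col))

print(round(conservation, 3))
```

Fully conserved columns score 1 under this measure, and the score falls as a column becomes more variable.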
Highlighting the seed in miRNA sequences
geom_seed(seed, star = FALSE)
seed |
a character string specifying the miRNA seed sequence, e.g. 'GAGGUAG'. |
star |
a logical value indicating whether asterisks should be displayed. |
a ggplot layer
Lang Zhou
    miRNA_sequences <- system.file("extdata/seedSample.fa", package = "ggmsa")
    ggmsa(miRNA_sequences, font = 'DroidSansMono', color = "Chemistry_NT", none_bg = TRUE) +
      geom_seed(seed = "GAGGUAG", star = FALSE)
    ggmsa(miRNA_sequences, font = 'DroidSansMono', color = "Chemistry_NT") +
      geom_seed(seed = "GAGGUAG", star = TRUE)
Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
geom_seqlogo( font = "DroidSansMono", color = "Chemistry_AA", adaptive = TRUE, top = TRUE, custom_color = NULL, show.legend = FALSE, ... )
font |
font families; possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'DroidSansMono'. |
color |
A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'. |
adaptive |
A logical value indicating whether the overall height of the seqlogo corresponds to the number of sequences. If FALSE, the overall seqlogo height is fixed at 4. |
top |
A logical value. If TRUE, the seqlogo is aligned to the top of the MSA. |
custom_color |
A data frame with two columns named "names" and "color", used to customize the color scheme. |
show.legend |
logical. Should this layer be included in the legends? |
... |
additional parameter |
A list
Lang Zhou
    # plot multiple sequence alignment and sequence motifs
    f <- system.file("extdata/LeaderRepeat_All.fa", package = "ggmsa")
    ggmsa(f, font = NULL, color = "Chemistry_NT") + geom_seqlogo()
Plots nucleotide secondary structure as helices in an arc diagram
gghelix(helix_data, color_by = "length", overlap = FALSE)
helix_data |
a data frame of nucleotide secondary structure, read from a structure file by readSSfile(). |
color_by |
rule used to generate colors for helices, based on integer counts or value ranges. One of "length" and "value". |
overlap |
a logical value. If TRUE, two structure data sets named known and predicted must be given (e.g. helix_data = list(known = data1, predicted = data2)). Predicted helices that are known are plotted on top, predicted helices that are not known are plotted on the bottom, and finally unpredicted helices are plotted on top in black. |
ggplot object
Lang Zhou
    RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package = "ggmsa")
    helix_data <- readSSfile(RF03120, type = "Vienna")
    gghelix(helix_data)
plot MAF
ggmaf( data, ref, block_start = NULL, block_end = NULL, facet_field = NULL, heights = c(0.4, 0.6), facet_heights = NULL )
data |
a tidy MAF data frame, as returned by tidy_maf_df(). |
ref |
character. The name of the reference genome, e.g. "hg38.chr1_KI270707v1_random". |
block_start |
a numeric vector (> 0). The start block to plot. |
block_end |
a numeric vector (< max block). The end block to plot. |
facet_field |
a numeric vector. The field in a facet panel. |
heights |
a numeric vector of length two. The plot proportion between the "Genomic location" panel (top) and the "Alignment" panel (bottom). Default: c(0.4, 0.6). |
facet_heights |
a numeric vector. The facet proportions. |
ggplot object
Lang Zhou
Plot multiple sequence alignment using ggplot2 with multiple color schemes supported.
ggmsa( msa, start = NULL, end = NULL, font = "helvetical", color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, none_bg = FALSE, by_conservation = FALSE, position_highlight = NULL, seq_name = NULL, border = NULL, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL, show.legend = FALSE )
msa |
Multiple aligned sequence files or objects representing either nucleotide sequences or AA sequences. |
start |
a numeric vector. Start position to plot. |
end |
a numeric vector. End position to plot. |
font |
font families; possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'helvetical'. If font = NULL, only the background tiles are plotted. |
color |
a color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'. |
custom_color |
A data frame with two columns named "names" and "color", used to customize the color scheme. |
char_width |
a numeric vector specifying the character width, in the range 0 to 1. Default is 0.9. |
none_bg |
a logical value. If TRUE, the background tiles are not drawn and only the characters are shown. Default is FALSE. |
by_conservation |
a logical value. The most conserved regions have the brightest colors. |
position_highlight |
A numeric vector of the positions to be highlighted. |
seq_name |
a logical value indicating whether sequence names should be displayed. The default, NULL, displays the sequence names when 'font = NULL' and hides them when a font is set. 'seq_name = TRUE' always displays the sequence names; 'seq_name = FALSE' never displays them. |
border |
a character string. The border color. |
consensus_views |
a logical value. If TRUE, consensus views are enabled. |
use_dot |
a logical value. Displays characters as dots instead of fading their color in the consensus view. |
disagreement |
a logical value. Displays characters that disagree with the consensus (excluding ambiguous disagreements). |
ignore_gaps |
a logical value. If TRUE, gaps in a column are treated as if that row did not exist. |
ref |
a character string specifying the reference sequence, which should be one of the input sequences, when 'consensus_views' is TRUE. |
show.legend |
logical. Should this layer be included in the legends? |
ggplot object
Guangchuang Yu
    # plot multiple sequences by loading fasta format
    fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
    ggmsa(fasta, 164, 213, color = "Chemistry_AA")
    ## Not run:
    library(Biostrings)  # provides readAAMultipleAlignment() and readAAStringSet()
    library(seqmagick)   # provides fa_read()
    # XMultipleAlignment objects can be used as input in 'ggmsa'
    AAMultipleAlignment <- readAAMultipleAlignment(fasta)
    ggmsa(AAMultipleAlignment, 164, 213, color = "Chemistry_AA")
    # XStringSet objects can be used as input in 'ggmsa'
    AAStringSet <- readAAStringSet(fasta)
    ggmsa(AAStringSet, 164, 213, color = "Chemistry_AA")
    # Xbin objects from 'seqmagick' can be used as input in 'ggmsa'
    AAbin <- fa_read(fasta)
    ggmsa(AAbin, 164, 213, color = "Chemistry_AA")
    ## End(Not run)
Plot sequence bundles for MSA based on 'ggplot2'
ggSeqBundle( msa, line_widch = 0.3, line_thickness = 0.3, line_high = 0, spline_shape = 0.3, size = 0.5, alpha = 0.2, bundle_color = c("#2ba0f5", "#424242"), lev_molecule = c("-", "A", "V", "L", "I", "P", "F", "W", "M", "G", "S", "T", "C", "Y", "N", "Q", "D", "E", "K", "R", "H") )
msa |
Multiple sequence alignment file (FASTA) or object representing either nucleotide sequences or peptide sequences. Multiple MSA files are also accepted, e.g. msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta"). |
line_widch |
The width of bundles at each site, default is 0.3. |
line_thickness |
The thickness of bundles at each site, default is 0.3. |
line_high |
The height of bundles at each site, default is 0. |
spline_shape |
A numeric vector of values between -1 and 1, which controls the shape of the spline relative to the control points. From geom_xspline() in the ggalt package. |
size |
A numeric vector of values between 0 and 1, which controls the size of each line. |
alpha |
A numeric vector of values between 0 and 1, which controls the alpha of each line. |
bundle_color |
The colors of each sequence bundles. eg: bundle_color = c("#2ba0f5","#424242"). |
lev_molecule |
Reassigns the Y-axis, displaying letter-coded amino acids/nucleotides arranged by physicochemical properties or other criteria, e.g. amino acid hydrophobicity: lev_molecule = c("-", "A", "V", "L", "I", "P", "F", "W", "M", "G", "S", "T", "C", "Y", "N", "Q", "D", "E", "K", "R", "H"). |
ggplot object
Lang Zhou
    aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa")
    ggSeqBundle(aln)
Amino acids in the adenylate kinase lid (AKL) domain from Gram-negative bacteria.
A MSA fasta with 100 sequences and 36 positions.
http://biovis.net/year/2013/info/redesign-contest
Amino acids in the adenylate kinase lid (AKL) domain from Gram-positive bacteria.
A MSA fasta with 100 sequences and 36 positions.
http://biovis.net/year/2013/info/redesign-contest
A folder containing 4 MSA files as a sample data set to identify sequence recombination events.
a folder
A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21', 'Oz', and 'Wilga5'
https://link.springer.com/article/10.1007/s11540-015-9307-3
DNA alignment sequences with 24 sequences and 56 positions.
A MSA fasta
merge two MSA
merge_seq(previous_seq, gap, subsequent_seq, adjust_name = TRUE)
previous_seq |
previous MSA |
gap |
gap length |
subsequent_seq |
subsequent MSA |
adjust_name |
logical value. Whether to merge the sequence names. |
tidy MSA data frame
Lang Zhou
plot method for SeqDiff object
## S4 method for signature 'SeqDiff,ANY' plot( x, width = 50, title = "auto", xlab = "Nucleotide Position", by = "bar", fill = "firebrick", colors = c(A = "#ff6d6d", C = "#769dcc", G = "#f2be3c", T = "#74ce98"), xlim = NULL )
x |
SeqDiff object |
width |
bin width |
title |
plot title |
xlab |
xlab |
by |
one of 'bar' and 'area' |
fill |
fill color of upper part of the plot |
colors |
color of lower part of the plot |
xlim |
limits of x-axis |
plot
Guangchuang Yu
    fas <- list.files(system.file("extdata", "GVariation", package = "ggmsa"),
                      pattern = "fas", full.names = TRUE)
    x1 <- seqdiff(fas[1], reference = 1)
    plot(x1)
read 'multiple alignment format'(MAF) file
read_maf(multiple_alignment_format)
multiple_alignment_format |
a multiple alignment format(MAF) file |
data frame
Lang Zhou
Read secondary structure file
readSSfile(file, type = NULL)
file |
A text file in connect format |
type |
file type. One of "Helix", "Connect", "Vienna" and "Bpseq". |
data frame
Lang Zhou
    RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package = "ggmsa")
    helix_data <- readSSfile(RF03120, type = "Vienna")
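The "Vienna" type refers to dot-bracket notation, in which matching parentheses mark paired bases. The base R sketch below shows how such a string maps to a table of paired positions. It is not how readSSfile() is implemented, and the helper name `dotbracket_pairs`, the column names `i` and `j`, and the toy structure are assumptions made for illustration.

```r
# Hypothetical helper (not a ggmsa function): convert a dot-bracket string
# into a data frame of base pairs (i = opening position, j = closing position).
dotbracket_pairs <- function(structure) {
  chars <- strsplit(structure, "")[[1]]
  stack <- integer(0)            # positions of currently unmatched "("
  i <- integer(0)
  j <- integer(0)
  for (pos in seq_along(chars)) {
    if (chars[pos] == "(") {
      stack <- c(stack, pos)
    } else if (chars[pos] == ")") {
      if (length(stack) == 0) stop("unbalanced structure: unmatched ')'")
      i <- c(i, stack[length(stack)])   # pair with the most recent "("
      j <- c(j, pos)
      stack <- stack[-length(stack)]
    }
  }
  if (length(stack) > 0) stop("unbalanced structure: unmatched '('")
  pairs <- data.frame(i = i, j = j)
  pairs[order(pairs$i), , drop = FALSE]
}

# Toy structure: a two-pair stem followed by a single separate pair.
pairs <- dotbracket_pairs("((..))..()")
print(pairs)
```

Each row of the resulting table corresponds to one arc in an arc diagram such as the one drawn by gghelix().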
reset MSA position
reset_pos(seq_df)
seq_df |
MSA data |
data frame
Lang Zhou
A folder containing seed alignment sequences and corresponding consensus RNA secondary structure.
a folder
RF00458.fasta seed alignment sequences of Cripavirus internal ribosome entry site (IRES)
RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
RF03120_SS.txt consensus RNA secondary structure of Sarbecovirus 5'UTR
A dataset containing the alignment sequences of the phenylalanine hydroxylase protein (PH4H) within nine species
A MSA fasta with 9 sequences and 456 positions.
Fasta format sequences of mature miRNA sequences from miRBase
A MSA fasta with 6 sequences and 22 positions.
https://www.mirbase.org/ftp.shtml
calculate difference of two aligned sequences
seqdiff(fasta, reference = 1)
fasta |
fasta file |
reference |
which sequence serve as reference, 1 or 2 |
SeqDiff object
Guangchuang Yu
    fas <- list.files(system.file("extdata", "GVariation", package = "ggmsa"),
                      pattern = "fas", full.names = TRUE)
    seqdiff(fas[1], reference = 1)
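The idea of comparing two aligned sequences position by position can be sketched in base R as follows. This is a simplified illustration, not the seqdiff() implementation or the structure of a SeqDiff object; the two sequences and the column names are made up.

```r
# Two toy aligned sequences of equal length (made up for illustration).
ref   <- "ATGCATGCAT"
query <- "ATGAATG-AT"

r <- strsplit(ref, "")[[1]]
q <- strsplit(query, "")[[1]]
stopifnot(length(r) == length(q))   # aligned sequences must have equal length

# Positions at which the query differs from the reference.
diff_pos <- which(r != q)
diffs <- data.frame(
  position  = diff_pos,
  reference = r[diff_pos],
  query     = q[diff_pos],
  stringsAsFactors = FALSE
)
print(diffs)
```

Swapping the roles of `ref` and `query` corresponds to choosing the other sequence as the reference, as the `reference` argument allows.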
Plot sequence logo for MSA based on 'ggplot2'
seqlogo( msa, start = NULL, end = NULL, font = "DroidSansMono", color = "Chemistry_AA", adaptive = FALSE, top = FALSE, custom_color = NULL )
msa |
Multiple sequence alignment file or object for representing either nucleotide sequences or peptide sequences. |
start |
Start position to plot. |
end |
End position to plot. |
font |
font families; possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'DroidSansMono'. If font = NULL, only the background tiles are drawn. |
color |
A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'. |
adaptive |
A logical value indicating whether the overall height of the seqlogo corresponds to the number of sequences. If FALSE, the overall seqlogo height is fixed at 4. |
top |
A logical value. If TRUE, the seqlogo is aligned to the top of the MSA. |
custom_color |
A data frame with two columns named "names" and "color", used to customize the color scheme. |
ggplot object
Lang Zhou
    # plot sequence motif independently
    nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")
    seqlogo(nt_sequence, color = "Chemistry_NT")
Alignment sequences used to demonstrate circular MSA layout
A MSA fasta with 28 sequences and 480 positions.
show method
show(object)
object |
SeqDiff object |
message
    fas <- list.files(system.file("extdata", "GVariation", package = "ggmsa"),
                      pattern = "fas", full.names = TRUE)
    x1 <- seqdiff(fas[1], reference = 1)
    x1
reset hdata data position
simplify_hdata(hdata, sim_msa)
hdata |
data from tidy_hdata() |
sim_msa |
MSA data frame |
data frame
Lang Zhou
Sequence similarity plot
simplot( file, query, window = 200, step = 20, group = FALSE, id, sep, sd = FALSE, smooth = FALSE, smooth_params = list(method = "loess", se = FALSE) )
file |
alignment FASTA file |
query |
query sequence |
window |
sliding window size (bp) |
step |
step size to slide the window (bp) |
group |
whether to group sequences (e.g. for "A-seq1, A-seq-2, B-seq1 and B-seq2", use sep = "-" and id = 1 to divide the sequences into groups A and B) |
id |
position to extract id for grouping; only works if group = TRUE |
sep |
separator to split sequence name; only works if group = TRUE |
sd |
whether to display the standard deviation of similarity within each group; only works if group = TRUE |
smooth |
FALSE (default) or TRUE; whether to display a smoothed spline. |
smooth_params |
a list of parameters passed to geom_smooth (default: smooth_params = list(method = "loess", se = FALSE)) |
ggplot object
Guangchuang Yu
    fas <- system.file("extdata/GVariation/sample_alignment.fa", package = "ggmsa")
    simplot(fas, 'CF_YL21')
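The `window` and `step` arguments describe a sliding-window scan along the alignment. The base R sketch below shows one way such a scan can be computed, assuming similarity is the fraction of identical positions inside each window; this is an illustration and not necessarily the measure simplot() uses. The helper `window_similarity` and the toy sequences are hypothetical.

```r
# Hypothetical helper (not a ggmsa function): fraction of identical positions
# between two aligned sequences in windows of size `window`, moved by `step`.
window_similarity <- function(query, subject, window = 200, step = 20) {
  q <- strsplit(query, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  stopifnot(length(q) == length(s), length(q) >= window)
  starts <- seq(1, length(q) - window + 1, by = step)
  data.frame(
    position = starts + (window - 1) / 2,   # window midpoint
    similarity = vapply(starts, function(st) {
      idx <- st:(st + window - 1)
      mean(q[idx] == s[idx])
    }, numeric(1))
  )
}

# Toy 20 bp sequences (made up); the subject diverges near the 3' end.
query   <- strrep("ACGT", 5)
subject <- paste0(strrep("ACGT", 3), "ACGA", "TTTT")

# Small window and step so the toy example yields several windows.
sim <- window_similarity(query, subject, window = 8, step = 4)
print(sim)
```

A larger `window` smooths the similarity profile, while a smaller `step` samples it more densely.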
tidy protein-protein interactive position data
tidy_hdata(gap, inter, previous_seq, subsequent_seq)
gap |
gap length |
inter |
protein-protein interactive position data |
previous_seq |
previous MSA |
subsequent_seq |
subsequent MSA |
helix data
Lang Zhou
tidy MAF data frame
tidy_maf_df(maf_df, ref)
maf_df |
a MAF data frame, as returned by read_maf(). |
ref |
character. The name of the reference genome, e.g. "hg38.chr1_KI270707v1_random". |
data frame
Lang Zhou
Convert msa file/object to tidy data frame.
tidy_msa(msa, start = NULL, end = NULL)
msa |
multiple sequence alignment file or sequence object in DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment, RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin |
start |
start position to extract subset of alignment |
end |
end position to extract subset of alignment |
tibble data frame
Guangchuang Yu
    fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
    aln <- tidy_msa(msa = fasta, start = 10, end = 100)
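A "tidy" alignment stores one row per sequence and position. The base R sketch below builds such a long-format table from a toy alignment and subsets it by position, mirroring the `start` and `end` arguments. The column names `name`, `position` and `character` are an assumption about the layout and are not confirmed by this manual; the alignment is made up.

```r
# Toy alignment (made up): two named sequences of equal length.
aln <- c(seq1 = "MKV-L", seq2 = "MKIAL")
chars <- strsplit(aln, "")

# Long format: one row per (sequence, position) pair.
# Column names are assumed for illustration.
tidy <- data.frame(
  name      = rep(names(aln), lengths(chars)),
  position  = unlist(lapply(chars, seq_along), use.names = FALSE),
  character = unlist(chars, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep positions 2 to 4, analogous to start = 2 and end = 4.
sub <- tidy[tidy$position >= 2 & tidy$position <= 4, ]
print(sub)
```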
The local genome map shows the 30000 sites around the TP53 gene.
xlsx
Alignment sequences used to show graphical combination
A MSA fasta with 5 sequences and 404 positions.
plot Tree-MSA plot
treeMSA_plot( p_tree, tidymsa_df, ancestral_node = "none", sub = FALSE, panel = "MSA", font = NULL, color = "Chemistry_AA", seq_colname = NULL, ... )
p_tree |
tree view |
tidymsa_df |
tidy MSA data |
ancestral_node |
vector, internal node(s) in the tree. Assign an internal node to display its "ancestral sequences". ancestral_node = "none" hides all ancestral sequences; ancestral_node = "all" shows all ancestral sequences. |
sub |
logical value. Displaying a subset of ancestral sequences or not. |
panel |
panel name for plot of MSA data |
font |
font families; possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'helvetical'. If font = NULL, only the background tiles are plotted. |
color |
a color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'. |
seq_colname |
the colname of MSA on tree$data |
... |
additional parameters for 'geom_msa' |
'treeMSA_plot()' automatically re-arranges the MSA data according to the tree structure.
ggplot object
Lang Zhou