Title: | Quantification and Visualization of Variations of Splicing in Population |
---|---|
Description: | Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: The visualizing function is removed due to the dependent package Sushi deprecated. If you want to use it, please change back to an older version.) |
Authors: | Huihui Yu [aut, cre] , Qian Du [aut] , Chi Zhang [aut] |
Maintainer: | Huihui Yu <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 1.19.0 |
Built: | 2024-10-31 06:29:18 UTC |
Source: | https://github.com/bioc/VaSP |
Find bimodal distrubition features and divide the samples into 2 groups by k-means clustering.
BMfinder(x, p.value = 0.01, maf = 0.05, miss = 0.05, fold = 2, log = FALSE, cores = detectCores() - 1)
BMfinder(x, p.value = 0.01, maf = 0.05, miss = 0.05, fold = 2, log = FALSE, cores = detectCores() - 1)
x |
a numeric matrix with feature rows and sample columns, e.g.,
splicing score matrix from |
p.value |
p.value threshold for bimodal distrubition test |
maf |
minor allele frequency threshold in k-means clustering |
miss |
missing grouping rate threshold in k-means clustering |
fold |
fold change threshold between the two groups |
log |
whether the scores are to be logarithmic. If TRUE, all the scores are log2 tranformed before k-means clustering: x = log2(x+1). |
cores |
threads to be used. This value is passed to ?mclapply in parallel package |
The matrix contains 1, 2 and NA, and values of 'x' in group 2 are larger than group 1.
a matrix with feature rows and sample columns.
data(rice.bg) score<-spliceGene(rice.bg,'MSTRG.183',junc.type='score') score<-round(score,2) as<-BMfinder(score,cores=1) # 4 bimodal distrubition features found ##compare as score[rownames(score)%in%rownames(as),]
data(rice.bg) score<-spliceGene(rice.bg,'MSTRG.183',junc.type='score') score<-round(score,2) as<-BMfinder(score,cores=1) # 4 bimodal distrubition features found ##compare as score[rownames(score)%in%rownames(as),]
Get read depth from a BAM file (in bedgraph format)
getDepth(x, chrom, start, end)
getDepth(x, chrom, start, end)
x |
path to a BAM file |
chrom |
chromosome of a region to be searched |
start |
start position |
end |
end position |
a data.frame in bedgraph file format.
path <- system.file('extdata',package='VaSP') bam_files<-list.files(path,'bam$') bam_files depth<-getDepth(file.path(path, bam_files[1]), 'Chr1', start=1171800, end=1179400) head(depth) # library(Sushi) # plotBedgraph(depth,'Chr1',chromstart=1171800, chromend=1179400,yaxt='s') # mtext('Depth',side=2,line=2.5,cex=1.2,font=2) # labelgenome('Chr1',1171800,1179400,side=1,scipen=20,n=5,scale='Kb')
path <- system.file('extdata',package='VaSP') bam_files<-list.files(path,'bam$') bam_files depth<-getDepth(file.path(path, bam_files[1]), 'Chr1', start=1171800, end=1179400) head(depth) # library(Sushi) # plotBedgraph(depth,'Chr1',chromstart=1171800, chromend=1179400,yaxt='s') # mtext('Depth',side=2,line=2.5,cex=1.2,font=2) # labelgenome('Chr1',1171800,1179400,side=1,scipen=20,n=5,scale='Kb')
Get gene informaton from a ballgown object by genes or by genomic regions
getGeneinfo(genes = NA, bg, chrom, start, end, samples = sampleNames(bg), trans.select = NA)
getGeneinfo(genes = NA, bg, chrom, start, end, samples = sampleNames(bg), trans.select = NA)
genes |
a character vector specifying gene IDs in 'bg'. Any values other than NA override genomic region (chrom, start, stop) |
bg |
ballgown object |
chrom |
chromosome of a region |
start |
start postion |
end |
stop postion |
samples |
names of samples. The transcrpts in these samples are subjected to 'trans.select' |
trans.select |
logical expression-like string, indicating transcript rows to select from a matrix of transcript coverages: NA value keeps all transcripts. |
a data.frame in bed-like file format
data(rice.bg) unique(geneIDs(rice.bg)) gene_id <- c('MSTRG.181', 'MSTRG.182', 'MSTRG.183') geneinfo <- getGeneinfo(genes=gene_id,rice.bg) trans <- table(geneinfo$name) # show how many exons each transcript has trans # library(Sushi) # chrom = geneinfo$chrom[1] # chromstart = min(geneinfo$start) - 1e3 # chromend = max(geneinfo$stop) + 1e3 # color = rep(SushiColors(2)(length(trans)), trans) # par(mar=c(3,1,1,1)) # plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2, # bentline = FALSE, plotgenetype = 'arrow', labeloffset = 0.5) # labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = 'Kb')
data(rice.bg) unique(geneIDs(rice.bg)) gene_id <- c('MSTRG.181', 'MSTRG.182', 'MSTRG.183') geneinfo <- getGeneinfo(genes=gene_id,rice.bg) trans <- table(geneinfo$name) # show how many exons each transcript has trans # library(Sushi) # chrom = geneinfo$chrom[1] # chromstart = min(geneinfo$start) - 1e3 # chromend = max(geneinfo$stop) + 1e3 # color = rep(SushiColors(2)(length(trans)), trans) # par(mar=c(3,1,1,1)) # plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2, # bentline = FALSE, plotgenetype = 'arrow', labeloffset = 0.5) # labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = 'Kb')
Small ballgown object created with a subset of rice RNAseq data, for demonstration purposes
a ballgown object with 33 transcripts and 6 samples
The raw RNA-seq data were screened and trimmed using Trimmomatic (Bolger et al., 2014) and RNA-seq mapping, transcript assembly, and quantification were conducted with HISAT, StringTie, and Ballgown by following the method described by Pertea et al. (Pertea et al., 2016). The rice.bg is a subset ballgown object with 33 transcripts and 6 samples (Yu et al., 2021).
The raw RNA-seq data were from the project of variation in transcriptional responses to salt stress in rice (SRA Accession: SRP106054)
Yu, H., Du, Q., Campbell, M., Yu, B., Walia, H. and Zhang, C. (2021), Genome‐wide discovery of natural variation in pre‐mRNA splicing and prioritising causal alternative splicing to salt stress response in rice. New Phytol. https://doi.org/10.1111/nph.17189
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
Pertea, M., Kim, D., Pertea, G.M., Leek, J.T., and Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11, 1650-1667.
data(rice.bg) rice.bg # ballgown instance with 33 transcripts and 6 samples
data(rice.bg) rice.bg # ballgown instance with 33 transcripts and 6 samples
Calculate splicing Scores from ballgown object for a given gene.
This function can only calculate one gene. Please use function
spliceGenome
to obtain genome-wide splicing scores.
spliceGene(bg, gene, samples = sampleNames(bg), junc.type = c("score", "count"), trans.select = "rowMaxs(x)>=1", junc.select = "rowMaxs(x)>=5")
spliceGene(bg, gene, samples = sampleNames(bg), junc.type = c("score", "count"), trans.select = "rowMaxs(x)>=1", junc.select = "rowMaxs(x)>=5")
bg |
ballgown object |
gene |
a character string specifying gene id |
samples |
names of samples |
junc.type |
type of junction estimate ('score' for junction score; 'count' for junction read count) |
trans.select |
logical expression-like string, indicating transcript rows to select from a matrix of transcript coverages: NA value keeps all transcripts. e.g. use trans.select='rowMaxs(x)>=1' to filter the transcrpts with the maximium coverage among all the samples less than 1. |
junc.select |
logical expression-like string, indicating junction rows to select from a matrix of junction counts: NA value keeps all junctions. e.g. use junc.select='rowMaxs(x)>=5' to filter the junctions with the maximium read count among all the samples less than 5. |
score = junction count/gene-level per base read coverage.
Row functions for matrices are useful to select transcripts and junctions.
See matrixStats
package.
a matrix of junction scores with intron rows and sample columns.
Yu, H., Du, Q., Campbell, M., Yu, B., Walia, H. and Zhang, C. (2021), Genome‐wide discovery of natural variation in pre‐mRNA splicing and prioritising causal alternative splicing to salt stress response in rice. New Phytol. https://doi.org/10.1111/nph.17189
spliceGenome
, which calculates splicing scores in
whole genome.
data(rice.bg) rice.bg head(geneIDs(rice.bg)) score<-spliceGene(rice.bg,'MSTRG.183',junc.type='score') count<-spliceGene(rice.bg,'MSTRG.183',junc.type='count') ## compare tail(score) tail(count) ## get intron structrue intron<-structure(rice.bg)$intron intron[intron$id%in%rownames(score)]
data(rice.bg) rice.bg head(geneIDs(rice.bg)) score<-spliceGene(rice.bg,'MSTRG.183',junc.type='score') count<-spliceGene(rice.bg,'MSTRG.183',junc.type='count') ## compare tail(score) tail(count) ## get intron structrue intron<-structure(rice.bg)$intron intron[intron$id%in%rownames(score)]
Calculate splicing scores from ballgown objects for all genes.
spliceGenome(bg, gene.select = "rowQuantiles(x,probs = 0.05)>=1", intron.select = "rowQuantiles(x,probs = 0.95)>=5")
spliceGenome(bg, gene.select = "rowQuantiles(x,probs = 0.05)>=1", intron.select = "rowQuantiles(x,probs = 0.95)>=5")
bg |
ballgown object |
gene.select |
logical expression-like string, indicating genes to select from a matrix of gene-level coverages: NA value keeps all genes. e.g. gene.select = 'rowQuantiles(x,probs = 0.05)>=1' keeps the genes with the read coverage greater than or equal to 1 in at least 95 (0.05 quantile). Used to filter low expressed genes. |
intron.select |
logical expression-like string, indicating introns to select from a matrix of junction counts: NA value keeps all introns. e.g. intron.select = 'rowQuantiles(x,probs = 0.95)>=5' keeps the introns with the read count greater than or euqal to 5 in at least 5 (0.95 quantile). Used to filter introns with very few junction reads supporting. |
score = junction count/gene-level per base read coverage.
Row functions for matrices in matrixStats
package are useful
to select genes and introns.
a list of two elelments: 'score' is matrix of intron splicing scores
with intron rows and sample columns and 'intron' is a
GRanges
object of intron structure.
See structure
in ballgown package
Yu, H., Du, Q., Campbell, M., Yu, B., Walia, H. and Zhang, C. (2021), Genome‐wide discovery of natural variation in pre‐mRNA splicing and prioritising causal alternative splicing to salt stress response in rice. New Phytol. https://doi.org/10.1111/nph.17189
spliceGene
, which calculates splicing scores in one
gene.
data(rice.bg) rice.bg splice<-spliceGenome(rice.bg,gene.select=NA,intron.select=NA) names(splice) head(splice$score) splice$intron
data(rice.bg) rice.bg splice<-spliceGenome(rice.bg,gene.select=NA,intron.select=NA) names(splice) head(splice$score) splice$intron