Title: | Ribosome Profiling Data Analysis: from BAM to Data Representation and Interpretation |
---|---|
Description: | Starting with a BAM file, this package provides the necessary functions for quality assessment, read start position recalibration, the counting of reads on CDS, 3'UTR, and 5'UTR, plotting of count data: pairs, log fold-change, codon frequency and coverage assessment, principal component analysis on codon coverage. |
Authors: | Alexandra Popa |
Maintainer: | A. Popa <[email protected]> |
License: | GPL-3 |
Version: | 1.37.0 |
Built: | 2024-12-19 04:13:44 UTC |
Source: | https://github.com/bioc/RiboProfiling |
Returns the flank size around the TSS for the x % CDSs
aroundPromoter(txdb, alnGRanges, percBestExpressed, flankSize)
aroundPromoter(txdb, alnGRanges, percBestExpressed, flankSize)
txdb |
a TxDb object containing the annotations info to intersect with the alignment files. |
alnGRanges |
A GRanges object containing the alignment information. In order to improve the performance of this function the GAlignments BAM object should be transformed into a GRanges object containing the cigar match size information as metadata. |
percBestExpressed |
a numeric [between 0 and 1]. The percentage of the best expressed CDSs on which to plot the coverage around the TSS. Necessary if the shiftValue parameter must be estimated. Default value 0.03 (3%). |
flankSize |
a numeric positive integer. How many bp left and right of the TSS should the coverage be performed? Ex. flankSize <- 20 |
A GRanges object containing the 1 bp ranges for the selected CDSs in the TSS defined flanking region.
#read the BAM into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges)
#read the BAM into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges)
A list of start and end codons relative to transcript
data(cdsPosTransc)
data(cdsPosTransc)
A list
A list
A list of 2 data.frame objects: one with the number of times each codon type is found in each ORF and one with the number of reads for each codon type in each ORF.
data(codonDataCtrl)
data(codonDataCtrl)
A list of 2 lists.
A list of 2 lists.
A list containing the number of reads for each codon in each ORF. Codons are reported on their index in the ORF and no information is available about their type/sequence.
data(codonIndexCovCtrl)
data(codonIndexCovCtrl)
A list of 2 columns dataframes.
A list of 2 columns dataframes.
Associates the read counts on codons with the codon type for each ORF.
codonInfo(listReadsCodon, genomeSeq, orfCoord, motifSize)
codonInfo(listReadsCodon, genomeSeq, orfCoord, motifSize)
listReadsCodon |
a list of data.frame objects. It contains the number of reads per codon in a CDS. |
genomeSeq |
a BSgenome object. It contains the full genome sequences for the organism. |
orfCoord |
a GRangesList. The coordinates of the ORFs on the genome. |
motifSize |
an integer. The number of nucleotides in each motif on which to compute coverage and usage. Either 3, 6, or 9. Default 3 nucleotides (codon). Attention! For long motifs, the function can be quite slow!! |
a list of 2 data.frame objects: one with the number of times each codon type is found in each ORF and one with the number of reads for each codon type in each ORF.
#for each codon in each ORF get the read coverage #parameter listReadsCodon can be returned by the riboSeqFromBam function #it corresponts to the 2nd element in the list returned by riboSeqFromBam data(codonIndexCovCtrl) listReadsCodon <- codonIndexCovCtrl txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #get the names of the ORFs #grouped by transcript cds <- GenomicFeatures::cdsBy(txdb, use.names=TRUE) orfCoord <- cds[names(cds) %in% names(listReadsCodon)] #get the genome, please check that the genome has the same seqlevels genomeSeq <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19 #if not rename it #gSeq <- GenomeInfoDb::renameSeqlevels(genomeSeq, #sub("chr", "", GenomeInfoDb::seqlevels(genomeSeq))) #codon frequency, coverage, and annotation codonData <- codonInfo(listReadsCodon, genomeSeq, orfCoord)
#for each codon in each ORF get the read coverage #parameter listReadsCodon can be returned by the riboSeqFromBam function #it corresponts to the 2nd element in the list returned by riboSeqFromBam data(codonIndexCovCtrl) listReadsCodon <- codonIndexCovCtrl txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #get the names of the ORFs #grouped by transcript cds <- GenomicFeatures::cdsBy(txdb, use.names=TRUE) orfCoord <- cds[names(cds) %in% names(listReadsCodon)] #get the genome, please check that the genome has the same seqlevels genomeSeq <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19 #if not rename it #gSeq <- GenomeInfoDb::renameSeqlevels(genomeSeq, #sub("chr", "", GenomeInfoDb::seqlevels(genomeSeq))) #codon frequency, coverage, and annotation codonData <- codonInfo(listReadsCodon, genomeSeq, orfCoord)
PCA graphs on codon coverage
codonPCA(data, typeData)
codonPCA(data, typeData)
data |
a list of 2 data.frames: one with the number of times each codon type is found in each ORF and one with the number of reads for each codon in each ORF. |
typeData |
a character string. It is used as title for the PCA. Ex. typeData="codonCoverage" |
a list of length 2: PCA_scores - matrix of the scores on the first 4 principal components. PCA_plots - a list of 5 PCA scatterplots.
#How to perform a PCA analysis based on codon coverage #adapted from #http://stackoverflow.com/questions/20260434/test-significance-of-clusters-on-a-pca-plot #either get the codon frequency, coverage, and annotation using a function #such as codonInfo in this package #or create a list of matrices with the above information data(codonDataCtrl) codonData <- codonDataCtrl codonUsage <- codonData[[1]] codonCovMatrix <- codonData[[2]] #keep only genes with a minimum number of reads nbrReadsGene <- apply(codonCovMatrix, 1, sum) ixExpGenes <- which(nbrReadsGene >= 50) codonCovMatrix <- codonCovMatrix[ixExpGenes, ] #get the PCA on the codon coverage codonCovMatrixTransp <- t(codonCovMatrix) rownames(codonCovMatrixTransp) <- colnames(codonCovMatrix) colnames(codonCovMatrixTransp) <- rownames(codonCovMatrix) listPCACodonCoverage <- codonPCA(codonCovMatrixTransp,"codonCoverage") print(listPCACodonCoverage[[2]]) #See aditional examples in the pdf manual
#How to perform a PCA analysis based on codon coverage #adapted from #http://stackoverflow.com/questions/20260434/test-significance-of-clusters-on-a-pca-plot #either get the codon frequency, coverage, and annotation using a function #such as codonInfo in this package #or create a list of matrices with the above information data(codonDataCtrl) codonData <- codonDataCtrl codonUsage <- codonData[[1]] codonCovMatrix <- codonData[[2]] #keep only genes with a minimum number of reads nbrReadsGene <- apply(codonCovMatrix, 1, sum) ixExpGenes <- which(nbrReadsGene >= 50) codonCovMatrix <- codonCovMatrix[ixExpGenes, ] #get the PCA on the codon coverage codonCovMatrixTransp <- t(codonCovMatrix) rownames(codonCovMatrixTransp) <- colnames(codonCovMatrix) colnames(codonCovMatrixTransp) <- rownames(codonCovMatrix) listPCACodonCoverage <- codonPCA(codonCovMatrixTransp,"codonCoverage") print(listPCACodonCoverage[[2]]) #See aditional examples in the pdf manual
Apply an offset on the read start along the transcript and returns the coverage on the 5pUTR, CDS, 3pUTR, as well as a matrix of codon coverage per ORF.
countShiftReads(exonGRanges, cdsPosTransc, alnGRanges, shiftValue, motifSize)
countShiftReads(exonGRanges, cdsPosTransc, alnGRanges, shiftValue, motifSize)
exonGRanges |
a GRangesList. It contains the exon coordinates grouped by transcript. |
cdsPosTransc |
a list. It contains the relative positions of the start and end of the ORFs. The transcript names in exonGRanges and cdsPosTransc should be the same. |
alnGRanges |
A GRanges object containing the alignment information. In order to improve the performance the GAlignments BAM object should be transformed into a GRanges object with cigar match size metadata. |
shiftValue |
integer. The offset for recalibrating reads on transcripts when computing coverage. The default value for this parameter is 0, no offset should be performed. |
motifSize |
an integer. The number of nucleotides in each motif on which to compute coverage and usage. Either 3, 6, or 9. Default 3 nucleotides (codon). |
a list with 2 objects. The first object in the list is a data.frame containing: information on ORFs (names, chromosomal position, length) as well as the counts on the 5pUTR, CDS and 3pUTR once the offset is applied. The second object in the list is a list in itself. It contains: for each ORF in the cdsPosTransc, for each codon the sum of read starts covering the 3 codon nucleotides. For motifs of size 6 nucleotides, the motif coverage is computed only for the first codon in the motif, considered as the codon in the P-site. For motifs of size 9 nucleotides, the motif coverage is computed only for the second codon in the motif, considered as the codon in the P-site. This per codon coverage does not contain information on the codon type, just its position in the ORF and its coverage.
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb))) #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE) #get the per transcript relative position of start and end codons #cdsPosTransc <- orfRelativePos(cds, exonGRanges) data(cdsPosTransc) #compute the counts on the different features after applying #the specified shift value on the read start along the transcript countsData <- countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc, alnGRanges, -14)
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb))) #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE) #get the per transcript relative position of start and end codons #cdsPosTransc <- orfRelativePos(cds, exonGRanges) data(cdsPosTransc) #compute the counts on the different features after applying #the specified shift value on the read start along the transcript countsData <- countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc, alnGRanges, -14)
Graphs of sample read counts (quality assesment)
countsPlot(listCounts, ixCounts, log2Bool)
countsPlot(listCounts, ixCounts, log2Bool)
listCounts |
a list of data.frame objects. It contains the counts on the genomic features. Each data.frame in the list should have the same number of columns. |
ixCounts |
a numeric (a vector of integers). It contains the index of the columns containing counts in the dataFrame. |
log2Bool |
a numeric, either 0 or 1. 0 (default) for no log2 transformation and 1 for log2 transformation. |
A list of pairs and boxplots between the counts data in each data.frame.
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb,by="tx",use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb,by="tx",use.names=TRUE) #get the per transcript relative position of start and end codons cdsPosTransc <- orfRelativePos(cds, exonGRanges) #compute the counts on the different features after applying #the specified shift value on the read start along the transcript countsData <- countShiftReads( exonGRanges[names(cdsPosTransc)], cdsPosTransc, alnGRanges, -14 ) #now make the plots listCountsPlots <- countsPlot( list(countsData[[1]]), grep("_counts$", colnames(countsData[[1]])), 1 ) listCountsPlots
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb,by="tx",use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb,by="tx",use.names=TRUE) #get the per transcript relative position of start and end codons cdsPosTransc <- orfRelativePos(cds, exonGRanges) #compute the counts on the different features after applying #the specified shift value on the read start along the transcript countsData <- countShiftReads( exonGRanges[names(cdsPosTransc)], cdsPosTransc, alnGRanges, -14 ) #now make the plots listCountsPlots <- countsPlot( list(countsData[[1]]), grep("_counts$", colnames(countsData[[1]])), 1 ) listCountsPlots
A dataset containing the alignment information on chromosome 1 from the control BAM. The data object is a GAlignments object containing 3,504,859 hg19 mapped reads.
data(ctrlGAlignments)
data(ctrlGAlignments)
A GAlignments object with 3,504,859 reads.
the GAlignments object of reads on chr 1
Histogram of match length distribution of reads.
histMatchLength(aln, log10Transf = 0, titleHist)
histMatchLength(aln, log10Transf = 0, titleHist)
aln |
A GAlignments object of the BAM mapping file. |
log10Transf |
A boolean. Either 0 (default) or 1 (log10). |
titleHist |
a character. The main title for the histogram. Default - none. |
A list with 2 elements. The first element: a data.frame of the number of counts per match length distribution. The second element in the list: a ggplot2 histogram of the match length distribution.
#starting from a GAlignment object data(ctrlGAlignments) aln <- ctrlGAlignments #no log10 scaling matchLenDistr <- histMatchLength(aln, 0) #to plot the histogram matchLenDistr[[2]]
#starting from a GAlignment object data(ctrlGAlignments) aln <- ctrlGAlignments #no log10 scaling matchLenDistr <- histMatchLength(aln, 0) #to plot the histogram matchLenDistr[[2]]
Relative position of the start and stop codon along the transcript
orfRelativePos(cdsTransc, exonGRanges)
orfRelativePos(cdsTransc, exonGRanges)
cdsTransc |
a GRangesList. It contains the CDS coordinates grouped by transcript. |
exonGRanges |
a GRangesList. It contains the exon coordinates grouped by transcript. |
a list. A list of relative positions of the start and end of ORFs.
#make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE) #retrieve the positions of start and end codons relative to the transcript cdsPosTransc <- orfRelativePos(cds, exonGRanges)
#make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #get all CDSs by transcript cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE) #get all exons by transcript exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE) #retrieve the positions of start and end codons relative to the transcript cdsPosTransc <- orfRelativePos(cds, exonGRanges)
Plots the summarized coverage in a specified range (e.g. around TSS) for the specified match sizes
plotSummarizedCov(covSummarized)
plotSummarizedCov(covSummarized)
covSummarized |
a list of GRanges objects. For each matchSize a GRanges object of the summarized coverage. |
a ggplot2 plot of read coverage in interval
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges) #the read start coverage around the TSS as a percentage for all match sizes. covSummarized <- readStartCov(alnGRanges, oneBinRanges, matchSize="all", c(-20,20), "aroundTSS", charPerc="perc") trackPlotTSS <- plotSummarizedCov(covSummarized) print(trackPlotTSS)
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "",seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges) #the read start coverage around the TSS as a percentage for all match sizes. covSummarized <- readStartCov(alnGRanges, oneBinRanges, matchSize="all", c(-20,20), "aroundTSS", charPerc="perc") trackPlotTSS <- plotSummarizedCov(covSummarized) print(trackPlotTSS)
Plots the PCA scatterplots produced by codonPCA function.
printPCA(listPCAGraphs)
printPCA(listPCAGraphs)
listPCAGraphs |
a list of 5 PCA ggplot scatterplots. |
a unique plot with the 5 PCA scatterplots.
#How to perform a PCA analysis based on codon coverage data(codonDataCtrl) codonData <- codonDataCtrl codonUsage <- codonData[[1]] codonCovMatrix <- codonData[[2]] #keep only genes with a minimum number of reads nbrReadsGene <- apply(codonCovMatrix, 1, sum) ixExpGenes <- which(nbrReadsGene >= 50) codonCovMatrix <- codonCovMatrix[ixExpGenes, ] #get the PCA on the codon coverage codonCovMatrixTransp <- t(codonCovMatrix) rownames(codonCovMatrixTransp) <- colnames(codonCovMatrix) colnames(codonCovMatrixTransp) <- rownames(codonCovMatrix) listPCACodonCoverage <- codonPCA(codonCovMatrixTransp,"codonCoverage") printPCA(listPCACodonCoverage[[2]])
#How to perform a PCA analysis based on codon coverage data(codonDataCtrl) codonData <- codonDataCtrl codonUsage <- codonData[[1]] codonCovMatrix <- codonData[[2]] #keep only genes with a minimum number of reads nbrReadsGene <- apply(codonCovMatrix, 1, sum) ixExpGenes <- which(nbrReadsGene >= 50) codonCovMatrix <- codonCovMatrix[ixExpGenes, ] #get the PCA on the codon coverage codonCovMatrixTransp <- t(codonCovMatrix) rownames(codonCovMatrixTransp) <- colnames(codonCovMatrix) colnames(codonCovMatrixTransp) <- rownames(codonCovMatrix) listPCACodonCoverage <- codonPCA(codonCovMatrixTransp,"codonCoverage") printPCA(listPCACodonCoverage[[2]])
Read start coverage around the TSS on the predifined CDSs
readStartCov(alnGRanges, oneBinRanges, matchSize = "all", fixedInterval, renameChr, charPerc = "perc")
readStartCov(alnGRanges, oneBinRanges, matchSize = "all", fixedInterval, renameChr, charPerc = "perc")
alnGRanges |
A GRanges object containing the alignment information. In order to improve the performance transform the GAlignments BAM object into a GRanges object containing cigar match size as metadata. |
oneBinRanges |
A GRanges object. Transform the gene GRangesList into one big GRanges object. Add the info on the cds_id. |
matchSize |
either "all" or a vector of read match sizes. If matchSize <- "all", then all the reads are used to compute the coverage. If the matchSize is a vector of read match sizes, the summarized coverage is reported per match size and for the sum up. |
fixedInterval |
a numeric vector with the extremities of the interval. Ex. fixedInterval <- c(-20,20) or fixedInterval <- c(0,40) ... |
renameChr |
a character object. It contains the name to be given to the new summarized coverage interval. Ex. renameChr <- "aroundTSS" the summarized region around the TSS. |
charPerc |
a character object. Either "perc" (the default) or "sum" for percentage of counts per position or the sum of counts per position. |
a list of GRanges objects (for each matchSize chosen). It contains the summarized coverage for the specified read match sizes.
#read the BAM into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges, percBestExpressed=0.01) #the coverage in the TSS flanking region for the summarized read match sizes listPromoterCov <- readStartCov( alnGRanges, oneBinRanges, matchSize="all", fixedInterval=c(-20, 20), renameChr="aroundTSS", charPerc="perc" )
#read the BAM into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what="start") #make a txdb object containing the annotations for the specified species. #In this case hg19. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #Please make sure that seqnames of txdb correspond to #the seqnames of the alignment files ("chr" particle) #if not rename the txdb seqlevels #renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb))) #get the flanking region around the promoter of the best expressed CDSs oneBinRanges <- aroundPromoter(txdb, alnGRanges, percBestExpressed=0.01) #the coverage in the TSS flanking region for the summarized read match sizes listPromoterCov <- readStartCov( alnGRanges, oneBinRanges, matchSize="all", fixedInterval=c(-20, 20), renameChr="aroundTSS", charPerc="perc" )
Reads in GAlignments converted to either Read Start (5') or End (3') Positions
readsToStartOrEnd(aln, what)
readsToStartOrEnd(aln, what)
aln |
A GAlignments object of the BAM mapping file. |
what |
A character object. Either "start" (the default) or "end" for read start or read end. |
A GRanges object containing either the read start or end genomic positions.
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments object data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what = "end")
#read the BAM file into a GAlignments object using #GenomicAlignments::readGAlignments #the GAlignments object should be similar to ctrlGAlignments object data(ctrlGAlignments) aln <- ctrlGAlignments #transform the GAlignments object into a GRanges object (faster processing) alnGRanges <- readsToStartOrEnd(aln, what = "end")
Starting from a BAM file path: quality plots, shift ribosome position, coverage on multiple transcript features and on codons.
riboSeqFromBAM(listeInputBamFile, paramScanBAM, genomeName, txdb, percBestExpressed, flankSize, offsetStartEnd, listShiftValue)
riboSeqFromBAM(listeInputBamFile, paramScanBAM, genomeName, txdb, percBestExpressed, flankSize, offsetStartEnd, listShiftValue)
listeInputBamFile |
A character path or a vector of paths to the ribo-seq BAM file(s). If multiple BAM files are provided, they should come from the same genome alignment. |
paramScanBAM |
NULL or ScanBamParam object specifying what fields and which records are imported. Default value is NULL. |
genomeName |
a character object containing the name of the genome used for the alignment BAM file. The name should be one of the UCSC ensGene list: ucscGenomes()[ , "db"]. Ex. "hg19" or "mm10". This parameter is used to build the TxDb object. |
txdb |
a TxDb object containing the annotations info to comfront with the alignment files. Either genomeName or txdb parameters should be provided. |
percBestExpressed |
numeric [between 0 and 1]. The percentage of best expressed CDSs on which to plot the coverage around the TSS. Necessary if the shiftValue parameter must be estimated. Default value 0.03 (3%). |
flankSize |
an integer. How many bp left and right of the TSS should the coverage be performed? |
offsetStartEnd |
a character object. Either "start" (the default) or "end" for read start or read end to define the offset. |
listShiftValue |
a vector of integer. It should have the same length as the inputBamFile vector. The numeric value for shifting ranges of reads on genomic features when computing coverage. Set this parameter to 0 if no shift should be performed. If this parameter is missing, the shiftValue is computed based on the maximum peak of read start coverage around the TSS. A plot is produced to illustrate this estimation. |
A list of list for each BAM file in the inputBamFile list. For each BAM file 2 objects are returned: one data.frame with info on the genomic features and the corresponding coverage column, and one list of per ORF codon coverage.
#the txdb object can be given as parameter or not. #If it is not specified, a txdb object is build from UCSC. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #in this example only one BAM file is treated. #However, multiple BAM files can be analyzed together. myFile <- system.file("extdata", "ctrl_sample.bam", package="RiboProfiling") listeInputBam <- c(myFile) #when running this function it is important that chromosome names #in UCSC and your BAM correspond: the "chr" particle covData <- riboSeqFromBAM(listeInputBam, txdb=txdb, listShiftValue=c(-14))
#the txdb object can be given as parameter or not. #If it is not specified, a txdb object is build from UCSC. txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene #in this example only one BAM file is treated. #However, multiple BAM files can be analyzed together. myFile <- system.file("extdata", "ctrl_sample.bam", package="RiboProfiling") listeInputBam <- c(myFile) #when running this function it is important that chromosome names #in UCSC and your BAM correspond: the "chr" particle covData <- riboSeqFromBAM(listeInputBam, txdb=txdb, listShiftValue=c(-14))