Title: | Intron-Exon Retention Estimator |
---|---|
Description: | This package performs Intron-Exon Retention analysis on RNA-seq data (.bam files). |
Authors: | Ali Oghabian <[email protected]>, Dario Greco <[email protected]>, Mikko Frilander <[email protected]> |
Maintainer: | Ali Oghabian <[email protected]> |
License: | GPL-2 |
Version: | 1.31.0 |
Built: | 2024-10-30 07:37:05 UTC |
Source: | https://github.com/bioc/IntEREst |
Intron/Exon retention estimator quantifies and normalizes Intron retention and Exon junction read levels by analyzing mapped reads (.bam) files.
Package: | IntEREst |
Type: | Package |
Version: | 1.0 |
Date: | 2015-11-18 |
License: | GPL-2 |
To run the pipeline use the functions interest() or interest.sequential(), i.e. wrapper functions that run all the necessary functions.
Ali Oghabian <[email protected]>, Dario Greco <[email protected]>, Mikko Frilander <[email protected]>
Maintainer: Ali Oghabian <[email protected]>, Mikko Frilander <[email protected]>
SummarizedExperiment
object
Adds a new sample annotation to the SummarizedExperiment object. In other words, it adds a column with sample annotations to the colData of the SummarizedExperiment object.
addAnnotation(x, sampleAnnotationType, sampleAnnotation)
x |
Object of type |
sampleAnnotationType |
The name of the new column to be added to the |
sampleAnnotation |
Vector with the same length as the row-size of the |
An InterestResult object.
Ali Oghabian
# Check the annotation table of mdsChr22Obj data
getAnnotation(mdsChr22Obj)

# Add a new sample annotation
newMdsChr22Obj <- addAnnotation(x=mdsChr22Obj,
    sampleAnnotationType="sample_number",
    sampleAnnotation=1:16
)

# Retrieve annotations of the new object
getAnnotation(newMdsChr22Obj)
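The effect of addAnnotation can be pictured with a plain data.frame standing in for the colData table. The sketch below is illustrative only: `sample_table` and `add_annotation_sketch` are hypothetical names, not part of IntEREst, and the real function operates on a SummarizedExperiment object rather than a data.frame.

```r
# Hypothetical stand-in for the colData of a SummarizedExperiment object:
# one row per sample.
sample_table <- data.frame(
  test_ctrl = c("test", "test", "ctrl", "ctrl"),
  row.names = paste0("sam", 1:4)
)

# Illustrative helper (not an IntEREst function): append one annotation
# column after checking that there is exactly one value per sample.
add_annotation_sketch <- function(tbl, sampleAnnotationType, sampleAnnotation) {
  stopifnot(length(sampleAnnotation) == nrow(tbl))
  tbl[[sampleAnnotationType]] <- sampleAnnotation
  tbl
}

new_table <- add_annotation_sketch(sample_table, "sample_number", 1:4)
print(new_table)
```

The length check mirrors the requirement that sampleAnnotation has one value per sample; the original table is left unchanged and a modified copy is returned.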
Receives coordinates, a reference genome and PWMs of the splice sites of U12- and U2-type introns, and returns a data.frame with 2 columns. The first column shows whether the corresponding sequences match the U12, U2 or both (U12/U2) consensus sequences (based on their scores when fitting the PWMs). The second column shows whether the match is on the positive or the negative strand when fitting the PWMs to the sequences.
annotateU12(pwmU12U2=c(), pwmSsIndex=c(), referenceChr, referenceBegin, referenceEnd, referenceIntronExon, intronExon='intron', matchWindowRelativeUpstreamPos=c() , matchWindowRelativeDownstreamPos=c(), minMatchScore='80%', refGenome='', setNaAs='U2', annotateU12Subtype=TRUE, includeMatchScores=FALSE, ignoreHybrid=TRUE, filterReference)
pwmU12U2 |
A list containing position weight matrices of (in order): Donor site, branch
point, and acceptor site of U12-type introns, and donor site and acceptor site
of U2-type introns. If not provided, the information related to |
pwmSsIndex |
A list (or vector) that contains the column number in each element of
|
referenceChr |
Chromosome names of the references (e.g. introns). |
referenceBegin |
A vector that corresponds to the begin coordinates of the reference (e.g. introns). |
referenceEnd |
A vector that corresponds to the end coordinates of the reference (e.g.
introns). |
referenceIntronExon |
A vector with the same size as the |
intronExon |
Should be assigned either |
matchWindowRelativeUpstreamPos |
A vector the same size as the |
matchWindowRelativeDownstreamPos |
A vector the same size as the |
minMatchScore |
Min percentage match score, when scoring matching of a sequence to |
refGenome |
The reference genome; Object of class BSgenome. Use |
setNaAs |
Defines what the function should return if a reference (e.g. intron) does not match either the U12 or the U2 type based on the scores obtained from the PWMs. An intron that is not shown to be U12 or U2 type based on PWM scores can be considered U2-type, since U12-type introns constitute about 1% of the introns in the human genome and are much more conserved than U2-type introns; hence the default is 'U2'. It is also possible to set it to NA, nan or 'U12/U2'. |
annotateU12Subtype |
Whether to annotate the subtypes of the U12-type
introns. The value is |
includeMatchScores |
If set as TRUE the final data frame result includes the PWM match scores (FALSE by default). |
ignoreHybrid |
Whether to ignore the U12 hybrid subtypes, i.e. GT-AC and AT-AG (TRUE by default). |
filterReference |
Optional parameter that can be defined either as a GRanges or SummarizedExperiment object. If defined as the latter, the first 3 columns of the rowData must be: chr name, start and end of the coordinates. If the parameter is defined the introns/exon coordinates will be mapped against it and the intron type of all those that do not match will be set as NA. |
Data frame containing 3 columns representing (in order): intron type (U12, U2 or none), strand match indicating whether the PWM matches the sequence (+ strand), the reverse complement of the sequence (- strand) or neither (NA), and the U12 subtype (GT-AG or AT-AC). If includeMatchScores is set to TRUE, further columns containing the PWM match scores are also included.
Ali Oghabian
# Importing genome
BSgenome.Hsapiens.UCSC.hg19 <-
    BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
# Choosing subset of rows
ind <- 69:94
# Annotate U12 introns with strong U12 donor site, branch point
# and acceptor site from the u12 data in the package
annoU12 <- annotateU12(
    pwmU12U2=list(pwmU12db[[1]][,11:17], pwmU12db[[2]],
        pwmU12db[[3]][,38:40], pwmU12db[[4]][,11:17],
        pwmU12db[[5]][,38:40]),
    pwmSsIndex=list(indexDonU12=1, indexBpU12=1, indexAccU12=3,
        indexDonU2=1, indexAccU2=3),
    referenceChr=u12[ind,'chr'],
    referenceBegin=u12[ind,'begin'],
    referenceEnd=u12[ind,'end'],
    referenceIntronExon=u12[ind,"int_ex"],
    intronExon="intron",
    matchWindowRelativeUpstreamPos=c(NA,-29,NA,NA,NA),
    matchWindowRelativeDownstreamPos=c(NA,-9,NA,NA,NA),
    minMatchScore=c(rep(paste(80,"%",sep=""),2), "60%",
        paste(80,"%",sep=""), "60%"),
    refGenome=BSgenome.Hsapiens.UCSC.hg19,
    setNaAs="U2",
    annotateU12Subtype=TRUE)
# How many U12 and U2 type introns with strong U12 donor sites,
# acceptor sites (and branch points for U12-type) are there?
table(annoU12[,1])
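To make the role of minMatchScore more concrete, the following base R sketch scores a short sequence against a toy PWM and compares the score with a percentage of the best achievable score. It is a simplified illustration under stated assumptions: the matrix values are invented, `score_sequence` and `passes_threshold` are hypothetical helpers, and reading "80%" as a fraction of the maximum possible score is an assumption about how such thresholds are commonly interpreted, not a description of the internals of annotateU12.

```r
# Toy position weight matrix (rows A, C, G, T; one column per position).
# The values are made up for illustration and are not taken from U12DB.
pwm <- matrix(
  c(0.7, 0.1, 0.1, 0.1,   # position 1 favours A
    0.1, 0.1, 0.1, 0.7,   # position 2 favours T
    0.1, 0.7, 0.1, 0.1),  # position 3 favours C
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Score a sequence as the sum of the PWM entries selected by its bases.
score_sequence <- function(pwm, sequence) {
  bases <- strsplit(sequence, "")[[1]]
  stopifnot(length(bases) == ncol(pwm))
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

# Assumption: a threshold such as "80%" is read as a fraction of the
# best possible score (the sum of the column maxima).
passes_threshold <- function(pwm, sequence, minMatchScore = "80%") {
  fraction <- as.numeric(sub("%", "", minMatchScore)) / 100
  max_score <- sum(apply(pwm, 2, max))
  score_sequence(pwm, sequence) >= fraction * max_score
}

consensus_ok <- passes_threshold(pwm, "ATC")  # consensus sequence
poor_ok <- passes_threshold(pwm, "GGG")       # poor match
print(c(consensus = consensus_ok, poor = poor_ok))
```

In the example above, separate thresholds are passed per PWM (80% for donor and branch point, 60% for acceptor), which corresponds to calling a check like this once per splice-site matrix.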
Runs a function on columns of the counts (assay) of a 'SummarizedExperiment' object (resulting from interest(), interest.sequential() or readInterestResults()) based on the overlap of its exon/intron coordinates with those of another 'SummarizedExperiment' object. The number of rows and the dimensions of the counts of the result are equal to those of the subject. The function is applied to the query based on its overlap with the subject.
applyOverlap( query, subject, type="any", replaceValues=FALSE, intExCol="int_ex", intronExon="intron", sujectGeneNamesCol, repeatsTableToFilter=c(), scaleFragment=TRUE, scaleLength=TRUE, unmapValue=0, FUN=mean, ... )
query , subject
|
SummarizedExperiment objects resulted by |
type |
The type of overlap. By default it considers any overlap. See findOverlaps-methods for more info. |
replaceValues |
Whether to return a 'SummarizedExperiment' object in which the counts are replaced by the new values (resulting from running the function). |
intExCol |
Column name (or number) in the rowData of the objects that represents whether each row of the assay is "intron" or "exon". |
intronExon |
Should be assigned either |
sujectGeneNamesCol |
The column in the row data of the subject that includes the gene names. |
repeatsTableToFilter |
A data.frame that includes chr, begin and end columns. If defined, all reads mapped to the described regions will be ignored. |
scaleFragment |
Logical value, indicating whether the retention levels must be scaled by (genewide) fragment levels. |
scaleLength |
Logical value, indicating whether the retention levels must be scaled by length of the introns/exons. |
unmapValue |
The value to assign to unmapped rows (i.e. introns/exons). |
FUN |
The function to apply. |
... |
Other parameter settings from |
The returned value is a data frame if replaceValues is FALSE, and a SummarizedExperiment object if replaceValues is TRUE.
Ali Oghabian
readInterestResults
interest
interest.sequential
mdsChr22Obj
tmp <- applyOverlap(
    query=mdsChr22Obj,
    subject=mdsChr22Obj,
    type="equal",
    replaceValues=FALSE,
    intExCol="int_ex",
    intronExon="intron",
    sujectGeneNamesCol="collapsed_transcripts",
    scaleFragment=TRUE,
    scaleLength=TRUE,
    unmapValue=0,
    FUN=head,
    n=1
)
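The idea of applying a function to query rows grouped by their overlap with subject rows can be sketched in base R with plain coordinate tables. This is a minimal illustration, assuming closed intervals on a single chromosome and a FUN that returns one value per column; `apply_overlap_sketch` and the toy tables are hypothetical and do not reproduce the filtering or scaling steps of applyOverlap.

```r
# Toy query: intervals with one count column per sample.
query <- data.frame(begin = c(1, 50, 120), end = c(40, 100, 200))
query_counts <- matrix(c(10, 20, 30,
                         1,  2,  3),
                       ncol = 2,
                       dimnames = list(NULL, c("sam1", "sam2")))

# Toy subject: the result has one row per subject interval.
subject <- data.frame(begin = c(30, 300), end = c(60, 400))

# Illustrative helper (not an IntEREst function).
apply_overlap_sketch <- function(query, query_counts, subject,
                                 unmapValue = 0, FUN = mean, ...) {
  result <- matrix(unmapValue, nrow = nrow(subject), ncol = ncol(query_counts),
                   dimnames = list(NULL, colnames(query_counts)))
  for (i in seq_len(nrow(subject))) {
    # "any" overlap: the two closed intervals share at least one position.
    hits <- which(query$begin <= subject$end[i] & query$end >= subject$begin[i])
    if (length(hits) > 0) {
      result[i, ] <- apply(query_counts[hits, , drop = FALSE], 2, FUN, ...)
    }
  }
  result
}

overlap_means <- apply_overlap_sketch(query, query_counts, subject)
print(overlap_means)
```

The first subject interval overlaps the first two query intervals, so their counts are averaged per sample; the second subject interval overlaps nothing and receives unmapValue.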
Several functions are provided that can extract various attributes from an object of class SummarizedExperiment generated by IntEREst functions, e.g. interest() and readInterestResults(). Sample annotations can be extracted using the getAnnotation() function, and the scaled retention levels of the introns/exons using the scaledRetention() function. Note that the colData and rowData methods of the SummarizedExperiment class can also be used to extract row and column data.
getAnnotation(x) scaledRetention(x)
x |
Object of type |
Various data types (data.frame/vector) dependent on the function used. See the "Description" for more information.
Ali Oghabian
SummarizedExperiment-class
addAnnotation
counts-method
plot-method
# Retrieve the sample annotations from mdsChr22Obj
getAnnotation(mdsChr22Obj)
# Retrieve the scaled retention levels from mdsChr22Obj
head(scaledRetention(mdsChr22Obj))
# For row and column data, SummarizedExperiment methods can be used
head(rowData(mdsChr22Obj))
colData(mdsChr22Obj)
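As a rough picture of what "scaled" retention values can mean, the base R sketch below divides raw counts by feature length and by the total count of the gene they belong to. The exact normalisation used by IntEREst is not stated here, so this is an assumption for illustration only; the gene-wise division follows the way the toy fpkm values are computed in the package examples, and `scale_retention_sketch` is a hypothetical helper.

```r
# Toy data: four introns/exons belonging to two genes.
counts <- c(100, 50, 10, 40)
lengths <- c(1000, 500, 100, 2000)   # feature lengths in bases
gene_id <- c("g1", "g1", "g2", "g2")

# Illustrative helper (not an IntEREst function): optionally scale by
# feature length and by the gene-wide total of the raw counts.
scale_retention_sketch <- function(counts, lengths, gene_id,
                                   scaleLength = TRUE, scaleFragment = TRUE) {
  scaled <- counts
  if (scaleLength) {
    scaled <- scaled / lengths
  }
  if (scaleFragment) {
    gene_totals <- tapply(counts, gene_id, sum)
    scaled <- scaled / as.numeric(gene_totals[gene_id])
  }
  scaled
}

by_length <- scale_retention_sketch(counts, lengths, gene_id,
                                    scaleFragment = FALSE)
by_gene <- scale_retention_sketch(counts, lengths, gene_id,
                                  scaleLength = FALSE)
print(by_length)
print(by_gene)
```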
boxplot method for SummarizedExperiment objects.
## S4 method for signature 'SummarizedExperiment' boxplot(x, sampleAnnoCol=NA, intexTypeCol="int_type", intexType=c(), col="white", boxplotNames=c(), lasNames=3, outline=FALSE, addGrid=FALSE, ...)
x |
Object of type |
sampleAnnoCol |
Which column of |
intexTypeCol |
Column name (or number) that represents what type of intron/exon each row of
|
intexType |
A vector of characters describing types of introns/exons to be plotted. They
must be elements in the |
col |
Vector showing box colours. It is either of size 1 or the same size as the number of groups to be plotted. |
boxplotNames |
Names to write under boxes. If not defined, as names, it pastes the row (intron/exon) annotation names to the sample group annotations separated by a space " ". |
lasNames |
Orientation of the box names. |
outline |
If outline is TRUE the outlier points are drawn otherwise if FALSE (default) they are not. |
addGrid |
Whether to add a grid under the boxplots (FALSE by default). |
... |
Other arguments to pass to the |
Returns NULL.
Ali Oghabian
Class:
SummarizedExperiment-class
Method:
counts-method
plot-method
# Plotting U12- vs U2-type introns
par(mar=c(8,4,2,1))
boxplot(x=mdsChr22Obj, sampleAnnoCol="type", intexTypeCol="intron_type",
    intexType=c("U2", "U12"), col=rep(c("yellow", "orange"),3),
    boxplotNames=c(), lasNames=3, outline=FALSE, addGrid=TRUE)
Builds position weight matrices (PWMs) for the donor and acceptor sites of the U12- and U2-type introns, and the branch point of the U12-type introns. If pdfFileSeqLogos is defined, a PDF is also produced that contains the sequence logos of the results. The result is a list that contains the PWMs of the splice sites of U12- and U2-dependent introns.
buildSsTypePwms( cexSeqLogo=1, pdfWidth=35, pdfHeight=10, tmpDir="./", u12dbSpecies="Homo_sapiens", pwmSource="U12DB", u12DonorBegin, u12BranchpointBegin, u12AcceptorBegin, u2DonorBegin, u2AcceptorBegin, u12DonorEnd, u12BranchpointEnd, u12AcceptorEnd, u2DonorEnd, u2AcceptorEnd, pasteSites=FALSE, splicerackSsLinks=list( U12_AT_AC_donor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.25", U12_AT_AC_branchpoint= "http://katahdin.mssm.edu/splice/out/9606_logo_file.26", U12_AT_AC_acceptor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.29", U12_GT_AG_donor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.22", U12_GT_AG_branchpoint= "http://katahdin.mssm.edu/splice/out/9606_logo_file.27", U12_GT_AG_acceptor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.21", U2_GC_AG_donor="http://katahdin.mssm.edu/splice/out/9606_logo_file.24", U2_GC_AG_acceptor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.30", U2_GT_AG_donor="http://katahdin.mssm.edu/splice/out/9606_logo_file.23", U2_GT_AG_acceptor= "http://katahdin.mssm.edu/splice/out/9606_logo_file.28"), u12dbLink="https://genome.crg.cat/pub/software/u12/u12db_v1_0.sql.gz", u12dbDbName="u12db", u12dbDropDb=TRUE, pdfFileSeqLogos="", removeTempFiles=TRUE, ...)
cexSeqLogo |
Font size of sequence logo plots; used only if |
pdfWidth , pdfHeight
|
The width and height of the graphics region of the pdf in inches. The default values are 35 and 10. |
tmpDir |
Path to directory used for storing temporary files. |
u12dbSpecies |
What species data to use when getting the data from the U12DB database
( |
pwmSource |
The source used to build the PWMs for the splice sites of U12- and U2-dependent
introns. Default is |
u12DonorBegin , u12DonorEnd
|
Integer values. They correspond to the begin and end point of the donor sequences of U12-type introns to consider (optional). |
u12BranchpointBegin , u12BranchpointEnd
|
Integer values. Begin and end points of the branch point sequences of U12-type introns (optional). |
u12AcceptorBegin , u12AcceptorEnd
|
Integer values. Begin and end points of the acceptor sequences of U12-type introns (optional). |
u2DonorBegin , u2DonorEnd
|
Integer values. Begin and end points of the donor sequences of U2-type introns (optional). |
u2AcceptorBegin , u2AcceptorEnd
|
Integer values. Begin and end points of the acceptor sequences of U2-type introns (optional). |
pasteSites |
Logical. If TRUE the donor, branch point and acceptor seqs are pasted before a PWM is built; then the PWMs of each (donor, acceptor and bp) are assigned. If FALSE (default) the PWMs for each is built separately. |
splicerackSsLinks |
A list (or vector) that contains the SpliceRack URL links to the text files
that contain Position Weight Matrices of the splice sites of U12 and U2 introns.
This parameter is used only when pwmSource="SpliceRack". You can get the links
to PWM files from this URL (choose logo files with "File" links):
http://katahdin.mssm.edu/splice/splice_matrix.cgi?database=spliceNew.
The links should be defined in the following order: |
u12dbLink |
A character string containing the URL for downloading the zipped MySQL dump
file of the U12DB. Used when |
u12dbDbName |
Name of the database copy of the U12DB that is built locally. Used when
|
u12dbDropDb |
Drop (or remove) the local copy of the U12DB database at the end of the run.
Used when |
pdfFileSeqLogos |
Path to PDF file containing the sequence logos of the results. By default it does not produce a file. |
removeTempFiles |
Whether to remove temporary files at the end of the run; accepts |
... |
Authorization arguments needed by the DBMS instance. See the manual for
|
pwmDonorU12 |
Matrix (with 4 rows representing A, C, G, T and |
pwmBpU12 |
Position Weight Matrix of branchpoint of U12-type introns. |
pwmAccU12 |
Position Weight Matrix of acceptor site of U12-type introns. |
pwmDonU2 |
Position Weight Matrix of donor site of U2-type introns. |
pwmAccU2 |
Position Weight Matrix of acceptor site of U2-type introns. |
Ali Oghabian
# Time demanding function
## Not run:
# Build temp directory
tmpDir <- tempdir()
# Creating subdirectory for storing u12db temp files
dir.create(paste(tmpDir, "u12dbTmp", sep="/"))
# Extracting PWMs of Splice Sites of U12 and U2 type introns -
# based on u12db
u12dbPwm <- buildSsTypePwms(
    tmpDir=paste(tmpDir, "u12dbTmp", sep="/"),
    u12dbSpecies="Homo_sapiens",
    pwmSource="U12DB",
    u12dbDbName="u12db",
    u12dbDropDb=TRUE,
    removeTempFiles=TRUE)
# Creating subdirectory for storing SpliceRack temp files
dir.create(paste(tmpDir, "splicerackTmp", sep="/"))
# Extracting PWMs of Splice Sites of U12 and U2 type introns -
# based on SpliceRack
spliceRackPwm <- buildSsTypePwms(
    tmpDir=paste(tmpDir, "splicerackTmp", sep="/"),
    pwmSource="SpliceRack",
    removeTempFiles=TRUE)
## End(Not run)
Returns the (raw) number of reads that are mapped to introns/exons in various samples.
## S4 method for signature 'SummarizedExperiment' counts(object)
object |
Object of type |
Returns a numeric matrix.
Ali Oghabian
Class:
SummarizedExperiment-class
Method:
plot-method
# Show contents of an InterestResults object included in IntEREst
head(counts(mdsChr22Obj))

# Make a test InterestResults object
geneId <- paste("gene", c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)), sep="_")
readCnt1 <- sample(1:100, 20)
readCnt2 <- sample(1:100, 20)
readCnt3 <- sample(1:100, 20)
readCnt4 <- sample(1:100, 20)
fpkm1 <- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2 <- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3 <- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4 <- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]

# Creating object using test data
interestDat <- data.frame(
    int_ex=rep(c(rep(c("exon","intron"),2),"exon"),4),
    int_ex_num=rep(c(1,1,2,2,3),4),
    gene_id=geneId,
    sam1_readCnt=readCnt1,
    sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3,
    sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1,
    sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3,
    sam4_fpkm=fpkm4
)
readFreqColIndex <- grep("_readCnt$", colnames(interestDat))
scaledRetentionColIndex <- grep("_fpkm$", colnames(interestDat))
scalRetTmp <- as.matrix(interestDat[, scaledRetentionColIndex])
colnames(scalRetTmp) <- gsub("_fpkm$", "", colnames(scalRetTmp))
frqTmp <- as.matrix(interestDat[, readFreqColIndex])
colnames(frqTmp) <- gsub("_readCnt$", "", colnames(frqTmp))

InterestResultObj <- InterestResult(
    resultFiles=paste("file", 1:4, sep="_"),
    rowData=interestDat[, -c(readFreqColIndex, scaledRetentionColIndex)],
    counts=frqTmp,
    scaledRetention=scalRetTmp,
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam", 1:4, sep=""),
        gender=c("M","M","F","F"),
        health=c("healthy","unhealthy","healthy","unhealthy"),
        row.names=paste("sam", 1:4, sep="")
    )
)

# Show
head(counts(InterestResultObj))
Differential intron retention test adapted from the DESeq2 package.
deseqInterest (x, design, pAdjustMethod = "BH", sizeFactor=c(), contrast, bpparam, ...)
x |
Object of type |
design |
Formula specifying the design of the experiment. It must specify an interaction
term between variables from column names of |
pAdjustMethod |
Which adjustment method to use on the p-values. See p.adjust for more information. |
sizeFactor |
Numeric vector with the same size as the column size of the count matrix in
|
contrast |
Argument specifying the comparison to extract from |
bpparam |
An optional |
... |
Other parameter settings for the results function
in the |
A DESeqResults object.
Ali Oghabian
exactTestInterest, qlfInterest, treatInterest, DEXSeqIntEREst
mdsChr22IntObj <- mdsChr22Obj[rowData(mdsChr22Obj)$int_ex=="intron",]
deseqRes <- deseqInterest(x=mdsChr22IntObj,
    design=~test_ctrl,
    contrast=list("test_ctrl_test_vs_ctrl"))
# Number of U12/U2 type significantly differential retained introns in chr22
# (the rows of deseqRes correspond to the intron-only object, so index that object)
table(rowData(mdsChr22IntObj)[which(deseqRes$padj<.01), "intron_type"])
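The pAdjustMethod argument refers to the methods of the base R function p.adjust. As a small standalone illustration (the p-values below are invented and unrelated to the mdsChr22Obj data), the Benjamini-Hochberg adjustment can be computed and thresholded as follows.

```r
# Invented raw p-values for four hypothetical introns.
p_values <- c(0.01, 0.04, 0.03, 0.005)

# Benjamini-Hochberg adjustment, the "BH" default of pAdjustMethod.
adjusted <- p.adjust(p_values, method = "BH")
print(adjusted)

# Indices of the tests that pass an adjusted p-value cut-off of 0.03.
significant <- which(adjusted < 0.03)
print(significant)
```

This is the same kind of filtering as `which(deseqRes$padj<.01)` in the example above, applied to a plain numeric vector.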
Genewise differential exon usage or intron retention test adapted from the DEXSeq package.
DEXSeqIntEREst (x, design, reducedModel = ~ sample + intex, fitExpToVar, intExCol, geneIdCol, bpparam, silent=TRUE,...)
x |
Object of type |
design |
Formula specifying the design of the experiment. It must specify an interaction
term between a variable from columns of |
reducedModel |
The null model formula. By default it is '~ sample + intex'. |
fitExpToVar |
A variable name contained in the column data (i.e. column names of
|
intExCol |
Column name (or number) that represents whether each row is "intron" or "exon"
in |
geneIdCol |
Column name (or number of column) in |
bpparam |
An optional |
silent |
Whether to run the DEXSeq function silently (if TRUE) or allow it to print messages at each step (if FALSE). |
... |
Other parameter settings for the |
The design and reducedModel arguments accept formulas that specify the design of the experiment. The formula must describe an interaction between variables from the columns of sampleData(x) and one of the 'exon', 'intron' or 'intex' (i.e. intron and exon) variables. Depending on which of these variables is used (exon, intron or intex), the input object (x) is filtered accordingly to include exons, introns, or both introns and exons. Hence the number of rows of the returned value is equal to the number of rows of the filtered object, i.e. the number of exons, introns or both, based on the design formula.
A DEXSeqResults object.
Ali Oghabian
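# Subset to exons (assumed definition, by analogy with the intron subset
# used in the deseqInterest example)
mdsChr22ExObj <- mdsChr22Obj[rowData(mdsChr22Obj)$int_ex=="exon",]
dexseqExRes <- DEXSeqIntEREst(x=mdsChr22ExObj,
    design= ~ sample + exon + test_ctrl:exon,
    reducedModel= ~ sample + exon,
    fitExpToVar="test_ctrl",
    intExCol="int_ex",
    geneIdCol="transcripts_id",
    silent=TRUE)
head(dexseqExRes)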
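The way the design formula determines which rows are kept can be sketched with base R formula tools. This is an illustrative approximation, not the package's implementation: `rows_to_keep` is a hypothetical helper and `row_type` stands in for the column named by intExCol.

```r
# Toy row annotation: whether each row is an intron or an exon.
row_type <- c("exon", "intron", "exon", "intron", "exon")

# Illustrative helper (not an IntEREst function): pick rows according to
# which of 'exon', 'intron' or 'intex' appears in the design formula.
rows_to_keep <- function(design, row_type) {
  vars <- all.vars(design)
  if ("intex" %in% vars) {
    rep(TRUE, length(row_type))
  } else if ("exon" %in% vars) {
    row_type == "exon"
  } else if ("intron" %in% vars) {
    row_type == "intron"
  } else {
    stop("design must include one of 'exon', 'intron' or 'intex'")
  }
}

keep_exons <- rows_to_keep(~ sample + exon + test_ctrl:exon, row_type)
keep_introns <- rows_to_keep(~ sample + intron + test_ctrl:intron, row_type)
keep_both <- rows_to_keep(~ sample + intex + test_ctrl:intex, row_type)
print(c(exons = sum(keep_exons), introns = sum(keep_introns),
        both = sum(keep_both)))
```

The number of TRUE values in each case corresponds to the number of rows one would expect in the returned results for that design.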
Compute genewise exact test between two groups of read counts, using the edgeR package.
exactTestInterest(x, sampleAnnoCol=c(), sampleAnnotation=c(), geneIdCol, silent=TRUE, group=c(), rejection.region="doubletail", big.count=900, prior.count=0.125, disp="common", ...)
x |
Object of type |
sampleAnnoCol |
Which column of |
sampleAnnotation |
A vector of size 2 which contains values from |
geneIdCol |
Column name (or number of column) in |
silent |
Whether to run the function silently, i.e. without printing the top differential expression tags. |
group |
Vector to manually define the sample groups (or annotations). It is ignored if
|
rejection.region |
The |
big.count |
The |
prior.count |
The |
disp |
The type of estimating the dispersion in the data. Available options are:
"tagwise", "trended", "common" and "genewise". It is also possible to assign a
number for manually setting the |
... |
Other parameter settings for the |
table |
Data frame containing columns for the log2 fold-change (logFC), the average of log2 counts-per-million (logCPM), and the two-sided p-value (PValue). |
comparison |
The name of the two compared groups. |
dispersionType |
The name of the type of dispersion used. |
dispersion |
The estimated dispersion values. |
Ali Oghabian
lfc, glmInterest, qlfInterest, treatInterest, DEXSeqIntEREst
geneId <- paste("gene", c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)), sep="_")
readCnt1 <- sample(1:100, 20)
readCnt2 <- sample(1:100, 20)
readCnt3 <- sample(1:100, 20)
readCnt4 <- sample(1:100, 20)
fpkm1 <- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2 <- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3 <- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4 <- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]

# Creating object using test data
interestDat <- data.frame(
    int_ex=rep(c(rep(c("exon","intron"),2),"exon"),4),
    int_ex_num=rep(c(1,1,2,2,3),4),
    gene_id=geneId,
    sam1_readCnt=readCnt1,
    sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3,
    sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1,
    sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3,
    sam4_fpkm=fpkm4
)
readFreqColIndex <- grep("_readCnt$", colnames(interestDat))
scaledRetentionColIndex <- grep("_fpkm$", colnames(interestDat))
scalRetTmp <- as.matrix(interestDat[, scaledRetentionColIndex])
colnames(scalRetTmp) <- gsub("_fpkm$", "", colnames(scalRetTmp))
frqTmp <- as.matrix(interestDat[, readFreqColIndex])
colnames(frqTmp) <- gsub("_readCnt$", "", colnames(frqTmp))

InterestResultObj <- InterestResult(
    resultFiles=paste("file", 1:4, sep="_"),
    rowData=interestDat[, -c(readFreqColIndex, scaledRetentionColIndex)],
    counts=frqTmp,
    scaledRetention=scalRetTmp,
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam", 1:4, sep=""),
        gender=c("M","M","F","F"),
        row.names=paste("sam", 1:4, sep="")
    )
)

res <- exactTestInterest(InterestResultObj,
    sampleAnnoCol="gender",
    sampleAnnotation=c("F","M"),
    geneIdCol="gene_id",
    silent=TRUE,
    disp="common")
This function returns a data.frame of regions with repetitive DNA sequences. These sequences can bias the mapping of reads to the genome; excluding them reduces that bias.
getRepeatTable( dbUser="genome", dbHost="genome-mysql.cse.ucsc.edu", ucscGenome="hg19", ucscTable="rmsk", minLength=0, repFamilyFil="Alu", repFamilyCol="repFamily", repChrCol="genoName", repBegCol="genoStart", repEndCol="genoEnd", repStrandCol="strand", repNameCol="repName", repClassCol="repClass")
dbUser |
Database user name; set as "genome" by default. |
dbHost |
Database host address; set as "genome-mysql.cse.ucsc.edu" by default. |
ucscGenome |
The UCSC genome. |
ucscTable |
The UCSC table name, i.e. the table with repetitive sequences; by default it is set as "rmsk". |
minLength |
The minimum length a repetitive sequence must have to be considered; the default setting is 0. |
repFamilyFil |
A vector including the repeat families to consider; by default "Alu". |
repFamilyCol |
The name of the column of the input table (ucscTable) that holds the repeat family; "repFamily" by default. |
repChrCol |
The column (either name or the number of the column) of the input table that represents the Chromosome names. |
repBegCol |
The column of the table that represents the start coordinates. |
repEndCol |
The column of the table that represents the end coordinates. |
repStrandCol |
The column of the table that represents the strand. |
repNameCol |
The column of the table representing the repeats' names. |
repClassCol |
The column of the table representing the repeats' classes. |
Data frame with columns representing coordinates and annotations of repetitive DNA elements.
Ali Oghabian
## Not run:
# Download table for Alu elements in the human genome
suppressWarnings(repTable<- getRepeatTable(repFamilyFil="Alu",
    ucscGenome="hg19"))
## End(Not run)
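The filtering that the arguments describe can be pictured on a small hand-made table. The sketch below is not the package's implementation and does not query UCSC: `toyRmsk` and `filterRepeats()` are hypothetical names, the values are made up, and only the column names are borrowed from the defaults in the usage. Whether the package compares lengths with `>=` or `>` is an assumption here.

```r
# Toy stand-in for a UCSC "rmsk"-like table (values are made up).
toyRmsk <- data.frame(
    genoName  = c("chr1", "chr1", "chr2", "chr2"),
    genoStart = c(100, 500, 50, 900),
    genoEnd   = c(400, 520, 350, 1300),
    strand    = c("+", "-", "+", "-"),
    repName   = c("AluY", "AluSx", "L1PA2", "AluJb"),
    repClass  = c("SINE", "SINE", "LINE", "SINE"),
    repFamily = c("Alu", "Alu", "L1", "Alu"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep repeats of the requested families that are
# at least minLength bases long.
filterRepeats <- function(tab, repFamilyFil = "Alu", minLength = 0,
    repFamilyCol = "repFamily", repBegCol = "genoStart",
    repEndCol = "genoEnd") {
    len <- tab[[repEndCol]] - tab[[repBegCol]]
    keep <- tab[[repFamilyCol]] %in% repFamilyFil & len >= minLength
    tab[keep, , drop = FALSE]
}

aluLong <- filterRepeats(toyRmsk, repFamilyFil = "Alu", minLength = 100)
aluLong$repName
```

Passing the column names as arguments, as getRepeatTable() does, lets the same filter work on tables whose columns are named differently.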
Compute generalized linear model likelihood ratio tests using the edgeR package. For more information see the glmFit() and glmLRT() functions in the edgeR package.
glmInterest(x, design=c(), silent=TRUE, disp="common", coef=c(), contrast=NULL, ...)
x |
Object of type SummarizedExperiment. |
design |
Design matrix. |
silent |
Whether to run the function silently, i.e. without printing the top differential expression tags. Default is TRUE. |
disp |
The method of estimating the dispersion in the data. Available options are: "common", "trended", "tagwiseInitCommon" and "tagwiseInitTrended". It is also possible to assign a number. |
coef |
Integer or character vector indicating which coefficients of the linear model
are to be tested equal to zero. See glmLRT() in the edgeR package. |
contrast |
Numeric vector or matrix specifying contrasts of the linear model coefficients
to be tested equal to zero. See glmLRT() in the edgeR package. |
... |
Other parameter settings for the |
All values produced by glmLRT() in the edgeR package, plus the following:
dispersionType |
The name of the type of dispersion used. |
dispersion |
The estimated dispersion values. |
Ali Oghabian
exactTestInterest, qlfInterest, treatInterest
# Test retention differentiation across the 3 types of samples
group <- getAnnotation(mdsChr22Obj)[,"type"]
glmRes<- glmInterest(x=mdsChr22Obj,
    design=model.matrix(~group), silent=TRUE,
    disp="tagwiseInitTrended", coef=2:3, contrast=NULL)
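To see what `coef=2:3` refers to, it can help to look at the design matrix on its own. The following base R sketch uses made-up group labels (they are not the labels in mdsChr22Obj) and only illustrates how `model.matrix(~group)` lays out its columns.

```r
# Three sample groups; the labels are made up for illustration.
group <- factor(c("ctrl", "ctrl", "mutA", "mutA", "mutB", "mutB"))

# With an intercept, column 1 is the baseline group and columns 2 and 3
# are the differences of the other two groups from that baseline.
design <- model.matrix(~group)
colnames(design)

# coef=2:3 therefore selects both non-baseline coefficients together.
testedCoef <- colnames(design)[2:3]
testedCoef
```

Testing columns 2 and 3 jointly asks whether any of the three groups differs, rather than comparing one specific pair.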
A read summarization function that counts all the reads mapping to the introns/exons based on the user's detailed parameter settings. The process can be run in parallel on multiple computing cores to improve its performance.
interest( bamFileYieldSize=1000000, bamFile, isPaired, isPairedDuplicate=FALSE, isSingleReadDuplicate= NA, reference, referenceGeneNames, referenceIntronExon, repeatsTableToFilter=c(), junctionReadsOnly=FALSE, outFile, logFile="", returnObj= FALSE, method=c("ExEx", "IntRet", "IntSpan", "ExSkip"), strandSpecific, bpparam, appendLogFile=FALSE, sampleName="", scaleLength= c(TRUE,FALSE), scaleFragment= c(TRUE,TRUE), limitRanges=GRanges(), excludeFusionReads=FALSE, loadLimitRangesReads=FALSE, ...)
bamFileYieldSize |
Maximum number of paired reads in the temporary files created as the result of dividing the input .bam file. |
bamFile |
Path of the input bam file. |
isPaired |
Whether the bam file is the result of a paired end sequencing read mapping (TRUE) or not (FALSE). |
isPairedDuplicate |
Whether to extract only (if set TRUE), filter out (FALSE) or include (if set NA) PCR
duplicates for paired mapped reads. It uses the FLAG field in the bam file to
filter the duplicate reads. If the mapping software does not support detecting
and flagging the duplicate reads |
isSingleReadDuplicate |
Whether to extract only (if set TRUE), filter out (FALSE) or include (if set NA) PCR duplicates for single mapped reads. |
reference |
Dataframe to be used as reference; It should at least contain three same-size
vectors with the tag names |
referenceGeneNames |
A vector with the same size as the row-size of the reference which includes the gene names of the reference. |
referenceIntronExon |
A vector with the same size as the row-size of the reference with values "intron" and "exon" describing which (intron or exon) each row of the reference represents. |
repeatsTableToFilter |
A data.frame table with similar structure to the |
junctionReadsOnly |
The parameter is considered if the |
outFile |
The name or path of the result file. |
logFile |
The log file path; if defined, log information is written to the log file. |
returnObj |
If set |
method |
A vector describing the summarization methods to use; i.e. whether to count reads
mapping to the introns ( |
strandSpecific |
The strand specificity of the RNA-seq data. The values are either "unstranded", "stranded", or "reverse". If the reads are not strand specific or directional use "unstranded". If the first read in paired-read sequencing, or the read in single-read sequencing, is in the same direction as the transcript strand use "stranded". If the first read in paired-read sequencing, or the read in single-read sequencing, is in the opposite direction to the transcript strand use "reverse". |
bpparam |
An optional |
appendLogFile |
Whether log information should be appended to the |
sampleName |
The name of the sample being analyzed. It will be included in the returned
object if |
scaleLength |
A vector constructed of TRUE/FALSE values, same size as the
|
scaleFragment |
A vector constructed of TRUE/FALSE values, same size as the
|
limitRanges |
A GRanges object. If defined it loads sequencing reads that
fall in the defined coordinates. It is similar to |
excludeFusionReads |
Only valid if limitRanges is defined. It filters the
defined by |
loadLimitRangesReads |
Boolean (TRUE/FALSE) variable. If set as
|
... |
Other parameter settings specific to |
If returnObj is set TRUE, then in addition to writing the result text files the results are returned, depending on whether one or two methods are defined, either as a single object of class SummarizedExperiment or as a list of size 2 that includes 2 objects of class SummarizedExperiment, one for IntRet and the other for ExEx.
Ali Oghabian
# Creating temp directory to store the results
outDir<- file.path(tempdir(),"interestFolder")
dir.create(outDir)
outDir<- normalizePath(outDir)
# Loading suitable bam file
bamF <- system.file("extdata", "small_test_SRR1691637_ZRSR2Mut_RHBDD3.bam",
    package="IntEREst", mustWork=TRUE)
# Choosing reference for the gene RHBDD3
ref<- u12[u12[,"gene_name"]=="RHBDD3",]
test<- interest(
    bamFileYieldSize=10000,
    bamFile=bamF,
    isPaired=TRUE,
    isPairedDuplicate=FALSE,
    isSingleReadDuplicate=NA,
    reference=ref,
    referenceGeneNames=ref[,"ens_gene_id"],
    referenceIntronExon=ref[,"int_ex"],
    repeatsTableToFilter=c(),
    outFile=paste(outDir, "interestRes.tsv", sep="/"),
    logFile=paste(outDir, "log.txt", sep="/"),
    method=c("IntRet", "IntSpan"),
    strandSpecific="unstranded",
    junctionReadsOnly=FALSE,
    returnObj=TRUE,
    scaleLength= c(TRUE,FALSE),
    scaleFragment= c(TRUE,TRUE)
)
test
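The `scaleLength` and `scaleFragment` switches describe two independent normalizations of the raw counts. The sketch below is only an illustration under stated assumptions: `scaleCounts()` is a hypothetical helper, the per-kilobase and per-million constants are FPKM-style choices assumed here, and the sum of the toy counts stands in for the number of fragments mapped to the genes. It is not taken from the package source.

```r
# Made-up read counts and lengths for three introns of one sample.
counts  <- c(i1 = 10, i2 = 40, i3 = 0)
lengths <- c(i1 = 500, i2 = 2000, i3 = 1000)   # in bases

# Hypothetical helper mirroring the scaleLength/scaleFragment switches.
scaleCounts <- function(cnt, len, scaleLength = TRUE, scaleFragment = TRUE) {
    out <- cnt
    if (scaleLength)   out <- out / (len / 1e3)        # per kilobase
    if (scaleFragment) out <- out / (sum(cnt) / 1e6)   # per million fragments
    out
}

scaledBoth   <- scaleCounts(counts, lengths)
scaledLength <- scaleCounts(counts, lengths, scaleFragment = FALSE)
scaledBoth
scaledLength
```

In this toy case i1 and i2 end up with the same scaled value, because i2 has four times the reads but is also four times as long.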
A read summarization function that counts all the reads mapping to the introns/exons based on the user's detailed parameter settings. The process runs on a single computing core.
interest.sequential( bamFileYieldSize=1000000, bamFile, isPaired, isPairedDuplicate=FALSE, isSingleReadDuplicate=NA, reference, referenceGeneNames, referenceIntronExon, repeatsTableToFilter=c(), junctionReadsOnly=FALSE, outFile, logFile="", returnObj= FALSE, method=c("ExEx", "IntRet", "IntSpan", "ExSkip"), strandSpecific, appendLogFile=FALSE, sampleName="", scaleLength= c(TRUE,FALSE), scaleFragment= c(TRUE,TRUE), limitRanges=GRanges(), excludeFusionReads=FALSE, loadLimitRangesReads=FALSE, ...)
bamFileYieldSize |
Maximum number of paired reads in the temporary files created as the result of dividing the input .bam file. |
bamFile |
Path of the input bam file. |
isPaired |
Whether the bam file is the result of a paired end sequencing read mapping (TRUE) or not (FALSE). |
isPairedDuplicate |
Whether to extract only (if set TRUE), filter out (FALSE) or include (if set NA) PCR
duplicates for paired mapped reads. It uses the FLAG field in the bam file to
filter the duplicate reads. If the mapping software does not support detecting
and flagging the duplicate reads |
isSingleReadDuplicate |
Whether to extract only (if set TRUE), filter out (FALSE) or include (if set NA) PCR duplicates for single mapped reads. |
reference |
Dataframe to be used as reference; It should at least contain three same-size
vectors with the tag names |
referenceGeneNames |
A vector with the same size as the row-size of the reference which includes the gene names of the reference. |
referenceIntronExon |
A vector with the same size as the row-size of the reference with values "intron" and "exon" describing which (intron or exon) each row of the reference represents. |
repeatsTableToFilter |
A data frame with similar structure as the |
junctionReadsOnly |
The parameter is considered if the |
outFile |
The name or path of the result file. |
logFile |
The log file path; if defined, log information is written to the log file. |
returnObj |
If set |
method |
A vector describing the summarization methods to use; i.e. whether to count reads
mapping to the introns ( |
strandSpecific |
The strand specificity of the RNA-seq data. The values are either "unstranded", "stranded", or "reverse". If the reads are not strand specific or directional use "unstranded". If the first read in paired-read sequencing, or the read in single-read sequencing, is in the same direction as the transcript strand use "stranded". If the first read in paired-read sequencing, or the read in single-read sequencing, is in the opposite direction to the transcript strand use "reverse". |
appendLogFile |
Whether log information should be appended to the |
sampleName |
The name of the sample being analyzed. It will be included in the returned
object if |
scaleLength |
A vector constructed of TRUE/FALSE values, same size as the
|
scaleFragment |
A vector constructed of TRUE/FALSE values, same size as the
|
limitRanges |
A GRanges object. If defined, only sequencing reads that fall
in the defined coordinates are loaded. It is similar to |
excludeFusionReads |
Only valid if limitRanges is defined. It filters the
defined by |
loadLimitRangesReads |
Boolean (TRUE/FALSE) variable. If set as
|
... |
Other parameter settings specific to |
If returnObj is set TRUE, then in addition to writing the result text files the results are returned, depending on whether one or two methods are defined, either as a single object of class SummarizedExperiment or as a list of size 2 that includes 2 objects of class SummarizedExperiment, one for IntRet and the other for ExEx.
Ali Oghabian
# Creating temp directory to store the results
outDir<- file.path(tempdir(),"interestFolder")
dir.create(outDir)
outDir<- normalizePath(outDir)
# Loading suitable bam file
bamF <- system.file("extdata", "small_test_SRR1691637_ZRSR2Mut_RHBDD3.bam",
    package="IntEREst", mustWork=TRUE)
# Choosing reference for the gene RHBDD3
ref<- u12[u12[,"gene_name"]=="RHBDD3",]
test<- interest.sequential(
    bamFileYieldSize=10000,
    bamFile=bamF,
    isPaired=TRUE,
    isPairedDuplicate=FALSE,
    isSingleReadDuplicate=NA,
    reference=ref,
    referenceGeneNames=ref[,"ens_gene_id"],
    referenceIntronExon=ref[,"int_ex"],
    repeatsTableToFilter=c(),
    outFile=paste(outDir, "interestRes.tsv", sep="/"),
    logFile=paste(outDir, "log.txt", sep="/"),
    method=c("IntRet","IntSpan"),
    strandSpecific="unstranded",
    returnObj=TRUE,
    scaleLength= c(TRUE,FALSE),
    scaleFragment= c(TRUE,TRUE)
)
test
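The three-valued `isPairedDuplicate` and `isSingleReadDuplicate` settings (TRUE, FALSE, NA) can be read as a selection rule over the duplicate flag of each read. The base R sketch below shows that rule on a made-up logical vector; `selectByDuplicate()` is a hypothetical helper, and the way the package actually reads the FLAG field from the bam file is not shown.

```r
# Made-up duplicate flags for five reads (TRUE = flagged as PCR duplicate).
isDup <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

# Hypothetical helper: which reads are kept for a given setting.
selectByDuplicate <- function(isDup, setting) {
    if (is.na(setting)) return(rep(TRUE, length(isDup)))  # NA: keep everything
    if (setting) isDup else !isDup   # TRUE: duplicates only; FALSE: drop them
}

keptAll      <- which(selectByDuplicate(isDup, NA))
keptFiltered <- which(selectByDuplicate(isDup, FALSE))
keptDupOnly  <- which(selectByDuplicate(isDup, TRUE))
keptAll
keptFiltered
keptDupOnly
```

Using NA for "do not look at the flag" keeps the argument a single logical value while still allowing three behaviours.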
Calls the constructors and creates a SummarizedExperiment object. For more information on the resulting object and the class see SummarizedExperiment-class.
InterestResult(resultFiles=c(), counts, scaledRetention, scaleLength, scaleFragment, sampleAnnotation, rowData)
resultFiles |
Vector of links to the result files of |
counts |
Numeric Matrix that includes the read counts. |
scaledRetention |
Matrix that includes the scaled retention values. |
scaleLength |
Logical value, indicating whether the intron/exon retention levels are scaled to the length of the introns/exons. |
scaleFragment |
Logical value, indicating whether the intron/exon retention levels are scaled to the fragments mapped to the genes. |
sampleAnnotation |
Data frame with the row-size equal to the size of |
rowData |
Data frame with Intron/Exon annotations and read count and scaled retention values for each sample. |
Returns an object of class SummarizedExperiment.
Ali Oghabian
SummarizedExperiment-class
attributes
addAnnotation
counts-method
plot-method
geneId<- paste("gene", c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)), sep="_")
readCnt1<- sample(1:100, 20)
readCnt2<- sample(1:100, 20)
readCnt3<- sample(1:100, 20)
readCnt4<- sample(1:100, 20)
fpkm1<- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2<- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3<- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4<- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]
# Creating object using test data
interestDat<- data.frame(
    int_ex=rep(c(rep(c("exon","intron"),2),"exon"),4),
    int_ex_num= rep(c(1,1,2,2,3),4),
    gene_id= geneId,
    sam1_readCnt=readCnt1,
    sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3,
    sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1,
    sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3,
    sam4_fpkm=fpkm4
)
readFreqColIndex<- grep("_readCnt$",colnames(interestDat))
scaledRetentionColIndex<- grep("_fpkm$",colnames(interestDat))
scalRetTmp<- as.matrix(interestDat[ ,scaledRetentionColIndex])
colnames(scalRetTmp)<-gsub("_fpkm$","", colnames(scalRetTmp))
frqTmp<- as.matrix(interestDat[ ,readFreqColIndex])
colnames(frqTmp)<-gsub("_readCnt$","", colnames(frqTmp))
InterestResultObj<- InterestResult(
    resultFiles=paste("file",1:4, sep="_"),
    rowData= interestDat[ , -c(readFreqColIndex, scaledRetentionColIndex)],
    counts= frqTmp,
    scaledRetention= scalRetTmp,
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam",1:4, sep=""),
        gender=c("M","M","F","F"),
        row.names=paste("sam", 1:4, sep="")
    )
)
# View object
InterestResultObj
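Before calling the constructor it can be useful to confirm that the pieces line up: one column per sample in both matrices, one annotation row per sample, and one rowData row per intron/exon. The check below is a sketch in base R; `inputsLineUp()` is a hypothetical helper, and the requirement that the column names of `counts` equal the row names of `sampleAnnotation` is an assumption based on the example, not a documented rule.

```r
# Small made-up inputs: 4 introns/exons, 3 samples.
counts <- matrix(1:12, nrow = 4, dimnames = list(NULL, paste0("sam", 1:3)))
scaledRetention <- counts / 10
sampleAnnotation <- data.frame(gender = c("M", "F", "F"),
    row.names = paste0("sam", 1:3))
rowData <- data.frame(int_ex = c("exon", "intron", "exon", "intron"))

# Hypothetical helper; returns TRUE when the dimensions are consistent.
inputsLineUp <- function(counts, scaledRetention, sampleAnnotation, rowData) {
    identical(dim(counts), dim(scaledRetention)) &&
        identical(colnames(counts), rownames(sampleAnnotation)) &&
        nrow(counts) == nrow(rowData)
}

okFull    <- inputsLineUp(counts, scaledRetention, sampleAnnotation, rowData)
okDropped <- inputsLineUp(counts[, 1:2], scaledRetention, sampleAnnotation,
    rowData)
okFull
okDropped
```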
Builds a SummarizedExperiment-class object from the intron retention and exon-exon junction results of IntEREst. The average of the junction levels is added to the SummarizedExperiment object of the intron retentions.
interestResultIntEx (intObj, exObj, intExCol=c(), mean.na.rm=TRUE, postExName="ex_junc" )
intObj |
A |
exObj |
A |
intExCol |
Column name (or number) in the |
mean.na.rm |
Whether to exclude missing values when computing the mean. |
postExName |
The postfix to use for the column names of the exons junction values in the |
Returns an object of class SummarizedExperiment.
Ali Oghabian
SummarizedExperiment-class
attributes
addAnnotation
counts-method
plot-method
testIntObj<- InterestResult(
    resultFiles= paste(paste("testFile",1:3, sep="_"),"bam", sep="."),
    counts= matrix(1:15, ncol=3, nrow=5, byrow=TRUE,
        dimnames= list(c(), paste("s", 1:3, sep="_"))),
    scaledRetention= matrix(1:15, ncol=3, nrow=5, byrow=TRUE,
        dimnames= list(c(), paste("s", 1:3, sep="_"))),
    scaleLength= FALSE,
    scaleFragment= FALSE,
    sampleAnnotation= data.frame(
        files=paste(paste("testFile",1:3, sep="_"),"bam", sep="."),
        names=paste("s", 1:3, sep="_"),
        row.names=paste("s", 1:3, sep="_")),
    rowData=data.frame(
        id= paste("i", 1:5, sep="_"),
        chr= rep("chr1", 5),
        begin=seq(100, by=100, length.out=5 ),
        end=seq(110, by=100, length.out=5 ),
        strand=rep("+",5))
)
testExObj<- InterestResult(
    resultFiles= paste(paste("testFile",1:3, sep="_"),"bam", sep="."),
    counts= matrix(1:30, ncol=3, nrow=10, byrow=TRUE,
        dimnames= list(c(), paste("s", 1:3, sep="_"))),
    scaledRetention= matrix(1:30, ncol=3, nrow=10, byrow=TRUE,
        dimnames= list(c(), paste("s", 1:3, sep="_"))),
    scaleLength= FALSE,
    scaleFragment= FALSE,
    sampleAnnotation= data.frame(
        files=paste(paste("testFile",1:3, sep="_"),"bam", sep="."),
        names=paste("s", 1:3, sep="_"),
        row.names=paste("s", 1:3, sep="_")),
    rowData=data.frame(
        id= paste("e", 1:10, sep="_"),
        chr= rep("chr1", 10),
        begin= c(seq(90, by=100, length.out=5),
            seq(111, by=100, length.out=5)),
        end= c(seq(99, by=100, length.out=5),
            seq(120, by=100, length.out=5 )),
        strand=rep("+",10))
)
(testIntExObj<- interestResultIntEx(intObj=testIntObj, exObj=testExObj,
    mean.na.rm=TRUE, postExName="ex_junc" ) )
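The idea of attaching an average junction level to each intron can be sketched in base R. The coordinates below follow the layout of the toy example (an exon ending one base before the intron begins and an exon starting one base after it ends), the levels are made up, and `flankMean()` is a hypothetical helper. How the package itself pairs exons with introns is an assumption here.

```r
# Two introns and their four flanking exons.
introns <- data.frame(begin = c(100, 200), end = c(110, 210))
exons <- data.frame(
    begin = c(90, 190, 111, 211),
    end   = c(99, 199, 120, 220),
    s_1   = c(4, 6, 8, NA)      # made-up junction levels for one sample
)

# Hypothetical helper: mean level of the exons adjacent to each intron.
flankMean <- function(introns, exons, col, na.rm = TRUE) {
    vapply(seq_len(nrow(introns)), function(i) {
        adj <- exons$end == introns$begin[i] - 1 |
            exons$begin == introns$end[i] + 1
        mean(exons[[col]][adj], na.rm = na.rm)
    }, numeric(1))
}

withNaRm    <- flankMean(introns, exons, "s_1")
withoutNaRm <- flankMean(introns, exons, "s_1", na.rm = FALSE)
withNaRm
withoutNaRm
```

The second call shows why a switch like `mean.na.rm` matters: with a missing value on one side, the mean is either taken over the remaining exon or becomes NA.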
Extract the row numbers where introns (or exons, depending on the user's request) are located in an object of type SummarizedExperiment.
intexIndex(x, intExCol="int_ex", what="intron")
x |
Object of type SummarizedExperiment. |
intExCol |
Column name (or number) that represents whether each row is "intron" or "exon"
in |
what |
A character string that defines whether the index for the introns or exons
should be returned. Accepts either "intron" or "exon". |
A numeric vector which includes the index of the introns/exons.
Ali Oghabian
# Show the first few row indices that represent introns
head(intexIndex(mdsChr22Obj, what="intron"))
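On a plain data frame the same lookup is a one-line `which()` call. The sketch below uses a made-up annotation table with an `int_ex` column like the one in the examples; it only illustrates the kind of index that is returned and is not the function's source.

```r
# Made-up row annotations for two genes with 3 exons and 2 introns each.
rowAnno <- data.frame(
    int_ex = rep(c("exon", "intron", "exon", "intron", "exon"), 2),
    int_ex_num = rep(c(1, 1, 2, 2, 3), 2)
)

# Row numbers of the introns, and of the exons.
intronRows <- which(rowAnno[, "int_ex"] == "intron")
exonRows   <- which(rowAnno[, "int_ex"] == "exon")
intronRows
exonRows
```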
Log fold change estimation, and normalized log fold change estimation using the edgeR package.
lfc(x, fcType="edgeR", sampleAnnoCol=c(), sampleAnnotation=c(), silent=TRUE, group=c(), rejection.region="doubletail", pseudoCnt=1, log2=TRUE, ...)
x |
Object of type SummarizedExperiment. |
fcType |
Available as "scaledRetention" or "edgeR" (default), corresponding to either the log fold change of scaled retention values or the edgeR normalized log fold change values. |
sampleAnnoCol |
Which column of |
sampleAnnotation |
A vector of size 2 which contains values from |
silent |
Whether to run |
group |
Vector to manually define the sample groups (or annotations). It is ignored if
|
rejection.region |
The rejection.region parameter in |
pseudoCnt |
Pseudo count for log transformation (default=1). |
log2 |
Logical value, either TRUE (default) or FALSE, indicating whether the fold changes should be log 2 transformed. |
... |
Other parameter settings from the |
Vector including fold change values.
Ali Oghabian
exactTestInterest, u12DensityPlotIntron
lfcFpkm<- lfc(mdsChr22Obj, fcType="scaledRetention",
    sampleAnnoCol="test_ctrl", sampleAnnotation=c("ctrl", "test"),
    silent=TRUE, group=c(), pseudoCnt=1, log2=TRUE)
lfcEdgeRFpkm<- lfc(mdsChr22Obj, fcType="edgeR",
    sampleAnnoCol="test_ctrl", sampleAnnotation=c("ctrl", "test"),
    silent=TRUE, group=c(), pseudoCnt=1, log2=TRUE)
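For the "scaledRetention" type, a log fold change with a pseudo count can be written out by hand. The following is a sketch under assumptions: the values are made up, the fold change is taken as test over control, and the pseudo count is added to the group means before taking the ratio. The exact formula used by lfc() may differ.

```r
# Made-up scaled retention values: 2 introns x 4 samples.
scaledRet <- matrix(c(1, 1, 7, 7,
                      5, 5, 2, 2),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("i1", "i2"), paste0("s", 1:4)))
grp <- c("ctrl", "ctrl", "test", "test")
pseudoCnt <- 1

ctrlMean <- rowMeans(scaledRet[, grp == "ctrl", drop = FALSE])
testMean <- rowMeans(scaledRet[, grp == "test", drop = FALSE])

# The pseudo count avoids log(0) and division by zero.
lfcToy <- log2((testMean + pseudoCnt) / (ctrlMean + pseudoCnt))
lfcToy
```

Here i1 rises from a mean of 1 to 7 (log2 of 8/2) and i2 falls from 5 to 2 (log2 of 3/6).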
The results of interest() analysis in exon-exon junction mode, for the genes that feature U12-type introns and are located on Chr22 in MDS data.
data(mdsChr22ExObj)
An object of class SummarizedExperiment that contains intron retention results generated by the interest() function on MDS data consisting of bone-marrow samples of 8 MDS patients with ZRSR2 mutations, 4 patients without the mutation and 4 healthy individuals.
@colData
A "DataFrame" (from the "S4Vectors" package) whose row names can be set to the sample identification names; the other columns are various annotations for the samples. Its column names are character strings that describe the annotations.
@assays
List of size 2 that includes two numeric matrices: (1) counts, which includes the raw read counts of the sequencing reads mapped to introns and exons, and (2) scaledRetention, i.e. the normalized read counts.
@NAMES
A NULL value.
@elementMetadata
A "DataFrame" (from the "S4Vectors" package) that includes intron and exon annotations.
@metadata
A list of size 2 that includes the parameter settings for the interest() and interest.sequential() runs.
Object of class SummarizedExperiment.
Madan, V., et al. Aberrant splicing of U12-type introns is the hallmark of ZRSR2 mutant myelodysplastic syndrome. Nat Commun. 2015 Jan 14;6:6042. doi: 10.1038/ncomms7042.
The results of interest() analysis in intron-spanning mode, for the genes that feature U12-type introns and are located on Chr22 in MDS data.
data(mdsChr22IntSpObj)
An object of class SummarizedExperiment that contains intron retention results generated by the interest() function on MDS data consisting of bone-marrow samples of 8 MDS patients with ZRSR2 mutations, 4 patients without the mutation and 4 healthy individuals.
@colData
A "DataFrame" (from the "S4Vectors" package) whose row names can be set to the sample identification names; the other columns are various annotations for the samples. Its column names are character strings that describe the annotations.
@assays
List of size 2 that includes two numeric matrices: (1) counts, which includes the raw read counts of the sequencing reads mapped to introns and exons, and (2) scaledRetention, i.e. the normalized read counts.
@NAMES
A NULL value.
@elementMetadata
A "DataFrame" (from the "S4Vectors" package) that includes intron and exon annotations.
@metadata
A list of size 2 that includes the parameter settings for the interest() and interest.sequential() runs.
Object of class SummarizedExperiment.
Madan, V., et al. Aberrant splicing of U12-type introns is the hallmark of ZRSR2 mutant myelodysplastic syndrome. Nat Commun. 2015 Jan 14;6:6042. doi: 10.1038/ncomms7042.
The results of interest() analysis in intron-retention mode, for the genes that feature U12-type introns and are located on Chr22 in MDS data.
data(mdsChr22Obj)
An object of class SummarizedExperiment that contains intron retention results generated by the interest() function on MDS data consisting of bone-marrow samples of 8 MDS patients with ZRSR2 mutations, 4 patients without the mutation and 4 healthy individuals.
@colData
A "DataFrame" (from the "S4Vectors" package) whose row names can be set to the sample identification names; the other columns are various annotations for the samples. Its column names are character strings that describe the annotations.
@assays
List of size 2 that includes two numeric matrices: (1) counts, which includes the raw read counts of the sequencing reads mapped to introns and exons, and (2) scaledRetention, i.e. the normalized read counts.
@NAMES
A NULL value.
@elementMetadata
A "DataFrame" (from the "S4Vectors" package) that includes intron and exon annotations.
@metadata
A list of size 2 that includes the parameter settings for the interest() and interest.sequential() runs.
Object of class SummarizedExperiment.
Madan, V., et al. Aberrant splicing of U12-type introns is the hallmark of ZRSR2 mutant myelodysplastic syndrome. Nat Commun. 2015 Jan 14;6:6042. doi: 10.1038/ncomms7042.
Build a new object by merging the data of two SummarizedExperiment objects.
mergeInterestResult(x, y)
x |
Object of type SummarizedExperiment. |
y |
Object of type SummarizedExperiment. |
An object of class SummarizedExperiment.
Ali Oghabian
geneId<- paste("gene", c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)), sep="_")
readCnt1<- sample(1:100, 20)
readCnt2<- sample(1:100, 20)
readCnt3<- sample(1:100, 20)
readCnt4<- sample(1:100, 20)
fpkm1<- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2<- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3<- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4<- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]
# Creating object using test data
interestDat<- data.frame(
    int_ex=rep(c(rep(c("exon","intron"),2),"exon"),4),
    int_ex_num= rep(c(1,1,2,2,3),4),
    gene_id= geneId,
    sam1_readCnt=readCnt1,
    sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3,
    sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1,
    sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3,
    sam4_fpkm=fpkm4
)
readFreqColIndex<- grep("_readCnt$",colnames(interestDat))
scaledRetentionColIndex<- grep("_fpkm$",colnames(interestDat))
scalRetTmp<- as.matrix(interestDat[ ,scaledRetentionColIndex])
colnames(scalRetTmp)<-gsub("_fpkm$","", colnames(scalRetTmp))
frqTmp<- as.matrix(interestDat[ ,readFreqColIndex])
colnames(frqTmp)<-gsub("_readCnt$","", colnames(frqTmp))
#Object including data for Males
interestResObjM<-InterestResult(
    resultFiles=paste("file",1:2, sep="_"),
    rowData= interestDat[, -c(readFreqColIndex, scaledRetentionColIndex)],
    counts= frqTmp[,1:2],
    scaledRetention= scalRetTmp[,1:2],
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam",1:2, sep=""),
        gender=c("M","M"),
        health=c("healthy","unhealthy"),
        row.names=paste("sam", 1:2, sep="")
    )
)
#Object including data for Females
interestResObjF<-InterestResult(
    resultFiles=paste("file",3:4, sep="_"),
    rowData= interestDat[, -c(readFreqColIndex, scaledRetentionColIndex)],
    counts= frqTmp[,3:4],
    scaledRetention= scalRetTmp[,3:4],
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam",3:4, sep=""),
        gender=c("F","F"),
        health=c("healthy","unhealthy"),
        row.names=paste("sam", 3:4, sep="")
    )
)
#Build new object
newObj<- mergeInterestResult(interestResObjM, interestResObjF)
#View newObj
print(newObj)
plot method for SummarizedExperiment
objects.
## S4 method for signature 'SummarizedExperiment,ANY'
plot(x, summary="none", subsetRows=NULL, what="scaled", intronExon="intron",
    logScaleBase=NULL, logPseudoCnt=1, plotLoess=TRUE, loessCol="red",
    loessLwd=1, loessLty=1, cexText=1, marPlot=c(2,2,2,2), mgpPlot=c(1, 1, 0),
    cexAxis=1, writeCor=TRUE, corCex=1, corMethod="pearson", corCol="grey63",
    upperCorXY=c("topleft", NULL), lowerCorXY=c("topleft", NULL), na.rm=TRUE,
    cex=1, sampleAnnoCol=c(), lowerPlot=FALSE, upperPlot=TRUE, ...)
x |
Object of type |
summary |
Whether to plot the mean or median of the values over the samples with the same annotation, or to plot the values for each individual sample separately. The available options are "mean", "median", or "none". |
subsetRows |
Vector either constructed of TRUE/FALSE values or constructed of numeric values
that could be used to choose rows of |
what |
Whether to plot scaled retention values ("scaled", default) or read counts ("counts"). |
intronExon |
Whether to plot intron retention ("intron", default) or exon-junction ("exon") values. |
logScaleBase |
Base of the log transform of the values, if defined. By default the value is
|
logPseudoCnt |
Pseudocount for the log transformation (default=1). |
plotLoess |
Whether to fit and plot a LOESS curve (default=TRUE). |
loessCol |
loess line colour (default="red"). |
loessLwd |
loess line width (default=1). |
loessLty |
loess line type (default=1). |
cexText |
Size of the text for sample names or annotations (default=1). |
marPlot |
Plot margins (default=c(2,2,2,2)). See |
mgpPlot |
Plotting |
cexAxis |
Size of the text for the axis (default=1). |
writeCor |
Write correlation values (default=TRUE). |
corCex |
Text size of correlation values (default=1). |
corMethod |
Method used for correlation calculation. For more information see
|
corCol |
Color of the correlation text (default="grey63"). |
upperCorXY |
The coordinates of the correlation text in the upper panel plots ( default= c("topleft", NULL) ). |
lowerCorXY |
The coordinates of the correlation text in the lower panel plots ( default= c("topleft", NULL) ). |
na.rm |
Whether to remove the rows with missing values (default=TRUE). |
cex |
Size of the plot text and symbols (default=1). |
sampleAnnoCol |
Which column of |
lowerPlot |
Whether to plot the lower panel (default=FALSE). |
upperPlot |
Whether to plot the upper panel (default=TRUE). |
... |
Other arguments to pass to the |
Returns NULL.
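The effect of the logScaleBase, logPseudoCnt and summary arguments can be pictured with a small base-R sketch. It assumes that the pseudocount is added before the logarithm is taken and that summary="mean" averages the samples sharing an annotation value; the matrix vals and the vector gender are made-up illustration data, and the code is not the package's internal implementation.

```r
# Toy scaled-retention matrix: 3 introns (rows) x 4 samples (columns).
vals <- matrix(c(0, 1, 3,
                 1, 1, 7,
                 3, 0, 15,
                 7, 3, 31),
               nrow = 3,
               dimnames = list(paste0("intron", 1:3), paste0("sam", 1:4)))

# Hypothetical sample annotation, playing the role of sampleAnnoCol.
gender <- c("M", "M", "F", "F")

# Assumed transform: log of (value + pseudocount) in the chosen base.
logScaleBase <- 2
logPseudoCnt <- 1
logVals <- log(vals + logPseudoCnt, base = logScaleBase)

# Assumed meaning of summary="mean": average the samples of each group.
groupMeans <- sapply(unique(gender), function(g) {
  rowMeans(logVals[, gender == g, drop = FALSE])
})
groupMeans
```

With summary="none" no such averaging would take place and every sample would be plotted on its own.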
Ali Oghabian
Class:
SummarizedExperiment-class
Method:
counts-method
boxplot-method
geneId<- paste("gene", c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)), sep="_")
readCnt1<- sample(1:100, 20)
readCnt2<- sample(1:100, 20)
readCnt3<- sample(1:100, 20)
readCnt4<- sample(1:100, 20)
fpkm1<- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2<- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3<- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4<- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]
# Creating object using test data
interestDat<- data.frame(
    int_ex=rep(c(rep(c("exon","intron"),2),"exon"),4),
    int_ex_num= rep(c(1,1,2,2,3),4),
    gene_id= geneId,
    sam1_readCnt=readCnt1, sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3, sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1, sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3, sam4_fpkm=fpkm4
)
readFreqColIndex<- grep("_readCnt$",colnames(interestDat))
scaledRetentionColIndex<- grep("_fpkm$",colnames(interestDat))
scalRetTmp<- as.matrix(interestDat[ ,scaledRetentionColIndex])
colnames(scalRetTmp)<-gsub("_fpkm$","", colnames(scalRetTmp))
frqTmp<- as.matrix(interestDat[ ,readFreqColIndex])
colnames(frqTmp)<-gsub("_readCnt$","", colnames(frqTmp))
InterestResultObj<- InterestResult(
    resultFiles=paste("file",1:4, sep="_"),
    rowData= interestDat[ , -c(readFreqColIndex, scaledRetentionColIndex)],
    counts= frqTmp,
    scaledRetention= scalRetTmp,
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam",1:4, sep=""),
        gender=c("M","M","F","F"),
        row.names=paste("sam", 1:4, sep="")
    )
)
InterestResultObj2<- addAnnotation(x=InterestResultObj,
    sampleAnnotationType="health",
    sampleAnnotation=c("healthy","unhealthy","healthy","unhealthy")
)
#Plotting
plot(InterestResultObj)
plot(InterestResultObj, sampleAnnoCol="gender", summary="mean")
plot(InterestResultObj2, sampleAnnoCol=3, summary="mean")
plot(InterestResultObj2, summary="none")
Calculates the relative inclusion level of introns, i.e. the Psi values, based on two count matrices from a single object or from two separate objects. The value for each intron is in the range [0,1], where 0 means complete splicing (no retention of the intron) and 1 represents complete (100%) retention.
psi (x, y, intCol, exCol, pseudoCnt=0)
x |
Object of type |
y |
Optional; i.e. an object of type |
intCol |
Column numbers or column names in counts matrix of |
exCol |
Column numbers or column names in counts matrix of |
pseudoCnt |
Pseudocount added to the denominator of the division to avoid division by zero. |
A data.frame with as many columns as the size of the intCol parameter and as many rows as x. It contains the psi values (i.e. values between 0 and 1 showing the fraction of spliced-in transcripts).
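The calculation can be sketched in base R. The formula below, retained reads divided by the sum of retained reads, spliced reads and the pseudocount, is an assumption inferred from the argument descriptions rather than a copy of the package code; psiSketch, intCnt and exCnt are hypothetical names and the counts are made up.

```r
# Toy counts for 3 introns in 2 samples (made-up numbers).
intCnt <- matrix(c(10, 0, 5,
                   20, 0, 5),
                 nrow = 3,
                 dimnames = list(paste0("intron", 1:3), c("sam1", "sam2")))
exCnt <- matrix(c(30, 40, 5,
                  20, 10, 15),
                nrow = 3,
                dimnames = dimnames(intCnt))

# Assumed formula: retained / (retained + spliced + pseudoCnt).
psiSketch <- function(int, ex, pseudoCnt = 0) {
  as.data.frame(int / (int + ex + pseudoCnt))
}

psiRes <- psiSketch(intCnt, exCnt)
psiRes

# With zero counts in both matrices the ratio is 0/0 (NaN);
# a positive pseudocount keeps the result defined.
psiSketch(matrix(0, 1, 1), matrix(0, 1, 1), pseudoCnt = 1)
```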
Ali Oghabian
mdsChr22IntObj<- mdsChr22Obj[which(rowData(mdsChr22Obj)$int_ex=="intron"), ]
#Build object including intron-retention and exon-junction results
mdsChr22RefIntExObj<- interestResultIntEx(intObj=mdsChr22Obj,
    exObj=mdsChr22ExObj, mean.na.rm=TRUE, postExName="ex_junc",
    intExCol="int_ex" )
# Calculate Psi
psiRes<- psi(mdsChr22RefIntExObj,
    intCol=which(colData(mdsChr22RefIntExObj)$intronExon=="intron"),
    exCol=which(colData(mdsChr22RefIntExObj)$intronExon=="exon"))
# show Psi results
head(psiRes)
Position weight matrices (PWMs) of the splice sites of U12- and U2-type introns, based on the U12DB database.
data("pwmU12db")
A list that contains Position Weight Matrices (PWM) of donor site, branch point and acceptor site of U12-type introns and the PWMs of donor site and acceptor site of U2-type introns. It is based on the U12DB database.
pwmDonU12
A position weight matrix for the donor site of the U12-type introns, with 4 rows and 46 columns. The rows of the matrix represent the "A", "C", "G", and "T" nucleotides and the columns represent the positions in the genome. Each entry of the matrix is a weight (i.e. a number between 0 and 1) that indicates how commonly the corresponding base (represented by the row of the matrix) is observed at the corresponding position (represented by the column of the matrix).
pwmBpU12
A position weight matrix for the branch point of the U12-type introns, with 4 rows and 9 columns.
pwmAccU12
A position weight matrix for the acceptor site of the U12-type introns, with 4 rows and 46 columns.
pwmDonU2
A position weight matrix for the donor site of the U2-type introns, with 4 rows and 25 columns.
pwmAccU2
A position weight matrix for the acceptor site of the U2-type introns, with 4 rows and 46 columns.
List of 5 numeric matrices representing the PWMs of donor site of U12-type introns, branch point site of U12-type introns, acceptor site of U12-type introns, donor site of U2-type introns, and acceptor site of U2-type introns.
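To make the row/column layout concrete, the following base-R sketch scores a short sequence against a PWM by looking up the weight of the observed base at each position. The 4 x 3 matrix is invented for illustration (the real matrices have 9 to 46 columns), pwmScore is a hypothetical helper, and averaging the weights is only one possible scoring choice, not necessarily the one used by the package.

```r
# Made-up 4 x 3 PWM: rows are bases, columns are positions; each column sums to 1.
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.1, 0.7,
                0.25, 0.25, 0.25, 0.25),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Score a sequence (same length as the PWM) as the mean weight of its bases.
pwmScore <- function(dnaSeq, pwm) {
  bases <- strsplit(dnaSeq, "")[[1]]
  stopifnot(length(bases) == ncol(pwm), all(bases %in% rownames(pwm)))
  # Two-column index matrix: (row of the base, column of the position).
  idx <- cbind(match(bases, rownames(pwm)), seq_along(bases))
  mean(pwm[idx])
}

pwmScore("ATG", pwm)
pwmScore("CCC", pwm)
```

A sequence that follows the most common base at each position ("ATG" here) scores higher than one that does not ("CCC").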
Alioto, T.S. U12DB: a database of orthologous U12-type spliceosomal introns. Nucleic Acids Research 2006, doi: 10.1093/nar/gkl796
Compute quasi-likelihood F-test using the edgeR package. For more information see the glmQLFit and glmQLFTest functions in the edgeR package.
qlfInterest(x, design=c(), silent=TRUE, disp="common", coef=c(), contrast=NULL, poisson.bound=TRUE, ...)
x |
Object of type |
design |
Design matrix. |
silent |
Whether to run silently, i.e. without printing the top differential expression tags. The default is TRUE. |
disp |
The method of estimating the dispersion in the data. Available options are: "common", "trended", "tagwiseInitCommon" and "tagwiseInitTrended". It is also possible to assign a number. |
coef |
Integer or character vector indicating which coefficients of the linear model
are to be tested equal to zero. See |
contrast |
Numeric vector or matrix specifying contrasts of the linear model coefficients
to be tested equal to zero. See |
poisson.bound |
Logical value; if TRUE (default), the returned p-value is never lower than the one obtained from a likelihood ratio test with the Negative Binomial dispersion set to zero. |
... |
Other parameter settings for the |
All values produced by glmQLFTest plus the following:
dispersionType |
The name of the type of dispersion used. |
dispersion |
The estimated dispersion values. |
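How design, coef and contrast relate can be shown with base R alone. The group labels below are hypothetical; only model.matrix() is used, so the sketch runs without edgeR or IntEREst.

```r
# Hypothetical sample grouping with three levels, two samples each.
group <- factor(c("ctrl", "ctrl", "mutA", "mutA", "mutB", "mutB"))

# Design with an intercept: the first level ("ctrl") is the baseline.
design <- model.matrix(~group)
colnames(design)

# Columns 2 and 3 hold the mutA-vs-ctrl and mutB-vs-ctrl coefficients,
# so coef=2:3 tests whether any group differs from the baseline.

# A contrast comparing mutB against mutA directly: one weight per column.
contrastBvsA <- c(0, -1, 1)
stopifnot(length(contrastBvsA) == ncol(design))
```

In edgeR's glmQLFTest a supplied contrast takes precedence over coef, so normally only one of the two is set.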
Ali Oghabian
exactTestInterest, glmInterest, treatInterest
#Test retention differentiation across the 3 types of samples
group <- getAnnotation(mdsChr22Obj)[,"type"]
qlfRes<- qlfInterest(x=mdsChr22Obj,
    design=model.matrix(~group), silent=TRUE,
    disp="tagwiseInitTrended", coef=2:3, contrast=NULL)
qlfRes
Reads one or multiple text file results generated by the interest or interest.sequential functions and builds an object of class SummarizedExperiment-class.
readInterestResults(resultFiles, sampleNames, sampleAnnotation, commonColumns, freqCol, scaledRetentionCol, scaleLength, scaleFragment, reScale=FALSE, geneIdCol, repeatsTableToFilter=c())
resultFiles |
Vector of character strings which includes the path to the tab-separated files
resulted by the |
sampleNames |
Vector of character strings which includes the name of the samples. It should
be the same size as the |
sampleAnnotation |
Data frame with the same row number as the size of |
commonColumns |
Columns in the result file which include intron/exon annotations and are common
across all files defined in |
freqCol |
Column in the result file which includes the read counts for introns/exons. |
scaledRetentionCol |
Column in the result file which includes the scaled retention values for introns/exons. |
scaleLength |
Logical value, indicating whether the intron/exon retention levels are scaled
to the length of the introns/exons. If |
scaleFragment |
Logical value, indicating whether the intron/exon retention levels are scaled
to the fragments mapped to the genes. If |
reScale |
Logical value, indicating whether the scaled retention levels would be
recalculated when reading the data. By default it does not recalculate and
trusts the user to set the |
geneIdCol |
The number or name of the column in |
repeatsTableToFilter |
A data.frame table with similar structure to the
|
An object of class SummarizedExperiment-class.
Ali Oghabian
geneId<- paste("gene", c(rep(1,7), rep(2,7), rep(3,7), rep(4,7)), sep="_")
readCnt1<- sample(1:100, 28)
readCnt2<- sample(1:100, 28)
readCnt3<- sample(1:100, 28)
readCnt4<- sample(1:100, 28)
fpkm1<- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2<- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3<- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4<- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]
#Create tmp directory
tmpDir=file.path(tempdir(),"InterestResult")
dir.create(tmpDir)
# Build text files similar to the files produced by interest
dfTmp=data.frame(
    int_ex=rep(c(rep(c("exon","intron"),3),"exon"),4),
    int_ex_num= rep(c(1,1,2,2,3,3,4),4),
    int_type=rep(c(NA,"U2",NA,"U12",NA,"U2",NA),4),
    strand=rep("*",28),
    gene_id= geneId,
    sam1_readCnt=readCnt1, sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3, sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1, sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3, sam4_fpkm=fpkm4
)
writeDf<-function(df, file){
    write.table(df, file, col.names=TRUE, row.names=FALSE, quote=FALSE, sep='\t')
}
writeDf(dfTmp[, c(1:5,6,10)], paste(tmpDir, "df1.tsv", sep="/"))
writeDf(dfTmp[, c(1:5,7,11)], paste(tmpDir, "df2.tsv", sep="/"))
writeDf(dfTmp[, c(1:5,8,12)], paste(tmpDir, "df3.tsv", sep="/"))
writeDf(dfTmp[, c(1:5,9,13)], paste(tmpDir, "df4.tsv", sep="/"))
# Build object from generated text file results
testObj<-readInterestResults(
    resultFiles=paste(tmpDir, c("df1.tsv", "df2.tsv", "df3.tsv", "df4.tsv"), sep="/"),
    sampleNames=c("sam1","sam2","sam3","sam4"),
    sampleAnnotation= data.frame(
        gender=c("M","M","F","F"),
        health=c("healthy","unhealthy","healthy","unhealthy")),
    commonColumns=1:5, freqCol=6, scaledRetentionCol=7,
    scaleLength=FALSE, scaleFragment=TRUE, reScale=FALSE)
#View object
testObj
Creates reference file for IntEREst functions, e.g. interest(). The function uses functions of the biomaRt library.
referencePrepare( outFileTranscriptsAnnotation="", annotateGeneIds=TRUE, u12IntronsChr=c(), u12IntronsBeg=c(), u12IntronsEnd=c(), u12IntronsRef, collapseExons=TRUE, sourceBuild="UCSC", ucscGenome="hg19", ucscTableName="knownGene", ucscUrl="http://genome-euro.ucsc.edu/cgi-bin/", biomart="ENSEMBL_MART_ENSEMBL", biomartDataset="hsapiens_gene_ensembl", biomartTranscriptIds=NULL, biomartExtraFilters=NULL, biomartIdPrefix="ensembl_", biomartHost="www.ensembl.org", biomartPort=80, circSeqs="", miRBaseBuild=NA, taxonomyId=NA, filePath="", fileFormat=c("auto", "gff3", "gtf"), fileDatSrc=NA, fileOrganism=NA, fileChrInf=NULL, fileDbXrefTag=c(), addCollapsedTranscripts=TRUE, ignore.strand=FALSE )
outFileTranscriptsAnnotation |
If defined, outputs the transcript annotations. |
annotateGeneIds |
Whether to annotate and add the gene ID information. |
collapseExons |
Whether to collapse (i.e. reduce) the exonic regions. TRUE by default. |
sourceBuild |
The source to use to build the reference data, |
ucscGenome |
The genome to use. |
ucscTableName |
The UCSC table name to use. See |
ucscUrl |
The UCSC URL address. See |
u12IntronsChr |
A vector of character strings that includes chromosomal locations of the U12
type introns. If defined together with |
u12IntronsBeg |
A vector of numbers that defines the begin (or start) coordinates of the u12-type introns. |
u12IntronsEnd |
A vector of numbers that defines the end coordinates of the u12-type introns. |
u12IntronsRef |
A GRanges object that includes the coordinates of the U12 type introns. If defined, it would be used to annotate the U12-type introns. |
biomart |
BioMart database name. See |
biomartDataset |
BioMart dataset name; default is "hsapiens_gene_ensembl". See |
biomartTranscriptIds |
optional parameter to only retrieve transcript annotation results for a defined
set of transcript ids. See |
biomartExtraFilters |
A list of names; i.e. additional filters to use in the BioMart query. See
|
biomartIdPrefix |
Prefix used in the BioMart attribute names, e.g. "ensembl_" (the default) as in "ensembl_transcript_id". See
|
biomartHost |
Host to connect to; the default is "www.ensembl.org". For older versions of the GRCH you can provide the archive websites, e.g. for GRCH37 you can use "grch37.ensembl.org". |
biomartPort |
The port to use in the HTTP communication with the host. Default is 80. |
circSeqs |
A character vector that includes chromosomes that should be marked as circular.
See |
miRBaseBuild |
Set appropriate build Information from mirbase.db to use for microRNAs
(default=NA). See |
taxonomyId |
This parameter can be used to provide taxonomy Ids. It is set to NA by default.
You can check the taxonomy Ids with the |
filePath |
Character string i.e. the path to file. Used if |
fileFormat |
The format of the input file. |
fileDatSrc |
Character string describing the source of the data file. Used if
|
fileOrganism |
The genus and species name of the organism. Used if |
fileChrInf |
Dataframe that includes information about the chromosome. The first column
represents the chromosome name and the second column is the length of the
chromosome. Used if |
fileDbXrefTag |
A vector of character strings which, if defined, would be used as feature
names. Used if |
addCollapsedTranscripts |
Whether to add a column that includes the collapsed transcripts information. Used
if |
ignore.strand |
Whether consider the strands in the reference. If set |
Data frame that includes the coordinates and annotations of the introns and exons of the transcripts, i.e. the reference.
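Two of the inputs described above are plain R structures and can be prepared without any Bioconductor package. The sketch below builds a chromosome table in the two-column layout described for fileChrInf and three parallel vectors for the U12 intron coordinates. The column names chrom and length and the intron coordinates are assumptions made for illustration; the chromosome lengths are the hg19 sizes of chr1 and chr22.

```r
# Hypothetical chromosome table for fileChrInf:
# first column is the chromosome name, second column its length.
chrInf <- data.frame(
  chrom  = c("chr1", "chr22"),
  length = c(249250621, 51304566),
  stringsAsFactors = FALSE
)

# Made-up U12 intron coordinates: three parallel vectors, one entry per intron.
u12IntronsChr <- c("chr22", "chr22")
u12IntronsBeg <- c(1000, 5000)
u12IntronsEnd <- c(1200, 5600)

# Sanity checks worth running before passing the vectors on.
stopifnot(length(u12IntronsChr) == length(u12IntronsBeg),
          length(u12IntronsBeg) == length(u12IntronsEnd),
          all(u12IntronsBeg <= u12IntronsEnd),
          all(u12IntronsChr %in% chrInf$chrom))
```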
Ali Oghabian
# Build test gff3 data
tmpGen<- u12[u12[,"ens_trans_id"]=="ENST00000413811",]
tmpEx<-tmpGen[tmpGen[,"int_ex"]=="exon",]
exonDat<- cbind(tmpEx[,3], ".", tmpEx[,c(7,4,5)], ".", tmpEx[,6], ".",
    paste("ID=exon", tmpEx[,11], "; Parent=ENST00000413811", sep="") )
trDat<- c(tmpEx[1,3], ".", "mRNA", as.numeric(min(tmpEx[,4])),
    as.numeric(max(tmpEx[,5])), ".", tmpEx[1,6], ".", "ID=ENST00000413811")
outDir<- file.path(tempdir(),"tmpFolder")
dir.create(outDir)
outDir<- normalizePath(outDir)
gff3File=paste(outDir, "gffFile.gff", sep="/")
cat("##gff-version 3\n",file=gff3File, append=FALSE)
cat(paste(paste(trDat, collapse="\t"),"\n", sep=""), file=gff3File, append=TRUE)
write.table(exonDat, gff3File, row.names=FALSE, col.names=FALSE,
    sep='\t', quote=FALSE, append=TRUE)
# Selecting U12 introns info from 'u12' data
u12Int<-u12[u12$int_ex=="intron"&u12$int_type=="U12",]
# Test the function
refseqRef<- referencePrepare (sourceBuild="file", filePath=gff3File,
    u12IntronsChr=u12Int[,"chr"], u12IntronsBeg=u12Int[,"begin"],
    u12IntronsEnd=u12Int[,"end"], collapseExons=TRUE,
    fileFormat="gff3", annotateGeneIds=FALSE)
Build a new object using a subset of the data in a SummarizedExperiment object.
subInterestResult(x, selectRow, selectCol, sampleAnnoCol, sampleAnnotation=c())
x |
Object of type |
selectRow |
Numeric or TRUE/FALSE vector indicating which rows to extract. |
selectCol |
A vector of numeric values, character strings (sample names) or TRUE/FALSE values indicating which columns to extract. |
sampleAnnoCol |
Which column of |
sampleAnnotation |
Vector including the annotations to consider for subset data extraction. They
should be present in the |
An object of class SummarizedExperiment.
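The selection arguments can be illustrated with plain R indexing. The sketch assumes that sampleAnnoCol together with sampleAnnotation keeps the samples whose annotation value is in the requested set; sampleAnno and the other object names are hypothetical and the code does not call the package.

```r
# Hypothetical sample annotation table, one row per sample.
sampleAnno <- data.frame(
  gender = c("M", "M", "F", "F"),
  health = c("healthy", "unhealthy", "healthy", "unhealthy"),
  row.names = paste0("sam", 1:4)
)

# Assumed meaning of sampleAnnoCol="gender", sampleAnnotation="F":
# keep the samples whose annotation is in the requested set.
keepCol <- sampleAnno[["gender"]] %in% c("F")
keptSamples <- rownames(sampleAnno)[keepCol]
keptSamples

# Numeric and TRUE/FALSE row selectors can describe the same subset.
nRow <- 28
selNumeric <- 1:20
selLogical <- seq_len(nRow) <= 20
identical(which(selLogical), selNumeric)
```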
Ali Oghabian
geneId<- paste("gene", c(rep(1,7), rep(2,7), rep(3,7), rep(4,7)), sep="_")
readCnt1<- sample(1:100, 28)
readCnt2<- sample(1:100, 28)
readCnt3<- sample(1:100, 28)
readCnt4<- sample(1:100, 28)
fpkm1<- readCnt1/(tapply(readCnt1, geneId, sum))[geneId]
fpkm2<- readCnt2/(tapply(readCnt2, geneId, sum))[geneId]
fpkm3<- readCnt3/(tapply(readCnt3, geneId, sum))[geneId]
fpkm4<- readCnt4/(tapply(readCnt4, geneId, sum))[geneId]
# Creating object using test data
interestDat<-data.frame(
    int_ex=rep(c(rep(c("exon","intron"),3),"exon"),4),
    int_ex_num= rep(c(1,1,2,2,3,3,4),4),
    int_type=rep(c(NA,"U2",NA,"U12",NA,"U2",NA),4),
    strand=rep("*",28),
    gene_id= geneId,
    sam1_readCnt=readCnt1, sam2_readCnt=readCnt2,
    sam3_readCnt=readCnt3, sam4_readCnt=readCnt4,
    sam1_fpkm=fpkm1, sam2_fpkm=fpkm2,
    sam3_fpkm=fpkm3, sam4_fpkm=fpkm4
)
readFreqColIndex<- grep("_readCnt$",colnames(interestDat))
scaledRetentionColIndex<- grep("_fpkm$",colnames(interestDat))
samNames<-paste("sam", 1:4, sep="")
frqTmp<-as.matrix(interestDat[, readFreqColIndex])
sclTmp<-as.matrix(interestDat[, scaledRetentionColIndex])
colnames(frqTmp)<- samNames
colnames(sclTmp)<- samNames
interestResObj<- InterestResult(
    resultFiles=paste("file",1:4, sep="_"),
    rowData= interestDat[, -c(readFreqColIndex, scaledRetentionColIndex)],
    counts= frqTmp,
    scaledRetention= sclTmp ,
    scaleLength=TRUE,
    scaleFragment=FALSE,
    sampleAnnotation=data.frame(
        sampleName=paste("sam",1:4, sep=""),
        gender=c("M","M","F","F"),
        health=c("healthy","unhealthy","healthy","unhealthy"),
        row.names=samNames
    )
)
#Build new object
newObj<- subInterestResult(interestResObj, selectRow=1:20)
#View newObj
print(newObj)
Compute a genewise statistical test relative to a fold-change threshold using the edgeR package. For more information see the glmTreat function in the edgeR package.
treatInterest(x, design=c(), silent=TRUE, disp="common", coef=c(), contrast=NULL, lfc=0, ...)
x |
Object of class |
design |
Design matrix. |
silent |
Whether to run silently, i.e. without printing the top differential expression tags. Default is TRUE. |
disp |
The method of estimating the dispersion in the data. Available options are: "common", "trended", "tagwiseInitCommon" and "tagwiseInitTrended". It is also possible to assign a number. |
coef |
Integer or character vector indicating which coefficients of the linear model
are to be tested equal to zero. See |
contrast |
Numeric vector or matrix specifying contrasts of the linear model coefficients
to be tested equal to zero. See |
lfc |
Numeric scalar i.e. the log fold change threshold. |
... |
Other parameter settings for the |
All values produced by glmTreat plus the following:
dispersionType |
The name of the type of dispersion used. |
dispersion |
The estimated dispersion values. |
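Since edgeR's glmTreat interprets lfc as a threshold on the absolute log2 fold change, a threshold expressed as a plain fold change has to be converted first. The short sketch below shows that conversion; the 1.5-fold value is only an example.

```r
# A 1.5-fold change expressed on the log2 scale, as expected by lfc.
foldChange <- 1.5
lfc <- log2(foldChange)
lfc

# lfc = 0 corresponds to a fold change of 1, i.e. no threshold at all.
2^0
```

The resulting value could then be passed as the lfc argument in place of the default 0.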
Ali Oghabian
exactTestInterest, qlfInterest, glmInterest
group <- getAnnotation(mdsChr22Obj)[,"type"]
#Test retention differentiation across the 3 types of samples
# The log fold change threshold is 0
treatRes<- treatInterest(x=mdsChr22Obj,
    design=model.matrix(~group), silent=TRUE,
    disp="tagwiseInitTrended", coef=2:3, contrast=NULL, lfc=0)
treatRes
Intron/exon annotations of genes featuring U12 introns. It is based on HG19/GRCh37 (converted from hg17/NCBI35). Moreover the u12 genes are based on the U12DB database.
data("u12")
A data frame with 22713 observations on the following 17 variables.
id
a numeric vector
int_ex_id
a character vector
chr
a character vector
begin
a numeric vector
end
a numeric vector
strand
a numeric vector
int_ex
a character vector
trans_type
a character vector
ens_gene_id
a character vector
ens_trans_id
a character vector
int_ex_num
a numeric vector
gene_name
a character vector
trans_name
a character vector
overlap_no
a numeric vector
int_type
a character vector
int_subtype
a character vector
Data frame that includes the coordinates and annotations of the introns and exons of the transcripts, i.e. the reference.
Alioto, T.S. U12DB: a database of orthologous U12-type spliceosomal introns. Nucleic Acids Research 2006, doi: 10.1093/nar/gkl796
A boxplot method for U12 and U2-type introns of SummarizedExperiment
objects.
u12Boxplot(x, sampleAnnoCol=NA, intExCol="int_ex", intTypeCol="int_type", intronExon, col="white", boxplotNames=c(), lasNames=3, outline=FALSE, addGrid=FALSE, ...)
x |
Object of type |
sampleAnnoCol |
Which column of |
intExCol |
Column name (or number) that represents whether each row of |
intTypeCol |
Column name (or number) that represents what type of intron each row of
|
intronExon |
Whether plot intron retention (set |
col |
Vector showing box colours. It is either of size 1 or the same size as the number of groups to be plotted. |
boxplotNames |
Names to write under the boxes. If not defined, the names are built by pasting U12/U2 (intron annotation) and the sample group annotations, separated by a space " ". |
lasNames |
Orientation of the box names. |
outline |
If TRUE, the outlier points are drawn; if FALSE (default), they are not. |
addGrid |
Whether to add a grid under the boxplots (FALSE by default). |
... |
Other arguments to pass to the |
A SummarizedExperiment
object.
Ali Oghabian
u12Boxplot(mdsChr22Obj, sampleAnnoCol="type",
    intExCol="int_ex", intTypeCol="intron_type", intronExon="intron",
    col=rep(c("orange", "yellow"),3), lasNames=3,
    outline=FALSE, ylab="FPKM", cex.axis=0.8)
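To make the grouping behind such a plot concrete, here is a minimal base R sketch that draws one box per intron type and sample group combination. It is not the implementation of `u12Boxplot()`. The data frame `toy`, its column names, and all values are hypothetical.

```r
# Toy retention values; every name and number here is made up for illustration
set.seed(1)
toy <- data.frame(
  fpkm = rexp(40, rate = 0.1),
  intron_type = rep(c("U12", "U2"), each = 20),
  sample_group = rep(c("ctrl", "test"), times = 20)
)

# One box per intron type / sample group combination, labelled by pasting
# the two annotations together with a space
toy$box <- paste(toy$intron_type, toy$sample_group)
res <- boxplot(fpkm ~ box, data = toy, las = 3, outline = FALSE,
  col = rep(c("orange", "yellow"), 2), ylab = "FPKM", cex.axis = 0.8)

# Labels of the boxes that were drawn
res$names
```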
Boxplot of U12 introns and (up/down)stream U2 introns in
SummarizedExperiment
objects.
u12BoxplotNb(x, sampleAnnoCol=2, intExCol="int_ex", intTypeCol="int_type", intronExon, strandCol="strand", geneIdCol, col=c(), names=c(), lasNames=1, outline=FALSE, plotLegend=TRUE, cexLegend=1, xLegend="topright", yLegend=NULL, bgLegend="transparent", legend=c(), addGrid=FALSE, ...)
x |
Object of type |
sampleAnnoCol |
Which column of |
intExCol |
Column name (or number) that represents whether each row of |
intTypeCol |
Column name (or number) that represents what type of intron each row of
|
intronExon |
Whether plot intron retention (set |
strandCol |
Column name (or number) that represents the strand of each row of assays in
|
geneIdCol |
Column name (or number) that represents the gene ID of each row of assays in
|
col |
Vector containing box colours. It is either of size 1 or the same size as the
number of boxes resulted based on the grouping of the samples defined by
|
names |
Names to write under group of boxes. |
lasNames |
Orientation of the box names. |
outline |
If TRUE, the outlier points are drawn; if FALSE (default), they are not. |
plotLegend |
Whether to show the legend (TRUE by default). |
cexLegend |
Size of the text in the legend. |
xLegend , yLegend
|
Position of legend in the plot. For more info see |
bgLegend |
Background colour of the legend box. It is "transparent" by default. |
legend |
The replacement texts to be used in the legend. |
addGrid |
Whether to add a grid under the boxplots (FALSE by default). |
... |
Other arguments to pass to the |
Returns NULL
Ali Oghabian
u12BoxplotNb(mdsChr22Obj, sampleAnnoCol="type", lasNames=1,
    intExCol="int_ex", intTypeCol="intron_type", intronExon="intron",
    boxplotNames=c(), outline=FALSE, plotLegend=TRUE,
    geneIdCol="collapsed_transcripts_id", xLegend="topleft",
    col=c("pink", "lightblue", "lightyellow"), ylim=c(0,600000),
    ylab="FPKM", cex.axis=0.8)
Density plot of the fold change of the retention levels of U12- vs U2-type introns,
or of the exon-exon junction levels of the flanking exons. For the density plot of the
fold change of intron retention levels, either the u12DensityPlotIntron() function or
the u12DensityPlot() function with intronExon="intron" can be used. For the density
plot of the fold change of exon-exon junction levels, use the u12DensityPlot()
function with intronExon="exon".
u12DensityPlot(x, type=c("U12", "U2Up", "U2Dn", "U2UpDn", "U2Rand"), fcType="edgeR", sampleAnnotation=c(), sampleAnnoCol=c(), group=c(), intExCol="int_ex", intTypeCol="int_type", intronExon, strandCol="strand", geneIdCol="collapsed_transcripts", naUnstrand=FALSE, col=1, lty=1, lwd=1, plotLegend=TRUE, cexLegend=1, xLegend="topright", yLegend=NULL, legend=c(), randomSeed=NULL, xlab="", ...)
u12DensityPlotIntron(x, type= c("U12", "U2Up", "U2Dn", "U2UpDn", "U2Rand"), fcType= "edgeR", sampleAnnotation=c(), sampleAnnoCol=c(), group=c(), intExCol="int_ex", intTypeCol="int_type", strandCol= "strand", geneIdCol= "collapsed_transcripts", naUnstrand=FALSE, col=1, lty=1, lwd=1, plotLegend=TRUE, cexLegend=1, xLegend="topright", yLegend=NULL, legend=c(), randomSeed=NULL, xlab="", ...)
x |
Object of type |
type |
A vector that includes the types of introns to plot. Available options are U12
introns "U12", U2 introns downstream of U12 introns "U2Dn", U2 introns
upstream of U12 introns "U2Up", U2 introns upstream or downstream of U12
introns, suitable for when the coordinates in object x are unstranded (their
strand is "*"), "U2UpDn", and random U2 introns from object x "U2Rand". Settings
"U2Up", "U2Dn" and "U2UpDn" are useful only if the reference is linearly
ordered. References with exons only resulted by |
fcType |
Available as "fpkm" or "edgeR" (default), corresponding to either the log fold change of FPKM values or the edgeR-normalized log fold change values. |
sampleAnnoCol |
Which column of |
sampleAnnotation |
A vector of size 2 which contains values from |
group |
Vector to manually define the sample groups (or annotations). It is ignored if
|
intExCol |
Column name (or number) that represents whether each row of |
intTypeCol |
Column name (or number) that represents what type of intron each row of
|
intronExon |
Whether plot intron retention (set |
strandCol |
Column name (or number) that represents the strand of each row of assays in
|
geneIdCol |
Column name (or number) that represents the gene ID of each row of assays in
|
naUnstrand |
Replace unstranded results, i.e. introns or exons with "*" strand, with NA (to be excluded). |
col |
A vector with the size of 1 or the same size as the |
lty |
A vector with the size of 1 or the same size as the |
lwd |
A vector with the size of 1 or the same size as the |
plotLegend |
Whether to show the legend (TRUE by default). |
cexLegend |
Size of the text in the legend. |
xLegend , yLegend
|
Position of legend in the plot. For more info see |
legend |
The replacement texts to be used in legend. |
randomSeed |
Seed value for random number generator. |
xlab |
The label of the X axis of the plot; by default it is "". |
... |
Other parameter settings from the |
Returns NULL.
Ali Oghabian
u12DensityPlotIntron(mdsChr22Obj,
    type= c("U12", "U2Up", "U2Dn", "U2UpDn", "U2Rand"),
    fcType= "edgeR", sampleAnnoCol="test_ctrl",
    sampleAnnotation=c("ctrl","test"), intExCol="int_ex",
    intTypeCol="intron_type", strandCol= "strand",
    geneIdCol= "collapsed_transcripts_id", naUnstrand=FALSE, col=c(2,3,4,5,6),
    lty=c(1,2,3,4,5), lwd=1, plotLegend=TRUE, cexLegend=0.7,
    xLegend="topright", yLegend=NULL, legend=c(), randomSeed=10,
    ylim=c(0,0.6), xlab=expression("log"[2]*" fold change FPKM"))
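The idea behind the `"fpkm"` fold change type can be sketched in base R as follows: compute a log2 fold change per intron between two sample groups, then draw one density curve per intron type. This is only an illustration under stated assumptions and not the package's implementation. The toy matrix, the group labels, and the pseudocount of 1 are all hypothetical choices.

```r
# Toy FPKM matrix: 6 introns x 4 samples; all values are made up
set.seed(10)
fpkm <- matrix(rexp(24, rate = 0.05), nrow = 6,
  dimnames = list(paste0("intron", 1:6), c("ctrl1", "ctrl2", "test1", "test2")))
group <- c("ctrl", "ctrl", "test", "test")
intronType <- c("U12", "U12", "U12", "U2", "U2", "U2")

# log2 fold change of mean FPKM (test vs ctrl); the pseudocount of 1 is an
# assumption made here to avoid taking the log of zero
meanCtrl <- rowMeans(fpkm[, group == "ctrl", drop = FALSE])
meanTest <- rowMeans(fpkm[, group == "test", drop = FALSE])
lfc <- log2((meanTest + 1) / (meanCtrl + 1))

# One density curve per intron type
densU12 <- density(lfc[intronType == "U12"])
densU2 <- density(lfc[intronType == "U2"])
plot(densU12, col = 2, lty = 1, main = "",
  xlab = expression("log"[2] * " fold change FPKM"),
  xlim = range(densU12$x, densU2$x), ylim = range(0, densU12$y, densU2$y))
lines(densU2, col = 3, lty = 2)
legend("topright", legend = c("U12", "U2"), col = c(2, 3), lty = c(1, 2))
```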
Extract row numbers of U12 introns in an object of class
SummarizedExperiment
.
u12Index(x, intExCol="int_ex", intTypeCol="int_type", intronExon="intron")
x |
Object of type |
intExCol |
Column name (or number) that represents whether each row of |
intTypeCol |
Column name (or number) that represents what type of intron each row of
|
intronExon |
Whether extract U12 type introns (set |
A numeric vector which includes the index of U12 introns.
Ali Oghabian
head(u12Index(mdsChr22Obj, intTypeCol="intron_type"))
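Conceptually, extracting the row numbers of U12 introns amounts to a logical filter on two annotation columns. The base R sketch below illustrates this on a hypothetical data frame, `rowAnno`, whose column names mirror the defaults of `u12Index()`. It does not reproduce the function's actual code.

```r
# Toy row annotations; column names mirror the defaults of u12Index()
rowAnno <- data.frame(
  int_ex = c("exon", "intron", "exon", "intron", "exon", "intron"),
  int_type = c(NA, "U2", NA, "U12", NA, "U2"),
  stringsAsFactors = FALSE
)

# Rows that are introns and annotated as U12-type; which() drops NA matches
u12Rows <- which(rowAnno$int_ex == "intron" & rowAnno$int_type == "U12")
u12Rows
```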
Extract row numbers of U2-type introns (up/down)stream of U12-type introns (in
the @interestDf
attribute of an object of class
SummarizedExperiment
).
u12NbIndex(x, intExCol="int_ex", intTypeCol="int_type", strandCol="strand", geneIdCol="collapsed_transcripts", naUnstrand=FALSE)
x |
Object of type |
intExCol |
Column name (or number) that represents whether each row of |
intTypeCol |
Column name (or number) that represents what type of intron each row of
|
strandCol |
Column name (or number) that represents the strand of each row of assays in
|
geneIdCol |
Column name (or number) that represents the gene ID of each row of assays in
|
naUnstrand |
Replace unstranded results, i.e. introns or exons with "*" strand, with NA. If set to FALSE (default), the "*" strand is treated the same as the "+" strand. |
upIntron |
A numeric vector which includes the index of the U2-type introns upstream of the U12-type introns. |
downIntron |
A numeric vector which includes the index of the U2-type introns downstream of the U12-type introns. |
upExon |
A numeric vector which includes the index of the exons upstream of the U12-type introns. |
downExon |
A numeric vector which includes the index of the exons downstream of the U12-type introns. |
Ali Oghabian
head(u12NbIndex(mdsChr22Obj, intExCol="int_ex", intTypeCol="intron_type",
    strandCol="strand", geneIdCol="collapsed_transcripts_id",
    naUnstrand=FALSE))

# Return NA if no strand information available
head(u12NbIndex(mdsChr22Obj, intExCol="int_ex", intTypeCol="intron_type",
    strandCol="strand", geneIdCol="collapsed_transcripts_id",
    naUnstrand=TRUE))
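As a rough sketch of the strand-aware neighbour lookup described above, the base R code below finds the nearest intron before and after each U12 intron within the same gene. It assumes a linearly ordered reference sorted by increasing coordinate. The data frame `ref` and the helper `nearestIntron()` are hypothetical, and this is not the package's implementation. For simplicity the helper does not check that a neighbour is U2-type.

```r
# Toy, linearly ordered reference for one gene per strand (made-up rows)
ref <- data.frame(
  gene = rep(c("g1", "g2"), each = 7),
  strand = rep(c("+", "-"), each = 7),
  int_ex = rep(c("exon", "intron", "exon", "intron", "exon", "intron", "exon"), 2),
  int_type = rep(c(NA, "U2", NA, "U12", NA, "U2", NA), 2),
  stringsAsFactors = FALSE
)

# Nearest intron row before (step = -1) or after (step = +1) row i,
# staying within the same gene; NA if there is none
nearestIntron <- function(i, step) {
  j <- i + step
  while (j >= 1 && j <= nrow(ref) && ref$gene[j] == ref$gene[i]) {
    if (ref$int_ex[j] == "intron") return(j)
    j <- j + step
  }
  NA_integer_
}

u12Rows <- which(ref$int_ex == "intron" & ref$int_type == "U12")
# On the "-" strand, upstream lies at higher row numbers (assuming rows are
# sorted by increasing coordinate)
upStep <- ifelse(ref$strand[u12Rows] == "-", 1, -1)
upIntron <- mapply(nearestIntron, u12Rows, upStep)
downIntron <- mapply(nearestIntron, u12Rows, -upStep)
data.frame(u12Rows, upIntron, downIntron)
```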
Performs a union on the overlapping introns/exons so that the final merged transcripts feature exactly one copy of each exon or intron.
unionRefTr( referenceChr, referenceBegin, referenceEnd, referenceTr, referenceIntronExon, intronExon="exon", silent=FALSE)
referenceChr |
Chromosome names of the references (e.g. introns). |
referenceBegin |
A vector that corresponds to the begin coordinates of the reference. |
referenceEnd |
A vector that corresponds to the end coordinates of the reference. |
referenceTr |
A character vector that includes transcript IDs. |
referenceIntronExon |
A vector with the same size as the |
intronExon |
Should be assigned either |
silent |
Whether to run silently. |
Data frame containing the merged transcript structure. The merged transcripts feature ONLY one copy of each intron or exon.
Ali Oghabian
unU12Ex<-unionRefTr( referenceChr=u12[1:94,"chr"],
    referenceBegin=u12[1:94,"begin"],
    referenceEnd=u12[1:94,"end"],
    referenceTr=u12[1:94,"trans_name"],
    referenceIntronExon=u12[1:94,"int_ex"],
    intronExon="exon", silent=TRUE)

unU12Int<-unionRefTr( referenceChr=u12[1:94,"chr"],
    referenceBegin=u12[1:94,"begin"],
    referenceEnd=u12[1:94,"end"],
    referenceTr=u12[1:94,"trans_name"],
    referenceIntronExon=u12[1:94,"int_ex"],
    intronExon="intron", silent=TRUE)

unU12IntEx<-unionRefTr( referenceChr=u12[1:94,"chr"],
    referenceBegin=u12[1:94,"begin"],
    referenceEnd=u12[1:94,"end"],
    referenceTr=u12[1:94,"trans_name"],
    referenceIntronExon=u12[1:94,"int_ex"],
    intronExon=c("intron","exon"), silent=TRUE)
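The core operation here, a union of overlapping coordinate intervals, can be illustrated with a small base R sketch. It merges intervals per chromosome only and ignores transcript IDs and the intron/exon distinction that `unionRefTr()` handles. The data frame `exons` and the helper `mergeIntervals()` are hypothetical.

```r
# Toy exon coordinates from overlapping transcripts (made-up values)
exons <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  begin = c(100, 150, 400, 420, 100),
  end = c(200, 250, 500, 450, 180)
)

# Union of overlapping intervals on one chromosome
mergeIntervals <- function(df) {
  df <- df[order(df$begin, df$end), ]
  # A new merged block starts where an interval begins after every earlier end
  maxEndBefore <- c(-Inf, head(cummax(df$end), -1))
  block <- cumsum(df$begin > maxEndBefore)
  data.frame(
    chr = df$chr[1],
    begin = as.numeric(tapply(df$begin, block, min)),
    end = as.numeric(tapply(df$end, block, max))
  )
}

merged <- do.call(rbind, lapply(split(exons, exons$chr), mergeIntervals))
rownames(merged) <- NULL
merged
```

Each merged row then covers a region once, regardless of how many of the input intervals overlapped there.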
rowData
of SummarizedExperiment
objects
Updates the values in a single column of the rowData
of
SummarizedExperiment
objects.
updateRowDataCol(x, updateCol, value)
x |
Object of type |
updateCol |
Name or the number of the column in the |
value |
The new replacement values. |
Returns an object of type SummarizedExperiment
.
Ali Oghabian
test<- mdsChr22Obj
# See the frequency of each intron type annotation
table(rowData(test)$intron_type)

# Change U2 to u2
newIntType<- as.character(rowData(test)$intron_type)
newIntType[newIntType=="U2" & !is.na(newIntType=="U2")]<- "u2"

# Updating values
test<- updateRowDataCol(test, updateCol="intron_type",
    value=newIntType)

# See the frequency of the updated intron type annotations
table(rowData(test)$intron_type)

# Adding a new column
test<- updateRowDataCol(test, updateCol="new_column",
    value=rep(NA, nrow(rowData(test))) )
head(rowData(test))
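The example above guards the comparison with `!is.na(...)` because comparing against a column that contains missing values yields `NA`. The base R sketch below, using a made-up vector as a stand-in for a `rowData` column, shows the guard and an equivalent selection with `%in%`.

```r
# Toy stand-in for a rowData column (made-up values, including an NA)
intronType <- c("U2", "U12", NA, "U2")

# Direct comparison yields NA for missing entries, hence the explicit is.na() guard
guarded <- intronType == "U2" & !is.na(intronType)

# %in% never returns NA, so it selects the same rows without the guard
viaIn <- intronType %in% "U2"

newIntType <- intronType
newIntType[viaIn] <- "u2"
newIntType
```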