Package 'easyRNASeq' reference manual

Title:	Count summarization and normalization for RNA-Seq data
Description:	Calculates the coverage of high-throughput short-reads against a genome of reference and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.
Authors:	Nicolas Delhomme, Ismael Padioleau, Bastian Schiffthaler, Niklas Maehler
Maintainer:	Nicolas Delhomme <[email protected]>
License:	Artistic-2.0
Version:	2.43.0
Built:	2025-03-19 04:21:05 UTC
Source:	https://github.com/bioc/easyRNASeq

Class "AnnotParam"

Description

A class holding all the necessary parameters to retrieve the necessary annotation for processing an RNA-Seq experiment.

Objects from the Class

Objects can be created by calls of the form new("AnnotParamCharacter", ...) or new("AnnotParamObject", ...) (both subject to API changes) or using the AnnotParam constructor (failsafe, prefered). The class AnnotParam in itself is virtual and hence cannot be instantiated.

Author(s)

Nicolas Delhomme

Examples


showClass("AnnotParam")

showClass("AnnotParam")

Class "BamParam"

Description

A class describing the parameters of a bam file issued from an RNA-Seq experiment.

Objects from the Class

Objects can be created by calls of the form new("BamParam", ...) or using the BamParam constructor.

Slots from the Class

The BamParam class has the following slots:

paired
stranded
strandProtocol
yieldSize

all of which can be accessed using the accordingly names accessor.

Author(s)

Nicolas Delhomme

Examples


showClass("BamParam")

showClass("BamParam")

Extend the basename function to display Rsamtools BamFile class basename

Description

Display the basename of the bam file represented by a BamFile object.

Usage

## S4 method for signature 'BamFile'
basename(path)
## S4 method for signature 'BamFile'
basename(path)

Arguments

path

an object of class BamFile or BamFileList

Methods

list("signature(object = \"BamFile\")"): Display the basename of the bam file linked to by a BamFile object.

Manages the data necessary for the examples using BiocFileCache

Description

Manages the tutorial, example and vignette data using the BiocFileCache package

Usage

fetchData(fileURL)
tutorialData(...)
fetchData(fileURL)
tutorialData(...)

Arguments

`...`	unused for the time being
`fileURL`	The URL of the file to retrieve. Alternatively, the ID of the file in the BiocFileCache (i.e. the file basename), can be used.

Methods

.get_cache: internal function to set up the cache
fetchData: A function to fetch tutorial data, a file at a time
tutorialData: the function to retrieve all the tutorial data and cache it, if it is not already available
vignetteData: the function to retrieve all the tutorial data and cache it, if it is not already available

Examples

tdir <- tutorialData()
gAnnot.path <- fetchData("gAnnot.rda")
vdir <- vignetteData()
md5.txt <- fetchData("md5.txt")
tdir <- tutorialData()
gAnnot.path <- fetchData("gAnnot.rda")
vdir <- vignetteData()
md5.txt <- fetchData("md5.txt")

Methods to create synthetic transcripts

Description

This function create a set of synthetic transcripts from a provided annotation file in "gff3" or "gtf" format. As detailed in http://www.epigenesys.eu/en/protocols/bio-informatics/1283-guidelines-for-rna-seq-data-analysis, one major caveat of estimating gene expression using aligned RNA-Seq reads is that a single read, which originated from a single mRNA molecule, might sometimes align to several features (e.g. transcripts or genes) with alignments of equivalent quality. This, for example, might happen as a result of gene duplication and the presence of repetitive or common domains. To avoid counting unique mRNA fragments multiple times, the stringent approach is to keep only uniquely mapping reads - being aware of potential consequences. Not only can "multiple counting" arise from a biological reason, but also from technical artifacts, introduced mostly by poorly formatted gff3/gtf annotation files. To avoid this, it is best practice to adopt a conservative approach by collapsing all existing transcripts of a single gene locus into a "synthetic" transcript containing every exon of that gene. In the case of overlapping exons, the longest genomic interval is kept, i.e. an artificial exon is created. This process results in a flattened transcript - a gene structure with a one (gene) to one (transcript) relationship.

Usage

## S4 method for signature 'AnnotParamCharacter'
createSyntheticTranscripts(
  obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"),
  verbose = TRUE
)

## S4 method for signature 'character'
createSyntheticTranscripts(
  obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"),
  verbose = TRUE,
  output = c("Genome_intervals", "GRanges"),
  input = c("gff3", "gtf")
)
## S4 method for signature 'AnnotParamCharacter'
createSyntheticTranscripts(
  obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"),
  verbose = TRUE
)

## S4 method for signature 'character'
createSyntheticTranscripts(
  obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"),
  verbose = TRUE,
  output = c("Genome_intervals", "GRanges"),
  input = c("gff3", "gtf")
)

Arguments

`obj`	a `AnnotParamCharacter` object or the annotation filename as a `character` string
`features`	one or more of 'mRNA', 'miRNA', 'tRNA', 'transcript'
`verbose`	increase the verbosity (default TRUE)
`output`	the output type, one of 'Genome_intervals' or 'GRanges'
`input`	the type of input, one of 'gff3' or 'gtf'

Details

The createSyntheticTranscripts function implements this, taking advantage of the hierarchical structure of the gff3/gtf file. Exon features are related to their transcript (parent), which themselves derives from their gene parents. Using this relationship, exons are combined per gene into a flattened transcript structure. Note that this might not avoid multiple counting if genes overlap on opposing strands. There, only strand specific sequencing data has the power to disentangle these situations.

As gff3/gtf file can contain a large number of feature types, the createSyntheticTranscripts currently only supports: mRNA, miRNA, tRNA and transcript. Please contact me if you need additional features to be considered. Note however, that I will only add features that are part of the sequenceontology.org SOFA (SO_Feature_Annotation) ontology.

Value

Depending on the obj class.

AnnotParamCharacter: a AnnotParamObject object
a character filename: depending on the selected output value, a Genome_intervals or a GRanges object.

Author(s)

Nicolas Delhomme

Examples


  # get the example file
  Dm.gtf <- fetchData("Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz")

  # create the AnnotParam
  annotParam <- AnnotParam(
    datasource=Dm.gtf,
    type="gtf")

  # create the synthetic transcripts
  annotParam <- createSyntheticTranscripts(annotParam,verbose=FALSE)

# get the example file
  Dm.gtf <- fetchData("Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz")

  # create the AnnotParam
  annotParam <- AnnotParam(
    datasource=Dm.gtf,
    type="gtf")

  # create the synthetic transcripts
  annotParam <- createSyntheticTranscripts(annotParam,verbose=FALSE)

The following function are defunct:

`easyRNASeq`

`fetchCoverage`

`fetchAnnotation`

`knownOrganisms`

Description

The easyRNASeq function is superseded by the simpleRNASeq function to consolidate and prune the overall package. The changes are based on user comments and on the general standardization occuring in the field.
The fetchCoverage function only had two parameters deprecated as the consequence of the package consolidation. As the scanBam function is not called directly anymore but through higher level functions (from the GenomicRanges package), the 'what' and 'isUnmappedQuery' parameters were obsolete.

Accessors for RNAseq class

Description

These functions and generics define 'accessors' (to get and set values) for objects in the easyRNASeq package.

Usage

genomicAnnotation(obj)
readCounts(obj,count=c("exons","features","genes","islands","transcripts"),
summarization=c("bestExons","geneModels"),unique=FALSE)
genomicAnnotation(obj) <- value
genomicAnnotation(obj)
readCounts(obj,count=c("exons","features","genes","islands","transcripts"),
summarization=c("bestExons","geneModels"),unique=FALSE)
genomicAnnotation(obj) <- value

Arguments

`obj`	An object derived from class `RNAseq`.
`count`	The type of count you want to access, 'genes','features','exons','transcripts' or 'islands'
`summarization`	If count is set to genes, precise the type of summarization, 'bestExons' or 'geneModels'
`unique`	For the 'exons' count only. Should the counts returned be unique for their identifier (i.e. the matrix row names)?
`value`	The replacement value.

Value

Usually, the value of the corresponding slot, or other simple content described on the help page of easyRNASeq.

Author(s)

Nicolas Delhomme

Examples


# This class is deprecated and as such there are no exmples of its use

# This class is deprecated and as such there are no exmples of its use

Get genic annotation from a gff3/gtf file or using biomaRt

Description

The annotation can be retrieved in two ways

biomaRtUse biomaRt and Ensembl to get organism specific annotation.
gff3/gtfUse a gff3 or gtf local annotation file.

When using biomaRt, it is important that the organism argument to AnnotParam is set the prefix of one of the value available using the biomaRt listDatasets function, e.g. "Dmelanogaster".
When reading from a gff3/gtf file, a version 3 formatted gff or a gtf (an Ensembl defined gff2 version) is expected. The function genomeIntervals genomeIntervals-readGff3 is used to import the data.

Usage

## S4 method for signature 'AnnotParam'
getAnnotation(obj, verbose = FALSE, ...)
## S4 method for signature 'AnnotParam'
getAnnotation(obj, verbose = FALSE, ...)

Arguments

`obj`	An object of class `AnnotParam`
`verbose`	a boolean to turn on verbosity
`...`	See details

Details

... are for additional arguments, passed to the biomaRt getBM function or to the readGffGtf internal function that takes an optional arguments: annotation.type that default to "exon". This is used to select the proper rows of the gff or gtf file.

Value

A GRanges containing the fetched annotations.

Author(s)

Nicolas Delhomme

Examples


  ## Not run: 
	library("RnaSeqTutorial")
  getAnnotation(
    AnnotParam(
      organism="Dmelanogaster",
      datasource=system.file(
  		  "extdata",
				"Dmel-mRNA-exon-r5.52.gff3",
				package="RnaSeqTutorial"),
  		type="gff3"
  ))

## End(Not run)

## Not run: 
	library("RnaSeqTutorial")
  getAnnotation(
    AnnotParam(
      organism="Dmelanogaster",
      datasource=system.file(
  		  "extdata",
				"Dmel-mRNA-exon-r5.52.gff3",
				package="RnaSeqTutorial"),
  		type="gff3"
  ))

## End(Not run)

Accessors for AnnotParam class

Description

These functions and generics define 'accessors' (to get and set values) for AnnotParam objects within the easyRNASeq package. Implemented are:

datasource
type

Usage

datasource(object)
## S4 method for signature 'AnnotParam'
type(x)
datasource(object)
## S4 method for signature 'AnnotParam'
type(x)

Arguments

`object`	An object derived from class `AnnotParam`.
`x`	An object derived from class `AnnotParam`.

Value

The value of the corresponding slot.

Author(s)

Nicolas Delhomme

Examples

# fetch the example data
Dm.annot <- fetchData("Dmel-mRNA-exon-r5.52.gff3.gz")

annot <- AnnotParam(datasource=Dm.annot)
# get the datasource Parameter
datasource(annot)

# fetch the example data
Dm.annot <- fetchData("Dmel-mRNA-exon-r5.52.gff3.gz")

annot <- AnnotParam(datasource=Dm.annot)
# get the datasource Parameter
datasource(annot)

AnnotParam constructor

Description

This constructs a AnnotParam object. The datasource parameter (see details) is mandatory, however other parameters, i.e. when the datasource is not a GRanges default to "genes" and gff3", indicating that the datasource is in the gff3 format and that the contained information needs to be grouped by "genes". This representing the most common use case. Hence, it is left to the user to refine the parameters accordingly to the annotation he is providing or whishes to retrieve.

Usage

## S4 method for signature 'character'
AnnotParam(
  datasource = character(0),
  type = c("gff3", "biomaRt", "gtf", "rda")
)
## S4 method for signature 'character'
AnnotParam(
  datasource = character(0),
  type = c("gff3", "biomaRt", "gtf", "rda")
)

Arguments

`datasource`	a character or a `GRanges` object. See details.
`type`	one of NULL, biomaRt, gff3, gtf or rda. Default to NULL. See details.

Details

Note that calling the constructor without argument fails, as the datasource is a mandatory parameter. Calling the constructor with additional (not all) parameters will affect the value of the selected parameters, leaving the other parameters unaffected. There are three parameters for an AnnotParam object:

datasourceIf no type is provided, the datasource should be GRanges object containing the genic information. These can be obtained using the getAnnotation function.
typeOne of biomaRt, gff3, gtf or rda. The default is "gff3". In all cases, the datasource is a character describing:
- For biomaRt, the name of the organism as known by the ensembl Mart, e.g. dmelanogaster or hsapiens.
- For gff3, gtf or rda, the filename (including the full or relative path).

Examples

# create an object to retrieve annotation from biomaRt
annotParam <- AnnotParam(datasource="Hsapiens",type="biomaRt")

# get the datasource and type
datasource(annotParam)
type(annotParam)

# create an object to retrieve annotation from an rda object
# fetch the example data
gAnnot.rda <- fetchData("gAnnot.rda")
annotParam <- AnnotParam(datasource=gAnnot.rda,type="rda")

# create an object to retrieve annotation from biomaRt
annotParam <- AnnotParam(datasource="Hsapiens",type="biomaRt")

# get the datasource and type
datasource(annotParam)
type(annotParam)

# create an object to retrieve annotation from an rda object
# fetch the example data
gAnnot.rda <- fetchData("gAnnot.rda")
annotParam <- AnnotParam(datasource=gAnnot.rda,type="rda")

Accessors for BamParam class

Description

These functions and generics define 'accessors' (to get and set values) for BamParam objects within the easyRNASeq package.

Usage

yieldSize(object,...)
paired(object)
stranded(object)
strandProtocol(object)
yieldSize(object,...)
paired(object)
stranded(object)
strandProtocol(object)

Arguments

`object`	An object derived from class `BamParam`.
`...`	Additional parameter inherited from the `Rsamtools package yieldSize function`. Ignored here.

Value

The value of the corresponding slot.

Author(s)

Nicolas Delhomme

Examples


bp <- BamParam()
## get the yieldSize Parameter
ysize <-yieldSize(bp)

bp <- BamParam()
## get the yieldSize Parameter
ysize <-yieldSize(bp)

BamParam constructor

Description

This constructs a BamParam object. The default parameters are derived from the currently most common RNA-Seq experimental use-case and are detailed below:

paired is TRUE, i.e. paired-end sequencing is expected.
stranded is FALSE i.e. stranded sequencing is not expected.
yieldSize is set to 1,000,000. This is the amount of reads iteratively processed from the bam file stream. It is a compromise between speed, process-parallelization and memory usage.

Usage

## S4 method for signature 'ANY'
BamParam(
  paired = TRUE,
  stranded = FALSE,
  strandProtocol = c("reverse", "forward"),
  yieldSize = 1000000L
)
## S4 method for signature 'ANY'
BamParam(
  paired = TRUE,
  stranded = FALSE,
  strandProtocol = c("reverse", "forward"),
  yieldSize = 1000000L
)

Arguments

`paired`	boolean whether the BAM file contains paired-end data or not
`stranded`	boolean whether the reads are strand specific
`strandProtocol`	factor with values 'reverse' and 'forward' specifying the type of strand specificity protocol. 'reverse', the reads are on the opposite strand to the gene; typical for Illumina TRUSEQ strand-specific protocol.
`yieldSize`	the amount of reads to be streamed at a time. Default to 1M

Details

Calling the constructor without argument result in the default parameter described above to be returned. Calling the constructor with any parameter will affect the value of the selected parameters, leaving the other parameters unaffected.

Examples

# the defaults
BamParam()

# change the default
BamParam(paired=FALSE)
BamParam(stranded=TRUE,yieldSize=1L)
BamParam(stranded=TRUE,strandProtocol="forward",yieldSize=1L)

# the defaults
BamParam()

# change the default
BamParam(paired=FALSE)
BamParam(stranded=TRUE,yieldSize=1L)
BamParam(stranded=TRUE,strandProtocol="forward",yieldSize=1L)

easyRNASeq count table correction to RPKM

Description

Convert a count table obtained from the easyRNASeq function into an RPKM corrected count table.

Usage

## S4 method for signature 'matrix,ANY,vector,vector'
RPKM(
  obj,
  from = c("exons", "features", "transcripts", "bestExons", "geneModels", "islands"),
  lib.size = numeric(1),
  feature.size = integer(1),
  simplify = TRUE,
  ...
)
## S4 method for signature 'matrix,ANY,vector,vector'
RPKM(
  obj,
  from = c("exons", "features", "transcripts", "bestExons", "geneModels", "islands"),
  lib.size = numeric(1),
  feature.size = integer(1),
  simplify = TRUE,
  ...
)

Arguments

`obj`	An object of class `RNAseq` or a `matrix`, see details
`from`	Determine the kind of coverage to use, choice limited to: exons, features, transcripts, bestExons, geneModels or islands.
`lib.size`	Precise the library size. It should be a named numeric list, i.e. named after the sample names.
`feature.size`	Precise the feature (e.g. exons, genes) sizes. It should be a named numeric list, named after the feature names.
`simplify`	If set to TRUE, whenever a feature (exon, feature, ...) is duplicated in the count table, it is only returned once.
`...`	additional arguments. See details

Details

RPKM accepts two sets of arguments:

RNAseq,character the ... are additional arguments to be passed to the readCounts method.
matrix,named vectornormalize a count matrix by providing the feature sizes (e.g. gene sizes) as a named vector where the names match the row names of the count matrix and the lib sizes as a named vector where the names match the column names of the count matrix.

Value

A matrix containing RPKM corrected read counts.

Author(s)

Nicolas Delhomme

Examples



	## Not run: 
	## get an RNAseq object
	rnaSeq <- easyRNASeq(filesDirectory=
		    			system.file(
					"extdata",
					package="RnaSeqTutorial"),
					pattern="[A,C,T,G]{6}\\.bam$",
				format="bam",
				readLength=36L,
				organism="Dmelanogaster",
				chr.sizes=as.list(seqlengths(Dmelanogaster)),
				annotationMethod="rda",
				annotationFile=system.file(
				                            "data",
							    "gAnnot.rda",
							    package="RnaSeqTutorial"),
				count="exons",
				outputFormat="RNAseq")

	## get the RPKM
	rpkm <- RPKM(rnaSeq,from="exons")
	
	## the same from a count table
	count.table <- readCounts(rnaSeq,count="exons")

	## get the RPKM
	## verify that the feature are sorted as the count.table
	all(.getName(rnaSeq,"exon") == rownames(count.table))
	feature.size <- unlist(width(ranges(rnaSeq)))

	## verify that the samples are ordered in the same way
	all(names(librarySize(rnaSeq)) == colnames(count.table))

	## get the RPKM
	rpkm <- RPKM(count.table,
			feature.size=feature.size,
			lib.size=librarySize(rnaSeq))

## End(Not run)

## Not run: 
	## get an RNAseq object
	rnaSeq <- easyRNASeq(filesDirectory=
		    			system.file(
					"extdata",
					package="RnaSeqTutorial"),
					pattern="[A,C,T,G]{6}\\.bam$",
				format="bam",
				readLength=36L,
				organism="Dmelanogaster",
				chr.sizes=as.list(seqlengths(Dmelanogaster)),
				annotationMethod="rda",
				annotationFile=system.file(
				                            "data",
							    "gAnnot.rda",
							    package="RnaSeqTutorial"),
				count="exons",
				outputFormat="RNAseq")

	## get the RPKM
	rpkm <- RPKM(rnaSeq,from="exons")
	
	## the same from a count table
	count.table <- readCounts(rnaSeq,count="exons")

	## get the RPKM
	## verify that the feature are sorted as the count.table
	all(.getName(rnaSeq,"exon") == rownames(count.table))
	feature.size <- unlist(width(ranges(rnaSeq)))

	## verify that the samples are ordered in the same way
	all(names(librarySize(rnaSeq)) == colnames(count.table))

	## get the RPKM
	rpkm <- RPKM(count.table,
			feature.size=feature.size,
			lib.size=librarySize(rnaSeq))

## End(Not run)

Compute the coverage from a Short Read Alignment file

Description

Computes the genomic reads' coverage from a read file in bam format or any format supported by ShortRead.

Usage

## S4 method for signature 'RNAseq'
fetchCoverage(
  obj,
  format = c("aln", "bam"),
  filename = character(1),
  filter = srFilter(),
  type = "SolexaExport",
  chr.sel = c(),
  validity.check = TRUE,
  chr.map = data.frame(),
  ignoreWarnings = FALSE,
  gapped = TRUE,
  paired = FALSE,
  stranded = FALSE,
  bp.coverage = FALSE,
  ...
)
## S4 method for signature 'RNAseq'
fetchCoverage(
  obj,
  format = c("aln", "bam"),
  filename = character(1),
  filter = srFilter(),
  type = "SolexaExport",
  chr.sel = c(),
  validity.check = TRUE,
  chr.map = data.frame(),
  ignoreWarnings = FALSE,
  gapped = TRUE,
  paired = FALSE,
  stranded = FALSE,
  bp.coverage = FALSE,
  ...
)

Arguments

`obj`	An `RNAseq` object
`format`	The format of the reads, one of "aln","bam". If not "bam", all the types supported by the ShortRead package are supported too.
`filename`	The full path of the file to use
`filter`	The filter to be applied when loading the data using the "aln" format
`type`	The type of data when using the "aln" format. See the ShortRead package.
`chr.sel`	A vector of chromosome names to subset the final results.
`validity.check`	Shall UCSC chromosome name convention be enforced
`chr.map`	A data.frame describing the mapping of original chromosome names towards wished chromosome names. See details.
`ignoreWarnings`	set to TRUE (bad idea! they have a good reason to be there) if you do not want warning messages.
`gapped`	Is the bam file provided containing gapped alignments?
`paired`	Is the bam file containing PE reads?
`stranded`	Is the bam file from a strand specific protocol?
`bp.coverage`	a boolean that default to FALSE to decide whether coverage is to be calculated and stored by bp
`...`	additional arguments. See details

Details

... for fetchCoverage: Can be used for readAligned method from package ShortRead. The use of the dots for the scanBamFlag method from package Rsamtools has been deprecated, as were the 'what' and 'isUnmappedQuery' argument to the function

Value

An RNAseq object. The slot readCoverage contains a SimpleRleList object representing a list of coverage vectors, one per chromosome.

Author(s)

Nicolas Delhomme

Examples


  ## Not run: 
	library("RnaSeqTutorial")
	library(BSgenome.Dmelanogaster.UCSC.dm3)

	obj <- new('RNAseq',
		organismName="Dmelanogaster",
		readLength=36L,
		chrSize=as.list(seqlengths(Dmelanogaster))
		)

	obj <- fetchCoverage(
			obj,
			format="bam",
                        filename=system.file(
				"extdata",
				"ACACTG.bam",
                            	package="RnaSeqTutorial")
			)
	
## End(Not run)

## Not run: 
	library("RnaSeqTutorial")
	library(BSgenome.Dmelanogaster.UCSC.dm3)

	obj <- new('RNAseq',
		organismName="Dmelanogaster",
		readLength=36L,
		chrSize=as.list(seqlengths(Dmelanogaster))
		)

	obj <- fetchCoverage(
			obj,
			format="bam",
                        filename=system.file(
				"extdata",
				"ACACTG.bam",
                            	package="RnaSeqTutorial")
			)
	
## End(Not run)

Defunct annotation function

Description

The fetchAnnotation and knownOrganisms function are now defunct. The fetchAnnotation function has been replaced by the getAnnotation method.

Author(s)

Nicolas Delhomme

Extension of the GenomicRanges package

Description

Describes extensions to the GenomicRanges package. For GRanges and GRangesList objects:

colnames returns the column name of a GRanges or GRangesList object.
unsafeAppend appends two GAlignments object together bypassing most sanity checks. Faster than the standard c or append function.

Usage

colnames(x, do.NULL = TRUE, prefix = "col")
unsafeAppend(obj1,obj2)
colnames(x, do.NULL = TRUE, prefix = "col")
unsafeAppend(obj1,obj2)

Arguments

`x`	An object of the `GRanges` or `GRangesList` class
`do.NULL`	see `row_colnames` for details
`prefix`	see `row_colnames` for details
`obj1`	A `GAlignments` object
`obj2`	A `GAlignments` object

Details

colnames returns the actual column names of the elementMetadata slot of the GRanges or GRangesList object. The elementMetadata contains a DataFrame object used to store additional information provided by the user, such as exon ID in our case.
unsafeAppend appends two GAlignments objects.

Value

colnames: A vector of column names.
unsafeAppend: A GAlignments object

Author(s)

Nicolas Delhomme

Examples


 # an example of annotation
	grngs <- GRanges(seqnames=c("chr01","chr01","chr02"),
                     ranges=IRanges(
                             start=c(10,30,100),
                             end=c(21,53,123)),
                          strand=c("+","+","-"),
                          transcripts=c("trA1","trA2","trB"),
                          gene=c("gA","gA","gB"),
                          exon=c("e1","e2","e3")
                          )

	# accessing the colnames
	colnames(grngs)

	# creating a GRangesList
	grngsList<-split(grngs,seqnames(grngs))

	# accessing the colnames
	colnames(grngsList)

# For unsafeAppend
library(GenomicAlignments)
unsafeAppend(GAlignments(),GAlignments())

# an example of annotation
	grngs <- GRanges(seqnames=c("chr01","chr01","chr02"),
                     ranges=IRanges(
                             start=c(10,30,100),
                             end=c(21,53,123)),
                          strand=c("+","+","-"),
                          transcripts=c("trA1","trA2","trB"),
                          gene=c("gA","gA","gB"),
                          exon=c("e1","e2","e3")
                          )

	# accessing the colnames
	colnames(grngs)

	# creating a GRangesList
	grngsList<-split(grngs,seqnames(grngs))

	# accessing the colnames
	colnames(grngsList)

# For unsafeAppend
library(GenomicAlignments)
unsafeAppend(GAlignments(),GAlignments())

Identify expressed regions de-novo

Description

Process the coverage to locate regions with a minimum coverage (min.cov). If regions are separated by a gap shorter than a maximum length (max.gap), they are unified. Only islands longer than min.length are returned. These functions are now outdated and would need to be actualized.

Usage

## S4 method for signature 'RNAseq'
findIslands(
  obj,
  max.gap = integer(1),
  min.cov = 1L,
  min.length = integer(1),
  plot = TRUE,
  ...
)
## S4 method for signature 'RNAseq'
findIslands(
  obj,
  max.gap = integer(1),
  min.cov = 1L,
  min.length = integer(1),
  plot = TRUE,
  ...
)

Arguments

`obj`	An object of class `RNAseq`
`max.gap`	Maximum gap between two peaks to build an island
`min.cov`	Minimum coverage for an island to be returned
`min.length`	Minimum size of an island to be returned
`plot`	If TRUE, draw plots of coverage distribution. Help the user to select an appropriate value for the minimum coverage.
`...`	See details

Details

... are for providing additional options to the hist plot function.

Value

An RNAseq object with the readIsland slot set with a GRanges containing the selected islands and the readCount slot actualized with a list containing the count table per island.

Author(s)

Nicolas Delhomme

Examples


	## Not run: 
	# NOTE that this function might need to be actualized
	obj <- new('RNAseq',
		organismName="Dmelanogaster",
		readLength=36L,
		chrSize=as.list(seqlengths(Dmelanogaster))
		)

 # fetch the example data
 bamFilePath <- fetchData("ACACTG.bam")

	obj <- fetchCoverage(obj,format="bam",filename=bamFilePath)

	obj <- findIslands(
			obj,
			max.gap=10L,
			min.cov=10L,
			min.length=200L)
	
## End(Not run)

## Not run: 
	# NOTE that this function might need to be actualized
	obj <- new('RNAseq',
		organismName="Dmelanogaster",
		readLength=36L,
		chrSize=as.list(seqlengths(Dmelanogaster))
		)

 # fetch the example data
 bamFilePath <- fetchData("ACACTG.bam")

	obj <- fetchCoverage(obj,format="bam",filename=bamFilePath)

	obj <- findIslands(
			obj,
			max.gap=10L,
			min.cov=10L,
			min.length=200L)
	
## End(Not run)

Accessors for RnaSeqParam class

Description

These functions and generics define 'accessors' (to get and set values) for RnaSeqParam objects within the easyRNASeq package. Implemented are:

annotParam
bamParam
countBy
datasource
paired
precision
stranded
strandProtocol
yieldSize

Usage

## S4 method for signature 'RnaSeqParam'
yieldSize(object)
## S4 method for signature 'RnaSeqParam'
yieldSize(object)

Arguments

object

An object derived from class RnaSeqParam.

Value

The value of the corresponding slot.

Author(s)

Nicolas Delhomme

Examples

## create the necessary AnnotParam
annotParam <- AnnotParam(
                datasource=system.file(
                                "extdata",
                                "Dmel-mRNA-exon-r5.52.gff3",
                                package="RnaSeqTutorial"))

## create the RnaSeqParam
rsp <- RnaSeqParam(annotParam=annotParam)
## get the yieldSize Parameter
ysize <-yieldSize(rsp)

## create the necessary AnnotParam
annotParam <- AnnotParam(
                datasource=system.file(
                                "extdata",
                                "Dmel-mRNA-exon-r5.52.gff3",
                                package="RnaSeqTutorial"))

## create the RnaSeqParam
rsp <- RnaSeqParam(annotParam=annotParam)
## get the yieldSize Parameter
ysize <-yieldSize(rsp)

RnaSeqParam constructor

Description

This constructs a RnaSeqParam object, that combines all the necessary parameters for the analysis of RNA-Seq data. As much as possible, these parameters are determined automa-gi/ti-cally. It describes three sets of parameters:

parameters describing the annotation
parameters describing the BAM files, i.e. the type of sequencing that was conducted.
parameters describing how the counting should be done.

The first two are provided through sepcific objects: AnnotParam and BamParam respectively. The third one is a set constituted of:

countBy: the feature per which the counts should be summarized ( exon, transcript or gene. A forth possibility - feature - can be used to define arbitrary genomic loci)
precision: the precision at which the counts should be performed: bp or reads. bp used to be the default in the easyRNASeq package, whereas now reads is, following the Bioconductor main stream development.

The default parameters for the BamParam parameter are derived from the currently most common RNA-Seq experimental use-case: strand-specific paired-end Illumina sequencing. See the respective manual pages of AnnotParam and BamParam for more details.

Usage

## S4 method for signature 'ANY'
RnaSeqParam(
  annotParam = AnnotParam(),
  bamParam = BamParam(),
  countBy = c("exons", "features", "genes", "transcripts"),
  precision = c("read", "bp")
)
## S4 method for signature 'ANY'
RnaSeqParam(
  annotParam = AnnotParam(),
  bamParam = BamParam(),
  countBy = c("exons", "features", "genes", "transcripts"),
  precision = c("read", "bp")
)

Arguments

`annotParam`	An object derived from class `AnnotParam`.
`bamParam`	An object derived from class `BamParam`.
`countBy`	TODO
`precision`	A character value, either 'read' or 'bp' that defines the precision at which counting is done, either per read or per covered bp. 'read' is the default.

Examples

annotParam <- AnnotParam(
                datasource=system.file(
                                "extdata",
                                "Dmel-mRNA-exon-r5.52.gff3",
                                package="RnaSeqTutorial"))

## create the RnaSeqParam
rsp <- RnaSeqParam(annotParam=annotParam)

## change some defaults
RnaSeqParam(countBy="features",annotParam=annotParam)
RnaSeqParam(bamParam=BamParam(stranded=TRUE,yieldSize=1L),annotParam=annotParam)


annotParam <- AnnotParam(
                datasource=system.file(
                                "extdata",
                                "Dmel-mRNA-exon-r5.52.gff3",
                                package="RnaSeqTutorial"))

## create the RnaSeqParam
rsp <- RnaSeqParam(annotParam=annotParam)

## change some defaults
RnaSeqParam(countBy="features",annotParam=annotParam)
RnaSeqParam(bamParam=BamParam(stranded=TRUE,yieldSize=1L),annotParam=annotParam)

Count methods for RNAseq object

Description

Summarize the read counts per exon, feature, gene, transcript or island.

exonCounts: for that summarization, reads are summarized per exons. An "exon" field is necessary in the annotation object for this to work. See easyRNASeq annotation methods for more details on the annotation object.
featureCounts is similar to the 'exons' one. This is just a wrapper to summarize count for genomic features that are not exon related. I.e. one could use it to measure eRNAs. Again, a "feature" field is necessary in the annotation object for this to work.
geneCounts sums the counts per either bestExons or geneModels. In either case, the annotation object needs to contain both an "exon" and a "gene" field.
islandCounts sums the counts per computed islands.
transcriptCounts sums the counts obtained by exons into their respective transcripts. Note that this often result in counting some reads several times. For this function to work you need both an "exon" and a "transcript" field in your annotation object. To avoid this, one could create transcript specific synthetic exons, i.e. features that would be unique to a transcript. To offer this possibility, transcripts count can be summarized from "features", in which case the annotation object need to have both the "feature" and "transcript" fields defined.

Usage

exonCounts(obj)
featureCounts(obj)
transcriptCounts(obj,from="exons")
geneCounts(obj,summarization=c("bestExons","geneModels"),...)
islandCounts(obj,force=FALSE,...)
exonCounts(obj)
featureCounts(obj)
transcriptCounts(obj,from="exons")
geneCounts(obj,summarization=c("bestExons","geneModels"),...)
islandCounts(obj,force=FALSE,...)

Arguments

`obj`	An object derived from class `RNAseq`,can be a `matrix` for RPKM, see details
`force`	For `islandCount`, force RNAseq to redo `findIsland`
`from`	either "exons" or "features" can be used to summarize per transcript
`summarization`	Method use for summarize genes
`...`	See details

Details

... for

geneCounts: additional options for the .geneModelSummarization
islandCounts: additional options for findIslands

Value

A numeric vector containing count per exon, feature, gene or transcript.

Author(s)

Nicolas Delhomme

Examples

## Not run: 
library(BSgenome.Dmelanogaster.UCSC.dm3)

 # get the example data files
 tdir <- tutorialData()

 # get an example annotation file - we retrieve it from GitHub using curl
 gAnnot.rda <- fetchData("gAnnot.rda")

	# create an RNAseq object
	# summarizing 2 bam files by exons
	rnaSeq <- easyRNASeq(tdir,
                     organism="Dmelanogaster",
                     chr.sizes=seqlengths(Dmelanogaster),
                     readLength=36L,
                     annotationMethod="rda",
                     annotationFile=gAnnot.rda,
                     format="bam",
                     count="exons",
                     pattern="[A,C,T,G]{6}\\.bam$",
                     outputFormat="RNAseq")
	# summing up the exons by transcript
	rnaSeq <- transcriptCounts(rnaSeq)

## End(Not run)

## Not run: 
library(BSgenome.Dmelanogaster.UCSC.dm3)

 # get the example data files
 tdir <- tutorialData()

 # get an example annotation file - we retrieve it from GitHub using curl
 gAnnot.rda <- fetchData("gAnnot.rda")

	# create an RNAseq object
	# summarizing 2 bam files by exons
	rnaSeq <- easyRNASeq(tdir,
                     organism="Dmelanogaster",
                     chr.sizes=seqlengths(Dmelanogaster),
                     readLength=36L,
                     annotationMethod="rda",
                     annotationFile=gAnnot.rda,
                     format="bam",
                     count="exons",
                     pattern="[A,C,T,G]{6}\\.bam$",
                     outputFormat="RNAseq")
	# summing up the exons by transcript
	rnaSeq <- transcriptCounts(rnaSeq)

## End(Not run)

Dataset included in the package

Description

The package contains a dataset from the Robinson, Delhomme et al., 2014 publication.

RobinsonDelhomme2014a normalised expression count table. This dataset was generated from 17 Populus tremula - Eurasian aspen - trees used to assess the sexual dimorphism of this dioecious species. This count matrix has been generating following published pre-processing guidelines - see http://www.epigenesys.eu/en/protocols/bio-informatics/1283-guidelines-for-rna-seq-data-analysis - and the resulting HTSeq files have been collated and the obtained raw count matrix submitted to a variance stabilising transformation. Subsequently, the values have been transformed so that the minimal vst values - that corresponds to an absence of expression - is 0. Hence the counts in the matrix are library-size normalized, variance stabilised expression values, with a minimal value of 0.

easyRNASeq method

Description

This function is a wrapper around the more low level functionalities of the package. Is the easiest way to get a count matrix from a set of read files. It does the following:

use ShortRead/Rsamtools methods for loading/pre-processing the data.
fetch the annotations depending on the provided arguments
get the reads coverage from the provided file(s)
summarize the reads according to the selected summarization features
optionally apply a data correction (i.e. generating RPKM).
use edgeR methods for post-processing the data, this being strongly recommended over RPKM).

Usage

## S4 method for signature 'character'
easyRNASeq(
  filesDirectory = getwd(),
  organism = character(1),
  chr.sizes = c("auto"),
  readLength = integer(1),
  annotationMethod = c("biomaRt", "env", "gff", "gtf", "rda"),
  annotationFile = character(1),
  annotationObject = GRangesList(),
  format = c("bam", "aln"),
  gapped = FALSE,
  count = c("exons", "features", "genes", "islands", "transcripts"),
  outputFormat = c("matrix", "SummarizedExperiment", "edgeR", "RNAseq"),
  pattern = character(1),
  filenames = character(0),
  nbCore = 1,
  filter = srFilter(),
  type = "SolexaExport",
  chr.sel = c(),
  summarization = c("bestExons", "geneModels"),
  normalize = FALSE,
  max.gap = integer(1),
  min.cov = 1L,
  min.length = integer(1),
  plot = TRUE,
  conditions = c(),
  validity.check = TRUE,
  chr.map = data.frame(),
  ignoreWarnings = FALSE,
  silent = FALSE,
  ...
)
## S4 method for signature 'character'
easyRNASeq(
  filesDirectory = getwd(),
  organism = character(1),
  chr.sizes = c("auto"),
  readLength = integer(1),
  annotationMethod = c("biomaRt", "env", "gff", "gtf", "rda"),
  annotationFile = character(1),
  annotationObject = GRangesList(),
  format = c("bam", "aln"),
  gapped = FALSE,
  count = c("exons", "features", "genes", "islands", "transcripts"),
  outputFormat = c("matrix", "SummarizedExperiment", "edgeR", "RNAseq"),
  pattern = character(1),
  filenames = character(0),
  nbCore = 1,
  filter = srFilter(),
  type = "SolexaExport",
  chr.sel = c(),
  summarization = c("bestExons", "geneModels"),
  normalize = FALSE,
  max.gap = integer(1),
  min.cov = 1L,
  min.length = integer(1),
  plot = TRUE,
  conditions = c(),
  validity.check = TRUE,
  chr.map = data.frame(),
  ignoreWarnings = FALSE,
  silent = FALSE,
  ...
)

Arguments

`filesDirectory`	The directory where the files to be used are located. Defaults to the current directory.
`organism`	A character string describing the organism
`chr.sizes`	A vector or a list containing the chromosomes' size of the selected organism or simply the string "auto". See details.
`readLength`	The read length in bp
`annotationMethod`	The method to fetch the annotation, one of "biomaRt","env","gff","gtf" or "rda". All methods but "biomaRt" and "env" require the annotationFile to be set. The "env" method requires the annotationObject to be set.
`annotationFile`	The location (full path) of the annotation file
`annotationObject`	A `GRangesList` object containing the annotation.
`format`	The format of the reads, one of "aln","bam". If not "bam", all the types supported by the ShortRead package are supported too. As of version 1.3.5, it defaults to bam.
`gapped`	Is the bam file provided containing gapped alignments?
`count`	The feature used to summarize the reads. One of 'exons','features','genes','islands' or 'transcripts'. See details.
`outputFormat`	By default, easyRNASeq returns a matrix. If one of `edgeR`, `RNAseq` or `SummarizedExperiment` is provided then the respective object is returned.
`pattern`	For easyRNASeq, the pattern of file to look for, e.g. "bam$"
`filenames`	The name, not the path, of the files to use
`nbCore`	defines how many CPU core to use when computing the geneModels. Use the default parallel library
`filter`	The filter to be applied when loading the data using the "aln" format
`type`	The type of data when using the "aln" format. See the ShortRead library.
`chr.sel`	A vector of chromosome names to subset the final results.
`summarization`	A character defining which method to use when summarizing reads by genes. So far, only "geneModels" is available.
`normalize`	A boolean to convert the returned counts in RPKM. Valid when the `outputFormat` is left undefined (i.e. when a matrix is returned) and when it is `edgeR`. Note that you should not normalize the data prior to using edgeR!
`max.gap`	When computing read islands, the maximal gap size allowed between two islands to merge them
`min.cov`	When computing read islands, the minimal coverage to take into account for calling an island
`min.length`	The minimal size an island should have to be kept
`plot`	Whether or not to plot assessment graphs.
`conditions`	A vector of descriptor, each sample must have a descriptor if you use outputFormat edgeR. The size of this list must be equal to the number of sample. In addition the vector should be named with the filename of the corresponding samples.
`validity.check`	Shall UCSC chromosome name convention be enforced? This is only supported for a set of organisms, which are Dmelanogaster, Hsapiens, Mmusculus and Rnorvegicus; otherwise the argument 'chr.map' can be used to complement it.
`chr.map`	A data.frame describing the mapping of original chromosome names towards wished chromosome names. See details.
`ignoreWarnings`	set to TRUE (bad idea! they have a good reason to be there) if you do not want warning messages.
`silent`	set to TRUE if you do not want messages to be printed out.
`...`	additional arguments. See details

Details

... Additional arguments for different functions:
- For the biomaRt getBM function
- For the readGffGtf internal function that takes an optional arguments: annotation.type that default to "exon" (used to select the proper rows of the gff or gtf file)
- For to the list.files function used to locate the read files.
the annotationObject When the annotationMethods is set to env or rda, a properly formatted GRangesList object need to be provided. Check the vignette or the examples at the bottom of this page for examples. The data.frame-like structure of these objects is where easyRNASeq will look for the exon, feature, transcript, or gene identifier. Depending on the count method selected, it is essential that the akin column name is present in the annotationObject. E.g. when counting "features", the annotationObject has to contain a "feature" field.
the chr.map The chr.map argument for the easyRNASeq function only works for an "organismName" of value 'custom' with the "validity.check" parameter set to 'TRUE'. This data.frame should contain two columns named 'from' and 'to'. The row should represent the chromosome name in your original data and the wished name in the output of the function.
count The count can be summarized by exons, features, genes, islands or transcripts. While exons, genes and transcripts are obvious, "features" describes any features provided by the user, e.g. enhancer loci. These are processed as the exons are. For "islands", it is for an under development function that identifies de-novo expression loci and count the number of reads overlapping them.
chr.sizes If set to "auto", then the format has to be "bam", in which case the chromosome names and size are extracted from the BAM header

Value

Returns a count table (a matrix of m features x n samples). If the outputFormat option has been set, a corresponding object is returned: a RangedSummarizedExperiment, a edgeR:DGEList or RNAseq.

Author(s)

Nicolas Delhomme

Examples

 ## Not run: 
	library(BSgenome.Dmelanogaster.UCSC.dm3)

 # get the example data
 tdir <- tutorialData()


 # get an example annotation file
 gAnnot.rda <- fetchData("gAnnot.rda")

	# creating a count table from 4 bam files
	count.table <- easyRNASeq(filesDirectory="tdir",
					pattern="[A,C,T,G]{6}\\.bam$",
				format="bam",
				readLength=36L,
				organism="Dmelanogaster",
				chr.sizes=seqlengths(Dmelanogaster),
				annotationMethod="rda",
				annotationFile=gAnnot.rda,
				count="exons")

	# an example of a chr.map
	chr.map <- data.frame(from=c("2L","2R","MT"),to=c("chr2L","chr2R","chrMT"))

	# an example of a GRangesList annotation
	grngs <- GRanges(seqnames=c("chr01","chr01","chr02"),
                     ranges=IRanges(
                             start=c(10,30,100),
                             end=c(21,53,123)),
                          strand=c("+","+","-"),
                          transcript=c("trA1","trA2","trB"),
                          gene=c("gA","gA","gB"),
                          exon=c("e1","e2","e3")
                          )

	grngsList<-split(grngs,seqnames(grngs))

## End(Not run)
## Not run: 
	library(BSgenome.Dmelanogaster.UCSC.dm3)

 # get the example data
 tdir <- tutorialData()


 # get an example annotation file
 gAnnot.rda <- fetchData("gAnnot.rda")

	# creating a count table from 4 bam files
	count.table <- easyRNASeq(filesDirectory="tdir",
					pattern="[A,C,T,G]{6}\\.bam$",
				format="bam",
				readLength=36L,
				organism="Dmelanogaster",
				chr.sizes=seqlengths(Dmelanogaster),
				annotationMethod="rda",
				annotationFile=gAnnot.rda,
				count="exons")

	# an example of a chr.map
	chr.map <- data.frame(from=c("2L","2R","MT"),to=c("chr2L","chr2R","chrMT"))

	# an example of a GRangesList annotation
	grngs <- GRanges(seqnames=c("chr01","chr01","chr02"),
                     ranges=IRanges(
                             start=c(10,30,100),
                             end=c(21,53,123)),
                          strand=c("+","+","-"),
                          transcript=c("trA1","trA2","trB"),
                          gene=c("gA","gA","gB"),
                          exon=c("e1","e2","e3")
                          )

	grngsList<-split(grngs,seqnames(grngs))

## End(Not run)

Extension for the edgeR package

Description

This method extends the edgeR package by offering the functionality to plot the effect of the normalization factor.

Usage

## S4 method for signature 'DGEList,character,character'
plotNormalizationFactors(
  obj = DGEList(),
  cond1 = character(1),
  cond2 = character(1)
)
## S4 method for signature 'DGEList,character,character'
plotNormalizationFactors(
  obj = DGEList(),
  cond1 = character(1),
  cond2 = character(1)
)

Arguments

`obj`	An object of class `DGEList`
`cond1`	A character string describing the first condition
`cond2`	A character string describing the second condition

Value

none

Author(s)

Nicolas Delhomme

Examples


	## Not run: 
	## create the object
	dgeList <- DGEList(counts,group)
	## calculate the sie factors
	dgeList <- calcNormFactors(dgeList)
	## plot them
	apply(combn(rownames(dgeList$samples),2),
		2,
		function(co,obj){plotNormalizationFactors(obj,co[1],co[2])},dgeList)
	
## End(Not run)

## Not run: 
	## create the object
	dgeList <- DGEList(counts,group)
	## calculate the sie factors
	dgeList <- calcNormFactors(dgeList)
	## plot them
	apply(combn(rownames(dgeList$samples),2),
		2,
		function(co,obj){plotNormalizationFactors(obj,co[1],co[2])},dgeList)
	
## End(Not run)

Extend the file.exists function to check the path slot of a Rsamtools BamFile class for existence

Description

Check if the bam file represented by a BamFile object exists.

Usage

## S4 method for signature 'BamFile'
file.exists(...)
## S4 method for signature 'BamFile'
file.exists(...)

Arguments

...

a BamFile object

Methods

list("signature(object = \"BamFile\")"): Checkk if the bam file linked to by a BamFile object exists.

Extension for the genomeIntervals package

Description

type: Another way to access the content of the gff type column.

Usage

## S4 method for signature 'Genome_intervals'
type(x)
## S4 method for signature 'Genome_intervals'
type(x)

Arguments

`x`	An object of class `Genome_intervals`

Value

type: The content of the type column, usually a factor or a character vector

Author(s)

Nicolas Delhomme

Examples

# library
library(genomeIntervals)

# fetch the example data
gffFilePath <- fetchData("Dmel-mRNA-exon-r5.52.gff3.gz")
annot<-readGff3(gffFilePath,quiet=TRUE)
type(annot)

# library
library(genomeIntervals)

# fetch the example data
gffFilePath <- fetchData("Dmel-mRNA-exon-r5.52.gff3.gz")
annot<-readGff3(gffFilePath,quiet=TRUE)
type(annot)

Get a BamFileList from a list of filenames

Description

A utility function to create a BamFileList-class object from a set of filenames. The filenames need to contain the file path if they are not in the working directory.

Usage

## S4 method for signature 'character,character'
getBamFileList(filenames = character(0), indexnames = character(0))
## S4 method for signature 'character,character'
getBamFileList(filenames = character(0), indexnames = character(0))

Arguments

`filenames`	a character vector containing fully defined BAM file filenames
`indexnames`	a character vector containing fully defined BAM index file filenames

Value

a BamFileList-class

Examples

# tutorial data - store the data in the BiocCache
tdir <- tutorialData()

  # creating a BamFileList using a directory and pattern
  # using filenames (from the Bioc cache)
  filenames <- dir(tdir,pattern="[A,C,T,G]{6}\\.bam$",full.names=TRUE)
  indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
  bfl <- getBamFileList(filenames,indexnames)

  # get them recursively
  filenames <- dir(path=tdir,pattern="[A,C,T,G]{6}\\.bam$",
                   full.names=TRUE,recursive=TRUE)
  indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
  bfl <- getBamFileList(filenames,indexnames)

# tutorial data - store the data in the BiocCache
tdir <- tutorialData()

  # creating a BamFileList using a directory and pattern
  # using filenames (from the Bioc cache)
  filenames <- dir(tdir,pattern="[A,C,T,G]{6}\\.bam$",full.names=TRUE)
  indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
  bfl <- getBamFileList(filenames,indexnames)

  # get them recursively
  filenames <- dir(path=tdir,pattern="[A,C,T,G]{6}\\.bam$",
                   full.names=TRUE,recursive=TRUE)
  indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
  bfl <- getBamFileList(filenames,indexnames)

Extension of the IRanges package

Description

Return the ranges of the genomic annotation.

Usage

## S4 method for signature 'RNAseq'
ranges(x)
## S4 method for signature 'RNAseq'
ranges(x)

Arguments

`x`	An object of the `RNAseq` class

Details

It retrieves the object stored in the genomicAnnotation slot of the RNAseq object and apply the ranges function on it.

Value

An IRangesList object, where the split is performed by seqnames (e.g. chromosomes).

Author(s)

Nicolas Delhomme

Examples


	## Not run: 
	library("RnaSeqTutorial")

	obj <- getAnnotation(
            AnnotParam(
              organism="Dmelanogaster",
              datasource=system.file(
    	          "extdata",
				        "Dmel-mRNA-exon-r5.52.gff3",
				        package="RnaSeqTutorial"),
  		        type="gff3"
  ))

	ranges(obj)
	
## End(Not run)

## Not run: 
	library("RnaSeqTutorial")

	obj <- getAnnotation(
            AnnotParam(
              organism="Dmelanogaster",
              datasource=system.file(
    	          "extdata",
				        "Dmel-mRNA-exon-r5.52.gff3",
				        package="RnaSeqTutorial"),
  		        type="gff3"
  ))

	ranges(obj)
	
## End(Not run)

parallel additional methods

Description

Functions defined in the easyRNASeq package that enhance the parallel package.

Usage

## S4 method for signature 'list,function'
parallelize(obj = list(), fun = NULL, nnodes = 1, ...)
## S4 method for signature 'list,function'
parallelize(obj = list(), fun = NULL, nnodes = 1, ...)

Arguments

`obj`	the object which processing has to be parallelizes
`fun`	the function to be applied in parallel
`nnodes`	the number of nodes to use
`...`	additional arguments passed to the function `fun`

Details

The parallelize function ease the use of the parallel package. If the number of nodes provided by the user is 1, then a simple 'lapply' is used, otherwise a cluster object is created and the object dispatched for parallelization.

Value

the result of the clusterApply function.

Author(s)

Nicolas Delhomme

Examples

parallelize(list(a<-c(1,2),b<-c(2,1)),sum,nnodes=1)
parallelize(list(a<-c(1,2),b<-c(2,1)),sum,nnodes=1)

Pretty print the content of classes from the easyRNASeq package.

Description

Print information about a RNAseq, AnnotParam, BamParam or RnaSeqParam object.

Usage

## S4 method for signature 'RNAseq'
print(x, verbose = FALSE, ...)
## S4 method for signature 'RNAseq'
print(x, verbose = FALSE, ...)

Arguments

`x`	An object from class `RNAseq`, `AnnotParam`, `BamParam` or `RnaSeqParam`
`verbose`	A logical to have a verbose or not output. Default to FALSE For object of the `RNAseq` class only.
`...`	Additional arguments, currently unused.

Value

Print information about the provided object.

Author(s)

Nicolas Delhomme

Class "RNAseq"

Description

A class holding all the necessary information and annotation to summarize couts (number of reads) per features (i.e. exons or transcripts or genes) for RNA-Seq experiments.

Objects from the Class

Objects can be created by calls of the form new("RNAseq", ...).

Author(s)

Nicolas Delhomme

Examples


showClass("RNAseq")

showClass("RNAseq")

Class "RnaSeqParam"

Description

A class holding all the necessary parameters to process a bam file issued from an RNA-Seq experiment together with the related annotation to compute a count-table using the simpleRNASeq function. The precision slot is used to determine the count unit:

readsdefault. The standard summarizeOverlaps-methods function is used to extract the read counts
bpThe easyRNASeq summarization functions are used to extract the read covered bp counts

Objects from the Class

Objects can be created by calls of the form new("RnaSeqParam", ...) or using the RnaSeqParam constructor.

Author(s)

Nicolas Delhomme

Examples


showClass("RnaSeqParam")

showClass("RnaSeqParam")

Methods extending the ShortRead package functionalities

Description

These are functions extending the ShortRead packages capabilities:

Usage

demultiplex(obj,barcodes=c(),barcodes.qty=12,barcode.length=6,
edition.dist=2,type=c("independant","within"),index.only=FALSE,mc.cores=1L)
barcodePlot(obj,barcodes=c(),type=c("independant","within"),
barcode.length=6,show.barcode=20,...)
chastityFilter(.name="Illumina Chastity Filter")
naPositionFilter(.name="NA Position Filter")
demultiplex(obj,barcodes=c(),barcodes.qty=12,barcode.length=6,
edition.dist=2,type=c("independant","within"),index.only=FALSE,mc.cores=1L)
barcodePlot(obj,barcodes=c(),type=c("independant","within"),
barcode.length=6,show.barcode=20,...)
chastityFilter(.name="Illumina Chastity Filter")
naPositionFilter(.name="NA Position Filter")

Arguments

`obj`	An object derived from class `AlignedRead`
`barcodes`	A character vector describing the multiplex (i.e. barcode) sequences used in the experiment.
`barcodes.qty`	An integer describing the number of barcodes
`barcode.length`	An integer describing the barcode length in bp
`edition.dist`	The maximal edition distance (i.e. the number of changes to apply), to accept an incorrectly sequenced barcode.
`type`	The type of barcode used. `independent` represents barcodes generated by the illumina protocol; i.e. a separate additional sequencing step performed once the first mate has been sequenced. `within` represents barcodes that are part of the sequenced reads as established by Lefrancois P et al., BMC Genomics, 2009
`index.only`	simply return the index and not the barcode themselves.
`mc.cores`	A parameter ultimately passed to srdistance to enable parallel processing on mc.cores. On linux and Mac only, windows task remain serially processed.
`.name`	An internal string describing the filter
`show.barcode`	An integer specifying how many barcodes should be displayed in the final output.
`...`	additional graphic parameters

Details

barcodePlot Creates a plot showing the barcode distribution of a multiplexed sequencing library.
chastityFilter Creates a SRFilter instance that filters SolexaExport read according to the chastity filtering value.
demultiplex Split a single AlignedRead object into a list of AlignedRead objects according to the barcodes provided by the user. It supports multicore processing but has a default serial behaviour.
naPositionFilter Creates a SRFilter instance that filters SolexaExport read having an NA position.

When demultiplexing, the function if provided with just the AlignedRead will try to find out how many barcodes were used and what they are. This is unwise to do as many barcodes will get wrongly sequenced and not always the most frequent ones are the one you used! It's therefore strongly advised to specify the barcodes' sequences that were used.

Value

barcodePlot returns invisibly the barcode frequencies.
chastityFilter returns a SRFilter instance.
demultiplex returns a list of AlignedRead objects.
naPositionFilter returns a SRFilter instance.

Author(s)

Nicolas Delhomme

Examples


	## Not run: 
	# the barcode
	barcodes=c("ACACTG","ACTAGC","ATGGCT","TTGCGA")

invisible(download.file(paste0("https://github.com/UPSCb/UPSCb/raw/",
       "master/tutorial/easyRNASeq/multiplex_export.txt.gz"),
       "multiplex_export.txt.gz"))

	# the multiplexed data
	alns <- readAligned(".",
                    pattern="multiplex_export",
                    filter=compose(
                      chastityFilter(),
                      nFilter(2),
                      chromosomeFilter(regex="chr")),
                    type="SolexaExport",
                    withAll=TRUE)

	# barcode plot
	barcodePlot(alns,
            barcodes=barcodes,
            type="within",
            barcode.length=6,
            show.barcode=20,
            main="All samples",
            xlim=c(0,0.5))

	# demultiplexing
	dem.alns <- demultiplex(alns,
                        barcodes=barcodes,
                        edition.dist=2,
                        barcodes.qty=4,
                        type="within")

	# plotting again
	par(mfrow=c(2,2))
	barcode.frequencies <- lapply(
                              names(dem.alns$barcodes),
                              function(barcode,alns){
                                barcodePlot(
                                            alns$barcodes[[barcode]],
                                            barcodes=barcode,
                                            type="within",barcode.length=6,
                                            show.barcode=20,
                                            main=paste(
                                              "Expected barcode:",
                                              barcode))
                              },dem.alns)
	
## End(Not run)

## Not run: 
	# the barcode
	barcodes=c("ACACTG","ACTAGC","ATGGCT","TTGCGA")

invisible(download.file(paste0("https://github.com/UPSCb/UPSCb/raw/",
       "master/tutorial/easyRNASeq/multiplex_export.txt.gz"),
       "multiplex_export.txt.gz"))

	# the multiplexed data
	alns <- readAligned(".",
                    pattern="multiplex_export",
                    filter=compose(
                      chastityFilter(),
                      nFilter(2),
                      chromosomeFilter(regex="chr")),
                    type="SolexaExport",
                    withAll=TRUE)

	# barcode plot
	barcodePlot(alns,
            barcodes=barcodes,
            type="within",
            barcode.length=6,
            show.barcode=20,
            main="All samples",
            xlim=c(0,0.5))

	# demultiplexing
	dem.alns <- demultiplex(alns,
                        barcodes=barcodes,
                        edition.dist=2,
                        barcodes.qty=4,
                        type="within")

	# plotting again
	par(mfrow=c(2,2))
	barcode.frequencies <- lapply(
                              names(dem.alns$barcodes),
                              function(barcode,alns){
                                barcodePlot(
                                            alns$barcodes[[barcode]],
                                            barcodes=barcode,
                                            type="within",barcode.length=6,
                                            show.barcode=20,
                                            main=paste(
                                              "Expected barcode:",
                                              barcode))
                              },dem.alns)
	
## End(Not run)

Display the content of classes from the easyRNASeq package.

Description

Display the content of a RNAseq, AnnotParam, BamParam or RnaSeqParam object.

Usage

## S4 method for signature 'RNAseq'
show(object)
## S4 method for signature 'RNAseq'
show(object)

Arguments

object

An object of the AnnotParam, BamParam, RnaSeqParam or RNAseq class

Methods

list("signature(object = \"RNAseq\")"): Display the values of the different slots of the RNAseq object.
Annot/Bam/RnaSeqParam: The respective object settings.

simpleRNASeq method

Description

This function is a wrapper around the more low level functionalities of the package. It is the simplest way to get a RangedSummarizedExperiment object from a set of bam files. RangedSummarizedExperiment are containers meant to hold any Next-Generation Sequencing experiment results and metadata. The simpleRNASeq method replaces the easyRNASeq function to simplify the usability. It does the following:

use GenomicAlignments for reading/pre-processing the BAM files.
get the annotations depending on the selected parameters
calculate the coverage from the provided file(s)
summarizes the read counts according to the selected summarization
returns a RangedSummarizedExperiment object.

Usage

## S4 method for signature 'BamFileList,RnaSeqParam'
simpleRNASeq(
  bamFiles = BamFileList(),
  param = RnaSeqParam(),
  nnodes = 1,
  verbose = TRUE,
  override = FALSE
)
## S4 method for signature 'BamFileList,RnaSeqParam'
simpleRNASeq(
  bamFiles = BamFileList(),
  param = RnaSeqParam(),
  nnodes = 1,
  verbose = TRUE,
  override = FALSE
)

Arguments

`bamFiles`	a `BamFileList` object
`param`	RnaSeqParam a `RnaSeqParam` object that describes the RNA-Seq experimental setup.
`nnodes`	The number of CPU cores to use in parallel
`verbose`	a logical to be report progress or not.
`override`	Should the provided parameters override the detected ones

Value

returns a RangedSummarizedExperiment object.

Author(s)

Nicolas Delhomme

Examples


# the data
tdir <- tutorialData()
annot <- fetchData("Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz")

 # create the BamFileList, get the BAM and BAI index files from the Bioc cache
 filenames <- dir(tdir,pattern="[A,T].*\\.bam$",full.names=TRUE)
 indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
 bamFiles <- getBamFileList(filenames,indexnames)

  # create the AnnotParam
  annotParam <- AnnotParam(annot,type="gtf")

  # create the RnaSeqParam
  rnaSeqParam <- RnaSeqParam(annotParam=annotParam)

  # get a RangedSummarizedExperiment containing the counts table
  sexp <- simpleRNASeq(
    bamFiles=bamFiles,
    param=rnaSeqParam,
    verbose=TRUE
  )

  # get the counts
  assays(sexp)$exons

# the data
tdir <- tutorialData()
annot <- fetchData("Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz")

 # create the BamFileList, get the BAM and BAI index files from the Bioc cache
 filenames <- dir(tdir,pattern="[A,T].*\\.bam$",full.names=TRUE)
 indexnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)
 bamFiles <- getBamFileList(filenames,indexnames)

  # create the AnnotParam
  annotParam <- AnnotParam(annot,type="gtf")

  # create the RnaSeqParam
  rnaSeqParam <- RnaSeqParam(annotParam=annotParam)

  # get a RangedSummarizedExperiment containing the counts table
  sexp <- simpleRNASeq(
    bamFiles=bamFiles,
    param=rnaSeqParam,
    verbose=TRUE
  )

  # get the counts
  assays(sexp)$exons

Extension of the Rsamtools package

Description

Describes extensions to the Rsamtools package.

For BamFile and BamFileList objects:
- validate validates a BamFile or BamFileList object.

Usage

## S4 method for signature 'BamFile'
validate(obj, header = TRUE, cross.validation = TRUE)
## S4 method for signature 'BamFile'
validate(obj, header = TRUE, cross.validation = TRUE)

Arguments

`obj`	An object of the `BamFile` or `BamFileList` class
`header`	a boolean to (de)activate the check for a BAM header
`cross.validation`	a boolean - only valid for `BamFileList` objects - to (de)activate the cross validation of all the BAM files header

Details

validate checks whether the BAM file exists and if a BAI index is present.

Value

validate returns invisibly a vector of boolean. Fails anyway if any file is missing.

Author(s)

Nicolas Delhomme

Examples

# retrieve the data
tdir <- tutorialData()

# get the bam file path from the Bioc cache
filenames <- dir(tdir,pattern="[A,C,T,G]{6}\\.bam$",full.names=TRUE)

# retrieve the index from the Bioc cache too
inxnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)

bfl <-BamFileList(filenames,index=inxnames)

validate(bfl)

# retrieve the data
tdir <- tutorialData()

# get the bam file path from the Bioc cache
filenames <- dir(tdir,pattern="[A,C,T,G]{6}\\.bam$",full.names=TRUE)

# retrieve the index from the Bioc cache too
inxnames <- sapply(paste0(sub(".*_","",basename(filenames)),".bai"),fetchData)

bfl <-BamFileList(filenames,index=inxnames)

validate(bfl)

Package 'easyRNASeq'

Help Index

Class "AnnotParam"

Description

Objects from the Class

Author(s)

See Also

Examples

Class "BamParam"

Description

Objects from the Class

Slots from the Class

Author(s)

See Also

Examples

Extend the basename function to display Rsamtools BamFile class basename

Description

Usage

Arguments

Methods

Manages the data necessary for the examples using BiocFileCache

Description

Usage

Arguments

Methods

See Also

Examples

Methods to create synthetic transcripts

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

The following function are defunct: easyRNASeq fetchCoverage fetchAnnotation knownOrganisms

Description

Accessors for RNAseq class

Description

Usage

Arguments

Value

Author(s)

Examples

Get genic annotation from a gff3/gtf file or using biomaRt

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Accessors for AnnotParam class

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

AnnotParam constructor

Description

Usage

Arguments

Details

See Also

Examples

Accessors for BamParam class

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

BamParam constructor

Description

Usage

Arguments

The following function are defunct:

`easyRNASeq`

`fetchCoverage`

`fetchAnnotation`

`knownOrganisms`