Package 'HTSFilter' reference manual

Title:	Filter replicated high-throughput transcriptome sequencing data
Description:	This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.
Authors:	Andrea Rau [cre, aut] , Melina Gallopin [ctb], Gilles Celeux [ctb], Florence Jaffrézic [ctb]
Maintainer:	Andrea Rau <[email protected]>
License:	Artistic-2.0
Version:	1.47.0
Built:	2025-02-27 05:34:43 UTC
Source:	https://github.com/bioc/HTSFilter

Filter replicated high-throughput transcriptome sequencing data

Description

This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.

Details

Package:	HTSFilter
Type:	Package
Version:	1.31.1
Date:	2020-11-26
License:	Artistic-2.0
LazyLoad:	yes

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

Maintainer: Andrea Rau <[email protected]>

References

R. Bourgon, R. Gentleman, and W. Huber. (2010) Independent filtering increases detection power for high- throughput experiments. PNAS 107(21):9546-9551.

P. Jaccard (1901). Etude comparative de la distribution orale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoise des Sciences Naturelles, 37:547-549.

A. Rau, M. Gallopin, G. Celeux, F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

Examples

library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line

########################################################################
## Matrix or data.frame
########################################################################

filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)

########################################################################
## DGEExact
########################################################################

library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)


########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)
library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line

########################################################################
## Matrix or data.frame
########################################################################

filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)

########################################################################
## DGEExact
########################################################################

library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)


########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)

Implement basic filters for transcriptome sequencing data.

Description

Implement a variety of basic filters for transcriptome sequencing data.

Usage

HTSBasicFilter(x, ...)

## S4 method for signature 'matrix'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'data.frame'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DGEList'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none")
)

## S4 method for signature 'DGEExact'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none")
)

## S4 method for signature 'DGEGLM'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DGELRT'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DESeqDataSet'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("DESeq", "TMM", "none"),
  pAdjustMethod = "BH"
)
HTSBasicFilter(x, ...)

## S4 method for signature 'matrix'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'data.frame'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DGEList'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none")
)

## S4 method for signature 'DGEExact'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none")
)

## S4 method for signature 'DGEGLM'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DGELRT'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("TMM", "DESeq", "none")
)

## S4 method for signature 'DESeqDataSet'
HTSBasicFilter(
  x,
  method,
  cutoff.type = "value",
  cutoff = 10,
  length = NA,
  normalization = c("DESeq", "TMM", "none"),
  pAdjustMethod = "BH"
)

Arguments

`x`	A numeric matrix or data.frame representing the counts of dimension (g x n), for g genes in n samples, a `DGEList` object, a `DGEExact` object, a `DGEGLM` object, a `DGELRT` object, or a `DESeqDataSet` object.
`...`	Additional optional arguments
`method`	Basic filtering method to be used: “mean”, “sum”, “rpkm”, “variance”, “cpm”, “max”, “cpm.mean”, “cpm.sum”, “cpm.variance”, “cpm.max”, “rpkm.mean”, “rpkm.sum”, “rpkm.variance”, or “rpkm.max”
`cutoff.type`	Type of cutoff to be used: a numeric value indicating the number of samples to be used for filtering (when `method` = “cpm” or “rpkm”), or one of “value”, “number”, or “quantile”
`cutoff`	Cutoff to be used for chosen filter
`length`	Optional vector of length n containing the lengths of each gene in `x`; optional except in the case of `method` = “rpkm”
`normalization`	Normalization method to be used to correct for differences in library sizes, with choices “TMM” (Trimmed Mean of M-values), “DESeq” (normalization method proposed in the DESeq package), “pseudo.counts” (pseudo-counts obtained via quantile-quantile normalization in the edgeR package, only available for objects of class `DGEList` and `DGEExact`), and “none” (to be used only if user is certain no normalization is required, or if data have already been pre-normalized by an alternative method)
`pAdjustMethod`	The method used to adjust p-values, see `?p.adjust`

Details

This function implements a basic filter for high-throughput sequencing data for a variety of filter types: mean, sum, RPKM, variance, CPM, maximum, mean CPM values, the sum of CPM values, the variance of CPM values, maximum CPM value, mean RPKM values, the sum of RPKM values, the variance of RPKM values, or the maximum RPKM value. The filtering criteria used may be for a given cutoff value, a number of genes, or a given quantile value.

Value

filteredData An object of the same class as x containing the data that passed the filter
on A binary vector of length g, where 1 indicates a gene with normalized expression greater than the optimal filtering threshold s.optimal in at least one sample (irrespective of condition labels), and 0 indicates a gene with normalized expression less than or equal to the optimal filtering threshold in all samples
normFactor A vector of length n giving the estimated library sizes estimated by the normalization method specified in normalization
removedData A matrix containing the filtered data
filterCrit A vector or matrix containing the criteria used to perform filtering

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

References

R. Bourgon, R. Gentleman, and W. Huber. (2010) Independent filtering increases detection power for high- throughput experiments. PNAS 107(21):9546-9551.

A. Rau, M. Gallopin, G. Celeux, F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

Examples

library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line
 
########################################################################
## Matrix or data.frame
########################################################################

## Filter genes with total (sum) normalized gene counts < 10
filter <- HTSBasicFilter(exprs(sultan), method="sum", cutoff.type="value", 
                        cutoff = 10)
                        
                        
########################################################################
## DGEExact
########################################################################

library(edgeR)
## Filter genes with CPM values less than 100 in more than 2 samples
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
filter <- HTSBasicFilter(dge, method="cpm", cutoff.type=2, cutoff=100)

########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                             colData = data.frame(cell.line = conds),
                               design = ~ cell.line)
                             
                             
## Not run: Filter genes with mean normalized gene counts < 40% quantile
## dds <- DESeq(dds)
## filter <- HTSBasicFilter(dds, method="mean", cutoff.type="quantile", 
##	cutoff = 0.4)
## res <- results(filter, independentFiltering=FALSE)
library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line
 
########################################################################
## Matrix or data.frame
########################################################################

## Filter genes with total (sum) normalized gene counts < 10
filter <- HTSBasicFilter(exprs(sultan), method="sum", cutoff.type="value", 
                        cutoff = 10)
                        
                        
########################################################################
## DGEExact
########################################################################

library(edgeR)
## Filter genes with CPM values less than 100 in more than 2 samples
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
filter <- HTSBasicFilter(dge, method="cpm", cutoff.type=2, cutoff=100)

########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                             colData = data.frame(cell.line = conds),
                               design = ~ cell.line)
                             
                             
## Not run: Filter genes with mean normalized gene counts < 40% quantile
## dds <- DESeq(dds)
## filter <- HTSBasicFilter(dds, method="mean", cutoff.type="quantile", 
##	cutoff = 0.4)
## res <- results(filter, independentFiltering=FALSE)

Calculate data-based filtering threshold for replicated transcriptome sequencing data.

Description

Calculate a data-based filtering threshold for replicated transcriptome sequencing data through the pairwise Jaccard similarity index between pairs of replicates within each experimental condition.

Usage

HTSFilter(x, ...)

## S4 method for signature 'matrix'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'data.frame'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'DGEList'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEExact'
HTSFilter(
  x,
  DGEList,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEGLM'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGELRT'
HTSFilter(
  x,
  DGEGLM,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DESeqDataSet'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("DESeq", "TMM", "none"),
  plot = TRUE,
  plot.name = NA,
  pAdjustMethod = "BH",
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)
HTSFilter(x, ...)

## S4 method for signature 'matrix'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'data.frame'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'DGEList'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEExact'
HTSFilter(
  x,
  DGEList,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEGLM'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGELRT'
HTSFilter(
  x,
  DGEGLM,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DESeqDataSet'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("DESeq", "TMM", "none"),
  plot = TRUE,
  plot.name = NA,
  pAdjustMethod = "BH",
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

Arguments

`x`	A numeric matrix or data.frame representing the counts of dimension (g x n), for g genes in n samples, a `DGEList` object, a `DGEExact` object, a `DGEGLM` object, a `DGELRT` object, or a `DESeqDataSet` object.
`...`	Additional optional arguments
`conds`	Vector of length n identifying the experimental condition of each of the n samples; required when sQuote(x) is a numeric matrix. In the case of objects of class `DGEList`, `DGEExact`, `DGEGLM`, `DGELRT`, or `DESeqDataSet`, the design matrix is automatically
`s.min`	Minimum value of filtering threshold to be considered, with default value equal to 1
`s.max`	Maximum value of filtering threshold to be considered, with default value equal to 200
`s.len`	Length of sequence of filtering thresholds to be considered (from `s.min` to `s.max`) for the calculation of the global similarity index
`loess.span`	Span of the loess curve to be fitted to the filtering thresholds and corresponding global similarity indices, with default value equal to 0.3
`normalization`	Normalization method to be used to correct for differences in library sizes, with choices “TMM” (Trimmed Mean of M-values), “DESeq” (normalization method proposed in the DESeq package), “pseudo.counts” (pseudo-counts obtained via quantile-quantile normalization in the edgeR package, only available for objects of class `DGEList` and `DGEExact`), and “none” (to be used only if user is certain no normalization is required, or if data have already been pre-normalized by an alternative method)
`plot`	If “TRUE”, produce a plot of the calculated global similarity indices against the filtering threshold with superimposed loess curve
`plot.name`	If `plot` = “TRUE”, the name of the PDF file to be saved to the current working directory. If `plot.name` = NA, the plot is drawn in the current window.
`parallel`	If `FALSE`, no parallelization. If `TRUE`, parallel execution using BiocParallel (see next argument `BPPARAM`). A note on running in parallel using BiocParallel: it may be advantageous to remove large, unneeded objects from the current R environment before calling the function, as it is possible that R's internal garbage collection will copy these files while running on worker nodes.
`BPPARAM`	Optional parameter object passed internally to `bplapply` when `parallel=TRUE`. If not specified, the parameters last registered with `register` will be used.
`DGEList`	Object of class DGEList, to be used when filtering objects of class DGEExact
`DGEGLM`	Object of class DGEGLM, to be used when filtering objects of class DGELRT
`pAdjustMethod`	The method used to adjust p-values, see `?p.adjust`

Details

The Jaccard similarity index, which measures the overlap of two sets, is calculated as follows. Given two binary vectors, each of length n, we define the following values:

a = the number of attributes with a value of 1 in both vectors
b = the number of attributes with a value of 1 in the first vector and 0 in the second
c = the number of attributes with a value of 0 in the first vector and 1 in the second
d = the number of attributes with a value of 0 in both vectors

We note that all attributes fall into one of these four quantities, so $a+b+c+d=n$ . Given these quantities, we may calculate the Jaccard similarity index between the two vectors as follows:

$J = \frac{a}{a+b+c}.$

Value

filteredData An object of the same class as x containing the data that passed the filter
on A binary vector of length g, where 1 indicates a gene with normalized expression greater than the optimal filtering threshold s.optimal in at least one sample (irrespective of condition labels), and 0 indicates a gene with normalized expression less than or equal to the optimal filtering threshold in all samples
s The optimal filtering threshold as identified by the global similarity index
indexValues A matrix of dimension (s.len x 2) giving the tested filtering thersholds and the corresponding global similarity indices. Note that the threshold values are equally spaced on the log scale, and thus unequally spaced on the count scale (i.e., we test more threshold values at very low levels of expression, and fewer at very high levels of expression).
normFactor A vector of length n giving the estimated library sizes estimated by the normalization method specified in normalization
removedData A matrix containing the filtered data

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

References

R. Bourgon, R. Gentleman, and W. Huber. (2010) Independent filtering increases detection power for high- throughput experiments. PNAS 107(21):9546-9551.

P. Jaccard (1901). Etude comparative de la distribution orale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoise des Sciences Naturelles, 37:547-549.

A. Rau, M. Gallopin, G. Celeux, F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

Examples

library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line

########################################################################
## Matrix or data.frame
########################################################################

filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)

########################################################################
## DGEExact
########################################################################

library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)


########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)
library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line

########################################################################
## Matrix or data.frame
########################################################################

filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)

########################################################################
## DGEExact
########################################################################

library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)


########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)

Normalize transcriptome sequencing data.

Description

Normalize count-based measures of transcriptome sequencing data using the Trimmed Means of M-values (TMM) or DESeq approach.

Usage

normalizeData(data, normalization)
normalizeData(data, normalization)

Arguments

`data`	numeric matrix representing the counts of dimension (g x n), for g genes in n samples.
`normalization`	Normalization method to be used to correct for differences in library sizes, with choices “TMM” (Trimmed Mean of M-values), “DESeq” (normalization method proposed in the DESeq package), and “none”

Value

data.norm A numeric matrix representing the normalized counts of dimension (g x n), for g genes in n samples.
norm.factor A vector of length n giving the estimated library sizes estimated by the normalization method specified in normalization

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

References

S. Anders and W. Huber (2010). Differential expression analysis for sequence count data. Genome Biology, 11(R106):1-28.

A. Rau, M. Gallopin, G. Celeux, F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

M. D. Robinson and A. Oshlack (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(R25).

Examples

library(Biobase)
data("sultan")
normData <- normalizeData(exprs(sultan), norm="DESeq") 

library(Biobase)
data("sultan")
normData <- normalizeData(exprs(sultan), norm="DESeq")

RNA-seq data from humans in Sultan et al. (2008)

Description

This dataset represents RNA-seq data from humans in two conditions (Ramos B cell line and HEK293T), with two biological replicates per condition. The ExpressionSet was downloaded from the ReCount online resource.

Usage

data(sultan)
data(sultan)

Format

An ExpressionSet named sultan.eset containing the phenotype data and expression data for the Sultan et al. (2008) experiment. Phenotype data may be accessed using the pData function, and expression data may be accessed using the exprs function.

Value

Object of class ‘ExpressionSet’. Matrix of counts can be accessed after loading the ‘Biobase’ package and calling exprs(sultan)).

Source

ReCount online resource (http://bowtie-bio.sourceforge.net/recount).

References

data_blah.com

A. C. Frazee, B. Langmead, and J. T. Leek. ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics, 12(449), 2011.

M. Sultan, M. H. Schulz, H. Richard, A. Magen, A. Klingenhoff, M. Scherf, M. Seifert, T. Borodina, A. Soldatov, D. Parkhomchuk, D. Schmidt, S. O'Keefe, S. Haas, M. Vingron, H. Lehrach, and M. L. Yaspo. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science, 15(5891):956-60, 2008.

Package 'HTSFilter'

Help Index

Filter replicated high-throughput transcriptome sequencing data

Description

Details

Author(s)

References

Examples

Implement basic filters for transcriptome sequencing data.

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Calculate data-based filtering threshold for replicated transcriptome sequencing data.

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Normalize transcriptome sequencing data.

Description

Usage

Arguments

Value

Author(s)

References

Examples

RNA-seq data from humans in Sultan et al. (2008)

Description

Usage

Format

Value

Source

References