Package 'yarn' reference manual

Title:	YARN: Robust Multi-Condition RNA-Seq Preprocessing and Normalization
Description:	Expedite large RNA-Seq analyses using a combination of previously developed tools. YARN is meant to make it easier for the user in performing basic mis-annotation quality control, filtering, and condition-aware normalization. YARN leverages many Bioconductor tools and statistical techniques to account for the large heterogeneity and sparsity found in very large RNA-seq experiments.
Authors:	Joseph N Paulson [aut, cre], Cho-Yi Chen [aut], Camila Lopes-Ramos [aut], Marieke Kuijjer [aut], John Platig [aut], Abhijeet Sonawane [aut], Maud Fagny [aut], Kimberly Glass [aut], John Quackenbush [aut]
Maintainer:	Joseph N Paulson <[email protected]>
License:	Artistic-2.0
Version:	1.33.0
Built:	2025-03-30 07:43:12 UTC
Source:	https://github.com/bioc/yarn

Annotate your Expression Set with biomaRt

Description

Annotate your Expression Set with biomaRt

Usage

annotateFromBiomart(obj, genes = featureNames(obj),
  filters = "ensembl_gene_id", attributes = c("ensembl_gene_id",
  "hgnc_symbol", "chromosome_name", "start_position", "end_position"),
  biomart = "ensembl", dataset = "hsapiens_gene_ensembl", ...)

Arguments

`obj`	ExpressionSet object.
`genes`	Genes or rownames of the ExpressionSet.
`filters`	getBM filter value, see getBM help file.
`attributes`	getBM attributes value, see getBM help file.
`biomart`	BioMart database name you want to connect to. Possible database names can be retrieved with the function listMarts.
`dataset`	Dataset you want to use. To see the different datasets available within a biomaRt you can e.g. do: mart = useMart('ensembl'), followed by listDatasets(mart).
`...`	Values for useMart, see useMart help file.

Value

ExpressionSet object with a fuller featureData.

Examples


data(skin)
# subsetting and changing column name just for a silly example
skin <- skin[1:10,]
colnames(fData(skin)) = paste("names",1:6)
biomart<-"ENSEMBL_MART_ENSEMBL";
genes <- sapply(strsplit(rownames(skin),split="\\."),function(i)i[1])
newskin <- annotateFromBiomart(skin,genes=genes,biomart=biomart)
head(fData(newskin)[,7:11])

Bladder RNA-seq data from the GTEx consortium

Description

Bladder RNA-seq data from the GTEx consortium. V6 release.

Usage

data(bladder)

Format

An object of class "ExpressionSet"; see ExpressionSet.

Value

ExpressionSet object

Source

GTEx Portal

References

GTEx Consortium, 2015. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348(6235), pp.648-660. (PubMed)

Examples

data(bladder)
checkMisAnnotation(bladder,'GENDER',controlGenes='Y')

Check for wrong annotation of a sample using classical MDS and control genes.

Description

Check for wrong annotation of a sample using classical MDS and control genes.

Usage

checkMisAnnotation(obj, phenotype, controlGenes = "all",
  columnID = "chromosome_name", plotFlag = TRUE,
  legendPosition = NULL, ...)

Arguments

`obj`	ExpressionSet object.
`phenotype`	phenotype column name in the phenoData slot to check.
`controlGenes`	Label of the control genes, e.g. 'Y' for the Y chromosome. Can specify 'all'.
`columnID`	Column name where controlGenes is defined in the featureData slot if other than 'all'.
`plotFlag`	TRUE/FALSE, whether to plot or not.
`legendPosition`	Location for the legend.
`...`	Extra parameters for `plotCMDS` function.

Value

Plots a classical multi-dimensional scaling of the 'controlGenes'. Optionally returns co-ordinates.

Examples

data(bladder)
checkMisAnnotation(bladder,'GENDER',controlGenes='Y',legendPosition='topleft')

Check tissues to merge based on gene expression profile

Description

Check tissues to merge based on gene expression profile

Usage

checkTissuesToMerge(obj, majorGroups, minorGroups, filterFun = NULL,
  plotFlag = TRUE, ...)

Arguments

`obj`	ExpressionSet object.
`majorGroups`	Column name in the phenoData slot that describes the general body region or site of the sample.
`minorGroups`	Column name in the phenoData slot that describes the specific body region or site of the sample.
`filterFun`	Filter group specific genes that might disrupt PCoA analysis.
`plotFlag`	TRUE/FALSE whether to plot or not
`...`	Parameters that can go to `checkMisAnnotation`

Value

CMDS plots of the majorGroups colored by the minorGroups. Optional matrix of CMDS loadings for each comparison.

Examples

data(skin)
checkTissuesToMerge(skin,'SMTS','SMTSD')

Download GTEx files and turn them into ExpressionSet object

Description

Downloads the V6 GTEx release and turns it into an ExpressionSet object.

Usage

downloadGTEx(type = "genes", file = NULL, ...)

Arguments

`type`	Type of counts to download - default genes.
`file`	File path and name to automatically save the downloaded GTEx expression set. Saves as an RDS file.
`...`	Does nothing currently.

Value

Organized ExpressionSet object.

Examples

# obj <- downloadGTEx(type='genes',file='~/Desktop/gtex.rds')

Extract the appropriate matrix

Description

This returns the raw counts, log2-transformed raw counts, or normalized expression. If normalized = TRUE then the log parameter is ignored.

Usage

extractMatrix(obj, normalized = FALSE, log = TRUE)

Arguments

`obj`	ExpressionSet object or matrix.
`normalized`	TRUE / FALSE, use the normalized matrix or raw counts
`log`	TRUE/FALSE log2-transform.

Value

matrix

Examples


data(skin)
head(yarn:::extractMatrix(skin,normalized=FALSE,log=TRUE))
head(yarn:::extractMatrix(skin,normalized=FALSE,log=FALSE))
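
As a rough illustration of the difference between the raw and log2-transformed outputs, the sketch below applies the transform to a toy matrix in base R. It assumes a pseudocount of 1, i.e. log2(raw counts + 1), in line with the convention this manual gives for qsmooth input; the matrix and its names are made up and this is not the package's own code.

```r
# Toy raw count matrix (hypothetical values): 3 genes x 2 samples.
counts <- matrix(
  c(0, 1, 3, 7, 15, 31),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)

# Assumed transform: log2 with a pseudocount of 1 so that zero counts map to 0.
log_counts <- log2(counts + 1)

print(counts)      # raw counts, as with log = FALSE
print(log_counts)  # log2-transformed counts, as with log = TRUE
```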

Filter specific genes

Description

The main use case for this function is the removal of sex-chromosome genes. Alternatively, filter genes that are not protein-coding.

Usage

filterGenes(obj, labels = c("X", "Y", "MT"),
  featureName = "chromosome_name", keepOnly = FALSE)

Arguments

`obj`	ExpressionSet object.
`labels`	Labels of genes to filter or keep, e.g. X, Y, and MT.
`featureName`	FeatureData column name, e.g. chr.
`keepOnly`	Filter or keep only the genes with those labels

Value

Filtered ExpressionSet object

Examples

data(skin)
filterGenes(skin,labels = c('X','Y','MT'),featureName='chromosome_name')
filterGenes(skin,labels = 'protein_coding',featureName='gene_biotype',keepOnly=TRUE)
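
The effect of keepOnly can be pictured with a plain matrix and an annotation table. The following is a minimal base R sketch, not the package implementation; the `features` table, the `counts` matrix, and all gene names are hypothetical.

```r
# Hypothetical feature annotation, one row per gene.
features <- data.frame(
  chromosome_name = c("1", "X", "Y", "MT", "2"),
  row.names = paste0("gene", 1:5),
  stringsAsFactors = FALSE
)

# Hypothetical count matrix with matching row names.
counts <- matrix(
  1:15, nrow = 5,
  dimnames = list(rownames(features), paste0("s", 1:3))
)

labels <- c("X", "Y", "MT")
has_label <- features$chromosome_name %in% labels

# Analogous to keepOnly = FALSE: drop genes carrying one of the labels.
removed <- counts[!has_label, , drop = FALSE]

# Analogous to keepOnly = TRUE: retain only genes carrying one of the labels.
kept_only <- counts[has_label, , drop = FALSE]

print(rownames(removed))
print(rownames(kept_only))
```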

Filter genes that have less than a minimum threshold CPM for a given group/tissue

Description

Filter genes that have less than a minimum threshold CPM for a given group/tissue

Usage

filterLowGenes(obj, groups, threshold = 1, minSamples = NULL, ...)

Arguments

`obj`	ExpressionSet object.
`groups`	Vector of labels for each sample or a column name of the phenoData slot for the ids to filter. Default is the column names.
`threshold`	The minimum threshold for calling presence of a gene in a sample.
`minSamples`	Minimum number of samples - defaults to half the minimum group size.
`...`	Options for cpm.

Value

Filtered ExpressionSet object

Examples

data(skin)
filterLowGenes(skin,'SMTSD')
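
The filtering rule described by the arguments can be sketched in base R. This is an illustration under stated assumptions rather than the package code: CPM is computed by hand as counts divided by library size times one million, a gene is kept when it exceeds the threshold in at least minSamples samples, and the toy matrix is invented.

```r
# Toy counts (hypothetical): 4 genes x 6 samples, two groups of three.
counts <- matrix(
  c(2e6, 2e6, 2e6, 2e6, 2e6, 2e6,   # highly expressed everywhere
      0,   1,   0,   0,   1,   0,   # near-absent everywhere
     50,  60,  55,   0,   0,   1,   # expressed in group A only
      0,   0,   0,   0,   0,   0),  # never observed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
groups <- c("A", "A", "A", "B", "B", "B")

# Counts per million: scale each sample (column) by its library size.
cpm_mat <- t(t(counts) / colSums(counts)) * 1e6

threshold <- 1
# Default described in the arguments: half the minimum group size.
min_samples <- min(table(groups)) / 2

# Keep genes above the CPM threshold in at least min_samples samples.
keep <- rowSums(cpm_mat > threshold) >= min_samples
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

Because the sample requirement is tied to the smallest group, a gene expressed in only one tissue (gene3 here) survives the filter, while genes that are low everywhere do not.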

Filter genes not expressed in any sample

Description

The main use case for this function is the removal of missing genes.

Usage

filterMissingGenes(obj, threshold = 0)

Arguments

`obj`	ExpressionSet object.
`threshold`	Minimum sum of gene counts across samples – defaults to zero.

Value

Filtered ExpressionSet object

Examples

data(skin)
filterMissingGenes(skin)

Filter samples

Description

Filter samples

Usage

filterSamples(obj, ids, groups = colnames(obj), keepOnly = FALSE)

Arguments

`obj`	ExpressionSet object.
`ids`	Names found within the groups labels corresponding to samples to be removed
`groups`	Vector of labels for each sample or a column name of the phenoData slot for the ids to filter. Default is the column names.
`keepOnly`	Filter or keep only the samples with those labels.

Value

Filtered ExpressionSet object

Examples

data(skin)
filterSamples(skin,ids = "Skin - Not Sun Exposed (Suprapubic)",groups="SMTSD")
filterSamples(skin,ids=c("GTEX-OHPL-0008-SM-4E3I9","GTEX-145MN-1526-SM-5SI9T"))

Normalize in a tissue aware context

Description

This function provides a wrapper for several normalization methods. Currently it wraps only qsmooth and quantile normalization, returning a log-transformed normalized matrix. qsmooth is a normalization approach that normalizes samples in a condition-aware manner.

Usage

normalizeTissueAware(obj, groups, normalizationMethod = c("qsmooth",
  "quantile"), ...)

Arguments

`obj`	ExpressionSet object
`groups`	Vector of labels for each sample or a column name of the phenoData slot for the ids to filter. Default is the column names
`normalizationMethod`	Choice of 'qsmooth' or 'quantile'
`...`	Options for `qsmooth` function or `normalizeQuantiles`

Value

ExpressionSet object with an assayData called normalizedMatrix

Source

The function qsmooth comes from the qsmooth package, currently available on GitHub under user 'kokrah'.

Examples

data(skin)
normalizeTissueAware(skin,"SMTSD")
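
To make the idea of group-aware normalization concrete, the sketch below quantile-normalizes log2 counts separately within each group using base R. It is only one plausible reading of the 'quantile' option: the helper `quantile_normalize` is hypothetical, the per-group strategy is an assumption, and neither `normalizeQuantiles` nor qsmooth is reproduced here.

```r
# Hypothetical helper: plain quantile normalization of the columns of x.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each column
  sorted <- apply(x, 2, sort)                         # sort each column
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks to reference
  dimnames(out) <- dimnames(x)
  out
}

# Toy data (invented): 10 genes x 6 samples in two groups.
set.seed(42)
counts <- matrix(
  rpois(60, lambda = 50), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("s", 1:6))
)
groups <- c("A", "A", "A", "B", "B", "B")
log_counts <- log2(counts + 1)

# Assumed group-aware step: normalize each group on its own.
normalized <- log_counts
for (g in unique(groups)) {
  idx <- groups == g
  normalized[, idx] <- quantile_normalize(log_counts[, idx, drop = FALSE])
}
```

After this step, samples within a group share the same distribution of values, while differences between groups are left untouched.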

Plot classical MDS of dataset

Description

This function plots the MDS coordinates for the "n" features of interest, potentially uncovering batch effects or feature relationships.

Usage

plotCMDS(obj, comp = 1:2, normalized = FALSE, distFun = dist,
  distMethod = "euclidian", n = NULL, samples = TRUE, log = TRUE,
  plotFlag = TRUE, ...)

Arguments

`obj`	ExpressionSet object or matrix.
`comp`	Which components to display.
`normalized`	TRUE / FALSE, use the normalized matrix or raw counts.
`distFun`	Distance function, default is dist.
`distMethod`	The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.
`n`	Number of features to make use of in calculating your distances.
`samples`	Perform on samples or genes.
`log`	TRUE/FALSE log2-transform raw counts.
`plotFlag`	TRUE/FALSE whether to plot or not.
`...`	Additional plot arguments.

Value

coordinates

Examples

data(skin)
res <- plotCMDS(skin,pch=21,bg=factor(pData(skin)$SMTSD))

# library(calibrate)
# textxy(X=res[,1],Y=res[,2],labs=rownames(res))
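
For readers unfamiliar with classical MDS, the underlying computation can be sketched with base R's dist and cmdscale. This is a simplified illustration on simulated counts, assuming log2(counts + 1) and Euclidean distances between samples; it does not reproduce the feature selection or plotting options of plotCMDS.

```r
# Simulated counts (invented): 20 genes x 10 samples.
set.seed(1)
counts <- matrix(
  rpois(200, lambda = 20), nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("s", 1:10))
)

# Assumed preprocessing: log2 transform with a pseudocount of 1.
log_counts <- log2(counts + 1)

# Distances between samples (columns), so transpose before calling dist.
sample_dist <- dist(t(log_counts), method = "euclidean")

# Classical MDS: coordinates of each sample on the first two components.
coords <- cmdscale(sample_dist, k = 2)

plot(coords, xlab = "MDS component 1", ylab = "MDS component 2",
     pch = 21, bg = "grey")
```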

Density plots of columns in a matrix

Description

Plots the density of the columns of a matrix. Wrapper for matdensity.

Usage

plotDensity(obj, groups = NULL, normalized = FALSE, legendPos = NULL,
  ...)

Arguments

`obj`	ExpressionSet object
`groups`	Vector of labels for each sample or a column name of the phenoData slot for the ids to filter. Default is the column names.
`normalized`	TRUE / FALSE, use the normalized matrix or log2-transformed raw counts
`legendPos`	Legend position. If NULL (the default), no legend is drawn.
`...`	Extra parameters for matdensity.

Value

A density plot for each column in the ExpressionSet object colored by groups

Examples

data(skin)
filtData <- filterLowGenes(skin,"SMTSD")
plotDensity(filtData,groups="SMTSD",legendPos="topleft")
# to remove the legend
plotDensity(filtData,groups="SMTSD")

Plot heatmap of most variable genes

Description

This function plots a heatmap of the gene expression for the "n" features of interest.

Usage

plotHeatmap(obj, n = NULL, fun = stats::sd, normalized = TRUE,
  log = TRUE, ...)

Arguments

`obj`	ExpressionSet object or matrix.
`n`	Number of features to make use of in plotting heatmap.
`fun`	Function to sort genes by, default `sd`.
`normalized`	TRUE / FALSE, use the normalized matrix or raw counts.
`log`	TRUE/FALSE log2-transform raw counts.
`...`	Additional plot arguments for `heatmap.2`.

Value

coordinates

Examples

data(skin)
tissues <- pData(skin)$SMTSD
plotHeatmap(skin,normalized=FALSE,log=TRUE,trace="none",n=10)
# Even prettier

library(RColorBrewer)  # provides brewer.pal, used below
data(skin)
tissues <- pData(skin)$SMTSD
heatmapColColors <- brewer.pal(12,"Set3")[as.integer(factor(tissues))]
heatmapCols <- colorRampPalette(brewer.pal(9, "RdBu"))(50)
plotHeatmap(skin,normalized=FALSE,log=TRUE,trace="none",n=10,
 col = heatmapCols,ColSideColors = heatmapColColors,cexRow = 0.6,cexCol = 0.6)

Quantile shrinkage normalization

Description

This function was adapted from code by GitHub user kokrah.

Usage

qsmooth(obj, groups, norm.factors = NULL, plot = FALSE,
  window = 0.05, log = TRUE)

Arguments

`obj`	For counts, use log2(raw counts + 1); for MA, use log2(raw intensities).
`groups`	groups to which samples belong (character vector)
`norm.factors`	scaling normalization factors
`plot`	plot weights? (default=FALSE)
`window`	window size for running median (a fraction of the number of rows of exprs)
`log`	Whether or not the data should be log transformed before normalization, TRUE = YES.

Value

Normalized expression

Source

Kwame Okrah's qsmooth R package

Examples

data(skin)
head(yarn:::qsmooth(skin,groups=pData(skin)$SMTSD))

Compute quantile statistics

Description

This function was borrowed directly from GitHub user kokrah.

Usage

qstats(exprs, groups, window)

Arguments

`exprs`	For counts, use log2(raw counts + 1); for MA, use log2(raw intensities).
`groups`	groups to which samples belong (character vector)
`window`	window size for running median, as a fraction of the number of rows of exprs

Value

list of statistics

Source

Kwame Okrah's qsmooth R package

Skin RNA-seq data from the GTEx consortium

Description

Skin RNA-seq data from the GTEx consortium, V6 release. A random selection of 20 skin samples: 13 are fibroblast cells, 5 are sun-exposed skin, and 2 are sun-unexposed skin.

Usage

data(skin)

Format

An object of class "ExpressionSet"; see ExpressionSet.

Value

ExpressionSet object

Source

GTEx Portal

References

GTEx Consortium, 2015. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348(6235), pp.648-660. (PubMed)

Examples

data(skin)
checkMisAnnotation(skin,"GENDER")
