Title: | Transcriptome instability analysis |
---|---|
Description: | The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets. |
Authors: | Bjarne Johannessen, Anita Sveen and Rolf I. Skotheim |
Maintainer: | Bjarne Johannessen <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.39.0 |
Built: | 2024-10-31 06:22:36 UTC |
Source: | https://github.com/bioc/TIN |
The function takes in the data.frame from 'firmaAnalysis' (containing log2 FIRMA scores for all probe sets/exons (rows) in all samples (columns)), and a number indicating which percentile value of global FIRMA scores to be used as threshold for denoting aberrant exon usage (default value '1', calculating the lower and upper 1st percentiles, indicating aberrant exon skipping and inclusion, respectively). Lower and upper percentile values are calculated and stored in the global list object 'quantiles'. Also, the total number of exons per sample denoted with aberrant exon usage (having FIRMA scores outside the indicated threshold values) is calculated and stored in the global list object 'aberrantExons'. The function returns a vector with these total sample-wise amounts of aberrant exon usage (sum of aberrant skipping and inclusion amounts) relative to the average sample-wise amount in the dataset (log2-transformed).
aberrantExonUsage(percentile, fs)
aberrantExonUsage(percentile, fs)
percentile |
This number indicates the percentile value of the global FIRMA scores to be used as threshold for denoting aberrant exon usage. Default value '1' calculates the lower and upper 1st percentiles, indicating aberrant exon skipping and inclusion, respectively. |
fs |
Data.frame consisting of log2 FIRMA scores for all probe sets/exons (rows) in all samples (columns). This data.frame is the output from 'firmaAnalysis'. |
A numeric vector with log2-transformed sample-wise amounts of aberrant exon usage relative to the average sample-wise amount in the dataset. In addition, the quantiles list object is created, which contains the threshold values for the lower and upper percentiles.
# Calculate aberrant exon usage for each sample in the data set: fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # The aberrantExonUsage function also creates the 'quantiles' object with # upper and lower threshold values for accepting aberrant exon usage, and # the list object 'aberrantExons' with the sample-wise number of exons # outside the threshold values.
# Calculate aberrant exon usage for each sample in the data set: fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # The aberrantExonUsage function also creates the 'quantiles' object with # upper and lower threshold values for accepting aberrant exon usage, and # the list object 'aberrantExons' with the sample-wise number of exons # outside the threshold values.
Create plot from hierarchical clustering analysis of the samples, based on splicing factor expression levels.
clusterPlot(geneSummaries, tra, distmethod, clustermethod, fileName)
clusterPlot(geneSummaries, tra, distmethod, clustermethod, fileName)
geneSummaries |
The data.frame with gene-level expression values for each sample, returned from the function 'readGeneSummaries'. |
tra |
The list returned from the function 'aberrantExonUsage', containing sample-wise total relative amounts of aberrant exon usage. |
distmethod |
Which distance measure to be used. Possible options are "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". |
clustermethod |
Which clustering algorithm to be used. Possible options are "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". |
fileName |
Output filename. File format is optional, but must be one of png, jpg, eps or pdf. |
clusterPlot is used for the side-effect of producing a hierarchical clustering plot showing how the samples are separated based on expression levels for the splicing factors in each sample.
fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # create cluster plot with the samples clusterPlot(gs, tra, "euclidean", "complete", "cluster.png")
fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # create cluster plot with the samples clusterPlot(gs, tra, "euclidean", "complete", "cluster.png")
The function makes use of the corAndPValue function from the WGCNA package to calculate sample-wise Pearson correlation between relative amounts of aberrant exon usage and splicing factor expression levels.
correlation(splicingFactors, geneSummaries, tra)
correlation(splicingFactors, geneSummaries, tra)
splicingFactors |
A data.frame with a list of splicing factor genes (Affymetrix transcript cluster id's and gene symbols) to be included in the correlation analysis. The list can include any set of genes included in the data.frame returned from the function 'readGeneSummaries'. An example set of 280 genes is obtained by issuing the command 'data(splicingFactors)'. |
geneSummaries |
A data.frame with gene-level expression values for all samples, returned from the 'readGeneSummaries' function. |
tra |
List of sample-wise total relative amounts of aberrant exon usage, returned from the 'aberrantExonUsage' function. |
A list with sample-wise Pearson correlation values between relative amounts of aberrant exon usage and splicing factor expression levels.
data(sampleSetGeneSummaries) gs <- sampleSetGeneSummaries fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # calculate correlation between splicing factor expression and aberrant # exon usage data(splicingFactors) corr <- correlation(splicingFactors, gs, tra)
data(sampleSetGeneSummaries) gs <- sampleSetGeneSummaries fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # calculate correlation between splicing factor expression and aberrant # exon usage data(splicingFactors) corr <- correlation(splicingFactors, gs, tra)
Function for creating a plot that visualizes the number of splicing factor genes with expression levels significantly correlated with the sample-wise total relative amounts of aberrant exon usage (red). To compare this value with correlation values obtained from random sample permutations, the function performs two types of iterative sample calculations as control experiments. First, expression levels of the splicing factor gene set are correlated with permutations of the relative aberrant exon usage (dark blue). Second, expression levels of randomly generated gene sets are correlated with the relative aberrant exon usage amounts in the data (pale blue).
correlationPlot(fileName, tra, geneSummaries, splicingFactors, randomGeneSets, traPermutations)
correlationPlot(fileName, tra, geneSummaries, splicingFactors, randomGeneSets, traPermutations)
fileName |
Output filename. File format is optional, but must be one of png, jpg, eps or pdf. |
tra |
List of sample-wise total relative amounts of aberrant exon usage, obtained using the 'aberrantExonUsage' function. |
geneSummaries |
Data.frame with gene-level expression data for each sample, returned from the function 'readGeneSummaries'. |
splicingFactors |
List with Affymetrix transcript cluster id's and gene symbols for a set of genes involved in pre-mRNA splicing. An example set with 280 genes is obtained by issuing the command 'data(splicingFactors)', but the list may well include any set of genes included in the data.frame returned from the function 'readGeneSummaries'. |
randomGeneSets |
Number of random gene sets of 280 genes to be created and included in the analysis. |
traPermutations |
Number of permutations of the sample-wise amounts of aberrant exon usage to be performed and included in the analysis. |
correlationPlot is used for the side-effect of producing a plot showing the number of splicing factor genes with expression levels significantly correlated with the sample-wise total relative amounts of aberrant exon usage.
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Create a plot that visualizes the number of splicing factor # genes with expression levels significantly correlated with the # sample-wise total relative amounts of aberrant exon usage. correlationPlot("c.png", tra, gs, splicingFactors, 1000, 1000)
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Create a plot that visualizes the number of splicing factor # genes with expression levels significantly correlated with the # sample-wise total relative amounts of aberrant exon usage. correlationPlot("c.png", tra, gs, splicingFactors, 1000, 1000)
The function makes use of the aroma.affymetrix package to analyze Affymetrix Human Exon 1.0 ST Arrays. The function reads CEL files, and performs background correction, normalization (customized RMA approach), and alternative splicing analysis according to the FIRMA method (http://www.aroma-project.org/vignettes/FIRMA-HumanExonArrayAnalysis). The function returns a data.frame with log2 FIRMA (alternative splicing) scores for each probeset/sample combination.
firmaAnalysis(useToyData, aromaPath, dataSetName)
firmaAnalysis(useToyData, aromaPath, dataSetName)
useToyData |
Boolean argument to indicate whether sample data sets included in the TIN package should be used in the analysis. |
aromaPath |
Absolute or relative path to the aroma.affymetrix directory. Requires custom CDF annotation file (please refer to the FIRMA vignette for download and setup). |
dataSetName |
Name of folder in the 'aromaPath' containing raw data (CEL files; please refer to the FIRMA vignette for setup). |
A data.frame with expression level values after the FIRMA analysis has been applied. The data.frame consists of one column for each sample and one row for each probeset.
E. Purdom, K. Simpson, M. Robinson, J. Conboy, A. Lapuk & T.P. Speed, FIRMA: a method for detection of alternative splicing from exon array data. Bioinformatics, 2008.
# Perform FIRMA analysis on the raw expression data # To use sample data sets included in the TIN package as input: fs <- firmaAnalysis(useToyData=TRUE) # To use your own data, provide path to aroma.affymetrix root directory and # name of data set as arguments: # fs <- firmaAnalysis(useToyData=FALSE, "/tmp/path/to/aroma.affymetrix", # "sampleSet")
# Perform FIRMA analysis on the raw expression data # To use sample data sets included in the TIN package as input: fs <- firmaAnalysis(useToyData=TRUE) # To use your own data, provide path to aroma.affymetrix root directory and # name of data set as arguments: # fs <- firmaAnalysis(useToyData=FALSE, "/tmp/path/to/aroma.affymetrix", # "sampleSet")
The object contains a data.frame with Affymetrix transcript cluster id and gene symbol for the 22,011 genes defined by the Affymetrix core gene set.
data(geneAnnotation)
data(geneAnnotation)
data.frame with two columns; Affymetrix transcript cluster id and gene symbol.
A data.frame with Affymetrix transcript cluster id and gene symbol for 22,011 genes.
The function makes use of the corAndPValue function from the WGCNA package to calculate the Pearson correlation between sample-wise aberrant exon usage amounts and expression levels of all genes for all gene sets defined by the input parameter list geneSets.
geneSetCorrelation(geneSets, geneAnnotation, geneSummaries, tra, noGeneSets)
geneSetCorrelation(geneSets, geneAnnotation, geneSummaries, tra, noGeneSets)
geneSets |
A data.frame with a number of gene lists to be included in the correlation analysis. An example data.frame of 1,454 lists of genes (specified by Affymetrix transcript cluster id's and gene symbols) is included in the package, and will be accessible by issuing the command 'data(genesets)'. The example data.frame contains a complete collection of all Gene Ontology gene sets included in the Molecular Signatures Database v3.1. |
geneAnnotation |
A data.frame with Affymetrix transcript cluster id's and gene symbols for all 22,011 genes included in the Affymetrix 'core' set. |
geneSummaries |
A data.frame with gene-level expression values for all samples, returned from the 'readGeneSummaries' function. |
tra |
List with total relative amounts of aberrant exon usage for all samples obtained using the 'aberrantExonUsage' function. |
noGeneSets |
Optional argument specifying how many gene sets to include in the analysis. |
A data.frame with one row for each data set used as input, and columns for name of set, number of genes, number of significant positively/negatively correlated genes in the set, and median correlation strength.
# Load data data(geneSets) data(geneAnnotation) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Calculate correlation in other gene sets crs <- geneSetCorrelation(geneSets, geneAnnotation, gs, tra, 50)
# Load data data(geneSets) data(geneAnnotation) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Calculate correlation in other gene sets crs <- geneSetCorrelation(geneSets, geneAnnotation, gs, tra, 50)
A data.frame with 1,454 lists of genes (specified by Affymetrix transcript cluster id's and gene symbols) to be included in the correlation analysis. The data.frame contains a complete collection of all Gene Ontology gene sets included in the Molecular Signatures Database v3.1.
data(geneSets)
data(geneSets)
data.frame that contains 1,454 lists of genes, specified by Affymetrix transcript cluster id's and gene symbols.
A data.frame with 1,454 Gene Ontology gene lists specified by Affymetrix transcript cluster id's and gene symbols.
The posNegCorrPlot is a scatterPlot that compares the amount of splicing factor genes (red) for which expression levels are significant positively (vertical axis) and negatively (horizontal axis) correlated with the total relative amounts of aberrant exon usage per sample. The plot can also include results from permutations of the sample-wise aberrant exon usage amounts (dark blue), and randomly constructed gene sets of 280 genes.
posNegCorrPlot(fileName, tra, geneSummaries, splicingFactors, randomGeneSets, traPermutations)
posNegCorrPlot(fileName, tra, geneSummaries, splicingFactors, randomGeneSets, traPermutations)
fileName |
Output filename. File format is optional, but must be one of png, jpg, eps or pdf. |
tra |
List object with total relative amounts of aberrant exon usage per sample. The object is returned by the function 'aberrantExonUsage'. |
geneSummaries |
The data.frame with gene-level expression values for each sample, returned from the function 'readGeneSummaries'. |
splicingFactors |
List of genes (Affymetrix transcript cluster id's and gene symbols) involved in pre-mRNA splicing. An example set of 280 genes is obtained by issuing the command 'data(splicingFactors)', but the input list can include any set of genes included in the data.frame returned from the function 'readGeneSummaries'. |
randomGeneSets |
Number of random gene sets of 280 genes to be created and included in the analysis. |
traPermutations |
Number of permutations of the sample-wise amounts of aberrant exon usage to be performed and included in the analysis. |
posNegCorrPlot is used for the side-effect of producing a scatter plot that compares the amount of splicing factor genes (red) for which expression levels are significant positively (vertical axis) and negatively (horizontal axis) correlated with the total relative amounts of aberrant exon usage per sample. In addition, the plot can also include results from permutations of the sample-wise aberrant exon usage amounts (dark blue), and randomly constructed gene sets of 280 genes.
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Create plot that compares the amount of splicing factor genes for which # expression levels are significant positively and negatively correlated # with the total relative amounts of aberrant exon usage per sample posNegCorrPlot("cg.png", tra, gs, splicingFactors, 1000, 1000)
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Create plot that compares the amount of splicing factor genes for which # expression levels are significant positively and negatively correlated # with the total relative amounts of aberrant exon usage per sample posNegCorrPlot("cg.png", tra, gs, splicingFactors, 1000, 1000)
The function takes in the data.frame from 'firmaAnalysis' (containing log2 FIRMA scores for all probe sets/exons (rows) in all samples (columns)), along with the list 'percentiles' from 'aberrantExonUsage' (containing the lower and upper percentile values of FIRMA scores used as thresholds for denoting aberrant exon usage), and makes permutations of the FIRMA scores for each probe set/exon across all samples. Based on the permutation, random relative amounts of aberrant exon skipping and inclusion per sample is calculated and returned.
probesetPermutations(fs, percentiles)
probesetPermutations(fs, percentiles)
fs |
FIRMA scores for each probe set/sample combination (the data.frame returned from the function 'firmaAnalysis'). |
percentiles |
List containing two FIRMA score percentile values, i.e., the lower and upper percentiles used as thresholds for denoting aberrant exon usage (the list object 'percentiles' returned from the function 'aberrantExonUsage'). |
A list with two vectors, with number of exon skipping and inclusion events, respectively, for each sample after permutations of the expression levels at each probeset.
# Set up data set with FIRMA scores and calculate relative aberrant # exon usage for each sample fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Make permutations of the expression data at each probeset perms <- probesetPermutations(fs, quantiles)
# Set up data set with FIRMA scores and calculate relative aberrant # exon usage for each sample fs <- firmaAnalysis(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # Make permutations of the expression data at each probeset perms <- probesetPermutations(fs, quantiles)
The function reads pre-pocessed and summarized gene-level expression data from file, and builds a data.frame with the information. The data.frame will have the same structure as the input summary file, i.e., genes in rows and samples in columns.
readGeneSummaries(useToyData, summaryFile)
readGeneSummaries(useToyData, summaryFile)
useToyData |
Boolean argument to indicate whether sample data sets included in the TIN package should be used in the analysis. |
summaryFile |
Tab separated file with expression values for each combination of gene (row) and sample (column). Gene expression summary files can be obtained by using for instance APT (Affymetrix Power Tools) or Expression Console. The file is expected to have one initial header line with sample names, and one initial column with Affymetrix transcript cluster id's as row names. |
A data.frame with gene summary values. It consists of one column for each sample, and one row for each gene.
# Read pre-processed gene summary values from file # To use sample test data included in the TIN package as input: gs <- readGeneSummaries(useToyData=TRUE) # To use your own data, provide path to txt file with expression values: # gs <- readGeneSummaries("/tmp/path/to/GeneLevelExpressionValues.txt")
# Read pre-processed gene summary values from file # To use sample test data included in the TIN package as input: gs <- readGeneSummaries(useToyData=TRUE) # To use your own data, provide path to txt file with expression values: # gs <- readGeneSummaries("/tmp/path/to/GeneLevelExpressionValues.txt")
A data.frame containing preprocessed log2 expression values from 16 samples in 10,000 randomly selected Affymetrix probesets.
data(sampleSetFirmaScores)
data(sampleSetFirmaScores)
data.frame that contains preprocessed log2 expression values from 16 samples in 10,000 randomly selected Affymetrix probesets.
A data.frame with preprocessed log2 expression values in 10,000 randomly selected Affymetrix probesets for a test data set of 16 samples.
A data.frame containing gene expression summary values from 16 samples in 22,011 Affymetrix transcript clusters.
data(sampleSetGeneSummaries)
data(sampleSetGeneSummaries)
A data.frame containing gene expression summary values from 16 samples in 22,011 Affymetrix transcript clusters.
A data.frame with expression values at the gene level for a test data set of 16 samples.
Scatterplot showing sample-wise relative amounts (blue dots) of aberrant exon inclusion (horizontal axis) and exon skipping (vertical axis). Random sample-wise amounts calculated from permuted FIRMA scores can also be included in the plot (yellow dots).
scatterPlot(fileName, permutations, percentileHits, permPercentileHits)
scatterPlot(fileName, permutations, percentileHits, permPercentileHits)
fileName |
Output filename. File format is optional, but must be one of png, jpg, eps or pdf. |
permutations |
Boolean argument to indicate whether permuted data should be included. 'TRUE' adds permutation data to the plot. |
percentileHits |
List with two items containing sample-wise numbers of exons denoted with aberrant exon skipping and inclusion, i.e., having FIRMA scores in the indicated lower and upper percentiles, respectively. This list is returned from the function 'aberrantExonUsage'. |
permPercentileHits |
List with two items containing random sample-wise numbers of exons denoted with aberrant exon skipping and inclusion, i.e., having FIRMA scores in the indicated lower and upper percentiles, respecively (calculated from FIRMA score permutations). This list is returned from the function 'probesetPermutations'. |
scatterPlot is used for the side-effect of producing a scatter plot showing relative amounts of aberrant exon usage per sample.
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # The aberrantExonUsage function also creates the 'quantiles' object with # upper and lower threshold values for accepting aberrant exon usage, and # the list object 'aberrantExons' with the sample-wise number of exons # outside the threshold values. aberrantExonsPerms <- probesetPermutations(fs, quantiles) # Create scatter plot with the samples scatterPlot("scatter.png", TRUE, aberrantExons, aberrantExonsPerms)
data(splicingFactors) fs <- firmaAnalysis(useToyData=TRUE) gs <- readGeneSummaries(useToyData=TRUE) tra <- aberrantExonUsage(1.0, fs) # The aberrantExonUsage function also creates the 'quantiles' object with # upper and lower threshold values for accepting aberrant exon usage, and # the list object 'aberrantExons' with the sample-wise number of exons # outside the threshold values. aberrantExonsPerms <- probesetPermutations(fs, quantiles) # Create scatter plot with the samples scatterPlot("scatter.png", TRUE, aberrantExons, aberrantExonsPerms)
The gene set is a comprehensive collection of 280 genes involved in pre-mRNA splicing events (Sveen et al., Genome Medicine, 2011, 3:32).
data(splicingFactors)
data(splicingFactors)
data.frame with two columns; Affymetrix transcript cluster id and gene symbol.
A data.frame with 280 genes involved in pre-mRNA splicing events.