Title: | An R package that estimates correlations of orthologs and transposable elements between two species |
---|---|
Description: | TEKRABber provides a user-friendly pipeline for comparing orthologs and transposable elements (TEs) between two species. It uses the orthology confidence between the two species from BioMart to normalize expression counts and to detect differentially expressed orthologs/TEs. It then performs one-to-one correlation analysis for the desired orthologs and TEs. An app function is also provided for a first look at the results. Users can prepare their own ortholog/TE RNA-seq expression data and run TEKRABber, following the data structure described in the vignettes. |
Authors: | Yao-Chung Chen [aut, cre] , Katja Nowick [aut] |
Maintainer: | Yao-Chung Chen <[email protected]> |
License: | LGPL (>=3) |
Version: | 1.11.0 |
Built: | 2024-10-31 05:44:21 UTC |
Source: | https://github.com/bioc/TEKRABber |
Provide a Shiny UI for presenting the results of the DE and correlation analyses.
appTEKRABber(corrRef, corrCompare, DEobject)
corrRef |
correlation results for the reference species from corrOrthologTE() |
corrCompare |
correlation results for the compared species from corrOrthologTE() |
DEobject |
DE object from DEgeneTE() |
An interactive Shiny app.
data(speciesCounts)
hmGene <- speciesCounts$hmGene
hmTE <- speciesCounts$hmTE
chimpGene <- speciesCounts$chimpGene
chimpTE <- speciesCounts$chimpTE
data(fetchDataHmChimp)
fetchData <- fetchDataHmChimp
inputBundle <- DECorrInputs(fetchData)
meta <- data.frame(
  species = c(rep("human", ncol(hmGene) - 1),
              rep("chimpanzee", ncol(chimpGene) - 1)))
meta$species <- factor(meta$species, levels = c("human", "chimpanzee"))
rownames(meta) <- colnames(inputBundle$geneInputDESeq2)
hmchimpDE <- DEgeneTE(
  geneTable = inputBundle$geneInputDESeq2,
  teTable = inputBundle$teInputDESeq2,
  metadata = meta,
  expDesign = TRUE)
# use only 10 rows of genes and TEs
hmCorrResult <- corrOrthologTE(
  geneInput = hmchimpDE$geneCorrInputRef[c(1:10), ],
  teInput = hmchimpDE$teCorrInputRef[c(1:10), ],
  corrMethod = "pearson",
  padjMethod = "fdr")
chimpCorrResult <- corrOrthologTE(
  geneInput = hmchimpDE$geneCorrInputCompare[c(1:10), ],
  teInput = hmchimpDE$teCorrInputCompare[c(1:10), ],
  corrMethod = "pearson",
  padjMethod = "fdr")
#library(plotly)
#appTEKRABber(
#  corrRef = hmCorrResult,
#  corrCompare = chimpCorrResult,
#  DEobject = hmchimpDE)
Estimate one-to-one correlations between orthologs and TEs from the inputs. You can specify the correlation method and the p-value adjustment method (see the parameter details). To save the output instead of only returning it, specify fileDir and a fileName with the .csv extension. The default fileName is TEKRABber_geneTECorrResult.csv.
corrOrthologTE(geneInput, teInput, corrMethod = "pearson", padjMethod = "fdr", numCore=1, fileDir=NULL, fileName="TEKRABber_geneTECorrResult.csv")
geneInput |
gene count input for correlation, created with DECorrInputs() |
teInput |
TE count input for correlation, created with DECorrInputs() |
corrMethod |
correlation method: "pearson", "kendall", or "spearman". Default is "pearson". |
padjMethod |
method for adjusting p-values. Default is "fdr". See ?p.adjust |
numCore |
number of cores for parallel execution. Default is 1. Use parallel::detectCores() to check how many cores are available. |
fileDir |
the name of the directory for saving output files. Default is NULL. |
fileName |
the name of the output file. Default is "TEKRABber_geneTECorrResult.csv". |
A data frame including the correlation coefficient, pvalue, and padj (adjusted p-value).
data(ctInputDE)
geneInputDE <- ctInputDE$gene
teInputDE <- ctInputDE$te
metaExp <- data.frame(experiment = c(rep("control", 5), rep("treatment", 5)))
rownames(metaExp) <- colnames(geneInputDE)
metaExp$experiment <- factor(
  metaExp$experiment,
  levels = c("control", "treatment")
)
resultDE <- DEgeneTE(
  geneTable = geneInputDE,
  teTable = teInputDE,
  metadata = metaExp,
  expDesign = FALSE
)
controlCorr <- corrOrthologTE(
  geneInput = resultDE$geneCorrInputRef[c(1:10), ],
  teInput = resultDE$teCorrInputRef[c(1:10), ],
  numCore = 1,
  corrMethod = "pearson",
  padjMethod = "fdr"
)
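The sketch below is a rough base-R illustration of the one-to-one correlation described above. It correlates every gene row with every TE row using cor.test() and then adjusts the p-values with p.adjust(). The helper name pairwiseCorr() and its output column names are hypothetical. This is not TEKRABber's implementation, which also supports parallel execution through numCore. The sketch assumes both inputs have features in rows and the same samples in the same column order.

```r
# Hypothetical helper (not part of TEKRABber): correlate every gene row with
# every TE row. Assumes numeric matrices/data frames with features in rows
# (row names = IDs) and identical sample order in columns.
pairwiseCorr <- function(geneMat, teMat, method = "pearson", padjMethod = "fdr") {
  geneMat <- as.matrix(geneMat)
  teMat <- as.matrix(teMat)
  stopifnot(ncol(geneMat) == ncol(teMat))
  # all gene/TE combinations
  pairs <- expand.grid(geneName = rownames(geneMat),
                       teName = rownames(teMat),
                       stringsAsFactors = FALSE)
  # one cor.test() per pair; returns a 2 x n matrix, transposed to n x 2
  stats <- t(mapply(function(g, te) {
    ct <- suppressWarnings(  # e.g. ties warnings for spearman/kendall
      cor.test(geneMat[g, ], teMat[te, ], method = method))
    c(coef = unname(ct$estimate), pvalue = ct$p.value)
  }, pairs$geneName, pairs$teName))
  pairs$coef <- stats[, "coef"]
  pairs$pvalue <- stats[, "pvalue"]
  # multiple-testing correction across all pairs
  pairs$padj <- p.adjust(pairs$pvalue, method = padjMethod)
  pairs
}

# usage with random toy data: 3 genes x 2 TEs x 10 samples
set.seed(1)
genes <- matrix(rnorm(30), nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("s", 1:10)))
tes <- matrix(rnorm(20), nrow = 2,
              dimnames = list(paste0("te", 1:2), paste0("s", 1:10)))
res <- pairwiseCorr(genes, tes)
```

Each pair is tested independently, so the cost grows with the number of genes times the number of TEs. This is why the examples above subset to 10 rows.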
TEKRABber can also compare orthologs and transposable elements within the same species, e.g., control vs. treatment. Here we provide example data for demonstration. The data are based on the syn8466812 RNA-seq dataset (Allen M et al., 2016). However, the expression values were modified due to a confidentiality agreement, so they do not represent the original data.
data(ctInputDE)
An object containing 2 expression datasets:
gene: input gene data for DE analysis comparing control and treatment
te: input TE data for DE analysis comparing control and treatment
data(ctInputDE)
geneInputDE <- ctInputDE$gene
teInputDE <- ctInputDE$te
Generate all the input files for the differential expression (DE) analysis of orthologous genes/TEs and for the correlation analysis. The output is a list containing 6 data frames.
DECorrInputs(fetchData)
fetchData |
output list from TEKRABber::orthologScale() |
A list of inputs for the DE and correlation analyses: (1) geneInputDESeq2 (2) teInputDESeq2 (3) geneCorrInputRef (4) geneCorrInputCompare (5) TECorrInputRef (6) TECorrInputCompare
data(speciesCounts)
data(hg38_panTro6_rmsk)
hmGene <- speciesCounts$hmGene
chimpGene <- speciesCounts$chimpGene
hmTE <- speciesCounts$hmTE
chimpTE <- speciesCounts$chimpTE
## For demonstration, here we only select 1000 rows to save time
set.seed(1234)
hmGeneSample <- hmGene[sample(nrow(hmGene), 1000), ]
chimpGeneSample <- chimpGene[sample(nrow(chimpGene), 1000), ]
fetchData <- orthologScale(
  speciesRef = "hsapiens",
  speciesCompare = "ptroglodytes",
  geneCountRef = hmGeneSample,
  geneCountCompare = chimpGeneSample,
  teCountRef = hmTE,
  teCountCompare = chimpTE,
  rmsk = hg38_panTro6_rmsk,
  version = 105
)
inputBundle <- DECorrInputs(fetchData)
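The metadata in the examples has one row per human sample and one per chimpanzee sample. This suggests that geneInputDESeq2 holds the samples of both species side by side, one row per ortholog. The sketch below shows one way such a combined table could be built in base R. It assumes count tables whose first column holds the gene ID, as orthologScale() requires, and an orthology table with the hypothetical columns refID and compareID. The helper combineOrthologCounts() is illustrative only and does not reproduce DECorrInputs().

```r
# Hypothetical helper (not DECorrInputs()): put reference and compared
# species' counts side by side, one row per ortholog pair.
# Assumes: first column of each count table = gene ID, remaining columns =
# samples; orthoTable has hypothetical columns refID and compareID.
combineOrthologCounts <- function(refCounts, compareCounts, orthoTable) {
  refMat <- as.matrix(refCounts[, -1, drop = FALSE])
  rownames(refMat) <- as.character(refCounts[[1]])
  cmpMat <- as.matrix(compareCounts[, -1, drop = FALSE])
  rownames(cmpMat) <- as.character(compareCounts[[1]])
  refID <- as.character(orthoTable$refID)
  cmpID <- as.character(orthoTable$compareID)
  # keep only ortholog pairs present in both count tables
  keep <- refID %in% rownames(refMat) & cmpID %in% rownames(cmpMat)
  combined <- cbind(refMat[refID[keep], , drop = FALSE],
                    cmpMat[cmpID[keep], , drop = FALSE])
  rownames(combined) <- refID[keep]  # label rows with the reference ID
  combined
}

# usage with toy tables (2 samples per species)
hm <- data.frame(geneID = c("H1", "H2", "H3"), hm_1 = c(5, 0, 9),
                 hm_2 = c(7, 1, 3), stringsAsFactors = FALSE)
ch <- data.frame(geneID = c("C1", "C3"), ch_1 = c(4, 8),
                 ch_2 = c(6, 2), stringsAsFactors = FALSE)
ortho <- data.frame(refID = c("H1", "H2", "H3"), compareID = c("C1", "C2", "C3"),
                    stringsAsFactors = FALSE)
combined <- combineOrthologCounts(hm, ch, ortho)
```

Because the ID column is dropped, each species contributes ncol(table) - 1 sample columns. This matches the `ncol(hmGene) - 1` pattern used to build the metadata in the examples.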
To detect differentially expressed genes and TEs, DEgeneTE() takes the gene and TE inputs produced by DECorrInputs(). Specify metadata and expDesign according to your experimental design. To also save the output, specify the fileDir parameter.
DEgeneTE(geneTable, teTable, metadata, expDesign=TRUE, fileDir=NULL)
geneTable |
gene input table from using DECorrInputs() |
teTable |
TE input table from using DECorrInputs() |
metadata |
a one-column data frame whose row names match the column names of the gene/TE count tables. The column name must be "species" or "experiment". |
expDesign |
Logical value for comparing between or within species: TRUE compares two species, and FALSE compares control and treatment. |
fileDir |
the name and path of directory for saving output files. Default is NULL. |
The DESeq2 results (res) and normalized gene counts.
## comparing between species:
## (1) set expDesign = TRUE
## (2) column name of metadata needs to be "species".
data(fetchDataHmChimp)
fetchData <- fetchDataHmChimp
inputBundle <- DECorrInputs(fetchData)
meta <- data.frame(species = c(rep("human", ncol(fetchData$geneRef) - 1),
                               rep("chimpanzee", ncol(fetchData$geneCompare) - 1)))
rownames(meta) <- colnames(inputBundle$geneInputDESeq2)
meta$species <- factor(meta$species, levels = c("human", "chimpanzee"))
hmchimpDE <- DEgeneTE(
  geneTable = inputBundle$geneInputDESeq2,
  teTable = inputBundle$teInputDESeq2,
  metadata = meta,
  expDesign = TRUE
)
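The metadata rules for DEgeneTE() are as follows. The data frame has exactly one column, named "species" when expDesign = TRUE or "experiment" when expDesign = FALSE. Its row names equal the column names of the count table. The small helper below builds metadata that follows these rules. buildMetadata() is a hypothetical name. It uses the first group encountered as the first factor level, which mirrors the examples where "human" and "control" come first; please confirm the expected level order against the vignette.

```r
# Hypothetical helper (not part of TEKRABber): build DEgeneTE()-style metadata.
# countTable: numeric count matrix with sample names as column names.
# groups: one label per sample, in column order.
buildMetadata <- function(countTable, groups, expDesign = TRUE) {
  # required column name depends on the design
  colName <- if (expDesign) "species" else "experiment"
  stopifnot(length(groups) == ncol(countTable))
  # first group seen becomes the first factor level (reference)
  meta <- data.frame(grp = factor(groups, levels = unique(groups)))
  names(meta) <- colName
  rownames(meta) <- colnames(countTable)
  meta
}

# usage with a toy count matrix of 4 samples
counts <- matrix(0L, nrow = 2, ncol = 4,
                 dimnames = list(c("g1", "g2"),
                                 c("hm_1", "hm_2", "ch_1", "ch_2")))
meta <- buildMetadata(counts, rep(c("human", "chimpanzee"), each = 2),
                      expDesign = TRUE)
within <- buildMetadata(counts, rep(c("control", "treatment"), each = 2),
                        expDesign = FALSE)
```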
An output list with 7 elements produced by orthologScale(): (1) the orthology table comparing human and chimpanzee, (2) the scaling factor for orthologous genes, (3) the gene count table from the reference species, (4) the gene count table from the species you want to compare, (5) the scaling factor for TEs, (6) the TE count table from the reference species, and (7) the TE count table from the species you want to compare. This dataset is provided to save time when running the vignettes and to serve as a template for demonstration.
data(fetchDataHmChimp)
An object whose elements include:
orthology information from Ensembl
scaling factor to normalize data
data(fetchDataHmChimp)
fetchData <- fetchDataHmChimp
fetchData$orthologTable
fetchData$scaleFactor
This RepeatMasker track annotation table was downloaded from the UCSC Genome Table Browser and includes the name, class, and average length of repeats (transposable elements). It demonstrates how users can provide an annotation table to normalize their data, in this case comparing human (hg38) with chimpanzee (panTro6).
data(hg38_panTro6_rmsk)
An object of class grouped_df (inherits from tbl_df, tbl, data.frame) with 12550 rows and 4 columns.
data(hg38_panTro6_rmsk)
Normalize orthologous genes and TEs between two species with a scaling factor based on their expression levels and gene lengths.
orthologScale(speciesRef, speciesCompare, geneCountRef, geneCountCompare, teCountRef, teCountCompare, rmsk, version)
speciesRef |
The abbreviated scientific name of your reference species, e.g., hsapiens |
speciesCompare |
The abbreviated scientific name of the species to compare, e.g., ptroglodytes |
geneCountRef |
Gene count from your reference species. The first column should be the Ensembl gene ID. |
geneCountCompare |
Gene count from the species you want to compare. The first column should be the Ensembl gene ID. |
teCountRef |
TE count from your reference species. The first column should be teName. |
teCountCompare |
TE count from the species you want to compare. The first column should be teName. |
rmsk |
a RepeatMasker table including 4 columns: (1) the name of the TE, (2) the class of the TE, (3) the average length of that TE in your reference species, and (4) the average length of that TE in the species you want to compare. |
version |
the Ensembl version to use. Default is NULL, which retrieves the latest version. |
a list of outputs: (1) orthologTable, orthology information (2) c_ortholog, scaling factor for orthologous genes (3) geneRef, gene count table for reference species (4) geneCompare, normalized gene count table for species compared (5) c_te, scaling factor for TEs (6) teRef, TE count table for reference species (7) teCompare, normalized TE count table for species compared.
data(speciesCounts)
data(hg38_panTro6_rmsk)
hmGene <- speciesCounts$hmGene
chimpGene <- speciesCounts$chimpGene
hmTE <- speciesCounts$hmTE
chimpTE <- speciesCounts$chimpTE
## For demonstration, here we only select 1000 rows to save time
set.seed(1234)
hmGeneSample <- hmGene[sample(nrow(hmGene), 1000), ]
chimpGeneSample <- chimpGene[sample(nrow(chimpGene), 1000), ]
fetchData <- orthologScale(
  speciesRef = "hsapiens",
  speciesCompare = "ptroglodytes",
  geneCountRef = hmGeneSample,
  geneCountCompare = chimpGeneSample,
  teCountRef = hmTE,
  teCountCompare = chimpTE,
  rmsk = hg38_panTro6_rmsk,
  version = 105
)
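TEKRABber's scaling factor combines expression levels and lengths, and its exact formula is not reproduced here. As a simpler, swapped-in illustration of why feature length matters when comparing species, the sketch below rescales the compared species' counts by the ratio of reference length to compared length, row by row. lengthAdjust() and the toy lengths are hypothetical and are not TEKRABber's method.

```r
# Hypothetical helper (NOT TEKRABber's scaling factor): rescale each feature's
# counts in the compared species by refLen / compareLen, so a feature that is
# longer in the compared genome is scaled down, and vice versa.
lengthAdjust <- function(compareCounts, refLen, compareLen) {
  stopifnot(length(refLen) == nrow(compareCounts),
            length(compareLen) == nrow(compareCounts),
            all(compareLen > 0))
  ratio <- refLen / compareLen
  # MARGIN = 1: multiply row i by ratio[i]
  sweep(as.matrix(compareCounts), 1, ratio, FUN = "*")
}

# usage with toy counts (2 features x 2 samples) and toy lengths
cmp <- matrix(c(10, 20, 30, 40), nrow = 2,
              dimnames = list(c("te1", "te2"), c("s1", "s2")))
adj <- lengthAdjust(cmp, refLen = c(300, 6000), compareLen = c(150, 6000))
```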
Create a table for the rmsk argument of orthologScale(). Before version 1.8, TEKRABber required users to prepare this table themselves; this function retrieves the RepeatMasker table from UCSC automatically. The required arguments are the genome assembly abbreviations (case-sensitive), for example "hg38" for human. Note: currently only 91 genomes are available. Check whether your reference exists with GenomeInfoDb::registered_UCSC_genomes().
prepareRMSK(refSpecies, compareSpecies)
refSpecies |
the genome assembly of the reference species, e.g., hg38 |
compareSpecies |
the genome assembly of the compared species, e.g., panTro6 |
A data frame with four columns: repName, repClass, rLen, and cLen
df_rmsk <- prepareRMSK(refSpecies = "hg38", compareSpecies = "panTro6")
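The rLen and cLen columns hold average repeat lengths per TE in the two genomes. The sketch below shows how such a table could be derived. It assumes per-copy RepeatMasker records for each genome with the UCSC rmsk columns repName, repClass, genoStart and genoEnd (0-based, end-exclusive, so length = genoEnd - genoStart). summarizeRepeatLengths() is a hypothetical helper, not the implementation of prepareRMSK(), and it keeps only TEs annotated in both genomes.

```r
# Hypothetical helper (not prepareRMSK()): average repeat length per TE in
# each genome, joined into a repName / repClass / rLen / cLen table.
# Assumes UCSC rmsk-style columns: repName, repClass, genoStart, genoEnd.
summarizeRepeatLengths <- function(refHits, compareHits) {
  avgLen <- function(hits) {
    hits$len <- hits$genoEnd - hits$genoStart  # 0-based, end-exclusive
    aggregate(len ~ repName + repClass, data = hits, FUN = mean)
  }
  ref <- avgLen(refHits)
  cmp <- avgLen(compareHits)
  names(ref)[names(ref) == "len"] <- "rLen"
  names(cmp)[names(cmp) == "len"] <- "cLen"
  # inner join: keep repeats annotated in both genomes
  merge(ref, cmp, by = c("repName", "repClass"))
}

# usage with toy records
refHits <- data.frame(repName = c("AluY", "AluY", "L2a"),
                      repClass = c("SINE", "SINE", "LINE"),
                      genoStart = c(0, 1000, 5000),
                      genoEnd = c(300, 1310, 5800),
                      stringsAsFactors = FALSE)
cmpHits <- data.frame(repName = c("AluY", "L2a", "MER1"),
                      repClass = c("SINE", "LINE", "DNA"),
                      genoStart = c(0, 2000, 9000),
                      genoEnd = c(290, 2700, 9150),
                      stringsAsFactors = FALSE)
rmskToy <- summarizeRepeatLengths(refHits, cmpHits)
```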
Estimate the correlation between genes and transposable elements
rcpp_corr(df1, df2, Method)
df1 |
First dataframe |
df2 |
Second dataframe |
Method |
correlation method |
a dataframe containing correlation results
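Judging by its name, rcpp_corr() is implemented with Rcpp. As a base-R reference point, the sketch below computes a coefficient for every row of df1 against every row of df2. It calls cor() on transposed matrices and returns the results in long format. Whether rcpp_corr() pairs rows in the same way is an assumption. corrAllPairs() is a hypothetical name, and the sketch computes no p-values.

```r
# Hypothetical base-R analogue (not rcpp_corr()): all row-by-row correlation
# coefficients between two tables with features in rows, samples in columns.
corrAllPairs <- function(df1, df2, method = "pearson") {
  m1 <- as.matrix(df1)
  m2 <- as.matrix(df2)
  stopifnot(ncol(m1) == ncol(m2))
  # cor() correlates columns, so transpose: features become columns
  r <- cor(t(m1), t(m2), method = method)
  # as.vector() is column-major, so row names cycle fastest
  data.frame(rowName = rep(rownames(r), times = ncol(r)),
             colName = rep(colnames(r), each = nrow(r)),
             coef = as.vector(r),
             stringsAsFactors = FALSE)
}

# usage: geneA rises with te1, geneB falls
a <- matrix(c(1, 2, 3, 4,
              4, 3, 2, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), NULL))
b <- matrix(c(2, 4, 6, 8), nrow = 1, dimnames = list("te1", NULL))
out <- corrAllPairs(a, b)
```

A single vectorized cor() call is much faster than one cor.test() per pair. The trade-off is that it yields only coefficients, so p-values would have to be computed separately.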
This dataset contains 4 expression tables from human and chimpanzee brain RNA-seq. We selected raw FASTQ data from 10 humans and 10 chimpanzees (Khrameeva E et al., 2020). Gene expression was quantified using HISAT2 and featureCounts (Kim D et al., 2019; Liao Y et al., 2014). Transposable element (TE) expression was quantified with the multi-mapping option using STAR and TEtranscripts (Dobin A et al., 2013; Jin Y et al., 2015).
data(speciesCounts)
An object containing 4 expression count tables:
hmGene: human gene expression data
hmTE: human TE expression data
chimpGene: chimpanzee gene expression data
chimpTE: chimpanzee TE expression data
data(speciesCounts)
hmGene <- speciesCounts$hmGene
hmTE <- speciesCounts$hmTE
chimpGene <- speciesCounts$chimpGene
chimpTE <- speciesCounts$chimpTE
TEKRABber provides a user-friendly pipeline for comparing orthologs and transposable elements (TEs) between two species. It uses the orthology confidence between the two species from BioMart to normalize expression counts and to detect differentially expressed orthologs/TEs. It then performs one-to-one correlation analysis for the desired orthologs and TEs. An app function is also provided for a first look at the results. Users can prepare their own ortholog/TE RNA-seq expression data and run TEKRABber, following the data structure described in the vignettes.
The TEKRABber analysis pipeline includes 5 main functions:
1. orthologScale(): obtain orthology information and calculate scaling factors.
2. DECorrInputs(): create the input files for the DE and correlation analyses.
3. DEgeneTE(): run DE analysis on orthologs and transposable elements.
4. corrOrthologTE(): estimate correlations between selected orthologs and transposable elements.
5. appTEKRABber(): (optional) get a first insight into the data using a local web app.
Find more details in the vignette or on the help pages, e.g., ?orthologScale.
Yao-Chung Chen, Katja Nowick.
Maintainer: Yao-Chung Chen [email protected]