Title: | Analyze Transcription Factor Enrichment |
---|---|
Description: | Package to analyze transcription factor enrichment in a gene set using data from ChIP-Seq experiments. |
Authors: | Laura Puente Santamaría, Luis del Peso |
Maintainer: | Laura Puente Santamaría <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.25.0 |
Built: | 2024-06-30 04:45:10 UTC |
Source: | https://github.com/bioc/TFEA.ChIP |
Used to run examples. Data frame containing metadata information for the ChIP-Seq GSM2390643. Fields in the data frame:
Name: Name of the file.
Accession: Accession ID of the experiment.
Cell: Cell line or tissue.
'Cell Type': More information about the cells.
Treatment
Antibody
TF: Transcription factor tested in the ChIP-Seq experiment.
data("ARNT.metadata")
a data frame of one row and 7 variables.
Used to run examples. Data frame containing peak information from the ChIP-Seq GSM2390643. Fields in the data frame:
Name: Name of the file.
chr: Chromosome, factor
start: Start coordinate for each peak
end: End coordinate for each peak
X.10.log.pvalue.: -10*log10(p-value) for each peak.
data("ARNT.peaks.bed")
a data frame of 2140 rows and 4 variables.
TF-gene binding database stored as a binary matrix. Rows correspond to human genes and columns to every ChIP-Seq experiment in the database. Each value is 1 if the ChIP-Seq has a peak assigned to that gene, or 0 if it does not.
data("ChIPDB")
a matrix of 1060 columns and 16797 rows
Function to compute 2x2 contingency matrices by partitioning two gene ID lists according to the presence or absence of their genes in a ChIP-Seq binding database.
contingency_matrix(test_list, control_list, chip_index = get_chip_index())
test_list |
List of gene Entrez IDs |
control_list |
If not provided, all human genes not present in test_list will be used as control. |
chip_index |
Output of the function “get_chip_index”, a data frame containing the accession IDs of the ChIPs in the database and the TF each one tests. If not provided, the whole internal database will be used. |
List of contingency matrices, one CM per element in chip_index (i.e. per ChIP-seq dataset).
data('Genes.Upreg',package = 'TFEA.ChIP')
CM_list_UP <- contingency_matrix(Genes.Upreg)
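As a rough illustration of the partition described above, the following base-R sketch builds a single 2x2 matrix for one ChIP-Seq dataset. The helper `build_cm`, the toy gene IDs, and the row/column layout are assumptions made for illustration; the matrices returned by `contingency_matrix` may be laid out differently.

```r
# Hypothetical helper (not part of TFEA.ChIP): build one 2x2 contingency
# matrix for a single ChIP-Seq dataset from two gene ID lists.
build_cm <- function(test_list, control_list, chip_targets) {
  test_hit    <- sum(test_list %in% chip_targets)     # test genes with a peak
  control_hit <- sum(control_list %in% chip_targets)  # control genes with a peak
  matrix(
    c(test_hit,    length(test_list)    - test_hit,
      control_hit, length(control_list) - control_hit),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("Test", "Control"), c("Bound", "Not bound"))
  )
}

# Toy inputs (assumed for illustration only).
test_ids    <- c("2034", "405", "3091")
control_ids <- c("55818", "112399", "7422", "5230")
targets     <- c("2034", "405", "7422")  # genes with a peak in one ChIP

cm <- build_cm(test_ids, control_ids, targets)
print(cm)
```

In the package, one such matrix is produced per dataset listed in chip_index.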
Used to run examples. Part of a DHS database storing 76 sites for the human genome in GenomicRanges format.
data("DnaseHS_db")
GenomicRanges object with 76 elements
Used to run examples. Array of 2754 Entrez Gene IDs extracted from an RNA-Seq experiment sorted by log(Fold Change).
data("Entrez.gene.IDs")
Array of 2754 Entrez Gene IDs.
Translates mouse or human gene IDs from Gene Symbol or Ensembl Gene ID to Entrez Gene ID using the IDs approved by HGNC. When translating from Gene Symbol, keep in mind that many genes have been given more than one symbol through the years. This function returns the Entrez ID corresponding to the currently approved symbol if it exists; otherwise NA is returned. In addition, some genes might map to more than one Entrez ID; in that case the gene is assigned to the first match and a warning is displayed.
GeneID2entrez(gene.IDs, return.Matrix = FALSE, mode = "h2h")
gene.IDs |
Array of Gene Symbols or Ensembl Gene IDs. |
return.Matrix |
Logical. When TRUE, the function returns a matrix[n,2] with one column holding the gene symbols or Ensembl IDs and the other holding their respective Entrez IDs. |
mode |
Specify the organism used: 'h2h' for Homo sapiens gene IDs, 'm2m' for mouse gene IDs, or 'm2h' to get the corresponding human gene IDs from a mouse input. |
Vector or matrix containing the Entrez IDs (or NA) corresponding to every element of the input.
GeneID2entrez(c('TNMD','DPM1','SCYL3','FGR','CFH','FUCA2','GCLC'))
Used to run examples. Array of 342 Entrez Gene IDs extracted from upregulated genes in an RNA-Seq experiment.
data("Genes.Upreg")
Array of 342 Entrez Gene IDs.
Function to create a data frame containing the ChIP-Seq dataset accession IDs and the transcription factor tested in each ChIP. This index is used in functions like “contingency_matrix” and “GSEA_run” as a filter to select specific ChIPs or transcription factors to run an analysis.
get_chip_index(encodeFilter = FALSE, TFfilter = NULL)
encodeFilter |
(Optional) If TRUE, only ENCODE ChIP-Seqs are included in the index. |
TFfilter |
(Optional) Transcription factors of interest. |
Data frame containing the accession ID and TF for every ChIP-Seq experiment included in the metadata files.
get_chip_index(encodeFilter = TRUE)
get_chip_index(TFfilter=c('SMAD2','SMAD4'))
Function to plot a color bar from log2(Fold Change) values from an expression experiment.
get_LFC_bar(LFC)
LFC |
Vector of log2(fold change) values arranged from higher to lower. Use only the values of genes that have an Entrez ID. |
Plotly heatmap plot (log2(fold change) bar).
From a list of contingency matrices, such as the output from “contingency_matrix”, this function computes Fisher's exact test for each matrix and generates a data frame that stores the accession ID of each ChIP-Seq experiment, the TF tested in that experiment, and the p-value and odds ratio resulting from the test.
getCMstats(CM_list, chip_index = get_chip_index())
CM_list |
Output of “contingency_matrix”, a list of contingency matrices. |
chip_index |
Output of the function “get_chip_index”, a data frame containing the accession IDs of the ChIPs in the database and the TF each one tests. If not provided, the whole internal database will be used. |
Data frame containing the accession ID of each ChIP-Seq experiment and its experimental conditions, the TF tested in that experiment, raw p-values, FDR-adjusted p-values (-10*log10 adj.pvalue), odds ratio, and Euclidean distance.
data('Genes.Upreg',package = 'TFEA.ChIP')
CM_list_UP <- contingency_matrix( Genes.Upreg )
stats_mat_UP <- getCMstats( CM_list_UP )
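The per-matrix test described above can be sketched in base R with `fisher.test` and `p.adjust`. This is a minimal sketch on toy matrices, assuming a simple list keyed by made-up accession names; it is not the package's implementation, and the column names are illustrative.

```r
# Toy list of 2x2 contingency matrices keyed by hypothetical accession IDs.
cm_list <- list(
  chipA = matrix(c(8, 2, 1, 9), nrow = 2, byrow = TRUE),
  chipB = matrix(c(5, 5, 5, 5), nrow = 2, byrow = TRUE)
)

# Run Fisher's exact test on each matrix and collect one row per dataset.
cm_stats <- do.call(rbind, lapply(names(cm_list), function(id) {
  ft <- fisher.test(cm_list[[id]])
  data.frame(Accession = id,
             p.value   = ft$p.value,
             OR        = unname(ft$estimate),
             stringsAsFactors = FALSE)
}))

# FDR (Benjamini-Hochberg) adjustment across all datasets.
cm_stats$adj.p.value <- p.adjust(cm_stats$p.value, method = "fdr")
print(cm_stats)
```

Adjusting across all datasets at once matters because the full database contains many ChIP-Seq experiments, each contributing one test.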
Used to run examples. List containing part of one ChIP-Seq dataset (50 peaks from wgEncodeEH002402) in GenomicRanges format.
data("gr.list")
List of one ChIP-Seq dataset.
Computes the weighted GSEA score of gene.set in gene.list.
GSEA_EnrichmentScore( gene.list, gene.set, weighted.score.type = 0, correl.vector = NULL )
gene.list |
The ordered gene list |
gene.set |
A gene set, e.g. gene IDs corresponding to a ChIP-Seq experiment's peaks. |
weighted.score.type |
Type of score: weight: 0 (unweighted = Kolmogorov-Smirnov), 1 (weighted), and 2 (over-weighted) |
correl.vector |
A vector with the correlations (such as signal to noise scores) corresponding to the genes in the gene list |
List of:
ES: Enrichment score (real number between -1 and +1).
arg.ES: Location in gene.list where the peak running enrichment occurs (peak of the 'mountain').
RES: Numerical vector containing the running enrichment score for all locations in the gene list.
tag.indicator: Binary vector indicating the location of the gene sets (1's) in the gene list.
GSEA_EnrichmentScore(gene.list=c('3091','2034','405','55818'), gene.set=c('2034','112399','405'))
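To make the returned elements concrete, here is a minimal base-R sketch of an unweighted (Kolmogorov-Smirnov style) running enrichment score, corresponding to weighted.score.type = 0. The function name `running_es` and the toy gene labels are assumptions; the weighted variants and the package's exact tie handling are not reproduced here.

```r
# Hypothetical re-implementation for illustration (unweighted case only).
running_es <- function(gene.list, gene.set) {
  tag    <- as.integer(gene.list %in% gene.set)  # 1 where the gene is in the set
  n_hit  <- sum(tag)
  n_miss <- length(tag) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  # Step up by 1/n_hit at each hit and down by 1/n_miss at each miss.
  res <- cumsum(tag / n_hit - (1 - tag) / n_miss)
  # Take the extreme of the running sum with the largest magnitude.
  arg <- if (max(res) > -min(res)) which.max(res) else which.min(res)
  list(ES = res[arg], arg.ES = arg, RES = res, tag.indicator = tag)
}

# Toy ordered gene list in which the set members sit at the top.
out <- running_es(gene.list = c("g1", "g2", "g3", "g4", "g5"),
                  gene.set  = c("g1", "g2"))
print(out$ES)
print(out$RES)
```

Because hits and misses are normalised separately, the running sum always returns to zero at the end of the list.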
Function to calculate enrichment scores over a randomly ordered gene list.
GSEA_ESpermutations( gene.list, gene.set, weighted.score.type = 0, correl.vector = NULL, perms = 1000 )
gene.list |
Vector of gene Entrez IDs. |
gene.set |
A gene set, e.g. gene IDs corresponding to a ChIP-Seq experiment's peaks. |
weighted.score.type |
Type of score: weight: 0 (unweighted = Kolmogorov-Smirnov), 1 (weighted), and 2 (over-weighted) |
correl.vector |
A vector with the correlations (such as signal to noise scores) corresponding to the genes in the gene list |
perms |
Number of permutations |
Vector of Enrichment Scores for a permutation test.
gene.set=c('2034','112399','405'), perms=10)
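A permutation null distribution of this kind can be sketched in base R by shuffling the gene list and recomputing an unweighted enrichment score each time. The helper `es_unweighted`, the toy gene labels, and the empirical p-value formula at the end are assumptions for illustration, not the package's own procedure.

```r
# Hypothetical helper: unweighted enrichment score of gene.set in gene.list.
es_unweighted <- function(gene.list, gene.set) {
  tag    <- as.integer(gene.list %in% gene.set)
  n_hit  <- sum(tag)
  n_miss <- length(tag) - n_hit
  res    <- cumsum(tag / n_hit - (1 - tag) / n_miss)
  if (max(res) > -min(res)) max(res) else min(res)
}

set.seed(42)                       # reproducible shuffles
gene.list <- paste0("g", 1:20)     # toy ordered gene list
gene.set  <- paste0("g", 1:5)      # toy gene set, all at the top of the list
perms     <- 100

observed <- es_unweighted(gene.list, gene.set)

# Enrichment scores over randomly ordered copies of the gene list.
null_es <- vapply(seq_len(perms),
                  function(i) es_unweighted(sample(gene.list), gene.set),
                  numeric(1))

# One simple empirical p-value: share of permutations at least as extreme.
emp_p <- mean(abs(null_es) >= abs(observed))
print(emp_p)
```

A larger perms value gives a finer-grained null distribution at the cost of run time.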
Function to run a GSEA to analyze the distribution of TFBS across a sorted list of genes.
GSEA_run( gene.list, LFC, chip_index = get_chip_index(), get.RES = FALSE, RES.filter = NULL, perms = 1000 )
gene.list |
List of Entrez IDs ordered by their fold change. |
LFC |
Vector of log2( Fold Change ) values. |
chip_index |
Output of the function “get_chip_index”, a data frame containing the accession IDs of the ChIPs in the database and the TF each one tests. If not provided, the whole internal database will be used. |
get.RES |
(Optional) boolean. If TRUE, the function stores Running Enrichment Scores of all/some TF. |
RES.filter |
(Optional) chr vector. When get.RES==TRUE, selects the TFs whose Running Enrichment Scores are stored. |
perms |
(Optional) integer. Number of permutations for the enrichment test. |
A list of:
Enrichment.table: data frame containing accession ID, cell type, ChIP-Seq treatment, transcription factor tested, enrichment score, adjusted p-value, and argument of every ChIP-Seq experiment.
RES (optional): list of running sums of every ChIP-Seq.
indicators (optional): list of 0/1 vectors that stores the matches (1) and mismatches (0) between the gene list and the gene set.
data( 'hypoxia', package = 'TFEA.ChIP' )
hypoxia <- preprocessInputData( hypoxia )
chip_index <- get_chip_index( TFfilter = c('HIF1A','EPAS1','ARNT' ) )
GSEA.result <- GSEA_run( hypoxia$Genes, hypoxia$log2FoldChange, chip_index, get.RES = TRUE)
Used to run examples. Output of the function GSEA_run from the TFEA.ChIP package. It contains an enrichment table and two lists, one storing running enrichment scores and the other storing matches/mismatches along a gene list.
data("GSEA.result")
List of three elements: an enrichment table (data frame) and two lists of arrays.
Function to highlight certain transcription factors using different colors in a plotly graph.
highlight_TF(table, column, specialTF, markerColors)
table |
Enrichment matrix/data.frame. |
column |
Column # that stores the TF name in the matrix/df. |
specialTF |
Named vector containing TF names as they appear in the enrichment matrix/df and nicknames for their color group. Example: specialTF<-c('HIF1A','EPAS1','ARNT','SIN3A') names(specialTF)<-c('HIF','HIF','HIF','SIN3A') |
markerColors |
Vector specifying the shade for every color group. |
List of two objects: A vector to attach to the enrichment matrix/df pointing out the color group of every row. A named vector connecting each color group to the chosen color.
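The two returned objects can be pictured with a small base-R sketch. The toy enrichment table, the "Other" fallback group, and its grey color are assumptions made for illustration; the actual output of `highlight_TF` may label unhighlighted rows differently.

```r
# Named vector as described for specialTF: values are TF names,
# names are color groups.
specialTF <- c("HIF1A", "EPAS1", "ARNT", "SIN3A")
names(specialTF) <- c("HIF", "HIF", "HIF", "SIN3A")
markerColors <- c("red", "blue")   # one shade per color group

# Toy enrichment table (hypothetical accessions).
enrich <- data.frame(Accession = c("chip1", "chip2", "chip3"),
                     TF        = c("ARNT", "E2F1", "SIN3A"),
                     stringsAsFactors = FALSE)

# Object 1: color group of every row; rows whose TF is not in specialTF
# fall into an assumed "Other" group.
idx   <- match(enrich$TF, specialTF)
group <- ifelse(is.na(idx), "Other", names(specialTF)[idx])

# Object 2: named vector connecting each color group to its color.
group_colors <- c(setNames(markerColors, unique(names(specialTF))),
                  Other = "grey")

print(group)
print(group_colors)
```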
A data frame containing information from an RNA-Seq experiment on newly transcribed RNA in HUVEC cells under two conditions, 8h of normoxia and 8h of hypoxia (deposited at GEO as GSE89831). The data frame contains the following fields:
Gene: Gene Symbol for each gene analyzed.
Log2FoldChange: base 2 logarithm of the fold change on RNA transcription for a given gene between the two conditions.
pvalue
padj: p-value adjusted via FDR.
data("hypoxia")
a data frame of 17527 observations of 4 variables.
A DESeqResults object containing information from an RNA-Seq experiment on newly transcribed RNA in HUVEC cells under two conditions, 8h of normoxia and 8h of hypoxia (deposited at GEO as GSE89831).
hypoxia_DESeq
A DESeqResults object.
Used to run examples. Array of 2754 log2(Fold Change) values extracted from an RNA-Seq experiment.
data("log2.FC")
Array of 2754 log2(Fold Change) values.
makeChIPGeneDB generates a ChIP-seq target database by associating ChIP-Seq peak coordinates (provided as a GenomicRanges object) with overlapping genes or gene-associated genomic regions (Ref.db).
makeChIPGeneDB(Ref.db, gr.list, distanceMargin = 10, min.Targets = 10)
Ref.db |
GenomicRanges object containing a database of reference elements (either Genes or gene-associated regions) including a gene_id metacolumn |
gr.list |
List of GR objects containing ChIP-seq peak coordinates (output of txt2GR). |
distanceMargin |
Maximum distance allowed between a gene or regulatory element and a ChIP-seq peak for the gene to be assigned to that peak. Set to 10 bases by default. |
min.Targets |
Minimum number of putative targets per ChIP-seq in gr.list. ChIPs with fewer targets will be discarded. Set to 10 by default. |
List containing two elements:
- Gene Keys: vector of gene IDs.
- ChIP Targets: list of vectors, one per element in gr.list, containing the putative targets assigned. Each target is coded as its position in the vector 'Gene Keys'.
data( 'DnaseHS_db','gr.list', package = 'TFEA.ChIP' )
makeChIPGeneDB( DnaseHS_db, gr.list )
Function to transform a ChIP-gene data base from the former binary matrix to the current list-based format.
matrixDB_to_listDB(Mat01)
Mat01 |
Matrix[n,m] whose rows correspond to all the human genes that have been assigned an Entrez ID, and whose columns correspond to every ChIP-Seq experiment in the database. Each value is 1 if the ChIP-Seq has a peak assigned to that gene, or 0 if it does not. |
List containing two elements:
- Gene Keys: vector of gene IDs.
- ChIP Targets: list of vectors, one per ChIP-seq experiment in the database, containing the putative targets assigned. Each target is coded as its position in the vector 'Gene Keys'.
Mat01 <- matrix( round( runif(9) ), nrow = 3, dimnames= list( paste0("Gene ", 1:3), paste0("ChIPseq ", 1:3)) )
matrixDB_to_listDB( Mat01 )
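The list-based format described above can be illustrated with a deterministic base-R sketch. The element names "Gene Keys" and "ChIP Targets" follow the wording of this page, but whether the package uses exactly these names is an assumption, as is the conversion code itself.

```r
# Small fixed binary matrix: rows are genes, columns are ChIP-Seq experiments.
Mat01 <- matrix(c(1, 0, 1,
                  0, 0, 1,
                  1, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("Gene ", 1:3),
                                paste0("ChIPseq ", 1:3)))

# For each experiment, store the row positions of its target genes;
# each position indexes into the 'Gene Keys' vector.
listDB <- list(
  "Gene Keys"    = rownames(Mat01),
  "ChIP Targets" = lapply(
    setNames(seq_len(ncol(Mat01)), colnames(Mat01)),
    function(j) unname(which(Mat01[, j] == 1))
  )
)

str(listDB)

# Recover the gene IDs targeted by the first experiment.
listDB[["Gene Keys"]][listDB[["ChIP Targets"]][["ChIPseq 1"]]]
```

Storing positions rather than a full 0/1 matrix keeps the database compact when most gene-experiment pairs have no peak.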
A data frame containing information about the ChIP-Seq experiments used to build the TF-gene binding DB. Fields in the data frame:
Accession: Accession ID of the experiment.
Cell: Cell line or tissue.
'Cell Type': More information about the cells.
Treatment
Antibody
TF: Transcription factor tested in the ChIP-Seq experiment.
data("MetaData")
A data frame of 1060 observations of 6 variables
Function to generate an interactive html plot from a transcription factor enrichment table, output of the function 'getCMstats'.
plot_CM(CM.statMatrix, plot_title = NULL, specialTF = NULL, TF_colors = NULL)
CM.statMatrix |
Output of the function 'getCMstats'. A data frame storing the accession ID of every ChIP-Seq tested, transcription factor, odds ratio, p-value and adjusted p-value. |
plot_title |
The title for the plot. |
specialTF |
(Optional) Named vector of TF symbols, as written in the enrichment table, to be highlighted in the plot. The name of each element of the vector specifies its color group, e.g., naming elements HIF1A and HIF1B 'HIF' to represent them with the same color. |
TF_colors |
(Optional) Colors to highlight TFs chosen in specialTF. |
plotly scatter plot.
data('Genes.Upreg',package = 'TFEA.ChIP')
CM_list_UP <- contingency_matrix( Genes.Upreg )
stats_mat_UP <- getCMstats( CM_list_UP )
plot_CM( stats_mat_UP )
Function to plot the Enrichment Score of every member of the ChIPseq binding database.
plot_ES( GSEA_result, LFC, plot_title = NULL, specialTF = NULL, TF_colors = NULL, Accession = NULL, TF = NULL )
GSEA_result |
Returned by GSEA_run |
LFC |
Vector with log2(Fold Change) of every gene that has an Entrez ID. Arranged from higher to lower. |
plot_title |
(Optional) Title for the plot |
specialTF |
(Optional) Named vector of TF symbols, as written in the enrichment table, to be highlighted in the plot. The name of each element specifies its color group, e.g., naming elements HIF1A and HIF1B 'HIF' to represent them with the same color. |
TF_colors |
(Optional) Colors to highlight TFs chosen in specialTF. |
Accession |
(Optional) Restricts the plot to the indicated list of dataset IDs. |
TF |
(Optional) Restricts the plot to the indicated list of transcription factor names. |
Plotly object with a scatter plot (enrichment scores) and a heatmap (log2(fold change) bar).
data('GSEA.result','log2.FC',package = 'TFEA.ChIP')
TF.hightlight <- c('E2F1' = 'E2F1')
col <- c('red')
plot_ES( GSEA.result, log2.FC, "Example", TF.hightlight, col )
Function to plot all the RES stored in a GSEA_run output.
plot_RES( GSEA_result, LFC, plot_title = NULL, line.colors = NULL, line.styles = NULL, Accession = NULL, TF = NULL )
GSEA_result |
Returned by GSEA_run |
LFC |
Vector with log2(Fold Change) of every gene that has an Entrez ID. Arranged from higher to lower. |
plot_title |
(Optional) Title for the plot. |
line.colors |
(Optional) Vector of colors for each line. |
line.styles |
(Optional) Vector of line styles for each line ('solid'/'dash'/'longdash'). |
Accession |
(Optional) Restricts the plot to the indicated list of dataset IDs. |
TF |
(Optional) Restricts the plot to the indicated list of transcription factor names. |
Plotly object with a line plot (running sums) and a heatmap (log2(fold change) bar).
data('GSEA.result','log2.FC',package = 'TFEA.ChIP')
plot_RES(GSEA.result, log2.FC, TF = c('E2F4',"E2F1"), Accession=c('ENCSR000DYY.E2F4.GM12878', 'ENCSR000EVJ.E2F1.HeLa-S3'))
Function to extract Gene IDs, logFoldChange, and p-val values from a DESeqResults object or data frame. Gene IDs are translated to Entrez IDs, if possible, and the resulting data frame is sorted by decreasing log2(Fold Change). Translating gene IDs from mouse to their equivalent human genes is available through the argument "mode".
preprocessInputData(inputData, mode = "h2h")
inputData |
DESeqResults object or data frame. In all cases must include gene IDs. Data frame inputs should include 'pvalue' and 'log2FoldChange' as well. |
mode |
Specify the organism used: 'h2h' for Homo sapiens gene IDs, 'm2m' for mouse gene IDs, or 'm2h' to get the corresponding human gene IDs from a mouse input. |
A table containing Entrez Gene IDs, LogFoldChange and p-val values (both raw p-value and fdr adjusted p-value), sorted by log2FoldChange.
data('hypoxia_DESeq',package='TFEA.ChIP')
preprocessInputData( hypoxia_DESeq )
Rank the TFs in the output from 'getCMstats' using Wilcoxon rank-sum test or a GSEA-like approach.
rankTFs( resultsTable, rankMethod = "gsea", makePlot = FALSE, plotTitle = "TF ranking" )
resultsTable |
Output from the function 'getCMstats' |
rankMethod |
"wilcoxon" or "gsea". |
makePlot |
(Optional) For rankMethod="gsea". If TRUE, generates a plot for TFs with a p-value < 0.05. |
plotTitle |
(Optional) Title for the plot. |
data frame containing:
For Wilcoxon rank-sum test: rank, TF name, test statistic ('wilc_W'), p-value, Freeman's theta, epsilon-squared and effect size
For GSEA-like ranking: TF name, enrichment score, argument, p-value, number of ChIPs
data('Genes.Upreg',package = 'TFEA.ChIP')
CM_list_UP <- contingency_matrix(Genes.Upreg)
stats_mat_UP <- getCMstats(CM_list_UP)
rankTFs( stats_mat_UP )
Function to extract Gene IDs from a dataframe according to the established limits for log2(FoldChange) and p-value. If possible, the function will use the adjusted p-value column.
Select_genes( GeneExpression_df, max_pval = 0.05, min_pval = 0, max_LFC = Inf, min_LFC = -Inf )
GeneExpression_df |
A data frame with the following fields: 'Gene', 'pvalue' or 'pval.adj', 'log2FoldChange'. |
max_pval |
maximum p-value allowed, 0.05 by default. |
min_pval |
minimum p-value allowed, 0 by default. |
max_LFC |
maximum log2(FoldChange) allowed. |
min_LFC |
minimum log2(FoldChange) allowed. |
A vector of gene IDs.
data('hypoxia',package='TFEA.ChIP')
Select_genes(hypoxia)
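The thresholding described above can be sketched in base R on a toy data frame. The function `select_sketch`, the toy values, and the use of inclusive bounds are assumptions; only the field names and default limits come from this page.

```r
# Toy expression table using the field names listed above.
expr <- data.frame(
  Gene           = c("A", "B", "C", "D"),
  log2FoldChange = c(2.1, -1.5, 0.2, 3.0),
  pvalue         = c(0.001, 0.02, 0.5, 0.04),
  pval.adj       = c(0.004, 0.04, 0.6, 0.08),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the filtering step (inclusive bounds assumed).
select_sketch <- function(df, max_pval = 0.05, min_pval = 0,
                          max_LFC = Inf, min_LFC = -Inf) {
  # Prefer the adjusted p-value column when it is present.
  p_col <- if ("pval.adj" %in% names(df)) "pval.adj" else "pvalue"
  keep <- df[[p_col]] <= max_pval & df[[p_col]] >= min_pval &
    df$log2FoldChange <= max_LFC & df$log2FoldChange >= min_LFC
  df$Gene[which(keep)]   # which() drops rows where the test is NA
}

significant <- select_sketch(expr)               # default limits
upregulated <- select_sketch(expr, min_LFC = 1)  # significant and LFC >= 1
print(significant)
print(upregulated)
```

Gene D passes on its raw p-value but not on its adjusted one, which shows why the choice of p-value column matters.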
Function to set the data objects provided by the user as default to the rest of the functions.
set_user_data(metadata, ChIPDB)
metadata |
Data frame/matrix/array containing the following fields: 'Name','Accession','Cell','Cell Type','Treatment','Antibody','TF'. |
ChIPDB |
List containing two elements: - Gene Keys: vector of gene IDs - ChIP Targets: list of vectors, one per ChIP-seq experiment in the database, containing the putative targets assigned. Each target is coded as its position in the vector 'Gene Keys'. |
Sets the user's metadata table and ChIP-gene database as the variables 'MetaData' and 'ChIPDB', used by the rest of the package.
data( 'MetaData', 'ChIPDB', package='TFEA.ChIP' )
# For this example, we will use the variables already included in the
# package.
set_user_data( MetaData, ChIPDB )
Function to filter a ChIP-Seq output (in .narrowpeak or MACS's peaks.bed formats) and then store the peak coordinates in a GenomicRanges object, associated to its metadata.
txt2GR(fileTable, format, fileMetaData, alpha = NULL)
fileTable |
data frame from a txt/tsv/bed file |
format |
'narrowpeak', 'macs1.4' or 'macs2'. narrowPeak fields: 'chrom','chromStart','chromEnd','name','score','strand','signalValue', 'pValue','qValue','peak' macs1.4 fields: 'chrom','chromStart','chromEnd','name','-10*log10(p-value)' macs2 fields: 'chrom','chromStart','chromEnd','name','-log10(p-value)' |
fileMetaData |
Data frame/matrix/array containing the following fields: 'Name','Accession','Cell','Cell Type','Treatment','Antibody', 'TF'. |
alpha |
Maximum p-value for a ChIP-Seq peak to be considered significant and included in the database. By default alpha is 0.05 for narrow peak files and 1e-05 for MACS files. |
The function returns a GR object generated from the ChIP-Seq dataset input.
data('ARNT.peaks.bed','ARNT.metadata',package = 'TFEA.ChIP')
ARNT.gr<-txt2GR(ARNT.peaks.bed,'macs1.4',ARNT.metadata)
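The significance filter for a macs1.4-style table can be sketched in base R by converting the '-10*log10(p-value)' score back to a p-value and comparing it with alpha. The toy peaks, the column name `score`, and the strict `<` comparison are assumptions; the construction of the GenomicRanges object itself is not shown.

```r
# Toy peak table in a macs1.4-like layout; 'score' holds -10*log10(p-value).
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 5000, 250),
  end   = c(300, 5200, 400),
  score = c(80, 30, 55),
  stringsAsFactors = FALSE
)

alpha <- 1e-05   # default stated above for MACS files

# Invert the transformation: p = 10^(-score / 10).
peaks$pvalue <- 10^(-peaks$score / 10)

# Keep only peaks below the significance threshold.
sig_peaks <- peaks[peaks$pvalue < alpha, ]
print(sig_peaks)
```

For a 'macs2' table the score is '-log10(p-value)', so the conversion would be 10^(-score) instead.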