Title: | TFutils |
---|---|
Description: | This package helps users to work with TF metadata from various sources. Significant catalogs of TFs and classifications thereof are made available. Tools for working with motif scans are also provided. |
Authors: | Vincent Carey [aut, cre], Shweta Gopaulakrishnan [aut] |
Maintainer: | Vincent Carey <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.27.1 |
Built: | 2024-11-13 03:27:43 UTC |
Source: | https://github.com/bioc/TFutils |
check columns of a dataframe for numerical tokens of 7 or 8 digits and create HTML anchors to pubmed.gov constituting a link to a PMID
anchor_pmids(dataframe)
dataframe |
a data.frame instance |
data.frame with HTML anchors to pubmed.gov inserted where 7- or 8-digit numbers are found
The method of isolating putative PMIDs is specific to patterns found in the comment fields of the annotated TF table (supplemental table S1 of PMID 29425488, found at https://www.cell.com/cms/10.1016/j.cell.2018.01.029/attachment/88c0eca1-66f9-4068-b02e-bd3d55144f79/mmc2.xlsx). When DT::datatable is called on the output of this function with escape=FALSE, the PMIDs will render as hyperlinks.
Note that column 1 is assumed to hold an ENSEMBL ID, which could contain 7 or 8 digits but is handled differently.
litdf = data.frame(id="ENSG00000116819",
  a="Binds the same GCCTGAGGC sequence as the other AP-2s (PMID: 24789576)",
  stringsAsFactors=FALSE)
anchor_pmids(litdf)
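The anchoring idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by anchor_pmids: the helper name anchor_digits, the regular expression, and the https://pubmed.gov/<PMID> link layout are all hypothetical choices made for this example.

```r
# Hypothetical helper, NOT the TFutils implementation: wrap standalone
# 7- or 8-digit tokens in an HTML anchor pointing to pubmed.gov.
anchor_digits <- function(x) {
  gsub("\\b([0-9]{7,8})\\b",
       "<a href='https://pubmed.gov/\\1' target='_blank'>\\1</a>",
       x, perl = TRUE)
}

comment <- "Binds the same GCCTGAGGC sequence as the other AP-2s (PMID: 24789576)"
anchored <- anchor_digits(comment)
print(anchored)

# An ENSEMBL gene ID passes through unchanged in this sketch, because its
# digit run is attached to letters and is longer than 8 digits.
print(anchor_digits("ENSG00000116819"))
```

The word-boundary anchors are one simple way to avoid matching part of a longer identifier; the package may handle the ID column by other means.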
use DT::datatable to browse the GO catalogue of human DNA-binding transcription factors in Table S1.A of Lovering et al.
browse_gotf_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
result of DT::datatable
if (interactive()) browse_gotf_main()
use DT::datatable to browse Lambert's Human Transcription Factors repository
browse_humantfs_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
result of DT::datatable
if (interactive()) browse_humantfs_main()
browse several hundred disease-TF associations with hyperlinked PMIDs
browse_lambert_gwaslinks()
DT::datatable
Based on supplemental table S4 of PMID 29425488
if (interactive()) browse_lambert_gwaslinks()
use DT::datatable to browse the Lambert table S1
browse_lambert_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
result of DT::datatable
PMIDs are converted to HTML anchors and DT::datatable is run with escape=FALSE.
if (interactive()) browse_lambert_main()
cisbpTFcat: data.frame with information on CISBP TFs for human, retained for reproducibility support; see cisbpTFcat_2.0 for a more recent catalog
cisbpTFcat
data.frame
Extracted March 2018, checked August 2018. The only changes observed are that genes ZUFSP and T are used as HGNC values in the March catalog; these symbols seem to be absent from the org.Hs.eg.db of August 2018. The records involved are 1356, 7412 and 7413. These symbols were left in the package image of CISBP in August 2018.
http://cisbp.ccbr.utoronto.ca/bulk.php select Homo_sapiens
head(TFutils::cisbpTFcat)
cisbpTFcat_2.0: data.frame with information on CISBP TFs for human, described in PMID 31133749
cisbpTFcat_2.0
data.frame
Extracted August 2019.
http://cisbp.ccbr.utoronto.ca/bulk.php select Homo_sapiens
head(TFutils::cisbpTFcat_2.0)
basic layout parameters for circos
defaultCircosParms()
a list
head(defaultCircosParms())
a list of GRanges instances with TF FIMO scores returned by fimo_granges
demo_fimo_granges
a list of GRanges instances
names(S4Vectors::mcols(demo_fimo_granges$VDR[[1]]))
demonstrate interoperation of TF catalog with GWAS catalog
directHitsInCISBP(traitTag, gwascat)
traitTag |
character(1) string found in DISEASE/TRAIT field of gwascat instance |
gwascat |
instance of |
data.frame
data(gwascat_hg19_chr17)
directHitsInCISBP("Prostate cancer", gwascat_hg19_chr17)
encode690: DataFrame extending AnnotationHub metadata about ENCODE cell line x TF ranges
encode690
DataFrame
see metadata(encode690)
names(TFutils::encode690)
TFutils::encode690[,1:5]
create a list of GRanges for FIMO hits in a GenomicFiles instance, corresponding to a GRanges-based query
fimo_granges(gf, query)
gf |
GenomicFiles instance, like fimo16 in TFutils |
query |
a GRanges specifying ranges to check for TF binding scores |
a list of GRanges, produced by GenomicFiles::reduceByRange
Be sure to use register([BPPARAM]) appropriately.
if (interactive()) { # need internet
  # setup -- annotate fimo16 object and create an informative
  # query
  colnames(fimo16) = fimo16$HGNC
  si = GenomeInfoDb::Seqinfo(genome="hg19")["chr17"] # to fix query genome
  myg = GRanges("chr17", IRanges(38.07e6,38.09e6), seqinfo=si)
  requireNamespace("BiocParallel")
  BiocParallel::register(BiocParallel::SerialParam())
  f1 = fimo_granges(fimo16[, c("VDR", "POU2F1")], myg)
  f1
}
fimo16: GenomicFiles instance to AWS S3-resident FIMO bed for 16 TFs
fimo16
GenomicFiles for a TabixFileList
K. Glass FIMO runs, see https://doi.org/10.1016/j.celrep.2017.10.001
TFutils::fimo16
fimoMap: table with Mnnnn (motif PWM tags) and HGNC symbols for TFs
fimoMap
data.frame
Kimberly Glass ([email protected])
head(TFutils::fimoMap)
use EnsDb to generate an exon-level model of genes identified by symbol
genemodelDF(sym, resource, columnsKept = c("gene_id", "tx_id"), ...)
sym |
a character() vector of gene symbols |
resource |
should be or inherit from EnsDb, answering exons(), with AnnotationFilter::SymbolFilter as filter parameter |
columnsKept |
character vector used as |
... |
passed to exons() |
data.frame instance with exons in rows
There are many approaches to acquiring 'gene models' in Bioconductor; this one emphasizes the use of the exons method for Ensembl annotation.
if (requireNamespace("EnsDb.Hsapiens.v75")) {
  orm = genemodelDF("ORMDL3", EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75)
  dim(orm)
  head(orm) # kept inside the guard so orm is always defined here
}
create a GeneRegionTrack instance for selected symbols
genemodForGviz(
  sym = "ORMDL3",
  id_elem = c("symbol", "tx_id"),
  resource = EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75,
  ...
)
sym |
character vector of gene symbols, should be neighboring genes |
id_elem |
vector of names of columns generated by genemodelDF to be used to label transcripts |
resource |
should be or inherit from EnsDb, answering exons(), with AnnotationFilter::SymbolFilter as filter parameter |
... |
passed to genemodelDF |
instance of Gviz GeneRegionTrack
This function helps to display the locations of TF binding sites in the context of complex gene models. A complication is that we have nice visualization of quantitative affinity predictions for TFs in the vignette, based on ggplot2, but it is not clear how to use that specific code to work with Gviz.
if (requireNamespace("EnsDb.Hsapiens.v75") && requireNamespace("Gviz")) {
  orm = genemodForGviz("ORMDL3", resource=EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75)
  orm
  Gviz::plotTracks(orm, showId=TRUE) # change id_elem for shorter id string
}
utility to obtain location etc. for rsids of SNPs
get_rslocs_38(rsids = c("rs6060535", "rs56116432"))
rsids |
character vector of dbSNP identifiers |
GRanges instance
Uses rest.ensembl.org, posting to variant_recorder/homo_sapiens. Parses result minimally, using only the first SPDI to obtain location information, adding 1 as ensembl genomic coordinates are zero-based.
if (interactive()) get_rslocs_38() # see https://stat.ethz.ch/pipermail/bioc-devel/2020-October/017263.html
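The coordinate adjustment mentioned in the details can be illustrated with a small base R sketch. It assumes the usual SPDI layout (sequence:position:deleted:inserted, with a zero-based position); the SPDI string and the helper name parse_spdi are made up for illustration and are not taken from the package or from a real REST response.

```r
# Made-up SPDI string for illustration; the position field is zero-based.
spdi <- "NC_000020.11:35000000:C:T"

# Hypothetical helper: split an SPDI string and report a one-based position.
parse_spdi <- function(s) {
  parts <- strsplit(s, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 4)
  list(seq_id   = parts[1],
       pos1     = as.integer(parts[2]) + 1L,  # zero-based -> one-based
       deleted  = parts[3],
       inserted = parts[4])
}

loc <- parse_spdi(spdi)
str(loc)
```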
create table of TF targets and related metadata
grabTab(
  tfstub = "STAT1",
  gscoll = TFutils::tftColl,
  orgdb = org.Hs.eg.db::org.Hs.eg.db,
  gwrngs = TFutils::gwascat_hg19_chr17
)
tfstub |
character(1) gene-like symbol for TF; will be grepped in names(gscoll) |
gscoll |
a GSEABase GeneSetCollection |
orgdb |
an instance of OrgDb as defined in AnnotationDbi |
gwrngs |
a GRanges representing EBI gwascat, must have |
data.frame instance
This function will link together information on targets of a given TF to the GWAS catalog.
gt = grabTab("VDR", gscoll=TFutils::tftColl,
  orgdb=org.Hs.eg.db::org.Hs.eg.db, gwrngs=TFutils::gwascat_hg19_chr17)
dim(gt)
head(gt)
gwascat_hg19_chr17: GRanges of the March 21 2018 EBI gwascat, limited to chr17
gwascat_hg19_chr17
GenomicRanges GRanges instance
gwascat::makeCurrentGwascat, with gwascat:::lo38to19 applied
TFutils::gwascat_hg19_chr17[,1:5]
simple accessor for HGNCmap component of TFCatalog
HGNCmap(x)
x |
instance of TFCatalog |
data.frame instance
HGNCmap
hocomoco.mono: data.frame with information on HOCOMOCO TFs for human
hocomoco.mono
data.frame
Extracted March 2018
http://hocomoco11.autosome.ru/human/mono?full=true
head(TFutils::hocomoco.mono)
hocomoco.mono.sep2018: data.frame with information on HOCOMOCO TFs for human, Sept 2018 download
hocomoco.mono.sep2018
data.frame
Extracted September 2018
http://hocomoco11.autosome.ru/human/mono?full=true
head(TFutils::hocomoco.mono.sep2018)
utility to read FIMO outputs from a local resource (cluster), assuming bed text split by chromosome
importFIMO_local_split(tf, chr)
tf |
character(1) file id |
chr |
character(1) chromosome name |
data.table instance
requireNamespace("GenomicRanges")
requireNamespace("IRanges")
importFIMO_local_split("M5946_1", "chr1")
dim(importFIMO_local_split("M5946_1", "chr17"))
import a FIMO bed-like file
## S4 method for signature 'TabixFile,GRanges'
importFIMO(src, parms, ...)

## S4 method for signature 'character,missing'
importFIMO(src, parms, ...)
src |
TabixFile instance |
parms |
a GRanges instance delimiting the import; multiple GRanges can be used |
... |
passed to GenomicRanges::GRanges |
instance of GRanges
if (requireNamespace("Rsamtools")) {
  tf = Rsamtools::TabixFile(system.file("M5946_1/chr1.bed.gz", package="TFutils"))
  importFIMO(tf, GenomicRanges::GRanges("chr1", IRanges::IRanges(1e6,11e6)))
}
lambert_snps is Table S3 of Lambert et al PMID 29425488
lambert_snps
data.frame
head(lambert_snps)
metadata_tf: list with metadata (motif_id and hgnc_symbol) about all the CISBP FIMO scan TF bed files
metadata_tf
list
K. Glass ran FIMO
TFutils::metadata_tf
named_tf: named list with the names being the hgnc_symbol of the motif_id
named_tf
list
K. Glass ran FIMO
TFutils::named_tf
named_tf[["VDR"]]
acquire the content of Table S1.A from Lovering et al., A GO catalogue of human DNA-binding transcription factors, DOI: https://doi.org/10.1101/2020.10.28.359232
retrieve_gotf_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
a tbl_df
This will download the spreadsheet if not found in cache.
if (interactive()) retrieve_gotf_main()
acquire the CSV content for table S1 of Lambert et al. Cell 2018 from the Human TFS repository at http://humantfs.ccbr.utoronto.ca
retrieve_humantfs_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
a tbl_df
This will download the spreadsheet if not found in cache.
if (interactive()) retrieve_humantfs_main()
acquire the Excel spreadsheet content for table S1 of Lambert et al. Cell 2018, "The Human Transcription Factors"
retrieve_lambert_main(cache = BiocFileCache::BiocFileCache(ask = FALSE))
cache |
a BiocFileCache instance |
a tbl_df
This will download the spreadsheet if not found in cache.
if (interactive()) retrieve_lambert_main()
a Seqinfo instance for chr17 in hg19
seqinfo_hg19_chr17
a Seqinfo instance
seqinfo_hg19_chr17
process a gene_attribute_matrix.txt file from harmonizome into a GeneSetCollection
setupHIZE(txtfn = "gene_attribute_matrix.txt", tag)
txtfn |
character(1) path to gene_attribute_matrix.txt file from harmonizome |
tag |
character(1) will be added to shortDescription field of each GeneSet instance |
GSEABase::GeneSetCollection
After uncompressing content of http://amp.pharm.mssm.edu/static/hdfs/harmonizome/data/cheappi/gene_attribute_matrix.txt.gz run this on gene_attribute_matrix.txt with tag="CHEA".
produce a concise report on TFCatalog instance
## S4 method for signature 'TFCatalog'
show(object)
object |
instance of TFCatalog |
side effect
Constructor for TFCatalog
TFCatalog(name, nativeIds, HGNCmap, metadata)
name |
informative character(1) for collection |
nativeIds |
character() vector of identifiers used by collection creators |
HGNCmap |
data.frame with column 1 nativeIds, column 2 HGNC or hgnc.heur for MSigDb and any other columns of use |
metadata |
a list of metadata elements |
instance of TFCatalog
if (require("GSEABase")) {
  TFs_MSIG = TFCatalog(name="MsigDb.TFT", nativeIds=names(TFutils::tftColl),
    HGNCmap=data.frame(TFutils::tftCollMap, stringsAsFactors=FALSE))
  TFs_MSIG
}
define a structure to hold information about TFs from diverse reference sources
name
character
nativeIds
character tokens used by the provider to enumerate transcription factors
HGNCmap
data.frame with at least two columns, native id as first column and HGNC symbol as second column
metadata
ANY
This class respects the notions that 1) a source of information about transcription factors should have a name, 2) each source has its own 'native' nomenclature for the factors themselves, 3) it is common to use the gene symbol to refer to the transcription factor, and 4) additional metadata will frequently be required to establish information about provenance of assertions about transcription factors.
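To make the slot layout concrete, here is a toy S4 class with the same four slots. It is a sketch only: the class name ToyTFCatalog and all values are hypothetical, and the real TFCatalog class may carry validity checks or defaults that are not shown here.

```r
library(methods)

# Toy class with a hypothetical name; slot names and types follow the
# slot descriptions above.
setClass("ToyTFCatalog",
  representation(name = "character", nativeIds = "character",
                 HGNCmap = "data.frame", metadata = "ANY"))

# Made-up catalog content for illustration.
toy <- new("ToyTFCatalog",
  name = "toy.catalog",
  nativeIds = c("VDR_Q3", "STAT1_01"),
  HGNCmap = data.frame(nativeIds = c("VDR_Q3", "STAT1_01"),
                       HGNC = c("VDR", "STAT1"),
                       stringsAsFactors = FALSE),
  metadata = list(source = "made-up example"))

# Column 1 of the map holds the native ids, column 2 the HGNC symbols.
stopifnot(identical(toy@HGNCmap[[1]], toy@nativeIds))
toy@HGNCmap
```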
use a radial plot (by default) for motif stack
tffamCirc.plot(motiflist, circosParms = defaultCircosParms())
motiflist |
a list of pfm instances from motifStack |
circosParms |
a list of parameter settings for circos plot |
side effect to graphics device
p1 = tffamCirc.prep()
tffamCirc.plot(p1[c(1:8, 10:17, 19)])
set up list of pfms in motifStack protocol
tffamCirc.prep(tffam = "Paired-related HD factors{3.1.3}", trimfac = 0.4)
tffam |
character(1) name of TF family as found in TFutils::hocomoco.mono field |
trimfac |
fraction passed as parameter |
a list of pfm instances as defined in motifStack
Uses MotifDb, motifStack to create a list of pfms
n1 = tffamCirc.prep()
str(n1)
tfhash: data.frame with MSigDb TFs, TF targets as symbol or ENTREZ
tfhash
list
MSigDb "c3" (motif gene sets) has been harvested for simple annotation of TFs and targets.
TFutils::tfhash
tfhash[1:3,]
gadget to help sort through tags naming TFs
TFtargs(
  gscoll = TFutils::tftColl,
  initTF = "VDR_Q3",
  gwcat = TFutils::gwascat_hg19_chr17,
  gadtitle = "Search for a TF; its targets will be checked for mapped status in GWAS catalog"
)
gscoll |
a GSEABase GeneSetCollection |
initTF |
character(1) initial TF string for app |
gwcat |
GRanges-like structure with GWAS catalog information |
gadtitle |
character(1) a title for the gadget panel |
on app conclusion a data.frame is returned
Will use TFutils::gwascat_hg19_chr17 to look for 'MAPPED_GENE' field entries matching targets, also hardcoded to use org.Hs.eg.db to map symbols
if (interactive()) TFtargs()
tftColl: GSEABase GeneSetCollection for transcription factor targets
tftColl
GSEABase GeneSetCollection instance
run GSEABase::getGMT() on c3/TFT geneset collection from MSigDb
Broad Institute
TFutils::tftColl
tftCollMap: data.frame with information on MSigDb TFs for human
tftCollMap
data.frame
Annotation of TFs is ad-hoc. GeneSet names were tokenized, splitting by underscore, and then fragments were matched to SYMBOL and ALIAS elements of org.Hs.eg.db. Extracted March 2018
http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=TFT
head(TFutils::tftCollMap)
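The tokenize-and-match heuristic described for tftCollMap can be sketched in base R. The gene set names, the symbol vocabulary, and the helper first_symbol below are made up for illustration; the package matched fragments against SYMBOL and ALIAS elements of org.Hs.eg.db, which this toy does not attempt to reproduce.

```r
# Toy inputs: gene set names in the MSigDb style and a small symbol
# vocabulary. Both are assumptions made for this example.
set_names <- c("VDR_Q3", "STAT1_01", "UNKNOWN_Q6")
known_symbols <- c("VDR", "STAT1", "POU2F1")

# Hypothetical helper: split a name on underscores and return the first
# fragment that is a known symbol, or NA if none matches.
first_symbol <- function(nm, vocab) {
  tokens <- strsplit(nm, "_", fixed = TRUE)[[1]]
  hits <- tokens[tokens %in% vocab]
  if (length(hits) > 0) hits[1] else NA_character_
}

toy_map <- data.frame(
  nativeId = set_names,
  hgnc.heur = vapply(set_names, first_symbol, character(1),
                     vocab = known_symbols, USE.NAMES = FALSE),
  stringsAsFactors = FALSE)
toy_map
```

Names with no matching fragment end up with NA in the heuristic column, which is one reason such a mapping is described as ad hoc.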
Use MSigDB TF targets resource to find targets of input TF and find traits to which these targets have been mapped
topTraitsOfTargets(TFsym, gsc, gwcat, ntraits = 6, force = FALSE, ...)
TFsym |
character(1) symbol for a TF must be present in |
gsc |
an instance of |
gwcat |
instance of |
ntraits |
numeric(1) number of traits to report |
force |
logical see note, set to true if you want to skip mapping from TFsym to a specific motif or TF identifier used as name of a GeneSet in gsc |
... |
character() vector of fields in mcols(gwcat) to include |
data.frame
If tftCollMap[, "hgnc.heur"] does not possess the necessary symbol, set force = TRUE to use a known 'motif' name among names(gsc)
suppressPackageStartupMessages({
  library(GSEABase)
})
# more results if you substitute ebicat37 from gwascat below
topTraitsOfTargets("MTF1", tftColl, gwascat_hg19_chr17)
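The general idea of tabulating the traits mapped to a TF's targets can be illustrated with toy data in base R. This is a sketch under simplifying assumptions, not the logic of topTraitsOfTargets: the helper top_traits and all gene and trait values are hypothetical, and each MAPPED_GENE entry is assumed to hold a single symbol.

```r
# Toy stand-ins: target gene symbols and a small data.frame that mimics
# two GWAS catalog fields. All values are made up.
targets <- c("GENE_A", "GENE_B", "GENE_C")
toy_gwas <- data.frame(
  MAPPED_GENE = c("GENE_A", "GENE_B", "GENE_A", "GENE_D", "GENE_C", "GENE_B"),
  `DISEASE/TRAIT` = c("Trait 1", "Trait 1", "Trait 2",
                      "Trait 3", "Trait 1", "Trait 2"),
  check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: keep records whose mapped gene is a target, then
# count traits and keep the ntraits most frequent ones.
top_traits <- function(gwas, targets, ntraits = 6) {
  hits <- gwas[gwas$MAPPED_GENE %in% targets, , drop = FALSE]
  counts <- sort(table(hits[["DISEASE/TRAIT"]]), decreasing = TRUE)
  counts[seq_len(min(ntraits, length(counts)))]
}

res <- top_traits(toy_gwas, targets, ntraits = 2)
res
```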
utility to generate link to biocfound bucket for FIMO TFBS scores
URL_s3_tf(tag = "M3433")
tag |
character(1) token identifying TF, can be an HGNC gene name or Mnnnn PWM tag. It must be findable in TFutils::fimoMap table. |
character(1) URL
URL_s3_tf