Package: preciseTAD 1.17.0

Mikhail Dozmorov

preciseTAD: A machine learning framework for precise TAD boundary prediction

preciseTAD provides functions to predict the locations of boundaries of topologically associating domains (TADs) and chromatin loops at base-level resolution. As input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model to predict low-resolution domain boundaries. Applied at base-level resolution, the model predicts the probability that each base is a boundary. Density-based clustering and scalable partitioning techniques are then used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and are more conserved across cell lines. A pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for that cell line.
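The class-imbalance step mentioned above can be illustrated with a toy example. The following is a minimal base-R sketch of random under-sampling, assuming a binned data frame with a two-level label column; the data, the column names, and the helper `random_undersample` are hypothetical and are not taken from preciseTAD's own code.

```r
# Toy illustration of random under-sampling; NOT preciseTAD's internal code.
set.seed(42)

# Hypothetical binned data: 1000 genomic bins, roughly 5% labelled as boundaries.
bins <- data.frame(
  y = factor(ifelse(runif(1000) < 0.05, "Yes", "No"), levels = c("No", "Yes")),
  dist_to_ctcf = rexp(1000, rate = 1 / 50000)  # made-up distance feature (bp)
)

# Hypothetical helper: keep as many rows from each class as the rarest class has.
random_undersample <- function(df, label = "y") {
  counts <- table(df[[label]])
  n_min <- min(counts)
  keep <- unlist(lapply(names(counts), function(cls) {
    idx <- which(df[[label]] == cls)
    idx[sample.int(length(idx), n_min)]  # sample without replacement
  }))
  df[sort(keep), , drop = FALSE]
}

balanced <- random_undersample(bins)
print(table(bins$y))      # imbalanced class counts
print(table(balanced$y))  # equal class counts after under-sampling
```

Under-sampling discards majority-class bins, so it trades data volume for balance; whether that trade is appropriate depends on how many boundary bins are available.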

Authors: Spiro Stilianoudakis [aut], Mikhail Dozmorov [aut, cre]

preciseTAD_1.17.0.tar.gz
preciseTAD_1.17.0.zip (r-4.5), preciseTAD_1.17.0.zip (r-4.4), preciseTAD_1.17.0.zip (r-4.3)
preciseTAD_1.17.0.tgz (r-4.4-any), preciseTAD_1.17.0.tgz (r-4.3-any)
preciseTAD_1.17.0.tar.gz (r-4.5-noble), preciseTAD_1.17.0.tar.gz (r-4.4-noble)
preciseTAD_1.17.0.tgz (r-4.4-emscripten), preciseTAD_1.17.0.tgz (r-4.3-emscripten)
preciseTAD.pdf | preciseTAD.html
preciseTAD/json (API)
NEWS

# Install 'preciseTAD' in R:
install.packages('preciseTAD', repos = c('https://bioc.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/dozmorovlab/precisetad/issues

Datasets:
  • arrowhead_gm12878_5kb - Domain data from ARROWHEAD TAD-caller for GM12878 at 5 kb
  • tfbsList - A list of the chromosomal coordinates for 26 transcription factor binding sites from the GM12878 cell line
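The two bundled datasets can be inspected as follows. This is a hedged sketch: the `extractBoundaries` argument names and the `"CHR21"`/`"CHR22"` chromosome labels follow the package vignette as best recalled, so check `?extractBoundaries` before relying on them. The whole block is guarded so that it does nothing when preciseTAD is not installed.

```r
# Guarded: the block is skipped when preciseTAD is not installed.
if (requireNamespace("preciseTAD", quietly = TRUE)) {
  library(preciseTAD)

  data("arrowhead_gm12878_5kb")  # ARROWHEAD domain calls, GM12878, 5 kb
  data("tfbsList")               # transcription factor binding site coordinates

  print(head(arrowhead_gm12878_5kb))
  print(length(tfbsList))        # number of annotation tracks in the list
  print(names(tfbsList))

  # Assumed call signature (from the vignette): unique boundary coordinates
  # for two chromosomes at the 5 kb resolution of the domain data.
  bounds <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                              filter      = FALSE,
                              CHR         = c("CHR21", "CHR22"),
                              resolution  = 5000)
  print(bounds)
}
```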

On Bioconductor: preciseTAD-1.17.0 (bioc 3.21), preciseTAD-1.16.0 (bioc 3.20)

software, hic, sequencing, clustering, classification, functionalgenomics, featureextraction

5.29 score, 7 stars, 14 scripts, 155 downloads, 7 exports, 169 dependencies

Last updated 2 months ago from: 8d055cd88d. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Dec 02 2024
R-4.5-win | NOTE | Dec 02 2024
R-4.5-linux | NOTE | Dec 02 2024
R-4.4-win | NOTE | Dec 02 2024
R-4.4-mac | NOTE | Dec 02 2024
R-4.3-win | NOTE | Dec 02 2024
R-4.3-mac | NOTE | Dec 02 2024

Exports: bedToGRangesList, createTADdata, extractBoundaries, juicer_func, preciseTAD, TADrandomForest, TADrfe

Dependencies: abind, aCGH, affy, affyio, AnnotationDbi, askpass, base64enc, BH, Biobase, BiocGenerics, BiocIO, BiocManager, BiocParallel, Biostrings, bit, bit64, bitops, blob, bslib, cachem, caret, class, cli, clock, cluster, codetools, colorspace, commonmark, cpp11, crayon, curl, data.table, DBI, dbscan, DelayedArray, diagram, digest, DNAcopy, doSNOW, dplyr, e1071, fansi, farver, fastmap, fontawesome, foreach, formatR, fs, futile.logger, futile.options, future, future.apply, generics, GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicFeatures, GenomicRanges, ggplot2, globals, glue, gower, gtable, gtools, hardhat, htmltools, httpuv, httr, ipred, IRanges, isoband, iterators, jquerylib, jsonlite, KEGGREST, KernSmooth, labeling, lambda.r, later, lattice, lava, lifecycle, limma, listenv, lubridate, magrittr, MASS, Matrix, MatrixGenerics, matrixStats, mclust, memoise, mgcv, mime, ModelMetrics, multtest, munsell, nlme, nnet, numDeriv, openssl, org.Hs.eg.db, parallelly, pbapply, pillar, pkgconfig, plogr, plyr, png, preprocessCore, pROC, prodlim, progressr, promises, proxy, PRROC, purrr, R6, randomForest, rappdirs, rCGH, RColorBrewer, Rcpp, RCurl, recipes, reshape2, restfulr, Rhtslib, rjson, rlang, rpart, Rsamtools, RSQLite, rtracklayer, S4Arrays, S4Vectors, sass, scales, shape, shiny, snow, sourcetools, SparseArray, SQUAREM, statmod, stringi, stringr, SummarizedExperiment, survival, sys, tibble, tidyr, tidyselect, timechange, timeDate, TxDb.Hsapiens.UCSC.hg18.knownGene, TxDb.Hsapiens.UCSC.hg19.knownGene, TxDb.Hsapiens.UCSC.hg38.knownGene, tzdb, UCSC.utils, utf8, vctrs, viridisLite, withr, XML, xtable, XVector, yaml, zlibbioc

preciseTAD Vignette

Rendered from preciseTAD.Rmd using knitr::rmarkdown on Dec 02 2024.

Last update: 2021-09-28
Started: 2020-06-19

Readme and manuals

Help Manual

Help page | Topics
Domain data from ARROWHEAD TAD-caller for GM12878 at 5 kb | arrowhead_gm12878_5kb
Function to create a GRangesList object from functional genomic annotation data in the form of BED files | bedToGRangesList
Helper function used to create binary overlap type feature space | binary_func
Helper function used to create count overlap type feature space | count_func
Function to create a data matrix used for building a predictive model to classify boundary regions from functional genomic elements | createTADdata
Helper function used to create (log2) distance type feature space | distance_func
Function to extract boundaries from domain data. | extractBoundaries
Helper function for transforming a GRanges object into matrix form to be saved as .txt or .BED file and imported into juicer | juicer_func
Helper function used to create percent overlap type feature space | percent_func
Precise TAD boundary prediction at base-level resolution using density-based spatial clustering and partitioning techniques | preciseTAD
Helper function used to create signal type feature space | signal_func
A wrapper function passed to 'caret::train' to apply a random forest classification algorithm built and tested on user-defined binned domain data from 'createTADdata'. | TADrandomForest
A wrapper function passed to 'caret::rfe' to apply recursive feature elimination (RFE) on binned domain data as a feature reduction technique for random forests. Backward elimination is performed from p down to 2, by powers of 2, where p is the number of features in the data. | TADrfe
A list of the chromosomal coordinates for 26 transcription factor binding sites from the GM12878 cell line | tfbsList
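
The TADrfe entry describes backward elimination over feature subset sizes from p down to 2, by powers of 2. A minimal base-R sketch of what such a schedule could look like is given below. It assumes the full feature set p is always evaluated, even when p is not itself a power of 2; that detail is an assumption here, and `rfe_subset_sizes` is a hypothetical helper, not a preciseTAD function.

```r
# Hypothetical helper: subset sizes for backward elimination by powers of 2.
rfe_subset_sizes <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 2)
  powers <- 2^seq_len(floor(log2(p)))  # 2, 4, 8, ... up to the largest power of 2 <= p
  rev(unique(c(powers, p)))            # largest subset first; p added if not a power of 2
}

print(rfe_subset_sizes(26))  # sizes: 26, 16, 8, 4, 2
print(rfe_subset_sizes(16))  # sizes: 16, 8, 4, 2
```

With the 26 annotations in tfbsList and one feature per annotation, this schedule would evaluate five subset sizes rather than all 25 possible ones, which is the practical motivation for stepping by powers of 2.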