Title: | Informatics Tools for Cell-Free DNA Study |
---|---|
Description: | The cfTools R package provides methods for cell-free DNA (cfDNA) methylation data analysis to facilitate cfDNA-based studies. Given the methylation sequencing data of a cfDNA sample, for each cancer marker or tissue marker, we deconvolve the tumor-derived or tissue-specific reads from all reads falling in the marker region. Our read-based deconvolution algorithm exploits the pervasiveness of DNA methylation for signal enhancement and can therefore sensitively identify trace amounts of tumor-specific or tissue-specific cfDNA in plasma. cfTools provides functions for (1) cancer detection: sensitively detecting tumor-derived cfDNA and estimating the tumor-derived cfDNA fraction (tumor burden); and (2) tissue deconvolution: inferring the tissue-type composition and the cfDNA fraction of each of multiple tissue types for a plasma cfDNA sample. These functions can serve as foundations for more advanced cfDNA-based studies, including cancer diagnosis and disease monitoring. |
Authors: | Ran Hu [aut, cre] , Mary Louisa Stackpole [aut] , Shuo Li [aut] , Xianghong Jasmine Zhou [aut] , Wenyuan Li [aut] |
Maintainer: | Ran Hu |
License: | file LICENSE |
Version: | 1.7.0 |
Built: | 2024-12-18 03:11:26 UTC |
Source: | https://github.com/bioc/cfTools |
A list of methylation levels (e.g., beta values), where each row is a sample and each column is a marker
data("beta_matrix")
A tibble with 20 rows and 3 variables
Beta values of marker1 for all samples
Beta values of marker2 for all samples
Beta values of marker3 for all samples
Ran Hu
Detect tumor-derived cfDNA and estimate the tumor burden.
CancerDetector( readsBinningFile, tissueMarkersFile, lambda = 0.5, id = "sample" )
Argument | Description |
---|---|
readsBinningFile | a file of the fragment-level methylation states of reads that mapped to the markers. |
tissueMarkersFile | a file of paired shape parameters of beta distributions for markers. |
lambda | a number controlling "confounding" markers' distance from average markers. |
id | the sample ID. |
a list containing the cfDNA tumor burden and the normal cfDNA fraction.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
readsBinningFile <- file.path(demo.dir, "CancerDetector.reads.txt.gz")
tissueMarkersFile <- file.path(demo.dir, "CancerDetector.markers.txt.gz")
lambda <- 0.5
id <- "test"
CancerDetector(readsBinningFile, tissueMarkersFile, lambda, id)
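The following is a minimal base-R sketch of the read-based idea behind tumor-burden estimation, not the CancerDetector() implementation. It assumes a single marker whose methylation level follows a beta distribution with shape parameters (a, b) in tumor and in normal plasma, scores each fragment-level methylation string with a beta-binomial likelihood, and finds the tumor fraction theta that maximizes the two-component mixture likelihood. The names read_loglik() and estimate_tumor_fraction() are hypothetical; the real function combines many markers and uses lambda, which this sketch ignores.

```r
# Sketch only (not cfTools code): one marker, Beta(a, b) methylation model
# for tumor and for normal plasma reads.

# Log-probability of one read's CpG calls ("1" = methylated, "0" = unmethylated)
# under a beta-binomial model with shape parameters (a, b). No binomial
# coefficient: the exact call sequence is observed, and the coefficient would
# cancel between the two classes anyway.
read_loglik <- function(states, a, b) {
  calls <- as.integer(strsplit(states, "")[[1]])
  k <- sum(calls)
  n <- length(calls)
  lbeta(a + k, b + n - k) - lbeta(a, b)
}

# Maximum-likelihood tumor fraction theta: each read is assumed to come from
# tumor with probability theta and from normal plasma otherwise.
estimate_tumor_fraction <- function(states, tumor_ab, normal_ab) {
  lt <- vapply(states, read_loglik, numeric(1), a = tumor_ab[1], b = tumor_ab[2])
  ln <- vapply(states, read_loglik, numeric(1), a = normal_ab[1], b = normal_ab[2])
  negloglik <- function(theta) {
    # log(theta * exp(lt) + (1 - theta) * exp(ln)) via log-sum-exp
    x <- log(theta) + lt
    y <- log1p(-theta) + ln
    m <- pmax(x, y)
    -sum(m + log(exp(x - m) + exp(y - m)))
  }
  optimize(negloglik, interval = c(0, 1))$minimum
}
```

For example, with two fully methylated reads ("1111") and eight unmethylated reads ("0000"), tumor parameters c(9, 1) and normal parameters c(1, 9), the estimate is close to 0.2, the share of hypermethylated reads.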
The paired shape parameters of beta distributions for cancer-specific markers
data("CancerDetector.markers")
A tibble with 1266 rows and 3 variables
Name of the marker
Paired beta distribution shape parameters for tumor samples
Paired beta distribution shape parameters for normal plasma samples
Ran Hu
The fragment-level methylation states of reads that mapped to the cancer-specific markers
data("CancerDetector.reads")
A tibble with 9991 rows and 2 variables
Name of the marker
Fragment-level methylation states, which are represented by a sequence of binary values (0 represents unmethylated CpG and 1 represents methylated CpG on the same fragment)
Ran Hu
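To illustrate the binary encoding of fragment-level methylation states, here is a short sketch that pools the CpG calls of all reads per marker into an average methylation level, weighting each CpG call equally. The column names marker and states, and the function name marker_meth_level(), are hypothetical placeholders for the two variables described above.

```r
# Sketch only: average methylation per marker from 0/1 methylation strings.
marker_meth_level <- function(reads) {
  calls <- strsplit(as.character(reads$states), "")
  n_meth <- vapply(calls, function(s) sum(s == "1"), integer(1))  # methylated CpGs per read
  n_total <- lengths(calls)                                       # CpGs per read
  # pooled fraction of methylated CpG calls for each marker
  tapply(n_meth, reads$marker, sum) / tapply(n_total, reads$marker, sum)
}
```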
Infer the tissue-type composition of plasma cfDNA.
cfDeconvolve( readsBinningFile, tissueMarkersFile, numTissues, emAlgorithmType = "em.global.unknown", likelihoodRatioThreshold = 2, emMaxIterations = 100, randomSeed = 0, id = "sample" )
Argument | Description |
---|---|
readsBinningFile | a file of the fragment-level methylation states of reads that mapped to the markers, either in plain text or compressed form. |
tissueMarkersFile | a file of paired shape parameters of beta distributions for markers. |
numTissues | the number of tissue types. |
emAlgorithmType | the read-based tissue deconvolution EM algorithm type: em.global.unknown (default), em.global.known, em.local.unknown or em.local.known. |
likelihoodRatioThreshold | a positive floating-point likelihood-ratio threshold. Default is 2. |
emMaxIterations | the maximum number of EM iterations. Default is 100. |
randomSeed | a random seed used to initialize the EM algorithm. Default is 0. |
id | the sample ID. |
a list containing the cfDNA fractions of different tissue types and an unknown class.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
readsBinningFile <- file.path(demo.dir, "cfDeconvolve.reads.txt.gz")
tissueMarkersFile <- file.path(demo.dir, "cfDeconvolve.markers.txt.gz")
numTissues <- 7
emAlgorithmType <- "em.global.unknown"
likelihoodRatioThreshold <- 2
emMaxIterations <- 100
randomSeed <- 0
id <- "test"
cfDeconvolve(readsBinningFile, tissueMarkersFile, numTissues, emAlgorithmType, likelihoodRatioThreshold, emMaxIterations, randomSeed, id)
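A minimal sketch of the expectation-maximization (EM) update that read-based tissue deconvolution relies on, written in base R. It assumes a precomputed matrix of per-read likelihoods under each tissue (for instance from per-tissue beta-binomial models) and estimates mixture fractions only. It does not model the unknown class, the likelihood-ratio filtering, or the global/local variants offered by cfDeconvolve(), and it starts from uniform fractions instead of a random seed. em_tissue_fractions() is a hypothetical name.

```r
# Sketch only (not cfTools code): EM for tissue mixture fractions.
# lik: reads x tissues matrix, lik[i, j] = P(read i | tissue j).
em_tissue_fractions <- function(lik, max_iter = 100, tol = 1e-8) {
  n_tissues <- ncol(lik)
  theta <- rep(1 / n_tissues, n_tissues)  # uniform start for simplicity
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that read i came from tissue j
    weighted <- sweep(lik, 2, theta, `*`)
    resp <- weighted / rowSums(weighted)
    # M-step: updated fractions are the mean responsibilities per tissue
    new_theta <- colMeans(resp)
    converged <- max(abs(new_theta - theta)) < tol
    theta <- new_theta
    if (converged) break
  }
  setNames(theta, colnames(lik))
}
```

With three reads that are far more likely under tissue 1 and one read far more likely under tissue 2, the estimated fractions are close to 0.75 and 0.25.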
The paired shape parameters of beta distributions for tissue-specific markers
data("cfDeconvolve.markers")
A tibble with 10 rows and 8 variables
Name of the marker
Paired beta distribution shape parameters for tissue1 samples
Paired beta distribution shape parameters for tissue2 samples
Paired beta distribution shape parameters for tissue3 samples
Paired beta distribution shape parameters for tissue4 samples
Paired beta distribution shape parameters for tissue5 samples
Paired beta distribution shape parameters for tissue6 samples
Paired beta distribution shape parameters for tissue7 samples
Ran Hu
The fragment-level methylation states of reads that mapped to the tissue-specific markers
data("cfDeconvolve.reads")
A tibble with 942 rows and 2 variables
Name of the marker
Fragment-level methylation states, which are represented by a sequence of binary values (0 represents unmethylated CpG and 1 represents methylated CpG on the same fragment)
Ran Hu
Tissue deconvolution of cfDNA using deep neural network (DNN) models.
cfSort(readsBinningFile, id = "sample")
Argument | Description |
---|---|
readsBinningFile | a compressed file of the fragment-level methylation states of reads that mapped to the cfSort markers. |
id | the sample ID. |
the tissue composition of the cfDNA sample.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
readsBinningFile <- file.path(demo.dir, "cfsort_reads.txt.gz")
id <- "test"
cfSort(readsBinningFile, id)
Marker information for the cfSort function, where each row is the information about a marker
data("cfsort_markers")
A tibble with 51035 rows and 4 variables
The marker index used in the cfSort method
The alpha threshold for each marker
The pair of tissues used for identifying the marker
The group number for each marker
Ran Hu
The fragment-level methylation states of reads that mapped to the cfSort markers
data("cfsort_reads")
A tibble with 99999 rows and 2 variables
Name of the cfSort marker
Fragment-level methylation states, which are represented by a sequence of binary values (0 represents unmethylated CpG and 1 represents methylated CpG on the same fragment)
Ran Hu
Methylation information for CpG on the original bottom strand (OB), which is one of the outputs from 'bismark methylation extractor'
data("CpG_OB_demo")
A tibble with 2224 rows and 5 variables
ID of the sequence
Methylated or unmethylated CpG site
Chromosome name
Chromosome start position
Methylation call
Ran Hu
Methylation information for CpG on the original top strand (OT), which is one of the outputs from 'bismark methylation extractor'
data("CpG_OT_demo")
A tibble with 2556 rows and 5 variables
ID of the sequence
Methylated or unmethylated CpG site
Chromosome name
Chromosome start position
Methylation call
Ran Hu
A BED file of fragment-level methylation information
data("demo.fragment_level.meth.bed")
A tibble with 552 rows and 9 variables
Chromosome
Chromosome start
Chromosome end
ID of the sequence
Fragment length
Strand
Number of CpG sites on the fragment
Positions of CpG sites on the fragment
A string of methylation states of CpG sites on the fragment
Ran Hu
A BED file of fragment-level information
data("demo.refo_frag.bed")
A tibble with 559 rows and 6 variables
Chromosome
Chromosome start
Chromosome end
Fragment length
Strand
ID of the sequence
Ran Hu
A BED file of methylation information on fragments
data("demo.refo_meth.bed")
A tibble with 552 rows and 8 variables
Chromosome
Start position of first CpG on the fragment
End position of first CpG on the fragment
Strand
Number of CpG sites on the fragment
Positions of CpG sites on the fragment
A string of methylation states of CpG sites on the fragment
ID of the sequence
Ran Hu
Paired-end sequencing reads information
data("demo.sorted.bed")
A tibble with 1117 rows and 6 variables
Chromosome name
Chromosome start
Chromosome end
Sequence ID
Mapping quality score
Strand
Ran Hu
Join two lists containing the fragment information and the methylation states on each fragment into one list.
GenerateFragMeth(frag_bed, meth_bed, output.dir = "", id = "")
Argument | Description |
---|---|
frag_bed | a list in BED format containing information for every fragment, e.g., the output of MergePEReads() or a BED file read with read.delim(). |
meth_bed | a list in BED format containing the methylation states on every fragment, e.g., the output of MergeCpGs() or a BED file read with read.delim(). |
output.dir | a path to the output directory. Default is "", which means the output will not be written into a file. |
id | an ID name for the input data. Default is "", which means the output will not be written into a file. |
a list in BED format, which can also be written to an output BED file.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
frag_bed <- read.delim(file.path(demo.dir, "demo.refo_frag.bed.txt.gz"), colClasses = "character")
meth_bed <- read.delim(file.path(demo.dir, "demo.refo_meth.bed.txt.gz"), colClasses = "character")
output <- GenerateFragMeth(frag_bed, meth_bed)
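A base-R sketch of the join, assuming both tables share a read-ID column named id and use the hypothetical column names shown; it is not the GenerateFragMeth() source. An inner join drops fragments that carry no CpG record, which is consistent with the demo data (559 rows in demo.refo_frag.bed, 552 rows in demo.fragment_level.meth.bed), although that is an observation, not a documented rule. The selected columns mirror the nine variables listed for demo.fragment_level.meth.bed.

```r
# Sketch only: join fragment records with per-fragment CpG records by read ID.
# Column names (chr, start, end, length, strand, id, n_cpg, positions, meth)
# are hypothetical.
join_frag_meth <- function(frag, meth) {
  # inner join on id: fragments without a CpG record are dropped
  j <- merge(frag, meth[, c("id", "n_cpg", "positions", "meth")], by = "id")
  # reorder to chr, start, end, ID, length, strand, #CpGs, positions, states
  j[, c("chr", "start", "end", "id", "length", "strand",
        "n_cpg", "positions", "meth")]
}
```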
Output paired shape parameters of beta distributions for methylation markers.
GenerateMarkerParam(x, sample.types, marker.names, output.file = "")
Argument | Description |
---|---|
x | a list of methylation levels (e.g., beta values), where each row is a sample and each column is a marker. |
sample.types | a vector of sample types (e.g., tumor or normal, tissue types) corresponding to the rows of the list. |
marker.names | a vector of marker names corresponding to the columns of the list. |
output.file | a character string naming the output file. Default is "", which means the output will not be written into a file. |
a list containing the paired shape parameters of beta distributions for markers and/or written to an output file.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
methLevel <- read.table(file.path(demo.dir, "beta_matrix.txt.gz"), row.names = 1, header = TRUE)
sampleTypes <- read.table(file.path(demo.dir, "sample_type.txt.gz"), row.names = 1, header = TRUE)$sampleType
markerNames <- read.table(file.path(demo.dir, "marker_index.txt.gz"), row.names = 1, header = TRUE)$markerIndex
output <- GenerateMarkerParam(methLevel, sampleTypes, markerNames)
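One common way to obtain paired beta shape parameters is the method of moments. With sample mean m and variance v of a marker's beta values, shape1 + shape2 = m(1 - m)/v - 1, shape1 = m(shape1 + shape2) and shape2 = (1 - m)(shape1 + shape2). The sketch below applies this per marker and per sample type. It is an illustration only: whether GenerateMarkerParam() uses moments or maximum likelihood is not stated here, and the output layout (a named list of 2 x marker matrices) is not the package's file format. fit_beta_mom() and fit_marker_params() are hypothetical names.

```r
# Sketch only: method-of-moments fit of Beta(shape1, shape2).
fit_beta_mom <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  v <- var(x)                   # needs at least 2 values and v > 0
  k <- m * (1 - m) / v - 1      # = shape1 + shape2
  c(shape1 = m * k, shape2 = (1 - m) * k)
}

# Fit every marker (column) separately within each sample type.
# Returns a named list with one 2 x markers matrix per sample type.
fit_marker_params <- function(x, sample.types) {
  x <- as.matrix(x)
  sapply(unique(sample.types), function(type) {
    apply(x[sample.types == type, , drop = FALSE], 2, fit_beta_mom)
  }, simplify = FALSE)
}
```

For example, fit_beta_mom(c(0.2, 0.4, 0.6)) gives m = 0.4 and v = 0.04, hence shape1 = 2 and shape2 = 3.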
A vector of marker names corresponding to the columns of the list of methylation levels.
data("marker_index")
A tibble with 3 rows and 1 variable
Marker name
Ran Hu
A BED file of genomic regions of markers
data("markers.bed")
A tibble with 3 rows and 4 variables
Chromosome
Chromosome start
Chromosome end
Marker name
Ran Hu
Merge the methylation states of all CpGs corresponding to the same fragment onto one line in output.
MergeCpGs(CpG_OT, CpG_OB, output.dir = "", id = "")
Argument | Description |
---|---|
CpG_OT | a file of methylation information for CpG on the original top strand (OT), which is one of the outputs from 'bismark methylation extractor'. |
CpG_OB | a file of methylation information for CpG on the original bottom strand (OB), which is one of the outputs from 'bismark methylation extractor'. |
output.dir | a path to the output directory. Default is "", which means the output will not be written into a file. |
id | an ID name for the input data. Default is "", which means the output will not be written into a file. |
a list in BED format, which can also be written to an output BED file.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
CpG_OT <- file.path(demo.dir, "CpG_OT_demo.txt.gz")
CpG_OB <- file.path(demo.dir, "CpG_OB_demo.txt.gz")
output <- MergeCpGs(CpG_OT, CpG_OB)
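A base-R sketch of the merging step, assuming the OT and OB tables have been read with the five columns listed for CpG_OT_demo and CpG_OB_demo, under the hypothetical names id, state, chr, pos and call, and that state is "+" for methylated and "-" for unmethylated CpGs (the Bismark convention). It groups calls by read ID, sorts them by position and encodes them as a 0/1 string; the exact output columns of MergeCpGs() may differ, and merge_cpgs() is a hypothetical name.

```r
# Sketch only: collapse per-CpG Bismark calls into one row per read.
merge_cpgs <- function(calls) {
  calls <- calls[order(calls$id, calls$pos), ]     # sort by read, then position
  per_read <- lapply(split(calls, calls$id), function(r) {
    data.frame(chr = r$chr[1],
               start = min(r$pos),                 # first CpG on the read
               n_cpg = nrow(r),
               positions = paste(r$pos, collapse = ","),
               meth = paste(ifelse(r$state == "+", "1", "0"), collapse = ""),
               id = r$id[1])
  })
  out <- do.call(rbind, per_read)
  rownames(out) <- NULL
  out
}
```

OT and OB calls can be stacked before merging, e.g. merge_cpgs(rbind(ot, ob)), since each read ID belongs to one strand.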
Merge a BED file of paired-end sequencing reads (the output of 'bedtools bamtobed') into fragment-level records.
MergePEReads(bed_file, output.dir = "", id = "")
Argument | Description |
---|---|
bed_file | a (sorted) BED file of paired-end reads. |
output.dir | a path to the output directory. Default is "", which means the output will not be written into a file. |
id | an ID name for the input data. Default is "", which means the output will not be written into a file. |
a list in BED format, which can also be written to an output BED file.
library(cfTools)
## input files
demo.dir <- system.file("data", package = "cfTools")
PEReads <- file.path(demo.dir, "demo.sorted.bed.txt.gz")
output <- MergePEReads(PEReads)
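A base-R sketch of the idea, assuming the six columns listed for demo.sorted.bed are named chr, start, end, name, mapq and strand (hypothetical names), and that mate names end in /1 and /2 as bedtools bamtobed typically reports for paired reads. Each fragment spans from the smallest start to the largest end of its two mates; keeping the strand of mate /1 and dropping unpaired reads are assumptions, not documented behavior of MergePEReads(). merge_pe_reads() is a hypothetical name.

```r
# Sketch only: collapse mate pairs from a BED of paired-end reads into fragments.
merge_pe_reads <- function(reads) {
  reads$id <- sub("/[12]$", "", reads$name)   # strip the /1 or /2 mate suffix
  frags <- lapply(split(reads, reads$id), function(r) {
    if (nrow(r) != 2) return(NULL)            # skip reads without exactly one mate
    s <- min(r$start)
    e <- max(r$end)
    data.frame(chr = r$chr[1], start = s, end = e,
               length = e - s,                # BED half-open coordinates
               strand = r$strand[order(r$name)][1],  # strand of mate /1
               id = r$id[1])
  })
  out <- do.call(rbind, frags)
  rownames(out) <- NULL
  out
}
```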
A vector of sample types (e.g., tumor or normal, tissue types) corresponding to the rows of the list of methylation levels.
data("sample_type")
A tibble with 20 rows and 1 variable
Sample type
Ran Hu