Package 'HiCDCPlus' reference manual

Title:	Hi-C Direct Caller Plus
Description:	Systematic 3D interaction calls and differential analysis for Hi-C and HiChIP. The HiC-DC+ (Hi-C/HiChIP direct caller plus) package enables principled statistical analysis of Hi-C and HiChIP data sets – including calling significant interactions within a single experiment and performing differential analysis between conditions given replicate experiments – to facilitate global integrative studies. HiC-DC+ estimates significant interactions in a Hi-C or HiChIP experiment directly from the raw contact matrix for each chromosome up to a specified genomic distance, binned by uniform genomic intervals or restriction enzyme fragments, by training a background model to account for random polymer ligation and systematic sources of read count variation.
Authors:	Merve Sahin [cre, aut]
Maintainer:	Merve Sahin <[email protected]>
License:	GPL-3
Version:	1.15.0
Built:	2025-03-19 05:08:55 UTC
Source:	https://github.com/bioc/HiCDCPlus

add_1D_features

Description

Adds 1D features to the gi_list instance. If any bin on gi_list overlaps with multiple feature records, feature values are aggregated for the bin according to the vector valued function agg (e.g., sum, mean)

Usage

add_1D_features(gi_list, df, chrs = NULL, features = NULL, agg = mean)
add_1D_features(gi_list, df, chrs = NULL, features = NULL, agg = mean)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intrachromosomal interaction information (see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`df`	DataFrame with columns named 'chr', and'start' and features to be added with their respective names.
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes specified in the data frame `df`.
`features`	features to be added. Needs to be a subset of `colnames(df)`. Defaults to all columns in `df` other than 'chr','start',and 'end'.
`agg`	any vector valued function with one data argument: defaults to `mean`.

Value

a gi_list instance with 1D features stored in regions metadata handle of each list element (e.g., gi_list[[1]]@regions@elementMetadata) in the instance

Examples

df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),end=seq(2e6,11e6,1e6))
gi_list<-generate_df_gi_list(df)
feats<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),gc=runif(10))
gi_list<-add_1D_features(gi_list,feats)
df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),end=seq(2e6,11e6,1e6))
gi_list<-generate_df_gi_list(df)
feats<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),gc=runif(10))
gi_list<-add_1D_features(gi_list,feats)

add_2D_features

Description

Adds 2D features to a gi_list instance. If any bin on gi_list overlaps with multiple feature records, features are aggregated among matches according to the univariate vector valued function agg (e.g., sum, mean). For efficient use of memory, using add/expand 1D features (see ?add_1D_features and expand_1D_features) in sequence is recommended instead of using add_2D_features directly for each chromosome.

Usage

add_2D_features(gi, df, features = NULL, agg = sum)
add_2D_features(gi, df, features = NULL, agg = sum)

Arguments

`gi`	Element of a valid `gi_list` instance (restricted to a single chromosome e.g., `gi_list[['chr9']]`—see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`df`	data frame for a single chromosome containing columns named chr, startI and startJ and features to be added with their respective names (if df contains multiple chromosomes, you can convert it into a list of smaller data.frames for each chromosome and apply this function with `sapply`).
`features`	features to be added. Needs to be subset of `colnames(df)`. Defaults to all columns in `df` other than 'chr','start',and 'end'.
`agg`	any vector valued function with one data argument: defaults to `mean`.

Value

a gi_list element with 2D features stored in metadata handle (i.e., mcols(gi)).

Examples

df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6))
gi_list<-generate_df_gi_list(df,Dthreshold=500e3)
feats<-data.frame(chr='chr9',
startI=seq(1e6,10e6,1e6),startJ=seq(1e6,10e6,1e6),counts=rpois(10,lambda=5))
gi_list[['chr9']]<-add_2D_features(gi_list[['chr9']],feats)
df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6))
gi_list<-generate_df_gi_list(df,Dthreshold=500e3)
feats<-data.frame(chr='chr9',
startI=seq(1e6,10e6,1e6),startJ=seq(1e6,10e6,1e6),counts=rpois(10,lambda=5))
gi_list[['chr9']]<-add_2D_features(gi_list[['chr9']],feats)

add_hic_counts

Description

This function adds counts from a .hic file into a valid, binned, gi_list instance.

Usage

add_hic_counts(gi_list, hic_path, chrs = NULL, add_inter = FALSE)
add_hic_counts(gi_list, hic_path, chrs = NULL, add_inter = FALSE)

Arguments

`gi_list`	valid, uniformly binned gi_list instance. See `?gi_list_validate` and `gi_list_binsize_detect` for details.
`hic_path`	path to the .hic file
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list` instance.
`add_inter`	Interchromosomal interaction counts added as a 1D feature named 'inter' on regions metadata handle of each gi_list element (e.g., `gi_list[[1]]@regions@elementMetadata` or not; default FALSE

Value

gi_list instance with counts on the metadata (e.g., mcols(gi_list[[1]]) handle on each list element, and 'inter' on regions metadata handle of each element if add_inter=TRUE.

Examples

gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))

add_hicpro_allvalidpairs_counts

Description

This function converts HiC-Pro outputs in allValidPairs format into a gi_list instance.

Usage

add_hicpro_allvalidpairs_counts(
  gi_list,
  allvalidpairs_path,
  chrs = NULL,
  binned = TRUE,
  add_inter = FALSE
)
add_hicpro_allvalidpairs_counts(
  gi_list,
  allvalidpairs_path,
  chrs = NULL,
  binned = TRUE,
  add_inter = FALSE
)

Arguments

`gi_list`	valid gi_list instance. See `?gi_list_validate` for details. You can also detect whether a gi_list instance is uniformly binned, along with its bin size using `gi_list_binsize_detect`.
`allvalidpairs_path`	allValidPairsfile obtained from HiC-Pro (e.g., 'GSM2572593_con_rep1.allvalidPairs.txt')
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list` instance.
`binned`	TRUE if the gi_list instance is uniformly binned (helps faster execution). Defaults to TRUE.
`add_inter`	Interchromosomal interaction counts added as a 1D feature named 'inter' on regions metadata handle of each gi_list element (e.g., `gi_list[[1]]@regions@elementMetadata` or not; default FALSE

Value

gi_list instance with counts on the metadata (e.g., mcols(gi_list[[1]]) handle on each list element, and 'inter' on regions metadata handle of each element if add_inter=TRUE.

add_hicpro_matrix.counts

Description

This function converts HiC-Pro matrix and bed outputs into a gi_list instance.

Usage

add_hicpro_matrix_counts(
  gi_list,
  absfile_path,
  matrixfile_path,
  chrs = NULL,
  add_inter = FALSE
)
add_hicpro_matrix_counts(
  gi_list,
  absfile_path,
  matrixfile_path,
  chrs = NULL,
  add_inter = FALSE
)

Arguments

`gi_list`	valid, uniformly binned gi_list instance. See `?gi_list_validate` and `gi_list_binsize_detect` for details.
`absfile_path`	absfile BED out of HiC-Pro (e.g., 'rawdata_10000_abs.bed')
`matrixfile_path`	matrix count file out of HiC-Pro (e.g., 'rawdata_10000.matrix')
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list` instance.
`add_inter`	Interchromosomal interaction counts added as a 1D feature named 'inter' on regions metadata handle of each gi_list element (e.g., `gi_list[[1]]@regions@elementMetadata` or not; default FALSE

Value

gi_list instance with counts on the metadata (e.g., mcols(gi_list[[1]]) handle on each list element, and 'inter' on regions metadata handle of each element if add_inter=TRUE.

construct_features

Description

This function lists all restriction enzyme cutsites of a given genome and genome version with genomic features outlined in Carty et al. (2017) https://www.nature.com/articles/ncomms15454; GC content, mappability, and effective length

Usage

construct_features(
  output_path,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  chrs = NULL,
  feature_type = "RE-based"
)
construct_features(
  output_path,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  chrs = NULL,
  feature_type = "RE-based"
)

Arguments

`output_path`	the path to the folder and name prefix you want to place feature files into. The feature file will have the suffix '_bintolen.txt.gz'.
`gen`	name of the species: e.g., default `'Hsapiens'`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`.
`sig`	restriction enzyme cut pattern (or a vector of patterns; e.g., 'GATC' or c('GATC','GANTC')).
`bin_type`	'Bins-uniform' if uniformly binned by binsize in bp, or 'Bins-RE-sites' if binned by number of restriction enzyme fragments.
`binsize`	binsize in bp if bin_type='Bins-uniform' (or number of RE fragment cut sites if bin_type='Bins-RE-sites'), defaults to 5000.
`wg_file`	path to the bigWig file containing mappability values across the genome of interest.
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the genome specified.
`feature_type`	'RE-based' if features are to be computed based on restriction enzyme fragments. 'RE-agnostic' ignores restriction enzyme cutsite information and computes features gc and map based on binwide averages. bin_type has to be 'Bins-uniform' if `feature_type='RE-agnostic'`.

Value

a features 'bintolen' file that contains GC, mappability and length features.

Examples

outdir<-paste0(tempdir(check=TRUE),'/')
construct_features(output_path=outdir,gen='Hsapiens',
gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',binsize=100000,
wg_file=NULL,chrs=c('chr21'))
outdir<-paste0(tempdir(check=TRUE),'/')
construct_features(output_path=outdir,gen='Hsapiens',
gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',binsize=100000,
wg_file=NULL,chrs=c('chr21'))

construct_features_chr

Description

This function lists all restriction enzyme cutsites of a given genome and genome version with genomic features outlined in Carty et al. (2017) for a single chromosome. https://www.nature.com/articles/ncomms15454; GC content, mappability, and effective length

Usage

construct_features_chr(
  chrom,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  feature_type = "RE-based"
)
construct_features_chr(
  chrom,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  feature_type = "RE-based"
)

Arguments

`chrom`	select a chromosome.
`gen`	name of the species: e.g., default `'Hsapiens'`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`.
`sig`	restriction enzyme cut pattern (or a vector of patterns; e.g., 'GATC' or c('GATC','GANTC')).
`bin_type`	'Bins-uniform' if uniformly binned by binsize in bp, or 'Bins-RE-sites' if binned by number of restriction enzyme fragments.
`binsize`	binsize in bp if bin_type='Bins-uniform' (or number of RE fragment cut sites if bin_type='Bins-RE-sites'), defaults to 5000.
`wg_file`	path to the bigWig file containing mappability values across the genome of interest.
`feature_type`	'RE-based' if features are to be computed based on restriction enzyme fragments. 'RE-agnostic' ignores restriction enzyme cutsite information and computes features gc and map based on binwide averages. bin_type has to be 'Bins-uniform' if `feature_type='RE-agnostic'`.

Value

a features 'bintolen' file that contains GC, mappability and length features.

Examples

df<-construct_features_chr(chrom='chr22',
gen='Hsapiens', gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',
binsize=100000,wg_file=NULL)
df<-construct_features_chr(chrom='chr22',
gen='Hsapiens', gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',
binsize=100000,wg_file=NULL)

construct_features_parallel

Description

Usage

construct_features_parallel(
  output_path,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  chrs = NULL,
  feature_type = "RE-based",
  ncore = NULL
)
construct_features_parallel(
  output_path,
  gen = "Hsapiens",
  gen_ver = "hg19",
  sig = "GATC",
  bin_type = "Bins-uniform",
  binsize = 5000,
  wg_file = NULL,
  chrs = NULL,
  feature_type = "RE-based",
  ncore = NULL
)

Arguments

`output_path`	the path to the folder and name prefix you want to place feature files into. The feature file will have the suffix '_bintolen.txt.gz'.
`gen`	name of the species: e.g., default `'Hsapiens'`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`.
`sig`	restriction enzyme cut pattern (or a vector of patterns; e.g., 'GATC' or c('GATC','GANTC')).
`bin_type`	'Bins-uniform' if uniformly binned by binsize in bp, or 'Bins-RE-sites' if binned by number of restriction enzyme fragments.
`binsize`	binsize in bp if bin_type='Bins-uniform' (or number of RE fragment cut sites if bin_type='Bins-RE-sites'), defaults to 5000.
`wg_file`	path to the bigWig file containing mappability values across the genome of interest.
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the genome specified.
`feature_type`	'RE-based' if features are to be computed based on restriction enzyme fragments. 'RE-agnostic' ignores restriction enzyme cutsite information and computes features gc and map based on binwide averages. bin_type has to be 'Bins-uniform' if `feature_type='RE-agnostic'`.
`ncore`	Number of cores to parallelize. Defaults to `parallel::detectCores()-1`.

Value

a features 'bintolen' file that contains GC, mappability and length features.

Examples

outdir<-paste0(tempdir(check=TRUE),'/')
construct_features_parallel(output_path=outdir,gen='Hsapiens',
gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',binsize=100000,
wg_file=NULL,chrs=c('chr21'),ncore=2)
outdir<-paste0(tempdir(check=TRUE),'/')
construct_features_parallel(output_path=outdir,gen='Hsapiens',
gen_ver='hg19',sig=c('GATC','GANTC'),bin_type='Bins-uniform',binsize=100000,
wg_file=NULL,chrs=c('chr21'),ncore=2)

expand_1D_features

Description

Expands 1D features on the regions metadata handle of each list element (e.g., gi_list[[1]]@regions@elementMetadata) to the to 2D metadata e.g., mcols(gi_list[[1]])). Two feature values corresponding to each anchor is summarized as a score using a vector valued function agg that takes two vector valued arguments of the same size and outputs a vector of the same size as the input vectors. This defaults to the transform.vec function outlined in (Carty et al., 2017). For efficient use of memory, using add/expand 1D features (see ?add_1D_features and expand_1D_features) in sequence is recommended instead of using add_2D_features directly for each chromosome.

Usage

expand_1D_features(gi_list, chrs = NULL, features = NULL, agg = transform.vec)
expand_1D_features(gi_list, chrs = NULL, features = NULL, agg = transform.vec)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intra-chromosomal interaction information (see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list` instance.
`features`	features to be added. Defaults to all 1D features in elements of `gi_list[[1]]@regions@elementMetadata`
`agg`	any vector valued function with two data arguments: defaults to `transform.vec` described in HiC-DC (Carty et al., 2017).

Value

a gi_list element with 2D features stored in metadata handle (i.e., mcols(gi)).

Examples

df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),end=seq(2e6,11e6,1e6))
gi_list<-generate_df_gi_list(df)
feats<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),gc=runif(10))
gi_list<-add_1D_features(gi_list,feats)
gi_list<-expand_1D_features(gi_list)
df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),end=seq(2e6,11e6,1e6))
gi_list<-generate_df_gi_list(df)
feats<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6),gc=runif(10))
gi_list<-add_1D_features(gi_list,feats)
gi_list<-expand_1D_features(gi_list)

extract_hic_eigenvectors

Description

This function uses Juicer command line tools to extract first eigenvectors across chromosomes from counts data in a .hic file and outputs them to text file of the structure chr start end score where the score column contains the eigenvector elements.

Usage

extract_hic_eigenvectors(
  hicfile,
  mode = "KR",
  binsize = 250e3,
  chrs = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)
extract_hic_eigenvectors(
  hicfile,
  mode = "KR",
  binsize = 250e3,
  chrs = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)

Arguments

`hicfile`	path to the input .hic file.
`mode`	Normalization mode to extract first eigenvectors from Allowable options are: 'NONE' for raw (normalized counts if .hic file is written using `hicdc2hic` or `hic2icenorm_gi_list`), 'KR' for Knight-Ruiz normalization, 'VC' for Vanilla-Coverage normalization and 'VC_SQRT' for square root vanilla coverage. Defaults to 'KR'.
`binsize`	the uniform binning size for compartment scores in bp. Defaults to 100e3.
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes except "Y", and "M" for the specified `gen` and `gen_ver`.
`gen`	name of the species: e.g., default `'Hsapiens'`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`.

Value

path to the eigenvector text files for each chromosome containing chromosome, start, end and compartment score values that may need to be flipped signs for each chromosome. File paths follow gsub('.hic','_<chromosome>_eigenvectors.txt',hicfile)

Examples

eigenvector_filepaths<-extract_hic_eigenvectors(
hicfile=system.file("extdata", "eigenvector_example.hic",
package = "HiCDCPlus"),
chrs=c("chr22"),binsize=250e3,mode="NONE")
eigenvector_filepaths<-extract_hic_eigenvectors(
hicfile=system.file("extdata", "eigenvector_example.hic",
package = "HiCDCPlus"),
chrs=c("chr22"),binsize=250e3,mode="NONE")

generate_binned_gi_list

Description

Generates a valid uniformly binned gi_list instance.

Usage

generate_binned_gi_list(
  binsize,
  chrs = NULL,
  Dthreshold = 2e+06,
  gen = "Hsapiens",
  gen_ver = "hg19"
)
generate_binned_gi_list(
  binsize,
  chrs = NULL,
  Dthreshold = 2e+06,
  gen = "Hsapiens",
  gen_ver = "hg19"
)

Arguments

`binsize`	Desired binsize in bp, e.g., 5000, 25000.
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes except "Y", and "M" for the specified `gen` and `gen_ver`.
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is smaller.
`gen`	name of the species: e.g., default `'Hsapiens'`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`.

Value

a valid, uniformly binned gi_list instance.

Examples

gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')

generate_bintolen_gi_list

Description

Generates a gi_list instance from a bintolen file generated by generate.features (see ?generate.features) for details).

Usage

generate_bintolen_gi_list(
  bintolen_path,
  chrs = NULL,
  Dthreshold = 2e+06,
  binned = TRUE,
  binsize = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)
generate_bintolen_gi_list(
  bintolen_path,
  chrs = NULL,
  Dthreshold = 2e+06,
  binned = TRUE,
  binsize = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)

Arguments

`bintolen_path`	path to the flat file containing columns named bins and features
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes specified in the bintolen file.
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is smaller.
`binned`	TRUE if the bintolen file is uniformly binned. Defaults to TRUE.
`binsize`	bin size in bp to be generated for the object. Defaults to the binsize in the bintolen file, if exists.
`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`

Value

a valid gi_list instance with genomic features derived from specified restriction enzyme cut patterns when generating the bintolen file using construct_features (see ?construct_features for help). Genomic 1D features are stored in the regions metadata handle of each list element (e.g., gi_list[[1]]@regions@elementMetadata).

Examples

chrs<-'chr22'
bintolen_path<-system.file("extdata", "test_bintolen.txt.gz",
package = "HiCDCPlus")
gi_list<-generate_bintolen_gi_list(bintolen_path,chrs)
chrs<-'chr22'
bintolen_path<-system.file("extdata", "test_bintolen.txt.gz",
package = "HiCDCPlus")
gi_list<-generate_bintolen_gi_list(bintolen_path,chrs)

generate_df_gi_list

Description

Generates a gi_list instance from a data frame object describing the regions.

Usage

generate_df_gi_list(
  df,
  chrs = NULL,
  Dthreshold = 2e+06,
  gen = "Hsapiens",
  gen_ver = "hg19"
)
generate_df_gi_list(
  df,
  chrs = NULL,
  Dthreshold = 2e+06,
  gen = "Hsapiens",
  gen_ver = "hg19"
)

Arguments

`df`	DataFrame with columns named 'chr', 'start', (and optionally 'end', if the regions have gaps) and 1D features with their respective column names.
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes specified in `df`.
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data, whichever is smaller.
`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`

Value

a valid gi_list instance with genomic features supplied from df. Genomic 1D features are stored in the regions metadata handle of each list element (e.g., gi_list[[1]]@regions@elementMetadata).

Examples

df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6))
gi_list<-generate_df_gi_list(df)
df<-data.frame(chr='chr9',start=seq(1e6,10e6,1e6))
gi_list<-generate_df_gi_list(df)

get_chr_sizes

Description

This function finds all chromosome sizes of a given genome, genome version and set of chromosomes.

Usage

get_chr_sizes(gen = "Hsapiens", gen_ver = "hg19", chrs = NULL)
get_chr_sizes(gen = "Hsapiens", gen_ver = "hg19", chrs = NULL)

Arguments

`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the genome specified.

Value

named vector containing names as chromosomes and values as chromosome sizes.

Examples

get_chr_sizes('Hsapiens','hg19',c('chr21','chr22'))
get_chr_sizes('Hsapiens','hg19',c('chr21','chr22'))

get_chrs

Description

This function finds all chromosomes of a given genome and genome version except for Y and M.

Usage

get_chrs(gen = "Hsapiens", gen_ver = "hg19")
get_chrs(gen = "Hsapiens", gen_ver = "hg19")

Arguments

`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`

Value

string vector of chromosomes.

Examples

get_chrs('Hsapiens','hg19')
get_chrs('Hsapiens','hg19')

get_enzyme_cutsites

Description

This function finds all restriction enzyme cutsites of a given genome, genome version, and set of cut patterns

Usage

get_enzyme_cutsites(sig, gen = "Hsapiens", gen_ver = "hg19", chrs = NULL)
get_enzyme_cutsites(sig, gen = "Hsapiens", gen_ver = "hg19", chrs = NULL)

Arguments

`sig`	a set of restriction enzyme cut patterns (e.g., 'GATC' or c('GATC','GANTC'))
`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`
`chrs`	a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the genome specified by `gen` and `gen_ver`.

Value

list of chromosomes.

Examples

get_enzyme_cutsites(gen='Hsapiens',gen_ver='hg19',
sig=c('GATC','GANTC'),chrs=c('chr22'))
get_enzyme_cutsites(gen='Hsapiens',gen_ver='hg19',
sig=c('GATC','GANTC'),chrs=c('chr22'))

gi_list_binsize_detect

Description

This function finds the bin size of a uniformly binned valid gi_list instance in bp. It raises an error if the gi_list instance is not uniformly binned.

Usage

gi_list_binsize_detect(gi_list)
gi_list_binsize_detect(gi_list)

Arguments

gi_list

gi_list object to be verified. In order to pass without errors, a gi_list object (1) has to be a list of InteractionSet::GInteractions objects,(2) each list element has to be named as chromosomes and only contain intra-chromosomal interaction information, (3) mcols(.) for each list element should at least contain pairwise genomic distances in a column named 'D' and (4) each list element needs to be uniformly binned

Value

uniform binsize in base pairs or an error if the gi_list instance is not uniformly binned.

Examples

gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_binsize_detect(gi_list)
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_binsize_detect(gi_list)

gi_list_Dthreshold_detect

Description

This function finds the maximum genomic distance in a valid gi_list object.

Usage

gi_list_Dthreshold.detect(gi_list)
gi_list_Dthreshold.detect(gi_list)

Arguments

gi_list

A valid gi_list instance. See ?gi_list_validate for more details about the attributes of a valid gi_list instance.

Value

maximum genomic distance in the object

Examples

gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_Dthreshold.detect(gi_list)
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_Dthreshold.detect(gi_list)

gi_list_read

Description

Reads a written gi_list instance using gi_list_write into a valid gi_list instance.

Usage

gi_list_read(
  fname,
  chrs = NULL,
  Dthreshold = NULL,
  features = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)
gi_list_read(
  fname,
  chrs = NULL,
  Dthreshold = NULL,
  features = NULL,
  gen = "Hsapiens",
  gen_ver = "hg19"
)

Arguments

`fname`	path to the file to read from (can end with .txt, .rds, or .txt.gz).
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes contained in the `fname`.
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to the maximum in the data.
`features`	Select the subset of features (1-D or 2-D) to be added to the gi_list instance (without the trailing I or J), defaults to all features (score column gets ingested as 'score').
`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`

Value

A valid gi_list instance with 1D features stored in regions metadata handle of each list element (e.g., gi_list[[1]]@regions@elementMetadata) in the instance and with 2D features stored in metadata handle (i.e., mcols(gi)).

Examples

outputdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_write(gi_list,paste0(outputdir,'testgiread.txt'))
gi_list2<-gi_list_read(paste0(outputdir,'testgiread.txt'))
outputdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_write(gi_list,paste0(outputdir,'testgiread.txt'))
gi_list2<-gi_list_read(paste0(outputdir,'testgiread.txt'))

gi_list_topdom

Description

This function converts a gi_list instance with ICE normalized counts into TAD annotations through an implementation of TopDom v0.0.2 (https://github.com/HenrikBengtsson/TopDom) adapted as TopDom at this package. If you're using this function, please cite TopDom according to the documentation at https://github.com/HenrikBengtsson/TopDom/blob/0.0.2/docs/

Usage

gi_list_topdom(
  gi_list,
  chrs = NULL,
  file_out = FALSE,
  fpath = NULL,
  window.size = 5,
  verbose = FALSE
)
gi_list_topdom(
  gi_list,
  chrs = NULL,
  file_out = FALSE,
  fpath = NULL,
  window.size = 5,
  verbose = FALSE
)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intrachromosomal interaction information (see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to chromosomes in `gi_list`.
`file_out`	If true, outputs TAD annotations into files with paths beginning with `fpath`. Defaults to FALSE
`fpath`	Outputs TAD annotations into files with paths beginning in `fpath`.
`window.size`	integer, number of bins to extend. Defaults to 5.
`verbose`	TRUE if you would like to troubleshoot TopDom.

Value

a list instance with TAD annotation reporting for each chromosome

Examples

hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus")
gi_list<-hic2icenorm_gi_list(hic_path,binsize=50e3,chrs='chr22')
tads<-gi_list_topdom(gi_list)
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus")
gi_list<-hic2icenorm_gi_list(hic_path,binsize=50e3,chrs='chr22')
tads<-gi_list_topdom(gi_list)

gi_list_validate

Description

This function validates a gi_list instance.

Usage

gi_list_validate(gi_list)
gi_list_validate(gi_list)

Arguments

gi_list

gi_list object to be verified. In order to pass without errors, a gi_list object (1) has to be a list of InteractionSet::GInteractions objects, (2) each list element has to be named as chromosomes and only contain intra-chromosomal interaction information, (3) mcols(.) for each list element should at least contain pairwise genomic distances in a column named 'D'.

Value

invisible value if the gi_list instance is valid. Otherwise, an error is raised.

Examples

gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_validate(gi_list)
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_validate(gi_list)

gi_list_write

Description

Writes a valid gi_list instance into a file.

Usage

gi_list_write(
  gi_list,
  fname,
  chrs = NULL,
  columns = "minimal",
  rows = "all",
  significance_threshold = 0.05,
  score = NULL
)
gi_list_write(
  gi_list,
  fname,
  chrs = NULL,
  columns = "minimal",
  rows = "all",
  significance_threshold = 0.05,
  score = NULL
)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intra-chromosomal interaction information (see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`fname`	path to the file to write to (can end with .txt, or .txt.gz).
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list`.
`columns`	Can be 'minimal', which is just distance and counts (and `HiCDCPlus` result columns 'qvalue','pvalue','mu',and 'sdev', if exists; see `?HiCDCPlus`) information, 'minimal_plus_features', which is distance, counts, and other calculated 2D features, 'minimal_plus_score', which generates a .hic pre compatible text file, or 'all', which is distance, counts, calculated 2D features, as well as all 1D features. Defaults to 'minimal'.
`rows`	Can be 'all' or 'significant', which filters rows according to FDR adjusted pvalue column 'qvalue' (this has to exist in `mcols(.)`) at `significance_threshold`. Defaults to 'all'.
`significance_threshold`	Row filtering threshold on 'qvalue'. Defaults to 0.05.
`score`	Score column to extract to .hic pre compatible file. See `mode` options in `?hicdc2hic` for more details.

Value

a tab separated flat file concatenating all intra-chromosomal interaction information.

Examples

outputdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_write(gi_list,paste0(outputdir,'test.txt'))
outputdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(1e6,chrs='chr22')
gi_list_write(gi_list,paste0(outputdir,'test.txt'))

gi_list2HTClist

Description

This function converts a gi_list instance into a HTClist instance compatible for use with the R Bioconductor package HiTC https://bioconductor.org/packages/HiTC/

Usage

gi_list2HTClist(gi_list, chrs = NULL)
gi_list2HTClist(gi_list, chrs = NULL)

Arguments

`gi_list`	List of `GenomicInteractions` objects with a counts column where each object named with chromosomes contains intra-chromosomal interaction information (minimally containing counts and genomic distance in `mcols(gi_list)`— see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to chromosomes in `gi_list`.

Value

a HTClist instance compatible for use with HiTC

Examples

gi_list<-generate_binned_gi_list(50e3,chrs=c('chr22'))
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
htc_list<-gi_list2HTClist(gi_list)
gi_list<-generate_binned_gi_list(50e3,chrs=c('chr22'))
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
htc_list<-gi_list2HTClist(gi_list)

hic2icenorm_gi_list

Description

This function converts a .hic file into a gi_list instance with ICE normalized counts on the counts column for TAD annotation using a copy of TopDom (see ?TopDom_0.0.2) as well as an (optional) .hic file with ICE normalized counts for visualization with Juicebox. This function requires installing the Bioconductor package HiTC.

Usage

hic2icenorm_gi_list(
  hic_path,
  binsize = 50000,
  chrs = NULL,
  hic_output = FALSE,
  gen = "Hsapiens",
  gen_ver = "hg19",
  Dthreshold = Inf
)
hic2icenorm_gi_list(
  hic_path,
  binsize = 50000,
  chrs = NULL,
  hic_output = FALSE,
  gen = "Hsapiens",
  gen_ver = "hg19",
  Dthreshold = Inf
)

Arguments

`hic_path`	Path to the .hic file.
`binsize`	Desired bin size in bp (default 50000).
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to chromosomes in `gen` and `gen_ver` except 'chrY' and 'chrM'.
`hic_output`	If TRUE, a .hic file with the name `gsub("\.hic$","_icenorm.hic",hic_path)` is generated containing the ICE normalized counts under 'NONE' normalization.
`gen`	name of the species: e.g., default `'Hsapiens'`
`gen_ver`	genomic assembly version: e.g., default `'hg19'`
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to maximum in the data.

Value

a thresholded gi_list instance with ICE normalized intra-chromosomal counts for further use with this package, HiCDCPlus.

Examples

hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus")
gi_list=hic2icenorm_gi_list(hic_path,binsize=50e3,chrs=c('chr22'))
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus")
gi_list=hic2icenorm_gi_list(hic_path,binsize=50e3,chrs=c('chr22'))

hicdc2hic

Description

This function converts various modes from HiCDCPlus gi_list (uniformly binned) instance back into a .hic file with the mode passed as counts that can be retrieved using Juicer Dump (https://github.com/aidenlab/juicer/wiki/Data-Extraction) with 'NONE' normalization.

Usage

hicdc2hic(
  gi_list,
  hicfile,
  mode = "normcounts",
  chrs = NULL,
  gen_ver = "hg19",
  memory = 8
)
hicdc2hic(
  gi_list,
  hicfile,
  mode = "normcounts",
  chrs = NULL,
  gen_ver = "hg19",
  memory = 8
)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intra-chromosomal interaction information (minimally containing counts and genomic distance in `mcols(gi_list)`— see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`hicfile`	the path to the .hic file
`mode`	What to put to the .hic file as score. Allowable options are: 'pvalue' for -log10 significance p-value, 'qvalue' for -log10 FDR corrected p-value, 'normcounts' for raw counts/expected counts, and 'zvalue' for standardized counts (raw counts-expected counts)/modeled standard deviation of expected counts and 'raw' to pass-through 'raw counts. Defaults to 'normcounts'.
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to chromosomes in `gi_list`.
`gen_ver`	genomic assembly version: e.g., default `'hg19'`
`memory`	Java memory to generate .hic files. Defaults to 8. Up to 64 is recommended for higher resolutions.

Value

path of the .hic file.

Examples

outdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
hicdc2hic(gi_list,hicfile=paste0(outdir,'out.hic'),
mode='raw')
outdir<-paste0(tempdir(check=TRUE),'/')
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
hicdc2hic(gi_list,hicfile=paste0(outdir,'out.hic'),
mode='raw')

hicdcdiff

Description

This function calculates differential interactions for a set of chromosomes across conditions and replicates. You need to install DESeq2 from Bioconductor to use this function.

Usage

hicdcdiff(
  input_paths,
  filter_file,
  output_path,
  bin_type = "Bins-uniform",
  binsize = 5000,
  granularity = 5000,
  chrs = NULL,
  Dmin = 0,
  Dmax = 2e+06,
  diagnostics = FALSE,
  DESeq.save = FALSE,
  fitType = "local"
)
hicdcdiff(
  input_paths,
  filter_file,
  output_path,
  bin_type = "Bins-uniform",
  binsize = 5000,
  granularity = 5000,
  chrs = NULL,
  Dmin = 0,
  Dmax = 2e+06,
  diagnostics = FALSE,
  DESeq.save = FALSE,
  fitType = "local"
)

Arguments

`input_paths`	a list with names as condition names and values as paths to `gi_list` RDS objects (see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances) saved with `saveRDS` or paths to .hic files for each replicate. e.g.,`list( CTCF=c('~/Downloads/GM_CTCF_rep1_MAPQ30_10kb.rds', '~/Downloads/GM_CTCF_rep2_MAPQ30_10kb.rds'), SMC=c('~/Downloads/GM_SMC_rep1_MAPQ30_10kb.rds', '~/Downloads/GM_SMC_rep2_MAPQ30_10kb.rds'))`
`filter_file`	path to the text file containing columns chr', startI, and startJ denoting the name of the chromosomes and starting coordinates of 2D interaction bins to be compared across conditions, respectively.
`output_path`	the path to the folder and name prefix you want to place DESeq-processed matrices (in a .txt file), plots (if `diagnostics=TRUE`) and DESeq2 objects (if `DESeq.save=TRUE`). Files will be generated for each chromosome.
`bin_type`	'Bins-uniform' if uniformly binned by binsize in bp, or 'Bins-RE-sites' if binned by number of restriction enzyme fragment cutsites!
`binsize`	binsize in bp if bin_type='Bins-uniform' (or number of RE fragments if bin_type='Bins-RE-sites'), e.g., default 5000
`granularity`	Desired distance granularity to base dispersion parameters on in bp. For uniformly binned analysis (i.e., `bin_type=='Bins-uniform'`), this defaults to the bin size. Otherwise, it is 5000.
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes (except Y and M) in the filter_file.
`Dmin`	minimum distance (included) to check for significant interactions, defaults to 0. Put Dmin=1 to ignore D=0 bins in calculating normalization factors.
`Dmax`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is minimum.
`diagnostics`	if TRUE, generates diagnostic plots of the normalization factors, geometric means of such factors by distance bin, as well as MA Plots (see DESeq documentation for details about MA plots). Defaults to FALSE.
`DESeq.save`	if TRUE, saves the DESeq objects for each chromosome as an .rds file in the `output_path`. Defaults to FALSE.
`fitType`	follows fitType in `DESeq2::estimateDispersions`. Allowable options are 'parametric' (parametric regression),'local' (local regression), and 'mean' (constant across interaction bins). Default is 'local'.

Value

paths of a list of three entities. outputpaths will have differential bins among those in filter_file. deseq2paths will have the DESeq2 object stored as an .rds file. Available if DESeq.save=TRUE plotpaths will have diagnostic plots (e.g., MA, dispersion, PCA) if diagnostics=TRUE.

Examples

outputdir<-paste0(tempdir(check=TRUE),'/')
hicdcdiff(input_paths=list(NSD2=c(
system.file("extdata", "GSE131651_NSD2_LOW_arima_example.hic",
package = "HiCDCPlus"),
system.file("extdata", "GSE131651_NSD2_HIGH_arima_example.hic",
package = "HiCDCPlus")),
TKO=c(system.file("extdata", "GSE131651_TKOCTCF_new_example.hic",
package = "HiCDCPlus"),
system.file("extdata", "GSE131651_NTKOCTCF_new_example.hic",
package = "HiCDCPlus"))),
filter_file=system.file("extdata", "GSE131651_analysis_indices.txt.gz",
package = "HiCDCPlus"),
         chrs='chr22',
         output_path=outputdir,
         fitType = 'mean',
         binsize=50000,
         diagnostics=FALSE)
outputdir<-paste0(tempdir(check=TRUE),'/')
hicdcdiff(input_paths=list(NSD2=c(
system.file("extdata", "GSE131651_NSD2_LOW_arima_example.hic",
package = "HiCDCPlus"),
system.file("extdata", "GSE131651_NSD2_HIGH_arima_example.hic",
package = "HiCDCPlus")),
TKO=c(system.file("extdata", "GSE131651_TKOCTCF_new_example.hic",
package = "HiCDCPlus"),
system.file("extdata", "GSE131651_NTKOCTCF_new_example.hic",
package = "HiCDCPlus"))),
filter_file=system.file("extdata", "GSE131651_analysis_indices.txt.gz",
package = "HiCDCPlus"),
         chrs='chr22',
         output_path=outputdir,
         fitType = 'mean',
         binsize=50000,
         diagnostics=FALSE)

HiCDCPlus

Description

This function finds significant interactions in a HiC-DC readable matrix and expresses statistical significance of counts through the following: 'pvalue': significance P-value, 'qvalue': FDR corrected P-value, mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Usage

HiCDCPlus(
  gi_list,
  covariates = NULL,
  chrs = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  model_filepath = NULL
)
HiCDCPlus(
  gi_list,
  covariates = NULL,
  chrs = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  model_filepath = NULL
)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intrachromosomal interaction information (minimally containing counts and genomic distance in `mcols(gi_list[[1]])`—see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`covariates`	covariates to be considered in addition to genomic distance D. Defaults to all covariates besides 'D','counts','mu','sdev',pvalue','qvalue' in `mcols(gi_list[[1]])`
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list`.
`distance_type`	distance covariate form: 'spline' or 'log'. Defaults to 'spline'.
`model_distribution`	'nb' uses a Negative Binomial model, 'nb_vardisp' uses a Negative Binomial model with a distance specific dispersion parameter inferred from the data, 'nb_hurdle' uses the legacy HiCDC model.
`binned`	TRUE if uniformly binned or FALSE if binned by restriction enzyme fragment cutsites
`df`	degrees of freedom for the genomic distance spline function if `distance_type='spline'`. Defaults to 6, which corresponds to a cubic spline as explained in Carty et al. (2017)
`Dmin`	minimum distance (included) to check for significant interactions, defaults to 0
`Dmax`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is minimum.
`ssize`	Distance stratified sampling size. Can decrease for large chromosomes. Increase recommended if model fails to converge. Defaults to 0.01.
`splineknotting`	Spline knotting strategy. Either "uniform", uniformly spaced in distance, or placed based on distance distribution of counts "count-based" (i.e., more closely spaced where counts are more dense).
`model_filepath`	Outputs fitted HiC-DC model object as an .rds file per chromosome. Defaults to NULL (no output).

Value

A valid gi_list instance with additional mcols(.) for each chromosome: pvalue': significance P-value, 'qvalue': FDR corrected P-value, mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Examples

gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi_list<-HiCDCPlus(gi_list)
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi_list<-HiCDCPlus(gi_list)

HiCDCPlus_chr

Description

This function finds significant interactions in a HiC-DC readable matrix restricted to a single chromosome and expresses statistical significance of counts through the following: 'pvalue': significance P-value, 'qvalue': FDR corrected P-value, mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Usage

HiCDCPlus_chr(
  gi,
  covariates = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  model_filepath = NULL
)
HiCDCPlus_chr(
  gi,
  covariates = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  model_filepath = NULL
)

Arguments

`gi`	Instance of a single chromosome `GenomicInteractions` object containing intra-chromosomal interaction information (minimally containing counts and genomic distance).
`covariates`	covariates to be considered in addition to genomic distance D. Defaults to all covariates besides 'D','counts','mu','sdev',pvalue','qvalue' in `mcols(gi)`
`distance_type`	distance covariate form: 'spline' or 'log'. Defaults to 'spline'.
`model_distribution`	'nb' uses a Negative Binomial model, 'nb_vardisp' uses a Negative Binomial model with a distance specific dispersion parameter inferred from the data, 'nb_hurdle' uses the legacy HiC-DC model.
`binned`	TRUE if uniformly binned or FALSE if binned by restriction enzyme fragment cut sites.
`df`	degrees of freedom for the genomic distance spline function if `distance_type='spline'`. Defaults to 6, which corresponds to a cubic spline as explained in Carty et al. (2017)
`Dmin`	minimum distance (included) to check for significant interactions, defaults to 0
`Dmax`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is minimum.
`ssize`	Distance stratified sampling size. Can decrease for large chromosomes. Increase recommended if model fails to converge. Defaults to 0.01.
`splineknotting`	Spline knotting strategy. Either "uniform", uniformly spaced in distance, or placed based on distance distribution of counts "count-based" (i.e., more closely spaced where counts are more dense).
`model_filepath`	Outputs fitted HiC-DC model object as an .rds file with chromosome name indicatd on it. Defaults to NULL (no output).

Value

A valid gi instance with additional mcols(.): pvalue': significance P-value, 'qvalue': FDR corrected P-value, mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Examples

gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi<-HiCDCPlus_chr(gi_list[[1]])
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path<-system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi<-HiCDCPlus_chr(gi_list[[1]])

HiCDCPlus_parallel

Description

This function finds significant interactions in a HiC-DC readable matrix and expresses statistical significance of counts through the following with a parallel implementation (using sockets; compatible with Windows): 'pvalue': significance P-value, 'qvalue': FDR corrected P-value, mu': expected counts, 'sdev': modeled standard deviation of expected counts.

Usage

HiCDCPlus_parallel(
  gi_list,
  covariates = NULL,
  chrs = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  ncore = NULL
)
HiCDCPlus_parallel(
  gi_list,
  covariates = NULL,
  chrs = NULL,
  distance_type = "spline",
  model_distribution = "nb",
  binned = TRUE,
  df = 6,
  Dmin = 0,
  Dmax = 2e+06,
  ssize = 0.01,
  splineknotting = "uniform",
  ncore = NULL
)

Arguments

`gi_list`	List of `GenomicInteractions` objects where each object named with chromosomes contains intrachromosomal interaction information (minimally containing counts and genomic distance in `mcols(gi_list[[1]])`—see `?gi_list_validate` for a detailed explanation of valid `gi_list` instances).
`covariates`	covariates to be considered in addition to genomic distance D. Defaults to all covariates besides 'D','counts','mu','sdev',pvalue','qvalue' in `mcols(gi)`
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to all chromosomes in the `gi_list`.
`distance_type`	distance covariate form: 'spline' or 'log'. Defaults to 'spline'.
`model_distribution`	'nb' uses a Negative Binomial model, 'nb_vardisp' uses a Negative Binomial model with a distance specific dispersion parameter inferred from the data, 'nb_hurdle' uses the legacy HiC-DC model.
`binned`	TRUE if uniformly binned or FALSE if binned by restriction enzyme fragment cutsites
`df`	degrees of freedom for the genomic distance spline function if `distance_type='spline'`. Defaults to 6, which corresponds to a cubic spline as explained in Carty et al. (2017)
`Dmin`	minimum distance (included) to check for significant interactions, defaults to 0
`Dmax`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is minimum.
`ssize`	Distance stratified sampling size. Can decrease for large chromosomes. Increase recommended if model fails to converge. Defaults to 0.01.
`splineknotting`	Spline knotting strategy. Either "uniform", uniformly spaced in distance, or placed based on distance distribution of counts "count-based" (i.e., more closely spaced where counts are more dense).
`ncore`	Number of cores to parallelize. Defaults to `parallel::detectCores()-1`.

Value

Examples

gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi<-HiCDCPlus_parallel(gi_list,ncore=1)
gi_list<-generate_binned_gi_list(50e3,chrs='chr22')
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
gi<-HiCDCPlus_parallel(gi_list,ncore=1)

HTClist2gi_list

Description

This function converts a HTClist instance into a gi_list instance with counts for further use with this package, HiCDCPlus

Usage

HTClist2gi_list(htc_list, chrs = NULL, Dthreshold = 2e+06)
HTClist2gi_list(htc_list, chrs = NULL, Dthreshold = 2e+06)

Arguments

`htc_list`	A valid HTClist instance (see `vignette("HiTC")`)
`chrs`	select a subset of chromosomes' e.g., c('chr21','chr22'). Defaults to chromosomes in `htc_list`.
`Dthreshold`	maximum distance (included) to check for significant interactions, defaults to 2e6 or maximum in the data; whichever is smaller.

Value

a thresholded gi_list instance with intra-chromosomal counts for further use with HiCDCPlus

Examples

gi_list<-generate_binned_gi_list(50e3,chrs=c('chr22'))
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
htc_list<-gi_list2HTClist(gi_list)
gi_list2<-HTClist2gi_list(htc_list,Dthreshold=Inf)
gi_list<-generate_binned_gi_list(50e3,chrs=c('chr22'))
gi_list<-add_hic_counts(gi_list,
hic_path=system.file("extdata", "GSE63525_HMEC_combined_example.hic",
package = "HiCDCPlus"))
htc_list<-gi_list2HTClist(gi_list)
gi_list2<-HTClist2gi_list(htc_list,Dthreshold=Inf)

straw

Description

Adapted C++ implementation of Juicer's dump. Reads the .hic file, finds the appropriate matrix and slice of data, and outputs as an R DataFrame.

Usage

straw(norm, fn, ch1, ch2, u, bs)
straw(norm, fn, ch1, ch2, u, bs)

Arguments

`norm`	Normalization to apply. Must be one of NONE/VC/VC_SQRT/KR. VC is vanilla coverage, VC_SQRT is square root of vanilla coverage, and KR is Knight-Ruiz or Balanced normalization.
`fn`	path to the .hic file
`ch1`	first chromosome location (e.g., "1")
`ch2`	second chromosome location (e.g., "8")
`u`	BP (BasePair) or FRAG (restriction enzyme FRAGment)
`bs`	The bin size. By default, for BP, this is one of <2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000> and for FRAG this is one of <500, 200, 100, 50, 20, 5, 2, 1>.

Details

Usage: straw <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize>

Value

Data.frame of a sparse matrix of data from hic file. x,y,counts

straw_dump

Description

Interface for Juicer's dump in case C++ straw fails (known to fail on Windows due to zlib compression not being OS agnostic and particularly not preserving null bytes, which .hic files are delimited with). This function reads the .hic file, finds the appropriate matrix and slice of data, writes it to a temp file, reads and modifies it, and outputs as an R DataFrame (and also deletes the temp file).

Usage

straw_dump(norm, fn, ch1, ch2, u, bs)
straw_dump(norm, fn, ch1, ch2, u, bs)

Arguments

`norm`	Normalization to apply. Must be one of NONE/VC/VC_SQRT/KR. VC is vanilla coverage, VC_SQRT is square root of vanilla coverage, and KR is Knight-Ruiz or Balanced normalization.
`fn`	path to the .hic file
`ch1`	first chromosome location (e.g., "1")
`ch2`	second chromosome location (e.g., "8")
`u`	BP (BasePair) or FRAG (restriction enzyme FRAGment)
`bs`	The bin size. By default, for BP, this is one of <2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000> and for FRAG this is one of <500, 200, 100, 50, 20, 5, 2, 1>.

Details

Usage: straw_dump <oe/observed> <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize> <outfile>

Value

Data.frame of a sparse matrix of data from hic file. x,y,counts

Package 'HiCDCPlus'

Help Index

add_1D_features

Description

Usage

Arguments

Value

Examples

add_2D_features

Description

Usage

Arguments

Value

Examples

add_hic_counts

Description

Usage

Arguments

Value

Examples

add_hicpro_allvalidpairs_counts

Description

Usage

Arguments

Value

add_hicpro_matrix.counts

Description

Usage

Arguments

Value

construct_features

Description

Usage

Arguments

Value

Examples

construct_features_chr

Description

Usage

Arguments

Value

Examples

construct_features_parallel

Description

Usage

Arguments

Value

Examples

expand_1D_features

Description

Usage

Arguments

Value

Examples

extract_hic_eigenvectors

Description

Usage

Arguments

Value

Examples

generate_binned_gi_list

Description

Usage

Arguments

Value

Examples

generate_bintolen_gi_list

Description

Usage

Arguments

Value

Examples

generate_df_gi_list

Description

Usage

Arguments

Value

Examples

get_chr_sizes

Description