Title: | Locus overlap analysis for enrichment of genomic ranges |
---|---|
Description: | Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data. |
Authors: | Nathan Sheffield <http://www.databio.org> [aut, cre], Christoph Bock [ctb] |
Maintainer: | Nathan Sheffield <[email protected]> |
License: | GPL-3 |
Version: | 1.37.0 |
Built: | 2024-10-30 07:35:33 UTC |
Source: | https://github.com/bioc/LOLA |
If you want to test for differential enrichment within your usersets, you can restrict the universe to only regions that are covered in at least one of your sets. This function helps you build just such a restricted universe
buildRestrictedUniverse(userSets)
buildRestrictedUniverse(userSets)
userSets |
The userSets you will pass to the enrichment calculation. |
A restricted universe
data("sample_input", package="LOLA") # load userSets restrictedUniverse = buildRestrictedUniverse(userSets)
data("sample_input", package="LOLA") # load userSets restrictedUniverse = buildRestrictedUniverse(userSets)
Checks to see if the universe is appropriate for the userSets Anything in the userSets should be present in the universe. In addition, 2 different regions in the userSets should not overlap the same region in the universe
checkUniverseAppropriateness(userSets, userUniverse, cores = 1, fast = FALSE)
checkUniverseAppropriateness(userSets, userUniverse, cores = 1, fast = FALSE)
userSets |
Regions of interest |
userUniverse |
Regions tested for inclusion in userSets |
cores |
Number of processors |
fast |
Skip the (slow) test for many-to-many relationships |
No return value.
data("sample_input", package="LOLA") # load userSet data("sample_universe", package="LOLA") # load userUniverse checkUniverseAppropriateness(userSets, userUniverse)
data("sample_input", package="LOLA") # load userSet data("sample_universe", package="LOLA") # load userUniverse checkUniverseAppropriateness(userSets, userUniverse)
cleanws takes multi-line, code formatted strings and just formats them as simple strings
cleanws(string)
cleanws(string)
string |
string to clean |
A string with all consecutive whitespace characters, including tabs and newlines, merged into a single space.
Just a reverser. Reverses the order of arguments and passes them untouched to countOverlapsAny – so you can use it with lapply.
countOverlapsAnyRev(subj, quer)
countOverlapsAnyRev(subj, quer)
subj |
Subject |
quer |
Query |
Results from countOverlaps
Given a single row from an enrichment table calculation, finds the set of overlaps between the user set and the test set. You can then use these, for example, to get sequences for those regions.
extractEnrichmentOverlaps(locResult, userSets, regionDB)
extractEnrichmentOverlaps(locResult, userSets, regionDB)
locResult |
Results from runLOLA function |
userSets |
User sets passed to the runLOLA function |
regionDB |
Region database used |
userSets overlapping the supplied database entry.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
Like getRegionSet but returns a filename instead of a GRanges object. Given a local filename, returns a complete absolulte path so you can read that file in.
getRegionFile(dbLocation, filenames, collections = NULL)
getRegionFile(dbLocation, filenames, collections = NULL)
dbLocation |
folder of regionDB |
filenames |
Filename(s) of a particular region set to grab. |
collections |
(optional) subset of collections to list |
A filename the specified file in the regionDB.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
If you want to work with a LOLA regionDB region set individually, this function can help you. It can extract individual (or subsets of) region sets from either loaded regionDBs, loaded with loadRegionDB(), or from a database on disk, where only the region sets of interest will be loaded.
getRegionSet(regionDB, filenames, collections = NULL)
getRegionSet(regionDB, filenames, collections = NULL)
regionDB |
A region database loaded with loadRegionDB(). |
filenames |
Filename(s) of a particular region set to grab. |
collections |
(optional) subset of collections to list |
A GRanges object derived from the specified file in the regionDB.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
Function to run lapply or mclapply, depending on the option set in getOption("mc.cores"), which can be set with setLapplyAlias().
lapplyAlias(..., mc.preschedule = TRUE)
lapplyAlias(..., mc.preschedule = TRUE)
... |
Arguments passed lapply() or mclapply() |
mc.preschedule |
Argument passed to mclapply |
Result from lapply or parallel::mclapply
Lists the region sets for given collection(s) in a region database on disk.
listRegionSets(regionDB, collections = NULL)
listRegionSets(regionDB, collections = NULL)
regionDB |
File path to region database |
collections |
(optional) subset of collections to list |
a list of files in the given collections
dbPath = system.file("extdata", "hg19", package="LOLA") listRegionSets(dbPath)
dbPath = system.file("extdata", "hg19", package="LOLA") listRegionSets(dbPath)
converts a list of GRanges into a GRangesList; strips all metadata.
listToGRangesList(lst)
listToGRangesList(lst)
lst |
a list of GRanges objects |
a GRangesList object
Helper function to annotate and load a regionDB, a folder with subfolder collections of regions.
loadRegionDB(dbLocation, useCache = TRUE, limit = NULL, collections = NULL)
loadRegionDB(dbLocation, useCache = TRUE, limit = NULL, collections = NULL)
dbLocation |
folder where your regionDB is stored, or list of such folders |
useCache |
uses simpleCache to cache and load the results |
limit |
You can limit the number of regions for testing. Default: NULL (no limit) |
collections |
Restrict the database loading to this list of collections |
regionDB list containing database location, region and collection annotations, and regions GRangesList
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath)
Given two regionDBs, (lists returned from loadRegionDB()), This function will combine them into a single regionDB. This will enable you to combine, for example, LOLA Core databases with custom databases into a single analysis.
mergeRegionDBs(dbA, dbB)
mergeRegionDBs(dbA, dbB)
dbA |
First regionDB database. |
dbB |
Second regionDB database. |
A combined regionDB.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbPath) combinedRegionDB = mergeRegionDBs(regionDB, regionDB)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbPath) combinedRegionDB = mergeRegionDBs(regionDB, regionDB)
This function is a drop-in replacement for the base list() function, which automatically names your list according to the names of the variables used to construct it. It seemlessly handles lists with some names and others absent, not overwriting specified names while naming any unnamed parameters. Took me awhile to figure this out.
nlist(...)
nlist(...)
... |
arguments passed to list() |
A named list object.
Given some results (you grab the top ones on your own), this plots a barplot visualizing their odds ratios.
plotTopLOLAEnrichments(data)
plotTopLOLAEnrichments(data)
data |
A results table returned from runLOLA() |
Returns a ggplot2 plot object.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
Imports bed files and creates GRanges objects, using the fread() function from data.table.
readBed(file)
readBed(file)
file |
File name of bed file. |
GRanges Object
a = readBed(system.file("extdata", "examples/combined_regions.bed", package="LOLA"))
a = readBed(system.file("extdata", "examples/combined_regions.bed", package="LOLA"))
Given a bunch of region set files, read in all those flat (bed) files and create a GRangesList object holding all the region sets. This function is used by readRegionGRL to process annotation objects.
readCollection(filesToRead, limit = NULL)
readCollection(filesToRead, limit = NULL)
filesToRead |
a vector containing bed files |
limit |
for testing purposes, limit the number of files read. NULL for no limit (default). |
A GRangesList with the GRanges in the filesToRead.
files = list.files(system.file("extdata", "hg19/ucsc_example/regions", package="LOLA"), pattern="*.bed") regionAnno = readCollection(files)
files = list.files(system.file("extdata", "hg19/ucsc_example/regions", package="LOLA"), pattern="*.bed") regionAnno = readCollection(files)
Read collection annotation
readCollectionAnnotation(dbLocation, collections = NULL)
readCollectionAnnotation(dbLocation, collections = NULL)
dbLocation |
Location of the database |
collections |
Restrict the database loading to this list of collections. Leave NULL to load the entire database (Default). |
Collection annotation data.table
dbPath = system.file("extdata", "hg19", package="LOLA") collectionAnno = readCollectionAnnotation(dbLocation=dbPath)
dbPath = system.file("extdata", "hg19", package="LOLA") collectionAnno = readCollectionAnnotation(dbLocation=dbPath)
Given a database and a collection, this will create the region annotation data.table; either giving a generic table based on file names, or by reading in the annotation data.
readCollectionFiles(dbLocation, collection, refreshSizes = FALSE)
readCollectionFiles(dbLocation, collection, refreshSizes = FALSE)
dbLocation |
folder where your regionDB is stored. |
collection |
Collection folder to load |
refreshSizes |
should I recreate the sizes files documenting how many regions (lines) are in each region set? |
A data.table annotating the regions in the collections.
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readCollectionFiles(dbLocation=dbPath, "ucsc_example")
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readCollectionFiles(dbLocation=dbPath, "ucsc_example")
This function takes a region annotation object and reads in the regions, returning a GRangesList object of the regions.
readRegionGRL( dbLocation, annoDT, refreshCaches = FALSE, useCache = TRUE, limit = NULL )
readRegionGRL( dbLocation, annoDT, refreshCaches = FALSE, useCache = TRUE, limit = NULL )
dbLocation |
folder of regiondB |
annoDT |
output of readRegionSetAnnotation(). |
refreshCaches |
should I recreate the caches? |
useCache |
uses simpleCache to cache and load the results |
limit |
for testing purposes, limit the nmber of files read. NULL for no limit (default). |
GRangesList object
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readRegionSetAnnotation(dbLocation=dbPath) regionGRL = readRegionGRL(dbLocation= dbPath, regionAnno, useCache=FALSE)
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readRegionSetAnnotation(dbLocation=dbPath) regionGRL = readRegionGRL(dbLocation= dbPath, regionAnno, useCache=FALSE)
Given a folder containing region collections in subfolders, this function will either read the annotation file if one exists, or create a generic annotation file.
readRegionSetAnnotation( dbLocation, collections = NULL, refreshCaches = FALSE, refreshSizes = TRUE, useCache = TRUE )
readRegionSetAnnotation( dbLocation, collections = NULL, refreshCaches = FALSE, refreshSizes = TRUE, useCache = TRUE )
dbLocation |
folder where your regionDB is stored. |
collections |
Restrict the database loading to this list of collections Leave NULL to load the entire database (Default). |
refreshCaches |
should I recreate the caches? Default: FALSE |
refreshSizes |
should I refresh the size files? Default:TRUE |
useCache |
Use simpleCache to store results and load them? |
Region set annotation (data.table)
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readRegionSetAnnotation(dbLocation=dbPath)
dbPath = system.file("extdata", "hg19", package="LOLA") regionAnno = readRegionSetAnnotation(dbLocation=dbPath)
This function will take the user sets, overlap with the universe, and redefine the user sets as the set of regions in the user universe that overlap at least one region in user sets. this makes for a more appropriate statistical enrichment comparison, as the user sets are actually exactly the same regions found in the universe otherwise, you can get some weird artifacts from the many-to-many relationship between user set regions and universe regions.
redefineUserSets(userSets, userUniverse, cores = 1)
redefineUserSets(userSets, userUniverse, cores = 1)
userSets |
Regions of interest |
userUniverse |
Regions tested for inclusion in userSets |
cores |
Number of processors |
userSets redefined in terms of userUniverse
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
This will change the string in filename to have a new extension
replaceFileExtension(filename, extension)
replaceFileExtension(filename, extension)
filename |
string to convert |
extension |
new extension |
Filename with original extension deleted, replaced by provided extension
Workhorse function that calculates overlaps between userSets, and then uses a fisher's exact test rank them by significance of the overlap.
runLOLA( userSets, userUniverse, regionDB, minOverlap = 1, cores = 1, redefineUserSets = FALSE, direction = "enrichment" )
runLOLA( userSets, userUniverse, regionDB, minOverlap = 1, cores = 1, redefineUserSets = FALSE, direction = "enrichment" )
userSets |
Regions of interest |
userUniverse |
Regions tested for inclusion in userSets |
regionDB |
Region DB to check for overlap, from loadRegionDB() |
minOverlap |
(Default:1) Minimum bases required to count an overlap |
cores |
Number of processors |
redefineUserSets |
run redefineUserSets() on your userSets? |
direction |
Defaults to "enrichment", but may also accept "depletion", which will swap the direction of the fisher test (use 'greater' or less' value passed to the 'alternative' option of fisher.test) |
Data.table with enrichment results. Rows correspond to individual pairwise fisher's tests comparing a single userSet with a single databaseSet. The columns in this data.table are: userSet and dbSet: index into their respective input region sets. pvalueLog: -log10(pvalue) from the fisher's exact result; oddsRatio: result from the fisher's exact test; support: number of regions in userSet overlapping databaseSet; rnkPV, rnkOR, rnkSup: rank in this table of p-value, oddsRatio, and Support respectively. The –value is the negative natural log of the p-value returned from a one-sided fisher's exact test. maxRnk, meanRnk: max and mean of the 3 previous ranks, providing a combined ranking system. b, c, d: 3 other values completing the 2x2 contingency table (with support). The remaining columns describe the dbSet for the row.
If you have the qvalue package installed from bioconductor, runLOLA will add a q-value transformation to provide FDR scores automatically.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
Function to sample regions from a GRangesList object, in specified proportion
sampleGRL(GRL, prop)
sampleGRL(GRL, prop)
GRL |
GRangesList from which to sample |
prop |
vector with same length as GRL, of values between 0-1, proportion of the list to select |
A sampled subset of original GRangesList object.
To make parallel processing a possibility but not required, I use an lapply alias which can point at either the base lapply (for no multicore), or it can point to mclapply, and set the options for the number of cores (what mclapply uses). With no argument given, returns intead the number of cpus currently selected.
setLapplyAlias(cores = 0)
setLapplyAlias(cores = 0)
cores |
Number of cpus |
None
Efficiently split a data.table by a column in the table
splitDataTable(DT, splitFactor)
splitDataTable(DT, splitFactor)
DT |
Data.table to split |
splitFactor |
Column to split, which can be a character vector or an integer. |
List of data.table objects, split by column
This function will take a single large bed file that is annotated with a column grouping different sets of similar regions, and split it into separate files for use with the LOLA collection format.
splitFileIntoCollection( filename, splitCol, collectionFolder = NULL, filenamePrepend = "" )
splitFileIntoCollection( filename, splitCol, collectionFolder = NULL, filenamePrepend = "" )
filename |
the file to split |
splitCol |
factor column that groups the lines in the file by set. It should be an integer. |
collectionFolder |
name of folder to place the new split files. |
filenamePrepend |
string to prepend to the filenames. Defaults to blank. |
No return value.
combFile = system.file("extdata", "examples/combined_regions.bed", package="LOLA") splitFileIntoCollection(combFile, 4)
combFile = system.file("extdata", "examples/combined_regions.bed", package="LOLA") splitFileIntoCollection(combFile, 4)
A dataset containing a few sample regions.
data(sample_input)
data(sample_input)
A GRangesList object
No return value.
## Not run: This is how I produced the sample data sets: dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation= dbPath) userSetA = reduce(do.call(c, (sampleGRL(regionDB$regionGRL, prop=c(.1,.25,.05,.05,0))))) userSetB = reduce(do.call(c, (sampleGRL(regionDB$regionGRL, prop=c(.2,.05,.05,.05,0))))) userSets = GRangesList(setA=userSetA, setB=userSetB) userUniverse = reduce(do.call(c, regionDB$regionGRL)) save(userSets, file="sample_input.RData") save(userUniverse, file="sample_universe.RData") ## End(Not run)
## Not run: This is how I produced the sample data sets: dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation= dbPath) userSetA = reduce(do.call(c, (sampleGRL(regionDB$regionGRL, prop=c(.1,.25,.05,.05,0))))) userSetB = reduce(do.call(c, (sampleGRL(regionDB$regionGRL, prop=c(.2,.05,.05,.05,0))))) userSets = GRangesList(setA=userSetA, setB=userSetB) userUniverse = reduce(do.call(c, regionDB$regionGRL)) save(userSets, file="sample_input.RData") save(userUniverse, file="sample_universe.RData") ## End(Not run)
A reduced GRanges object from the example regionDB database
data(sample_universe)
data(sample_universe)
A GRanges object
No return value.
Wrapper of write.table that provides defaults to write a simple .tsv file. Passes additional arguments to write.table
write.tsv(...)
write.tsv(...)
... |
Additional arguments passed to write.table |
No return value
Function for writing output all at once: combinedResults is an table generated by "locationEnrichment()" or by rbinding category/location results. Writes all enrichments to a single file, and also spits out the same data divided into groups based on userSets, and Databases, just for convenience. disable this with an option.
writeCombinedEnrichment( combinedResults, outFolder = NULL, includeSplits = TRUE )
writeCombinedEnrichment( combinedResults, outFolder = NULL, includeSplits = TRUE )
combinedResults |
enrichment results object |
outFolder |
location to write results on disk |
includeSplits |
also include individual files for each user set and database? |
No return value.
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
dbPath = system.file("extdata", "hg19", package="LOLA") regionDB = loadRegionDB(dbLocation=dbPath) data("sample_universe", package="LOLA") data("sample_input", package="LOLA") getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed") res = runLOLA(userSets, userUniverse, regionDB, cores=1) locResult = res[2,] extractEnrichmentOverlaps(locResult, userSets, regionDB) writeCombinedEnrichment(locResult, "temp_outfolder") userSetsRedefined = redefineUserSets(userSets, userUniverse) resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1) g = plotTopLOLAEnrichments(resRedefined)
Given a data table and a factor variable to split on, efficiently divides the table and then writes the different splits to separate files, named with filePrepend and numbered according to split.
writeDataTableSplitByColumn( DT, splitFactor, filePrepend = "", orderColumn = NULL )
writeDataTableSplitByColumn( DT, splitFactor, filePrepend = "", orderColumn = NULL )
DT |
data.table to split |
splitFactor |
column of DT to split on |
filePrepend |
notation string to prepend to output files |
orderColumn |
column of DT to order on (defaults to the first column) |
number of splits written