| Title: | SNP Effect Matrix Pipeline in R |
|---|---|
| Description: | SEMPLR computes transcription factor binding affinity scores for genomic positions and genetic variants. Scores are computed from SNP Effect Matrices (SEMs) produced by SEMpl. 223 pre-computed SEMs are included with the package or custom sets can be provided. Enrichment can be tested among sets of genomic positions to determine if transcription factor binding events occur more often than expected. Comparing binding affinity scores between alleles can reveal differences in transcription factor binding with genetic variation. This package also includes several visualization functions to view scores both on the motif and variant/position level. |
| Authors: | Grace Kenney [aut, cre] (ORCID: <https://orcid.org/0009-0009-6308-3150>), Douglas Phanstiel [aut], NIH NIGMS [fnd], NSF GRFP [fnd] |
| Maintainer: | Grace Kenney <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.1 |
| Built: | 2026-05-11 22:11:37 UTC |
| Source: | https://github.com/bioc/SEMPLR |
Converts the SNP Effect Matrix in a SNPEffectMatrix or SNPEffectMatrixCollection object to a position probability matrix (PPM) where each entry is the probability of a base at each position in the motif.
convertSEMsToPPMs(x)convertSEMsToPPMs(x)
x |
A |
a list of matrices
# Load default SEMs # From a SNPEffectMatrixCollection convertSEMsToPPMs(SEMC) # From a single SNPEffectMatrix convertSEMsToPPMs(getSEMs(SEMC, "JUN"))# Load default SEMs # From a SNPEffectMatrixCollection convertSEMsToPPMs(SEMC) # From a single SNPEffectMatrix convertSEMsToPPMs(getSEMs(SEMC, "JUN"))
enrichmentSets() is a one‐stop wrapper that takes a vector of
user‐provided IDs from a differential expression analysis and returns
promoter regions plus matched background sets for downstream motif
enrichment. It performs:
ID mapping
Optional biotype filtering
Promoter coordinate extraction
Background sampling within that same call
enrichmentSets( txdb, orgdb, id_type, foreground_ids, background_ids = NULL, transcript = FALSE, threshold = 0.9, stripVersions = TRUE, inflateThresh = 1, geneType = NULL, overlapMinGap = 0, onePromoterPerGene = FALSE, n_ratio = 1, promoterWindow = c(upstream = 300, downstream = 50), standardChroms = TRUE, reduceOverlaps = TRUE )enrichmentSets( txdb, orgdb, id_type, foreground_ids, background_ids = NULL, transcript = FALSE, threshold = 0.9, stripVersions = TRUE, inflateThresh = 1, geneType = NULL, overlapMinGap = 0, onePromoterPerGene = FALSE, n_ratio = 1, promoterWindow = c(upstream = 300, downstream = 50), standardChroms = TRUE, reduceOverlaps = TRUE )
txdb |
A TxDb object.
(e.g. |
orgdb |
An OrgDb object. (e.g. |
id_type |
Type of identifier supplied in foreground and background IDs.
See the |
foreground_ids |
Character vector of gene or transcript IDs (e.g. Ensembl, RefSeq, gene symbols) to analyze. |
background_ids |
Character vector of gene or transcript IDs to use as background set. |
transcript |
Logical; |
threshold |
Numeric in range 0 to 1. Min fraction of IDs that must map to pick a keytype (default 0.9). |
stripVersions |
Logical; strip version suffixes (e.g. ".1") from Ensembl/RefSeq IDs. |
inflateThresh |
Numeric in range 0 to 1; max allowed transcript:gene inflation before auto‐collapsing (default 1). |
geneType |
Optional character; biotype filter
(e.g. |
overlapMinGap |
Numeric; minimum gap when reducing overlapping promoters. |
onePromoterPerGene |
Logical; if
|
n_ratio |
Numeric; ratio of ranges to retain in the background set.
The number of background ranges will be equal to |
promoterWindow |
Numeric named vector |
standardChroms |
Logical; restrict to standard chromosomes. |
reduceOverlaps |
Logical; merge overlapping promoter windows. |
A list containing:
GRanges of sampled background promoters
GRanges of foreground promoters
GRanges of the pruned promoter pool
library(TxDb.Hsapiens.UCSC.hg38.knownGene) library(org.Hs.eg.db) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene orgdb <- org.Hs.eg.db # Note that this is a minimal set for demonstration only genes <- c("ENSG00000139618", "ENSG00000157764") background <- c("ENSG00000111640", "ENSG00000075624") # Minimal run with defaults: results <- enrichmentSets(genes, background, txdb = txdb, orgdb = orgdb, id_type = "ENSEMBL" )library(TxDb.Hsapiens.UCSC.hg38.knownGene) library(org.Hs.eg.db) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene orgdb <- org.Hs.eg.db # Note that this is a minimal set for demonstration only genes <- c("ENSG00000139618", "ENSG00000157764") background <- c("ENSG00000111640", "ENSG00000075624") # Minimal run with defaults: results <- enrichmentSets(genes, background, txdb = txdb, orgdb = orgdb, id_type = "ENSEMBL" )
Perform a binomial test to determine if SNP Effect Matrices are bound more often than expected.
enrichSEMs(x, sem, background = NULL, seqs = NULL, nFlank = 0, genome = NULL)enrichSEMs(x, sem, background = NULL, seqs = NULL, nFlank = 0, genome = NULL)
x |
The scoring table produced by |
sem |
A |
background |
A |
seqs |
The sequences scored in |
nFlank |
Number of flanking nucleotides added to the sequences. Defaults
to the length of the longest motif. If no flanks were added (ie, sequences
were scored rather than a |
genome |
A |
a list of matrices
# load SEMs # note that this is a small example for demonstration purposes # in actual enrichment analyses sets of 100+ ranges are recommended # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity sb <- scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) enrichSEMs(sb, SEMC)# load SEMs # note that this is a small example for demonstration purposes # in actual enrichment analyses sets of 100+ ranges are recommended # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity sb <- scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) enrichSEMs(sb, SEMC)
Access baseline from a SNPEffectMatrix object
getBaseline(x) ## S4 method for signature 'SNPEffectMatrix' getBaseline(x)getBaseline(x) ## S4 method for signature 'SNPEffectMatrix' getBaseline(x)
x |
SNPEffectMatrix object |
The numeric baseline value is returned
# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the baseline getBaseline(sm)# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the baseline getBaseline(sm)
Access ranges slot in a SEMScores object
getRanges(x) ## S4 method for signature 'SEMScores' getRanges(x)getRanges(x) ## S4 method for signature 'SEMScores' getRanges(x)
x |
a SEMScores object |
A GRanges or VRanges object
library(VariantAnnotation) # load default SEMs # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) getRanges(s)library(VariantAnnotation) # load default SEMs # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) getRanges(s)
Query sequences associated with a given range in a reference genome and optionally include nucleotides up and downstream of the range of interest sequences. Store the results in the metadata of the range object.
getRangeSeqs(x, genome, up = 0, down = 0, refCol = NULL, altCol = NULL)getRangeSeqs(x, genome, up = 0, down = 0, refCol = NULL, altCol = NULL)
x |
A |
genome |
A |
up |
Numeric, number of bases to return upstream of variant |
down |
Numeric, number of bases to return downstream of variant |
refCol |
Column name in meta data storing the ref allele |
altCol |
Column name in meta data storing the alt allele |
a x with added meta data columns. By default, only a "sequence"
column is added to the meta data. If variant information is supplied,
either in a VRanges object or through the "allele" parameter, "ref_seq"
and "alt_seq" columns are added to the meta data.
# set reference genome b <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens # Query sequnece for a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = IRanges::IRanges(94136009, 94136019) ) getRangeSeqs(gr, genome = b) # Query variant sequences for a VRanges object with 10bp flanks vr <- VariantAnnotation::VRanges( seqnames = "chr12", ranges = IRanges::IRanges(94136009), ref = "G", alt = "C" ) getRangeSeqs(vr, genome = b, up = 10, down = 10)# set reference genome b <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens # Query sequnece for a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = IRanges::IRanges(94136009, 94136019) ) getRangeSeqs(gr, genome = b) # Query variant sequences for a VRanges object with 10bp flanks vr <- VariantAnnotation::VRanges( seqnames = "chr12", ranges = IRanges::IRanges(94136009), ref = "G", alt = "C" ) getRangeSeqs(vr, genome = b, up = 10, down = 10)
Access the SEM from a SNPEffectMatrix object
getSEM(x) ## S4 method for signature 'SNPEffectMatrix' getSEM(x)getSEM(x) ## S4 method for signature 'SNPEffectMatrix' getSEM(x)
x |
SNPEffectMatrix object |
A position (rows) by nucleic acid (columns)
data.table is returned
# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the matrix getSEM(sm)# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the matrix getSEM(sm)
Access semId from a SNPEffectMatrix object
getSEMId(x) ## S4 method for signature 'SNPEffectMatrix' getSEMId(x)getSEMId(x) ## S4 method for signature 'SNPEffectMatrix' getSEMId(x)
x |
SNPEffectMatrix object |
The character id is returned
# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the SEM id getSEMId(sm)# Isolate a single SNPEffectMatrix object from the default # SNPEffectMatrixCollection sm <- getSEMs(SEMC)[[1]] # Access the SEM id getSEMId(sm)
Access sems matrix from a SNPEffectMatrixCollection object
getSEMs(x, semId = NULL) ## S4 method for signature 'SNPEffectMatrixCollection' getSEMs(x, semId = NULL)getSEMs(x, semId = NULL) ## S4 method for signature 'SNPEffectMatrixCollection' getSEMs(x, semId = NULL)
x |
SNPEffectMatrixCollection object |
semId |
optional |
A position (rows) by nucleic acid (columns)
data.table is returned
## Create example SEM df <- data.frame( A = c(1, 2, 3), C = c(1, 2, 3), G = c(1, 2, 3), T = c(1, 2, 3) ) s <- SNPEffectMatrix(df, 1.205, "motif_id") sc <- SNPEffectMatrixCollection(s) ## Access count matrix getSEMs(sc)## Create example SEM df <- data.frame( A = c(1, 2, 3), C = c(1, 2, 3), G = c(1, 2, 3), T = c(1, 2, 3) ) s <- SNPEffectMatrix(df, 1.205, "motif_id") sc <- SNPEffectMatrixCollection(s) ## Access count matrix getSEMs(sc)
Load .sem files and meta data into a SNPEffectMatrixCollection
loadSEMCollection( semFiles, semMetaData = NULL, semMetaKey = "", semIds = NULL, bls = NULL )loadSEMCollection( semFiles, semMetaData = NULL, semMetaKey = "", semIds = NULL, bls = NULL )
semFiles |
A list of paths to .sem files. Expects header of .sem files
to be in format |
semMetaData |
A |
semMetaKey |
The name of a column in semData that matches the semIds in
the sems list as a |
semIds |
Unique id for the sem as a |
bls |
|
A SNPEffectMatrix object
# write a tmp file to hold a SEM m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") tf <- tempfile() write.table(m, tf, quote = FALSE, sep = "\t", row.names = FALSE) # build a meta data table md <- data.table::data.table( transcription_factor = c("tf1"), cell_type = c("HepG2"), sem_id = c("sem_id") ) loadSEMCollection(tf, semMetaData = md, semIds = "sem_id", semMetaKey = "sem_id", bls = 1 )# write a tmp file to hold a SEM m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") tf <- tempfile() write.table(m, tf, quote = FALSE, sep = "\t", row.names = FALSE) # build a meta data table md <- data.table::data.table( transcription_factor = c("tf1"), cell_type = c("HepG2"), sem_id = c("sem_id") ) loadSEMCollection(tf, semMetaData = md, semIds = "sem_id", semMetaKey = "sem_id", bls = 1 )
enrichSEMs
Generates a circular dendrogram, clustering SNP Effect Matrices on similarity and a heatmap representing the -log10 transformed adjusted p-value of a SEMPLR enrichment.
plotEnrich( e, sem, label = "transcription_factor", method = "WPCC", threshold = 0.05, lineWidth = 0.5, textCols = c("darkgrey", "black"), textCex = 1, heatmapCols = c("white", "red"), pvalRange = c(0, 20) )plotEnrich( e, sem, label = "transcription_factor", method = "WPCC", threshold = 0.05, lineWidth = 0.5, textCols = c("darkgrey", "black"), textCex = 1, heatmapCols = c("white", "red"), pvalRange = c(0, 20) )
e |
The resulting data.table from |
sem |
A |
label |
Column in semData(sem) to use for tree labels |
method |
Method to use for SEM comparison. See ?universalmotif::compare_motifs for options. |
threshold |
The adjusted p-value threshold for coloring SEMs |
lineWidth |
A numeric specifying the dendrogram line width |
textCols |
A vector of two colors to label non-significant and significant SEMs respectively. |
textCex |
Text size of SEM labels. |
heatmapCols |
A vector of two colors to use for the heatmap, ordered
low to high |
pvalRange |
A vector of 2 numerics to use as the scale range for the
heatmap of |
a ggtree object
# load SEMs # note that this is a small example for demonstration purposes # in actual enrichment analyses sets of 100+ ranges are recommended # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity sb <- scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) e <- enrichSEMs(sb, SEMC) plotEnrich(e, SEMC)# load SEMs # note that this is a small example for demonstration purposes # in actual enrichment analyses sets of 100+ ranges are recommended # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity sb <- scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) e <- enrichSEMs(sb, SEMC) plotEnrich(e, SEMC)
Plot SEM scores for each nucleic acid in each position of the motif
plotSEM( sem, motif = NULL, motifSeq = NULL, rc = FALSE, cols = c("lightgrey", "dodgerblue"), size = 7, hindex = NULL, hcol = "dodgerblue", halpha = 0.1, hwidth = 15, lcol = "#d7dbdd", lwidth = 1 )plotSEM( sem, motif = NULL, motifSeq = NULL, rc = FALSE, cols = c("lightgrey", "dodgerblue"), size = 7, hindex = NULL, hcol = "dodgerblue", halpha = 0.1, hwidth = 15, lcol = "#d7dbdd", lwidth = 1 )
sem |
A SNPEffectMatrix or SNPEffectMatrixCollection object. |
motif |
sem id to plot. If providing a SNPEffectMatrixCollection object, must match a name in the list. If providing a single SNPEffectMatrix object, this parameter is ignored and the SNPEffectMatrix's semId is used for plotting. |
motifSeq |
Character sequence to color on plot |
rc |
Boolean indicating whether to plot the reverse complement orientation of the SEM |
cols |
A vector of two colors to color the nucleotides not included and included in the provided motifSeq respectively |
size |
Font size of nucleotides in plot |
hindex |
Index of nucleotide to highlight in the motifSeq |
hcol |
The color of the vertical highlight bar |
halpha |
The alpha transparency of the vertical highlight bar |
hwidth |
Width of highlight bar |
lcol |
Color of the horizongtal endogenous and background binding lines |
lwidth |
Width of the horizontal endogenous and background binding lines |
a ggplot with sem scores for each nucleic acid per position
library(VariantAnnotation) # Given a SNPEffectMatrix Collection plotSEM(SEMC, motif = "JUN") # Given a single SNPEffectMatrix sem <- getSEMs(SEMC, "JUN") plotSEM(sem) # color by sequence plotSEM(sem, motifSeq = "TGAGTCA", hindex = 2)library(VariantAnnotation) # Given a SNPEffectMatrix Collection plotSEM(SEMC, motif = "JUN") # Given a single SNPEffectMatrix sem <- getSEMs(SEMC, "JUN") plotSEM(sem) # color by sequence plotSEM(sem, motifSeq = "TGAGTCA", hindex = 2)
Plot non-alt versus alt binding propensity for a single variant
plotSEMMotifs( s, variant, label = "transcription_factor", rc = TRUE, labsize = 4, cols = c("#F8766D", "dodgerblue2"), ptsize = 1 )plotSEMMotifs( s, variant, label = "transcription_factor", rc = TRUE, labsize = 4, cols = c("#F8766D", "dodgerblue2"), ptsize = 1 )
s |
a SEMScores object with scores populated |
variant |
variant id to plot |
label |
column in sem_metadata slot of semplObj to use for point labels |
rc |
label SEM orientations |
labsize |
numeric size of the point labels |
cols |
vector of length 2 with colors to use for plotting gained and lost motifs respectively |
ptsize |
numeric size of the ggplot points |
a ggplot scatter plot of non-alt binding propensity versus alt
binding propensity
library(VariantAnnotation) # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) plotSEMMotifs(s, "chr12:94136009:G>C", label = "transcription_factor")library(VariantAnnotation) # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) plotSEMMotifs(s, "chr12:94136009:G>C", label = "transcription_factor")
Plot non-alt versus alt binding propensity for a single motif
plotSEMVariants( s, sem, label = "varId", labsize = 4, cols = c("#F8766D", "dodgerblue2"), ptsize = 1 )plotSEMVariants( s, sem, label = "varId", labsize = 4, cols = c("#F8766D", "dodgerblue2"), ptsize = 1 )
s |
a SEMScores object with scores populated |
sem |
a single character vector matching a SEM in the semplObj |
label |
column in scores slot of semplObj to use for point labels |
labsize |
numeric size of the point labels |
cols |
vector of length 2 with colors to use for plotting gained and lost motifs respectively |
ptsize |
numeric size of the ggplot points |
a ggplot scatter plot of non-alt binding propensity versus alt
binding propensity
library(VariantAnnotation) # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) plotSEMVariants(s, "IKZF1_HUMAN.GM12878")library(VariantAnnotation) # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) plotSEMVariants(s, "IKZF1_HUMAN.GM12878")
Calculate binding propensity for all SEM motifs and genomic positions provided
scoreBinding(x, sem, genome, nFlank = NULL, seqId = NULL, rc = TRUE)scoreBinding(x, sem, genome, nFlank = NULL, seqId = NULL, rc = TRUE)
x |
|
sem |
A |
genome |
A |
nFlank |
Number of flanking nucleotides to add to provided range. By default will add flank equal to the length of the longest motif. Ignored if providing a vector of sequences. |
seqId |
Column in |
rc |
plot the reverse complement SEMs |
If a GRanges object is provided, return a SEMScores object.
If a list of sequences is provided, just return the scoring table.
The scoring table will contain the following columns:
seqId: a unique identifier for the sequence scored
SEM: the identifier for the SEM
rc: the orientation of the sequence (fwd or rev)
score: the unnormalized SEM score
scoreNorm: the SEM score normalized to it's corresponding baseline
index: the index of the motif with the highest SEM score within the sequence scored
seq: the motif sequence with the highest SEM scores within the sequence scored
# load SEMs # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens)# load SEMs # create a GRanges object gr <- GenomicRanges::GRanges( seqnames = "chr12", ranges = 94136009 ) # calculate binding propensity scoreBinding(gr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens)
Calculate risk/non-risk binding propensity for all SEM motifs and variants provided
scoreVariants( x, sem, genome, refCol = NULL, altCol = NULL, varId = NULL, rc = TRUE )scoreVariants( x, sem, genome, refCol = NULL, altCol = NULL, varId = NULL, rc = TRUE )
x |
|
sem |
a list of |
genome |
A |
refCol |
If providing a GRanges, the meta data column name with the reference (ref) allele. Ignored if providing a VRanges object. |
altCol |
If providing a GRanges, the meta data column name with the alternative (alt) allele. Ignored if providing a VRanges object. |
varId |
A column name in the meta data of x to use as a unique id. |
rc |
plot the reverse complement SEMs |
a SEMScores object with slots for the ranges of the provided
variants with an added sequence column with the sequence scored,
the SEM metadata, and the resulting scoring table.
These slots are accessible with the getRanges(), semData(), and
scores() accessor functions respectively.
The scoring table will contain the following columns:
varId: a unique identifier for the sequence scored
SEM: the identifier for the SEM
rc: the orientation of the sequence (fwd or rev)
score: the unnormalized SEM score
scoreNorm: the SEM score normalized to it's corresponding baseline
index: the index of the motif with the highest SEM score within the sequence scored
seq: the motif sequence with the highest SEM scores within the sequence scored
library(VariantAnnotation) # load default SEMs # create a VRanges object x <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity scoreVariants(x, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens)library(VariantAnnotation) # load default SEMs # create a VRanges object x <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity scoreVariants(x, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens)
A collection of pre-computed SNP Effect Matrix objects to be used for motif scoring
SEMCSEMC
SEMCA SNPEffectMatrixCollection object containing 223 SEMs as SNPEffectMatrix objects and a data frame with 223 rows and 13 columns containing meta data:
Transcription factor name
Ensembl id
Uniprot accession id
Position weighted matrix id
SNP Effect Matrix file
SNP Effect Matrix baseline
Cell Type
-log10(p value) from SEMpl calculation
ENCODE accession for ChIP data used in SEMpl
ENCODE accession for DNase data used in SEMpl
Position weighted matrix source
...
https://data.igvf.org/multireport/?type=ModelSet&software_versions.software.title=SEMpl
Accessor semData slot in a SEMScores object
Access semData from a SNPEffectMatrixCollection object
semData(x) ## S4 method for signature 'SEMScores' semData(x) ## S4 method for signature 'SNPEffectMatrixCollection' semData(x)semData(x) ## S4 method for signature 'SEMScores' semData(x) ## S4 method for signature 'SNPEffectMatrixCollection' semData(x)
x |
SNPEffectMatrixCollection object |
A data.table is returned
library(VariantAnnotation) # load default SEMs # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) semData(s) ## Create example SEM df <- data.frame( A = c(1, 2, 3), C = c(1, 2, 3), G = c(1, 2, 3), T = c(1, 2, 3) ) sem_data <- data.frame(SEM = "motif_id", TF = "tf_id") s <- SNPEffectMatrix(df, 1.205, "motif_id") sc <- SNPEffectMatrixCollection(s, semData = sem_data, semKey = "SEM") ## Access count matrix semData(sc)library(VariantAnnotation) # load default SEMs # create a VRanges object vr <- VRanges( seqnames = "chr12", ranges = 94136009, ref = "G", alt = "C" ) # calculate binding propensity s <- scoreVariants(vr, SEMC, BSgenome.Hsapiens.UCSC.hg19::Hsapiens) semData(s) ## Create example SEM df <- data.frame( A = c(1, 2, 3), C = c(1, 2, 3), G = c(1, 2, 3), T = c(1, 2, 3) ) sem_data <- data.frame(SEM = "motif_id", TF = "tf_id") s <- SNPEffectMatrix(df, 1.205, "motif_id") sc <- SNPEffectMatrixCollection(s, semData = sem_data, semKey = "SEM") ## Access count matrix semData(sc)
Constructs a SEMScores class object.
SEMScores(ranges = NULL, semData = NULL, scores = NULL)SEMScores(ranges = NULL, semData = NULL, scores = NULL)
ranges |
A |
semData |
A named list of SNPEffectMatrix objects |
scores |
(optional) A |
a SEMScores object
# load default SEMs # create a VRanges object vr <- VariantAnnotation::VRanges( seqnames = c("chr12", "chr19"), ranges = c(94136009, 10640062), ref = c("G", "T"), alt = c("C", "A") ) SEMScores(ranges = vr, semData = semData(SEMC))# load default SEMs # create a VRanges object vr <- VariantAnnotation::VRanges( seqnames = c("chr12", "chr19"), ranges = c(94136009, 10640062), ref = c("G", "T"), alt = c("C", "A") ) SEMScores(ranges = vr, semData = semData(SEMC))
Class for storing SEM motif binding calculations for multiple genomic ranges or variants
rangesA GRanges or VRanges object to hold one or more genomic
ranges
semDataA data.table object of metadata for the SEMs with one
row for each SEM
scoresA data.table object for motif information and binding scores
Prints information about the number of variants, SEM meta data columns, and the scoring table if scoreVariants has been run.
## S4 method for signature 'SEMScores' show(object)## S4 method for signature 'SEMScores' show(object)
object |
a SEMScores object |
An invisible NULL
Show for SNPEffectMatrix
## S4 method for signature 'SNPEffectMatrix' show(object)## S4 method for signature 'SNPEffectMatrix' show(object)
object |
SNPEffectMatrix object. |
An invisible NULL
Prints information about the number of SEMs and meta data columns included in the object.
## S4 method for signature 'SNPEffectMatrixCollection' show(object)## S4 method for signature 'SNPEffectMatrixCollection' show(object)
object |
a SNPEffectMatrixCollection object |
An invisible NULL
Constructs a SNPEffectMatrix class object.
SNPEffectMatrix(sem, baseline, semId)SNPEffectMatrix(sem, baseline, semId)
sem |
A |
baseline |
A |
semId |
A |
a SNPEffectMatrix object
# build a matrix for a motif of length 4 m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") # build a SNPEffectMatrix object SNPEffectMatrix(m, baseline = 1, semId = "sem_id")# build a matrix for a motif of length 4 m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") # build a SNPEffectMatrix object SNPEffectMatrix(m, baseline = 1, semId = "sem_id")
A SNP Effect Matrix (SEM) estimates the binding affinity of every possible mutation in a particular transcription factor (TF) binding motif. Read more about SEMs here: https://doi.org/10.1093/bioinformatics/btz612 This class contains three slots: the matrix, the baseline value, and a unique id
semThe SEM itself as a data table Rows represent sequence position (variable length), columns represent effects due to each nucleotide base A, C, G, T (fixed length: 4)
baselineA scrambled baseline, representing the binding score of randomly scrambled kmers of the same length. This is the binding cutoff for a TF
semIdbasename of the sem file
# build a SEM example matrix m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") SNPEffectMatrix(sem = m, baseline = 1, semId = "sem_id")# build a SEM example matrix m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") SNPEffectMatrix(sem = m, baseline = 1, semId = "sem_id")
Constructs a SNPEffectMatrixCollection class object.
SNPEffectMatrixCollection(sems, semData = NULL, semKey = "")SNPEffectMatrixCollection(sems, semData = NULL, semKey = "")
sems |
A list of SNPEffectMatrix objects |
semData |
A |
semKey |
A column name in semData corresponding to the semIds in each SNPEffectMatrix object in the sems parameter |
a SNPEffectMatrixCollection object
# # build two matries for a motif of length 4 m1 <- matrix(rnorm(16), nrow = 4) m2 <- matrix(rnorm(16), nrow = 4) colnames(m1) <- c("A", "C", "G", "T") colnames(m2) <- c("A", "C", "G", "T") # build two SNPEffectMatrix objects s1 <- SNPEffectMatrix(m1, baseline = 1, semId = "sem_id_1") s2 <- SNPEffectMatrix(m2, baseline = 1, semId = "sem_id_2") # build a meta data table md <- data.table::data.table( transcription_factor = c("tf1", "tf2"), cell_type = c("HepG2", "K562"), sem_id = c("sem_id_1", "sem_id_2") ) # build a SNPEffectMatrixCollection object SNPEffectMatrixCollection(list(s1, s2), semData = md, semKey = "sem_id")# # build two matries for a motif of length 4 m1 <- matrix(rnorm(16), nrow = 4) m2 <- matrix(rnorm(16), nrow = 4) colnames(m1) <- c("A", "C", "G", "T") colnames(m2) <- c("A", "C", "G", "T") # build two SNPEffectMatrix objects s1 <- SNPEffectMatrix(m1, baseline = 1, semId = "sem_id_1") s2 <- SNPEffectMatrix(m2, baseline = 1, semId = "sem_id_2") # build a meta data table md <- data.table::data.table( transcription_factor = c("tf1", "tf2"), cell_type = c("HepG2", "K562"), sem_id = c("sem_id_1", "sem_id_2") ) # build a SNPEffectMatrixCollection object SNPEffectMatrixCollection(list(s1, s2), semData = md, semKey = "sem_id")
A collection of SNPEffectMatrix objects and their corresponding meta data
semsA list of SNPEffectMatrix objects
semDataA data.table with meta data on each SEM
semKeyThe name of a column in semData that matches the semIds in the
sems list as a character. If column entries have a .sem suffix, a new
column named SEM_KEY will be created without the .sem suffixes.
# build a SEM example matrix m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") # build a SNPEffectMatrix object sm <- SNPEffectMatrix(sem = m, baseline = 1, semId = "sem_name") # create a meta data table md <- data.table::data.table(tf = "tf_name", sem = "sem_name") # build a collection with 1 SEM SNPEffectMatrixCollection(sems = list(sm), semData = md, semKey = "sem")# build a SEM example matrix m <- matrix(rnorm(16), nrow = 4) colnames(m) <- c("A", "C", "G", "T") # build a SNPEffectMatrix object sm <- SNPEffectMatrix(sem = m, baseline = 1, semId = "sem_name") # create a meta data table md <- data.table::data.table(tf = "tf_name", sem = "sem_name") # build a collection with 1 SEM SNPEffectMatrixCollection(sems = list(sm), semData = md, semKey = "sem")