Title: | Container for PhIP-Seq Experiments |
---|---|
Description: | PhIPData defines an S4 class for phage-immunoprecipitation sequencing (PhIP-seq) experiments. Buliding upon the RangedSummarizedExperiment class, PhIPData enables users to coordinate metadata with experimental data in analyses. Additionally, PhIPData provides specialized methods to subset and identify beads-only samples, subset objects using virus aliases, and use existing peptide libraries to populate object parameters. |
Authors: | Athena Chen [aut, cre] , Rob Scharpf [aut], Ingo Ruczinski [aut] |
Maintainer: | Athena Chen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-11-30 03:04:09 UTC |
Source: | https://github.com/bioc/PhIPData |
Rather than typing out full viruses names or repeating
regexpressions, users can use aliases as a convenient tool to subset
PhIPData
objects by viral species.
getAlias(virus) setAlias(virus, pattern) deleteAlias(virus)
getAlias(virus) setAlias(virus, pattern) deleteAlias(virus)
virus |
character vector of the alias |
pattern |
character vector of regexpressions corresponding to the alias |
Aliases are cached to an rda file containing only a
data.frame
with two columns: alias
and pattern
. The
alias
column contains the alias while the pattern
column
contains the corresponding regexpression of interest.
Once an alias is added to the database, it can always be accessed once the
package is loaded. It is recommended to use the functions setAlias
and deleteAlias
. If an alias already exists in the database,
setAlias
replaces the matched pattern. If an alias does not exist
in the database, getAlias
returns NA_character_
.
getAlias()
returns a vector of regexpressions corresponding to
queried inputs. The returned vector is the same length as the input vector.
Queries that do not exist in the database return NA_character_
.
getAlias
: return a regexpression corresponding to the alias.
setAlias
: define/modify the regexpression for an alias.
deleteAlias
: remove an alias from the database.
## Edit and modify aliases in the database setAlias("test_virus", "test_pattern") getAlias("test_virus") setAlias("test_virus", "test_pattern2") getAlias("test_virus") deleteAlias("test_virus") ## Edit and modify multiple aliases at once. setAlias(c("virus_1", "virus_2"), c("pattern_1", "pattern_2")) getAlias(c("virus_1", "virus_2")) deleteAlias(c("virus_1", "virus_2")) ## Example of how to subset HIV using `getAlias` ## Often, it is useful to set the `ignore.case` of `grep`/`grepl` to TRUE. counts_dat <- matrix(1:10, nrow = 5) peptide_meta <- data.frame(species = c( rep("Epstein-Barr virus", 2), rep("human immunodeficiency virus", 3) )) phip_obj <- PhIPData(counts = counts_dat, peptideInfo = peptide_meta) subset(phip_obj, grepl(getAlias("HIV"), species, ignore.case = TRUE))
## Edit and modify aliases in the database setAlias("test_virus", "test_pattern") getAlias("test_virus") setAlias("test_virus", "test_pattern2") getAlias("test_virus") deleteAlias("test_virus") ## Edit and modify multiple aliases at once. setAlias(c("virus_1", "virus_2"), c("pattern_1", "pattern_2")) getAlias(c("virus_1", "virus_2")) deleteAlias(c("virus_1", "virus_2")) ## Example of how to subset HIV using `getAlias` ## Often, it is useful to set the `ignore.case` of `grep`/`grepl` to TRUE. counts_dat <- matrix(1:10, nrow = 5) peptide_meta <- data.frame(species = c( rep("Epstein-Barr virus", 2), rep("human immunodeficiency virus", 3) )) phip_obj <- PhIPData(counts = counts_dat, peptideInfo = peptide_meta) subset(phip_obj, grepl(getAlias("HIV"), species, ignore.case = TRUE))
getBeadsName
and setBeadsName
are two function to
get and set the string that encodes which samples are beads-only samples.
Information about beads-only samples are stored in the groups
column
of sampleInfo
.
getBeadsName() setBeadsName(name)
getBeadsName() setBeadsName(name)
name |
a string indicating how beads-only samples are encoded. |
If name
is of length greater than one, only the first element
of the vector is used. Non-character values of name
are first coerced
into strings.
a string indicating how beads-only samples are encoded.
getBeadsName
: function that returns a string corresponding to how
beads-only samples are encoded.
setBeadsName
: function to set the string that indicates which
samples are beads-only samples in the groups
column of
sampleInfo
.
## Returns the default string, "beads" getBeadsName() ## Not run since it changes defaults/user settings ## Not run: setBeadsName("beads-only") ## End(Not run)
## Returns the default string, "beads" getBeadsName() ## Not run since it changes defaults/user settings ## Not run: setBeadsName("beads-only") ## End(Not run)
This function is a wrapper function for colSums on the counts
assay.
librarySize(object, ..., withDimnames = TRUE)
librarySize(object, ..., withDimnames = TRUE)
object |
PhIPData object |
... |
arguments passed to colSums |
withDimnames |
logical; if true, the vector names are the sample names; otherwise the vector is unnamed. |
a (named) numeric vector. The length of the vector is equal to the number of samples.
example("PhIPData") librarySize(phip_obj) ## Return an unnamed vector librarySize(phip_obj, withDimnames = FALSE)
example("PhIPData") librarySize(phip_obj) ## Return an unnamed vector librarySize(phip_obj, withDimnames = FALSE)
PhIP-Seq experiments often use identical peptide libraries different cohorts. These functions enable the user to conveniently reuse tidied libraries.
getLibrary(name) makeLibrary(library, name) removeLibrary(name) listLibrary()
getLibrary(name) makeLibrary(library, name) removeLibrary(name) listLibrary()
name |
name of the library |
library |
a |
Each library is stored as a DataFrame in .rds file.
New libraries can be stored for future use with the makeLibrary
function.
getLibrary
returns a DataFrame corresponding to
the peptide information for the specified library.
getLibrary
: return a DataFrame with the
peptide information corresponding to the library.
makeLibrary
: create and store a DataFrame with
the specified peptide information.
removeLibrary
: delete stored libraries
listLibrary
: list all available libraries
## Create a new library pep_meta <- data.frame(species = c( rep("human immunodeficiency virus", 3), rep("Epstein-Barr virus", 2) )) makeLibrary(pep_meta, "new_library") ## Use new library counts_dat <- matrix(1:10, nrow = 5) phip_obj <- PhIPData( counts = counts_dat, peptideInfo = getLibrary("new_library") ) ## List libraries listLibrary() ## Delete created library removeLibrary("new_library")
## Create a new library pep_meta <- data.frame(species = c( rep("human immunodeficiency virus", 3), rep("Epstein-Barr virus", 2) )) makeLibrary(pep_meta, "new_library") ## Use new library counts_dat <- matrix(1:10, nrow = 5) phip_obj <- PhIPData( counts = counts_dat, peptideInfo = getLibrary("new_library") ) ## List libraries listLibrary() ## Delete created library removeLibrary("new_library")
The PhIPData
class is a matrix-like container designed to organize
results from phage-immunoprecipitation (PhIP-Seq) experiments. Rows in
PhIPData objects represent peptides and columns represent samples. Each
object contains at least three assays:
counts
: a matrix of raw read counts,
logfc
: a matrix of log2 estimated fold-change in
comparison to beads-only samples,
prob
: a matrix of probabilities associated with whether
a sample has an enriched antibody response for a peptide.
The PhIPData
class extends the
RangedSummarizedExperiment class, so methods documented
in RangedSummarizedExperiment and
SummarizedExperiment also work on PhIPData
objects.
PhIPData( counts = matrix(nrow = 0, ncol = 0), logfc = matrix(nrow = 0, ncol = 0), prob = matrix(nrow = 0, ncol = 0), peptideInfo = S4Vectors::DataFrame(), sampleInfo = S4Vectors::DataFrame(), metadata = list(), .defaultNames = "info" )
PhIPData( counts = matrix(nrow = 0, ncol = 0), logfc = matrix(nrow = 0, ncol = 0), prob = matrix(nrow = 0, ncol = 0), peptideInfo = S4Vectors::DataFrame(), sampleInfo = S4Vectors::DataFrame(), metadata = list(), .defaultNames = "info" )
counts |
a |
logfc |
a |
prob |
a |
peptideInfo |
a |
sampleInfo |
a |
metadata |
a |
.defaultNames |
vector of names to use when sample and peptide
identifiers disagree across the metadata and the Valid options are:
|
Rows of PhIPData
objects correspond to peptides of interest and are
organized in GRanges or GRangesList objects. Though originally
designed for genomic ranges, the sequence name and genomic range information
in GRanges objects can be replaced with peptide names and amino acid
positions, respectively. If no peptide names are given, peptides are given
the names of pep_rownum
. Peptide positions are specified by columns
pos_start
and pos_end
in the peptideInfo
argument of the
constuctor. Missing position information is set to 0. Additional peptide
annotation can also be stored in GRanges objects and can be used
to subset PhIPData
objects as shown below.
Columns of PhIPData
objects represent samples. Sample metadata
are stored in a DataFrame and can be accessed as shown below. If no
sample names are specified, samples are given default names of
sample_colnum
.
Unlike RangedSummarizedExperiment/SummarizedExperiment objects,
PhIPData
objects must contain counts
, logfc
,
prob
. If any of the three assays are missing when the constructor is
called, an empty matrix of the same names and dimensions is initialized for
that assay. Sample and peptide names are harmonized across assays and
annotation during construction and replacement.
Though 'counts' typically contain integer values for the number of reads aligned to each peptide, 'PhIPData' only requires that stored values are non-negative numeric values. Pseudocounts or non-integer count values can also be stored in the 'counts' assay.
A PhIPData
object.
PhIPData
objects are constructed using the homonymous function and
arguments as described above. Any PhIPData
object can be created
so long as peptide and sample identifiers (or lack thereof) are specified
via any of the parameters.
PhIPData-methods
for accessors and modifiers for PhIPData
components.
SummarizedExperiment
## Construct a new PhIPData object counts_dat <- matrix(sample(1:1e6, 25, replace = TRUE), nrow = 5) logfc_dat <- matrix(rnorm(25, 0, 10), nrow = 5) prob_dat <- matrix(rbeta(25, 1, 1), nrow = 5) peptide_meta <- data.frame( pos_start = 1:5, pos_end = 6:10, species = c(rep("HIV", 3), rep("EBV", 2)) ) sample_meta <- data.frame( gender = sample(c("M", "F"), 5, TRUE), group = sample(c("ctrl", "trt", "beads"), 5, TRUE) ) exp_meta <- list( date_run = as.Date("2021/01/20"), reads_per_sample = colSums(counts_dat) ) rownames(counts_dat) <- rownames(logfc_dat) <- rownames(prob_dat) <- rownames(peptide_meta) <- paste0("pep_", 1:5) colnames(counts_dat) <- colnames(logfc_dat) <- colnames(prob_dat) <- rownames(sample_meta) <- paste0("sample_", 1:5) phip_obj <- PhIPData( counts_dat, logfc_dat, prob_dat, peptide_meta, sample_meta, exp_meta ) phip_obj
## Construct a new PhIPData object counts_dat <- matrix(sample(1:1e6, 25, replace = TRUE), nrow = 5) logfc_dat <- matrix(rnorm(25, 0, 10), nrow = 5) prob_dat <- matrix(rbeta(25, 1, 1), nrow = 5) peptide_meta <- data.frame( pos_start = 1:5, pos_end = 6:10, species = c(rep("HIV", 3), rep("EBV", 2)) ) sample_meta <- data.frame( gender = sample(c("M", "F"), 5, TRUE), group = sample(c("ctrl", "trt", "beads"), 5, TRUE) ) exp_meta <- list( date_run = as.Date("2021/01/20"), reads_per_sample = colSums(counts_dat) ) rownames(counts_dat) <- rownames(logfc_dat) <- rownames(prob_dat) <- rownames(peptide_meta) <- paste0("pep_", 1:5) colnames(counts_dat) <- colnames(logfc_dat) <- colnames(prob_dat) <- rownames(sample_meta) <- paste0("sample_", 1:5) phip_obj <- PhIPData( counts_dat, logfc_dat, prob_dat, peptide_meta, sample_meta, exp_meta ) phip_obj
Methods to extract and modify assay(s)
(including
convenient functions for counts
, logfc
, and prob
),
sampleInfo
, peptideInfo
, and metadata
.
## S4 method for signature 'PhIPData' counts(object, ...) logfc(object, ...) ## S4 method for signature 'PhIPData' logfc(object, ...) prob(object, ...) ## S4 method for signature 'PhIPData' prob(object, ...) peptideInfo(object, ...) ## S4 method for signature 'PhIPData' peptideInfo(object, ...) sampleInfo(object, ...) ## S4 method for signature 'PhIPData' sampleInfo(object, ...) ## S4 replacement method for signature 'PhIPData,list' assays(x, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,SimpleList' assays(x, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,missing' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,numeric' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,character' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData' counts(object, ...) <- value logfc(object, ...) <- value ## S4 replacement method for signature 'PhIPData' logfc(object, ...) <- value prob(object, ...) <- value ## S4 replacement method for signature 'PhIPData' prob(object, ...) <- value peptideInfo(object) <- value ## S4 replacement method for signature 'PhIPData' peptideInfo(object) <- value sampleInfo(object, ...) <- value ## S4 replacement method for signature 'PhIPData' sampleInfo(object) <- value
## S4 method for signature 'PhIPData' counts(object, ...) logfc(object, ...) ## S4 method for signature 'PhIPData' logfc(object, ...) prob(object, ...) ## S4 method for signature 'PhIPData' prob(object, ...) peptideInfo(object, ...) ## S4 method for signature 'PhIPData' peptideInfo(object, ...) sampleInfo(object, ...) ## S4 method for signature 'PhIPData' sampleInfo(object, ...) ## S4 replacement method for signature 'PhIPData,list' assays(x, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,SimpleList' assays(x, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,missing' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,numeric' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData,character' assay(x, i, withDimnames = TRUE, ...) <- value ## S4 replacement method for signature 'PhIPData' counts(object, ...) <- value logfc(object, ...) <- value ## S4 replacement method for signature 'PhIPData' logfc(object, ...) <- value prob(object, ...) <- value ## S4 replacement method for signature 'PhIPData' prob(object, ...) <- value peptideInfo(object) <- value ## S4 replacement method for signature 'PhIPData' peptideInfo(object) <- value sampleInfo(object, ...) <- value ## S4 replacement method for signature 'PhIPData' sampleInfo(object) <- value
object |
A |
... |
parameters for |
x |
A |
withDimnames |
Parameter for RangedSummarizedExperiment class functions. Overrided since row/column names are automatically synced within each object. |
value |
A |
i |
A |
In addition to the functions detailed in
RangedSummarizedExperiment, the PhIPData
class includes
conveniently named functions to quickly access and modify frequently used
components of PhIPData objects.
Replacement functions ensure that names of the replacement object are matched
with the names of the PhIPData
object.
Since packages for identifying differential expression in RNA-seq experiments are frequently used for estimating fold-changes for peptide enrichments, the class also includes coercion methods to and from DGELists.
Accessors: a DataFrame object
Setters: a PhIPData
object
In the following code snippets, x
is a PhIPData object,
value
is a matrix-like object with the same dimensions as x
,
and ...
are further arguments passed to assay
(for the getter) or assay<-
(for the setter).
counts(x, ...)
, counts(x, ...) <- value
:Get or set a matrix of raw read counts
logfc(x, ...)
, logfc(x, ...) <- value
:Get or set a matrix of log2 estimated fold changes (in comparison to beads-only samples)
prob(x, ...)
, pob(x, ...) <- value
:Get or set a matrix of probabilities associated with whether a sample has an enriched antibody response for a peptide.
assays
for
SummarizedExperiment operations.
example("PhIPData") replacement_dat <- matrix(1L, nrow = 5, ncol = 5) ## SummarizedExperiment Accessors and Setters assays(phip_obj) assays(phip_obj)$counts <- replacement_dat assay(phip_obj, "logfc") assay(phip_obj, "logfc") <- replacement_dat ## counts counts(phip_obj) counts(phip_obj) <- counts_dat ## logfc logfc(phip_obj) logfc(phip_obj) <- logfc_dat ## prob prob(phip_obj) prob(phip_obj) <- replacement_dat ## coercion functions as(phip_obj, "DGEList") as(phip_obj, "List") as(phip_obj, "list")
example("PhIPData") replacement_dat <- matrix(1L, nrow = 5, ncol = 5) ## SummarizedExperiment Accessors and Setters assays(phip_obj) assays(phip_obj)$counts <- replacement_dat assay(phip_obj, "logfc") assay(phip_obj, "logfc") <- replacement_dat ## counts counts(phip_obj) counts(phip_obj) <- counts_dat ## logfc logfc(phip_obj) logfc(phip_obj) <- logfc_dat ## prob prob(phip_obj) prob(phip_obj) <- replacement_dat ## coercion functions as(phip_obj, "DGEList") as(phip_obj, "List") as(phip_obj, "list")
This function calculates the proportion of total sample reads pulled by each peptide.
propReads(object, withDimnames = TRUE)
propReads(object, withDimnames = TRUE)
object |
PhIPData object |
withDimnames |
logical; if true return a matrix with the same dimension names as the original object. |
A (named) numeric matrix with the same dimensions as the function input. Matrix values are between 0 and 1.
example("PhIPData") propReads(phip_obj) ## Return an unnamed matrix propReads(phip_obj, withDimnames = FALSE)
example("PhIPData") propReads(phip_obj) ## Return an unnamed matrix propReads(phip_obj, withDimnames = FALSE)
Function to subset PhIP-seq data for beads-only samples.
subsetBeads(object)
subsetBeads(object)
object |
PhIPData object |
a PhIPData object.
example("PhIPData") subsetBeads(phip_obj)
example("PhIPData") subsetBeads(phip_obj)