Title: | CMap Tools in R |
---|---|
Description: | The Connectivity Map (CMap) is a massive resource of perturbational gene expression profiles built by researchers at the Broad Institute and funded by the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. Please visit https://clue.io for more information. The cmapR package implements methods to parse, manipulate, and write common CMap data objects, such as annotated matrices and collections of gene sets. |
Authors: | Ted Natoli [aut, cre] |
Maintainer: | Ted Natoli <[email protected]> |
License: | file LICENSE |
Version: | 1.19.0 |
Built: | 2024-11-18 03:24:38 UTC |
Source: | https://github.com/bioc/cmapR |
Align the rows and columns of two (or more) matrices
align_matrices(m1, m2, ..., L = NULL, na.pad = TRUE, as.3D = TRUE)
align_matrices(m1, m2, ..., L = NULL, na.pad = TRUE, as.3D = TRUE)
m1 |
a matrix with unique row and column names |
m2 |
a matrix with unique row and column names |
... |
additional matrices with unique row and column names |
L |
a list of matrix objects. If this is given, m1, m2, and ... are ignored |
na.pad |
boolean indicating whether to pad the combined matrix with NAs for rows/columns that are not shared by m1 and m2. |
as.3D |
boolean indicating whether to return the result as a 3D array. If FALSE, will return a list. |
an object containing the aligned matrices. Will either be a list or a 3D array
# construct some example matrices m1 <- matrix(rnorm(20), nrow=4) rownames(m1) <- letters[1:4] colnames(m1) <- LETTERS[1:5] m2 <- matrix(rnorm(20), nrow=5) rownames(m2) <- letters[1:5] colnames(m2) <- LETTERS[1:4] m1 m2 # align them, padding with NA and returning a 3D array align_matrices(m1, m2) # align them, not padding and retuning a list align_matrices(m1, m2, na.pad=FALSE, as.3D=FALSE)
# construct some example matrices m1 <- matrix(rnorm(20), nrow=4) rownames(m1) <- letters[1:4] colnames(m1) <- LETTERS[1:5] m2 <- matrix(rnorm(20), nrow=5) rownames(m2) <- letters[1:5] colnames(m2) <- LETTERS[1:4] m1 m2 # align them, padding with NA and returning a 3D array align_matrices(m1, m2) # align them, not padding and retuning a list align_matrices(m1, m2, na.pad=FALSE, as.3D=FALSE)
Given a GCT object and either a data.frame
or
a path to an annotation table, apply the annotations to the
gct using the given keyfield
.
annotate.gct(...) annotate_gct(g, annot, dim = "row", keyfield = "id") ## S4 method for signature 'GCT' annotate_gct(g, annot, dim = "row", keyfield = "id")
annotate.gct(...) annotate_gct(g, annot, dim = "row", keyfield = "id") ## S4 method for signature 'GCT' annotate_gct(g, annot, dim = "row", keyfield = "id")
... |
arguments passed on to |
g |
a GCT object |
annot |
a |
dim |
either 'row' or 'column' indicating which dimension
of |
keyfield |
the character name of the column in |
a GCT object with annotations applied to the specified dimension
Other GCT utilities:
melt.gct()
,
merge.gct()
,
rank.gct()
,
subset.gct()
gct_path <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # read the GCT file, getting the matrix only g <- parse_gctx(gct_path, matrix_only=TRUE) # separately, read the column annotations and then apply them using # annotate_gct cdesc <- read_gctx_meta(gct_path, dim="col") g <- annotate_gct(g, cdesc, dim="col", keyfield="id")
gct_path <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # read the GCT file, getting the matrix only g <- parse_gctx(gct_path, matrix_only=TRUE) # separately, read the column annotations and then apply them using # annotate_gct cdesc <- read_gctx_meta(gct_path, dim="col") g <- annotate_gct(g, cdesc, dim="col", keyfield="id")
An example table of metadata, as would be parsed from or parse.gctx. Initially all the columns are of type character.
cdesc_char
cdesc_char
An object of class data.frame
with 368 rows and 8 columns.
test_names
are columns in the data.frame
dfCheck whether test_names
are columns in the data.frame
df
check_colnames(test_names, df, throw_error = TRUE)
check_colnames(test_names, df, throw_error = TRUE)
test_names |
a vector of column names to test |
df |
the |
throw_error |
boolean indicating whether to throw an error if
any |
boolean indicating whether or not all test_names
are
columns of df
check_colnames(c("pert_id", "pert_iname"), cdesc_char) # TRUE check_colnames(c("pert_id", "foobar"), cdesc_char, throw_error=FALSE)# FALSE, suppress error
check_colnames(c("pert_id", "pert_iname"), cdesc_char) # TRUE check_colnames(c("pert_id", "foobar"), cdesc_char, throw_error=FALSE)# FALSE, suppress error
Check for duplicates in a vector
check_dups(x, name = "")
check_dups(x, name = "")
x |
the vector |
name |
the name of the object to print in an error message if duplicates are found |
silently returns NULL
# this will throw an erorr, let's catch it tryCatch( check_dups(c("a", "b", "c", "a", "d")), error=function(e) print(e) )
# this will throw an erorr, let's catch it tryCatch( check_dups(c("a", "b", "c", "a", "d")), error=function(e) print(e) )
This is equivalent to the 'modz' procedure used in collapsing replicates in traditional L1000 data processing. The weight for each replicate is computed as its normalized average correlation to the other replicates in the set.
distil(m, dimension = "col", method = "spearman")
distil(m, dimension = "col", method = "spearman")
m |
a numeric matrix where the rows or columns are assumed to be replicates |
dimension |
the dimension to collapse. either 'row' or 'col' |
method |
the correlation method to use |
a list with the following elements
a vector of the collapsed values
a vector of the pairwise correlations
a vector of the computed weights
m <- matrix(rnorm(30), ncol=3) distil(m)
m <- matrix(rnorm(30), ncol=3) distil(m)
An example of a GCT object with row and column metadata and gene expression values in the matrix.
ds
ds
An object of class GCT
of length 1.
extract the elements from a GCT
object
where the values of row_field
and col_field
are the same. A concrete example is if g
represents
a matrix of signatures of genetic perturbations, and you wan
to extract all the values of the targeted genes.
extract.gct(...) extract_gct( g, row_field, col_field, rdesc = NULL, cdesc = NULL, row_keyfield = "id", col_keyfield = "id" )
extract.gct(...) extract_gct( g, row_field, col_field, rdesc = NULL, cdesc = NULL, row_keyfield = "id", col_keyfield = "id" )
... |
arguments passed on to |
g |
the GCT object |
row_field |
the column name in rdesc to search on |
col_field |
the column name in cdesc to search on |
rdesc |
a |
cdesc |
a |
row_keyfield |
the column name of |
col_keyfield |
the column name of |
a list of the following elements
a logical matrix of the same dimensions as
ds@mat
indicating which matrix elements have
been extracted
an array index into ds@mat
representing which elements have been extracted
a vector of the extracted values
# get the values for all targeted genes from a # dataset of knockdown experiments res <- extract_gct(kd_gct, row_field="pr_gene_symbol", col_field="pert_mfc_desc") str(res) stats::quantile(res$vals)
# get the values for all targeted genes from a # dataset of knockdown experiments res <- extract_gct(kd_gct, row_field="pr_gene_symbol", col_field="pert_mfc_desc") str(res) stats::quantile(res$vals)
GCT
Initialize an object of class GCT
GCT( mat = NULL, rdesc = NULL, cdesc = NULL, src = NULL, rid = NULL, cid = NULL, matrix_only = FALSE )
GCT( mat = NULL, rdesc = NULL, cdesc = NULL, src = NULL, rid = NULL, cid = NULL, matrix_only = FALSE )
mat |
a matrix |
rdesc |
a |
cdesc |
a |
src |
path to a GCT file to read |
rid |
vector of character identifiers for rows |
cid |
vector of character identifiers for columns |
matrix_only |
logical indicating whether to read just the matrix
data from |
If mat
is provided, rid
and cid
are treated as
the row and column identifiers for the matrix and are assigned to the
rid
and cid
slots of the GCT
object.
If mat
is not provided but src
is provided,
rid
and cid
are treated as filters. Data will be read from
the file path provided to src
and will then be restricted to the
character ids or integer indices provided to rid
and cid
.
In a similar manner, matrix_only
controls whether the
row and column metadata are also read from the src
file path.
a GCT
object
Other GCTX parsing functions:
append.dim()
,
fix.datatypes()
,
parse.gctx()
,
process_ids()
,
read.gctx.ids()
,
read.gctx.meta()
,
write.gctx.meta()
,
write.gctx()
,
write.gct()
# an empty object (g <- GCT()) # with a matrix # note we must specify row and column ids (g <- GCT(mat=matrix(rnorm(100), nrow=10), rid=letters[1:10], cid=letters[1:10])) # from file gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") (g <- GCT(src=gct_file))
# an empty object (g <- GCT()) # with a matrix # note we must specify row and column ids (g <- GCT(mat=matrix(rnorm(100), nrow=10), rid=letters[1:10], cid=letters[1:10])) # from file gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") (g <- GCT(src=gct_file))
The GCT class serves to represent annotated
matrices. The mat
slot contains said data and the
rdesc
and cdesc
slots contain data frames with
annotations about the rows and columns, respectively
mat
a numeric matrix
rid
a character vector of row ids
cid
a character vector of column ids
rdesc
a data.frame
of row descriptors
rdesc
a data.frame
of column descriptors
src
a character indicating the source (usually file path) of the data
parse_gctx
,
write_gctx
, read_gctx_meta
,
read_gctx_ids
visit http://clue.io/help for more information on the GCT format
An example collection of gene sets as used in the Lamb 2006 CMap paper.
gene_set
gene_set
An object of class list
of length 8.
Lamb et al 2006 doi:10.1126/science.1132939
Extract the or set row or column ids of a GCT object
ids(g, dimension = "row") ## S4 method for signature 'GCT' ids(g, dimension = "row") ids(g, dimension = "row") <- value ## S4 replacement method for signature 'GCT' ids(g, dimension = "row") <- value
ids(g, dimension = "row") ## S4 method for signature 'GCT' ids(g, dimension = "row") ids(g, dimension = "row") <- value ## S4 replacement method for signature 'GCT' ids(g, dimension = "row") <- value
g |
the GCT object |
dimension |
the dimension to extract/update ['row' or 'column'] |
value |
a character vector |
a vector of row ids
Other GCT accessor methods:
mat()
,
meta()
# extract rids rids <- ids(ds) # extract column ids cids <- ids(ds, "column") # set rids ids(ds) <- as.character(1:length(rids)) # set cids ids(ds, "column") <- as.character(1:length(cids))
# extract rids rids <- ids(ds) # extract column ids cids <- ids(ds, "column") # set rids ids(ds) <- as.character(1:length(rids)) # set cids ids(ds, "column") <- as.character(1:length(cids))
Check if x is a whole number
is.wholenumber(x, tol = .Machine$double.eps^0.5)
is.wholenumber(x, tol = .Machine$double.eps^0.5)
x |
number to test |
tol |
the allowed tolerance |
boolean indicating whether x is tol away from a whole number value
is.wholenumber(1) is.wholenumber(0.5)
is.wholenumber(1) is.wholenumber(0.5)
An example GCT object of knockdown experiments targeting a subset of landmark genes.
kd_gct
kd_gct
An object of class GCT
of length 1.
Read an LXB file and return a matrix
lxb2mat(lxb_path, columns = c("RID", "RP1"), newnames = c("barcode_id", "FI"))
lxb2mat(lxb_path, columns = c("RID", "RP1"), newnames = c("barcode_id", "FI"))
lxb_path |
the path to the lxb file |
columns |
which columns in the lxb file to retain |
newnames |
what to name these columns in the returned matrix |
a matrix
Other CMap parsing functions:
parse.gmt()
,
parse.gmx()
,
parse.grp()
,
write_gmt()
,
write_grp()
lxb_path <- system.file("extdata", "example.lxb", package="cmapR") lxb_data <- lxb2mat(lxb_path) str(lxb_data)
lxb_path <- system.file("extdata", "example.lxb", package="cmapR") lxb_data <- lxb2mat(lxb_path) str(lxb_data)
Extract or set the matrix of GCT object
mat(g) ## S4 method for signature 'GCT' mat(g) mat(g) <- value ## S4 replacement method for signature 'GCT' mat(g) <- value
mat(g) ## S4 method for signature 'GCT' mat(g) mat(g) <- value ## S4 replacement method for signature 'GCT' mat(g) <- value
g |
the GCT object |
value |
a numeric matrix |
a matrix
Other GCT accessor methods:
ids()
,
meta()
# get the matrix m <- mat(ds) # set the matrix mat(ds) <- matrix(0, nrow=nrow(m), ncol=ncol(m))
# get the matrix m <- mat(ds) # set the matrix mat(ds) <- matrix(0, nrow=nrow(m), ncol=ncol(m))
data.table
(aka 'melt')Utilizes the melt.data.table
function to
transform the
matrix into long form. Optionally can include the row and column
annotations in the transformed data.table
.
melt.gct(...) melt_gct( g, suffixes = NULL, remove_symmetries = FALSE, keep_rdesc = TRUE, keep_cdesc = TRUE, ... ) ## S4 method for signature 'GCT' melt_gct( g, suffixes = NULL, remove_symmetries = FALSE, keep_rdesc = TRUE, keep_cdesc = TRUE, ... )
melt.gct(...) melt_gct( g, suffixes = NULL, remove_symmetries = FALSE, keep_rdesc = TRUE, keep_cdesc = TRUE, ... ) ## S4 method for signature 'GCT' melt_gct( g, suffixes = NULL, remove_symmetries = FALSE, keep_rdesc = TRUE, keep_cdesc = TRUE, ... )
... |
further arguments passed along to |
g |
the GCT object |
suffixes |
the character suffixes to be applied if there are collisions between the names of the row and column descriptors |
remove_symmetries |
boolean indicating whether to remove
the lower triangle of the matrix (only applies if |
keep_rdesc |
boolean indicating whether to keep the row descriptors in the final result |
keep_cdesc |
boolean indicating whether to keep the column descriptors in the final result |
a data.table
object with the row and column ids and
the matrix
values and (optinally) the row and column descriptors
Other GCT utilities:
annotate.gct()
,
merge.gct()
,
rank.gct()
,
subset.gct()
# simple melt, keeping both row and column meta head(melt_gct(ds)) # update row/colum suffixes to indicate rows are genes, columns experiments head(melt_gct(ds, suffixes = c("_gene", "_experiment"))) # ignore row/column meta head(melt_gct(ds, keep_rdesc = FALSE, keep_cdesc = FALSE))
# simple melt, keeping both row and column meta head(melt_gct(ds)) # update row/colum suffixes to indicate rows are genes, columns experiments head(melt_gct(ds, suffixes = c("_gene", "_experiment"))) # ignore row/column meta head(melt_gct(ds, keep_rdesc = FALSE, keep_cdesc = FALSE))
Merge two GCT objects together
## S3 method for class 'gct' merge(...) merge_gct(g1, g2, dim = "row", matrix_only = FALSE) ## S4 method for signature 'GCT,GCT' merge_gct(g1, g2, dim = "row", matrix_only = FALSE)
## S3 method for class 'gct' merge(...) merge_gct(g1, g2, dim = "row", matrix_only = FALSE) ## S4 method for signature 'GCT,GCT' merge_gct(g1, g2, dim = "row", matrix_only = FALSE)
... |
arguments passed on to |
g1 |
the first GCT object |
g2 |
the second GCT object |
dim |
the dimension on which to merge (row or column) |
matrix_only |
boolean idicating whether to keep only the
data matrices from |
a GCT object
Other GCT utilities:
annotate.gct()
,
melt.gct()
,
rank.gct()
,
subset.gct()
# take the first 10 and last 10 rows of an object # and merge them back together (a <- subset_gct(ds, rid=1:10)) (b <- subset_gct(ds, rid=969:978)) (merged <- merge_gct(a, b, dim="row"))
# take the first 10 and last 10 rows of an object # and merge them back together (a <- subset_gct(ds, rid=1:10)) (b <- subset_gct(ds, rid=969:978)) (merged <- merge_gct(a, b, dim="row"))
Extract the or set metadata of a GCT object
meta(g, dimension = "row") ## S4 method for signature 'GCT' meta(g, dimension = "row") meta(g, dimension = "row") <- value ## S4 replacement method for signature 'GCT' meta(g, dimension = "row") <- value
meta(g, dimension = "row") ## S4 method for signature 'GCT' meta(g, dimension = "row") meta(g, dimension = "row") <- value ## S4 replacement method for signature 'GCT' meta(g, dimension = "row") <- value
g |
the GCT object |
dimension |
the dimension to extract/update ['row' or 'column'] |
value |
a data.frame |
a data.frame
Other GCT accessor methods:
ids()
,
mat()
# extract rdesc rdesc <- meta(ds) # extract cdesc cdesc <- meta(ds, dim="column") # set rdesc meta(ds) <- data.frame(x=sample(letters, nrow(rdesc), replace=TRUE)) # set cdesc meta(ds, dim="column") <- data.frame(x=sample(letters, nrow(cdesc), replace=TRUE))
# extract rdesc rdesc <- meta(ds) # extract cdesc cdesc <- meta(ds, dim="column") # set rdesc meta(ds) <- data.frame(x=sample(letters, nrow(rdesc), replace=TRUE)) # set cdesc meta(ds, dim="column") <- data.frame(x=sample(letters, nrow(cdesc), replace=TRUE))
Pad a matrix with additional rows/columns of NA values
na_pad_matrix(m, row_universe = NULL, col_universe = NULL)
na_pad_matrix(m, row_universe = NULL, col_universe = NULL)
m |
a matrix with unique row and column names |
row_universe |
a vector with the universe of possible row names |
col_universe |
a vector with the universe of possible column names |
a matrix
m <- matrix(rnorm(10), nrow=2) rownames(m) <- c("A", "B") colnames(m) <- letters[1:5] na_pad_matrix(m, row_universe=LETTERS, col_universe=letters)
m <- matrix(rnorm(10), nrow=2) rownames(m) <- c("A", "B") colnames(m) <- letters[1:5] na_pad_matrix(m, row_universe=LETTERS, col_universe=letters)
Parse a GCTX file into the workspace as a GCT object
parse.gctx(...) parse_gctx(fname, rid = NULL, cid = NULL, matrix_only = FALSE)
parse.gctx(...) parse_gctx(fname, rid = NULL, cid = NULL, matrix_only = FALSE)
... |
arguments passed on to |
fname |
path to the GCTX file on disk |
rid |
either a vector of character or integer row indices or a path to a grp file containing character row indices. Only these indicies will be parsed from the file. |
cid |
either a vector of character or integer column indices or a path to a grp file containing character column indices. Only these indicies will be parsed from the file. |
matrix_only |
boolean indicating whether to parse only the matrix (ignoring row and column annotations) |
parse_gctx
also supports parsing of plain text
GCT files, so this function can be used as a general GCT parser.
a GCT object
Other GCTX parsing functions:
GCT
,
append.dim()
,
fix.datatypes()
,
process_ids()
,
read.gctx.ids()
,
read.gctx.meta()
,
write.gctx.meta()
,
write.gctx()
,
write.gct()
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") (ds <- parse_gctx(gct_file)) # matrix only (ds <- parse_gctx(gct_file, matrix_only=TRUE)) # only the first 10 rows and columns (ds <- parse_gctx(gct_file, rid=1:10, cid=1:10))
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") (ds <- parse_gctx(gct_file)) # matrix only (ds <- parse_gctx(gct_file, matrix_only=TRUE)) # only the first 10 rows and columns (ds <- parse_gctx(gct_file, rid=1:10, cid=1:10))
Read a GMT file and return a list
parse.gmt(...) parse_gmt(fname)
parse.gmt(...) parse_gmt(fname)
... |
arguments passed on to |
fname |
the file path to be parsed |
parse_gmt
returns a nested list object. The top
level contains one list per row in fname
. Each of
these is itself a list with the following fields:
- head
: the name of the data (row in fname
)
- desc
: description of the corresponding data
- len
: the number of data items
- entry
: a vector of the data items
a list of the contents of fname
. See details.
Visit http://clue.io/help for details on the GMT file format
Other CMap parsing functions:
lxb2mat()
,
parse.gmx()
,
parse.grp()
,
write_gmt()
,
write_grp()
gmt_path <- system.file("extdata", "query_up.gmt", package="cmapR") gmt <- parse_gmt(gmt_path) str(gmt)
gmt_path <- system.file("extdata", "query_up.gmt", package="cmapR") gmt <- parse_gmt(gmt_path) str(gmt)
Read a GMX file and return a list
parse.gmx(...) parse_gmx(fname)
parse.gmx(...) parse_gmx(fname)
... |
arguments passed on to |
fname |
the file path to be parsed |
parse_gmx
returns a nested list object. The top
level contains one list per column in fname
. Each of
these is itself a list with the following fields:
- head
: the name of the data (column in fname
)
- desc
: description of the corresponding data
- len
: the number of data items
- entry
: a vector of the data items
a list of the contents of fname
. See details.
Visit http://clue.io/help for details on the GMX file format
Other CMap parsing functions:
lxb2mat()
,
parse.gmt()
,
parse.grp()
,
write_gmt()
,
write_grp()
gmx_path <- system.file("extdata", "lm_probes.gmx", package="cmapR") gmx <- parse_gmx(gmx_path) str(gmx)
gmx_path <- system.file("extdata", "lm_probes.gmx", package="cmapR") gmx <- parse_gmx(gmx_path) str(gmx)
Read a GRP file and return a vector of its contents
parse.grp(...) parse_grp(fname)
parse.grp(...) parse_grp(fname)
... |
arguments passed on to |
fname |
the file path to be parsed |
a vector of the contents of fname
Visit http://clue.io/help for details on the GRP file format
Other CMap parsing functions:
lxb2mat()
,
parse.gmt()
,
parse.gmx()
,
write_gmt()
,
write_grp()
grp_path <- system.file("extdata", "lm_epsilon_n978.grp", package="cmapR") values <- parse_grp(grp_path) str(values)
grp_path <- system.file("extdata", "lm_epsilon_n978.grp", package="cmapR") values <- parse_grp(grp_path) str(values)
Convert a GCT object's matrix to ranks
rank.gct(...) rank_gct(g, dim = "col", decreasing = TRUE) ## S4 method for signature 'GCT' rank_gct(g, dim = "col", decreasing = TRUE)
rank.gct(...) rank_gct(g, dim = "col", decreasing = TRUE) ## S4 method for signature 'GCT' rank_gct(g, dim = "col", decreasing = TRUE)
... |
arguments passed on to |
g |
the |
dim |
the dimension along which to rank (row or column) |
decreasing |
boolean indicating whether higher values should get lower ranks |
a modified version of g
, with the
values in the matrix converted to ranks
Other GCT utilities:
annotate.gct()
,
melt.gct()
,
merge.gct()
,
subset.gct()
(ranked <- rank_gct(ds, dim="column")) # scatter rank vs. score for a few columns m <- mat(ds) m_ranked <- mat(ranked) plot(m[, 1:3], m_ranked[, 1:3], xlab="score", ylab="rank")
(ranked <- rank_gct(ds, dim="column")) # scatter rank vs. score for a few columns m <- mat(ds) m_ranked <- mat(ranked) plot(m[, 1:3], m_ranked[, 1:3], xlab="score", ylab="rank")
Read GCTX row or column ids
read.gctx.ids(...) read_gctx_ids(gctx_path, dim = "row")
read.gctx.ids(...) read_gctx_ids(gctx_path, dim = "row")
... |
arguments passed on to |
gctx_path |
path to the GCTX file |
dim |
which ids to read (row or column) |
a character vector of row or column ids from the provided file
Other GCTX parsing functions:
GCT
,
append.dim()
,
fix.datatypes()
,
parse.gctx()
,
process_ids()
,
read.gctx.meta()
,
write.gctx.meta()
,
write.gctx()
,
write.gct()
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # row ids rid <- read_gctx_ids(gct_file) head(rid) # column ids cid <- read_gctx_ids(gct_file, dim="column") head(cid)
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # row ids rid <- read_gctx_ids(gct_file) head(rid) # column ids cid <- read_gctx_ids(gct_file, dim="column") head(cid)
Parse row or column metadata from GCTX files
read.gctx.meta(...) read_gctx_meta(gctx_path, dim = "row", ids = NULL)
read.gctx.meta(...) read_gctx_meta(gctx_path, dim = "row", ids = NULL)
... |
arguments passed on to |
gctx_path |
the path to the GCTX file |
dim |
which metadata to read (row or column) |
ids |
a character vector of a subset of row/column ids for which to read the metadata |
a data.frame
of metadata
Other GCTX parsing functions:
GCT
,
append.dim()
,
fix.datatypes()
,
parse.gctx()
,
process_ids()
,
read.gctx.ids()
,
write.gctx.meta()
,
write.gctx()
,
write.gct()
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # row meta row_meta <- read_gctx_meta(gct_file) str(row_meta) # column meta col_meta <- read_gctx_meta(gct_file, dim="column") str(col_meta) # now for only the first 10 ids col_meta_first10 <- read_gctx_meta(gct_file, dim="column", ids=col_meta$id[1:10]) str(col_meta_first10)
gct_file <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR") # row meta row_meta <- read_gctx_meta(gct_file) str(row_meta) # column meta col_meta <- read_gctx_meta(gct_file, dim="column") str(col_meta) # now for only the first 10 ids col_meta_first10 <- read_gctx_meta(gct_file, dim="column", ids=col_meta$id[1:10]) str(col_meta_first10)
robust zscore implementation takes in a 1D vector, returns 1D vector after computing robust zscores rZ = (x-med(x))/mad(x)
robust_zscore(x, min_mad = 1e-06, ...)
robust_zscore(x, min_mad = 1e-06, ...)
x |
numeric vector to z-score |
min_mad |
the minimum allowed MAD, useful for avoiding division by very small numbers |
... |
further options to median, max functions |
transformed version of x
(x <- rnorm(25)) (robust_zscore(x)) # with min_mad (robust_zscore(x, min_mad=1e-4))
(x <- rnorm(25)) (robust_zscore(x)) # with min_mad (robust_zscore(x, min_mad=1e-4))
Subset a gct object using the provided row and column ids
## S3 method for class 'gct' subset(...) subset_gct(g, rid = NULL, cid = NULL) ## S4 method for signature 'GCT' subset_gct(g, rid = NULL, cid = NULL)
## S3 method for class 'gct' subset(...) subset_gct(g, rid = NULL, cid = NULL) ## S4 method for signature 'GCT' subset_gct(g, rid = NULL, cid = NULL)
... |
arguments passed on to |
g |
a gct object |
rid |
a vector of character ids or integer indices for ROWS |
cid |
a vector of character ids or integer indices for COLUMNS |
a GCT object
Other GCT utilities:
annotate.gct()
,
melt.gct()
,
merge.gct()
,
rank.gct()
# first 10 rows and columns by index (a <- subset_gct(ds, rid=1:10, cid=1:10)) # first 10 rows and columns using character ids # use \code{ids} to extract the ids rid <- ids(ds) cid <- ids(ds, dimension="col") (b <- subset_gct(ds, rid=rid[1:10], cid=cid[1:10])) identical(a, b) # TRUE
# first 10 rows and columns by index (a <- subset_gct(ds, rid=1:10, cid=1:10)) # first 10 rows and columns using character ids # use \code{ids} to extract the ids rid <- ids(ds) cid <- ids(ds, dimension="col") (b <- subset_gct(ds, rid=rid[1:10], cid=cid[1:10])) identical(a, b) # TRUE
Threshold a numeric vector
threshold(x, minval, maxval)
threshold(x, minval, maxval)
x |
the vector |
minval |
minium allowed value |
maxval |
maximum allowed value |
a thresholded version of x
x <- rnorm(20) threshold(x, -0.1, -0.1)
x <- rnorm(20) threshold(x, -0.1, -0.1)
Transpose a GCT object
transpose.gct(...) transpose_gct(g) ## S4 method for signature 'GCT' transpose_gct(g)
transpose.gct(...) transpose_gct(g) ## S4 method for signature 'GCT' transpose_gct(g)
... |
arguments passed on to |
g |
the |
a modified verion of the input GCT
object
where the matrix has been transposed and the row and column
ids and annotations have been swapped.
transpose_gct(ds)
transpose_gct(ds)
Update the matrix of an existing GCTX file
## S3 method for class 'gctx' update(...) update_gctx(x, ofile, rid = NULL, cid = NULL)
## S3 method for class 'gctx' update(...) update_gctx(x, ofile, rid = NULL, cid = NULL)
... |
arguments passed on to |
x |
an array of data |
ofile |
the filename of the GCTX to update |
rid |
integer indices or character ids of the rows to update |
cid |
integer indices or character ids of the columns to update |
Overwrite the rows and columns of ofile
as indicated by rid
and cid
respectively.
rid
and cid
can either be integer indices
or character ids corresponding to the row and column ids
in ofile
.
silently returns NULL
## Not run: m <- matrix(rnorm(20), nrow=10) # update by integer indices update_gctx(m, ofile="my.gctx", rid=1:10, cid=1:2) # update by character ids row_ids <- letters[1:10] col_ids <- LETTERS[1:2] update_gctx(m, ofile="my.gctx", rid=row_ids, cid=col_ids) ## End(Not run)
## Not run: m <- matrix(rnorm(20), nrow=10) # update by integer indices update_gctx(m, ofile="my.gctx", rid=1:10, cid=1:2) # update by character ids row_ids <- letters[1:10] col_ids <- LETTERS[1:2] update_gctx(m, ofile="my.gctx", rid=row_ids, cid=col_ids) ## End(Not run)
Write a nested list to a GMT file
write_gmt(lst, fname)
write_gmt(lst, fname)
lst |
the nested list to write. See |
fname |
the desired file name |
lst
needs to be a nested list where each
sub-list is itself a list with the following fields:
- head
: the name of the data
- desc
: description of the corresponding data
- len
: the number of data items
- entry
: a vector of the data items
silently returns NULL
Visit http://clue.io/help for details on the GMT file format
Other CMap parsing functions:
lxb2mat()
,
parse.gmt()
,
parse.gmx()
,
parse.grp()
,
write_grp()
## Not run: write_gmt(gene_set, "gene_set.gmt") ## End(Not run)
## Not run: write_gmt(gene_set, "gene_set.gmt") ## End(Not run)
Write a vector to a GRP file
write_grp(vals, fname)
write_grp(vals, fname)
vals |
the vector of values to be written |
fname |
the desired file name |
silently returns NULL
Visit http://clue.io/help for details on the GRP file format
Other CMap parsing functions:
lxb2mat()
,
parse.gmt()
,
parse.gmx()
,
parse.grp()
,
write_gmt()
## Not run: write_grp(letters, "letter.grp") ## End(Not run)
## Not run: write_grp(letters, "letter.grp") ## End(Not run)
Write a GCT object to disk in GCT format
write.gct(...) write_gct(ds, ofile, precision = 4, appenddim = TRUE, ver = 3)
write.gct(...) write_gct(ds, ofile, precision = 4, appenddim = TRUE, ver = 3)
... |
arguments passed on to |
ds |
the GCT object |
ofile |
the desired output filename |
precision |
the numeric precision at which to
save the matrix. See |
appenddim |
boolean indicating whether to append matrix dimensions to filename |
ver |
the GCT version to write. See |
Since GCT is text format, the higher precision
you choose, the larger the file size.
ver
is assumed to be 3, aka GCT version 1.3, which supports
embedded row and column metadata in the GCT file. Any other value
passed to ver
will result in a GCT version 1.2 file which
contains only the matrix data and no annotations.
silently returns NULL
Other GCTX parsing functions:
GCT
,
append.dim()
,
fix.datatypes()
,
parse.gctx()
,
process_ids()
,
read.gctx.ids()
,
read.gctx.meta()
,
write.gctx.meta()
,
write.gctx()
# note this will create a GCT file in your current directory write_gct(ds, "dataset", precision=2)
# note this will create a GCT file in your current directory write_gct(ds, "dataset", precision=2)
Write a GCT object to disk in GCTX format
write.gctx(...) write_gctx( ds, ofile, appenddim = TRUE, compression_level = 0, matrix_only = FALSE, max_chunk_kb = 1024 )
write.gctx(...) write_gctx( ds, ofile, appenddim = TRUE, compression_level = 0, matrix_only = FALSE, max_chunk_kb = 1024 )
... |
arguments passed on to |
ds |
a GCT object |
ofile |
the desired file path for writing |
appenddim |
boolean indicating whether the resulting filename will have dimensions appended (e.g. my_file_n384x978.gctx) |
compression_level |
integer between 1-9 indicating how much to compress data before writing. Higher values result in smaller files but slower read times. |
matrix_only |
boolean indicating whether to write only the matrix data (and skip row, column annotations) |
max_chunk_kb |
for chunking, the maximum number of KB a given chunk will occupy |
silently returns NULL
Other GCTX parsing functions:
GCT
,
append.dim()
,
fix.datatypes()
,
parse.gctx()
,
process_ids()
,
read.gctx.ids()
,
read.gctx.meta()
,
write.gctx.meta()
,
write.gct()
# note this will create a GCT file in your current directory write_gctx(ds, "dataset")
# note this will create a GCT file in your current directory write_gctx(ds, "dataset")
data.frame
to a tab-delimited text fileWrite a data.frame
to a tab-delimited text file
write.tbl(...) write_tbl(tbl, ofile, ...)
write.tbl(...) write_tbl(tbl, ofile, ...)
... |
additional arguments passed on to |
tbl |
the |
ofile |
the desired file name |
This method simply calls write.table
with some
preset arguments that generate a unquoated, tab-delimited file
without row names.
silently returns NULL
## Not run: write_tbl(cdesc_char, "col_meta.txt") ## End(Not run)
## Not run: write_tbl(cdesc_char, "col_meta.txt") ## End(Not run)