Title: | Tools for HELP data analysis |
---|---|
Description: | The package contains a modular pipeline for analysis of HELP microarray data, and includes graphical and mathematical tools with more general applications. |
Authors: | Reid F. Thompson <[email protected]>, John M. Greally <[email protected]>, with contributions from Mark Reimers <[email protected]> |
Maintainer: | Reid F. Thompson <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.65.0 |
Built: | 2024-10-30 07:24:57 UTC |
Source: | https://github.com/bioc/HELP |
Unified thermodynamic parameters (delta-S and delta-H values) for nearest-neighbor base stacking calculations
data(base.stacking.thermodynamics)
A matrix with 2 columns and 16 rows. Column 1 holds the enthalpic parameters (dH) and column 2 the entropic parameters (dS). Rows correspond to all 16 possible 2-bp combinations of "A", "T", "C", and "G" (e.g. "AC").
Allawi, H.T. and SantaLucia, J., Jr. (1997) Thermodynamics and NMR of internal G.T mismatches in DNA, Biochemistry, 36, 10581-10594.
data(base.stacking.thermodynamics)
Function to calculate GC percent from a nucleotide sequence input
calcGC(x, ...)
x |
characters containing nucleotide sequence (ex: "ATCGGAA") or an object of class |
... |
Other arguments passed to methods:
|
Returns a numerical value (from 0 to 1) indicating the C+G content of the sequence, i.e. the fraction (C+G)/(A+T+C+G+...). A value of NA is returned if the function encounters an error that prevents proper calculation of GC content.
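For illustration, the ratio above can be computed in a few lines of base R. This is a minimal sketch, not calcGC's implementation: `gc_fraction` is a hypothetical name, and it counts every character in the denominator, whereas calcGC's handling of non-ACGT characters (for example through its `allow` argument) may differ.

```r
# Hypothetical helper, not part of HELP: fraction of C+G characters in one sequence.
gc_fraction <- function(seq) {
  if (is.na(seq) || nchar(seq) == 0) return(NA_real_)  # nothing to measure
  chars <- strsplit(toupper(seq), "")[[1]]             # case-insensitive, one char each
  sum(chars %in% c("C", "G")) / length(chars)          # (C+G) / all characters
}

gc_fraction("AAAACGCG")  # 4 of the 8 characters are C or G
```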
Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
calcGC("AAAACGCG")
calcGC(sequence="cXgXcXgXcXgX",allow="X")
Methods for calculating GC percent from oligonucleotide sequences
Handle empty function call
Handle empty function call
Handle character input
Handle input of an object of class ExpressionSet
Calculates prototype (trimmed mean) across all samples
calcPrototype(x, ...)
x |
a numeric matrix, where each column represents a different sample |
... |
Arguments to be passed to methods (see
|
Returns a vector of numerical data, representing the prototype ([trimmed] mean) of all samples in x.
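As a rough illustration of a "prototype" as a per-feature trimmed mean across samples, base R's `mean(..., trim =)` can be applied row by row. This is a sketch under stated assumptions, not calcPrototype's code: `prototype_sketch` is a hypothetical name, the trim fraction of 0.25 is an arbitrary choice, and calcPrototype may differ in details such as its default trim or any per-sample adjustment.

```r
# Hypothetical sketch, not HELP's calcPrototype: one trimmed mean per row
# (feature) of a matrix whose columns are samples. trim = 0.25 is an assumption.
prototype_sketch <- function(x, trim = 0.25) {
  x <- as.matrix(x)
  apply(x, 1, mean, trim = trim, na.rm = TRUE)
}

m <- cbind(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(100, 5, 6))
prototype_sketch(m)  # the outlier 100 in row 1 is trimmed away
```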
Mark Reimers ([email protected]), Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
x <- matrix(data=rep(1:1000,10),nrow=1000,ncol=10)
x <- x*(sample(1:100/100,size=10000,replace=TRUE))
x <- t(t(x)-1000*(1:10))
x[c(1:10,991:1000),]
x.avg <- calcPrototype(x)
x.avg[c(1:10,991:1000)]
#rm(x,x.avg)
Methods for calculating prototype ([trimmed] mean) across all samples
Handle empty function call
Handle input of an object of class ExpressionSet. Derive data from AssayData.
Handle vector input as a matrix
Handle matrix input
Mark Reimers ([email protected]), Reid F. Thompson ([email protected])
Calculate melting temperature (Tm) using the nearest-neighbor base-stacking algorithm and the unified thermodynamic parameters.
calcTm(x, ...)
x |
characters containing nucleotide sequences (ex: "ATCGGAA") or an object of class |
... |
Additional arguments passed to methods:
|
Returns a numerical value indicating the predicted melting temperature (Tm) of the sequence in degrees Celsius. A value of NA is returned if the function encounters an error that prevents proper Tm calculation.
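The nearest-neighbor idea can be sketched as: split the sequence into overlapping dinucleotides, sum their dH and dS values, and apply the two-state expression Tm = ΔH / (ΔS + R ln(C_T/4)) - 273.15. This is a minimal sketch, not calcTm's implementation: it omits initiation, symmetry, and salt corrections and assumes equal strand concentrations, while calcTm accepts separate `strand1.concentration` and `strand2.concentration` values. The `nn_tm` name, the column names `dH`/`dS`, and the units are assumptions, and the table values below are placeholders, not the published Allawi and SantaLucia parameters.

```r
# Hypothetical sketch, not calcTm's implementation: two-state nearest-neighbor Tm
# without initiation, symmetry, or salt corrections.
# nn: 16-row matrix keyed by dinucleotide (e.g. "AC") with columns "dH"
# (assumed kcal/mol) and "dS" (assumed cal/(mol*K)); names and units are assumptions.
nn_tm <- function(seq, nn, ct = 1e-6, gas_r = 1.987) {
  s <- toupper(seq)
  n <- nchar(s)
  if (n < 2) return(NA_real_)
  pairs <- substring(s, 1:(n - 1), 2:n)     # overlapping dinucleotides
  dH <- sum(nn[pairs, "dH"]) * 1000         # kcal/mol -> cal/mol
  dS <- sum(nn[pairs, "dS"])
  # equal strand concentrations, non-self-complementary duplex: C_T / 4
  dH / (dS + gas_r * log(ct / 4)) - 273.15  # Kelvin -> degrees Celsius
}

# Placeholder table with identical made-up values for every dinucleotide.
bases <- c("A", "C", "G", "T")
toy <- cbind(dH = rep(-8, 16), dS = rep(-22, 16))
rownames(toy) <- as.vector(outer(bases, bases, paste0))

nn_tm("GTGTGGCTACAGGTGGGCCG", toy)
```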
Reid F. Thompson ([email protected])
Allawi, H.T. and SantaLucia, J., Jr. (1997) Thermodynamics and NMR of internal G.T mismatches in DNA, Biochemistry, 36, 10581-10594.
calcTm-methods, base.stacking.thermodynamics, calcGC
#demo(pipeline,package="HELP")
calcTm("GTGTGGCTACAGGTGGGCCGTGGCGCACCTAAGTGAGGACAGAGAACAAC")
calcTm("GTGTGGCTACAGGTGGGCCGTGGCGCACCTAAGTGAGGACAGAGAACAAC",strand1.concentration=1E-5,strand2.concentration=2E-8)
Methods for calculating melting temperature (Tm) of nucleotide sequences
Handle empty function call
Handle empty function call
Handle character input
Handle input of an object of class ExpressionSet
Calculate trimmed and/or weighted means of groups of rows in a given data matrix.
combineData(x, y, w, ...)
x |
a numeric matrix containing the values whose trimmed and/or weighted mean is to be computed. Each column is treated independently. |
y |
a vector describing the discrete groups used to divide the elements of x. If y is missing then all elements of x are handled together. |
w |
a matrix of weights with the same dimensions as x, giving the weight to use for each element of x. If w is missing then all elements of x are given the same weight. |
... |
Arguments to be passed to methods (see
|
Returns a matrix of combined numerical data, where each row represents the summary of a group of elements from the corresponding column in x.
Each column of the data matrix is treated separately.
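The grouping-and-weighting behavior described above can be approximated in base R with `weighted.mean`. This is a minimal sketch, not combineData's implementation: `combine_sketch` is a hypothetical name, and it does not implement the `trim` option shown in the example below.

```r
# Hypothetical sketch, not HELP's combineData: per-column weighted means of
# groups of rows. Trimming is not implemented here.
combine_sketch <- function(x, y = rep(1, NROW(x)), w = NULL) {
  x <- as.matrix(x)
  if (is.null(w)) w <- matrix(1, nrow(x), ncol(x))  # equal weights by default
  w <- as.matrix(w)
  stopifnot(all(dim(w) == dim(x)), length(y) == nrow(x))
  groups <- unique(y)
  out <- sapply(seq_len(ncol(x)), function(j)        # each column independently
    sapply(groups, function(g) {
      idx <- y == g
      weighted.mean(x[idx, j], w[idx, j])
    }))
  out <- matrix(out, nrow = length(groups))          # groups x columns
  dimnames(out) <- list(as.character(groups), colnames(x))
  out
}

x <- cbind(a = 1:6, b = 2 * (1:6))
y <- c("g1", "g1", "g1", "g2", "g2", "g2")
combine_sketch(x, y)
combine_sketch(1:4, w = 1:4)  # single group: weighted mean of 1:4
```

Returning a groups-by-columns matrix mirrors the documented return value, where each row summarizes one group.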
Reid F. Thompson ([email protected])
combineData-methods, mean, weighted.mean
#demo(pipeline,package="HELP")
x <- 1:100
combineData(x,w=x/100)
weighted.mean(x,w=x/100)
y <- sample(c("a","b","c",1:3),size=100,replace=TRUE)
combineData(cbind(x,x,2*x),y,trim=0.5)
#rm(x,y)
Methods for calculating trimmed and/or weighted means of groups of rows in a given data matrix.
Handle empty function call
Handle partially empty function call. Reinterpret with default parameters instead of missing values.
Handle partially empty function call. Reinterpret with default parameters instead of missing values.
Handle partially empty function call. Reinterpret with default parameters instead of missing values.
Handle input of three vectors specifying data, grouping, and weighting information, respectively. Note that the data and weighting inputs are handled as matrices.
Handle partially empty function call. Reinterpret with default parameters instead of missing values.
Handle input of one matrix, one vector, and one matrix specifying data, grouping, and weighting information, respectively.
Handle input of an object of class ExpressionSet. Derive grouping and weighting data from featureData and AssayDataElement, respectively.
Handle input of an object of class ExpressionSet and a vector specifying grouping information. Derive weighting data from AssayDataElement.
Reid F. Thompson ([email protected])
Create and write a wiggle track (UCSC Genome Browser format) to a flat file
createWiggle(x, y, ...)
x |
matrix of numerical data, where each column represents data for an individual wiggle track. |
y |
an additional matrix with columns corresponding to chr, start, and end, respectively. |
... |
Arguments to be passed to methods (see
|
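The UCSC custom-track formats include wiggle (variableStep/fixedStep) and bedGraph, and the text above does not say which layout createWiggle writes. The sketch below therefore uses bedGraph, one track per data column, purely as an illustration of turning a value matrix plus a chr/start/end matrix into track text. `bedgraph_lines` is a hypothetical name, not a HELP function. Note that bedGraph uses zero-based, half-open coordinates, so start/end values may need adjusting for a given convention.

```r
# Hypothetical sketch, not createWiggle's implementation: build UCSC bedGraph
# text (one "track" header per data column) from a value matrix and a
# chr/start/end coordinate matrix.
bedgraph_lines <- function(values, coords) {
  values <- as.matrix(values)
  stopifnot(nrow(values) == nrow(coords), ncol(coords) >= 3)
  track_names <- colnames(values)
  if (is.null(track_names)) track_names <- paste0("track", seq_len(ncol(values)))
  unlist(lapply(seq_len(ncol(values)), function(j) {
    c(sprintf('track type=bedGraph name="%s"', track_names[j]),
      paste(coords[, 1], coords[, 2], coords[, 3], values[, j], sep = "\t"))
  }))
}

coords <- cbind(chr = c("chr1", "chr1"), start = c(200, 400), end = c(399, 599))
vals <- cbind(data = c(1.5, -0.25), abs = c(1.5, 0.25))
out <- bedgraph_lines(vals, coords)
writeLines(out)  # or writeLines(out, "tracks.bedGraph") to write a flat file
```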
Reid F. Thompson ([email protected])
UCSC Genome Browser, http://genome.ucsc.edu/goldenPath/help/customTrack.html; Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. The Human Genome Browser at UCSC. Genome Res. 12(6), 996-1006 (2002).
#demo(pipeline,package="HELP")
chr <- rep("chr1", 500)
start <- (1:500)*200
end <- start+199
data <- sample(5*(1:10000/10000)-2, size=500)
data <- cbind(data, abs(data), -1*data)
colnames(data) <- c("data", "abs", "opposite")
createWiggle(data, cbind(chr, start, end))
#rm(chr, start, end, data)
Methods for creating wiggle tracks
Handle empty function call
Handle input of an object of class ExpressionSet. Derive features from FeatureData.
Handle input of an object of class ExpressionSet. Derive features from matrix input.
Handle vector input
Handle matrix input
Reid F. Thompson ([email protected])
Access (and/or assign) data for signal channel 2 in a given ExpressionSet object
exprs2(object)
exprs2(object) <- value
object |
Object of class |
value |
Matrix with rows representing features and columns representing samples |
exprs2 returns a (usually large!) matrix of values.
Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
Methods for accessing (and/or assigning) data for signal channel 2 in a given ExpressionSet object
Handle empty function call
Handle input of an object of class ExpressionSet
Handle empty function call
Handle input of an object of class ExpressionSet
Match and reinterpret a vector in terms of a second vector, essentially using the second vector as a key to interpret the first.
fuzzyMatches(x, y, ...)
x |
vector, the values to be matched. |
y |
vector, the values to be matched against. |
... |
Arguments to be passed to methods (see
|
This function employs multiple stages of matching between two vectors. First, the values in x are matched against y to find any exact matches. Next, numerical values in x are used to retrieve the corresponding positions in y. All unmatched values may be retained or dropped (depending on the value of keep), and a list of unique values is returned. Note that a value of match.all in x will be interpreted as a match for ALL values in y, and therefore replaced with the contents of y.
Returns a vector of the unique values in x that match values in y, according to the parameters described above.
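The staged matching described above (exact match, then numeric position, then keep or drop, with a wildcard expanding to all of y) can be sketched in base R as follows. This is a hypothetical illustration, not fuzzyMatches itself: `fuzzy_sketch` is an invented name, it ignores the `strict` and `alias` arguments used in the example below, and the `"*"` default for `match.all` is an assumption.

```r
# Hypothetical sketch of the staged matching described above; not HELP's
# fuzzyMatches. It ignores strict and alias, and the "*" default for
# match.all is an assumption.
fuzzy_sketch <- function(x, y, keep = TRUE, match.all = "*") {
  x <- as.character(x)
  out <- character(0)
  for (v in x) {
    idx <- suppressWarnings(as.integer(v))        # NA unless v looks numeric
    if (identical(v, match.all)) {
      out <- c(out, y)                            # wildcard: every value of y
    } else if (v %in% y) {
      out <- c(out, v)                            # stage 1: exact match
    } else if (!is.na(idx) && idx >= 1 && idx <= length(y)) {
      out <- c(out, y[idx])                       # stage 2: position in y
    } else if (keep) {
      out <- c(out, v)                            # unmatched, but retained
    }
  }
  unique(out)
}

a <- c(1, "four", "missing")
b <- c("one", "two", "three", "four")
fuzzy_sketch(a, b)
fuzzy_sketch(a, b, keep = FALSE)
```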
Reid F. Thompson ([email protected])
a <- c(1, "four", "missing")
b <- c("one", "two", "three", "four")
fuzzyMatches(a, b)
fuzzyMatches(a, b, strict=FALSE)
fuzzyMatches(a, b, strict=FALSE, alias=FALSE)
fuzzyMatches(a, b, strict=FALSE, keep=FALSE)
Methods for matching and reinterpreting a vector in terms of a second vector, essentially using the second vector as a key to interpret the first.
Handle empty function call
Handle empty function call
Handle empty function call
Handle input of two vectors.
Handle empty function call
Reid F. Thompson ([email protected])
Fetch a subset of features from a given data structure
getFeatures(x, y, ...)
x |
the matrix of feature data to subset. If |
y |
which feature(s) to use. Can be a vector of characters matching feature names, integers indicating which features to choose, or a mixture of the two. If not supplied (or if equivalent to "*"), all features will be used. |
... |
other arguments passed are not handled at this time. |
Returns a matrix of values corresponding to a subset of features from the data structure supplied, where columns correspond to features. The function halts if there are no features to return.
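A selector that "can be a vector of characters matching feature names, integers indicating which features to choose, or a mixture of the two" can be resolved with `match` plus an integer fallback. The sketch below shows one way to do that on a plain matrix. `select_columns` is a hypothetical name, not HELP's getFeatures, and its name-first precedence is an assumption.

```r
# Hypothetical sketch, not HELP's getFeatures: select matrix columns with a
# selector that may mix column names and integer positions.
select_columns <- function(m, sel = "*") {
  if (identical(sel, "*")) return(m)                   # "*" means everything
  sel <- as.character(sel)
  by_name <- match(sel, colnames(m))                   # try names first
  by_pos <- suppressWarnings(as.integer(sel))          # then positions
  cols <- ifelse(is.na(by_name), by_pos, by_name)
  cols <- unique(cols[!is.na(cols) & cols >= 1 & cols <= ncol(m)])
  if (length(cols) == 0) stop("no columns to return")  # mirror the documented halt
  m[, cols, drop = FALSE]
}

m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("x", "y", "z")))
select_columns(m, c("z", 1))
```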
Reid F. Thompson ([email protected])
data(sample.ExpressionSet)
df <- data.frame(x=1:500,y=501:1000, row.names=featureNames(sample.ExpressionSet))
featureData(sample.ExpressionSet) <- new("AnnotatedDataFrame", data=df, dimLabels=c("featureNames", ""))
getFeatures(sample.ExpressionSet, "y")[1:10]
Methods for fetching a subset of features from a given data structure
Handle empty function call
Handle input of an object of class ExpressionSet. Select all feature data.
Handle input of an object of class ExpressionSet. Select all feature data.
Handle input of an object of class ExpressionSet. Select a subset of features.
Handle input of an AnnotatedDataFrame object. Select all feature data.
Handle input of an AnnotatedDataFrame object. Select all feature data.
Handle input of an AnnotatedDataFrame object. Select a subset of features.
Handle input of a vector (interpreted as a matrix). Select all feature data.
Handle input of a vector (interpreted as a matrix). Select all feature data.
Handle input of two vectors specifying feature data and feature subset information, respectively.
Handle input of a matrix and a vector specifying feature data and feature subset information, respectively.
Reid F. Thompson ([email protected])
Fetch a subset of samples from a given data structure
getSamples(x, y, ...)
x |
the matrix of sample data to subset. If |
y |
which sample(s) to use as data. Can be a vector of characters matching sample names, integers indicating which samples to choose, or a mixture of the two. If not supplied, all samples will be used. |
... |
Arguments to be passed to methods (see
|
Returns a matrix of values corresponding to a subset of samples from the data supplied, where columns correspond to samples. The function halts if there are no samples to return.
Reid F. Thompson ([email protected])
data(sample.ExpressionSet)
se.ABC <- getSamples(sample.ExpressionSet, c("A", "B", "C"), element="se.exprs")
se.ABC[1:10,]
Methods for fetching subsets of samples from various data structures
Handle empty function call
Handle input of an object of class ExpressionSet. Select data for all samples.
Handle input of an object of class ExpressionSet. Select data for all samples.
Handle input of an object of class ExpressionSet. Select data for a subset of samples.
Handle input of a vector (interpreted as a matrix). Select data for all samples.
Handle input of a vector (interpreted as a matrix). Select data for all samples.
Handle input of two vectors specifying data and sample subset information, respectively.
Handle input of a matrix. Select data for all samples.
Handle input of a matrix. Select data for all samples.
Handle input of a matrix and a vector specifying data and sample subset information, respectively.
Reid F. Thompson ([email protected])
Plot densities of multiple bins of data, divided by a sliding window approach
plotBins(x, y, ...)
x |
the vector of numerical data to be plotted. If |
y |
an additional vector of numerical data to be used for binning. If |
... |
Arguments to be passed to methods (see
|
Reid F. Thompson ([email protected])
plotBins-methods, density, quantile
#demo(pipeline,package="HELP")
x <- 1:1000
y <- sample(1:50,size=1000,replace=TRUE)
plotBins(x,y,show.avg=TRUE,main="Random binning data",xlab="1:1000")
#rm(x,y)
Methods for plotting densities of multiple bins of data, divided by a sliding window approach
Handle empty function call
Handle matrix input: if the matrix has two columns, reinterpret the function call as a two-vector input; otherwise handle as an empty function call.
Handle empty function call
Handle input of an object of class ExpressionSet. Derive binning information from this class but use data from a vector input.
Handle input of two vectors specifying data and binning information, respectively.
Handle matrix input, reinterpret function call with vector input
Handle matrix input, reinterpret function call with vector input
Handle input of an object of class ExpressionSet. Derive both data and binning information from a single object.
Handle input of an object of class ExpressionSet. Derive data from this class but use binning information from a vector input.
Handle input of two objects of class ExpressionSet. Derive data and binning information from each one, respectively.
Reid F. Thompson ([email protected])
Graphic display of spatially-linked data, particularly applicable for microarrays
plotChip(x, y, z, ...)
x |
vector of numerical data determining x-coordinates of data on chip. |
y |
vector of numerical data determining y-coordinates of data on chip. |
z |
the vector of numerical data to be plotted |
... |
Arguments to be passed to methods (see
|
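The core idea of a spatial chip plot, placing each value at its (x, y) spot and coloring by value, can be sketched with base R matrix indexing and `image()`. This is a hypothetical illustration, not plotChip's implementation: `chip_matrix` is an invented name, and it assumes positive integer coordinates.

```r
# Hypothetical sketch, not plotChip's implementation: place each value z at its
# (x, y) spot in a grid and draw it with base graphics. Assumes positive
# integer coordinates.
chip_matrix <- function(x, y, z) {
  m <- matrix(NA_real_, nrow = max(x), ncol = max(y))
  m[cbind(x, y)] <- z           # matrix indexing: one (row, col) pair per value
  m
}

x <- rep(1:100, 100)
y <- rep(1:100, each = 100)
z <- x * (1001:11000 / 1000)
m <- chip_matrix(x, y, z)
image(m, col = gray(0:50 / 50), main = "Curved gradient")
```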
Reid F. Thompson ([email protected]), Mark Reimers ([email protected])
#demo(pipeline,package="HELP")
x <- rep(1:100,100)
y <- rep(1:100,each=100)
z <- x*(1001:11000/1000)
z <- z-mean(z)
z <- z*(sample(1:10000/10000)+1)
plotChip(x,y,z,main="Curved gradient",xlab="x",ylab="y")
plotChip(x,y,sample(1:10000,size=10000),colors=gray(0:50/50),range=c(1,10000),main="Random noise")
#rm(x,y,z)
Methods for graphic display of spatially-linked data, particularly applicable for microarrays
Handle empty function call
Handle matrix input, extract information, and reinterpret function call with appropriate vectors
Handle input of an object of class ExpressionSet. Derive both data and position information from a single object.
Handle input of an object of class ExpressionSet. Derive position information from this object, but the corresponding data from vector input.
Handle input of two objects of class ExpressionSet. Derive position information and data from each one, respectively.
Handle input of an object of class ExpressionSet. Derive data from this object, but the corresponding position information from vector input.
Handle input of three vectors. Derive X and Y positions and data from each one, respectively.
Reid F. Thompson ([email protected]), Mark Reimers ([email protected])
Graphical display of featureData (ex: fragment size) versus two-color signal intensity data
plotFeature(x, y, ...)
x |
matrix of numerical data to be plotted, with two columns (one for each signal channel). |
y |
an additional vector of numerical data to be used for feature. If |
... |
Arguments to be passed to methods (see
|
Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
msp1 <- sample(8000:16000/1000, size=1000)
msp1 <- msp1[order(msp1)]
hpa2 <- sample(8000:16000/1000, size=1000)
hpa2 <- hpa2[order(hpa2)]
size <- sample((1:1000)*1.8+200, size=1000)
rand <- which.min(abs(msp1-quantile(msp1, 0.25)))
plotFeature(cbind(msp1, hpa2), size, which.random=(rand-20):(rand+20), main="Random")
#rm(msp1, hpa2, size, rand)
Methods for plotting featureData (ex: fragment size) versus two-color signal intensity data
Handle empty function call
Handle input of an object of class ExpressionSet. Derive both data and feature information from a single object.
Handle input of an object of class ExpressionSet. Derive data from this class but use feature values from a vector input.
Handle matrix input, where each of the two matrix columns represents data from one signal channel. Feature data are taken from a vector input.
Pairwise comparison of samples producing a matrix of scatterplots and a corresponding dendrogram
plotPairs(x, ...)
x |
a numeric matrix, where each column represents a different sample. |
... |
Arguments to be passed to methods (see
|
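The dendrogram half of a pairwise sample comparison can be sketched with the stats functions listed below (`dist`-style distances, `hclust`, `cutree`) plus `pairs` for the scatterplot matrix. This is a hypothetical illustration, not plotPairs' code: the 1 - Pearson correlation distance and the complete-linkage default are our choices, and plotPairs may use a different metric.

```r
# Hypothetical sketch, not plotPairs' implementation: cluster samples (columns)
# using 1 - Pearson correlation as the distance. The metric and the
# complete-linkage default are our choices, not necessarily HELP's.
set.seed(1)
a <- rnorm(200)
x <- cbind(a = a, shifted = a + 5, scaled = 2 * a + rnorm(200, sd = 0.1),
           noise = rnorm(200))
d <- as.dist(1 - cor(x))          # pairwise sample distances
h <- hclust(d)                    # hierarchical clustering (complete linkage)
groups <- cutree(h, k = 2)        # split the dendrogram into two clusters
plot(as.dendrogram(h))
pairs(x, pch = ".")               # matrix of scatterplots
groups
```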
Reid F. Thompson ([email protected])
plotPairs-methods, dist, hclust, dendrogram, cutree, pairs
#demo(pipeline,package="HELP")
x <- sample(1:10000,size=10000)
x <- cbind(x,x+5,x*sample((1000:2000)/1000,size=10000,replace=TRUE),sample(-1*(1:10000),size=10000))
colnames(x) <- c("x","x+5","spread","random")
plotPairs(x)
#rm(x)
Methods for pairwise comparison of samples producing a matrix of scatterplots and a corresponding dendrogram
Handle empty function call
Handle matrix input
Handle input of an object of class ExpressionSet. Derive data from AssayData.
Reid F. Thompson ([email protected])
Apply quantile normalization to multiple bins of data, divided by a sliding window approach
quantileNormalize(x, y, ...)
x |
the vector of numerical data to be normalized. If |
y |
an additional vector of numerical data to be used for binning. If |
... |
Arguments to be passed to methods (see
|
Returns a vector of normalized numerical data according to input parameters.
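The "discrete" flavor of binned quantile normalization can be sketched as: cut the binning variable y into quantile bins, then map each bin's x values onto a common reference distribution at the same within-bin ranks. This is a minimal sketch under stated assumptions, not quantileNormalize's implementation: `qn_bins_sketch` is a hypothetical name, it has no sliding window (`num.steps`), and using the pooled distribution of x as the reference is our choice.

```r
# Hypothetical sketch, not quantileNormalize's implementation ("discrete"
# bins only, no sliding window): bin the data by quantiles of y, then map each
# bin's values onto the pooled distribution of x at the same ranks.
qn_bins_sketch <- function(x, y, num.bins = 10) {
  probs <- seq(0, 1, length.out = num.bins + 1)
  breaks <- unique(quantile(y, probs = probs, names = FALSE))
  bin <- cut(y, breaks = breaks, include.lowest = TRUE)
  out <- rep(NA_real_, length(x))
  for (b in levels(bin)) {
    idx <- which(bin == b)
    if (length(idx) == 0) next
    p <- (rank(x[idx]) - 0.5) / length(idx)   # within-bin plotting positions
    out[idx] <- quantile(x, probs = p, names = FALSE)
  }
  out
}

x <- c(1:50, 101:150)
y <- rep(1:2, each = 50)
xn <- qn_bins_sketch(x, y, num.bins = 2)  # both bins now share one distribution
```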
Reid F. Thompson ([email protected])
quantileNormalize-methods, quantile
#demo(pipeline,package="HELP")
x <- rep(1:100,10)+10*rep(1:10,each=100)
y <- rep(1:20,each=50)
d <- density(quantileNormalize(x,y,num.bins=20,num.steps=1,mode="discrete"))
plot(density(x))
lines(d$x,d$y/3,col="red")
#rm(x,y,d)
Methods for applying quantile normalization to multiple bins of data, divided by a sliding window approach
Handle empty function call
Handle matrix input: if the matrix has two columns, reinterpret the function call as a two-vector input; otherwise handle as an empty function call.
Handle empty function call
Handle input of an object of class ExpressionSet. Derive binning information from this class but use data from a vector input.
Handle input of two vectors specifying data and binning information, respectively.
Handle input of an object of class ExpressionSet. Derive both data and binning information from a single object.
Handle input of an object of class ExpressionSet. Derive data from this class but use binning information from a vector input.
Handle input of two objects of class ExpressionSet. Derive data and binning information from each one, respectively.
Reid F. Thompson ([email protected])
Function to extract array design information from corresponding files in the Nimblegen .ndf and .ngd formats.
readDesign(x, y, z, ...)
x |
path to the Nimblegen design file (.ndf). Each line of the file is interpreted as a single spot on the array design. If it does not contain an absolute path, the file name is relative to the current working directory, |
y |
path to the Nimblegen gene descriptions file (.ngd). Each line of the file is interpreted as a single locus. If it does not contain an absolute path, the file name is relative to the current working directory, |
z |
object in which to store design information from files. Can be an |
... |
Arguments to be passed to methods (see
|
Returns an ExpressionSet filled with featureData containing the following featureColumns:
SEQ_ID |
a vector of characters with container IDs, linking each probe to a parent identifier |
PROBE_ID |
a vector of characters containing unique ID information for each probe |
X |
vector of numerical data determining x-coordinates of probe location on chip |
Y |
vector of numerical data determining y-coordinates of probe location on chip |
TYPE |
a vector of characters defining the type of probe, e.g. random background signals ("RAND") or usable data ("DATA"). |
CHR |
a matrix of characters containing unique ID and chromosomal positions for each container |
START |
a matrix of characters containing unique ID and chromosomal positions for each container |
STOP |
a matrix of characters containing unique ID and chromosomal positions for each container |
SIZE |
a matrix of characters containing unique ID and chromosomal positions for each container |
SEQUENCE |
a vector of characters containing sequence information for each probe |
WELL |
a vector of characters containing multiplex well location for each probe (if present in design files) |
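Linking probe-level .ndf rows to locus-level .ngd rows amounts to reading two tab-delimited tables (a "#" comment line, then a header) and joining them on SEQ_ID. The sketch below shows that step with `read.table` and `merge`. It is a hypothetical illustration, not readDesign's implementation: `read_tab` is an invented helper, the inline tables are toy data modeled on the example files below, and real Nimblegen files carry more columns. For files on disk, `read.table(file = path, ...)` with the same arguments applies.

```r
# Hypothetical sketch, not readDesign's implementation: read .ndf-like and
# .ngd-like tables and attach each probe's locus information via SEQ_ID.
read_tab <- function(lines) {
  read.table(text = lines, header = TRUE, sep = "\t", comment.char = "#",
             stringsAsFactors = FALSE)
}

ndf <- read_tab(c("#COMMENT", "SEQ_ID\tX\tY\tPROBE_ID",
                  "s1\t1\t1\tp1", "s2\t2\t1\tp2", "s1\t3\t1\tp3"))
ngd <- read_tab(c("#COMMENT", "SEQ_ID\tCHROMOSOME\tSTART\tSTOP",
                  "s1\tchr1\t200\t399", "s2\tchr1\t400\t599"))

design <- merge(ndf, ngd, by = "SEQ_ID", all.x = TRUE)  # one row per probe
design
```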
Reid F. Thompson ([email protected])
readDesign-methods, read.table
#demo(pipeline, package="HELP")
chr <- rep("chr1", 500)
start <- (1:500)*200
stop <- start+199
x <- 1:500
seqids <- sample(1:50, size=500, replace=TRUE)
cat("#COMMENT\nSEQ_ID\tCHROMOSOME\tSTART\tSTOP\n", file="./read.design.test.ngd")
table.ngd <- cbind(seqids, chr, start, stop)
write.table(table.ngd, file="./read.design.test.ngd", append=TRUE, col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
cat("#COMMENT\nSEQ_ID\tX\tY\tPROBE_ID\tCONTAINER\tPROBE_SEQUENCE\tPROBE_DESIGN_ID\n", file="./read.design.test.ndf")
sequence <- rep("NNNNNNNN", 500)
table.ndf <- cbind(seqids, x, x, x, x, sequence, x)
write.table(table.ndf, file="./read.design.test.ndf", append=TRUE, col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")
x <- readDesign("./read.design.test.ndf", "./read.design.test.ngd")
seqids[1:10]
pData(featureData(x))$"SEQ_ID"[1:10]
#rm(table.ngd, table.ndf, chr, start, stop, x, seqids, sequence)
#file.remove("./read.design.test.ngd")
#file.remove("./read.design.test.ndf")
Methods for extracting array design information from corresponding files in the Nimblegen .ndf and .ngd formats.
Handle empty function call
Handle single vector input. If two values specified in vector, reinterpret function call with two character inputs. Otherwise, handle as empty function call.
Handle two vector input. If vectors of unit length, reinterpret function call with two character inputs. Otherwise, handle as improper function call.
Handle two character vector inputs, each specifying a filename to use when reading design information. Design information will be written to an ExpressionSet.
Handle two character vector inputs, each specifying a filename to use when reading design information. Design information will be written to a database.
Reid F. Thompson ([email protected])
Function to extract data from corresponding files in the Nimblegen .pair format.
readPairs(x, y, z, ...)
x |
the name of the file containing data from signal channel 1. Each line of the file is interpreted as a single data point. If it does not contain an absolute path, the file name is relative to the current working directory, |
y |
the name of the file containing data from signal channel 2. Each line of the file is interpreted as a single data point. If it does not contain an absolute path, the file name is relative to the current working directory, |
z |
object in which to store pair information from files. Can be an |
... |
Arguments to be passed to methods (see
|
Returns an ExpressionSet filled with assayData containing matrices of data from both signal channels, and featureData containing the following featureColumns:
SEQ_ID |
a vector of characters with container IDs, linking each probe to a parent identifier |
PROBE_ID |
a vector of characters containing unique ID information for each probe |
and phenoData containing the following sampleColumns:
CHIPS |
a vector of characters with .pair file locations for signal channel 1 data |
CHIPS2 |
a vector of characters with .pair file locations for signal channel 2 data |
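Combining two single-channel .pair tables into paired channel data can be sketched as reading both tables and aligning the second to the first by PROBE_ID with `match`, rather than trusting row order. This is a hypothetical illustration, not readPairs' implementation: `read_tab` is an invented helper, the inline tables are toy data, and the column names `exprs`/`exprs2` simply echo the assayData elements used in the example below.

```r
# Hypothetical sketch, not readPairs' implementation: combine the PM column of
# two .pair-like tables into a two-column matrix (channel 1, channel 2),
# aligning rows by PROBE_ID rather than assuming identical row order.
read_tab <- function(lines) {
  read.table(text = lines, header = TRUE, sep = "\t", comment.char = "#",
             stringsAsFactors = FALSE)
}
hdr <- c("#COMMENT", "SEQ_ID\tPROBE_ID\tX\tY\tPM")
ch1 <- read_tab(c(hdr, "s1\tp1\t1\t1\t9.1", "s1\tp2\t2\t1\t8.4"))
ch2 <- read_tab(c(hdr, "s1\tp2\t2\t1\t8.9", "s1\tp1\t1\t1\t9.7"))  # different order

pm <- cbind(exprs = ch1$PM,
            exprs2 = ch2$PM[match(ch1$PROBE_ID, ch2$PROBE_ID)])
rownames(pm) <- ch1$PROBE_ID
pm
```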
Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
x <- 1:500
y <- rev(x)
data <- sample(8000:10000/1000,size=500)
seqids <- sample(1:50,size=500,replace=TRUE)
cat("#COMMENT\nSEQ_ID\tPROBE_ID\tX\tY\tPM\n",file="./read.pair.test.1")
table.1 <- cbind(seqids,y,x,x,data)
write.table(table.1,file="./read.pair.test.1",append=TRUE,col.names=FALSE,row.names=FALSE,quote=FALSE,sep="\t")
cat("#COMMENT\nSEQ_ID\tPROBE_ID\tX\tY\tPM\n",file="./read.pair.test.2")
table.2 <- cbind(seqids,y,x,x,rev(data))
write.table(table.2,file="./read.pair.test.2",append=TRUE,col.names=FALSE,row.names=FALSE,quote=FALSE,sep="\t")
x <- readPairs("./read.pair.test.1","./read.pair.test.2")
c(seqids[1],y[1],data[1],rev(data)[1])
pData(featureData(x))$"SEQ_ID"[1]
pData(featureData(x))$"PROBE_ID"[1]
assayDataElement(x, "exprs")[1]
assayDataElement(x, "exprs2")[1]
#rm(table.1,table.2,x,y,data,seqids)
#file.remove("./read.pair.test.1")
#file.remove("./read.pair.test.2")
Methods for extracting data from corresponding files in the Nimblegen .pair format.
Handle empty function call
Handle single vector input. If two values specified in vector, reinterpret function call with two character inputs. Otherwise, handle as empty function call.
Handle two vector input. If vectors of unit length, reinterpret function call with two character inputs. Otherwise, handle as improper function call.
Handle two character vector inputs, each specifying a filename to use when reading pair information. Pair data will be written to an ExpressionSet object.
Handle two character vector inputs, each specifying a filename to use when reading pair information. Pair data will be written to a database.
Reid F. Thompson ([email protected])
Function to extract sample key data from a file and link chip ID information with aliases if they exist.
readSampleKey(file = NULL, chips = NULL, comment.char = "#", sep = "\t")
file |
the name of the file containing sample key information. Each line of the file is interpreted as a single chip-to-sample map. If it does not contain an absolute path, the file name is relative to the current working directory, |
chips |
a character vector specifying a specific chip ID lookup in the sample key, for which the function will return the appropriate sample aliases |
comment.char |
character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether. |
sep |
the field separator character. Values on each line of the file are separated by this character. If sep = "" the separator is "white space", that is one or more spaces, tabs, newlines or carriage returns. |
Returns a character vector of sample alias information corresponding to the chips present in the sample key, or a subset thereof specified by the chips input.
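A chip-to-sample lookup of this kind reduces to reading the two-column key and indexing it with `match`. The sketch below is a hypothetical illustration, not readSampleKey's implementation: `sample_aliases` is an invented name, and returning NA for chips absent from the key is our choice; how readSampleKey treats such chips (as in the "NA1"/"NA2" example below) is not stated here.

```r
# Hypothetical sketch, not readSampleKey's implementation: read a CHIP_ID/SAMPLE
# key and return the sample aliases for requested chips (NA where absent).
key <- read.table(text = c("#COMMENT", "CHIP_ID\tSAMPLE", "1\t1001", "2\t1002"),
                  header = TRUE, sep = "\t", comment.char = "#",
                  colClasses = "character")

sample_aliases <- function(key, chips = NULL) {
  if (is.null(chips)) return(key$SAMPLE)                # whole key
  key$SAMPLE[match(as.character(chips), key$CHIP_ID)]   # requested subset
}

sample_aliases(key)
sample_aliases(key, chips = c(2, "NA1"))
```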
Reid F. Thompson ([email protected])
#demo(pipeline,package="HELP")
cat("#COMMENT\nCHIP_ID\tSAMPLE\n",file="./sample.key.txt")
write.table(cbind(1:10,1001:1010),file="./sample.key.txt",append=TRUE,col.names=FALSE,row.names=FALSE,quote=FALSE,sep="\t")
readSampleKey(file="./sample.key.txt")
readSampleKey(file="./sample.key.txt",chips=c(7:10,"NA1","NA2"))
#file.remove("./sample.key.txt")