Title: | A "corrective make-up" program for microarray chips |
---|---|
Description: | The package is used to detect extended, diffuse and compact blemishes on microarray chips. Harshlight automatically marks the areas in a collection of chips (affybatch objects) and a corrected AffyBatch object is returned, in which the defected areas are substituted with NAs or the median of the values of the same probe in the other chips in the collection. The new version handle the substitute value as whole matrix to solve the memory problem. |
Authors: | Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M. Wittkowski, Marcelo O. Magnasco |
Maintainer: | Maurizio Pellegrino <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.79.0 |
Built: | 2024-11-20 06:06:39 UTC |
Source: | https://github.com/bioc/Harshlight |
Harshlight automatically detects and masks blemishes in microarray chips of class AffyBatch
HarshComp(affy.object, my.ErrorImage = NULL, extended.radius = 10, compact.quant.bright = 0.025, compact.quant.dark = 0.025, compact.size.limit = 15, compact.connect = 8, compact.pval = 0.01, percent.contiguity = 50, report.name = 'R.report.ps', na.sub = FALSE, interpolate = TRUE)
HarshComp(affy.object, my.ErrorImage = NULL, extended.radius = 10, compact.quant.bright = 0.025, compact.quant.dark = 0.025, compact.size.limit = 15, compact.connect = 8, compact.pval = 0.01, percent.contiguity = 50, report.name = 'R.report.ps', na.sub = FALSE, interpolate = TRUE)
affy.object |
An AffyBatch object containing two or more chips. |
my.ErrorImage |
A batch of ErrorImages obtained through other programs. The error images must be in a matrix format, in which the first index represents each cell in the matrix and the second index represents the chip number. By default, the program calculates the error images for the batch of chips affy.object as described in Suarez-Farinas M et al., BMC Bioinformatics - 2005. If a batch of error images is provided, the affy.object is also required. |
extended.radius |
Radius of the median kernel used to identify extended defects on the chip. |
compact.quant.bright , compact.quant.dark
|
Quantiles of the Error Image used to declare outliers. Values bigger than the '(1 - compact.quant.bright)' percentile are bright outliers, while values smaller than the 'compact.quant.dark' percentile are dark outliers. The two quantiles are used to detect compact defects. Set it to 0 to turn compact defect detection off. |
compact.size.limit |
Minimum size for clusters to be considered defects. If 0, all the clusters identified will be considered defects, if their size is significantly bigger than the one expected by chance (see also compact.pval). |
compact.connect |
Defines the neighbourhood of a pixel, used to connect outliers into clusters. If 4, the neighbourhood contains the pixels that are adjacent of a pixel of reference, on the vertical or horizontal axis. If 8, the neighbourhood contains all the 8 pixels sorrounding the pixel of reference. If a connectivity of 4 is used, clusters that are connected only through an edge will be considered as separate clusters. In this case, the single clusters could be eliminated because their size does not exceed compact.size.limit or diffuse.size.limit. Therefore, we suggest to use a connectivity of 8. |
compact.pval |
Threshold for compact defect size. This is the maximum probability accepted to find a cluster of the same size by chance. If 1, a cluster is considered a compact defect if it is bigger than the value of compact.size.limit. |
percent.contiguity |
Minimum percentage of area density for defects to be considered compact. If 0, every compact defect found will be eliminated before searching for diffuse defects. Though possible, avoid using less than 20; otherwise diffuse defects might not be identified properly. |
report.name |
Name of the PostScript file in which to save the final report. If report.name is set to "", no report will be written. |
na.sub |
If TRUE, the intensity values of the input affyBatch that are affected by defects will be changed in NA. If FALSE, the values will be substituted with the median of the intensity values of the other chips. |
interpolate |
This option is only used if the value of compact.quant.bright or compact.quant.dark is not among those tabulated (density of outliers = 0.01, 0.02, 0.05, 0.10, 0.20, 0.25, 0.30, 0.40; chip size = 534x534, 640x640, 712x712). If TRUE, the cluster size distribution under the null hypothesis of spatially randomly distributed outliers is derived from simulated values through interpolation. If FALSE, the distribution is simulated for the input value of density of outliers (compact.quant.bright/compact.quant.dark) and the specific chip size. The program runs 100.000 simulations by default. |
HarshComp is used to detect extended and compact defects only.
AffyBatch object |
The input AffyBatch object, whose intensity values corresponding to defected areas are substituted either by NA or by the median of the chip's values (depending on na.sub). |
Report |
For each AffyBatch analyzed, a report is written as a PostScript file (see also report.name). |
Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M. Wittkwosky, Marcelo O. Magnasco [email protected]
http://asterion.rockefeller.edu/Harshlight/
Harshlight: a "corrective make-up" program for microarray chips, Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M Wittkowski and Marcelo O Magnasco, BMC Bioinformatics 2005 Dec 10; 6(1):294
"Harshlighting" small blemishes on microarrays, Suarez-Farinas M, Haider A, Wittkowski KM., BMC Bioinformatics. 2005 Mar 22;6(1):65.
Harshlight automatically detects and masks blemishes in microarray chips of class AffyBatch
HarshExt(affy.object, my.ErrorImage = NULL, extended.radius = 10)
HarshExt(affy.object, my.ErrorImage = NULL, extended.radius = 10)
affy.object |
An AffyBatch object containing two or more chips. |
my.ErrorImage |
A batch of ErrorImages obtained through other programs. The error images must be in a matrix format, in which the first index represents each cell in the matrix and the second index represents the chip number. By default, the program calculates the error images for the batch of chips affy.object as described in Suarez-Farinas M et al., BMC Bioinformatics - 2005. If a batch of error images is provided, the affy.object is also required. |
extended.radius |
Radius of the median kernel used to identify extended defects on the chip. |
HarshExt is used to detect only extended defects on the surface of the chip. It does not detect compact or diffuse defects (see the help page for Harshlight).
Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M. Wittkwosky, Marcelo O. Magnasco [email protected]
http://asterion.rockefeller.edu/Harshlight/
Harshlight: a "corrective make-up" program for microarray chips, Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M Wittkowski and Marcelo O Magnasco, BMC Bioinformatics 2005 Dec 10; 6(1):294
"Harshlighting" small blemishes on microarrays, Suarez-Farinas M, Haider A, Wittkowski KM., BMC Bioinformatics. 2005 Mar 22;6(1):65.
Harshlight automatically detects and masks blemishes in microarray chips of class AffyBatch
Harshlight(affy.object, my.ErrorImage = NULL, extended.radius = 10, compact.quant.bright = 0.025, compact.quant.dark = 0.025, compact.size.limit = 15, compact.connect = 8, compact.pval = 0.01, diffuse.bright = 40, diffuse.dark = 35, diffuse.pval = 0.001, diffuse.connect = 8, diffuse.radius = 10, diffuse.size.limit = (3*3.14*(diffuse.radius**2)), percent.contiguity = 50, report.name = 'R.report.ps', na.sub = FALSE, interpolate = TRUE, diffuse.close = TRUE)
Harshlight(affy.object, my.ErrorImage = NULL, extended.radius = 10, compact.quant.bright = 0.025, compact.quant.dark = 0.025, compact.size.limit = 15, compact.connect = 8, compact.pval = 0.01, diffuse.bright = 40, diffuse.dark = 35, diffuse.pval = 0.001, diffuse.connect = 8, diffuse.radius = 10, diffuse.size.limit = (3*3.14*(diffuse.radius**2)), percent.contiguity = 50, report.name = 'R.report.ps', na.sub = FALSE, interpolate = TRUE, diffuse.close = TRUE)
affy.object |
An AffyBatch object containing two or more chips. |
my.ErrorImage |
A batch of ErrorImages obtained through other programs. The error images must be in a matrix format, in which the first index represents each cell in the matrix and the second index represents the chip number. By default, the program calculates the error images for the batch of chips affy.object as described in Suarez-Farinas M et al., BMC Bioinformatics - 2005. If a batch of error images is provided, the affy.object is also required. |
extended.radius |
Radius of the median kernel used to identify extended defects on the chip. |
compact.quant.bright , compact.quant.dark
|
Quantiles of the Error Image used to declare outliers. Values bigger than the '(1 - compact.quant.bright)' percentile are bright outliers, while values smaller than the 'compact.quant.dark' percentile are dark outliers. The two quantiles are used to detect compact defects. Set it to 0 to turn compact defect detection off. |
compact.size.limit , diffuse.size.limit
|
Minimum size for clusters to be considered defects. If 0, all the clusters identified will be considered defects, if their size is significantly bigger than the one expected by chance (see also compact.pval). |
compact.connect , diffuse.connect
|
Defines the neighbourhood of a pixel, used to connect outliers into clusters. If 4, the neighbourhood contains the pixels that are adjacent of a pixel of reference, on the vertical or horizontal axis. If 8, the neighbourhood contains all the 8 pixels sorrounding the pixel of reference. If a connectivity of 4 is used, clusters that are connected only through an edge will be considered as separate clusters. In this case, the single clusters could be eliminated because their size does not exceed compact.size.limit or diffuse.size.limit. Therefore, we suggest to use a connectivity of 8. |
compact.pval |
Threshold for compact defect size. This is the maximum probability accepted to find a cluster of the same size by chance. If 1, a cluster is considered a compact defect if it is bigger than the value of compact.size.limit. |
diffuse.bright , diffuse.dark
|
Percentage of increase (bright) or decrease (dark) of the intensity value of a pixel compared to the expected intensity. Used to declare outliers to detect diffuse defects. The option to detect diffuse defects is turned off if the value is set to 0. |
diffuse.radius |
Radius of the mask used to identify diffuse defects on the chip. Inside this mask the binomial test is performed. |
diffuse.pval |
Significance for the binomial test during diffuse defects' detection. |
percent.contiguity |
Minimum percentage of area density for defects to be considered compact. If 0, every compact defect found will be eliminated before searching for diffuse defects. Though possible, avoid using less than 20; otherwise diffuse defects might not be identified properly. |
report.name |
Name of the PostScript file in which to save the final report. If report.name is set to ”, no report will be written. |
na.sub |
If TRUE, the intensity values of the input affyBatch that are affected by defects will be changed in NA. If FALSE, the values will be substituted with the median of the intensity values of the other chips. |
interpolate |
This option is only used if the value of compact.quant.bright or compact.quant.dark is not among those tabulated (density of outliers = 0.01, 0.02, 0.05, 0.10, 0.20, 0.25, 0.30, 0.40; chip size = 534x534, 640x640, 712x712). If TRUE, the cluster size distribution under the null hypothesis of spatially randomly distributed outliers is derived from simulated values through interpolation. If FALSE, the distribution is simulated for the input value of density of outliers (compact.quant.bright/compact.quant.dark) and the specific chip size. The program runs 100.000 simulations by default. |
diffuse.close |
If TRUE, the whole area in which the diffuse defects are included is considered as a defect. If FALSE, only the outliers inside the area are considered defects. |
AffyBatch object |
The input AffyBatch object, whose intensity values corresponding to defected areas are substituted either by NA or by the median of the chip's values (depending on na.sub). |
Report |
For each AffyBatch analyzed, a report is written as a PostScript file (see also report.name). |
Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M. Wittkwosky, Marcelo O. Magnasco [email protected]
http://asterion.rockefeller.edu/Harshlight/
Harshlight: a "corrective make-up" program for microarray chips, Mayte Suarez-Farinas, Maurizio Pellegrino, Knut M Wittkowski and Marcelo O Magnasco, BMC Bioinformatics 2005 Dec 10; 6(1):294
"Harshlighting" small blemishes on microarrays, Suarez-Farinas M, Haider A, Wittkowski KM., BMC Bioinformatics. 2005 Mar 22;6(1):65.
## To run the example, download the affybatch object example.rda ## from the website http://asterion.rockefeller.edu/Harshlight/ ## Not run: source("example.rda") ## this creates the object my.affybatch in your working environment library(Harshlight) harsh <- Harshlight(affy.object = my.affybatch, report.name = 'example.ps') ## The file example.ps will appear in your working directory ## Calculate expression measures using MAS5 mas.example <- mas5(my.affybatch) mas.harsh <- mas5(harsh) plot(log2(exprs(mas.example)),log2(exprs(mas.harsh))) ## End(Not run)
## To run the example, download the affybatch object example.rda ## from the website http://asterion.rockefeller.edu/Harshlight/ ## Not run: source("example.rda") ## this creates the object my.affybatch in your working environment library(Harshlight) harsh <- Harshlight(affy.object = my.affybatch, report.name = 'example.ps') ## The file example.ps will appear in your working directory ## Calculate expression measures using MAS5 mas.example <- mas5(my.affybatch) mas.harsh <- mas5(harsh) plot(log2(exprs(mas.example)),log2(exprs(mas.harsh))) ## End(Not run)
This data set contains the probability distribution of the cluster size under the assumption of spatially randomly distributed outliers. The distribution depends on the chip size, the density of outliers and the definition of connectivity (see help(Harshlight)). The sets sim\_n4\_/sim\_n8\_ contain the results from 100.000 simulations (sim.pval), while the sets sim\_4\_/sim\_8\_ contain the parameters that are used to interpolate the probability for values of density of outliers and chip sizes not simulated (a, b). The data set is used by the package Harshlight and is not intended for direct use by the user.
The number of occurrences of at least one cluster of a certain size in 100.000 chips with randomly distributed outliers.