Title: | A Genotype Calling Algorithm for Affymetrix SNP Arrays |
---|---|
Description: | A classification algorithm for Affymetrix SNP arrays, based on a multi-chip, multi-SNP approach. Using a large training sample where the genotype labels are known, this algorithm will obtain more accurate classification results on new data. RLMM is based on a robust, linear model and uses the Mahalanobis distance for classification. The chip-to-chip non-biological variation is removed through normalization. This model-based algorithm captures the similarities across genotype groups and probes, as well as across thousands of other SNPs, for accurate classification. NOTE: 100K-Xba only for now. |
Authors: | Nusrat Rabbee <[email protected]>, Gary Wong <[email protected]> |
Maintainer: | Nusrat Rabbee <[email protected]> |
License: | LGPL (>= 2) |
Version: | 1.69.0 |
Built: | 2024-11-30 03:47:16 UTC |
Source: | https://github.com/bioc/RLMM |
This function classifies SNPs based on the theta estimates (thetafile), genotype information (a regions file), and some internal files. Currently, this algorithm works for the Affymetrix 100K-Xba dataset.
Classify(genotypefile = "", regionsfile = "", thetafile = "", callrate = 100)
genotypefile | Name of the file of classified SNPs with their genotypes (required) |
regionsfile | Character string specifying the directory AND name of regionsfile, e.g., "Xba.regions" (required) |
thetafile | Character string specifying the directory AND name of thetafile (required) |
callrate | Call rate percentage. The user can specify any number from the list: 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100. Default is 100% (optional) |
For each SNP, the Mahalanobis distances from each chip's (theta A, theta B) ordered pair to the genotype centers are calculated. Each chip is assigned the genotype of the cluster to which it is closest (i.e., AA, AB, or BB).
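The nearest-center rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the genotype centers, covariance matrices, and theta values below are made-up numbers for a single hypothetical SNP, and the effect of callrate is not modelled.

```r
# Hypothetical genotype centers and covariances for ONE SNP (made-up numbers).
centers <- list(
  AA = c(thetaA = 11.0, thetaB = 8.0),
  AB = c(thetaA = 10.0, thetaB = 10.0),
  BB = c(thetaA = 8.0,  thetaB = 11.0)
)
covs <- list(
  AA = matrix(c(0.10, 0.02, 0.02, 0.10), 2, 2),
  AB = matrix(c(0.08, 0.03, 0.03, 0.08), 2, 2),
  BB = matrix(c(0.10, 0.02, 0.02, 0.10), 2, 2)
)

# One row per chip: its (theta A, theta B) estimate for this SNP.
theta <- rbind(
  chip1 = c(10.9, 8.2),
  chip2 = c(9.8, 10.1),
  chip3 = c(8.3, 10.7)
)

# Squared Mahalanobis distance from every chip (rows) to every genotype center (columns).
d2 <- sapply(names(centers), function(g) {
  mahalanobis(theta, center = centers[[g]], cov = covs[[g]])
})

# Assign each chip the genotype of its nearest center.
calls <- colnames(d2)[apply(d2, 1, which.min)]
names(calls) <- rownames(theta)
print(calls)
```

Because each genotype cluster has its own covariance matrix in this sketch, a chip is compared against each cluster on that cluster's own scale rather than by plain Euclidean distance.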
Nusrat Rabbee <[email protected]>, Gary Wong <[email protected]>
Assuming that the *.norm files have been created, this step of the data analysis calculates estimates of the theta A and theta B values for each SNP and chip from the normalized probe intensity data in the *.norm files. The theta values are produced by fitting a probe-level additive model to the log2 A probe intensities and the B intensities separately.
create_Thetafile(probefiledir = getwd(), start = 1, end = -1, thetafile = "")
probefiledir | Character string specifying the directory with the *.norm files (optional) |
start | An integer value specifying which SNP number we should start at when calculating the theta values (optional) |
end | An integer value specifying which SNP number we should stop at when calculating the theta values (optional) |
thetafile | A character string specifying the name the theta file will be saved as (optional) |
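To make the idea of a probe-level additive model concrete, here is a minimal sketch for one SNP and one allele. It swaps in Tukey's median polish (`medpolish` from base R's stats package) as a simple robust additive fit; the fitting procedure actually used by create_Thetafile may differ. The intensity matrix, chip levels, and probe effects are simulated, hypothetical values.

```r
set.seed(1)

# Simulated log2 PM-A intensities for one SNP: 4 chips (rows) x 5 probes (columns).
chip_level   <- c(10, 10.5, 8, 8.2)          # hypothetical per-chip signal
probe_effect <- c(-0.4, -0.1, 0, 0.2, 0.3)   # hypothetical probe affinities
log2_A <- outer(chip_level, probe_effect, "+") +
  matrix(rnorm(20, sd = 0.05), nrow = 4)
dimnames(log2_A) <- list(paste0("chip", 1:4), paste0("probe", 1:5))

# Additive fit: log2 intensity = overall + chip effect + probe effect + residual.
fit <- medpolish(log2_A, trace.iter = FALSE)

# Per-chip summary on the log2 scale, playing the role of theta A in this sketch.
theta_A <- fit$overall + fit$row
print(round(theta_A, 2))
```

Repeating the same fit on the B probe intensities would give the matching theta B value for each chip, so that every chip ends up with one (theta A, theta B) pair per SNP.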
Nusrat Rabbee <[email protected]>, Gary Wong <[email protected]>
Given a directory with *.raw files, this function normalizes the PMA and PMB intensities in each file using Xba.CQV (composite quantile vector) and writes the normalized values to *.norm files, one for each *.raw file. For example, if two *.raw files are used, two *.norm files are produced. This normalization simply puts the probe data on the same scale as the training data.
normalize_Rawfiles(cqvfile = "", probefiledir = getwd())
cqvfile | Character string specifying the CQV filename (e.g., Xba.CQV) (required) |
probefiledir | Character string specifying location of the *.raw files and *.norm files (optional) |
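Normalizing against a fixed quantile vector can be illustrated with a few lines of base R. This is a sketch of generic quantile normalization to a reference, under the assumption that the reference vector has the same length as the chip's intensity vector; the real Xba.CQV file format and the package's exact procedure are not described here, and `normalize_to_reference` is a hypothetical helper, not a package function.

```r
# Hypothetical reference quantiles, standing in for a composite quantile vector.
cqv <- c(50, 120, 300, 800, 2000)

# Made-up raw probe intensities for one chip, same length as the reference.
raw <- c(probe1 = 410, probe2 = 95, probe3 = 1500, probe4 = 60, probe5 = 220)

# Replace each intensity by the reference quantile of the same rank.
normalize_to_reference <- function(x, reference) {
  stopifnot(length(x) == length(reference))
  out <- sort(reference)[rank(x, ties.method = "first")]
  names(out) <- names(x)
  out
}

norm <- normalize_to_reference(raw, cqv)
print(norm)
```

After this step every chip has exactly the same distribution of intensities as the reference, which is one way of putting new probe data on the same scale as training data.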
Nusrat Rabbee <[email protected]>, Gary Wong <[email protected]>
Creates an allele summary plot (allele B vs. allele A) for each SNP specified in snpsfile. The points in the plot are the (theta A, theta B) ordered pairs for all the samples of the SNP. If a plotfile is specified, the plot is saved as a .ps file; otherwise the plot is shown on screen.
plot_theta(genotypefile = "Xba.rlmm", thetafile = "Xba.theta", Pick.Obj = "FALSE", plotfile = "plots.ps", snpsfile = "snps.lst")
genotypefile | Character string specifying the directory AND name of the .rlmm file created by |
thetafile | Character string specifying the directory AND name of the .theta file created by |
Pick.Obj | At this point, it should always be left as the default FALSE, i.e., it is for development purposes only (optional) |
plotfile | Name of the .ps file in which to store the plot; if blank (""), the plot is displayed on screen instead (optional) |
snpsfile | File listing the SNPs to plot, one SNP per line (optional) |
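For readers who want to see what such an allele summary plot looks like, the following base R sketch draws one for a single SNP. It does not read .rlmm or .theta files: the theta values and genotype calls are made-up, and `plot_snp` is a hypothetical helper that only mimics the file-or-screen behaviour described for plotfile.

```r
# Made-up (theta A, theta B) pairs for one SNP across six samples, with hypothetical calls.
theta_A <- c(11.0, 10.8, 10.1, 9.9, 8.2, 8.4)
theta_B <- c(8.1, 8.3, 10.0, 10.2, 11.1, 10.9)
calls   <- factor(c("AA", "AA", "AB", "AB", "BB", "BB"))

plot_snp <- function(a, b, genotype, main, plotfile = "") {
  # A non-empty plotfile sends the figure to a PostScript file;
  # otherwise the current graphics device is used.
  if (nzchar(plotfile)) {
    postscript(plotfile)
    on.exit(dev.off())
  }
  # Allele B on the vertical axis, allele A on the horizontal axis.
  plot(a, b, col = as.integer(genotype), pch = 19,
       xlab = "theta A", ylab = "theta B", main = main)
  legend("topright", legend = levels(genotype),
         col = seq_along(levels(genotype)), pch = 19)
  invisible(plotfile)
}

out <- file.path(tempdir(), "snp_example.ps")
plot_snp(theta_A, theta_B, calls, main = "SNP_EXAMPLE", plotfile = out)
```

With well-separated genotypes, the three groups appear as distinct clouds: AA toward high theta A, BB toward high theta B, and AB in between.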
Nusrat Rabbee <[email protected]>, Gary Wong <[email protected]>