Title: | Chromosome Instability Index |
---|---|
Description: | The CINdex package addresses an important area of high-throughput genomic analysis. It automates the processing and analysis of experimental DNA copy number data generated by Affymetrix SNP 6.0 arrays or similar high-throughput technologies. It calculates the chromosome instability (CIN) index, which quantitatively characterizes genome-wide DNA copy number alterations as a measure of chromosomal instability. The package calculates not only overall genomic instability but also instability in terms of copy number gains and losses separately, at the chromosome and cytoband level. |
Authors: | Lei Song [aut] (Innovation Center for Biomedical Informatics, Georgetown University Medical Center), Krithika Bhuvaneshwar [aut] (Innovation Center for Biomedical Informatics, Georgetown University Medical Center), Yue Wang [aut, ths] (Virginia Polytechnic Institute and State University), Yuanjian Feng [aut] (Virginia Polytechnic Institute and State University), Ie-Ming Shih [aut] (Johns Hopkins University School of Medicine), Subha Madhavan [aut] (Innovation Center for Biomedical Informatics, Georgetown University Medical Center), Yuriy Gusev [aut, cre] (Innovation Center for Biomedical Informatics, Georgetown University Medical Center) |
Maintainer: | Yuriy Gusev <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.35.0 |
Built: | 2024-10-30 04:48:20 UTC |
Source: | https://github.com/bioc/CINdex |
The example dataset consists of 10 colon cancer patients, of which 5 had a relapse (return of cancer to the colon) and the rest did not relapse. This example dataset is part of the complete dataset used in CRC, and can be accessed via G-DOC Plus at https://gdoc.georgetown.edu. The column names are described below:
data(clin.crc)
A matrix with 10 rows and 2 columns
Sample. Sample ID
Label. Refers to the group label/outcome
More details on how this object was created are provided in the vignette titled "How to prepare Input data" in the CINdex package.
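As a rough illustration of the layout described above, the following base R sketch builds a character matrix with the same shape and column names. The sample IDs and label values are hypothetical placeholders, not the contents of clin.crc.

```r
# Hypothetical sample IDs and labels; only the 10 x 2 shape and the column
# names "Sample" and "Label" are taken from the description above.
clin.example <- cbind(
  Sample = sprintf("S%02d", 1:10),
  Label  = rep(c("relapse", "relapse-free"), each = 5)
)

dim(clin.example)               # rows = samples, columns = Sample and Label
table(clin.example[, "Label"])  # number of samples per group
```

A matrix of this form is what the clinical.inf argument of the plotting and T-test functions expects: the first column identifies the sample and the second gives its group.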
An example clinical dataset
This is a probe annotation file for the Affymetrix Genome Wide Human SNP Array 6.0. It contains annotation for only the copy number probes in this array and corresponds to the hg18 reference genome.
The GRanges object contains details about probe name, chromosome number, start and end positions, and strand. The annotation has been filtered to include only those probes that are located in autosomes.
More details on how this object was created are provided in the vignette titled "How to prepare Input data" in the CINdex package.
data(cnvgr.18.auto)
A GRanges object
An example probe annotation file
When the run.cin.chr and run.cin.cyto functions are called, we get chromosome and cytoband CIN values for various gain/loss threshold settings. The comp.heatmap function can be used to pick the best threshold for the input data. It plots heatmaps for two groups of interest (case and control) for all the input gain/loss threshold settings. By visually checking the heatmaps, the user can pick the threshold setting that shows the best contrast between the two groups of interest.
Steps:
#Step 1: Run chromosome CIN or cytoband CIN - using run.cin.chr() or run.cin.cyto()
#Step 2: Call this function to create chromosome or cytoband level heatmaps. Pick the gain/loss threshold appropriate for the data.
See vignette for more details.
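The two-group comparison relies on the labels in clinical.inf to decide which samples belong to which heatmap. The base R sketch below shows one way such a split could look. The toy CIN matrix, its orientation (chromosomes in rows, samples in columns), and the label values are assumptions made for illustration and do not reflect the package's internal objects.

```r
set.seed(1)

# Hypothetical clinical information: sample name and group label
samples <- sprintf("S%02d", 1:6)
clinical.inf <- cbind(Sample = samples,
                      Label  = rep(c("case", "control"), each = 3))

# Toy CIN values: 4 chromosomes (rows) by 6 samples (columns); assumed layout
cin.toy <- matrix(runif(4 * 6), nrow = 4,
                  dimnames = list(paste0("chr", 1:4), samples))

# Split the sample IDs by label, then take the matching columns per group
groups <- split(clinical.inf[, "Sample"], clinical.inf[, "Label"])
cin.by.group <- lapply(groups, function(ids) cin.toy[, ids, drop = FALSE])

sapply(cin.by.group, dim)  # one sub-matrix per group
```

Each sub-matrix could then be drawn as its own heatmap so the two groups can be compared side by side for a given threshold setting.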
comp.heatmap(R_or_C = "Regular", clinical.inf = NULL, genome.ucsc = NULL, in.folder.name = "output_chr_cin", out.folder.name = "output_chr_plots", plot.choice = "png", base.color = "black", thr.gain = c(2.5, 2.25, 2.1), thr.loss = c(1.5, 1.75, 1.9), V.def = 2:3, V.mode = c("sum", "amp", "del"))
R_or_C | The value 'Regular' plots chromosome level heatmaps and 'Cytobands' plots cytoband level heatmaps |
clinical.inf | An n*2 matrix; the 1st column is 'sample name', the 2nd is 'label' |
genome.ucsc | A reference genome |
in.folder.name | Name of folder where the chromosome CIN or cytoband CIN objects are present |
out.folder.name | Name of folder where the chromosome heatmaps or cytoband heatmaps will be saved |
plot.choice | A choice of whether the heatmaps should be in .png or .pdf format |
base.color | A choice of 'black' or 'white' base color for the heatmap (indicating no instability) |
thr.gain | A threshold above which values will be set as gain |
thr.loss | A threshold below which values will be set as loss |
V.def | There are 2 different CIN definitions - normalized (value=2) and un-normalized (value=3) |
V.mode | There are 3 options: 'sum', 'amp' and 'del' |
No value returned. If R_or_C='Regular', it will generate chromosome level heatmaps; if R_or_C='Cytobands', it will generate cytoband level heatmaps.
See accompanying vignette for end-to-end tutorial
###### Example 1 - Chromosome level
## Step 1: Run chromosome CIN
# This is how the command should be run:
## Not run:
run.cin.chr(grl.seg = grl.data)
## End(Not run)

# For this example, we run chr CIN on one threshold only
data("grl.data")
run.cin.chr(grl.seg = grl.data, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum")

## Step 2: Plot chromosome level heatmap
# This is how the command must be called:
## Not run:
comp.heatmap(R_or_C = "Regular", clinical.inf = clin.crc, genome.ucsc = hg18.ucsctrack, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum")
## End(Not run)

# For this example, we run chr heatmap on one threshold only
data("clin.crc")        # sample names with group information
data("hg18.ucsctrack")  # hg18 reference file
comp.heatmap(R_or_C = "Regular", clinical.inf = clin.crc, genome.ucsc = hg18.ucsctrack, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum")

###### Example 2 - Cytoband level
## Step 1: Run cytoband CIN
# This is how the command should be run:
## Not run:
run.cin.cyto(grl.seg = grl.data, cnvgr = cnvgr.18.auto, snpgr = snpgr.18.auto, genome.ucsc = hg18.ucsctrack)
## Step 2: Plot cytoband level heatmap
comp.heatmap(R_or_C = "Cytobands", clinical.inf = clin.crc, genome.ucsc = hg18.ucsctrack, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum")
## End(Not run)
Example output obtained from running the T-test on a cytoband CIN object. See the accompanying vignette in the CINdex package for a complete tutorial.
data(cyto.cin4heatmap)
List
Cytoband CIN T-test output
Example output obtained from running the cytoband CIN function in the CINdex package. It gives the chromosome instability index value for every cytoband.
data(cytobands.cin)
List
An example cytoband CIN
Once the user has a list of cytobands of interest, one downstream application could be to find the list of genes present in the cytoband regions. The extract.genes.in.cyto.regions function can be used for this purpose. The following steps should be run before this function can be called:
#Step 1: Run cytoband CIN - using run.cin.cyto()
#Step 2: Plot cytoband level heatmap - using comp.heatmap()
#Step 3: Go through the heatmaps and select one appropriate threshold. Load the file.
#Step 4: Perform T-test to find differentially expressed cytobands - using ttest.cyto.cin.heatmap()
#Step 5: Call this function to extract genes located in cytoband regions
#More details and tutorial are given in the accompanying vignette
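To make the idea of "genes present in a cytoband region" concrete, here is a minimal base R sketch of an interval-overlap lookup. It is not the package's implementation. The column names mirror the geneAnno example dataset documented in this manual, while the gene symbols, coordinates, and the region itself are made up.

```r
# Toy gene annotation; all values are hypothetical
gene.toy <- data.frame(
  chrom    = c("chr1", "chr1", "chr1", "chr2"),
  strand   = c("+", "-", "+", "+"),
  cdsStart = c(1000, 5000, 9000, 2000),
  cdsEnd   = c(2000, 6500, 9500, 3000),
  GeneID   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)

# Hypothetical cytoband region of interest
region <- list(chrom = "chr1", start = 1500, end = 6000)

# A gene is reported when it lies on the same chromosome and its CDS
# interval overlaps the region by at least one base
hits <- gene.toy$chrom == region$chrom &
  gene.toy$cdsStart <= region$end &
  gene.toy$cdsEnd >= region$start

gene.toy$GeneID[hits]
```

Whether partially overlapping genes should count is a design choice; this sketch includes them.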
extract.genes.in.cyto.regions(cyto.cin4heatmapObj = NULL, genome.ucsc = NULL, gene.annotations = NULL, folder.name = "output_genename")
cyto.cin4heatmapObj | Output of the cytoband T test results |
genome.ucsc | Reference sequence |
gene.annotations | Information about CDS start and end positions, Gene names |
folder.name | Name of output folder |
Output files: The gene names present in the cytoband regions
See accompanying vignette for an end-to-end tutorial
#For this example, we load example T test output object
data("cyto.cin4heatmap")
data("hg18.ucsctrack") #load Hg 18 reference annotation file
data("geneAnno") #load Gene annotations file
extract.genes.in.cyto.regions(cyto.cin4heatmapObj = cyto.cin4heatmap, genome.ucsc = hg18.ucsctrack, gene.annotations = geneAnno)
A CDS gene annotation file with the following column names (obtained for human reference)
chrom. Chromosome number
strand. Positive or negative strand
cdsStart. CDS Start position
cdsEnd. CDS end position
GeneID. Gene symbol
More details on how this object was created are provided in the vignette titled "How to prepare Input data" in the CINdex package.
data(geneAnno)
A matrix
An example CDS gene annotation file
To describe copy number alterations mathematically and quantitatively, we first locate their genomic positions and measure their ranges. Algorithms that do this are referred to as segmentation algorithms. Bioconductor has several copy number segmentation algorithms. There are many copy number segmentation algorithms outside of Bioconductor as well. Examples are Fused Margin Regression (FMR) and Circular Binary Segmentation (CBS).
Segmentation results typically include the start and end positions in the genome and the segment value. The algorithms typically cover chromosomes 1 to 22 without any gaps; sometimes sex chromosomes are also included.
For more details, refer to the tutorial in the accompanying vignette in the CINdex package.
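The sketch below shows, in base R, what a segmentation result of this kind might look like and how gain and loss thresholds partition the segments. The data frame layout and all values are hypothetical (the package itself works with a GRangesList), and the thresholds 2.25 and 1.75 are one of the default pairs used elsewhere in this manual.

```r
# Toy segmentation output for one sample: contiguous segments with a value
seg.toy <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1, 50001, 120001, 1),
  end   = c(50000, 120000, 200000, 150000),
  value = c(2.02, 2.60, 1.40, 1.95),
  stringsAsFactors = FALSE
)

thr.gain <- 2.25  # values above this are treated as gain
thr.loss <- 1.75  # values below this are treated as loss

seg.toy$call <- ifelse(seg.toy$value > thr.gain, "gain",
                ifelse(seg.toy$value < thr.loss, "loss", "neutral"))

seg.toy
```

Tightening the thresholds toward 2 (for example 2.1 and 1.9) would reclassify more near-diploid segments as altered, which is why several settings are compared.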
data(grl.data)
A GRangesList
An example output of segmentation algorithm
The reference annotation file used in the CIN algorithm. The example file used here is for Human Species hg18 and includes information about chromosome number, start and end position, name of cytoband and stain.
More details on how this object was created are provided in the vignette titled "How to prepare Input data" in the CINdex package.
data(hg18.ucsctrack)
GRanges object
An example hg18 annotation file
run.cin.chr calculates chromosome level CIN for the following default thresholds (with and without normalization): (a) gain threshold 2.5 and loss threshold 1.5, (b) gain threshold 2.25 and loss threshold 1.75, (c) gain threshold 2.10 and loss threshold 1.90. For each of these threshold settings, this function will calculate CIN for gains, losses, and a combination of gains and losses (referred to as 'sum' or 'overall' CIN). This allows the user to examine and select the best setting of gain and loss thresholds for their data. More details and tutorial are given in the accompanying vignette.
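To see how many settings the defaults imply, the following base R sketch enumerates them. It assumes the gain and loss thresholds are paired by position, as the (a), (b), (c) listing suggests; that pairing is an interpretation of the description, not something confirmed from the package source.

```r
thr.gain <- c(2.5, 2.25, 2.1)
thr.loss <- c(1.5, 1.75, 1.9)
V.def    <- 2:3                    # 2 = normalized, 3 = un-normalized
V.mode   <- c("sum", "amp", "del")

# One row per combination of threshold pair, CIN definition and mode
settings <- expand.grid(pair = seq_along(thr.gain),
                        V.def = V.def,
                        V.mode = V.mode,
                        stringsAsFactors = FALSE)
settings$thr.gain   <- thr.gain[settings$pair]
settings$thr.loss   <- thr.loss[settings$pair]
settings$definition <- ifelse(settings$V.def == 2, "normalized", "un-normalized")

nrow(settings)   # 3 threshold pairs x 2 definitions x 3 modes
head(settings)
```

Passing single values for thr.gain, thr.loss, V.def and V.mode, as the examples in this manual do, reduces this to one setting.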
run.cin.chr(grl.seg, out.folder.name = "output_chr_cin", thr.gain = c(2.5, 2.25, 2.1), thr.loss = c(1.5, 1.75, 1.9), V.def = 2:3, V.mode = c("sum", "amp", "del"))
grl.seg | The result of any segmentation algorithm such as CBS or FMR. Should be a data frame of 3 column-lists or matrix of three-column lists |
out.folder.name | Name of output folder, where the CIN objects for each setting will be created |
thr.gain | A numeric list that contains values set as threshold gain |
thr.loss | A numeric list that contains values set as threshold loss |
V.def | An integer vector that has different CIN definitions (2 means normalized, 3 means un-normalized) |
V.mode | A vector that has 3 options: 'sum', 'amp' and 'del' |
Creates a dataMatrix R object for each setting that contains CIN values
See accompanying vignette for end-to-end tutorial
# Run chromosome level CIN calculation for all thresholds. This is how the command should be run:
# A number of RData objects will be created in the 'output_chr_cin' folder.
## Not run:
run.cin.chr(grl.seg = grl.data)
## End(Not run)

#For this example, we run this function for one threshold only
data("grl.data")
run.cin.chr(grl.seg = grl.data, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum")

# Next step: Plot chromosome level heatmap with comp.heatmap()
# More details and tutorial are given in the accompanying vignette
run.cin.cyto calculates cytoband level CIN for the following default thresholds (with and without normalization): (a) gain threshold 2.5 and loss threshold 1.5, (b) gain threshold 2.25 and loss threshold 1.75, (c) gain threshold 2.10 and loss threshold 1.90. For each of these threshold settings, this function will calculate CIN for gains, losses, and a combination of gains and losses (referred to as 'sum' or 'overall' CIN). This allows the user to examine and select the best setting of gain and loss thresholds for their data.
More details and tutorial are given in the accompanying vignette.
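Working at the cytoband level means relating each segment to the cytobands it covers. The base R sketch below computes, for toy coordinates, how many bases of each segment fall into each cytoband. The cytoband names and coordinates, the segments, and the helper overlap.bp are all hypothetical. This only illustrates the resolution involved and is not the package's CIN calculation, which operates on GRanges objects.

```r
# Hypothetical cytobands on one chromosome (1-based, inclusive coordinates)
cyto.toy <- data.frame(
  name  = c("p2", "p1", "q1", "q2"),
  start = c(1, 1001, 2001, 3001),
  end   = c(1000, 2000, 3000, 4000),
  stringsAsFactors = FALSE
)

# Hypothetical segments covering the same chromosome without gaps
seg.toy <- data.frame(
  start = c(1, 1501, 3201),
  end   = c(1500, 3200, 4000),
  value = c(2.0, 2.6, 1.4)
)

# Number of bases shared by two inclusive intervals (0 if they do not overlap)
overlap.bp <- function(s1, e1, s2, e2) pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)

# Rows are cytobands, columns are segments
ov <- outer(seq_len(nrow(cyto.toy)), seq_len(nrow(seg.toy)),
            function(i, j) overlap.bp(cyto.toy$start[i], cyto.toy$end[i],
                                      seg.toy$start[j], seg.toy$end[j]))
dimnames(ov) <- list(cyto.toy$name, paste0("seg", seq_len(nrow(seg.toy))))

ov
```

Because the toy segments tile the chromosome, every cytoband's row sums to its own length, which is a convenient sanity check.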
run.cin.cyto(grl.seg, cnvgr = NULL, snpgr = NULL, genome.ucsc, out.folder.name = "output_cyto_cin", thr.gain = c(2.5, 2.25, 2.1), thr.loss = c(1.5, 1.75, 1.9), V.def = 2:3, V.mode = c("sum", "amp", "del"), chr.num = 22)
grl.seg | The result of any segmentation algorithm such as CBS or FMR. Should be a GRangesList |
cnvgr | Probe annotation info for the copy number probes - GRanges object |
snpgr | Probe annotation info for the SNP probes - GRanges object |
genome.ucsc | A reference genome |
out.folder.name | Name of output folder, where the CIN objects for each setting will be created |
thr.gain | A numeric list that contains values set as threshold gain |
thr.loss | A numeric list that contains values set as threshold loss |
V.def | An integer vector that has 2 different CIN definitions - normalized (value=2) and un-normalized (value=3) |
V.mode | A vector that has 3 options: 'sum', 'amp' and 'del' |
chr.num | Number of chromosomes in input. Typically 22. |
Creates dataMatrix and cytobands.cin R objects for each setting, containing CIN values
See accompanying vignette for a complete end-to-end tutorial
#### For this example, we run cytoband CIN calculation for one setting on chromosome 22 only
data("grl.data") #need segment level data

#getting genome reference file
data("hg18.ucsctrack")
hg18.ucsctrack.chr <- subset(hg18.ucsctrack, seqnames(hg18.ucsctrack) %in% "chr22")

#get probe annotation information
data("cnvgr.18.auto")

#Call function to run cytoband CIN
run.cin.cyto(grl.seg = grl.data, cnvgr = cnvgr.18.auto, snpgr = NULL, genome.ucsc = hg18.ucsctrack.chr, thr.gain = 2.25, thr.loss = 1.75, V.def = 3, V.mode = "sum", chr.num = 22)

#Run cytoband level CIN calculation for all thresholds. This is how the command should be run:
## Not run:
run.cin.cyto(grl.seg = grl.data, cnvgr = cnvgr.18.auto, snpgr = snpgr.18.auto, genome.ucsc = hg18.ucsctrack)
## End(Not run)
# A number of RData objects will be created in the 'output_cyto_cin' folder.
This is a probe annotation file for the Affymetrix Genome Wide Human SNP Array 6.0. It contains annotation for only the SNP probes in this array and corresponds to the hg18 reference genome.
The GRanges object contains details about probe name, chromosome number, physical location and strand. The annotation has been filtered to include only those probes that are located in autosomes.
More details on how this object was created are provided in the vignette titled "How to prepare Input data" in the CINdex package.
data(snpgr.18.auto)
A GRanges object
An example probe annotation file
ttest.cyto.cin.heatmap performs a T-test to find differentially expressed cytobands. It also plots a heatmap after performing hierarchical clustering. When to use this function:
#Step 1: Run cytoband CIN - using run.cin.cyto()
#Step 2: Plot cytoband level heatmap - using comp.heatmap()
#Step 3: Go through the heatmaps and select one appropriate threshold. Load the file.
#Step 4: Call this function.
More details and tutorial are given in the accompanying vignette
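As a rough picture of a per-cytoband two-group test, the base R sketch below applies t.test() row by row to a toy CIN matrix. The cytoband names, labels, and values are simulated. The use of the default Welch test is an assumption, since this manual does not state which T-test variant the function applies.

```r
set.seed(42)

# Hypothetical group labels for 10 samples
labels <- rep(c("relapse", "relapse-free"), each = 5)

# Toy CIN matrix: 3 cytobands (rows) by 10 samples (columns)
cin.toy <- matrix(rnorm(3 * 10), nrow = 3,
                  dimnames = list(c("cytoA", "cytoB", "cytoC"),
                                  sprintf("S%02d", 1:10)))

# Shift one cytoband in the relapse group so that it differs between groups
cin.toy["cytoA", labels == "relapse"] <-
  cin.toy["cytoA", labels == "relapse"] + 5

# One two-sample t-test per cytoband (Welch test, the t.test() default)
p.values <- apply(cin.toy, 1, function(x) {
  t.test(x[labels == "relapse"], x[labels == "relapse-free"])$p.value
})

p.values
```

Cytobands with small p-values would be the candidates passed on to downstream steps such as gene extraction; with many cytobands, a multiple-testing correction such as p.adjust() is worth considering.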
ttest.cyto.cin.heatmap(cytobands.cin.obj, clinical.inf, genome.ucsc, file.ext = "gainT_lossT_unnorm", folder.name = "output_ttest", combine.cyto.flag = FALSE)
cytobands.cin.obj | A list in which each element is a chromosome CIN matrix (e.g. cytobands.cin_2.25_1.75_unnormalized_amp.Rdata) |
clinical.inf | A two-column array (e.g. from clinical.inf.Rdata); the 1st column is the sample name, the 2nd is the label |
genome.ucsc | Reference sequence |
file.ext | Provide a meaningful file name extension. Ideally include the gain and loss threshold settings |
folder.name | Name of folder where the output files will be generated |
combine.cyto.flag | Whether or not to save the combined cytobands as a uni array rather than a list |
#Outputs:
1. cyto.cin.uni.file.ext.Rdata (e.g. cyto.cin.uni.gainT_lossT_unnormalized.Rdata)
2. Heatmaps: e.g. CIN relapse-free VS relapse for gainT_lossT_unnormalized_dendrogram.pdf
3. Raw CIN array for the corresponding heatmap:
#ttest.cyto.cin4heatmap.gainT_lossT_unnormalized.csv
#ttest.cyto.cin4heatmap.gainT_lossT_unnormalized.Rdata
4. T test results for all cytobands on the whole genome:
#ttest.cytobands.cin.gainT_lossT_unnormalized.txt
See accompanying vignette for a detailed end-to-end workflow tutorial
#For this example, we load an example cytoband CIN data
data("cytobands.cin")
data("clin.crc") # sample names with group information
data("hg18.ucsctrack") #hg18 reference file
ttest.cyto.cin.heatmap(cytobands.cin.obj = cytobands.cin, clinical.inf = clin.crc, genome.ucsc = hg18.ucsctrack)