Title: | Integrated Copy Number Variation detection |
---|---|
Description: | Integrative copy number variation (CNV) detection from multiple platforms and experimental designs. |
Authors: | Zilu Zhou, Nancy Zhang |
Maintainer: | Zilu Zhou <[email protected]> |
License: | GPL-2 |
Version: | 1.27.0 |
Built: | 2024-11-24 06:29:36 UTC |
Source: | https://github.com/bioc/iCNV |
If your VCF files follow the format in the example, you can use this function to extract NGS BAF from them. Remember to load the library beforehand. The function saves 6 lists, each with N entries, where N is the number of individuals (or VCF files): ngs_baf.nm: names of the BAM files; ngs_baf.chr: the chromosome; ngs_baf.pos: the positions of the variants; ngs_baf: the BAF of the variants; ngs_baf.id: the IDs of the variants; filenm: the file names.
bambaf_from_vcf(dir = ".", vcf_list, chr = NULL, projname = "")
dir |
The directory where all the VCF files are stored; defaults to the current folder. Type character. Default '.' |
vcf_list |
Name of a file (e.g. vcf.list) that lists all the VCF file names; it can be generated with the command "ls *.vcf > vcf.list". Type character. |
chr |
Specify the chromosome you want to generate. Must be an integer from 1 to 22. If not specified, this function will generate all chromosomes. Default NULL |
projname |
Name of the project. Type character. Default "" |
void
dir <- system.file("extdata", package="iCNV")
bambaf_from_vcf(dir,'bam_vcf.list',projname='icnv.demo.')
bambaf_from_vcf(dir,'bam_vcf.list',chr=22,projname='icnv.demo.')
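The vcf_list argument expects a file listing the VCF names, which the argument description suggests creating with "ls *.vcf > vcf.list". The sketch below is a rough base-R equivalent; the temporary directory and the placeholder file names (a.vcf, b.vcf, notes.txt) are invented for illustration and are not part of the package.

```r
# Sketch: build a vcf.list file from R instead of the shell.
# The directory and the empty placeholder files are made up for this demo.
vcf_dir <- file.path(tempdir(), "vcf_demo")
dir.create(vcf_dir, showWarnings = FALSE)
file.create(file.path(vcf_dir, c("a.vcf", "b.vcf", "notes.txt")))

# Keep only names ending in .vcf, like the shell glob *.vcf
vcf_files <- sort(list.files(vcf_dir, pattern = "\\.vcf$"))

# One file name per line, mirroring `ls *.vcf > vcf.list`
list_path <- file.path(vcf_dir, "vcf.list")
writeLines(vcf_files, list_path)

readLines(list_path)
```

The resulting file name could then be passed as vcf_list together with the directory that holds the VCF files.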
Default positions are generated from the UCSC Genome Browser.
bed_generator(chr = numeric(), hg = numeric(), start = NULL, end = NULL, by = 1000)
chr |
Specify the chromosome you want to generate. Must be an integer from 1 to 22. Type integer. |
hg |
Specify the genome build whose coordinates you want to use. Start and end positions for hg19 and hg38 are built in. Type integer. |
start |
The start position of your BED file. Default NULL |
end |
The end position of your BED file. Default NULL |
by |
The length of DNA covered by each bin. Type integer. Default 1000. |
void
bed_generator(chr=22,hg=38)
bed_generator(22,38,5001,10000,by=500)
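To make the roles of start, end and by concrete, here is a minimal base-R sketch of fixed-width binning. The helper make_bins is hypothetical and not part of iCNV; the 1-based inclusive coordinates and the shortened final bin are assumptions, and the actual output of bed_generator may differ.

```r
# Hypothetical helper, not part of iCNV: split [start, end] into bins of width `by`.
# Assumes 1-based inclusive coordinates; the last bin is truncated at `end`.
make_bins <- function(chr, start, end, by = 1000) {
  stopifnot(start <= end, by > 0)
  bin_start <- seq(start, end, by = by)
  bin_end <- pmin(bin_start + by - 1, end)
  data.frame(chr = paste0("chr", chr), start = bin_start, end = bin_end)
}

# Same arguments as the second bed_generator call in the example
bins <- make_bins(22, 5001, 10000, by = 500)
head(bins, 3)
nrow(bins)
```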
Example NGS VCF file names from the 1000 Genomes Project; values stored in filenm
filenm
vector with NGS vcf file names
File names for the NGS vcf
If your array input files follow the format in the example, you can use this function to extract array LRR and BAF. Remember to load the library beforehand. The function saves 4*[# of chr] lists, each with N entries, where N is the number of individuals: snp_lrr: SNP LRR intensity; snp_lrr.pos: the positions of the SNPs; snp_baf: the BAF of the SNPs; snp_baf.pos: the positions of the SNPs.
get_array_input(dir = character(), pattern = character(), chr = NULL, projname = "")
dir |
A string. The path to the folder where the signal intensity files are stored, organized by chromosome. Type character |
pattern |
A string. The pattern matching all the intensity files. Type character |
chr |
Specify the chromosome you want to generate. Must be an integer from 1 to 22. If not specified, this function will generate files for all chromosomes. Default NULL |
projname |
Name of the project. Type character |
void
dir <- system.file("extdata", package="iCNV")
pattern <- paste0('*.csv.arrayicnv$')
get_array_input(dir,pattern,chr=22,projname='icnv.demo.')
Copy number variation detection tool for germline data. Able to combine intensity and BAF from SNP array and NGS data.
iCNV_detection(ngs_plr = NULL, snp_lrr = NULL, ngs_baf = NULL, snp_baf = NULL, ngs_plr.pos = NULL, snp_lrr.pos = NULL, ngs_baf.pos = NULL, snp_baf.pos = NULL, maxIt = 50, visual = 0, projname = "iCNV.", CN = 0, mu = c(-3, 0, 2), cap = FALSE)
ngs_plr |
A list of NGS intensity data. Each entry is an individual. If no NGS data, no need to specify. |
snp_lrr |
A list of SNP array intensity data. Each entry is an individual. If no SNP array data, no need to specify. |
ngs_baf |
A list of NGS BAF data. Each entry is an individual. If no NGS data, no need to specify. |
snp_baf |
A list of SNP array BAF data. Each entry is an individual. If no SNP array data, no need to specify. |
ngs_plr.pos |
A list of NGS intensity position data. Each entry is an individual with dimension = (# of bins or exons, 2 (start and end position)). If no NGS data, no need to specify. |
snp_lrr.pos |
A list of SNP array intensity position data. Each entry is an individual with length = # of SNPs. If no SNP array data, no need to specify. |
ngs_baf.pos |
A list of NGS BAF position data. Each entry is an individual with length = # of BAFs. If no NGS data, no need to specify. |
snp_baf.pos |
A list of SNP array BAF position data. Each entry is an individual with length = # of BAFs. If no SNP array data, no need to specify. |
maxIt |
An integer giving the maximum number of EM iterations to run if parameter inference has not converged. Type integer. Default 50. |
visual |
An indicator variable with value 0, 1 or 2. 0 indicates no visualization, 1 indicates basic visualization, 2 indicates complete visualization (note that visual = 2 only works for a single platform with integer CN inferred). Type integer. Default 0 |
projname |
A string as the name of this project. Type character. Default 'iCNV.' |
CN |
An indicator variable with value 0 or 1 for whether to infer exact copy number. 0: no exact CN; 1: exact CN. Type integer. Default 0. |
mu |
A length-three vector specifying the means of intensity in the normal mixture distribution (deletion, diploid, duplication). Default c(-3,0,2) |
cap |
A boolean that decides whether to cap extreme intensity values caused by double deletion or multiple amplification. Type logical. Default FALSE |
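The shapes described in the arguments above can be illustrated with toy data. This is only a sketch: every object name ending in _toy is made up, the values are random, and treating each intensity entry as a plain numeric vector is an assumption not confirmed by this page.

```r
set.seed(1)
n_ind  <- 2   # individuals
n_exon <- 5   # exons or bins per individual
n_snp  <- 8   # SNPs per individual

# One list entry per individual (assumed here to be a numeric vector)
ngs_plr_toy <- lapply(seq_len(n_ind), function(i) rnorm(n_exon))
snp_lrr_toy <- lapply(seq_len(n_ind), function(i) rnorm(n_snp))

# NGS positions: a (# of exons) x 2 matrix of start and end positions
ngs_plr.pos_toy <- lapply(seq_len(n_ind), function(i) {
  start <- seq(1000, by = 1000, length.out = n_exon)
  cbind(start = start, end = start + 499)
})

# SNP positions: one position per SNP
snp_lrr.pos_toy <- lapply(seq_len(n_ind), function(i) sort(sample(1000:6000, n_snp)))

dim(ngs_plr.pos_toy[[1]])
length(snp_lrr.pos_toy[[1]])
```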
(1) CNV inference result, containing: the CNV inference, start and end position for each inference, conditional probability for each inference, mu for the normal mixture, sigma for the normal mixture, probability of CNVs, and Z score for each inference.
(2) Exact copy number for each CNV inference, if CN=1.
# icnv call without genotype (just infer deletion, duplication)
projname <- 'icnv.demo.'
icnv_res0 <- iCNV_detection(ngs_plr,snp_lrr, ngs_baf,snp_baf, ngs_plr.pos,snp_lrr.pos, ngs_baf.pos,snp_baf.pos, projname=projname,CN=0,mu=c(-3,0,2),cap=TRUE,visual = 1)
# icnv call with genotype inference and complete plot
projname <- 'icnv.demo.geno.'
icnv_res1 <- iCNV_detection(ngs_plr,snp_lrr, ngs_baf,snp_baf, ngs_plr.pos,snp_lrr.pos, ngs_baf.pos,snp_baf.pos, projname=projname,CN=1,mu=c(-3,0,2),cap=TRUE,visual = 2)
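To give a feel for what the mu argument represents, the sketch below assigns intensities to whichever of three normal components (deletion, diploid, duplication) has the highest density. It is not the iCNV algorithm, which this page describes as EM-based inference with HMM calls; the helper classify_intensity and the unit standard deviations are assumptions made for illustration.

```r
# Hypothetical helper, not the iCNV method: pick the mixture component
# with the highest normal density for each intensity value.
classify_intensity <- function(x, mu = c(-3, 0, 2), sigma = c(1, 1, 1)) {
  states <- c("deletion", "diploid", "duplication")
  dens <- sapply(seq_along(mu), function(k) dnorm(x, mean = mu[k], sd = sigma[k]))
  dens <- matrix(dens, nrow = length(x))  # rows: observations, columns: states
  states[max.col(dens, ties.method = "first")]
}

classify_intensity(c(-2.8, 0.1, 1.7, -1.4))
```

With equal standard deviations this reduces to choosing the nearest mean, so a value of -1.4 is labelled diploid rather than deletion.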
The output can be added as custom tracks on the Genome Browser. Remember to choose the human assembly that matches your input data. The CNVs are color coded to be consistent with IGV. To show the colors, click 'User Track' after submission and edit the config to 'visibility=2 itemRgb="On"'. See the GitHub page for more examples of the color scheme.
icnv_output_to_gb(chr = numeric(), icnv.output)
chr |
CNV chromosome. Type integer. |
icnv.output |
Output from the output_list() function |
matrix for Genome browser
icnv.output <- output_list(icnv_res=icnv_res0,sampleid=sampname_qc, CN=0, min_size=10000)
gb_input <- icnv_output_to_gb(chr=22,icnv.output)
write.table(gb_input,file='icnv_res_gb_chr22.tab',quote=FALSE,col.names=FALSE,row.names=FALSE)
iCNV calling result of all the samples
icnv_res
A list containing the calling result of CNVs:
HMM call result without Copy number
exact copy number
iCNV calling result
10 samples BAF value extracted from VCF files, location stored at ngs_baf.pos
ngs_baf
A list of ten, in which each entry holds the BAF values for one individual
BAF value
46 samples BAF chromosome. Pre-computed using whole exome sequencing data of 46 HapMap samples.
ngs_baf.chr
A list of 46, in which each entry holds the chromosome of each BAF position for one individual
BAF chromosome
46 samples BAF ids. Pre-computed using whole exome sequencing data of 46 HapMap samples.
ngs_baf.id
A list of 46, in which each entry holds the variant IDs of the BAF positions for one individual
BAF variants id
46 samples BAF names.
ngs_baf.nm
A list of 46, in which each entry is a sample name
BAF variants sample names
10 samples BAF position extracted from VCF files, value stored at ngs_baf
ngs_baf.pos
A list of ten, in which each entry holds the BAF positions for one individual
BAF position
10 samples PLR value from BAM calculated by CODEX, exon position stored at ngs_plr.pos
ngs_plr
A list of ten, in which each entry holds the PLR values for one individual, calculated by CODEX
PLR value
10 samples exon position extracted from BED files, value stored at ngs_plr
ngs_plr.pos
A list of ten, in which each entry holds the exon positions for one individual
Exon position
Pre-stored normObj data for demonstration purposes.
normObj
Pre-computed using whole exome sequencing data of 46 HapMap samples.
normObj demo data (list) pre-computed.
Zilu Zhou [email protected]
Yhat <- normObjDemo$Yhat
AIC <- normObjDemo$AIC
BIC <- normObjDemo$BIC
RSS <- normObjDemo$RSS
K <- normObjDemo$K
Generate human-readable output from the result calculated by the iCNV_detection function
output_list(icnv_res, sampleid = NULL, CN = 0, min_size = 0)
icnv_res |
CNV inference result. Output from iCNV_detection() |
sampleid |
The names of the samples, in the same order as the input |
CN |
An indicator variable with value 0 or 1 for whether exact copy number was inferred in iCNV_detection. 0: no exact CN; 1: exact CN. Type integer. Default 0. |
min_size |
An integer indicating the minimum length of the CNVs you are interested in. This can remove very short CNVs caused by noise. Type integer. Default 0. Recommended value 1000. |
output CNV list of each individual
icnv.output <- output_list(icnv_res=icnv_res0,sampleid=sampname_qc, CN=0)
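The effect of min_size can be pictured as a simple length filter on the called segments. The sketch below uses a made-up data frame; its column names, the helper filter_by_size and the definition of length as end minus start are all assumptions, since the internal structure of the output_list result is not shown on this page.

```r
# Made-up CNV calls for one individual; column names are assumptions
calls <- data.frame(
  state = c("del", "dup", "del"),
  start = c(1000, 50000, 200000),
  end   = c(1400, 75000, 212000)
)

# Hypothetical helper: keep calls whose length (end - start) is at least min_size
filter_by_size <- function(calls, min_size = 0) {
  calls[calls$end - calls$start >= min_size, , drop = FALSE]
}

kept <- filter_by_size(calls, min_size = 10000)
kept
```

Here the 400 bp deletion is dropped as likely noise, while the 25 kb duplication and the 12 kb deletion are kept.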
Plot intensities for quality checking during intermediate steps
plot_intensity(intensity, chr = numeric())
intensity |
The intensity object to plot: ngs_plr generated by CODEX, or snp_lrr from the SNP array. |
chr |
Specify the chromosome you want to plot. Must be an integer from 1 to 22. Type integer |
void
chr <- 22
plot_intensity(ngs_plr,chr)
plot_intensity(snp_lrr,chr)
Plot the CNV inference score. Each row is a sample; each column is a SNP, an exon (WES) or a bin (WGS). Red indicates a score favoring duplication, whereas blue favors deletion.
plotHMMscore(icnv_res, h = NULL, t = NULL, title = "score plot", output = NULL, col = "")
icnv_res |
CNV inference result. The output from iCNV_detection() |
h |
Start position of this plot. Defaults to the start of the whole chromosome |
t |
End position of this plot. Defaults to the end of the whole chromosome |
title |
Title of this plot. Type character. Default "score plot" |
output |
Output generated by the output_list() function. If it is not NULL, only CNVs in the output will be highlighted. Default NULL |
col |
Specify whether to plot in the DGV color scheme ('DGV': red for deletion, blue for duplication and grey for diploid) or the default color scheme (blue for deletion, red for duplication and green for diploid). Type character. Default "" |
void
plotHMMscore(icnv_res0,h=21000000, t=22000000, title='my favorite subject')
plotHMMscore(icnv_res0,h=21000000, t=22000000, title='my favorite subject',col='DGV')
Plot the relationship between platforms and features for each individual. Only works for multi-platform inference.
plotindi(ngs_plr, snp_lrr, ngs_baf, snp_baf, ngs_plr.pos, snp_lrr.pos, ngs_baf.pos, snp_baf.pos, icnvres, I = numeric(), h = NULL, t = NULL)
ngs_plr |
A list of NGS intensity data. Each entry is an individual. If no NGS data, no need to specify. |
snp_lrr |
A list of SNP array intensity data. Each entry is an individual. If no SNP array data, no need to specify. |
ngs_baf |
A list of NGS BAF data. Each entry is an individual. If no NGS data, no need to specify. |
snp_baf |
A list of SNP array BAF data. Each entry is an individual. If no SNP array data, no need to specify. |
ngs_plr.pos |
A list of NGS intensity position data. Each entry is an individual with dimension = (# of bins or exons, 2 (start and end position)). If no NGS data, no need to specify. |
snp_lrr.pos |
A list of SNP array intensity position data. Each entry is an individual with length = # of SNPs. If no SNP array data, no need to specify. |
ngs_baf.pos |
A list of NGS BAF position data. Each entry is an individual with length = # of BAFs. If no NGS data, no need to specify. |
snp_baf.pos |
A list of SNP array BAF position data. Each entry is an individual with length = # of BAFs. If no SNP array data, no need to specify. |
icnvres |
CNV inference result. The output from iCNV_detection() |
I |
The index of the individual to plot. Type integer. |
h |
Start position of this plot. Defaults to the start of the whole chromosome |
t |
End position of this plot. Defaults to the end of the whole chromosome |
void
plotindi(ngs_plr,snp_lrr,ngs_baf,snp_baf, ngs_plr.pos,snp_lrr.pos,ngs_baf.pos,snp_baf.pos, icnv_res0,I=1)
name of project
projname
string
name of project
Pre-stored qcObj data for demonstration purposes.
qcObj
Pre-computed using whole exome sequencing data of 46 HapMap samples.
qcObj demo data (list) pre-computed.
Zilu Zhou [email protected]
Y_qc <- qcObj$Y_qc
sampname_qc <- qcObj$sampname_qc
gc_qc <- qcObj$gc_qc
mapp_qc <- qcObj$mapp_qc
ref_qc <- qcObj$ref_qc
46 samples BAM names.
sampname
A vector of 46, in which each entry is a sample name
CODEX sample names
QCed sample name
sampname_qc
string
name of samples after QC
10 samples BAF value extracted from standard format files, location stored at snp_baf.pos
snp_baf
A list of ten, in which each entry holds the BAF values for one individual
BAF value
10 samples BAF position extracted from standard format, value stored at snp_baf
snp_baf.pos
A list of ten, in which each entry holds the BAF positions for one individual
BAF position
10 samples LRR value from standard format, SNP position stored at snp_lrr.pos
snp_lrr
A list of ten, in which each entry holds the LRR values for one individual
LRR value
10 samples SNP position extracted from standard format, value stored at snp_lrr
snp_lrr.pos
A list of ten, in which each entry holds the SNP positions for one individual
SNP position