Title: | A suite for analysis of rare genomic variants in whole genome sequencing data |
---|---|
Description: | The second version of the RareVariantVis package aims to provide comprehensive information about rare variants in your genome data. It annotates, filters and presents genomic variants (especially rare ones) globally, chromosome by chromosome. For discovered rare variants, CRISPR guide RNAs are designed so that the user can plan further functional studies. Large structural variants, including copy number variants, are also supported. The package accepts variants directly from a variant caller, for example GATK or Speedseq. The output consists of lists of variants together with corresponding visualizations. Variants are visualized in two ways: a standard mode that outputs png figures and an interactive mode that uses the JavaScript d3 package. The interactive visualization allows point-and-click analysis of trio/family data, for example in the search for causative variants in rare Mendelian diseases. The package includes a homozygous region caller and can analyse a whole human genome in less than 30 minutes on a desktop computer. RareVariantVis disclosed novel causes of several rare monogenic disorders, including one with a non-coding causative variant: keratolytic winter erythema. |
Authors: | Adam Gudys and Tomasz Stokowy |
Maintainer: | Tomasz Stokowy <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.35.0 |
Built: | 2024-12-21 06:15:28 UTC |
Source: | https://github.com/bioc/RareVariantVis |
Function calls homozygous regions from whole genome sequencing data.
callHomozygous(sample, chromosomes, caller = "speedseq", MA_Window = 1000, HMZ_length = 100000, min_n_HMZ = 20)
sample | A name of SNV sample file to be analyzed. |
chromosomes | A vector of strings indicating chromosomes to be analyzed. |
caller | A string indicating the vcf caller. Default is "speedseq"; "GATK" is also supported. |
MA_Window | A number indicating the window size for the moving average function. The recommended value is 2000 for a genome and 20 for an exome. Default is 1000. |
HMZ_length | Minimal length of a homozygous region to be called. Default is 100000. |
min_n_HMZ | Minimal number of variants necessary to call a region. Default is 20. |
comp1 | The function calls homozygous regions from whole genome sequencing data and returns them in a tab-separated txt file. |
Tomasz Stokowy
# sample = system.file("extdata", "CoriellIndex_S1_chr19_9-10_S1.vcf.recode.vcf.gz",
#                      package = "RareVariantVis")
# callHomozygous(sample=sample, chromosomes=c("19"))
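The arguments above can be read as a three-step procedure: smooth the alt/depth ratio with a moving average, mark positions where the smoothed ratio is high, and keep stretches that are long enough and contain enough variants. The following is a minimal sketch of that idea in base R, not the package's implementation. The helper `call_hmz_sketch`, its argument names, the 0.75 threshold (borrowed from the chromosomeVis description) and the toy data are all assumptions made for illustration.

```r
# Hypothetical helper, not part of RareVariantVis.
# Requires length(ratio) >= ma_window, otherwise stats::filter() stops.
call_hmz_sketch <- function(pos, ratio, ma_window = 1000,
                            hmz_length = 100000, min_n_hmz = 20,
                            threshold = 0.75) {
  # Centered moving average of the alt/depth ratio (NA at the edges).
  smooth <- as.numeric(
    stats::filter(ratio, rep(1 / ma_window, ma_window), sides = 2)
  )
  # Assumed rule: a position is "high" when the smoothed ratio exceeds the threshold.
  high <- !is.na(smooth) & smooth > threshold

  # Run-length encode to find consecutive stretches of high values.
  runs <- rle(high)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1

  # Keep stretches with enough variants and enough genomic length.
  keep <- runs$values &
    runs$lengths >= min_n_hmz &
    (pos[ends] - pos[starts]) >= hmz_length

  data.frame(start = pos[starts[keep]],
             end = pos[ends[keep]],
             n_variants = runs$lengths[keep])
}

# Toy data: 300 variants spaced 1 kb apart; the middle 100 are homozygous.
pos <- seq(1000, by = 1000, length.out = 300)
ratio <- c(rep(0.5, 100), rep(1, 100), rep(0.5, 100))
regions <- call_hmz_sketch(pos, ratio, ma_window = 5,
                           hmz_length = 50000, min_n_hmz = 20)
print(regions)
```

Raising `hmz_length` or `min_n_hmz` in the sketch makes it report fewer, longer regions, which is the trade-off the corresponding package arguments appear to control.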
Reads files containing single nucleotide variants (SNV) and structural genomic variants (SV): vcf.gz files generated by the speedseq aligner and variant caller. The function outputs png visualization figures. Each figure shows variants (blue dots) at their genomic coordinates (x axis). The ratio of alternative reads to depth (y axis) indicates the type of variant: homozygous alternative (expected ratio 1) or heterozygous (expected ratio 0.5). Green dots represent rare variants that pass the filters: coding/UTR, nonsynonymous variant with dbSNP frequency < 0.01 and ExAC frequency < 0.01. Orange vertical lines mark the position of the centromere. Orange dots depict structural and copy number variants that overlap a coding region and are of relatively good quality (QUAL > 0). The red curve is the moving average of the alternative reads/depth ratio. High values of this curve (exceeding 0.75) can suggest potential homozygous/deleterious regions. In addition, files containing tables of rare SNV and SV variants only are generated. The tables include the variants that passed the filters specified above, with annotations (Uniprot, RefSeq and others). The function analyzes a whole genome in about 30 minutes on a desktop computer.
chromosomeVis(sample, sv_sample, dbSNP_file, Exac_file, chromosomes, pngWidth, pngHeight, caller, MA_Window, coding_regions_file, annotation_file, uniprot_file)
sample | A name of SNV sample file to be analyzed. |
sv_sample | A name of an additional SV sample file. If not specified, structural variants are discarded. |
dbSNP_file | A file with the SNP database. If not specified, chromosome 19 dbSNP is used. |
Exac_file | ExAC database file. If not specified, chromosome 19 ExAC is used. |
chromosomes | A vector of strings indicating chromosomes to be analyzed. |
pngWidth | A number indicating the pixel width of output png files. Default is 1600. |
pngHeight | A number indicating the pixel height of output png files. Default is 1200. |
caller | A string indicating the vcf caller. Default is "speedseq"; "GATK" is also supported. |
MA_Window | A number indicating the window size for the moving average function. The recommended value is 2000 for a genome and 20 for an exome. Default is 1000. |
coding_regions_file | A bed file indicating coding regions. |
annotation_file | Text file indicating positions of the genes (from UCSC). |
uniprot_file | Text file indicating gene functions and related diseases (from Uniprot). |
comp1 | The function plots a static visualization of genomic variants on all chromosomes, annotates and filters them, and reports the output variants in tables. |
Adam Gudys and Tomasz Stokowy
# analyze chromosome 19 from example genome
sample = system.file("extdata", "CoriellIndex_S1_chr19_9-10_S1.vcf.recode.vcf.gz", package = "RareVariantVis")
sv_sample = system.file("extdata", "CoriellIndex_S1.sv.vcf.gz", package = "RareVariantVis")
chromosomeVis(sample=sample, sv_sample=sv_sample, chromosomes=c("19"))

# without sv data
# sample = system.file("extdata", "CoriellIndex_S1_chr19_9-10_S1.vcf.recode.vcf.gz",
#                      package = "RareVariantVis")
# chromosomeVis(sample=sample, chromosomes=c("19"))

# analyze entire genome (use external full-genome dbSNP and ExAC)
# it takes approximately 30 mins on a desktop computer
# large example data and all necessary hg19 references can be downloaded from:
# https://github.com/agudys/DataRareVariantVis
# dbSNP_file = "All_20160601.vcf.gz"
# Exac_file = "ExAC.r0.3.1.sites.vep.vcf.gz"
# chromosomeVis(sample=sample, sv_sample=sv_sample,
#     dbSNP_file=dbSNP_file, Exac_file=Exac_file,
#     chromosomes=c(as.character(1:22), "X", "Y"), MA_Window = 2000,
#     coding_regions_file = "nexterarapidcapture_exome_targetedregions_v1.2.bed",
#     annotation_file = "UCSC_hg19_refSeq_160702.txt",
#     uniprot_file = "uniprot-all.txt")
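To make the plotted quantities concrete, the sketch below computes the alt/depth ratio and applies the frequency part of the rare-variant filter (dbSNP and ExAC both below 0.01) to a small made-up table. It is an illustration only: the data frame, its column names and the 0.75 cut-off used to label single variants are assumptions, and the coding/UTR and nonsynonymous checks are left out.

```r
# Hypothetical variant table; column names and values are illustrative only.
variants <- data.frame(
  pos        = c(9100000, 9200000, 9300000, 9400000),
  alt_reads  = c(30, 14, 29, 16),
  depth      = c(30, 30, 30, 31),
  dbsnp_freq = c(0.002, 0.200, 0.050, 0.001),
  exac_freq  = c(0.001, 0.150, 0.004, 0.003)
)

# Ratio of alternative reads to depth (the y axis of the figure).
variants$ratio <- variants$alt_reads / variants$depth

# Rough zygosity label; the 0.75 cut-off is an assumption for this sketch.
variants$zygosity <- ifelse(variants$ratio > 0.75,
                            "homozygous_alt", "heterozygous")

# Frequency part of the rare-variant filter: both databases below 0.01.
rare <- variants[variants$dbsnp_freq < 0.01 & variants$exac_freq < 0.01, ]
print(rare[, c("pos", "ratio", "zygosity")])
```

In this toy table the third variant is dropped because its dbSNP frequency is 0.05, even though its ExAC frequency is below the limit; both conditions have to hold.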
The function checks whether a guideRNA overlapping a given SNP can be found. It returns the sequence of the guideRNA with the variant marked in lowercase letters. When multiple guideRNAs are possible for a given SNP, the guideRNA with the variant closest to the PAM site is selected.
df | A data frame, preferably the output of chromosomeVis. |
genome | A BSgenome object, by default BSgenome.Hsapiens.UCSC.hg19. |
gsize | Preferred size of the guideRNA; by default the standard 23 is used. |
PAM | Preferred Protospacer Adjacent Motif (PAM), a short motif that has to be found on the 5' end of the guideRNA; by default the Cas9 motif "GG" is used. |
PAM_rev | Preferred Protospacer Adjacent Motif (PAM), a short motif that has to be found on the reverse strand; by default the Cas9 motif "CC" is used. This is checked only when no guideRNA is found on the forward strand. |
character vector | Vector of guideRNAs. When no guideRNA is found on the forward strand, the reverse strand is checked; when no guideRNA is found at all, NA is returned. |
Kornel Labun
file <- system.file("extdata", "RareVariants_CoriellIndex_S1.txt", package = "RareVariantVis")
df <- read.delim(file, stringsAsFactors = FALSE)
getCrisprGuides(df)
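The selection rule described for this function (a window that covers the SNP, ends in the PAM, and has the variant as close to the PAM as possible) can be sketched on a plain character string. This is not the package's code: `guide_sketch` is a hypothetical helper, it looks at the forward strand only, and it assumes the PAM closes the window at its 3' end, which is the usual Cas9 layout but may not match how the package positions it.

```r
# Hypothetical helper, not the package's getCrisprGuides().
# Assumption: the PAM occupies the last nchar(pam) bases of the window.
guide_sketch <- function(dna, snp, gsize = 23, pam = "GG") {
  dna <- toupper(dna)
  if (nchar(dna) < gsize) return(NA_character_)

  # Every window of length gsize that covers the variant position.
  starts <- max(1, snp - gsize + 1):min(snp, nchar(dna) - gsize + 1)
  ends <- starts + gsize - 1
  pam_start <- ends - nchar(pam) + 1

  # Keep windows that end in the PAM and hold the variant before the PAM.
  ok <- substring(dna, pam_start, ends) == pam & snp < pam_start
  if (!any(ok)) return(NA_character_)

  # Variant closest to the PAM = smallest gap between variant and PAM start.
  best <- which(ok)[which.min(pam_start[ok] - snp)]
  guide <- substring(dna, starts[best], ends[best])

  # Mark the variant base in lowercase.
  offset <- snp - starts[best] + 1
  substr(guide, offset, offset) <- tolower(substr(guide, offset, offset))
  guide
}

# Toy sequence (30 bases) with a single GG at positions 22-23.
dna <- "ACGTACGTACGTACGTACGTAGGCTATACA"
guide <- guide_sketch(dna, snp = 18)
print(guide)

# A sequence without any GG yields NA.
print(guide_sketch("ACGTACGTACGTACGTACGTACGTACGTAC", snp = 10))
```

A reverse-strand fallback, as described for `PAM_rev`, would repeat the same search for the complementary motif when the forward search returns NA; it is omitted here to keep the sketch short.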
Function calculates moving average from a vector of numeric values.
movingAverage(x, n, centered)
x | A vector of numeric values for which the moving average is computed. |
n | Numeric value giving the frame length for the moving average. |
centered | Logical value indicating whether the moving average should be centered (default = FALSE). |
comp1 | The function returns a vector of moving average values. |
Winston Chang
movingAverage(1:20, n=3, centered=FALSE)
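For readers who want to see what a non-centered (trailing) moving average computes, here is a minimal base-R sketch using `stats::filter`. The helper `trailing_mean` is hypothetical and is not the package's movingAverage(); in particular, it returns NA for the first n - 1 positions, and the package function may treat the edges and missing values differently.

```r
# Hypothetical helper; not the package's movingAverage().
trailing_mean <- function(x, n) {
  # sides = 1 averages the current value and the n - 1 values before it.
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

res <- trailing_mean(1:20, n = 3)
print(res)
```

With `sides = 2` the same call would give a centered window instead, which corresponds to the idea behind the `centered` argument.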
Reads files containing tables of rare variants from one chromosome and provides a multiple-sample visualization. Input files can be obtained from the function chromosomeVis. The function outputs an html visualization figure that shows each sample in its own subfigure. Subfigures show variants (dots) at their genomic coordinates (x axis). The ratio of alternative reads to depth (y axis) indicates the type of variant: homozygous alternative (expected ratio 1) or heterozygous (expected ratio 0.5). To zoom in, mark the region of interest with the left mouse button. A right click zooms out and returns to the original plot. Pointing at a variant shows basic information about it: gene name and position on the chromosome.
multipleVis(inputFiles, outputFile, sampleNames, chromosome)
inputFiles | Vector of strings containing input file names. |
outputFile | Output file name (string). |
sampleNames | Vector of sample names (strings). |
chromosome | Name of the chromosome to be analyzed (string). |
comp1 | function returns html visualization file for specified samples |
Adam Gudys and Tomasz Stokowy
file1 = system.file("extdata", "RareVariants_CoriellIndex_S1.txt", package = "RareVariantVis")
file2 = system.file("extdata", "RareVariants_Coriell_S2.txt", package = "RareVariantVis")
inputFiles = c(file1, file2)
sampleNames = c("CoriellIndex_S1", "Coriell_S2")
multipleVis(inputFiles, "CorielSamples.html", sampleNames, "19")
Reads a file containing a table of rare variants from one chromosome and provides a visualization. The input file can be obtained from the function chromosomeVis. The function writes an html visualization figure to the current working directory. The figure shows variants (dots) at their genomic coordinates (x axis). The ratio of alternative reads to depth (y axis) indicates the type of variant: homozygous alternative (expected ratio 1) or heterozygous (expected ratio 0.5). To zoom in, mark the region of interest with the left mouse button. A right click zooms out and returns to the original plot. Pointing at a variant shows basic information about it: gene name and position on the chromosome.
rareVariantVis(input, outputFile, sample, chromosomes, append)
input | Name of the input file (string) containing the table of rare variants generated by chromosomeVis. It can also be a variable holding the table itself. |
outputFile | Name of the output file (string) with the visualisation. |
sample | Name of the sample (used only for titling the charts). |
chromosomes | Vector of chromosome names (strings) to be included in the visualisation (all chromosomes by default). Chromosomes included in the parameter but absent from the table are omitted. |
append | Logical value indicating whether charts should be appended to the output file without overwriting it (FALSE by default). |
comp1 | The function returns an html file with a visualization of rare variants. |
Adam Gudys and Tomasz Stokowy
file = system.file("extdata", "RareVariants_CoriellIndex_S1.txt", package = "RareVariantVis")
rareVariantVis(input=file, "RareVariants_CoriellIndex_S1.html", "CorielIndex")