Title: | Integrative analysis of structural variations |
---|---|
Description: | This package provides efficient tools to read and integrate structural variations predicted by popular softwares. Annotation and visulation of structural variations are also implemented in the package. |
Authors: | Wen Yao <[email protected]> |
Maintainer: | Wen Yao <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.47.0 |
Built: | 2024-11-29 09:19:42 UTC |
Source: | https://github.com/bioc/intansv |
Integrate predictions of different tools to provide more reliable structural variations.
methodsMerge(..., others=NULL, overLapPerDel = 0.8, overLapPerDup = 0.8, overLapPerInv = 0.8, numMethodsSupDel = 2, numMethodsSupDup = 2, numMethodsSupInv = 2)
methodsMerge(..., others=NULL, overLapPerDel = 0.8, overLapPerDup = 0.8, overLapPerInv = 0.8, numMethodsSupDel = 2, numMethodsSupDup = 2, numMethodsSupInv = 2)
... |
results of different SVs predictions read in to R by intansv. |
others |
a data frame of structural variations predicted by other tools. |
overLapPerDel |
Deletions predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together |
overLapPerDup |
Duplications predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together |
overLapPerInv |
Inversions predicted by different methods that have reciprocal coordinate overlap larger than this threshold would be clustered together |
numMethodsSupDel |
Deletion clusters supportted by no more than this threshold of read support would be discarded |
numMethodsSupDup |
Duplication clusters supportted by no more than this threshold of read support would be discarded |
numMethodsSupInv |
Inversion clusters supportted by no more than this threshold of read support would be discarded |
A structural variation (deletion, duplication, inversion et al.) may be reported by different tools. However, the boundaries of this structural variation predicted by different tools don't always agree with each other. Predictions of different methods with reciprocal overlap more than 80 percent were merged. Structural varions supported by only one method were discarded.
A list with the following components:
del |
the integrated deletions of different methods. |
dup |
the integrated duplications of different methods. |
inv |
the integrated inversions of different methods. |
Wen Yao
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer) cnvnator <- readCnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator) svseq <- readSvseq(system.file("extdata/svseq2",package="intansv")) str(svseq) delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) pindel <- readPindel(system.file("extdata/pindel",package="intansv")) str(pindel) sv_all_methods <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq) str(sv_all_methods) sv_all_methods.1 <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq, overLapPerDel=0.7) str(sv_all_methods.1) sv_all_methods.2 <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq, overLapPerDel=0.8, numMethodsSupDel=3) str(sv_all_methods.2)
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer) cnvnator <- readCnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator) svseq <- readSvseq(system.file("extdata/svseq2",package="intansv")) str(svseq) delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) pindel <- readPindel(system.file("extdata/pindel",package="intansv")) str(pindel) sv_all_methods <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq) str(sv_all_methods) sv_all_methods.1 <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq, overLapPerDel=0.7) str(sv_all_methods.1) sv_all_methods.2 <- methodsMerge(breakdancer,pindel,cnvnator,delly,svseq, overLapPerDel=0.8, numMethodsSupDel=3) str(sv_all_methods.2)
Display the chromosome distribution of structural variations by splitting the chromosomes into windows of specific size and counting the number of structural variations in each window.
plotChromosome(genome, structuralVariation, windowSize=1000000)
plotChromosome(genome, structuralVariation, windowSize=1000000)
genome |
A data frame with ID and length of all Chromosomes. |
structuralVariation |
A list of structural variations. |
windowSize |
A specific size (in base pair) to split chromosomes into windows. |
To visualize the distribution of structural variations in the whole genome, chromosomes were splitted into windows of specific size (default 1 Mb) and the number of structural variations in each window were counted. The number of structural variations were shown using circular barplot.
A circular plot with five layers:
the circular view of genome ideogram.
the chromosome coordinates labels.
the circular barplot of number of deletions in each chromosome window.
the circular barplot of number of duplications in each chromosome window.
the circular barplot of number of inversions in each chromosome window.
Wen Yao
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) genome.file.path <- system.file("extdata/chr05_chr10.genome.txt", package="intansv") genome <- read.table(genome.file.path, head=TRUE, as.is=TRUE) str(genome) plotChromosome(genome,delly,1000000)
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) genome.file.path <- system.file("extdata/chr05_chr10.genome.txt", package="intansv") genome <- read.table(genome.file.path, head=TRUE, as.is=TRUE) str(genome) plotChromosome(genome,delly,1000000)
Display the structural variations in a specific genomic region in circular view.
plotRegion(structuralVariation, genomeAnnotation, regionChromosome, regionStart, regionEnd)
plotRegion(structuralVariation, genomeAnnotation, regionChromosome, regionStart, regionEnd)
structuralVariation |
A list of structural variations. |
genomeAnnotation |
A data frame of genome annotations. |
regionChromosome |
The chromosome identifier of a specific region to view. |
regionStart |
The start coordinate of a specific region to view. |
regionEnd |
The end coordinate of a specific region to view. |
Different SVs were shown as rectangles in different layers. See the package vignette and the example dataset for more details.
A circular plot of all the structural variations and genes in a specific region with four layers:
The composition of genes of a specific genomic region.
The composition of deletions of a specific genomic region.
The composition of duplications of a specific genomic region.
The composition of inversions of a specific genomic region.
Wen Yao
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) anno.file.path <- system.file("extdata/chr05_chr10.anno.txt", package="intansv") msu_gff_v7 <- read.table(anno.file.path, head=TRUE, as.is=TRUE) str(msu_gff_v7) plotRegion(delly,msu_gff_v7,"chr05",1,200000)
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly) anno.file.path <- system.file("extdata/chr05_chr10.anno.txt", package="intansv") msu_gff_v7 <- read.table(anno.file.path, head=TRUE, as.is=TRUE) str(msu_gff_v7) plotRegion(delly,msu_gff_v7,"chr05",1,200000)
Reading in the structural variations predicted by breakDancer, filtering low quality predictions and merging overlapping predictions.
readBreakDancer(file="", scoreCutoff=60, readsSupport=3, regSizeLowerCutoff=100, regSizeUpperCutoff=10000000, method="BreakDancer", ...)
readBreakDancer(file="", scoreCutoff=60, readsSupport=3, regSizeLowerCutoff=100, regSizeUpperCutoff=10000000, method="BreakDancer", ...)
file |
the output file of breakDancer. |
scoreCutoff |
the minimum score for a structural variation to be read in. |
readsSupport |
the minimum read pair support for a structural variation to be read in. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read in. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read in. |
method |
a tag to assign to the result of this function. |
... |
parameters passed to read.table. |
The predicted SVs could be further filtered by score, number of read pairs supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details.
A list with the following components:
del |
the deletions predicted by breakDancer. |
inv |
the inversions predicted by breakDancer. |
Wen Yao
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer)
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer)
Reading the structural variations predicted by CNVnator, filtering low quality predictions and merging overlapping predictions.
readCnvnator(dataDir=".", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, method="CNVnator")
readCnvnator(dataDir=".", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, method="CNVnator")
dataDir |
the directory that contain the output files of CNVnator. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
The predicted SVs could be further filtered by the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "dataDir" should only contain the predictions of CNVnator. See the example dataset for more details.
A list with the following components:
del |
the deletions predicted by CNVnator. |
dup |
the duplications predicted by CNVnator. |
Wen Yao
cnvnator <- readCnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator)
cnvnator <- readCnvnator(system.file("extdata/cnvnator",package="intansv")) str(cnvnator)
Reading the structural variations predicted by DELLY, filtering low quality predictions and merging overlapping predictions.
readDelly(file="", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Delly", ...)
readDelly(file="", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Delly", ...)
file |
the file containing the prediction results of DELLY. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
readsSupport |
the minimum read pair support for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
... |
parameters passed to read.table. |
The predicted SVs could be further filtered by the number of read pairs supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details.
A list with the following components:
del |
the deletions predicted by DELLY. |
dup |
the duplications predicted by DELLY. |
inv |
the inversions predicted by DELLY. |
Wen Yao
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly)
delly <- readDelly(system.file("extdata/ZS97.DELLY.vcf",package="intansv")) str(delly)
Reading the structural variations predicted by Lumpy, filtering low quality predictions and merging overlapping predictions.
readLumpy(file="", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Lumpy", ...)
readLumpy(file="", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Lumpy", ...)
file |
the file containing the prediction results of Lumpy. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
readsSupport |
the minimum read pair support for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
... |
parameters passed to read.table. |
The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details.
A list with the following components:
del |
the deletions predicted by Lumpy. |
dup |
the duplications predicted by Lumpy. |
inv |
the inversions predicted by Lumpy. |
Wen Yao
lumpy <- readLumpy(system.file("extdata/ZS97.LUMPY.vcf",package="intansv")) str(lumpy)
lumpy <- readLumpy(system.file("extdata/ZS97.LUMPY.vcf",package="intansv")) str(lumpy)
Reading the structural variations predicted by Pindel, filtering low quality predictions and merging overlapping predictions.
readPindel(dataDir=".", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Pindel")
readPindel(dataDir=".", regSizeLowerCutoff=100, regSizeUpperCutoff=1000000, readsSupport=3, method="Pindel")
dataDir |
the directory containing the prediction results of Pindel. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
readsSupport |
the minimum read pair support for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "dataDir" should only contain the predictions of Pindel. The deletions output files should be named using the suffix "_D", the duplications output files should be named using the suffix "_TD", and the inversions output files should be named using the suffix "_INV". See the example dataset for more details.
A list with the following components:
del |
the deletions predicted by Pindel. |
dup |
the duplications predicted by Pindel. |
inv |
the inversions predicted by Pindel. |
Wen Yao
pindel <- readPindel(system.file("extdata/pindel",package="intansv")) str(pindel)
pindel <- readPindel(system.file("extdata/pindel",package="intansv")) str(pindel)
Reading the structural variations predicted by SoftSearch, filtering low quality predictions and merging overlapping predictions.
readSoftSearch(file="", regSizeLowerCutoff=100, readsSupport=3, method="softSearch", regSizeUpperCutoff=1000000, softClipsSupport=3, ...)
readSoftSearch(file="", regSizeLowerCutoff=100, readsSupport=3, method="softSearch", regSizeUpperCutoff=1000000, softClipsSupport=3, ...)
file |
the file containing the prediction results of SoftSearch. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
readsSupport |
the minimum read pair support for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
softClipsSupport |
the minimum soft clip support for a structural variation to be read. |
... |
parameters passed to read.table |
The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details.
A list with the following components:
del |
the deletions predicted by SoftSearch. |
dup |
the duplications predicted by SoftSearch. |
inv |
the inversions predicted by SoftSearch. |
Wen Yao
softSearch <- readSoftSearch(system.file("extdata/ZS97.softsearch",package="intansv")) str(softSearch)
softSearch <- readSoftSearch(system.file("extdata/ZS97.softsearch",package="intansv")) str(softSearch)
Reading the structural variations predicted by SVseq2, filtering low quality predictions and merging overlapping predictions.
readSvseq(dataDir=".", regSizeLowerCutoff=100, method="SVseq2", regSizeUpperCutoff=1000000, readsSupport=3)
readSvseq(dataDir=".", regSizeLowerCutoff=100, method="SVseq2", regSizeUpperCutoff=1000000, readsSupport=3)
dataDir |
a directory containing the predictions of SVseq2. |
regSizeLowerCutoff |
the minimum size for a structural variation to be read. |
regSizeUpperCutoff |
the maximum size for a structural variation to be read. |
readsSupport |
the minimum read pair support for a structural variation to be read. |
method |
a tag to assign to the result of this function. |
The predicted SVs could be further filtered by the number of reads supporting the occurence of a specific SV, and the predicted size of SVs to get more reliable SVs. See our paper for more details. The directory that specified by the parameter "dataDir" should only contain the predictions of SVseq2. The deletions output files should be named using the suffix ".del". See the example dataset for more details.
A list with the following components:
del |
the deletions predicted by SVseq2. |
Wen Yao
svseq <- readSvseq(system.file("extdata/svseq2",package="intansv")) str(svseq)
svseq <- readSvseq(system.file("extdata/svseq2",package="intansv")) str(svseq)
Annotate the effect caused by structural variations to genes and elements of genes.
svAnnotation(structuralVariation,genomeAnnotation)
svAnnotation(structuralVariation,genomeAnnotation)
structuralVariation |
A data frame of structural variations. |
genomeAnnotation |
A data frame of genome annotations. |
A structural variation (deletion, duplication, inversion et al.) could affect the structure of a specific gene, including deletion of introns/exons, deletion of whole gene, et al.. This function gives the detailed effects caused by structural variations to genes and elements of genes.
The parameter "structuralVariation" should be a data frame with three columns:
chromosome the chromosome of a structural variation.
pos1 the start coordinate of a structural variation.
pos2 the end coordinate of a structural variation.
A data frame with the following columns:
chromosome |
the chromosome of a structural variation. |
pos1 |
the start coordinate of a structural variation. |
pos2 |
the end coordinate of a structural variation. |
size |
the size of a structural variation. |
info |
information on a structural variation. |
tag |
the tag of a genomic element. |
start |
the start coordinate of a genomic element. |
end |
the end coordinate of a genomic element. |
strand |
the strand of a genomic element. |
ID |
the ID of a genomic element. |
Wen Yao
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer) msu_gff_v7 <- read.table(system.file("extdata/chr05_chr10.anno.txt", package="intansv"), head=TRUE, as.is=TRUE, sep="\t") breakdancer.anno <- llply(breakdancer,svAnnotation, genomeAnnotation=msu_gff_v7) str(breakdancer.anno)
breakdancer <- readBreakDancer(system.file("extdata/ZS97.breakdancer.sv", package="intansv")) str(breakdancer) msu_gff_v7 <- read.table(system.file("extdata/chr05_chr10.anno.txt", package="intansv"), head=TRUE, as.is=TRUE, sep="\t") breakdancer.anno <- llply(breakdancer,svAnnotation, genomeAnnotation=msu_gff_v7) str(breakdancer.anno)