Title: | Find SNV/Indel differences between two bam files with near relationship |
---|---|
Description: | This package finds SNV/indel differences between two closely related BAM files by pairwise comparison at each base position across the genomic region of interest. Differences are inferred with Fisher's exact test and Euclidean distance; the inputs are the base counts (A, T, G, C) at a given position and, for indels, the counts of reads that span no less than 2 bp on both sides of the indel region. |
Authors: | Xiaobin Xing, Wu Wei |
Maintainer: | Xiaobin Xing <[email protected]> |
License: | GPL (>=2) |
Version: | 1.37.0 |
Built: | 2024-11-18 06:11:28 UTC |
Source: | https://github.com/bioc/SICtools |
This package finds SNV/indel differences between two closely related BAM files by pairwise comparison at each base position across the genomic region of interest. Differences are inferred with Fisher's exact test and Euclidean distance; the inputs are the base counts (A, T, G, C) at a given position and, for indels, the counts of reads that span no less than 2 bp on both sides of an indel region called by samtools+bcftools.
Package: | SICtools |
Type: | Package |
Version: | 1.0 |
Date: | 2014-07-24 |
License: | GPL (>=2) |
LazyLoad: | Yes |
Xiaobin Xing
Maintainer: Xiaobin Xing <[email protected]>
Tests for differences in indel-read counts at a given indel position between two BAM files. Indel positions are first obtained with samtools+bcftools; the function then counts the reads that span no less than 3 bp beyond the indel boundary. The read-count matrix for a given indel region in the two BAM files is tested with Fisher's exact test and Euclidean distance. If no difference is found, NULL is returned.
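The read-spanning rule described above can be illustrated with a minimal sketch. The helper `spans_indel` and its arguments are hypothetical and not part of SICtools; it assumes 1-based inclusive coordinates and requires the flank on both sides of the indel region. The flank width is a parameter, with a default of 3 taken from this description. How the package itself derives read and indel boundaries may differ.

```r
# Hypothetical helper, not part of SICtools.
# Coordinates are assumed to be 1-based and inclusive.
spans_indel <- function(read_start, read_end, indel_start, indel_end, flank = 3) {
  # A read qualifies if it covers at least `flank` bases before the indel
  # region and at least `flank` bases after it.
  read_start <= indel_start - flank & read_end >= indel_end + flank
}

# Three made-up reads around an indel region at positions 120-122.
reads <- data.frame(start = c(100, 118, 90), end = c(130, 150, 121))
keep <- spans_indel(reads$start, reads$end, indel_start = 120, indel_end = 122)

# Number of reads that would contribute to the indel-read count.
n_spanning <- sum(keep)
```

Only the first read qualifies here: the second starts too close to the indel, and the third ends inside it.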
indelDiff(bam1, bam2, refFsa, regChr, regStart, regEnd, minBaseQuality = 13, minMapQuality = 0, nCores = 1, pValueCutOff = 0.05, gtDistCutOff = 0.1, verbose = TRUE)
bam1 |
the first bam file to be compared |
bam2 |
the second bam file to be compared |
refFsa |
the reference fasta file used for bam1 and bam2 alignments |
regChr |
chromosome name of the region of interest; it must match a chromosome name in the reference |
regStart |
the start position (1-based) of the region of interest |
regEnd |
the end position (1-based) of the region of interest |
minBaseQuality |
the minimum base quality to be used for indel-read count |
minMapQuality |
the minimum read mapping quality to be used for indel-read count |
nCores |
number of threads used for parallel computation |
pValueCutOff |
p.value cutoff from fisher.test for reporting a position. If there is no difference between the two compared positions (p.value = 1 and d.value = 0), NULL is returned even when pValueCutOff = 1. |
gtDistCutOff |
Euclidean distance cutoff from dist(,method='euclidean') for reporting a position. If there is no difference between the two compared positions (p.value = 1 and d.value = 0), NULL is returned even when gtDistCutOff = 0. |
verbose |
print progress on screen, default = TRUE. |
indelDiff
: returns a data.frame with difference information: chromosome, position, reference genotype, the two alt genotypes and their indel-read counts for the two BAM files, p.value (Fisher's exact test on these read counts) and d.value (Euclidean distance between these read counts)
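Because the function returns NULL when nothing differs, downstream code usually needs a guard before filtering. The sketch below is one way to do that. `filter_diff` is a hypothetical helper, the mock data.frame only stands in for real output, and every column name other than `p.value` and `d.value` is an assumption.

```r
# Hypothetical post-processing helper, not part of SICtools.
filter_diff <- function(res, p_max = 0.01, d_min = 0.2) {
  # A NULL result means no difference was reported; return an empty
  # data.frame so callers can always use nrow().
  if (is.null(res)) return(data.frame())
  res[res$p.value <= p_max & res$d.value >= d_min, , drop = FALSE]
}

# Mock result standing in for real output; `chr` and `pos` are assumed names.
mock <- data.frame(
  chr     = "chr07",
  pos     = c(828600, 828700, 828800),
  p.value = c(0.001, 0.2, 0.004),
  d.value = c(0.5, 0.05, 0.1)
)

kept  <- filter_diff(mock)   # rows passing both stricter thresholds
empty <- filter_diff(NULL)   # the "no difference" case
```

Requiring both thresholds at once is a choice made for this sketch, not a statement about how the package combines its own cutoffs.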
Xiaobin Xing, <email:[email protected]>
Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
bam1 <- system.file(package='SICtools','extdata','example1.bam')
bam2 <- system.file(package='SICtools','extdata','example2.bam')
refFsa <- system.file(package='SICtools','extdata','example.ref.fasta')
indelDiff(bam1,bam2,refFsa,'chr07',828514,828914,pValueCutOff=1,gtDistCutOff=0)
Tests for differences in base counts (A, T, G, C) at a given position between two BAM files. The base-count matrix is tested with Fisher's exact test and Euclidean distance. If no difference is found, NULL is returned.
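As a rough illustration of the two statistics named above, the following sketch applies base R's `fisher.test` and `dist` to a pair of made-up base-count vectors. `compare_counts` is a hypothetical helper, not the package's implementation. In particular, computing the distance on per-sample proportions is an assumption made here so that sequencing depth does not dominate; SICtools may scale the counts differently.

```r
# Hypothetical helper, not part of SICtools: compare base counts at one
# position in two BAM files.
compare_counts <- function(counts1, counts2) {
  m <- rbind(bam1 = counts1, bam2 = counts2)

  # Keep only bases seen in at least one sample; fisher.test needs at
  # least a 2 x 2 table, so a single observed base is treated as p = 1.
  observed <- m[, colSums(m) > 0, drop = FALSE]
  p_value <- if (ncol(observed) < 2 || any(rowSums(observed) == 0)) {
    1
  } else {
    fisher.test(observed)$p.value
  }

  # Assumption: distance between per-sample base proportions.
  props <- m / pmax(rowSums(m), 1)
  d_value <- as.numeric(dist(props, method = "euclidean"))

  list(p.value = p_value, d.value = d_value)
}

# Same base in both samples at different depths: no difference.
same <- compare_counts(c(A = 40, C = 0, G = 0, T = 0),
                       c(A = 25, C = 0, G = 0, T = 0))

# Second sample carries G in half of its reads.
diff <- compare_counts(c(A = 40, C = 0, G = 0, T = 0),
                       c(A = 20, C = 0, G = 20, T = 0))
```

In the first case the sketch yields p.value = 1 and d.value = 0, which matches the "no difference" condition described for the cutoff arguments.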
snpDiff(bam1, bam2, refFsa, regChr, regStart, regEnd, minBaseQuality = 13, minMapQuality = 0, nCores = 1, pValueCutOff = 0.05, baseDistCutOff = 0.1, verbose = TRUE)
bam1 |
the first bam file to be compared |
bam2 |
the second bam file to be compared |
refFsa |
the reference fasta file used for bam1 and bam2 alignments |
regChr |
chromosome name of the region of interest; it must match a chromosome name in the reference |
regStart |
the start position (1-based) of the region of interest |
regEnd |
the end position (1-based) of the region of interest |
minBaseQuality |
the minimum base quality to be used for base count |
minMapQuality |
the minimum read mapping quality to be used for base count |
nCores |
number of threads used for parallel computation |
pValueCutOff |
p.value cutoff from fisher.test for reporting a position. If there is no difference between the two compared positions (p.value = 1 and d.value = 0), NULL is returned even when pValueCutOff = 1. |
baseDistCutOff |
Euclidean distance cutoff from dist(,method='euclidean') for reporting a position. If there is no difference between the two compared positions (p.value = 1 and d.value = 0), NULL is returned even when baseDistCutOff = 0. |
verbose |
print progress on screen, default = TRUE. |
snpDiff
: returns a data.frame with difference information: chromosome, position, reference base, base counts (A, C, G, T, N) for the two BAM files, p.value (Fisher's exact test on these base counts) and d.value (Euclidean distance between these base counts)
Xiaobin Xing, <email:[email protected]>
Morgan M, Pages H, Obenchain V and Hayden N. Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import.
bam1 <- system.file(package='SICtools','extdata','example1.bam')
bam2 <- system.file(package='SICtools','extdata','example2.bam')
refFsa <- system.file(package='SICtools','extdata','example.ref.fasta')
snpDiff(bam1,bam2,refFsa,'chr04',962501,1026983,pValueCutOff=1,baseDistCutOff=0)