Title: | Crossover analysis and genetic map construction |
---|---|
Description: | comapr detects crossover intervals for single gametes from their haplotype states sequences and stores the crossovers in GRanges object. The genetic distances can then be calculated via the mapping functions using estimated crossover rates for maker intervals. Visualisation functions for plotting interval-based genetic map or cumulative genetic distances are implemented, which help reveal the variation of crossovers landscapes across the genome and across individuals. |
Authors: | Ruqian Lyu [aut, cre] |
Maintainer: | Ruqian Lyu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.11.1 |
Built: | 2024-12-22 03:37:09 UTC |
Source: | https://github.com/bioc/comapr |
Generating distribution of sample genetic distances
bootstrapDist(co_gr, B = 1000, mapping_fun = "k", group_by)
bootstrapDist(co_gr, B = 1000, mapping_fun = "k", group_by)
co_gr |
GRanges or RangedSummarizedExperiment object that contains the
crossover counts for each marker interval across all samples.
Returned by |
B |
integer the number of sampling times |
mapping_fun |
character default to "k" (kosambi mapping function). It can be one of the mapping functions: "k","h" |
group_by |
the prefix for each group that we need to generate distributions for(only when co_gr is a GRanges object). Or the column name for 'colData(co_gr)' that contains the group factor (only when co_gr is a RangedSummarizedExperiment object) |
It takes the crossover counts for samples in multiple groups that is returned by 'countCO'. It then draws samples from a group with replacement and calculate the distribution of relevant statistics.
lists of numeric genetic distances for multiple samples
Ruqian Lyu
data(coCount) bootsDiff <- bootstrapDist(coCount, group_by = "sampleGroup",B=10)
data(coCount) bootsDiff <- bootstrapDist(coCount, group_by = "sampleGroup",B=10)
Calculate genetic distances of marker intervals or binned-chromosome Given whether crossover happens in each marker interval, calculate the recombination fraction in samples and then derive the Haldane or Kosambi genetic distances via mapping functions
calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,missing,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,numeric,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,missing,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,numeric,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'RangedSummarizedExperiment,missing,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature ## 'RangedSummarizedExperiment,missing,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature ## 'RangedSummarizedExperiment,numeric,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'RangedSummarizedExperiment,numeric,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL )
calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,missing,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,numeric,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,missing,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'GRanges,numeric,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'RangedSummarizedExperiment,missing,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature ## 'RangedSummarizedExperiment,missing,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature ## 'RangedSummarizedExperiment,numeric,ANY,ANY,character' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL ) ## S4 method for signature 'RangedSummarizedExperiment,numeric,ANY,ANY,missing' calGeneticDist( co_count, bin_size = NULL, mapping_fun = "k", ref_genome = "mm10", group_by = NULL, chrom_info = NULL )
co_count |
GRange or RangedSummarizedExperiment object, returned by |
bin_size |
The binning size for grouping marker intervals into bins. If not supplied,the orginial marker intervals are returned with converted genetic distancens based on recombination rate |
mapping_fun |
The mapping function to use, can be one of "k" or "h" (kosambi or haldane) |
ref_genome |
The reference genome name. It is used to fetch the chromosome size information from UCSC database. |
group_by |
character vector contains the unique prefix of sample names that are used for defining different sample groups. Or the column name in colData(co_count) that specify the group factor. If missing all samples are assumed to be from one group |
chrom_info |
A user supplied data.frame containing two columns with column names chrom and size, describing the chromosome names and lengths if not using ref_genome from UCSC. If supplied, the 'ref_genome' is ignored. |
GRanges object GRanges for marker intervals or binned intervals with Haldane or Kosambi centiMorgans
data(coCount) dist_se <- calGeneticDist(coCount) # dist_se <- calGeneticDist(coCount,group_by="sampleGroup")
data(coCount) dist_se <- calGeneticDist(coCount) # dist_se <- calGeneticDist(coCount,group_by="sampleGroup")
RangedSummarizedExperiment object containing the crossover counts across samples for the list of SNP marker intervals
data(coCount)
data(coCount)
An object of class RangedSummarizedExperiment
with 3 rows and 10 columns.
combine two 'RangedSummarizedExperiment' objects, each contains the haplotype state for a list of SNPs across a set of cells. The combined result will have cells from two individuals and merged list of SNPs from the two.
combineHapState(rse1, rse2, groupName = c("Sample1", "Sample2"))
combineHapState(rse1, rse2, groupName = c("Sample1", "Sample2"))
rse1 |
the first 'RangedSummarizedExperiment' |
rse2 |
the second 'RangedSummarizedExperiment' |
groupName |
a character vector of length 2 that contains the first and the second group's names |
A 'RangedSummarizedExperiment' that contains the cells and SNPs in both 'rse'
Ruqian Lyu
BiocParallel::register(BiocParallel::SnowParam(workers = 1)) demo_path <- paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 1) s2_rse_state <- readHapState("s2",chroms=c("chr1"), path=demo_path, barcodeFile=paste0(demo_path,"s2_barcodes.txt"), minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 1) sb <- combineHapState(s1_rse_state,s2_rse_state)
BiocParallel::register(BiocParallel::SnowParam(workers = 1)) demo_path <- paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 1) s2_rse_state <- readHapState("s2",chroms=c("chr1"), path=demo_path, barcodeFile=paste0(demo_path,"s2_barcodes.txt"), minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 1) sb <- combineHapState(s1_rse_state,s2_rse_state)
function for formatting and correction genotypes of markers
correctGT(gt_matrix, ref, alt, failed = "Fail", wrong_label = "Homo_ref")
correctGT(gt_matrix, ref, alt, failed = "Fail", wrong_label = "Homo_ref")
gt_matrix |
the input genotype matrix of markers by samples with rownames as marker IDs and column names as sample IDs |
ref |
a vector of genotypes of the inbred reference strain |
alt |
a vector of genotypes of the inbred alternative strain |
failed |
what was used for encoding failed genotype calling such as "Fail" in example
data |
wrong_label |
what would be considered a wrong genotype label for example Homo_ref which should not be in the possible genotypes of BC1F1 samples |
This function changes genotype in alleles to genotype labels, change Homo_ref to Hets/Fail, infer Failed genotype, and change "Failed" to NA for counting crossover later
This function changes genotype in alleles to labels by calling internal
functions lable_gt
, and changes the wrong genotype Homo_ref to Fail by
calling .change_missing
.
a genotype data.frame of sample genotypes with dimension as the input 'gt_matrix' with genotypes converted to labels and failed calls are changed to NA.
Ruqian Lyu
data(snp_geno_gr) data(parents_geno) snp_gt_crt <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr), ref = parents_geno$ref, alt = parents_geno$alt, fail = "Fail", wrong_label = "Homo_ref")
data(snp_geno_gr) data(parents_geno) snp_gt_crt <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr), ref = parents_geno$ref, alt = parents_geno$alt, fail = "Fail", wrong_label = "Homo_ref")
Bins the chromosome into supplied number of bins and find the state of the chromosome bins across all gamete cells
countBinState(chr, snpAnno, viState, genomeRange, ntile = 5)
countBinState(chr, snpAnno, viState, genomeRange, ntile = 5)
chr |
character, the chromosome to check |
snpAnno |
data.frame, the SNP annotation for the supplied chromosome |
viState |
dgTMatrix/Matrix, the viterbi state matrix, output from 'sgcocaller' |
genomeRange |
GRanges object with seqlengths information for the genome |
ntile |
integer, how many tiles the chromosome is binned into |
This function is used for checking whether chromosome segregation pattern obeys the expected ratio.
a data.frame that contains chromosome bin segregation ratio
Ruqian Lyu
library(IRanges) library(S4Vectors) chrom_info <- GenomeInfoDb::getChromInfoFromUCSC("mm10") seq_length <- chrom_info$size names(seq_length) <- chrom_info$chrom dna_mm10_gr <- GenomicRanges::GRanges( seqnames = Rle(names(seq_length)), ranges = IRanges(1, end = seq_length, names = names(seq_length)), seqlengths = seq_length) GenomeInfoDb::genome(dna_mm10_gr) <- "mm10" demo_path <- system.file("extdata",package = "comapr") sampleName <- "s1" chr <- "chr1" vi_mtx <- Matrix::readMM(file = paste0(demo_path,"/", sampleName, "_", chr, "_vi.mtx")) snpAnno <- read.table(file = paste0(demo_path,"/", sampleName, "_", chr, "_snpAnnot.txt"), stringsAsFactors = FALSE, header = TRUE) countBinState(chr = "chr1",snpAnno = snpAnno, viState = vi_mtx,genomeRange = dna_mm10_gr, ntile = 1)
library(IRanges) library(S4Vectors) chrom_info <- GenomeInfoDb::getChromInfoFromUCSC("mm10") seq_length <- chrom_info$size names(seq_length) <- chrom_info$chrom dna_mm10_gr <- GenomicRanges::GRanges( seqnames = Rle(names(seq_length)), ranges = IRanges(1, end = seq_length, names = names(seq_length)), seqlengths = seq_length) GenomeInfoDb::genome(dna_mm10_gr) <- "mm10" demo_path <- system.file("extdata",package = "comapr") sampleName <- "s1" chr <- "chr1" vi_mtx <- Matrix::readMM(file = paste0(demo_path,"/", sampleName, "_", chr, "_vi.mtx")) snpAnno <- read.table(file = paste0(demo_path,"/", sampleName, "_", chr, "_snpAnnot.txt"), stringsAsFactors = FALSE, header = TRUE) countBinState(chr = "chr1",snpAnno = snpAnno, viState = vi_mtx,genomeRange = dna_mm10_gr, ntile = 1)
Count number of COs within each marker interval COs identified in the interval overlapping missing markers are distributed according to marker interval base-pair sizes. Genotypes encoded with "0" are treated as missing value.
countCOs(geno) ## S4 method for signature 'GRanges' countCOs(geno) ## S4 method for signature 'RangedSummarizedExperiment' countCOs(geno)
countCOs(geno) ## S4 method for signature 'GRanges' countCOs(geno) ## S4 method for signature 'RangedSummarizedExperiment' countCOs(geno)
geno |
GRanges object or RangedSummarizedExperiment object with genotype matrix that has SNP positions in the rows and cells/samples in the columns |
GRanges object or RangedSummarizedExperiment with markers-intervals as rows and samples in columns, values as the number of COs estimated for each marker interval
Ruqian Lyu
data(twoSamples) cocount <- countCOs(twoSamples)
data(twoSamples) cocount <- countCOs(twoSamples)
count how many samples have genotypes calls across markers and count how many markers that each individual has called genotypes for. This function helps identify poor samples or poor markers for filtering. It can also generate plots that help identify outlier samples/markers
countGT(geno, plot = TRUE, interactive = FALSE)
countGT(geno, plot = TRUE, interactive = FALSE)
geno |
the genotype data.frame of markers by samples from output of
function |
plot |
it determines whether a plot will be generated, defaults to TRUE |
interactive |
it determines whether an interactive plot will be generated |
A list of two elements including n_markers
and n_samples
Ruqian Lyu
data(snp_geno_gr) genotype_counts <- countGT(GenomicRanges::mcols(snp_geno_gr))
data(snp_geno_gr) genotype_counts <- countGT(GenomicRanges::mcols(snp_geno_gr))
Filter markers or samples that have too many missing values
filterGT(geno, min_markers = 5, min_samples = 3) ## S4 method for signature 'matrix,numeric,numeric' filterGT(geno, min_markers = 5, min_samples = 3) ## S4 method for signature 'GRanges,numeric,numeric' filterGT(geno, min_markers = 5, min_samples = 3)
filterGT(geno, min_markers = 5, min_samples = 3) ## S4 method for signature 'matrix,numeric,numeric' filterGT(geno, min_markers = 5, min_samples = 3) ## S4 method for signature 'GRanges,numeric,numeric' filterGT(geno, min_markers = 5, min_samples = 3)
geno |
the genotype data.frame of markers by samples from output of
function |
min_markers |
the minimum number of markers for a sample to be kept |
min_samples |
the minimum number of samples for a marker to be kept |
This function takes the geno
data.frame and filter the data.frame by
the provided cut-offs.
The filtered genotype matrix
Ruqian Lyu
data(snp_geno_gr) corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2)
data(snp_geno_gr) corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2)
Find the duplicated samples by look at the number of matching genotypes in all pair-wise samples
findDupSamples(geno, threshold = 0.99, in_text = FALSE)
findDupSamples(geno, threshold = 0.99, in_text = FALSE)
geno |
the genotype data.frame of markers by samples from output of
function |
threshold |
the frequency cut-off of number of matching genotypes out of
all geneotypes for determining whether the pair of samples
are duplicated, defaults to |
in_text |
whether text of frequencies should be displayed in the heatmap cells |
The paris of duplicated samples.
Ruqian Lyu
data(snp_geno) or_geno <- snp_geno[,grep("X",colnames(snp_geno))] rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS) or_geno[,1] <- or_geno[,5] cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J, alt = snp_geno$FVB.NJ..i.) dups <- findDupSamples(cr_geno)
data(snp_geno) or_geno <- snp_geno[,grep("X",colnames(snp_geno))] rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS) or_geno[,1] <- or_geno[,5] cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J, alt = snp_geno$FVB.NJ..i.) dups <- findDupSamples(cr_geno)
Generate the raw alternative allele frequencies tracks for all cells in the columns of provided 'co_count'
getAFTracks( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, co_count, snp_track = NULL )
getAFTracks( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, co_count, snp_track = NULL )
chrom |
the chromosome |
path_loc |
the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "_altCount.mtx" |
sampleName |
the sample name, which is the prefix of sscocaller's output files |
nwindow |
the number of windows for binning the chromosome |
barcodeFile |
the barcode file containing the list of cell barcodes used as the input file for sscocaller |
co_count |
'GRange' or 'RangedSummarizedExperiment' object,
returned by |
snp_track |
the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file. |
a list object, in which each element is a list of two items with the cell's alternative allele frequency DataTrack and the called crossover ranges.
Ruqian Lyu
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getAFTracks(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = file.path(demo_path, "s1_barcodes.txt"), co_count = s1_counts)
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getAFTracks(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = file.path(demo_path, "s1_barcodes.txt"), co_count = s1_counts)
It plots the raw alternative allele frequencies and highlight the crossover regions for the selected cell.
It plots the raw alternative allele frequencies and highlight the crossover regions for the selected cell.
getCellAFTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, co_count, snp_track = NULL, chunk = 1000L ) getCellAFTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, co_count, snp_track = NULL, chunk = 1000L )
getCellAFTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, co_count, snp_track = NULL, chunk = 1000L ) getCellAFTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, co_count, snp_track = NULL, chunk = 1000L )
chrom |
the chromosome |
path_loc |
the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "_altCount.mtx" |
sampleName |
the sample name, which is the prefix of sscocaller's output files |
nwindow |
the number of windows for binning the chromosome |
barcodeFile |
the barcode file containing the list of cell barcodes used as the input file for sscocaller |
cellBarcode |
the selected cell barcode |
co_count |
'GRange' or 'RangedSummarizedExperiment' object,
returned by |
snp_track |
the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file. |
chunk |
An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time. |
The DataTrack object defined in DataTrack
The DataTrack object defined in DataTrack
Ruqian Lyu
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getCellAFTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1", co_count = s1_counts) demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getCellAFTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1", co_count = s1_counts)
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getCellAFTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1", co_count = s1_counts) demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) af_co_tracks <- getCellAFTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1", co_count = s1_counts)
It finds the crossover intervals for a selected cell
getCellCORange(co_count, cellBarcode)
getCellCORange(co_count, cellBarcode)
co_count |
'GRanges' or 'RangedSummarizedExperiment' object, |
cellBarcode |
the selected cell's barcode |
GRange object containing the crossover intervals for the selected cell
Ruqian Lyu
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) co_ranges <- getCellCORange(cellBarcode = "BC1", co_count = s1_counts)
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) co_ranges <- getCellCORange(cellBarcode = "BC1", co_count = s1_counts)
It plots the total allele counts for the selected cell.
getCellDPTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, snp_track = NULL, chunk = 1000L, log = TRUE, plot_type = "hist" )
getCellDPTrack( chrom = "chr1", path_loc = "./output/firstBatch/WC_522/", sampleName = "WC_522", nwindow = 80, barcodeFile, cellBarcode, snp_track = NULL, chunk = 1000L, log = TRUE, plot_type = "hist" )
chrom |
the chromosome |
path_loc |
the path prefix to the output files from sscocaller including "*_totalCount.mtx" |
sampleName |
the sample name, which is the prefix of sscocaller's output files |
nwindow |
the number of windows for binning the chromosome |
barcodeFile |
the barcode file containing the list of cell barcodes used as the input file for sscocaller |
cellBarcode |
the selected cell barcode |
snp_track |
the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file. |
chunk |
A integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time. |
log |
whether the histogram of SNP density should be plotted on log scale (log10) |
plot_type |
the DataTrack plot type, default to be 'hist' |
The DataTrack object defined in DataTrack
Ruqian Lyu
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) dp_co_tracks <- getCellDPTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = file.path(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1")
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_counts <- countCOs(s1_rse_state) dp_co_tracks <- getCellDPTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = file.path(demo_path, "s1_barcodes.txt"), cellBarcode = "BC1")
Marker segregation distortion detection using chisq-test
getDistortedMarkers(geno, p = c(0.5, 0.5), adj.method = "BH")
getDistortedMarkers(geno, p = c(0.5, 0.5), adj.method = "BH")
geno |
the genotype data.frame of markers by samples from output of
function |
p |
the expected geneotype ratio in a numeric vector, defaults to c(0.5,0.5) |
adj.method |
Methods to adjust for multiple comparisons, defaults to "BH" |
We expect the genotypes to appear with the frequencies of 1:1 homo_alt:hets. We usechisq.test for finding markers that have genotypes among samples that are significantly different from the 1:1 ratio and report them
data.frame with each row representing one SNP marker and columns containing the chisq.test results
Ruqian Lyu
data(parents_geno) data(snp_geno_gr) corrected_geno <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr), ref = parents_geno$ref,alt = parents_geno$alt,fail = "Fail", wrong_label = "Homo_ref") GenomicRanges::mcols(snp_geno_gr) <- corrected_geno corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2) pvalues <- getDistortedMarkers(GenomicRanges::mcols(corrected_geno))
data(parents_geno) data(snp_geno_gr) corrected_geno <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr), ref = parents_geno$ref,alt = parents_geno$alt,fail = "Fail", wrong_label = "Homo_ref") GenomicRanges::mcols(snp_geno_gr) <- corrected_geno corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2) pvalues <- getDistortedMarkers(GenomicRanges::mcols(corrected_geno))
Generate the mean DP (Depth) DataTrack (from Gviz) for cells
getMeanDPTrack( chrom = "chr1", path_loc, nwindow = 80, sampleName, barcodeFile, plot_type = "hist", selectedBarcodes = NULL, snp_track = NULL, log = TRUE )
getMeanDPTrack( chrom = "chr1", path_loc, nwindow = 80, sampleName, barcodeFile, plot_type = "hist", selectedBarcodes = NULL, snp_track = NULL, log = TRUE )
chrom |
the chromosome |
path_loc |
the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "_altCount.mtx" |
nwindow |
the number of windows for binning the chromosome |
sampleName |
the sample name, which is the prefix of sscocaller's output files |
barcodeFile |
the barcode file containing the list of cell barcodes used as the input file for sscocaller |
plot_type |
the DataTrack plot type, default to be 'hist' |
selectedBarcodes |
the selected cell barcodes which should be the barcodes that have been called crossovers for. If not supplied then all cells are counted. |
snp_track |
the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file. |
log |
whether the histogram of SNP density should be plotted on log scale (log10) |
DataTrack object plotting the mean DP histogram for windowed chromosomes
Ruqian Lyu
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") meanDP_track <- getMeanDPTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"))
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") meanDP_track <- getMeanDPTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1", barcodeFile = paste0(demo_path, "s1_barcodes.txt"))
Generate the SNP density DataTrack (from 'Gviz') for selected chromosome
getSNPDensityTrack( chrom = "chr1", sampleName = "s1", path_loc = ".", nwindow = 80, plot_type = "hist", log = TRUE )
getSNPDensityTrack( chrom = "chr1", sampleName = "s1", path_loc = ".", nwindow = 80, plot_type = "hist", log = TRUE )
chrom |
the chromosome |
sampleName |
the sample name, which is the prefix of sscocaller's output files |
path_loc |
the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "_altCount.mtx" |
nwindow |
the number of windows for binning the chromosome |
plot_type |
the DataTrack plot type, default to be 'hist' |
log |
whether the histogram of SNP density should be plotted on log scale (log10) |
DataTrack object plotting the SNP density histogram
Ruqian Lyu
demo_path <- system.file("extdata",package = "comapr") snp_track <- getSNPDensityTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1")
demo_path <- system.file("extdata",package = "comapr") snp_track <- getSNPDensityTrack(chrom ="chr1", path_loc = demo_path, sampleName = "s1")
Parents' genotype for F1 samples in 'snp_geno'
data(parents_geno)
data(parents_geno)
A data.frame:
genotype of reference mouse train across markers
genotype of alternative mouse train across markers
A function that parses output ('_viSegInfo.txt' )from 'sgcocaller' https://gitlab.svi.edu.au/biocellgen-public/sgcocaller and generate cell cell (per chr) summary statistics
perCellChrQC( sampleName, chroms = c("chr1", "chr7", "chr15"), path, barcodeFile = NULL, doPlot = TRUE )
perCellChrQC( sampleName, chroms = c("chr1", "chr7", "chr15"), path, barcodeFile = NULL, doPlot = TRUE )
sampleName |
the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample |
chroms |
the character vectors of chromosomes to parse. Multiple chromosomes' results will be concated together. |
path |
the path to the files, with name patterns *chrom_vi.mtx, *chrom_viSegInfo.txt, end with slash |
barcodeFile |
defaults to NULL, it is assumed to be in the same d irectory as the other files and with name sampleName_barcodes.txt |
doPlot |
whether a plot should returned, default to TRUE |
a list object that contains the data.frame with summarised statistics per chr per cell and a plot (if doPlot)
Ruqian Lyu
demo_path <-system.file("extdata",package = "comapr") pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"), path=demo_path, barcodeFile=NULL)
demo_path <-system.file("extdata",package = "comapr") pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"), path=demo_path, barcodeFile=NULL)
Permutation test of two sample groups
permuteDist(co_gr, B = 100, mapping_fun = "k", group_by)
permuteDist(co_gr, B = 100, mapping_fun = "k", group_by)
co_gr |
GRanges or RangedSummarizedExperiment object that contains the
crossover counts for each marker interval across all samples.
Returned by |
B |
integer the number of sampling times |
mapping_fun |
character default to "k" (kosambi mapping function). It can be one of the mapping functions: "k","h" |
group_by |
the prefix for each group that we need to generate distributions for(only when co_gr is a GRanges object). Or the column name for 'colData(co_gr)' that contains the group factor (only when co_gr is a RangedSummarizedExperiment object) |
It shuffles the group labels for the samples and calculate a difference between two groups after shuffling.
A list of three elements. 'permutes' of length B with numeric differences of permuted group differences,'observed_diff' the observed genetic distances of two groups, 'nSample', the number of samples in the first and second group.
Ruqian Lyu
data(coCount) perms <- permuteDist(coCount, group_by = "sampleGroup",B=10)
data(coCount) perms <- permuteDist(coCount, group_by = "sampleGroup",B=10)
Plots the summary statistics of segments that are generated by 'sgcocaller' https://gitlab.svi.edu.au/biocellgen-public/sgcocaller which have been detected by finding consequtive viter states along the list of SNP markers.
perSegChrQC( sampleName, chroms = c("chr1", "chr7", "chr15"), path, barcodeFile = NULL, maxRawCO = 10 )
perSegChrQC( sampleName, chroms = c("chr1", "chr7", "chr15"), path, barcodeFile = NULL, maxRawCO = 10 )
sampleName |
the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample |
chroms |
the vector of chromosomes |
path |
the path to the files, with name patterns *chrom_vi.mtx, *chrom_viSegInfo.txt, end with slash |
barcodeFile |
defaults to NULL, it is assumed to be in the same directory as the other files and with name sampleName_barcodes.txt |
maxRawCO |
if a cell has more than 'maxRawCO' number of raw crossovers called across a chromosome, the cell is filtered out#' |
It provides guidance in filtering out close double crossovers that are not likely biological but due to technical reasons as well as crossovers that are supported by fewer number of SNPs at the ends of the chromosomes.
Histogram plots for statistics summarized across all Viterbi state segments
Ruqian Lyu
demo_path <- system.file("extdata",package = "comapr") s1_rse_qc <- perSegChrQC(sampleName="s1", chroms=c("chr1"), path=demo_path, maxRawCO=10)
demo_path <- system.file("extdata",package = "comapr") s1_rse_qc <- perSegChrQC(sampleName="s1", chroms=c("chr1"), path=demo_path, maxRawCO=10)
Plot the number of COs per sample group or per chromosome
plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,missing,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,missing,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,logical,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,logical,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,logical,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,missing,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,missing,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,logical,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" )
plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,missing,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,missing,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,logical,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'RangedSummarizedExperiment,logical,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,logical,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,missing,missing' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,missing,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" ) ## S4 method for signature 'GRanges,logical,character' plotCount( co_count, by_chr = FALSE, group_by = "sampleGroup", plot_type = "error_bar" )
co_count |
GRange or RangedSummarizedExperiment object, returned by |
by_chr |
whether it should plot each chromosome separately |
group_by |
the column name in 'colData(co_count)' that specify the grouping factor. Or the character vector contains the unique prefix of sample names that are used for defining different sample groups. If missing all samples are assumed to be from one group |
plot_type |
determins what type the plot will be, choose from "error_bar" or "hist". Only relevant when by_chr=TRUE |
ggplot object
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_count <- countCOs(s1_rse_state) plotCount(s1_count)
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") s1_rse_state <- readHapState("s1",chroms=c("chr1"), path=demo_path,barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10, minCellSNP = 0) s1_count <- countCOs(s1_rse_state) plotCount(s1_count)
Plotting the calculated genetic distanced for each bin or marker interval supplied by the GRanges object
plotGeneticDist(gr, bin = TRUE, chr = NULL, cumulative = FALSE)
plotGeneticDist(gr, bin = TRUE, chr = NULL, cumulative = FALSE)
gr |
GRanges object with genetic distances calculated for marker intervals |
bin |
TRUE or FALSE, indicating whether the supplied GRange objecct is for binned interval |
chr |
the specific chrs selected to plot |
cumulative |
TRUE or FALSE, indicating whether it plots the bin-wise genetic distances or the cumulative distances |
ggplot2 plot
Ruqian Lyu
data(coCount) dist_se <- calGeneticDist(coCount) plotGeneticDist(SummarizedExperiment::rowRanges(dist_se))
data(coCount) dist_se <- calGeneticDist(coCount) plotGeneticDist(SummarizedExperiment::rowRanges(dist_se))
Function to plot the genotypes for all samples faceted by genotype
plotGTFreq(geno)
plotGTFreq(geno)
geno |
the genotype data.frame of markers by samples from output of
function |
A ggplot object
Ruqian Lyu
data(snp_geno) or_geno <- snp_geno[,grep("X",colnames(snp_geno))] rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS) or_geno[1,] <- rep("Fail",dim(or_geno)[2]) cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J, alt = snp_geno$FVB.NJ..i.) ft_gt <- filterGT(cr_geno) plotGTFreq(ft_gt)
data(snp_geno) or_geno <- snp_geno[,grep("X",colnames(snp_geno))] rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS) or_geno[1,] <- rep("Fail",dim(or_geno)[2]) cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J, alt = snp_geno$FVB.NJ..i.) ft_gt <- filterGT(cr_geno) plotGTFreq(ft_gt)
This function takes the calculated genetic distances for all marker intervals across all chromosomes provided and plot the cumulative genetic distances
plotWholeGenome(gr)
plotWholeGenome(gr)
gr |
GRanges object with genetic distances calculated for marker intervals |
A ggplot object
data(coCount) dist_se <- calGeneticDist(coCount) plotWholeGenome(SummarizedExperiment::rowRanges(dist_se))
data(coCount) dist_se <- calGeneticDist(coCount) plotWholeGenome(SummarizedExperiment::rowRanges(dist_se))
Modified the 'Matrix::readMM' function for reading matrices stored in the Harwell-Boeing or MatrixMarket formats but only reads selected column.
readColMM(file, which.col, chunk = 1000L)
readColMM(file, which.col, chunk = 1000L)
file |
the name of the file to be read from as a character scalar. Those storing matrices in the MatrixMarket format usually end in ".mtx". |
which.col |
An integer scalar, the column index |
chunk |
An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time. |
See readMM
A sparse matrix object that inherits from the "Matrix" class which the original dimensions.To get the vector of the specified column, one need to subset the matrix to select the column with the same index.
Ruqian Lyu
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") readColMM(file = paste0(demo_path,"s1_chr1_vi.mtx"), which.col=2,chunk=2)
demo_path <-paste0(system.file("extdata",package = "comapr"),"/") readColMM(file = paste0(demo_path,"s1_chr1_vi.mtx"), which.col=2,chunk=2)
A function that parses the viterbi state matrix (in .mtx format), barcode.txt and snpAnno.txt files for each individual.
readHapState( sampleName, chroms = c("chr1"), path, barcodeFile = NULL, minSNP = 30, minlogllRatio = 200, bpDist = 100, maxRawCO = 10, nmad = 1.5, minCellSNP = 200, biasTol = 0.45 )
readHapState( sampleName, chroms = c("chr1"), path, barcodeFile = NULL, minSNP = 30, minlogllRatio = 200, bpDist = 100, maxRawCO = 10, nmad = 1.5, minCellSNP = 200, biasTol = 0.45 )
sampleName |
the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample |
chroms |
the character vectors of chromosomes to parse. Multiple chromosomes' results will be concated together. |
path |
the path to the files, with name patterns *chrom_vi.mtx, *chrom_viSegInfo.txt, end with slash |
barcodeFile |
if NULL, it is assumed to be in the same directory as the other files and with name sampleName_barcodes.txt |
minSNP |
the crossover(s) will be filtered out if introduced by a segment that has fewer than 'minSNP' SNPs to support. |
minlogllRatio |
the crossover(s) will be filtered out if introduced by a segment that has lower than 'minlogllRatio' to its reversed state. |
bpDist |
the crossover(s) will be filtered out if introduced by a segment that is shorter than 'bpDist' basepairs. It can be a single value or a vector that is the same length and order with 'chroms'. |
maxRawCO |
if a cell has more than 'maxRawCO' number of raw crossovers called across a chromosome, the cell is filtered out |
nmad |
how many mean absolute deviations lower than the median number of SNPs per cellfor a cell to be considered as low coverage cell and filtered Only effective when number of cells are larger than 10. When effective, this or 'minCellSNP', whichever is larger, is applied |
minCellSNP |
the minimum number of SNPs detected for a cell to be kept, used with 'nmads' |
biasTol |
the SNP's haplotype ratio across all cells is assumed to be 1:1. This argument can be used for removing SNPs that have a biased haplotype. i.e. almost always inferred to be haplotype state 1. It specifies a bias tolerance value, SNPs with haplotype ratios deviating from 0.5 smaller than this value are kept. Only effective when number of cells are larger than 10 |
a RangedSummarizedExperiment with rowRanges as SNP positions that contribute to crossovers in any cells. colData contains cells annotation including barcodes and sampleName.
Ruqian Lyu
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState(sampleName="s1",chroms=c("chr1"), path=paste0(demo_path,"/"), barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10,minCellSNP=3) s1_rse_state
demo_path <- system.file("extdata",package = "comapr") s1_rse_state <- readHapState(sampleName="s1",chroms=c("chr1"), path=paste0(demo_path,"/"), barcodeFile=NULL,minSNP = 0, minlogllRatio = 50, bpDist = 100,maxRawCO=10,minCellSNP=3) s1_rse_state
Markers by genotype results for a group of samples
data(snp_geno)
data(snp_geno)
A data frame with columns:
genotype of reference mouse train across markers
genotype of alternative mouse train across markers
SNP marker base-pair location
SNP marker chromosome location
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
the SNP ID
Statistics Canada. Table 001-0008 - Production and farm value of maple products, annual. http://www5.statcan.gc.ca/cansim/
Markers by genotype results for a group of samples
data(snp_geno_gr)
data(snp_geno_gr)
A GRanges object:
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
a mouse sample
the SNP ID
TBD
RangedSummarizedExperiment object containing the Viterbi states SNP markers for samples from two groups. 'colData(twoSamples)' contains the sample group factor.
data(twoSamples)
data(twoSamples)
An object of class RangedSummarizedExperiment
with 6 rows and 10 columns.