Package 'comapr' reference manual

Title:	Crossover analysis and genetic map construction
Description:	comapr detects crossover intervals for single gametes from their haplotype state sequences and stores the crossovers in a GRanges object. Genetic distances can then be calculated via mapping functions using the estimated crossover rates for marker intervals. Visualisation functions for plotting interval-based genetic maps or cumulative genetic distances are implemented, which help reveal the variation of crossover landscapes across the genome and across individuals.
Authors:	Ruqian Lyu [aut, cre]
Maintainer:	Ruqian Lyu <[email protected]>
License:	MIT + file LICENSE
Version:	1.11.1
Built:	2025-03-22 06:18:31 UTC
Source:	https://github.com/bioc/comapr

bootstrapDist

Description

Generating distribution of sample genetic distances

Usage

bootstrapDist(co_gr, B = 1000, mapping_fun = "k", group_by)

Arguments

`co_gr`	GRanges or RangedSummarizedExperiment object that contains the crossover counts for each marker interval across all samples, as returned by `countCOs`
`B`	integer, the number of times to resample
`mapping_fun`	character, defaults to "k" (Kosambi mapping function). It can be one of the mapping functions "k" or "h"
`group_by`	the prefix for each group that distributions are generated for (only when co_gr is a GRanges object), or the column name in 'colData(co_gr)' that contains the group factor (only when co_gr is a RangedSummarizedExperiment object)

Details

It takes the crossover counts for samples in multiple groups, as returned by 'countCOs'. It then draws samples from a group with replacement and calculates the distribution of the relevant statistics.
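
The resampling idea can be sketched in base R. This is a minimal illustration under stated assumptions, not comapr's implementation: the matrix `co_counts` and the helpers `kosambi_cM` and `boot_total_dist` are hypothetical names, and the statistic (total Kosambi distance summed over intervals) is chosen only for illustration.

```r
# Hypothetical crossover counts: 3 marker intervals (rows) x 20 samples (columns).
co_counts <- rbind(
  interval1 = rep(c(1, 0), times = c(2, 18)),
  interval2 = rep(c(0, 1, 0), times = c(5, 3, 12)),
  interval3 = rep(c(0, 1), times = c(19, 1))
)
colnames(co_counts) <- paste0("sample", 1:20)

# Kosambi mapping function: recombination fraction r -> centiMorgans.
kosambi_cM <- function(r) 25 * log((1 + 2 * r) / (1 - 2 * r))

# Draw B bootstrap replicates: resample the samples (columns) with
# replacement, estimate the recombination fraction per interval, and
# sum the converted distances over all intervals.
boot_total_dist <- function(counts, B = 10) {
  vapply(seq_len(B), function(i) {
    picked <- sample(ncol(counts), replace = TRUE)
    # Assumption of this sketch: cap r below 0.5 so the mapping
    # function stays finite for extreme resamples.
    r <- pmin(rowMeans(counts[, picked, drop = FALSE]), 0.49)
    sum(kosambi_cM(r))
  }, numeric(1))
}

set.seed(1)
boot_dists <- boot_total_dist(co_counts, B = 10)
summary(boot_dists)
```

Running the same procedure separately for each group would give one distribution per group, which can then be compared.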

Value

lists of numeric genetic distances for multiple samples

Author(s)

Ruqian Lyu

Examples

data(coCount)

bootsDiff <- bootstrapDist(coCount, group_by = "sampleGroup",B=10)

calGeneticDist

Description

Calculate genetic distances of marker intervals or binned chromosomes. Given whether a crossover happens in each marker interval, calculate the recombination fraction in samples and then derive the Haldane or Kosambi genetic distances via mapping functions.

Usage

calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'GRanges,missing,ANY,ANY,missing'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'GRanges,numeric,ANY,ANY,missing'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'GRanges,missing,ANY,ANY,character'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'GRanges,numeric,ANY,ANY,character'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'RangedSummarizedExperiment,missing,ANY,ANY,missing'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 
## 'RangedSummarizedExperiment,missing,ANY,ANY,character'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 
## 'RangedSummarizedExperiment,numeric,ANY,ANY,character'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

## S4 method for signature 'RangedSummarizedExperiment,numeric,ANY,ANY,missing'
calGeneticDist(
  co_count,
  bin_size = NULL,
  mapping_fun = "k",
  ref_genome = "mm10",
  group_by = NULL,
  chrom_info = NULL
)

Arguments

`co_count`	GRanges or RangedSummarizedExperiment object, returned by `countCOs`
`bin_size`	The bin size for grouping marker intervals into bins. If not supplied, the original marker intervals are returned with genetic distances converted from the recombination rate
`mapping_fun`	The mapping function to use, one of "k" or "h" (Kosambi or Haldane)
`ref_genome`	The reference genome name. It is used to fetch the chromosome size information from the UCSC database.
`group_by`	character vector containing the unique prefixes of sample names that define the different sample groups, or the column name in colData(co_count) that specifies the group factor. If missing, all samples are assumed to be from one group
`chrom_info`	A user-supplied data.frame containing two columns named chrom and size, describing the chromosome names and lengths, for use when not taking ref_genome from UCSC. If supplied, 'ref_genome' is ignored.

Value

GRanges object for marker intervals or binned intervals with Haldane or Kosambi centiMorgans
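
For orientation, the two mapping functions can be written out in base R. This is a sketch of the standard Haldane and Kosambi formulas, not a copy of comapr's internals; the function names and the crossover counts are hypothetical.

```r
# Recombination fraction r (0 <= r < 0.5) -> genetic distance in centiMorgans.
haldane_cM <- function(r) -50 * log(1 - 2 * r)
kosambi_cM <- function(r) 25 * log((1 + 2 * r) / (1 - 2 * r))

# Hypothetical data: number of samples with a crossover in each of four
# marker intervals, out of 50 samples.
n_samples <- 50
co_per_interval <- c(1, 3, 0, 5)
r <- co_per_interval / n_samples

dist_table <- data.frame(
  rec_fraction = r,
  haldane = haldane_cM(r),
  kosambi = kosambi_cM(r)
)
print(dist_table)
```

For small recombination fractions both functions give roughly 100 * r centiMorgans. They diverge as r grows, with Haldane giving the larger distance.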

Examples

data(coCount)
dist_se <- calGeneticDist(coCount)
# dist_se <- calGeneticDist(coCount,group_by="sampleGroup")

coCount

Description

RangedSummarizedExperiment object containing the crossover counts across samples for the list of SNP marker intervals

Usage

data(coCount)

Format

An object of class RangedSummarizedExperiment with 3 rows and 10 columns.

`comapr` package

Description

crossover inference package

Details

See the README on GitLab

combineHapState

Description

Combines two 'RangedSummarizedExperiment' objects, each containing the haplotype states for a list of SNPs across a set of cells. The combined result will have the cells from both individuals and the merged list of SNPs from the two.

Usage

combineHapState(rse1, rse2, groupName = c("Sample1", "Sample2"))

Arguments

`rse1`	the first 'RangedSummarizedExperiment'
`rse2`	the second 'RangedSummarizedExperiment'
`groupName`	a character vector of length 2 that contains the first and the second group's names

Value

A 'RangedSummarizedExperiment' that contains the cells and SNPs in both 'rse'

Author(s)

Ruqian Lyu

Examples

BiocParallel::register(BiocParallel::SnowParam(workers = 1))
demo_path <- paste0(system.file("extdata",package = "comapr"),"/")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 1)

s2_rse_state <- readHapState("s2",chroms=c("chr1"),
                             path=demo_path,
                             barcodeFile=paste0(demo_path,"s2_barcodes.txt"),
                             minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 1)
sb <- combineHapState(s1_rse_state,s2_rse_state)

correctGT

Description

Function for formatting and correcting genotypes of markers

Usage

correctGT(gt_matrix, ref, alt, failed = "Fail", wrong_label = "Homo_ref")

Arguments

`gt_matrix`	the input genotype matrix of markers by samples with rownames as marker IDs and column names as sample IDs
`ref`	a vector of genotypes of the inbred reference strain
`alt`	a vector of genotypes of the inbred alternative strain
`failed`	the label used to encode failed genotype calls, such as "Fail" in the example data `snp_geno`
`wrong_label`	the label considered to be a wrong genotype, for example Homo_ref, which should not be among the possible genotypes of BC1F1 samples

Details

This function converts genotypes given as alleles to genotype labels, changes Homo_ref to Hets/Fail, infers failed genotypes, and changes "Failed" to NA so that crossovers can be counted later.

It converts alleles to labels by calling the internal function lable_gt, and changes the wrong genotype Homo_ref to Fail by calling .change_missing.
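
The labelling steps can be illustrated with a small base R sketch. It is a simplified stand-in, not the package's internal code: the helper `label_genotypes`, the allele strings, and the choice to turn every `wrong_label` call into a failed call are assumptions made for this example.

```r
# Hypothetical allele-encoded genotypes: 4 markers (rows) x 3 samples (columns).
gt <- matrix(
  c("CC", "CT", "Fail",
    "GA", "GG", "AA",
    "TT", "TC", "TC",
    "AG", "AG", "Fail"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("m", 1:4), paste0("s", 1:3))
)
ref <- c("CC", "GG", "TT", "AA")  # genotypes of the inbred reference strain
alt <- c("TT", "AA", "CC", "GG")  # genotypes of the inbred alternative strain

label_genotypes <- function(gt, ref, alt, failed = "Fail",
                            wrong_label = "Homo_ref") {
  # Start from "Het" and overwrite the calls that match a parental genotype.
  labels <- matrix("Het", nrow = nrow(gt), ncol = ncol(gt),
                   dimnames = dimnames(gt))
  labels[gt == ref] <- "Homo_ref"   # ref/alt are recycled down each column
  labels[gt == alt] <- "Homo_alt"
  labels[gt == failed] <- failed
  labels[labels == wrong_label] <- failed  # impossible label -> failed call
  labels[labels == failed] <- NA           # failed calls become missing
  labels
}

labels <- label_genotypes(gt, ref, alt)
print(labels)
```

After this step only "Het", "Homo_alt" and NA remain, which is the form the counting and filtering functions expect.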

Value

A data.frame of sample genotypes with the same dimensions as the input 'gt_matrix', with genotypes converted to labels and failed calls changed to NA.

Author(s)

Ruqian Lyu

Examples

data(snp_geno_gr)
data(parents_geno)
snp_gt_crt <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr),
                      ref = parents_geno$ref,
                      alt = parents_geno$alt,
                      failed = "Fail",
                      wrong_label = "Homo_ref")

countBinState

Description

Bins the chromosome into the supplied number of bins and finds the state of the chromosome bins across all gamete cells

Usage

countBinState(chr, snpAnno, viState, genomeRange, ntile = 5)

Arguments

`chr`	character, the chromosome to check
`snpAnno`	data.frame, the SNP annotation for the supplied chromosome
`viState`	dgTMatrix/Matrix, the viterbi state matrix, output from 'sgcocaller'
`genomeRange`	GRanges object with seqlengths information for the genome
`ntile`	integer, how many tiles the chromosome is binned into

Details

This function is used for checking whether chromosome segregation pattern obeys the expected ratio.
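
A rough base R sketch of such a segregation check follows. It assumes, for illustration only, that haplotype states are coded 1 and 2 and that bins are equal-width; `bin_state_ratio` and the simulated data are hypothetical and do not reproduce the package's output format.

```r
set.seed(42)
n_snp <- 100
n_cell <- 8
chr_len <- 1e6
snp_pos <- sort(sample(chr_len, n_snp))
# Hypothetical state matrix (SNPs x cells): 1 or 2 for the two haplotypes.
states <- matrix(sample(1:2, n_snp * n_cell, replace = TRUE), nrow = n_snp)

bin_state_ratio <- function(pos, states, chr_len, ntile = 5) {
  # Assign every SNP to one of ntile equal-width bins.
  breaks <- seq(0, chr_len, length.out = ntile + 1)
  bin <- cut(pos, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  # Count how often each state is observed per bin, over all cells.
  counts <- t(vapply(seq_len(ntile), function(b) {
    s <- states[bin == b, , drop = FALSE]
    c(state1 = sum(s == 1), state2 = sum(s == 2))
  }, numeric(2)))
  data.frame(bin = seq_len(ntile), counts,
             ratio = counts[, "state1"] / rowSums(counts))
}

res <- bin_state_ratio(snp_pos, states, chr_len, ntile = 5)
print(res)
```

Under Mendelian segregation the `ratio` column is expected to stay near 0.5 in every bin; bins that deviate strongly may point to distortion or technical artefacts.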

Value

a data.frame that contains chromosome bin segregation ratio

Author(s)

Ruqian Lyu

Examples

 
library(IRanges)
library(S4Vectors)

chrom_info <- GenomeInfoDb::getChromInfoFromUCSC("mm10")
seq_length <- chrom_info$size
names(seq_length) <- chrom_info$chrom

dna_mm10_gr <- GenomicRanges::GRanges(
  seqnames = Rle(names(seq_length)),
  ranges = IRanges(1, end = seq_length, names = names(seq_length)),
  seqlengths = seq_length)

GenomeInfoDb::genome(dna_mm10_gr) <- "mm10"
demo_path <- system.file("extdata",package = "comapr")
sampleName <- "s1"
chr <- "chr1"
vi_mtx <- Matrix::readMM(file = paste0(demo_path,"/", sampleName, "_",
                                       chr, "_vi.mtx"))

snpAnno <- read.table(file = paste0(demo_path,"/", sampleName,
                                    "_", chr, "_snpAnnot.txt"),
                                    stringsAsFactors = FALSE,
                      header = TRUE)

countBinState(chr = "chr1",snpAnno = snpAnno,
viState = vi_mtx,genomeRange = dna_mm10_gr, ntile = 1)

countCOs

Description

Count the number of COs within each marker interval. COs identified in an interval that overlaps missing markers are distributed according to marker interval base-pair sizes. Genotypes encoded with "0" are treated as missing values.
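
The proportional distribution across missing markers can be sketched for a single cell in base R. This is an illustrative reconstruction, not the package's code: `count_cos_one_cell` is a hypothetical helper, and missing genotypes are represented by NA here rather than by "0".

```r
# Hypothetical markers on one chromosome and one cell's haplotype states;
# NA marks a missing genotype.
pos   <- c(100, 300, 400, 800, 1000)
state <- c(1, NA, NA, 2, 2)

count_cos_one_cell <- function(pos, state) {
  co <- numeric(length(pos) - 1)         # one value per marker interval
  typed <- which(!is.na(state))          # markers with a called state
  for (k in seq_len(max(length(typed) - 1, 0))) {
    left <- typed[k]
    right <- typed[k + 1]
    if (state[left] != state[right]) {
      # One crossover lies somewhere between the two typed markers:
      # share it over the spanned intervals by base-pair size.
      idx <- left:(right - 1)
      widths <- diff(pos[left:right])
      co[idx] <- widths / sum(widths)
    }
  }
  co
}

co <- count_cos_one_cell(pos, state)
print(co)
```

In this example the state switch lies between the markers at 100 and 800, so the single crossover is split over the first three intervals in proportion to their widths (200, 100 and 400 bp), and the last interval receives zero.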

Usage

countCOs(geno)

## S4 method for signature 'GRanges'
countCOs(geno)

## S4 method for signature 'RangedSummarizedExperiment'
countCOs(geno)

Arguments

`geno`	GRanges object or RangedSummarizedExperiment object with a genotype matrix that has SNP positions in the rows and cells/samples in the columns

Value

GRanges object or RangedSummarizedExperiment with marker intervals as rows and samples as columns; values are the number of COs estimated for each marker interval

Author(s)

Ruqian Lyu

Examples

data(twoSamples)
cocount <- countCOs(twoSamples)

countGT

Description

Counts how many samples have genotype calls across markers and how many markers each individual has called genotypes for. This function helps identify poor samples or poor markers for filtering. It can also generate plots that help identify outlier samples/markers.

Usage

countGT(geno, plot = TRUE, interactive = FALSE)

Arguments

`geno`	the genotype data.frame of markers by samples from output of function `correctGT`
`plot`	it determines whether a plot will be generated, defaults to TRUE
`interactive`	it determines whether an interactive plot will be generated

Value

A list of two elements including n_markers and n_samples

Author(s)

Ruqian Lyu

Examples

data(snp_geno_gr)
genotype_counts <- countGT(GenomicRanges::mcols(snp_geno_gr))

filterGT

Description

Filter markers or samples that have too many missing values

Usage

filterGT(geno, min_markers = 5, min_samples = 3)

## S4 method for signature 'matrix,numeric,numeric'
filterGT(geno, min_markers = 5, min_samples = 3)

## S4 method for signature 'GRanges,numeric,numeric'
filterGT(geno, min_markers = 5, min_samples = 3)

Arguments

`geno`	the genotype data.frame of markers by samples from output of function `correctGT`
`min_markers`	the minimum number of markers for a sample to be kept
`min_samples`	the minimum number of samples for a marker to be kept

Details

This function takes the geno data.frame and filters it by the provided cut-offs.
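
The two cut-offs can be illustrated on a plain character matrix. The sketch below is an assumption-laden simplification: `filter_geno` is a hypothetical helper, and it applies both thresholds to the counts of the unfiltered matrix, which may differ from the order the package uses.

```r
# Hypothetical genotype labels: 4 markers (rows) x 3 samples (columns).
geno <- matrix(
  c("Het",      "Homo_alt", NA,
    NA,         NA,         NA,
    "Het",      "Het",      NA,
    "Homo_alt", "Het",      "Het"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("m", 1:4), paste0("s", 1:3))
)

filter_geno <- function(geno, min_markers = 5, min_samples = 3) {
  called <- !is.na(geno)
  keep_markers <- rowSums(called) >= min_samples  # marker called in enough samples
  keep_samples <- colSums(called) >= min_markers  # sample called at enough markers
  geno[keep_markers, keep_samples, drop = FALSE]
}

filtered <- filter_geno(geno, min_markers = 2, min_samples = 2)
print(filtered)
```

Here marker m2 is dropped because it has no calls at all, and sample s3 is dropped because it has a call at only one marker.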

Value

The filtered genotype matrix

Author(s)

Ruqian Lyu

Examples

data(snp_geno_gr)
corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2)

findDupSamples

Description

Find the duplicated samples by looking at the number of matching genotypes in all pair-wise samples

Usage

findDupSamples(geno, threshold = 0.99, in_text = FALSE)

Arguments

`geno`	the genotype data.frame of markers by samples from output of function `correctGT`
`threshold`	the frequency cut-off of the number of matching genotypes out of all genotypes for determining whether a pair of samples is duplicated, defaults to `0.99`. NAs are regarded as the same genotype for two samples if both have NA for a marker.
`in_text`	whether text of frequencies should be displayed in the heatmap cells

Value

The pairs of duplicated samples.
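
The pairwise comparison, including the rule that two NAs count as a match, can be sketched in base R. The helpers `match_freq` and `find_dup_pairs` and the toy genotypes are hypothetical; the real function also draws a heatmap, which is omitted here.

```r
# Hypothetical genotype labels: 5 markers (rows) x 3 samples (columns).
geno <- cbind(
  s1 = c("Het", "Homo_alt", NA, "Het", "Het"),
  s2 = c("Het", "Homo_alt", NA, "Het", "Het"),
  s3 = c("Homo_alt", "Het", "Het", NA, "Het")
)

# Fraction of markers at which two samples agree; NA vs NA is a match,
# NA vs a called genotype is a mismatch.
match_freq <- function(a, b) {
  same <- (a == b) | (is.na(a) & is.na(b))
  same[is.na(same)] <- FALSE
  mean(same)
}

find_dup_pairs <- function(geno, threshold = 0.99) {
  pairs <- t(combn(colnames(geno), 2))   # all unordered sample pairs
  freq <- apply(pairs, 1, function(p) match_freq(geno[, p[1]], geno[, p[2]]))
  out <- data.frame(sample1 = pairs[, 1], sample2 = pairs[, 2], freq = freq)
  out[out$freq >= threshold, ]
}

dups <- find_dup_pairs(geno)
print(dups)
```

Only the pair s1/s2 passes the default threshold in this toy data, because the two samples agree at every marker, including the shared NA.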

Author(s)

Ruqian Lyu

Examples

data(snp_geno)
or_geno <- snp_geno[,grep("X",colnames(snp_geno))]
rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS)
or_geno[,1] <- or_geno[,5]
cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J,
                    alt = snp_geno$FVB.NJ..i.)
dups <- findDupSamples(cr_geno)

getAFTracks

Description

Generate the raw alternative allele frequency tracks for all cells in the columns of the provided 'co_count'

Usage

getAFTracks(
  chrom = "chr1",
  path_loc = "./output/firstBatch/WC_522/",
  sampleName = "WC_522",
  nwindow = 80,
  barcodeFile,
  co_count,
  snp_track = NULL
)

Arguments

`chrom`	the chromosome
`path_loc`	the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "*_altCount.mtx"
`sampleName`	the sample name, which is the prefix of sscocaller's output files
`nwindow`	the number of windows for binning the chromosome
`barcodeFile`	the barcode file containing the list of cell barcodes used as the input file for sscocaller
`co_count`	'GRanges' or 'RangedSummarizedExperiment' object, returned by `countCOs`, that contains the crossover intervals and the number of crossovers in each cell.
`snp_track`	the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file.

Value

a list object, in which each element is a list of two items with the cell's alternative allele frequency DataTrack and the called crossover ranges.
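
To make the windowed allele frequency concrete, here is a base R sketch for one cell. It is an assumption about what is being plotted (alternative allele counts divided by total counts per window), using simulated counts and a hypothetical helper `window_af`; it does not build Gviz tracks.

```r
set.seed(7)
n_snp <- 200
chr_len <- 1e6
snp_pos   <- sort(sample(chr_len, n_snp))
total_cnt <- rpois(n_snp, lambda = 3)                 # reads covering each SNP
alt_cnt   <- rbinom(n_snp, size = total_cnt, prob = 0.5)  # reads with the alt allele

window_af <- function(pos, alt, total, chr_len, nwindow = 80) {
  # Assign SNPs to nwindow equal-width windows along the chromosome.
  breaks <- seq(0, chr_len, length.out = nwindow + 1)
  win <- cut(pos, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  win <- factor(win, levels = seq_len(nwindow))
  alt_sum   <- as.vector(tapply(alt, win, sum))
  total_sum <- as.vector(tapply(total, win, sum))
  # Windows without SNPs or without reads yield NA/NaN frequencies.
  data.frame(start = breaks[-length(breaks)], end = breaks[-1],
             af = alt_sum / total_sum)
}

af <- window_af(snp_pos, alt_cnt, total_cnt, chr_len, nwindow = 10)
print(af)
```

For a haploid gamete the frequency in a window is expected to sit near 0 or 1, and a switch between the two along the chromosome is the visual signature of a crossover.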

Author(s)

Ruqian Lyu

Examples

demo_path <- system.file("extdata",package = "comapr")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 0)
s1_counts <- countCOs(s1_rse_state)

af_co_tracks <- getAFTracks(chrom ="chr1",
                               path_loc = demo_path,
                               sampleName = "s1",
                               barcodeFile = file.path(demo_path,
                                                    "s1_barcodes.txt"),
                               co_count = s1_counts)

getCellAFTrack Generates the DataTracks for plotting AF and crossover regions

Description

It plots the raw alternative allele frequencies and highlight the crossover regions for the selected cell.

Usage

getCellAFTrack(
  chrom = "chr1",
  path_loc = "./output/firstBatch/WC_522/",
  sampleName = "WC_522",
  nwindow = 80,
  barcodeFile,
  cellBarcode,
  co_count,
  snp_track = NULL,
  chunk = 1000L
)

Arguments

`chrom`	the chromosome
`path_loc`	the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "*_altCount.mtx"
`sampleName`	the sample name, which is the prefix of sscocaller's output files
`nwindow`	the number of windows for binning the chromosome
`barcodeFile`	the barcode file containing the list of cell barcodes used as the input file for sscocaller
`cellBarcode`	the selected cell barcode
`co_count`	'GRanges' or 'RangedSummarizedExperiment' object, returned by `countCOs`, that contains the crossover intervals and the number of crossovers in each cell.
`snp_track`	the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file.
`chunk`	An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time.

Value

The DataTrack object, as defined by the DataTrack class in Gviz

Author(s)

Ruqian Lyu

Examples

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 0)
s1_counts <- countCOs(s1_rse_state)
af_co_tracks <- getCellAFTrack(chrom ="chr1",
                               path_loc = demo_path,
                               sampleName = "s1",
                               barcodeFile = paste0(demo_path,
                                                    "s1_barcodes.txt"),
                               cellBarcode = "BC1",
                               co_count = s1_counts)

getCellCORange

Description

It finds the crossover intervals for a selected cell

Usage

getCellCORange(co_count, cellBarcode)

Arguments

`co_count`	'GRanges' or 'RangedSummarizedExperiment' object
`cellBarcode`	the selected cell's barcode

Value

GRanges object containing the crossover intervals for the selected cell

Author(s)

Ruqian Lyu

Examples

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 0)
s1_counts <- countCOs(s1_rse_state)

co_ranges <- getCellCORange(cellBarcode = "BC1",
                            co_count = s1_counts)

getCellDPTrack Generates the DataTrack for plotting DP of a selected cell

Description

It plots the total allele counts for the selected cell.

Usage

getCellDPTrack(
  chrom = "chr1",
  path_loc = "./output/firstBatch/WC_522/",
  sampleName = "WC_522",
  nwindow = 80,
  barcodeFile,
  cellBarcode,
  snp_track = NULL,
  chunk = 1000L,
  log = TRUE,
  plot_type = "hist"
)

Arguments

`chrom`	the chromosome
`path_loc`	the path prefix to the output files from sscocaller including "*_totalCount.mtx"
`sampleName`	the sample name, which is the prefix of sscocaller's output files
`nwindow`	the number of windows for binning the chromosome
`barcodeFile`	the barcode file containing the list of cell barcodes used as the input file for sscocaller
`cellBarcode`	the selected cell barcode
`snp_track`	the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file.
`chunk`	An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time.
`log`	whether the histogram of SNP density should be plotted on log scale (log10)
`plot_type`	the DataTrack plot type, default to be 'hist'

Value

The DataTrack object, as defined by the DataTrack class in Gviz

Author(s)

Ruqian Lyu

Examples

demo_path <- system.file("extdata",package = "comapr")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 0)
s1_counts <- countCOs(s1_rse_state)
dp_co_tracks <- getCellDPTrack(chrom ="chr1",
                               path_loc = demo_path,
                               sampleName = "s1",
                               barcodeFile = file.path(demo_path,
                                                    "s1_barcodes.txt"),
                               cellBarcode = "BC1")

getDistortedMarkers

Description

Marker segregation distortion detection using chisq-test

Usage

getDistortedMarkers(geno, p = c(0.5, 0.5), adj.method = "BH")

Arguments

`geno`	the genotype data.frame of markers by samples from output of function `correctGT`
`p`	the expected genotype ratio in a numeric vector, defaults to c(0.5,0.5)
`adj.method`	Methods to adjust for multiple comparisons, defaults to "BH"

Details

We expect the genotypes to appear with frequencies of 1:1 homo_alt:hets. We use chisq.test to find markers whose genotypes among samples differ significantly from the 1:1 ratio and report them.
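
The test can be illustrated per marker with `chisq.test` and `p.adjust` from base R's stats package. The genotype matrix and the helper `test_marker` are hypothetical, and the columns of the result are chosen for this sketch rather than taken from the package's output.

```r
# Hypothetical genotype labels: 3 markers (rows) x 20 samples (columns).
geno <- rbind(
  m1 = rep(c("Het", "Homo_alt"), times = c(10, 10)),
  m2 = rep(c("Het", "Homo_alt"), times = c(19, 1)),
  m3 = c(rep(c("Het", "Homo_alt"), times = c(8, 10)), NA, NA)
)

# Goodness-of-fit test of the observed genotype counts against the
# expected ratio p; missing genotypes are ignored.
test_marker <- function(g, p = c(0.5, 0.5)) {
  obs <- c(Het = sum(g == "Het", na.rm = TRUE),
           Homo_alt = sum(g == "Homo_alt", na.rm = TRUE))
  chisq.test(obs, p = p)$p.value
}

pvals <- apply(geno, 1, test_marker)
result <- data.frame(
  marker = rownames(geno),
  pvalue = pvals,
  padj = p.adjust(pvals, method = "BH")  # correct for testing many markers
)
print(result)
```

Marker m2, with 19 heterozygous calls against 1 homozygous call, is the only one flagged as distorted in this toy data; m1 matches the 1:1 expectation exactly.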

Value

data.frame with each row representing one SNP marker and columns containing the chisq.test results

Author(s)

Ruqian Lyu

Examples

data(parents_geno)
data(snp_geno_gr)
corrected_geno <- correctGT(gt_matrix = GenomicRanges::mcols(snp_geno_gr),
                            ref = parents_geno$ref, alt = parents_geno$alt,
                            failed = "Fail", wrong_label = "Homo_ref")
GenomicRanges::mcols(snp_geno_gr) <- corrected_geno
corrected_geno <- filterGT(snp_geno_gr, min_markers = 30,min_samples = 2)
pvalues <- getDistortedMarkers(GenomicRanges::mcols(corrected_geno))

getMeanDPTrack

Description

Generate the mean DP (Depth) DataTrack (from Gviz) for cells

Usage

getMeanDPTrack(
  chrom = "chr1",
  path_loc,
  nwindow = 80,
  sampleName,
  barcodeFile,
  plot_type = "hist",
  selectedBarcodes = NULL,
  snp_track = NULL,
  log = TRUE
)

Arguments

`chrom`	the chromosome
`path_loc`	the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "*_altCount.mtx"
`nwindow`	the number of windows for binning the chromosome
`sampleName`	the sample name, which is the prefix of sscocaller's output files
`barcodeFile`	the barcode file containing the list of cell barcodes used as the input file for sscocaller
`plot_type`	the DataTrack plot type, default to be 'hist'
`selectedBarcodes`	the selected cell barcodes, which should be the barcodes of cells for which crossovers have been called. If not supplied, all cells are counted.
`snp_track`	the SNP position track which is used for obtaining the SNP chromosome locations. It could be omitted and the SNP positions will be acquired from the "*_snpAnnot.txt" file.
`log`	whether the histogram of SNP density should be plotted on log scale (log10)

Value

DataTrack object plotting the mean DP histogram for windowed chromosomes

Author(s)

Ruqian Lyu

Examples

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
meanDP_track <- getMeanDPTrack(chrom ="chr1",
                               path_loc = demo_path,
                               sampleName = "s1",
                               barcodeFile = paste0(demo_path,
                                                    "s1_barcodes.txt"))

getSNPDensityTrack

Description

Generate the SNP density DataTrack (from 'Gviz') for selected chromosome

Usage

getSNPDensityTrack(
  chrom = "chr1",
  sampleName = "s1",
  path_loc = ".",
  nwindow = 80,
  plot_type = "hist",
  log = TRUE
)

Arguments

`chrom`	the chromosome
`sampleName`	the sample name, which is the prefix of sscocaller's output files
`path_loc`	the path prefix to the output files from sscocaller including "*_totalCount.mtx" and "*_altCount.mtx"
`nwindow`	the number of windows for binning the chromosome
`plot_type`	the DataTrack plot type, default to be 'hist'
`log`	whether the histogram of SNP density should be plotted on log scale (log10)

Value

DataTrack object plotting the SNP density histogram
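
The roles of `nwindow` and `log` can be illustrated with a base R sketch that bins SNP positions into equal-width windows. The positions, the chromosome length and the pseudocount of 1 added before taking log10 are assumptions made for illustration; the package may bin and transform differently.

```r
set.seed(1)
chr_len <- 1e6                              # hypothetical chromosome length
snp_pos <- sort(sample.int(chr_len, 500))   # hypothetical SNP positions

nwindow <- 80
# nwindow equal-width windows need nwindow + 1 break points.
breaks <- seq(0, chr_len, length.out = nwindow + 1)

# Window index of each SNP, then the number of SNPs per window.
window_id   <- findInterval(snp_pos, breaks, rightmost.closed = TRUE)
snp_density <- tabulate(window_id, nbins = nwindow)

# Log scale; the +1 keeps empty windows finite (an assumption of this sketch).
log_density <- log10(snp_density + 1)
```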

Author(s)

Ruqian Lyu

Examples

demo_path <- system.file("extdata",package = "comapr")
snp_track <- getSNPDensityTrack(chrom ="chr1",
                               path_loc = demo_path,
                               sampleName = "s1")

Parents' genotype for F1 samples in 'snp_geno'

Description

Parents' genotype for F1 samples in 'snp_geno'

Usage

data(parents_geno)

Format

A data.frame:

C57BL.6J: genotype of reference mouse strain across markers
FVB.NJ..i.: genotype of alternative mouse strain across markers

perCellChrQC

Description

A function that parses the output ('_viSegInfo.txt') from 'sgcocaller' https://gitlab.svi.edu.au/biocellgen-public/sgcocaller and generates per-cell (per chr) summary statistics

Usage

perCellChrQC(
  sampleName,
  chroms = c("chr1", "chr7", "chr15"),
  path,
  barcodeFile = NULL,
  doPlot = TRUE
)

Arguments

`sampleName`	the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample
`chroms`	the character vector of chromosomes to parse. Results for multiple chromosomes will be concatenated together.
`path`	the path to the files, with name patterns chrom_vi.mtx, chrom_viSegInfo.txt, end with slash
`barcodeFile`	defaults to NULL, it is assumed to be in the same directory as the other files and with name sampleName_barcodes.txt
`doPlot`	whether a plot should be returned, default to TRUE

Value

a list object that contains the data.frame with summarised statistics per chr per cell and a plot (if doPlot)

Author(s)

Ruqian Lyu

Examples

demo_path <-system.file("extdata",package = "comapr")
pcQC <- perCellChrQC(sampleName="s1",chroms=c("chr1"),
path=demo_path,
barcodeFile=NULL)

permuteDist

Description

Permutation test of two sample groups

Usage

permuteDist(co_gr, B = 100, mapping_fun = "k", group_by)

Arguments

`co_gr`	GRanges or RangedSummarizedExperiment object that contains the crossover counts for each marker interval across all samples. Returned by `countCOs`
`B`	integer the number of sampling times
`mapping_fun`	character default to "k" (kosambi mapping function). It can be one of the mapping functions: "k","h"
`group_by`	the prefix for each group that we need to generate distributions for (only when co_gr is a GRanges object), or the column name in 'colData(co_gr)' that contains the group factor (only when co_gr is a RangedSummarizedExperiment object)
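
For reference, the two choices of `mapping_fun` correspond to the standard Kosambi ("k") and Haldane ("h") mapping functions, which convert a recombination fraction r into a genetic distance in centiMorgans. The sketch below uses the textbook formulas; the function names `kosambi` and `haldane` are illustrative and are not exported by the package.

```r
# Textbook mapping functions: recombination fraction r -> distance in cM.
kosambi <- function(r) 25 * log((1 + 2 * r) / (1 - 2 * r))
haldane <- function(r) -50 * log(1 - 2 * r)

r <- c(0.01, 0.1, 0.2)
data.frame(r = r, kosambi_cM = kosambi(r), haldane_cM = haldane(r))
```

For small r both functions are close to 100 * r; they diverge as r grows, with Haldane giving the larger distance.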

Details

It shuffles the group labels for the samples and calculates the difference between the two groups after shuffling.
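
The label-shuffling idea can be sketched in base R as follows. This is a simplified stand-in, not the package's procedure: it uses made-up per-sample crossover counts and a difference in group means as the statistic, whereas `permuteDist` works on genetic distances derived through the mapping function. The helper `group_diff` and the p-value formula are assumptions of the sketch.

```r
set.seed(42)
# Made-up crossover counts for 8 samples in each of two groups.
co_count <- c(rpois(8, lambda = 12), rpois(8, lambda = 15))
group    <- rep(c("A", "B"), each = 8)

# Hypothetical statistic: difference in mean counts between the two groups.
group_diff <- function(x, g) mean(x[g == "A"]) - mean(x[g == "B"])

observed_diff <- group_diff(co_count, group)

# Shuffle the group labels B times and recompute the statistic each time.
B <- 1000
permutes <- replicate(B, group_diff(co_count, sample(group)))

# Two-sided permutation p-value with the usual +1 correction.
p_value <- (sum(abs(permutes) >= abs(observed_diff)) + 1) / (B + 1)
```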

Value

A list of three elements: 'permutes', a numeric vector of length B holding the differences between the permuted groups; 'observed_diff', the observed genetic distances of two groups; and 'nSample', the number of samples in the first and second group.

Author(s)

Ruqian Lyu

Examples

data(coCount)
perms <- permuteDist(coCount, group_by = "sampleGroup",B=10)

perSegChrQC

Description

Plots the summary statistics of segments generated by 'sgcocaller' https://gitlab.svi.edu.au/biocellgen-public/sgcocaller, which have been detected by finding consecutive Viterbi states along the list of SNP markers.

Usage

perSegChrQC(
  sampleName,
  chroms = c("chr1", "chr7", "chr15"),
  path,
  barcodeFile = NULL,
  maxRawCO = 10
)

Arguments

`sampleName`	the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample
`chroms`	the vector of chromosomes
`path`	the path to the files, with name patterns chrom_vi.mtx, chrom_viSegInfo.txt, end with slash
`barcodeFile`	defaults to NULL, it is assumed to be in the same directory as the other files and with name sampleName_barcodes.txt
`maxRawCO`	if a cell has more than 'maxRawCO' number of raw crossovers called across a chromosome, the cell is filtered out

Details

It provides guidance in filtering out close double crossovers that are unlikely to be biological and instead arise for technical reasons, as well as crossovers that are supported by fewer SNPs at the ends of the chromosomes.

Value

Histogram plots for statistics summarized across all Viterbi state segments

Author(s)

Ruqian Lyu

Examples

demo_path <- system.file("extdata",package = "comapr")
s1_rse_qc <- perSegChrQC(sampleName="s1",
                            chroms=c("chr1"),
                            path=demo_path, maxRawCO=10)

plotCount

Description

Plot the number of COs per sample group or per chromosome

Usage

plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'RangedSummarizedExperiment,missing,missing'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'RangedSummarizedExperiment,missing,character'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'RangedSummarizedExperiment,logical,character'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'RangedSummarizedExperiment,logical,missing'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'GRanges,logical,missing'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'GRanges,missing,missing'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'GRanges,missing,character'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

## S4 method for signature 'GRanges,logical,character'
plotCount(
  co_count,
  by_chr = FALSE,
  group_by = "sampleGroup",
  plot_type = "error_bar"
)

Arguments

`co_count`	GRanges or RangedSummarizedExperiment object, returned by `countCOs`
`by_chr`	whether it should plot each chromosome separately
`group_by`	the column name in 'colData(co_count)' that specifies the grouping factor, or the character vector that contains the unique prefixes of sample names used for defining different sample groups. If missing, all samples are assumed to be from one group
`plot_type`	determines what type the plot will be, choose from "error_bar" or "hist". Only relevant when by_chr=TRUE

Value

ggplot object

Examples

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
s1_rse_state <- readHapState("s1",chroms=c("chr1"),
                             path=demo_path,barcodeFile=NULL,minSNP = 0,
                             minlogllRatio = 50,
                             bpDist = 100,maxRawCO=10,
                             minCellSNP = 0)
s1_count <- countCOs(s1_rse_state)
plotCount(s1_count)

plotGeneticDist

Description

Plots the calculated genetic distances for each bin or marker interval supplied by the GRanges object

Usage

plotGeneticDist(gr, bin = TRUE, chr = NULL, cumulative = FALSE)

Arguments

`gr`	GRanges object with genetic distances calculated for marker intervals
`bin`	TRUE or FALSE, indicating whether the supplied GRanges object is for binned intervals
`chr`	the specific chrs selected to plot
`cumulative`	TRUE or FALSE, indicating whether it plots the bin-wise genetic distances or the cumulative distances

Value

ggplot2 plot

Author(s)

Ruqian Lyu

Examples

data(coCount)
dist_se <- calGeneticDist(coCount)
plotGeneticDist(SummarizedExperiment::rowRanges(dist_se))

plotGTFreq

Description

Function to plot the genotypes for all samples faceted by genotype

Usage

plotGTFreq(geno)

Arguments

`geno`	the genotype data.frame of markers by samples from output of function `correctGT`

Value

A ggplot object

Author(s)

Ruqian Lyu

Examples


data(snp_geno)
or_geno <- snp_geno[,grep("X",colnames(snp_geno))]
rownames(or_geno) <- paste0(snp_geno$CHR,"_",snp_geno$POS)
or_geno[1,] <- rep("Fail",dim(or_geno)[2])
cr_geno <- correctGT(or_geno,ref = snp_geno$C57BL.6J,
                    alt = snp_geno$FVB.NJ..i.)
ft_gt <- filterGT(cr_geno)
plotGTFreq(ft_gt)

Plot cumulative genetic distances across the genome

Description

This function takes the calculated genetic distances for all marker intervals across all chromosomes provided and plots the cumulative genetic distances.
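
What "cumulative genetic distances" means can be shown with a small base R sketch on made-up interval distances. The data frame and its column names are assumptions for illustration and do not reflect the internal representation used by the package.

```r
# Made-up per-interval genetic distances (cM), ordered along each chromosome.
intervals <- data.frame(
  chr     = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  dist_cM = c(1.2, 0.8, 2.0, 0.5, 1.5)
)

# Running total restarted within each chromosome.
intervals$cum_chr <- ave(intervals$dist_cM, intervals$chr, FUN = cumsum)

# Running total continued across the whole genome.
intervals$cum_genome <- cumsum(intervals$dist_cM)

intervals
```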

Usage

plotWholeGenome(gr)

Arguments

`gr`	GRanges object with genetic distances calculated for marker intervals

Value

A ggplot object

Examples

data(coCount)
dist_se <- calGeneticDist(coCount)
plotWholeGenome(SummarizedExperiment::rowRanges(dist_se))

readColMM

Description

A modified version of the 'Matrix::readMM' function for reading matrices stored in the Harwell-Boeing or MatrixMarket formats that reads only the selected column.

Usage

readColMM(file, which.col, chunk = 1000L)

Arguments

`file`	the name of the file to be read from as a character scalar. Those storing matrices in the MatrixMarket format usually end in ".mtx".
`which.col`	An integer scalar, the column index
`chunk`	An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time.

Details

See readMM

Value

A sparse matrix object that inherits from the "Matrix" class with the original dimensions. To get the vector of the specified column, one needs to subset the matrix to select the column with the same index.
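
The subsetting step mentioned above can be sketched with the 'Matrix' package alone. The example writes a small made-up matrix to a temporary MatrixMarket file and reads it back with `Matrix::readMM`, which loads every column; it only illustrates how a column vector is extracted from a result that keeps the original dimensions.

```r
library(Matrix)

# A small 3 x 3 sparse matrix written to a temporary MatrixMarket file.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 3), x = c(1, 2, 1),
                  dims = c(3, 3))
mtx_file <- tempfile(fileext = ".mtx")
writeMM(m, mtx_file)

# readMM reads the whole file; the result keeps the 3 x 3 dimensions.
full <- readMM(mtx_file)

# Subset with the column index to obtain that column as a numeric vector.
col2 <- full[, 2]
```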

Author(s)

Ruqian Lyu

Examples

demo_path <-paste0(system.file("extdata",package = "comapr"),"/")
readColMM(file = paste0(demo_path,"s1_chr1_vi.mtx"), which.col=2,chunk=2)

readHapState

Description

A function that parses the Viterbi state matrix (in .mtx format), barcode.txt and snpAnno.txt files for each individual.

Usage

readHapState(
  sampleName,
  chroms = c("chr1"),
  path,
  barcodeFile = NULL,
  minSNP = 30,
  minlogllRatio = 200,
  bpDist = 100,
  maxRawCO = 10,
  nmad = 1.5,
  minCellSNP = 200,
  biasTol = 0.45
)

Arguments

`sampleName`	the name of the sample to parse which is used as prefix for finding relevant files for the underlying sample
`chroms`	the character vector of chromosomes to parse. Results for multiple chromosomes will be concatenated together.
`path`	the path to the files, with name patterns chrom_vi.mtx, chrom_viSegInfo.txt, end with slash
`barcodeFile`	if NULL, it is assumed to be in the same directory as the other files and with name sampleName_barcodes.txt
`minSNP`	the crossover(s) will be filtered out if introduced by a segment that has fewer than 'minSNP' SNPs to support.
`minlogllRatio`	the crossover(s) will be filtered out if introduced by a segment that has lower than 'minlogllRatio' to its reversed state.
`bpDist`	the crossover(s) will be filtered out if introduced by a segment that is shorter than 'bpDist' basepairs. It can be a single value or a vector that is the same length and order with 'chroms'.
`maxRawCO`	if a cell has more than 'maxRawCO' number of raw crossovers called across a chromosome, the cell is filtered out
`nmad`	how many mean absolute deviations lower than the median number of SNPs per cell for a cell to be considered a low coverage cell and filtered. Only effective when the number of cells is larger than 10. When effective, this or 'minCellSNP', whichever is larger, is applied
`minCellSNP`	the minimum number of SNPs detected for a cell to be kept, used with 'nmad'
`biasTol`	the SNP's haplotype ratio across all cells is assumed to be 1:1. This argument can be used for removing SNPs that have a biased haplotype, i.e. almost always inferred to be haplotype state 1. It specifies a bias tolerance value; SNPs whose haplotype ratios deviate from 0.5 by less than this value are kept. Only effective when the number of cells is larger than 10
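
One possible reading of how `nmad`, `minCellSNP` and `biasTol` interact is sketched below in base R on made-up numbers. It is an assumption-laden illustration rather than the package's code: the deviation is computed with `stats::mad()`, and the variable names are hypothetical.

```r
# Made-up number of SNPs detected in each of 11 cells.
snp_per_cell <- c(500, 520, 480, 510, 490, 505, 495, 515, 485, 50, 300)
nmad       <- 1.5
minCellSNP <- 200

# Assumption: deviation measured with stats::mad(); the package may differ.
low_cov_cutoff <- median(snp_per_cell) - nmad * mad(snp_per_cell)

# Whichever of the two thresholds is larger is applied.
cutoff    <- max(low_cov_cutoff, minCellSNP)
keep_cell <- snp_per_cell >= cutoff

# Made-up fraction of cells in haplotype state 1 for four SNPs.
hap1_ratio <- c(0.50, 0.02, 0.90, 0.97)
biasTol    <- 0.45

# Keep SNPs whose ratio deviates from 0.5 by less than biasTol.
keep_snp <- abs(hap1_ratio - 0.5) < biasTol
```

With these numbers the two low-coverage cells (50 and 300 SNPs) fall below the cutoff, and the SNPs with ratios 0.02 and 0.97 are flagged as biased.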

Value

a RangedSummarizedExperiment with rowRanges as SNP positions that contribute to crossovers in any cells. colData contains cells annotation including barcodes and sampleName.

Author(s)

Ruqian Lyu

Examples

demo_path <- system.file("extdata",package = "comapr")
s1_rse_state <- readHapState(sampleName="s1",chroms=c("chr1"),
path=paste0(demo_path,"/"),
barcodeFile=NULL,minSNP = 0, minlogllRatio = 50,
bpDist = 100,maxRawCO=10,minCellSNP=3)
s1_rse_state

Markers by genotype results for a group of samples

Description

Markers by genotype results for a group of samples

Usage

data(snp_geno)

Format

A data frame with columns:

C57BL.6J: genotype of reference mouse strain across markers
FVB.NJ..i.: genotype of alternative mouse strain across markers
POS: SNP marker base-pair location
CHR: SNP marker chromosome location
X100: a mouse sample
X101: a mouse sample
X102: a mouse sample
X103: a mouse sample
X104: a mouse sample
X105: a mouse sample
X106: a mouse sample
X107: a mouse sample
X108: a mouse sample
X109: a mouse sample
X110: a mouse sample
X111: a mouse sample
X112: a mouse sample
X113: a mouse sample
X92: a mouse sample
X93: a mouse sample
X94: a mouse sample
X95: a mouse sample
X96: a mouse sample
X97: a mouse sample
X98: a mouse sample
X99: a mouse sample
rsID: the SNP ID

Source

Statistics Canada. Table 001-0008 - Production and farm value of maple products, annual. http://www5.statcan.gc.ca/cansim/

Markers by genotype results for a group of samples

Description

Markers by genotype results for a group of samples

Usage

data(snp_geno_gr)

Format

A GRanges object:

X100: a mouse sample
X101: a mouse sample
X102: a mouse sample
X103: a mouse sample
X104: a mouse sample
X105: a mouse sample
X106: a mouse sample
X107: a mouse sample
X108: a mouse sample
X109: a mouse sample
X110: a mouse sample
X111: a mouse sample
X112: a mouse sample
X113: a mouse sample
X92: a mouse sample
X93: a mouse sample
X94: a mouse sample
X95: a mouse sample
X96: a mouse sample
X97: a mouse sample
X98: a mouse sample
X99: a mouse sample
rsID: the SNP ID

Source

TBD

RangedSummarizedExperiment object containing the Viterbi states SNP markers for samples from two groups. 'colData(twoSamples)' contains the sample group factor.

Description

RangedSummarizedExperiment object containing the Viterbi states SNP markers for samples from two groups. 'colData(twoSamples)' contains the sample group factor.

Usage

data(twoSamples)

Format

An object of class RangedSummarizedExperiment with 6 rows and 10 columns.

Package 'comapr'

Help Index

bootstrapDist
calGeneticDist
RangedSummarizedExperiment object containing the crossover counts across samples for the list of SNP marker intervals
comapr package
combineHapState
correctGT
countBinState
countCOs
countGT
filterGT
findDupSamples
getAFTracks

`comapr` package