Title: | Simulation of Rare Variant Genetic Data |
---|---|
Description: | Haplotype simulations of rare variant genetic data that emulate real data can be performed with RAREsim. RAREsim uses the expected number of variants in MAC bins - either as provided by default parameters or estimated from target data - and an abundance of rare variants as simulated by HAPGEN2 to probabilistically prune variants. RAREsim produces haplotypes that emulate real sequencing data with respect to the total number of variants, allele frequency spectrum, haplotype structure, and variant annotation. |
Authors: | Megan Null [aut], Ryan Barnard [cre] |
Maintainer: | Ryan Barnard <[email protected]> |
License: | GPL-3 |
Version: | 1.11.0 |
Built: | 2024-10-31 04:19:29 UTC |
Source: | https://github.com/bioc/RAREsim |
The afs function calculates the proportion of variants in minor allele count (MAC) bins given parameters, as described in RAREsim.
afs(alpha = NULL, beta = NULL, b = NULL, mac_bins, pop = NULL)
alpha |
AFS function parameter alpha, does not need to be specified if default parameters are used |
beta |
AFS function parameter beta, does not need to be specified if default parameters are used |
b |
AFS function parameter b, does not need to be specified if default parameters are used |
mac_bins |
The rare MAC bins to use, with Lower and Upper boundaries defined |
pop |
The population: AFR, EAS, NFE or SAS - specified when using default parameters |
The default parameters are used if an ancestral population is specified: pop = 'AFR', 'EAS', 'NFE', or 'SAS'. Otherwise, the parameters alpha, beta, and b must be provided. Alpha, beta, and b can be estimated from target data using the *fit_afs* function. The MAC bins should be exhaustive, non-overlapping bins of rare allele counts with column names Lower and Upper.
data frame with the MAC bins provided and proportion of variants in each bin
data('afs_afr')
mac <- afs_afr[,c(1:2)]
afs(mac_bins = mac, pop = 'AFR')
afs(alpha = 1.594622, beta = -0.2846474, b = 0.297495, mac_bins = mac)
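The MAC bin requirements described for afs (columns named Lower and Upper, bins exhaustive and non-overlapping) can be checked before calling the function. The following is a minimal base-R sketch: the bin boundaries are made up for illustration, and `bins_are_valid` is a hypothetical helper, not a RAREsim function.

```r
# Hypothetical rare MAC bins; the boundaries are illustrative only.
mac_bins <- data.frame(
  Lower = c(1, 2, 3, 6, 11, 21),
  Upper = c(1, 2, 5, 10, 20, 50)
)

# Hypothetical helper (not part of RAREsim): TRUE when every bin has
# Lower <= Upper and each bin starts exactly one count after the
# previous bin ends, so there are no gaps and no overlaps.
bins_are_valid <- function(bins) {
  n <- nrow(bins)
  all(bins$Lower <= bins$Upper) &&
    all(bins$Lower[-1] == bins$Upper[-n] + 1)
}

bins_are_valid(mac_bins)
```

A data frame that passes this check has the shape expected for the mac_bins argument; the check itself says nothing about whether the chosen boundaries suit a given sample size.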
African/African American target data from gnomAD v2.1 (Karczewski, 2020)
data(afs_afr)
A data frame for the African/African American AFS target data. The first two columns define the MAC bin boundaries. The third column is the proportion of variants in that bin.
Used to fit the afs function with *fit_afs*
This function combines the Number of Variants and AFS functions to produce the expected number of variants per Kb in each MAC bin.
expected_variants(Total_num_var, mac_bin_prop)
Total_num_var |
estimated total number of variants in the region of interest |
mac_bin_prop |
The MAC bins to use, with three columns: Lower, Upper, and Prop. Lower and Upper define the MAC bin boundaries and Prop is the proportion of variants in each respective bin. Define bins only for rare variants. |
data frame with the MAC bins and expected variants
data('afs_afr')
mac <- afs_afr[,c(1:2)]
expected_variants(Total_num_var = 19.029*nvariant(pop='AFR', N = 8128), mac_bin_prop = afs(mac_bins = mac, pop = 'AFR'))
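The arithmetic behind expected_variants can be sketched in base R without the package. This sketch assumes that the expected count in each bin is the total number of variants multiplied by that bin's proportion, which is how the description above reads; it is not taken from the package source. All numbers and the column name `Expected_var` are hypothetical.

```r
# Made-up bin proportions with the three columns described above.
mac_bin_prop <- data.frame(
  Lower = c(1, 2, 3),
  Upper = c(1, 2, 5),
  Prop  = c(0.5, 0.2, 0.1)
)

# Made-up total number of variants in the region of interest.
total_num_var <- 1000

# Assumption: expected variants per bin = total variants * bin proportion.
expected <- data.frame(
  Lower        = mac_bin_prop$Lower,
  Upper        = mac_bin_prop$Upper,
  Expected_var = total_num_var * mac_bin_prop$Prop
)

expected
```

Because only rare bins are listed, the proportions need not sum to 1; the remainder corresponds to variants outside the rare MAC bins.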
This function takes AFS target data and estimates parameters for the AFS function. A data frame specifying the rare MAC bins and the observed proportion of variants in each bin is used to fit the data. By default, the proportion of rare variants (p_rv) is the sum of the rare MAC bin proportions. It can be specified manually if desired.
fit_afs(Observed_bin_props, p_rv = NULL)
Observed_bin_props |
data frame with 3 columns, Lower, Upper (of MAC bins) and proportion of variants in that MAC bin |
p_rv |
proportion of rare variants - default is the sum of the rare MAC bin proportions |
list of parameters - alpha, beta, and b as well as fitted proportions
data("afs_afr")
fit_afs(Observed_bin_props = afs_afr)
This function takes Number of Variants target data and estimates parameters for the Number of Variants function. A dataframe specifying the number of variants per Kb at various sample sizes is required to fit the data
fit_nvariant(Observed_variants_per_kb)
Observed_variants_per_kb |
A data frame with the first column sample size and the second variants per Kb, both numeric |
Vector of parameters - phi and omega
data("nvariant_afr")
fit_nvariant(nvariant_afr)
The Number of Variants (nvariant) function calculates the number of variants per kilobase, as described in RAREsim. N is the number of individuals, and the number of variants per kilobase changes with N. The default parameters are used if an ancestral population (AFR, EAS, NFE, or SAS) is specified. Otherwise, the parameters phi and omega must be provided. Phi and omega can be estimated from target data using the fit_nvariant function.
nvariant(phi = NULL, omega = NULL, N, pop = NULL)
phi |
parameter phi |
omega |
parameter omega |
N |
sample size in number of individuals |
pop |
population - only needs to be specified if using default parameters |
the number of variants per kb
nvariant(N = 8128, pop = 'AFR')
nvariant(phi = 0.1638108, omega = 0.6248848, N = 8128)
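To show how the per-kilobase value depends on N and how it scales to a region, here is a base-R sketch. It assumes the power-law form phi * N^omega for the Number of Variants function, which is the form recalled from the RAREsim publication and is not confirmed by this page; `nvariant_sketch` is a hypothetical stand-in, not the package function. The region length 19.029 Kb is taken from the expected_variants example.

```r
# Assumed functional form (not verified against the package source):
# variants per Kb = phi * N^omega.
nvariant_sketch <- function(phi, omega, N) {
  phi * N^omega
}

# Parameter values copied from the nvariant example above.
per_kb <- nvariant_sketch(phi = 0.1638108, omega = 0.6248848, N = 8128)

# Scale to a region: region length in Kb times variants per Kb.
region_kb <- 19.029
total_num_var <- region_kb * per_kb

# Under the assumed form with omega > 0, the value grows with sample size.
per_kb_small <- nvariant_sketch(phi = 0.1638108, omega = 0.6248848, N = 1000)
per_kb_small < per_kb
```

The scaled value plays the role of Total_num_var when it is passed to expected_variants.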
African target data for the Number of Variants function from gnomAD v2.1 (Karczewski, 2020)
data(nvariant_afr)
Number of Variants target data. First column is sample size and second column is number of variants per Kb