Package 'mbQTL' reference manual

Title:	mbQTL: A package for SNP-Taxa mGWAS analysis
Description:	mbQTL is a statistical R package for simultaneous 16S rRNA, 16S rDNA (microbial) and variant, SNP, SNV (host) relationship, correlation and regression studies. We apply linear, logistic and correlation-based statistics to identify the relationships of taxa, genus, species and variant, SNP, SNV in the infected host. We produce various statistical significance measures such as P values, FDR, BC and probability estimation to show the significance of these relationships. Further, we provide various visualization functions for ease and clarification of the results of these analyses. The package is compatible with dataframe, MRexperiment and text formats.
Authors:	Mercedeh Movassagh [aut, cre], Steven Schiff [aut], Joseph N Paulson [aut]
Maintainer:	Mercedeh Movassagh <[email protected]>
License:	MIT + file LICENSE
Version:	1.7.0
Built:	2025-03-29 05:02:53 UTC
Source:	https://github.com/bioc/mbQTL

`mbQTL` is a package for microbial QTL/GWAS analysis

Description

This package provides statistical methods for identifying significant relationships between microbial taxa and host genetic SNP signatures. It offers three models: 1) linear regression across all taxa-snp pairs (main function: linearTaxaSnp()); 2) correlation analysis across all taxa and snps (main function: taxaSnpCor()); and 3) logistic regression between each taxon and each snp, either for all taxa simultaneously or for a specific taxon (main function: logRegSnpsTaxa()).

Author(s)

Maintainer: Mercedeh Movassagh [email protected] (ORCID)

Authors:

Steven Schiff
Joseph N Paulson

`allToAllProduct` creates a dataframe of snp and taxa correlations

Description

This function takes the snp dataframe and the taxa abundance dataframe and returns a dataframe of correlations between snps and taxa, optionally restricted to a specific rsID.

Usage

allToAllProduct(SnpFile, microbeAbund, rsID = NULL)

Arguments

`SnpFile`	the snp file (rownames is sample number and colnames is the snps)
`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`rsID`	Default is NULL, in which case the function runs across the whole dataset; otherwise a specific rsID/SNP/chr_region can be specified

Value

A dataframe of correlations between snps and taxa

Examples

data(microbeAbund)
data(SnpFile)
x <- allToAllProduct(SnpFile, microbeAbund, "chr1.171282963_T")

`binarizeMicrobe` binarizes microbe abundance file based on user's cutoff

Description

This function binarizes the taxa abundance dataframe according to a user-supplied cutoff, so that each taxon is called positive or negative in each sample.

Usage

binarizeMicrobe(microbeAbund, cutoff = NULL, selectmicrobe = NULL)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`cutoff`	cutoff at which the user chose to call taxa positive or negative across samples (should be a numeric value for normalized or count values).
`selectmicrobe`	default is NULL, in which case all taxa are considered at the same time; if the user is interested in a specific pathogen, use the name of the pathogen, for example "Haemophilus".

Value

A data frame of microbial abundance.
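The cutoff-based binarization described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the helper name `binarize_abundance` and the "strictly greater than the cutoff" rule are all assumptions made for illustration.

```r
# Toy abundance table: rows are samples, columns are taxa (hypothetical values).
abund <- data.frame(
  Haemophilus = c(0, 12, 3, 40),
  Neisseria   = c(7, 0, 0, 25),
  row.names   = paste0("Sample", 1:4)
)

# Hypothetical helper: 1 where abundance exceeds the cutoff, 0 otherwise.
# `taxa` optionally restricts the output to selected columns.
binarize_abundance <- function(abund, cutoff, taxa = NULL) {
  if (!is.null(taxa)) abund <- abund[, taxa, drop = FALSE]
  # The comparison yields a logical matrix; multiplying by 1L converts it to 0/1.
  as.data.frame((abund > cutoff) * 1L)
}

binary_all <- binarize_abundance(abund, cutoff = 5)
binary_one <- binarize_abundance(abund, cutoff = 5, taxa = "Haemophilus")
print(binary_all)
```

Restricting to one taxon mirrors the role of the `selectmicrobe` argument; how the package treats values exactly equal to the cutoff is not stated in this manual.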

`coringTaxa` creates correlation dataframe for taxa

Description

This function creates an output correlation data frame for all microbial taxa (or other organisms such as viral or parasitic taxa)

Usage

coringTaxa(microbeAbund)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)

Value

A data frame of correlations between taxa

Examples

data(microbeAbund)
x <- coringTaxa(microbeAbund)
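
To show the kind of output a taxa-taxa correlation step produces, here is a minimal base R sketch on toy data. It assumes Pearson correlation and a long (pairwise) output layout; the manual does not state which correlation method or layout `coringTaxa()` uses, so treat both as assumptions.

```r
# Toy abundance table: rows are samples, columns are taxa (hypothetical values).
abund <- data.frame(
  Haemophilus = c(0, 12, 3, 40, 8),
  Neisseria   = c(7, 0, 0, 25, 4),
  Moraxella   = c(5, 9, 1, 2, 6)
)

# Taxa-by-taxa correlation matrix (Pearson is an assumption here).
taxa_cor <- cor(abund, method = "pearson")

# Reshape the square matrix into a long data frame: one row per taxa pair.
taxa_cor_long <- as.data.frame(as.table(taxa_cor), responseName = "rho")
print(head(taxa_cor_long))
```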

`mbQTL` "CovFile"

Description

The "CovFile" is the covariate file for the linear regression option of taxa and snp association. The covariate file is generated randomly by assigning sex and site of collection to the samples. Rownames are covariates and colnames are samples.
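
The layout described above (covariates in rows, samples in columns) can be sketched with a toy data frame. The sample names and the numeric coding of sex and site are assumptions for illustration, not the contents of the shipped "CovFile".

```r
# Hypothetical sample identifiers; in practice these must match the
# sample names used in the abundance and snp data.
samples <- paste0("Sample", 1:4)

# One row per covariate, one column per sample (numeric coding is assumed).
cov_toy <- data.frame(rbind(
  sex  = c(0, 1, 1, 0),
  site = c(1, 1, 2, 2)
))
colnames(cov_toy) <- samples
print(cov_toy)

# A quick consistency check against a sample-by-taxa table.
abund <- data.frame(Haemophilus = c(0, 12, 3, 40), row.names = samples)
samples_match <- identical(colnames(cov_toy), rownames(abund))
```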

`histPvalueLm` a histogram of taxa and SNP linear regression analysis

Description

This function creates a histogram object of the p values from the linear regression analysis of all SNPs with all taxa.

Usage

histPvalueLm(LinearAnalysisTaxaSNP)

Arguments

`LinearAnalysisTaxaSNP`	the data frame result created from the linearTaxaSnp() function.

Value

A histogram object of p values observed from taxa and SNP Linear Regression analysis.

Examples

data(microbeAbund)
data(SnpFile)
data(CovFile)
LinearAnalysisTaxaSNPFile <- linearTaxaSnp(microbeAbund, SnpFile, Covariate = CovFile)
x <- histPvalueLm(LinearAnalysisTaxaSNPFile)

`linearTaxaSnp` Performs linear regression analysis between taxa and SNPs and returns concordance statistics

Description

This function creates a dataframe output from the linear regression analysis of all snps with all taxa in the dataset. The result is a dataframe with P values and FDRs of all regressions. MatrixEQTL core functions are used to achieve this; the main function used is Matrix_eQTL_engine(), assuming linear regression with or without a covariate file.

Usage

linearTaxaSnp(microbeAbund, SnpFile, Covariate = NULL)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`SnpFile`	the snp dataframe (values 0,1,2 indicating zygosity), rownames sample names and colnames snp names.
`Covariate`	default is NULL, hence assumed non-existent. If covariates are available they need to be formatted in the CovFile format, that is, colnames are sample numbers matching the samples in the microbe abundance and snp files and rownames are the covariate names (such as sex, disease etc).

Value

A data frame containing the results of linear regression for all snp, taxa relationships, with P values and corrected P values.

Examples

data(microbeAbund)
data(SnpFile)
data(CovFile)
x <- linearTaxaSnp(microbeAbund, SnpFile, Covariate = CovFile)
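
For intuition about what an all-snp by all-taxa linear regression with a covariate computes, the following sketch uses base R `lm()` on simulated data. It is a stand-in, not the MatrixEQTL engine that the package calls; the simulated data, the variable names and the Benjamini-Hochberg adjustment are assumptions.

```r
set.seed(1)
n <- 30

# Simulated genotypes (0, 1, 2) and abundances; rows are samples.
snp  <- data.frame(snpA = sample(0:2, n, replace = TRUE),
                   snpB = sample(0:2, n, replace = TRUE))
taxa <- data.frame(taxon1 = rnorm(n), taxon2 = rnorm(n))
sex  <- rbinom(n, 1, 0.5)  # a single hypothetical covariate

# Every snp-taxon combination gets its own regression.
pairs <- expand.grid(snp = names(snp), taxon = names(taxa),
                     stringsAsFactors = FALSE)

pairs$pvalue <- mapply(function(s, tx) {
  fit <- lm(taxa[[tx]] ~ snp[[s]] + sex)
  # Row 2 of the coefficient table is the genotype term.
  summary(fit)$coefficients[2, "Pr(>|t|)"]
}, pairs$snp, pairs$taxon, USE.NAMES = FALSE)

# Multiple-testing correction across all pairs (BH false discovery rate).
pairs$FDR <- p.adjust(pairs$pvalue, method = "BH")
print(pairs)
```

Looping over `lm()` is far slower than a matrix-based engine on real data; the sketch only shows the model being fitted for each pair.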

`logitPlotSnpTaxa` produces bar plots for counts of ref vs alt vs het alleles for particular rsID taxa combinations

Description

This function produces a bar plot comparing the counts of reference, heterozygous and alternate genotypes for a selected rsID and taxon combination.

Usage

logitPlotSnpTaxa(
  microbeAbund,
  SnpFile,
  selectmicrobe = NULL,
  rsID,
  ref = NULL,
  alt = NULL,
  het = NULL,
  color = NULL,
  cutoff = NULL
)

Arguments

`microbeAbund`	original microbe abundance file (colnames microbe, rownames sample IDs)
`SnpFile`	original snp file (0,1,2 values for ref, het, alt genotypes), colnames SNP names, rownames sample IDs.
`selectmicrobe`	name of the microbe of interest (for example individual significant microbes associated with a snp).
`rsID`	name of the snp of interest (for example individual significant snps associated with a microbe)
`ref`	the name of the reference genotype, for example "GG"
`alt`	the name of the snp (variant) genotype, for example "AA"
`het`	the name of the heterozygote genotype, for example "GA"
`color`	the default is NULL and the color is set to c("#ffaa1e", "#87365b").
`cutoff`	cutoff at which we call microbe present or absent

Value

A bar ggplot comparing the counts of ref vs alt vs het genotype

Examples

data(microbeAbund)
data(SnpFile)
x <- logitPlotSnpTaxa(microbeAbund, SnpFile,
  selectmicrobe = "Neisseria", rsID = "chr2.241072116_A",
  ref = NULL, alt = NULL, het = NULL, color = NULL, cutoff = NULL
)
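
The counts behind such a plot can be sketched in base R as a genotype by presence/absence table. This uses `table()` and `barplot()` rather than the ggplot output the function returns; the toy data, the genotype labels and the cutoff of 5 are assumptions.

```r
# Toy data for one snp and one taxon (hypothetical values).
genotype  <- c(0, 1, 2, 0, 1, 1, 2, 0)       # 0 = ref, 1 = het, 2 = alt
abundance <- c(0, 14, 30, 2, 9, 0, 22, 1)

# Label genotypes and call the taxon present above an assumed cutoff of 5.
labels <- factor(genotype, levels = 0:2, labels = c("GG", "GA", "AA"))
status <- factor(abundance > 5, levels = c(FALSE, TRUE),
                 labels = c("absent", "present"))

# Cross-tabulate genotype against presence/absence.
counts <- table(genotype = labels, taxon = status)
print(counts)

# Grouped bars: one group per genotype, one bar per status.
barplot(t(counts), beside = TRUE, col = c("#ffaa1e", "#87365b"),
        legend.text = TRUE, xlab = "Genotype", ylab = "Sample count")
```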

`logRegSnpsTaxa` Performs logistic regression analysis between taxa and SNPs and returns concordance statistics

Description

This function creates a dataframe output from the logistic regression of either a single taxon against all snps or all taxa against all snps in the dataset. The result is a dataframe with P values and FDRs of all regressions.

Usage

logRegSnpsTaxa(microbeAbund, SnpFile, cutoff = NULL, selectmicrobe = NULL)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`SnpFile`	the snp dataframe (values 0,1,2 indicating zygosity), rownames sample names and colnames snp names.
`cutoff`	default is NULL. The cutoff at which the specific taxon or all taxa are considered positive for the pathogen; anything above the cutoff is considered positive (1 indicates positive and 0 negative).
`selectmicrobe`	default is NULL, in which case all taxa are considered at the same time; if the user is interested in a specific pathogen, use the name of the pathogen, for example "Haemophilus".

Value

A data frame which is a result of Logistic regression products of individual snp, taxa relationships, with P values and P value corrected values (FDR, Bonferroni).

Examples

data(microbeAbund)
data(SnpFile)
x <- logRegSnpsTaxa(microbeAbund, SnpFile, selectmicrobe = c("Haemophilus"))
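
As a rough illustration of the statistics reported (P values with FDR and Bonferroni corrections), the sketch below fits a per-snp logistic regression with base R `glm()` on simulated data. It is not the package's code; the simulated data, the cutoff of 5 and the choice of Benjamini-Hochberg for the FDR are assumptions.

```r
set.seed(2)
n <- 60

# Simulated genotypes for three snps; rows are samples.
snps <- data.frame(
  snpA = sample(0:2, n, replace = TRUE),
  snpB = sample(0:2, n, replace = TRUE),
  snpC = sample(0:2, n, replace = TRUE)
)

# Abundance of one taxon, simulated to depend on snpA, then binarized
# at an assumed cutoff (1 = positive, 0 = negative).
abundance <- rpois(n, lambda = 4 + 2 * snps$snpA)
present   <- as.integer(abundance > 5)

# One logistic regression per snp; keep the genotype term's p value.
pvals <- vapply(names(snps), function(s) {
  fit <- glm(present ~ snps[[s]], family = binomial())
  summary(fit)$coefficients[2, "Pr(>|z|)"]
}, numeric(1))

result <- data.frame(
  snp        = names(pvals),
  pvalue     = unname(pvals),
  FDR        = p.adjust(pvals, method = "BH"),
  Bonferroni = p.adjust(pvals, method = "bonferroni"),
  row.names  = NULL
)
print(result)
```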

`mbQtlCorHeatmap` for making heatmap for snp, taxa rho values

Description

This function produces a heatmap of the log+1 transformed correlation rho values across the snp, taxa dataset.

Usage

mbQtlCorHeatmap(final_var_long, labels_col = NULL, ...)

Arguments

`final_var_long`	the long data frame of rho values created by the `taxaSnpCor()` function.
`labels_col`	set to NULL as default; if TRUE, labels will appear on the heatmap.
`...`	all other parameters for pheatmap.

Value

A heatmap of the snp, taxa correlation rho values.

Examples


data(microbeAbund)
data(SnpFile)
for_all_rsids <- allToAllProduct(SnpFile, microbeAbund)
correlationMicrobes <- coringTaxa(microbeAbund)
taxaSnpCor(for_all_rsids, correlationMicrobes)
final_var_long <- taxaSnpCor(for_all_rsids, correlationMicrobes, probs = c(0.0001, 0.9999))
x <- mbQtlCorHeatmap(final_var_long)

`mbQTL` "metagenomeSeqObj"

Description

"metagenomeSeqObj" is an `MRexperiment` object format of the "microbeAbund" file.

Written by Mercedeh Movassagh [email protected], January 2023

`metagenomeSeqToMbqtl` Converts metagenomeSeq obj to compatible taxa dataframe

Description

This function takes an MRexperiment class object, transforms it and makes the resulting dataframe compatible with the mbQTL taxa input file

Usage

metagenomeSeqToMbqtl(meta_glom, norm, log, aggregate_taxa = NULL)

Arguments

`meta_glom`	`MRexperiment` class obj from `metagenomeSeq` package.
`norm`	A logical indicating whether or not to return normalized counts.
`log`	A logical indicating whether or not to log2 transform the counts.
`aggregate_taxa`	it is recommended that the normalization occurs at taxa level (default NULL); however, the user has the option to aggregate on the phyla/family/Genus or Species level before normalization.

Value

A data frame of normalized/not normalized counts compatible with mbQTL.

Examples

data(metagenomeSeqObj)
x <- metagenomeSeqToMbqtl(metagenomeSeqObj, norm = TRUE, log = TRUE, aggregate_taxa = NULL)

`mbQTL` "microbiomeAbund" File

Description

This is the microbial abundance file generated from 16S data; the input is either this file or the "metaGenomeSeqObj". The "microbiomeAbund" file is a randomly generated file format for total microbe presence (number of reads) (parasite/viral transcripts) for specific species. This could be generated from select taxa results from biom() or MRexperiment objects as long as the samples are colnames and taxa are rownames.

`prepareCorData` prepares and joins snp-taxa and taxa-taxa correlation file.

Description

This function prepares and joins the snp and taxa data and returns a single formatted dataframe.

Usage

prepareCorData(microbeAbund, SnpFile, cutoff = NULL, selectmicrobe = NULL)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`SnpFile`	the snp dataframe (values 0,1,2 indicating zygosity), rownames sample names and colnames snp names.
`cutoff`	default is NULL. The cutoff at which the specific taxon or all taxa are considered positive for the pathogen; anything above the cutoff is considered positive (1 indicates positive and 0 negative).
`selectmicrobe`	default is NULL, in which case all taxa are considered at the same time; if the user is interested in a specific pathogen, use the name of the pathogen, for example "Haemophilus".

Value

A data frame which is a result of Logistic regression products of individual snp, taxa relationships, with P values and P value corrected values.

`qqPlotLm` creates QQ-Plot of all SNPs with all taxa linear regression analysis

Description

This function creates a QQ-Plot object of expected versus observed P values from the linear regression analysis of all SNPs with all taxa.

Usage

qqPlotLm(microbeAbund, SnpFile, Covariate = NULL)

Arguments

`microbeAbund`	the taxa abundance dataframe (rownames sample names and colnames taxa Genus/species/family)
`SnpFile`	the snp dataframe (values 0,1,2 indicating zygosity), rownames sample names and colnames snp names.
`Covariate`	default is NULL, hence assumed non-existent. If covariates are available they need to be formatted in the CovFile format, that is, colnames are sample numbers matching the samples in the microbe abundance and snp files and rownames are the covariate names (such as sex, disease etc).

Value

A QQplot object of expected versus observed P values from the taxa and SNP linear regression analysis

Examples

data(microbeAbund)
data(SnpFile)
data(CovFile)
x <- qqPlotLm(microbeAbund, SnpFile, Covariate = CovFile)
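
The expected-versus-observed comparison behind a P value QQ plot can be sketched in base R. The p values here are simulated stand-ins rather than regression output, and the use of `ppoints()` and the -log10 scale are common conventions assumed for illustration, not confirmed details of `qqPlotLm()`.

```r
set.seed(3)

# Stand-in p values; in practice these would come from the regressions.
pvals <- runif(200)

# Observed quantiles: sorted p values on the -log10 scale.
observed <- -log10(sort(pvals))

# Expected quantiles under the null (uniform distribution of p values).
expected <- -log10(ppoints(length(pvals)))

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     main = "QQ plot of p values")
abline(0, 1, col = "red")  # points on this line match the null expectation
```

Points rising above the diagonal at the upper right suggest more small p values than expected by chance.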

`RegSnp` creates a dataframe of parsed long snp files

Description

This internal function takes the original snp dataframe and returns a long parsed snp dataframe

Usage

RegSnp(SnpFile, microbeAbund)

Arguments

`SnpFile`	the snp file (rownames is sample number and colnames is the snps)
`microbeAbund`	the microbial abundance file (rownames is sample number and colnames is the microbe)

Value

A long parsed dataframe of snps

`mbQTL` "SnpFile"

Description

The "SnpFile" is randomly generated from GATK (Van der Auwera GA & O'Connor BD. (2020). Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (1st Edition). O'Reilly Media) snp calls followed by plink (Purcell S, et al. (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics) processing. It needs to be in (0,1,2) format, representing the zygosity of the snps.

`taxaSnpCor` for estimation of the rho value between snp, taxa correlations across datasets

Description

This function estimates the correlation rho values between snps and taxa across the dataset and returns them as a long dataframe.

Usage

taxaSnpCor(for_all_rsids, correlationMicrobes, probs = NULL)

Arguments

`for_all_rsids`	A dataframe result of correlation analysis between the snps and taxa dataframe, an output of `allToAllProduct()` function.
`correlationMicrobes`	A dataframe of correlations between taxa, an output of the `coringTaxa()` function.
`probs`	Default is NULL, in which case all rho values are returned; if only a subset of rho values is wanted, it can be selected using c(`x`,`y`), for example c(0.0001, 0.9999).

Value

A long data frame of rho values for the snp, taxa correlations

Examples

data(microbeAbund)
data(SnpFile)

for_all_rsids <- allToAllProduct(SnpFile, microbeAbund)
correlationMicrobes <- coringTaxa(microbeAbund)
x <- taxaSnpCor(for_all_rsids, correlationMicrobes)
