Package: RNAseqCovarImpute 1.11.0

Brennan Baker

RNAseqCovarImpute: Impute Covariate Data in RNA Sequencing Studies

The RNAseqCovarImpute package makes linear model analysis for RNA sequencing read counts compatible with multiple imputation (MI) of missing covariates. A major problem with implementing MI in RNA sequencing studies is that the outcome data must be included in the imputation prediction models to avoid bias. This is difficult in omics studies with high-dimensional data. The first method we developed in the RNAseqCovarImpute package surmounts the problem of high-dimensional outcome data by binning genes into smaller groups that are analyzed pseudo-independently. This method implements covariate MI in gene expression studies by 1) randomly binning genes into smaller groups, 2) creating M imputed datasets separately within each bin, where the imputation predictor matrix includes all covariates and the log counts per million (CPM) for the genes within each bin, 3) estimating gene expression changes using the `limma::voom` function followed by the `limma::lmFit` function, separately on each of the M imputed datasets within each gene bin, 4) un-binning the gene sets and stacking the M sets of model results before applying the `limma::squeezeVar` function, a variance-shrinking Bayesian procedure, to each of the M sets of model results, 5) pooling the results with Rubin’s rules to produce combined coefficients, standard errors, and P-values, and 6) adjusting P-values for multiplicity to control the false discovery rate (FDR).
Binning genes into smaller groups requires that the MI and limma-voom analysis be run many times (typically hundreds). A faster method uses principal component analysis (PCA) to avoid binning genes while still retaining outcome information in the MI models. This more computationally efficient MI PCA method implements covariate MI in gene expression studies by 1) performing PCA on the log CPM values for all genes using the Bioconductor `PCAtools` package, 2) creating M imputed datasets where the imputation predictor matrix includes all covariates and the optimum number of PCs to retain (e.g., based on Horn’s parallel analysis or the number of PCs that account for more than 80% of the explained variation), 3) conducting the standard limma-voom pipeline with the `voom`, `lmFit`, and `eBayes` functions, in that order, on each of the M imputed datasets, 4) pooling the results with Rubin’s rules to produce combined coefficients, standard errors, and P-values, and 5) adjusting P-values for multiplicity to control the false discovery rate (FDR).
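The pooling and FDR steps shared by both methods can be illustrated with a minimal base R sketch. It is not the package's implementation: `pool_rubin` is a hypothetical helper, the per-imputation coefficients and standard errors are simulated rather than produced by `lmFit`, and the degrees of freedom follow the classic Rubin formula, which may differ from what `combine_rubins` uses internally.

```r
# Pool one coefficient across M imputed-data fits with Rubin's rules.
# `estimates` and `std_errors` are length-M numeric vectors (hypothetical inputs).
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  stopifnot(m >= 2, length(std_errors) == m)
  q_bar <- mean(estimates)              # pooled coefficient
  u_bar <- mean(std_errors^2)           # within-imputation variance
  b <- var(estimates)                   # between-imputation variance
  total_var <- u_bar + (1 + 1 / m) * b  # total variance
  # Classic Rubin degrees of freedom; infinite when the estimates are identical
  df <- if (b > 0) (m - 1) * (1 + u_bar / ((1 + 1 / m) * b))^2 else Inf
  t_stat <- q_bar / sqrt(total_var)
  p_value <- 2 * pt(-abs(t_stat), df = df)
  c(coef = q_bar, se = sqrt(total_var), df = df, t = t_stat, p = p_value)
}

# Simulated stand-in for per-gene results from M = 10 imputed datasets
set.seed(1)
n_genes <- 5
m <- 10
coefs <- matrix(rnorm(n_genes * m, mean = 0.5, sd = 0.1), nrow = n_genes)
ses <- matrix(0.2, nrow = n_genes, ncol = m)

# Pool each gene separately, then adjust the pooled P-values for multiplicity
pooled <- t(sapply(seq_len(n_genes), function(g) pool_rubin(coefs[g, ], ses[g, ])))
pooled <- as.data.frame(pooled)
pooled$p_adj <- p.adjust(pooled$p, method = "BH")  # Benjamini-Hochberg FDR
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term carries the extra uncertainty due to the missing covariate values.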

Authors: Brennan Baker [aut, cre], Sheela Sathyanarayana [aut], Adam Szpiro [aut], James MacDonald [aut], Alison Paquette [aut]

RNAseqCovarImpute_1.11.0.tar.gz
RNAseqCovarImpute_1.11.0.zip (r-4.7) | RNAseqCovarImpute_1.11.0.zip (r-4.6) | RNAseqCovarImpute_1.11.0.zip (r-4.5)
RNAseqCovarImpute_1.11.0.tgz (r-4.6-any) | RNAseqCovarImpute_1.11.0.tgz (r-4.5-any)
RNAseqCovarImpute_1.11.0.tar.gz (r-4.7-any) | RNAseqCovarImpute_1.11.0.tar.gz (r-4.6-any)
RNAseqCovarImpute_1.11.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
RNAseqCovarImpute/json (API)

# Install 'RNAseqCovarImpute' in R:
install.packages('RNAseqCovarImpute', repos = c('https://bioc.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/brennanhilton/rnaseqcovarimpute/issues

Datasets: example_data, example_DGE

On Bioconductor: RNAseqCovarImpute-1.11.0 (bioc 3.24) | RNAseqCovarImpute-1.10.0 (bioc 3.23)

rnaseq | geneexpression | differentialexpression | sequencing

4.30 score 1 stars 6 scripts 5 exports 75 dependencies

Last updated from: 3ddac52327. Checks: 8 NOTE, 2 OK. Indexed: yes.

Target | Result | Time
bioc-checks | NOTE | 192
linux-devel-x86_64 | NOTE | 272
source / vignettes | OK | 247
linux-release-x86_64 | NOTE | 279
macos-release-arm64 | NOTE | 134
macos-oldrel-arm64 | NOTE | 172
windows-devel | NOTE | 243
windows-release | NOTE | 235
windows-oldrel | NOTE | 218
wasm-release | OK | 154

Exports: combine_rubins, get_gene_bin_intervals, impute_by_gene_bin, limmavoom_imputed_data_list, limmavoom_imputed_data_pca

Dependencies: backports, BH, Biobase, BiocGenerics, BiocParallel, bit, bit64, boot, broom, cli, clipr, codetools, cpp11, crayon, dplyr, edgeR, forcats, foreach, formatR, futile.logger, futile.options, generics, glmnet, glue, haven, hms, iterators, jomo, lambda.r, lattice, lifecycle, limma, lme4, locfit, magrittr, MASS, Matrix, mice, minqa, mitml, nlme, nloptr, nnet, numDeriv, ordinal, pan, pillar, pkgconfig, prettyunits, progress, purrr, R6, rbibutils, Rcpp, RcppEigen, Rdpack, readr, reformulas, rlang, rpart, shape, snow, statmod, stringi, stringr, survival, tibble, tidyr, tidyselect, tzdb, ucminf, utf8, vctrs, vroom, withr

Impute Covariate Data in RNA-sequencing Studies
Introduction | Installation | Generate random data with missing covariate data | RNAseqCovarImpute Demonstration | MI PCA method | Conduct PCA | Conduct MI with mice | Conduct limma-voom analysis | Adjust for FDR | Gene binning MI method | Bin the genes into smaller groups | Make imputed data sets for each bin of genes and conduct differential expression analysis | Estimate gene expression changes using voom followed by lmFit functions, separately on each M imputed dataset within each gene bin | Apply variance shrinking Bayesian procedure, pooling results with Rubin’s rules, and FDR-adjust P-values | Session info

Last update: 2024-04-12
Started: 2023-03-09

Example Data for RNAseqCovarImpute
Generate random data | Simulate missingness in the random data | Session info

Last update: 2023-10-03
Started: 2023-03-16

Readme and manuals

Help Manual

Help page | Topics
combine_rubins | combine_rubins
Simulated dataset | example_data
Simulated counts in DGE list | example_DGE
get_gene_bin_intervals | get_gene_bin_intervals
impute_by_gene_bin | impute_by_gene_bin
limmavoom_imputed_data_list | limmavoom_imputed_data_list
limmavoom_imputed_data_pca | limmavoom_imputed_data_pca