Package: RNAseqCovarImpute 1.5.0

Brennan Baker

RNAseqCovarImpute: Impute Covariate Data in RNA Sequencing Studies

The RNAseqCovarImpute package makes linear model analysis for RNA sequencing read counts compatible with multiple imputation (MI) of missing covariates. A major problem with implementing MI in RNA sequencing studies is that the outcome data must be included in the imputation prediction models to avoid bias. This is difficult in omics studies with high-dimensional data.

The first method we developed in the RNAseqCovarImpute package surmounts the problem of high-dimensional outcome data by binning genes into smaller groups to analyze pseudo-independently. This method implements covariate MI in gene expression studies by:

1) randomly binning genes into smaller groups;
2) creating M imputed datasets separately within each bin, where the imputation predictor matrix includes all covariates and the log counts per million (CPM) for the genes within each bin;
3) estimating gene expression changes using the `limma::voom` followed by `limma::lmFit` functions, separately on each of the M imputed datasets within each gene bin;
4) un-binning the gene sets and stacking the M sets of model results before applying the `limma::squeezeVar` function, a variance-shrinking Bayesian procedure, to each of the M sets of model results;
5) pooling the results with Rubin's rules to produce combined coefficients, standard errors, and P-values; and
6) adjusting P-values for multiplicity to control the false discovery rate (FDR).

A faster method uses principal component analysis (PCA) to avoid binning genes while still retaining outcome information in the MI models, because binning genes into smaller groups requires that the MI and limma-voom analysis be run many times (typically hundreds).
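As a rough illustration of step 1 of the binning method, the sketch below randomly assigns genes to bins of roughly equal size using base R only. The helper `bin_genes` and its argument `genes_per_bin` are hypothetical names chosen for this example; this is not the package's `get_gene_bin_intervals` function, whose interface is documented in the help manual.

```r
# Hypothetical helper (not part of RNAseqCovarImpute): randomly assign
# genes to bins holding at most `genes_per_bin` genes each.
bin_genes <- function(gene_ids, genes_per_bin) {
  stopifnot(length(gene_ids) >= 1, genes_per_bin >= 1)
  # Shuffle by index so that a length-1 input is handled safely
  shuffled <- gene_ids[sample.int(length(gene_ids))]
  n_bins <- ceiling(length(shuffled) / genes_per_bin)
  # Labels 1..n_bins, each repeated genes_per_bin times, cut to the gene count
  bin_id <- rep(seq_len(n_bins), each = genes_per_bin)[seq_along(shuffled)]
  split(shuffled, bin_id)
}

set.seed(1)                           # reproducible random assignment
genes <- paste0("gene", 1:25)         # toy gene identifiers
bins <- bin_genes(genes, genes_per_bin = 10)
lengths(bins)                         # bin sizes: 10, 10, 5
```

Each element of `bins` would then be imputed and analyzed separately. The last bin is smaller whenever the number of genes is not a multiple of the bin size.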
The more computationally efficient MI PCA method implements covariate MI in gene expression studies by:

1) performing PCA on the log CPM values for all genes using the Bioconductor `PCAtools` package;
2) creating M imputed datasets where the imputation predictor matrix includes all covariates and the retained principal components (PCs), with the number of PCs to retain chosen, e.g., by Horn's parallel analysis or as the number of PCs that account for more than 80% of the explained variation;
3) conducting the standard limma-voom pipeline with the `voom` followed by `lmFit` followed by `eBayes` functions on each of the M imputed datasets;
4) pooling the results with Rubin's rules to produce combined coefficients, standard errors, and P-values; and
5) adjusting P-values for multiplicity to control the false discovery rate (FDR).
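Both methods end by pooling the M sets of model results with Rubin's rules and then adjusting for multiplicity. The sketch below shows the textbook form of Rubin's rules in base R, followed by a Benjamini-Hochberg adjustment via `p.adjust`. It is a minimal illustration under stated assumptions, not the package's `combine_rubins` implementation: the function name `pool_rubin`, the gene-by-imputation input matrices, the toy numbers, and the use of the classic Rubin (1987) degrees of freedom are all choices made for this example, and the variance shrinkage step is omitted.

```r
# Hypothetical helper (not part of RNAseqCovarImpute): pool per-gene
# coefficients and standard errors across M imputations.
# coef_mat, se_mat: genes in rows, imputations in columns.
pool_rubin <- function(coef_mat, se_mat) {
  stopifnot(identical(dim(coef_mat), dim(se_mat)), ncol(coef_mat) >= 2)
  M <- ncol(coef_mat)
  q_bar <- rowMeans(coef_mat)               # pooled coefficient
  u_bar <- rowMeans(se_mat^2)               # within-imputation variance
  b <- apply(coef_mat, 1, var)              # between-imputation variance
  total_var <- u_bar + (1 + 1 / M) * b      # total variance
  # Classic Rubin degrees of freedom; infinite when b is zero
  df <- (M - 1) * (1 + u_bar / ((1 + 1 / M) * b))^2
  t_stat <- q_bar / sqrt(total_var)
  p <- 2 * pt(-abs(t_stat), df = df)        # two-sided P-value
  data.frame(coef = q_bar, se = sqrt(total_var), df = df, p = p,
             p_adj = p.adjust(p, method = "BH"))  # FDR adjustment
}

# Toy input: 3 genes, M = 3 imputations (made-up numbers)
coef_mat <- rbind(geneA = c(1.0, 1.2, 0.8),
                  geneB = c(0.1, -0.1, 0.0),
                  geneC = c(2.0, 2.0, 2.0))
se_mat <- matrix(0.5, nrow = 3, ncol = 3)
res <- pool_rubin(coef_mat, se_mat)
res
```

The pooled standard error grows with disagreement between imputations: `geneA` has a larger pooled standard error than `geneC`, whose coefficient is identical in every imputed dataset.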

Authors: Brennan Baker [aut, cre], Sheela Sathyanarayana [aut], Adam Szpiro [aut], James MacDonald [aut], Alison Paquette [aut]

# Install 'RNAseqCovarImpute' in R:

install.packages('RNAseqCovarImpute', repos = c('https://bioc.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/brennanhilton/rnaseqcovarimpute/issues

Datasets:

example_DGE - Simulated counts in DGE list
example_data - Simulated dataset

On Bioconductor: RNAseqCovarImpute-1.5.0 (bioc 3.21), RNAseqCovarImpute-1.4.0 (bioc 3.20)

rnaseq geneexpression differentialexpression sequencing

4.48 score, 1 star, 6 scripts, 171 downloads, 5 exports, 76 dependencies

Last updated 5 months ago from: f490cddb97. Checks: 1 OK, 8 NOTE. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 30 2025
R-4.5-win	NOTE	Mar 30 2025
R-4.5-mac	NOTE	Mar 30 2025
R-4.5-linux	NOTE	Mar 30 2025
R-4.4-win	NOTE	Mar 30 2025
R-4.4-mac	NOTE	Mar 30 2025
R-4.4-linux	NOTE	Mar 30 2025
R-4.3-win	NOTE	Mar 30 2025
R-4.3-mac	NOTE	Mar 30 2025

Exports: combine_rubins get_gene_bin_intervals impute_by_gene_bin limmavoom_imputed_data_list limmavoom_imputed_data_pca

Dependencies: backports BH Biobase BiocGenerics BiocParallel bit bit64 boot broom cli clipr codetools cpp11 crayon dplyr edgeR fansi forcats foreach formatR futile.logger futile.options generics glmnet glue haven hms iterators jomo lambda.r lattice lifecycle limma lme4 locfit magrittr MASS Matrix mice minqa mitml nlme nloptr nnet numDeriv ordinal pan pillar pkgconfig prettyunits progress purrr R6 rbibutils Rcpp RcppEigen Rdpack readr reformulas rlang rpart shape snow statmod stringi stringr survival tibble tidyr tidyselect tzdb ucminf utf8 vctrs vroom withr

Example Data for RNAseqCovarImpute

Brennan H Baker

Rendered from Example_Data_for_RNAseqCovarImpute.Rmd using knitr::rmarkdown on Mar 30 2025.

Last update: 2023-10-03
Started: 2023-03-16

Impute Covariate Data in RNA-sequencing Studies

Brennan H Baker

Rendered from Impute_Covariate_Data_in_RNA_sequencing_Studies.Rmd using knitr::rmarkdown on Mar 30 2025.

Last update: 2024-04-12
Started: 2023-03-09

Help page	Topics
combine_rubins	combine_rubins
Simulated dataset	example_data
Simulated counts in DGE list	example_DGE
get_gene_bin_intervals	get_gene_bin_intervals
impute_by_gene_bin	impute_by_gene_bin
limmavoom_imputed_data_list	limmavoom_imputed_data_list
limmavoom_imputed_data_pca	limmavoom_imputed_data_pca
