# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "RNAseqCovarImpute" in publications use:'
type: software
license: GPL-3.0-only
title: 'RNAseqCovarImpute: Impute Covariate Data in RNA Sequencing Studies'
version: 1.3.0
abstract: The RNAseqCovarImpute package makes linear model analysis for RNA sequencing
  read counts compatible with multiple imputation (MI) of missing covariates. A major
  problem with implementing MI in RNA sequencing studies is that the outcome data
  must be included in the imputation prediction models to avoid bias. This is difficult
  in omics studies with high-dimensional data. The first method we developed in the
  RNAseqCovarImpute package surmounts the problem of high-dimensional outcome data
  by binning genes into smaller groups to analyze pseudo-independently. This method
  implements covariate MI in gene expression studies by 1) randomly binning genes
  into smaller groups, 2) creating M imputed datasets separately within each bin,
  where the imputation predictor matrix includes all covariates and the log counts
  per million (CPM) for the genes within each bin, 3) estimating gene expression changes
  using `limma::voom` followed by `limma::lmFit`, separately on each of the M imputed
  datasets within each gene bin, 4) un-binning the gene sets and stacking the M sets
  of model results before using the `limma::squeezeVar` function to apply empirical
  Bayes variance shrinkage to each of the M sets of model results, 5) pooling the results
  with Rubin’s rules to produce combined coefficients, standard errors, and P-values,
  and 6) adjusting P-values for multiplicity to control the false discovery rate (FDR).
  A faster method uses principal component analysis (PCA) to avoid binning genes while
  still retaining outcome information in the MI models. Binning genes into smaller
  groups requires that the MI and limma-voom analyses be run many times (typically
  hundreds). The more computationally efficient MI PCA method implements covariate
  MI in gene expression studies by 1) performing PCA on the log CPM values for all
  genes using the Bioconductor `PCAtools` package, 2) creating M imputed datasets
  where the imputation predictor matrix includes all covariates and the optimum number
  of PCs to retain (e.g., based on Horn’s parallel analysis or the number of PCs that
  account for >80% of the variation), 3) running the standard limma-voom pipeline
  (`voom`, then `lmFit`, then `eBayes`) on each of the M imputed datasets, 4) pooling
  the results with Rubin’s rules to produce combined coefficients, standard errors,
  and P-values, and 5) adjusting P-values for multiplicity to control the false discovery
  rate (FDR).
authors:
- family-names: Baker
  given-names: Brennan
  email: brennanhilton@gmail.com
  orcid: https://orcid.org/0000-0001-5459-9141
- family-names: Sathyanarayana
  given-names: Sheela
- family-names: Szpiro
  given-names: Adam
- family-names: MacDonald
  given-names: James
- family-names: Paquette
  given-names: Alison
repository: https://bioc.r-universe.dev
repository-code: https://github.com/brennanhilton/RNAseqCovarImpute
url: https://github.com/brennanhilton/RNAseqCovarImpute
contact:
- family-names: Baker
  given-names: Brennan
  email: brennanhilton@gmail.com
  orcid: https://orcid.org/0000-0001-5459-9141