Title: | Spatial quantile normalization |
---|---|
Description: | The spqn package implements spatial quantile normalization (SpQN). This method was developed to remove a mean-correlation relationship in correlation matrices built from gene expression data. It can serve as a pre-processing step prior to a co-expression analysis. |
Authors: | Yi Wang [cre, aut], Kasper Daniel Hansen [aut] |
Maintainer: | Yi Wang <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.19.0 |
Built: | 2024-10-31 05:30:36 UTC |
Source: | https://github.com/bioc/spqn |
The spqn package implements spatial quantile normalization (SpQN). This method was developed to remove a mean-correlation relationship in correlation matrices built from gene expression data. It can serve as a pre-processing step prior to a co-expression analysis.
See references for details on spatial quantile normalization.
The main function is normalize_correlation. We include a number of plotting functions for examining the mean-correlation relationship; see the vignette for examples.
Y Wang, SC Hicks, KD Hansen (2020). Co-expression analysis is biased by a mean-correlation relationship. bioRxiv 2020.02.13.944777. doi:10.1101/2020.02.13.944777
This method was developed to remove a mean-correlation relationship in correlation matrices built from gene expression data. It can serve as a pre-processing step prior to a co-expression analysis.
normalize_correlation(cor_mat, ave_exp, ngrp, size_grp, ref_grp)
cor_mat |
A (square and symmetric) correlation matrix. |
ave_exp |
A vector of expression levels, with the same length as the number of rows of the correlation matrix in cor_mat. |
ngrp |
Number of bins in each row/column to be used to partition the correlation matrix, integer. |
size_grp |
Size of the outer bins used to approximate the distribution of the inner bins, in order to smooth the normalization. Note that the product of size_grp and ngrp must be equal to or larger than the number of rows/columns of cor_mat; when they are equal, the normalization is not smoothed. |
ref_grp |
Location of the reference bin on the diagonal, whose distribution will be used as target distribution in the normalization, an integer. |
A normalized correlation matrix.
if (require(spqnData)) {
    data(gtex.4k)
    # Gene-by-gene correlation matrix (genes are the rows of the assay)
    cor_ori <- cor(t(assay(gtex.4k)))
    ave_logrpkm <- rowData(gtex.4k)$ave_logrpkm
    normalize_correlation(cor_ori, ave_exp = ave_logrpkm, ngrp = 10, size_grp = 15, ref_grp = 9)
}
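The relationship between ngrp, size_grp and the size of cor_mat described in the arguments can be checked before calling normalize_correlation. The following is a minimal base-R sketch, not code from the package: check_bin_sizes is a hypothetical helper, and the use of cut() to form contiguous inner bins is an assumption made for illustration.

```r
# Hypothetical helper (not part of spqn): checks the documented constraint
# size_grp * ngrp >= nrow(cor_mat) and reports the inner bin sizes.
check_bin_sizes <- function(n_genes, ngrp, size_grp) {
  if (size_grp * ngrp < n_genes) {
    stop("size_grp * ngrp must be at least the number of rows of cor_mat")
  }
  # Assumption: inner bins are ngrp contiguous, near-equal runs of gene indices
  idx <- seq_len(n_genes)
  inner <- split(idx, cut(idx, breaks = ngrp, labels = FALSE))
  list(
    inner_sizes = lengths(inner),
    # Per the argument description, there is no smoothing when the product
    # equals the number of rows/columns
    smoothing = size_grp * ngrp > n_genes
  )
}

res <- check_bin_sizes(n_genes = 4000, ngrp = 20, size_grp = 300)
res$smoothing
```

Here the gene count of 4000 and the parameter values are illustrative choices only.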
The get_IQR_condition_exp function computes the IQRs of a set of 10 by 10 same-size bins that partition the correlation matrix, ordered according to expression level.
The plot_IQR_condition_exp function plots the IQR for each bin among a set of 10 by 10 same-size bins that partition the correlation matrix, with the IQR denoted by the width of the boxes in the plot.
get_IQR_condition_exp(cor_mat, ave_exp)
plot_IQR_condition_exp(IQR_list)
cor_mat |
Correlation matrix, computed from a gene expression matrix, with genes sorted by average expression level. |
ave_exp |
vector, average expression level of each gene for the normalized gene expression matrix. |
IQR_list |
List, the output of get_IQR_condition_exp. |
A plot with boxes that shows the IQR of each bin.
The mnemonic for condition_exp is ‘conditional on expression’.
if (require(spqnData)) {
    data(gtex.4k)
    cor_mat <- cor(t(assay(gtex.4k)))
    ave_logrpkm <- rowData(gtex.4k)$ave_logrpkm
    # Compute the per-bin IQRs, then plot them
    IQR_list <- get_IQR_condition_exp(cor_mat, ave_exp = ave_logrpkm)
    plot_IQR_condition_exp(IQR_list)
}
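To make the idea of per-bin IQRs concrete, here is a rough base-R sketch on simulated data. It is not the package's implementation: bin_iqr is a hypothetical helper, the simulated matrix stands in for real expression data, and dropping the diagonal within (i, i) bins is an assumption.

```r
set.seed(1)
# Simulated stand-in data (assumption): 200 genes x 30 samples
expr <- matrix(rnorm(200 * 30), nrow = 200)
ave_exp <- rowMeans(expr)

# Order genes by average expression, then correlate genes (rows)
ord <- order(ave_exp)
cor_mat <- cor(t(expr[ord, ]))

# Assign each gene to one of 10 contiguous, equal-size expression bins
n_bins <- 10
bin_id <- rep(seq_len(n_bins), each = nrow(cor_mat) / n_bins)

# Hypothetical helper: IQR of the correlations in bin (i, j)
bin_iqr <- function(i, j) {
  block <- cor_mat[bin_id == i, bin_id == j]
  # Within a diagonal bin, keep each gene pair once and drop the 1s
  if (i == j) block <- block[upper.tri(block)]
  IQR(block)
}

# 10 x 10 matrix of IQRs, one entry per bin
iqr_mat <- outer(seq_len(n_bins), seq_len(n_bins), Vectorize(bin_iqr))
round(diag(iqr_mat), 3)
```

Because the simulated genes are independent, the IQRs here should be roughly constant across bins; the plotting functions in this package exist to examine how this differs for real expression data.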
This function allows users to visualize the distributions of (assumed) signal and background, conditional on expression levels. The predicted signals are defined by the 0.1% highest correlations in each bin.
plot_signal_condition_exp(cor_mat, ave_exp, signal)
cor_mat |
Matrix, correlation matrix computed from a gene expression matrix. |
ave_exp |
Vector, average expression level of each gene in the normalized expression matrix. |
signal |
a value between 0 and 1 giving the fraction of
correlations which should be considered signal. We often use a value
of |
Invoked for the side effect of producing a plot.
The mnemonic for condition_exp is ‘conditional on expression’.
if (require(spqnData)) {
    data(gtex.4k)
    cor_mat <- cor(t(assay(gtex.4k)))
    ave_logrpkm <- rowData(gtex.4k)$ave_logrpkm
    plot_signal_condition_exp(cor_mat, ave_exp = ave_logrpkm, signal = 0.05)
}
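The split into (assumed) signal and background within a bin can be illustrated with a quantile cutoff. This is only a sketch under stated assumptions: split_signal is a hypothetical helper, the correlations are simulated, and the cutoff is applied to the raw correlation values, which may differ from what plot_signal_condition_exp does internally.

```r
set.seed(2)
# Simulated correlations standing in for one bin (assumption)
cors <- tanh(rnorm(10000, sd = 0.3))
signal <- 0.001  # fraction of correlations treated as signal (0.1%)

# Hypothetical helper: label the highest `signal` fraction as signal
split_signal <- function(x, signal) {
  cutoff <- quantile(x, probs = 1 - signal, names = FALSE)
  list(
    cutoff = cutoff,
    signal = x[x > cutoff],
    background = x[x <= cutoff]
  )
}

parts <- split_signal(cors, signal)
length(parts$signal)
```

Repeating this for every bin and comparing the resulting signal and background distributions across expression levels is the kind of comparison the plot is meant to support.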
We partition the correlation matrix into 10x10 bins of equal size, with genes ordered according to expression level. As the reference bin, we choose the (9,9) bin (i.e., the almost-highest expressed genes). We then make a QQ-plot of the (i,j)'th submatrix vs. the (9,9) submatrix. See the SpQN paper for details on these choices.
qqplot_condition_exp(cor_mat, ave_exp, i, j)
cor_mat |
Matrix, correlation matrix computed from a gene expression matrix. |
ave_exp |
Vector, average expression level of each gene for the normalized expression matrix. |
i |
Integer, row number of the submatrix (see details). |
j |
Integer, column number of the submatrix (see details). |
Invoked for the side effect of producing a plot.
The mnemonic for condition_exp is ‘conditional on expression’.
if (require(spqnData)) {
    data(gtex.4k)
    cor_mat <- cor(t(assay(gtex.4k)))
    ave_logrpkm <- rowData(gtex.4k)$ave_logrpkm
    # QQ-plot of the (1,1) bin against the (9,9) reference bin
    qqplot_condition_exp(cor_mat, ave_exp = ave_logrpkm, 1, 1)
}
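The comparison of the (i,j) bin with the (9,9) reference bin can be sketched in base R with stats::qqplot. This is an illustration on simulated data, not the package's code: bin_values is a hypothetical helper, and the choice of which bin goes on which axis is an assumption.

```r
set.seed(3)
# Simulated stand-in data (assumption): 200 genes x 30 samples
expr <- matrix(rnorm(200 * 30), nrow = 200)
ord <- order(rowMeans(expr))
cor_mat <- cor(t(expr[ord, ]))

# 10 contiguous expression bins of 20 genes each
bin_id <- rep(1:10, each = 20)

# Hypothetical helper: correlations in bin (i, j);
# within a diagonal bin, each gene pair is kept once
bin_values <- function(i, j) {
  block <- cor_mat[bin_id == i, bin_id == j]
  if (i == j) block[upper.tri(block)] else as.vector(block)
}

# Quantiles of bin (1,1) against those of the reference bin (9,9)
qq <- qqplot(bin_values(9, 9), bin_values(1, 1), plot.it = FALSE)
plot(qq, xlab = "bin (9,9) quantiles", ylab = "bin (1,1) quantiles")
abline(0, 1)  # points on this line indicate matching distributions
```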