Package 'CNVRanger' reference manual

Title:	Summarization and expression/phenotype association of CNV ranges
Description:	The CNVRanger package implements a comprehensive tool suite for CNV analysis. This includes functionality for summarizing individual CNV calls across a population, assessing overlap with functional genomic regions, and association analysis with gene expression and quantitative phenotypes.
Authors:	Ludwig Geistlinger [aut, cre], Vinicius Henrique da Silva [aut], Marcel Ramos [ctb], Levi Waldron [ctb]
Maintainer:	Ludwig Geistlinger <[email protected]>
License:	Artistic-2.0
Version:	1.23.0
Built:	2025-03-27 05:56:25 UTC
Source:	https://github.com/bioc/CNVRanger

CNV-expression association analysis

Description

Testing CNV regions for effects on the expression level of genes in defined genomic windows.

Usage

cnvEQTL(
  cnvrs,
  calls,
  rcounts,
  data,
  window = "1Mbp",
  multi.calls = .largest,
  min.samples = 10,
  min.state.freq = 3,
  de.method = c("edgeR", "limma"),
  padj.method = "BH",
  filter.by.expr = TRUE,
  verbose = FALSE
)

Arguments

`cnvrs`	A `GRanges` object containing the summarized CNV regions, e.g. as obtained with `populationRanges`. Alternatively, a character assay name if the 'data' argument is provided.
`calls`	Either a `GRangesList` or `RaggedExperiment` storing the individual CNV calls for each sample. Alternatively, the assay name if 'data' is provided.
`rcounts`	A `RangedSummarizedExperiment` storing the raw RNA-seq read counts in a rectangular fashion (genes x samples). Alternatively, a character assay name if 'data' is provided.
`data`	(optional) A `MultiAssayExperiment` object with 'cnvrs', 'calls', and 'rcounts' arguments corresponding to assay names.
`window`	Numeric or character. Size of the genomic window in base pairs by which each CNV region is extended up- and downstream. This determines which genes are tested for each CNV region. Character notation is supported for convenience such as "100kbp" (same as 100000) or "1Mbp" (same as 1000000). Defaults to `"1Mbp"`. Can also be set to `NULL` to test against all genes included in the analysis.
`multi.calls`	A function. Determines how to summarize the CN state in a CNV region when there are multiple (potentially conflicting) calls for one sample in that region. Defaults to `.largest`, which assigns the CN state of the call that covers the largest part of the CNV region tested. A user-defined function that is passed on to `qreduceAssay` can also be provided for customized behavior.
`min.samples`	Integer. Minimum number of samples with at least one call overlapping the CNV region tested. Defaults to 10. See details.
`min.state.freq`	Integer. Minimum number of samples in each CNV state being tested. Defaults to 3.
`de.method`	Character. Differential expression method, either `"edgeR"` or `"limma"`. Defaults to `"edgeR"`.
`padj.method`	Character. Method for adjusting p-values for multiple testing. For available methods see the man page of the function `p.adjust`. Defaults to `"BH"`.
`filter.by.expr`	Logical. Include only genes with sufficiently large counts in the DE analysis? If TRUE, excludes genes not satisfying a minimum number of read counts across samples using the `filterByExpr` function from the edgeR package. Defaults to TRUE.
`verbose`	Logical. Display progress messages? Defaults to `FALSE`.
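The character notation of `window` can be illustrated with a small base-R sketch that converts strings such as "100kbp" or "1Mbp" to base pairs and extends a CNV region by that amount on both sides. The helpers `parse_window()` and `extend_region()` are hypothetical and not part of CNVRanger; clipping the extended start at position 1 is an assumption, and the `window = NULL` case is not covered.

```r
# Hypothetical helper (not part of CNVRanger): convert a window given as
# a number or as a string such as "100kbp" / "1Mbp" to base pairs
parse_window <- function(window) {
  if (is.numeric(window)) return(window)
  m <- regmatches(window, regexec("^([0-9.]+)([kKmM]?)bp$", window))[[1]]
  if (length(m) == 0) stop("unrecognized window specification: ", window)
  mult <- if (m[3] == "") 1 else c(k = 1e3, m = 1e6)[[tolower(m[3])]]
  as.numeric(m[2]) * mult
}

# Hypothetical helper: extend the region [start, end] up- and downstream,
# clipping the new start at position 1 (assumption)
extend_region <- function(start, end, window) {
  w <- parse_window(window)
  c(start = max(1, start - w), end = end + w)
}

parse_window("1Mbp")                     # 1e+06
extend_region(500000, 600000, "100kbp")  # start 4e+05, end 7e+05
```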

Details

Association testing between CNV regions and RNA-seq read counts is carried out using edgeR, which applies generalized linear models (GLMs) based on the negative-binomial distribution while incorporating normalization factors for different library sizes.

In the case of only one CN state deviating from 2n for a CNV region under investigation, this reduces to the classical 2-group comparison. For more than two states (e.g. 0n, 1n, 2n), edgeR’s ANOVA-like test is applied to test all deviating groups for significant expression differences relative to 2n.
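The reduction to a 2-group comparison or an ANOVA-like test follows from how the CN states enter the design matrix. The sketch below is an illustration, not CNVRanger's internal code: it encodes hypothetical CN states as a factor with 2n as the reference level, so that each non-intercept coefficient is a contrast against 2n. Testing all of them jointly (for example with `coef = 2:3` in `edgeR::glmLRT`) corresponds to the ANOVA-like test; with a single deviating state there is only one such coefficient, i.e. a 2-group comparison.

```r
# Hypothetical CN states of six samples in one CNV region
cn <- c(2, 1, 2, 0, 1, 2)

# Encode the states as a factor with 2n as the reference level
state <- relevel(factor(paste0("CN", cn)), ref = "CN2")
design <- model.matrix(~ state)
colnames(design)   # "(Intercept)" "stateCN0" "stateCN1"

# Coefficients to test jointly: all contrasts against 2n
coef.test <- 2:ncol(design)
```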

To avoid artificial effects due to low expression of a gene or insufficient sample size in deviating groups, it is typically recommended to exclude from the analysis (i) genes with fewer than r reads per million reads mapped (cpm, counts per million) in the maximally expressed sample group, and (ii) CNV regions with fewer than s samples in a group deviating from 2n. In the current interface, (i) is handled by the filter.by.expr argument, which applies edgeR's filterByExpr, and (ii) by the min.state.freq argument; min.samples additionally sets the minimum number of samples with any call overlapping the region.
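Criterion (i) can be spelled out in a few lines of base R. The sketch below is hypothetical: `cpm()` and `keep_genes()` are illustrative helpers, the counts are made up, and "maximally expressed sample group" is taken to mean the CN group with the highest mean cpm. Note that `filterByExpr` applies its own, different criterion.

```r
# Counts per million: scale each sample (column) by its library size
cpm <- function(counts) t(t(counts) / colSums(counts)) * 1e6

# Keep genes whose mean cpm is >= r in at least one CN group
keep_genes <- function(counts, group, r = 3) {
  x <- cpm(counts)
  group.means <- lapply(split(seq_len(ncol(x)), group),
                        function(idx) rowMeans(x[, idx, drop = FALSE]))
  do.call(pmax, group.means) >= r
}

# Hypothetical counts: 3 genes x 4 samples, each library has 1e6 reads
counts <- matrix(c(5,      5,      1,      1,
                   1,      1,      1,      1,
                   999994, 999994, 999998, 999998),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
group <- c("CN2", "CN2", "CN1", "CN1")
keep_genes(counts, group, r = 3)   # gene1 TRUE, gene2 FALSE, gene3 TRUE
```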

When testing local effects (adjacent or coinciding genes of a CNV region), suitable thresholds for candidate discovery are r = 3, s = 4, and a nominal significance level of 0.05, as such effects have a clear biological indication and the number of genes tested is typically small.

For distal effects (i.e. when testing genes far away from a CNV region), more stringent thresholds such as r = 20 and s = 10, in conjunction with multiple testing correction at a conservative adjusted significance level such as 0.01, are typically recommended (due to power considerations and to avoid the detection of spurious effects).
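To make the two regimes concrete, the sketch below applies them to a handful of made-up p-values: local candidates are called at a nominal 0.05, distal candidates only after Benjamini-Hochberg adjustment (`p.adjust(..., method = "BH")`, matching the `padj.method` default) at 0.01. The p-values are hypothetical.

```r
# Hypothetical raw p-values for four gene-CNV region pairs
p <- c(0.001, 0.004, 0.03, 0.2)

# Local effects: nominal significance level of 0.05
local.hits <- p < 0.05

# Distal effects: BH-adjusted p-values at a conservative level of 0.01
p.adj <- p.adjust(p, method = "BH")   # 0.004 0.008 0.040 0.200
distal.hits <- p.adj < 0.01
```

The third pair passes the local criterion but not the distal one, which is exactly the extra stringency intended for distal effects.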

Value

A DataFrame containing measures of association for each CNV region and each gene tested in the genomic window around the CNV region.

Author(s)

Ludwig Geistlinger

References

Geistlinger et al. (2018) Widespread modulation of gene expression by copy number variation in skeletal muscle. Sci Rep, 8(1):1399.

Examples


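# (0) load packages and fix the seed so that the simulated data are reproducible
library(CNVRanger)
library(GenomicRanges)
set.seed(1)

# (1) CNV calls
states <- sample(c(0,1,3,4), 17, replace=TRUE)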
calls <- GRangesList(
     sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34"), state=states[1:3]),
     sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36"), state=states[4:6] ),
     sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36"), state=states[7:9] ),
     sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ), state=states[10:11] ),
     sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34"), state=states[12:14] ) ,
     sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35"), state=states[15:17] )
)

# (2) summarized CNV regions
cnvrs <- populationRanges(calls, density=0.1)

# (3) RNA-seq read counts
genes <- GRanges(c("chr1:2-9", "chr1:100-150", "chr1:200-300",
                   "chr2:16-17", "chr2:100-150", "chr2:200-300", "chr2:26-33"))
y <- matrix(rnbinom(42,size=1,mu=10),7,6)
names(genes) <- rownames(y) <- paste0("gene", 1:7)
colnames(y) <- paste0("sample", 1:6)

library(SummarizedExperiment)
rse <- SummarizedExperiment(assays=list(counts=y), rowRanges=granges(genes))

# (4) perform the association analysis
res <- cnvEQTL(cnvrs, calls, rse, 
               min.samples = 1, min.state.freq = 1, filter.by.expr = FALSE)

Run the CNV-GWAS

Description

Wraps all the functions necessary to run a CNV-GWAS using the output of the setupCnvGWAS function.

The function (i) produces the GDS file containing the genotype information (if produce.gds == TRUE), (ii) produces the inputs required for a PLINK analysis, (iii) runs a CNV-GWAS analysis using a linear model (i.e. the lm function), and (iv) exports a QQ-plot displaying the adjusted p-values. In this release, only the p-value for the copy number is available (i.e. 'P(CNP)').
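The core of step (iii), regressing the phenotype on copy number probe by probe and keeping the p-value of the CN term, can be sketched in base R with `lm()`. This is a simplified illustration under stated assumptions (simulated CN dosages and phenotype, one model per probe, FDR correction as in the `method.m.test` default), not the cnvGWAS implementation, which additionally builds the GDS and PLINK inputs.

```r
set.seed(1)
n.samples <- 60
n.probes  <- 20

# Hypothetical CN dosages (rows = samples, columns = probes), mostly 2n
cn <- matrix(sample(0:4, n.samples * n.probes, replace = TRUE,
                    prob = c(0.05, 0.15, 0.6, 0.15, 0.05)),
             nrow = n.samples)
# Hypothetical phenotype with a true CN effect at probe 1 only
pheno <- 0.8 * cn[, 1] + rnorm(n.samples)

# One linear model per probe; keep the p-value of the CN term
p.cnp <- apply(cn, 2, function(x)
  summary(lm(pheno ~ x))$coefficients["x", "Pr(>|t|)"])
p.adj <- p.adjust(p.cnp, method = "fdr")

# Coordinates of a QQ-plot of the adjusted p-values
qq <- data.frame(expected = -log10(ppoints(length(p.adj))),
                 observed = -log10(sort(p.adj)))
# plot(qq$expected, qq$observed); abline(0, 1)
```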

Usage

cnvGWAS(
  phen.info,
  n.cor = 1,
  min.sim = 0.95,
  freq.cn = 0.01,
  snp.matrix = FALSE,
  method.m.test = "fdr",
  lo.phe = 1,
  chr.code.name = NULL,
  genotype.nodes = "CNVGenotype",
  coding.translate = "all",
  path.files = NULL,
  list.of.files = NULL,
  produce.gds = TRUE,
  run.lrr = FALSE,
  assign.probe = "min.pvalue",
  correct.inflation = FALSE,
  both.up.down = FALSE,
  verbose = FALSE
)

Arguments

`phen.info`	Returned by `setupCnvGWAS`
`n.cor`	Number of cores to be used
`min.sim`	Minimum CNV genotype distribution similarity among subsequent probes. Default is 0.95 (i.e. 95%)
`freq.cn`	Minimum CNV frequency, i.e. the fraction of samples deviating from the diploid state, where 1 means 100% (all samples). Default is 0.01 (i.e. 1%).
`snp.matrix`	Only FALSE is implemented. If TRUE, B allele frequencies (BAF) would be used to reconstruct CNV-SNP genotypes.
`method.m.test`	Method for multiple-testing correction. Default is "fdr"; see `p.adjust` for other methods.
`lo.phe`	The phenotype to be analyzed in the PhenInfo$phenotypesSam data frame.
`chr.code.name`	A data frame with the integer name in the first column and the original name in the second column for each chromosome.
`genotype.nodes`	Nodes with CNV genotypes to be produced in the GDS file.
`coding.translate`	Only relevant for 'CNVgenotypeSNPlike'. If NULL or an unrecognized string, only biallelic CNVs are used. If 'all', multiallelic CNVs are coded as 0 for loss, 1 for 2n, and 2 for gain.
`path.files`	Folder containing the input CNV files used for the CNV calling (i.e. one text file with 5 columns for each sample). Columns should contain (i) probe name, (ii) chromosome, (iii) position, (iv) LRR, and (v) BAF.
`list.of.files`	Data frame with two columns: (i) the name of the file with the signals and (ii) the corresponding name of the sample in the GDS file.
`produce.gds`	Logical. If TRUE, produce a new GDS file; if FALSE, use a previously created one.
`run.lrr`	If TRUE, use LRR values instead of absolute copy numbers in the association.
`assign.probe`	Either 'min.pvalue' or 'high.freq'. Criterion for choosing the probe that represents each CNV segment.
`correct.inflation`	Logical. Estimate lambda from the raw p-values and correct for genomic inflation. Use with argument `method.m.test` to generate strict p-values.
`both.up.down`	Check for CNV genotype similarity in both directions. Default is FALSE (i.e. only downstream)
`verbose`	Show progress in the analysis
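The `correct.inflation` option refers to genomic control. A common way to estimate the inflation factor lambda is the median of the 1-df chi-square statistics implied by the raw p-values, divided by its expected value under the null, `qchisq(0.5, 1)` (about 0.455); dividing the statistics by lambda gives corrected p-values. The sketch below shows that standard calculation on simulated p-values; whether cnvGWAS uses exactly this estimator is an assumption.

```r
# Hypothetical raw p-values, mildly inflated toward small values
set.seed(42)
p.raw <- runif(1000)^1.2

# Convert p-values to 1-df chi-square statistics
chisq <- qchisq(p.raw, df = 1, lower.tail = FALSE)

# Genomic inflation factor: observed vs. expected median statistic
lambda <- median(chisq) / qchisq(0.5, df = 1)

# Deflate the statistics by lambda, recompute p-values, then adjust them
p.gc <- pchisq(chisq / lambda, df = 1, lower.tail = FALSE)
p.strict <- p.adjust(p.gc, method = "fdr")
```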

Value

A `GRanges` object with the CNV segments, their representative probes, and the respective p-values.
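How a representative probe could be chosen per CNV segment under the two `assign.probe` options is sketched below. The probe table and the helper `pick_representative()` are hypothetical, and reading 'high.freq' as "the probe with the highest CNV frequency" is an assumption based on the option name.

```r
# Hypothetical probe-level results: segment, p-value, CNV frequency
probes <- data.frame(
  probe   = paste0("p", 1:6),
  segment = c("s1", "s1", "s1", "s2", "s2", "s2"),
  pvalue  = c(0.04, 0.001, 0.02, 0.3, 0.2, 0.5),
  freq    = c(0.10, 0.08, 0.12, 0.05, 0.07, 0.06),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one representative probe per segment
pick_representative <- function(df, assign.probe = c("min.pvalue", "high.freq")) {
  assign.probe <- match.arg(assign.probe)
  best <- function(s) {
    i <- if (assign.probe == "min.pvalue") which.min(s$pvalue) else which.max(s$freq)
    s[i, ]
  }
  res <- do.call(rbind, lapply(split(df, df$segment), best))
  rownames(res) <- NULL
  res
}

pick_representative(probes)$probe               # "p2" "p5"
pick_representative(probes, "high.freq")$probe  # "p3" "p5"
```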

Author(s)

Vinicius Henrique da Silva

References

da Silva et al. (2016) Genome-wide detection of CNVs and their association with meat tenderness in Nelore cattle. PLoS One, 11(6):e0157711.

Examples


# Load phenotype-CNV information
data.dir <- system.file("extdata", package="CNVRanger")

phen.loc <- file.path(data.dir, "Pheno.txt")
cnv.out.loc <- file.path(data.dir, "CNVOut.txt")
map.loc <- file.path(data.dir, "MapPenn.txt")

phen.info <- setupCnvGWAS('Example', phen.loc, cnv.out.loc, map.loc)

# Define chr correspondence to numeric, if necessary
df <- '16 1A
25 4A
29 25LG1
30 25LG2
31 LGE22'

chr.code.name <- read.table(text=df, header=FALSE)
segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name)
 
OncoPrint plot for CNV regions

Description

Illustrates overlaps between CNV calls and genomic features across a sample population.

Usage

cnvOncoPrint(
  calls,
  features,
  multi.calls = .largest,
  top.features = 25,
  top.samples = 100,
  ...
)

Arguments

`calls`	Either a `GRangesList` or `RaggedExperiment` storing the individual CNV calls for each sample.
`features`	A `GRanges` object containing the genomic features of interest, typically genes. Feature names are either expected as a meta-column `symbol` or as the `names` of the object.
`multi.calls`	A function. Determines how to summarize the CN state in a CNV region when there are multiple (potentially conflicting) calls for one sample in that region. Defaults to `.largest`, which assigns the CN state of the call that covers the largest part of the CNV region tested. A user-defined function that is passed on to `qreduceAssay` can also be provided for customized behavior.
`top.features`	Integer. Restricts plotting to the features with the highest alteration frequency. Defaults to 25. Use `-1` to display all features.
`top.samples`	Integer. Restricts plotting to the samples with the highest alteration frequency. Defaults to 100. Use `-1` to display all samples.
`...`	Additional arguments passed on to `ComplexHeatmap::oncoPrint`
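The documented behaviour of the default `multi.calls = .largest` (take the CN state of the call covering the largest part of the region) can be sketched for a single sample as follows. `largest_state()` is a hypothetical base-R helper using inclusive 1-based coordinates as in `GRanges`; the package itself applies its summary function through `qreduceAssay` on a `RaggedExperiment`.

```r
# Hypothetical helper: CN state of the call with the largest overlap
# with the region [region.start, region.end]; NA if no call overlaps
largest_state <- function(region.start, region.end, starts, ends, states) {
  ov <- pmax(0, pmin(ends, region.end) - pmax(starts, region.start) + 1)
  if (all(ov == 0)) return(NA)
  states[which.max(ov)]
}

# Two conflicting calls of one sample in the region 1-100:
# a 1-copy loss over 1-30 and a 3-copy gain over 40-100
largest_state(1, 100, starts = c(1, 40), ends = c(30, 100), states = c(1, 3))
# 3 (61 bp overlap vs. 30 bp)
```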

Value

None. Plots to a graphics device.

Author(s)

Ludwig Geistlinger

Examples


# read in example CNV calls
data.dir <- system.file("extdata", package="CNVRanger")
call.file <- file.path(data.dir, "Silva16_PONE_CNV_calls.csv")
calls <- read.csv(call.file, as.is=TRUE)

# store in a GRangesList
calls <- makeGRangesListFromDataFrame(calls, 
   split.field="NE_id", keep.extra.columns=TRUE)

# three example genes
genes <- c(  "chr1:140368053-140522639:-", 
             "chr2:97843887-97988140:+",
             "chr2:135418586-135422028:-")
names(genes) <- c("ATP2C1", "MAP2", "ACTL8")
genes <- GRanges(genes)

# plot
cnvOncoPrint(calls, genes, top.samples = 25)

Produce CNV-GDS for the phenotyped samples

Description

Function to produce the GDS file in a probe-wise fashion for CNV genotypes. The GDS file which is produced also incorporates one phenotype to be analyzed. If several phenotypes are contained in the ‘phen.info’ object, the user may specify the phenotype to be analyzed with the ‘lo.phe’ parameter. Only diploid chromosomes should be included.

Usage

generateGDS(
  phen.info,
  freq.cn = 0.01,
  snp.matrix = FALSE,
  lo.phe = 1,
  chr.code.name = NULL,
  genotype.nodes = c("CNVGenotype", "CNVgenotypeSNPlike"),
  coding.translate = NULL,
  n.cor = 1
)

Arguments

`phen.info`	Returned by `setupCnvGWAS`
`freq.cn`	Minimum CNV frequency. Default is 0.01 (i.e. 1%).
`snp.matrix`	Only FALSE is implemented. If TRUE, B allele frequencies (BAF) and SNP genotypes would be used to reconstruct CNV-SNP genotypes (under development).
`lo.phe`	The phenotype to be analyzed in the PhenInfo$phenotypesSam data frame.
`chr.code.name`	A data frame with the integer name in the first column and the original name in the second column for each chromosome previously converted to numeric.
`genotype.nodes`	Nodes with CNV genotypes to be produced in the GDS file. Use 'CNVGenotype' for dosage-like genotypes (i.e. from 0 to Inf). Additionally use 'CNVgenotypeSNPlike' to store SNP-like CNV genotypes in a separate node (i.e. '0, 1, 2, 3, 4' as '0/0, 0/1, 1/1, 1/2, 2/2').
`coding.translate`	Only relevant for 'CNVgenotypeSNPlike'. If NULL or an unrecognized string, only biallelic CNVs are used. If 'all', multiallelic CNVs are coded as 0 for loss, 1 for 2n, and 2 for gain.
`n.cor`	Number of cores
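The two genotype representations and the 'all' translation described above amount to simple recodings of the CN dosage. The sketch below spells them out; the helper names are hypothetical, and how states above 4 are handled in the SNP-like node is not specified here (the sketch yields NA for them).

```r
# Dosage-like CN genotypes, as stored in the 'CNVGenotype' node
cn <- c(0, 1, 2, 3, 4)

# Hypothetical helper: SNP-like coding described for 'CNVgenotypeSNPlike'
snp_like <- function(cn) c("0/0", "0/1", "1/1", "1/2", "2/2")[cn + 1]

# Hypothetical helper: coding.translate = "all" (0 loss, 1 for 2n, 2 gain)
translate_all <- function(cn) ifelse(cn < 2, 0L, ifelse(cn == 2, 1L, 2L))

snp_like(cn)       # "0/0" "0/1" "1/1" "1/2" "2/2"
translate_all(cn)  # 0 0 1 2 2
```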

Value

probes.cnv.gr: an object with information about all probes to be used in the downstream CNV-GWAS. Only numeric chromosomes are included.

Author(s)

Vinicius Henrique da Silva

Examples


# Load phenotype-CNV information
data.dir <- system.file("extdata", package="CNVRanger")

phen.loc <- file.path(data.dir, "Pheno.txt")
cnv.out.loc <- file.path(data.dir, "CNVOut.txt")
map.loc <- file.path(data.dir, "MapPenn.txt")

phen.info <- setupCnvGWAS('Example', phen.loc, cnv.out.loc, map.loc)

# Construct the data frame with integer and original chromosome names,
# defining the chr correspondence to numeric, if necessary
df <- '16 1A
25 4A
29 25LG1
30 25LG2
31 LGE22'

chr.code.name <- read.table(text=df, header=FALSE)
probes.cnv.gr <- generateGDS(phen.info, chr.code.name=chr.code.name)

Import LRR and BAF from text files used in the CNV analysis

Description

This function imports the LRR/BAF values and creates a node for each of them in the GDS file located in the working folder 'Inputs' created by the setupCnvGWAS function. Once imported, the LRR values can be used to perform a GWAS directly, as an alternative to copy number dosage.

Usage

importLrrBaf(
  all.paths,
  path.files,
  list.of.files,
  gds.file = NULL,
  verbose = TRUE
)

Arguments

`all.paths`	Object returned from the `CreateFolderTree` function with the working folder tree (available as `phen.info$all.paths`, see Examples).
`path.files`	Folder containing the input CNV files used for the CNV calling (i.e. one text file with 5 columns for each sample). Columns should contain (i) probe name, (ii) chromosome, (iii) position, (iv) LRR, and (v) BAF.
`list.of.files`	Data frame with two columns: (i) the name of the file with the signals and (ii) the corresponding name of the sample in the GDS file.
`gds.file`	Path to a GDS file containing nodes with the respective LRR and BAF values. The ‘snp.rs.id’, ‘sample.id’, ‘LRR’ and ‘BAF’ nodes are mandatory. SNPs and samples must follow the same order and length as in CNV.gds (located in the all.paths["Inputs"] folder). ‘path.files’ and ‘list.of.files’ are ignored if ‘gds.file’ is not NULL.
`verbose`	Print the samples while importing
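The alignment step implied by the arguments above (reading one 5-column signal file per sample, matching its rows to the probe order of the GDS file, and giving samples without a file LRR = 0 and BAF = 0.5, as stated in the example) can be sketched in base R. This is a hypothetical illustration: the header line, the tab separator, and the in-memory matrices are assumptions, and the real function writes GDS nodes instead.

```r
# Hypothetical probe and sample order as stored in CNV.gds
gds.probes  <- c("snp1", "snp2", "snp3")
gds.samples <- c("S1", "S2")

# One hypothetical 5-column signal file, for sample S1 only
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name\tChr\tPosition\tLRR\tBAF",
             "snp1\t1\t100\t0.12\t0.48",
             "snp2\t1\t200\t0.05\t0.51",
             "snp3\t1\t300\t-0.40\t0.02"), tmp)
list.of.files <- data.frame(file.names = basename(tmp), sample.names = "S1",
                            stringsAsFactors = FALSE)

# Start from the neutral values used for samples without a file
lrr <- matrix(0,   length(gds.probes), length(gds.samples),
              dimnames = list(gds.probes, gds.samples))
baf <- matrix(0.5, length(gds.probes), length(gds.samples),
              dimnames = list(gds.probes, gds.samples))

for (i in seq_len(nrow(list.of.files))) {
  sig <- read.delim(file.path(dirname(tmp), list.of.files$file.names[i]))
  idx <- match(gds.probes, sig[[1]])   # align rows to the GDS probe order
  s <- list.of.files$sample.names[i]
  lrr[, s] <- sig[[4]][idx]            # column (iv): LRR
  baf[, s] <- sig[[5]][idx]            # column (v): BAF
}
```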

Value

Writes to the specified GDS file by side effect.

Author(s)

Vinicius Henrique da Silva

Examples


# Load phenotype-CNV information
data.dir <- system.file("extdata", package="CNVRanger")

phen.loc <- file.path(data.dir, "Pheno.txt")
cnv.out.loc <- file.path(data.dir, "CNVOut.txt")
map.loc <- file.path(data.dir, "MapPenn.txt")

phen.info <- setupCnvGWAS('Example', phen.loc, cnv.out.loc, map.loc)

# Extract path names
all.paths <- phen.info$all.paths

# List files to import LRR/BAF 
list.of.files <- list.files(path=data.dir, pattern="cnv.txt.adjusted$")
list.of.files <- as.data.frame(list.of.files)
colnames(list.of.files)[1] <- "file.names"
list.of.files$sample.names <- sub(".cnv.txt.adjusted$", "", list.of.files$file.names)

# All missing samples will have LRR = '0' and BAF = '0.5' in all SNPs listed in the GDS file
importLrrBaf(all.paths, data.dir, list.of.files)

# Open the GDS file and print its node structure to check that the LRR/BAF nodes were added
cnv.gds <- file.path(all.paths["Inputs"], 'CNV.gds')
genofile <- SNPRelate::snpgdsOpen(cnv.gds, allow.fork=TRUE, readonly=TRUE)
print(genofile)
SNPRelate::snpgdsClose(genofile)

Plot EQTL region

Description

Illustrates differential expression of genes in the neighborhood of a CNV.

Usage

plotEQTL(cnvr, genes, genome, cn = "CN1", cex = 0.8)

Arguments

`cnvr`	A `GRanges` of length 1, containing the genomic coordinates of the CNV region of interest.
`genes`	`GRanges` containing genes in the neighborhood of the CNV region of interest.
`genome`	Character. A valid UCSC genome assembly ID such as 'hg19' or 'bosTau6'.
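`cn`	Character. Copy number state of interest, e.g. "CN1" for a 1-copy loss. In the example, this matches the suffix of the `logFC.CN1` column of `genes`.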
`cex`	A numerical value giving the amount by which gene names should be magnified. Default is 0.8. Use smaller values to decrease font size.

Value

None. Plots to a graphics device.

Author(s)

Ludwig Geistlinger

Examples


# CNV region of interest
cnvr <- GRanges("chr1:7908902-8336254")

# Two genes in the neighborhood
genes <- c("chr1:8021714-8045342:+", "chr1:8412464-8877699:-")
names(genes) <- c("PARK7", "RERE")
genes <- GRanges(genes)

# Annotate differential expression for 1-copy loss
genes$logFC.CN1 <- c(-0.635, -0.728)
genes$AdjPValue <- c(8.29e-09, 1.76e-08) 

# plot
plotEQTL(cnvr, genes, genome="hg19", cn="CN1")

Manhattan Plot

Description

Manhattan plot for p-values of a CNV-GWAS

Usage

plotManhattan(all.paths, regions, chr.size.order, plot.pdf = FALSE)

Arguments

`all.paths`	Object returned from the `CreateFolderTree` function with the working folder tree (available as `phen.info$all.paths`, see Examples).
`regions`	`GRanges` as returned by `cnvGWAS`
`chr.size.order`	`data.frame` with two columns: (i) 'chr': chromosome names (character), and (ii) 'size': length of the chromosomes in bp (integer). A `GRanges` containing one chromosome per range can be used instead (the chromosomes should be in the expected order).
`plot.pdf`	Logical. Plot to a PDF file? Defaults to FALSE.
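A Manhattan plot places all chromosomes on one axis by adding to each position the summed length of the chromosomes preceding it in `chr.size.order`. A minimal base-R sketch of that offset calculation, with made-up chromosome sizes and hits, follows; plotManhattan's actual implementation may differ.

```r
# Hypothetical chromosome order and sizes (bp), as in 'chr.size.order'
chr.size.order <- data.frame(chr = c("1", "2", "1A"),
                             size = c(1000, 600, 300),
                             stringsAsFactors = FALSE)

# Offset of each chromosome = summed size of all chromosomes before it
offset <- c(0, cumsum(chr.size.order$size)[-nrow(chr.size.order)])
names(offset) <- chr.size.order$chr

# Hypothetical segments: chromosome and position
hits <- data.frame(chr = c("2", "1A", "1"), pos = c(50, 10, 999),
                   stringsAsFactors = FALSE)
hits$x <- hits$pos + offset[hits$chr]   # genome-wide x coordinate
```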

Value

Plots to a graphics device.

Author(s)

Vinicius Henrique da Silva

Examples


# Load phenotype-CNV information
data.dir <- system.file("extdata", package="CNVRanger")

phen.loc <- file.path(data.dir, "Pheno.txt")
cnv.out.loc <- file.path(data.dir, "CNVOut.txt")
map.loc <- file.path(data.dir, "MapPenn.txt")

phen.info <- setupCnvGWAS('Example', phen.loc, cnv.out.loc, map.loc)
all.paths <- phen.info$all.paths
segs.pvalue.gr <- cnvGWAS(phen.info)

# Define the chromosome order in the plot
order.chrs <- c(1:24, "25LG1", "25LG2", 27:28, "LGE22", "1A", "4A")

# Chromosome sizes
chr.size.file <- file.path(data.dir, "Parus_major_chr_sizes.txt")
chr.sizes <- scan(chr.size.file)
chr.size.order <- data.frame(chr=order.chrs, sizes=chr.sizes, stringsAsFactors=FALSE)

# Plot the Manhattan plot; set plot.pdf=TRUE to write it to a PDF within the 'Results' working folder
plotManhattan(all.paths, segs.pvalue.gr, chr.size.order)

Plot recurrent CNV regions

Description

Illustrates summarized CNV regions along a chromosome.

Usage

plotRecurrentRegions(regs, genome, chr, pthresh = 0.05)

Arguments

`regs`	A `GRanges`. Typically the result of `populationRanges` with `est.recur=TRUE`.
`genome`	Character. A valid UCSC genome assembly ID such as 'hg19' or 'bosTau6'.
`chr`	Character. A UCSC-style chromosome name such as 'chr1'.
`pthresh`	Numeric. Significance threshold for recurrence. Defaults to 0.05.

Value

None. Plots to a graphics device.

Author(s)

Ludwig Geistlinger

Examples


# read in example CNV calls
data.dir <- system.file("extdata", package="CNVRanger")
call.file <- file.path(data.dir, "Silva16_PONE_CNV_calls.csv")
calls <- read.csv(call.file, as.is=TRUE)

# store in a GRangesList
grl <- GenomicRanges::makeGRangesListFromDataFrame(calls, 
   split.field="NE_id", keep.extra.columns=TRUE)

# summarize CNV regions
cnvrs <- populationRanges(grl, density=0.1, est.recur=TRUE)

# plot
plotRecurrentRegions(cnvrs, genome="bosTau6", chr="chr1")

Summarizing CNV ranges across a population

Description

In CNV analysis, it is often of interest to summarize individual calls across the population (i.e. to define CNV regions) for subsequent association analysis with, e.g., phenotype data.

Usage

populationRanges(
  grl,
  mode = c("density", "RO"),
  density = 0.1,
  ro.thresh = 0.5,
  multi.assign = FALSE,
  verbose = FALSE,
  min.size = 2,
  classify.ranges = TRUE,
  type.thresh = 0.1,
  est.recur = FALSE
)

Arguments

`grl`	A `GRangesList`.
`mode`	Character. Should population ranges be computed based on regional density ("density") or reciprocal overlap ("RO"). See Details.
`density`	Numeric. Density threshold used in mode "density" to trim low-density region margins (see Details). Defaults to 0.1.
`ro.thresh`	Numeric. Threshold for reciprocal overlap required for merging two overlapping regions. Defaults to 0.5.
`multi.assign`	Logical. Allow regions to be assigned to several region clusters? Defaults to `FALSE`.
`verbose`	Logical. Report progress messages? Defaults to `FALSE`.
`min.size`	Numeric. Minimum size of a summarized region to be included. Defaults to 2 bp.
`classify.ranges`	Logical. Should CNV frequency (number of samples overlapping the region) and CNV type (gain, loss, or both) be annotated? Defaults to `TRUE`.
`type.thresh`	Numeric. Required minimum relative frequency of each CNV type (gain / loss) to be taken into account when assigning CNV type to a region. Defaults to 0.1. That means for a region overlapped by individual gain and loss calls that both types must be present in >10% of the calls in order to be typed as 'both'. Gain or loss calls present below the threshold are ignored.
`est.recur`	Logical. Should recurrence of regions be assessed via a permutation test? Defaults to `FALSE`. See Details.
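The `type.thresh` rule can be written as a small function. In this hypothetical sketch, the CN states of the calls overlapping a region are given as numbers with 2 as the diploid state, and a type counts only if its relative frequency exceeds the threshold.

```r
# Hypothetical helper: assign 'gain', 'loss' or 'both' to a region from
# the CN states of the calls overlapping it (2 = diploid)
classify_type <- function(states, type.thresh = 0.1) {
  is.gain <- mean(states > 2) > type.thresh
  is.loss <- mean(states < 2) > type.thresh
  if (is.gain && is.loss) "both"
  else if (is.gain) "gain"
  else if (is.loss) "loss"
  else NA_character_
}

classify_type(c(3, 3, 3, 1))         # "both" (25% losses)
classify_type(c(rep(3, 19), 1))      # "gain" (5% losses are ignored)
```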

Details

CNVRuler procedure that trims region margins based on regional density

Trims low-density areas (usually <10% of the total contributing individual calls within a summarized region).

An illustration of the concept can be found here: https://www.ncbi.nlm.nih.gov/pubmed/22539667 (Figure 1)
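The density-based trimming can be sketched for a single summarized region as follows: count how many calls cover each base and drop the margins where coverage falls below `density` times the number of contributing calls. This is a simplified, hypothetical illustration (one region, margin trimming only, inclusive integer coordinates); it is not guaranteed to reproduce `populationRanges` output.

```r
# Calls of one summarized region as inclusive [start, end] intervals
# (the chr1 calls of the populationRanges example)
starts <- c(1, 1, 2, 1, 1, 1)
ends   <- c(10, 10, 11, 12, 12, 12)

# Hypothetical helper: trim low-density margins of the region
trim_by_density <- function(starts, ends, density = 0.1) {
  pos <- seq(min(starts), max(ends))
  # number of calls covering each base
  cov <- vapply(pos, function(p) sum(starts <= p & ends >= p), integer(1))
  keep <- cov >= density * length(starts)
  if (!any(keep)) return(NULL)
  range(pos[keep])
}

trim_by_density(starts, ends, density = 0.1)  # 1 12
trim_by_density(starts, ends, density = 1)    # 2 10
```

With density = 1, only the bases covered by all six calls (2-10) survive.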

Reciprocal overlap (RO) approach (e.g. Conrad et al., Nature, 2010)

A reciprocal overlap of 0.51 between two genomic regions A and B requires that B overlaps at least 51% of A, *and* that A also overlaps at least 51% of B.
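In code, reciprocal overlap is the smaller of the two overlap fractions, so a threshold on it enforces both conditions at once. A minimal base-R sketch with inclusive coordinates (the function name is hypothetical):

```r
# Hypothetical helper: reciprocal overlap of A = [a.start, a.end]
# and B = [b.start, b.end], inclusive coordinates
reciprocal_overlap <- function(a.start, a.end, b.start, b.end) {
  ov <- max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
  min(ov / (a.end - a.start + 1), ov / (b.end - b.start + 1))
}

reciprocal_overlap(1, 100, 21, 110)  # 0.8 (80/100 of A, 80/90 of B)
reciprocal_overlap(1, 100, 1, 50)    # 0.5 (B lies fully in A, but covers only half of it)
```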

Approach:

At the top level of the hierarchy, all contiguous bases overlapping at least 1 bp of individual calls are merged into one region. Within each region, reciprocally overlapping regions are then defined with the following algorithm:
1. Calculate the reciprocal overlap (RO) between all remaining calls.
2. Identify the pair of calls with the greatest RO. If RO > threshold, merge them and create a new CNV region. If not, exit.
3. Continue adding unclustered calls to the region, in order of best overlap. To be added, a new call must have RO > threshold with all calls already in the region. When no additional calls can be added, move to the next step.
4. If calls remain, return to step 1. Otherwise, exit.
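The steps above can be sketched in base R for the calls of one top-level region. This is a hypothetical illustration, not the populationRanges code: "in order of best overlap" is read here as "the unclustered call whose smallest RO to the current members is largest", and calls that never reach the threshold stay unclustered (NA).

```r
# Hypothetical helper: greedy RO clustering of calls [starts, ends]
ro_cluster <- function(starts, ends, thresh = 0.5) {
  n <- length(starts)
  width <- ends - starts + 1
  # pairwise overlap widths and reciprocal overlaps
  ov <- outer(seq_len(n), seq_len(n), function(i, j)
    pmax(0, pmin(ends[i], ends[j]) - pmax(starts[i], starts[j]) + 1))
  ro <- pmin(ov / width[row(ov)], ov / width[col(ov)])
  diag(ro) <- 0
  cluster <- rep(NA_integer_, n)
  k <- 0L
  repeat {
    free <- which(is.na(cluster))
    if (length(free) < 2) break
    # steps 1-2: best remaining pair, stop if it is not above threshold
    sub <- ro[free, free, drop = FALSE]
    best <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    if (sub[best[1], best[2]] <= thresh) break
    k <- k + 1L
    members <- free[best]
    cluster[members] <- k
    # step 3: add calls that exceed the threshold with all members
    repeat {
      cand <- which(is.na(cluster))
      if (length(cand) == 0) break
      min.ro <- apply(ro[cand, members, drop = FALSE], 1, min)
      if (max(min.ro) <= thresh) break
      add <- cand[which.max(min.ro)]
      members <- c(members, add)
      cluster[add] <- k
    }
  }  # step 4: loop back while calls remain
  cluster
}

starts <- c(1, 5, 10, 200, 210, 500)
ends   <- c(100, 100, 105, 300, 300, 510)
ro_cluster(starts, ends, thresh = 0.5)   # 1 1 1 2 2 NA
```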

GISTIC procedure (Beroukhim et al., PNAS, 2007) to identify recurrent CNV regions

GISTIC scores each CNV region with a G-score that is proportional to the total magnitude of CNV calls in each CNV region. In addition, by permuting the locations in each sample, GISTIC determines the frequency with which a given score would be attained if the events were due to chance and therefore randomly distributed. A significance threshold can then be used to determine scores / regions that are unlikely to occur by chance alone.
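The permutation idea can be illustrated with a simplified, hypothetical base-R sketch: G-scores are per-position sums of simulated CN amplitudes, the null distribution comes from randomly rotating each sample's profile (one way of permuting event locations within a sample), and p-values are empirical against the pooled null scores. Real GISTIC, and the `est.recur` option of populationRanges, may differ in detail.

```r
set.seed(7)
n.samples <- 20
n.pos <- 50

# Hypothetical CN amplitudes (samples x positions): one random low-level
# background event per sample ...
amp <- matrix(0, n.samples, n.pos)
for (s in seq_len(n.samples)) {
  st <- sample(n.pos - 4, 1)
  amp[s, st:(st + 4)] <- 0.3
}
# ... plus a recurrent event at positions 20-25 in 12 of the 20 samples
amp[1:12, 20:25] <- amp[1:12, 20:25] + 1

# G-score: total amplitude of all events covering each position
g.score <- colSums(amp)

# Null model: rotate each sample's profile by a random offset
circ_shift <- function(x) {
  k <- sample.int(length(x), 1) - 1
  if (k == 0) x else c(tail(x, k), head(x, length(x) - k))
}
n.perm <- 200
null.scores <- as.vector(replicate(n.perm, rowSums(apply(amp, 1, circ_shift))))

# Empirical p-value per position against the pooled null scores
p <- vapply(g.score, function(g)
  (sum(null.scores >= g) + 1) / (length(null.scores) + 1), numeric(1))
```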

Value

A GRanges object containing the summarized CNV ranges.

Author(s)

Ludwig Geistlinger, Martin Morgan

References

Kim et al. (2012) CNVRuler: a copy number variation-based case-control association analysis tool. Bioinformatics, 28(13):1790-2.

Conrad et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature, 464(7289):704-12.

Beroukhim et al. (2007) Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. PNAS, 104(50):20007-12.

Examples


grl <- GRangesList(
     sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34") ),
     sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36") ),
     sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36") ),
     sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ) ),
     sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34") ) ,
     sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35") )
)

# default as chosen in the original CNVRuler procedure
populationRanges(grl, density=0.1, classify.ranges=FALSE)

# density = 0 merges all overlapping regions, 
# equivalent to: reduce(unlist(grl))
populationRanges(grl, density=0, classify.ranges=FALSE) 

# density = 1 disjoins all overlapping regions, 
# equivalent to: disjoin(unlist(grl))
populationRanges(grl, density=1, classify.ranges=FALSE)

# RO procedure
populationRanges(grl, mode="RO", ro.thresh=0.5, classify.ranges=FALSE)
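The comments above relate `density = 0` to `reduce()`. The base-R sketch below makes that merge explicit. For each chromosome, overlapping or directly adjacent calls are fused, which is what `reduce()` does for unstranded ranges with its default settings. The helper `mergeIntervals` is hypothetical, and the coordinates are those of the example `grl` (six samples). It is an illustration only, not how `populationRanges` is implemented.

```r
# Merge overlapping or adjacent 1-based closed intervals on one chromosome
mergeIntervals <- function(start, end) {
  o <- order(start, end)
  start <- start[o]; end <- end[o]
  out.s <- start[1]; out.e <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out.e)
    if (start[i] <= out.e[k] + 1) {
      out.e[k] <- max(out.e[k], end[i])  # extend the current region
    } else {
      out.s <- c(out.s, start[i]); out.e <- c(out.e, end[i])  # open a new region
    }
  }
  data.frame(start = out.s, end = out.e)
}

# The calls of the example grl, one line per sample
calls <- data.frame(
  chr   = c("chr1", "chr2", "chr2",  "chr1", "chr2", "chr2",  "chr1", "chr2", "chr2",
            "chr1", "chr2",          "chr1", "chr2", "chr2",  "chr1", "chr2", "chr2"),
  start = c(1, 15, 25,   1, 11, 25,   2, 14, 26,   1, 18,   1, 11, 26,   1, 12, 25),
  end   = c(10, 18, 34,  10, 18, 36,  11, 18, 36,  12, 35,  12, 17, 34,  12, 18, 35)
)

# Merge per chromosome and stack the results
merged <- do.call(rbind, lapply(split(calls, calls$chr), function(d)
  data.frame(chr = d$chr[1], mergeIntervals(d$start, d$end))))
merged
```

For these calls the result is one region per chromosome, chr1:1-12 and chr2:11-36.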

Setup the folders and files to run CNV-GWAS analysis

Description

This function (i) creates the folders on disk needed for downstream CNV genome-wide association analysis and (ii) imports the required input files (i.e. phenotypes, probe map and CNV list) from other locations on disk.

Usage

setupCnvGWAS(
  name,
  phen.loc,
  cnv.out.loc,
  map.loc = NULL,
  folder = NULL,
  pops.names = NULL,
  n.cor = 1
)

Arguments

`name`	String with a project code or name (e.g. 'Project1')
`phen.loc`	Path(s) to the tab-separated text file(s) containing phenotype and sample information. When using more than one population, supply the string 'INEXISTENT' instead of a file path for each population without phenotypes.
`cnv.out.loc`	Path(s) to the CNV analysis output (i.e. PennCNV output, SNP-chip general format or sequencing general format). It is also possible to use a `RaggedExperiment` or a `GRangesList` object instead if the run includes only one population.
`map.loc`	Path to the probe map (e.g. as used in the PennCNV analysis). The file must be tab-delimited, and the columns holding probe name, chromosome and coordinate must be named Name, Chr and Position, respectively. If NULL, artificial probes are generated based on the CNV breakpoints.
`folder`	Project folder chosen manually (i.e. the path to the root folder). If NULL, a user-specific data directory is used automatically.
`pops.names`	Names of the populations, if using more than one.
`n.cor`	Number of cores
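The following sketch shows the input layout described for `map.loc` and `phen.loc`. It writes a minimal tab-delimited probe map with the required column names Name, Chr and Position. It also shows how the vector arguments line up for two populations, one of which has no phenotype file. All probe names, coordinates, population names and `/path/to/...` file paths are made-up placeholders. The `setupCnvGWAS()` call is left commented out because those paths do not exist.

```r
# Minimal probe map with the required column names (hypothetical probes)
map <- data.frame(
  Name     = c("probe_1", "probe_2", "probe_3"),
  Chr      = c("1", "1", "2"),
  Position = c(10000L, 25000L, 5000L)
)
map.file <- file.path(tempdir(), "MyMap.txt")
# Tab-delimited, header row with the column names, no quotes or row names
write.table(map, map.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Two populations (hypothetical names and paths): one entry per population,
# with 'INEXISTENT' in place of a phenotype file for the second population
pops.names  <- c("PopA", "PopB")
phen.loc    <- c("/path/to/PhenoPopA.txt", "INEXISTENT")
cnv.out.loc <- c("/path/to/CNVOutPopA.txt", "/path/to/CNVOutPopB.txt")

# phen.info <- setupCnvGWAS("MultiPop", phen.loc, cnv.out.loc,
#                           map.loc = map.file, pops.names = pops.names)
```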

Details

The user can import several phenotypes at once, and all information is stored in the list returned by this function. Note, however, that although several phenotypes can be imported, the `cnvGWAS` and `generateGDS` functions handle only one phenotype per run.

Value

List ‘phen.info’ with ‘samplesPhen’, ‘phenotypes’, ‘phenotypesdf’, ‘phenotypesSam’, ‘FamID’, ‘SexIds’, ‘pops.names’ (if more than one population) and ‘all.paths’

Author(s)

Vinicius Henrique da Silva

Examples


data.dir <- system.file("extdata", package="CNVRanger")

phen.loc <- file.path(data.dir, "Pheno.txt")
cnv.out.loc <- file.path(data.dir, "CNVOut.txt")
map.loc <- file.path(data.dir, "MapPenn.txt")

phen.info <- setupCnvGWAS('Example', phen.loc, cnv.out.loc, map.loc)

