Title: | Candidate Gene Miner |
---|---|
Description: | This package aims to integrate GWAS-derived SNPs and coexpression networks to mine candidate genes associated with a particular phenotype. For that, users must define a set of guide genes, which are known genes involved in the studied phenotype. Additionally, the mined candidates can be given a score that favor candidates that are hubs and/or transcription factors. The scores can then be used to rank and select the top n most promising genes for downstream experiments. |
Authors: | FabrÃcio Almeida-Silva [aut, cre] , Thiago Venancio [aut] |
Maintainer: | FabrÃcio Almeida-Silva <[email protected]> |
License: | GPL-3 |
Version: | 1.13.0 |
Built: | 2024-11-19 03:30:06 UTC |
Source: | https://github.com/bioc/cageminer |
Lengths of pepper chromosomes 1-12 in a GRanges object. The genome for which lengths were calculated (v1.55) was downloaded from http://peppergenome.snu.ac.kr/download.php
data(chr_length)
data(chr_length)
A GRanges object
data(chr_length)
data(chr_length)
This object is a list as returned by BioNERO::exp2gcn(), but only the element genes_and_modules is included. For running time issues, only genes in the cyan module were kept in the element genes_and_modules. All other list elements have been assigned NULL. The network was inferred using the code from the vignette.
data(gcn)
data(gcn)
A list with the elements returned by BioNERO::exp2gcn().
data(gcn)
data(gcn)
GRanges object with genomic coordinates of pepper genes downloaded from http://peppergenome.snu.ac.kr/download.php.
data(gene_ranges)
data(gene_ranges)
A GRanges object
data(gene_ranges)
data(gene_ranges)
The GO annotation was retrieved from PLAZA 4.0 Dicots.
data(guides)
data(guides)
A data frame with genes in the first column and GO description in the second column.
Van Bel, M., Diels, T., Vancaester, E., Kreft, L., Botzki, A., Van de Peer, Y., ... & Vandepoele, K. (2018). PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Nucleic acids research, 46(D1), D1190-D1196.
data(guides)
data(guides)
The data frame was created using the code from the vignette.
data(hubs)
data(hubs)
Data frame with gene IDs, module and intramodular degree.
data(hubs)
data(hubs)
Mine high-confidence candidate genes in a single step
mine_candidates( gene_ranges = NULL, marker_ranges = NULL, window = 2, expand_intervals = TRUE, gene_col = "ID", exp = NULL, gcn = NULL, guides = NULL, metadata, metadata_cols = 1, sample_group, min_cor = 0.2, alpha = 0.05, ... )
mine_candidates( gene_ranges = NULL, marker_ranges = NULL, window = 2, expand_intervals = TRUE, gene_col = "ID", exp = NULL, gcn = NULL, guides = NULL, metadata, metadata_cols = 1, sample_group, min_cor = 0.2, alpha = 0.05, ... )
gene_ranges |
A GRanges object with genomic coordinates of all genes in the genome. |
marker_ranges |
Genomic positions of SNPs. For a single trait, a GRanges object. For multiple traits, a GRangesList or CompressedGRangesList object, with each element of the list representing SNP positions for a particular trait. |
window |
Sliding window (in Mb) upstream and downstream relative to each SNP. Default: 2. |
expand_intervals |
Logical indicating whether or not to expand markers that are represented by intervals. This is particularly useful if users want to use a custom interval defined by linkage disequilibrium, for example. Default: TRUE. |
gene_col |
Column of the GRanges object containing gene ID. Default: "ID", the default for gff/gff3 files imported with rtracklayer::import. |
exp |
Expression data frame with genes in row names and samples in column names or a SummarizedExperiment object. |
gcn |
Gene coexpression network returned by |
guides |
Guide genes as a character vector or as a data frame with genes in the first column and gene annotation class in the second column. |
metadata |
Sample metadata with samples in row names and sample
information in the first column. Ignored if |
metadata_cols |
A vector (either numeric or character) indicating
which columns should be extracted from column metadata if exp
is a |
sample_group |
Level of sample metadata to be used for filtering in gene-trait correlation. |
min_cor |
Minimum correlation value for
|
alpha |
Numeric indicating significance level. Default: 0.05 |
... |
Additional arguments to |
A data frame with mined candidate genes and their correlation to the condition of interest.
data(pepper_se) data(snp_pos) data(gene_ranges) data(guides) data(gcn) set.seed(1) candidates <- mine_candidates(gene_ranges, snp_pos, exp = pepper_se, gcn = gcn, guides = guides$Gene, sample_group = "PRR_stress")
data(pepper_se) data(snp_pos) data(gene_ranges) data(guides) data(gcn) set.seed(1) candidates <- mine_candidates(gene_ranges, snp_pos, exp = pepper_se, gcn = gcn, guides = guides$Gene, sample_group = "PRR_stress")
For a user-defined sliding window relative to each SNP, this function will subset all genes whose genomic positions overlap with the sliding window.
mine_step1(gene_ranges, marker_ranges, window = 2, expand_intervals = TRUE)
mine_step1(gene_ranges, marker_ranges, window = 2, expand_intervals = TRUE)
gene_ranges |
A GRanges object with genomic coordinates of all genes in the genome. |
marker_ranges |
Genomic positions of SNPs. For a single trait, a GRanges object. For multiple traits, a GRangesList or CompressedGRangesList object, with each element of the list representing SNP positions for a particular trait. |
window |
Sliding window (in Mb) upstream and downstream relative to each SNP. Default: 2. |
expand_intervals |
Logical indicating whether or not to expand markers that are represented by intervals. This is particularly useful if users want to use a custom interval defined by linkage disequilibrium, for example. Default: TRUE. |
A GRanges or GRangesList object with the genomic positions of all putative candidate genes.
data(snp_pos) data(gene_ranges) genes <- mine_step1(gene_ranges, snp_pos, window = 2)
data(snp_pos) data(gene_ranges) genes <- mine_step1(gene_ranges, snp_pos, window = 2)
Step 2: Get candidates in modules enriched in guide genes
mine_step2(exp, gcn, guides, candidates, ...)
mine_step2(exp, gcn, guides, candidates, ...)
exp |
Expression data frame with genes in row names and samples in column names or a SummarizedExperiment object. |
gcn |
Gene coexpression network returned by |
guides |
Guide genes as a character vector or as a data frame with genes in the first column and gene annotation class in the second column. |
candidates |
Character vector of all candidates genes to be inspected. |
... |
Additional arguments to |
A list of 2 elements:
Character vector of candidates after step 2
Data frame of results for enrichment analysis
data(pepper_se) data(guides) data(gcn) set.seed(1) mine2 <- mine_step2( exp = pepper_se, gcn = gcn, guides = guides$Gene, candidates = rownames(pepper_se) )
data(pepper_se) data(guides) data(gcn) set.seed(1) mine2 <- mine_step2( exp = pepper_se, gcn = gcn, guides = guides$Gene, candidates = rownames(pepper_se) )
Step 3: Select candidates based on gene significance
mine_step3( exp, metadata, metadata_cols = 1, candidates, sample_group, min_cor = 0.2, alpha = 0.05, ... )
mine_step3( exp, metadata, metadata_cols = 1, candidates, sample_group, min_cor = 0.2, alpha = 0.05, ... )
exp |
Expression data frame with genes in row names and samples in column names or a SummarizedExperiment object. |
metadata |
Sample metadata with samples in row names and sample
information in the first column. Ignored if |
metadata_cols |
A vector (either numeric or character) indicating
which columns should be extracted from column metadata if exp
is a |
candidates |
Character vector of candidate genes to be inspected. |
sample_group |
Level of sample metadata to be used for filtering in gene-trait correlation. |
min_cor |
Minimum correlation value for
|
alpha |
Numeric indicating significance level. Default: 0.05 |
... |
Additional arguments to |
A data frame with mined candidate genes and their correlation to the condition of interest.
data(pepper_se) data(snp_pos) data(gene_ranges) data(guides) data(gcn) data(mine2) set.seed(1) mine3 <- mine_step3( exp = pepper_se, candidates = mine2$candidates, sample_group = "PRR_stress" )
data(pepper_se) data(snp_pos) data(gene_ranges) data(guides) data(gcn) data(mine2) set.seed(1) mine3 <- mine_step3( exp = pepper_se, candidates = mine2$candidates, sample_group = "PRR_stress" )
The list was created using the example code from mine_step().
data(mine2)
data(mine2)
List with elements 'candidates' (character vector) and 'enrichment' (data frame).
data(mine2)
data(mine2)
The data frame was created using the code from the vignette.
data(mined_candidates)
data(mined_candidates)
Data frame with an example of the output from mined_candidates
data(mined_candidates)
data(mined_candidates)
The data were filtered to keep only the top 4000 genes with highest RPKM values in PRR stress-related samples.
data(pepper_se)
data(pepper_se)
A SummarizedExperiment object.
Kim, MS., Kim, S., Jeon, J. et al. Global gene expression profiling for fruit organs and pathogen infections in the pepper, Capsicum annuum L.. Sci Data 5, 180103 (2018). https://doi.org/10.1038/sdata.2018.103
data(pepper_se)
data(pepper_se)
Circos plot of SNP distribution across chromosomes
plot_snp_circos(genome_ranges, gene_ranges, marker_ranges)
plot_snp_circos(genome_ranges, gene_ranges, marker_ranges)
genome_ranges |
A GRanges object with chromosome lengths. |
gene_ranges |
A GRanges object with genomic coordinates of all genes in the genome. |
marker_ranges |
Genomic positions of SNPs. For a single trait, a GRanges object. For multiple traits, a GRangesList or CompressedGRangesList object, with each element of the list representing SNP positions for a particular trait. |
A ggplot object with a circos plot of molecular marker distribution across chromosomes.
data(snp_pos) data(gene_ranges) data(chr_length) p <- plot_snp_circos(chr_length, gene_ranges, snp_pos)
data(snp_pos) data(gene_ranges) data(chr_length) p <- plot_snp_circos(chr_length, gene_ranges, snp_pos)
Plot a barplot of SNP distribution across chromosomes
plot_snp_distribution(marker_ranges)
plot_snp_distribution(marker_ranges)
marker_ranges |
Genomic positions of SNPs. For a single trait, a GRanges object. For multiple traits, a GRangesList or CompressedGRangesList object, with each element of the list representing SNP positions for a particular trait. List elements must have names for proper labelling. |
A ggplot object.
data(snp_pos) p <- plot_snp_distribution(snp_pos)
data(snp_pos) p <- plot_snp_distribution(snp_pos)
Score candidate genes and select the top n genes
score_genes( mined_candidates, hubs = NULL, tfs = NULL, pick_top = 10, weight_tf = 2, weight_hub = 2, weight_both = 3 )
score_genes( mined_candidates, hubs = NULL, tfs = NULL, pick_top = 10, weight_tf = 2, weight_hub = 2, weight_both = 3 )
mined_candidates |
Data frame resulting from |
hubs |
Character vector of hub genes. |
tfs |
Character vector of transcription factors. |
pick_top |
Number of top genes to select. Default: 10. |
weight_tf |
Numeric scalar with the weight to which correlation coefficients will be multiplied if the gene is a TF. Default: 2. |
weight_hub |
Numeric scalar with the weight to which correlation coefficients will be multiplied if the gene is a hub. Default: 2. |
weight_both |
Numeric scalar with the weight to which correlation coefficients will be multiplied if the gene is both a TF and a hub. Default: 3. |
Data frame with top n candidates and their scores.
data(tfs) data(hubs) data(mined_candidates) set.seed(1) scored <- score_genes(mined_candidates, hubs$Gene, tfs$Gene_ID)
data(tfs) data(hubs) data(mined_candidates) set.seed(1) scored <- score_genes(mined_candidates, hubs$Gene, tfs$Gene_ID)
This function counts genes that are contained in sliding windows related to each SNP.
simulate_windows( gene_ranges, marker_ranges, windows = seq(0.1, 2, by = 0.1), expand_intervals = TRUE )
simulate_windows( gene_ranges, marker_ranges, windows = seq(0.1, 2, by = 0.1), expand_intervals = TRUE )
gene_ranges |
A GRanges object with genomic coordinates of all genes in the genome. |
marker_ranges |
Genomic positions of SNPs. For a single trait, a GRanges object. For multiple traits, a GRangesList or CompressedGRangesList object, with each element of the list representing SNP positions for a particular trait. |
windows |
Sliding windows (in Mb) upstream and downstream relative to each SNP. Default: seq(0.1, 2, by = 0.1). |
expand_intervals |
Logical indicating whether or not to expand markers that are represented by intervals. This is particularly useful if users want to use a custom interval defined by linkage disequilibrium, for example. Default: TRUE. |
By default, the function creates 20 sliding windows by expanding upstream and downstream boundaries for each SNP from 0.1 Mb (100 kb) to 2 Mb.
A ggplot object summarizing the results of the simulations.
data(snp_pos) data(gene_ranges) simulate_windows(gene_ranges, snp_pos)
data(snp_pos) data(gene_ranges) simulate_windows(gene_ranges, snp_pos)
The SNPs in this data set were retrieved from Siddique et al., 2019, and they are associated to resistance to Phytophthora root rot.
data(snp_pos)
data(snp_pos)
A GRanges object.
Siddique, M.I., Lee, HY., Ro, NY. et al. Identifying candidate genes for Phytophthora capsici resistance in pepper (Capsicum annuum) via genotyping-by-sequencing-based QTL mapping and genome-wide association study. Sci Rep 9, 9962 (2019). https://doi.org/10.1038/s41598-019-46342-1
data(snp_pos)
data(snp_pos)
Pepper transcription factors and their families retrieved from PlantTFDB 4.0.
data(tfs)
data(tfs)
A data frame with gene IDs in the first column and TF families in the second column.
Jin, J., Tian, F., Yang, D. C., Meng, Y. Q., Kong, L., Luo, J., & Gao, G. (2016). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic acids research, gkw982.
data(tfs)
data(tfs)