R packages by bioc

GenomicRanges - Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments plays a central role in analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Last updated 4 months ago

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

17.68 score 44 stars 1.3k dependents 13k scripts 86k downloads

clusterProfiler - A universal enrichment tool for interpreting omics data

This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Last updated 4 months ago

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

17.03 score 1.1k stars 48 dependents 11k scripts 33k downloads

ComplexHeatmap - Make Complex Heatmaps

Complex heatmaps are an efficient way to visualize associations between data sets from different sources and to reveal potential patterns. The ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Last updated 5 months ago

software visualization sequencing clustering complex-heatmaps heatmap

16.93 score 1.3k stars 151 dependents 16k scripts 24k downloads

GenomeInfoDb - Utilities for manipulating chromosome names, including modifying them to follow a particular naming style

Contains data and functions that define and allow translation between different chromosome sequence naming conventions (e.g., "chr1" versus "1"), including a function that attempts to place sequence names in their natural, rather than lexicographic, order.

Last updated 2 months ago

genetics datarepresentation annotation genomeannotation bioconductor-package core-package

16.32 score 32 stars 1.7k dependents 1.3k scripts 114k downloads

fgsea - Fast Gene Set Enrichment Analysis

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain finer-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used.

Last updated 7 days ago

geneexpression differentialexpression genesetenrichment pathways cpp

16.31 score 392 stars 101 dependents 3.9k scripts 28k downloads

biomaRt - Interface to BioMart databases (i.e. Ensembl)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.

Last updated 12 days ago

annotation bioconductor biomart ensembl

15.99 score 38 stars 230 dependents 13k scripts 39k downloads

rhdf5 - R Interface to HDF5

This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.

Last updated 2 days ago

infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp

15.87 score 62 stars 232 dependents 4.2k scripts 35k downloads

enrichplot - Visualization of Functional Enrichment Result

The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.

Last updated 3 months ago

annotation genesetenrichment go kegg pathways software visualization enrichment-analysis pathway-analysis

15.71 score 239 stars 58 dependents 3.1k scripts 31k downloads

Rsamtools - Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import

This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.

Last updated 4 months ago

dataimport sequencing coverage alignment qualitycontrol bioconductor-package core-package curl bzip2 xz-utils zlib cpp

15.42 score 28 stars 566 dependents 3.2k scripts 47k downloads

GenomicFeatures - Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Last updated 5 months ago

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

15.34 score 26 stars 339 dependents 5.3k scripts 34k downloads

MultiAssayExperiment - Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Last updated 2 months ago

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

14.95 score 71 stars 127 dependents 670 scripts 9.1k downloads

GSVA - Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Last updated 5 days ago

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

14.74 score 212 stars 19 dependents 1.6k scripts 11k downloads

xcms - LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Last updated 12 days ago

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

14.31 score 196 stars 11 dependents 984 scripts 3.3k downloads

BSgenome - Software infrastructure for efficient representation of full genomes and their SNPs

Infrastructure shared by all the Biostrings-based genome data packages.

Last updated 2 months ago

genetics infrastructure datarepresentation sequencematching annotation snp bioconductor-package core-package

14.12 score 9 stars 267 dependents 1.2k scripts 25k downloads

limma - Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Last updated 7 days ago

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

13.81 score 586 dependents 16k scripts 58k downloads

BiocFileCache - Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom TxDb objects) that are costly or difficult to create, web resources, and data files used across sessions.

Last updated 2 months ago

dataimport core-package u24ca289073

13.76 score 13 stars 436 dependents 486 scripts 43k downloads

mixOmics - Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing property of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high-throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics, etc.), but also from beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Last updated 13 days ago

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

13.71 score 182 stars 22 dependents 1.3k scripts 4.0k downloads

HDF5Array - HDF5 datasets as array-like objects in R

The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.

Last updated 7 days ago

infrastructure datarepresentation dataimport sequencing rnaseq coverage annotation genomeannotation singlecell immunooncology bioconductor-package core-package u24ca289073

13.20 score 12 stars 126 dependents 844 scripts 27k downloads

minfi - Analyze Illumina Infinium DNA methylation arrays

Tools to analyze & visualize Illumina Infinium methylation arrays.

Last updated 4 months ago

immunooncology dnamethylation differentialmethylation epigenetics microarray methylationarray multichannel twochannel dataimport normalization preprocessing qualitycontrol

12.82 score 60 stars 27 dependents 996 scripts 4.2k downloads

MSnbase - Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Last updated 12 days ago

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

12.76 score 131 stars 36 dependents 772 scripts 5.6k downloads

plyranges - A fluent interface for manipulating GenomicRanges

A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, it aims to make them more accessible to new Bioconductor users.

Last updated 7 days ago

infrastructure datarepresentation workflowstep coverage bioconductor data-analysis dplyr genomic-ranges genomics tidy-data

12.66 score 144 stars 20 dependents 1.9k scripts 1.9k downloads

rtracklayer - R interface to genome annotation files and the UCSC genome browser

SpatialExperiment - S4 Class for Spatially Resolved -omics Data

SparseArray - High-performance sparse data representation and manipulation in R

scDblFinder - scDblFinder

TFBSTools - Software Package for Transcription Factor Binding Site (TFBS) Analysis

bsseq - Analyze, manage and store whole-genome methylation data

glmGamPoi - Fit a Gamma-Poisson Generalized Linear Model

SeqArray - Data management of large-scale whole-genome sequence variant calls using GDS files

metagenomeSeq - Statistical analysis for sparse high-throughput sequencing

DelayedMatrixStats - Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

graph - graph: A package to handle graph data structures

variancePartition - Quantify and interpret drivers of variation in multilevel gene expression experiments

MatrixGenerics - S4 Generic Summary Statistic Functions that Operate on Matrix-Like Objects

Rgraphviz - Provides plotting capabilities for R graph objects

mia - Microbiome analysis

destiny - Creates diffusion maps

VariantAnnotation - Annotation of Genetic Variants

PharmacoGx - Analysis of Large-Scale Pharmacogenomic Data

XVector - Foundation of external vector representation and manipulation in Bioconductor

gdsfmt - R Interface to CoreArray Genomic Data Structure (GDS) Files

Rhdf5lib - hdf5 library as an R package

affy - Methods for Affymetrix Oligonucleotide Arrays

beachmat - Compiling Bioconductor to Handle Each Matrix Type

CATALYST - Cytometry dATa anALYSis Tools

universalmotif - Import, Modify, and Export Motifs with R

ANCOMBC - Microbiome differential abundance and correlation analyses with bias correction

GWASTools - Tools for Genome Wide Association Studies

tximeta - Transcript Quantification Import with Automatic Metadata

MsCoreUtils - Core Utils for Mass Spectrometry Data

GENESIS - GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

UCell - Rank-based signature enrichment analysis for single-cell data

scRepertoire - A toolkit for single-cell immune receptor profiling

Cardinal - A mass spectrometry imaging toolbox for statistical analysis

EpiDISH - Epigenetic Dissection of Intra-Sample-Heterogeneity

BiocIO - Standard Input and Output for Bioconductor Packages

cBioPortalData - Exposes and Makes Available Data from the cBioPortal Web Resources

graphite - GRAPH Interaction from pathway Topological Environment

UCSC.utils - Low-level utilities to retrieve data from the UCSC Genome Browser

derfinder - Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

DropletUtils - Utilities for Handling Single-Cell Droplet Data

rGREAT - GREAT Analysis - Functional Enrichment on Genomic Regions

splatter - Simple Simulation of Single-cell RNA Sequencing Data

rhdf5filters - HDF5 Compression Filters

PureCN - Copy number calling and SNV classification using targeted short read sequencing

RTCGAToolbox - A new tool for exporting TCGA Firehose data

txdbmaker - Tools for making TxDb objects from genomic annotations

TCGAutils - TCGA utility functions for data management

pcaExplorer - Interactive Visualization of RNA-seq Data Using a Principal Components Approach

AnnotationForge - Tools for building SQLite-based annotation data packages

tidybulk - Brings transcriptomics to the tidyverse

recount - Explore and download data from the recount project

matter - Out-of-core statistical computing and signal processing

MassSpecWavelet - Peak Detection for Mass Spectrometry data using wavelet-based algorithms

ProtGenerics - Generic infrastructure for Bioconductor mass spectrometry packages

ggmsa - Plot Multiple Sequence Alignment using 'ggplot2'

Rsubread - Mapping, quantification and variant analysis of sequencing data

basilisk - Freezing Python Dependencies Inside Bioconductor Packages

batchelor - Single-Cell Batch Correction Methods

sesame - SEnsible Step-wise Analysis of DNA MEthylation BeadChips