Biostrings - Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Last updated 2 months ago

sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package

17.85 score 58 stars 1.2k dependents 8.6k scripts 99k downloads

BiocParallel - Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementations of functions for parallel evaluation, tailored for use with Bioconductor objects.

Last updated 4 months ago

infrastructurebioconductor-packagecore-packageu24ca289073cpp

17.13 score 67 stars 1.1k dependents 6.7k scripts 82k downloads

DESeq2 - Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Last updated 7 days ago

sequencingrnaseqchipseqgeneexpressiontranscriptionnormalizationdifferentialexpressionbayesianregressionprincipalcomponentclusteringimmunooncologyopenblascpp

16.05 score 368 stars 115 dependents 17k scripts 36k downloads

BiocGenerics - S4 generic functions used in Bioconductor

The package defines many S4 generic functions used in Bioconductor.

Last updated 4 days ago

infrastructurebioconductor-packagecore-package

14.23 score 12 stars 2.2k dependents 612 scripts 108k downloads

GenomicDataCommons - NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

Last updated 7 days ago

dataimportsequencingapi-clientbioconductorbioinformaticscancercore-servicesdata-sciencegenomicsncitcgavignette

11.93 score 85 stars 12 dependents 238 scripts 1.8k downloads

scater - Single-Cell Analysis Toolkit for Gene Expression Data in R

A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.

Last updated 10 days ago

immunooncologysinglecellrnaseqqualitycontrolpreprocessingnormalizationvisualizationdimensionreductiontranscriptomicsgeneexpressionsequencingsoftwaredataimportdatarepresentationinfrastructurecoverage

10.96 score 40 dependents 12k scripts 12k downloads

infercnv - Infer Copy Number Variation from Single-Cell RNA-Seq Data

Using single-cell RNA-Seq expression to visualize CNV in cells.

Last updated 4 months ago

softwarecopynumbervariationvariantdetectionstructuralvariationgenomicvariationgeneticstranscriptomicsstatisticalmethodbayesianhiddenmarkovmodelsinglecelljagscpp

10.89 score 588 stars 654 scripts 2.4k downloads

tximeta - Transcript Quantification Import with Automatic Metadata

Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.

Last updated 12 days ago

annotationgenomeannotationdataimportpreprocessingrnaseqsinglecelltranscriptomicstranscriptiongeneexpressionfunctionalgenomicsreproducibleresearchreportwritingimmunooncology

10.68 score 67 stars 1 dependents 466 scripts 2.0k downloads

GSEABase - Gene set enrichment data structures and methods

This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA).

Last updated 14 hours ago

geneexpressiongenesetenrichmentgraphandnetworkgokegg

10.27 score 75 dependents 1.5k scripts 13k downloads

scuttle - Single-Cell RNA-Seq Analysis Utilities

Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.

Last updated 4 months ago

immunooncologysinglecellrnaseqqualitycontrolpreprocessingnormalizationtranscriptomicsgeneexpressionsequencingsoftwaredataimportopenblascpp

10.21 score 77 dependents 1.7k scripts 17k downloads

tidybulk - Brings transcriptomics to the tidyverse

This is a collection of utility functions for exploring and performing calculations on RNA sequencing data in a modular, pipe-friendly and tidy fashion.

Last updated 4 months ago

assaydomaininfrastructurernaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicsbioconductorbulk-transcriptional-analysesdeseq2differential-expressionedgerensembl-idsentrezgene-symbolsgseamds-dimensionspcapiperedundancytibbletidytidy-datatidyversetranscriptstsne

9.47 score 166 stars 1 dependents 169 scripts 546 downloads

Rsubread - Mapping, quantification and variant analysis of sequencing data

Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES, etc.). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.

Last updated 8 days ago

sequencingalignmentsequencematchingrnaseqchipseqsinglecellgeneexpressiongeneregulationgeneticsimmunooncologysnpgeneticvariabilitypreprocessingqualitycontrolgenomeannotationgenefusiondetectionindeldetectionvariantannotationvariantdetectionmultiplesequencealignmentzlib

9.17 score 10 dependents 860 scripts 3.6k downloads

EWCE - Expression Weighted Celltype Enrichment

Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.

Last updated 4 months ago

geneexpressiontranscriptiondifferentialexpressiongenesetenrichmentgeneticsmicroarraymrnamicroarrayonechannelrnaseqbiomedicalinformaticsproteomicsvisualizationfunctionalgenomicssinglecelldeconvolutionsingle-cellsingle-cell-rna-seqtranscriptomics

9.17 score 55 stars 96 scripts 501 downloads

bambu - Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu is an R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can be used directly for visualisation and downstream analysis such as differential gene expression or transcript usage.

Last updated 4 days ago

alignmentcoveragedifferentialexpressionfeatureextractiongeneexpressiongenomeannotationgenomeassemblyimmunooncologylongreadmultiplecomparisonnormalizationrnaseqregressionsequencingsoftwaretranscriptiontranscriptomicsbambubioconductorlong-readsnanoporenanopore-sequencingrna-seqrna-seq-analysistranscript-quantificationtranscript-reconstructioncpp

8.91 score 197 stars 1 dependents 86 scripts 728 downloads

GenomicScores - Infrastructure to work with genomewide position-specific scores

Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.

Last updated 10 days ago

infrastructuregeneticsannotationsequencingcoverageannotationhubsoftware

8.71 score 8 stars 6 dependents 83 scripts 1.2k downloads

csaw - ChIP-Seq Analysis with Windows

Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.

Last updated 13 days ago

multiplecomparisonchipseqnormalizationsequencingcoveragegeneticsannotationdifferentialpeakcallingcurlbzip2xz-utilszlibcpp

8.32 score 7 dependents 498 scripts 893 downloads

orthogene - Interspecies gene mapping

orthogene is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across 700+ organisms. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using 1:1, many:1, 1:many or many:many gene mappings, both within and between species.

Last updated 4 months ago

geneticscomparativegenomicspreprocessingphylogeneticstranscriptomicsgeneexpressionanimal-modelsbioconductorbioconductor-packagebioinformaticsbiomedicinecomparative-genomicsevolutionary-biologygenesgenomicsontologiestranslational-research

7.84 score 41 stars 2 dependents 31 scripts 556 downloads

EBSeq - An R package for gene and isoform differential expression analysis of RNA-seq data

Differential Expression analysis at both gene and isoform level using RNA-seq data

Last updated 14 days ago

immunooncologystatisticalmethoddifferentialexpressionmultiplecomparisonrnaseqsequencingcpp

7.77 score 6 dependents 162 scripts 752 downloads

rrvgo - Reduce + Visualize GO

Reduce and visualize lists of Gene Ontology terms by identifying redundancy based on semantic similarity.

Last updated 4 months ago

annotationclusteringgonetworkpathwayssoftware

7.74 score 24 stars 190 scripts 880 downloads

RBioFormats - R interface to Bio-Formats

An R package which interfaces the OME Bio-Formats Java library to allow reading of proprietary microscopy image data and metadata.

Last updated 4 months ago

dataimportbio-formatsbioconductorimage-processingopenjdk

7.55 score 25 stars 1 dependents 52 scripts 366 downloads

mbkmeans - Mini-batch K-means Clustering for Single-Cell RNA-seq

Implements the mini-batch k-means algorithm for large datasets, including support for on-disk data representation.

Last updated 4 months ago

clusteringgeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecellhuman-cell-atlascpp

7.41 score 10 stars 2 dependents 54 scripts 873 downloads

cytolib - C++ infrastructure for representing and interacting with the gated cytometry data

This package provides the core data structure and API to represent and interact with the gated cytometry data.

Last updated 13 days ago

immunooncologyflowcytometrydataimportpreprocessingdatarepresentation

7.38 score 60 dependents 7 scripts 4.4k downloads

GeomxTools - NanoString GeoMx Tools

Tools for NanoString Technologies GeoMx Technology. Package provides functions for reading in DCC and PKC files based on an ExpressionSet derived object. Normalization and QC functions are also included.

Last updated 4 months ago

geneexpressiontranscriptioncellbasedassaysdataimporttranscriptomicsproteomicsmrnamicroarrayproprietaryplatformsrnaseqsequencingexperimentaldesignnormalizationspatial

7.17 score 3 dependents 218 scripts 754 downloads

qpgraph - Estimation of genetic and molecular regulatory networks from high-throughput genomics data

Estimate gene and eQTL networks from high-throughput expression and genotyping assays.

Last updated 10 days ago

microarraygeneexpressiontranscriptionpathwaysnetworkinferencegraphandnetworkgeneregulationgeneticsgeneticvariabilitysnpsoftwareopenblas

7.16 score 3 dependents 20 scripts 448 downloads

iSEEu - iSEE Universe

iSEEu (the iSEE universe) contains diverse functionality to extend the usage of the iSEE package, including additional classes for the panels, or modes allowing easy configuration of iSEE applications.

Last updated 4 months ago

immunooncologyvisualizationguidimensionreductionfeatureextractionclusteringtranscriptiongeneexpressiontranscriptomicssinglecellcellbasedassayshacktoberfest

7.15 score 9 stars 1 dependents 35 scripts 265 downloads

systemPipeShiny - systemPipeShiny: An Interactive Framework for Workflow Management and Visualization

systemPipeShiny (SPS) extends the widely used systemPipeR (SPR) workflow environment with a versatile graphical user interface provided by a Shiny App. This allows non-R users, such as experimentalists, to run many of systemPipeR’s workflow design, control, and visualization functionalities interactively without requiring knowledge of R. Most importantly, SPS has been designed as a general purpose framework for interacting with other R packages in an intuitive manner. Like most Shiny Apps, SPS can be used on local computers as well as centralized server-based deployments that can be accessed remotely as a public web service for using SPR’s functionalities with community and/or private data. The framework can integrate many core packages from the R/Bioconductor ecosystem. Examples of SPS’ current functionalities include: (a) interactive creation of experimental designs and metadata using an easy-to-use tabular editor or file uploader; (b) visualization of workflow topologies combined with auto-generation of R Markdown preview for interactively designed workflows; (d) access to a wide range of data processing routines; and (e) an extendable set of visualization functionalities. Complex visual results can be managed on a 'Canvas Workbench', allowing users to organize and compare plots efficiently, combined with a session snapshot feature to continue work at a later time. The present suite of pre-configured visualization examples. The modular design of SPR makes it easy to design custom functions without any knowledge of Shiny, and to extend the environment in the future with contributions from the community.

Last updated 4 months ago

shinyappsinfrastructuredataimportsequencingqualitycontrolreportwritingexperimentaldesignclusteringbioconductorbioconductor-packagedata-visualizationshinysystempiper

7.03 score 33 stars 36 scripts 236 downloads

BioTIP - BioTIP: An R package for characterization of Biological Tipping-Point

Adopting tipping-point theory to transcriptome profiles to unravel disease regulatory trajectory.

Last updated 4 months ago

sequencingrnaseqgeneexpressiontranscriptionsoftware

6.84 score 18 stars 37 scripts 204 downloads

ROTS - Reproducibility-Optimized Test Statistic

Calculates the Reproducibility-Optimized Test Statistic (ROTS) for differential testing in omics data.

Last updated 18 days ago

softwaregeneexpressiondifferentialexpressionmicroarrayrnaseqproteomicsimmunooncologycpp

6.72 score 3 dependents 84 scripts 492 downloads

NanoStringNCTools - NanoString nCounter Tools

Tools for NanoString Technologies nCounter Technology. Provides support for reading RCC files into an ExpressionSet derived object. Also includes methods for QC and normalization of NanoString data.

Last updated 4 months ago

geneexpressiontranscriptioncellbasedassaysdataimporttranscriptomicsproteomicsmrnamicroarrayproprietaryplatformsrnaseq

6.45 score 4 dependents 93 scripts 642 downloads

distinct - distinct: a method for differential analyses via hierarchical permutation tests

distinct is a statistical method to perform differential testing between two or more groups of distributions; differential testing is performed via hierarchical non-parametric permutation tests on the cumulative distribution functions (cdfs) of each sample. While most methods for differential expression target differences in the mean abundance between conditions, distinct, by comparing full cdfs, identifies both differential patterns involving changes in the mean and more subtle variations that do not involve the mean (e.g., unimodal vs. bi-modal distributions with the same mean). distinct is a general and flexible tool: due to its fully non-parametric nature, which makes no assumptions on how the data was generated, it can be applied to a variety of datasets. It is particularly suitable for differential state analyses on single cell data (i.e., differential analyses within sub-populations of cells), such as single cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. To use distinct one needs data from two or more groups of samples (i.e., experimental conditions), with at least 2 samples (i.e., biological replicates) per group.

Last updated 4 months ago

geneticsrnaseqsequencingdifferentialexpressiongeneexpressionmultiplecomparisonsoftwaretranscriptionstatisticalmethodvisualizationsinglecellflowcytometrygenetargetopenblascpp

6.35 score 11 stars 1 dependents 34 scripts 508 downloads

CopyNumberPlots - Create Copy-Number Plots using karyoploteR functionality

CopyNumberPlots has a set of functions extending karyoploteR's functionality to create beautiful, customizable and flexible plots of copy-number related data.

Last updated 4 months ago

visualizationcopynumbervariationcoverageonechanneldataimportsequencingdnaseqbioconductorbioconductor-packagebioinformaticscopy-number-variationgenomicsgenomics-visualization

6.24 score 6 stars 2 dependents 16 scripts 313 downloads

crisprViz - Visualization Functions for CRISPR gRNAs

Provides functionalities to visualize and contextualize CRISPR guide RNAs (gRNAs) on genomic tracks across nucleases and applications. Works in conjunction with the crisprBase and crisprDesign Bioconductor packages. Plots are produced using the Gviz framework.

Last updated 4 months ago

crisprfunctionalgenomicsgenetargetbioconductorbioconductor-packagecrispr-analysiscrispr-designgrnagrna-sequencegrna-sequencessgrnasgrna-designvisualization

6.23 score 7 stars 2 dependents 6 scripts 215 downloads

rpx - R Interface to the ProteomeXchange Repository

The rpx package implements an interface to proteomics data submitted to the ProteomeXchange consortium.

Last updated 4 days ago

immunooncologyproteomicsmassspectrometrydataimportthirdpartyclientbioconductordatamass-spectrometryproteomexchange

6.20 score 5 stars 21 scripts 531 downloads

MOMA - Multi Omic Master Regulator Analysis

This package implements the algorithm for inference of candidate master regulator proteins from multi-omics data (MOMA), as well as ancillary analysis and visualization functions.

Last updated 4 months ago

softwarenetworkenrichmentnetworkinferencenetworkfeatureextractionclusteringfunctionalgenomicstranscriptomicssystemsbiology

6.19 score 6 stars 13 scripts 162 downloads

omicsViewer - Interactive and explorative visualization of SummarizedExperiment or ExpressionSet using omicsViewer

omicsViewer visualizes ExpressionSet (or SummarizedExperiment) in an interactive way. The omicsViewer has a separate back- and front-end. In the back-end, users need to prepare an ExpressionSet that contains all the necessary information for the downstream data interpretation. Some extra requirements on the headers of phenotype data or feature data are imposed so that the provided information can be clearly recognized by the front-end while keeping modifications to the existing ExpressionSet object to a minimum. The pure dependency on R/Bioconductor guarantees maximum flexibility in the statistical analysis in the back-end. Once the ExpressionSet is prepared, it can be visualized using the front-end, implemented by shiny and plotly. Both features and samples can be selected from (data) tables or graphs (scatter plot/heatmap). Different types of analyses, such as enrichment analysis (using Bioconductor package fgsea or Fisher's exact test) and STRING network analysis, are performed on the fly and the results are visualized simultaneously. When a subset of samples and a phenotype variable are selected, a significance test on means (t-test or rank-based test; when the phenotype variable is quantitative) or a test of independence (chi-square or Fisher's exact test; when the phenotype data is categorical) is performed to test the association between the phenotype of interest and the selected samples. Additionally, other analyses can be easily added as extra shiny modules. Therefore, omicsViewer greatly facilitates data exploration: many different hypotheses can be explored in a short time without the need for knowledge of R. In addition, the resulting data can be easily shared using a shiny server. Otherwise, a standalone version of omicsViewer together with designated omics data can be easily created by integrating it with portable R, which can be shared with collaborators or submitted as supplementary data together with a manuscript.

Last updated 13 days ago

softwarevisualizationgenesetenrichmentdifferentialexpressionmotifdiscoverynetworknetworkenrichment

6.12 score 4 stars 22 scripts 196 downloads

metaseqR2 - An R package for the analysis and result reporting of RNA-Seq data by combining multiple statistical algorithms

Provides an interface to several normalization and statistical testing packages for RNA-Seq gene expression data. Additionally, it creates several diagnostic plots, performs meta-analysis by combining the results of several statistical tests and reports the results in an interactive way.

Last updated 4 months ago

softwaregeneexpressiondifferentialexpressionworkflowsteppreprocessingqualitycontrolnormalizationreportwritingrnaseqtranscriptionsequencingtranscriptomicsbayesianclusteringcellbiologybiomedicalinformaticsfunctionalgenomicssystemsbiologyimmunooncologyalternativesplicingdifferentialsplicingmultiplecomparisontimecoursedataimportatacseqepigeneticsregressionproprietaryplatformsgenesetenrichmentbatcheffectchipseq

6.05 score 7 stars 3 scripts 232 downloads

IgGeneUsage - Differential gene usage in immune repertoires

Detection of biases in the usage of immunoglobulin (Ig) genes is an important task in immune repertoire profiling. IgGeneUsage detects aberrant Ig gene usage between biological conditions using a probabilistic model which is analyzed computationally by Bayesian inference. In doing so, IgGeneUsage also avoids some common problems related to the current practice of null-hypothesis significance testing.

Last updated 4 months ago

differentialexpressionregressiongeneticsbayesianbiomedicalinformaticsimmunooncologymathematicalbiologyb-cell-receptorbcr-repertoiredifferential-analysisdifferential-gene-expressionhigh-throughput-sequencingimmune-repertoireimmune-repertoire-analysisimmune-repertoiresimmunogenomicsimmunoglobulinimmunoinformaticsimmunological-bioinformaticsimmunologytcr-repertoirevdj-recombinationcpp

6.03 score 6 stars 1 scripts 183 downloads

SCOPE - A normalization and copy number estimation method for single-cell DNA sequencing

Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.

Last updated 4 months ago

singlecellnormalizationcopynumbervariationsequencingwholegenomecoveragealignmentqualitycontroldataimportdnaseq

5.91 score 82 scripts 256 downloads

APAlyzer - A toolkit for APA analysis using RNA-seq data

Perform 3'UTR APA, Intronic APA and gene expression analysis using RNA-seq data.

Last updated 4 months ago

sequencingrnaseqdifferentialexpressiongeneexpressiongeneregulationannotationdataimportsoftwareative-polyadenylationbioinformatics-toolrna-seq

5.75 score 7 stars 9 scripts 188 downloads

ppcseq - Probabilistic Outlier Identification for RNA Sequencing Generalized Linear Models

Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been developed for RNA sequencing data, leaving the identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against its theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool for identifying transcripts that include outlier data points in differential expression analysis, which do not follow a negative binomial distribution. Applying ppcseq to analyse several publicly available datasets using popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.

Last updated 4 months ago

rnaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicsbayesian-inferencedeseq2edgernegative-binomialoutlierstancpp

5.65 score 7 stars 16 scripts 214 downloads

planet - Placental DNA methylation analysis tools

This package contains R functions to predict biological variables from placental DNA methylation data generated from Infinium arrays. This includes inferring ethnicity/ancestry, gestational age, and cell composition from placental DNA methylation array (450k/850k) data.

Last updated 9 days ago

softwaredifferentialmethylationepigeneticsmicroarraymethylationarraydnamethylationcpgislandancestrydna-methylation-datageneticsinferencemachine-learningplacenta

5.64 score 4 stars 1 dependents 12 scripts 370 downloads

GRaNIE - GRaNIE: Reconstruction of cell-type-specific gene regulatory networks including enhancers using single-cell or bulk chromatin accessibility and RNA-seq data

Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type specific activity. This TF activity can be quantified with existing tools such as diffTF and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a gene regulatory network (GRN) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a GRN using single-cell or bulk RNAseq and open chromatin (e.g., using ATACseq or ChIPseq for open chromatin marks) and optionally (Capture) Hi-C data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which are connected to genes in the vicinity or within the same chromatin domain (TAD). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.

Last updated 4 months ago

softwaregeneexpressiongeneregulationnetworkinferencegenesetenrichmentbiomedicalinformaticsgeneticstranscriptomicsatacseqrnaseqgraphandnetworkregressiontranscriptionchipseq

5.40 score 24 scripts 266 downloads

DEWSeq - Differential Expressed Windows Based on Negative Binomial Distribution

DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions in eCLIP or iCLIP next generation sequencing data.

Last updated 4 months ago

sequencinggeneregulationfunctionalgenomicsdifferentialexpressionbioinformaticseclipngs-analysis

5.30 score 5 stars 4 scripts 190 downloads

DegNorm - DegNorm: degradation normalization for RNA-seq data

This package performs degradation normalization in bulk RNA-seq data to improve differential expression analysis accuracy.

Last updated 4 months ago

rnaseqnormalizationgeneexpressionalignmentcoveragedifferentialexpressionbatcheffectsoftwaresequencingimmunooncologyqualitycontroldataimportopenblascppopenmp

5.20 score 1 stars 3 scripts 193 downloads

MouseFM - In-silico methods for genetic finemapping in inbred mice

This package provides methods for genetic finemapping in inbred mice by taking advantage of their very high homozygosity rate (>95%).

Last updated 4 months ago

geneticssnpgenetargetvariantannotationgenomicvariationmultiplecomparisonsystemsbiologymathematicalbiologypatternlogicgenepredictionbiomedicalinformaticsfunctionalgenomicsfinemapgene-candidatesinbred-miceinbred-strainsmouseqtlqtl-mapping

5.13 score 5 scripts 359 downloads

SGCP - SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.

Last updated 4 months ago

geneexpressiongenesetenrichmentnetworkenrichmentsystemsbiologyclassificationclusteringdimensionreductiongraphandnetworkneuralnetworknetworkmrnamicroarrayrnaseqvisualizationbioinformaticsgenecoexpressionnetworkgraphsnetworkclusteringnetworksself-trainingsemi-supervised-learningunsupervised-learning

5.12 score 2 stars 44 scripts 245 downloads

densvis - Density-Preserving Data Visualization via Non-Linear Dimensionality Reduction

Implements the density-preserving modification to t-SNE and UMAP described by Narayan et al. (2020) <doi:10.1101/2020.05.12.077776>. The non-linear dimensionality reduction techniques t-SNE and UMAP enable users to summarise complex high-dimensional sequencing data such as single cell RNAseq using lower dimensional representations. These lower dimensional representations enable the visualisation of discrete transcriptional states, as well as continuous trajectory (for example, in early development). However, these methods focus on the local neighbourhood structure of the data. In some cases, this results in misleading visualisations, where the density of cells in the low-dimensional embedding does not represent the transcriptional heterogeneity of data in the original high-dimensional space. den-SNE and densMAP aim to enable more accurate visual interpretation of high-dimensional datasets by producing lower-dimensional embeddings that accurately represent the heterogeneity of the original high-dimensional space, enabling the identification of homogeneous and heterogeneous cell states. This accuracy is accomplished by including in the optimisation process a term which considers the local density of points in the original high-dimensional space. This can help to create visualisations that are more representative of heterogeneity in the original high-dimensional space.

Last updated 4 months ago

dimensionreductionvisualizationsoftwaresinglecellsequencingcppopenmp

5.12 score 2 stars 10 scripts 2.6k downloads

MBQN - Mean/Median-balanced quantile normalization

Modified quantile normalization for omics or other matrix-like data distorted in location and scale.

Last updated 4 months ago

normalizationpreprocessingproteomicssoftware

4.92 score 2 stars 14 scripts 216 downloads

evaluomeR - Evaluation of Bioinformatics Metrics

Evaluates the reliability of your own metrics, and of the measurements made on your own datasets, by analysing the stability and goodness of the classifications of such metrics.

Last updated 4 months ago

clusteringclassificationfeatureextractionassessmentclustering-evaluationevaluomeevaluomermetrics

4.82 score 33 scripts 230 downloads

DeepPINCS - Protein Interactions and Networks with Compounds based on Sequences using Deep Learning

The identification of novel compound-protein interactions (CPIs) is important in drug discovery. Revealing unknown compound-protein interactions is useful for designing a new drug for a target protein by screening candidate compounds. Accurate CPI prediction supports an effective drug discovery process. To identify potential CPIs effectively, prediction methods based on machine learning and deep learning have been developed. Data for sequences are provided as discrete symbolic data. In the data, compounds are represented as SMILES (simplified molecular-input line-entry system) strings and proteins are sequences in which the characters are amino acids. The outcome is defined as a variable that indicates how strongly two molecules interact with each other or whether there is an interaction between them. In this package, a deep-learning based model that takes only sequence information of both compounds and proteins as input and the outcome as output is used to predict CPI. The model is implemented by using compound and protein encoders with useful features. The CPI model also supports other modeling tasks, including protein-protein interaction (PPI), chemical-chemical interaction (CCI), or single compounds and proteins. Although the model is designed for proteins, DNA and RNA can be used if they are represented as sequences.

Last updated 4 months ago

softwarenetworkgraphandnetworkneuralnetworkopenjdk

4.78 score 2 dependents 4 scripts 153 downloads

MatrixQCvis - Shiny-based interactive data-quality exploration for omics data

Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include, among others, bar and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and thus enables easy integration into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but can also be used to assess the data quality of other data types that can be represented in a SummarizedExperiment object.

Last updated 4 months ago

visualizationshinyappsguiqualitycontroldimensionreductionmetabolomicsproteomicstranscriptomics

4.74 score 4 scripts 374 downloads

Dune - Improving replicability in single-cell RNA-Seq cell type discovery

Given a set of clustering labels, Dune merges pairs of clusters to increase mean ARI between labels, improving replicability.

Last updated 4 months ago

clusteringgeneexpressionrnaseqsoftwaresinglecelltranscriptomicsvisualization

4.62 score 42 scripts 192 downloads
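
The merging criterion in the Dune description above can be illustrated with a short, language-neutral sketch. The Python below is not Dune's R API; the function names (`adjusted_rand_index`, `mean_pairwise_ari`, `merge_clusters`) and the toy labels are hypothetical. It computes the adjusted Rand index (ARI) between label vectors and shows that merging one pair of clusters can raise the mean ARI across clusterings.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two label vectors of equal length."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs of items that share a cluster in both, in a, and in b.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_joint - expected) / (max_index - expected)


def mean_pairwise_ari(label_sets):
    """Mean ARI over all pairs of clusterings (at least two are required)."""
    pairs = list(combinations(label_sets, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)


def merge_clusters(labels, keep, drop):
    """Relabel every member of cluster `drop` as cluster `keep`."""
    return [keep if label == drop else label for label in labels]


# Toy example: clustering_b over-splits one group relative to clustering_a.
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [0, 0, 0, 1, 1, 2]

before = mean_pairwise_ari([clustering_a, clustering_b])
after = mean_pairwise_ari([clustering_a, merge_clusters(clustering_b, 1, 2)])
```

In this toy case the merge makes the two clusterings identical, so `after` reaches 1.0 while `before` is lower. A procedure of the kind described would repeat such merges only while the mean ARI keeps increasing.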

RESOLVE - RESOLVE: An R package for the efficient analysis of mutational signatures from cancer genomes

Cancer is a genetic disease caused by somatic mutations in genes controlling key biological functions such as cellular growth and division. Such mutations may arise both through cell-intrinsic and exogenous processes, generating characteristic mutational patterns over the genome named mutational signatures. The study of mutational signatures has become a standard component of modern genomics studies, since it can reveal which (environmental and endogenous) mutagenic processes are active in a tumor, and may highlight markers for therapeutic response. The computational analysis of mutational signatures presents many pitfalls. First, the task of determining the number of signatures is very complex and depends on heuristics. Second, several signatures have no clear etiology, raising the possibility that they are computational artifacts rather than the result of mutagenic processes. Last, approaches for signature assignment are greatly influenced by the set of signatures used for the analysis. To overcome these limitations, we developed RESOLVE (Robust EStimation Of mutationaL signatures Via rEgularization), a framework that allows the efficient extraction and assignment of mutational signatures. RESOLVE implements a novel algorithm that enables (i) efficient extraction, (ii) exposure estimation, and (iii) confidence assessment during the computational inference of mutational signatures.

Last updated 4 months ago

biomedicalinformaticssomaticmutation

4.60 score 1 stars 3 scripts 182 downloads

TargetDecoy - Diagnostic Plots to Evaluate the Target Decoy Approach

A first step in the data analysis of Mass Spectrometry (MS) based proteomics data is to identify peptides and proteins. To this end, the large number of experimental mass spectra typically has to be assigned to theoretical peptides derived from a sequence database. Search engines are used for this purpose. These tools compare each of the observed spectra to all candidate theoretical spectra derived from the sequence database and calculate a score for each comparison. The observed spectrum is then assigned to the theoretical peptide with the best score, which is also referred to as the peptide-to-spectrum match (PSM). It is of course crucial for the downstream analysis to evaluate the quality of these matches. Therefore False Discovery Rate (FDR) control is used to return a reliable list of PSMs. The FDR, however, requires a good characterisation of the score distribution of PSMs that are matched to the wrong peptide (bad target hits). In proteomics, the target decoy approach (TDA) is typically used for this purpose. The TDA method matches the spectra to a database of real (targets) and nonsense peptides (decoys). A popular approach to generate these decoys is to reverse the target database. Hence, all the PSMs that match to a decoy are known to be bad hits and the distribution of their scores is used to estimate the distribution of the bad-scoring target PSMs. A crucial assumption of the TDA is that the decoy PSM hits have properties similar to those of bad target hits, so that the decoy PSM scores are a good simulation of the bad target PSM scores. Users, however, typically do not evaluate these assumptions. To this end we developed TargetDecoy to generate diagnostic plots to evaluate the quality of the target decoy method.

Last updated 4 months ago

massspectrometryproteomicsqualitycontrolsoftwarevisualizationbioconductormass-spectrometry

4.60 score 1 stars 9 scripts 247 downloads
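
The target decoy reasoning in the TargetDecoy description above can be made concrete with a minimal sketch. It is written in Python for illustration and is not the TargetDecoy API. It assumes that higher scores are better, that targets and decoys were searched in one combined database of similar size, and that the number of decoy hits above a threshold approximates the number of bad target hits above it. The function name `estimate_fdr` and the scores are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Assumption: each decoy hit above the threshold stands in for one
    bad target hit above the threshold.
    """
    n_targets = sum(score >= threshold for score in target_scores)
    n_decoys = sum(score >= threshold for score in decoy_scores)
    if n_targets == 0:
        return 0.0  # nothing is reported, so nothing can be a false discovery
    return min(1.0, n_decoys / n_targets)


# Hypothetical PSM scores, for illustration only.
target_scores = [9.1, 8.4, 7.7, 6.0, 5.2, 3.1]
decoy_scores = [5.5, 3.0, 2.2]

fdr_lenient = estimate_fdr(target_scores, decoy_scores, threshold=5.0)
fdr_strict = estimate_fdr(target_scores, decoy_scores, threshold=8.0)
```

With these toy scores, one decoy and five targets pass the lenient threshold, giving an estimate of 0.2, while no decoy passes the strict one. The estimate is only as good as the assumption that decoy scores behave like bad target scores, which is the assumption the diagnostic plots are meant to check.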

NoRCE - NoRCE: Noncoding RNA Sets Cis Annotation and Enrichment

While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding and non-coding RNAs specific to cancer. Additionally, users can supply custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast.

Last updated 4 months ago

biologicalquestiondifferentialexpressiongenomeannotationgenesetenrichmentgenetargetgenomeassemblygo

4.60 score 1 stars 6 scripts 219 downloads

brendaDb - The BRENDA Enzyme Database

R interface for importing and analyzing enzyme information from the BRENDA database.

Last updated 4 months ago

thirdpartyclientannotationdataimportbrendadatabaseenzymehacktoberfestcpp

4.60 score 2 stars 4 scripts 202 downloads

LRcell - Differential cell type change analysis using Logistic/linear Regression

The goal of LRcell is to identify specific sub-cell types that drive the changes observed in a bulk RNA-seq differential gene expression experiment. To achieve this, LRcell utilizes sets of cell marker genes acquired from single-cell RNA-sequencing (scRNA-seq) as indicators for various cell types in the tissue of interest. Next, for each cell type, using its marker genes as indicators, we apply Logistic Regression on the complete set of genes with differential expression p-values to calculate a cell-type significance p-value. Finally, these p-values are compared to predict which cell type(s) are likely to be responsible for the differential gene expression pattern observed in the bulk RNA-seq experiments. LRcell is inspired by the LRpath algorithm developed by Sartor et al., originally designed for pathway/gene set enrichment analysis. LRcell contains three major components: LRcell analysis, plot generation and marker gene selection. All modules in this package are written in R. This package also provides marker genes in the Prefrontal Cortex (pFC) human brain region, human PBMC and nine mouse brain regions (Frontal Cortex, Cerebellum, Globus Pallidus, Hippocampus, Entopeduncular, Posterior Cortex, Striatum, Substantia Nigra and Thalamus).

Last updated 4 months ago

singlecellgenesetenrichmentsequencingregressiongeneexpressiondifferentialexpressionenrichmentmarker-genes

4.48 score 3 stars 5 scripts 188 downloads

PDATK - Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Ductal Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP), are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al. (2019) [https://doi.org/10.1200/cci.18.00102], and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers; both use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK.
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers to novel patient data to inform clinical decision making.

Last updated 4 months ago

geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction

4.31 score 1 stars 17 scripts 372 downloads

protGear - Protein Micro Array Data Management and Interactive Visualization

A generic three-step pre-processing package for protein microarray data. This package contains different data pre-processing procedures to allow comparison of their performance. These steps are background correction, coefficient of variation (CV)-based filtering, batch correction and normalization.

Last updated 4 months ago

microarrayonechannelpreprocessingbiomedicalinformaticsproteomicsbatcheffectnormalizationbayesianclusteringregressionsystemsbiologyimmunooncologybackground-correctionmicroarray-datanormalisationproteomics-datashinyshinydashboard

4.30 score 1 stars 6 scripts 302 downloads
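
The coefficient of variation (CV) based filtering named in the protGear description above can be sketched as follows. This is an illustrative Python sketch, not protGear's R interface. It assumes each feature has replicate measurements, that CV is the sample standard deviation divided by the mean, and that features above a cutoff are dropped. The function names, the feature names and the 0.2 cutoff are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of the replicates."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed to compute a CV")
    centre = mean(replicates)
    if centre == 0:
        return float("inf")  # CV is undefined at zero mean; treat as unreliable
    return stdev(replicates) / abs(centre)


def filter_by_cv(features, max_cv=0.2):
    """Keep only features whose replicate CV is at or below `max_cv`."""
    return {
        name: values
        for name, values in features.items()
        if coefficient_of_variation(values) <= max_cv
    }


# Hypothetical replicate intensities for two features.
features = {
    "feature_1": [100, 102, 98],   # tight replicates, low CV
    "feature_2": [100, 10, 250],   # inconsistent replicates, high CV
}
kept = filter_by_cv(features)
```

Here `feature_1` has a CV of 0.02 and is kept, while `feature_2` has a CV above 1 and is removed. In practice the cutoff and the handling of discordant replicates are analysis choices.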