- Fixed a bug in addConnections_TF_peak() that threw errors for some edge cases when percBackground_size = 0
- The package now requires patchwork version 1.2.0 and above, due to an incompatibility reported between ggplot2 3.5 and patchwork versions < 1.2.0. The error that may be thrown in plotDiagnosticPlots_peakGene is this: Error in Ops.data.frame(guide_loc, panel_loc) : ‘==’ only defined for equally-sized data frames
- Fixed a forcats warning for package versions 1.0.0 and above
- Changed the default of gene.types in filterGRNAndConnectGenes() to all and adjusted the documentation across functions that also have this argument. By default, there is now no standard filter for the gene type (before, genes were filtered to only protein-coding genes using protein_coding). This results in bigger eGRNs and seems more sensible for most use cases than automatically filtering genes.
- Fixed an issue in plotDiagnosticPlots_peakGene() (upper right plot on page 1). This was caused by not correctly shuffling the RNA counts data due to a code change that occurred in version 1.7.4. We apologize for the confusion this may have caused.
- addTFBS() function. We tested this database extensively and use it as the default TF database for our GRaNIE networks.
- Now requires tidyr 1.3.0 or above due to the use of separate_wider_delim in the code since a recent upgrade
- AnnotationHub::getAnnotationHub
- addConnections_peak_gene().
- JASPAR2024 package and TF motifs.
- plotPCA_all() now also stores all screeplot and PCA results in the object within GRN@stats$PCA
- addTFBS when using the JASPAR database
- AnnotationHub and caching annotation data in cases when the cache directory is corrupted or deleted
- filterData for cases when too few peaks survive the filtering. Before, the function threw an error when the number of peaks left was < 1000. Because of the error, the original GRN object was not updated in such a case, and one could accidentally continue with the normal workflow because the GRN object from before the filtering would be taken as input for subsequent functions.
  Now, only a warning is produced.
- New function filterConnectionsForPlotting() that can be used to include or exclude particular connections from the stored eGRN for visualization purposes only (!). Note that this filter only applies to visualization and enables a flexible system to visually explore particular features of the stored eGRN. This is particularly handy when the eGRN is large. For more details, see the help pages of the new function.
- visualizeGRN() now by default only visualizes connections that are marked as such (the result from filterConnectionsForPlotting()). That is, it excludes connections that the user excluded from plotting beforehand. This makes it possible to plot only part of the eGRN network and explore specific TF regulons, for example, which was not so easy to do before.
- GRaNIE via the new function addSNPData(). For more information, see the Package vignette.
- Replaced biomaRt for the full genome annotation retrieval in addData with a different approach that is more reliable, as we had more and more issues with biomaRt in the recent past. While using the old biomaRt approach is still an option, the default is now to use the AnnotationHub package from Bioconductor. This makes GRaNIE overall more stable and less reliant on biomaRt, which has strict timeouts and query size restrictions.
- Added a new correlation type that replaces the option (addRobustRegression) that was available as an experimental feature until now. It is implemented in the WGCNA package and called biweight midcorrelation, or bicor for short, a robust type of correlation based on medians that can be used as an alternative to Spearman correlation. Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks and is often used for weighted correlation network analysis. In addition, this new correlation type can now also be selected for the TF-peak correlations, which was not possible before.
  Lastly, the code has been cleaned and simplified, and all instances of addRobustRegression have been removed and replaced by a new third option bicor (in addition to Pearson and Spearman, as before) for the corMethod argument in multiple functions that support this feature. All vignettes have been updated accordingly.
- In addData, when a DESeq2 size factor normalization is selected by the user, it is now explicitly checked whether enough genes without any 0 values are available, as the size factor normalization is based on these genes. If this is not the case (the default and hard-coded limit is currently set to a minimum of 100), an error is thrown. This becomes particularly relevant for single-cell derived data with a high fraction of 0s. It prevents a normalization based on very few genes and replaces the error messages that DESeq2 would otherwise throw with clearer ones.
- biomaRt call, which did not work as originally intended in case of temporary connection failures. Now, calls to biomaRt are attempted up to 40 times to increase the chances of not suffering from connection issues. Also, the approach to deal with BiocParallel failures has been changed.
- We provide two new functions with this update:
- getGRNSummary() that summarizes a GRN object and returns a named list, which can be used to compare different GRN objects more easily with each other, for example.
- plotCorrelations() for scatter plots of the underlying data for either TF-peak, peak-gene or TF-gene pairs. This can be useful to visualize specific pairs, to investigate the underlying data, and to judge how reasonable the inferred connection is.
- methods vignette updates
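To illustrate the biweight midcorrelation (bicor) option of the corMethod argument mentioned above, here is a minimal base R sketch of the standard textbook definition. It is not the WGCNA implementation that the changelog refers to, which handles additional edge cases. The function name bicor_sketch is hypothetical, and the sketch assumes complete data (no NA values) and a non-zero median absolute deviation.

```r
# Minimal sketch of the biweight midcorrelation (hypothetical helper, not part of GRaNIE or WGCNA).
# Assumes numeric vectors of equal length without NA values and with non-zero MAD.
bicor_sketch <- function(x, y) {
  biweight_normalize <- function(v) {
    centered <- v - median(v)
    raw_mad  <- mad(v, constant = 1)        # raw MAD, without the normal-consistency factor
    u <- centered / (9 * raw_mad)           # scaled distance from the median
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0) # points far from the median get weight 0
    a <- centered * w
    a / sqrt(sum(a^2))                      # normalize to unit length
  }
  sum(biweight_normalize(x) * biweight_normalize(y))
}

# One extreme outlier affects the Pearson correlation much more than bicor
x <- 1:20
y <- x
y[20] <- 1000
pearson_value <- cor(x, y, method = "pearson")
bicor_value   <- bicor_sketch(x, y)
```

Because the outlier lies far from the median, it receives a weight of 0 and does not contribute to the bicor value, while it dominates the Pearson correlation.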
- TF.ID instead of TF.name column as unique TF identifier
- rn6/rn7 and dm6 for the rat and the Drosophila (fruit fly) genome, respectively
- GRaNIE. We now additionally offer a more user-friendly way by making it possible to directly use the JASPAR2022 database. You do not need any custom files anymore for this approach! See the Package vignette for more details.
- Fixed an error in addConnections_TF_peak (Column peak.GC.class doesn't exist.) that was caused by the recent GC modifications
- Added a new plot in plotDiagnosticPlots_TFPeaks (and indirectly in addConnections_TF_peak when plotDiagnosticPlots = TRUE) on page 1 that shows the total number of connections for real and background TF-peak links as calculated and stored in the GRN object, stratified by TF-peak FDR and correlation bin. This plot is similar to the one we show in the paper and helps to compare foreground and background.
- plotDiagnosticPlots_TFPeaks (and indirectly in addConnections_TF_peak when plotDiagnosticPlots = TRUE) when plotAsPDF = FALSE
- addConnections_TF_peak when using useGCCorrection = TRUE
- dplyr (1.1.0) changed its default behavior for the function if_else when NULL is involved, which caused an error. We changed the implementation to accommodate that and now avoid dplyr::if_else and use base R ifelse instead.
- Due to an issue with GenomeInfoDb::getChromInfoFromUCSC("hg38"), the minimum required version of GenomeInfoDb had to be increased to 1.34.8. If you have trouble installing at least this version, we recommend updating to the newest Bioconductor version 3.16 or (without warranties) using the following line to manually install the newest version directly from GitHub outside of Bioconductor (not recommended): BiocManager::install("Bioconductor/GenomeInfoDb")
- Fixed addData() so that peak IDs are stored with the same name in the object in case the user-provided peak IDs have the format chr:start:end as opposed to the required chr:start-end.
  filterData() otherwise incorrectly discarded all peaks because of the ID mismatch caused by the two different formats.
- Fixed a bug in filterGRNAndConnectGenes() that caused an error when 0 TF-peak connections were found beforehand
- GRaNIE for single-cell data! We plan to update it regularly with new information.
- New argument for addData(): geneAnnotation_customHost to specify a custom host, overriding the default and previously hard-coded hostname when retrieving gene annotation data via biomaRt.
- getGRNConnections() can now also include the various additional metadata for all type parameters and not only the default type all.filtered.
- biomaRt such as GL000194.1. Peaks from chromosomes with irretrievable lengths are now automatically discarded.
- plotDiagnosticPlots_peakGene() (which is also called indirectly from addConnections_peak_gene() when setting plotDiagnosticPlots = TRUE) now stores the plot data for the QC plots from the first page in the GRN object. It is stored in GRN@stats$peak_genes
- getGRNConnections() are now explained in detail in the R help, and we reference this from the Vignette and other places
- getGRNConnections(), which now does not return duplicate columns for particular cases anymore
- filterData() and addData()
- The loadExampleObject() function has been optimized and should now force the download of an example object when requesting it.
- tidyselect changes in version 1.2.0 to eliminate deprecation warnings
- The default gene types in addConnections_peak_gene() and plotDiagnosticPlots_peakGene() have been homogenized and changed to list(c("all"), c("protein_coding")). Before, the default was list(c("protein_coding", "lincRNA")), but we decided to now split this into two separate lists: one for all genes irrespective of the gene type and one for only protein-coding genes.
  As before, lincRNA or other gene types can of course still be selected.
- Fixed a bug in plotCommunitiesEnrichment() that was introduced by the tidyselect 1.2.0 changes
- tidyselect changes in version 1.2.0 to eliminate deprecation warnings
- The default URL of the example GRN object in loadExampleObject() had to be changed due to changes in the IT infrastructure. The new stable default URL is now https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds, in the same Git repository that provides GRaNIE outside of Bioconductor.
- tidyselect changes in version 1.2.0 to eliminate deprecation warnings
- New argument for addConnections_peak_gene(): TADs_mergeOverlapping. See the R help for more details.
- New argument for addConnections_peak_gene(): shuffleRNACounts. See the R help for more details.
- tidyselect changes in version 1.2.0 to eliminate deprecation warnings
- The topGO package is now a required package and not optional anymore. The reasoning is that the standard vignette should run through with the default arguments, and GO annotation is the default ontology, so topGO is needed for this. Although this package is still optional from a strict workflow point of view, we feel this is a better way that improves user friendliness, because users do not have to install another package in the middle of the workflow.
- In initializeGRN(), the objectMetadata argument is now checked for whether it contains only atomic elements, and an error is thrown if this is not the case. As this list is not supposed to contain real data, this check prevents the print(GRN) function from unnecessarily printing the whole content of the provided object metadata, which would defeat its original purpose.
- addTFBS() got two more arguments to make it more flexible. Now, it is possible to specify the file name of the translation table to be used via the argument translationTable, which makes it more flexible than the previously hard-coded name "translationTable.csv". In addition, the column separator for this file can now be specified via the argument translationTable_sep
- decoy, for example.
  If such elements are found, a warning is now thrown and they are ignored, as they are usually not wanted anyway.
- print function now give a more user-friendly warning / error message.
- ?addData for details.
- GRaNIE is now more readily applicable for larger analyses and single-cell analysis. However, we have only just started actively optimizing for it, so we cannot yet recommend applying our framework in a single-cell manner. Older GRN objects are automatically changed internally when executing the major functions upon the first invocation.
- generateStatsSummary() now doesn't alter the stored filtered connections in the object anymore. This makes its usage more intuitive, and it can be used anywhere in the workflow.
- biomaRt calls in the code. This saves time and makes the code less vulnerable to timeout issues caused by remote services
- plotPCA_all() now can only plot the normalized counts and not the raw counts anymore (except when no normalization is wanted)
- build_eGRN_graph() in the GRaNIE object is now reset whenever the function filterGRNAndConnectGenes() is successfully executed to make sure that enrichment functions etc. are not using an outdated graph structure.
- Suggests to lower the installation burden of the package. In addition, removed topGO from the Depends section (now in Suggests) and removed tidyverse altogether (before in Depends).
  Detailed explanations of when and how the packages listed under Suggests are needed can now be found in the new Package Details Vignette and are clearly given to the user when executing the respective functions
- getGRNConnections(), which now has more arguments allowing a more fine-tuned and rich retrieval of eGRN connections, features and feature metadata
- New function add_featureVariation() to quantify and interpret multiple sources of biological and technical variation for features (TFs, peaks, and genes) in a GRN object, see the R help for more information
- filterGRNAndConnectGenes() now doesn't include feature metadata columns, to save space in the result data frame that is created. The help has been updated to make clear that getGRNConnections() includes these features now.
- Moved GRN@data$TFs@translationTable to GRN@annotation@TFs. All exported functions automatically run a small helper function that adapts any GRN object to the new structure
- Renamed lncRNA from biomaRt to lincRNA to be compatible with older versions of GRaNIE
- New argument in filterGRNAndConnectGenes() (peak_gene.maxDistance) as well as more flexibility in how to adjust the peak-gene raw p-values for multiple testing (including the possibility to use IHW, which is experimental)
- plotDiagnosticPlots_TFPeaks() for plotting (this function was previously called only internally, but is now properly exported), in analogy to plotDiagnosticPlots_peakGene()
- first published package version
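
As an illustration of the peak ID formats mentioned in the addData() entry of this changelog (user-provided chr:start:end versus the required chr:start-end), the following base R sketch shows one way such IDs could be brought into the required format. The helper name normalize_peak_ids is hypothetical and this is not GRaNIE's internal code; it only assumes that start and end are plain integers.

```r
# Hypothetical helper (not part of GRaNIE): convert peak IDs from chr:start:end
# to chr:start-end and leave all other IDs unchanged.
normalize_peak_ids <- function(peak_ids) {
  # Only IDs that match chr:start:end exactly are rewritten
  sub("^([^:]+):([0-9]+):([0-9]+)$", "\\1:\\2-\\3", peak_ids)
}

peak_ids <- c("chr1:100:200", "chr2:300-400")
normalized_ids <- normalize_peak_ids(peak_ids)
print(normalized_ids)
```

IDs that are already in the chr:start-end format pass through unchanged, so the helper can be applied to mixed input.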