---
title: "MSTree: Building Minimum Spanning Tree (MST) from chewBBACA pipeline output"
author:
- name: Abdullah El-Kurdi
  affiliation:
  - &id1 American University of Beirut, Beirut, Lebanon
  email: ak161@aub.edu.lb
vignette: >
  %\VignetteIndexEntry{MSTree}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF8}
output:
  BiocStyle::html_document
---

```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({library('MSTree')})
```

```{r, setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

A minimum spanning tree (MST) built from Multi-Locus Sequence Typing (MLST) or core genome MLST (cgMLST) profiles is a useful tool for assessing the relatedness of bacterial genomes, such as strains within an outbreak. The chewBBACA pipeline (https://chewbbaca.readthedocs.io/en/latest/) calls allelic profiles based on a gene-by-gene schema and, for cgMLST analysis, determines the set of loci that make up the core genome. The output of this analysis is a table with the isolates (strains) in rows and the core-genome loci in columns.

MSTree builds a minimum spanning tree from this output in two steps. First, a graph object is created by calculating the distances between the isolates. Second, the graph is used to draw a customized minimum spanning tree with one of two options chosen by the user: the plotNetwork function from the NetPathMiner Bioconductor package, or ggraph.

An important parameter when constructing the graph object is node clustering. Based on the Complex Type Distance values provided by https://www.cgmlst.org/ncs, a threshold can be set to connect (cluster) nodes whose distance is less than or equal to a given value. For instance, the complex type distance for _E. coli_ is 10. When the threshold is set to 10, isolates separated by a distance of 10 or less are connected to form a cluster.
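To make the distance and clustering ideas above concrete, here is a minimal base R sketch on a made-up table of allelic profiles. It is an illustration only and not MSTree's internal code. The `profiles` matrix and the `allelic_distance` helper are hypothetical. The sketch assumes that the distance between two isolates is the number of loci with different allele calls, that every locus has a called allele (missing calls are not handled), and that threshold clustering behaves like single-linkage clustering cut at the threshold.

```r
# Toy allelic profiles (made-up values): isolates in rows, loci in columns.
profiles <- matrix(
  c(1, 1, 1, 1,
    1, 1, 2, 1,
    1, 3, 2, 1,
    4, 3, 2, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("isolate", 1:4), paste0("locus", 1:4))
)

# Hypothetical helper: pairwise allelic distance, i.e. the number of loci
# at which two isolates carry different alleles.
allelic_distance <- function(profiles) {
  n <- nrow(profiles)
  d <- matrix(0L, n, n,
              dimnames = list(rownames(profiles), rownames(profiles)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(profiles[i, ] != profiles[j, ])
    }
  }
  d
}

dist_mat <- allelic_distance(profiles)
dist_mat

# Assumed clustering rule: isolates connected by a chain of pairwise
# distances <= threshold fall into the same cluster. This is equivalent
# to single-linkage clustering cut at the threshold.
threshold <- 1
clusters <- cutree(hclust(as.dist(dist_mat), method = "single"),
                   h = threshold)
clusters
```

With a threshold of 1, isolate1, isolate2 and isolate3 end up in one cluster because each is one allele away from a neighbour, while isolate4 stays on its own. Raising the threshold to 2 would merge all four isolates.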
# Overview

MSTree has two main functions:

1. makeGraphFromChewBBACA: constructs a graph object from the output of the chewBBACA pipeline.
2. PlotMST: takes the graph constructed by the previous function and generates a minimum spanning tree using the plotNetwork function from the NetPathMiner Bioconductor package or ggraph.

__________

# Example

```{r, eval=TRUE}
# Path to the example cgMLST table shipped with the package
cgmlst_output <- system.file("extdata", "cgMLST95.csv", package = "MSTree")

# Build the graph object, using 9 as the maximum allelic difference
my_graph <- makeGraphFromChewBBACA(cgmlst_output, max_allelic_difference = 9)

# Draw the minimum spanning tree from the graph
mst <- PlotMST(my_graph,
               show_clustering = TRUE,
               show_legend = FALSE,
               MST_edges_color = "#b97b29",
               node_color = "#3b17db",
               node_label_size = 3,
               title = "MST")
mst
```

# Session information

```{r}
sessionInfo()
```