---
title: "Data manipulation"
author:
  - Luca De Sano
  - Daniele Ramazzotti
  - Giulio Caravagna
  - Alex Graudenxi
  - Marco Antoniotti
date: "`r format(Sys.time(), '%B %d, %Y')`"
graphics: yes
package: TRONCO
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Data manipulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{TRONCO,BiocStyle}
---

```{r, include=FALSE}
library(TRONCO)
data(aCML)
data(crc_maf)
data(crc_gistic)
data(crc_plain)
```

All examples in this section use the aCML dataset as reference.

## Modifying events and samples

TRONCO provides functions for renaming the events that were included in a dataset, or the type associated to a set of events (e.g., a `Mutation` could be renamed to a `Missense Mutation`).

```{r}
# Rename a gene, then rename an event type, and list the events of the new type
dataset = rename.gene(aCML, 'TET2', 'new name')
dataset = rename.type(dataset, 'Ins/Del', 'new type')
as.events(dataset, type = 'new type')
```

Both functions return a modified TRONCO object. More complex operations are also possible. For instance, two events with the same signature -- i.e., appearing in the same samples -- can be joined to a new event (see also Data Consolidation in Model Inference) with the same signature and a new name.

```{r}
dataset = join.events(aCML,
    'gene 4',
    'gene 88',
    new.event='test',
    new.type='banana',
    event.color='yellow')
```

In this case we also created a new event type, with its own color.

In a similar way, we can join all the events of two distinct types. In this case, if a gene *x* has signatures for both types of events, it gets a unique signature with an alteration present if it is either of the first *or* the second type.

```{r}
dataset = join.types(dataset, 'Nonsense point', 'Nonsense Ins/Del')
as.types(dataset)
```

TRONCO also provides functions for deleting specific events, samples or types.

```{r}
# Each call returns a new TRONCO object, so the deletions can be chained
dataset = delete.gene(aCML, gene = 'TET2')
dataset = delete.event(dataset, gene = 'ASXL1', type = 'Ins/Del')
dataset = delete.samples(dataset, samples = c('patient 5', 'patient 6'))
dataset = delete.type(dataset, type = 'Missense point')
view(dataset)
```

## Modifying patterns

TRONCO provides functions to edit patterns, much as for any other type of event. Patterns, however, have a special denotation and are supported only by the CAPRI algorithm -- see Model Reconstruction with CAPRI for a practical application.

## Subsetting a dataset

We often want to subset a dataset by selecting only some of its samples or some of its events. The function `samples.selection` returns a dataset with only the selected samples.

```{r}
dataset = samples.selection(aCML, samples = as.samples(aCML)[1:3])
view(dataset)
```

The function `events.selection`, instead, selects events according to a filter. With this function, we can subset data according to a frequency, and we can force the inclusion or exclusion of certain events by specifying their names. For instance, here we pick all events with a minimum frequency of 5%, force the exclusion of SETBP1 (all associated events), and force the inclusion of EZH1 and EZH2.

```{r}
dataset = events.selection(aCML,
    filter.freq = .05,
    filter.in.names = c('EZH1','EZH2'),
    filter.out.names = 'SETBP1')
```

```{r}
as.events(dataset)
```

An example visualization of the data before and after the selection process can be obtained by combining the `gtable` objects returned by `oncoprint`. Here we use `gtable = TRUE` to have a grob table returned, and `silent = TRUE` to prevent the calls from drawing on the device; the call to `grid.arrange` displays the captured `gtable` objects.

```{r fig.width=7, fig.height=5.5, fig.cap="Multiple outputs from oncoprint can be captured as a gtable and composed via grid.arrange (package gridExtra). In this case we show aCML data on top -- displayed after the as.alterations transformation -- versus a selected subdataset of events with a minimum frequency of 5%, forced exclusion of SETBP1 (all associated events), and inclusion of EZH1 and EZH2.", results='hide'}
library(gridExtra)
grid.arrange(
    oncoprint(as.alterations(aCML, new.color = 'brown3'),
        cellheight = 6, cellwidth = 4, gtable = TRUE,
        silent = TRUE, font.row = 6)$gtable,
    oncoprint(dataset, cellheight = 6, cellwidth = 4,
        gtable = TRUE, silent = TRUE, font.row = 6)$gtable,
    ncol = 1)
```
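
To make the selection rule described for `events.selection` concrete, the following base R sketch applies the same three steps (frequency filter, forced inclusion, forced exclusion) to a toy binary matrix. It is only an illustration under stated assumptions, not TRONCO's implementation: the matrix, the gene names and the helper `select_events` are hypothetical, and letting exclusion win over inclusion when a name appears in both lists is a choice made for this sketch.

```{r}
# Toy data (made up for illustration): rows are samples, columns are events,
# and 1 means the alteration was observed in that sample.
genotypes <- cbind(
    geneA = c(1, 1, 1, 0, 0),
    geneB = c(0, 0, 1, 0, 0),
    geneC = c(1, 1, 0, 1, 0),
    geneD = c(0, 0, 0, 1, 0))
rownames(genotypes) <- paste('patient', 1:5)

# Hypothetical helper, not part of TRONCO: keep the columns whose frequency
# reaches min.freq, then force some names in and some names out.
select_events <- function(m, min.freq = 0,
                          force.in = character(0),
                          force.out = character(0)) {
    freq <- colMeans(m)                           # fraction of samples with the event
    keep <- freq >= min.freq                      # frequency filter
    keep[colnames(m) %in% force.in] <- TRUE       # forced inclusion
    keep[colnames(m) %in% force.out] <- FALSE     # forced exclusion (applied last)
    m[, keep, drop = FALSE]
}

selected <- select_events(genotypes,
    min.freq = 0.5,
    force.in = 'geneB',
    force.out = 'geneC')
colnames(selected)
```

With these toy values, geneA and geneC pass the 50% threshold, geneB is added despite its low frequency, and geneC is then removed, so only geneA and geneB remain.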