---
title: "D. Benchmarking with ProteinGym"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{D. Benchmarking with ProteinGym}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

Original version: 8 October, 2024

```{r setup, message = FALSE}
library(AlphaMissenseR)
library(ExperimentHub)
library(dplyr)
library(tidyr)
library(ggdist)
library(gghalves)
library(ggplot2)
```

# Introduction

Benchmarking is essential for assessing different variant effect prediction 
approaches, potentially aiding in decisions about which models are most suitable
for specific research questions or clinical applications. 

To evaluate the performance of AlphaMissense and its predictions, we integrate
with the [ProteinGym][pg_link] database, a comprehensive collection of
benchmarks aimed at comparing the ability of models that predict the effects of
protein mutations. It consists of an extensive dataset of deep mutational
scanning (DMS) assays covering approximately 2.7 million missense variants
across 217 experiments, as well as annotated human clinical variants for 2,525
proteins. DMS assays are highly valuable as they systematically test all
possible single mutations in a protein, recording their fitness effects. They
can reveal the impacts of both deleterious and beneficial mutations on fitness,
offering insights into protein structure and activity.

Furthermore, ProteinGym provides several standardized model evaluation 
metric scores ("AUC", "MCC", "NDCG", "Spearman", "Top_recall") for 62 models 
calculated across the 217 DMS assays, offering a consistent and extensive 
benchmarking framework across approaches.

This vignette demonstrates how to 1) compare AlphaMissense predictions to DMS 
scores on a per-protein basis, and 2) how to benchmark AlphaMissense against
other models aimed at predicting protein mutation effects.

[pg_link]: https://proteingym.org/
[Notin_link]: https://papers.nips.cc/paper_files/paper/2023/hash/cac723e5ff29f65e3fcbb0739ae91bee-Abstract-Datasets_and_Benchmarks.html
[AlphaMissense]: https://www.science.org/doi/10.1126/science.adg7492

# Access ProteinGym datasets through `ExperimentHub`.

The ProteinGym DMS assays can be accessed by querying `ExperimentHub`.

```{r access_dms}
eh <- ExperimentHub::ExperimentHub()
dms_data <- eh[['EH9555']]

head(names(dms_data))
```

Each element of the list is an individual DMS assay with the following
information for each column: the UniProt protein identifier, the DMS 
experiment assay identifier, the mutant at a given protein position, the mutated
protein sequence, the recorded DMS score, and a binary DMS score bin 
categorizing whether the mutation has high (1) or low fitness (0). For 
more details, reference the publication from Notin et al. [2023][Notin_link].

A supplementary table of AlphaMissense pathogencity scores for ~1.6 M 
substitutions matching those in the ProteinGym DMS assays is provided by Cheng 
et al. [2023][AlphaMissense], and can also be accessed through `ExperimentHub`.

```{r am_scores}
am_scores <- eh[['EH9554']]
am_scores |> head()
```

The `data.frame` shows the `DMS_id` matching a ProteinGym assay, the 
UniProt entry name of the protein evaluated, the mutation and position, and 
the aggregated AlphaMissense score for that mutation. For more details about 
the table, reference the [AlphaMissense][AlphaMissense] paper.

# Correlate DMS scores and AlphaMissense predictions

The DMS scores serve as an experimental measure to compare the accuracy of
AlphaMissense mutation effect predictions. For a given protein, we can plot the 
relationship between the two measures and report their Spearman correlation. 

Here, we demonstrate using the "NUD15_HUMAN" assay. First, we filter both 
datasets to the chosen assay.

```{r dms_am_NUD15}
NUD15_dms <- dms_data[["NUD15_HUMAN_Suiter_2020"]]

NUD15_am <- am_scores |> 
    filter(DMS_id == "NUD15_HUMAN_Suiter_2020")
```

Wrangle and merge the DMS and AlphaMissense tables together by the UniProt and 
mutant identifers.

```{r dms_am_wrangle}
NUD15_am <- NUD15_am |>
    mutate(Uniprot_ID = "Q9NV35")

merged_table <- 
    left_join(
        NUD15_am, NUD15_dms, 
        by = c("Uniprot_ID" = "UniProt_id", "variant_id" = "mutant"),
        relationship = "many-to-many"
    ) |>
    select(Uniprot_ID, variant_id, AlphaMissense, DMS_score) |> 
    na.omit()
```

Now, we plot the correlation.

```{r dms_am_plot, fig.width=5, fig.height=3.5}
correlation_plot <- 
    merged_table |> 
    ggplot(
        aes(y = .data$AlphaMissense, x = .data$DMS_score)
    ) +
    geom_bin2d(bins = 60) +
    scale_fill_continuous(type = "viridis") +
    xlab("DMS score") +
    ylab("AlphaMissense score") +
    theme_classic() +
    theme(
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16),
        axis.title.y = element_text(size = 16, vjust = 2),
        axis.title.x = element_text(size = 16, vjust = 0),
        legend.title = element_text(size = 16),
        legend.text = element_text(size = 16)
    )

correlation_plot
```

If mutating a residue resulted in reduced fitness in the DMS assay 
(low DMS score), we would expect it to be predicted as pathogenic 
(higher AlphaMissense score). Therefore, a stronger negative correlation 
represents a tighter relationship between the two measures. We can see this 
with the NUD15 protein.

We can also print the Spearman correlation.

```{r corrtest}
cor.test(
    merged_table$AlphaMissense, merged_table$DMS_score, 
    method="spearman", 
    exact = FALSE
)
```
The correlation is r = -0.67 and is statistically significant.


# Benchmark AlphaMissense with other variant effect prediction models

We can calculate the Spearman correlation across multiple DMS assays in 
which we have corresponding AlphaMissense scores, and use this metric to 
benchmark across multiple models aimed at predicting mutation effects. For this
analysis, we will load in the metric scores for 62 models calculated across the 
217 DMS assays from the ProteinGym database available through `ExperimentHub`.

```{r load_metrics}
metrics_scores <- eh[['EH9593']]
```

`metrics_scores` is a `list` object where each element is a `data.frame` 
corresponding to one of five model evaluation metrics ("AUC", "MCC", "NDCG", 
"Spearman", "Top_recall") available from ProteinGym. Briefly, these metrics were
calculated on the DMS assays for 62 models in a zero-shot setting. The 
[Protein Gym paper][Notin_link] provides more information about these metrics. 

To demonstrate, we will benchmarking using the Spearman correlation 
calculated on 10 DMS assays for 5 models. The following code subsets and 
merges the relevant datasets.

```{r prep_data}
chosen_assays <- c(
    "A0A1I9GEU1_NEIME_Kennouche_2019", 
    "A0A192B1T2_9HIV1_Haddox_2018", 
    "ADRB2_HUMAN_Jones_2020", 
    "BRCA1_HUMAN_Findlay_2018", 
    "CALM1_HUMAN_Weile_2017",
    "GAL4_YEAST_Kitzman_2015", 
    "Q59976_STRSQ_Romero_2015", 
    "UBC9_HUMAN_Weile_2017", 
    "TPK1_HUMAN_Weile_2017",
    "YAP1_HUMAN_Araya_2012")

am_subset <- 
    am_scores |> 
    filter(DMS_id %in% chosen_assays)

dms_subset <- 
    dms_data[names(dms_data) %in% chosen_assays]

dms_subset <- 
    bind_rows(dms_subset) |> 
    as.data.frame()

metric_subset <- 
    metrics_scores[["Spearman"]] |> 
    filter(DMS_ID %in% chosen_assays) |> 
    select(-c(Number_of_Mutants, Selection_Type, UniProt_ID, 
        MSA_Neff_L_category, Taxon))

merge_am_dms <-
    left_join(
        am_subset, dms_subset, 
        by = c("DMS_id" = "DMS_id", "variant_id" = "mutant"),
        relationship = "many-to-many"
    ) |>
    na.omit()
```

The Spearman scores are already provided for the 62 models in the ProteinGym 
metrics table, but we will need to calculate this for AlphaMissense. 

```{r calc_sp}
spearman_res <- 
    merge_am_dms |> 
    group_by(DMS_id) |> 
    summarize(
        AlphaMissense = cor(AlphaMissense, DMS_score, method = "spearman")
    ) |> 
    mutate(
        AlphaMissense = abs(AlphaMissense)
    )

DMS_IDs <- metric_subset |> pull(DMS_ID)

metric_subset <- 
    metric_subset |> 
    select(-DMS_ID) |> 
    abs() |> 
    mutate(DMS_id = DMS_IDs)

all_sp <- 
    left_join(
        metric_subset, spearman_res, 
        by = "DMS_id"
    )
```

`all_sp` is a table of Spearman correlation values (absolute values to handle 
difference in directionality of certain DMS assays) across the ProteinGym and 
AlphaMissense models for the 10 assays.

Prepare the data for plotting. 

```{r pivot_long}
res_long <- 
    all_sp |> 
    select(-DMS_id) |> 
    tidyr::pivot_longer(
        cols = everything(), 
        names_to = "model", 
        values_to = "score"
    ) |> 
    group_by(model) |> 
    mutate(
        model_mean = mean(score)
    ) |> 
    mutate(
        model = as.factor(model)
    ) |> 
    ungroup() 

unique_ordered_models <- unique(res_long$model[order(res_long$model_mean, 
                                                        decreasing = TRUE)])

res_long$model <- factor(res_long$model, 
                   levels = unique_ordered_models)

topmodels <- levels(res_long$model)[1:5]

top_spearmans <- 
    res_long |> 
    filter(model %in% topmodels) |> 
    droplevels()
```

Visualize with a raincloud plot.

```{r raincloud, fig.width=7, fig.height=4.5}
top_spearmans |> 
    ggplot(
        aes(x = model, y = score, fill = model, group = model)
    ) + 
    ggdist::stat_halfeye(
        adjust = .5, 
        width = .6, 
        .width = 0, 
        justification = -.2, 
        point_colour = NA
    ) + 
    geom_boxplot(
        width = .15, 
        outlier.shape = NA
    ) +
    gghalves::geom_half_point(
        side = "l", 
        range_scale = .4, 
        alpha = .2
    ) +
    coord_cartesian(clip = "off") +
    scale_fill_discrete(name = "Models") +
    theme_classic() +
    ylab("Spearman") +
    theme(
        axis.text.x = element_text(size = 11, angle = -12),
        axis.text.y = element_text(size = 16),
        axis.title.y = element_text(size = 16),
        axis.title.x = element_blank(),
        legend.title = element_text(size = 16),
        legend.text = element_text(size = 11)
    )
```

Based on the Spearman correlation for the 10 assays we chose, AlphaMissense 
performed the best. For a realistic and comprehensive benchmark, one would 
want to apply this framework across all 217 DMS assays available in ProteinGym.

# Session information {.unnumbered}
```{r}
sessionInfo()
```