Title: | Tools for working with diverse immune genes |
---|---|
Description: | MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows, such as the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA allele frequencies in worldwide human reference populations stored in the Allele Frequency Net Database. |
Authors: | Katharina Imkeller [cre, aut] |
Maintainer: | Katharina Imkeller <[email protected]> |
License: | GPL-3 |
Version: | 1.15.0 |
Built: | 2024-12-08 06:09:32 UTC |
Source: | https://github.com/bioc/immunotation |
Assemble a table of MHC protein complexes for a given organism.
assemble_protein_complex(organism)
organism |
Organism for which the lookup should be built (e.g. "human", "mouse", ...). The list of valid organisms can be found using the function get_valid_organisms(). |
A data frame with the MHC complexes annotated in MRO. Only completely annotated complexes are returned.
assemble_protein_complex(organism = "mouse")
build_allele_group
Build the allele group for a given HLA allele, e.g. A*01:01 -> A*01:01:01, A*01:01:02, A*01:01:03
build_allele_group(allele_selection)
allele_selection |
HLA allele for which the allele group should be built. |
list of alleles
build_allele_group("A*01:01")
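The relationship between an allele and its allele group follows from the colon-separated fields of HLA names. The following is a minimal base R sketch of that relationship only; the helper alleles_in_group and the candidate vector are hypothetical illustrations and are not part of immunotation, which builds the group from its own reference data.

```r
# Hypothetical helper, not part of immunotation: select the members of an
# allele group from a user-supplied vector of candidate allele names.
alleles_in_group <- function(group, candidates) {
  # A candidate belongs to the group if it equals the group name or
  # extends it by further colon-separated fields.
  is_member <- candidates == group |
    startsWith(candidates, paste0(group, ":"))
  candidates[is_member]
}

# Illustrative candidate names only
known <- c("A*01:01:01", "A*01:01:02", "A*01:02:01", "A*02:01:01")
alleles_in_group("A*01:01", known)
```

Appending ":" before matching keeps a group such as A*01:01 from also matching names that merely share the same leading characters.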
Decode a multiple allele code (MAC) into a list of HLA alleles. The National Marrow Donor Program (NMDP) uses [MAC](https://bioinformatics.bethematchclinical.org/hla-resources/allele-codes/allele-code-lists/) to facilitate the reporting and comparison of HLA alleles. MACs represent groups of HLA alleles and are useful when HLA typing is ambiguous and cannot narrow a list of candidate alleles down to a single allele.
decode_MAC(MAC)
MAC |
multiple allele code (e.g. "A*01:ATJNV") |
list of HLA alleles
MAC <- "A*01:ATJNV"
decode_MAC(MAC)
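To make the structure of a MAC string such as "A*01:ATJNV" explicit, the sketch below splits it into its parts using base R only. The helper split_mac is hypothetical and not part of immunotation; it does not look up which alleles the code stands for, which is what decode_MAC does.

```r
# Hypothetical helper, not part of immunotation: split a MAC string of
# the form "<locus>*<first field>:<letter code>" into its components.
split_mac <- function(mac) {
  parts <- strsplit(mac, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  list(
    locus  = sub("\\*.*$", "", parts[1]),  # text before the "*"
    family = sub("^.*\\*", "", parts[1]),  # first field after the "*"
    code   = parts[2]                      # the letter code itself
  )
}

mac_parts <- split_mac("A*01:ATJNV")
mac_parts$code
```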
Encode a list of HLA alleles into a multiple allele code (MAC). The National Marrow Donor Program (NMDP) uses [MAC](https://bioinformatics.bethematchclinical.org/hla-resources/allele-codes/allele-code-lists/) to facilitate the reporting and comparison of HLA alleles. MACs represent groups of HLA alleles and are useful when HLA typing is ambiguous and cannot narrow a list of candidate alleles down to a single allele.
encode_MAC(allele_list)
allele_list |
list of HLA alleles (e.g. c("A*01:01:01", "A*02:01:01", "A*03:01")) |
encoded MAC
allele_list <- c("A*01:01:01", "A*02:01:01", "A*03:01")
encode_MAC(allele_list)
Get the G groups for a list of HLA alleles. [G groups](http://hla.alleles.org/alleles/g_groups.html) are groups of HLA alleles that have identical nucleotide sequences across the exons encoding the peptide binding domains.
get_G_group(allele_list)
allele_list |
List of alleles. |
Named list of G-groups the input alleles belong to.
allele_list <- c("DQB1*02:02:01", "DQB1*06:09:01")
get_G_group(allele_list)
NetMHCpan tools for MHC-peptide binding prediction require HLA complex names in a specific format. get_mhcpan_input formats a list of HLA alleles into a list of NetMHC-formatted complexes.
get_mhcpan_input(allele_list, mhc_class)
allele_list |
list of HLA alleles (e.g. c("A*01:01:01","B*27:01")) |
mhc_class |
["MHC-I"|"MHC-II"] indicates which NetMHC tool you want to use. |
protein chain list as formatted for MHCpan input
allele_list <- c("A*01:01:01","B*27:01")
get_mhcpan_input(allele_list, mhc_class = "MHC-I")
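As an illustration of the kind of reformatting involved for class I alleles, the base R sketch below assumes the NetMHCpan naming convention of the form HLA-A02:01 (two fields, no asterisk). The helper to_netmhcpan_class1 is hypothetical and not part of immunotation; get_mhcpan_input may handle more cases and may produce different output, in particular for MHC-II complexes.

```r
# Hypothetical helper, not part of immunotation. Assumes class I names
# of the form "HLA-A02:01" as the target format.
to_netmhcpan_class1 <- function(alleles) {
  fields <- strsplit(alleles, ":", fixed = TRUE)
  # Keep at most the first two colon-separated fields, e.g. "A*01:01"
  two_field <- vapply(
    fields,
    function(x) paste(x[seq_len(min(2, length(x)))], collapse = ":"),
    character(1)
  )
  # Drop the asterisk and add the "HLA-" prefix
  paste0("HLA-", sub("*", "", two_field, fixed = TRUE))
}

formatted <- to_netmhcpan_class1(c("A*01:01:01", "B*27:01"))
formatted
```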
Get the P groups for a list of HLA alleles. [P groups](http://hla.alleles.org/alleles/p_groups.html) are groups of HLA alleles that have identical protein sequences in the peptide binding domains.
get_P_group(allele_list)
allele_list |
list of HLA alleles |
Named list of P-groups the input alleles belong to.
allele_list <- c("DQB1*02:02:01", "DQB1*06:09:01")
get_P_group(allele_list)
Get the serotypes of the MHC complexes encoded by a list of MHC alleles.
get_serotypes(allele_list, organism = "human", mhc_type)
allele_list |
List of alleles. |
organism |
Organism to be used for MRO lookup. If the organism does not match the given allele, an empty object is returned. |
mhc_type |
["MHC-I" or "MHC-II"] MHC class to use for MRO lookup. |
Named list of serotypes, which only contains complexes contained in the MRO. If no serotype is annotated for a given complex, the list element is NA.
allele_list <- c("A*01:01:01","B*27:01")
get_serotypes(allele_list, mhc_type = "MHC-I")
Get the list of organisms that are part of the MRO annotation.
get_valid_organisms()
list of organisms
get_valid_organisms()
human_protein_complex_table
human_protein_complex_table
human_protein_complex_table
An object of class data.frame with 12385 rows and 8 columns.
# The human protein complex table is available in the following
# exported variable
human_protein_complex_table
plot_allele_frequency
Generate a world map displaying the frequency of a given table of HLA alleles. Use the function query_allele_frequencies to generate a table with allele frequencies.
plot_allele_frequency(allele_frequency)
allele_frequency |
Table of allele frequencies, as returned by query_allele_frequencies. |
ggplot2 object displaying the allele frequencies on a world map.
# select frequency of given allele
sel_allele_freq <- query_allele_frequencies(hla_selection = "A*02:01",
    hla_sample_size_pattern = "bigger_than",
    hla_sample_size = 10000,
    standard = "g")
plot_allele_frequency(sel_allele_freq)
Query allele frequencies
query_allele_frequencies( hla_locus = NA, hla_selection = NA, hla_population = NA, hla_country = NA, hla_region = NA, hla_ethnic = NA, hla_sample_size_pattern = NA, hla_sample_size = NA, standard = "a" )
hla_locus |
HLA locus that will be used for filtering data. A, B, C, DPA1, DPB1, DQA1, DQB1, DRB1 |
hla_selection |
Allele that will be used for filtering data. e.g. A*01:01 |
hla_population |
Numeric identifier of the population that will be used for filtering. This identifier is defined by the Allele Frequency Net Database. |
hla_country |
Country of interest (e.g. Germany, France, ...). |
hla_region |
Geographic region of interest (e.g. Europe, North Africa, ...) |
hla_ethnic |
Ethnic origin of interest (e.g. Caucasoid, Siberian, ...) |
hla_sample_size_pattern |
Keyword used to define the filtering for a specific population size. e.g. "bigger_than", "equal", "less_than", "less_equal_than", "bigger_equal_than" |
hla_sample_size |
Integer number used to define the filtering for a specific population size, together with the hla_sample_size_pattern argument. |
standard |
Population standards, as defined in the package vignette. "g" - gold, "s" - silver, "a" - all |
data.frame object containing the result of the allele frequency query
# select frequencies of the A*02:01 allele,
# for gold standard population with more than 10,000 individuals
sel <- query_allele_frequencies(hla_selection = "A*02:01",
    hla_sample_size_pattern = "bigger_than",
    hla_sample_size = 10000,
    standard = "g")
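The hla_sample_size_pattern keywords pair with hla_sample_size to express a comparison. The base R sketch below shows the intended meaning of each keyword on a small toy table. The helper filter_by_sample_size, the column name sample_size, and the toy data are all hypothetical; the sketch does not reflect how the package itself applies the filter.

```r
# Hypothetical helper, not part of immunotation: apply a sample size
# filter keyword to a local table with a numeric sample_size column.
filter_by_sample_size <- function(tab, pattern, size) {
  # Map each documented keyword to the comparison operator it names
  compare <- switch(pattern,
    bigger_than       = `>`,
    bigger_equal_than = `>=`,
    equal             = `==`,
    less_than         = `<`,
    less_equal_than   = `<=`,
    stop("unknown pattern: ", pattern)
  )
  tab[compare(tab$sample_size, size), , drop = FALSE]
}

# Toy data for illustration only, not taken from any database
toy <- data.frame(
  population  = c("P1", "P2", "P3"),
  sample_size = c(500, 10000, 25000),
  stringsAsFactors = FALSE
)
filter_by_sample_size(toy, "bigger_than", 10000)
```

With these toy values, "bigger_than" keeps only P3, whereas "bigger_equal_than" keeps both P2 and P3.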
Query haplotype frequencies
query_haplotype_frequencies( hla_selection = NA, hla_population = NA, hla_country = NA, hla_region = NA, hla_ethnic = NA, hla_sample_size_pattern = NA, hla_sample_size = NA )
hla_selection |
Alleles that will be used to build the haplotype query. One entry per locus. If no entry for a given locus, the function will search for haplotypes that do not include specifications for this locus. If any allele for a given locus should be considered, the list entry should be "A*" or other locus in same format. |
hla_population |
Numeric identifier of the population that will be used for filtering. This identifier is defined by the Allele Frequency Net Database. |
hla_country |
Country of interest (e.g. Germany, France, ...). |
hla_region |
Geographic region of interest (e.g. Europe, North Africa, ...) |
hla_ethnic |
Ethnic origin of interest (e.g. Caucasoid, Siberian, ...) |
hla_sample_size_pattern |
Keyword used to define the filtering for a specific population size. e.g. "bigger_than", "equal", "less_than", "less_equal_than", "bigger_equal_than" |
hla_sample_size |
Integer number used to define the filtering for a specific population size, together with the hla_sample_size_pattern argument. |
data.frame object containing the result of the haplotype frequency query
# works only for one haplotype at a time
query_haplotype_frequencies(hla_selection = c("A*02:01", "B*", "C*"), hla_region = "Europe")
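The selection rules described for hla_selection (one entry per locus, "A*"-style entries accepting any allele at that locus, unlisted loci excluded) can be sketched in base R as follows. The helpers locus_of and matches_selection are hypothetical and not part of immunotation. The sketch assumes a haplotype is given as a character vector of alleles and that non-wildcard entries must match exactly, which may differ from how the database resolves the query.

```r
# Hypothetical helpers, not part of immunotation.
# Extract the locus name, i.e. the text before the "*".
locus_of <- function(x) sub("\\*.*$", "", x)

matches_selection <- function(haplotype, selection) {
  # The haplotype must specify exactly the loci named in the selection
  if (!setequal(locus_of(haplotype), locus_of(selection))) {
    return(FALSE)
  }
  all(vapply(selection, function(sel) {
    candidate <- haplotype[locus_of(haplotype) == locus_of(sel)]
    # An entry such as "B*" accepts any allele at that locus;
    # any other entry must match exactly (an assumption of this sketch).
    if (endsWith(sel, "*")) TRUE else any(candidate == sel)
  }, logical(1)))
}

selection <- c("A*02:01", "B*", "C*")
matches_selection(c("A*02:01", "B*07:02", "C*07:02"), selection)
```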
Query population metainformation
query_population_detail(population_ids)
population_ids |
List of numeric identifiers of the population that will be used for filtering. The identifier is defined by the Allele Frequency Net Database. |
data.frame object containing the result of the population detail query
population_detail <- query_population_detail(0001986)
Retrieve MHC chain lookup table
retrieve_chain_lookup_table(organism)
organism |
name of organism (e.g. "human") |
Table containing MHC chain information for the organism. It contains chain names, MHC restriction and protein sequence.
retrieve_chain_lookup_table("mouse")