Title: | R package for immunogenomics data handling and association analysis |
---|---|
Description: | MiDAS is a R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology. |
Authors: | Christian Hammer [aut], Maciej Migdał [aut, cre] |
Maintainer: | Maciej Migdał <[email protected]> |
License: | MIT + file LICENCE |
Version: | 1.15.0 |
Built: | 2024-10-30 08:52:46 UTC |
Source: | https://github.com/bioc/midasHLA |
aaVariationToCounts
convert amino acid variation data frame into
counts table.
aaVariationToCounts(aa_variation)
aaVariationToCounts(aa_variation)
aa_variation |
Amino acid variation data frame as returned by hlaToAAVariation. |
Amino acid counts data frame. First column holds samples ID's, further columns, corresponding to specific amino acid positions, give information on the number of their occurrences in each sample.
Given a set of p-values, returns p-values adjusted using one of several methods.
adjustPValues(p, method, n = length(p))
adjustPValues(p, method, n = length(p))
p |
numeric vector of p-values (possibly with
|
method |
correction method. Can be abbreviated. |
n |
number of comparisons, must be at least |
This function modifies stats::p.adjust
method such that for Bonferroni
correction it is possible to specify n
lower than length(p)
.
This feature is useful in cases when knowledge about the biology or
redundance of alleles reduces the need for correction.
See p.adjust
for more details.
A numeric vector of corrected p-values (of the same length as p, with names copied from p).
Accessed on 28.07.20
allele_frequencies
allele_frequencies
A data frame with 2096 rows and 3 variables:
allele number, character
reference population name, character
allele frequency in reference population, float
A dataset containing allele frequencies across 5697 alleles For details visit the search results page in the allelefrequencies.net database website.
analyzeAssociations
perform association analysis on a single variable
level using a statistical model of choice.
analyzeAssociations( object, variables, placeholder = "term", correction = "bonferroni", n_correction = NULL, exponentiate = FALSE )
analyzeAssociations( object, variables, placeholder = "term", correction = "bonferroni", n_correction = NULL, exponentiate = FALSE )
object |
An existing fit from a model function such as lm, glm and many others. |
variables |
Character vector specifying variables to use in association tests. |
placeholder |
String specifying term in |
correction |
String specifying multiple testing correction method. See details for further information. |
n_correction |
Integer specifying number of comparisons to consider during multiple testing correction calculations. For Bonferroni correction it is possible to specify a number lower than the number of comparisons being made. This is useful in cases when knowledge about the biology or redundance of alleles reduces the need for correction. For other methods it must be at least equal to the number of comparisons being made; only set this (to non-default) when you know what you are doing! |
exponentiate |
Logical flag indicating whether or not to exponentiate
the coefficient estimates. Internally this is passed to
|
correction
specifies p-value adjustment method to use, common choice
is Benjamini & Hochberg (1995) ("BH"
). Internally this is passed to
p.adjust.
Tibble containing combined results for all variables
. The
first column "term"
hold the names of variables
. Further
columns depends on the used model and are determined by associated
tidy
function. Generally they will include "estimate"
,
"std.error"
, "statistic"
, "p.value"
, "conf.low"
,
"conf.high"
, "p.adjusted"
.
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") # analyzeAssociations expects model data to be a data.frame midas_data <- as.data.frame(midas) # define base model object <- lm(disease ~ term, data = midas_data) # test for alleles associations analyzeAssociations(object = object, variables = c("B*14:02", "DRB1*11:01"))
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") # analyzeAssociations expects model data to be a data.frame midas_data <- as.data.frame(midas) # define base model object <- lm(disease ~ term, data = midas_data) # test for alleles associations analyzeAssociations(object = object, variables = c("B*14:02", "DRB1*11:01"))
analyzeConditionalAssociations
perform stepwise conditional testing
adding the previous top-associated variable as covariate, until there are no
more significant variables based on a self-defined threshold.
analyzeConditionalAssociations( object, variables, placeholder = "term", correction = "bonferroni", n_correction = NULL, th, th_adj = TRUE, keep = FALSE, rss_th = 1e-07, exponentiate = FALSE )
analyzeConditionalAssociations( object, variables, placeholder = "term", correction = "bonferroni", n_correction = NULL, th, th_adj = TRUE, keep = FALSE, rss_th = 1e-07, exponentiate = FALSE )
object |
An existing fit from a model function such as lm, glm and many others. |
variables |
Character vector specifying variables to use in association tests. |
placeholder |
String specifying term to substitute with value from
|
correction |
String specifying multiple testing correction method. See details for further information. |
n_correction |
Integer specifying number of comparisons to consider during multiple testing correction calculations. For Bonferroni correction it is possible to specify a number lower than the number of comparisons being made. This is useful in cases when knowledge about the biology or redundance of alleles reduces the need for correction. For other methods it must be at least equal to the number of comparisons being made; only set this (to non-default) when you know what you are doing! |
th |
Number specifying threshold for a variable to be considered significant. |
th_adj |
Logical flag indicating if adjusted p-value should be used as threshold criteria, otherwise unadjusted p-value is used. |
keep |
Logical flag indicating if the output should be a list of results resulting from each selection step. Default is to return only the final result. |
rss_th |
Number specifying residual sum of squares threshold at which
function should stop adding additional variables. As the residual sum of
squares approaches |
exponentiate |
Logical flag indicating whether or not to exponentiate
the coefficient estimates. Internally this is passed to
|
Tibble with stepwise conditional testing results or a list of tibbles,
see keep
argument. The first column "term"
hold the names of
variables
. Further columns depends on the used model and are
determined by associated tidy
function. Generally they will include
"estimate"
, "std.error"
, "statistic"
, "p.value"
,
"conf.low"
, "conf.high"
, "p.adjusted"
.
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") # analyzeConditionalAssociations expects model data to be a data.frame midas_data <- as.data.frame(midas) # define base model object <- lm(disease ~ term, data = midas_data) analyzeConditionalAssociations(object, variables = c("B*14:02", "DRB1*11:01"), th = 0.05)
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") # analyzeConditionalAssociations expects model data to be a data.frame midas_data <- as.data.frame(midas) # define base model object <- lm(disease ~ term, data = midas_data) analyzeConditionalAssociations(object, variables = c("B*14:02", "DRB1*11:01"), th = 0.05)
Helper function transforming experiment counts to selected
inheritance_model
.
applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") ) ## S3 method for class 'matrix' applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") ) ## S3 method for class 'SummarizedExperiment' applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") )
applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") ) ## S3 method for class 'matrix' applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") ) ## S3 method for class 'SummarizedExperiment' applyInheritanceModel( experiment, inheritance_model = c("dominant", "recessive", "additive", "overdominant") )
experiment |
Matrix or SummarizedExperiment object. |
inheritance_model |
String specifying inheritance model to use.
Available choices are |
Under "dominant"
model homozygotes and heterozygotes are coded as
1
. In "recessive"
model homozygotes are coded as 1
and
other as 0
. In "additive"
model homozygotes are coded as
2
and heterozygotes as 1
. In "overdominant"
homozygotes
(both 0
and 2
) are coded as 0
and heterozygotes as 1
.
experiment
converted to specified inheritance model.
Coerce MiDAS to Data Frame
## S3 method for class 'MiDAS' as.data.frame(x, ...)
## S3 method for class 'MiDAS' as.data.frame(x, ...)
x |
any R object. |
... |
additional arguments to be passed to or from methods. |
Data frame representation of MiDAS object. Consecutive columns hold values of variables from MiDAS's experiments and colData. The metadata associated with experiments is not preserved.
backquote
places backticks around elements of character vector
backquote(x)
backquote(x)
x |
Character vector. |
backquote
is useful when using HLA allele numbers in formulas, where
'*'
and ':'
characters have special meanings.
Character vector with its elements backticked.
characterMatches
checks if all elements of a character vector matches
values in choices.
characterMatches(x, choice)
characterMatches(x, choice)
x |
Character vector to test. |
choice |
Character vector with possible values for |
Logical indicating if x
's elements matches any of the values
in choice
.
checkAlleleFormat
test if the input character follows HLA nomenclature
specifications.
checkAlleleFormat(allele)
checkAlleleFormat(allele)
allele |
Character vector with HLA allele numbers. |
Correct HLA number should consist of HLA gene name followed by "*" and sets of digits separated with ":". Maximum number of sets of digits is 4 which is termed 8-digit resolution. Optionally HLA numbers can be supplemented with additional suffix indicating its expression status. See http://hla.alleles.org/nomenclature/naming.html for more details.
HLA alleles with identical sequences across exons encoding the peptide binding domains might be designated with G group allele numbers. Those numbers have additional G or GG suffix. See http://hla.alleles.org/alleles/g_groups.html for more details. They are interpreted as valid HLA alleles designations.
Logical vector specifying if allele
elements follows HLA
alleles naming conventions.
allele <- c("A*01:01", "A*01:02") checkAlleleFormat(allele)
allele <- c("A*01:01", "A*01:02") checkAlleleFormat(allele)
checkColDataFormat
asserts if the colData data frame has proper format.
checkColDataFormat(data_frame)
checkColDataFormat(data_frame)
data_frame |
Data frame containing colData data used to construct
|
Logical indicating if data_frame
is properly formatted.
Otherwise raise an error.
checkHlaCallsFormat
asserts if hla calls data frame have proper
format.
checkHlaCallsFormat(hla_calls)
checkHlaCallsFormat(hla_calls)
hla_calls |
HLA calls data frame, as returned by
|
Logical indicating if hla_calls
follows hla calls data frame
format. Otherwise raise an error.
checkKirCallsFormat
asserts if KIR counts data frame have proper
format.
checkKirCallsFormat(kir_calls)
checkKirCallsFormat(kir_calls)
kir_calls |
KIR calls data frame, as returned by
|
Logical indicating if kir_calls
follow KIR counts data frame
format. Otherwise raise an error.
checkKirGenesFormat
test if the input character follows KIR gene names
naming conventions.
checkKirGenesFormat(genes)
checkKirGenesFormat(genes)
genes |
Character vector with KIR gene names. |
KIR genes: "KIR3DL3", "KIR2DS2", "KIR2DL2", "KIR2DL3", "KIR2DP1", "KIR2DL1", "KIR3DP1", "KIR2DL1", "KIR3DP1", "KIR2DL4", "KIR3DL1", "KIR3DS1", "KIR2DL5", "KIR2DS3", "KIR2DS5", "KIR2DS4", "KIR2DS1", "KIR3DL2".
Logical vector specifying if genes
elements follow KIR genes
naming conventions.
checkKirGenesFormat(c("KIR3DL3", "KIR2DS2", "KIR2DL2"))
checkKirGenesFormat(c("KIR3DL3", "KIR2DS2", "KIR2DL2"))
checkStatisticalModel
asserts if object is an existing fit from a
model functions such as lm, glm and many others. Containing MiDAS object as
its data atribute.
checkStatisticalModel(object)
checkStatisticalModel(object)
object |
An existing fit from a model function such as lm, glm and many others. |
Logical indicating if object
is an existing fit from a
model functions such as lm, glm and many others. Containing MiDAS object as
its data attribute. Otherwise raise an error.
colnamesMatches
check if data frame's columns are named as specified
colnamesMatches(x, cols)
colnamesMatches(x, cols)
x |
Data frame to test. |
cols |
Ordered character vector to test against |
Logical indicating if x
's colnames equals choice
.
convertAlleleToVariable
converts input HLA allele numbers to additional
variables based on the supplied dictionary.
convertAlleleToVariable(allele, dictionary)
convertAlleleToVariable(allele, dictionary)
allele |
Character vector with HLA allele numbers. |
dictionary |
Path to file containing HLA allele dictionary or a data frame. |
dictionary
file should be a tsv format with header and two columns.
First column should hold allele numbers, second additional variables (eg.
expression level).
Type of the returned vector depends on the type of the additional variable.
Vector containing HLA allele numbers converted to additional
variables according to dictionary
.
dictionary <- system.file("extdata", "Match_allele_HLA_supertype.txt", package = "midasHLA") convertAlleleToVariable(c("A*01:01", "A*02:01"), dictionary = dictionary)
dictionary <- system.file("extdata", "Match_allele_HLA_supertype.txt", package = "midasHLA") convertAlleleToVariable(c("A*01:01", "A*02:01"), dictionary = dictionary)
countsToVariables
converts counts table to additional variables.
countsToVariables(counts, dictionary, na.value = NA, nacols.rm = TRUE)
countsToVariables(counts, dictionary, na.value = NA, nacols.rm = TRUE)
counts |
Data frame with counts, such as returned by
|
dictionary |
Path to file containing variables dictionary or data frame. See details for further explanations. |
na.value |
Vector of length one speciyfing value for variables with no
matching entry in |
nacols.rm |
Logical indicating if result columns that contain only
|
dictionary
file should be a tsv format with header and two columns.
First column should be named "Name"
and hold variable name, second
should be named "Expression"
and hold expression used to identify
variable (eg. "KIR2DL3 & ! KIR2DL2"
will match all samples with
KIR2DL3
and without KIR2DL2
). Optionally a data frame formatted
in the same manner can be passed instead.
Dictionaries shipped with the package:
kir_haplotypes
KIR genes to KIR haplotypes dictionary.
Data frame with variable number of columns. First column named
"ID"
corresponds to "ID"
column in counts
, further
columns hold indicators for converted variables. 1
and 0
code presence and absence of a variable respectively.
countsToVariables(MiDAS_tut_KIR, "kir_haplotypes")
countsToVariables(MiDAS_tut_KIR, "kir_haplotypes")
Function deletes 'ID' column from a df
, then transpose it and sets
the column names to values from deleted 'ID' column.
dfToExperimentMat(df)
dfToExperimentMat(df)
df |
Data frame |
Matrix representation of df
.
Integer vector giving Grantham distance values between pairs of amino acid residues.
dict_dist_grantham
dict_dist_grantham
Named integer vector of length 400.
distGrantham
calculates normalized Grantham distance between two
amino acid sequences. For details on calculations see
Grantham R. 1974..
distGrantham(aa1, aa2)
distGrantham(aa1, aa2)
aa1 |
Character vector giving amino acid sequence using one letter codings. Each element must correspond to single amino acid. |
aa2 |
Character vector giving amino acid sequence using one letter codings. Each element must correspond to single amino acid. |
Distance between amino acid sequences is normalized by length of compared sequences.
Lengths of aa1
and aa2
must be equal.
Numeric vector of normalized Grantham distance between aa1
and
aa2
.
Function transpose mat
and inserts column names of input mat
as
a 'ID' column.
experimentMatToDf(mat)
experimentMatToDf(mat)
mat |
Matrix |
Data frame representation of mat
.
Filter MiDAS object by frequency
filterByFrequency( object, experiment, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, carrier_frequency = FALSE )
filterByFrequency( object, experiment, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, carrier_frequency = FALSE )
object |
|
experiment |
String specifying experiment. |
lower_frequency_cutoff |
Number giving lower frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
upper_frequency_cutoff |
Number giving upper frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
Filtered MiDAS
object.
filterByFrequency(object = MiDAS_tut_object, experiment = "hla_alleles", lower_frequency_cutoff = 0.05, upper_frequency_cutoff = 0.95, carrier_frequency = TRUE)
filterByFrequency(object = MiDAS_tut_object, experiment = "hla_alleles", lower_frequency_cutoff = 0.05, upper_frequency_cutoff = 0.95, carrier_frequency = TRUE)
Filter MiDAS object by omnibus groups
filterByOmnibusGroups(object, experiment, groups)
filterByOmnibusGroups(object, experiment, groups)
object |
|
experiment |
String specifying experiment. |
groups |
Character vector specifying omnibus groups to select. See
|
Filtered MiDAS
object.
filterByOmnibusGroups(object = MiDAS_tut_object, experiment = "hla_aa", groups = c("A_3", "A_6", "C_1"))
filterByOmnibusGroups(object = MiDAS_tut_object, experiment = "hla_aa", groups = c("A_3", "A_6", "C_1"))
Filter MiDAS object by features
filterByVariables(object, experiment, variables)
filterByVariables(object, experiment, variables)
object |
|
experiment |
String specifying experiment. |
variables |
Character vector specifying features to select. |
Filtered MiDAS
object.
filterByVariables(object = MiDAS_tut_object, experiment = "hla_alleles", variables = c("A*25:01", "A*26:01", "B*07:02"))
filterByVariables(object = MiDAS_tut_object, experiment = "hla_alleles", variables = c("A*25:01", "A*26:01", "B*07:02"))
Helper function for experiments filtering
filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL ) ## S3 method for class 'matrix' filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL ) ## S3 method for class 'SummarizedExperiment' filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL )
filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL ) ## S3 method for class 'matrix' filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL ) ## S3 method for class 'SummarizedExperiment' filterExperimentByFrequency( experiment, carrier_frequency = FALSE, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL )
experiment |
Matrix or SummarizedExperiment object. |
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
lower_frequency_cutoff |
Positive number or |
upper_frequency_cutoff |
Positive number or |
Filtered experiment matrix.
Helper function for experiments filtering
filterExperimentByVariables(experiment, variables) ## S3 method for class 'matrix' filterExperimentByVariables(experiment, variables) ## S3 method for class 'SummarizedExperiment' filterExperimentByVariables(experiment, variables)
filterExperimentByVariables(experiment, variables) ## S3 method for class 'matrix' filterExperimentByVariables(experiment, variables) ## S3 method for class 'SummarizedExperiment' filterExperimentByVariables(experiment, variables)
experiment |
Matrix or SummarizedExperiment object. |
variables |
Character vector specifying features to choose. |
Filtered experiment
object.
Filter two level list by its secondary elements and remove empty items
filterListByElements(list, elements)
filterListByElements(list, elements)
list |
A list. |
elements |
Character vector of elements to keep. |
List filtered according to elements
argument.
formatResults
format statistical analysis results table to html or
latex format.
formatResults( results, filter_by = "p.value <= 0.05", arrange_by = "p.value", select_cols = c("term", "estimate", "std.error", "p.value", "p.adjusted"), format = c("html", "latex"), header = NULL, scroll_box_height = "400px" )
formatResults( results, filter_by = "p.value <= 0.05", arrange_by = "p.value", select_cols = c("term", "estimate", "std.error", "p.value", "p.adjusted"), format = c("html", "latex"), header = NULL, scroll_box_height = "400px" )
results |
Tibble as returned by |
filter_by |
Character vector specifying conditional expression used to
filter |
arrange_by |
Character vector specifying variable names to use for
sorting. Equivalent to |
select_cols |
Character vector specifying variable names that should be included in the output table. Can be also used to rename selected variables, see examples. |
format |
String |
header |
String specifying header for result table. If |
scroll_box_height |
A character string indicating the height of the table. |
Character vector of formatted table source code.
## Not run: midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") object <- lm(disease ~ term, data = midas) res <- runMiDAS(object, experiment = "hla_alleles", inheritance_model = "dominant") formatResults(res, filter_by = c("p.value <= 0.05", "estimate > 0"), arrange_by = c("p.value * estimate"), select_cols = c("allele", "p-value" = "p.value"), format = "html", header = "HLA allelic associations") ## End(Not run)
## Not run: midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") object <- lm(disease ~ term, data = midas) res <- runMiDAS(object, experiment = "hla_alleles", inheritance_model = "dominant") formatResults(res, filter_by = c("p.value <= 0.05", "estimate > 0"), arrange_by = c("p.value * estimate"), select_cols = c("allele", "p-value" = "p.value"), format = "html", header = "HLA allelic associations") ## End(Not run)
getAAFrequencies
calculates amino acid frequencies in amino acid
data frame.
getAAFrequencies(aa_variation)
getAAFrequencies(aa_variation)
aa_variation |
Amino acid variation data frame as returned by hlaToAAVariation. |
Both gene copies are taken into consideration for frequencies calculation,
frequency = n / (2 * j)
where n
is the number of amino acid
occurrences and j
is the number of samples in aa_variation
.
Data frame with each row holding specific amino acid position, it's count and frequency.
aa_variation <- hlaToAAVariation(MiDAS_tut_HLA) getAAFrequencies(aa_variation)
aa_variation <- hlaToAAVariation(MiDAS_tut_HLA) getAAFrequencies(aa_variation)
getAlleleResolution
returns the resolution of input HLA allele
numbers.
getAlleleResolution(allele)
getAlleleResolution(allele)
allele |
Character vector with HLA allele numbers. |
HLA allele resolution can take the following values: 2, 4, 6, 8. See http://hla.alleles.org/nomenclature/naming.html for more details.
NA
values are accepted and returned as NA
.
Integer vector specifying allele resolutions.
allele <- c("A*01:01", "A*01:02") getAlleleResolution(allele)
allele <- c("A*01:01", "A*01:02") getAlleleResolution(allele)
List HLA alleles and amino acid residues at a given position.
getAllelesForAA(object, aa_pos)
getAllelesForAA(object, aa_pos)
object |
|
aa_pos |
String specifying gene and amino acid position, example
|
Data frame containing HLA alleles, their corresponding amino acid residues and frequencies at requested position.
getAllelesForAA(object = MiDAS_tut_object, aa_pos = "A_9")
getAllelesForAA(object = MiDAS_tut_object, aa_pos = "A_9")
getExperimentFrequencies
calculate features frequencies.
getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL ) ## S3 method for class 'matrix' getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL ) ## S3 method for class 'SummarizedExperiment' getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL )
getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL ) ## S3 method for class 'matrix' getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL ) ## S3 method for class 'SummarizedExperiment' getExperimentFrequencies( experiment, pop_mul = NULL, carrier_frequency = FALSE, ref = NULL )
experiment |
Matrix or SummarizedExperiment object. |
pop_mul |
Number by which number of samples should be multiplied to get the population size. |
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
ref |
Wide format data frame with first column named "var" holding
features matching |
Data frame with each row holding specific variable, it's count and frequency.
getExperimentPopulationMultiplicator
extracts population multiplicator
from experiment's metadata.
getExperimentPopulationMultiplicator(experiment) ## S3 method for class 'matrix' getExperimentPopulationMultiplicator(experiment) ## S3 method for class 'SummarizedExperiment' getExperimentPopulationMultiplicator(experiment)
getExperimentPopulationMultiplicator(experiment) ## S3 method for class 'matrix' getExperimentPopulationMultiplicator(experiment) ## S3 method for class 'SummarizedExperiment' getExperimentPopulationMultiplicator(experiment)
experiment |
Matrix or SummarizedExperiment object. |
Experiment's population multiplicator number.
Get available experiments in MiDAS object.
getExperiments(object)
getExperiments(object)
object |
|
Character vector giving names of experiments in object
.
getExperiments(object = MiDAS_tut_object)
getExperiments(object = MiDAS_tut_object)
Calculate features frequencies for a given experiment in MiDAS object.
getFrequencies( object, experiment, carrier_frequency = FALSE, compare = FALSE, ref_pop = list(hla_alleles = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), kir_genes = c("USA California African American KIR", "USA California Asian American KIR", "USA California Caucasians KIR", "USA California Hispanic KIR")), ref = list(hla_alleles = allele_frequencies, kir_genes = kir_frequencies) )
getFrequencies( object, experiment, carrier_frequency = FALSE, compare = FALSE, ref_pop = list(hla_alleles = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), kir_genes = c("USA California African American KIR", "USA California Asian American KIR", "USA California Caucasians KIR", "USA California Hispanic KIR")), ref = list(hla_alleles = allele_frequencies, kir_genes = kir_frequencies) )
object |
|
experiment |
Matrix or SummarizedExperiment object. |
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
compare |
Logical flag indicating if |
ref_pop |
Named list of character vectors giving names of reference
populations in |
ref |
Named list of reference frequencies data frames. Each element
should give reference for a specific experiment. See
|
Data frame with features from selected experiment and their
corresponding frequencies. Column "term"
hold features names,
"Counts"
hold number of feature occurrences, "Freq"
hold
feature frequencies. If argument compare
is set to TRUE
,
further columns will hold frequencies in reference populations.
# using default reference populations getFrequencies(object = MiDAS_tut_object, experiment = "hla_alleles", compare = TRUE) # using customized set of reference populations getFrequencies( object = MiDAS_tut_object, experiment = "hla_alleles", compare = TRUE, ref_pop = list( hla_alleles = c("USA NMDP Chinese", "USA NMDP European Caucasian") ), ref = list(hla_alleles = allele_frequencies) )
# using default reference populations getFrequencies(object = MiDAS_tut_object, experiment = "hla_alleles", compare = TRUE) # using customized set of reference populations getFrequencies( object = MiDAS_tut_object, experiment = "hla_alleles", compare = TRUE, ref_pop = list( hla_alleles = c("USA NMDP Chinese", "USA NMDP European Caucasian") ), ref = list(hla_alleles = allele_frequencies) )
Helper function for filtering frequency data frame
getFrequencyMask( df, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL )
getFrequencyMask( df, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL )
df |
Data frame as returned by |
lower_frequency_cutoff |
Positive number or |
upper_frequency_cutoff |
Positive number or |
Character vector containing names of variables after filtration.
Get HLA calls from MiDAS object.
getHlaCalls(object)
getHlaCalls(object)
object |
|
HLA calls data frame.
getHlaCalls(object = MiDAS_tut_object)
getHlaCalls(object = MiDAS_tut_object)
getHlaCallsGenes
get's genes found in HLA calls.
getHlaCallsGenes(hla_calls)
getHlaCallsGenes(hla_calls)
hla_calls |
HLA calls data frame, as returned by
|
Character vector of genes in hla_calls
.
getHlaFrequencies
calculates allele frequencies in HLA calls data
frame.
getHlaFrequencies( hla_calls, carrier_frequency = FALSE, compare = FALSE, ref_pop = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), ref = allele_frequencies )
getHlaFrequencies( hla_calls, carrier_frequency = FALSE, compare = FALSE, ref_pop = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), ref = allele_frequencies )
hla_calls |
HLA calls data frame, as returned by
|
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
compare |
Logical flag indicating if |
ref_pop |
Character vector giving names of reference populations in
|
ref |
Data frame giving reference allele frequencies. See
|
Both gene copies are taken into consideration for frequencies calculation,
frequency = n / (2 * j)
where n
is the number of allele
occurrences and j
is the number of samples in hla_calls
.
Data frame with each row holding HLA allele, it's count and frequency.
getHlaFrequencies(MiDAS_tut_HLA)
getHlaFrequencies(MiDAS_tut_HLA)
getHlaKirInteractions
calculate presence-absence matrix of HLA - KIR
interactions.
getHlaKirInteractions( hla_calls, kir_calls, interactions_dict = system.file("extdata", "Match_counts_hla_kir_interactions.txt", package = "midasHLA") )
getHlaKirInteractions( hla_calls, kir_calls, interactions_dict = system.file("extdata", "Match_counts_hla_kir_interactions.txt", package = "midasHLA") )
hla_calls |
HLA calls data frame, as returned by
|
kir_calls |
KIR calls data frame, as returned by
|
interactions_dict |
Path to HLA - KIR interactions dictionary. |
hla_calls
are first reduced to all possible resolutions and converted
to additional variables, such as G groups, using dictionaries shipped with
the package.
interactions_dict
file should be a tsv format with header and two
columns. First column should be named "Name"
and hold interactions
names, second should be named "Expression"
and hold expression used to
identify interaction (eg. "C2 & KIR2DL1"
will match all samples
with C2
and KIR2DL1
). The package is shipped with an interactions
file based on Pende et al., 2019.
Data frame with variable number of columns. First column named
"ID"
corresponds to "ID"
column in counts
, further
columns hold indicators for HLA - KIR interactions. 1
and 0
code presence and absence of a variable respectively.
getHlaKirInteractions( hla_calls = MiDAS_tut_HLA, kir_calls = MiDAS_tut_KIR, interactions_dict = system.file( "extdata", "Match_counts_hla_kir_interactions.txt", package = "midasHLA") )
getHlaKirInteractions( hla_calls = MiDAS_tut_HLA, kir_calls = MiDAS_tut_KIR, interactions_dict = system.file( "extdata", "Match_counts_hla_kir_interactions.txt", package = "midasHLA") )
Get KIR calls from MiDAS object.
getKirCalls(object)
getKirCalls(object)
object |
|
KIR calls data frame.
getKirCalls(object = MiDAS_tut_object)
getKirCalls(object = MiDAS_tut_object)
getKIRFrequencies
calculates KIR genes frequencies in KIR calls data
frame.
getKIRFrequencies(kir_calls)
getKIRFrequencies(kir_calls)
kir_calls |
KIR calls data frame, as returned by
|
Data frame with each row holding KIR gene, it's count and frequency.
getKIRFrequencies(MiDAS_tut_KIR)
getKIRFrequencies(MiDAS_tut_KIR)
getObjectDetails
extracts some of the statistical model object
attributes that are needed for runMiDAS
internal calculations.
getObjectDetails(object)
getObjectDetails(object)
object |
An existing fit from a model function such as lm, glm and many others. |
List with following elements:
Object's call
Character containing names of variables in object formula
MiDAS object associated with model
Get omnibus groups from MiDAS object.
getOmnibusGroups(object, experiment)
getOmnibusGroups(object, experiment)
object |
|
experiment |
String specifying experiment. |
For some experiments features can be naturally divided into groups
(here called omnibus groups). For example, in "hla_aa"
experiment
features can be grouped by amino acid position ("B_46_E"
,
"B_46_A"
) can be grouped into B_46
group). Such groups can be
then used to perform omnibus test, see runMiDAS
for more
details.
List of omnibus groups for a given experiment.
getOmnibusGroups(object = MiDAS_tut_object, experiment = "hla_aa")
getOmnibusGroups(object = MiDAS_tut_object, experiment = "hla_aa")
Get placeholder name from MiDAS object.
getPlaceholder(object)
getPlaceholder(object)
object |
|
String giving name of placeholder.
getPlaceholder(object = MiDAS_tut_object)
getPlaceholder(object = MiDAS_tut_object)
Helper transforming reference frequencies
getReferenceFrequencies(ref, pop, carrier_frequency = FALSE)
getReferenceFrequencies(ref, pop, carrier_frequency = FALSE)
ref |
Long format data frame with three columns "var", "population", "frequency". |
pop |
Character giving names of populations to include |
carrier_frequency |
Logical indicating if carrier frequency should be returned instead of frequency. Carrier frequency is calculated based on Hardy-Weinberg equilibrium model. |
Wide format data frame with population frequencies as columns.
getVariableAAPos
finds variable amino acid positions in protein
sequence alignment.
getVariableAAPos(alignment, varchar = "[A-Z]")
getVariableAAPos(alignment, varchar = "[A-Z]")
alignment |
Matrix containing amino acid level alignment, as returned by
|
varchar |
Regex matching characters that should be considered when looking for variable amino acid positions. See details for further explanations. |
The variable amino acid positions in the alignment are those at which
different amino acids can be found. As the alignments can also contain indels
and unknown characters, the user choice might be to consider those positions
as variable or not. This can be achieved by passing appropriate regular
expression in varchar
. Eg. when varchar = "[A-Z]"
occurence of
deletion/insertion (".") will not be treated as variability. In order to
detect this kind of variability varchar = "[A-Z\\.]"
should be used.
Integer vector specifying which alignment columns are variable.
alignment <- readHlaAlignments(gene = "TAP1") getVariableAAPos(alignment)
alignment <- readHlaAlignments(gene = "TAP1") getVariableAAPos(alignment)
hasTidyMethod
check if there is a tidy method available for a given class.
hasTidyMethod(class)
hasTidyMethod(class)
class |
String giving object class. |
Logical indicating if there is a tidy method for a given class.
Helper function returning alignment for Grantham distance calculations
hlaAlignmentGrantham(gene, aa_sel = 2:182)
hlaAlignmentGrantham(gene, aa_sel = 2:182)
gene |
Character vector specifying HLA gene. |
aa_sel |
Numeric vector specifying amino acids that should be extracted. |
HLA alignment processed for grantham distance calculation. Processing includes extracting specific amino acids, masking indels, gaps and stop codons.
hlaCallsGranthamDistance
calculate Grantham distance between two HLA
alleles of a given, using original formula by
Grantham R. 1974..
hlaCallsGranthamDistance( hla_calls, genes = c("A", "B", "C"), aa_selection = "binding_groove" )
hlaCallsGranthamDistance( hla_calls, genes = c("A", "B", "C"), aa_selection = "binding_groove" )
hla_calls |
HLA calls data frame, as returned by
|
genes |
Character vector specifying genes for which allelic distance should be calculated. |
aa_selection |
String specifying variable region in peptide binding
groove which should be considered for Grantham distance calculation. Valid
choices includes: |
Grantham distance is calculated only for class I HLA alleles. First
exons forming the variable region in the peptide binding groove are selected.
Here we provide option to choose either "binding_groove"
- exon 2 and
3 (positions 1-182 in IMGT/HLA alignments, however here we take 2-182 as many
1st positions are missing), "B_pocket"
- residues 7, 9, 24, 25, 34,
45, 63, 66, 67, 70, 99 and "F_pocket"
- residues 77, 80, 81, 84, 95,
116, 123, 143, 146, 147. Then all the alleles containing gaps, stop codons or
indels are discarded. Finally distance is calculated for each pair.
See Robinson J. 2017. for more details on the choice of exons 2 and 3.
Data frame of normalized Grantham distances between pairs of alleles
for each specified HLA gene. First column (ID
) is the same as in
hla_calls
, further columns are named as given by genes
.
hlaCallsGranthamDistance(MiDAS_tut_HLA, genes = "A")
hlaCallsGranthamDistance(MiDAS_tut_HLA, genes = "A")
hlaCallsToCounts
converts HLA calls data frame into a counts table.
hlaCallsToCounts(hla_calls, check_hla_format = TRUE)
hlaCallsToCounts(hla_calls, check_hla_format = TRUE)
hla_calls |
HLA calls data frame, as returned by
|
check_hla_format |
Logical indicating if |
HLA allele counts data frame. First column holds samples ID's, further columns, corresponding to specific alleles, give information on the number of their occurrences in each sample.
hlaToAAVariation
convert HLA calls data frame to a matrix of variable
amino acid positions.
hlaToAAVariation(hla_calls, indels = TRUE, unkchar = FALSE, as_df = TRUE)
hlaToAAVariation(hla_calls, indels = TRUE, unkchar = FALSE, as_df = TRUE)
hla_calls |
HLA calls data frame, as returned by
|
indels |
Logical indicating whether indels should be considered when checking variability. |
unkchar |
Logical indicating whether unknown characters in the alignment should be considered when checking variability. |
as_df |
Logical indicating if data frame should be returned. Otherwise a matrix is returned. |
Variable amino acid positions are found by comparing elements of the
alignment column wise. Some of the values in alignment can be treated
specially using indels
and unkchar
arguments. Function
processes alignments for all HLA genes found in hla_calls
.
Variable amino acid position uses protein alignments from EBI database.
Matrix or data frame containing variable amino acid positions.
Rownames corresponds to ID column in hla_calls
, and colnames to
alignment positions. If no variation is found one column matrix filled with
NA
's is returned.
hlaToAAVariation(MiDAS_tut_HLA)
hlaToAAVariation(MiDAS_tut_HLA)
hlaToVariable
converts HLA calls data frame to additional variables.
hlaToVariable( hla_calls, dictionary, reduce = TRUE, na.value = 0, nacols.rm = TRUE )
hlaToVariable( hla_calls, dictionary, reduce = TRUE, na.value = 0, nacols.rm = TRUE )
hla_calls |
HLA calls data frame, as returned by
|
dictionary |
Path to file containing HLA allele dictionary or a data frame. |
reduce |
Logical indicating if function should try to reduce allele resolution when no matching entry in the dictionary is found. See details. |
na.value |
Vector of length one speciyfing value for alleles with
no matching entry in |
nacols.rm |
Logical indicating if result columns that contain only
|
dictionary
file should be a tsv format with header and two columns.
First column should hold allele numbers and second corresponding additional
variables. Optionally a data frame formatted in the same manner can be passed
instead.
dictionary
can be also used to access dictionaries shipped with the
package. They can be referred to by using one of the following strings:
"allele_HLA_Bw"
Translates HLA-B alleles together with A*23, A*24 and A*32 into Bw4 and Bw6 allele groups. In some cases HLA alleles containing Bw4 epitope, on nucleotide level actually carries a premature stop codon. Meaning that although on nucleotide level the allele would encode a Bw4 epitope it's not really there and it is assigned to Bw6 group. However in 4-digit resolution these alleles can not be distinguished from other Bw4 groups. Since alleles with premature stop codons are rare, Bw4 group is assigned.
"allele_HLA-B_only_Bw"
Translates HLA-B alleles (without A*23, A*24 and A*32) into Bw4 and Bw6 allele groups.
"allele_HLA-C_C1-2"
Translates HLA-C alleles into C1 and C2 allele groups.
"allele_HLA_supertype"
Translates HLA-A and HLA-B alleles into supertypes, a classification that group HLA alleles based on peptide binding specificities.
"allele_HLA_Ggroup"
Translates HLA alleles into G groups, which defines amino acid identity only in the exons relevant for peptide binding. Note that alleles DRB1*01:01:01 and DRB1*01:16 match more than one G group, here this ambiguity was removed by deleting matching with DRB5*01:01:01G group.
reduce
control if conversion should happen in a greedy way, such that
if some HLA number cannot be converted, it's resolution is reduced by 2 and
another attempt is taken. This process stops when alleles cannot be further
reduced or all have been successfully converted.
Data frame with variable number of columns. First column named
"ID"
corresponds to "ID"
column in hla_calls
, further
columns holds converted HLA variables.
hlaToVariable(MiDAS_tut_HLA, dictionary = "allele_HLA_supertype")
hlaToVariable(MiDAS_tut_HLA, dictionary = "allele_HLA_supertype")
Test experiment features for Hardy Weinberg equilibrium.
HWETest( object, experiment = c("hla_alleles", "hla_aa", "hla_g_groups", "hla_supertypes", "hla_NK_ligands"), HWE_group = NULL, HWE_cutoff = NULL, as.MiDAS = FALSE )
HWETest( object, experiment = c("hla_alleles", "hla_aa", "hla_g_groups", "hla_supertypes", "hla_NK_ligands"), HWE_group = NULL, HWE_cutoff = NULL, as.MiDAS = FALSE )
object |
|
experiment |
String specifying experiment to test. Valid values
includes |
HWE_group |
Expression defining samples grouping to test for Hardy Weinberg equilibrium. By default samples are not grouped. |
HWE_cutoff |
Number specifying p-value threshold. When |
as.MiDAS |
Logical flag indicating if MiDAS object should be returned. |
Setting as.MiDAS
to TRUE
will filter MiDAS object based on
p-value cut-off given by HWE_cutoff
.
Data frame with Hardy Weinberg Equilibrium test results or a filtered MiDAS object.
# create MiDAS object midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles" ) # get HWE p-values as data frame HWETest(midas, experiment = "hla_alleles") # get HWE in groups defined by disease status # grouping by `disease == 1` will divide samples into two groups: # `disease == 1` and `not disease == 1` HWETest(midas, experiment = "hla_alleles", HWE_group = disease == 1) # filter MiDAS object by HWE test p-value HWETest(midas, experiment = "hla_alleles", HWE_cutoff = 0.05, as.MiDAS = TRUE)
# create MiDAS object midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles" ) # get HWE p-values as data frame HWETest(midas, experiment = "hla_alleles") # get HWE in groups defined by disease status # grouping by `disease == 1` will divide samples into two groups: # `disease == 1` and `not disease == 1` HWETest(midas, experiment = "hla_alleles", HWE_group = disease == 1) # filter MiDAS object by HWE test p-value HWETest(midas, experiment = "hla_alleles", HWE_cutoff = 0.05, as.MiDAS = TRUE)
isCharacterOrNULL
checks if the object is a character vector or NULL.
isCharacterOrNULL(x)
isCharacterOrNULL(x)
x |
object to test. |
Logical indicating if object is character vector or NULL
isClassOrNULL
checks if object is an instance of a specified class or
is null.
isClass(x, class)
isClass(x, class)
x |
object to test. |
class |
String specifying class to test. |
Logical indicating if x
is an instance of class
.
isClassOrNULL
checks if object is an instance of a specified class or
is null.
isClassOrNULL(x, class)
isClassOrNULL(x, class)
x |
object to test. |
class |
String specifying class to test. |
Logical indicating if x
is an instance of class
.
isCountOrNULL
check if object is a count (a single positive integer)
or NULL.
isCountOrNULL(x)
isCountOrNULL(x)
x |
object to test. |
Logical indicating if object is count or NULL
isCountsOrZeros
checks if vector contains only positive integers or
zeros.
isCountsOrZeros(x, na.rm = TRUE)
isCountsOrZeros(x, na.rm = TRUE)
x |
Numeric vector or object that can be |
na.rm |
Logical indicating if |
Logical indicating if provided vector contains only positive integers or zeros.
isExperimentCountsOrZeros
checks if experiment contains only positive
integers or zeros.
isExperimentCountsOrZeros(x, na.rm = TRUE)
isExperimentCountsOrZeros(x, na.rm = TRUE)
x |
Matrix or SummarizedExperiment object. |
na.rm |
Logical indicating if |
Logical indicating if x
contains only positive integers or
zeros.
isExperimentInheritanceModelApplicable
check experiment's metadata
for presence of "inheritance_model_applicable"
flag, indicating if
inheritance model can be applied.
isExperimentInheritanceModelApplicable(experiment) ## S3 method for class 'matrix' isExperimentInheritanceModelApplicable(experiment) ## S3 method for class 'SummarizedExperiment' isExperimentInheritanceModelApplicable(experiment)
isExperimentInheritanceModelApplicable(experiment) ## S3 method for class 'matrix' isExperimentInheritanceModelApplicable(experiment) ## S3 method for class 'SummarizedExperiment' isExperimentInheritanceModelApplicable(experiment)
experiment |
Matrix or SummarizedExperiment object. |
Logical flag.
isFlagOrNULL
checks if object is flag (a length one logical vector) or
NULL.
isFlagOrNULL(x)
isFlagOrNULL(x)
x |
object to test. |
Logical indicating if object is flag or NULL
isNumberOrNULL
checks if object is number (a length one numeric
vector) or NULL.
isNumberOrNULL(x)
isNumberOrNULL(x)
x |
object to test. |
Logical indicating if object is number or NULL
isStringOrNULL
checks if object is string (a length one character
vector) or NULL.
isStringOrNULL(x)
isStringOrNULL(x)
x |
object to test. |
Logical indicating if object is string or NULL
isTRUEorFALSE
check if object is a flag (a length one logical vector)
except NA.
isTRUEorFALSE(x)
isTRUEorFALSE(x)
x |
object to test. |
Logical indicating if object is TRUE or FALSE flag
iterativeLRT
performs likelihood ratio test in an iterative manner
over groups of variables given in omnibus_groups
.
iterativeLRT(object, placeholder, omnibus_groups)
iterativeLRT(object, placeholder, omnibus_groups)
object |
An existing fit from a model function such as lm, glm and many others. |
placeholder |
String specifying term to substitute with value from
|
omnibus_groups |
List of character vectors giving sets of variables for which omnibus test should be applied. |
Data frame containing summarised likelihood ratio test results.
Information about variable statistic from each model is extracted using
tidy
function.
iterativeModel(object, placeholder, variables, exponentiate = FALSE)
iterativeModel(object, placeholder, variables, exponentiate = FALSE)
object |
An existing fit from a model function such as lm, glm and many others. |
placeholder |
String specifying term to substitute with value from
|
variables |
Character vector specifying variables to use in association tests. |
exponentiate |
Logical flag indicating whether or not to exponentiate
the coefficient estimates. Internally this is passed to
|
Tibble containing per variable summarised model statistics. The exact
output format is model dependent and controlled by model's dedicated
tidy
function.
kableResults
convert results table (runMiDAS
output) to
HTML or LaTeX format.
kableResults( results, colnames = NULL, header = "MiDAS analysis results", pvalue_cutoff = NULL, format = getOption("knitr.table.format"), scroll_box_height = "400px" )
kableResults( results, colnames = NULL, header = "MiDAS analysis results", pvalue_cutoff = NULL, format = getOption("knitr.table.format"), scroll_box_height = "400px" )
results |
Tibble as returned by |
colnames |
Character vector of form |
header |
String specifying results table header. |
pvalue_cutoff |
Number specifying p-value cutoff for results to be
included in output. If |
format |
String |
scroll_box_height |
A character string indicating the height of the table. |
Association analysis results table in HTML or LaTeX.
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") object <- lm(disease ~ term, data = midas) res <- runMiDAS(object, experiment = "hla_alleles", inheritance_model = "additive") kableResults(results = res, colnames = c("HLA allele" = "allele"))
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_alleles") object <- lm(disease ~ term, data = midas) res <- runMiDAS(object, experiment = "hla_alleles", inheritance_model = "additive") kableResults(results = res, colnames = c("HLA allele" = "allele"))
Accessed on 28.08.20
kir_frequencies
kir_frequencies
A data frame with 3744 rows and 3 variables:
allele number, character
reference population name, character
KIR genes carrier frequency in reference population, float
A dataset containing KIR genes frequencies across 16 genes. For details visit the search results page in the allelefrequencies.net database website.
Used to run function iteratively over list, while using tryCatch to
catch warnings and errors to finally present a summary of issues rather
than error on each and every one. Used in iterativeLRT
and
iterativeModel
.
lapply_tryCatch(X, FUN, err_res, ...)
lapply_tryCatch(X, FUN, err_res, ...)
X |
a vector (atomic or list) or an |
FUN |
the function to be applied to each element of |
err_res |
Function creating a result that should be output in case of error. |
... |
optional arguments to |
List of elements as returned by FUN
.
listMiDASDictionaries
lists dictionaries shipped with the MiDAS package.
See hlaToVariable
for more details on dictionaries.
listMiDASDictionaries(pattern = "allele", file.names = FALSE)
listMiDASDictionaries(pattern = "allele", file.names = FALSE)
pattern |
String used to match dictionary names, it can be a regular expression. By default all names are matched. |
file.names |
Logical value. If FALSE, only the names of dictionaries are returned. If TRUE their paths are returned. |
Character vector giving names of available HLA alleles dictionaries.
LRTest
carry out an asymptotic likelihood ratio test for two models.
LRTest(mod0, mod1)
LRTest(mod0, mod1)
mod0 |
An existing fit from a model function such as lm, glm and many others. |
mod1 |
Object of the same class as |
mod0
have to be a reduced version of mod1
. See examples.
Data frame with the results of likelihood ratio test of the supplied models.
Column term
holds new variables appearing in mod1
,
df
difference in degrees of freedom between models, logLik
difference in log likelihoods, statistic
Chisq
statistic and
p.value
corresponding p-value.
Example HLA calls data used in MiDAS tutorial
MiDAS_tut_HLA
MiDAS_tut_HLA
Data frame with 1000 rows and 19 columns. First column holds samples ID's, following columns holds HLA alleles calls for different genes.
Character sample ID
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Character
Example KIRR presence/absence data used in MiDAS tutorial
MiDAS_tut_KIR
MiDAS_tut_KIR
Data frame with 1000 rows and 17 columns. First column holds samples ID's, following columns holds presence/absence indicators for different KIR genes.
Character sample ID
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Integer
Example MiDAS object created with data used in MiDAS tutorial:
MiDAS_tut_HLA
, MiDAS_tut_KIR
, MiDAS_tut_pheno
.
Used in code examlpes and unit tests.
MiDAS_tut_object
MiDAS_tut_object
MiDAS object with following experiments defined:
SummarizedExperiment with 447 rows and 1000 columns
SummarizedExperiment with 1223 rows and 1000 columns
SummarizedExperiment with 46 rows and 1000 columns
SummarizedExperiment with 12 rows and 1000 columns
SummarizedExperiment with 5 rows and 1000 columns
SummarizedExperiment with 16 rows and 1000 columns
SummarizedExperiment with 6 rows and 1000 columns
SummarizedExperiment with 29 rows and 1000 columns
matrix with 4 rows and 1000 columns
SummarizedExperiment with 9 rows and 1000 columns
Example phenotype data used in MiDAS tutorial
MiDAS_tut_pheno
MiDAS_tut_pheno
Data frame with 1000 rows and 4 columns.
Character sample ID
Integer
Numeric
Integer
The MiDAS
class is a MultiAssayExperiment
object
containing data and metadata required for MiDAS analysis.
Valid MiDAS
object must have unique features names across all
experiments and colData. It's metadata list needs to have a placeholder
element, which is a string specifying name of column in colData used when
defining statistical model for downstream analyses (see
runMiDAS
for more details). Optionally the object's metadata
can also store 'hla_calls'
and 'kir_calls'
data frames (see
prepareMiDAS
for more details).
## S4 method for signature 'MiDAS' getExperiments(object) ## S4 method for signature 'MiDAS' getHlaCalls(object) ## S4 method for signature 'MiDAS' getKirCalls(object) ## S4 method for signature 'MiDAS' getPlaceholder(object) ## S4 method for signature 'MiDAS' getOmnibusGroups(object, experiment) ## S4 method for signature 'MiDAS' getFrequencies( object, experiment, carrier_frequency = FALSE, compare = FALSE, ref_pop = list(hla_alleles = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), kir_genes = c("USA California African American KIR", "USA California Asian American KIR", "USA California Caucasians KIR", "USA California Hispanic KIR")), ref = list(hla_alleles = allele_frequencies, kir_genes = kir_frequencies) ) ## S4 method for signature 'MiDAS' filterByFrequency( object, experiment, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, carrier_frequency = FALSE ) ## S4 method for signature 'MiDAS' filterByOmnibusGroups(object, experiment, groups) ## S4 method for signature 'MiDAS' filterByVariables(object, experiment, variables) ## S4 method for signature 'MiDAS' getAllelesForAA(object, aa_pos)
## S4 method for signature 'MiDAS' getExperiments(object) ## S4 method for signature 'MiDAS' getHlaCalls(object) ## S4 method for signature 'MiDAS' getKirCalls(object) ## S4 method for signature 'MiDAS' getPlaceholder(object) ## S4 method for signature 'MiDAS' getOmnibusGroups(object, experiment) ## S4 method for signature 'MiDAS' getFrequencies( object, experiment, carrier_frequency = FALSE, compare = FALSE, ref_pop = list(hla_alleles = c("USA NMDP African American pop 2", "USA NMDP Chinese", "USA NMDP European Caucasian", "USA NMDP Hispanic South or Central American", "USA NMDP Japanese", "USA NMDP North American Amerindian", "USA NMDP South Asian Indian"), kir_genes = c("USA California African American KIR", "USA California Asian American KIR", "USA California Caucasians KIR", "USA California Hispanic KIR")), ref = list(hla_alleles = allele_frequencies, kir_genes = kir_frequencies) ) ## S4 method for signature 'MiDAS' filterByFrequency( object, experiment, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, carrier_frequency = FALSE ) ## S4 method for signature 'MiDAS' filterByOmnibusGroups(object, experiment, groups) ## S4 method for signature 'MiDAS' filterByVariables(object, experiment, variables) ## S4 method for signature 'MiDAS' getAllelesForAA(object, aa_pos)
object |
|
experiment |
String specifying experiment. |
carrier_frequency |
Logical flag indicating if carrier frequency should be returned. |
compare |
Logical flag indicating if |
ref_pop |
Named list of character vectors giving names of reference
populations in |
ref |
Named list of reference frequencies data frames. Each element
should give reference for a specific experiment. See
|
lower_frequency_cutoff |
Number giving lower frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
upper_frequency_cutoff |
Number giving upper frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
groups |
Character vector specifying omnibus groups to select. See
|
variables |
Character vector specifying features to select. |
aa_pos |
String specifying gene and amino acid position, example
|
Instance of class MiDAS
Transform MiDAS to wide format data.frame
midasToWide(object, experiment)
midasToWide(object, experiment)
object |
Object of class MiDAS |
experiment |
Character specifying experiments to include |
Data frame representation of MiDAS object. Consecutive columns holds values of variables from MiDAS's experiments and colData. The metadata associated with experiments is not preserved.
isTRUEorFALSE
check if object is a flag (a length one logical vector)
except NA.
objectHasPlaceholder(object, placeholder)
objectHasPlaceholder(object, placeholder)
object |
statistical model to test. |
placeholder |
string specifying name of placeholder. |
Logical indicating if placeholder is present in object formula.
OmnibusTest
calculates overall p-value for linear combination of
variables using likelihood ratio test.
omnibusTest( object, omnibus_groups, placeholder = "term", correction = "bonferroni", n_correction = NULL )
omnibusTest( object, omnibus_groups, placeholder = "term", correction = "bonferroni", n_correction = NULL )
object |
An existing fit from a model function such as lm, glm and many others. |
omnibus_groups |
List of character vectors giving sets of variables for which omnibus test should be applied. |
placeholder |
String specifying term in |
correction |
String specifying multiple testing correction method. See details for further information. |
n_correction |
Integer specifying number of comparisons to consider during multiple testing correction calculations. For Bonferroni correction it is possible to specify a number lower than the number of comparisons being made. This is useful in cases when knowledge about the biology or redundance of alleles reduces the need for correction. For other methods it must be at least equal to the number of comparisons being made; only set this (to non-default) when you know what you are doing! |
Likelihood ratio test is conducted by comparing a model given in an
object
with an extended model, that is created by including the effect
of variables given in variables
as their linear combination.
Data frame with columns:
"group" Omnibus group name
"term" Elements of omnibus group added to base model
"df" Difference in degrees of freedom between base and extended model
"logLik" Difference in log likelihoods between base and extended model
"statistic" Chisq statistic
"p.value" P-value
"p.adjusted" Adjusted p-value
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_aa") # define base model object <- lm(disease ~ term, data = midas) omnibusTest(object, omnibus_groups = list( A_29 = c("A_29_D", "A_29_A"), A_43 = c("A_43_Q", "A_43_R") ))
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = "hla_aa") # define base model object <- lm(disease ~ term, data = midas) omnibusTest(object, omnibus_groups = list( A_29 = c("A_29_D", "A_29_A"), A_43 = c("A_43_Q", "A_43_R") ))
prepareMiDAS
transform HLA alleles calls and KIR calls according
to selected experiment
s creating a MiDAS
object.
prepareMiDAS( hla_calls = NULL, kir_calls = NULL, colData, experiment = c("hla_alleles", "hla_aa", "hla_g_groups", "hla_supertypes", "hla_NK_ligands", "kir_genes", "kir_haplotypes", "hla_kir_interactions", "hla_divergence", "hla_het", "hla_custom", "kir_custom"), placeholder = "term", lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, indels = TRUE, unkchar = FALSE, hla_divergence_aa_selection = "binding_groove", hla_het_resolution = 8, hla_dictionary = NULL, kir_dictionary = NULL )
prepareMiDAS( hla_calls = NULL, kir_calls = NULL, colData, experiment = c("hla_alleles", "hla_aa", "hla_g_groups", "hla_supertypes", "hla_NK_ligands", "kir_genes", "kir_haplotypes", "hla_kir_interactions", "hla_divergence", "hla_het", "hla_custom", "kir_custom"), placeholder = "term", lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, indels = TRUE, unkchar = FALSE, hla_divergence_aa_selection = "binding_groove", hla_het_resolution = 8, hla_dictionary = NULL, kir_dictionary = NULL )
hla_calls |
HLA calls data frame, as returned by
|
kir_calls |
KIR calls data frame, as returned by
|
colData |
Data frame holding additional variables like phenotypic
observations or covariates. It have to contain |
experiment |
Character vector indicating analysis type for which data
should be prepared. Valid choices are |
placeholder |
String giving name for dummy variable inserted to
|
lower_frequency_cutoff |
Number giving lower frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
upper_frequency_cutoff |
Number giving upper frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
indels |
Logical indicating whether indels should be considered when
checking amino acid variability in |
unkchar |
Logical indicating whether unknown characters in the alignment
should be considered when checking amino acid variability in
|
hla_divergence_aa_selection |
String specifying variable region in peptide binding
groove which should be considered for Grantham distance calculation. Valid
choices includes: |
hla_het_resolution |
Number specifying HLA alleles resolution used to
calculate heterogeneity in |
hla_dictionary |
Data frame giving HLA allele dictionary used in
|
kir_dictionary |
Data frame giving KIR genes dictionary used in
|
experiment
specifies analysis types for which hla_calls
and
kir_call
should be prepared.
'hla_alleles'
hla_calls
are transformed to counts matrix describing number of
allele occurrences for each sample. This experiment is used to test
associations on HLA alleles level.
'hla_aa'
hla_calls
are transformed to a matrix of variable amino acid
positions. See hlaToAAVariation
for more details. This
experiment is used to test associations on amino acid level.
"hla_g_groups"
hla_calls
are translated into HLA G groups and transformed to
matrix describing number of G group occurrences for each sample. See
hlaToVariable
for more details. This experiment is used to
test associations on HLA G groups level.
"hla_supertypes"
hla_calls
are translated into HLA supertypes and transformed to
matrix describing number of G group occurrences for each sample. See
hlaToVariable
for more details. This experiment is used to
test associations on HLA supertypes level.
"hla_NK_ligands"
hla_calls
are translated into NK ligands, which includes HLA
Bw4/Bw6 and HLA C1/C2 groups and transformed to matrix describing number
of their occurrences for each sample. See hlaToVariable
for
more details.This experiment is used to test associations on HLA NK
ligands level.
"kir_genes"
kir_calls
are transformed to counts matrix describing number of
KIR gene occurrences for each sample. This experiment is used to test
associations on KIR genes level.
"hla_kir_interactions"
hla_calls
and kir_calls
are translated to HLA - KIR
interactions as defined in
Pende et al., 2019..
See getHlaKirInteractions
for more details. This experiment
is used to test associations on HLA - KIR interactions level.
"hla_divergence"
Grantham distance for class I HLA alleles is calculated based on
hla_calls
using original formula by
Grantham R. 1974..
See hlaCallsGranthamDistance
for more details. This
experiment is used to test associations on HLA divergence level measured
by Grantham distance.
"hla_het"
hla_calls
are transformed to heterozygosity status, where 1
designates a heterozygote and 0
homozygote. Heterozygosity status
is calculated only for classical HLA genes (A, B, C, DQA1, DQB1, DRA,
DRB1, DPA1, DPB1). This experiment is used to test associations on HLA
divergence level measured by heterozygosity.
Object of class MiDAS
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, kir_calls = MiDAS_tut_KIR, colData = MiDAS_tut_pheno, experiment = "hla_alleles")
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, kir_calls = MiDAS_tut_KIR, colData = MiDAS_tut_pheno, experiment = "hla_alleles")
Prepare MiDAS data on HLA amino acid level
prepareMiDAS_hla_aa(hla_calls, indels = TRUE, unkchar = FALSE, ...)
prepareMiDAS_hla_aa(hla_calls, indels = TRUE, unkchar = FALSE, ...)
hla_calls |
HLA calls data frame, as returned by
|
indels |
Logical indicating whether indels should be considered when checking variability. |
unkchar |
Logical indicating whether unknown characters in the alignment should be considered when checking variability. |
... |
Not used |
SummarizedExperiment
Prepare MiDAS data on HLA allele level
prepareMiDAS_hla_alleles(hla_calls, ...)
prepareMiDAS_hla_alleles(hla_calls, ...)
hla_calls |
HLA calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on custom HLA level
prepareMiDAS_hla_custom(hla_calls, hla_dictionary, ...)
prepareMiDAS_hla_custom(hla_calls, hla_dictionary, ...)
hla_calls |
HLA calls data frame, as returned by
|
hla_dictionary |
Data frame giving HLA allele dictionary. See
|
... |
Not used |
Matrix
Prepare MiDAS data on HLA divergence level
prepareMiDAS_hla_divergence( hla_calls, hla_divergence_aa_selection = "binding_groove", ... )
prepareMiDAS_hla_divergence( hla_calls, hla_divergence_aa_selection = "binding_groove", ... )
hla_calls |
HLA calls data frame, as returned by
|
hla_divergence_aa_selection |
String specifying variable region in peptide binding
groove which should be considered for Grantham distance calculation. Valid
choices includes: |
... |
Not used |
Matrix
Prepare MiDAS data on HLA allele's G groups level
prepareMiDAS_hla_g_groups(hla_calls, ...)
prepareMiDAS_hla_g_groups(hla_calls, ...)
hla_calls |
HLA calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on HLA heterozygosity level
prepareMiDAS_hla_het(hla_calls, hla_het_resolution = 8, ...)
prepareMiDAS_hla_het(hla_calls, hla_het_resolution = 8, ...)
hla_calls |
HLA calls data frame, as returned by
|
hla_het_resolution |
Number specifying HLA alleles resolution used to calculate heterogeneity. |
... |
Not used |
Matrix
Prepare MiDAS data on HLA - KIR interactions level
prepareMiDAS_hla_kir_interactions(hla_calls, kir_calls, ...)
prepareMiDAS_hla_kir_interactions(hla_calls, kir_calls, ...)
hla_calls |
HLA calls data frame, as returned by
|
kir_calls |
KIR calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on HLA allele's groups level
prepareMiDAS_hla_NK_ligands(hla_calls, ...)
prepareMiDAS_hla_NK_ligands(hla_calls, ...)
hla_calls |
HLA calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on HLA allele's supertypes level
prepareMiDAS_hla_supertypes(hla_calls, ...)
prepareMiDAS_hla_supertypes(hla_calls, ...)
hla_calls |
HLA calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on custom KIR level
prepareMiDAS_kir_custom(kir_calls, kir_dictionary, ...)
prepareMiDAS_kir_custom(kir_calls, kir_dictionary, ...)
kir_calls |
KIR calls data frame, as returned by
|
kir_dictionary |
Data frame giving KIR genes dictionary. See
|
... |
Not used |
Matrix
Prepare MiDAS data on KIR genes level
prepareMiDAS_kir_genes(kir_calls, ...)
prepareMiDAS_kir_genes(kir_calls, ...)
kir_calls |
KIR calls data frame, as returned by
|
... |
Not used |
Matrix
Prepare MiDAS data on KIR haplotypes level
prepareMiDAS_kir_haplotypes(kir_calls, ...)
prepareMiDAS_kir_haplotypes(kir_calls, ...)
kir_calls |
KIR calls data frame, as returned by
|
... |
Not used |
Matrix
readHlaAlignments
read HLA allele alignments from file.
readHlaAlignments(file, gene = NULL, trim = FALSE, unkchar = "")
readHlaAlignments(file, gene = NULL, trim = FALSE, unkchar = "")
file |
Path to input file. |
gene |
Character vector of length one specifying the name of a gene for which alignment is required. See details for further explanations. |
trim |
Logical indicating if alignment should be trimmed to start codon of the mature protein. |
unkchar |
Character to be used to represent positions with unknown sequence. |
HLA allele alignment file should follow EBI database format, for details see ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/alignments/README.md.
All protein alignment files from the EBI database are shipped with the package.
They can be easily accessed using gene
parameter. If gene
is
set to NULL
, file
parameter is used instead and alignment is
read from the provided file. In EBI database alignments for DRB1, DRB3, DRB4
and DRB5 genes are provided as a single file, here they are separated.
Additionally, for the alleles without sequence defined in the original alignment files we have infered thier sequence based on known higher resolution alleles.
Matrix containing HLA allele alignments.
Rownames correspond to allele numbers and columns to positions in the
alignment. Sequences following the termination codon are marked as empty
character (""
). Unknown sequences are marked with a character of
choice, by default ""
. Stop codons are represented by a hash (X).
Insertion and deletions are marked with period (.).
hla_alignments <- readHlaAlignments(gene = "A")
hla_alignments <- readHlaAlignments(gene = "A")
readHlaCalls
read HLA allele calls from file
readHlaCalls(file, resolution = 4, na.strings = c("Not typed", "-", "NA"))
readHlaCalls(file, resolution = 4, na.strings = c("Not typed", "-", "NA"))
file |
Path to input file. |
resolution |
Number specifying desired resolution. |
na.strings |
a character vector of strings which are to be
interpreted as |
Input file has to be a tsv formatted table with a header. First column should
contain sample IDs, further columns hold HLA allele numbers. See
system.file("extdata", "MiDAS_tut_HLA.txt", package = "midasHLA")
file
for an example.
resolution
parameter can be used to reduce HLA allele numbers. If
reduction is not needed resolution
can be set to 8. resolution
parameter can take the following values: 2, 4, 6, 8. For more details
about HLA allele numbers resolution see
http://hla.alleles.org/nomenclature/naming.html.
HLA calls data frame. First column hold sample IDs, further columns hold HLA allele numbers.
file <- system.file("extdata", "MiDAS_tut_HLA.txt", package = "midasHLA") hla_calls <- readHlaCalls(file)
file <- system.file("extdata", "MiDAS_tut_HLA.txt", package = "midasHLA") hla_calls <- readHlaCalls(file)
readKirCalls
read KIR calls from file.
readKirCalls(file, na.strings = c("", "NA", "uninterpretable"))
readKirCalls(file, na.strings = c("", "NA", "uninterpretable"))
file |
Path to input file. |
na.strings |
a character vector of strings which are to be
interpreted as |
Input file has to be a tsv formatted table. First column should be named
"ID" and contain samples IDs, further columns should hold KIR genes presence
/ absence indicators. See
system.file("extdata", "MiDAS_tut_KIR", package = "midasHLA")
for an
example.
Data frame containing KIR gene's counts. First column hold samples IDs, further columns hold KIR genes presence / absence indicators.
file <- system.file("extdata", "MiDAS_tut_KIR.txt", package = "midasHLA") readKirCalls(file)
file <- system.file("extdata", "MiDAS_tut_KIR.txt", package = "midasHLA") readKirCalls(file)
reduceAlleleResolution
reduce HLA allele numbers resolution.
reduceAlleleResolution(allele, resolution = 4)
reduceAlleleResolution(allele, resolution = 4)
allele |
Character vector with HLA allele numbers. |
resolution |
Number specifying desired resolution. |
In cases when allele number contain additional suffix their resolution
can not be unambiguously reduced. These cases are returned unchanged.
Function behaves in the same manner if resolution
is higher than
resolution of input HLA allele numbers.
NA
values are accepted and returned as NA
.
TODO here we give such warning when alleles have G or GG suffix (see http://hla.alleles.org/alleles/g_groups.html) "Reducing G groups alleles, major allele gene name will be used." I dond't really remember why we are doing this xd These allele numbers are processed as normal alleles (without suffix). Let me know if this warning is relevant or we could go without it. If we want to leave it lets also add text in documentation.
Character vector containing reduced HLA allele numbers.
reduceAlleleResolution(c("A*01", "A*01:24", "C*05:24:55:54"), 2)
reduceAlleleResolution(c("A*01", "A*01:24", "C*05:24:55:54"), 2)
reduceHlaCalls
reduces HLA calls data frame to specified resolution.
reduceHlaCalls(hla_calls, resolution = 4)
reduceHlaCalls(hla_calls, resolution = 4)
hla_calls |
HLA calls data frame, as returned by
|
resolution |
Number specifying desired resolution. |
Alleles with resolution greater than resolution
or optional suffixes
are returned unchanged.
HLA calls data frame reduced to specified resolution.
reduceHlaCalls(MiDAS_tut_HLA, resolution = 2)
reduceHlaCalls(MiDAS_tut_HLA, resolution = 2)
runMiDAS
perform association analysis on MiDAS data using statistical
model of choice. Function is intended for use with prepareMiDAS
.
See examples section.
runMiDAS( object, experiment, inheritance_model = NULL, conditional = FALSE, omnibus = FALSE, omnibus_groups_filter = NULL, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, correction = "bonferroni", n_correction = NULL, exponentiate = FALSE, th = 0.05, th_adj = TRUE, keep = FALSE, rss_th = 1e-07 )
runMiDAS( object, experiment, inheritance_model = NULL, conditional = FALSE, omnibus = FALSE, omnibus_groups_filter = NULL, lower_frequency_cutoff = NULL, upper_frequency_cutoff = NULL, correction = "bonferroni", n_correction = NULL, exponentiate = FALSE, th = 0.05, th_adj = TRUE, keep = FALSE, rss_th = 1e-07 )
object |
An existing fit from a model function such as lm, glm and many others. |
experiment |
String indicating the experiment associated with
|
inheritance_model |
String specifying inheritance model to use.
Available choices are |
conditional |
Logical flag indicating if conditional analysis should be performed. |
omnibus |
Logical flag indicating if omnibus test should be used. |
omnibus_groups_filter |
Character vector specifying omnibus groups to use. |
lower_frequency_cutoff |
Number giving lower frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
upper_frequency_cutoff |
Number giving upper frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions. |
correction |
String specifying multiple testing correction method. See details for further information. |
n_correction |
Integer specifying number of comparisons to consider during multiple testing correction calculations. For Bonferroni correction it is possible to specify a number lower than the number of comparisons being made. This is useful in cases when knowledge about the biology or redundance of alleles reduces the need for correction. For other methods it must be at least equal to the number of comparisons being made; only set this (to non-default) when you know what you are doing! |
exponentiate |
Logical flag indicating whether or not to exponentiate
the coefficient estimates. Internally this is passed to
|
th |
Number specifying threshold for a variable to be considered significant. |
th_adj |
Logical flag indicating if adjusted p-value should be used as threshold criteria, otherwise unadjusted p-value is used. |
keep |
Logical flag indicating if the output should be a list of results resulting from each selection step. Default is to return only the final result. |
rss_th |
Number specifying residual sum of squares threshold at which
function should stop adding additional variables. As the residual sum of
squares approaches |
By default statistical analysis is performed iteratively on each variable in
selected experiment. This is done by substituting placeholder
in the
object
's formula with each variable in the experiment.
Setting conditional
argument to TRUE
will cause the statistical
analysis to be performed in a stepwise conditional testing manner, adding the
previous top-associated variable as a covariate to object
's formula.
The analysis stops when there is no more significant variables, based on
self-defined threshold (th
argument). Either adjusted or unadjusted
p-values can be used as the selection criteria, which is controlled using
th_adj
argument.
Setting omnibus
argument to TRUE
will cause the statistical
analysis to be performed iteratively on groups of variables (like residues at
particular amino acid position) using likelihood ratio test.
Argument inheritance_model
specifies the inheritance model that should
be applyed to experiment's data. Following choices are available:
"dominant" carrier status is sufficient for expression of the phenotype (non-carrier: 0, heterozygous & homozygous carrier: 1).
"recessive" two copies are required for expression of the phenotype (non-carrier & heterozygous carrier: 0, homozygous carrier: 1).
"additive" allele dosage matters, homozygous carriers show stronger phenotype expression or higher risk than heterozygous carriers (non-carrier = 0, heterozygous carrier = 1, homozygous carrier = 2).
"overdominant" heterozygous carriers are at higher risk compared to non-carriers or homozygous carriers (non-carrier & homozygous carrier = 0, heterozygous carrier = 1).
correction
specifies p-value adjustment method to use, common choice
is Benjamini & Hochberg (1995) ("BH"
). Internally this is passed to
p.adjust.
Analysis results, depending on the parameters:
conditional=FALSE, omnibus=FALSE
Tibble with first column "term"
holding names of tested
variables (eg. alleles). Further columns depends on the used
model and are determined by associated tidy
function. Generally
they will include "estimate"
, "std.error"
,
"statistic"
, "p.value"
, "conf.low"
,
"conf.high"
, "p.adjusted"
.
conditional=TRUE, omnibus=FALSE
Tibble or a list of tibbles, see keep
argument. The first column
"term"
hold names of tested variables. Further
columns depends on the used model and are determined by associated
tidy
function. Generally they will include "estimate"
,
"std.error"
, "statistic"
, "p.value"
,
"conf.low"
, "conf.high"
, "p.adjusted"
.
conditional=FALSE, omnibus=TRUE
Tibble with first column holding names of tested omnibus groups
(eg. amino acid positions) and second names of variables in the group
(eg. residues). Further columns are: "df"
giving difference in
degrees of freedom between base and extended model, "statistic"
giving Chisq statistic, "p.value"
and "p.adjusted"
.
conditional=TRUE, omnibus=TRUE
Tibble or a list of tibbles, see keep
argument. The first column
hold names of tested omnibus groups (eg. amino acid positions), second
column hold names of variables in the group (eg. residues). Further
columns are: "df"
giving difference in degrees of freedom
between base and extended model, "statistic"
giving Chisq
statistic, "p.value"
and "p.adjusted"
.
# create MiDAS object midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = c("hla_alleles", "hla_aa") ) # construct statistical model object <- lm(disease ~ term, data = midas) # run analysis runMiDAS(object, experiment = "hla_alleles", inheritance_model = "dominant") # omnibus test # omnibus_groups_filter argument can be used to restrict omnibus test only # to selected variables groups, here we restrict the analysis to HLA-A # positions 29 and 43. runMiDAS( object, experiment = "hla_aa", inheritance_model = "dominant", omnibus = TRUE, omnibus_groups_filter = c("A_29", "A_43") )
# create MiDAS object midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA, colData = MiDAS_tut_pheno, experiment = c("hla_alleles", "hla_aa") ) # construct statistical model object <- lm(disease ~ term, data = midas) # run analysis runMiDAS(object, experiment = "hla_alleles", inheritance_model = "dominant") # omnibus test # omnibus_groups_filter argument can be used to restrict omnibus test only # to selected variables groups, here we restrict the analysis to HLA-A # positions 29 and 43. runMiDAS( object, experiment = "hla_aa", inheritance_model = "dominant", omnibus = TRUE, omnibus_groups_filter = c("A_29", "A_43") )
Helper getting variables frequencies from MiDAS object. Additionally for
binary test covariate frequencies per phenotype are added. Used in scope of
runMiDAS
.
runMiDASGetVarsFreq(midas, experiment, test_covar)
runMiDASGetVarsFreq(midas, experiment, test_covar)
midas |
MiDAS object. |
experiment |
String specifying experiment from |
test_covar |
String giving name of test covariate. |
Data frame with variable number of columns. First column,
"term"
holds experiment's variables, further columns hold number of
variable occurrence and their frequencies.
stringMatches
checks if string is equal to one of the choices.
stringMatches(x, choice)
stringMatches(x, choice)
x |
string to test. |
choice |
Character vector with possible values for |
Logical indicating if x
matches one of the strings in
choice
.
List HLA alleles and amino acid residues at a given position.
summariseAAPosition(hla_calls, aa_pos, aln = NULL, na.rm = FALSE)
summariseAAPosition(hla_calls, aa_pos, aln = NULL, na.rm = FALSE)
hla_calls |
HLA calls data frame, as returned by
|
aa_pos |
String specifying gene and amino acid position, example
|
aln |
Matrix containing amino acid sequence alignments as returned by
|
na.rm |
Logical flag indicating if |
Data frame containing HLA alleles, their corresponding amino acid residues and frequencies at requested position.
summariseAAPosition(MiDAS_tut_HLA, "A_9")
summariseAAPosition(MiDAS_tut_HLA, "A_9")
updateModel
adds new variables to model and re-fit it.
updateModel(object, x, placeholder = NULL, backquote = TRUE, collapse = " + ")
updateModel(object, x, placeholder = NULL, backquote = TRUE, collapse = " + ")
object |
An existing fit from a model function such as lm, glm and many others. |
x |
Character vector specifying variables to be added to model. |
placeholder |
String specifying term to substitute with value from
|
backquote |
Logical indicating if added variables should be quoted.
Elements of this vector are recycled over |
collapse |
String specifying how variables should be combined. Defaults
to |
Updated fitted object.
validateFrequencyCutoffs
checks if lower_frequency_cutoff
and
upper_frequency_cutoff
are valid.
validateFrequencyCutoffs(lower_frequency_cutoff, upper_frequency_cutoff)
validateFrequencyCutoffs(lower_frequency_cutoff, upper_frequency_cutoff)
lower_frequency_cutoff |
Number |
upper_frequency_cutoff |
Number |
lower_frequency_cutoff
and upper_frequency_cutoff
should be a
positive numbers, giving either frequency or counts.
lower_frequency_cutoff
has to be lower than
upper_frequency_cutoff
.
Logical indicating if lower_frequency_cutoff
and
upper_frequency_cutoff
are valid.