rexposome
is an R package designed for the analysis of
exposome data. The exposome can be defined as the
measure of all the exposures of an individual in a lifetime and how
those exposures relate to health. Hence, the aim or
rexposome
is to offer a set of functions to incorporate
exposome data to R framework. Also to provide a series of tools
to analyse exposome data using standard methods from
Biocondcutor.
rexposome
is currently in development and not available
from CRAN nor Bioconductor. Anyway, the package can be installed by
using devtools
R package and taking the source from
Bioinformatic Research Group in Epidemiology’s GitHub repository.
This can be done by opening an R session and typing the following code:
User must take into account that this sentence do not install the packages dependencies.
The following pictures illustrates the rexposome
’s
pipeline:
The first step is to load exposome data on R.
rexposome
provides to functions for this aim: one to load
three TXT
files and another to use three
data.frames
. Then the quantification of missing data and
values under limit of detection (LOD) is done, and it helps to consider
imputation processes. The exposome characterization is useful
to understand the nature of the exposome and the relations
between exposures. The clustering processes on individual exposure data
is done to find exposure-signatures which association with health
outcomes can be tested in the next step. From both exposures and
exposure-signatures levels, the association with health outcomes is
tested using Exposome-Wide Association Studies (ExWAS).
rexposome
defines the exposome data as a three
different data-sets:
The description data is a file describing the exposome. This means that has a row for each exposure and, at last, defined the families of exposures. Usually, this file incorporates a description of the exposures, the matrix where it was obtained and the units of measurement among others.
The following is an example of a description data file:
exposure family matrix description
bde100 PBDEs colostrum BDE 100 - log10
bde138 PBDEs colostrum BDE 138 - log10
bde209 PBDEs colostrum BDE 209 - log10
PFOA PFAS cord blood PFOA - log10
PFNA PFAS cord blood PFNA - log10
PFOA PFAS maternal serum PFOA - log10
PFNA PFAS maternal serum PFNA - log10
hg Metals cord blood hg - log 10
Co Metals urine Co (creatinine) - log10
Zn Metals urine Zn (creatinine) - log10
Pb Metals urine Pb (creatinine) - log10
THM Water --- Average total THM uptake - log10
CHCL3 Water --- Average Chloroform uptake - log10
BROM Water --- Average Brominated THM uptake - log10
NO2 Air --- NO2 levels whole pregnancy- log10
Ben Air --- Benzene levels whole pregnancy- log10
The exposures data file is the one containing the measures of each exposures for all the individuals included in the analysis. It is a matrix-like file having a row per individual and a column per exposures. It must includes a column with the subject’s identifier.
The following is an example of a exposures data file:
id bde100 bde138 bde209 PFOA ...
sub01 2.4665 0.7702 1.6866 2.0075 ...
sub02 0.7799 1.4147 1.2907 1.0153 ...
sub03 -1.6583 -0.9851 -0.8902 -0.0806 ...
sub04 -1.0812 -0.6639 -0.2988 -0.4268 ...
sub05 -0.2842 -0.1518 -1.5291 -0.7365 ...
... ... ... ... ...
The last of the data-sets is the phenotype data files. This file contains the covariates to be included in the analysis as well as the health outcomes of interest. It contains a row per individual included in the analysis and a column for each covariate and outcome. Moreover, it must include a column with the individual’s identifier.
The following is an example of a phenotype data file:
id asthma BMI sex age ...
sub01 control 23.2539 boy 4 ...
sub02 asthma 24.4498 girl 5 ...
sub03 asthma 15.2356 boy 4 ...
sub04 control 25.1387 girl 4 ...
sub05 control 22.0477 boy 5 ...
... ... ... ... ...
To properly coordinate the exposome data, the information included in the three data-sets must follow some rules:
This rules are easy seen in the following figure:
In summary: All the exposures, rows, in the description data file are columns in the exposures data file (plus the column for identifying subjects). All the subjects in the exposures data files are, also, in the phenotype data file.
To ease the life of researchers that have their datasets as one big
table (exposures and phenotypes combined in a single table), we offer
the option of using it as a input to rexposome
. Please look
into the documentation of the loadExposome_plain()
function
for further details on how to load this type of data. A few remarks on
that methodology:
rexposome
An example of single table is the following:
id bde100 bde138 bde209 asthma BMI ...
sub01 2.4665 0.7702 1.6866 control 23.2539 ...
sub02 0.7799 1.4147 1.2907 asthma 24.4498 ...
sub03 -1.6583 -0.9851 -0.8902 asthma 15.2356 ...
sub04 -1.0812 -0.6639 -0.2988 control 25.1387 ...
sub05 -0.2842 -0.1518 -1.5291 control 22.0477 ...
... ... ... ... ... ...
And a visual representation of this single tables is the following:
rexposome
R package is loaded using the standard
library
command:
rexposome
provides two functions to load the
exposome data: readExposome
and
loadexposome
. The function readExposome
will
load the exposome data from txt files and loadExposome
will
do the same from standard R data.frame
s. Both functions
will create an ExposomeSet
object. The
ExposomeSet
is a standard S4 class that will encapsulate
the exposome data.
TXT
filesThe function readExposome
will create an
ExposomeSet
from the three txt
files. The
following lines are used to locate these three files, that were included
in the package for demonstration purposes.
path <- file.path(path.package("rexposome"), "extdata")
description <- file.path(path, "description.csv")
phenotype <- file.path(path, "phenotypes.csv")
exposures <- file.path(path, "exposures.csv")
These files follows the rules described in Data
Format section. They are csv
files, meaning each
values is split from the others by a comma (,
). Function
readExposome
allows to load most any type of files
containing exposome data:
## function (exposures, description, phenotype, sep = ",", na.strings = c("NA",
## "-", "?", " ", ""), exposures.samCol = "sample", description.expCol = "exposure",
## description.famCol = "family", phenotype.samCol = "sample",
## exposures.asFactor = 5, warnings = TRUE)
## NULL
readExposome
expects, by default, csv
files. Changing the content of the argument sep
will allow
to load other files types. The missing values are set using the argument
na.strings
. This means that the character assigned to this
argument will be interpreted as a missing value. By default, those
characters are "NA"
, "-"
, "?"
,
" "
and ""
. Then, the columns with the
exposures’ names and the individual’s names need to be indicated.
Arguments exposures.samCol
and
phenotype.samCol
indicates the column with the individuals’
names at exposures file and phenotypes file. The
arguments description.expCol
and
description.famCol
indicates the column containing the
exposures’ names and the exposures’ family in the description
file.
exp <- readExposome(exposures = exposures, description = description, phenotype = phenotype,
exposures.samCol = "idnum", description.expCol = "Exposure", description.famCol = "Family",
phenotype.samCol = "idnum")
The result is an object of class ExposomeSet
, that can
show all the information of the loaded exposome:
## Object of class 'ExposomeSet' (storageMode: environment)
## . exposures description:
## . categorical: 4
## . continuous: 84
## . exposures transformation:
## . categorical: 0
## . transformed: 0
## . standardized: 0
## . imputed: 0
## . assayData: 88 exposures 109 individuals
## . element names: exp
## . exposures: AbsPM25, ..., Zn
## . individuals: id001, ..., id108
## . phenoData: 109 individuals 9 phenotypes
## . individuals: id001, ..., id108
## . phenotypes: whistling_chest, ..., cbmi
## . featureData: 88 exposures 7 explanations
## . exposures: AbsPM25, ..., Zn
## . descriptions: Family, ..., .imp
## experimentData: use 'experimentData(object)'
## Annotation:
Under the section exposures description the number of
continuous (84) and categorical (4) exposures are shown. The
assayData, phenoData and featureData shows
the content of the files we loaded with readExposome
.
data.frame
The function loadExposome
allows to create an
ExposomeSet
through three data.frames
: one as
description data, one as exposures data and one as
phenotypes data. The arguments are similar to the ones from
readExposome
:
## function (exposures, description, phenotype, description.famCol = "family",
## exposures.asFactor = 5, warnings = TRUE)
## NULL
In order to illustrate how to use loadExposome
, we are
loading the previous csv
files as
data.frames
:
dd <- read.csv(description, header = TRUE)
ee <- read.csv(exposures, header = TRUE)
pp <- read.csv(phenotype, header = TRUE)
Then we rearrange the data.frames
to fulfil with the
requirements of the exposome data. The data.frame
corresponding to description data needs to have the exposure’s
names as rownames.
The data.frame
corresponding to exposures data
needs to have the individual’s identifiers as rownames:
The data.frame
corresponding to phenotypes data
needs to have the individual’s identifiers as a rownames, as the
previous data.frame
:
Then, the ExposomeSet
is creating by giving the three
data.frames
to loadExposome
:
The class ExposomeSet
has several accessors to get the
data stored in it. There are four basic methods that returns the names
of the individuals (sampleNames
), the name of the exposures
(exposureNames
), the name of the families of exposures
(familyNames
) and the name of the phenotypes
(phenotypeNames
).
## [1] "id001" "id002" "id003" "id004" "id005" "id006"
## [1] "AbsPM25" "As" "BDE100" "BDE138" "BDE153" "BDE154"
## [1] "Air Pollutants" "Metals" "PBDEs"
## [4] "Bisphenol A" "Water Pollutants" "Built Environment"
## [7] "Cotinine" "Organochlorines" "Home Environment"
## [10] "Phthalates" "Noise" "PFOAs"
## [13] "Temperature"
## [1] "whistling_chest" "flu" "rhinitis" "wheezing"
## [5] "birthdate" "sex" "age" "cbmi"
## [9] "blood_pre"
fData
will return the description of the exposures
(including internal information to manage them).
## Family Name .fct .trn
## AbsPM25 Air Pollutants Measurement of the blackness of PM2.5 filters
## As Metals Asenic
## BDE100 PBDEs Polybrominated diphenyl ether -100
## .std .imp .type
## AbsPM25 numeric
## As numeric
## BDE100 numeric
pData
will return the phenotypes information.
## whistling_chest flu rhinitis wheezing birthdate sex age cbmi blood_pre
## id001 never no no no 2004-12-29 male 4.2 16.3 120
## id002 never no no no 2005-01-05 male 4.2 16.4 121
## id003 7-12 epi no no yes 2005-01-05 male 4.2 19.0 120
Finally, the method expos
allows to obtain the matrix of
exposures as a data.frame
:
## Cotinine PM10V PM25 X5cxMEPP
## id001 0.03125173 0.10373078 1.176255 NA
## id002 1.59401990 -0.47768393 1.155122 NA
## id003 1.46251090 NA 1.215834 1.859045
## id004 0.89059991 NA 1.171610 NA
## id005 NA NA 1.145765 NA
## id006 0.34818304 NA 1.145382 NA
## id007 1.53591130 NA 1.174642 NA
## id008 2.26864700 NA 1.165078 1.291871
## id009 1.24842660 NA 1.171406 1.650948
## id010 -0.36758339 0.01593277 1.179240 2.112357
The number of missing data on each exposure and on each phenotype can
be found by using the function tableMissings
. This function
returns a vector with the amount of missing data in each exposure or
phenotype. The argument set
indicates if the number of
missing values is counted on exposures of phenotypes. The argument
output
indicates if it is shown as counts
(output="n"
) or as percentage
(output="p"
).
The current exposome data has no missing in the exposures nor in the phenotypes:
## Dens Temp Conn AbsPM25 NO NO2
## 0 0 1 2 2 2
## NOx PM10 PM10Cu PM10Fe PM10K PM10Ni
## 2 2 2 2 2 2
## PM10S PM10SI PM10Zn PM25 PM25CU PM25FE
## 2 2 2 2 2 2
## PM25K PM25Ni PM25S PM25Sl PM25Zn PMcoarse
## 2 2 2 2 2 2
## Benzene PM25V ETS G_pesticides Gas BTHM
## 3 3 5 5 5 6
## CHCl3 H_pesticides Noise_d Noise_n THM Cotinine
## 6 6 6 6 6 7
## DDE DDT HCB PCB118 PCB138 PCB153
## 13 13 13 13 13 13
## PCB180 bHCH BPA As Cs Mo
## 13 13 21 24 24 24
## Ni Tl Zn Hg Cd Sb
## 24 24 24 27 28 30
## Green Cu PM10V Se MBzP MEHHP
## 31 40 41 45 46 46
## MEHP MEOHP MEP MiBP MnBP X5cxMEPP
## 46 46 46 46 46 46
## Co PFHxS PFNA PFOA PFOS X7OHMMeOP
## 47 48 48 48 48 49
## Pb X2cxMMHP BDE100 BDE138 BDE153 BDE154
## 59 64 76 76 76 76
## BDE17 BDE183 BDE190 BDE209 BDE28 BDE47
## 76 76 76 76 76 76
## BDE66 BDE71 BDE85 BDE99
## 76 76 76 76
## whistling_chest flu rhinitis wheezing sex
## 0 0 0 0 0
## age cbmi blood_pre birthdate
## 0 0 2 3
Alternatively to tableMissings
, the function
plotMissings
draw a bar plot with the percentage of missing
data in each exposure of phenotype.
Most of the test done in exposome analysis requires that the
exposures must follow a normal distribution. The function
normalityTest
performs a test on each exposure for
normality behaviour. The result is a data.frame
with the
exposures’ names, a flag TRUE
/FALSE
for
normality and the p-value obtained from the Shapiro-Wilk Normality
Test (if the p-value is under the threshold, then the exposure is
not normal).
##
## FALSE TRUE
## 55 29
So, the exposures that do not follow a normal distribution are:
## [1] "DDT" "PM10SI" "PM25K" "PM25Sl" "PCB118" "Tl"
## [7] "PM10V" "PM25Zn" "PM25FE" "PM10K" "BDE17" "PM25"
## [13] "PMcoarse" "PM10" "BPA" "Green" "NO2" "Cs"
## [19] "PFNA" "PCB153" "PM25CU" "MEOHP" "Cu" "HCB"
## [25] "MEHHP" "DDE" "BDE190" "bHCH" "PM10Zn" "MnBP"
## [31] "NO" "NOx" "PM10S" "MEHP" "PCB138" "Zn"
## [37] "X2cxMMHP" "PCB180" "PFOA" "PFHxS" "Cotinine" "PM25S"
## [43] "Co" "Conn" "PM25Ni" "PM10Ni" "Cd" "Dens"
## [49] "Se" "X5cxMEPP" "BDE183" "BDE28" "Sb" "BDE138"
## [55] "PM25V"
Some of these exposures are categorical so they must not follow a
normal distribution. This is the case, for example, of
G_pesticides
. If we plot the histogram of the values of the
exposures it will make clear:
Some others exposures are continuous variables that do not overpass the normality test. A visual inspection is required in this case.
If the exposures were following an anon normal distribution, the
method plotHistogram
has an argument
show.trans
that set to TRUE
draws the
histogram of the exposure plus three typical transformations:
The imputation process is out of rexposome
scope.
Nevertheless, rexposome
incorporates a wrapper to run the
imputation tools from the R packages and Hmisc
. The
imputation of the exposures in the ExposomeSet
is done by
using this code:
To use mice
package instead of hmisc
, see
the vignette entitles Dealing with Multiple Imputations.
We can get a snapshot of the behaviour of the full exposome
using the method plotFamily
or its wrapper
plot
. This function allows drawing a plot of a given family
of exposures or a mosaic with all the exposures.
This plotting method allows to group the exposure by a given
phenotype using the argument group
:
The same method allows to include a second group using the argument
group2
:
To properly perform a PCA analysis the exposures needs to be
standardised. The standardisation is done using function
standardize
that allows using a normal and a
robust approaches or use the interquartile range. The
normal aproache scales the exposures using the mean as a centre
and the standard variation used as dispersion. In robust
aproach the median and the median absolute deviation are used. This
transformation are only applied to continuous exposures. When
interquartile range is used, the median is used as a center and
the coeficient between the interquartile range of the exposure and the
normal range between the percentile 75 and 25 as variance.
## Object of class 'ExposomeSet' (storageMode: environment)
## . exposures description:
## . categorical: 4
## . continuous: 84
## . exposures transformation:
## . categorical: 0
## . transformed: 0
## . standardized: 84
## . imputed: 88
## . assayData: 88 exposures 109 individuals
## . element names: exp
## . exposures: AbsPM25, ..., Zn
## . individuals: id001, ..., id108
## . phenoData: 109 individuals 9 phenotypes
## . individuals: id001, ..., id108
## . phenotypes: whistling_chest, ..., cbmi
## . featureData: 88 exposures 7 explanations
## . exposures: AbsPM25, ..., Zn
## . descriptions: Family, ..., .imp
## experimentData: use 'experimentData(object)'
## Annotation:
Once the exposures are standardized we can run a PCA on the
ExposomeSet
using the method pca
. Typically,
exposome datasets contain both numerical and categorical variables, for
that reason, a Factor Analysis of Mixed Data (FAMD) is performed by
default rather than a PCA (which only uses numerical variables). To
perform a regular PCA, provide the argument pca = TRUE
to
the pca
function.
The method pca
returns an object of class
ExposomePCA
. This object encapsulates all the information
generated by the principal component analysis. The method
plotPCA
can be used in several ways. The first way is
setting the argument set
to "all"
to create a
mosaic of plots.
The plots in the first row correspond to the exposures and samples space. The first plot shows all the exposures on the axis for the first and the second principal components. The second plot shows all the individuals on the axis for the first and second principal components.
The plots on the second row are a summary of the variability explained by each component. The first plot is a bar plot with the variability explained by each component highlighting the components that are being drawn in the two first plots. The second plot is a line plot indicating the cumulative variability explained until each principal component. The vertical dashed line indicates the last principal component that is drawn in the first two plots. The horizontal dashed line indicates the amount of explained variability.
A second way of using plotPCA
is changing the content of
the argument set
to "samples"
to see the
samples’ space. When the set
argument is filled with
samples
, the argument phenotype
can be used to
colour each sample with its phenotype value.
This plot shows the sample space of the first and the second
principal component. Each dot is a sample and it is coloured depending
on its value in sex
. We can see that no cluster is done in
terms of sex.
This view be recreated in a 3D space using the function
plot3PCA
:
The correlation between exposures, in terms of intra-family and
inter-family exposures, is interesting to take into account. The
correlation of the exposome can be computed using
correlation
.
The values of the correlation can be obtained using the method
extract
. This returns a data.frame
.
## AbsPM25 As BDE100 BDE138
## AbsPM25 1.00000000 -0.1681446 0.1098427 -0.03196037
## As -0.16814460 1.0000000 -0.3727296 0.13768481
## BDE100 0.10984268 -0.3727296 1.0000000 0.11482844
## BDE138 -0.03196037 0.1376848 0.1148284 1.00000000
The best option to see the inter-family correlations is the
circos of correlations while the matrix of
correlations is a better way for studying the intra-family
correlations. Both of them are drawn using the method
plotCorrelation
.
Clustering analysis on exposures data results in exposure profile.
The method clustering
allows applying most of any
clustering method to an ExposomeSet
method.
The argument of the method clustering
are:
## function (object, method, cmethod, ..., warnings = TRUE)
## NULL
The argument method
is filled with the clustering
function. This clustering function needs to accept an
argument called data
, that will be filled with the
exposures-matrix. The object obtained from the clustering
function needs to have an accessor called
classification
. Otherwise the argument cmethod
needs to be filled with a function that takes the results of the
clustering function and returns a vector with the
classification of each individual.
In this analysis we apply the clustering method hclust
.
Hence we create a function to accept an argument called
data
.
The argument ...
allows passing arguments from
recposome
’s clustering
method to
hclust
.
Then, a function to obtain the classification of each sample is also
required. This function will use the cutree
function to
obtain the labels.
The new function hclust_k3
is a function that takes the
results of hclust_data
and applies it the
cutree
function, requesting 3 groups of individuals.
Having both clustering function (hclust_data
)
and the classification function (hclust_k3
) we use
them in the clustering
method:
## Warning in clustering(exp, method = hclust_data, cmethod = hclust_k3): Non
## continuous exposures will be discarded.
## Object of class 'ExposomeClust' (storageMode: environment)
## . Method: .... TRUE
## . assayData: 84 exposures 109 samples
## . element names: exp
## . exposures: AbsPM25, ..., Zn
## . samples: id001, ..., id108
## . featureData: 84 exposures 7 explanations
## . exposures: AbsPM25, ..., Zn
## . descriptions: Family, ..., .imp
## . #Cluster: 3
The profile for each group of individuals can be plotted with
plotClassification
method.
The classification of each individual can be obtained using the
method classification
. We can get a table with the number
of samples per group with:
##
## 1 2 3
## 80 24 5
As seen, the groups are given as numbers and the
plotClassification
transforms it to names (Group
1, Group 2 and Group 3).
Once preprocessed the exposome its association with health outcomes can be tested through three different approaches:
From the results of the PCA on the exposome data, two measures can be obtained: the correlation of the exposures with the principal components and the association of the phenotypes with the principal components.
The method plotEXP
draws a heat map with the correlation
of each exposure to the principal components.
From the plot, some conclusions can be obtained:
These conclusions are useful to give a meaning to the Principal Components in terms of exposures.
The method plotPHE
test the association between the
phenotypes and the principal components and draws a heat map with the
score of the association.
The conclusions that can be taken from the heat map are:
Method exwas
performs univariate test of the association
between exposures and health outcomes. This method requests a
formula
to test and the family of the distribution of the
health outcome (dependent variable). The models fitted are:
phenotype ~ exposure_1 + covar1 + ... + covarN
phenotype ~ exposure_2 + covar1 + ... + covarN
phenotype ~ exposure_3 + covar1 + ... + covarN
...
phenotype ~ exposure_M + covar1 + ... + covarN
The following line performs an ExWAS on flu and wheezing adjusted by
sex and age. Since the content of flu
and others in the
ExposomeSet
is dichotomous, the family
is set
to binomial (for more information see ?glm
).
## An object of class 'ExWAS'
##
## ~ blood_pre sex + age
##
## Tested exposures: 88
## Threshold for effective tests (TEF): 1.28e-03
## . Tests < TEF: 2
## Robust standar errors: Computed
## An object of class 'ExWAS'
##
## ~ wheezing sex + age
##
## Tested exposures: 88
## Threshold for effective tests (TEF): 1.28e-03
## . Tests < TEF: 0
## Robust standar errors: Computed
The method exwas
calculates the effective number of
tests in base of the correlation between the exposures. This is
transformed into a threshold for the p-values of the
association. This threshold can be obtained using the method
tef
.
A table with the associations between the exposures and
flu
is obtained with method extract
:
## DataFrame with 6 rows and 4 columns
## pvalue effect X2.5 X97.5
## <numeric> <numeric> <numeric> <numeric>
## NOx 8.47991e-06 13.3616 7.77552 18.9477
## NO 1.11507e-05 10.9207 6.28755 15.5539
## NO2 1.16036e-05 15.4009 8.85282 21.9489
## AbsPM25 1.66436e-05 20.2280 11.45428 29.0018
## PM25 3.71920e-05 37.7805 20.60678 54.9543
## PM25CU 1.35029e-04 11.5171 5.82592 17.2082
A Manhattan-like plot with the p-values of the association between
each exposure and asthma, coloured by families of exposures, is draw by
method plotExwas
.
clr <- rainbow(length(familyNames(exp)))
names(clr) <- familyNames(exp)
plotExwas(fl_ew, we_ew, color = clr) + ggtitle("Exposome Association Study - Univariate Approach")
Then a plot for the effects of a given model can be obtained with
plotEffect
:
No direct method is implemented to perform a stratified exposome wide
analysis, however it may be of interest for some researchers, so a small
code sample is provided to perform such studies. On this example a
stratified analysis using the sex
phenotype as stratifying
variable is used, the formula associates the exposures to the phenotype
cbmi
with no covariates.
strat_variable <- "sex"
formula <- cbmi ~ 1
family <- "gaussian"
strat_ex <- lapply(levels(as.factor(pData(exp)[[strat_variable]])), function(i) {
mask <- pData(exp)[[strat_variable]] == i
exwas_i <- rexposome::exwas(exp[, mask], formula = formula, family = family,
tef = FALSE)
exwas_i@formula <- update.formula(exwas_i@formula, as.formula(paste0("~ . + strata(",
strat_variable, "_", gsub("[[:space:]]|-|+|(|)", "", i), ")")))
return(exwas_i)
})
We have created a list of ExWAS objects that we can plot together using the following:
Method invExWAS
performs a similar association test
between exposures and health outcomes. The method asks for a
formula
to indicate the health outcome and covariables of
interest for the association test. The difference to the regular ExWAS
is on the model fitted, which on the inverse ExWAS is:
exposure_1 ~ phenotype + covar1 + ... + covarN
exposure_2 ~ phenotype + covar1 + ... + covarN
exposure_3 ~ phenotype + covar1 + ... + covarN
...
exposure_M ~ phenotype + covar1 + ... + covarN
Since not all exposures on a dataset have to be of the same family,
linear models (lm
) are fitted for numerical exposures,
while multinomial log-linear models (nnet::multinom
) are
fitted for categorical variables.
The following examples perforns an inverse ExWAS analysis on flu adjusted by sex, note that no left hand side term needs to be supplied, since the left term will be all the exposures on the dataset, if a left han side term is supplied it will be ignored.
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## Warning in stats::chisq.test(...): Chi-squared approximation may be incorrect
## An object of class 'ExWAS'
##
## ~ expositions flu + sex
##
## Tested exposures: 88
## Threshold for effective tests (TEF): 1.28e-03
## . Tests < TEF: 0
## Robust standar errors: Computed
Since the object returned by invExWAS
is of class
ExWAS
, all the previous manipulation explained can be used
on it. Following the example we can extract the table of results:
## DataFrame with 6 rows and 4 columns
## pvalue effect X2.5 X97.5
## <numeric> <numeric> <numeric> <numeric>
## As 0.00151377 0.23157105 0.0519541 0.4111880
## BDE28 0.01576420 -0.01815712 -0.1025770 0.0662627
## PFNA 0.01788771 -0.08982500 -0.1637145 -0.0159355
## BDE66 0.04975334 -0.00946186 -0.0844738 0.0655500
## BDE154 0.05357511 0.09111855 0.0162881 0.1659490
## BDE71 0.07885360 0.00156862 -0.1044215 0.1075588
And we can also use the visualization methods:
The last approach is a multivariate analysis in order to find the
group of exposures related to the health outcome. This can be done using
methods like Elastic Net. The method mexwas
applies elastic
net to the exposures given a health outcome of interest.
bl_mew <- mexwas(exp_std, phenotype = "blood_pre", family = "gaussian")
we_mew <- mexwas(exp_std, phenotype = "wheezing", family = "binomial")
The coefficient of each exposure is plotted with
plotExwas
. The method draws a heat map with two columns and
the exposures as rows. The heat map is coloured with the coefficient of
each exposure in relation with the health outcome, so the ones in white
are not interesting. The two columns of the heat map correspond to the
minimum lambda (Min
) and to the lambda which gives the most
regularised model such that error is within one standard error of the
minimum (1SE
).
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.1 rexposome_1.29.0 Biobase_2.67.0
## [4] BiocGenerics_0.53.3 generics_0.1.3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 gridExtra_2.3 formatR_1.14
## [4] sandwich_3.1-1 rlang_1.1.4 magrittr_2.0.3
## [7] ggridges_0.5.6 compiler_4.4.2 reshape2_1.4.4
## [10] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
## [13] shape_1.4.6.1 fastmap_1.2.0 backports_1.5.0
## [16] labeling_0.4.3 caTools_1.18.3 rmarkdown_2.29
## [19] nloptr_2.1.1 xfun_0.49 glmnet_4.1-8
## [22] cachem_1.1.0 jsonlite_1.8.9 flashClust_1.01-2
## [25] gmm_1.8 pryr_0.1.6 cluster_2.1.8
## [28] R6_2.5.1 RColorBrewer_1.1-3 bslib_0.8.0
## [31] stringi_1.8.4 boot_1.3-31 rpart_4.1.23
## [34] jquerylib_0.1.4 estimability_1.5.1 Rcpp_1.0.13-1
## [37] iterators_1.0.14 knitr_1.49 zoo_1.8-12
## [40] base64enc_0.1-3 Matrix_1.7-1 splines_4.4.2
## [43] nnet_7.3-19 tidyselect_1.2.1 rstudioapi_0.17.1
## [46] yaml_2.3.10 gplots_3.2.0 codetools_0.2-20
## [49] plyr_1.8.9 lattice_0.22-6 tibble_3.2.1
## [52] withr_3.0.2 evaluate_1.0.1 tmvtnorm_1.6
## [55] foreign_0.8-87 survival_3.8-3 norm_1.0-11.1
## [58] circlize_0.4.16 pillar_1.10.0 BiocManager_1.30.25
## [61] KernSmooth_2.23-24 corrplot_0.95 checkmate_2.3.2
## [64] DT_0.33 foreach_1.5.2 stats4_4.4.2
## [67] S4Vectors_0.45.2 munsell_0.5.1 scales_1.3.0
## [70] minqa_1.2.8 gtools_3.9.5 leaps_3.2
## [73] glue_1.8.0 emmeans_1.10.6 scatterplot3d_0.3-44
## [76] Hmisc_5.2-1 maketools_1.3.1 tools_4.4.2
## [79] lsr_0.5.2 sys_3.4.3 data.table_1.16.4
## [82] lme4_1.1-35.5 imputeLCMD_2.1 buildtools_1.0.0
## [85] mvtnorm_1.3-2 grid_4.4.2 impute_1.81.0
## [88] colorspace_2.1-1 nlme_3.1-166 htmlTable_2.4.3
## [91] Formula_1.2-5 cli_3.6.3 dplyr_1.1.4
## [94] pcaMethods_1.99.0 gtable_0.3.6 sass_0.4.9
## [97] digest_0.6.37 ggrepel_0.9.6 farver_2.1.2
## [100] FactoMineR_2.11 htmlwidgets_1.6.4 htmltools_0.5.8.1
## [103] lifecycle_1.0.4 multcompView_0.1-10 GlobalOptions_0.1.2
## [106] MASS_7.3-61