The pedigree routines came out of a simple need – to quickly draw a pedigree structure on the screen, within R, that was “good enough” to help with debugging the actual routines of interest, which were those for fitting mixed effects Cox models to large family data. As such, the routine had compactness and automation as primary goals; complete annotation (monozygous twins, multiple types of affected status) and most certainly elegance were not on the list. Other software could do that much better.
It therefore came as a major surprise when these routines proved useful to others. Through their constant feedback, application to more complex pedigrees, and ongoing requests for one more feature, the routine has become what it is today. This routine is still not suitable for really large pedigrees, nor for heavily inbred ones such as in animal studies, and will likely not evolve in that way. The authors' fondest hope is that others will pick up the project.
The Pedigree function is the first step, creating an object of class Pedigree. It accepts the following input:

ped_df
    A dataframe containing the columns:
    indId
        A numeric or character vector of subject identifiers.
    fatherId
        The identifier of the father.
    motherId
        The identifier of the mother.
    gender
        The gender of the individual. This can be a numeric variable with codes of 1=male, 2=female, 3=unknown, 4=terminated, or NA=unknown. A character or factor variable can also be supplied containing the above; the string may be truncated and of arbitrary case.
    available
        Optional, a numeric variable with 0=unavailable and 1=available.
    affected
        Optional, a numeric variable with 0=unaffected and 1=affected.
    status
        Optional, a numeric variable with 0=alive and 1=dead.
    famid
        Optional, a numeric or character vector of family identifiers.
    steril
        Optional, a numeric variable with 0=not sterile and 1=sterile.
rel_df
    Optional, a data frame with three or four columns:
    indId1
        Identifier of the first subject of each pair.
    indId2
        Identifier of the second subject of each pair.
    code
        Relationship code: 1=Monozygotic twin, 2=Dizygotic twin, 3=Twin of unknown zygosity, 4=Spouse.
    famid
        Optional, a numeric or character vector of family identifiers.
cols_ren_ped
    Optional, a named list for the renaming of the ped_df dataframe columns.
cols_ren_rel
    Optional, a named list for the renaming of the rel_df dataframe columns.
normalize
    Optional, a logical indicating whether the data should be normalised.
hints
    Optional, a list containing the horder in which to plot the individuals and the matrix of the spouses.

Note that a factor variable is not listed as one of the choices for the subject identifier. This is on purpose. Factors were designed to accommodate character strings whose values came from a limited class – things like race or gender – and are not appropriate for a subject identifier. All of their special properties as compared to a character variable turn out to be backwards for this case, in particular the memory of the original level set when subscripting is done.
However, because of the awful decision early on in S to automatically turn every character into a factor (unless you stood at the door with a club to head the package off), most users have become accustomed to using factors for every character variable.
(I encourage you to set the global option stringsAsFactors = FALSE to turn off autoconversion – it will measurably improve your R experience; since R 4.0.0 this is the default.)
Therefore, to avoid unnecessary hassle for our users the code will accept a factor as input for the id variables, but the final structure does not retain it. Gender and relation do become factors. Status follows the pattern of the survival routines and remains an integer.
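The "memory of the original level set" mentioned above is easy to see in base R. The short sketch below uses made-up identifiers and is independent of the package; it only illustrates why a character vector is the safer container for subject identifiers.

```r
# Subject identifiers stored as a factor (illustrative values)
ids <- factor(c("101", "102", "201"))

# Subsetting keeps the full original level set
kept <- ids[ids != "201"]
levels(kept)
table(kept)  # "201" is still tabulated, with a count of 0

# Converting to character drops the level bookkeeping
kept_chr <- as.character(kept)
unique(kept_chr)
```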
Based on the dataframes given for ped_df and rel_df and their corresponding named lists, the columns are renamed so that they can be used correctly. The renaming is done as follows:
rel_df <- data.frame(
    indId1 = c("110", "204"),
    indId2 = c("112", "205"),
    code = c(1, 2),
    family = c("1", "2")
)
cols_ren_rel <- list(
    id1 = "indId1",
    id2 = "indId2",
    famid = "family"
)
## Rename columns rel
old_cols <- as.vector(unlist(cols_ren_rel))
new_cols <- names(cols_ren_rel)
cols_to_ren <- match(old_cols, names(rel_df))
names(rel_df)[cols_to_ren[!is.na(cols_to_ren)]] <-
    new_cols[!is.na(cols_to_ren)]
print(rel_df)
## id1 id2 code famid
## 1 110 112 1 1
## 2 204 205 2 2
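The same renaming logic can be wrapped in a small helper and applied to a pedigree dataframe. This is only a sketch: `rename_cols()` is a hypothetical helper, not a function of the package, and the column names `subject`, `father`, `mother` and `family` are invented for the illustration. As in the example above, the names of the list are the new column names and its values are the existing ones; entries that match no column are skipped.

```r
# Hypothetical helper wrapping the renaming steps shown above
rename_cols <- function(df, cols_ren) {
    old_cols <- as.vector(unlist(cols_ren))
    new_cols <- names(cols_ren)
    pos <- match(old_cols, names(df))
    found <- !is.na(pos)
    # Only rename the columns that are actually present
    names(df)[pos[found]] <- new_cols[found]
    df
}

# Illustrative pedigree dataframe with non-standard column names
ped_df <- data.frame(
    subject = c("101", "102", "109"),
    father = c(NA, NA, "101"),
    mother = c(NA, NA, "102"),
    gender = c(1, 2, 2)
)
cols_ren_ped <- list(
    id = "subject",
    dadid = "father",
    momid = "mother",
    sex = "gender",
    famid = "family"  # absent from ped_df, so it is ignored
)
ped_renamed <- rename_cols(ped_df, cols_ren_ped)
names(ped_renamed)
```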
If the normalisation process is selected (normalize = TRUE), then both dataframes will be checked by their dedicated normalisation function. This ensures that all modalities are written correctly and set up the right way. If a famid column is present in the dataframe, then it will be prepended to the id of each individual, separated by an underscore ("_"), to ensure the uniqueness of the individuals' identifiers.
## sex id avail
## Min. :1.000 Length:55 Min. :0.0000
## 1st Qu.:1.000 Class :character 1st Qu.:0.0000
## Median :1.000 Mode :character Median :0.0000
## Mean :1.491 Mean :0.4364
## 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :2.000 Max. :1.0000
## sex id avail
## male :28 Length:55 Mode :logical
## female :27 Class :character FALSE:31
## unknown : 0 Mode :character TRUE :24
## terminated: 0
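The two effects visible in the summaries above (numeric sex codes becoming a labelled factor, and the family identifier being prepended to the individual identifiers) can be imitated in base R. The sketch below is an approximation under stated assumptions, not the package's normalisation function: `add_famid()` is a hypothetical helper and the data are invented.

```r
ped_df <- data.frame(
    id = c("110", "111", "112"),
    dadid = c(NA, NA, "110"),
    momid = c(NA, NA, "111"),
    sex = c(1, 2, NA),
    famid = c("1", "1", "1"),
    stringsAsFactors = FALSE
)

# Map the numeric codes to labels; NA is treated as unknown
sex_levels <- c("male", "female", "unknown", "terminated")
codes <- ped_df$sex
codes[is.na(codes)] <- 3
ped_df$sex <- factor(sex_levels[codes], levels = sex_levels)

# Hypothetical helper: prefix identifiers with the family id, keeping NA as NA
add_famid <- function(ids, famid) {
    ifelse(is.na(ids), NA_character_, paste(famid, ids, sep = "_"))
}
for (col in c("id", "dadid", "momid")) {
    ped_df[[col]] <- add_famid(ped_df[[col]], ped_df$famid)
}
ped_df
```

Parents' identifiers have to be prefixed in the same way as the individuals' own identifiers, otherwise the links between rows would be lost.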
If any error is detected after the normalisation process, the normalised dataframe is returned to the user with an error column added describing the problems encountered.
## Warning in .local(obj, ...): The relationship informations are not valid. Here is the normalised
## relationship informations with the identified problems
## id1 id2 code famid error
## 1 1_110 1_112 MZ twin 1 <NA>
## 2 2_204 2_205 <NA> 2 CodeNotRecognise
Now that the data for the Pedigree object creation are ready, they are given to a new Pedigree object, triggering the validation process.
This validation step will check for many errors, such as:

    id
    momid and dadid are present in id
    sex column only contains “male”, “female”, “unknown” or “terminated” values
    steril, status, available, affected only contain 0, 1 or NA values

After validation an S4 object is generated. This new concept makes it possible to easily set up methods for this new type of object. The control of the parameters is also more precise.
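Checks of the kind listed above can be sketched in a few lines of base R. The function below, `check_ped()`, is a hypothetical stand-in written for this illustration; the package's own validation is more extensive and may be implemented differently.

```r
# Hypothetical validator returning a character vector of problems
check_ped <- function(df) {
    problems <- character(0)

    # Every non-missing parent must appear in the id column
    parents <- c(df$dadid, df$momid)
    unknown_parents <- setdiff(parents[!is.na(parents)], df$id)
    if (length(unknown_parents) > 0) {
        problems <- c(problems, paste(
            "parents not found in id:",
            paste(unknown_parents, collapse = ", ")
        ))
    }

    # sex is restricted to four labels
    allowed_sex <- c("male", "female", "unknown", "terminated")
    if (!all(as.character(df$sex) %in% allowed_sex)) {
        problems <- c(problems, "sex contains unexpected values")
    }

    # Indicator columns may only hold 0, 1 or NA
    flags <- intersect(c("steril", "status", "available", "affected"), names(df))
    for (col in flags) {
        if (!all(df[[col]] %in% c(0, 1, NA))) {
            problems <- c(problems, paste(col, "must only contain 0, 1 or NA"))
        }
    }
    problems
}

bad_ped <- data.frame(
    id = c("101", "102", "109"),
    dadid = c(NA, NA, "101"),
    momid = c(NA, NA, "999"),  # not present in id
    sex = c("male", "female", "female"),
    affected = c(0, 1, 2)      # 2 is not allowed
)
problems <- check_ped(bad_ped)
problems
```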
The Pedigree object contains 4 slots, each holding a different S4 object with a specific type of information used for the Pedigree construction.

ped
    A Ped object for the Pedigree information, with at least the following slots:
    id
        The identifiers of the individuals.
    dadid
        The identifiers of the fathers.
    momid
        The identifiers of the mothers.
    sex
        The gender of each individual.
rel
    A Rel object describing all special relationships between individuals that can’t be described in the ped slot. The minimal slots needed are:
    id1
        The identifiers of the 1st individuals.
    id2
        The identifiers of the 2nd individuals.
    code
        A factor describing the type of relationship (“MZ twin”, “DZ twin”, “UZ twin”, “Spouse”).
scales
    A Scales object with two slots:
    fill
        A dataframe describing which modalities in which columns correspond to an affected individual. Plotting information such as colour, angle and density is also provided.
    border
        A dataframe describing which modalities in which columns to use to plot the border of the plot elements.
hints
    A Hints object with two slots:
    horder
        A numeric vector for the ordering of the individuals when plotting.
    spouse
        A matrix of the spouses.

For more information on each object:
help(Ped)
help(Rel)
help(Scales)
help(Hints)
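For readers less familiar with S4, the following sketch shows how a class with typed slots and a validity function behaves. `ToyPed` is a toy class defined only for this illustration; it is not the package's Ped class and carries far fewer slots and checks.

```r
library(methods)

# Toy S4 class with typed slots and a validity function
setClass(
    "ToyPed",
    slots = c(
        id = "character",
        dadid = "character",
        momid = "character",
        sex = "factor"
    ),
    validity = function(object) {
        n <- length(object@id)
        lens <- c(length(object@dadid), length(object@momid), length(object@sex))
        if (any(lens != n)) {
            return("all slots must have the same length")
        }
        parents <- c(object@dadid, object@momid)
        if (!all(parents[!is.na(parents)] %in% object@id)) {
            return("dadid and momid must be present in id")
        }
        TRUE
    }
)

sex_levels <- c("male", "female", "unknown", "terminated")
toy <- new(
    "ToyPed",
    id = c("101", "102", "109"),
    dadid = c(NA, NA, "101"),
    momid = c(NA, NA, "102"),
    sex = factor(c("male", "female", "female"), levels = sex_levels)
)

# An unknown mother is rejected when the object is created
msg <- tryCatch(
    new(
        "ToyPed",
        id = c("101", "109"),
        dadid = c(NA, "101"),
        momid = c(NA, "999"),
        sex = factor(c("male", "female"), levels = sex_levels)
    ),
    error = function(e) conditionMessage(e)
)
msg
```

Because the validity function runs inside `new()`, an inconsistent object is never created, which is what makes the parameter checks stricter than with a plain list.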
As the Pedigree object is now an S4 class, we have made available a number of accessors. Most of them can be used as a getter or as a setter to modify a value in the corresponding slot of the object:

    ped(), rel(), scales(), hints()
    mcols()
    fill(), border()
    horder(), spouse()
    id(), dadid(), momid(), famid(), sex()
    affected(), avail(), status()
    isinf(), kin(), useful()
    mcols()
    id1(), id2(), code(), famid()
    fill(), border()
    horder(), spouse()
    mcols()
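An accessor that works both as a getter and as a setter is usually built from a pair of S4 generics. The sketch below shows the pattern on a toy class; `ToyHints` and `toy_horder()` are names invented for this illustration and do not belong to the package.

```r
library(methods)

# Toy class with a single slot
setClass("ToyHints", slots = c(horder = "numeric"))

# Getter
setGeneric("toy_horder", function(obj) standardGeneric("toy_horder"))
setMethod("toy_horder", "ToyHints", function(obj) obj@horder)

# Setter: a replacement function whose last argument must be named `value`
setGeneric("toy_horder<-", function(obj, value) standardGeneric("toy_horder<-"))
setMethod("toy_horder<-", "ToyHints", function(obj, value) {
    obj@horder <- value
    validObject(obj)  # re-check the object after modification
    obj
})

h <- new("ToyHints", horder = c(1, 2, 3))
before <- toy_horder(h)
toy_horder(h) <- c(3, 1, 2)
after <- toy_horder(h)
after
```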
The mcols() accessor is the one you should use to add more information about your individuals.
## DataFrame with 55 rows and 5 columns
## num error sterilisation vitalStatus affection_mods
## <integer> <character> <logical> <logical> <numeric>
## 1 2 NA NA NA 0
## 2 3 NA NA NA 1
## 3 2 NA NA NA 1
## 4 4 NA NA NA 0
## 5 6 NA NA NA NA
## ... ... ... ... ... ...
## 51 2 NA NA NA 0
## 52 1 NA NA NA 0
## 53 3 NA NA NA 0
## 54 2 NA NA NA 0
## 55 0 NA NA NA 1
## For example, add a new column that flags whether each individual's
## identifier is below or above a given threshold
mcols(ped)$idth <- ifelse(as.numeric(mcols(ped)$indId) < 200, "A", "B")
mcols(ped)$idth
## [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
## [25] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "B" "B"
## [49] "B" "B" "B" "B" "B" "B" "B"
This new S4 object comes with multiple methods that make it easier to use:
plot()
summary()
print()
show()
as.list()
[
shrink()
generate_colors()
is_informative()
kindepth()
kinship()
make_famid()
upd_famid()
num_child()
unrelated()
useful_inds()
## We can change the family name based on another column
ped <- upd_famid(ped, mcols(ped)$idth)
## We can subset a given family
ped_a <- ped[famid(ped(ped)) == "A"]
## Plot it
plot(ped_a, cex = 0.5)
## Pedigree object with
## [1] "Ped object with 41 individuals and 14 metadata columns"
## [1] "Rel object with 0 relationshipswith 0 MZ twin, 0 DZ twin, 0 UZ twin, 0 Spouse"
## $id
## [1] "A_101" "A_102" "A_103" "A_104" "A_105" "A_106" "A_107" "A_108" "A_109" "A_110" "A_111" "A_112"
## [13] "A_113" "A_114" "A_115" "A_116" "A_117" "A_118" "A_119" "A_120" "A_121" "A_122" "A_123" "A_124"
## [25] "A_125" "A_126" "A_127" "A_128" "A_129" "A_130" "A_131" "A_132" "A_133" "A_134" "A_135" "A_136"
## [37] "A_137" "A_138" "A_139" "A_140" "A_141"
##
## $dadid
## [1] NA NA "A_135" NA NA NA NA NA "A_101" "A_103" "A_103" "A_103"
## [13] NA "A_103" "A_105" "A_105" NA "A_105" "A_105" "A_107" "A_110" "A_110" "A_110" "A_110"
## [25] "A_112" "A_112" "A_114" "A_114" "A_117" "A_119" "A_119" "A_119" "A_119" "A_119" NA NA
## [37] NA "A_135" "A_137" "A_137" "A_137"
##
## $momid
## [1] NA NA "A_136" NA NA NA NA NA "A_102" "A_104" "A_104" "A_104"
## [13] NA "A_104" "A_106" "A_106" NA "A_106" "A_106" "A_108" "A_109" "A_109" "A_109" "A_109"
## [25] "A_118" "A_118" "A_115" "A_115" "A_116" "A_120" "A_120" "A_120" "A_120" "A_120" NA NA
## [37] NA "A_136" "A_138" "A_138" "A_138"
## Shrink it to keep only the necessary information
lst1_s <- shrink(ped_a, max_bits = 10)
plot(lst1_s$pedObj, cex = 0.5)
## 10 x 10 sparse Matrix of class "dsCMatrix"
## [[ suppressing 10 column names 'A_101', 'A_102', 'A_103' ... ]]
##
## A_101 0.50 . . . . . . . 0.25 .
## A_102 . 0.50 . . . . . . 0.25 .
## A_103 . . 0.50 . . . . . . 0.25
## A_104 . . . 0.50 . . . . . 0.25
## A_105 . . . . 0.5 . . . . .
## A_106 . . . . . 0.5 . . . .
## A_107 . . . . . . 0.5 . . .
## A_108 . . . . . . . 0.5 . .
## A_109 0.25 0.25 . . . . . . 0.50 .
## A_110 . . 0.25 0.25 . . . . . 0.50
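The sparse matrix above has the shape of a kinship matrix: 0.5 on the diagonal for non-inbred individuals and 0.25 between a parent and a child. As a rough illustration of where these values come from, the sketch below applies the classical recursive kinship rule to a six-person toy pedigree. `toy_kinship()` is a hypothetical function written for this example; it assumes that parents are listed before their children and that each individual has either both parents or none, and it is not the package's kinship() method.

```r
# Toy pedigree: two founder couples, each with one child
toy_ped <- data.frame(
    id = c("101", "102", "109", "103", "104", "110"),
    dadid = c(NA, NA, "101", NA, NA, "103"),
    momid = c(NA, NA, "102", NA, NA, "104"),
    stringsAsFactors = FALSE
)

# Hypothetical dense implementation of the recursive kinship rule
toy_kinship <- function(ped) {
    n <- nrow(ped)
    k <- matrix(0, n, n, dimnames = list(ped$id, ped$id))
    dad <- match(ped$dadid, ped$id)
    mom <- match(ped$momid, ped$id)
    for (i in seq_len(n)) {
        has_parents <- !is.na(dad[i]) && !is.na(mom[i])
        # Kinship with everyone listed earlier is the average over the two parents
        for (j in seq_len(i - 1)) {
            if (has_parents) {
                k[i, j] <- k[j, i] <- 0.5 * (k[j, dad[i]] + k[j, mom[i]])
            }
        }
        # Self-kinship is 0.5 for founders, plus half the parents' kinship otherwise
        k[i, i] <- if (has_parents) 0.5 * (1 + k[dad[i], mom[i]]) else 0.5
    }
    k
}

k <- toy_kinship(toy_ped)
k
```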
## Get the useful individuals
ped_a <- useful_inds(ped_a, informative = "AvAf")
as.data.frame(ped(ped_a))["useful"][1:10, ]
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Pedixplorer_1.3.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.48 bslib_0.8.0 ggplot2_3.5.1
## [5] htmlwidgets_1.6.4 lattice_0.22-6 quadprog_1.5-8 vctrs_0.6.5
## [9] tools_4.4.1 generics_0.1.3 stats4_4.4.1 tibble_3.2.1
## [13] fansi_1.0.6 highr_0.11 pkgconfig_2.0.3 Matrix_1.7-1
## [17] data.table_1.16.2 S4Vectors_0.43.2 readxl_1.4.3 lifecycle_1.0.4
## [21] compiler_4.4.1 stringr_1.5.1 shinytoastr_2.2.0 munsell_0.5.1
## [25] httpuv_1.6.15 shinyWidgets_0.8.7 htmltools_0.5.8.1 sys_3.4.3
## [29] buildtools_1.0.0 sass_0.4.9 yaml_2.3.10 lazyeval_0.2.2
## [33] plotly_4.10.4 later_1.3.2 pillar_1.9.0 jquerylib_0.1.4
## [37] tidyr_1.3.1 DT_0.33 cachem_1.1.0 mime_0.12
## [41] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4 colourpicker_1.3.0
## [45] dplyr_1.1.4 purrr_1.0.2 maketools_1.3.1 fastmap_1.2.0
## [49] grid_4.4.1 colorspace_2.1-1 cli_3.6.3 magrittr_2.0.3
## [53] utf8_1.2.4 withr_3.0.2 scales_1.3.0 promises_1.3.0
## [57] rmarkdown_2.28 httr_1.4.7 gridExtra_2.3 cellranger_1.1.0
## [61] shiny_1.9.1 evaluate_1.0.1 knitr_1.48 shinycssloaders_1.1.0
## [65] miniUI_0.1.1.1 viridisLite_0.4.2 rlang_1.1.4 Rcpp_1.0.13
## [69] xtable_1.8-4 glue_1.8.0 BiocManager_1.30.25 BiocGenerics_0.53.0
## [73] jsonlite_1.8.9 R6_2.5.1 plyr_1.8.9