Title: | Pedigree Functions |
---|---|
Description: | Routines to handle family data with a Pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. Also includes a tool for Pedigree drawing which is focused on producing compact layouts without intervention. Recent additions include utilities to trim the Pedigree object with various criteria, and kinship for the X chromosome. |
Authors: | Louis Le Nézet [aut, cre] |
Maintainer: | Louis Le Nézet <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.1.0 |
Built: | 2024-07-01 04:16:12 UTC |
Source: | https://github.com/bioc/Pedixplorer |
The Pedixplorer package for pedigree data is an updated version of the kinship2 package.
The kinship2 package was originally written by Terry Therneau and Jason Sinnwell.
The Pedixplorer package is a fork of the kinship2 package with additional functionality and bug fixes.
The package download, NEWS, and README for the previous version of the package (kinship2) are available on CRAN: https://cran.r-project.org/package=kinship2
Below are listed some of the most widely used functions available in Pedixplorer:
Pedigree(): Constructor of the Pedigree class, given identifiers, sex, affection status(es), and special relationships.
kinship(): Calculates the kinship matrix, i.e. the probability that alleles sampled from two individuals are identical by descent (IBD).
plot(): Method to transform a Pedigree object into a graphical plot. Allows extra information to be included in the id under the plot symbol. This method uses the ped_to_plotdf() function to transform the Pedigree object into a data frame of graphical elements; the same is done for the legend with the ped_to_legdf() function. When done, the data frames are plotted with the plot_fromdf() function.
shrink(): Shrink a Pedigree to a specific bit size, removing non-informative members first.
bit_size(): Calculates the bit size of a Pedigree; used by shrink().
sampleped(): Pedigree example data set with two pedigrees.
minnbreast(): Larger cohort of pedigrees from the MN breast cancer study.
Maintainer: Louis Le Nézet [email protected] (ORCID)
Authors:
Jason Sinnwell [email protected]
Terry Therneau
Other contributors:
Daniel Schaid [contributor]
Elizabeth Atkinson [contributor]
Louis Le Nezet [contributor]
Useful links:
Report bugs at https://github.com/LouisLeLezet/Pedixplorer/issues
library(Pedixplorer)
Given a Pedigree, this function creates helper matrices that describe the layout of a plot of the Pedigree.
## S4 method for signature 'Pedigree'
align(
    obj, packed = TRUE, width = 10, align = TRUE,
    hints = NULL, missid = "NA_character_"
)
obj |
A Pedigree object |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
width |
For a packed output, the minimum width of the plot, in inches. |
align |
For a packed Pedigree, align children under parents |
hints |
A Hints object or a named list containing |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
This is an internal routine, used almost exclusively by ped_to_plotdf().
The subservient functions auto_hint(), alignped1(), alignped2(), alignped3(), and alignped4() contain the bulk of the computation.
If the hints are missing, the auto_hint() routine is called to supply an initial guess.
If multiple families are present in the obj Pedigree, this routine is called once for each family, and the results are combined in the list returned.
For more information you can read the associated vignette: vignette("pedigree_alignment").
A list with components:
n: A vector giving the number of subjects on each horizontal level of the plot.
nid: A matrix with one row for each level, giving the numeric id of each subject plotted. (A value of 17 means the 17th subject in the Pedigree.)
pos: A matrix giving the horizontal position of each plot point.
fam: A matrix giving the family id of each plot point. A value of 3 would mean that the two subjects in positions 3 and 4, in the row above, are this subject's parents.
spouse: A matrix with values:
    0 = not a spouse
    1 = subject plotted to the immediate right is a spouse
    2 = subject plotted to the immediate right is an inbred spouse
twins: Optional matrix which will only be present if the Pedigree contains twins:
    0 = not a twin
    1 = sibling to the right is a monozygotic twin
    2 = sibling to the right is a dizygotic twin
    3 = sibling to the right is a twin of unknown zygosity
alignped1(), alignped2(), alignped3(), alignped4(), auto_hint()
data(sampleped)
ped <- Pedigree(sampleped)
align(ped)
First alignment routine, which creates the subtree founded on a single subject as though it were the only tree.
alignped1(idx, dadx, momx, level, horder, packed, spouselist)
idx |
Indexes of the subjects |
dadx |
Indexes of the fathers |
momx |
Indexes of the mothers |
level |
Vector of the level of each subject |
horder |
A named numeric vector with one element per subject in the Pedigree. It determines the relative horizontal order of subjects within a sibship, as well as the relative order of processing for the founder couples. (For this latter, the female founders are ordered as though they were sisters). The names of the vector should be the individual identifiers. |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
spouselist |
Matrix of spouses with 4 columns:
|
In this routine the nid array consists of the final nid array + 1/2 of the final spouse array.
Note that the spouselist matrix will only contain spouse pairs that are not yet processed. The logic for anchoring is slightly tricky. First, if col 4 of the spouselist matrix is 0, we anchor at the first opportunity. Also note that if spouselist[, 3] == spouselist[, 4] it is the husband who is the anchor (just write out the possibilities).
Create the set of 3 return structures, which will be matrices with 1 + nspouse columns. If there are children then other routines will widen the result.
The two complementary lists, lspouse and rspouse, denote the spouses plotted on the left and on the right. For someone with lots of spouses we try to split them evenly. If the number of spouses is odd, then men should have more on the right than on the left, women more on the left. Any hints in the spouselist matrix override.
We put the undecided marriages closest to idx, then add predetermined ones to the left and right. The majority of marriages will be undetermined singletons, for which nleft will be 1 for female (put my husband to the left) and 0 for male. In one bug found by plotting canine data, lspouse could initially be empty but length(rspouse) > 1. This caused nleft > length(indx). The fix was to prevent indx from being indexed beyond its length (fixed by JPS, 5/2013).
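The left/right split of undecided spouses can be illustrated with a few lines of base R. This is a minimal sketch, not the package's code: the helper split_spouses() and its arguments are hypothetical, hints from the spouselist matrix are ignored, and the rounding rule is an assumption chosen so that a single undecided marriage gives nleft = 1 for a female and nleft = 0 for a male, as stated above.

```r
# Hypothetical helper (not part of Pedixplorer): decide how many undecided
# spouses go to the left and to the right of a subject.
split_spouses <- function(n_spouse, is_female) {
    # Assumption: with an odd count, a woman gets the extra spouse on the
    # left and a man gets it on the right
    nleft <- floor((n_spouse + as.integer(is_female)) / 2)
    list(nleft = nleft, nright = n_spouse - nleft)
}

split_spouses(1, is_female = TRUE)   # husband placed on the left
split_spouses(1, is_female = FALSE)  # wife placed on the right
split_spouses(3, is_female = FALSE)  # one on the left, two on the right
```

With an even number of spouses the split is the same for both sexes.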
For each spouse get the list of children. If there are any we call alignped2() to generate their tree and then mark the connection to their parent. If multiple marriages have children we need to join the trees.
To finish up we need to splice together the tree made up from all the kids, which only has data from lev + 1 down, with the data here.
There are 3 cases:
No children were found.
The tree below is wider than the tree here, in which case we add the data from this level onto theirs.
The tree below is narrower, for instance an only child.
A list containing the elements to plot the Pedigree. It contains a set of matrices along with the spouselist matrix. The latter has marriages removed as they are processed.
n: A vector giving the number of subjects on each horizontal level of the plot.
nid: A matrix with one row for each level, giving the numeric id of each subject plotted. (A value of 17 means the 17th subject in the Pedigree.)
pos: A matrix giving the horizontal position of each plot point.
fam: A matrix giving the family id of each plot point. A value of 3 would mean that the two subjects in positions 3 and 4, in the row above, are this subject's parents.
spouselist: Spouse matrix with anchor information.
data(sampleped)
ped <- Pedigree(sampleped)
align(ped)
Second of the four co-routines which takes a collection of siblings, grows the tree for each, and appends them side by side into a single tree.
alignped2(idx, dadx, momx, level, horder, packed, spouselist)
idx |
Indexes of the subjects |
dadx |
Indexes of the fathers |
momx |
Indexes of the mothers |
level |
Vector of the level of each subject |
horder |
A named numeric vector with one element per subject in the Pedigree. It determines the relative horizontal order of subjects within a sibship, as well as the relative order of processing for the founder couples. (For this latter, the female founders are ordered as though they were sisters). The names of the vector should be the individual identifiers. |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
spouselist |
Matrix of spouses with 4 columns:
|
The input arguments are the same as those to alignped1(), with the exception that idx will be a vector. This routine does nothing to the spouselist matrix, but needs to pass it down the tree and back since one of the routines called by alignped2() might change the matrix.
The code has one non-obvious special case. Suppose that two sibs marry. When the first sib is processed by alignped1(), both partners (and any children) will be added to the rval structure. When the second sib is processed they will come back as a 1 element tree (the marriage will no longer be on the spouselist), which should not be added onto rval. The rule thus is to not add any 1 element tree whose value (which must be idx[i]) is already in the rval structure for this level.
A list containing the elements to plot the Pedigree. It contains a set of matrices along with the spouselist matrix. The latter has marriages removed as they are processed.
n: A vector giving the number of subjects on each horizontal level of the plot.
nid: A matrix with one row for each level, giving the numeric id of each subject plotted. (A value of 17 means the 17th subject in the Pedigree.)
pos: A matrix giving the horizontal position of each plot point.
fam: A matrix giving the family id of each plot point. A value of 3 would mean that the two subjects in positions 3 and 4, in the row above, are this subject's parents.
spouselist: Spouse matrix with anchor information.
data(sampleped)
ped <- Pedigree(sampleped)
align(ped)
Third of the four co-routines, which merges two Pedigree trees that are side by side into a single object.
alignped3(alt1, alt2, packed, space = 1)
alt1 |
Alignment of the first tree |
alt2 |
Alignment of the second tree |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
space |
Space between two subjects |
The primary special case is when the rightmost person in the left tree is the same as the leftmost person in the right tree; we need not plot two copies of the same person side by side. (When initializing the output structures do not worry about this; there is no harm if they are a column bigger than finally needed.) Beyond that the work is simple bookkeeping.
For the unpacked case, which is the traditional way to draw a Pedigree when we can assume the paper is infinitely wide, all parents are centered over their children. In this case we think of the two trees to be merged as solid blocks. On input they both have a left margin of 0. Compute how far over we have to slide the right tree.
Now merge the two trees. Start at the top level and work down.
A list containing the elements to plot the Pedigree. It contains a set of matrices along with the spouselist matrix. The latter has marriages removed as they are processed.
n: A vector giving the number of subjects on each horizontal level of the plot.
nid: A matrix with one row for each level, giving the numeric id of each subject plotted. (A value of 17 means the 17th subject in the Pedigree.)
pos: A matrix giving the horizontal position of each plot point.
fam: A matrix giving the family id of each plot point. A value of 3 would mean that the two subjects in positions 3 and 4, in the row above, are this subject's parents.
spouselist: Spouse matrix with anchor information.
data(sampleped)
ped <- Pedigree(sampleped)
align(ped)
Last routine, which attempts to line up children under parents and put spouses and siblings "close" to each other, to the extent possible within the constraints of page width.
alignped4(rval, spouse, level, width, align)
rval |
A list with components |
spouse |
A boolean matrix with one row per level representing if the subject is a spouse or not. |
level |
Vector of the level of each subject |
width |
For a packed output, the minimum width of the plot, in inches. |
align |
For a packed Pedigree, align children under parents |
The alignped4() routine is the final step of alignment. The current code does the necessary setup and then calls the quadprog::solve.QP() function.
There are two important parameters for the function:
The maximum width specified. The smallest possible width is the maximum number of subjects on a line; if the user suggestion is too low it is increased to 1 + that amount (to give just a little wiggle room).
The align vector of 2 alignment parameters a and b.
For each set of siblings x with parents at p_1 and p_2 the alignment penalty is:
$$\frac{1}{k^a} \sum_{i=1}^{k} \left(x_i - \frac{p_1 + p_2}{2}\right)^2$$
where k is the number of siblings in the set.
Using the fact that
$$\sum_{i=1}^{k} (x_i - c)^2 = \sum_{i=1}^{k} (x_i - \mu)^2 + k (c - \mu)^2$$
(with \mu the mean of the x_i), when a = 1 moving a sibship with k sibs one unit to the left or right of optimal will incur the same cost as moving one with only 1 or two sibs out of place.
If a = 0 then large sibships are harder to move than small ones; with the default value a = 1.5 they are slightly easier to move than small ones.
The rationale for the default is that as long as the parents are somewhere between the first and last siblings the result looks fairly good, so we are more flexible with the spacing of a large family.
By tethering all the sibs to a single spot they tend to be kept close to each other.
The alignment penalty for spouses is $b (x_1 - x_2)^2$, which tends to keep them together. The size of b controls the relative importance of sib-parent and spouse-spouse closeness.
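The effect of the parameter a can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the sibling penalty has the form (1 / k^a) * sum((x_i - (p_1 + p_2) / 2)^2) and the spouse penalty the form b * (x_1 - x_2)^2. It computes the extra cost of sliding a whole sibship one unit away from its optimal position.

```r
# Hypothetical helpers (not exported by Pedixplorer)
# Assumed sibling penalty: (1 / k^a) * sum((x - midpoint of parents)^2)
sib_penalty <- function(x, p1, p2, a = 1.5) {
    k <- length(x)
    sum((x - (p1 + p2) / 2)^2) / k^a
}

# Assumed spouse penalty: b * (x1 - x2)^2
spouse_penalty <- function(x1, x2, b) {
    b * (x1 - x2)^2
}

# Extra cost of sliding a sibship one unit to the right of its optimum
shift_cost <- function(k, a) {
    x <- seq_len(k)   # k siblings, one unit apart
    mid <- mean(x)    # parents centred over the sibship
    sib_penalty(x + 1, mid, mid, a) - sib_penalty(x, mid, mid, a)
}

shift_cost(2, a = 1)    # same cost ...
shift_cost(8, a = 1)    # ... whatever the sibship size
shift_cost(8, a = 0)    # large sibships are expensive to move
shift_cost(8, a = 1.5)  # large sibships are cheaper to move
```

Under these assumptions the extra cost works out to k^(1 - a), which matches the three regimes described for a = 0, a = 1 and a = 1.5.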
We start by adding in these penalties. The total number of parameters in the alignment problem (what we hand to quadprog) is the set of sum(n) positions. A work array myid keeps track of the parameter number for each position so that it is easy to find. There is one extra penalty added at the end. Because the penalty amount would be the same if all the final positions were shifted by a constant, the penalty matrix will not be positive definite; solve.QP() does not like this. We add a tiny amount of leftward pull to the widest line.
If there are k subjects on a line there will be k+1 constraints for that line. The first point must be $\ge 0$, each subsequent one must be at least 1 unit to the right, and the final point must be $\le$ the max width.
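The k+1 constraints for one line can be written as a small inequality system A x >= b0. The following base R sketch builds that system for a single line; the helper line_constraints() is hypothetical, and it assumes the first bound is "at least 0" and the last is "at most the maximum width". It does not call quadprog; note that solve.QP() expects the transpose of such a matrix as its Amat argument.

```r
# Hypothetical helper: inequality constraints A %*% x >= b0 for one line
# holding k subjects, given the maximum plot width.
line_constraints <- function(k, width) {
    A <- matrix(0, nrow = k + 1, ncol = k)
    b0 <- numeric(k + 1)
    A[1, 1] <- 1                 # x_1 >= 0
    for (i in seq_len(k - 1)) {  # x_{i+1} - x_i >= 1
        A[i + 1, i] <- -1
        A[i + 1, i + 1] <- 1
        b0[i + 1] <- 1
    }
    A[k + 1, k] <- -1            # -x_k >= -width
    b0[k + 1] <- -width
    list(A = A, b0 = b0)
}

con <- line_constraints(k = 3, width = 10)
x <- c(0, 1, 2.5)            # a candidate set of positions
all(con$A %*% x >= con$b0)   # TRUE when x satisfies every constraint
```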
The updated position matrix
data(sampleped)
ped <- Pedigree(sampleped)
align(ped)
Anchor variable to ordered factor
anchor_to_factor(anchor)
anchor |
A character, factor or numeric vector corresponding to the anchor of the individuals. The following values are recognized:
|
An ordered factor vector containing the transformed variable "either" < "left" < "right"
anchor_to_factor(c(1, 2, 0, "left", "right", "either"))
Compute an initial guess for the alignment of a Pedigree
## S4 method for signature 'Pedigree'
auto_hint(obj, hints = NULL, packed = TRUE, align = FALSE, reset = FALSE)
obj |
A Pedigree object |
hints |
A Hints object or a named list containing |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
align |
For a packed Pedigree, align children under parents |
reset |
If |
A Pedigree structure can contain a Hints object which helps to reorder the Pedigree (e.g. left-to-right order of children within family) so as to plot with minimal distortion. This routine is used to create an initial version of the hints. They can then be modified if desired.
This routine would not normally be called by a user. It moves children within families, so that marriages are on the "edge" of a set of children, closest to the spouse. For pedigrees that have only a single connection between two families this simple-minded approach works surprisingly well. For more complex structures hand-tuning of the hints may be required.
When auto_hint() is called with a vector of numbers as the hints argument, the values for the founder females are used to order the founder families left to right across the plot.
The values within a sibship are used as the preliminary order of siblings within a family; this may be changed to move one of them to the edge so as to match up with a spouse. The actual values in the vector are not important, only their order.
The initial Hints object.
data(sampleped)
ped <- Pedigree(sampleped[sampleped$famid == 1, ])
auto_hint(ped)
When computer time is cheap, use this routine to get a best Pedigree alignment. This routine will try all possible founder orders, and return the one with the least stress.
## S4 method for signature 'Pedigree'
best_hint(obj, wt = c(1000, 10, 1), tolerance = 0)
obj |
A Pedigree object |
wt |
A vector of three weights for the three error measures.
Default is c(1000, 10, 1). |
tolerance |
The maximum stress level to accept.
Default is 0. |
The auto_hint() routine will rearrange sibling order, but not founder order.
This calls auto_hint() with every possible founder order, and finds the plot with the least "stress".
The stress is computed as a weighted sum of three error measures:
nbArcs: The number of duplicate individuals in the plot
lgArcs: The sum of the absolute values of the differences in the positions of duplicate individuals
lgParentsChilds: The sum of the absolute values of the differences between the center of the children and the parents
If, during the search, a plot is found with a stress level less than tolerance, the search is terminated.
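The weighted sum can be illustrated as follows. This is a sketch under stated assumptions, not the package's implementation: layout_stress() is a hypothetical helper, the three weights are assumed to apply to nbArcs, lgArcs and lgParentsChilds in that order, and the candidate values are made up for illustration.

```r
# Hypothetical helper: weighted stress of one candidate layout
layout_stress <- function(
    nb_arcs, lg_arcs, lg_parents_childs, wt = c(1000, 10, 1)
) {
    sum(wt * c(nb_arcs, lg_arcs, lg_parents_childs))
}

# Two made-up candidate layouts (the numbers are illustrative only)
cand <- data.frame(
    nb_arcs = c(1, 0),
    lg_arcs = c(2, 0),
    lg_parents_childs = c(0.5, 3)
)
stress <- mapply(
    layout_stress, cand$nb_arcs, cand$lg_arcs, cand$lg_parents_childs
)
which.min(stress)  # index of the layout with the least stress
```

With the default weights a single duplicated individual outweighs any moderate misalignment between parents and children, so layouts without duplicates are strongly preferred.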
The best Hints object out of all the permutations
data(sampleped)
ped <- Pedigree(sampleped[sampleped$famid == 1, ])
best_hint(ped)
Utility function used in the shrink() function to calculate the bit size of a Pedigree.
## S4 method for signature 'character_OR_integer'
bit_size(obj, momid, missid = NA_character_)
## S4 method for signature 'Pedigree'
bit_size(obj)
## S4 method for signature 'Ped'
bit_size(obj)
obj |
A Ped or Pedigree object or a vector of the fathers' identifiers |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
The bit size of a Pedigree is defined as:
$$\text{bit size} = 2 \times NbNonFounders - NbFounders$$
where NbNonFounders is the number of non-founders in the Pedigree (i.e. individuals with identified parents) and NbFounders is the number of founders in the Pedigree (i.e. individuals without identified parents).
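As a worked illustration, the bit size can be computed by hand from the parent identifiers. The sketch below is base R only and is not the package's code: bit_size_sketch() is a hypothetical helper, it takes the bit size to be 2 * NbNonFounders - NbFounders, and it assumes that missing parents are coded as NA and that a founder has both parents missing.

```r
# Hypothetical helper, base R only: bit size from parent identifiers
bit_size_sketch <- function(dadid, momid) {
    # Assumption: a founder has both parents missing (NA)
    founder <- is.na(dadid) & is.na(momid)
    n_founder <- sum(founder)
    n_non_founder <- sum(!founder)
    list(
        bit_size = 2 * n_non_founder - n_founder,
        nFounder = n_founder,
        nNonFounder = n_non_founder
    )
}

# Two founders and their three children
dadid <- c(NA, NA, "1", "1", "1")
momid <- c(NA, NA, "2", "2", "2")
bit_size_sketch(dadid, momid)$bit_size  # 2 * 3 - 2 = 4
```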
A list with the following components:
bit_size: The bit size of the Pedigree
nFounder: The number of founders in the Pedigree
nNonFounder: The number of non-founders in the Pedigree
data(sampleped)
ped <- Pedigree(sampleped)
bit_size(ped)
Create a list of x and y coordinates for a circle with a given number of slices.
circfun(nslice, n = 50)
nslice |
Number of slices in the circle |
n |
Total number of points in the circle |
A list of x and y coordinates per slice.
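To make the shape of the return value concrete, here is one way such a list could be built in base R. This is a hypothetical re-implementation for illustration, not the code of circfun(): it assumes a unit circle centred on the origin, slices of equal angle, and that each slice is a wedge made of the centre followed by its arc points.

```r
# Hypothetical re-implementation, for illustration only
circle_slices <- function(nslice, n = 50) {
    nseg <- ceiling(n / nslice)  # arc points per slice
    theta <- seq(0, 2 * pi, length.out = nslice + 1)
    lapply(seq_len(nslice), function(i) {
        arc <- seq(theta[i], theta[i + 1], length.out = nseg)
        # Each slice is a wedge: the centre followed by the arc points
        list(x = c(0, cos(arc)), y = c(0, sin(arc)))
    })
}

slices <- circle_slices(4, 50)
length(slices)  # one element per slice
```

Each element can be passed to a polygon-drawing function to fill one slice with its own color.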
circfun(1)
circfun(1, 10)
circfun(4, 50)
Draw arcs
draw_arc(x0, y0, x1, y1, p, ggplot_gen = FALSE, lwd = 1, col = "black")
x0 |
x coordinate of the first point |
y0 |
y coordinate of the first point |
x1 |
x coordinate of the second point |
y1 |
y coordinate of the second point |
p |
ggplot object |
ggplot_gen |
If TRUE add the segments to the ggplot object |
lwd |
Line width |
col |
Line color |
Plot the arcs to the current device or add them to a ggplot object
Draw a polygon
draw_polygon(
    x, y, p, ggplot_gen = FALSE, fill = "grey",
    border = NULL, density = NULL, angle = 45
)
x |
x coordinates |
y |
y coordinates |
p |
ggplot object |
ggplot_gen |
If TRUE add the segments to the ggplot object |
fill |
Fill color |
border |
Border color |
density |
Density of shading |
angle |
Angle of shading |
Plot the polygon to the current device or add it to a ggplot object
Draw segments
draw_segment(
    x0, y0, x1, y1, p, ggplot_gen,
    col = par("fg"), lwd = par("lwd"), lty = par("lty")
)
x0 |
x coordinate of the first point |
y0 |
y coordinate of the first point |
x1 |
x coordinate of the second point |
y1 |
y coordinate of the second point |
p |
ggplot object |
ggplot_gen |
If TRUE add the segments to the ggplot object |
col |
Line color |
lwd |
Line width |
lty |
Line type |
Plot the segments to the current device or add them to a ggplot object
Draw texts
draw_text(
    x, y, label, p, ggplot_gen = FALSE,
    cex = 1, col = NULL, adjx = 0, adjy = 0
)
x |
x coordinates |
y |
y coordinates |
label |
Text to be displayed |
p |
ggplot object |
ggplot_gen |
If TRUE add the segments to the ggplot object |
cex |
Character expansion of the text |
col |
Text color |
adjx |
x adjustment |
adjy |
y adjustment |
Plot the text to the current device or add it to a ggplot object
Find the duplicate pairs of a subject
duporder(idlist, plist, lev, obj)
idlist |
List of individuals identifiers to be considered |
plist |
The alignment structure representing the Pedigree layout.
See |
lev |
The generation level of the subject |
obj |
A Pedigree object |
This routine is used by auto_hint(). It finds the duplicate pairs of a subject and returns them in the order they should be plotted.
A matrix of duplicate pairs
Exclude any founders who are not parents.
exclude_stray_marryin(id, dadid, momid)
id |
A character vector with the identifiers of each individual |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
Returns a data frame of subject identifiers and their parents. The data frame is trimmed of any founders who are not parents.
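The trimming rule can be sketched in a few lines of base R. The helper drop_stray_founders() below is hypothetical and is not the package's implementation; it assumes that missing parents are coded as NA and that a founder is a subject with both parents missing.

```r
# Hypothetical helper, base R only
drop_stray_founders <- function(id, dadid, momid) {
    # Assumption: missing parents are coded as NA
    founder <- is.na(dadid) & is.na(momid)
    is_parent <- id %in% c(dadid, momid)
    keep <- !(founder & !is_parent)
    data.frame(id = id[keep], dadid = dadid[keep], momid = momid[keep])
}

id    <- c("1", "2", "3", "4")
dadid <- c(NA, NA, "1", NA)
momid <- c(NA, NA, "2", NA)
drop_stray_founders(id, dadid, momid)  # subject "4" is dropped
```

Subject "4" is a founder who never appears as a parent, so it is the only row removed.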
Finds one subject from among available non-parents with indicated affection status.
## S4 method for signature 'Ped'
find_avail_affected(obj, avail = NULL, affected = NULL, affstatus = NA)
## S4 method for signature 'Pedigree'
find_avail_affected(obj, avail = NULL, affected = NULL, affstatus = NA)
obj |
A Ped or Pedigree object. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
affstatus |
Affection status to search for. |
When used within shrink(), this function is called with the first affected indicator, if the affected item in the Pedigree is a matrix of multiple affected indicators.
If avail or affected is NULL, then the function will use the corresponding Ped accessor.
A list is returned with the following components:
ped: The new Ped object
newAvail: Vector of availability status of trimmed individuals
idTrimmed: Vector of IDs of trimmed individuals
isTrimmed: Logical value indicating whether the Ped object has been trimmed
bit_size: Bit size of the trimmed Ped
data(sampleped)
ped <- Pedigree(sampleped)
find_avail_affected(ped, affstatus = 1)
Finds subjects from among available non-parents with all affection equal to 0.
## S4 method for signature 'Ped'
find_avail_noninform(obj, avail = NULL, affected = NULL)
## S4 method for signature 'Pedigree'
find_avail_noninform(obj, avail = NULL, affected = NULL)
obj |
A Ped or Pedigree object. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
Identify subjects to remove from a Pedigree who are available but non-informative (unaffected). This is the second step to remove subjects in shrink() if the Pedigree does not meet the desired bit size.
If avail or affected is NULL, then the function will use the corresponding Ped accessor.
Vector of subject ids who can be removed by having lowest informativeness.
data(sampleped)
ped <- Pedigree(sampleped)
find_avail_noninform(ped)
Find the siblings of a subject
findsibs(idpos, plist, lev)
idpos |
The position of the subject |
plist |
The alignment structure representing the Pedigree layout.
See |
lev |
The generation level of the subject |
This routine is used by auto_hint(). It finds the siblings of a subject.
The positions of the siblings
Find the spouse of a subject
findspouse(idpos, plist, lev, obj)
idpos |
The position of the subject |
plist |
The alignment structure representing the Pedigree layout.
See |
lev |
The generation level of the subject |
obj |
A Pedigree object |
This routine is used by auto_hint(). It finds the spouse of a subject.
The position of the spouse
Fix the sex of parents and add parents that are missing from the data. Can be used with a data frame or with vectors holding the individuals' information.
## S4 method for signature 'character'
fix_parents(obj, dadid, momid, sex, famid = NULL, missid = NA_character_)
## S4 method for signature 'data.frame'
fix_parents(obj, delete = FALSE, filter = NULL, missid = NA_character_)
obj |
A data.frame or a vector of the individuals identifiers. If a
dataframe is given it must contain the columns |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
famid |
A character vector with the family identifiers of the individuals. If provided, it will be combined with the individuals' identifiers, separated by an underscore. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
delete |
Boolean defining if missing parents needs to be:
|
filter |
Filtering column containing |
First look to add parents whose ids are given in momid/dadid. Second, fix the sex of parents. Last, look to add a second parent for children for whom only one parent id is given.
If a famid vector is given, the family id will be added to the ids of all individuals (id, dadid, momid), separated by an underscore, before proceeding.
Check for the presence of both parents' ids in the id field. If they are not both present, the behaviour depends on the delete parameter:
If TRUE, then use the fix_parents function and merge back the other fields in the data frame, then set availability to 0 for non-available parents.
If FALSE, then delete the id of missing parents.
A data.frame with id, dadid, momid, sex as columns with the relationships fixed.
Jason Sinnwell
test1char <- data.frame(
    id = paste('fam', 101:111, sep = ''),
    sex = c('male', 'female')[c(1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1)],
    father = c(
        0, 0, 'fam101', 'fam101', 'fam101', 0, 0,
        'fam106', 'fam106', 'fam106', 'fam109'
    ),
    mother = c(
        0, 0, 'fam102', 'fam102', 'fam102', 0, 0,
        'fam107', 'fam107', 'fam107', 'fam112'
    )
)
test1newmom <- with(test1char, fix_parents(
    id, father, mother, sex, missid = NA_character_
))
Pedigree(test1newmom)
Perform a transformation upon the vector containing the affection status to obtain a binary affected state.
generate_aff_inds(
    values, mods_aff = NULL, threshold = NULL, sup_thres_aff = NULL
)
values |
Vector containing the values of the column to process. |
mods_aff |
Vector of modality to consider as affected in the case
where the |
threshold |
Numeric value separating the affected and healthy subject
in the case where the |
sup_thres_aff |
Boolean defining if the affected individual are above
the threshold or not.
If |
This function helps to configure a binary state from a character or numeric variable.
character or factor: In this case the affected state will depend on the modalities provided as affected. All individuals with a value corresponding to one of the elements in the vector mods_aff will be considered as affected.
numeric: In this case the affected state depends on the threshold. If sup_thres_aff is TRUE, individuals with a value above the threshold are affected; if it is FALSE, individuals with a value below the threshold are affected.
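The two cases can be summarised with a short base R sketch. The helper affected_sketch() is hypothetical and only returns the logical vector, not the data frame produced by generate_aff_inds(); it also assumes that the comparison with the threshold is strict and that sup_thres_aff = FALSE means "affected when below the threshold".

```r
# Hypothetical helper, not the package implementation
affected_sketch <- function(
    values, mods_aff = NULL, threshold = NULL, sup_thres_aff = NULL
) {
    if (is.numeric(values)) {
        # Assumption: the comparison with the threshold is strict
        if (isTRUE(sup_thres_aff)) values > threshold else values < threshold
    } else {
        # Character or factor: affected when the value is listed in mods_aff
        as.character(values) %in% mods_aff
    }
}

affected_sketch(c(1, 2, 3, 4, 5), threshold = 3, sup_thres_aff = TRUE)
affected_sketch(c("A", "B", "C", "A", "V", "B"), mods_aff = c("A", "B"))
```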
A data frame with the affected column processed accordingly.
The different columns are:
mods: The different modalities of the column
labels: The labels of the different modalities
affected: The column processed to have only TRUE/FALSE values
Louis Le Nézet
generate_aff_inds(c(1, 2, 3, 4, 5), threshold = 3, sup_thres_aff = TRUE)
generate_aff_inds(c("A", "B", "C", "A", "V", "B"), mods_aff = c("A", "B"))
Transform a vector containing the availability status to compute the border color.
The vector given is transformed using the vect_to_binary() function.
generate_border(values, colors_avail = c("green", "black"))
values |
The vector containing the values to process as available. |
colors_avail |
Set of 2 colors to use for the box's border of an
individual. The first color will be used for available individual
( |
A list of three elements:
mods: The processed values column as a numeric factor
avail: A logical vector indicating if the individual is available
sc_bord: A dataframe containing the description of each modality of the scale
generate_border(c(1, 0, 1, 0, NA, 1, 0, 1, 0, NA))
Transform a dataframe to compute the colors for the filling and the border of the individuals based on their affection and availability status.
## S4 method for signature 'character'
generate_colors(
    obj, avail, mods_aff = NULL, keep_full_scale = FALSE,
    colors_aff = c("yellow2", "red"),
    colors_unaff = c("white", "steelblue4"),
    colors_avail = c("green", "black")
)
## S4 method for signature 'numeric'
generate_colors(
    obj, avail, threshold = 0.5, sup_thres_aff = TRUE,
    keep_full_scale = FALSE, breaks = 3,
    colors_aff = c("yellow2", "red"),
    colors_unaff = c("white", "steelblue4"),
    colors_avail = c("green", "black")
)
## S4 method for signature 'Pedigree'
generate_colors(
    obj, col_aff = "affected", add_to_scale = TRUE, col_avail = "avail",
    mods_aff = NULL, threshold = 0.5, sup_thres_aff = TRUE,
    keep_full_scale = FALSE, breaks = 3,
    colors_aff = c("yellow2", "red"),
    colors_unaff = c("white", "steelblue4"),
    colors_avail = c("green", "black"),
    reset = TRUE
)
obj |
A Pedigree object or a vector containing the affection status for each individual. The affection status can be numeric or character. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
mods_aff |
Vector of modality to consider as affected in the case
where the |
keep_full_scale |
Boolean defining if the affection values need to
be set as a scale. If |
colors_aff |
Set of increasing colors to use for the filling of the affected individuals. |
colors_unaff |
Set of increasing colors to use for the filling of the unaffected individuals. |
colors_avail |
Set of 2 colors to use for the box's border of an
individual. The first color will be used for available individual
( |
threshold |
Numeric value separating the affected and healthy subject
in the case where the |
sup_thres_aff |
Boolean defining if the affected individual are above
the threshold or not.
If |
breaks |
Number of breaks to use when using full scale with numeric values. The same number of breaks will be done for values from affected individuals and unaffected individuals. |
col_aff |
A character vector with the name of the column to be used for the affection status. |
add_to_scale |
Boolean defining if the scales need to be added to the existing scales or if they need to replace the existing scales. |
col_avail |
A character vector with the name of the column to be used for the availability status. |
reset |
If |
The colors are set using the generate_fill() and generate_border() functions for the filling and the border, respectively.
When obj is a vector, a list of two elements:
The list containing the filling colors processed and their description
The list containing the border colors processed and their description
When obj is a Pedigree, the Pedigree object with the affected and avail columns processed accordingly, as well as the scales slot updated.
generate_colors(
    c("A", "B", "A", "B", NA, "A", "B", "A", "B", NA),
    c(1, 0, 1, 0, NA, 1, 0, 1, 0, NA),
    mods_aff = "A"
)
generate_colors(
    c(10, 0, 5, 7, NA, 6, 2, 1, 3, NA),
    c(1, 0, 1, 0, NA, 1, 0, 1, 0, NA),
    threshold = 3, keep_full_scale = TRUE
)
data("sampleped")
ped <- Pedigree(sampleped)
ped <- generate_colors(ped, "affected", add_to_scale = FALSE)
scales(ped)
Transform a column containing the affection status to compute the filling color.
generate_fill( values, affected, labels, keep_full_scale = FALSE, breaks = 3, colors_aff = c("yellow2", "red"), colors_unaff = c("white", "steelblue4") )
values |
The vector containing the values to process as affection. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
labels |
The vector containing the labels to use for the affection. |
keep_full_scale |
Boolean defining if the affection values need to
be set as a scale. If |
breaks |
Number of breaks to use when using full scale with numeric values. The same number of breaks will be done for values from affected individuals and unaffected individuals. |
colors_aff |
Set of increasing colors to use for the filling of the affected individuals. |
colors_unaff |
Set of increasing colors to use for the filling of the unaffected individuals. |
The colors are set using the grDevices::colorRampPalette() function with the colors given as parameters.
The colors are set as follows:
If keep_full_scale is FALSE: the affected individuals get the first color of the colors_aff vector and the unaffected individuals get the first color of the colors_unaff vector.
If keep_full_scale is TRUE:
If values isn't numeric: each level of the affected values vector gets its own color from the colors_aff vector using grDevices::colorRampPalette(), and the same is done for the unaffected individuals using colors_unaff.
If values is numeric: the mean of the affected individuals is compared to the mean of the unaffected individuals, and the colors are set up such that the color gradient follows the direction of the affection.
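As a small illustration of the palette step, the call below interpolates three shades between each pair of default colors with grDevices::colorRampPalette(). The choice of three shades is arbitrary, and how generate_fill() assigns the shades to modalities is not shown here.

```r
# Build palettes of three shades from the default colour pairs.
pal_aff <- grDevices::colorRampPalette(c("yellow2", "red"))(3)
pal_unaff <- grDevices::colorRampPalette(c("white", "steelblue4"))(3)

# Each result is a character vector of hexadecimal colour codes,
# running from the first colour of the pair to the second.
pal_aff
pal_unaff
```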
A list of three elements:
mods: The processed values column as a numeric factor
affected: A logical vector indicating if the individual is affected
sc_fill: A dataframe containing the description of each modality of the scale
aff <- generate_aff_inds(seq_len(5), threshold = 3, sup_thres_aff = TRUE)
generate_fill(seq_len(5), aff$affected, aff$labels)
generate_fill(seq_len(5), aff$affected, aff$labels, keep_full_scale = TRUE)
Get twin relationships
get_twin_rel(obj)
obj |
A Pedigree object |
This routine determines the twin relationships in a Pedigree
and the order of the twins in the Pedigree.
It is used by auto_hint().
A list containing the components:
twinset: the set of twins
twinrel: the twin relationships
twinord: the order of the twins
The hints are used to specify the order of the individuals in the pedigree and to specify the order of the spouses.
You either need to provide horder or spouse in the dedicated parameters (together or separately), or inside a list.
Hints(horder, spouse)
## S4 method for signature 'list,missing_OR_NULL'
Hints(horder, spouse)
## S4 method for signature 'numeric,data.frame'
Hints(horder, spouse)
## S4 method for signature 'numeric,missing_OR_NULL'
Hints(horder, spouse)
horder |
A named numeric vector with one element per subject in the Pedigree. It determines the relative horizontal order of subjects within a sibship, as well as the relative order of processing for the founder couples. (For this latter, the female founders are ordered as though they were sisters). The names of the vector should be the individual identifiers. |
spouse |
A data.frame with one row per hinted marriage, usually only
a few marriages in a pedigree will need an added hint, for instance reverse
the plot order of a husband/wife pair.
Each row contains the id of the left spouse (i.e. |
A Hints object.
horder: A named numeric vector with one element per subject in the Pedigree. It determines the relative horizontal order of subjects within a sibship, as well as the relative order of processing for the founder couples. (For the latter, the female founders are ordered as though they were sisters.)
spouse: A data.frame with one row per hinted marriage. Usually only a few marriages in a Pedigree will need an added hint, for instance to reverse the plot order of a husband/wife pair. Each row contains the identifiers of the left spouse, the right spouse, and the anchor (i.e. 1 = left, 2 = right, 0 = either).
horder(x): Get the horder vector
horder(x) <- value: Set the horder vector
spouse(x): Get the spouse data.frame
spouse(x) <- value: Set the spouse data.frame
as.list(x): Convert a Hints object to a list
subset(x, i, keep = TRUE): Subset a Hints object based on the individual identifiers given.
i: A vector of individual identifiers to keep.
keep: A logical value indicating if the individuals should be kept or deleted.
Hints(
    list(
        horder = c("1" = 1, "2" = 2, "3" = 3),
        spouse = data.frame(
            idl = c("1", "2"), idr = c("2", "3"), anchor = c(1, 2)
        )
    )
)
Hints(
    horder = c("1" = 1, "2" = 2, "3" = 3),
    spouse = data.frame(
        idl = c("1", "2"), idr = c("2", "3"), anchor = c(1, 2)
    )
)
Hints(
    horder = c("1" = 1, "2" = 2, "3" = 3)
)
Transform identity by descent (IBD) matrix data from the form produced by external programs such as SOLAR into the compact form used by the coxme and lmekin routines.
ibd_matrix(id1, id2, ibd, idmap, diagonal)
id1 |
A character vector with the id of the first individuals of each pairs |
id2 |
A character vector with the id of the second individuals of each pairs |
ibd |
the IBD value for that pair |
idmap |
an optional 2 column matrix or data frame whose first element
is the internal value (as found in |
diagonal |
optional value for the diagonal element. If present, any missing diagonal elements in the input data will be set to this value. |
The IBD matrix for a set of n subjects will be an n by n symmetric matrix whose i,j element contains, for some given genetic location, a 0/1 indicator of whether 0, 1/2 or 2/2 of the alleles for i and j are identical by descent. Fractional values occur if the IBD fraction must be imputed. The diagonal will be 1. Since a large fraction of the values will be zero, programs such as SOLAR return a data set containing only the non-zero elements. In addition, SOLAR will have renumbered the subjects as seq_len(n) in such a way that families are grouped together in the matrix; a separate index file contains the mapping between this new id and the original one. The final matrix should be labeled with the original identifiers.
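The conversion from a pair list to a labeled symmetric matrix can be sketched in base R as follows. This is not how ibd_matrix() is implemented: it builds a dense matrix for readability, whereas the function returns a sparse one, and the identifiers, IBD values and `idmap` column names are made up for the illustration.

```r
# Pairs as an external program might report them, using internal ids 1..n.
id1 <- c(1, 1, 2)
id2 <- c(2, 3, 3)
ibd <- c(0.5, 0.25, 0.5)

# Hypothetical mapping from internal ids to the original identifiers.
idmap <- data.frame(
    internal = 1:3,
    original = c("fam1_a", "fam1_b", "fam1_c"),
    stringsAsFactors = FALSE
)

n <- nrow(idmap)
mat <- matrix(0, n, n, dimnames = list(idmap$original, idmap$original))

# Fill both triangles so that the matrix is symmetric.
i <- match(id1, idmap$internal)
j <- match(id2, idmap$internal)
mat[cbind(i, j)] <- ibd
mat[cbind(j, i)] <- ibd

# Diagonal elements missing from the input are set to a chosen value,
# which is the role of the `diagonal` argument.
diag(mat) <- 1
mat
```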
A sparse matrix of class dsCMatrix. This is the same form used for kinship matrices.
Select the ids of the informative individuals.
## S4 method for signature 'character_OR_integer'
is_informative(obj, avail, affected, informative = "AvAf")
## S4 method for signature 'Pedigree'
is_informative(obj, col_aff = NULL, informative = "AvAf", reset = FALSE)
obj |
A character vector with the id of the individuals or a
|
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
informative |
Informative individuals selection can take 5 values:
|
col_aff |
A character vector with the name of the column to be used for the affection status. |
reset |
If |
Depending on the informative parameter, the function will extract the ids of the informative individuals. In the case of a numeric vector, the function will return the same vector. In the case of a boolean, the function will return the ids of the individuals if TRUE, NA otherwise. In the case of a string, the function will return the ids of the corresponding informative individuals based on the avail and affected columns.
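The string case can be illustrated in base R. The sketch assumes that "AvAf" selects individuals who are both available and affected and that "AvOrAf" selects those who are available or affected; this reading of the option names is an assumption, not something the text above states.

```r
id <- c("A", "B", "C", "D", "E")
avail <- c(1, 0, 0, 1, 1) == 1
affected <- c(0, 1, 0, 1, 1) == 1

# Assumed meaning of "AvAf": available AND affected.
id[avail & affected]

# Assumed meaning of "AvOrAf": available OR affected.
id[avail | affected]
```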
A vector of the identifiers of the informative individuals.
The Pedigree object with its isinf slot updated.
is_informative(c("A", "B", "C", "D", "E"), informative = c("A", "B"))
is_informative(c("A", "B", "C", "D", "E"), informative = c(1, 2))
is_informative(c("A", "B", "C", "D", "E"), informative = c("A", "B"))
is_informative(c("A", "B", "C", "D", "E"), avail = c(1, 0, 0, 1, 1),
    affected = c(0, 1, 0, 1, 1), informative = "AvAf")
is_informative(c("A", "B", "C", "D", "E"), avail = c(1, 0, 0, 1, 1),
    affected = c(0, 1, 0, 1, 1), informative = "AvOrAf")
is_informative(c("A", "B", "C", "D", "E"),
    informative = c(TRUE, FALSE, TRUE, FALSE, TRUE))
data("sampleped")
ped <- Pedigree(sampleped)
ped <- is_informative(ped, col_aff = "affection_mods")
isinf(ped(ped))
Computes the depth of each subject in the Pedigree.
kindepth(obj, ...)
## S4 method for signature 'character_OR_integer'
kindepth(obj, dadid, momid, align_parents = FALSE)
## S4 method for signature 'Pedigree'
kindepth(obj, align_parents = FALSE)
## S4 method for signature 'Ped'
kindepth(obj, align_parents = FALSE)
obj |
A character vector with the id of the individuals or a
|
... |
Additional arguments |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
align_parents |
If |
Mark each person as to their depth in a Pedigree: 0 for a founder, otherwise depth = 1 + max(father's depth, mother's depth).
In the case of an inbred Pedigree a perfect alignment may not exist.
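The depth rule can be illustrated in base R, assuming the usual convention that a founder has depth 0 and any other subject sits one level below the deeper of its two parents. The loop below is a simplified sketch and not the package implementation; in particular it ignores the align_parents option.

```r
id <- c("A", "B", "C", "D", "E")
dadid <- c("C", "D", "0", "0", "0")
momid <- c("E", "E", "0", "0", "0")

# Row index of each parent; NA marks a missing parent ("0").
dad <- match(dadid, id)
mom <- match(momid, id)

depth <- rep(0L, length(id))
# Repeat the update until the depths stop changing; at most length(id)
# passes are needed for a pedigree without loops.
for (pass in seq_along(id)) {
    dad_depth <- ifelse(is.na(dad), -1L, depth[dad])
    mom_depth <- ifelse(is.na(mom), -1L, depth[mom])
    new_depth <- pmax(dad_depth, mom_depth) + 1L
    if (identical(new_depth, depth)) break
    depth <- new_depth
}
names(depth) <- id
depth
```

Here A and B get depth 1 and the three founders C, D and E get depth 0.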
An integer vector containing the depth for each subject
Terry Therneau, updated by Louis Le Nézet
kindepth(
    c("A", "B", "C", "D", "E"),
    c("C", "D", "0", "0", "0"),
    c("E", "E", "0", "0", "0")
)
data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == "1",])
kindepth(ped1)
Compute the kinship matrix for a set of related autosomal subjects. The function is generic, and can accept a Pedigree, a Ped or a vector as the first argument.
## S4 method for signature 'Ped'
kinship(obj, chrtype = "autosome")
## S4 method for signature 'character'
kinship(obj, dadid, momid, sex, chrtype = "autosome")
## S4 method for signature 'Pedigree'
kinship(obj, chrtype = "autosome")
obj |
A Pedigree or Ped object or a vector of subject identifiers. |
chrtype |
chromosome type. The currently supported types are 'autosome' and 'X' or 'x'. |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
The function will usually be called with a Pedigree. The call with a Ped or a vector is provided for backwards compatibility with an earlier release of the library that was less capable. Note that when using with a Ped or a vector, any information on twins is not available to the function.
When called with a Pedigree, the routine will create a block-diagonal-symmetric sparse matrix object of class dsCMatrix. Since the [i, j] value of the result is 0 for any two unrelated individuals i and j and a Matrix utilizes sparse representation, the resulting object is often orders of magnitude smaller than an ordinary matrix.
Two genes G1 and G2 are identical by descent (IBD) if they are both physical copies of the same ancestral gene; two genes are identical by state if they represent the same allele. So the brown eye gene that I inherited from my mother is IBD with hers; the same gene in an unrelated individual is not.
The kinship coefficient between two subjects is the probability that a randomly selected allele from a locus will be IBD between them. It is obviously 0 between unrelated individuals. For an autosomal site and no inbreeding it will be 0.5 for an individual with themselves, 0.25 between mother and child, 0.125 between an uncle and niece, etc.
The computation is based on a recursive algorithm described in Lange, which assumes that the founder alleles are all independent.
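A simplified version of the recursion can be written in base R for a small autosomal pedigree. This sketch is not the package code: it assumes that every subject has either both parents or none, ignores twins, and takes the depths as given by hand so that parents are processed before their children.

```r
id <- c("A", "B", "C", "D", "E")
dadid <- c("C", "D", "0", "0", "0")
momid <- c("E", "E", "0", "0", "0")
depth <- c(1, 1, 0, 0, 0)  # founders first; supplied by hand for this sketch

dad <- match(dadid, id)
mom <- match(momid, id)
n <- length(id)
K <- matrix(0, n, n, dimnames = list(id, id))

done <- integer(0)
for (i in order(depth)) {
    if (is.na(dad[i])) {
        # Founder: unrelated to everyone processed so far.
        K[i, i] <- 0.5
    } else {
        # Kinship with an earlier subject is the average over both parents.
        for (j in done) {
            K[i, j] <- K[j, i] <- 0.5 * (K[dad[i], j] + K[mom[i], j])
        }
        # Self-kinship increases with the kinship between the parents.
        K[i, i] <- 0.5 * (1 + K[dad[i], mom[i]])
    }
    done <- c(done, i)
}
K
```

In this pedigree A and B share only their mother E, so their kinship is 0.125, while each child has kinship 0.25 with each of its parents.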
A matrix of kinship coefficients.
A matrix of kinship coefficients ordered by families present in the Pedigree object.
K Lange, Mathematical and Statistical Methods for Genetic Analysis, Springer-Verlag, New York, 1997.
kinship(c("A", "B", "C", "D", "E"), c("C", "D", "0", "0", "0"),
    c("E", "E", "0", "0", "0"), sex = c(1, 2, 1, 2, 1))
kinship(c("A", "B", "C", "D", "E"), c("C", "D", "0", "0", "0"),
    c("E", "E", "0", "0", "0"), sex = c(1, 2, 1, 2, 1),
    chrtype = "x"
)
data(sampleped)
ped <- Pedigree(sampleped)
kinship(ped)
Construct a family identifier from pedigree information
## S4 method for signature 'character'
make_famid(obj, dadid, momid)
## S4 method for signature 'Pedigree'
make_famid(obj)
obj |
A character vector with the id of the individuals or a
|
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
Create a vector of length n, giving the family 'tree' number of each subject. If the Pedigree is totally connected, then everyone will end up in tree 1, otherwise the tree numbers represent the disconnected subfamilies. Singleton subjects give a zero for family number.
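The grouping amounts to finding the connected components of the parent-child graph. The base R sketch below does this by repeatedly giving each child and its parents the smallest label among them; it is an illustration of the idea, not the package implementation, and the numbering of the trees is a choice made here.

```r
id <- c("A", "B", "C", "D", "E", "F")
dadid <- c("C", "D", "0", "0", "0", "0")
momid <- c("E", "E", "0", "0", "0", "0")

dad <- match(dadid, id)
mom <- match(momid, id)

# Start with one label per subject, then merge labels within each
# child/father/mother trio until nothing changes.
label <- seq_along(id)
repeat {
    old <- label
    for (i in seq_along(id)) {
        grp <- c(i, dad[i], mom[i])
        grp <- grp[!is.na(grp)]
        label[grp] <- min(label[grp])
    }
    if (identical(label, old)) break
}

# Number the trees 1, 2, ... and give singletons the value 0.
size <- tabulate(label, nbins = length(id))[label]
famid <- match(label, unique(label[size > 1]), nomatch = 0L)
names(famid) <- id
famid
```

Subjects A to E are linked through the shared mother E and form tree 1, while F has no relatives and gets 0.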
An integer vector giving family groupings
An updated Pedigree object with the family id added and with all ids updated
make_famid(
    c("A", "B", "C", "D", "E", "F"),
    c("C", "D", "0", "0", "0", "0"),
    c("E", "E", "0", "0", "0", "0")
)
data(sampleped)
ped1 <- Pedigree(sampleped[,-1])
make_famid(ped1)
Compute the minimum distance between the informative individuals and all the others. This distance is a transformation of the maximum kinship degree between the informative individuals and all the others. This transformation is done by taking the log2 of the inverse of the maximum kinship degree.
Therefore, the minimum distance is 0 when the maximum kinship is 1 and is infinite when the maximum kinship is 0. For siblings, the kinship value is 0.5 and the minimum distance is 1. Each time the kinship degree is divided by 2, the minimum distance is increased by 1.
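The transformation itself is a one-liner. The values below are arbitrary maximum kinship values chosen to show the scale; they are not taken from a real pedigree.

```r
# Maximum kinship of each subject with the informative individuals.
max_kin <- c(1, 0.5, 0.25, 0.125, 0)

# Distance: log2 of the inverse of the maximum kinship.
min_dist <- log2(1 / max_kin)
min_dist
# 0 1 2 3 Inf
```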
## S4 method for signature 'character'
min_dist_inf(obj, dadid, momid, sex, id_inf)
## S4 method for signature 'Pedigree'
min_dist_inf(obj, col_aff = NULL, informative = "AvAf", reset = FALSE, ...)
## S4 method for signature 'Ped'
min_dist_inf(obj, informative = "AvAf", reset = FALSE)
obj |
A character vector with the id of the individuals or a
|
... |
Additional arguments |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
id_inf |
A vector of identifiers of the informative individuals. |
col_aff |
A character vector with the name of the column to be used for the affection status. |
informative |
Informative individuals selection can take 5 values:
|
reset |
If TRUE, the |
A vector of the minimum distance between the informative individuals and all the others, in the order of the individuals in the obj vector.
The Pedigree object with a new slot named 'kin' containing the minimum distance between each individual and the informative individuals. The isinf slot is also updated with the informative individuals.
min_dist_inf(
    c("A", "B", "C", "D", "E"),
    c("C", "D", "0", "0", "0"),
    c("E", "E", "0", "0", "0"),
    sex = c(1, 2, 1, 2, 1),
    id_inf = c("D", "E")
)
data(sampleped)
ped <- Pedigree(sampleped)
kin(ped(min_dist_inf(ped, col_aff = "affection_mods")))
Data from the Minnesota Breast Cancer Family Study. This contains extended pedigrees from 426 families, each identified by a single proband in 1945-1952, with follow up for incident breast cancer.
data(minnbreast)
A data frame with 28081 observations, one line per subject, on the following 15 variables.
id: Subject identifier
proband: If 1, this subject is one of the original 426 probands
fatherid: Identifier of the father, if the father is part of the data set; zero otherwise
motherid: Identifier of the mother, if the mother is part of the data set; zero otherwise
famid: Family identifier
endage: Age at last follow-up or incident cancer
cancer: 1 = breast cancer (females) or prostate cancer (males), 0 = censored
yob: Year of birth
education: Amount of education: 1-8 years, 9-12 years, high school graduate, vocational education beyond high school, some college but did not graduate, college graduate, post-graduate education, refused to answer on the questionnaire
marstat: Marital status: married, living with someone in a marriage-like relationship, separated or divorced, widowed, never married, refused to answer the questionnaire
everpreg: Ever pregnant at the time of baseline survey
parity: Number of births
nbreast: Number of breast biopsies
sex: M or F
bcpc: Part of one of the families in the breast / prostate cancer substudy: 0 = no, 1 = yes. Note that subjects who were recruited to the overall study after the date of the BP substudy are coded as zero.
The original study was conducted by Dr. Elving Anderson at the Dight Institute for Human Genetics at the University of Minnesota. From 1944 to 1952, 544 sequential breast cancer cases seen at the University Hospital were enrolled, and information gathered on parents, siblings, offspring, aunts / uncles, and grandparents with the goal of understanding possible familial aspects of breast cancer. In 1991 the study was resurrected by Dr Tom Sellers.
Of the original 544 he excluded 58 prevalent cases, along with another 19 who had less than 2 living relatives at the time of Dr Anderson's survey. Of the remaining 462 families 10 had no living members, 23 could not be located and 8 refused, leaving 426 families on whom updated pedigrees were obtained.
This gave a study with 13351 males and 12699 females (5183 marry-ins). Primary questions were the relationship of early life exposures, breast density, and pharmacogenomics on incident breast cancer risk. For a subset of the families data was gathered on prostate cancer risk for male subjects via questionnaires sent to men over 40. Other than this, data items other than parentage are limited to the female subjects. In 2003 a second phase of the study was instituted. The pedigrees were further extended to the numbers found in this data set, and further data gathered by questionnaire.
Epidemiologic and genetic follow-up study of 544 Minnesota breast cancer families: design and methods. Sellers TA, Anderson VE, Potter JD, Bartow SA, Chen PL, Everson L, King RA, Kuni CC, Kushi LH, McGovern PG, et al. Genetic Epidemiology, 1995; 12(4):417-29.
Evaluation of familial clustering of breast and prostate cancer in the Minnesota Breast Cancer Family Study. Grabrick DM, Cerhan JR, Vierkant RA, Therneau TM, Cheville JC, Tindall DJ, Sellers TA. Cancer Detect Prev. 2003; 27(1):30-6.
Risk of breast cancer with oral contraceptive use in women with a family history of breast cancer. Grabrick DM, Hartmann LC, Cerhan JR, Vierkant RA, Therneau TM, Vachon CM, Olson JE, Couch FJ, Anderson KE, Pankratz VS, Sellers TA. JAMA. 2000; 284(14):1791-8.
data(minnbreast)
breastped <- Pedigree(minnbreast,
    cols_ren_ped = list(
        "indId" = "id", "fatherId" = "fatherid",
        "motherId" = "motherid", "gender" = "sex", "family" = "famid"
    ),
    missid = "0", col_aff = "cancer"
)
summary(breastped)
scales(breastped)
# plot family 8, proband is solid, slash for cancers
# plot(breastped[famid(breastped) == "8"])
Normalise dataframe for a Ped object
norm_ped( ped_df, na_strings = c("NA", ""), missid = NA_character_, try_num = FALSE )
ped_df |
A data.frame with the individuals' information. The minimum columns required are:
- `indId` individual identifiers -> `id`
- `fatherId` biological fathers identifiers -> `dadid`
- `motherId` biological mothers identifiers -> `momid`
- `gender` sex of the individual -> `sex`
- `family` family identifiers -> `famid`
The following columns are also recognized and will be transformed with the
- `sterilisation` status -> `steril`
- `available` status -> `avail`
- `vitalStatus`, is the individual dead -> `status`
- `affection` status -> `affected`
The values recognized for those columns are |
na_strings |
Vector of strings to be considered as NA values. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
try_num |
Boolean defining if the function should try to convert all the columns to numeric. |
Normalise a dataframe and check that its columns correspond to those needed to create a Ped object.
Multiple tests are done and errors are checked.
Sex is calculated based on the gender column.
The steril column needs to be a boolean: either TRUE, FALSE or 'NA'.
Any individual with no 'NA' value in the available column is considered available.
A duplicated indId will nullify the relationships of the individual.
All individuals with errors will be removed from the dataframe and transferred to the error dataframe.
A number of checks are done to ensure the dataframe is correct:
All ids (id, dadid, momid, famid) are not empty (!= "")
All id are unique (no duplicates)
All dadid and momid are unique in the id column (no duplicates)
id is not the same as dadid or momid
Either both parents are present or none
All sex codes are either male, female, terminated or unknown
No parents are sterile
All fathers are male
All mothers are female
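A few of these checks can be sketched in base R on an already renamed dataframe. The column names follow the normalised names above, but the data and the way the flags are built are only an illustration, not the logic used inside norm_ped().

```r
# Toy data with three deliberate problems: id "3" is duplicated, the second
# "3" has a father but no mother, and "4" is listed as its own father.
ped_df <- data.frame(
    id = c("1", "2", "3", "3", "4"),
    dadid = c(NA, NA, "1", "1", "4"),
    momid = c(NA, NA, "2", NA, NA),
    stringsAsFactors = FALSE
)

# All id are unique: flag every row that shares its id with another row.
dup_id <- ped_df$id %in% ped_df$id[duplicated(ped_df$id)]

# Either both parents are present or none: flag rows with exactly one parent.
one_parent <- xor(is.na(ped_df$dadid), is.na(ped_df$momid))

# id is not the same as dadid or momid; comparisons with NA count as FALSE.
own_parent <- (ped_df$id == ped_df$dadid |
    ped_df$id == ped_df$momid) %in% TRUE

# Rows failing at least one check.
ped_df[dup_id | one_parent | own_parent, ]
```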
A dataframe with the different variables correctly standardized and with the errors identified in the error column.
df <- data.frame(
    indId = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    fatherId = c("A", 0, 1, 3, 0, 4, 1, 0, 6, 6),
    motherId = c(0, 0, 2, 2, 0, 5, 2, 0, 8, 8),
    gender = c(1, 2, "m", "man", "f", "male", "m", "m", "f", "f"),
    available = c("A", "1", 0, NA, 1, 0, 1, 0, 1, 0),
    famid = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
    sterilisation = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, "TRUE"),
    vitalStatus = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0),
    affection = c("TRUE", "FALSE", TRUE, FALSE, 1, 0, 1, 0, 1, 0)
)
tryCatch(
    norm_ped(df),
    error = function(e) print(e)
)
Normalise a dataframe and check the column correspondence so that it can be used as an input to create a Ped object.
norm_rel(rel_df, na_strings = c("NA", ""), missid = NA_character_)
rel_df |
A data.frame with the special relationships between
individuals. See
The value relation code recognized by the function are the one defined
by the |
na_strings |
Vector of strings to be considered as NA values. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
The famid
column, if provided, will be merged to the ids field
separated by an underscore using the upd_famid_id()
function.
The code column will be transformed with the rel_code_to_factor() function.
Multiple tests are done and errors are checked.
A number of checks are done to ensure the dataframe is correct:
All ids (id1, id2) are not empty (!= ""
)
id1
and id2
are not the same
All codes are recognised as either "MZ twin", "DZ twin", "UZ twin" or "Spouse"
A dataframe with the errors identified
df <- data.frame(
    id1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    id2 = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
    code = c("MZ twin", "DZ twin", "UZ twin", "Spouse", 1, 2, 3, 4, "MzTwin", "sp oUse"),
    famid = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)
norm_rel(df)
Compute the number of children per individual
## S4 method for signature 'character_OR_integer'
num_child(obj, dadid, momid, rel_df = NULL, missid = NA_character_)

## S4 method for signature 'Pedigree'
num_child(obj, reset = FALSE)
obj |
A character vector with the id of the individuals or a
|
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
rel_df |
A data.frame with the special relationships between
individuals. See
The value relation code recognized by the function are the one defined
by the |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
reset |
If TRUE, the |
Compute the number of direct children as well as the number of indirect children, i.e. the children related through the linked spouses. If a relationship dataframe is given, the indirect children will still be added even if the two spouses have no children together.
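As a rough illustration of the direct count only, the number of children per individual can be obtained by tabulating the parent identifiers. The helper name `count_direct_children()` and the `missid = "0"` default are assumptions made for this sketch; it does not reproduce the indirect count or the spouse handling of num_child().

```r
# Hypothetical helper (not part of Pedixplorer): counts, for each id, how many
# times it appears as a father or a mother. Indirect children are NOT handled.
count_direct_children <- function(id, dadid, momid, missid = "0") {
    parents <- c(dadid, momid)
    # Drop the missing parents before counting
    parents <- parents[!is.na(parents) & !(parents %in% missid)]
    # Using the ids as factor levels keeps individuals without children (0)
    counts <- table(factor(parents, levels = id))
    setNames(as.integer(counts), id)
}

n_dir <- count_direct_children(
    id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    dadid = c("3", "3", "6", "8", "0", "0", "0", "0", "0", "0"),
    momid = c("4", "5", "7", "9", "0", "0", "0", "0", "0", "0")
)
n_dir
```

With these vectors, individual "3" is the father of "1" and "2" and therefore has two direct children, while "10" has none.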
A dataframe with the columns num_child_dir
, num_child_ind
and
num_child_tot
giving respectively the direct, indirect and total number
of children.
An updated Pedigree object with the columns num_child_dir
,
num_child_ind
and num_child_tot
added to the
Pedigree ped
slot.
num_child(
    obj = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    dadid = c("3", "3", "6", "8", "0", "0", "0", "0", "0", "0"),
    momid = c("4", "5", "7", "9", "0", "0", "0", "0", "0", "0"),
    rel_df = data.frame(
        id1 = "10",
        id2 = "3",
        code = "Spouse"
    )
)

data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == "1",])
ped1 <- num_child(ped1, reset = TRUE)
summary(ped(ped1))
Convert a Pedigree to a legend data frame for it to
be plotted afterwards with plot_fromdf()
.
## S4 method for signature 'Pedigree'
ped_to_legdf(obj, boxh = 1, boxw = 1, cex = 1, adjx = 0, adjy = 0)
obj |
A Pedigree object |
boxh |
Height of the polygon elements |
boxw |
Width of the polygon elements |
cex |
Character expansion of the text |
adjx |
default=0. Controls the horizontal text adjustment of the labels in the legend. |
adjy |
default=0. Controls the vertical text adjustment of the labels in the legend. |
The data frame contains the following columns:
x0
, y0
, x1
, y1
: coordinates of the elements
type
: type of the elements
fill
: fill color of the elements
border
: border color of the elements
angle
: angle of the shading of the elements
density
: density of the shading of the elements
cex
: size of the elements
label
: label of the elements
tips
: tips of the elements (used for the tooltips)
adjx
: horizontal text adjustment of the labels
adjy
: vertical text adjustment of the labels
All those columns are used by plot_fromdf()
to plot the graph.
A list containing the legend data frame and the user coordinates.
data("sampleped")
ped <- Pedigree(sampleped)
leg_df <- ped_to_legdf(ped)
summary(leg_df$df)
plot_fromdf(leg_df$df, usr = c(-1,15,0,7))
Convert a Pedigree to a data frame with all the elements and their
characteristics so that they can be plotted afterwards with plot_fromdf()
.
## S4 method for signature 'Pedigree'
ped_to_plotdf(
    obj,
    packed = TRUE,
    width = 6,
    align = c(1.5, 2),
    subreg = NULL,
    cex = 1,
    symbolsize = cex,
    pconnect = 0.5,
    branch = 0.6,
    aff_mark = TRUE,
    label = NULL,
    ...
)
obj |
A Pedigree object |
... |
Other arguments passed to |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
width |
For a packed output, the minimum width of the plot, in inches. |
align |
For a packed Pedigree, align children under parents |
subreg |
A 4-element vector for (min x, max x, min depth, max depth),
used to edit away portions of the plot coordinates returned by
|
cex |
Character expansion of the text |
symbolsize |
Size of the symbols |
pconnect |
When connecting parent to children the program will try to
make the connecting line as close to vertical as possible, subject to it
lying inside the endpoints of the line that connects the children by at
least |
branch |
defines how much angle is used to connect various levels of nuclear families. |
aff_mark |
If |
label |
If not |
The data frame contains the following columns:
x0
, y0
, x1
, y1
: coordinates of the elements
type
: type of the elements
fill
: fill color of the elements
border
: border color of the elements
angle
: angle of the shading of the elements
density
: density of the shading of the elements
cex
: size of the elements
label
: label of the elements
tips
: tips of the elements (used for the tooltips)
adjx
: horizontal text adjustment of the labels
adjy
: vertical text adjustment of the labels
All those columns are used by plot_fromdf()
to plot the graph.
A list containing the data frame and the user coordinates.
data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == 1,])
plot_df <- ped_to_plotdf(ped1)
summary(plot_df$df)
plot_fromdf(plot_df$df,
    usr = plot_df$par_usr$usr,
    boxh = plot_df$par_usr$boxh,
    boxw = plot_df$par_usr$boxw
)
S4 class to represent the identity information of the individuals in a pedigree.
You either need to provide a vector of the same size for each slot
or a data.frame
with the corresponding columns.
The metadata will correspond to the columns that do not correspond to the Ped slots.
## S4 method for signature 'data.frame'
Ped(obj, cols_used_init = FALSE, cols_used_del = FALSE)

## S4 method for signature 'character_OR_integer'
Ped(
    obj,
    sex,
    dadid,
    momid,
    famid = NA,
    steril = NA,
    status = NA,
    avail = NA,
    affected = NA,
    missid = NA_character_
)
obj |
A character vector with the id of the individuals or a
|
cols_used_init |
Boolean defining if the columns that will be used should be initialised to NA. |
cols_used_del |
Boolean defining if the columns that will be used should be deleted. |
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
famid |
A character vector with the family identifiers of the individuals. If provided, it will be aggregated to the individuals' identifiers, separated by an underscore. |
steril |
A logical vector with the sterilisation status of the
individuals
(i.e. |
status |
A logical vector with the death status of the
individuals
(i.e. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
The minimal information needed is id, dadid, momid and sex.
The other slots are used to store recognized information.
Additional columns can be added to the Ped object and will be
stored in the elementMetadata
slot of the Ped object.
A Ped object.
id
A character vector with the id of the individuals.
dadid
A character vector with the id of the father of the individuals.
momid
A character vector with the id of the mother of the individuals.
sex
An ordered factor vector for the sex of the individuals
(i.e. male
< female
< unknown
< terminated
).
famid
A character vector with the family identifiers of the individuals (optional).
steril
A logical vector with the sterilisation status of the
individuals
(i.e. FALSE
= not sterilised, TRUE
= sterilised, NA
= unknown).
status
A logical vector with the death status of the
individuals
(i.e. FALSE
= alive, TRUE
= dead, NA
= unknown).
avail
A logical vector with the availability status of the
individuals
(i.e. FALSE
= not available, TRUE
= available, NA
= unknown).
affected
A logical vector with the affection status of the
individuals
(i.e. FALSE
= not affected, TRUE
= affected, NA
= unknown).
useful
A logical vector with the usefulness status of the
individuals
(i.e. FALSE
= not useful, TRUE
= useful).
isinf
A logical vector indicating if the individual is informative
or not
(i.e. FALSE
= not informative, TRUE
= informative).
kin
A numeric vector with minimal kinship value between the individuals and the useful individuals.
num_child_tot
A numeric vector with the total number of children of the individuals.
num_child_dir
A numeric vector with the number of direct children of the individuals.
num_child_ind
A numeric vector with the number of indirect children of the individuals.
elementMetadata
A DataFrame with the additional metadata columns of the Ped object.
metadata
Meta information about the pedigree.
For all the following accessors, the x parameter is a Ped object.
Each getter returns a vector of the same length as x with the values
of the corresponding slot. Each getter has a setter with the
same name, to be used as slot(x) <- value.
The value parameter is a vector of the same length as x, except
for the mcols() accessor, where value is a list or a data.frame in which
each element has the same length as x.
id(x)
: Individuals identifiers
dadid(x)
: Individuals' father identifiers
momid(x)
: Individuals' mother identifiers
famid(x)
: Individuals' family identifiers
sex(x)
: Individuals' gender
affected(x)
: Individuals' affection status
avail(x)
: Individuals' availability status
status(x)
: Individuals' death status
isinf(x)
: Individuals' informativeness status
kin(x)
: Individuals' kinship distance to the
informative individuals
useful(x)
: Individuals' usefulness status
mcols(x)
: Individuals' metadata
summary(x)
: Compute the summary of a Ped object
show(x)
: Convert the Ped object to a data.frame
and print it with its summary.
as.list(x)
: Convert a Ped object to a list with
the metadata columns at the end.
as.data.frame(x)
: Convert a Ped object to a data.frame with
the metadata columns at the end.
subset(x, i, del_parents = FALSE, keep = TRUE)
: Subset a Ped object
based on the individuals identifiers given.
i
: A vector of individuals identifiers to keep.
del_parents
: A logical value indicating if the parents
of the individuals should be deleted.
keep
: A logical value indicating if the individuals
should be kept or deleted.
data(sampleped)
Ped(sampleped)

Ped(
    obj = c("1", "2", "3", "4", "5", "6"),
    dadid = c("4", "4", "6", "0", "0", "0"),
    momid = c("5", "5", "5", "0", "0", "0"),
    sex = c(1, 2, 3, 1, 2, 1),
    missid = "0"
)
A pedigree is a set of individuals linked to each other in a family tree. A Pedigree object stores the information about the individuals and the special relationships between them. It also stores the information needed to plot the pedigree (i.e. scales and hints).
Main constructor of the package.
This constructor helps to create a Pedigree object from one or more data.frame objects or from a set of vectors.
If any errors are found in the data, the function will return the data.frame with the errors of the Ped object and the Rel object.
Pedigree(obj, ...)

## S4 method for signature 'character_OR_integer'
Pedigree(
    obj,
    dadid,
    momid,
    sex,
    famid = NA,
    avail = NULL,
    affected = NULL,
    status = NULL,
    steril = NULL,
    rel_df = NULL,
    missid = NA_character_,
    col_aff = "affection",
    normalize = TRUE,
    ...
)

## S4 method for signature 'data.frame'
Pedigree(
    obj = data.frame(indId = character(), fatherId = character(),
        motherId = character(), gender = numeric(), family = character(),
        available = numeric(), vitalStatus = numeric(), affection = numeric(),
        sterilisation = numeric()),
    rel_df = data.frame(id1 = character(), id2 = character(),
        code = numeric(), famid = character()),
    cols_ren_ped = list(indId = "id", fatherId = "dadid", motherId = "momid",
        family = "famid", gender = "sex", sterilisation = "steril",
        affection = "affected", available = "avail", vitalStatus = "status"),
    cols_ren_rel = list(id1 = "indId1", id2 = "indId2", famid = "family"),
    hints = list(horder = NULL, spouse = NULL),
    normalize = TRUE,
    missid = NA_character_,
    col_aff = "affection",
    ...
)
obj |
A vector of the individuals identifiers or a data.frame
with the individuals informations. See |
... |
Arguments passed on to |
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
famid |
A character vector with the family identifiers of the individuals. If provided, it will be aggregated to the individuals' identifiers, separated by an underscore. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
status |
A logical vector with the death status of the
individuals
(i.e. |
steril |
A logical vector with the sterilisation status of the
individuals
(i.e. |
rel_df |
A data.frame with the special relationships between
individuals. See
The value relation code recognized by the function are the one defined
by the |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
col_aff |
A character vector with the name of the column to be used for the affection status. |
normalize |
A logical to know if the data should be normalised. |
cols_ren_ped |
A named list with the columns to rename for the
pedigree dataframe. This is useful if you want to use a dataframe with
different column names. The names of the list should be the new column
names and the values should be the old column names. The default values
are to be used with |
cols_ren_rel |
A named list with the columns to rename for the relationship matrix. This is useful if you want to use a dataframe with different column names. The names of the list should be the new column names and the values should be the old column names. |
hints |
A Hints object or a named list containing |
If normalize is set to TRUE, the data will be
standardized using the functions norm_ped() and norm_rel().
If a data.frame is given, the column names needed depend on whether normalization is selected. The column names needed with normalization are listed below, with the names needed without normalization in parentheses:
indId
: the individual identifier (id
)
fatherId
: the identifier of the biological father (dadid
)
motherId
: the identifier of the biological mother (momid
)
gender
: the sex of the individual (sex
)
family
: the family identifier of the individual (famid
)
sterilisation
: the sterilisation status of the individual (steril
)
available
: the availability status of the individual (avail
)
vitalStatus
: the death status of the individual (status
)
affection
: the affection status of the individual (affected
)
...
: other columns that will be stored in the elementMetadata
slot
The minimum columns required are:
indId
/ id
fatherId
/ dadid
motherId
/ momid
gender
/ sex
The family
/ famid
column can also be used to specify the family of the
individuals and will be merged to the indId
/ id
field separated by an
underscore.
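The merge of the family identifier into the individual identifier can be pictured with a short base R sketch. The helper name `merge_famid_id()`, the famid-first order of the result and the choice to leave ids without a family untouched are assumptions made for illustration; they are not taken from the package code.

```r
# Hypothetical helper (not part of Pedixplorer): prefixes each id with its
# family identifier, separated by an underscore.
# Assumption: ids whose famid is NA or "" are returned unchanged.
merge_famid_id <- function(id, famid) {
    id <- as.character(id)
    famid <- as.character(famid)
    has_fam <- !is.na(famid) & famid != ""
    id[has_fam] <- paste(famid[has_fam], id[has_fam], sep = "_")
    id
}

merged <- merge_famid_id(
    id = c("101", "102", "7"),
    famid = c("1", "1", NA)
)
merged
```

Prefixing the ids this way keeps them unique when the same individual identifier is reused in several families.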
The columns sterilisation
, available
, vitalStatus
, affection
will be transformed with the vect_to_binary()
function when the
normalisation is selected.
If you do not use the normalisation, the columns will be checked to
be 0
or 1
.
If affected
is a data.frame, col_aff will be overwritten by the column
names of the data.frame.
A Pedigree object.
ped
A Ped object for the identity information. See Ped() for more information.
rel
A Rel object for the special relationships. See Rel() for more information.
scales
A Scales object for the filling and bordering
colors used in the plot. See Scales() for more information.
hints
A Hints object for the ordering of the
individuals in the plot. See Hints() for more information.
For all the following accessors, the x parameter is a Pedigree object.
Each getter returns a vector of the same length as x with the values
of the corresponding slot.
famid(x)
: Get the family identifiers of a Pedigree object. This
function is a wrapper around famid(ped(x))
.
ped(x, slot)
: Get the value of a specific slot of the Ped object
ped(x)
: Get the Ped object
ped(x, slot) <- value
: Set the value of a specific slot of
the Ped object
Wrapper of slot(ped(x)) <- value
ped(x) <- value
: Set the Ped object
mcols(x)
: Get the metadata of a Pedigree object.
This function is a wrapper around mcols(ped(x))
.
mcols(x) <- value
: Set the metadata of a Pedigree object.
This function is a wrapper around mcols(ped(x)) <- value
.
rel(x, slot)
: Get the value of a specific slot of the Rel object
rel(x)
: Get the Rel object
rel(x, slot) <- value
: Set the value of a specific slot of the
Rel object
Wrapper of slot(rel(x)) <- value
rel(x) <- value
: Set the Rel object
scales(x)
: Get the Scales object
scales(x) <- value
: Set the Scales object
fill(x)
: Get the fill data.frame from the Scales object.
Wrapper of fill(scales(x))
fill(x) <- value
: Set the fill data.frame from the Scales object.
Wrapper of fill(scales(x)) <- value
border(x)
: Get the border data.frame from the Scales object.
Wrapper of border(scales(x))
border(x) <- value
: Set the border data.frame from the Scales object.
Wrapper of border(scales(x)) <- value
hints(x)
: Get the Hints object
hints(x) <- value
: Set the Hints object
horder(x)
: Get the horder vector from the Hints object.
Wrapper of horder(hints(x))
horder(x) <- value
: Set the horder vector from the Hints object.
Wrapper of horder(hints(x)) <- value
spouse(x)
: Get the spouse data.frame from the Hints object.
Wrapper of spouse(hints(x))
.
spouse(x) <- value
: Set the spouse data.frame from the Hints object.
Wrapper of spouse(hints(x)) <- value
.
length(x)
: Get the length of a Pedigree object.
Wrapper of length(ped(x))
.
show(x)
: Print the information of the Ped and Rel
object inside the Pedigree object.
summary(x)
: Compute the summary of the Ped and Rel object
inside the Pedigree object.
as.list(x)
: Convert a Pedigree object to a list
subset(x, i, keep = TRUE)
: Subset a Pedigree object
based on the individuals identifiers given.
i
: A vector of individuals identifiers to keep.
del_parents
: A logical value indicating if the parents
of the individuals should be deleted.
keep
: A logical value indicating if the individuals
should be kept or deleted.
x[i, del_parents, keep]
: Subset a Pedigree object
based on the individuals identifiers given.
Pedigree()
, Ped()
, Rel()
, Scales()
, Hints()
Pedigree(
    obj = c("1", "2", "3", "4", "5", "6"),
    dadid = c("4", "4", "6", "0", "0", "0"),
    momid = c("5", "5", "5", "0", "0", "0"),
    sex = c(1, 2, 3, 1, 2, 1),
    avail = c(0, 1, 0, 1, 0, 1),
    affected = matrix(c(
        0, 1, 0, 1, 0, 1,
        1, 1, 1, 1, 1, 1
    ), ncol = 2),
    col_aff = c("aff1", "aff2"),
    missid = "0",
    rel_df = matrix(c(
        "1", "2", 2
    ), ncol = 3, byrow = TRUE)
)

data(sampleped)
Pedigree(sampleped)
Given a vector of length n, generate all possible permutations of the numbers 1 to n. This is a recursive routine, and is not very efficient.
permute(x)
x |
A vector of length n |
A matrix with n cols and n! rows
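The recursive idea can be sketched in base R as follows. This is an independent illustration, not the package's permute() code: the function name `permute_sketch()` is hypothetical and it takes n directly rather than a vector of length n.

```r
# Hypothetical sketch: all permutations of 1..n, built recursively by inserting
# n at every possible position of each permutation of 1..(n - 1).
permute_sketch <- function(n) {
    n <- as.integer(n)
    if (n <= 1L) {
        return(matrix(1L, nrow = 1, ncol = 1))
    }
    prev <- permute_sketch(n - 1L)
    out <- NULL
    for (pos in seq_len(n)) {
        # Columns before and after the insertion point (either may be empty)
        left <- prev[, seq_len(pos - 1L), drop = FALSE]
        right <- prev[, seq(pos, length.out = n - pos), drop = FALSE]
        out <- rbind(out, cbind(left, n, right))
    }
    unname(out)
}

perm3 <- permute_sketch(3)
dim(perm3)
```

The result has n columns and n! rows, which is why this approach is only practical for small n.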
This function is used to create a plot from a data.frame.
If ggplot_gen = TRUE
, the plot will be generated with ggplot2 and
will be returned invisibly.
plot_fromdf(
    df,
    usr = NULL,
    title = NULL,
    ggplot_gen = FALSE,
    boxw = 1,
    boxh = 1,
    add_to_existing = FALSE
)
df |
A data.frame with the following columns:
|
usr |
The user coordinates of the plot. |
title |
The title of the plot. |
ggplot_gen |
If TRUE add the segments to the ggplot object |
boxw |
Width of the polygon elements |
boxh |
Height of the polygon elements |
add_to_existing |
If |
an invisible ggplot object and a plot on the current plotting device
data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == 1,])
lst <- ped_to_plotdf(ped1)
#plot_fromdf(lst$df, lst$par_usr$usr,
#     boxw = lst$par_usr$boxw, boxh = lst$par_usr$boxh
#)
This function is used to plot a Pedigree object.
It is a wrapper for plot_fromdf()
and ped_to_plotdf()
as well as
ped_to_legdf()
if legend = TRUE
.
## S4 method for signature 'Pedigree,missing'
plot(
    x,
    aff_mark = TRUE,
    label = NULL,
    ggplot_gen = FALSE,
    cex = 1,
    symbolsize = 1,
    branch = 0.6,
    packed = TRUE,
    align = c(1.5, 2),
    width = 6,
    title = NULL,
    subreg = NULL,
    pconnect = 0.5,
    fam_to_plot = 1,
    legend = FALSE,
    leg_cex = 0.8,
    leg_symbolsize = 0.5,
    leg_loc = NULL,
    leg_adjx = 0,
    leg_adjy = 0,
    ...
)
x |
A Pedigree object. |
aff_mark |
If |
label |
If not |
ggplot_gen |
If TRUE add the segments to the ggplot object |
cex |
Character expansion of the text |
symbolsize |
Size of the symbols |
branch |
defines how much angle is used to connect various levels of nuclear families. |
packed |
Should the Pedigree be compressed. (i.e. allow diagonal lines connecting parents to children in order to have a smaller overall width for the plot.) |
align |
For a packed Pedigree, align children under parents |
width |
For a packed output, the minimum width of the plot, in inches. |
title |
The title of the plot. |
subreg |
A 4-element vector for (min x, max x, min depth, max depth),
used to edit away portions of the plot coordinates returned by
|
pconnect |
When connecting parent to children the program will try to
make the connecting line as close to vertical as possible, subject to it
lying inside the endpoints of the line that connects the children by at
least |
fam_to_plot |
default=1. If the Pedigree contains multiple families,
this parameter can be used to select which family to plot.
It can be a numeric value or a character value. If numeric, it is the
index of the family to plot returned by |
legend |
default=FALSE. If TRUE, a legend will be added to the plot. |
leg_cex |
default=0.8. Controls the size of the legend text. |
leg_symbolsize |
default=0.5. Controls the size of the legend symbols. |
leg_loc |
default=NULL. If NULL, the legend will be placed in the upper right corner of the plot. Otherwise, a 4-element vector of the form (x0, x1, y0, y1) can be used to specify the location of the legend. |
leg_adjx |
default=0. Controls the horizontal labels adjustment of the legend. |
leg_adjy |
default=0. Controls the vertical labels adjustment of the legend. |
... |
Extra options that feed into the plot function. |
Two important parameters control the look of the result. One is the user-specified maximum width. The smallest possible width is the maximum number of subjects on a line; if the user's suggestion is too low, it is increased to 1 + that amount (to give just a little wiggle room).
To make a Pedigree where all children are centered under their parents, simply make the width large enough; however, the symbols may get very small.
The second is align, a vector of 2 alignment parameters a and b.
For each set of siblings at a set of locations x and with parents at
p=c(p1,p2) the alignment penalty is
$$\frac{1}{k^a} \sum_{i=1}^{k} \left(x_i - \frac{p_1 + p_2}{2}\right)^2$$
where k is the number of siblings in the set.
When a = 1, moving a sibship with k sibs one unit to the
left or right of optimal will incur the same cost as moving one with
only 1 or two sibs out of place.
If a = 0 then large sibships are harder to move than small ones;
with the default value a = 1.5 they are slightly easier to move
than small ones. The rationale for the default is that, as long as the parents
are somewhere between the first and last siblings, the result looks fairly
good, so we are more flexible with the spacing of a large family.
By tethering all the sibs to a single spot they are kept close to each other.
The alignment penalty for spouses is $b (x_1 - x_2)^2$,
which tends to keep them together. The size of b controls the relative
importance of sib-parent and spouse-spouse closeness.
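The effect of a can be checked numerically with a small base R sketch. It assumes the sibling penalty has the form sum((x - mean(p))^2) / k^a and the spouse penalty the form b * (x1 - x2)^2, as given in the kinship2 documentation from which this text derives; `sib_penalty()`, `spouse_penalty()` and `shift_cost()` are hypothetical names, not package functions.

```r
# Assumed form of the sibling penalty: sum((x - mean(p))^2) / k^a
sib_penalty <- function(x, p, a = 1.5) {
    k <- length(x)
    sum((x - mean(p))^2) / k^a
}

# Assumed form of the spouse penalty: b * (x1 - x2)^2
spouse_penalty <- function(x1, x2, b = 2) {
    b * (x1 - x2)^2
}

# Extra cost of shifting a whole sibship one unit away from its optimum
shift_cost <- function(x, p, a) {
    sib_penalty(x + 1, p, a) - sib_penalty(x, p, a)
}

p <- c(-1, 1)          # parents centered on 0
small <- c(-0.5, 0.5)  # sibship with k = 2, centered under the parents
large <- -2:2          # sibship with k = 5, centered under the parents

costs <- sapply(c(0, 1, 1.5), function(a) {
    c(small = shift_cost(small, p, a), large = shift_cost(large, p, a))
})
colnames(costs) <- c("a=0", "a=1", "a=1.5")
costs
```

Under these assumptions the shift cost is k^(1 - a): equal for both sibships when a = 1, larger for the big sibship when a = 0, and smaller for the big sibship when a = 1.5.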
an invisible list containing
df : the data.frame used to plot the Pedigree
par_usr : the user coordinates used to plot the Pedigree
ggplot : the ggplot object if ggplot_gen = TRUE
Creates plot on current plotting device.
data(sampleped)
pedAll <- Pedigree(sampleped)
#plot(pedAll)
Create a list of x and y coordinates for a polygon with a given number of slices and a list of coordinates for the polygon.
polyfun(nslice, coor)
nslice |
Number of slices in the polygon |
coor |
Element from which to generate the polygon, containing x and y coordinates and theta |
a list of x and y coordinates
polyfun(2, list(
    x = c(-0.5, -0.5, 0.5, 0.5),
    y = c(-0.5, 0.5, 0.5, -0.5),
    theta = -c(3, 5, 7, 9) * pi / 4
))
Create a list of polygonal elements with x, y coordinates and theta for the square, circle, diamond and triangle. The number of slices in each element can be specified.
polygons(nslice = 1)
nslice |
Number of slices in each element
If nslice > 1, the elements are created with |
a list of polygonal elements with x, y coordinates and theta by slice.
polygons()
polygons(4)
Relationship code variable to ordered factor
rel_code_to_factor(code)
code |
A character, factor or numeric vector corresponding to the relation code of the individuals:
|
an ordered factor vector containing the transformed variable "MZ twin" < "DZ twin" < "UZ twin" < "Spouse"
rel_code_to_factor(c(1, 2, 3, 4, "MZ twin", "DZ twin", "UZ twin", "Spouse"))
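The conversion can be pictured with the following base R sketch. It is not the package's implementation: `code_to_factor_sketch()` is a hypothetical name, the mapping of the numeric codes 1 to 4 onto the four levels in order is an assumption, and so is the matching of strings while ignoring case and white space (suggested by inputs such as "MzTwin" and "sp oUse" in the norm_rel() example).

```r
# Levels in the order stated for the returned ordered factor
rel_levels <- c("MZ twin", "DZ twin", "UZ twin", "Spouse")

# Hypothetical sketch (not the package function)
code_to_factor_sketch <- function(code) {
    code <- as.character(code)
    out <- rep(NA_character_, length(code))

    # Assumption: numeric codes 1..4 index the levels in order
    is_num <- code %in% as.character(1:4)
    out[is_num] <- rel_levels[as.integer(code[is_num])]

    # Assumption: strings are compared ignoring case and white space
    key <- function(s) tolower(gsub("[[:space:]]", "", s))
    out[!is_num] <- rel_levels[match(key(code[!is_num]), key(rel_levels))]

    # Unrecognised codes stay NA
    factor(out, levels = rel_levels, ordered = TRUE)
}

rel_fct <- code_to_factor_sketch(
    c(1, 2, "UZ twin", "sp oUse", "MzTwin", "cousin")
)
rel_fct
```

Because the factor is ordered, comparisons such as `rel_fct[1] < rel_fct[4]` follow the level order "MZ twin" < "DZ twin" < "UZ twin" < "Spouse".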
S4 class to represent the special relationships in a Pedigree.
You either need to provide a vector of the same size for each slot
or a data.frame
with the corresponding columns.
## S4 method for signature 'data.frame'
Rel(obj)

## S4 method for signature 'character_OR_integer'
Rel(obj, id2, code, famid = NA_character_)
obj |
A character vector with the id of the first individual of each
pair or a |
id2 |
A character vector with the id of the second individual of each pair |
code |
A character, factor or numeric vector corresponding to the relation code of the individuals:
|
famid |
A character vector with the family identifiers of the individuals. If provided, it will be aggregated to the individuals' identifiers, separated by an underscore. |
A Rel object is a list of special relationships between individuals in the pedigree.
It is used to create a Pedigree object.
The minimal information needed is id1, id2 and code.
If a famid is provided, each individual id is joined to the famid to ensure that the id is unique.
A Rel object.
id1
A character vector with the id of the first individual.
id2
A character vector with the id of the second individual.
code
An ordered factor vector with the code of the special relationship (i.e. MZ twin < DZ twin < UZ twin < Spouse).
famid
A character vector with the famid of the individuals.
For all the following accessors, the x parameter is a Rel object.
Each getter returns a vector of the same length as x with the values of the corresponding slot.
code(x)
: Relationships' code
id1(x)
: Relationships' first individuals' identifier
id2(x)
: Relationships' second individuals' identifier
famid(x)
: Relationships' individuals' family identifier
famid(x) <- value
: Set the relationships' individuals' family
identifier
value
: A character or integer vector of the same length as x
with the family identifiers
summary(x)
: Compute the summary of a Rel object
show(x)
: Convert the Rel object to a data.frame
and print it with its summary.
as.list(x)
: Convert a Rel object to a list
as.data.frame(x)
: Convert a Rel object to a data.frame
subset(x, i, keep = TRUE)
: Subset a Rel object
based on the individuals identifiers given.
i
: A vector of individuals identifiers to keep.
keep
: A logical value indicating if the individuals
should be kept or deleted.
rel_df <- data.frame(
    id1 = c("1", "2", "3"),
    id2 = c("2", "3", "4"),
    code = c(1, 2, 3)
)
Rel(rel_df)
Rel(
    obj = c("1", "2", "3"),
    id2 = c("2", "3", "4"),
    code = c(1, 2, 3)
)
Small sample pedigree data set for testing purposes.
data("sampleped")
A data frame with 55 observations, one line per subject, on the following 7 variables.
famid
: Family identifier
id
: Subject identifier
dadid
: Identifier of the father, if the father is part of the
data set; zero otherwise
momid
: Identifier of the mother, if the mother is part of the
data set; zero otherwise
sex
: 1 for male or 2 for female
affected
: 1 or 0
avail
: 1 or 0
This is a small fictitious pedigree data set, with 55 individuals in 2 families. The aim was to create a data set with a variety of pedigree structures.
data("sampleped")
ped <- Pedigree(sampleped)
summary(ped)
# plot(ped)
A Scales object is a list of two data.frames. The first represents the affection status of the individuals and therefore the fill of their symbols in the pedigree plot. The second represents the availability status of the individuals and therefore the border color of their symbols in the pedigree plot.
You need to provide both fill and border in the dedicated parameters.
However, this is usually done using the generate_colors() function with a Pedigree object.
Scales(fill, border)

## S4 method for signature 'data.frame,data.frame'
Scales(fill, border)
fill |
A data.frame with the information for the affection status. The columns needed are:
|
border |
A data.frame with the information for the availability status. The columns needed are:
|
A Scales object.
fill
A data.frame with the information for the affection status. The columns needed are:
'order': the order of the affection to be used
'column_values': name of the column containing the raw values in the Ped object
'column_mods': name of the column containing the mods of the transformed values in the Ped object
'mods': all the different mods
'labels': the corresponding label of each mod
'affected': a logical value indicating whether the mod corresponds to an affected individual
'fill': the color to use for this mod
'density': the density of the shading
'angle': the angle of the shading
border
A data.frame with the information for the availability status. The columns needed are:
'column_values': name of the column containing the raw values in the Ped object
'column_mods': name of the column containing the mods of the transformed values in the Ped object
'mods': all the different mods
'labels': the corresponding label of each mod
'border': the color to use for this mod
fill(x)
: Get the fill data.frame
fill(x) <- value
: Set the fill data.frame
border(x)
: Get the border data.frame
border(x) <- value
: Set the border data.frame
as.list(x)
: Convert a Scales object to a list
Scales(
    fill = data.frame(
        order = 1,
        column_values = "affected",
        column_mods = "affected_mods",
        mods = c(0, 1),
        labels = c("unaffected", "affected"),
        affected = c(FALSE, TRUE),
        fill = c("white", "red"),
        density = c(NA, 20),
        angle = c(NA, 45)
    ),
    border = data.frame(
        column_values = "avail",
        column_mods = "avail_mods",
        mods = c(0, 1),
        labels = c("not available", "available"),
        border = c("black", "blue")
    )
)
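To show how the 'mods', 'labels' and 'fill' columns relate to each other, here is a minimal base R sketch of a lookup from per-individual mods to a label and a color. It is only an illustration of the table layout: the `affected_mods` vector is hypothetical data, and how the package itself consumes these tables when plotting is not shown here.

```r
# Reduced version of the fill table from the example above
fill <- data.frame(
    mods = c(0, 1),
    labels = c("unaffected", "affected"),
    fill = c("white", "red"),
    stringsAsFactors = FALSE
)

# Hypothetical per-individual mods; NA stands for an unknown status
affected_mods <- c(1, 0, NA, 1)

# match() gives the row of the fill table for each individual
idx <- match(affected_mods, fill$mods)
styled <- data.frame(
    mods = affected_mods,
    label = fill$labels[idx],
    colour = fill$fill[idx],
    stringsAsFactors = FALSE
)
styled
```

Individuals whose mod is absent from the table get NA for both label and color in this sketch.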
Set plotting area
set_plot_area(cex, id, maxlev, xrange, symbolsize, ...)
cex |
Character expansion of the text |
id |
A character vector with the identifier of each individual |
maxlev |
Maximum level |
xrange |
Range of x values |
symbolsize |
Size of the symbols |
... |
Other arguments passed to |
List of user coordinates, old par, box width, box height, label height and leg height
Gender variable to ordered factor
sex_to_factor(sex)
sex |
A character, factor or numeric vector corresponding to
the gender of the individuals. This will be transformed to an ordered factor
with the following levels:
|
an ordered factor vector containing the transformed variable "male" < "female" < "unknown" < "terminated"
sex_to_factor(c(1, 2, 3, 4, "f", "m", "man", "female"))
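The following base R sketch gives one possible reading of this conversion. It is not the package's code: `sex_to_ordered()` is a hypothetical helper, and both the accepted synonyms and the meaning of the numeric codes (1 male, 2 female, 3 unknown, 4 terminated) are assumptions made for the illustration.

```r
# Hypothetical helper for illustration; not part of Pedixplorer.
sex_levels <- c("male", "female", "unknown", "terminated")

sex_to_ordered <- function(sex) {
    sex <- tolower(as.character(sex))
    out <- rep(NA_character_, length(sex))
    # Assumed synonyms for each level
    out[sex %in% c("1", "m", "man", "male")] <- "male"
    out[sex %in% c("2", "f", "woman", "female")] <- "female"
    out[sex %in% c("4", "terminated")] <- "terminated"
    # Everything else, including "3" and NA, falls back to "unknown"
    out[is.na(out)] <- "unknown"
    factor(out, levels = sex_levels, ordered = TRUE)
}

s <- sex_to_ordered(c(1, 2, 3, 4, "f", "m", "man", "female"))
s
```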
Shift set of siblings to the left or right
shift(id, sibs, goleft, hint, twinrel, twinset)
id |
The id of the subject to be shifted |
sibs |
The ids of the siblings |
goleft |
If |
hint |
The current hint vector |
twinrel |
The twin relationship matrix |
twinset |
The twinset vector |
This routine is used by auto_hint().
It shifts a set of siblings to the left or right, so that the
marriage is on the edge of the set of siblings, closest to the spouse.
It also shifts the subject himself, so that he is on the edge of the
set of siblings, closest to the spouse.
It also shifts the monozygotic twins, if any, so that they are
together within the set of twins.
The updated hint vector
Shrink Pedigree object to specified bit size with priority placed on trimming uninformative subjects. The algorithm is useful for getting a Pedigree condensed to a minimally informative size for algorithms or testing that are limited by size of the Pedigree.
If avail or affected are NULL, they are extracted with their corresponding accessors from the Ped object.
## S4 method for signature 'Pedigree'
shrink(obj, avail = NULL, affected = NULL, max_bits = 16)

## S4 method for signature 'Ped'
shrink(obj, avail = NULL, affected = NULL, max_bits = 16)
obj |
A Pedigree or Ped object. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
max_bits |
Optional, the bit size for which to shrink the Pedigree |
Iteratively remove subjects from the Pedigree. The random removal of members was previously controlled by a seed argument; this argument has been removed, so users must control randomness outside the function. First, uninformative subjects are removed, i.e., unavailable (not genotyped) subjects with no available descendants. Next, available terminal subjects with unknown phenotype are removed if both of their parents are available. Last, the Pedigree is shrunk iteratively by preferentially removing individuals in the following order (chosen at random if several share the same status):
Subjects with unknown affection status
Unaffected subjects
Affected subjects.
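The quantities involved in this procedure can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `bit_size()` and `trim_candidates()` are hypothetical helpers, the definition 2 * non-founders - founders is the one documented for kinship2 and is assumed to carry over, and a missing parent is assumed to be coded "0" as in the sampleped data set.

```r
# Hypothetical helpers for illustration; not part of Pedixplorer.

# Bit size, assuming the kinship2 definition:
# 2 * (number of non-founders) - (number of founders)
bit_size <- function(dadid, momid, missid = "0") {
    founder <- dadid %in% missid & momid %in% missid
    2 * sum(!founder) - sum(founder)
}

# Ids belonging to the status class that would be trimmed first:
# unknown (NA), then unaffected (0), then affected (1)
trim_candidates <- function(id, affected) {
    status <- ifelse(is.na(affected), 1L, ifelse(affected == 0, 2L, 3L))
    id[status == min(status)]
}

# Two founders with two children
dadid <- c("0", "0", "1", "1")
momid <- c("0", "0", "2", "2")
bit_size(dadid, momid)  # 2 * 2 - 2 = 2

trim_candidates(c("a", "b", "c", "d"), c(1, NA, 0, NA))
```

In an actual run, one of the candidates would then be drawn at random, for instance with `sample()`, after the caller has set the seed.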
A list containing the following elements:
pedObj: Pedigree object after trimming
id_trim: Vector of ids trimmed from Pedigree
id_lst: List of ids trimmed by category
bit_size: Vector of bit sizes after each trimming step
avail: Vector of availability status after trimming
pedSizeOriginal: Number of subjects in original Pedigree
pedSizeIntermed: Number of subjects after initial trimming
pedSizeFinal: Number of subjects after final trimming
Original by Dan Schaid, updated by Jason Sinnwell and Louis Le Nézet
data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == '1', ])
shrink(ped1, max_bits = 12)
Subset a region of a Pedigree
subregion(plist, subreg)
plist |
The alignment structure representing the Pedigree layout.
See |
subreg |
A 4-element vector for (min x, max x, min depth, max depth),
used to edit away portions of the plot coordinates returned by
|
A Pedigree structure with the specified region
Update the family prefix in the individual identifiers. Individual identifiers are constructed as famid_id. To update the family prefix, each id is therefore split at the first underscore and the first part is overwritten by famid.
## S4 method for signature 'character,ANY'
upd_famid_id(obj, famid, missid = NA_character_)

## S4 method for signature 'Ped,character_OR_integer'
upd_famid_id(obj, famid)

## S4 method for signature 'Ped,missing'
upd_famid_id(obj)

## S4 method for signature 'Rel,character_OR_integer'
upd_famid_id(obj, famid)

## S4 method for signature 'Rel,missing'
upd_famid_id(obj)

## S4 method for signature 'Pedigree,character_OR_integer'
upd_famid_id(obj, famid)

## S4 method for signature 'Pedigree,missing'
upd_famid_id(obj)
obj |
Ped or Pedigree object or a character vector of individual ids |
famid |
A character vector with the family identifiers of the individuals. If provided, it is joined to the individual identifiers, separated by an underscore. |
missid |
A character vector with the missing values identifiers.
All the id, dadid and momid corresponding to those values will be set
to |
If famid is missing, then the famid() function will be called on the object.
A character vector of individual ids with family prefix updated
upd_famid_id(c("1", "2", "B_3"), c("A", "B", "A"))
upd_famid_id(c("1", "B_2", "C_3", "4"), c("A", NA, "A", NA))

data(sampleped)
ped1 <- Pedigree(sampleped[, -1])
id(ped(ped1))
new_fam <- make_famid(id(ped(ped1)), dadid(ped(ped1)), momid(ped(ped1)))
id(ped(upd_famid_id(ped1, new_fam)))

data(sampleped)
ped1 <- Pedigree(sampleped[, -1])
make_famid(ped1)
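The split-and-replace rule described for the character method can be sketched in base R. This is a hypothetical re-implementation for illustration only: `set_family_prefix()` is not a package function, and returning the bare id when famid is NA is an assumption of the sketch, not documented behavior.

```r
# Hypothetical helper for illustration; not part of Pedixplorer.
set_family_prefix <- function(id, famid) {
    # Drop any existing prefix: keep what follows the first underscore
    bare <- sub("^[^_]*_", "", id)
    # Prepend the new family id; assumed: no prefix when famid is NA
    ifelse(is.na(famid), bare, paste(famid, bare, sep = "_"))
}

set_family_prefix(c("1", "2", "B_3"), c("A", "B", "A"))
set_family_prefix(c("1", "B_2", "C_3", "4"), c("A", NA, "A", NA))
```

Because the split happens at the first underscore, an id that itself contains an underscore but carries no family prefix would lose its first part under this rule.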
Compute the usefulness of individuals
## S4 method for signature 'character'
useful_inds(
    obj,
    dadid,
    momid,
    avail,
    affected,
    num_child_tot,
    informative = "AvAf",
    keep_infos = FALSE
)

## S4 method for signature 'Pedigree'
useful_inds(obj, informative = "AvAf", keep_infos = FALSE, reset = FALSE)

## S4 method for signature 'Ped'
useful_inds(obj, informative = "AvAf", keep_infos = FALSE, reset = FALSE)
obj |
A character vector with the id of the individuals or a
|
dadid |
A vector containing, for each subject, the identifier of the biological father. |
momid |
A vector containing, for each subject, the identifier of the biological mother. |
avail |
A logical vector with the availability status of the
individuals
(i.e. |
affected |
A logical vector with the affection status of the
individuals
(i.e. |
num_child_tot |
A numeric vector of the number of children of each individual |
informative |
Informative individuals selection can take 5 values:
|
keep_infos |
Boolean indicating whether individuals with unknown status but available, or the reverse, should be kept |
reset |
Boolean to indicate if the |
Check for the informativeness of the individuals based on the informative parameter given, the number of children and the usefulness of their parents. A useful slot is added to the Ped object with the usefulness of the individual. This boolean is hereditary.
A vector of useful individual identifiers
The Pedigree or Ped object with the slot 'useful' containing TRUE for useful individuals and FALSE otherwise.
data(sampleped)
ped1 <- Pedigree(sampleped[sampleped$famid == "1", ])
ped(useful_inds(ped1, informative = "AvAf"))
Transform a vector to a binary vector.
All values that are not 0, 1, TRUE, FALSE, or NA are transformed to NA.
vect_to_binary(vect, logical = FALSE)
vect |
A character, factor, logical or numeric vector corresponding to
a binary variable (i.e.
|
logical |
Boolean defining if the output should be a logical vector
instead of a numeric vector (i.e. |
numeric binary vector of the same size as vect with 0 and 1
vect_to_binary( c(0, 1, 2, 3.6, "TRUE", "FALSE", "0", "1", "NA", "B", TRUE, FALSE, NA) )
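A minimal base R sketch of the rule stated in the description (0, 1, TRUE and FALSE are kept, everything else becomes NA) is given below. `to_binary()` is a hypothetical stand-in and may differ from the package function in the values it accepts, for example in how it treats letter case.

```r
# Hypothetical helper for illustration; not part of Pedixplorer.
to_binary <- function(vect, logical = FALSE) {
    # Compare on upper-cased text so that numbers, logicals,
    # factors and strings are handled the same way
    v <- toupper(as.character(vect))
    out <- rep(NA_real_, length(v))
    out[v %in% c("0", "FALSE")] <- 0
    out[v %in% c("1", "TRUE")] <- 1
    if (logical) as.logical(out) else out
}

# c() coerces this mixed input to character before the call
res <- to_binary(
    c(0, 1, 2, 3.6, "TRUE", "FALSE", "0", "1", "NA", "B", TRUE, FALSE, NA)
)
res
to_binary(c(TRUE, FALSE, NA), logical = TRUE)
```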